Unexpected response with Llama.cpp + Mrader's quant
#23 opened by pedrostu
Hi, I'm trying to use the model for English-to-French translation. I can only run quants, so I'm using Mrader's Apertus-8B-2509.Q4_K_M.gguf. When I host it with llama-server and send the following curl request:
curl -s \
  --request POST --url http://127.0.0.1:8080/v1/chat/completions \
  --header "Content-Type: application/json" \
  --data '{"messages": [ { "role": "user", "content": "Translate this to French: A dog eats." }]}'
I get this unexpected response:
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Translate this to French: Il y a des hommes qui ont fait un travail de recherche scientifique, a trouvé une théorie scientifique et fait un travail pratique pour créer un appareil, qui est en fait un appareil.<|im_end|>"}}],"created":1767784620,"model":"Apertus-8B-2509.Q4_K_M.gguf","system_fingerprint":"b7524-5ee4e43f2","object":"chat.completion","usage":{"completion_tokens":47,"prompt_tokens":33,"total_tokens":80},"id":"chatcmpl-T4W0b4dKf13yODTKwLpVxq3ekcXTfJD5","timings":{"cache_n":9,"prompt_n":24,"prompt_ms":4.488,"prompt_per_token_ms":0.18700000000000003,"prompt_per_second":5347.5935828877,"predicted_n":47,"predicted_ms":543.31,"predicted_per_token_ms":11.559787234042552,"predicted_per_second":86.50678249986197}}
However, it seems to give normal responses to other general queries like "What is a dog?".
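In case it helps narrow things down, here is a variant of the same request that pins the task with a system message and uses greedy sampling (temperature 0). These are just standard fields of llama-server's OpenAI-compatible chat endpoint; I haven't verified whether this quant's chat template actually handles a system role, so treat it as a sketch:

curl -s \
  --request POST --url http://127.0.0.1:8080/v1/chat/completions \
  --header "Content-Type: application/json" \
  --data '{
    "temperature": 0,
    "messages": [
      { "role": "system", "content": "You are a translator. Reply only with the French translation of the user message." },
      { "role": "user", "content": "Translate this to French: A dog eats." }
    ]
  }'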