EOS/EOM token issue, or possibly the chat template?
I'm running the Q6_K quant (~270 GB) with llama.cpp's llama-server binary, build b7312.
Without the --jinja flag, I see <|im_end|> at the end of the assistant output, but the model does stop generating after that.
With the --jinja flag, the model failed to stop after the assistant output and went on to generate a user turn and then another assistant turn before I terminated it manually.
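For reference, the two invocations look roughly like this (the model path and shard name are placeholders for my local setup):

```sh
# llama.cpp build b7312; model filename is a placeholder
./llama-server -m ./GLM-4.6-Q6_K-00001-of-000XX.gguf -c 8192          # stops, but <|im_end|> is visible
./llama-server -m ./GLM-4.6-Q6_K-00001-of-000XX.gguf -c 8192 --jinja  # runs on, generating extra turns
```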
The original zai-org/GLM-4.6 repo includes a chat_template.jinja file, and Unsloth's GGUF likewise advises running llama.cpp with the --jinja flag.
Is your version modified to avoid Jinja, given that the chat format has more problems with --jinja than without it?
Chat format issues aside, I am really enjoying this model, and I think at the moment this is the most powerful uncensored local model available.
Hi, I used the chat template provided directly by zai-org/GLM-4.6, so if there are compatibility issues with llama.cpp, that's probably why. I usually use text completion rather than chat completions, so I didn't notice this issue. It looks like llama.cpp includes a GLM-4.6 Jinja template: https://github.com/ggml-org/llama.cpp/blob/master/models/templates/GLM-4.6.jinja
So if you run llama-server with --chat-template-file /path/to/llama.cpp/models/templates/GLM-4.6.jinja, it should load the model with that template and work (see the example below). I'll try to get the default template on the models updated; I know HF shipped some updates around metadata editing a little while back, so perhaps I can fix this without having to re-upload all of the first shards.
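Something like this should do it (paths and shard name are placeholders; I believe --jinja is still needed so the file is treated as a Jinja template):

```sh
./llama-server \
  -m ./GLM-4.6-Q6_K-00001-of-000XX.gguf \
  --jinja \
  --chat-template-file /path/to/llama.cpp/models/templates/GLM-4.6.jinja
```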
Ah, never mind, I did mess up: there's no chat template included, due to how the files were copied over during the ablation process when the model was modified. I'll get this sorted within 24h, thanks for the shout.
@astcan I've fixed the chat template issue for all of the quants; the first shard now embeds the template from the llama.cpp repo. I'll add a small note about the change to the README. Users who downloaded the model before this will need to re-download and replace the first shard to pick up the fixed template.
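If anyone would rather patch their existing download locally than re-download the first shard, one way to write the template into the GGUF metadata is the gguf-new-metadata tool from the gguf Python package. A sketch, with placeholder filenames (this isn't necessarily how I patched the uploads):

```sh
pip install gguf
# rewrites the first shard with tokenizer.chat_template set to the llama.cpp template
gguf-new-metadata \
  GLM-4.6-Q6_K-00001-of-000XX.gguf GLM-4.6-Q6_K-00001-of-000XX-fixed.gguf \
  --chat-template "$(cat llama.cpp/models/templates/GLM-4.6.jinja)"
```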
Glad I could bring it to your attention, and many thanks!