EOS/EOM token issue, or possibly the chat template?
I'm running the Q6_K quant (~270 GB) with llama.cpp's llama-server binary, build b7312.
Without the --jinja flag, I see <|im_end|> at the end of the assistant output, but the model does stop generating after that.
With the --jinja flag, the model failed to stop after the assistant output and went on to generate a user turn and then another assistant turn before I terminated it manually.
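For reference, the two invocations look roughly like this (the model path and shard name are placeholders for my local setup):

```sh
# llama.cpp build b7312; model filename is a placeholder
./llama-server -m ./GLM-4.6-Q6_K-00001-of-000XX.gguf -c 8192          # stops, but <|im_end|> is visible
./llama-server -m ./GLM-4.6-Q6_K-00001-of-000XX.gguf -c 8192 --jinja  # runs on, generating extra turns
```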
The original zai-org/GLM-4.6 repo includes a chat_template.jinja file, and Unsloth's GGUF likewise advises running llama.cpp with the --jinja flag.
Is your version modified to avoid Jinja, given that the chat format has more problems with --jinja than without it?
Chat format issues aside, I am really enjoying this model, and I think at the moment this is the most powerful uncensored local model available.
Hi, I used the chat template provided directly by zai-org/GLM-4.6, so if there are compatibility issues with llama.cpp, that's probably why. I usually use text completion rather than chat completions, so I didn't notice this issue. It looks like llama.cpp includes a GLM-4.6 Jinja template: https://github.com/ggml-org/llama.cpp/blob/master/models/templates/GLM-4.6.jinja
So if you run llama-server with --chat-template-file /path/to/llama.cpp/models/templates/GLM-4.6.jinja, it should load the model with that template and work (see the example below). I'll try to get the default template on the models updated; I know HF shipped some updates around metadata editing a little while back, so perhaps I can fix this without having to re-upload all of the first shards.
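Something like this should do it (paths and shard name are placeholders; I believe --jinja is still needed so the file is treated as a Jinja template):

```sh
./llama-server \
  -m ./GLM-4.6-Q6_K-00001-of-000XX.gguf \
  --jinja \
  --chat-template-file /path/to/llama.cpp/models/templates/GLM-4.6.jinja
```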
Ah, never mind, I did mess up: there's no chat template included, due to how the files were copied over during the ablation process when the model was modified. I'll get this sorted within 24h, thanks for the shout.
@astcan I've fixed the chat template issue for all of the quants; the first shard now embeds the template from the llama.cpp repo. I'll add a small note about the change to the README. Users who downloaded the model before this will need to re-download and replace the first shard to pick up the fixed template.
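If anyone would rather patch their existing download locally than re-download the first shard, one way to write the template into the GGUF metadata is the gguf-new-metadata tool from the gguf Python package. A sketch, with placeholder filenames (this isn't necessarily how I patched the uploads):

```sh
pip install gguf
# rewrites the first shard with tokenizer.chat_template set to the llama.cpp template
gguf-new-metadata \
  GLM-4.6-Q6_K-00001-of-000XX.gguf GLM-4.6-Q6_K-00001-of-000XX-fixed.gguf \
  --chat-template "$(cat llama.cpp/models/templates/GLM-4.6.jinja)"
```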
Glad I could bring it to your attention, and many thanks!