Issue with tool calling

#1
by isevendays - opened

srv log_server_r: request: POST /v1/chat/completions 192.168.0.55 500
got exception: {"code":500,"message":"Value is not callable: null at row 56, column 70:\n {%- if '' in content %}\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n ^\n {%- set content = (content.split('')|last).lstrip('\n') %}\n at row 56, column 72:\n {%- if '' in content %}\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n ^\n {%- set content = (content.split('')|last).lstrip('\n') %}\n at row 56, column 85:\n {%- if '' in content %}\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n ^\n {%- set content = (content.split('')|last).lstrip('\n') %}\n at row 56, column 106:\n {%- if '' in content %}\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n ^\n {%- set content = (content.split('')|last).lstrip('\n') %}\n at row 56, column 108:\n {%- if '' in content %}\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n ^\n {%- set content = (content.split('')|last).lstrip('\n') %}\n at row 56, column 9:\n {%- if '' in content %}\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n ^\n {%- set content = (content.split('')|last).lstrip('\n') %}\n at row 55, column 36:\n{%- else %}\n {%- if '' in content %}\n ^\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n at row 55, column 5:\n{%- else %}\n {%- if '' in content %}\n ^\n {%- set reasoning_content = ((content.split('')|first).rstrip('\n').split('')|last).lstrip('\n') %}\n at row 54, column 12:\n {%- set reasoning_content = m.reasoning_content %}\n{%- else %}\n ^\n {%- if '' in content %}\n at row 52, column 1:\n{%- set content = visible_text(m.content) %}\n{%- if m.reasoning_content is string %}\n^\n {%- set reasoning_content = m.reasoning_content %}\n at row 48, column 35:\n{{- '/nothink' if (enable_thinking is defined and not enable_thinking and not content.endswith("/nothink")) else '' -}}\n{%- elif m.role == 'assistant' -%}\n ^\n<|assistant|>\n at row 45, column 1:\n{% for m in messages %}\n{%- if m.role == 'user' -%}<|user|>\n^\n{% set content = visible_text(m.content) %}{{ content }}\n at row 44, column 24:\n{%- endfor %}\n{% for m in messages %}\n ^\n{%- if m.role == 'user' -%}<|user|>\n at row 44, column 1:\n{%- endfor %}\n{% for m in messages %}\n^\n{%- if m.role == 'user' -%}<|user|>\n at row 1, column 1:\n[gMASK]\n^\n{%- if tools -%}\n","type":"server_error"}
srv log_server_r: request: POST /v1/chat/completions 192.168.0.55 500

(base) root@ktransformers:~/llama.cpp# git pull
From https://github.com/ggml-org/llama.cpp

 * [new tag]             b6090      -> b6090
Already up to date.
(base) root@ktransformers:~/llama.cpp#
./build/bin/llama-server \
                         --alias glm-4.5 \
                         --model /root/models/GLM-4.5-GGUF/UD-Q3_K_XL/UD-Q3_K_XL/GLM-4.5-UD-Q3_K_XL-00001-of-00004.gguf \
                         --jinja \
                         --ctx-size 131072 \
                         --cache-type-k q8_0 \
                         --cache-type-v q8_0 \
                         -fa \
                         --parallel 1 \
                         --temp 0.6 \
                         --top_p 0.9 \
                         --n-gpu-layers 99 \
                         --threads 104 \
                         --host 0.0.0.0 \
                         --port 8080 \
                         -ot "\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.=CPU" \
                         --min-p 0.01 \
                         --threads-batch 52 \
                         -b 8192 -ub 8192 \
                         --cont-batching \
                         --no-mmap

This appears to be an issue only with the unsloth quants. I got the same error attempting to use UD-Q4_K_XL with Open Hands AI. When I switched to the Q4_K_M quant I made myself with the latest llama.cpp, it stopped happening and I can use Open Hands AI as normal.
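
If anyone wants to check which chat template a given GGUF actually embeds (for example to compare the unsloth quant against another one), something along these lines should print it. This is an untested sketch that assumes the gguf Python package (pip install gguf); the path is just the UD-Q3_K_XL shard from the command above:

# print the tokenizer.chat_template metadata field from the first shard
pip install gguf
python3 -c "from gguf import GGUFReader; f = GGUFReader('/root/models/GLM-4.5-GGUF/UD-Q3_K_XL/UD-Q3_K_XL/GLM-4.5-UD-Q3_K_XL-00001-of-00004.gguf').fields['tokenizer.chat_template']; print(bytes(f.parts[-1]).decode('utf-8'))"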

@createthis do you use function calling? Without function calling it does work fine, but I use tools with a Claude Code proxy.

@isevendays I have no idea what that means in the context of llama.cpp. I see people throwing around “tool calling” all the time, but when I tell Open Hands AI to use “native tool calling” with an environment variable it fails with every model I’ve tried: deepseek-v3-0324, kimi-k2, qwen3-coder, etc.

I also see unsloth re-uploading models with fixed “tool calling” chat templates all the time, but for me they worked before the fix and they work after the fix.

Sorry, I just don’t know what “tool calling” or “function calling” means to people.

Open Hands instructs the model to use XML-style tags to call functions Open Hands provides or functions provided by MCP servers. I use that. I don’t think that’s technically “native tool calling”, but I can’t figure out what the difference is either.

@isevendays If you can run a Q4_K_M quant, try this one I just uploaded: https://huggingface.co/createthis/GLM-4.5-GGUF/tree/main/q4_k_m

I only have a 4 TB drive and I recently deleted my FP16. I'd like to make you a Q3_K_M, but I'd have to delete something else from the drive to make another FP16, so this is the best I can do at the moment.

@createthis thanks! I'm downloading that.

I'm currently using function calling with Qwen (an example request is sketched after the command below).

./build/bin/llama-server \
                         --alias Qwen3 \
                         --model /root/models/Qwen3-235B-A22B-Instruct-2507-GGUF/UD-Q4_K_XL/UD-Q4_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q4_K_XL-00001-of-00003.gguf \
                         --jinja \
                         --ctx-size 131072 \
                         --cache-type-k q8_0 \
                         --cache-type-v q8_0 \
                         -fa \
                         --parallel 1 \
                         --temp 0.7 \
                         --top_p 0.8 \
                         --top_k 20 \
                         --n-gpu-layers 99 \
                         --threads 104 \
                         --host 0.0.0.0 \
                         --port 8080 \
                         --n-cpu-moe 77 \
                         --min-p 0.0 \
                         --threads-batch 52 \
                         -b 8192 -ub 4192 \
                         --cont-batching \
                         --no-mmap
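
For reference, this is the kind of OpenAI-style function-calling request I mean. It's only a rough sketch: get_weather is a placeholder tool rather than one of my real ones, and localhost should be replaced with the server's address.

# minimal tools request against llama-server's OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3",
    "messages": [
      {"role": "user", "content": "What is the weather in Berlin right now?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": { "city": { "type": "string" } },
            "required": ["city"]
          }
        }
      }
    ]
  }'

When the template and the server's parser cooperate, the assistant message in the response carries a structured tool_calls array instead of XML tags inside the text content; as far as I understand, that structured path is what people mean by "native" tool calling.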

I can't use OpenHands; it didn't work for me because it freezes too often and I couldn't restore the session.

I'm using my custom-developed proxy that translates between the two APIs: Claude Code <-> My Simple Proxy <-> OpenAI-compatible llama.cpp server.
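
Claude Code just gets pointed at the proxy instead of Anthropic's API, roughly like this (a sketch; the address, port, and dummy key are placeholders for whatever the proxy actually uses):

export ANTHROPIC_BASE_URL=http://localhost:8082   # hypothetical proxy address/port
export ANTHROPIC_API_KEY=dummy                    # placeholder; the proxy handles the real backend
claude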

I was recently developing a complex feature and I found out that Qwen3-235B-A22B was very capable! It could basically work across Go, Java, and Python tech stacks at the same time.

I'm not sure my hardware can handle Q4, as I'm using Q3/Q2 for models over 400B parameters, but I'll give it a try.

I'll consider open-sourcing my simple proxy code, as it does tool correction, rule correction, and LLM correction to make Claude Code work with open LLMs.

I have tested your quant and I'm also seeing issues:

got exception: {"code":500,"message":"Unknown argument ensure_ascii for function tojson at row 11, column 37:\n{% for tool in tools %}\n{{ tool | tojson(ensure_ascii=False) }}\n                                    ^\n{% endfor %}\n at row 11, column 1:\n{% for tool in tools %}\n{{ tool | tojson(ensure_ascii=False) }}\n^\n{% endfor %}\n at row 10, column 24:\n<tools>\n{% for tool in tools %}\n                       ^\n{{ tool | tojson(ensure_ascii=False) }}\n at row 10, column 1:\n<tools>\n{% for tool in tools %}\n^\n{{ tool | tojson(ensure_ascii=False) }}\n at row 2, column 17:\n[gMASK]<sop>\n{%- if tools -%}\n                ^\n<|system|>\n at row 2, column 1:\n[gMASK]<sop>\n{%- if tools -%}\n^\n<|system|>\n at row 1, column 1:\n[gMASK]<sop>\n^\n{%- if tools -%}\n","type":"server_error"}

My command is

./build/bin/llama-server \
                         --alias glm-4.5 \
                         --model /root/models/GLM-4.5-GGUF/q4_k_m/q4_k_m/GLM-4.5-Q4_K_M-00001-of-00005.gguf \
                         --jinja \
                         --ctx-size 131072 \
                         --cache-type-k q8_0 \
                         --cache-type-v q8_0 \
                         -fa \
                         --parallel 1 \
                         --temp 0.6 \
                         --top_p 0.9 \
                         --n-gpu-layers 99 \
                         --threads 104 \
                         --host 0.0.0.0 \
                         --port 8080 \
                         --n-cpu-moe 90 \
                         --min-p 0.01 \
                         --threads-batch 52 \
                         -b 8192 -ub 8192 \
                         --cont-batching \
                         --no-mmap

@isevendays you wrote:

got exception: {"code":500,"message":"Unknown argument ensure_ascii for function tojson at row 11, column 37:\n{% for tool in tools %}\n{{ tool | tojson(ensure_ascii=False) }}\n ^\n{% endfor %}\n at row 11, column 1:\n{% for tool in tools %}\n{{ tool | tojson(ensure_ascii=False) }}\n^\n{% endfor %}\n at row 10, column 24:\n\n{% for tool in tools %}\n ^\n{{ tool | tojson(ensure_ascii=False) }}\n at row 10, column 1:\n\n{% for tool in tools %}\n^\n{{ tool | tojson(ensure_ascii=False) }}\n at row 2, column 17:\n[gMASK]\n{%- if tools -%}\n ^\n<|system|>\n at row 2, column 1:\n[gMASK]\n{%- if tools -%}\n^\n<|system|>\n at row 1, column 1:\n[gMASK]\n^\n{%- if tools -%}\n","type":"server_error"}

That's a completely different exception from the one you initially posted at the top of this thread. What is ensure_ascii? Is that part of the tool you're calling?

@createthis that's part of the GGUF file's chat template. The chat template seems to be incorrect for tool calls.
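
A possible workaround I haven't tried yet: copy the embedded template to a file, change tojson(ensure_ascii=False) to plain tojson, and override the GGUF's template at startup with --chat-template-file. Something like this (the .jinja path is just a placeholder for wherever the edited template is saved):

# same command as above, plus the template override
./build/bin/llama-server \
                         --alias glm-4.5 \
                         --model /root/models/GLM-4.5-GGUF/q4_k_m/q4_k_m/GLM-4.5-Q4_K_M-00001-of-00005.gguf \
                         --jinja \
                         --chat-template-file /root/models/glm-4.5-chat-template-fixed.jinja
# ...plus the rest of the flags from my command above

No guarantee the rest of the template renders cleanly under llama.cpp's jinja implementation, but it would at least show whether ensure_ascii is the only problem.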

@isevendays I'll take your word for it as I don't use native tool calls. Maybe I'll spend some time playing with that functionality next week. I'd like to understand it better.
