Testing IQ5_K

#2 · opened by shewin

Test system: W790E Sage + QYFS + 512 GB RAM + RTX 5090


```
Computed blk.46.attn_kv_b.weight as 512 x 8960 and stored in buffer CUDA0
===================================== llama_new_context_with_model: f16
llama_new_context_with_model: n_ctx       = 100096
llama_new_context_with_model: n_batch     = 4096
llama_new_context_with_model: n_ubatch    = 4096
llama_new_context_with_model: flash_attn  = 1
llama_new_context_with_model: mla_attn    = 3
llama_new_context_with_model: attn_max_b  = 512
llama_new_context_with_model: fused_moe   = 1
llama_new_context_with_model: grouped er  = 1
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: fused_mmad  = 1
llama_new_context_with_model: rope_cache  = 0
llama_new_context_with_model: graph_reuse = 1
llama_new_context_with_model: k_cache_hadam = 0
llama_new_context_with_model: split_mode_graph_scheduling = 0
llama_new_context_with_model: reduce_type = f16
llama_new_context_with_model: sched_async = 0
llama_new_context_with_model: ser         = -1, 0
llama_new_context_with_model: freq_base   = 1000000.0
llama_new_context_with_model: freq_scale  = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  2745.81 MiB
llama_new_context_with_model: KV self size = 2745.78 MiB, c^KV (q8_0): 2745.78 MiB, kv^T: not used
llama_new_context_with_model:  CUDA_Host output buffer size =     0.59 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =  4277.20 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =   814.05 MiB
llama_new_context_with_model: graph nodes  = 4171
llama_new_context_with_model: graph splits = 2
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload
```
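As a sanity check, the logged 2745.78 MiB KV self size is consistent with an MLA c^KV cache at q8_0 under a few assumptions not stated in the log: 47 layers (blk.46 being the last tensor computed), a per-layer cache width of 576 = 512 (matching the 512-row attn_kv_b tensors) + 64 RoPE dims, and q8_0's 34 bytes per 32-element block. A minimal sketch under those assumptions:

```python
# Back-of-envelope check of the reported KV self size.
# Assumptions (not in the log): 47 layers, per-layer MLA cache width
# 576 = 512 (c^KV, matching the 512-row attn_kv_b tensors) + 64 (RoPE),
# q8_0 storing 32 weights + one fp16 scale in 34 bytes.
n_layer = 47
kv_width = 512 + 64            # c^KV rank + RoPE head dim
n_ctx = 100096
q8_0_bytes_per_elem = 34 / 32  # 34-byte block holds 32 elements

kv_bytes = n_layer * kv_width * n_ctx * q8_0_bytes_per_elem
print(f"{kv_bytes / 2**20:.2f} MiB")  # matches the logged 2745.78 MiB
```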

main: n_kv_max = 100096, n_batch = 4096, n_ubatch = 4096, flash_attn = 1, n_gpu_layers = 99, n_threads = 101, n_threads_batch = 101

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 4096 | 1024 | 0 | 0.623 | 6579.74 | 6.467 | 158.35 |
| 4096 | 1024 | 4096 | 0.644 | 6362.65 | 6.940 | 147.55 |
| 4096 | 1024 | 8192 | 0.785 | 5218.57 | 8.210 | 124.73 |
| 4096 | 1024 | 12288 | 0.867 | 4724.69 | 8.661 | 118.24 |
| 4096 | 1024 | 16384 | 0.995 | 4118.57 | 9.505 | 107.74 |
| 4096 | 1024 | 20480 | 1.163 | 3521.51 | 10.317 | 99.25 |
| 4096 | 1024 | 24576 | 1.363 | 3005.77 | 10.811 | 94.72 |
| 4096 | 1024 | 28672 | 1.337 | 3062.61 | 11.633 | 88.03 |
| 4096 | 1024 | 32768 | 1.467 | 2792.33 | 12.991 | 78.82 |
| 4096 | 1024 | 36864 | 1.617 | 2532.51 | 13.217 | 77.48 |
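The throughput columns follow directly from the batch sizes and times (S_PP = PP / T_PP, S_TG = TG / T_TG, within display rounding). A small sketch recomputing the first and last rows and the resulting generation slowdown as context fills:

```python
# Verify the derived throughput columns for two rows of the sweep:
# (N_KV, T_PP, S_PP, T_TG, S_TG), values taken from the table above.
rows = [
    (0,     0.623, 6579.74,  6.467, 158.35),
    (36864, 1.617, 2532.51, 13.217,  77.48),
]
PP, TG = 4096, 1024
for n_kv, t_pp, s_pp, t_tg, s_tg in rows:
    assert abs(PP / t_pp - s_pp) / s_pp < 0.01  # within rounding of T_PP
    assert abs(TG / t_tg - s_tg) / s_tg < 0.01  # within rounding of T_TG

# Token generation loses roughly half its speed from an empty KV cache
# to ~36k tokens of context.
print(f"TG slowdown at 36864 tokens: {1 - 77.48 / 158.35:.1%}")
```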

