cpatonn committed on
Commit
df167aa
·
verified ·
1 Parent(s): e4a5c03

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,83 @@
---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
base_model: ai21labs/AI21-Jamba2-Mini
---

# Introduction

Jamba2 Mini is an open-source small language model built for enterprise reliability. With 12B active parameters (52B total), it delivers precise question answering without the computational overhead of reasoning models. The model's SSM-Transformer architecture provides a memory-efficient solution for production agent stacks where consistent, grounded outputs are critical.

Released under the Apache 2.0 License with a 256K context window, Jamba2 Mini is designed for enterprise workflows that demand accuracy and steerability. For more details, read the [full release blog post](https://ai21.com/blog/introducing-jamba2).

# Key Advantages
* **Superior reliability-to-throughput ratio:** Maintains high performance at 100K+ token contexts
* **Category-leading benchmarks:** Excels on IFBench, IFEval, Collie, and FACTS
* **Statistically significant quality wins:** Outperforms comparable models on real-world enterprise tasks
* **256K context window:** Processes technical manuals, research papers, and knowledge bases
* **Apache 2.0 License:** Fully open source for commercial use
* **Production-optimized:** Lean memory footprint for scalable deployments

# Evaluation Results
Jamba2 Mini leads on instruction-following and grounding metrics, demonstrating exceptional steerability and context faithfulness. In blind side-by-side evaluations on 100 real-world enterprise prompts, the model achieved statistically significant wins on output quality and factuality compared to Ministral3 14B.

<img src="https://huggingface.co/ai21labs/AI21-Jamba2-Mini/resolve/main/assets/Enterprise%20Reliability%20Benchmarks%20for%20Mini%20Models.png" width="900"/>

# Training and Evaluation Details
Jamba2 models were developed using a comprehensive post-training pipeline starting from the Jamba 1.5 pre-training. The models underwent mid-training on 500B carefully curated tokens with increased representation of math, code, high-quality web data, and long documents. A state-passing phase optimized the Mamba layers for effective context-length generalization. Training continued with cold-start supervised fine-tuning to establish instruction-following and reasoning capabilities, followed by DPO optimization.

The final training stages involved multiple on-policy reinforcement learning phases, progressively moving from short-context verifiable rewards to longer-context training with mixed verifiable and model-based rewards. Evaluation focused on two key enterprise reliability signals: instruction-following benchmarks measuring steerability, and grounding benchmarks testing context faithfulness. Human evaluators assessed performance on real-world enterprise tasks using blind, counterbalanced side-by-side comparisons, rating outputs on factuality, style, constraint adherence, instruction following, and helpfulness.

# Quickstart
## Run with vLLM
For best results, use vLLM version **0.12.0** or higher.

```shell
vllm serve "ai21labs/AI21-Jamba2-Mini" --mamba-ssm-cache-dtype float32 --enable-auto-tool-choice --tool-call-parser hermes --enable-prefix-caching --quantization experts_int8
```
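
`vllm serve` exposes an OpenAI-compatible HTTP API, by default on port 8000. A minimal sketch of a chat-completions request against that endpoint — the port, path, and the `build_chat_request` helper are assumptions for illustration, not part of the official docs:

```python
import json
import urllib.request

def build_chat_request(model: str, messages: list, temperature: float = 0.6) -> bytes:
    """Serialize an OpenAI-style chat-completions payload (hypothetical helper)."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    return json.dumps(payload).encode("utf-8")

body = build_chat_request(
    "ai21labs/AI21-Jamba2-Mini",
    [{"role": "user", "content": "Summarize our PTO policy in one sentence."}],
)

# Target the local vLLM server (assumes the default port 8000).
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# With a running server, uncomment to send the request:
# response = urllib.request.urlopen(req)
# print(json.load(response)["choices"][0]["message"]["content"])
```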

## Run with Transformers

```shell
pip install "transformers>=4.54.0"
pip install flash-attn --no-build-isolation
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/AI21-Jamba2-Mini",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba2-Mini")

messages = [
    {
        "role": "system",
        "content": (
            "You are an HR Policy Assistant. "
            "Answer employee questions using only the provided policy documents. "
            "If the answer isn't in the documents, say so clearly. "
            "Be concise and cite the specific policy section when possible."
        ),
    },
    {
        "role": "user",
        "content": (
            "Context documents: {retrieved_chunks}. "
            "Employee question: {user_question}. "
            "Answer:"
        ),
    },
]

prompts = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

outputs = model.generate(
    **tokenizer(prompts, return_tensors="pt").to(model.device),
    do_sample=True,
    temperature=0.6,
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

For more deployment guides and resources, visit our [official documentation](https://docs.ai21.com/home).
chat_template.jinja ADDED
@@ -0,0 +1,66 @@
{%- if bos_token is defined and bos_token is not none %}{{- bos_token -}}{%- endif %}
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages|length > 0 and messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages|length > 0 and messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set content = message.content %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- message.content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif -%}
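
The template wraps every turn in `<|im_start|>`/`<|im_end|>` markers. A small Python sketch reproducing the no-tools branch of the template's message layout, so the token structure is easy to see — `render_plain` is a hypothetical helper, and the Jinja template above (which also prepends the BOS token) remains the authoritative source:

```python
def render_plain(messages, add_generation_prompt=True):
    # Mirrors the no-tools branch: each turn becomes
    # <|im_start|>{role}\n{content}<|im_end|>\n
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_plain([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
])
print(prompt)
```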
config.json ADDED
@@ -0,0 +1,268 @@
{
  "architectures": [
    "JambaForCausalLM"
  ],
  "attention_dropout": 0.0,
  "attn_layer_offset": 4,
  "attn_layer_period": 8,
  "bos_token_id": 1,
  "dtype": "float16",
  "eos_token_id": 519,
  "expert_layer_offset": 1,
  "expert_layer_period": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "mamba_conv_bias": true,
  "mamba_d_conv": 4,
  "mamba_d_state": 16,
  "mamba_dt_rank": 256,
  "mamba_expand": 2,
  "mamba_proj_bias": false,
  "max_position_embeddings": 262144,
  "model_type": "jamba",
  "num_attention_heads": 32,
  "num_experts": 16,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "num_logits_to_keep": 1,
  "output_router_logits": false,
  "pad_token_id": 0,
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "format": "pack-quantized",
        "input_activations": null,
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": 32,
          "num_bits": 4,
          "observer": "mse",
          "observer_kwargs": {},
          "strategy": "group",
          "symmetric": true,
          "type": "int"
        }
      }
    },
    "format": "pack-quantized",
    "global_compression_ratio": null,
    "ignore": [
      "model.layers.0.mamba.in_proj",
      "model.layers.0.mamba.x_proj",
      "model.layers.0.mamba.dt_proj",
      "model.layers.0.mamba.out_proj",
      "model.layers.0.feed_forward.gate_proj",
      "model.layers.0.feed_forward.up_proj",
      "model.layers.0.feed_forward.down_proj",
      "model.layers.1.mamba.in_proj",
      "model.layers.1.mamba.x_proj",
      "model.layers.1.mamba.dt_proj",
      "model.layers.1.mamba.out_proj",
      "model.layers.1.feed_forward.router",
      "model.layers.2.mamba.in_proj",
      "model.layers.2.mamba.x_proj",
      "model.layers.2.mamba.dt_proj",
      "model.layers.2.mamba.out_proj",
      "model.layers.2.feed_forward.gate_proj",
      "model.layers.2.feed_forward.up_proj",
      "model.layers.2.feed_forward.down_proj",
      "model.layers.3.mamba.in_proj",
      "model.layers.3.mamba.x_proj",
      "model.layers.3.mamba.dt_proj",
      "model.layers.3.mamba.out_proj",
      "model.layers.3.feed_forward.router",
      "model.layers.4.self_attn.q_proj",
      "model.layers.4.self_attn.k_proj",
      "model.layers.4.self_attn.v_proj",
      "model.layers.4.self_attn.o_proj",
      "model.layers.4.feed_forward.gate_proj",
      "model.layers.4.feed_forward.up_proj",
      "model.layers.4.feed_forward.down_proj",
      "model.layers.5.mamba.in_proj",
      "model.layers.5.mamba.x_proj",
      "model.layers.5.mamba.dt_proj",
      "model.layers.5.mamba.out_proj",
      "model.layers.5.feed_forward.router",
      "model.layers.6.mamba.in_proj",
      "model.layers.6.mamba.x_proj",
      "model.layers.6.mamba.dt_proj",
      "model.layers.6.mamba.out_proj",
      "model.layers.6.feed_forward.gate_proj",
      "model.layers.6.feed_forward.up_proj",
      "model.layers.6.feed_forward.down_proj",
      "model.layers.7.mamba.in_proj",
      "model.layers.7.mamba.x_proj",
      "model.layers.7.mamba.dt_proj",
      "model.layers.7.mamba.out_proj",
      "model.layers.7.feed_forward.router",
      "model.layers.8.mamba.in_proj",
      "model.layers.8.mamba.x_proj",
      "model.layers.8.mamba.dt_proj",
      "model.layers.8.mamba.out_proj",
      "model.layers.8.feed_forward.gate_proj",
      "model.layers.8.feed_forward.up_proj",
      "model.layers.8.feed_forward.down_proj",
      "model.layers.9.mamba.in_proj",
      "model.layers.9.mamba.x_proj",
      "model.layers.9.mamba.dt_proj",
      "model.layers.9.mamba.out_proj",
      "model.layers.9.feed_forward.router",
      "model.layers.10.mamba.in_proj",
      "model.layers.10.mamba.x_proj",
      "model.layers.10.mamba.dt_proj",
      "model.layers.10.mamba.out_proj",
      "model.layers.10.feed_forward.gate_proj",
      "model.layers.10.feed_forward.up_proj",
      "model.layers.10.feed_forward.down_proj",
      "model.layers.11.mamba.in_proj",
      "model.layers.11.mamba.x_proj",
      "model.layers.11.mamba.dt_proj",
      "model.layers.11.mamba.out_proj",
      "model.layers.11.feed_forward.router",
      "model.layers.12.self_attn.q_proj",
      "model.layers.12.self_attn.k_proj",
      "model.layers.12.self_attn.v_proj",
      "model.layers.12.self_attn.o_proj",
      "model.layers.12.feed_forward.gate_proj",
      "model.layers.12.feed_forward.up_proj",
      "model.layers.12.feed_forward.down_proj",
      "model.layers.13.mamba.in_proj",
      "model.layers.13.mamba.x_proj",
      "model.layers.13.mamba.dt_proj",
      "model.layers.13.mamba.out_proj",
      "model.layers.13.feed_forward.router",
      "model.layers.14.mamba.in_proj",
      "model.layers.14.mamba.x_proj",
      "model.layers.14.mamba.dt_proj",
      "model.layers.14.mamba.out_proj",
      "model.layers.14.feed_forward.gate_proj",
      "model.layers.14.feed_forward.up_proj",
      "model.layers.14.feed_forward.down_proj",
      "model.layers.15.mamba.in_proj",
      "model.layers.15.mamba.x_proj",
      "model.layers.15.mamba.dt_proj",
      "model.layers.15.mamba.out_proj",
      "model.layers.15.feed_forward.router",
      "model.layers.16.mamba.in_proj",
      "model.layers.16.mamba.x_proj",
      "model.layers.16.mamba.dt_proj",
      "model.layers.16.mamba.out_proj",
      "model.layers.16.feed_forward.gate_proj",
      "model.layers.16.feed_forward.up_proj",
      "model.layers.16.feed_forward.down_proj",
      "model.layers.17.mamba.in_proj",
      "model.layers.17.mamba.x_proj",
      "model.layers.17.mamba.dt_proj",
      "model.layers.17.mamba.out_proj",
      "model.layers.17.feed_forward.router",
      "model.layers.18.mamba.in_proj",
      "model.layers.18.mamba.x_proj",
      "model.layers.18.mamba.dt_proj",
      "model.layers.18.mamba.out_proj",
      "model.layers.18.feed_forward.gate_proj",
      "model.layers.18.feed_forward.up_proj",
      "model.layers.18.feed_forward.down_proj",
      "model.layers.19.mamba.in_proj",
      "model.layers.19.mamba.x_proj",
      "model.layers.19.mamba.dt_proj",
      "model.layers.19.mamba.out_proj",
      "model.layers.19.feed_forward.router",
      "model.layers.20.self_attn.q_proj",
      "model.layers.20.self_attn.k_proj",
      "model.layers.20.self_attn.v_proj",
      "model.layers.20.self_attn.o_proj",
      "model.layers.20.feed_forward.gate_proj",
      "model.layers.20.feed_forward.up_proj",
      "model.layers.20.feed_forward.down_proj",
      "model.layers.21.mamba.in_proj",
      "model.layers.21.mamba.x_proj",
      "model.layers.21.mamba.dt_proj",
      "model.layers.21.mamba.out_proj",
      "model.layers.21.feed_forward.router",
      "model.layers.22.mamba.in_proj",
      "model.layers.22.mamba.x_proj",
      "model.layers.22.mamba.dt_proj",
      "model.layers.22.mamba.out_proj",
      "model.layers.22.feed_forward.gate_proj",
      "model.layers.22.feed_forward.up_proj",
      "model.layers.22.feed_forward.down_proj",
      "model.layers.23.mamba.in_proj",
      "model.layers.23.mamba.x_proj",
      "model.layers.23.mamba.dt_proj",
      "model.layers.23.mamba.out_proj",
      "model.layers.23.feed_forward.router",
      "model.layers.24.mamba.in_proj",
      "model.layers.24.mamba.x_proj",
      "model.layers.24.mamba.dt_proj",
      "model.layers.24.mamba.out_proj",
      "model.layers.24.feed_forward.gate_proj",
      "model.layers.24.feed_forward.up_proj",
      "model.layers.24.feed_forward.down_proj",
      "model.layers.25.mamba.in_proj",
      "model.layers.25.mamba.x_proj",
      "model.layers.25.mamba.dt_proj",
      "model.layers.25.mamba.out_proj",
      "model.layers.25.feed_forward.router",
      "model.layers.26.mamba.in_proj",
      "model.layers.26.mamba.x_proj",
      "model.layers.26.mamba.dt_proj",
      "model.layers.26.mamba.out_proj",
      "model.layers.26.feed_forward.gate_proj",
      "model.layers.26.feed_forward.up_proj",
      "model.layers.26.feed_forward.down_proj",
      "model.layers.27.mamba.in_proj",
      "model.layers.27.mamba.x_proj",
      "model.layers.27.mamba.dt_proj",
      "model.layers.27.mamba.out_proj",
      "model.layers.27.feed_forward.router",
      "model.layers.28.self_attn.q_proj",
      "model.layers.28.self_attn.k_proj",
      "model.layers.28.self_attn.v_proj",
      "model.layers.28.self_attn.o_proj",
      "model.layers.28.feed_forward.gate_proj",
      "model.layers.28.feed_forward.up_proj",
      "model.layers.28.feed_forward.down_proj",
      "model.layers.29.mamba.in_proj",
      "model.layers.29.mamba.x_proj",
      "model.layers.29.mamba.dt_proj",
      "model.layers.29.mamba.out_proj",
      "model.layers.29.feed_forward.router",
      "model.layers.30.mamba.in_proj",
      "model.layers.30.mamba.x_proj",
      "model.layers.30.mamba.dt_proj",
      "model.layers.30.mamba.out_proj",
      "model.layers.30.feed_forward.gate_proj",
      "model.layers.30.feed_forward.up_proj",
      "model.layers.30.feed_forward.down_proj",
      "model.layers.31.mamba.in_proj",
      "model.layers.31.mamba.x_proj",
      "model.layers.31.mamba.dt_proj",
      "model.layers.31.mamba.out_proj",
      "model.layers.31.feed_forward.router",
      "lm_head"
    ],
    "kv_cache_scheme": null,
    "quant_method": "compressed-tensors",
    "quantization_status": "compressed",
    "sparsity_config": {},
    "transform_config": {},
    "version": "0.12.3.a20251110"
  },
  "rms_norm_eps": 1e-06,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.3",
  "use_cache": true,
  "use_mamba_kernels": false,
  "vocab_size": 65536
}
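
The config shows where the 12B-active/52B-total split comes from: `expert_layer_period: 2` makes every second layer a MoE layer with `num_experts: 16`, of which `num_experts_per_tok: 2` are routed per token. A back-of-the-envelope sketch using the sizes above — illustrative arithmetic only, not the exact official parameter count (attention, Mamba, and embedding parameters make up the remainder):

```python
hidden = 4096            # hidden_size
inter = 14336            # intermediate_size
layers = 32              # num_hidden_layers
experts = 16             # num_experts
active_experts = 2       # num_experts_per_tok

ffn = 3 * hidden * inter          # gate, up, down projections per expert
moe_layers = layers // 2          # expert_layer_period = 2
dense_layers = layers - moe_layers

total_ffn = moe_layers * experts * ffn + dense_layers * ffn
active_ffn = moe_layers * active_experts * ffn + dense_layers * ffn

print(f"total FFN params:  {total_ffn / 1e9:.1f}B")   # the bulk of the 52B total
print(f"active FFN params: {active_ffn / 1e9:.1f}B")  # the bulk of the 12B active
```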
generation_config.json ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.56.1"
}
model-00001-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e720a43124506c908ec4d79f3cdd37e028a2446dc0b1f86b26574eeff95d2cc0
size 4990988488
model-00002-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5720ae23e0e74c3c73a1e960d975ab46b7a18f2cfdf802d268bf46e90470e610
size 4999415960
model-00003-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ef2e1898bf4849b0d66b2b80f0fb9b3a4789d6c892415322c78ec55398e5091
size 4893638064
model-00004-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c28756793b47a952bc6ff7ea0c551fb0ee19547964fb451cfa0f62c824c7f8a6
size 4979344296
model-00005-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:787f1ad9b4aa5d296c807ff0e0ddcd6b9e9fea66bd1b9875daeb3befc1e4018e
size 4989193848
model-00006-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3b6c05a662b1a855e9fa0f0760ec1d5df79719ea7565ca467d0679cf0470c284
size 4982623184
model-00007-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d80a2173b573ca7724855185352fea374a491ba38f5534d5b30c20a1c7b7ae21
size 4988013744
model-00008-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5490fb3d23894343e05fea11f492b1e243c22c921ef30fd070cd5d7f3e4821fb
size 3490635104
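
The `.safetensors` entries above are Git LFS pointer files: three key-value lines giving the spec version, the SHA-256 of the actual payload, and its byte size. A small sketch of parsing a pointer and checking a downloaded shard against it — `parse_lfs_pointer` and `verify_shard` are hypothetical helpers, not part of any library:

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_shard(pointer: dict, shard_path: Path) -> bool:
    """Check a downloaded file against the pointer's sha256 and size."""
    data = shard_path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    return f"sha256:{digest}" == pointer["oid"] and len(data) == int(pointer["size"])

# Parse the last shard's pointer (copied from above).
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:5490fb3d23894343e05fea11f492b1e243c22c921ef30fd070cd5d7f3e4821fb\n"
    "size 3490635104\n"
)
print(pointer["oid"], pointer["size"])
```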
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
recipe.yaml ADDED
@@ -0,0 +1,40 @@
default_stage:
  default_modifiers:
    AWQModifier:
      config_groups:
        group_0:
          targets: [Linear]
          weights:
            num_bits: 4
            type: int
            symmetric: true
            group_size: 32
            strategy: group
            block_structure: null
            dynamic: false
            actorder: null
            scale_dtype: null
            zp_dtype: null
            observer: mse
            observer_kwargs: {}
            input_activations: null
            output_activations: null
            format: null
      targets: [Linear]
      ignore: [model.embed_tokens, model.final_layernorm, 're:.*feed_forward[.]gate_proj',
        're:.*feed_forward[.]up_proj', 're:.*feed_forward[.]down_proj', 're:.*router', 're:.*mamba.*',
        're:.*self_attn.*', lm_head]
      mappings:
      - smooth_layer: re:.*input_layernorm$
        balance_layers: ['re:.*q_proj$', 're:.*k_proj$', 're:.*v_proj$', 're:.*mamba[.]in_proj$']
      - smooth_layer: re:.*v_proj$
        balance_layers: ['re:.*o_proj$']
      - smooth_layer: re:.*mamba[.]dt_layernorm$
        balance_layers: ['re:.*mamba[.]dt_proj$']
      - smooth_layer: re:.*pre_ff_layernorm$
        balance_layers: ['re:.*gate_proj$', 're:.*up_proj$']
      - smooth_layer: re:.*up_proj$
        balance_layers: ['re:.*down_proj$']
      offload_device: !!python/object/apply:torch.device [cpu]
      duo_scaling: true
      n_grid: 20
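
The recipe applies symmetric int4 weight quantization with `group_size: 32`. The core rounding step can be sketched in plain NumPy — an illustration of group-wise symmetric quantization under those settings, not llmcompressor's actual AWQ implementation (which also searches for activation-aware scales):

```python
import numpy as np

def quantize_groupwise(w: np.ndarray, num_bits: int = 4, group_size: int = 32):
    """Symmetric group-wise quantization along the flattened last axis."""
    qmax = 2 ** (num_bits - 1) - 1                  # 7 for symmetric int4
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)     # avoid divide-by-zero
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax)
    return q.reshape(w.shape), scales

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scales = quantize_groupwise(w)

# Dequantize and measure the worst-case reconstruction error.
dequant = (q.reshape(-1, 32) * scales).reshape(w.shape)
err = np.abs(w - dequant).max()
print(f"max abs reconstruction error: {err:.4f}")
```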
special_tokens_map.json ADDED
@@ -0,0 +1,45 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": {
    "content": "<|startoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|unk|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,257 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<|pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|startoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<|unk|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "518": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "519": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "520": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "521": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "522": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "523": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "524": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "525": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "526": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "527": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "528": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "529": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "530": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "531": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "532": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "533": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "534": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "535": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "536": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "537": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "538": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "539": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "540": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "541": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "542": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": "<|startoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|pad|>",
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizerFast",
  "unk_token": "<|unk|>",
  "use_default_system_prompt": false
}
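
The `added_tokens_decoder` table pins the chat-control tokens to fixed ids — for example `<|im_end|>` is id 519, which matches the `eos_token_id` in config.json. A quick sketch of sanity-checking such a mapping offline, using a hand-copied subset of the table (the dict below is an illustration, not the full tokenizer):

```python
# Subset of added_tokens_decoder, copied from the config above.
added_tokens = {
    0: "<|pad|>",
    1: "<|startoftext|>",
    518: "<|im_start|>",
    519: "<|im_end|>",
    531: "<tool_call>",
    532: "</tool_call>",
}

# Invert the table to look tokens up by content.
ids_by_content = {content: tok_id for tok_id, content in added_tokens.items()}

# The eos token declared in tokenizer_config ("<|im_end|>") should resolve
# to the eos_token_id used by the model config (519).
eos_id = ids_by_content["<|im_end|>"]
assert eos_id == 519
print(f"eos '<|im_end|>' -> id {eos_id}")
```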