# Qwen3-0.6B-DISTILL-glm-4.7-think
This model is a fine-tuned version of Qwen/Qwen3-0.6B trained on high-reasoning conversational data from GLM 4.7 by Z.ai.
## Model Details
- Base Model: Qwen/Qwen3-0.6B
- Fine-tuning Dataset: TeichAI/glm-4.7-2000x
- Context Length: 1048576 tokens
- Special Feature: Thinking/reasoning with `<think>` tags
## Usage

### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think")
tokenizer = AutoTokenizer.from_pretrained("glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think")

# Format the conversation with the chat template and append the generation prompt
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
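Because the model emits its chain of thought inside `<think>` tags, you will usually want to separate the reasoning from the final answer before showing it to a user. A minimal sketch using plain string handling (no model required; `split_thinking` and the sample text are illustrative, not part of this repo):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) around the </think> tag."""
    think_open, think_close = "<think>", "</think>"
    if think_close in text:
        reasoning, answer = text.split(think_close, 1)
        # Drop the opening tag if present and trim whitespace
        reasoning = reasoning.replace(think_open, "").strip()
        return reasoning, answer.strip()
    # No closing tag found: treat the whole output as the answer
    return "", text.strip()

reasoning, answer = split_thinking(
    "<think>The user greets me; respond politely.</think>I'm doing well, thanks!"
)
```

Note that decoding with `skip_special_tokens=True` does not strip `<think>`/`</think>`, since they are plain text in the template rather than special tokens, so post-processing like this is still needed.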
## Training Details
- Epochs: 2
- Learning Rate: 2e-5
- Batch Size: 8 (with gradient accumulation)
- Precision: FP16
- Hardware: Multi-GPU with DeepSpeed ZeRO-3
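The hyperparameters above can be sketched as a DeepSpeed configuration. This is an illustrative fragment matching the listed settings (FP16, ZeRO stage 3), not the author's actual training config; the accumulation factor and optimizer are assumptions, since the card does not state them:

```python
# Illustrative DeepSpeed config mirroring the card's settings; not the original file.
deepspeed_config = {
    "fp16": {"enabled": True},            # Precision: FP16
    "zero_optimization": {"stage": 3},    # ZeRO-3 parameter/optimizer partitioning
    "train_micro_batch_size_per_gpu": 8,  # Batch size: 8
    "gradient_accumulation_steps": 4,     # assumed factor (card says "with gradient accumulation")
    "optimizer": {
        "type": "AdamW",                  # assumed optimizer
        "params": {"lr": 2e-5},           # Learning rate: 2e-5
    },
}
```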
## License

Apache 2.0