Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k

> **Retrain Notice (2026-03-07):** This model was retrained from scratch to address the high final loss observed in the initial QLoRA version. The upgrade to 16-bit LoRA with r=128 and rsLoRA enabled has resulted in a much lower final loss and a more stable, "lossless" transfer of reasoning capabilities.

# Model Introduction

Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k is a reasoning- and coding-focused model fine-tuned on top of the Qwen3.5-9B hybrid dense architecture. It leverages state-of-the-art Chain-of-Thought (CoT) distillation sourced primarily from Claude 4.6 Opus interactions, with a specialized focus on extended output generation and improved Luau programming capability.

Through Supervised Fine-Tuning (SFT) focused on structured reasoning logic and a maximum output length of 32k tokens, this model excels at breaking down complex user problems, planning step-by-step methodologies within strictly formatted <think> tags, and delivering comprehensive, nuanced solutions, even for highly extensive generation tasks.

# Benchmark

| Benchmark | Baseline (9B) | Distilled (9B) |
|---|---|---|
| GPQA Diamond (0-shot) | 46.46 | 38.38 |
| ARC-Challenge (25-shot) | 67.57 | 68.43 |
| HellaSwag (0-shot) | 76.30 | 76.19 |
| MMLU Overall (0-shot) | 1.07 | 12.59 |
| MMLU Humanities | 2.25 | 21.49 |
| MMLU Social Sciences | 0.49 | 7.70 |
| MMLU STEM | 0.35 | 6.47 |
| MMLU Other | 0.58 | 10.17 |
| MMLU U.S. History | 22.06 | 78.43 |
| MMLU World History | 16.88 | 71.31 |

Benchmarks were run with the lm-evaluation-harness (lm-eval) in 8-bit at temperature 0.0. Higher scores are better.
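For reproducibility, a rough sketch of such a run via the lm-evaluation-harness Python API might look as follows. This is not the exact command used for the table above: the task names, 8-bit loading arguments, and generation settings are assumptions and depend on the harness version.

```python
# Sketch of an lm-evaluation-harness run (assumed task names/flags; adjust to your harness version).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # load_in_8bit mirrors the 8-bit setting reported above; repo id taken from the model card title.
    model_args="pretrained=khtsly/Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k,load_in_8bit=True,dtype=auto",
    tasks=["gpqa_diamond_zeroshot", "arc_challenge", "hellaswag", "mmlu"],
    gen_kwargs="temperature=0.0,do_sample=False",  # greedy decoding, matching the 0.0 temperature above
    batch_size="auto",
)

# Print the per-task metrics (accuracy, normalized accuracy, etc.).
for task, metrics in results["results"].items():
    print(task, metrics)
```

Few-shot counts (e.g. 25-shot ARC-Challenge) would be set per task rather than globally.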

# Training Pipeline Overview

Base Model (Qwen3.5-9B)
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA 16-bit (r=128, α=128, rsLoRA)
(Response-Only Training masked on "<|im_start|>assistant\n")
(Max 32k Output Length)
+
nohurry/Opus-4.6-Reasoning-3000x-filtered + luau coding samples
(shuffled)
 │
 ▼
Final Model (Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k)

# Supervised Fine-Tuning (SFT) Details

  • Objective: To inject high-density reasoning logic, establish a strict internal thinking format prior to output, and train the model to sustain coherent generation over exceptionally long contexts.
  • Extended Output Capacity: Trained specifically to handle up to 32,768 (32k) tokens of maximum output (recommended), allowing for massive codebases, comprehensive essays, and deeply detailed reasoning traces.
  • LoRA Configuration: Fine-tuned efficiently using LoRA (16-bit) with both Rank (r) set to 128 and Alpha (α) set to 128, ensuring strong adaptation and retention of complex Opus-level logic.
  • Rank Scaling (rsLoRA): Enabled Rank-Stabilized LoRA, which scales the adapter by α/√r instead of the standard α/r. This keeps update magnitudes stable at the higher rank of 128, allowing it to be used effectively and contributing to a significantly lower and more stable final loss.
  • Method: Used Unsloth for memory- and compute-efficient training. A critical component was the train_on_responses_only strategy, which masks the instruction tokens so the loss is computed only over the <think> sequences and the subsequent solutions (see the configuration sketch after this list).
  • Format Enforcement: All training samples were systematically normalized so the model strictly abides by the structure <think> {internal reasoning} </think>\n {final answer}.
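As a concrete illustration of the setup above, here is a minimal Unsloth-based sketch. It is not the author's actual training script: dataset formatting, target modules, and trainer hyperparameters are assumptions, and TRL/Unsloth argument names shift between versions.

```python
# Sketch of the SFT setup described above (assumed details, not the verified training script).
from unsloth import FastLanguageModel
from unsloth.chat_templates import train_on_responses_only
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3.5-9B",   # base model from the pipeline diagram
    max_seq_length=32768,           # 32k maximum output length
    load_in_4bit=False,             # 16-bit LoRA rather than 4-bit QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=128,                          # LoRA rank
    lora_alpha=128,                 # LoRA alpha
    use_rslora=True,                # rank-stabilized scaling: alpha / sqrt(r) instead of alpha / r
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The Opus reasoning set; the Luau samples would be concatenated and shuffled in
# as well (see the Datasets section below).
dataset = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,            # newer TRL versions use processing_class= instead
    train_dataset=dataset,
    args=SFTConfig(output_dir="outputs", max_seq_length=32768),
)

# Mask everything before the assistant turn so loss is computed only on the
# response: the <think> ... </think> block plus the final answer.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user\n",
    response_part="<|im_start|>assistant\n",
)
trainer.train()
```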

# Datasets Used

The dataset consists of highly curated, filtered reasoning distillation data, supplemented by specialized coding sets:

| Dataset Name | Description / Purpose |
|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Comprehensive, high-quality Claude 4.6 Opus reasoning trajectories. |
| Custom Luau Coding Set | 75 hand-crafted Luau coding samples generated natively by Opus 4.6, injecting specialized high-quality domain knowledge for Roblox/Luau scripting capability. |
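The pipeline diagram above notes that the two sources are shuffled together before training. A minimal sketch of that mixing step might look like the following; the Luau set is private, so its file name and column schema are assumptions.

```python
# Sketch of mixing the public Opus reasoning set with the private Luau samples.
# Assumes both sets share the same column schema (required by concatenate_datasets).
from datasets import load_dataset, concatenate_datasets

opus = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")
luau = load_dataset("json", data_files="luau_coding_samples.jsonl", split="train")  # 75 local samples (assumed file)

# Concatenate and shuffle so the small Luau set is interleaved with the reasoning traces.
mixed = concatenate_datasets([opus, luau]).shuffle(seed=42)
print(len(mixed))
```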

# Training Compute & Loss Curve

  • Hardware: 1x NVIDIA A100 (80GB)
  • Training Duration: ~2 hours (previously ~4 hours)
  • Estimated Total Cost: $2.50 (previously $3.50)
  • Distillation Efficacy: The loss curve demonstrated a strong, healthy downward trajectory throughout the run, confirming successful knowledge transfer from the Opus teacher model. The model converged steadily from an initial loss of 0.614357 down to a final loss of 0.222413.

# Core Skills & Capabilities

  1. Massive Output Generation: Capable of sustaining coherent, high-quality output for up to 32k tokens, making it ideal for writing extensive code, documentation, or deep analytical reports in a single shot.
  2. Modular & Structured Thinking: Inheriting traits from Opus-level reasoning, the model confidently parses prompts and outlines plans sequentially in its <think> block, avoiding exploratory "trial-and-error" self-doubt.
  3. Luau Proficiency: Thanks to the targeted 75-sample dataset, the model exhibits improved syntax adherence and logic formulation for the Luau programming language.

# Limitations & Intended Use

  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM. Extended 32k outputs may drift slightly, and the model can hallucinate external facts when real-world claims are not grounded in provided context.
  • Intended Scenario: Best suited for offline analytical tasks, heavy coding (especially Luau), math, and logic-dependent prompting where the user needs transparent internal logic and extremely long, continuous outputs (a minimal usage sketch follows below).
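As a starting point for such use, here is a minimal inference sketch. The repo id follows the model name on this card; the decoding settings and chat-template behavior (including whether the <think> block is emitted automatically) are assumptions to verify against the released tokenizer.

```python
# Minimal inference sketch (transformers); repo id and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "khtsly/Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Luau function that debounces a Touched event."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Per the card's format enforcement, the reply should look like:
#   <think> {internal reasoning} </think>
#   {final answer}
outputs = model.generate(inputs, max_new_tokens=32768, do_sample=False)
text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Split off the reasoning block if only the final answer is wanted.
answer = text.split("</think>", 1)[-1].strip() if "</think>" in text else text
print(answer)
```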

# Acknowledgements

This model's development was made possible by the foundational tools and contributions from the broader AI ecosystem:

  • Unsloth AI: For their state-of-the-art framework, enabling highly efficient, memory-optimized LoRA tuning and seamless 32k context scaling.
  • Qwen Team: For engineering the robust and highly capable Qwen3.5-9B dense base architecture.
  • Dataset Contributors: Special recognition to nohurry for the rigorous curation of the Claude 4.6 Opus reasoning trajectories, which serve as the core cognitive engine for this project's SFT phase.

- https://ko-fi.com/khtsly
