Vibe Coding Router v5
A three-tier cascaded router for coding tasks that sends each prompt to one of:
- Local: Qwen3-Coder-Next (80B/3B active MoE, on-device via MLX)
- Sonnet: Claude Sonnet 4.6 (medium-complexity cloud)
- Opus: Claude Opus 4.6 (max-capability cloud)
What's New in v5
v4 suffered from inverted routing — simple queries went to cloud while complex ones stayed local. Root cause: length-quality anti-correlation in training data combined with PID loss reward-weight amplification. v5 fixes this with:
- 7 new complexity features (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
- Centered complexity premium: adjusts training margins by `premium * (complexity_score - center)`, so complex tasks push toward cloud and simple tasks push toward local
- Junk prompt clamping: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
- Reward weight cap: PID loss reward_weight capped at 0.5 to prevent outlier margin dominance
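The centered complexity premium can be sketched as follows. This is a hypothetical helper, not the repository's actual training code; the `premium=2.0` and `center=0.3` defaults come from the training config listed below.

```python
def adjust_margin(base_margin: float, complexity_score: float,
                  premium: float = 2.0, center: float = 0.3) -> float:
    """Shift a training margin by premium * (complexity_score - center).

    Scores above `center` make the margin more positive (push toward cloud);
    scores below `center` make it more negative (push toward local).
    """
    return base_margin + premium * (complexity_score - center)

# A complex prompt (score 0.8) gains +1.0 margin toward cloud,
# while a trivial one (score 0.1) is pushed 0.4 toward local.
cloud_shift = adjust_margin(0.0, 0.8)   # +1.0
local_shift = adjust_margin(0.0, 0.1)   # -0.4
```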
Architecture
Two cascaded binary MLP routers trained with Privileged Information Distillation (PID):
- Router A (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
- Router B (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU
Features: 45 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
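A minimal numpy sketch of the Router A forward pass (77 → 32 → 16 → 1, each hidden block Linear → LayerNorm → ReLU, sigmoid head). The released weights are safetensors checkpoints; the random weights here are placeholders for illustration only, and dropout is omitted since it is inactive at inference.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over the feature axis (no learned scale/shift shown)."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def mlp_forward(x, weights):
    """Forward pass matching the Router A topology (77 -> 32 -> 16 -> 1)."""
    for W, b in weights[:-1]:
        x = np.maximum(layer_norm(x @ W + b), 0.0)   # Linear -> LayerNorm -> ReLU
    W, b = weights[-1]
    logit = x @ W + b
    return 1.0 / (1.0 + np.exp(-logit))              # sigmoid -> p_cloud

rng = np.random.default_rng(0)
dims = [77, 32, 16, 1]
weights = [(rng.normal(size=(i, o)) * 0.1, np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]
p_cloud = float(mlp_forward(rng.normal(size=77), weights))
```

Router B uses the same block structure with [128, 64] hidden sizes.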
Training
- Data: 1,644 coding prompts with real quality scores from all three models
- Judge: GPT-5.4 scoring correctness, completeness, code quality, explanation
- Loss: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
- Label smoothing: ε=0.05, cost-aware margin for Router B (cost_premium=0.03)
- Complexity premium: 2.0, centered at 0.3
- HP sweep: 108 configurations, 3-way split (1,150 train / 247 val / 247 test)
- Threshold A: 0.60 (manually tuned for routing behavior — see note below)
- Threshold B: 0.474 (calibrated on validation set)
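The PID objective above (reward-weighted cross-entropy plus a KL term toward the teacher distribution, with the reward clipped at `reward_cap`) can be illustrated for a single binary example. This is a hedged sketch of the loss shape, not the repository's training code; the exact form of the teacher signal is an assumption.

```python
import numpy as np

def pid_loss(p_student, p_teacher, label, reward,
             beta_kl=0.02, reward_cap=0.5, eps=1e-7):
    """Reward-weighted CE toward the hard label + beta_kl * KL(teacher || student).

    Clipping the reward at reward_cap (v5's fix) stops outlier rewards
    from dominating the margin signal.
    """
    w = min(reward, reward_cap)
    p = np.clip(p_student, eps, 1 - eps)
    q = np.clip(p_teacher, eps, 1 - eps)
    ce = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    kl = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return w * ce + beta_kl * kl
```

Note that a clamped junk prompt (p_teacher=0.5) contributes a KL term that pulls the student toward indifference rather than toward either tier.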
Threshold Note
The utility-optimal Router A threshold (0.01) routes almost nothing to local because cloud quality is genuinely equal or better on nearly all prompts. The manual threshold of 0.60 trades ~1.4% utility for correct routing intuition: simple/fast tasks run locally with zero latency, while complex tasks go to cloud.
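Putting the two thresholds together, the cascade decision reduces to a few lines. This is a hypothetical sketch of the routing logic implied by the thresholds above; the packaged `ThreeTierRouter` may differ in detail.

```python
def route(p_cloud: float, p_opus: float,
          threshold_a: float = 0.60, threshold_b: float = 0.474) -> str:
    """Cascade: Router A picks local vs cloud, Router B picks the cloud tier."""
    if p_cloud < threshold_a:
        return "local"          # stay on-device
    return "opus" if p_opus >= threshold_b else "sonnet"
```

Raising `threshold_a` from the utility-optimal 0.01 to 0.60 widens the band of prompts that stay local.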
Real-World Routing (28 test queries, threshold_a=0.60)
| Category | Local | Sonnet | Opus |
|---|---|---|---|
| Simple (8) | 5 (62%) | 0 | 3 (38%) |
| Medium (8) | 3 (38%) | 0 | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |
v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).
Test Set Results (calibrated thresholds)
| Metric | Value |
|---|---|
| Utility | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret | 0.0973 |
Files
- `router_a.safetensors` — Router A weights (32×16 MLP, 13 KB)
- `router_b.safetensors` — Router B weights (128×64 MLP, 76 KB)
- `config.json` — model config, thresholds, hyperparameters, training results
- `scaler.pkl` — StandardScaler for feature normalization
- `embedding_extractor.pkl` — PCA-reduced sentence-transformers extractor
- `sweep_results.json` — full 108-config HP sweep results
Usage
```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")
# result.decision: "local", "sonnet", or "opus"
# result.p_cloud:  probability of cloud routing
# result.p_opus:   probability of opus (if routed to cloud)
```