Vibe Coding Router v5

A three-tier cascaded router for coding tasks that routes prompts between:

  • Local: Qwen3-Coder-Next (80B/3B active MoE, on-device via MLX)
  • Sonnet: Claude Sonnet 4.6 (medium-complexity cloud)
  • Opus: Claude Opus 4.6 (max-capability cloud)

What's New in v5

v4 suffered from inverted routing: simple queries went to cloud while complex ones stayed local. The root cause was an anti-correlation between prompt length and quality in the training data, amplified by the PID loss's reward weighting. v5 fixes this with:

  1. 7 new complexity features (45 handcrafted total): is_coding_task, junk_score, scope_breadth, imperative_verb_density, noun_phrase_density, interaction_complexity, requirement_clause_count
  2. Centered complexity premium: Adjusts training margins by premium * (complexity_score - center) so complex tasks push toward cloud and simple tasks push toward local
  3. Junk prompt clamping: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
  4. Reward weight cap: PID loss reward_weight capped at 0.5 to prevent outlier margin dominance
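Items 2 and 4 are small arithmetic adjustments to the training targets. A minimal sketch of both, using the v5 settings (premium=2.0, center=0.3, cap=0.5); the function names are illustrative, not the codebase's API:

```python
def adjust_margin(margin, complexity_score, premium=2.0, center=0.3):
    """Centered complexity premium: shift the teacher margin so prompts
    above the center push toward cloud and prompts below it push toward
    local. premium=2.0 and center=0.3 are the v5 settings."""
    return margin + premium * (complexity_score - center)

def cap_reward(reward, cap=0.5):
    """Reward weight cap: keep outlier prompts from dominating the loss."""
    return min(reward, cap)

# A complex prompt (score 0.8) gains 2.0 * 0.5 = +1.0 of margin toward cloud;
# a trivial one (score 0.1) is pushed 2.0 * 0.2 = 0.4 toward local.
print(adjust_margin(0.0, 0.8))  # ~1.0
print(adjust_margin(0.0, 0.1))  # ~-0.4
```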

Architecture

Two cascaded binary MLP routers trained with Privileged Information Distillation (PID):

  • Router A (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
  • Router B (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU

Features: 45 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
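The routers are small enough that a forward pass fits in a few lines. Below is a NumPy sketch of Router A's stated shape (77 → [32, 16] → 1) with random weights for illustration; the exact placement of LayerNorm (here applied to each hidden linear output, before ReLU) is an assumption, since the card only says "LayerNorm+ReLU":

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def router_forward(x, layers):
    """Forward pass for Router A's stated shape 77 -> [32, 16] -> 1.
    Dropout (0.2) only applies during training, so it is omitted here."""
    h = x
    for W, b in layers[:-1]:
        h = np.maximum(layer_norm(W @ h + b), 0.0)  # LayerNorm + ReLU
    W, b = layers[-1]
    return 1.0 / (1.0 + np.exp(-(W @ h + b).item()))  # sigmoid -> p_cloud

rng = np.random.default_rng(0)
dims = [77, 32, 16, 1]  # 45 handcrafted + 32 PCA embedding dims in, one logit out
layers = [(rng.normal(0, 0.1, (o, i)), np.zeros(o)) for i, o in zip(dims, dims[1:])]
p_cloud = router_forward(rng.normal(size=77), layers)
```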

Training

  • Data: 1,644 coding prompts with real quality scores from all three models
  • Judge: GPT-5.4 scoring correctness, completeness, code quality, explanation
  • Loss: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
  • Label smoothing: ε=0.05, cost-aware margin for Router B (cost_premium=0.03)
  • Complexity premium: 2.0, centered at 0.3
  • HP sweep: 108 configurations, 3-way split (1150 train / 247 val / 247 test)
  • Threshold A: 0.60 (manually tuned for routing behavior — see note below)
  • Threshold B: 0.474 (calibrated on validation set)
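The card describes the PID loss only as "reward-weighted CE + KL divergence" with β_kl=0.02 and reward_cap=0.5, so the sketch below is one plausible reconstruction, not the actual implementation: reward-weighted binary cross-entropy on label-smoothed teacher decisions, plus a KL term pulling the student toward the soft teacher probability:

```python
import numpy as np

def pid_loss(logits, p_teacher, reward, beta_kl=0.02, reward_cap=0.5, eps=0.05):
    """Hypothetical PID objective: reward-weighted CE on smoothed hard
    teacher labels plus beta_kl-scaled KL toward the soft teacher."""
    w = np.minimum(reward, reward_cap)               # reward weight cap (v5 fix)
    p = 1.0 / (1.0 + np.exp(-logits))                # student probability
    hard = (p_teacher > 0.5).astype(float)
    hard = hard * (1 - eps) + 0.5 * eps              # label smoothing, eps=0.05
    ce = -(w * (hard * np.log(p) + (1 - hard) * np.log(1 - p))).mean()
    pt = np.clip(p_teacher, 1e-6, 1 - 1e-6)          # avoid log(0); note a clamped
    kl = pt * np.log(pt / p) + (1 - pt) * np.log((1 - pt) / (1 - p))
    return ce + beta_kl * kl.mean()

# A junk prompt (p_teacher=0.5) contributes a neutral target, as in v5's clamping.
loss = pid_loss(np.array([2.0, -1.0]), np.array([0.9, 0.5]), np.array([1.2, 0.3]))
```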

Threshold Note

The utility-optimal Router A threshold (0.01) routes almost nothing to local, because cloud quality is genuinely equal or better on nearly all prompts. The manual threshold of 0.60 trades ~1.4% utility for intuitive routing behavior: simple, fast tasks run locally with no network latency, while complex tasks go to cloud.
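The cascade's decision rule is simple given the two router probabilities. A sketch with the v5 thresholds (function name illustrative; see Usage below for the package's real API):

```python
def route(p_cloud, p_opus, threshold_a=0.60, threshold_b=0.474):
    """Two-stage cascade: Router A picks local vs cloud; Router B's
    sonnet-vs-opus decision only matters if the prompt goes to cloud."""
    if p_cloud < threshold_a:
        return "local"
    return "opus" if p_opus >= threshold_b else "sonnet"

print(route(0.30, 0.80))  # local  (below threshold_a, Router B is moot)
print(route(0.75, 0.20))  # sonnet
print(route(0.75, 0.60))  # opus
```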

Real-World Routing (28 test queries, threshold_a=0.60)

| Category    | Local   | Sonnet  | Opus    |
|-------------|---------|---------|---------|
| Simple (8)  | 5 (62%) | 0       | 3 (38%) |
| Medium (8)  | 3 (38%) | 0       | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |

v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).

Test Set Results (calibrated thresholds)

| Metric         | Value  |
|----------------|--------|
| Utility        | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret         | 0.0973 |
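Regret is just the gap to the oracle router that always picks the best model per prompt:

```python
utility, oracle = 0.6205, 0.7179
regret = oracle - utility
# Rounds to 0.0974 from these 4-digit inputs; the reported 0.0973
# presumably comes from the unrounded underlying values.
print(round(regret, 4))
```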

Files

  • router_a.safetensors — Router A weights (32×16 MLP, 13KB)
  • router_b.safetensors — Router B weights (128×64 MLP, 76KB)
  • config.json — Model config, thresholds, HP, training results
  • scaler.pkl — StandardScaler for feature normalization
  • embedding_extractor.pkl — PCA-reduced sentence-transformers extractor
  • sweep_results.json — Full 108-config HP sweep results

Usage

```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")
# result.decision: "local", "sonnet", or "opus"
# result.p_cloud: probability of cloud routing
# result.p_opus: probability of opus (if routed to cloud)
```