Vibe Coding Router v5
A three-tier cascaded router for coding tasks that sends each prompt to one of:
- Local: Qwen3-Coder-Next (80B/3B active MoE, on-device via MLX)
- Sonnet: Claude Sonnet 4.6 (medium-complexity cloud)
- Opus: Claude Opus 4.6 (max-capability cloud)
What's New in v5
v4 suffered from inverted routing — simple queries went to cloud while complex ones stayed local. Root cause: length-quality anti-correlation in training data combined with PID loss reward-weight amplification. v5 fixes this with:
- 7 new complexity features (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
- Centered complexity premium: adjusts training margins by `premium * (complexity_score - center)`, so complex tasks push toward cloud and simple tasks push toward local
- Junk prompt clamping: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
- Reward weight cap: PID loss reward_weight capped at 0.5 to prevent outlier margin dominance
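The centered complexity premium can be sketched as follows. This is a hypothetical helper, not the repository's actual training code; the `premium=2.0` and `center=0.3` defaults come from the training config listed below.

```python
def adjust_margin(base_margin: float, complexity_score: float,
                  premium: float = 2.0, center: float = 0.3) -> float:
    """Shift a training margin by premium * (complexity_score - center).

    Scores above `center` make the margin more positive (push toward cloud);
    scores below `center` make it more negative (push toward local).
    """
    return base_margin + premium * (complexity_score - center)

# A complex prompt (score 0.8) gains +1.0 margin toward cloud,
# while a trivial one (score 0.1) is pushed 0.4 toward local.
cloud_shift = adjust_margin(0.0, 0.8)   # +1.0
local_shift = adjust_margin(0.0, 0.1)   # -0.4
```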
Architecture
Two cascaded binary MLP routers trained with Privileged Information Distillation (PID):
- Router A (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
- Router B (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU
Features: 45 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
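A minimal numpy sketch of the Router A forward pass (77 → 32 → 16 → 1, each hidden block Linear → LayerNorm → ReLU, sigmoid head). The released weights are safetensors checkpoints; the random weights here are placeholders for illustration only, and dropout is omitted since it is inactive at inference.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over the feature axis (no learned scale/shift shown)."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def mlp_forward(x, weights):
    """Forward pass matching the Router A topology (77 -> 32 -> 16 -> 1)."""
    for W, b in weights[:-1]:
        x = np.maximum(layer_norm(x @ W + b), 0.0)   # Linear -> LayerNorm -> ReLU
    W, b = weights[-1]
    logit = x @ W + b
    return 1.0 / (1.0 + np.exp(-logit))              # sigmoid -> p_cloud

rng = np.random.default_rng(0)
dims = [77, 32, 16, 1]
weights = [(rng.normal(size=(i, o)) * 0.1, np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]
p_cloud = float(mlp_forward(rng.normal(size=77), weights))
```

Router B uses the same block structure with [128, 64] hidden sizes.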
Training
- Data: 1,644 coding prompts with real quality scores from all three models
- Judge: GPT-5.4 scoring correctness, completeness, code quality, explanation
- Loss: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
- Label smoothing: ε=0.05, cost-aware margin for Router B (cost_premium=0.03)
- Complexity premium: 2.0, centered at 0.3
- HP sweep: 108 configurations, 3-way split (1,150 train / 247 val / 247 test)
- Threshold A: 0.60 (manually tuned for routing behavior — see note below)
- Threshold B: 0.474 (calibrated on validation set)
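The PID objective above (reward-weighted cross-entropy plus a KL term toward the teacher distribution, with the reward clipped at `reward_cap`) can be illustrated for a single binary example. This is a hedged sketch of the loss shape, not the repository's training code; the exact form of the teacher signal is an assumption.

```python
import numpy as np

def pid_loss(p_student, p_teacher, label, reward,
             beta_kl=0.02, reward_cap=0.5, eps=1e-7):
    """Reward-weighted CE toward the hard label + beta_kl * KL(teacher || student).

    Clipping the reward at reward_cap (v5's fix) stops outlier rewards
    from dominating the margin signal.
    """
    w = min(reward, reward_cap)
    p = np.clip(p_student, eps, 1 - eps)
    q = np.clip(p_teacher, eps, 1 - eps)
    ce = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    kl = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return w * ce + beta_kl * kl
```

Note that a clamped junk prompt (p_teacher=0.5) contributes a KL term that pulls the student toward indifference rather than toward either tier.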
Threshold Note
The utility-optimal Router A threshold (0.01) routes almost nothing to local because cloud quality is genuinely equal or better on nearly all prompts. The manual threshold of 0.60 trades ~1.4% utility for correct routing intuition: simple/fast tasks run locally with zero latency, while complex tasks go to cloud.
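Putting the two thresholds together, the cascade decision reduces to a few lines. This is a hypothetical sketch of the routing logic implied by the thresholds above; the packaged `ThreeTierRouter` may differ in detail.

```python
def route(p_cloud: float, p_opus: float,
          threshold_a: float = 0.60, threshold_b: float = 0.474) -> str:
    """Cascade: Router A picks local vs cloud, Router B picks the cloud tier."""
    if p_cloud < threshold_a:
        return "local"          # stay on-device
    return "opus" if p_opus >= threshold_b else "sonnet"
```

Raising `threshold_a` from the utility-optimal 0.01 to 0.60 widens the band of prompts that stay local.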
Real-World Routing (28 test queries, threshold_a=0.60)
| Category | Local | Sonnet | Opus |
|---|---|---|---|
| Simple (8) | 5 (62%) | 0 | 3 (38%) |
| Medium (8) | 3 (38%) | 0 | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |
v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).
Test Set Results (calibrated thresholds)
| Metric | Value |
|---|---|
| Utility | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret | 0.0973 |
Files
- `router_a.safetensors` — Router A weights (32×16 MLP, 13 KB)
- `router_b.safetensors` — Router B weights (128×64 MLP, 76 KB)
- `config.json` — model config, thresholds, hyperparameters, training results
- `scaler.pkl` — StandardScaler for feature normalization
- `embedding_extractor.pkl` — PCA-reduced sentence-transformers extractor
- `sweep_results.json` — full 108-config HP sweep results
Usage
```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")
# result.decision: "local", "sonnet", or "opus"
# result.p_cloud:  probability of cloud routing
# result.p_opus:   probability of opus (if routed to cloud)
```