# Complexity Deep 150M

A novel transformer architecture with deterministic Token-Routed MLP and INL Dynamics.
## Model Details
| Attribute | Value |
|---|---|
| Parameters | ~150M |
| Hidden Size | 768 |
| Layers | 12 |
| Attention Heads | 12 |
| KV Heads (GQA) | 4 |
| Experts | 4 |
| Context Length | 2048 |
| Precision | FP16 |
| Training Steps | ~608k (early checkpoint; target 1M) |
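For reference, the table above maps onto the model configuration roughly as follows. This is a hypothetical sketch: the `ComplexityConfig` field names are assumptions for illustration, and `config.json` in the repository remains the authoritative source.

```python
# Hypothetical mapping of the table above onto ComplexityConfig.
# Field names are assumptions; see config.json for the real hyperparameters.
from complexity_deep import ComplexityConfig

config = ComplexityConfig(
    hidden_size=768,               # Hidden Size
    num_hidden_layers=12,          # Layers
    num_attention_heads=12,        # Attention Heads
    num_key_value_heads=4,         # KV Heads (GQA)
    num_experts=4,                 # Token-Routed MLP experts
    max_position_embeddings=2048,  # Context Length
)
```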
## Architecture Innovations

### 1. Token-Routed MLP (Deterministic MoE)
Unlike learned routing (Mixtral, DeepSeek), we route tokens to experts based on their token ID:
```python
expert_id = token_id % num_experts
```

```
Token 0 -> Expert 0    Token 4 -> Expert 0
Token 1 -> Expert 1    Token 5 -> Expert 1
Token 2 -> Expert 2    Token 6 -> Expert 2
Token 3 -> Expert 3    Token 7 -> Expert 3
...
```
**Why modulo routing?**

- **Uniform distribution:** each expert receives exactly 25% of tokens
- **No expert collapse:** frequent tokens (low IDs) are spread across all experts
- **Zero routing parameters:** no router network to learn
- **Zero load-balancing loss:** perfectly balanced by design
- **100% deterministic** and parallelizable
- **One line of code:** `token_id % num_experts` (see the sketch below)
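As a concrete illustration, here is a minimal PyTorch sketch of a token-routed MLP. The class name, expert shape, and SiLU activation are assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class TokenRoutedMLP(nn.Module):
    """Sketch of a deterministic token-routed MLP (illustrative, not the released code).

    Each token is sent to expert (token_id % num_experts); routing has no
    learned parameters and is uniformly load-balanced by construction.
    """

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); token_ids: (batch, seq)
        expert_ids = token_ids % self.num_experts   # the one-line router
        out = torch.zeros_like(hidden_states)
        for e, expert in enumerate(self.experts):
            mask = expert_ids == e                  # tokens owned by expert e
            if mask.any():
                out[mask] = expert(hidden_states[mask])
        return out
```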
#### Scaling Comparison: Token-Routed vs. Learned MoE (540B Scale, Mathematical Analysis)
| Aspect | Learned MoE (RT-2, Mixtral) | Token-Routed (Ours) |
|---|---|---|
| Total Parameters | 540B | 540B |
| Active Parameters | ~37B (top-k) | ~37B (1 expert/token) |
| Routing Latency | 5-10ms | <0.1ms |
| Total Forward Pass | ~50ms | ~40ms |
| Deterministic | No | Yes |
| Predictable Behavior | No | Yes |
| Safety Certifiable | No | Yes |
**Key insight:** the figures above are theoretical estimates from architecture analysis, not empirical benchmarks. For robotics and real-time applications, determinism and low latency are critical, and Token-Routed MLP achieves the same sparsity benefits as learned MoE without the routing overhead.
### 2. INL Dynamics (Robotics-Grade Control)

A control system inspired by robotics, applied between attention and MLP:
```python
error  = h - mu                    # deviation from equilibrium
v_next = alpha * v - beta * error  # velocity update (momentum + correction)
h_next = h + dt * gate * v_next    # position update (integration)
```
**Benefits:**
- Smooth trajectories (no jerky token generation)
- Stable convergence (PID-like control)
- Learnable dynamics per dimension
- Real-time capable
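A minimal PyTorch sketch of such a dynamics block, assuming one learnable scalar per hidden dimension for `mu`, `alpha`, `beta`, and `gate` (names and initial values are illustrative, not the released code):

```python
import torch
import torch.nn as nn

class INLDynamics(nn.Module):
    """Sketch of one INL dynamics step (illustrative, not the released code).

    Applies a learnable second-order update per hidden dimension:
        error  = h - mu
        v_next = alpha * v - beta * error
        h_next = h + dt * gate * v_next
    """

    def __init__(self, hidden_size: int, dt: float = 0.1):
        super().__init__()
        self.dt = dt
        # Assumption: one learnable scalar per hidden dimension.
        self.mu = nn.Parameter(torch.zeros(hidden_size))            # equilibrium point
        self.alpha = nn.Parameter(torch.full((hidden_size,), 0.9))  # momentum term
        self.beta = nn.Parameter(torch.full((hidden_size,), 0.1))   # correction gain
        self.gate = nn.Parameter(torch.ones(hidden_size))           # integration gate

    def forward(self, h: torch.Tensor, v: torch.Tensor | None = None):
        if v is None:
            v = torch.zeros_like(h)                      # start at rest
        error = h - self.mu                              # deviation from equilibrium
        v_next = self.alpha * v - self.beta * error      # velocity update
        h_next = h + self.dt * self.gate * v_next        # position update
        return h_next, v_next
```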
### 3. Modern Attention

- **GQA:** 4 KV heads (3x smaller KV cache than MHA with 12 heads)
- **QK Norm:** normalizes queries and keys for attention stability
- **SDPA:** Flash Attention via PyTorch 2.0+ `scaled_dot_product_attention`
- **RoPE:** rotary positional embeddings
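The sketch below illustrates the GQA + SDPA combination: 12 query heads share 4 KV heads, so only 4 heads' worth of keys and values need to be cached. The `repeat_interleave` expansion is one common way to implement GQA, not necessarily the exact released code:

```python
import torch
import torch.nn.functional as F

# GQA sketch: 12 query heads share 4 KV heads, so the KV cache stores
# 4 heads instead of 12 -- a 3x reduction versus MHA.
batch, seq, head_dim = 2, 16, 64
num_q_heads, num_kv_heads = 12, 4

q = torch.randn(batch, num_q_heads, seq, head_dim)
k = torch.randn(batch, num_kv_heads, seq, head_dim)  # cached: 4 heads, not 12
v = torch.randn(batch, num_kv_heads, seq, head_dim)

# Each KV head serves num_q_heads // num_kv_heads = 3 query heads.
k = k.repeat_interleave(num_q_heads // num_kv_heads, dim=1)
v = v.repeat_interleave(num_q_heads // num_kv_heads, dim=1)

# SDPA dispatches to Flash Attention when available (PyTorch 2.0+).
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 12, 16, 64])
```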
## Layer Architecture

```
Input
  |
  v
[RMSNorm] --> [GQA Attention + QK Norm] --> [INL Dynamics] --> [RMSNorm] --> [Token-Routed MLP]
  |                                                                               |
  +------------------------------------ Residual ---------------------------------+
  |
  v
Output
```
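Wiring the pieces together, one layer could look roughly like the following sketch (module names and the exact residual placement are assumptions based on the diagram; `nn.RMSNorm` requires PyTorch >= 2.4):

```python
import torch.nn as nn

class DeepBlock(nn.Module):
    """Sketch of one layer following the diagram above (illustrative wiring)."""

    def __init__(self, attn: nn.Module, inl: nn.Module, mlp: nn.Module, hidden_size: int):
        super().__init__()
        self.attn_norm = nn.RMSNorm(hidden_size)
        self.mlp_norm = nn.RMSNorm(hidden_size)
        self.attn, self.inl, self.mlp = attn, inl, mlp

    def forward(self, h, token_ids, v=None):
        residual = h
        h = self.attn(self.attn_norm(h))            # GQA attention + QK norm
        h, v = self.inl(h, v)                       # INL dynamics between attn and MLP
        h = self.mlp(self.mlp_norm(h), token_ids)   # token-routed MLP
        return residual + h, v                      # residual around the whole block
```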
## Training Status

- Current Step: 608,000 (early checkpoint)
- Target: 1,000,000 steps
- Dataset: FineWeb-Edu (French/English)
- Hardware: H100 80GB
- Loss: ~5.0
- Perplexity: ~150 (≈ e^5.0)

Note: this is an early checkpoint. The model generates text but is not yet coherent; full training is in progress.
## Installation

```bash
pip install complexity-deep
pip install pyllm-inference
```
## Usage

### Python API
```python
import torch
from complexity_deep import DeepForCausalLM, ComplexityConfig
from tokenizers import Tokenizer

# Load model and tokenizer
model = DeepForCausalLM.from_pretrained("Pacific-Prime/pacific-prime")
tokenizer = Tokenizer.from_file("tokenizer.json")

# Generate
input_ids = torch.tensor([tokenizer.encode("Hello").ids])
output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0].tolist()))
```
### PyLLM Server (OpenAI-compatible API)

```bash
pyllm serve --model Pacific-Prime/pacific-prime --port 8000
```
Then use with any OpenAI-compatible client:
```python
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="pacific-prime",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
## What's Original Here?
| Innovation | Status |
|---|---|
| Token-Routed MLP (deterministic) | Novel, to our knowledge |
| INL Dynamics in transformers | Novel: robotics control in LLMs |
| GQA at 150M scale | Rare |
| QK Norm at 150M scale | Rare |
## Files

- `model.safetensors` - Model weights (938MB, FP16)
- `config.json` - Architecture configuration
- `tokenizer.json` - BPE tokenizer (100K vocab)
## Citation

```bibtex
@misc{complexity-deep-2025,
  title={Complexity Deep: Deterministic Token-Routed MLP with INL Dynamics},
  author={Pacific Prime},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/pacific-prime}
}
```
## License

CC-BY-4.0 (Creative Commons Attribution 4.0)
