Complexity Deep 150M

A novel transformer architecture with deterministic Token-Routed MLP and INL Dynamics.

Model Details

Attribute        Value
Parameters       ~150M
Hidden Size      768
Layers           12
Attention Heads  12
KV Heads (GQA)   4
Experts          4
Context Length   2048
Precision        FP16
Training Steps   ~608k (early checkpoint)

Architecture Innovations

1. Token-Routed MLP (Deterministic MoE)

Unlike learned routing (Mixtral, DeepSeek), we route tokens to experts based on their token ID:

expert_id = token_id % num_experts

Token 0 -> Expert 0    Token 4 -> Expert 0
Token 1 -> Expert 1    Token 5 -> Expert 1
Token 2 -> Expert 2    Token 6 -> Expert 2
Token 3 -> Expert 3    Token 7 -> Expert 3
...

Why modulo routing?

  • Uniform vocabulary split: each expert is assigned exactly 25% of the token IDs, so traffic stays approximately balanced
  • No expert collapse: Frequent tokens (low IDs) are spread across all experts
  • Zero routing parameters: No router network to learn
  • Zero load balancing loss: Perfectly balanced by design
  • 100% deterministic and parallelizable
  • One line of code: token_id % num_experts (see the sketch after this list)
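
A minimal sketch of this routing in PyTorch (the expert layout, SiLU activation, and intermediate size are assumptions for illustration, not the released implementation):

import torch
import torch.nn as nn

class TokenRoutedMLP(nn.Module):
    def __init__(self, hidden_size=768, intermediate_size=2048, num_experts=4):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states, token_ids):
        # hidden_states: (batch, seq, hidden); token_ids: (batch, seq)
        expert_ids = token_ids % self.num_experts   # the entire "router"
        out = torch.zeros_like(hidden_states)
        for e, expert in enumerate(self.experts):
            mask = expert_ids == e                  # tokens owned by expert e
            if mask.any():
                out[mask] = expert(hidden_states[mask])
        return out

Because the assignment depends only on token IDs, the same input always takes the same path through the experts, which is what makes the forward pass fully deterministic.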

Scaling Comparison: Token-Routed vs Learned MoE (540B scale, mathematical analysis)

Aspect                Learned MoE (RT-2, Mixtral)   Token-Routed (Ours)
Total Parameters      540B                          540B
Active Parameters     ~37B (top-k)                  ~37B (1 expert/token)
Routing Latency       5-10ms                        <0.1ms
Total Forward Pass    ~50ms                         ~40ms
Deterministic         No                            Yes
Predictable Behavior  No                            Yes
Safety Certifiable    No                            Yes

Note: these figures are theoretical estimates from architecture analysis, not empirical benchmarks. The key insight is that for robotics and real-time applications, where determinism and low latency are critical, Token-Routed MLP achieves the same sparsity benefits as a learned MoE without the routing overhead.

2. INL Dynamics (Robotics-Grade Control)

A control system inspired by robotics, applied between attention and MLP:

error = h - mu                      # deviation from equilibrium
v_next = alpha * v - beta * error   # velocity update (momentum + correction)
h_next = h + dt * gate * v_next     # position update (integration)

Benefits:

  • Smooth trajectories (no jerky token generation)
  • Stable convergence (PID-like control)
  • Learnable dynamics per dimension
  • Real-time capable (see the sketch below)
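
A runnable PyTorch version of the update above (parameter shapes, initial values, and the fixed dt are assumptions; the released code may differ):

import torch
import torch.nn as nn

class INLDynamics(nn.Module):
    def __init__(self, hidden_size=768, dt=0.1):
        super().__init__()
        # Learnable per-dimension dynamics (initial values are illustrative)
        self.mu = nn.Parameter(torch.zeros(hidden_size))            # equilibrium point
        self.alpha = nn.Parameter(torch.full((hidden_size,), 0.9))  # momentum
        self.beta = nn.Parameter(torch.full((hidden_size,), 0.1))   # correction gain
        self.gate = nn.Parameter(torch.ones(hidden_size))           # integration gate
        self.dt = dt

    def forward(self, h, v):
        # h, v: (batch, seq, hidden) hidden state ("position") and velocity
        error = h - self.mu                          # deviation from equilibrium
        v_next = self.alpha * v - self.beta * error  # velocity update (momentum + correction)
        h_next = h + self.dt * self.gate * v_next    # position update (integration)
        return h_next, v_next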

3. Modern Attention

  • GQA: 4 KV heads (3x smaller KV cache than MHA; see the shape sketch below)
  • QK Norm: Attention stability
  • SDPA: Flash Attention via PyTorch 2.0+
  • RoPE: Rotary positional embeddings
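
A toy sketch of the GQA shapes for this configuration (QK norm and RoPE omitted; sizes follow the table above):

import torch
import torch.nn.functional as F

batch, seq, head_dim = 1, 16, 64            # 768 hidden / 12 heads = 64 per head
q = torch.randn(batch, 12, seq, head_dim)   # 12 query heads
k = torch.randn(batch, 4, seq, head_dim)    # 4 KV heads -> 3x smaller KV cache
v = torch.randn(batch, 4, seq, head_dim)

# Each KV head serves 12 / 4 = 3 query heads
k = k.repeat_interleave(3, dim=1)
v = v.repeat_interleave(3, dim=1)

# SDPA dispatches to Flash Attention kernels on supported GPUs (PyTorch 2.0+)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 12, 16, 64])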

Layer Architecture

Input
  |
  v
[RMSNorm] --> [GQA Attention + QK Norm] --> [INL Dynamics] --> [RMSNorm] --> [Token-Routed MLP]
  |                                                                              |
  +------------------------------- Residual -------------------------------------+
  |
  v
Output
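
Read as code, one plausible wiring of the diagram is the following (a sketch of our reading, not the released implementation; the actual residual placement may differ):

def layer_forward(x, v, norm1, norm2, attn, inl, mlp, token_ids):
    h = attn(norm1(x))            # RMSNorm -> GQA attention with QK norm
    h, v = inl(h, v)              # INL dynamics between attention and MLP
    h = mlp(norm2(h), token_ids)  # RMSNorm -> Token-Routed MLP
    return x + h, v               # residual from the block input, as drawn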

Training Status

Training Progress

  • Current Step: 608,000 (early checkpoint)
  • Target: 1,000,000 steps
  • Dataset: FineWeb-Edu (French/English)
  • Hardware: H100 80GB
  • Loss: ~5.0
  • Perplexity: ~150 (≈ exp(5.0))

Note: This is an early checkpoint. The model generates text but is not yet coherent. Full training in progress.

Installation

pip install complexity-deep
pip install pyllm-inference

Usage

Python API

import torch
from complexity_deep import DeepForCausalLM, ComplexityConfig
from tokenizers import Tokenizer

# Load model and tokenizer
model = DeepForCausalLM.from_pretrained("Pacific-Prime/pacific-prime")
tokenizer = Tokenizer.from_file("tokenizer.json")

# Generate
input_ids = torch.tensor([tokenizer.encode("Hello").ids])
output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0].tolist()))

PyLLM Server (OpenAI-compatible API)

pyllm serve --model Pacific-Prime/pacific-prime --port 8000

Then use with any OpenAI-compatible client:

import openai
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="pacific-prime",
    messages=[{"role": "user", "content": "Hello!"}]
)

What's Original Here?

Innovation                        Status
Token-Routed MLP (deterministic)  Novel - to our knowledge, not used elsewhere
INL Dynamics in transformers      Novel - robotics control in LLMs
GQA at 150M scale                 Rare
QK Norm at 150M scale             Rare

Files

  • model.safetensors - Model weights (938MB, FP16)
  • config.json - Architecture configuration
  • tokenizer.json - BPE tokenizer (100K vocab)
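
A quick way to inspect these files after a local download (a sketch; paths assume the files sit in the working directory):

import json
from safetensors.torch import load_file

weights = load_file("model.safetensors")    # FP16 tensors
with open("config.json") as f:
    config = json.load(f)                   # architecture configuration
print(len(weights), "tensors; hidden_size =", config.get("hidden_size"))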

Citation

@misc{complexity-deep-2025,
  title={Complexity Deep: Deterministic Token-Routed MLP with INL Dynamics},
  author={Pacific Prime},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/pacific-prime}
}

License

CC-BY-4.0 (Creative Commons Attribution 4.0)
