# ModernBERT-32K Hallucination Detector with Early Exit Adapters

*Fast and faithful long-context hallucination detection: a 32K-token encoder for RAG verification with configurable early exit for production deployment.*
## Overview
This repository contains early exit adapters for the llm-semantic-router/modernbert-base-32k-haldetect-combined model, enabling configurable accuracy-latency tradeoffs for production deployment.
| Component | Description |
|---|---|
| Base Model | llm-semantic-router/modernbert-base-32k-haldetect-combined |
| This Repo | Early exit adapters (1.5MB) at layers 6, 11, 16 |
| Architecture | ModernBERT (32K context, RoPE + Flash Attention 2) |
| Task | Token-level hallucination detection |
## Key Features

### 1. Long-Context Support (32K tokens)

- Process entire legal contracts, financial reports, and scientific papers
- No chunking required: single-pass inference (see the sketch below)
- 4× longer context than previous encoder-based detectors (8K)
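For illustration, a long retrieved document and its response fit in one forward pass (a minimal sketch reusing the `tokenizer` from the Usage section; `contract.txt` and `summary` are hypothetical placeholders):

```python
# Minimal sketch: a long document is verified in a single pass, no chunking.
# `contract.txt` and `summary` are hypothetical placeholders.
long_document = open("contract.txt").read()  # e.g. a ~20K-token contract
summary = "The agreement auto-renews annually unless cancelled in writing."

inputs = tokenizer(
    long_document,
    summary,
    return_tensors="pt",
    max_length=32768,  # the full 32K window; no sliding-window chunking
    truncation=True,
)
print(inputs["input_ids"].shape)  # e.g. torch.Size([1, 20487]) -- one pass
```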
### 2. Configurable Early Exit

Exit at different layers for accuracy-latency tradeoffs:

| Exit Layer | F1 Score | Relative Accuracy | Speedup |
|---|---|---|---|
| L6 | 48.2% | 48% | 3.9× |
| L11 | 81.2% | 81% | 2.3× |
| L16 | 95.5% | 97% | 1.4× |
| L22 (full) | 98.4% | 100% | 1.0× |

**Key insight:** Speedup increases with context length (3.4× at 512 tokens → 3.9× at 24K tokens).
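The table can be read as a simple layer-selection rule; a minimal sketch (the helper name and profile dict are illustrative, with relative-accuracy values taken from the table above):

```python
# Hypothetical helper: shallowest (fastest) exit layer meeting an accuracy target.
# Relative-accuracy values come from the table above.
EXIT_PROFILE = {6: 0.48, 11: 0.81, 16: 0.97, 22: 1.00}

def pick_exit_layer(min_relative_accuracy: float) -> int:
    for layer, rel_acc in sorted(EXIT_PROFILE.items()):
        if rel_acc >= min_relative_accuracy:
            return layer
    return 22  # fall back to the full model

print(pick_exit_layer(0.95))  # -> 16 (1.4x speedup per the table)
```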
### 3. Production Performance on RAGTruth

| Metric | Score |
|---|---|
| Example F1 | 77.0% |
| Token F1 | 53.4% |
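Example F1 scores each response as a single binary prediction (hallucinated or not), while Token F1 scores the individual token labels, which is why the two numbers differ. A minimal sketch of the distinction (hypothetical labels, scikit-learn for the metric):

```python
from sklearn.metrics import f1_score

# Per-token gold/predicted labels for three responses (1 = hallucinated token).
gold = [[0, 0, 1, 1], [0, 0, 0], [1, 0]]
pred = [[0, 1, 1, 0], [0, 0, 0], [1, 1]]

# Token F1: score every token individually.
token_f1 = f1_score(sum(gold, []), sum(pred, []))

# Example F1: a response counts as positive if any token is hallucinated.
example_f1 = f1_score([int(any(g)) for g in gold], [int(any(p)) for p in pred])

print(f"token F1 = {token_f1:.2f}, example F1 = {example_f1:.2f}")
```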
## Installation

```bash
pip install transformers torch
```
## Usage

### Basic Hallucination Detection (Full Model)
```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load base model
model_name = "llm-semantic-router/modernbert-base-32k-haldetect-combined"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Format: context + response as a sentence pair
context = "The Eiffel Tower was completed in 1889 and stands 330 meters tall."
response = "The Eiffel Tower was built in 1920 and is 500 meters tall."

inputs = tokenizer(
    context,
    response,
    return_tensors="pt",
    max_length=32768,
    truncation=True,
).to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)
    # 0 = faithful, 1 = hallucinated (per token)
```
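To turn the per-token predictions into human-readable evidence, the fast tokenizer's offset mapping can locate flagged spans in the response (a sketch building on the `tokenizer`, `model`, `context`, and `response` above):

```python
# Sketch: recover hallucinated character spans in the response text.
enc = tokenizer(
    context,
    response,
    return_tensors="pt",
    return_offsets_mapping=True,
    max_length=32768,
    truncation=True,
)
offsets = enc.pop("offset_mapping")[0]
seq_ids = enc.sequence_ids(0)  # 0 = context, 1 = response, None = special token

with torch.no_grad():
    preds = model(**enc.to(model.device)).logits.argmax(dim=-1)[0]

for i, label in enumerate(preds.tolist()):
    if label == 1 and seq_ids[i] == 1:  # hallucinated token inside the response
        start, end = offsets[i].tolist()
        print(f"flagged: {response[start:end]!r}")
```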
### Early Exit Inference (Faster)
```python
import torch
import torch.nn as nn
from transformers import AutoModelForTokenClassification, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load base model
model_name = "llm-semantic-router/modernbert-base-32k-haldetect-combined"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    output_hidden_states=True,
)
model = model.cuda().eval()

# Download and load early exit adapters
adapter_path = hf_hub_download(
    repo_id="HuaminChen/modernbert-32k-hallucination-early-exit",
    filename="early_exit_adapters.pt",
)
adapter_weights = torch.load(adapter_path, map_location="cpu")

# Adapter architecture: LayerNorm + two-layer 256-dim bottleneck MLP head
class EarlyExitAdapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck_size=256, num_classes=2):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, bottleneck_size),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(bottleneck_size, bottleneck_size),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(bottleneck_size, num_classes),
        )

    def forward(self, hidden_states):
        return self.adapter(hidden_states)

# Load adapters for each exit layer
adapters = {}
for layer in [6, 11, 16]:
    adapters[layer] = EarlyExitAdapter().to(torch.bfloat16).cuda()
    # Select this layer's weights and strip the "{layer}." key prefix
    prefix = f"{layer}."
    state_dict = {
        k[len(prefix):]: v
        for k, v in adapter_weights.items()
        if k.startswith(prefix)
    }
    adapters[layer].load_state_dict(state_dict)
    adapters[layer].eval()
```
```python
def early_exit_predict(text_context, text_response, exit_layer=16):
    """
    Predict at a fixed exit layer.

    Args:
        exit_layer: Which layer to exit at (6, 11, 16, or 22 = full model)
    """
    inputs = tokenizer(
        text_context,
        text_response,
        return_tensors="pt",
        max_length=32768,
        truncation=True,
    ).to("cuda")

    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

        if exit_layer == 22:
            # Use the full model's classifier head
            logits = outputs.logits
        else:
            # Use the early exit adapter on that layer's hidden states
            hidden = outputs.hidden_states[exit_layer]
            logits = adapters[exit_layer](hidden)

    predictions = torch.argmax(logits, dim=-1)
    probs = torch.softmax(logits, dim=-1)
    return predictions, probs

# Example usage
context = "The contract specifies a 30-day notice period for termination."
response = "According to the contract, termination requires 60 days notice."

# Fast inference at L16 (97% relative accuracy, 1.4x speedup)
preds, probs = early_exit_predict(context, response, exit_layer=16)
print(f"Predictions: {preds}")
print(f"Max hallucination probability: {probs[0, :, 1].max():.2%}")
```
### Dynamic Early Exit (Adaptive)
```python
def dynamic_early_exit(text_context, text_response, thresholds=None):
    """
    Dynamically choose the exit layer based on confidence:
    exit early when confident, otherwise continue to deeper layers.
    """
    if thresholds is None:
        thresholds = {6: 0.95, 11: 0.9, 16: 0.85}

    inputs = tokenizer(
        text_context,
        text_response,
        return_tensors="pt",
        max_length=32768,
        truncation=True,
    ).to("cuda")

    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

        for layer in [6, 11, 16]:
            hidden = outputs.hidden_states[layer]
            logits = adapters[layer](hidden)
            probs = torch.softmax(logits, dim=-1)
            confidence = probs.max(dim=-1).values.mean()
            if confidence >= thresholds[layer]:
                return torch.argmax(logits, dim=-1), layer, confidence.item()

    # Fall back to the full model
    return torch.argmax(outputs.logits, dim=-1), 22, 1.0

# Example
preds, exit_layer, conf = dynamic_early_exit(context, response)
print(f"Exited at layer {exit_layer} with confidence {conf:.2%}")
```

Note that these examples still run the full 22-layer forward pass and read `hidden_states` afterwards, so they demonstrate adapter accuracy rather than latency; realizing the speedups in the table above requires stopping the encoder at the exit layer (e.g. with forward hooks or a truncated layer stack).
## Model Architecture

```
┌──────────────────────────────────────────────────────────┐
│                       Input Tokens                       │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│               ModernBERT Encoder (Frozen)                │
│                                                          │
│  Layer 1-5:   [██████████████████████████████]           │
│                             │                            │
│  Layer 6:     [██████████████████████████████]──┬──► Adapter 6  ──► Exit (48% F1)
│                             │                   │                   3.9× speedup
│  Layer 7-10:  [██████████████████████████████]  │
│                             │                   │
│  Layer 11:    [██████████████████████████████]──┼──► Adapter 11 ──► Exit (81% F1)
│                             │                   │                   2.3× speedup
│  Layer 12-15: [██████████████████████████████]  │
│                             │                   │
│  Layer 16:    [██████████████████████████████]──┼──► Adapter 16 ──► Exit (96% F1)
│                             │                   │                   1.4× speedup
│  Layer 17-21: [██████████████████████████████]  │
│                             │                   │
│  Layer 22:    [██████████████████████████████]──┴──► Classifier ──► Exit (98% F1)
│                                                                     1.0× speedup
└──────────────────────────────────────────────────────────┘
```
## Training Details

### Base Model Training

- Extended from 8K to 32K tokens using YaRN RoPE scaling (see the simplified sketch below)
- Fine-tuned on the RAGTruth dataset for hallucination detection
- Achieves 77.0% Example F1
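Intuitively, the extension rescales RoPE so that 32K positions map into the frequency range seen during 8K pretraining. A simplified sketch (plain position interpolation only; YaRN additionally interpolates per-frequency and rescales attention temperature, which this omits):

```python
import torch

# Simplified sketch of RoPE context extension -- not the exact YaRN formula.
dim, base = 64, 10000.0
orig_len, new_len = 8192, 32768
factor = new_len / orig_len  # 4x extension

# Standard RoPE inverse frequencies for a head dimension of 64.
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
positions = torch.arange(new_len).float()

# Interpolation: positions beyond 8K are squeezed into the trained range.
angles = torch.outer(positions / factor, inv_freq)
print(angles.shape)  # torch.Size([32768, 32])
```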
### Early Exit Adapter Training

- Method: Self-distillation from Layer 22 to earlier layers (loss sketched below)
- Adapters: Lightweight bottleneck adapters (256-dim) at layers 6, 11, 16
- Loss: KL divergence + task loss
- Training data: RAGTruth + a long-context hallucination benchmark
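A minimal sketch of such a distillation objective (the temperature, mixing weight, and names are illustrative assumptions, not the exact training recipe):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """KL term against the frozen Layer-22 teacher plus the token task loss.
    Temperature T and mixing weight alpha are illustrative choices."""
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard task loss on gold token labels (0 = faithful, 1 = hallucinated).
    ce = F.cross_entropy(student_logits.view(-1, 2), labels.view(-1))
    return alpha * kl + (1 - alpha) * ce
```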
## Files in This Repository

| File | Description |
|---|---|
| `early_exit_adapters.pt` | PyTorch weights for early exit adapters (1.5MB) |
| `config.json` | Model configuration and performance metrics |
| `inference.py` | Example inference code |
## Limitations

- Language: Primarily trained on English data
- Domain: Best performance on factual/encyclopedic content
- Memory: Full 32K context requires ~8GB of GPU memory
- Calibration: Early exit thresholds may need task-specific tuning (see the sketch below)
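One simple calibration recipe is to sweep confidence thresholds on held-out data and keep the lowest threshold at which early-exit predictions still agree with the full model (a sketch; `val_pairs` and the agreement target are hypothetical, and `early_exit_predict` is defined in Usage above):

```python
import numpy as np

# Sketch: tune a per-layer confidence threshold against the full model.
# `val_pairs` is a hypothetical list of (context, response) examples.
def calibrate_threshold(layer, val_pairs, target_agreement=0.97):
    confidences, agreements = [], []
    for ctx, resp in val_pairs:
        early_preds, probs = early_exit_predict(ctx, resp, exit_layer=layer)
        full_preds, _ = early_exit_predict(ctx, resp, exit_layer=22)
        confidences.append(probs.max(dim=-1).values.mean().item())
        agreements.append((early_preds == full_preds).float().mean().item())

    # Lowest threshold where examples that would exit early match the full model.
    for threshold in np.arange(0.80, 1.00, 0.01):
        kept = [a for c, a in zip(confidences, agreements) if c >= threshold]
        if kept and sum(kept) / len(kept) >= target_agreement:
            return float(threshold)
    return 1.0  # never exit early at this layer
```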
## Citation

```bibtex
@article{modernbert-32k-hallucination,
  title={Fast and Faithful: Long-Context Hallucination Detection with Early Exit Adapters},
  author={Anonymous},
  year={2026},
  note={Under review}
}
```
## License
MIT License