
ModernBERT-32K Hallucination Detector with Early Exit Adapters

Fast and Faithful Long-Context Hallucination Detection - A 32K-token encoder for RAG verification with configurable early exit for production deployment.

Overview

This repository contains early exit adapters for the llm-semantic-router/modernbert-base-32k-haldetect-combined model, enabling configurable accuracy-latency tradeoffs for production deployment.

| Component | Description |
|---|---|
| Base Model | llm-semantic-router/modernbert-base-32k-haldetect-combined |
| This Repo | Early exit adapters (1.5MB) at layers 6, 11, 16 |
| Architecture | ModernBERT (32K context, RoPE + Flash Attention 2) |
| Task | Token-level hallucination detection |

Key Features

1. Long-Context Support (32K tokens)

  • Process entire legal contracts, financial reports, and scientific papers
  • No chunking required - single-pass inference
  • 4× longer context than previous encoder-based detectors (8K)

2. Configurable Early Exit

Exit at different layers for accuracy-latency tradeoffs:

| Exit Layer | F1 Score | Relative Accuracy | Speedup |
|---|---|---|---|
| L6 | 48.2% | 48% | 3.9× |
| L11 | 81.2% | 81% | 2.3× |
| L16 | 95.5% | 97% | 1.4× |
| L22 (full) | 98.4% | 100% | 1.0× |

Key insight: Speedup increases with context length (3.4× at 512 tokens → 3.9× at 24K tokens).
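
Given a latency budget, the cheapest exit that clears an accuracy floor can be picked mechanically. A minimal sketch using the table values above (the helper is ours, not part of this repo):

# Hypothetical helper: fastest exit layer that still meets an F1 floor
EXIT_PROFILES = {
    6:  {"f1": 0.482, "speedup": 3.9},
    11: {"f1": 0.812, "speedup": 2.3},
    16: {"f1": 0.955, "speedup": 1.4},
    22: {"f1": 0.984, "speedup": 1.0},
}

def pick_exit_layer(min_f1):
    candidates = [(p["speedup"], layer) for layer, p in EXIT_PROFILES.items() if p["f1"] >= min_f1]
    if not candidates:
        raise ValueError(f"no exit layer reaches F1 >= {min_f1}")
    return max(candidates)[1]  # highest speedup among qualifying layers

print(pick_exit_layer(0.90))  # -> 16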

3. Production Performance on RAGTruth

| Metric | Score |
|---|---|
| Example F1 | 77.0% |
| Token F1 | 53.4% |
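
Token F1 scores every token prediction independently, while Example F1 collapses each response to a single label. One common convention (an assumption here, not stated by the repo) flags a response as hallucinated if any of its tokens is flagged:

from sklearn.metrics import f1_score

def token_and_example_f1(gold, pred):
    # gold / pred: lists of per-response 0/1 token-label lists
    token_f1 = f1_score(
        [t for resp in gold for t in resp],
        [t for resp in pred for t in resp],
    )
    # Example level: a response counts as hallucinated if any token is flagged
    example_f1 = f1_score(
        [int(any(resp)) for resp in gold],
        [int(any(resp)) for resp in pred],
    )
    return token_f1, example_f1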

Installation

pip install transformers torch accelerate  # accelerate is required for device_map="auto"

Usage

Basic Hallucination Detection (Full Model)

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load base model
model_name = "llm-semantic-router/modernbert-base-32k-haldetect-combined"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Format: context + response
context = "The Eiffel Tower was completed in 1889 and stands 330 meters tall."
response = "The Eiffel Tower was built in 1920 and is 500 meters tall."

inputs = tokenizer(
    context,
    response,
    return_tensors="pt",
    max_length=32768,
    truncation=True
).to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)
    # 0 = faithful, 1 = hallucinated (per token)
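
To inspect which tokens the model flagged, the predictions can be mapped back through the tokenizer. A small follow-on sketch, continuing from the variables above:

# List the individual tokens labeled 1 (hallucinated)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
flagged = [tok for tok, label in zip(tokens, predictions[0]) if label.item() == 1]
print("Flagged tokens:", flagged)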

Early Exit Inference (Faster)

import torch
import torch.nn as nn
from transformers import AutoModelForTokenClassification, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load base model
model_name = "llm-semantic-router/modernbert-base-32k-haldetect-combined"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    output_hidden_states=True,
)
model = model.cuda().eval()

# Download and load early exit adapters
adapter_path = hf_hub_download(
    repo_id="HuaminChen/modernbert-32k-hallucination-early-exit",
    filename="early_exit_adapters.pt"
)
adapter_weights = torch.load(adapter_path, map_location="cpu", weights_only=True)

# Create adapter modules
class EarlyExitAdapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck_size=256, num_classes=2):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, bottleneck_size),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(bottleneck_size, bottleneck_size),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(bottleneck_size, num_classes),
        )
    
    def forward(self, hidden_states):
        return self.adapter(hidden_states)

# Load adapters for each exit layer
adapters = {}
for layer in [6, 11, 16]:
    adapters[layer] = EarlyExitAdapter().to(torch.bfloat16).cuda()
    # Select this layer's weights and strip the "{layer}." prefix
    prefix = f"{layer}."
    state_dict = {
        k[len(prefix):]: v
        for k, v in adapter_weights.items()
        if k.startswith(prefix)
    }
    adapters[layer].load_state_dict(state_dict)
    adapters[layer].eval()

def early_exit_predict(text_context, text_response, exit_layer=16):
    """
    Predict using a fixed exit layer.

    Args:
        exit_layer: Layer to exit at (6, 11, or 16 via adapters; 22 = full model)
    """
    inputs = tokenizer(
        text_context,
        text_response,
        return_tensors="pt",
        max_length=32768,
        truncation=True
    ).to("cuda")
    
    with torch.no_grad():
        # NOTE: this reference implementation runs the full forward pass and
        # reads an intermediate hidden state; realizing the quoted speedups
        # requires truncating the encoder forward at the exit layer.
        outputs = model(**inputs, output_hidden_states=True)
        
        if exit_layer == 22:
            # Use full model
            logits = outputs.logits
        else:
            # Use early exit adapter; hidden_states[0] is the embedding output,
            # so index `exit_layer` is the output of encoder layer `exit_layer`
            hidden = outputs.hidden_states[exit_layer]
            logits = adapters[exit_layer](hidden)
        
        predictions = torch.argmax(logits, dim=-1)
        probs = torch.softmax(logits, dim=-1)
        
    return predictions, probs

# Example usage
context = "The contract specifies a 30-day notice period for termination."
response = "According to the contract, termination requires 60 days notice."

# Fast inference with L16 (97% accuracy, 1.4x speedup)
preds, probs = early_exit_predict(context, response, exit_layer=16)
print(f"Predictions: {preds}")
print(f"Max hallucination probability: {probs[0, :, 1].max():.2%}")

Dynamic Early Exit (Adaptive)

def dynamic_early_exit(text_context, text_response, thresholds=None):
    """
    Dynamically choose the exit layer based on confidence.
    Exit early if confident; otherwise continue to deeper layers.
    """
    # Per-layer confidence thresholds (None default avoids a shared mutable argument)
    if thresholds is None:
        thresholds = {6: 0.95, 11: 0.9, 16: 0.85}

    inputs = tokenizer(
        text_context,
        text_response,
        return_tensors="pt",
        max_length=32768,
        truncation=True
    ).to("cuda")
    
    with torch.no_grad():
        # NOTE: this still computes all 22 layers and merely selects an
        # intermediate state, so it demonstrates the routing logic rather
        # than the speedups; a true early exit must stop the forward pass.
        outputs = model(**inputs, output_hidden_states=True)
        
        for layer in [6, 11, 16]:
            hidden = outputs.hidden_states[layer]
            logits = adapters[layer](hidden)
            probs = torch.softmax(logits, dim=-1)
            confidence = probs.max(dim=-1).values.mean()
            
            if confidence >= thresholds[layer]:
                return torch.argmax(logits, dim=-1), layer, confidence.item()
        
        # Fall back to full model
        return torch.argmax(outputs.logits, dim=-1), 22, 1.0

# Example
preds, exit_layer, conf = dynamic_early_exit(context, response)
print(f"Exited at layer {exit_layer} with confidence {conf:.2%}")

Model Architecture

┌──────────────────────────────────────────────────────────┐
│                       Input Tokens                       │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────┐
│               ModernBERT Encoder (Frozen)                │
│                                                          │
│  Layers 1-5:  [────────────────────────────────]         │
│                           │                              │
│  Layer 6:     [────────────────────────────────]──┬──► Adapter 6 ──► Exit (48% F1, 3.9× speedup)
│                           │                       │
│  Layers 7-10: [────────────────────────────────]  │
│                           │                       │
│  Layer 11:    [────────────────────────────────]──┼──► Adapter 11 ──► Exit (81% F1, 2.3× speedup)
│                           │                       │
│  Layers 12-15:[────────────────────────────────]  │
│                           │                       │
│  Layer 16:    [────────────────────────────────]──┼──► Adapter 16 ──► Exit (96% F1, 1.4× speedup)
│                           │                       │
│  Layers 17-21:[────────────────────────────────]  │
│                           │                       │
│  Layer 22:    [────────────────────────────────]──┴──► Classifier ──► Exit (98% F1, 1.0×)
│                                                          │
└──────────────────────────────────────────────────────────┘
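
The 1.5MB adapter size quoted above is consistent with this architecture, assuming the weights are stored in 16-bit precision; a quick back-of-the-envelope check:

# Parameters per adapter: LayerNorm(768) + Linear(768->256) + Linear(256->256) + Linear(256->2)
per_adapter = (768 * 2) + (768 * 256 + 256) + (256 * 256 + 256) + (256 * 2 + 2)
total = 3 * per_adapter          # adapters at layers 6, 11, 16
print(total)                     # 794,118 parameters
print(total * 2 / 1e6)           # ~1.59 MB at 2 bytes/param (bf16)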

Training Details

Base Model Training

  • Extended from 8K to 32K tokens using YaRN RoPE scaling
  • Fine-tuned on RAGTruth dataset for hallucination detection
  • Achieves 77.0% Example F1

Early Exit Adapter Training

  • Method: Self-distillation from Layer 22 to earlier layers
  • Adapters: Lightweight bottleneck adapters (256-dim) at layers 6, 11, 16
  • Loss: KL divergence + task loss (see the sketch after this list)
  • Training data: RAGTruth + long-context hallucination benchmark
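
A minimal sketch of such a combined objective, with temperature and mixing weight as illustrative assumptions (not the values used in training):

import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    # KL divergence between temperature-softened student and teacher distributions
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Token-level cross-entropy against gold labels (task loss)
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * kl + (1.0 - alpha) * ce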

Files in This Repository

| File | Description |
|---|---|
| early_exit_adapters.pt | PyTorch weights for early exit adapters (1.5MB) |
| config.json | Model configuration and performance metrics |
| inference.py | Example inference code |

Limitations

  • Language: Primarily trained on English data
  • Domain: Best performance on factual/encyclopedic content
  • Memory: Full 32K context requires ~8GB GPU memory
  • Calibration: Early exit thresholds may need task-specific tuning

Citation

@article{modernbert-32k-hallucination,
  title={Fast and Faithful: Long-Context Hallucination Detection with Early Exit Adapters},
  author={Anonymous},
  year={2026},
  note={Under review}
}

License

MIT License
