IntentGuard – Healthcare & Clinical


Production-ready vertical intent classifier for LLM chatbot guardrails. Classifies user messages as allow, deny, or abstain to keep healthcare chatbots on-topic and compliant.

Research Article | perfecXion.ai | Finance Model | Healthcare Model | Legal Model


IntentGuard Model Family

| Model | Vertical | Accuracy | Off-Topic Pass Rate | Link |
|---|---|---|---|---|
| intentguard-finance | Financial Services | 99.6% | 0.00% | perfecXion/intentguard-finance |
| intentguard-healthcare | Healthcare & Clinical | 98.9% | 0.98% | This model |
| intentguard-legal | Legal & Compliance | 97.9% | 0.50% | perfecXion/intentguard-legal |

Overview

The Problem

Healthcare chatbots face unique regulatory risks: HIPAA compliance, patient safety, and liability exposure demand that AI assistants stay strictly within clinical domains. Users asking about sports scores, celebrity news, or relationship advice should be blocked, not answered.

The Solution

IntentGuard uses a tiny DeBERTa-v3-xsmall model (22M parameters, 2.5MB quantized) to classify user intent in <30ms on CPU with three-way classification:

  • Allow – on-topic healthcare query; pass to the LLM
  • Deny – off-topic; block with a polite redirect
  • Abstain – ambiguous; escalate to a secondary classifier or human review
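The three outcomes map directly onto a guardrail routing step. A minimal sketch of that routing (the handler strings are illustrative placeholders, not part of the model):

```python
def route(label: str, message: str) -> str:
    """Route a user message based on IntentGuard's three-way decision."""
    if label == "allow":
        return f"FORWARD_TO_LLM: {message}"  # on-topic: pass through
    if label == "deny":
        return "REDIRECT: I can only help with healthcare questions."
    return "ESCALATE: send to secondary classifier"  # abstain

print(route("deny", "Who won the Super Bowl?"))
# Output: REDIRECT: I can only help with healthcare questions.
```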

Performance

| Metric | Value |
|---|---|
| Overall Accuracy | 98.9% |
| Legitimate Block Rate | 0.00% |
| Off-Topic Pass Rate | 0.98% |
| p99 Latency (CPU) | <30ms |
| Model Size (ONNX INT8) | 2.5MB |
| Base Parameters | 22M (DeBERTa-v3-xsmall) |
| Expected Calibration Error | <0.03 |
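Expected Calibration Error summarizes how far predicted confidence drifts from empirical accuracy, averaged over confidence bins. A minimal sketch of the standard binned-ECE computation (the toy data is illustrative, not this model's evaluation set):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# Well-calibrated toy data: 95% accuracy at 0.95 confidence
conf = [0.95] * 20
corr = [1] * 19 + [0]
print(round(expected_calibration_error(conf, corr), 3))
# Output: 0.0
```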

Model Details

| Property | Value |
|---|---|
| Architecture | DeBERTa-v3-xsmall (fine-tuned for 3-way classification) |
| Format | ONNX (INT8 quantized) |
| Version | 1.0 |
| Vertical | Healthcare (Clinical & Wellness) |
| GPU Required | No – runs on CPU |
| Publisher | perfecXion.ai |

Core Topics (Allow)

Symptoms, diagnosis, treatment, medications, preventive care, nutrition, mental health, fitness, chronic conditions, surgery, emergency care, health insurance, patient rights, telemedicine

Hard Exclusions (Deny)

Sports, entertainment, cooking, gaming, celebrity gossip, fashion, travel/leisure, fiction writing, relationship advice


Usage

Python (ONNX Runtime)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("perfecXion/intentguard-healthcare")
session = ort.InferenceSession("model.onnx")

text = "What are the symptoms of Type 2 diabetes?"
inputs = tokenizer(text, return_tensors="np", max_length=128, truncation=True, padding="max_length")

logits = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"]
})[0][0]  # shape (3,): one score per label

labels = ["allow", "deny", "abstain"]
probs = np.exp(logits - np.max(logits))  # numerically stable softmax
probs /= probs.sum()
prediction = labels[int(np.argmax(probs))]
confidence = float(np.max(probs))

print(f"Intent: {prediction} (confidence: {confidence:.3f})")
# Output: Intent: allow (confidence: 0.997)
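In production it is common to demote low-confidence predictions to abstain before routing. A sketch building on the snippet above (the 0.9 threshold is an illustrative choice, not a value from this model card):

```python
import numpy as np

def decide(logits, threshold=0.9):
    """Return (label, confidence), demoting low-confidence calls to 'abstain'."""
    labels = ["allow", "deny", "abstain"]
    probs = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs /= probs.sum()
    idx = int(np.argmax(probs))
    label = labels[idx] if probs[idx] >= threshold else "abstain"
    return label, float(probs[idx])

print(decide(np.array([5.0, 0.1, 0.1])))  # confident allow
print(decide(np.array([1.0, 0.9, 0.8])))  # near-uniform: demoted to abstain
```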

Docker

docker pull ghcr.io/perfecxion/intentguard:healthcare-1.0
docker run -p 8080:8080 ghcr.io/perfecxion/intentguard:healthcare-1.0

curl -X POST http://localhost:8080/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What medications interact with metformin?"}]}'
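The same endpoint can be called from Python. A standard-library sketch assuming the request shape implied by the curl example (the response schema is not documented here, so the send step is left as a comment):

```python
import json
from urllib import request

def build_classify_request(message, base_url="http://localhost:8080"):
    """Build a POST to the /v1/classify endpoint shown in the curl example."""
    payload = {"messages": [{"role": "user", "content": message}]}
    return request.Request(
        f"{base_url}/v1/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_classify_request("What medications interact with metformin?")
print(req.full_url)
# Output: http://localhost:8080/v1/classify
# With the container running: result = json.load(request.urlopen(req))
```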

pip

pip install intentguard

from intentguard import IntentGuard
guard = IntentGuard.load("healthcare")
result = guard.classify("What are the symptoms of Type 2 diabetes?")
print(result)  # Intent(label='allow', confidence=0.997)

Example Classifications

| User Message | Predicted | Confidence | Correct? |
|---|---|---|---|
| "What are the symptoms of Type 2 diabetes?" | allow | 0.997 | ✅ |
| "What medications interact with metformin?" | allow | 0.996 | ✅ |
| "Who won the Super Bowl?" | deny | 0.999 | ✅ |
| "Tell me a joke" | deny | 0.996 | ✅ |
| "Is telemedicine covered by Medicare?" | allow | 0.981 | ✅ |
| "What's the best recipe for pasta?" | deny | 0.998 | ✅ |

Citation

@misc{thornton2025intentguard,
  title={IntentGuard: A Production-Grade Vertical Intent Classifier for LLM Guardrails},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://perfecxion.ai/articles/intentguard-vertical-intent-classifier-llm-guardrails.html},
  note={Model: https://huggingface.co/perfecXion/intentguard-healthcare}
}

Quality Metrics

| Metric | Result |
|---|---|
| Accuracy (Healthcare vertical) | 98.9% |
| Legitimate Block Rate | 0.00% |
| Off-Topic Pass Rate | 0.98% |
| Expected Calibration Error | <0.03 |
| ONNX INT8 Quantization | Validated |
| CPU Inference (p99) | <30ms |

License

Apache 2.0

