CMMC Expert 32B v2.0

Notice: These models are provided for proof-of-concept and testing purposes only. Production-grade models are not publicly shared. For inquiries regarding production models or commercial licensing, please contact the maintainer: Nathan Maine.

A locally-hosted, fine-tuned language model specialized in CMMC 2.0, NIST 800-171, NIST 800-53, NIST CSF, HIPAA, DFARS, and cybersecurity compliance frameworks.

This is the 32B variant — the deep analysis option for gap assessments, SSP drafting, and detailed implementation guidance. It achieves the best eval loss (1.073) of the entire model suite. Part of a four-model suite (7B, 14B, 32B, 72B) sharing the same compliance knowledge base.

What's New in v2.0

  • More training data — 18,747 total examples (up from 16,906 in v1.0, a ~11% increase from 1,841 newly scraped examples)
  • 6 new authoritative sources — NIST SP 800-53 Rev. 5 full catalog, NIST CSF 2.0, NIST SP 800-171 Rev. 3, eCFR regulations (CMMC/DFARS/HIPAA), Federal Register documents, DoD PDFs
  • Expanded LoRA coverage — All 7 transformer modules targeted (v1.0 used only 4)
  • Best eval loss in suite — 1.073 (6% better than 7B and 14B variants)
  • Automated data pipeline — Reproducible scraping, filtering, and deduplication via cmmc-data-pipeline

Quick Start (Ollama)

```shell
# Download and run
ollama pull Nathan-Maine/cmmc-expert-32b-v2.0

# Ask a compliance question
ollama run cmmc-expert-32b-v2.0 "What access controls are required for CMMC Level 2?"

# Or query Ollama's REST API directly
curl http://localhost:11434/api/generate -d '{
  "model": "cmmc-expert-32b-v2.0",
  "prompt": "What are the key differences between CMMC Level 1 and Level 2?",
  "stream": false
}'
```
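The same endpoint can be called programmatically. Below is a minimal standard-library sketch assuming a local Ollama server on its default port; the `build_payload` and `ask` helpers are illustrative, not part of this repo:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default generate endpoint

def build_payload(prompt, model="cmmc-expert-32b-v2.0", stream=False):
    """Assemble the JSON body that Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": stream}

def ask(prompt, url=OLLAMA_URL):
    """POST a prompt to a locally running Ollama server and return the response text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("What access controls are required for CMMC Level 2?"))
```

With `"stream": false`, the full answer arrives in a single JSON object under the `response` key; set it to true to consume tokens incrementally.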

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen2.5-32B-Instruct |
| Parameters | 32.5 billion |
| Fine-Tuning Method | QLoRA (4-bit NF4 base, LoRA rank 32, alpha 64) |
| Quantization | q4_k_m (GGUF) |
| File Size | 18.9 GB |
| Context Length | 32,768 tokens |
| Training Hardware | NVIDIA A100-SXM4-80GB |
| Training Time | ~9.6 hours |
| Training Framework | HuggingFace TRL + PEFT + bitsandbytes |

Security Domain Coverage

Models are fine-tuned for complete security domain coverage, including vulnerability analysis, incident response scenarios, and access control failure modes required for professional SSP and POA&M generation. Behavioral guardrails and policy enforcement are handled at the governed-llm-gateway layer.

Base model migration to Meta Llama 3.1/3.3 (US-origin, open weights) is in progress.

Compliance Framework Coverage

Trained across eight overlapping frameworks to support cross-framework mapping:

| Framework | Coverage |
|---|---|
| CMMC 2.0 (32 CFR Part 170) | All three levels — 17 L1 practices, 110 L2, 134 L3, assessment methodology |
| NIST SP 800-171 Rev. 2 & 3 | 110 security requirements across 14 families |
| NIST SP 800-172 | Enhanced security requirements for critical CUI programs |
| NIST SP 800-53 Rev. 5 | Full catalog of 1,189 controls across 20 families |
| NIST SP 800-37 | Risk Management Framework (RMF) steps and authorization |
| NIST CSF 2.0 | Govern, Identify, Protect, Detect, Respond, Recover functions |
| HIPAA Security Rule | Administrative, physical, and technical safeguards |
| DFARS Clauses | 252.204-7008/7009/7012/7019/7020/7021/7024/7025, 252.239-7009/7010 |

Training Data

14,906 training + 3,841 validation examples (~4.5M tokens) assembled from 11 curated sources:

v1.0 Legacy Sources (13,434 examples)

| Source | Examples | Share |
|---|---|---|
| NIST Cybersecurity (filtered from 424K) | 6,372 | 33.9% |
| CMMC Full | 4,787 | 25.5% |
| CMMC Balanced | 994 | 5.3% |
| HIPAA Compliance | 961 | 5.1% |
| CMMC Core | 320 | 1.7% |

v2.0 New Sources (1,841 examples via automated pipeline)

| Source | Examples | Share |
|---|---|---|
| NIST CSRC (SP 800-53 Rev. 5 controls) | 773 | 4.1% |
| DoD Documents (PDFs) | 519 | 2.8% |
| Federal Register | 350 | 1.9% |
| eCFR Regulations (CMMC/DFARS/HIPAA) | 75 | 0.4% |
| NIST SP 800-171 Rev. 3 | 63 | 0.3% |
| NIST CSF 2.0 | 61 | 0.3% |

v2.0 Data Processing Pipeline:

  1. Automated scraping — 6 authoritative sources scraped via dedicated modules
  2. Relevance filtering — eCFR filtered to only CMMC-relevant DFARS clauses (252.204-70xx, 252.239-70xx), CMMC (32 CFR 170), and HIPAA (45 CFR 164)
  3. Format conversion — Raw records converted to chat-style instruction/response pairs
  4. Quality filtering — Removed entries <100 chars, entries >8,000 chars, OCR artifacts
  5. Deduplication — Exact dedup (xxhash) + near-dedup (MinHash LSH, 128 permutations, Jaccard 0.8 threshold, 5-gram shingles)
  6. Cross-version dedup — v2.0 records deduplicated against v1.0 corpus to prevent overlap
  7. Validation split — 80/20 stratified split maintaining source distribution

Pipeline source code: github.com/NathanMaine/cmmc-data-pipeline
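Steps 4 and 5 of the pipeline can be sketched in plain Python. This is a from-scratch illustration of the length filter plus MinHash near-deduplication (128 permutations, 5-gram word shingles, Jaccard 0.8 threshold), not the actual pipeline code, which lives in the linked repo:

```python
import hashlib

NUM_PERM = 128           # hash permutations, as in the pipeline
SHINGLE_N = 5            # 5-gram word shingles
JACCARD_THRESHOLD = 0.8  # near-duplicate cutoff

def length_ok(text, lo=100, hi=8000):
    """Quality filter: drop entries shorter than 100 or longer than 8,000 chars."""
    return lo <= len(text) <= hi

def shingles(text, n=SHINGLE_N):
    """Set of n-gram word shingles for a document."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash_signature(shingle_set, num_perm=NUM_PERM):
    """One minimum per seeded hash function; matching slots estimate Jaccard overlap."""
    return [
        min(int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingle_set)
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates true Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def is_near_duplicate(text_a, text_b, threshold=JACCARD_THRESHOLD):
    sig_a = minhash_signature(shingles(text_a))
    sig_b = minhash_signature(shingles(text_b))
    return estimated_jaccard(sig_a, sig_b) >= threshold
```

A production pipeline would index signatures with LSH banding (as the MinHash LSH step implies) rather than comparing all pairs; the cross-version dedup in step 6 is the same check run against the stored v1.0 signatures.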

Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 1e-4 (cosine decay) |
| Warmup | 5% of steps |
| Optimizer | 8-bit AdamW |
| Batch Size | 1 (effective 16 with gradient accumulation ×16) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Max Sequence Length | 2048 |
| Packing | Enabled |
| Base Quantization | 4-bit NF4 with double quantization |
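The configuration above translates roughly into the following HuggingFace setup. This is a sketch only: exact argument names vary across transformers/PEFT/TRL versions, and the actual training script is not published here.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 base with double quantization (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank-32 adapters on all seven attention and MLP projections
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    num_train_epochs=3,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,               # 5% of steps
    optim="adamw_bnb_8bit",          # 8-bit AdamW
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch size 16
    bf16=True,
)
```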

Evaluation Results

Training Metrics

| Metric | Value |
|---|---|
| Final Train Loss | 1.005 |
| Average Train Loss | 1.139 |
| Eval Loss at Epoch 1 | 1.128 |
| Final Eval Loss (Epoch 2) | 1.073 |
| Best Mean Token Accuracy | 77.9% |
| Final Mean Token Accuracy | 76.8% |
| Total Training Steps | 561 |
| Tokens Processed | ~18M |

Cross-Model Comparison (v2.0 Suite)

| Model | Eval Loss | Token Accuracy | GGUF Size | Training Time |
|---|---|---|---|---|
| 7B | 1.142 | 76.5% | 5.1 GB | 3.1 hrs |
| 14B | 1.144 | 75.9% | ~10 GB | 5.7 hrs |
| 32B | 1.073 | 77.9% | 18.9 GB | 9.6 hrs |
| 72B | 1.048 | — | 45 GB | 13.0 hrs |

The 32B model achieves the best eval loss in the suite — 6% better than both the 7B and 14B variants — making it the top choice for tasks requiring deep reasoning such as gap assessments, SSP control narratives, and cross-framework mapping.
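The "6% better" figure is relative eval loss; converting losses to perplexity (the exponential of the mean cross-entropy) makes the gap easier to interpret. A quick check using the numbers from the table above:

```python
import math

eval_loss = {"7b": 1.142, "14b": 1.144, "32b": 1.073, "72b": 1.048}

# Relative improvement of the 32B model over the 7B model
improvement = (eval_loss["7b"] - eval_loss["32b"]) / eval_loss["7b"]
print(f"{improvement:.1%}")  # ~6% lower eval loss

# Perplexity = exp(cross-entropy loss)
perplexity = {k: math.exp(v) for k, v in eval_loss.items()}
print(f"{perplexity['32b']:.2f}")  # ~2.92, vs ~3.13 for the 7B
```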

Intended Uses

  • SSP Drafting — Draft detailed System Security Plan control descriptions with NIST/CMMC citations. The 32B model excels at producing thorough, multi-paragraph narratives for each control family.
  • Gap Analysis — Identify controls required for specific CMMC levels and contract requirements with deep contextual reasoning across related controls.
  • Assessment Prep — Generate evidence checklists and assessment objective narratives with nuanced implementation guidance.
  • Cross-Framework Mapping — Map controls between CMMC, NIST 800-53, HIPAA, and DFARS with detailed justifications.
  • Detailed Implementation Guidance — Provide step-by-step implementation plans with consideration of dependencies, resource requirements, and common pitfalls.
  • Policy Drafting — Create policies aligned to specific CMMC practices with appropriate depth and specificity.
  • DFARS Clause Analysis — Identify requirements from contract language.
  • Regulatory Research — Understand eCFR regulations and Federal Register guidance.
  • Training & Education — Always-available compliance reference for teams.
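Cross-framework mapping, for example, is essentially a crosswalk lookup that the model performs with narrative justification. The toy table below shows the shape of the task; the sample entries follow the published NIST SP 800-171 Rev. 2 Appendix D mapping tables but are illustrative, not model output:

```python
# Illustrative subset of a CMMC -> 800-171 -> 800-53 crosswalk (not exhaustive).
CROSSWALK = {
    "AC.L2-3.1.1": {  # Limit system access to authorized users
        "nist_800_171": "3.1.1",
        "nist_800_53": ["AC-2", "AC-3", "AC-17"],
    },
    "AC.L2-3.1.2": {  # Limit access to authorized transactions and functions
        "nist_800_171": "3.1.2",
        "nist_800_53": ["AC-2", "AC-3", "AC-17"],
    },
}

def map_practice(practice_id):
    """Return the 800-171 requirement and 800-53 controls for a CMMC practice."""
    entry = CROSSWALK.get(practice_id)
    if entry is None:
        raise KeyError(f"No mapping recorded for {practice_id}")
    return entry

print(map_practice("AC.L2-3.1.1")["nist_800_53"])
```

The model's value over a static table is the justification text: it explains *why* a control satisfies (or only partially satisfies) its counterpart, which is what assessors ask for.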

Limitations

  • Not a substitute for qualified compliance professionals. This model is a tool to accelerate compliance work, not replace human judgment.
  • Knowledge cutoff. The model's knowledge is based on training data available at the time of fine-tuning (February 2026). Always verify against current published frameworks.
  • No retrieval augmentation. The model generates responses from trained knowledge only — it does not search or retrieve external documents at inference time.
  • Citation accuracy. While the model generally cites correct control numbers and framework sections, always verify specific citations against authoritative sources.

Out-of-Scope Uses

  • Legal advice. This model does not provide legal opinions on compliance status.
  • Automated compliance certification. CMMC certification requires human assessors (C3PAOs).
  • Processing actual CUI/ITAR data. The model itself does not process or store sensitive data, but users should follow their organization's data handling policies.

Hardware Requirements

| Mode | GPU (VRAM) | CPU-Only (RAM) | Storage |
|---|---|---|---|
| Inference | 24 GB | 32 GB | 20 GB |
| Training | 80 GB+ | N/A | 80 GB |

Supported OS: Linux, macOS, Windows (WSL2)
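The 24 GB inference figure follows from the quantized file size plus runtime overhead. A back-of-envelope check (the ~4.65 bits-per-weight figure for q4_k_m is an approximation, not an official constant):

```python
def gguf_size_gb(params, bits_per_weight=4.65):
    """Approximate GGUF file size in decimal GB for a given parameter count."""
    return params * bits_per_weight / 8 / 1e9

size = gguf_size_gb(32.5e9)
print(f"{size:.1f} GB")  # ~18.9 GB, matching the published file size
```

Loading the ~18.9 GB of weights plus KV cache and activations for long contexts is what pushes the practical requirement to a 24 GB card.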

The Model Suite

This is the 32B model — the deep analysis option for gap assessments, SSP drafting, and detailed implementation guidance. The full suite includes:

| Model | Parameters | GGUF Size | Eval Loss | Best For |
|---|---|---|---|---|
| cmmc-expert-7b-v2.0 | 7.6B | 5.1 GB | 1.142 | Quick lookups, day-to-day queries |
| cmmc-expert-14b-v2.0 | 14.7B | ~10 GB | 1.144 | Detailed analysis, multi-control reasoning |
| cmmc-expert-32b-v2.0 | 32.5B | 18.9 GB | 1.073 | Deep gap assessments, SSP drafting |
| cmmc-expert-72b-v2.0 | 72.7B | 45 GB | 1.048 | Complex multi-framework analysis |

Source Code

Known Issues

  • Repetition bug — The model may repeat content, lists, or entire sections multiple times within a single response. This is a known training artifact being addressed in future versions.
  • Verbose responses — Tends to over-explain in some contexts where a concise answer would be more appropriate.

Citation

```bibtex
@misc{maine2026cmmcexpert,
  title={CMMC Expert v2.0: Fine-Tuned Language Models for Cybersecurity Compliance},
  author={Nathan Maine},
  year={2026},
  url={https://github.com/NathanMaine/cmmc-compliance-ai-model}
}
```

Contact
