CMMC Expert 32B v2.0
Notice: These models are provided for proof-of-concept and testing purposes only. Production-grade models are not publicly shared. For inquiries regarding production models or commercial licensing, please contact the maintainer: Nathan Maine.
A locally-hosted, fine-tuned language model specialized in CMMC 2.0, NIST 800-171, NIST 800-53, NIST CSF, HIPAA, DFARS, and cybersecurity compliance frameworks.
This is the 32B variant — the deep analysis option for gap assessments, SSP drafting, and detailed implementation guidance. Its eval loss (1.073) is the best in the suite apart from the larger 72B. Part of a four-model suite (7B, 14B, 32B, 72B) sharing the same compliance knowledge base.
What's New in v2.0
- More training data — 18,747 total examples, up ~11% from 16,906 in v1.0
- 6 new authoritative sources — NIST SP 800-53 Rev. 5 full catalog, NIST SP 800-171 Rev. 3, NIST CSF 2.0, eCFR regulations (CMMC/DFARS/HIPAA), Federal Register documents, DoD PDFs
- Expanded LoRA coverage — All 7 transformer modules targeted (v1.0 used only 4)
- Near-best eval loss in suite — 1.073, 6% better than the 7B and 14B variants and second only to the 72B
- Automated data pipeline — Reproducible scraping, filtering, and deduplication via cmmc-data-pipeline
Quick Start (Ollama)
```bash
# Download and run
ollama pull Nathan-Maine/cmmc-expert-32b-v2.0

# Ask a compliance question
ollama run Nathan-Maine/cmmc-expert-32b-v2.0 "What access controls are required for CMMC Level 2?"

# Or call Ollama's REST API directly
curl http://localhost:11434/api/generate -d '{
  "model": "Nathan-Maine/cmmc-expert-32b-v2.0",
  "prompt": "What are the key differences between CMMC Level 1 and Level 2?",
  "stream": false
}'
```
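For programmatic use, the same endpoint works from Python. A minimal sketch, assuming the `requests` package and Ollama running on its default port:

```python
import requests

# Minimal client for Ollama's native /api/generate endpoint.
def ask_cmmc_expert(prompt: str, model: str = "Nathan-Maine/cmmc-expert-32b-v2.0") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # 32B inference can be slow on modest hardware
    )
    response.raise_for_status()
    return response.json()["response"]

print(ask_cmmc_expert("Which NIST 800-171 family covers multifactor authentication?"))
```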
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen2.5-32B-Instruct |
| Parameters | 32.5 billion |
| Fine-Tuning Method | QLoRA (4-bit NF4 base, LoRA rank 32, alpha 64) |
| Quantization | q4_k_m (GGUF) |
| File Size | 18.9 GB |
| Context Length | 32,768 tokens |
| Training Hardware | NVIDIA A100-SXM4-80GB |
| Training Time | ~9.6 hours |
| Training Framework | HuggingFace TRL + PEFT + bitsandbytes |
Security Domain Coverage
Models are fine-tuned for complete security domain coverage, including vulnerability analysis, incident response scenarios, and access control failure modes required for professional SSP and POA&M generation. Behavioral guardrails and policy enforcement are handled at the governed-llm-gateway layer.
Base model migration to Meta Llama 3.1/3.3 (US-origin, open weights) is in progress.
Compliance Framework Coverage
Trained across eight overlapping frameworks to support cross-framework mapping:
| Framework | Coverage |
|---|---|
| CMMC 2.0 (32 CFR Part 170) | All three levels — 17 L1 practices, 110 L2, 134 L3, assessment methodology |
| NIST SP 800-171 Rev. 2 & 3 | 110 security requirements across 14 families |
| NIST SP 800-172 | Enhanced security requirements for critical CUI programs |
| NIST SP 800-53 Rev. 5 | Full catalog of 1,189 controls across 20 families |
| NIST SP 800-37 | Risk Management Framework (RMF) steps and authorization |
| NIST CSF 2.0 | Govern, Identify, Protect, Detect, Respond, Recover functions |
| HIPAA Security Rule | Administrative, physical, and technical safeguards |
| DFARS Clauses | 252.204-7008/7009/7012/7019/7020/7021/7024/7025, 252.239-7009/7010 |
Training Data
14,906 training + 3,841 validation examples (~4.5M tokens) assembled from 11 curated sources:
v1.0 Legacy Sources (13,434 examples)
| Source | Examples | Share |
|---|---|---|
| NIST Cybersecurity (filtered from 424K) | 6,372 | 33.9% |
| CMMC Full | 4,787 | 25.5% |
| CMMC Balanced | 994 | 5.3% |
| HIPAA Compliance | 961 | 5.1% |
| CMMC Core | 320 | 1.7% |
v2.0 New Sources (1,841 examples via automated pipeline)
| Source | Examples | Share |
|---|---|---|
| NIST CSRC (SP 800-53 Rev. 5 controls) | 773 | 4.1% |
| DoD Documents (PDFs) | 519 | 2.8% |
| Federal Register | 350 | 1.9% |
| eCFR Regulations (CMMC/DFARS/HIPAA) | 75 | 0.4% |
| NIST SP 800-171 Rev. 3 | 63 | 0.3% |
| NIST CSF 2.0 | 61 | 0.3% |
v2.0 Data Processing Pipeline:
- Automated scraping — 6 authoritative sources scraped via dedicated modules
- Relevance filtering — eCFR filtered to only CMMC-relevant DFARS clauses (252.204-70xx, 252.239-70xx), CMMC (32 CFR 170), and HIPAA (45 CFR 164)
- Format conversion — Raw records converted to chat-style instruction/response pairs
- Quality filtering — Removed entries <100 chars, entries >8,000 chars, OCR artifacts
- Deduplication — Exact dedup (xxhash) + near-dedup (MinHash LSH, 128 permutations, Jaccard 0.8 threshold, 5-gram shingles); see the sketch after this list
- Cross-version dedup — v2.0 records deduplicated against v1.0 corpus to prevent overlap
- Validation split — 80/20 stratified split maintaining source distribution
Pipeline source code: github.com/NathanMaine/cmmc-data-pipeline
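For illustration, here is a condensed sketch of the quality-filter and dedup stages described above, using `xxhash` and `datasketch`. The helper names and structure are assumptions based on the bullet list, not the pipeline's actual code:

```python
import xxhash
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128           # MinHash permutations, per the pipeline description
JACCARD_THRESHOLD = 0.8  # near-duplicate cutoff
SHINGLE_SIZE = 5         # 5-gram shingles

def minhash_of(text: str) -> MinHash:
    """Build a MinHash signature over word 5-gram shingles."""
    words = text.split()
    m = MinHash(num_perm=NUM_PERM)
    for i in range(max(len(words) - SHINGLE_SIZE + 1, 1)):
        shingle = " ".join(words[i : i + SHINGLE_SIZE])
        m.update(shingle.encode("utf-8"))
    return m

def filter_and_dedup(records: list[str]) -> list[str]:
    lsh = MinHashLSH(threshold=JACCARD_THRESHOLD, num_perm=NUM_PERM)
    seen_hashes, kept = set(), []
    for idx, text in enumerate(records):
        if not (100 <= len(text) <= 8000):  # length-based quality filter
            continue
        exact = xxhash.xxh64(text.encode("utf-8")).hexdigest()  # exact dedup
        if exact in seen_hashes:
            continue
        sig = minhash_of(text)
        if lsh.query(sig):  # near-duplicate of an already-kept record
            continue
        seen_hashes.add(exact)
        lsh.insert(str(idx), sig)
        kept.append(text)
    return kept
```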
Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 1e-4 (cosine decay) |
| Warmup | 5% of steps |
| Optimizer | 8-bit AdamW |
| Batch Size | 1 (effective 16 with gradient accumulation x16) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Max Sequence Length | 2048 |
| Packing | Enabled |
| Base Quantization | 4-bit NF4 with double quantization |
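As a rough guide to how these values map onto the HuggingFace stack, here is an illustrative configuration sketch. It is not the project's actual training script, and `paged_adamw_8bit` is one plausible spelling of the 8-bit AdamW optimizer listed above:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# QLoRA base: 4-bit NF4 quantization with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter over all 7 transformer projection modules
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="outputs",
    num_train_epochs=3,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=1,      # effective batch 16 via accumulation
    gradient_accumulation_steps=16,
    max_seq_length=2048,                # field name varies across TRL versions
    packing=True,
)
```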
Evaluation Results
Training Metrics
| Metric | Value |
|---|---|
| Final Train Loss | 1.005 |
| Average Train Loss | 1.139 |
| Eval Loss at Epoch 1 | 1.128 |
| Final Eval Loss (Epoch 2) | 1.073 |
| Best Mean Token Accuracy | 77.9% |
| Final Mean Token Accuracy | 76.8% |
| Total Training Steps | 561 |
| Tokens Processed | ~18M |
Cross-Model Comparison (v2.0 Suite)
| Model | Eval Loss | Token Accuracy | GGUF Size | Training Time |
|---|---|---|---|---|
| 7B | 1.142 | 76.5% | 5.1 GB | 3.1 hrs |
| 14B | 1.144 | 75.9% | ~10 GB | 5.7 hrs |
| 32B | 1.073 | 77.9% | 18.9 GB | 9.6 hrs |
| 72B | 1.048 | — | 45 GB | 13.0 hrs |
Within the suite, the 32B model's eval loss of 1.073 beats the 7B and 14B variants by 6% and trails only the 72B (1.048) at well under half its size, making it a strong choice for tasks requiring deep reasoning such as gap assessments, SSP control narratives, and cross-framework mapping.
Intended Uses
- SSP Drafting — Draft detailed System Security Plan control descriptions with NIST/CMMC citations. The 32B model excels at producing thorough, multi-paragraph narratives for each control family.
- Gap Analysis — Identify controls required for specific CMMC levels and contract requirements with deep contextual reasoning across related controls.
- Assessment Prep — Generate evidence checklists and assessment objective narratives with nuanced implementation guidance.
- Cross-Framework Mapping — Map controls between CMMC, NIST 800-53, HIPAA, and DFARS with detailed justifications.
- Detailed Implementation Guidance — Provide step-by-step implementation plans with consideration of dependencies, resource requirements, and common pitfalls.
- Policy Drafting — Create policies aligned to specific CMMC practices with appropriate depth and specificity.
- DFARS Clause Analysis — Identify requirements from contract language.
- Regulatory Research — Understand eCFR regulations and Federal Register guidance.
- Training & Education — Always-available compliance reference for teams.
Limitations
- Not a substitute for qualified compliance professionals. This model is a tool to accelerate compliance work, not replace human judgment.
- Knowledge cutoff. The model's knowledge is based on training data available at the time of fine-tuning (February 2026). Always verify against current published frameworks.
- No retrieval augmentation. The model generates responses from trained knowledge only — it does not search or retrieve external documents at inference time.
- Citation accuracy. While the model generally cites correct control numbers and framework sections, always verify specific citations against authoritative sources.
Out-of-Scope Uses
- Legal advice. This model does not provide legal opinions on compliance status.
- Automated compliance certification. CMMC certification requires human assessors (C3PAOs).
- Processing actual CUI/ITAR data. The model itself does not process or store sensitive data, but users should follow their organization's data handling policies.
Hardware Requirements
| Mode | GPU (VRAM) | CPU-Only (RAM) | Storage |
|---|---|---|---|
| Inference | 24 GB | 32 GB | 20 GB |
| Training | 80 GB+ | N/A | 80 GB |
Supported OS: Linux, macOS, Windows (WSL2)
The Model Suite
This is the 32B model — the deep analysis option for gap assessments, SSP drafting, and detailed implementation guidance. The full suite includes:
| Model | Parameters | GGUF Size | Eval Loss | Best For |
|---|---|---|---|---|
| cmmc-expert-7b-v2.0 | 7.6B | 5.1 GB | 1.142 | Quick lookups, day-to-day queries |
| cmmc-expert-14b-v2.0 | 14.7B | ~10 GB | 1.144 | Detailed analysis, multi-control reasoning |
| cmmc-expert-32b-v2.0 | 32.5B | 18.9 GB | 1.073 | Deep gap assessments, SSP drafting |
| cmmc-expert-72b-v2.0 | 72.7B | 45 GB | 1.048 | Complex multi-framework analysis |
Source Code
- Model training & evaluation: github.com/NathanMaine/cmmc-compliance-ai-model
- Data pipeline: github.com/NathanMaine/cmmc-data-pipeline
Known Issues
- Repetition bug — The model may repeat content, lists, or entire sections multiple times within a single response. This is a known training artifact being addressed in future versions; see the inference-time workaround below.
- Verbose responses — Tends to over-explain in some contexts where a concise answer would be more appropriate.
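Until the repetition artifact is resolved, raising Ollama's `repeat_penalty` option at inference time can reduce it. A suggested workaround sketch; the values shown are untested starting points, not settings from the model card:

```python
import requests

# Workaround for the known repetition artifact: raise repeat_penalty
# and cap output length via Ollama's generation options.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "Nathan-Maine/cmmc-expert-32b-v2.0",
        "prompt": "Summarize the CMMC Level 2 access control requirements.",
        "stream": False,
        "options": {"repeat_penalty": 1.2, "num_predict": 1024},
    },
    timeout=300,
)
print(response.json()["response"])
```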
Citation
```bibtex
@misc{maine2026cmmcexpert,
  title={CMMC Expert v2.0: Fine-Tuned Language Models for Cybersecurity Compliance},
  author={Nathan Maine},
  year={2026},
  url={https://github.com/NathanMaine/cmmc-compliance-ai-model}
}
```
Contact
- Author: Nathan Maine
- Website: nathanmaine.com
- LinkedIn: linkedin.com/in/nathanmaine
- Email: nmaine@gmail.com