natotan

A fine-tuned vision-language embedding model for multimodal military & defense document retrieval, by Racine AI.


What is natotan?

natotan is a domain-adapted vision-language embedding model built for multimodal military and defense document retrieval in English and French. It was created by applying LoRA (Low-Rank Adaptation) fine-tuning to Qwen/Qwen3-VL-Embedding-2B and merging the adapter weights into the base model for seamless deployment. On a custom retrieval benchmark of 5,428 query-document pairs spanning NATO and French defense publications, natotan achieves a 9.0% improvement in NDCG@1 and a 6.8% improvement in MRR over the unmodified base model, while outperforming Google's Gemini multimodalembedding@001 by over 230% in NDCG@10.


Key Findings

natotan demonstrates consistent retrieval improvements across both languages and most document categories evaluated. The largest gains occur at the top of the ranking, where NDCG@1 improves by 9.0%, directly impacting the user experience in search applications where the first result matters most. Recall@5 improves from 0.843 to 0.893, meaning the correct document appears in the top 5 results for 89.3% of queries compared to 84.3% with the base model.

French-language retrieval benefits even more than English, with NDCG@1 improving by 12.3% (from 0.344 to 0.387) versus 5.8% for English (from 0.361 to 0.382). This is notable because many multimodal embedding models underperform on non-English content.

In comparison with Google's proprietary Gemini multimodalembedding@001 (1408-dimensional embeddings), natotan achieves an NDCG@10 of 0.699 versus 0.212, a difference of over 3.3x. Gemini's French performance is particularly weak at 0.132 NDCG@10, compared to 0.697 for natotan — a 5.3x gap. These results suggest that domain-adapted open-source models can substantially outperform general-purpose proprietary embeddings on specialized retrieval tasks.

Among the 16 document categories tested, natotan improves NDCG@1 in 11 categories, with the largest relative gains in medot (+200%), ajmedp (+46.4%), ft (+43.4%), and un_manuals (+39.7%). Five categories (dia, irsem, lexicons, modern, cahiers_pensee) show minor regressions, primarily those with very small sample sizes (modern: n=14) or specialized academic content (cahiers_pensee, irsem). Overall ranking quality (NDCG@10) improves in 13 out of 16 categories.


Model Overview

Architecture: Qwen3-VL (Vision-Language Transformer)
Fine-tuning: LoRA (Low-Rank Adaptation), merged into base weights
Task: Multimodal embedding / document retrieval
Languages: English (2,714 samples), French (2,714 samples)
Domain: NATO & French defense publications
Format: safetensors — ready for direct inference, no adapter loading needed

Evaluation Dataset

The benchmark uses 5,428 query-document pairs from held-out documents not seen during training, split evenly across English and French. The corpus covers NATO and French military sources across 16 document categories, ranging from tactical field manuals (1,016 samples) and allied medical publications (1,138 samples) to strategic doctrine (48 samples) and UN training manuals (200 samples).

Category Samples Category Samples
amedp 1,138 tta 1,100
tactical 1,016 ajp 916
ajmedp 224 un_manuals 200
ft 154 pia 136
irsem 132 cahiers_pensee 124
dia 92 lexicons 82
strategic 48 other 46
modern 14 medot 6

Source themes: French military (3,104 samples), NATO (2,324 samples).


Models Compared

Three models were evaluated on the same benchmark to provide context on both open-source and proprietary baselines.

Model Type Embedding Dim
Gemini multimodalembedding@001 Google proprietary, multimodal 1408
Base Qwen/Qwen3-VL-Embedding-2B Open-source, vision-language 2048
natotan (this model) Base + LoRA merge, domain-adapted 2048

Overall Results — 3-Way Comparison

NDCG (Normalized Discounted Cumulative Gain)

Cutoff Gemini Base natotan
@1 0.0925 0.3524 0.3841
@3 0.1662 0.6020 0.6456
@5 0.1880 0.6362 0.6802
@10 0.2118 0.6575 0.6990
@20 0.2328 0.6677 0.7064
@50 0.2549 0.6734 0.7097
@5428 0.3108 0.6769 0.7104

Recall

Cutoff Gemini Base natotan
@1 0.0925 0.3524 0.3841
@3 0.2159 0.7612 0.8106
@5 0.2690 0.8430 0.8930
@10 0.3427 0.9079 0.9501
@20 0.4259 0.9479 0.9790
@50 0.5368 0.9764 0.9954
@5428 1.0000 1.0000 1.0000

MRR & MAP

Metric Gemini Base natotan
MRR 0.1823 0.5785 0.6179
MAP 0.1823 0.5785 0.6179
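
MRR and MAP coincide across all three models, which is consistent with each query having exactly one relevant document: with a single relevant item, average precision reduces to the reciprocal rank. Under that assumption (implied by the pair-based benchmark, though not stated explicitly above), every reported metric is a simple function of the rank at which the correct document is retrieved. An illustrative Python sketch:

import math

def metrics_for_query(rank, k):
    # `rank` is the 1-based position of the query's single relevant document.
    hit = rank <= k
    return {
        f"Recall@{k}": 1.0 if hit else 0.0,
        # With one relevant document the ideal DCG is 1, so NDCG@k = 1/log2(rank+1) on a hit.
        f"NDCG@{k}": 1.0 / math.log2(rank + 1) if hit else 0.0,
        "MRR": 1.0 / rank,  # equals MAP when each query has exactly one relevant document
    }

# The tables above average these per-query values over all 5,428 queries.
print(metrics_for_query(rank=2, k=5))  # Recall@5 = 1.0, NDCG@5 ≈ 0.631, MRR = 0.5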

Results by Language

NDCG@10

Language Gemini Base natotan
English 0.2917 0.6623 0.7013
French 0.1318 0.6527 0.6966

Recall@10

Language Gemini Base natotan
English 0.4591 0.9094 0.9562
French 0.2262 0.9064 0.9440

Gemini's French performance is notably poor (NDCG@10 of 0.13 vs 0.29 in English), while both Base and natotan maintain near-parity across languages.

Full Language Breakdown — natotan vs Base

French (2,714 samples) — all metrics
Metric Cutoff Base natotan Delta
NDCG @1 0.3441 0.3865 +0.0424
NDCG @3 0.5948 0.6442 +0.0494
NDCG @5 0.6319 0.6779 +0.0460
NDCG @10 0.6527 0.6966 +0.0439
NDCG @20 0.6630 0.7050 +0.0420
NDCG @50 0.6690 0.7085 +0.0395
Recall @1 0.3441 0.3865 +0.0424
Recall @3 0.7542 0.8069 +0.0527
Recall @5 0.8427 0.8869 +0.0442
Recall @10 0.9064 0.9440 +0.0376
Recall @20 0.9469 0.9768 +0.0298
Recall @50 0.9768 0.9941 +0.0173
MRR — 0.5727 0.6171 +0.0445
MAP — 0.5727 0.6171 +0.0445

English (2,714 samples) — all metrics
Metric Cutoff Base natotan Delta
NDCG @1 0.3607 0.3817 +0.0210
NDCG @3 0.6092 0.6470 +0.0377
NDCG @5 0.6406 0.6826 +0.0420
NDCG @10 0.6623 0.7013 +0.0390
NDCG @20 0.6724 0.7077 +0.0354
NDCG @50 0.6778 0.7109 +0.0331
Recall @1 0.3607 0.3817 +0.0210
Recall @3 0.7682 0.8143 +0.0461
Recall @5 0.8434 0.8990 +0.0556
Recall @10 0.9094 0.9562 +0.0468
Recall @20 0.9488 0.9812 +0.0324
Recall @50 0.9761 0.9967 +0.0206
MRR — 0.5843 0.6187 +0.0344
MAP — 0.5843 0.6187 +0.0344

Results by Document Category

Performance varies by document type. natotan achieves the strongest gains on categories where the base model was weakest, such as tactical documents and UN manuals, while a small number of low-sample categories show minor regressions.

NDCG@10 — 3-Way Comparison (sorted by natotan score)

Category n Gemini Base natotan
medot 6 0.2103 0.427 0.815
un_manuals 200 0.1356 0.667 0.764
ajmedp 224 0.3231 0.653 0.750
modern 14 0.5694 0.791 0.757
other 46 0.5336 0.723 0.737
lexicons 82 0.1972 0.712 0.727
strategic 48 0.2222 0.633 0.726
ft 154 0.0360 0.655 0.720
ajp 916 0.2817 0.698 0.714
tta 1,100 0.1875 0.647 0.706
amedp 1,138 0.2589 0.685 0.694
cahiers_pensee 124 0.2505 0.682 0.678
pia 136 0.1965 0.656 0.674
tactical 1,016 0.1274 0.597 0.669
irsem 132 0.2426 0.654 0.644
dia 92 0.0610 0.612 0.627

Recall@10 — 3-Way Comparison (sorted by natotan score)

Category n Gemini Base natotan
medot 6 0.3333 0.667 1.000
strategic 48 0.4167 0.896 1.000
cahiers_pensee 124 0.4194 0.960 1.000
lexicons 82 0.3659 0.963 1.000
modern 14 0.8571 1.000 1.000
ajmedp 224 0.4866 0.929 0.978
un_manuals 200 0.2500 0.920 0.975
pia 136 0.3088 0.956 0.963
other 46 0.7826 0.957 0.957
tta 1,100 0.3064 0.884 0.956
ft 154 0.0844 0.948 0.955
ajp 916 0.4421 0.931 0.952
amedp 1,138 0.4156 0.944 0.936
tactical 1,016 0.2156 0.842 0.935
irsem 132 0.3939 0.924 0.932
dia 92 0.0870 0.880 0.924

MRR by Category (sorted by natotan score)

Category n Base natotan Delta
medot 6 0.354 0.750 +0.396
un_manuals 200 0.587 0.694 +0.107
modern 14 0.721 0.676 -0.046
ajmedp 224 0.565 0.675 +0.110
other 46 0.647 0.661 +0.015
ft 154 0.561 0.644 +0.083
ajp 916 0.624 0.637 +0.013
strategic 48 0.551 0.637 +0.086
lexicons 82 0.631 0.636 +0.005
tta 1,100 0.572 0.625 +0.053
amedp 1,138 0.601 0.616 +0.014
tactical 1,016 0.523 0.585 +0.062
pia 136 0.560 0.581 +0.020
cahiers_pensee 124 0.595 0.572 -0.023
irsem 132 0.570 0.555 -0.014
dia 92 0.528 0.533 +0.005

NDCG@1 by Category — natotan vs Base (sorted by improvement)

Category n Base natotan Delta Relative
medot 6 0.167 0.500 +0.333 +200.0%
un_manuals 200 0.365 0.510 +0.145 +39.7%
ajmedp 224 0.308 0.451 +0.143 +46.4%
ft 154 0.299 0.429 +0.130 +43.4%
strategic 48 0.313 0.417 +0.104 +33.3%
tta 1,100 0.350 0.388 +0.038 +10.9%
tactical 1,016 0.324 0.356 +0.032 +10.0%
other 46 0.391 0.413 +0.022 +5.6%
amedp 1,138 0.358 0.373 +0.016 +4.4%
pia 136 0.331 0.338 +0.007 +2.2%
ajp 916 0.394 0.401 +0.007 +1.7%
dia 92 0.315 0.304 -0.011 -3.4%
irsem 132 0.341 0.326 -0.015 -4.4%
lexicons 82 0.427 0.390 -0.037 -8.6%
modern 14 0.500 0.429 -0.071 -14.3%
cahiers_pensee 124 0.387 0.306 -0.081 -20.9%

Qualitative Examples

Example 1

Retrieval comparison for a French tactical query about section leader responsibilities during reconnaissance and delaying missions. The base model fails to rank the ground truth in the top 5. natotan retrieves it at rank 2.

Example 2

Retrieval comparison for a French administrative query about career orientation for volunteer soldiers. The base model ranks the ground truth at position 3. natotan promotes it to position 1.


Quick Start

natotan is a fully merged model that requires no adapter loading. It can be used as a drop-in replacement for the base Qwen3-VL-Embedding-2B model with the same API.

from transformers import AutoModel, AutoTokenizer

# The LoRA weights are already merged, so this loads exactly like the
# base Qwen3-VL-Embedding-2B model (no PEFT dependency required).
model = AutoModel.from_pretrained(
    "YOUR_USERNAME/natotan",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "YOUR_USERNAME/natotan",
    trust_remote_code=True,
)
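
The exact prompt format and pooling recipe are defined by the base model; the sketch below assumes last-token pooling of the final hidden state with L2 normalization (a common choice for Qwen embedding models) and ranks candidate documents by cosine similarity. Image or document-page inputs would additionally go through the model's processor, which is not shown here.

import torch
import torch.nn.functional as F

def embed(texts):
    # Assumption: right-padded batches and last-token pooling; check the
    # base model card for the pooling the embeddings were trained with.
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    last = inputs["attention_mask"].sum(dim=1) - 1  # index of last real token
    pooled = hidden[torch.arange(hidden.size(0)), last]
    return F.normalize(pooled, dim=-1)  # 2048-dim unit vectors

# Hypothetical query and candidate passages, for illustration only.
query = embed(["procedure for casualty evacuation under fire"])
docs = embed(["Doctrine text on casualty evacuation ...",
              "Field manual section on fortifications ..."])
scores = (query @ docs.T).squeeze(0)  # cosine similarities
print(scores.argsort(descending=True))  # candidate indices, best match first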

Reproducing the Merge

The LoRA adapter can be merged into the base model using the provided script. The resulting model is self-contained and does not require the adapter at inference time.

python3 merge_lora.py \
  --base_model Qwen/Qwen3-VL-Embedding-2B \
  --adapter ./lora_adapters \
  --output_dir ./merged \
  --trust_remote_code

Output contents:

  • config.json
  • model.safetensors (or sharded weights + index)
  • Tokenizer files (tokenizer.json, tokenizer_config.json, etc.)
  • MERGED_FROM_LORA.txt (provenance marker)
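
For reference, the core of such a merge is a standard PEFT operation. A minimal sketch, assuming the peft library (the actual merge_lora.py may handle dtype, sharding, and tokenizer files differently):

from transformers import AutoModel
from peft import PeftModel

# Load the base model, attach the LoRA adapter, then fold the low-rank
# updates into the base weights so no adapter is needed at inference.
base = AutoModel.from_pretrained("Qwen/Qwen3-VL-Embedding-2B", trust_remote_code=True)
merged = PeftModel.from_pretrained(base, "./lora_adapters").merge_and_unload()
merged.save_pretrained("./merged")  # writes config.json plus safetensors weights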

Frequently Asked Questions

What is natotan? natotan is a LoRA-fine-tuned and merged version of Qwen3-VL-Embedding-2B, optimized for multimodal military and defense document retrieval in English and French. It produces 2048-dimensional embeddings and is evaluated on a benchmark of 5,428 query-document pairs from NATO and French military publications.

How much does natotan improve over the base model? On the custom retrieval benchmark, natotan improves NDCG@1 by 9.0% (from 0.352 to 0.384), Recall@5 by 5.9% (from 0.843 to 0.893), and MRR by 6.8% (from 0.578 to 0.618) compared to the unmodified Qwen3-VL-Embedding-2B.

How does natotan compare to Gemini multimodal embeddings? natotan outperforms Google's Gemini multimodalembedding@001 by over 230% in NDCG@10 (0.699 vs 0.212) on the same benchmark. The gap is especially large for French-language queries, where natotan scores 0.697 NDCG@10 versus 0.132 for Gemini.

Does natotan work for both English and French? Yes. The evaluation dataset is split evenly between 2,714 English and 2,714 French query-document pairs. natotan improves retrieval in both languages, with larger gains in French (NDCG@1 +12.3%) than in English (+5.8%).

Do I need to load a LoRA adapter separately? No. The adapter weights have been merged into the base model. natotan can be loaded directly with AutoModel.from_pretrained() exactly like any standard Hugging Face model, with no additional dependencies.

What types of documents does natotan work best on? The model was evaluated on 16 categories of military documents. The largest improvements appear on tactical field manuals (+12.1% NDCG@10), UN training manuals (+14.6%), and allied joint medical publications (+14.9%). A small number of categories with very few samples (modern: n=14) show minor regressions.

Can natotan be used for non-military retrieval tasks? natotan inherits the general-purpose capabilities of Qwen3-VL-Embedding-2B. While it was fine-tuned specifically on defense documents, the LoRA adaptation is lightweight and the base model's broad capabilities are preserved. Performance on out-of-domain tasks has not been formally evaluated.


Citation

@misc{natotan2025,
  author={{Racine AI}},
  title={natotan: LoRA-tuned Qwen3-VL-Embedding-2B for multimodal defense document retrieval},
  year={2025},
  url={https://huggingface.co/racineai/natotan}
}