Extended Turkish NER (Hybrid CRF)
A Named Entity Recognition (NER) model for Turkish that combines Conditional Random Fields (CRF), deep morphological analysis (Nuve/Zemberek), and contextual embeddings (BERTurk). The best variant reaches an F1 score of 86.66%.
Features
- 6 Extended Categories: PER, LOC, ORG, COMPANY, GROUP, MOVIE.
- Hybrid Features: Combines linguistic morphology with semantic BERT vectors.
- Gazetteer Support: Uses 160K+ entity entries for high precision.
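To make the hybrid idea concrete, here is a minimal sketch of how morphology, gazetteer membership, and embedding dimensions could be merged into the per-token feature dicts that sklearn-crfsuite accepts. The function name, feature keys, and morphological tag strings are all hypothetical; the actual feature set lives in the project's source code and may differ.

```python
from typing import Dict, Optional, Sequence, Set


def token_features(
    tokens: Sequence[str],
    i: int,
    morph_tags: Sequence[str],
    gazetteer: Set[str],
    embeddings: Optional[Sequence[Sequence[float]]] = None,
) -> Dict[str, object]:
    """Build one feature dict for tokens[i] (hypothetical feature names)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        # Assumed: one tag string per token from a morphological analyzer.
        "morph": morph_tags[i],
        # Assumed: gazetteer stored as a set of lowercased surface forms.
        "in_gazetteer": word.lower() in gazetteer,
    }
    # Each embedding dimension becomes its own real-valued feature.
    if embeddings is not None:
        for k, value in enumerate(embeddings[i]):
            feats[f"emb_{k}"] = float(value)
    # Mark sentence boundaries so the CRF can learn position effects.
    if i == 0:
        feats["BOS"] = True
    if i == len(tokens) - 1:
        feats["EOS"] = True
    return feats


tokens = ["Ahmet", "Ankara", "yolunda"]
morph = ["Noun+Prop", "Noun+Prop", "Noun"]  # illustrative tags only
vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy 2-d "embeddings"
sentence = [
    token_features(tokens, i, morph, {"ankara"}, vectors)
    for i in range(len(tokens))
]
```

Passing `embeddings=None` yields the kind of feature set an embedding-free variant would use, which is one plausible way to produce the "without embeddings" configurations.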
Performance
| Metric | Value |
|---|---|
| Best F1-Score | 86.66% |
| Precision | 87.42% |
| Recall | 85.91% |
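As a quick consistency check, the reported F1 can be recomputed as the harmonic mean of the reported precision and recall. This assumes all three figures come from the same evaluation run with the same averaging (for example micro-averaged over entities), which is not stated here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(87.42, 85.91)
print(f"{f1:.2f}")  # prints 86.66
```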
Available Models (6 Variants)
| Model File | Description | F1 Score |
|---|---|---|
| ner_crf_model.pkl | Best Hybrid (Nuve + BERT) - Main SOTA model | 0.8666 |
| final_proper_model.pkl | Full features without embeddings | 0.8557 |
| crf_gold_best.pkl | Best Gold-only trained model | 0.8514 |
| crf_gold_no_emb.pkl | Gold without BERT embeddings | 0.8496 |
| crf_gold_gaz_only.pkl | Gazetteer-only features (baseline) | 0.8463 |
| final_model.pkl | Alternative final configuration | 0.8487 |
Usage
The models are sklearn-crfsuite CRF models serialized as .pkl files.
The feature extraction logic (Nuve morphology and BERTurk embeddings) is in the source code; inputs must be converted to the same features before prediction.
Example Inference
import joblib

# Load the best hybrid model (Nuve + BERT features)
model = joblib.load("models/ner_crf_model.pkl")
# Use FeatureExtractor from src/features.py to prepare input
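Once features are prepared, an sklearn-crfsuite model's `predict` method returns one list of tag strings per input sentence. The sketch below shows one way to turn such a tag list into entity spans. It assumes the models emit BIO-style tags (for example `B-PER`, `I-PER`, `O`), which is not confirmed here; `bio_to_spans` is a hypothetical helper, not part of the repository.

```python
from typing import List, Sequence, Tuple


def bio_to_spans(
    tokens: Sequence[str], tags: Sequence[str]
) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity text, label) pairs."""
    spans: List[Tuple[str, str]] = []
    start, label = None, None
    # A trailing "O" sentinel flushes a span that ends the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((" ".join(tokens[start:i]), label))
            start, label = None, None
        # Open a span on B-, or on a stray I- with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


# In real use the tags would come from something like:
#   tags = model.predict([sentence_features])[0]
tokens = ["Ahmet", "Yılmaz", "dün", "Ankara", "yolundaydı"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('Ahmet Yılmaz', 'PER'), ('Ankara', 'LOC')]
```

A stray `I-` tag with no preceding `B-` is treated as the start of a new span rather than dropped, which is a lenient choice; a stricter decoder could discard such tags instead.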
Citation
Please cite this work if you use it in your research: Academic Paper