Extended Turkish NER (Hybrid CRF)

This is a Named Entity Recognition (NER) model for Turkish. It takes a hybrid approach: a Conditional Random Field (CRF) tagger fed with deep morphological analysis (Nuve/Zemberek) and contextual embeddings (BERTurk). The best variant reaches an F1-score of 86.66%.

Features

6 Extended Categories: PER, LOC, ORG, COMPANY, GROUP, MOVIE.
Hybrid Features: Combines linguistic morphology with semantic BERT vectors.
Gazetteer Support: Uses 160K+ entity entries for high precision.
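
As an illustration of how gazetteer entries can become CRF features, here is a minimal sketch of longest-match lookup over token n-grams. The toy `GAZETTEER` dictionary, the `gazetteer_tags` function, and the B-/I- flag scheme are all assumptions made for this example; the model's actual gazetteer format and feature names are defined in the source code.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: lowercased token tuples mapped to an entity type.
# The real gazetteer (160K+ entries) is not reproduced here.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("ankara",): "LOC",
    ("türk", "hava", "yolları"): "COMPANY",
    ("orhan", "pamuk"): "PER",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def gazetteer_tags(tokens: List[str]) -> List[str]:
    """Return one gazetteer flag per token, preferring the longest match."""
    # Note: str.lower() is not Turkish-aware (I/ı and İ/i), so a real
    # implementation would need locale-specific case folding.
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            entity_type = GAZETTEER.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


print(gazetteer_tags(["Orhan", "Pamuk", "Ankara", "ziyaretinde", "konuştu"]))
```

Each flag can then be added to the token's feature dictionary alongside the morphological and embedding features.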

Performance

Metric	Value
Best F1-Score	86.66%
Precision	87.42%
Recall	85.91%
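
F1 is the harmonic mean of precision and recall, so the three figures above can be cross-checked. The short sketch below does this, assuming all three numbers come from the same evaluation run; the `f1_score` helper is written here for illustration and is not part of the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall, in percent.
f1 = f1_score(87.42, 85.91)
print(round(f1, 2))  # 86.66
```

The result matches the reported best F1-score.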

Available Models (6 Variants)

Model File	Description	F1 Score
`ner_crf_model.pkl`	Best Hybrid (Nuve + BERT) - Main SOTA model	0.8666
`final_proper_model.pkl`	Full features without embeddings	0.8557
`crf_gold_best.pkl`	Best Gold-only trained model	0.8514
`crf_gold_no_emb.pkl`	Gold without BERT embeddings	0.8496
`crf_gold_gaz_only.pkl`	Gazetteer-only features (baseline)	0.8463
`final_model.pkl`	Alternative final configuration	0.8487

Usage

The models are sklearn-crfsuite CRF objects saved as .pkl files. Load one with joblib, then build its input with the feature extraction code in the source repository, which relies on Nuve and BERTurk.

Example Inference

import joblib

# sklearn-crfsuite must be installed so the pickled CRF can be deserialized.
model = joblib.load("models/ner_crf_model.pkl")
# Use FeatureExtractor from src/features.py to prepare input
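
For orientation, sklearn-crfsuite's `CRF.predict` expects a list of sentences, each given as a list of per-token feature dictionaries. The sketch below shows that input shape using a few generic surface features. The `token_features` and `sentence_features` helpers and all feature keys are hypothetical. The released models expect the features produced by the repository's own extractor (Nuve morphology, BERTurk vectors, gazetteer flags), so this sketch illustrates the data format only and will not reproduce their predictions.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, float]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Build a feature dict for token i (illustrative feature names only)."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # Context features from the neighbouring tokens.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    return [token_features(tokens, i) for i in range(len(tokens))]


# One sentence, already tokenized.
X = [sentence_features(["Orhan", "Pamuk", "Ankara", "ziyaretinde", "konuştu"])]

# With a loaded model and matching features, prediction would look like:
# labels = model.predict(X)  # one label sequence per sentence
```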

Citation

Please cite this work if you use it in your research: Academic Paper


WildGenie/nerextended-turkish-crf