Extended Turkish NER (Hybrid CRF)

This is a Named Entity Recognition (NER) model for Turkish that combines Conditional Random Fields (CRF) with deep morphological analysis (Nuve/Zemberek) and contextual embeddings (BERTurk). The best variant reaches an F1 score of 86.66%.

Features

  • 6 Extended Categories: PER, LOC, ORG, COMPANY, GROUP, MOVIE.
  • Hybrid Features: Combines linguistic morphology with semantic BERT vectors.
  • Gazetteer Support: Uses 160K+ entity entries for high precision.
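
As a rough illustration of how such hybrid features can be combined for a CRF, the sketch below builds one feature dictionary per token, in the dict-per-token format that sklearn-crfsuite accepts. The feature names, the tiny `GAZETTEER` sample, and the `morph` and `embeddings` inputs are all hypothetical stand-ins; the actual feature set is defined in the repository's source code.

```python
from typing import Dict, Sequence, Union

Feature = Union[str, bool, float]

# Hypothetical miniature gazetteer; the real one is described as 160K+ entries.
GAZETTEER = {"ankara": "LOC"}


def token_features(
    tokens: Sequence[str],
    i: int,
    morph: Sequence[Dict[str, str]],
    embeddings: Sequence[Sequence[float]],
) -> Dict[str, Feature]:
    """Build a feature dict for tokens[i] (illustrative feature names only)."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        # Morphological analysis, assumed to come from Nuve/Zemberek upstream.
        "morph.lemma": morph[i].get("lemma", word.lower()),
        "morph.pos": morph[i].get("pos", "UNK"),
        # Gazetteer lookup on the lowercased surface form.
        "gaz.type": GAZETTEER.get(word.lower(), "NONE"),
    }
    # Contextual embedding dimensions as real-valued features.
    for d, value in enumerate(embeddings[i]):
        feats[f"emb.{d}"] = float(value)
    # Minimal left context: sentence start marker or previous word.
    if i == 0:
        feats["BOS"] = True
    else:
        feats["-1:word.lower"] = tokens[i - 1].lower()
    return feats


# Toy inputs; real morphology and embeddings would come from the analyzers.
tokens = ["Ankara", "güzel"]
morph = [{"lemma": "Ankara", "pos": "Noun"}, {"lemma": "güzel", "pos": "Adj"}]
emb = [[0.1, -0.2], [0.3, 0.4]]
sentence_features = [
    token_features(tokens, i, morph, emb) for i in range(len(tokens))
]
```

A list like `sentence_features` represents one sentence; a CRF of this kind is trained and queried on a list of such sentences.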

Performance

Metric          Value
Best F1-Score   86.66%
Precision       87.42%
Recall          85.91%
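
As a quick sanity check, the reported F1 score agrees with the harmonic mean of the reported precision and recall. The snippet below assumes the three figures were computed over the same evaluation set and averaging scheme, which the card does not state explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, in percent.
f1 = f1_score(87.42, 85.91)
print(f"{f1:.2f}")  # prints 86.66
```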

Available Models (6 Variants)

Model File               Description                                    F1 Score
ner_crf_model.pkl        Best Hybrid (Nuve + BERT) - Main SOTA model    0.8666
final_proper_model.pkl   Full features without embeddings               0.8557
crf_gold_best.pkl        Best Gold-only trained model                   0.8514
crf_gold_no_emb.pkl      Gold without BERT embeddings                   0.8496
crf_gold_gaz_only.pkl    Gazetteer-only features (baseline)             0.8463
final_model.pkl          Alternative final configuration                0.8487

Usage

The models are saved as .pkl files (sklearn-crfsuite) and can be loaded with joblib. Refer to the source code for the feature extraction logic, which relies on Nuve and BERTurk.

Example Inference

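import joblib

# Load the best hybrid variant (Nuve + BERT)
model = joblib.load("models/ner_crf_model.pkl")
# Use FeatureExtractor from src/features.py to prepare input,
# then pass the per-sentence feature lists to model.predict(...)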
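
To make the expected data flow concrete, here is a minimal sketch of an inference wrapper. It assumes the pickle holds a sklearn-crfsuite `CRF` object, whose `predict` method takes a list of sentences (each a list of per-token feature dicts) and returns one tag sequence per sentence. The `featurize` callable is a hypothetical placeholder for the repository's `FeatureExtractor`, whose actual interface is not documented here.

```python
from typing import Callable, Dict, List, Tuple

Features = List[Dict[str, object]]


def tag_sentences(
    model,
    sentences: List[List[str]],
    featurize: Callable[[List[str]], Features],
) -> List[List[Tuple[str, str]]]:
    """Pair each token with the tag predicted for it.

    `model` is anything exposing predict(X) in the sklearn-crfsuite style;
    `featurize` (hypothetical) maps one tokenized sentence to feature dicts.
    """
    X = [featurize(tokens) for tokens in sentences]
    y_pred = model.predict(X)
    return [list(zip(tokens, tags)) for tokens, tags in zip(sentences, y_pred)]


# Intended use (requires the model file and a real feature extractor):
# model = joblib.load("models/ner_crf_model.pkl")
# tagged = tag_sentences(model, [["Ankara", "güzel"]], featurize=my_featurize)
```

Keeping feature extraction behind a callable means the same wrapper works for any of the six variants, provided each model is given the features it was trained with.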

Citation

Please cite this work if you use it in your research. Academic Paper

