Ultra Diar Streaming Sortformer (8-Speaker)
This model extends NVIDIA's Streaming Sortformer speaker diarization model from 4 speakers to 8 speakers. The original diar_streaming_sortformer_4spk-v2.1 supports up to 4 speakers; this model handles 5–8 speakers as well, through fine-tuning and architectural modifications.
Model Details
- Base model: nvidia/diar_streaming_sortformer_4spk-v2.1
- Extension: 4spk → 8spk
- Framework: NeMo (NVIDIA)
- Version: 1.0.0
Code & Training
The experimental pipeline, training scripts, and inference code will be made public on GitHub at a later date. For now, the model is available only on Hugging Face.
Training
This model was trained on 1× NVIDIA H100 GPU. The training data consists of 180-second synthetic samples containing 2–8 speakers.
Usage
This model requires the NVIDIA NeMo toolkit to train, fine-tune, or perform diarization. Install NeMo after installing Cython and the latest PyTorch.
Install NeMo
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
pip install "git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]"
Quick Start: Run Diarization
from nemo.collections.asr.models import SortformerEncLabelModel
# Load model from Hugging Face (requires Hugging Face token for gated models)
diar_model = SortformerEncLabelModel.from_pretrained("devsy0117/ultra_diar_streaming_sortformer_8spk_v1.0.0")
diar_model.eval()
# Streaming parameters (recommended for best performance)
diar_model.sortformer_modules.chunk_len = 340
diar_model.sortformer_modules.chunk_right_context = 40
diar_model.sortformer_modules.fifo_len = 40
diar_model.sortformer_modules.spkcache_update_period = 300
# Run diarization
predicted_segments = diar_model.diarize(audio=["/path/to/your/audio.wav"], batch_size=1)
for segment in predicted_segments[0]:
    print(segment)
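The streaming parameters above trade latency for accuracy. As a rough sketch, the input buffer latency can be estimated from `chunk_len` and `chunk_right_context`, assuming each frame covers 0.08 seconds. That frame duration comes from the base model's documentation and is an assumption here; this card does not state it. The helper name `streaming_latency_seconds` is hypothetical.

```python
# Assumed frame duration in seconds (from the base model's documentation,
# not confirmed for this 8-speaker model).
FRAME_SECONDS = 0.08


def streaming_latency_seconds(chunk_len: int,
                              chunk_right_context: int,
                              frame_seconds: float = FRAME_SECONDS) -> float:
    """Estimate input buffer latency: frames buffered before output, times frame length."""
    return (chunk_len + chunk_right_context) * frame_seconds


# Values from the recommended configuration above.
latency = streaming_latency_seconds(chunk_len=340, chunk_right_context=40)
print(f"{latency:.1f} s")  # prints "30.4 s"
```

Under this assumption, the recommended settings buffer about 30 seconds of audio, so they suit offline or near-offline use. Smaller `chunk_len` values would reduce latency, likely at some cost in accuracy.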
Loading the Model
from nemo.collections.asr.models import SortformerEncLabelModel
# Option 1: Load directly from Hugging Face (requires Hugging Face token)
diar_model = SortformerEncLabelModel.from_pretrained("devsy0117/ultra_diar_streaming_sortformer_8spk_v1.0.0")
# Option 2: Load from a downloaded .nemo file
diar_model = SortformerEncLabelModel.restore_from(
    restore_path="/path/to/ultra_diar_streaming_sortformer_8spk_v1.0.0.nemo",
    map_location="cuda",
    strict=False,
)
diar_model.eval()
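Option 2 hard-codes `map_location="cuda"`, which fails on a machine without a GPU. A minimal sketch of choosing the device at runtime follows; the variable names are illustrative, and the fallback to CPU when PyTorch is missing only keeps the snippet runnable on its own.

```python
# Pick a device string for map_location based on what is available.
try:
    import torch
    cuda_available = torch.cuda.is_available()
except ImportError:
    # PyTorch is required by NeMo; this branch only keeps the sketch standalone.
    cuda_available = False

map_location = "cuda" if cuda_available else "cpu"
print(map_location)

# Then pass it when restoring the checkpoint, for example:
# diar_model = SortformerEncLabelModel.restore_from(
#     restore_path="/path/to/ultra_diar_streaming_sortformer_8spk_v1.0.0.nemo",
#     map_location=map_location,
#     strict=False,
# )
```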
Input Format
- Single audio file: audio_input="/path/to/multispeaker_audio.wav"
- Multiple files: audio_input=["/path/to/audio1.wav", "/path/to/audio2.wav"]
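When several files are passed, the results can be paired with their inputs and summarized per speaker. The sketch below assumes that `diarize` returns one list of segments per input file, in input order, and that each segment is a string of the form `"start end speaker"` with times in seconds. Both are assumptions, so check the actual output before relying on them. The `example_output` data and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_segment(segment: str) -> Tuple[float, float, str]:
    """Split an assumed 'start end speaker' string into typed fields."""
    start, end, speaker = segment.split()
    return float(start), float(end), speaker


def speaking_time(segments: List[str]) -> Dict[str, float]:
    """Sum segment durations (in seconds) for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        start, end, speaker = parse_segment(segment)
        totals[speaker] += end - start
    return dict(totals)


audio_input = ["/path/to/audio1.wav", "/path/to/audio2.wav"]

# Hypothetical stand-in for diar_model.diarize(audio=audio_input, batch_size=1).
example_output = [
    ["0.00 2.50 speaker_0", "2.50 4.00 speaker_1", "4.00 5.00 speaker_0"],
    ["0.00 1.00 speaker_0"],
]

# Pair each input path with its segments and print per-speaker totals.
for path, segments in zip(audio_input, example_output):
    print(path, speaking_time(segments))
```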
License
This model is a derivative of NVIDIA Sortformer, licensed under the NVIDIA Open Model License.
Attribution: Licensed by NVIDIA Corporation under the NVIDIA Open Model License.