Kokoro Vietnamese

Vietnamese Kokoro text-to-speech (TTS) model for inference. Vietnamese grapheme-to-phoneme (G2P) conversion is handled by vig2p.

Install

git clone https://github.com/iamdinhthuan/Kokoro-Vietnamese.git
cd Kokoro-Vietnamese
pip install -e .

If you need CUDA, install the PyTorch build that matches your machine before running pip install -e .
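
To check whether the installed PyTorch build can actually see a GPU, a small sketch like the one below may help. It relies only on torch.cuda.is_available(). The pick_device helper is a hypothetical name, and falling back to "cpu" assumes the package accepts device="cpu", which this README does not confirm.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled PyTorch is usable, otherwise "cpu"."""
    try:
        # Imported lazily so the check also works when PyTorch is not installed yet.
        import torch
    except ImportError:
        return "cpu"
    # True only if this PyTorch build has CUDA support and a GPU is visible.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string could then be passed as device=... in Python or --device ... in the CLI.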

Voices

Pass one of the voice IDs below as voice=... in Python or --voice ... in the CLI.

Voice ID	Display name
`diem_trinh`	Diễm Trinh
`hung_thinh`	Hưng Thịnh
`mai_linh`	Mai Linh
`mai_loan`	Mai Loan
`manh_dung`	Mạnh Dũng
`my_yen`	Mỹ Yến
`ngoc_huyen`	Ngọc Huyền
`phat_tai`	Phát Tài
`thanh_dat`	Thành Đạt
`thuc_trinh`	Thục Trinh
`tuan_ngoc`	Tuấn Ngọc
`storyvert`	storyvert
`duc_an`	Đức An
`duc_duy`	Đức Duy

Python API

import soundfile as sf
from kokoro_vietnamese import KokoroVietnamese

tts = KokoroVietnamese(device="cuda", voice="diem_trinh")

audio, phonemes = tts.synthesize(
    "Giữa một buổi chiều yên tĩnh, cô ấy kể lại câu chuyện bằng một giọng nói ấm áp và chậm rãi."
)

sf.write("sample.wav", audio, 24000)
print(phonemes)

Use another voice:

tts = KokoroVietnamese(device="cuda", voice="mai_linh")
audio, phonemes = tts.synthesize("Hôm nay trời trong xanh, gió thổi nhẹ qua hiên nhà.")
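
If soundfile is not available, the Python standard library can write the same 24 kHz output. The sketch below assumes that audio is a one-dimensional (mono) sequence of floats in the range [-1.0, 1.0]; the README does not state this, so check the actual array first. write_wav_pcm16 and duration_seconds are hypothetical helper names.

```python
import struct
import wave

SAMPLE_RATE = 24000  # the rate passed to sf.write in the example above


def write_wav_pcm16(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed in [-1.0, 1.0]) as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


def duration_seconds(samples, sample_rate=SAMPLE_RATE):
    """Length of the clip in seconds."""
    return len(samples) / sample_rate
```

For example, write_wav_pcm16("sample.wav", audio) would stand in for the sf.write call, and duration_seconds(audio) gives the clip length.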

CLI

kokoro-vietnamese \
  --text "Giữa một buổi chiều yên tĩnh, cô ấy kể lại câu chuyện bằng một giọng nói ấm áp và chậm rãi." \
  --voice diem_trinh \
  --output sample.wav \
  --device cuda

List voices:

kokoro-vietnamese --list-voices

Notes

Input text is split at sentence punctuation, and the audio for the resulting segments is merged with a short crossfade.
The model expects Vietnamese text that is already normalized well enough for vig2p to handle.
The voicepacks are derived from the LarVoice multi-speaker training set and are distributed as inference artifacts.
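
The split-and-crossfade behaviour described in the first note can be illustrated with a minimal sketch. This is not the package's implementation: the punctuation set, the linear fade, and the overlap length are all assumptions, and split_sentences and crossfade are hypothetical names.

```python
import re


def split_sentences(text):
    """Split after ., !, ?, ; or … when followed by whitespace (assumed punctuation set)."""
    parts = re.split(r"(?<=[.!?;…])\s+", text.strip())
    return [p for p in parts if p]


def crossfade(a, b, overlap):
    """Join two sample lists, linearly blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b`."""
    n = min(overlap, len(a), len(b))
    if n == 0:
        return list(a) + list(b)
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of `b` rises across the overlap
        mixed.append(a[len(a) - n + i] * (1 - w) + b[i] * w)
    # The overlap region is shared, so the result is n samples shorter than a + b.
    return list(a[:-n]) + mixed + list(b[n:])


sentences = split_sentences("Xin chào. Bạn khỏe không? Tôi khỏe!")
print(sentences)
```

In a pipeline like this, each sentence would be synthesized on its own and the resulting clips folded together with crossfade, which avoids audible clicks at the joins.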
