Qwen3-TTS
Qwen3-TTS is a series of advanced multilingual, controllable, robust, and streaming text-to-speech models developed by the Qwen team.
This checkpoint is the 0.6B CustomVoice variant, built on the 12Hz tokenizer. It provides 9 premium timbres and supports fine-grained style control of the target voice through natural-language instructions, across 10 major languages.
To use Qwen3-TTS, install the `qwen-tts` package:

    pip install -U qwen-tts
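
Before loading a multi-gigabyte checkpoint, it can help to confirm that the package is actually present in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of `qwen-tts`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "qwen-tts") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if installed_version() is None:
    print("qwen-tts is not installed; run: pip install -U qwen-tts")
```

The lookup uses the distribution name given to `pip` (`qwen-tts`), not the import name (`qwen_tts`).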
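
The example below loads the checkpoint on a CUDA GPU and synthesizes one utterance with a preset speaker and a style instruction:

    import torch
    import soundfile as sf
    from qwen_tts import Qwen3TTSModel

    # Load the model
    model = Qwen3TTSModel.from_pretrained(
        "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice",
        device_map="cuda:0",
        dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",  # requires the flash-attn package
    )

    # Generate speech with specific instructions
    wavs, sr = model.generate_custom_voice(
        # English: "Actually, I've really noticed that I'm someone who is
        # especially good at reading other people's emotions."
        text="其实我真的有发现，我是一个特别善于观察别人情绪的人。",
        language="Chinese",
        speaker="Vivian",
        # English: "Say it in a very angry tone."
        instruct="用特别愤怒的语气说",
    )

    # Save the generated audio
    sf.write("output_custom_voice.wav", wavs[0], sr)
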
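
The loading call above hard-codes a CUDA device and FlashAttention 2, so it fails on machines that lack either. The sketch below picks the loading arguments from what is available. It rests on assumptions not stated in this card: that the model can be loaded on CPU in float32, and that omitting `attn_implementation` falls back to a default attention backend. The helper names `choose_load_kwargs` and `flash_attn_installed` are hypothetical.

```python
import importlib.util


def flash_attn_installed() -> bool:
    """Check whether the flash_attn module can be imported in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def choose_load_kwargs(has_cuda: bool, has_flash_attn: bool) -> dict:
    """Build keyword arguments for Qwen3TTSModel.from_pretrained.

    The dtype is returned as a string name so this helper does not need torch;
    resolve it with getattr(torch, kwargs["dtype"]) before loading.
    """
    if not has_cuda:
        # Assumption: CPU inference in float32 is supported, though slow.
        return {"device_map": "cpu", "dtype": "float32"}
    kwargs = {"device_map": "cuda:0", "dtype": "bfloat16"}
    if has_flash_attn:
        kwargs["attn_implementation"] = "flash_attention_2"
    # Assumption: without the key, the library selects its default backend.
    return kwargs
```

To use it, call `choose_load_kwargs(torch.cuda.is_available(), flash_attn_installed())`, replace the `dtype` string with `getattr(torch, kwargs["dtype"])`, and pass the result as `**kwargs` after the model ID.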
For Qwen3-TTS-12Hz-0.6B-CustomVoice, the following speakers are supported. We recommend using each speaker's native language for the best results:
| Speaker | Voice Description | Native Language |
|---|---|---|
| Vivian | Bright young female voice. | Chinese |
| Serena | Warm, gentle young female voice. | Chinese |
| Uncle_Fu | Seasoned male voice, mellow timbre. | Chinese |
| Dylan | Youthful Beijing male voice. | Chinese (Beijing) |
| Eric | Lively Chengdu male voice. | Chinese (Sichuan) |
| Ryan | Dynamic male voice with rhythm. | English |
| Aiden | Sunny American male voice. | English |
| Ono_Anna | Playful Japanese female voice. | Japanese |
| Sohee | Warm Korean female voice. | Korean |
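
To follow the native-language recommendation programmatically, the table can be transcribed into a lookup. This is a minimal sketch, not part of `qwen-tts`: the names `NATIVE_LANGUAGE`, `native_language`, and `speakers_for` are hypothetical. It also assumes that the dialect speakers (Dylan, Eric) take `"Chinese"` as their `language` value and that `"English"`, `"Japanese"`, and `"Korean"` are accepted spellings; only `"Chinese"` appears in the example above.

```python
# Speaker -> native language, transcribed from the table above.
NATIVE_LANGUAGE = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",    # Beijing dialect (assumed to map to "Chinese")
    "Eric": "Chinese",     # Sichuan dialect (assumed to map to "Chinese")
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def native_language(speaker: str) -> str:
    """Return the recommended language for a speaker, or raise ValueError."""
    try:
        return NATIVE_LANGUAGE[speaker]
    except KeyError:
        known = ", ".join(sorted(NATIVE_LANGUAGE))
        raise ValueError(f"Unknown speaker {speaker!r}; expected one of: {known}") from None


def speakers_for(language: str) -> list[str]:
    """List the speakers whose native language matches, in table order."""
    return [name for name, lang in NATIVE_LANGUAGE.items() if lang == language]
```

The result of `native_language(speaker)` can then be passed as the `language` argument of `generate_custom_voice`, which catches a misspelled speaker name before any audio is generated.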
If you find Qwen3-TTS useful for your research, please consider citing:

    @article{Qwen3-TTS,
      title={Qwen3-TTS Technical Report},
      author={Hangrui Hu and Xinfa Zhu and Ting He and Dake Guo and Bin Zhang and Xiong Wang and Zhifang Guo and Ziyue Jiang and Hongkun Hao and Zishan Guo and Xinyu Zhang and Pei Zhang and Baosong Yang and Jin Xu and Jingren Zhou and Junyang Lin},
      journal={arXiv preprint arXiv:2601.15621},
      year={2026}
    }