YoruTTS-1.5 Model

The use of TTS-based augmentation to generate large-scale synthetic English audio is well reported in the literature (Li et al., 2025; Moslem, 2024; Robinson et al., 2022), owing to the abundance of resources available for English. This is not the case for the Yorùbá language, which is extremely low-resourced in terms of audio datasets and speech-based models. The critical role of robust TTS models in synthesizing target audio for direct S2ST models, from a high-resource source language to a low-resource target language, is also well reported (Jia et al., 2022; Conneau et al., 2023; IWSLT, 2023). We reviewed eighteen (18) state-of-the-art TTS models across five (5) architectural categories (autoregressive, flow-based, diffusion-based, parallel feedforward, and prompt-based), and the review revealed the predominance of the English language. Among these models, only Facebook MMS supports Yorùbá TTS in its pre-trained version, but it has no Yorùbá-specific grapheme-to-phoneme (G2P) tool. Variational Inference Text-to-Speech (VITS) was not pretrained on Yorùbá and can only be fine-tuned for it; it also lacks a G2P tool for Yorùbá and other low-resource African languages.

Given the foregoing, we developed a Yorùbá TTS model named YoruTTS-1.5, based on our newly released BENYO-S2ST-Corpus-1 (https://huggingface.co/datasets/aspmirlab/BENYO-S2ST-Corpus-1). Developing a Yorùbá TTS model with the augmented Yorùbá audio and transcript pairs, which are a subset of the BENYO-S2ST-Corpus-1, presents several potential benefits. The major one is that the model can be used to carry out TTS-based augmentation, which would increase the number of Yorùbá audio samples for upgrading the BENYO-S2ST-Corpus-1 towards building a more robust direct S2ST model for the English and Yorùbá language pair.
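As a rough illustration of what TTS-based augmentation involves, the sketch below turns a list of transcripts into WAV files plus a manifest of (audio path, transcript) pairs. It is not the pipeline used to build the corpus or the model: `augment`, `placeholder_synthesize`, the 16 kHz sample rate, and the file naming are all assumptions made for this example. The placeholder only emits a short tone so that the code runs without any model; in practice it would be replaced by a call to a real Yorùbá TTS model such as YoruTTS-1.5.

```python
import math
import struct
import wave
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # assumed for this sketch; use the rate your TTS model outputs


def placeholder_synthesize(text: str) -> List[int]:
    """Hypothetical stand-in for a TTS call.

    Returns 16-bit PCM samples of a plain tone whose duration grows with the
    length of the text. Replace this with real model inference.
    """
    n_samples = SAMPLE_RATE * max(1, len(text)) // 20
    return [
        int(8000 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE))
        for i in range(n_samples)
    ]


def augment(
    transcripts: Sequence[str],
    out_dir: str,
    synthesize: Callable[[str], List[int]] = placeholder_synthesize,
) -> List[Tuple[str, str]]:
    """Synthesize one mono 16-bit WAV per transcript and return a manifest."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for idx, text in enumerate(transcripts):
        samples = synthesize(text)
        path = out / f"synthetic_{idx:05d}.wav"
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)   # mono
            wav.setsampwidth(2)   # 16-bit samples
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
        manifest.append((str(path), text))
    return manifest
```

The returned manifest pairs each synthetic audio file with its source transcript, which is the form needed to append new samples to an audio and transcript corpus.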

This work is funded through the 2024 Google Academic Research Award (GARA) for Society Centered Artificial Intelligence (SCAI), awarded to Emmanuel Adetiba for the research project titled "A Direct Speech-to-Speech Model for English-to-Yoruba Translation Towards Bridging Language Barriers in Public Health Education Outreaches" (https://bit.ly/3PQj7fq).

CONTACT: emmanueladetiba@gmail.com, emmanuel.adetiba@covenantuniversity.edu.ng
