# ObalaPalava T5-Small

*Solving street language palava — from Pidgin to clear English.*
## Model Description
ObalaPalava T5-Small is a text-to-text Transformer model based on the T5-small architecture, fine-tuned for Pidgin-to-English translation and street-language understanding across Ghanaian and Nigerian contexts.
The model treats translation and normalization as a single text-generation task, making it flexible for informal language, slang, and the everyday conversational Pidgin common on the street, on social media, and on messaging platforms.
Although the name carries strong cultural branding, the model is architecturally T5-based, not BERT-based.
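Because everything is framed as text generation, the model can be tried with a single `text2text-generation` pipeline call, which wraps tokenization, generation, and decoding. A minimal sketch (assuming the checkpoint expects raw Pidgin input with no task prefix, as in the inference example further below):

```python
from transformers import pipeline

# Assumption: the fine-tuned checkpoint takes raw Pidgin text, no task prefix
translator = pipeline("text2text-generation", model="Willie999/obalapalava-t5")
result = translator("wetin dey happen?")
print(result[0]["generated_text"])
```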
## Intended Use
This model is intended for:
- Pidgin → English translation
- Informal text normalization
- Street-language understanding
- Low-resource African language research
- Educational and prototyping use
### Example Use Cases
- Translating WhatsApp or social media Pidgin posts
- Cleaning informal text for downstream NLP tasks (see the batch sketch after this list)
- Supporting local-language AI applications in West Africa
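For the text-cleaning use case, posts can be normalized in batches before they reach downstream components. A hedged sketch; the batch contents and generation settings here are illustrative, not published defaults:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Willie999/obalapalava-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

posts = [
    "how you dey?",
    "the boy no dey fear at all",
]

# Pad the batch so all sequences share one tensor shape
inputs = tokenizer(posts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=64)
cleaned = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(cleaned)
```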
## Out-of-Scope Use
The model is not intended for:
- High-stakes decision-making (legal, medical, financial)
- Generating harmful, abusive, or deceptive content
- Fully replacing professional translation services
## Model Architecture

- Base model: T5-Small
- Type: Encoder–Decoder Transformer
- Training objective: Text-to-text generation
- Parameters: ~60M (see the quick check below)
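As a quick sanity check on the size, the exact parameter count can be read off the loaded checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("Willie999/obalapalava-t5")
# T5-Small checkpoints sit at roughly 60M parameters
print(f"{model.num_parameters():,} parameters")
```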
## Training Data
The model was fine-tuned on custom-curated Pidgin–English parallel text, reflecting:
- Ghanaian Pidgin usage
- Nigerian Pidgin usage
- Informal, conversational, and street-level expressions
The dataset includes variations in spelling, slang, and grammar common in real-world Pidgin communication.
Note: Due to the informal nature of Pidgin, translations may vary depending on regional context.
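The dataset schema is not published. For illustration only, one parallel pair could be represented like this (the field names and the example pair are hypothetical):

```python
# Hypothetical record layout for one Pidgin–English pair
pair = {
    "pidgin": "how you dey?",    # informal source text
    "english": "How are you?",   # normalized English target
}
```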
## Training Procedure

- Framework: Hugging Face Transformers
- Optimization: Standard T5 fine-tuning setup (sketched below)
- Hardware: CPU-friendly configuration
- Task: Sequence-to-sequence learning
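The exact hyperparameters are not published. The sketch below shows what a standard T5 fine-tuning setup looks like with the `Seq2SeqTrainer` API; the toy dataset, batch size, learning rate, and epoch count are all assumed values:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Toy stand-in for the (unpublished) Pidgin–English parallel corpus
raw = Dataset.from_dict({
    "pidgin": ["how you dey?", "the boy no dey fear at all"],
    "english": ["How are you?", "The boy is not afraid at all"],
})

def preprocess(batch):
    # Tokenize Pidgin as the source and English as the target labels
    model_inputs = tokenizer(batch["pidgin"], truncation=True, max_length=64)
    labels = tokenizer(text_target=batch["english"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = raw.map(preprocess, batched=True, remove_columns=["pidgin", "english"])

args = Seq2SeqTrainingArguments(
    output_dir="obalapalava-t5",
    per_device_train_batch_size=8,  # small batch keeps this runnable on CPU
    num_train_epochs=3,             # assumed; not a published value
    learning_rate=3e-4,             # a common starting point for T5
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```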
## Evaluation
Evaluation was primarily qualitative, focusing on:
- Semantic correctness
- Preservation of meaning
- Handling of slang and informal expressions
Formal benchmark scores are not provided due to the lack of standardized Pidgin evaluation datasets.
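If a held-out parallel test set were available, automatic scores could complement the qualitative review. A minimal sketch using the `evaluate` library; the prediction and reference pair here are hypothetical:

```python
import evaluate

# Hypothetical model output and reference; no standardized Pidgin test set exists
predictions = ["The boy is not afraid at all"]
references = [["The boy is not scared at all"]]

bleu = evaluate.load("sacrebleu")
print(bleu.compute(predictions=predictions, references=references)["score"])
```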
## Example Inference
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Willie999/obalapalava-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Pidgin input to translate into English
text = "the boy no dey fear at all"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
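For longer or noisier inputs, decoding settings are worth tuning; for example, `model.generate(**inputs, max_length=64, num_beams=4)` trades speed for somewhat more fluent output. Beam search is a general suggestion here, not a documented default for this checkpoint.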