# ObalaPalava T5-Small

*Solving street language palava — from Pidgin to clear English.*
## Model Description
ObalaPalava T5-Small is a text-to-text Transformer model based on the T5-small architecture, fine-tuned for Pidgin-to-English translation and street-language understanding across Ghanaian and Nigerian contexts.
The model treats translation and normalization as a single text-generation task, making it flexible for informal language, slang, and the everyday conversational Pidgin common on the street, on social media, and on messaging platforms.
Although the name carries strong cultural branding, the model is architecturally T5-based, not BERT-based.
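Because everything is framed as text generation, the model can be tried with a single `text2text-generation` pipeline call, which wraps tokenization, generation, and decoding. A minimal sketch (assuming the checkpoint expects raw Pidgin input with no task prefix, as in the inference example further below):

```python
from transformers import pipeline

# Assumption: the fine-tuned checkpoint takes raw Pidgin text, no task prefix
translator = pipeline("text2text-generation", model="Willie999/obalapalava-t5")
result = translator("wetin dey happen?")
print(result[0]["generated_text"])
```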
## Intended Use
This model is intended for:
- Pidgin → English translation
- Informal text normalization
- Street-language understanding
- Low-resource African language research
- Educational and prototyping use
### Example Use Cases
- Translating WhatsApp or social media Pidgin posts
- Cleaning informal text for downstream NLP tasks (see the batch sketch after this list)
- Supporting local-language AI applications in West Africa
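For the text-cleaning use case, posts can be normalized in batches before they reach downstream components. A hedged sketch; the batch contents and generation settings here are illustrative, not published defaults:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Willie999/obalapalava-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

posts = [
    "how you dey?",
    "the boy no dey fear at all",
]

# Pad the batch so all sequences share one tensor shape
inputs = tokenizer(posts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=64)
cleaned = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(cleaned)
```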
## Out-of-Scope Use
The model is not intended for:
- High-stakes decision-making (legal, medical, financial)
- Generating harmful, abusive, or deceptive content
- Fully replacing professional translation services
## Model Architecture

- Base model: T5-Small
- Type: Encoder–Decoder Transformer
- Training objective: Text-to-text generation
- Parameters: ~60M (see the quick check below)
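As a quick sanity check on the size, the exact parameter count can be read off the loaded checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("Willie999/obalapalava-t5")
# T5-Small checkpoints sit at roughly 60M parameters
print(f"{model.num_parameters():,} parameters")
```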
## Training Data
The model was fine-tuned on custom-curated Pidgin–English parallel text, reflecting:
- Ghanaian Pidgin usage
- Nigerian Pidgin usage
- Informal, conversational, and street-level expressions
The dataset includes variations in spelling, slang, and grammar common in real-world Pidgin communication.
Note: Due to the informal nature of Pidgin, translations may vary depending on regional context.
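The dataset schema is not published. For illustration only, one parallel pair could be represented like this (the field names and the example pair are hypothetical):

```python
# Hypothetical record layout for one Pidgin–English pair
pair = {
    "pidgin": "how you dey?",    # informal source text
    "english": "How are you?",   # normalized English target
}
```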
## Training Procedure

- Framework: Hugging Face Transformers
- Optimization: Standard T5 fine-tuning setup (sketched below)
- Hardware: CPU-friendly configuration
- Task: Sequence-to-sequence learning
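The exact hyperparameters are not published. The sketch below shows what a standard T5 fine-tuning setup looks like with the `Seq2SeqTrainer` API; the toy dataset, batch size, learning rate, and epoch count are all assumed values:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Toy stand-in for the (unpublished) Pidgin–English parallel corpus
raw = Dataset.from_dict({
    "pidgin": ["how you dey?", "the boy no dey fear at all"],
    "english": ["How are you?", "The boy is not afraid at all"],
})

def preprocess(batch):
    # Tokenize Pidgin as the source and English as the target labels
    model_inputs = tokenizer(batch["pidgin"], truncation=True, max_length=64)
    labels = tokenizer(text_target=batch["english"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = raw.map(preprocess, batched=True, remove_columns=["pidgin", "english"])

args = Seq2SeqTrainingArguments(
    output_dir="obalapalava-t5",
    per_device_train_batch_size=8,  # small batch keeps this runnable on CPU
    num_train_epochs=3,             # assumed; not a published value
    learning_rate=3e-4,             # a common starting point for T5
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```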
## Evaluation
Evaluation was primarily qualitative, focusing on:
- Semantic correctness
- Preservation of meaning
- Handling of slang and informal expressions
Formal benchmark scores are not provided due to the lack of standardized Pidgin evaluation datasets.
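If a held-out parallel test set were available, automatic scores could complement the qualitative review. A minimal sketch using the `evaluate` library; the prediction and reference pair here are hypothetical:

```python
import evaluate

# Hypothetical model output and reference; no standardized Pidgin test set exists
predictions = ["The boy is not afraid at all"]
references = [["The boy is not scared at all"]]

bleu = evaluate.load("sacrebleu")
print(bleu.compute(predictions=predictions, references=references)["score"])
```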
## Example Inference
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Willie999/obalapalava-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Pidgin input to translate into English
text = "the boy no dey fear at all"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
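For longer or noisier inputs, decoding settings are worth tuning; for example, `model.generate(**inputs, max_length=64, num_beams=4)` trades speed for somewhat more fluent output. Beam search is a general suggestion here, not a documented default for this checkpoint.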