ObalaPalava T5-Small

Solving street language palava — from Pidgin to clear English.

Model Description

ObalaPalava T5-Small is a text-to-text Transformer model based on the T5-small architecture, fine-tuned for Pidgin-to-English translation and street-language understanding across Ghanaian and Nigerian contexts.

The model treats translation and normalization as a unified text generation task, making it flexible for informal language, slang, and everyday conversational Pidgin commonly used on the streets, social media, and messaging platforms.

Although the name carries strong cultural branding, the model is architecturally T5-based, not BERT-based.


Intended Use

This model is intended for:

  • Pidgin → English translation
  • Informal text normalization
  • Street-language understanding
  • Low-resource African language research
  • Educational and prototyping use

Example Use Cases

  • Translating WhatsApp or social media Pidgin posts (see the sketch after this list)
  • Cleaning informal text for downstream NLP tasks
  • Supporting local-language AI applications in West Africa
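
As a quick illustration of the first two use cases, the sketch below batch-translates a few informal messages with the Transformers pipeline helper. The sample messages are made up for illustration, and exact outputs will vary with the checkpoint and decoding settings.

from transformers import pipeline

# Hypothetical informal Pidgin messages (illustrative only, not from the training data)
messages = [
    "how far, you don chop?",
    "the light don go again o",
    "abeg make we waka go market",
]

translator = pipeline("text2text-generation", model="Willie999/obalapalava-t5")

for message in messages:
    result = translator(message, max_length=64)
    print(message, "->", result[0]["generated_text"])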

Out-of-Scope Use

The model is not intended for:

  • High-stakes decision-making (legal, medical, financial)
  • Generating harmful, abusive, or deceptive content
  • Fully replacing professional translation services

Model Architecture

  • Base model: T5-Small
  • Type: Encoder–Decoder Transformer
  • Training Objective: Text-to-text generation
  • Parameters: ~60M (60.5M in the released F32 safetensors checkpoint)
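
The parameter count can be verified directly from the released checkpoint; a minimal check, assuming the same checkpoint id used in the inference example below:

from transformers import AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint and count its parameters
model = AutoModelForSeq2SeqLM.from_pretrained("Willie999/obalapalava-t5")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # roughly 60M for a T5-Small variant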

Training Data

The model was fine-tuned on custom-curated Pidgin–English parallel text, reflecting:

  • Ghanaian Pidgin usage
  • Nigerian Pidgin usage
  • Informal, conversational, and street-level expressions

The dataset includes variations in spelling, slang, and grammar common in real-world Pidgin communication.

Note: Due to the informal nature of Pidgin, translations may vary depending on regional context.
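
The curated dataset itself is not published with this card. Purely to illustrate the kind of parallel format such data typically takes, a couple of made-up pairs might look like this (these are not actual training examples):

# Illustrative Pidgin–English pairs; NOT taken from the real training data
examples = [
    {"pidgin": "wetin dey happen?", "english": "what is going on?"},
    {"pidgin": "i no sabi the place", "english": "i do not know the place"},
]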


Training Procedure

  • Framework: Hugging Face Transformers
  • Optimization: Standard T5 fine-tuning setup
  • Hardware: CPU-friendly configuration
  • Paradigm: Sequence-to-sequence learning
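
The exact hyperparameters are not documented here. Purely as a sketch of what a standard T5 fine-tuning setup with Seq2SeqTrainer could look like, with the batch size, epochs, learning rate, output path, and toy dataset all assumed for illustration:

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from datasets import Dataset

# Toy stand-in for the curated parallel corpus (illustrative pairs only)
pairs = [
    {"pidgin": "wetin dey happen?", "english": "what is going on?"},
    {"pidgin": "i no sabi the place", "english": "i do not know the place"},
]
dataset = Dataset.from_list(pairs)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(batch):
    # Tokenize Pidgin sources as inputs and English targets as labels
    model_inputs = tokenizer(batch["pidgin"], truncation=True, max_length=64)
    labels = tokenizer(text_target=batch["english"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="obalapalava-t5-small",  # assumed output path
    per_device_train_batch_size=8,      # assumed; not documented in this card
    num_train_epochs=3,                 # assumed
    learning_rate=3e-4,                 # common T5 fine-tuning value; assumed
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

At this scale the run is CPU-friendly, which matches the hardware note above; no GPU-specific settings are required.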

Evaluation

Evaluation was primarily qualitative, focusing on:

  • Semantic correctness
  • Preservation of meaning
  • Handling of slang and informal expressions

Formal benchmark scores are not provided due to the lack of standardized Pidgin evaluation datasets.
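
A simple way to reproduce this kind of qualitative check is to print model outputs next to human reference translations and inspect them manually; the held-out pairs below are hypothetical:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Willie999/obalapalava-t5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical held-out pairs for manual inspection (not an official test set)
spot_checks = [
    ("how you dey?", "how are you?"),
    ("make una no vex", "please do not be upset"),
]

for pidgin, reference in spot_checks:
    inputs = tokenizer(pidgin, return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    prediction = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"pidgin:     {pidgin}")
    print(f"reference:  {reference}")
    print(f"prediction: {prediction}\n")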


Example Inference

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Willie999/obalapalava-t5"

# Load the fine-tuned checkpoint and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Pidgin input to translate into clear English
text = "the boy no dey fear at all"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
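
With default (greedy) decoding, the input above should come out as something along the lines of "the boy is not afraid at all", although the exact wording depends on the checkpoint and the generation settings used.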