Bangla Cyberbullying Detection Model
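This model is FacebookAI/xlm-roberta-base fine-tuned for multi-label classification to detect cyberbullying in Bangla text. Because the task is multi-label, a single text can receive any combination of the five labels, including none.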
Model Details
- Base Model: FacebookAI/xlm-roberta-base
- Task: Multi-label text classification
- Labels: bully, sexual, religious, threat, spam
- Number of Labels: 5
- Classifier Hidden Size: 256
- Dropout: 0.1
Usage
Installation
pip install torch transformers
Loading and Inference
# TransformerMultiLabelClassifier is a custom class, so model.py must be importable
from model import TransformerMultiLabelClassifier
from transformers import AutoTokenizer
import torch
# Load the model
model = TransformerMultiLabelClassifier.from_pretrained("path/to/saved/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/model")
# Prepare input
text = "আপনার বাংলা টেক্সট এখানে"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
# Get predictions (gradients are not needed for inference)
with torch.no_grad():
    outputs = model.predict(inputs['input_ids'], inputs['attention_mask'])
# Index 0 selects the first (and here only) text in the batch
probabilities = outputs['probabilities'][0]
predictions = outputs['predictions'][0]
# Label order must match the order used during training
labels = ['bully', 'sexual', 'religious', 'threat', 'spam']
for label, prob, pred in zip(labels, probabilities, predictions):
    status = "✓ Detected" if pred else "✗ Not detected"
    print(f"{label}: {prob:.4f} ({status})")
Batch Inference
# Tokenize several texts at once; padding=True pads each one to the longest text in the batch
texts = ["টেক্সট ১", "টেক্সট ২", "টেক্সট ৩"]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=128)
outputs = model.predict(inputs['input_ids'], inputs['attention_mask'])
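The batch call above returns one row of predictions per input text. As a minimal sketch of how those rows might be turned into readable results, the helper below pairs each text with the labels flagged for it. The name `summarize_batch` and the stand-in prediction values are hypothetical, and the sketch assumes the predictions have first been converted to nested Python lists (for example with `.tolist()`, if `outputs['predictions']` is a tensor).

```python
# Label order taken from the model card.
LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def summarize_batch(texts, predictions, labels=LABELS):
    """Pair each text with the names of the labels flagged for it.

    `predictions` is a list of rows, one per text, each holding one
    0/1 (or False/True) flag per label in the same order as `labels`.
    """
    results = []
    for text, row in zip(texts, predictions):
        detected = [label for label, flag in zip(labels, row) if flag]
        results.append((text, detected))
    return results


# Stand-in values for illustration only; these are not real model outputs.
texts = ["টেক্সট ১", "টেক্সট ২", "টেক্সট ৩"]
fake_predictions = [[1, 0, 0, 1, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]]

for text, detected in summarize_batch(texts, fake_predictions):
    print(text, "->", detected or "no labels detected")
```

An empty list for a text means none of the five labels crossed the decision threshold.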
Labels
| Label | Description |
|---|---|
| bully | General bullying content |
| sexual | Sexual harassment or inappropriate content |
| religious | Religious hate or discrimination |
| threat | Threatening content |
| spam | Spam or irrelevant content |
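Since the labels are not mutually exclusive, a training target is typically a multi-hot vector with one position per label rather than a single class index. The sketch below illustrates that encoding using the label order from the table; the helper name `encode_labels` is hypothetical and is not part of this repository.

```python
# Label order taken from the table above.
LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def encode_labels(active):
    """Return a multi-hot vector with 1 at the position of each active label."""
    unknown = set(active) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1 if label in active else 0 for label in LABELS]


print(encode_labels({"bully", "threat"}))  # [1, 0, 0, 1, 0]
print(encode_labels(set()))                # [0, 0, 0, 0, 0]
```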
Training
This model was trained using:
- K-fold cross-validation with multi-label stratification
- AdamW optimizer with linear warmup
- Mixed precision training (AMP)
- Early stopping based on weighted F1 score
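The training script itself is not included here, so the following is only a rough, dependency-free sketch of three of the ideas listed above: a linear warmup learning-rate multiplier, a support-weighted F1 score over multi-hot labels, and early stopping on that score. All function and class names are hypothetical, the linear decay after warmup and the patience value are assumptions, and the real training code may differ.

```python
def linear_warmup_factor(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp from 0 to 1 during warmup,
    then (assumed here) linear decay from 1 back to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def weighted_f1(y_true, y_pred):
    """Per-label F1, averaged with weights equal to each label's support
    (the number of true positives for that label in y_true)."""
    num_labels = len(y_true[0])
    weighted_sum, total_support = 0.0, 0
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn
        weighted_sum += f1 * support
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


class EarlyStopping:
    """Signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=3):  # patience value is an assumption
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record a validation score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `weighted_f1` would be computed on the validation fold after each epoch and passed to `EarlyStopping.step`, while `linear_warmup_factor` would scale the optimizer's base learning rate at every step.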
Citation
If you use this model, please cite:
@misc{bangla-cyberbullying-detection,
author = {Your Name},
title = {Bangla Cyberbullying Detection Model},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/your-username/your-model}
}
Limitations
- Trained specifically on Bangla text
- Performance may vary on out-of-domain text
- A decision threshold of 0.5 is applied to each label's probability by default; it can be adjusted
- May not generalize well to code-mixed text (Bangla + English)
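Regarding the adjustable threshold mentioned above, one possible approach is to ignore the model's built-in predictions and apply custom cutoffs to the returned probabilities. The sketch below assumes the probabilities for one text are available as a plain list of floats in the documented label order; the helper `apply_thresholds`, the use of `>=` as the comparison, and the example numbers are all illustrative assumptions rather than the model's actual behavior.

```python
LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def apply_thresholds(probabilities, thresholds=0.5):
    """Flag each label whose probability reaches its cutoff.

    `thresholds` is either one float used for every label or a dict
    mapping label names to cutoffs (labels missing from it use 0.5).
    """
    flags = {}
    for label, prob in zip(LABELS, probabilities):
        if isinstance(thresholds, dict):
            cutoff = thresholds.get(label, 0.5)
        else:
            cutoff = thresholds
        flags[label] = prob >= cutoff
    return flags


# Made-up probabilities for illustration; not real model output.
probs = [0.62, 0.08, 0.41, 0.55, 0.03]

print(apply_thresholds(probs))
print(apply_thresholds(probs, {"threat": 0.7, "religious": 0.4}))
```

Raising a label's cutoff trades recall for precision on that label, so per-label values are usually chosen on a validation set.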