Bangla Cyberbullying Detection Model

This model fine-tunes FacebookAI/xlm-roberta-base for multi-label classification of cyberbullying in Bangla text. Each input can be assigned any combination of the five labels listed below.

Model Details

Base Model: FacebookAI/xlm-roberta-base
Task: Multi-label text classification
Labels: bully, sexual, religious, threat, spam
Number of Labels: 5
Classifier Hidden Size: 256
Dropout: 0.1
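
In a multi-label setup, each label is usually scored independently, so one text can trigger several labels at once. The sketch below illustrates this with a per-label sigmoid. It assumes the classifier head emits one raw logit per label, which this card does not state explicitly, and the logit values are made up for illustration.

```python
import math

LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_probabilities(logits):
    """Score every label independently (assumed behaviour, not confirmed by the card)."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


# Hypothetical logits for a single text
example_logits = [2.0, -3.0, 0.0, 1.5, -1.0]
probs = logits_to_probabilities(example_logits)

for label, p in probs.items():
    print(f"{label}: {p:.4f}")
```

Unlike a softmax, the probabilities here do not need to sum to 1, which is what allows several labels to be active for the same text.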

Usage

Installation

pip install torch transformers

Loading and Inference

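# TransformerMultiLabelClassifier is a custom class, so model.py must be importable
# (for example, placed in the working directory).
from model import TransformerMultiLabelClassifier
from transformers import AutoTokenizer
import torch

# Load the model and tokenizer, then switch to evaluation mode (disables dropout)
model = TransformerMultiLabelClassifier.from_pretrained("path/to/saved/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/model")
model.eval()

# Prepare input
text = "আপনার বাংলা টেক্সট এখানে"  # "Your Bangla text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

# Get predictions without tracking gradients
with torch.no_grad():
    outputs = model.predict(inputs['input_ids'], inputs['attention_mask'])

# Take the first (and only) item in the batch
probabilities = outputs['probabilities'][0]
predictions = outputs['predictions'][0]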

labels = ['bully', 'sexual', 'religious', 'threat', 'spam']
for label, prob, pred in zip(labels, probabilities, predictions):
    status = "✓ Detected" if pred else "✗ Not detected"
    print(f"{label}: {prob:.4f} ({status})")

Batch Inference

# Reuses the `model` and `tokenizer` loaded above.
# padding=True pads every text to the longest one in the batch.
texts = ["টেক্সট ১", "টেক্সট ২", "টেক্সট ৩"]  # "Text 1", "Text 2", "Text 3"
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=128)
with torch.no_grad():
    outputs = model.predict(inputs['input_ids'], inputs['attention_mask'])
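
The batch call returns one row of values per input text. A small helper like the one below can turn those rows into readable label names. This is only a sketch: `decode_batch` is a hypothetical helper, and the example rows stand in for `outputs['predictions']` converted to plain Python lists (for instance with `.tolist()`), assuming each row follows the label order used above.

```python
LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def decode_batch(prediction_rows, labels=LABELS):
    """Return, for each text, the names of the labels flagged as detected."""
    decoded = []
    for row in prediction_rows:
        if len(row) != len(labels):
            raise ValueError("each row needs one entry per label")
        decoded.append([label for label, flag in zip(labels, row) if flag])
    return decoded


# Made-up 0/1 predictions for three texts
example_rows = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]

detected = decode_batch(example_rows)
for index, names in enumerate(detected):
    print(f"text {index}: {', '.join(names) if names else 'no labels detected'}")
```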

Labels

Label	Description
bully	General bullying content
sexual	Sexual harassment or inappropriate content
religious	Religious hate or discrimination
threat	Threatening content
spam	Spam or irrelevant content

Training

This model was trained using:

K-fold cross-validation with multi-label stratification
AdamW optimizer with linear warmup
Mixed precision training (AMP)
Early stopping based on weighted F1 score
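
The early-stopping criterion above can be sketched as follows. The card does not say how the weighted F1 is computed, so this example assumes the common support-weighted average of per-label F1 scores (the same idea as scikit-learn's `average="weighted"`). The `EarlyStopping` class, its `patience` value, and the toy data are hypothetical.

```python
def label_f1_and_support(y_true, y_pred, index):
    """F1 score and number of true instances for one label column."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[index], pred_row[index]
        if t and p:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return f1, tp + fn


def weighted_f1(y_true, y_pred):
    """Average the per-label F1 scores, weighting each by its support."""
    weighted_sum, total_support = 0.0, 0
    for index in range(len(y_true[0])):
        f1, support = label_f1_and_support(y_true, y_pred, index)
        weighted_sum += f1 * support
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


class EarlyStopping:
    """Signal a stop after `patience` epochs without a new best score."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy data with three labels, for illustration only
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 1], [1, 0, 1]]
score = weighted_f1(y_true, y_pred)
print(f"weighted F1: {score:.4f}")

# Made-up validation scores, one per epoch
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, val_f1 in enumerate([0.70, 0.75, 0.74, 0.73, 0.76]):
    if stopper.step(val_f1):
        stopped_at = epoch
        break
print(f"stopped at epoch index: {stopped_at}")
```

Weighting by support keeps frequent labels from being drowned out by rare ones, which matters when label frequencies are imbalanced.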

Citation

If you use this model, please cite:

@misc{bangla-cyberbullying-detection,
  author = {Your Name},
  title = {Bangla Cyberbullying Detection Model},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/your-model}
}

Limitations

Trained specifically on Bangla text
Performance may vary on out-of-domain text
Multi-label threshold of 0.5 used by default (can be adjusted)
May not generalize well to code-mixed text (Bangla + English)
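
Since the 0.5 threshold can be adjusted, one option is to apply custom thresholds to the returned probabilities instead of relying on the default predictions. The sketch below shows the idea with per-label overrides. `apply_thresholds` is a hypothetical helper, the probability values are made up, and the use of `>=` at the boundary is an assumption about how the default threshold is applied.

```python
LABELS = ["bully", "sexual", "religious", "threat", "spam"]
DEFAULT_THRESHOLD = 0.5


def apply_thresholds(probabilities, thresholds=None, default=DEFAULT_THRESHOLD):
    """Flag each label whose probability reaches its threshold.

    `thresholds` maps label names to overrides; other labels use `default`.
    """
    thresholds = thresholds or {}
    return {
        label: prob >= thresholds.get(label, default)
        for label, prob in zip(LABELS, probabilities)
    }


# Made-up probabilities for one text
example_probs = [0.62, 0.10, 0.45, 0.35, 0.55]

default_flags = apply_thresholds(example_probs)
# Lower the bar for threats (favour recall), raise it for spam (favour precision)
custom_flags = apply_thresholds(example_probs, {"threat": 0.3, "spam": 0.7})

print(default_flags)
print(custom_flags)
```

Thresholds like these are best chosen on a held-out validation set rather than by hand.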
