Bangla Cyberbullying Detection Model

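This model is a fine-tuned XLM-RoBERTa classifier that detects cyberbullying in Bangla text. The task is multi-label, so a single text can be assigned any combination of the five labels listed under Model Details.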

Model Details

  • Base Model: FacebookAI/xlm-roberta-base
  • Task: Multi-label text classification
  • Labels: bully, sexual, religious, threat, spam
  • Number of Labels: 5
  • Classifier Hidden Size: 256
  • Dropout: 0.1
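To make the multi-label setup concrete, here is a minimal sketch of how five raw scores can be turned into independent per-label probabilities. It assumes the classifier ends in a sigmoid per label, which is the usual choice for multi-label heads but is not confirmed by this card. The logits and the `score_labels` helper are hypothetical.

```python
import math

LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def sigmoid(x: float) -> float:
    """Squash a raw score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def score_labels(logits):
    """Map one raw score per label to an independent probability.

    Assumption: the model scores each label separately (sigmoid),
    so the probabilities do not have to sum to 1.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


# Hypothetical logits, for illustration only
probs = score_labels([2.0, -1.0, -0.5, 3.0, -4.0])
active = [label for label, p in probs.items() if p >= 0.5]
print(active)
```

Because each label is scored on its own, a text can trigger several labels at once or none at all, unlike a softmax classifier that picks exactly one class.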

Usage

Installation

pip install torch transformers

Loading and Inference

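# TransformerMultiLabelClassifier is a custom class imported from the local model.py,
# so that file must be importable from your working directory.
from model import TransformerMultiLabelClassifier
from transformers import AutoTokenizer
import torch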

# Load the model
model = TransformerMultiLabelClassifier.from_pretrained("path/to/saved/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/model")

# Prepare input
text = "আপনার বাংলা টেক্সট এখানে"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

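# Get predictions (no_grad skips gradient tracking, which inference does not need)
with torch.no_grad():
    outputs = model.predict(inputs['input_ids'], inputs['attention_mask'])

# Index 0 selects the first (and only) text in the batch
probabilities = outputs['probabilities'][0]
predictions = outputs['predictions'][0]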

labels = ['bully', 'sexual', 'religious', 'threat', 'spam']
for label, prob, pred in zip(labels, probabilities, predictions):
    status = "✓ Detected" if pred else "✗ Not detected"
    print(f"{label}: {prob:.4f} ({status})")

Batch Inference

# Tokenize several texts together; padding=True pads each one to the longest text in the batch
texts = ["টেক্সট ১", "টেক্সট ২", "টেক্সট ৩"]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=128)
outputs = model.predict(inputs['input_ids'], inputs['attention_mask'])
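The batch call above returns one row of predictions per input text. The sketch below shows one way to turn those rows into label names. It assumes `outputs['predictions']` holds a 0/1 flag per label in the order listed under Model Details; `decode_batch` is a hypothetical helper, and plain lists stand in for the model output here.

```python
LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def decode_batch(texts, predictions, labels=LABELS):
    """Pair each input text with the names of the labels flagged for it."""
    results = []
    for text, row in zip(texts, predictions):
        flagged = [label for label, pred in zip(labels, row) if bool(pred)]
        results.append((text, flagged))
    return results


# Stand-in for outputs['predictions'] (hypothetical values)
example_predictions = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
for text, flagged in decode_batch(["text 1", "text 2", "text 3"], example_predictions):
    print(text, "->", flagged or ["none"])
```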

Labels

Label       Description
bully       General bullying content
sexual      Sexual harassment or inappropriate content
religious   Religious hate or discrimination
threat      Threatening content
spam        Spam or irrelevant content

Training

This model was trained using:

  • K-fold cross-validation with multi-label stratification
  • AdamW optimizer with linear warmup
  • Mixed precision training (AMP)
  • Early stopping based on weighted F1 score
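As an illustration of the last point, the following sketch computes a support-weighted F1 score for multi-label predictions and uses it to drive a simple early-stopping check. It is not the training code behind this model: the `patience` parameter, the class name, and the example scores are assumptions made for the example.

```python
def weighted_f1(y_true, y_pred):
    """Per-label F1, averaged with weights equal to each label's positive count."""
    n_labels = len(y_true[0])
    weighted_sum = 0.0
    total_support = 0
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true positives for this label
        weighted_sum += f1 * support
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


class EarlyStopping:
    """Stop when the validation score has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience  # hypothetical default
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation scores, one per epoch
stopper = EarlyStopping(patience=2)
for epoch, score in enumerate([0.50, 0.60, 0.55, 0.58], start=1):
    if stopper.step(score):
        print(f"stopping after epoch {epoch}, best weighted F1 = {stopper.best:.2f}")
        break
```

Weighting by support means frequent labels influence the score more than rare ones, which is worth keeping in mind if some of the five labels are much rarer than others.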

Citation

If you use this model, please cite:

@misc{bangla-cyberbullying-detection,
  author = {Your Name},
  title = {Bangla Cyberbullying Detection Model},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/your-model}
}

Limitations

  • Trained specifically on Bangla text
  • Performance may vary on out-of-domain text
  • A decision threshold of 0.5 is applied to each label's probability by default; it can be adjusted
  • May not generalize well to code-mixed text (Bangla + English)
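Since the 0.5 threshold can be adjusted, one option is to apply your own cutoffs to the returned probabilities instead of relying on the default predictions. The sketch below is a hypothetical helper, not part of the model's API. It assumes the probabilities arrive in the label order listed under Model Details, and it treats a probability equal to the threshold as a detection, which may differ from what the model does internally.

```python
LABELS = ["bully", "sexual", "religious", "threat", "spam"]


def apply_thresholds(probabilities, thresholds=None, default=0.5):
    """Return {label: bool} using a per-label threshold where one is given."""
    thresholds = thresholds or {}
    return {
        label: float(prob) >= thresholds.get(label, default)
        for label, prob in zip(LABELS, probabilities)
    }


# Hypothetical probabilities for one text
probs = [0.62, 0.08, 0.41, 0.55, 0.30]

default_flags = apply_thresholds(probs)
# Stricter on "threat", more lenient on "religious" (illustrative values)
custom_flags = apply_thresholds(probs, {"threat": 0.7, "religious": 0.4})

print(default_flags)
print(custom_flags)
```

Raising a threshold trades recall for precision on that label, so any custom values are best chosen on a held-out validation set.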