TrOCR Fine-Tuned for Nepali Language
Model Description
This model is a fine-tuned version of Microsoft's TrOCR model for optical character recognition (OCR) tasks, specifically trained to recognize and generate Nepali text from handwritten or printed image inputs. It leverages a VisionEncoderDecoder architecture with a DeiT-based encoder and a BERT-based decoder.
Model Architecture
- Encoder: Vision Transformer (DeiT)
- Decoder: BERT-like architecture adapted for OCR tasks
- Pretrained Base: microsoft/trocr-base-handwritten
- Tokenizer: Nepali BERT tokenizer from Shushant/nepaliBERT
Training Details
- Dataset: Fine-tuned using a Nepali dataset consisting of handwritten and printed text.
- Objective: Generate accurate Nepali text outputs from images containing textual content.
- Decoding: Text is generated with beam search and a length penalty to improve output quality.
- Beam Search Parameters:
  - num_beams = 8
  - length_penalty = 2.0
  - max_length = 47
  - no_repeat_ngram_size = 3
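To make the effect of these settings concrete, the sketch below re-implements two of them in plain Python: the length-normalized beam score (cumulative log-probability divided by the sequence length raised to `length_penalty`, as described in the Transformers generation documentation) and the n-gram blocking rule behind `no_repeat_ngram_size`. The function names and toy numbers are illustrative assumptions, not code from this model's pipeline.

```python
def beam_score(sum_logprobs, length, length_penalty=2.0):
    """Length-normalized score: cumulative log-probability / length ** penalty."""
    return sum_logprobs / (length ** length_penalty)


def banned_next_tokens(tokens, ngram_size=3):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (n - 1) generated tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for start in range(len(tokens) - ngram_size + 1):
        if tuple(tokens[start:start + ngram_size - 1]) == prefix:
            banned.add(tokens[start + ngram_size - 1])
    return banned


# Toy comparison of a short and a long hypothesis (log-probabilities are made up).
short = beam_score(sum_logprobs=-4.0, length=4)  # -4.0 / 4 ** 2.0
long_ = beam_score(sum_logprobs=-6.0, length=8)  # -6.0 / 8 ** 2.0
print(long_ > short)  # with length_penalty=2.0 the longer hypothesis ranks higher

# After generating 1 2 3 1 2, emitting 3 again would repeat the trigram (1, 2, 3).
print(banned_next_tokens([1, 2, 3, 1, 2], ngram_size=3))
```

Because log-probabilities are negative, a positive `length_penalty` such as 2.0 favors longer hypotheses, while `max_length = 47` caps how long they can grow.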
Usage
Inference Example
To use this model for OCR tasks, you can follow the steps below:
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")

# Load an image
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess image and generate predictions
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, num_beams=8, max_length=47, early_stopping=True)
decoded_text = processor.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Recognized Text:", decoded_text)
```
Hugging Face Hub
You can access the model and its processor on the Hugging Face Hub:
- Model: rockerritesh/trOCR_ne
- Processor: rockerritesh/trOCR_ne
Features
- OCR for Nepali: Trained to accurately recognize Nepali text in handwritten and printed formats.
- Robust Tokenizer: Utilizes the Nepali BERT tokenizer for efficient tokenization.
- Efficient Inference: Supports beam search and length penalties to optimize generation quality.
Fine-Tuning Details
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Batch Size | 16 |
| Learning Rate | 5e-5 |
| Epochs | 5 |
| Optimizer | AdamW |
| Beam Search Beams | 8 |
| Max Length | 47 |
| Length Penalty | 2.0 |
| No Repeat N-Gram Size | 3 |
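The table does not say which training API was used. As one possibility, the values could be grouped as below, with training settings keyed by `Seq2SeqTrainingArguments` parameter names and decoding settings keyed by `generate()` argument names from Transformers. The use of `Seq2SeqTrainer`, the `predict_with_generate` flag, and the dataset size used for the step count are all assumptions; the card does not report them.

```python
import math

# Training settings from the table, keyed by Seq2SeqTrainingArguments parameter names.
# That the original run used Seq2SeqTrainer is an assumption.
training_kwargs = {
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
    "num_train_epochs": 5,
    "optim": "adamw_torch",         # AdamW
    "predict_with_generate": True,  # assumption: evaluate by calling generate()
}

# Decoding settings from the table, keyed by generate() argument names.
generation_kwargs = {
    "num_beams": 8,
    "max_length": 47,
    "length_penalty": 2.0,
    "no_repeat_ngram_size": 3,
}


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer steps on a single device without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


# Hypothetical dataset size, used only to illustrate the arithmetic.
HYPOTHETICAL_NUM_EXAMPLES = 10_000
steps = total_optimizer_steps(
    HYPOTHETICAL_NUM_EXAMPLES,
    training_kwargs["per_device_train_batch_size"],
    training_kwargs["num_train_epochs"],
)
print(steps)
```

With Transformers installed, these dictionaries could be unpacked into `Seq2SeqTrainingArguments` (together with an output directory of your choice) and into `model.generate(pixel_values, **generation_kwargs)`.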
Model Configuration
The model was configured as follows:
Decoder
- Activation Function: ReLU
- Attention Heads: 8
- Layers: 6
- Hidden Size: 256
- FFN Size: 1024
Encoder
- Hidden Size: 384
- Layers: 12
- Attention Heads: 6
- Image Size: 384
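A few derived quantities follow directly from these numbers. The sketch below computes the per-head dimension for both stacks and the number of image patches the encoder would see. The 16-pixel patch size is an assumption (a common ViT/DeiT default) and is not stated in this card; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackConfig:
    hidden_size: int
    num_layers: int
    num_heads: int

    @property
    def head_dim(self):
        """Width of each attention head; hidden size must divide evenly."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


# Values taken from the configuration listed above.
encoder = StackConfig(hidden_size=384, num_layers=12, num_heads=6)
decoder = StackConfig(hidden_size=256, num_layers=6, num_heads=8)

IMAGE_SIZE = 384
ASSUMED_PATCH_SIZE = 16  # assumption, not stated in the card


def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    side = image_size // patch_size
    return side * side


print(encoder.head_dim, decoder.head_dim, num_patches(IMAGE_SIZE, ASSUMED_PATCH_SIZE))
```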
Dataset Details
The dataset used for fine-tuning consists of diverse handwritten and printed Nepali text from publicly available and custom datasets.
Limitations and Bias
- The model's performance depends on the quality and diversity of the fine-tuning dataset.
- It may not generalize well to unseen handwriting styles or printed text with unconventional fonts.
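One way to check how the model behaves on your own handwriting or fonts is to compare its output against a small set of hand-labelled lines using character error rate (CER). The sketch below is a plain-Python Levenshtein implementation with hypothetical function names. It counts Unicode code points, so a Devanagari consonant and its vowel sign count as two characters, which is a simplifying assumption. The card itself does not report an evaluation metric.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the reference length (in code points)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Example with placeholder strings; substitute a ground-truth line and the model output.
print(character_error_rate("abcde", "abde"))
```

A CER that is much higher on your samples than on text resembling the training data would suggest the generalization limits described above.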
Citation
If you use this model in your research or applications, please cite:
@article{rockerritesh-trocr-nepali,
title={Fine-Tuned TrOCR Model for Nepali Language},
author={Sumit Yadav},
year={2024},
url={https://huggingface.co/rockerritesh/trOCR_ne}
}
License
This model is released under the Apache 2.0 license (apache-2.0).