---
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-32B-Instruct
tags:
- qwen
- vision-language
- tool-use
- lora
- fine-tuned
- multimodal
- visual-reasoning
language:
- en
pipeline_tag: text-generation
---

# Qwen2.5-VL-32B Tool Assistant with LoRA fine-tuning

This is a LoRA adapter for the Qwen2.5-VL-32B model, fine-tuned for tool-use with visual input.

## Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from peft import PeftModel
import torch
from PIL import Image

# Load the model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct", 
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(
    base_model, 
    "srai86825/qwen-vl-tool-assistant-lora"
)

# Use the model
image = Image.open("your_image.jpg")
text = "What is in this image?"

inputs = processor(text=text, images=image, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)
```

## Training Details
- Base model: Qwen/Qwen2.5-VL-32B-Instruct
- Fine-tuning method: LoRA with rank 8
- Target modules: all
- Training data: Custom tool-use dataset

## Model Architecture

This model uses the Low-Rank Adaptation (LoRA) technique to efficiently fine-tune the Qwen2.5-VL-32B-Instruct model. LoRA works by adding small, trainable rank decomposition matrices to existing weights, allowing for parameter-efficient fine-tuning.

The adapter is applied to all attention layers in the model, which allows it to learn new capabilities without modifying the entire model.

## Limitations

- This model inherits the limitations of the base Qwen2.5-VL model
- The fine-tuning data may introduce biases or limitations in certain domains
- For optimal performance, use images similar in style and content to what the model was trained on