--- license: apache-2.0 base_model: Qwen/Qwen2.5-VL-32B-Instruct tags: - qwen - vision-language - tool-use - lora - fine-tuned - multimodal - visual-reasoning language: - en pipeline_tag: text-generation --- # Qwen2.5-VL-32B Tool Assistant with LoRA fine-tuning This is a LoRA adapter for the Qwen2.5-VL-32B model, fine-tuned for tool-use with visual input. ## Usage ```python from transformers import AutoProcessor, AutoModelForCausalLM from peft import PeftModel import torch from PIL import Image # Load the model processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct") base_model = AutoModelForCausalLM.from_pretrained( "Qwen/Qwen2.5-VL-32B-Instruct", torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True ) model = PeftModel.from_pretrained( base_model, "srai86825/qwen-vl-tool-assistant-lora" ) # Use the model image = Image.open("your_image.jpg") text = "What is in this image?" inputs = processor(text=text, images=image, return_tensors="pt").to("cuda") outputs = model.generate(**inputs, max_new_tokens=100) result = processor.decode(outputs[0], skip_special_tokens=True) print(result) ``` ## Training Details - Base model: Qwen/Qwen2.5-VL-32B-Instruct - Fine-tuning method: LoRA with rank 8 - Target modules: all - Training data: Custom tool-use dataset ## Model Architecture This model uses the Low-Rank Adaptation (LoRA) technique to efficiently fine-tune the Qwen2.5-VL-32B-Instruct model. LoRA works by adding small, trainable rank decomposition matrices to existing weights, allowing for parameter-efficient fine-tuning. The adapter is applied to all attention layers in the model, which allows it to learn new capabilities without modifying the entire model. ## Limitations - This model inherits the limitations of the base Qwen2.5-VL model - The fine-tuning data may introduce biases or limitations in certain domains - For optimal performance, use images similar in style and content to what the model was trained on