# NekoQwen-9B

Qwen3.5-9B fine-tuned on the NekoQA-30K dataset.
## Model Details

- Architecture: Qwen3_5ForConditionalGeneration
- Processor: Qwen3VLProcessor
- Precision: float16
- Format: sharded safetensors
- Parameter count: about 9.41B
- Repository size: about 18 GB
- Modalities: text, image, and video inputs with text generation output
- Max position embeddings: 262144
- Transformers version in config: 5.3.0
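As a quick sanity check, the listed parameter count and precision are consistent with the repository size: float16 stores each parameter in 2 bytes, so ~9.41B parameters should occupy roughly 18–19 GB. A minimal sketch of that arithmetic:

```python
# Rough checkpoint-size estimate from the figures above.
params = 9.41e9        # parameter count from the model details
bytes_per_param = 2    # float16 = 2 bytes per parameter
size_gb = params * bytes_per_param / 1e9
print(f"{size_gb:.1f} GB")  # ~18.8 GB, consistent with the ~18 GB repo size
```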
## Fine-Tuning Summary

- Base model: Qwen/Qwen3.5-9B
- Tuning method: LoRA, merged into the full weights
- Epochs: 1.0
- Learning rate: 1e-4
- Per-device batch size: 1
- Gradient accumulation steps: 16
- Sequence length: 768
- Training precision: fp16
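Note that the per-device batch size of 1 combined with 16 gradient-accumulation steps implies an effective batch size of 16 sequences per optimizer step (per device). A one-line sketch of that relationship:

```python
# Effective batch size implied by the hyperparameters above:
# gradients from 16 micro-batches of size 1 are accumulated
# before each optimizer step.
per_device_batch_size = 1
gradient_accumulation_steps = 16
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16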
## Usage

```python
import torch
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

model_id = "your-username/your-repo"

# Load the processor (tokenizer + vision preprocessing) and the model.
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Text-only chat message in the multimodal content-list format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the main characteristics of this model in one paragraph."},
        ],
    }
]

# Render the chat template to a prompt string, then tokenize.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = processor(text=[text], padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
For image or video inputs, use the same chat-template message structure with Qwen3VLProcessor.
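For example, an image turn adds an image entry ahead of the text entry in the same content list. This is a sketch assuming the Qwen-VL-style message schema; the file path is a placeholder, and the exact image-loading step (e.g. how the loaded image is passed to the processor call) may differ for this processor version, so check the Qwen3VLProcessor documentation.

```python
# Hypothetical multimodal message: same chat-template structure as the
# text-only example, with an image entry added. The path is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/local_image.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
```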
## Notes
This folder contains the merged checkpoint, tokenizer, processor configuration, and chat template needed to load the model with Transformers.
Training data provenance, evaluation results, and intended-use notes are not documented in this folder yet. Add those details before making the repository public if you want a complete public model card.