Instructions for using Guilherme34/Firefly-V3 with libraries, inference servers, and local apps.

Libraries

How to use Guilherme34/Firefly-V3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Guilherme34/Firefly-V3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
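# The pipeline applies the chat template; the result is a list whose
# "generated_text" entry holds the conversation with the model's reply appended.
print(pipe(messages))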

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Guilherme34/Firefly-V3")
model = AutoModelForCausalLM.from_pretrained("Guilherme34/Firefly-V3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use Guilherme34/Firefly-V3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Guilherme34/Firefly-V3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Guilherme34/Firefly-V3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
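
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the `vllm serve` step above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: a server started with `vllm serve` is reachable at this address.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Guilherme34/Firefly-V3"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL_ID):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_text, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's reply. Passing `base_url="http://localhost:30000/v1"` would target a server on port 30000 instead.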

Use Docker

docker model run hf.co/Guilherme34/Firefly-V3

SGLang

How to use Guilherme34/Firefly-V3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Guilherme34/Firefly-V3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Guilherme34/Firefly-V3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Guilherme34/Firefly-V3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Guilherme34/Firefly-V3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use Guilherme34/Firefly-V3 with Docker Model Runner:

```
docker model run hf.co/Guilherme34/Firefly-V3
```

Firefly-V3

A heavily refined roleplay-first language model, built from a merge foundation and then pushed further through additional training, fine-tuning, and style refinement.

Personally, my strongest roleplay model to date. GGUF quantizations are available at https://huggingface.co/mradermacher/Firefly-V3-GGUF

Tags: Roleplay, Storytelling, Character Acting, Long Chats, Creative Writing, Fine-Tuned, Merge Foundation

About Firefly-V3

Firefly-V3 is a heavily refined roleplay model created from a merge foundation and then improved through additional training, fine-tuning, and roleplay-focused behavioral shaping.

The model was built to deliver stronger character presence, more natural dialogue, better scene flow, and more enjoyable long-form roleplay conversations. The merge is only part of its origin. Firefly-V3's final behavior comes from the extra training and refinement applied after that foundation.

I personally consider Firefly-V3 the best roleplay model I have made so far.

Roleplay First

Designed around immersive character interaction, emotional pacing, scene continuity, and expressive dialogue.

Further Trained

Firefly-V3 was not released as a raw merge. It was further trained and refined to shape its final personality and roleplay behavior.

Character Presence

Built to make characters feel more distinct, expressive, and consistent across longer chats.

Creative Writing

Strong for prose, atmosphere, dialogue, character descriptions, and interactive fiction.

Best For

Roleplay chats

Character-driven conversations and interactive scenes.

Character cards

Bots with personality, style, and scene memory.

Creative writing

Narration, dialogue, atmosphere, and prose.

Long scenes

Extended interactions that need pacing and continuity.
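
For the character-card use case, one way to feed a card to the model is to fold it into a system message ahead of the chat history. This is a rough sketch, not a format the model card prescribes: the card fields (`name`, `persona`, `scenario`), the example character, and the helper `build_roleplay_messages` are all hypothetical, and it assumes the model's chat template accepts a `system` role.

```python
def build_roleplay_messages(card, history, user_text):
    """Turn a character card plus prior turns into a chat message list."""
    # Hypothetical prompt layout; adjust the wording to taste.
    system_prompt = (
        f"You are {card['name']}. {card['persona']}\n"
        f"Scenario: {card['scenario']}\n"
        "Stay in character and keep the scene consistent."
    )
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # earlier user/assistant turns, oldest first
    messages.append({"role": "user", "content": user_text})
    return messages


# Hypothetical character card used only for illustration.
card = {
    "name": "Mira",
    "persona": "A wry lighthouse keeper who speaks in short, vivid sentences.",
    "scenario": "A storm has stranded a traveler at the lighthouse door.",
}

messages = build_roleplay_messages(card, history=[], user_text="Is anyone there?")
```

The resulting `messages` list has the same shape as the one in the Transformers examples, so it could be passed to `pipe(messages)` or `tokenizer.apply_chat_template(...)` in place of the single-turn list. For long scenes, appending each reply to `history` keeps earlier turns in view.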

Training and Model Lineage

Firefly-V3 started from Guilherme34/Firefly-V2.5 as its base, merged with components including SicariusSicariiStuff/Impish_LLAMA_3B.

After the merge foundation, the model went through additional training and fine-tuning to improve roleplay behavior, character voice, response style, dialogue quality, and long-form scene handling.

The merge defines part of the model's ancestry. The final Firefly-V3 experience comes from the further training and refinement done after that stage.

Attribution

Model: Firefly-V3
Author: Guilherme34

Base: Guilherme34/Firefly-V2.5
Merge Component: SicariusSicariiStuff/Impish_LLAMA_3B

This model is part of the Firefly family and was further trained and refined for users who want a stronger, more immersive roleplay experience.

⚠️ Disclaimer

This model is uncensored and will generate content without built-in refusals. It is intended for creative fiction and roleplay between consenting adults. The creator is not responsible for how the model is used. Do not use it to produce content that is illegal in your jurisdiction.