Instructions for using Guilherme34/Firefly-V3 with libraries and local apps.
- Libraries
- Transformers
How to use Guilherme34/Firefly-V3 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Guilherme34/Firefly-V3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Guilherme34/Firefly-V3")
model = AutoModelForCausalLM.from_pretrained("Guilherme34/Firefly-V3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use Guilherme34/Firefly-V3 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Guilherme34/Firefly-V3"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Guilherme34/Firefly-V3",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/Guilherme34/Firefly-V3
```
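The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server is running locally on port 8000 as shown. The helper names `build_chat_payload` and `chat` and the `max_tokens` default are illustrative choices, not part of the model card.

```python
import json
import urllib.request

# Assumption: the server was started with `vllm serve` and listens locally.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_payload(prompt, model="Guilherme34/Firefly-V3", max_tokens=256):
    """Return the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative default, not from the model card
    }


def chat(prompt, url=VLLM_URL, timeout=60):
    """POST the payload to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return result["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("What is the capital of France?"))
```

Passing a different `url` (for example the port 30000 endpoint used by the SGLang commands on this page) should work the same way, since both servers are described here as OpenAI-compatible.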
- SGLang
How to use Guilherme34/Firefly-V3 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Guilherme34/Firefly-V3" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Guilherme34/Firefly-V3",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Guilherme34/Firefly-V3" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Guilherme34/Firefly-V3",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use Guilherme34/Firefly-V3 with Docker Model Runner:
```bash
docker model run hf.co/Guilherme34/Firefly-V3
```
About Firefly-V3
Firefly-V3 is a heavily refined roleplay model created from a merge foundation and then improved through additional training, fine-tuning, and roleplay-focused behavioral shaping.
The model was built to deliver stronger character presence, more natural dialogue, better scene flow, and more enjoyable long-form roleplay conversations. The merge is only part of its origin. Firefly-V3's final behavior comes from the extra training and refinement applied after that foundation.
I personally consider Firefly-V3 the best roleplay model I have made so far.
Roleplay First
Designed around immersive character interaction, emotional pacing, scene continuity, and expressive dialogue.
Further Trained
Firefly-V3 was not released as a raw merge. It was further trained and refined to shape its final personality and RP behavior.
Character Presence
Built to make characters feel more distinct, expressive, and consistent across longer chats.
Creative Writing
Strong for prose, atmosphere, dialogue, character descriptions, and interactive fiction.
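For longer roleplay chats, one common way to keep a character consistent is to resend a character description at the top of every request along with the most recent turns. The sketch below illustrates that idea. It assumes the model's chat template accepts a `system` role, which this card does not state. The function name `build_roleplay_messages`, the `max_turns` limit, and the character "Mira" are hypothetical.

```python
def build_roleplay_messages(character_card, history, user_message, max_turns=20):
    """Assemble a chat message list: system card, recent history, new user turn.

    history is a list of (user_text, assistant_text) pairs, oldest first.
    """
    # Assumption: the chat template supports a "system" message.
    messages = [{"role": "system", "content": character_card}]

    # Keep only the most recent turns so the prompt stays within the context window.
    recent = history[-max_turns:] if max_turns > 0 else []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})

    messages.append({"role": "user", "content": user_message})
    return messages


# Hypothetical character card and conversation, for illustration only.
card = "You are Mira, a dry-witted innkeeper. Stay in character."
history = [("Hello?", "Welcome in. Mind the step.")]
messages = build_roleplay_messages(card, history, "Do you have a room?")
```

The resulting `messages` list has the same shape as the one passed to `pipe(messages)` or `tokenizer.apply_chat_template(...)` in the Transformers examples.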
Best For
Training and Model Lineage
Firefly-V3 began from Guilherme34/Firefly-V2.5 and merge components including SicariusSicariiStuff/Impish_LLAMA_3B, but it did not stop there.
After the merge foundation, the model went through additional training and fine-tuning to improve roleplay behavior, character voice, response style, dialogue quality, and long-form scene handling.
The merge defines part of the model's ancestry. The final Firefly-V3 experience comes from the further training and refinement done after that stage.
Attribution
Model: Firefly-V3
Author: Guilherme34
Base: Guilherme34/Firefly-V2.5
Merge Component: SicariusSicariiStuff/Impish_LLAMA_3B
This model is part of the Firefly family and was further trained and refined for users who want a stronger, more immersive roleplay experience.
This model is uncensored and will generate content without built-in refusals. It is intended for creative fiction and roleplay between consenting adults. The creator is not responsible for how the model is used. Do not use it to produce content that is illegal in your jurisdiction.
Firefly-V3
A further-trained and fine-tuned Firefly model built for expressive characters, creative scenes, and long-form roleplay.
Model tree for Guilherme34/Firefly-V3
Base model
Guilherme34/Firefly-V2