# Gemma 3 4B T1-it GGUF Collection

GGUF quantized models converted from twinkle-ai/gemma-3-4B-T1-it for use with llama.cpp.

## About

Gemma 3 4B T1-it is a small language model fine-tuned on Taiwan-focused datasets, supporting both English and Traditional Chinese. This repository provides multiple quantization formats optimized for different use cases.

## Available Models

| Model | Size | Use Case |
|---|---|---|
| `twinkle-ai-gemma-3-4B-T1-it-BF16.gguf` | Largest | Best quality, highest precision |
| `twinkle-ai-gemma-3-4B-T1-it-F16.gguf` | Large | High quality, good precision |
| `twinkle-ai-gemma-3-4B-T1-it-Q8_0.gguf` | Medium | Balanced quality and speed |
| `twinkle-ai-gemma-3-4b-t1-it-q4_k_m.gguf` | Smallest | Fastest inference, lower memory |
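
If you prefer to fetch a file manually instead of streaming it at run time, the Hugging Face CLI works as well. A minimal sketch, assuming the files live under this repository (twinkle-ai/gemma-3-4B-T1-it-GGUF) with the names listed in the table above:

```bash
# Download one quantized file into the current directory
huggingface-cli download twinkle-ai/gemma-3-4B-T1-it-GGUF \
  twinkle-ai-gemma-3-4B-T1-it-Q8_0.gguf \
  --local-dir .
```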

## Quick Start

### Option 1: Using Hugging Face Hub (Recommended)

Install llama.cpp via Homebrew:

```bash
brew install llama.cpp
```

Run inference directly from Hugging Face:

```bash
llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -p "Your prompt here"
```

Start as a server:

```bash
llama-server --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -c 2048
```
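
Once the server is running, you can query it over HTTP; llama-server exposes an OpenAI-compatible API. A sketch, assuming the default listen address (http://127.0.0.1:8080):

```bash
# Send a chat request to the local llama-server instance
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Introduce Taiwan in Traditional Chinese."}
        ],
        "temperature": 0.7
      }'
```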

### Option 2: Build from Source

#### Step 1: Clone the llama.cpp repository

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```

#### Step 2: Build llama.cpp

Basic build (CPU only):

```bash
LLAMA_CURL=1 make
```

Hardware-specific build options:

NVIDIA GPU (Linux):

```bash
LLAMA_CUDA=1 LLAMA_CURL=1 make
```

Apple Silicon (Mac):

```bash
LLAMA_METAL=1 LLAMA_CURL=1 make
```

AMD GPU (ROCm):

```bash
LLAMA_HIPBLAS=1 LLAMA_CURL=1 make
```
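
Note that a GPU-enabled build only offloads layers if you request it at run time. A minimal sketch, assuming the standard `-ngl` (`--n-gpu-layers`) flag; `99` simply asks for all layers to be offloaded:

```bash
# Offload all model layers to the GPU (requires a CUDA/Metal/ROCm build)
./llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -ngl 99 \
  -p "Your prompt here"
```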

#### Step 3: Run inference

```bash
./llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -p "Your prompt here"
```

#### Step 4: Start server (optional)

```bash
./llama-server --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -c 2048
```

## Advanced Usage

### Choosing the Right Model

Select a model based on your needs:

- Best Quality: Use the `BF16` or `F16` versions (requires more memory)
- Balanced: Use the `Q8_0` version (recommended for most users)
- Resource Constrained: Use the `q4_k_m` version (suitable for devices with limited memory)
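
As a rough rule of thumb, you need at least the GGUF file's size in free RAM (or VRAM, when offloading) plus some headroom for the context. A quick sanity check, assuming a Linux machine and locally downloaded files:

```bash
# Compare model file size against available memory
ls -lh ./*.gguf   # size of the quantized model(s)
free -h           # free RAM (Linux; use Activity Monitor on macOS)
```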

### Common Parameters

- `-p "prompt"`: Your input text for the model to respond to
- `-c 2048`: Context length (maximum number of tokens that can be processed)
- `--hf-repo`: Hugging Face repository name
- `--hf-file`: Model file name to use

### Adjusting Generation Parameters

```bash
llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -p "Your prompt here" \
  --temp 0.7 \
  --top-p 0.9 \
  --repeat-penalty 1.1
```

Parameter explanations:

- `--temp`: Temperature (0.0-2.0); higher values produce more random output
- `--top-p`: Nucleus sampling parameter (0.0-1.0)
- `--repeat-penalty`: Repetition penalty to avoid repetitive content
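
For repeatable comparisons between quants or prompts, it can help to make decoding as deterministic as possible. A sketch, assuming `--temp 0` (which makes sampling effectively greedy):

```bash
# Near-deterministic generation for A/B testing prompts or quants
llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -p "Your prompt here" \
  --temp 0
```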
Model Information
- Base Model: twinkle-ai/gemma-3-4B-T1-it
- Languages: English, Traditional Chinese
- License: Gemma
- Format: GGUF (converted via GGUF-my-repo)

## Training Data

- Taiwan reasoning and instruction datasets
- Contract review and legal documents
- Multimodal and long-form content
- Instruction-following examples

## Benchmarks

- TMMLU+: 47.44% accuracy
- MMLU: 59.13% accuracy
- TW Legal Benchmark: 44.18% accuracy

## Troubleshooting

### Common Issues

Q: Getting out of memory errors?
A: Try using a smaller quantized version like `q4_k_m`, or reduce the context length via the `-c` parameter.
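
For example, the following combines both suggestions, assuming you have already downloaded the `q4_k_m` file from the table above into the current directory:

```bash
# Smaller quant + shorter context = lower memory footprint
./llama-cli -m ./twinkle-ai-gemma-3-4b-t1-it-q4_k_m.gguf \
  -c 512 \
  -p "Your prompt here"
```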
Q: How can I speed up inference?
A: (a combined example follows this list)

- Use GPU acceleration (add hardware-specific flags during compilation)
- Choose a smaller quantized model (like `q4_k_m`)
- Reduce context length
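
Putting those together, a hedged sketch (`-ngl` requires a GPU-enabled build, and `-t` sets the CPU thread count; both are standard llama.cpp options, and the local file path assumes a manual download):

```bash
# Smaller quant, GPU offload, shorter context, explicit thread count
./llama-cli -m ./twinkle-ai-gemma-3-4b-t1-it-q4_k_m.gguf \
  -ngl 99 \
  -c 512 \
  -t 8 \
  -p "Your prompt here"
```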
Q: What prompt format does the model support?
A: This is an instruction-tuned model. Use a clear instruction format, for example:

```
Please analyze the main clauses of the following contract: [contract content]
```
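
Since this is a Gemma-family instruction model, you can also let llama.cpp apply the chat template stored in the GGUF metadata rather than formatting prompts by hand. A sketch, assuming your llama.cpp build supports conversation mode (`-cnv`):

```bash
# Interactive chat; llama.cpp applies the model's built-in chat template
./llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -cnv
```

If you format prompts manually instead, Gemma models generally expect the `<start_of_turn>user ... <end_of_turn>` / `<start_of_turn>model` turn markers.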

## Contributing

If you have any questions or suggestions, please feel free to open a discussion in the Hugging Face repository.
Note: On first run, llama.cpp will automatically download the model file from Hugging Face. Please ensure you have a stable internet connection.
