Distil-Qwen3-4B-Text2SQL-GGUF-4bit

4-bit quantized GGUF version of distil-qwen3-4b-text2sql for efficient local inference. At roughly 2.5 GB, it runs on most laptops and edge devices.

Results

Metric           DeepSeek-V3 (Teacher)   Qwen3-4B (Base)   This Model
LLM-as-a-Judge   80%                     62%               80%
Exact Match      48%                     16%               60%
ROUGE            87.6%                   84.2%             89.5%

Quick Start with Ollama

1. Download the model

git lfs install
git clone https://huggingface.co/distil-labs/distil-qwen3-4b-text2sql-gguf-4bit
cd distil-qwen3-4b-text2sql-gguf-4bit
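
Alternatively, the files can be fetched with the Hugging Face CLI instead of git-lfs (a sketch; assumes huggingface_hub is installed, and the --local-dir name is just a choice):

# Alternative: download via the Hugging Face CLI
pip install -U huggingface_hub
huggingface-cli download distil-labs/distil-qwen3-4b-text2sql-gguf-4bit --local-dir distil-qwen3-4b-text2sql-gguf-4bit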

2. Create and run the model

# Create the Ollama model (Modelfile is included)
ollama create distil-qwen3-4b-text2sql -f Modelfile

# Run the model
ollama run distil-qwen3-4b-text2sql

3. Test it

>>> Schema:
... CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER);
...
... Question: How many employees earn more than 50000?

SELECT COUNT(*) FROM employees WHERE salary > 50000;

Usage with Python

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:11434/v1", api_key="EMPTY")

schema = """CREATE TABLE employees (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  department TEXT,
  salary INTEGER
);"""

question = "How many employees earn more than 50000?"

response = client.chat.completions.create(
    model="distil-qwen3-4b-text2sql",
    messages=[
        {
            "role": "system",
            "content": """You are given a database schema and a natural language question. Generate the SQL query that answers the question.

Rules:
- Use only tables and columns from the provided schema
- Use uppercase SQL keywords (SELECT, FROM, WHERE, etc.)
- Use SQLite-compatible syntax
- Output only the SQL query, no explanations"""
        },
        {
            "role": "user",
            "content": f"Schema:\n{schema}\n\nQuestion: {question}"
        }
    ],
    temperature=0
)

print(response.choices[0].message.content)
# Output: SELECT COUNT(*) FROM employees WHERE salary > 50000;
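
Because the model emits SQLite-compatible SQL, a generated query can be sanity-checked locally with Python's built-in sqlite3 module before use. A minimal sketch, reusing the schema and response objects from the example above:

import sqlite3

# Build an empty in-memory database from the schema, then try the query.
conn = sqlite3.connect(":memory:")
conn.executescript(schema)

sql = response.choices[0].message.content
try:
    rows = conn.execute(sql).fetchall()
    print("Query executed, returned:", rows)  # [(0,)] on an empty table
except sqlite3.Error as e:
    print("Generated SQL failed to execute:", e)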

Model Details

Property       Value
Format         GGUF (Q4_K_M)
Size           ~2.5 GB
Base Model     distil-labs/distil-qwen3-4b-text2sql
Parameters     4 billion
Quantization   4-bit
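
After creating the model in Ollama (step 2 above), these details can be verified locally:

# Print model metadata (architecture, parameter count, quantization)
ollama show distil-qwen3-4b-text2sql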

Why Use This Version?

  • Small size: ~2.5 GB vs ~15 GB (F16 GGUF) or ~8 GB (safetensors)
  • Fast inference: Optimized for CPU and consumer GPUs
  • Minimal accuracy loss: 4-bit quantization has little measurable impact on Text2SQL quality
  • Easy setup: Works with Ollama out of the box, and with other GGUF runtimes such as llama-cpp-python (see the sketch below)
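
As one non-Ollama route, the quantized file should load directly in llama-cpp-python. A minimal sketch; the GGUF filename below is an assumption, so point model_path at the file you actually downloaded:

from llama_cpp import Llama

# Load the Q4_K_M file directly (filename is an assumption -- check the repo)
llm = Llama(model_path="distil-qwen3-4b-text2sql-q4_k_m.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are given a database schema and a natural language question. Generate the SQL query that answers the question."},
        {"role": "user", "content": "Schema:\nCREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER);\n\nQuestion: How many employees earn more than 50000?"},
    ],
    temperature=0,
)
print(out["choices"][0]["message"]["content"])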

Related Models

Model                            Format          Size      Use Case
distil-qwen3-4b-text2sql         Safetensors     ~8 GB     Transformers, vLLM
distil-qwen3-4b-text2sql-gguf    GGUF (F16)      ~15 GB    Full-precision GGUF
This model                       GGUF (Q4_K_M)   ~2.5 GB   Recommended for local use

Supported SQL Features

  • Simple: SELECT, WHERE, COUNT, SUM, AVG, MAX, MIN
  • Medium: JOIN, GROUP BY, HAVING, ORDER BY, LIMIT
  • Complex: Subqueries, multiple JOINs, UNION
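
As an illustration of the medium tier, a GROUP BY / ORDER BY question over the schema used earlier (the query shown is what a correct answer looks like, not a recorded model output):

Schema:
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER);

Question: What is the average salary in each department, highest first?

SELECT department, AVG(salary) FROM employees GROUP BY department ORDER BY AVG(salary) DESC;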

License

This model is released under the Apache 2.0 license.

Model Lineage

This model is a 4-bit quantization of distil-labs/distil-qwen3-4b-text2sql, which is built on Qwen/Qwen3-4B (itself finetuned from Qwen/Qwen3-4B-Base).