Phi-3.5 Mini Instruct (GGUF Quantized)
This repository contains a GGUF-quantized version of Microsoft's Phi-3.5 Mini Instruct model. The 4-bit quantization makes it practical on low-resource devices (such as mobile phones and older laptops) while largely preserving the base model's reasoning capability.
Model Creator: Microsoft
Quantized By: Habibur Rahman (Aasif)
Quantization Format: GGUF (Q4_0)
Usage
You can run this model easily using the llama-cpp-python library.
1. Installation
First, install the necessary libraries. Note that the default llama-cpp-python wheel is CPU-only; for faster GPU inference the package must be built with a GPU backend (see the note after the command).
pip install llama-cpp-python huggingface_hub
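If you want CUDA offloading, recent llama-cpp-python releases document reinstalling the package with the CUDA backend enabled. The exact CMake flag has changed across versions (older releases used -DLLAMA_CUBLAS=on), so check the library's README if the build fails:

CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python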
2. Python Code Example
Here is a simple script to download and run the model:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF model from the Hub
model_path = hf_hub_download(
    repo_id="Habibur2/Phi-3.5-mini-GGUF",
    filename="phi-3.5-mini-q4_0.gguf"
)

# Load the model
# Set n_gpu_layers=-1 for full GPU usage (requires a CUDA build)
# Set n_gpu_layers=0 to run on CPU only
llm = Llama(
    model_path=model_path,
    n_ctx=2048,       # Context window
    n_threads=4,      # Number of CPU threads
    n_gpu_layers=-1   # Offload all layers to GPU
)

# Run inference
output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Who is the founder of Microsoft?"}
    ],
    max_tokens=512,
    temperature=0.7
)

print(output['choices'][0]['message']['content'])
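For interactive use you may prefer to print tokens as they are generated. Here is a minimal streaming sketch, assuming the same llm object from the script above; create_chat_completion accepts a stream=True flag and then yields incremental chunks in the OpenAI-style delta format:

# Stream the response token by token (reuses the `llm` object from above)
stream = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."}
    ],
    max_tokens=256,
    stream=True  # yield chunks instead of one final dict
)

for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    # The first and last chunks may carry only role metadata or be empty
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()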
Model Details

Original Model: Phi-3.5 Mini Instruct
Parameters: 3.8 Billion
Quantization: Q4_0 (4-bit)
File Size: ~2.18 GB
Recommended RAM: 4 GB+
About the Author
Quantized and uploaded by Md Habibur Rahman. This model is intended for educational purposes and hackathon projects focused on Edge AI and SLMs (small language models).