Instructions to use MBZUAI/MobiLlama-1B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use MBZUAI/MobiLlama-1B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="MBZUAI/MobiLlama-1B", trust_remote_code=True)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-1B", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-1B", trust_remote_code=True) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use MBZUAI/MobiLlama-1B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "MBZUAI/MobiLlama-1B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "MBZUAI/MobiLlama-1B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/MBZUAI/MobiLlama-1B
- SGLang
How to use MBZUAI/MobiLlama-1B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "MBZUAI/MobiLlama-1B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "MBZUAI/MobiLlama-1B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "MBZUAI/MobiLlama-1B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "MBZUAI/MobiLlama-1B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use MBZUAI/MobiLlama-1B with Docker Model Runner:
docker model run hf.co/MBZUAI/MobiLlama-1B
MobiLlama-1B

MobiLlama-1B is a Small Language Model with 1.2 billion parameters. It was trained using the Amber data sources Amber-Dataset.
Model Summary
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. However, LLMs do not suit well for scenarios that require on-device processing, energy efficiency, low memory footprint, and response efficiency. These requisites are crucial for privacy, security, and sustainable deployment. This paper explores the ‘less is more’ paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is the introduction of an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, catering to the specific needs of resource-constrained computing with an emphasis on enhanced performance with reduced resource demands. MobiLlama is a SLM design that initiates from a larger model and applies a careful parameter sharing scheme to reduce both the pre-training and the deployment cost. Our work strives to not only bridge the gap in open-source SLMs but also ensures full transparency, where complete training data pipeline, training code, model weights, and over 300 checkpoints along with evaluation codes are available on our Github.
Model Description
- Model type: Small Language Model (SLM) built using the architecture design of LLaMA-7B
- Language(s) (NLP): English
- License: Apache 2.0
- Resources for more information:
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-1B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-1B", trust_remote_code=True)
model.to('cuda')
text = "I was walking towards the river when "
input_ids = tokenizer(text, return_tensors="pt").to('cuda').input_ids
outputs = model.generate(input_ids, max_length=1000, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
Training DataMix
| Subset | Tokens (Billion) |
|---|---|
| Arxiv | 30.00 |
| Book | 28.86 |
| C4 | 197.67 |
| Refined-Web | 665.01 |
| StarCoder | 291.92 |
| StackExchange | 21.75 |
| Wikipedia | 23.90 |
| Total | 1259.13 |
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Total Parameters | 1.2B |
| Hidden Size | 2048 |
| Intermediate Size (MLPs) | 5632 |
| Number of Attention Heads | 32 |
| Number of Hidden Lyaers | 22 |
| RMSNorm ɛ | 1e^-5 |
| Max Seq Length | 2048 |
| Vocab Size | 32000 |
Evaluation
| Evaluation Benchmark | MobiLlama-0.5B | MobiLlama-0.8B | MobiLlama-1.2B |
|---|---|---|---|
| HellaSwag | 52.52 | 54.09 | 62.99 |
| MMLU | 26.45 | 26.92 | 24.23 |
| Arc Challenge | 29.52 | 30.20 | 34.55 |
| TruthfulQA | 38.05 | 38.48 | 35.57 |
| CrowsPairs | 64.03 | 64.82 | 68.12 |
| PIQA | 72.03 | 73.17 | 75.29 |
| Race | 33.68 | 33.37 | 35.31 |
| SIQA | 40.22 | 41.60 | 41.96 |
| Winogrande | 57.53 | 57.45 | 61.08 |
Citation
BibTeX:
@misc{thawakar2024mobillama,
title={MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT},
author={Omkar Thawakar and Ashmal Vayani and Salman Khan and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Timothy Baldwin and Eric P. Xing and Fahad Shahbaz Khan},
year={2024},
eprint={2402.16840},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
- Downloads last month
- 77