Instructions to use openbmb/AgentCPM-Report-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use openbmb/AgentCPM-Report-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="openbmb/AgentCPM-Report-GGUF",
    filename="AgentCPM-Report-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Write a short report on edge-side LLM agents."}
    ]
)
```
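A hedged sketch of shaping the chat input and reading the reply from the response dict that `create_chat_completion` returns. The helper names (`build_messages`, `extract_reply`) and the system prompt are illustrative, not part of the library; the model call itself is commented out so the snippet can be read without downloading weights.

```python
# Hypothetical helpers around llama-cpp-python's OpenAI-style chat interface.
# `llm` is the Llama instance created above.

def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful research assistant."):
    """Return an OpenAI-style messages list for create_chat_completion."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def extract_reply(response: dict) -> str:
    """Pull the assistant text out of a chat-completion response dict."""
    return response["choices"][0]["message"]["content"]

# resp = llm.create_chat_completion(messages=build_messages("Summarize recent LLM agent research."))
# print(extract_reply(resp))
```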
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use openbmb/AgentCPM-Report-GGUF with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
Use Docker
```sh
docker model run hf.co/openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
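Once `llama-server` is running, it exposes an OpenAI-compatible HTTP API (by default on `http://localhost:8080`). Below is a minimal standard-library client sketch; the port, the `model` field value, and the prompt are assumptions, and the actual network call is commented out so the sketch reads without a server running.

```python
# Build a POST request for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint, using only the standard library.
import json
import urllib.request

def chat_request(prompt: str,
                 url: str = "http://localhost:8080/v1/chat/completions") -> urllib.request.Request:
    """Return a ready-to-send request for a single-turn chat completion."""
    payload = {
        "model": "AgentCPM-Report-GGUF",  # llama-server typically accepts any model name here
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With llama-server running locally:
# with urllib.request.urlopen(chat_request("Draft an outline on edge AI.")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```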
- LM Studio
- Jan
- Ollama
How to use openbmb/AgentCPM-Report-GGUF with Ollama:
```sh
ollama run hf.co/openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
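Ollama also serves a local REST API (by default on `http://localhost:11434`), whose `/api/chat` payload differs slightly from the OpenAI schema. A minimal sketch, assuming the default port and the model tag pulled above; the network call is commented out.

```python
# Build a POST request for Ollama's /api/chat endpoint (standard library only).
import json
import urllib.request

def ollama_chat_request(prompt: str,
                        model: str = "hf.co/openbmb/AgentCPM-Report-GGUF:Q4_K_M",
                        url: str = "http://localhost:11434/api/chat") -> urllib.request.Request:
    """Return a single-turn chat request; stream=False yields one JSON object."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the Ollama daemon running:
# with urllib.request.urlopen(ollama_chat_request("Outline a market report.")) as r:
#     print(json.loads(r.read())["message"]["content"])
```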
- Unsloth Studio
How to use openbmb/AgentCPM-Report-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for openbmb/AgentCPM-Report-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for openbmb/AgentCPM-Report-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for openbmb/AgentCPM-Report-GGUF to start chatting
```
- Docker Model Runner
How to use openbmb/AgentCPM-Report-GGUF with Docker Model Runner:
```sh
docker model run hf.co/openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
- Lemonade
How to use openbmb/AgentCPM-Report-GGUF with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull openbmb/AgentCPM-Report-GGUF:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.AgentCPM-Report-GGUF-Q4_K_M
```
List all available models
```sh
lemonade list
```
AgentCPM-Report: Gemini-2.5-pro-DeepResearch-Level Local Deep Research
Links & Resources
AgentCPM-Report Models
- AgentCPM-Report: the Gemini-2.5-pro-DeepResearch-level local deep research model
- AgentCPM-Report-GGUF: the GGUF version of AgentCPM-Report
AgentCPM-Explore Models
- AgentCPM-Explore: the first open-source 4B-parameter agent model to appear on 8 widely used long-horizon agent benchmarks
- AgentCPM-Explore-GGUF: the GGUF version of AgentCPM-Explore
Code & Framework
- AgentCPM: our code for the AgentCPM series
- UltraRAG: a RAG framework with less code, a lower barrier, and faster deployment
News
- [2026-01-20] We open-sourced AgentCPM-Report, built on MiniCPM4.1-8B, which matches top closed-source commercial systems such as Gemini-2.5-pro-DeepResearch in report generation.
Overview
AgentCPM-Report is an open-source large language model agent jointly developed by THUNLP, Renmin University of China (RUCBM), and ModelBest. It is built on the 8B-parameter MiniCPM4.1 base model: it takes user instructions as input and autonomously generates long-form reports. Key highlights:
- Extreme Performance, Minimal Footprint: Through an average of 40 rounds of deep retrieval and nearly 100 rounds of chain-of-thought reasoning, it achieves comprehensive information mining and restructuring, enabling edge-side models to produce logically rigorous, deeply insightful long-form articles with tens of thousands of words. With just 8 billion parameters, it delivers performance on par with top-tier closed-source systems in deep research tasks.
- Physical Isolation, Local Security: Specifically designed for high-privacy scenarios, it supports fully offline and agile local deployment, completely eliminating the risk of cloud data leaks. Leveraging our UltraRAG framework, it efficiently mounts and understands your local private knowledge base, securely transforming core confidential data into highly valuable professional decision-making reports without the data ever leaving your environment.
Demo Cases
You can watch our demo video here: Demo
Quick Start
Docker Deployment
You can watch our deployment tutorial video here: Tutorial
We provide a minimal one-click docker-compose deployment integrated with UltraRAG, including the RAG framework UltraRAG 2.0, the model inference framework llama.cpp, and the vector database Milvus. The commands below use CPU inference; if you want GPU inference, we also provide a GPU-based version: just switch docker-compose.cpu.yml to docker-compose.yml.
```sh
git clone git@github.com:OpenBMB/UltraRAG.git
cd UltraRAG
git checkout agentcpm-report-demo
cd agentcpm-report-demo
cp env.example .env
docker-compose -f docker-compose.cpu.yml up -d --build
docker-compose -f docker-compose.cpu.yml logs -f ultrarag-ui
```
The first startup pulls images, downloads the model, and configures the environment, which takes about 30 minutes.
Then open http://localhost:5050. If you can see the UI, your deployment is successful.
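Since the first startup can take around 30 minutes, you may want to poll the UI until it answers instead of refreshing the browser. A minimal standard-library sketch; the URL and timing values mirror the instructions above, and the helper names are my own.

```python
# Readiness probe for the docker-compose deployment: poll the UI
# at http://localhost:5050 until it responds or a deadline passes.
import time
import urllib.error
import urllib.request

def service_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers at all (any HTTP status counts)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, timeout, etc.

def wait_for_ui(url: str = "http://localhost:5050",
                interval: float = 30.0,
                max_wait: float = 2400.0) -> bool:
    """Poll until the UI responds or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if service_up(url):
            return True
        time.sleep(interval)
    return False

# print("ready" if wait_for_ui() else "timed out")
```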
Follow the UI instructions to upload local files, chunk them, and build indexes; then in the Chat section, select AgentCPM-Report in the pipeline to start your workflow.
(Optional) You can import Wiki2024 as the writing database.
You can read more tutorials about AgentCPM-Report in the documentation.
Evaluation
| DeepResearch Bench | Overall | Comprehensiveness | Insight | Instruction Following | Readability |
|---|---|---|---|---|---|
| Doubao-research | 44.34 | 44.84 | 40.56 | 47.95 | 44.69 |
| Claude-research | 45.00 | 45.34 | 42.79 | 47.58 | 44.66 |
| OpenAI-deepresearch | 46.45 | 46.46 | 43.73 | 49.39 | 47.22 |
| Gemini-2.5-Pro-deepresearch | 49.71 | 49.51 | 49.45 | 50.12 | 50.00 |
| WebWeaver(Qwen3-30B-A3B) | 46.77 | 45.15 | 45.78 | 49.21 | 47.34 |
| WebWeaver(Claude-Sonnet-4) | 50.58 | 51.45 | 50.02 | 50.81 | 49.79 |
| Enterprise-DR(Gemini-2.5-Pro) | 49.86 | 49.01 | 50.28 | 50.03 | 49.98 |
| RhinoInsight(Gemini-2.5-Pro) | 50.92 | 50.51 | 51.45 | 51.72 | 50.00 |
| AgentCPM-Report | 50.11 | 50.54 | 52.64 | 48.87 | 44.17 |
| DeepResearch Gym | Avg. | Clarity | Depth | Balance | Breadth | Support | Insightfulness |
|---|---|---|---|---|---|---|---|
| Doubao-research | 84.46 | 68.85 | 93.12 | 83.96 | 93.33 | 84.38 | 83.12 |
| Claude-research | 80.25 | 86.67 | 96.88 | 84.41 | 96.56 | 26.77 | 90.22 |
| OpenAI-deepresearch | 91.27 | 84.90 | 98.10 | 89.80 | 97.40 | 88.40 | 89.00 |
| Gemini-2.5-pro-deepresearch | 96.02 | 90.71 | 99.90 | 93.37 | 99.69 | 95.00 | 97.45 |
| WebWeaver (Qwen3-30b-a3b) | 77.27 | 71.88 | 85.51 | 75.80 | 84.78 | 63.77 | 81.88 |
| WebWeaver (Claude-sonnet-4) | 96.77 | 90.50 | 99.87 | 94.30 | 100.00 | 98.73 | 97.22 |
| AgentCPM-Report | 98.48 | 95.10 | 100.00 | 98.50 | 100.00 | 97.30 | 100.00 |
| DeepConsult | Avg. | Win | Tie | Lose |
|---|---|---|---|---|
| Doubao-research | 5.42 | 29.95 | 40.35 | 29.70 |
| Claude-research | 4.60 | 25.00 | 38.89 | 36.11 |
| OpenAI-deepresearch | 5.00 | 0.00 | 100.00 | 0.00 |
| Gemini-2.5-Pro-deepresearch | 6.70 | 61.27 | 31.13 | 7.60 |
| WebWeaver(Qwen3-30B-A3B) | 4.57 | 28.65 | 34.90 | 36.46 |
| WebWeaver(Claude-Sonnet-4) | 6.96 | 66.86 | 10.47 | 22.67 |
| Enterprise-DR(Gemini-2.5-Pro) | 6.82 | 71.57 | 19.12 | 9.31 |
| RhinoInsight(Gemini-2.5-Pro) | 6.82 | 68.51 | 11.02 | 20.47 |
| AgentCPM-Report | 6.60 | 57.60 | 13.73 | 28.68 |
Our evaluation datasets include DeepResearch Bench, DeepConsult, and DeepResearch Gym. The writing-time knowledge base includes about 2.7 million arXiv papers and about 200,000 internal webpage summaries.
Acknowledgements
This project would not be possible without the support and contributions of the open-source community. During development, we referred to and used multiple excellent open-source frameworks, models, and data resources, including verl, UltraRAG, MiniCPM4.1, and SurveyGo.
Contributions
Project leads: Yishan Li, Wentong Chen
Contributors: Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun
Advisors: Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
Citation
If AgentCPM-Report is helpful for your research, please cite it as follows:
```bibtex
@misc{li2026agentcpmreport,
      title={AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research},
      author={Yishan Li and Wentong Chen and Yukun Yan and Mingwei Li and Sen Mei and Xiaorong Wang and Kunpeng Liu and Xin Cong and Shuo Wang and Zhong Zhang and Yaxi Lu and Zhenghao Liu and Yankai Lin and Zhiyuan Liu and Maosong Sun},
      year={2026},
      eprint={2602.06540},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.06540},
}
```