title: RepoReaper
emoji: ๐
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
RepoReaper
Harvest Logic. Dissect Architecture. Chat with Code.
English • Simplified Chinese
Live Demo
⚠️ Public demos use shared API quotas. Deploy locally for the best experience.
An autonomous agent that dissects any GitHub repository: it maps the code architecture, warms up a semantic cache, and answers questions with just-in-time (JIT) context retrieval.
✨ Key Features
| Feature | Description |
|---|---|
| Multi-Language AST Parsing | Python AST + Regex patterns for Java, TypeScript, Go, Rust, etc. |
| Hybrid Search | Qdrant vectors + BM25 with RRF fusion |
| JIT Context Loading | Auto-fetches missing files during Q&A |
| Query Rewrite | Translates natural language to code keywords |
| End-to-End Tracing | Langfuse integration for observability |
| Auto Evaluation | LLM-as-Judge scoring pipeline |
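The Hybrid Search row combines two ranked lists with reciprocal rank fusion (RRF). Below is a minimal sketch of that fusion step, assuming each retriever returns document IDs ordered best-first. The function name `rrf_fuse`, the sample file paths, and the constant `k=60` (a commonly used default) are illustrative assumptions and are not taken from the RepoReaper code.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` is an assumed smoothing constant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the vector index and from BM25.
vector_hits = ["app/main.py", "app/agent.py", "app/chat.py"]
bm25_hits = ["app/agent.py", "app/utils.py", "app/main.py"]
fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still gets a score, which is why RRF needs no score normalization between the vector and BM25 sides.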
Architecture
┌─────────────────────────────────────────────────────────────┐
│      Vue 3 Frontend (SSE Streaming + Mermaid Diagrams)      │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                       FastAPI Backend                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    Agent    │  │    Chat     │  │     Evaluation      │  │
│  │   Service   │  │   Service   │  │      Framework      │  │
│  └──────┬──────┘  └──────┬──────┘  └─────────────────────┘  │
│         │                │                                  │
│  ┌──────▼────────────────▼──────┐  ┌─────────────────────┐  │
│  │ Vector Service (Qdrant+BM25) │  │ Tracing (Langfuse)  │  │
│  └──────────────────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Tech Stack
Backend: Python 3.10+ · FastAPI · AsyncIO · Qdrant · BM25
Frontend: Vue 3 · Pinia · Mermaid.js · SSE
LLM: DeepSeek V3 · SiliconFlow BGE-M3
Ops: Docker · Gunicorn · Langfuse
Quick Start
Prerequisites: Python 3.10+ · (Optional) Node 18+ for rebuilding the frontend · GitHub Token (recommended) · LLM API Key (required)
# Clone & Setup
git clone https://github.com/tzzp1224/RepoReaper.git && cd RepoReaper
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Configure .env (copy from example and fill in your keys)
cp .env.example .env
# Required: set LLM_PROVIDER and the matching *_API_KEY
# Recommended: GITHUB_TOKEN and SILICON_API_KEY (embeddings)
# (Optional) Build frontend (repo already contains frontend-dist)
cd frontend-vue
npm install
npm run build
cd ..
# Run
python -m app.main
Open http://localhost:8000 and paste any GitHub repo URL.
Docker (single container, local Qdrant):
cp .env.example .env
docker build -t reporeaper .
docker run -d -p 8000:8000 --env-file .env reporeaper
Docker Compose (recommended, with Qdrant Server):
cp .env.example .env
# Set QDRANT_MODE=server and QDRANT_URL=http://qdrant:6333 in .env
docker compose up -d --build
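Before starting the server it can help to check that the settings mentioned above are present. The following is a rough preflight sketch, not part of RepoReaper: the function `missing_settings` is hypothetical, and it assumes the API key variable is named after the provider (for example `LLM_PROVIDER=deepseek` paired with `DEEPSEEK_API_KEY`), which is only a guess based on the `*_API_KEY` pattern. Check `.env.example` for the real names.

```python
import os
from typing import Mapping


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of required settings that are absent or empty."""
    missing = []
    provider = env.get("LLM_PROVIDER", "").strip()
    if not provider:
        missing.append("LLM_PROVIDER")
    else:
        # Assumption: the key variable is the upper-cased provider name plus
        # "_API_KEY", following the "*_API_KEY" pattern in the setup notes.
        key_name = f"{provider.upper()}_API_KEY"
        if not env.get(key_name, "").strip():
            missing.append(key_name)
    # Server mode needs a URL to reach Qdrant (see the Compose instructions).
    if env.get("QDRANT_MODE") == "server" and not env.get("QDRANT_URL", "").strip():
        missing.append("QDRANT_URL")
    return missing


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    print("Missing settings:", ", ".join(problems) if problems else "none")
```

The script reads the process environment, so the values in `.env` need to be exported (or loaded by whatever loader the app uses) before running it.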
Evaluation & Tracing Status
| Component | Status | Notes |
|---|---|---|
| Self-built Eval Engine | ✅ Working | 4-layer metrics (QueryRewrite / Retrieval / Generation / Agentic), LLM-as-Judge |
| Auto Evaluation | ✅ Working | Triggers asynchronously after every `/chat` call; writes to `evaluation/sft_data/` |
| Data Routing (SFT) | ✅ Working | Auto-grades samples as Gold/Silver/Bronze/Rejected and writes them to JSONL files |
| Eval API Endpoints | ✅ Working | `/evaluate`, `/evaluation/stats`, `/dashboard/*`, `/auto-eval/*` (7 endpoints) |
| Offline Retrieval Eval | ✅ Working | `test_retrieval.py`: Hit Rate, Recall@K, Precision@K, MRR |
| Langfuse Tracing | ⚠️ Partial | Framework + 14 call sites wired in agent/chat services; falls back to local JSON logs (`logs/traces/`) when Langfuse is unavailable |
| Ragas Integration | ❌ Placeholder | `use_ragas=False` by default; the `_ragas_eval()` API call doesn't match the latest Ragas SDK |
| Langfuse ↔ Eval | ❌ Not connected | Eval results are only written to JSONL, not reported to the Langfuse Scores API |
Overall completion: ~65%. The self-built eval loop is production-ready; the Ragas and Langfuse integrations are scaffolded but not functional.
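For readers unfamiliar with the offline retrieval metrics listed in the table, here is a small sketch of how Hit Rate, Recall@K, Precision@K, and MRR are commonly computed. These are the textbook definitions; `test_retrieval.py` may define them differently, and the function names and sample file names below are illustrative.

```python
def retrieval_metrics(retrieved: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """Compute hit, Recall@K, Precision@K and reciprocal rank for one query."""
    top_k = retrieved[:k]
    hits = [doc for doc in top_k if doc in relevant]
    # Reciprocal rank of the first relevant result within the top K (0 if none).
    rr = 0.0
    for rank, doc in enumerate(top_k, start=1):
        if doc in relevant:
            rr = 1.0 / rank
            break
    return {
        "hit": 1.0 if hits else 0.0,
        "recall@k": len(set(hits)) / len(relevant) if relevant else 0.0,
        "precision@k": len(hits) / k if k > 0 else 0.0,
        "rr": rr,
    }


def mean_metrics(per_query: list[dict[str, float]]) -> dict[str, float]:
    """Average per-query metrics; the mean of "hit" is the Hit Rate, of "rr" the MRR."""
    if not per_query:
        return {}
    return {name: sum(m[name] for m in per_query) / len(per_query) for name in per_query[0]}


# Two hypothetical queries with hand-labelled relevant files.
q1 = retrieval_metrics(["a.py", "b.py", "c.py"], {"b.py", "d.py"}, k=3)
q2 = retrieval_metrics(["x.py", "y.py", "z.py"], {"q.py"}, k=3)
print(mean_metrics([q1, q2]))
```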
⚠️ Known Issues
- Python 3.14 + Langfuse import error
  `pydantic.v1.errors.ConfigError: unable to infer type for attribute "description"`. Langfuse 3.x internally uses the `pydantic.v1` compat layer, which breaks on Python 3.14.
  Workaround: set `LANGFUSE_ENABLED=false` in `.env`, or use Python 3.10–3.12.
- Langfuse Server not included in `docker-compose.yml`
  Even if the import works, you need a running Langfuse instance. Add it yourself or use app.langfuse.com.
- Trace spans are not linked
  `tracing_service` records spans/events but doesn't pass `trace_id` to Langfuse API calls, so the Langfuse UI shows isolated events instead of a connected trace tree.
- Ragas `_ragas_eval()` uses outdated API
  Passes a plain dict to `ragas.evaluate()`, but the latest Ragas requires a `Dataset` object. The `ragas_eval_dataset.json` export exists but no script consumes it.
- Golden dataset has no reference answers
  All 26 test cases have `expected_answer: ""`, so generation quality cannot be compared against ground truth.
- Heuristic fallback is coarse
  When no LLM client is available, `faithfulness` uses keyword overlap + 0.2 baseline; `completeness` is purely length-based.
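To make the last issue concrete, here is one way a keyword-overlap and length-based fallback could look. This is a guess at the general shape, not the project's actual code: the tokenization, the way the 0.2 baseline is combined with the overlap ratio, and the `target_chars` value are all assumptions.

```python
import re


def _keywords(text: str) -> set[str]:
    """Lower-cased word tokens of three or more characters."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if len(w) >= 3}


def heuristic_faithfulness(answer: str, context: str, baseline: float = 0.2) -> float:
    """Baseline plus the share of answer keywords found in the context, capped at 1."""
    answer_words = _keywords(answer)
    if not answer_words:
        return baseline
    overlap = len(answer_words & _keywords(context)) / len(answer_words)
    return min(1.0, baseline + overlap)


def heuristic_completeness(answer: str, target_chars: int = 500) -> float:
    """Purely length-based score: answer length relative to an assumed target."""
    return min(1.0, len(answer) / target_chars)


context = "The parser uses regex patterns for Java"
print(heuristic_faithfulness("The parser uses regex patterns", context))
print(heuristic_faithfulness("Completely unrelated words", context))
print(heuristic_completeness("x" * 500))
```

The sketch shows why such a fallback is coarse: an answer that merely repeats context words scores as faithful, an unrelated answer still receives the baseline, and padding alone maximizes the completeness score.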
Roadmap
- Fix Langfuse compat: pin `langfuse`/`pydantic` versions or gate the import behind a Python version check
- Add Langfuse to `docker-compose.yml`: one-command local observability
- Wire `trace_id` through spans: enable the full trace tree in the Langfuse UI
- Integrate Ragas properly: update `_ragas_eval()` to use `ragas.evaluate(Dataset(...))`, add a standalone eval script
- Enrich golden dataset: add `expected_answer` for generation benchmarking, expand to 50+ cases
- Eval dashboard frontend: Vue component to visualize quality distribution and bad cases
- CI regression baseline: run `test_retrieval.py` in GitHub Actions, fail on metric regression
- Export to Langfuse Datasets: push eval results to the Langfuse Scores/Datasets API for unified observability
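The first roadmap item, gating the Langfuse import behind a Python version check, could be sketched roughly as follows. The helper names `langfuse_allowed` and `load_langfuse` are hypothetical, and two things are assumed: that `LANGFUSE_ENABLED` defaults to enabled when unset, and that Python 3.14 and later are the affected versions (the Known Issues section only reports 3.14).

```python
import os
import sys
from typing import Mapping


def langfuse_allowed(env: Mapping[str, str], version: tuple = sys.version_info[:2]) -> bool:
    """Decide whether to attempt the Langfuse import at all."""
    # Assumption: tracing is on unless LANGFUSE_ENABLED is explicitly falsy.
    if env.get("LANGFUSE_ENABLED", "true").strip().lower() in {"false", "0", "no"}:
        return False
    # Assumption: 3.14 and later are affected by the pydantic.v1 failure.
    return tuple(version) < (3, 14)


def load_langfuse(env: Mapping[str, str] = os.environ):
    """Return the langfuse module, or None so callers can fall back to local JSON logs."""
    if not langfuse_allowed(env):
        return None
    try:
        import langfuse
    except Exception:  # ImportError, or the pydantic ConfigError raised at import time
        return None
    return langfuse


print("Langfuse import attempted:", langfuse_allowed(os.environ))
```

Returning `None` instead of raising keeps the existing fallback path (local JSON logs under `logs/traces/`) as the single code path for every case where Langfuse cannot be used.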