---
title: RepoReaper
emoji: 💀
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
---

RepoReaper

💀 Harvest Logic. Dissect Architecture. Chat with Code.

English • 简体中文


👇 Live Demo 👇

Global Demo     China Demo

โš ๏ธ Public demos use shared API quotas. Deploy locally for the best experience.


RepoReaper Demo

---

An autonomous Agent that dissects any GitHub repository. It maps code architecture, warms up semantic cache, and answers questions with Just-In-Time context retrieval.

---

## ✨ Key Features

| Feature | Description |
|:--------|:------------|
| **Multi-Language AST Parsing** | Python AST + Regex patterns for Java, TypeScript, Go, Rust, etc. |
| **Hybrid Search** | Qdrant vectors + BM25 with RRF fusion |
| **JIT Context Loading** | Auto-fetches missing files during Q&A |
| **Query Rewrite** | Translates natural language to code keywords |
| **End-to-End Tracing** | Langfuse integration for observability |
| **Auto Evaluation** | LLM-as-Judge scoring pipeline |

---

## 🏗 Architecture

```
┌─────────────────────────────────────────────────────────────┐
│  Vue 3 Frontend (SSE Streaming + Mermaid Diagrams)          │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│  FastAPI Backend                                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Agent     │  │   Chat      │  │   Evaluation        │  │
│  │   Service   │  │   Service   │  │   Framework         │  │
│  └──────┬──────┘  └──────┬──────┘  └─────────────────────┘  │
│         │                │                                  │
│  ┌──────▼────────────────▼──────┐  ┌─────────────────────┐  │
│  │ Vector Service (Qdrant+BM25) │  │  Tracing (Langfuse) │  │
│  └──────────────────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

---

## 🛠 Tech Stack

**Backend:** Python 3.10+ · FastAPI · AsyncIO · Qdrant · BM25

**Frontend:** Vue 3 · Pinia · Mermaid.js · SSE

**LLM:** DeepSeek V3 · SiliconFlow BGE-M3

**Ops:** Docker · Gunicorn · Langfuse

---

## 🏁 Quick Start

**Prerequisites:** Python 3.10+ · (Optional) Node 18+ for rebuilding frontend · GitHub Token (recommended) · LLM API Key (required)

```bash
# Clone & Setup
git clone https://github.com/tzzp1224/RepoReaper.git && cd RepoReaper
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Configure .env (copy from example and fill in your keys)
cp .env.example .env
# Required: set LLM_PROVIDER and the matching *_API_KEY
# Recommended: GITHUB_TOKEN and SILICON_API_KEY (embeddings)

# (Optional) Build frontend (repo already contains frontend-dist)
cd frontend-vue
npm install
npm run build
cd ..

# Run
python -m app.main
```

Open `http://localhost:8000` and paste any GitHub repo URL.

**Docker (single container, local Qdrant):**

```bash
cp .env.example .env
docker build -t reporeaper .
docker run -d -p 8000:8000 --env-file .env reporeaper
```

**Docker Compose (recommended, with Qdrant Server):**

```bash
cp .env.example .env
# Set QDRANT_MODE=server and QDRANT_URL=http://qdrant:6333 in .env
docker compose up -d --build
```

## 📊 Evaluation & Tracing Status

| Component | Status | Notes |
|:----------|:------:|:------|
| **Self-built Eval Engine** | ✅ Working | 4-layer metrics (QueryRewrite / Retrieval / Generation / Agentic), LLM-as-Judge |
| **Auto Evaluation** | ✅ Working | Triggers after every `/chat`, async, writes to `evaluation/sft_data/` |
| **Data Routing (SFT)** | ✅ Working | Auto-grades Gold/Silver/Bronze/Rejected → JSONL files |
| **Eval API Endpoints** | ✅ Working | `/evaluate`, `/evaluation/stats`, `/dashboard/*`, `/auto-eval/*` (7 endpoints) |
| **Offline Retrieval Eval** | ✅ Working | `test_retrieval.py`: Hit Rate, Recall@K, Precision@K, MRR |
| **Langfuse Tracing** | ⚠️ Partial | Framework + 14 call sites wired in agent/chat services; falls back to local JSON logs (`logs/traces/`) when Langfuse is unavailable |
| **Ragas Integration** | ❌ Placeholder | `use_ragas=False` by default; the `_ragas_eval()` API call doesn't match the latest Ragas SDK |
| **Langfuse ↔ Eval** | ❌ Not connected | Eval results are only written to JSONL, not reported to the Langfuse Scores API |

> **Overall completion: ~65%.** The self-built eval loop is production-ready; the Ragas and Langfuse integrations are scaffolded but not functional.

---

## ⚠️ Known Issues

1. **Python 3.14 + Langfuse import error**

   `pydantic.v1.errors.ConfigError: unable to infer type for attribute "description"`. Langfuse 3.x internally uses the `pydantic.v1` compat layer, which breaks on Python 3.14. **Workaround:** set `LANGFUSE_ENABLED=false` in `.env`, or use Python 3.10–3.12.

2. **Langfuse Server not included in `docker-compose.yml`**

   Even if the import works, you need a running Langfuse instance. Add it yourself or use [app.langfuse.com](https://app.langfuse.com).

3. **Trace spans are not linked**

   `tracing_service` records spans/events but doesn't pass `trace_id` to Langfuse API calls, so the Langfuse UI shows isolated events instead of a connected trace tree.

4. **Ragas `_ragas_eval()` uses outdated API**

   It passes a plain dict to `ragas.evaluate()`, but the latest Ragas requires a `Dataset` object. The `ragas_eval_dataset.json` export exists, but no script consumes it.

5. **Golden dataset has no reference answers**

   All 26 test cases have `expected_answer: ""`, so generation quality cannot be compared against ground truth.

6. **Heuristic fallback is coarse**

   When no LLM client is available, `faithfulness` uses keyword overlap + 0.2 baseline; `completeness` is purely length-based.

---

## 🗺 Roadmap

- [ ] **Fix Langfuse compat**: pin `langfuse`/`pydantic` versions or gate the import behind a Python version check
- [ ] **Add Langfuse to `docker-compose.yml`**: one-command local observability
- [ ] **Wire trace_id through spans**: enable the full trace tree in the Langfuse UI
- [ ] **Integrate Ragas properly**: update `_ragas_eval()` to use `ragas.evaluate(Dataset(...))`, add a standalone eval script
- [ ] **Enrich golden dataset**: add `expected_answer` for generation benchmarking, expand to 50+ cases
- [ ] **Eval dashboard frontend**: Vue component to visualize quality distribution and bad cases
- [ ] **CI regression baseline**: run `test_retrieval.py` in GitHub Actions, fail on metric regression
- [ ] **Export to Langfuse Datasets**: push eval results to the Langfuse Scores/Datasets API for unified observability

---

## 📈 Star History

Star History Chart
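
**Appendix: how RRF fusion combines two rankings.** The Hybrid Search feature lists "Qdrant vectors + BM25 with RRF fusion". The sketch below illustrates the general Reciprocal Rank Fusion idea only; it is not taken from the RepoReaper code. The function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the file paths are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 5) -> List[str]:
    """Merge several ranked lists with Reciprocal Rank Fusion.

    Each document gets 1 / (k + rank) from every list it appears in
    (rank starts at 1); the contributions are summed across lists.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


# Hypothetical result lists from a vector search and a BM25 search.
vector_hits = ["app/main.py", "app/services/chat.py", "README.md"]
bm25_hits = ["app/services/chat.py", "app/utils/bm25.py", "app/main.py"]

fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

Because RRF uses only rank positions, the raw vector similarity and BM25 scores never need to be normalized against each other. Files that rank well in both lists rise to the top.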
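
**Appendix: the offline retrieval metrics.** The status table says `test_retrieval.py` reports Hit Rate, Recall@K, Precision@K, and MRR. The sketch below uses the textbook definitions of those four metrics; the actual script may compute them differently. The function names, the `cases` structure, and the file names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (divides by k)."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: List[Tuple[List[str], Set[str]]], k: int) -> Dict[str, float]:
    """Average the per-query metrics over all (retrieved, relevant) cases."""
    n = len(cases)
    if n == 0:
        return {"hit_rate": 0.0, "recall": 0.0, "precision": 0.0, "mrr": 0.0}
    return {
        # Hit rate: share of queries with at least one relevant doc in the top k.
        "hit_rate": sum(1 for r, rel in cases if any(d in rel for d in r[:k])) / n,
        "recall": sum(recall_at_k(r, rel, k) for r, rel in cases) / n,
        "precision": sum(precision_at_k(r, rel, k) for r, rel in cases) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in cases) / n,
    }


# Two hypothetical queries: one partial hit, one complete miss.
cases = [
    (["a.py", "b.py", "c.py"], {"b.py", "d.py"}),
    (["x.py", "y.py", "z.py"], {"q.py"}),
]
metrics = evaluate(cases, k=3)
print(metrics)
```

A CI regression check, as proposed in the Roadmap, could compare a dictionary like `metrics` against stored baseline values and fail when any metric drops below its threshold.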