Space: DataEyond/Agentic-Service-Data-Eyond
[KM-438][KM-439] Improve Retrieval and Querying feature (#15)
Opened by rhbt6767 13 days ago
base: refs/heads/main ← from: refs/pr/15
Files changed: +4430 / -281
Commits:
[noticket] add gitignore
c87f27f4
[NOTICKET]: add document pipeline, simplify document API
fb871f3d
[NOTICKET]: update folder document_pipelines after pipelines
a4cf97ab
[NOTICKET][DB] refactor code to new repo
7f3bb978
[KM-441] add mean and median
9b593342
[NOTICKET] new metadata format for cleaner code
6b590d94
update document
5a69e0ee
delete duplicate file
3848d7b2
edit document for new pipeline
425e0210
[NOTICKET]: add CSV and XLSX file type
31920c3b
[DB] fix/rename db_pipeline.py
d913315c
[NOTICKET][DB] adjust the db_pipeline structure format to match the other files
e13a9017
[NOTICKET][DB] separate db credentials into the model folder. add ingestion endpoint at db_client to use db pipeline. add db_client router in main.
347a73aa
[NOTICKET]: use tesseract for extract PDF
6b9a13d4
[NOTICKET]: add Tesseract and Poppler binaries via Git LFS
0a9101a1
[NOTICKET]: update uv.lock
bb79f64b
[NOTICKET][DB] update credential & databaseclient. update settings
0e079550
[NOTICKET] update settings
65a5c6b1
[KM-437][DB] add mysql, sqlserver, bigquery, snowflake connections
43539293
[NOTICKET]: adjusted pyproject.toml for OCR PDF
a00e2ad5
[NOTICKET]: fix merge conflict
6c873460
[NOTICKET][DB] fix mysql pipeline
060c8cc8
[NOTICKET] edit imports
b145c06e
[NOTICKET] minor code refactor
52415b6a
[NOTICKET] add duplicate check for storing database
d310770f
[NOTICKET][DB] add supported dbtype for frontend
a531fcc7
[NOTICKET]: add doctypes endpoint & 10MB file size limit
9debae56
[NOTICKET]: add comments to flag that file type lists must stay in sync
023b7cfe
[NOTICKET]: add to gitignore
bbc8c584
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
7757da18
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
9c090a04
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
5398fec4
Merge branch 'dev_new' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
20bf3f8f
[NOTICKET] add total token logging
b9703fc5
[NOTICKET] add updated_at field for metadata & delete old embedding before appending
cb5ab327
[NO TICKET][document]: add updated_at on metadata
d2f7a483
[NO TICKET][document]: delete vector embedding on table langchain_pg_embedding if user delete document on knowledge
ac3d8c19
[NOTICKET][document]: make a clean output to status error unsupported file type
2814813f
[KM-438][KM-439] framework for knowledge retriever
d1e12641
Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
a701ac37
[NOTICKET] fix single source to multiple sources
589ca324
[NOTICKET] fixed multiple sources
e9f2a263
[KM-507] add multiple retrieval method to compare (dense, mmr, bm25, hybrid)
ac6b78d1
[KM-507] add changes to methods
82186504
[NOTICKET] add db_client for querying
e49db601
add to gitignore
83ed7447
[KM-507] add different methods, now using dense cosine
145bca39
[KM-512] create folder for querying from db/tabular docs
2c8a3e89
[NOTICKET] minor fix in chat.py, add package for query, change schema used to hybrid (cosine+bm25)
15cd3a7f
[KM-512] add Pydantic model the LLM fills via function calling in sql_query, and add same signature for db and tabular
220f59eb
[NOTICKET] rename file name, updated after uv sync
948d6dda
[NOTICKET] update .gitignore
240251c4
Merge branch 'dev_new' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new
29efec67
[KM-513][document] add convert to parquet if type file is XLSX and CSV
770f26b1
add to gitignore
1fef470b
[KM-512] connect query executor to user question. add logging for db_executor
abc494f9
[NOTICKET] fix delete, now can filter by user
f273db05
[NOTICKET] db_executor: CTE DML check now walks entire AST root, schema: cast instead of string interpolation
bd2b1d9d
[NOTICKET] fix-revert string change
110ee343
[KM-520] Integrate db query executor pipeline with existing rag retrieve pipeline
a25febe2
[KM-516][KM-517] add new feature; ai can now see table & column names that have fk relationship with retrieved result
f86da27b
[NOTICKET] fix query now use orchestrator msg, rework db pipeline replace ingestion logic
be9bbd9d
[KM-507] now only uses hybrid (cosine and bm25)
40925b45
[NOTICKET]untrack software/ folder (ignored via .gitignore)
0931c10d
[NOTICKET] add pyarrow
432c1fa9
[NOTICKET] add pyarrow
7ff66c9b
[KM-515][document] Make Query for Tabular Type (XLSX & CSV)
36049948
[KM-455][document] decided methods retrieval for document
cf77d20e
[KM-533] add table level schema, differentiate with chunk level. expand retrieval result with FK exploration
fc1239ae
[KM-533] now also retrieves table level chunk
4150ba7e
fix: fix dedup logic
c9d3b337
[NOTICKET] rrf merge now at router level
de32ab04
[NOTICKET] minor refactoring
e4f62b85
fix: query executor now uses the user question as the prompt (previously it used the orchestrator output)
0935ede4
fix: increase K in chat endpoint to 10
b59ef76f
feat: add sheet-level chunk on CSV/XLSX ingestion
8daf9b59
fix: 5 bug fixes on tabular executor
a49dc1b5
[NOTICKET][doc] fix aggregate count operation when value_col is not specified
00aa61d9
[NOTICKET] now retrieve db tables first, then get column from the obtained tables. reduce k to 5
bb29492a
[NOTICKET][doc] add guard if filename None
b7fbaebb
fix fallback to fresh retrieval on corrupted Redis cache
9cb950f7
[NOTICKET][doc] validate embedding vector for NaN/Infinity in manhattan retriever
2167a5bd
[NOTICKET] pass orchestrator search_query to sql executor for multi-turn context
23eeb2d3
[NOTICKET][db] add sheet-level retrieval and focus LLM schema context to retrieved columns
a205d0c5
fix: minor return-type fix when the SQL writes a LIMIT that exceeds the allowed maximum
b4df8b1d
[NOTICKET][doc] add sheet-level leg and RRF voting for tabular retrieval
5f86993f
[NOTICKET][doc] remove column filter and fallback cap for full-schema approach
959b1b00
[NOTICKET][doc] correct metadata key path in _format_context
16ab9164
[NOTICKET] add dev dependency group and update gitignore
36ffff42
make executors self-contained, remove redundant pre-filter
73b7fe32
fix sorted ranking so model uses overall sorted retrieved chunks
3e7924d5
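Several commits above describe hybrid retrieval (dense cosine plus BM25, KM-507) whose ranked results are merged with RRF at the router level, plus "RRF voting" for tabular retrieval, dedup fixes, and a final sorted ranking. The sketch below shows what a standard Reciprocal Rank Fusion merge of several ranked lists looks like. It is not taken from this PR's diff: the function name `rrf_merge`, the use of plain chunk-ID strings as inputs, and the constant `k = 60` (the value commonly used for RRF) are assumptions. Note that RRF's `k` is a smoothing constant and is distinct from the top-k cut mentioned in the commits ("increase K ... to 10", "reduce k to 5"), which corresponds to `top_n` here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def rrf_merge(
    rankings: Iterable[Sequence[str]],
    k: int = 60,
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Fuse several best-first ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Hypothetical helper for illustration. A chunk at 1-based position r in a
    list contributes 1 / (k + r) to its fused score. Chunks returned by several
    retrievers (e.g. both dense and BM25) accumulate score, and because scores
    are keyed by chunk ID the output is also deduplicated.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        seen = set()
        for rank, chunk_id in enumerate(ranking, start=1):
            if chunk_id in seen:  # count each chunk at most once per list
                continue
            seen.add(chunk_id)
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); tie-break on ID for a stable order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


if __name__ == "__main__":
    dense = ["c3", "c1", "c7"]  # e.g. cosine-similarity ranking
    bm25 = ["c1", "c9", "c3"]   # e.g. keyword (BM25) ranking
    print([chunk_id for chunk_id, _ in rrf_merge([dense, bm25], top_n=5)])
    # prints ['c1', 'c3', 'c9', 'c7']
```

RRF only uses rank positions, so it can combine cosine similarities and BM25 scores without normalizing their very different score scales; a chunk ranked reasonably well by both retrievers (here `c1`) outranks one that only a single retriever placed first (`c3`).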
rhbt6767 (DataEyond org) · 13 days ago
No description provided.

rhbt6767 changed the pull request title from "test" to "[KM-438][KM-439] Improve Retrieval and Querying feature" · 13 days ago

merge dev_new to main
9257b7bc

rhbt6767 changed the pull request status to open · 13 days ago
ishaq101 changed the pull request status to merged · 13 days ago