Space: DataEyond/Agentic-Service-Data-Eyond


[KM-438][KM-439] Improve Retrieval and Querying feature

#15

by rhbt6767 - opened 13 days ago

base: refs/heads/main ← from: refs/pr/15


+4430

-281

[noticket] add gitignore c87f27f4

[NOTICKET]: add document pipeline, simplify document API fb871f3d

[NOTICKET]: update folder document_pipelines after pipelines a4cf97ab

[NOTICKET][DB] refactor code to new repo 7f3bb978

[KM-441] add mean and median 9b593342

[NOTICKET] new metadata format for cleaner code 6b590d94

update document 5a69e0ee

delete duplicate file 3848d7b2

edit document for new pipeline 425e0210

[NOTICKET]: add CSV and XLSX file type 31920c3b

[DB] fix/rename db_pipeline.py d913315c

[NOTICKET][DB] align the db_pipeline structure format with the other files e13a9017

[NOTICKET][DB] move db credential into the model folder. add ingestion endpoint at db_client to use db pipeline. add db_client router in main. 347a73aa

[NOTICKET]: use tesseract for extract PDF 6b9a13d4

[NOTICKET]: add Tesseract and Poppler binaries via Git LFS 0a9101a1

[NOTICKET]: update uv.lock bb79f64b

[NOTICKET][DB] update credential & databaseclient. update settings 0e079550

[NOTICKET] update settings 65a5c6b1

[KM-437][DB] add mysql, sqlserver, bigquery, snowflake connections 43539293

[NOTICKET]: adjusted pyproject.toml for OCR PDF a00e2ad5

[NOTICKET]: fix merge conflict 6c873460

[NOTICKET][DB] fix mysql pipeline 060c8cc8

[NOTICKET] edit imports b145c06e

[NOTICKET] minor code refactor 52415b6a

[NOTICKET] add duplicate check for storing database d310770f

[NOTICKET][DB] add supported dbtype for frontend a531fcc7

[NOTICKET]: add doctypes endpoint & 10MB file size limit 9debae56

[NOTICKET]: add comments to flag that file type lists must stay in sync 023b7cfe

[NOTICKET]: add to gitignore bbc8c584

Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new 7757da18

Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new 9c090a04

Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new 5398fec4

Merge branch 'dev_new' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new 20bf3f8f

[NOTICKET] add total token logging b9703fc5

[NOTICKET] add updated_at field for metadata & delete old embedding before appending cb5ab327

[NO TICKET][document]: add updated_at on metadata d2f7a483

[NO TICKET][document]: delete vector embedding on table langchain_pg_embedding if user delete document on knowledge ac3d8c19

[NOTICKET][document]: make a clean output to status error unsupported file type 2814813f

[KM-438][KM-439] framework for knowledge retriever d1e12641

Merge branch 'main' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new a701ac37

[NOTICKET] fix single source to multiple sources 589ca324

[NOTICKET] fixed multiple sources e9f2a263

[KM-507] add multiple retrieval method to compare (dense, mmr, bm25, hybrid) ac6b78d1

[KM-507] add changes to methods 82186504

[NOTICKET] add db_client for querying e49db601

add to gitignore 83ed7447

[KM-507] add different methods, now using dense cosine 145bca39

[KM-512] create folder for querying from bd/tabular docs 2c8a3e89

[NOTICKET] minor fix in chat.py, add package for query, change schema used to hybrid (cosine+bm25) 15cd3a7f

[KM-512] add Pydantic model the LLM fills via function calling in sql_query, and add same signature for db and tabular 220f59eb

[NOTICKET] rename file name, updated after uv sync 948d6dda

[NOTICKET] update .gitignore 240251c4

Merge branch 'dev_new' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond into dev_new 29efec67

[KM-513][document] add convert to parquet if type file is XLSX and CSV 770f26b1

add to gitignore 1fef470b

[KM-512] connect query executor to user question. add logging for db_executor abc494f9

[NOTICKET] fix delete, now can filter by user f273db05

[NOTICKET] db_executor: CTE DML check now walks entire AST root, schema: cast instead of string interpolation bd2b1d9d

[NOTICKET] fix-revert string change 110ee343

[KM-520] Integrate db query executor pipeline with existing rag retrieve pipeline a25febe2

[KM-516][KM-517] add new feature; ai can now see table & column names that have fk relationship with retrieved result f86da27b

[NOTICKET] fix query now use orchestrator msg, rework db pipeline replace ingestion logic be9bbd9d

[KM-507] now only uses hybrid (cosine and bm25) 40925b45

[NOTICKET]untrack software/ folder (ignored via .gitignore) 0931c10d

[NOTICKET] add pyarrow 432c1fa9

[NOTICKET] add pyarrow 7ff66c9b

[KM-515][document] Make Query for Tabular Type (XLSX & CSV) 36049948

[KM-455][document] decided methods retrieval for document cf77d20e

[KM-533] add table level schema, differentiate with chunk level. expand retrieval result with FK exploration fc1239ae

[KM-533] now also retrieves table level chunk 4150ba7e

fix: fix dedup logic c9d3b337

[NOTICKET] rrf merge now at router level de32ab04

[NOTICKET] minor refactoring e4f62b85

fix: query executor now uses the user question as prompt (previously used the orchestrator output) 0935ede4

fix: increase K in chat endpoint to 10 b59ef76f

feat: add sheet-level chunk on CSV/XLSX ingestion 8daf9b59

fix: 5 bug fixes on tabular executor a49dc1b5

[NOTICKET][doc] fix aggregate count operation when value_col is not specified 00aa61d9

[NOTICKET] now retrieve db tables first, then get column from the obtained tables. reduce k to 5 bb29492a

[NOTICKET][doc] add guard if filename None b7fbaebb

fix fallback to fresh retrieval on corrupted Redis cache 9cb950f7

[NOTICKET][doc] validate embedding vector for NaN/Infinity in manhattan retriever 2167a5bd

[NOTICKET] pass orchestrator search_query to sql executor for multi-turn context 23eeb2d3

[NOTICKET][db] add sheet-level retrieval and focus LLM schema context to retrieved columns a205d0c5

fix: minor returned type if sql writes a limit that exceeds the cap b4df8b1d

[NOTICKET][doc] add sheet-level leg and RRF voting for tabular retrieval 5f86993f

[NOTICKET][doc] remove column filter and fallback cap for full-schema approach 959b1b00

[NOTICKET][doc] correct metadata key path in _format_context 16ab9164

[NOTICKET] add dev dependency group and update gitignore 36ffff42

make executors self-contained, remove redundant pre-filter 73b7fe32

fix sorted ranking so model uses overall sorted retrieved chunks 3e7924d5
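
Several commits above refer to hybrid retrieval (cosine + bm25) and to an RRF merge at the router level. The PR page does not show the implementation, so the following is only a minimal sketch of reciprocal rank fusion over ranked lists of chunk IDs. The function name `rrf_merge`, the chunk IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions for illustration, not values taken from this repository.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_n=5):
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion.

    Hypothetical helper: each ranking is a list of IDs ordered best-first,
    e.g. one list from a dense (cosine) retriever and one from BM25.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns 1 / (k + rank) from every list it appears in.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in fused[:top_n]]


# Illustrative (made-up) results from two retrieval legs.
dense = ["c3", "c1", "c7"]
bm25 = ["c1", "c9", "c3"]
print(rrf_merge([dense, bm25]))  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF uses only ranks, the cosine and BM25 scores do not need to be normalized against each other before merging.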

rhbt6767

DataEyond org 13 days ago

No description provided.
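
One commit in this PR validates embedding vectors for NaN/Infinity in the manhattan retriever. As a rough illustration of what such a guard might look like (the names `is_finite_vector` and `manhattan_distance` are hypothetical, and the actual check in the repository may differ), a sketch using only the standard library:

```python
import math
from typing import Sequence


def is_finite_vector(vector: Sequence[float]) -> bool:
    """Return True only for a non-empty vector with no NaN or +/-Infinity."""
    return len(vector) > 0 and all(math.isfinite(x) for x in vector)


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two embeddings, rejecting invalid input early."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions do not match")
    if not (is_finite_vector(a) and is_finite_vector(b)):
        raise ValueError("embedding contains NaN/Infinity or is empty")
    return sum(abs(x - y) for x, y in zip(a, b))


print(manhattan_distance([1.0, 2.0], [3.0, 0.0]))  # 4.0
```

Rejecting non-finite values before computing distances avoids NaN scores, which do not compare meaningfully and can silently corrupt a ranking.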

rhbt6767 changed pull request title from test to [KM-438][KM-439] Improve Retrieval and Querying feature 13 days ago

merge dev_new to main 9257b7bc

rhbt6767 changed pull request status to open 13 days ago

ishaq101 changed pull request status to merged 13 days ago
