
Super noob in vector embeddings: I never considered that tables would be a complicating factor (beyond defining them in a parseable format for ingestion).

Do vector databases do better with long grouped text vs table formats?





The issue is the ingestion (extracting the right data in the right format). This is mainly an issue with PDFs, and sometimes with DOCX files too when tables are embedded as images. You need a mix of text extraction and OCR to get the data out correctly before you start chunking and computing embeddings.
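
To make the "right format" part concrete, here is a minimal sketch of what could happen after extraction. It assumes the PDF/OCR step has already produced the table as a header plus a list of rows (that step is not shown), and the function names `table_to_row_texts` and `chunk_texts` are made up for illustration. The idea is to turn each row into a self-contained "column: value" line so a chunk still makes sense without the rest of the table, then pack whole rows into chunks instead of cutting a row in half.

```python
from typing import List


def table_to_row_texts(header: List[str], rows: List[List[str]], caption: str = "") -> List[str]:
    """Serialize each table row as a standalone 'column: value' line."""
    texts = []
    for row in rows:
        # Pair every cell with its column name; skip empty cells.
        pairs = [f"{col}: {val}" for col, val in zip(header, row) if val.strip()]
        prefix = f"{caption} | " if caption else ""
        texts.append(prefix + "; ".join(pairs))
    return texts


def chunk_texts(texts: List[str], max_chars: int = 200) -> List[str]:
    """Greedily pack whole row texts into chunks without splitting a row."""
    chunks, current = [], ""
    for text in texts:
        candidate = text if not current else current + "\n" + text
        if current and len(candidate) > max_chars:
            # Adding this row would exceed the budget: close the chunk.
            chunks.append(current)
            current = text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical table, as it might come out of a PDF/OCR extraction step.
header = ["Region", "Q1 revenue", "Q2 revenue"]
rows = [["EMEA", "1.2M", "1.4M"], ["APAC", "0.8M", ""]]

row_texts = table_to_row_texts(header, rows, caption="Revenue by region")
chunks = chunk_texts(row_texts, max_chars=60)

for chunk in chunks:
    print(chunk)
    print("---")
```

Each resulting chunk is what you would hand to whatever embedding model you use. Whether this row-per-line format retrieves better than the raw table text depends on your data and model, so treat it as a starting point to test rather than a rule.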


