Hacker News

Is it, though? Apparently the current best practice is just to allow the LLM untethered access to everything and try to control access by preventing prompt injection...


Well it took me 2 full-time weeks to properly implement a RAG-based system so that it found actually relevant data and did not hallucinate. Had to:

- write an evaluation pipeline to automate quality testing

- add a query rewriting step to explore more options during search

- add hybrid BM25 + vector search with proper rank fusion

- tune all the hyperparameters for best results (like the BM25 vs. vector weighting, how many documents to retrieve for analysis, and how to chunk documents based on semantics)

- parallelize the search pipeline to decrease wait times

- add moderation

- add a reranker to find best candidates

- add background embedding calculation of user documents

- iron out lots of failure cases so that the prompt worked for most inputs

There's no "just give the LLM all the data"; it's more complex than that, especially if you want the best results and full control of the data (we run all of this on open-source models because the user data is under NDA).
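For context on the "proper rank fusion" step above: a common way to combine the BM25 and vector rankings is reciprocal rank fusion (RRF). This is a minimal sketch, not the author's actual implementation; the doc IDs, toy rankings, and k=60 constant are illustrative.

```python
# Minimal sketch of reciprocal rank fusion (RRF) for merging a BM25
# ranking with a vector-search ranking. Doc IDs are hypothetical.

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each doc scores sum(1 / (k + rank)) over every list it appears in;
    the constant k damps top ranks so no single list dominates.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # lexical ranking
vector_hits = ["doc1", "doc9", "doc3"]  # embedding ranking
fused = rrf_fuse([bm25_hits, vector_hits])
# doc1 ranks first: it places highly in both lists.
```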


Sounds like you vibe coded a RAG system in two weeks, which isn't very hard. Any startup can do it.

I've spent two weeks debugging a single difficult bug before; a whole feature that takes two weeks is an easy feature to build.


I already had experience with RAG, so I had a head start. You're right that it's not rocket science, but it's not just "press F to implement the feature" either.

P.S. No vibe coding was used. I only used LLM-as-a-judge to automate quality testing when tuning the parameters, before passing it to human QA
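An LLM-as-a-judge check like the one mentioned here can be sketched roughly as below. The judge call is injected as a plain callable so any backend (or a stub) works; the prompt wording, 1-5 scale, and threshold are illustrative assumptions, not the author's setup.

```python
# Sketch of LLM-as-a-judge for grading a RAG answer against its
# retrieved context. `complete` is any prompt -> text callable
# (an LLM API wrapper in practice, a stub in tests).

JUDGE_PROMPT = """\
You are grading a RAG answer. Context:
{context}

Question: {question}
Answer: {answer}

Rate faithfulness to the context from 1 (fabricated) to 5 (fully
grounded). Reply with the number only."""

def judge_answer(complete, context, question, answer, threshold=4):
    """Return (score, passed) for one question/answer pair."""
    reply = complete(JUDGE_PROMPT.format(
        context=context, question=question, answer=answer))
    score = int(reply.strip().split()[0])  # parse the leading number
    return score, score >= threshold

# Usage with a stub judge; a real run would wrap a model API call:
score, passed = judge_answer(lambda prompt: "5", "ctx", "q?", "a")
```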


"did not hallucinate"

Sorry to nitpick, but this is not technically possible no matter how much RAG you throw at it. I assume you just mean "hallucinates a lot less"


You're right, bad wording


whoa, two weeks


@apwell23 while the author didn’t say how s/he measured QA, creating the QA process was literally the first bullet.


You still need to find the correct data and get it to the LLM. IMO, a lot of it is data engineering work with API calls to an LLM as an extra step. I'm currently doing a lot of ETL work with Airflow (plus whatever data {warehouses, lakes, bases} are needed) to get the right data into a prompt engineering flow. The prompt engineering flow is literally a for loop over Google Docs in a Google Drive that non-technical people, who are domain experts in their field, can access.

It's up to the domain experts and me to understand where giving it data will tone down the hallucinated nonsense an LLM puts out, and where we should not give data because we need the problem-solving skills of the LLM itself. A similar process applies to tool use, which in our case means pre-selected Python scripts that it is allowed to run.


can you describe what the usecase is ?



