My advice: use the same rigor for a RAG application as for any other software development. Have a test suite (of, say, 100 cases) that specifies the correct response for each question. Use an LLM judge to score each output of the RAG system against those expected responses. Then iterate until you reach a score of around 85. Every change to prompts or strategy should trigger this check, ensuring that the score of 85 is always maintained.
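A minimal sketch of that harness, assuming scores on a 0-100 scale and the 85 threshold from the comment. The names `EvalCase`, `run_eval`, `gate` and `overlap_judge` are hypothetical. `overlap_judge` is a toy word-overlap scorer standing in for the LLM judge so the example runs offline; a real setup would swap in a function that prompts an LLM to grade each answer against the reference.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    question: str
    expected: str  # reference answer for this question


# A judge takes (question, expected, actual) and returns a score in [0, 1].
# In practice this would wrap an LLM call; here it is pluggable.
Judge = Callable[[str, str, str], float]


def run_eval(cases: List[EvalCase], rag: Callable[[str], str], judge: Judge) -> float:
    """Run every case through the RAG system, judge it, and return the mean on a 0-100 scale."""
    if not cases:
        raise ValueError("eval suite is empty")
    total = 0.0
    for case in cases:
        actual = rag(case.question)
        score = judge(case.question, case.expected, actual)
        # Clamp so a misbehaving judge cannot push the average out of range.
        total += min(max(score, 0.0), 1.0)
    return 100.0 * total / len(cases)


def gate(score: float, threshold: float = 85.0) -> bool:
    """Return True if the suite score meets the regression threshold."""
    return score >= threshold


def overlap_judge(question: str, expected: str, actual: str) -> float:
    """Toy stand-in for an LLM judge: fraction of expected words found in the answer."""
    want = set(expected.lower().split())
    got = set(actual.lower().split())
    return len(want & got) / len(want) if want else 1.0


if __name__ == "__main__":
    cases = [
        EvalCase("What is the capital of France?", "Paris"),
        EvalCase("Who wrote Hamlet?", "William Shakespeare"),
    ]
    # Hypothetical RAG system, stubbed as a lookup table for the demo.
    answers = {
        "What is the capital of France?": "Paris",
        "Who wrote Hamlet?": "Shakespeare",
    }
    score = run_eval(cases, lambda q: answers.get(q, ""), overlap_judge)
    print(f"score={score:.1f} pass={gate(score)}")  # score=75.0 pass=False
```

To make every prompt or strategy change trigger the check, a CI job could run the suite and exit with a nonzero status whenever `gate` returns False.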