
Even with prompt caching, this adds significant extra time to your vector database creates/updates, right? That may be okay for some use cases, but I'm always wary of adding multiple LLM layers into these kinds of applications. It's nice for the cloud LLM providers, of course.

I wonder how it would work if you generated the contexts yourself algorithmically. Depending on how well-structured your docs are, this could be quite trivial (e.g. for an HTML doc, prepend title > h1 > h2 to each chunk), as in the sketch below.
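
A minimal sketch of that heading-breadcrumb idea, assuming BeautifulSoup, treating each <p> as a chunk; the function name is just for illustration:

```python
from bs4 import BeautifulSoup

def contextualize_chunks(html: str) -> list[str]:
    """Prefix each chunk with its title > h1 > h2 breadcrumb so the
    embedding carries document context without an extra LLM call."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    h1 = h2 = ""
    chunks = []
    for el in soup.find_all(["h1", "h2", "p"]):
        if el.name == "h1":
            # New top-level section: reset the subsection.
            h1, h2 = el.get_text(strip=True), ""
        elif el.name == "h2":
            h2 = el.get_text(strip=True)
        else:
            text = el.get_text(strip=True)
            if not text:
                continue
            breadcrumb = " > ".join(part for part in (title, h1, h2) if part)
            chunks.append(f"{breadcrumb} > {text}" if breadcrumb else text)
    return chunks
```

Because it is pure string manipulation, this runs in milliseconds per document, so the indexing pipeline stays fast; the trade-off versus an LLM-generated context is that the breadcrumb only captures structural position, not a semantic summary of the chunk.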


