
This is a solved problem. The answer is to add extra relevant information to the context as part of answering the user's prompt.

This is sometimes called RAG, for Retrieval Augmented Generation.

These days the most convincing way to do this is via tool calls.

Provide your LLM harness with a tool for running searches, and tell it to use that tool any time it needs additional information.

A good "reasoning" LLM like GPT-5 or Claude 4 can even handle contradictory pieces of information - they can run additional searches if they get back confusing results and work towards a resolution, or present "both sides" to the user if they were unable to figure it out themselves.



Interesting, thanks.


[flagged]


Three years of researching and writing about this and related topics.


[flagged]


I never said I invented them (although, amusingly, for a few brief hours back in January 2023 I thought I had - https://simonwillison.net/2023/Jan/13/semantic-search-answer...)

What makes you say this isn't a solved problem?


The fact that adding information to support a query at inference time comes with no guarantee that the output will be limited to that information. Beyond my 15 years of natural language processing experience, my first-hand experience with that technique is non-deterministic output, because the math simply requires it to be. I know we mostly trade in anecdotes here, but the studies (perhaps best thought of as more rigorous collections of anecdotes), like the BBC studies on summaries, show that these models cannot summarize, only synthesize. So while I can't rule out that it may be possible some day, or even that it works for some people in some contexts most of the time, there is no study to point to that shows anything resembling a "solution." Perhaps you meant something like "there is a prevailing technique" rather than that it is a solved problem.


Maybe "solved problem" is an overly strong statement here. I was responding to:

> Are there any effective ways to add extra knowledge to an LLM, ways that are more than just demos or proofs of concept?

Adding to the context is certainly "effective" and more than just a proof-of-concept/demo. There are many production systems out there now using context-filling tools, most notably GPT-5 with search itself.

I do think it's only recently (this year) that models got reliable enough at this to be useful. For me, o3 was the first model that seemed strong enough at selecting and executing search tools for this trick to really start to shine.


Since this is an anecdote, it is unfalsifiable and can't support those softened claims either. The endless possibilities for incorrect contexts, inferred from merely incomplete or adjacent contexts, undermine any ability to manage information quantity versus quality. I'll offer my own unfalsifiable anecdote, though: all of this is just naming a new rug to sweep the problems under, and it feels to me like the kind of problem that, if we knew how to solve it, we wouldn't be using these models at all.



