The general pattern seems to be that LLM+scaffolding performs better than the LLM alone. In six months' time a new model will incorporate 80% of your scaffolding, but will also enable new capabilities with a new layer of scaffolding.
I suspect the model that doesn't need scaffolding is simply ASI, as in, the AI can build its own scaffolding (aka recursive self-improvement), and build it better than a human can. Until that point, the job will remain figuring out how to eval your frontier task, scaffold around the model's weaknesses, and codify/absorb the domain knowledge that isn't in the training set.
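To make "scaffold around the model's weaknesses" concrete, here is a minimal sketch of the kind of eval-and-retry loop this usually means. It is illustrative only: call_model and passes_eval are hypothetical stand-ins for your model API and your task-specific eval, not any real library.

    def call_model(prompt: str) -> str:
        # Stand-in: replace with a call to whatever LLM API you use.
        return "draft answer"

    def passes_eval(answer: str) -> tuple[bool, str]:
        # Stand-in: replace with your task-specific eval.
        # Returns (ok, feedback).
        return bool(answer.strip()), "empty answer"

    def scaffolded_answer(task: str, max_attempts: int = 3) -> str:
        prompt = task
        answer = ""
        for _ in range(max_attempts):
            answer = call_model(prompt)
            ok, feedback = passes_eval(answer)
            if ok:
                return answer
            # Feed the eval failure back in; this retry loop is exactly
            # the kind of layer a future model might absorb.
            prompt = (f"{task}\n\nPrevious attempt:\n{answer}\n"
                      f"It failed because: {feedback}\nTry again.")
        return answer  # best effort after max_attempts

Everything outside call_model is the scaffolding: it encodes an eval and a recovery strategy the model doesn't yet apply on its own.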
You are talking about context management here; the solution will be something like a proper memory subsystem, maybe with some architectural tweaks to integrate it. There are more obvious gaps beyond that, which we will have to scaffold and then solve in turn.
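As a rough illustration of what a proper memory subsystem would replace, here is a toy external-memory scaffold. The keyword-overlap retrieval is a stand-in for the embedding search a real system would use; the shape of the workaround is the point, not the details.

    class ScratchMemory:
        # Toy external memory: store notes, retrieve the k most
        # relevant by word overlap, and prepend them to the prompt.
        def __init__(self) -> None:
            self.notes: list[str] = []

        def remember(self, note: str) -> None:
            self.notes.append(note)

        def recall(self, query: str, k: int = 3) -> list[str]:
            q = set(query.lower().split())
            ranked = sorted(self.notes,
                            key=lambda n: -len(q & set(n.lower().split())))
            return ranked[:k]

    def with_memory(prompt: str, memory: ScratchMemory) -> str:
        # Inject retrieved notes as context; this is the scaffolding a
        # native memory subsystem would make unnecessary.
        context = "\n".join(memory.recall(prompt))
        return f"Relevant notes:\n{context}\n\nTask:\n{prompt}"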
Another way of thinking about this is just that scaffolding is a much faster way of iterating on solutions than pre-training, or even post-training, and so it will continue to be a valuable way of advancing capabilities.