I feel like "let's think about it step by step" is a bit of a hack. To circumvent the fact that there's no external loop, you exploit the fact that the model gets re-run on every token, so you can store a bit of state in the tokens it's already generated.
Or am I misunderstanding something about that technique?
You are right, it's sometimes called zero-shot chain of thought, but it's a way of getting the type of thing you are describing to happen. The LLM somehow processes things in a perceived step-by-step fashion and gets a much improved answer. Whether it's an external loop or an LLM-imposed internal loop, does it matter? Are our own minds looping or just adding tokens?
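To make that concrete, here's a minimal sketch in Python. It assumes an OpenAI-style client; the model name is a placeholder, and llm() is just a thin wrapper so the later sketches in this thread read cleanly:

    # A minimal sketch, assuming an OpenAI-style chat API; the model name is a
    # placeholder. The only change between the two prompts is the appended cue.
    from openai import OpenAI

    client = OpenAI()

    def llm(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    question = (
        "A bat and a ball cost $1.10 together. The bat costs $1.00 more "
        "than the ball. How much does the ball cost?"
    )

    plain_answer = llm(question)
    cot_answer = llm(question + "\nLet's think about it step by step.")
    # The second prompt tends to make the model walk through the arithmetic
    # before committing to an answer, which is where the improvement comes from.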
Yeah, that's true, and I do believe there's a good chance our minds are perpetually adding tokens. But our minds also have an efficient and effective way of dealing with the context cutoff. We don't have (or don't experience) a hard cutoff of our memory context. Instead, tokens are increasingly lossily compressed as they age out of our memory, with the degree of lossiness based both on time passed and on some fancy value function. Combine that with a "system" (or trained/fine-tuned in) prompt that motivates the LLM to reason in a way conducive to working with that kind of memory, and you'd have a sort of single-shot AGI system. Where "single-shot" is lying a bit, because it's really just infinitely looping.
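Very hand-wavy, but I'm imagining something in the shape of this sketch (reusing the llm() wrapper from the earlier snippet; the scoring formula is completely made up):

    # Sketch: memory entries are compressed harder the older and less "valuable"
    # they are, rather than being hard-truncated at a context limit.
    from dataclasses import dataclass

    @dataclass
    class Memory:
        text: str
        age: int      # turns since this entry was added
        value: float  # 0..1, from some importance heuristic / value function

    def compress(mem: Memory) -> str:
        # Character budget shrinks with age, more slowly for high-value memories.
        budget = int(len(mem.text) / (1 + mem.age * (1 - mem.value)))
        if budget >= len(mem.text):
            return mem.text  # recent or important: keep verbatim
        return llm(f"Summarise this in at most {budget} characters:\n{mem.text}")

    def build_context(memories: list[Memory], new_input: str) -> str:
        return "\n".join(compress(m) for m in memories) + "\n" + new_input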
I guess from that perspective it might make sense to test whether such a thing is already happening in current LLMs, and my dismissive attitude stems from the fact that I've played with them enough to know that they currently don't do it.
The main thing that may matter, at least in the short run, is that an external loop allows us to inject additional steps by applying heuristics and allowing tool use. E.g. we can let the LLM "realise" there are errors in its code and have it continue from an injected "thought" about making sure to fix the errors from the compiler before presenting its output, or "remembering" that it needs test cases, etc.
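In sketch form (llm() as before; run_compiler() is a stand-in for whatever compiles the generated code and returns its error output, empty on success):

    # Sketch of an external loop: compile the model's code, and if there are
    # errors, inject them back as a "thought" so the model fixes them first.
    def code_loop(task: str, max_rounds: int = 5) -> str:
        transcript = task
        code = ""
        for _ in range(max_rounds):
            code = llm(transcript)
            errors = run_compiler(code)  # stand-in; "" means it compiled cleanly
            if not errors:
                return code
            transcript += (
                f"\n{code}\n"
                "Thought: the compiler reported errors; I should fix them "
                f"before presenting my output.\n{errors}\n"
            )
        return code  # best effort after max_rounds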
We can also potentially add longer-term memory - summarise the context, judge which parts are important and stuff them in a vector store, and now and again swap similar pieces of past context back in.
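Again just a sketch; embed() and vector_store are stand-ins for whatever embedding model and store you'd actually use:

    # Sketch of longer-term memory: summarise old context into a vector store,
    # and swap similar pieces of past context back in when they look relevant.
    def remember(old_context: str) -> None:
        summary = llm(f"Summarise the important parts of:\n{old_context}")
        vector_store.add(embedding=embed(summary), payload=summary)

    def recall(new_input: str, k: int = 3) -> list[str]:
        return vector_store.search(embedding=embed(new_input), top_k=k)

    def build_prompt(new_input: str, recent_context: str) -> str:
        recalled = "\n".join(recall(new_input))
        return f"Possibly relevant past notes:\n{recalled}\n\n{recent_context}\n{new_input}"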
But of course it's not either/or - better prompting to get the LLMs to do better from the start doesn't compete with then feeding that into an external loop as well.