One question, if anyone knows the details: does this prove that there exists a single LLM that can approximate any function to arbitrary precision given enough CoT, or does it prove that for every function, there exists a Transformer that fits those criteria?
That is, does this prove that a single LLM can solve any problem, or that for any problem, we can find an LLM that solves it?
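Written out, the two readings differ only in quantifier order. This is a rough sketch, not the paper's statement. The notation is assumed for illustration: M^{(n)} is model M allowed n CoT steps, d is some distance between functions, and ⟨f⟩ is a description of f given in the prompt (a single model needs to be told which function is meant).

```latex
% (A) one model for all functions: "a single LLM can solve any problem"
\exists M \;\; \forall f \;\; \forall \varepsilon > 0 \;\; \exists n :\quad
  d\bigl(M^{(n)}(\langle f \rangle, \cdot),\; f\bigr) < \varepsilon

% (B) one model per function: "for any problem, we can find an LLM that solves it"
\forall f \;\; \exists M_f \;\; \forall \varepsilon > 0 \;\; \exists n :\quad
  d\bigl(M_f^{(n)},\; f\bigr) < \varepsilon
```

(A) implies (B) by taking M_f to be M with ⟨f⟩ fixed in the prompt. The converse does not follow from quantifier order alone.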
If it's possible to find an LLM for any given problem, then find an LLM for the problem "find an LLM for the given problem and then evaluate it", and evaluate that. Then you have a single LLM that can solve any problem.
It's the "Universal Turing Machine" for LLMs.
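A toy sketch of that construction in Python, with ordinary functions standing in for LLMs. Everything here is hypothetical and only for illustration: `SOLVERS`, `find_solver`, and `universal` are made-up names, and the lookup table assumes the step "find an LLM for the problem" is itself computable, which is the part the argument takes for granted.

```python
from typing import Callable, Dict

# Hypothetical per-problem solvers: one specialised "model" per problem.
SOLVERS: Dict[str, Callable[[int], int]] = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}


def find_solver(problem: str) -> Callable[[int], int]:
    """Stand-in for 'find an LLM for the problem' (assumed computable)."""
    return SOLVERS[problem]


def universal(problem: str, x: int) -> int:
    """A single fixed solver: the problem is part of its input.

    It first finds the specialised solver, then evaluates it on x.
    """
    return find_solver(problem)(x)
```

Calling `universal("double", 21)` looks up the doubling solver and applies it, so one function covers every problem in the table. Like a universal Turing machine, it takes the "program" as data rather than having it built in.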
I wonder what the LLM equivalent of the halting problem is.
A closer analogy is Hutter Search (http://hutter1.net/ai/pfastprg.pdf), which is also an algorithm that can solve any problem. And like Hutter Search, this construction is probably too inefficient to use in practice.