How does that make sense? LLMs are machines that produce output from input; the position and distribution of that input in the latent space is highly predictive of the output. It seems fairly uncontroversial to expect that some knowledge of the tokens and how each contributes to that distribution in combination with the others, plus some intuition for the multivariate nonlinear behavior of the hidden layers, is exactly what lets you use this machine for anything useful.
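To make that concrete, here's a minimal sketch (assuming the HuggingFace transformers library and GPT-2 as a stand-in model; the prompt strings are just illustrative) showing that two phrasings of the same request yield measurably different next-token distributions:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def top_next_tokens(prompt, k=5):
        # Get the model's probability distribution over the next token.
        ids = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        top = torch.topk(probs, k)
        return [(tok.decode(int(i)), p.item())
                for i, p in zip(top.indices, top.values)]

    # Same underlying question, different token choices: the
    # next-token distributions diverge noticeably.
    print(top_next_tokens("The capital of France is"))
    print(top_next_tokens("france capital"))

Same idea, different tokens, different distribution over outputs. That sensitivity is exactly why knowing how to phrase the input matters.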

Regular people type all sorts of shit into Google, but power users know how to query it effectively and work with the system. Knowing the right keywords is often half the work. I don't understand how the architecture of current LLMs is going to work around that feature.
