
LLMs don't "read" text sequentially, right?


The causal masking means future tokens don’t affect previous tokens’ embeddings as they evolve throughout the model, but all tokens are processed in parallel… so, yes and no. See this previous HN post (https://news.ycombinator.com/item?id=45644328) about how bidirectional encoders are similar to diffusion’s non-linear way of generating text. Vision transformers use bidirectional encoding because of the non-causal nature of image pixels.
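
A minimal sketch of what that masking looks like (my own NumPy illustration, not code from the linked post). Every position is computed in the same matrix multiplies, in parallel; the mask just blocks attention to later positions:

    import numpy as np

    def causal_attention(x, Wq, Wk, Wv):
        # x: (seq_len, d_model); all rows go through the same matmuls at once,
        # but the mask stops position i from attending to positions j > i.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores[future] = -np.inf                           # hide future tokens
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                 # row-wise softmax
        return w @ v                                       # (seq_len, d_model)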


Didn’t Anthropic show that the models engage in a form of planning, such that predicting possible future tokens affects prediction of the next token? https://transformer-circuits.pub/2025/attribution-graphs/bio...


Sure, an LLM can start "preparing" for token N+4 at token N. But that doesn't change the fact that token N can't "see" token N+1.

Causality is enforced in LLMs - past tokens can affect future tokens, but not the other way around.
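
A quick toy check of that property (a sketch of my own, not from any paper): perturb a later token and the outputs at earlier positions don't move.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8
    Wq, Wk, Wv = rng.normal(size=(3, d, d))   # toy projection matrices

    def causal_attention(x):
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(d)
        scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -np.inf
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return (w / w.sum(axis=-1, keepdims=True)) @ v

    x = rng.normal(size=(5, d))
    x_perturbed = x.copy()
    x_perturbed[4] += 10.0                    # change the last ("future") token
    print(np.allclose(causal_attention(x)[:4],
                      causal_attention(x_perturbed)[:4]))   # True: positions 0-3 unaffected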


If the attention is masked, then yes they do.



