
This sounds an awful lot like the old Markov chains we used to write for fun in school. Is the difference really just scale? There has got to be more to it.


They're Markov chain generators with weighting that looks many tokens back and, based on a training corpus, assigns higher weight ("attention") to tokens that are more likely to significantly influence the probability of later tokens ("evolutionary" might get greater weight than "the", for instance, though to be clear tokens aren't necessarily the same as words). They then smear those various weights together before rolling the newly-weighted dice to come up with the next token.
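
Roughly, in toy numpy terms (purely a sketch: real models use learned embeddings, many layers and attention heads, and vocabularies of tens of thousands of tokens, and every number and name below is made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy vocabulary and random "learned" weights, standing in for a trained model.
    vocab = ["the", "evolutionary", "biologist", "studied", "finches"]
    d = 8                                      # embedding dimension
    E = rng.normal(size=(len(vocab), d))       # token embeddings
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    W_out = rng.normal(size=(d, len(vocab)))   # context vector -> vocab logits

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def next_token_distribution(context_ids):
        X = E[context_ids]                     # embeddings of the context so far
        q = X[-1] @ Wq                         # query from the most recent token
        K = X @ Wk                             # keys for every context token
        V = X @ Wv                             # values for every context token
        attn = softmax(q @ K.T / np.sqrt(d))   # "attention": one weight per earlier token
        context_vec = attn @ V                 # smear the weighted values together
        return softmax(context_vec @ W_out)    # probabilities over the whole vocabulary

    context = [vocab.index("the"), vocab.index("evolutionary")]
    probs = next_token_distribution(context)
    next_id = rng.choice(len(vocab), p=probs)  # roll the newly-weighted dice
    print(vocab[next_id])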

Throw in some noise-reduction that disregards too-low probabilities, and that's basically it.
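
The noise-reduction bit is usually top-k or nucleus ("top-p") sampling. A quick sketch of the top-p flavour, reusing the numpy import and the probs array from above (the 0.9 cutoff is just a typical default, nothing canonical):

    def nucleus_filter(probs, top_p=0.9):
        # Keep the smallest set of highest-probability tokens whose total reaches
        # top_p, zero out the long tail, and renormalize what's left.
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        keep = order[:cutoff]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        return filtered / filtered.sum()

    next_id = rng.choice(len(probs), p=nucleus_filter(probs))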

This dials down the usual chaos of Markov chains, and makes their output far more convincing.

Yes, that's really what all this fuss is about. Very fancy Markov chains.


You can think of an autoregressive LLM as a Markov chain, sure. It's just sampling from a much more sophisticated distribution than the ones you wrote for fun. That by itself is not much of an argument against LLMs, though.
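
The "for fun" kind conditions on a tiny fixed window via a lookup table, something like this toy bigram chain (made-up corpus, obviously):

    import random
    from collections import Counter, defaultdict

    # The school-project version: count which token follows which, then sample
    # from those counts.
    def train_bigram_chain(tokens):
        counts = defaultdict(Counter)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
        return counts

    def sample_next(counts, prev):
        choices, weights = zip(*counts[prev].items())
        return random.choices(choices, weights=weights)[0]

    chain = train_bigram_chain("the cat sat on the mat and the cat slept".split())
    print(sample_next(chain, "the"))  # only ever conditions on one previous token

An LLM swaps that lookup table for a neural net conditioned on thousands of previous tokens, but the sample-a-token, append it, repeat loop is the same.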



