
I always like to think of LLMs as Markov models in the same way that real-world computers are finite state machines. It's technically true, but not a useful level of abstraction at which to analyze them.

Both LLMs and n-gram models satisfy the Markov property, and you could in principle go through and compute explicit transition matrices (something on the order of vocab_size*context_size, I think). But LLMs aren't trained as n-gram models, so beyond giving you autoregressiveness, there's not really much you learn by viewing them as Markov models.



> Both LLMs and n-gram models satisfy the Markov property, and you could in principle go through and compute explicit transition matrices (something on the order of vocab_size*context_size, I think).

Isn’t it actually (vocab_size)^(context_size)?


Yes, you're right. I typed "**" (exponentiation), but HN ate the second star since I forgot to escape it.
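
A minimal sketch of the point, with toy numbers I made up (vocab_size=4, context_size=3): each Markov state is a full context window, so there are vocab_size**context_size states, and any fixed-window next-token model induces an explicit transition matrix over them. toy_next_token_probs below is a hypothetical stand-in for an actual LLM forward pass, not anything from the thread.

    import itertools
    import numpy as np

    vocab_size = 4     # assumed toy vocabulary: tokens 0..3
    context_size = 3   # assumed toy context window

    # Each Markov state is one full window of context_size tokens.
    states = list(itertools.product(range(vocab_size), repeat=context_size))
    assert len(states) == vocab_size ** context_size  # 4**3 = 64 states

    def toy_next_token_probs(window):
        # Hypothetical stand-in for an LLM forward pass; here just uniform.
        return np.full(vocab_size, 1.0 / vocab_size)

    # From window (t1, ..., tn) the only reachable states are (t2, ..., tn, next),
    # weighted by the model's next-token probabilities.
    state_index = {s: i for i, s in enumerate(states)}
    T = np.zeros((len(states), len(states)))
    for s in states:
        probs = toy_next_token_probs(s)
        for nxt, p in enumerate(probs):
            T[state_index[s], state_index[s[1:] + (nxt,)]] += p

    # Rows sum to 1, as they must for a Markov transition matrix.
    assert np.allclose(T.sum(axis=1), 1.0)
    print(T.shape)  # (64, 64); already astronomical for realistic vocab/context sizes

The state count is what blows up: even a 50k-token vocab with a 2k context gives 50000**2000 states, which is why "it's a Markov chain" is true but not something you can actually compute with.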



