At the base level, LLMs aren't actually deterministic in practice. The model weights are floats of limited precision, and floating-point arithmetic isn't associative, so when parallel kernels accumulate values in a different order from run to run the results can round differently. At a large enough scale (enough parameters, a large enough model) those rounding differences effectively behave randomly and alter the output.
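To make that concrete, here's a toy illustration (mine, not from the original comment): floating-point addition isn't associative, so the same numbers combined in a different order can round to different results, which is exactly the kind of drift that shows up when parallel kernels change their accumulation order.

```python
# Floating-point addition is not associative: grouping the same values
# differently can change the rounded result.
a, b, c = 0.1, 0.2, 0.3

print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```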
Even with a temperature of zero, floating-point rounding, probability ties, MoE routing, and other factors make outputs not fully deterministic, even between multiple runs with identical contexts/prompts.
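As a rough sketch of the probability-tie problem (the token names and logit values here are made up): greedy decoding at temperature zero just takes the argmax over the logits, so a rounding-level difference between two near-tied candidates is enough to flip which token gets emitted.

```python
def greedy_pick(logits):
    # Temperature 0 == take the single highest-logit token.
    return max(logits, key=logits.get)

# Same prompt, two runs whose kernels accumulated in a different order:
logits_run_a = {"happy": 12.300000001, "unhappy": 12.3}
logits_run_b = {"happy": 12.3, "unhappy": 12.300000001}

print(greedy_pick(logits_run_a))  # "happy"
print(greedy_pick(logits_run_b))  # "unhappy"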
In theory you could construct a fully deterministic LLM, but I don't think any are deployed in practice. Because there are so many places where behavior is effectively non-deterministic, the system as a whole can't be thought of as deterministic.
Errors might be completely innocuous, like one token substituted for another with the same semantic meaning. An error might also completely change the semantic meaning of the output with only a single-token change, like an "un-" prefix added to a word.
The non-determinism is both technically real and practically relevant.
Most floating point implementations have deterministic rounding. The popular LLM inference engine llama.cpp is deterministic when using the same sampler seed, hardware, and cache configuration.
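For example, here's a minimal sketch using the llama-cpp-python bindings (the model path is a placeholder, and the exact parameter names should be checked against the installed version): with a fixed seed, temperature zero, and the same hardware and cache setup, repeated runs should reproduce the same completion on a given machine.

```python
from llama_cpp import Llama

# Fixed sampler seed at construction; model path is a placeholder.
llm = Llama(model_path="./model.gguf", seed=42)

# Same prompt, greedy decoding (temperature 0) on the same machine.
out1 = llm("The capital of France is", max_tokens=8, temperature=0.0)
out2 = llm("The capital of France is", max_tokens=8, temperature=0.0)

print(out1["choices"][0]["text"] == out2["choices"][0]["text"])  # expected: True
```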