srean
8 hours ago
In the discussion thread there seems to be a somewhat contentious argument about what a Markov Model is, whether LLMs are one, whether RNNs are one, and so on.
A Markov Model is anything that has state, emits tokens based only on its current state, and undergoes state transitions. The token emissions and state transitions are usually probabilistic -- a statistical/probabilistic analogue of a state machine. The deterministic state machine is the special case where the transition probabilities are degenerate (concentrated at a unique point).
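That abstract definition fits in a few lines of code. A minimal sketch (class and variable names are mine, purely for illustration): state, a state-conditional emission distribution, and a state-conditional transition distribution.

```python
import random

class MarkovModel:
    """State + probabilistic emission + probabilistic transition."""

    def __init__(self, emit_probs, trans_probs, state):
        # emit_probs[state]  -> {token: probability}
        # trans_probs[state] -> {next_state: probability}
        self.emit_probs = emit_probs
        self.trans_probs = trans_probs
        self.state = state

    def step(self):
        # Emit a token based only on the current state...
        dist = self.emit_probs[self.state]
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        # ...then undergo a state transition.
        dist = self.trans_probs[self.state]
        self.state = random.choices(list(dist), weights=list(dist.values()))[0]
        return token

# The deterministic state machine is the degenerate special case:
# every distribution is concentrated on a single point.
dfa = MarkovModel(
    emit_probs={"A": {"a": 1.0}, "B": {"b": 1.0}},
    trans_probs={"A": {"B": 1.0}, "B": {"A": 1.0}},
    state="A",
)
print("".join(dfa.step() for _ in range(6)))  # -> ababab
```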
For "Markov Model" to be a non-vacuous, non-vapid discussion point, however, one needs to specify very precisely the relationships allowed between state and tokens/observations: whether the state is hidden or visible, discrete or continuous, fixed or variable context length, causal or non-causal ...
The simplest such model is one where the state is a specified, computable function of the last k observations. One such simple function is the identity function -- the state then is just the last k tokens. This is called a k-th order Markov chain, and it is a restriction of the bigger class of Markov Models.
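Concretely, a k-th order chain looks like the following sketch (the function name and the toy transition table are made up): the state is literally a sliding window of the last k tokens.

```python
import random
from collections import deque

def sample_k_order(transitions, seed, n, k=2):
    """Generate n tokens from a k-th order Markov chain: the state is
    exactly the last k tokens (the identity function of the context)."""
    state = deque(seed, maxlen=k)   # state = the last k observations
    out = list(seed)
    for _ in range(n):
        dist = transitions[tuple(state)]  # P(next token | last k tokens)
        nxt = random.choices(list(dist), weights=list(dist.values()))[0]
        out.append(nxt)
        state.append(nxt)                 # state transition: slide the window
    return out

# Toy 2nd-order chain over {0, 1}: the next token is the XOR of the
# previous two with probability 1 (again a degenerate, deterministic case).
transitions = {(a, b): {a ^ b: 1.0} for a in (0, 1) for b in (0, 1)}
print(sample_k_order(transitions, seed=[1, 1], n=6))  # -> [1, 1, 0, 1, 1, 0, 1, 1]
```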
One can make the state a specified, computable function of the k previous states and the k most recent tokens/observations. (This is, in essence, an RNN.)
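The RNN correspondence can be sketched in a few lines: the new state is a fixed computable function of the previous state and the most recent observation, only now the state space is continuous. The weights below are made up for illustration, not a trained model.

```python
import math

def rnn_step(state, token_vec, W_s, W_x):
    """One RNN update: the new state is a fixed computable function of the
    previous state and the most recent observation -- the Markov-model
    recipe, with a continuous (vector-valued) state space."""
    return [math.tanh(sum(W_s[i][j] * state[j] for j in range(len(state)))
                      + sum(W_x[i][j] * token_vec[j] for j in range(len(token_vec))))
            for i in range(len(state))]

# Toy 2-d state, 1-d observations; arbitrary illustrative weights.
W_s = [[0.5, -0.1], [0.2, 0.3]]
W_x = [[1.0], [-1.0]]
state = [0.0, 0.0]
for x in [1.0, 0.0, 1.0]:
    state = rnn_step(state, [x], W_s, W_x)
print(state)  # the whole history is summarized in this fixed-size state
```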
The functions may be specified only up to a class of computable functions, finite or infinite in size. They may also be stochastic, in the sense that they define only the state transition probabilities.
You can make the context length itself a computable function of the k most recent observations (so contexts can vary in length), but you have to ensure that the contexts are always full for the model to be well defined.
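A variable-length context can be sketched like this (the boundary rule is a hypothetical example of mine): the order k is computed from the recent observations, and padding with a sentinel keeps every context full.

```python
def context(tokens):
    """State = the last k tokens, where k itself is a computable function of
    the most recent observations. Hypothetical rule: a shorter context
    right after a sentence boundary, a longer one otherwise."""
    k = 1 if tokens[-1] == "." else 3
    # For the model to be well defined the context must always be full,
    # so the start of the sequence is padded with a sentinel token.
    padded = ["<s>"] * 3 + list(tokens)
    return tuple(padded[-k:])

print(context(["the", "cat", "sat"]))  # -> ('the', 'cat', 'sat')
print(context(["the", "cat", "."]))   # -> ('.',)
```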
The context length can also be a computable function of both the l most recent states and the k most recent observations.
Crazier ones emit more than one token from the current state.
On and on.
Not all Markov Models are learnable.