MarkusQ
5 hours ago
LLMs include mechanisms (notably attention) that capture longer-range correlations than you could get with a similarly sized Markov chain. If you squint hard enough, though, they are Markov chains (the next token depends only on what is in a finite context window) with this "one weird trick" that makes them much more effective for their size.
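To make the "squint" concrete, here is a toy sketch of an explicit order-k Markov chain over tokens. It is purely illustrative: the function names (`train_markov`, `next_token_probs`, `table_size_upper_bound`) and the tiny corpus are made up for this example, and nothing here reflects how an actual LLM is implemented. The point is only that the prediction depends on the last k tokens and nothing else, and that an explicit table for this can need up to V^k rows for a vocabulary of size V.

```python
from collections import Counter, defaultdict

def train_markov(tokens, k):
    """Count next-token frequencies for every length-k context seen in tokens."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - k):
        context = tuple(tokens[i:i + k])
        table[context][tokens[i + k]] += 1
    return table

def next_token_probs(table, history, k):
    """Next-token distribution using only the last k tokens of history (k >= 1)."""
    counts = table.get(tuple(history[-k:]), Counter())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}

def table_size_upper_bound(vocab_size, k):
    """An explicit order-k table can need up to vocab_size**k rows."""
    return vocab_size ** k

# Hypothetical toy corpus, whitespace-tokenized.
tokens = "the cat sat on the mat and the cat ran".split()
table = train_markov(tokens, k=2)

# Two different histories that share the same last two tokens.
probs_a = next_token_probs(table, ["on", "the", "cat"], k=2)
probs_b = next_token_probs(table, ["and", "the", "cat"], k=2)
```

Both `probs_a` and `probs_b` come out as `{'sat': 0.5, 'ran': 0.5}`, because anything before the last two tokens is invisible to the model. Scaling this lookup-table approach to a context of thousands of tokens is hopeless (the row count grows as V^k), which is roughly where the "one weird trick" comes in: a learned function replaces the table.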