Physics in Next-Token Prediction

22 points, posted 15 hours ago
by Anon84

4 Comments

AIorNot

8 hours ago

Can anyone with a physics and ML/DL background analyze this? Any insights?

sigmoid10

7 hours ago

Just as a preface: This paper is an extremely basic derivation that totally ignores all current architectures and training algorithms. If someone had actually done this for a realistic, modern model, it would be amazing - but that is extremely challenging mathematically and I don't think anyone has come close to cracking that.

So their main proposal makes no assumptions about the model and essentially says that any model "absorbs" an amount of information equal to the difference between the training data set's entropy and the final cross-entropy loss after training. They state that this information difference must have gone somewhere, and therefore it must have been encoded in the model. They frame this in terms of transmission, but it seems vague enough to be generally applicable. So nothing too crazy here.
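The bookkeeping argument described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual derivation: the helper name and all numbers below are made up, and it assumes both quantities are measured in nats per token.

```python
import math

def absorbed_info_per_token(data_entropy, final_ce_loss):
    """Hypothetical helper: information the argument says the model
    must have encoded, in nats per token."""
    return data_entropy - final_ce_loss

# Made-up numbers: suppose the per-token entropy of the data is
# ~4.5 nats and the trained model reaches a cross-entropy loss of
# 3.2 nats per token. The gap is what the model "absorbed".
gap = absorbed_info_per_token(4.5, 3.2)

# Converted to bits, and scaled over a hypothetical 1B-token corpus:
total_bits = gap / math.log(2) * 1e9
```

The point of the sketch is only that the claim is pure accounting: the gap is positive whenever the model's loss dips below the raw data entropy, and the argument attributes that difference to information stored in the weights.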

But: This feels pretty similar to MOND research in physics, in the sense that it would have been super cool if someone had predicted this before we knew all those laws from empirics. But since it came out post factum, it leaves the bitter taste of someone just trying to mold their world view to the available data.

rob_c

9 hours ago

Was hoping for something slightly stronger, and I can't help but feel put off by a big oversized box on the first page with some quantities that feel more "derived post-facto" in practice.

Maybe very useful, but still feels more qualitative than quantitative, or maybe I'm missing something (wouldn't shock me)...