Physics in Next-Token Prediction

22 points, posted 15 hours ago
by Anon84

4 Comments

AIorNot

8 hours ago

Can anyone with a physics and ML/DL background analyze this? Any insights?

sigmoid10

7 hours ago

Just as a preface: This paper is an extremely basic derivation that totally ignores all current architectures and training algorithms. If someone had actually done this for a realistic, modern model, it would be amazing - but that is extremely challenging mathematically and I don't think anyone has come close to cracking that.

So their main proposal makes no assumptions about the model and essentially says that any model "absorbs" an amount of information equal to the difference between the training data set's entropy and the final cross-entropy loss after training. They state that this information difference must have gone somewhere, and therefore it must have been encoded in the model. They frame this in terms of transmission, but it seems vague enough to be generally applicable. So nothing too crazy here.
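The bookkeeping argument described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual derivation: the helper name and all numbers below are made up, and it assumes both quantities are measured in nats per token.

```python
import math

def absorbed_info_per_token(data_entropy, final_ce_loss):
    """Hypothetical helper: information the argument says the model
    must have encoded, in nats per token."""
    return data_entropy - final_ce_loss

# Made-up numbers: suppose the per-token entropy of the data is
# ~4.5 nats and the trained model reaches a cross-entropy loss of
# 3.2 nats per token. The gap is what the model "absorbed".
gap = absorbed_info_per_token(4.5, 3.2)

# Converted to bits, and scaled over a hypothetical 1B-token corpus:
total_bits = gap / math.log(2) * 1e9
```

The point of the sketch is only that the claim is pure accounting: the gap is positive whenever the model's loss dips below the raw data entropy, and the argument attributes that difference to information stored in the weights.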

But: This feels pretty similar to MOND research in physics, in the sense that it would have been super cool if someone had predicted this before we knew all those laws from empirics. But since it came out post factum, it leaves the bitter taste of someone just trying to mold their world view to the available data.

rob_c

9 hours ago

Was hoping for something slightly stronger, and I can't help but feel put off by a big oversized box on the first page with some quantities that feel more "derived post-facto" in practice.

Maybe very useful, but still feels more qualitative than quantitative, or maybe I'm missing something (wouldn't shock me)...