From tokens to thoughts: How LLMs and humans trade compression for meaning

124 points, posted 8 months ago
by ggirelli

3 Comments

valine

8 months ago

>> For each LLM, we extract static, token-level embeddings from its input embedding layer (the ‘E’ matrix). This choice aligns our analysis with the context-free nature of stimuli typical in human categorization experiments, ensuring a comparable representational basis.

They're analyzing input embedding matrices, not LLMs. I'm not sure how the authors justify making claims about the inner workings of LLMs when they haven't actually computed a forward pass. The E matrix is not an LLM, it's a lookup table.

Just to highlight the ridiculousness of this research, no attention was computed! Not a single dot product between keys and queries. All of their conclusions are drawn from the output of an embedding lookup table.
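To make the distinction concrete, here's a minimal sketch using Hugging Face Transformers (the checkpoint name is just an illustrative choice, not the paper's setup): reading a row of the E matrix is a plain table lookup, while only a real forward pass would exercise the attention layers.

```python
# Sketch, not the paper's code: contrasts a static embedding lookup
# with an actual forward pass. The checkpoint name is illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any HF checkpoint behaves the same way here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# What analyzing the input embedding layer amounts to: indexing the E matrix.
E = model.get_input_embeddings().weight              # shape: (vocab_size, hidden_dim)
robin_id = tokenizer.convert_tokens_to_ids("robin")  # may map to [UNK] for some vocabularies
static_vec = E[robin_id]                             # pure lookup; no attention computed

# What would actually involve the LLM: a forward pass through every layer,
# including all the query/key dot products.
inputs = tokenizer("a robin is a bird", return_tensors="pt")
with torch.no_grad():
    contextual = model(**inputs).last_hidden_state   # attention is computed here
```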

The figure showing their alignment score correlated with model size is particularly egregious. Model size is meaningless when you never activate any of the model's parameters. If BERT is outperforming Qwen and Gemma, something is wrong with your methodology.

johnnyApplePRNG

8 months ago

This paper is interesting, but ultimately it's just restating that LLMs are statistical tools and not cognitive systems. The information-theoretic framing doesn’t really change that.

andoando

8 months ago

Am I the only one who is lost on how the calculations are made?

From what I can tell, this is limited in scope to categorizing nouns (a robin is a bird).
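For intuition only: with nothing but static embeddings available, a "robin is a bird" item can at most be scored by something like the similarity between the two tokens' rows of the E matrix. The sketch below uses cosine similarity and an illustrative checkpoint as stand-ins; it is not the authors' actual information-theoretic measure.

```python
# Sketch only: scores a noun/category pair ("robin", "bird") by cosine
# similarity between their static input embeddings. This is an assumed
# stand-in, not the paper's alignment metric.
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"   # illustrative checkpoint, not from the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
E = AutoModel.from_pretrained(model_name).get_input_embeddings().weight

def static_similarity(word_a: str, word_b: str) -> float:
    """Cosine similarity between two context-free token embeddings."""
    ids = tokenizer.convert_tokens_to_ids([word_a, word_b])
    return F.cosine_similarity(E[ids[0]], E[ids[1]], dim=0).item()

print(static_similarity("robin", "bird"))   # one number per noun/category pair
```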