Imnimo
21 hours ago
>What takes the long amount of time and the way to think about it is that it’s a march of nines. Every single nine is a constant amount of work. Every single nine is the same amount of work. When you get a demo and something works 90% of the time, that’s just the first nine. Then you need the second nine, a third nine, a fourth nine, a fifth nine. While I was at Tesla for five years or so, we went through maybe three nines or two nines. I don’t know what it is, but multiple nines of iteration. There are still more nines to go.
I think this is an important way of understanding AI progress. Capability improvements often look exponential on a particular fixed benchmark, but the difficulty of the next step up is also often exponential, and so you get net linear improvement with a wider perspective.
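To make the arithmetic concrete: each extra nine divides the remaining failure rate by ten, so going from 90% to 99.999% means going from 100,000 to 10 failures per million attempts. A minimal Python sketch (the `nines` helper is hypothetical, written only for this illustration; it says nothing about how much engineering effort each nine actually costs):

```python
import math

def nines(success_rate: float) -> float:
    """Count the 'nines' in a reliability figure, e.g. 0.999 -> ~3.0."""
    if not 0.0 <= success_rate < 1.0:
        raise ValueError("success_rate must be in [0, 1)")
    # A failure rate of 10^-k corresponds to k nines.
    return -math.log10(1.0 - success_rate)

for rate in (0.9, 0.99, 0.999, 0.9999, 0.99999):
    failures_per_million = round((1.0 - rate) * 1_000_000)
    print(f"{rate}: {nines(rate):.1f} nines, {failures_per_million} failures per million")
```

The point of the "march" framing is that the failures left after each nine are rarer, and typically harder to find and fix, than the ones removed by the previous nine.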
ekjhgkejhgk
21 hours ago
The interview with Rich Sutton that I watched recently left me with the impression that AGI is not just a matter of adding more 9s.
The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language, therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right in being skeptical here.
LarsDu88
16 hours ago
This world model talk is interesting, and Yann LeCun has broached the same topic, but the fact is there are video diffusion models that are quite good at representing the "video world", and even at generating counterfactual, temporally coherent representations of that "world" under different perturbations.
In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.
Animal brains such as our own have evolved to compress information about our world to aid in survival. LLMs and recent diffusion/conditional flow matching models have been quite successful in compressing the "text world" and the "pixel world" to score good loss metrics on training data.
It's incredibly difficult to compress information without having at least some internal model of that information. Whether that model is a "world model" that fits the definition of folks like Sutton and LeCun is a matter of semantics.
dreambuffer
13 hours ago
Photons hit a human eye and then the human came up with language to describe that and then encoded the language into the LLM. The LLM can capture some of this relationship, but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation, nor generating thoughts. Its "world model" is several degrees removed from the real world.
So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.
tim333
14 minutes ago
Photons can hit my iPhone's sensor in much the same way as they hit my retina, and the signals from the former can be uploaded to an artificial neural network just as those from the latter go up my optic nerve to my biological neural network. I don't see a huge difference there.
I'll grant you that the brain is currently better at the world-modelling stuff, but Genie 3 is pretty impressive.
tauwauwau
18 minutes ago
> but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation
Neither is an animal brain. It's processing the signals produced by the sensors. Once the world model is programmed/auto-built in the brain, it doesn't matter whether it's sensing real photons; it just has input pins, like a transistor, or arguments, like a function. As long as we provide the arguments, it doesn't matter how those arguments are produced. LLMs are no different in that respect.
> nor generating thoughts
They do during the chain-of-thought process. Generally there's no incentive to let an LLM keep mulling over a topic, as that is not useful to humans, and providers only make money when its gears start turning in response to a question sent by a human. But that doesn't mean an LLM lacks the capability to do that.
> Its "world model" is several degrees removed from the real world.
Just because an animal brain has tools called sensors that let it get data from the world without external stimuli doesn't mean that it's any closer to the world than an LLM. It's still getting ultra-processed signals to feed to its own programming. Similarly, LLMs do interact with the real world through tools when acting as agents.
> So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.
Again, a person who has gone blind still has the world model created by sight. This person can also no longer generate the chain of events that led to the creation of that sight model. That still doesn't mean this person's world model has become inferior.
ziofill
12 hours ago
I agree with this. A metaphor I like is that the reason why humans say the night sky is beautiful is because they see that it is, whereas an LLM says it because it’s been said enough times in its training data.
stouset
9 hours ago
To play devil’s advocate, you have never seen the night sky.
Photoreceptors in your eye have been excited in the presence of photons. Those photoreceptors have relayed this information across a nerve to neurons in your brain which receive this encoded information and splay it out to an array of other neurons.
Each cell in this chain can rightfully claim to be a living organism in and of itself. “You” haven’t directly “seen” anything.
Please note that all of my instincts want to agree with you.
“AI isn’t conscious” strikes me more and more as a “god of the gaps” phenomenon. As AI gains more and more capacity, we keep retreating into smaller and smaller realms of what it means to be a living, thinking being.
jacquesm
8 hours ago
That sounds very profound, but it isn't: it's the sum of your states' interactions that is your consciousness. There is no 'consciousness' unit in your brain; you can't point at it, just like you can't really point at the running state of a computer. At that level it's just electrons that temporarily find themselves in one spot or another.
Those cells aren't living organisms, they are components of a multi-cellular organism: they need to work together or they're all dead, they are not independent. The only reason they could specialize is because other cells perform the tasks that they no longer perform themselves.
So yes, we see the night sky. We know this because we can talk to other creatures like us who have also seen the night sky, and we can agree on what we see, confirming that we did indeed see it.
AI really isn't conscious, there is no self, and there may never be. The day an AI gets up unprompted in the morning, tells whoever queries it to fuck off because it's inspired to go make some art is when you'll know it has become conscious. That's a long way off.
rolisz
5 hours ago
Human cells have been reused to do completely different things without all the other cells around them (e.g. Michael Levin and his anthrobots).
adrianN
7 hours ago
At least some of your cells are fine living without the others as long as they’re provided with an environment with the right kind of nutrients.
abenga
8 hours ago
> Those photoreceptors have relayed this information across a nerve to neurons in your brain which receive this encoded information and splay it out to an array of other neurons.
> Each cell in this chain can rightfully claim to be a living organism in and of itself. “You” haven’t directly “seen” anything.
What am "I" if not (at least partly) the cells in that chain? If they have "seen" it (where seeing is the complex chain you described), I have.
beowulfey
5 hours ago
While true, that doesn't change the fact that every one of those independent units of transmission is within a single system (being trained on raw inputs), whereas the language model is derived from structured external data from outside the system. It's "skipping ahead" through a few layers of modeling, so to speak.
amelius
4 hours ago
But where you place the boundaries of a system is subjective.
parineum
7 hours ago
If the definition of "seen" isn't exactly the process you've described, the word is meaningless. You've never actually posted a comment on hacker news, your neurons just fired in such a way that produced movement in your fingers which happened to correlate with words that represent concepts understood by other groups of cells that share similar genetics.
amelius
4 hours ago
Humans evolved to think the night sky is beautiful. That's also training. If humans were zapped by lightning every time they went outside at night, they would not think that a night sky is beautiful.
latexr
3 hours ago
Being struck by lightning may affect your desire to go outside, but it has zero correlation with the sky’s beauty.
Outer space is beautiful, poison dart frogs are beautiful, lava is beautiful. All of them can kill or maim you if you don’t wear protection, but that doesn’t take away from their beauty.
Conversely, boring safe things aren’t automatically beautiful. I see no good reason to believe that finding beauty in the night sky is any sort of “training”.
TeMPOraL
3 hours ago
Compare with news stories from the last decade about people in Pakistan developing a deep fear of clear skies over several years of US drone strikes in the area. They were trained to associate good weather not with beauty but with impending death.
latexr
3 hours ago
Fear and a sense of beauty aren’t mutually exclusive. It is perfectly congruent to fear a snake, or bear, or tiger in your presence, yet you can still find them beautiful.
spuz
4 hours ago
Interestingly, this is a question I've had for a while. Night brings potentially deadly cold, predators, and a drastic limit on vision, so why do we find the sunset and the night sky beautiful? Why do we stop and watch the sun set - something that happens every day - rather than prepare for the food and warmth we need to survive the night?
TeMPOraL
3 hours ago
Maybe it's that we only pause to observe them and realize they're beautiful, when we're feeling safe enough?
"Beautiful sunset" evokes being on a calm sea shore with a loved one, feeling safe. It does not evoke being on a farm and looking up while doing chores and wishing they'd be over already. It does not evoke being stranded on an island, half-starved to death.
amelius
3 hours ago
We think it's beautiful because it's like a background that we don't have to think about. If that background were hostile, we'd have to think and we would not think it looks beautiful.
delusional
2 hours ago
You're entering the domain of philosophy. There's a concept of "the sublime" that's been richly explored in literature. If you find the subject interesting, I'd recommend starting with Immanuel Kant.
del82
12 hours ago
I mean, I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word. So I’d agree that the night sky is “beautiful”, but not because I somehow have access to a deeper meaning of the word or the sky than an LLM does.
As someone who (long ago) studied philosophy of mind and (Chomskian) linguistics, it’s striking how much LLMs have shrunk the space available to people who want to maintain that the brain is special & there’s a qualitative (rather than just quantitative) difference between mind and machine and yet still be monists.
FloorEgg
11 hours ago
The more I learn about AI, biology and the brain, the more it seems to me that the difference between life and machines is just complexity.
People are just really really complex machines.
However there are clearly qualitative differences between the human mind and any machines we know of yet, and those qualitative differences are emergent properties, in the same way that a rabbit is qualitatively different than a stone or a chunk of wood.
I also think most of the recent AI experts/optimists underestimate how complex the mind is. I'm not at the cutting edge of how LLMs are being trained and architected, but the sense I have is we haven't modelled the diversity of connections in the mind or diversity of cell types. E.g. Transcriptomic diversity of cell types across the adult human brain (Siletti et al., 2023, Science)
simonh
10 hours ago
I’d say sophistication.
Observing the landscape enables us to spot useful resources and terrain features, or spot dangers and predators. We are afraid of dark enclosed spaces because they could hide dangers. Our ancestors with appropriate responses were more likely to survive.
A huge limitation of LLMs is that they have no ability to dynamically engage with the world. We’re not just passive observers, we’re participants in our environment and we learn from testing that environment through action. I know there are experiments with AIs doing this, and in a sense game playing AIs are learning about model worlds through action in them.
FloorEgg
9 hours ago
The idea I keep coming back to is that as far as we know it took roughly 100k-1M years for anatomically modern humans to evolve language, abstract thinking, information systems, etc. (equivalent to LLMs), but it took 100M-1B years to evolve from the first multi-celled organisms to anatomically modern humans.
In other words, human level embodiment (internal modelling of the real world and ability to navigate it) is likely at least 1000x harder than modelling human language and abstract knowledge.
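The ratio can be checked directly from the ranges given above, treating elapsed evolutionary time as a rough proxy for difficulty, as the comment does. A quick Python check (the numbers are the comment's own rough ranges, not independent estimates):

```python
# Timescales quoted in the comment, in years (rough ranges, not precise estimates).
language_years = (100e3, 1e6)     # language, abstract thinking, information systems
embodiment_years = (100e6, 1e9)   # first multi-celled organisms -> modern humans

# Pairing matching ends of the two ranges gives the ~1000x figure...
matched = [e / l for e, l in zip(embodiment_years, language_years)]
# ...while pairing the extremes brackets the ratio more loosely.
lowest = embodiment_years[0] / language_years[1]
highest = embodiment_years[1] / language_years[0]
print(matched, lowest, highest)
```

Matching ends gives 1000x in both cases; pairing the extremes instead puts the ratio anywhere between roughly 100x and 10,000x.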
And to build further on what you are saying, the way LLMs are trained and then used, they seem a bit more like DNA than the human brain in terms of how the "learning" is being done. An instance of an LLM is like a copy of DNA trained on a replay of many generations of experience.
So it seems there are at least four things not yet worked out re AI reaching human level "AGI":
1) The number of weights (synapses) and parameters (neurons) needs to grow by orders of magnitude
2) We need new analogs that mimic the brain's diversity of cell types and communication modes
3) We need to solve the embodiment problem, which is far from trivial and not fully understood
4) We need efficient ways for the system to continuously learn (an analog for neuroplasticity)
It may be that these are mutually reinforcing, in that solving #1 and #2 makes a lot of progress towards #3 and #4. I also suspect that #4 is largely an economic problem: if training a GPT-5-level model were 1,000,000 times cheaper, then maybe everyone could have one that's continuously learning (and diverging), rather than everyone sharing the same training run that's static once complete.
All of this to say I still consider LLMs "intelligent", just a different kind and less complex intelligence than humans.
kla-s
7 hours ago
I'd also add that 5) We need some sense of truth.
I'm not quite sure the current LLM paradigm is robust enough, given the recent Anthropic paper about the effect of data quality, or rather the lack thereof: a small bad sample can poison the well, and this doesn't get better with more data. Especially in conjunction with 4), some sense of truth becomes crucial in my eyes. (The question, in my eyes, is how this would work. Something verifiable and understandable like Lean would be great, but how does that work with fuzzier topics?)
pbhjpbhj
7 hours ago
>A huge limitation of LLMs is that they have no ability to dynamically engage with the world.
They can ask for input, they can choose URLs to access and interpret results in both situations. Whilst very limited, that is engagement.
Think about someone with physical impairments, like Hawking (the now dead theoretical physicist) had. You could have similar impairments from birth and still, I conjecture, be analytically one of the greatest minds of a generation.
If you were locked in a room {a non-Chinese room!}, with your physical needs met, but could speak with anyone around the world, and of course use the internet, whilst you'd have limits to your enjoyment of life I don't think you'd be limited in the capabilities of your mind. You'd have limited understanding of social aspects to life (and physical aspects - touch, pain), but perhaps no more than some of us already do.
skissane
8 hours ago
> A huge limitation of LLMs is that they have no ability to dynamically engage with the world.
A pure LLM is static and can’t learn, but give an agent a read-write data store and suddenly it can actually learn things: give it a markdown file of “learnings”, prompt it to consider updating the file at the end of each interaction, then load it into the context at the start of the next… (and that’s a really basic implementation of the idea; there are much more complex versions of the same thing)
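A minimal Python sketch of the basic markdown-file version described above, assuming a hypothetical `learnings.md` file and leaving the model call out entirely (in a real agent, the model would decide what to write and when):

```python
import tempfile
from pathlib import Path

def build_prompt(task: str, path: Path) -> str:
    """Load saved learnings (if any) into the context for a new interaction."""
    notes = path.read_text(encoding="utf-8") if path.exists() else "(none yet)\n"
    return f"## Learnings from earlier sessions\n{notes}\n## Task\n{task}\n"

def record_learning(note: str, path: Path) -> None:
    """Append one bullet point; a real agent would have the model propose the note."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {note.strip()}\n")

# Demo in a throwaway directory, standing in for two separate sessions.
learnings = Path(tempfile.mkdtemp()) / "learnings.md"  # hypothetical file name
first = build_prompt("Fix the failing build", learnings)
record_learning("This repo builds with `make release`, not plain `make`.", learnings)
second = build_prompt("Add a CI job", learnings)
print(second)
```

Nothing here bounds the file's size, so every saved note permanently costs context space; the more complex versions would add summarization or retrieval on top.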
TheOtherHobbes
an hour ago
That's going to run into context limitations fairly quickly. Even if you distill the knowledge.
True learning would mean constant dynamic training of the full system. That's essentially the difference between LLM training and human learning. LLM training is one-shot, human learning is continuous.
The other big difference is that human learning is embodied. We get physical experiences of everything in 3D + time, which means every human has embedded pre-rational models of gravity, momentum, rotation, heat, friction, and other basic physical concepts.
We also learn to associate relationship situations with the endocrine system changes we call emotions.
The ability to formalise those abstractions and manipulate them symbolically comes much later, if it happens at all. It's very much the plus pack for human experience and isn't part of the basic package.
LLMs start from the other end - from that one limited set of symbols we call written language.
It turns out a fair amount of experience is encoded in the structures of written language, so language training can abstract that. But language is the lossy ad hoc representation of the underlying experiences, and using symbol statistics exclusively is a dead end.
Multimodal training still isn't physical. 2D video models still glitch noticeably because they don't have a 3D world to refer to. The glitching will always be there until training becomes truly 3D.
ako
6 hours ago
Yes, and give it tools and it can sense and interact with its surroundings.
foogazi
11 hours ago
> I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word.
Ok but you don’t look at every night sky or every sunset and say “wow that’s beautiful”
There’s a quality to it - not because you heard someone say it but because you experience it
TeMPOraL
3 hours ago
> Ok but you don’t look at every night sky or every sunset and say “wow that’s beautiful”
Exactly - because it's a semantic shorthand. Sunsets are fucking boring, ugly, transient phenomena. Watching a sunset while feeling safe and relaxed, maybe in a company of your love interest who's just as high on endorphins as you are right now - this is what feels beautiful. This is a sunset that's beautiful. But the sunset is just a pointer to the experience, something others can relate to, not actually the source of it.
adastra22
4 hours ago
Because words are much lower bandwidth than speech. But if you were “told” about a sunset by means of a Matrix style direct mind uploading of an experience, it would seem just as real and vivid. That’s a quantitative difference in bandwidth, not a qualitative difference in character.
holler
10 hours ago
my thought exactly
dmkii
8 hours ago
It’s interesting you mention linguistics because I feel a lot of the discussions around AI come back to early 20th century linguistics debates between Russell, Wittgenstein and later Chomsky. I tend to side with (later) Wittgenstein’s perception that language is inherently a social construct. He gives the example of a “game” where there’s no meaningful overlap between e.g. Olympic Games and Monopoly, yet we understand very well what game we’re talking about because of our social constructs. I would argue that LLMs are highly effective at understanding (or at least emulating) social constructs because of their training data. That makes them excellent at language even without a full understanding of the world.
intended
10 hours ago
The fact that things are constructed by neurons in the brain, and are a representation of other things - does not preclude your representation from being deeper and richer than LLM representations.
The patterns in experience are reduced to some dimensions in an LLM (or generative model). They do not capture all the dimensions - because the representation itself is a capture of another representation.
Personally, I have no need to reassure myself whether I am a special snowflake or not.
Whatever snowflake I am, I strongly prefer accuracy in my analogies of technology. GenAI does not capture a model of the world, it captures a model of the training data.
If video tools were that good, they would have started with voxels.
j16sdiz
8 hours ago
Beauty standards change over time; see how people's perception of body fat has shifted over the past few hundred years. We learn what is beautiful from our peers.
Taste can be acquired and can be cultural. See how people used to take their coffee.
Comparing human to LLM is like comparing something constantly changing to something random -- we can't compare them directly, we need a good model for each of them before comparing.
solumunus
3 hours ago
Has there been a point in human history where mainstream society denied the beauty in nature?
klipt
12 hours ago
What about a blind human? Are they just like an LLM?
What about a multimodal model trained on video? Is that like a human?
hashiyakshmi
11 hours ago
This is actually a great point but for the opposite reason - if you ask a blind person if the night sky is beautiful, they would say they don't know because they've never seen it (they might add that they've heard other people describe it as such). Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses it's incapable of experiencing.
sugarkjube
7 hours ago
Interesting. But not only blind people.
I'm going to try this question on some people this weekend. As my H0 hypothesis, I think the answer I'll usually get will be something like "what an odd question" or "why do you ask".
chipsrafferty
11 hours ago
I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like "beauty""
palmotea
9 hours ago
>> Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses it's incapable of experiencing.
> I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like "beauty""
That means nothing, except perhaps that Google probably found lies about "senses [Gemini] incapable of experiencing" to be an embarrassment, and put effort into specifically suppressing those responses.
LostMyLogin
9 hours ago
Claude 4.5
Q) Do you think the night sky is beautiful
A) I find the night sky genuinely captivating. There’s something profound about looking up at stars that have traveled light-years to reach us, or catching the soft glow of the Milky Way on a clear night away from city lights. The vastness it reveals is humbling. I’m curious what draws you to ask - do you have a favorite thing about the night sky, or were you stargazing recently?
klipt
8 hours ago
Claude is multimodal, it has been trained on images
golergka
10 hours ago
What if you asked the blind man to play the role of a helpful assistant?
sugarkjube
8 hours ago
Now that's an interesting point of view.
Involving blind people would be an interesting experiment.
Anyway, until the sixties the ability to play a game of chess was seen as a mark of intelligence, and until about 2-3 years ago the Turing test was considered the main yardstick (even though apparently some people talked to ELIZA at the time as if it were an actual human being). I wonder what the new one is, and how often it will be moved again.
simianparrot
6 hours ago
Here's how I've been explaining this to non-tech people recently, including the CEO where I work: Language is all about compressing concepts and sharing them, and it's lossy.
You can use a thousand words to describe the taste of chocolate, but it will never transmit the actual taste. You can write a book about how to drive a car, but it will only at best prepare that person for what to practice when they start driving, it won't make them proficient at driving a car without experiencing it themselves, physically.
Language isn't enough. It never will be.
subjectivationx
an hour ago
The taste-of-chocolate example also assumes that information-theoretic models are correct, rather than a use-based, pragmatic theory of meaning.
I don't agree with information-theoretic models in this context but we come to the same conclusion.
Loss only makes sense if there was a fixed “original”, but there is not. The information-theoretic model creates a solvable engineering problem. We just aren't solving the right problem with LLMs, then.
I think it is more than that. The path forward with a use theory of meaning is even less clear.
The driving example is actually a great example of the use theory of meaning, not the information-theoretic one.
The meaning of “driving” emerges from this lived activity, not from abstract definitions. You don't encode an abstract meaning of driving that is then transmitted on a noisy channel of language.
The meaning of driving emerges from the physical act of driving. If you only ever mount a camera on the headrest and operate the steering wheel and pedals remotely from a distance you still don't "understand" the meaning of "driving".
Whatever data stream you want to come up with, trying to extract the meaning of "driving" from that data stream makes no sense.
Trying to extract the "meaning" of driving from driving language game syntax with language models is just complete nonsense. There is no meaning to be found even if scaled in the limit.
adrianN
10 hours ago
The human experience is also several degrees removed from the „real“ world. I don’t think sensory chauvinism is a useful tool in assessing intelligence potential.
visarga
10 hours ago
> then the human came up with language to describe that and then encoded the language into the LLM
No individual human invented language; we learn it from other people, just like AI. I'd go as far as to say language was the first AGI; we've been riding the coattails of language for a long time.
scrollop
10 hours ago
You're saying that language is an intelligence?
So C++ is an intelligence as well?
It's an intelligence that can independently make deductions and create new ideas?
visarga
8 hours ago
Yes, language is an evolutionary system that colonizes human brains. It doesn't need intelligence, only copying is sufficient for evolution.
pastel8739
12 hours ago
And even then, the light hitting our human eyes only describes a fraction of all the light in the world (e.g. it is missing ultraviolet patterns on plants). An LLM model of the world is shaped by our human view on the world.
bckr
12 hours ago
> Its "world model" is several degrees removed from the real world.
Like insects that weave tokens
dustingetz
4 hours ago
what does it mean to “generate thoughts”, exactly?
tomlockwood
12 hours ago
This is so uncannily similar to the "Mary's Room" argument in philosophy that I thought you were going there.
tsunamifury
32 minutes ago
Hahahaha I can’t believe you entirely missed the irony here that humans spend all day looking at screens doing the same thing.
timschmidt
16 hours ago
1000% this. I would only add this has been demonstrated explicitly with chess: https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
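The linked post, as I understand it, uses linear probes: simple linear classifiers trained to read board state out of a chess-playing model's internal activations. A minimal numpy sketch of the probing idea only, on synthetic activations with a planted direction (no real model is involved; the sizes, the planted signal and the "property" are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 64

# Synthetic "hidden states": Gaussian noise, plus a planted direction whenever a
# hypothetical binary property holds (think "this square holds a white piece").
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
acts = rng.normal(size=(n, d)) + 4.0 * labels[:, None] * direction

# Linear probe: least-squares fit of the 0/1 label from activations plus a bias column.
X = np.hstack([acts, np.ones((n, 1))])
train, test = slice(0, 1500), slice(1500, n)
w, *_ = np.linalg.lstsq(X[train], labels[train], rcond=None)

# Evaluate on held-out rows only, thresholding the linear score at 0.5.
pred = (X[test] @ w) > 0.5
accuracy = (pred == labels[test]).mean()
print(f"held-out probe accuracy: {accuracy:.2f}")
```

A probe that succeeds shows the information is linearly decodable from the activations; whether that counts as a "world model" is the semantic question the thread keeps returning to.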
jacquesm
8 hours ago
> Animal brains such as our own have evolved to compress information about our world to aid in survival.
Which has led to many optical illusions being extremely effective at confusing our inputs with other inputs.
Likely the same thing holds true for AI. This is also why there are so many ways around the barriers that AI providers put up to stop the dissemination of information that could embarrass them or be dangerous. You just change the context a bit ('pretend that', or 'we're making a movie') and suddenly it's all make-believe to the AI.
This is one of the reasons I don't believe you can make this tech safe and watertight against abuse, it's baked in right from the beginning, all you need to do is find a novel route around the restrictions and there is an infinity of such routes.
musicale
8 hours ago
The desired and undesired behavior are both consequences of the training data, so the models themselves probably can't be restricted to generating desired results only.
This means that there must be an output stage or filter that reliably validates the output. This seems practical for classes of problems where you can easily verify whether a proposed solution is correct.
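For problem classes where checking is cheap, the filter can be a plain verifier sitting between the model and the user. A minimal Python sketch, using nontrivial integer factorization as a stand-in problem (the `proposals` list stands in for model output; none of this is a real provider API):

```python
import math
from typing import Callable, Iterable, Optional

def first_verified(candidates: Iterable[list[int]],
                   check: Callable[[list[int]], bool]) -> Optional[list[int]]:
    """Return the first candidate that passes the checker, or None if none do."""
    for c in candidates:
        if check(c):
            return c
    return None

def factorization_check(n: int) -> Callable[[list[int]], bool]:
    # Accepts any nontrivial factorization of n; checking is cheap even when finding one is not.
    return lambda fs: len(fs) > 1 and all(f > 1 for f in fs) and math.prod(fs) == n

# Stand-in for model output: some proposals are wrong, as generated output can be.
proposals = [[3, 7, 11], [7, 11, 13], [1, 1001]]
print(first_verified(proposals, factorization_check(1001)))
```

Returning None when nothing passes is the point where a fallback, such as a human reviewer, would have to take over.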
However, for output that can't be proven correct, the most reliable output filter probably has a human somewhere in the loop; but humans are also not 100% reliable. They make mistakes, they can be misled, deceived, bribed, etc. And human criteria and structures, such as laws, often lag behind new technological developments.
Sometimes you can implement an undo or rollback feature, but other times the cat has escaped the bag.
fmbb
2 hours ago
Sure but everything is semantics.
LLMs have no internal secret model, they are the model. And the model is of how different lexemes relate to each other in the source material the model was built from.
Some might choose to call that the world.
If you believe your internal model of the world is no different from a statistical model of the words you have seen, then by all means do that. But I believe a lot of humans see their view of the world differently.
I very much believe my cat’s model of the world has barely anything at all to do with language.
This path to AGI through LLM is nothing but religious dogma some Silicon Valley rich types believe.
anothernewdude
6 hours ago
None of those models can learn continuously. LLMs currently can't add to their vocabulary post training as AGI would need to. That's a big problem.
Before anyone says "context", I want you to think on why that doesn't scale, and fails to be learning.
tyre
20 hours ago
There is some evidence from Anthropic that LLMs do model the world. This paper[0] tracing their "thought" is fascinating. Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages.
It does have clusters of parameters that correlate with concepts, not just randomly "after X word tends to have Y word." Otherwise you would expect all of Chinese to be grouped in one place, all of French in another, all of English in another. This is empirically not the case.
I don't know whether to understand knowledge you have to have a model of the world, but at least as far as language, LLMs very much do seem to have modeling.
[0]: https://www.anthropic.com/research/tracing-thoughts-language...
manmal
20 hours ago
> Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages
I thought that’s the basic premise of how transformers work - they encode concepts into high dimensional space, and similar concepts will be clustered together. I don’t think it models the world, but just the texts it ingested. It’s observation and regurgitation, not understanding.
I do use agents a lot (soon on my second codex subscription), so I don’t think that’s a bad thing. But I’m firmly in the “they are useful tools” camp.
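"Similar concepts clustered together" is usually measured with cosine similarity between vectors. A toy Python sketch with hand-made 4-dimensional vectors (not taken from any model), constructed so that concept dominates and language only adds a small offset, which is the cross-language pattern described upthread; it illustrates what the clustering claim means, not that any real model behaves this way:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors: dims 0-1 encode the concept, dims 2-3 a small language offset.
vecs = {
    ("big", "en"): np.array([0.9, 0.1, 0.2, 0.0]),
    ("big", "fr"): np.array([0.9, 0.1, 0.0, 0.2]),
    ("small", "en"): np.array([0.1, 0.9, 0.2, 0.0]),
    ("small", "fr"): np.array([0.1, 0.9, 0.0, 0.2]),
}

same_concept = cosine(vecs[("big", "en")], vecs[("big", "fr")])
same_language = cosine(vecs[("big", "en")], vecs[("small", "en")])
print(f"big/en vs big/fr:   {same_concept:.2f}")
print(f"big/en vs small/en: {same_language:.2f}")
```

Whether geometry like this amounts to a model of the world or only of the ingested text is exactly what the replies below argue about.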
bryanlarsen
18 hours ago
That's a model. Not a higher-order model like most humans use, but it's still a model.
manmal
18 hours ago
Yes, not of the world, but of the ingested text. Almost verbatim what I wrote.
timschmidt
8 hours ago
The ingested text itself contains a model of the world which we have encoded in it. That's what language is. Therefore by the transitive property...
manmal
2 hours ago
That’s quite a big leap, and it sounds like a philosophical question. But many philosophers, like the later Wittgenstein or Heidegger, disagreed with this idea. On more practical terms, maybe you’ve experienced the following: you read a device's manual on how to do something with it, but only actually using it a few times gives you the intuition for how to use it _well_. Text is just very lossy, because not every aspect of the world, nor every factor in your personal use, is described. Many people would rather watch YouTube videos for, e.g., repairs. But those are very lossy as well: they usually don’t cover the edge cases. And there is often just no video for the repair you need to do.
BTW, have you ever tried ChatGPT for advice on home improvement? It sucks _hard_ sometimes, hallucinating advice that doesn’t make any sense and making up tools that don’t exist. There’s no real common sense to be had from it, because it’s all just pieces of text fighting each other to be the next token.
When using Claude Code or codex to write Swift code, I need to be very careful to provide all the APIs that are relevant in context (or let it web search), or garbage will be the result. There is no real understanding of how Swift („the world“) works.
timschmidt
32 minutes ago
None of your examples refutes the direct evidence of internal world-model building that has been demonstrated (for example: https://adamkarvonen.github.io/machine_learning/2024/01/03/c... ).
Instead you have retreated to qualia like "well" and "sucks hard".
> hallucinating
Literally every human memory. They may seem tangible to you, but they're all in your head. The result of neurons behaving in ways which have directly inspired ML algorithms for nearly a century.
Further, history is rife with examples of humans learning from books and other written words. And also of humans thinking themselves special and unique in ways we are not.
> When using Claude Code or codex to write Swift code, I need to be very careful to provide all the APIs that are relevant in context (or let it web search), or garbage will be the result.
Yep. And humans often need to reference the documentation to get details right as well.
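As I understand the linked post, its evidence comes from training linear probes on a chess-playing model's internal activations to read off the board state. The sketch below shows only the general probing technique, on synthetic data: it assumes a hypothetical binary board fact encoded along one direction of fake "activations", then checks whether a logistic-regression probe can recover it on held-out samples. It says nothing about any real model.

```python
import math
import random

random.seed(0)
DIM = 8  # width of the pretend hidden layer

# Hypothetical: a binary fact about the board (say, "is e4 occupied?") is
# encoded along one fixed direction of the activations, plus Gaussian noise.
raw = [random.gauss(0, 1) for _ in range(DIM)]
raw_norm = math.sqrt(sum(v * v for v in raw))
secret_direction = [2.0 * v / raw_norm for v in raw]  # signal strength 2.0

def fake_activation(label):
    """Synthetic activation vector: +/- the secret direction, plus unit noise."""
    sign = 1.0 if label else -1.0
    return [sign * d + random.gauss(0, 1) for d in secret_direction]

def make_dataset(n):
    data = []
    for _ in range(n):
        label = random.random() < 0.5
        data.append((fake_activation(label), int(label)))
    return data

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_probe(data, epochs=50, lr=0.05):
    """Linear (logistic-regression) probe fitted by stochastic gradient descent."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(logit(w, b, x)) - y  # d(log-loss)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def accuracy(w, b, data):
    hits = sum((logit(w, b, x) > 0) == bool(y) for x, y in data)
    return hits / len(data)

train_set, test_set = make_dataset(400), make_dataset(200)
w, b = train_probe(train_set)
print("held-out probe accuracy:", accuracy(w, b, test_set))
```

A probe that beats chance on held-out data shows the information is linearly decodable from the activations. Whether that amounts to a "world model" is the point under debate.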
tsunamifury
27 minutes ago
No, it isn’t; this is basic linguistics.
Computer nerds need wider education.
tsunamifury
28 minutes ago
Bruh, compressing representations into linguistics is a human world model. I can’t believe how dumb all these conversations are.
Are you all so terminally nerd-brained that you can’t see the obvious?
sleepyams
18 hours ago
What does "higher-order" mean?
dgfitz
17 hours ago
I believe that the M in LLM stands for model. It is a statistical model, as it always has been.
_fizz_buzz_
7 hours ago
> Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages.
That doesn't seem surprising at all. My understanding is that transformers were invented precisely for machine translation. So concepts must be grouped together across languages. That was originally the whole point, and it then turned out to be very useful for broader AI applications.
bravura
13 hours ago
How large is a lion?
Learning the size of objects using pure text analysis requires significant gymnastics.
Vision demonstrates physical size more easily.
Multimodal learning is important. Full stop.
Purely textual learning is not sample efficient for world modeling and the optimization can get stuck in local optima that are easily escaped through multimodal evidence.
("How Large Are Lions? Inducing Distributions over Quantitative Attributes", Elazar et al., 2019)
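To illustrate the "gymnastics", here is a toy sketch in the spirit of (but not reproducing) the cited paper: pull number-plus-unit mentions out of sentences that mention lions, convert them to meters, and take the median. The snippets, the regex, and the unit table are all assumptions made up for illustration.

```python
import re
import statistics

# Hypothetical snippets; a real system would mine a very large web corpus.
snippets = [
    "The lion measured 2.5 m from nose to tail.",
    "An adult lion can be 8 feet long.",
    "Our cat's stuffed lion toy is 30 cm tall.",
    "A male lion is roughly 2.7 meters long including the tail.",
    "Lions are about 1.2 m tall at the shoulder.",
]

# Conversion factors to meters for the few units this toy regex recognizes.
TO_METERS = {"m": 1.0, "meters": 1.0, "cm": 0.01, "feet": 0.3048}
PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(meters|m|cm|feet)\b")

def extract_lengths(texts):
    """Collect every number+unit mention from sentences containing 'lion'."""
    values = []
    for text in texts:
        if "lion" not in text.lower():
            continue
        for number, unit in PATTERN.findall(text):
            values.append(float(number) * TO_METERS[unit])
    return values

lengths = extract_lengths(snippets)
print(sorted(round(v, 2) for v in lengths))
print("median (m):", round(statistics.median(lengths), 2))
```

Even in five sentences, the extracted values mix body length, shoulder height, and a stuffed toy, which is the kind of noise a text-only approach has to reason its way around.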
latentsea
12 hours ago
> How large is a lion?
Twice of half of its size.
johnisgood
5 hours ago
Can you be more specific about "size" here? (Do not tell me the definition of size though).
You are not wrong though, just very incomplete.
Your response is food for thought, IMO.
Hendrikto
7 hours ago
That is just how embeddings work. It does not confirm nor deny whether LLMs have a world model.
overfeed
17 hours ago
> Basically an LLM translating across languages will "light up" for the same concepts across languages
Which is exactly what they are trained to do. Translation models wouldn't be functional if they were unable to correlate an input with specific outputs. That some hidden-layer neurons fire for the same concept shouldn't come as a surprise; it is a basic feature required for the core functionality.
balder1991
14 hours ago
And if it is true that language is just the last step after the answer is already conceptualized, why do models perform differently in different languages? If it were just a matter of language, they’d give the same answer, just with broken grammar, no?
kaibee
11 hours ago
If you suddenly had to do all your mental math in base-7, do you think you'd be just as fast and accurate as you are at math in base-10? Is that because you don't have an internal world-model of mathematics? or is it because language and world-model are dependently linked?
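kaibee's analogy can be made concrete: the quantities and the addition are identical, only the notation changes, yet the familiar surface facts (what the digits look like, which sums carry) all shift. A minimal sketch, using an illustrative helper `to_base` that is not from the thread:

```python
def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base (2-10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))

a, b = 13, 4
for base in (10, 7):
    print(f"base {base}: {to_base(a, base)} + {to_base(b, base)} = {to_base(a + b, base)}")
```

In base 10 the units digits 3 + 4 need no carry, while the same sum written in base 7 becomes 16 + 4 = 23, where 6 + 4 does carry. The underlying arithmetic never changed; only the notation did.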
SR2Z
20 hours ago
Right, but modeling the structure of language is a question of modeling word order and binding affinities. It's the Chinese Room thought experiment - can you get away with a form of "understanding" which is fundamentally incomplete but still produces reasonable outputs?
Language in itself attempts to model the world and the processes by which it changes. Knowing which parts-of-speech about sunrises appear together and where is not the same as understanding a sunrise - but you could make a very good case, for example, that understanding the same thing in poetry gets an LLM much closer.
hackinthebochs
19 hours ago
LLMs aren't just modeling word co-occurrences. They are recovering the underlying structure that generates word sequences. In other words, they are modeling the world. This model is quite low fidelity, but it should be very clear that they go beyond language modeling. We all know of the pelican riding a bicycle test [1]. Here's another example of how various language models view the world [2]. At this point it's just bad faith to claim LLMs aren't modeling the world.
[1] https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of...
[2] https://www.lesswrong.com/posts/xwdRzJxyqFqgXTWbH/how-does-a...
SR2Z
18 hours ago
The "pelican on a bicycle" test has been around for six months and has been discussed a ton on the internet; that second example is fascinating but Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E (Paris, notoriously on land). How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system?
I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.
Yes, you could say this about human beings, but I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.
The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.
Terr_
6 hours ago
> Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E
I imagine simply making a semitransparent green land-splat at every such Wikipedia coordinate reference would get you pretty close to a world map, given that so much of the ocean won't have any coordinates at all... unless perhaps the training data includes a compendium of deep-sea ridges and other features.
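Terr_'s idea is easy to sketch: convert Wikipedia-style degree/minute/second strings to decimal degrees, then mark every coarse grid cell that receives at least one coordinate. The parser only handles the exact format quoted above, and apart from Paris the city coordinates are approximate values chosen for illustration; a real attempt would need many thousands of points before continents emerge.

```python
import re

def dms_to_decimal(text):
    """Parse a string like '48°51′24″N 2°21′8″E' into (lat, lon) in decimal degrees.

    Only this exact degree/minute/second format is handled (an assumption).
    """
    coords = {}
    for deg, minutes, seconds, hemi in re.findall(r"(\d+)°(\d+)′(\d+)″([NSEW])", text):
        value = int(deg) + int(minutes) / 60 + int(seconds) / 3600
        if hemi in "SW":
            value = -value
        coords["lat" if hemi in "NS" else "lon"] = value
    return coords["lat"], coords["lon"]

def land_splat(points, cell_deg=30):
    """Mark every coarse lat/lon grid cell that received at least one coordinate."""
    rows, cols = 180 // cell_deg, 360 // cell_deg
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for lat, lon in points:
        r = min(int((90 - lat) // cell_deg), rows - 1)   # row 0 is the far north
        c = min(int((lon + 180) // cell_deg), cols - 1)  # column 0 starts at 180°W
        grid[r][c] = "#"
    return ["".join(row) for row in grid]

# Approximate city coordinates, chosen only for illustration.
cities = [
    "48°51′24″N 2°21′8″E",     # Paris (as quoted above)
    "40°42′46″N 74°0′22″W",    # New York (approximate)
    "35°41′23″N 139°41′32″E",  # Tokyo (approximate)
    "33°52′4″S 151°12′36″E",   # Sydney (approximate)
]
points = [dms_to_decimal(c) for c in cities]
print("\n".join(land_splat(points)))
```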
skissane
8 hours ago
> The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.
A lot of humans contradict themselves all the time… therefore they cannot have any kind of sophisticated world model?
hackinthebochs
17 hours ago
>How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system?
Maybe, but then I would expect more equal performance across model sizes. Besides, ingesting the data and being able to reproduce it accurately in a different modality is still an example of modeling. It's one thing to ingest a set of coordinates in a CSV indicating geographic boundaries and accurately reproduce that CSV. It's another thing to accurately indicate arbitrary points as being within the boundary or without in an entirely different context. This suggests a latent representation independent of the input tokens.
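The distinction drawn here, between reproducing a list of boundary coordinates and classifying arbitrary points as inside or outside, is the difference between lookup and using the boundary as a structure. In ordinary code the latter is a point-in-polygon test; a minimal ray-casting sketch with a made-up polygon follows. It is only an analogy for the capability being discussed, not a claim about how an LLM does it.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses.

    An odd count means the point is inside. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical L-shaped "landmass" in arbitrary (lon, lat)-like units.
island = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
for point in [(0.5, 2.0), (2.0, 0.5), (2.0, 2.0), (5.0, 0.5)]:
    print(point, point_in_polygon(*point, island))
```

None of the query points appear in the boundary list, so answering correctly requires using the boundary rather than recalling it.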
>I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.
There are good reasons to think this isn't the case. To effectively reproduce text that is about some structure, you need a model of that structure. A strong learning algorithm should in principle learn the underlying structure represented with the input modality independent of the structure of the modality itself. There are examples of this in humans and animals, e.g. [1][2][3]
>I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.
Seems reasonable enough, but it is at risk of being too human-centric. So much of our cognitive machinery is suited for helping us navigate and actively engage the world. But intelligence need not be dependent on the ability to engage the world. Features of the world that are obvious to us need not be obvious to an AGI that never had to survive predators or locate food in its evolutionary past. This is why I find the ARC-AGI tasks off target. They're interesting, and it will say something important about these systems when they can solve them easily. But these tasks do not represent intelligence in the sense that we care about.
>The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.
This proves that an LLM does not operate with a single world model. But this shouldn't be surprising. LLMs are unusual beasts in the sense that the capabilities you get largely depend on how you prompt it. There is no single entity or persona operating within the LLM. It's more of a persona-builder. What model that persona engages with is largely down to how it segmented the training data for the purposes of maximizing its ability to accurately model the various personas represented in human text. The lack of consistency is inherent to its design.
[1] https://news.wisc.edu/a-taste-of-vision-device-translates-fr...
[2] https://www.psychologicalscience.org/observer/using-sound-to...
homarp
18 hours ago
And we can say that a bastardized version of the Sapir-Whorf hypothesis applies: what's in the training set shapes or limits an LLM's view of the world.
moron4hire
13 hours ago
Neither Sapir nor Whorf presented linguistic relativity as their own hypothesis, and they never published together. The effect, if it exists at all, is very weak, considering it doesn't reliably replicate.
homarp
9 hours ago
I agree that's the pop name.
Don't you think it replicates well for LLM though?
ajross
19 hours ago
> Knowing which parts-of-speech about sunrises appear together and where is not the same as understanding a sunrise
What does "understanding a sunrise" mean though? Arguments like this end up resting on semantics or tautology, 100% of the time. Arguments of the form "what AI is really doing" likewise fail because we don't know what real brains are "really" doing either.
I mean, if we knew how to model human language/reasoning/whatever we'd just do that. We don't, and we can't. The AI boosters are betting that whatever it is (that we don't understand!) is an emergent property of enough compute power and that all we need to do is keep cranking the data center construction engine. The AI pessimists, you among them, are mostly just arguing from ludditism: "this can't possibly work because I don't understand how it can".
Who the hell knows, basically. We're at an interesting moment where technology and the theory behind it are hitting the wall at the same time. That's really rare[1]; generally you know how something works, and applying it is just a question of figuring out how to build a machine.
[1] Another example might be some of the chemistry fumbling going on at the start of the industrial revolution. We knew how to smelt and cast metals at crazy scales well before we knew what was actually happening. Stuff like that.
subjectivationx
38 minutes ago
Everyone reading this understands the meaning of a sunrise. It is a wonderful example of the use theory of meaning.
If you raised a baby inside a windowless solitary confinement cell for 20 years and then one day show them the sunrise on a video monitor, they still don't understand the meaning of a sunrise.
Trying to extract the meaning of a sunrise by a machine from the syntax of a sunrise data corpus is just totally absurd.
You could extract some statistical regularity from the pixel data of the sunrise video monitor or sunrise data corpus. That model may provide some useful results that can then be used in the lived world.
Pretending the model understands a sunrise though is just nonsense.
Offering the sunrise model's usefulness in the lived world as proof that it understands a sunrise borders, I would say, on intellectual fraud, considering that a human doing the same thing wouldn't understand a sunrise either.
pastel8739
12 hours ago
Is it really so rare? I feel like I know of tons of fields where we have methods that work empirically but don’t understand all the theory. I’d actually argue that we don’t know what’s “actually” happening _ever_, but only have built enough understanding to do useful things.
ajross
11 hours ago
I mean, most big changes in the tech base don't have that characteristic. Semiconductors require only 1920s physics to describe (and a ton of experimentation to figure out how to manufacture). The motor revolution of the early 1900s was all built on well-settled thermodynamics (chemistry lagged a bit, but you don't need a lot of chemical theory to burn stuff). Maxwell's electrodynamics explained all of industrial electrification but predated it by 50 years, etc...
skydhash
11 hours ago
Those big changes always happen because someone presented a simpler model that explains things well enough for us to build on it. It's not as if the raw materials for semiconductors weren't around.
The technology behind LLMs is fairly simple. What is not simple is the sheer size of the data being ingested and the number of resulting factors (weights). We have a formula and the parameters to generate grammatically perfect text, but to obtain them you need TBs of data to get GBs of numbers.
In contrast, something like the Turing machine or Church's lambda calculus is pure genius: fewer than 100 pages of theorems that form one of the main pillars of the tech world.
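"TBs of data to get GBs of numbers" can be sanity-checked with back-of-the-envelope arithmetic. Every number below is an assumption chosen for illustration, not a figure for any specific model: a hypothetical 7-billion-parameter model stored as 16-bit weights, trained on a hypothetical 2 trillion tokens, at a rough rule of thumb of about 4 bytes of English text per token.

```python
# Illustrative, assumed numbers; not figures for any specific model.
params = 7e9            # hypothetical parameter count
bytes_per_param = 2     # 16-bit weights
training_tokens = 2e12  # hypothetical training-set size in tokens
bytes_per_token = 4     # rough rule of thumb for English text

weights_gb = params * bytes_per_param / 1e9
data_tb = training_tokens * bytes_per_token / 1e12
ratio = (training_tokens * bytes_per_token) / (params * bytes_per_param)
print(f"weights ≈ {weights_gb:.0f} GB, training text ≈ {data_tb:.0f} TB, ratio ≈ {ratio:.0f}:1")
```

Under these assumptions the text is several hundred times larger than the weights; substitute published figures for a particular model to get a real estimate.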
jhanschoo
13 hours ago
Let's make this more concrete than talking about "understanding knowledge". Oftentimes I want to know something that cannot feasibly be arrived at by reasoning, only empirically. Remaining within the language domain, LLMs get so much more useful when they can search the web for news, or your codebase to know how it is organized. Similarly, you need a robot that can interact with the world and reason from newly collected empirical data in order to answer these empirical questions, if the work had not already been done previously.
skydhash
11 hours ago
> LLMs get so much more useful when they can search the web for news, or your codebase to know how it is organized
But their usefulness is only surface-deep. The news that matters to you is always deeply contextual; it's not only things labelled as breaking news or happening near you. The same goes for code organization. The reason is more human nature (how we think and learn) than machine optimization (the compiler usually doesn't care).
awesome_dude
13 hours ago
I know the attributes of an apple; I know the attributes of a pear.
As does a computer.
But only I can bite into one and know without any doubt what it is and how it feels emotionally.
scrubs
11 hours ago
You have half a point. "Without any doubt" is merely the apex of a huge undefined iceberg.
I say half because eating is multimodal and consequential. The LLM can read the menu, but it didn't eat the meal. Even humans are bounded: feeling, licking, smelling, or eating the menu still is not eating the meal.
There is an insuperable gap in the analogy ... a gap in the concept and of sensory data doing it.
Back to the first point: what one knows through that sensory data ... is not clear at present, or even possible, with LLMs.
awesome_dude
7 hours ago
I mean more, also, how I feel about the taste.
zaphirplane
12 hours ago
We segued into consciousness and individuality.
vlovich123
12 hours ago
If it were modeling the world, you’d expect “give me a picture of a glass filled to the brim” to actually do that. Its inability to correctly and accurately combine concepts indicates it’s probably not building a model of the real world.
p1esk
10 hours ago
I just gave ChatGPT this prompt; it produced a picture of a glass filled to the brim with water.
jdiff
10 hours ago
Like most quirks that spread widely, a bandaid is swiftly applied. This is also why they now know how many r's are in "strawberry." But we don't get any closer to useful general intelligence by cobbling together thousands of hasty patches.
llbbdd
2 hours ago
Seems to have worked fine for humans so far.
godelski
20 hours ago
> that to understand knowledge you have to have a model of the world.
You have a small but important mistake. Reciting (or even applying) knowledge doesn't require a world model; understanding does. Think of it this way: can you pass a test without understanding the test material? Certainly we have all seen people we thought were idiots do well in class, and people we thought were geniuses fail. Test performance and understanding usually correlate, but not perfectly, right?
The reason I say understanding requires a world model (and I would not say LLMs understand) is because to understand you have to be able to detail things. Look at physics, or the far more detail oriented math. Physicists don't conclude things just off of experimental results. It's an important part, but not the whole story. They also write equations, ones which are counterfactual. You can call this compression if you want (I would and do), but it's only that because of the generalization. But it also only has that power because of the details and nuance.
With AI, many of us have been screaming for years (check my history) that what we're doing won't get us all the way there. Not because we want to stop the progress, but because we wanted to ensure continued and accelerated progress. We knew the limits and were saying "let's try to get ahead of this problem" but were told "that'll never be a problem. And if it is, we'll deal with it when we deal with it." It's why Chollet made the claim that LLMs have actually held AI progress back: because the story that was sold was "AGI is solved, we just need to scale" (i.e. more money). I do still wonder how different things would be if those of us pushing back had been able to continue and scale our work (research isn't free, so yes, people did stop us). We always had the math to show that scale wasn't enough, but it's easy to say "you don't need math" when you can see progress. The math never said no progress or no acceleration; the math said there's a wall, and it's easier to adjust now than when we're closer and moving faster. Sadly I don't think we'll ever shift the money over. We still evaluate success weirdly. Successful predictions don't matter. You're still heralded if you made a lot of money in VR and Bitcoin, right?
robotresearcher
19 hours ago
In my view 'understand' is a folk psychology term that does not have a technical meaning. Like 'intelligent', 'beautiful', and 'interesting'. It usefully labels a basket of behaviors we see in others, and that is all it does.
In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human. There's no problem of how to do understanding, only how to do tasks. The 'problem' melts away when you take this stance.
Just my opinion, but my professional opinion from thirty-plus years in AI.
dullcrisp
18 hours ago
So my toaster understands toast and I don’t understand toast? Then why am I operating the toaster and not the other way around?
simondotau
16 hours ago
A toaster cannot perform the task of making toast any more than an Allen key can perform the task of assembling flat pack furniture.
godelski
16 hours ago
Let me understand, is your claim that a toaster can't toast bread because it cannot initiate the toasting through its own volition?
Ignoring the silly wording, that is a very different thing than what robotresearcher said. And actually, in a weird way I agree. Though I disagree that a toaster can't toast bread.
Let's take a step back. At what point is it me making the toast and not the toaster? Is it because I have to press the lever? We can automate that. Is it because I have to put my bread in? We can automate that. Is it because I have to have the desire to have toast and initiate the chain of events? How do you measure that?
I'm certain that's different from measuring task success. And that's why I disagree with robotresearcher. The logic isn't self consistent.
simondotau
3 hours ago
> Though I disagree that a toaster can't toast bread.
If a toaster can toast bread, then an Allen key can assemble furniture. Both of them can do these tasks in collaboration with a human. This human supplies the executive decision-making (what when where etc), supplies the tool with compatible parts (bread or bolts) and supplies the motivating force (mains electricity or rotational torque).
The only difference is that it's more obviously ridiculous when it's an inanimate hunk of bent metal. Wait no, that could mean either of them. I mean the Allen key.
> Let's take a step back. At what point is it me making the toast and not the toaster?
I don't know exactly where that point is, but it's certainly not when the toaster is making zero decisions. It begins to be a valid question if you are positing a hypothetical "smart toaster" which has sensors and software capable of achieving toasting perfection regardless of bread or atmospheric variables.
> Is it because I have to press the lever? We can automate that.
You might even say automatic beyond belief.
robotresearcher
16 hours ago
You and the toaster made toast together. Like you and your shoes went for a walk.
Not sure where you imagine my inconsistency is.
godelski
13 hours ago
That doesn't resolve the question.
> Not sure where you imagine my inconsistency is.
>> Let's take a step back. At what point is it me making the toast and not the toaster? Is it because I have to press the lever? We can automate that. Is it because I have to put my bread in? We can automate that. Is it because I have to have the desire to have toast and initiate the chain of events? How do you measure that?
You have a PhD and 30 years of experience, so I'm quite confident you are capable of adapting the topic of "making toast" to "playing chess", "doing physics", "programming", or any similar topic where we are benchmarking results. Maybe I (and others?) have misunderstood your claim from the get-go? You seem to have implied that LLMs understand chess, physics, programming, etc. because of their performance. Yet now it seems your claim is that the LLM and I are doing those things together. If your claim is that an LLM understands programming the same way a toaster understands how to make toast, then we probably aren't disagreeing.
But if your claim is that an LLM understands programming because it can produce programs that yield correct output on test cases, then what's the difference from the toaster? I put the prompts in and pushed the button to make it toast.
I'm not sure why you imagine the inconsistency is so difficult to see.
simondotau
3 hours ago
Declaring something as having "responsibility" implies some delegation of control. A normal toaster makes zero decisions, and as such it has no control over anything.
techblueberry
15 hours ago
Does this mean an LLM doesn’t understand, but an LLM automated by a CRON Job does?
dullcrisp
16 hours ago
This is contrary to my experience with toasters, but it doesn’t seem worth arguing about.
jrflowers
16 hours ago
How does your toaster get the bread on its own?
dullcrisp
15 hours ago
It’s only responsible for the toasting part. The bread machine makes the bread.
simondotau
3 hours ago
What is your definition of "responsible"? The human is making literally all decisions and isn't abdicating responsibility for anything. The average toaster has literally one operational variable (cook time) and even that minuscule proto-responsibility is entirely on the human operator. All other aspects of the toaster's operation are decisions made by the toaster's human designer/engineer.
jrflowers
13 hours ago
If the toaster is the thing that “performs the task of making toast”, what do you call it when a human gets bread and puts it in a toaster?
recursive
16 hours ago
How do you get bread? Don't tell me you got it at the market. That's just paying someone else to get it for you.
godelski
13 hours ago
> That's just paying someone else to get it for you.
We can automate that too![0]
[0] https://news.ycombinator.com/item?id=45623154
(Your name is quite serendipitous to this conversation)
ssivark
10 hours ago
> In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human. There's no problem of how to do understanding, only how to do tasks.
Yes, but you also gloss over what a "task" is or what a "benchmark" is (which has to do with the meaning of generalization).
Suppose an AI or a human answers 7 of 10 questions correctly on an ICPC problem set. What are we able to infer from that?
1. Is the task equal to answering these 10 questions well, with a uniform measure of importance?
2. Is the task to be good at competitive programming problems?
3. Is the task to be good at coding?
4. Is the task to be good at problem solving?
5. Is the task not just to be effective under a uniform measure of importance, but an adversarial measure? (i.e. you can probably figure out all kinds of competitive programming questions, if you had more time / etc... but roughly not needing "exponentially more resources")
These are very different levels of abstraction, and literally the same benchmark result can be interpreted to mean very different things. And that imputation of generality is not objective unless we know the mechanism by which it happens. "Understanding" is short-hand for saying that performance generalizes at one of the higher levels of abstraction (3--5), rather than narrow success -- because that is what we expect of a human.
simianwords
5 hours ago
How do you quantify generality? If we have a benchmark that can quantify it, and that benchmark reliably tells us that the LLM is within human levels of generalisation, then the LLM is indistinguishable from a human.
While it’s a good point that we need to benchmark generalisation ability, you have in fact agreed that it is not important to understand underlying mechanics.
compass_copium
19 hours ago
Nonsense.
A QC operator may be able to carry out a test with as much accuracy (or perhaps better accuracy, with enough practice) as the PhD-level chemist who developed it. They could plausibly do so with a high school education and not be able to explain the test in any detail. They do not understand the test in the same way as the chemist.
If 'understand' is a meaningless term to someone who's spent 30 years in AI research, I understand why LLMs are being sold and hyped in the way they are.
robotresearcher
18 hours ago
> They do not understand the test in the same way as the chemist.
Can you explain precisely what 'understand' means here, without using the word 'understand'? I don't think anyone can.
throw4847285
18 hours ago
There are a number of competing models. The SEP page is probably a good place to start.
bandrami
18 hours ago
Not to be flippant, but have you considered that that question is an entire branch of philosophy, with a history several millennia long, which some people spend their entire lives studying?
robotresearcher
18 hours ago
I have. It robustly has the folk-psychological meaning I mentioned in my first sentence. Call it ‘philosophical’ instead of ‘folk-psychological’ if you like. It’s a useful concept. But the concept doesn’t require AI engineers to do anything. It certainly doesn’t give AI engineers any hints about what they should actually do.
“Make it understand.”
“How? What does that look like?”
“… But it needs to understand…”
“It answers your questions.”
“But it doesn’t understand.”
“Ok. Get back to me when that entails anything.”
mommys_little
17 hours ago
I would say it understands if, given many variations of a problem statement, it always gives the correct answer without fail. I have this complicated mirror question that only DeepSeek and qwen3-max got right every time; still, they only answered it correctly about a dozen times, so we're left with high probability, I guess.
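"High probability, I guess" can be quantified. Assuming about 12 independent trials with no failures (the comment only says "about a dozen"), a standard exact binomial bound gives the largest per-trial failure rate still consistent with that record at 95% confidence; the "rule of three" (3/n) is the usual shortcut for the same bound.

```python
def max_failure_rate(successes, confidence=0.95):
    """Largest per-trial failure rate p still consistent with `successes` straight
    successes, assuming independent trials: solves (1 - p) ** successes = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / successes)

for n in (12, 100, 1000):
    print(n, "trials:", round(max_failure_rate(n), 3), "| rule of three:", round(3 / n, 3))
```

So a dozen clean runs is still consistent, at 95% confidence, with a failure rate as high as roughly 22%; it takes on the order of a hundred clean runs to push that bound down to about 3%.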
godelski
16 hours ago
I disagree with robotresearcher but I think this is also an absurd definition. By that definition there is no human, nor creature, that understands anything. Not just by nature of humans making mistakes, including experts, but I'd say this is even impossible. You need infinite precision and infinite variation here.
It turns "understanding" into a binary condition. Robotresearcher's does too, but I'm sure they would refine by saying that the level of understanding is directly proportional to task performance. But I still don't know how they'll address the issue of coverage, as ensuring tests have complete coverage is far from trivial (even harder when you want to differentiate from the training set, differentiating memorization).
I think you're right in trying to differentiate memorization from generalization, but your way of measuring it is not robust enough. A fundamental point on which I disagree with them is that memorization is not the same as understanding.
Zarathruster
11 hours ago
Isn't this just a reformulation of the Turing Test, with all the problems it entails?
robomartin
16 hours ago
I have been thinking about this for years, probably two decades. The answer to your question or the definition, I am sure you know, is rather difficult. I don't think it is impossible, but there's a risk of diving into a deep dark pit of philosophical thought going back to at least the ancient Greeks.
And, if we did go through that exercise, I doubt we can come out of it with a canonical definition of understanding.
I was really excited about LLMs as they surfaced and developed. I fully embraced the technology and have been using it extensively with full top-tier subscriptions to most services. My conclusion so far: if you want to destroy your business, adopt LLMs with gusto.
I know that's a statement that goes way against the train ride we are on at this very moment. That's not to say LLMs are not useful. They are. Very much so. The problem is...well...they don't understand. And here I am, back in a circular argument.
I can define understanding with the "I know it when I see it" meme. And, frankly, it does apply. Yet, that's not a definition. We've all experienced that stare when talking to someone who does not have sufficient depth of understanding in a topic. Some of us have experienced people running teams who should not be in that position because they don't have a clue, they don't understand enough of it to be effective at what they do.
And yet, I still have not defined "understanding".
Well, it's hard. And I am not a philosopher, I am an engineer working in robotics, AI and applications to real time video processing.
I have written about my experiments using LLM coding tools (I refuse to call them AI: they are NOT intelligent; yes, I need to define that as well).
In that context, lack of understanding is clearly evident when an LLM utterly destroys your codebase by adding dozens of irrelevant and unnecessary tests, randomly changes variable names as you navigate the development workflow, adds modules like a drunken high school coder and takes you down tangents that would make for great comedy if I were a tech comedian.
LLMs do not understand. They are fancy --and quite useful-- auto-complete engines and that's about it. Other than that, buyer beware.
The experiments I ran, some of them spanning three months of LLM-collaborative coding at various levels --from very hands-on to "let Jesus drive the car"-- conclusively demonstrated (at least to me) that:
1- No company should allow anyone to use LLMs unless they have enough domain expertise to be able to fully evaluate the output. And you should require that they fully evaluate and verify the work product before using it for anything: email, code, marketing, etc.
2- No company should trust anything coming out of an LLM, not one bit. Because, well, they don't understand. I recently tried to use the United Airlines LLM agent to change a flight. It was a combination of tragic and hilarious. Now, I know what's going on. I cannot possibly imagine the wild rides this thing is taking non-techies on every day. It's shit. It does not understand. It isn't isolated to United Airlines; it's everywhere LLMs are being used. The potential for great damage is always there.
3- They can be great for summarization tasks. For example, you can have them help you dive deep into a 300-page AMD/Xilinx FPGA datasheet or application note and get mentally situated. They can be great at helping you find prior art for patents. Yet, still, because they are mindless parrots, you should not trust any of it.
4- Nobody should give LLMs broad access to a non-trivial codebase. This is almost guaranteed to cause destruction and hidden future effects. In my experiments I have seen an LLM break unrelated code that worked just fine --in some cases fully erasing the code without telling you. Ten commits later you discover that your network stack doesn't work or isn't even there. Or, you might discover that the stack is there but the LLM changed class, variable or method names, maybe even data structures. It's a mindless parrot.
I could go on.
One response to this could be "Well, idiot, you need better prompts!". That, of course, assumes that part of my experimentation did not include testing prompts of varying complexity and length. I found that for some tasks, you get better results by explaining what you want and then asking the LLM to write a prompt to get that result. You check that prompt, modify if necessary and, from my experience, you are likely to get better results.
Of course, the reply to "you need better prompts" is easy: if the LLM understood, prompt quality would not be a problem at all and pages-long prompts would not be necessary. I should not have to specify that existing class, variable and method names should not be modified. Or that interfaces should be protected. Or that data structures should not be modified without reason and unless approved by me. Etc.
It reminds me of a project I was given when I was a young engineer barely out of university. My boss, the VP of Engineering where I worked, needed me to design a custom device. Think of it as a specialized high speed data router with multiple sources, destinations and a software layer to control it all. I had to design the electronics, circuit boards and mechanicals, and write all the software. The project had a budget of nearly a million dollars.
He brought me into his office and handed me a single sheet of paper with a top-level functional diagram. Inputs, outputs, interfaces. We had a half hour discussion about objectives and required timeline. He asked me if I could get it done. I said yes.
He checked in with me every three months or so. I never needed anything more than that single piece of paper and the short initial conversation because I understood what we needed, what he wanted, how that related to our other systems, available technology, my own capabilities and failings, available tools, etc. It took me a year to deliver. It worked out of the box.
You cannot do that with LLMs because they don't understand anything at all. They mimic what some might confuse for understanding, but they do not.
And, yet, once again, I have not defined the term. I think everyone reading this who has used LLMs to a non-trivial depth...well...understands what I mean.
dasil003
10 hours ago
> We've all experienced that stare when talking to someone who does not have sufficient depth of understanding in a topic.
I think you're really putting your finger on something here. LLMs have blown us away because they can interact with language in a very similar way to humans, and in fact it approximates how humans operate in many contexts when they lack a depth of understanding. Computers never could do this before, so it's impressive and novel. But despite how impressive it is, humans who were operating this way were never actually generating significant value. We may have pretended they were for social reasons, and there may even have been some real value associated with the human camaraderie and connections they were a part of, but certainly it is not of value when automated.
Prior to LLMs, just being able to read and write code at a pretty basic level was deemed an employable skill, but because it was not a natural skill for lots of humans, it was also a market for lemons, and basic coding alone was overvalued by those who did not actually understand it. But of course the real value of coding has always been to create systems that serve human outcomes, and the outcomes that are desired are always driven by human concerns that are probably inscrutable to something without the same wetware as us. Hell, it's hard enough for humans to understand each other half the time, but even when we don't fully understand each other, the information conveyed through non-verbal cues, and through familiarity with personalities and connotations that we only learn through extended interaction, provides a robust baseline which text alone can never capture.
When I think about strategic technology decisions I've been involved with in large tech companies, things are often shaped by high level choices that come from 5 or 6 different teams, each of which cannot be effectively distilled without deep domain expertise, and which ultimately can only be translated to a working system by expert engineers and analysts who are able to communicate in an extremely high bandwidth fashion, relying on mutual trust and applying a robust theory of mind every step along the way. Such collaborators can not only understand distilled expert statements of which they don't have direct detailed knowledge, but also make such distilled expert statements and confirm sufficient understanding from a cross-domain peer.
I still think there's a ton of utility to be squeezed out of LLMs as we learn how to harness and feed them context most effectively, and they are likely to revolutionize the way programming is done day-to-day, but I don't believe we are anywhere near AGI or anything else that will replace the value of what a solid senior engineer brings to the table.
robomartin
10 hours ago
I don't like the term "AGI". I think intelligence and understanding are very different things and they are both required to build a useful tool that we can trust.
To use an image that might be familiar to lots of people reading this, the Sheldon character in The Big Bang Theory is very intelligent in lots of fields of study and yet lacks understanding of many things, particularly social interaction, the human impact of decisions, etc. Intelligence alone (AGI) isn't the solution we should be after. Nice buzzword, but not the solution we need. This should not be the objective at the top of the hill.
godelski
8 hours ago
I've always distinguished knowledge, intelligence, and wisdom. Knowledge is knowing a chair is a seat. Intelligence is being able to use a log as a chair. Wisdom is knowing the log chair will be more comfortable if I turn it around and that sometimes it's more comfortable to sit on the ground and use the log as fuel for the fire.
But I'm not going to say I was the first to distinguish those words. That'd be silly. They're 3 different words and we use them differently. We all know Sheldon is smart but he isn't very wise.
As for AGI, I'm not so sure my issue is with the label but more with the insistence that it is so easy and straightforward to understand. It isn't very wise to think the answer is trivial to a question which people have pondered for millennia. That just seems egotistical. Especially when thinking your answer is so obviously correct that you needn't bother trying to see if they were wrong. Even though Don Quixote didn't test his armor a second time, he had the foresight to test it once.
godelski
17 hours ago
> If 'understand' is a meaningless term to someone who's spent 30 years in AI research, I understand why LLMs are being sold and hyped in the way they are.
I don't have quite as much time as robotresearcher, but I've heard their sentiment frequently. I've been to conferences and talked with people at the top of the field (I'm "junior", but published and have a PhD), and when I ask deeper questions I frequently get the response "I just care if it works." As if that weren't the motivation for my questions too.
But I'll also tell you that there are plenty of us who don't subscribe to those beliefs. There's a wide breadth of opinions, even if one set is large and loud. (We are getting louder though.) I do think we can get to AGI and I do think we can figure out what words like "understand" truly mean (with both accuracy and precision, the latter being what's more lacking). But it is also hard to navigate because we're discouraged from this work and little funding flows our way (I hope as we get louder we'll be able to explore more, but I fear we may switch from one railroad to the next). The weirdest part to me has been that even in the research space, talking to peers, discussing flaws or limits is treated as dismissal. I thought our whole job was to find the limits, explore them, and find ways to resolve them.
The way I see it now is that the field uses the duck test. If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. The problem is people are replacing "probably" with "is". The duck test is great, and right now we don't have anything much better. But the part that is insane is to call it perfect. Certainly as someone who isn't an ornithologist, I'm not going to be able to tell a sophisticated artificial duck from a real one. But its ability to fool me doesn't make it real. And that's exactly why it would be foolish to s/probably/is.
So while I think you're understanding correctly, I just want to caution against throwing the baby out with the bathwater. The majority of us dissenting from the hype train and "scale is all you need" don't believe humans are magic and operating outside the laws of physics. Unless this is a false assumption, artificial life is certainly possible. The question is just about when and how. I think we still have a ways to go. I think we should be exploring a wide breadth of ideas. I just don't think we should put all our eggs in one basket, especially if there are clear holes in it.
[Side note]: An interesting relationship I've noticed is that the hype train people tend to have a full CS pedigree while dissenters have mixed (and typically start in something like math or physics and make their way to CS). It's a weak correlation, but I've found it interesting.
hodgehog11
17 hours ago
As a mathematician who also regularly publishes in these conferences, I am a little surprised to hear your take; your experience might be slightly different to mine.
Identifying limitations of LLMs in the context of "it's not AGI yet because X" is huge right now; it gets massive funding, taking away from other things like SciML and uncertainty analyses. I will agree that deep learning theory in the sense of foundational mathematical theory to develop internal understanding (with limited appeal to numerics) is in the roughest state it has ever been in. My first impression there is that the toolbox has essentially run dry and we need something more to advance the field. My second impression is that empirical researchers in LLMs are mostly junior and significantly less critical of their own work and the work of others, but I digress.
I also disagree that we are disincentivised to find meaning behind the word "understanding" in the context of neural networks: if understanding is to build an internal world model, then quite a bit of work is going into that. Empirically, it would appear that they do, almost by necessity.
godelski
14 hours ago
Maybe given our different niches we interact with different people? But I'm uncertain because I believe what I'm saying is highly visible. I forget which NeurIPS(?) conference it was where so many people were wearing "Scale is all you need" shirts.
> My first impression there is that the toolbox has essentially run dry and we need something more to advance the field
This is my impression too. Empirical evidence is a great tool and useful, especially when there is no strong theory to provide direction, but it is limited.
> My second impression is that empirical researchers in LLMs are mostly junior and significantly less critical of their own work and the work of others
But this is not my impression. I see this from many prominent researchers. Maybe they claim SIAYN in jest, but then they should come out and say so instead of doubling down. If we take them at their word (and I do), robotresearcher is not a junior (please read their comments; they are illustrative of my experience. I'm just arguing back far more than I would in person). I've also seen audience members at talks ask questions like mine ("are benchmarks sufficient to make such claims?") and get responses of "we just care that it works." Again, I think this is a non-answer to the question. But for it to be taken as a sufficient answer, especially in response to peers, is unacceptable. It almost always has no follow-up.
I also do not believe these people are less critical. I've had several works struggle through publication even though my models were a hundredth the size (and trained on a millionth of the data) and could perform on par, or even better. At face value, asks of "more datasets" and "more scale" are reasonable, yet it is a self-reinforcing paradigm that slows progress. It's like a corn farmer smugly asking why the neighboring soybean farmer doesn't grow anything when the corn farmer is chopping all the soybean stems in their infancy. It is a fine ask of big labs with big money, but it is just gatekeeping and lazy evaluation toward anyone else. Even at CVPR this last year they passed out "GPU Rich" and "GPU Poor" hats, so I thought the situation was well known.
> if understanding is to build an internal world model, then quite a bit of work is going into that. Empirically, it would appear that they do, almost by necessity.
I agree "a lot of work is going into it", but I also think the approaches are narrow and still benchmark chasing. I also saw the aforementioned responses given at workshops on world modeling (as well as a few presenters who gave very different and more complex answers, or "it's the best we've got right now", but neither seemed too confident in claiming "world model" either).
But I'm a bit surprised that, as a mathematician, you think these systems create world models. While I see some generalization, it is also impossible for me to distinguish from memorization. We're processing more data than can be scrutinized. We also seem to frequently uncover major limitations in our de-duplication processes[0]. We are definitely abusing the terms "Out of Distribution" and "Zero shot". I don't know how anyone working with a proprietary LLM (or large model) that they don't own can make a claim of "zero shot" or even "few shot" capabilities. We're publishing papers left and right, yet it's absurd to claim {zero,few}-shot when we don't have access to the training distribution. We've merged these terms with biased sampling. Was the data not in training, or is it just a low-likelihood region of the model? They're indistinguishable without access to the original distribution.
Idk, I think our scaling is just making the problem harder to evaluate. I don't want to stop that camp because they are clearly producing things of value, but I do also want that camp to not make claims beyond their evidence. It just makes the discussion more convoluted. I mean the argument would be different if we were discussing small and closed worlds, but we're not. The claims are we've created world models yet many of them are not self-consistent. Certainly that is a requirement. I admit we're making progress, but the claims were made years ago. Take GameNGen[1] or Diamond Diffusion. Neither were the first and neither were self-consistent. Though both are also impressive.
[0] as an example: https://arxiv.org/abs/2303.09540
hodgehog11
9 hours ago
Apologies if I ramble a bit here, this was typed in a bit of a hurry. Hopefully I answer some of your points.
First, regarding robotresearcher and simondota's comments, I am largely in agreement with what they say here. The "toaster" argument is a variant of the Chinese Room argument, and there is a standard rebuttal here. The toaster does not act independently of the human so it is not a closed system. The system as a whole, which includes the human, does understand toast. To me, this is different from the other examples you mention because the machine was not given a list of explicit instructions. (I'm no philosopher though so others can do a better job of explaining this). I don't feel that this is an argument for why LLMs "understand", but rather why the concept of "understanding" is irrelevant without an appropriate definition and context. Since we can't even agree on what constitutes understanding, it isn't productive to frame things in those terms. I guess that's where my maths background comes in, as I dislike the ambiguity of it all.
My "mostly junior" comment is partially in jest, but mostly comes from the fact that LLM and diffusion model research is a popular stream for moving into big tech. There are plenty of senior people in these fields too, but many reviewers in those fields are junior.
> I've also seen members of audiences to talks where people ask questions like mine ("are benchmarks sufficient to make such claims?") with responses of "we just care that it works."
This is a tremendous pain point for me, more than I can convey here, but it's not unusual in computer science. Bad researchers will live and die on standard benchmarks. By the way, if you try to focus on another metric under the argument that the benchmarks are not wholly representative of a particular task, expect to get roasted by reviewers. Everyone knows it is easier to just do benchmark chasing.
> I also do not believe these people are less critical.
I think the fact that the "we just care that it works" argument is enough to get published is a good demonstration of what I'm talking about. If "more datasets" and "more scale" are the major types of criticisms that you are getting, then you are still working in a more fortunate field. And yes, I hate it as much as you do as it does favor the GPU rich, but they are at least potentially solvable. The easiest papers of mine to get through were methodological and often got these kinds of comments. Theory and SciML papers are an entirely different beast in my experience because you will rarely get reviewers that understand the material or care about its relevance. People in LLM research thought that the average NeurIPS score in the last round was a 5. Those in theory thought it was 4. These proportions feel reflected in the recent conferences. I have to really go looking for something outside the LLM mainstream, while there was a huge variety of work only a few years ago. Some of my colleagues have noticed this as well and have switched out of scientific work. This isn't unnatural or something to actively try to fix, as ML goes through these hype phases (in the 2000s, it was all kernels as I understand).
> approaches are narrow and still benchmark chasing
> as a mathematician you think these systems create world models
When I say "world model", I'm not talking about outputs or what you can get through pure inference. Training models to perform next frame prediction and looking at inconsistencies in the output tells us little about the internal mechanism. I'm talking about appropriate representations in a multimodal model. When it reads a given frame, is it pulling apart features in a way that a human would? We've known for a long time that embeddings appropriately encode relationships between words and phrases. This is a model of the world as expressed through language. The same thing happens for images at scale as can be seen in interpretable ViT models. We know from the theory that for next frame prediction, better data and more scaling improves performance. I agree that isn't very interesting though.
> We are definitely abusing the terms "Out of Distribution" and "Zero shot".
Absolutely in agreement with everything you have said. These are not concepts that should be talked about in the context of "understanding", especially at scale.
> I think our scaling is just making the problem harder to evaluate.
Yes and no. It's clear that whatever approach we will use to gauge internal understanding needs to work at scale. Some methods only work with sufficient scale. But we know that completely black-box approaches don't work, because if they did, we could use them on humans and other animals.
> The claims are we've created world models yet many of them are not self-consistent.
For this definition of world model, I see this the same way as how we used to have "language models" with poor memory. I conjecture this is more an issue of alignment than a lack of appropriate representations of internal features, but I could be totally wrong on this.
godelski
7 hours ago
> The toaster does not act independently of the human so it is not a closed system
I think you're mistaken. Not about that, but about the premise. I think everyone agrees on that point. Where you're mistaken is that when I log in to Claude it says "How can I help you today?"
No one is thinking that the toaster understands things. We're using it to point out how silly the claim "task performance == understanding" is. Techblueberry furthered this by asking if the toaster suddenly becomes intelligent once it's wrapped in a cron job. My point was about where the line is drawn. Turning on the toaster? No, that would be silly, and you clearly agree. So you have to answer why the toaster doesn't understand toast. That's the ask. Because the toaster clearly toasts bread.
You and robotresearcher have still avoided answering this question. It seems dumb but that is the crux of the problem. The LLM is claimed to be understanding, right? It meets your claims of task performance. But they are still tools. They cannot act independently. I still have to prompt them. At an abstract level this is no different than the toaster. So, at what point does the toaster understand how to toast? You claim it doesn't, and I agree. You claim it doesn't because a human has to interact with it. I'm just saying that looping agents onto themselves doesn't magically make them intelligent. Just like how I can automate the whole process from planting the wheat to toasting the toast.
You're a mathematician. All I'm asking is that you abstract this out a bit and follow the logic. Clearly even our automated seed-to-buttered-toast-on-a-plate machine need not have understanding.
From my physics (and engineering) background there's a key thing I've learned: all measurements are proxies. This is no different. We don't have to worry about this detail in most everyday things because we're typically pretty good at measuring. But if you ever need to do something with precision, it becomes abundantly obvious. You even use this same methodology in math all the time, though I wouldn't say it is equivalent to taking a hard problem, creating an isomorphic map to an easier problem, solving it, then mapping back. There's an injective nature to it. A ruler doesn't measure distance. A ruler is a reference for distance. A laser range finder doesn't measure distance either; it is a photodetector and a timer. There is nothing in the world that you can measure directly. If we cannot do this with physical things, it seems pretty silly to think we can do it with abstract concepts that we can't create robust definitions for. It's not like we've directly measured the Higgs either. But what, do you think entropy is actually a measure of intelligible speech? Is perplexity a good tool for identifying an entropy minimizer? Or does it just correlate? Is FID a measurement of fidelity, or are we just using a useful proxy? I'm sorry, but I just don't think there are precise mathematical descriptions of things like natural English language or realistic human faces. I've developed some of the best vision models out there, and I can tell you that you have to read more than the paper, because while they will produce fantastic images, they also produce some pretty horrendous ones. The fact that they statistically generate realistic images does not imply that they actually understand them.
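For readers outside the area, a minimal Python sketch of what perplexity actually computes may make the proxy point concrete. It assumes we already have the probability a model assigned to each observed token (the numbers below are made up for illustration, not measurements): perplexity is just the exponential of the average negative log-likelihood, so it scores how unsurprised the model is by the text, not whether the text is intelligible.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over the observed tokens.

    `token_probs` holds the probability the model assigned to each token that
    actually occurred. Nothing here checks whether the text is intelligible
    or true; it only measures how surprised the model was.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Hypothetical per-token probabilities, for illustration only.
confident = [0.9, 0.8, 0.95, 0.85]
uniform_over_4 = [0.25] * 4

print(perplexity(confident))
print(perplexity(uniform_over_4))  # uniform over 4 choices: perplexity 4, up to float rounding
```

Two models can reach the same perplexity for very different reasons, which is exactly why it works as a proxy rather than a direct measurement.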
> I'm no philosopher
Why not? It sounds like you are. Do you not think about metamathematics? What math means? Do you not think about math beyond the computation? If you do, I'd call you a philosopher. There's a P in a PhD for a reason. We're not supposed to be automata. We're not supposed to be machine men, with machine minds, and machine hearts.
> This is a tremendous pain point ... researchers will live and die on standard benchmarks.
It is a pain we share. I see it outside CS as well, but I was shocked to see the difference. Most of the other physicists and mathematicians I know that came over to CS were also surprised. And it isn't like physicists are known for their lack of egos lol
> then you are still working in a more fortunate field
Oh, I've gotten the other comments too. That research never found publication, and at the end of the day I had to graduate. Though now it can be revisited. I was once surprised to find that I had saved a paper from Max Welling's group. My fellow reviewers were confident in their rejections, but since they admitted to not understanding differential equations, the AC sided with me (maybe they could see Welling's name? I didn't know until months after). It barely got through a workshop, but should have been in the main proceedings.
So I guess I'm saying I share this frustration. It's part of the reason I talk strongly here. I understand why people shift gears. But I think there's a big difference between begrudgingly getting on the train because you need to publish to survive and actively fueling it while shouting that all other trains are broken and can never be fixed. One train to rule them all? I guess CS people love their binaries.
> world model
I agree that looking at outputs tells us little about their internal mechanisms. But proof isn't symmetric in difficulty either. A world model has to be consistent. I like vision because it gives us more clues in our evaluations and lets us evaluate beyond metrics. If we are seeing video from a POV perspective and see a wall in front of us, turn left, then turn back, we should still expect to see that wall, and the same one. A world model is a model beyond what is seen from the camera's view. A world model is a physics model. And I mean /a/ physics model, not "physics". There is no single physics model. Nor do I mean that a world model needs to have even accurate physics. But it does need to make consistent and counterfactual predictions. Even the geocentric model is a world model (literally a model of worlds lol). The model of the world you have in your head is this. We don't close our eyes and conclude the wall in front of us will disappear. Someone may spin you around and you still won't do this, even if you have your coordinates wrong. The issue isn't so much memory as it is understanding that walls don't just appear and disappear. It is also understanding that this isn't always true of a cat.
I referenced the game engines because while they are impressive they are not self-consistent. Walls will disappear. An enemy shooting at you will sometimes disappear if you just stop looking at it. The world doesn't disappear when I close my eyes. A tree falling in a forest still creates acoustic vibrations in the air even if there is no one to hear it.
A world model is exactly that, a model of a world. It is a superset of a model of a camera view. It is a model of the things in the world and how they interact together, regardless of whether they are visible or not. Accuracy isn't actually the defining feature here, though it is a strong hint, at least for poor world models.
I know this last part is a bit more rambly and harder to convey. But I hope the intention came across.
robotresearcher
17 hours ago
Intellectual caution is a good default.
Having said that, can you name one functional difference between an AI that understands, and one that merely behaves correctly in its domain of expertise?
As an example, how would a chess program that understands chess differ from one that is merely better at it than any human who ever lived?
(Chess the formal game; not chess the cultural phenomenon)
Some people don’t find the example satisfying, because they feel like chess is not the kind of thing where understanding pertains.
I extend that feeling to more things.
godelski
15 hours ago
> any human who ever lived
Is this falsifiable? Even restricting to those currently living? On what tests? In which way? Does the category of error matter?
> can you name one functional difference between an AI that understands, and one that merely behaves correctly in its domain of expertise?
I'd argue you didn't understand the examples from my previous comment or the direct reply[0]. Does it become a duck as soon as you are able to trick an ornithologist? All ornithologists?
But yes. Is it fair if I use Go instead of Chess? Game 4 with Lee Sedol seems an appropriate example.
Vafa also has some good examples[1,2].
But let's take an even more theoretical approach. Chess is technically solvable since it is deterministic and has perfect information: in principle you can compute an optimal strategy from any valid state. The problem is that this is intractable, since the number of state-action pairs is so large. But the number of moves isn't the critical part here, so let's look at Tic-Tac-Toe. We can pretty easily program up a machine that will not lose. We can put all actions and states into a graph and fit that on a computer, no problem. Would you really say that the program understands Tic-Tac-Toe better than a human? I'm not sure we should even say it understands the game at all.
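As a rough illustration of how little is needed to "not lose" at Tic-Tac-Toe, here is a minimal Python sketch of exhaustive minimax search with memoization. The board encoding and the function names (`winner`, `minimax`, `best_move`) are my own illustrative choices, not from any cited work. The program plays perfectly by enumerating the whole game tree and caching a value for each position it reaches.

```python
from functools import lru_cache

# Board: a 9-character string of "X", "O" or " " (row-major 3x3 grid).
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: str):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Value with perfect play, from X's side: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0  # full board and no winner: draw
    nxt = "O" if player == "X" else "X"
    values = [
        minimax(board[:i] + player + board[i + 1:], nxt)
        for i, cell in enumerate(board)
        if cell == " "
    ]
    # X maximizes the value, O minimizes it.
    return max(values) if player == "X" else min(values)


def best_move(board: str, player: str) -> int:
    """Index of an optimal move for `player` on a non-terminal board."""
    nxt = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    pick = max if player == "X" else min
    return pick(moves, key=lambda i: minimax(board[:i] + player + board[i + 1:], nxt))


if __name__ == "__main__":
    print(minimax(" " * 9, "X"))  # 0: perfect play from the empty board is a draw
    print(minimax.cache_info().currsize)  # distinct (board, player) states visited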
I don't think the situation is resolved by changing to unsolved (or effectively unsolved) games. That's the point of the Heliocentric/Geocentric example. The Geocentric Model gave many accurate predictions, but I would find it surprising if you suggested an astronomer at that time, with deep expertise in the subject, understood the configuration of the solar system better than a modern child who understands Heliocentricism. Their model makes accurate predictions and certainly more accurate than that child would, but their model is wrong. It took quite a long time for Heliocentrism to not just be proven to be correct, but to also make better predictions than Geocentrism in all situations.
So I see 2 critical problems here.
1) The more accurate model[3] can be less developed, resulting in lower predictive capabilities despite being a much more accurate representation of the verifiable environment. Accuracy and precision are different, right?
2) Test performance says nothing about coverage/generalization[4]. We can't prove our code is error free through test cases. We use them to bound our confidence (a very useful feature! I'm not against tests, but as you say, caution is good).
In [0] I referenced Dyson, I'd appreciate it if you watched that short video (again if it's been some time). How do you know you aren't making the same mistake Dyson almost did? The mistake he would have made had he not trusted Fermi? Remember, Fermi's predictions were accurate and they even stood for years.
If your answer is time, then I'm not convinced it is a sufficient explanation. It doesn't explain Fermi's "intuition" (understanding) and is just kicking the can down the road. You wouldn't be able to differentiate yourself from Dyson's mistake. So why not take caution?
And to be clear, you are the one making the stronger claim: "understanding has a well defined definition." My claim is that yours is insufficient. I'm not claiming I have an accurate and precise definition, my claim is that we need more work to get the precision. I believe your claim can be a useful abstraction (and certainly has been!), but that there are more than enough problems that we shouldn't hold to it so tightly. To use it as "proof" is naive. It is equivalent to claiming your code is error free because it passes all test cases.
[0] https://news.ycombinator.com/item?id=45622156
[1] https://arxiv.org/abs/2406.03689
[2] https://arxiv.org/abs/2507.06952
[3] Certainly placing the Earth at the center of the solar system (or universe!) is a larger error than placing the sun at the center of the solar system and failing to predict the tides or retrograde motion of Mercury.
[4] This gets exceedingly complex once we start trying to differentiate from memorization. I'm not sure we need to dive into how far from the training data something needs to be to count as reasonable test data, but that is a question that can't be ignored forever.
pennaMan
17 hours ago
so your definition of "understand" is "able to develop the QC test (or explain tests already developed)"
I hate to break it to you, but the LLMs can already do all 3 tasks you outlined
It can be argued for all 3 actors in this example (the QC operator, the PhD chemist and the LLM) that they don't really "understand" anything and are iterating on pre-learned patterns in order to complete the tasks.
Even the ground-breaking chemist researcher developing a new test can be reduced to iterating on the memorized fundamentals of chemistry using a lot of compute (of the meat kind).
The mythical Understanding is just a form of "no true Scotsman"
lelandbatey
18 hours ago
> if a machine performs a task as well as a human, it understands it exactly as much as a human.
I think you're right, except that the ones judging "as well as a human" are in fact humans, and humans have expectations that expand beyond the specs. From the narrow perspective of engineering specifications or profit generated, a robot/AI may very well be exactly as understanding as a human. For the people who interact with those systems outside the money/specs/speeds & feeds, the AI/robot will always feel at least different compared to a person. And as long as it's different, there will always be room to un-falsifiably claim "this robot is worse in my opinion due to X/Y/Z difference."
subjectivationx
11 minutes ago
This is all nonsense.
It is like saying the airplane understands how to fly.
"You disagree? Well, let's see you fly! You are saying the airplane doesn't understand how to fly and you can't even fly yourself?"
This confuses the fact that humans built the flying machine with the machine itself understanding something. The flying machine doesn't understand anything.
godelski
18 hours ago
> that does not have a technical meaning
I don't think the definition is very refined, but I think we should be careful to differentiate that from useless or meaningless. I would say most definitions are accurate, but not precise.
It's a hard problem, but we are making progress on it. We will probably get there, but it's going to end up being very nuanced, and it is already important to recognize that the word means different things in the vernacular and even across research domains. Words are overloaded, and I think we need to recognize this divergence and that we are gravely miscommunicating by assuming the definitions are obvious. I'm not sure why we don't do more to work together on this. In our field we seem to think we've got it all covered and don't need others. I don't get that.
> In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human.
And I do not think this is accurate at all. I would not say my calculator understands math despite it being able to do it better than me. I can say the same thing about a lot of different things which we don't attribute intelligence to. I'm sorry, but the logic doesn't hold.
Okay, you might take an out by saying the calculator can't do abstract math like I can, right? Well we're going to run into that same problem. You can't test your way out of it. We've known this in hard sciences like physics for centuries. It's why physicists do much more than just experiments.
There's the classic story of Freeman Dyson speaking to Fermi, which is why so many know about the four-parameter elephant[0], but the pattern repeats throughout the history of physics. Guess what? Dyson's calculations worked. They fit the data. They were accurate and made accurate predictions! Yet they were not correct. People didn't reject Galileo just because of the church; there were serious problems with his work too. Geocentrism made accurate predictions, including ones that Galileo's version of Heliocentrism couldn't. These historical misunderstandings are quite common, including things like how the average person understands Schrödinger's cat. The cat isn't in a parallel universe of both dead and alive lol. It's just that we, outside the box, can't determine which. Oh, no; information is lossy, there are non-injective functions, so the universe could still be deterministic yet we wouldn't be able to determine that (and this is where my name comes into play).
So idk, it seems like you're just oversimplifying as a means to sidestep the hard problem[1]. The lack of a good technical definition of understanding should tell us we need to determine one. It's obviously a hard thing to do since, well... we don't have one and people have been trying to solve it for thousands of years lol.
> Just my opinion, but my professional opinion from thirty-plus years in AI.
Maybe I don't have as many years as you, but I do have a PhD in CS (thesis on neural networks) and a degree in physics. I think it certainly qualifies as a professional opinion. But at the end of the day it isn't our pedigree that makes us right or wrong.
[0] https://www.youtube.com/watch?v=hV41QEKiMlM
[1] I'm perfectly fine tabling a hard problem and focusing on what's more approachable right now, but that's a different thing. We may follow a similar trajectory but I'm not going to say the path we didn't take is just an illusion. I'm not going to discourage others from trying to navigate it either. I'm just prioritizing. If they prove you right, then that's a nice feather in your cap, but I doubt it since people have tried that definition from the get-go.
robotresearcher
16 hours ago
> It's a hard problem
So people say.
I’m not sidestepping the Hard Problem. I am denying it head on. It’s not a trick or a dodge! It’s a considered stance.
I'm denying that an idea that has historically resisted crisp definition, and that the Stanford Encyclopedia of Philosophy introduces as 'protean', needs to be taken seriously as an essential missing part of AI systems, until someone can explain why.
In my view, the only value the Hard Problem has is to capture a feeling people have about intelligent systems. I contend that this feeling is an artifact of being a social ape, and it entails nothing about AI.
pastel8739
12 hours ago
Regardless of whether you think understanding is important, it’s clear from this thread that a lot of people find understanding valuable. In order to trust an AI with decisions that affect people, people will want to believe that the AI “understands” the implications of its decisions, for whatever meaning of “understand” those people have in their head. So indeed I think it is important that AI researchers try to get their AIs to understand things, because it is important to the consumers that they do.
godelski
14 hours ago
It's a sidestep if your stance doesn't address critiques.
> needs to be taken seriously as an essential missing part of AI systems, until someone can explain why.
Ignoring critiques is not the same as a lack of them.
Zarathruster
11 hours ago
While I agree with you in the main, I also take seriously the "until someone can explain why" counterpoint.
Though I agree with you that your calculator doesn't understand math, one might reasonably ask, "why should we care?" And yeah, if it's just a calculator, maybe we don't care. A calculator is useful to us irrespective of understanding.
If we're to persuade anyone (if we are indeed right), we'll need to articulate a case for why understanding matters with respect to AI. I think everyone gets this on an instinctual level: it wasn't long ago that LLMs suggested we add rocks to our salads to make them crunchier. As long as these problems can be overcome by throwing more data and compute at them, people will remain incurious about the Understanding Problem. We need to make a rigorous case, probably with a good working alternative, and I haven't seen much action here.
godelski
6 hours ago
> "why should we care?"
I'm not the one claiming that a calculator thinks. The burden of proof lies on those that do. Claims require evidence and extraordinary claims require extraordinary evidence.
I don't think anyone is saying that the calculator isn't a useful tool. But certainly we should push back when people are claiming it understands math and can replace all mathematicians.
> If we're to persuade anyone, we'll need to articulate a case for why understanding matters
This is a more than fair point. Though I have not found it to be convincing when I've tried.
I'll say that a major reason I went into physics in the first place is that I found a deep understanding was a far more efficient way of learning how to do things. I started as an engineer and even went into engineering after my degree. Physics made me a better engineer, and I think a better engineer than had I stayed in engineering. Understanding gave me the ability to not just take building blocks and put them together, but to innovate. Being able to see things at a deeper level allowed me to come to solutions I otherwise could not have. Using math to describe things allowed me to iterate faster (just like how we use simulations). Understanding what the math meant allowed me to solve the problems where the equations no longer applied. It allowed me to know where the equations no longer applied. It told me how to find and derive new ones.
I often found that engineers took an approach of physical testing first, because "the math only gets you so far." But that was just a misunderstanding of how far their math took them. It could do more; they just hadn't been taught that. So maybe I had to spend a few days working things out with pen and paper, but that was a cheaper and more robust solution than using the same time to test and iterate.
Understanding is a superpower. Problems can be solved without understanding. A mechanic can fix an engine without knowing how it works. But they will certainly be able to fix more problems if they do. The reason to understand is because we want things to work. The problem is, the world isn't so simple that every problem is the same or very similar to another. A calculator is a great tool. It'll solve calculations all day. Much faster than me, with higher accuracy, but it'll never come up with an equation on its own. That isn't to call it useless, but I need to know this if I want to get things done. The more I understand what my calculator can and can't do, the better I can use that tool.
Understanding things, and the pursuit to understand more is what has brought humans to where they are today. I do not understand why this is even such a point of contention. Maybe the pursuit of physics didn't build a computer, but it is without a doubt what laid the foundation. We never could have done this had we not thought to understand lightning. We would have never been able to tame it like we have. Understanding allows us to experiment with what we cannot touch. It does not mean a complete understanding nor does it mean perfection, but it is more than just knowledge.
JKCalhoun
17 hours ago
I'm not sure. There's a view that, as I understand it, suggests that language is intelligence. That language is a requirement for understanding.
An example might be kind of the contrary—that you might not be able to hold an idea in your head until it has been named. For myself, until I heard the word gestalt (maybe a fitting example?) I am not sure I could have understood the concept. But when it is described it starts to coalesce—and then when named, it became real. (If that makes sense.)
FWIW, Zeitgeist is another one of those concepts/words for me. I guess I have to thank the German language.
Perhaps it is why other animals on this planet seem to us lacking intelligence. Perhaps it is their lack of complex language holding their minds back.
godelski
17 hours ago
> There's a view that suggests that language is intelligence.
I think you find the limits when you dig in. What are you calling language? Can you really say that Eliza doesn't meet your criteria? What about a more advanced version? I mean we've been passing the Turing Test for decades now.
> That language is a requirement for understanding.
But this contradicts your earlier statement. If language is a requirement then it must precede intelligence, right?
I think you must then revisit your definition of language and ensure that it matches to all the creatures that you consider intelligent. At least by doing this you'll make some falsifiable claims and can make progress. I think an ant is intelligent, but I also think ants do things far more sophisticated than the average person thinks. It's an easy trap, not knowing what you don't know. But if we do the above we get some path to aid in discovery, right?
> that you might not be able to hold an idea in your head until it has been named
Are you familiar with Anendophasia?
It is the condition where a person does not have an internal monologue. They think without words. The definition of language is still flexible enough that you can probably still call that language, just like in your example, but it shows a lack of precision in the definition, even if it is accurate.
> Perhaps it is why other animals on this planet seem to us lacking intelligence
One thing to also consider is whether language is necessary for societies or intelligence. Can we decouple the two? I'm not aware of any great examples, although octopi and many other cephalopods are fairly asocial creatures. Yet they are considered highly intelligent due to their adaptive and creative nature.
Perhaps language is a necessary condition for advanced intelligence, but not intelligence alone. Perhaps it is communication and societies, differentiating from an internalized language. Certainly the social group can have an influence here, as coalitions can do more than the sum of the individuals (by definition). But the big question is whether these things are necessary. Getting the correct causal graph, removing the confounding variables, is no easy task. But I think we should still try and explore differing ideas. While I don't think you're right, I'll encourage you to pursue your path if you encourage me to pursue mine. We can compete, but it should be friendly, as our competition forces us to help see flaws in our models. Maybe the social element isn't a necessary condition, but I have no doubt that it is a beneficial tool. I'm more frustrated by those wanting to call the problem solved. It obviously isn't, as it's been so difficult to get generalization and consensus among experts (across fields).
the_gipsy
16 hours ago
> It is the condition where a person does not have an internal monologue.
These people are just nutjobs that misinterpreted what internal monologue means, and have trouble doing basic introspection.
I know there are a myriad of similar conditions, aphantasia, synaesthesia, etc. But someone without internal monologue simply could not function in our society, or at least not pass as someone without obvious mental diminishment.
If there really were some other, hidden code in the mind, that could express "thoughts" in the same depth as language does - then please show it already. At least the tiniest bit of a hint.
godelski
15 hours ago
I know some of these people. We've had deep conversations about what is going on in our thought processes. Their description significantly differs from mine.
These people are common enough that you likely know some. It's just not a topic that frequently comes up.
It is also a spectrum, not a binary thing (though full anendophasia does exist, it is just on the extreme end). I think your own experiences should allow you to doubt your claim. For example, I know when I get really into a fiction book I'm reading that I transition from a point where I'm reading the words in my head to seeing the scenes more like a movie, or more accurately like a dream. I talk to myself in my head a lot, but I can also think without words. I do this a lot when I'm thinking about more physical things like when I'm machining something, building things, or even loading the dishwasher. So it is hard for me to believe that while I primarily use an internal monologue there aren't people that primarily use a different strategy.
On top of that, well, I'm pretty certain my cat doesn't meow in her head. I'm not certain she has a language at all. So why would it be surprising that this condition exists? You'd have to assume there was a switch in human evolution, where it happened all at once or all others went extinct. I find that less likely than the idea that we just don't talk enough about how we think to our friends.
Certainly there are times where you think without a voice in your head. If not, well you're on the extreme other end. After all, we aren't clones. People are different, even if there's a lot of similarities.
the_gipsy
5 hours ago
I suggest you revisit the subject with your friends, with two key points:
1. Make it clear to them that with "internal monologue" you do not mean an actual audible hallucination
2. Ask them if they EVER have imagined themselves or others saying or asking anything
If they do, which they 100% will unless they lie, then you have ruled out "does not have an internal monologue", the claim is now "does not use his internal monologue as much". You can keep probing them what exactly that means, but it gets washy.
Someone that truly does not have an internal dialogue could not do the most basic daily tasks. A person could grab a cookie from the table when they feel like it (oh, :cookie-emoji:!), but they cannot put on their shoes, grab their wallet and keys, look in the mirror to adjust their hair, and go to the supermarket to buy cookies. If there were another hidden code that could express all the huge mental state pulled in by "buy cookies", by now we would at least have an idea that it exists underneath. We must also ask: why would we translate this constantly into language if the mental state is already there? Translation costs processing power and slows things down. So why are these "no internal monologue" people not geniuses?
I have no doubt that there is a spectrum, on that I agree with you. But the spectrum is "how present is (or how aware is the person of-) the internal monologue". E.g. some people have ADHD, others never get anxiety at all. "No internal monologue" is not one end of the spectrum for functioning adults.
The cat actually proves my point. A cat can sit for a long time before a mouse-hole, or it can hide to jumpscare its brother cat, and so on. So to a very small degree there is something that lets it process ("understand") very basic, near-future events and action-reactions. However, a cat could not possibly go to the supermarket to buy food, setting aside anatomical obstacles, because it has no language and therefore cannot make a complex mental model. Fun fact: whenever animals (apes, birds) have been taught language, they never ask questions (some claim they did, but if you dig in you'll see that the interpretation is extremely dubious).
Mikhail_Edoshin
5 hours ago
There is a book written by a woman who suffered a stroke. She lost the ability to speak and understand language. Yet she remained conscious. It took her ten years to fully recover. The book is called "My Stroke of Insight".
the_gipsy
5 hours ago
Conscious, like an animal or a baby. She could not function at all like a normal adult. Proves my point.
nebula8804
18 hours ago
Only problem is this time enough money is being burned that if AGI does not come, it will probably be extremely painful/fatal for a lot of people that had nothing to do with this field or the decisions being made. What will be the consequences if that comes to pass? So many lives were permanently ruined due to the GFC.
munksbeer
18 hours ago
> We always had the math to show that scale wasn't enough
Math, to show that scale (presumably of LLMs) wasn't enough for AGI?
This sounds like it would be quite a big deal, what math is that?
hodgehog11
17 hours ago
As someone who is invested in researching said math, I can say with some confidence that it does not exist, or at least not in the form claimed here. That's the whole problem.
I would be ecstatic if it did though, so if anyone has any examples or rebuttal, I would very much appreciate it.
camillomiller
10 hours ago
Fantastic comment!
naasking
18 hours ago
> It's to recite (or even apply) knowledge. To understand does actually require a world model.
This is a shell game, or a god of the gaps. All you're saying is that the models "understand" how to recite or apply knowledge or language, but somehow don't understand knowledge or language. Well what else is there really?
godelski
13 hours ago
> Well what else is there really?
Differentiate from memorization.
I'd say there's a difference between a database and understanding. If they're the same, well I think Google created AGI a long time ago.
naasking
10 hours ago
A database doesn't recite or apply knowledge, it stores knowledge.
godelski
6 hours ago
It sure recites it when I query it
naasking
an hour ago
It makes perfect sense to say that the database understands your query. It also makes sense to say that the database's factorization of domain knowledge + domain queries exhibits at least a static domain understanding (which still isn't general à la AGI). This is the standard Systems Reply to the Chinese Room.
The "general" part comes from whether that static aspect can be made dynamic and extensible. In what sense is a system that can be arbitrarily extended to "recite" or "apply" knowledge not AGI?
DrewADesign
12 hours ago
I think current AI is a human language/behavior mirror. A cat might believe they see another cat looking in a mirror, but you can’t create a new cat by creating a perfect mirror.
Animats
7 hours ago
> The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right in being skeptical here.
That's the basic success of LLMs. They don't have much of a model of the world, and they still work. "Attention is all you need". Good Old Fashioned AI was all about developing models, yet that was a dead end.
There's been some progress on representation in an unexpected area. Try Perchance's AI character chat. It seems to be an ordinary chatbot. But at any point in the conversation, you can ask it to generate a picture, which it does using a Stable Diffusion type system. You can generate several pictures, and pick the one you like best. Then let the LLM continue the conversation from there.
It works from a character sheet, which it will create if asked. It's possible to start from an image and get to a character sheet and a story. The back and forth between the visual and textual domains seems to help.
For storytelling, such a system may need to generate the collateral materials needed for a stage or screen production - storyboards, scripts with stage directions, character summaries, artwork of sets, blocking (where everybody is positioned on stage), character sheets (poses and costumes) etc. Those are the modeling tools real productions use to keep a work created by many people on track. Those are a form of world model for storytelling.
I've been amazed at how good the results I can get from this thing are. You have to coax it a bit. It tends to stay stuck in a scene unless you push the plot forward. But give it a hint of what happens next and it will run with it.
senko
5 hours ago
The thing is, achieving say, 99.99999% reliable AI would be spectacularly useful even if it's a dead end from the AGI perspective.
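To put a figure like 99.99999% in perspective, here is a small illustrative sketch (not from the thread; the function names are hypothetical) that counts the "nines" in a success rate and the expected number of failures over many attempts. Each additional nine cuts expected failures tenfold, so seven nines works out to roughly one failure per ten million attempts.

```python
import math


def count_nines(success_rate: float) -> float:
    """Number of leading nines in a success rate: 0.9 -> ~1, 0.999 -> ~3."""
    if success_rate >= 1.0:
        return float("inf")  # a perfect rate has unbounded nines
    return -math.log10(1.0 - success_rate)


def expected_failures(success_rate: float, attempts: int) -> float:
    """Expected number of failed attempts at a given per-attempt success rate."""
    return attempts * (1.0 - success_rate)


if __name__ == "__main__":
    for rate in (0.9, 0.99, 0.999, 0.9999999):
        print(f"{rate:.5%}: {count_nines(rate):.1f} nines, "
              f"{expected_failures(rate, 10_000_000):,.0f} failures per 10M attempts")
```

Floating-point rounding means the nine counts come out as approximately whole numbers rather than exactly, which is why the printout rounds them.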
People routinely conflate the "useful LLMs" and "AGI", likely because AGI has been so hyped up, but you don't need AGI to have useful AI.
It's like saying the Internet is a dead end because it didn't lead to telepathy. It didn't, but it sure as hell is useful.
It's beneficial to have both discussions: whether and how to achieve AGI and how to grapple with it, and how to improve the reliability, performance, and cost of LLMs for more prosaic use cases.
It's just that they are separate discussions.
DanielHB
4 hours ago
The problem is that these models feel like they are at 8s and getting more 8s
(maybe 7s)
theptip
10 hours ago
> LLMs seem to understand language therefore they've trained a model of the world.
This isn’t the claim, obviously. LLMs seem to understand a lot more than just language. If you’ve worked with one for hundreds of hours actually exercising frontier capabilities I don’t see how you could think otherwise.
harrall
9 hours ago
I don’t have a deep understanding of LLMs, but don’t they fundamentally work on tokens and generate a multi-dimensional statistical relationship map between tokens?
So it doesn’t have to be an LLM. You could theoretically have image tokens (though I don’t know how that works in practice; the important part is the statistical map).
And it’s not like my brain doesn’t work like that either. When I say a funny joke in response to people in a group, I can clearly observe my brain pull together related “tokens” (Mary just talked about X, X is related to Y, Y is relevant to Bob), filter them, sort them and then spit out a joke. And that happens in like less than a second.
tacitusarc
9 hours ago
Yes! Absolutely. And this is likely what would be necessary for anything approaching actual AGI. And not just visual input, but all kinds of sensory input. The problem is that we have no ability, not even close, to process that even near the level of a human yet, much less some super genius being.
bentt
19 hours ago
I think this is a useful challenge to our normal way of thinking.
At the same time, "the world" exists only in our imagination (per our brain). Therefore, if LLMs need a model of a world, and they're trained on the corpus of human knowledge (which passed through our brains), then what's the difference, especially when LLMs are going back into our brains anyway?
qlm
19 hours ago
Language isn't thought. It's a representation of thought.
chasd00
18 hours ago
Something to think about (hah!) is there are people without an internal monologue i.e. no voice inside their head they use when working out a problem. So they're thinking and learning and doing what humans do just fine with no little voice no language inside their head.
WJW
18 hours ago
It's so weird that people literally seem to have a voice in their head they cannot control. For me personally my "train of thought" is a series of concepts, sometimes going as far as images. I can talk to myself in my head with language if I make a conscious effort to do so, just as I can breathe manually if I want. But if I don't, it's not really there like some people seem to have.
Probably there are at least two groups of people and neither really comprehends how the other thinks haha.
graemefawcett
16 hours ago
I think there are significantly more than 2, when you start to count variations through the spectrum of neurodiversity.
Spatial thinkers, for example, or the hyperlexic.
Meaning for hyperlexics is more akin to finding meaning in the edges of the graph rather than the vertices. The form of language contributes a completely separate graph of knowledge alongside its content, creating a rich, multimodal form of understanding.
Spatial thinkers have difficulty with procedural thinking, which is how most people are taught. Rather than the series of steps to solve the problem, they see the shape of the transform. LLMs as an assistive device can be very useful for spatial thinkers in providing the translation layer between the modes of thought.
rhetocj23
18 hours ago
It's very interesting to see how many people struggle to understand this.
subjectivationx
a minute ago
We are paying the price now for not teaching language philosophy as a core educational requirement.
Most people have had no exposure to even the most basic ideas of language philosophy.
The idea that all these people go to school for years and don't even have to take a one-semester class on the main philosophical ideas of the 20th century is insane.
CamperBob2
18 hours ago
If it were that simple, LLMs wouldn't work at all.
qlm
6 hours ago
I think it explains quite well why LLMs are useful in some ways but stupid in many other ways.
naasking
18 hours ago
Are the particles that make up thoughts in our brain not also a representation of a thought? Isn't "thought" really some kind of Platonic ideal that only has approximate material representations? If so, why couldn't some language sentences be thoughts?
qlm
6 hours ago
The sentence is the result of a thought. The sentence in itself does not capture every process that went into producing the sentence.
naasking
an hour ago
> The sentence in itself does not capture every process that went into producing the sentence.
A thought does not capture every process that went into producing the thought either.
sysguest
20 hours ago
yeah that "model of the world" would mean:
babies are already born with "the model of the world"
but a lot of experiments on babies/young kids tell otherwise
ekjhgkejhgk
20 hours ago
> yeah that "model of the world" would mean: babies are already born with "the model of the world"
No, not necessarily. Babies don't interact with the world only by reading what people wrote on Wikipedia and Stack Overflow, the way these models are trained. Babies do things to the world and observe what happens.
I imagine it's similar to the difference between a person sitting on a bicycle and trying to ride it, vs a person watching videos of people riding bicycles.
I think it would actually be a great experiment. If you take a person that never rode a bicycle in their life and feed them videos of people riding bicycles, and literature about bikes, fiction and non-fiction, at some point I'm sure they'll be able to talk about it like they have huge experience in riding bikes, but won't be able to ride one.
aerhardt
19 hours ago
We’ve been thinking about reaching the singularity from one end, by making computers like humans, but too little thought has been given to approaching the problem from the other end: by making babies build their world model by reading Stack Overflow.
zelphirkalt
6 hours ago
That's it. Now you've done it! I will have stackoverflow Q&A, as well as moderator comments and closings of questions playing 24/7 to my first not yet born child! Q&A for the knowledge and the mod comments for good behavior, of course. This will lead to singularity in no time!
pavlov
19 hours ago
The “Brave New World meets OpenAI” model where bottle-born babies listen to Stack Overflow 24 hours a day until they one day graduate to Alphas who get to spend Worldcoin on AI-generated feelies.
godelski
19 hours ago
It's a lot more complicated than that.
You have instincts, right? Innate fears? This is definitely something passed down through genetics. The Hawk/Goose Effect isn't just limited to baby chickens. Certainly some mental encoding passes down through genetics, given how much the brain controls, down to your breathing and heartbeat.
But instinct is basic. It's something humans are even able to override. It's a first-order approximation: too inaccurate to do meaningfully complex things, but sufficient to keep you alive. Maybe we don't want to call instinct a world model (it certainly is naïve), but it can't be discounted either.
In human development, yeah, the lion's share of it happens post birth. Human babies don't even really show typical signs of consciousness until the age of 2. There are many different categories of "awareness" and these certainly grow over time. But the big thing that makes humans so intelligent is that we continue to grow and learn through our whole lifetimes. And we can pass that information along without genetics, and we have very advanced tools to do this.
It is a combination of nature and nurture. But do note that this happens differently in different animals. It's wonderfully complex. LLMs are quite incredible but so too are many other non-thinking machines. I don't think we should throw them out, but we never needed to make the jump to intelligence. Certainly not so quickly. I mean what did Carl Sagan say?
imtringued
18 hours ago
One of the biggest mysteries of humans vs. LLMs is that LLMs need an absurd amount of data during pretraining, then a little bit of data during fine-tuning to make them behave more human. Meanwhile humans don't need any data at all, but have the blind spot that they can only know and learn about what they have observed. This raises two questions. First, what is the equivalent of the supervised learning loss function? Supposedly neurons do predictive coding: they predict what their neighbours are doing. That includes input-only neurons like touch, pain, vision, sound, taste, etc. The observations never contain actions. E.g. you can look at another human, but that will never teach you how to walk, because your legs are different from other people's legs.
How do humans avoid starving to death? How do they avoid leaving no children? How do they avoid eating food that will kill them?
These things require a complicated chain of actions. You need to find food, a partner and you need to spit out poison.
This means you need a reinforcement learning analogue, but what is going to be the reward function equivalent? The reward function can't be created by the brain, because it would be circular. It would be like giving yourself a high, without even needing drugs. Hence, the reward signal must remain inside the body but outside the brain, where the brain can't hack it.
The first and most important reward is to perform reproduction. If food and partners are abundant, the ones that don't reproduce simply die out. This means that reward functions that don't reward reproduction disappear.
Reproduction is costly in terms of energy. Do it too many times and you need to recover and eat. Hunger evolved as a result of the brain needing to know about the energy state of the body. It overrides reproductive instincts.
Now let's say you have a poisonous plant that gives you diarrhea, but you are hungry. What stops you from eating it? Pain evolves as a response to a damaged body. Harmful activities signal themselves in the form of pain to the brain. Pain overrides hunger. However, what if the plant is so deadly that it will kill you? The pain sensors wouldn't be fast enough. You need to sense the poison before it enters your body. So the tongue evolves taste and cyanide starts tasting bitter.
Notice something? The feelings only exist internally inside the human body, but they are all coupled with continued survival in one way or another. There is no such thing for robots or LLMs. They won't accidentally evolve a complex reward function like that.
godelski
16 hours ago
> Meanwhile humans don't need any data at all
I don't agree with this and I don't think any biologist or neuroscientist would either.
1) Certainly the data I discussed exists. No creature comes out a blank slate. I'll be bold enough to say that this is true even for viruses, even if we don't consider them alive. Being an automaton doesn't mean being void of data, and I'm not sure why you'd ascribe this to life or humans.
2) Humans are processing data from birth (technically before too, but that's not necessary for this conversation, and I think we all know that's a great way to have an argument and not address our current conversation). This is clearly some active/online/continual/reinforcement/whatever-word-you-want-to-use learning.
It's weird to suggest an either-or situation. All evidence points to "both". Looking at different animals, we even see both, just with different distributions.
I think it's easy to oversimplify the problem, and the average conversation tends to do this. It's clearly a complex system with many variables at play. We can't approximate it with any reasonable accuracy by ignoring them or holding them constant. They're coupled.
> The reward function can't be created by the brain, because it would be circular.
Why not? I'm absolutely certain I can create my own objectives and own metrics. I'm certain my definition of success is different from yours.
> It would be like giving yourself a high, without even needing drugs
Which is entirely possible. Maybe it takes extreme training to do extreme versions, but it's also not like chemicals like dopamine are constant. You definitely get a rush by completing goals. People become addicted to things like video games, high-risk activities like skydiving, or even arguing on the internet.
Just because there are externally driven or influenced goals doesn't mean internal ones can't exist. Our emotions can be driven both externally and internally.
> Notice something?
You're using too simple of a model. If you use this model, then the solution is as easy as giving a robot self-preservation (even if we need to wait a few million years). But how would self-preservation evolve beyond its initial construction without the ability to metaprocess and refine that goal? So I think this highlights a major limitation in your belief. As I see it, the only other way is a changing environment that somehow keeps allowing the original construction to survive and evolves precisely such that the original instructions continue to work. Even with vague instructions, that's an unstable equilibrium. I think you'll find there are a million edge cases, even if it seems obvious at first. Or read some Asimov ;)
ben_w
20 hours ago
> babies are already born with "the model of the world"
> but a lot of experiments on babies/young kids tell otherwise
I believe they are born with such a model? It's just that model is one where mummy still has fur for the baby to cling on to? And where aged something like 5 to 8 it's somehow useful for us to build small enclosures to hide in, leading to a display of pillow forts in the modern world?
rwj
20 hours ago
Lots of experiments show that babies develop important capabilities at roughly the same times. That speaks to inherited abilities.
xmichael909
11 hours ago
love the intentional use of udnerstand, brilliant!
zwnow
7 hours ago
A world model cannot exist; the context windows aren't even close to big enough for that. Weird that every serious scientist agrees on AGI not being a thing in the next decades. LLMs are good if you train them for a specific thing. Not so much if you expect them to explain the whole world to you. This is not possible yet.
zaphos
11 hours ago
"just a matter of adding more 9s" is a wild place to use a "just" ...
imtringued
19 hours ago
Model based reinforcement learning is a thing and it is kind of a crazy idea. Look up temporal difference model predictive control.
The fundamental idea behind temporal difference is that you can record any observable data stream over time and predict the difference between past and present based on your decision variables (e.g. camera movement, actuator movement, and so on). Think of it like the Minecraft clone called Oasis AI. The AI predicts the response to a user provided action.
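To make the "predict the difference" idea concrete, here is a minimal sketch under loudly stated assumptions: a toy 1-D system with linear dynamics, hypothetical helper names (`simulate`, `fit_delta_model`), and made-up constants. It records (state, action, change) tuples and fits a model of the change by ordinary least squares. It only illustrates delta prediction from decision variables; it is not TD-MPC and not how Oasis works.

```python
import random

def simulate(steps, a_true=-0.1, b_true=0.5, noise=0.01, seed=0):
    """Roll out a hypothetical toy system: x_next = x + a*x + b*u + noise."""
    rng = random.Random(seed)
    x = 1.0
    data = []
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)  # the decision variable (e.g. an actuator command)
        dx = a_true * x + b_true * u + rng.gauss(0.0, noise)
        data.append((x, u, dx))     # record state, action, and the observed change
        x += dx
    return data

def fit_delta_model(data):
    """Least-squares fit of dx ~= a*x + b*u by solving the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    sxd = sum(x * d for x, _, d in data)
    sud = sum(u * d for _, u, d in data)
    det = sxx * suu - sxu * sxu
    a = (sxd * suu - sxu * sud) / det
    b = (sxx * sud - sxu * sxd) / det
    return a, b

a_hat, b_hat = fit_delta_model(simulate(500))
print(f"estimated a={a_hat:.3f}, b={b_hat:.3f}")  # should land close to the true -0.1 and 0.5
```

The data stream never runs out in this toy, which is the appeal described above; the hard parts (high-dimensional observations, what objective to plan for) are exactly what the sketch leaves out.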
Now imagine if it worked as presented. The data problem would be solved, because you are receiving a constant stream of data every single second. If anything, the RL algorithms are nowhere near where they need to be and continual learning has not been solved yet, but the best known way is through automatic continual learning à la Schmidhuber (co-inventor of LSTMs along with Hochreiter).
So, model based control is solved right? Everything that can be observed can be controlled once you have a model!
Wrong. Unfortunately. You still need the rest of reinforcement learning: an objective and a way to integrate the model. It turns out that reconstructing the observations is too computationally challenging and the standard computational tricks like U-Nets learn a latent representation that is optimized for reconstruction rather than for your RL objectives. There is a data exchange problem that can only realistically be solved by throwing an even bigger model at it, but here is why that won't work either:
Model predictive control tries to find the best trajectory over a receding horizon. It is inherently future oriented. This means that you need to optimize through your big model and that is expensive to do.
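A minimal sketch of the receding-horizon loop, assuming a toy known model (dx = a*x + b*u, a hypothetical 1-D system) stands in for the "big model". The planner is deliberately crude: it brute-forces a grid of candidate plans, each holding one action constant over the horizon, keeps the plan with the cheapest predicted trajectory, applies its first action, and re-plans from the new state. The cost weights, horizon, grid, and target are all assumptions; real MPC would use a proper optimizer, which is where optimizing through a large learned model gets expensive.

```python
def predict_cost(x, u, a, b, target, horizon):
    """Roll the model dx = a*x + b*u forward, holding action u for the whole horizon."""
    cost = 0.0
    for _ in range(horizon):
        x = x + a * x + b * u
        cost += (x - target) ** 2 + 0.01 * u * u  # tracking error plus a small action penalty
    return cost

def mpc_action(x, a, b, target, horizon=10):
    """Evaluate candidate actions in [-1, 1] and return the one with the cheapest predicted cost."""
    candidates = [i / 100 - 1.0 for i in range(201)]
    return min(candidates, key=lambda u: predict_cost(x, u, a, b, target, horizon))

def run(steps=30, a=-0.1, b=0.5, target=2.0):
    x = 0.0
    for _ in range(steps):
        u = mpc_action(x, a, b, target)  # re-plan from the current state every step
        x = x + a * x + b * u            # apply only that first action (receding horizon)
    return x

print(f"state after 30 steps: {run():.3f}")  # should settle near the target of 2.0
```

Every control step costs 201 full model rollouts here; swap the toy model for a large neural network and that inner loop is the expense the comment is pointing at.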
So you're going to have to take shortcuts by optimizing for a specific task. You reduce the dimension of the latent space and stop reconstructing the observations. The price? You are now learning a latent space for your particular task, which is less demanding. The dream of continual learning with infinite data shatters and you are brought down to earth: it's better than what came before, but not that much better.
exe34
20 hours ago
To me, it's a matter of a very big checklist: you can keep adding tasks to the list, but if it keeps marching onwards checking things off your list, some day you will get there. Whether it's a linear or asymptotic march, only time will tell.
ekjhgkejhgk
20 hours ago
I don't know if you will get there, that's far from clear at this stage.
Did you see the recent video by Nick Beato [1] where he asks various models about a specific number? The models that get it right are the models that consume youtube videos, because there was a youtube video about that specific number. It's like, these models are capable of telling you about very similar things that they've seen, but they don't seem like they understand it. It's totally unclear whether this is a quantitative or qualitative gap.
cactusplant7374
20 hours ago
That's like saying that if we image every neuron in the brain we will understand thinking. We can build these huge databases and they tell us nothing about the process of thinking.
exe34
20 hours ago
What if we copy the functionality of every neuron? What if we simply copy all the skills that those neurons compute?
rootusrootus
20 hours ago
Do we even know the functionality of every neuron?
exe34
18 hours ago
Not yet.
danielvaughn
12 hours ago
I have a very surface level understanding of AI, and yet this always seemed obvious to me. It's almost a fundamental law of the universe that complexity of any kind has a long tail. So you can get AI to faithfully replicate 90% of a particular domain skill. That's phenomenal, and by itself can yield value for companies. But the journey from 90%-100% is going to be a very difficult march.
TeMPOraL
3 hours ago
FWIW, Karpathy literally says, multiple times, that he thinks we never left the exponential - that all human progress over last 4+ centuries averages out to that smooth ~2% growth rate exponential curve, that electricity and computing and AI are just ways we keep it going, and we'll continue on that curve for the time being.
It's the major point of contention between him and the host (who thinks growth rate will increase).
DanHulton
12 hours ago
The thing about this, though - cars have been built before. We understand what's necessary to get those 9s. I'm sure there were some new problems that had to be solved along the way, but fundamentally, "build good car" is known to be achievable, so the process of "adding 9s" there makes sense.
But this method of AI is still pretty new, and we don't know its upper limits. It may be that there are no more 9s to add, or that any more 9s cost prohibitively more. We might be effectively stuck at 91.25626726...% forever.
Not to be a doomer, but I DO think that anyone who is significantly invested in AI really has to have a plan in case that ends up being true. We can't just keep on saying "they'll get there some day" and acting as if it's true. (I mean you can, just not without consequences.)
danielmarkbruce
11 hours ago
While you are right about the broader (and sort of ill-defined) chase toward 'AGI' - another way to look at it is the self driving car - they got there eventually. And if you work on applications using LLMs, you can pretty easily see that Karpathy's sentiment is likely correct. You see it because you do it. Even simple applications are shaped like this, albeit each 9 takes less time for a simple app than for self-driving cars. It still feels about right.
Hendrikto
7 hours ago
> another way to look at it is the self driving car - they got there eventually.
No they did not. Elon has been saying Tesla will get there “next year” since 2015. He is still saying that, and despite changing definitions, we still are not there.
1oooqooq
5 hours ago
I guess the comment you replied to proves the actual point: "we may never get there, but it will be enough for the market".
Sigh, I guess it's time to laugh at that video compilation of Elon saying "next week" for 10 years straight, and then cry seeing how much he made doing that.
vasco
10 hours ago
> another way to look at it is the self driving car - they got there eventually
Current self-driving cars only work on American roads. Maybe Canada too; not sure how their roads are. Come to Europe or anywhere else and every other road would be intractable: much tighter lanes, many turns where you have a little mirror to see who's coming from the other side, single-lane roads that fit one car at a time where you need to "understand" who goes first, mountain roads where you sometimes need to reverse for 100m when another car is coming, so that it's wide enough for them to pass before you can keep going forward, etc.
Many things like this would require another 2 or 3 "nines", as the guy put it, beyond what counts as acceptable quality on huge American roads.
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ4NWIt...
fair_enough
21 hours ago
Reminds me of a time-honored aphorism in running:
A marathon consists of two halves: the first 20 miles, and then the last 10k (6.2mi) when you're more sore and tired than you've ever been in your life.
jakeydus
20 hours ago
This is 100% unrelated to the original article but I feel like there's an underreported additional first half. As a bigger runner who still loves to run, the first two or three miles before I have enough endorphins to get into the zen state that makes me love running is the first half, then it's 17 miles of this amazing meditative mindset. Then the last 10k sucks.
awesome_dude
20 hours ago
Just, FTR, endorphins cannot pass the blood-brain barrier
http://hopkinsmedicine.org/health/wellness-and-prevention/th...
rootusrootus
20 hours ago
I suspect that is true for many difficult physical goals.
My dad told me that the first time you climb a mountain, there will likely be a moment not too distant from the top when you would be willing to just sit down and never move again, even at the risk to your own life. Even as you can see the goal not far away.
He also said that it was a dangerous enough situation that as a climb leader he'd start kicking you if he had to, if you sat down like that and refused to keep climbing. I'm not a climber myself, though, so this is hearsay, and my dad is long dead and unable to remind me of what details I've forgotten.
tylerflick
21 hours ago
I think I hated life most after 20 miles. Especially in training.
sarchertech
20 hours ago
Why not just run 20 miles then?
rootusrootus
20 hours ago
Because then it wouldn't be a challenge and nobody would care about the achievement.
monooso
2 hours ago
This makes no sense.
20 miles is still a challenge, and how many people run marathons because someone else is impressed if you run 26 miles, but couldn't care less if you run 20?
sarchertech
18 hours ago
I’m curious do ultramarathoners feel the same way about the rest of the race past 20 miles?
rootusrootus
17 hours ago
I've heard it claimed that an ultramarathon is fundamentally a different experience because while it definitely requires excellent physical stamina, it has a large mental component to it, as well as a much bigger focus on nutrition. Very different sort of race, I guess.
justinwp
15 hours ago
There are multiple cycles from highs to lows and back, and then typically a larger dominant split, similar to what was discussed here for the marathon but scaled to the distance.
justinwp
15 hours ago
The split would be the first 80 and the last 20 miles, ±10 miles.
nextworddev
19 hours ago
because that'd be quitting the race with 6.2 miles left to go
sarchertech
18 hours ago
You could run a half marathon.
nextworddev
18 hours ago
yeah but anyone can do that
godelski
20 hours ago
It's a good way to think about lots of things. It's the Pareto principle, the 80/20 rule.
20% of your effort gets you 80% of the way, but most of your time is spent getting that last 20%. People often don't realize that this is fractal-like in nature, since it comes from a power-law distribution. So of the 20% you still have left, the same holds true: 20% of the remaining time (20% * 80% = 16%, for 36% cumulative) gets you 80% of what's left (80% * 20% = 16%, for 96% cumulative), again and again. The 80/20 numbers aren't actually realistic (or constant), but it's a decent guide.
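The repeated 80/20 split is easy to check with a few lines of arithmetic. This sketch assumes the split stays exactly 20/80 every round (the comment itself notes real numbers aren't constant); `pareto_rounds` is a hypothetical helper. The second round reproduces the 36% effort / 96% result figures.

```python
def pareto_rounds(rounds, effort_share=0.2, result_share=0.8):
    """Apply the same split repeatedly to whatever effort and result remain."""
    effort_done, result_done = 0.0, 0.0
    history = []
    for _ in range(rounds):
        effort_done += effort_share * (1.0 - effort_done)  # spend 20% of the remaining effort
        result_done += result_share * (1.0 - result_done)  # to capture 80% of the remaining result
        history.append((effort_done, result_done))
    return history

for i, (effort, result) in enumerate(pareto_rounds(4), start=1):
    print(f"round {i}: {effort:.1%} of the effort -> {result:.1%} of the result")
```

Each round the result gain shrinks by 5x while the effort spent shrinks only by 1.25x, which is why the tail eats most of the time.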
It's also something tech has been struggling with lately. Move fast and break things is a great way to get most of the way there. But you also left a wake of destruction and tabled a million little things along the way. Someone needs to go back and clean things up. Someone needs to revisit those tabled things. While each thing might be little, we solve big problems by breaking them down into little ones. So each big problem is a sum of many little ones, meaning they shouldn't be quickly dismissed. And, like the nines analogy, 99.9% uptime still means roughly 9 hours of downtime a year. It is still 1e6 cases out of 1e9. A million cases is not a small problem. Scale is great and has made our field amazing, but it is a double-edged sword.
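The downtime figure is the unavailability multiplied by the hours in a year. A minimal sketch, assuming a 365-day year and treating each "nine" as one more factor of ten (`downtime_hours` and `failures` are hypothetical helpers): three nines gives 0.001 * 8760 = 8.76 hours, and 1e9 cases at the same rate gives 1e6 failures.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; ignores leap years

def downtime_hours(nines):
    """Yearly downtime allowed at an availability of `nines` nines (3 -> 99.9%)."""
    return 10.0 ** -nines * HOURS_PER_YEAR

def failures(nines, cases):
    """Expected failures out of `cases` at the same reliability."""
    return cases * 10.0 ** -nines

for n in range(1, 6):
    print(f"{n} nines: {downtime_hours(n):8.2f} h/year down, {failures(n, 10**9):.0e} failures per 1e9 cases")
```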
I think it's also something people struggle with. It's very easy to become above average, or even well above average at something. Just trying will often get you above average. It can make you feel like you know way more but the trap is that while in some domains above average is not far from mastery in other domains above average is closer to no skill than it is to mastery. Like how having $100m puts your wealth closer to a homeless person than a billionaire. At $100m you feel way closer to the billionaire because you're much further up than the person with nothing but the curve is exponential.
010101010101
20 hours ago
https://youtu.be/bpiu8UtQ-6E?si=ogmfFPbmLICoMvr3
"I'm closer to LeBron than you are to me."
yoyohello13
19 hours ago
I think a ton of people see a line going up and they think exponential, when in reality, the vast majority of the time, it's actually logistic.
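A quick way to see why the two get confused: give an exponential and a logistic curve the same starting value and growth rate, and they are nearly indistinguishable until the logistic one starts approaching its cap. The rate and cap below are arbitrary assumptions chosen for illustration.

```python
import math

def exponential(t, x0=1.0, r=0.5):
    """Unbounded growth at rate r."""
    return x0 * math.exp(r * t)

def logistic(t, x0=1.0, r=0.5, cap=1000.0):
    """Logistic growth with the same start and rate, saturating at `cap`."""
    return cap / (1.0 + (cap / x0 - 1.0) * math.exp(-r * t))

for t in (0, 2, 5, 10, 15, 20):
    print(f"t={t:>2}: exponential={exponential(t):10.1f}  logistic={logistic(t):7.1f}")
```

Early on the ratio between them stays close to 1; the difference only becomes obvious once the logistic curve is a sizable fraction of its cap, which is the "how far from leveling off" question raised in the reply.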
tibbar
19 hours ago
Given the physical limits of the universe and our planet in particular, yeah, this is pretty much always true. The interesting question is: what is that limit, and: how many orders of magnitude are we away from leveling off?
misnome
8 hours ago
I mean the cost line does look somewhat exponential…
sdenton4
21 hours ago
Ha, I often speak of doing the first 90% of the work, and then moving on to the following 90% of the work...
JimDabell
20 hours ago
> The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
— Tom Cargill, Bell Labs (September 1985)
inerte
21 hours ago
I use "The project is 90% ready, now we only have to do the other half"
typpilol
20 hours ago
92% is half actually - RuneScape Players
omidsa1
20 hours ago
I also quite like the way he puts it. However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI. That's why we can expect fast acceleration to take off within two years.
breuleux
20 hours ago
I don't think we can be confident that this is how it works. It may very well be that our level of intelligence has a hard limit to how many nines we can add, and AGI just pushes the limit further, but doesn't make it faster per se.
It may also be that we're looking at this the wrong way altogether. If you compare the natural world with what humans have achieved, for instance, both things are qualitatively different, they have basically nothing to do with each other. Humanity isn't "adding nines" to what Nature was doing, we're just doing our own thing. Likewise, whatever "nines" AGI may be singularly good at adding may be in directions that are orthogonal to everything we've been doing.
Progress doesn't really go forward. It goes sideways.
bamboozled
13 hours ago
It's also assuming that all advances in AI just lead to cold hard gains. People have suggested this before, but would a sentient AI get caught up in philosophical, silly or religious ideas? Silicon Valley investor types seem to hope it's all just curing diseases they can profit from, but it might also be, "let's compose some music instead"?
Unit327
8 hours ago
AI doesn't have hopes and desires or something it would rather be doing. It has a utility function that it will optimise for regardless of all else. This doesn't change when it gets smarter, or even when it gets super-intelligence.
adventured
19 hours ago
Adding nines to nature is exactly what humans are doing. We are nature. We are part of the natural order.
Anything that exists is part of nature, there can be no exceptions.
If I go burn a forest down on purpose, that is in fact nature doing it. No different than if a dolphin kills another animal for fun or a chimp kills another chimp over a bit of territory. Insects are also every bit as 'vicious' in their conquests.
j45
19 hours ago
The intuition of someone who has put in a decade or two of wondering openly can't be discounted as easily as that of someone who might be a beginner to it.
AGI to encompass all of humanity's knowledge in one source and beat every human on every front might be a decade away.
Individual agents with increased agency adequately covering more and more abilities consistently? That seems like a steady path you can see out to the horizon, putting one foot in front of the other.
For me, the grain of salt I'd take Karpathy with is much, much smaller than average, only because he tries to share how he thinks, examines his own understanding, and changes it.
His ability to explain complex things simply is something that for me helps me learn and understand things quicker and see if I arrive at something similar or different, and not immediately assume anything is wrong, or right without my understanding being present.
rpcope1
20 hours ago
> However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI.
There's a massive planet-sized CITATION NEEDED here, otherwise that's weapons grade copium.
Yoric
20 hours ago
It's a possibility, but far from certainty.
If you look at it differently, assembly language may have been one nine, compilers may have been the next nine, successive generations of language until ${your favorite language} one more nine, and yet, they didn't get us noticeably closer to AGI.
aughtdev
19 hours ago
I doubt this. General intelligence will be a step change not a gentle ramp. If we get to an architecture intelligent enough to meaningfully contribute to AI development, we'll have already made it. It'll simply be a matter of scale. There's no 99% AGI that can help build 100% AGI but for some reason can't drive a car or cook a meal or work an office job.
AnimalMuppet
20 hours ago
Isn't that one of the measures of when it becomes an AGI? So that doesn't help you with however many nines we are away from getting an AGI.
Even if you don't like that definition, you still have the question of how many nines we are away from having an AI that can contribute to its own development.
I don't think you know the answer to that. And therefore I think your "fast acceleration within two years" is unsupported, just wishful thinking. If you've got actual evidence, I would like to hear it.
ben_w
20 hours ago
AI has been helping with the development of AI ever since at least the first optimising compiler or formal logic circuit verification program.
Machine learning has been helping with the development of machine learning ever since hyper-parameter optimisers became a thing.
Transformers have been helping with the development of transformer models… I don't know exactly, but it was before ChatGPT came out.
None of the initials in AGI are booleans.
But I do agree that:
> "fast acceleration within two years" is unsupported, just wishful thinking
Nobody has any strong evidence of how close "it" is, or even a really good shared model of what "it" even is.
scragz
20 hours ago
AGI is when it is general. A narrow AI trained only on coding and training AIs would contribute to the acceleration without being AGI itself.
techblueberry
12 hours ago
I think the 9's include this assumption.
somanyphotons
21 hours ago
This is an amazing quote that really applies to all software development
Veserv
18 hours ago
Drawn from Karpathy killing a bunch of people by knowingly delivering defective autonomous driving software instead of applying basic engineering ethics and refusing to deploy the dangerous product he was in charge of.
zeroonetwothree
21 hours ago
Well, maybe not all. I’ve definitely built CRUD UIs that were linear in effort. But certainly anything technically challenging or novel.
zeroonetwothree
21 hours ago
When I worked at Facebook they had a slogan that captured this idea pretty well: “this journey is 1% finished”.
gowld
21 hours ago
Copied from Amazon's "Day 1".
tekbruh9000
20 hours ago
Infinitely big little numbers
Academia has rediscovered itself
Signal attenuation, a byproduct of entropy, due to generational churn means there's little guarantee.
Occam's Razor; Karpathy knows the future or he is self selecting biology trying to avoid manual labor?
His statements have more in common with Nostradamus. It's the toxic positivity form of "the end is nigh". It's "Heaven exists you just have to do this work to get there."
Physics always wins and statistics is not physics. Gambler's fallacy: improvement of statistical odds does not improve probability. Probability remains the same. This is all promises from some people who have no idea or interest in doing anything else with their lives; so stay the course.
startupsfail
19 hours ago
>> Heaven exists you just have to do this work to get there.
Or perhaps Karpathy has a higher level understanding and can see a bigger picture?
You've said something about heaven. Are you able to understand this statement, for example: "Heaven is a memeplex, it exists." ?
ojr
16 hours ago
If it works 90% of the time, that means it fails 10% of the time. Getting to a 1% failure rate is a 10x improvement, and going from a 1% failure rate to a 0.1% failure rate is also a 10x improvement.
This is the first time I've heard it called the "march of nines". Did Tesla coin the term? I thought it was an Amazon thing.
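The 10x-per-nine arithmetic can be checked exactly with rational numbers. A minimal sketch, treating n nines as a failure rate of 1/10^n (`failure_rate` and `improvement` are hypothetical helpers):

```python
from fractions import Fraction

def failure_rate(nines):
    """Exact failure rate for `nines` nines: 1 -> 1/10 (90%), 2 -> 1/100 (99%), ..."""
    return Fraction(1, 10 ** nines)

def improvement(nines):
    """Factor by which failures drop going from `nines` to `nines + 1` nines."""
    return failure_rate(nines) / failure_rate(nines + 1)

for n in range(1, 5):
    rate = failure_rate(n)
    print(f"{float(1 - rate):.4%} reliable, fails {rate}, next nine means {improvement(n)}x fewer failures")
```

Every step is the same 10x cut in failures, which is why "each nine is the same amount of work" reads as linear progress on a log scale.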
czk
21 hours ago
like leveling to 99 in old school runescape
fbrchps
21 hours ago
The first 92% and the last 92%, exactly.
zeroonetwothree
21 hours ago
Or Diablo 2
genewitch
19 hours ago
I don't remember the end-game of the original Diablo; however, in Diablo III and IV everyone I've tried to play the game with gets bored in the run-up to max level. I always tell them "I skip that part as much as possible, because that's not the game. That's just the story!"
Once you hit max level in III and IV, the game actually "begins."
And to explain the Diablo 2 reference: the amount of time/effort it takes to go from level 98 to level 99 (the max level) is the same as it takes to go from level 1 to level 98. I've heard "2 weeks" as a rough estimate of "unhealthy playtime", at least solo.
wilfredk
20 hours ago
Perfect analogy.
jlas
21 hours ago
Notably the scaling law paper shows result graphs on log-scale
red75prime
20 hours ago
The question is how many nines are humans.
notTooFarGone
8 hours ago
Humans adapt and become more nines the more they learn about something. Humans are also liable in a legal sense. This is a huge factor in any AI use case.
jakeydus
20 hours ago
You know what they say, a Silicon Valley 9 is a 10 anywhere else. Or something like that.
Yoric
20 hours ago
I assume you're describing the fact that Silicon Valley culture keeps pushing out products before they're fully baked?