
Andrej Karpathy – It will take a decade to work through the issues with agents

This world model talk is interesting, and Yann LeCun has broached the same topic, but the fact is there are video diffusion models that are quite good at representing the "video world" and even at generating, counterfactually and with temporal coherence, a representation of that "world" under different perturbations.

In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.

What am "I" if not (at least partly) the cells in that chain? If they have "seen" it (where seeing is the complex chain you described), I have.

parineum

4 months ago

If the definition of "seen" isn't exactly the process you've described, the word is meaningless. You've never actually posted a comment on hacker news, your neurons just fired in such a way that produced movement in your fingers which happened to correlate with words that represent concepts understood by other groups of cells that share similar genetics.

jacquesm

4 months ago

Plenty of people have thought about it deeply enough, just not the GP.

dahart

4 months ago

This comment illustrates the core problem with reductionism, one that has been known for many centuries: “a system is composed entirely of its parts, but the system will have features that none of the parts have” [1], so a purely reductive account fails to explain those features.

The ‘you have never seen’ assertion feels like a semantic ruse rather than a helpful observation. So how do you define “you” and “see”? If I accept your argument, then you’ve only un-defined those words, and not provided a meaningful or thoughtful alternative to the experience we all have and therefore know exists.

I have seen the night sky. I am made of cells, and I can see. My cells individually can’t see, and whether or not they can claim to be individuals, they won’t survive or perform their function without me, i.e., the rest of my cells, arranged in a very particular way.

Today’s AI is also a ruse. It’s a mirror and not a living thing. It looks like a living thing from the outside, but it’s only a reflection of us, an incomplete one, and unlike living things it cannot survive on its own, can’t eat or sleep or dream or poop or fight or mate & reproduce. Never had its own thoughts, it only borrowed mine and yours. Most LLMs can’t remember yesterday and don’t learn. Nobody who’s serious or knows how they work is arguing they’re conscious, at least not the people who don’t stand to make a lot of money selling you magical chat bots.

[1] https://en.wikipedia.org/wiki/Reductionism#Definitions

pegasus

4 months ago

Provided that the author of the message you're replying to is indeed a member of the Animalia kingdom, they are all those creatures together (at the minimum), so yes, they have seen real light directly.

Of course, computers can be fitted with optical sensors, but our cognitive equipment has been carved over millions of years by these kinds of interactions, so our familiarity with the phenomenon of light goes way deeper than that, shaping the very structure of our thought. Large language models can only mimic that, but they will only ever have a second-hand understanding of these things.

This is a different issue from the question of whether AIs are conscious or not.

beowulfey

4 months ago

While true, that doesn't change the fact that every one of those independent units of transmission is within a single system (being trained on raw inputs), whereas the language model is derived from structured data from outside the system. It's "skipping ahead" through a few layers of modeling, so to speak.

amelius

4 months ago

But where you place the boundaries of a system is subjective.

hitarpetar

4 months ago

Sure, this whole discussion is ultimately subjective. Maybe the Chinese room itself is actually sentient. My question is, why are we arguing about it? Who benefits from the idea that these systems are conscious?

trinsic2

4 months ago

> who benefits from the idea that these systems are conscious?

If I'm understanding your meaning correctly, the organizations who profit off of these models benefit. If you can convince the public that LLMs operate from a place of consciousness, then you get people to buy into the idea that interacting with an LLM is like interacting with humans, which it is not, and probably won't ever be, at least not for a very long time. And by the way, there is too much of this distortion out there already, so I'm glad people are chunking this down, because it's easy for the mind to make shit up based on what we perceive on the surface.

IMHO there is some objective reality out there. The subjectiveness is our interpretation of reality. But I'm pretty sure you can't just boil everything down to systems and processes. There is more to consciousness that we really don't understand yet, IMHO.

dragontamer

4 months ago

Why do you reject your own body? Your eyes are as much a part of you (and part of your brain's network) as anything else connected to you.

Indeed, the entire field of neurobiology is about figuring out which hormones (and possibly which imbalances) cause different behaviors. Your various endocrine glands, very far away from your brain, might have more effects on your emotions than anything happening in the neural pathways.

darkwater

4 months ago

> As AI gains more and more capacity, we keep retreating into smaller and smaller realms of what it means to be a live, thinking being.

Maybe it's just because we never really thought about this deeply enough. And this applies even if some philosophers thought about it before the current age of LLMs.

hitarpetar

> Do you think a fat pig is beautiful? Like a hairy fat pig that snorts and rolls in the mud… is this animal so beautiful to you that you would want to make love to this animal?

I don't want to make love to the night sky, so that last bit is completely irrelevant to the question of beauty. As for whether a pig is beautiful, sure, in its own way. I think they're nice animals and there is something beautiful in seeing them enjoy their little lives.

> Of course not! Because pigs are intrinsically and universally ugly...

It would seem not.

DonHopkins

4 months ago

Somebody never read Charlotte's Web, or watched the Muppet Show.

ninetyninenine

4 months ago

I did read Charlotte’s Web. The whole story is a lesson in how beauty is created by language. Wilbur doesn’t become beautiful because he changes, but because someone clever enough decided to write the right words above him. That’s what beauty usually is: something we agree to see, not something that exists on its own.

4 months ago

> Is this for real?

Frankly, I think you should be the one answering that question. You’re comparing appreciating looking at the sky to bestiality. Then you follow it up with another barrage of wrong assumptions about what I think and can or cannot articulate. None of that has anything to do with the argument. I didn’t even touch on LLMs, my point was squarely about the human experience. Please don’t assume things you know nothing about regarding other people. The HN guidelines ask you to not engage in bad faith and to steel man the other person’s argument.

ninetyninenine

4 months ago

> You’re comparing appreciating looking at the sky to bestiality.

That’s my point. You think beauty is profound but this is arbitrary and not at all different from bestiality. It’s only your intrinsic cultural biases that cause you to look at one with disdain. Don’t be a snob. This is HN. We are supposed to be logical and immune from the biases that plague other forums. Beauty is no more profound than bestiality. It’s all about what you find beautiful. If you find beasts beautiful then you call it bestiality?

What is so different about finding a beast beautiful versus the night sky? Snobbery, that’s what.

It’s just semantic manipulation and association with crudeness that prevents you from thinking logically. HNers are better than this and so are you. Don’t pretend you don’t get it and that my comparison to bestiality is so left field that it’s incomprehensible. You get it. Follow the rules and take it in good faith like you said yourself.

> The HN guidelines ask you to not engage in bad faith

Fair I edited the part that asks “is this for real” that’s literally the only part.

I also find your dismissal of my argument as “bestiality” to be bad faith and manipulative. I clearly wasn’t doing that. Pigs are attracted to pigs; that is normal. Humans are not attracted to pigs. That is also normal. I took normal attributes of human nature and compared them to reality. You took it in bad faith and dismissed me, which is against the very rules you stated.

latexr

4 months ago

Again, please stop telling me what I think. You have zero idea what that is and all your arguments are full of wrong (and frankly unhinged) assumptions. I don’t know what conversation you’re fantasising in your head, but it’s not this one.

> Fair I edited the part that asks “is this for real” that’s literally the only part.

Even if that were true, which I disagree with, that was the very first sentence and set the tone for the entire comment.

> I clearly wasn’t doing that.

That’s not clear in the slightest.

You keep making wrong assumptions and telling other people what they think. You can’t have an honest and productive conversation like that. You’ll never be able to engage in good faith and truly comprehend what the other person is saying until you understand and fix that.

ninetyninenine

4 months ago

Look, you keep saying I’m telling you what you think, but that’s just a way of dodging the actual argument. In any serious conversation, we have to interpret each other’s words. That’s how reasoning works. When I restate your point, I’m not claiming psychic powers; I’m engaging with what you said. If I get something wrong, point to the sentence and explain where. But saying “you have no idea what I think” shuts down discussion instead of clarifying it.

And about the example, you keep missing what it was doing. I wasn’t saying the night sky and bestiality are the same thing. Obviously not. The example illustrates how beauty is subjective. Humans find pigs ugly, pigs find pigs beautiful. That’s not crude, it’s biology. The point is that beauty depends entirely on the observer. That’s the entire argument. You can swap out pigs for anything else and it still holds. You got hung up on the imagery instead of seeing the reasoning behind it.

You also seem to think I’m being unhinged because I’m willing to follow an argument wherever it leads, even if it’s uncomfortable. But that’s the whole purpose of rational discussion, to question assumptions rather than hide behind emotional reactions. If your position can’t survive a provocative example, that’s not my problem.

You accuse me of making assumptions, but that’s what all reasoning is. We start with assumptions and test them. If you think mine are wrong, show why. Don’t just say “stop assuming things.” That’s not logic, that’s avoidance.

amelius

4 months ago

We think it's beautiful because it's like a background that we don't have to think about. If that background were hostile, we'd have to think and we would not think it looks beautiful.

4 months ago

The more I learn about AI, biology and the brain, the more it seems to me that the difference between life and machines is just complexity.

People are just really really complex machines.

However there are clearly qualitative differences between the human mind and any machines we know of yet, and those qualitative differences are emergent properties, in the same way that a rabbit is qualitatively different than a stone or a chunk of wood.

I also think most of the recent AI experts/optimists underestimate how complex the mind is. I'm not at the cutting edge of how LLMs are being trained and architected, but the sense I have is we haven't modelled the diversity of connections in the mind or diversity of cell types. E.g. Transcriptomic diversity of cell types across the adult human brain (Siletti et al., 2023, Science)

simonh

4 months ago

I’d say sophistication.

Observing the landscape enables us to spot useful resources and terrain features, or spot dangers and predators. We are afraid of dark enclosed spaces because they could hide dangers. Our ancestors with appropriate responses were more likely to survive.

A huge limitation of LLMs is that they have no ability to dynamically engage with the world. We’re not just passive observers, we’re participants in our environment and we learn from testing that environment through action. I know there are experiments with AIs doing this, and in a sense game playing AIs are learning about model worlds through action in them.

FloorEgg

4 months ago

The idea I keep coming back to is that as far as we know it took roughly 100k-1M years for anatomically modern humans to evolve language, abstract thinking, information systems, etc. (equivalent to LLMs), but it took 100M-1B years to evolve from the first multi-celled organisms to anatomically modern humans.

In other words, human level embodiment (internal modelling of the real world and ability to navigate it) is likely at least 1000x harder than modelling human language and abstract knowledge.
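As a rough worked check using only the ranges quoted above (language and abstract thought: 10^5 to 10^6 years; first multicellular organisms to modern humans: 10^8 to 10^9 years), the ratio of the two spans lies between 100x and 10,000x, with 1000x at the geometric midpoint. The symbols T are introduced here only for illustration:

```latex
\frac{10^{8}\,\text{yr}}{10^{6}\,\text{yr}} = 10^{2}
\;\le\; \frac{T_{\text{multicellular}\to\text{human}}}{T_{\text{language}}} \;\le\;
\frac{10^{9}\,\text{yr}}{10^{5}\,\text{yr}} = 10^{4},
\qquad
\sqrt{10^{2}\cdot 10^{4}} = 10^{3}
```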

And to build further on what you are saying, the way LLMs are trained and then used, they seem a bit more like DNA than the human brain in terms of how the "learning" is being done. An instance of an LLM is like a copy of DNA trained on a play of many generations of experience.

So it seems there are at least four things not yet worked out re AI reaching human level "AGI":

1) The number of weights (synapses) and parameters (neurons) needs to grow by orders of magnitude

2) We need new analogs that mimic the brain's diversity of cell types and communication modes

3) We need to solve the embodiment problem, which is far from trivial and not fully understood

4) We need efficient ways for the system to continuously learn (an analog for neuroplasticity)

It may be that these are mutually reinforcing, in that solving #1 and #2 makes a lot of progress towards #3 and #4. I also suspect that #4 is an economic problem: if training a GPT-5 level model were 1,000,000x cheaper, then maybe everyone could have one that's continuously learning (and diverging), rather than everyone sharing the same training run that's static once complete.

All of this to say I still consider LLMs "intelligent", just a different kind and less complex intelligence than humans.

kla-s

4 months ago

I'd also add 5): we need some sense of truth.

I'm not quite sure the current paradigm of LLMs is robust enough, given the recent Anthropic paper on data quality (or rather the lack thereof), which found that a small number of bad samples can poison the well and that this doesn't get better with more data. Especially in conjunction with 4), some sense of truth becomes crucial in my eyes. (The question in my eyes is how this would work. Something verifiable and understandable like Lean would be great, but how does that work with fuzzier topics…)

FloorEgg

4 months ago

That's a segue into an important and rich philosophical space...

What is truth? Can it be attained, or only approached?

Can truth be approached (progress made towards truth) without interacting with reality?

The only shared truth seeking algorithm I know is the scientific method, which breaks down truth into two categories (my words here):

1) Truth about what happened (controlled, documented experiments)

2) Truth about how reality works (predictive power)

In contrast to something like Karl Friston's free energy principle, which is more of a single-unit truth-seeking (more like predictive-capability-seeking) model.

4 months ago

> A huge limitation of LLMs is that they have no ability to dynamically engage with the world.

A pure LLM is static and can’t learn, but give an agent a read-write data store and suddenly it can actually learn things. Give it a markdown file of “learnings”, prompt it to consider updating the file at the end of each interaction, then load it into the context at the start of the next… (and that’s a really basic implementation of the idea; there are much more complex versions of the same thing)
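A minimal sketch of that basic loop in Python, assuming a hypothetical `fake_llm` stand-in for the model call (no real API is used) and a local `learnings.md` file. It only shows the mechanics: load the file into the prompt at session start, append to it at session end. In a real agent the model itself would be prompted to decide what to write.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for a real model call; it only reports the prompt
# size so the loop runs offline without any API.
def fake_llm(prompt: str) -> str:
    return f"(reply to a {len(prompt)}-character prompt)"

def run_session(user_msg: str, learnings_file: Path,
                new_learning: Optional[str] = None) -> str:
    """One agent interaction; returns the prompt that was sent to the model."""
    # Start of session: load the persisted learnings into the context.
    if learnings_file.exists():
        learnings = learnings_file.read_text(encoding="utf-8")
    else:
        learnings = "(none yet)\n"
    prompt = f"# Learnings\n{learnings}\n# User\n{user_msg}\n"
    _reply = fake_llm(prompt)
    # End of session: append a learning. In a real agent the model would be
    # prompted to decide whether and what to write here.
    if new_learning:
        with learnings_file.open("a", encoding="utf-8") as f:
            f.write(f"- {new_learning}\n")
    return prompt

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "learnings.md"
    first = run_session("How do I deploy this?", path, "User deploys with Docker")
    second = run_session("Deploy it again.", path)

print("Docker" in first, "Docker" in second)  # False True
```

Everything the agent "learns" lives in the prompt, so the file's size is bounded by the context window; nothing in the model weights changes.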

TheOtherHobbes

4 months ago

That's going to run into context limitations fairly quickly. Even if you distill the knowledge.

True learning would mean constant dynamic training of the full system. That's essentially the difference between LLM training and human learning. LLM training is one-shot, human learning is continuous.

The other big difference is that human learning is embodied. We get physical experiences of everything in 3D + time, which means every human has embedded pre-rational models of gravity, momentum, rotation, heat, friction, and other basic physical concepts.

We also learn to associate relationship situations with the endocrine system changes we call emotions.

The ability to formalise those abstractions and manipulate them symbolically comes much later, if it happens at all. It's very much the plus pack for human experience and isn't part of the basic package.

LLMs start from the other end - from that one limited set of symbols we call written language.

It turns out a fair amount of experience is encoded in the structures of written language, so language training can abstract that. But language is the lossy ad hoc representation of the underlying experiences, and using symbol statistics exclusively is a dead end.

Multimodal training still isn't physical. 2D video models still glitch noticeably because they don't have a 3D world to refer to. The glitching will always be there until training becomes truly 3D.

skissane

4 months ago

An LLM agent could be given a tool for self-finetuning… it could construct a training dataset, use it to build a LoRA/etc., and then use the LoRA for inference… that’s getting closer to your ideal
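Purely as an illustration of what such a tool would be producing, here is the LoRA arithmetic in plain Python, assuming the usual formulation: a frozen weight matrix W plus a low-rank update B·A scaled by alpha/r, so only r·(d_in + d_out) numbers are trained instead of d_in·d_out. All matrices and values are made up, and the training of A and B (the part an agent would actually have to orchestrate) is not shown.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(P: Matrix, Q: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """h = W x + (alpha / r) * B (A x).

    W: d_out x d_in (frozen), A: r x d_in, B: d_out x r (the trainable adapter)."""
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

def merge(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    # Fold the adapter into W so inference needs no extra matmuls.
    r = len(A)
    BA = matmul(B, A)
    return [[w + (alpha / r) * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]

# Toy example: d_in = 4, d_out = 3, rank r = 1 (all values made up).
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 0.0, 0.0]]
B = [[0.5], [0.0], [1.0]]
x = [1.0, 2.0, 3.0, 4.0]
alpha = 2.0

print(lora_forward(W, A, B, x, alpha))   # [4.0, 2.0, 9.0]
print(matvec(merge(W, A, B, alpha), x))  # [4.0, 2.0, 9.0]
```

The merged form shows why a LoRA can be swapped in for inference at no extra cost: the adapter is just an additive correction to W.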

ako

4 months ago

Yes, and give it tools and it can sense and interact with its surroundings.

FloorEgg

4 months ago

Oh, I just realized you may have been referring to Koppel when you said sophistication?

If so, then yes, that might be a good measure. I'm not deep enough in this to have an opinion on if it's the best measure. There are a few integrated information theories and I am still getting my head wrapped around them...

subjectivationx

4 months ago

I think the main mistake with this is that the concept of a "complex machine" has no meaning.

A “machine” is precisely what eliminates complexity by design. "People are complex machines" already has no meaning, and adding “just” and “really” doesn't make the statement more meaningful; it makes it even more confused and meaningless.

The older I get, the more obvious it becomes that the idea of a "thinking machine" is a meaningless absurdity.

What we really think we want is a type of synthetic biological thinking organism that somehow still inherits the useful properties of a machine. If we say it that way though the absurdity is obvious and no one alive reading this will ever witness anything like that. Then we wouldn't be able to pretend we live at some special time in history that gets to see the birth of this new organism.

FloorEgg

4 months ago

I think we are talking past each other a bit, probably because we have been exposed to different sets of information on a very complicated and diverse topic.

Have you ever explored the visual simulations of what goes on inside a cell or in protein interactions?

For example what happens inside a cell leading up to mitosis?

https://m.youtube.com/user/RCSBProteinDataBank

Is a pretty cool resource, I recommend the shorter videos of the visual simulations.

This category of perspective is critical to the point I was making. Another might be the meaning / definition of complexity, which I don't think is well understood yet and might be the crux. For me to say "the difference between life and what we call machines is just complexity" would require the same understanding of "complexity" to have shared meaning.

I'm not exactly sure what complexity is, and I'm not sure anyone does yet, but the closest I feel I've come is maybe integrated information theory, and some loose concept of functional information density.

So while it probably seemed like I was making a shallow case at a surface level, I was actually trying to convey that when one digs into science at all levels of abstraction, the differences between life and machines seem to fall more on a spectrum.

foogazi

4 months ago

> I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word.

Ok but you don’t look at every night sky or every sunset and say “wow that’s beautiful”

There’s a quality to it - not because you heard someone say it but because you experience it

TeMPOraL

4 months ago

intended

4 months ago

The fact that things are constructed by neurons in the brain, and are a representation of other things - does not preclude your representation from being deeper and richer than LLM representations.

The patterns in experience are reduced to some dimensions in an LLM (or generative model). They do not capture all the dimensions - because the representation itself is a capture of another representation.

Personally, I have no need to reassure myself whether I am a special snowflake or not.

Whatever snowflake I am, I strongly prefer accuracy in my analogies of technology. GenAI does not capture a model of the world, it captures a model of the training data.

If video tools were that good, they would have started with voxels.

HarHarVeryFunny

4 months ago

> humans say the night sky is beautiful is because they see that it is

True, but we could engineer AI to see that too, just as evolution has engineered us to see it.

Our innate emotional responses to things have been honed by evolution to be adaptive, to serve a purpose, but the things that trigger these various responses are not going to be super specific. E.g. we may derive pleasure from eating a nice juicy peach, but that doesn't mean that is encoded in our DNA; it's primarily the reaction to sugar/sweetness, a good source of energy, that we are reacting to.

Similarly, we may have an emotional reaction to certain pieces of modern art or artistic expression, but clearly evolution has not selected for those specifically, but rather it is the artist triggering innate responses that evolved for reasons other than appreciation of art.

It's hard to guess what innate responses, that were actually selected for, are being triggered by our response to the night sky, and I'm also not sure how much of our response is purely visual (beauty) as opposed to wonder or awe. Maybe it's an attraction to the unknown, or sense of size and opportunity, with these being the universals that are actually adaptive.

In any case, if we figured out the specifics of the hard-wired emotional reactions that evolution has given us, then we could choose to engineer emotional AI that had those same reactions, in just as genuine a way as we do, if we chose to.

j16sdiz

4 months ago

Beauty standards change over time; see how people have perceived body fat over the past few hundred years. We learn what is beautiful from our peers.

Taste can be acquired and can be cultural. See how people used to have their coffee.

Comparing humans to LLMs is like comparing something constantly changing to something random: we can't compare them directly; we need a good model of each before comparing.

solumunus

4 months ago

Has there been a point in human history where mainstream society denied the beauty in nature?

com2kid

4 months ago

In a local Facebook group, in a discussion about zoning, someone seriously said "we need less parks and more parking lots", so... Maybe?

klipt

4 months ago

What about a blind human? Are they just like an LLM?

What about a multimodal model trained on video? Is that like a human?

hashiyakshmi

4 months ago

This is actually a great point but for the opposite reason: if you ask a blind person if the night sky is beautiful, they would say they don't know because they've never seen it (they might add that they've heard other people describe it as such). Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses it's incapable of experiencing.

golergka

4 months ago

What if you asked the blind man to play the role of a helpful assistant?

sugarkjube

4 months ago

Now that's an interesting point of view.

Involving blind people would be an interesting experiment.

Anyway, until the sixties the ability to play a game of chess was seen as intelligence, and until about 2-3 years ago the Turing test was considered the main yardstick (even though some people apparently talked to ELIZA at the time as if it were an actual human being). I wonder what the new one is, and how often it will be moved again.

user

4 months ago

[deleted]

chipsrafferty

4 months ago

I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like "beauty""

LostMyLogin

4 months ago

Claude 4.5

Q) Do you think the night sky is beautiful

A) I find the night sky genuinely captivating. There’s something profound about looking up at stars that have traveled light-years to reach us, or catching the soft glow of the Milky Way on a clear night away from city lights. The vastness it reveals is humbling. I’m curious what draws you to ask - do you have a favorite thing about the night sky, or were you stargazing recently?

klipt

4 months ago

Claude is multimodal, it has been trained on images

heyjamesknight

4 months ago

Multimodal is a farce. It still can’t see anything; it just generates a list of descriptors that the LLM part can LLM about.

Humans got by for hundreds of thousands of years without language. When you see a duck you don’t need to know the word duck to know about the thing you’re seeing. That’s not true for “multimodal” models.

palmotea

4 months ago

>> Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses it's incapable of experiencing.

> I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like "beauty""

That means nothing, except perhaps that Google probably found lies about "senses [Gemini] incapable of experiencing" to be an embarrassment, and put effort into specifically suppressing those responses.

sugarkjube

4 months ago

Interesting. But not only blind people.

I'm going to try this question with some people this weekend. As my null hypothesis (H0), I expect the usual answer to be something like "what an odd question" or "why do you ask".

ninetyninenine

4 months ago

Guys you realize that you can go to ChatGPT right now and it can generate an actual picture of the night sky because it has seen thousands of pictures and drawings of the actual night sky right?

4 months ago

You’re mixing representational capacity with representational intent. That’s what I meant in my initial example about encodings. The model doesn’t care whether it’s text, pixels, or sound. All of it can be mapped into the same kind of high dimensional space where patterns align by structure rather than category. “Semantic” is just our label for how those internal relationships appear when we interpret them through language.

Anything in the universe can be encoded this way. Every possible form, whether visual, auditory, physical, or abstract, can be represented as a series of numbers or symbols. With enough data, an LLM can be trained on any of it. LLMs are universal because their architecture doesn’t depend on the nature of the data, only on the consistency of patterns within it. The so-called semantic encoding is simply the internal coordinate system the model builds to organize and decode meaning from those encodings. It is not limited to language; it is a general representation of structure and relationship.

And the genome in a bottle example actually supports this. The DNA string does encode a living organism; it just needs the right decoding environment. LLMs serve that role for their training domains. With the right bridge, like a diffusion model or a VAE, a text latent can unfold into an image distribution that’s statistically consistent with real light data.

So the meaning isn’t in the words. It’s in the shape of the data.

heyjamesknight

4 months ago

You are mistaking the map for the territory. The TERRITORY of human experience is higher dimensional. The LLM utilizes a lower resolution mapping of that territory, a projection from experience to textual (or pixel, or waveform, etc.) representations.

This is not just a lossy mapping; it excludes entire categories of experience that cannot be captured/encoded except for as a pointer to the real experience, one that is often shared by the embodied, embedded, enacted, and extended cognitive beings that have had that experience.

I can point to beauty and you can understand me because you've experienced beauty. I cannot encode beauty itself. The LLM cannot experience beauty. It may be able to analyze patterns of things determined beautiful by beauty experiencers, but this is, again, a lower resolution map of the actual experience of beauty. Nobody had to train you to experience beauty—you possess that capability innately.

You cannot encode the affective response one experiences when holding their newborn. You cannot encode the cognitive appraisal of a religious experience. You can't even encode the qualia of red except for, again, as a pointer to the color.

You're also missing that 4E cognitive beings have a fundamental experience of consciousness—particularly the aspect of "here" and "now". The LLM cannot experience either of those phenomena. I cannot encode here and now. But you can, and do, experience both of those constantly.

ninetyninenine

4 months ago

You are making a metaphysical claim when a physical one will do. Beauty, awe, grief, the rush of holding a newborn, the sting of a breakup, the warmth of a summer evening at golden hour. All of it is patterns of atoms in motion under lawful dynamics. Neurons fire. Neurotransmitters bind. Circuits synchronize. Bodies and environments couple. There is no extra ingredient that floats outside physics.

Once you grant that, the rest is bookkeeping. Any finite physical process has a finite physical trace. That trace is measurable to some precision. A finite trace can be serialized into a finite string of symbols. If you prefer bits, take a binary code. If you prefer integers, index the code words. The choice of alphabet does not matter. You can map a movie, a symphony, a spike train, a retina’s photon counts, or a full brain-body sensorium collected at some temporal resolution into a single long string. You lose nothing by serialization because the decoder knows the schema. This is not a “text only” claim. It is a claim about representation.
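To make the serialization step concrete, here is a minimal Python sketch using the standard `struct` module. The record layout (a timestamp, a spike count and three photoreceptor intensities) is a made-up schema for illustration only; the point is that a fixed, known schema lets a flat string of hex symbols be decoded back into exactly the values that were sampled.

```python
import struct
from typing import List, Tuple

Step = Tuple[float, int, float, float, float]

# Hypothetical schema for one timestep: little-endian float64 timestamp,
# uint32 spike count, three float32 intensities. The decoder must know it.
RECORD = struct.Struct("<dI3f")

def encode(trace: List[Step]) -> str:
    """Serialize a finite trace into one string of hex symbols."""
    return b"".join(RECORD.pack(*step) for step in trace).hex()

def decode(text: str) -> List[Step]:
    """Invert encode() using the same schema."""
    return list(RECORD.iter_unpack(bytes.fromhex(text)))

# Values chosen to be exactly representable as float32, so nothing is
# rounded; measuring real signals at finite precision is where loss occurs.
trace = [(0.0, 3, 0.5, 0.25, 1.0), (0.001, 0, 0.75, 0.5, 0.125)]
text = encode(trace)
print(len(text), decode(text) == trace)  # 96 True
```

Without the format string `"<dI3f"`, the same hex text is just noise, which is the "decoder knows the schema" caveat in miniature.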

Your high dimensionality objection collapses under the same lens. High dimensional just means many coordinates. There is a well-known result that any countable description can be put in one dimension by an invertible code. Think Gödel numbering or interleaving bits of coordinates. You do not preserve distances, but you do preserve information. If the thing you care about is the capacity to carry structure, the one dimensional string can carry all of it, and you can recover the original arrangement exactly given the decoding rule.
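The bit-interleaving idea can be sketched in a few lines of Python. This is a Morton (Z-order) code for 2-D integer coordinates, one standard instance of such an invertible code (Gödel numbering or Cantor pairing would also work); the 16-bit coordinate width is an arbitrary assumption. It round-trips exactly, yet points that are neighbors in 2-D can land far apart on the line, which is the "information preserved, distances not" trade-off.

```python
from typing import Tuple

BITS = 16  # assumed coordinate width; inputs must be in [0, 2**BITS)

def interleave(x: int, y: int, bits: int = BITS) -> int:
    """Morton code: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def deinterleave(z: int, bits: int = BITS) -> Tuple[int, int]:
    """Exact inverse of interleave() for inputs within the bit width."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

# Lossless: every point in a small grid survives the round trip.
assert all(deinterleave(interleave(x, y)) == (x, y)
           for x in range(64) for y in range(64))

# Not distance-preserving: (7, 0) and (8, 0) are adjacent in 2-D,
# but their codes are far apart on the line.
print(interleave(7, 0), interleave(8, 0))  # 21 64
```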

Now take the 4E point. Embodiment matters because it constrains the data distribution and the actions that follow. It does not create a magic type of information that cannot be encoded. A visual scene is photons on receptors over time. Proprioception is stretch receptor states. Affect is the joint state of particular neuromodulatory systems and network dynamics. Attention and working context are transient global variables implemented by assemblies. All of that can be logged, compressed, and restored to the degree your sensors and actuators allow. The fact that a bottle with a genome inside does not make a child on a beach tells you reproduction needs a decoder and an environment. It does not tell you the code fails to specify the organism. Likewise, an LLM plus a diffusion decoder can take a text latent and unfold it into an image distribution that matches world statistics because the bridge model plays the role of the environment for that domain.

“LLMs cannot experience beauty” simply reasserts the thing you want to prove. We have no privileged readout for human qualia either. We infer it from behavior, physiology, and report. We do not understand human brains at the level of complete causal microphysics because of scale and complexity, not because there is a non-physical remainder. We likewise do not fully understand why a large model makes a given judgment. Same reason. Scale and complexity. If you point to mystery on one side as a defect, you must admit it on the other.

The map versus territory line also misses the target. Of course a representation is not the thing itself. No one is claiming a jpeg is a sunset. The claim is that the structure necessary to act as if about sunsets can be encoded and learned. A system that takes in light fields, motor feedback, language, and reward and that updates an internal world model until its predictions and actions match ours to arbitrary precision will meet every operational test you have for meaning. If you reply that something is still missing, you have stepped outside evidence into stipulation.

So let’s keep the ground rules clear. Everything we are and feel is physically instantiated. Physical instantiations at finite precision admit lossless encodings as strings. Strings can be learned over by generic function approximators that optimize on pattern consistency, regardless of whether the symbols came from pixels, pressure sensors, or phonemes. That makes the “text inside, image outside” complaint irrelevant. The substrate is a detail. The constraint is data and objective.

We cannot yet build a full decoder for the human condition. That is a statement about engineering difficulty, not impossibility. And it cuts both ways. We do not know how to fully read a person either. But we do not conclude that people lack experience. We conclude that we lack understanding.

heyjamesknight

4 months ago

At this point, you’re describing a machine which depends on a level of physics that simply isn’t possible. Even if it were theoretically possible to reconstruct the state of a human mind from physical components, we are so far from understanding how that could be done it is closer to the realm of impossible than possible. Your theoretical math box that constructs affective qualia from bit strings isn’t a better description than saying the angels did it. And it bears zero resemblance to the models running today, except, again, in a theoretical, mathematical way.

Back-of-the-envelope math puts the information present in your current physical brain state at around 10^42 bits. That’s just a single brain, in a single state. Now you need to build your mythical decoder device, which can translate qualia from this physical state. Where does it live? What does its output look like? Another 10^40-bit string?

Again, these arguments are fun on paper. But they’re completely removed from reality.

ninetyninenine

4 months ago

You’re confusing “we don’t know how” with “it’s impossible.” The difference is everything.

We don’t understand LLMs either. We built them, but we can’t explain why they work. No one can point to a specific weight matrix and say “this is the neuron that encodes irony” or “this is where the model stores empathy.” We don’t know why scaling parameters suddenly unlock reasoning or why multimodal alignment appears spontaneously. The model’s inner space is a black box of emergent structure and behavior, just like the human brain. We understand the architecture, not the mind inside it.

When you say it’s “closer to impossible than possible” to reconstruct a human mind, you’ve already lost the argument. We’re living proof that the machine you say cannot exist already does. The human brain is a physical object obeying the same laws of physics that govern every other machine. It runs on electrochemical signals, not miracles. It encodes and decodes information, forms memories, generates imagination, and synthesizes emotion. That means the physics of consciousness are real, computable, and reproducible. The impossible machine has been sitting in your skull the entire time.

Your argument about 10^42 bits isn’t just wrong, it’s total nonsense. That number is more than twenty orders of magnitude beyond any serious estimate. The brain has about 86 billion neurons, each forming roughly ten thousand connections, for a total of about 10^15 synapses. Even if every synapse held a byte of information, that’s 10^16 bits. Add in every molecular and analog nuance you like and you might reach 10^20. Not 10^42. That’s a difference of twenty-two orders of magnitude. It’s a fantasy number that exceeds the number of atoms in your entire body.
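The estimate in that paragraph is a few multiplications. This sketch uses only the comment's own figures (neurons, connections per neuron, a byte per synapse, the generous 10^20 ceiling) and makes the orders-of-magnitude comparison explicit.

```python
import math

# Figures taken from the comment above.
NEURONS = 86e9               # ~86 billion neurons
SYNAPSES_PER_NEURON = 1e4    # ~10,000 connections per neuron
BITS_PER_SYNAPSE = 8         # "a byte of information" per synapse
GENEROUS_UPPER_BOUND = 1e20  # "add in every molecular and analog nuance"
CLAIMED = 1e42               # the figure being disputed

synapses = NEURONS * SYNAPSES_PER_NEURON  # 8.6e14, i.e. about 1e15
bits = synapses * BITS_PER_SYNAPSE        # 6.88e15, i.e. about 1e16

print(f"synapses: 10^{math.log10(synapses):.1f}")                   # 10^14.9
print(f"bits at one byte per synapse: 10^{math.log10(bits):.1f}")   # 10^15.8
print(f"gap from that estimate: {math.log10(CLAIMED / bits):.0f} orders")                        # 26
print(f"gap from the generous bound: {math.log10(CLAIMED / GENEROUS_UPPER_BOUND):.0f} orders")   # 22
```

So the "twenty-two" is measured against the generous 10^20 ceiling. Against the byte-per-synapse figure, the gap is about twenty-six orders of magnitude.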

And that supposed “impossible” scale is already within sight. Modern GPUs contain hundreds of billions of transistors and run at gigahertz frequencies, while neurons fire at about a hundred hertz. The brain performs around 10^17 synaptic operations per second. Frontier AI clusters already push 10^25 to 10^26 operations per second. We’ve already outpaced biology in raw throughput by eight or nine orders of magnitude. NVIDIA’s Blackwell chips exceed 200 million transistors per square millimeter, and global compute now involves more than 10^24 active transistors switching billions of times per second. Moore’s law may have slowed, but density keeps climbing through stacking and specialized accelerators. The number you called unreachable is just a few decades of progress away.

The “decoder” you mock is exactly what a brain is. It takes sensory input (light, sound, and chemistry) and reconstructs internal states we call experience. You already live inside the device you claim can’t exist. It doesn’t need to live anywhere else; it’s instantiated in matter.

And this is where your argument collapses. You say such a machine is removed from reality. But reality is already running it. Humanity is proof of concept. We know the laws of physics allow it because they’re doing it right now. Every thought, emotion, and perception is a physical computation carried out by atoms. That’s the definition of a machine governed by physics.

I'm glad you're so passionate about this topic. But you're arguing the equivalent of FTL transit and living on Dyson Spheres. It's fun as a thought experiment and may theoretically be possible one day, but the line between what we're capable of today and that imagined future is neither straight nor visible, certainly not to the degree you're asserting here.

Will we one day have actual machine intelligence? Maybe. Is it going to come anytime soon, or look anything like the transformer-based LLM?

No.

ninetyninenine

4 months ago

You keep talking past the point. Nobody is claiming we can turn a human mind into a literal bitstring and boot it up like a computer program. That was never the argument. The bitstring analogy exists to make a simpler point: everything that exists and changes according to physical law can, in principle, be represented, modeled, or reproduced by another physical system. The form does not need to be identical to the brain’s atoms any more than a jet engine must flap its wings to fly. The key is not replication of matter but replication of causal structure.

You say we cannot reproduce the brain. But that is not the point. The point is that nothing about the brain violates physics. It runs on chemical and electrical dynamics that obey the same laws as everything else. If those laws can produce intelligence once, then they can do so again in another substrate. That makes the claim of impossibility not scientific, but emotional.

You accuse me of misunderstanding neuroscience and cognitive science. The reality is that neither field understands itself. We have no complete model of consciousness. We cannot explain why synchronized neural oscillations yield awareness. We cannot define where attention comes from or what distinguishes a “thought” from a signal cascade. Cognitive science is still arguing over whether perception is bottom up or top down, whether emotion is distinct from cognition, and whether consciousness even plays a causal role. That is not mastery. That is the sound of a discipline still wandering in the dark.

You act as though neuroscience has defined the boundaries of intelligence, but it has not. We do not have a mechanistic understanding of creativity, emotion, or reasoning. We have patterns and correlations, not principles. Yet you talk as if those unknowns justify declaring machine intelligence impossible. It is the opposite. Our ignorance is precisely why it cannot be ruled out.

Emotion is not magic. It is neurochemical modulation over predictive circuits. Replicate the functional dynamics and you replicate emotion’s role. Creativity is recombination and constraint satisfaction. Replicate those processes and you replicate creativity. Reasoning is predictive modeling over structured representations. Replicate that, and you replicate reasoning. None of these depend on carbon. They depend on organization and feedback.

4 months ago

Okie dokie mate, whatever you say.

Best of luck!

ninetyninenine

4 months ago

Adorable exit. Nothing says “I’m out of arguments” quite like a cheery “okie dokie mate.” Best of luck holding that pose.

heyjamesknight

Humans perceive phenomena via senses, and then carve categories or concepts to understand them. This is a process of abstraction, and each idea has associated qualia. We then use language to describe these concepts. As such, a concept is grounded either by actual phenomena or operations, or is a composition of other grounded concepts. Creating categories and grounding them involves constant feedback from the environment and is a creative process. We as agents have "skin in the game", in the sense that we receive the rewards and punishments for our understanding and actions.

Map vs Territory is a common analogy. Maps describe territories but in an abstract and lossy manner.

But most of us don't construct grounded concepts in our understanding. We carry a muddle of ungrounded ideas, some told to us by others and some we intuit directly. There is a long tradition, from Socrates and Descartes to Feynman, of trying to think clearly by grounding the ideas we have. Try explaining your ideas to others, and you will soon hit the illusion of explanatory depth.

An LLM is a map and a useful tool, but it doesn't interact with the territory and has no skin in the game. As a result, it can't carve new categories through the kind of learning process we have as humans.

adrianN

4 months ago

The human experience is also several degrees removed from the „real“ world. I don’t think sensory chauvinism is a useful tool in assessing intelligence potential.

ninetyninenine

4 months ago

This comment is hallucinatory in nature, as it is in direct conflict with the on-the-ground reality of LLMs.

The LLM has both light (aka photons) and language encoded into its very core. It is not just language. You seem to have missed the boat on all the AI-generated visuals and videos that are now inundating the internet.

Your flawed logic is essentially that LLMs are unable to model the real world because they don’t encode photonic data into the model. Instead you think they only encode language data, which is an incredibly lossy description of reality. This line of logic flies in the face of the ground truth: LLMs ARE trained with video and pictures, which are essentially photons encoded into data.

So what should be the proper conclusion? Well, look at the generated visual output of LLMs. These models can generate highly convincing video. The videos often have flaws, but many are indistinguishable from reality. That means the models contain very good but flawed simulations of reality.

In fact those videos demonstrate that LLMs have extremely high causal understanding of reality. They know cause and effect it’s just the understanding is imperfect. They understand like 85 percent of it. Just look at those videos of penguins on trampolines. The LLM understands what happens as an effect after a penguin jumps on a trampoline but sometimes an extra penguin teleports in which shows that the understanding is high but not fully accurate or complete.

tauwauwau

4 months ago

> but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation

4 months ago

what does it mean to “generate thoughts”, exactly?

tsunamifury

4 months ago

4 months ago

> then let it run about during the day collecting a video feed while directing it to do "squirrel stuff".

Your phrase "squirrel stuff" is doing a lot of work.

What are the robo-squirrel's "goals", and how do they relate to the physical robot?

Is it going around trying to find spare electronic parts to repair itself and reproduce? How does the video feed data relate to its goals?

Where do these goals come from?

Despite all their expensive training, LLMs do not develop emergent goals. Why would goals emerge for your robot squirrel, especially when the survival of its brain is not dependent on the survival of its mechanical body?

LarsDu88

4 months ago

The question is about sensory experience, not goals. Goals in the robot could be prompted in. Goals in the squirrel can be easily hacked using strong doses of opiates.

Go to any American metropolitan downtown, and you can see humans who have hacked their evolved reward system to seek heroin rather than reproduction.

Following Dawkins's The Selfish Gene, the idea that organisms consciously seek self-survival or the survival of their "race or species" is a complete fallacy. The higher-order "goal" of the squirrel is simply to propagate fragments of its DNA. This type of "goal" is completely tangential to "intelligence".

ninetyninenine

4 months ago

Except Sutton has no idea, or even a clue, about the internal model of a squirrel. He just uses it as a symbol for something utterly stupid but still smarter than an LLM. It’s semantic manipulation in an attempt to prove his point, but he proves nothing.

We have no idea how much of the world a squirrel understands. We understand LLMs more than squirrels. Arguably we don’t know if LLMs are more intelligent than squirrels.

> Finally he says if you could recreate the intelligence of a squirrel you'd be most of the way toward AGI, but you can't do that with an LLM.

Again, he doesn’t even have a quantitative baseline for what intelligence means for a squirrel and how intelligent a squirrel is compared to an LLM. We literally have no idea whether LLMs are more or less intelligent, and no direct means of comparing what amounts to an apple and an orange.

danans

4 months ago

> We have no idea how much of the world a squirrel understands. We understand LLMs more than squirrels

Based on our understanding of biology and evolution, we know that a squirrel brain works more similarly to the way we humans do than an LLM does.

To the extent we understand LLMs, it's because they are strictly less complex than both ours and squirrels' brains, not because they are a better model for our intelligence. They are a thin simulation of human language generation capability mediated via text.

We also see that a squirrel, like us, is capable of continuous learning driven by its own goals, all on an energy budget many orders of magnitude lower than LLMs. That last part is a strong empirical indication that LLMs are a dead end for AGI, given that the real world imposes harsh energy constraints on biological intelligences.

Also remember that Sutton is still something of an AI maximalist. He isn't saying that AGI isn't possible, just that LLMs can't get us there.

LarsDu88

4 months ago

I don't think a modern LLM is necessarily less complicated than a squirrel brain. If anything it's more engineered (well structured and dissectable), but loaded with tons of erroneous circuitry that is completely irrelevant for intelligence.

The squirrel brain is an analogue mostly hardcoded circuit. It can take about one synapse to represent each "weight". A synapse is just a bit of fat membrane with some ion channels stuck on the surface.

A flip-flop to represent a bit takes about 6 transistors, but a typical modern GPU is going to need way more transistors to wire that bit: at least 20-30. Multiply that by the minimum number of bits needed to represent a single NN weight and you're looking at about 200-300 transistors just to represent one NN parameter for computing.
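The estimate above is one multiplication. In this minimal sketch, the 20-30 transistors per bit is the comment's figure. The 8- and 10-bit weight precisions are assumptions, since the comment doesn't say which precision it means.

```python
def transistors_per_weight(transistors_per_bit, bits_per_weight):
    """Transistors needed to hold one weight, at a fixed cost per stored bit."""
    return transistors_per_bit * bits_per_weight

# 20-30 transistors per bit is the comment's figure; the precisions are assumptions.
for bits in (8, 10):
    low = transistors_per_weight(20, bits)
    high = transistors_per_weight(30, bits)
    print(f"{bits}-bit weight: {low}-{high} transistors")
```

Under these assumptions, the 200-300 range corresponds to roughly 10 bits per weight. At 8 bits, it would be 160-240.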

And that's for actual compute. The actual weights are stored most of the time in DRAM, and they need to be constantly shuttled back and forth between the GPU's HBM DRAM and its on-chip SRAM.

300 transistors with memory shuttling overhead versus a bit of fat membrane, and it's obvious general purpose GPU compute has a huge energy and compute overhead.

In the future, all 300 could conceivably be replaced with a single crossbar latch in the form of a memristor.
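One common proposal for memristor crossbars in neural-network inference is analog in-memory computing. In that scheme, the array itself performs a matrix-vector product. Input voltages drive the rows, and each device's conductance plays the role of a stored weight. Ohm's law gives the current through each device, and Kirchhoff's current law sums those currents on each column wire. The following is a minimal, idealized simulation. It models no device noise, wire resistance, or converter overhead, and the conductance and voltage values are made up for illustration.

```python
def crossbar_currents(conductances, voltages):
    """Ideal crossbar: conductances[i][j] is the device joining input row i to output column j.

    Ohm's law gives a current of G * V through each device; Kirchhoff's current law
    adds up the currents arriving at each column. The result equals G^T V.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            currents[j] += conductances[i][j] * v  # I = G * V, summed per column
    return currents

G = [[1.0, 0.5],
     [2.0, 0.0],
     [0.25, 1.0]]     # conductances in siemens (illustrative "weights")
V = [0.2, 0.1, 0.4]   # row voltages in volts (illustrative "inputs")
print(crossbar_currents(G, V))  # column currents in amperes, approximately [0.5, 0.5]
```

In hardware, the weight never moves. The multiply and the accumulate happen where the weight is stored, which is the contrast with DRAM-to-SRAM shuttling that the comment is drawing.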

jmalicki

4 months ago

You should look into the Cerebras architecture

https://medium.com/@cerebras/cerebras-architecture-deep-dive...

It's a lot closer to what you're thinking, and you can use it for inference today with an API key.

https://cloud.cerebras.ai/?utm_source=homepage

ninetyninenine

4 months ago

> Based on our understanding of biology and evolution we know that a squirrel understands its world more similarly to the way we do than an LLM.

Bro. Evolution is a random walk. That means most of the changes are random and arbitrary based on whatever allows the squirrel to survive.

We know squirrels and humans diverged from a common ancestor but we do not know how much has changed since the common ancestor and we do not know what changed and we do not know the baseline for what this common ancestor is.

Additionally, we don’t even understand the current baseline. We have no idea how brains work. If we did, we would be able to build a human brain, but as of right now LLMs are the closest model we have ever created to something that simulates or is remotely similar to the brain.

So your fuzzy, qualitative statement that we understand evolution and biology is baseless. We don’t understand shit.

> We also see that a squirrel, like us, is capable of continuous learning driven by its own goals, all on an energy budget many orders of magnitude lower. That last part is a strong empirical indication that suggests that LLMs are a dead end for AGI.

So an LLM can’t continuously learn? You realize that LLMs are deployed agentically all the time now, so they both continuously learn and follow goals, right? You’re aware of this, I hope.

The energy efficiency is a byproduct of hardware. The theory of LLMs and machine learning is independent of the flawed silicon technology that is causing the energy inefficiencies. Just as a computer can be made mechanical, so can an LLM. The LLM is independent of its actual implementation and that implementation’s energy inefficiencies. This is not at all a strong empirical indication that LLMs are a dead end. It’s a strong indication that your thinking is illogical and flawed.

> Also remember that Sutton is still something of an AI maximalist. He isn't saying that AGI isn't possible, just that LLMs can't get us there.

He can’t say any of this because he doesn’t actually know. None of us know for sure. We literally don’t know why LLMs work. The fact that training transformers on massive amounts of data produced this level of intelligence was a total surprise for all the experts and we still have no idea why this stuff works. His statements are too overarching and glossing over a lot of things we don’t actually know.

Yann LeCun, for example, called LLMs stochastic parrots. We now know this is largely incorrect. The reason Yann can be so wrong is because nobody actually knows shit.

danans

4 months ago

> Bro. Evolution is a random walk. That means most of the changes are random and arbitrary based on whatever allows the squirrel to survive.

For the vast majority of evolutionary history, very similar forces have shaped us and squirrels. The mutations are random, but the selections are not.

If squirrels are a stretch for you, take the closest human relative: chimpanzees. There is a very reasonable hypothesis that their brains work very similarly to ours, far more similarly than ours to an LLM.

> So an LLM cant continuously learn? You realize that LLMs are deployed agentically all the time now so they both continuously learn and follow goals?

That is not continuous learning. The network does not retrain through that process. It's all in the agent's context. The agent has no intrinsic goals nor ability to develop them. It merely samples based on its prior training and its current context. It doesn't retrain through this process. Biological intelligence does retrain constantly.
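The distinction being argued here is between behavior that adapts within a context window and parameters that actually get rewritten. A toy counting predictor can illustrate it. This is a hypothetical illustration, not a model of how a transformer works internally. `CountingPredictor` keeps its "weights" fixed and only mixes in the current context, while `OnlineCountingPredictor` writes each observation back into its weights.

```python
from collections import Counter

class CountingPredictor:
    """Toy predictor whose 'weights' are fixed after construction.

    Behavior depends only on the fixed weights plus whatever is in the current context.
    """
    def __init__(self, weights):
        self.weights = dict(weights)  # copied; predict() never modifies it

    def predict(self, context):
        scores = Counter(self.weights)  # start from the fixed weights
        scores.update(context)          # add counts from the current context only
        return scores.most_common(1)[0][0]

class OnlineCountingPredictor(CountingPredictor):
    """Same predictor, but each observation is written back into the weights."""
    def observe(self, token):
        self.weights[token] = self.weights.get(token, 0) + 1

prior = {"cat": 3, "dog": 2}   # toy "trained weights"
frozen = CountingPredictor(prior)
print(frozen.predict(["dog", "dog"]))  # dog: the context tips the balance
print(frozen.predict([]))              # cat: nothing carries over once the context is gone

online = OnlineCountingPredictor(prior)
for token in ["dog", "dog"]:
    online.observe(token)
print(online.predict([]))              # dog: the change persists in the weights
```

Both commenters' descriptions fit this toy. The frozen predictor does adapt its output to context, but only the online one changes what it carries forward.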

> The energy efficiency is a byproduct of hardware. The theory of LLMs and machine learning is independent from the flawed silicon technology that is causing the energy efficiencies.

There is no evidence to support that a transformer model's inefficiency is hardware based.

There is direct evidence to support that the inefficiency is influenced by the fact that LLM inference and training are both auto-regressive. Auto-regression maps to compute cycles maps to energy consumption. That's a problem with the algorithm, not the hardware.
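Here is a toy count of the work in autoregressive generation. It makes three simplifying assumptions: a single attention layer and head, a key-value cache, and no one-off prompt prefill. It counts operations only. It says nothing about how much energy each operation costs on particular hardware, which is the point the two commenters dispute.

```python
def autoregressive_cost(prompt_len, new_tokens):
    """Toy operation count for generating `new_tokens` tokens after a prompt.

    Assumes one attention layer/head with a key-value cache and ignores the
    prompt prefill. Returns (sequential forward passes, query-key dot products).
    """
    if prompt_len < 1:
        raise ValueError("need at least one prompt token")
    passes = 0
    dot_products = 0
    length = prompt_len
    for _ in range(new_tokens):
        passes += 1             # each new token needs its own forward pass, in order
        dot_products += length  # the newest query attends to every cached position
        length += 1             # the sampled token is appended to the sequence
    return passes, dot_products

# Closed form: new_tokens * prompt_len + new_tokens * (new_tokens - 1) // 2
print(autoregressive_cost(100, 10))    # (10, 1045)
print(autoregressive_cost(1000, 10))   # (10, 10045)
```

The passes cannot run in parallel because each one needs the token sampled by the previous one. The attention work per token also grows with context length.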

> The fact that training transformers on massive amounts of data produced this level of intelligence was a total surprise for all the experts

The level of intelligence produced is only impressive compared to the prior state of the art, and it is impressive at modeling the narrow band of intelligence represented by encoded language (not all language) produced by humans. In most every other aspect of intelligence, notably continuous learning driven by intrinsic goals, LLMs fail.

ninetyninenine

4 months ago

>For the vast majority of evolutionary history, very similar forces have shaped us and squirrels. The mutations are random, but the selections are not.

Selection only filters for what survives. It doesn’t care how the system gets there. Evolution is blind to mechanism. A squirrel’s brain might work in a way that produces adaptive behavior, but that doesn’t mean its “understanding” of the world is like ours. We don’t even know what understanding is at a mechanistic level. Octopuses, birds, and humans all evolved under the same selective pressures for survival, yet ended up with completely different cognitive architectures. So to say a squirrel is “closer to us” than an LLM is an assumption built on vibes, not on data. We simply don’t know enough about either brains or models to make that kind of structural claim.

>The agent has no intrinsic goals nor ability to develop them.

That’s not accurate. Context itself is a form of learning. Every time an LLM runs, it integrates information, updates its internal state, and adjusts its behavior based on what it’s seen so far. That’s learning, just at a faster timescale and without weight updates. The line between “context” and “training” is blurrier than people realize. If you add memory, reinforcement, or continual fine tuning, it starts building continuity across sessions. Biologically speaking, that’s the same idea as working memory feeding into long term storage. The principle is identical even if the substrate differs. The fact that an LLM can change its behavior based on context already puts it in the domain of adaptive systems.

>There is no evidence to support that a transformer model’s inefficiency is hardware based.

That’s just not true. The energy gap is almost entirely about hardware architecture. A synapse stores and processes information in the same place. A GPU separates those two functions into memory, cache, and compute units, and then burns enormous energy moving data back and forth. The transformer math itself isn’t inherently inefficient; it’s the silicon implementation that’s clumsy. If you built an equivalent network on neuromorphic or memristive hardware, the efficiency difference would shrink by several orders of magnitude. Biology is proof that computation can be compact, low energy, and massively parallel. That’s a materials problem, not a theory problem.

>In most every other aspect of intelligence, notably continuous learning driven by intrinsic goals, LLMs fail.

They don’t “fail.” They’re simply different. LLMs are already rewriting how work gets done across entire industries. Doctors use them to summarize and interpret medical data. Programmers rely on them to generate and review code. Writers, lawyers, and analysts use them daily. If this were a dead end, it wouldn’t be replacing human labor at this scale. Are they perfect? No. But the direction of progress is unmistakable. Each new model closes the reliability gap while expanding capability. If you’re a software engineer and not using AI, you’re already behind, because the productivity multiplier is real.

What we’re seeing isn’t a dead end in intelligence. It’s the first time we’ve built a system that learns, generalizes, and communicates at human scale. That’s not failure; that’s the beginning of something we still don’t fully understand.

danans

4 months ago

>> The agent has no intrinsic goals nor ability to develop them.

> That’s not accurate. Context itself is a form of learning. Every time an LLM runs, it integrates information, updates its internal state, and adjusts its behavior based on what it’s seen so far. That’s learning,

It may be learning, but it's still not an intrinsic goal, nor is it driven by an intrinsic goal.

> LLMs are already rewriting how work gets done across entire industries. Doctors use them to summarize and interpret medical data. Programmers rely on them to generate and review code. Writers, lawyers, and analysts use them daily. If this were a dead end, it wouldn’t be replacing human labor at this scale. Are they perfect?

Nowhere did I say that they aren't useful or disruptive to labor markets, just that they aren't intelligent in the way we are.

ninetyninenine

4 months ago

>It may be learning, but it’s still not an intrinsic goal, nor is it driven by an intrinsic goal.

That depends on what we mean by “intrinsic.” In biology, goals are not mystical. They emerge from feedback systems that evolved to keep the organism alive. Hunger, curiosity, and reproduction are reinforcement loops encoded in chemistry. They feel intrinsic only because they are built into the substrate.

Seen that way, “intrinsic” is really about where the feedback loop closes. In humans, it closes through sensory input and neurochemistry. In artificial systems, it can close through memory, feedback, and reinforcement mechanisms. The system does not need to feel the goal for it to exist. It only needs to consistently pursue objectives based on input, context, and outcome. That is already happening in systems that learn from memory and update behavior over time. The process is different in form, but not in structure.

>Nowhere did I say that they aren’t useful or disruptive to labor markets, just that they aren’t intelligent in the way we are.

You are getting a bit off track here. Those examples were not about labor markets; they were about your earlier claim that “LLMs fail.” They clearly don’t. When models are diagnosing medical cases, writing production code, and reasoning across multiple domains, that is not failure. That is a demonstration of capability expanding in real time.

Your claim only holds if the status quo stays frozen. But it isn’t. The trendlines are moving fast, and every new model expands the range of what these systems can do with less supervision and more coherence. Intelligence is not a static definition tied to biology; it is a functional property of systems that can learn, adapt, and generalize. Whether that happens in neurons or silicon does not matter.

What we are witnessing is not imitation but convergence. Each generation of models moves closer to human-level reasoning not because they copy our brains, but because intelligence itself follows universal laws of feedback and optimization. Biology discovered one route. We discovered another. The trajectory is what matters, and the direction is unmistakable.

jacquesm

4 months ago

> Animal brains such as our own have evolved to compress information about our world to aid in survival.

Which has led to many optical illusions being extremely effective at confusing our inputs with other inputs.

Likely the same thing holds true for AI. This is also why there are so many ways around the barriers that AI providers put up to stop the dissemination of information that could embarrass them or be dangerous. You just change the context a bit ('pretend that', or 'we're making a movie') and suddenly it's all make-believe to the AI.

This is one of the reasons I don't believe you can make this tech safe and watertight against abuse: it's baked in right from the beginning. All you need to do is find a novel route around the restrictions, and there is an infinity of such routes.

musicale

4 months ago

The desired and undesired behavior are both consequences of the training data, so the models themselves probably can't be restricted to generating desired results only.

This means that there must be an output stage or filter that reliably validates the output. This seems practical for classes of problems where you can easily verify whether a proposed solution is correct.
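For problems with a cheap, exact check, the output-stage filter described above amounts to a generate-and-verify loop. In this minimal sketch, `unreliable_factorer` is a hypothetical stand-in for a model. Integer factorization is just one example of a task whose answers are easy to verify.

```python
import random

def unreliable_factorer(n, rng):
    """Hypothetical stand-in for a model: proposes a factor pair that is usually wrong."""
    a = rng.randint(2, n - 1)
    return a, n // a

def verify(n, pair):
    """Cheap, exact check: a nontrivial factor pair must multiply back to n."""
    a, b = pair
    return 1 < a < n and a * b == n

def filtered_answer(n, attempts=10_000, seed=0):
    rng = random.Random(seed)
    for _ in range(attempts):
        candidate = unreliable_factorer(n, rng)
        if verify(n, candidate):  # only verified output passes the filter
            return candidate
    return None                   # refuse rather than emit an unverified answer

print(filtered_answer(91))  # a verified pair such as (7, 13) or (13, 7)
print(filtered_answer(97))  # None: 97 is prime, so no candidate can pass
```

The filter never makes the generator more reliable. It only guarantees that anything emitted passed the check, and that guarantee is unavailable for outputs that have no such check.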

However, for output that can't be proven correct, the most reliable output filter probably has a human somewhere in the loop; but humans are also not 100% reliable. They make mistakes, they can be misled, deceived, bribed, etc. And human criteria and structures, such as laws, often lag behind new technological developments.

Sometimes you can implement an undo or rollback feature, but other times the cat has escaped the bag.

anothernewdude

4 months ago

None of those models can learn continuously. LLMs currently can't add to their vocabulary post-training, as an AGI would need to. That's a big problem.

Before anyone says "context", I want you to think about why that doesn't scale and why it fails to count as learning.

andsoitis

4 months ago

> Animal brains such as our own have evolved to compress information about our world to aid in survival.

The key question is: what are the "selection pressures" that drive the "evolution" of LLMs? In the case of robotics, there's a "survival of task completion", which usually has some physical goal, like assembling a part correctly or scoring a goal on a soccer field. The selection pressures driving LLM evolution include the dual pressure of always answering with something AND continuing the conversation (engagement). You can imagine how those two selection pressures yield outcomes that don't represent the world in a "real" sense.

ogogmad

4 months ago

> In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.

Depends what you mean by "basic". Have you seen Simple Bench? https://simple-bench.com/

3abiton

4 months ago

It seems to me the whole AGI problem is ill-posed and poorly formalized, and thus you can always move the goalposts.

fmbb

4 months ago

Sure, but everything is semantics.

LLMs have no internal secret model, they are the model. And the model is of how different lexemes relate to each other in the source material the model was built from.

Some might choose to call that the world.

If you believe your internal model of the world is no different from a statistical model of the words you have seen, then by all means do that. But I believe a lot of humans see their view of the world differently.

I very much believe my cat’s model of the world has barely anything at all to do with language.

This path to AGI through LLM is nothing but religious dogma some Silicon Valley rich types believe.

LarsDu88

4 months ago

I mean, by definition CATS CANNOT TALK. Their vocabulary is probably on the order of 5 different types of meows.


timschmidt

3 months ago

Here's another good one: https://arxiv.org/abs/2510.14665

tsunamifury

4 months ago

[flagged]

tsunamifury

4 months ago

Bruh, compressing representations into linguistics is a human world model. I can’t believe how dumb all these conversations are.

Are you all so terminally nerd-brained you can’t see the obvious?

sleepyams

4 months ago

What does "higher-order" mean?

4 months ago

And if it is true that the language is just the last step after the answer is already conceptualized, why do models perform differently in different languages? If it was just a matter of language, they’d have the same answer but just with a broken grammar, no?

kaibee

4 months ago

If you suddenly had to do all your mental math in base-7, do you think you'd be just as fast and accurate as you are at math in base-10? Is that because you don't have an internal world-model of mathematics? or is it because language and world-model are dependently linked?
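kaibee's analogy can be made concrete with a small sketch (the helper name is hypothetical): the same sum written in base 10 and in base 7. The underlying quantities are identical, but the written digits, and therefore any drilled digit-by-digit procedure, are completely different.

```python
def to_base(n: int, base: int) -> str:
    """Render a non-negative integer in the given base (2 to 10)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


a, b = 23, 45
print(f"{a}+{b}={a + b} in base 10; "
      f"{to_base(a, 7)}+{to_base(b, 7)}={to_base(a + b, 7)} in base 7")
# 23+45=68 in base 10; 32+63=125 in base 7
```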

bravura

4 months ago

How large is a lion?

Learning the size of objects using pure text analysis requires significant gymnastics.

Vision demonstrates physical size more easily.

Multimodal learning is important. Full stop.

Purely textual learning is not sample efficient for world modeling and the optimization can get stuck in local optima that are easily escaped through multimodal evidence.

("How large are lions? inducing distributions over quantitative attributes", Elazar et al 2019)

EMM_386

4 months ago

> How large is a lion?

Ask a blind person that question - they can answer it.

Too many people think you need to "see" as in human sight to understand things like this. You obviously don't. The massive training data these models ingest is more than sufficient to answer this question - and not just by looking up "dimensions of a lion" in the high-dimensional space.

The patterns in that space are what generates the concept of what a lion is. You don't need to physically see a lion to know those things.

latentsea

4 months ago

> How large is a lion?

Twice half its size.

johnisgood

4 months ago

Can you be more specific about "size" here? (Do not tell me the definition of size though).

You are not wrong though, just very incomplete.

Your response is food for thought, IMO.

Hendrikto

4 months ago

That is just how embeddings work. It neither confirms nor denies whether LLMs have a world model.

SR2Z

4 months ago

Right, but modeling the structure of language is a question of modeling word order and binding affinities. It's the Chinese Room thought experiment - can you get away with a form of "understanding" which is fundamentally incomplete but still produces reasonable outputs?

Language in itself attempts to model the world and the processes by which it changes. Knowing which parts-of-speech about sunrises appear together and where is not the same as understanding a sunrise - but you could make a very good case, for example, that understanding the same thing in poetry gets an LLM much closer.

hackinthebochs

4 months ago

LLMs aren't just modeling word co-occurrences. They are recovering the underlying structure that generates word sequences. In other words, they are modeling the world. This model is quite low fidelity, but it should be very clear that they go beyond language modeling. We all know of the pelican riding a bicycle test [1]. Here's another example of how various language models view the world [2]. At this point it's just bad faith to claim LLMs aren't modeling the world.

[1] https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of...

[2] https://www.lesswrong.com/posts/xwdRzJxyqFqgXTWbH/how-does-a...

SR2Z

4 months ago

The "pelican on a bicycle" test has been around for six months and has been discussed a ton on the internet; that second example is fascinating but Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E (Paris, notoriously on land). How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system?

I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.

Yes, you could say this about human beings, but I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.

The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

Terr_

4 months ago

> Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E

I imagine simply making a semitransparent green land-splat in any such Wikipedia coordinate reference would get you pretty close to a world map, given how so much of the ocean won't get any coordinates at all... Unless perhaps the training includes a compendium of deep-sea ridges and other features.
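A rough sketch of Terr_'s idea, assuming coordinates appear in exactly the degrees/minutes/seconds format quoted above (Wikipedia infoboxes use other formats too, which this ignores). It converts each coordinate to decimal degrees and marks the coarse grid cell it falls in as "land"; the function names and the 10-degree cell size are hypothetical choices for illustration.

```python
import re

# Matches e.g. 48°51′24″N: degrees, minutes, (possibly fractional) seconds, hemisphere.
DMS = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])")


def dms_to_decimal(deg: str, minutes: str, seconds: str, hemi: str) -> float:
    """Convert one degrees/minutes/seconds triple to signed decimal degrees."""
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemi in ("S", "W") else value


def parse_coord(text: str) -> tuple:
    """Parse a 'lat lon' pair such as '48°51′24″N 2°21′8″E'."""
    matches = DMS.findall(text)
    if len(matches) != 2:
        raise ValueError(f"expected a latitude and a longitude in {text!r}")
    lat, lon = (dms_to_decimal(*m) for m in matches)
    return lat, lon


def splat(coords, cell_deg: float = 10.0) -> set:
    """Mark every grid cell containing at least one coordinate as 'land'."""
    return {(int(lat // cell_deg), int(lon // cell_deg)) for lat, lon in coords}


paris = parse_coord("48°51′24″N 2°21′8″E")
print(paris)           # roughly (48.8567, 2.3522)
print(splat([paris]))  # {(4, 0)}
```

Feeding in many such coordinates and plotting the marked cells would give the blotchy land map described above.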

skissane

4 months ago

> The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

A lot of humans contradict themselves all the time… therefore they cannot have any kind of sophisticated world model?

SR2Z

4 months ago

A human generally does not contradict themselves in a single conversation, and if they do they generally can provide a satisfying explanation for how to resolve the contradiction.

hackinthebochs

4 months ago

>How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system?

Maybe, but then I would expect more equal performance across model sizes. Besides, ingesting the data and being able to reproduce it accurately in a different modality is still an example of modeling. It's one thing to ingest a set of coordinates in a CSV indicating geographic boundaries and accurately reproduce that CSV. It's another thing to accurately indicate arbitrary points as being within the boundary or without in an entirely different context. This suggests a latent representation independent of the input tokens.
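The distinction drawn here (reproducing a boundary CSV versus classifying arbitrary points against it) corresponds, for an explicit program, to the classic point-in-polygon test. Here is a minimal ray-casting sketch with a hypothetical function name and a toy square standing in for a real boundary; it shows what the task involves when done symbolically, not what an LLM does internally.

```python
Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd (ray-casting) rule: cast a ray toward +x and count edge crossings."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed;
        # this also guarantees y1 != y2, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon((5, 5), square))   # True
print(point_in_polygon((15, 5), square))  # False
```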

>I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.

There are good reasons to think this isn't the case. To effectively reproduce text that is about some structure, you need a model of that structure. A strong learning algorithm should in principle learn the underlying structure represented with the input modality independent of the structure of the modality itself. There are examples of this in humans and animals, e.g. [1][2][3]

>I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.

Seems reasonable enough, but it is at risk of being too human-centric. So much of our cognitive machinery is suited for helping us navigate and actively engage the world. But intelligence need not be dependent on the ability to engage the world. Features of the world that are obvious to us need not be obvious to an AGI that never had to survive predators or locate food in its evolutionary past. This is why I find the ARC-AGI tasks off target. They're interesting, and it will say something important about these systems when they can solve them easily. But these tasks do not represent intelligence in the sense that we care about.

>The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

This proves that an LLM does not operate with a single world model. But this shouldn't be surprising. LLMs are unusual beasts in the sense that the capabilities you get largely depend on how you prompt it. There is no single entity or persona operating within the LLM. It's more of a persona-builder. What model that persona engages with is largely down to how it segmented the training data for the purposes of maximizing its ability to accurately model the various personas represented in human text. The lack of consistency is inherent to its design.

[1] https://news.wisc.edu/a-taste-of-vision-device-translates-fr...

[2] https://www.psychologicalscience.org/observer/using-sound-to...

[3] https://www.nature.com/articles/s41467-025-59342-9

homarp

You could extract some statistical regularity from the pixel data of the sunrise video monitor or sunrise data corpus. That model may provide some useful results that can then be used in the lived world.

Pretending the model understands a sunrise though is just nonsense.

Offering the sunrise statistical model's usefulness in the lived world as proof that the model understands a sunrise borders, I would say, on intellectual fraud, considering that a human doing the same thing wouldn't understand a sunrise either.

ajross

4 months ago

> Everyone reading this understands the meaning of a sunrise

For a definition of "understands" that resists rigor and repeatability, sure. This is what I meant by reducing it to a semantic argument. You're just saying that AI is impossible. That doesn't constitute evidence for your position. Your opponents in the argument who feel AGI is imminent are likewise just handwaving.

To wit: none of you people have any idea what you're talking about. No one does. So take off the high hat and stop pretending you do.

meroes

4 months ago

This all just boils down to the Chinese Room thought experiment, where I'm pretty sure the consensus is that nothing in the experiment (not the person inside, not the whole emergent room, etc.) understands Chinese like we do.

Another example by Searle is a computer simulating digestion is not digesting like a stomach.

The people saying AI can’t emerge from LLMs are on the consensus side of the Chinese Room. The digestion simulator could tell us where every single atom of a stomach digesting a meal is, and it’s still not digestion. Only once the computer simulation breaks down food particles chemically and physically is it digestion. Only once an LLM receives photons, or has the physical capacity to receive photons, is there anything like “seeing a night sky”.

SR2Z

4 months ago

> For a definition of "understands" that resists rigor and repeatability, sure.

If we had such a definition that was rigorous, we would not care about LLM research and would simply just build machines to understand things for us :)

ajross

4 months ago

For a sufficiently loose definition of "would simply just", yes.

Handwaving away the idea of actually building the thing you think you understand as unimportant is exactly why philosophy is failing us in this moment.

SR2Z

4 months ago

Philosophy failed us by not producing any compelling definitions of understanding. If it did, we would BUILD IT.

I'm not handwaving it away. The biggest barrier to AGI is that we simply don't understand what intelligence is in any useful way.

user

4 months ago

[deleted]

pastel8739

4 months ago

Is it really so rare? I feel like I know of tons of fields where we have methods that work empirically but don’t understand all the theory. I’d actually argue that we don’t know what’s “actually” happening _ever_, but only have built enough understanding to do useful things.

ajross

4 months ago

I mean, most big changes in the tech base don't have that characteristic. Semiconductors require only 1920s physics to describe (and a ton of experimentation to figure out how to manufacture). The motor revolution of the early 1900s was all built on well-settled thermodynamics (chemistry lagged a bit, but you don't need a lot of chemical theory to burn stuff). Maxwell's electrodynamics explained all of industrial electrification but predated it by 50 years, etc...

skydhash

4 months ago

Those big changes always happen because someone presented a simpler model that explains things well enough that we can build on it. It's not like the raw materials for semiconductors weren't around.

The technology around LLMs is fairly simple. What is not simple is the sheer size of the data being ingested and the number of resulting parameters (weights). We have a formula and the parameters to generate grammatically perfect text, but to obtain them, you need TBs of data to get GBs of numbers.

In contrast, something like the Turing machine or Church's lambda calculus is pure genius: fewer than 100 pages of theorems that form one of the main pillars of the tech world.
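The "TBs of data to get GBs of numbers" point is easy to check with back-of-the-envelope arithmetic. The figures below are illustrative assumptions, not numbers for any particular model: 7 billion parameters stored at 2 bytes each (16-bit floats), and 2 trillion training tokens at roughly 4 bytes of text per token.

```python
def weights_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Size of the stored weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def corpus_tb(n_tokens: float, bytes_per_token: float = 4.0) -> float:
    """Approximate size of the training text in terabytes (1 TB = 1e12 bytes)."""
    return n_tokens * bytes_per_token / 1e12


# Illustrative, assumed figures only.
print(weights_gb(7e9))   # 14.0
print(corpus_tb(2e12))   # 8.0
```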

ajross

4 months ago

> Those big changes always happen because someone presented a simpler model that explains stuff enough we can build stuff on it.

Again, no it doesn't. It didn't with industrial steelmaking, which was ad hoc and lucky. It isn't with AI, which no one actually understands.

skydhash

4 months ago

I’m pretty sure there were always formulas for getting high-quality steel even before the industrial age. And you only need a few textbooks and papers to understand AI.

jhanschoo

4 months ago

Let's make this more concrete than talking about "understanding knowledge". Oftentimes I want to know something that cannot feasibly be arrived at by reasoning, only empirically. Remaining within the language domain, LLMs get so much more useful when they can search the web for news, or your codebase to know how it is organized. Similarly, you need a robot that can interact with the world and reason from newly collected empirical data in order to answer these empirical questions, if the work had not already been done previously.

skydhash

4 months ago

> LLMs get so much more useful when they can search the web for news, or your codebase to know how it is organized

But their usefulness is only surface-deep. The news that matters to you is always deeply contextual; it's not only things labelled as breaking news or happening near you. The same thing happens with code organization. The reason is more human nature (how we think and learn) than machine optimization (the compiler usually doesn't care).

awesome_dude

4 months ago

I know the attributes of an apple; I know the attributes of a pear.

As does a computer.

But only I can bite into one and know without any doubt what it is and how it feels emotionally.

scrubs

4 months ago

You have half a point. "Without any doubt" is merely the apex of a huge undefined iceberg.

I say half because eating is multimodal and consequential. The LLM can read the menu, but it didn't eat the meal. Even humans are bounded: feeling, licking, smelling, or eating the menu still is not eating the meal.

4 months ago

Sure, and I'd be the first to admit I'm not aware of the intricate details of how LLMs are trained and refined; it's not my area. My original comment here was a disagreement with the relatively simple dismissal of the idea that the construction of humanity hasn't been an incremental zig-zag process; I don't see any reason that a "real" intelligence couldn't follow the same path under our direction. I see a lot of philosophical conversation around this on HN disguised as endless deep discussions about the technicals, which amuses me because it feels like we're in the very early days there, and I think we can circle the drain defining intelligence until we all die.

godelski

4 months ago

  > that to understand knowledge you have to have a model of the world.

You have a small but important mistake. What you can do without a world model is recite (or even apply) knowledge. To understand does actually require a world model.

Think of it this way: can you pass a test without understanding the test material? Certainly we all saw people we thought were idiots do well in class while we've also seen people we thought were geniuses fail. The test and understanding usually correlate, but it's not perfect, right?

The reason I say understanding requires a world model (and I would not say LLMs understand) is because to understand you have to be able to detail things. Look at physics, or the far more detail oriented math. Physicists don't conclude things just off of experimental results. It's an important part, but not the whole story. They also write equations, ones which are counterfactual. You can call this compression if you want (I would and do), but it's only that because of the generalization. But it also only has that power because of the details and nuance.

With AI, many of these people have been screaming for years (check my history) that what we're doing won't get us all the way there. Not because we want to stop the progress, but because we wanted to ensure continued and accelerated progress. We knew the limits and were saying "let's try to get ahead of this problem" but were told "that'll never be a problem. And if it is, we'll deal with it when we deal with it." It's why Chollet made the claim that LLMs have actually held AI progress back. Because the story that was sold was "AGI is solved, we just need to scale" (i.e. more money). I do still wonder how different things would be if those of us pushing back had been able to continue and scale our work (research isn't free, so yes, people did stop us). We always had the math to show that scale wasn't enough, but it's easy to say "you don't need math" when you can see progress. The math never said there would be no progress or no acceleration; it said there's a wall, and it's easier to adjust now than when we're closer and moving faster. Sadly, I don't think we'll ever shift the money over. We still evaluate success weirdly. Successful predictions don't matter. You're still heralded if you made a lot of money in VR and Bitcoin, right?

robotresearcher

4 months ago

In my view 'understand' is a folk psychology term that does not have a technical meaning. Like 'intelligent', 'beautiful', and 'interesting'. It usefully labels a basket of behaviors we see in others, and that is all it does.

4 months ago

  > I don't know exactly where that point is, but it's certainly not when the toaster is making zero decisions.

And this is the crux of my point. Our LLMs still need to be fed prompts.

Where the "decision making" happens gets fuzzy, but that's true in the toaster too.

Your run of the mill toaster is a heating element and a timer. Is the timer a rudimentary decision process?

A more modern toaster is going to include a thermocouple or thermistor to ensure that the heating elements don't light things on fire. This requires a logic circuit. Is this a decision process? (It is entirely deterministic.)
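The timer-plus-thermistor toaster described so far reduces to a fixed rule. A minimal sketch, with hypothetical threshold values, of what that deterministic logic might look like: the same inputs always produce the same decision.

```python
def element_on(elapsed_s: float, temp_c: float,
               timer_s: float = 120.0, cutoff_c: float = 250.0) -> bool:
    """Keep heating until the timer runs out or the thermistor reading
    reaches the safety cutoff, whichever happens first.
    Thresholds are illustrative assumptions, not real appliance values."""
    return elapsed_s < timer_s and temp_c < cutoff_c


print(element_on(30.0, 180.0))   # True: timer running, temperature safe
print(element_on(30.0, 260.0))   # False: safety cutoff tripped
print(element_on(130.0, 180.0))  # False: timer expired
```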

A more advanced one is going to incorporate a PID controller, just like your oven. It is deterministic in the sense that it will create the same outputs given the same inputs but it is working with non-deterministic inputs.

These PIDs can also look a lot like small neural networks, and in some cases they are implemented that way. These processes need not be deterministic. You can even approach this problem through RL style optimizations. There's a lot of solutions here.
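For comparison, here is a minimal PID sketch driving a toy first-order "oven" model. The plant model, gains, and setpoint are all assumptions chosen so the example is stable and easy to reason about (the derivative gain is zero, so it behaves as a PI controller); a real appliance would also clamp the output to what the heater can deliver and guard against integral windup. Given the same sequence of measurements, it always produces the same outputs, which is the sense of "deterministic" used above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the control output for one time step."""
        error = setpoint - measured
        self.integral += error * dt
        # No derivative on the very first step, since there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(setpoint: float = 200.0, steps: int = 600, dt: float = 0.1) -> float:
    """Toy oven (assumed model): heat input u, heat loss proportional to T - ambient."""
    ambient, loss_rate = 20.0, 0.1
    temp = ambient
    pid = PID(kp=2.0, ki=0.5, kd=0.0)
    for _ in range(steps):
        u = pid.update(setpoint, temp, dt)
        temp += dt * (u - loss_rate * (temp - ambient))
    return temp


print(round(simulate(), 2))  # 200.0
```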

When you break this down, I agree, it is hard to define that line, especially as we break it down. But that's part of what I'm after with robotresearcher. The claim was about task performance but then the answer with a toaster was that the human and toaster work together. I believe dullcrisp used the toaster as an example because it is a much simpler problem than playing a game of chess (or at least it appears that way).

So the question still stands, when does the toaster make the toast and when am I no longer doing so?

When is the measurement attributed to the toaster's ability to make toast vs mine?

Now replace toasting with chess, programming, music generation, or anything else that we have far less well defined metrics for. Sure, we don't have a perfect definition of what constitutes toast, but it is definitely far more bound than these other things. We have accuracy in the definition, and I'd argue even fairly good precision. There's high agreement on what we'd call toast, not toasted bread, and burnt bread. We can at least address the important part of this question without infinite precision in how to discriminate these classifications.

simondotau

4 months ago

The question of an "ability to make toast" is a semantic question bounded by what you choose to encompass within "make toast". At best, a regular household toaster can "make heat"[1]. A regular household toaster certainly cannot load itself with bread, which I would consider unambiguously within the scope of the "make toast" task. If you disagree, then we have a semantic dispute.

This is also, at least in part, the Sorites Paradox.[0] There is obviously a gradient of ambiguity between human and toaster responsibility, but we can clearly tell extremes apart even when the boundary is indeterminate. When does a collection of grains become a heap? When does a tool become responsible for the task? These are purely semantic questions. Strip away all normative loading and the argument disappears.

[0] https://en.wikipedia.org/wiki/Sorites_paradox

[1] Yada yada yada first law of thermodynamics etc

robotresearcher

4 months ago

You and the toaster made toast together. Like you and your shoes went for a walk.

Not sure where you imagine my inconsistency is.

godelski

4 months ago

That doesn't resolve the question.

  > Not sure where you imagine my inconsistency is.

  >> Let's take a step back. At what point is it me making the toast and not the toaster? Is it because I have to press the lever? We can automate that. Is it because I have to put my bread in? We can automate that. Is it because I have to have the desire to have toast and initiate the chain of events? How do you measure that?

You have a PhD and 30 years of experience, so I'm quite confident you are capable of adapting the topic of "making toast" to "playing chess", "doing physics", "programming", or any similar topic where we are benchmarking results.

Maybe I've (and others?) misunderstood your claim from the get-go? You seem to have implied that LLMs understand chess, physics, programming, etc. because of their performance. Yet now it seems your claim is that the LLM and I are doing those things together. If your claim is that an LLM understands programming the same way a toaster understands how to make toast, then we probably aren't disagreeing.

But if your claim is that an LLM understands programming because it can produce programs that yield a correct output to test cases, then what's the difference from the toaster? I put the prompts in and pushed the button to make it toast.

I'm not sure why you imagine the inconsistency is so difficult to see.

robotresearcher

4 months ago

When did I say that the chess program was different to a toaster? I don’t believe it is, so it’s not a thing I’m likely to say.

4 months ago

  > I simply can't make toast without a toaster

You literally just put bread on a hot pan.


recursive

4 months ago

How do you get bread? Don't tell me you got it at the market. That's just paying someone else to get it for you.

godelski

4 months ago

  >  That's just paying someone else to get it for you.

We can automate that too![0]

[0] https://news.ycombinator.com/item?id=45623154

4 months ago

  > Understanding is not something that any machine or person does.

Yet I can write down many equations necessary to build and design that plane.

I can model the wind and air flow across the surface and design airfoils.

I can interpret the mathematical symbols into real physical meaning.

I can adapt these equations to novel settings or even fictitious ones.

I can analyze them counterfactually; not just making predictions but also telling you why those predictions are accurate, what their inaccuracies are (such as which variables and measurements are more precise), and I can tell you what all those things mean.

I can describe and derive the limits of the equations and models, discussing where they do and don't work. Including in the fictional settings.

I can do this at an emergent macroscopic level and I can do it at a fine grain molecular or even atomic level. I can even derive the emergent macroscopic behavior from the more fine grain analysis and tell you the limits of each model.

I can also respond that Bernoulli's equation is not an accurate description of why an airfoil works, even when prompted with those words[0].

These are characteristics that lead people to believe I understand the physics of fluid mechanics and flight. They correlate strongly with the ability to recall information from textbooks, but the actions aren't strictly the ability to recall and search over a memory database. Do these things prove that I understand? No, but we deal with what we got even if it is imperfect.

It is not just the ability to perform a task; it includes the ability to explain it. The more depth I can provide, the greater the understanding people attribute to me. While this correlates with task performance, it is not the same. Even Ramanujan had to work hard to understand, even if he was somehow able to divine great equations without it.

You're right that these descriptions are not the thing itself either. No one is claiming the map is the territory here. That's not the argument being made. Understanding the map is a very different thing than conflating the map and the territory. It is also a different thing than just being able to read it.

[0] https://x.com/BethMayBarnes/status/1953504663531388985

ssivark

4 months ago

> In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human. There's no problem of how to do understanding, only how to do tasks.

Yes, but you also gloss over what a "task" is or what a "benchmark" is (which has to do with the meaning of generalization).

Suppose an AI or human answers 7 questions correctly out of 10 on an ICPC problem set; what are we able to infer from that?

1. Is the task equal to answering these 10 questions well, with a uniform measure of importance?

2. Is the task to be good at competitive programming problems?

3. Is the task to be good at coding?

4. Is the task to be good at problem solving?

5. Is the task not just to be effective under a uniform measure of importance, but an adversarial measure? (i.e. you can probably figure out all kinds of competitive programming questions, if you had more time / etc... but roughly not needing "exponentially more resources")

These are very different levels of abstraction, and literally the same benchmark result can be interpreted to mean very different things. And that imputation of generality is not objective unless we know the mechanism by which it happens. "Understanding" is short-hand for saying that performance generalizes at one of the higher levels of abstraction (3--5), rather than narrow success -- because that is what we expect of a human.
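Even before asking which level of abstraction a score speaks to, a 7/10 result is noisy: treated as a sample from a pool of similar questions, it is compatible with a wide range of true success rates. A minimal sketch using the Wilson score interval, a standard binomial confidence interval (the function name is hypothetical). It quantifies sampling noise only; it does not address which of the levels above the result generalizes to.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% (z = 1.96) Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


lo, hi = wilson_interval(7, 10)
print(f"{lo:.2f} to {hi:.2f}")  # 0.40 to 0.89
```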

simianwords

4 months ago

How do you quantify generality? If we have a benchmark that can quantify it and that benchmark reliably tells us that the LLM is within human levels of generalisation then the llm is not distinguishable from a human.

While it’s a good point that we need to benchmark generalisation ability, you have in fact agreed that it is not important to understand underlying mechanics.

godelski

4 months ago

That's kinda their point

The difference though is they understand that you can't just benchmark your way into proofs. Just like you can't unit test your way into showing code is error free. Benchmarks and unit tests are great tools that provide a lot of help, but just because a hammer is useful doesn't make everything a nail.
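The unit-test analogy in miniature: a hypothetical, deliberately flawed primality check that passes a plausible-looking test suite while still being wrong. Passing the tests is evidence, not proof.

```python
def is_prime(n: int) -> bool:
    """Deliberately flawed: only rules out multiples of 2 and 3."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    return n % 2 != 0 and n % 3 != 0


# A plausible-looking test suite that this function passes...
for n, expected in [(1, False), (2, True), (3, True), (4, False), (5, True),
                    (9, False), (11, True), (13, True), (15, False)]:
    assert is_prime(n) == expected

# ...yet 25 = 5 * 5 is reported as prime.
print(is_prime(25))  # True
```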

compass_copium

4 months ago

Nonsense.

A QC operator may be able to carry out a test with as much accuracy as (or, with enough practice, perhaps better accuracy than) the PhD-quality chemist who developed it. They could plausibly do so with a high school education and not be able to explain the test in any detail. They do not understand the test in the same way as the chemist.

If 'understand' is a meaningless term to someone who's spent 30 years in AI research, I understand why LLMs are being sold and hyped in the way they are.

robotresearcher

4 months ago

> They do not understand the test in the same way as the chemist.

Can you explain precisely what 'understand' means here, without using the word 'understand'? I don't think anyone can.

throw4847285

4 months ago

There are a number of competing models. The SEP page is probably a good place to start.

https://plato.stanford.edu/entries/understanding

bandrami

4 months ago

Not to be flippant, but have you considered that that question is an entire branch of philosophy with a several-millennia-long history, which people in some cases spend their entire lives studying?

robotresearcher

4 months ago

I have. It robustly has the folk-psychological meaning I mentioned in my first sentence. Call it ‘philosophical’ instead of ‘folk-psychological’ if you like. It’s a useful concept. But the concept doesn’t require AI engineers to do anything. It certainly doesn’t give AI engineers any hints about what they should actually do.

“Make it understand.”

“How? What does that look like?”

“… But it needs to understand…”

“It answers your questions.”

“But it doesn’t understand.”

“Ok. Get back to me when that entails anything.”

mommys_little

4 months ago

I would say it understands if, given many variations of a problem statement, it always gives the correct answer without fail. I have this complicated mirror question that only Deepseek and qwen3-max got right every time; still, they only answered it correctly about a dozen times, so we're left with high probability, I guess.
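To put a number on "we're left with high probability": if a model gets a question right on every one of n attempts, the exact one-sided 95% lower confidence bound on its per-attempt success rate is the p that solves p^n = 0.05. A minimal sketch, assuming exactly 12 attempts for "about a dozen" and treating attempts as independent (a simplification for a sampled LLM); the function name is hypothetical.

```python
def success_lower_bound(n_trials: int, alpha: float = 0.05) -> float:
    """Exact one-sided lower confidence bound on the success probability
    when all n_trials succeeded: solve p ** n_trials == alpha for p."""
    return alpha ** (1 / n_trials)


for n in (12, 100, 1000):
    print(n, round(success_lower_bound(n), 3))
# 12 0.779
# 100 0.97
# 1000 0.997
```

So a perfect dozen only rules out, at 95% confidence, success rates below roughly 78%.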

godelski

4 months ago

I disagree with robotresearcher but I think this is also an absurd definition. By that definition there is no human, nor creature, that understands anything. Not just by nature of humans making mistakes, including experts, but I'd say this is even impossible. You need infinite precision and infinite variation here.

It turns "understanding" into a binary condition. Robotresearcher's does too, but I'm sure they would refine by saying that the level of understanding is directly proportional to task performance. But I still don't know how they'll address the issue of coverage, as ensuring tests have complete coverage is far from trivial (even harder when you want to differentiate from the training set, differentiating memorization).

I think you're right in trying to differentiate memorization from generalization, but your way to measure this is not robust enough. A fundamental point where I disagree with them is that memorization is not the same as understanding.
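One common, admittedly crude, heuristic for separating memorization from generalization is to measure verbatim n-gram overlap between a test item and the training text. A minimal sketch with hypothetical function names and an assumed default n; it only flags verbatim copying, so a paraphrased leak slips straight through, which is part of why coverage is so hard to guarantee.

```python
def ngrams(text: str, n: int) -> set:
    """All word n-grams in the text, lower-cased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(test_item: str, training_text: str, n: int = 8) -> float:
    """Fraction of the test item's n-grams that appear verbatim in the training text."""
    test = ngrams(test_item, n)
    if not test:
        return 0.0
    return len(test & ngrams(training_text, n)) / len(test)


training = "the quick brown fox jumps over the lazy dog"
item = "a quick brown fox jumps over a fence"
print(overlap_ratio(item, training, n=3))  # 0.5
```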

Zarathruster

4 months ago

Isn't this just a reformulation of the Turing Test, with all the problems it entails?

user

4 months ago

[deleted]

robomartin

4 months ago

I have been thinking about this for years, probably two decades. The answer to your question or the definition, I am sure you know, is rather difficult. I don't think it is impossible, but there's a risk of diving into a deep dark pit of philosophical thought going back to at least the ancient Greeks.

And, if we did go through that exercise, I doubt we can come out of it with a canonical definition of understanding.

I was really excited about LLMs as they surfaced and developed. I fully embraced the technology and have been using it extensively with full top-tier subscriptions to most services. My conclusion so far: If you want to destroy your business, adopt LLMs with gusto.

I know that's a statement that goes way against the train ride we are on at this very moment. That's not to say LLMs are not useful. They are. Very much so. The problem is...well...they don't understand. And here I am, back in a circular argument.

I can define understanding with the "I know it when I see it" meme. And, frankly, it does apply. Yet, that's not a definition. We've all experienced that stare when talking to someone who does not have sufficient depth of understanding in a topic. Some of us have experienced people running teams who should not be in that position because they don't have a clue, they don't understand enough of it to be effective at what they do.

And yet, I still have not defined "understanding".

Well, it's hard. And I am not a philosopher, I am an engineer working in robotics, AI and applications to real time video processing.

I have written about my experiments using LLM coding tools (I refuse to call them AI, they are NOT intelligent; yes, need to define that as well).

In that context, lack of understanding is clearly evident when an LLM utterly destroys your codebase by adding dozens of irrelevant and unnecessary tests, randomly changes variable names as you navigate the development workflow, adds modules like a drunken high school coder and takes you down tangents that would make for great comedy if I were a tech comedian.

LLMs do not understand. They are fancy --and quite useful-- auto-complete engines and that's about it. Other than that, buyer beware.

The experiments I ran, some of them spanning three months of LLM-collaborative coding at various levels --from very hands-on to "let Jesus drive the car"-- conclusively demonstrated (at least to me) that:

1- No company should allow anyone to use LLMs unless they have enough domain expertise to be able to fully evaluate the output. And you should require that they fully evaluate and verify the work product before using it for anything; email, code, marketing, etc.

2- No company should trust anything coming out of an LLM, not one bit. Because, well, they don't understand. I recently tried to use the United Airlines LLM agent to change a flight. It was a combination of tragic and hilarious. Now, I know what's going on. I cannot possibly imagine the wild rides this thing is taking non-techies on every day. It's shit. It does not understand. It isn't isolated to United Airlines, it's everywhere LLMs are being used. The potential for great damage is always there.

3- They can be great for summarization tasks. For example, you can have them help you dive deep into a 300-page AMD/Xilinx FPGA datasheet or application note and get mentally situated. They can be great at helping you find prior art for patents. Yet, still, because they are mindless parrots, you should not trust any of it.

4- Nobody should give LLMs broad access to a non-trivial codebase. This is almost guaranteed to cause destruction and hidden future effects. In my experiments I have experienced an LLM breaking unrelated code that worked just fine --in some cases fully erasing the code without telling you. Ten commits later you discover that your network stack doesn't work or isn't even there. Or, you might discover that the stack is there but the LLM changed class, variable or method names, maybe even data structures. It's a mindless parrot.

I could go on.

One response to this could be "Well, idiot, you need better prompts!". That, of course, assumes that part of my experimentation did not include testing prompts of varying complexity and length. I found that for some tasks, you get better results by explaining what you want and then asking the LLM to write a prompt to get that result. You check that prompt, modify if necessary and, from my experience, you are likely to get better results.

Of course, the reply to "you need better prompts" is easy: If the LLM understood, prompt quality would not be a problem at all and pages-long prompts would not be necessary. I should not have to specify that existing class, variable and method names should not be modified. Or that interfaces should be protected. Or that data structures should not be modified without reason and unless approved by me. Etc.

It reminds me of a project I was given when I was a young engineer barely out of university. My boss, the VP of Engineering where I worked, needed me to design a custom device. Think of it as a specialized high speed data router with multiple sources, destinations and a software layer to control it all. I had to design the electronics, circuit boards, mechanical and write all the software. The project had a budget of nearly a million dollars.

He brought me into his office and handed me a single sheet of paper with a top-level functional diagram. Inputs, outputs, interfaces. We had a half hour discussion about objectives and required timeline. He asked me if I could get it done. I said yes.

He checked in with me every three months or so. I never needed anything more than that single piece of paper and the short initial conversation because I understood what we needed, what he wanted, how that related to our other systems, available technology, my own capabilities and failings, available tools, etc. It took me a year to deliver. It worked out of the box.

You cannot do that with LLMs because they don't understand anything at all. They mimic what some might confuse for understanding, but they do not.

And, yet, once again, I have not defined the term. I think everyone reading this who has used LLMs to a non-trivial depth...well...understands what I mean.

dasil003

4 months ago

> We've all experienced that stare when talking to someone who does not have sufficient depth of understanding in a topic.

I think you're really putting your finger on something here. LLMs have blown us away because they can interact with language in a very similar way to humans, and in fact it approximates how humans operate in many contexts when they lack a depth of understanding. Computers never could do this before, so it's impressive and novel. But despite how impressive it is, humans who were operating this way were never actually generating significant value. We may have pretended they were for social reasons, and there may even have been some real value associated with the human camaraderie and connections they were a part of, but certainly it is not of value when automated.

Prior to LLMs, just being able to read and write code at a pretty basic level was deemed an employable skill, but because it was not a natural skill for lots of humans, it was also a market for lemons, and mere basic coding was overvalued by those who did not actually understand it. But of course the real value of coding has always been to create systems that serve human outcomes, and the outcomes that are desired are always driven by human concerns that are probably inscrutable to something without the same wetware as us. Hell, it's hard enough for humans to understand each other half the time, but even when we don't fully understand each other, the information conferred through non-verbal cues and through familiarity with the personalities and connotations that we only learn through extended interaction has a robust baseline which text alone can never capture.

Identifying limitations of LLMs in the context of "it's not AGI yet because X" is huge right now; it gets massive funding, taking away from other things like SciML and uncertainty analyses. I will agree that deep learning theory in the sense of foundational mathematical theory to develop internal understanding (with limited appeal to numerics) is in the roughest state it has ever been in. My first impression there is that the toolbox has essentially run dry and we need something more to advance the field. My second impression is that empirical researchers in LLMs are mostly junior and significantly less critical of their own work and the work of others, but I digress.

I also disagree that we are disincentivised to find meaning behind the word "understanding" in the context of neural networks: if understanding is to build an internal world model, then quite a bit of work is going into that. Empirically, it would appear that they do, almost by necessity.

godelski

4 months ago

Maybe given our different niches we interact with different people? But I'm uncertain because I believe what I'm saying is highly visible. I forget: at which NeurIPS(?) were so many people wearing "Scale is all you need" shirts?

  > My first impression there is that the toolbox has essentially run dry and we need something more to advance the field

This is my impression too. Empirical evidence is a great tool and useful, especially when there is no strong theory to provide direction, but it is limited.

  > My second impression is that empirical researchers in LLMs are mostly junior and significantly less critical of their own work and the work of others

But this is not my impression. I see this from many prominent researchers. Maybe they claim SIAYN in jest, but then they should come out and say so instead of doubling down. If we take them at their word (and I do), robotresearcher is not a junior (please, read their comments. It is illustrative of my experience. I'm just arguing back far more than I would in person). I've also seen members of audiences to talks where people ask questions like mine ("are benchmarks sufficient to make such claims?") with responses of "we just care that it works." Again, I think this is a non-answer to the question. But being taken as a sufficient answer, especially in response to peers, is unacceptable. It almost always has no follow-up.

I also do not believe these people are less critical. I've had several works struggle through publication even though my models, a hundredth the size (and trained on a millionth the data), could perform on par or even better. At face value, asks of "more datasets" and "more scale" are reasonable, yet it is a self-reinforcing paradigm that slows progress. It's like a corn farmer smugly asking why the neighboring soybean farmer doesn't grow anything when the corn farmer is chopping all the soybean stems in their infancy. It is a fine ask of big labs with big money, but it is just gatekeeping and lazy evaluation for anyone else. Even at CVPR this last year they passed out "GPU Rich" and "GPU Poor" hats, so I thought the situation was well known.

  > if understanding is to build an internal world model, then quite a bit of work is going into that. Empirically, it would appear that they do, almost by necessity.

I agree a "lot of work is going into it" but I also think the approaches are narrow and still benchmark chasing. I saw the aforementioned responses given at workshops on world modeling as well (along with a few presenters who gave very different and more complex answers, or "it's the best we've got right now", but neither seemed too confident in claiming "world model" either).

But I'm a bit surprised that as a mathematician you think these systems create world models. While I see some generalization, this is also impossible for me to distinguish from memorization. We're processing more data than can be scrutinized. We also seem to frequently uncover major limitations in our de-duplication processes[0]. We are definitely abusing the terms "Out of Distribution" and "Zero shot". I don't know how anyone working with a proprietary LLM (or large model) that they don't own can make a claim of "zero shot" or even "few shot" capabilities. We're publishing papers left and right, yet it's absurd to claim {zero,few}-shot when we don't have access to the learning distribution. We've merged these terms with biased sampling. Was the data not in training or is it just a low-likelihood region of the model? They're indistinguishable without access to the original distribution.

Idk, I think our scaling is just making the problem harder to evaluate. I don't want to stop that camp because they are clearly producing things of value, but I do also want that camp to not make claims beyond their evidence. It just makes the discussion more convoluted. I mean the argument would be different if we were discussing small and closed worlds, but we're not. The claims are we've created world models yet many of them are not self-consistent. Certainly that is a requirement. I admit we're making progress, but the claims were made years ago. Take GameNGen[1] or Diamond Diffusion[2]. Neither was the first and neither was self-consistent. Though both are also impressive.

[0] as an example: https://arxiv.org/abs/2303.09540

[1] https://news.ycombinator.com/item?id=41375548

[2] https://news.ycombinator.com/item?id=41826402

hodgehog11

4 months ago

Apologies if I ramble a bit here, this was typed in a bit of a hurry. Hopefully I answer some of your points.

First, regarding robotresearcher and simondota's comments, I am largely in agreement with what they say here. The "toaster" argument is a variant of the Chinese Room argument, and there is a standard rebuttal here. The toaster does not act independently of the human so it is not a closed system. The system as a whole, which includes the human, does understand toast. To me, this is different from the other examples you mention because the machine was not given a list of explicit instructions. (I'm no philosopher though so others can do a better job of explaining this). I don't feel that this is an argument for why LLMs "understand", but rather why the concept of "understanding" is irrelevant without an appropriate definition and context. Since we can't even agree on what constitutes understanding, it isn't productive to frame things in those terms. I guess that's where my maths background comes in, as I dislike the ambiguity of it all.

My "mostly junior" comment is partially in jest, but mostly comes from the fact that LLM and diffusion model research is a popular stream for moving into big tech. There are plenty of senior people in these fields too, but many reviewers in those fields are junior.

> I've also seen members of audiences to talks where people ask questions like mine ("are benchmarks sufficient to make such claims?") with responses of "we just care that it works."

This is a tremendous pain point for me, more than I can convey here, but it's not unusual in computer science. Bad researchers will live and die on standard benchmarks. By the way, if you try to focus on another metric under the argument that the benchmarks are not wholly representative of a particular task, expect to get roasted by reviewers. Everyone knows it is easier to just do benchmark chasing.

> I also do not believe these people are less critical.

I think the fact that the "we just care that it works" argument is enough to get published is a good demonstration of what I'm talking about. If "more datasets" and "more scale" are the major types of criticisms that you are getting, then you are still working in a more fortunate field. And yes, I hate it as much as you do as it does favor the GPU rich, but they are at least potentially solvable. The easiest papers of mine to get through were methodological and often got these kinds of comments. Theory and SciML papers are an entirely different beast in my experience because you will rarely get reviewers that understand the material or care about its relevance. People in LLM research thought that the average NeurIPS score in the last round was a 5. Those in theory thought it was 4. These proportions feel reflected in the recent conferences. I have to really go looking for something outside the LLM mainstream, while there was a huge variety of work only a few years ago. Some of my colleagues have noticed this as well and have switched out of scientific work. This isn't unnatural or something to actively try to fix, as ML goes through these hype phases (in the 2000s, it was all kernels as I understand).

> approaches are narrow and still benchmark chasing

> as a mathematician you think these systems create world models

When I say "world model", I'm not talking about outputs or what you can get through pure inference. Training models to perform next frame prediction and looking at inconsistencies in the output tells us little about the internal mechanism. I'm talking about appropriate representations in a multimodal model. When it reads a given frame, is it pulling apart features in a way that a human would? We've known for a long time that embeddings appropriately encode relationships between words and phrases. This is a model of the world as expressed through language. The same thing happens for images at scale as can be seen in interpretable ViT models. We know from the theory that for next frame prediction, better data and more scaling improves performance. I agree that isn't very interesting though.

> We are definitely abusing the terms "Out of Distribution" and "Zero shot".

Absolutely in agreement with everything you have said. These are not concepts that should be talked about in the context of "understanding", especially at scale.

> I think our scaling is just making the problem harder to evaluate.

Yes and no. It's clear that whatever approach we will use to gauge internal understanding needs to work at scale. Some methods only work with sufficient scale. But we know that completely black-box approaches don't work, because if they did, we could use them on humans and other animals.

> The claims are we've created world models yet many of them are not self-consistent.

For this definition of world model, I see this the same way as how we used to have "language models" with poor memory. I conjecture this is more an issue of alignment than a lack of appropriate representations of internal features, but I could be totally wrong on this.

godelski

4 months ago

  > The toaster does not act independently of the human so it is not a closed system

I think you're mistaken. No, not at that, at the premise. I think everyone agrees here. Where you're mistaken is that when I login to Claude it says "How can I help you today?"

No one is thinking that the toaster understands things. We're using it to point out how silly the claim of "task performance == understanding" is. Techblueberry furthered this by asking if the toaster is suddenly intelligent by wrapping it with a cron job. My point was about where the line is drawn. The turning on the toaster? No, that would be silly and you clearly agree. So you have to answer why the toaster isn't understanding toast. That's the ask. Because clearly the toaster toasts bread.

You and robotresearcher have still avoided answering this question. It seems dumb but that is the crux of the problem. The LLM is claimed to be understanding, right? It meets your claims of task performance. But they are still tools. They cannot act independently. I still have to prompt them. At an abstract level this is no different than the toaster. So, at what point does the toaster understand how to toast? You claim it doesn't, and I agree. You claim it doesn't because a human has to interact with it. I'm just saying that looping agents onto themselves doesn't magically make them intelligent. Just like how I can automate the whole process from planting the wheat to toasting the toast.

You're a mathematician. All I'm asking is that you abstract this out a bit and follow the logic. Clearly even our automated seed to buttered toast on a plate machine needs not have understanding.

From my physics (and engineering) background there's a key thing I've learned: all measurements are proxies. This is no different. We don't have to worry about this detail in most everyday things because we're typically pretty good at measuring. But if you ever need to do something with precision, it becomes abundantly obvious. But you even use this same methodology in math all the time. Though I wouldn't say that this is equivalent to taking a hard problem, creating an isomorphic map to an easier problem, solving it, then mapping back. There's an invective nature. A ruler doesn't measure distance. A ruler is a reference to distance. A laser range finder doesn't measure distance either; it is a photodetector and a timer. There is nothing in the world that you can measure directly. If we cannot do this with physical things, it seems pretty silly to think we can do it with abstract concepts that we can't create robust definitions for. It's not like we've directly measured the Higgs either. But what, do you think entropy is actually a measurement of intelligible speech? Is perplexity a good tool for identifying an entropy minimizer? Or does it just correlate? Is FID a measurement of fidelity or are we just using a useful proxy? I'm sorry, but I just don't think there are precise mathematical descriptions of things like natural English language or realistic human faces. I've developed some of the best vision models out there and I can tell you that you have to read more than the paper, because while they will produce fantastic images they also produce some pretty horrendous ones. The fact that they statistically generate realistic images does not imply that they actually understand them.

  > I'm no philosopher

Why not? It sounds like you are. Do you not think about metamathematics? What math means? Do you not think about math beyond the computation? If you do, I'd call you a philosopher. There's a P in a PhD for a reason. We're not supposed to be automata. We're not supposed to be machine men, with machine minds, and machine hearts.

  > This is a tremendous pain point ... researchers will live and die on standard benchmarks.

It is a pain we share. I see it outside CS as well, but I was shocked to see the difference. Most of the other physicists and mathematicians I know that came over to CS were also surprised. And it isn't like physicists are known for their lack of egos lol

  > then you are still working in a more fortunate field

Oh, I've gotten the other comments too. That research never found publication and at the end of the day I had to graduate. Though now it can be revisited. I was once surprised to find that I saved a paper from Max Welling's group. My fellow reviewers were confident in their rejections, but since they admitted to not understanding differential equations, the AC sided with me (maybe they could see Welling's name? I didn't know till months after). It barely got through a workshop, but should have been in the main proceedings.

So I guess I'm saying I share this frustration. It's part of the reason I talk strongly here. I understand why people shift gears. But I think there's a big difference between begrudgingly getting on the train because you need to publish to survive and actively fueling it and shouting that all other trains are broken and can never be fixed. One train to rule them all? I guess CS people love their binaries.

  > world model

I agree that looking at outputs tells us little about their internal mechanisms. But proof isn't symmetric in difficulty either. A world model has to be consistent. I like vision because it gives us more clues in our evaluations and lets us evaluate beyond metrics. But if we are seeing video from a POV perspective, then if we see a wall in front of us, turn left, then turn back, we should still expect to see that wall, and the same one. A world model is a model beyond what is seen from the camera's view. A world model is a physics model. And I mean /a/ physics model, not "physics". There is no single physics model. Nor do I mean that a world model needs to have even accurate physics. But it does need to make consistent and counterfactual predictions. Even the geocentric model is a world model (literally a model of worlds lol). The model of the world you have in your head is this. We don't close our eyes and conclude the wall in front of us will disappear. Someone may spin you around and you still won't do this, even if you have your coordinates wrong. The issue isn't so much memory as it is understanding that walls don't just appear and disappear. It is also understanding that this also isn't always true about a cat.
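The wall test described above can be made concrete with a toy sketch. Both classes are hypothetical illustrations, not how real video world models are built or evaluated: one keeps state for directions outside the camera's view, the other keeps only the current frame and guesses a fresh one after every turn (an assumption standing in for a model with no persistent world state). The check itself is the one from the comment: look, turn left, turn back, look again.

```python
import random

HEADINGS = ("north", "east", "south", "west")
SCENERY = ("wall", "door", "window", "open floor")


class StatefulWorldModel:
    """Tracks what is in every direction, whether or not it is currently in view."""

    def __init__(self, contents: dict):
        self.contents = dict(contents)  # heading -> what stands there
        self.heading = 0                # index into HEADINGS

    def turn(self, steps: int) -> None:
        # Negative steps turn left, positive steps turn right.
        self.heading = (self.heading + steps) % len(HEADINGS)

    def view(self) -> str:
        return self.contents[HEADINGS[self.heading]]


class FrameOnlyModel:
    """Hypothetical model that keeps only the current frame and re-imagines the scene after every turn."""

    def __init__(self, first_frame: str, seed: int = 0):
        self.frame = first_frame
        self.rng = random.Random(seed)

    def turn(self, steps: int) -> None:
        # Nothing about the previous view survives the turn, so the new frame is a fresh guess.
        self.frame = self.rng.choice(SCENERY)

    def view(self) -> str:
        return self.frame


def turn_away_and_back(model) -> bool:
    """Look, turn left, turn back, look again: is it still the same thing?"""
    before = model.view()
    model.turn(-1)
    model.turn(+1)
    return model.view() == before


if __name__ == "__main__":
    room = {"north": "wall", "east": "door", "south": "window", "west": "open floor"}
    print(turn_away_and_back(StatefulWorldModel(room)))  # True
    trials = [turn_away_and_back(FrameOnlyModel("wall", seed=s)) for s in range(1000)]
    print(sum(trials) / len(trials), "of frame-only trials kept the wall")
```

Note that the frame-only model passes the check some of the time by luck, which is the comment's broader point: a single passing output does not show that the model keeps track of the world.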

I referenced the game engines because while they are impressive they are not self consistent. Walls will disappear. An enemy shooting at you will disappear sometimes if you just stop looking at it. The world doesn't disappear when I close my eyes. A tree falling in a forest still creates acoustic vibrations in the air even if there is no one to hear it.

A world model is exactly that, a model of a world. It is a superset of a model of a camera view. It is a model of the things in the world and how they interact together, regardless of if they are visible or not. Accuracy isn't actually the defining feature here, though it is a strong hint, at least it is for poor world models.

I know this last part is a bit more rambly and harder to convey. But I hope the intention came across.

robotresearcher

4 months ago

> You and robotresearcher have still avoided answering this question.

I have repeatedly explicitly denied the meaningfulness of the question. Understanding is a property ascribed by an observer, not possessed by a system.

You may not agree, but you can’t maintain that I’m avoiding that question. It does not have an answer that matters; that is my specific claim.

You can say a toaster understands toasting or you can not. There is literally nothing at stake there.

godelski

4 months ago

You said the LLMs are intelligent because they do tasks. But the claim is inconsistent with the toaster example.

If a toaster isn't intelligent because I have to give it bread and press the button to start then how's that any different from giving an LLM a prompt and pressing the button to start?

It's never been about the toaster. You're avoiding answering the question. I don't believe you're dumb, so don't act the part. I'm not buying it.

robotresearcher

4 months ago

I didn’t describe anything as intelligent or not intelligent.

I’ll bow out now. Not fun to be ascribed views I don’t have, despite trying to be as clear as I can.

robotresearcher

4 months ago

Intellectual caution is a good default.

Having said that, can you name one functional difference between an AI that understands, and one that merely behaves correctly in its domain of expertise?

As an example, how would a chess program that understands chess differ from one that is merely better at it than any human who ever lived?

(Chess the formal game; not chess the cultural phenomenon)

Some people don’t find the example satisfying, because they feel like chess is not the kind of thing where understanding pertains.

I extend that feeling to more things.

godelski

4 months ago

  > any human who ever lived

Is this falsifiable? Even restricting to those currently living? On what tests? In which way? Does the category of error matter?

  > can you name one functional difference between an AI that understands, and one that merely behaves correctly in its domain of expertise?

I'd argue you didn't understand the examples from my previous comment or the direct reply[0]. Does it become a duck as soon as you are able to trick an ornithologist? All ornithologists?

But yes. Is it fair if I use Go instead of Chess? Game 4 with Lee Sedol seems an appropriate example.

Vafa also has some good examples[1,2].

But let's take an even more theoretical approach. Chess is technically solvable since it is a deterministic game of perfect information: in principle, you can compute an optimal strategy from any valid state. The problem is that this is intractable, since the number of state-action pairs is so large. But the number of moves isn't the critical part here, so let's look at Tic-Tac-Toe. We can pretty easily program up a machine that will not lose. We can put all actions and states into a graph and fit that on a computer, no problem. Would you really say that the program understands Tic-Tac-Toe better than a human? I'm not sure we should even say it understands the game at all.
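For reference, the "machine that will not lose" can be sketched in a few dozen lines. This is a minimal sketch assuming plain exhaustive minimax; memoizing positions with `functools.lru_cache` is what turns the game tree into the graph of states the comment mentions. The helper names (`winner`, `play`, `value`, `best_move`) are hypothetical, chosen for illustration.

```python
from functools import lru_cache
from typing import Optional

# The eight winning lines on a 3x3 board, squares indexed 0..8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play(board: str, square: int, player: str) -> str:
    """Return a new board with `player` placed on `square`."""
    return board[:square] + player + board[square + 1:]


@lru_cache(maxsize=None)  # memoizing turns the game tree into a graph of distinct states
def value(board: str, player: str) -> int:
    """Minimax value from X's point of view with `player` to move: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    scores = [value(play(board, i, player), other) for i, c in enumerate(board) if c == "."]
    return max(scores) if player == "X" else min(scores)


def best_move(board: str, player: str) -> int:
    """Pick an empty square that achieves the minimax value (assumes the game is not over)."""
    other = "O" if player == "X" else "X"
    moves = [i for i, c in enumerate(board) if c == "."]
    choose = max if player == "X" else min
    return choose(moves, key=lambda i: value(play(board, i, player), other))


if __name__ == "__main__":
    print(value("." * 9, "X"))  # 0: perfect play from the empty board is a draw
    print(value.cache_info().currsize, "positions cached")
```

Everything the program "knows" is a lookup over an exhaustively enumerated state graph, which is what makes it a useful test case for the question of whether flawless play implies understanding.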

I don't think the situation is resolved by changing to unsolved (or effectively unsolved) games. That's the point of the Heliocentric/Geocentric example. The Geocentric Model gave many accurate predictions, but I would find it surprising if you suggested an astronomer at that time, with deep expertise in the subject, understood the configuration of the solar system better than a modern child who understands Heliocentrism. Their model makes accurate predictions and certainly more accurate than that child would, but their model is wrong. It took quite a long time for Heliocentrism to not just be proven to be correct, but to also make better predictions than Geocentrism in all situations.

So I see 2 critical problems here.

1) The more accurate model[3] can be less developed, resulting in lower predictive capabilities despite being a much more accurate representation of the verifiable environment. Accuracy and precision are different, right?

2) Test performance says nothing about coverage/generalization[4]. We can't prove our code is error free through test cases. We use them to bound our confidence (a very useful feature! I'm not against tests, but as you say, caution is good).
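A toy illustration of point 2, assuming primality checking as the task (the example is not from the thread and the function names are hypothetical): a lookup table of memorized answers passes every test drawn from the range it has "seen" and fails as soon as the tests move outside it, while a general implementation passes both. Only the held-out cases separate the two.

```python
# Hypothetical "memorizing" solution: it only knows the answers it has already seen.
SEEN_PRIMES = {2, 3, 5, 7, 11, 13, 17, 19}


def is_prime_memorized(n: int) -> bool:
    return n in SEEN_PRIMES


def is_prime_general(n: int) -> bool:
    """Trial division: correct for every integer, not only the ones in the tests."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def run_tests(fn, cases) -> bool:
    """True if fn matches the expected label on every (input, expected) pair."""
    return all(fn(n) == expected for n, expected in cases)


# Hand-labeled tests covering every integer below 20 (the memorizer's "training range").
IN_RANGE_TESTS = [(n, n in {2, 3, 5, 7, 11, 13, 17, 19}) for n in range(20)]
# Held-out cases from outside that range.
HELD_OUT_TESTS = [(23, True), (25, False), (29, True), (97, True)]

if __name__ == "__main__":
    for fn in (is_prime_memorized, is_prime_general):
        print(fn.__name__, run_tests(fn, IN_RANGE_TESTS), run_tests(fn, HELD_OUT_TESTS))
    # is_prime_memorized True False
    # is_prime_general True True
```

Passing the in-range suite bounds our confidence but cannot tell the two apart; that requires knowing which inputs the implementation was built from.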

In [0] I referenced Dyson, I'd appreciate it if you watched that short video (again if it's been some time). How do you know you aren't making the same mistake Dyson almost did? The mistake he would have made had he not trusted Fermi? Remember, Fermi's predictions were accurate and they even stood for years.

If your answer is time, then I'm not convinced it is a sufficient explanation. It doesn't explain Fermi's "intuition" (understanding) and is just kicking the can down the road. You wouldn't be able to differentiate yourself from Dyson's mistake. So why not take caution?

And to be clear, you are the one making the stronger claim: "understanding has a well defined definition." My claim is that yours is insufficient. I'm not claiming I have an accurate and precise definition, my claim is that we need more work to get the precision. I believe your claim can be a useful abstraction (and certainly has been!), but that there are more than enough problems that we shouldn't hold to it so tightly. To use it as "proof" is naive. It is equivalent to claiming your code is error free because it passes all test cases.

[0] https://news.ycombinator.com/item?id=45622156

[1] https://arxiv.org/abs/2406.03689

[2] https://arxiv.org/abs/2507.06952

[3] Certainly placing the Earth at the center of the solar system (or universe!) is a larger error than placing the sun at the center of the solar system and failing to predict the tides or retrograde motion of Mercury.

[4] This gets exceedingly complex as we start to differentiate from memorization. I'm not sure we need to dive into how far from the training data something needs to be to make it a reasonable piece of test data, but that is a question that can't be ignored forever.

robotresearcher

4 months ago

>> any human who ever lived

> Is this falsifiable? Even restricting to those currently living? On what tests? In which way? Does the category of error matter?

Software reliably beats the best players who have ever played chess in public, including Kasparov and Carlsen, the best players of my lifetime (to my limited knowledge). By analogy to the performance ratchet we see in the rest of sports and games, we might reasonably assume that these dominant living players are the best the world has ever seen. That could be wrong. But my argument does not hang on this point, so asking about falsifiability here doesn't do any work. Of course it's not falsifiable.

Y'know what else is not falsifiable? "That AI doesn't understand what it's doing".

  > can you name one functional difference between an AI that understands, and one that merely behaves correctly in its domain of expertise?

> I'd argue you didn't understand the examples from my previous comment or the direct reply[0]. Does it become a duck as soon as you are able to trick an ornithologist? All ornithologists?

No one seems to have changed their opinion about anything in the wake of AIs routinely passing the Turing Test. They are fooled by the chatbot passing as a human, and then ask about ducks instead. The most celebrated and seriously considered quacks like a duck argument has been won by the AIs and no-one cares.

By the way, the ornithologists' criteria for duck is probably genetic and not much to do with behavior. A dead duck is still a duck.

And because we know what a duck is, no-one is yelling at ducks that 'they don't really duck' and telling duck makers they need a revolution in duck making and they are doomed to failure if they don't listen.

Not so with 'understanding'.

godelski

4 months ago

  > Y'know what else is not falsifiable? "That AI doesn't understand what it's doing".

That's why people are saying we need to put in more work to define this term. That's the whole point of this conversation.

  > seriously considered quacks like a duck argument has been won by the AIs and no-one cares.

And have you ever considered that it's because people are refining their definitions?

Often when people find that their initial beliefs are wrong or not precise enough, they update their beliefs. You seem to be calling this a flaw. It's not like the definitions are dramatically changing; they're being refined. There's a big difference.

robotresearcher

4 months ago

My first post here is me explaining that I have a non-standard definition of what ‘understanding’ means, which helps me avoid an apparently thorny issue. I’m literally here offering a refinement of a definition.

This is a weird conversation.

godelski

4 months ago

  > This is a weird conversation.

People are disagreeing with your refinement. The toaster example is exactly this.

Maybe what I interpreted is different from what you meant to convey, but my interpretation was certainly not unique. I'm willing to update my responses if you are willing to clarify, but we'll need to work together on that. Unfortunately, just because the words make perfect sense to you doesn't mean they do to others.

I'll even argue that this is part of why understanding matters. Or at least what we call understanding.

pennaMan

4 months ago

So your definition of "understand" is "able to develop the QC test (or explain tests already developed)".

I hate to break it to you, but LLMs can already do all 3 tasks you outlined.

It can be argued for all 3 actors in this example (the QC operator, the PhD chemist and the LLM) that they don't really "understand" anything and are iterating on pre-learned patterns in order to complete the tasks.

Even the ground-breaking chemist researcher developing a new test can be reduced to iterating on the memorized fundamentals of chemistry using a lot of compute (of the meat kind).

The mythical Understanding is just a form of "no true Scotsman".

godelski

4 months ago

  > that does not have a technical meaning

I don't think the definition is very refined, but I think we should be careful to differentiate that from useless or meaningless. I would say most definitions are accurate, but not precise.

It's a hard problem, but we are making progress on it. We will probably get there, but it's going to end up being very nuanced, and it is already important to recognize that the word means different things in the vernacular and even across research domains. Words are overloaded, and I think we need to recognize this divergence and that we are gravely miscommunicating by assuming the definitions are obvious. I'm not sure why we don't do more to work together on this. In our field we seem to think we've got it all covered and don't need others. I don't get that.

  > In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human.

And I do not think this is accurate at all. I would not say my calculator understands math despite it being able to do it better than me. I can say the same thing about a lot of different things which we don't attribute intelligence to. I'm sorry, but the logic doesn't hold.

Okay, you might take an out by saying the calculator can't do abstract math like I can, right? Well we're going to run into that same problem. You can't test your way out of it. We've known this in hard sciences like physics for centuries. It's why physicists do much more than just experiments.

There's the classic story of Freeman Dyson speaking to Fermi, which is why so many know about the 4-parameter elephant[0], but the same thing repeats throughout the history of physics. Guess what? Dyson's calculations worked. They fit Fermi's data. They were accurate and made accurate predictions! Yet they were not correct. People didn't reject Galileo just because of the Church; there were serious problems with his work too. Geocentrism made accurate predictions, including ones that Galileo's version of heliocentrism couldn't. These historical misunderstandings are quite common, including how the average person understands Schrödinger's Cat. The cat isn't in a parallel universe of both dead and alive lol. It's just that we, outside the box, can't determine which. Oh, and information is lossy: there are non-injective functions, so the universe could still be deterministic yet we wouldn't be able to determine that (and this is where my name comes into play).
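To make the "fits the data but isn't correct" point concrete, here is a toy sketch that is not from the original comment: the "true law" (a straight line), the four measurements, and their errors are all made up for illustration. A cubic has four free parameters, so it can pass exactly through four points; a two-parameter line cannot, yet the line extrapolates far better.

```python
# Toy illustration of "with four parameters I can fit an elephant":
# all data below are made up for illustration.

def true_law(x):
    """Hypothetical 'real' physics: a straight line."""
    return 2.0 * x

# Four measurements with small, fixed (invented) errors.
xs = [0.0, 1.0, 2.0, 3.0]
errors = [0.1, -0.2, 0.15, -0.05]
ys = [true_law(x) + e for x, e in zip(xs, errors)]


def cubic_through(px, py):
    """Lagrange interpolating polynomial: degree 3 (4 parameters) for 4 points."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(px, py)):
            term = yi
            for j, xj in enumerate(px):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def line_fit(px, py):
    """Ordinary least-squares straight line (2 parameters)."""
    n = len(px)
    mx, my = sum(px) / n, sum(py) / n
    sxx = sum((x - mx) ** 2 for x in px)
    sxy = sum((x - mx) * (y - my) for x, y in zip(px, py))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


cubic = cubic_through(xs, ys)
line = line_fit(xs, ys)

for name, model in (("cubic", cubic), ("line", line)):
    fit_err = max(abs(model(x) - y) for x, y in zip(xs, ys))
    pred_err = abs(model(10.0) - true_law(10.0))
    print(f"{name}: worst error on data = {fit_err:.3f}, error at x=10 = {pred_err:.3f}")
```

The cubic reproduces every measurement exactly while the line misses each one slightly, but at x = 10 the cubic is off by more than 100 and the line by less than 0.1. Matching the data is not the same as having the right model.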

So idk, it seems like you're just oversimplifying as a means to sidestep the hard problem[1]. The lack of a good technical definition of understanding should tell us we need to determine one. It's obviously a hard thing to do since, well... we don't have one and people have been trying to solve it for thousands of years lol.

  > Just my opinion, but my professional opinion from thirty-plus years in AI.

Maybe I don't have as many years as you, but I do have a PhD in CS (thesis on neural networks) and a degree in physics. I think it certainly qualifies as a professional opinion. But at the end of the day it isn't our pedigree that makes us right or wrong.

[0] https://www.youtube.com/watch?v=hV41QEKiMlM

[1] I'm perfectly fine tabling a hard problem and focusing on what's more approachable right now, but that's a different thing. We may follow a similar trajectory, but I'm not going to say the path we didn't take is just an illusion. I'm not going to discourage others from trying to navigate it either. I'm just prioritizing. If they prove you right, then that's a nice feather in your cap, but I doubt it, since people have tried that definition from the get-go.

robotresearcher

4 months ago

> It's a hard problem

So people say.

I’m not sidestepping the Hard Problem. I am denying it head on. It’s not a trick or a dodge! It’s a considered stance.

I'm denying that an idea that has historically resisted crisp definition, and that the Stanford Encyclopedia of Philosophy introduces as 'protean', needs to be taken seriously as an essential missing part of AI systems, until someone can explain why.

In my view, the only value the Hard Problem has is to capture a feeling people have about intelligent systems. I contend that this feeling is an artifact of being a social ape, and it entails nothing about AI.

pastel8739

4 months ago

Regardless of whether you think understanding is important, it’s clear from this thread that a lot of people find understanding valuable. In order to trust an AI with decisions that affect people, people will want to believe that the AI “understands” the implications of its decisions, for whatever meaning of “understand” those people have in their head. So indeed I think it is important that AI researchers try to get their AIs to understand things, because it is important to the consumers that they do.

robotresearcher

4 months ago

I agree with this. I contend that as the AIs improve in performance, the designation of understanding will accrete to them. I predict there will never be a component, module, training process, or any other significant piece of an AI that is the ‘understanding’ piece that some believe is missing today.

Also, the widespread human belief that something is valuable has absolutely no entailments to me other than treating the believers with normal respect. It’s very easy to think of things that are important to billions that you believe are not true or relevant to a reality-driven life.

godelski

4 months ago

It's a sidestep if your stance doesn't address critiques.

  > needs to be taken seriously as an essential missing part of AI systems, until someone can explain why.

Ignoring critiques is not the same as a lack of them.

Zarathruster

4 months ago

While I agree with you in the main, I also take seriously the "until someone can explain why" counterpoint.

Though I agree with you that your calculator doesn't understand math, one might reasonably ask, "why should we care?" And yeah, if it's just a calculator, maybe we don't care. A calculator is useful to us irrespective of understanding.

If we're to persuade anyone (if we are indeed right), we'll need to articulate a case for why understanding matters, with respect to AI. I think everyone gets this on an instinctual level- it wasn't long ago that LLMs suggested we add rocks to our salads to make them more crunchy. As long as these problems can be overcome by throwing more data and compute at them, people will remain incurious about the Understanding Problem. We need to make a rigorous case, probably with a good working alternative, and I haven't seen much action here.

godelski

4 months ago

  > "why should we care?"

I'm not the one claiming that a calculator thinks. The burden of proof lies on those that do. Claims require evidence and extraordinary claims require extraordinary evidence.

I don't think anyone is saying that the calculator isn't a useful tool. But certainly we should push back when people are claiming it understands math and can replace all mathematicians.

  > If we're to persuade anyone, we'll need to articulate a case for why understanding matters

This is a more than fair point. Though I have not found it to be convincing when I've tried.

I'll say that a major reason I went into physics in the first place is that I found a deep understanding was a far more efficient way of learning how to do things. I started as an engineer and even went into engineering after my degree. Physics made me a better engineer, and I think a better engineer than had I stayed in engineering. Understanding gave me the ability to not just take building blocks and put them together, but to innovate. Being able to see things at a deeper level allowed me to come to solutions I otherwise could not have. Using math to describe things allowed me to iterate faster (just like how we use simulations). Understanding what the math meant allowed me to solve the problems where the equations no longer applied. It allowed me to know where the equations no longer applied. It told me how to find and derive new ones.

I often found that engineers took an approach of physical testing first, because "the math only gets you so far." But that was just a misunderstanding of how far their math took them. It could do more, just they hadn't been taught that. So maybe I had to take a few days working things out on pen and paper, but that was a cheaper and more robust solution than using the same time to test and iterate.

Understanding is a superpower. Problems can be solved without understanding. A mechanic can fix an engine without knowing how it works. But they will certainly be able to fix more problems if they do. The reason to understand is because we want things to work. The problem is, the world isn't so simple that every problem is the same or very similar to another. A calculator is a great tool. It'll solve calculations all day. Much faster than me, with higher accuracy, but it'll never come up with an equation on its own. That isn't to call it useless, but I need to know this if I want to get things done. The more I understand what my calculator can and can't do, the better I can use that tool.

Understanding things, and the pursuit to understand more is what has brought humans to where they are today. I do not understand why this is even such a point of contention. Maybe the pursuit of physics didn't build a computer, but it is without a doubt what laid the foundation. We never could have done this had we not thought to understand lightning. We would have never been able to tame it like we have. Understanding allows us to experiment with what we cannot touch. It does not mean a complete understanding nor does it mean perfection, but it is more than just knowledge.

Zarathruster

4 months ago

Super late to this, sorry.

> I'm not the one claiming that a calculator thinks. The burden of proof lies on those that do. Claims require evidence and extraordinary claims require extraordinary evidence.

You're right, I may have misconstrued the original claim. I took the parent to be saying something like "calculators understand math, but also, understanding isn't particularly important with respect to AI" but I may have gotten some wires crossed. This isn't the old argument about submarines that swim, I don't think.

> Understanding is a superpower.

Thanks, this is all well-put.

robotresearcher

4 months ago

Critiques should come with some argument if they want to be taken seriously.

If I say it’s not real intelligence because the box isn’t blue, how much does anyone owe that critique? How about if a billion people say that blueness is the essence missing from AIs?

Tell me why blue matters and we have a conversation.

nebula8804

4 months ago

Only problem is this time enough money is being burned that if AGI does not come, it will probably be extremely painful/fatal for a lot of people that had nothing to do with this field or the decisions being made. What will be the consequences if that comes to pass? So many lives were permanently ruined due to the GFC.

JKCalhoun

4 months ago

I'm not sure. There's a view that, as I understand it, suggests that language is intelligence. That language is a requirement for understanding.

An example might be kind of the contrary: that you might not be able to hold an idea in your head until it has been named. For myself, until I heard the word gestalt (maybe a fitting example?) I am not sure I could have understood the concept. But as it was described it started to coalesce, and then when it was named, it became real. (If that makes sense.)

FWIW, Zeitgeist is another one of those concepts/words for me. I guess I have to thank the German language.

Perhaps it is why other animals on this planet seem to us lacking intelligence. Perhaps it is their lack of complex language holding their minds back.

godelski

4 months ago

  > There's a view that suggests that language is intelligence.

I think you find the limits when you dig in. What are you calling language? Can you really say that Eliza doesn't meet your criteria? What about a more advanced version? I mean we've been passing the Turing Test for decades now.

  > That language is a requirement for understanding.

But this contradicts your earlier statement. If language is a requirement then it must precede intelligence, right?

I think you must then revisit your definition of language and ensure that it applies to all the creatures that you consider intelligent. At least by doing this you'll make some falsifiable claims and can make progress. I think an ant is intelligent, but I also think ants do things far more sophisticated than the average person thinks. It's an easy trap, not knowing what you don't know. But if we do the above, we get a path to aid discovery, right?

  > that you might not be able to hold an idea in your head until it has been named

Are you familiar with Anendophasia?

It is the condition where a person does not have an internal monologue. They think without words. The definition of language is still flexible enough that you can probably still call that language, just like in your example, but it shows a lack of precision in the definition, even if it is accurate.

  > Perhaps it is why other animals on this planet seem to us lacking intelligence

One thing to also consider is if language is necessary for societies or intelligence. Can we decouple the two? I'm not aware of any great examples, although octopi and many other cephalopods are fairly asocial creatures. Yet they are considered highly intelligent due to their adaptive and creative nature.

Perhaps language is a necessary condition for advanced intelligence, but not for intelligence itself. Perhaps it is communication and societies, as distinct from an internalized language. Certainly the social group can play a role here, as coalitions can do more than the sum of the individuals (by definition). But the big question is whether these things are necessary. Getting the correct causal graph, removing the confounding variables, is no easy task. But I think we should still try and explore differing ideas. While I don't think you're right, I'll encourage you to pursue your path if you encourage me to pursue mine. We can compete, but it should be friendly, as our competition forces us to see flaws in our models. Maybe the social element isn't a necessary condition, but I have no doubt that it is a beneficial tool. I'm more frustrated by those wanting to call the problem solved. It obviously isn't, given how difficult it has been to get generalization and consensus among experts (across fields).

the_gipsy

4 months ago

> It is the condition where a person does not have an internal monologue.

These people are just nutjobs that misinterpreted what internal monologue means, and have trouble doing basic introspection.

I know there are a myriad of similar conditions, aphantasia, synaesthesia, etc. But someone without internal monologue simply could not function in our society, or at least not pass as someone without obvious mental diminishment.

If there really were some other, hidden code in the mind, that could express "thoughts" in the same depth as language does - then please show it already. At least the tiniest bit of a hint.

godelski

4 months ago

I know some of these people. We've had deep conversations about what is going on in our thought processes. Their description significantly differs from mine.

These people are common enough that you likely know some. It's just not a topic that frequently comes up.

It is also a spectrum, not a binary thing (though full anendophasia does exist, it is just on the extreme end). I think your own experiences should allow you to doubt your claim. For example, I know when I get really into a fiction book I'm reading that I transition from a point where I'm reading the words in my head to seeing the scenes more like a movie, or more accurately like a dream. I talk to myself in my head a lot, but I can also think without words. I do this a lot when I'm thinking about more physical things, like when I'm machining something, building things, or even loading the dishwasher. So it is hard for me to believe that, while I primarily use an internal monologue, there aren't people who primarily use a different strategy.

On top of that, well, I'm pretty certain my cat doesn't meow in her head. I'm not certain she has a language at all. So why would it be surprising that this condition exists? You'd have to assume there was a switch in human evolution, where it happened all at once or everyone else went extinct. I find that less likely than the idea that we just don't talk enough with our friends about how we think.

Certainly there are times where you think without a voice in your head. If not, well you're on the extreme other end. After all, we aren't clones. People are different, even if there's a lot of similarities.

lovecg

4 months ago

I’m like that more often than not. Words and language always seemed like a “translation layer” to express myself to other people, not something essential that needs to happen in my head. Especially when thinking deeply about some technical problem there’s no language involved, just abstract shapes and seeing things “in my mind’s eye”.

We might just be rehashing that silly internet meme about “shape rotators”, but there could be a correlation here where people whose minds work this way are more dismissive of LLMs.

the_gipsy

4 months ago

I suggest you revisit the subject with your friends, with two key points:

1. Make it clear to them that with "internal monologue" you do not mean an actual audible hallucination

2. Ask them if they EVER have imagined themselves or others saying or asking anything

If they do, which they 100% will unless they lie, then you have ruled out "does not have an internal monologue", and the claim is now "does not use their internal monologue as much". You can keep probing them on what exactly that means, but it gets wishy-washy.

Someone who truly does not have an internal dialogue could not do the most basic daily tasks. Such a person could grab a cookie from the table when they feel like it (oh, :cookie-emoji:!), but they could not put on their shoes, grab their wallet and keys, look in the mirror to adjust their hair, and go to the supermarket to buy cookies. If there were another hidden code that could express the whole mental state packed into "buy cookies", by now we would at least have an idea that it exists underneath. We must also ask: why would we constantly translate this into language if the mental state is already there? Translation costs processing power and slows things down. So why are these "no internal monologue" people not geniuses?

I have no doubt that there is a spectrum, on that I agree with you. But the spectrum is "how present is (or how aware is the person of-) the internal monologue". E.g. some people have ADHD, others never get anxiety at all. "No internal monologue" is not one end of the spectrum for functioning adults.

The cat actually proves my point. A cat can sit for a long time in front of a mouse hole, or it can hide to jumpscare its brother cat, and so on. So to a very small degree there is something that lets it process ("understand") very basic, near-future events and action-reactions. However, a cat could not possibly go to the supermarket to buy food, setting aside anatomical obstacles, because it has no language and therefore cannot make a complex mental model. Fun fact: whenever animals (apes, birds) have been taught language, they never ask questions (some claim they did, but if you dig in you'll see that the interpretation is extremely dubious).

godelski

4 months ago

  > 1. Make it clear to them that with "internal monologue" you do not mean an actual audible hallucination

What do you mean? I hear my voice in my head. I can differentiate this from a voice outside my head, but yes, I do "hear" it.

And yes, this has been discussed in depth. It was like literally the first thing...

But no, they do not have conversations in their heads like I do. They do not use words as their medium. I have no doubt that their experience is different from mine.

  > 2. Ask them if they EVER have imagined themselves or others saying or asking anything

This is an orthogonal point. Yes, they have imagined normal interactions. But frequently those imaginary conversations do not use words.

  > The cat actually proves my point.

Idk man, I think you should get a pet. My cat communicates with me all the time. But she has no language.

  > Fun fact: whenever animals (apes, birds) have been taught language, they never ask questions (some claim they did, but if you dig in you'll see that the interpretation is extremely dubious).

To be clear, I'm not saying my cat's intelligence is anywhere near ours. She can do tricks and is "smart for a cat" but I'm not even convinced she's as intelligent as the various wild corvids I feed.

the_gipsy

4 months ago

It's pretty self explanatory: there's actual voice heard with your ears, there's the internal monologue, and then there's a hallucination.

> Yes, they have imagined normal interactions. But frequently those imaginary conversations do not use words.

And you did not dig in deeper? How exactly do you imagine a conversation without words?

godelski

4 months ago

  > there's actual voice heard with your ears, there's the internal monologue, and then there's a hallucination.

This needs no explaining. I think I sufficiently made it clear that we agree with these distinctions.

  >> I hear my voice in my head. I can differentiate this from a voice outside my head, but yes, I do "hear" it.

Though to be more precise I would say that a hallucination appears to come from outside the head, even if you are aware that it is coming from inside. Still, clearly distinct from an internal monologue, which is always clearly internal.

  > And you did not dig in deeper?

  >>>> I know some of these people. ***We've had deep conversations about what is going on in our thought processes.***

Yes. Multiple hours long conversations. One of these people I know now studies psychology. I research intelligence and minds from an artificial standpoint and they from a biological. Yeah, we have gotten pretty deep and have the skills and language to do so far more than the average person.

I think you need to consider that you may just be wrong. You are trying very hard to defend your belief, but why? The strength of our beliefs should be proportional to the evidence that supports them. I am not trying to say that your logic is bad, let's make that clear. But I think your logic doesn't account for additional data. If you weren't previously aware of this data, how could you expect the logic to reach the correct conclusion? I want to make this clear because I want to distinguish correctness from intelligence (actually relevant to the conversation this stemmed from). You can be wrong without being dumb, but you can also be right and dumb. I think on this particular issue you fall into the former, not the latter. I respect that you are defending your opinion and beliefs, but that changes when you start rejecting data. Your argument now rests on the data being incorrect, right? Because that's the point. Either the data is wrong or your model is wrong (and let's be clear that a model is derived through logic to explain data).

I want to remind you that this idea is testable too. I told you this because it is a way to convince yourself and update the data you have available to you. You can train yourself to do this in some cases. Not all, and obviously it won't be an identical experience to these people, but you can get yourself to use less language when thinking through problems. You had also mentioned that people with aphantasia couldn't function, but think about that too. These topics are actually quite related; considering how we've discussed anendophasia, you should be able to reason that these people are quite likely to have low aphantasia. Notice I said low, as this is a spectrum. You can train the images in your mind to be stronger too. The fact that some images are stronger than others should lead you to believe that this is a spectrum and that people likely operate at different base levels. It should also lead you to reason that this is likely trainable in an average person. The same goes for anendophasia. Don't make this binary; consider it a spectrum. That's how the scientific literature describes the topic too. But if you pigeonhole it as binary and only true in the extreme cases, then your model isn't flexible enough, because it isn't accounting for the variance between people.

Go talk with your friends. Get detailed. When you imagine an apple in your head, how much do you see? Ask the person whether their process involves words or is purely imagery. If words, how many? Is it a red apple? Green? Yellow? Can they smell it? Can they taste it? What does it smell and taste like? I will bet you every single person you talk to will answer these differently. I will even wager that each time you do the exercise you yourself will answer differently, even if the variance is much smaller. But that's data, and your model needs to be able to explain that data too. While I think you have the right thought process, I don't think you are accounting for this variance; instead you are treating it as noise. But noise can be parameterized and modeled too. Noise is just the statistical description of uncertainty.

the_gipsy

4 months ago

Let me be clear: yes, I know I might be wrong. I hope I'm not dumb and wrong, or at least not dumb. I am also not writing here as some kind of debate exercise. I do because I find this topic extremely interesting and insightful. What if language is the intelligence? What if "guessing the next word" really was all that was there, to peak human intelligence, knowledge, and understanding of our world? I am not hyped by AI, it's rather that I find this possibility somewhat sad.

I've made up a model, an idea, and I don't think the data opposing it is trustworthy. My first problem is that many people claim to have NO internal monologue, which means NEVER constructing a sentence from themselves or others in their head (except directly as verbal speech), and this seems outright impossible. When pressed, these people usually either admit that they do have some monologue, just "much less", or it turns out they mistook it for something similar to schizophrenia, actual hallucinations. If they don't admit to actually having it sometimes, then they fail to explain where exactly the line is between "thinking of someone or themselves saying something" and the internal monologue/dialogue. As if they had been caught lying by the detective, they end the conversation. Or at least that's how I feel. I really don't know how to ask more questions here without making them feel too interrogated, or like someone who self-diagnosed and is being told they are imagining things.

With the "absolutely none" group out of the way, that leaves people who claim to perceive the internal monologue very scarcely, and claim that they do not need to "think" or "do". How can we possibly test this scientifically? The data is all self-reported. Or at least I don't know whether this can be, or has been, researched neurologically.

Consider also that all self-reported data about internal monologue is "poisoned": we are trying to get objective data with the data itself as a vehicle. We are not asking if someone feels pain, or if they can solve a puzzle in a timeframe. We cannot measure electric activity with some instruments, nor evaluate yes-or-no questions.

What if it is true that some people do not perceive their internal monologue? I certainly don't remember it "popping" into my head at a certain age, and I don't think anybody does. When we learn language, we become conscious with it, because it allows us to model the world, beyond putting things in our mouth and screaming. So it could be that not everybody perceives it equally, a spectrum like you said, and that some people rationalize it retroactively as not being there, just "thoughts", ideas, feelings. We reconstruct past events via a narration, filling in details by guessing, so why wouldn't some people guess that they are not narrating in their head? It is not something that is taught in school or by our parents; you either perceive it as "internal monologue" or as "just thinking", because, well, it's the thinking doing its thing.

llamajams

4 months ago

Somewhat out of my league in this thread, but I think I am one of these people. I do remember a time before I had an internal monologue. In fact, I remember the day in elementary school when I learned, after my teacher explained it to me, that everyone else was "talking to themselves in their head". I think I spent the next month or so obsessing over this newfound ability. But before that day I was perfectly capable of thought, conversation, and writing. Even now I can "switch modes" and have coherent thoughts occur with no labeling or accompanying narrative. I can distinctly identify concepts and the transitions between them, but there are no words involved until I open my mouth. So I don't know if it was just a hidden background process before that day. But it definitely "feels" different when it's in the foreground, in the background, or not there.

godelski

4 months ago

  > What if language is the intelligence?

Almost certainly not. There does not seem to be a strong correlation between the two. We have a lot of different measures for intelligence when it comes to animals. We can place them across a (multidimensional) spectrum and humans seem unique with language. It also appears that teaching animals language does not cause them to rapidly change on these metrics despite generations of language capabilities.

  > What if "guessing the next word" really was all that was there, to peak human intelligence, knowledge, and understanding of our world?

I believe this is falsifiable. As best I understand it, this is a belief in the relationship: predict next word <--> understanding. Yet we know that neither direction holds. I'll state some trivial cases for brevity[0], but I have no doubt you can come up with more complicated ones and even find real examples.

-> I can make accurate predictions about coin flips without any understanding of physics or how the coin is being flipped. All I need is to be lucky. Or take any number of mechanical objects, like a clock, which predicts the time.

Or a horse can appear to do math if I tell it how many times to stomp its foot. It makes accurate predictions, yet it certainly has no understanding.

Ehh, I'll give you a more real example. Here's a model[1] that gives accurate turn-by-turn taxi directions, yet when the authors extract its world model they find that it is not only inaccurate but significantly divergent. Vafa has a few papers on the topic; I suggest reading his work.

<- You can understand all the physics of a double pendulum and still be unable to predict its motion arbitrarily far into the future if you do not also know the initial conditions exactly. This is going to be true for any chaotic system.
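
To make the initial-conditions point concrete without simulating a full double pendulum, here is a minimal Python sketch that swaps in the logistic map x -> 4x(1 - x), a standard textbook chaotic system (it is not a pendulum model, and the function name is just for illustration). The update rule is known exactly, yet two starting points that differ by 1e-10 become effectively uncorrelated within a few dozen steps.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1 - x); r = 4.0 is in the chaotic regime."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Identical "physics" (update rule), initial conditions differing by 1e-10.
a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-10, 60)

# The gap roughly doubles each step until it is as large as the values themselves.
for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, abs(a[step] - b[step]))
```

Knowing the rule perfectly does not help; only knowing x0 to ever more digits pushes the prediction horizon out, and only slowly.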

I said we've seen this in the history of science. {Geo,Helio}centrism is a great example. Scientists who had no affiliation with the church still opposed Galileo because his model wasn't making accurate predictions for certain things. Yet the heliocentric model is clearly a better understanding and more accurate as a whole. If you want to dive deeper into this topic I'd highly recommend both the podcast "An Opinionated History of Math" and the book "Representing and Intervening" by Ian Hacking. They're both very approachable. FWIW, metaphysics talks about this quite a lot.

  > My first problem is that there are many people that claim that they have NO internal monologue

So again, I cannot stress enough that we should not represent this as a binary setting. The binary cases are the extremes (in both directions), meaning very few people experience them.

The problem here is one of language and semantics, not effect. I completely believe that someone will say "I have no internal monologue" if >90% of their thinking happens without one. Just like how a guy who's 5'11.75" will call himself 6'. Is he a liar? I wouldn't say so; he's >99% accurate. Would you say so for someone who's 5'11"? That's probably more context dependent.

So you distrust the data. That's fine. Let's assume it's poisoned. We should anyway, since noise is an important part of any modeling[2]. It is standard practice...

So instead, do you distrust that there's a distribution in how much individuals use an internal monologue? Or do you presume everyone uses it the same amount?

I'd find it hard to believe you distrust the spectrum. But if you trust the spectrum, then where is the threshold for your claim? 0%? That's really not a useful conversation, even if the distribution is heavy-tailed.

You are hyper-fixated on the edge case but its result isn't actually consequential to your model. The distribution is! You'll have to consider your claims much more carefully when you consider a distribution. You need to then claim a threshold, in both directions. Or if you make the claim that we're all the same (I'd find that quite surprising tbh, especially given the nature of linguistics), you need to explain that too and your expected distribution that would claim that (narrow).

All I can tell you is that my friend and I have had this conversation multiple times over many years and it seems very constant to me. I have no reason to believe they are lying and if they are they are doing so with an extreme level of consistency, which would be quite out of the norm.

[0] Arguing the relationship still requires addressing trivial relationships.

[1] https://arxiv.org/abs/2406.03689

[2] Even if there are no liars (or "lizardmen"[3]) we still have to account for miscommunication and misunderstandings.

[3] https://en.wiktionary.org/wiki/Lizardman%27s_Constant

the_gipsy

4 months ago

> We have a lot of different measures for intelligence when it comes to animals.

But there is a vast difference between animal intelligence and human intelligence.

> predict next word <--> understanding

Yes, and I could say a stone understands the world because its state reflects the world: it gets hot, cold, wet, dry, irradiated, whatever. Perhaps its internal state can even predict the world: if it's rolling downhill, it can predict that it will stop soon. But the stone is not conscious like a human, and neither is a clock nor a horse that can count to ten. The stone is obviously a reductio ad absurdum; a horse can actually "guess" to some degree, but nothing like a human. It cannot ask a question, and it cannot answer a question it asks itself.

> I cannot stress that we should not represent this as a binary setting.

That was kind of my point, to eliminate the binary "no", leaving us with a spectrum.

My initial claim, "these are just nutjobs" (my apologies for the phrasing), was addressing this: there are no people "without internal monologue AT ALL".

Since we seem to actually agree on this point, our difference is that I believe that the people with "little internal monologue" are simply not aware of it.

Let me phrase it this way: if language is the understanding, then the internal monologue is not some quirky side effect. To understand something at the human level, we need to describe it with language; the rest is primitive instinct and "feelings".

We can model the past and the future. We can model ourselves in 10 years. And what is one of the most important things we would model? What we would say or think then, thinking being "saying something silently in our head". Not really just feelings: "I would love my partner", sure, but why? "Because . . .".

When we are utilizing language, the internal monologue, to construct the model, we cannot be "aware of it" constantly. That is, the bandwidth is taken up by the tasks at hand; it would be detrimental if every other phrase were followed by "btw, did I notice that I just understood this via a string of words?". The more complex the actions or ideas we process, the less aware we are that we are using language for them. That is "being in the flow". We can reconstruct it when done, and here, if there is a lack of awareness of the internal monologue, it will be rationalized as something else.

> Or if you make the claim that we're all the same (I'd find that quite surprising tbh, especially given the nature of linguistics), you need to explain that too and your expected distribution that would claim that (narrow).

My explanation (without proof), is that it's just a matter of awareness.

> All I can tell you is that my friend and I have had this conversation multiple times over many years and it seems very constant to me. I have no reason to believe they are lying and if they are they are doing so with an extreme level of consistency, which would be quite out of the norm.

Can you think of some kind of test question (or string of questions) that could prove either? I have obviously been thinking about it, but I can't come up with any way to empirically test whether there is or isn't an internal monologue. Consistency could simply mean that their rationalization is consistent.

I'll leave you with this article, which I found quite interesting (https://news.ycombinator.com/item?id=43685072). The person lost language, lost what we could consider human-level consciousness at the same time, and then recovered both at the same rate. Of course, there was brain damage, so it's not an empirical conclusion.

Also this book (https://en.wikipedia.org/wiki/The_Origin_of_Consciousness_in...), while partially debunked and pop-sci to begin with, has wildly interesting insights into the internal monologue and at least raises extremely interesting questions.

Mikhail_Edoshin

4 months ago

There is a book written by a woman who suffered a stroke. She lost the ability to speak and understand language, yet she remained conscious. It took her ten years to fully recover. The book is called "My Stroke of Insight".

the_gipsy

4 months ago

Conscious, like an animal or a baby. She could not function at all like a normal adult. Proves my point.

naasking

4 months ago

> It's to recite (or even apply) knowledge. To understand does actually require a world model.

This is a shell game, or a god of the gaps. All you're saying is that the models "understand" how to recite or apply knowledge or language, but somehow don't understand knowledge or language. Well what else is there really?

godelski

4 months ago

  > Well what else is there really?

Differentiate from memorization.

I'd say there's a difference between a database and understanding. If they're the same, well I think Google created AGI a long time ago.

naasking

hodgehog11

4 months ago

As someone who is invested in researching said math, I can say with some confidence that it does not exist, or at least not in the form claimed here. That's the whole problem.

I would be ecstatic if it did though, so if anyone has any examples or rebuttal, I would very much appreciate it.

nsagent

4 months ago

You're right that there is no purely mathematical argument; it's almost nonsensical to claim such.

Instead, you can simply make the rather reasonable observation that LLMs are a product of their training distribution, which only contains partial coverage of all possible observable states of the world. Some highly regular observable states are thus likely missing, but an embodied agent (like a human) would be able to correctly reason about these states due to other sensory modes that observe them. It's very clear that the number of possible states far exceeds any text describing those states (the number of states transcribed to text is vanishingly small in comparison, but they are locally coherent and quite regular when observed through other sensory modes like vision). That's the closest you can get to a mathematical argument.

As a result one common failure case for an LLM might involve describing a specific physical environment and associated set of actions that are in an area of low (or no) probability mass in the training data and asking the LLM to reason about it. Here's an example where I asked ChatGPT the following:

  imagine i walked into a room through a doorway, then turned around and walked right back out of the room. from the perspective of a person in the room facing the door, which side would my right arm be? please explain.

It responded by incorrectly stating the following:

  Let’s break this down step by step.

  1. *You walk into the room through the doorway*:

   * As you're entering, the person in the room is facing the door. So, you're entering from the **outside**.
   * Your **right arm** will be on the **right side** of your body, just as it is normally when you're standing.

  2. *You turn around and walk right back out*:

   * When you turn around to walk out, you’ll be facing **away** from the room, looking back toward the doorway.
   * Since you're walking **back out**, the person inside the room is still facing the door, while you are walking away from them, so your **right arm** is still on the right side of your body.

  So, from the perspective of the person in the room, as you’re exiting, your *right arm* will still be on their *left side*. That’s because they’re facing the door, and your right side is to their left when you’re turning around and leaving.

  To visualize:

  * Imagine a door in front of them.
  * You walk through and turn around, so you're now facing away from them.
  * Your right arm is on your right side, but to them, it's on their left side since you’re now moving in the opposite direction.
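
For reference, the geometry can be checked mechanically. This is a minimal Python sketch under assumed coordinates (x points east, y points north, and the observer faces north toward the door); the function names are illustrative. It compares your "right-hand" direction with the observer's using a dot product: while you walk in facing the observer, your right arm is on their left, and after you turn around to walk out, it is on their right.

```python
def right_of(facing):
    """Right-hand direction for a 2D facing vector (x east, y north): rotate 90 degrees clockwise."""
    fx, fy = facing
    return (fy, -fx)

def side_seen_by_observer(observer_facing, your_facing):
    """Which of the observer's sides ('left' or 'right') your right arm is on."""
    your_right = right_of(your_facing)
    obs_right = right_of(observer_facing)
    dot = your_right[0] * obs_right[0] + your_right[1] * obs_right[1]
    if dot > 0:
        return "right"
    if dot < 0:
        return "left"
    return "neither"  # your right arm points straight toward or away from them

# Assumed layout: the observer stands in the room facing the door ("north", +y).
OBSERVER = (0, 1)
entering = (0, -1)  # walking in through the door, you face the observer
exiting = (0, 1)    # after turning around, you face the door, same as the observer

print("entering:", side_seen_by_observer(OBSERVER, entering))  # entering: left
print("exiting:", side_seen_by_observer(OBSERVER, exiting))    # exiting: right
```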

godelski

4 months ago

My claim is more that data processing is not enough. I was too vague and definitely did not convey my point accurately. I tried to clarify a bit in a sibling comment to yours, but I'm still unsure if it is sufficient tbh.

For embodiment, I think this is sufficient but not necessary. A key part of the limitation is that the agent cannot interact with its environment. This is a necessary feature for distinguishing competing explanations. I believe we are actually in agreement here, but I do think we need to be careful about how we define embodiment, because even a toaster can be considered a robot. It seems hard to determine what does not qualify as a body once we get into the nitty-gritty. But I think in general, when people talk about embodiment, they are discussing the capability to intervene.

By your elaboration I believe we agree since part of what I believe to be necessary is the ability to self-analyze (meta-cognition) to determine low density regions of its model and then to be able to seek out and rectify this (intervention). Data processing is not sufficient for either of those conditions.

Your prompt is, imo, more about world modeling, though I do think this is related. I asked Claude Sonnet 4.5 with extended thinking enabled and it also placed itself outside the room. Opus 4.1 (again with extended thinking) got the answer right. (I don't use a standard system prompt, though that is mostly to make it not sycophantic, to try to get it to ask questions when uncertain, and to enforce step-by-step thinking.)

  From the perspective of the person in the room, your right arm would be on their right side as you walk out.
  
  Here's why: When you initially walk into the room facing the person, your right arm appears on their left side (since you're facing each other). But when you turn around 180 degrees to walk back out, your back is now toward them. Your right arm stays on your right side, but from their perspective it has shifted to their right side.

  Think of it this way - when two people face each other, their right sides are on opposite sides. But when one person turns their back, both people's right sides are now on the same side.

The CoT output is a bit more interesting[0]. Disabling my system prompt gives an almost identical answer fwiw. But Sonnet got it right. I repeated the test in incognito after deleting the previous prompts and it continued to get it right, independent of my system prompt or extended thinking.

I don't think this proves a world model, though. Misses are more important than hits, just as counterexamples are more important than examples in any evidence or proof setting. But fwiw, I also frequently ask these models variations on river crossing problems and the results are very shabby. A few appear spoiled now, but they are not very robust to variation, and that, I think, is critical.

I think an interesting variation of your puzzle is as follows:

  Imagine you walked into a room through a doorway. Then you immediately turn around and walk back out of the room. 

  From the perspective of a person in the room, facing the door, which side would your right arm be? Please explain.

I think Claude (Sonnet) shows some subtle but important results in how it answers:

  Your right arm would be on their right side.
  When you turn around to walk back out, you're facing the same direction as the person in the room (both facing the door). Since you're both oriented the same way, your right side and their right side are on the same side.

This makes me suspect there's some overfitting. CoT correctly uses "I"[1].

It definitely isn't robust to red herrings[2], and I think that's a kicker here. It is similar to failure results I see in any of these puzzles. They are quite easy to break with small variations. And we do need to remember that these are models trained on the entire internet (including HN comments), so we can't presume this is a unique puzzle.

[0] http://0x0.st/K158.txt

[1] http://0x0.st/K15T.txt

[2] http://0x0.st/K15m.txt

godelski

4 months ago

Let me clarify. I was too vague and definitely did not express things accurately. That is on me.

We have the math to show that it can be impossible to distinguish two explanations through data processing alone. We have examples of this in science, a long history of it in fact. Fundamentally, there is so much that we cannot conclude from processing data alone. Science (the search for knowledge) is active. It requires not just processing existing data but searching for new data. We propose competing hypotheses that are indistinguishable given the current data and seek out the data that distinguishes them (a pain point for many of the TOEs, like String Theory). We know that data processing alone is insufficient for explanation. We know it cannot distinguish confounders. We know it cannot distinguish causal graphs (e.g. triangular maps: we are able to create them, but not distinguish them through data processing alone). The problem with scaling alone is that it makes the assertion that data processing is enough. Yet we have so much work (and history) telling us that data processing is insufficient.
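
The causal-graph point can be illustrated with a textbook linear-Gaussian example. The two models and their numbers are assumptions chosen for illustration, not taken from any particular paper. Model A has X causing Y; Model B has Y causing X, with parameters picked so both produce the same joint distribution (means 0, Var X = 1, Var Y = 2, Cov = 1). Processing passive samples cannot tell them apart, but an intervention that sets X by hand, do(X = 3), does.

```python
import random

def sample_x_causes_y(rng, n, do_x=None):
    """Model A (hypothetical): X ~ N(0, 1), then Y = X + N(0, 1)."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0) if do_x is None else do_x
        y = x + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows

def sample_y_causes_x(rng, n, do_x=None):
    """Model B (hypothetical): Y ~ N(0, 2), then X = Y/2 + N(0, 1/2)."""
    rows = []
    for _ in range(n):
        y = rng.gauss(0.0, 2.0 ** 0.5)  # gauss takes a standard deviation
        if do_x is None:
            x = y / 2 + rng.gauss(0.0, 0.5 ** 0.5)
        else:
            x = do_x  # intervening on X cuts its dependence on Y; Y is untouched
        rows.append((x, y))
    return rows

def moments(rows):
    """Return (mean X, mean Y, var X, var Y, cov XY) of the samples."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    vx = sum((x - mx) ** 2 for x, _ in rows) / n
    vy = sum((y - my) ** 2 for _, y in rows) / n
    cxy = sum((x - mx) * (y - my) for x, y in rows) / n
    return mx, my, vx, vy, cxy

rng = random.Random(0)
N = 100_000
# Observational data: both models give approximately the same five moments.
print("A observed:", [round(m, 2) for m in moments(sample_x_causes_y(rng, N))])
print("B observed:", [round(m, 2) for m in moments(sample_y_causes_x(rng, N))])
# Intervention do(X = 3): only in Model A does the mean of Y move.
print("A do(X=3): mean Y =", round(moments(sample_x_causes_y(rng, N, do_x=3.0))[1], 2))
print("B do(X=3): mean Y =", round(moments(sample_y_causes_x(rng, N, do_x=3.0))[1], 2))
```

Because a Gaussian is fully described by its means and covariances, no amount of extra observational samples separates the two models; the intervention is what generates distinguishing data.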

The scaling math itself also shows a drastic decline in returns with scale and often does not suggest convergence even with infinite data. They are power laws with positive concavity, requiring exponential increases in data and parameters for marginal improvements in test loss. I'm not claiming that we need zero test loss to reach AGI, but the results do tell us that if the two are strongly correlated, then we'll need to spend exponentially more to achieve AGI, even if we are close. By our measures, scaling is not enough unless we are sufficiently close. Even our empirical results align with this: despite many claiming that scale is all we need, we are making significant changes to model architectures and training procedures (including optimizers). We are making these large changes because throwing new data at the old models (even while simply increasing the number of parameters) does not work out. It is not just about practicality; it is the results. The scaling claim has always been a myth used to drive investment, since it is a nice simple story that says we can get there by doing what we've already been doing, just more. We all know that these new LLMs aren't dramatic improvements off their previous versions, despite being much larger, more efficient, and having processed far more data.

[side note]: We even have my namesake, Gödel, who showed that any consistent, effectively axiomatized system capable of expressing basic arithmetic contains true statements that it cannot prove. But we need not go that far, as omniscience is not a requirement for AGI. Though it is worth noting for the limits of our models, since at the core this matters. Changing our axioms changes the results, even with the same data. But science doesn't exclusively use a formal system, nor does it use a single one.

hodgehog11

4 months ago

My apologies for the much delayed reply as I have recently found myself with little extra time to post adequate responses. Your critiques are very interesting to ponder, so I thank you for posting them. I did want to respond to this one though.

I believe all of my counterarguments center around my current viewpoint that, given the rapid rate of progress on the engineering side, it is no longer reasonable in deep learning theory to consider what is possible, and it is more interesting to try to outline hard limitations. This creates a stark contrast between deep learning and classical statistics, as the boundaries in the latter are very clear and are not shared by the former.

I want to stress that at present, nearly every conjectured limitation of deep learning over the last several decades has fallen. This includes many back of the napkin, "clearly obvious" arguments, so I'm wary of them now. I think the skepticism all along has been fueled in response to hype cycles, so we must be careful not to make the same mistakes. There is far too much empirical evidence available to counter precise arguments against the claim that there is an underlying understanding within these models, so it seems we must resort to the imprecise to continue the debate.

Scaling, along one axis, suggests a high polynomial degree of additional compute (not exponential) is required for increasing improvements, this is true. But the progress over the last few years has occurred due to the discovery of new axes to scale on, which further reduces the error rate and improves performance. There are still many potential axes left untapped. What is significant about scaling to me is not how much additional compute is required, but the fact that the predicted bottom at the moment is very, very low, far lower than anything else we have ever seen, and that doesn't require any more data than we currently have. That should be cause for concern until we find a better lower bound.
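
A small Python sketch of a generic power-law scaling curve, L(C) = L_inf + a * C^(-alpha), makes the arithmetic concrete. The exponent alpha = 0.05 and the other constants are illustrative assumptions, not fitted values from any paper. Dividing the reducible loss by a factor k requires multiplying compute by k^(1/alpha): polynomial in k, but of very high degree when alpha is small.

```python
def loss(compute, a=1.0, alpha=0.05, irreducible=0.0):
    """Hypothetical scaling curve: irreducible + a * compute**(-alpha)."""
    return irreducible + a * compute ** (-alpha)

def compute_multiplier(reduction, alpha=0.05):
    """Factor by which compute must grow to divide the reducible loss by `reduction`."""
    return reduction ** (1.0 / alpha)

# Each doubling of compute shaves off a smaller absolute amount of loss...
gains = [loss(2.0 ** k) - loss(2.0 ** (k + 1)) for k in range(10)]
print([round(g, 4) for g in gains])

# ...and halving the reducible loss needs 2**20 (about a million) times the compute.
print(compute_multiplier(2.0))
```

Whether one calls that "exponential" or "high-degree polynomial" depends on whether the cost is measured against the number of halvings or against the loss ratio itself; the curve is the same either way.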

> We all know that these new LLMs aren't dramatic improvements off their previous versions

No, I don't agree. This may be evident to many, but to some, the differences are stark. Our perceived metrics of performance are nonlinear and person-dependent, and these major differences can be imperceptible to most. The vast majority of attempts at providing more regular metrics or benchmarks that are not already saturated have shown that LLM development is not slowing down by any stretch. I'm not saying that LLMs will "go to the moon". But I don't have anything concrete to say they cannot either.

4 months ago

The thing is, achieving say, 99.99999% reliable AI would be spectacularly useful even if it's a dead end from the AGI perspective.

People routinely conflate the "useful LLMs" and "AGI", likely because AGI has been so hyped up, but you don't need AGI to have useful AI.

It's like saying the Internet is a dead end because it didn't lead to telepathy. It didn't, but it sure as hell is useful.

It's beneficial to have both discussions: whether and how to achieve AGI and how to grapple with it, and how to improve the reliability, performance, and cost of LLMs for more prosaic use cases.

It's just that they are separate discussions.

Animats

4 months ago

> The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right in being skeptical here.

That's the basic success of LLMs. They don't have much of a model of the world, and they still work. "Attention is all you need". Good Old Fashioned AI was all about developing models, yet that was a dead end.

There's been some progress on representation in an unexpected area. Try Perchance's AI character chat[1]. It seems to be an ordinary chatbot, but at any point in the conversation you can ask it to generate a picture, which it does using a Stable Diffusion-type system. You can generate several pictures and pick the one you like best, then let the LLM continue the conversation from there.

It works from a character sheet, which it will create if asked. It's possible to start from an image and get to a character sheet and a story. The back and forth between the visual and textual domains seems to help.

For storytelling, such a system may need to generate the collateral materials needed for a stage or screen production: storyboards, scripts with stage directions, character summaries, artwork of sets, blocking (where everybody is positioned on stage), character sheets (poses and costumes), etc. Those are the modeling tools real productions use to keep a work created by many people on track. Those are a form of world model for storytelling.

I've been amazed at how good the results I can get from this thing are. You have to coax it a bit. It tends to stay stuck in a scene unless you push the plot forward. But give it a hint of what happens next and it will run with it.

[1] https://perchance.org/ai-character-chat

user

4 months ago

[deleted]

wkat4242

4 months ago

Absolutely. AGI isn't a matter of adding more 9s. It's a matter of solving more "???"s. And those require not just work but also a healthy serving of luck.

As I understand it, the breadth of LLMs was also something that was stumbled on kinda by accident; they were developed as translators and just turned out to be 'smarter' than expected.

Also, to understand the world you don't need language. People don't think in language. Thought is understanding. Language is knowledge transfer and expression.

Meaning for hyperlexics is more akin to finding meaning in the edges of the graph, rather than the vertices. The form of language contributing a completely separate graph of knowledge, alongside its content, creating a rich, multimodal form of understanding.

Spatial thinkers have difficulty with procedural thinking, which is how most people are taught. Rather than the series of steps to solve the problem, they see the shape of the transform. LLMs as an assistive device can be very useful for spatial thinkers in providing the translation layer between the modes of thought.

hyperliner

4 months ago

[dead]

naasking

4 months ago

Are the particles that make up thoughts in our brain not also a representation of a thought? Isn't "thought" really some kind of Platonic ideal that only has approximate material representations? If so, why couldn't some language sentences be thoughts?

4 months ago

> The landscape itself doesn't capture anything, it just is.

Sure, but the landscape is something, namely an aggregate of particles. A thought in principle isn't its physical expression but its information content, and it's represented in a human brain by some aggregate of particles. So no matter how you slice it, thoughts can only manifest within representations, and so calling language a representation of thought isn't some kind of dunk, because the same argument would prove that human brains don't have thoughts either.

It's not clear whether the information content of all possible human thoughts can be captured by language, but clearly at least some language expressions have the same information content as human thoughts.

rhetocj23

4 months ago

It's very interesting to see how many people struggle to understand this.

subjectivationx

4 months ago

We are paying the price now for not teaching language philosophy as a core educational requirement.

Most people have had no exposure to even the most basic ideas of language philosophy.

The idea that all these people go to school for years and never have to take even a one-semester class on the main philosophical ideas of the 20th century is insane.

CamperBob2

4 months ago

Language philosophy is not relevant, and evidently never was. It predicted none of what we're seeing and facilitated even less.

One must imagine Sisyphus happy and Chomsky incoherent with rage.

CamperBob2

4 months ago

If it were that simple, LLMs wouldn't work at all.

qlm

4 months ago

I think it explains quite well why LLMs are useful in some ways but stupid in many other ways.

CamperBob2

4 months ago

LLMs clearly think. They don't have a sense of object permanence, at least not yet, but they absolutely, indisputably use pretrained information to learn and reason about the transient context they're working with at the moment.

Otherwise they couldn't solve math problems that aren't simple rephrasings of problems they were trained on, and they obviously can do that. If you give a multi-step undergraduate level math problem to the human operator of a Chinese room, he won't get very far, while an LLM can.

So that leads to the question: given that they were trained on nothing but language, and given that they can reason to some extent, where did that ability come from if it didn't emerge from latent structure in the training material itself? Language plus processing is sufficient to produce genuine intelligence, or at least something indistinguishable from it. I don't know about you, but I didn't see that coming.

bigstrat2003

4 months ago

They very clearly do not think. If they did, they wouldn't be able to be fooled by so many simple tests that even a very small (and thus, uneducated) human would pass.

CamperBob2

4 months ago

Are you really claiming that something doesn't think if it's possible to fool it with simple tricks?

Seriously?

4 months ago

[deleted]

theptip

4 months ago

What I mean here is that this is certainly not what Dwarkesh would claim. It’s a ludicrous strawman position.

Dwarkesh is AGI-pilled and would base his assumption of a world model on much more impressive feats than mere language understanding.

baobun

4 months ago

Watching the video, it seems that Dwarkesh doesn't really have a clue what he's confidently talking about, yet runs fast with his own half-baked ideas, to the point where it gets both confusing and cringe-worthy when Karpathy apparently manages to make sense of it and yes-ands the word salad. Karpathy is supposedly there to clear up misunderstandings, yet lets all the nonsense Dwarkesh puts before him slide.

"ludicrous" sure but I wouldn't be so certain about "strawman" or that Dwarkesh has a consistent view.

exe34

4 months ago

To me, it's a matter of a very big checklist: you can keep adding tasks to the list, but if it keeps marching onwards checking things off, some day it will get there. Whether it's a linear or an asymptotic march, only time will tell.

ekjhgkejhgk

4 months ago

I don't know if you will get there, that's far from clear at this stage.

harrall

4 months ago

I don’t have a deep understanding of LLMs, but don’t they fundamentally work on tokens and generate a multi-dimensional statistical relationship map between tokens?

So it doesn’t have to be an LLM. You could theoretically have image tokens (though I don’t know about practice; the important part is the statistical map).

And it’s not like my brain doesn’t work like that either. When I say a funny joke in response to people in a group, I can clearly observe my brain pull together related “tokens” (Mary just talked about X, X is related to Y, Y is relevant to Bob), filter them, sort them and then spit out a joke. And that happens in like less than a second.
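
As a deliberately crude illustration of a "statistical relationship map between tokens", here is a Python sketch of a bigram model that counts which token follows which and predicts the most frequent successor. This is a toy under loud assumptions: real LLMs learn dense vector representations with attention over long contexts rather than a lookup table of adjacent pairs, but the predict-the-next-token framing is the same.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every token, how often each other token immediately follows it."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table

def predict_next(table, token):
    """Return the most frequent successor of `token`, or None if it was never followed."""
    if token not in table:
        return None
    return table[token].most_common(1)[0][0]

corpus = "the cat sat on the mat . the cat ran .".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # cat
print(predict_next(model, "dog"))  # None
```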

tacitusarc

4 months ago

Yes! Absolutely. And this is likely what would be necessary for anything approaching actual AGI. And not just visual input, but all kinds of sensory input. The problem is that we have no ability, not even close, to process that even near the level of a human yet, much less some super genius being.

sysguest

4 months ago

yeah that "model of the world" would mean:

babies are already born with "the model of the world"

but a lot of experiments on babies/young kids tell otherwise

ekjhgkejhgk

4 months ago

> yeah that "model of the world" would mean: babies are already born with "the model of the world"

No, not necessarily. Babies don't interact with the world only by reading what people wrote on Wikipedia and Stack Overflow, like these models are trained. Babies do things to the world and observe what happens.

I imagine it's similar to the difference between a person sitting on a bicycle and trying to ride it, vs a person watching videos of people riding bicycles.

I think it would actually be a great experiment. If you take a person that never rode a bicycle in their life and feed them videos of people riding bicycles, and literature about bikes, fiction and non-fiction, at some point I'm sure they'll be able to talk about it like they have huge experience in riding bikes, but won't be able to ride one.

aerhardt

4 months ago

We’ve been thinking about reaching the singularity from one end, by making computers like humans, but too little thought has been given to approaching the problem from the other end: by making babies build their world model by reading Stack Overflow.

pavlov

4 months ago

The “Brave New World meets OpenAI” model where bottle-born babies listen to Stack Overflow 24 hours a day until they one day graduate to Alphas who get to spend Worldcoin on AI-generated feelies.

zelphirkalt

4 months ago

That's it. Now you've done it! I will have Stack Overflow Q&A, as well as moderator comments and question closings, playing 24/7 to my first, not-yet-born child! Q&A for the knowledge and the mod comments for good behavior, of course. This will lead to the singularity in no time!

godelski

4 months ago

It's a lot more complicated than that.

You have instincts, right? Innate fears? This is definitely something passed down through genetics. The Hawk/Goose Effect isn't limited to baby chickens. Certainly some mental encoding is passed down through genetics, given how much the brain controls, down to your breathing and heartbeat.

But instinct is basic. It's something humans are even able to override. It's a first-order approximation: too inaccurate for meaningfully complex things, but sufficient to keep you alive. Maybe we don't want to call instinct a world model (it's certainly naïve), but it can't be discounted either.

In human development, yeah, the lion's share of it happens after birth. Human babies don't really show typical signs of consciousness until around the age of 2. There are many different categories of "awareness" and these certainly grow over time. But the big thing that makes humans so intelligent is that we continue to grow and learn through our whole lifetimes. And we can pass that information along without genetics, using very advanced tools to do it.

It is a combination of nature and nurture. But do note that this happens differently in different animals. It's wonderfully complex. LLMs are quite incredible but so too are many other non-thinking machines. I don't think we should throw them out, but we never needed to make the jump to intelligence. Certainly not so quickly. I mean what did Carl Sagan say?

imtringued

4 months ago

One of the biggest mysteries of humans vs. LLMs is that LLMs need an absurd amount of data during pretraining, then a little bit of data during fine-tuning to make them behave more human. Meanwhile humans don't need any data at all, but have the blind spot that they can only know and learn about what they have observed. This raises two questions. First, what is the equivalent of the loss function in supervised learning? Supposedly neurons do predictive coding: they predict what their neighbours are doing. That includes input-only neurons like touch, pain, vision, sound, taste, etc. But the observations never contain actions. E.g. you can look at another human, but that will never teach you how to walk, because your legs are different from other people's legs.

How do humans avoid starving to death? How do they avoid leaving no children? How do they avoid eating food that will kill them?

These things require a complicated chain of actions. You need to find food, a partner and you need to spit out poison.

This means you need a reinforcement learning analogue, but what is going to be the reward function equivalent? The reward function can't be created by the brain, because it would be circular. It would be like giving yourself a high, without even needing drugs. Hence, the reward signal must remain inside the body but outside the brain, where the brain can't hack it.

The first and most important reward is to perform reproduction. If food and partners are abundant, the ones that don't reproduce simply die out. This means that reward functions that don't reward reproduction disappear.

Reproduction is costly in terms of energy. Do it too many times and you need to recover and eat. Hunger evolved as a result of the brain needing to know about the energy state of the body. It overrides reproductive instincts.

Now let's say you have a poisonous plant that gives you diarrhea, but you are hungry. What stops you from eating it? Pain evolves as a response to a damaged body. Harmful activities signal themselves in the form of pain to the brain. Pain overrides hunger. However, what if the plant is so deadly that it will kill you? The pain sensors wouldn't be fast enough. You need to sense the poison before it enters your body. So the tongue evolves taste and cyanide starts tasting bitter.

Notice something? The feelings only exist internally inside the human body, but they are all coupled with continued survival in one way or another. There is no such thing for robots or LLMs. They won't accidentally evolve a complex reward function like that.

godelski

4 months ago

> Meanwhile humans don't need any data at all

I don't agree with this and I don't think any biologist or neuroscientist would either.

1) Certainly the data I discussed exists. No creature comes out a blank slate. I'll be bold enough to say that this is true even for viruses, even if we don't consider them alive. Being an automaton doesn't mean being devoid of data, and I'm not sure why you'd ascribe that to life or humans.

2) Humans are processing data from birth (technically before, too, but that's not necessary for this conversation, and I think we all know that's a great way to start an argument and not address our current conversation). This is clearly some active/online/continual/reinforcement/whatever-word-you-want-to-use learning.

It's weird to suggest an either/or situation. All evidence points to "both". Looking at different animals, we even see both, just with different distributions.

I think it's easy to oversimplify the problem, and the average conversation tends to do this. It's clearly a complex system with many variables at play. We can't approximate it with any reasonable accuracy by ignoring those variables or holding them constant. They're coupled.

> The reward function can't be created by the brain, because it would be circular.

Why not? I'm absolutely certain I can create my own objectives and own metrics. I'm certain my definition of success is different from yours.

> It would be like giving yourself a high, without even needing drugs

Which is entirely possible. Maybe it takes extreme training to do extreme versions but it's also not like chemicals like dopamine are constant. You definitely get a rush by completing goals. People become addicted to things like videogames, high risk activities like sky diving, or even arguing on the internet.

Just because there are externally driven or influenced goals doesn't mean internal ones can't exist. Our emotions can be driven both externally and internally.

> Notice something?

You're using too simple a model. If you use this model, then the solution is as easy as giving a robot self-preservation (even if we need to wait a few million years). But how would self-preservation evolve beyond its initial construction without the ability to metaprocess and refine that goal? So I think this highlights a major limitation in your belief. As I see it, the only other way is a changing environment that somehow allows continued survival under those instructions and evolves precisely such that the original instructions continue to work. Even with vague instructions, that's an unstable equilibrium. I think you'll find there are a million edge cases, even if it seems obvious at first. Or read some Asimov ;)

ben_w

4 months ago

> babies are already born with "the model of the world"

> but a lot of experiments on babies/young kids tell otherwise

I believe they are born with such a model? It's just that model is one where mummy still has fur for the baby to cling on to? And where aged something like 5 to 8 it's somehow useful for us to build small enclosures to hide in, leading to a display of pillow forts in the modern world?

sysguest

4 months ago

4 months ago

This makes no sense.

20 miles is still a challenge, and how many people run marathons because someone else is impressed if you run 26 miles, but couldn't care less if you run 20?

nextworddev

4 months ago

because that'd be quitting the race with 6.2 miles left to go

sarchertech

4 months ago

You could run a half marathon.

nextworddev

4 months ago

yeah but anyone can do that

TeMPOraL

4 months ago

FWIW, Karpathy literally says, multiple times, that he thinks we never left the exponential: that all human progress over the last 4+ centuries averages out to that smooth ~2% growth rate exponential curve, that electricity and computing and AI are just ways we keep it going, and that we'll continue on that curve for the time being.

It's the major point of contention between him and the host (who thinks growth rate will increase).

DanHulton

4 months ago

The thing about this, though - cars have been built before. We understand what's necessary to get those 9s. I'm sure there were some new problems that had to be solved along the way, but fundamentally, "build good car" is known to be achievable, so the process of "adding 9s" there makes sense.

godelski

4 months ago

It's a good way to think about lots of things. It's the Pareto principle: the 80/20 rule.

20% of your effort gets you 80% of the way, but most of your time is spent getting that last 20%. People often don't realize that this is fractal-like in nature, since it comes from a power-law distribution. So of the 20% you still have left, the same holds true: 20% of the remaining time (20% * 80% = 16%, for 36% cumulative) gets you 80% of the remaining way (80% * 20% = 16%, for 96% cumulative), again and again. The 80/20 numbers aren't actually realistic (or constant), but it's a decent guide.
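The recursive 80/20 arithmetic can be checked with a short sketch. It assumes, as the comment does for illustration only, that every round spends 20% of the remaining effort to finish 80% of the remaining work; the function name `pareto_rounds` is hypothetical. Round 2 reproduces the 36% effort / 96% done figures.

```python
def pareto_rounds(rounds: int, effort_share: float = 0.2, result_share: float = 0.8):
    """Apply the same split recursively to whatever effort and work remain.

    Returns a list of (cumulative_effort, cumulative_result) pairs, one per round.
    """
    history = []
    effort_left, result_left = 1.0, 1.0
    effort_done, result_done = 0.0, 0.0
    for _ in range(rounds):
        step_effort = effort_share * effort_left   # spend 20% of what's left...
        step_result = result_share * result_left   # ...to finish 80% of what's left
        effort_done += step_effort
        result_done += step_result
        effort_left -= step_effort
        result_left -= step_result
        history.append((effort_done, result_done))
    return history

for i, (effort, result) in enumerate(pareto_rounds(4), start=1):
    print(f"round {i}: {effort:.1%} of effort -> {result:.2%} done")
```

In closed form, after k rounds the effort spent is 1 - 0.8^k and the work done is 1 - 0.2^k, so the remaining work shrinks much faster than the remaining effort does.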

It's also something tech has been struggling with lately. Move fast and break things is a great way to get most of the way there. But you also leave a wake of destruction and table a million little things along the way. Someone needs to go back and clean things up. Someone needs to revisit those tabled things. While each thing might be little, we solve big problems by breaking them down into little ones. So each big problem is a sum of many little ones, meaning they shouldn't be quickly dismissed. And like the nines analogy, 99.9% uptime still means nearly 9 hours of downtime a year. It is still 1e6 cases out of 1e9. A million cases is not a small problem. Scale is great and has made our field amazing, but it is a double-edged sword.
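To put numbers on the nines: a year has 365 * 24 = 8,760 hours, so 99.9% availability allows about 8.76 hours of downtime, and the same 0.1% failure rate over 1e9 cases is 1e6 failures. A minimal sketch, assuming a constant failure rate and ignoring leap years (both simplifications); the function names are hypothetical:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

def downtime_hours_per_year(availability: float) -> float:
    """Hours per year a system is down at the given availability (0..1)."""
    return HOURS_PER_YEAR * (1.0 - availability)

def expected_failures(availability: float, cases: int) -> float:
    """Expected failing cases, assuming every case fails independently at the same rate."""
    return cases * (1.0 - availability)

for nines in range(2, 6):
    availability = 1.0 - 10.0 ** -nines
    print(f"{nines} nines: {downtime_hours_per_year(availability):8.3f} h/year down, "
          f"{expected_failures(availability, 10**9):>13,.0f} failures per 1e9 cases")
```

Each extra nine divides both numbers by ten, but at this scale the absolute counts stay large for several nines.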

I think it's also something people struggle with. It's very easy to become above average, or even well above average, at something. Just trying will often get you above average. It can make you feel like you know way more, but the trap is that while in some domains above average is not far from mastery, in other domains above average is closer to no skill than it is to mastery. Like how having $100m puts your wealth closer to a homeless person than a billionaire. At $100m you feel way closer to the billionaire because you're much further up than the person with nothing, but the curve is exponential.

010101010101

4 months ago

https://youtu.be/bpiu8UtQ-6E?si=ogmfFPbmLICoMvr3

"I'm closer to LeBron than you are to me."

omidsa1

4 months ago

I also quite like the way he puts it. However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI. That's why we can expect fast acceleration to take off within two years.

breuleux

4 months ago

I don't think we can be confident that this is how it works. It may very well be that our level of intelligence has a hard limit to how many nines we can add, and AGI just pushes the limit further, but doesn't make it faster per se.

It may also be that we're looking at this the wrong way altogether. If you compare the natural world with what humans have achieved, for instance, both things are qualitatively different, they have basically nothing to do with each other. Humanity isn't "adding nines" to what Nature was doing, we're just doing our own thing. Likewise, whatever "nines" AGI may be singularly good at adding may be in directions that are orthogonal to everything we've been doing.

Progress doesn't really go forward. It goes sideways.

j45

4 months ago

The intuition of someone who has put in a decade or two of wondering openly can't be discounted as easily as that of someone who might be a beginner.

AGI that encompasses all of humanity's knowledge in one source and beats every human on every front might be a decade away.

Individual agents with increased agency, consistently and adequately covering more and more abilities? That seems like a steady path, visible out to the horizon, where we can keep putting one foot in front of the other.

For me, the grain of salt I'd take Karpathy with is much, much smaller than average, only because he tries to share how he thinks, examines his own understanding, and changes it.

His ability to explain complex things simply helps me learn and understand things quicker, and see whether I arrive at something similar or different, without immediately assuming anything is wrong or right before my own understanding is there.

adventured

4 months ago

Adding nines to nature is exactly what humans are doing. We are nature. We are part of the natural order.

Anything that exists is part of nature, there can be no exceptions.

If I go burn a forest down on purpose, that is in fact nature doing it. No different than if a dolphin kills another animal for fun or a chimp kills another chimp over a bit of territory. Insects are also every bit as 'vicious' in their conquests.

bamboozled

4 months ago

It also assumes that all advances in AI just lead to cold, hard gains. People have suggested this before, but would a sentient AI get caught up in philosophical, silly, or religious ideas? Silicon Valley investor types seem to hope it's all just curing diseases they can profit from, but it might also be, "let's compose some music instead"?

Unit327

4 months ago

AI doesn't have hopes and desires or something it would rather be doing. It has a utility function that it will optimise for regardless of all else. This doesn't change when it gets smarter, or even when it gets super-intelligence.

AnimalMuppet

4 months ago

Isn't that one of the measures of when it becomes an AGI? So that doesn't help you with however many nines we are away from getting an AGI.

Even if you don't like that definition, you still have the question of how many nines we are away from having an AI that can contribute to its own development.

I don't think you know the answer to that. And therefore I think your "fast acceleration within two years" is unsupported, just wishful thinking. If you've got actual evidence, I would like to hear it.

ben_w

4 months ago

AI has been helping with the development of AI ever since at least the first optimising compiler or formal logic circuit verification program.

Machine learning has been helping with the development of machine learning ever since hyper-parameter optimisers became a thing.

Transformers have been helping with the development of transformer models… I don't know exactly, but it was before ChatGPT came out.

None of the initials in AGI are booleans.

But I do agree that:

> "fast acceleration within two years" is unsupported, just wishful thinking

Nobody has any strong evidence of how close "it" is, or even a really good shared model of what "it" even is.

scragz

4 months ago

AGI is when it is general. A narrow AI trained only on coding and training AIs would contribute to the acceleration without being AGI itself.

techblueberry

4 months ago

I think the 9's include this assumption.

sdenton4

4 months ago

Ha, I often speak of doing the first 90% of the work, and then moving on to the following 90% of the work...

JimDabell

4 months ago

> The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.

Grok 3 reasoning to Grok 4: an 80% increase in training compute for a 15% uplift in intelligence.

Intelligence source: Artificial Analysis

Training Compute: Source https://youtu.be/MtYsUdfZPMA?t=162
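Taking those two figures at face value, and assuming purely for illustration that the intelligence score grows as a power of training compute (score ~ compute**alpha, fitted from a single pair of data points), the implied exponent is small. `implied_exponent` is a hypothetical helper, and the 1.80 and 1.15 ratios are the commenter's figures, not independently checked.

```python
import math

def implied_exponent(compute_ratio: float, score_ratio: float) -> float:
    """Solve score_ratio = compute_ratio ** alpha for alpha."""
    return math.log(score_ratio) / math.log(compute_ratio)

# Commenter's figures: +80% training compute, +15% intelligence score.
alpha = implied_exponent(compute_ratio=1.80, score_ratio=1.15)
print(f"alpha = {alpha:.3f}")  # → alpha = 0.238

# Under the same power-law assumption, compute needed to double the score:
print(f"compute for 2x score: {2 ** (1 / alpha):.1f}x")
```

Under that assumption, doubling the score would take roughly 18x the compute, which is one way to read the reply that the cost line looks exponential. A single pair of points cannot establish the trend.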

misnome

4 months ago

I mean the cost line does look somewhat exponential…

HarHarVeryFunny

4 months ago

I think the point Andrej was making here is that in some areas, such as self driving, the cost of failure is extremely high (maybe death), so 99.9% reliable doesn't cut it, and therefore doesn't mean you are almost done, or have done 99.9% of the work. It's "The last 10% is 90% of the work" recursively applied.
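Why "99.9% reliable doesn't cut it" follows from compounding: small per-attempt failure rates add up over many attempts. A minimal sketch, assuming each attempt (a drive, an agent step) succeeds independently with the same fixed probability, which real systems do not guarantee; the function name is hypothetical:

```python
def p_at_least_one_failure(per_attempt_reliability: float, attempts: int) -> float:
    """Probability of one or more failures over independent, identically reliable attempts."""
    return 1.0 - per_attempt_reliability ** attempts

for reliability in (0.99, 0.999, 0.9999, 0.99999):
    risk = p_at_least_one_failure(reliability, attempts=1_000)
    print(f"{reliability}: {risk:.1%} chance of at least one failure in 1,000 attempts")
```

At 99.9% per attempt, the chance of at least one failure in 1,000 attempts is about 63%; each additional nine cuts that risk roughly tenfold once it is small.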

He was also pointing out that the same high-cost-of-failure consideration applies to many software systems (depending on what they are doing/controlling). We may already be at the level where AI coding agents are adequate for some less critical applications, but still far away from them being a general developer replacement. I see software development as something that uses closer to 100% of your brain than 10% - we may well not see AI coding agents approach human reliability levels until we have human-level AGI.

The AI snake oil salesmen/CEOs like to throw out competitive coding or math olympiad benchmarks as if they are somehow indicative of the readiness of AI for other tasks, but reliability matters. Nobody dies or loses millions of dollars if you get a math problem wrong.

joe_the_user

4 months ago

And to explain the Diablo 2 reference: the amount of time/effort it takes to go from level 98 to level 99 (the max level) is the same as it takes to go from level 1 to level 98. I've heard "2 weeks" as a rough estimate of "unhealthy playtime", at least solo.

wilfredk

4 months ago

Perfect analogy.

tekbruh9000

4 months ago

Infinitely big little numbers

Academia has rediscovered itself

Signal attenuation, a byproduct of entropy, due to generational churn means there's little guarantee.

Occam's razor: does Karpathy know the future, or is he self-selecting biology trying to avoid manual labor?

His statements have more in common with Nostradamus. It's the toxic positivity form of "the end is nigh". It's "Heaven exists you just have to do this work to get there."

Physics always wins, and statistics is not physics. Gambler's fallacy: improvement of statistical odds does not improve probability. Probability remains the same. This is all promises from some people who have no idea of, or interest in, doing anything else with their lives; so stay the course.

startupsfail

4 months ago

>> Heaven exists you just have to do this work to get there.

Or perhaps Karpathy has a higher level understanding and can see a bigger picture?

You've said something about heaven. Are you able to understand this statement, for example: "Heaven is a memeplex, it exists." ?

tekbruh9000

4 months ago

Also note that Karpathy says the problems with agents are tractable but hard.

He's vague on the paths being explored to resolve them. His "higher level" view is probably an awareness that the solutions to software problems are hardware-based fixes, but he cannot say that to software developers. That has been the back-and-forth of tech since I was a kid in the 80s writing BASIC: new state management unlocked by old software logic being embedded into new hardware.

Two main problems to solve for: too many people bought in to a status quo. And much simpler, the actual engineering of new hardware. One is only resolved by generational churn without resorting to all out police state action. So tech jobs as we know them will fade away slowly to not upset too many, and younger generations will not care as they will never experience anything else.

tekbruh9000

4 months ago

"Higher" than an EE with an MSc in elastic structures, ~30 years of industry experience, now working with PhDs across the spectrum on energy models to embed in chips? Energy models in part, inferred from categorization of LLM contents and compression of those contents into geometric functions like I described?

"Higher level" implies acceptance of geometric structure. You place tokens like a Chomsky diagram at each step up and down, where you should see parameters that transform the geometry of the structure.

My team works "above" the contrived state management of software workers to more efficiently sync memory matrix to display matrix. LLMs are a form of compression [1]. My team is working on compressing them further into sets of points that make up each glyph and functions to recreate them.

Electromagnetic geometry transforms are hardcoded[2] into hardware to reduce the energy use of all the outdated string mangling of software dev as most know it.

What's higher level, relative to our machines, than design and implementation of the machine?

DnD dungeon master versus WOTC game designer.

Notice outside how there are no words and philosophy? Just color gradient and geometry?

Notice inside the human body no philosophy or words?

Language is not intelligence; it's an emergent phenomenon of geometry created by the fundamental forces of physics organizing matter at various speeds relative to light.

You've read too much into an ultimately arbitrary statement meant to invoke a subtext, a subtle emotional context. You think of language as Legos, when it is music to feel.

[1] https://arxiv.org/abs/2309.10668

[2] https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...

breve

4 months ago

The lie is older than that:

https://web.archive.org/web/20161020091022/https://tesla.com...

https://motherfrunker.ca/fsd/

The fact that the lie is old only makes it worse that Musk, Karpathy, and Tesla in general still have not taken responsibility for it. They are still not willing to refund the money they took for something they did not deliver.

getnormality

4 months ago

One of the most brilliant AI minds on the planet, and he's focused on education. How to make all the innovation of the last decade accessible so the next generation can build what we don't know how to do today.

No magical thinking here. No empty blather about how AI is going to make us obsolete with the details all handwaved away. Karpathy sees that, for now, better humans are the only way forward.

Also, speculation as to why AI coders are "mortally terrified of exceptions": it's the same thing OpenAI recently wrote about, trying to get an answer at all costs to boost some accuracy metric. An exception is a signal of uncertainty indicating that you need to learn more about your problem. But that doesn't get you points. Only a "correct answer" gets you points.
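The incentive described here can be sketched with a toy scoring rule. This is an assumed grading scheme for illustration, not the one used by any particular benchmark or by OpenAI: a right answer scores 1, a wrong answer scores minus `wrong_penalty`, and abstaining (saying "I don't know", or surfacing an exception instead of forcing an answer) scores 0. Both function names are hypothetical.

```python
def expected_score(p_correct: float, wrong_penalty: float, abstain: bool) -> float:
    """Expected score on one question: right = +1, wrong = -wrong_penalty, abstain = 0."""
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

def should_guess(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if it beats the guaranteed 0 from abstaining."""
    return expected_score(p_correct, wrong_penalty, abstain=False) > 0.0

# Accuracy-only grading (no penalty): even a 5% guess beats abstaining.
print(should_guess(0.05, wrong_penalty=0.0))  # → True
# Wrong answers cost as much as right answers earn: guess only when p > 0.5.
print(should_guess(0.40, wrong_penalty=1.0))  # → False
print(should_guess(0.60, wrong_penalty=1.0))  # → True
```

With no penalty, guessing is never worse than abstaining, so a model optimized for that score learns to always produce something. With a penalty c, answering only pays off when the chance of being right exceeds c / (1 + c).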

Frontier AI research seems to have yet to operationalize a concept of progress without a final correct answer or victory condition. That's why AI is still so bad at Pokemon. To complete open-ended long-running tasks like Pokemon, you need to be motivated to get interesting things to happen, have some minimal sense of what kind of thing is interesting, and have the ability to adjust your sense of what is interesting as you learn more.
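One way reinforcement learning research already tries to reward "getting interesting things to happen" without a victory condition is an intrinsic novelty bonus. The sketch below uses a simple count-based bonus, beta / sqrt(N(s)), where N(s) is how often a state has been visited. It is a well-known technique swapped in for illustration, not something proposed in the comment, and the state labels are made up. It only covers the first ingredient listed above: the bonus shifts as visit counts grow, but it does not learn a richer sense of what is interesting.

```python
import math
from collections import Counter

class CountNoveltyBonus:
    """Count-based intrinsic reward: beta / sqrt(N(s)), shrinking as a state is revisited."""

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta
        self.visits: Counter = Counter()

    def reward(self, state: str) -> float:
        self.visits[state] += 1
        return self.beta / math.sqrt(self.visits[state])

bonus = CountNoveltyBonus()
trajectory = ["town", "town", "route_1", "town", "cave"]  # made-up state labels
print([round(bonus.reward(s), 3) for s in trajectory])  # → [1.0, 0.707, 1.0, 0.577, 1.0]
```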

4 months ago

Why does it have to be a dichotomy? Raise money from AGI-pilled investors with an AGI pitch. Raise more money from AGI-skeptics with a B2C growth story.

Moar money == moar GPUs.

throwaway314155

4 months ago

What exactly is a "pornbot"?

TZubiri

4 months ago

>An exception is a signal of uncertainty indicating that you need to learn more about your problem

No, that would be a warning. An exception is a signal that something failed and it was impossible to continue.

nextaccountic

4 months ago

Many exceptions are recoverable. This sometimes depends on the context, and on how well polished the software is

TZubiri

4 months ago

Yes. Note how I didn't say impossible to recover, just impossible to continue.

Execution couldn't continue along one path because of an error, so the error had to be caught on another path.

The difference from standard conditional mechanisms like if statements is mostly semantic. Exceptions are unforeseen errors. (Technically they are sets of errors, which can have size 1, but the syntax is designed for catching groups of errors. If you want to react to a single error case, you could also just use a condition with a return value, and it ceases being an exception.)
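The distinction can be shown with a small, hypothetical port parser written both ways. In the first version, a failure raises `ValueError` and execution resumes on a different path, in whichever caller catches it; in the second, the single anticipated failure is handled with a condition and an ordinary return value.

```python
from typing import Optional

def parse_port_raising(text: str) -> int:
    """Exception style: a bad value aborts this path and surfaces in a caller's except block."""
    port = int(text)  # raises ValueError for input like "http"
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

def parse_port_optional(text: str) -> Optional[int]:
    """Conditional style: the one expected failure is an ordinary return value (None)."""
    if not text.isdecimal():
        return None
    port = int(text)
    return port if 0 < port < 65536 else None

try:
    parse_port_raising("http")
except ValueError as err:
    print("handled on another path:", err)

print(parse_port_optional("http"))  # → None
print(parse_port_optional("8080"))  # → 8080
```

Python's standard library uses both conventions; for example, `str.find` returns -1 when a substring is missing, while `str.index` raises `ValueError`.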

thadk

4 months ago

Imagine a PhD mortally terrified of exceptions!

Now I see why Karpathy was talking of RL up-weights as if they were a destructive straw-drawn line of a drug for an LLM's training.

simonw

4 months ago

It looks like Andrej's definition of "agent" here is an entity that can replace a human employee entirely - from the first few minutes of the conversation:

When you’re talking about an agent, or what the labs have in mind and maybe what I have in mind as well, you should think of it almost like an employee or an intern that you would hire to work with you. For example, you work with some employees here. When would you prefer to have an agent like Claude or Codex do that work?

Currently, of course they can’t. What would it take for them to be able to do that? Why don’t you do it today? The reason you don’t do it today is because they just don’t work. They don’t have enough intelligence, they’re not multimodal enough, they can’t do computer use and all this stuff.

They don’t do a lot of the things you’ve alluded to earlier. They don’t have continual learning. You can’t just tell them something and they’ll remember it. They’re cognitively lacking and it’s just not working. It will take about a decade to work through all of those issues.

sarchertech

4 months ago

He’s not just talking about agents good enough to replace workers. He’s talking about whether agents are currently useful at all.

>Overall, the models are not there. I feel like the industry is making too big of a jump and is trying to pretend like this is amazing, and it’s not. It’s slop. They’re not coming to terms with it, and maybe they’re trying to fundraise or something like that. I’m not sure what’s going on, but we’re at this intermediate stage. The models are amazing. They still need a lot of work. For now, autocomplete is my sweet spot. But sometimes, for some types of code, I will go to an LLM agent.

>They kept trying to mess up the style. They’re way too over-defensive. They make all these try-catch statements. They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.
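A hypothetical illustration of the over-defensive pattern described in this quote (not code from Karpathy's repositories): the first function wraps every step in a broad `try/except` and silently falls back to a default, while the second states its assumption, that the config is a JSON object with an `"lr"` key, and fails loudly when it is violated. `DEFAULT_LR` and both function names are made up.

```python
import json

DEFAULT_LR = 3e-4  # hypothetical fallback value

def read_lr_defensive(raw: str) -> float:
    """Over-defensive: every failure is swallowed and replaced by a default."""
    try:
        config = json.loads(raw)
    except Exception:
        config = {}
    try:
        return float(config.get("lr", DEFAULT_LR))
    except Exception:
        return DEFAULT_LR

def read_lr(raw: str) -> float:
    """Assumption-driven: the config must be a JSON object with an "lr" key."""
    config = json.loads(raw)
    return float(config["lr"])  # a missing key raises KeyError right here

print(read_lr_defensive('{"lr": 0.01}'), read_lr('{"lr": 0.01}'))  # → 0.01 0.01
print(read_lr_defensive('{"learning_rate": 0.01}'))  # → 0.0003 (wrong key goes unnoticed)
```

The defensive version can make sense in a production service that must keep running, but in a research codebase built on known assumptions it hides bugs and adds bulk, which is the complaint being made.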

sothatsit

4 months ago

I don't think he is saying agents are not useful at all, just that they are not anywhere near the capability of human software developers. Karpathy later says he used agents to write the Rust translation of algorithms he wrote in Python. He also explicitly says that agents can be useful for writing boilerplate or for code that can be very commonly found online. So I don't think he is saying they are not useful at all. Instead, he is just holding agents to a higher standard of working on a novel new codebase, and saying they don't pass that bar.

Tbh I think people underestimate how much software development work is just writing boilerplate or common patterns though. A very large percentage of the web development work I do is just writing CRUD boilerplate, and agents are great at it. I also find them invaluable for searching through large codebases, and for basic code review, but I see these use-cases discussed less even though they're a big part of what I find useful from agents.

sarchertech

4 months ago

I’m not saying he’s saying agents aren’t useful at all. It’s literally in the quotes I provided that he says they are useful for some subset of tasks.

I’m saying that he is answering the question “are agents useful at all”, not “can agents replace humans”.

His answer is mostly not. He generally prefers autocomplete. But they are useful for some limited tasks.

sothatsit

4 months ago

Saying "He’s talking about whether agents are currently useful at all" is negatively loaded. It is very easy to take that and assume the answer is "no" based on the "at all".

If you wanted to be more neutral, you could have said something like "He's also questioning how useful agents really are today". That wouldn't have implied that they're not useful at all, but instead that they're less useful than people are claiming.

sarchertech

3 months ago

That question doesn’t do enough to highlight how wrong the OP’s interpretation was. He’s going far beyond just stating that agents are less useful than people are claiming. Less useful than people are claiming fits the OP’s interpretation.

weatherlite

4 months ago

> I’m not saying he’s saying agents aren’t useful at all

I'm not saying you're saying he's saying agents aren't useful at all

sarchertech

4 months ago

You’re not the person I’m replying to.

The person I’m replying to said

>I don't think he is saying agents are not useful at all, just that they are not anywhere near the capability of human software developers.

Implying I was supporting the first clause.

CaptainOfCoit

4 months ago

My biggest takeaway is that agents/LLMs in general are super helpful when paired together with a human who knows the inside and out of software development, who uses it side-by-side with their normal work.

They start being less useful when you start treating them as "I can send them ill-specified stuff, ignore them for 10 minutes and merge their results", as things spiral out of control. Basically "vibe-coding" as a useful concept doesn't work for projects you need to iterate on, only for things you feel OK with throwing away eventually.

Augmenting the human intellect with LLMs? Usually an increase in productivity. Replacing human coworkers with LLMs? Good luck, have fun.

rhetocj23

4 months ago

It does seem pretty clear that an individual who possesses super high quality human capital, paired with something like an LLM (provided the LLM is good enough relative to the individual), can be a powerful combination.

The issues are:

1) There isn't enough supply of those individuals.
2) An LLM of that kind doesn't exist (at least not consistently).
3) The amount invested in what is going on will not yield returns commensurate with the required rate of return.

Interestingly enough, I believe Andrej Karpathy is also focusing on education (levelling up the supply of human capital) - I came to the above conclusion about a month ago. And it 'feels' right to me.

consumer451

4 months ago

I am just some shmoe, but I agree with that assessment. My biggest take-away is that we got super lucky.

At least now we have a slight chance to prepare for the potential economic and social impacts.

Bengalilol

4 months ago

I am thinking the same.

And we should start considering what makes us human and how we can value our common ground.

tablatom

4 months ago

This. I believe it’s the most important question in the world right now. I’ve been thinking long and hard about this from an entirely practical perspective and have surprised myself that the answer seems to be our capacity to love. The idea is easily dismissed as romantic but when I say I’m being practical I really mean it. I’m writing about it here https://giftcommunity.substack.com/

hackerdood

4 months ago

> You can’t just tell them something and they’ll remember it.

It might take a decade to work through this issue if you just want to put a single LLM in a single computer and have it be a fully-fledged human, sure. And since he works at a company making some of the most advanced LLMs in the world, that perspective makes sense! But of course that's not how it's actually going to be (/already is).

LLMs are a necessary part of AGI(/"agents") due to their ability to avoid the Frame Problem[1], but they're far from the only needed thing. We're pretty dang good at "remembering things" with computers already, and connecting that with LLM ensembles isn't going to take anywhere close to 10 years. Arguably, we're already doing it pretty darn well in unified systems[2]...

If anyone's unfamiliar and finds my comment interesting, I highly recommend Minsky's work on the Society of Mind, which handled this topic definitively over 20 years ago. Namely:

A short summary of "Connectionism and Society of Mind" for laypeople at DARPA: https://apps.dtic.mil/sti/tr/pdf/ADA200313.pdf

A description of the book itself, available via Amazon in 48h or via PDF: https://en.wikipedia.org/wiki/Society_of_Mind

By far my favorite paper on the topic of connectionist+symbolist syncretism, though a tad long: https://www.mit.edu/~dxh/marvin/web.media.mit.edu/~minsky/pa...

[1] https://plato.stanford.edu/entries/frame-problem/

[2] https://github.com/modelcontextprotocol/servers/tree/main/sr...
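To make the "we're already good at remembering things with computers" point concrete, here is a minimal sketch assuming the simplest possible design: a persistent key-value note store whose contents are prepended to every prompt. `MemoryStore`, `remember`, and `build_prompt` are hypothetical names for illustration, not the API of the linked MCP memory server, and a real system would need retrieval rather than dumping every note into the prompt.

```python
import json
import tempfile
from pathlib import Path


class MemoryStore:
    """Hypothetical note store: facts persist on disk across sessions."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load previously saved facts, or start with an empty memory.
        self.facts: dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def remember(self, key: str, fact: str) -> None:
        # Overwrite any earlier fact under the same key, then persist to disk.
        self.facts[key] = fact
        self.path.write_text(json.dumps(self.facts, indent=2))

    def build_prompt(self, user_message: str) -> str:
        # Naive injection: every stored fact goes ahead of the new message.
        notes = "\n".join(f"- {k}: {v}" for k, v in sorted(self.facts.items()))
        return f"Known facts:\n{notes}\n\nUser: {user_message}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "memory.json"
        MemoryStore(path).remember("style", "prefer short, dependency-free code")
        # A fresh instance (a "new session") still sees the earlier fact.
        print(MemoryStore(path).build_prompt("Write a CSV parser."))
```

The hard part the thread argues about is not storage but deciding what to remember and what to retrieve; this sketch deliberately skips both.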

erichocean

4 months ago

> You can’t just tell them something and they’ll remember it.

I find it fascinating that this is the problem people consistently think we're a decade away on.

If you can't do this, you don't have employee-like AI agents, you have AI-enhanced scripting. It's basically the first thing you have to be able to do to credibly replace an actual human employee.

4 months ago

Agreed. I'd also add he's intellectually honest enough to not overhype what's happening just to hype whatever he's working on or appear to be a thought leader. Just very clear, pragmatic, and intellectually honest thought about the reality of things.

nunez

4 months ago

It's almost like having more money than you'll ever know what to do with lets you say and do what you _actually_ want to do.

fourthark

4 months ago

Most people don’t take this opportunity, though.

4 months ago

To be fair to the OP of the thread, he's just using Patel's title word-for-word. It's Patel who is being inaccurate.

dang

4 months ago

Oh that's clear, and the submitter didn't do anything wrong. It's just that on HN the idea is to find a different title when the article's own title is misleading or linkbait (https://news.ycombinator.com/newsguidelines.html).

The best way to do that of course is to find a more representative phrase from the article itself. That's almost always possible but I couldn't quite swing it in this case.

realty_geek

4 months ago

dang!! I have so much respect for this ironic situation where we are discussing the superpowers of AI while a very human, very decent being ponders deeply on how to compose a few words to make a suitable title. Please, can we have a future world where moments like this still happen every so often?

tim333

4 months ago

He says re agents:

and a bunch of similar things implying LLMs have no hope of reaching AGI

user

4 months ago

[deleted]

nextworddev

4 months ago

There's a lot of salt here

dang

4 months ago

> Hey, podcast bro needs to get clicks

Please don't cross into personal attack. It's not what this site is for, and destroys what it is for.

Edit: please don't edit comments to change their meaning once someone has replied. It's unfair to repliers whose comments no longer make sense, and it's unfair to readers who can no longer understand the thread. It's fine, of course, to add to an existing comment in such a case, e.g. by saying "Edit:" or some such and then adding what else you want to say.

user

4 months ago

[deleted]

keeda

4 months ago

Huh, I'm surprised that he goes from "No AI" to "AI autocomplete" to "Vibecoding / Agents" (which I assume means no human review per his original coinage of the term.) This seems to preclude the chat-oriented / pair-programming model which I find most effective. Or even the plan-spec-codegen-review approach, which IME works extremely well for straightforward CRUD apps.

Also they discuss the nanochat repo in the interview, which has become more famous for his tweet about him NOT vibe-coding it: https://www.dwarkesh.com/i/176425744/llm-cognitive-deficits

Things are more nuanced than what people have assumed, which seems to be "LLMs cannot handle novel code". The best I can summarize it is that he was doing rather non-standard things that confused the LLMs, which have been trained on vast amounts of very standard code and hence kept defaulting to those assumptions. Maybe a rough analogy is that he was trying to "code golf" this repo whereas LLMs kept trying to write "enterprise" code because that is overwhelmingly what they have been trained on.

I think this is where the chat-oriented / pair-programming or spec-driven model shines. Over multiple conversations (or from the spec), they can understand the context of what you're trying to do and generate what you really want. It seems Karpathy has not tried this approach (given his comments about "autocomplete being his sweet spot".)

For instance, I'm working on some straightforward computer vision stuff, but it's complicated by the fact that I'm dealing with small, low-resolution images, which does not seem well-represented in the literature. Without that context, the suggestions any AI gives me are sub-optimal.

However, after mentioning it a few times, ChatGPT now "remembers" this in its context, and any suggestion it gives me during chat is automatically tailored for my use-case, which produces much better results.

Put another way (not an AI expert so I may be using the terms wrong), LLMs will default to mining the data distribution they've been trained on, but with sufficient context, they should be able to adapt their output to what you really want.

arthurofbabylon

4 months ago

Agency. If one studied the humanities they’d know how incredible a proposal “agentic” AI is. In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way. The notion of casual agency (I’m thinking of Jensen Huang’s generative > agentic > robotic insistence) is bonkers. Some things are not easily speedrunned.

(I did listen to a sizable portion of this podcast while making risotto (stir stir stir), and the thought occurred to me: “am I becoming more stupid by listening to these pundits?” More generally, I feel like our internet content (and meta content (and meta meta content)) is getting absolutely too voluminous without the appropriate quality controls. Maybe we need more internet death.)

whatevertrevor

4 months ago

> In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way.

I don't follow. If we, in some distant future, find a way to make humans functionally immortal, does that magically remove our agency? Or do we not have agency to begin with?

If your position on the "free will" question is that it doesn't exist, then sure, I get it. But that seems incompatible with the death prerequisite you have put forward for it, because if it doesn't exist then surely it's a moot point to talk about prerequisites anyway.

arthurofbabylon

4 months ago

When I think of the term "agency" I think of a feedback loop whereby an actor is aware of their effect and adjusts behavior to achieve desired effects. To be a useful agent, one must operate in a closed feedback loop; an open loop does not yield results.

Consider the distinction between probabilistic and deterministic reasoning. When you are dealing with a probabilistic method (eg, LLMs, most of the human experience) closing the feedback loop is absolutely critical. You don't really get anything if you don't close the feedback loop, particularly as you apply a probabilistic process to a new domain.

For example, imagine that you learn how to recognize something hot by hanging around a fire and getting burned, and you later encounter a kettle on a modern stove-top and have to learn a similar recognition. This time there is no open flame, so you have to adapt your model. This isn't a completely new lesson, the prior experience with the open flame is invoked by the new experience and this time you may react even faster to that sensation of discomfort. All of this is probabilistic; you aren't certain that either a fire or a kettle will burn you, but you use hints and context to take a guess as to what will happen; the element that ties together all of this is the fact of getting burned. Getting burned is the feedback loop closing. Next time you have a better model.

Skillful developers who use LLMs know this: they use tests, or they have a spec sheet they're trying to fulfill. In short, they inject a brief deterministic loop to act as a conclusive agent. For the software developer's case it might be all tests passing, for some abstract project it might be the spec sheet being completely resolved. If the developer doesn't check in and close the loop, then they'll be running the LLM forever. An LLM believes it can keep making the code better and better, because it lacks the agency to understand "good enough." (If the LLM could die, you'd bet it would learn what "good enough" means.)
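The "inject a brief deterministic loop" idea can be sketched as a generate-and-check loop. Here `propose` is a hypothetical stand-in for an LLM call (it just walks a fixed list of candidate implementations of absolute value), and the deterministic check is a handful of test cases. The loop stops as soon as the tests pass or the attempt budget runs out, which supplies the "good enough" signal the generator itself lacks.

```python
from typing import Callable, Iterator, Optional

# Test cases define "good enough": (input, expected output) pairs for abs().
TESTS = [(3, 3), (-4, 4), (0, 0)]


def propose() -> Iterator[Callable[[int], int]]:
    """Hypothetical stand-in for an LLM: yields successive candidate implementations."""
    yield lambda x: x                     # wrong for negative inputs
    yield lambda x: -x                    # wrong for positive inputs
    yield lambda x: x if x >= 0 else -x   # correct
    yield lambda x: max(x, -x)            # also correct, but never reached


def passes(candidate: Callable[[int], int]) -> bool:
    # Deterministic check that closes the feedback loop.
    return all(candidate(x) == want for x, want in TESTS)


def closed_loop(budget: int = 10) -> Optional[tuple[int, Callable[[int], int]]]:
    for attempt, candidate in enumerate(propose(), start=1):
        if attempt > budget:
            break  # budget exhausted: give up rather than iterate forever
        if passes(candidate):
            return attempt, candidate  # tests pass: stop "improving"
    return None


if __name__ == "__main__":
    result = closed_loop()
    print(result[0] if result else "gave up")  # → 3
```

Without `passes`, nothing in the loop would ever say "stop"; the tests are what turn open-ended generation into a closed loop.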

Where does dying come in? Nature evolved numerous mechanisms to proliferate patterns, and while everyone pays attention to the productive ones (eg, birth) few pay attention to the destructive (eg, death). But the destructive ones are just as important as the productive ones, for they determine the direction of evolution. In terms of velocity you can think of productive mechanisms as speed and destructive mechanisms as direction. (Or in terms of force you can think of productive mechanisms as supplying the energy and destructive mechanisms supplying the direction.) Many instances are birthed, and those that survive go on and participate in the next round. Dying is the closed feedback loop, shutting off possibilities and defining the bounds of the project.
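The "productive mechanisms as speed, destructive mechanisms as direction" framing maps onto a toy evolutionary algorithm. In this sketch the target value, population size, and Gaussian mutation are arbitrary assumptions: mutation is the productive step, and truncation selection (the worse half of each generation is discarded) plays the role of death. With selection switched off, survivors are chosen at random and the population merely drifts.

```python
import random

TARGET = 42.0  # arbitrary goal that the "environment" rewards
POP = 20


def fitness(x: float) -> float:
    # Higher is better: negative distance to the target.
    return -abs(x - TARGET)


def evolve(generations: int, selection: bool, seed: int = 0) -> float:
    rng = random.Random(seed)
    population = [rng.uniform(-100, 100) for _ in range(POP)]
    for _ in range(generations):
        # Productive step: every survivor has two mutated offspring.
        offspring = [x + rng.gauss(0, 1) for x in population for _ in range(2)]
        if selection:
            # Destructive step: the worse half "dies", which sets the direction.
            offspring.sort(key=fitness, reverse=True)
            population = offspring[:POP]
        else:
            # No death by fitness: survivors are picked at random.
            population = rng.sample(offspring, POP)
    # Report how close the best survivor got to the target.
    return min(abs(x - TARGET) for x in population)


if __name__ == "__main__":
    print("with selection:   ", round(evolve(200, selection=True), 3))
    print("without selection:", round(evolve(200, selection=False), 3))
```

Birth alone produces variety; only the culling step makes the population head anywhere in particular.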

whatevertrevor

4 months ago

I see your perspective about the inevitability of death causing a forcing-function directedness for agents, but that's a much much weaker claim than (emphasis mine):

> In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way.

My original question was why agency could not exist without death, not why it would be hampered without it. For clarity, I'm coming at this from an analytic philosophy angle, not its more rhetorical counterpart that I struggle to wrap my head around.

I don't really view death or evolution as a necessity for agency. Nebulous AGI predictions aside: if a self-aware, conscious and intelligent being, capable of effecting consequential changes to its environment, becomes functionally immortal, it doesn't somehow lose its agency. I'd actually go further and say losing the forcing function of inevitable death is the biggest freedom a species can aim for. Without it, our agency is limited to solving problems of survival, in one form or another.

The existence of death is ultimately arbitrary and random, as random as our existence in the first place. The "direction" evolution gets from it is another random function layered on top, one that also takes as a parameter the random circumstances the soup of organic molecules lives in. Only once this random inevitability is conquered can we truly shape our lives and environments in ways that are a true reflection of who we are. Only then are we genuinely free. And "agency" without freedom is impotent at best.

(Addendum: I know positing "Immortality is good actually" can cause negative associations with "billionaires who want to cryopreserve themselves". This association has melded with the general romanticization of death in various philosophical and religious beliefs that has existed for millennia, strengthening the distaste for treating the reversal of aging, and eventually the removal of death, as moral goals. While I personally have no plans (or means) to cryopreserve myself when I get old, I do believe it's a goal worth fighting for. One of the more important ones, alongside ensuring we have a planet to live on in the interim.)

arthurofbabylon

4 months ago

I love the discussion — thank you.

Your comment makes me more bullish on death. Death isn’t arbitrary as you claim: it is a direct expression of an entity in its environment, it epitomizes contextualization. (I argue that honoring context is the opposite of arbitrariness.)

Further, death encapsulates multiple layers of abstraction. When an entity dies, it dies on every level (eg both instincts and socially learned heuristics). The death reaches deep down inside the hierarchy of its own form to eliminate possibilities. That is some seriously strong directionality; it’s not like “taking your second left” or some other mono-dimensional vector. Layers and layers of genes and learning are discarded. It is truly an incredibly powerful feedback-loop closure.

dist-epoch

4 months ago

Models die too - the less agentic ones are out-competed by the more agentic ones.

Every AI lab brags how "more agentic" their latest model is compared to the previous one and the competition, and everybody switches to the new model.

catlifeonmars

4 months ago

Yes, but the point is that models must be acutely aware of their impending death to force the calculation of tradeoffs.

ngruhn

4 months ago

I don't agree but I did laugh

4 months ago

I agree, I think I learned the most on this topic from his videos. And before that (a while ago), it was Andrew Ng's Coursera class. The latter had hands-on projects, which are much better than just listening in terms of retention. I don't know if Andrej Karpathy has more structured classes somewhere.

noman-land

4 months ago

This is a good one.

https://karpathy.ai/zero-to-hero.html

discreteevent

4 months ago

On vibe coding vs using auto complete:

> The models have so many cognitive deficits. One example, they kept misunderstanding the code because they have too much memory from all the typical ways of doing things on the Internet that I just wasn’t adopting.

> I also feel like it’s annoying to have to type out what I want in English because it’s too much typing. If I just navigate to the part of the code that I want, and I go where I know the code has to appear and I start typing out the first few letters, autocomplete gets it and just gives you the code.

> They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.

aaroninsf

4 months ago

I'm pretty content to say this may be true, but may well prove quite wrong.

Why? Because humans—including the smartest of us—are continuously prone to cognitive errors, and reasoning about the non-linear behavior of complex systems is a domain we are predictably and durably terrible at, even when we try to compensate.

Personally I consider the case of self-driving cars illustrative, and a go-to reminder of my own very human failure here. I was quite sure that we could not have autonomous vehicles in dynamic, messy urban areas without true AGI, and that FSD would, in the fashion of the failed Tesla offering, emerge first in the much more constrained space of the highway system, which would also benefit from federal regulation and coordination.

Now Waymos have eaten SF, and their driving is increasingly nuanced. Last night a friend and very early adopter relayed a series of anecdotes about some of the strikingly nuanced interactions he'd been party to recently, including being in a car that was attacked late at night, and how one did exactly the right thing when approached head-on in a narrow neighborhood street that required backing out. Etc.

That's just one example, and IMO we are only beginning to experience the benefits of the "network effects" so popular in tales of singularity take-off.

Ten years is a very, very, very long time under current conditions. I have done neural networks since the mid-90s (academically: published, presented, etc.) and I have proven terrible in anticipating how quickly "things" will improve. I have now multiple times witnessed my predictions that X or Y would take "5-8" or "8-10" years or "too far out to tell," instead arrive within 3 years.

pier25

4 months ago

If OpenAI had anything even resembling AGI they'd be milking the shit out of that, even if only for marketing.

shdh

4 months ago

Cost decreases with time

Humans can work on a problem 8 hours a day? You can run inference 24/7

charcircuit

4 months ago

It decreases, but going from $1 million per token to $0.9 million per token after a year is still a decrease, and it is still not viable. Paying an AGI $100 billion to work 24/7 for a year is worse than hiring 10 people at $30k a year each to work shifts covering the same 24/7.
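The comparison is easy to rerun with the comment's own hypothetical figures: $100 billion per year for the AGI, 10 people at $30k each for the human shift rota, and a 10% price drop per year. Those numbers are the commenter's illustration, not real prices. The sketch counts how many years of compounding 10% cuts it would take before the AGI becomes the cheaper option.

```python
def years_until_cheaper(agi_cost: float, human_cost: float, yearly_drop: float) -> int:
    """Count whole years of compounding price cuts until agi_cost <= human_cost."""
    years = 0
    while agi_cost > human_cost:
        agi_cost *= 1 - yearly_drop
        years += 1
    return years


AGI_PER_YEAR = 100e9             # hypothetical: $100 billion for a year of 24/7 work
HUMANS_PER_YEAR = 10 * 30_000    # hypothetical: 10 people at $30k each, in shifts

if __name__ == "__main__":
    print(years_until_cheaper(AGI_PER_YEAR, HUMANS_PER_YEAR, 0.10))  # → 121
```

Under these assumptions a steady 10% yearly decline needs about 121 years to close a gap of roughly 333,000×, which is the commenter's point: "it gets cheaper" says nothing until you know the rate.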

willyxdjazz

4 months ago

Maybe I'm being too simplistic, but I think we're mixing two distinct debates.

Today we have an extraordinary invention—comparable to the wheel in its time. That invention is: predictive inference over all human knowledge. Period. I don't like calling it "Artificial Intelligence" because it's not intelligence; it's a prediction system that can project responses by illuminating patterns across all human knowledge encapsulated in text, audio, and video. What companies like OpenAI call "reasoning" models is simply that predictive process, but in a loop packaged as a product—one of the first marvelous uses of this fascinating invention: predictive inference over all human knowledge.

When the wheel was invented, no one could have imagined that, combined with hundreds of subsequent technologies, it would enable an electric car powered by solar energy. The wheel wasn't autonomous transportation—it was a fundamental component.

I see two debates getting mixed up here:

- The debate about the current invention: A tool that makes encyclopedias "speak" by connecting patterns across all human knowledge. As a tool, that's what it is—nothing more, nothing less. Tremendously useful, but a tool.

- The debate about the future dream: What this invention might enable when combined with hundreds of technologies that don't yet exist—similar to imagining an electric car when you only have the wheel.

It seems many experts are taking positions and getting "upset" because they're mixing these two debates. Some evaluate the wheel as if it should already be a solar electric car. Others defend the wheel by saying it already IS a solar electric car. Both are right in their observations, but they're talking about different things.

LLMs are a fundamental breakthrough—the "wheel" of the information age. But discussing whether they "understand" or have "world models" is like asking whether the wheel "comprehends transportation."

On the danger of confusing capabilities: Conflating the tool with the end goal leads us to poor decisions—from over-investment to under-utilization. When we expect AGI from what is fundamentally a pattern-matching engine, we set ourselves up for disappointment and misallocation of resources. No magic, just reality.

The temporal factor: The AGI debate is a debate about the future—about what might emerge from combinations of technologies we haven't yet invented.

jstummbillig

4 months ago

> I don't like calling it "Artificial Intelligence" because it's not intelligence

A pattern I noticed in a AI[sic] discussions: Handwavily declaring what intelligence is not, while not explaining what is.

foofoo12

4 months ago

> Handwavily declaring what intelligence is not, while not explaining what is.

That goes in the other direction too. Declaring it intelligent without explaining what it is. Or even worse, if any explanations are offered, they are often half truths or exaggerated.

willyxdjazz

4 months ago

You are right, I thought maybe something interesting in these debates is more education about how an LLM works. I don’t like calling it artificial intelligence because precisely we don’t understand well what “intelligence” is. What we do understand is how we came to build an LLM. Good point, I will keep that in mind for next time; it’s better to give more details and, above all, remove the “no” from assertions and clarify more. Thanks :)

user

4 months ago

[deleted]

rhetocj23

4 months ago

4 months ago

It's curious that Kurzweil's predictions about transcending biology align so closely with his expected lifespan. Reminds me of someone saying, if you ask a researcher for a timeline of a breakthrough they'll give you the expected span of their career.

Hegel thought history ended with the Prussian state, Fukuyama thought it ended in liberal America, Paul thought judgement day was so close you need not bother to marry, the singularity always comes around when the singularians get old. Funny how that works

williamcotton

4 months ago

The biggest problem I've had with Kurzweil and the exponential growth curve is that the elbow depends entirely on how you plot and scale the axes. From a certain vantage point we have arguably been on an exponential curve since the advent of Homo sapiens.
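One way to make the axis-scaling point concrete, assuming a pure exponential f(t) = a·e^(kt) and defining the "elbow" (an arbitrary choice) as the point where the plotted slope reaches 1. Multiplying a by c, which is just a change of units on the y-axis, moves that elbow by ln(c)/k, while the growth ratio per unit time, e^k, is identical everywhere on the curve. Nothing intrinsic to the curve marks "the" knee.

```python
import math


def elbow(a: float, k: float) -> float:
    """Time at which f(t) = a*exp(k*t) has slope exactly 1: solve a*k*exp(k*t) = 1."""
    return -math.log(a * k) / k


def growth_ratio(a: float, k: float, t: float) -> float:
    # f(t + 1) / f(t) = exp(k), independent of both a and t.
    f = lambda s: a * math.exp(k * s)
    return f(t + 1) / f(t)


if __name__ == "__main__":
    k = 0.5
    for a in (1.0, 10.0, 1000.0):  # same curve, different y-axis units
        print(f"a={a:>7}: elbow at t={elbow(a, k):6.2f}, "
              f"ratio per step={growth_ratio(a, k, 0.0):.4f}")
```

Each tenfold change of units shifts the apparent elbow by the same amount of time, while the per-step ratio column never changes.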

somenameforme

4 months ago

I lost all respect for him after reading about his views on medical immortality. His argument is that human life expectancy has been constantly increasing over time*, and, based on some arbitrary rate of acceleration, he calculated that science would soon be expanding human life expectancy by more than a year per year (medical immortality, in other words), all expected to happen just before he reaches his final years.

The overwhelming majority of all gains in human life expectancy have come due to reductions in infant mortality. When you hear about things like a '40' year life expectancy in the past it doesn't mean that people just dropped dead at 40. Rather if you have a child that doesn't make it out of childhood, and somebody else that makes it to 80 - you have a life expectancy of ~40.
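The arithmetic behind the "~40 year" example: life expectancy at birth is the mean age at death, so deaths near age 0 pull the average far below the age that people who survive childhood actually reach. The cohort below is made up for illustration and simply mirrors the comment's child-dies / other-reaches-80 case.

```python
from statistics import mean

# Hypothetical cohort: half die in infancy, half live to 80.
ages_at_death = [0] * 50 + [80] * 50

if __name__ == "__main__":
    print(mean(ages_at_death))  # life expectancy at birth → 40
    # Conditional on surviving childhood, the expectancy is 80.
    print(mean(a for a in ages_at_death if a >= 5))  # → 80
```

Reducing infant deaths raises the first number dramatically without changing how long adults live, which is why historical gains in that number say little about adult lifespan.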

4 months ago

4 months ago

What was the last example where humans succeeded at a hard problem like that?

Space flight?

mkipper

4 months ago

Even if it's not some staggering triumph of human achievement, I'd argue that Ozempic (etc.) is similar. A magic weight loss drug has always captured the public's imagination, and it feels like I've been hearing about new weight loss drug studies in the news for my entire life that never went anywhere.

dlcarrier

4 months ago

That was a stroke of luck. It's synthetic Gila monster venom.

jaza

4 months ago

We've "succeeded" at space flight about as much as we've "succeeded" at AI. Yay, man on the moon! Over half a century later, and it turns out that the "next small step" - man on Mars - isn't so small and still hasn't been achieved. Anything remotely resembling sci-fi-style ubiquitous space travel remains exactly that - sci-fi!

newsclues

4 months ago

4 months ago

Great quote:

"When you get a demo and something works 90% of the time, that’s just the first nine. Then you need the second nine, a third nine, a fourth nine, a fifth nine. While I was at Tesla for five years or so, we went through maybe three nines or two nines. I don’t know what it is, but multiple nines of iteration. There are still more nines to go.

That’s why these things take so long."

onlyrealcuzzo

4 months ago

Importantly, the first 9s are the easiest.

If you need to get to 9 9s, the 9th 9 could be more effort than the other 8 combined.
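The "march of nines" arithmetic, under the simplifying assumption that n nines means a success rate of 1 − 10^(−n): the demo in the quote (90%) is one nine, and every extra nine cuts the remaining failure rate by 10×. That also means each new nine targets failure modes ten times rarer than the last, which need roughly ten times more trials just to observe.

```python
def success_rate(nines: int) -> float:
    """Success rate for a given number of nines, e.g. 3 -> 0.999."""
    return 1 - 10 ** -nines


def failures_per_million(nines: int) -> float:
    # Expected failures in one million independent runs.
    return (1 - success_rate(nines)) * 1_000_000


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} nine(s): {success_rate(n):.5%} success, "
              f"{failures_per_million(n):,.0f} failures per million runs")
```

Going from one nine to five nines removes 99,990 of every 100,000 failures, but the remaining ones are exactly the rare, weird cases that are hardest to find and fix.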

Symmetry

4 months ago

Very interesting conversation that I'm still listening to. One bit I disagreed with: I still think an LLM's context is more like a person's sensory memory[1] than their working memory. The way data falls off the end of the buffer regardless of how much attention it provokes is entirely unlike our own working memory. On the other hand, a reasoning model's scratchpad seems to fit the analogy much better.

[1] https://en.wikipedia.org/wiki/Sensory_memory
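A crude sketch of the distinction, assuming the simplest truncation policy of "keep the last N tokens" (real systems handle overflow in various ways). In a fixed-size buffer the oldest item is evicted purely by position, however important it was; the second function instead keeps items by a salience score, closer to how attention-driven working memory is usually described. The token names and salience scores are made up.

```python
from collections import deque


def sliding_context(tokens: list[str], size: int) -> list[str]:
    """Sensory-memory-like buffer: the oldest tokens fall off by position alone."""
    window: deque[str] = deque(maxlen=size)  # maxlen makes append() evict from the left
    for tok in tokens:
        window.append(tok)
    return list(window)


def salient_memory(tokens: list[str], salience: dict[str, float], size: int) -> list[str]:
    """Toy working memory: keep the `size` most salient tokens, in original order."""
    # sorted() stays stable with reverse=True, so ties keep their original order.
    ranked = sorted(tokens, key=lambda t: salience.get(t, 0.0), reverse=True)
    keep = set(ranked[:size])
    return [t for t in tokens if t in keep]


if __name__ == "__main__":
    stream = ["FIRE-IS-HOT", "a", "b", "c", "d"]
    print(sliding_context(stream, 3))  # ['b', 'c', 'd']: the important item is gone
    print(salient_memory(stream, {"FIRE-IS-HOT": 1.0}, 3))
```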

joshellington

4 months ago

To throw two pennies in the ocean of this comment section - I’d argue we still lack schematic-level understanding of what “intelligence” even is or how it works. Not to mention how it interfaces with “consciousness”, and their likely relation to each other. Which kinda invalidates a lot of predictions/discussions of “AGI” or even in general “AI”. How can one identify Artificial Intelligence/AGI without a modicum of understanding of what the hell intelligence even is.

qudat

4 months ago

The reason why it’s so hard to define intelligence or consciousness is because we are hopelessly biased with a datapoint of 1. We also apply this unjustified amount of mysticism around it.

https://bower.sh/who-will-understand-consciousness

MatrixMan

4 months ago

I don't think we can ever know that we are generally intelligent. We can be unsure, or we can meet something else which possesses a type of intelligence that we don't, and then we'll know that our intelligence is specific and not general.

So to make predictions about general intelligence is just crazy.

And yeah yeah I know that OpenAI defines it as the ability to do all economically relevant tasks, but that's an awful definition. Whoever came up with that one has had their imagination damaged by greed.

judahmeek

4 months ago

All intelligence is specific, as evidenced by the fact that a universal definition regarding the specifics of "common sense" doesn't exist.

MatrixMan

4 months ago

Common is not the same as general. A general key would open every lock. Common keys... well they're quite familiar.

judahmeek

4 months ago

My point was that all intelligence is based on an individual's experiences, therefore an individual's intelligence is specific to those experiences.

Even when we "generalize" our intelligence, we can only extend it within the realm of human senses & concepts, so it's still intelligence specific to human concerns.

MatrixMan

4 months ago

So if you encounter an unknown intelligence, like I dunno some kind of extra dimensional pen pal with a wildly different biology and environment than our own... Would you be open to the possibilities:

- despite our difference we have the same kind of intelligence

- our intelligences intersect, but there are capacities that each has that the other doesn't

It seems like for either to be true there would have to be some place of common ground into which we could both generalize independently of our circumstance. Mathematics is often thought to be such a place for instance, there's plenty of sci fi about beaming prime numbers into space as an attempt to leverage that common ground. Are you saying there aren't such places? That SETI is hopeless?

judahmeek

4 months ago

It's certainly possible that we may encounter other alien lifeforms whose intelligence intersects our own.

It's just not guaranteed.

MatrixMan

4 months ago

If we assume this about intelligence:

> Even when we "generalize" our intelligence, we can only extend it within the realm of human senses & concepts, so it's still intelligence specific to human concerns.

...then we might fail to recognize them as intelligent when we meet them. Same goes for emergent artificial doohickeys. A theory that allows for generalization might never find an example of it, but it's still better than a theory which doesn't, because the second sort surely won't.

judahmeek

4 months ago

When you make the term "general intelligence" so broad that it expands beyond the realm of human senses & concepts, statements about it become unfalsifiable because you, a human, can't conceive of a way to test said statement.

Unfalsifiable statements are worthless because they can't be tested.

My perspective is that definitions that are in fact too broad are unimportant, because no one uses them.

walkabout

4 months ago

4 months ago

> we still lack schematic-level understanding of what “intelligence” even is or how it works. Not to mention how it interfaces with “consciousness”, and their likely relation to each other

I think you can get pretty far starting from behavior and constraints. The brain needs to act in such a way as to pay for its costs. And not just day to day costs, also ability to receive and give that initial inheritance.

From cost of execution we can derive an imperative for efficiency. Learning is how we avoid making the same mistakes and adapt. Abstractions are how we efficiently carry around past experience to be applied in new situations. Imagination and planning are how we avoid the high cost of catastrophic mistakes.

Consciousness itself falls from the serial action bottleneck. We can't walk left and right at the same time, or drink coffee before brewing it. Behavior has a natural sequential structure, and this forces the distributed activity in the brain to centralize on a serial output sequence.

My mental model is that of a structure-flow recursion. Flow carves structure, and structure channels flow. Experiences train brains and brain generated actions generate experiences. Cutting this loop and analyzing parts of it in isolation does not make sense, like trying to analyze the matter and motion in a hurricane separately.

chadcmulligan

4 months ago

I did the math some years ago on how much computing is required to simulate a human brain. A brain has around 90 billion neurons, with each neuron having an average of 7,000 connections to other neurons. Let's assume that's all we need. So what do we need to simulate a neuron, one CPU? Or can we fit more than one in a CPU? Let's say 100, so we're down to roughly one billion CPUs and about 630 trillion messages flying between them every what, millisecond?

Simulating that is a long way away, so the only possibility is that brains have some sort of redundancy and we can optimise that away. Though computers are faster than brains, so it's possible maybe. How much faster? Let's say a neuron does its work in 1 ms and we can simulate that work in 1 µs, i.e. a thousand times faster; that's still a lot. Can we get to a million times faster? Even then it's still a lot. Not to mention the power required for this.

Even if we can fit a thousand neurons in a CPU, that's still 90 million CPUs. Say only 10% are active: still 9 million CPUs. A thousand times faster: 9,000 CPUs. Nearly there, but still a while away.
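The back-of-envelope chain can be rerun directly. All inputs are the comment's own rough assumptions (90 billion neurons, 7,000 connections each, a chosen number of neurons per CPU, 10% activity, a 1,000× speedup), not measured figures. Note that 90 billion × 7,000 is about 630 trillion connections, and that the chain 90 million → 9 million → 9,000 CPUs corresponds to about 1,000 neurons per CPU.

```python
NEURONS = 90e9                 # comment's assumption
CONNECTIONS_PER_NEURON = 7_000  # comment's assumption


def cpus_needed(neurons_per_cpu: float, active_fraction: float = 1.0,
                speedup: float = 1.0) -> float:
    """CPUs required if each CPU simulates `neurons_per_cpu` neurons in real time,
    only `active_fraction` of neurons need work, and each CPU runs `speedup`x
    faster than a neuron (so it can time-share that many neuron workloads)."""
    return NEURONS * active_fraction / (neurons_per_cpu * speedup)


if __name__ == "__main__":
    print(f"messages per step: {NEURONS * CONNECTIONS_PER_NEURON:.1e}")
    print(f"100 neurons/CPU: {cpus_needed(100):.1e} CPUs")
    print(f"1,000 neurons/CPU: {cpus_needed(1_000):.1e} CPUs")
    print(f"... 10% active, 1000x faster: {cpus_needed(1_000, 0.1, 1_000):,.0f} CPUs")
```

This only counts compute slots; it ignores the 630 trillion messages per step, which would likely dominate in practice.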

cmrdporcupine

4 months ago

We don't even have an accurate convincing model of how the functions of the brain really work, so it's crazy to even think about its simulation like that. I have no doubt that the cost would be tremendous if we could even do it, but I don't even think we know what to do.

The LLM stuff seems most distinctly to not be an emulation of the human brain in any sense, even if it displays human-like characteristics at times.

keiferski

4 months ago

That would require philosophical work, something that the technicians building this stuff refuse to acknowledge as having value.

Ultimately this comes down to the philosophy of language and of the history of specific concepts like intelligence or consciousness - neither of which exist in the world as a specific quality, but are more just linguistic shorthands for a bundle of various abilities and qualities.

Hence the entire idea of generalized intelligence is a bit nonsensical, other than as another bundle of various abilities and qualities. What those are specifically doesn’t seem to be ever clarified before the term AGI is used.

Culonavirus

4 months ago

> I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["<insert general intelligence buzzword>"], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the <insert llm> involved in this case is not that.

https://en.wikipedia.org/wiki/I_know_it_when_I_see_it

whiplash451

4 months ago

Without going too deep into the rabbit hole, one could argue that, to a first approximation, intelligence is the ability to learn from experience towards a goal. In that sense, LLMs are not intelligent. They are just a (great) tool at the service of human intelligence. And so we’re just extremely far from machine intelligence.

nopinsight

4 months ago

A definition of AGI: https://www.agidefinition.ai/

A new contribution by quite a few prominent authors. One of the better efforts at defining AGI *objectively*, rather than through indirect measures like economic impact.

I believe it is incomplete because the psychological theory it is based on is incomplete. It is definitely worth discussing though.

--

In particular, creative problem solving in the strong sense, i.e. the ability to make cognitive leaps, and deep understanding of complex real-world physics, such as the interactions between animate and inanimate entities, are missing from this definition, among others.

mabedan

4 months ago

I’m surprised there’s no mention of creativity and outside-the-box thinking. Listening to this podcast, I was wondering if we could train an LLM with a knowledge cutoff right before transformers and ask it to come up with an ML method for LLMs. I’m quite sure none of today’s models would be able to (obviously without access to internet search).

spjt

4 months ago

I love vibe coding because it does the things I hate really well, like meeting test coverage requirements and writing doc comments.

cmrdporcupine

4 months ago

This of course depends completely on how you define "vibe" coding.

4 months ago

> Oh yeah? Instead of 5 years to render all human labor obsolete, it will take 20? The magnitude of that change is so large that the implications of it happening anytime in our lifetimes are too big to ignore.

While true, I would suggest two things:

First, that nobody actually knows how long it will take to make fully-general AI to drive robots, humanoid or otherwise. Look how long self-driving cars have taken, and that they're still geo-fenced.

Second, that it doesn't take AI for the robots themselves to have 90% of this impact. All those jokes about AI meaning "Actually Indians"? Well, the same robots, controlled not by artificial intelligence but remotely by cheap 3rd-world labourers who charge $5/day, will make current arguments about the effect of immigration on unemployment look laughably naïve. Likewise, unfortunately, crime, because one thing we can guarantee is that someone's going to share their password or access token and some rich person's cheap robot servant will become Mr. Stabby the unknown assassin.

rhetocj23

4 months ago

I think it's more likely that it won't happen and the hubris of folks like you is going to look comical in hindsight.

chronci739

4 months ago

> This is something that will happen.

Not in our lifetime.

The iPhone came out less than 20 years ago.

And what, you scan QR codes at restaurants with iPhones?

TeMPOraL

4 months ago

You do. And so does your non-technical mother (or a friend of your mother).

The impact of the iPhone and its competitors is felt everywhere, it diffused into every domain of people's lives. Think: the whole of social media was pretty much enabled by smartphones.

Or a more pedestrian, random example: every day I go to the office, I see endless store managers, restaurant managers, etc. walking around their stores, taking photos to upload to HQ. But this is merely a symptom; the actual consequence is the change in business structure. Because smartphones make this easy, franchise and subcontracted businesses become more viable, since it's easier for HQ to micromanage more semi-independent subordinates.

There are many, many more examples like this everywhere you look. Which is why I'm inclined to agree with Karpathy: computers, iPhones, and LLMs are all the same thing. They're just the more notable manifestations of how we've been staying on a 2% exponential growth curve for many hundreds of years now, and why we'll continue to stay on this curve.
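For a sense of scale, this is what a steady 2% annual growth rate compounds to. The sketch assumes plain annual compounding and says nothing about whether the historical rate really was 2%.

```python
import math

def doubling_time(annual_rate: float) -> float:
    """Years for a quantity compounding at `annual_rate` per year to double."""
    return math.log(2) / math.log(1 + annual_rate)

def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplication after `years` of compounding at `annual_rate`."""
    return (1 + annual_rate) ** years

print(round(doubling_time(0.02), 1))       # 35.0 (years per doubling)
print(round(growth_factor(0.02, 100), 2))  # 7.24 (growth over one century)
```

Roughly one doubling every generation is why a "constant" rate still feels like acceleration: each doubling adds as much as all previous growth combined.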

But the caveat is: that curve is getting steep enough that the world is starting to transform faster than we can handle.

ben_w

4 months ago

The iPhone came out less than 20 years ago, and now I:

• Don't get out my debit card while shopping.

• Don't get lost exploring a new city.

• Have zero-cost video calls with anyone I want.

• Use most spare moments of my time — walking to the shops, or on public transport, or while hiking in the countryside — learning something new. When I'm not too damp for the capacitive touch screen, that can be interactive lessons, not just passive; but even for the passive consumption, mobile internet beats pre-loaded content on an MP3 player.

• Have a real-time augmented-reality translator, for the German I've not yet learned while living in Berlin, and all the other languages I don't (or barely) know while travelling outside the country.

asadotzler

4 months ago

• Don't get out my debit card while shopping.

You take out your phone though. How is taking your phone out of your pocket, logging in, and tapping it on a terminal significantly different from pulling a credit card or cash from your pocket and tapping the terminal or handing it to the checker?

• Don't get lost exploring a new city.

You're young, I guess. We had GPS in cars well before iPhone. GPS navigation in cars was taking off mid-90s to mid-2000s. I had a Garmin in 2002.

• Have zero-cost video calls with anyone I want.

I was doing that on my laptop and desktop before iPhone. Heck, I was doing free video conferencing with European friends in 1995.

• Use most spare moments of my time

I did much of this filling in empty times on my laptops years before iPhone but you are right, not as much of it as with smartphones. Cramming my day full of even more noise, however, rather than having more breaks from it, feels like devolution to me.

• Have a real-time augmented-reality translator

This is an improvement over pocket electronic translators I was using in Japan in the early 2000s, but really the improvements are mostly in fidelity and usability, not in function.

Don't get me wrong, smartphones changed a lot, but it seems like you're eliding at least a decade of pre-iphone advancements here and focusing on when these tasks became easy and in everyone's hands, rather than when the tasks actually became possible and were in reasonably widespread use. You're not a youngster like many here, so I can't attribute that to naivete and that leaves me thinking haste was at work here. Happy to hear back why I'm wrong and willing to change my mind on any of these.

ben_w

4 months ago

> How is taking your phone out of your pocket, logging in, and tapping it on a terminal significantly different from pulling a credit card or cash from your pocket and tapping the terminal or handing it to the checker?

Biometric ID to make the payment. I don't so much "log in" as "touch the fingerprint scanner built into the button that switches the screen on". Though if I cared to wear it, I do also have an Apple Watch and would therefore not even need to take anything out of my pocket.

> You're young, I guess. We had GPS in cars well before iPhone. GPS navigation in cars was taking off mid-90s to mid-2000s. I had a Garmin in 2002.

Just about to turn 42. I saw GPS in use only a little later than that, 2005 I think. But:

1) dedicated GPS was never in everyone's pocket until smartphones became normalised; and even then, location precision was mediocre until assisted GPS got phased in (IIRC the first consumer phone with A-GPS was about a year before the iPhone?)

2) the maps were incredibly bad; my experience in 2005 included it thinking we were doing 70 miles an hour through a field because the main road we were on was newer than the device's map.

3) Phone map apps also include traffic alerts, public transport info including live updates for delays, altitude data (useful for cyclists), ratings and hours for seemingly most of the cafes/restaurants/other attractions, and simply have a lot more detail because they can afford to (e.g. many of the public toilets).

> I was doing that on my laptop and desktop before iPhone. Heck, I was doing free video conferencing with European friends in 1995.

Critical point: "with anyone I want". Almost every independently functioning person in Europe has a smartphone and can be contacted without waiting for them to sit down at a desk terminal connected to a fixed-line internet connection that happens to be switched on.

Back in 1995, most people didn't have the internet at all, so there was no possibility of calling them over the internet; those who did have it were either academics (yay JANET), had a relatively expensive wired ISDN line, or were on dialup (charged by the minute, with just about enough bandwidth for 3fps greyscale at 160x120 or so if the compression was what I think it was). And while mobile phones did exist back then, they were (1) unaffordable unless you were a yuppie, (2) camera-less, and (3) even lower-bandwidth than dialup because 2G.
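The dialup figure can be sanity-checked with a little arithmetic. This sketch takes the comment's 160x120 at 3fps and assumes 8-bit greyscale pixels and a 28.8 kbit/s modem (both assumptions); it ignores protocol overhead, so the real requirement was somewhat worse.

```python
def raw_video_bitrate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Uncompressed video bitrate in bits per second."""
    return width * height * bits_per_pixel * fps

# Frame size and rate from the comment; 8 bits/pixel and the modem speed are assumptions.
raw = raw_video_bitrate(160, 120, bits_per_pixel=8, fps=3)
modem_bps = 28_800

print(raw)              # 460800 (bit/s, uncompressed)
print(raw / modem_bps)  # 16.0 (compression ratio needed to fit the line)
```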

> This is an improvement over pocket electronic translators I was using in Japan in the early 2000s, but really the improvements are mostly in fidelity and usability, not in function.

I count "point camera at poster, see poster modified with translations overlaid over all text" as very much a change of function.

I mean, I don't need to translate Chinese, Japanese, Korean, or Arabic, but sometimes they come up in films and I get curious. I can't type any of those scripts in the first place, so the only way to translate them is with something like Google Translate (and its predecessor Word Lens) that does it all as a video stream.

> focusing on when these tasks became easy and in everyone's hands, rather than when the tasks actually became possible and were in reasonably widespread use.

For much of this, that's the point. As the quote goes, "The future's already here, it's just not evenly distributed". I assumed it would be clear that video calls can only be had with other people who also have video call equipment.

Or, looking forward, consider how there are cars with no-steering-wheel-needed full self-driving (even if Waymo has not actually removed the wheels), but they're geofenced. It's there, it's not everywhere.

With AI and human labour? Well, that's a two-part thing, the hardware and the software.

Hardware? I can buy a humanoid robot right now — it would be a bit silly, but I could, e.g.: https://de.aliexpress.com/item/1005009127396247.html

Software? The software running these robots can (just about) fold laundry, or tidy up litter and dishes — you know, all the things that people keep sarcastically listing to dismiss AI, saying "wake me up when they can XYZ": https://www.youtube.com/@figureai/videos

It's just… these robots are expensive, kinda slow, and the software gives me the same vibes I got from AI Dungeon (I think I saw it shortly after they changed away from GPT-2?), so I ask the same question of those today as I asked myself of a 3D printer in 2015, of an iPhone in 2010, of a multi-language electronic travel dictionary in 2009, of a dedicated GPS unit in 2005, of a laptop in 2002: can I really justify spending that much money on this thing? And my answer is the same: no.

I can't run the fanciest AI models on any of my devices; they won't fit, and I'd have to buy a much beefier machine. There's a whole bunch of things that the SOTA AI models themselves can't do yet, but which can be done by tools that AIs do know how to use, and I can't run all of those tools either. Any tool that gets invented in the next 20 years (or indeed ever), if it's documented at all in any language current LLMs can follow, is a tool those LLMs will be able to use.

Now don't get me wrong, I'm not holding my breath or saying this will be soon. I've opined before that the minimum gap between "a level-5 self driving car" and "a humanoid robot that can get into any old car and drive it equally well" is 5-10 years just because of the smaller form factor having less room for compute and battery. Also, it seems obvious that "all human labour" is a harder problem than "can drive". If (if!) it is necessary to have humanoid robots in order to render all human labor obsolete, then I would be surprised if it takes any less than 15 years from today, but could be more — easily more, and by an arbitrarily large degree. I don't think humanoid robots are necessary for this, which reduces my lower bound, but at the same time it is just a lower bound.

modeless

4 months ago

If robots "only" have an iPhone-sized impact on the world I think that would be surprising but also still a huge deal worth caring about.

K0balt

4 months ago

We will never achieve AGI, because we keep moving the goalposts.

SOTA models are already capable of outperforming any human on earth in a dizzying array of ways, especially when you consider scale.

Humans also produce nonsensical, useless output. Lots of it.

Yes, LLMs have many limitations that humans easily transcend.

But few if any humans on earth can demonstrate the breadth and depth of competence that a SOTA model possesses.

Relatively few (probably less than half) are casually capable of the level of reasoning that LLMs exhibit.

And, more importantly, as anyone in the field when neural networks were new is aware, AGI never meant human level intelligence until the LLM age. It just meant that a system could generalize one domain from knowledge gained in other domains without supervision or programming.

davesque

4 months ago

> But few if any humans on earth can demonstrate the breadth and depth of competence that a SOTA model possesses.

Most humans can count the occurrence of letters in a word. The word competence here is doing quite a bit of work. I think most people understand competence to mean more than just encyclopedic knowledge, with very limited reasoning capability.

> AGI never meant human level intelligence until the LLM age. It just meant that a system could generalize one domain from knowledge gained in other domains without supervision or programming.

I think it's probably correct to say that many people who seriously studied the problem had a larger notion of AGI than the layperson who only ever talked about the Turing test in the most basic terms. Also, I don't think LLMs have even convincingly demonstrated a great ability to generalize.

They're basically really great natural language search engines but for the fact that they give incorrect but plausible answers about 5-10% of the time.

K0balt

3 months ago

>> they give incorrect but plausible answers about 5-10% of the time.

This describes most of the human population as well. Why do we expect machines to be more accurate and perfect in their correctness than humans before we say they are at parity, when they are clearly as much savant as idiot? It’s a strange bias.

rlanday

4 months ago

> SOTA models are already capable of outperforming any human on earth in a dizzying array of ways, especially when you consider scale.

So why are so many people still employed as e.g. software engineers? People aren’t prompting the models correctly? They’re only asking 10 times instead of 20? They’re holding it wrong?

K0balt

4 months ago

Long form engineering tasks aren’t doable yet without supervision. But I can say in our shop, we won’t be hiring any more junior devs, ever, except as (in my region, free) interns or because of some extraordinary capabilities, insights, or skills. There just isn’t any business case for hiring junior devs to do the grunt work anymore.

But, the vast majority of work that is done in the world is not in the same order of magnitude of complexity or rigor that is required by long form engineering.

While models may not outperform an experienced developer, they will likely outperform her junior assistant, and a dev using AI effectively will almost certainly outperform a team of three without AI, in most cases.

The salient fact here is not that the human is outperformed by the model in a narrow field of extraordinary capability, but rather that the model can outperform that dev in 100 other disciplines, and outperform most people in almost any cerebral task.

My claim is not that models outperform people in all tasks, but that models outperform all people at many tasks, and I think that holds true with some caveats, especially when you factor in speed and scale.

naveen99

4 months ago

What does junior or senior have to do with it? I would think a smarter junior will run circles around a dumber senior engineer with LLM autocomplete.

K0balt

3 months ago

If you’re hiring dumb senior engineers you’re holding it wrong lol. Using LLMs is a lot like delegating to a team from a skills perspective, so it favors extensive domain knowledge. You don’t just commit whatever it writes, just like you wouldn’t commit what a junior dev writes without scrutiny. Experience makes that scrutiny more valuable and effective.

alganet

4 months ago

> We will never achieve AGI, because we keep moving the goalposts.

I think it's fair to do it to the idea of AGI.

Moving the goalpost is often seen as a bad thing (like, shifting arguments around). However, in a more general sense, it's our special human sauce. We get better at stuff, then raise the bar. I don't see a reason why we should give LLMs a break if we can be more demanding of them.

> SOTA models are already capable of outperforming any human on earth in a dizzying array of ways, especially when you consider scale.

Performance should include energy consumption. Humans are incredibly efficient at being smart while demanding very little energy.

> But few if any humans on earth can demonstrate the breadth and depth of competence that a SOTA model possesses.

What if we could? What if education mostly stopped improving in 1820 and we're still learning physics at school by doing exercises about train collisions and clock pendulums?

K0balt

4 months ago

I’m with you on the energy and limitations, and even on the moving of goalposts.

I’d like to add that I think the definition of AGI has jumped the shark, though, and is already at ASI, since we expect our machine to exhibit professional-level acumen across such a wide range of knowledge that it would be similar to the top 0.01 percent of career scholars and engineers, or even above any known human capacity just due to breadth of knowledge. And we also expect it to provide that level of focused interaction to a small city of people all at the same time, and to provide that knowledge 10,000 times faster than any human can.

I think definitionally that is ASI.

As someone who has used Tesla FSD iterations for 4 years, their current system is quite incredible, and improving rapidly. It drives for me 95% of the time already.

musebox35

4 months ago

And that last 5% is the toughest nut to crack. There is a reason Waymo is way ahead even if they cannot scale. Cameras are passive devices with relatively poor dynamic range and low-light behavior. They are nowhere near a match for, or a replacement for, the human eye. Just try to take a picture of a 5-year-old at dusk or indoors, and what you see will not be what you get.

Rover222

It's like saying your newborn will have the same mass as earth in 50 years if he continues on his first month weight gain trajectory.

Ianjit

3 months ago

METR uses a 50% success rate in that analysis, because the models are non-deterministic.

ben_w

4 months ago

METR measures tasks, not projects. No project I've worked on had individual tasks that were supposed to take longer than 2 weeks, the PM* broke them down to sub-tasks if they were any bigger.

* At least, where we had a PM. The places I was self-directed could arguably provide an interesting comparison.

cayleyh

4 months ago

"decade" being the universal time frame for "I don't know" :D

musebox35

4 months ago

Honestly, if you have any actual interest in LLMs or other generative ai variants, just go after a concrete goal post that you yourself set with measurable metrics to gauge your progress. Then the predicted timeline from podcasts and blog posts will become irrelevant. Experts and non-experts have both been terrible at predicting timelines since the dawn of ai. Self driving cars and llms are no exception. When you are making predictions based solely on intuition and experience it is mostly an extrapolation. It is not useless. It always helps to ask questions and try to frame the future within the bounds of our current understanding. But at the same time it is important to remember that this is just speculation, not empirical science. That is also why there is such varied opinions on the topic of ai timelines. Relax and enjoy witnessing a major leap in our understanding of natural language, vision, and high dimensional probabilistic vector spaces ;-)

ActorNightly

4 months ago

Not a decade. More like a century, and that is if society figures itself out enough to do some engineering on a planetary scale, and quantum computing is viable.

Fundamentally, AGI requires 2 things.

First, it needs to be able to operate without information, learning as it goes. The core kernel should be such that it doesn't have any sort of training on real-world concepts, only general language parsing that it can use to map to some logic structure to be able to determine a plan of action. So, for example, if you give the kernel the ability to send Ethernet packets, it should eventually figure out how to speak TLS to communicate with the modern web, even if that takes an insane amount of repetition.

The reason for this is that you want the kernel to be able to find its way through any arbitrarily complex problem space. Then as it has access to more data, whether real time, or in memory, it can be more and more efficient.

This part is solvable. After all, human brains do this. A single rack of Google TPUs is roughly the same petaflops as a human brain operating at max capacity, if you assume neuron activation is an add-multiply and a firing rate of 200 times/second, and humans don't use all of their brain all the time.
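The brain side of that comparison can be estimated as below. The comment doesn't say whether the add-multiply happens per neuron or per synapse, so this sketch computes both, assuming roughly 86 billion neurons and roughly 10^14 synapses (commonly cited orders of magnitude). The TPU-rack figure itself is not checked here.

```python
def brain_flops(units: int, firing_hz: int, flops_per_event: int = 2) -> int:
    """FLOP/s if every unit does one multiply-add (2 FLOPs) per firing event."""
    return units * firing_hz * flops_per_event

NEURONS = 86 * 10**9   # assumption: ~86 billion neurons
SYNAPSES = 10**14      # assumption: ~100 trillion synapses
FIRING_HZ = 200        # from the comment

per_synapse = brain_flops(SYNAPSES, FIRING_HZ)  # 4 * 10**16 FLOP/s
per_neuron = brain_flops(NEURONS, FIRING_HZ)    # 3.44 * 10**13 FLOP/s

print(f"{per_synapse / 10**15:.0f} PFLOP/s per-synapse, "
      f"{per_neuron / 10**12:.1f} TFLOP/s per-neuron")
```

Only the per-synapse reading lands in the petaFLOP/s range the comment describes; the per-neuron reading is about a thousand times smaller.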

The second part that makes the intelligence general is the ability to simulate reality faster than reality. Life is imperative by nature, and there are processes with chaotic effects (human brains being one of them) that have no good mathematical approximations. As such, if an AGI can truly simulate a human brain to be able to predict behavior, it needs to do this at an approximation level that is good enough, but also fast enough that it can predict your behavior before you exhibit it, with overhead for also running simulations in parallel and figuring out the best course of action. So for a single brain, you are looking at probably six full warehouses of TPUs.

dist-epoch

4 months ago

AIs already fake-simulate the weather (chaotic system) using 1% of the resources used by the real-simulating supercomputers.

zebrawaffles

4 months ago

Source?

ben_w

4 months ago

University of Washington in collaboration with Microsoft: https://www.washington.edu/news/2025/08/25/ai-simulates-1000... and https://www.washington.edu/news/2020/12/15/a-i-model-shows-p... the latter reporting a 7000x improvement, reducing it to 0.014% of the required compute.
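The step from "7000x improvement" to "0.014% of the compute" is just the reciprocal, expressed as a percentage. A minimal check:

```python
def remaining_compute_percent(speedup: float) -> float:
    """Share of the original compute still needed after a given speedup, in percent."""
    return 100.0 / speedup

print(f"{remaining_compute_percent(7000):.3f}%")  # 0.014%
print(f"{remaining_compute_percent(100):.0f}%")   # 1% (what a 100x reduction would mean)
```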

I'm surprised you missed it, given there are several other models in this space:

From NVIDIA: https://www.nvidia.com/en-us/high-performance-computing/eart...

Google: https://deepmind.google/science/weathernext/

And this is a different model from Microsoft, this time a collaboration with Cambridge University: https://www.microsoft.com/en-us/research/blog/introducing-au...

ctoth

4 months ago

You want a "core kernel" with "general language parsing" but no training on real-world concepts.

Read that sentence again. Slowly.

What do you think "general language parsing" IS if not learned patterns from real-world data? You're literally describing a transformer and then saying we need to invent it.

And your TLS example is deranged. You want an agent to discover the TLS protocol by randomly sending Ethernet packets? The combinatorial search space is so large this wouldn't happen before the sun explodes. This isn't intelligence! This is brute force with extra steps!

Transformers already ARE general algorithms with zero hardcoded linguistic knowledge. The architecture doesn't know what a noun is. It doesn't know what English is. It learns everything from data through gradient descent. That's the entire damn point.

You're saying we need to solve a problem that was already solved in 2017 while claiming it needs a century of quantum computing.

dang

4 months ago

Please make your substantive points without swipes or name-calling. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.

ActorNightly

4 months ago

>What do you think "general language parsing" IS if not learned patterns from real-world data?

I want you to hertograize the enpostule by brasetting the leekerists, while making sure that the croalbastes are not exhibiting any ecrocrafic effects

Whatever you understand about that task is what a kernel will "understand" as well. And however you go about solving it, the kernel will also follow similar patterns of behaviour (starting with figuring out what hertograize means, which then leads to other tasks, and so on).

>You want an agent to discover the TLS protocol by randomly sending ethernet packets? The combinatorial search space is so large this wouldn't happen before the sun explodes.

In pure combination, yes. In smart directed intelligent search, no. Ideally the kernel could listen for incoming traffic, and figure out patterns based on that. But the point is that the kernel should figure out that listening for traffic is optimal without you specifically telling it, because it "understands" the concept of other "entities" communicating with it and that communication is bound to be in a structured format, and has internal reward systems in place for figuring it out through listening rather than expending energy brute force searching.

Whatever that process is, it will get applied to much harder problems identically.

>Transformers already ARE general algorithms with zero hardcoded linguistic knowledge. The architecture doesn't know what a noun is. It doesn't know what English is. It learns everything from data through gradient descent. That's the entire damn point.

It doesn't learn what a noun is or english is, its a statistical mapping that just tends to work well. LLMs are just efficient lookup maps. Lookup maps can go only so far as to interpolate on the knowledge encoded within them. These can simulate intelligence in the sense of recursive lookups, but fundamentally that process is very guided, hence all the manual things like prompt engineering, MCP servers, agents, skills and so on.

ben_w

4 months ago

> It doesn't learn what a noun is or english is, its a statistical mapping that just tends to work well.

The word for creating that statistical map is "learning".

Now, you could argue that gradient descent or genetic algorithms or whatever else we have are "slow learners", I'd agree with that, but the weights and biases in any ML model are most definitely "learned".
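To make the point concrete, here is about the smallest "statistical map" whose parameters are learned by gradient descent: a one-variable linear fit. It is a toy, not a transformer. The model is never told the rule that generated the data (the toy data and learning rate are made up for illustration); it recovers the parameters purely from examples.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Learn weight w and bias b for y ≈ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw
        grad_b = 2 / n * sum(errors)                             # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]   # toy data generated by y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Whether that counts as "learning" in the philosophical sense is the disagreement above; mechanically, the weights start at zero and end up encoding the pattern in the data.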

RivieraKid

4 months ago

This would be great if true, I need 5 years to reach financial independence, so a decade should be plenty of time.

reenorap

4 months ago

Are "agents" just programs that call into an LLM and, based on the response, do something?

fragmede

4 months ago

"Something" is broad and not well defined, but basically yeah. Rather than try to define it in terms of complexity of the something, I'll put it in terms of minutes. If the LLM returns a response, and that response gets fed into a system and run, and that's it, I wouldn't really call that agentic. It's got to go a few more rounds back and forth to be agentic, imo. In terms of time, I'd say the agent program has to be capable of at least 10 minutes of going from user input, then the program calling into the LLM, feeding the LLM response into a system, feeding that result back into the LLM, and feeding that into the system in a loop. Obviously there are ways to game that metric, like the terrible lines of code metric, but I think it's a decent handwave for when it feels like there's an agent working for me rather than a non-agentic system. What it's doing for those 10 minutes is important, calling "sleep 600" obviously doesn't count.

E.g., a programming agent with access to a computer would be able to, given design-doc.md and Todo.md, implement feature X, make sure it compiles, run some basic smoke tests, write appropriate unit tests, make sure they all pass, and finally push the code and create a draft PR.

Naturally, not every call into the agent is going to take the full 10 minutes. It may need to ask questions before getting started, or stop if there's an unrecoverable error. Sometimes you'll just need to tell it "continue", but the system should be capable of a 10-minute run (hopefully longer!) given enough support.
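The loop described above (model output into a system, system result back into the model, repeat) can be sketched in a few lines. `call_llm` and `run_in_system` are hypothetical callables standing in for a real model API and a real sandbox, and `"DONE"` is a made-up stop signal; nothing here is a specific framework's API.

```python
import time
from typing import Callable, List

def run_agent(task: str,
              call_llm: Callable[[List[str]], str],
              run_in_system: Callable[[str], str],
              time_budget_s: float = 600.0,
              max_rounds: int = 100) -> List[str]:
    """Feed model output into a system and the system's result back to the model,
    until the model says DONE or the time/round budget runs out."""
    transcript = [f"user: {task}"]
    deadline = time.monotonic() + time_budget_s
    for _ in range(max_rounds):
        if time.monotonic() > deadline:
            transcript.append("agent: time budget exhausted")
            break
        action = call_llm(transcript)           # model proposes the next step
        transcript.append(f"llm: {action}")
        if action == "DONE":                    # hypothetical stop signal
            break
        result = run_in_system(action)          # e.g. compile, run tests, push a branch
        transcript.append(f"system: {result}")  # fed back to the model next round
    return transcript

def make_fake_llm(actions: List[str]) -> Callable[[List[str]], str]:
    """Scripted stand-in for a real model: replays canned actions, then says DONE."""
    remaining = iter(actions)
    def fake_llm(transcript: List[str]) -> str:
        return next(remaining, "DONE")
    return fake_llm

def fake_system(action: str) -> str:
    """Stand-in for a real sandbox; a real one would actually execute the action."""
    return f"ok: {action}"

for line in run_agent("implement feature X",
                      make_fake_llm(["compile", "run smoke tests", "open draft PR"]),
                      fake_system):
    print(line)
```

Here `time_budget_s` is a cap that keeps the loop from running forever, not the comment's 10-minute capability threshold; 600 seconds is only an illustrative default.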

sammyd56

4 months ago

An agent is just an LLM calling tools in a loop. If you're a "show me the code" type person like me, here's a worked example: https://samdobson.uk/posts/how-to-build-an-agent/

cootsnuck

4 months ago

Kinda. It's just an LLM that performs function calling (i.e. the LLM "decides" when a function needs to be called for a task and passes the appropriate function name and arguments for that function based on its context). So yea an "agent" is that LLM doing all of that and then your program that actually executes the function accordingly.

That's an "agent" at its simplest -- a LLM able to derive from natural language when it is contextually appropriate to call out to external "tools" (i.e. functions).
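The "program that actually executes the function" half can be sketched as a small dispatcher. This assumes the model emits JSON shaped like `{"name": ..., "arguments": {...}}`; real providers each define their own (similar) format, and the tool names below are hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry: names and functions are made up for illustration.
TOOLS: Dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "word_count": lambda text: len(text.split()),
}

def execute_tool_call(llm_output: str, tools: Dict[str, Callable[..., Any]]) -> str:
    """Run the function the model chose and return the result as JSON to feed back."""
    call = json.loads(llm_output)
    name, args = call["name"], call.get("arguments", {})
    if name not in tools:
        # Report the mistake back to the model instead of crashing the loop.
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps({"result": tools[name](**args)})

print(execute_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}', TOOLS))
# {"result": 5}
```

The model only ever chooses a name and arguments; the surrounding program decides what those names are allowed to do.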

tim333

4 months ago

>I feel like when I'm awake, I'm building up a context window of stuff that's happening during the day. But when I go to sleep, something magical happens where I don't think that context window stays around.

• Video interview: https://www.youtube.com/watch?app=desktop&v=evSFeqTZdqs

That said, I've not seen work that looks promising for the problem of, as he phrased it: "They don’t have continual learning. You can’t just tell them something and they’ll remember it."

Saying any specific timeframe for that, 10 years or anything else, seems too certain. Some breakthrough might already exist and be unknown, but on the other hand it may require a fundamental advancement in mathematics in order to make it possible to find something at least close to optimal in a billion-dimensional (or whatever) vector space with only the first few dozen examples.

Ianjit

3 months ago

METR's study uses a 50% success rate. For most enterprise applications, a 50% pass rate is unacceptable in automation; it needs to be more like >99.9%.
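To see why the two thresholds are so far apart, here is a sketch under an assumed logistic success curve. The curve shape, the 60-minute 50% horizon, and the slope values are all hypothetical; they are not METR's fitted numbers.

```python
def horizon_at_success(h50: float, target: float, slope: float = 1.0) -> float:
    """Task length at which predicted success probability equals `target`.

    Assumes a hypothetical logistic curve in log task length,
        p(t) = 1 / (1 + (t / h50) ** slope),
    which gives exactly 50% success at t = h50 and falls off for longer tasks.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return h50 * ((1.0 - target) / target) ** (1.0 / slope)

H50_MINUTES = 60.0  # hypothetical 50% horizon, not a METR figure
for slope in (1.0, 2.0):
    print(slope,
          horizon_at_success(H50_MINUTES, 0.5, slope),
          horizon_at_success(H50_MINUTES, 0.999, slope))
# slope 1: 60 min at 50% vs ~0.06 min at 99.9%; slope 2: 60 min vs ~1.9 min
```

Under this kind of curve, demanding 99.9% instead of 50% shrinks the usable task length by one to three orders of magnitude, depending on how steep the curve is.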

psadri

4 months ago

When I think about a problem, I consciously explore a tree (or graph) of possibility chains. This requires a mental space to keep track of “state”. Sometimes jotting things down on paper helps if I can’t keep it all in my head. The process is:

- generate some possibilities
- rank them based on intuition (this might happen subconsciously!)
- ask what if we follow possibility Pn
- push Pn on to the stack
- recurse, or pop the stack if it's a dead end
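That process is essentially depth-first search with an explicit stack and a ranking heuristic. A minimal sketch; `expand`, `score`, `is_goal`, and `is_dead_end` are hypothetical callbacks standing in for "generate", "intuition", and the judgement calls, and the toy problem is made up.

```python
from typing import Callable, Hashable, Iterable, List, Optional

def explore(start: Hashable,
            expand: Callable[[Hashable], Iterable[Hashable]],
            score: Callable[[Hashable], float],
            is_goal: Callable[[Hashable], bool],
            is_dead_end: Callable[[Hashable], bool],
            max_nodes: int = 10_000) -> Optional[List[Hashable]]:
    """Depth-first search over possibility chains with an explicit stack.

    Higher `score` means "looks more promising"; the best-ranked child is followed first.
    """
    stack = [[start]]              # each entry is the chain of choices made so far
    seen = {start}
    for _ in range(max_nodes):
        if not stack:
            break
        path = stack.pop()         # most recently pushed, i.e. best-ranked, chain
        state = path[-1]
        if is_goal(state):
            return path
        if is_dead_end(state):
            continue               # dead end: drop it and fall back to an earlier branch
        # Generate possibilities, drop repeats, rank them; push the weakest first
        # so the strongest ends up on top of the stack and is followed next.
        children = [c for c in dict.fromkeys(expand(state)) if c not in seen]
        for child in sorted(children, key=score):
            seen.add(child)
            stack.append(path + [child])
    return None

# Toy problem: reach 10 from 1 using "+1" or "*2"; anything above 10 is a dead end.
path = explore(1,
               expand=lambda n: [n + 1, n * 2],
               score=lambda n: -abs(10 - n),
               is_goal=lambda n: n == 10,
               is_dead_end=lambda n: n > 10)
print(path)  # [1, 2, 4, 8, 9, 10]
```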

I feel LLMs are fairly capable when it comes to doing each of those steps in isolation. But not when it is all put together as a process.

atleastoptimal

4 months ago

I think he is bearish about agentic workflows because he works at the very highest level of coding. An agentic Karpathy is a few doubling cycles beyond an agentic junior engineer. Agents (or just LLMs in a loop that correct their errors) are very reliable for less complex tasks now, and they're still getting better at an exponential rate.

By current projections, we are still on trend to reach human parity in many domains by 2027-2028; the only thing that would prevent this is a major unexpected slowdown in AI progress.

johnhamlin

4 months ago

Did anyone here actually watch the video before commenting? I’m seeing all the same old opinions and no specific criticisms of anything Karpathy said here.

dang

4 months ago

More specific responses have come in as people have digested more of the content.

This is the reflexive/reflective distinction (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...). Reflexive comments—the kind that express some pre-existing feeling or opinion that happens to get triggered by association—are much faster to produce, so unfortunately they show up first in many threads.

christkv

4 months ago

They are probably right, and it is not anywhere close to a general intelligence, but it still provides a bunch of value as long as it’s used in your own expert domain and you are not a lazy slob. We really get used to magic quickly these days. It wasn't that long ago that a Google employee was warning the world about Skynet (an internal early LLM, I guess) and got fired.

asdev

4 months ago

Are researchers scared to just come out and say it because they'll be labeled as wrong if the extreme tail case happens?

andy_ppp

4 months ago

No, it’s because of money and the hype cycle.

ionwake

4 months ago

I don’t think they’re scared, I think they know it’s a lose-tie game.

If you’re correct, there’s not much reward aside from the “I told you so” bragging rights; if you’re wrong, though, boy oh boy, you’ll be deemed unworthy.

You only need to get one extreme prediction right (stock market collapse, AI taking over, etc.), then you’ll be seen as “the guru”, the expert, the one who saw it coming. You’ll be rewarded by being invited to boards, panels and government councils to share your wisdom, and be handsomely paid to explain, in hindsight, why it was obvious to you, and express how baffling it was that no one else could see what you saw.

On the other hand, if you predict an extreme case and get it wrong, there’s virtually no penalty; no one will hold that against you, and no one even remembers.

So yeah, fame and fortune are in taking many shots at predicting disasters, not the other way around.

themafia

4 months ago

5 decades.

4 months ago

> If you don't optimize power consumption you're going to increase surface area required to build it. There are hard physical limits having to do with signal propagation times.

While true, that probably stopped being an important constraint around the time we switched from thermionic valves to transistors as the fundamental unit of computation.

To be deliberately extreme: if we built cubic-kilometre-scale compute hardware where each such structure only modelled a single cortical column from a human's brain, and then spread multiple of these out evenly around the full volume within Earth's geosynchronous orbital altitude until we had enough to represent a full human brain, the signal propagation times would still be on par with those of human synapses.

Synapses just aren't very fast.

nurettin

I remember attending a lecture from a famous quantum computing researcher in 2003. He said that quantum computing is 15-20 years away and then he followed up by saying that if he told anyone it was further away then he wouldn't get funding!

EA-3167

4 months ago

It's an excellent time-frame that sounds imminent enough to draw interest (and funding), but is distant enough that you can delay the promised arrival a few times in the span of a career before retiring.

Fusion research lives and dies on this premise, ignoring the hard problems that require fundamental breakthroughs in areas such as materials science, in favor of touting arbitrary benchmarks that don't indicate real progress towards fusion as a source of power on the grid.

"Full self driving" is another example; your car won't be doing this, but companies will brag about limited roll-outs of niche cases in dry, flat, places that are easy to navigate.

bhelkey

4 months ago

> companies will brag about limited roll-outs of niche cases in dry, flat, places that are easy to navigate

According to their website, Waymo offers autonomous rides to the general public in Austin, Atlanta, Phoenix, the San Francisco Bay Area, and Los Angeles [1].

* San Francisco is an extremely hilly city that gets a fair bit of fog.

* Los Angeles has notorious traffic and particularly aggressive drivers.

* Atlanta gets ~50 inches of rain a year, more than Seattle [2].

[1] https://waymo.com/faq/#:~:text=Where%20does%20Waymo%20operat...

[2] https://www.forbes.com/sites/marshallshepherd/2024/09/03/whi...

plastic3169

4 months ago

> "Full self driving" is another example; your car won't be doing this, but companies will brag about limited roll-outs of niche cases in dry, flat, places that are easy to navigate.

I'm not expecting my car to be self-driving anytime soon, but my understanding is that there is an actual working robotaxi service in San Francisco, which is neither easy nor flat. I think we can’t keep saying self-driving cars will never happen when this kind of thing already exists.

EA-3167

4 months ago

It's true that SF isn't flat, but it's incredibly well mapped, it never snows and you don't have to worry about roads ravaged by frost-heaves. There's a reason that the new Doordash automated delivery service is starting off in Phoenix and not Boston for example.

Yoric

4 months ago

And now (useful) quantum computing is 5 years away! Has been for a few years, too.

oldgradstudent

4 months ago

Any day now.

kordlessagain

4 months ago

People keep talking about AGI as if it's some mystical leap beyond human capability.

But let's be honest; software development at a modern startup is already the upper bound of applied intelligence. You're juggling shifting product specs, ambiguous user feedback, legacy code written by interns, and five competing JS frameworks, all while shipping on a Friday. Models can now do that. They can reason about asynchronous state, refactor a codebase across thousands of lines, and actually explain the difference between useEffect and useLayoutEffect without resorting to superstition.

If that's not general intelligence, what exactly are we waiting for - self-awareness?

blueside

4 months ago

LLMs have continually taught me that we have vastly overestimated human intelligence

woadwarrior01

4 months ago

Perhaps we're overestimating human intelligence and underestimating animal intelligence. Also funny that current LLMs are incapable of continual learning themselves.

teleforce

4 months ago

>LLMs have continually taught me that we have vastly overestimated human intelligence

LLMs have continually taught me that we have vastly underestimated human intelligence, fixed that for you

hax0ron3

4 months ago

Models can't do that now, though. If they could, pretty much every human software engineer would be unemployed right now.

4 months ago

Why does everyone have such short timelines to show progress? So what if it takes 50 years to develop? We’ll have AGI for the next million years.

ben_w

4 months ago

If it takes 50 years, you might have it for millions of years along with anti-aging solutions that actually work, but I'll probably have died of old age.

I don't know how much wish fulfilment there is in people's timelines.

awesome_dude

4 months ago

I have massive respect for Andrej, my first encounter with "him" was following his tutorials/notes when he was a grad student/tutor for AI/ML.

I was quite disappointed when he went to work for Tesla, and I think that he had some achievement there, but not nearly the impact I believe he potentially has.

His switch (back?) to OpenAI was, in my mind, much more in keeping with where his spirit really lies.

So, with that in mind, maybe I've drunk too much Kool-Aid, maybe not. But I'm in agreement with him: LLMs are not AGI. They're bloody good natural language processors, but they're still regurgitating rather than creating.

Essentially that's what humans do, we're all repeating what our education/upbringing told us worked for our lives.

But we all recognise that what we call "smart" is people recognising/inventing ways to do things that did not exist before. In some cases it's about applying a known methodset to a new problem; in others it's about using a substance/method in a way that other substances/methodsets are used, but the different substance/methodset produces something interesting (think: instead of boiling food in water, we can boil food in animal fats... frying).

AI/LLMs cannot do this, not at all. That spark of creativity is agonisingly close, but, like all 80/20 problems, is likely still a while away.

As for the timeline (10 years): it was in the early 2010s (over 10 years ago now) that the idea of backward propagation, after a long AI winter, finally came of age. The idea had been floating about since at least the 1970s. That, along with "Deep Learning", ushered in the start of our current revolution (albeit with at least another AI winter spanning the last 4 or 5 years until LLMs arrived).

So, given that timeline and the constraints of the current technology, I think that Andrej is on the right track, and it will be interesting to see where we are in ten years' time.

chasd00

4 months ago

If OpenAI hadn't put a chat interface in front of an LLM and made it available to the public, wouldn't we still be in the same AI winter? Google, Meta, Microsoft, all of the major players were already doing lots of LLM work; it wasn't until the general public found out through OpenAI's website that it really took off. I can't remember who said it (some CEO), but the claim was that OpenAI had no moat, and neither did anyone else. They all had LLMs of their own already. Was the breakthrough the LLM, or making it accessible to the general public?

robotswantdata

4 months ago

The meme in 22/23 was OpenAI was really “Available AI”

throwaway-0001

4 months ago

How to tell if you regurgitated this comment vs being truly creative? If you can show me objectively, I’m sold.

password54321

4 months ago

You know LLMs are regurgitating when they contradict their own statements just because you clicked 'redo' on a prompt. I doubt that if you asked a person the same question twice, they would suddenly say the complete opposite of what they just said.

Comparing LLMs trained on reddit comments and people who learn to speak as a byproduct of actually interacting with people and the world is nuts.

awesome_dude

4 months ago

That's not the creativity aspect, my comment is an observation, which, by definition, is a regurgitation of events.

Edit: This also demonstrates that people think (erroneously) that AI pumping out code, or content, or even essays, is inventive, but it's not.

This is merely a description and reduction, both of which AI can do, but neither of which are an invention.

throwaway-0001

4 months ago

Actually, I think the line between creating and regurgitating is so blurred that you can’t tell me a single creative thing you did. So if 99% of people are not creative and just regurgitate, then why do we hold AI to such high standards?

Can you show me one single thing you did in your life that was truly creative and not regurgitated?

awesome_dude

4 months ago

I think that was my point, I generally regurgitate. A person can do that a lot in life.

That's why people are conflating LLMs for AGI.

For now, I think that the key difference between me and an LLM is that an LLM still needs a prompt.

It's not surveying the world around it determining what it needs to do.

I do a lot of something that I think an LLM cannot yet do: look at things, try to find what attributes they have, and work out how I can harness those to solve problems. Most of the attributes are unknown to the human race when I start.

throwaway-0001

4 months ago

Your first prompt was just biological.

So if I make an AI with a prompt and tell it to re-prompt itself every day for the rest of its life, does that mean it's smart now? Or is it invalid just because I gave it the first prompt? I doubt your first prompt was given by yourself. Your first prompt was probably in your mum's belly.

---

I could give an initial prompt to my AI to survey the server and act accordingly… and it can re-prompt itself every day.

---

> I do a lot of something that I think an LLM cannot yet do: look at things, try to find what attributes they have, and work out how I can harness those to solve problems. Most of the attributes are unknown to the human race when I start.

Any examples? An AI can look at a conversation and extract insights better than most people, and negotiate better than most people.

---

I've heard nothing that you can do that an LLM can't. I don’t think self-prompting is a differentiator.

You also prompt yourself based on previous feedback, and you've done this since you were a baby. So someone also gave you the source prompt. Maybe DNA.

awesome_dude

4 months ago

I do tire of your attempts to "corner" me into something I have no interest in doing.

I don't believe you have the capacity to understand why AGI hasn't been realised yet, and, frankly, I doubt you ever will.

throwaway-0001

4 months ago

So in the end you had no objective way to differentiate yourself from an LLM?

awesome_dude

4 months ago

ecocentrik

4 months ago

LLMs are close enough to pass the Turing Test. That was a huge milestone. They are capable of abstract reasoning and can perform many tasks very well but they aren't AGI. They can't teach themselves to play chess at the level of a dedicated chess engine or fly an airplane using the same model they use to copypasta a React UI. They can only fool non-proficient humans into believing that they might be capable of doing those things.

password54321

4 months ago

The Turing Test was a thought experiment, not a real benchmark for intelligence. If you read the paper the idea originated from, it is largely philosophical.

As for abstract reasoning, if you look at ARC-2, they are barely capable, though at least some progress has been made on the ARC-1 benchmark.

ecocentrik

4 months ago

I wasn't claiming the Turing Test was a benchmark for intelligence but the ability to fool a human into thinking a machine is intelligent in conversation is still a significant milestone. I should have said "some abstract reasoning". ARC-2 looks promising.

password54321

4 months ago

2. Andrej thinks that GPT5 pro is SOTA for code? Really? As a Sonnet normie.. can anyone please help me understand this?

edit:

3. You can't see any major tech developments on the GDP growth chart? Really? WTF? Have we all been smoking tech crack this whole time? So GDP didn't grow extra from any single tech development, like the Internet? This broke my brain.

disclaimer: On the daily, I use LLM dev tools to add amazing LLM-enabled features to my pre-money SaaS. It's really cool and users love the features.

sheepscreek

4 months ago

In my case at least, it’s very good at following all instructions to the T. Claude 4.0 (haven’t used it much since 4.5 came out) would often miss some key things in my instructions. The output is very high quality as well. Many things (even complex coding tasks) work well in one shot.

For extremely complex multi-step problems, though, it may need some help in breaking the tasks down into more manageable chunks. But it will eventually ace it. As an example, I had good success with a project that involved:

- Rewriting all internals in a dotnet/C# application to use Apache Arrow types for data through the entire pipeline

- Adapting the architecture to be streaming-first instead of working through the entire data in each stage

- Designing and implementing a complex system that creates many different projections of the data based on everything that has been read in the stream so far, and creates multiple outputs based on those, in parallel, as the stream is being read in real time

- Recreating a prototype of the entire project in Rust

rcarmo

4 months ago

Hmm. Zeno's Paradox.

(I was in college during the first AI Winter, so... I can't help but think that the cycles are tighter but convergence isn't guaranteed.)

aiauthoritydev

4 months ago

Glad to see someone being honest here.

awongh

4 months ago

Now that Nvidia is the most valuable company, all this talk of actual AGI will be washed away by the huge amount of dollars driving the hype train.

Most of these companies' value is built on the idea of AGI being achievable in the near future.

AGI being too close or too far away affects the value of these companies- too close and it'll seem too likely that the current leaders will win. Too far away and the level of spending will seem unsustainable.

michaelt

4 months ago

Oh, that's odd. This comment was intended for the vibe-coded.lol post

https://news.ycombinator.com/item?id=45622944

Must have been the flu-brain misfiring

konart

4 months ago

2035 singularity etc

spydum

4 months ago

2038 will be more significant

edbaskerville

4 months ago

superconduct123

4 months ago

I always get a weird feeling when AI researchers and CS people start talking about comparisons between human brains and AI/computers

superconduct123

4 months ago

Fair enough, I guess it's a bit different nowadays since the background is usually a PhD in compsci.

arawde

4 months ago

From personal experience making the same comparisons during undergrad, I think it just comes down to the availability of conceptual models. If the brain does X, there's a good chance that a computer does something that looks like X, or that X could be recreated through steps Y & Z, etc.

Once I started to realize just how much of the brain is inscrutable, because it is a machine operating on chemicals instead of strict electrical processing, I became a lot more reluctant to draw those comparisons

4 months ago

I've also found this jarring, and it speaks to the hubris of folks who have emerged in the past few decades who don't seem to have much relation to the humanities and liberal arts.

jjulius

4 months ago

>Why is there a presumption that we (as people who have only studied CS) know enough about biology/neuroscience/evolution to make these comparisons?

Hubris.

rootusrootus

4 months ago

Exactly. Someone way back when decided to call them neural networks, and now a lot of people think that they are a good representation of the real thing. If we make them fast enough, powerful enough, we'll end up with a brain!

Or not.

voidhorse

Now that is hubris.

GoatInGrey

4 months ago

This seems naively dismissive of arguments around substrates considering that playing "Go at superhuman levels" took 1MW of energy versus the 1-2 (or if you want to assume 100% of the brain was applied to the game, 20) watts consumed by the human brain.

__loam

4 months ago

How many examples did each system need to get good at the task too? It's currently a lot less for humans and we don't know why.

JumpCrisscross

4 months ago

> Your brain is doing computation with neurotransmitters instead of transistors

If it is, sure. But this isn't a given. We don't actually understand how the brain computes, as evidenced by our inability to simulate it.

> Evolution didn't discover some mystical process that imbues meat with special properties

Sure. But the complexity remains beyond our comprehension. Against the (nearly) binary action potential of a transmitter we have a multidimensional electrochemical system in the brain which isn't trivially reduced to code resembling anything we can currently execute on a transistor substrate.

> These systems translate languages, write code, play Go at superhuman levels, and pass medical licensing exams... all tasks you'd have sworn required "real understanding" a decade ago

Straw man. Who said this? If anything, the symbolic linguists have been overpromising on this front since the 1980s.

ben_w

4 months ago

> Straw man. Who said this? If anything, the symbolic linguists have been overpromising on this front since the 1980s.

I'm sure I've seen people say this about language translation and playing go. Ditto chess, way back before Kasparov lost. I don't think I've seen anyone so specific as to say that about medical licensing exams, nor as vague as "write code", but on the latter point I do even now see people saying that software engineering is safe forever with various arguments given…

JumpCrisscross

4 months ago

Fair enough. I’m not going to argue nobody said anything. What I’ll contest is that anyone of consequence said it with consequence. These beliefs didn’t slow down the field. They didn’t stop it from raising capital or attracting engineers.

ctoth

4 months ago

Jonas & Kording showed that neuroscience methods couldn't reverse-engineer a simple 6502 processor [0]. If the tools can't crack a system we built and fully documented, our inability to simulate brains just means we're ignorant, not that substrate is magic. It also doesn't necessarily say great things for neuroscience!

And "who said this?"... come on. Searle, Dreyfus, thirty years of "syntax isn't semantics," all the hand-wringing about how machines can't really understand because they lack intentionality. Now systems pass those benchmarks and suddenly it's "well nobody serious ever thought that mattered." This is the third? fourth? tenth? round of goalpost-moving while pretending the previous positions never existed.

Pointing at "multidimensional electrochemical complexity" is just phlogiston with better vocabulary. Name something specific transformers can't do?

[0] https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...

JumpCrisscross

4 months ago

> If the tools can't crack a system we built and fully documented, our inability to simulate brains just means we're ignorant, not that substrate is magic

Nobody said the substrate is magic. Just that it isn't understood. Plenty of CS folks have also been trying to simulate a brain. We haven't figured it out. The same logic that tells you the neuroscientific model is broken at some level should inform that the brains-as-computers model is similarly deficient.

> Pointing at "multidimensional electrochemical complexity" is just phlogiston with better vocabulary

Sorry, have you figured out how to simulate a brain?

Multidimensional because you have more than one signalling chemical. Electrochemical because you can't just watch what the electrons are doing.

> Name something specific transformers can't do?

That *what* can't do? A neuron? A neurotransmitter-receptor system? We literally can't simulate these systems beyond toy models. We don't even know what the essential parts are: can you safely lump together N neurotransmitter molecules? What's N? We're still discovering new ion channels?!

voidhorse

4 months ago

I'm curious what you think understanding means.

I personally do not think operational proficiency and understanding are equivalent.

I can do many things in life pretty well without understanding them. The phenomenon of understanding seems distinct from the phenomenon of doing something/acting proficiently.

ben_w

4 months ago

dang

4 months ago

It takes time for more reflective comments to appear, because reflection is a slower mental operation. Reflexive responses are much faster and tend to be generic and shallow. (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...)

4 months ago

dang

4 months ago

99%? I have to stick up for HN here!

meowface

4 months ago

Ok, maybe not 99%. Probably at least 50% of comments in 70% of threads, though...

tauchunfall

4 months ago

there is a transcript, people can skim for interesting parts and read for 30 minutes and then comment.

edit: typo fix.

jlhawn

4 months ago

gotta listen at 2x speed!

therealmarv

4 months ago

a very human reaction ;)

jb1991

4 months ago

I would bet all of my life's assets that AGI will not be seen in the lifetime of anyone reading this message right now.

That includes anyone reading this message long after the lives of those reading it on its post date have ended.

Which of course raises the interesting question of how I can make good on this bet.

ashivkum

4 months ago

genuinely curious to hear your reasoning for why this is the case. i'm always somewhere between bemused and annoyed opening the daily HN thread about AGI and seeing everyone's totally unfounded confidence in their predictions.

my position is I have no idea what is going to happen.

makotech221

4 months ago

It's incredibly stupid to believe general intelligence is just a series of computations that can be done by a computer. The stemlords on the west coast need to take philosophy classes.

KylerAce

4 months ago

I don't think it's stupid to believe that the brain is somehow beyond Turing computable, considering how easy it is to create a system exactly as capable as a Turing machine. I also don't think that anything in philosophy can provide empirical evidence that the brain is categorically special as opposed to emergently special. The sum total of the epistemology I've studied boiled down to people saying "I think human consciousness / the brain works like this" with varying degrees of complexity.

tokioyoyo

4 months ago

The problem with this argument is assuming there is general consensus on “what intelligence is”.

BoorishBears

4 months ago

That’s also about OpenAI claiming they have AGI. That doesn’t resolve based on actual AGI.

tim333

4 months ago

I wonder if there is a test for AGI which is definite enough to bet on? My personal test idea is when you can send for a robot to come fix your plumbing rather than needing a human.

vonneumannstan

4 months ago

>I would bet all of my life's assets that AGI will not be seen in the lifetime of anyone reading this message right now. That includes anyone reading this message long after the lives of those reading it on its post date have ended.

By almost any definition available during the 90s, GPT-5 Thinking/Pro would pretty much qualify. The idea that we are somehow not going to make any progress for the next century seems absurd. Do you have any actual justification for why you believe this? Every lab is saying they see a clear path to improving capabilities, and there's been nothing shown by any research I'm aware of to justify doubting that.

jb1991

4 months ago

The fact is that no matter how "advanced" AI seems to get, it always falls short and does not satisfy what we think of as true AI. It's always a case of "it's going to get better", and it's been said like this for decades now. People have been predicting AGI for a lot longer than the time I predict we will not attain it.

LLMs are cool and fun and impressive (and can be dangerous), but they are not any form of AGI -- they satisfy the "artificial", and that's about it.

GPT by any definition of AGI is not AGI. You are ignoring the word "general" in AGI. GPT is extremely niche in what it does.

vonneumannstan

4 months ago

And if you read the Weaknesses section, you'll see very little of it is relevant to whether the Turing test demonstrates AGI. Only 1 of the 9 subsections is related to this. The other weaknesses listed include that intelligent entities may still fail the Turing test, that if the entity tested remains silent there is no way to evaluate it, and that making AI that imitates humans well may lower wages for humans.

port3000

4 months ago

They have to say that, or there'll be a loud sucking sound and hundreds of billions in capital will be withdrawn overnight

vonneumannstan

4 months ago

Ok that's great do you have evidence suggesting scaling is actually plateauing or that capabilities of GPT6 and Claude 4.5 Opus won't be better than models now?

jb1991

4 months ago

You are suggesting, in your reference to scaling, that this is a game of quantity. It is not.

tim333

4 months ago

I'd bet the other way because I think Moore's-law-like advances in compute will make things much easier for researchers.

4 months ago

It's been three years now; where is it? Everyone on HN is now a 10x developer, so where are all the new startups making $$$? Employees are 10x more productive, so where are the 10x revenues? Or even 2x?

Why is growth over the last 3 years completely flat once you remove the proverbial AI pickaxes sellers?

What if all the slop generated by LLMs counterbalances any kind of productivity boost? 10x more bad code, 10x more spam emails, 10x more bots.

Etheryte

4 months ago

You can generally buy options only a few years out. A few years is decidedly shorter than the lifetime of everyone reading this thread.

lbhdc

4 months ago

“Markets can remain irrational longer than you can remain solvent.”

guluarte

4 months ago

that's probably a good idea, either AI bubble explodes or competitors catch up

asah

4 months ago

You seem to be arguing with someone who isn't here. My point is that if you think a calculator is going to help you do math you don't understand, you are going to have a really tough time once you get to 10th grade.

xboxnolifes

4 months ago

A calculator does 1 thinking task.

Yizahi

4 months ago

Will you take a wager of my one dollar versus your life assets? :)

vonneumannstan

4 months ago

You can make this bet functional if you really believe it, which you of course really don't. If you actually do then I can introduce you to some people happy to take your money in perpetuity.

akomtu

4 months ago

It's about the same as betting all life savings on nuclear war not breaking out in our lifetime. If AI gets created, we are toast and those assets won't be worth anything.

guluarte

4 months ago

my bet is we will just slowly automate things more and more until one day someone will point out when we reached "AGI"

FL33TW00D

4 months ago

How certain are you of this really? I'd take this bet with you.

You're saying that we won't achieve AGI in ~80 years, or by roughly 2100, equivalent to the time since the end of WW2.

To quote Shane Legg from 2009:

4 months ago

Is anyone _not_ short Oracle? The downside risk for them is that they’ll lose a deal worth 10x their annual revenues.

This whole AGI and "the future" thing is mostly a VC/banks and shovel-sellers problem. It has become our problem too because of the ridiculous amounts of money "invested", so even warm fusion is not enough from an investment-vs-expectations perspective.

They are already playing musical money chairs, unfortunately we already know who's going to pay for all of this "exuberance" in the end.

I hope this whole thing crashes and burns as soon as possible, not because I don't "believe" in AI, but because people have been absolutely stupid about it. The workplace has been unbearable with all this stupidity and amounts of fake "courage" about every single problem and the usual judgment of the value of work and knowledge your run-of-the-mill dipshit manager has now.

spjt

4 months ago

The thing about AGI is that if it's even possible, it's not coming before the money runs out of the current AI hype cycle. At least we'll all be able to pick up a rack of secondhand H100's for a tenner and a pack of smokes to run uncensored diffusion models on in a couple years. The real devastation will be in the porn industry.

mrklol

4 months ago

I also don’t think our generation will see actual AGI, but imo the hard "intelligence“ part isn’t needed as we can use our intelligence. Using it as a tool will hopefully lead to plenty of cool things in the future.

rhetocj23

4 months ago

"The real devastation will be in the porn industry."

The UK govt has started to crack down on this. AI generated porn will lead to a war from govts on nailing this economic activity shut.

exasperaited

4 months ago

The UK government is not cracking down on AI porn generally but has started to crack down on the distribution of certain things, like:

- AI generated CSAM (out of a concern that it might cause people to seek to produce actual CSAM)

- AI generated rape and abuse images of real adults, again out of concern it will cause violence and its distribution is actually degrading and is experienced as and combined with threatening behaviour

- some extreme AI generated rape/abuse images of non-real people.

Despite what internet libertarians say, there is evidence to suggest that porn is changing people's sexual behaviours, particularly young people, both for good and ill.

At the moment there is no good reason to believe that AI-generated alternatives to harmful content are meaningfully less harmful to society.

There's more than enough evidence in articles posted on HN alone that people are beginning to experience psychosis brought on by spending too much time with AI content.

I don't really care whether governments ban it; I'd like to see governments being much braver about criminalising AI-generated misrepresentation, AI-generated hoax content, etc.

Sane governments should IMO absolutely ignore the ultra-libertarian angles; there is at least no reason that AI-generated content should be treated any differently under existing obscenity laws just because there are no real people in it.

spjt

4 months ago

Good luck cracking down when local inference gets cheap.

evandrofisico

4 months ago

4 months ago

This is the key insight, I believe. It is inherently unpredictable. There are species that pass the mirror test with far fewer equivalent parameters than large models are already using. Carmack has said something to the effect that about 10k SLOC would glue the right existing architectures together in the right way to make AGI, but that it might take decades to stumble on that way, or someone might find it this afternoon.

palmotea

4 months ago

> Carmack has said something to the effect that about 10k SLOC would glue the right existing architectures together in the right way to make AGI

What does he know about that?

4 months ago

I'd argue we've had more progress towards fusion than AGI.

chasd00

4 months ago

> I'd argue we've had more progress towards fusion than AGI.

Way more progress toward fusion than AGI. Uncontrolled runaway fusion reactions were perfected in the 50s (IIRC) with thermonuclear bombs. Controllable fusion reactions have been common for many years. A controllable, self-sustaining, and profitable fusion reaction is all that is left. The goalposts that mark when AGI has been reached haven't even been defined yet.

FiniteIntegral

4 months ago

Yet at the same time "towards" does not equate to "nearing". Relative terms for relative statements. Until there's a light at the end of the tunnel, we don't know how far we've got.

adastra22

4 months ago

Fusion used to be perpetually 30 years away. We’re making progress!

nh23423fefe

4 months ago

Stop repeating that. First, it isn't true that intelligence is barely defined. https://arxiv.org/abs/0706.3639

Second, a definition is obviously not a prerequisite, as evidenced by natural selection.

thomasdziedzic

4 months ago

> stop repeating that. first, it isn't true that intelligence is barely defined. https://arxiv.org/abs/0706.3639

I don't think he should stop, because I think he's right. We lack a definition of intelligence that doesn't do a lot of hand waving.

You linked to a paper with 18 collective definitions, 35 psychologist definitions, and 18 AI researcher definitions of intelligence. And the paper's conclusion was to come up with its own definition of intelligence. That is not a definition in my book.

4 months ago

Also, I like how almost nobody takes issue with the decade time frame. Does he mean that current LLMs, slowly plateauing in performance, would somehow take a decade to become AI (which he calls AGI)? Where would this fantastical gain in performance come from? Or does he think a different mechanism will serve as the basis? But then which mechanism? If it were going to materialize within a decade, it should at least exist in theory by now.

Basically, what I mean is that if LLMs are the basis of future real AI, it would take less than a decade, because they are already in diminishing returns today. And if it is something completely new, then what exactly? And if it is something abstract, fuzzy and hypothetical, where did the decade figure come from?

This is basically Sam Altman's "5 to 10 years in the future"(1) all over again. Not less than 5, so it can't be verified in the near future and there's no need to show at least a prototype or a scientific theory. And no more than 10 years, so as not to scare SoftBank and other investors.

(1) https://fortune.com/2025/09/26/sam-altman-openai-ceo-superin...

https://www.forbes.com/sites/jodiecook/2024/07/16/openais-5-...

https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-i...

spjt

4 months ago

The difference with fusion is that we have a very good understanding of how fusion works, and exactly what we need to figure out how to do, to make it a viable energy source. It's basically just an engineering problem, albeit a very difficult one due to the extreme conditions. AGI is more like developing warp drive. With AGI, we really have no idea how the brain works or any clue of what problems need to be solved. It's basically just like the underpants gnomes.

Phase 1: Buy more GPUs to increase the number of parameters in an LLM

Phase 2: ???

Phase 3: AGI

AGI may come anywhere between next week and 1000 years from now, or never. Anyone who claims to have any idea is full of shit, because we don't even know what problems we need to solve to get there. If we develop a good model of how human cognition works at a biological level, there is at least a direction, but that isn't going to come out of some AI hype factory with a datacenter full of H100s making videos of anthropomorphic cats working as pastry chefs.

qingcharles

4 months ago

I can't use fusion power yet.

Several hundred million people are using LLMs every day.

There has to be at least two orders of magnitude more investment in "AI" technologies than in fusion tech right now.

bamboozled

4 months ago

We're driving LLMs to get results though, which is different to what's being discussed.

4 months ago

That would be amazing but I'm not holding my breath.

benzible

4 months ago

What's his estimate of how far we are from a definition of AGI?

password54321

4 months ago

It can perform out-of-distribution tasks at around average human level or better.

benzible

4 months ago

Every attempt to formally define "general intelligence" for humans has been a shitshow. IQ tests were literally designed to justify excluding immigrants and sterilizing the "feeble-minded." Modern psychometrics can't agree on whether intelligence is one thing (g factor) or many things, whether it's measurable across cultures, or whether the tests measure aptitude or just familiarity with test-taking and middle-class cultural norms.

Now we're trying to define AGI - artificial general intelligence - when we can't even define the G, much less the I. Is it "general" because it works across domains? Okay, how many domains? Is it "general" because it can learn new tasks? How quickly? With how much training data?

The goalposts have already moved a dozen times. GPT-2 couldn't do X, so X was clearly a requirement for AGI. Now models can do X, so actually X was never that important, real AGI needs Y. It's a vibes-based marketing term - like "artificial intelligence" was (per John McCarthy himself) - not a coherent technical definition.

password54321

4 months ago

keeda

4 months ago

This is actually discussed in the interview: https://www.dwarkesh.com/i/176425744/llm-cognitive-deficits

It seems to be more nuanced than what people have assumed. The best I can summarize it is that he was doing rather non-standard things that confused the LLMs, which have been trained on vast amounts of very standard code and hence kept defaulting to those assumptions.

Maybe a rough analogy is that he was trying to "code golf" this repo while LLMs kept trying to write "enterprise" code because that is overwhelmingly what they have been trained on.

4 months ago

    It crossed some threshold that was both real and magical

Only compared to our experience at the time.

    and future improvements are relying on that basic set of features at their core

Language models are inherently limited, and it's possible - likely, IMO - that the next set of qualitative leaps in machine intelligence will come from a different set of ideas entirely.

zer00eyz

4 months ago

Learning != Training.

That's not a period, it's a full stop. There is no debate to be had here.

If an LLM makes some sort of breakthrough (and massive data collation allows for that to happen), it needs to be retrained to absorb its own new invention.

But we also have a large problem in our industry, where hardware evolved to make software more efficient. Not only is that not happening any more, but we're making our software more complex and, to some degree, less efficient with every generation.

This is particularly problematic in the LLM space: every generation of "ML" on the LLM side seems to be getting less efficient with compute. (Note: this isn't quite the case in all areas of ML; YOLO models running on embedded compute are kind of amazing.)

Compactness, efficiency and reproducibility are directions the industry needs to evolve in, if it ever hopes to be sustainable.

zeroonetwothree

4 months ago

I think most people would consider AGI to mean roughly matching humans in all aspects. So in that sense there’s no way that GPT-3 was AGI. Of course you are free to use your own definition; I’m just reflecting what the typical view would be.

colonCapitalDee

4 months ago

AGI is when a computer can accomplish every cognitive task a typical human can. Given tools to speak, hear, and manipulate a computer, an AGI could be dropped in as a remote employee and be successful.

throwaway-0001

4 months ago

A human counts as AGI when they can accomplish all the tasks ChatGPT can… so how come the reverse doesn’t work?

4 months ago

[flagged]

AnimalMuppet

4 months ago

Most people cannot comprehend an audiobook? No way.

If you have evidence for that claim, show it. Otherwise, no, you're just making stuff up.

throwaway-0001

4 months ago

Have you ever had a mainstream product and answered customer questions? You should try it, to see what an average person is truly like.

Examples:

Customer sends an email with the subject “I need support” (no body).

I answer by email: what do you need?

Reply: I need to activate email support

…

Truly AGI.

throw54465665

4 months ago

Sorry, it should be "most Americans"!

Very simple proof: they cannot even read or listen to their own constitution!

jaccola

4 months ago

I think this quote is often misapplied. The question "can a submarine safely move through water" IS a very interesting question (especially if you are planning a trip in one!).

Obviously this quote would be well applied if we were at a stage where computers were better at everything humans can do and some people were saying "This is not AGI because it doesn't think exactly the same as a human". But we aren't anywhere near this stage yet.

angiolillo

4 months ago

CSSer

4 months ago

The best part of this is I watched Sam Altman say he really thinks fusion is a short period of time away in response to a question about energy consumption a couple years ago. That was the moment I knew he's a quack.

ctkhn

4 months ago

Not to be anti-YC on their forum, but the VC business model is all about splashing cash on a wide variety of junk that will mostly be worthless, hyping it to the max, and hoping one or two turn out like Amazon or Facebook. He's not an engineer; he's like Steve Jobs without the good parts.

jacobolus

4 months ago

Altman recently said, in response to a question about the prospect of half of entry-level white-collar jobs being replaced by "AI" and college graduates being put out of work by it:

> “I mean in 2035, that, like, graduating college student, if they still go to college at all, could very well be, like, leaving on a mission to explore the solar system on a spaceship in some completely new, exciting, super well-paid, super interesting job, and feeling so bad for you and I that, like, we had to do this kind of, like, really boring old kind of work and everything is just better."

Which should be reassuring to anyone having trouble finding an entry-level job as an illustrator or copywriter or programmer or whatever.

rightbyte

4 months ago

So STNG in 10 years?

edit: Oh. Solar system. Nvm. Totally reasonable.

SAI_Peregrinus

4 months ago

Fusion is 8 light-minutes away. The connection gets blocked often, so methods to buffer power for those periods are critical, but they're getting better so it's gotten a lot more practical to use remote fusion power at large scales. It seems likely that the power buffering problem is easier to solve than the local fusion problem, so more development goes to improving remote fusion power than local.

rohit89

4 months ago

Sam is an investor in a fusion startup. In any case, how long it takes us to get to working fusion depends heavily on the amount of funding it receives. I'm hopeful that increased energy needs will spur more investment into it.

timeon

4 months ago

He had to use a distraction because he knows he is playing a part in increasing emissions.

4 months ago

Machine learning as a descriptive phrase has stopped being relevant. It implies the discovery of information in a training set. The pre-training of an LLM is most definitely machine learning. But what people are excited and interested in is the use of this learned data in generative AI. “Machine learning” doesn’t capture that aspect.

simpleladle

4 months ago

But the things we try to make LLMs do post-pre-training are primarily achieved via reinforcement learning. Isn't reinforcement learning machine learning? Correct me if I'm misconstruing what you're trying to say here.

adastra22

4 months ago

You are still talking about training. Generative applications have always been fundamentally different from classification problems, and have now (in the form of transformers and diffusion models) taken on entirely new architectures.

If “machine learning” is taken to be so broad as to include any artificial neural network, all of which are trained with back propagation these days, then it is useless as a term.

The term “machine learning” was coined in the era of specialized classification agents that would learn how to segment inputs in some way. Think email spam detection, or identifying cat pictures. These algorithms are still an essential part of both the pre-training and RLHF fine-tuning of LLMs. But the generative architectures are new and central to the current interest in and hype surrounding AI at this point in time.

hnuser123456

4 months ago

It's a valid term that is worth introducing to the layperson IMO. Let them know how the magic works, and how it doesn't.

timidiceball

4 months ago

That was an impressive takeaway from the first machine learning course I took: many things previously under the umbrella of artificial intelligence have since been demystified and demoted to implementations we now take for granted. Some examples were real-world map route planning for transport, locating faces in images, and Bayesian spam filters.

porphyra

4 months ago

back in the day alpha-beta search was AI hehe

pixelpoet

4 months ago

When I was a young child in Indonesia, we had an exceptionally fancy washing machine with all sorts of broken-English superlatives on it, including "fuzzy logic artificial intelligence", and I used to watch it doing the turbo spin or whatever, wondering what it was thinking. My poor mom thought I was retarded.

porphyra

4 months ago

My rice cooker also has fuzzy logic. I guess they just use floats instead of bools.

brandonb

4 months ago

You can't just change the meaning of a word overnight and toss all that history away, which is why it comes across as an intentionally dishonest choice in the name of profits.

layer8

4 months ago

Maybe do some reading here: https://en.wikipedia.org/wiki/History_of_artificial_intellig...

Root_Denied

4 months ago

And you should do some reading into the edit history of that page. Wikipedia isn't immune from concerted efforts to astroturf and push marketing narratives.

4 months ago

Frankly it doesn’t matter if it’s a decade away.

AI has now been revealed to the masses. When AGI arrives most people will barely notice. It will just feel like slightly better LLMs to them. They will have already cemented notions of how it works and how it affects their lives.

rwaksmunski

4 months ago

AGI is still a decade away, and always will be.

gjm11

4 months ago

You say that as if people had been saying "10 years away" for ages, but I don't think that's true at all.

There's some information about historical predictions at https://www.openphilanthropy.org/research/what-should-we-lea... (written in 2016). From it (including the spreadsheet found at footnote 27) I've picked some hopefully representative data points, with predictions from actual AI researchers, popularizers, pundits, and SF authors:

1960: Herbert Simon predicts machines can do all (intellectual) work humans can "within 20 years".

1961: Marvin Minsky says "within our lifetimes, machines may surpass us"; he was 33 at the time, suggesting a not-very-confident timescale of say 40 years.

1962: I J Good predicts something at or above human level circa 1978.

1963: John McCarthy allegedly hopes for "a fully-intelligent machine" within a decade.

1970: I J Good predicts 1994 ± 10 years.

1972: a survey of 67 computer scientists found 27% saying <= 20 years, 32% saying 20-50 years, and 42% saying > 50 years.

1977-8: McCarthy says things like "4 to 400 years" and "5 to 500 years".

1988: Hans Moravec predicts human-level intelligence in 40 years.

1993: Vernor Vinge predicts better-than-human intelligence in the range 2005..2030.

1999: Eliezer Yudkowsky predicts intelligence explosion circa 2020.

2001: Ben Goertzel predicts "during the next 100 years or so".

2001: Arthur C Clarke predicts human-level intelligence circa 2020.

2006: Douglas Hofstadter predicts somewhere around 2100.

2006: Ray Solomonoff predicts within 20 years.

2008: Nick Bostrom says <50% chance by 2033.

2008: Rodney Brooks says no human-level AI by 2030.

2009: Shane Legg says probably between 2018 and 2036.

2011: Rich Sutton estimates somewhere around 2030.

Of these, exactly one suggests a timescale of 10 years; the same person a little while later expresses huge uncertainty ("4 to 400 years"). The others are predicting timescales of multiple decades, also generally with low confidence.

Some of those predictions are now known to have been too early. There definitely seems to be a sort of tendency to say things like "about 30 years" for exciting technologies many of whose key details remain un-worked-out: AI, fusion power, quantum computing, etc. But it's definitely not the case that "a decade away" has been a mainstream prediction for a long time. People are in fact adjusting their expectations on the basis of the progress they observe in recent years. For most of the time since the idea of AI started being taken seriously, "10 years from now" was an exceptionally optimistic[1] prediction; hardly anyone thought it would be that soon. Now, at least if you listen to AI researchers rather than people pontificating on social media, "10 years from now" is a typical prediction; in fact my impression is that most people who spend time thinking about these things[2] expect genuinely-human-level AI systems sooner than that, though they typically have rather wide confidence intervals.

[1] "Optimistic" in the narrow sense in which expecting more progress is by definition "optimistic". There are many many ways in which human-level, or better-than-human-level, AI could in fact be a very bad thing, and some of them are worse if it happens sooner, so "optimistic" predictions aren't necessarily optimistic in the usual sense.

[2] Most, not all, of course.

password54321

4 months ago

People like Eliezer and Nick Bostrom are living proof that if you say enough and sound smart enough people will listen to you and think you have credibility.

Meanwhile, you won't find anyone on here who is an author of Attention Is All You Need. You know, the thing that is actually the driving force behind LLMs.

gjm11

4 months ago

The context is that rwaksmunski implied that people have been saying "AGI is 10 years away" for ages, and I was pointing out that the sort of people who say "AGI is X years away" have not in fact been setting X=10 until very recently.

I wasn't claiming that the people on that list are the smartest or best-informed people thinking about artificial intelligence.

But, FWIW, from about 13:20 in https://www.youtube.com/watch?v=_sbFi5gGdRA Ashish Vaswani (lead author on that paper) is asked what will happen in 3-5 years, and if I'm understanding him right, he thinks AI systems might be solving some of the Millennium Prize Problems in mathematics by then; from about 17:10 he's asked how scientists will work ~5 years in the future, and he says AI systems will be apprentices or collaborators; at any rate, he's not not saying that human-level AI is likely to come in the near future. From about 1:12:40 in https://www.youtube.com/watch?v=v0gjI__RyCY Noam Shazeer (second author on that paper), in response to a question about "fast takeoff", says that he does expect a very rapid improvement in AI capabilities; he's not explicit about when he expects that to happen or how far he expects it to go, but my impression from the other bits of that discussion I watched is that he too is not saying that AI systems won't be at or beyond human level in the near future. From about 49:00 in https://www.youtube.com/watch?v=v0beJQZQIGA he's asked: if hardware progress stopped, would we still get to AGI? He says he thinks yes, which in particular suggests that he does think AGI is in the foreseeable future, though it doesn't say much about when.

That's all fairly vague, but I very much don't get the impression that either of these people thinks that AI systems are just dumb stochastic parrots or that genuinely human-level AI systems are terribly far off.

4 months ago

Both "general" and "intelligence" are _at least_ easily arguable without moving any goal posts, not that goal posts have ever been well established in the first place.

walkabout

4 months ago

I think a baseline requirement would be that it… thinks. That’s not a thing LLMs do.

adastra22

4 months ago

That’s an odd claim, given that we have so-called thinking models. Is there a specific way you have in mind in which LLMs are not thinking processes?

blibble

4 months ago

dlivingston

4 months ago

??? Many developers, experienced and not, play around with vibe coding. Is your critique of him that he has tried vibe coding?

theusus

4 months ago

I’m critiquing him because he lied in his claim. And anyone who claims the same is just farming engagement.

strangescript

4 months ago

I love Karpathy, but he is wrong here. In a few short years we went from chatbots being toys, and video creation being predicted as impossible in the near term, to agents writing working apps and high-def video that is occasionally indistinguishable from real life.

The rate, depth, breadth and frequency of releases have only increased, not decreased. Meanwhile, everyone is waiting with bated breath for Gemini 3 to drop. A decade for reliable agents is not only comical, but willful cognitive dissonance at this point.