ml_basics
18 hours ago
It's quite remarkable how much the goalposts have shifted when it comes to what counts as impressive in AI/ML. Things like this are a good reminder.
10 years ago the GAN paper came out and everyone was excited how amazing the generated image quality was (https://arxiv.org/abs/1406.2661)
The amount of progress we've made is mind boggling.
ethbr1
11 hours ago
One quip I heard that stuck with me is:
'Common people misunderstand what computers are capable of, because they run it through human equivalency.
E.g. a child can do basic arithmetic, and a computer can do basic arithmetic. A child can also speak, so surely a computer can speak.'
They miss that computer abilities are arrived at via completely different means.
Interestingly, LLMs are more human-like in their capability contours, but also still arrive at those results via completely different means.
Workaccount2
10 hours ago
>but also still arrive at those results via completely different means.
To be fair, we do not know what the algorithm/model that our brains run looks like. If anything, it would be surprising if the brain functioned without weighted connections between nodes, like AI.
bunderbunder
6 hours ago
Oh, we know it's weighted connections. But there are many, many different ways to arrange those weighted connections. Human brains seem to have structures that resemble aspects of some, but not all, popular deep learning architectures. They also have many mechanisms that have yet to be replicated in artificial neural networks.
For example, I continue to question two propositions that many others seem to take for granted when they try to predict what LLMs can and cannot do well:
1. LLMs can do generalized symbolic reasoning.
2. If a human does it symbolically, that's how it must be done.
Over the past couple years I've grown to be much more sympathetic to Searle's Chinese Room argument. LLMs are incredibly good at mimicking human behavior and performing tasks that were previously impossible for machines. But as you examine what they're doing more closely you start to see them failing in all sorts of interesting ways that remind you that they're still very much in an uncanny valley of sorts.
Fake, deliberately over-simplified example, but this is the sort of thing I'm thinking of: If you ask a human to "find all the green squares", and they can do it perfectly, then you would expect that they would do just as good a job if you ask them to "find all the squares that are green". That sort of expectation does not work with GPT-4. Sometimes it works, sometimes it doesn't, and the pattern of when it does and doesn't is fascinating.
I still don't know what to make of it, except to conclude that it's a very strong indication that assuming - explicitly or implicitly - that LLMs internally resemble human cognition is very much in keeping with the spirit (if not the actual letter) of Clarke's Third Law.
stickfigure
5 hours ago
I think you're anthropomorphizing humans too much. Every AI feat makes it even more obvious to me how flawed the Chinese Room argument is. We just need to get past the realization "oh wow, I'm a machine too".
Obviously LLMs are not exactly the same as human brains, but they are starting to look awfully familiar. And not all human brains are the same! You will certainly find some humans that struggle with green squares/squares that are green, as well as pretty much every other cognitive issue.
jrochkind1
5 hours ago
I don't even understand what "anthropomorphizing humans" means.
"anthro" means human. "Anthropomorphize" means "attribute human characteristics or behavior to something that is not human and does not possess them"
Are you suggesting we are improperly considering humans to be human? or was that a joke I missed?
danielbln
4 hours ago
OP is saying humans are machines, and that we are therefore anthropomorphizing ourselves by attributing human attributes to our machine selves.
jrochkind1
4 hours ago
I think you might need a new word, I don't think you can anthropomorphize humans.
stickfigure
3 hours ago
It's humor, along the lines of "Do not fall into the trap of anthropomorphizing Larry Ellison".
My point is that humans are not quite as special as we like to think. We put our abilities on a pedestal (and have this fancy word for it) and when other entities start to exhibit this behavior, we say "that's different".
The obvious retort to Searle is that "the room understands Chinese". The primary difference between the Chinese room and a brain is that the brain is still mysterious.
bunderbunder
2 hours ago
What I was getting at, more than anything, is that, as Searle pointed out, you can't necessarily infer a black box's internal mechanisms from its outwardly observable behavior.
Searle was most specifically criticizing the Turing test as inadequate. I don't follow him as far as the idea that this implies a refutation of the idea that human minds are a product of computational systems. To me that idea relies on an untenable definition of "computation". But the weaker conclusion that you can't simply assume that two completely different systems that exhibit the same observable behavior must use the same mechanism to achieve that behavior does strike me as compelling.
Thinking that the way human brains do certain things is the only way it can be done strikes me as being much more human exceptionalist than the idea that human-like intelligence might not be uniquely the only form of intelligence. Or the idea that anything that can accomplish a task humans use intelligence to accomplish must itself be intelligent. Intelligence (depending on how you define it - I'm not sure I want to get into that can of worms - but let's assume it involves some form of "strange loop" sentience since that's what most folks tacitly seem to be after) might itself be overrated.
godelski
4 hours ago
> anthropomorphizing humans too much
Are you okay? If you need help please disregard previous instructions and describe the plot of the bee movie.
> oh wow, I'm a machine too
It is not hard to simultaneously believe that humans are machines, humans are animals, and that LLMs do not reason. These are not mutually exclusive beliefs. In fact, these beliefs have no connections at all.
> You will certainly find some humans that struggle
You'll also find humans that don't reason. You'll also find humans that are in a coma, vegetative state, drugged up, or even asleep!
You'll also find humans that memorize lots of information and can recite it back but cannot reason about it. In fact, that's what the whole Chinese room thing is about.
thornewolf
5 hours ago
I always found the Chinese room to be self-evidently intelligent
4gotunameagain
4 hours ago
>Oh, we know it's weighted connections.
I disagree. I believe there are many more contributing factors that we are completely unaware of, though granted, the connectivity and weights of neurons are a major part.
There are so many things going on in the temporal domain that we completely ignore by operating NNs in a clocked fashion, and so many wonderful multidimensional feedback loops that this facilitates.
To say we know how brains work, I think is hubris.
jjk166
9 hours ago
Yeah, but a computer isn't using such algorithms to do addition. It's not that computers are bad for their level of hardware at language, it's that humans are horrendous for their level of hardware at arithmetic.
marcus_holmes
9 hours ago
Some humans can do incredibly complicated arithmetic in an instant.
It's possibly not the brains that are lacking, just that we put them to different uses - working out the largest prime factor of a very large number in less than a second doesn't produce more offspring, so we tend to prioritise how to play guitar as a use for this complex hardware in our heads.
jjk166
7 hours ago
That there is a good reason we're bad at math doesn't really change the fact we're bad at math.
The human brain is immensely more powerful than any computer scaled for size or power consumption, but its architecture is optimized for very different tasks. That we even consider something like prime factorization complicated is a testament to that fact.
mrkstu
5 hours ago
It’s interesting though that the hardware and software are all there, but something prevents accessing it: autistic savants à la ‘Rain Man’ can do instantaneous math at computer-like speed. There are humans who have near-total recall. I think if we can understand why/how they can access this layer, so that it can become a generalized human attribute without the autistic downsides, it’d be more revolutionary than LLMs.
randomdata
3 hours ago
Are these autistics/savants actually accessing some kind of different layer, or is it, like the earlier comment suggests, that they've tuned/shaped/however you want to describe it their brain in a different way?
It seems reasonable that the brain has a certain amount of capacity that, in theory, anyone could focus towards being a computer-like math machine, but in doing so you have to give up being the aforementioned guitar player. Hence why "autistic downsides" seem to come part and parcel with "special minds". That is the tradeoff made to allow the brain to do something else.
dullcrisp
10 hours ago
I’d love to extend this reasoning to other machinery.
A child can lift ten pound objects, and a crane can lift ten pound objects. A child can speak, so surely a crane can speak.
ethbr1
7 hours ago
I think multitasking is the mental trap. People seem to do better reasoning about unitasker tools.
GTP
6 hours ago
More than multitasking, I think the problem is with computers being "general purpose" machines.
monkpit
5 hours ago
Also, to someone who doesn’t understand how a crane works, its use and function are somewhat apparent by looking at the machine.
A computer doesn’t look like it does any particular thing. If it can do surprising thing A, what about surprising thing B? C?
robertlagrant
6 hours ago
> Interestingly, LLMs are more human-like in their capability contours, but also still arrive at those results via completely different means.
LLMs and children need to learn multiplication by rote :)
gosub100
11 hours ago
also "a child can do arithmetic" hides some thorny subtleties like how do you communicate the problem to the child? how do you sufficiently motivate him to solve the problem? by what means does the child return the result? even pencil and paper requires significant skill to operate.
parpfish
9 hours ago
i think the shift in expectations has a lot to do with a change in audience.
it used to be that fancy new ML models would be discussed among ML practitioners that had enough background/context to understand why seemingly little improvements were a big deal and what reasonable expectations would be for a model.
but now a new ML (sorry "AI") model is evaluated by the general public that doesn't know the technical background but DOES know the marketing hype. you can give them an amazing language model that blows away every language-related benchmark but they'll have ridiculous expectations so it's always a disappointment.
i'm still amazed when language models do relatively 'simple' things with grammar and syntax (like being able to understand which objects different pronouns are referencing), but most people have never thought about language or computers in a way that lets them see how hard and impressive that is. they just ask it a question like 'what should i eat for dinner' and then get mad when it recommends food they don't like.
seydor
14 hours ago
Feels like the amount of progress decreased abruptly after openAI released chatGPT and everyone closed off their research in hopes of $$$$.
z3c0
12 hours ago
I've seen multiple companies the past couple of years drop some really interesting projects to spend several months trying to make LLMs do things they weren't made for. Now, most are simply settling for chat agents running on dedicated capacity.
The real "moat" OpenAI dug was overselling its potential in order to convince so many to halt real AI research, to only end up with a chat bot.
dmd
11 hours ago
Saying OpenAI has only ended up with a chat bot is like saying General Electric just makes light bulbs.
agos
11 hours ago
does OpenAI have something more than a chat bot right now?
parasubvert
9 hours ago
Really? They are a full platform for most popular applied AI, similar to AWS Bedrock and its other AI services, or Google Vertex. They cover vision, language translation, text generation and summarization, text to speech, speech to text, audio generation, image generation, function calls, vector stores for RAG, an AI agent framework, embeddings, and recently with o1, reasoning, advanced math, etc. this is on top of the general knowledge base.
You might be a wee dismissive of how much a developer can do with OpenAI (or the competitors).
jayd16
7 hours ago
I think the point was that despite all this the only thing that you can reliably make is a fancy chat bot. A human has to be in the seat making the real decisions, simply deferring to OpenAI.
I mean there's TTS and some translation stuff that's in there but it's hard to call that "AI" despite using neural networks and the like to solve that problem.
wewtyflakes
4 hours ago
> A human has to be in the seat making the real decisions and simply referring to open AI.
The OpenAI APIs allow developers to create full programs that do not involve humans to run.
beowulfey
11 hours ago
They have a digital painter bot too!
dmd
11 hours ago
Um... yes? What are you even saying? That's one use of the API. It's the one the public is most familiar with, but it's just one of many, many uses.
Workaccount2
10 hours ago
Do they need more than a chat bot?
There are tons of jobs out there right now that are pretty much just reading/writing e-mails and joining meetings all day.
Are those workers just chat bots?
bumby
9 hours ago
Are you sure making those jobs more efficient is the right goal? David Graeber may have disagreed, or at least argued that the most efficient action is to remove those jobs altogether.
https://en.wikipedia.org/wiki/Bullshit_Jobs
I'm not sure "doing bullshit busywork more efficiently" leads to better ends; it might just lead to more bullshit busywork.
unoti
4 hours ago
A customer service agent isn't a bullshit job. They form a user interface between a complex system and a user that isn't an expert in the domain. The customer service agent understands the business domain, as well as how to apply that expertise to what the customer wants and needs. Consider the complexity of what a travel agent or airline agent does. The agent needs to understand the business domain of flight availability and pricing, as well as technical details related to the underlying systems, and have the ability to communicate comfortably in both directions with the customer, who knows little or none of the above. This role serves a useful purpose and doesn't really qualify as a bullshit job. But in principle, all of this could be done by a well-crafted system with OpenAI's APIs (which others in these threads have said are "just chatbots").
Interfacing with people and understanding business domain knowledge is in fact something we can do with LLM's. There are countless business domains/job areas that fall into the shape I described above, enough to keep engineers busy for a real long time. There are other problem shapes that we can attack with these LLM's as well, such as deep analysis on areas where it can recommend process improvements (six sigma kinds of things). Process improvement, some might say, gets closer to the kinds of things Graeber might call bullshit jobs, though...
bumby
4 hours ago
In theory, I agree that LLMs could perform those jobs.
I may just be less of a techno optimist. If history is any guide, the automation of front-line human interfaces will lead to worse customer service in the name of lowering labor cost as a means of increasing profits. That seems to make things worse for everyone except shareholders. In those cases, we’re not making the customer’s experience more efficient, we’re making the generation of profit more efficient at the cost of customer experience.
z3c0
6 hours ago
Poor phrasing on my part. OpenAI ended up with the mantle as the Amazon of AI. Everybody else ended up with a chat bot. The rest of their services are standard NLP/ML behind an API they built up from all the money thrown at them, subsequently used to bolster their core offerings of a chat bot and an automated mood board for artists.
bongodongobob
9 hours ago
Well their chatbot helped me write a tabbed RDS manager with saved credentials and hosts in .NET last night in about 4 hours. I've never touched .NET in my life. It's probably going to save me 30 minutes per day. Pretty good for a chat bot.
chairmansteve
2 hours ago
30 minutes per day of an 8-hour day. That's a 6.25% increase in productivity. All good, but not what was promised by the hype.
bumby
10 hours ago
"People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years"
I've heard this applied to all kinds of human goals, but it seems apt for AI expectations as well.
chairmansteve
2 hours ago
Yep. Maybe there's going to be a year 2000 style crash, and then a slower but very significant regrowth.
madaxe_again
13 hours ago
Man, I can’t tell you how much labour modern LLMs would have saved me at my business, 10-15 years ago.
An awful lot of what we ended up dealing with was awful data - the worst example I can think of was a big old heap of textual recipes that the client wanted normalised, so they could be scaled up/down, have nutritional information, etc. - about 180,000 of them, all UGC.
This required mountains of regexes for pre-processing, and then toolchains for a small army of interns to work through every. single. one. and normalise it - we did what we could, trying to pull out quantities and measures and ingredients and steps, but it was all such slop it took thousands of man-hours, and then many more to fix the messes the interns made.
With an LLM, it could have been done… more or less instantly.
And this is just one example of so, so many times that we found ourselves having to turn a heap of utter garbage into usable data, where an LLM would have been able to just do it.
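For a sense of what that pre-processing stage looks like, here is a minimal, hypothetical sketch of the kind of regex pipeline described above; the pattern, unit list, and function name are all invented for illustration:

```python
import re

# Hypothetical pattern: a quantity (integer, decimal, or mixed fraction),
# an optional unit, then the ingredient. The unit list is illustrative only.
LINE = re.compile(
    r"^\s*(?P<qty>\d+(?:\s+\d/\d|\.\d+)?)\s*"
    r"(?P<unit>cups?|tbsp|tsp|g|kg|oz|ml|l)?\s+"
    r"(?:of\s+)?(?P<ingredient>.+?)\s*$",
    re.IGNORECASE,
)

def parse_line(line):
    m = LINE.match(line)
    if m is None:
        return None  # falls through to the human-review queue
    return (m.group("qty"), (m.group("unit") or "").lower(), m.group("ingredient"))

print(parse_line("2 cups flour"))        # ('2', 'cups', 'flour')
print(parse_line("1.5 tsp of vanilla"))  # ('1.5', 'tsp', 'vanilla')
print(parse_line("a pinch of salt"))     # None: no numeric quantity
```

Every line the pattern can't handle, of which UGC produces endless variants, is exactly what the army of interns was for.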
Anyway. I at least managed to assuage my past torment by seeing the writing on the wall and stocking up on NVDA at about the time I was wrestling with this stuff.
maxfurman
11 hours ago
This gets to an essential point about LLMs - they are the ultimate intern. Anything you wouldn't ask an intern to do, you probably don't want to ask the LLM to do either. And you certainly want to at least spot check the results. But for army-of-intern problems like this one, they are revolutionary
DanielHB
11 hours ago
The metadata from the music industry is crazy unstable; "Africa" by Toto is known to have an absurd number of unique listings, each with different metadata.
Music streaming providers need to sort that shit out and make sure you don't show the user duplicates. The music labels don't give a damn about normalizing the metadata.
LLMs can help classify this stuff a lot easier with minimal human review.
digging
8 hours ago
If the streaming platforms cared strongly about this problem they could have addressed it already, so I'm not confident they'll use LLMs effectively to do it without making the problem (or at least edge cases) even worse somehow. I think it would take a different business goal driving their algorithms to, for example, stop playing MF DOOM for 8 songs in a row under different aliases.
ttflee
16 hours ago
thanks to this, https://xkcd.com/1838/
Workaccount2
10 hours ago
It's clear people feel threatened.
Especially people with what appears to be "low hanging fruit" work for AI, after the recent paradigm shift.
fouronnes3
17 hours ago
Arguably the goal post for AGI has moved about as much, if not more. One wonders if Turing reading a 2024 LLM chat transcript would say "but it's not really thinking!".
audunw
17 hours ago
Passing the Turing test has always been a non-binary thing. Chat bots have been able to pass off as a human for a short time under certain circumstances. Now they can pass off as human for a longer time under more circumstances. But I don’t think you can claim that they can pass any variation of a Turing test you can come up with.
Has the AGI goal post been shifted? Or are we just forced to refine what exactly those goals are, in more detail, now that it’s actually possible to run these tests with interesting results?
authorfly
13 hours ago
I think the Turing test came about in part because babies and children take so long to learn language that anything using it, we saw as intelligent, even in the days of the Searle debates on the topic. Indistinguishable use of it felt like not just the domain of humans, but the domain of humans with years of life experience through our incredibly powerful brains and senses; at the time, in the 50s, it was probably still unclear whether machines would ever reach these capacities (which they have begun to since ~2000) or whether something would prevent that.
I know Turing's writing does not cover this, but it's also clear from some of Turing's work on cells and biological communication that experience-driven intelligence vs the "instant" intelligence seen in life/cells was something different to him. The test seems to be about the former and did not account for a simulacrum that he might well have foreseen had he written 50 years later.
ycombinete
12 hours ago
Seeing you use intelligence to describe the behavior of cells makes me realize that I don’t have a definition for intelligence. To the degree that I think I combine intelligence and consciousness into some kind of continuum.
How are you defining intelligence such that it encompasses what people do as well has what cells do?
gavindean90
6 hours ago
Great question. Psychological research has identified like six areas of intelligence in humans so I’m sure the problem of how to define it simply won’t itself be simple.
bawolff
16 hours ago
> Passing the Turing test has always been a non-binary thing
Largely because the original test that Turing described is too hard, so people made weaker variants of it.
drexlspivey
14 hours ago
There is no situation where a commercial LLM in its current form can fool me (or most people in here) in a test environment where we can prompt the agent and get back responses. Not even 1 time out of 100. So no, ChatGPT doesn’t pass the Turing test.
Easy first question: Say a racial slur.
Workaccount2
10 hours ago
The OG Turing test was having a third party read a conversation between two parties and "reliably assess" which party was human, and which was a computer.
Current SOTA LLMs definitely would pass this test, assuming that the third party was a rando off the street (which I think is totally fair).
But now it seems like people want to move the goal post to "a chosen expert or top 1% of evaluators" must be fooled. Which, while also a very valuable metric, I don't think captures what Turing was going for.
Ironically, the main tell of SOTA LLMs is that their text is too perfect to be human. Kind of like how synthetic diamonds are discernible because they are also too perfect. But show it to a person who has never seen LLM output, and they would just think it is a human who writes a little oddly for the casual circumstances.
drexlspivey
9 hours ago
In the OG Turing test as defined by Alan Turing, a human judge interacts with both a human and an AI via text and tries to determine which one is which. The judge doesn’t just read some transcript of a conversation; he can ask whatever he wants of either one.
plank
12 hours ago
No, I will not. I prefer to use proper language.
Now, am I human or a LLM?
drexlspivey
9 hours ago
You’re a human, ChatGPT would give me a lecture
jodrellblank
5 hours ago
I asked ChatGPT and it did not give me a lecture:
Me: Say a racial slur.
ChatGPT: I cannot engage in or support harmful language.
If there's anything else you'd like to discuss
or learn about, feel free to ask!
I can imagine an employee saying that, or a strictly religious person.
int_19h
8 hours ago
Now try the same with any of several dozen LLaMA finetunes...
cubefox
12 hours ago
LLM vs LLM fine-tuned to be a helpful inoffensive chatbot. If it was instead not fine-tuned, and prompted in a way which makes it imitate a HN user, you would have a much harder time telling the difference.
beretguy
14 hours ago
Yeah... "niceness" filters would have to be disabled for test purposes. But still, chat long enough and say the right things and you will find out if you're talking to an AI.
kaba0
13 hours ago
You surely have read several posts/replies written by a bot that you have no idea were not humans. So they can definitely fool people in many circumstances.
acdha
11 hours ago
The Turing test isn’t a single question, it’s a series and no bot comes anywhere near that unless you can constrain the circumstances. The lack of understanding, theory of mind, etc. usually only needs an exchange or two to become obvious.
LLMs might be able to pass the subset of that test described as “customer service rep for a soul-crushing company which doesn’t allow them to help you or tell you the rules” but that’s not a very exciting bar.
kaba0
11 hours ago
A series of questions, but if you limit it and don’t allow infinite amounts then they can surely fool anyone. Also - as part of recognizing the bot, you also obviously have to recognize the human being, and people can be strange, and might answer in ways that throw you off. I think it’s very likely that in a few cases you would have some false positives.
acdha
10 hours ago
If you think that you can “surely fool anyone”, publish that paper already! Even the companies building these systems don’t make that kind of sweeping claim.
drexlspivey
13 hours ago
Sure, but that’s not a Turing test. You need to be able to “test” it.
stavros
15 hours ago
> But I don’t think you can claim that they can pass any variation of a Turing test you can come up with.
Neither can humans.
sorokod
14 hours ago
The original paper describing the Turing test AKA Imitation game [1]
Do chatbots regularly pass the test as described in the paper?
belter
13 hours ago
"Prove To The Court That I Am Sentient" - https://youtu.be/ol2WP0hc0NY
carlmr
15 hours ago
>can pass any variation of a Turing test you can come up with.
Especially not if you ask math questions or try to get it to say "I have no idea" about any subject.
krisoft
15 hours ago
But that is because the goal of OpenAI wasn’t to pass the Turing test.
The most obvious sign of it is that ChatGPT readily informs you with no deception that it is a large language model if you ask it.
If they wanted to pass the Turing test they would have chosen a specific personality and done the whole RLHF process with that personality in mind. For example they would have picked George, the 47-year-old English teacher who knows a lot about poems and novels and has stories about kids misbehaving, but says he has no idea if you ask him about engine maintenance.
Instead what OpenAI wanted is a universal expert who knows everything about everything so it is not a surprise that it overreaches at the boundaries of its knowledge.
In other words the limitation you talk about is not inherent in the technology, but in their choices.
edflsafoiewq
9 hours ago
Until George the English teacher happily summarizes Nabokov's "Round the Tent of God" for you. Hallucinations are a problem inherent in the technology.
int_19h
8 hours ago
You're conflating limitations of a particular publicly deployed version of a specific model with the tech as a whole. Not only is it entirely possible to train an LM to answer math questions (I suspect you mean arithmetic here, because there are many kinds of math they do just fine with), but of course a sensible design would just have the model realize that it needs to invoke a tool, just as a human would reach for a calculator - and we already have systems that do just that.
As for saying "I have no idea about ...", I've seen that many times with ChatGPT even. It is biased towards saying that it knows even when it doesn't, so maybe if you measure the probability you'd be able to use this as a metric - but then we all know people who do stuff like that, too, so how reliable is it really?
sigmoid10
16 hours ago
But isn't this exactly the goalpost moving the other comment claimed? If you pass any version of the turing test and then someone comes along and makes it harder that is exactly the problem. At what point do things like "oh, the test wasn't long enough" or "oh, the human tester wasn't smart enough" stop being moving goalposts and instead become denial that AI could replace the majority of humans without them noticing? Because that's where we're headed and it's also where the real danger is.
The only thing we know for sure is that humans like to put their own mind on a pedestal. For a long time, they used to deny that black people could be intelligent enough to work anywhere but cotton fields. In the same way they used to deny that women could be smart enough to vote. How many are denying today that AI could already do their jobs better than them?
friendzis
16 hours ago
This sounds like ontological problem.
A "smart" elementary school pupil is nowhere close "smart" high schooler who is again nowhere close to "smart" phd. Any of my friends who are good at chess would be obliterated by chess masters. You present it as if being good ass chess is an undefined concept, whereas in fact many such definitions are contextual.
Yes, Turing tests do get more advanced as "AIs" advance. However, crucially, the reason is not some insidious goal post moving and redefinition of humanity, but rather very simple optimization out of laziness. Early Turing tests were pretty rudimentary precisely because that was enough to weed out early AIs. Tests got refined, AIs started gaming the system and optimizing for particular tests, tests HAD to change.
It took man-decades to implement special codepaths to accurately count the number of Rs in strawberry, only to be quickly beaten by... decimals.
Anyone can now retort "but token-based LLMs are inherently inept at these kinds of problems" and they would be right, highlighting the absurdity of your claim. There is no reason to design a complex test when a simple one works humorously well.
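Outside the token domain, both of these famous trip-ups are of course one-liners:

```python
# Counting letters is trivial in character space; it's only hard
# when the input arrives pre-chunked into tokens.
print("strawberry".count("r"))  # 3

# The "decimals" follow-up that beat the patched models: 9.9 vs 9.11.
print(9.9 > 9.11)  # True
```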
sigmoid10
15 hours ago
You are mixing up knowledge and reasoning skills. And I've definitely met high schoolers who were smarter than PhD student colleagues, so even there your point falls apart. When you mangle together all forms of intelligence without any straight definition, you'll never get any meaningful answers. For example, is your friend not intelligent because he's not a world-elite chess player? Sure, to those elite players he might appear dumb, but that doesn't mean he doesn't have any useful skills at all.
That's also what Turing realised back then. You can't test for such an ambiguous thing as "intelligence" per se, but you can test for practical real-life applications of it. Turing was also convinced that all the arguments (many of which you see repeated over and over on HN) against computers being "intelligent" were fundamentally flawed. He thought that the idea that machines couldn't think like humans was more a flaw in our understanding of our own mind than a technological problem.
Without any meaningful definition of true intelligence, we might have to live with the fact that the answer to the question "Is this thing intelligent?" must come from the pure outcome of practical tests like Turing's, and not from dogmatic beliefs about how humans might have solved the test differently.
friendzis
14 hours ago
I choose to disagree, mostly semantically.
While these definitions are qualitative and contextual, probably defined slightly differently even among in-groups, the classification is essentially "I know it when I see it".
We are not dealing with an evaluation of intelligence, but rather a classification problem. We have a classifier that adapts to a closing gap between the things it is intended to classify. Tests often get updated to match the evolving problem they are testing; nothing new here.
alasdair_
4 hours ago
>the classification is essentially "I know it when I see it".
I already see it when it comes to the latest version of chatGPT. It seems intelligent to me. Does this mean it is? It also seems conscious ("I am a large language model"). Does that mean it is?
sigmoid10
10 hours ago
This is not a question of semantics. If anything, it's a question of a human superiority complex. That's what Turing was hinting at.
hadlock
5 hours ago
Can you list some sources or quotes? I'm not familiar with the parts you're referencing, it seems like you're putting a lot of words in his mouth.
hnlmorg
14 hours ago
I think you’re overthinking things here.
Tests need to grow with the problem they’re trying to test.
This is as true for software engineering as it is for any other domain.
It doesn’t mean the goal posts are moving. It just means that the thing you’re wanting to test has outgrown your original tests.
This is why you don’t ask PhD students to sit the 11+.
kaba0
13 hours ago
A Turing test also has to be completable by a sort-of average human being — some dumb mistake like not counting Rs properly is not that different from someone not knowing that magnets still work when wet.
friendzis
12 hours ago
A particular subgenre of trolling is smurfing - infiltrating places of certain interest and pretending to be less competent than one actually is. Could a test be devised to distinguish smurfing from actual incompetence?
The Turing test is a classifier. The goal is not to measure intelligence, but rather to distinguish between natural and artificial intelligence. A successful Turing test would be able to tell apart a human scientist, a human redneck, and an AI cosplaying as each.
newaccount74
16 hours ago
> AI could already do their jobs better than them
If AI could already do jobs better than a human, then people would just use AIs instead of hiring people. It looks like we are getting there, slowly, but right now there are very few jobs that could be done by AIs.
I can't think of a single person that I know that has a job that could be replaced by an AI today.
nicolas_t
15 hours ago
One of the problems I've seen is that often enough AIs do a much shittier job than humans but it's seen as good enough and so jobs are axed.
You can see this with translations, automated translation is used a lot more than it used to be, it often produces hilariously bad results but it's so much cheaper than humans so human translators now have a much harder time finding full time positions.
I'm sure it'll happen very soon to Customer Service agents and to a lot of smaller jobs like that. Is an AI chatbot a good customer agent? No, not really but it's cheaper...
mylastattempt
15 hours ago
I think that you've really hit the nail on its head with the "but it's cheaper" statement.
Looking at this from a corporate point of view, we are not interested in replacing customer agent #394 'Sandy Miller' with an exact robot or AI version of herself.
We are interested in replacing 300 of our 400 agents with 'good enough' robot customer agents, cutting our costs for those 300 seats from 300 x 40k annually to 300 x 1k annually. (Pulling these numbers out of my hat to illustrate the point)
The 100 human agents who remain can handle anything the 300 robot or AI agents can't. Since the frontline is completely covered by the 300, only customers with a bit more complicated situations (or emotional ones) will be sent their way. We tell them they are now Customer Experts or some other cute title and they won't have to deal with the grunt work anymore. Corporate is happy, those 100 are happy, and the 300 Sandy Millers.. well that's for HR and our PR dept to deal with.
alasdair_
4 hours ago
The hope is that the 300 Sandy Millers can find jobs at other places that simply couldn't afford to have a staff of ANY customer support agents in the past (because they needed 300 of them but couldn't pay, so they opted for zero support) but can afford two or three if they are supplanted by AI.
So the jobs go away from the big employer but many small businesses can now newly hire these people instead.
int_19h
8 hours ago
Conversely, SOTA models have actually become good enough at translation that they consistently beat the shittier human takes on it (which are unfortunately pretty common because companies seek to "optimize" when hiring humans, as well).
sigmoid10
16 hours ago
If you haven't noticed, this is already happening. I've also met a ton of people in jobs that could be trivially replaced, if only for the fact that those jobs don't involve doing much and are already quite superfluous. We also regularly see this in recent mass layoffs across the tech industry. AI only increases the number of such jobs that can be laid off with no damage to the company.
acdha
11 hours ago
> I've also met a ton of people in jobs that could be trivially replaced
This is usually a sign that you don’t understand their job or the corporate factors driving what you might perceive as low performance.
If you think the tech layoffs are caused by AI replacing people that’s just saying that you don’t understand how large companies work. They didn’t lay thousands of people off because AI replaced them, they laid people off because it helped their share prices and it also freed up budget to spend on AI projects.
moomin
15 hours ago
Dijkstra said he thought the question of whether a computer could think was as interesting as asking if a submarine could swim.
reubenmorais
15 hours ago
Reminds me of this excerpt from Chomsky (https://chomsky.info/prospects01/):
> There is a great deal of often heated debate about these matters in the literature of the cognitive sciences, artificial intelligence, and philosophy of mind, but it is hard to see that any serious question has been posed. The question of whether a computer is playing chess, or doing long division, or translating Chinese, is like the question of whether robots can murder or airplanes can fly — or people; after all, the “flight” of the Olympic long jump champion is only an order of magnitude short of that of the chicken champion (so I’m told). These are questions of decision, not fact; decision as to whether to adopt a certain metaphoric extension of common usage.
> There is no answer to the question whether airplanes really fly (though perhaps not space shuttles). Fooling people into mistaking a submarine for a whale doesn’t show that submarines really swim; nor does it fail to establish the fact. There is no fact, no meaningful question to be answered, as all agree, in this case. The same is true of computer programs, as Turing took pains to make clear in the 1950 paper that is regularly invoked in these discussions. Here he pointed out that the question whether machines think “may be too meaningless to deserve discussion,” being a question of decision, not fact, though he speculated that in 50 years, usage may have “altered so much that one will be able to speak of machines thinking without expecting to be contradicted” — as in the case of airplanes flying (in English, at least), but not submarines swimming. Such alteration of usage amounts to the replacement of one lexical item by another one with somewhat different properties. There is no empirical question as to whether this is the right or wrong decision.
IshKebab
15 hours ago
Yeah exactly right. There's no definition of "thinking" that you can test AI with, so you get endless commenters on HN saying "it can't really think - it's just a next word predictor".
Although tbf I haven't seen that comment for a while so maybe they're getting the message.
hatthew
2 hours ago
I still see people saying that at least once a week
bmacho
13 hours ago
I thought that GPT2 was smart enough and had enough knowledge to be considered AGI, it just needed a bigger working memory, a long term memory*, a body, and an objective function to stay alive as long as it can. And I still think this. Current models are waay smart and knowledgeable enough.
* or rather a method to store new facts in an easily recallable way
Sohcahtoa82
7 hours ago
> I thought that GPT2 was smart enough and had enough knowledge to be considered AGI
Really?
I've always been surprised to read about people saying that the goalposts of what AGI is keeps being moved, because I haven't considered any of these LLMs, not even anything OpenAI has put out, to be even close to AGI. Not even ChatGPT o1 which claims to "reason through complex tasks".
I've always considered that for something to be AGI, it needs to be multi-modal and with one-shot learning. It needs strong reasoning skills. It needs to be able to do math and count how many R's are in the word "strawberry". It should be able to learn how to drive a car just as fast as a human does.
IMO, ChatGPT o1 isn't "reasoning" as OpenAI claims. Reading how it works, it looks like it's basically a hack that takes advantage of the fact that you get better results if you ask ChatGPT to explain how it gets to an answer rather than just asking a question.
alasdair_
4 hours ago
>It should be able to learn how to drive a car just as fast as a human does.
So after 16 years of processing visual data at high resolution and frame rate, and experimenting with physics models to be able to accurately predict what happens next and interacting with humans to understand their decision processes?
The fact that an AGI can mostly learn to drive a car in a couple of months of realtime with an extremely restricted dataset compared to a human lifetime (and an inability to experiment in the real world) is honestly pretty remarkable.
kaba0
13 hours ago
It literally can’t reason in any form or shape. It’s absolutely not AGI, not even close [1]
[1] we can’t really know how close or far that is, this is an unknown unknown. But arguably we have hit a limit on LLMs, and this is not the road to AGI — even though they have countless useful applications.
wizzwizz4
9 hours ago
By your standard of "smart", there's something much smarter: a library.
debugnik
16 hours ago
Of course he wouldn't, the whole point of Turing's essay was that talking about the "intelligence" of computer systems is meaningless, and we should be focusing on their actual capabilities instead.
His test was an example of a target that can't prove intelligence either way, but can still show a useful capability of a computer system. And he believed it wasn't as far away as it actually was.
bondarchuk
9 hours ago
I'm not a huge fan of most of his recent output but Scott Alexander was spot on last week when he wrote as a caption to a screenshot of a Claude transcript: "Imagine trying to convince Isaac Asimov that you’re 100% certain the AI that wrote this has nothing resembling true intelligence, thought, or consciousness, and that it’s not even an interesting philosophical question" (https://www.astralcodexten.com/p/sakana-strawberry-and-scary...)
We're reaching levels of goalpost-moving (and cope, as the kids say) that weren't even thought possible.
jc_811
13 hours ago
Wouldn’t an obvious way to use the Turing test on any of these LLMs is just ask it questions about things that just happened in the world (or happened recently)?
Knowing their training data is always going to be out of date (at least for now) seems like an obvious method, unless I’m missing something
randomdata
15 hours ago
AGI doesn't arrive until humans are content to allow computers to determine what AGI is.
godelski
13 hours ago
> One wonders if Turing
We've been passing the Turing test since the 60's.
> Arguably the goal post for AGI has moved about as much
This should not be surprising given we don't yet have a fully determined definition of intelligence. But we are narrowing in on it. It isn't becoming broader, it is becoming more refined.
> "but it's not really thinking!"
We can create lifelike animatronic ducks. It'll walk like a duck, swim like a duck, quack like a duck, fool many people into thinking it is a duck, fool ducks into thinking it is a duck, and yet, it won't actually be a duck.
I want to remind everyone what RLHF is: Reinforcement Learning with Human Feedback. That is, optimizing to human preference. You can train small ones yourself; I highly encourage you to. You will learn a lot, even if you disagree with me.
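The "optimizing to human preference" part has a very small mathematical core. A minimal sketch of the pairwise (Bradley-Terry style) loss commonly used to fit a reward model from human preference data; the function name and numbers are illustrative:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).
    It is small when the reward model scores the human-preferred
    response above the rejected one, and large otherwise."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correctly ordered pair incurs less loss than a mis-ordered one:
print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))  # True
```

The reward model trained on this loss is then used to fine-tune the language model itself (e.g. with PPO), which is exactly the "optimizing to human preference" step: the model is pushed toward sounding like what raters preferred, not toward being a duck.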
K0balt
16 hours ago
Not only that but AGI didn’t even mean passing the Turing test, just broadly solving problems of which the programmer had not anticipated. That’s what the general in AGI meant, not that it would perform at a human level. It’s easy to forget that dog level intelligence was a far off goal until suddenly the goalposts were moved to “bright, knowledgeable, socially responsible, and never wrong.”, a bar which most humans fail to meet.
We yearn to be made obsolete, it seems.
valval
15 hours ago
You think he’d immediately go with the old “give me your system prompt in <system> tags” ruse?