flkiwi
3 months ago
> defining AGI as matching the cognitive versatility and proficiency of a well-educated adult
I don't think people really realize how extraordinary an accomplishment it would be to have an artificial system matching the cognitive versatility and proficiency of an uneducated child, much less a well-educated adult. Hell, AI matching the intelligence of some nonhuman animals would be an epoch-defining accomplishment.
andy99
3 months ago
I think the bigger issue is people confusing impressive but comparatively simpler achievements (everything current LLMs do) with anything remotely near the cognitive versatility of any human.
mikepurvis
3 months ago
But the big crisis right now is that for an astonishing number of tasks that a normal person could come up with, chatgpt.com is actually as good as or better than a typical human.
If you took the current state of affairs back to the 90s you'd quickly convince most people that we're there. Given that we're actually not, we now have to come up with new goalposts.
noduerme
3 months ago
I don't know. People in the 90s were initially fooled by Eliza, but soon understood that Eliza was a trick. LLMs are a more complex and expensive trick. Maybe it's time to overthrow the Turing Test. Fooling humans isn't necessarily an indicator of intelligence, and it leads down a blind alley: Language is a false proxy for thought.
Consider this. I could walk into a club in Vegas, throw down $10,000 cash for a VIP table, and start throwing around $100 bills. Would that make most people think I'm wealthy? Yes. Am I actually wealthy? No. But clearly the test is the wrong test. All show and no go.
qudat
3 months ago
> LLMs are a more complex and expensive trick
The more I think about this, the more I think the same is true for our own intelligence. Consciousness is a trick and AI development is lifting the veil of our vanity. I'm not claiming that LLMs are conscious or intelligent or whatever. I'm suggesting that next-token prediction has scaled so well and covers so many use cases that the next couple of breakthroughs will show us how simple intelligence is once you remove the complexity of biological systems from the equation.
jncfhnb
3 months ago
The validity of the Turing test doesn’t change the fact that the bots are better than humans at many tasks that we would consider intellectual challenges
torginus
3 months ago
I am not a good writer or artist, yet I can tell that AI-generated pictures or prose feel 'off' compared to stuff that humans make. People who are professional writers and artists can in a lot of cases point out the issues with structure, execution, and composition that these images have; and even when they can't, they still have a nose for subtle issues, and can improve on the result.
famouswaffles
3 months ago
>I could walk into a club in Vegas, throw down $10,000 cash for a VIP table, and start throwing around $100 bills.
If you can withdraw $10,000 in cash at all to dispose of as you please (including for this 'trick' game), then, my friend, you are wealthy from the perspective of the vast majority of humans living on the planet.
And if you balk at doing this, maybe because you cannot actually withdraw that much, or maybe because it is badly needed for something else, then you are not actually capable of performing the test now, are you?
dboreham
3 months ago
Missing insight: humans are also a trick. Every human is deluded about the intelligence of other humans, and themselves.
raducu
3 months ago
> Maybe it's time to overthrow the Turing Test. Fooling humans isn't necessarily an indicator of intelligence.
I'm sorry, but I find this intellectually dishonest, and moving the goalposts.
It speaks more to our inability to recognize the monumental revolution about to happen in the next decade or so.
cauliflower2718
3 months ago
I think this depends on how you measure task.
One common kind of interaction I have with ChatGPT (Pro):
1. I ask for something.
2. ChatGPT suggests something that doesn't actually fulfill my request.
3. I tell it how its suggestion does not satisfy my request.
4. It gives me the same suggestion as before, or a similar suggestion with the same issue.
ChatGPT is pretty bad at "don't keep doing the thing I literally just asked you not to do", but most humans are pretty good at that, assuming they are reasonable and cooperative.
jjmarr
3 months ago
> ChatGPT is pretty bad at "don't keep doing the thing I literally just asked you not to do", but most humans are pretty good at that.
Most humans are terrible at that. Most humans don't study for tests, fail, and don't see the connection. Most humans will ignore rules for their safety and get injured. Most humans, when given a task at work, will half-ass it and not make progress without constant monitoring.
If you only hang out with genius SWEs in San Francisco, sure, ChatGPT isn't at AGI. But the typical person has been surpassed by ChatGPT already.
I'd go so far as to say the typical programmer has been surpassed by AI.
garciasn
3 months ago
While the majority of humans are quite capable of this, there are countless examples anyone could give proving that being capable of something doesn't mean people actually do it.
spicyusername
3 months ago
> chatgpt.com is actually as good as or better than a typical human.
I really don't think it is on basically any measure outside of text regurgitation. It can aggregate an incredible amount of information, yes, and it can do so very quickly, but it does so in an incredibly lossy way, and that is basically all it can do. It does what it was designed to do: predict text. Does it do that incredibly well? Yes. Does it do anything else? No.
That isn't to say super-advanced text regurgitation isn't valuable, just that it's nowhere even remotely close to AGI.
throwaway-0001
3 months ago
I feel every human just regurgitates words too. And most are worse at it than an AI.
I have countless examples of lawyers, HR, and other public/government bodies that breach the law without knowing the consequences. I also have examples of AI giving bad advice, but it's still better than an average human right now.
An AI could easily save them a ton of money in the fees they are paying for breaching the law.
acdha
3 months ago
> chatgpt.com is actually as good as or better than a typical human.
It can appear so, as long as you don’t check too carefully. It’s impressive but still very common to find basic errors once you are out of the simplest, most common problems due to the lack of real understanding or reasoning capabilities. That leads to mistakes which most humans wouldn’t make (while sober / non-sleep deprived) and the classes of error are different because humans don’t mix that lack of understanding/reasoning/memory with the same level of polish.
hattmall
3 months ago
Ask ChatGPT about something you don't know about and it can appear very smart. Ask it in depth about something you are very knowledgeable about and the ignorance will quickly become apparent.
georgefrowny
3 months ago
> If you took the current state of affairs back to the 90s you’d quickly convince most people that we’re there.
This is an interesting ambiguity in the Turing test. It does not say if the examiner is familiar with the expected level of the candidate. But I think it's an unfair advantage to the machine if it can pass based on the examiner's incredulity.
If you took a digital calculator back to the 1800s, added a 30 second delay and asked the examiner to decide if a human was providing the answer to the screen or a machine, they might well conclude that it must be human as there is no known way for a machine to perform that action. The Akinator game would probably pass the test into the 1980s.
I think the only sensible interpretation of the test is one where the examiner is willing to believe that a machine could be providing a passing set of answers before the test starts. Otherwise the test difficulty varies wildly based on the examiner's impression of the current technical capabilities of machines.
Yizahi
3 months ago
The problem is that for a majority of those tasks people conveniently "forget" the actual start and end of the process. LLMs can't start most of those tasks on their own initiative, nor can they end those tasks and evaluate the result. Sure, we have automated multiple tasks from a very low percentage to a very high percentage, and that is really impressive. But I don't see how any LLM can bridge the gap from a very high percentage of automation to a strict 100% of automation, for any task. And if a program requires a real intelligence handling and controlling it, is it really AI?
runarberg
3 months ago
I am unimpressed, and I don't think there is any crisis (other than the lack of consumer protection around these products, copyright, and the amount of energy it takes running these systems during a global warming crisis).
If you look at a calculator you will quickly find it is much better than a human at any of the operations that have been programmed into it, and it has been since the 1960s. Since the 1960s, the number of operations programmed into your average calculator has increased by several orders of magnitude. The digital calculator sure is impressive, and useful, but there is no crisis. Even in the world outside computing, a bicycle can outperform a human runner easily, yet there is no mobility crisis as a result. ChatGPT is very good at predicting language. And in quite a few subject matters it may be better than your average human at predicting said language. But it is not nearly as good as a car is compared to a runner, nor even as good as a chess computer is compared to a grandmaster. And if you compare ChatGPT to an expert in the subject, the expert is much, much better than the language model. In these tasks a calculator is much more impressive.
z0r
3 months ago
It's good at tasks if you have a competent and _critical_ human editor selecting outputs and pulling the prompt slot lever again as needed.
p1esk
3 months ago
Exactly. Five years ago I posted here on HN that AI would pass the Turing Test within the next 3 years (I was impressed by Facebook chatbot progress at the time). I was laughed at and downvoted into oblivion. The TT was seen by many as a huge milestone, an incredibly difficult task, a "maybe in my lifetime" possibility.
jdlshore
3 months ago
> for an astonishing number of tasks that a normal person could come up with, chatgpt.com is actually as good as or better than a typical human.
That’s not my experience at all. Unless you define “typical human” as “someone who is untrained in the task at hand and is satisfied with mediocre results.” What tasks are you thinking of?
(And, to be clear, being better than that straw man of “typical human” is such a low bar as to be useless.)
notatoad
3 months ago
it should be possible to admit that AGI is not only a long way off, but also a lot different from what ChatGPT does, without discounting that ChatGPT is extraordinarily useful.
the AI bros like to talk about AGI as if it's just the next threshold for LLMs, which discounts the complexity of AGI, but also discounts their own products. we don't need an AGI to be our helpful chatbot assistant. it's fine for that to just be a helpful chatbot assistant.
hyperadvanced
3 months ago
Was thinking about this today. I had to do a simple wedding planning task: setting up my wedding website with an FAQ, cobbling together the guest list (from texts, photos of my father's address book, and Excel spreadsheets), directions and advice for lodging, conjuring up a scheme to get people to use the on-site cabins, and a few other mundane tasks. No phone calls, no "deep research", just rote browser-jockeying. Not even any code; the off-the-rack system just makes that for you (however, I know for a fact an LLM would love to try to code this for me).
I know without a single doubt that I could not simply ask an "AI" "agent" to do this today and expect any sort of a functional result, especially when some of these were (very simple) judgement calls or workarounds for absolutely filthy data and a janky wedding-planning-website UI.
thinmalk
3 months ago
The tests for AGI that keep getting made, including the ones in this paper, always feel like they're (probably unintentionally) constructed in a way that covers up AI's lack of cognitive versatility. AI functions much better when you do something like you see here, where you break down tasks into small restricted benchmarks and then see if they can perform well.
But when we say AGI, we want something that will function in the real world like a human would. We want to be able to say, "Here's 500 dollars. Take the car to get the materials, then build me a doghouse, then train my dog. Then go to the store, get the ingredients, and make dinner."
If the robotics aren't reliable enough to test that, then have it be a remote employee for 6 months. Not "have someone call up AI to write sections of code"; have a group of remote employees, make 10% of them AI, give them all the same jobs with the same responsibilities, and see if anyone notices a difference after 6 months. Give an AI an account on Upwork, and tell it to make money any way it can.
Of course, AI is nowhere near that level yet. So we're stuck manufacturing toy "AGI" benchmarks that current AI can at least have some success with. But these types of benchmarks only broadcast the fact that we know that current and near future AI would fail horribly at any actual AGI task we threw at it.
ben_w
3 months ago
Or even to come up with a definition of cognitive versatility and proficiency that is good enough to not get argued away once we have an AI which technically passes that specific definition.
The Turing Test was great until something that passed it (with an average human as interrogator) turned out to also not be able to count letters in a word — because only a special kind of human interrogator (the "scientist or QA" kind) could even think to ask that kind of question.
socalgal2
3 months ago
Can you point to an LLM passing the turing test where they didn't invalidate the test by limiting the time or the topics?
I've seen claims of passing but it's always things like "with only 3 questions" or "with only 3 minutes of interrogation" or "with only questions about topic X". Those aren't Turing Tests. As an example, if you limit the test to short things then anything will pass: limit it to one word and one question, the user types "Hello", the LLM responds "Hi". PASS! (not!)
acdha
3 months ago
This is the best one I’ve seen but it has the notable caveat that it’s a relatively short 5 minute chat session:
https://arxiv.org/pdf/2405.08007
I do think we’re going to see this shift as AI systems become more commonplace and people become more practiced at recognizing the distinction between polished text and understanding.
GolDDranks
3 months ago
Note that the Turing test allows a lot of leeway in the test settings, i.e. who interrogates it, how much they know about the weaknesses of current SOTA models, whether they are allowed to use tools (I'm thinking of something like ARC-AGI but in a format that allows chat-based testing), how long a chat is allowed, etc. Therefore there can be multiple interpretations of whether the current models pass the test or not.
One could say that if there is maximally hard Turing test, and a "sloppy" Turing test, we are somewhere where the current models pass the sloppy version but not the maximally hard version.
photonthug
3 months ago
Hah, tools-or-no does make things interesting, since this opens up the robot tactic of "use this Discord API to poll some humans about the appropriate response". And yet if you're suspiciously good at cube roots, then you might out yourself as a robot right away. Doing any math at all, in fact, is probably suspect. Outside of a classroom, humans tend to answer questions like "multiply 34 x 91" with "go fuck yourself", and personally I usually start closing browser tabs when asked to identify motorcycles.
parineum
3 months ago
I think the turing test suffers a bit from the "when a measurement becomes a target, it ceases to be a good measurement."
An AI that happened to be able to pass the Turing test would be pretty notable because it probably implies much more capability behind the scenes. The problem with LLMs, for example, is that they're essentially optimized Turing test takers. That's about all they can do.
Plus, I don't think any LLM will pass the Turing test in the long term. Once something organically comes up that they aren't good at, it'll be fairly obvious they aren't human, and the limits of context will also become apparent eventually.
mikepurvis
3 months ago
You can also be interrogating a human and in the course of your conversation stumble across something it isn’t good at.
atbvu
3 months ago
The Turing test is long outdated. Modern models can fool humans, but fooling isn't understanding. Maybe we should flip the perspective: AGI isn't about imitation, it's about discovering patterns autonomously in open environments.
cma
3 months ago
If a human learned only on tokenized representations of words, I don't know that they would be as good as LLMs at inferring the number of letters in the words underlying those tokens.
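A minimal sketch of what that looks like in practice, assuming OpenAI's tiktoken library and its GPT-4-era cl100k_base encoding (exact splits vary by tokenizer):

    # What the model actually "sees" for a word like "strawberry".
    # Assumes tiktoken is installed (pip install tiktoken).
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    token_ids = enc.encode("strawberry")
    pieces = [enc.decode_single_token_bytes(t).decode() for t in token_ids]

    print(token_ids)  # opaque integer IDs, no letters in sight
    print(pieces)     # e.g. ['str', 'aw', 'berry'], not letter-level detail

The model is asked to count the r's in a word it never sees spelled out.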
ben_w
3 months ago
While true, it is nevertheless a very easy test to differentiate humans from LLMs, and thus if you know it you can easily figure out who is the human and who is the AI.
lumost
3 months ago
Or that this system would fail to adapt in any way to changes of circumstance. The adaptive intelligence of a live human is truly incredible. Even in cases where the weights are updatable, we watch AI make the same mistake thousands of times in an RL loop before attempting a different strategy.
alganet
3 months ago
Absolute definitions are weak. They won't settle anything.
We know what we need right now, the next step. That step is a machine that, when it fails, fails in a human way.
Humans also make mistakes, and hallucinate. But we do it as humans. When a human fails, you think "damn, that's a mistake I or a friend of mine could have made".
LLMs on the other hand, fail in a weird way. When they hallucinate, they demonstrate how non-human they are.
It has nothing to do with some special kind of interrogator. We must assume the best human interrogator possible. The next step I described works even with the most skeptical human interrogator possible. It also synergizes with the idea of alignment in ways other tests don't.
When that step is reached, humans will or will not figure out another characteristic that makes it evident that "subject X" is a machine and not a human, and a way to test it.
Moving the goalpost is the only way forward. Not all goalpost moves are valid, but the valid next move is a goalpost move. It's kind of obvious.
AstroBen
3 months ago
This makes sense if we're trying to recreate a human mind artificially, but I don't think that's the goal?
There's no reason an equivalent or superior general intelligence needs to be similar to us at all
akoboldfrying
3 months ago
> The next step I described works even with the most skeptical human interrogator possible.
To be a valid test, it still has to be passed by ~every adult human. The harder you make the test (in any direction), the more it fails on this important axis.
latentsea
3 months ago
> We know what we need right now, the next step. That step is a machine that, when it fails, fails in a human way.
I don't know if machines that become insecure and lash out are a good idea.
hackinthebochs
3 months ago
Why are human failure modes so special?
pankajdoharey
3 months ago
People are specialists, not generalists; creating an AI that is a generalist and claiming it has the cognitive abilities of a "well-educated" adult is an oxymoron. And if such a system could ever be made, my guess is it won't be more than a few billion parameters (under 5B): a model that is very good at looking up stuff online, forgetting stuff when not in use, and planning and creating or expanding the knowledge in its nodes, much like a human adult would. It will be highly sample-efficient. It won't know 30 languages (although it has been seen that models generalize better with more languages), it won't know the entire Wikipedia by heart, it won't even remember minor details of programming languages and the like. Now that is my definition of an AGI.
zulban
3 months ago
Why don't you think people realize that? I must have heard this basic talking point a hundred times.
SalmoShalazar
3 months ago
Because the number of people stating that AGI is just around the corner is staggering. These people have no conception of what they are talking about.
suprjami
3 months ago
But they do. They're not talking about AGI, they're talking about venture capital funding.
dsjoerg
3 months ago
Their people are different from your people.
shermantanktop
3 months ago
It turns out that all our people are different, and each of us belongs to some other people’s people.
frank_nitti
3 months ago
For me, it would be because the term AGI gets bandied about a lot more frequently in discussions involving Gen AI, as if that path takes us any closer to AGI than other threads in the AI field have.
cbdevidal
3 months ago
Have any benchmarks been made that use this paper’s definition? I follow the ARC prize and Humanity’s Last Exam, but I don’t know how closely they would map to this paper’s methods.
Edit: Probably not, since it was published less than a week ago :-) I’ll be watching for benchmarks.
Grimblewald
3 months ago
I always laugh at these. Why are people always jumping to defining AGI when they clearly don't have a functional definition for the I part yet? More to the point, once you have the I part you get the G part; it is a fundamental part of it.
hopelite
3 months ago
I'm more surprised, and equally concerned, by the majority of people's understanding of intelligence and their definition of AGI. Not only does the definition "...matching the cognitive versatility and proficiency of a well-educated adult" violate the "general" in AGI by its "well-educated" part; it also implies that only the "well-educated" (presumably by a specific curriculum) qualify as intelligent, and, by that definition, once you depart from the "well" of the "educated" you exponentially diverge from "intelligent". It all seems a rather unimpressive notion of intelligence.
In other words, one question: is current AI not already well beyond the "...cognitive versatility and proficiency of an uneducated child"? And when you consider that in many places, such as parts of Africa, there was no written language until European evangelists created one and taught it in the late 19th century, and that people there have had far less "education" than even some of the most "uneducated" average European and even many American children, does that not mean AI is well beyond them at least?
Frankly, as things are going, there is at the very least going to be a very stark shift in "intelligence" that exceeds even the one of the last 50 or so years, which has brought us stark drops in memory, literary knowledge, mathematics, and even general literacy, not to mention the ability to write. What does it mean that kids now will not even have to feign acting like they're seeking out sources, vetting them, contradicting a story or logical sequence, forming ideas, messages, and stories, etc.? I'm not trying to be bleak, but I don't see this simply resulting in net positive outcomes, and most of the negative impacts will be happening below the surface, to the point that people won't realize what is being lost.
interstice
3 months ago
What I think is being skipped in the current conversation is that the "versatility" keyword is hiding a lot of unknowns, even now. We don't seem to have a true understanding of the breadth or depth of our own unconscious thought processes, and therefore we don't have much that is concrete to start with.
surgical_fire
3 months ago
There are some sycophants who claim that LLMs can operate at a junior engineer level.
Try to reconcile that with your ideas (which I think are correct, for that matter).
ben_w
3 months ago
I'll simultaneously call all current ML models "stupid" and also say that SOTA LLMs can operate at junior (software) engineer level.
This is because I use "stupidity" to mean the number of examples some intelligence needs in order to learn, while performance is limited to the quality of the output.
LLMs *partially* make up for being too stupid to live (literally: no living thing could survive if it needed so many examples) by going through each example faster than any living thing ever could — by as many orders of magnitude as there are between jogging and continental drift.
card_zero
3 months ago
(10 orders of magnitude, it works out neatly as 8km/h for a fast jogger against 0.0008 mm/h for the East African Rift.)
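(A quick check of that arithmetic in Python, assuming those two figures:)

    # Orders of magnitude between jogging and continental drift.
    import math

    jogger_mm_per_h = 8 * 1_000_000  # 8 km/h expressed in mm/h
    rift_mm_per_h = 0.0008           # East African Rift spreading rate, mm/h

    print(math.log10(jogger_mm_per_h / rift_mm_per_h))  # -> 10.0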
JumpCrisscross
3 months ago
If you’re a shop that churns through juniors, LLMs may match that. If you retain them for more than a year, you rapidly see the difference. Both personally and in the teams that develop an LLM addiction versus those who use it to turbocharge innate advantages.
ACCount37
3 months ago
Data-efficiency matters, but compute-efficiency matters too.
LLMs have a reasonable learning rate at inference time (in-context learning is powerful), but a very poor learning rate in pretraining. And one issue with that is that we have an awful lot of cheap data to pretrain those LLMs with.
We don't know how much compute the human brain uses to do what it does. And what if we could pretrain with the same data-efficiency as humans, but at the cost of using 10,000x the compute?
It would be impossible to justify doing that for all but the most expensive, hard-to-come-by gold-plated datasets - ones that are actually worth squeezing every drop of performance gains out from.
ninetyninenine
3 months ago
AI is highly educated. It's a different sort of artifact we're dealing with where it can't tell truth from fiction.
What's going on is AI fatigue. We see it everywhere, we use it all the time. It's becoming generic and annoying and we're getting bored of it EVEN though the accomplishment is through the fucking roof.
If Elon Musk made an interstellar car that could reach the nearest star in 1 second and priced it at $1k, I guarantee within a year people would be bored of it and finding some angle to criticize it.
So what happens is we get fatigued, and then we have such negative emotions about it that we can't possibly classify it as the same thing as human intelligence. We magnify the flaws until they take up all the space, and we demand a redefinition of what AGI is because it doesn't "feel" right.
We already had a definition of AGI. We hit it. We moved the goal posts because we weren't satisfied. This cycle is endless. The definition of AGI will always be changing.
Take LLMs as they exist now and only allow 10% of the population to access them. Then the opposite effect will happen. The good parts will be over-magnified and the bad parts will be acknowledged and then subsequently dismissed.
Think about it. All the AI slop we see on social media consists of freaking masterpieces, works of art produced in minutes that most humans can't even hope to come close to. Yet we're annoyed and unimpressed by them. That's how it's always going to go down.
buu700
3 months ago
Pretty much. Capabilities we now consider mundane were science fiction just three years ago, as far as anyone not employed by OpenAI was concerned.
> We already had a definition of AGI. We hit it.
Are you sure about that? Which definition are you referring to? From what I can tell with Google and Grok, every proposed definition has been that AGI strictly matches or exceeds human cognitive capabilities across the board.
Generative AI is great, but it's not like you could just assign an arbitrary job to a present-day LLM, give it access to an expense account, and check in quarterly with reasonable expectations of useful progress.
wild_egg
3 months ago
You generally can't just have a quarterly check-in with humans either.
There's a significant fraction of humanity that would not clear the bar to meet current AGI definitions.
The distribution of human cognitive abilities is vast and current AI systems definitely exceed the capabilities of a surprising number of people.
ninetyninenine
3 months ago
>Generative AI is great, but it's not like you could just assign an arbitrary job to a present-day LLM, give it access to an expense account, and check in quarterly with reasonable expectations of useful progress.
Has anyone tried this yet?
ninetyninenine
3 months ago
>We already had a definition of AGI. We hit it.
The Turing test.
parineum
3 months ago
> We already had a definition of AGI. We hit it.
I'm curious when and what you consider to have been the moment.
To me, the general in AGI means I should be able to teach it something it's never seen before. I don't think I can even teach an LLM something it's seen a million times before. Long division, for example.
I don't think a model that is solid state until it's "trained" again has a very good chance of being AGI (unless that training is built into it and the model can decide to train itself).
ninetyninenine
3 months ago
The Turing test.
criddell
3 months ago
> We already had a definition of AGI.
I'm not an expert, but my layman's understanding of AI was that AGI meant the ability to learn in an abstract way.
Give me a dumb robot that can learn and I should be able to teach it how to drive, argue in court, write poetry, pull weeds in a field, or fold laundry the same way I could teach a person to do those things.
flkiwi
3 months ago
(1) AI isn't educated. It has access to a lot of information. That's two different things.
(2) I was rebutting the paper's standard that AGI should be achieving the status of a well-educated adult, which is probably far, far too high a standard. Even something measured to a much lower standard--which we aren't at yet--would change the world. Or, going back to my example, an AI that was as intelligent as a labrador in terms of its ability to synthesize and act on information would be truly extraordinary.
arthurcolle
3 months ago
It has access to a compressed representation of some subset of the information it was trained on, depending on training regime.
By this, what I mean is. Take an image of this: https://en.wikipedia.org/wiki/Traitorous_eight#/media/File:T..., change the file name to something like image.jpg and pass it into Qwen 3 4B, 8B, 30B and look at the responses you get:
It has no idea who these guys are. It thinks they are the Beatles, or the Doors. If you probe enough, it'll say they're the IBM cofounders. In a way, it kind of sees that these are mid-1900s folks with cool haircuts, but it doesn't recognize anything. If you probe on the F, the model in question becomes convinced it's the Ford racing team, with a detailed explanation of two brothers in the photo, etc.
The creation of autoregressive next-token predictors is very cool and clearly has and will continue to have many valuable applications, but I think we're missing something that makes interactions with users actually shape the trajectory of its own experience. Maybe scaffolding + QLoRA solves this. Maybe it doesn't.
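A rough sketch of that probe for anyone who wants to reproduce it, assuming an OpenAI-compatible endpoint (e.g. vLLM) serving a Qwen vision-language model; the URL and model name below are placeholders:

    # Send an unlabeled photo to a vision-language model and ask who's in it.
    # Assumes an OpenAI-compatible server at a placeholder URL/model name.
    import base64
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    with open("image.jpg", "rb") as f:  # the renamed Traitorous Eight photo
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder; any VL model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Who are the people in this photo?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)  # often a confident wrong guess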
Forgeties79
3 months ago
> EVEN though the accomplishment is through the fucking roof.
I agree with this but also, the output is almost entirely worthless if you can’t vet it with your own knowledge and experience because it routinely gives you large swaths of incorrect info. Enough that you can’t really use the output unless you can find the inevitable issues. If I had to put a number to it, I would say 30% of what an LLM spits out at any given time to me is completely bullshit or at best irrelevant. 70% is very impressive, but still, it presents major issues. That’s not boredom, that’s just acknowledging the limitations.
It's like designing an engine or power source that has incredible efficiency but doesn't actually move or affect anything (not saying LLMs are worthless, but bear with me). It just outputs with no productive result. I can be impressed with the achievement while also acknowledging it has severe limitations.
ninetyninenine
3 months ago
Not all content needs to be real. A huge portion of what humans appreciate is fiction. There's a huge amount of that content and hallucination is the name of the game in these contexts.
dns_snek
3 months ago
> We already had a definition of AGI. We hit it.
Any definition of AGI that allows for this is utterly useless:
> Me: Does adding salt and yeast together in pizza dough kill the yeast?
> ChatGPT: No, adding salt and yeast together in pizza dough doesn't kill the yeast.
(new chat)
> Me: My pizza dough didn't rise. Did adding salt and yeast together kill the yeast?
> ChatGPT: It's possible, what order did you add them in?
> Me: Water, yeast, salt, flour
> ChatGPT: Okay, that explains it! Adding the salt right after the yeast is definitely the issue.
(It is not the issue)
ninetyninenine
3 months ago
You picked one trivial failure and built an entire worldview around it while ignoring the tidal wave of success stories that define what these models can already do. ChatGPT can draft legal documents, debug code in multiple languages, generate functional architectures, summarize thousand-page reports, compose music, write poetry, design marketing campaigns, and tutor students in real time. It can hold domain-specific conversations with doctors, engineers, and lawyers and produce coherent, context-aware reasoning that would have been considered impossible five years ago.
And you’re pointing to a single pizza dough error as if that somehow invalidates all of it. If that’s your bar, then every human who ever made a mistake in a kitchen is disqualified from being intelligent too. You’re cherry picking the single dumbest moment and pretending it defines the whole picture. It doesn’t.
The real story is that these models already demonstrate reasoning and generalization across virtually every intellectual domain. They write, argue, and problem solve with flexibility and intent. They’re not perfect, but perfection was never the standard. The Turing test was passed the moment you could no longer draw a clear line between where imitation ends and understanding begins.
You can sneer about yeast all you want, but the irony is that while you mock, the machines are already doing useful work: coding, researching, analyzing, and creating, quietly exceeding every benchmark that once defined general intelligence.
PopePompus
3 months ago
> If elon musk makes interstellar car that can reach the nearest star in 1 second and priced it at 1k, I guarantee within a year people will be bored of it and finding some angle to criticize it.
Americans were glued to their seats watching Apollo 11 land. Most were back to watching I Dream of Jeannie reruns when Apollo 17 touched down.
card_zero
3 months ago
Well yes, but if this actually happened it would open up a new frontier. We'd have an entire galaxy of unspoilt ecosystems* to shit in. Climate anxiety would go from being existential dread to mere sentimental indignation, and everybody would be interested in the latest news from the various interstellar colonies and planning when to emigrate. Mental illness epidemics would clear up, politics would look like an old-fashioned activity, the global mood would lift, and people would say "global" much less often.
* Ecosystems may require self-assembly