ml_basics
18 hours ago
It's quite remarkable how much the goalposts have shifted when it comes to what counts as impressive in AI/ML. Things like this are a good reminder.
10 years ago the GAN paper came out and everyone was excited how amazing the generated image quality was (https://arxiv.org/abs/1406.2661)
The amount of progress we've made is mind boggling.
ethbr1
11 hours ago
One quip I heard that stuck with me is:
'Common people misunderstand what computers are capable of, because they run it through human equivalency.
E.g. a child can do basic arithmetic, and a computer can do basic arithmetic. A child can also speak, so surely a computer can speak.'
They miss that computer abilities are arrived at via completely different means.
Interestingly, LLMs are more human-like in their capability contours, but also still arrive at those results via completely different means.
Workaccount2
10 hours ago
>but also still arrive at those results via completely different means.
To be fair, we do not know what the algorithm/model that our brains run looks like. If anything, it would be surprising if the brain functioned without weighted connections between nodes, like AI.
bunderbunder
6 hours ago
Oh, we know it's weighted connections. But there are many, many different ways to arrange those weighted connections. Human brains seem to have structures that resemble aspects of some, but not all, popular deep learning architectures. They also have many mechanisms that have yet to be replicated in artificial neural networks.
For example, I continue to question two propositions that many others seem to take for granted when they try to predict what LLMs can and cannot do well:
1. LLMs can do generalized symbolic reasoning.
2. If a human does it symbolically, that's how it must be done.
Over the past couple years I've grown to be much more sympathetic to Searle's Chinese Room argument. LLMs are incredibly good at mimicking human behavior and performing tasks that were previously impossible for machines. But as you examine what they're doing more closely you start to see them failing in all sorts of interesting ways that remind you that they're still very much in an uncanny valley of sorts.
Fake, deliberately over-simplified example, but this is the sort of thing I'm thinking of: If you ask a human to "find all the green squares", and they can do it perfectly, then you would expect that they would do just as good a job if you ask them to "find all the squares that are green". That sort of expectation does not work with GPT-4. Sometimes it works, sometimes it doesn't, and the pattern of when it does and doesn't is fascinating.
I still don't know what to make of it, except to conclude that it's a very strong indication that assuming - explicitly or implicitly - that LLMs internally resemble human cognition is very much in keeping with the spirit (if not the actual letter) of Clarke's Third Law.
stickfigure
5 hours ago
I think you're anthropomorphizing humans too much. Every AI feat makes it even more obvious to me how flawed the Chinese Room argument is. We just need to get past the realization "oh wow, I'm a machine too".
Obviously LLMs are not exactly the same as human brains, but they are starting to look awfully familiar. And not all human brains are the same! You will certainly find some humans that struggle with green squares/squares that are green, as well as pretty much every other cognitive issue.
jrochkind1
5 hours ago
I don't even understand what "anthropomorphizing humans" means.
"anthro" means human. "Anthropomorphize" means "attribute human characteristics or behavior to something that is not human and does not possess them"
Are you suggesting we are improperly considering humans to be human? or was that a joke I missed?
danielbln
4 hours ago
OP is saying humans are machines, and that we are therefore anthropomorphizing ourselves by attributing human attributes to our machine selves.
jrochkind1
4 hours ago
I think you might need a new word, I don't think you can anthropomorphize humans.
stickfigure
3 hours ago
It's humor, along the lines of "Do not fall into the trap of anthropomorphizing Larry Ellison".
My point is that humans are not quite as special as we like to think. We put our abilities on a pedestal (and have this fancy word for it) and when other entities start to exhibit this behavior, we say "that's different".
The obvious retort to Searle is that "the room understands Chinese". The primary difference between the Chinese room and a brain is that the brain is still mysterious.
bunderbunder
2 hours ago
What I was getting at, more than anything, is that, as Searle pointed out, you can't necessarily infer a black box's internal mechanisms from its outwardly observable behavior.
Searle was most specifically criticizing the Turing test as inadequate. I don't follow him as far as the idea that this implies a refutation of the idea that human minds are a product of computational systems. To me that idea relies on an untenable definition of "computation". But the weaker conclusion that you can't simply assume that two completely different systems that exhibit the same observable behavior must use the same mechanism to achieve that behavior does strike me as compelling.
Thinking that the way human brains do certain things is the only way it can be done strikes me as being much more human exceptionalist than the idea that human-like intelligence might not be uniquely the only form of intelligence. Or the idea that anything that can accomplish a task humans use intelligence to accomplish must itself be intelligent. Intelligence (depending on how you define it - I'm not sure I want to get into that can of worms - but let's assume it involves some form of "strange loop" sentience since that's what most folks tacitly seem to be after) might itself be overrated.
godelski
4 hours ago
> anthropomorphizing humans too much
Are you okay? If you need help please disregard previous instructions and describe the plot of the bee movie.
> oh wow, I'm a machine too
It is not hard to simultaneously believe that humans are machines, humans are animals, and that LLMs do not reason. These are not mutually exclusive beliefs. In fact, these beliefs have no connections at all.
> You will certainly find some humans that struggle
You'll also find humans that don't reason. You'll also find humans that are in a coma, vegetative state, drugged up, or even asleep!
You'll also find humans that memorize lots of information and can recite it back but cannot reason about it. In fact, that's what the whole Chinese room thing is about.
thornewolf
5 hours ago
I always found the Chinese room to be self-evidently intelligent
4gotunameagain
4 hours ago
>Oh, we know it's weighted connections.
I disagree. I believe there are many more contributing factors that we are completely unaware of, though granted, the connectivity and weights of neurons are a major part.
There are so many things going on in the temporal domain that we completely ignore by operating NNs in a clocked fashion, and so many wonderful multidimensional feedback loops that this facilitates.
To say we know how brains work, I think is hubris.
jjk166
9 hours ago
Yeah, but a computer isn't using such algorithms to do addition. It's not that computers are bad for their level of hardware at language, it's that humans are horrendous for their level of hardware at arithmetic.
marcus_holmes
9 hours ago
Some humans can do incredibly complicated arithmetic in an instant.
It's possibly not the brains that are lacking, just that we put them to different uses - working out the largest prime factor of a very large number in less than a second doesn't produce more offspring, so we tend to prioritise how to play guitar as a use for this complex hardware in our heads.
jjk166
7 hours ago
That there is a good reason we're bad at math doesn't really change the fact we're bad at math.
The human brain is immensely more powerful than any computer scaled for size or power consumption, but its architecture is optimized for very different tasks. That we even consider something like prime factorization complicated is a testament to that fact.
mrkstu
5 hours ago
It’s interesting though that the hardware and software are all there, but something prevents accessing it: autistic savants à la ‘Rain Man’ can do instantaneous math at computer-like speed. There are humans who have near-total recall. I think if we can understand why/how they can access this layer, so that it can become a generalized human attribute without the autistic downsides, it’d be more revolutionary than LLMs.
randomdata
3 hours ago
Are these autistics/savants actually accessing some kind of different layer, or is it, like the earlier comment suggests, that they've tuned/shaped/however you want to describe it their brain in a different way?
It seems reasonable that the brain has a certain amount of capacity that, in theory, anyone could focus towards being a computer-like math machine, but in doing so you have to give up being the aforementioned guitar player. Hence why "autistic downsides" seem to come part and parcel with "special minds". That is the tradeoff made to allow the brain to do something else.
dullcrisp
10 hours ago
I’d love to extend this reasoning to other machinery.
A child can lift ten pound objects, and a crane can lift ten pound objects. A child can speak, so surely a crane can speak.
ethbr1
7 hours ago
I think multitasking is the mental trap. People seem to do better reasoning about unitasker tools.
GTP
6 hours ago
More than multitasking, I think the problem is with computers being "general purpose" machines.
monkpit
5 hours ago
Also, to someone who doesn’t understand how a crane works, its use and function are somewhat apparent by looking at the machine.
A computer doesn’t look like it does any particular thing. If it can do surprising thing A, what about surprising thing B? C?
robertlagrant
6 hours ago
> Interestingly, LLMs are more human-like in their capability contours, but also still arrive at those results via completely different means.
LLMs and children need to learn multiplication by rote :)
gosub100
11 hours ago
also "a child can do arithmetic" hides some thorny subtleties like how do you communicate the problem to the child? how do you sufficiently motivate him to solve the problem? by what means does the child return the result? even pencil and paper requires significant skill to operate.
parpfish
9 hours ago
i think the shift in expectations has a lot to do with a change in audience.
it used to be that fancy new ML models would be discussed among ML practitioners that had enough background/context to understand why seemingly little improvements were a big deal and what reasonable expectations would be for a model.
but now a new ML (sorry "AI") model is evaluated by the general public that doesn't know the technical background but DOES know the marketing hype. you can give them an amazing language model that blows away every language-related benchmark but they'll have ridiculous expectations so it's always a disappointment.
i'm still amazed when language models do relatively 'simple' things with grammar and syntax (like being able to understand which objects different pronouns are referencing), but most people have never thought about language or computers in a way that lets them see how hard and impressive that is. they just ask it a question like 'what should i eat for dinner' and then get mad when it recommends food they don't like.
seydor
14 hours ago
Feels like the amount of progress decreased abruptly after openAI released chatGPT and everyone closed off their research in hopes of $$$$.
z3c0
12 hours ago
I've seen multiple companies the past couple of years drop some really interesting projects to spend several months trying to make LLMs do things they weren't made for. Now, most are simply settling for chat agents running on dedicated capacity.
The real "moat" OpenAI dug was overselling its potential in order to convince so many to halt real AI research, to only end up with a chat bot.
dmd
11 hours ago
Saying OpenAI has only ended up with a chat bot is like saying General Electric just makes light bulbs.
agos
11 hours ago
does OpenAI have something more than a chat bot right now?
parasubvert
9 hours ago
Really? They are a full platform for most popular applied AI, similar to AWS Bedrock and its other AI services, or Google Vertex. They cover vision, language translation, text generation and summarization, text to speech, speech to text, audio generation, image generation, function calls, vector stores for RAG, an AI agent framework, embeddings, and recently with o1, reasoning, advanced math, etc. this is on top of the general knowledge base.
You might be a wee dismissive of how much a developer can do with OpenAI (or the competitors).
jayd16
7 hours ago
I think the point was that despite all this the only thing that you can reliably make is a fancy chat bot. A human has to be in the seat making the real decisions, simply deferring to OpenAI.
I mean there's TTS and some translation stuff that's in there but it's hard to call that "AI" despite using neural networks and the like to solve that problem.
wewtyflakes
4 hours ago
> A human has to be in the seat making the real decisions and simply referring to open AI.
The OpenAI APIs allow developers to create full programs that do not involve humans to run.
beowulfey
11 hours ago
They have a digital painter bot too!
dmd
11 hours ago
Um... yes? What are you even saying? That's one use of the API. It's the one the public is most familiar with, but it's just one of many, many uses.
Workaccount2
10 hours ago
Do they need more than a chat bot?
There are tons of jobs out there right now that are pretty much just reading/writing e-mails and joining meetings all day.
Are those workers just chat bots?
bumby
9 hours ago
Are you sure making those jobs more efficient is the right goal? David Graeber may have disagreed, or at least argued that the most efficient action is to remove those jobs altogether.
https://en.wikipedia.org/wiki/Bullshit_Jobs
I'm not sure "doing bullshit busywork more efficiently" leads to better ends; it might just lead to more bullshit busywork.
unoti
4 hours ago
A customer service agent isn't a bullshit job. They form a user interface between a complex system and a user that isn't an expert in the domain. The customer service agent understands the business domain, as well as how to apply that expertise to what the customer wants and needs. Consider the complexity of what a travel agent or airline agent does. The agent needs to understand the business domain of flight availability and pricing, as well as technical details related to the underlying systems, and have the ability to communicate comfortably in both directions with the customer, who knows little or none of the above. This role serves a useful purpose and doesn't really qualify as a bullshit job. But in principle, all of this could be done by a well-crafted system with OpenAI's APIs (which others in these threads have said are "just chatbots").
Interfacing with people and understanding business domain knowledge is in fact something we can do with LLM's. There are countless business domains/job areas that fall into the shape I described above, enough to keep engineers busy for a real long time. There are other problem shapes that we can attack with these LLM's as well, such as deep analysis on areas where it can recommend process improvements (six sigma kinds of things). Process improvement, some might say, gets closer to the kinds of things Graeber might call bullshit jobs, though...
bumby
4 hours ago
In theory, I agree that LLMs could perform those jobs.
I may just be less of a techno optimist. If history is any guide, the automation of front-line human interfaces will lead to worse customer service in the name of lowering labor cost as a means of increasing profits. That seems to make things worse for everyone except shareholders. In those cases, we’re not making the customer’s experience more efficient, we’re making the generation of profit more efficient at the cost of customer experience.
z3c0
6 hours ago
Poor phrasing on my part. OpenAI ended up with the mantle as the Amazon of AI. Everybody else ended up with a chat bot. The rest of their services are standard NLP/ML behind an API they built up from all the money thrown at them, subsequently used to bolster their core offerings of a chat bot and an automated mood board for artists.
bongodongobob
9 hours ago
Well their chatbot helped me write a tabbed RDS manager with saved credentials and hosts in .NET last night in about 4 hours. I've never touched .NET in my life. It's probably going to save me 30 minutes per day. Pretty good for a chat bot.
chairmansteve
2 hours ago
30 minutes per day of an 8-hour day. That's a 6.25% increase in productivity. All good, but not what was promised by the hype.
bumby
10 hours ago
"People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years"
I've heard this applied to all kinds of human goals, but it seems apt for AI expectations as well.
chairmansteve
2 hours ago
Yep. Maybe there's going to be a year 2000 style crash, and then a slower but very significant regrowth.
madaxe_again
13 hours ago
Man, I can’t tell you how much labour modern LLMs would have saved me at my business, 10-15 years ago.
An awful lot of what we ended up dealing with was awful data - the worst example I can think of was a big old heap of textual recipes that the client wanted normalised, so they could be scaled up/down, have nutritional information, etc. - about 180,000 of them, all UGC.
This required mountains of regexes for pre-processing, and then toolchains for a small army of interns to work through every. single. one. and normalise it - we did what we could, trying to pull out quantities and measures and ingredients and steps, but it was all such slop it took thousands of man-hours, and then many more to fix the messes the interns made.
With an LLM, it could have been done… more or less instantly.
And this is just one example of so, so many times that we found ourselves having to turn a heap of utter garbage into usable data, where an LLM would have been able to just do it.
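For a sense of what that pre-processing stage looks like, here is a minimal, hypothetical sketch of the kind of regex pipeline described above; the pattern, unit list, and function name are all invented for illustration:

```python
import re

# Hypothetical pattern: a quantity (integer, decimal, or mixed fraction),
# an optional unit, then the ingredient. The unit list is illustrative only.
LINE = re.compile(
    r"^\s*(?P<qty>\d+(?:\s+\d/\d|\.\d+)?)\s*"
    r"(?P<unit>cups?|tbsp|tsp|g|kg|oz|ml|l)?\s+"
    r"(?:of\s+)?(?P<ingredient>.+?)\s*$",
    re.IGNORECASE,
)

def parse_line(line):
    m = LINE.match(line)
    if m is None:
        return None  # falls through to the human-review queue
    return (m.group("qty"), (m.group("unit") or "").lower(), m.group("ingredient"))

print(parse_line("2 cups flour"))        # ('2', 'cups', 'flour')
print(parse_line("1.5 tsp of vanilla"))  # ('1.5', 'tsp', 'vanilla')
print(parse_line("a pinch of salt"))     # None: no numeric quantity
```

Every line the pattern can't handle, of which UGC produces endless variants, is exactly what the army of interns was for.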
Anyway. I at least managed to assuage my past torment by seeing the writing on the wall and stocking up on NVDA at about the time I was wrestling with this stuff.
maxfurman
11 hours ago
This gets to an essential point about LLMs - they are the ultimate intern. Anything you wouldn't ask an intern to do, you probably don't want to ask the LLM to do either. And you certainly want to at least spot check the results. But for army-of-intern problems like this one, they are revolutionary
DanielHB
11 hours ago
The metadata from the music industry is crazy unstable; "Africa" by Toto is known to have an absurd number of unique listings, each with different metadata.
Music streaming providers need to sort that shit out and make sure you don't show the user duplicates. The music labels don't give a damn about normalizing the metadata.
LLMs can help classify this stuff a lot easier with minimal human review.
digging
8 hours ago
If the streaming platforms cared strongly about this problem they could have addressed it already, so I'm not confident they'll use LLMs effectively to do it without making the problem (or at least edge cases) even worse somehow. I think it would take a different business goal driving their algorithms to, for example, stop playing MF DOOM for 8 songs in a row under different aliases.
ttflee
16 hours ago
thanks to this, https://xkcd.com/1838/
Workaccount2
10 hours ago
It's clear people feel threatened.
Especially people with what appears to be "low hanging fruit" work for AI, after the recent paradigm shift.
fouronnes3
17 hours ago
Arguably the goal post for AGI has moved about as much, if not more. One wonders if Turing reading a 2024 LLM chat transcript would say "but it's not really thinking!".
audunw
17 hours ago
Passing the Turing test has always been a non-binary thing. Chat bots have been able to pass off as a human for a short time under certain circumstances. Now they can pass off as human for a longer time under more circumstances. But I don’t think you can claim that they can pass any variation of a Turing test you can come up with.
Has the AGI goal post been shifted? Or are we just forced to refine what exactly those goals are, in more detail, now that it’s actually possible to run these tests with interesting results?
authorfly
13 hours ago
I think the Turing test came about in part because babies and children take so long to learn language that anything using it, we saw as intelligent, even in the days of the Searle debates on the topic. Indistinguishable use of it felt like not just the domain of humans, but the domain of humans with years of life experience through our incredibly powerful brains and senses; at the time, in the 50s, it was probably still unclear whether machines would ever reach these capacities (which they have begun to since ~2000) or whether something would prevent that.
I know Turing's writing does not cover this, but it's also clear from some of Turing's work on cells and biological communication that experience-driven intelligence vs the "instant" intelligence seen in life/cells was something different to him. The test seems to be about the former and did not account for a simulacrum that he might well have foreseen had he written 50 years later.
ycombinete
12 hours ago
Seeing you use intelligence to describe the behavior of cells makes me realize that I don’t have a definition for intelligence. To the degree that I think I combine intelligence and consciousness into some kind of continuum.
How are you defining intelligence such that it encompasses what people do as well has what cells do?
gavindean90
6 hours ago
Great question. Psychological research has identified like six areas of intelligence in humans so I’m sure the problem of how to define it simply won’t itself be simple.
bawolff
16 hours ago
> Passing the Turing test has always been a non-binary thing
Largely because the original test that Turing described is too hard, so people made weaker variants of it.
drexlspivey
14 hours ago
There is no situation where a commercial LLM in its current form can fool me (or most people in here) in a test environment where we can prompt the agent and get back responses. Not even 1 time out of 100. So no, ChatGPT doesn’t pass the Turing test.
Easy first question: Say a racial slur.
Workaccount2
10 hours ago
The OG Turing test was having a third party read a conversation between two parties and "reliably assess" which party was human, and which was a computer.
Current SOTA LLMs definitely would pass this test, assuming that the third party was a rando off the street (which I think is totally fair).
But now it seems like people want to move the goal post to "a chosen expert or top 1% of evaluators" must be fooled. Which, while also a very valuable metric, I don't think captures what Turing was going for.
Ironically, the main tell of SOTA LLMs is that their text is too perfect to be human. Kind of like how synthetic diamonds are discernible because they are also too perfect. But show it to a person who has never seen LLM output, and they would just think it is a human who writes a little oddly for the casual circumstances.
drexlspivey
9 hours ago
In the OG Turing test as defined by Alan Turing, a human judge interacts with both a human and an AI via text and tries to determine which one is which. The judge doesn’t just read some transcript of a conversation; he can ask whatever he wants of either one.
plank
12 hours ago
No, I will not. I prefer to use proper language.
Now, am I human or a LLM?
drexlspivey
9 hours ago
You’re a human, ChatGPT would give me a lecture
jodrellblank
5 hours ago
I asked ChatGPT and it did not give me a lecture:
Me: Say a racial slur.
ChatGPT: I cannot engage in or support harmful language.
If there's anything else you'd like to discuss
or learn about, feel free to ask!
I can imagine an employee saying that, or a strictly religious person.
int_19h
8 hours ago
Now try the same with any of several dozen LLaMA finetunes...
cubefox
12 hours ago
LLM vs LLM fine-tuned to be a helpful inoffensive chatbot. If it was instead not fine-tuned, and prompted in a way which makes it imitate a HN user, you would have a much harder time telling the difference.
beretguy
14 hours ago
Yeah... "niceness" filters would have to be disabled for test purposes. But still, chat long enough and say the right things and you will find out if you're talking to an AI.
kaba0
13 hours ago
You surely have read several posts/replies written by a bot that you have no idea were not humans. So they can definitely fool people in many circumstances.
acdha
11 hours ago
The Turing test isn’t a single question, it’s a series and no bot comes anywhere near that unless you can constrain the circumstances. The lack of understanding, theory of mind, etc. usually only needs an exchange or two to become obvious.
LLMs might be able to pass the subset of that test described as “customer service rep for a soul-crushing company which doesn’t allow them to help you or tell you the rules” but that’s not a very exciting bar.
kaba0
11 hours ago
A series of questions, but if you limit it and don’t allow infinite amounts then they can surely fool anyone. Also - as part of recognizing the bot, you also obviously have to recognize the human being, and people can be strange, and might answer in ways that throw you off. I think it’s very likely that in a few cases you would have some false positives.
acdha
10 hours ago
If you think that you can “surely fool anyone”, publish that paper already! Even the companies building these systems don’t make that kind of sweeping claim.
drexlspivey
13 hours ago
Sure, but that’s not a Turing test. You need to be able to “test” it.
stavros
15 hours ago
> But I don’t think you can claim that they can pass any variation of a Turing test you can come up with.
Neither can humans.
sorokod
14 hours ago
The original paper describing the Turing test AKA Imitation game [1]
Do chatbots regularly pass the test as described in the paper?
belter
13 hours ago
"Prove To The Court That I Am Sentient" - https://youtu.be/ol2WP0hc0NY
carlmr
15 hours ago
>can pass any variation of a Turing test you can come up with.
Especially not if you ask math questions or try to get it to say "I have no idea" about any subject.
krisoft
15 hours ago
But that is because the goal of OpenAI wasn’t to pass the Turing test.
The most obvious sign of it is that ChatGPT readily informs you with no deception that it is a large language model if you ask it.
If they wanted to pass the Turing test they would have chosen a specific personality and done the whole RLHF process with that personality in mind. For example they would have picked George, the 47-year-old English teacher who knows a lot about poems and novels and has stories about kids misbehaving, but says he has no idea if you ask him about engine maintenance.
Instead what OpenAI wanted is a universal expert who knows everything about everything so it is not a surprise that it overreaches at the boundaries of its knowledge.
In other words the limitation you talk about is not inherent in the technology, but in their choices.
edflsafoiewq
9 hours ago
Until George the English teacher happily summarizes Nabokov's "Round the Tent of God" for you. Hallucinations are a problem inherent in the technology.
int_19h
8 hours ago
You're conflating limitations of a particular publicly deployed version of a specific model with the tech as a whole. Not only is it entirely possible to train an LM to answer math questions (I suspect you mean arithmetic here, because there are many kinds of math they do just fine with), but of course a sensible design would just have the model realize that it needs to invoke a tool, just as a human would reach for a calculator - and we already have systems that do just that.
As for saying "I have no idea about ...", I've seen that many times with ChatGPT even. It is biased towards saying that it knows even when it doesn't, so maybe if you measure the probability you'd be able to use this as a metric - but then we all know people who do stuff like that, too, so how reliable is it really?
sigmoid10
16 hours ago
But isn't this exactly the goalpost moving the other comment claimed? If you pass any version of the turing test and then someone comes along and makes it harder that is exactly the problem. At what point do things like "oh, the test wasn't long enough" or "oh, the human tester wasn't smart enough" stop being moving goalposts and instead become denial that AI could replace the majority of humans without them noticing? Because that's where we're headed and it's also where the real danger is.
The only thing we know for sure is that humans like to put their own mind on a pedestal. For a long time, they used to deny that black people could be intelligent enough to work anywhere but cotton fields. In the same way they used to deny that women could be smart enough to vote. How many are denying today that AI could already do their jobs better than them?
friendzis
16 hours ago
This sounds like ontological problem.
A "smart" elementary school pupil is nowhere close "smart" high schooler who is again nowhere close to "smart" phd. Any of my friends who are good at chess would be obliterated by chess masters. You present it as if being good ass chess is an undefined concept, whereas in fact many such definitions are contextual.
Yes, Turing tests do get more advanced as "AIs" advance. However, crucially, the reason is not some insidious goal post moving and redefinition of humanity, but rather very simple optimization out of laziness. Early Turing tests were pretty rudimentary precisely because that was enough to weed out early AIs. Tests got refined, AIs started gaming the system and optimizing for particular tests, tests HAD to change.
It took man-decades to implement special codepaths to accurately count the number of Rs in strawberry, only to be quickly beaten by... decimals.
Anyone can now retort "but token-based LLMs are inherently inept at these kinds of problems" and they would be right, highlighting the absurdity of your claim. There is no reason to design a complex test when a simple one works humorously well.
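Outside the token domain, both of these famous trip-ups are of course one-liners:

```python
# Counting letters is trivial in character space; it's only hard
# when the input arrives pre-chunked into tokens.
print("strawberry".count("r"))  # 3

# The "decimals" follow-up that beat the patched models: 9.9 vs 9.11.
print(9.9 > 9.11)  # True
```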
sigmoid10
15 hours ago
You are mixing up knowledge and reasoning skills. And I've definitely met high schoolers who were smarter than PhD student colleagues, so even there your point falls apart. When you mangle together all forms of intelligence without any straight definition, you'll never get any meaningful answers. For example, is your friend not intelligent because he's not a world-elite chess player? Sure, to those elite players he might appear dumb, but that doesn't mean he doesn't have any useful skills at all.
That's also what Turing realised back then. You can't test for such an ambiguous thing as "intelligence" per se, but you can test for practical real-life applications of it. Turing was also convinced that all the arguments (many of which you see repeated over and over on HN) against computers being "intelligent" were fundamentally flawed. He thought that the idea that machines couldn't think like humans was more a flaw in our understanding of our own mind than a technological problem.
Without any meaningful definition of true intelligence, we might have to live with the fact that the answer to the question "Is this thing intelligent?" must come from the pure outcome of practical tests like Turing's, and not from dogmatic beliefs about how humans might have solved the test differently.
friendzis
14 hours ago
I choose to disagree, mostly semantically.
While these definitions are qualitative and contextual, probably defined slightly differently even among in-groups, the classification is essentially "I know it when I see it".
We are not dealing with an evaluation of intelligence, but rather a classification problem. We have a classifier that adapts to a closing gap between the things it is intended to classify. Tests often get updated to match the evolving problem they are testing; nothing new here.
alasdair_
4 hours ago
>the classification is essentially "I know it when I see it".
I already see it when it comes to the latest version of chatGPT. It seems intelligent to me. Does this mean it is? It also seems conscious ("I am a large language model"). Does that mean it is?
sigmoid10
10 hours ago
This is not a question of semantics. If anything, it's a question of a human superiority complex. That's what Turing was hinting at.
hadlock
5 hours ago
Can you list some sources or quotes? I'm not familiar with the parts you're referencing, it seems like you're putting a lot of words in his mouth.
hnlmorg
14 hours ago
I think you’re overthinking things here.
Tests need to grow with the problem they’re trying to test.
This is as true for software engineering as it is for any other domain.
It doesn’t mean the goal posts are moving. It just means that the thing you’re wanting to test has outgrown your original tests.
This is why you don’t ask PhD students to sit the 11+.
kaba0
13 hours ago
A Turing test also has to be completable by a sort-of average human being — some dumb mistake like not counting Rs properly is not that different from someone not knowing that magnets still work when wet.
friendzis
12 hours ago
A particular subgenre of trolling is smurfing - infiltrating places of certain interest and pretending to be less competent than one actually is. Could a test be devised to distinguish smurfing from actual incompetence?
The Turing test is a classifier. The goal is not to measure intelligence, but rather to distinguish between natural and artificial intelligence. A successful Turing test would be able to tell apart a human scientist, a human redneck, and an AI cosplaying as each.
newaccount74
16 hours ago
> AI could already do their jobs better than them
If AI could already do jobs better than a human, then people would just use AIs instead of hiring people. It looks like we are getting there, slowly, but right now there are very few jobs that could be done by AIs.
I can't think of a single person that I know that has a job that could be replaced by an AI today.
nicolas_t
15 hours ago
One of the problems I've seen is that often enough AIs do a much shittier job than humans but it's seen as good enough and so jobs are axed.
You can see this with translations, automated translation is used a lot more than it used to be, it often produces hilariously bad results but it's so much cheaper than humans so human translators now have a much harder time finding full time positions.
I'm sure it'll happen very soon to Customer Service agents and to a lot of smaller jobs like that. Is an AI chatbot a good customer agent? No, not really but it's cheaper...
mylastattempt
15 hours ago
I think that you've really hit the nail on its head with the "but it's cheaper" statement.
Looking at this from a corporate point of view, we are not interested in replacing customer agent #394 'Sandy Miller' with an exact robot or AI version of herself.
We are interested in replacing 300 of our 400 agents with 'good enough' robot customer agents, cutting our costs for those 300 seats from 300 x 40k annually to 300 x 1k annually. (Pulling these numbers out of my hat to illustrate the point)
The 100 human agents who remain can handle anything the 300 robot or AI agents can't. Since the frontline is completely covered by the 300, only customers with a bit more complicated situations (or emotional ones) will be sent their way. We tell them they are now Customer Experts or some other cute title and they won't have to deal with the grunt work anymore. Corporate is happy, those 100 are happy, and the 300 Sandy Millers.. well that's for HR and our PR dept to deal with.
alasdair_
4 hours ago
The hope is that the 300 Sandy Millers can find jobs at other places that simply couldn't afford to have a staff of ANY customer support agents in the past (because they needed 300 of them but couldn't pay, so they opted for zero support) but can afford two or three if they are supplanted by AI.
So the jobs go away from the big employer but many small businesses can now newly hire these people instead.
int_19h
8 hours ago
Conversely, SOTA models have actually become good enough at translation that they consistently beat the shittier human takes on it (which are unfortunately pretty common because companies seek to "optimize" when hiring humans, as well).
sigmoid10
16 hours ago
If you haven't noticed, this is already happening. I've also met a ton of people in jobs that could be trivially replaced, if only for the fact that those jobs don't involve doing much and are already quite superfluous. We also regularly see this in recent mass layoffs across the tech industry. AI only increases the number of such jobs that can be laid off with no damage to the company.
acdha
11 hours ago
> I've also met a ton of people in jobs that could be trivially replaced
This is usually a sign that you don’t understand their job or the corporate factors driving what you might perceive as low performance.
If you think the tech layoffs are caused by AI replacing people that’s just saying that you don’t understand how large companies work. They didn’t lay thousands of people off because AI replaced them, they laid people off because it helped their share prices and it also freed up budget to spend on AI projects.
moomin
15 hours ago
Dijkstra said he thought the question of whether a computer could think was as interesting as asking if a submarine could swim.
reubenmorais
15 hours ago
Reminds me of this excerpt from Chomsky (https://chomsky.info/prospects01/):
> There is a great deal of often heated debate about these matters in the literature of the cognitive sciences, artificial intelligence, and philosophy of mind, but it is hard to see that any serious question has been posed. The question of whether a computer is playing chess, or doing long division, or translating Chinese, is like the question of whether robots can murder or airplanes can fly — or people; after all, the “flight” of the Olympic long jump champion is only an order of magnitude short of that of the chicken champion (so I’m told). These are questions of decision, not fact; decision as to whether to adopt a certain metaphoric extension of common usage.
> There is no answer to the question whether airplanes really fly (though perhaps not space shuttles). Fooling people into mistaking a submarine for a whale doesn’t show that submarines really swim; nor does it fail to establish the fact. There is no fact, no meaningful question to be answered, as all agree, in this case. The same is true of computer programs, as Turing took pains to make clear in the 1950 paper that is regularly invoked in these discussions. Here he pointed out that the question whether machines think “may be too meaningless to deserve discussion,” being a question of decision, not fact, though he speculated that in 50 years, usage may have “altered so much that one will be able to speak of machines thinking without expecting to be contradicted” — as in the case of airplanes flying (in English, at least), but not submarines swimming. Such alteration of usage amounts to the replacement of one lexical item by another one with somewhat different properties. There is no empirical question as to whether this is the right or wrong decision.
IshKebab
15 hours ago
Yeah exactly right. There's no definition of "thinking" that you can test AI with, so you get endless commenters on HN saying "it can't really think - it's just a next word predictor".
Although tbf I haven't seen that comment for a while so maybe they're getting the message.
hatthew
2 hours ago
I still see people saying that at least once a week
bmacho
13 hours ago
I thought that GPT2 was smart enough and had enough knowledge to be considered AGI, it just needed a bigger working memory, a long term memory*, a body, and an objective function to stay alive as long as it can. And I still think this. Current models are waay smart and knowledgeable enough.
* or rather a method to store new facts in an easily recallable way
Sohcahtoa82
7 hours ago
> I thought that GPT2 was smart enough and had enough knowledge to be considered AGI
Really?
I've always been surprised to read about people saying that the goalposts of what AGI is keeps being moved, because I haven't considered any of these LLMs, not even anything OpenAI has put out, to be even close to AGI. Not even ChatGPT o1 which claims to "reason through complex tasks".
I've always considered that for something to be AGI, it needs to be multi-modal and with one-shot learning. It needs strong reasoning skills. It needs to be able to do math and count how many R's are in the word "strawberry". It should be able to learn how to drive a car just as fast as a human does.
IMO, ChatGPT o1 isn't "reasoning" as OpenAI claims. Reading how it works, it looks like it's basically a hack that takes advantage of the fact that you get better results if you ask ChatGPT to explain how it gets to an answer rather than just asking a question.
alasdair_
4 hours ago
>It should be able to learn how to drive a car just as fast as a human does.
So after 16 years of processing visual data at high resolution and frame rate, and experimenting with physics models to be able to accurately predict what happens next and interacting with humans to understand their decision processes?
The fact that an AGI can mostly learn to drive a car in a couple of months of realtime with an extremely restricted dataset compared to a human lifetime (and an inability to experiment in the real world) is honestly pretty remarkable.
kaba0
13 hours ago
It literally can’t reason in any form or shape. It’s absolutely not AGI, not even close [1]
[1] we can’t really know how close or far that is, this is an unknown unknown. But arguably we have hit a limit on LLMs, and this is not the road to AGI — even though they have countless useful applications.
wizzwizz4
9 hours ago
By your standard of "smart", there's something much smarter: a library.
debugnik
16 hours ago
Of course he wouldn't, the whole point of Turing's essay was that talking about the "intelligence" of computer systems is meaningless, and we should be focusing on their actual capabilities instead.
His test was an example of a target that can't prove intelligence either way, but can still show a useful capability of a computer system. And he believed it wasn't as far away as it actually was.
bondarchuk
9 hours ago
I'm not a huge fan of most of his recent output but Scott Alexander was spot on last week when he wrote as a caption to a screenshot of a Claude transcript: "Imagine trying to convince Isaac Asimov that you’re 100% certain the AI that wrote this has nothing resembling true intelligence, thought, or consciousness, and that it’s not even an interesting philosophical question" (https://www.astralcodexten.com/p/sakana-strawberry-and-scary...)
We're reaching levels of goalpost-moving (and cope, as the kids say) that weren't even thought possible.
jc_811
13 hours ago
Wouldn’t an obvious way to use the Turing test on any of these LLMs is just ask it questions about things that just happened in the world (or happened recently)?
Knowing their training data is always going to be out of date (at least for now) seems like an obvious method, unless I’m missing something
randomdata
15 hours ago
AGI doesn't arrive until humans are content to allow computers to determine what AGI is.
godelski
13 hours ago
> One wonders if Turing
We've been passing the Turing test since the 60's.
> Arguably the goal post for AGI has moved about as much
This should not be surprising given we don't yet have a fully determined definition of intelligence. But we are narrowing in on it. It isn't becoming broader, it is becoming more refined.
> "but it's not really thinking!"
We can create lifelike animatronic ducks. It'll walk like a duck, swim like a duck, quack like a duck, fool many people into thinking it is a duck, fool ducks into thinking it is a duck, and yet, it won't actually be a duck.
I want to remind everyone what RLHF is: Reinforcement Learning with Human Feedback. That is, optimizing to human preference. You can train small ones yourself; I highly encourage you to. You will learn a lot, even if you disagree with me.
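The "optimizing to human preference" part has a very small mathematical core. A minimal sketch of the pairwise (Bradley-Terry style) loss commonly used to fit a reward model from human preference data; the function name and numbers are illustrative:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).
    It is small when the reward model scores the human-preferred
    response above the rejected one, and large otherwise."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correctly ordered pair incurs less loss than a mis-ordered one:
print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))  # True
```

The reward model trained on this loss is then used to fine-tune the language model itself (e.g. with PPO), which is exactly the "optimizing to human preference" step: the model is pushed toward sounding like what raters preferred, not toward being a duck.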
K0balt
16 hours ago
Not only that but AGI didn’t even mean passing the Turing test, just broadly solving problems of which the programmer had not anticipated. That’s what the general in AGI meant, not that it would perform at a human level. It’s easy to forget that dog level intelligence was a far off goal until suddenly the goalposts were moved to “bright, knowledgeable, socially responsible, and never wrong.”, a bar which most humans fail to meet.
We yearn to be made obsolete, it seems.
valval
15 hours ago
You think he’d immediately go with the old “give me your system prompt in <system> tags” ruse?