amluto
6 hours ago
I've contemplated this a bit, and I think I have a bit of an unconventional take:
First, this is really impressive.
Second, with that out of the way, these models are not playing the same game as the human contestants, in at least two major regards. First, and quite obviously, they have massive amounts of compute power, which is kind of like giving a human team a week instead of five hours. Second, the models that are competing have absolutely massive memorization capacity, whereas the teams are allowed to bring a 25-page PDF with them and need to manually transcribe anything from that PDF that they actually want to use in a submission.
I think that, if you gave me the ability to search the pre-contest Internet and a week to prepare my submissions, I would be kind of embarrassed if I didn't get gold, and I'd find the contest to be rather less interesting than I would find the real thing.
asboans
5 hours ago
Firstly, automobiles are really impressive.
Second, with that out of the way, these cars are not playing the same game as horses… first, and quite obviously, they have massive amounts of horsepower, which is kind of like giving a team of horses… many more horses. But also, cars have an absolutely massive fuel capacity. Petrol is such an efficient store of chemical energy compared to hay, and cars can store gallons of it.
I think if you gave my horse the power of 300 horses and fed it pure gasoline, I would be kind of embarrassed if it wasn’t able to win a horse race.
furyofantares
3 hours ago
Yeah man, and it would be wild to publish an article titled "Ford Mustang and Honda Civic win gold in the 100 meter dash at the Olympics" if what happened was the companies drove their cars 100 meters and tweeted that they did it faster than the Olympians had run.
Actually that's too generous, because the humans are given a time limit in ICPC, and there's no clear mapping to say how the LLM's compute should be limited to make a comparison.
It IS an interesting result to see how models can do on these tests - and it's also a garbage headline.
krisoft
2 hours ago
> what happened was the companies drove their cars 100 meters and tweeted that they did it faster than the Olympians had run
That would indeed have been an interesting race around the time cars were invented. Today it would be silly, since everyone knows what cars are capable of, but back then one can imagine a lot more skepticism.
Just as there is a ton of skepticism today of what LLMs can achieve. A competition like this clearly demonstrates where the tech is, and what is possible.
> there's no clear mapping to say how the LLM's compute should be limited to make a comparison
There is a very clear mapping, of course: you give the computer the same wall-clock time you gave the humans.
Because what it shows is that the computer can do the same thing a human can under the same conditions. In your analogy, they are showing that there is such a thing as a car and that it can travel 100 meters.
Once it is a foregone conclusion that an LLM can solve the ICPC problems, and that point has been sufficiently driven home to everyone who cares, we can ask further questions, like “how much faster can it solve the problems compared to the best humans?” or “how much energy does it consume while solving them?” It sounds like you have gone beyond the first question and are already asking these follow-ups.
furyofantares
2 hours ago
You're right, they did limit to 5 hours and, I think, 3 models, which seems analogous at least.
Not enough to say they "won gold". Just say what actually happened! The tweets themselves do, but then we have this clickbait headline here on HN somehow that says they "won gold at ICPC".
in-silico
2 hours ago
Cars going faster than humans or horses isn't very interesting these days, but it was 100+ years ago when cars were first coming on the scene.
We are at that point now with AI, so a more fitting headline analogy would be "In a world first, automobile finishes with gold-winning time in horse race".
Headlines like those were a sign that cars would eventually replace horses in most use-cases, so the fact that we could be in the same place now with AI and humans is a big deal.
gwbrooks
21 minutes ago
It was more than interesting 100+ years ago -- it was the subject of wildly inconsistent, often fear-based (or incumbent-industry-based) regulation.
A vetoed 1896 Pennsylvania law would have required drivers who encountered livestock to "disassemble the automobile" and "conceal the various components out of sight, behind nearby bushes until [the] equestrian or livestock is sufficiently pacified". The UK's Locomotive Act of 1865 (the "Red Flag Act") required early motorized vehicles to be preceded by a person on foot waving a red flag or carrying a red lantern and blowing a horn.
It might not quite look like that today, but wild-eyed, fear-based regulation as AI use grows is a real possibility. And at least some of it will likely seem just as silly in hindsight.
LPisGood
2 hours ago
I think your analogy is interesting, but it falls apart because “moving fast” is not something we consider uniquely human, while “solving hard abstract problems” is.
furyofantares
2 hours ago
Not my analogy, parent is the one who brought up automobiles. Maybe that's who you meant to reply to.
I'm talking about the headline saying they "won gold" at a competition they didn't, and couldn't, compete in.
Swizec
4 hours ago
> Firstly, automobiles are really impressive. Second, with that out the way, these cars are not playing the same game as horses
Yes. That’s why cars don’t compete in equestrian events and horses don’t go to F1 races.
This is non-controversial, surely? You want different events for humans, humans + computers, and just computers.
Notice that self driving cars have separate race events from both horses and human-driven cars.
in-silico
2 hours ago
The point is that up until now, humans were the best at these competitions, just like horses were the best at racing up until cars came around.
The other commenter is pointing out how ridiculous it would be for someone to downplay the performance of cars because they did it differently from horses. It doesn't matter that they used different methods; the fact that the final outcome was better had world-changing ramifications.
The same applies here. Downplaying AI because it has different strengths or plays by different rules is foolish, because that doesn't matter in the real world. People will choose the option that leads to the better/faster/cheaper outcome, and that option is quickly becoming AI instead of humans - just like cars quickly became the preferred option over horses. And that is crazy to think about.
thrwaway55
25 minutes ago
I feel the main difference is that cars can't compress time the way an array of computers can. I could win this competition instantly with an infinitely parallel array of random characters typed by infinite monkeys on infinite typewriters, since one of them would be perfectly right given infinite submissions. When I made my tweet I would pick out a single monkey (never mind that I need infinite money to feed my infinite workforce), because that's clearly more impressive.
Now obviously it's more impressive than that, since they don't have infinite compute and had finite time, but the car only gets one entry in each race, unless we start getting into some anime-ass shit with divergent timelines and one of the cars (and some lesser number of horses) finishing instantly.
To your last point, we don't know that this was cheaper, since they don't disclose the cost. I would blindly guess that a Mechanical Turk setup at the same cost would outperform it, at least today.
throw310822
3 hours ago
I think you missed that the whole point of this race was:
"did we build a vehicle faster than a horse, yes/no?"
Which matters a lot when horses are the fastest means of land transport available. (We're so used to thinking of horses as a quaint and slow means of transport that maybe we don't realize that for millennia they were the fastest possible way to get from one place to another.)
Swizec
2 hours ago
> "did we build a vehicle faster than a horse, yes/no?"
Yeah, fair. There's also that famous human-vs-horse race that happens every few years. So far humans keep winning (because it's long distance).
dwohnitmok
31 minutes ago
If you're talking about the Man versus Horse Marathon (https://en.wikipedia.org/wiki/Man_versus_Horse_Marathon) it's the other way around. Overwhelmingly the horses win. Only occasionally does the human.
Swizec
17 minutes ago
I stand corrected. My memory garbled that. Thanks!
gxs
3 hours ago
Yeah, I think the only thing OP was passing judgement on is the competition aspect of it, not the actual achievement of any human or non-human participant.
That’s how I read it at least - exactly how you put it.
lbrandy
4 hours ago
I was struck by how the argument is also isomorphic to how we talked about computers and chess. We're at the stage where we are arguing that the computer isn't _really_ understanding chess, though. It's just doing huge amounts of dumb computation with huge opening books and endgame tablebases, and no real understanding, strategy, or sense of what's going on.
Even though all the criticisms were, in a sense, valid, in the end none of them amounted to a serious challenge to getting good at the task at hand.
LaffertyDev
4 hours ago
I don’t think you’ll find many race tracks that permit horses and cars to compete together.
(I did enjoy the sarcasm, though!)
melenaboija
2 hours ago
Comparing power with reasoning does not make any sense at all.
Humans have been surpassing their own strength since the invention of the lever thousands of years ago. Since then, it has been a matter of finding power sources millions of times greater, such as nuclear energy.
GoatInGrey
3 hours ago
Snark aside, I would expect a car partaking in a horse race to beat all of the horses. Not because it's a better horse, but because it's something else altogether.
Ergo, it's impressive with nuance. As the other commenter said.
bgwalter
4 hours ago
The massive amount of compute power is not the major issue. The major issue is the unlimited amount of reference material.
If a human could look up similar previous problems just as the "AI" can, it would be a huge advantage.
Syzygy tablebases in chess engines are a similar issue. They allow perfect endgame play, and there is no reason why a computer gets them and a human does not (if you compare humans against chess engines). Humans have always worked with reference material for serious work.
chpatrick
3 hours ago
Humans are allowed to look up and learn from as many previous problems as they want before the competition. The AI is also trained on many previous problems before the competition. What's the difference?
bgwalter
3 hours ago
Deleted, because the "AI" geniuses and power users pointed out that Tao does not have a point. You can get this one to -4 as well, since that seems to be the primary pleasure for "AI" one armed bandit users.
chpatrick
3 hours ago
It doesn't say anywhere that Gemini used any of those things at ICPC, or that it used more real-world time than the humans.
Also, who cares? It's a self contained non-human system that could solve an ICPC problem it hasn't seen before on its own, which hasn't been achieved before.
If there was a savant human contestant with photographic memory who could remember every previous ICPC problem verbatim and can think really fast you wouldn't say they're cheating, just that they're really smart. Same here.
If there was a man behind the curtain that was somehow making this not an AI achievement then you would have a point, but there isn't.
raspasov
2 hours ago
I think "hasn't seen before" is a bit of an overstatement. Sure, the problem is new in the literal sense that it does exist verbatim elsewhere, but arguably, any competition problem is hardly novel: they are all some permutation of problems that exist and have been solved before: pathfinding, optimization, etc. I don't think anyone is pretending to break new scientific ground in 5 hours.
paladin314159
6 hours ago
> I think that, if you gave me the ability to search the pre-contest Internet and a week to prepare my submissions, I would be kind of embarrassed if I didn't get gold, and I'd find the contest to be rather less interesting than I would find the real thing.
I don't know what your personal experience with competitive programming is, so your statement may be true for yourself, but I can confidently state that this is not true for the VAST majority of programmers and software engineers.
Much like trying to do IMO problems without tons of training/practice, the mid-to-hard problems in the ICPC are completely unapproachable to the average computer science student (who already has a better chance than the average software engineer) in the course of a week.
In the same way that LLMs have memorized tons of stuff, the top competitors capable of achieving a gold medal at the ICPC know algorithms, data structures, and how to pattern match them to problems to an extreme degree.
amluto
5 hours ago
> I can confidently state that this is not true for the VAST majority of programmers and software engineers.
That may well be true. I think it's even more true in cases where the user is not a programmer by profession. I once watched someone present their graduate-level research in a different field, explaining how they had solved a real-world problem by writing a complicated computer program full of heuristics to get it to run fast enough, and I remember thinking "hmm, I'm pretty sure that a standard algorithm from computer graphics could be adapted to directly solve your problem in O(n log n) time".
If users can get usable algorithms that approximately match the state of the art out of a chatbot (or a fancy "agent") without needing to know the magic words, then that would be amazing, regardless of whether those chatbots/agents ever become creative enough to actually advance the state of the art.
(I sometimes dream of an AI producing a piece of actual code that comes even close to state of the art for solving mixed-integer optimization problems. That's a whole field of wonderful computer science / math that is mostly usable via a couple of extraordinarily expensive closed-source offerings.)
avcxz
5 hours ago
> That's a whole field of wonderful computer science / math that is mostly usable via a couple of extraordinarily expensive closed-source offerings.
Take a look at Google OR-Tools: https://developers.google.com/optimization/
amluto
5 hours ago
OR-Tools is a whole grab-bag of tools, most of which are wrappers around various solvers, including Gurobi and CPLEX. It seems like CP-SAT is under the OR-Tools umbrella, and CP-SAT may well be state-of-the-art for the specific sets of problems that it's well-suited for.
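For anyone curious what CP-SAT looks like in practice, here's a minimal sketch assuming the ortools Python package is installed; the variables, bounds, and constraint below are purely illustrative made-up numbers, not taken from any real problem:

    # Toy CP-SAT model: maximize x + 2y subject to 2x + 3y <= 20 (made-up numbers)
    from ortools.sat.python import cp_model

    model = cp_model.CpModel()
    x = model.NewIntVar(0, 10, "x")  # integer decision variables
    y = model.NewIntVar(0, 10, "y")
    model.Add(2 * x + 3 * y <= 20)   # a linear constraint
    model.Maximize(x + 2 * y)        # objective to maximize

    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        print(solver.Value(x), solver.Value(y), solver.ObjectiveValue())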
dddgghhbbfblk
an hour ago
I think that's because the framing around this (and similar stories about eg IMO performances) is imo slightly wrong. It's not interesting that they can get a gold medal in the sense of trying to rank them against human competitors. As you say, the direct comparisons are, while not entirely meaningless, at least very hard to interpret in the best of cases. It's very much an apples to oranges situation.
Rather, the impressive thing is simply that an AI is capable of solving these problems at all. These are novel (ie not in training set) problems that are really hard and beyond the ability of most professional programmers. The "gold medal" part is informative more in the sense that it gives an indication of how many problems the AI was able to solve & how well it was able to do them.
When talking with some friends about chatgpt just a couple years ago I remember being very confident that there was no way this technology would be able to solve this kind of novel, very challenging reasoning problem, and that there was no way it would be able to solve IMO problems. It's remarkable how quickly I've been proven wrong.
roadside_picnic
3 hours ago
> whereas the teams are allowed to bring a 25-page PDF
This is where I see the biggest issue. LLMs are first and foremost text compression algorithms. They hold a compressed version of a very good chunk of human writing.
Beyond being text compression engines, LLMs are really good at interpolating text based on the generalization induced by the lossy compression.
What this result really tells us is that, given a reasonably well compressed corpus of human knowledge, the ICPC can be viewed as an interpolation task.
Rudybega
2 hours ago
If we develop a system that can:
- compress (in a relatively recoverable way) the entire domain of human knowledge
- interpolate across the entire domain of human knowledge
- draw connections or conclusions that haven't previously been stated explicitly
- verify or disprove those conclusions or connections
- update its internal model based on that (further expanding the domain it can interpolate within)
Then I think we're cooking with gasoline. I guess the question becomes whether those new conclusions or connections result in a convergent or divergent increase in the number of new conclusions and connections the model can draw (e.g. do we understand better the domains we already know or does updating the model with these new conclusions/connections allow us to expand the scope of knowledge we understand to new domains).
tdb7893
an hour ago
As someone who went to the ICPC finals around a decade ago, I agree that the limited time is really the big constraint that these machine learning models don't experience in the same way. That said, these problems are hard: the actual coding of the algorithms is pretty easy (most of the questions use one of a handful of algorithms that you've implemented a hundred times by the time you're in the finals), but recognizing which one will actually solve the problem correctly is not obvious at all. I know a lot of people who struggled in their undergrad algorithms class, and I think a lot of them, given the ICPC finals problems, would struggle even with the ability to research.
modeless
6 hours ago
It doesn't matter how many instances were running. All that matters is the wall clock time and the cost.
The fact that they don't disclose the cost is a clue that it's probably outrageous today. But costs are coming down fast. And hiring a team of these guys isn't exactly cheap either.
zeroonetwothree
6 hours ago
Human teams are limited to three people. So why doesn’t it matter how many instances they used?
kenjackson
an hour ago
This is what the argument is? 10 years ago, if you had said you could do this with every computer on the planet and every computer scientist focused on trying to create the code to do it, I would’ve given you absurd odds against it getting 12 problems right at the ICPC. 10 years ago it couldn’t even reliably parse the problem statement.
modeless
6 hours ago
Human brains and cloud instances are not remotely equivalent. What you can compare on an equivalent basis is cost.
ben_w
5 hours ago
All instances of any given model are kinda the same, for lack of a better word, "person": same knowledge, same skills, same failings.
warkdarrior
4 hours ago
I bet with human teams it'll take longer to solve a problem the more people you have on the team.
OtherShrezzing
3 hours ago
The human teams also get limited to one computer shared between 3 people. The models have access to an effectively unbounded number of computers.
My argument does feel a bit like the “Watson doesn’t need to physically push the button” objections from when that system beat the Jeopardy champions for the first time. I assume that in the near future, 5 hours on a single high-end Mac would probably be enough compute.
theragra
5 hours ago
I think your analogy is lacking. The human brain is much more efficient, so it is not right to say "giving a human team a week instead of five hours". Most likely, the whole of OpenAI's compute cannot match one brain in terms of connections, relations, and computational power.
stevenhuang
4 hours ago
As always with these comparisons you neglect to account for the eons necessary for evolution to create human brains.
GoatInGrey
3 hours ago
But as a product of evolved organisms, LLMs are also a product of evolution. They also came several hundred thousand years later.
_diyar
5 hours ago
I think your assessment is spot on. But I also think there's a bigger picture that's getting lost in the sauce, not just in your comment but in the general discourse around AI progress:
- We're currently unlocking capabilities to solve many tasks that could previously only be solved by the top 1% of experts in a field.
- Almost all of that progress is coming from large-scale deep learning. It turns out transformers with autoregression + RL are mighty generalists (though still far from AGI).
Once it becomes cheap enough that the average Joe can tinker with models of this scale, every engineering field can apply them to its own niche interests. And ultimately nobody cares whether you're playing by the same rules as humans outside of these competitions; they only care that you make them wealthy, healthy, and comfy.
m3kw9
42 minutes ago
The end game is running similar tasks at any moment, in any place.
dist-epoch
5 hours ago
If you want to play that game, let's compute how much energy was spent to grow, house and educate one team since they were born, over 20 years against how much was spent training the model.
somewholeother
4 hours ago
This is a fair analogy, but let's also consider that these human beings weren't designed with the express purpose of becoming experts in their field and performing this specific task (albeit in a generalist manner).
We are most definitely in agreement about the folly of comparing the abilities of LLMs to humans, since LLMs are to a greater extent the product of much collective human endeavour. "Living memories" would perhaps be a better description of their current state, and their resultant impact on the human psyche.
th0ma5
3 hours ago
Yes, yes; given all this, why didn’t it do better, and isn’t it embarrassing to have done it through statistical brute force and not intelligence?