brianolson
4 hours ago
> Why aren’t these AI companies submitting to the TOP500 to show off their computing prowess?
My knowledge is 10+ years out of date, but once upon a time, if they'd chosen to, Google could have had _several_ entries in the top 10 of the TOP500 list.
It's just poker, they didn't want to tip their hand
ziofill
3 hours ago
Also, would those 550k Blackwell GPUs have good FP64 performance? How would one even compare them?
davidmr
2 hours ago
I’ve worked on several systems that had enough flop/s to make it in the top 5-10, but for which we never submitted benchmarks. Sometimes their backend network layout technically would make them several smaller clusters for an HPL run, sometimes it’s because the cluster is too heterogeneous to get a good benchmark result, and sometimes it’s because the employer wants to keep a low profile.
Most of the time, it's just that it's a hassle. It takes a while to prep and tune a big hero run for benchmarking, and if you spent a billion dollars on a cluster, it's earning you a lot more than that. Taking it down for a day or two stops the money printers.
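For context on what a "hero run" involves: HPL solves one dense N×N FP64 linear system, and the rate it reports is its nominal operation count, 2/3·N³ + 3/2·N², divided by wall-clock time. Much of the tuning is choosing N (and the block size NB) so the matrix fills most of the machine's memory. The sketch below is illustrative only; it assumes the common rule of thumb of filling about 80% of aggregate memory and a block size of 256, and the helper names are hypothetical.

```python
import math


def hpl_flop_count(n: int) -> float:
    """Nominal operation count HPL credits for an n x n dense FP64 solve (LU factorization plus solve)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Reported rate in GFLOP/s for a run of problem size n that took `seconds` of wall-clock time."""
    return hpl_flop_count(n) / seconds / 1e9


def largest_n(total_memory_bytes: float, fill_fraction: float = 0.8, nb: int = 256) -> int:
    """Hypothetical helper: largest N whose N x N FP64 matrix (8 bytes per entry)
    fits in `fill_fraction` of aggregate memory, rounded down to a multiple of nb."""
    n = math.isqrt(int(fill_fraction * total_memory_bytes / 8))
    return n - n % nb


if __name__ == "__main__":
    # Hypothetical machine: 1 TiB of aggregate memory, run took 2 hours.
    n = largest_n(2**40)
    print(n, f"{hpl_gflops(n, 7200.0):.1f} GFLOP/s")
```

Because the work grows as N³ while memory grows as N², a bigger N means a longer run, which is part of why a full-machine benchmark ties the cluster up for so long.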
ls612
an hour ago
Why would the scientific computing people want to tip their hand? It's an open secret that the main point of these mammoth FP64 compute machines is to simulate nuclear weapon detonations to comply with the CTBT. You'd think that crowd would really not be fans of broadcasting their capabilities.
iberator
4 hours ago
Cloud computing is not a supercomputer. Different architecture, bandwidth, interconnectivity, and latencies.
dgacmu
4 hours ago
That's not nearly as true when you look at AI training clusters. They're basically supercomputers but without an FP64 focus.
(These are the systems to which GP was referring at Google.)
cynicalkane
3 hours ago
Even before AI training clusters became important, Google had an outstanding custom fabric (there are papers about it), together with the ability to tune NICs for their own use cases, and "their own use cases" meant nearly everything engineered within Google. Ethernet hardware has had low kernel latency and DMA for a long time; it's the rest of the stack that hurts. But as far back as the early 2010s (if not earlier; that's beyond my knowledge horizon), you could just make it not hurt, if you had the software engineers to do it.
jeffbee
3 hours ago
I thought TPUs couldn't reasonably run LINPACK at all because TPUs do not acknowledge that FP64 exists.
I know Google wants to compare their stuff to El Capitan or whatever but the comparison does not seem valid to me.
wmf
3 hours ago
Historically there have been a bunch of clusters on the TOP500 that weren't used for HPC. The tell is that they used Ethernet (this was before RoCE). It's less efficient, but you can still get an OK Linpack score.
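One way to put a number on "less efficient but OK" is HPL efficiency: the measured Rmax divided by the theoretical FP64 peak Rpeak, where Rpeak is roughly the device count times each device's rated FP64 throughput. A minimal sketch follows; the device counts, per-device rates, and Rmax values in the usage lines are made-up placeholders, not figures for any real system.

```python
def rpeak_tflops(devices: int, fp64_tflops_per_device: float) -> float:
    """Theoretical FP64 peak: every device running at its rated FP64 throughput."""
    return devices * fp64_tflops_per_device


def hpl_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak actually achieved in the HPL run."""
    if rpeak_tflops <= 0:
        raise ValueError("Rpeak must be positive")
    return rmax_tflops / rpeak_tflops


if __name__ == "__main__":
    # Hypothetical placeholder numbers: identical hardware, two different interconnects.
    peak = rpeak_tflops(devices=1000, fp64_tflops_per_device=10.0)
    for fabric, rmax in [("low-latency fabric", 7500.0), ("commodity Ethernet", 5500.0)]:
        print(f"{fabric}: {hpl_efficiency(rmax, peak):.0%} of Rpeak")
```

Rpeak depends only on the compute hardware, so a slower or higher-latency network shows up as a lower Rmax and lower efficiency rather than a lower peak.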