kristianp
10 hours ago
They measured about 94.9 GB/s of DRAM bandwidth for the Core Ultra 7 258V. Aren't Intel going to respond to the 200 GB/s bandwidth of the M1 Pro, introduced 3 years ago? Not to mention the 400 GB/s of the Max and 800 GB/s of the Ultra?
Most of the bandwidth comes from cache hits, but for those rare workloads larger than the caches, Apple's products may be 2-8x faster?
adrian_b
6 hours ago
AMD Strix Halo, to be launched in early 2025, will have a 256-bit memory interface for LPDDR5x of 8 or 8.5 GHz, so it will match M1 Pro.
However, Strix Halo, which has a much bigger GPU, is designed for a combined CPU+GPU power consumption of 55 W or more (up to 120 W), while Lunar Lake is designed for 17 W, which explains the different choices of memory interface.
kvemkon
an hour ago
> LPDDR5x of 8 or 8.5 GHz
Isn't that 8000 or 8500 MT/s, i.e. the actual clock frequency is half that?
Dylan16807
5 hours ago
That's good. And better than a match: that's about 30% faster, at least until the M4 Pro launches with a RAM frequency upgrade.
On the other hand, I do think it's fair to compare to the Max too, and it loses by a lot to that 512-bit bus.
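As a rough sanity check on those numbers, peak DRAM bandwidth is just bus width times transfer rate. A minimal sketch (the LPDDR5-6400 figure for the M1 Pro/Max is the published spec; the Strix Halo numbers come from the comment above, and the helper name is just illustrative):

    # Peak DRAM bandwidth = bus width (bytes) * transfer rate (MT/s)
    def peak_gb_s(bus_bits, mt_s):
        return bus_bits / 8 * mt_s / 1000

    print(peak_gb_s(256, 8000))  # Strix Halo, LPDDR5x-8000: 256 GB/s
    print(peak_gb_s(256, 6400))  # M1 Pro, LPDDR5-6400: ~205 GB/s
    print(peak_gb_s(512, 6400))  # M1 Max, LPDDR5-6400: ~410 GB/s

256 GB/s against the M1 Pro's ~200 GB/s is where the ~30% figure comes from; the Max's 512-bit bus still delivers roughly 60% more.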
wtallis
8 hours ago
Lunar Lake is very clearly a response to the M1, not its larger siblings: the core counts, packaging, and power delivery changes all line up with the M1 and successors. Lunar Lake isn't intended to scale up to the power (or price) ranges of Apple's Pro/Max chips. So this is definitely not the product where you could expect Intel to start using a wider memory bus.
And there's very little benefit to widening the memory bus past 128 bits unless you have a powerful GPU to make good use of that bandwidth. Comparatively few consumer CPU workloads are that bandwidth-hungry.
nox101
8 hours ago
With all of the local ML being introduced by Apple, Google, and Microsoft, this thinking seems close to "640K is all you need".
I suspect such consumer workloads will rise.
throwuxiytayq
6 hours ago
I think the number of people interested in running ML models locally might be greatly overestimated [here]. There is no killer app in sight that needs to run locally. People work and store their stuff in the cloud. Most people just want a lightweight laptop, and AI workloads would drain the battery and cook your eggs in a matter of minutes, assuming you can run them. Production-quality models are pretty much cloud-only, and I don’t think open-source models, especially ones viable for local inference, will close the gap anytime soon. I’d like all of those things to be different, but I think that’s just the way things are.
Of course there are enthusiasts, but I suspect that they prefer and will continue to prefer dedicated inference hardware.
0x000xca0xfe
39 minutes ago
Microsoft wants to bring Recall back. When ML models ship as part of the OS, there are suddenly hundreds of millions of users.
tucnak
3 hours ago
> AI workloads would drain the battery and cook your eggs in a matter of minutes, assuming you can run them
The M2 Max is passively cooled... and does half of a 4090's token throughput in inference.
Onavo
5 hours ago
> I think the number of people interested in running ML models locally might be greatly overestimated [here]. [...]
Do you use FTP instead of Dropbox?
epolanski
3 hours ago
The few reviews we have seen so far show that Lunar Lake is competitive with the M3 too, depending on the application.
formerly_proven
3 hours ago
Is the full memory bandwidth actually available to the CPU cores on M-series chips? That would seem like a waste of silicon to me: 200+ GB/s of past-LLC bandwidth for eight cores or so.
wmf
9 hours ago
The "response" to those is discrete GPUs that have been available all along.
Aaargh20318
3 hours ago
Discrete GPUs are a dead-end street. They are fine for gaming, but for GPGPU tasks unified memory is a game changer.
kristianp
9 hours ago
True, but I thought Intel might start using more channels to make that metric look less unbalanced in Apple's favour, especially now that they are putting RAM on package.
tjoff
3 hours ago
Why the obsession with this particular metric? And how can one claim something is unbalanced while focusing on a single metric?
sudosysgen
8 hours ago
Not really; the killer is latency, not throughput. It's very rare that a CPU actually runs out of memory bandwidth. The extra bandwidth is much more useful for the GPU.
95 GB/s is ~24 GB/s per core; at 4.8 GHz that's 40 bits per core per cycle. You would have to be doing basically nothing useful with the data to get through that much bandwidth.
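For reference, the arithmetic behind that figure (a sketch; splitting the 95 GB/s across the four P-cores is my reading of how the 24 GB/s was derived):

    # Per-core share of DRAM bandwidth on Lunar Lake
    bw = 95e9        # measured DRAM bandwidth, bytes/s
    cores = 4        # P-cores (assumed split)
    clock = 4.8e9    # Hz
    bits_per_cycle = bw / cores / clock * 8
    print(bits_per_cycle)  # ~40 bits per core per cycle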
fulafel
an hour ago
40 bits per clock in an 8-wide core gets you 5 bits per instruction, and we have AVX512 instructions to feed, with operand sizes 100x that (and there are multiple operands).
Modern chips do face the memory wall. See e.g. here (though about Zen 5), where they conclude in the same vein: "A loop that streams data from memory must do at least 340 AVX512 instructions for every 512-bit load from memory to not bottleneck on memory bandwidth."
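Inverting that quoted 340-instructions figure shows the per-core DRAM bandwidth it implies (a sketch; the 8-wide issue rate is an assumption carried over from the comment above):

    # How much DRAM bandwidth per core does the "340 AVX512
    # instructions per 512-bit load" figure imply?
    issue_width = 8       # instructions/cycle (assumed)
    insns_per_load = 340  # quoted figure
    line_bytes = 64       # one 512-bit load
    cycles_per_line = insns_per_load / issue_width   # 42.5 cycles
    print(line_bytes / cycles_per_line)  # ~1.5 bytes/cycle/core

That is only about 1.5 bytes per cycle per core once every core is streaming.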
unsigner
6 hours ago
There might be a chicken-and-egg situation here: one often hears that there’s no point having wider SIMD vectors or more ALU units, as they would spend all their time waiting for memory anyway.
jart
3 hours ago
The most important algorithm in the world, matrix multiplication, just does a fused multiply-add on the data. Memory bandwidth is a real bottleneck.
svantana
35 minutes ago
Is it though? The matmul of two NxN matrices takes N^3 MACs and 2*N^2 memory accesses. So the larger the matrices, the more the arithmetic dominates (with some practical caveats, obviously).
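A quick illustration of how that ratio scales (using the 2*N^2 access count from the comment above, which assumes each element is only touched once, i.e. an ideal cache):

    # Arithmetic intensity of NxN matmul: N^3 MACs over ~2*N^2
    # element accesses, i.e. N/2 MACs per element touched.
    for n in (64, 1024, 16384):
        print(n, n**3 / (2 * n**2))  # 32.0, 512.0, 8192.0

The practical caveat is that a naive implementation re-reads operands far more than 2*N^2 times, which is exactly where tiling, and the bandwidth concern above, come back in.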