aurareturn
5 months ago
It has A19 Pro. A19 Pro has matmul acceleration in its GPU, the equivalent of Nvidia's Tensor cores. This would make future Macs extremely viable for local LLMs. Currently, Macs have high memory bandwidth and high VRAM capacity but low prompt processing speeds. Give it a large context and it'll take forever before the first token is generated.
If the M5 generation gets this GPU upgrade, which I don't see why not, then the era of viable local LLM inferencing is upon us.
That's the most exciting thing from this Apple event, in my opinion.
PS. I also like the idea of the ultra-thin iPhone Air, the 2x better noise cancellation and live translation of the AirPods Pro 3, the high blood pressure detection of the new Watch, and the bold, sexy orange color of the iPhone 17 Pro. Overall, this is the best set of incremental updates Apple's ecosystem has seen in a while.
vasco
5 months ago
> bold sexy orange color
Luckily they added the blood pressure check for when you get too excited about the color orange.
formerly_proven
5 months ago
It is almost strange, since iPhones were only available in ugly drab colors for several generations. And the Pro models in particular were previously never available in a decent color.
bobmcnamara
5 months ago
BondiBlue4lyfe
astrange
5 months ago
A19 supports MTE: https://news.ycombinator.com/item?id=45186265
Which is a very powerful feature for anyone who likes security or finding bugs in their code. Or other people's code. Even if you didn't really want to find them.
rising-sky
5 months ago
MIE
mgerdts
5 months ago
If you compare the specs of the Series 10 and Series 11 watches, you'll see they both claim high blood pressure detection.
https://www.apple.com/watch/compare/?modelList=watch-series-...
In the past few weeks the oximeter feature was enabled by a firmware update on the Series 10. Measurements are done on the watch; results are only reported on a phone.
sgustard
5 months ago
Good to know! The fine print:
As of September 9, 2025, hypertension notifications are currently under FDA review and expected to be cleared this month, with availability on Apple Watch Series 9 and later and Apple Watch Ultra 2 and later. The feature is not intended for use by people under 22 years old, those who have been previously diagnosed with hypertension, or pregnant persons.
zimpenfish
5 months ago
Going to be interesting comparing the series 10 blood pressure sensing against my Hilo (formerly Aktiia) band on the other wrist. Although without calibration against a cuff, I'm not super convinced the Apple Watch will give reliable information.
SirMaster
5 months ago
Also works on the Series 9.
zumu
5 months ago
> the bold sexy orange color of the iPhone 17 Pro
The color line up reminds me of the au MEDIA SKIN phones (Japanese carrier) circa 2007. Maybe it's because I had one back in the day, but I can't help but think they took some influence.
user_7832
5 months ago
> MEDIA SKIN phones
Wow, thanks for sharing the name, these are really good! I don't know why I was surprised to realize that great designers have made fantastic products even in the past...
Some sites with images, for anyone curious: 1. https://www.dezeen.com/2007/01/17/tokujin-yoshioka-launches-... 2. https://spoon-tamago.com/best-of-2007-part-iv/
babl-yc
5 months ago
I've always been a bit confused about when to run models on the GPU vs. the Neural Engine. As best I can tell, the GPU is simpler to use as a developer, especially when shipping a cross-platform app, but an optimized Neural Engine model can run at lower power.
With the addition of matmul units to the GPU cores, this story gets even more confusing...
avianlyric
5 months ago
In reality you don't have much of a choice. Most of the APIs Apple exposes for running neural nets don't let you pick. Instead, some Apple magic in one of their frameworks decides where it's going to run your network. At least from what I've read, these frameworks will usually distribute your network over all available matmul compute, starting on the Neural Engine (assuming your specific network is compatible) and spilling onto the GPU as needed.
But there isn't a trivial way to specifically target the Neural Engine.
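The most you get, as far as I know, is a hint via Core ML's MLModelConfiguration. A minimal Swift sketch below ("modelURL" is a placeholder for your own compiled model), and even .cpuAndNeuralEngine is only a preference the framework is free to override:

    import CoreML

    // Hint, not a guarantee: Core ML still decides where each layer actually runs.
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine   // other options: .all, .cpuAndGPU, .cpuOnly

    // "modelURL" points at a compiled .mlmodelc you ship with your app (placeholder here).
    let model = try MLModel(contentsOf: modelURL, configuration: config)

Layers the ANE can't handle just fall back to the GPU or CPU, which is exactly the "spilling" behaviour described above.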
commandersaki
5 months ago
Hoping this budget MacBook rumour based on the A19/A19 Pro is real.
cj
5 months ago
Isn’t the MacBook Air already pretty cheap at $999?
sercand
5 months ago
Where did you see the matmul acceleration support? I couldn't find this detail online.
aurareturn
5 months ago
Apple calls it "Neural Accelerators". It's all over their A19 marketing.
whyenot
5 months ago
I wish they would offer the 17 pro in some lighter colors (like the new sage green for the regular 17). Not everyone wants bold, and the color selection for pro is always so limited. They don't even have white with this generation, just silver.
Nokinside
5 months ago
The first SoC including Neural Engine was the A11 Bionic, used in iPhone 8, 8 Plus and iPhone X, introduced in 2017. Since then, every Apple A-series SoC has included a Neural Engine.
aurareturn
5 months ago
The Neural Engine is its own block, and it is not what's used for local LLMs on Macs. The Neural Engine is optimized for power efficiency while running small models; it's not good for LARGE language models.
This change is different: it adds matmul acceleration to each GPU core, and the GPU is what actually gets used for LLMs.
runjake
5 months ago
The matmul stuff is part of the Neural Accelerator marketing, which is distinct from the Neural Engine you're talking about.
I don't blame you. It's confusing.
atcon
5 months ago
Viable may already be here: demo of SmolLM3-3B <https://news.ycombinator.com/item?id=44501413> on an iPhone with ASR + TTS: <https://x.com/adrgrondin/status/1965097304995889642>
Intrigued to explore with the A19/M5 and test energy efficiency.
SirMaster
5 months ago
The live translation is software. It works on the AirPods Pro 2 and the AirPods 4 with ANC.
So is the high blood pressure detection. It's not exclusive to the new watch; it also works on the Series 10 and Series 9 watches.
AdventureMouse
5 months ago
> If the M5 generation gets this GPU upgrade, which I don't see why not, then the era of viable local LLM inferencing is upon us.
I don't think local LLMs will ever be a thing except for very specific use cases.
Servers will always have way more compute power than edge nodes. As server power increases, people will expect more and more of the LLMs, and edge-node compute will stay irrelevant since its relative power will stay the same.
seanmcdirmid
5 months ago
Local LLMs would be useful for low-latency local language processing/home control, assuming they ever become fast enough that the 500ms to 1s of network latency becomes a dominant factor in having a fluid conversation with a voice assistant. Right now the pauses are unbearable for anything but one-way commands (Siri, do something! - 3 seconds later it starts doing the thing... that works, but it wouldn't work if Siri needed to ask follow-up questions). This is even more important if we consider low-latency gaming situations.
Mobile applications are also relevant. An LLM in your car could be used for local intelligence. I'm pretty sure self-driving cars use some amount of local AI already (although obviously not LLMs, and I don't really know how much of their processing is local vs. done on a server somewhere).
If models stop advancing at a fast clip, hardware will eventually become fast and cheap enough that running models locally isn't something we think of as a nonsensical luxury, in the same way that we don't think rendering graphics locally is a luxury even though remote rendering is possible.
jameshart
5 months ago
> Servers will always have way more compute power than edge nodes
This doesn't seem right to me.
You take all the memory and CPU cycles of all the clients connected to a typical online service, compared to the memory and CPU in the datacenter serving it? The vast majority of compute involved in delivering that experience is on the client. And there's probably vast amounts of untapped compute available on that client - most websites only peg the client CPU by accident because they triggered an infinite loop in an ad bidding war; imagine what they could do if they actually used that compute power on purpose.
But even doing fairly trivial stuff, a typical browser tab is using hundreds of megs of memory and an appreciable percentage of the CPU of the machine it's loaded on, for the duration of the time it's being interacted with. Meanwhile, serving that content out to the browser took milliseconds, and was done at the same time as the server was handling thousands of other requests.
Edge compute scales with the amount of users who are using your service: each of them brings along their own hardware. Server compute has to scale at your expense.
Now, LLMs bring their special needs - large models that need to be loaded into vast fast memory... there are reasons to bring the compute to the model. But it's definitely not trivially the case that there's more compute in servers than clients.
pdpi
5 months ago
As an industry, we've swung from thin clients to fat clients and back countless times. I'm sure LLMs won't be immune to that phenomenon.
Closi
5 months ago
IMO the benefit of a local LLM on a smartphone isn't necessarily compute power/speed - it's reliability without a reliance on connectivity, it can offer privacy guarantees, and assuming the silicon cost is marginal, could mean you can offer permanent LLM capabilities without needing to offer some sort of cloud subscription.
hapticmonkey
5 months ago
If the future is AI, then a future where every computation has to pass through one of a handful of multinational corporations with GPU farms... is something to be wary of. Local LLMs are a great idea for smaller tasks.
Nevermark
5 months ago
Boom! [0]
> Deepseek-r1 was loaded and ran locally on the Mac Studio
> M3 Ultra chip [...] 32-core CPU, an 80-core GPU, and the 32-core Neural Engine. [...] 512GB of unified memory, [...] memory bandwidth of 819GB/s.
> Deepseek-r1 was loaded [...] 671-billion-parameter model requiring [...] a bit less than 450 gigabytes of [unified] RAM to function.
> the Mac Studio was able to churn through queries at approximately 17 to 18 tokens per second
> it was observed as requiring 160 to 180 Watts during use
Considering getting this model. Looking into the future, a Mac Studio M5 Ultra should be something special.
[0] https://appleinsider.com/articles/25/03/18/heavily-upgraded-...
waterTanuki
5 months ago
I regularly use local LLMs at work (full stack dev) due to restrictions, and occasionally I get results comparable to GPT-5 or Opus 4.
rowanG077
5 months ago
That's assuming diminishing returns won't hit hard. If a 10x smaller local model is 95% (whatever that means) as good as the remote model, it makes sense to use local models most of the time. It remains to be seen if that will happen, but it's certainly not unthinkable imo.
PaulRobinson
5 months ago
Apple literally mentioned local LLMs in the event video where they announced this phone and others.
Apple's privacy stance is to do as much as possible on the user's device and as little as possible in cloud. They have iCloud for storage to make inter-device synch easy, but even that is painful for them. They hate cloud. This is the direction they've had for some years now. It always makes me smile that so many commentators just can't understand it and insist that they're "so far behind" on AI.
All the recent academic literature suggests that LLM capability is beginning to plateau, and we don't have ideas on what to do next (and no, we can't ask the LLMs).
As you get more capable SLMs or LLMs, and the hardware gets better and better (who _really_ wants to be long on Nvidia or Intel right now? Hmm?), people are going to find that they're "good enough" for a range of tasks, and Apple's customer demographic is going to be happy that's all happening on the device in their hand and not on a server [waves hands] "somewhere", in the cloud.
fennecfoxy
5 months ago
I think they will be, but more for hand-off. Local will be great for starting timers, adding things to the calendar, moving files around - basic, local tasks. But it also needs to be intelligent enough to know when to hand off to a server-side model.
The Android crowd has been able to run LLMs on-device since llama.cpp first came out. But the magic is in the integration with the OS. As usual there will be hype around Apple, idk, inventing the very concept of LLMs or something. But the truth is neither Apple nor Android did this; it was the wee team that wrote the "Attention Is All You Need" paper, plus the many open source/hobbyist contributors inventing creative solutions like LoRA and creating natural ecosystems for them.
That's why I find this memo so cool (and will once again repost the link): https://semianalysis.com/2023/05/04/google-we-have-no-moat-a...
brookst
5 months ago
Couldn’t you apply that same thinking to all compute? Servers will always have more, timesharing means lower cost, people will probably only ever own dumb terminals?
MPSimmons
5 months ago
The crux is how big the L is in the local LLMs. Depending on what it's used for, you can actually get really good performance on topically trained models when leveraged for their specific purpose.
alwillis
5 months ago
> ...which I don't see why not, then the era of viable local LLM inferencing is upon us.
> I don't think local LLMs will ever be a thing except for very specific use cases.
I disagree.
There's a lot of interest in local LLMs in the LLM community. My internet was down for a few days, and boy did I wish I had a local LLM on my laptop!
There's a big push for privacy; people are using LLMs for personal medical issues for example and don't want that going into the cloud.
Is it necessary to talk to a server just to check over a letter I wrote?
Obviously, with Apple's release of iOS 26 and macOS 26 and the rest of their operating systems, tens of millions of devices are getting a local LLM that 3rd-party apps can take advantage of.
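If I'm reading the new Foundation Models framework right, the 3rd-party story is roughly this simple (a rough Swift sketch, not production code; the function, prompt, and fallback behavior are just illustrative):

    import FoundationModels

    // Sketch: run a prompt against the on-device model shipped with iOS/macOS 26.
    func proofread(_ letter: String) async throws -> String {
        // Availability depends on the hardware and on Apple Intelligence being enabled.
        guard case .available = SystemLanguageModel.default.availability else {
            return letter  // fall back (or hand off to a server-side model) if unavailable
        }
        let session = LanguageModelSession()
        let response = try await session.respond(to: "Proofread this letter: \(letter)")
        return response.content
    }

The availability check is the nice part: the same code path can degrade gracefully on hardware that doesn't have the local model.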
unethical_ban
5 months ago
It's a thing right now.
I'm running Qwen 30B Coder on my Framework laptop to ask questions about Ruby vs. Python syntax, because I can, and because the internet was flaky.
At some point, more doesn't mean I need it. LLMs will certainly get "good enough", and they'll be lower latency, with no subscription and no internet required.
hotstickyballs
5 months ago
If compute power were the deciding factor in the server vs. edge discussion, then we'd never have had smartphones.
nsonha
5 months ago
A local LLM may not be good enough for answering questions (though I think that won't be true for much longer) or generating images, but today it should be good enough to infer deep links and app extension calls, or to do agentic walk-throughs... and that ushers in a new era of controlling the phone by voice command.
chisleu
5 months ago
Because of the prompt processing speed, small models like Qwen 3 Coder 30B A3B are the sweet spot for the Mac platform right now. Which means a 32 or 64GB Mac is all you need to use Cline or your favorite agent locally.
DrAwdeOccarim
5 months ago
Yes, I use LM Studio daily with Qwen 3 30b a3b. I can't believe how good it is locally.
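If you haven't tried driving it from code: LM Studio exposes an OpenAI-compatible server locally (port 1234 by default, if I remember right), so a rough Swift sketch like this works against whatever model you have loaded (the model id below is just an example):

    import Foundation

    // Sketch: POST a chat request to LM Studio's local OpenAI-compatible endpoint.
    func askLocal(_ prompt: String) async throws -> String {
        var request = URLRequest(url: URL(string: "http://localhost:1234/v1/chat/completions")!)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        let body: [String: Any] = [
            "model": "qwen3-coder-30b-a3b",  // assumed: use whatever id LM Studio shows for your model
            "messages": [["role": "user", "content": prompt]]
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)

        let (data, _) = try await URLSession.shared.data(for: request)
        // Dig choices[0].message.content out of the OpenAI-style response.
        let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
        let choices = json?["choices"] as? [[String: Any]]
        let message = choices?.first?["message"] as? [String: Any]
        return message?["content"] as? String ?? ""
    }

No API key, no cloud round trip; everything stays on the machine.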
supportengineer
5 months ago
I was reminded of this today for no particular reason:
"iPhone4 vs HTC Evo"
ottah
5 months ago
Nah, memory is still the bottleneck. Kernel performance is already pretty good, but CPU memory is still dramatically slower than GPU memory.
aagha
5 months ago
Apple is playing 3D chess while every other PC maker is learning how to play checkers.
bendoy
5 months ago
I'm most excited about the heart rate sensor in the AirPods Pro 3!
amelius
5 months ago
> It has A19 Pro.
But it's not general purpose. Broken by design.
I'll pass. Not going to support this. We need less of this crap not more.
Uehreka
5 months ago
I will believe this when I see it. It’s totally possible that those capabilities are locked behind some private API or that there’s some weedsy hardware complication not mentioned that makes them non-viable for what we want to do with them.
aurareturn
5 months ago
Already available via Metal: https://x.com/liuliu/status/1932158994698932505
llm_nerd
5 months ago
They might recommend using CoreML to leverage them, though I imagine they will also be available via Metal.
The whole point of CoreML is that your solution uses whatever hardware is available to you, including enlisting a heterogeneous set of units to conquer a large problem. Software written years ago would use the GPU matmul if deployed to a capable machine.
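For reference, the plain Metal Performance Shaders matmul path has been there for years. A minimal Swift sketch (the function and sizes are just illustrative), and it's exactly the kind of code that would transparently benefit if the new per-core neural accelerators sit behind the same APIs:

    import Metal
    import MetalPerformanceShaders

    // Sketch: C = A * B for two n x n float matrices on the GPU via MPS.
    func gpuMatmul(_ a: [Float], _ b: [Float], n: Int) -> [Float]? {
        guard let device = MTLCreateSystemDefaultDevice(),
              let queue = device.makeCommandQueue(),
              let cmd = queue.makeCommandBuffer() else { return nil }

        let rowBytes = n * MemoryLayout<Float>.stride
        let desc = MPSMatrixDescriptor(rows: n, columns: n, rowBytes: rowBytes, dataType: .float32)

        guard let bufA = device.makeBuffer(bytes: a, length: n * rowBytes, options: .storageModeShared),
              let bufB = device.makeBuffer(bytes: b, length: n * rowBytes, options: .storageModeShared),
              let bufC = device.makeBuffer(length: n * rowBytes, options: .storageModeShared) else { return nil }

        let matmul = MPSMatrixMultiplication(device: device,
                                             transposeLeft: false, transposeRight: false,
                                             resultRows: n, resultColumns: n, interiorColumns: n,
                                             alpha: 1.0, beta: 0.0)
        matmul.encode(commandBuffer: cmd,
                      leftMatrix: MPSMatrix(buffer: bufA, descriptor: desc),
                      rightMatrix: MPSMatrix(buffer: bufB, descriptor: desc),
                      resultMatrix: MPSMatrix(buffer: bufC, descriptor: desc))
        cmd.commit()
        cmd.waitUntilCompleted()

        let ptr = bufC.contents().bindMemory(to: Float.self, capacity: n * n)
        return Array(UnsafeBufferPointer(start: ptr, count: n * n))
    }

Whether the new units also get exposed more directly in Metal is the part I'd want Apple to document.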
ActorNightly
5 months ago
Good luck actually getting access to the ANE. There's a reason why PyTorch doesn't use it even though it's been around for a while.
aurareturn
5 months ago
No luck needed. This is for the GPU and is already available via Metal. https://x.com/liuliu/status/1932158994698932505
apparent
5 months ago
According to this page, [1] it reduces unwanted noise 4x as much as the original AirPods Pro and 2x as much as the AirPods Pro 2.
Though I do wonder, given the logarithmic nature of sound perception, are these numbers deceptive in terms of what the user will perceive?
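Rough numbers, assuming "Nx as much" means the residual noise power is cut by a factor of N (my assumption, not Apple's definition):

    10 * log10(2) ≈ 3 dB deeper attenuation (vs. AirPods Pro 2)
    10 * log10(4) ≈ 6 dB deeper attenuation (vs. the original AirPods Pro)

A 3 dB change is audible but nowhere near "half as loud", which is conventionally around a 10 dB drop, so the multipliers probably do oversell the perceived difference.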
WanderPanda
5 months ago
It was 4x over the original version IIRC, so it should be ~2x over the previous one.
Aperocky
5 months ago
So.. 6 hour batteries like the Apple Watch?
apparent
5 months ago
According to Apple's comparison tool, the Air has 27 hrs of video playback, compared to 30 for the 17 and 39 for the Pro.
Based on that, it doesn't sound like it's that much worse. Of course, if you're trying to maximize battery longevity by not exceeding 80% charge, that might make it not very useful for many people.
mbirth
5 months ago
But there’s this now:
baby
5 months ago
IMO it's underwhelming considering folding phones have been out for many years now and we still don't have a folding iPhone. What are the PMs doing at Apple.
ndiddy
5 months ago
I think folding phones will remain a small niche unless someone figures out how to make a foldable screen that doesn't get permanently scratched by your fingernails.
bayindirh
5 months ago
> What are the PMs doing at Apple.
Probably trying to find better screen materials, and addressing reliability issues.
I used Palm devices with resistive touch screens. It was good, but when you go glass, there's no turning back.
I would never buy a phone with a folding screen protected by plastic. I want a dependable slab, not a gimmicky gadget which can die at any moment. I got my fix of dying flex cables with Cassiopeia PDAs. Never again.
erikpukinskis
5 months ago
Folding phones are ~1.5% of the market.
Apple cancelled their mini line which was 3% of sales.
It’s not a big enough slice for them to want to chase.
jsheard
5 months ago
I think they'd rather sell you an iPhone and an iPad mini than one device that does both, just like they'd rather sell you an iPad Air/Pro and a MacBook with basically the same internals than a convertible macOS tablet.
meindnoch
5 months ago
Aside from the obvious mechanical issues, the screen quality compromises, et cetera, folding phones are just dorky. Apple wants their products to be anything but dorky.
There will never be a folding iPhone, simple as.
Miraste
5 months ago
They're in the right. Folding phones are great, and I've used one for years, but the technology hasn't reached Apple levels. Get rid of the crease, make the screen less scratchable, and make them waterproof, and then it could go in an iPhone.
boppo1
5 months ago
Foldables seem gimmicky to me.
yoyohello13
5 months ago
The PMs are probably thinking folding phones are dumb…because they are.
nylonstrung
5 months ago
Marques Brownlee said they have prototypes for a folding phone and will likely release one
swiftcoder
5 months ago
Do any of the folding phones actually work well? I still haven't seen one in the wild (admittedly, I'm not living in a tech Mecca these days)
caycep
5 months ago
I dunno, I always felt folding phones added unnecessary complexity and moving parts. The slab phone seems closer to a platonic ideal and, from a user/engineering perspective, has fewer compromises.
runako
5 months ago
In all seriousness, is there a folding phone that doesn't have a crease in the screen while unfolded?
The one I have used felt like using a real phone through a layer of vinyl, definitely not a pleasant experience.
rickdeckard
5 months ago
> IMO it's underwhelming considering folding phones have been out for many years now and we still don't have a folding iPhone. What are the PMs doing at Apple.
They're buying another year of very-high margin phones I guess...
busymom0
5 months ago
I know they have been out for a while, but I have yet to see a single one in person. They just don't make up much of the market.
pdntspa
5 months ago
Why do we need a folding phone?