hackernews client

asqpl

6 hours ago

Engineer here – I run a therapeutic AI system for elderly care in assisted living facilities. Entire stack runs locally on Mac Minis (no cloud, no GPUs) for privacy and reliability.

Qwen 3.5's DeltaNet architecture broke our llama.cpp inference (1.5s → 21s). Migrated to Apple's MLX framework and brought it back to 7s while maintaining 100% clinical pass rates.

Happy to answer questions about MLX vs llama.cpp, DeltaNet optimization, or running SLMs on Apple Silicon for production healthcare workloads.

Migrating Elderly Care AI from Qwen 3 to 3.5 on Apple Silicon – 14x Latency Fix

1 Comments

asqpl