Show HN: ARM64-optimized prime sieve with 3.75x memory compression

1 point, posted 5 hours ago
by amadeapewe

1 comment

amadeapewe

5 hours ago

Hi HN,

I’m a graphic designer and artist by background, but I’ve always been fascinated by patterns. I spent some time visualizing prime number distributions on paper and arrived at a geometric layout that felt very efficient for memory.

With some help from AI (Gemini/ChatGPT), I translated this into C++. The speedup (~3.1x on M1) isn't from new math, but from optimizing how data sits in the cache. It uses an 8-pipe wheel ($H=30$) packed into a single byte to reduce memory traffic.

I'm sharing this for a sanity check from the systems/HPC community. Would love to hear if this approach to memory locality is something you've seen before or if there's room to push it further.

Benchmarks and the original paper sketch are in the repo. Thanks!
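For readers who haven't seen the layout, here is a minimal sketch of a mod-30 wheel sieve that stores the 8 residues coprime to 30 as the 8 bits of one byte. It is not the code from the repo: the names `kResidues`, `kBitOf`, and `wheel30_primes` are made up for illustration, and the inner loop is deliberately unoptimized (no cache blocking, no ARM64-specific tuning), so it only shows the packing idea.

```cpp
#include <cstdint>
#include <vector>

// The 8 residues mod 30 that are coprime to 30 (the 8 "pipes" of the wheel).
// Bit i of byte k stands for the integer 30*k + kResidues[i].
constexpr int kResidues[8] = {1, 7, 11, 13, 17, 19, 23, 29};

// Maps n % 30 to its bit index, or -1 if n is divisible by 2, 3, or 5.
constexpr int kBitOf[30] = {
    -1,  0, -1, -1, -1, -1, -1,  1, -1, -1,
    -1,  2, -1,  3, -1, -1, -1,  4, -1,  5,
    -1, -1, -1,  6, -1, -1, -1, -1, -1,  7};

// Hypothetical helper: returns all primes <= limit in increasing order.
std::vector<std::uint64_t> wheel30_primes(std::uint64_t limit) {
    std::vector<std::uint64_t> primes;
    const std::uint64_t wheel_primes[] = {2, 3, 5};
    for (std::uint64_t p : wheel_primes) {
        if (p <= limit) primes.push_back(p);
    }

    // One byte per block of 30 integers; a set bit means "still a candidate".
    std::vector<std::uint8_t> sieve(limit / 30 + 1, 0xFF);
    sieve[0] = 0xFE;  // bit 0 of byte 0 is the integer 1, which is not prime

    // Phase 1: cross off odd multiples of each surviving p, starting at p*p.
    for (std::uint64_t p = 7; p * p <= limit; p += 2) {
        const int b = kBitOf[p % 30];
        if (b < 0 || !(sieve[p / 30] & (1u << b))) continue;
        for (std::uint64_t m = p * p; m <= limit; m += 2 * p) {
            const int mb = kBitOf[m % 30];
            // Multiples divisible by 3 or 5 have no bit and are skipped.
            if (mb >= 0) sieve[m / 30] &= static_cast<std::uint8_t>(~(1u << mb));
        }
    }

    // Phase 2: read the surviving bits back out as integers.
    for (std::uint64_t k = 0; k < sieve.size(); ++k) {
        for (int i = 0; i < 8; ++i) {
            const std::uint64_t n = 30 * k + kResidues[i];
            if (n > limit) break;
            if (sieve[k] & (1u << i)) primes.push_back(n);
        }
    }
    return primes;
}
```

Each byte here covers 30 consecutive integers, so the table holds 30/8 = 3.75 integers per bit. If the 3.75x in the title is measured against a plain one-bit-per-integer bitmap (my assumption, not something stated in the post), that is where the figure would come from.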