
westurner

3 months ago

From https://www.sigops.org/2025/wafer-scale-ai-compute-a-system-... :

> When designing efficient software for wafer-scale compute, PLMR can serve as a checklist: performance-critical AI kernel and parallelism strategy should be PLMR-compliant. Importantly, PLMR is not limited to wafer-scale chips; it reflects a broader architectural shift from unified memory to large-scale NUMA designs. Unified memory, typically implemented with crossbars or all-to-all interconnects, scales poorly because networking cost grows exponentially with the number of cores and memory units. By contrast, emerging interconnects such as 2D mesh, ND mesh, 2D torus, and 3D torus scale with linear networking cost, but shift the complexity of maintaining efficient parallel computation onto software.
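
The scaling contrast in the quoted passage can be illustrated with a rough link-count model. This is a minimal sketch and is not taken from the article: the function names are hypothetical, "networking cost" is approximated as the number of point-to-point links, and the torus formula assumes both sides are at least 3. In this simple model a direct all-to-all fabric needs n(n-1)/2 links, which is quadratic growth, while a 2D mesh or torus needs roughly 2n. The price is a worst-case hop count that grows with the side length, which is the part software has to manage.

```python
# Rough link-count model for the interconnects named in the quote.
# Assumption: cost ~ number of point-to-point links; real hardware cost
# also depends on wire length, switch radix, and bandwidth per link.

def all_to_all_links(n: int) -> int:
    """Links needed to connect every pair of n nodes directly."""
    return n * (n - 1) // 2


def mesh_2d_links(rows: int, cols: int) -> int:
    """Links in a rows x cols 2D mesh (nearest-neighbour only, no wraparound)."""
    return rows * (cols - 1) + cols * (rows - 1)


def torus_2d_links(rows: int, cols: int) -> int:
    """Links in a rows x cols 2D torus; assumes rows >= 3 and cols >= 3."""
    return 2 * rows * cols


def mesh_2d_diameter(rows: int, cols: int) -> int:
    """Worst-case hop count between two nodes of a 2D mesh (corner to corner)."""
    return (rows - 1) + (cols - 1)


if __name__ == "__main__":
    for side in (4, 16, 64):
        n = side * side
        print(
            f"n={n}: all-to-all={all_to_all_links(n)}, "
            f"mesh={mesh_2d_links(side, side)}, "
            f"torus={torus_2d_links(side, side)}, "
            f"mesh diameter={mesh_2d_diameter(side, side)} hops"
        )
```

The link count stays close to linear for the mesh and torus, but the diameter grows with the side length, so data placement and communication scheduling matter far more than they do on a uniform-latency fabric.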

Wafer-Scale AI Compute: A System Software Perspective
