mghaderi
16 hours ago
I implemented a neural network from scratch in x86 assembly (no frameworks, no Python) to recognize handwritten digits from MNIST. It uses AVX-512 SIMD for parallel float32 operations (~7× faster than NumPy) and runs in a lightweight Debian Slim Docker container. The goal was to understand neural networks at the CPU level. Feedback on performance optimizations or next steps is welcome.
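For readers wondering what "AVX-512 SIMD for parallel float32 operations" looks like in a dense layer, here is a minimal sketch in C with compiler intrinsics. It is not the author's assembly, just an illustration of the same idea. The core of a fully connected layer is a dot product, and a 512-bit register holds 16 float32 values, so each fused multiply-add handles 16 weights at once. The function names (`dot`, `dense_relu`), the row-major weight layout and the ReLU activation are hypothetical choices for this example. It assumes GCC or Clang on x86-64 and falls back to scalar code when the CPU lacks AVX-512F.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Scalar reference: dot product of two float32 vectors. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) sum += a[i] * b[i];
    return sum;
}

/* AVX-512 path: 16 float32 lanes per iteration via fused multiply-add.
   Compiled for avx512f only; called only after a runtime CPU check. */
__attribute__((target("avx512f")))
static float dot_avx512(const float *a, const float *b, size_t n) {
    __m512 acc = _mm512_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);   /* unaligned load of 16 floats */
        __m512 vb = _mm512_loadu_ps(b + i);
        acc = _mm512_fmadd_ps(va, vb, acc);   /* acc += va * vb, per lane */
    }
    float sum = _mm512_reduce_add_ps(acc);    /* horizontal sum of 16 lanes */
    for (; i < n; i++) sum += a[i] * b[i];    /* leftover elements */
    return sum;
}

/* Dispatch: use AVX-512 only if this CPU supports it (GCC/Clang builtin). */
float dot(const float *a, const float *b, size_t n) {
    if (__builtin_cpu_supports("avx512f")) return dot_avx512(a, b, n);
    return dot_scalar(a, b, n);
}

/* Hypothetical dense layer: out[j] = ReLU(W[j] . x + bias[j]),
   with W stored row-major as n_out rows of n_in weights. */
void dense_relu(const float *w, const float *bias, const float *x,
                float *out, size_t n_in, size_t n_out) {
    assert(w && bias && x && out);
    for (size_t j = 0; j < n_out; j++) {
        float z = dot(w + j * n_in, x, n_in) + bias[j];
        out[j] = z > 0.0f ? z : 0.0f;
    }
}
```

For an MNIST input layer, `n_in` would be 784 (28×28 pixels), which is 49 full AVX-512 iterations with no leftover elements. Note that the SIMD path adds terms in a different order than the scalar loop, so results can differ from it in the last bits of float32 precision.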
checker659
16 hours ago
> ~7× faster than NumPy
Is that on the CPU? (I'm not sure whether NumPy has a GPU backend.)
mghaderi
16 hours ago
Yes, on the CPU, with the same resources and the same implementation.