enochyearn
11 hours ago
I watched a video about DeepSeek’s mHC and couldn’t stop thinking about it, so I implemented a minimal version in MLX over the weekend and used it in place of residual connections to compare against a baseline ResNet.
I ran a quick stability check at depth=500 on Fashion-MNIST: no divergence, and the results were better than I expected.
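
For anyone wondering what "in place of residual connections" means here, this is a rough sketch of the core idea as I understand it, not the MLX code from the experiment above. It assumes mHC keeps several parallel residual streams and mixes them with a matrix constrained to be approximately doubly stochastic (rows and columns each sum to 1), using Sinkhorn-Knopp normalization. The names `sinkhorn` and `mix_streams`, the iteration count, and the example numbers are all made up for illustration, and it is plain Python so it runs without MLX.

```python
import math

def sinkhorn(logits, iters=20):
    """Push a square matrix of logits toward a doubly stochastic matrix.

    Hypothetical helper: alternating row/column normalization (Sinkhorn-Knopp).
    """
    # Exponentiate so every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        # Normalize each row to sum to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Normalize each column to sum to 1.
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m

def mix_streams(h, streams):
    """Mix n residual streams (each a list of floats) with the n x n matrix h."""
    n, d = len(streams), len(streams[0])
    return [
        [sum(h[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]

# Made-up learnable logits for 3 streams, and 3 streams of width 2.
logits = [[0.2, -0.1, 0.4], [0.3, 0.0, -0.2], [-0.4, 0.1, 0.3]]
streams = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]

h = sinkhorn(logits)
mixed = mix_streams(h, streams)
```

Because the columns of `h` sum to 1, the per-feature total across streams is unchanged by the mixing step. That is my intuition for why stacking hundreds of these layers would not blow up the signal, though I have only reasoned about it on this toy version.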