Tried Implementing DeepSeek's mHC

1 point, posted 11 hours ago
by enochyearn

1 Comment

enochyearn

11 hours ago

I watched a video about DeepSeek’s mHC (Manifold-Constrained Hyper-Connections) and couldn’t stop thinking about it, so over the weekend I implemented a minimal version in MLX, swapped it in for the residual connections, and compared it against a baseline ResNet.

I ran a quick stability check at depth=500 on Fashion-MNIST: no divergence, and the results were better than I expected.
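
For readers who want a feel for the idea, here is a rough sketch in NumPy. It is not the MLX code from this experiment and not DeepSeek's implementation. It assumes the core of mHC is to keep several residual streams and mix them with a matrix pushed toward doubly stochastic (rows and columns summing to 1) by Sinkhorn-Knopp iterations. The names `sinkhorn_knopp`, `mhc_style_block`, and `run_stack` are made up for illustration, and feeding the layer the mean of the streams is a simplification standing in for learned read and write maps.

```python
import numpy as np


def sinkhorn_knopp(logits, n_iters=20):
    """Push a square matrix toward doubly stochastic form."""
    # Exponentiate so every entry is strictly positive (shifted for stability).
    m = np.exp(logits - logits.max())
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # normalize rows
        m = m / m.sum(axis=0, keepdims=True)  # normalize columns
    return m


def mhc_style_block(streams, mix_logits, layer_fn):
    """One block: mix the residual streams, then add the layer output.

    streams has shape (n_streams, dim). Assumption: the layer reads the
    mean of the streams and its output is added to every stream.
    """
    h_res = sinkhorn_knopp(mix_logits)
    layer_out = layer_fn(streams.mean(axis=0))
    return h_res @ streams + layer_out


def run_stack(depth=500, n_streams=4, dim=8, seed=0):
    """Stack `depth` blocks with a zero layer to isolate the residual path."""
    rng = np.random.default_rng(seed)
    streams = rng.normal(size=(n_streams, dim))
    start = streams.copy()
    for _ in range(depth):
        mix_logits = rng.normal(size=(n_streams, n_streams))
        streams = mhc_style_block(streams, mix_logits, lambda x: np.zeros_like(x))
    return start, streams


start, final = run_stack()
print("start norm:", np.linalg.norm(start))
print("final norm:", np.linalg.norm(final))
```

With the layer zeroed out, only the mixing matrices act on the signal. Because each matrix's columns sum to 1, the sum over streams is preserved from block to block, which is one way to see why the residual path in this toy setup does not blow up even at depth 500. A real model would learn `mix_logits` and use an actual layer in place of the zero function.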