
77 points, posted 18 hours ago

9 Comments

17 hours ago

I wonder how much of the improvement is owed to which changes. I've also never heard of "Muon - Momentum Orthogonalized by Newton-Schulz" being used.

EDIT: there's a bit more info on his twitter - https://x.com/kellerjordan0

It looks like he created this optimizer. Works on 2D matrices only.
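
For anyone wondering what "orthogonalized by Newton-Schulz" could mean in practice, here is a rough sketch. To be clear about the assumptions: this uses the textbook cubic Newton-Schulz iteration, which may not be the variant or coefficients the repo uses, and `muon_like_step`, `lr`, and `beta` are hypothetical names and values I made up for illustration. It also shows why the idea only applies to 2D matrices: the iteration multiplies the update by its own transpose.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the nearest orthogonal matrix to the 2D matrix g.

    Assumption: classic cubic iteration X <- 1.5*X - 0.5*X*X^T*X,
    chosen for clarity, not taken from the repository.
    """
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)]
             for ra, rb in zip(x, xxtx)]
    return x

def muon_like_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """Hypothetical update: accumulate momentum, orthogonalize it, take a step."""
    momentum_buf = [[beta * m + g for m, g in zip(rm, rg)]
                    for rm, rg in zip(momentum_buf, grad)]
    update = newton_schulz_orthogonalize(momentum_buf)
    weight = [[w - lr * u for w, u in zip(rw, ru)]
              for rw, ru in zip(weight, update)]
    return weight, momentum_buf
```

The point of the sketch is that the step direction keeps the "shape" of the momentum matrix but has all its singular values pushed toward 1, so no single direction dominates the update.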

15 hours ago

Just needs a Zero To Hero series episode offering line by line commentary to follow along on why each choice was made over alternatives.

17 hours ago

Cool work. No license?

10 hours ago

Do you have a baseline of the regular implementation with a 3x learning rate?

15 hours ago

So it compresses info better.

15 hours ago

That is literally intelligence.

12 hours ago

It's not.

17 hours ago

Seems like this is a modded NanoGPT, not the original.

17 hours ago

Yes. It’s literally called “Modded-NanoGPT”.