Tiny hackable CUDA language model implementation

37 points, posted 3 days ago
by markusheimerl

2 Comments

yobbo

6 hours ago

Looks very nice, but I can't find numerical gradient checks, which are helpful for verifying that the backward pass is correct:

https://github.com/markusheimerl/gpt/blob/main/transformer/a...
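
For anyone unfamiliar with the idea: a numerical gradient check perturbs each parameter by a small epsilon, estimates the derivative with a central difference, and compares that estimate against what the hand-written backward pass returns. Below is a minimal sketch in plain Python, not code from the linked repository. The toy tanh loss and the names `numerical_grad`, `max_relative_error`, `loss`, and `loss_backward` are hypothetical and only there to illustrate the technique.

```python
import math

def numerical_grad(f, params, eps=1e-5):
    """Central-difference estimate of d f / d params[i] for every i."""
    grads = []
    for i in range(len(params)):
        original = params[i]
        params[i] = original + eps
        loss_plus = f(params)
        params[i] = original - eps
        loss_minus = f(params)
        params[i] = original  # restore the parameter before moving on
        grads.append((loss_plus - loss_minus) / (2.0 * eps))
    return grads

def max_relative_error(analytic, numeric, floor=1e-12):
    """Largest |a - n| / (|a| + |n|) over all parameters."""
    return max(
        abs(a - n) / max(abs(a) + abs(n), floor)
        for a, n in zip(analytic, numeric)
    )

# Hypothetical toy model: loss(w) = 0.5 * (tanh(w . x) - y)^2
X = [0.5, -1.2, 2.0]
Y = 0.3

def loss(w):
    z = sum(wi * xi for wi, xi in zip(w, X))
    return 0.5 * (math.tanh(z) - Y) ** 2

def loss_backward(w):
    """Hand-derived gradient of loss with respect to w (the code under test)."""
    z = sum(wi * xi for wi, xi in zip(w, X))
    t = math.tanh(z)
    dz = (t - Y) * (1.0 - t * t)  # chain rule through tanh
    return [dz * xi for xi in X]

w = [0.1, -0.2, 0.05]
err = max_relative_error(loss_backward(w), numerical_grad(loss, w))
print("max relative error:", err)
```

A small relative error suggests the analytic gradient agrees with the numerical one, while a large error usually points to a bug in the backward pass. For GPU code the same comparison would typically be run in double precision on a very small model, since float32 rounding can swamp the finite-difference estimate.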