Cappybara12
6 hours ago
I found the disagreement striking. Kaiser argues that Transformers still win unless someone shows a better scaling curve, while the other researchers argue that the field is overfitting to current hardware and missing better architectures.
There was a back-and-forth on scaling, hardware constraints, continual learning and latent reasoning.
pari_d
5 hours ago
[dead]