kelseyfrog
4 days ago
If you squint, it's a fixed-iteration ODE solver. I'd love to see a generalization of this, and of the Universal Transformer mentioned, re-envisioned as flow-matching/optimal-transport models.
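To make the squint concrete: a weight-tied residual block applied N times is exactly fixed-step Euler integration of dx/dt = f(x). A minimal PyTorch sketch (the module and names are mine, not from the paper, and the block is an MLP for brevity rather than an attention layer):

    import torch
    import torch.nn as nn

    class WeightTiedEulerBlock(nn.Module):
        def __init__(self, dim: int, steps: int = 12):
            super().__init__()
            # One shared set of weights reused at every "layer",
            # as in the Universal Transformer.
            self.f = nn.Sequential(
                nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
            )
            self.steps = steps
            self.h = 1.0 / steps  # step size; N steps integrate t from 0 to 1

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # N identical residual updates == N Euler steps of size h.
            for _ in range(self.steps):
                x = x + self.h * self.f(x)
            return x

    x = torch.randn(2, 64)
    print(WeightTiedEulerBlock(64)(x).shape)  # torch.Size([2, 64])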
kevmo314
4 days ago
How would flow matching work here? In language we have inputs and outputs, but it's not clear what the intermediate points would be, since language is a discrete space.
Etheryte
4 days ago
One of the core ideas behind LLMs is that language doesn't have to be treated as a discrete space: tokens are embedded in a continuous high-dimensional vector space where you can interpolate as needed. It's one of the reasons LLMs readily make up words that don't exist when translating text, for example.
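A toy illustration of that continuity, using random stand-in embeddings rather than a trained model: any point on the line between two token vectors still decodes to some nearest token, which is what makes interpolation in the embedding space cheap:

    import torch

    vocab = ["cat", "dog", "house", "tree", "run"]
    emb = torch.randn(len(vocab), 8)  # stand-in for learned embeddings

    a, b = emb[vocab.index("cat")], emb[vocab.index("dog")]
    for t in torch.linspace(0, 1, 5):
        p = (1 - t) * a + t * b  # linear interpolation between two tokens
        sims = torch.cosine_similarity(p.unsqueeze(0), emb)
        print(f"t={t.item():.2f} -> {vocab[sims.argmax()]}")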
cfcf14
4 days ago
This makes me think it would be nice to see a hybrid of the modern transformer architecture and neural ODEs. There was some interesting work a few years ago on how neural ODEs/PDEs can be seen as the continuous limit of layer depth. Maybe models could learn interesting things if the embeddings were trajectories of a learned dynamical system.
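A minimal continuous-depth sketch along those lines, using plain fixed-step Euler in PyTorch (my own toy setup; a real neural ODE would use an adaptive solver such as torchdiffeq's odeint, and making more steps with a finer step size leaves the parameter count unchanged while the model gets "deeper"):

    import torch
    import torch.nn as nn

    class ODEFunc(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim + 1, dim), nn.Tanh(), nn.Linear(dim, dim)
            )

        def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
            # Concatenate scalar "depth-time" so the vector field
            # can vary continuously with depth.
            tt = torch.full_like(x[:, :1], t)
            return self.net(torch.cat([x, tt], dim=-1))

    def integrate(f: ODEFunc, x: torch.Tensor, steps: int = 20) -> torch.Tensor:
        h = 1.0 / steps
        for i in range(steps):
            x = x + h * f(x, i * h)  # Euler step; depth is now continuous
        return x

    x = torch.randn(4, 32)
    print(integrate(ODEFunc(32), x).shape)  # torch.Size([4, 32])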