Surprised this isn't by lucidrains; they usually have the first repro attempts.
This tidbit from a discussion on that repo sounds really interesting:
> You can load a pretrained transformer backbone, freeze it, and train only the HOPE/TITAN/CMS memory pathways.
In principle, you would:
- Freeze the shared transformer spine (embeddings, attention/MLP blocks, layer norms, lm_head) and keep lm_head.weight tied to embed.weight.
- Train only the HOPE/TITAN memory modules (TITAN level, CMS levels, self-modifier projections, inner-optimizer state).
- Treat this like an adapter-style continual-learning finetune: the base model provides stable representations, and the HOPE/CMS pathways learn to adapt and do test-time learning on top (see the sketch below).
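If the memory pathways live in separate submodules, the freeze itself is only a few lines of PyTorch. A minimal sketch, assuming hypothetical attribute names like `titan_memory`, `cms_levels`, and `self_modifier` (the real names depend on the implementation); note that a tied `lm_head.weight` stays frozen automatically once the embedding tensor it shares storage with is frozen:

```python
import torch

# Hypothetical submodule name prefixes for the trainable memory pathways --
# the actual repo's attribute names may differ.
MEMORY_PREFIXES = ("titan_memory", "cms_levels", "self_modifier")

def freeze_backbone_train_memory(model: torch.nn.Module):
    """Freeze the shared transformer spine (embeddings, attention/MLP blocks,
    norms, lm_head) and leave only the HOPE/TITAN/CMS pathways trainable."""
    for name, param in model.named_parameters():
        # Trainable only if the parameter belongs to a memory-pathway module.
        param.requires_grad = any(
            name.startswith(p) or f".{p}" in name for p in MEMORY_PREFIXES
        )
    return [p for p in model.parameters() if p.requires_grad]

# Usage: the optimizer only ever sees the memory-pathway parameters.
# trainable = freeze_backbone_train_memory(model)
# optim = torch.optim.AdamW(trainable, lr=1e-4)
```

Any inner-optimizer / test-time-learning state would still update at inference, since that happens outside the outer optimizer anyway.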
----
Pretty cool if this works. I'm hoping more research goes into reusing already-trained models (beyond just freezing the existing parts and training the rest) so all that training effort doesn't get lost. Something that can reuse those weights while adding architectural enhancements would be truly revolutionary.