fxtentacle
a year ago
The title is not wrong, but it doesn't feel quite right either. What they do here is use a pre-trained model to guide the training of a 2nd model. Of course, that massively speeds up training of the 2nd model. But it's not like you can now train a diffusion model from scratch 20x faster. Instead, this is a technique for transplanting an existing model onto a different architecture so that you don't have to start training from 0.
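
For anyone unfamiliar with the pattern, here's a minimal toy sketch of what "a pre-trained model guides the training of a 2nd model" looks like in practice. This is generic distillation-style code, not the paper's actual method, and all the module shapes and names are made up:

    # Sketch only: a frozen pre-trained "teacher" supervises a randomly
    # initialized "student" with a different architecture, so the student
    # inherits the teacher's behavior instead of learning from scratch.
    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))  # stands in for the existing model
    student = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))  # different architecture, random init

    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)  # teacher stays frozen; it only provides targets

    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

    for step in range(1000):
        x = torch.randn(32, 64)          # stand-in for real training inputs
        with torch.no_grad():
            target = teacher(x)          # teacher output is the training signal
        loss = nn.functional.mse_loss(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

The point is that the student never needs to rediscover what the teacher already knows; it only needs to learn to reproduce it in its own architecture, which is a much easier optimization problem than training from zero.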