billconan
10 months ago
The original models can only generate sentence embeddings, correct?
Can a token prediction model use this?
stephantul
10 months ago
Hey, thanks for the question.
You are right: this can only be used to distill models that produce embeddings, although it's not restricted to encoder-only models. For example, you could use it on Llama, but you'd just get a bunch of embeddings out, not a model that can do next-token prediction.
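
To make the distinction concrete, here is a rough toy sketch of what "just embeddings out" means. It assumes the distilled model boils down to a per-token vector lookup followed by mean pooling, which is a simplification for illustration and not a description of the actual implementation. The vocabulary, the random table, and the names `make_static_table` and `embed_sentence` are all hypothetical.

```python
import random

# Hypothetical toy vocabulary and dimension; not taken from any real model.
VOCAB = ["the", "cat", "sat", "on", "mat", "[UNK]"]
DIM = 4


def make_static_table(vocab, dim, seed=0):
    """Build a stand-in embedding table: one fixed vector per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def embed_sentence(sentence, table):
    """Look up each whitespace token and mean-pool the vectors.

    The result is a single fixed-size vector. There is no output layer
    over the vocabulary, so nothing here can score or predict a next token.
    """
    dim = len(next(iter(table.values())))
    tokens = sentence.lower().split()
    if not tokens:
        return [0.0] * dim
    # Unknown tokens fall back to the [UNK] vector.
    vectors = [table.get(tok, table["[UNK]"]) for tok in tokens]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


table = make_static_table(VOCAB, DIM)
vec = embed_sentence("the cat sat on the mat", table)
print(len(vec))  # one vector of size DIM, regardless of sentence length
```

Whatever the source model was (encoder-only or a decoder like Llama), the output in this sketch is only ever a vector per input, which is why it can't stand in for a next-token predictor.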