Show HN: Model2Vec: make sentence transformers 500x faster on CPU, 15x smaller

6 points | posted 12 hours ago
by stephantul

2 Comments

billconan

7 hours ago

The original models can only generate sentence embeddings, correct?

Can a token prediction model use this?

stephantul

7 hours ago

Hey, thanks for the question.

You are right, this can only be used to distill models for producing embeddings, although it's not restricted to encoder-only models. For example, you could use it on Llama, but you'd just get a bunch of embeddings out, not a model that can be used to do next-token prediction.
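To make the "just a bunch of embeddings" point concrete, here is a minimal numpy sketch of the static-embedding idea: every token in the vocabulary gets one precomputed vector (in the real tool those would be distilled from a teacher transformer; here random vectors stand in), and a sentence embedding is simply the mean of its token vectors, so no transformer runs at inference time. The vocabulary, dimension, and function names are illustrative, not the library's API.

```python
import numpy as np

# Hypothetical static token table: one fixed vector per vocabulary token.
# In a distilled model these vectors come from a single pass of the
# teacher; here they are random stand-ins for illustration.
rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
token_vectors = rng.normal(size=(len(vocab), 8))  # toy 8-dim embeddings

def embed(sentence: str) -> np.ndarray:
    """Sentence embedding = mean of the static vectors of known tokens."""
    ids = [vocab[tok] for tok in sentence.split() if tok in vocab]
    return token_vectors[ids].mean(axis=0)

vec = embed("the cat sat")
print(vec.shape)  # a single fixed-size vector, nothing generative
```

Because the output is always a pooled vector, the distilled artifact can score similarity or feed a classifier, but it has no head for predicting a next token.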