Show HN: Model2Vec: make sentence transformers 500x faster on CPU, 15x smaller

6 points, posted 10 months ago
by stephantul

2 Comments

billconan

10 months ago

the original models can only generate sentence embeddings, correct?

can a token prediction model use this?

stephantul

10 months ago

Hey, thanks for the question.

You are right, this can only be used to distill models for producing embeddings, although it's not restricted to encoder-only models. For example, you could use it on Llama, but you'd just get a bunch of embeddings out, not a model that can do next-token prediction.
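For readers wondering why the distilled models are so fast: after distillation, a sentence embedding reduces to a table lookup plus mean pooling, with no transformer forward pass at all. Here is a minimal numpy sketch of that idea; the vocabulary, dimensions, and vectors are invented for illustration and are not Model2Vec's actual API.

```python
import numpy as np

# Toy static-embedding model: one fixed vector per token, so encoding a
# sentence is just a lookup plus mean pooling. No attention, no GPU needed.
# Vocabulary and vectors here are made up for illustration.
rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "dog": 3}
dim = 4
token_vectors = rng.normal(size=(len(vocab), dim)).astype(np.float32)

def encode(sentence: str) -> np.ndarray:
    """Embed a sentence by mean-pooling its tokens' static vectors."""
    ids = [vocab[t] for t in sentence.split() if t in vocab]
    return token_vectors[ids].mean(axis=0)

emb = encode("the cat sat")
print(emb.shape)  # (4,)
```

This is also why the approach can't give you a token predictor: the distilled artifact is just the lookup table, and everything that made the original model generative is discarded.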