Show HN: Model2Vec: make sentence transformers 500x faster on CPU, 15x smaller

6 points | posted 12 hours ago
by stephantul

2 Comments

billconan

7 hours ago

The original models can only generate sentence embeddings, correct?

Can a token prediction model use this?

stephantul

7 hours ago

Hey, thanks for the question.

You are right, this can only be used to distill models for producing embeddings, although it's not restricted to encoder-only models. For example, you could use it on Llama, but you'd just get a bunch of embeddings out, not a model that can be used to do next-token prediction.
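To make the "just a bunch of embeddings" point concrete, here is a minimal numpy sketch of the static-embedding idea: every token in the vocabulary gets one precomputed vector (in the real tool those would be distilled from a teacher transformer; here random vectors stand in), and a sentence embedding is simply the mean of its token vectors, so no transformer runs at inference time. The vocabulary, dimension, and function names are illustrative, not the library's API.

```python
import numpy as np

# Hypothetical static token table: one fixed vector per vocabulary token.
# In a distilled model these vectors come from a single pass of the
# teacher; here they are random stand-ins for illustration.
rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
token_vectors = rng.normal(size=(len(vocab), 8))  # toy 8-dim embeddings

def embed(sentence: str) -> np.ndarray:
    """Sentence embedding = mean of the static vectors of known tokens."""
    ids = [vocab[tok] for tok in sentence.split() if tok in vocab]
    return token_vectors[ids].mean(axis=0)

vec = embed("the cat sat")
print(vec.shape)  # a single fixed-size vector, nothing generative
```

Because the output is always a pooled vector, the distilled artifact can score similarity or feed a classifier, but it has no head for predicting a next token.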