Nvidia releases 8B model with learned 8x KV cache compression

9 pointsposted 14 days ago
by alecco

5 Comments

vercaemert

13 days ago

I'd be interested to hear some use cases people have for large contexts on an 8B model. Other than sentiment analysis or summarization (this release implies agentic use). My experience with the general intelligence of agentic interactions is that everything is unusable before 32B for any context greater than 4k tokens.

user

13 days ago

[deleted]

WaalkTheEaarth

11 days ago

I personally use a 8B model for general use on my laptop lol, it works like a charm and makes sense (most of the time atleast)