Hacker News
vLLM introduces memory optimizations for long-context inference
5 points
posted 6 hours ago
by addisud
(github.com)
1 comment
addisud
6 hours ago
[dead]