Hackernews
Adaptive speculative decoding: picking draft lengths at runtime
2 points, posted 9 hours ago by hasheddan (fergusfinn.com)