Hacker News
MoE inference optimizations: 15% lower expert load by request reordering
3 points
posted 11 hours ago
by mezark
(blog.doubleword.ai)
No comments yet