MoE inference optimizations: 15% lower expert load by request reordering

3 points, posted 11 hours ago by mezark
