Batched reward model inference and Best-of-N sampling

33 points, posted 3 days ago
by rawsh
