Batched reward model inference and Best-of-N sampling

34 points, posted a year ago
by rawsh
