Hackernews
Batched reward model inference and Best-of-N sampling
33 points, posted 3 days ago by rawsh (raw.sh)
No comments yet