Hackernews
DPO fine-tuning outperforms SFT
1 point
posted 9 hours ago
by kcorbitt
(openpipe.ai)
No comments yet