DPO fine-tuning outperforms SFT

1 point, posted 9 hours ago by kcorbitt
