DPO fine-tuning outperforms SFT

1 point, posted a year ago by kcorbitt