Hacker News
Evals in 2025: benchmarks to build models people can use
2 points, posted 10 hours ago by jxmorris12 (github.com)
No comments yet