Hackernews
Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering
1 point
posted 10 hours ago
by popey
(arxiv.org)
1 comment
pqtr2
6 hours ago
Couldn't agree more. Coding benchmarks are just a score. Benchmark the harness.