Hacker News
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks
1 point, posted 14 hours ago by FiberBundle (arxiv.org)
1 comment
cestivan, 13 hours ago: [dead]