SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks

1 point, posted 14 hours ago
by FiberBundle

1 Comment