SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

16 points, posted 2 hours ago
by landonxi

1 Comment

egillie

2 hours ago

Is this because GPT-5 hallucinates less in general?