SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

22 points, posted 5 months ago
by landonxi

1 Comment

egillie

5 months ago

Is this because GPT-5 hallucinates less in general?