SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

22 points, posted 5 months ago

1 Comment

egillie

5 months ago

Is this because GPT-5 hallucinates less in general?