Hackernews
SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations
16 points
posted 2 hours ago
by landonxi
(surgehq.ai)
1 comment
egillie
2 hours ago
Is this because GPT-5 hallucinates less in general?