DeepSWE: Measuring frontier coding agents on original, long-horizon SWE tasks

2 points, posted 5 hours ago
by WarmWash

No comments yet