Hackernews
DeepSWE results are unreliable – 3/3 DSv4 "failed" tasks solved with same model
2 points, posted 5 hours ago by theanonymousone (github.com)