Hacker News
Terminal-bench: a benchmark for AI agents in terminal environments
3 points
posted 13 hours ago
by cpard
(tbench.ai)
No comments yet