Hackernews
Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions
3 points
posted 11 hours ago
by jinqueeny
(github.com)
No comments yet