Show HN: Agent-evals – Claude skill to build your own evals

6 points, posted 9 hours ago
by sauercrowd

1 comment

johnjudeh

7 hours ago

Thanks for sharing! It’s way easier to build an agent that can complete a task than to make sure it works across all the cases you care about, especially when the output quality is really subjective.