hackernews client

johnjudeh

7 hours ago

Thanks for sharing! It’s way easier to build an agent that can complete a task than to make sure it works across all the cases you care about, especially when the output quality is really subjective.

Show HN: Agent-evals – Claude skill to build your own evals

1 Comment

johnjudeh