I hope to help you evaluate your GenAI App

1 point, posted 13 hours ago
by shloveai

2 Comments

shloveai

13 hours ago

Someone said 2025 was the year of agents. I feel like 2026 will be the year of evaluation. More and more agents are hitting a wall, and there is a growing need to keep all data and evaluation local, so I built Evalyn, a local-first evaluation pipeline for LLM and agent apps. It traces real executions, evaluates them with suggested metrics and LLM judges, and automatically calibrates those judges using a small amount of human feedback, all without sending data to a SaaS.

It’s open-source, CLI-driven, and meant to make evals something you can actually trust as you evolve your GenAI app. I'd love to offer white-glove support to anyone who is interested.
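
For a rough idea of what "calibrating a judge with a small amount of human feedback" can mean, here is a minimal sketch. It is not Evalyn's actual code or API: the function names `agreement` and `calibrate_threshold` are hypothetical, and it assumes the simplest possible setup, where the LLM judge returns a numeric score per trace and a human has marked a handful of those traces as pass or fail. Calibration here just means picking the score cutoff that agrees best with the human labels.

```python
from typing import Sequence, Tuple


def agreement(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Fraction of examples where (score >= threshold) matches the human label."""
    matches = sum((s >= threshold) == h for s, h in zip(scores, labels))
    return matches / len(labels)


def calibrate_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Hypothetical helper: choose the judge-score cutoff that best matches human labels.

    Only observed scores are tried as candidate cutoffs. On ties, the lowest
    candidate wins, because max() keeps the first maximum it sees.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: agreement(scores, labels, t))
    return best, agreement(scores, labels, best)


if __name__ == "__main__":
    # Made-up judge scores and human pass/fail labels, for illustration only.
    judge_scores = [0.2, 0.4, 0.55, 0.7, 0.9]
    human_labels = [False, False, True, True, True]
    cutoff, acc = calibrate_threshold(judge_scores, human_labels)
    print(f"cutoff={cutoff}, agreement with humans={acc:.0%}")
```

A real pipeline would likely do more than move a cutoff (for example, revising the judge prompt and checking agreement on held-out labels), but the core loop is the same: compare judge output with human labels and adjust until they agree.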