hackernews client

shloveai

a month ago

Someone said 2025 was the year of agents. I feel like 2026 will be the year of evaluation. Seeing more and more agents hit a wall, along with a growing need to keep all data and evaluation local, I built Evalyn, a local-first evaluation pipeline for LLM and agent apps. It traces real executions, evaluates them with suggested metrics and LLM judges, and automatically calibrates those judges using a small amount of human feedback, all without sending data to a SaaS.

It’s open-source, CLI-driven, and meant to make evals something you can actually trust when evolving your GenAI app. I'd love to offer white-glove support to anyone who is interested.
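To give a rough idea of what "calibrating a judge with human feedback" can mean, here is a minimal sketch. It is not Evalyn's actual code or API: the function name `calibrate_threshold` and the sample data are hypothetical, and it assumes the judge emits a numeric score while humans give pass/fail labels. It simply picks the score cutoff that agrees most often with the human labels.

```python
from typing import Sequence


def calibrate_threshold(
    judge_scores: Sequence[float], human_labels: Sequence[bool]
) -> tuple[float, float]:
    """Return (threshold, agreement) for the cutoff that best matches humans.

    Hypothetical helper for illustration only, not part of any real library.
    A judge score >= threshold is treated as a "pass".
    """
    if not judge_scores or len(judge_scores) != len(human_labels):
        raise ValueError("need equally sized, non-empty score and label lists")

    best_threshold, best_agreement = 0.0, -1.0
    # Only observed scores need to be tried as candidate cutoffs.
    for candidate in sorted(set(judge_scores)):
        matches = sum(
            (score >= candidate) == label
            for score, label in zip(judge_scores, human_labels)
        )
        agreement = matches / len(human_labels)
        if agreement > best_agreement:
            best_threshold, best_agreement = candidate, agreement
    return best_threshold, best_agreement


# Made-up example: five traced runs, each scored by a judge and labeled by a human.
scores = [0.2, 0.4, 0.55, 0.7, 0.9]
labels = [False, False, True, True, True]
threshold, agreement = calibrate_threshold(scores, labels)
print(f"threshold={threshold}, agreement={agreement:.0%}")
```

With only a handful of labels a cutoff like this can overfit, so in practice you would want to re-check agreement on fresh human labels before trusting it.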

kundan_s__r

a month ago

Please check verdic.dev.

I hope to help you evaluate your GenAI app.
