CueBench for Developers is live: score how well you drive coding agents

9 points · posted 14 hours ago
by DillonMehta

3 Comments

drdexebtjl

13 hours ago

Yikes. This is literally only useful to justify layoffs.

jadyen

13 hours ago

Looks cool at first glance, can't wait to play around with it!

DillonMehta

14 hours ago

Hey everyone, we're CueBench (S26). As teams go agent-first, everyone benchmarks the agents; nobody measures how well people drive them. We score a coding-agent session (Claude Code, Codex, Cursor, PI) on the human side: delegation, task description, catching the agent's mistakes, and verifying before shipping. You get a 0–100 score plus a breakdown.

Scoring is deterministic: it's built on measurable signals from the session, not an LLM vibing on your transcript. Same session, same score.

We just opened a public demo and need real sessions thrown at it. There's nothing to install and nothing runs on your machine: just upload a session file from your agent's logs (or paste one terminal command), and you get scored in seconds.

Where it's going: a product for engineering orgs that gives engineers session-level feedback to upskill them at agent-driven development, and gives managers a skills signal (coaching, not surveillance).

The ask: run one real session through it this week and tell us where the score feels wrong. Brutal > polite. Demo video: https://youtu.be/r9vAdAMv6js