drdexebtjl
13 hours ago
Yikes. This is literally only useful to justify layoffs.
13 hours ago
Looks cool at first glance, can't wait to play around with it!
14 hours ago
Hey everyone, we're CueBench (S26). As teams go agent-first, everyone benchmarks the agents, but nobody measures how well people drive them. We score a coding-agent session (Claude Code, Codex, Cursor, PI) on the human side: delegation, task description, catching the agent's mistakes, and verifying before shipping. You get a 0–100 score plus a breakdown.
Scoring is deterministic, built on measurable signals from the session, not an LLM vibing on your transcript. Same session, same score.
We just opened a public demo and need real sessions thrown at it. There's nothing to install and nothing runs on your machine: just upload a session file from your agent's logs (or paste one terminal command) and you get scored in seconds.
Where it's going: a product for engineering orgs that gives engineers session-level feedback to help them get better at agent-driven development, and gives managers a skills signal (coaching, not surveillance).
The ask: run one real session through it this week and tell us where the score feels wrong. Brutal > polite. Demo video: https://youtu.be/r9vAdAMv6js