captradeoff
10 hours ago
My team built an MCP server that helps coding agents work more effectively, and we found it very difficult to test it and get quantitative metrics on how much it actually improves their performance.
With mcpbr, you can stop guessing and start benchmarking your MCP server against popular standard evals like SWE-bench.
We'd love your feedback, suggestions for improvements, and of course pull requests! Does this solve a problem your team is facing?