ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents

2 points, posted 11 hours ago
by gmays

No comments yet