ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents

2 points, posted 3 months ago
by gmays

No comments yet