Bluffbench is near saturation: LLMs can interpret counterintuitive plots

2 points, posted 14 hours ago
by ionychal
