trott
9 hours ago
TLDR:
Physicians scored 73.7. Physicians armed with GPT-4 scored 76.3. But GPT-4 alone scored 89.2.
The authors think it's unlikely that the materials are in the GPT-4 training data, because the cases have never been publicly released.