LLMs and Diagnostic Reasoning: A Randomized Clinical Vignette Study [pdf]

Posted 9 hours ago by trott


trott

9 hours ago

TLDR:

Physicians scored 73.7. Physicians armed with GPT-4 scored 76.3. But GPT-4 alone scored 89.2.
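To put the gaps in perspective, here is a quick back-of-the-envelope calculation of the differences. It treats the three numbers as scores on the same scale, which is an assumption: the TLDR doesn't say whether they are means, medians, or percentages. The group labels and the `gap` helper are just illustrative names.

```python
from decimal import Decimal

# Scores as quoted in the TLDR (assumed to be on the same scale).
# Decimal avoids float rounding noise in the subtraction.
scores = {
    "physicians": Decimal("73.7"),
    "physicians + GPT-4": Decimal("76.3"),
    "GPT-4 alone": Decimal("89.2"),
}

def gap(a: str, b: str) -> Decimal:
    """Hypothetical helper: score of group a minus score of group b, in points."""
    return scores[a] - scores[b]

if __name__ == "__main__":
    print(gap("physicians + GPT-4", "physicians"))   # 2.6
    print(gap("GPT-4 alone", "physicians"))           # 15.5
    print(gap("GPT-4 alone", "physicians + GPT-4"))   # 12.9
```

So giving physicians GPT-4 moved them up by only 2.6 points, while GPT-4 on its own was 12.9 points ahead of the assisted physicians.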

The authors think it's unlikely that the materials are in the GPT-4 training data, because the cases have never been publicly released.