trott
a year ago
TLDR:
Physicians scored 73.7. Physicians armed with GPT-4 scored 76.3. But GPT-4 alone scored 89.2.
The authors think it's unlikely that the materials are in the GPT-4 training data, because the cases have never been publicly released.
panabee
a year ago
thanks for sharing.
the implications are fascinating, if the findings are generalizable and reproducible.
the study suggests LLMs may already be materially superior to experts in a critical field like medicine, and that inexpert users hold back LLMs.
given the author affiliations, it's also likely that the tested physicians are in the top tier -- suggesting an even greater disparity between LLMs and doctors in less advanced areas.
panabee
a year ago
a doctor friend highlighted two key limitations: only six cases were evaluated per physician and half the physicians were only residents.