trott

a year ago

TLDR:

Physicians scored 73.7. Physicians armed with GPT-4 scored 76.3. But GPT-4 alone scored 89.2.

The authors think it's unlikely that the materials are in the GPT-4 training data, because the cases have never been publicly released.

panabee

a year ago

thanks for sharing.

the implications are fascinating, if the findings are generalizable and reproducible.

the study suggests LLMs may already be materially superior to experts in a critical field like medicine, and that inexpert users hold back LLMs.

given the author affiliations, it's also likely that the tested physicians are in the top tier -- suggesting even greater disparity between LLMs and doctors in less advanced areas.

panabee

a year ago

a doctor friend highlighted two key limitations: only six cases were evaluated per physician and half the physicians were only residents.

LLMs and Diagnostic Reasoning: A Randomized Clinical Vignette Study [pdf]
