AI misrepresents medical risk terms: Study


Large language models frequently misrepresent verbal risk terms used in medicine, potentially amplifying patient misunderstandings and diverging from established clinical definitions, according to a study spearheaded by researchers from Vanderbilt Health in Nashville, Tenn.

The findings, published Dec. 17 in JAMA Network Open, stem from an evaluation of four commercial large language models using 800 patient-style prompts containing terms such as “rare,” “common” and “likely.”

The models abstained from giving numeric definitions more than 90% of the time in some instances. When they did respond, their interpretations often strayed from regulatory benchmarks, the study found. For example, the term "rare" was defined as affecting up to 4% of people on average, well above the European Commission's threshold of 0.1%. "Common" was defined as affecting as much as 36%.

The models were more likely to avoid numeric answers and generate longer, more readable responses when prompts included anxious language or described severe clinical situations, according to the study.
