How NYU Langone is combating ‘automation bias’
AI-generated hospital course summaries required less editing than physician-written ones but contained more errors, a study led by New York City-based NYU Langone Health found.

A generative AI tool embedded in Epic’s EHR was piloted on 100 general medicine admissions. Residents edited a smaller share of AI-generated summaries, suggesting the technology could help speed up the process of compiling detailed discharge documentation. But attending physicians reviewing the notes flagged more inaccuracies in the AI-originated drafts, raising concerns about “automation bias” — the tendency to trust polished AI output and overlook mistakes.

“It’s really important that the end users know not only that AI can make errors and the types of errors it can make, but that it can convince clinicians that what it is putting in front of you is correct,” William Small, MD, lead author of the study and associate medical director of clinical informatics and applied AI at NYU Langone’s MCIT Department of Health Informatics, told Becker’s.

Following the study, NYU Langone added a safeguard: Every fact in an AI-generated summary now links back to the original clinical note so clinicians can quickly verify accuracy. Education and real-time decision support are also priorities as the technology is scaled, Dr. Small said.

During the tool’s development, Epic handled the integration while NYU Langone’s team designed the prompt that turns clinical notes into a narrative. That prompt was built after weeks of internal discussions to define what makes a “good” hospital course.

“We looked sort of everywhere in the literature and couldn’t find a consensus definition,” Dr. Small said. “The first weeks of this effort were really just sitting down with hospitalists and leaders in internal medicine and working to define that before we then asked the language model to write a hospital course.”

The team formalized that definition into what it called a “4Cs” quality standard: complete, concise, cohesive and confabulation free. In the study, attending physicians rated AI-originated drafts higher on completeness and similar on conciseness and cohesion.

Next, the health system plans a randomized controlled trial to study the tool’s effect on note quality and time spent charting.

“The goal is to turn it on for some clinicians and not for others, and evaluate whether we see improvements in the quality of their notes but also in the time they’re spending on those notes,” Dr. Small said.

He advised other systems to clearly define documentation standards before adopting similar tools and to remain alert to potential errors.

“Errors slip through the chart at all times,” he said. “Humans, especially when they’re doing lots of different things, are particularly prone to automation bias because it is easy to just copy and paste without realizing the downstream implications of an error or two in the sentences.”