3 hospital execs: How to ensure medical AI is trained on sufficiently diverse patient data

Katie Adams

In the past decade, hospitals have increasingly adopted artificial intelligence tools to reduce inefficiencies, optimize clinician workflow and expedite diagnostics — but these tools could exacerbate health disparities if trained on datasets that aren't diverse enough to accurately represent the individuals they will be treating.

On Oct. 22, the FDA held a nearly seven-hour patient engagement meeting on the use of artificial intelligence in healthcare, in which experts in the fields of medicine, regulation, technology and public health addressed the public's questions about machine learning in medical devices. Among the topics was the composition of the datasets used to train AI-based medical devices.

The panel said a lack of transparency surrounding the datasets that train algorithms can lead to public mistrust in AI-powered medical tools, as these devices may not have been trained using patient data that represents the patients on whom they will be used.

During the meeting, Center for Devices and Radiological Health Director Jeffrey Shuren, MD, noted that 562 AI-powered medical devices have received FDA emergency use authorization and pointed out that all patients should be considered when these devices are being developed and regulated.

Below, executives from three leading health systems discuss what can be done to ensure medical algorithms are being trained on sufficiently diverse patient data.

Editor's note: Responses have been edited lightly for clarity and length.

Edward Lee, MD, associate executive director at Kaiser Permanente and executive vice president of IT and CIO at The Permanente Federation (Oakland, Calif.): It's important for the healthcare industry to recognize that AI algorithms trained on insufficiently diverse data can lead to AI bias. At a time when we are incorporating more and more AI in medicine, this bias can inadvertently contribute to the widening of healthcare disparities. One of the first steps we need to take is to be intentional in looking for bias. If we don't look, we'll never find it, so understanding that AI bias can be part of any algorithm is essential.

Because bias can be introduced at multiple points throughout the algorithm development process, careful consideration is needed during all of the steps. This can start as early as building a diverse team that can bring different perspectives and expand thinking about the way data is collected, curated and analyzed. Additional key mitigating steps include using as broad a dataset as possible and continually validating and revalidating the results of an AI algorithm to confirm the output makes clinical sense.

Ultimately, I consider AI to be augmented intelligence and not simply artificial intelligence. It is most impactful when used as a tool for physicians to augment, assist and complement their clinical decision making rather than a standalone technology.

Patrick McCarthy, MD, executive director at Northwestern Medicine's Bluhm Cardiovascular Institute (Chicago): In order to ensure appropriate representation in AI systems, the architects of these systems must have the foresight to incorporate diverse training datasets and, most importantly, repeatedly validate prospectively on the target populations to evaluate for implicit bias.

Ramsey Wehbe, MD, in the midst of our Northwestern Medicine Fellowship in Artificial Intelligence in Cardiovascular Disease, recently published a paper reporting impressive results in using machine learning to accurately identify COVID-19 from chest X-rays in a set of 5,853 patients. The code is now publicly accessible to allow other investigators to enter more data, test the AI algorithm and improve accuracy. We hope that wide adoption and testing of Ramsey's DeepCovidXR AI algorithm and others like it will allow data from urban, suburban, rural, VA, safety net and Native American hospitals to be entered so that these datasets accurately represent the broader U.S. population, increasing accuracy and generalizability.

Shamim Nemati, PhD, director of predictive health analytics at UC San Diego Health: As with any other systemic problem, multiple issues that we used to think of as distinct topics — such as strict cross-institutional data sharing policies pertaining to privacy and ownership, lack of interoperability of EHR systems, market competition and short-term financial incentives — are all having compounding effects. 

Like a hammer, an AI algorithm can be used (or misused) for different purposes. For example, a 'no show' prediction algorithm can be adopted to double-book patients or to find patients who could benefit from a ride-sharing program. The good news is the more we move towards inclusion, the more we will enrich our datasets and get closer to the ideals of fairness and equity.

Copyright © 2021 Becker's Healthcare. All Rights Reserved.