The dangers of healthcare generative AI 'drift' - Becker's Hospital Review

The performance of GPT-4, the large language model that powers ChatGPT, in answering healthcare questions can change over time, a phenomenon known as “drift,” according to a study by researchers at Somerville, Mass.-based Mass General Brigham. Their work was published Aug. 8 in NEJM AI.

“Generative AI performed relatively well, but more improvement is needed for most use cases,” said corresponding author Sandy Aronson, executive director of IT and AI solutions at Mass General Brigham Personalized Medicine, in an Aug. 13 statement. “However, as we ran our tests repeatedly, we observed a phenomenon we deemed important: running the same test dataset repeatedly produced different results.”

Mr. Aronson and his fellow researchers were analyzing whether the technology could scan scientific articles to help geneticists assess genetic variants. Because the results could differ across days, the authors say the AI’s performance needs to be continuously monitored.

