ChatGPT on par with an 'intern or resident,' Mass General Brigham finds

ChatGPT is 72 percent accurate overall at making clinical decisions, performing better at reaching final diagnoses than at generating possible diagnoses, according to an Aug. 22 study by Mass General Brigham researchers.

"No real benchmarks exist, but we estimate this performance to be at the level of someone who has just graduated from medical school, such as an intern or resident," said the study's corresponding author, Marc Succi, MD, associate chair of innovation and commercialization at Somerville, Mass.-based Mass General Brigham and executive director of its MESH Incubator, in an Aug. 22 news release.

The researchers fed 36 published clinical scenarios into ChatGPT, first asking it to generate possible, or differential, diagnoses, then giving it additional information before requesting a final diagnosis and a treatment plan, according to the study in the Journal of Medical Internet Research. The artificial intelligence chatbot was 77 percent accurate in its final diagnoses, 68 percent accurate in clinical management decisions such as which medications to prescribe, and 60 percent accurate in its differential diagnoses.
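The staged prompting the study describes is simple to outline in code. The following is a minimal, illustrative Python sketch, not the authors' actual pipeline: the staged_workup helper, the vignette field names and the model identifier are all assumptions for illustration, using the standard OpenAI Python client.

```python
# Illustrative sketch only -- not the study's code. Assumes the OpenAI Python
# client (openai>=1.0) and vignette dicts with hypothetical "presentation"
# and "additional_info" fields drawn from a published clinical scenario.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def staged_workup(vignette: dict, model: str = "gpt-3.5-turbo") -> dict:
    """Query the model in stages mirroring the study's protocol:
    differential diagnoses first, then a final diagnosis and management
    plan after additional information is supplied."""
    messages = [
        {
            "role": "user",
            "content": (
                f"Patient presentation:\n{vignette['presentation']}\n"
                "List the differential diagnoses."
            ),
        }
    ]
    differential = (
        client.chat.completions.create(model=model, messages=messages)
        .choices[0]
        .message.content
    )

    # Keep the model's own differential in context, then add the
    # follow-up findings before asking for a final answer.
    messages.append({"role": "assistant", "content": differential})
    messages.append(
        {
            "role": "user",
            "content": (
                f"Additional findings:\n{vignette['additional_info']}\n"
                "Give the final diagnosis and a management plan."
            ),
        }
    )
    final = (
        client.chat.completions.create(model=model, messages=messages)
        .choices[0]
        .message.content
    )

    return {"differential": differential, "final": final}
```

In a setup like this, each stage's output would be scored against the published answer for the scenario, which is how per-stage accuracy figures such as the 60, 68 and 77 percent results above could be tallied.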

Adam Landman, MD, CIO and senior vice president of digital at Mass General Brigham, said in the release that the health system "sees great promise" for large language models such as ChatGPT.

"We are currently evaluating LLM solutions that assist with clinical documentation and draft responses to patient messages with a focus on understanding their accuracy, reliability, safety and equity," he stated. "Rigorous studies like this one are needed before we integrate LLM tools into clinical care."
