Duke debuts frameworks to assess ambient AI tools

As hospitals increasingly adopt AI to assist with documentation, a team at Duke University in Durham, N.C., is offering new tools to help ensure those technologies are safe and effective for patient care.

In two newly published studies, Duke researchers introduced evaluation frameworks aimed at measuring how well large language models — the AI systems behind many automated note-taking and patient messaging tools — perform in real-world medical settings. The research was published in npj Digital Medicine and the Journal of the American Medical Informatics Association.

The first framework, named SCRIBE, was created to assess ambient digital scribes — AI programs that generate clinical notes by listening to conversations between patients and providers. The tool uses a combination of expert reviews, automated scoring and simulations to evaluate how well these models capture important information, avoid bias and maintain clinical accuracy.

The second study focused on AI models built into Epic's EHR system that help providers draft responses to patient messages. Duke's research team evaluated the AI-drafted replies against physician feedback and found that while tone and readability were generally strong, the drafts sometimes fell short on completeness, a critical factor in patient communication.

Together, the frameworks are intended to give health systems, software developers and regulators practical ways to test AI tools before and after deployment, according to a June 16 news release shared with Becker’s. The authors emphasized that continuous monitoring is just as important as initial testing.
