The new AI buzz in healthcare

The pace of change in healthcare AI is surprising even seasoned IT leaders as the technology moves beyond automation and generative AI to multimodal capabilities.

Multimodal AI can process and integrate multiple data types — such as images and text — at once to perform more complex tasks than single-modal AI. Technology leaders see this as yet another game-changer in healthcare.
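To make that concrete, here is a minimal sketch of what "processing images and text at once" can look like in Python, using a small open-source visual-question-answering model from Hugging Face. It is illustrative only: the image file and question are hypothetical, and a general-purpose toy model like this is nothing like the validated clinical systems the leaders below describe.

```python
# Minimal sketch of multimodal (image + text) inference with the
# Hugging Face transformers visual-question-answering pipeline.
# Illustrative only; not a clinical tool.
from transformers import pipeline

# Load a publicly available vision-language model that accepts an
# image and a natural-language question in the same request.
vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")

# One call combines two modalities: pixels and text.
# The file name and question are hypothetical examples.
answers = vqa(image="patient_wound_photo.jpg",
              question="Is there visible redness or swelling?")

# The pipeline returns candidate answers with confidence scores.
for a in answers:
    print(f"{a['answer']}: {a['score']:.2f}")
```

The point of the sketch is the single call that takes both an image and a question; enterprise clinical systems layer the same idea with imaging, notes, lab results and safeguards.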

“The most surprising emerging technology in healthcare over the past year has been the rise of multimodal generative AI assistants integrated into clinical workflows,” said Biju Samkutty, COO of Rochester, Minn.-based Mayo Clinic’s international and enterprise automation. “Unlike earlier AI solutions, these advanced systems synthesize data across clinical text, imaging and laboratory results to deliver real-time decision support and streamline complex documentation tasks. Their accelerated transition from pilot programs to enterprise-scale deployment has exceeded expectations and is poised to reshape care delivery models.”

The speed of multimodal generative AI lets the technology see, hear, speak and “reason,” said Michael Maniaci, MD, chief clinical officer for Mayo Clinic’s Advanced Care at Home. It can interpret clinical conversations, analyze patient photos and respond in real time within the appropriate clinical context.

“In the hospital-at-home model, where care is decentralized and data is fragmented, this type of AI acts as a digital teammate, helping clinicians make faster, smarter decisions remotely,” said Dr. Maniaci. “What’s most surprising is how quickly we’ve gone from simple chatbots to AI capable of performing real-time documentation, symptom triage and patient education, all within 12 months.”

Mayo Clinic is using the tools to support nurses and physicians delivering acute care at home, streamlining processes and connecting all team members.

“For the first time, it feels like we’re developing a true ‘hospital intelligence layer’ where the building or technology becomes part of the care team,” said Dr. Maniaci.

Stanford (Calif.) Health Care is also using multimodal AI to integrate unstructured data seamlessly and “unlock unprecedented innovation” for research and care delivery, according to Aditya Bhasin, vice president of software and chief of web systems at Stanford.

“We’re already seeing remarkable improvements — from personalized learning in medical education to accelerated research via automated literature reviews and clinical trial matching, and enhanced clinical practice through advanced diagnostics, patient communication and staff wellness,” said Mr. Bhasin. “What’s most striking is how quickly these technologies have moved from experimental concepts to practical applications, improving both patient outcomes and operational efficiency.”

Rahul Kashyap, MD, medical director of research at WellSpan Health in York, Pa., is seeing firsthand how multimodal large language models can integrate seamlessly into patient care.

“What once felt like science fiction — AI listening to patient-doctor conversations and generating structured notes — is now becoming reality,” said Dr. Kashyap. “These tools are not just replacing tasks but reimagining how care is delivered by integrating voice, image and data for real-time decision-making.”

Ashok Kurian, assistant vice president of data and AI at Texas Children’s Hospital in Houston, had a similar reaction to the rise of multimodal AI, powered by cloud computing, over the past year. He has seen the technology synthesize all forms of patient data, including images, genomics and wearable device data, to paint a broader picture of the patient’s situation.

“The holistic view enables more precise diagnosis, personalized treatment plans and accelerated medical research, all at a pace we couldn’t have imagined just a few years ago,” he said.

The technology is impressive, but leaders still face implementation challenges. Salim Afshar, DMD, MD, faculty and AI translation lead for the Health Systems Innovation Lab at Boston-based Harvard T.H. Chan School of Public Health, called the pace of change “staggering” and said multimodal capabilities are quickly reshaping interactions to support a better patient experience.

“These capabilities will power AI agents for tasks like multilingual patient follow-ups, diagnostic support and administrative automation,” he said. “The real challenge is not the technology itself but adapting health systems to deploy these tools effectively.”
