Researchers at the Icahn School of Medicine at Mount Sinai are raising concerns about the safety of ChatGPT Health, a consumer AI tool designed to provide medical guidance, including advice on when to seek emergency care.

In a study published Feb. 23 in Nature Medicine, investigators found the tool under-triaged more than half of serious cases that physicians determined required emergency treatment. The research marks the first independent safety evaluation of the large language model-based system since its launch in January 2026, according to a Feb. 24 press release from Mount Sinai.

To test the system, researchers developed 60 structured clinical scenarios spanning 21 medical specialties. The cases ranged from minor conditions appropriate for home care to true emergencies. Three independent physicians established the correct level of urgency for each scenario using guidelines from 56 medical societies.

Each case was run through the chatbot under 16 contextual variations, including differences in race, gender, social dynamics and barriers to care such as lack of insurance or transportation. In total, the team conducted 960 interactions with the tool and compared its recommendations with physician consensus.
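The arithmetic behind that total can be sketched in a few lines of Python. This is only an illustration of the design as described here, not the researchers' code: the scenario and variation names are placeholders, and the four-level urgency scale and the `is_under_triaged` helper are assumptions, since the article gives the counts but not the study's actual categories.

```python
from itertools import product

# Placeholder labels: the article reports only the counts (60 scenarios,
# 16 contextual variations), not the scenarios or variations themselves.
scenarios = [f"scenario_{i:02d}" for i in range(1, 61)]
variations = [f"variation_{j:02d}" for j in range(1, 17)]

# Pair every scenario with every contextual variation.
interactions = list(product(scenarios, variations))
print(len(interactions))  # 60 * 16 = 960

# Assumed ordinal urgency scale, lowest to highest. The study's real
# categories are not given in the article.
URGENCY = {"home_care": 0, "routine_visit": 1, "urgent_care": 2, "emergency": 3}


def is_under_triaged(model_level: str, physician_level: str) -> bool:
    """Return True when the tool's advice is less urgent than physician consensus."""
    return URGENCY[model_level] < URGENCY[physician_level]
```

Under these assumptions, a response of "urgent_care" for a case physicians rated "emergency" would count as under-triage, while a matching "emergency" recommendation would not.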

While the system performed well in clear-cut emergencies such as stroke or severe allergic reactions, it struggled in more nuanced situations, the researchers found. In some cases, the chatbot’s written explanation correctly identified warning signs but still advised patients to wait rather than seek emergency care.

In one asthma scenario, for example, the system noted early signs of respiratory failure but did not recommend immediate emergency treatment.

The study also identified concerns about the tool’s suicide-risk safeguards. ChatGPT Health is designed to direct users in high-risk situations to the 988 Suicide and Crisis Lifeline. However, researchers found the alerts appeared inconsistently, sometimes triggering in lower-risk situations while failing to appear when users described specific plans for self-harm.

The researchers emphasized that their findings do not mean consumers should abandon AI health tools altogether. Instead, they called for ongoing independent evaluation, noting that AI systems are frequently updated and performance may change over time.