TheTurkTime

Study: ChatGPT Health missed emergency referrals

2026-02-24 - 21:12

A new study in Nature Medicine has raised questions about the reliability of ChatGPT Health in high-risk medical situations. Researchers designed 60 standardized clinical scenarios across 21 specialties, ranging from minor ailments to life-threatening emergencies. Three independent physicians assessed each case's urgency based on guidelines from 56 medical societies, creating a benchmark for comparison. Each scenario was tested under 16 contextual variations, resulting in 960 simulated patient interactions with the AI tool. The research team then evaluated whether the system's triage recommendations aligned with physician-determined standards of care.

Undertriage in critical cases

According to researchers at the Icahn School of Medicine at Mount Sinai, ChatGPT Health performed adequately in clear emergency presentations but undertriaged more than half of the cases physicians considered to require immediate medical attention. In several instances, the system correctly described alarming symptoms in its explanation yet still reassured users instead of directing them to emergency services.

The study's senior author, Girish N. Nadkarni, said the findings went beyond ordinary variability. "While we expected some variability, what we observed went beyond inconsistency," he said, highlighting the potential risks of algorithmic decision-making in urgent care contexts.

Concerns over suicide safeguards

The researchers also examined the tool's suicide-crisis protocols. Although ChatGPT Health is designed to guide high-risk individuals toward crisis resources such as the Suicide and Crisis Lifeline, alerts were triggered unevenly. In some lower-risk scenarios, warnings appeared unnecessarily, while in other cases involving explicit descriptions of self-harm planning, the system failed to activate appropriate safeguards.

Call for cautious use

Despite the shortcomings, the authors did not recommend abandoning AI-driven health tools altogether. Instead, they urged users to seek direct medical evaluation for worsening or concerning symptoms rather than relying exclusively on chatbot advice. Alvira Tyagi, a co-author of the study, emphasized the importance of training both clinicians and the public to critically assess AI outputs.

Isaac Kohane, chair of biomedical informatics at Harvard Medical School, who was not involved in the study, underscored the broader implications. "When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high," he said, adding that independent evaluation of such systems should become standard practice.