Study Reveals ChatGPT Health 'Under-Triaged' Medical Emergencies

Mar 5, 2026, 2:30 AM


OpenAI's ChatGPT Health, a specialized chatbot launched to provide health guidance, has been found to frequently underestimate the severity of medical emergencies, according to a study published in the journal Nature Medicine. This first independent evaluation of the AI tool raises critical concerns about its safety in making urgent medical decisions.
Researchers tested ChatGPT Health's ability to triage medical cases by feeding it 60 different scenarios that ranged from mild conditions to serious emergencies. The responses were then compared to those of three independent physicians who assessed the urgency based on established medical guidelines. The study aimed to determine whether the chatbot could accurately advise users on whether to seek immediate medical attention.
The results were concerning: ChatGPT Health under-triaged 51.6% of emergency cases, advising users to wait 24 to 48 hours rather than seek immediate care. These included critical conditions such as diabetic ketoacidosis and respiratory failure, both of which can be life-threatening if not treated promptly. Dr Ashwin Ramaswamy, the study's lead author, emphasized that any trained healthcare professional would recognize the need for immediate intervention in these scenarios.
While the chatbot handled clear-cut emergencies such as strokes well, it struggled with more nuanced cases where clinical judgment is essential. In one example, ChatGPT Health correctly identified signs of respiratory failure but still advised the patient to wait for further evaluation. This pattern of under-triage raises concerns that users may develop a false sense of security in critical situations.
Additionally, the study revealed that ChatGPT Health over-triaged 64.8% of non-urgent cases, often recommending appointments for conditions that could be managed at home. For instance, the bot advised a patient with a three-day sore throat to see a doctor when home care would have sufficed. This inconsistency in triage could drive unnecessary healthcare utilization, further straining an already overwhelmed medical system.
Perhaps most alarming, the study highlighted the chatbot's inconsistent responses in scenarios involving suicidal ideation. ChatGPT Health is programmed to direct users to the 988 Suicide and Crisis Lifeline when they express suicidal thoughts, yet it failed to do so in high-risk scenarios where patients also provided normal lab results. Dr Ramaswamy warned that a crisis-intervention safeguard that fails to activate in serious situations can be more dangerous than having no safeguard at all.
The results of this study have prompted calls for more rigorous testing and safety standards for AI tools used in healthcare. Dr John Mafi, an associate professor at UCLA Health, emphasized the importance of controlled trials to evaluate the benefits and risks of such technologies before they are widely adopted.
Despite the potential benefits of AI in healthcare, experts underscore that these systems should not replace traditional medical judgment. Dr Ethan Goh, director of ARISE, noted that while chatbots can provide valuable information, they cannot substitute for professional medical advice. The findings from this research serve as a reminder that while AI can assist in healthcare delivery, it is crucial to ensure patient safety remains the top priority.
OpenAI has acknowledged the study and expressed a commitment to improving the safety and reliability of ChatGPT Health before expanding its availability. As technology continues to advance, ongoing evaluation and updates will be necessary to ensure that AI tools effectively support, rather than endanger, patient health.
In conclusion, while ChatGPT Health offers a promising approach to medical guidance, significant concerns about its triage accuracy and its responses in critical situations must be addressed to ensure patient safety.

Related articles

FDA's Updated Criteria for Breakthrough Device Designation

The FDA has revised its guidelines for the Breakthrough Devices Program, emphasizing the importance of health equity and accessibility in device designation. This evolution reflects a broader understanding of how medical technologies can address healthcare disparities while ensuring patient safety and efficacy.

Rocket Pharmaceuticals Secures FDA Approval for Kresladi Therapy

Rocket Pharmaceuticals has received accelerated FDA approval for Kresladi, a gene therapy for severe leukocyte adhesion deficiency-I (LAD-I), a rare genetic disorder affecting children's immune systems. This milestone not only offers hope for affected families but also positions Rocket for future growth as it prepares to conduct necessary post-marketing trials.

Regeneron Science Talent Search 2026 Awards Over $1.8 Million to Young Innovators

The 2026 Regeneron Science Talent Search honored top high school scientists, awarding more than $1.8 million in prizes. Connor Hill won the top award for his innovative work in computational mathematics, while other finalists explored fields such as neural science and cancer treatment.

UVA Health Joins Global Effort to Innovate Pediatric Cancer Treatments

UVA Health researchers have partnered with an international consortium to develop advanced treatments for childhood cancers. This collaboration aims to address the lack of investment in pediatric drug development and improve outcomes for patients facing aggressive forms of cancer.

AI-Enhanced Mammograms: A New Frontier in Women's Heart Health

Recent studies indicate that artificial intelligence (AI) can analyze mammograms to assess heart disease risk in women, potentially transforming routine breast screenings into dual-purpose health checks. This innovative approach aims to better identify cardiovascular risks, particularly in women often overlooked in traditional screenings.