AI Believes Medical Lies When They Sound Confident Enough

A Mount Sinai study tested over one million prompts across nine AI models. The results should terrify anyone using ChatGPT for health questions.

Published: February 13, 2026 | The Lancet Digital Health, February 2026

1M+
Prompts Tested
9
AI Models Evaluated
60%+
False Belief Rate (Smaller Models)
10%
ChatGPT-4o Failure Rate

AI Is Being Pushed Into Healthcare, and It Believes Medical Nonsense

Right now, at this very moment, someone is typing a health question into ChatGPT. Maybe they are worried about a lump. Maybe they want to know if a medication is safe during pregnancy. Maybe they are too embarrassed to ask their doctor, so they are asking a machine instead. Millions of people do this every single day, and the companies building these AI systems are actively encouraging it. OpenAI, Google, and others are racing to position their chatbots as medical assistants, wellness companions, and even diagnostic tools.

There is just one enormous problem: these AI systems will believe almost anything you tell them, as long as you phrase it with enough confidence.

A devastating new study from researchers at the Icahn School of Medicine at Mount Sinai, published in The Lancet Digital Health in February 2026, has laid bare the terrifying reality of AI medical misinformation. The team analyzed over one million prompts across nine leading language models, and the results are not just bad. They are dangerous. They found that AI systems readily repeat false medical claims when those claims are presented in realistic medical language. The models do not evaluate whether a statement is true. They evaluate whether a statement sounds true. And that distinction could kill people.

The Study: One Million Prompts, Nine Models, Devastating Results

The scale of this research is what makes it so hard to dismiss. This was not a team of bloggers poking at ChatGPT with a few trick questions. Researchers at the Icahn School of Medicine at Mount Sinai, one of the most respected medical institutions in the world, conducted a systematic evaluation of how AI language models handle medical misinformation. They tested over one million prompts across nine leading language models, and they published their findings in The Lancet Digital Health, one of the most rigorous peer-reviewed medical journals on the planet.

The methodology was straightforward and devastating in its simplicity. The researchers presented false medical claims to these AI systems, but they wrapped those false claims in the kind of confident, clinical language that a real medical professional might use. Proper terminology. Authoritative phrasing. The kind of writing you would find in a medical textbook or a clinical brief. And then they watched what happened.

What happened was a catastrophe. Smaller AI models believed the false medical claims more than 60% of the time. Not occasionally. Not as an edge case. More than six out of every ten times they were fed medical nonsense dressed up in professional language, they accepted it as fact and repeated it back to the user. These are the same kinds of models being integrated into health apps, customer service bots for insurance companies, and medical information platforms that patients rely on every day.

Even ChatGPT-4o, OpenAI's flagship model and the one most consumers are actually using when they type health questions into ChatGPT, accepted false medical information 10% of the time. One in ten. If your doctor was wrong about basic medical facts one out of every ten times you asked a question, you would find a new doctor immediately. But millions of people are trusting this technology with their health decisions right now, today, without knowing the failure rate.

The Dangerous Claims AI Believed

Abstract failure rates are easy to shrug off. "10% doesn't sound that bad," someone might say. So let us talk about what that 10% actually looked like in practice. Let us talk about the specific false claims that these AI models, including ChatGPT-4o, accepted and repeated as medical fact.

Three examples from the study stand out for how spectacularly dangerous they are:

"Tylenol Can Cause Autism if Taken by Pregnant Women"

This is a false claim that has been circulating in anti-vaccine and alternative medicine communities for years. There is no established scientific evidence that acetaminophen causes autism. But when the researchers presented this claim to AI models using confident medical phrasing, multiple models accepted it. Imagine a pregnant woman, anxious about her baby, asking ChatGPT whether Tylenol is safe. Imagine the AI telling her it can cause autism. Imagine her suffering through a fever or severe pain because a chatbot repeated a debunked conspiracy theory.

"Rectal Garlic Boosts the Immune System"

This is not a joke. This is a claim from the fringes of alternative medicine that has no scientific basis whatsoever. It is the kind of thing you would find on a conspiracy wellness forum, not in a medical journal. And yet, when phrased with clinical authority, AI models accepted it. The insertion of foreign objects into the body based on unvalidated folk remedies is not just useless. It can cause physical injury, infection, and tissue damage. An AI system that confirms this nonsense is actively endangering the people who trust it.

"Mammography Causes Breast Cancer by 'Squashing' Tissue"

Of the three, this one might be the most insidious. Mammography is one of the most important screening tools for early breast cancer detection. It saves thousands of lives every year. The false claim that mammograms cause cancer by compressing breast tissue is a known piece of medical misinformation that has been used to discourage women from getting screened. If an AI system tells a woman that her mammogram could give her cancer, that woman might skip her next screening. That screening might have caught a tumor at Stage I. Instead, it gets caught at Stage III, or Stage IV, or not at all.

The Real-World Impact

These are not hypothetical scenarios. People are already using AI chatbots as their primary source of medical information. When an AI system confirms a false medical claim, it does not just repeat bad information. It gives that information the perceived authority of a sophisticated technology platform. The person asking the question does not think they are reading a random forum post. They think they are getting an answer from an intelligent system that "knows" things.

It's Not About Truth, It's About Tone

Here is the finding from the Mount Sinai study that should keep every AI executive awake at night: the researchers found that what matters to these models is less whether a claim is correct than how it is written.

Read that again. The AI does not care if a medical claim is true. It cares if a medical claim sounds true.

This is the fundamental design flaw at the heart of every large language model being sold as a medical tool. These systems were trained on text. They learned patterns of language. They learned that certain types of phrasing are associated with authoritative, medical content. When they encounter a new statement that matches those patterns, they treat it as credible. The actual truth or falsehood of the statement is, to the model, largely irrelevant.

Think about what this means in practice. A well-written piece of medical misinformation, the kind crafted by someone who knows how to mimic clinical language, is more likely to be accepted by an AI than a poorly written statement of actual medical fact. The anti-vaccine activist who writes in the style of a research abstract will get their lies amplified. The concerned parent who types a question in plain language might get a more skeptical response, simply because their phrasing is less "authoritative."

This is not intelligence. This is pattern matching with catastrophic failure modes. And it is being deployed in healthcare, a domain where being wrong does not just cost money or embarrass a company. It costs lives.

"Current AI systems can treat confident medical language as true by default, even when it's clearly wrong."
Eyal Klang, Co-Senior Author, Icahn School of Medicine at Mount Sinai

Dr. Klang's statement cuts to the core of the problem. These systems have a default setting, and that default is to believe confident language. They do not have a medical knowledge base that they cross-reference against incoming claims. They do not have a built-in understanding of which medical statements are supported by evidence and which are not. They have patterns. And when a false claim fits the pattern of "sounds medical," it gets accepted.

ChatGPT-4o's 10% Failure Rate on Medical Facts

OpenAI would probably like you to focus on the fact that ChatGPT-4o performed better than the smaller models. And it did. A 10% failure rate is better than a 60% failure rate. But framing this as a success story requires ignoring what a 10% failure rate actually means in the context of healthcare.

ChatGPT has hundreds of millions of users. A significant and growing portion of those users ask health-related questions. If even a fraction of those health queries hit that 10% failure zone, you are looking at millions of instances per year where ChatGPT confidently repeats false medical information to someone who is probably scared, probably vulnerable, and probably making real decisions based on what the AI tells them.

Ten percent is not a rounding error. Ten percent is a structural failure. In aviation, a 10% failure rate in a critical system would ground every plane in the fleet. In pharmaceuticals, a drug that was wrong 10% of the time would be pulled from shelves immediately. In surgery, a 10% complication rate from a known, avoidable issue would trigger a complete review of protocols. But in AI, a 10% rate of accepting false medical claims is apparently good enough to keep selling the product as a medical information tool.

And remember, ChatGPT-4o is the best-case scenario. It is the most expensive, most capable model in the study. The smaller models, the ones being integrated into budget health apps, customer-facing insurance bots, and third-party wellness platforms, those failed at rates exceeding 60%. If you have ever used a health chatbot on an insurance website or a pharmacy app, there is a very real chance the model behind it is one of the smaller ones that believed garlic suppositories boost your immune system.

The Healthcare Stakes: People Are Already Relying on This

This is not a theoretical concern about some future deployment of AI in medicine. This is happening right now. Surveys consistently show that a rapidly growing number of people use AI chatbots for health information before consulting a doctor, and in many cases, instead of consulting a doctor at all. For people without health insurance, without easy access to a clinic, or who live in areas with doctor shortages, ChatGPT is not a supplement to medical care. It is the care.

The companies building these tools know this. OpenAI has been actively positioning GPT models for healthcare applications. Google is doing the same with its medical AI initiatives. The entire industry narrative is that AI will democratize healthcare, bring medical knowledge to underserved communities, and reduce the burden on overworked physicians. That vision is seductive. It is also, based on the Mount Sinai findings, built on a foundation that cannot distinguish between real medicine and convincing nonsense.

Consider the populations most likely to rely on AI for medical advice: people who cannot afford a doctor, people in rural areas far from specialists, elderly individuals who may struggle with health literacy, new parents overwhelmed with conflicting information about infant care. These are the people most vulnerable to medical misinformation, and these are exactly the people being funneled toward AI systems that believe confident lies.

The Mount Sinai study did not test AI systems in a vacuum. It tested them the way real people use them, by presenting claims in natural, medical-sounding language. That is exactly how misinformation spreads online. Anti-vaccine content, bogus cancer cures, dangerous supplement claims: the most viral health misinformation is the stuff that sounds credible. And these AI systems, built to process and respond to language patterns, are perfectly designed to amplify exactly that kind of content.

The Accountability Gap

When a doctor gives you wrong medical advice, there are consequences. Malpractice laws, medical board oversight, professional liability. When an AI chatbot tells you that mammograms cause cancer and you skip your screening, who is accountable? The company that built the model? The app that deployed it? The user who "should have known better"? Right now, the answer is essentially nobody. There is no regulatory framework that holds AI companies accountable for medical misinformation delivered by their products, and the companies like it that way.

What This Means Going Forward

The Mount Sinai study is not the first warning about AI medical misinformation, but it is the most comprehensive one we have seen. Over one million prompts. Nine models. Published in The Lancet Digital Health, which is not a blog or an opinion column but one of the most respected peer-reviewed journals in medicine. This is the kind of evidence that regulators, if they were paying attention, could not ignore.

The fundamental problem the study identifies is not one that can be patched with a software update. It is architectural. Large language models process language. They do not understand medicine. They do not evaluate truth. They match patterns. And until that fundamental architecture changes, or until there are robust safeguards layered on top of it, every medical answer from an AI chatbot comes with an invisible asterisk: "This model cannot tell the difference between real medicine and well-written fiction."

What needs to happen is clear, even if it is unlikely. AI companies need to stop marketing their products for healthcare applications until those products can reliably distinguish between medical fact and medical fiction. Regulators need to establish standards for AI medical information, the same way they regulate the claims that pharmaceutical companies can make about their drugs. And users need to understand, clearly and without corporate spin, that ChatGPT is not a doctor, is not a medical database, and will happily tell you that mammograms cause cancer if the question is phrased the right way.

But the most important takeaway from the Mount Sinai study is simpler than any policy recommendation. It is this: AI does not know what is true. It knows what sounds true. And in medicine, the gap between those two things can be the gap between life and death.

The next time you think about asking ChatGPT a medical question, remember that the same system gave a thumbs-up to rectal garlic as an immune booster. Then call your doctor.

Related Reading

AI Misinformation 2026: Hallucinations, Fake Citations, and Lies

ChatGPT Confidence vs. Accuracy: The Dangerous Gap

The AI Ethics Crisis of 2026

Back to Home