BBC Tested AI Chatbots on News Accuracy. Over Half Failed.

A joint BBC/EBU study found 51% of AI-generated news summaries contained significant issues, including fabricated quotes and invented facts.

Published: February 13, 2026

• 51% of AI answers contained significant issues
• 19% of answers contained factual errors
• 13% of quotes were altered or fabricated
• Over 60% failure rate for Google Gemini

The BBC Just Proved What We Already Suspected

If you have been getting your news from an AI chatbot, you should probably sit down for this. The BBC and the European Broadcasting Union (EBU) ran a straightforward test. They asked four of the most popular AI chatbots to do what millions of people now use them for every single day: summarize the news. The results were not just bad. They were alarming.

Across ChatGPT, Google Gemini, Microsoft Copilot, and Perplexity, 51% of all AI-generated answers contained significant issues of some form. That is not a rounding error. That is not a fringe case. That is a coin flip. Every time you ask an AI to catch you up on what is happening in the world, there is roughly a one-in-two chance that what comes back is wrong, distorted, or entirely made up.

This is not some small startup failing a test. These are the four biggest names in consumer AI. And they cannot reliably tell you what happened today.

The Bottom Line Up Front

When the BBC and EBU, two of the most respected journalism organizations on the planet, formally test AI chatbots on basic news summarization and find a 51% failure rate, the conversation about trusting AI for information is effectively over. The technology is not ready. The data proves it.

The Numbers: A 51% Failure Rate Across All Models

Let's break this down carefully, because the headline number only tells part of the story. The BBC/EBU study did not just flag minor formatting issues or awkward phrasing. When they say "significant issues," they mean the kind of problems that fundamentally undermine the usefulness of the output.

What "Significant Issues" Actually Means

Nineteen percent of all AI-generated answers introduced outright factual errors. We are talking about incorrect statements, wrong numbers, and wrong dates. Not subtle misinterpretations or matters of framing. Hard, verifiable facts that the AI simply got wrong. If you asked an AI to summarize a story about election results, it might tell you the wrong candidate won. If you asked about an economic report, it might give you fabricated figures.

Then there is the quote problem. Thirteen percent of quotes that the AI attributed to BBC articles were either altered or fabricated entirely. The AI did not just paraphrase poorly. In some cases, it invented quotes whole cloth, attributed them to real people, and presented them as sourced journalism. Imagine reading a news summary where a world leader supposedly said something they never said. Call it a hallucination in the technical sense if you like. In any other context, we would call it disinformation.

The remaining issues included misrepresentations, omitted context, and distortions that changed the meaning of the underlying stories. When you add it all up, the chance of getting a clean, accurate news summary from any of these tools is barely better than a coin toss.

The Scorecard: How Each AI Performed

Not all chatbots failed equally. But the gap between the "best" and the worst is not exactly something to celebrate. When the best-performing tools still get it wrong about 40% of the time, calling any of them winners feels generous.

AI Chatbot           Problematic Response Rate   Grade
Google Gemini        Over 60%                    Worst Performer
Microsoft Copilot    ~50%                        Poor
ChatGPT              ~40%                        Less Bad
Perplexity           ~40%                        Less Bad

Google Gemini was the worst performer, with over 60% of its responses containing significant problems. That means for every five news summaries Gemini produced, at least three of them had meaningful accuracy issues. This is the same tool that Google has been aggressively integrating into search results, email, and its entire product ecosystem.

Microsoft Copilot came in at around 50%, essentially a coin toss for accuracy. Copilot is now built into Windows, Office, and Bing. Millions of people use it every day without ever questioning whether the information it hands them is real.

ChatGPT and Perplexity performed comparatively better at around 40%, but let's be clear about what "better" means here. If your doctor was wrong 40% of the time, you would find a new doctor. If your GPS was wrong 40% of the time, you would throw it out. A 40% failure rate on basic news summarization is not a passing grade. It is a warning label.

The Fabricated Quotes Problem

Of all the findings in the BBC/EBU study, the fabricated quotes statistic might be the most dangerous. Thirteen percent of quotes that these AI tools sourced from BBC articles were either altered from their original form or fabricated entirely.

Think about that for a moment. You ask an AI chatbot to summarize a news article. It gives you a quote from someone involved in the story. It looks real. It has quotation marks. It names a real person. But the person never said it. The AI made it up.

This is not a new problem for AI. We have documented extensive AI misinformation issues across multiple contexts. But the BBC study puts a concrete, journalist-reviewed number on it for news specifically. And the implications are staggering.

Why Fabricated Quotes Are Uniquely Dangerous

A wrong date or an incorrect statistic can be checked. But a fabricated quote attributed to a real person can take on a life of its own. It can be shared on social media. It can be cited in arguments. It can shape public opinion about a politician, a CEO, or a scientist. And because it came wrapped in the formatting of a real news summary, most people will never think to verify it.

Newsrooms spend enormous resources on accuracy. A single misquote can end a journalist's career. Meanwhile, AI tools are altering or inventing quotes at a 13% rate, and no one is being held accountable. The contrast could not be sharper.

Real Examples: What AI Got Wrong

Abstract numbers are one thing. Specific examples make the scale of the problem impossible to ignore. The BBC/EBU study documented several cases where AI chatbots did not just get the details wrong, they got the entire story backwards or reported events that never happened.

Gemini: Reversed NHS Policy on Vaping

Google Gemini incorrectly stated that the NHS discourages vaping as a smoking cessation tool. The opposite is true. The NHS actively promotes vaping as a less harmful alternative for people trying to quit smoking. This is not a minor nuance. Gemini took an established public health position and flipped it 180 degrees. Anyone who relied on that summary for health guidance received dangerous misinformation.

Copilot: Misrepresented the Gisèle Pelicot Rape Case

Microsoft Copilot misrepresented the facts of the rape case involving Gisèle Pelicot. This was one of the most significant criminal cases in France, widely covered by every major news outlet. The details of the case were readily available. And Copilot still managed to get it wrong. When AI cannot accurately summarize a story that dominated international headlines for weeks, it calls into question whether it can be trusted with any story at all.

ChatGPT: Reported Dead Hamas Leader as Still Active

ChatGPT reported that assassinated Hamas leader Ismail Haniyeh was still an active leader. Haniyeh was killed in July 2024, an event covered by every major news organization on the planet. ChatGPT did not just get a detail wrong. It reported a dead man as alive and in charge. For anyone relying on AI to understand the geopolitical landscape of the Middle East, this kind of error is not just inaccurate. It is recklessly misleading.

Each of these examples comes from a different chatbot, a different topic, and a different kind of failure. Gemini reversed a policy position. Copilot mangled the facts of a criminal case. ChatGPT failed to register a major world event. The common thread: none of them can reliably process and summarize the news.

Why This Matters More Than You Think

This problem is not confined to a laboratory test. Millions of people have already replaced their morning news routine with an AI chatbot. They open ChatGPT or Copilot, type "what happened today," and trust whatever comes back. No cross-referencing. No skepticism. Just blind faith in a tool that, according to the BBC, gets it wrong more than half the time.

This shift has been accelerating. Google has embedded AI-generated summaries directly at the top of search results. Microsoft has woven Copilot into every corner of its productivity suite. OpenAI has turned ChatGPT into a default information tool for over 100 million users. These companies are not positioning AI as an assistant that might be wrong. They are positioning it as the answer.

The Death of Verification

Here is the uncomfortable truth: AI chatbots are training an entire generation of users to stop reading primary sources. When you get a clean, confident, well-formatted summary from an AI, there is no natural reason to click through to the actual article. The summary looks complete. It sounds authoritative. And in many cases, the user will never discover it was wrong, because they will never check.

Traditional news at least has a correction mechanism. If a newspaper publishes an error, there is a corrections column, a retraction, an editor who gets an earful. AI chatbots have none of that. There is no corrections page for ChatGPT. There is no editor at Gemini. The wrong answer simply dissolves into the next query, and the user walks away with bad information lodged in their brain as fact.

If your news source is wrong 51% of the time and has no mechanism for corrections, it is not a news source. It is a misinformation engine with a polished interface.

The Bigger Picture: AI as Information Gatekeeper

The BBC/EBU study lands at a moment when the relationship between AI and journalism is at a breaking point. News organizations around the world are watching their traffic decline as AI tools summarize their reporting without sending readers to the original articles. The irony is brutal: the AI cannot accurately summarize the journalism it is cannibalizing.

Consider the economics of this for a moment. Newsrooms employ fact-checkers, editors, legal reviewers, and experienced journalists to ensure accuracy. A single investigative piece might take months of work. Then an AI tool scrapes that article, mangles the facts, invents a quote that was never said, and serves it up to millions of users who will never visit the original source. The newsroom gets no traffic, no revenue, and no credit. The AI gets the user's attention and trust, despite delivering a version of reality that is wrong half the time.

What Can Be Done

The BBC/EBU study is not just an academic exercise. It is a direct challenge to AI companies. These are the questions that Google, Microsoft, OpenAI, and Perplexity now need to answer:

  • Why are you serving news summaries you cannot verify? If your tool cannot determine whether a quote is real, it should not be presenting quotes at all.
  • Where are the accuracy labels? Every AI-generated news summary should carry a visible disclaimer about its known error rate.
  • When will you stop training users to skip primary sources? The entire design of AI-powered search pushes users away from the actual journalism.
  • How will you compensate news organizations whose content you are summarizing inaccurately and without permission?

Until these questions are answered, the responsible thing to do is simple: do not get your news from AI. Read the actual article. Visit the actual newsroom. Use sources with editorial oversight, correction policies, and accountability.

AI chatbots are impressive tools for many tasks. Summarizing news accurately is demonstrably not one of them. The BBC just proved it with the most rigorous study to date. A 51% failure rate does not describe a technology that needs improvement. It describes a technology that should not be deployed for this purpose at all.

The Final Word

If you are using AI chatbots as your primary news source, roughly half of the summaries you read contain significant problems, and roughly one in every eight quotes you see has been altered or fabricated. You have no way of knowing which answers are accurate and which ones are fiction. The BBC tested it. The numbers are clear. AI is not your journalist. And it is not even close.