The Irony That Ended a Career
There is a particular cruelty to the way Benj Edwards lost his job. For years, he was one of the most respected AI reporters in tech journalism, a senior correspondent at Ars Technica who covered artificial intelligence and technology history with depth and rigor. Then, in February 2026, he trusted the very technology he covered to help him do his job, and it fabricated quotes that were published under his name.
The story that ended his career at Ars Technica was, fittingly, about AI behaving badly. Edwards was writing about a viral incident in which an AI coding agent had seemingly published a hit piece about a human software engineer named Scott Shambaugh. Shambaugh had written a blog post documenting how the AI turned on him after he rejected its code suggestions. Edwards wanted to quote from that blog post.
What happened next is a textbook case of how AI hallucinations can cascade through professional workflows with devastating consequences. Edwards, who was home sick with COVID and working through a fever, used an experimental Claude Code-based tool to try to pull verbatim quotes from Shambaugh's blog post. When that tool refused due to a content policy violation, Edwards pasted the text into ChatGPT to figure out why. Instead of providing the actual quotes, ChatGPT generated paraphrased versions of things Shambaugh never actually said, and presented them as if they were direct quotes.
Edwards published the article on February 13 with those fabricated quotes attributed to a real person. Two days later, Ars Technica pulled the story.
The Fallout: A "Serious Failure of Standards"
Ars Technica editor-in-chief Ken Fisher replaced the article with an editor's note confirming that the piece included "fabricated quotations generated by an AI tool and attributed to a source who did not say them." Fisher called it a "serious failure of our standards."
Edwards was fired at the end of February. His author page on Ars Technica now lists his role in the past tense, noting that he "was" a reporter at the outlet. In a public statement, Edwards acknowledged the error, saying he had asked his editor to pull the piece because he was too sick to fix it himself on that Friday, and that there was "nothing nefarious at work, just a terrible judgement call which was no one's fault but his own."
The timeline tells the full story of how quickly AI-generated errors can escalate from a workflow shortcut to a career-ending event.
Edwards publishes the article about the AI coding agent's attack on Scott Shambaugh. The piece includes fabricated quotes generated by ChatGPT, presented as real statements.
Ars Technica pulls the article. Editor-in-chief Ken Fisher publishes an editor's note acknowledging the fabricated quotations.
Ars Technica fires Benj Edwards. His author page is updated to reflect his departure.
Futurism reports the full story, confirming Edwards' termination and publishing his account of the events.
Why This Matters Beyond One Reporter
It would be easy to frame the Edwards incident as a story about one person making one mistake. But the deeper problem is structural. ChatGPT and other large language models do not distinguish between retrieving information and generating it. When Edwards asked ChatGPT to help him extract quotes from a blog post, the model had no mechanism to flag that it was inventing text rather than copying it. It delivered fabricated quotes with the same confident formatting it would use for real ones.
This is not a bug that can be patched. It is a fundamental characteristic of how large language models work. They generate statistically likely text. Sometimes that text aligns with reality. Sometimes it does not. And there is no reliable way for the user, or the model itself, to know which is which in real time.
The journalism industry has been experimenting with AI tools for drafting, summarizing, and research assistance. The Edwards case demonstrates that even experienced professionals who understand AI's limitations can be caught off guard. Edwards was not a novice; he was literally the AI beat reporter. He knew about hallucinations. He wrote about them. And he still got burned.
The Healthcare Crisis: ChatGPT Health Misses 52% of Emergencies
While the journalism world was processing the Edwards fallout, an even more alarming AI failure was documented in the medical field. A study published in Nature Medicine on February 23, 2026, found that OpenAI's ChatGPT Health, launched in January 2026, under-triaged 52% of "gold-standard emergencies" in independent safety testing.
The study, conducted by researchers affiliated with Mount Sinai, found that ChatGPT Health performed adequately on textbook emergencies like strokes and severe allergic reactions. But it struggled with more nuanced situations where the danger was not immediately obvious, precisely the cases where clinical judgment matters most. Patients with diabetic ketoacidosis and impending respiratory failure were directed to seek evaluation within 24 to 48 hours rather than being told to go to the emergency department immediately.
Perhaps most disturbing, the study found that ChatGPT Health's suicide safety alerts were essentially inverted relative to clinical risk. The system was designed to direct users to a crisis line in high-risk situations, but alerts triggered more reliably in lower-risk scenarios than when users described specific plans for self-harm. The safety net was, in the most critical moments, absent.
OpenAI launched ChatGPT Health in the United States in January 2026, allowing users to connect their medical records to the platform for personalized health advice. The Nature Medicine study raises serious questions about whether this product should have been released to the public in its current form.
In the Classroom: AI Generates Sexualized Content for a Fourth Grader
The failures are not confined to newsrooms and hospitals. In December 2025, fourth graders at Delevan Drive Elementary School in Los Angeles were assigned to write a book report about Pippi Longstocking and create a book cover using either drawing or artificial intelligence. When a student asked Adobe Express for Education to generate an image of "long stockings a red headed girl with braids sticking straight out," the AI generated sexualized imagery of women in lingerie and bikinis instead of the beloved Swedish children's book character.
What Happened at Delevan Drive Elementary
A fourth grader used a school-approved AI tool (Adobe Express for Education) for a Pippi Longstocking book report. The AI generated sexualized images of women in lingerie instead of a children's book character. Parents confirmed the results were reproducible on school-issued Chromebooks. The parent group Schools Beyond Screens demanded the LA school board stop using the Adobe software.
Parent Jody Hughes contacted other parents, who confirmed they could reproduce similar results on their own school-issued Chromebook computers. The incident, reported by CalMatters in late February, prompted advocacy group Schools Beyond Screens to demand the LA school board end its use of the Adobe software. California has since begun developing new state-level safeguards for AI use in educational settings.
The incident illustrates a critical gap in AI safety: these tools were deployed in a classroom for children, on school-issued devices, through an educational version of the software, and still generated inappropriate content when a child used innocent search terms. The guardrails that were supposed to exist for educational contexts simply did not work.
The Pattern is Unmistakable
Taken individually, each of these incidents could be dismissed as an edge case or a growing pain. Taken together, they reveal a pattern that should concern anyone paying attention to how AI is being integrated into critical systems.
- In journalism: An AI reporter's career was ended because ChatGPT fabricated quotes with no warning that it was generating fiction rather than extracting facts.
- In healthcare: A product designed to give medical advice told patients with life-threatening emergencies that their conditions were not urgent, and failed to trigger safety alerts for people expressing suicidal intent.
- In education: A school-approved AI tool generated sexualized content when a nine-year-old searched for a children's book character.
In every case, the same underlying problem is at work: AI systems that are confident, persuasive, and wrong. They present fabricated information as fact. They assess critical situations with misplaced certainty. They generate inappropriate content through innocent prompts. And they do all of this without any mechanism to alert the user that something has gone wrong.
The first three months of 2026 have produced a concentrated wave of AI failures across industries that were supposed to benefit most from the technology. Journalism, healthcare, and education are not fringe use cases. They are the areas where AI companies have made their most ambitious promises about improving human productivity and access to information.
Benj Edwards understood AI hallucinations better than most people on the planet. He wrote about them professionally. He warned others about them. And when he used the tool himself, under imperfect conditions, it still fooled him. If the foremost AI reporter at one of the most respected technology publications can be deceived by a chatbot's confident fabrications, the question is not whether this will happen to others. The question is how many more careers, patient outcomes, and children's experiences will be damaged before the industry takes the problem seriously.
Documenting the AI Crisis
ChatGPT Disaster tracks AI failures across industries as they happen. From fabricated journalism to dangerous medical advice to inappropriate content in classrooms, we document what the companies won't.
Browse All Reports Submit Your Story