SCIENTIFIC INTEGRITY CRISIS

21% of Peer Reviews at the World's Top AI Conference Were Written Entirely by AI, and Nobody Noticed Until a Researcher Got Suspicious

February 14, 2026

Pangram Labs scanned all 75,800 peer reviews submitted to ICLR 2026 and found 15,899 were fully AI-generated. More than half showed signs of AI assistance. The institution responsible for validating AI research is being undermined by the very technology it studies.

There is a certain poetic horror in the fact that the International Conference on Learning Representations, one of the most prestigious machine learning conferences in the world, just discovered that one in five of its peer reviews was written entirely by AI. The researchers who build these systems could not even detect when the systems were being used to evaluate their own work. If that doesn't capture the absurdity of where we are with artificial intelligence in 2026, nothing will.

ICLR 2026 received 19,490 manuscript submissions and 75,800 peer reviews. That is a staggering volume. And buried within that mountain of academic feedback, 15,899 reviews, a full 21%, were generated entirely by AI tools like ChatGPT and custom large language models. Not "AI-assisted." Not "partially drafted with AI help." Fully generated. A machine read the paper and a machine wrote the review and a human submitted it with their name on it.

15,899 peer reviews at ICLR 2026, out of 75,800 total, were flagged as fully AI-generated by Pangram Labs' detection tools

How a Carnegie Mellon Researcher's Suspicion Unraveled the Entire Peer Review System

The discovery started the way many academic scandals do: someone read their reviews and something felt off. Graham Neubig, a researcher at Carnegie Mellon University, received feedback on a manuscript that struck him as unusually verbose and oddly structured. The review requested non-standard statistical analyses that didn't quite match the paper's methodology. It read like something that had been generated by a system trained to sound academic rather than something written by a person who had actually read and understood the work.

Neubig posted on X offering a reward to anyone who could systematically scan the conference's submissions for AI-generated text. Max Spero, CEO of detection tool developer Pangram Labs, responded the next day. What Spero found was not a handful of lazy reviewers cutting corners. It was a systemic crisis.

Pangram screened all 19,490 manuscript submissions and all 75,800 peer reviews submitted to ICLR 2026. The results were devastating. Twenty-one percent of reviews were flagged as fully AI-generated. More than half of all reviews showed at least some signs of AI assistance. And 199 of the manuscripts themselves, about 1% of submissions, were found to be fully AI-generated.

50%+ of all 75,800 peer reviews showed signs of AI use, even when not fully AI-generated
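For readers who want to check the arithmetic behind the headline figures, here is a minimal Python sketch that reproduces the reported percentages from the raw counts quoted above. The counts come from the coverage itself; the script is purely illustrative.

```python
# Reproduce the headline percentages from the raw counts reported above.
total_reviews = 75_800       # peer reviews submitted to ICLR 2026
fully_ai_reviews = 15_899    # reviews flagged as fully AI-generated
total_submissions = 19_490   # manuscripts submitted
fully_ai_papers = 199        # manuscripts flagged as fully AI-generated

print(f"Fully AI-generated reviews: {fully_ai_reviews / total_reviews:.1%}")    # ~21.0%
print(f"Fully AI-generated papers:  {fully_ai_papers / total_submissions:.1%}")  # ~1.0%
```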

The Scale of the Problem Is Far Worse Than Previous Estimates Suggested

This isn't the first time AI-generated reviews have been detected at academic conferences. Earlier studies estimated that up to 17% of reviews at top AI conferences in 2023 and 2024 involved some form of AI assistance. But the ICLR 2026 numbers represent a significant escalation: the rate has jumped from roughly 17% showing some AI involvement to 21% that are fully machine-written, with more than half showing at least partial AI use.

The affected conferences extend well beyond ICLR. NeurIPS, ICML, CVPR, AAAI, and ACL have all faced similar concerns. Major machine learning conferences have seen exponential submission growth, with NeurIPS and ICML receiving over 10,000 papers by 2025. The reviewer pool has not kept pace with submissions, creating exactly the kind of pressure that incentivizes AI shortcuts.

And that is the structural problem nobody wants to talk about. Academic reviewers are not being paid. They are overworked. Many of them are reviewing dozens of papers while managing their own research, teaching, and publication schedules. When you create a system where thousands of hours of unpaid labor are required to keep the gears turning, and then you hand those laborers a tool that can do the work in minutes, the outcome is predictable.

What AI-Generated Peer Reviews Actually Look Like and Why They Passed Undetected

The hallmarks of AI-generated reviews, once you know what to look for, are surprisingly consistent. They tend to be overly formal. They use verbose, hedge-heavy language that sounds sophisticated but says very little. They request statistical analyses that are technically reasonable but don't quite match the specific methodology of the paper being reviewed. They praise the work in generic terms while raising objections that could apply to almost any paper in the field.

In other words, they sound exactly like a mediocre peer review written by a human who didn't read the paper carefully. And that is precisely why they went undetected for so long. The baseline quality of peer review has been declining for years. Rushed, superficial reviews are common enough that an AI-generated review doesn't stand out from the noise. The machines aren't getting better at impersonating humans. The humans have gotten bad enough that the machines can blend in.

The Self-Eating Snake

Consider the full picture: AI researchers submit papers about artificial intelligence to a conference about artificial intelligence, where 21% of the reviews evaluating those AI papers are themselves written by AI. The field responsible for understanding and advancing this technology cannot even police its own use within its own institutions. If AI researchers can't manage AI's impact on their own peer review system, what hope does anyone else have?

ICLR's Response: Stricter Guidelines and Mandatory AI Use Declarations

ICLR leadership, to their credit, moved quickly once the Pangram analysis was published. The conference implemented stricter guidelines, including mandatory AI-use declarations for all reviewers. Enhanced verification processes are being developed for future conferences. Other major venues, including NeurIPS and AAAI, are reportedly implementing similar measures.

But there is a fundamental enforcement problem. How do you detect AI-generated text that is becoming increasingly difficult to distinguish from human-written text? Pangram's tools caught 21% as "fully AI-generated," but the 50%+ figure for "showing signs of AI use" represents a vast gray zone where the line between "AI-assisted" and "AI-written" blurs into meaninglessness. If a reviewer uses ChatGPT to draft their review and then edits it for 10 minutes, is that an AI review or a human review? What about 30 minutes of editing? An hour?
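To make that gray zone concrete, here is a hypothetical triage rule. The score, the thresholds, and the function name are all invented for illustration; this is not Pangram Labs' actual methodology or its real categories.

```python
# Hypothetical triage rule illustrating why any "AI-assisted" vs. "AI-written"
# boundary is arbitrary. The score and cutoffs below are invented for
# illustration; they are not Pangram Labs' actual categories or thresholds.

def classify_review(ai_score: float) -> str:
    """Bin a detector score in [0, 1] into a disclosure category."""
    if ai_score >= 0.90:   # nearly all of the text reads as machine-generated
        return "fully AI-generated"
    if ai_score >= 0.30:   # a mix of human and machine text
        return "AI-assisted"
    return "likely human"

# Nudging either cutoff moves thousands of reviews between categories,
# which is exactly the enforcement problem mandatory declarations face.
for score in (0.95, 0.55, 0.10):
    print(score, "->", classify_review(score))
```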

The honest answer is that mandatory declarations are largely unenforceable. They rely on self-reporting, and the entire point of submitting AI-generated reviews is that the reviewer doesn't want anyone to know they didn't do the work. Declarations without detection are theater.

199 Manuscripts Were Also Fully AI-Generated, Meaning Machines Are Now Writing and Reviewing Science

Buried somewhat in the coverage of the review scandal is an equally disturbing finding: 199 of the 19,490 manuscripts submitted to ICLR 2026, roughly 1%, were found to be fully AI-generated. That's a small percentage, but it represents a new frontier. These aren't students using ChatGPT to clean up their prose. These are entire research papers, with hypotheses, methodologies, experiments, and conclusions, generated by AI and submitted to a top-tier conference as original scientific work.

Put those two findings together and the picture is genuinely surreal. At ICLR 2026, AI-generated papers were submitted for evaluation, and in some cases, those AI-generated papers were reviewed by AI-generated reviews. Machines writing science. Machines evaluating science. Humans nowhere in the loop except as the name on the submission form.

This is not the future AI researchers warned about. Nobody imagined that the first institution to be hollowed out by artificial intelligence would be the AI research community itself.

What This Means for the Credibility of Published AI Research Going Forward

Peer review is the foundation of scientific credibility. It is the process by which the academic community decides what counts as valid, rigorous, and publishable research. When that process is corrupted, the entire downstream knowledge chain is compromised. Policies informed by published research, technologies built on published findings, funding decisions based on publication records, all of it rests on the assumption that peer review means something.

The ICLR scandal doesn't just raise questions about one conference. It raises questions about every paper that has been peer-reviewed at any major AI conference in the past two years. If 21% of reviews at ICLR 2026 were AI-generated, what was the percentage at NeurIPS 2025? ICML 2025? AAAI 2025? Nobody knows, because nobody was looking until Graham Neubig got suspicious about one review of one paper.

The scientific peer review system was already strained before AI made it possible to automate the laziest version of the process. Now the strain has become a fracture, and the community responsible for building and studying artificial intelligence is the first to watch its own quality-control mechanism collapse under the weight of its own creation.

The machines have learned to evaluate science. They just haven't learned to do it well. And the humans, apparently, have decided that's good enough.

