The Integrity of Science Itself Is at Stake
Imagine you are reading a cutting-edge machine learning paper. The authors cite a 2023 study that perfectly supports their novel architecture. The citation looks legitimate. It has authors, a title, a DOI, and an arXiv identifier. You move on, satisfied. There is just one problem: that study does not exist. The authors never existed. The DOI leads to a dead end. The AI that helped write the paper invented the entire thing, and nobody, not the authors, not the three to five peer reviewers who scored the paper, noticed.
This is not a hypothetical scenario. It is happening right now, at the highest levels of computer science research, at the very conferences that set the direction of the AI field itself. The machines are generating fake scholarship, and the humans who are supposed to be the last line of defense are, increasingly, using the same machines to do their reviewing. What could possibly go wrong?
A lot, as it turns out. GPTZero, the AI detection platform, recently scanned 300 papers submitted to the International Conference on Learning Representations (ICLR) 2026 and uncovered a contamination problem that should alarm anyone who cares about the reliability of published research. The findings paint a picture of an academic system sleepwalking into a credibility crisis, one fabricated citation at a time.
The Numbers: 50 ICLR Papers, 100+ NeurIPS Citations
Out of 300 papers GPTZero scanned from the ICLR 2026 submission pool, 50 contained at least one obvious AI-hallucinated citation. That is one in six. These are not borderline cases or ambiguous formatting errors. These are references to papers that do not exist, written by authors who were never born, with DOIs that resolve to nothing.
And it is not just ICLR. GPTZero also found over 100 hallucinated citations scattered across papers that were accepted to NeurIPS 2025, one of the most prestigious machine learning conferences in the world. Accepted. Published. Cited by other researchers. Already woven into the fabric of the academic record.
Think about what that means for a moment. Downstream researchers are now building on work that cites phantom studies. They are incorporating conclusions supported by evidence that was fabricated by a language model. The corruption does not stay contained. It propagates.
What the Fake Citations Actually Look Like
If you are picturing obviously fake references, something a careful reader would catch immediately, think again. These hallucinated citations are designed (or rather, generated) to look perfectly plausible. They follow the correct formatting conventions. They slot into the bibliography alongside real papers. They sound like real research.
GPTZero's analysis revealed several telltale patterns in the fabricated references:
"John Doe and Jane Smith (2023). Advances in Transformer..."
Pattern 2: Dead-End DOIs and URLs
DOI: 10.1234/fake.2023.00042 leads to 404 Not Found
Pattern 3: Incomplete arXiv Identifiers
arXiv:2305.XXXX, which literally contains filler characters
The filler characters in "arXiv:2305.XXXX" are the clearest tell: some of these papers shipped with what amounts to "TODO: add reference" baked right into the bibliography, and it still sailed through review.
The more sophisticated hallucinations are harder to spot. The AI generates plausible-sounding author names (not "John Doe" but something like "Chen, Wei and Patel, Ananya"), invents a paper title that sounds exactly like a real study in the field, and attaches a DOI that is structurally valid but points to nothing. You would have to actually click the link or search for the paper to realize it was fake. Most reviewers, apparently, did not.
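To make concrete how little effort that verification takes, here is a minimal sketch of the kind of check a reviewer, or a submission system, could run before trusting a reference. It is an illustration rather than anyone's actual tooling: it assumes the Python requests library, the public doi.org resolver, and arXiv's export API, and the function names are invented for this example.

```python
# Minimal sketch of automated reference checking, not a production tool.
# Assumes the `requests` library; uses the public doi.org resolver and
# arXiv's export API.
import xml.etree.ElementTree as ET
import requests

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """A registered DOI makes doi.org answer with a redirect; a fabricated one gets 404."""
    resp = requests.get(f"https://doi.org/{doi}", allow_redirects=False, timeout=timeout)
    return 300 <= resp.status_code < 400


def arxiv_id_exists(arxiv_id: str, timeout: float = 10.0) -> bool:
    """True if the arXiv export API returns a real entry rather than an 'Error' placeholder."""
    resp = requests.get("http://export.arxiv.org/api/query",
                        params={"id_list": arxiv_id}, timeout=timeout)
    if not resp.ok:
        return False
    entries = ET.fromstring(resp.text).findall(f"{ATOM}entry")
    return any((e.findtext(f"{ATOM}title") or "").strip() != "Error" for e in entries)


if __name__ == "__main__":
    print(doi_resolves("10.1234/fake.2023.00042"))  # fabricated DOI -> False
    print(arxiv_id_exists("2305.XXXX"))             # placeholder identifier -> False
```

A fabricated DOI fails at the first hop: doi.org has simply never heard of it. The placeholder arXiv ID fails the same way, which is exactly the kind of mechanical check an overloaded human reviewer skips and a text-generating reviewer never performs.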
The Double-AI Failure Loop
A System Eating Its Own Tail
AI-generated papers containing hallucinated citations are being reviewed by AI-assisted peer reviewers who lack the capacity (or incentive) to verify references. The result is a self-reinforcing cycle where fabricated research enters the record unchallenged.
This is not a theoretical concern. It is the current state of affairs at the top venues in computer science.
Here is where the story turns from troubling to genuinely alarming. The peer review system, the centuries-old process meant to be the immune system of science, is supposed to catch exactly this kind of contamination. Reviewers read the papers, evaluate the claims, check the methodology, and verify that the cited literature actually supports the arguments being made. That is the theory.
In practice, each of these 50 ICLR papers with hallucinated citations had been reviewed by three to five peer experts. Real researchers with PhDs, people who publish in these venues themselves. And they missed the fakes. Every single time. Some of the papers with fabricated citations had average reviewer ratings of 8 out of 10.
Why? Because the reviewers are increasingly using AI themselves. Up to 17% of peer reviews at major computer science conferences are now estimated to be AI-written. So you have a situation where AI-generated text, complete with AI-hallucinated citations, is being evaluated by AI-generated reviews. Nobody in this loop has the ability to verify whether a cited paper is real, because nobody in this loop is doing the kind of manual, tedious, click-the-link-and-check work that used to be the baseline expectation of scholarship.
Researchers have started calling this the "double-AI failure loop," and the name is apt. The machine writes. The machine reviews. The machine approves. Humans sign their names to all of it.
17% of Peer Reviews Are Now AI-Written
Let that number sink in. Roughly one in six peer reviews at the conferences that define the state of the art in artificial intelligence, the reviews that determine which papers get published and which get rejected, which ideas get funded and which get shelved, is estimated to be substantially generated by AI.
This is not reviewers using AI as a spell-checker or a brainstorming tool. This is reviewers feeding the paper into ChatGPT or a similar model and submitting whatever comes back, possibly with light editing. The result is reviews that sound superficially competent, that hit the right structural notes ("the paper presents a novel approach," "the experimental methodology is sound"), but that fundamentally fail to engage with the actual content of the work.
An AI-written review will not catch an AI-hallucinated citation because both the review and the citation were produced by the same class of systems, systems that are excellent at generating text that looks right and terrible at determining whether something is true. The reviewer AI has no way to verify a DOI. It does not check arXiv. It does not know whether "John Doe and Jane Smith (2023)" is a real paper or a confident hallucination. It just reads the bibliography, decides the formatting looks professional, and moves on.
The peer review system was designed for a world where the bottleneck was human attention. Reviewers are volunteers. They are busy. They have their own research. The expectation was always that they would do their best, and that the aggregate judgment of three to five experts would catch most problems. That assumption breaks down entirely when the "experts" are outsourcing their judgment to the same technology that created the problem.
Why Peer Review Failed to Catch This
It is tempting to blame individual reviewers, and there is some accountability there, but the structural problems run deeper. The academic publishing system was already under strain before AI entered the picture. Conference submission volumes have exploded. ICLR, NeurIPS, and ICML each receive thousands of submissions per cycle. Finding enough qualified reviewers to handle the load has been a challenge for years.
Now add AI to both sides of the equation. Submissions are easier to produce (because AI can help write them), so volumes increase further. Reviews are easier to produce (because AI can help write those too), so the quality of individual reviews declines. The system processes more papers faster while understanding each one less. It is the academic equivalent of a factory speeding up the assembly line while removing the quality control inspectors.
- Volume overwhelm: Reviewers are assigned more papers than they can carefully evaluate, creating pressure to skim rather than scrutinize.
- Citation-checking is tedious: Manually verifying every reference in a bibliography requires clicking links, searching databases, and cross-referencing. Almost nobody does this systematically, even though much of it could be scripted (see the sketch after this list).
- Plausibility is enough: If a citation looks right, sounds relevant, and follows proper formatting, it passes the pattern-matching test that most reviewers apply unconsciously.
- AI-assisted reviewing is incentivized: Reviewers face no penalty for using AI tools, and the time savings are enormous. The trade-off is invisible until something like the GPTZero analysis exposes it.
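The tedium argument is also weaker than it looks, because the lookups themselves are easy to automate. The sketch below is an illustration under stated assumptions, not any conference's pipeline: it pulls DOI-shaped strings out of a BibTeX file (refs.bib is a hypothetical filename) with a rough regex and asks Crossref's public works endpoint whether each one is registered. Crossref does not cover every DOI registrar, so a miss here flags a reference for human attention rather than proving it fake.

```python
# Rough sketch of batch-checking a bibliography's DOIs against Crossref.
# Assumptions: references live in a BibTeX file, DOIs are extracted with a
# simple regex rather than a full BibTeX parser, and Crossref's public
# /works/<doi> endpoint answers 404 for DOIs it has no record of.
import re
import requests

DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"',;}{]+")


def extract_dois(bibtex_text: str) -> set[str]:
    """Collect every DOI-looking string in the bibliography."""
    return {m.group(0).rstrip(".") for m in DOI_PATTERN.finditer(bibtex_text)}


def known_to_crossref(doi: str, timeout: float = 10.0) -> bool:
    """True if Crossref has a record for this DOI (Crossref covers most, not all, DOIs)."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=timeout)
    return resp.status_code == 200


if __name__ == "__main__":
    with open("refs.bib", encoding="utf-8") as f:  # hypothetical bibliography file
        dois = extract_dois(f.read())
    for doi in sorted(dois):
        status = "ok" if known_to_crossref(doi) else "NOT FOUND, check by hand"
        print(f"{doi}: {status}")
```

Even this crude version would have caught the dead-end DOIs GPTZero found, because the failure mode is precisely references that no registry has ever issued.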
What This Means for the Future of Research
The implications extend far beyond computer science. If hallucinated citations are this prevalent in CS, the field with the most AI expertise and, in theory, the best ability to detect AI-generated content, what is happening in biology, chemistry, medicine, the social sciences, and law, where reviewers may be even less equipped to spot AI artifacts?
Published research is the foundation of public policy, medical treatment guidelines, engineering standards, and legal precedent. Every hallucinated citation that makes it into the academic record is a crack in that foundation. Other researchers cite the paper. The fake reference gets laundered through successive layers of citation. Five years from now, a policy paper might trace its reasoning back through a chain of legitimate research that, at some point, rests on a study that never existed.
The misinformation problem that has plagued social media is now infiltrating the one institution that was supposed to be immune to it: peer-reviewed science. The difference is that nobody scrolls past a Nature paper thinking, "I should fact-check this." The entire value proposition of academic publishing is that the checking has already been done.
Except increasingly, it has not been. The checking was done by a machine that cannot check, reviewing a paper written by a machine that cannot think, producing a citation to a study that does not exist. And the humans whose names appear on the reviews, the papers, and the acceptance letters were, in many cases, just along for the ride.
The Real Question
The crisis is not that AI is capable of generating fake citations. We have known that since 2022. The crisis is that the entire system of academic quality control, the one humans built to ensure that published knowledge is actually knowledge, failed to catch it at scale. And unless something fundamental changes about how peer review operates, the contamination will only grow.
Some conferences are beginning to require automated citation verification. Others are experimenting with AI detection tools for both submissions and reviews. But these are band-aid solutions being applied to a structural wound. The academic system needs to reckon with a new reality: the tools it has embraced for productivity are the same tools undermining its credibility. That is not a problem you solve with better detection. That is a problem you solve with a serious conversation about what academic integrity means in an age when a machine can produce a perfectly formatted, completely fictional bibliography in seconds.
For now, the double-AI failure loop keeps spinning. Papers go in. Reviews come out. Citations multiply. And somewhere in the growing mountain of published research, a reference to "John Doe and Jane Smith (2023)" sits quietly in a bibliography, waiting to be cited again.
More on AI's Credibility Crisis
This is part of a growing pattern of AI systems generating confident, plausible, and completely false outputs.