The Citation Crisis
In a finding that should alarm every researcher, clinician, and patient, a massive audit of biomedical literature has revealed that nearly 3,000 peer-reviewed medical papers contain references that simply do not exist. The study, led by Columbia University and published in The Lancet on May 7, 2026, scanned 2.5 million papers spanning three years and found that fake citations have surged 12-fold since 2023—coinciding with the widespread adoption of AI writing assistants.
KEY STAT
In the first seven weeks of 2026, 1 in 277 PubMed-indexed papers contained at least one fabricated reference—up from 1 in 2,828 in 2023.
The implications are profound: treatment guidelines may be built on a foundation of nonexistent studies. "A medical professional or clinical guideline developer has no way of knowing that the evidence they are relying on does not exist," warns Maxim Topaz, PhD, associate professor at Columbia University's School of Nursing and Data Science Institute, who led the research. In one extreme case, a single paper had 18 out of 30 references fabricated—meaning 60% of its bibliography pointed to papers that were never published.
The audit, part of the CITADEL project (Citation Integrity Detection at Scale), represents the first systematic attempt to quantify the scale of “phantom citations” in biomedicine. The numbers are stark: the contamination is accelerating, and the vast majority of affected papers have seen no corrective action from publishers.
How the Audit Uncovered the Problem
The Columbia team developed an automated pipeline that scanned 2.5 million open-access biomedical articles from PubMed Central, covering publications from January 1, 2023, to February 18, 2026. The review encompassed 126 million structured references, of which 97.1 million had verifiable Digital Object Identifiers (DOIs) or PubMed IDs—enough to validate their existence.
Using large language models (LLMs) and cross-referencing against four scholarly databases (PubMed, Crossref, OpenAlex, and Google Scholar), the system flagged any reference whose title could not be found. The researchers deliberately designed the method to be conservative—only fabrications that couldn’t be explained by formatting errors or indexing lags were counted.
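The study's exact pipeline has not been published, but the conservative matching idea can be illustrated with a minimal sketch: normalize a cited title and flag it only if no candidate title returned by any database is close enough, so formatting noise and minor indexing quirks pass while wholly invented titles do not. Function names, the similarity threshold, and the example titles below are hypothetical, not the study's actual code.

```python
import re
from difflib import SequenceMatcher

def normalize(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", title.lower())).strip()

def title_found(cited: str, candidates: list[str], threshold: float = 0.9) -> bool:
    """Return True if any database hit is close enough to the cited title.

    A high threshold keeps the check conservative: punctuation and casing
    differences still match, but an invented title finds no close neighbor.
    """
    target = normalize(cited)
    return any(
        SequenceMatcher(None, target, normalize(c)).ratio() >= threshold
        for c in candidates
    )

# Hypothetical lookup results from PubMed / Crossref / OpenAlex / Google Scholar:
hits = ["Anastomotic leak rates after minimally invasive surgery: a review"]
print(title_found("Anastomotic Leak Rates After Minimally-Invasive Surgery: A Review", hits))  # True
print(title_found("Outcomes of robotic colectomy in elderly patients", hits))  # False
```

A real system would also need to handle translated titles, preprint-to-journal renames, and database outages, which is one reason the study counted only fabrications that survived all such excuses.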
| Year | Coverage | Fabricated Refs Found | Affected-Paper Rate | Rate (per 10k papers) |
|---|---|---|---|---|
| 2023 | partial year (baseline) | n/a | ~1 in 2,828 | ~4 |
| 2024 | full year | rising | >1 in 1,000 | >10 |
| 2025 | full year | not broken out | ~1 in 458 | ~21.8 |
| 2026 (Jan 1–Feb 18) | partial year | 4,046 (audit total) | ~1 in 277 | 56.9 |
In total, the audit identified 4,046 fabricated references scattered across 2,810 papers. Of those, 2,564 papers had one or two fake citations, while 246 had three or more, suggesting systematic issues rather than isolated clerical errors. To validate the methodology, the team manually reviewed a random sample of 500 flagged references; independent reviewers confirmed fabrication in 70% of cases.
"The findings are conservative underestimates," Topaz cautions. "What we identified is the lower bound of true prevalence. We’re scratching the tip of the iceberg." Kathryn Weber-Boer of Digital Science agrees, calling it a "solid first initial contribution" while noting that reliance on Google Scholar for verification likely causes undercounting, as some fabricated references do appear there without tracing back to real publications.
The Timeline: When Did It Take Off?
The fabrication rate remained stable throughout 2023 at approximately 4 fake citations per 10,000 papers. But something shifted in mid-2024. The rate began climbing rapidly, reaching ~51 per 10,000 by late 2025 and hitting 56.9 per 10,000 in early 2026—a more than 12-fold increase in just two years.
[Chart: Fabricated citations per 10,000 papers, yearly average. 2026 data covers Jan 1–Feb 18 only.]
The researchers point to a clear inflection: mid-2024, when generative AI writing assistants like ChatGPT moved from experimental tools to everyday workflow for many authors. While correlation is not causation, the timing matches the commercialization and mass adoption of large language models in research environments.
"Whether they’re fabricated by a computer or fabricated by a human being, that’s a question that remains open," says Weber-Boer. "But the growth in the problem suggests that there is a generative AI component." The acceleration continued unchecked through 2025 and into 2026, even as some journals began experimenting with AI detection tools.
Critics note that citation fabrication predates AI: paper mills have long inserted fake references to meet quotas. However, modern LLMs can produce plausible yet nonexistent citations at unprecedented speed and scale, lowering the barrier dramatically. "The damage is already done," Topaz told Retraction Watch. "The contamination of over 4,000 fabricated references does not go away when the AI gets better."
How Fake Citations Slip Through
Not all fabricated citations look alike. Experts describe three recurring patterns, each with different detection challenges:
| Type | Description | Detection Difficulty |
|---|---|---|
| Phantom | Fully invented: plausible author, real journal name, properly structured DOI, but the paper does not exist. | High—looks legitimate at a glance; DOI resolves to "not found." |
| Chimera | Combines real elements from different papers (real first author, real journal, correct volume/issue), but the specific title doesn’t exist. | Very High—components individually verifiable; only exact title/DOI search reveals mismatch. |
| Corrupted | Real paper exists, but details are wrong (year off by one, misspelled author, wrong page numbers), making it unrecoverable by standard lookup. | Medium—can be verified with careful cross-check; may be honest AI error rather than deliberate. |
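The taxonomy above can be sketched as a simple triage rule: given which bibliographic fields of a flagged reference could be matched to any real record, assign the likeliest pattern. The field names and the boolean logic here are illustrative assumptions, not the study's actual classifier.

```python
def classify_reference(doi_resolves: bool, title_exists: bool,
                       author_journal_real: bool) -> str:
    """Triage a flagged citation into the three recurring patterns.

    doi_resolves:        the cited DOI points to a real record
    title_exists:        the exact title appears in some database
    author_journal_real: author/journal/volume fields match real entities
    """
    if title_exists and not doi_resolves:
        # A real paper exists, but the details are mangled (wrong DOI,
        # year, or page numbers): possibly an honest transcription error.
        return "corrupted"
    if not title_exists and author_journal_real:
        # Real components stitched around an invented title.
        return "chimera"
    if not title_exists and not author_journal_real:
        # Nothing checks out: fully invented.
        return "phantom"
    return "verified"

print(classify_reference(False, False, True))   # chimera
print(classify_reference(False, False, False))  # phantom
print(classify_reference(False, True, True))    # corrupted
```

The sketch also makes the detection-difficulty column concrete: a chimera passes every per-field check and is exposed only by the exact title lookup, which is why DOI-only validators miss it.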
One alarming example from the audit: a 2025 article on surgical anastomotic techniques in an open-access oncology journal contained 18 fabricated references among the 30 that could be checked, a 60% fabrication rate in its bibliography. The 246 papers with three or more fake citations point to systematic reliance on AI-generated references throughout the writing process.
Review articles, which synthesize existing evidence, faced an even greater risk: they exhibited a 57% higher fabrication rate than original research papers, likely because they involve longer bibliographies and broader literature searches—precisely the scenario where AI assistants are most heavily used.
The fact that many of these citations are chimeric or corrupted means they can easily evade automated checks that only validate DOIs. As one expert noted, "Google Scholar is not a reliable source" because some fabricated references do appear there but lack traceable publication records, creating false confidence.
Publisher Responses and the Retraction Debate
At the time of the audit, 98.4% of papers with fabricated references had received no publisher action—a figure that underscores the scale of the cleanup challenge. When approached by Retraction Watch, major publishers offered mixed responses:
| Publisher | Current Measures | Position on Fabricated Citations |
|---|---|---|
| Science | Automated reference checking at submission; no fabricated citations found to date in print. | Preventive; content blocked pre-publication. |
| NEJM / JAMA | Validation tools in place; authors attest to citation accuracy. | Reliance on author attestation; limited specifics on outcomes. |
| Taylor & Francis | Specialist staff and processes; problematic citations returned to authors; systematic issues can lead to rejection. | Active screening; editorial intervention at manuscript stage. |
| PLOS | Exploring system-wide reference integrity screening. | Does not automatically equate fabricated citations to misconduct; intent matters, institutional level decides. |
But the most heated discussion concerns retractions vs. corrections. Former JAMA editor Howard Bauchner and former JAMA Pediatrics editor Frederick Rivara argue in a commentary that any paper with a hallucinated reference should be retracted. "Researchers incur responsibility for the entire content of that paper," they write. "Retraction of these manuscripts might lead to greater scrutiny of references by authors."
Others disagree. NIH integrity researcher David Resnik and Topaz himself suggest a proportional response: retract only when the fabricated reference is central to the paper’s conclusions; otherwise issue a correction. Since 91% of affected papers had only one or two fabricated references, many likely stem from "honest mistakes by authors who used AI tools without verifying the output."
Skeptics also question the study’s methodology. Cochrane’s Ella Flemyng calls the findings "serious" but notes "we are lacking considerable details about the methods." Northwestern’s Mohammad Hosseini labels the analysis "simplistic" for not distinguishing between citations that are critical to a study’s claims and those that are incidental. Hosseini sees the current study as "low-hanging fruit; the tip of the iceberg," warning that a larger, harder problem is AI-generated citations that are inaccurate or biased but not entirely fictitious.
Toward a Solution: Recommendations from the Study
The authors of the CITADEL audit propose a multi-pronged response to stem the tide of fabricated citations:
- Fight AI with AI: Publishers should integrate automated reference verification into submission workflows before peer review begins. Tools could flag DOIs that don’t resolve and cross-check titles across databases in real time.
- Metadata for integrity: Indexing services (PubMed, Crossref, etc.) should add integrity metadata to references, so flags travel with citations and alert downstream users.
- Track fake references: Research integrity databases (like Retraction Watch) should establish a dedicated category for fabricated references to enable systematic monitoring and accountability.
- Retroactive screening: Publishers should screen existing publications and issue corrections or retractions where fake references compromise a paper’s conclusions.
The study authors also emphasize that the problem is concentrated among large open-access journals with high-volume, author-pays models. One such publisher produced fabrications at 14 times the rate of the most selective journals—a pattern that reflects resource constraints and lighter editorial oversight. Review articles are at particular risk, with a 57% higher fabrication rate than original research.
Yet the risk is not confined to open access. The audit only covered PubMed Central’s open-access subset; whether subscription journals like JAMA, NEJM, or The Lancet face similar infiltration remains unknown. As one insider noted, those journals have more resources for reference checking, but the absence of a systematic audit leaves a blind spot.
What is clear is that the "damage is already done," in Topaz’s words. Even if LLMs eventually stop hallucinating, the contamination in the literature persists—fake citations that will continue to be cited unless actively removed.
Public trust in science is already waning. The visible accumulation of non-existent references in the peer-reviewed record threatens that trust further. The time for proactive, system-wide verification is now—before the next generation of researchers and clinicians builds upon a foundation that, in thousands of cases, simply isn’t there.