The problem of artificial intelligence models 'hallucinating' non-existent citations has recently shot to prominence. Now a team of researchers has sifted through 2.5 million papers and preprints to provide the best assessment yet of how prevalent such citations are. Their audit encompassed 111 million references in papers and preprints listed in major repositories, including arXiv, bioRxiv, the Social Science Research Network (SSRN) and PubMed Central, and found 146,932 hallucinated citations in material published in 2025 alone. The analysis also suggests that the prevalence of hallucinated citations depends on the area of research. SSRN, a preprint server for social sciences research, had the highest rate of hallucinated citations at nearly 2%, almost five times higher than that of any other major repository. 'We were really amazed by the overall magnitude and dynamics of the whole body of hallucinated citations,' says Yian Yin, assistant professor of information science at Cornell...