Link rot


Link rot (also called link death, link breaking, or reference rot) is the tendency of hyperlinks, over time, to stop pointing to their originally targeted file, web page, or server because that resource has been relocated to a new address or has become permanently unavailable. A link that no longer points to its target, often called a broken or dead link (or sometimes an orphan link), is a specific form of dangling pointer.

The rate of link rot is a subject of study and research due to its significance to the internet's ability to preserve information. Estimates of that rate vary dramatically between studies.

A number of studies have examined the prevalence of link rot within the World Wide Web, in academic literature that uses URLs to cite web content, and within digital libraries.

A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2]
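The 138-week figure follows from the weekly breakage rate if one assumes that every link has the same constant chance of breaking each week, independently of the others. That is a modelling assumption, not necessarily the study's own method. Under it, the half-life is the number of periods *t* that solves (1 − p)^t = 1/2. The following is a minimal sketch. `half_life` is a hypothetical helper name.

```python
import math

def half_life(loss_per_period: float) -> float:
    """Number of periods until half of the links are broken, assuming each
    link independently breaks with the same constant probability per period
    (a simplifying assumption, not a property of any particular study)."""
    if not 0 < loss_per_period < 1:
        raise ValueError("loss_per_period must be strictly between 0 and 1")
    # Solve (1 - p) ** t == 0.5 for t by taking logarithms of both sides.
    return math.log(0.5) / math.log(1 - loss_per_period)

# One link in 200 breaking per week:
print(f"{half_life(1 / 200):.1f} weeks")  # 138.3 weeks
```

Working in logarithms gives the answer directly, including its fractional part, instead of simulating the decay week by week.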

A 2004 study showed that subsets of Web links (such as those targeting specific file types or those hosted by academic institutions) could have dramatically different half-lives.[3] The URLs selected for publication appear to have greater longevity than the average URL. A 2015 study by Weblock analyzed more than 180,000 links from references in the full-text corpora of three major open access publishers and found a half-life of about 14 years,[4] generally confirming a 2005 study that found that half of the URLs cited in D-Lib Magazine articles were active 10 years after publication.[5] Other studies have found higher rates of link rot in academic literature but typically suggest a half-life of four years or greater.[6][7] A 2013 study in BMC Bioinformatics analyzed nearly 15,000 links in abstracts from Thomson Reuters's Web of Science citation index and found that the median lifespan of web pages was 9.3 years, and that just 62% of them had been archived.[8] A 2021 study of external links in 1996–2019 New York Times articles found that 25% of links were inaccessible. In addition, in a sample of 4,500 links that were still accessible, 13% no longer led to the original content, a phenomenon called content drift.[9]
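To see how far apart half-lives like these are in practice, one can compute the fraction of links expected to survive a given time. This assumes simple exponential decay, in which the surviving fraction after *t* years with half-life *h* is 0.5^(t/h). That is an illustrative assumption, not a claim about how any of the cited studies modelled their data. The following is a minimal Python sketch. `surviving_fraction` is a hypothetical helper name.

```python
def surviving_fraction(years: float, half_life_years: float) -> float:
    """Fraction of links still working after `years`, assuming exponential
    decay with the given half-life (an illustrative assumption)."""
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years / half_life_years)

# Compare a two-year half-life with a 14-year half-life over a decade.
for h in (2, 14):
    print(f"half-life {h:>2} y -> {surviving_fraction(10, h):.0%} alive after 10 y")
# half-life  2 y -> 3% alive after 10 y
# half-life 14 y -> 61% alive after 10 y
```

Under this model, a 14-year half-life leaves roughly 61% of links alive after a decade. This is broadly in line with the D-Lib Magazine finding that about half of cited URLs were still active after 10 years.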

A 2002 study suggested that link rot within digital libraries is considerably slower than on the web, finding that about 3% of the objects were no longer accessible after one year[10] (equating to a half-life of nearly 23 years).
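The "nearly 23 years" figure can be checked by assuming that the same 3% of remaining objects is lost every year. This constant-rate assumption is illustrative and is not stated by the study. With that assumption, the half-life *t* solves:

```latex
(1 - 0.03)^{t} = \tfrac{1}{2}
\quad\Longrightarrow\quad
t = \frac{\ln 2}{-\ln 0.97} \approx \frac{0.6931}{0.03046} \approx 22.8 \text{ years}
```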