New research by the Pew Research Center in the US has found that nearly 40 per cent of web pages that existed in 2013 are no longer accessible, a phenomenon the researchers call “digital decay”. In other words, if you go looking for an online article from 2013, there is a substantial chance it has disappeared. The study also found that a quarter of all web pages that existed at some point between 2013 and 2023 are no longer accessible, and that even 8 per cent of pages from 2023 have already vanished.
To conduct the research, the team drew a random sample of web pages from Common Crawl, a non-profit archive that periodically crawls the web and stores snapshots of it. They examined roughly 90,000 pages for each year from 2013 to 2023 to determine whether each one was still accessible. The study counts a page as inaccessible if it no longer exists on its host server and returns a “404 Not Found” error.
When looking at specific types of web pages, the researchers found that a significant share of the Wikipedia pages analyzed had at least one broken link in their references section. Around 23 per cent of news web pages and 21 per cent of government web pages also contained at least one broken link. Government pages carried an average of 50 links each, many of them pointing to secure (HTTPS) pages with further information. City government pages were particularly prone to the problem: 29 per cent of those examined had at least one broken link.
The study also highlighted decay on social media, specifically on X (formerly Twitter). Of a random sample of 4.8 million posts, just under one in five were no longer visible on the site only a few months after being published. This could be because the account that posted them had been deleted or because the individual post had been removed. Posts written in Turkish or Arabic were more likely to disappear, as were posts from accounts with generic profile pictures or bios.
Overall, the research shows how much online content decays, with a large number of web pages becoming inaccessible over time. This has implications for researchers, journalists and members of the public who rely on the internet for information. Website owners need to update and maintain their content regularly to limit digital decay and keep valuable information accessible, and further research and awareness are needed to address the broader problem of disappearing online content.