Evaluating link rot

After reading Brian Suda's article on link rot, I ran his script on my Pinboard collection and discovered that around 12% of my bookmarked links are invalid.

[Graph: percentage of successful bookmarks (y-axis, 75 to 100) by year (x-axis, 2006 to 2021)]

Here's the data used to plot the above graph. The total number of bookmarked links varies from year to year, peaking in 2010 and 2011.

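| Year | Successful bookmarks | Bookmarks | Average (%) |
|---|---|---|---|
| 2006 | 80 | 109 | 73.39 |
| 2007 | 132 | 163 | 80.98 |
| 2008 | 185 | 237 | 78.05 |
| 2009 | 268 | 319 | 84.01 |
| 2010 | 757 | 855 | 88.53 |
| 2011 | 794 | 890 | 89.21 |
| 2012 | 202 | 230 | 87.82 |
| 2013 | 143 | 158 | 90.50 |
| 2014 | 25 | 26 | 96.15 |
| 2018 | 51 | 59 | 86.44 |
| 2019 | 56 | 59 | 94.91 |
| 2020 | 49 | 52 | 94.23 |
| 2021 | 37 | 37 | 100 |
| 2022 | 6 | 6 | 100 |
| Totals | 2785 | 3200 | 88.87 |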

Note: I didn't use Pinboard between 2014 and 2017 😶
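
For anyone who wants to redo the arithmetic, here is a small Python sketch that recomputes the percentages from the counts in the table above. The names (`ROWS`, `rate`, `mean_of_yearly`, `pooled`) are mine, chosen for illustration. One thing it makes visible: the 88.87 in the Totals row matches the plain average of the yearly percentages, while dividing the two totals (2785 / 3200) gives a slightly lower figure, about 87%.

```python
# (year, successful bookmarks, total bookmarks), copied from the table above.
ROWS = [
    (2006, 80, 109), (2007, 132, 163), (2008, 185, 237), (2009, 268, 319),
    (2010, 757, 855), (2011, 794, 890), (2012, 202, 230), (2013, 143, 158),
    (2014, 25, 26), (2018, 51, 59), (2019, 56, 59), (2020, 49, 52),
    (2021, 37, 37), (2022, 6, 6),
]


def rate(successful, total):
    """Return the share of successful bookmarks as a percentage."""
    return 100.0 * successful / total


# Per-year success rates, then their unweighted mean (each year counts equally).
yearly = [rate(ok, total) for _, ok, total in ROWS]
mean_of_yearly = sum(yearly) / len(yearly)

# Pooled rate: all successful bookmarks over all bookmarks,
# so busy years such as 2010 and 2011 weigh more.
total_ok = sum(ok for _, ok, _ in ROWS)
total_all = sum(total for _, _, total in ROWS)
pooled = rate(total_ok, total_all)

if __name__ == "__main__":
    print(f"mean of yearly rates: {mean_of_yearly:.2f}%")
    print(f"pooled rate: {pooled:.2f}%")
```

Which of the two numbers is the "right" one depends on whether each year or each bookmark should count equally; either way the picture stays the same.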

Brian's script works like this:

The code looks through your bookmarks and attempts to fetch each URL. If the HTTP code is less than 400 we mark it as a success. Without manually checking every URL, there might be some false positives: people selling existing domains, hosting provider redirects, etc. If the status code was 400 or higher, we marked it as a failure. After some manual investigation, we realized that some domains were not allowing bots to crawl them. Our code was using cURL, which appears as a bot, so we faked a browser’s user-agent string and decreased our failure rate by ~4%.
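
To make that description concrete, here is a minimal Python sketch of the same idea. It is not Brian's script (his used cURL); the function names, the timeout, and the `BROWSER_UA` string are assumptions of mine. It uses only the standard library and applies the rule from the quote: a status code below 400 counts as a success, anything else (including no response at all) as a failure.

```python
import http.client
import urllib.error
import urllib.request

# Assumed browser-like User-Agent; any realistic browser string would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def is_success(status):
    """Rule from the quote: an HTTP code below 400 is a success."""
    return status is not None and status < 400


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived.

    urllib follows redirects on its own, so the code returned is that of
    the final response.
    """
    request = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still available.
        return error.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, malformed URL, bad reply.
        return None


def check_links(urls, fetch=fetch_status):
    """Map each URL to True (success) or False (failure)."""
    return {url: is_success(fetch(url)) for url in urls}
```

Calling `check_links(["https://example.com/"])` would fetch that URL and report whether it still answers. The `fetch` parameter is only there so the classification can be tried out with a fake lookup instead of real network requests. As the quote warns, a "success" here only means the server answered, not that the original page is still there.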

Pinboard aka del.icio.us

I started using del.icio.us back in 2006, when I discovered the service at “The Future of Web Apps London”, and somehow forgot about it between 2014 and 2018.

I converted my account last year, when Pinboard’s creator Maciej reached out to ask whether we original one-time payment users would consider converting to a subscription model, to help him continue maintaining and developing the service and make a living out of it.

I was surprised that the number of invalid links wasn't higher, considering that a vast majority of the links on my blog are now invalid. There is probably a significant number of false positives among the 88.87% of valid bookmarks. Randomly clicking through old links turned up a fair number of them.

I still need to finish my link checking script, which replaces invalid links with links to the Internet Archive's Wayback Machine.
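
The replacement step could look roughly like the Python sketch below. This is not the unfinished script itself, just an illustration under a few assumptions: links appear in double-quoted `href` attributes, the set of dead URLs is already known, and archive links follow the `https://web.archive.org/web/<timestamp>/<url>` pattern. The helper names `wayback_url` and `replace_dead_links` are hypothetical, and nothing here checks that a snapshot actually exists.

```python
import re

WAYBACK_PREFIX = "https://web.archive.org/web/"

# Assumption: links are written as double-quoted href attributes.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def wayback_url(url, timestamp=""):
    """Build a Wayback Machine URL for url.

    timestamp is an optional YYYYMMDD-style string asking for the capture
    closest to that date; without it the URL carries no date.
    """
    if timestamp:
        return f"{WAYBACK_PREFIX}{timestamp}/{url}"
    return f"{WAYBACK_PREFIX}{url}"


def replace_dead_links(html, dead_urls, timestamp=""):
    """Point every href that is in dead_urls at the Wayback Machine."""
    dead = set(dead_urls)

    def swap(match):
        url = match.group(1)
        if url in dead:
            return f'href="{wayback_url(url, timestamp)}"'
        # Leave links that are still alive untouched.
        return match.group(0)

    return HREF_PATTERN.sub(swap, html)
```

Matching whole `href` values rather than doing a plain string replace avoids rewriting a live link that merely starts with a dead one. Passing the post's publication date as `timestamp` would ask for a capture close to what the page looked like when I linked to it.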
