ARCHIVEBOX
Create your own VPS internet ArchiveBox
David Rutland uses the LXF virtual private server to archive as much of the internet as possible, keeping it safe for future generations.
OUR EXPERT
David Rutland spends his time pondering the imponderables. Is it possible for the Internet Archive to archive itself? What would happen? He thinks that there’s only one way to find out…
Paper is an honest medium. Once committed to the page by Future’s heavy iron, the images and words in this magazine are unalterable, and could, potentially, last the many centuries until the prophesied year of the Linux Desktop finally arrives.
The same is not true of the internet. Data is ephemeral, and updating a website or an individual page takes seconds. Wiping an entire website from existence, along with all of the words and images it contains, is as simple as rm -rf /var/www/.
In 2000, one of the largest bodies of knowledge concentrated online was Encarta, an epic site with more than 60,000 articles. Yet by 2009, Encarta was dead – pounded into irrelevance by the free and vastly superior Wikipedia. In 2010, encarta.com redirected to a free dictionary on MSN, and in 2013, the dictionary was dropped and seekers of knowledge were instead redirected to the Bing search results page for the word Encarta. The top result was the Wikipedia entry for Microsoft’s now-defunct competitor. Visit encarta.com today and you’ll note that Microsoft no longer even bothers with redirects. You’ll get a connection timed-out error.
Only those with an original Encarta CD (and an optical drive to put it in) are able to appreciate and enjoy the wonder of what was once the world’s premier electronic encyclopaedia.
Archiving can take a long time and pull in some unexpected pages. We weren’t expecting a snapshot of the Linux Format Twitter account.
Using docker-compose to install ArchiveBox on your VPS is straightforward, foolproof, and takes only a few minutes.
Thousands of sites drop off the internet every day. The domains lapse as owners can’t be bothered to renew, and are sold at auction by ISPs. Your favourite small blog now serves adverts for casinos, and you’ll never be able to read the ridiculous rants of the rambling writers.
News stories are changed retrospectively, especially in evolving situations. Furthermore, it’s very rare for outlets to tack an addendum to the bottom of the page explaining what has been changed, and why.
Archiving takes effort
Most of our readers are probably already aware of The Internet Archive. Founded in 1996 with a mission to preserve the web, its vast trove of images, sounds, text and pages was opened to the public in 2001. If you want to find out the top story in the Somerset County Gazette on 8 June 2019, it’s a simple matter of plugging the relevant URL into the search box at www.archive.org and then selecting the date.
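If you'd rather not click through the calendar, the same lookup can be expressed as a URL. The Python sketch below only builds the addresses and makes no network requests. It assumes the Wayback Machine's usual replay pattern (web.archive.org/web/<timestamp>/<address>) and its availability endpoint. The newspaper's hostname is our guess for illustration, and the function names are our own.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed Wayback Machine URL patterns; check them against the
# Internet Archive's own documentation before relying on them.
WAYBACK_REPLAY = "https://web.archive.org/web"
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def wayback_timestamp(day: date) -> str:
    """Format a date as the YYYYMMDD prefix used in Wayback timestamps."""
    return day.strftime("%Y%m%d")


def snapshot_url(page_url: str, day: date) -> str:
    """Build a replay URL asking for a capture of page_url near the given day."""
    return f"{WAYBACK_REPLAY}/{wayback_timestamp(day)}/{page_url}"


def availability_query(page_url: str, day: date) -> str:
    """Build a query URL for the availability endpoint (returns JSON when fetched)."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(day)})
    return f"{WAYBACK_AVAILABILITY}?{params}"


if __name__ == "__main__":
    # Hypothetical hostname, used purely as an example.
    site = "www.somersetcountygazette.co.uk"
    day = date(2019, 6, 8)
    print(snapshot_url(site, day))
    print(availability_query(site, day))
```

Paste the first printed address into a browser and, provided a capture exists, the archive should serve the one closest to the requested date rather than failing when there is no exact match.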