The Internet Archive
Inside the INTERNET ARCHIVE
Iain Thomson tours the archive of a trillion web pages, which some copyright holders are trying to drive out of existence
Left: The library’s headquarters in San Francisco
Above: Brewster Kahle founded the Internet Archive back in the 1990s
In 1995 a young man called Brewster Kahle sold WAIS (Wide Area Information Server), a tech company that indexed databases in the early days of the internet, for $15 million. Whereas many entrepreneurs would have taken the money and run, Kahle had bigger plans – he wanted to create the world’s first global encyclopaedia, and he’s well on the way to achieving that with the Internet Archive.
The organisation (archive.org) currently holds petabytes of records scraped from a trillion web pages, eight million digitised books, more than 80,000 concert recordings and music from 8,000 artists, plus decades of TV programmes. Not to mention four warehouses full of physical media that are being preserved for future generations.
“I got to grow up outside of New York City, and I would go into the White Plains Public Library, and it was just like magic,” Kahle told PC Pro.
“There was just all of these rows and floors of books and just all this stuff, and a very nice librarian who would say, ‘we have it all. If we don’t have it here, we’ll get it for you through the magic of an interlibrary loan.’ That’s what a library was when I was growing up. That’s not what a library is any more.”
Instead, he says, publishers are starving libraries of digital copies and have slapped the Internet Archive with a $600 million copyright lawsuit, doing their best to ensure that the future of information is pay-to-play. Kahle has made it his life’s work to ensure that doesn’t happen.
No such thing as too much information
The headquarters of the Internet Archive is a former Christian Science church, built in 1923, in San Francisco’s Richmond district. It bears a remarkable resemblance to the organisation’s logo, which is styled on the Library of Alexandria.
Inside, it’s an eclectic mix of high and low tech. The lobby contains physical media, ranging from Thomas Edison’s original sound recording cylinders all the way up to USB storage. In the corner there’s a 78rpm jukebox from 1947, just opposite a book-scanning room housing custom-built machinery with page-flipping arms and two cameras that photograph the books.