Jonah Edwards’ Presentation on Internet Archive Infrastructure

Jonah Edwards, Infrastructure and Operations Manager for the Internet Archive, put together a fascinating presentation on the Internet Archive’s infrastructure.

It’s amazing to think about a 200-petabyte archive that grows by 20 to 24 petabytes each year, and what it takes to keep it all running.

Internet Archive Starts Highlighting “Fact Checking” of Archived Pages

The Internet Archive is starting to attach “fact-checking” organizations’ analyses of web pages to the archived versions of those pages contained within the archive.

Fact checking organizations and origin websites sometimes have information about pages archived in the Wayback Machine. The Internet Archive has started to surface some of these annotations for Wayback Machine users. We are attempting to preserve our digital history but recognize the issues around providing access to false and misleading information coming from different sources. By providing convenient links to contextual information we hope that our patrons will better understand what they are reading in the Wayback Machine.

For example, this archived article was allegedly part of a Russian disinformation campaign, so the Internet Archive now includes a text banner linking to a report about that disinformation page.

I’m skeptical of how useful this would be to the average person, however. The Internet Archive notice says,

This is an archived web page that was included in a report titled “Secondary Infektion”. Here is a link to it on the Live Web.

That is unlikely to make much sense to someone not already familiar with the Secondary Infektion disinformation campaign.

This is where it is a shame that no standardized method of third-party annotation of web pages ever emerged, despite many such efforts. It would be nice to have fact-checking extensions that worked similarly to ad-blocking extensions, where users could subscribe to whatever fact-checking organizations they trusted and then receive in-browser analyses of suspected disinformation.

Of course, the market for such a browser extension is likely even smaller than that for ad blocking.

Brave Browser Will Show Archive.Org Option on 404 Pages

The Brave Browser will start including a “Check for saved version” button on pages that return a 404 error. The button checks whether a version of the page exists on the Internet Archive’s site.

Available today, starting with version 1.4 of its desktop browser, Brave has added a 404 detection system, with an automated Wayback Machine lookup process to its desktop browser.

By default, it now offers users one-click access to archived versions of Web pages that might otherwise not be available. Specifically we are checking for 14 HTTP error codes in addition to the 404 (page not found) condition, including: 408, 410, 451, 500, 502, 503, 504, 509, 520, 521, 523, 524, 525, and 526. 

Brave – WhiteHouse.Gov – Archive.Org Example

There are plugins for most browsers that will do the same thing, but it is nice to see Brave bake this into its browsing experience.
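For anyone curious how such a check might work, here is a minimal Python sketch. It is not Brave’s code. The status codes come from the announcement quoted above; the lookup uses the Internet Archive’s public Wayback Machine availability endpoint, which may or may not be what Brave calls. The function names are made up for illustration.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# 404 plus the 14 additional status codes listed in Brave's announcement.
LOOKUP_STATUS_CODES = frozenset({
    404, 408, 410, 451, 500, 502, 503, 504,
    509, 520, 521, 523, 524, 525, 526,
})

# Public Wayback Machine availability endpoint (assumption: Brave's
# built-in lookup may use a different service or URL).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def should_offer_archive(status_code: int) -> bool:
    """Return True if the response status should trigger a Wayback lookup."""
    return status_code in LOOKUP_STATUS_CODES


def availability_url(page_url: str) -> str:
    """Build the availability query URL for the failing page."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Extract the closest snapshot URL from an availability response.

    Returns None when the response reports no available snapshot.
    """
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

A browser or plugin would call `should_offer_archive` on the failed response, fetch `availability_url(...)`, and show the button only if `closest_snapshot` returns a URL. The network request itself is left out so the sketch stays self-contained.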

Personal Digital Archiving 2011 Conference

In February 2011, the Internet Archive sponsored a two-day conference on Personal Digital Archiving:

The combination of new capture devices (more than 1 billion camera phones will be sold in 2010) and new types of media are reshaping both our personal and collective memories. Personal collections are growing in size and complexity. As these collections spread across different media (including film and paper!), we are redrawing the lines between personal and professional data, and between published and unpublished information.

This being the Internet Archive, they’ve uploaded video of all the presentations in several different formats. The presentations range from discussions of cost, ethics, and technical issues to vendor presentations on specific tools in this area.