22120 is an open-source archiver that caches and archives everything you browse and then pairs that with full-text search.
Right now, this is a tool you download and install locally, and the archives are also stored locally. However, the developer has plans to add a “clientless hosted version” that would build an online archive that users could download.
Currently, 22120 only works with a Chrome-based browser.
One of the interesting things about 22120 is the developer’s view of competing formats like MHTML and SingleFile (the latter of which I use to archive all of my web browsing automatically).
The case for the 22120 format.
Other formats (like MHTML and SingleFile) save translations of the resources you archive. They create modifications, such as altering the internal structure of the HTML, changing hyperlinks and URLs into “flat” embedded data URIs, or local references, and require other “hacks” in order to save a “perceptually similar” copy of the archived resource.
22120 throws all that out, and calls rubbish on it. 22120 saves a verbatim high-fidelity copy of the resources you archive. It does not alter their internal structure in any way. Instead it records each resource in its own metadata file. In that way it is more similar to HAR and WARC, but still radically different. Compared to WARC and HAR, our format is radically simplified, throwing out most of the metadata information and unnecessary fields these formats collect.
. . .
Both SingleFile and MHTML require mutilatious modifications of the resources so that the resources can be “forced to fit” the format. At 22120, we believe this is not required (and in any case should never be performed). We see it as akin to lopping off the arms of a Roman statue in order to fit it into a presentation and security display box. How ridiculous! The web may be a more “pliable” medium but that does not mean we should treat it without respect for its inherent content.
. . .
In short, the web is an online medium, and it should be archived and presented in the same fashion. 22120 archives content exactly as it is received and presented by a browser, and it also replays that content exactly as if the resource were being taken from online. Yes, it requires a browser for this exercise, but that browser need not be connected to the internet. It is only natural that viewing a web resource requires the web browser.
I don’t agree with that perspective, but I can definitely respect the desire to adhere to fidelity.
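To make the difference concrete, here is a rough Python sketch of the two philosophies. It is not how SingleFile or 22120 is actually implemented: the function names, the `.body` and `.meta.json` file layout, and the sample page are all assumptions made up for illustration. The first function flattens a page by rewriting a resource URL into an embedded data URI, which changes the HTML. The second stores the response bytes untouched next to a small metadata file.

```python
import base64
import hashlib
import json
import tempfile
from pathlib import Path


def inline_as_data_uri(html: str, resources: dict[str, tuple[str, bytes]]) -> str:
    """Single-file style: replace each resource URL with an embedded data URI."""
    for url, (mime, body) in resources.items():
        encoded = base64.b64encode(body).decode("ascii")
        html = html.replace(url, f"data:{mime};base64,{encoded}")
    return html


def save_verbatim(url: str, mime: str, body: bytes, root: Path) -> Path:
    """Verbatim style: write the bytes unchanged plus a JSON sidecar.

    The file naming and metadata fields here are hypothetical, not 22120's
    real on-disk format.
    """
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    body_path = root / f"{key}.body"
    body_path.write_bytes(body)
    meta = {"url": url, "content_type": mime, "length": len(body)}
    (root / f"{key}.meta.json").write_text(json.dumps(meta), encoding="utf-8")
    return body_path


# Made-up sample page and image bytes for the demonstration.
page = '<img src="https://example.com/logo.png">'
logo = b"\x89PNG fake image bytes"

flattened = inline_as_data_uri(
    page, {"https://example.com/logo.png": ("image/png", logo)}
)

archive_dir = Path(tempfile.mkdtemp())
stored = save_verbatim(
    "https://example.com/page", "text/html", page.encode("utf-8"), archive_dir
)

# The flattened copy no longer matches what the server sent.
print(flattened != page)
# The verbatim copy is byte-for-byte what the server sent.
print(stored.read_bytes() == page.encode("utf-8"))
```

The trade-off shows up right away: the flattened page is one portable string that opens anywhere, while the verbatim copy keeps the original bytes but needs something to serve them back to a browser.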
The way something like SingleFile works does mean, in fact, that some pages are simply not archivable by the addon. But as the 22120 developer notes, some things are not archivable by 22120 either, including “video, audio and websockets, for now.”
As users, we tend to view the web like pages of a book: when I visit a given URL, conceptually, that is like going to page 255 in a book. But the reality is that much of what we see on a website is ephemeral and can be vastly different depending on when we view the URL, what browser or OS we’re using, what addons we’re using, etc., even in cases where the core content served at the URL doesn’t change.
For my intended purposes, SingleFile works best, but it is good to see the recent proliferation of web archiving options that take different approaches and embrace different philosophies.