Storing Hundreds of Terabytes of Data for Billions of Years

The University of Southampton recently issued a press release highlighting progress its scientists have made on creating digital storage methods that could potentially survive for billions of years.

Using nanostructured glass, scientists from the University’s Optoelectronics Research Centre (ORC) have developed the recording and retrieval processes of five dimensional (5D) digital data by femtosecond laser writing.

The storage allows unprecedented properties including 360 TB/disc data capacity, thermal stability up to 1,000°C and virtually unlimited lifetime at room temperature (13.8 billion years at 190°C), opening a new era of eternal data archiving. As a very stable and safe form of portable memory, the technology could be highly useful for organisations with big archives, such as national archives, museums and libraries, to preserve their information and records.
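For a sense of scale, here's a quick back-of-the-envelope in Python comparing that 360 TB figure to more familiar media (the Blu-ray and hard drive capacities below are my own round numbers for comparison, not from the press release):

```python
disc_tb = 360          # quoted capacity of one 5D glass disc
blu_ray_gb = 25        # single-layer Blu-ray, for scale (my assumption)
hdd_tb = 8             # a large consumer hard drive circa 2016 (my assumption)

print(f"{disc_tb * 1000 / blu_ray_gb:,.0f} single-layer Blu-rays")  # 14,400
print(f"{disc_tb / hdd_tb:,.0f} eight-terabyte hard drives")        # 45
```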

The Optoelectronics Research Centre has posted a short video on YouTube showing data being written to such a glass disc using a femtosecond laser writing system.


A few years ago, Hitachi was supposedly working on a glass-based data storage system that also etched data onto glass with a laser, although at much lower densities than the Southampton researchers are aiming for,

The company’s main research lab has developed a way to etch digital patterns into robust quartz glass with a laser at a data density that is better than compact discs, then read it using an optical microscope. The data is etched at four different layers in the glass using different focal points of the laser.

. . .

Hitachi said the new technology will be suitable for storing “historically important items such as cultural artifacts and public documents, as well as data that individuals want to leave for posterity.”

. . .

Hitachi has succeeded at storing data at 40MB per square inch, above the record for CDs, which is 35MB.

Hitachi has mentioned its glass-based research several times since that 2012 announcement but, as far as I know, has not shipped anything (probably due to the relatively low data density). In 2014, Hitachi announced it had developed a system that could reliably read and write a 100-layer glass disc.
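To put Hitachi's numbers in context, here's a rough calculation of what they imply, assuming the 40MB figure is per layer (the article doesn't make that explicit):

```python
mb_per_in2_per_layer = 40    # Hitachi's quoted density, assumed to be per layer
cd_mb_per_in2 = 35           # the CD figure quoted above
layers_2012, layers_2014 = 4, 100

print(mb_per_in2_per_layer * layers_2012, "MB per square inch over 4 layers")    # 160
print(mb_per_in2_per_layer * layers_2014, "MB per square inch over 100 layers")  # 4,000, if per-layer density held
```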

These glass-based systems remind me of science fiction writer Charles Stross’s idea of using synthetic diamond to store immense amounts of data,

My model of a long term high volume data storage medium is a synthetic diamond. Carbon occurs in a variety of isotopes, and the commonest stable ones are carbon-12 and carbon-13, occurring in roughly equal abundance. We can speculate that if molecular nanotechnology as described by, among others, Eric Drexler, is possible, we can build a device that will create a diamond, one layer at a time, atom by atom, by stacking individual atoms — and with enough discrimination to stack carbon-12 and carbon-13, we’ve got a tool for writing memory diamond. Memory diamond is quite simple: at any given position in the rigid carbon lattice, a carbon-12 followed by a carbon-13 means zero, and a carbon-13 followed by a carbon-12 means one. To rewrite a zero to a one, you swap the positions of the two atoms, and vice versa.

It’s hard, it’s very stable, and it’s very dense. How much data does it store, in practical terms?

The capacity of memory diamond storage is of the order of Avogadro’s number of bits per two molar weights. For diamond, that works out at 6.022 x 10^23 bits per 25 grams. So going back to my earlier figure for the combined lifelog data streams of everyone in Germany — twenty five grams of memory diamond would store six years’ worth of data.

Six hundred grams of this material would be enough to store lifelogs for everyone on the planet (at an average population of, say, eight billion people) for a year. Sixty kilograms can store a lifelog for the entire human species for a century.

In more familiar terms: by the best estimate I can track down, in 2003 we as a species recorded 2500 petabytes — 2.5 x 10^18 bytes — of data. That’s almost ten milligrams. The Google cluster, as of mid-2006, was estimated to have 4 petabytes of RAM. In memory diamond, you’d need a microscope to see it.
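As a sanity check, here's Stross's capacity arithmetic worked through in Python. This is raw lattice capacity only, with no allowance for addressing or error correction:

```python
AVOGADRO = 6.022e23        # bits per two molar weights, per the quote
GRAMS_PER_2_MOLES = 25.0   # two moles of a carbon-12/carbon-13 mix weigh roughly 25 g

bits_per_gram = AVOGADRO / GRAMS_PER_2_MOLES     # ~2.4e22 bits per gram
bytes_per_gram = bits_per_gram / 8               # ~3.0e21 bytes per gram

world_2003_bytes = 2.5e18                        # humanity's 2003 output, per the quote
print(f"{world_2003_bytes / bytes_per_gram * 1000:.2f} mg of memory diamond")
# Prints roughly 0.83 mg of raw capacity -- the same "microscope to see it"
# scale Stross describes.
```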

So, it’s reasonable to conclude that we’re not going to run out of storage any time soon.

Faster, please.

The Laundry RPG

Cubicle 7 Entertainment received the license to make an RPG based on Charles Stross’s awesome Laundry Files series of novels. The core rulebook is out and available in both old skool paper and PDF, and a series of supplements, including an Agent’s Handbook and a collection of scenarios, are planned for the coming months. It uses BRP (Basic Roleplaying), the same system as Call of Cthulhu, so lots of possibilities here.

Charlie Stross on the Future of Video Games

I happen to be a complete Charles Stross fanboy, so your mileage may vary on this one, but his speech at LOGIN 2009 on the state of gaming in 2030 is Stross at his best, extrapolating current trends into the near future.

Much of what Stross talks about is already starting to happen — the smartphone is starting to become ubiquitous as it becomes more powerful and has access to faster and faster bandwidth. Stross envisions a future where this leads to augmented reality so we no longer play games so much as we are constantly surrounded by the Internet and games everywhere we go.

For example: if you point your phone at a shop front tagged with an equivalent location in the information space, you can squint at it through the phone’s screen and see … whatever the cyberspace equivalent of the shop is. If the person you’re pointing it at is another player in a live-action game you’re in (that is: if their phone is logged in at the same time, so the game server knows you’re both in proximity), you’ll see their avatar. And so on.

Using these gizmos, we won’t need to spend all our time pounding keys and clicking mice inside our web browsers. Instead, we’re going to end up with the internet smearing itself all over the world around us, visible at first in glimpses through enchanted windows, and then possibly through glasses, or contact lenses, with embedded projection displays.
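For the curious, here's a minimal sketch of the proximity check such a game server would have to run. The player records, field names, and 50-metre radius are all my own assumptions for illustration:

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two phone-reported positions."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def visible_avatars(me, others, radius_m=50):
    """Other players whose phones are logged in and close enough to overlay an avatar."""
    return [p for p in others
            if p["online"]
            and distance_m(me["lat"], me["lon"], p["lat"], p["lon"]) <= radius_m]

players = [{"name": "A", "online": True, "lat": 55.9533, "lon": -3.1883},
           {"name": "B", "online": False, "lat": 55.9533, "lon": -3.1884}]
print(visible_avatars({"lat": 55.9532, "lon": -3.1882}, players))  # only A shows up
```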

God, I want to live in that world. Except once we get there, as Edward Castronova has argued, how do we make the “real world” compelling enough to get people to stick around and do the not-so-fun things that keep civilization going?

Charles Stross on ‘Strangecraft-ian’ Horror

Charles Stross has some interesting musings on the intersection of H.P. Lovecraft and the sort of absurdist humor/horror embodied in Stanley Kubrick’s Dr. Strangelove. Stross notes the interesting similarity between horror and humor in that you can take anything and then add either element — or both in Kubrick’s case — to any other fictional form.

The other odd thing Stross notes is the similarity between the Cthulhu Mythos and the Singularity in contemporary science fiction,

And it occurs to me that the Lovecraftian apocalyptic singularity is underexplored. In a nutshell, it poses this question: what happens when we take the human condition, and twist? You need a topping of gallows humour just to keep it in perspective: humour is a brutal necessity when you’re confronting the horrific on a day to day basis (as anyone who hangs out with medics can probably attest).

. . .

What’s the role of humour in this universe? Well, one might ask what Stanley Kubrick intended when he turned “Dr. Strangelove” into a theatre of the absurd: absurdity is generated by dissonance between a situation and its meaning, and Kubrick used it to viciously anatomize the process of atomic annihilation and hold up the petty and banal motives of its perpetrators to ridicule. But “Dr. Strangelove” didn’t laugh at what came after the bomb — it ended, on a double-blind ironic note (singing “We’ll meet again” to a background of mushroom clouds). The bomb was the punch-line of the joke, not the set-up. What happens in a survivable apocalypse? Lovecraftian apocalyptic fiction never actually explores the consequences of the Old Ones returning, let alone the human wreckage left behind in the aftermath. It’s like the Singularity in SF, circa 2000 — off-limits to exploration.

Hopefully Stross will write that novel — I’d certainly love to read it.

Charles Stross on Data Preservation

Charles Stross writes about a problem near and dear to my heart — data preservation.

Stross starts off by mentioning this Observer article by British Library head Lynne Brindley, but Brindley annoys with passages like this,

The 2000 Sydney Olympics was the first truly online games with more than 150 websites, but these sites disappeared overnight at the end of the games and the only record is held by the National Library of Australia.

. . .

People often assume that commercial organisations such as Google are collecting and archiving this kind of material – they are not. The task of capturing our online intellectual heritage and preserving it for the long term falls, quite rightly, to the same libraries and archives that have over centuries systematically collected books, periodicals, newspapers and recordings and which remain available in perpetuity, thanks to these institutions.

Not once does she mention Archive.org, the nonprofit effort to archive the web that has been doing for years what librarians like Brindley are only now getting serious about. And, yes, Archive.org has a copy of the Sydney Olympics websites.

But, of course, replication is just part of the problem, and probably the easiest to solve these days. Large-scale data storage keeps getting cheaper. Yes, there’s more data being produced, but the ability to store and use that data is scaling well.

Ensuring the data will still be usable 10 years from now is another matter entirely. Yes, the web is mostly HTML and JPEGs, but there is plenty of nonstandard data, from Word and PowerPoint files to JavaScript and Flash, created with very different versions of software and often requiring specific versions of programs to work properly together.

Stross looks at that problem from a personal level, noting the times he’s switched from one platform or software to another and the inherent difficulty in taking your data with you during that transition,

In the space of six years, I went through five word processing packages. Being naive at the time I didn’t export my files into ASCII when I moved from CP/M and LocoScript to MS-DOS. I learned better, and when I switched from Sprint to Word I halfway ASCII-fied those files; they’re a bit weird, but if I really wanted to I could get into them with Perl and mangle them into something editable. Along the way, I lost the 3″ floppies from the PCW. Then I had a hard disk die on me — in those days, the MTBF of hard drives was around 10,000 hours — and it took the only copy of most of the early work with it.

Score to 1993: two years’ work is 90% lost. And a subsequent five years’ work is accessible, kinda-sorta, if I want to strip out all the formatting codes and revert to raw ASCII.

Been there, though not quite to the same extent as Stross — I made sure to keep printouts of everything from the proprietary word processors I used back in the 1980s, though PDF scans of those printouts are significantly less useful than plain ASCII files would have been.
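In the same spirit as Stross's Perl mangling, here's a minimal sketch (in Python rather than Perl) of that "strip the formatting codes, keep the ASCII" rescue pass. It's deliberately crude and throws away anything that wasn't plain text:

```python
import re
import string

PRINTABLE = set(string.printable.encode("ascii"))

def asciify(path):
    """Keep only printable ASCII from an old word-processor file and tidy up
    the gaps left where binary formatting codes were stripped out."""
    raw = open(path, "rb").read()
    text = bytes(b for b in raw if b in PRINTABLE).decode("ascii")
    return re.sub(r"[ \t]{2,}", " ", text)   # collapse runs of leftover spacing
```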

Having been burned by data loss, Stross describes how he tries to prevent that from happening again,

As a matter of personal policy, for those activities that involve creating data, I aim to use only software that is (a) cross-platform, (b) uses open or well-published file formats, and (c) ideally is free software.

This is in some ways a handicap; Thunderbird (my mail client of choice) and OpenOffice aren’t as colourful and feature-rich as, say, Apple’s Mail.app or Microsoft’s latest Word. However …

Firstly, they run on Macs, Linux systems, Windows PCs, and even on some other minority platforms. This protects my data from being held to ransom by an operating system vendor.

Secondly, they use open file formats. Thunderbird stores mailboxes internally in mbox format, with a secondary file to provide metadata. (This means I can claw back my email if I ever decide to abandon the platform.) OpenOffice uses OASIS, an ISO standard for word processing files (XML, style sheet, and other sub-files stored within a zip archive, if you need to go digging inside one). I can rip my raw and bleeding text right out of an OASIS file using command line tools if I need to. (Or simply tell OpenOffice to export it into RTF.)

Thirdly, they’re both open source projects and thus the developers have no incentive to lock me in so that they can charge me rent. I don’t mind paying for software; where an essential piece of free software has a tipjar on the developer’s website, I will on occasion use it. And I’m writing this screed on a Mac, running OS/X; itself a proprietary platform. But the software I use for my work is open — because these projects are technology driven rather than marketing driven, so they’ve got no motivation to lock me in and no reason to force me onto a compulsory (and expensive) upgrade treadmill.

I’ll make exceptions to this personal policy if no tool exists for the job that meets my criteria — but given a choice between a second-rate tool that doesn’t try to steal my data and blackmail me into paying rent and a first-rate tool that locks me in, I’ll take the honest one every time. And I’ll make a big exception to it for activities that don’t involve acts of creation on my part. I see no reason not to use proprietary games consoles, or ebook readers that display files in a non-extractable format (as opposed to DRM, which is just plain evil all of the time). But if I created a work I damn well own it, and I’ll go back to using a manual typewriter if necessary, rather than let a large corporation pry it from my possession and charge me rent for access to it.

All very good ideas. For text, I do everything in a text editor. If I need to pretty it up I’ll then import it into OpenOffice or something, but there’s something especially useful about plain old ASCII.
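As an aside on Stross's point about OpenDocument files: they really are just zip archives with the body in content.xml, so even a few lines of Python (my stand-in for his command-line tools) can pull the text back out. The filename below is a placeholder:

```python
import re
import zipfile

def odf_text(path):
    """Pull the raw text out of an OpenDocument file by unzipping content.xml
    and crudely stripping the XML tags."""
    with zipfile.ZipFile(path) as z:
        xml = z.read("content.xml").decode("utf-8")
    return re.sub(r"<[^>]+>", " ", xml)

print(odf_text("manuscript.odt")[:500])   # "manuscript.odt" is a placeholder filename
```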

And for those situations where a closed-source, proprietary file format is the only possibility, make damn sure you have a relatively open alternative representation of your work. I use a modified CAD program, for example, and I have no idea whether its file format is open or closed, but I damn well make sure I have JPEG and PDF versions of every project I’ve done, in case the company goes belly up and 10 years from now I have nothing left that can open the original files.
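Here's a small sketch of how that check could be automated. The extensions and the keep-an-export-next-to-the-original convention are my own assumptions for illustration:

```python
from pathlib import Path

PROPRIETARY = {".dwg", ".psd"}          # hypothetical proprietary project formats
FALLBACKS = {".pdf", ".jpg", ".jpeg"}   # open-ish exports expected alongside them

def missing_exports(root):
    """List proprietary files that have no PDF or JPEG sibling export."""
    return [f for f in Path(root).rglob("*")
            if f.suffix.lower() in PROPRIETARY
            and not any(f.with_suffix(ext).exists() for ext in FALLBACKS)]

print(missing_exports("projects"))      # "projects" is a placeholder directory
```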

The Long Now’s Rosetta Disc Finally Comes to Fruition

Wow — I remember back in 2000 when the Long Now project first started talking about its Rosetta Disc project. The idea was to look at ways to preserve cultural information for very long periods of time. We’ve got more data than ever before, but how much of it is going to be preserved in a readable format 100 years from now?

The Long Now Project took that even further — what would it take to preserve information with a high degree of reliability for 1,000 years? Longer?

Back then the idea was to use technology from Norsam that essentially etches images of pages onto a metal disk. The images are small enough that they have to be read with a special microscope, which allows potentially hundreds of thousands of pages to be stored on a single 3-inch disk. And the upshot was the lifespan: somewhere in the 2,000 to 10,000 year range.

Anyway, after eight years, they’ve finally managed to produce the disk for the Long Now Project.

The disc is titanium on the front (the side depicted here) and nickel on the back, where the information is actually etched. There are 13,500 pages here, including 1,500 different translations of Genesis 1-3, a list of common words for those 1,500 languages, etc.
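That 13,500-page figure implies some impressively small etching. A quick check in Python, assuming the whole face of the 3-inch disc is usable:

```python
import math

pages = 13_500
disc_diameter_mm = 3 * 25.4              # 3-inch disc

area_mm2 = math.pi * (disc_diameter_mm / 2) ** 2
per_page = area_mm2 / pages
print(f"~{per_page:.2f} mm^2 per page, a square roughly {math.sqrt(per_page):.2f} mm on a side")
# About 0.34 mm^2, or 0.6 mm per side -- hence the special microscope.
```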

There’s actually another, less ambitious, Rosetta Disc that was produced back in 2004, but it was launched on the Rosetta space probe, which is scheduled to rendezvous with a comet in 2014.

You can have your own copy of the latest iteration of the Rosetta Disc for a mere $25,000.

Personally, I’m hoping for a version of Charles Stross’s idea of using synthetic diamond to store information,

My model of a long term high volume data storage medium is a synthetic diamond. Carbon occurs in a variety of isotopes, and the commonest stable ones are carbon-12 and carbon-13, occurring in roughly equal abundance. We can speculate that if molecular nanotechnology as described by, among others, Eric Drexler, is possible, we can build a device that will create a diamond, one layer at a time, atom by atom, by stacking individual atoms — and with enough discrimination to stack carbon-12 and carbon-13, we’ve got a tool for writing memory diamond. Memory diamond is quite simple: at any given position in the rigid carbon lattice, a carbon-12 followed by a carbon-13 means zero, and a carbon-13 followed by a carbon-12 means one. To rewrite a zero to a one, you swap the positions of the two atoms, and vice versa.

It’s hard, it’s very stable, and it’s very dense. How much data does it store, in practical terms?

The capacity of memory diamond storage is of the order of Avogadro’s number of bits per two molar weights. For diamond, that works out at 6.022 x 10^23 bits per 25 grams. So going back to my earlier figure for the combined lifelog data streams of everyone in Germany — twenty five grams of memory diamond would store six years’ worth of data.

Six hundred grams of this material would be enough to store lifelogs for everyone on the planet (at an average population of, say, eight billion people) for a year. Sixty kilograms can store a lifelog for the entire human species for a century.

In more familiar terms: by the best estimate I can track down, in 2003 we as a species recorded 2500 petabytes — 2.5 x 10^18 bytes — of data. That’s almost ten milligrams. The Google cluster, as of mid-2006, was estimated to have 4 petabytes of RAM. In memory diamond, you’d need a microscope to see it.
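For concreteness, here's a toy sketch (mine, not Stross's) of the isotope-pair encoding he describes, with a carbon-12/carbon-13 pair standing for zero and the reverse for one:

```python
ZERO, ONE = ("C12", "C13"), ("C13", "C12")

def encode(bits):
    """Map a bit string onto a sequence of carbon isotope pairs in the lattice."""
    return [ZERO if b == "0" else ONE for b in bits]

def decode(pairs):
    """Read the bit string back out of the isotope pairs."""
    return "".join("0" if p == ZERO else "1" for p in pairs)

lattice = encode("1011")
assert decode(lattice) == "1011"
print(lattice)
```

Round-tripping bits this way is the trivial part; building a machine that can stack individual atoms is the hard part.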

Now that would be a diamond worth paying a couple months’ salary for.