Millenniata claims its M-Arc DVDs are backwards-compatible with existing DVD technologies. Rather than using a laser to heat up a photosensitive dye, though, the M-Arc uses a mechanical process to make scratches in a physical layer that the company claims could last for centuries if stored properly. According to a brief summary on the manufacturer’s site, the M-Arc:
- Preserves data for centuries with physical changes in data layer
- Constructed with rock-hard materials known to last for centuries
- Backwards-compatible on all standard DVD drives
- Functions like a standard DVD with a capacity of 4.7 GB
- Exclusively written by the M-Writer™ Drive
The Millenniata site doesn’t list any prices, but Long Now reports $1,700 for the writer and $16-$25 per 4.7 GB disc, depending on the quantity.
Back in March, the Long Now Foundation blog featured an extremely long post republishing two articles and a paper concerned with the potential loss of data caused by the increasing speed at which storage technologies become obsolete and, soon thereafter, difficult to access.
Of the three pieces, Jennifer Stilles’s look at the National Archives’ efforts to preserve/recover data stored in obsolete formats was the most interesting. It seems clear from Stilles’s piece that the crux of the problem is the constant drive for technological innovation, which produces products that are ever better but also, too often, ever more incompatible with previous formats. Moreover, this is a problem that started long before the current digital computer age,
On the wall are the internal organs of a film projector from the 1930s; the old heads have been mounted to play together with modern reels. “Twenty-eight different kinds of movie sound-tracking systems were devised during the 1930s and 1940s, trying to improve the quality of sound tracks,” Mayn explained. “Most of them are unique and incompatible.” This particular one used something called “push-pull” technology, in which the sound signal was split onto two different tracks. The technology was meant to cancel out noise distortion, but the two tracks must play in near-perfect synchrony. “If it is played back properly, it is better than a standard optical track, but if it is played back even a little bit improperly, it is far, far worse,” Mayn said. In the mid-1980s at a theater in downtown Washington, he was able to actually use this reconfigured projector to show several reels of push-pull film containing the trials of top Nazi leaders at Nuremberg. And the lab has transferred some 1800 reels of push-pull tape onto new negatives.
Wow. That fits nicely with one of the main problems with data storage today once you get past the physical media — the plethora of file formats and an odd lack of recognition that this is even a problem.
Microsoft rolls out yet another proprietary format for Office? Everybody simply upgrades without a second thought, because if you don’t, all of a sudden you’re receiving file attachments you can’t open. Much of this is driven, I suspect, by the view that most data production is largely ephemeral. Are we really going to want to be able to open this report in Word 2003 format 10 years from now? Of course, I’ve also seen the fallout from that, where people run around trying to find some way to open that 10-year-old file which is suddenly extremely important due to issues with a specific vendor or contract, etc.
The current state of data preservation efforts reminds me of the documentary “The Chances of the World Changing.” The documentary follows turtle enthusiasts who, given the lack of any coordinated effort to preserve endangered turtles, create their own ad hoc network of mini-Arks. They buy up individual turtles from overseas and store them in warehouses, basements, garages, etc., moving the turtles around when one or another enthusiast burns out or runs out of cash. And they hope they’ll be able to keep the turtles going until they’re able to get others to see the need for a permanent, formal preservation effort.
I couldn’t agree more with Jason Scott’s essay, Fuck the Cloud — if you are entrusting important data to a service that you don’t control and don’t have a migration path out, you are a fool.
Because if you’re not asking what stuff means anything to you, then you’re a sucker, ready to throw your stuff down at the nearest gaping hole that proclaims it is a free service (or ad-supported service), quietly flinging you past an End User License Agreement that indicates that, at the end of the day, you might as well as dragged all this stuff to the trash. If it goes, it’s gone.
. . .
Contrast, though, when people are dumping hundreds of hours a year into the Cloud. Blowing out photos. Entering day after day of entries. Sharing memories, talking about subjects that matter to them. Linking friends or commenting on statuses or trading twitters or what have you. This is a big piece, a very big piece of what is probably important stuff.
Don’t trust the Cloud to safekeep this stuff. Hell yeah, use the Cloud, blow whatever you want into the Cloud. The Internet’s a big copy machine, as they say. Blow copies into the Cloud. But please:
- Don’t blow anything into the Cloud that you don’t have a personal copy of.
- Insult, berate and make fun of any company that offers you something like a “sharing” site that makes you push stuff in that you can’t make copies out of or which you can’t export stuff out of. They will burble about technology issues. They are fucking lying. They might go off further about business models. They are fucking stupid. Make fun of these people, and their shitty little Cloud Cities running on low-grade cooking fat and dreams. They will die and they will take your stuff into the hole. Don’t let them.
- Recognize a Cloud when you see it. Are you paying for these services? No? You are a sucker. You are giving people stuff for free. I pay for Vimeo and I pay for Flickr and a couple other things. This makes me a customer. Neither of these places get my only copy of anything.
- If you want to take advantage of the froth, like with YouTube or so Google Video (oh wait! Google Video is going off the air!) then do so, but recognize that these are not Services. These are not dependable enterprises. These are parties. And parties are fun and parties and cool and you meet neat people at parties but parties are not a home.
I think this is a much bigger short term problem than the sort of more basic data preservation problems. People are dumping all of their data into different services and coming to rely on those services without ever thinking, “what if this company goes out of business next year?” In many cases, people won’t even realize just how much they’re dependent on other people providing them access to their data until that disappears.
Personally, I do use a lot of cloud services, but I am also fairly obsessive (ok, ridiculously obsessive) about making sure I have personal copies of everything so the day those services go asking for a bailout I’m not stuck wondering whether I’m going to be able to get my data back or not.
This is also one of the reasons those offering such services need to be pressured to adopt open standards, so that it is simple and straightforward to create local copies of any data and/or migrate to another service, whether it be another web service or a server the user controls. Most of the web services that Scott is bitching about seem to think that locking their customers into their specific service is the way to go, emulating the Microsofts of the traditional software market.
Henry Newman has a basic overview of the issues with data preservation in an increasingly all-digital world, but the title of his essay, Rocks Don’t Need to Be Backed Up, gives me the hives, because it is transparently wrong. Newman opens his essay by explaining the origins of the title he chose,
My wife and I were in New York’s Central Park last fall when we saw a nearly 4,000-year-old Egyptian obelisk that has been remarkably well preserved, with hieroglyphs that were clearly legible — to anyone capable of reading them, that is. I’ve included a couple of pictures below to give you a better sense of this ancient artifact — and how it relates to data storage issues.
As we stood wondering at this archaeological marvel, my wife, ever mindful of how I spend the bulk of my time, blurted out, “Rocks do not need backing up!”
But of course rocks do need backing up. Of the apparently hundreds of obelisks built by the ancient Egyptians, only 27 have survived completely intact to today. If the Egyptians had made a backup copy of each of those obelisks, more of them might have survived intact to our time. Of course, making the initial version was a laborious process, so even had they wanted to make backup copies, it may very well have been out of the question.
Compare this to the situation with paper and animal skins. Paper is easier both to create and to destroy than the granite that the obelisks were carved from. For most historical documents more than a few hundred years old, we rarely have the original document. Rather, we have copies and, in many cases, copies of copies of translations, etc. Oftentimes this in itself creates problems, as the copying process was rarely 100% accurate and the copyists would occasionally intentionally insert or delete passages. The most pronounced example of this is the books of the New Testament, for which there are numerous versions, none of them “original” copies and frequently diverging from each other in ways both small and large.
So now we enter the digital age. We still have one of the main problems that has vexed historians: the possibility that we’ll forget how to read certain documents. The problem has simply switched from not being able to decipher long-lost languages to not being able to read long-abandoned formats (and “long abandoned” could mean “in the last 24 months”).
The other day, for example, I was cleaning out my office and ran across a stack of Syquest disk cartridges that belonged to someone who left my organization about a decade ago. Syquest systems were essentially the forerunner to Zip disks, and Syquest dominated the market for large-scale removable storage in the 1980s and early 1990s. By the mid-1990s, however, Syquest found itself with huge quality control problems that eventually forced it into bankruptcy in 1998.
The data on those Syquest disks is largely unreadable to me. Or, more precisely, I probably could recover the data but at a price I’m not willing to pay. Moreover, since the data was all created on a mid-1990s Macintosh I’m not certain if I’d even be able to meaningfully use the data there.
But there are clearly ways we could turn this around and use the technology to our advantage. Unlike with rocks and paper, making copies, lots of copies, is trivially easy with data. Most people and a surprising number of organizations don’t seem to have much of a plan at all for doing so, but it’s becoming easier and cheaper to do.
On a personal level, I literally have dozens of copies of my personal data store (which is roughly 500 GB and growing) created with the frankly still primitive tools available for doing so. Now, there is a major difference between backing up my little old 500 GB and backing up an organization that may have 500 TB spread across it. Additionally, it’s easy enough for someone obsessed with data preservation to pull this off on an individual basis with a bit of attention, but there’s still a significant amount of personal intervention required to keep everything going and make sure everything works.
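To make the "lots of copies" idea concrete, here is a minimal Python sketch of copying a directory to several destinations and then checking each copy against the original. It is only an illustration under stated assumptions: the function names (`sha256_of`, `backup_and_verify`) are hypothetical, SHA-256 is just one reasonable choice of checksum, and it does not describe the tools actually used above.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List, Sequence


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, destinations: Sequence[Path]) -> List[Path]:
    """Copy every file under `source` into each destination root.

    Returns the list of copied files whose checksum does not match the
    original (an empty list means every copy verified).
    """
    mismatches: List[Path] = []
    # Snapshot the file list first so newly created copies are never re-walked.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    for original in files:
        expected = sha256_of(original)
        for dest_root in destinations:
            target = dest_root / original.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, target)  # copy2 also preserves timestamps
            if sha256_of(target) != expected:
                mismatches.append(target)
    return mismatches
```

Calling `backup_and_verify(Path("data"), [Path("/mnt/usb1/data"), Path("/mnt/usb2/data")])` (hypothetical paths) would leave two verified copies and return an empty list if nothing went wrong. The verification step is the part that tends to get skipped, and it is what separates having copies from knowing the copies are good.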
But if I can do it, surely there’s a way to scale backups wider than just me. The rise of the (too) numerous companies that are offering online backups at least suggests that there is a growing awareness of the problem of data loss. Of course many of these companies have business models that suggest they’ll soon be part of the problem rather than the solution (i.e., when that VC funding finally runs out).
On the other hand, few people seem to give a damn about the file format incompatibility issue. The solution there is also simple enough — only use open, well-documented file formats that can easily be reconstructed when support disappears for them. Instead, the reality is that people rush around seeing who can upgrade to the latest Microsoft product whose file formats are completely incompatible with every other file format in the history of the world.
Even then, I think Newman is overly pessimistic about the effect of data loss,
Digital data management concepts, technologies and standards just do not exist today. I don’t know of anyone or anything that addresses all of these problems, and if it is not being done by a standards body, it will not help us manage the data in the long run. It is only a matter of time until a lot of data starts getting lost. A few thousand years from now, what will people know about our lives today? If we are to leave obelisks for future generations, we’d better get started now.
It is almost inevitable that a lot of data will get lost — we are probably no different from previous civilizations who have rarely left behind anything but a small percentage of their “data” in forms that are usable by us. We will likely be no different, except that we are generating so much data that even this trickle of data that survives will still be sufficient to overwhelm future historians trying to get a handle on us.
We should definitely make the issues Newman writes about a priority, but I worry more that future generations will be drowning in our data remnants rather than seeing our era as a black hole of data loss.
I’m a big fan of online backup of critical data (I use Amazon S3 to back up the 300 GB or so of personal files I couldn’t bear to lose), but as these sorts of sites proliferate, you have to be careful to pick one that’s going to be around for a while. Case in point: HP announced it is shutting down its HP Upline service. I’m not too familiar with this service, but I assume it is similar to the program that Dell and other computer manufacturers have these days where they try to upsell consumers an online backup system when they’re purchasing their desktop or laptop.
According to a press release via Web Worker Daily,
HP continually evaluates product lines and has decided to discontinue the HP Upline service on March 31, 2009.
HP will no longer be backing up your files to the HP Upline servers as of Feb 26, 2009 at 8 am Pacific time. HP will keep the file restore feature of the Upline service operational through March 31, 2009 Pacific time in order for you to download any files you have backed up to Upline.
If you have a paid subscription to HP Upline, you will be refunded the full amount of the fees you paid for the service. That refund will be credited to the credit card account or PayPal account that you used to subscribe to the Upline service. If you do not receive the refund prior to March 31, 2009, please contact our customer service team at https://www.upline.com/support/email.aspx.
HP looks forward to offering you additional technology products and services in the future.
Frankly, that’s a relatively short time frame for making copies of anything on their servers. Some of us are fortunate to have 20 Mbps download speeds through our ISPs, but not everyone does, and intensive users of the service might find that is cutting it close (along with the hassle of suddenly having to devote a good portion of time to the attendant issues in creating the restore volume, etc., etc.).
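A quick back-of-the-envelope calculation shows why the window can be tight. The sketch below assumes the download speed mentioned above is 20 megabits per second, borrows the 300 GB figure from my own S3 backup as an example data set, and adds a hypothetical 1.5 Mbps connection for comparison. It also assumes the line runs flat out the whole time, which real connections rarely do.

```python
from datetime import date


def download_hours(size_gb: float, speed_mbps: float) -> float:
    """Hours needed to download `size_gb` decimal gigabytes at `speed_mbps` megabits/s."""
    bits = size_gb * 1e9 * 8            # gigabytes -> bits
    seconds = bits / (speed_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 3600


# Days between HP stopping backups (Feb 26, 2009) and shutting down restores (Mar 31, 2009).
restore_window_days = (date(2009, 3, 31) - date(2009, 2, 26)).days

for speed in (20, 1.5):  # 1.5 Mbps is an assumed slower connection
    hours = download_hours(300, speed)
    print(f"{speed} Mbps: {hours:.1f} hours ({hours / 24:.1f} days) "
          f"of a {restore_window_days}-day window")
```

Under these assumptions the 20 Mbps connection needs roughly 33 hours of continuous downloading, while the 1.5 Mbps connection needs about 18.5 days out of a 33-day window, before accounting for throttling, interruptions, or the time spent setting up the restore.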
And if HP couldn’t make this service work (which doesn’t surprise me), how do you think those services offering unlimited storage for $5/month are going to be doing 6 months from now?
Online backup is definitely a useful tool to have, but it shouldn’t be used as the only backup method, and you have to consider how likely it is that the company hosting your data will still want to, and be able to, keep hosting it in both the short and long term.