This photograph, from Wikimedia, illustrates the amazing history of computer storage: a SanDisk 8GB microSD card lying on top of an 8-byte magnetic core memory unit.
Backblaze recently published an in-depth look at how durable and reliable data stored with its service is–i.e., what are the odds that you’ll try to retrieve a specific set of data from the service and find that you won’t be able to?
At the end of the day, the technical answer is “11 nines.” That’s 99.999999999%. Conceptually, if you store 1 million objects in B2 for 10 million years, you would expect to lose 1 file. There’s a higher likelihood of an asteroid destroying Earth within a million years, but that is something we’ll get to at the end of the post.
. . .
When you send us a file or object, it is actually broken up into 20 pieces (“shards”). The shards overlap so that the original file can be reconstructed from any combination of any 17 of the original 20 pieces. We then store those pieces on different drives that sit in different physical places (we call those 20 drives a “tome”) to minimize the possibility of data loss. When one drive fails, we have processes in place to “rebuild” the data for that drive. So, to lose a file, we have to have four drives fail before we had a chance to rebuild the first one.
The analysis then goes on to present a lot of math related to the time it takes for Backblaze to rebuild any data lost and its overall drive failure rate, but the general thrust is that it is extremely unlikely that Backblaze would ever suffer data loss from normal technical failures.
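The general shape of that math can be sketched with a simple binomial model: a file is lost only if four or more of its twenty shard drives fail before a rebuild completes. The failure rate and rebuild window below are illustrative placeholders, not Backblaze’s measured figures:

```python
from math import comb

# Hypothetical inputs -- Backblaze's real analysis uses measured
# drive-failure rates and actual rebuild times.
annual_failure_rate = 0.01           # assume a 1% annualized failure rate
rebuild_days = 6.5                   # assume a 6.5-day rebuild window
p = annual_failure_rate * rebuild_days / 365   # P(one drive fails in the window)

# With 17-of-20 erasure coding, data is lost only if 4+ of the
# 20 drives in a tome fail within the same rebuild window.
n, k = 20, 4
p_loss = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(>=4 of 20 drives fail in one window): {p_loss:.2e}")
```

Even with these made-up numbers the loss probability per window comes out vanishingly small, which is how the nines pile up so quickly.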
But at some point, we all start sounding like the guitar player for Spinal Tap. Yes, our nines go to 11. Where is that point? That’s open for debate. But somewhere around the 8th nine we start moving from practical to purely academic. Why? Because at these probability levels, it’s far more likely that:
- An armed conflict takes out data center(s).
- Earthquakes / floods / pests / or other events known as “Acts of God” destroy multiple data centers.
- There’s a prolonged billing problem and your account data is deleted.
One thing of interest does stand out in the odd way Backblaze concludes its analysis, however,
Eleven years in and counting, with over 600 petabytes of data stored from customers across 160 countries, and well over 30 billion files restored, we confidently state that our system has scaled successfully and is reliable. The numbers bear it out and the experiences of our customers prove it.
Note that this doesn’t say that they’ve never come across a file they were unable to restore due to technical, backend reasons (rather than issues related to customer credit cards, etc.).
Back in early March I decided to look into off-site backup of my data drives using either Crashplan or Backblaze. For the most part I’ve ignored online backup services, mainly because of the large volume of data I currently maintain and back up for personal use, which is approaching 60 terabytes. Along with storage costs, the sheer amount of time needed to upload that much data is ridiculous, so I hadn’t really given much thought to online backups.
So I decided to give Backblaze a try. There are some things I do not like about Backblaze, but overall I have been very pleased with it in the intervening month and felt good enough about it to pay for a year’s subscription.
To get things started, I hooked up a nearly full Seagate 8 terabyte hard drive to my main computer using an external dock. I already have that hard drive backed up locally, so I’m only relying on Backblaze as an option in case both the original and all backup copies of the drive should fail.
Don’t Rely on Backblaze for Your Only Backup
A lot of horror stories I read online from users of both Crashplan and Backblaze made it clear that they were using these services as their only method of backup. In several cases, users got burned when they backed up their data to either service prior to reformatting or destroying a hard drive, only to find that their data was unavailable or unrecoverable (or only recoverable after extraordinary measures were taken).
This, in a word, is crazy. For $50/year I wouldn’t use these sorts of services as anything but a backup of last resort. On the one hand, I’d put the odds of actually being able to recover my data from Backblaze if needed at 50/50. On the other hand, it’s only $50/year–it’s like the extra disability insurance I pay for through my workplace but have never bothered to track down the details of. Maybe it will help, maybe it won’t, but it’s so cheap that it’s not worth not carrying.
If I do need to retrieve the data, however, it is reassuring that Backblaze will let me pay them to copy my data to a hard drive and then ship that hard drive to me, whereas with Crashplan my only option would be to download the data (and there were plenty of reports of that not working so well).
Uploading Terabytes of Data
The second problem that a lot of users reported was the long time it took to upload large volumes of data. In some cases this was just users not understanding how the technology works. No, Mr. Clueless, you’re not going to be able to upload 1 terabyte of data to an online service over a weekend on a DSL modem. That just isn’t going to happen.
But other users complained of slowness in general. My experience was that Crashplan was slow as hell–significantly slower than Backblaze. I’m on a cable service that has 60Mbps down and 7Mbps up (and no bandwidth cap). With Backblaze I was able to upload a little over 1 terabyte in the first month, which was very reasonable in my experience. This is where you really start to notice the ridiculously slow Internet speeds that most of us in the United States have to endure, but that’s a much bigger problem and obviously nothing Backblaze can do anything about.
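A quick back-of-envelope calculation shows why a terabyte a month is about the ceiling on a connection like mine (the numbers below are my uplink’s rated speed; real-world throughput is always somewhat lower):

```python
# Best-case upload time for 1 TB over a 7 Mbps uplink running flat out.
bits_to_upload = 1e12 * 8       # 1 TB expressed in bits
uplink_bps = 7e6                # 7 Mbps rated uplink, in bits/second
days = bits_to_upload / uplink_bps / 86400
print(f"Best case for 1 TB: {days:.1f} days")  # about 13 days
```

So even saturating the uplink around the clock, 1 TB takes roughly two weeks, which makes “a little over 1 terabyte in the first month” about what you’d expect once overhead and everyday use of the connection are factored in.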
Encryption . . . Sort Of
An absolute necessity for me was being able to encrypt my data independently of either Backblaze or Crashplan. Both services allowed me to use a private encryption phrase so that no one but me, in theory, would be able to decrypt my data. However–there’s always some sort of “however”–the way these services handle restoring data is that you would need to supply the private key to Backblaze, for example, which would use it to decrypt the files and then make them available to you,
However, if you lose a file, you have to sign into the Backblaze website and provide your passphrase which is ONLY STORED IN RAM for a few seconds and your file is decrypted. Yes, you are now in a “vulnerable state” until you download then “delete” the restore at which point you are back to a secure state.
If you are even more worried about the privacy of your data, we highly recommend you encrypt it EVEN BEFORE BACKBLAZE READS IT on your laptop! Use TrueCrypt. Backblaze backs up the TrueCrypt encrypted bundle having no idea at all what is in it (thank goodness) and you restore the TrueCrypted bundle to yourself later.
Ugh. It would be much better to simply ship me an encrypted blob along with a utility to decrypt the data locally. This process completely misses the point of why users want a private encryption key. (Crashplan appears to use the same sort of process of decrypting in the cloud and then downloading the unencrypted file over SSL to your hard drive.) All you’re really doing, then, is limiting the window of time during which Backblaze employees (and anyone who has infiltrated their network) have access to your unencrypted data.
As I said before, I would never rely on this sort of service as anything but a last resort. Losing all of my data and having to wonder if I really want to trust Backblaze even temporarily with an unencrypted copy of my data is still better than simply losing all of my data with no other options (for $50/year, that is. If it cost, say $200/year, I might have a different view). For me, using Backblaze was a no-brainer given the range of available backup options and costs.
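The pre-encryption approach Backblaze itself suggests–encrypt locally, let them back up an opaque blob–is conceptually simple. The toy sketch below shows the round-trip idea using a keystream derived from HMAC-SHA256 in counter mode; it is purely illustrative, not production cryptography, and all the key/nonce values are made up. For real backups you’d use a proper tool (TrueCrypt/VeraCrypt, gpg, or similar):

```python
import hmac
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode, used here as a pseudorandom keystream.
    blocks = []
    for counter in range(length // 32 + 1):
        blocks.append(hmac.new(key, nonce + counter.to_bytes(8, "big"),
                               hashlib.sha256).digest())
    return b"".join(blocks)[:length]

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying the same operation twice round-trips.
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

plaintext = b"contents of my backup archive"   # stand-in for real file data
ciphertext = xor_cipher(b"key-from-my-passphrase", b"unique-nonce-01", plaintext)
# The backup service only ever sees `ciphertext`; decryption happens locally.
assert xor_cipher(b"key-from-my-passphrase", b"unique-nonce-01", ciphertext) == plaintext
```

The point of the sketch is the workflow, not the cipher: the service stores only the ciphertext, and the key never leaves your machine–exactly the property the web-restore process above gives away.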
The University of Southampton issued a press release recently highlighting progress that scientists there have made on creating digital storage methods that could potentially survive for billions of years.
Using nanostructured glass, scientists from the University’s Optoelectronics Research Centre (ORC) have developed the recording and retrieval processes of five dimensional (5D) digital data by femtosecond laser writing.
The storage allows unprecedented properties including 360 TB/disc data capacity, thermal stability up to 1,000°C and virtually unlimited lifetime at room temperature (13.8 billion years at 190°C ) opening a new era of eternal data archiving. As a very stable and safe form of portable memory, the technology could be highly useful for organisations with big archives, such as national archives, museums and libraries, to preserve their information and records.
The Optoelectronics Research Centre has posted a short video on YouTube showing data being written to such a glass disc using a femtosecond laser writing system.
A few years ago, Hitachi was supposedly working on a glass-based data storage system that also etched data onto glass with a laser, although at much lower densities than the Southampton researchers are aiming for,
The company’s main research lab has developed a way to etch digital patterns into robust quartz glass with a laser at a data density that is better than compact discs, then read it using an optical microscope. The data is etched at four different layers in the glass using different focal points of the laser.
. . .
Hitachi said the new technology will be suitable for storing “historically important items such as cultural artifacts and public documents, as well as data that individuals want to leave for posterity.”
. . .
Hitachi has succeeded at storing data 40MB per square inch, above the record for CDs, which is 35MB.
Hitachi has mentioned its glass-based research several times since that 2012 announcement but, as far as I know, has not shipped anything (probably due to the relatively low data density). In 2014, Hitachi announced it had developed a system that could reliably read/write to a 100-layer glass disc.
These glass-based systems remind me of science fiction writer Charles Stross’s idea of using synthetic diamond to store immense amounts of data,
My model of a long term high volume data storage medium is a synthetic diamond. Carbon occurs in a variety of isotopes, and the commonest stable ones are carbon-12 and carbon-13, occurring in roughly equal abundance. We can speculate that if molecular nanotechnology as described by, among others, Eric Drexler, is possible, we can build a device that will create a diamond, one layer at a time, atom by atom, by stacking individual atoms — and with enough discrimination to stack carbon-12 and carbon-13, we’ve got a tool for writing memory diamond. Memory diamond is quite simple: at any given position in the rigid carbon lattice, a carbon-12 followed by a carbon-13 means zero, and a carbon-13 followed by a carbon-12 means one. To rewrite a zero to a one, you swap the positions of the two atoms, and vice versa.
It’s hard, it’s very stable, and it’s very dense. How much data does it store, in practical terms?
The capacity of memory diamond storage is of the order of Avogadro’s number of bits per two molar weights. For diamond, that works out at 6.022 × 10²³ bits per 25 grams. So going back to my earlier figure for the combined lifelog data streams of everyone in Germany — twenty five grams of memory diamond would store six years’ worth of data.
Six hundred grams of this material would be enough to store lifelogs for everyone on the planet (at an average population of, say, eight billion people) for a year. Sixty kilograms can store a lifelog for the entire human species for a century.
In more familiar terms: by the best estimate I can track down, in 2003 we as a species recorded 2500 petabytes — 2.5 × 10¹⁸ bytes — of data. That’s almost ten milligrams. The Google cluster, as of mid-2006, was estimated to have 4 petabytes of RAM. In memory diamond, you’d need a microscope to see it.
So, it’s reasonable to conclude that we’re not going to run out of storage any time soon.
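Stross’s density figure is easy to sanity-check: one bit per pair of carbon atoms means Avogadro’s number of bits per two molar weights (about 25 grams of carbon):

```python
# Storage density of "memory diamond": one bit per C-12/C-13 atom pair,
# so Avogadro's number of bits per two molar weights (~25 g of carbon).
N_A = 6.022e23                    # Avogadro's number
bits_per_gram = N_A / 25
bytes_per_gram = bits_per_gram / 8
print(f"{bytes_per_gram:.2e} bytes per gram")  # ~3e21: about 3 zettabytes/gram
```

At roughly three zettabytes per gram, the conclusion that storage capacity is not the binding constraint follows pretty directly.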
Nice to see that Western Digital-owned HGST is shipping the Ultrastar He6, a hard drive that uses a sealed helium atmosphere to help cram 6TB into a 3.5″ form factor.
According to an HGST press release,
Leveraging the inherent benefits of helium, which is one-seventh the density of air, the new Ultrastar He6 drive features HGST’s innovative 7Stac™ disk design with 6TB, making it the world’s highest capacity HDD with the best TCO for cloud storage, massive scale-out environments, disk-to-disk backup, and replicated or RAID environments.
“With ever-increasing pressures on corporate and cloud data centers to improve storage efficiencies and reduce costs, HGST is at the forefront delivering a revolutionary new solution that significantly improves data center TCO on virtually every level – capacity, power, cooling and storage density – all in the same 3.5-inch form factor,” said Brendan Collins, vice president of product marketing, HGST. “Not only is our new Ultrastar helium hard drive helping customers solve data center challenges today, our mainstream helium platform will serve as the future building block for new products and technologies moving forward. This is a huge feat, and we are gratified by the support of our customers in the development of this platform.”
What’s really wild is HGST’s suggestion that since they are sealed to keep the helium from leaking out, that this could lead to some clever liquid cooling options (emphasis added),
One solution, which has been explored by many, is liquid cooling. Liquid, which is denser than air, can remove heat more efficiently and maintain a more constant operating temperature. However, traditional drives cannot be submerged as they are open to the atmosphere and would allow the cooling liquid inside, damaging or destroying the HDD. HGST’s HelioSeal platform provides the only cost-effective solution for liquid cooling as the drives are hermetically sealed and enable operation in most any non-conductive liquid. Today, HGST is working with leading innovators in this space such as Huawei and Green Revolution Cooling.
Interesting announcement from Sony and Panasonic about collaborating on a new standard for a 300GB optical disc. They hope to have the standard finalized by the end of 2015:
Sony Corporation (‘Sony’) and Panasonic Corporation (‘Panasonic’) today announced that they have signed a basic agreement with the objective of jointly developing a next-generation standard for professional-use optical discs, with the objective of expanding their archive business for long-term digital data storage. Both companies aim to improve their development efficiency based on the technologies held by each respective company, and will target the development of an optical disc with recording capacity of at least 300GB by the end of 2015. Going forward, Sony and Panasonic will continue to hold discussions regarding the specifications and other items relating to the development of this new standard.
These sorts of things rarely filter down to the consumer level; rather, they tend to serve high-end archiving needs, such as handling the data involved in producing a movie.
One possibility, however, is that by the time the standard is complete there could be a demand for something like this for 4K movies.