Backblaze on Data Reliability/Durability with Its Cloud Storage Service

Backblaze recently published an in-depth look at how durable/reliable data that is stored with its service is–i.e., what are the odds that you’ll want to retrieve a specific set of data from the service and find out that won’t be able to?

At the end of the day, the technical answer is “11 nines.” That’s 99.999999999%. Conceptually, if you store 1 million objects in B2 for 10 million years, you would expect to lose 1 file. There’s a higher likelihood of an asteroid destroying Earth within a million years, but that is something we’ll get to at the end of the post.

. . .

When you send us a file or object, it is actually broken up into 20 pieces (“shards”). The shards overlap so that the original file can be reconstructed from any combination of any 17 of the original 20 pieces. We then store those pieces on different drives that sit in different physical places (we call those 20 drives a “tome”) to minimize the possibility of data loss. When one drive fails, we have processes in place to “rebuild” the data for that drive. So, to lose a file, we have to have four drives fail before we had a chance to rebuild the first one.



The analysis then goes on to present a lot of math related to the time it takes for Backblaze to rebuild any data lost and its overall drive failure rate, but the general thrust is that it is extremely unlikely that Backblaze would ever suffer data loss from normal technical failures.

But at some point, we all start sounding like the guitar player for Spinal Tap. Yes, our nines go to 11. Where is that point? That’s open for debate. But somewhere around the 8th nine we start moving from practical to purely academic. Why? Because at these probability levels, it’s far more likely that:

  • An armed conflict takes out data center(s).
  • Earthquakes / floods / pests / or other events known as “Acts of God” destroy multiple data centers.
  • There’s a prolonged billing problem and your account data is deleted.

There is one thing of interest in the odd way Backblaze concludes its analysis, however,

Eleven years in and counting, with over 600 petabytes of data stored from customers across 160 countries, and well over 30 billion files restored, we confidently state that our system has scaled successfully and is reliable. The numbers bear it out and the experiences of our customers prove it.

Note that this doesn’t say that they’ve never come across a file they were unable to restore due to technical, backend reasons (rather than issues related to customer credit cards, etc.)

Thoughts on Using Backblaze After A Month

Back in early March I decided to look into off-site backup of my data drives using either Crashplan or Backblaze. For the most part I’ve ignored online backup services mainly because of the large volume of data I currently maintain/backup for personal use, which is currently approaching 60 terabytes. Along with storage costs, the sheer amount of time to upload that amount of data is ridiculous and so I hadn’t really given much thought to online backups.

Someone I know (with a lot less data) was using Crashplan, however, and I figured for the low monthly cost it wouldn’t hurt to at least check it out. I did not like Crashplan. Not one bit. Pretty much everything about Crashplan was confusing, from its terms of use all the way up to its uploading client. I did pay for an initial one month subscription, but after about a week realized Crashplan simply would never work for my needs and canceled.

So I decided to give Backblaze a try. There are some things I do not like about Backblaze, but overall I have been very pleased with it in the intervening month and felt good enough about it to pay for a year’s subscription.

To get things started, I hooked up a nearly full Seagate 8 terabyte hard drive to my main computer using an external dock. I already have that hard drive backed up locally, so I’m only relying on Backblaze as an option in case both the original and all backup copies of the drive should fail.

Don’t Rely on Backblaze for Your Only Backup

A lot of horror stories I read online from users of both Crashplan and Backblaze made it clear that they were using these services as their only method of backup. In several cases, users got burned when they backed up their data to either service prior to reformatting or destroying a hard drive, only to find that their data was unavailable or unrecoverable (or only recoverable after extraordinary measures were taken).

This, in a word, is crazy. For $50/year I wouldn’t use these sorts of services as anything but as a backup of last resort. On the one hand, I’d put the odds of actually being able to recover my data from Backblaze if needed at 50/50. On the other hand, it’s only $50/year–it’s like the extra disability insurance I pay through my workplace that I have never bothered to actually track down the details about. Maybe it will help, maybe it won’t, but it’s so cheap that it’s not worth not carrying.

If you I do need to retrieve the data, however, it is reassuring that Backblaze will let me pay them to copy my data to a hard drive and then ship that hard drive to me, whereas with Crashplan my only option would be to download the data (and there were plenty of reports of that not working so well.)

Uploading Terabytes of Data

The second problem that a lot of users reported was the long length of time it took to upload large volumes of data. In some cases this was just users not understanding how the technology works. No, Mr. Clueless, you’re not going to be able to upload 1 terabyte of data to an offline service over a weekend on a DSL modem. That just isn’t going to happen.

But other users complained of slowness in general. My experience was that Crashplan was slow as hell–significantly slower than Backblaze. I’m on a cable service that has 60mbs down and 7mbs up (and no bandwidth cap). With Backblaze I was able to upload a little over 1 terabyte in the first month, which was very reasonable from my experience. This is where you really start to notice the ridiculously slow Internet speeds that most of us in the United States have to endure, but that’s a much bigger problem and obviously nothing Backblaze can do anything about.

Encryption . . . Sort Of

An absolute necessity for me was being able to encrypt my data independently of either Backblaze or Crashplan. Both services allowed me to use a private encryption phrase so that no one but me, in theory, would be able to unencrypt my data. However–there’s always some sort of “however”–the way these services handle restoring data is that you would need to supply the private key to Backblaze, for example, which would use it to decrypt the files and then make them available to you,

However, if you lose a file, you have to sign into the Backblaze website and provide your passphrase which is ONLY STORED IN RAM for a few seconds and your file is decrypted. Yes, you are now in a “vulnerable state” until you download then “delete” the restore at which point you are back to a secure state.
If you are even more worried about the privacy of your data, we highly recommend you encrypt it EVEN BEFORE BACKBLAZE READS IT on your laptop! Use TrueCrypt. Backblaze backs up the TrueCrypt encrypted bundle having no idea at all what is in it (thank goodness) and you restore the TrueCrypted bundle to yourself later.

Ugh. It would be much better to simply ship me an encrypted blob along with a utility to unencrypt the data locally. This process completely misses the point of why users want a private encryption key. (Crashplan appears to use the same sort of process of decrypting in the cloud and then downloading the unencrypted file over SSL to your hard drive). All you’re really doing, then, is limiting the window of time that Backblaze employees (and anyone who has infiltrated their network) have access to your unencrypted data.

Summary

As I said before, I would never rely on this sort of service as anything but a last resort. Losing all of my data and having to wonder if I really want to trust Backblaze even temporarily with an unencrypted copy of my data is still better than simply losing all of my data with no other options (for $50/year, that is. If it cost, say $200/year, I might have a different view). For me, using Backblaze was a no-brainer given the range of available backup options and costs.