Glacier Deep Archive

Back in 2012, Amazon announced its Amazon Glacier storage solution: cloud-based storage that was cheap, but designed for data that would need to be accessed very infrequently.

But even Glacier is expensive for some purposes. For example, I’ve got about 100 terabytes I need to back up, and even at Glacier’s low cost of $4-5/terabyte/month, that would still be ~$500/month. At that price, I might be better off buying a tape drive.

Now, Amazon has announced its Glacier Deep Archive storage solution, which is designed to go after use cases like this. At a little over $1/terabyte/month, the cost of storing 100 terabytes in the cloud begins to approach the cost of tape backup.
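As a rough sanity check on those numbers, here is a quick sketch; the per-terabyte rates are the approximate figures mentioned above, not exact AWS list prices, which vary by region.

```python
# Approximate monthly cost of storing 100 TB, using the rough per-terabyte
# prices cited in this post (actual AWS pricing varies by region and tier).
TERABYTES = 100
GLACIER_PER_TB = 4.50        # regular Glacier, ~$4-5/TB/month
DEEP_ARCHIVE_PER_TB = 1.10   # Glacier Deep Archive, a little over $1/TB/month

print(f"Glacier:      ${TERABYTES * GLACIER_PER_TB:.2f}/month")       # $450.00/month
print(f"Deep Archive: ${TERABYTES * DEEP_ARCHIVE_PER_TB:.2f}/month")  # $110.00/month
```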

There are a few caveats, however. First, it appears that the data stored in Glacier Deep Archive cannot be deleted. I assume that’s Amazon reducing costs by simply not making that feature available.

Second, as with the regular Glacier storage solution, getting data back out of Glacier Deep Archive is likely to be slow and more expensive than storing it. Standard retrieval for data in Glacier is around $12/terabyte. If you need faster retrieval, you can get it by paying more.

I do plan to look closely at Glacier Deep Archive and will likely use it as a sort of backup of last resort. I already have a backup system and process, but ~$100/month for the volume of data I have is very reasonable for “if everything else gets screwed up” peace of mind.

Consumer SSD Prices and Sizes

It’s interesting to see how quickly the per-gigabyte price for SSDs continues to fall as companies begin introducing bigger and cheaper models.

Back in February 2018, I bought a couple of 2TB SSDs for some new laptops for about $500 each. Today, ten months later, those SSDs can be had on Amazon for $290, a 42 percent price drop in less than a year.

Meanwhile, Samsung recently announced consumer-level QLC SSDs in 1TB/2TB/4TB capacities that will initially retail for $149.99, $299.99, and $599.99, respectively.

Aside from the relatively low prices, one of the interesting things about the QLC drives is their write endurance,

The 860 QVO, from the box, is given a write endurance rating equivalent to 0.3 Drive Writes Per Day (DWPD), which even for the 1TB means 300GB a day, every day, which goes above and beyond most consumer workloads.
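Just to spell out the arithmetic behind a DWPD rating, here is a quick sketch; the three-year warranty period used for the total-bytes-written figure is my assumption, not something stated in the quote.

```python
# Convert a drive-writes-per-day (DWPD) endurance rating into writes per day
# and total terabytes written (TBW) over an assumed warranty period.
def endurance(capacity_gb: float, dwpd: float = 0.3, warranty_years: float = 3):
    gb_per_day = capacity_gb * dwpd
    tbw = gb_per_day * 365 * warranty_years / 1000
    return gb_per_day, tbw

for capacity in (1000, 2000, 4000):          # the 1TB / 2TB / 4TB models
    gb_day, tbw = endurance(capacity)
    print(f"{capacity} GB drive: {gb_day:.0f} GB/day, {tbw:.1f} TBW over 3 years")
# 1000 GB drive: 300 GB/day, 328.5 TBW over 3 years
# 2000 GB drive: 600 GB/day, 657.0 TBW over 3 years
# 4000 GB drive: 1200 GB/day, 1314.0 TBW over 3 years
```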

Better drives, larger capacities and cheaper storage prices. What’s not to love?

Backblaze on Data Reliability/Durability with Its Cloud Storage Service

Backblaze recently published an in-depth look at how durable/reliable the data stored with its service is, i.e., what are the odds that you’ll want to retrieve a specific set of data from the service only to find that you won’t be able to?

At the end of the day, the technical answer is “11 nines.” That’s 99.999999999%. Conceptually, if you store 1 million objects in B2 for 10 million years, you would expect to lose 1 file. There’s a higher likelihood of an asteroid destroying Earth within a million years, but that is something we’ll get to at the end of the post.

. . .

When you send us a file or object, it is actually broken up into 20 pieces (“shards”). The shards overlap so that the original file can be reconstructed from any combination of any 17 of the original 20 pieces. We then store those pieces on different drives that sit in different physical places (we call those 20 drives a “tome”) to minimize the possibility of data loss. When one drive fails, we have processes in place to “rebuild” the data for that drive. So, to lose a file, we have to have four drives fail before we had a chance to rebuild the first one.



The analysis then goes on to present a lot of math related to the time it takes for Backblaze to rebuild any data lost and its overall drive failure rate, but the general thrust is that it is extremely unlikely that Backblaze would ever suffer data loss from normal technical failures.
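To give a flavor of that math without reproducing it, here is a deliberately simplified toy model (my own sketch, not Backblaze’s actual calculation): assume each of the 20 drives in a tome fails independently at some annualized rate, and ask how likely it is that four or more fail within a single rebuild window. The 2 percent failure rate and 7-day rebuild window below are illustrative guesses.

```python
# Toy durability model (NOT Backblaze's actual math). With 17-of-20 erasure
# coding, a file in a tome is lost only if 4 or more of its 20 drives fail
# before the first failed drive can be rebuilt.
from math import comb

def p_tome_loss(n: int = 20, needed: int = 17,
                annual_failure_rate: float = 0.02, rebuild_days: float = 7) -> float:
    p = annual_failure_rate * rebuild_days / 365   # P(one drive fails in the window)
    tolerable = n - needed                         # up to 3 failures are survivable
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(tolerable + 1, n + 1))

print(f"P(loss in one rebuild window) ≈ {p_tome_loss():.1e}")   # ≈ 1.0e-10
```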

But at some point, we all start sounding like the guitar player for Spinal Tap. Yes, our nines go to 11. Where is that point? That’s open for debate. But somewhere around the 8th nine we start moving from practical to purely academic. Why? Because at these probability levels, it’s far more likely that:

  • An armed conflict takes out data center(s).
  • Earthquakes / floods / pests / or other events known as “Acts of God” destroy multiple data centers.
  • There’s a prolonged billing problem and your account data is deleted.

There is something interesting about the odd way Backblaze concludes its analysis, however,

Eleven years in and counting, with over 600 petabytes of data stored from customers across 160 countries, and well over 30 billion files restored, we confidently state that our system has scaled successfully and is reliable. The numbers bear it out and the experiences of our customers prove it.

Note that this doesn’t say that they’ve never come across a file they were unable to restore due to technical, backend reasons (rather than issues related to customer credit cards, etc.).

Thoughts on Using Backblaze After A Month

Back in early March I decided to look into off-site backup of my data drives using either Crashplan or Backblaze. For the most part I’ve ignored online backup services, mainly because of the large volume of data I maintain/back up for personal use, which is currently approaching 60 terabytes. Along with storage costs, the sheer amount of time it would take to upload that much data is ridiculous, so I hadn’t really given much thought to online backups.

Someone I know (with a lot less data) was using Crashplan, however, and I figured that for the low monthly cost it wouldn’t hurt to at least check it out. I did not like Crashplan. Not one bit. Pretty much everything about Crashplan was confusing, from its terms of use all the way up to its uploading client. I did pay for an initial one-month subscription, but after about a week I realized Crashplan simply would never work for my needs and canceled.

So I decided to give Backblaze a try. There are some things I do not like about Backblaze, but overall I have been very pleased with it in the intervening month and felt good enough about it to pay for a year’s subscription.

To get things started, I hooked up a nearly full Seagate 8 terabyte hard drive to my main computer using an external dock. I already have that hard drive backed up locally, so I’m only relying on Backblaze as an option in case both the original and all backup copies of the drive should fail.

Don’t Rely on Backblaze for Your Only Backup

A lot of horror stories I read online from users of both Crashplan and Backblaze made it clear that they were using these services as their only method of backup. In several cases, users got burned when they backed up their data to either service prior to reformatting or destroying a hard drive, only to find that their data was unavailable or unrecoverable (or only recoverable after extraordinary measures were taken).

This, in a word, is crazy. For $50/year I wouldn’t use these sorts of services as anything but a backup of last resort. On the one hand, I’d put the odds of actually being able to recover my data from Backblaze if needed at 50/50. On the other hand, it’s only $50/year; it’s like the extra disability insurance I pay for through my workplace but have never bothered to actually track down the details about. Maybe it will help, maybe it won’t, but it’s so cheap that there’s no reason not to carry it.

If I do need to retrieve the data, however, it is reassuring that Backblaze will let me pay them to copy my data to a hard drive and then ship that hard drive to me, whereas with Crashplan my only option would have been to download the data (and there were plenty of reports of that not working so well).

Uploading Terabytes of Data

The second problem that a lot of users reported was the length of time it took to upload large volumes of data. In some cases this was just users not understanding how the technology works. No, Mr. Clueless, you’re not going to be able to upload 1 terabyte of data to an online service over a weekend on a DSL modem. That just isn’t going to happen.

But other users complained of slowness in general. My experience was that Crashplan was slow as hell, significantly slower than Backblaze. I’m on a cable service that has 60 Mbps down and 7 Mbps up (and no bandwidth cap). With Backblaze I was able to upload a little over 1 terabyte in the first month, which was very reasonable in my experience. This is where you really start to notice the ridiculously slow Internet speeds that most of us in the United States have to endure, but that’s a much bigger problem and obviously nothing Backblaze can do anything about.
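The arithmetic on why bulk uploads take so long is straightforward. Here is a quick sketch using nominal line rates; real-world throughput will be lower once you account for protocol overhead and client throttling.

```python
# How long does it take to push a given amount of data through an upstream link?
def upload_days(terabytes: float, upstream_mbps: float) -> float:
    bits = terabytes * 1e12 * 8                 # decimal terabytes -> bits
    seconds = bits / (upstream_mbps * 1e6)      # megabits per second -> bits per second
    return seconds / 86400

print(f"1 TB at 7 Mbps:  ~{upload_days(1, 7):.1f} days")    # ~13.2 days
print(f"60 TB at 7 Mbps: ~{upload_days(60, 7):.0f} days")   # ~794 days
```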

Encryption . . . Sort Of

An absolute necessity for me was being able to encrypt my data independently of either Backblaze or Crashplan. Both services allowed me to use a private encryption phrase so that no one but me, in theory, would be able to unencrypt my data. However (there’s always some sort of “however”), the way these services handle restoring data is that you have to supply your private key to Backblaze, for example, which then uses it to decrypt the files and make them available to you,

However, if you lose a file, you have to sign into the Backblaze website and provide your passphrase which is ONLY STORED IN RAM for a few seconds and your file is decrypted. Yes, you are now in a “vulnerable state” until you download then “delete” the restore at which point you are back to a secure state.
If you are even more worried about the privacy of your data, we highly recommend you encrypt it EVEN BEFORE BACKBLAZE READS IT on your laptop! Use TrueCrypt. Backblaze backs up the TrueCrypt encrypted bundle having no idea at all what is in it (thank goodness) and you restore the TrueCrypted bundle to yourself later.

Ugh. It would be much better to simply ship me an encrypted blob along with a utility to unencrypt the data locally. This process completely misses the point of why users want a private encryption key. (Crashplan appears to use the same sort of process: decrypting in the cloud and then downloading the unencrypted file over SSL to your hard drive.) All you’re really doing, then, is limiting the window of time during which Backblaze employees (and anyone who has infiltrated their network) have access to your unencrypted data.
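The only way to make the data truly opaque to the provider is what the quoted advice suggests: encrypt it yourself before the backup client ever sees it. As a minimal illustration of that idea (just a sketch, with hypothetical file names, and not a substitute for a vetted container tool), here is what that looks like using Python’s cryptography package:

```python
# Client-side encryption sketch: encrypt a file locally, let the backup client
# see only the ciphertext, and keep the key somewhere that is never backed up
# to the same service. (Illustration only; large files would need streaming,
# and in practice a vetted container/volume tool is a better choice.)
from cryptography.fernet import Fernet   # pip install cryptography

key = Fernet.generate_key()              # store this key OUTSIDE the backup set
with open("backup.key", "wb") as f:
    f.write(key)

with open("photos-2018.tar", "rb") as f:          # hypothetical file to protect
    ciphertext = Fernet(key).encrypt(f.read())

with open("photos-2018.tar.enc", "wb") as f:      # this is all the service ever sees
    f.write(ciphertext)
```

Restoring then means downloading the encrypted blob and decrypting it locally with the saved key; the provider never has to see the passphrase at all.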

Summary

As I said before, I would never rely on this sort of service as anything but a last resort. Losing all of my data and having to decide whether I really want to trust Backblaze, even temporarily, with an unencrypted copy of it is still better than simply losing all of my data with no other options (for $50/year, that is; if it cost, say, $200/year, I might have a different view). For me, using Backblaze was a no-brainer given the range of available backup options and costs.

Storing Hundreds of Terabytes of Data for Billions of Years

The University of Southampton issued a press release recently highlighting progress that scientists there have made on creating digital storage methods that could potentially survive for billions of years.

Using nanostructured glass, scientists from the University’s Optoelectronics Research Centre (ORC) have developed the recording and retrieval processes of five dimensional (5D) digital data by femtosecond laser writing.

The storage allows unprecedented properties including 360 TB/disc data capacity, thermal stability up to 1,000°C and virtually unlimited lifetime at room temperature (13.8 billion years at 190°C) opening a new era of eternal data archiving. As a very stable and safe form of portable memory, the technology could be highly useful for organisations with big archives, such as national archives, museums and libraries, to preserve their information and records.

The Optoelectronics Research Centre has posted a short video on YouTube showing data being written to such a glass disc using a femtosecond laser writing system.

A few years ago, Hitachi was supposedly working on a glass-based data storage system that also etched data onto glass with a laser, although at much lower densities than the Southampton researchers are aiming for,

The company’s main research lab has developed a way to etch digital patterns into robust quartz glass with a laser at a data density that is better than compact discs, then read it using an optical microscope. The data is etched at four different layers in the glass using different focal points of the laser.

. . .

Hitachi said the new technology will be suitable for storing “historically important items such as cultural artifacts and public documents, as well as data that individuals want to leave for posterity.”

. . .

Hitachi has succeeded at storing data 40MB per square inch, above the record for CDs, which is 35MB.

Hitachi has mentioned its glass-based research several times since that 2012 announcement, but as far as I know has not shipped anything (probably due to the relatively low data density). In 2014, Hitachi announced it had developed a system that could reliably read/write to a 100-layer glass disc.

These glass-based systems remind me of science fiction writer Charles Stross’s idea of using synthetic diamond to store immense amounts of data,

My model of a long term high volume data storage medium is a synthetic diamond. Carbon occurs in a variety of isotopes, and the commonest stable ones are carbon-12 and carbon-13, occurring in roughly equal abundance. We can speculate that if molecular nanotechnology as described by, among others, Eric Drexler, is possible, we can build a device that will create a diamond, one layer at a time, atom by atom, by stacking individual atoms — and with enough discrimination to stack carbon-12 and carbon-13, we’ve got a tool for writing memory diamond. Memory diamond is quite simple: at any given position in the rigid carbon lattice, a carbon-12 followed by a carbon-13 means zero, and a carbon-13 followed by a carbon-12 means one. To rewrite a zero to a one, you swap the positions of the two atoms, and vice versa.

It’s hard, it’s very stable, and it’s very dense. How much data does it store, in practical terms?

The capacity of memory diamond storage is of the order of Avogadro’s number of bits per two molar weights. For diamond, that works out at 6.022 x 10²³ bits per 25 grams. So going back to my earlier figure for the combined lifelog data streams of everyone in Germany — twenty five grams of memory diamond would store six years’ worth of data.

Six hundred grams of this material would be enough to store lifelogs for everyone on the planet (at an average population of, say, eight billion people) for a year. Sixty kilograms can store a lifelog for the entire human species for a century.

In more familiar terms: by the best estimate I can track down, in 2003 we as a species recorded 2500 petabytes — 2.5 x 10¹⁸ bytes — of data. That’s almost ten milligrams. The Google cluster, as of mid-2006, was estimated to have 4 petabytes of RAM. In memory diamond, you’d need a microscope to see it.

So, it’s reasonable to conclude that we’re not going to run out of storage any time soon.
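Just to restate the quoted density figure in more familiar units, here is a quick conversion using only the numbers from the excerpt above:

```python
# Memory diamond, per the excerpt: ~Avogadro's number of bits per two molar
# weights of carbon, i.e. roughly 25 grams.
AVOGADRO_BITS = 6.022e23
bytes_per_25_g = AVOGADRO_BITS / 8
print(f"25 g of memory diamond ≈ {bytes_per_25_g / 1e21:.0f} zettabytes")   # ≈ 75 zettabytes
```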

Faster, please.