Glacier Deep Archive

Back in 2012, Amazon announced its Amazon Glacier storage solution: cloud-based storage that was cheap, but designed for data that would need to be accessed very infrequently.

But even Glacier is expensive for some purposes. For example, I’ve got about 100 terabytes I need to back up, and even at Glacier’s low cost of $4-5/terabyte/month, that would still be ~$500/month. At that price, I might be better off buying a tape drive.

Now, Amazon has announced its Glacier Deep Archive storage solution that is designed to go after use cases like this. At a little over $1/terabyte/month, the cost of storing 100 terabytes in the cloud is approaching the cost of tape backup.
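To make the arithmetic concrete, here is a rough sketch of the monthly storage bill at the per-terabyte rates quoted above. The function name is made up for illustration, and the flat $1 rate for Deep Archive is an assumption standing in for “a little over $1”; actual pricing varies by region and changes over time, and request and retrieval fees are ignored.

```python
def monthly_storage_cost(terabytes: float, usd_per_tb_month: float) -> float:
    """Flat monthly storage cost, ignoring request and retrieval fees."""
    return terabytes * usd_per_tb_month


DATA_TB = 100  # the volume of data discussed in the post

# Regular Glacier at the quoted $4-5 per terabyte per month.
glacier_low = monthly_storage_cost(DATA_TB, 4.0)
glacier_high = monthly_storage_cost(DATA_TB, 5.0)

# Deep Archive, rounded down to $1 per terabyte per month (assumption).
deep_archive = monthly_storage_cost(DATA_TB, 1.0)

print(f"Glacier: ${glacier_low:.0f}-${glacier_high:.0f}/month")
print(f"Deep Archive: about ${deep_archive:.0f}/month")
```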

There are a few caveats, however. First, it appears that the data stored in Glacier Deep Archive cannot be deleted. I assume that’s Amazon reducing costs by simply not making that feature available.

Second, as with the regular Glacier storage solution, getting data back out of Glacier Deep Archive is likely to be slow and more expensive than storing it. Standard retrieval for data in Glacier is around $12/terabyte. If you need faster retrieval, you can do so by paying more.

I do plan to look closely at Glacier Deep Archive and will likely use it as a sort of backup of last resort. I already have a backup system and process, but $100/month for the volume of data I have is very reasonable for an “if everything else gets screwed up” peace of mind.

Thoughts on Using Backblaze After A Month

Back in early March I decided to look into off-site backup of my data drives using either Crashplan or Backblaze. For the most part I’ve ignored online backup services, mainly because of the large volume of data I maintain and back up for personal use, which is currently approaching 60 terabytes. Along with storage costs, the sheer amount of time to upload that amount of data is ridiculous and so I hadn’t really given much thought to online backups.

Someone I know (with a lot less data) was using Crashplan, however, and I figured for the low monthly cost it wouldn’t hurt to at least check it out. I did not like Crashplan. Not one bit. Pretty much everything about Crashplan was confusing, from its terms of use all the way up to its uploading client. I did pay for an initial one month subscription, but after about a week realized Crashplan simply would never work for my needs and canceled.

So I decided to give Backblaze a try. There are some things I do not like about Backblaze, but overall I have been very pleased with it in the intervening month and felt good enough about it to pay for a year’s subscription.

To get things started, I hooked up a nearly full Seagate 8 terabyte hard drive to my main computer using an external dock. I already have that hard drive backed up locally, so I’m only relying on Backblaze as an option in case both the original and all backup copies of the drive should fail.

Don’t Rely on Backblaze for Your Only Backup

A lot of horror stories I read online from users of both Crashplan and Backblaze made it clear that they were using these services as their only method of backup. In several cases, users got burned when they backed up their data to either service prior to reformatting or destroying a hard drive, only to find that their data was unavailable or unrecoverable (or only recoverable after extraordinary measures were taken).

This, in a word, is crazy. For $50/year I wouldn’t use these sorts of services as anything but a backup of last resort. On the one hand, I’d put the odds of actually being able to recover my data from Backblaze if needed at 50/50. On the other hand, it’s only $50/year. It’s like the extra disability insurance I pay through my workplace that I have never bothered to actually track down the details about. Maybe it will help, maybe it won’t, but it’s so cheap that it’s not worth not carrying.

If I do need to retrieve the data, however, it is reassuring that Backblaze will let me pay them to copy my data to a hard drive and then ship that hard drive to me, whereas with Crashplan my only option would be to download the data (and there were plenty of reports of that not working so well).

Uploading Terabytes of Data

The second problem that a lot of users reported was the long length of time it took to upload large volumes of data. In some cases this was just users not understanding how the technology works. No, Mr. Clueless, you’re not going to be able to upload 1 terabyte of data to an online service over a weekend on a DSL modem. That just isn’t going to happen.

But other users complained of slowness in general. My experience was that Crashplan was slow as hell, significantly slower than Backblaze. I’m on a cable service that has 60 Mbps down and 7 Mbps up (and no bandwidth cap). With Backblaze I was able to upload a little over 1 terabyte in the first month, which was very reasonable in my experience. This is where you really start to notice the ridiculously slow Internet speeds that most of us in the United States have to endure, but that’s a much bigger problem and obviously nothing Backblaze can do anything about.
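For a rough sense of scale, the sketch below estimates how long an upload takes on a connection like the one described above. It assumes the upstream figure is 7 megabits per second and uses decimal terabytes; the function names and the `utilization` parameter (the fraction of the uplink actually achieved) are made up for illustration, not measured values.

```python
SECONDS_PER_DAY = 86_400


def upload_days(data_tb: float, uplink_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to upload data_tb decimal terabytes at the given uplink speed."""
    total_bits = data_tb * 1e12 * 8
    bits_per_second = uplink_mbps * 1e6 * utilization
    return total_bits / bits_per_second / SECONDS_PER_DAY


def tb_per_month(uplink_mbps: float, days: float = 30, utilization: float = 1.0) -> float:
    """Decimal terabytes that fit through the uplink in the given number of days."""
    bits = uplink_mbps * 1e6 * utilization * days * SECONDS_PER_DAY
    return bits / 8 / 1e12


print(f"1 TB at 7 Mbps: {upload_days(1, 7):.1f} days")
print(f"8 TB at 7 Mbps: {upload_days(8, 7):.1f} days")
print(f"Best case in 30 days at 7 Mbps: {tb_per_month(7):.2f} TB")
```

By this estimate a fully saturated 7 Mbps uplink tops out a bit above 2 terabytes in 30 days, so a little over 1 terabyte in the first month is in the neighborhood of half the theoretical ceiling.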

Encryption . . . Sort Of

An absolute necessity for me was being able to encrypt my data independently of either Backblaze or Crashplan. Both services allowed me to use a private encryption phrase so that no one but me, in theory, would be able to decrypt my data. However (there’s always some sort of “however”), the way these services handle restoring data is that you would need to supply the private key to Backblaze, for example, which would use it to decrypt the files and then make them available to you. In Backblaze’s words,

However, if you lose a file, you have to sign into the Backblaze website and provide your passphrase which is ONLY STORED IN RAM for a few seconds and your file is decrypted. Yes, you are now in a “vulnerable state” until you download then “delete” the restore at which point you are back to a secure state.
If you are even more worried about the privacy of your data, we highly recommend you encrypt it EVEN BEFORE BACKBLAZE READS IT on your laptop! Use TrueCrypt. Backblaze backs up the TrueCrypt encrypted bundle having no idea at all what is in it (thank goodness) and you restore the TrueCrypted bundle to yourself later.

Ugh. It would be much better to simply ship me an encrypted blob along with a utility to unencrypt the data locally. This process completely misses the point of why users want a private encryption key. (Crashplan appears to use the same sort of process of decrypting in the cloud and then downloading the unencrypted file over SSL to your hard drive). All you’re really doing, then, is limiting the window of time that Backblaze employees (and anyone who has infiltrated their network) have access to your unencrypted data.

Summary

As I said before, I would never rely on this sort of service as anything but a last resort. Losing all of my data and having to wonder if I really want to trust Backblaze even temporarily with an unencrypted copy of my data is still better than simply losing all of my data with no other options (for $50/year, that is. If it cost, say $200/year, I might have a different view). For me, using Backblaze was a no-brainer given the range of available backup options and costs.

SMS Backup+ Android App

Recently the Android app I had been using to back up my SMS/MMS messages became unsupported, so I looked around for an alternative.

At the moment, it looks like SMS Backup+ is the best tool for the job.

It will back up SMS, MMS attachments, and the call log to Gmail and assign a label to them. It also has a range of other settings, like syncing the call log with Google Calendar.

And, on top of that, the app is free and ad-free (I did send a $5 donation as the app was easily worth that much to me).

Amazon Glacier

Amazon Glacier is the cheaper, slower cousin to Amazon’s S3 storage. Whereas S3 currently costs US$0.095 per gigabyte per month, Glacier is a mere US$0.01 per gigabyte per month.

The tradeoff for the lower cost is that Glacier is effectively offline storage. If you want to download the data you have stored, you have to request that Glacier retrieve the data and make it available for download, and fulfilling that request “typically” takes 3-5 hours according to Amazon.

Since the expectation is that Glacier data will only be accessed infrequently, there is also a US$0.12 per gigabyte charge to download more than a nominal 1 gigabyte per month.

So, storing 1 terabyte of data with Glacier will cost you roughly $10/month, but if you ever want to download it all in a month, that would run you $120.
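Here is a small sketch of that arithmetic using the per-gigabyte prices quoted above. The constant and function names are made up for illustration, a terabyte is assumed to be 1,000 gigabytes, and the prices are the ones in this post, not necessarily what Amazon charges today.

```python
GB_PER_TB = 1000  # decimal terabyte (assumption)

S3_USD_PER_GB_MONTH = 0.095
GLACIER_USD_PER_GB_MONTH = 0.01
GLACIER_DOWNLOAD_USD_PER_GB = 0.12
FREE_DOWNLOAD_GB_PER_MONTH = 1  # the "nominal" free allowance


def storage_cost(terabytes: float, usd_per_gb_month: float) -> float:
    """Monthly storage cost for the given volume at a per-gigabyte rate."""
    return terabytes * GB_PER_TB * usd_per_gb_month


def glacier_download_cost(terabytes: float) -> float:
    """Cost of downloading the given volume from Glacier within one month."""
    billable_gb = max(terabytes * GB_PER_TB - FREE_DOWNLOAD_GB_PER_MONTH, 0)
    return billable_gb * GLACIER_DOWNLOAD_USD_PER_GB


print(f"S3, 1 TB stored:      ${storage_cost(1, S3_USD_PER_GB_MONTH):.2f}/month")
print(f"Glacier, 1 TB stored: ${storage_cost(1, GLACIER_USD_PER_GB_MONTH):.2f}/month")
print(f"Glacier, 1 TB download in one month: ${glacier_download_cost(1):.2f}")
```

The download figure comes out a few cents under $120 because the first gigabyte is free, which is why “roughly” is doing some work in the numbers above.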

Where something like Glacier shines is in long-term backups. For example, I have a 3 terabyte drive that stores all of my personal data. I have a couple of extra hard disks that I use to create local backups and store at various locations.

I used to use Amazon’s S3 as an online backup repository, but as I got closer to having 1 terabyte stored there, the cost became prohibitive and I ended up deleting it. But using something like Glacier, I could store 3 terabytes online for $30/month. The limitations on accessing the data really don’t concern me, since what I’m looking for is an offsite repository to store my data in case I experience a catastrophic failure with my local backups.

There are just two challenges: uploading the data to Glacier and protecting it adequately.

I’m primarily a Windows user, and have had a lot of success with FastGlacier, a freeware Windows tool designed to make it easier to upload data to Glacier and keep Glacier and local data in sync.

Glacier has a number of complications that S3 does not, and a program like FastGlacier helps smooth out some of the rough edges for those of us who just want to get our data into Glacier.

Protecting that data is another matter. Amazon encrypts the data that is uploaded to Glacier, but it is encrypted in a way that Amazon itself can decrypt. So if Amazon were hacked, for example, there is the potential that the keys to unlocking any data stored on Glacier (or S3) could be compromised.

It is absolutely crucial that any data intended for long-term storage be encrypted client-side by the person doing the uploading. Again, since I am primarily a Windows user I use the open source Gpg4Win to encrypt all of my files before I upload them to Glacier. Gpg4Win adds a GpgEX option in the file manager’s context menu so that it is relatively easy to encrypt specific files or entire directories.
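For anyone who would rather script this step than click through the context menu, here is a minimal sketch that shells out to the `gpg` command-line program, which Gpg4Win installs alongside its graphical tools. The helper names and the recipient address are hypothetical, and the sketch assumes `gpg` is on the PATH and that a public key for that recipient is already in the keyring. It handles single files only, so a directory would need to be archived first.

```python
import subprocess
from pathlib import Path


def build_encrypt_command(source: Path, recipient: str) -> list[str]:
    """Build a gpg command that encrypts source into '<source>.gpg' for recipient."""
    output = source.with_name(source.name + ".gpg")
    return [
        "gpg",
        "--output", str(output),
        "--encrypt",
        "--recipient", recipient,
        str(source),
    ]


def encrypt_file(source: Path, recipient: str) -> Path:
    """Run gpg on source and return the encrypted file's path; raises if gpg fails."""
    subprocess.run(build_encrypt_command(source, recipient), check=True)
    return source.with_name(source.name + ".gpg")


# Hypothetical usage, with a made-up file name and recipient:
# encrypt_file(Path("archive.zip"), "me@example.com")
```

Only the resulting `.gpg` file would then be handed to the upload tool, so the service never sees the plaintext or the private key.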

Backups, Backups, Backups

I knew someone who several years ago wrote a book … on his laptop … for a year … and never backed it up or retained a print copy. You can probably guess what happened next.

Almost as bad are these folks who relied on a cloud-based company to store backups of episodes of the children’s show they produced. One malicious employee later, and (per the Register),

CyberLynk had fired an employee called Michael Scott Jewson and, according to a Honolulu courthouse news report, one month after being given the boot, Jewson accessed CyberLynk servers and wiped out 304GB of data, including 14 Zodiac Island episodes, a full season of the show.

The Zodiac Island producers were based in Hawaii, and CyberLynk in Wisconsin. A cloud-based service is probably a very good solution for a television production team to share assets among dispersed groups all working on a television show, but as a primary backup as well? Seriously?

Especially considering the small size of the dataset involved. Local backup of 304GB would have been dirt cheap. Having a cloud-based backup for convenience or as an alternative in case of a local disaster is a good idea, but I can’t see ever giving up local backups entirely unless the dataset is too large to do so meaningfully (if they were dealing with 100s of terabytes, then maybe I’d understand why they weren’t doing local backups as well, but 304GB…puhleeze).