Will Amount of Data Worldwide Double Every 11 Hours?

IBM has an interesting report (PDF) claiming that by 2010 the amount of data in the world may double every 11 hours.
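
To get a feel for what "doubling every 11 hours" would mean if it were sustained, here is a minimal back-of-the-envelope sketch. The function name `growth_factor` and the time spans are my own illustrative choices, not anything from the IBM report, and it assumes a perfectly constant doubling period, which the report does not necessarily claim.

```python
# Back-of-the-envelope sketch: growth implied by a fixed doubling period.
# Assumption: the doubling period stays constant over the whole interval.

def growth_factor(doubling_hours: float, elapsed_hours: float) -> float:
    """Return how many times larger the data set is after elapsed_hours."""
    return 2.0 ** (elapsed_hours / doubling_hours)


if __name__ == "__main__":
    doubling_hours = 11.0  # the figure quoted from the IBM report

    per_day = growth_factor(doubling_hours, 24.0)
    per_week = growth_factor(doubling_hours, 24.0 * 7)
    doublings_per_year = (24.0 * 365) / doubling_hours

    print(f"Growth over one day:  about {per_day:.2f}x")
    print(f"Growth over one week: about {per_week:,.0f}x")
    print(f"Doublings in a year:  about {doublings_per_year:.0f}")
```

In other words, a steady 11-hour doubling period would mean the total more than quadruples every day, which is why a figure like this is best read as a snapshot of a growth rate rather than something that holds for years on end.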

How is that possible? The thing is that every so often some new processor or hard drive is released and some idiot like John Dvorak comes along and says, “no one will ever really need a processor this fast or a hard drive that large.”

What the idiots miss is that ever larger hard drives and faster processors change what can be measured and captured. As the IBM paper notes, whereas once an aerospace company might have been happy with a simulation that generated megabytes of data, once it becomes affordable to do so the same company will quickly move to more sophisticated methods that generate terabytes of data.

And this affects even non-scientific areas, according to the IBM report:

Typical of the data challenge facing the financial services industry is
the practice of quantitative analysis – mathematical modelling of how
a particular security, a complex trade, or an entire market will behave
in the future. A key input is the historic price of an asset, and it is not
uncommon to use 20 years’ worth of such information. Originally the
analysts looked at daily data sets – opening and closing prices plus daily
volumes – running to several gigabytes in size. Now they need to work
with the price and volume for each and every trade of a particular stock
over a number of years, and the data sets have reached the terabytes.

Another recent report claims that 11 percent of Americans have more than 10,000 digital photos. Digital photography is a great example of the way that technology moves the chains. I remember the days when I would visit my in-laws over Christmas and be happy to bring 10 rolls of 36-shot film for an entire week. This past Christmas holiday, I was taking that many pictures per day.

Once upon a time, I was amazed that I could buy a 250 MB hard drive. Today, I have about 20 terabytes' worth of personal data archived on DVDs in my basement, with that increasing by about 1 terabyte a month.

The IBM report, however, veers off into treating this acceleration of data collection as a negative thing, portraying organizations as already drowning in excess data (and facing the difficult decision of which data to retain, especially in light of legal requirements for data retention). And, not surprisingly, for a price IBM has the solution in the form of what it calls “information lifecycle management” (lifecycles? Like Tron?).

That part seemed to be overblown and almost pure marketing bullshit. Yes, the amount of data collected is increasing exponentially, but so are the hardware and software solutions to deal with such increasingly large volumes of data. IBM almost entirely ignores the possibility that larger, more refined data sets combined with more sophisticated ways of looking at that data may result in process and other improvements that outweigh the collection and storage costs.
