We Don’t Need to Keep the Web Relevant (But Archiving It Is A Good Idea)

Cyveillance released a report yesterday
claiming that its study of the Internet suggests there are a little more than
2 billion public web pages, and that the number is growing at a rate of
2 billion pages per year (i.e., the size of the web will double this year).

Those are some pretty amazing numbers. To put them in perspective, a web researcher
quoted in USA Today says that is about as much text as is available in
the entire Library of Congress collection.

But USA Today could not help but get a dig in at the web in its story,
tracking down consultant Tim Bajarin of Creative Strategies, who said the
problem is keeping the web relevant: “Creating a web page now is a piece of cake.
Even my dog has a Web page. A lot of people’s dogs have web pages.”

Apparently the conclusion most people are supposed to reach is that a
web page featuring a dog is simply not relevant, more noise than signal. But
this rests on an erroneous assumption: that information is only useful if it
is homogeneous and intended for consumption by millions of readers.

Consider books. For the past few weeks, USA Today pretty much turned its
entertainment section into an advertorial for the latest Harry Potter book. Now,
I love the Harry Potter series, but USA Today devoted so much attention to the
book not because of any particular merit it has (lots of very good children’s
literature is published every year), but because millions of people are
interested in it.

It is extremely unlikely that USA Today will ever review a small self-published
book with a print run of 2,500 copies, no matter how good it is. To USA Today’s
publishers, and to most of its readers, such a review would not have much relevance.

The web is different because providing information there costs so little that
assembling and publishing it for small, heterogeneous audiences is easy.
You might not care at all, for example, that Yahoo! lists hundreds of web sites
devoted to paintball, or that I have hundreds of pictures of my daughter on my
web site. But to the people who are interested in paintball or my daughter,
such information is very important.

So, I say, bring on the dog home pages.

On the other hand, it is good to know that someone is trying to collect all
(or as much as possible) of the information on the Internet. The
Internet Archive has collected 1 billion pages since 1996.
The archive is stored on magnetic tape, currently holds 13.2 terabytes
of data, and as of March 2000 was growing by about 2 terabytes a month.

Let’s do the math. A terabyte is equal to 1,024 gigabytes, so the archive comes to
roughly 13,517 gigabytes. The largest consumer-level hard drives available at the
moment are in the 75-gigabyte range, so if I wanted my own local copy of the archive
(and who wouldn’t?) it would take 181 of these 75-gig hard drives. In other words,
all I have to do is wait 5 or 6 years for drive capacities to catch up, and I will be
able to install a local copy on my LAN. Now that would be cool.
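The drive-count arithmetic is easy to check with a few lines of Python. This is a minimal sketch, not anything published by the Internet Archive: the function names are hypothetical, it uses the 1,024-gigabytes-per-terabyte convention and 75-gigabyte drives from the paragraph above, and the growth projection simply assumes the March 2000 rate of 2 terabytes a month stays constant. At 13.2 terabytes (about 13,517 gigabytes), the count rounds up to 181 drives.

```python
import math

GB_PER_TB = 1024          # binary convention used in the post
ARCHIVE_TB = 13.2         # Internet Archive size as of March 2000
GROWTH_TB_PER_MONTH = 2   # reported growth rate
DRIVE_GB = 75             # largest consumer drive size mentioned

def drives_needed(archive_tb: float, drive_gb: float = DRIVE_GB) -> int:
    """Whole drives required to hold archive_tb terabytes (partial drives round up)."""
    total_gb = archive_tb * GB_PER_TB
    return math.ceil(total_gb / drive_gb)

def archive_size_after(months: int) -> float:
    """Projected archive size in TB, ASSUMING growth stays linear at the March 2000 rate."""
    return ARCHIVE_TB + GROWTH_TB_PER_MONTH * months

if __name__ == "__main__":
    print(drives_needed(ARCHIVE_TB))                  # drives for today's archive
    print(drives_needed(archive_size_after(12)))      # drives a year from now
```

Running the one-year projection shows why the target keeps moving: at a linearly projected 37.2 terabytes, the same 75-gigabyte drives would number 508.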
