Why Do Websites like Boing! Boing! Collect So Much Data?

This exchange between Greg Yardley of Pinch Media and Joel Johnson of Boing! Boing! highlights a fundamental hypocrisy about data collection and raises the question of why so many websites think they need to collect so much data about visitors while making it so hard for ordinary users to figure out what is being collected.

Yardley is a co-founder of Pinch Media, which makes spyware that is then baked into iPhone apps. When you use such an app, it gathers information about you and transmits it back to Pinch Media. Johnson highlighted this, but Yardley responded that what his company does is no different from what Boing! Boing! does,

Here’s what Boing Boing is running right now, right when I loaded this page:

Google Analytics
Quantcast
Federated Media
HitTail
Doubleclick
Google Custom Search Engine
Tribal Fusion
Six Apart Advertising
Adify
Chitika
AWStats

That’s no fewer than eleven different services that started tracking information about me without my consent. Most (not all) of these services track users across every domain where their code is placed, constructing a profile that’s then used for ad targeting. Some of these services go out of their way to circumvent user attempts to safeguard their privacy. A couple, for instance, store information in the much lesser-known – and rarely deleted – Local Shared Objects that come along with Flash, and have been known to use this information to ‘recreate’ user cookies after they’ve specifically been deleted. A couple more combine the information they’ve gathered about you here with information they’ve pulled in from social networks (where you’re also tracked) to work up a complete demographic profile for targeting. Some of these probably don’t even have a direct relationship with Boing Boing, but are served by other ad networks doing backfill – you could get a different set of trackers, potentially even more invasive, the next time you reload the page.

I didn’t consent to any of the tracking Boing Boing does – there’s no terms of service or privacy policy that pops up on first entry. Even if there *was*, by the time I got here, it’d be too late. If we went by the first commenter’s standards, Boing Boing’s running eleven different pieces of spyware.
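The cookie "recreation" Yardley mentions is mechanically simple. Below is a toy Python model of it: the browser has two independent stores, and a tracker writes the same visitor ID to both so that either one can restore the other. This is a sketch under stated assumptions, not the code of any service in his list. The `tracker.example` domain, the `Browser` class, and `tracker_visit` are all hypothetical.

```python
import uuid


class Browser:
    """Toy model of a browser with two independent client-side stores."""

    def __init__(self):
        self.cookies = {}    # ordinary HTTP cookies (what users usually clear)
        self.flash_lso = {}  # Flash Local Shared Objects (rarely cleared)

    def clear_cookies(self):
        # Clearing cookies leaves the LSO store untouched.
        self.cookies.clear()


def tracker_visit(browser, tracker="tracker.example"):
    """Hypothetical tracker logic: reuse an existing ID or 'respawn' it from the LSO."""
    visitor_id = browser.cookies.get(tracker) or browser.flash_lso.get(tracker)
    if visitor_id is None:
        visitor_id = uuid.uuid4().hex  # a genuinely new visitor
    # Write the ID back to both stores so that either one can restore the other.
    browser.cookies[tracker] = visitor_id
    browser.flash_lso[tracker] = visitor_id
    return visitor_id
```

If you call `tracker_visit` once, then `clear_cookies()`, then `tracker_visit` again, you get back the same ID. Deleting the cookie accomplished nothing, because the copy stored in the LSO survived.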

The weird thing is that Johnson’s response is extremely weak,

And as far as Boing Boing‘s tracking and analytics goes, I can’t really argue against his general point. It’s useful for me as a writer and small businessman to have some basic stats (tracking pageviews to understand what sort of articles readers find compelling, for instance), and I think most people understand that a baseline of metrics is par for the course on commercial sites, but I hate the amount of tracking the comes out of the ad networks, too, and it only seems to be getting worse. There’s rarely more perfidious Javascript than that coded by an ad network programmer.

First, I think he’s totally wrong that most people understand a baseline of metrics is par for the course. Most people don’t have a clue just how much data the typical website is gathering about them. If you started talking to them about “baseline metrics” as Johnson does, their eyes would glaze over.

But even assuming that is true, so what? Saying that tracking is useful and that most people have come to expect it sounds like the sort of weasel words we’d hear from any industry trying to cover its ass.

Johnson continues, and here’s where he really goes off the rails,

But there’s one difference between web-based tracking and the sort of analytics that Pinch Media gathers on the iPhone: it’s pretty simple to figure out what stats tracking occurs between a web site and a browser on a computer, as Yardley shows; it’s much more difficult to discern—or even be aware of—tracking that occurs in a closed system like the iPhone. And it’s not FUD to point it out so users can make their own decision.

That is a complete crock of shit. It is, in fact, extremely difficult for most people to figure out what is going on when they visit a website. I know pretty much what Boing! Boing! is doing in the background because I run Adblock and NoScript and can quickly look at all of the stuff Yardley points out.

The secretary down the hall has no clue. Moreover, my experience has been that once you show people what is going on and they understand it, they don't feel empowered. Instead they resign themselves to going along with the system, because they have little choice to do otherwise.

I can quickly right click on the NoScript button and enable the Flash movie that I want to see but that it blocked. The secretary has better things to do than spend all of her time trying to guess which script on the page is serving up necessary content and which is going to rat her out to some other server.
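Here is what the "simple" check Johnson describes looks like in practice. The minimal Python sketch below lists every host other than the page's own that serves a `<script>` tag. The function names, sample markup, and URLs are illustrative assumptions. The sketch also has clear limits. It only sees scripts written directly into the HTML, so trackers injected later by ad-network backfill never show up. It also counts the site's own subdomains as third parties.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag on a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are skipped
                self.srcs.append(src)


def third_party_script_hosts(html, page_url):
    """Return the sorted hosts, other than the page's own, that serve scripts."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    own_host = urlparse(page_url).hostname
    # urljoin resolves relative ("/js/x.js") and protocol-relative ("//host/x.js") URLs.
    hosts = {urlparse(urljoin(page_url, src)).hostname for src in parser.srcs}
    return sorted(h for h in hosts if h and h != own_host)
```

Even this much requires saving the page source and running a script. That is exactly the kind of step the secretary down the hall is never going to take.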

And before anyone beats me to it, I do run two services here: Google Ads and WordPress.com stats. Google Ads because I’m a greedy bastard, and WordPress.com stats because I wanted basic stats tracking without the overkill that is Google Analytics. I’m not prepared to defend either one as motivated by anything other than crass self-interest.

Facebook Is Evil, But People Are Stupid

The flare-up over Facebook’s recent terms-of-service change has been interesting, but even some of the people protesting the change on Facebook seem clueless about what they’re actually doing when they post things on Facebook.

The big change that people apparently got worked up about was Facebook granting itself a license in perpetuity to content that users posted on the site. On that point I can actually see where Facebook is coming from. If I delete my Facebook account, I’d ideally want all of my activity there deleted too. But that’s going to create all sorts of problems over the long term.

For example, I’ve got a few moderately successful Pages at Facebook that are devoted to different topics and individuals. Are those Pages my property or Facebook’s? If I want to delete my account, should this also delete the public page I created devoted to Negativeland? I’m not so sure.

Regardless, though, I suspect a lot of users completely misunderstand where the real threat is: other users (i.e., all of those “friends” they’re collecting and showing content to).

NBC, for example, featured a brief soundbite from someone it billed as an organizer of one of the anti-TOS groups on Facebook. His argument against the new TOS was that, well, what if he wants to run for President a decade from now and Facebook, rather than he, controls all of his posts and pictures?

If you’re running for President a decade from now, of course that stuff is going to show up on the nightly news. But it’s not Facebook you need to worry about. It’s your friends, who are saving copies of that crazy picture of you stoned out of your mind at a party (ask Michael Phelps). As we’ve learned repeatedly in the digital age, once something gets posted in a public or semi-public area on the Internet, it rarely ever disappears completely.

I’m always amused, for example, by people who screw up and accidentally send out offensive e-mails. Most corporate mail systems let you “revoke” such e-mails, and most people seem to think that actually works. But there are so many different methods and clients for accessing e-mail that I suspect an e-mail is very rarely made to disappear this way. I have a number of e-mails in my archive from senders who clearly had second thoughts after hitting “send” and then hit the “revoke” button.

The problem with Facebook is that it gives a false sense of intimacy (we have “Friends” who update their “Status”), and its privacy settings require an advanced understanding of Boolean logic to use properly. Facebook wants users to think they’re at a small, friendly get-together, when in actuality Facebook is more like 55 million individual reality TV shows.

Keeping a WordPress Site Private, Part Two

Based on my previous post on the subject, there seem to be a lot of people interested in using WordPress to set up a completely private blog, whether for family members, private collaboration, or whatever. Absolute Privacy is a WordPress plugin by John Kolbert that promises to help users administer a completely private WordPress blog.

After having a few odd registrations and comments on our family blog, my wife asked me to create a plugin that would give the blog security from strangers but still be easily accessible to family and friends. Absolute Privacy does just that! Absolute Privacy turns your WordPress blog into a fully private site where you control who has access. It’s perfect for family blogs, private communities, and personal websites.

After activating the plugin your registrations are automatically protected. First, the plugin adds new fields to the registration menu which require the registrant to enter their first and last name, and to choose a password. Newly registered members are given a WordPress role created by the plugin called “Unapproved User” and although they are registered and have a password, they are unable to login until approved.

After registering, the newly registered user is sent an email reminding him/her that they will be unable to login until their account has been approved. The site administrator is sent an email with a link to quickly approve or delete the new user. Once the administrator approves the account the user is sent an email notification. To unapprove a user, simply edit their profile and change their “Role” to “Unapproved User”.

A new tab called “Moderate Users” is created in the “Users” area which allows you to quickly view and either approve or delete all unapproved users.

To prevent access to your site for non-logged in viewers, simply navigate to the Absolute Privacy tab under “Settings” and check the lock-down box. Your blog now is absolutely private, including RSS feeds.
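Taken together, the workflow described above is a small access-control model. Here it is as a minimal Python sketch, assuming a simple in-memory user table. It is not the plugin's actual PHP code, and the helper names are hypothetical. The `subscriber` role granted on approval is also an assumption, because the description doesn't say which role approved users receive.

```python
from dataclasses import dataclass

UNAPPROVED = "Unapproved User"  # role name taken from the plugin description
SUBSCRIBER = "subscriber"       # assumption: role granted on approval


@dataclass
class User:
    login: str
    role: str = UNAPPROVED  # every new registration starts out unapproved


def register(users, login):
    # The real plugin also e-mails the registrant and the administrator here.
    users[login] = User(login)
    return users[login]


def approve(user):
    user.role = SUBSCRIBER


def unapprove(user):
    # Mirrors "edit their profile and change their Role to Unapproved User".
    user.role = UNAPPROVED


def can_log_in(user):
    return user.role != UNAPPROVED


def can_view(user, lockdown):
    """With lockdown on, only approved users (never anonymous visitors) see posts or feeds."""
    if not lockdown:
        return True
    return user is not None and can_log_in(user)
```

Here `user=None` stands for an anonymous visitor. Under lockdown, that visitor and a registered-but-unapproved account are treated the same way.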

Serious Google Calendar Encryption with GnuPG

IBM’s Nathan Harrington has written an article outlining how to use a GnuPG-based Firefox extension to create encrypted events within Google Calendar. This isn’t just accessing Google Calendar securely. Event details are encrypted locally before the text is passed on to Google Calendar. Anyone who compromises your Google account would then know the times of events but would see only encrypted text for the actual event details.

That is frackin’ awesome. Now if only there were a GnuPG plugin for my BlackBerry calendar, so I could sync the events meaningfully.

TrueCrypt Deniable File System Broken

The other day, Bruce Schneier had a post about securing data for border crossings. In the comments, someone asked why not just use TrueCrypt’s deniable file system, which hides an encrypted file system inside another TrueCrypt-encrypted volume. Schneier responded that he didn’t trust TrueCrypt’s deniable file system, and today he reveals why: he and several other researchers are publishing a paper announcing that they were able to break that particular feature of TrueCrypt.

ABSTRACT: We examine the security requirements for creating a Deniable File System (DFS), and the efficacy with which the TrueCrypt disk-encryption software meets those requirements. We find that the Windows Vista operating system itself, Microsoft Word, and Google Desktop all compromise the deniability of a TrueCrypt DFS. While staged in the context of TrueCrypt, our research highlights several fundamental challenges to the creation and use of any DFS: even when the file system may be deniable in the pure, mathematical sense, we find that the environment surrounding that file system can undermine its deniability, as well as its contents. Finally, we suggest approaches for overcoming these challenges on modern operating systems like Windows.

TrueCrypt has apparently addressed many of the specific issues raised by the paper in its 6.0 release. Schneier’s argument, however, is that creating a deniable file system has inherent problems. So even though the techniques outlined in the paper will not work against TrueCrypt 6.0, its deniable file system should still be treated as untrusted. Better to go with whole-disk encryption, which gives up deniability but is more secure.

The entire paper is available as a PDF download.

Scroogle.Org

Google may not yet be evil, but it is certainly moving further and further down toward that end of the continuum with its extremely poor privacy practices in combination with the almost absurd amount of user data it appears to be logging and storing.

With that in mind, I suspect more services like Scroogle will arise to route around Google’s blasé attitude toward user privacy. Scroogle is basically a Google search proxy. You enter your search into Scroogle, and it passes the query on to Google from one of a small number of its own IP addresses, so Google never logs yours. Scroogle also intercepts the cookie that Google returns, so it never reaches you, and displays just the actual search results.

Unlike Google, which stores user-identifiable information about searches for 18 months, Scroogle promises that a) it doesn’t store search terms at all and b) it keeps logs for a maximum of 48 hours.
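The privacy-relevant part of a proxy like this comes down to two filtering steps. In the first, identifying request headers are not forwarded, so Google sees only the proxy's IP and headers. In the second, any cookie Google tries to set is dropped on the way back. The minimal Python sketch below shows both steps as pure functions. It does no networking, and the proxy's own server would open the upstream connection, which is why Google sees the proxy's IP. The header list is an assumption about what a careful proxy would strip, not a description of Scroogle's actual code.

```python
from urllib.parse import urlencode

# Assumption: request headers a privacy proxy would refuse to pass upstream.
IDENTIFYING_REQUEST_HEADERS = {"cookie", "referer", "x-forwarded-for", "user-agent"}


def build_upstream_request(query, client_headers):
    """Build the search URL plus a scrubbed copy of the client's headers."""
    url = "https://www.google.com/search?" + urlencode({"q": query})
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in IDENTIFYING_REQUEST_HEADERS
    }
    return url, headers


def scrub_response_headers(upstream_headers):
    """Drop every Set-Cookie header so the search engine's cookie never reaches the user."""
    return [(name, value) for name, value in upstream_headers if name.lower() != "set-cookie"]
```

Both functions build new collections rather than modifying their inputs, which keeps the scrubbing easy to test in isolation.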

I noticed that Daniel Brandt, whom I’ve criticized in the past for his conspiratorial ways, is listed as one of the directors of the Scroogle effort. It’s nice to see him turn his anti-Google obsession toward positive solutions.

Blizzard Announces a Physical Token for World of Warcraft Account Authentication

Theft of World of Warcraft accounts is a huge problem. The perception is that gold farmers are finding it much more lucrative to hack people’s accounts by tricking them into installing keyloggers than to use in-game bots to farm resources. There is now an entire class of trojans aimed largely at WoW players.

So Blizzard recently announced a forthcoming Authenticator product, which looks to be a rebranded RSA SecurID. The device will cost $6.50, and the user links the device’s serial number to their WoW account. From then on, when you log in, you enter your username and password, then press a button on the Authenticator. It generates a number that must be entered as well. The number is essentially a rolling one-time password, and that specific number is good for only 30-60 seconds. So someone who manages to grab all three pieces of data has a very small window in which to gain access to your account.
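The post doesn't say which algorithm the Authenticator actually uses. To show how a rolling code like this can work, here is a minimal Python sketch of the open TOTP standard (RFC 6238), which is built on HOTP (RFC 4226). It assumes the token and the server share a secret that is provisioned when the serial number is linked to the account. The function names are illustrative.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP whose counter is the number of `step`-second intervals since the epoch."""
    if at is None:
        at = time.time()
    return hotp(secret, int(at // step), digits)


def verify(secret: bytes, submitted: str, at=None, step: int = 30, window: int = 1) -> bool:
    """Accept the current interval's code, or one up to `window` intervals away (clock drift)."""
    if at is None:
        at = time.time()
    return any(
        hmac.compare_digest(totp(secret, at + d * step, step), submitted)
        for d in range(-window, window + 1)
    )
```

The `window` parameter is where the "30-60 seconds" trade-off lives. A wider window tolerates clocks that drift between the token and the server, but it also gives a thief more time to use a stolen code.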

As some have noted on WoW-related sites, this sort of scheme is still vulnerable to man-in-the-middle attacks. Think of this being used to authenticate logins to a bank website. I put my server between you and the bank. You think your data is going to the bank, but it’s really going to my server. I pass it on to the bank and then pass the bank’s response back to you. Because I relay your one-time number while it is still valid, the short window doesn’t protect you. You never even know you’ve been hacked until I’ve used that access to clean out everything.
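To make the relay concrete, here is a toy Python simulation. It shows that a one-time code stops a *delayed* replay but not a *real-time* relay. The shared secret, the `alice`/`hunter2` credentials, and the simplified HMAC-over-interval code are all hypothetical stand-ins, not Blizzard's or any bank's actual scheme.

```python
import hashlib
import hmac

SECRET = b"shared-token-secret"  # hypothetical secret shared by token and bank
STEP = 30                        # seconds each code stays valid


def token_code(t):
    """Simplified rolling code: HMAC of the current 30-second interval (illustrative only)."""
    interval = str(int(t // STEP)).encode()
    return hmac.new(SECRET, interval, hashlib.sha256).hexdigest()[:6]


def bank_login(user, password, code, t):
    """The bank accepts the login only if the code matches the interval containing t."""
    return (user == "alice" and password == "hunter2"
            and hmac.compare_digest(code, token_code(t)))


def phishing_relay(user, password, code, t):
    """The attacker's server: keep a copy of everything, then forward it to the real bank at once."""
    stolen = (user, password, code)
    return stolen, bank_login(user, password, code, t)
```

A login relayed at time `t` succeeds, because the code is still fresh. Replaying the same stolen triple ten minutes later fails. That is why a real attacker wouldn't bother replaying. They'd simply keep the authenticated session the bank hands back during the relay.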

Assuming that the Authenticator is ever owned by a large percentage of users — and I’m skeptical it will be — it will be interesting to see if the hackers turn to man-in-the-middle style attacks or simply turn their attention to an easier target.

Silly Brits and Their Child Database

This Guardian article highlights the odd way that Great Britain treats personal data. On the one hand, laws that govern what private companies can do with personal data are rather draconian compared with those in, say, the United States. On the other hand, there seems to be almost no effective limit on what European governments can compile.

In this case, the UK is creating a single database to track every child in the country. The database will give each child a unique identifying number and contain everything from the name and address of the child’s physician to information on the parents, what schools the child has attended, and so on. And only about 300,000 people nationwide will have access to the database.

The Guardian article is about a report examining the security procedures for access to the database, a report that the government refused to publish in full. Of course, the best approach would be not to create such a monstrosity in the first place. As a review of the database’s security noted, “there will always be a risk of data security incidents occurring.” And because the database is so overreaching, a single security breach threatens to expose sensitive data about literally every child in the UK. Oy.

How Long Should ISPs Preserve Customer Records?

In September, U.S. Attorney General Alberto Gonzales told lawmakers that ISPs should be legally required to preserve customer records for perhaps as long as two years. Currently there are no federal laws governing customer record retention for ISPs.

Gonzales says that the customer records must be preserved for that length of time to assist the government in cracking down on child pornography.

The problem with this, of course, is that records would be preserved on millions of completely innocent people in order to help prosecute a relatively small number of child pornography cases. In 2000-2001, according to the National Center for Missing & Exploited Children, a little over 1,700 people were arrested on child pornography-related charges.

And once that data is collected and preserved, it will inevitably be subpoenaed far and wide for everything from terrorism prosecutions to copyright infringement to anything else under the sun.

As I’ve said before, ISPs should not preserve any sort of customer traffic records for any longer than they need for technical purposes, which means a few days at most. More importantly, ISPs and web services need to be more upfront about how long they preserve customer data and under what circumstances they will provide that data to law enforcement and other entities. They also need to make that information easier to find.
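A short retention policy is cheap to enforce. The minimal Python sketch below shows a purge step an ISP could run on a schedule, keeping only records younger than a fixed window. The three-day window stands in for "a few days" and is an assumption, as are the record layout and the `purge_expired` name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "a few days" taken as three days; set this to the real technical need.
RETENTION = timedelta(days=3)


def purge_expired(records, now=None):
    """Return only records younger than RETENTION.

    Each record is a hypothetical (timestamp, customer_id, bytes_transferred) tuple
    with a timezone-aware timestamp. Anything older is meant to be deleted, not archived.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [record for record in records if now - record[0] < RETENTION]
```

Records that this step has already deleted can't be subpoenaed later.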

Sources:

Statement of Alberto R. Gonzales, Attorney General of the United States, before the Committee on Banking, Housing, and Urban Affairs, United States Senate, concerning “Combating Child Pornography by Eliminating Pornographers’ Access to the Financial Payment System.” September 19, 2006.

Child Pornography Fact Sheet. National Center for Missing & Exploited Children. Accessed September 30, 2006.

Gonzales Calls for ISP Customer Data Retention Law. RedmondMag.com, September 19, 2006.