Jason Kottke decides to start a meme, and Cory Doctorow can’t help but jump on the bandwagon, about the WhiteHouse.gov robots.txt file, which went from 2,400 lines to 2. Of course there must be some nefarious purpose there or lesson about the closed nature of the Bush administration vs. the new open Obama administration.
Kottke tells us the difference represents “a small and nerdy measure of the huge change in the executive branch of the US government today” and Doctorow tags his post with CIVLIB just to let us know this is not just some technical issue.
Which, of course, it is. You can view the entire robots.txt file here. For every /directory/ on the Whitehouse.gov site, the Bush administration created a text-only /directory/text/ subdirectory. The robots.txt file tells Google not to index the text-only version so that the complete page remains canonical for Google. In fact, this is exactly what Google suggests doing for sites that have large amounts of duplicated content (on this site, for example, most pages have a print-only option and the robots.txt file instructs Google not to index any URLs that contain /print/).
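To make the mechanics concrete, here is a minimal sketch using Python's standard-library robots.txt parser. The directory names (/news/, /president/) are made up for illustration and are not copied from the actual WhiteHouse.gov file; the only thing taken from the description above is the /directory/text/ pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules following the /directory/text/ pattern described above.
# The directory names are illustrative assumptions, not the real file.
RULES = """\
User-agent: *
Disallow: /news/text/
Disallow: /president/text/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(RULES)

# The full page stays crawlable; the text-only duplicate does not.
print(parser.can_fetch("Googlebot", "/news/"))
print(parser.can_fetch("Googlebot", "/news/text/"))
```

Each Disallow line is a plain path prefix, so one line is needed per text-only directory. That is roughly how a file ends up thousands of lines long without hiding anything a visitor couldn't read on the full page.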
I wonder if this sort of nonsense is what Teresa Nielsen Hayden meant by “dumb-and-resentful” political commentators.
TechCrunch writes about the increasingly small window companies have to make an impression with new product launches, citing the lousy launch of Google-wannabe Cuil.
The thing that boggles my mind about Cuil is still the three-column search results page. WTF? That just makes my eyes bleed and has me running back to Google to lay down my privacy in exchange for relevant results normal people can actually read.
Google may not yet be evil, but it is certainly moving further and further down toward that end of the continuum with its extremely poor privacy practices in combination with the almost absurd amount of user data it appears to be logging and storing.
With that in mind, I suspect more services like Scroogle will arise to route around Google’s blasé attitude toward user privacy. Scroogle is basically a Google search proxy. Enter your search into Scroogle and it passes it on to Google using one of a small number of IP addresses, so yours is never logged. Scroogle then intercepts the cookie that Google returns and displays just the actual search results.
Unlike Google, which stores user-identifiable information about the search for 18 months, Scroogle promises that a) it doesn’t store search terms at all and b) it only maintains logs for a maximum of 48 hours.
I noticed Daniel Brandt, who I’ve criticized in the past for his conspiratorial ways, is listed as one of the directors of the Scroogle effort. It’s nice to see him turn his anti-Google obsession to positive solutions.
Most people don’t seem to be aware of all of the different operators and other techniques that Google accepts that can be used to narrow down searches to help find just the results you’re looking for.
Mapelli’s Ultimate Google Search Guide is a fairly thorough, short outline of all of the different tricks and techniques you can use when doing Google searches.
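As a rough illustration of the kind of operators I mean, here is a small sketch that assembles a query from a few of the widely documented ones: quoted phrases, a leading minus to exclude a word, and site: to restrict results to a domain. The helper names (build_query, search_url) and the example domain are my own inventions for this example and are not taken from Mapelli's guide.

```python
from urllib.parse import urlencode


def build_query(terms, phrase=None, sites=(), exclude=()):
    """Assemble a search string from plain terms plus a few operators.

    Hypothetical helper: covers only quoted phrases, -word exclusion,
    and site: restriction.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)  # drop results with word
    if sites:
        clause = " OR ".join(f"site:{s}" for s in sites)
        # Group multiple sites so the OR applies only to the site: terms.
        parts.append(f"({clause})" if len(sites) > 1 else clause)
    return " ".join(parts)


def search_url(query):
    """Turn the query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(
    ["robots.txt"],
    phrase="duplicate content",
    sites=["example.com"],
    exclude=["bing"],
)
print(query)
print(search_url(query))
```

Passing more than one domain in sites gives a search limited to a hand-picked list of sites, which is handy when you only trust a few sources on a topic.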
In another lifetime — i.e. a few years ago — I used to run a small search engine. Rather than try to index every possible site in the world, this search engine used some pretty basic technology to index the then-handful of web sites on the Internet opposed to the animal rights agenda. It was fairly rudimentary, but it was also very cool just to search this particular slice of the web that was relevant to me and others following the animal rights movement.
I’ve always had a goal in the back of my head of bringing that back someday, but Rollyo.Com has created something much better — an easy-to-use system to create a front end to Yahoo!’s search engine to search a limited number of sites.
For example, here’s an anti-animal rights search engine front-end that I created using Rollyo. Search on any term there, and the results come back only from the 7 or 8 anti-AR sites I’ve added. The user can currently add up to 25 sites.
This is extremely useful and I’m surprised it’s taken so long for such an application to emerge. And as much as I like Rollyo.Com, I’m hoping that Google comes up with something like it, as I find their results more useful than Yahoo!’s.
Rollyo.Com is currently in beta and some of the functionality isn’t complete — what I’d really like to be able to do is throw a search box on my web site that searches the entities I’ve created, but that’s still a “coming soon” feature.
Still, even in its infancy this is an awesome tool.
Dave Winer is apparently impressed by Daniel Brandt’s anti-Google rantings. But as this Salon.Com article documents, Brandt is a nutty conspiracy theorist (just go a few links deep at his NameBase.Org) who is pissed off because *his* pages about Donald Rumsfeld, and a whole host of other people, don’t show up very high in Google searches.
I particularly love the brief explanation Brandt offers of why Google’s PageRank sucks:
It’s democratic in the same way that capitalism is democratic. You could have the cure for cancer on the Web and not find it in Google because ‘important’ sites don’t link to it.
But, of course, if there were a cure for cancer posted on the web, then it is likely that lots of people would link to it, much like many scientists would end up citing a paper that outlined a successful cure for cancer.
What Brandt wants is for Google to be democratic in the same way that the Democratic People’s Republic of Korea is democratic.
In fact, as Salon notes, Brandt believes that if you search on “Donald Rumsfeld” his page about Rumsfeld should be shown before Rumsfeld’s DoD biography page, even though it is largely useless and almost impossible to navigate (the main problem with NameBase is that it is an index of citations largely of the conspiracy literature which Brandt has personally read).
Update: A good example of one of Brandt’s nutty conspiracy theories is his speculation about China’s blocking of Google, in which Brandt argues that “China may be well-advised to block the use of U.S. engines to protect their own national security” because Google may be sharing data about Chinese users with the National Security Agency, which would, in Brandt’s mind, “put the NSA at a tremendous advantage in determining where pro-U.S. sentiment may exist in China.”