Search Engines – Brian.Carnell.Com

grep.app

grep.app is a search engine that indexes 500,000+ publicly accessible git repositories.

Boing! Boing! Dumb-And-Resentful?

Jason Kottke decides to start a meme about the WhiteHouse.gov robots.txt file, which went from 2,400 lines to 2, and Cory Doctorow can’t help but jump on the bandwagon. Of course, there must be some nefarious purpose there, or some lesson about the closed nature of the Bush administration vs. the new, open Obama administration.

Kottke tells us the difference represents “a small and nerdy measure of the huge change in the executive branch of the US government today” and Doctorow tags his post with CIVLIB just to let us know this is not just some technical issue.

Which, of course, it is. You can view the entire robots.txt file here. For every /directory/ on the WhiteHouse.gov site, the Bush administration created a text-only /directory/text/ subdirectory. The robots.txt file tells Google not to index the text-only version so that the complete page remains canonical for Google. In fact, this is exactly what Google suggests doing for sites that have large amounts of duplicated content. On this site, for example, most pages have a print-only option, and the robots.txt file instructs Google not to index any URLs that contain /print/.
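To make the mechanics concrete, here’s a minimal sketch using Python’s standard urllib.robotparser module. The two Disallow paths are hypothetical stand-ins that follow the /directory/text/ pattern described above; they are not copied from the actual WhiteHouse.gov file, and the helper names are mine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt in the /directory/text/ style described above.
# These paths are illustrative, not taken from the real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/text/
Disallow: /president/text/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, path: str, agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for path in ("/news/index.html", "/news/text/index.html"):
        print(path, "crawlable" if is_crawlable(rules, path) else "blocked")
```

The full page stays crawlable while its text-only duplicate is blocked, which is all a long list of Disallow lines like this accomplishes.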

I wonder if this sort of nonsense is what Teresa Nielsen Hayden meant by “dumb-and-resentful” political commentators.

Cuil – Three Columns of Ugliness

TechCrunch writes about the increasingly small window companies have to make an impression with new product launches, citing the lousy launch of Google-wannabe Cuil.

The thing that boggles my mind about Cuil is still the three-column search results page. WTF? That just makes my eyes bleed and has me running back to Google to lay down my privacy in exchange for relevant results normal people can actually read.

Scroogle.Org

Google may not yet be evil, but it is certainly moving further toward that end of the continuum, combining extremely poor privacy practices with the almost absurd amount of user data it appears to be logging and storing.

With that in mind, I suspect more services like Scroogle will arise to route around Google’s blasé attitude toward user privacy. Scroogle is basically a Google search proxy. Enter your search into Scroogle and it passes it on to Google using one of a small number of IP addresses, so Google never logs yours. Scroogle then intercepts the cookie that Google returns and displays just the actual search results.
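I have no idea what Scroogle’s code actually looks like, but the general idea of a privacy proxy can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function names are hypothetical, and the header lists are just the obvious candidates, not a description of what Scroogle strips.

```python
from typing import Dict

# Request headers that could identify the original user (assumed list).
REQUEST_DROP = {"cookie", "referer", "x-forwarded-for", "forwarded"}

# Response headers the proxy should swallow rather than pass to the user.
RESPONSE_DROP = {"set-cookie"}


def scrub_outgoing(headers: Dict[str, str]) -> Dict[str, str]:
    """Remove identifying headers before the proxy forwards a search."""
    return {k: v for k, v in headers.items() if k.lower() not in REQUEST_DROP}


def scrub_incoming(headers: Dict[str, str]) -> Dict[str, str]:
    """Remove tracking cookies from the search engine's response."""
    return {k: v for k, v in headers.items() if k.lower() not in RESPONSE_DROP}


if __name__ == "__main__":
    request = {"Cookie": "id=abc123", "Accept": "text/html"}
    response = {"Set-Cookie": "id=xyz789", "Content-Type": "text/html"}
    print(scrub_outgoing(request))
    print(scrub_incoming(response))
```

Because the proxy makes the outbound connection itself, the search engine sees the proxy’s IP address rather than the user’s; the header scrubbing only covers the cookie half of the problem.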

Unlike Google, which stores user-identifiable information about searches for 18 months, Scroogle promises that a) it doesn’t store search terms at all and b) it only maintains logs for a maximum of 48 hours.

I noticed Daniel Brandt, who I’ve criticized in the past for his conspiratorial ways, is listed as one of the directors of the Scroogle effort. It’s nice to see him turn his anti-Google obsession to positive solutions.

Nice, Short Guide to Searching Google

Most people don’t seem to be aware of all the operators and other techniques Google accepts for narrowing a search down to just the results you’re looking for.

Mapelli’s Ultimate Google Search Guide is a fairly thorough, short outline of all of the different tricks and techniques you can use when doing Google searches.
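As a quick illustration of the kind of thing such a guide covers, here’s a small Python sketch that assembles a query from a few well-known operators: quoted phrases, -word exclusions, site: and filetype:. The helper names are hypothetical, and this selection of operators is mine rather than a summary of Mapelli’s guide.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_query(
    terms: Sequence[str],
    phrase: Optional[str] = None,
    exclude: Sequence[str] = (),
    site: Optional[str] = None,
    filetype: Optional[str] = None,
) -> str:
    """Combine plain terms with common search operators into one query."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')            # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)  # exclude these words
    if site:
        parts.append(f"site:{site}")           # limit to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # limit to one file type
    return " ".join(parts)


def search_url(query: str) -> str:
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query(["robots.txt"], phrase="white house", exclude=["print"], site="example.com")
    print(q)
    print(search_url(q))
```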

Rollyo.Com Is A Beautiful Thing

In another lifetime — i.e. a few years ago — I used to run a small search engine. Rather than try to index every possible site in the world, this search engine used some pretty basic technology to index the then-handful of web sites on the Internet opposed to the animal rights agenda. It was fairly rudimentary, but it was also very cool just to search this particular slice of the web that was relevant to me and others following the animal rights movement.

I’ve always had a goal in the back of my head of bringing that back someday, but Rollyo.Com has created something much better — an easy-to-use system to create a front end to Yahoo!’s search engine to search a limited number of sites.

For example, here’s an anti-animal rights search engine front-end that I created using Rollyo. Search on any term there, and the results come back only from the 7 or 8 anti-AR sites I’ve added. The user can currently add up to 25 sites.
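I don’t know how Rollyo builds its queries behind the scenes, but one simple way to get the same effect is to wrap the user’s search in a group of site: operators joined with OR. Here’s a sketch in Python; the function name and the example domains are hypothetical, and the 25-site cap simply mirrors the limit mentioned above.

```python
from typing import Sequence

MAX_SITES = 25  # mirrors the per-engine limit mentioned above


def restrict_to_sites(query: str, sites: Sequence[str], max_sites: int = MAX_SITES) -> str:
    """Limit a search query to a fixed list of sites using site: operators."""
    if not sites:
        raise ValueError("at least one site is required")
    if len(sites) > max_sites:
        raise ValueError(f"at most {max_sites} sites are allowed")
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    return f"{query} ({site_filter})"


if __name__ == "__main__":
    # Placeholder domains, not the sites in my actual Rollyo engine.
    my_sites = ["example.org", "example.net"]
    print(restrict_to_sites("animal research", my_sites))
```

The resulting string can be pasted into any search engine that supports site: and OR, which is the poor man’s version of what Rollyo wraps in a friendly front end.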

This is extremely useful, and I’m surprised it’s taken so long for such an application to emerge. And as much as I like Rollyo.Com, I’m hoping that Google comes up with something like it, as I find their results more useful than Yahoo!’s.

Rollyo.Com is currently in beta and some of the functionality isn’t complete — what I’d really like to be able to do is throw a search box on my web site that searches the entities I’ve created, but that’s still a “coming soon” feature.

Still, even in its infancy this is an awesome tool.