Google Finally Adds Creative Commons Image Search Option

Google finally added the ability to restrict image searches to only images that are tagged with a Creative Commons license.

This feature allows you to restrict your Image Search results to images that have been tagged with licenses like Creative Commons, making it easier to discover images from across the web that you can share, use and even modify. Your search will also include works that have been tagged with other licenses, like GNU Free Documentation license, or are in the public domain.

Nice.

Folders4Gmail Greasemonkey Script

I don’t know about the rest of you, but I have a real love/hate relationship with Gmail. For the most part I love it, but Google seems to have Steve Jobs Disease in thinking that for key features there is only One True Way to implement a feature. If you don’t happen to like that way, Google’s support is happy to post friendly “you don’t know what you’re talking about” responses (see, for example, their responses to the completely screwed up way Gmail will associate completely unrelated e-mails into the same Conversation).

Fortunately, there are scripts and add-ons to deal with most of the defects, such as the Folders4Gmail Greasemonkey script, which lets the user organize labels into a hierarchical structure. Google says nobody needs this, but I beg to differ. This is extremely helpful. For example, each of my web sites sends out numerous administrative e-mails which I assign labels to. It’s extremely helpful to have a Web Sites –> websiteX hierarchy, so it’s easy to quickly drill down to this particular set of e-mails without cluttering up the labels list.

NYT Exec Throws Fit Over Placement in Google Searches

Advertising Age has a great example of the complete and utter incompetence many newspaper executives still have when it comes to the web.

Then in January, Martin Nisenholtz, New York Times Co. senior VP-digital operations, got up at the annual Online Publishers Association summit in Florida, an event closed to the press, to blast both the algorithm and the results presentation on the screen.

He’d just run a search for Gaza, which had been at war with Israel since Dec. 27. Google returned links to outdated BBC stories, Wikipedia entries and even an anti-Semitic YouTube video well before coverage by the Times, which had an experienced reporter covering the war from inside Gaza itself.

Search results for “Gaza” on March 20 began with two Wikipedia links, a March 19 BBC report, two video clips of unclear origin, the CIA World Factbook, a Guardian report and, most strikingly, a link to Gaza-related messages on Twitter.

The last paragraph isn’t quite accurate. What shows up is not a report from the Guardian on Gaza, but rather a portal-style page where The Guardian has *gasp* actually assembled a page listing all of its recent news stories on Gaza along with links to videos, photographs, commentary and assorted other resources on Gaza.

Now, there’s nothing stopping the New York Times from creating a similar aggregation page, except that it is too busy whining at publisher meetings and tinkering with nonsense like ACAP to think about what its readers (and, in turn, Google) might find valuable.

The reason the NYT didn’t show up on the first page of a Google search on Gaza is quite simply that it didn’t deserve to. Maybe it can borrow more money from Carlos Slim Helú to buy a clue. It might start by finding a VP of digital operations who knows what he’s doing.

Google and “Duplicate” Content

A couple months ago, Google clarified how its search engine handles duplicate content, but I still keep running across people and blogs getting all twisted in knots as to whether or not Google will penalize them because all the URLs on their site can be reached both with and without a trailing slash.

As Google made clear back in September,

But most site owners whom I hear worrying about duplicate content aren’t talking about scraping or domain farms; they’re talking about things like having multiple URLs on the same domain that point to the same content. Like www.example.com/skates.asp?color=black&brand=riedell and www.example.com/skates.asp?brand=riedell&color=black. Having this type of duplicate content on your site can potentially affect your site’s performance, but it doesn’t cause penalties. From our article on duplicate content:

Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.

So unless you’re intentionally trying to deceive Google with duplicate content sites, the worst that is going to happen with duplicate content is that Google might decide URL X is the canonical URL for some content on your website, when you really might prefer it use URL Y.

So enough with all the dire warnings and silly plugins designed to prevent the horrors of Google finding duplicate content.
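
If you’re curious just how trivially different these “duplicate” URLs are, here is a rough Python sketch that collapses the two kinds of variants mentioned above (reordered query parameters and a trailing slash) into one form. To be clear, this is only my illustration, not how Google picks a canonical URL; the `normalize_url` name and the rules in it are my own assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse trivial URL variants into one comparable form.

    Hypothetical helper for illustration only: it sorts query
    parameters, drops a trailing slash from the path, lowercases
    the scheme and host, and discards any fragment.
    """
    parts = urlsplit(url)
    # Sort the query parameters so their order no longer matters.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Treat /about/ and /about as the same path; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


a = normalize_url("http://www.example.com/skates.asp?color=black&brand=riedell")
b = normalize_url("http://www.example.com/skates.asp?brand=riedell&color=black")
print(a == b)
```

Both of the skates URLs from Google’s example end up identical after this, which is about all the “duplication” amounts to.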

Scroogle.Org

Google may not yet be evil, but it is certainly moving further and further down toward that end of the continuum with its extremely poor privacy practices in combination with the almost absurd amount of user data it appears to be logging and storing.

With that in mind, I suspect more services like Scroogle will arise to route around Google’s blasé attitude toward user privacy. Scroogle is basically a Google search proxy. Enter your search into Scroogle and it passes it on to Google using one of a small number of IP addresses, so yours is never logged. Scroogle then intercepts the cookie that Google returns and displays just the actual search results.

Unlike Google, which stores user-identifiable information about the search for 18 months, Scroogle promises that a) it doesn’t store search terms at all and b) it only maintains logs for a maximum of 48 hours.

I noticed Daniel Brandt, who I’ve criticized in the past for his conspiratorial ways, is listed as one of the directors of the Scroogle effort. It’s nice to see him turn his anti-Google obsession to positive solutions.

Google Webmaster Feature Improvements

Google implemented some nice features this month for those of us who run websites to help get a better handle on who and how many people are following along with our sites.

In early February, it greatly expanded the ability to track external links to a web site you control. It is already possible in Google to get a list of pages that link to a particular page, but for some reason Google only presents a small subset of that information to the public.

Once you have verified your site with Google (which involves adding a meta tag), Google will now let you view and download a more comprehensive list of external links to any site you control. From my cursory look at the external links Google reports for this site, there are still a lot of external links that Google either doesn’t know about or isn’t including in this more comprehensive report, but it is still much better than what you get from using the link operator in the search engine.

The other thing Google did was add a notation when Google Reader hits an RSS feed that indicates how many people are subscribed to that feed. Open up a server log and search for the Google Reader bot and you can quickly see how many people are subscribed to the feed via Google Reader. Bloglines and other web-based RSS readers have had this feature for a long time, so it’s nice to see Google Reader finally implement this as well.
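
If you’d rather not eyeball the log, something like the following Python sketch could pull the numbers out for you. It rests on assumptions you should check against your own logs: that your server writes Apache-style combined logs, and that the Google Reader fetcher’s user-agent string contains `Feedfetcher-Google` followed somewhere by a notation like `12 subscribers`. The function name and the sample line are made up for illustration.

```python
import re

# Assumed user-agent shape: "Feedfetcher-Google; (...; 12 subscribers; ...)"
SUBSCRIBERS = re.compile(r'Feedfetcher-Google[^"]*?(\d+) subscribers?')
# Assumed combined log format, where the request looks like "GET /feed/ HTTP/1.1"
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+)')


def latest_subscriber_counts(log_lines):
    """Return {feed path: most recently reported subscriber count}."""
    counts = {}
    for line in log_lines:
        match = SUBSCRIBERS.search(line)
        if not match:
            continue  # not a Google Reader fetch, or no count reported
        request = REQUEST.search(line)
        path = request.group(1) if request else "(unknown)"
        # Later lines overwrite earlier ones, so the newest count wins.
        counts[path] = int(match.group(1))
    return counts


# Made-up sample line, only to show the expected shape of the input.
sample = [
    '1.2.3.4 - - [01/Mar/2007:10:00:00 +0000] "GET /feed/ HTTP/1.1" 200 5120 '
    '"-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; '
    '12 subscribers; feed-id=123)"',
]
print(latest_subscriber_counts(sample))
```

To run it against a real log, pass an open file instead of the sample list, for example `latest_subscriber_counts(open("access.log"))`, where the file name is whatever your server uses.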