Google spams

BlogsNow google internet malware

OK, provocative title. Let’s rephrase: Google tolerates spam.

Blogger is owned by Google. It runs the biggest blog service on its blogspot domain.

It appears to be very simple to create hundreds of thousands of ‘weblogs’ like this:

http://p85.blogspot.com/

These are created solely for spam purposes: so-called ‘splogs’. You set up a robot, and nothing in the Blogger software stops you from adding as many blogs as you like.

This is not new. Google / Blogger / Blogspot knows about it. They have done nothing about it in recent years.

It should be relatively easy to make sure that there is a human in front of the computer when a new weblog is created at blogspot.com. Simple captchas are very common today.

There are two possible explanations for why this has not happened yet:

– blogspot engineering is amazingly incapable

or

– there is no real rush to get rid of splogs on Google’s side.

It might make sense:
You have to forget the “don’t be evil” and “organize the world’s information and make it easily accessible” Google dogmas for a second, though. Google knows one thing very, very well: how to run a scalable service. They have the lowest cost per stored bit thanks to their own file system technology. It uses commodity hardware and handles failover brilliantly. It does not cost Google much to host millions of splogs.

But wouldn’t millions of fake blogs pose a danger to the result quality of a search engine?

Exactly.

Google knows from which IP address a blog gets maintained. Nobody else does. They have the actual blog data readily available for further parsing. I doubt that the googlebot comes through the front door to blogspot. The bandwidth alone that could be saved by crawling blogspot internally should make up for the ‘exception’ that this would mean for the googlebot operations. I don’t know these things. It’s a guess.

Every search engine has to have spam-fighting tools these days. Google is one of the most useful search engines, and in the US they have an OK handle on search engine spam. Isn’t it funny that they don’t use their insider knowledge and access together with their anti-spam tools to simply turn off splogs on blogspot?

Last October somebody scraped famous bloggers’ sites and reposted that content on splogs. That got some attention, and stopped. But splogs did not.

Blogspot hosts lots of splogs, but also lots of legit and very powerful weblogs. Nobody can really afford to ignore the biggest weblog service. Yahoo, MSN and even my little BlogsNow have to crawl blogspot in order to find out what is going on. Google can skip the crawl; all others have to deal with it.

There is also a third theory that is the most plausible:

Splogs don’t matter to search engines. They have to crawl billions of pages anyway. Who cares about a couple of million spam blogs here and there? That’s probably what it is: the aircraft carrier keeps on going regardless of whether there are 50% more roaches in the kitchen or not.