google spams

BlogsNow google internet malware

OK, provocative title. Let’s rephrase: Google tolerates spam.

Blogger is owned by Google. It runs the biggest blog service on its blogspot domain.

It appears to be very simple to create hundreds of thousands of ‘weblogs’ like this:

http://p85.blogspot.com/

They are created solely for spam purposes: so-called ‘splogs’. You set up a robot, and there is nothing in the Blogger software that stops you from adding all the blogs you like.

This is not new. Google / Blogger / Blogspot knows about it. They have done nothing about it in the last few years.

It should be relatively easy to make sure that there is a human in front of the computer when a new weblog is created at blogspot.com. Simple captchas are very common today.

There are two possible explanations why this has not happened yet:

– blogspot engineering is amazingly incapable

or

– there is no real rush to get rid of splogs on Google’s side.

It might make sense:
You have to forget the “don’t be evil” and “organize the world’s information and make it easily accessible” Google dogmas for a second, though. Google knows one thing very, very well: how to run a scalable service. They have the lowest cost per stored bit due to their own file system technology. It uses commodity hardware and handles failover brilliantly. It does not cost Google much to host millions of splogs.

But wouldn’t millions of fake blogs pose a danger to the result quality of a search engine?

Exactly.

Google knows from which IP address a blog gets maintained. Nobody else does. They have the actual blog data readily available for further parsing. I doubt that the Googlebot comes through the front door to blogspot. The bandwidth alone that could be saved by crawling blogspot internally should make up for the ‘exception’ that this would mean for Googlebot operations. I don’t know these things. It’s a guess.

Every search engine has to have spam-fighting tools these days. Google is one of the most useful search engines, and in the US they have an OK handle on search engine spam. Isn’t it funny that they don’t use their insider knowledge and access together with their anti-spam tools to simply turn off splogs on blogspot?

Last October somebody scraped famous bloggers’ sites and reposted that content on splogs. That got some attention, and it stopped. But splogs did not.

Blogspot hosts lots of splogs. But also lots of legit and very powerful weblogs. Nobody can really afford to ignore the biggest weblog service. Yahoo, MSN and even my little BlogsNow have to crawl blogspot in order to find out what is going on. Google can skip the crawl; all others have to deal with it.

There is also a third theory that is the most plausible:

Splogs don’t matter to search engines. They have to crawl billions of pages anyway. Who cares about a couple of million spam blogs here and there? That’s probably what it is: the aircraft carrier keeps on going regardless of whether there are 50% more roaches in the kitchen or not.

intro instead of ads

BlogsNow

The recent update (Version 92) to BlogsNow will go unnoticed by most people. If you have never clicked on a BlogsNow link, then you will see a very reduced menu and an intro instead of ads. If you are new to BlogsNow, this should make understanding it easier.

If you are a veteran BlogsNow user and you don’t see the full menu anymore, then you probably have cookies disabled.

Update 12/19/05 : looking at my numbers I decided to skip ads entirely for now.

no dns -> no blogsnow

BlogsNow

Yikes, for the last two days I had the weird feeling that the earth stood still. It turned out that DNS had stopped working on BlogsNow. No DNS, no crawl. Of course. So I started to read the same things in the paper that I saw on BlogsNow.

Now it’s fixed and things should change soon.

And of course email did not work either during those two days. I could not forward to gmail.
Almost got me into trouble.

moved the server

BlogsNow internet this weblog

Just moved the server to its new location. Things should be better now. More bandwidth and better reliability.
That would be the theory. Over the next few days we will see how it really pans out.

Andreas

blogdex where art thou?

BlogsNow google internet malware

Blogdex is still down. So I thought I might run some Google AdWords ads pointing people to BlogsNow.
It turned out somebody was faster: right now I see an ad for blogturbo dot com. Interesting what Google advertises:
It costs only 149 US$ and you can generate thousands of weblogs pointing to your site. This looks like a keyword spam tool to me.
Interesting that Google runs ads for it.

Then I wondered what was going on at daypop.com
Turns out they are down as well …

update November 1st
Blogdex: “up” again, yet results are old/pointless right now.
Daypop: back up again, results make sense. the usual 24 hour delay
blogturbo: still showing ads on google adwords for blogdex.

new look, same crap

BlogsNow

Weblogs.com, now a VeriSign service, has a new look. And they only show the last 100 blogs that pinged them on their main page. Makes sense. The content, however, has not changed: junk, junk and junk.

splogs and what we can learn from them

BlogsNow

BlogsNow always checks, for every entry, how similar the blogs are that link to it. Just now I tightened this parameter radically. It now catches more splogs and also a couple of ‘real blog entries’. If your fan club is not diverse enough, you will not make it onto BlogsNow as easily anymore. Looking at how BlogsNow is filtering, I must say this makes sense: those real blog entries are mostly linked by the very same group of blogs, mostly on both sides of the political spectrum. Plain link repeaters simply have less influence right now. I like the results better: they reflect what is emerging in many blogs, rather than what gets pushed by whatever agenda there might be.
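
To make the idea concrete, here is a minimal sketch of such a diversity check. It is not the actual BlogsNow code: the similarity measure (Jaccard overlap of each blog's past link targets), the function names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets: 0.0 means disjoint, 1.0 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linker_similarity(linkers, history):
    """Mean pairwise similarity of the blogs that link to one entry.

    `history` maps a blog name to the set of URLs it linked to in the past
    (a hypothetical data structure, assumed for this sketch).
    """
    pairs = list(combinations(sorted(linkers), 2))
    if not pairs:
        return 0.0  # zero or one linker: nothing to compare
    total = sum(
        jaccard(history.get(a, set()), history.get(b, set())) for a, b in pairs
    )
    return total / len(pairs)


def passes_filter(linkers, history, max_similarity=0.5):
    """Keep an entry only if its linking blogs are diverse enough.

    Lowering `max_similarity` corresponds to tightening the parameter.
    """
    return linker_similarity(linkers, history) <= max_similarity


# Made-up link histories: a, b and c mostly repeat each other's links.
history = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"x", "y", "q"},
    "d": {"m", "n"},
    "e": {"p", "x"},
}

print(passes_filter({"a", "b", "c"}, history))  # echo chamber
print(passes_filter({"a", "d", "e"}, history))  # diverse fan club
```

With this made-up data the first entry is rejected and the second one is kept. A group of blogs that always link to the same things looks, to a check like this, just like a ring of splogs, which is why a few real entries get caught as well.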

splogs and web 2.0

BlogsNow

Yahoo bought blo.gs a while ago.
Their new blogs page is one complete splog-fest.

VeriSign bought Moreover.com and will switch the weblogs.com servers next week. That will be interesting.

splogplosion

BlogsNow

Since the last splogplosion event also diluted people’s ego searches, there is finally some discussion about Google’s ignorance towards this problem: Chris Pirillo demands that Blogger be fixed or turned off. I can imagine that it’s harder for him to execute the P(i)rillo Effect.

Sidenote: when will we stop the habit of leaving characters out and feeling cool about it? (splogplosion -> spam world wide web log explosion)

And, of course, Icerocket suffers as well.

The problem is not new: seven months ago I had to turn off Blogger in BlogsNow. All later rewrites of BlogsNow knew not to trust Blogger content. BlogsNow crawls Blogger as much as possible. But not more. That’s why BlogsNow wasn’t hit by the latest boom of junk on Google’s blogging tool. It does help to have very few resources: with BlogsNow I simply have neither room nor bandwidth for spam. Spam would have killed BlogsNow within days if it weren’t able to defend itself.

Over at Jeff Jarvis’s blog there is some discussion on the very same topic as well. Steven Den Beste comments along the lines that splogs on blogger.com might be beneficial to Google. They definitely are. Whether intended or not: Google knows which blogs on Blogger get read and which ones don’t. That alone makes for a huge head start in all search efforts.

The concept behind those splogs threatens more than a few ego searches on PubSub: the internet didn’t work to sell dog food (web/bubble 1.0). Now the internet is trying to sell information (web/bubble 2.0). While this approach is much more promising, all that spam is diluting it. And it hurts innovation: much of the effort in BlogsNow’s constant rewrites goes into keeping up the status quo against splogs and other malicious symptoms. I would rather add features. And I would rather be able to trust the information out there.

It’s the same with all spam: In order for somebody to make 1 cent somewhere there are damages of hundreds of dollars elsewhere.

ping poison

BlogsNow internet malware

BlogsNow gets seven pings a second. I just had a cursory look over those. Yes, they are all spam.
If you still ping BlogsNow with good intentions, please stop doing so. If you ping BlogsNow in the future, your weblog will go on the blacklist. Sorry.