Splog detection

Half of all blogs are splogs, according to some estimates I’ve seen.

Like this one, which I recently noticed because it linked to bizhack.

I wonder if you could detect splogs by using some kind of Statistically Improbable Phrase methodology. Not that any of the phrases individually would mean anything … but a whole bunch of phrases that are statistically unlikely to show up together on the same website would probably be interesting.
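Here is a rough sketch of how that might work, in Python. Everything in it is an assumption made up for illustration: the watched phrase list, the tiny background corpus, the function names, and the scoring rule itself. It counts how often each pair of phrases shows up together on ordinary pages, then sums the surprisal (negative log probability) of every pair found on the page being scored. This is not how Google or Technorati do anything, as far as I know.

```python
import math
from collections import Counter
from itertools import combinations


def phrases_in(text, phrases):
    """Return the watched phrases that appear in text, in sorted order."""
    lowered = text.lower()
    return sorted(p for p in phrases if p in lowered)


def build_model(background_pages, phrases):
    """Count how many background pages contain each pair of watched phrases."""
    pair_counts = Counter()
    for page in background_pages:
        pair_counts.update(combinations(phrases_in(page, phrases), 2))
    return len(background_pages), pair_counts


def improbability(page, model, phrases):
    """Sum the surprisal, -log P(pair), of every phrase pair found on the page."""
    total_pages, pair_counts = model
    score = 0.0
    for pair in combinations(phrases_in(page, phrases), 2):
        # Add-one smoothing so unseen pairs get a small, nonzero probability.
        p_pair = (pair_counts[pair] + 1) / (total_pages + 2)
        score += -math.log(p_pair)
    return score


# Hypothetical watched phrases and background corpus, invented for this sketch.
PHRASES = ["cheap flights", "hotel deals", "online poker", "mortgage rates"]
BACKGROUND = [
    "cheap flights to paris and hotel deals",
    "hotel deals and cheap flights for summer",
    "online poker strategy tips",
    "mortgage rates fell today",
]
MODEL = build_model(BACKGROUND, PHRASES)

travel = improbability("cheap flights plus hotel deals", MODEL, PHRASES)
splog = improbability(
    "cheap flights online poker mortgage rates hotel deals", MODEL, PHRASES
)
print(travel < splog)
```

The travel page scores low because its two phrases often appear together on ordinary pages. The keyword-stuffed page scores much higher because most of its phrase pairs never co-occur in the background pages. A real version would have to mine the phrases from a large crawl and pick a cutoff score, and neither of those is attempted here.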

Google? Technorati?

There’s gotta be a PageUnRank algorithm in there.

[tags] splog, blog, spam, PageRank, SIP, Statistically Improbable Phrases, google, technorati, john koetsier [/tags]

