Half of all blogs are splogs, according to some estimates I’ve seen.
Like this one, which I recently noticed because it linked to bizhack.
I wonder if you could detect splogs by using some kind of Statistically Improbable Phrase methodology. Not that any of the phrases individually would mean anything … but a whole bunch of phrases that are statistically improbable to be found on the same website would probably be interesting.
There’s gotta be a PageUnRank algorithm in there.
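For the curious, here is a rough Python sketch of what such a check might look like. Everything in it is an assumption made for illustration: the name `unrank_score`, the use of two-word phrases, the `max_doc_freq` cutoff, and the tiny background corpus are all hypothetical, not an existing tool or a tested detector. The idea is to index which phrases appear in which "normal" documents, pick out the rare phrases on a site, and measure how many pairs of those rare phrases never show up together in any normal document.

```python
import re
from collections import defaultdict
from itertools import combinations


def bigrams(text):
    """Return the set of adjacent word pairs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(words, words[1:]))


def build_index(background_docs):
    """Map each bigram to the ids of the background documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(background_docs):
        for phrase in bigrams(text):
            index[phrase].add(doc_id)
    return index


def unrank_score(site_text, index, max_doc_freq=1):
    """Fraction of rare-phrase pairs on the site that share no background doc.

    A phrase counts as rare (a stand-in for "statistically improbable") if it
    appears in at least one but at most max_doc_freq background documents.
    Returns a value from 0.0 (rare phrases normally occur together) to
    1.0 (rare phrases are never seen together elsewhere).
    """
    rare = sorted(
        p for p in bigrams(site_text)
        if 0 < len(index.get(p, ())) <= max_doc_freq
    )
    pairs = list(combinations(rare, 2))
    if not pairs:
        return 0.0
    unrelated = sum(1 for a, b in pairs if not (index[a] & index[b]))
    return unrelated / len(pairs)


# Hypothetical background corpus: three unrelated, ordinary documents.
background = [
    "mortgage refinance rates dropped sharply",
    "sourdough starter needs daily feeding",
    "kernel scheduler patch landed upstream",
]
index = build_index(background)

print(unrank_score("sourdough starter needs daily feeding", index))
print(unrank_score("mortgage refinance sourdough starter kernel scheduler", index))
```

On this toy data the first site scores 0.0 and the keyword-stuffed one scores 1.0. A real version would need a much larger background corpus and a better notion of "improbable" than a raw document count, and comparing every pair of phrases gets slow quickly.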
3 Comments
Nice idea… would be interesting to see whether it’s feasible…
🙂 …already experimenting with that idea…
Interesting … please keep me up to date. Would love to see whatever you’re working on.