
Thoughts on Bayesian Filters and Splogs

Many techniques have been developed to fight spam, especially email spam, and I was wondering whether they could also help against splogs. Before implementing AntiSplog, I tested Bayes' theorem for classifying splogs, but the results weren't good enough for an automated process, so I looked for another solution, which is the one currently running.

What Is Bayes' Theorem?

Thomas Bayes (1702-1761) was an English mathematician after whom Bayes' theorem is named. Bayes' theorem is a mathematical formula for calculating conditional probabilities.

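Bayes' theorem builds on the definition of conditional probability:

The probability of H conditional on E is defined as P(H | E) = P(H & E) / P(E), provided that both terms of this ratio exist and P(E) > 0.

Applying the same definition to P(E | H) and solving for P(H & E) gives the theorem itself: P(H | E) = P(E | H) P(H) / P(E).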
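To make the formula concrete, here is a small Python sketch of how a spam filter would use Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E), with H = "this blog is a splog" and E = "the blog contains the word apartment". All the probabilities below are invented for illustration; they are not measurements from my experiment.

```python
def conditional(p_h_and_e: float, p_e: float) -> float:
    """P(H | E) = P(H & E) / P(E), only defined when P(E) > 0."""
    if p_e <= 0:
        raise ValueError("P(E) must be positive")
    return p_h_and_e / p_e


def bayes(p_e_given_h: float, p_h: float, p_e: float) -> float:
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return conditional(p_e_given_h * p_h, p_e)


# Hypothetical numbers: 20% of blogs are splogs, the word "apartment"
# appears in 60% of splogs and in 10% of normal blogs.
p_splog = 0.2
p_word_given_splog = 0.6
p_word_given_normal = 0.1

# Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
p_word = p_word_given_splog * p_splog + p_word_given_normal * (1 - p_splog)

print(round(bayes(p_word_given_splog, p_splog, p_word), 3))
```

With these made-up numbers, seeing "apartment" alone lifts the splog probability from 20% to about 60%, which is exactly why common topical words end up carrying so much weight.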

Why Bayesian Filters Can't Stop Splogs

The first thing I noticed is that a Bayesian filter is very easy to bypass and, at the same time, very quick to flag normal content as spam. For example, a splog like http://mydordrecht.blogspot.com/ raises the spam score of keywords such as Real Estate, Austin, Texas and Apartment. These are common keywords, so any legitimate blog that uses them could easily be tagged as a splog.
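The false-positive problem can be reproduced with a toy token filter. This is a minimal sketch, assuming a Paul Graham-style scheme (per-token probabilities from document counts, combined with the naive Bayes product rule); it is not the code I used in my tests, and the training posts and the `unseen=0.4` default are hypothetical.

```python
import math
from collections import Counter


def train(splog_docs, normal_docs):
    """Count in how many documents of each class every token appears."""
    splog_counts = Counter(t for d in splog_docs for t in set(d.lower().split()))
    normal_counts = Counter(t for d in normal_docs for t in set(d.lower().split()))
    return splog_counts, normal_counts, len(splog_docs), len(normal_docs)


def token_prob(token, model, unseen=0.4):
    """Per-token splog probability, clamped to [0.01, 0.99]."""
    splog_counts, normal_counts, n_splog, n_normal = model
    s = splog_counts[token] / n_splog
    n = normal_counts[token] / n_normal
    if s + n == 0:
        return unseen  # never seen in training: assume slightly innocent
    return min(0.99, max(0.01, s / (s + n)))


def splog_score(text, model):
    """Combine token probabilities: prod(p) / (prod(p) + prod(1 - p))."""
    probs = [token_prob(t, model) for t in set(text.lower().split())]
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1 - p) for p in probs)
    return 1 / (1 + math.exp(log_ham - log_spam))


# Hypothetical training data: two real-estate splogs, two normal blogs.
splogs = ["real estate austin texas apartment listings cheap loans",
          "austin texas real estate apartment deals casino"]
normals = ["my trip to the lake with family photos",
           "notes on python programming and testing"]
model = train(splogs, normals)

# An innocent personal post that happens to share the splog vocabulary.
legit_post = "we just rented an apartment in austin texas"
print(round(splog_score(legit_post, model), 4))
```

Three topical words are enough to push this innocent post close to a score of 1, because the filter only ever saw "austin", "texas" and "apartment" in splogs.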

Escaping a Bayesian filter is also very easy. Look at http://web-traffic-tips.blogspot.com/: all the content is carefully chosen to look legitimate, and the blog is about web traffic tips, so you can guess why!
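The evasion side is just as easy to show. This sketch reuses only the naive Bayes combining rule and feeds it hypothetical token probabilities: a few strongly spammy words, then the same words padded with "normal-looking" words the filter considers innocent. The values 0.95, 0.9 and 0.2 are illustrative assumptions, not numbers from a real filter.

```python
import math


def combined_score(token_probs):
    """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p))."""
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1 - p) for p in token_probs)
    return 1 / (1 + math.exp(log_ham - log_spam))


# Hypothetical per-token probabilities for a splog's money keywords.
spammy = [0.95, 0.9, 0.9]
# Hypothetical padding: ten carefully chosen words the filter sees as innocent.
padding = [0.2] * 10

print(round(combined_score(spammy), 4))
print(round(combined_score(spammy + padding), 4))
```

The bare keywords score close to 1, while the padded version falls close to 0. Many filters combine only the most extreme tokens, which blunts naive padding, but a splog author can still pick padding words that the filter has learned to treat as strongly innocent.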

You can also see this for yourself with the many blogging platforms that offer Bayesian antispam for comments; you'll quickly run into the same problems.

When Could Bayesian Filters Be Useful?

My experiment doesn't reflect real-world conditions. I could give more accurate results if I tested against the Google index, since a much larger number of websites would yield more reliable probabilities. But imagine the number of calculations needed just to find one splog, and the size of the Bayesian token database! No thank you; I don't have those resources just to fight small splogs.

Conclusion

In theory, applying Bayes' theorem to fight spam could be an excellent solution, but in practice many tricks can be used to bypass these implementations and make them fail. Instead, I looked for a fast solution that could help study the behaviour of "splogs" compared to "normal websites".
