
AntiSplog spider will be running soon

I have almost finished writing the AntiSplog.net spider. It will help a great deal in gathering more spam cases to study. I have split the spider into two parts, the blog collector and the blog filter, because the two tasks are different. The collector is currently running and gathering thousands of blogs from all over the world. I already have an old database with millions of blogs, but I think updating it would be tedious, especially since many of those blogs have closed, moved, or gone inactive.
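To illustrate why splitting the two tasks is convenient, here is a minimal sketch of a collector feeding a filter through a queue. It is not the actual AntiSplog.net code: the function names, the crude URL normalisation, and the `is_splog` predicate are all assumptions made for the example.

```python
from collections import deque


def collect(candidate_urls, seen):
    """Collector (hypothetical): keep only blog URLs that were not seen before."""
    fresh = []
    for url in candidate_urls:
        # Crude normalisation for the example only: trim, lowercase, drop trailing slash.
        key = url.strip().lower().rstrip("/")
        if key and key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


def run_filter(queue, is_splog):
    """Filter (hypothetical): drain the queue and classify each collected blog."""
    report = {"splog": [], "ok": []}
    while queue:
        url = queue.popleft()
        report["splog" if is_splog(url) else "ok"].append(url)
    return report


seen = set()
queue = deque(collect(["http://a.example/", "http://A.example", "http://b.example"], seen))
# The lambda stands in for the real detection algorithm, which is not described here.
report = run_filter(queue, is_splog=lambda url: "b.example" in url)
```

Because the collector only produces URLs and the filter only consumes them, either part can be stopped, rerun, or replaced without touching the other.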

The filter, which is based on the AntiSplog algorithm, will be launched in the next step. I think it will be ready by next week, and it will analyze and report splogs automatically. I'll add a counter to the website to show the work in progress.

I have also already announced that stats and a report will be posted monthly to show the progress of this project. I think the first month's results won't be very significant, since the project is still experimental and has not yet indexed many blogs. Saying that X% are splogs or Y% are not has no real meaning with a small number of indexed blogs. That's why the spider will give the AntiSplog.net reports a broader view and more useful information.
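The point about small samples can be made concrete with a confidence interval. The sketch below uses the Wilson score interval, a standard statistical tool that is my choice for the illustration and not something AntiSplog is stated to use; the function name and the sample counts are made up.

```python
import math


def wilson_interval(splogs, total, z=1.96):
    """Approximate 95% Wilson score interval for the share of splogs in a sample."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = splogs / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Same observed share (10%) with hypothetical sample sizes of 10, 100 and 10000 blogs.
for splogs, total in [(1, 10), (10, 100), (1000, 10000)]:
    low, high = wilson_interval(splogs, total)
    print(f"{splogs}/{total}: between {low:.1%} and {high:.1%}")
```

With only a handful of indexed blogs the interval is very wide, so a headline percentage says little; it narrows as the spider indexes more blogs.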

Comments
1

Other than Blogger and wherever else you submit splogs to, is there going to be some kind of review to make sure no legitimate blogs are caught?

For now the number of blogs you are dealing with may be small enough to keep an eye on, but once your crawler is working that isn't going to be possible. I know you are working to make the filter 100% accurate, but even experienced spam hunters have trouble telling the difference between some splogs and legitimate but strange blogs.

2

I think the number of normal blogs reported as spam is currently very small (see Jean Véronis's review), and in any case reporting a blog to Blogger does not mean it will be deleted.

Blogger reviews reports manually to be sure it won't delete or take action against a normal blog.

And yes, you are right: the number of blogs analyzed is currently small. In the future, a flag will be introduced so that suspected results are reviewed manually. Alternatively, AntiSplog will simply return a "splogability rate", which will help distinguish the definite (100%) splogs from the suspected ones.

I queried the database, and currently less than 10% of the results are suspected. The aim is for AntiSplog to bring that below 1%.
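As a rough sketch of how a "splogability rate" could drive the manual-review flag: the thresholds, labels, and sample rates below are hypothetical and chosen only for illustration, not taken from AntiSplog.

```python
def triage(rate, splog_at=0.9, clean_at=0.1):
    """Map a splogability rate in [0, 1] to a label (thresholds are assumptions)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate >= splog_at:
        return "splog"
    if rate <= clean_at:
        return "normal"
    return "suspected"  # flagged for manual review


def suspected_share(rates):
    """Fraction of results that would still need a manual look."""
    labels = [triage(r) for r in rates]
    return labels.count("suspected") / len(labels) if labels else 0.0


# Made-up rates for ten blogs; only the 0.55 one lands in the "suspected" band.
rates = [0.99, 0.97, 0.02, 0.05, 0.55, 0.01, 0.95, 0.03, 0.92, 0.04]
share = suspected_share(rates)
```

Tightening or loosening the two thresholds trades manual review work against the risk of reporting a legitimate blog.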

3

I know Blogger will review them before deleting splogs, but the more accurate your results the better because it will mean less work for them and hopefully faster removal of splogs.

I like the idea of marking unsure on ones you can't determine. Hopefully you can get that number down, but it is a really good start.

Your other problem is of course that the smarter spammers are going to adapt quickly. So your detection heuristics are going to have to constantly be adjusted.

4

About "smart spammers": I think AntiSplog adapts more quickly than new splog techniques appear.

And as I have already said, I work more on splog behaviour than on techniques, which makes it easier to detect more techniques with a really small algorithm.
