
Blogs spider launched

As I already announced, the AntiSplog spider is now running and indexing blogs. I have added a counter showing the spider's progress; it is updated every 10 to 30 minutes.

Indexed blogs aren't filtered yet. As I said, this first phase consists of collecting blogs and studying more cases. The database could reach a million entries in about three days, which explains part of the difficulty of running the AntiSplog detection system on it: if 10% of the results were wrong, that would mean 100,000 misclassified blogs!

Anyway, I think the first step will be to reduce the false positives, and then to optimize the splog detection system in general. A lot of testing will certainly be needed before running it on the whole database.

There will certainly be many new criteria to take into account when analyzing blogs and detecting splogs. The ideal would be a database whose results are 100% reliable, so I guess I'll end up with something like: if (!isAvailable($blog)) { return 3; } if (isSplog($blog)) { return 1; } if (isBlog($blog)) { return 0; } return 4; which is better than the current approach, which is something like: if (!isAvailable($blog)) { return 3; } if (isSplog($blog)) { return 1; } else { return 0; } The difference is that a blog is marked as normal only when it is positively recognized as one; anything else gets its own code (4) instead of defaulting to 0.
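To make the four-way result easier to read, here is a minimal sketch of that idea. The return codes (0, 1, 3, 4) are the ones from the snippet above; the constant names, the classifyBlog() function and the three placeholder checks are assumptions made for illustration only, and they are not the real AntiSplog detection rules.

```php
<?php
// Return codes taken from the post; the constant names are assumptions.
const STATUS_BLOG        = 0; // positively recognized as a normal blog
const STATUS_SPLOG       = 1; // recognized as a splog
const STATUS_UNAVAILABLE = 3; // blog could not be fetched
const STATUS_UNKNOWN     = 4; // neither check matched: undecided

// Hypothetical helper: the three checks are passed in as callables,
// because the real isAvailable/isSplog/isBlog are not shown in the post.
function classifyBlog(string $blog, callable $isAvailable, callable $isSplog, callable $isBlog): int
{
    if (!$isAvailable($blog)) {
        return STATUS_UNAVAILABLE;
    }
    if ($isSplog($blog)) {
        return STATUS_SPLOG;
    }
    if ($isBlog($blog)) {
        return STATUS_BLOG;
    }
    // Unlike the current approach, do not fall back to "normal blog".
    return STATUS_UNKNOWN;
}

// Placeholder checks for demonstration only (not real detection logic).
$isAvailable = function (string $blog): bool {
    return $blog !== '';
};
$isSplog = function (string $blog): bool {
    return strpos($blog, 'cheap-pills') !== false;
};
$isBlog = function (string $blog): bool {
    return strpos($blog, 'my-diary') !== false;
};
```

With these placeholders, a call such as classifyBlog('http://example.com/', $isAvailable, $isSplog, $isBlog) falls through both checks and returns the undecided code, so the blog can be set aside for further study instead of being counted as normal.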

Okay, after some work it has become fairly easy to say "it's a splog", but how do you say "it's a normal blog"? Can anyone automate this? :-)
