Blog spider launched
As I already announced, the antisplog spider is now running and indexing blogs. I have added a counter to report the spider's progress; it is updated every 10 to 30 minutes.
Indexed blogs aren't filtered yet. As I said, this is the first phase, which consists of collecting blogs and studying more cases. The database of blogs could reach a million entries in about three days, which partly explains the difficulty of running the AntiSplog detection system on it. If 10% of the results are incorrect, that would mean 100,000 wrong entries!
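To put numbers on that, here is a tiny sketch of the arithmetic. The function name expectedErrors is made up for this example, and the 1% and 0.1% rates are only illustrative targets, not measurements from the detection system.

```php
<?php
// Expected number of wrong results for a database of a given size.
// expectedErrors() is a hypothetical helper, not part of the AntiSplog code.
function expectedErrors(int $entries, float $errorRate): int
{
    // round() absorbs floating-point noise before the cast to int.
    return (int) round($entries * $errorRate);
}

// One million entries, as estimated above; the rates are illustrative.
foreach ([0.10, 0.01, 0.001] as $rate) {
    printf("%.1f%% wrong -> %d blogs\n", $rate * 100, expectedErrors(1000000, $rate));
}
```

Even a 1% error rate would still leave thousands of misclassified blogs at this scale, which is why the testing matters before a full run.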
Anyway, I think the first step will be to reduce the false positives, then to optimize the splog detection system in general. A lot of testing will certainly be needed before running it on the whole database.
There will certainly be many new considerations to take into account when analyzing blogs and detecting splogs. The ideal would be a database whose results are 100% guaranteed, so I guess I'll have something like:
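if (!isAvailable($blog)) {
    return 3; // blog could not be reached
}
if (isSplog($blog)) {
    return 1; // detected as a splog
}
if (isBlog($blog)) {
    return 0; // confirmed as a normal blog
}
return 4; // neither check matched: undecided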
which is better than the current approach, which is something like:
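if (!isAvailable($blog)) {
    return 3; // blog could not be reached
}
if (isSplog($blog)) {
    return 1; // detected as a splog
} else {
    return 0; // everything else is assumed to be a normal blog
}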
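To make the difference between the two approaches concrete, here is a rough sketch that wraps both in functions. The constant names, the function names classifyCurrent and classifyProposed, and the idea of passing the checks in as callables are all assumptions made for this example; the real isAvailable, isSplog and isBlog are not shown in this post, so they are replaced by stubs.

```php
<?php
// Hypothetical names for the return codes used in the snippets above.
const RESULT_BLOG = 0;        // confirmed normal blog
const RESULT_SPLOG = 1;       // detected as a splog
const RESULT_UNAVAILABLE = 3; // blog could not be reached
const RESULT_UNKNOWN = 4;     // neither check was conclusive

// Current approach: anything not flagged as a splog counts as a blog.
function classifyCurrent(string $blog, callable $isAvailable, callable $isSplog): int
{
    if (!$isAvailable($blog)) {
        return RESULT_UNAVAILABLE;
    }
    return $isSplog($blog) ? RESULT_SPLOG : RESULT_BLOG;
}

// Proposed approach: a blog needs a positive check, otherwise it stays undecided.
function classifyProposed(string $blog, callable $isAvailable, callable $isSplog, callable $isBlog): int
{
    if (!$isAvailable($blog)) {
        return RESULT_UNAVAILABLE;
    }
    if ($isSplog($blog)) {
        return RESULT_SPLOG;
    }
    if ($isBlog($blog)) {
        return RESULT_BLOG;
    }
    return RESULT_UNKNOWN;
}

// Stub checks (placeholders only) for a blog that neither detector recognises.
$available = fn(string $b): bool => true;
$notSplog  = fn(string $b): bool => false;
$notBlog   = fn(string $b): bool => false;

echo classifyCurrent('http://example.com/', $available, $notSplog), "\n";
echo classifyProposed('http://example.com/', $available, $notSplog, $notBlog), "\n";
```

With these stubs, the current approach reports the unrecognised blog as a normal blog (0), while the proposed one keeps it apart as undecided (4), so it can be reviewed instead of silently polluting the database.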
Okay, after some work it has become fairly easy to say "it's a splog", but how do you say "it's a normal blog"? Can someone automate this? :-)


