Bogofilter (bayesian spam filter) added to moonbase

elaine forbes elaine at fwsystems.com
Sat Mar 1 13:25:40 GMT 2003


I've installed bogofilter and compared its results against bogofilter
running on the Unix box that handles my mail; all looks good.

I'm a bit rusty on module-scripting but this one's simple so I don't
think there will be any issues.

The current stable version of bogofilter is 0.10.3.1. I have been using
0.8 for a long time (and spent a good bit of time trying to get a more
recent version working).

The older version was only moderately useful (it didn't unpack MIME),
and was doing unacceptably poorly on both false negatives *AND*
false positives.

I think this version will be quite safe for average users and even for
making a single spam-recipe for a large user-base.

Locally, training with ~500 spam / ~300 ham messages has resulted in
wordlist DB files of ca. 800 KB / 2.4 MB. Classifying an incoming spam
takes approx. 0.03 sec per message, and filtering a message with a
200 KB kernel patch attached takes about 0.17 sec (550 MHz Athlon/K7).

I assume it will be faster if you've installed db4; however, I haven't
tested this.

This version passes some pretty hard acid-tests. One of my clients
writes advertising copy, and her emails incorporating work for her
website have been canned by nearly every method I've thought of for
spam-filtering; bogofilter is now keeping her stuff. The only false
positives are now on some traffic on LKML and one motorcycle list, which
are losses I can live with :-).

To train the filter you'll need as large a set of both spam and good
email as practical. Do something like:

for i in spam/*; do bogofilter -svI "$i"; done
for i in ham/*; do bogofilter -nvI "$i"; done
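
If your mail directories contain subdirectories or might be empty, a
slightly more defensive version of those loops could look like the
sketch below. The names train_dir and BOGOFILTER are my own invention
for illustration, not anything bogofilter provides; the only bogofilter
flags used are the -s, -n and -I from the loops above (-v is left out so
that the message count is the only thing printed on stdout).

```shell
#!/bin/sh
# Sketch only: train_dir and BOGOFILTER are illustrative names,
# not part of bogofilter itself.
BOGOFILTER="${BOGOFILTER:-bogofilter}"

# train_dir FLAG DIR
# Register every regular file in DIR with bogofilter.
# FLAG is -s (spam) or -n (non-spam), as in the one-line loops.
# Prints the number of messages registered.
train_dir() {
    flag=$1
    dir=$2
    count=0
    for msg in "$dir"/*; do
        # Skip subdirectories and the literal pattern left by an empty glob.
        [ -f "$msg" ] || continue
        # Send any bogofilter output to stderr so stdout carries only the count.
        "$BOGOFILTER" "$flag" -I "$msg" >&2 || return 1
        count=$((count + 1))
    done
    echo "$count"
}
```

Usage would then be "train_dir -s spam" followed by "train_dir -n ham".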


A procmail recipe incorporating a condition like:

* ? /usr/bin/bogofilter -o 0.4 -u 

sets the 'spamicity' cutoff to 0.4 and updates the wordlists according
to whether the mail is judged as spam or ham. I don't think most users
will need to adjust the -o flag.
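
A "* ?" condition in procmail matches when the command exits with
status 0, so the recipe relies on bogofilter's exit status. To see what
a given message would do before wiring it into procmail, something like
the sketch below may help. It assumes exit status 0 means spam and 1
means non-spam, and reports anything else as unknown. The names classify
and BOGOFILTER are made up for this example, and -u is deliberately
omitted so that a dry run does not touch the wordlists.

```shell
#!/bin/sh
# Sketch only: classify and BOGOFILTER are illustrative names,
# not part of bogofilter itself.
BOGOFILTER="${BOGOFILTER:-bogofilter}"

# classify FILE
# Feed FILE to bogofilter on stdin with the same 0.4 cutoff as the recipe
# and print a label based on the exit status.
# Assumption: 0 = spam, 1 = non-spam; any other status is reported as unknown.
classify() {
    status=0
    "$BOGOFILTER" -o 0.4 < "$1" > /dev/null || status=$?
    case $status in
        0) echo spam ;;
        1) echo ham ;;
        *) echo unknown; return 1 ;;
    esac
}
```

For example, "classify ham/some-message" should print ham for a message
that the procmail condition would let through.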

I felt the -u setting was probably unsafe in v 0.8; in the current
version it looks like it has matured to a very useful state.

elaine

