From: Boris 'pi' Piwinger <3.14 <at> logic.univie.ac.at>
Subject: Test with bogominitrain.pl
Newsgroups: gmane.mail.bogofilter.general
Date: 2003-07-31 12:29:32 GMT

Hi!

I did some testing with my bogominitrain.pl (the version which will
be in 0.14.1). Here are the results.

Summary (false positives in 10,000 / false negatives in 5,000):

runs \ -o | .501,.501 | .601,.401 | .701,.301
----------+-----------+-----------+-----------
    1     | 111 / 71  |  32 / 85  |  31 / 76
    2     |  60 / 66  |  29 / 68  |  16 / 62
   -f     |  38 / 62  |  27 / 57  |  14 / 60

Using a security margin is clearly beneficial. Repeated training
always improved the results, in some cases dramatically. The smaller
the margin the less important repeating becomes.

The details:

> $ rm -f .bogofilter/*;grep -c '^From ' ham* spam*
> ham:2772
> ham-1:10000
> ham-2:10000
> spam:2815
> spam-1:5000
> spam-2:5000
> spam-3:5000
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
> spam good
> .MSG_COUNT 160 136
>
> False negatives: 62
> False positives: 74
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 111
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 71
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
> spam good
> .MSG_COUNT 224 186
>
> False negatives: 18
> False positives: 13
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 60
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 66
> $ bogominitrain.pl -f .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
> spam good
> .MSG_COUNT 293 234
>
> False negatives: 0
> False positives: 0
>
>
> 8 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 38
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 62
> $ rm -f .bogofilter/*
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
> spam good
> .MSG_COUNT 522 344
>
> False negatives: 241
> False positives: 49
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 32
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 85
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
> spam good
> .MSG_COUNT 656 395
>
> False negatives: 28
> False positives: 7
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 29
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 68
> $ bogominitrain.pl -f .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
> spam good
> .MSG_COUNT 681 404
>
> False negatives: 0
> False positives: 0
>
>
> 2 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 27
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 57
> $ rm -f .bogofilter/*
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
> spam good
> .MSG_COUNT 619 422
>
> False negatives: 301
> False positives: 58
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 31
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 76
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
> spam good
> .MSG_COUNT 775 467
>
> False negatives: 17
> False positives: 9
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 16
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 62
> $ bogominitrain.pl .bogofilter -fs 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
> .MSG_COUNT 794 474
>
> False negatives: 0
> False positives: 0
>
>
> 2 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 14
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 60

pi
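
The summary counts are easier to compare as rates. The following is only a small sketch, not part of bogominitrain.pl: it takes the nine count pairs from the summary table and divides them by the test-set sizes reported by the grep above (10,000 messages in ham-2, 5,000 in spam-3). The names `results` and `rates` are made up for this illustration.

```python
# Test-set sizes as reported by `grep -c '^From '` in the transcript.
HAM_TOTAL = 10_000   # messages in ham-2
SPAM_TOTAL = 5_000   # messages in spam-3

# (false positives, false negatives), copied from the summary table.
# Outer key: cutoffs passed via -o; inner key: training mode (1 run, 2 runs, -f).
results = {
    ".501,.501": {"1": (111, 71), "2": (60, 66), "-f": (38, 62)},
    ".601,.401": {"1": (32, 85), "2": (29, 68), "-f": (27, 57)},
    ".701,.301": {"1": (31, 76), "2": (16, 62), "-f": (14, 60)},
}


def rates(fp, fn):
    """Return (false-positive %, false-negative %) for one table cell."""
    return 100 * fp / HAM_TOTAL, 100 * fn / SPAM_TOTAL


for cutoffs, runs in results.items():
    for run, (fp, fn) in runs.items():
        fp_pct, fn_pct = rates(fp, fn)
        print(f"-o {cutoffs}  runs={run:<2}  FP {fp_pct:.2f}%  FN {fn_pct:.2f}%")
```

Read this way, the first cell (111 / 71) corresponds to 1.11% false positives and 1.42% false negatives.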