From: Boris 'pi' Piwinger <3.14 <at> logic.univie.ac.at>
Subject: Re: Testing training methods
Newsgroups: gmane.mail.bogofilter.general
Date: 2003-11-18 12:11:44 GMT

Hi!

In the past I have done several tests about training methods:
http://article.gmane.org/gmane.mail.bogofilter.general/4373
http://article.gmane.org/gmane.mail.bogofilter.general/5346
http://article.gmane.org/gmane.mail.bogofilter.general/5403

Here is another set of tests:

The first is with my new version of the lexer, allowing tokens of
length one and two, numbers, and slightly changed characters at token
front and back. All tests use the default parameters (-C).

sizes of mboxes:
          t    r0    r1    r2    tot
sp    12085  4031  4027  4026  12084
ns    13396  4469  4464  4464  13397

ns: 13397, sp: 12084, target: 34

test: N (full training)
wordlist ns 13396, sp 12085
wo (fn): 0.950000  130  137  116  383
wo (fp): 0.950000    2    2    1    5
wi (fn): 0.498987   44   45   36  125
wi (fp): 0.498987   12   10   12   34

test: R (randomtrain)
wordlist ns 47, sp 401
wo (fn): 0.950000   69   67   54  190
wo (fp): 0.950000    8    8    4   20
wi (fn): 0.908152   45   43   33  121
wi (fp): 0.908152   14   11    9   34

test: M (one run of bogominitrain.pl)
wordlist ns 43, sp 252
wo (fn): 0.950000   85   92   67  244
wo (fp): 0.950000   10   28   18   56
wi (fn): 0.987733  194  182  162  538
wi (fp): 0.987733    4   19   11   34

test: Mf (bogominitrain.pl -fn)
wordlist ns 62, sp 495
wo (fn): 0.950000   61   60   58  179
wo (fp): 0.950000    2    3    2    7
wi (fn): 0.856059   24   29   27   80
wi (fp): 0.856059   10   14   10   34

sizes of the database:
27M   test.N.d/wordlist.db
1.7M  test.R.d/wordlist.db
1.1M  test.M.d/wordlist.db
1.7M  test.Mf.d/wordlist.db

Note that no security margin was used for the three train-on-error
methods, so those results are not as good as you would see in normal
production.

We clearly see:

- Neither one run of randomtrain nor one run of bogominitrain.pl
  produces good results. Both have a high risk of false positives and
  leave many false negatives. I cannot explain why the two produce such
  different results here; they should be similar.
- Training to exhaustion (test Mf) again was the best method in the
  test, even without a security margin.

The second run is as above, but with the lexer we now have in CVS
(including the removal of ' and ` at the end of a token).

test: N
wordlist ns 13396, sp 12085
wo (fn): 0.950000  140  143  122  405
wo (fp): 0.950000    1    3    2    6
wi (fn): 0.498735   43   46   35  124
wi (fp): 0.498735   14    7   13   34

test: R
wordlist ns 49, sp 413
wo (fn): 0.950000   89   96   88  273
wo (fp): 0.950000   11   11    6   28
wi (fn): 0.936851   76   76   72  224
wi (fp): 0.936851   12   12   10   34

test: M
wordlist ns 44, sp 340
wo (fn): 0.950000  170  150  161  481
wo (fp): 0.950000    8    8    7   23
wi (fn): 0.926234  137  119  124  380
wi (fp): 0.926234   10   12   12   34

test: Mf
wordlist ns 56, sp 611
wo (fn): 0.950000   86   90   63  239
wo (fp): 0.950000    4    3    4   11
wi (fn): 0.844118   41   35   33  109
wi (fp): 0.844118   12   11   11   34

sizes of the database:
25M   test.N.d/wordlist.db
1.5M  test.R.d/wordlist.db
1.4M  test.M.d/wordlist.db
2.0M  test.Mf.d/wordlist.db

The results are similar to the first run, so it is interesting to
compare the two. For tests N and Mf, the new lexer (first run) gives
better results. For R the results look totally different. M also
doesn't answer the question.

pi
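
The raw counts are easier to compare across methods once they are turned into rates. The following is only a minimal sketch, not part of the original tests: it assumes the last number in each result row is the total over r0, r1 and r2, that false negatives are counted against the 12084 tested spam and false positives against the 13397 tested non-spam, and it uses the first-run rows at the 0.950000 cutoff. The names `FIRST_RUN_WO` and `error_rates` are made up for illustration.

```python
# Totals copied from the first run in the post; the layout and the
# function name are illustrative assumptions, not bogofilter tooling.
SPAM_TESTED = 12084      # "sp" total over r0, r1, r2
NONSPAM_TESTED = 13397   # "ns" total over r0, r1, r2

# method -> (false negatives, false positives) from the "wo" rows,
# i.e. the rows reported at the 0.950000 cutoff.
FIRST_RUN_WO = {
    "N": (383, 5),
    "R": (190, 20),
    "M": (244, 56),
    "Mf": (179, 7),
}


def error_rates(fn, fp, spam=SPAM_TESTED, nonspam=NONSPAM_TESTED):
    """Return (fn_rate, fp_rate) as fractions of tested spam / non-spam."""
    return fn / spam, fp / nonspam


if __name__ == "__main__":
    for method, (fn, fp) in FIRST_RUN_WO.items():
        fn_rate, fp_rate = error_rates(fn, fp)
        print(f"{method:>2}: fn {fn_rate:.2%}  fp {fp_rate:.3%}")
```

Swapping in the second-run totals, or the "wi" rows, gives the same comparison for the CVS lexer or for the cutoff that yields the target of 34 false positives.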