From: Boris 'pi' Piwinger <3.14 <at> logic.univie.ac.at>
Subject: Re: Radical lexers
Newsgroups: gmane.mail.bogofilter.general
Date: 2003-12-10 15:13:01 GMT

[Corrected version]

This is a very short test only. I compare my version (a) of the lexer
(http://piology.org/bogofilter/lexer_v3.l) with a much stricter
version of it (b). TOKEN will effectively be of the form

[^[:blank:][:cntrl:]<>;&%@|/\\{}^"*,[\]=()+?:#$._!'`~-]+

So it no longer matters where in a token a character shows up. No
punctuation (I hope I did not miss anything). Basically letters,
digits and characters outside ASCII are allowed.

And even more extreme (c). Tokens are explicitly:

[[:alnum:]]+

Here is what I get:

     wordlist   false neg    false pos
a)   27060k     210/13612    16/15670
b)   26832k     206/13612    17/15670
c)   23332k     210/13612    18/15670

So the size is a surprise. I expected something much smaller for b)
and even more so for c).

The result for b) hurts. It says (if it can be confirmed) that we are
doing much too complicated things when defining a token. I really did
not expect that lexer to work. But well, that's how it is.

c) is really mind-blowing. This simply MUST NOT work.

pi
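
For anyone who wants to play with the two strict token definitions outside of flex, here is a rough Python sketch of (b) and (c). It is not the bogofilter lexer. The names PUNCT_B, TOKEN_B, TOKEN_C, tokens_b and tokens_c are made up for illustration, and it assumes the C locale: [:blank:] is space and tab, [:cntrl:] is 0x00-0x1F plus 0x7F, and [:alnum:] is ASCII letters and digits only. It also works on Python strings rather than on raw bytes.

```python
import re

# Punctuation excluded by variant (b), transcribed from the flex
# character class quoted in the post (hypothetical constant name).
PUNCT_B = "<>;&%@|/\\{}^\"*,[]=()+?:#$._!'`~-"

# Variant (b): anything that is not blank, control, or listed punctuation.
# Assumption: C locale, so blank + cntrl is 0x00-0x20 plus 0x7F.
TOKEN_B = re.compile("[^\\x00-\\x20\\x7f" + re.escape(PUNCT_B) + "]+")

# Variant (c): [[:alnum:]]+, assumed to mean ASCII letters and digits.
TOKEN_C = re.compile(r"[A-Za-z0-9]+")


def tokens_b(text):
    """Split text into tokens using the strict variant (b)."""
    return TOKEN_B.findall(text)


def tokens_c(text):
    """Split text into tokens using the alnum-only variant (c)."""
    return TOKEN_C.findall(text)


if __name__ == "__main__":
    sample = "café free-offer"
    print(tokens_b(sample))  # non-ASCII letters stay inside the token
    print(tokens_c(sample))  # non-ASCII letters end the token
```

Under these assumptions the punctuation excluded in (b) covers all 32 ASCII punctuation characters, so the sketch makes (b) and (c) differ only in how they treat characters outside ASCII.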