From: ChrisK <haskell <at> list.mightyreason.com>
Subject: ANN: bug fix for regex-tdfa, version 0.97.4 (and "regex-ast")
Newsgroups: gmane.comp.lang.haskell.cafe
Date: Tuesday 24th February 2009 11:57:34 UTC
Hello,

   The regex-tdfa package has had a series of bug fix releases (0.97.1,
0.97.2, 0.97.3, and now 0.97.4).  This 0.97.4 release finishes fixing the
bug that was only mostly fixed in the 0.97.1 release.

   An example of the fixed bug: apply the regex pattern (BB(B?))+(B?) to
the text BBBB.  The "BB" in the pattern should be used twice and both "B?"
should match nothing.  My code grouped the "+" incorrectly: it matched the
"BB" once and then let both "B?" match a "B".
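
   A minimal sketch of how to check this case, assuming the regex-tdfa
package is installed.  The name "bugExample" is only illustrative.  Going
by the description above, a fixed release should report the whole match
"BBBB", with the first group holding its last iteration "BB" and both
"B?" groups empty.

```haskell
import Text.Regex.TDFA ((=~))

-- Result shape: (text before match, whole match, text after match, submatches).
-- "bugExample" is a hypothetical name used only for this illustration.
bugExample :: (String, String, String, [String])
bugExample = "BBBB" =~ "(BB(B?))+(B?)"
```

   Evaluating bugExample in GHCi shows the four components; the buggy
grouping would instead put a "B" into the second and third submatches.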

   The case fixed here was not initially caught because of how I search
for unknown bugs.  I use "Arbitrary" from QuickCheck to generate random
patterns and strings to search, and compare regex-tdfa to another POSIX
engine.
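
   This kind of randomized comparison can be sketched roughly as below.
This is not the actual test code: the Pat type, render, atom, Engine and
prop_agree are all hypothetical names, and the two engines are passed in
as plain functions so that the sketch depends only on QuickCheck.

```haskell
import Test.QuickCheck

-- Hypothetical pattern AST over a two-letter alphabet (illustration only).
data Pat
  = Lit Char
  | Cat Pat Pat
  | Alt Pat Pat
  | Opt Pat
  | Plus Pat
  | Group Pat
  deriving Show

-- Render a pattern as POSIX extended regex source text.
render :: Pat -> String
render (Lit c)   = [c]
render (Cat a b) = render a ++ render b
render (Alt a b) = "(" ++ render a ++ "|" ++ render b ++ ")"
render (Opt a)   = atom a ++ "?"
render (Plus a)  = atom a ++ "+"
render (Group a) = "(" ++ render a ++ ")"

-- Operand of a postfix operator: add parentheses unless it is a single
-- literal or already renders with its own parentheses.
atom :: Pat -> String
atom p@(Lit _)   = render p
atom p@(Group _) = render p
atom p@(Alt _ _) = render p
atom p           = "(" ++ render p ++ ")"

-- Size-bounded generator so that random patterns stay small.
instance Arbitrary Pat where
  arbitrary = sized gen
    where
      gen :: Int -> Gen Pat
      gen 0 = Lit <$> elements "AB"
      gen n =
        let sub = gen (n `div` 2)
        in oneof
             [ Lit <$> elements "AB"
             , Cat <$> sub <*> sub
             , Alt <$> sub <*> sub
             , Opt <$> sub
             , Plus <$> sub
             , Group <$> sub
             ]

-- An engine maps pattern source and text to the whole match plus submatches.
type Engine = String -> String -> Maybe (String, [String])

-- Property: both engines agree on a random pattern and a random text.
prop_agree :: Engine -> Engine -> Pat -> Property
prop_agree e1 e2 pat =
  forAll (listOf (elements "AB")) $ \txt ->
    let p = render pat
    in counterexample ("pattern: " ++ p) (e1 p txt === e2 p txt)
```

   With two engines wrapped as Engine values, running
quickCheck (prop_agree engineA engineB) reports the first pattern and
text on which they disagree.  A disagreement only shows that one of the
two is wrong, which is why the quality of the reference engine matters.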

   Because I am on OS X, I am limited by the bugs in the native POSIX
library: this bug in regex-tdfa was triggered only when the native POSIX
implementation was also buggy.

   But the source of most of my unit tests is AT&T Research [1], and they
have a "libast" with a POSIX implementation.  I have adapted my regex-*
wrapper packages to make a "regex-ast" Haskell interface, but the
difficulties with the AT&T headers prevent me from releasing this on
Hackage.  This "regex-ast" has given me access to a less buggy POSIX
back-end, and randomized testing against it has led to catching the bug
fixed here (as well as to a few bug reports back to AT&T).

   So while regex-tdfa will not win many speed contests, it is the only
POSIX regular expression library I have running that passes all the unit
tests.

[1] http://www.research.att.com/sw/download/
     http://www.research.att.com/~gsf/testregex/
     http://www.research.att.com/~gsf/testregex/re-interpretation.html
 