Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Lev Walkin <vlm <at> lionet.info>
Subject: Re: Re: XML (HXML) parsing :: GHC 6.8.3 space leak from 2000
Newsgroups: gmane.comp.lang.haskell.cafe
Date: Tuesday 23rd September 2008 13:35:02 UTC (over 9 years ago)
Marc A. Ziegert wrote:

>>> We don't know of a good way to fix this problem.  I'm going to record 
>>> this example in a ticket for future reference, though.
>> Simon,
>>
>> is there a way, perhaps, to rewrite this expression to avoid leaks?
>> An ad-hoc will do, perhaps split in two modules to avoid intramodular
>> optimizations?
>>
>> -- 
>> Lev Walkin
> 
> finally... there is a way! :D
> 
> hmm... this was a nice puzzle ;)
> 
> i've tried several times (and hours!) to implement a Continuation (not
monad) based solution, but finally i developed this tricky but elegant
foldr solution...
> i built the parser around this type:
>   type FoldR_Builder = (TreeEvent,[UnconsumedEvent]) -> [Either
[UnconsumedEvent] (Tree String)] -> [Either [UnconsumedEvent] (Tree
String)]
> 
> it is based on the following thought:
> the tuple
>   (rs,ps)::([Rest],[Processed]) -- with the restriction, which forces the
list ps to be processed entirely before rs.
> is equipollent to
>   (fmap Right ps++[Left rs])::[Either [Rest] Processed]
> , but the latter is easier to handle ...at least if you can't trust the
GC.

Marc, you are my hero of the month!

I can't say I understood this solution before applying it back to
HXML-0.2, but it surely worked and made quite observable 20%
difference in performance:

9.8 seconds on my 45 megabyte XML test, running in half the space
(4m) compared to my parallel version based on Ketil Malde's suggestion
(which was 12 seconds on two cores (though, one core was almost
idling, `par` was used purely for its side-effect)).

To those who wants to parse XML in constant space, attached find
a patch to HXML-0.2 which fixes a space leak.

However, I am still a bit surprized to discover there is not an order
of magnitude difference between `par`-based version and your builder.

While the foldr-based builder is clearly superior, one can't
help but wonder whether is it `par` that is so efficient compared
to crunching through Eithers, or there's some other bottleneck
in the code. Will profile a bit later.

The XML parsing space leak was declared in HXML back in 2000 and
lingered in the code for 8 years. Good riddance!

-- 
Lev Walkin
[email protected]
 
CD: 3ms