Features Download
From: Benedikt Huber <benjovi <at> gmx.net>
Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement
Newsgroups: gmane.comp.lang.haskell.libraries
Date: Tuesday 24th June 2008 12:11:34 UTC (over 9 years ago)

I'd like to propose bugfixes, documentation fixes and a performance  
improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't  
effect the expected behaviour of the PP library.

I've written a QuickCheck test suite for the pretty printer (to test  
the improvement), and found two bugs and some misconceptions/ 
ambiguities in the documentation. Additionally, there is a  
microbenchmark for the suggested improvement.
Both are available at http://code.haskell.org/~bhuber/Text/

PrettyPrint/. Note that the QuickCheck tests need access to all top- 
level names in HughesPJ (i.e. ignore the export list).

In summary, I propose to
* fix a bug in fillNB and one in fillNB/sepNB
* correct documentation on laws and invariants.
* add more efficient implementations of vcat,hsep,hcat

More specifically:

(1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)

(2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical  

(3) Lazy implementations of vcat,hcat and hsep

(4) Law  isn't always true

(5) Invariant 5 should be made stronger

(6) Change the comment about negative indentation

The details follow below (maybe a little long).

best regards,


== Details (Long) ==

Bug Fixes

(1) Bugfix fillNB

Law  states that

 > sep (ps++[empty]++qs)   = sep (ps ++ qs)
 >         ...ditto hsep, hcat, vcat, fill...

In the current implementation, this fails for the paragraph fill  

 > render' $ fill True [ text "c", text "c",empty, text "c", text "b"]
 >   where render' = renderStyle (Style PageMode 7 1.4)
 >> c c c
 >>     b

The reason is a missing test for Empty in fillNB

2) Bugfix: overlap and f?(cat|sep)

The specification for cat/sep:
  * oneLiner (hcat/hsep ps)
    vcat ps [*]

But currently cat, sep, fcat and fsep attempt to overlap the second  
line with the first one, i.e. they use
`foldr ($$) empty ps' instead of `foldr ($+$) empty ps' [*]. I assume  
this is a mistake.

This bug can lead to situations, where the line in the right argument  
of Union is actually longer:

 > prettyDoc$ cat [ text "a", nest 2 ( text "b") ]
 >> text "a"; union
 >>           (text "b"; empty)
 >>           (nilabove; nest 1; text "b"; empty)

 > renderStyle (Style PageMode 1 1) $ cat [ text "a", nest 2 ( text  
"b") ]
 >> "a b"

In the implementation, we call `nilAbove False' instead of `nilAbove  
True' (see patch).

Performance Improvements

3) Improving the space/time performance of vcat, hsep, hcat

vcat (hsep,cat) is implemented in an unneccessarily strict way.
We only get some output after all of vcat's arguments are evaluated  
and checked against being Empty.
This can be improved by only checking the right argument of foldr  
against being Empty, and
then applying an Empty-filter on the resulting Doc. Space improvement  
is obvious.
The microbenchmark (code.haskell.org/~bhuber/Text/PrettyPrint/ 
HughesPJPerfCheck.hs) suggests that
the improvements in time are remarkable too.

Documentation fixes
The QuickCheck tests revealed that

4) Law  isn't always true

       text "" <> x            = x, if x non-empty

only holds if x does not start with nest (obviously, because of law  

5) The invariant 5 should be extended:

    A @NoDoc@ may only appear on the first line of the left argument of
    an union. Therefore, the right argument of an union can never be  
    to the empty set.

because this assumption is used when rendering.

6) Change the comment about negative indentation

In the source code we have:

-- (spaces n) generates a list of n spaces
-- It should never be called with 'n' < 0, but that can happen for  
reasons I don't understand

If we compose a <> b, and the first line of b is deeply nested, but  
other lines of b are not,
then, because <> eats the nest, the pretty printer will try to layout  
some of b's lines with
negative indentation:

doc       |0123345
d1        |a
d2        |   b
d1<>d2    |ab

Here is the reason why this is rather unavoidable: In John Hughes  
paper, there is a line stating that
    "composing layouts does not change the layouts being composed".
Another one states that
    "<> should eat up nests"
But this leads to negative indentation - imho a user error.

== End ==
CD: 3ms