Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Don Stewart <dons <at> galois.com>
Subject: Fast parallel binary-trees for the shootout: Control.Parallel.Strategies FTW!
Newsgroups: gmane.comp.lang.haskell.cafe
Date: Monday 1st September 2008 22:18:13 UTC (over 9 years ago)
The Computer Language Benchmarks Game recently got a quad core 64 bit
machine. Initial results just porting the single-threaded Haskell
benchmarks to 64 bit look rather pleasing,

    http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all

Note that C++, D, Clean et al aren't ported to the 64 bit machine yet
(and perhaps some implementations won't survive the 64 bit switch).

Now, none of the Haskell programs are annotated for parallelism yet,
despite that quad core sitting there ready to go. I've a conjecture that 
fast, smp capable, concise code, of the kind GHC produces, could
dominate the shootout for a while to come, if we can effectively utilise
those cores. This is a chance to make an impression!

So, the first parallel benchmark has been submitted by Tom Davies and
me, a parallel implementation of the binary-trees benchmark, using the
Control.Parallel.Strategies lib to set up some top level parallel tasks.

Initial results look promising, with the old entry:

* Single core, no heap hints: 26.9s

    $ ghc -O2 -fasm Original.hs --make
    $ time ./Original 20
    ./Original 20  26.79s user 0.16s system 100% cpu 26.933 total

    Looking at the GC stats (-sstderr) we see its spending too much time
    doing GC, 57.8%, so we can improve that by setting a default heap size:

* Single core, with heap hints: 13.7s

    $ time ./Original 20 -A350M
    ...
    ./Original 20 +RTS -A350M  13.54s user 0.27s system 100% cpu 13.791
total

    Halves runtime in comparison to the current entry, and reduces GC to
3.8%.

* Now, the parallel version, changing one line to use a parMap strategy,
and using the N+1 capabilities rule,
    
    $ ghc -O2 -fasm Parallel2.hs --make -threaded
    $ time ./Parallel2 20 +RTS -N5 -A350M
    ...
    ./Parallel2 20 +RTS -N5 -A350M  15.38s user 1.57s system 305% cpu 5.684
total

So 305% productive (not too bad for this memory-heavy program), and a
good speedup.

Observations:

    * check GC stats with -sstderr, then use -AxxxM to set a default heap
size
    * parallelise the top level control using speculative strategies
    * Use -N+1 capabilities.

-- Don
 
CD: 44ms