Subject: Re: GHC predictability
Newsgroups: gmane.comp.lang.haskell.cafe
Date: 2008-05-12 19:01:53 GMT

Don Stewart wrote:
> jeff.polakow:
>
>> Hello,
>>
>> One frequent criticism of Haskell (and by extension GHC) is that it has
>> unpredictable performance and memory consumption. I personally do not find
>> this to be the case. I suspect that most programmer confusion is rooted in
>> shaky knowledge of lazy evaluation; and I have been able to fix, with
>> relative ease, the various performance problems I've run into. However I
>> am not doing any sort of performance critical computing (I care about
>> minutes or seconds, but not about milliseconds).
>>
>> I would like to know what others think about this. Is GHC predictable? Is
>> a thorough knowledge of lazy evaluation good enough to write efficient
>> (whatever that means to you) code? Or is intimate knowledge of GHC's
>> innards necessary?
>>
>> thanks,
>> Jeff
>>
>> PS I am conflating Haskell and GHC because I use GHC (with its extensions)
>> and it produces (to my knowledge) the fastest code.
>>
>
> This has been my experience too. I'm not even sure where
> "unpredictability" would even come in, other than through not
> understanding the demand patterns of the code.
>
> It's relatively easy to look at the Core to get a precise understanding
> of the runtime behaviour.
>
> I've also not found the GC unpredictable either.
>

I offer up the following example:

    mean xs = sum xs / fromIntegral (length xs)

Now try, say, "mean [1 .. 1e9]", and watch GHC eat several GB of RAM. (!!)

If we now rearrange this to

    import Data.List (foldl')

    mean = (\(s,n) -> s / n) . foldl' (\(s,n) x -> let s' = s+x; n' = n+1 in s' `seq` n' `seq` (s', n')) (0,0)

and run the same example, we can watch it run in constant space.

Of course, the first version is clearly readable, while the second one is almost utterly incomprehensible, especially to a beginner. (It's even more fun that you need all those seq calls in there to make it work properly.)
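For comparison, here is a sketch of a single-pass mean that stays fairly readable, assuming GHC and the standard `Data.List.foldl'`. The accumulator type `MeanAcc` is a hypothetical name introduced only for this illustration; its strict (`!`) fields take over the job of the `seq` calls above, so the running sum and count are evaluated on every step instead of piling up as thunks.

```haskell
import Data.List (foldl')

-- Hypothetical accumulator: the bang annotations make both fields strict,
-- so building a MeanAcc forces the sum and the count immediately.
data MeanAcc = MeanAcc !Double !Int

-- Single pass over the list: each element is consumed as it is produced
-- and can be garbage-collected at once, instead of the whole list being
-- retained for a second traversal by 'length'.
-- Note: an empty list yields 0/0, i.e. NaN.
mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    MeanAcc s n = foldl' step (MeanAcc 0 0) xs
    step (MeanAcc acc len) x = MeanAcc (acc + x) (len + 1)

main :: IO ()
main = print (mean [1 .. 1e6])  -- prints 500000.5
```

The design choice is to move strictness into the data type rather than sprinkling `seq` through the fold: `foldl'` forces the accumulator to weak head normal form, and the strict fields make that enough to evaluate the numbers themselves.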
The sad fact is that if you just write something in Haskell in a nice, declarative style, then roughly 20% of the time you get good performance, and 80% of the time you get laughably poor performance.

For example, I sat down and spent the best part of a day writing an MD5 implementation. Eventually I got it so that all the test vectors passed. (Stupid little-endian nonsense... mutter mutter...) When I tried it on a file containing more than 1 MB of data... ooooohhhh dear... I gave up after waiting several minutes for an operation that the C implementation can do in milliseconds. I'm sure there's some way of fixing this, but... the source code is pretty damn large, and very messy as it is. I shudder to think what you'd need to do to it to speed it up.

Of course, the first step in any serious attempt at performance improvement is to actually profile the code to figure out where the time is being spent. Laziness is *not* your friend here. I've more or less given up trying to comprehend the numbers I get back from the GHC profiles, because they apparently defy logic. I'm sure there's a reason for the madness somewhere, but... for nontrivial programs, it's just too hard to figure out what's going on.

Probably the best part is that almost any nontrivial program you write spends 60% or more of its time doing GC rather than actual work. Good luck with the heap profiler. It's even more mysterious than the time profiles.
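On the little-endian complaint in the MD5 story: RFC 1321 has each 64-byte message block read as sixteen 32-bit words, low-order byte first. The sketch below shows only that decoding step, using `Data.Word` and `Data.Bits` from base. The names `le32` and `blockWords` are hypothetical, and this is not the implementation described in the post.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- Hypothetical helper: combine four bytes into one 32-bit word,
-- least significant byte first (little-endian, as MD5 expects).
le32 :: Word8 -> Word8 -> Word8 -> Word8 -> Word32
le32 b0 b1 b2 b3 =
      fromIntegral b0
  .|. (fromIntegral b1 `shiftL` 8)
  .|. (fromIntegral b2 `shiftL` 16)
  .|. (fromIntegral b3 `shiftL` 24)

-- Hypothetical helper: split a byte list into little-endian words.
-- Trailing bytes that do not fill a whole word are dropped; after MD5
-- padding the message length is a multiple of 64 bytes, so none remain.
blockWords :: [Word8] -> [Word32]
blockWords (b0:b1:b2:b3:rest) = le32 b0 b1 b2 b3 : blockWords rest
blockWords _                  = []

main :: IO ()
main = print (blockWords [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
-- prints [67305985,134678021], i.e. [0x04030201,0x08070605]
```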