Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Max Bolingbroke <batterseapower <at> hotmail.com>
Subject: Hoisting elements of array argument into registers
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Friday 5th November 2010 12:56:40 UTC (over 6 years ago)
Hi,

The Glasgow Haskell Compiler LLVM backend is generating a lot of code which
appears to be optimised  by LLVM quite poorly. The problem is demonstrated
by
this C source code:

int wf(int sp[]) {
    if (sp[0] == 0) {
        return sp[2] + sp[3];
    } else {
        sp[3] = sp[3] + (sp[1] * 5);
        sp[2] = (sp[2] + sp[0]) + 1;
        sp[1] = sp[1] - 1;
        sp[0] = sp[0] - 1;
        return wf(sp);
    }
}

int g(int a) {
    int sp[] = { a, a, 0, 0 };
    return wf(sp);
}

int main(void) {
    printf("%d\n", g(5));
    return 0;
}

As you can see, wf takes its arguments via a pointer. Nonetheless, we would
do
like to loop  optimisations on wf -- in particular, we would like to
eliminate
the duplicate computations in sp[0] and sp[1] (GHC's own optimiser does not
do
any classical loop optimisations like this).

Unfortunately, LLVM (opt -O3: I tried v2.7 and HEAD) doesn't seem to get
very
far. It eliminates the tail call for wf() correctly, but the arguments are
carried between each iteration of the loop by storing intothe sp array. It
would be better if the arguments were kept in registers during the loop and
only stored  into the array upon exit. (These stores could then be
eliminated
easily since there is no later load from sp).

If I manually rewrite the code to replace the array argument with its
composite
scalars in the following manner LLVM obviously does the right thing:

int wf(int a, int b, int c, int d) {
    if (a == 0) {
        return c + d;
    } else {
        return wf(a - 1, b - 1, (c + a) + 1, d + (b * 5));
    }
}

int g(int a) {
    return wf(a, a, 0, 0);
}

int main(void) {
    printf("%d\n", g(5));
    return 0;
}

The question is, is there any way to get LLVM to do this rewriting for us?
It
seems like this would be a generally useful optimisation.

I guess you could also achieve this by sinking the stores to sp into next
loop
iteration/the loop exit,  where they will meet the corresponding loads and
so
good things will happen. The current instruction sinking pass doesn't seem
to
get this case, though.

Thanks in advance for any help!
Max

p.s. Our real code is a little more complicated because sp is really a
pointer
to the top of the Haskell runtimes stack (which is of unknown length), not
a
4-element array, which might make the optimisation a bit harder. Perhaps
LLVM
would have to promote just those elements of the input array which were
accessed in the loop into scalars.
 
CD: 3ms