Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane

From: Michael Zolotukhin <mzolotukhin <at> apple.com>
Subject: [RFC] Heuristic for complete loop unrolling
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Friday 23rd January 2015 20:05:11 UTC (over 3 years ago)
Hi devs,

Recently I came across an interesting testcase that LLVM failed to optimize
well. The test does some image processing, and as a part of it, it
traverses all the pixels and computes some value basing on the adjacent
pixels. So, the hot part looks like this:

for(y = 0..height) {
 for (x = 0..width) {
   val = 0
   for (j = 0..5) {
     for (i = 0..5) {
       val += img[x+i,y+j] * weight[i,j]
     }
   }
 }
}

And ‘weight' is just a constant matrix with some coefficients.

If we unroll the two internal loops (with tripcount 5), then we can replace
weight[i,j] with concrete constant values. In this particular case, many of
the coefficients are actually 0 or 1, which enables huge code
simplifications later on. But currently we unroll only the innermost one,
because unrolling both of them will exceed the threshold.

When deciding whether to unroll or not, we currently look only at the
instruction count of the loop. My proposal is to, on top of that, check if
we can enable any later optimizations by unrolling - in this case by
replacing a load with a constant. Similar to what we do in inlining
heuristics, we can estimate how many instructions would be potentially
eliminated after unrolling and adjust our threshold with this value.

I can imagine that it might be also useful for computations, involving
sparse constant matrixes (like identity matrix).

The attached patch implements this feature, and with it we handle the
original testcase well.
 
CD: 9ms