From: Diego Novillo <dnovillo <at> google.com>
Subject: Loop unrolling opportunity in SPEC's libquantum with profile info
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Thursday 16th January 2014 00:13:27 UTC
I am starting to use the sample profiler to analyze new performance
opportunities. The loop unroller has popped up in several of the
benchmarks I'm running, libquantum in particular. There is a ~12%
opportunity when the runtime unroller is triggered.

This helps functions like quantum_sigma_x
(http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00149).
The function accounts for ~20% of total runtime. By enabling the
runtime unroller, we can speed up the program by about 12%.
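
For readers unfamiliar with runtime unrolling, here is a minimal
hand-written sketch of the transformation on a loop of the same general
shape (flip one bit in every element, trip count known only at run
time). The Node type, the function names and the unroll factor of 4 are
assumptions made for illustration; this is neither the libquantum
source nor the exact code the unroller emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a register entry; not the libquantum type.
struct Node {
    std::uint64_t state;
};

// Reference loop: flip bit `target` in every state.
// The trip count (nodes.size()) is only known at run time.
void flip_bit_simple(std::vector<Node>& nodes, unsigned target) {
    assert(target < 64);
    const std::uint64_t mask = std::uint64_t{1} << target;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        nodes[i].state ^= mask;
}

// Hand-unrolled equivalent with an assumed unroll factor of 4.
// A short prologue loop runs the n % 4 leftover iterations, so the
// main loop always executes a whole number of 4-iteration groups.
void flip_bit_unrolled(std::vector<Node>& nodes, unsigned target) {
    assert(target < 64);
    const std::uint64_t mask = std::uint64_t{1} << target;
    const std::size_t n = nodes.size();
    const std::size_t extra = n % 4;

    std::size_t i = 0;
    for (; i < extra; ++i)           // prologue: remainder iterations
        nodes[i].state ^= mask;

    for (; i < n; i += 4) {          // unrolled body: one branch per 4 flips
        nodes[i].state     ^= mask;
        nodes[i + 1].state ^= mask;
        nodes[i + 2].state ^= mask;
        nodes[i + 3].state ^= mask;
    }
}
```

Both functions produce the same result for any length; the unrolled
version simply executes fewer loop branches per element.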

I have been poking at the unroller a little bit. Currently, the
runtime unroller is only triggered by a special flag or if the target
states it in the unrolling preferences. We could also consult the
block frequency information here. If the loop header has a higher
relative frequency than the rest of the function, then we'd enable
runtime unrolling.
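
As a rough sketch of the check described above, written as standalone
C++ rather than against the LLVM APIs: the Block struct, its fields and
header_is_hot() are hypothetical names, and "hotter than the rest of
the function" is interpreted here as "strictly hotter than every block
outside the loop", which is only one possible reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block profile record; not an LLVM data structure.
struct Block {
    std::uint64_t freq;  // profile-derived block frequency (arbitrary units)
    bool in_loop;        // true if the block belongs to the candidate loop
};

// Returns true when the loop header is strictly hotter than every block
// outside the loop, i.e. the case where runtime unrolling would be tried.
bool header_is_hot(const std::vector<Block>& blocks, std::size_t header) {
    assert(header < blocks.size() && blocks[header].in_loop);
    for (const Block& b : blocks) {
        if (!b.in_loop && b.freq >= blocks[header].freq)
            return false;  // some code outside the loop is at least as hot
    }
    return true;
}
```

A real implementation would likely want a tunable threshold rather than
a strict comparison, so that marginally hotter loops are not unrolled.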

Chandler also pointed me at the vectorizer, which has its own
unroller. However, the vectorizer only unrolls enough to serve the
target; it's not as general as the runtime-triggered unroller. From
what I've seen, it will get a maximum unroll factor of 2 on x86 (4 on
AVX targets). Additionally, the vectorizer only unrolls to aid
reduction variables. When I forced the vectorizer to unroll these
loops, the performance effects were nil.

I'm currently looking at changing LoopUnroll::runOnLoop() to consult
block frequency information for the loop header to decide whether to
try runtime triggers for loops that don't have a constant trip count
but could be partially peeled.

Does that sound reasonable?


Thanks.  Diego.
 