Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane

From: Tobias Grosser <tobias <at> grosser.es>
Subject: Where in the pass pipeline run the loop optimizer?
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Wednesday 3rd December 2014 08:32:49 UTC (over 4 years ago)
On 30.11.2014 15:05, Hongbin Zheng wrote:
> What should us consider when we try to determinate the point to run
Polly?

I think we could benefit from some community feedback here.

Some context:

Since a couple of weeks we focus on tuning Polly to minimize
compile time overhead. One important case for us is to minimize the 
compile time impact of Polly in case no loop optimizations are 
performed. One item that we would like to reconsider here is the 
position in the LLVM pass order where we execute Polly.

Currently, when '-O3 -polly' is run, we first schedule nine 
canonicalization passes (taken from the -O3 pass pipeline) to prepare 
the code for Polly, then run Polly itself and after this we run the full 
unmodified -O3 sequence.

The reason for this was that we wanted full control on how we 
canonicalize the code for Polly and we also wanted to run the _entire_
LLVM optimization chain to be sure the code is well optimized after. The 
reason for this choice was mostly to be able to experiment
with loop optimizations without being constrained by compile time or
pass order concerns.

This pass order is obviously not optimal due to the inherent compile 
time increase it implies and also because we do not currently run the 
inliner in our canonicalizer.

Hence, we would like to place the Polly optimizer inside the standard 
-O3 optimization sequence. The question is now what would be the optimal 
location.

Here some considerations:

1) After the IR has been canonicalized

This should remove the need for most of our -polly-canonicalize passes
and eliminate the corresponding compile time cost.

2) After inlining has been performed.

Inlining is essential to optimize C++ heavy code such as boost::ublas.

3) Before the Loop/SLP vectorizer

We want to ensure that Polly is run at a position, where the vectorizer
can continue to optimize the code generated by Polly (and to benefit 
from the parallelism, alias information and loop optimizations we 
performed).

4) Minimize interactions with the inliner

At best Polly is run after all inlining has happened, such that Polly 
always sees the largest possible loop nest. If we optimize too early
we may optimize code that later is inlined in a larger loop nest where
the earlier optimization hinders the optimization of the larger loop nest.

Some open questions:

5) Run it before / after LICM?

LICM may make more loop bounds analyzable by making the parameters in 
the loop bounds loop invariant.

On the other side, running it may introduce unnecessary scalar data 
dependences, which need to be eliminated again (Polly does not do this
yet).

6) Run it before loop rotate / loop unswitch?

Both of these passes introduce copies of code which Polly does not 
really benefit from. (Polly may e.g prove that a loop is executed at
least once, such that the copy introduced by loop-rotate is not 
necessary any more). Hence running them before Polly may unnecessarily 
complicating the loop structure. On the other side, those passes may be 
helpful for LICM.

I currently have been considering two locations:

Option 1: Before the loop optimization passes

=============================================

   MPM.add(createReassociatePass()); // Reassociate expressions

<< Add Polly before the loop optimization passes >>

// Or better call it EP_LoopCanonicalizerStart?
+ addExtensionsToPM(EP_LoopOptimizerStart, MPM);


   MPM.add(createLoopRotatePass()); // Rotate Loop
   MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
   MPM.add(createInstructionCombiningPass());
   MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
   MPM.add(createLoopIdiomPass());             // Recognize idioms like
                                                  memset.
   MPM.add(createLoopDeletionPass());          // Delete dead loops


Option 2: Right before the loop vectorizer
=====================================

   MPM.add(createBarrierNoopPass());

<< Add Polly at the beginning of the target specific optimizations >>
+ addExtensionsToPM(EP_TargetOptsStart, MPM);

   // Re-rotate loops in all our loop nests. These may have fallout out
   // of rotated form due to GVN or other transformations, and the
   // vectorizer relies on the rotated form.
   if (ExtraVectorizerPasses)
     MPM.add(createLoopRotatePass());

   MPM.add(createLoopVectorizePass(DisableUnrollLoops,
         LoopVectorize));

Originally I was thinking of 'Option 1'. It is after the scalar
optimization passes, but before LICM and LoopRotate. However, this 
location seems still to be still somehow a canonicalization phase. 
'Option 2' places Polly after all canonicalization (is the inliner 
somehow finished at this point?) and right at the beginning of the 
target specific optimizations. It seems somehow preferable to me at the 
moment.

Does anybody have opinions on this?

Cheers,
Tobias
 
CD: 127ms