Features Download
From: KARTHIK VENKATESH BHAT <kv.bhat <at> samsung.com>
Subject: [RFC] LoopInterchange Pass for llvm
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Thursday 5th February 2015 13:34:05 UTC (over 2 years ago)
Hi All,
I have been working on a loop interchange pass for llvm. The motivation is
to improve the cache hit to improve performance.
Would like to get your inputs on the same.

Currently this pass only handles loop of depth 2 other loops are ignored.
Going forward we would like to fix this.
This opt is disabled by default.

Patch: http://reviews.llvm.org/D7432

LoopInterchange Pass is divided into 3 parts-
1) LoopInterchangeLegality
2) LoopInterchangeProfitability
3) LoopInterchangeTransform

This class checks all the memory instructions in the loop and uses
DependenceAnalysis to conclude if we can interchange the loop or not.

LoopInterchangeLegality Functions:
  canInterchangeLoops - Checks if the loops can be interchanged.
  checkDependence - Called by canInterchangeLoops. Does the actual
DependenceAnalysis to conclude if we can interchange the loops
  currentLimiations - This function marks loops are illegal due to current
limitation in the way the transform is written. I intend to fix these

This class checks if it is profitable to interchange the loops. Currently i
use only 1 heuristic which is the order in which the array elements are
accessed (i.e. row major/column major) and count the good and bad order. If
we have bad order more than good order we interchange. We can improve the
heuristics here later on.

LoopInterchangeProfitability Functions:
  isProfitable - Concludes if it is profitable to interchange.
  getInstrOrderCost - Calculates the array access order heuristics.

This transforms the loop and interchanges the inner loop with the outer
loop. I'm writing a loop optimization for the first time and have few
doubts here.
The way we have interchanged the loop is - 
  1) Split the inner loop header and move all phi nodes into a seperate
block. This will be the new outer header.
  2) Split the loop latch at the indvar.next instruction( This may need
improvement as indicated in a TODO in the code) and this will be the new
outer loop latch.
  3) Adjust the loop links by adjusting the branch instructions so that we
move the inner loop outside.
  4) Fix PHi nodes due to the step 3 as previous basicblock from which it
branched will have changed.

After this step the loop is interchanged. It gives the correct results for
few of the sample loops which i checked but I'm not sure if this is the
right way to perform interchange transform.
I suppose i will have to update the dom tree etc?  Could someone please
guide me if this is the right way to go forward or we need to transform in
some other way?

I checked the transform with the following and some other examples it seems
to interchange the loops when legal and profitable.I checked the o/p with
and without the transform. 
They seem to be the same in the cases which i have checked.

using namespace std;
int N,M;
int A[100][100],B[100][100];
int k;
int main() {
 cin >> N >> M;
 for(int i=0;i> A[i][j];

 for(int j=0;j
CD: 4ms