From: Robison, Arch <arch.robison <at> intel.com>
Subject: Improving SLPVectorizer for Julia
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Monday 17th March 2014 21:38:04 UTC
I'm working on some small improvements to SLPVectorizer.cpp so that it can
deal with some tuple operations arising from Julia code. Being fairly new
to LLVM, I could use some advice, particularly from those familiar with the
internals of SLPVectorizer.

The motivation can be found in the Julia discussion at
https://github.com/JuliaLang/julia/issues/5857. Here is an example of the
kind of LLVM code I wish to vectorize.

-------------------------------------------------------------

define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
top:
  %2 = extractelement <4 x float> %0, i32 0
  %3 = extractelement <4 x float> %1, i32 0
  %4 = fadd float %2, %3
  %5 = insertelement <4 x float> undef, float %4, i32 0
  %6 = extractelement <4 x float> %0, i32 1
  %7 = extractelement <4 x float> %1, i32 1
  %8 = fadd float %6, %7
  %9 = insertelement <4 x float> %5, float %8, i32 1
  %10 = extractelement <4 x float> %0, i32 2
  %11 = extractelement <4 x float> %1, i32 2
  %12 = fadd float %10, %11
  %13 = insertelement <4 x float> %9, float %12, i32 2
  %14 = extractelement <4 x float> %0, i32 3
  %15 = extractelement <4 x float> %1, i32 3
  %16 = fadd float %14, %15
  %17 = insertelement <4 x float> %13, float %16, i32 3
  ret <4 x float> %17
}

-------------------------------------------------------------

I want the fadd instructions to be vectorized. I've been able to implement
most of what I need (see attached patch), but with a fatal flaw: the uses
of the vectorized result are not moved after it as necessary. Here is the
current (and quite illegal) result:

-------------------------------------------------------------

top:
  %2 = extractelement <4 x float> %8, i32 0
  %3 = insertelement <4 x float> undef, float %2, i32 0
  %4 = extractelement <4 x float> %8, i32 1
  %5 = insertelement <4 x float> %3, float %4, i32 1
  %6 = extractelement <4 x float> %8, i32 2
  %7 = insertelement <4 x float> %5, float %6, i32 2
  %8 = fadd <4 x float> %0, %1
  %9 = extractelement <4 x float> %8, i32 3
  %10 = insertelement <4 x float> %7, float %9, i32 3
  ret <4 x float> %10

-------------------------------------------------------------

Instructions %3, %5 and %7 (together with the extractelement instructions
%2, %4 and %6 that feed them) need to be moved to after instruction %8. I'm
wondering what a good way to do this is. The relevant place in
SLPVectorizer.cpp is around here:

-------------------------------------------------------------

    if (Cost < -SLPCostThreshold) {
      DEBUG(dbgs() << "SLP: Vectorizing list at cost:" << Cost << ".\n");
      R.vectorizeTree();

      // Move to the next bundle.
      i += VF - 1;
      Changed = true;
    }

-------------------------------------------------------------

Should I try to move the instructions before or after the call to
R.vectorizeTree()?  Or maybe do it even later after all bundles have been
vectorized?  Are there LLVM utilities for doing this kind of fixup within a
basic block?



Any pointers/advice appreciated.

- Arch
 