From: Arnold Schwaighofer <aschwaighofer <at> apple.com>
Subject: Re: Improving SLPVectorizer for Julia
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Monday 17th March 2014 23:54:35 UTC
Hi Arch,

Thanks for looking at this.

The reason the SLPVectorizer bails out on many cases that seem vectorizable
is scheduling: it needs to produce a legal schedule. The way it does this
is by making sure that it can move all vectorized instructions to the last
instruction in a bundle. (Alternatively, you could build a DAG, make sure
that you don’t create cycles, and then produce a topological sort, but
this was not done out of compile-time concerns.)
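
To make the DAG alternative a bit more concrete, here is a rough sketch in plain C++. It is a toy model and not LLVM code: `Block`, `bundleIsSchedulable`, and the idea of identifying an instruction by its position in the basic block are all assumptions made for illustration. It fuses the bundle into a single node and runs Kahn's algorithm; if not every node can be scheduled, fusing the bundle created a cycle.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Toy model (not LLVM's data structures): instruction I is identified by its
// position in the basic block, and Operands[I] lists the positions it uses.
using Block = std::vector<std::vector<std::size_t>>;

// Fuse all members of Bundle into one node and try to topologically sort the
// resulting dependency graph with Kahn's algorithm. Returns false if the
// fused graph has a cycle, i.e. no legal schedule exists.
// Assumption: bundle members do not depend on each other directly.
bool bundleIsSchedulable(const Block &Operands,
                         const std::vector<std::size_t> &Bundle) {
  const std::size_t N = Operands.size();
  if (Bundle.empty())
    return true;

  // Rep[I] is the node that represents instruction I after fusing.
  std::vector<std::size_t> Rep(N);
  for (std::size_t I = 0; I < N; ++I)
    Rep[I] = I;
  for (std::size_t Member : Bundle)
    Rep[Member] = Bundle.front();

  // Build def -> use edges between distinct nodes.
  std::vector<std::vector<std::size_t>> Succ(N);
  std::vector<std::size_t> InDegree(N, 0);
  std::vector<bool> IsNode(N, false);
  for (std::size_t I = 0; I < N; ++I) {
    IsNode[Rep[I]] = true;
    for (std::size_t Op : Operands[I])
      if (Rep[Op] != Rep[I]) {
        Succ[Rep[Op]].push_back(Rep[I]);
        ++InDegree[Rep[I]];
      }
  }

  // Kahn's algorithm: repeatedly schedule nodes with no pending operands.
  std::queue<std::size_t> Ready;
  std::size_t NumNodes = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (IsNode[I]) {
      ++NumNodes;
      if (InDegree[I] == 0)
        Ready.push(I);
    }
  std::size_t Scheduled = 0;
  while (!Ready.empty()) {
    const std::size_t Node = Ready.front();
    Ready.pop();
    ++Scheduled;
    for (std::size_t S : Succ[Node])
      if (--InDegree[S] == 0)
        Ready.push(S);
  }
  return Scheduled == NumNodes;
}
```

In this model, a bundle {0, 2} where instruction 1 uses 0 and instruction 2 uses 1 is rejected (the fused node would depend on itself through 1). If instruction 2 does not use 1, the bundle is accepted, even though the user at position 1 sits above the last bundle member. The price is building and sorting a graph per candidate bundle, which is the compile-time concern mentioned above.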


If I understand your patch correctly, you are disabling the above-mentioned
check if the vectorizer starts at an insertelement instruction? What about
other users? You still need to detect that you can schedule them correctly.


define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
top:
  %2 = extractelement <4 x float> %0, i32 0
  %3 = extractelement <4 x float> %1, i32 0
  %4 = fadd float %2, %3
  %5 = insertelement <4 x float> undef, float %4, i32 0
  %6 = extractelement <4 x float> %0, i32 1
  %7 = extractelement <4 x float> %1, i32 1
  %8 = fadd float %6, %7
 
  ; %foo = <some operation that uses %8>. It potentially feeds %12, but even
  ; if it does not, all of its users now need to be moved below %16, and we
  ; need to check all of their users recursively ...

  %9 = insertelement <4 x float> %5, float %8, i32 1
  %10 = extractelement <4 x float> %0, i32 2
  %11 = extractelement <4 x float> %1, i32 2
  %12 = fadd float %10, %11
  %13 = insertelement <4 x float> %9, float %12, i32 2
  %14 = extractelement <4 x float> %0, i32 3
  %15 = extractelement <4 x float> %1, i32 3
  %16 = fadd float %14, %15
  %17 = insertelement <4 x float> %13, float %16, i32 3
  ret <4 x float> %17
}
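
The conservative rule can be sketched on a toy model of this function. This is plain C++, not the SLPVectorizer's code: `Block`, `juliaFoo111`, and `canSinkBundleToLastMember` are made-up names, and positions 0..15 stand in for %2..%17. The check only accepts a bundle if every user outside the bundle comes after the bundle's last member.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy model (not LLVM's data structures): instruction I is identified by its
// position in the basic block, and Operands[I] lists the positions it uses.
using Block = std::vector<std::vector<std::size_t>>;

// Models the def-use structure of @julia_foo111: position = value name - 2,
// so %2..%17 become 0..15. Function arguments and undef are not modelled.
Block juliaFoo111() {
  Block B(16);
  for (std::size_t Lane = 0; Lane < 4; ++Lane) {
    const std::size_t Base = 4 * Lane;   // first extractelement of the lane
    B[Base + 2] = {Base, Base + 1};      // fadd uses both extractelements
    B[Base + 3].push_back(Base + 2);     // insertelement uses the fadd
    if (Lane > 0)
      B[Base + 3].push_back(Base - 1);   // ... and the previous insertelement
  }
  return B;
}

// Conservative check: the bundle may be moved down to its last member only if
// no instruction outside the bundle uses a bundle member before that point.
bool canSinkBundleToLastMember(const Block &Operands,
                               const std::set<std::size_t> &Bundle) {
  if (Bundle.empty())
    return true;
  const std::size_t Last = *Bundle.rbegin();
  for (std::size_t User = 0; User < Operands.size(); ++User) {
    if (Bundle.count(User))
      continue; // bundle members move together
    for (std::size_t Op : Operands[User])
      if (Bundle.count(Op) && User < Last)
        return false; // this user would end up above its operand
  }
  return true;
}
```

With the four fadds as the bundle ({2, 6, 10, 14} in this numbering), the check fails at the first insertelement (%5, position 3), because it uses %4 but sits above %16. That is the situation this thread is about.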

For your case of insertelements that start a vector tree, you could get away
with keeping a set of the “insertelement” instructions from which the
tryToVectorizeList call below started.

if (InsertElementInst *IE = dyn_cast<InsertElementInst>(it)) {
      SmallVector<Value *, 8> Ops;
      if (!findBuildVector(IE, Ops))
        continue;
      // Add the insertelements to InsertVectorRoot. You would need to make
      // sure that all ‘other’ uses of those insertelements are below the
      // last insert.
      if (tryToVectorizeList(Ops, R))

Instead of checking “buildsVector”, you could check this set.

      if (RdxOps && RdxOps->count(UI))
         continue;
 
+      // This user is part of building a vector
+      if (buildsVector) // use something like: if (InsertVectorRoot.count(UI)) instead.
+        continue;
+

And this set would also contain the instructions that need to be moved.
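
A rough sketch of how such a set could relax the check, again on a toy model rather than on LLVM's classes. `Block`, `InstSet`, `usesAnyOf`, and `canScheduleWithInsertRoots` are hypothetical names, and only `InsertVectorRoot` comes from the suggestion above. Users in the set are exempt from the "must be below the last bundle member" rule, and every other use of those insertelements must be below the last insert.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy model (not LLVM's data structures): instruction I is identified by its
// position in the basic block, and Operands[I] lists the positions it uses.
using Block = std::vector<std::vector<std::size_t>>;
using InstSet = std::set<std::size_t>;

// Hypothetical helper: true if User reads the result of any member of Defs.
bool usesAnyOf(const Block &Operands, std::size_t User, const InstSet &Defs) {
  for (std::size_t Op : Operands[User])
    if (Defs.count(Op))
      return true;
  return false;
}

// Relaxed scheduling check. Insertelements recorded in InsertVectorRoot are
// allowed to use bundle members early, because they would be moved (or
// replaced) together with the vectorized bundle.
bool canScheduleWithInsertRoots(const Block &Operands, const InstSet &Bundle,
                                const InstSet &InsertVectorRoot) {
  if (Bundle.empty())
    return true;
  const std::size_t LastInBundle = *Bundle.rbegin();
  for (std::size_t User = 0; User < Operands.size(); ++User) {
    if (Bundle.count(User) || InsertVectorRoot.count(User))
      continue; // part of the bundle or part of building the vector
    // Ordinary users still have to sit below the last bundle member.
    if (usesAnyOf(Operands, User, Bundle) && User < LastInBundle)
      return false;
    // 'Other' uses of the insertelements must be below the last insert.
    if (!InsertVectorRoot.empty() &&
        usesAnyOf(Operands, User, InsertVectorRoot) &&
        User < *InsertVectorRoot.rbegin())
      return false;
  }
  return true;
}
```

With an empty `InsertVectorRoot` this degenerates to the conservative rule. A user like %foo in the example above is not in the set, so it is still rejected in this model.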

Alternatively, we could teach the SLP vectorizer how to ‘vectorize’
insertelements and start the vectorization tree with the insertelements
instead of their operands. Then it would naturally work (because in-tree
users are considered safe).

Best,
Arnold

On Mar 17, 2014, at 2:38 PM, Robison, Arch wrote:

> define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
> top:
>   %2 = extractelement <4 x float> %0, i32 0
>   %3 = extractelement <4 x float> %1, i32 0
>   %4 = fadd float %2, %3
>   %5 = insertelement <4 x float> undef, float %4, i32 0
>   %6 = extractelement <4 x float> %0, i32 1
>   %7 = extractelement <4 x float> %1, i32 1
>   %8 = fadd float %6, %7
>   %9 = insertelement <4 x float> %5, float %8, i32 1
>   %10 = extractelement <4 x float> %0, i32 2
>   %11 = extractelement <4 x float> %1, i32 2
>   %12 = fadd float %10, %11
>   %13 = insertelement <4 x float> %9, float %12, i32 2
>   %14 = extractelement <4 x float> %0, i32 3
>   %15 = extractelement <4 x float> %1, i32 3
>   %16 = fadd float %14, %15
>   %17 = insertelement <4 x float> %13, float %16, i32 3
>   ret <4 x float> %17
> }
 