From: Chandler Carruth <chandlerc <at> gmail.com>
Subject: Heads up! New x86 vector shuffle lowering becoming the default!
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Wednesday 1st October 2014 01:00:18 UTC
Greetings folks!

After numerous rounds of benchmarks and lots of fixes, I think the new x86
vector shuffle lowering code path is ready to be enabled by default. I plan
to update the test cases and flip the switch as soon as the patches are
written.

We have done *significant* benchmarking at this point. No benchmark I have
regresses by a very significant margin (over 5%), and the improvements
overall outstrip the regressions on every micro-architecture I have
benchmarked. This includes some internal benchmarks of mine and, more
importantly, LNT.

From what I can tell, AMD chips will see an *extremely* significant
improvement when using older SSE versions. If your build only uses SSE3
or older, this should help a lot. Even when using newer ISA extensions, you
should see quite significant improvements.

Even on Intel chips I'm seeing very significant improvements for SSE3 and
SSE2. I'm also seeing some very small gains for modern ISAs, but mostly it's
in the noise.

The primary difference is that the new lowering takes a systematic approach
to decomposing the shuffle into components which have efficient instruction
sequences. As a consequence, at no point does it "fall back" to scalarizing
code the way the old lowering did. It also works very hard to minimize
domain-crossing traffic (moving vector values between the integer and
floating-point execution domains), which can have unexpectedly large
penalties in real-world code.
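To make "decomposing the shuffle into components" a bit more concrete, here
is a minimal Python model of one such decomposition: any two-input shuffle
can be split into two single-input permutes followed by a per-lane blend.
This is a hypothetical sketch of the idea, not the LLVM implementation; the
function names and the use of -1 for undef mask lanes are assumptions made
for illustration, and the PSHUFD/BLENDPS analogies are only approximate.

```python
# Hypothetical model of a two-input, n-lane shuffle mask: entries in [0, n)
# select a lane of v1, entries in [n, 2n) select a lane of v2, and UNDEF (-1)
# means any value is acceptable in that output lane (modeled here as None).
UNDEF = -1


def reference_shuffle(v1, v2, mask):
    """Lane-by-lane semantics of the shuffle; used only as the spec."""
    n = len(v1)
    return [None if m == UNDEF else (v1[m] if m < n else v2[m - n]) for m in mask]


def permute(v, mask):
    """One single-input permute (roughly a PSHUFD-style instruction)."""
    return [None if m == UNDEF else v[m] for m in mask]


def blend(a, b, take_b):
    """Per-lane select between two vectors (roughly a BLENDPS-style instruction)."""
    return [y if sel else x for x, y, sel in zip(a, b, take_b)]


def decomposed_shuffle_blend(v1, v2, mask):
    """Split a two-input shuffle into two permutes plus one blend.

    Every step is a whole-vector operation, so no lane is ever extracted
    and reinserted individually (i.e. nothing is scalarized).
    """
    n = len(v1)
    mask1 = [m if 0 <= m < n else UNDEF for m in mask]  # lanes sourced from v1
    mask2 = [m - n if m >= n else UNDEF for m in mask]  # lanes sourced from v2
    take_b = [m >= n for m in mask]                     # which input wins per lane
    return blend(permute(v1, mask1), permute(v2, mask2), take_b)


v1 = [10, 11, 12, 13]
v2 = [20, 21, 22, 23]
mask = [3, 4, 1, 6]
print(decomposed_shuffle_blend(v1, v2, mask))  # [13, 20, 11, 22]
assert decomposed_shuffle_blend(v1, v2, mask) == reference_shuffle(v1, v2, mask)
```

A practical lowering would presumably try cheaper single-instruction
patterns first and use a split like this only when nothing simpler matches.
On SSE2/SSE3 targets, which lack BLENDPS (it arrived with SSE4.1), the
select step would itself need a different instruction sequence.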

If you're seeing performance regressions because of this switch, it should
actually be quite a bit easier to extend the new logic to handle your cases
than it was to extend the old lowering. Patches welcome!


Correctness testing has been done by building LNT, doing a bootstrap, and
building a bunch of other large code bases with the new code path. I have
also written (and committed) a fuzz tester for vector shuffles. Using it, I
have correctness-tested everything up through AVX-512 (thanks to the Intel
SDE). I have spent significant CPU cycles testing up through AVX2 with well
over 2 million fuzz tests without any failures detected. For comparison,
the current default code path has had several crashers and miscompiles turn
up within 1 million fuzz tests.
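The committed fuzz tester is not described in detail here, so the following
is only a sketch of the general idea (differential testing of a shuffle
lowering against a lane-by-lane reference), written in Python. The function
names, the -1 undef encoding, and the use of a permute-plus-blend split as
the "lowering under test" are all illustrative assumptions; the real tester
presumably runs actual compiled code, as the mention of the Intel SDE
suggests.

```python
import random

UNDEF = -1  # hypothetical marker for an undef mask lane


def reference_shuffle(v1, v2, mask):
    """Spec: what each output lane of the shuffle must contain (None = undef)."""
    n = len(v1)
    return [None if m == UNDEF else (v1[m] if m < n else v2[m - n]) for m in mask]


def candidate_lowering(v1, v2, mask):
    """Stand-in for the code under test: two single-input permutes plus a blend."""
    n = len(v1)
    from_v1 = [v1[m] if 0 <= m < n else None for m in mask]
    from_v2 = [v2[m - n] if m >= n else None for m in mask]
    return [b if m >= n else a for a, b, m in zip(from_v1, from_v2, mask)]


def lanes_agree(expected, actual):
    """An undef lane in the spec accepts whatever the lowering produced."""
    return len(expected) == len(actual) and all(
        e is None or e == a for e, a in zip(expected, actual)
    )


def fuzz(lowering, num_tests=10_000, widths=(2, 4, 8, 16), seed=0):
    """Differential fuzzing: random masks, lowering vs. reference; returns failing masks."""
    rng = random.Random(seed)
    failures = []
    for _ in range(num_tests):
        n = rng.choice(widths)
        v1 = list(range(n))          # distinct lane values make wrong picks visible
        v2 = list(range(n, 2 * n))
        mask = [rng.choice([UNDEF] + list(range(2 * n))) for _ in range(n)]
        if not lanes_agree(reference_shuffle(v1, v2, mask), lowering(v1, v2, mask)):
            failures.append(mask)
    return failures


def buggy_lowering(v1, v2, mask):
    """Deliberately wrong: ignores v2 and wraps indices into v1."""
    n = len(v1)
    return [None if m == UNDEF else v1[m % n] for m in mask]


print(len(fuzz(candidate_lowering)))  # 0
print(len(fuzz(buggy_lowering)) > 0)  # True
```

Using distinct values in every input lane is what lets a single comparison
catch a lowering that reads the wrong lane or the wrong input.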

Thanks!
-Chandler