Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Filip Pizlo <fpizlo <at> apple.com>
Subject: Re: Proposal: add intrinsics for safe division
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Friday 25th April 2014 18:49:18 UTC (over 3 years ago)
On April 25, 2014 at 10:48:18 AM, Reid Kleckner ([email protected]) wrote:

On Fri, Apr 25, 2014 at 10:19 AM, Filip Pizlo  wrote:
The sdiv operation in LLVM IR only makes sense for C and its very direct
relatives.  The amount of control flow necessary to represent a safe
division for any other language is ghastly.  (a/b) becomes something like
(b != 0 ? ((a != INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's more
useful to the optimizer to see this as the single operation that it really
is.

This argument makes sense to me. It would be hard for the arm64 backend to
pattern match all this control flow back into a single instruction after
optimizations.

Are there ISAs out there that produce different results without trapping
for these boundary conditions? Should we leave the quotient result
unspecified in the b == 0 || (a == INT_MIN && B == -1) cases? If we do
that, we require frontends to check the "overflow" bit if they want
deterministic cross-platform behavior.
That's a good question.  I believe that all ISAs go to one of the
following paths:

- Trap on x/0 and INT_MIN/-1 like x86.

- Return 0 for x/0 and INT_MIN for INT_MIN/-1 like ARM.

Also, here's what I know about what different languages do:

C/C++/etc: undef on x/0 and INT_MIN/-1, so they would use LLVM's existing
div instruction.

JavaScript: optimizing compilers will look for places where people say
things like (a/b)|0 - in JavaScript saying x|0 means "truncate to int32",
and they will do an integer division if the inputs were already represented
as integers.  x/0 will produce NaN and NaN|0 produces 0; INT_MIN/-1
produces -INT_MIN (which is representable as double) and then (-INT_MIN)|0
produces INT_MIN.  So, the ARM64 semantics match optimized JavaScript
semantics.  There is also the case that you say a/b in JavaScript, and we
have inferred that a and b are integers but there is no |0.  In that case
we want to detect when x/0 and INT_MIN/-1 so that we can promote the result
to double.

Java: x/0 throws an exception but INT_MIN/-1 produces INT_MIN.

Ruby and probably a lot of others: provide BigInt semantics but have fast
paths for small ints.  So, these will throw an exception on x/0 and they
will inflate to a BigInt on INT_MIN/-1.

I don't know about other languages.

Looking at this, it makes me *slightly* prefer to generalize these
intrinsics a bit.  Instead of returning an i1 flag for all overflow cases,
you could return an i2 status enum that says 0 for neither overflow
condition, 1 for x/0, and 2 for INT_MIN/-1.  Then, you get the following
matrix for how Java and JavaScript would map to this and how it would be
lowered:

- Java on x86: use safediv and check if the if the status is 1.  If it's 1
then throw.  This lowers to a bunch of branches fairly late in the
backend.

- JavaScript on x86 with |0: use safediv and ignore the status.  Also
loweres to a bunch of branches, which is as good as as it gets.

- JavaScript on x86 without |0: use safediv and deoptimize if status != 0.
 Also a bunch of branches.

- Ruby on x86: use safediv and detect both status==1 and status==2.  This
lowers to a bunch of branches.

- Java on ARM: use safediv as on x86, but not it lowers to just a branch to
detect x/0 and not the other case.  Note that without the safediv
intrinsic, you wouldn't be able to get this to one branch unless you used a
target intrinsic.

- JavaScript on ARM with |0: use safediv and you ignore the status and you
get no branches.  It maps directly to ARM.

- JavaScript on ARM without |0: use safediv and test the status, so you get
a bunch of branches as before.

- Ruby on ARM: this lowers to a bunch of branches as on x86.

So, it seems that for JavaScript and Java on ARM - and probably any
environment with emulated division, these intrinsics would result in
tighter code without requiring target intrinsics.  For all other
language/hardware combinations, the main benefit here is to materialize the
gnarly control flow later and so hopefully reveal more optimization
opportunities.  That seems like a pretty good sweet-spot for the systems
that I know about.

-Filip
 
CD: 4ms