Features Download
From: David Majnemer <david.majnemer <at> gmail.com>
Subject: RFC: Proposal for Poison Semantics
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Wednesday 28th January 2015 02:50:00 UTC (over 3 years ago)

What follows is my attempt to describe how poison works.  Let me know what
you think.


# LLVM Poison Semantics

Poison is an LLVM concept which exists solely to enable further
optimization of LLVM IR. The exact behavior of poison has been, to say the
least, confusing for users, researchers and engineers working with LLVM.

This document hopes to clear up some of the confusion of poison and
hopefully explain *why* it has its semantics.

## A Quick Introduction to Poison

Let's start with a concrete motivating example in C:
int isSumGreater(int a, int b) {
  return a + b > a;

The C specification permits us to optimize the comparison in `isSumGreater`
to `b > 0` because signed overflow results in undefined behavior.  A
reasonable translation of `isSumGreater` to LLVM IR could be:

define i32 @isSumGreater(i32 %a, i32 %b) {
  %add = add i32 %a, %b
  %cmp = icmp sgt i32 %add, %a
  %conv = zext i1 %cmp to i32
  ret i32 %conv

However, LLVM cannot determine that `%cmp` should not consider cases where
`%add` resulted in signed overflow.  We need a way to communicate this
information to LLVM.

This is where the `nsw` and `nuw` flags come into play.  `nsw` is short for
"no signed wrap", `nuw` is short for "no unsigned wrap".

With these, we can come up with a new formulation of `%add`: `add i32 nsw
%a, %b`.
LLVM can take this into account when it is optimizing the `%cmp` and
replace it with: `icmp sgt i32 %b, 0`.

## Differences Between LLVM and C/C++

There are some interesting differences between what C++ and C specify and
how LLVM behaves with respect to performing an operationg which is not
permitted to overflow.

Perhaps chief among them is that evaluating an expression in C++ or C which
results performs an overflow is undefined behavior. In LLVM, executing an
instruction which is marked `nsw` but which violates signed overflow
results in poison. Values which have no relationship with poisoned values
are not effected by them.

Let us take the following C program into consideration:
int calculateImportantResult(int a, int b) {
  int result = 0;
  if (a) {
    result = a + b;
  return result;

A straightforward lowering to LLVM IR could be:
define i32 @calculateImportantResult(i32 %a, i32 %b) {
  %tobool = icmp ne i32 %a, 0
  br i1 %tobool, label %if.then, label %if.end

  %add = add nsw i32 %a, %b
  br label %if.end

  %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
  ret i32 %result

Moving `%add` to the `%entry` block would be preferable and would allow
further optimizations:
define i32 @calculateImportantResult(i32 %a, i32 %b) {
  %tobool = icmp ne i32 %a, 0
  %add = add nsw i32 %a, %b
  %result = select i1 %tobool, i32 0, i32 %add
  ret i32 %result

In the original code, the calculation of `%add` was control dependent.
Now, `%add` might result in signed overflow in violation of the `nsw` flag.
Despite this, the program should behave as it did before because the
poisoned value is masked-out by the select. The next section will dive into
this in greater detail.

# Computation Involving Poison Values
Poison in a computation results in poison if the result cannot be
constrained by its non-poison operands.

Examples of this rule which will result in poison:
  %add = add i32 %x, %always_poison
  %sub = sub i32 %x, %always_poison
  %xor = xor i32 %x, %always_poison
  %mul = mul i32 %always_poison, 1

Examples of this rule which do not result in poison:
  %or  = or  i32 %always_poison, 2
  %and = and i32 %always_poison, 2
  %mul = mul i32 %always_poison, 0

In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
`0`.  In this respect, poison is not different from `undef`.

The following example is only poison if `%cond` is false:
  %sel = select i1 %cond, i32 2, %always_poison

### Is it safe to have poison as a `call` argument?

A `call` instruction may or may not result in poison depending on exactly
how the callee  uses the supplied arguments, it is not necessarily the case
that `call i32 @someFunction(i32 %always_poison)` results in poison.

LLVM cannot forbid poison from entering `call` arguments without
prohibiting an optimization pass from outlining code.

### Is it safe to store poison to memory?

`store i32 %always_poison, i32* %mem` does not result in undefined
behavior. A subsequent load instruction like `%load = load i32* %mem` will
result in `%load` being a poison value.

### Is it safe to load or store a poison memory location?

No.  Poison works just like `undef` in this respect.

### Does comparing a poison value result in poison?

It depends.  If the comparison couldn't solely be determined by looking at
the other operand, the result is poison.

For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
However, `icmp i32 ne %always_poison, 7` is poison.

### What if the condition operand in a `select` is poison?

In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel` is
either `true` or `false`.  Because, `%sel` depends on `%always_poison` it
too is poison.
CD: 3ms