From: Chris Lattner <clattner <at> apple.com>
Subject: Re: C as used/implemented in practice: analysis of responses
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Tuesday 7th July 2015 17:26:43 UTC
> On Jul 1, 2015, at 3:20 PM, Sean Silva  wrote:
> On Wed, Jul 1, 2015 at 12:22 PM, Russell Wallace
> wrote:
> I am arguing in favor of a point, and I understand you disagree with it,
> but I don't think I'm dismissing any use cases except a very small
> performance increment.
> I'm sure Google has numbers about how much electricity/server cost they
> save for X% performance improvement.
> I'm sure Apple has numbers about how much money they make with X%
> improved battery life.
> I'm not convinced that the cost of some of these bugs is actually larger
> than the benefit of faster programs. Nor am I convinced about the inverse.
> I'm just pointing out that pointing to a "bad bug" caused by a certain
> optimization without comparing the cost of the bug to the benefit of the
> optimization is basically meaningless. You'll need to quantify "very small
> performance improvement" and put it in context of the bugs you're talking
> about.

As with many things, it is more complicated than that.  The performance
effects of optimizations are often non-linear, and you can take a look at
many of the worst forms of UB in C and easily show cases where they allow
2x speedups, not just 2%.

For example, consider undefined behavior for integer overflow:

  for (int i = 0; i <= N; ++i) { /* ... */ }

When compiling for a 64-bit machine, you really want to promote the
induction variable to 64 bits.  Further, knowing the trip count of a loop
is extremely important for many loop optimizations.  Unfortunately, without
being able to assume that signed integer overflow is undefined, you get
neither of these from C: if i were allowed to wrap, the loop above would
never terminate when N == INT_MAX, so the trip count could not simply be
taken as N+1.
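
To make the trip-count point concrete, here is a minimal sketch contrasting a
signed and an unsigned induction variable.  The function names sum_signed and
sum_unsigned are made up for illustration, and whether a particular compiler
actually widens the index or vectorizes the loop depends on the target and
the optimization level.

```c
#include <assert.h>

/* Signed index: overflow of `i` is undefined behavior, so the compiler may
 * assume it never happens.  That lets it treat the trip count as n + 1 and
 * keep `i` in a 64-bit register on a 64-bit target. */
long sum_signed(const int *a, int n) {
    assert(n >= 0);
    long total = 0;
    for (int i = 0; i <= n; ++i) {
        total += a[i];
    }
    return total;
}

/* Unsigned index: wraparound is defined.  If n == UINT_MAX, `i <= n` is
 * always true and the loop never terminates, so the compiler cannot simply
 * assume a trip count of n + 1. */
long sum_unsigned(const int *a, unsigned n) {
    long total = 0;
    for (unsigned i = 0; i <= n; ++i) {
        total += a[i];
    }
    return total;
}
```

Both functions return the same result for ordinary inputs; the difference is
only in what the optimizer is allowed to assume about the loop.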

-fstrict-aliasing is another great example.  In many cases, it makes no
difference whatsoever.  OTOH, on code like:

void doLoopThing(float *array, int *N) {
    for (int i = 0; i < *N; ++i) {
       array[i] = array[i] + 1;
    }
}

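You can easily get a 2x or more speedup due to auto-vectorization if you
can assume -fstrict-aliasing: without it, the store to array[i] might
modify *N, so the compiler has to reload *N on every iteration and cannot
compute the trip count up front.  Of course, you usually wouldn’t write
this code by hand; you’d get it because doLoopThing is a template and N is
passed in as a reference.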
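
As a rough illustration of what the optimizer is up against, the sketch below
puts the loop next to a hand-hoisted variant.  doLoopThingHoisted is a
hypothetical name, not something from LLVM or from the code under discussion;
it just shows one way to make the trip count loop-invariant without relying
on type-based alias analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Shape from the discussion: the loop bound is read through a pointer.
 * Without type-based alias analysis, the float store to array[i] may be
 * assumed to modify *N, so the bound is not known to be loop-invariant. */
void doLoopThing(float *array, int *N) {
    for (int i = 0; i < *N; ++i) {
        array[i] = array[i] + 1;
    }
}

/* Hypothetical variant: copy *N into a local once.  The trip count is now
 * fixed on entry regardless of what the stores to array[i] touch. */
void doLoopThingHoisted(float *array, const int *N) {
    assert(array != NULL && N != NULL);
    const int n = *N;
    for (int i = 0; i < n; ++i) {
        array[i] = array[i] + 1;
    }
}
```

The two versions behave identically unless array really does overlap *N,
which is exactly the case the strict aliasing rules declare undefined.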

Anyway, I could go on and on here, and I’ve spent a lot of time over the
years thinking about how to improve the situation: can we make clang detect
more of these, can we make the optimizer more conservative in certain cases
etc?  This is why (for example) our alias analysis uses simple structural
points-to analysis before using TBAA.  With GCC’s implementation (circa GCC
4.0, I have no idea what they are doing now), GCC would “miscompile” code
like:

	float bitcast(int x) {
	  return *(float*)&x;
	}

This code is a TBAA violation, but it is also “obvious” what the
programmer meant.  LLVM being “nicer” in this case is a feature.  It is
irritating that the union version of this is also technically UB or
implementation defined behavior, so that isn’t portable either (a C
programmer needs to magically know that memcpy is the safe way to do this).
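
For reference, a minimal sketch of the memcpy form of that bitcast.  The name
bitcast_memcpy is made up here, and the sketch assumes int and float have the
same size (checked at compile time); the resulting value depends on the
platform's float representation.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* memcpy */

/* Assumption: int and float are the same width, as on common targets. */
static_assert(sizeof(float) == sizeof(int),
              "bitcast_memcpy assumes float and int have the same size");

/* Reinterpret the bits of x as a float by copying the bytes, which does
 * not access an int object through a float lvalue. */
float bitcast_memcpy(int x) {
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a
plain register move, so the safe spelling usually costs nothing at run time.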

However, as I’ve continued to dig into this, my feeling is that there
really is no satisfactory solution to these issues.  The root cause is
pervasive structural problems in the C language: in the first example
above, it is that “int” is the default type people generally reach for,
not “long”, and that array indexing is expressed with integers instead
of iterators.  This isn’t something that we’re going to “fix” in the
C language, the C community, or the body of existing C code.  Likewise,
while C++ has made definite progress here by replacing some of these idioms
(e.g. with iterators), it adds its own layers of UB on, and doesn’t
actually *subtract* the worst things in C.

My conclusion is that C, and derivatives like C++, are very dangerous
languages to write safety/correctness critical software in, and my personal
opinion is that it is almost impossible to write *security* critical
software in them.  This isn’t really C’s fault if you consider where it
was born and evolved from (some joke that it started as a *very* nice high
level assembler for the PDP11:
https://en.wikipedia.org/wiki/C_(programming_language)#Early_developments).

There are many more modern and much safer languages that either eliminate
the UB entirely through language design (e.g. using a garbage collector to
eliminate an entire class of memory safety issues, completely disallowing
pointer casts to enable TBAA safely, etc), or by intentionally spending a
bit of performance to provide a safe and correct programming model (e.g. by
guaranteeing that integers will trap if they overflow).  My hope is that
the industry will eventually move to better systems programming languages,
but that will take a very very long time...
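
To give a feel for the "spend a bit of performance" option, here is a sketch
of what overflow checking looks like when written out by hand in C.
checked_add is a hypothetical helper, not a standard function; a language
with trapping arithmetic would perform the equivalent check implicitly on
every operation and abort instead of returning a flag.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b if the result fits in an int.  The range test runs before
 * the addition, so the overflowing (undefined) operation is never executed.
 * Returns true and stores the sum in *out on success, false on overflow. */
bool checked_add(int a, int b, int *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

The extra compare-and-branch per operation is the performance cost being
traded for a defined result in every case.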
