From: Jim Tilander <jim.tilander <at> gmail.com>
Subject: Re: How to get 3dvector largest coordinate index?
Newsgroups: gmane.games.devel.algorithms
Date: Saturday 7th March 2009 05:28:16 UTC (over 9 years ago)
It's sad to see that so many people have no knowledge of the C++
language or, even worse, disdain that knowledge and take the
attitude that it doesn't matter. Yes, the standard is pretty nasty,
long and full of language that takes way too long to interpret. But
it's *the* only source of authoritative answers to many of the
questions we can encounter while coding. It's like a workman who
doesn't like or want to understand his tools. Oh, wait -- it's exactly
like that.

Now don't get me wrong, I'm by no means an expert at the language,
but I've tried to preach 3.10.15 over the years (I saw that Charles
beat me to posting it). The very first reaction to the paragraph
should be to read it again. Here it is:

If a program attempts to access the stored value of an object through
an lvalue of other than one of the following types the behavior is
undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the
  dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a
  cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned
  types among its members (including, recursively, a member of a
  subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the
  dynamic type of the object,
- a char or unsigned char type.
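
To make the list a bit more concrete, here is a minimal sketch of two
accesses that fall under it and one that does not. The function names
are hypothetical and only for illustration, and the zero-byte check
assumes an IEEE 754 float where +0.0f is stored as all-zero bytes.

```cpp
#include <cassert>
#include <cstddef>

// Permitted: unsigned int is the unsigned type corresponding to int.
inline unsigned int read_as_unsigned(const int& value) {
    return *reinterpret_cast<const unsigned int*>(&value);
}

// Permitted: any object may be inspected through unsigned char.
inline bool all_bytes_zero(const float& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t k = 0; k < sizeof(float); ++k) {
        if (bytes[k] != 0) return false;
    }
    return true;
}

// Not permitted: int is not on the list for an object whose dynamic type
// is float, so this read would be undefined behavior.
//   int bad = *reinterpret_cast<const int*>(&some_float);

// Small usage demo of the two permitted accesses.
inline void permitted_access_demo() {
    const int five = 5;
    assert(read_as_unsigned(five) == 5u);
    assert(all_bytes_zero(0.0f));   // assumes IEEE 754 +0.0f is all zero bytes
    assert(!all_bytes_zero(1.0f));
}
```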

Note the wording of the first sentence (read it again, how many of you
guys did read it the time Charles posted it? :). *Any* access other
than the ones listed is invalid (actually undefined behavior, which
in standard speak means run away and hide). Actually, Philip Taylor
wrote an earlier email explaining most of this in plain talk, read his
post again :) People try to get around this and read the paragraph very
liberally, but it turns out that no, the way you can access memory is
very limited. The above can be summarized in short as:

"Only one type can ever live at one memory address at the same time"

Now, as many noted, char* is intentionally left there as a loophole.
You can always access things through a char* and it will be fine. But
you need to treat it as char, and not go around doing things like:

  float f;
  int* bad = (int*)(char*)(float*)&f; // this is bad.

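If you want an integer representing your IEEE 754 number, you need to
do (assuming a long is 32 bits of course, and a known endianness,
here most significant byte first):

  float f;
  const unsigned char* p = (const unsigned char*)&f;
  // Widen each byte before shifting so the shift cannot overflow an int.
  unsigned long i = (unsigned long)p[0] << 24 | (unsigned long)p[1] << 16
                  | (unsigned long)p[2] << 8  | (unsigned long)p[3];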
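
As a rough sketch of how the byte-wise approach could be wrapped up so
it does not depend on the host byte order: the helper names below are
hypothetical, and the code assumes a 4-byte IEEE 754 float whose byte
order matches the integer byte order of the machine.

```cpp
#include <cassert>

// Hypothetical helper: true when the least significant byte of an integer
// is stored first. Looking at the object through unsigned char is allowed.
inline bool host_is_little_endian() {
    const unsigned int one = 1;
    return *reinterpret_cast<const unsigned char*>(&one) == 1;
}

// Hypothetical helper: assembles the bytes of a float into an integer,
// most significant byte first, reading only through unsigned char.
inline unsigned long float_bits(float f) {
    static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&f);
    unsigned long bits = 0;
    for (int k = 0; k < 4; ++k) {
        // Walk the bytes from most significant to least significant.
        const int index = host_is_little_endian() ? 3 - k : k;
        bits = (bits << 8) | p[index];
    }
    return bits;
}

// Small usage demo with well-known IEEE 754 single precision patterns.
inline void float_bits_demo() {
    assert(float_bits(0.0f) == 0x00000000UL);
    assert(float_bits(1.0f) == 0x3F800000UL);
    assert(float_bits(-2.0f) == 0xC0000000UL);
}
```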

The union trick does not work. No really, it doesn't. It's undefined
behavior, meaning that the compiler is free to do whatever it likes.
Even erase your hard drive or start tetris. It so happens that the gcc
guys usually don't want people to hunt them down and strangle them,
so in this case they are trying to do the right thing, by turning
off most of the optimizations.

Another point to make is that the bit_cast that Pal mentioned (I think
it has been mentioned before as union_cast) has one terrible flaw
apart from the obvious one that it's undefined since it uses the union
trick: it works by copying r-values. It can thus be abused like this:

  int* i = union_cast( &f );

Which will just yield the original error while optimizing, since it
copies the r-value of the pointer, but not that of the value itself. I
think I saw this usage on the thread as well, stay away from it --
pain and misery will ensue.
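
For comparison, here is a sketch of a value-copying cast written with
memcpy instead of a union. This is a swapped-in technique, not the
bit_cast or union_cast from the thread, and copy_bits is a made-up
name. It copies the bytes of the value through the char loophole, so
the value case is fine, but it has the same weak spot when you hand it
a pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the object representation of `from` into a
// fresh object of type To. Both types are assumed trivially copyable.
template <typename To, typename From>
To copy_bits(const From& from) {
    static_assert(sizeof(To) == sizeof(From),
                  "copy_bits needs equally sized types");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Misuse, same as with union_cast: this copies the pointer, not the float.
// Dereferencing the result would still read a float through an int lvalue.
//   int* still_bad = copy_bits<int*>(&some_float);

// Small usage demo: copy the value, never the pointer.
inline void copy_bits_demo() {
    const float f = 1.0f;
    // Assumes a 32-bit IEEE 754 float.
    assert(copy_bits<std::uint32_t>(f) == 0x3F800000u);
    // Round trip through the integer gives back the same float value.
    assert(copy_bits<float>(copy_bits<std::uint32_t>(2.5f)) == 2.5f);
}
```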

Now some people question the need to actually bother with stuff like
this, and continue on casting like we programmed in C89. They will get
bitten by the decent compilers that do perform the most basic
optimizations. Their code will be very slow on in-order processors,
since the compiler has no chance to figure out aliasing. Aliasing btw
is the big reason why FORTRAN is 2x as fast as regular C89 in most of
the cases. Aliasing is a large reason why C99 came about and why C++
has the draconian rule 3.10.15. There is a *reason* why this all
matters, and that is speed.
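
A minimal sketch of what the rule buys the optimizer (the function
names are hypothetical). In the first function the two pointers have
unrelated types, so the compiler may assume they never overlap and can
return the constant 1 without reloading. In the second, both are int*,
they may point at the same object, and the reload has to stay.

```cpp
#include <cassert>

// int* and float* cannot legally refer to the same object, so the store
// through f cannot change *i. The compiler may fold the return to 1.
inline int store_int_then_float(int* i, float* f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

// Two int* may alias, so *a has to be read again after the store to *b.
inline int store_two_ints(int* a, int* b) {
    *a = 1;
    *b = 2;
    return *a;
}

// Small usage demo using only well-defined calls.
inline void aliasing_demo() {
    int i = 0;
    float f = 0.0f;
    assert(store_int_then_float(&i, &f) == 1);

    int x = 0;
    int y = 0;
    assert(store_two_ints(&x, &y) == 1);  // distinct objects
    assert(store_two_ints(&x, &x) == 2);  // same object, legal aliasing
}
```

Calling the first function with a float* that was cast from the int*
is exactly the kind of code that breaks once the optimizer starts
relying on the rule.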

Why would we bother with the whole union-type punning illegalities?
Because it breaks the standard! That's why we are in this whole mess
to start with, remember we broke the standard by casting float* to
int* and then using it? The proposed solution was a bug in the
compiler that could be exploited by using unions. It just seems
incredibly shortsighted to me, to go and break the same rule again and
then stick the head in the sand pretending that it's not a problem.
It's a fundamental problem with how we write code if we can not follow
something that's in the basics of the language (oh, god don't bring
templates into this if we can't even handle the basics like using
r-values and l-values).

Note that TBAA (Type Based Alias Analysis) has been present in
gcc for a long time, it's only recently (ok, really not so recently)
that they decided to make it default in -O2, even though it would
break a lot of code. The fact that people think that they can break
the standard and be safe just because nobody will want to break their
code is just naive. Compiler vendors will "break" your code, and if
you turn around and complain I bet the answer will be "turn off
optimizations to fix it". Which I bet those people will not be willing
to do and then will turn to scramble to fix their code. Only, we are
talking about basic assumptions here. All the l-value accesses. Those
are a ton of accesses. It could potentially be a complete rewrite of a
major part of the codebase. That's what we've faced on the current
platforms and it just burned everybody badly. Why would you after this
advocate breaking the same rule again?

