From: Mark Seaborn <mseaborn <at> chromium.org>
Subject: Re: Named register variables GNU-style, deux
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Saturday 19th April 2014 23:31:54 UTC
On 19 April 2014 15:41, Austin Seipp wrote:

> You would think that considering this variable is (p)thread local, we
> could just use a __thread variable, or pthread_{get,set}specific to
> manage. But on OS X, both of these equate to an absolutely huge
> performance loss, upwards of 25%. Which is unacceptable, realistically
> speaking, but we've had to deal with it.


In practice, pthread_getspecific() on x86-64 on Mac OS X is just a very
simple assembly routine:

  movq %gs:_PTHREAD_TSD_OFFSET(,%rdi,8),%rax
  ret

For Native Client on Mac x86-64, we check that pthread_getspecific()
contains the code above, and we inline the %gs access into NaCl's runtime
code (reading the value of _PTHREAD_TSD_OFFSET from pthread_getspecific()'s
code).
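
As a rough illustration of that check (a minimal sketch, not NaCl's actual
code), the matcher below compares a byte buffer against the byte encoding I
would expect for the two instructions above and pulls out the 32-bit
displacement.  The function name, the buffer-based interface, and the exact
byte encoding are my own assumptions; verify the bytes against a
disassembly before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed encoding of "movq %gs:disp32(,%rdi,8),%rax ; ret":
 *   65           GS segment override
 *   48           REX.W
 *   8b           MOV r64, r/m64
 *   04           ModRM: mod=00, reg=rax, rm=SIB
 *   fd           SIB: scale=8, index=rdi, no base (disp32 follows)
 *   xx xx xx xx  disp32 = _PTHREAD_TSD_OFFSET, little-endian
 *   c3           RET
 */
enum { PATTERN_LEN = 10 };

/*
 * Hypothetical helper: returns 1 and stores the displacement in *tsd_offset
 * if `code` starts with the expected routine, otherwise returns 0.
 * The displacement is treated as unsigned, which assumes it is non-negative.
 */
static int match_getspecific_pattern(const uint8_t *code, size_t len,
                                     uint32_t *tsd_offset)
{
    static const uint8_t prefix[] = { 0x65, 0x48, 0x8b, 0x04, 0xfd };

    assert(code != NULL && tsd_offset != NULL);
    if (len < PATTERN_LEN)
        return 0;
    if (memcmp(code, prefix, sizeof prefix) != 0)
        return 0;
    if (code[9] != 0xc3)
        return 0;

    /* Assemble the little-endian disp32 that follows the SIB byte. */
    *tsd_offset = ((uint32_t)code[5])
                | ((uint32_t)code[6] << 8)
                | ((uint32_t)code[7] << 16)
                | ((uint32_t)code[8] << 24);
    return 1;
}
```

Taking a buffer rather than a function pointer keeps the matcher testable
on any platform; on Mac x86-64 the caller would pass the address of
pthread_getspecific().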

You can find the code for doing that here:
https://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/trusted/service_runtime/arch/x86_64/nacl_tls_64.c?revision=11149

NaCl's reason for doing this is that NaCl needs to be able to read a
thread-local variable in a context when there's no stack available for
calling pthread_getspecific().  (We could pre-allocate a pool of stacks and
then allocate a stack from this pool with an atomic operation, then call
pthread_getspecific() on that stack.  But that's a lot more complicated,
and slower.)

This will of course break if OS X's implementation of pthread_getspecific()
changes (other than to change _PTHREAD_TSD_OFFSET).  Hopefully, if that
ever happens, OS X will have already started providing better thread-local
variables that can be accessed without calling a function, like what
Linux/ELF and Windows provide. :-)

This is hacky, but it should be completely reliable if
pthread_getspecific() matches the expected pattern, because the code for
pthread_getspecific() is not going to change underneath you while the
process is running.

You could use the same trick, and fall back to calling
pthread_getspecific() if the code it contains doesn't match the pattern you
expect.
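
A minimal sketch of that fallback, under some assumptions of mine: the
names tsd_probe() and tsd_get() are hypothetical, the byte pattern is the
encoding I would expect for the routine quoted earlier, and the inline
assembly is GCC/Clang-style and only compiled on Mac x86-64.  Everywhere
else, or if the pattern does not match, it simply calls
pthread_getspecific().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

static int g_probed = 0;       /* set once tsd_probe() has run */
static long g_tsd_offset = -1; /* -1: pattern not recognised, use slow path */

/* Hypothetical one-time probe; call it once at startup, before tsd_get(). */
static void tsd_probe(void)
{
#if defined(__APPLE__) && defined(__x86_64__)
    /* Assumed bytes: movq %gs:disp32(,%rdi,8),%rax ; ret */
    static const unsigned char prefix[] = { 0x65, 0x48, 0x8b, 0x04, 0xfd };
    const unsigned char *code =
        (const unsigned char *)(uintptr_t)&pthread_getspecific;
    uint32_t disp;

    if (memcmp(code, prefix, sizeof prefix) == 0 && code[9] == 0xc3) {
        memcpy(&disp, code + 5, sizeof disp); /* x86 is little-endian */
        g_tsd_offset = (long)disp;
    }
#endif
    g_probed = 1;
}

/* Reads the thread-specific value for `key`, inline when the probe matched. */
static void *tsd_get(pthread_key_t key)
{
    assert(g_probed);
    if (g_tsd_offset >= 0) {
#if defined(__APPLE__) && defined(__x86_64__)
        void *value;
        uintptr_t addr = (uintptr_t)g_tsd_offset + (uintptr_t)key * 8;

        /* Same load that pthread_getspecific() would perform. */
        __asm__ volatile("movq %%gs:(%1), %0" : "=r"(value) : "r"(addr));
        return value;
#endif
    }
    return pthread_getspecific(key); /* pattern mismatch or other platform */
}
```

Probing once up front, rather than lazily inside tsd_get(), keeps the read
path free of anything that might need a stack or a lock.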

Cheers,
Mark
 