From: Vladimir Tzankov <vtzankov <at> gmail.com>
Subject: Re: CLISP multithreading (patch included)
Newsgroups: gmane.lisp.clisp.devel
Date: 2008-07-28 11:33:40 GMT
Hi Sam,

I have implemented the same thing with TLS (__thread or
xthread_key_get(), whichever is available).
Linked is another patch for it that can be applied against the CVS. It
is not compatible with the previous one: I have removed all of
THREAD_SP_SHIFT, sp_to_thread(), etc.
http://code.brumbar.com/clisp-mt-tls-20080728.patch

This time the code is much cleaner. The patch is big because I removed
per_thread from all global variables: now there is a single variable,
_current_thread, which is per_thread (this might be useful for
embedding). The LISP stack for new threads is allocated by
malloc(), which may not be ideal, but I did not want to mess with
memory mappings.
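
To illustrate the layout described above, here is a minimal sketch
(not taken from the patch). It assumes GCC/Clang-style __thread behind
a per_thread macro; the struct fields and the helper names
create_thread_state(), destroy_thread_state() and thread_main() are
hypothetical. It shows one per-thread pointer, with each new thread's
LISP stack obtained from malloc():

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumption: the compiler supports __thread (GCC/Clang). */
#define per_thread __thread

/* Hypothetical stand-in for the real per-thread structure. */
typedef struct clisp_thread {
  void **lisp_stack;       /* LISP stack for this thread */
  size_t lisp_stack_size;  /* number of slots in lisp_stack */
} clisp_thread_t;

/* The single per-thread variable; everything else hangs off it. */
static per_thread clisp_thread_t *_current_thread;

#define current_thread() (_current_thread)

/* Allocate the thread structure and its LISP stack with malloc(). */
static clisp_thread_t *create_thread_state(size_t slots) {
  clisp_thread_t *thr = malloc(sizeof *thr);
  if (thr == NULL) return NULL;
  thr->lisp_stack = malloc(slots * sizeof *thr->lisp_stack);
  if (thr->lisp_stack == NULL) {
    free(thr);
    return NULL;
  }
  thr->lisp_stack_size = slots;
  return thr;
}

static void destroy_thread_state(clisp_thread_t *thr) {
  free(thr->lisp_stack);
  free(thr);
}

/* Thread entry point: publish our structure, then read it back. */
static void *thread_main(void *arg) {
  assert(arg != NULL);
  _current_thread = (clisp_thread_t *)arg;
  return current_thread();
}
```

A caller would build the state with create_thread_state(), pass it to
pthread_create() with thread_main as the entry point, and release it
with destroy_thread_state() after pthread_join().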

With this patch a Win32 build (native threads) should be almost
straightforward - only xthread_cancel() still has to be
implemented.

I have run the standard benchmarks; here is a link to the results:
http://code.brumbar.com/clisp-mt-bench.txt

This is the summary:

(32 bit Debian x86 2.6.18-4)
TLS-THREAD-LINUX-X86 (with __thread )
total             23.34946 sec    23.34946 scaled
TLS-THREAD-LINUX-X86 (TLS via xthread_key_get /pthread_getspecific/)
total             62.44790 sec    62.44790 scaled
SP-THREAD-LINUX-X86 (stack pointer tweaking)
total             22.11738 sec    22.11738 scaled
NO-THREADS-LINUX-X86 (CVS HEAD version)
total             21.14132 sec    21.14132 scaled

The implementation with pthread_getspecific is almost 3 times slower
than the others.
There is no big difference between the other builds - the
single-threaded CVS build has a small advantage.

OSX PPC Darwin Kernel Version 8.11.0
TLS-THREADS-OSX-PPC (TLS via xthread_key_get /pthread_getspecific/)
total            333.30949 sec   333.30949 scaled
SP-THREADS-OSX-PPC (TLS via stack pointer tweaking)
total             57.13520 sec    57.13520 scaled
NO-THREADS-OSX-PPC (CVS HEAD version)
total             61.29787 sec    61.29787 scaled

Here the build with TLS (via pthread_getspecific) is very slow
- almost 6 times slower than the other two.
The SP tweaking build, however, has a small advantage over the
single-threaded CVS build!

Some more information about the difference between __thread and
pthread_getspecific can be found here:
http://blogs.sun.com/seongbae/date/20051216

So it seems reasonable to use TLS when the compiler provides
built-in support for it - the code is straightforward and performance
is good. In all other cases the SP tweaking gives much better
performance.
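
For readers who have not seen the SP approach, the sketch below shows
one way such a scheme can work. It is an illustration only and not
necessarily how the earlier patch does it. Assumptions: each thread
runs on a stack region that is aligned to its own power-of-two size,
the thread structure sits at the lowest address of that region, and
the compiler provides __builtin_frame_address() (GCC/Clang). The value
of THREAD_SP_SHIFT, the struct layout and the helper
run_on_aligned_stack() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed value: every stack region is 1 MiB, aligned to 1 MiB. */
#define THREAD_SP_SHIFT 20
#define STACK_REGION ((size_t)1 << THREAD_SP_SHIFT)

/* Hypothetical stand-in for the real per-thread structure. */
typedef struct clisp_thread {
  int id;
} clisp_thread_t;

/* Mask off the low bits of a stack address to reach the region base. */
static clisp_thread_t *sp_to_thread(const void *sp) {
  return (clisp_thread_t *)((uintptr_t)sp & ~(uintptr_t)(STACK_REGION - 1));
}

/* The current frame address lies inside the running thread's stack. */
static clisp_thread_t *current_thread(void) {
  return sp_to_thread(__builtin_frame_address(0));
}

static void *worker(void *arg) {
  (void)arg;
  return current_thread();
}

/* Run worker() on an aligned stack; return the id it found, or -1. */
static int run_on_aligned_stack(int id) {
  void *region = aligned_alloc(STACK_REGION, STACK_REGION);
  if (region == NULL) return -1;
  ((clisp_thread_t *)region)->id = id;  /* structure at region base */

  pthread_attr_t attr;
  pthread_t t;
  void *found = NULL;
  int result = -1;
  if (pthread_attr_init(&attr) != 0) {
    free(region);
    return -1;
  }
  if (pthread_attr_setstack(&attr, region, STACK_REGION) == 0 &&
      pthread_create(&t, &attr, worker, NULL) == 0 &&
      pthread_join(t, &found) == 0 && found == region) {
    result = ((clisp_thread_t *)found)->id;
  }
  pthread_attr_destroy(&attr);
  free(region);
  return result;
}
```

The lookup is a single mask on an address the thread already has, with
no function call, which is one plausible reason this approach keeps up
with __thread in the numbers above.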

BR
  Vladimir

>> I would prefer a TLS (Thread-Local Storage - __thread / per_thread)
>> approach because it should be cheaper.
>> we can keep both though - on the platforms with TLS, declare
>> clisp_thread_t* current_thread not a function but a per_thread variable.
>
> Basically every multithreading environment I know provides a mechanism
> for TLS; however, not all compilers have __thread (__declspec(thread))
> support. The Apple fork of gcc does not (probably others as well).
>
> You are right - it is possible to redefine current_thread() depending
> on this. For example on osx (and platforms without compiler TLS
> support) it can be something like:
> #define current_thread() \
>   ((clisp_thread_t *)xthread_key_get(cur_thr_key))
>
> This will remove the ugliness of manually switching the stack pointer
> (plus possible unexpected consequences of this) and is supposed to be
> quite portable. If it is fine (I do not see a reason for it not to be),
> there will be no reason to keep the sp_to_thread() stuff (unless
> somebody runs it on an MT platform that does not provide TLS - is there
> such a platform?).
>
>