From: Mike Pall <mikelu-0802 <at> mike.de>
Subject: Re: LuaJIT roadmap 2008
Newsgroups: gmane.comp.lang.lua.general
Date: Saturday 2nd February 2008 02:48:44 UTC
alex.mania@iinet.net.au wrote:
> Love the idea of 8 byte values.. Ingenious using NaNs to encode

Excerpt from the docs:

| These internal tags overlap the MSW of a number object (must be a
| double). Interpreted as a number these are special NaNs. The FPU only
| generates one type of NaN (0xfff8_0000_0000_0000). So MSWs > 0xfff80000
| are available for use as internal tags. Small negative numbers are used
| to shorten the encoding of type comparisons (reg/mem against sign-ext.
| 8 bit immediate).
|                  MSW    LSW
| primitive types  itype  (undefined)
| lightuserdata    itype  void *
| collectables     itype  GCObject *
| number           ----double----

I.e. nil is -1, false is -2, true is -3 and so on. Checking for
nil or false in conditions is as simple as doing (itype >= -2)
with an unsigned comparison. Another nice side-effect is that
storing an FP number implicitly sets the type at the same time.
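
To make the encoding concrete, here is a minimal C sketch of the
scheme described above. It is not LuaJIT's actual code: the names
(TaggedValue, tv_*, ITYPE_*) are made up for illustration, and only
the three tag values named in this post are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag names; the values follow the post (small negatives). */
#define ITYPE_NIL    ((uint32_t)-1)  /* 0xffffffff */
#define ITYPE_FALSE  ((uint32_t)-2)  /* 0xfffffffe */
#define ITYPE_TRUE   ((uint32_t)-3)  /* 0xfffffffd */

/* One 8 byte slot: either a double, or an internal tag in the MSW. */
typedef struct { uint64_t bits; } TaggedValue;

/* Upper 32 bits: the internal tag, or the top half of a double. */
static uint32_t tv_msw(TaggedValue v) { return (uint32_t)(v.bits >> 32); }

/* Primitive types: tag in the MSW; the LSW is undefined (zero here). */
static TaggedValue tv_from_tag(uint32_t itype) {
    TaggedValue v = { (uint64_t)itype << 32 };
    return v;
}

/* Storing a number copies all 64 bits, which also sets the "type". */
static TaggedValue tv_from_number(double d) {
    TaggedValue v;
    assert(sizeof d == sizeof v.bits);
    memcpy(&v.bits, &d, sizeof d);
    return v;
}

/* Anything up to the FPU's own NaN (MSW 0xfff80000) is a number. */
static int tv_is_number(TaggedValue v) { return tv_msw(v) <= 0xfff80000u; }

/* Unsigned (itype >= -2): true for nil and false, nothing else. */
static int tv_is_falsy(TaggedValue v) { return tv_msw(v) >= ITYPE_FALSE; }
```

Because the tags sit at the very top of the unsigned range, one
compare against a small immediate covers both nil and false, and
every ordinary double (including negative ones) falls below them.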

[Security notice: The few code pieces, where an arbitrary bit
pattern can be injected in place of a number, normalize these to
a standard NaN.]
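
A rough C sketch of such a normalization step follows. The function
names are hypothetical, and where exactly raw bits can enter (say,
reading 8 bytes from untrusted binary data) is an assumption on my
part, not something stated above.

```c
#include <stdint.h>

/* The one NaN the FPU generates, per the docs excerpt. */
#define CANONICAL_NAN_BITS 0xfff8000000000000ull

/* IEEE 754 double: exponent all ones and a non-zero mantissa is a NaN. */
static int bits_are_nan(uint64_t raw) {
    return (raw & 0x7ff0000000000000ull) == 0x7ff0000000000000ull &&
           (raw & 0x000fffffffffffffull) != 0;
}

/* Hypothetical helper: sanitize raw bits before storing them as a number,
 * so that a forged bit pattern can never be mistaken for an internal tag. */
static uint64_t normalize_number_bits(uint64_t raw) {
    return bits_are_nan(raw) ? CANONICAL_NAN_BITS : raw;
}
```

With this in place a pattern such as 0xffffffff_00000000 (which
would otherwise read as nil) ends up as the standard NaN, while
ordinary numbers and infinities pass through unchanged.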

I actually had this idea back then in mid 2006 and prototyped it
for plain Lua. It doesn't help a lot there. Any performance
advantage is lost in the pipeline noise amongst all of the BTB
misses due to the switch-based instruction dispatch.

A half-baked port to LuaJIT 1.x showed some speedups on most
cache-sensitive benchmarks, but also some bad slowdowns on
numeric benchmarks. This is caused by store-to-load forwarding
stalls due to the many type checks which load the 32 bit MSW
right after a 64 bit LSW/MSW store. This was another point which
made me think about a better design for LuaJIT.
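
The access pattern in question looks roughly like the C sketch
below. It only shows the shape of the problem and measures nothing;
the Slot type and function names are hypothetical, and the word
order assumes a little-endian machine such as x86.

```c
#include <stdint.h>

/* One 8 byte stack slot, viewed either as a double or as two words. */
typedef union {
    double n;
    struct { uint32_t lo, hi; } w;  /* little-endian layout assumed */
} Slot;

/* Writing a number is a single 64 bit store covering LSW and MSW. */
static void slot_store_number(Slot *s, double x) { s->n = x; }

/* The type check is a 32 bit load of just the upper half. If it runs
 * right after the 64 bit store above, the load wants only part of a
 * store that may still be in flight, which is the case the paragraph
 * above describes as stalling. */
static int slot_is_number(const Slot *s) { return s->w.hi <= 0xfff80000u; }
```

Numeric code does exactly this over and over: store a result, then
type-check it as the operand of the next instruction.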

Alas, I don't think this idea is original. Pointer tagging has
been used in the earliest Lisp and Smalltalk engines. But I
haven't seen it used in conjunction with FP numbers -- someone
must have invented it before.

Interestingly a variant of it appears in the recently published
"Tamarin Tracing" branch of Adobe's Flash VM (purportedly the
next-gen JavaScript engine for the Mozilla project). [Nope, I've
had no communication with its authors].

> On Sat Feb  2  2:09 , Mike Pall  sent:
> >- Many of the standard library functions are now "internal"
> >  functions. The fast paths are written in assembler, the
> >  fallback paths are written in C. Apart from dramatic speedups
> >  this also enables the stackless nature of coroutines.
> This is similar to current luajit correct? Just for more
> library functions. You don't mean that many standard library
> functions now can't be changed at runtime? (hopes)

You can change all of them, if you want. Doing silly things like
  math.sin = math.random
is perfectly ok. Go and surprise your co-workers. ;-)

Lua's dynamic semantics are fully preserved. Internal functions
have special call gates and are tagged with an ID. No matter
where you store them, how you name them or how often they are
passed around -- they run correctly in the interpreter and will
be correctly identified and translated to ucode in the recording
phase.
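
As a toy illustration in C (not the real data structures; Func,
MathTable and the FF_* IDs are invented here): the ID lives on the
function object itself, so it follows the object into whatever slot
or variable it gets stored in.

```c
#include <stddef.h>

/* Hypothetical IDs for internal functions; 0 means "not internal". */
enum { FF_NONE = 0, FF_MATH_SIN, FF_MATH_RANDOM };

/* A function object carries its own ID, independent of any name. */
typedef struct {
    int ffid;
} Func;

/* A toy "math" table: named slots that merely point at function objects. */
typedef struct {
    const Func *sin_slot;
    const Func *random_slot;
} MathTable;

static const Func builtin_sin    = { FF_MATH_SIN };
static const Func builtin_random = { FF_MATH_RANDOM };

/* What a recorder would consult: the ID on the object, not the slot
 * name the object happened to be fetched from. */
static int recorder_identify(const Func *f) {
    return f != NULL ? f->ffid : FF_NONE;
}
```

After the equivalent of math.sin = math.random, both slots point at
the same object, and a call through either one is identified as
FF_MATH_RANDOM.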

> And out of curiosity, is some version of DynASM still used for
> the final compilation phase or has that too needed a redesign?

I'm using DynASM for building the assembler parts of the VM
(mainly the interpreter) in an extra build step. Yep, this
involves generating various object formats (yuck) so this can be
linked to the executable. Not really dynamic, though.

> (I imagine by the sounds of it you'd need some way of changing
> the registers used at compile time, something I don't believe
> DynASM allows?)

Yes, you're right, this is a weakness of DynASM. Since LuaJIT 1.x
had static register assignments, this was just applying the YAGNI
principle.

But actually this is only part of the problem. The tricky thing
is that I'm doing bottom-up code generation in the trace
compiler. I.e. encode the last ucode first and work upwards to
the top of the trace. This simplifies linear-scan register
allocation on an SSA stream and helps with instruction selection
and dead-code elimination. [A side-effect-free ucode, which has
not been assigned a register before encoding it, is dead -- think
about it ...]
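
For those who would rather read it than think about it, here is a
toy C sketch of the dead-code part. Everything in it is made up for
illustration (the Ins layout, the opcodes, the trivial allocator
that never frees or spills registers); it only shows why walking
backwards makes dead code fall out for free.

```c
#include <stddef.h>

enum { OP_CONST, OP_ADD, OP_MUL, OP_STORE };  /* only OP_STORE has a side effect */
#define NO_REG (-1)
#define NO_REF (-1)

typedef struct {
    int op;
    int a, b;     /* operands: indices of earlier instructions, or NO_REF */
    int reg;      /* NO_REG until a later instruction asks for the result */
    int emitted;  /* set to 1 if machine code would be generated */
} Ins;

static int has_side_effect(const Ins *ins) { return ins->op == OP_STORE; }

/* Toy allocator: give an operand a fresh register if it has none yet. */
static void want_reg(Ins *trace, int ref, int *next_reg) {
    if (ref != NO_REF && trace[ref].reg == NO_REG)
        trace[ref].reg = (*next_reg)++;
}

/* Walk from the last instruction to the first. Returns the emitted count. */
static int codegen_bottom_up(Ins *trace, int n) {
    int next_reg = 0, emitted = 0;
    for (int i = n - 1; i >= 0; i--) {
        Ins *ins = &trace[i];
        /* All users come later in the stream and were visited already.
         * No register and no side effect means nobody needs the result. */
        if (ins->reg == NO_REG && !has_side_effect(ins)) {
            ins->emitted = 0;
            continue;  /* dead: its operands are never requested either */
        }
        want_reg(trace, ins->a, &next_reg);
        want_reg(trace, ins->b, &next_reg);
        ins->emitted = 1;
        emitted++;
    }
    return emitted;
}
```

Since a dead instruction never requests registers for its operands,
whole chains of unused computations disappear without a separate
elimination pass.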

Right now the ugly piece of code passing off as the x86 backend
employs a mess of C macros. I have some ideas on how to bring
back the comfort of DynASM, but work on other parts has priority.
