I’ve been working a bit more on my Emacs JIT, in particular on improving function calling. This has been a fun project so I thought I’d talk about it a bit.
Under the hood, the Emacs Lisp implementation has a few different ways to call functions. Calls to or from Lisp are dispatched depending on what is being called:
- For an interpreted function, the arguments are bound and then the interpreter is called;
- For a byte-compiled function using dynamic binding, the arguments are bound and then the bytecode interpreter is called;
- For a byte-compiled function using lexical binding, an array of arguments is passed to the bytecode interpreter;
- For a function implemented in C (called a “
subr” internally), up to 8 arguments are supported directly — as in, C functions of the form
f(arg,arg,...); for more than that, an array of arguments is passed and the function itself must decide which slot means what. That is, there are exactly 10 forms of
subr (actually there are 11 but the one missing from this description is used for special forms, which we don’t need to think about here).
Oh, let’s just show the definition so you can read for yourself:
Lisp_Object (*a0) (void);
Lisp_Object (*a1) (Lisp_Object);
Lisp_Object (*a2) (Lisp_Object, Lisp_Object);
Lisp_Object (*a3) (Lisp_Object, Lisp_Object, Lisp_Object);
Lisp_Object (*a4) (Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object);
Lisp_Object (*a5) (Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object);
Lisp_Object (*a6) (Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object);
Lisp_Object (*a7) (Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object);
Lisp_Object (*a8) (Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object);
Lisp_Object (*aUNEVALLED) (Lisp_Object args);
Lisp_Object (*aMANY) (ptrdiff_t, Lisp_Object *);
Initially the JIT worked like a lexically-bound bytecode function: an array of arguments was passed to the JIT-compiled function. The JIT compiler emitted a bunch of code to decode the arguments.
For Lisp functions taking a fixed number of arguments, this wasn’t too bad — just moving values from fixed slots in the array to fixed values in the IR.
Handling optional arguments was a bit uglier, involving a series of checks and branches, so that un-bound arguments could correctly be set to nil. These were done something like:
if nargs < 1 goto nope1
arg0 = array
if nargs < 2 goto nope2
arg1 = array
nope1: arg0 = nil
nope2: arg1 = nil
&rest arguments were even a bit worse, requiring a call to create a list. (This, I think, can’t be avoided without a much smarter compiler, one that would notice when reifying the list could be avoided.)
Note that calling also has to use the fully generic approach: we make a temporary array of arguments, then call a C function (
Ffuncall) that does the dispatching to the callee. This is also a source of inefficiency.
Recently, I changed the JIT from this approach to use the equivalent of the subr calling convention. Now, any function with 8 or fewer (non-
&rest) arguments is simply an ordinary function of N arguments, and we let the already-existing C code deal with optional arguments.
Although this often makes the generated assembly simpler, it won’t actually perform any better — the same work is still being done, just somewhere else. However, this approach does use a bit less memory (most JIT-compiled functions are shorter); and it opens the door to an even bigger improvement.
What I’m implementing now is an approach to removing most of the overhead from JIT-compiled function calls.
Now, ideally what I’d like is to have every call site work “like C”: move the arguments to exactly where the callee expects them to be, and then call. However, while looking at this I found some problems that make it tricky:
- We still need to be able to call Lisp functions from C, so we’re limited to, at best,
subr-style calling conventions;
&rest arguments are straightforward (in our simple compiler, somebody just has to make the list);
&optional arguments don’t have a good C-like analog. The callee could push extra arguments, but…
- In Lisp, a function can be redefined at any time, and it is fine to change the function’s signature.
Consider this example:
(defun callee (x &optional y) (list x y))
(defun caller (callee 23))
(defun callee (x) (list x))
Now, if we compiled
caller with a direct call, it would turn out like
(callee 23 nil). But then, after the redefinition, we’d have to recompile
caller. Note this can go the other way as well — we could redefine
callee to have more optional arguments, or even more fixed arguments (meaning that the call should now throw an exception).
Recompiling isn’t such a big deal, right? The compiler is set up very naively: it just compiles every function that is invoked, and in this mode “recompilation” is equivalent to “just abandon the compiled code”.
Except… what do you do if
caller is being run when
callee is redefined? Whoops!
Actually, of course, this is a known issue in JIT compilation, and one possible solution is “on-stack replacement” (“OSR”) — recompiling a function while it is running.
This to me seemed like a lot of bookkeeping, though: keeping a list of which functions to compile when some function was redefined, and figuring out a decent way to implement OSR.
Instead I came up a with a simpler approach, involving — you guessed it — indirection.
On the callee side, I am going to keep the
subr calling convention that is in place today. This isn’t ideal in all cases, but it is reasonable for a lot of code. Instead, all the changes will take place at spots where the JIT emits a call.
I am planning to have three kinds of function calls in the JIT:
- Indirect. If we see some code where we can’t determine the callee, we’ll emit a call via
Ffuncall like we do today.
- Fully direct. There are some functions that are implemented in C, and that I think are unreasonable to redefine. For these, we’ll just call the C function directly. Another fully-direct case is where the code dispatches to a byte-code vector coming from the function’s constant pool — here, there’s no possibility to redefine the function, so we can simply always call the JIT-compiled form.
- Semi-direct. This will be the convention used when JIT-compiled code calls via a symbol.
The core idea of a semi-direct call is to have multiple possible implementations of a function:
- One “true” implementation. If the function has 8 or fewer arguments (of any kind), it will simply have that many arguments. The JIT will simply pretend that an optional argument is fixed. If it has more than 8 arguments, following the
subr convention it will just accept an array of arguments.
- If the function has optional or rest arguments, there will be trampoline implementations with fewer arguments, that simply supply the required number of additional arguments and then call the true implementation.
- Remember how there are exactly 10 relevant kinds of
subr? Any forms not covered by the above can simply throw an exception.
A vector of function pointers will be attached to each symbol, and so the JIT-compiled code can simply load the function pointer from the appropriate slot (a single load — the nice thing about a JIT is we can simply hard-code the correct address).
Then, when a function is redefined, we simply define any of the trampolines that are required as well. We won’t even need to define all of them — only the ones that some actually-existing call site has needed.
Of course, no project like this is complete without a rathole, which is why instead of doing this I’m actually working on writing a compiler pre-pass so that the compiler itself can have the appropriate information about the callee at the point of call. This sub-project turns out to feel a lot like writing a Java bytecode verifier…
Currently the JIT is only used for lexically-bound bytecode functions. That’s a reasonable restriction, I think — so one thing we should do is make sure that more of the Emacs core is using lexical binding. Currently, only about 1/3 of the Lisp files in Emacs enable this feature; but many more easily could.
Once my current project is done, the JIT will have a decent calling convention by default. Since we’ll have information about callees at points of call, I think it will be a good time to look into inlining. This will require tackling recompilation (and perhaps OSR) and having some sort of tiered optimization approach. There is still a lot for me to learn here — when does it make sense to inline? And what metrics should I use to decide when some code is hot enough to optimize? So, good times head even once the current project is done; and BTW if you have a reading list for any of this I would love to hear about it.
Once this is done, well, I have more ideas for even deeper JIT improvements. Those will have to wait for another post.