stmw
4 months ago
Good read. But a word of caution - "JIT vs interpreter" comparisons often favor the interpreter when the JIT is implemented as more-or-less simple inlining of the interpreter code (here called "copy-and-patch", but a decades-old approach). I've had fairly senior engineers try to convince me that this holds even for Java VMs. It doesn't in general, at least not with the right kind of JIT compiler design.
hoten
4 months ago
I just recently upgraded[1] a JIT that essentially compiled each bytecode separately to one that shares registers within the same basic block. Easy 40 percent improvement to runtime, as expected.
But something I hadn't expected was it also improved compilation time by 40 percent too (fewer virtual registers made for much faster register allocation).
[1] https://github.com/ZQuestClassic/ZQuestClassic/commit/68087d...
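A minimal sketch of the difference (toy stack bytecode, not the actual ZQuestClassic code): compiling each bytecode in isolation forces every value through memory slots and fresh virtual registers, while tracking the stack at compile time within a basic block lets values flow register-to-register, so the allocator sees far fewer virtual registers.

```python
def compile_naive(ops):
    # Each bytecode compiled separately: operands are loaded from
    # stack slots into fresh virtual registers and stored back.
    code, vregs, sp = [], 0, 0
    def fresh():
        nonlocal vregs
        vregs += 1
        return f"v{vregs}"
    for op, *args in ops:
        if op == "push":
            r = fresh()
            code += [f"{r} = {args[0]}", f"slot[{sp}] = {r}"]
            sp += 1
        elif op == "add":
            a, b, r = fresh(), fresh(), fresh()
            code += [f"{a} = slot[{sp-2}]", f"{b} = slot[{sp-1}]",
                     f"{r} = {a} + {b}"]
            sp -= 1
            code.append(f"slot[{sp-1}] = {r}")
    return code, vregs

def compile_block(ops):
    # Block-level compilation: a compile-time stack of registers
    # replaces the memory slots, so no loads/stores are emitted.
    code, vregs, stack = [], 0, []
    def fresh():
        nonlocal vregs
        vregs += 1
        return f"v{vregs}"
    for op, *args in ops:
        if op == "push":
            r = fresh()
            code.append(f"{r} = {args[0]}")
            stack.append(r)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            r = fresh()
            code.append(f"{r} = {a} + {b}")
            stack.append(r)
    return code, vregs

prog = [("push", 1), ("push", 2), ("add",), ("push", 3), ("add",)]
```

For this five-op program the naive compiler creates 9 virtual registers and the block-level one creates 5; on real programs the gap is what shrinks register-allocation time.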
chromatic
4 months ago
This is an embarrassing context to admit, but here goes.
Back when Parrot was a thing and the Perl 6 people were targeting it, I profiled the prelude of Perl 6 to optimize startup time and discovered two things:
- the first basic block of the prelude was thousands of instructions long (not surprising)
- the compiler had to allocate thousands of registers because the prelude instructions used virtual registers
The prelude emitted two instructions, one right after another: load a named symbol from a library, then make it available. I forget all of the details, but each of those instructions used one string register and one PMC register. Because register allocation used the dominance frontier method, the size of the basic block and the total number of symbolic registers dominated the algorithm's running time.
I suggested a change to the prelude emitter to reuse actual registers and avoid virtual registers and compilation sped up quite a bit.
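A hypothetical sketch of that fix (names assumed, not Parrot's real emitter): emitting a fresh string/PMC register pair per symbol hands the allocator thousands of symbolic registers, while reusing one fixed pair keeps the count constant.

```python
def emit_fresh(symbols):
    # Fresh virtual registers for every load/export pair.
    code = []
    for i, sym in enumerate(symbols):
        s, p = f"S{i}", f"P{i}"
        code.append(f"{s} = load_symbol '{sym}'")
        code.append(f"{p} = make_available {s}")
    return code

def emit_reused(symbols):
    # The same two actual registers, reused for every pair.
    code = []
    for sym in symbols:
        code.append(f"S0 = load_symbol '{sym}'")
        code.append(f"P0 = make_available S0")
    return code

def register_count(code):
    # Count distinct S*/P* registers the allocator would see.
    return len({tok for line in code for tok in line.split()
                if tok[0] in "SP" and tok[1:].isdigit()})
```

With N symbols, `emit_fresh` produces 2N distinct registers and `emit_reused` produces 2, which is the difference that mattered for a dominance-frontier allocator whose cost grows with block size times register count.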
_cogg
4 months ago
Yeah, I expect the real advantage of a JIT is that you can perform proper register allocation and avoid a lot of stack and/or virtual register manipulation.
I wrote a toy copy-patch JIT before and I don't remember being impressed with the performance, even compared to a naive dispatch loop, even on my ~11 year old processor.
ack_complete
4 months ago
The difference between interpreters and simple JITs has narrowed partly due to two factors: better indirect branch predictors with global history, and wider execution bandwidth to absorb the additional dispatch instructions. Intel CPUs starting with Haswell, for instance, show less branch misprediction impact due to better ability to predict jump path patterns through the interpreter. A basic jump table no longer suffers as much compared to tail-calling/dispatch or a simple splicing JIT.
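The two dispatch shapes being compared can be sketched as follows (toy bytecode of my own, not from the comment). A jump-table interpreter funnels every opcode through one dispatch site, while threaded dispatch ends each handler with its own jump to the next handler; in C the latter is computed goto or tail calls, and Python here only models the control flow, not the branch-predictor behavior.

```python
PUSH, ADD, MUL, HALT = range(4)

def run_loop(code):
    # One shared dispatch site: every opcode decoded at loop top.
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:
            stack.append(code[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == HALT:
            return stack.pop()

def run_threaded(code):
    # Each handler performs its own dispatch to the next handler,
    # so every opcode ends in a distinct indirect jump (in C).
    stack = []
    def dispatch(pc):
        return handlers[code[pc]], pc + 1
    def op_push(pc):
        stack.append(code[pc])
        return dispatch(pc + 1)
    def op_add(pc):
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
        return dispatch(pc)
    def op_mul(pc):
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)
        return dispatch(pc)
    def op_halt(pc):
        return None, pc
    handlers = [op_push, op_add, op_mul, op_halt]
    h, pc = dispatch(0)
    while h is not None:
        h, pc = h(pc)
    return stack.pop()

prog = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]  # (2 + 3) * 4
```

The point above is that modern predictors with global history handle the single shared branch in `run_loop` far better than older cores did, which is why the threaded layout buys less than it used to.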
stmw
4 months ago
Exactly, and it's not just register allocation: for many languages it's also adding proper typing, some basic data flow optimization, some constant folding, and a few other things that can be done fairly quickly, without the full set of trees and progressive lowering of the operators down to instructions.
What's odd about the "JIT vs interpreter" debate is that it keeps coming up, given that the difference is fairly easy to see even in toy examples.
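One of the cheap passes mentioned above can be sketched in a few lines (toy stack ops of my own, not from the comment): constant folding over a linear instruction list, with no trees or progressive lowering involved.

```python
def fold_constants(code):
    # Peephole fold: when an add/mul's two operands on the virtual
    # stack are both constants, replace all three ops with one const.
    out = []
    for op in code:
        if op[0] in ("add", "mul") and len(out) >= 2 \
                and out[-1][0] == out[-2][0] == "const":
            b, a = out.pop()[1], out.pop()[1]
            out.append(("const", a + b if op[0] == "add" else a * b))
        else:
            out.append(op)  # not foldable, keep as-is
    return out

prog = [("const", 2), ("const", 3), ("add",), ("const", 4), ("mul",)]
# fold_constants(prog) collapses the whole expression to [("const", 20)]
```

A single forward pass like this is linear in the number of instructions, which is why it fits the "fast to do at JIT time" category.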
stmw
4 months ago
Turns out one of the classic papers on this is available for those interested in this discussion - https://news.ycombinator.com/item?id=45582127