JIT: So you want to be faster than an interpreter on modern CPUs

79 points, posted a day ago
by pinaraf

14 Comments

gary_0

an hour ago

> This is called branch prediction, it has been the source of many fun security issues...

No, that's speculative execution you just described. Branch prediction was implemented long before out-of-order CPUs were a thing, as you need branch prediction to make the most of pipelining (eg. fetching and decoding a new instruction while you're still executing the previous one--if you predict branches, you're more likely to keep the pipeline full).

Arnavion

11 minutes ago

Speculative execution does not require out-of-order execution. When you predict a branch, you're speculatively executing the predicted branch. Whether you're doing it in the same order as instruction order or out of order is independent of that.

klipklop

3 hours ago

A shame operating systems like iOS/iPadOS do not allow JIT. iPad Pros have such fast CPUs that you can't even use them fully because of decisions like this.

Pulcinella

an hour ago

Those operating systems allow it, but Apple does not. Agree that it is a total waste.

duped

24 minutes ago

What advantage does JIT compilation have over Swift or Obj-C?

saagarjha

10 minutes ago

It speeds up interpreted languages.

Pulcinella

6 minutes ago

And emulation.

saagarjha

3 minutes ago

What is an architecture but a scripting language to interpret? ;)

stmw

3 hours ago

Good read. But a word of caution - the "JIT vs interpreter" comparisons often favor the interpreter when the JIT is implemented as more-or-less simple inlining of the interpreter code (here called "copy-and-patch", but a decades-old approach). I've had fairly senior engineers try to convince me that this is true even for Java VMs. It's not in general, at least not with the right kind of JIT compiler design.

hoten

2 hours ago

I just recently upgraded[1] a JIT that essentially compiled each bytecode instruction separately to one that shares registers within the same basic block. Easy 40 percent improvement to runtime, as expected.

But something I hadn't expected was it also improved compilation time by 40 percent too (fewer virtual registers made for much faster register allocation).

[1] https://github.com/ZQuestClassic/ZQuestClassic/commit/68087d...

_cogg

2 hours ago

Yeah, I expect the real advantage of a JIT is that you can perform proper register allocation and avoid a lot of stack and/or virtual register manipulation.

I wrote a toy copy-patch JIT before and I don't remember being impressed with the performance, even compared to a naive dispatch loop, even on my ~11 year old processor.

gr4vityWall

4 hours ago

That was a pretty interesting read.

My take is that you can get pretty far these days with a simple bytecode interpreter. Food for thought if your side project could benefit from a DSL!

imtringued

2 hours ago

I'm not really interested in building an interpreter, but the part about scalar out of order execution got me thinking. The opcode sequencing logic of an interpreter is inherently serial and an obvious bottleneck (step++; goto step->label; requires an add, then a fetch and then a jump, pretty ugly).

Why not do the same thing the CPU does and fetch N jump addresses at once?

Now the overhead is gone and you just need to figure out how to let the CPU fetch the chain of instructions that implement the opcodes.

You simply copy the interpreter N times, store N opcode jump addresses in N registers and each interpreter copy is hardcoded to access its own register during the computed goto.

saagarjha

8 minutes ago

You run into the same problem a CPU does: if you have dependencies between the instructions, you can't execute ahead of time. Your processor has a bunch of hardware to efficiently resolve conflicts but your interpreter does not.