Collapse OS – Why Forth?

14 points, posted 6 hours ago
by embedding-shape

7 Comments

cestith

3 hours ago

The article mentions the performance penalty of using a threaded interpreted language, but Forth compilers to native code do exist for several platforms.

vdupras

3 hours ago

There are options, yes, but the path to options that could be said to compete with, let's say, C, is narrow.

If you use Indirect or Direct Threaded Code (ITC, DTC), there's no way out: you're going to call every word. You can use Subroutine Threaded Code (STC) and inline some code (that's what I do in Dusk OS), but you still have to choose the words you're going to inline. Typically, you're going to end up with words that "call" more than your typical C code.

And then, you might very well realize that being as fast as C everywhere isn't all that important and that all the inlining you've been placing everywhere isn't worth the tradeoff, so you'll scale back on it and keep speed optimizations for bottlenecks.

So, again, yes it's possible, but the path to it is narrow. I don't know of a Forth that can claim to compile code expressed as Forth (as in ": foo bar baz ;") into native code that competes with C, speed-wise. Do you?

cestith

2 hours ago

For a modern Forth on a modern platform, not so much, because there’s been so much work done on optimizing C compilers. You can get pretty close, though.

On some of the older platforms, certain implementations were very low level. There are Forth implementations for the 6800, 6809, 6502, 8086 (CP/M, DOS, and embedded) where all the core words are precompiled and all expansions to the library get iteratively replaced with their definitions until they’re also native code. There are probably a few for the 8080 and Z80 too.

Absolutely not everything needs to be as fast as C or hand-tuned assembly (which these days is also sometimes not as fast as C that’s been through an optimizer). The ratio of the difference between C and some other solution can have wildly different bounds, though, which is my main point.

There are a lot of languages that get acceptably close to C, but as you get into more demanding tasks that list gets shorter. Fortran, Pike, C++, Rust, OCaml, Ada, and a few others are in that list for a lot more scenarios than CPython or Ruby. Perl has a big startup time, but for long-running tasks is acceptably close on its opcode VM. Many Forths would be too, and many Lisps. Both of those languages have native compilers here and there, though, that get you even closer.

vdupras

an hour ago

What I mean by a penalty of threaded code isn't related to whether words are implemented in native code or not. For example:

: square dup * ;

is going to generate a square word that does 2 calls, regardless of whether "dup" and "*" are native words or not.

The equivalent in C:

int square(int x) { return x*x; }

will generate code that contains no call, even if your C compiler is not a very optimized one.

With STC, it becomes possible for an elaborate Forth to inline "dup" and "*", but STC is less popular on the 8-bit architectures you mentioned because it's much less compact.

It's in that context that I mention that threaded code entails a speed tax. It's those 2 calls.

Of course, in your Forth system, you could rewrite "square" in native code to get rid of the penalty, but then it's not threaded code anymore, it's native code.
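To make those 2 calls concrete, here is a rough sketch in C. It is not taken from any real Forth: the stack, the `w_dup` / `w_mul` words, and the `run` inner interpreter are hypothetical names, and a real ITC/DTC system would do the dispatch in assembly rather than through C function pointers. An optimizing C compiler may also see through this particular toy, so treat it as an illustration of the shape of the work, not a benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature data stack; names are illustrative only. */
#define STACK_DEPTH 16
static int stack[STACK_DEPTH];
static int sp = 0;

static void push(int v) { assert(sp < STACK_DEPTH); stack[sp++] = v; }
static int  pop(void)   { assert(sp > 0); return stack[--sp]; }

/* Primitive words, each one a separate native routine. */
typedef void (*word_fn)(void);
static void w_dup(void) { int a = pop(); push(a); push(a); }
static void w_mul(void) { int b = pop(); int a = pop(); push(a * b); }

/* ": square dup * ;" as a thread: a NULL-terminated list of word addresses. */
static const word_fn square_thread[] = { w_dup, w_mul, NULL };

/* Inner interpreter: one indirect call per word in the thread. */
static void run(const word_fn *ip) {
    while (*ip != NULL) {
        (*ip++)();
    }
}

/* Threaded version: two calls (dup, then *) plus stack traffic. */
int threaded_square(int x) {
    push(x);
    run(square_thread);
    return pop();
}

/* Plain C version: no calls in the body. */
int direct_square(int x) { return x * x; }
```

Both `threaded_square(7)` and `direct_square(7)` return 49; the difference is only in how much work happens around the multiply.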

cestith

an hour ago

Oh, yeah. The call overhead specifically isn’t all that onerous though, is it? For your example you’re also talking about making a memory copy, and unless you have hardware multiply you’re doing looping addition.
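For a sense of scale, this is roughly what a software "*" looks like on a CPU with no multiply instruction. It is a sketch under assumptions: `soft_mul16` is a made-up name, and it uses shift-and-add, which is one common way to do the looping addition, not necessarily what any particular Forth does.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit multiply using only shifts and adds.
   The loop runs once per bit of b, so up to 16 iterations.
   The result wraps modulo 65536, as a 16-bit cell would. */
uint16_t soft_mul16(uint16_t a, uint16_t b) {
    uint16_t result = 0;
    while (b != 0) {
        if (b & 1u) {
            result = (uint16_t)(result + a);  /* add a for each set bit of b */
        }
        a = (uint16_t)(a << 1);               /* a doubles at each bit position */
        b >>= 1;
    }
    return result;
}
```

Next to a loop like this, the cost of the two calls in "square" is a small part of the total.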

Most Forths I’ve dealt with also offer inline assembly as part of a word definition, so I suppose you could do it that way if really desired. I can see what you mean though about the penalty being completely acceptable, because it shouldn’t be super large.