The Titania Programming Language

105 points, posted 5 months ago
by MaximilianEmel

67 Comments

Lerc

5 months ago

Having done a fair amount of programming in Wirthwhile languages, I think the only major design decision that was a mistake was putting the variables at the top.

I'm not sure of the value of seeing all of the variables listed in one place. It has certainly led to me encountering an identifier, scrolling up to determine its type, then scrolling back down. When the variable is only used on a few consecutive lines, it just adds work to read and work to write. I daresay I have written harder-to-read code than I intended because I didn't want to go up and declare another variable for short-term use. The temptation to inline the expression is always there, because you know what all the parts mean when you write it. It's only when you come back later that you feel the regret.

It's possible it could be mitigated by defining something like (not sure if this is a valid grammar)

    stmt_sequence = {decl_sequence} stmt {";" stmt} [";"].

and bring in scoping on statement sequences. Maybe call it stmt_block, so that stmt_sequence can be a component that really is just a sequence of statements.
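
For what it's worth, the statement part of that production is simple to handle in a recursive-descent parser. A rough Python sketch, assuming tokens are plain strings and treating each statement as the run of tokens up to the next `;` or `end` (the `parse_stmt_sequence` name and that simplification are illustrative, not taken from Titania's source):

```python
def parse_stmt_sequence(tokens, pos, terminators=("end",)):
    """Parse  stmt {";" stmt} [";"]  over a list of token strings.

    Each statement is simplified to the tokens up to the next ";" or
    terminator, so nested statements are NOT handled in this sketch.
    Returns (statements, new_pos).
    """
    statements = []
    while True:
        stmt = []
        while (pos < len(tokens) and tokens[pos] != ";"
               and tokens[pos] not in terminators):
            stmt.append(tokens[pos])
            pos += 1
        if not stmt:
            raise SyntaxError(f"expected a statement at token {pos}")
        statements.append(stmt)
        if pos < len(tokens) and tokens[pos] == ";":
            pos += 1                       # consume the separator
            if pos == len(tokens) or tokens[pos] in terminators:
                break                      # it was the optional trailing ";"
            continue                       # another statement follows
        break
    return statements, pos


stmts, end_pos = parse_stmt_sequence(
    ["x", ":=", "1", ";", "y", ":=", "2", ";", "end"], 0)
```

Allowing declarations at the front of each sequence would then mean opening a new scope on entry to this function and closing it on exit.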

munificent

5 months ago

> I'm not sure of the value of seeing all of the variables used listed in one place

It means the compiler knows how much memory the function's activation frame will take and the offset into that for every variable before it encounters any code in the function.

Basically, it makes it easier to write a single-pass compiler. That was important in the 70s but is less important these days.
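
A minimal sketch of that bookkeeping in Python, assuming made-up type sizes, natural alignment, and a 16-byte aligned frame (the `layout_frame` helper is hypothetical, not code from any real compiler):

```python
# Hypothetical type sizes in bytes; a real compiler reads these from its type table.
SIZES = {"int": 8, "bool": 1, "char": 1}


def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def layout_frame(decls):
    """Assign a frame offset to each (name, type) declaration, in order.

    Returns (offsets, frame_size). Because every declaration precedes the
    body, both are known before the first statement is compiled.
    """
    offsets = {}
    cursor = 0
    for name, type_name in decls:
        size = SIZES[type_name]
        cursor = align_up(cursor, size)   # assumed: align each variable to its size
        offsets[name] = cursor
        cursor += size
    return offsets, align_up(cursor, 16)  # assumed: 16-byte aligned frame


offsets, frame_size = layout_frame([("i", "int"), ("done", "bool"), ("n", "int")])
```

With the whole `var` section read up front, the prologue can be emitted with the final frame size before the body is even looked at.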

cb321

5 months ago

A lot of people in this subthread are echoing this, but it's at most maybe very slightly easier. For example, TinyCC is a one-pass C compiler and yet C has sub-scopes: `int foo() { /*...*/ { int x; /*...*/ } }`. The same could be said of Ken Thompson's C compiler, which I believe was also one-pass.

Does Titania not have/intend to have lexically local subscopes? That would seem very un-Wirthian to me.

munificent

5 months ago

One difference here might be that most Pascals support local functions and C does not.

michaelcampbell

5 months ago

> Wirthwhile languages,

This made me smile; going to use it in the future.

doodpants

5 months ago

Though it would be pronounced vehrt-while.

michaelcampbell

5 months ago

Of course. It's more a visual pun than audible, but I liked it nonetheless.

Rochus

5 months ago

This design decision makes compiler implementation easier and especially enables single-pass compilation. Later Oberon versions at least supported more than one declaration section in arbitrary order, but still no in-place declarations.

troupo

5 months ago

Wirth was obsessed with the idea of creating the absolutely minimal useful language, and many of his languages' warts come from that.

Variables are at the top because:

- you immediately see them (so, perhaps, easier to reason about a function? I dunno)

- the compiler is significantly simplified (all of Wirth's languages compile superfast and, if I'm not mistaken, all are single-pass compilers)

However, I feel that Wirth was overly dogmatic on his approaches. And "variables must always be at the top" is one of those.

gingerBill

5 months ago

This has nothing to do with "dogma" and something simpler. It has nothing to do with "immediately see them".

Hint: Think about this from the standpoint of a single-pass compiler and how much memory needs to be reserved in the procedure's stack frame.

troupo

5 months ago

"This is nothing to do with something simpler" and "this is from a single pass compiler".

Are you sure you actually read my second bullet point?

If you read texts and papers by Wirth you'll see a single theme emerge: simplicity. Everything he didn't consider simple was thrown away and derided.

NuclearPM

5 months ago

Lua is single pass too and doesn’t have the same restrictions.

troupo

5 months ago

Wirth is also from a time when resources were incredibly constrained. So it was a combination of his ideas for what a PL should be: easy to understand, easy to learn, powerful enough to do everything he needed. And the compiler for the language also needed to be the same.

Unfortunately he got kinda stuck in that, and you can see it in all the Oberon flavors: he was clearly fighting against the limitations he imposed, but couldn't break out of them.

foldl2022

5 months ago

A nice side-effect of "variables at the top": you keep your functions short.

Turskarama

5 months ago

"Functions should always be short" is also one of those guidelines that people treat like a hard rule. There are occasions when a 100 line function is easier to read than 5 20 line functions, or god forbid 20 5 line functions.

Stop being overly dogmatic, it ALSO leads to worse code.

lenkite

5 months ago

There are occasions => There are only rare occasions.

mannschott

5 months ago

One would assume that, but in practice, the predominant style is not one of many short procedures. Instead it feels that there's a preference to just inline the code unless the resulting procedure will have more than one caller.

For example, search for "PROCEDURE Scan" here: https://people.inf.ethz.ch/wirth/ProjectOberon/Sources/Texts...

Control structures are deeply nested and this goes on for 64 (very dense) lines. The low line count, though, is an artifact of how Oberon is conventionally formatted. When reformatted to mimic the conventions of languages like C, Java or Python, it works out to more than 120 lines.

When I program in Oberon (recreationally) I tend to follow this style even though I would extract the same code into a separate method were I writing in Java.

gingerBill

5 months ago

When I write the backend (this repo isn't even 24 hours old yet), you'll find out why variable declarations are at the top of a procedure. (Hint: it has something to do with the stack).

alexjplant

5 months ago

I wondered why my university made us use C90 for Systems Programming class (circa 2010) until I took Compilers. This quirk specifically stood out to me when considering code generation from an AST - it's a lot easier to simply allocate all required memory at the top of a stack frame when you have the variable declarations at the top of the function.

Lerc

5 months ago

I'm aware that it lets you do things in a single pass manner, but this is the instance where I think the cost for allowing that is too great.

I always thought there must be a better solution, like emitting the compiled function body first, which just increments the offset whenever space for a variable is required, and emitting the function entry last, after the function exit, so you can set up the stack frame with full knowledge of its use. Then the entry can jump to the body. Scoping to blocks would let you decrement the offset upon exiting the block, which would result in less stack use; that would almost always be more beneficial than the cost of the additional jump.
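
Roughly what I have in mind, as a Python sketch. The instruction strings are made-up pseudo-assembly (saving the caller's frame pointer is omitted), and `FrameBuilder` and `compile_function` are hypothetical names:

```python
class FrameBuilder:
    """Tracks stack slots while a function body is compiled in one pass."""

    def __init__(self):
        self.offset = 0       # bytes in use by currently live variables
        self.high_water = 0   # largest value offset has reached so far
        self.scopes = []      # offset saved at each block entry

    def enter_block(self):
        self.scopes.append(self.offset)

    def exit_block(self):
        # The block's variables die here, so their slots can be reused.
        self.offset = self.scopes.pop()

    def declare(self, size):
        slot = self.offset
        self.offset += size
        self.high_water = max(self.high_water, self.offset)
        return slot


def compile_function(build_body):
    """Emit the body first, then the entry, which jumps back to the body."""
    frame = FrameBuilder()
    code = ["body:"]
    build_body(frame, code)
    code += ["mov sp, fp", "ret"]   # this epilogue does not need the frame size
    # Only now is the frame size known, so the prologue is emitted last.
    code += ["entry:", "mov fp, sp", f"sub sp, {frame.high_water}", "jmp body"]
    return code


def example_body(frame, code):
    a = frame.declare(8)
    frame.enter_block()
    b = frame.declare(8)
    frame.exit_block()
    frame.enter_block()
    c = frame.declare(8)            # reuses the slot b occupied
    frame.exit_block()
    code.append(f"; a@{a} b@{b} c@{c}")


listing = compile_function(example_body)
```

Three 8-byte variables end up needing only 16 bytes of frame here, because the two sibling blocks share a slot.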

gingerBill

5 months ago

If you want single pass, then you have to do it on a per block basis at the most (e.g. C89).

But this is not meant to be a fully fledged language, it's meant to be a teaching tool. If you want a fully fledged language that allows for out of order declarations, try Odin!

Also, the syntax of Oberon/Pascal doesn't really allow for it in a nice way. It kind of looks really weird when you allow for it out of order.

Rochus

5 months ago

Remarkable that Bill is interested in a version of Oberon-07. It's even more minimalistic than the previous Oberon versions. I spent a lot of time with the original Oberon language versions and experimented with extensions to make the language more useful for system programming (e.g. https://oberon-lang.github.io/ and https://github.com/rochus-keller/oberon/). Eventually I had to give up backward compatibility to get to a language which I really consider suitable for system programming (and still minimal in the spirit of Oberon, see https://github.com/micron-language/specification/ and https://github.com/rochus-keller/micron/); it's still evolving though.

If I get it right, Bill's language is intended for teaching purposes, which is also a goal of Wirth's languages, and for which these languages are well suited (especially for compiler courses). Also note that the name "Oberon" was not inspired by Shakespeare, but by the Voyager space probe's flyby and photography of Uranus's moons during the mid-1980s when the language was being developed (see https://people.inf.ethz.ch/wirth/ProjectOberon/PO.System.pdf page 12).

gingerBill

5 months ago

Please note the project isn't even 24 hours old yet.

But I am using Oberon-07 as base, and I might deviate from it quite soon too. But I won't be going in the direction of things like Oberon+ (which adds generic and object-oriented programming) or Micron, which adds the entire type system necessary to interact with foreign code. I just wanted something to explain to people how to do tokenizing, parsing, semantic checking (not just basic types), and machine code generation, and this seemed like the best language to choose.

n.b. I know the name does come from the Voyager space probe, but I wanted to keep it directly related somehow, and Titania was the best fit. It's also a moon of Uranus, and there is a story relation to Oberon (Fairy King).

eterps

5 months ago

> But I am using Oberon-07 as base, and I might deviate from it quite soon too.

I am curious about your thoughts on var parameters (i.e. mutable references), as in:

    proc increment(var x: int)
    begin
      x := x + 1
    end

    var i: int

    begin
      i := 10
      increment(i)  // i is now 11
    end
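
One way to picture what a compiler typically does with such a parameter: it passes the variable's address rather than its value. A toy Python model of that, where `memory` stands in for the stack and `I_ADDR` is an arbitrary slot chosen for `i` (both are illustrative assumptions, not how Titania is specified to work):

```python
memory = [0] * 4   # toy "stack": each slot holds one int


def increment(memory, x_addr):
    # `var x: int` is modelled as the address of the caller's variable,
    # so the assignment writes through to the caller's storage.
    memory[x_addr] = memory[x_addr] + 1


I_ADDR = 0                   # hypothetical slot assigned to `i`
memory[I_ADDR] = 10          # i := 10
increment(memory, I_ADDR)    # increment(i)
```

After the call, the slot for `i` holds 11, matching the comment in the snippet above.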

gnatmud8

5 months ago

I also wonder about this concept; is there a programming language that has this behavior?

eterps

5 months ago

Ada, Nim, Pascal. I think C++ also offers it with a specific syntax.

Rust also offers it, but you need to specify it on the call side as well.

v9v

5 months ago

Fortran can do "in out" arguments to subroutines also.

emigre

5 months ago

Really nice project, well done.

I would say it's probably not necessary to explain what the connection between Titania and Oberon is in the README. It's probably evident to most people?

cozzyd

5 months ago

Alas, most people have memorized at most seven Shakespearean sonnets.

Rochus

5 months ago

Supporting lowercase keywords and making (at least some) semicolons optional already makes Oberon much more attractive ;-)

gingerBill

5 months ago

And removing unnecessary keywords and modernizing it too. `[N]T` for arrays and `^T` for pointers, rather than `array N of T` and `pointer to T`. And supporting C++-style comments, `/* */` and `//`.

And as I develop this, I'll tweak it more so that people can actually understand without having to know the full histories of Pascals or Oberons or whatever.

cxr

5 months ago

Trying to eliminate semicolons by doing JS-style ASI is gross and complicates things unnecessarily. You can trivially change the parser/grammar so that the semicolons in import, var, procedure, etc. declarations just aren't required, and likewise with statements that end in "end". They'll still be necessary for statements comprising things like assignments and procedure calls, but for a teaching language who cares.

(Your grammar is missing a definition for proc_call, by the way.)

gingerBill

5 months ago

I only added that a few minutes ago, and it's a question of whether I should or not. This project is so goddamn new that I have not even decided anything. I was not expecting anyone posting this to HackerNews in the slightest.

Also this isn't JS-style ASI technically speaking, and it won't have any of the problems either. The syntax for this language is different enough that it won't be a problem. Procedures don't even return things.

cxr

5 months ago

I'm referring to the semicolon insertion described in the README as, "When a newline is seen after the following token kind, a semicolon is inserted".
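
For readers who haven't seen that kind of rule, here is a small Python sketch of newline-triggered insertion. The set of token kinds that trigger it (identifiers, numbers, `end`, `)`) is my assumption for illustration; the README's actual list is not reproduced here, and the token grammar is deliberately tiny:

```python
import re

TOKEN_RE = re.compile(
    r"[ \t]*(?:(?P<newline>\n)|(?P<number>\d+)"
    r"|(?P<ident>[A-Za-z_]\w*)|(?P<op>:=|[-+*/();]))"
)
KEYWORDS = {"begin", "end", "if", "then", "while", "do"}


def ends_statement(kind, text):
    """Assumed rule: a newline after one of these tokens becomes ';'."""
    if kind in ("ident", "number"):
        return True
    return text in ("end", ")")


def tokenize(source):
    source = source.rstrip() + "\n"    # normalise to exactly one final newline
    tokens = []
    pos = 0
    prev = None                        # (kind, text) of the last real token
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "newline":
            if prev is not None and ends_statement(*prev):
                tokens.append(";")     # the inserted semicolon
                prev = None            # avoid inserting twice on blank lines
            continue
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        tokens.append(text)
        prev = (kind, text)
    return tokens


tokens = tokenize("begin\n  x := 1\n  y := x + 2\nend\n")
```

Note that no semicolon is inserted after `begin`, since it is not in the assumed trigger set, while one is inserted after each assignment and after `end`.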

> Procedures don't even return things

Oberon allows return values from functions (which are still declared with the PROCEDURE keyword). It looks like the same is true in Titania:

    proc_body = decl_sequence ["begin" stmt_sequence] ["return" expr] "end".

<https://github.com/gingerBill/titania/blob/085b7b5bcf7f06076...>

I'm curious what you're going to do with the code generator. Parsers are easy and can be completed in a day or two. Even with a reference implementation, however, it's the backend that's a slog.

gingerBill

5 months ago

I know Oberon does return things from certain procedures but I might not. I know the grammar allows for it but again, this is subject to change.

As for code generation, direct machine code to a Windows AMD64 PE executable.

Backend should not be that difficult because I am not working on anything complex nor optimizing. This won't be an optimizing compiler backend course.

Rochus

5 months ago

I have also implemented these and other simplifications in my languages, and I don't think it makes them any less readable. Looking forward to seeing your final design.

emigre

5 months ago

Definitely wise not to choose "uranus" as the name. ;)

elcapitan

5 months ago

@gingerBill Will there be accompanying blog posts or streams while you're building this?

jasperry

5 months ago

Very nice project, I'm a big fan of implementing Wirthian languages to learn compilers.

Also, in true Wirth style, the documentation mainly consists of the language grammar :)

gingerBill

5 months ago

The project is not even 24 hours old...

And I might make this a recorded series explaining how to make compilers from scratch, with this language as a reference.

jasperry

5 months ago

Of course! It wasn't really a criticism, just a cheeky observation that the documentation for every Wirth-style language I've ever seen begins with the EBNF grammar. Though it's rare for a new language to do that today, I appreciate you continuing the tradition.

Rochus

5 months ago

> the documentation mainly consists of the language grammar :)

It's a bit more than just the grammar, but I agree it's generally underspecified.

__d

5 months ago

Clever name.

It'd be nice to see some discussion of the motivation for its departures from Oberon07.

gingerBill

5 months ago

This repo isn't even 24 hours old yet and it's on HackerNews...

Just wow....

smartmic

5 months ago

Is the language/project somehow related to Odin or its ecosystem, or is it completely independent?

If the latter, I wonder how you can manage another programming language alongside Odin — anyway, thank you and great respect for both!

gingerBill

5 months ago

Completely independent of Odin (except being written in Odin). It's planned as a teaching tool for learning compiler development, that's it.

ivanjermakov

5 months ago

Not surprising, it appeared in 1.3k github feeds of your followers.

kaichanvong

5 months ago

not Scottish sadly, AFAIK so: wowsers! kinda recall someone saying something about this?! \o/ yey

geokon

5 months ago

> teach compiler development with

Not trying to be confrontational, genuinely curious.. but why is this an area where you'd want a DSL?

My initial reaction is: when I'm learning a topic, the last thing I want to be worrying about is learning the ergonomics of a new language.

I'm guessing there's a good rationale I'm missing.

It'd be nice to see some piece of compiler-related code in this language that'd be ugly in a general-purpose language.

wiz21c

5 months ago

I taught Pascal 25 years ago. The idea was to teach the basic principles of programming (loops, variables, arrays, linked lists, sorting, etc.) without worrying about the technical details (C was too tricky, Python was not there). Plus Pascal is quite simple and has very few pitfalls.

Once students were proficient in Pascal, we could introduce compiler classes and, when sufficiently advanced, show what the Pascal BNF grammar looked like. So students had a complete picture of a language. Pascal's BNF grammar is very simple.

Also, Pascal enforces strong program structures (BEGIN, END, PROCEDURE, FUNCTION, etc.), which helps to frame practical work.

gingerBill

5 months ago

Oberon is a general-purpose programming language, not a DSL. Even though it is very minimal, you can still do quite a bit in it.

But the point of teaching compiler development is to teach people how to do the basic things from tokenizing, parsing, semantic checking, and code generation (directly to machine code).

I have found this is actually a skill most programmers don't even know how to do, especially just tokenizing and parsing, so I thought I'd use Oberon-07 as a base/inspiration for it.

n.b. at the time of this comment, the repo/project is not even 24 hours old yet.

rubystallion

5 months ago

He's the author of Odin, so he has experience writing compilers, so he also wrote a toy compiler in his language as a fun weekend project I guess. Of course it's only a good learning resource for people familiar with Odin. I don't know much about Odin, but from glancing at the code it looks like there are some memory management related features that he's using, which would look uglier in other languages.

loumf

5 months ago

Looking at your source, I was introduced to Odin -- now I want to hear a lot more about that.

khaledh

5 months ago

He is also the creator of Odin :)

ruslan

5 months ago

No pointers ?

doug-moen

5 months ago

from the grammar:

> pointer_type = "^" type.

Hendrikto

5 months ago

Using his own Odin language to write another custom language. Next level.

Panzerschrek

5 months ago

We need more context. Why did Odin's creator create yet another programming language?

Jtsummers

5 months ago

The reason is given in the second sentence of the readme.

> This is designed to be a language to teach compiler development with.

That is, this is the language a student would implement a compiler for. It's in line with what you'll find for most undergraduate compiler courses in terms of complexity.

dismalaf

5 months ago

Why not? Lots of language designers create or work on more...

Matz has Streem, SPJ is working on Epic's new language, the creator of Pony is working on MS' project Verona, probably lots of others. Doubt it will supplant Odin though, considering Odin's being used professionally.

fijiaarone

5 months ago

A modest proposal…

Instead of having println() or its equivalent in your programming language, add a new special character that denotes a newline after a string:

print("Hello world".)

Jtsummers

5 months ago

Is your idea that that would always work? Like:

    s := "Hello world".; -- equivalent to "Hello world\n"

Or only in `print`? If only in `print`, then you've suddenly made a context-sensitive grammar. And if the former, just use "Hello world\n" instead, since the tokenizer already supports that.

keithnz

5 months ago

I think the point is to add the correct end of line depending on OS.

kouteiheika

5 months ago

The correct end of line is always '\n'. Even Windows' Notepad supports it nowadays. I will gladly die on this hill. :P

coderedart

5 months ago

That would mess with dot syntax usually reserved for method calls. Like rust's "hello".to_string();

cxr

5 months ago

Oberon doesn't have string methods (and people who opt not to parenthesize for cases like that deserve the punishment).