comex
3 days ago
I’m pretty impressed, but also skeptical. There’s a contradiction in this post.
Near the beginning of the post is a list of use cases for mutable aliasing. "Back-references", "doubly-linked lists", "delegates" - all cases where you have persistent objects with references to each other, i.e. a cyclic graph of objects.
Near the end, the post says: “The mutual isolation restriction will influence our programs' data to look like trees, similar to how Rust's borrow checker does.” In other words, the proposed approach can’t actually support those use cases!
The post continues by saying it improves on Rust’s borrow checker by having “much more relaxed rules around how we access those trees from the outside”. That may be true, and I’m legitimately looking forward to playing with this in Mojo. But it’s a much smaller improvement (while still coming at the cost of added complexity). It doesn’t really “get the best of both worlds” as the post says earlier.
To be fair, there may be ways to apply similar ideas to get some subset of use cases for cyclic object graphs to work. Some of verdagon’s past blog posts have proposed mechanisms that do something like that. But that would be a whole different thing from what’s being proposed here.
nmsmith
3 days ago
Hi there, I am the Nick whose design we're discussing. You raise some valid points: the blog post enumerates some limitations with Rust's model, but my design (as written) only resolves a subset of those limitations. We probably should have made that a bit clearer.
That said, there is still hope! I have been iterating on the design over the last 9 months and am fairly confident that I can model graphs, as long as a few restrictions are imposed, such as not being able to delete nodes. But I can't prove this will work yet, so we will need to wait and see what the next design iteration looks like. I'm fairly confident that the next iteration will be more powerful than the version presented in OP's blog post.
comex
2 days ago
Good luck! I look forward to seeing the results.
thomasmg
3 days ago
I think you are right. The post talks about mutable vs immutable borrows, and how to have multiple mutable borrows, safely. But doubly-linked list are hard in Rust due to ownership, and not borrowing. (Doubly linked lists could be supported by "splitting" and "merging" ownership: having "half"-owners, and then you can merge ownership.)
I understand the reason for having 1 mutable xor n immutable references... but I think Rust made the wrong choice here: now we see two versions of each method, one with mutable and another without mutable. It's similar to the async coloring problem, but instead of async / not async, in Rust you have mutable and non-mutable. It would simplify things a lot for the programmer (not for the compiler, OK), if all borrows are mutable. Sure the compiler can't always emit noalias, and multi-threading gets slower... But with a sane memory model (as in Java), I think it would be simpler for the programmer.
comex
3 days ago
But how would you prevent use-after-free? In other words, how do you prevent a child object's method from reaching out to its parent object and, say, clearing the container that contains the child object? Specifically, how do you do that without reference counting or garbage collection.
thomasmg
3 days ago
Well the distinction between mutable and immutable borrows doesn't solve use-after-free. What prevents use-after-free is ownership. There is one owner, and the owner can not free the memory as long as there is a borrower (it doesn't matter whether the borrower is mutable or not). So that's the task of the borrow checker.
For my own programming language [1], I have only mutable borrowers (similar to Java), and a much simpler borrow checker: a method that holds a borrow can not call a method that potentially frees an object of the same type (directly or indirectly). I'm not sure yet if this is sufficient; it's still a borrow checker, just a very simple one. (My language uses reference counting by default, but does support single ownership + borrowing.)
john-h-k
3 days ago
> There is one owner, and the owner can not free the memory as long as there is a borrower (it doesn't matter whether the borrower is mutable or not). So that's the task of the borrow checker.
Yes, but what about:
``` let mut x = &mut some_vec; some_vec.push(10); ```
Sure the vec has an owner, but the second mutable borrow causes it to invalidate the first pointer. You need a _third_ category, “unstable mut”, which is exclusive as it can cause an object’s internal data to move. You can then collapse mut and immutable to one… but you end up with the exact some colouring problem between unstable mut and the others
thomasmg
2 days ago
Yes, I see your point. In Rust, the owner, and the mutable borrower can change the pointer itself (like C realloc). If multiple "mut" borrows are allowed, then this would be unsafe, so I understand "unstable mut" would solve this problem - but result in a new colouring problem.
My solution to this would be: in a new language, do not allow reallocation. Not for owners, and not for mut borrow. This is what Java does: an ArrayList is a wrapper around an array, and so adding entries will not move the ArrayList object, just the (wrapped) private array. In Java the programmer never sees dangling references. Java prevents the issue by design, at the cost of always paying for the wrapper. If you want to avoid this price, you need to use the array directly.
(I have to admit I was not aware that in Rust, push moves the memory... so thanks a lot for explaining! Always learning something new.)
I don’t have any problem with Rust’s focus on performance. But its design does make life harder for developers than it needs to be. It doesn't match my vision of a programming language that is at the same time easy to use, safe, and nearly as fast as C.
Dagonfly
2 days ago
> ArrayList is a wrapper around an array, and so adding entries will not move the ArrayList object, just the (wrapped) private array.
That's also how Vec works in Rust. Vec is just (buf_ptr, capacity, len) where capacity is the allocated size of the buffer.
The problem still exists though.
``` let mut v = vec![1, 2 ,3]; let x: &i32 = &v[0]; v.push(4); println!("First element {x}"); ```
The `push` might realloc the array. Then, x points into invalid memory. This is caused by the projection: You can create a reference to a field (aka member) from a reference to the base object.
A language without realloc sounds painful. Any growing container would lead to stale data.
thomasmg
2 days ago
In Rust, a vector is a "fat pointer" (with pointer to the array and length / capacity) which lives on the stack. In Java, the pointer to the array and length / capacity lives in the heap. So in Java, there is an indirection, and it is allowed to have multiple pointers to the ArrayList object.
> A language without realloc sounds painful. Any growing container would lead to stale data.
I think it's not so much about realloc, but about whether it's a fat pointer or not. (I could imagine that Java uses something like realloc for the array, if there is only one pointer to the array for sure).
Fat pointers have some advantages and some disadvantages. Rust chose fat pointers, I assume for performance reasons. That's fine. Java doesn't. But I don't think that's a _huge_ performance disadvantage for Java. What I'm arguing is not so much that one is better and the other is worse, just that there are advantages and disadvantages. Rust might be slightly faster, but a language without (this kind of) fat pointers could potentially be easier to use.
Dagonfly
2 days ago
That's true.
Though, for my example, the storage location of the array metadata/fat pointer is not relevant.
In Rust you can hold references directly into the buffer backing the array (`&v[0]`).
My Java knowledge is quite rusty at this point. AFAIK, a `ArrayList<MyType>` in Java stores pointers to MyType in the buffer. So, when indexing into a ArrayList, you never hold references into the actual buffer. Instead you get a shared-pointer to the class data. It's the indirection of the array entry that saves you during reallocation of the buffer.
Also because Java is a GC'ed VM, it wont dealloc the elements references by the array, as long as there are other references to an element.
The equivalent in Rust is `Vec<Rc<MyType>>` where holding an `&Rc<MyType>` referencing into the vec's buffer is problematic during reallocation. But, cloning the Rc and holding on to it is perfectly fine.
The initial point of this thread was that you can have a Rust-like language where you can hold multiple mutating (aliasing) references and prevent use-after-free. This won't work. Without a GC or RC, you can use one reference to "rug-pull" the memory that is aliases by the other reference.
thomasmg
2 days ago
> In Rust you can hold references directly into the buffer backing the array
Yes! But I am arguing that this prevents having multiple mutable references
> My Java knowledge is quite rusty > Also because Java is a GC'ed VM ...
Your Java knowledge is fine :-) But I'm arguing that you don't strictly need a GC'ed, or RC'ed language: if done "correctly", multiple mutable references are possible. Just not with fat pointers! The programming language I'm building allows this even today. You can try it in the playground [1]:
fun main()
list := List+(int[4]) # <<= owned list
borrow : &list # <<= mutable borrow
for i := until(4)
borrow.add(i)
list.add(10 * i)
for i := until(8)
println(borrow.array[i])
type List
array int[]
size int
fun List+ add(x int)
if size >= array.len
n : int[array.len * 2]
for i := until(array.len)
n[i] = array[i]
array = n
array[size] = x
size += 1
So "List+" is owned type (just "List" without "+" would be reference counted). You may want to look at the generated C code at the end of the page.john-h-k
2 days ago
You are borrowing the entire list. That’s fine. The problem occurs if you borrow a reference into the list. Java/C# solve this by making that operation impossible. You cannot hold a reference into a vector/list
gf000
3 days ago
I'm looking forward to your experience with this approach!
Bit off topic, but is there any particular reason you went with a go-like `ident type` syntax over the more common ones like the C or the ML one?
thomasmg
2 days ago
> any particular reason you went with a go-like `ident type` syntax
I just think that ":" is unnecessary. In my language, the type is mostly used in function declarations and types, eg.
fun File read(data i8[], pos int, len int) int
type List(T)
array T[]
size int
What would be the advantage of adding ":"?codedokode
3 days ago
For back links, can we have a special pointer type (back link to owner) that automatically gets updated when the object is moved/copied? Also auto-initialized when object is created.
pklausler
3 days ago
It could be made to work in a language like Haskell, where cyclic structures can arise only in limited circumstances (recursive `let` groups) and can be given a distinct runtime representation.
comex
3 days ago
Well, if everything is immutable like in Haskell, then you don't really run into the problems described in the blog post in the first place. You can live with Rust's "no mutable aliasing" rule instead.
pklausler
3 days ago
Obviously; but the point was, if the language were to limit the circumstances in which circular structures can arise, one could exploit that fact.
gf000
3 days ago
I was thinking that the type system can in certain cases determine that no circular reference is possible for an instance of this type - in that case it could e.g. use an RC (for a region only having that type of objects), and fallback to either an arena-like pattern of freeing everything at once (if only internal circular references exist), or a tracing GC.
verdagon
3 days ago
This approach solves one borrow checking pain point (by allowing temporary mutable aliasing, even across function boundaries), but the post might actually be a bit conservative in saying it will influence our programs' data to look like trees.
As a thought experiment, I've been imagining how this would handle an architecture like e.g. Google Earth's, where a program's classes are divided into multiple "layers". For example, an upper "business logic" layer (DocumentManager, Document, CardManager, Card, StateManager) might have references to the more general "services" lower layer (NetworkService, FileService).
With Nick's system, we could have a group annotation on these various upper-layer classes, like e.g. `Document[s: group IService]`. With these annotations, these upper-layer classes can have references to the services themselves (though not their contents). This might let services' methods mutate the services' contents even though others have references to them (though not their contents). The services' methods would have to communicate that they're modifying the contents of the services (via a `mut` annotation), and the compiler would prevent holding references to the services' contents across those callsites. But also, those contents are private anyway, so the user wouldn't have those references anyway.
If that turns out to be true (Nick and I are still exploring it) then he's found a way to make borrowing work for some DAG-shaped program architectures, rather than just strictly tree-shaped architectures.
On top of that, if we compose this approach with linear types, I think we can get at least halfway towards compile-time-memory-safe back-references. TBD whether that works, but that would be pretty exciting.
TL;DR: My saying it only supports tree-shaped programs is me hedging and being conservative.
Still, until these things are proven to work, I should be more consistent in when I'm conservative vs optimistic. I'll update the article to address that. Thanks for pointing that out and helping me be more consistent!