hackernews client

Group Borrowing: Zero-cost memory safety with fewer restrictions

81 pointsposted 3 days ago

85 Comments

comex

3 days ago

I’m pretty impressed, but also skeptical. There’s a contradiction in this post.

Near the beginning of the post is a list of use cases for mutable aliasing. "Back-references", "doubly-linked lists", "delegates" - all cases where you have persistent objects with references to each other, i.e. a cyclic graph of objects.

Near the end, the post says: “The mutual isolation restriction will influence our programs' data to look like trees, similar to how Rust's borrow checker does.” In other words, the proposed approach can’t actually support those use cases!

The post continues by saying it improves on Rust’s borrow checker by having “much more relaxed rules around how we access those trees from the outside”. That may be true, and I’m legitimately looking forward to playing with this in Mojo. But it’s a much smaller improvement (while still coming at the cost of added complexity). It doesn’t really “get the best of both worlds” as the post says earlier.

To be fair, there may be ways to apply similar ideas to get some subset of use cases for cyclic object graphs to work. Some of verdagon’s past blog posts have proposed mechanisms that do something like that. But that would be a whole different thing from what’s being proposed here.

nmsmith

3 days ago

Hi there, I am the Nick whose design we're discussing. You raise some valid points: the blog post enumerates some limitations with Rust's model, but my design (as written) only resolves a subset of those limitations. We probably should have made that a bit clearer.

That said, there is still hope! I have been iterating on the design over the last 9 months and am fairly confident that I can model graphs, as long as a few restrictions are imposed, such as not being able to delete nodes. But I can't prove this will work yet, so we will need to wait and see what the next design iteration looks like. I'm fairly confident that the next iteration will be more powerful than the version presented in OP's blog post.

comex

2 days ago

Good luck! I look forward to seeing the results.

thomasmg

3 days ago

I think you are right. The post talks about mutable vs immutable borrows, and how to have multiple mutable borrows, safely. But doubly-linked list are hard in Rust due to ownership, and not borrowing. (Doubly linked lists could be supported by "splitting" and "merging" ownership: having "half"-owners, and then you can merge ownership.)

I understand the reason for having 1 mutable xor n immutable references... but I think Rust made the wrong choice here: now we see two versions of each method, one with mutable and another without mutable. It's similar to the async coloring problem, but instead of async / not async, in Rust you have mutable and non-mutable. It would simplify things a lot for the programmer (not for the compiler, OK), if all borrows are mutable. Sure the compiler can't always emit noalias, and multi-threading gets slower... But with a sane memory model (as in Java), I think it would be simpler for the programmer.

comex

3 days ago

But how would you prevent use-after-free? In other words, how do you prevent a child object's method from reaching out to its parent object and, say, clearing the container that contains the child object? Specifically, how do you do that without reference counting or garbage collection.

thomasmg

3 days ago

Well the distinction between mutable and immutable borrows doesn't solve use-after-free. What prevents use-after-free is ownership. There is one owner, and the owner can not free the memory as long as there is a borrower (it doesn't matter whether the borrower is mutable or not). So that's the task of the borrow checker.

For my own programming language [1], I have only mutable borrowers (similar to Java), and a much simpler borrow checker: a method that holds a borrow can not call a method that potentially frees an object of the same type (directly or indirectly). I'm not sure yet if this is sufficient; it's still a borrow checker, just a very simple one. (My language uses reference counting by default, but does support single ownership + borrowing.)

[1] https://github.com/thomasmueller/bau-lang

john-h-k

3 days ago

> There is one owner, and the owner can not free the memory as long as there is a borrower (it doesn't matter whether the borrower is mutable or not). So that's the task of the borrow checker.

Yes, but what about:

``` let mut x = &mut some_vec; some_vec.push(10); ```

Sure the vec has an owner, but the second mutable borrow causes it to invalidate the first pointer. You need a _third_ category, “unstable mut”, which is exclusive as it can cause an object’s internal data to move. You can then collapse mut and immutable to one… but you end up with the exact some colouring problem between unstable mut and the others

thomasmg

2 days ago

Yes, I see your point. In Rust, the owner, and the mutable borrower can change the pointer itself (like C realloc). If multiple "mut" borrows are allowed, then this would be unsafe, so I understand "unstable mut" would solve this problem - but result in a new colouring problem.

My solution to this would be: in a new language, do not allow reallocation. Not for owners, and not for mut borrow. This is what Java does: an ArrayList is a wrapper around an array, and so adding entries will not move the ArrayList object, just the (wrapped) private array. In Java the programmer never sees dangling references. Java prevents the issue by design, at the cost of always paying for the wrapper. If you want to avoid this price, you need to use the array directly.

(I have to admit I was not aware that in Rust, push moves the memory... so thanks a lot for explaining! Always learning something new.)

I don’t have any problem with Rust’s focus on performance. But its design does make life harder for developers than it needs to be. It doesn't match my vision of a programming language that is at the same time easy to use, safe, and nearly as fast as C.

Dagonfly

2 days ago

> ArrayList is a wrapper around an array, and so adding entries will not move the ArrayList object, just the (wrapped) private array.

That's also how Vec works in Rust. Vec is just (buf_ptr, capacity, len) where capacity is the allocated size of the buffer.

The problem still exists though.

``` let mut v = vec![1, 2 ,3]; let x: &i32 = &v[0]; v.push(4); println!("First element {x}"); ```

The `push` might realloc the array. Then, x points into invalid memory. This is caused by the projection: You can create a reference to a field (aka member) from a reference to the base object.

A language without realloc sounds painful. Any growing container would lead to stale data.

thomasmg

2 days ago

In Rust, a vector is a "fat pointer" (with pointer to the array and length / capacity) which lives on the stack. In Java, the pointer to the array and length / capacity lives in the heap. So in Java, there is an indirection, and it is allowed to have multiple pointers to the ArrayList object.

> A language without realloc sounds painful. Any growing container would lead to stale data.

I think it's not so much about realloc, but about whether it's a fat pointer or not. (I could imagine that Java uses something like realloc for the array, if there is only one pointer to the array for sure).

Fat pointers have some advantages and some disadvantages. Rust chose fat pointers, I assume for performance reasons. That's fine. Java doesn't. But I don't think that's a _huge_ performance disadvantage for Java. What I'm arguing is not so much that one is better and the other is worse, just that there are advantages and disadvantages. Rust might be slightly faster, but a language without (this kind of) fat pointers could potentially be easier to use.

Dagonfly

2 days ago

That's true.

Though, for my example, the storage location of the array metadata/fat pointer is not relevant.

In Rust you can hold references directly into the buffer backing the array (`&v[0]`).

My Java knowledge is quite rusty at this point. AFAIK, a `ArrayList<MyType>` in Java stores pointers to MyType in the buffer. So, when indexing into a ArrayList, you never hold references into the actual buffer. Instead you get a shared-pointer to the class data. It's the indirection of the array entry that saves you during reallocation of the buffer.

Also because Java is a GC'ed VM, it wont dealloc the elements references by the array, as long as there are other references to an element.

The equivalent in Rust is `Vec<Rc<MyType>>` where holding an `&Rc<MyType>` referencing into the vec's buffer is problematic during reallocation. But, cloning the Rc and holding on to it is perfectly fine.

The initial point of this thread was that you can have a Rust-like language where you can hold multiple mutating (aliasing) references and prevent use-after-free. This won't work. Without a GC or RC, you can use one reference to "rug-pull" the memory that is aliases by the other reference.

thomasmg

2 days ago

> In Rust you can hold references directly into the buffer backing the array

Yes! But I am arguing that this prevents having multiple mutable references

> My Java knowledge is quite rusty > Also because Java is a GC'ed VM ...

Your Java knowledge is fine :-) But I'm arguing that you don't strictly need a GC'ed, or RC'ed language: if done "correctly", multiple mutable references are possible. Just not with fat pointers! The programming language I'm building allows this even today. You can try it in the playground [1]:

    fun main()
        list := List+(int[4]) # <<= owned list
        borrow : &list        # <<= mutable borrow
        for i := until(4)
            borrow.add(i)
            list.add(10 * i)
        for i := until(8)
            println(borrow.array[i])
    
    type List
        array int[]
        size int
    
    fun List+ add(x int)
        if size >= array.len
            n : int[array.len * 2]
            for i := until(array.len)
                n[i] = array[i]
            array = n
        array[size] = x
        size += 1

So "List+" is owned type (just "List" without "+" would be reference counted). You may want to look at the generated C code at the end of the page.

[1]: https://thomasmueller.github.io/bau-lang/

john-h-k

2 days ago

You are borrowing the entire list. That’s fine. The problem occurs if you borrow a reference into the list. Java/C# solve this by making that operation impossible. You cannot hold a reference into a vector/list

gf000

3 days ago

I'm looking forward to your experience with this approach!

Bit off topic, but is there any particular reason you went with a go-like `ident type` syntax over the more common ones like the C or the ML one?

thomasmg

2 days ago

> any particular reason you went with a go-like `ident type` syntax

I just think that ":" is unnecessary. In my language, the type is mostly used in function declarations and types, eg.

    fun File read(data i8[], pos int, len int) int

    type List(T)
        array T[]
        size int

What would be the advantage of adding ":"?

codedokode

3 days ago

For back links, can we have a special pointer type (back link to owner) that automatically gets updated when the object is moved/copied? Also auto-initialized when object is created.

pklausler

3 days ago

It could be made to work in a language like Haskell, where cyclic structures can arise only in limited circumstances (recursive `let` groups) and can be given a distinct runtime representation.

comex

3 days ago

Well, if everything is immutable like in Haskell, then you don't really run into the problems described in the blog post in the first place. You can live with Rust's "no mutable aliasing" rule instead.

pklausler

3 days ago

Obviously; but the point was, if the language were to limit the circumstances in which circular structures can arise, one could exploit that fact.

gf000

3 days ago

I was thinking that the type system can in certain cases determine that no circular reference is possible for an instance of this type - in that case it could e.g. use an RC (for a region only having that type of objects), and fallback to either an arena-like pattern of freeing everything at once (if only internal circular references exist), or a tracing GC.

verdagon

3 days ago

This approach solves one borrow checking pain point (by allowing temporary mutable aliasing, even across function boundaries), but the post might actually be a bit conservative in saying it will influence our programs' data to look like trees.

As a thought experiment, I've been imagining how this would handle an architecture like e.g. Google Earth's, where a program's classes are divided into multiple "layers". For example, an upper "business logic" layer (DocumentManager, Document, CardManager, Card, StateManager) might have references to the more general "services" lower layer (NetworkService, FileService).

With Nick's system, we could have a group annotation on these various upper-layer classes, like e.g. `Document[s: group IService]`. With these annotations, these upper-layer classes can have references to the services themselves (though not their contents). This might let services' methods mutate the services' contents even though others have references to them (though not their contents). The services' methods would have to communicate that they're modifying the contents of the services (via a `mut` annotation), and the compiler would prevent holding references to the services' contents across those callsites. But also, those contents are private anyway, so the user wouldn't have those references anyway.

If that turns out to be true (Nick and I are still exploring it) then he's found a way to make borrowing work for some DAG-shaped program architectures, rather than just strictly tree-shaped architectures.

On top of that, if we compose this approach with linear types, I think we can get at least halfway towards compile-time-memory-safe back-references. TBD whether that works, but that would be pretty exciting.

TL;DR: My saying it only supports tree-shaped programs is me hedging and being conservative.

Still, until these things are proven to work, I should be more consistent in when I'm conservative vs optimistic. I'll update the article to address that. Thanks for pointing that out and helping me be more consistent!

leoc

3 days ago

In memory-safety discussions it needs to be borne in mind that 'simply fix the hardware' is also a viable approach; or at least it's a lot more viable than it was or seemed to be a few decades ago https://en.wikipedia.org/wiki/Capability_Hardware_Enhanced_R... . Memory-safe BSD and applications running on real memory-safe RISC-V hardware apparently already exist, though obviously, yes, even in a best case existing architectures would not come close to being fully displaced for a long time. That said, I really hope the process isn't being further delayed by attempts to grub license money.

comex

3 days ago

True. To be fair, CHERI guarantees only security, while borrow checking (when it works) guarantees both security and correctness. If you write to a freed pointer or out-of-bounds, CHERI guarantees that your write either crashes or lands somewhere harmless, rather than corrupting other data. But borrow checking guarantees you won't perform such writes in the first place.

Yet while correctness may excite programmers, it's security that gets the decision makers' attention. At the end of the day, security is just much more impactful.

We'll see if Morello ever makes its way to the types of CPUs used on phones and higher-performance devices.

astrange

3 days ago

ARMv9 is not Morello and MTE is not CHERI, but it is much better than nothing and already in a few phones. But it does require software adoption and I don't know how heavily they've adopted it.

codedokode

3 days ago

Yes but CHERI is supposed to work for every application, not only Rust, but your favourite game from Windows 95 era as well.

pjmlp

2 days ago

Maybe it is because of Oracle's hate among the hacker community, however I always have to point out Solaris SPARC and Oracle Linux SPARC, have been shipping with ADI enabled since 2015, CHERI isn't the only option.

https://docs.oracle.com/en/operating-systems/solaris/oracle-...

https://www.kernel.org/doc/Documentation/sparc/adi.rst

astrange

3 days ago

Secure hardware needs software cooperation to work - for instance to avoid type confusion you need to tag memory with types, but malloc() doesn't know the type of memory it's allocating.

verdagon

3 days ago

Hey all, this is a post explaining a new memory safety model by my friend Nick Smith (original proposal at https://gist.github.com/nmsmith/cdaa94aa74e8e0611221e65db8e4...)

It was interesting enough that I knew I had to write a post about it. Happy to answer any questions!

wpollock

3 days ago

> But... we humans can easily conclude this is safe. After the evaluation of list_ref_a.push(5), my_list is still there, and it's still in a valid state. So there is no risk of memory errors when evaluating the second call to push.

Is the always true? What with piplining, branch prediction, and maybe asymmetrical NUMA , isn't out of order instructions possible? If so, don't you still need locks or memory barriers to ensure safety?

(I am most definitely not an expert, just curious.)

nmsmith

3 days ago

Hardware-based instruction reordering always preserves the behaviour of the original program. (Assuming the original program is valid.)

For example, an Intel CPU won't reorder `x += 1` and `x *= 2`.

titzer

3 days ago

Thanks for the detailed writeup, that must have been a lot of work.

I think you guys should check out Verona (https://www.microsoft.com/en-us/research/project/project-ver...).

verdagon

3 days ago

Big fan of Verona, I love their memory safety approach as well. I wrote a bit about it in the Grimoire [0] too. IIRC they plan for the user to specify whether a region is backed by an arena allocator or GC, which sounds pretty nice. It's kind of hard to track down details though, most of my knowledge comes from reading David Chisnall's comments on lobste.rs.

[0] https://verdagon.dev/grimoire/grimoire

sirwhinesalot

3 days ago

I had a somewhat related idea to this which was simply to track reference stability (i.e., is there any operation beyond freeing the data outright that will invalidate them).

A reference into a vector is unstable so you can only have 1 mut xor N read. A reference into an arena is stable so you can have all the muts you want, but they cannot be sent across threads.

Reference to object vs reference to contents is also related, you can safely have multiple mut references to the same object that cannot invalidate each other, even if they invalidate references to the contents.

This can be tracked with path analysis as employed by Hylo (Hylo only has second class references, but I mean the algorithm they use to track paths is applicable).

mrkeen

3 days ago

> Because of those "inaccessible" rules, we can never have a readwrite reference and a readonly reference to an object at the same time.

I can't not see this as a good thing. It's almost at the level of "the only thing an ownership system does".

If my thread is operating on a struct of 4 int64s, do I now have to think about another read-only thread seeing that struct in an invalid partially-written state?

nmsmith

3 days ago

The "Group Borrowing" concept that we're discussing still imposes aliasing restrictions to prevent unsynchronized concurrent access, and also to prevent "unplanned" aliasing. For example, for the duration of a function call, the default restriction is that a mut argument can only be mutated through the argument's identifier. The caller may be holding other aliases, but the callee doesn't need to be concerned about that, because the mut argument's group is "borrowed" for the duration of the function call.

I suppose you could describe the differences from Rust as follows:

- Borrowing happens for the duration of a function call, rather than the lifetime of a reference.

- We borrow entire groups, rather than individual references.

The latter trick is what allows a function to receive mutably aliasing references. Although it receives multiple such references, it only receives one group parameter, and that is what it borrows.

Hope that makes sense!

codedokode

3 days ago

> Borrowing happens for the duration of a function call

I don't understand this part. Cannot a function store a borrowed reference into a structure that outlives the function?

nmsmith

2 days ago

Yes, functions can return non-owning references. However, those references do not "borrow" their target, in the sense that they lock others out. That is the Rust model, and OP does a great job covering its limitations.

So, with the understanding that "borrowing" means "locking others out", a group parameter borrows the group for the duration of the function call. If it borrows the group as mutable, no other group parameters can borrow the group. If it borrows the group as immutable, other group parameters are limited to borrowing the group as immutable. This is reminiscent of the Rust model, but the XOR rule applies to group parameters rather than references, and borrowing lasts the duration of a function call, rather than the lifetime of a reference.

codedokode

3 days ago

The general rule to prevent any data races (as I guess) between threads is that at any point of time there can be either one writer or multiple readers to the same object. Rust guarantees this by not allowing to have any other references if you have a read-write (mutable) reference.

Note that Rust is more strict than necessary - theoretically you can have multiple writable and readable references, but not use them simultaneously and observe the rule. But it is difficult (or even impossible) to verify during compilation so Rust doesn't allow it. C allows it but leaves verification to the author which doesn't work well and doesn't scale.

This situation can happen if you have a graph of objects. For example, in an OS you might have a Process object having references to a list of opened Files, and have a File hold reference back to Process that opened it. In this case you cannot ever have a writable reference to the Process from anywhere because Files already have multiple reading references. And Files can have only read-only references to the Process that opened them.

So you have to use only read-only references and additional structures like Cell, Arc etc. that allow safe writing through them. They are cheap, but not free and ideally we as developers want to have memory safety for free. That's the problem yet to solve.

Note that there are other ways to achieve safety:

- use C and manually verify that your program never breaks this rule - requires god level coding skills

- immutable data - after the writer finished writing, data are "frozen" and can be safely shared without any checks and no rules are broken. Very good, but modification is expensive and requires you to clone the data or use clever and complicated design (for example, if you have 10-elements array but shared a reference only to first 7 elements, you can safely append to the last 3 elements because nobody else knows about them - that's how ring buffers work). See immer C++ library for example.

- atomic variables - allow safe reading and writing at the same time, but can hold at most 8 bytes. Note that using a same variable from different CPU cores causes L1 cache line migrations which, last time I measured it, takes about 2-3 ns or ~10-15 cycles.

- mutexes - ensure rule is observed but make everyone wait. Python's approach.

- using only a single thread. JavaScript's approach. You can have multiple references but you still need to ensure they are pointing to a live object (JS solves this by using an expensive garbage collector).

And by the way if you know more methods or ideas please share them!

titzer

3 days ago

> And by the way if you know more methods or ideas please share them!

Use a transaction manager (HTM or STM). The STM is going to boil down to lockfree synchronization, logging, and retry. Transactions can fail and retry.

But ultimately, all inter-thread communication boils down to programs (or libraries) using barriers for acquire/release semantics and/or using compare/swap and atomic read-modify-write.

> at any point of time there can be either one writer or multiple readers

Time is a slippery notion in concurrency. Most language-level memory models (e.g. Java) or even hardware-level concurrency models focus on happens-before relations, which are induced by executing certain operations. At the hardware level, you can basically think that a CPU receives asynchronous updates about cache lines from other processors (potentially out of order). While technically the cache coherence protocol pushes cache lines into processors, you can't ever guarantee that one "writer" can "push" updates into all other CPUs. Instead, you have to rely on those other CPUs executing either fences or being forced to execute fences through global OS means, such as IPIs. Those other CPUs executing fences (barriers) or other atomic operations induce happens-before relations.

codedokode

3 days ago

> Time is a slippery notion in concurrency. Most language-level memory models (e.g. Java) or even hardware-level concurrency models focus on happens-before relations, which are induced by executing certain operations

You are correct, but for simplicity we can assume that the CPU executes one operation at a time and still have a valid model for reasoning about data races.

> But ultimately, all inter-thread communication boils down to programs (or libraries) using barriers for acquire/release semantics and/or using compare/swap and atomic read-modify-write.

Interesting. I read about STM and it reminded me of "modification box" algorithm for modifying immutable trees, which also uses versions. So it is another clever but complicated trick. Also it seems that it requires locking? That's not very good given that locks often need syscalls.

titzer

3 days ago

> Also it seems that it requires locking? That's not very good given that locks often need syscalls.

AFAIU in most STMs all the fast paths are implemented using lock-free algorithms with just barriers.

codedokode

3 days ago

How do you implement "commit" part safely (writing changes to the object) without locking the object?

titzer

3 days ago

One way is that the writer thread prepares a completely new copy of the object in a private area and then either compare-and-swaps a pointer to it, or executes a store with a store-store barrier (depending on details of the STM system around it).

codedokode

3 days ago

Interesting, it's almost like immutable data structures.

mrkeen

3 days ago

> Also it seems that it requires locking?

STM requires locking in the same way that GC requires malloc. It gets it right so that the programmer doesn't mess it up.

astrange

3 days ago

A non-compacting GC still needs a good malloc because it needs to avoid heap fragmentation.

wavemode

3 days ago

Ideally, the rules for single-threaded references and references that are allowed to be shared across threads would be different.

zozbot234

3 days ago

That's why Cell<T> and RefCell<T> are a thing. Both allow you to mutate shared references, but disable shared access across threads. The qcell crate even includes a version of Cell/RefCell that's "branded" by a region/lifetime, just like in this proposal.

codedokode

3 days ago

As I remember, Cell only allows moving/copying data from/to cell so if you have a 128-byte object inside do you have to copy it to modify? Or this can be optimized?

kbolino

3 days ago

Yes, that's how Cell works. If you want to work with the data in place, you need a RefCell instead.

codedokode

3 days ago

But it is expensive, because it does run-time checks? Or they are optimized out?

Measter

2 days ago

RefCell does do runtime checks, but the cost is checking the counter, a conditional branch, then incrementing/decrementing the counter twice.

Because the counter is non-atomic and non-volatile the optimiser can sometimes optimise out the actual modification of the counter. It's not free, but it's not also not a huge expense.

codedokode

3 days ago

Even with a single thread you need to ensure that the object you are accessing, still exists and was not freed.

wavemode

3 days ago

If you're talking about stack objects, that's what lifetimes are for.

If you're talking about heap, you can accomplish that by restricting that the references can't be used to free or invalidate the memory. But you could still allow them to be mutable.

nixpulvis

3 days ago

No mention of how this is safe when operating on data across threads? One of the biggest wins for Rust is sane treatment of references when using parallelism.

verdagon

3 days ago

Great question! That's a big enough topic that I'd love to write a followup post about it. There's also a good thread on r/Compilers at https://www.reddit.com/r/Compilers/comments/1n2ay7g/comment/... about how Nick's model should support that.

TL;DR: Mutability is tracked at the group level, so we can share an immutable group with any number of threads (especially good with structured concurrency) or lend a mutable group to a single other thread. References themselves are still aliasable, regardless of the group's mutability.

Taking an existing (mutable, aliasing) group and temporarily interpreting it as immutable has precedent (I did it in Vale [0]) so I like the approach, but I might be biased ;)

(This is from my memory of how Nick's proposal works, I'll ask him to give a better answer once morning hits his Australia timezone)

[0] https://verdagon.dev/blog/zero-cost-borrowing-regions-part-1...

nmsmith

3 days ago

Yep, that's an accurate summary! The model still features a form of "borrowing", it just happens at the granularity of groups.

I wrote a more detailed answer here: https://news.ycombinator.com/item?id=45057636

kibwen

3 days ago

> In any language, when we hand a function a reference to an object, that function can't destroy the object, nor change its type

This isn't quite true. While Rust doesn't currently support this, people have proposed the concept of `&own` references that take ownership of the object and free it when the reference goes out of scope (consider that Rust's standard destructor signature, `fn drop(&mut)`, should probably take one of these hypothetical owning references). I addition, I believe that languages with typestate can cause types to change as a result of function calls, although I don't quite understand the details.

littlestymaar

3 days ago

Intriguing, what would be the purpose of such owning references compared to just passing ownership? Is that to have a way to reliably avoid memcopying large objects when passing them by value?

kibwen

3 days ago

> Is that to have a way to reliably avoid memcopying large objects when passing them by value?

I believe so, yes. Currently the only way to transfer ownership is by-value, and LLVM might optimize away the memcpy, but also it might not.

codedokode

2 days ago

I wonder what is the case when you cannot optimize away memcpy, but can work around it with an "owning reference"?

verdagon

3 days ago

Great point =) That's true for a lot of languages (including Mojo too), so I should have said "non-owning reference" there. I'll update the post to clarify. Thanks for catching that!

skywal_l

3 days ago

I don't understand your comment. `drop` takes a *mutable* reference. But by default, references in Rust are immutable.

littlestymaar

3 days ago

shared references are immutable in Rust, mut references and shared references are both references.

Ygg2

3 days ago

Kibwen is saying that there was an idea to have `fn drop(&mut x)` from Drop trait become `fn drop_own(&own x)`. Then `&own` would do mutation and drop the owner of the reference. A regular &mut reference shouldn't do that outside of `drop(&mut x)`.

codedokode

3 days ago

Drop takes an object itself ("owned reference"), as I remember. Mutable ref allows reading/writing but not destroying or passing the ownership.

Owner = can read/write/destroy

Mutable ref = can read/write

Immutable ref = can only read, guarantered not to change

steveklabnik

3 days ago

https://doc.rust-lang.org/stable/std/ops/trait.Drop.html#tym...

codedokode

3 days ago

That's the destructor function, that is written by you and called by Rust before actually destroying something. The function that you want to look at is [1].

If you read the docs at your link it even says:

> This method is called implicitly when the value goes out of scope, and cannot be called explicitly (this is compiler error E0040).

> However, the mem::drop function in the prelude can be used to call the argument’s Drop implementation.

By the way the implementation of the function drop is just an empty function [2]; that's enough as local variables are destroyed on function return.

Mutable reference is a "borrow" which means you take a value from an owner and promise to return it back, and you cannot destroy a thing that you must return.

[1] https://doc.rust-lang.org/std/mem/fn.drop.html

[2] https://doc.rust-lang.org/src/core/mem/mod.rs.html#957

steveklabnik

3 days ago

The drop function being talked about here is the one I pointed to, not the one you pointed to. The Drop trait is built into the language (as a lang item), std::mem::drop is just a regular old function.

codedokode

2 days ago

The drop that you mention doesn't free memory, as I understand, it is called before actually destroying object's memory.

steveklabnik

2 days ago

Not inherently, sure. But ultimately this is far afield of what I was trying to say, which is that the signature being discussed of being changed from &mut T to &owned T is the one from the Drop trait. That’s it.

modulared

3 days ago

> I believe that languages with typestate can cause types to change as a result of function calls

Do you have any specific languages?

kibwen

2 days ago

I've only heard of this concept in academic languages, I don't feel knowledgeable enough to recommend any specific one. In terms of mainstream languages, I think an analogous concept is the use-before-initialization analysis that languages like Rust employ, e.g. if I have a variable defined like this:

    let x: u8;

I can't actually do anything with this variable:

    foo(x); // error

Except I can apply the initialization operator:

    x = 42;

And now I can do things with this variable as usual.

But consider that the type of x hasn't changed as a result of the initialization. Beforehand it was a u8, and afterward it was still a u8, and yet something about the state of the variable changed my ability to use it in various contexts. I believe that typestate is something like this, generalized to allow variables to flow through states (like a compile-time state machine) to ensure that things happen in the correct order.

wavemode

3 days ago

An &own reference seems like it would just be equivalent to a Box.

steveklabnik

3 days ago

Box heap allocates, references do not.

wavemode

3 days ago

> `&own` references that take ownership of the object and free it when the reference goes out of scope

How can you free something if it's not allocated

steveklabnik

3 days ago

Sorry, yeah that phrasing was bad: I kinda think an "owning reference" is a contradiction in terms but I didn't come up with the idea.

What I meant was, creating a box creates a new allocation, whereas my understanding of &own would take over an existing allocation.

Joker_vD

3 days ago

There is also "Tree Borrows" [0] proposal for Rust, which is aimed at the same problem.

[0] https://perso.crans.org/vanille/treebor/aux/preprint.pdf

SkiFire13

3 days ago

No, Tree Borrows is aimed at defining the semantics of memory accesses. It does not provide a way to statically check whether they are correct, which is what the borrow checker and this proposal aim to do.

estebarb

3 days ago

I'm not sure if I understand it correctly. So, if I have a hashtable, adding or removing an element would invalidate existing pointers to any other element in the hashtable? I guess it makes sense from a memory release POV, but... I end up thinking that for databases using GC or RC is a better approach. Maybe I'm biased, but I have found far easier to work on databases written in C or C#. For that kind of programs I felt Rust overrestrictive and forcing to more dangerous patterns such as replacing pointers with offsets.

verdagon

3 days ago

Yep, adding or removing an element would invalidate existing pointers to any other element in the hash table. This is generally regarded as a good thing if your elements are stored contiguously in the hash table, because a resize would cause any existing pointers to dangle. This should be true for C, and might be true for C# if you're using `struct`s which put the data inline (memory's a bit fuzzy on C#'s rules for references to structs though, maybe someone can chime in).

This new approach still requires us to be mindful of our data layout. Not caring about data layout is still definitely a strength of GC and RC. I'm actually hoping to find a way to blend Nick's approach seamlessly with reference counting (preferably without risking panics or deadlocks) to get the best of both worlds, so that we can consider it for Mojo. I consider that the holy grail of memory safety, and some recent developments give me some hope for that!

(Also, I probably shouldn't mention it since it's not ready, but Nick's newest model might have found a way to solve that for separate-chaining hash maps where addresses are stable. We might be able to express that to the type system, which would be pretty cool.)

jfecher

3 days ago

> I'm actually hoping to find a way to blend Nick's approach seamlessly with reference counting (preferably without risking panics or deadlocks) to get the best of both worlds, so that we can consider it for Mojo. I consider that the holy grail of memory safety, and some recent developments give me some hope for that!

Ante's approach manages to blend a similar scheme for safe, shared mutability with Rc. There are some examples on the most recent blog post on its website of it. IMO combined with its shared types it emulates high-level GC'd code very well.

estebarb

3 days ago

Thanks for the answer. For instance, usually those containers are used as indexes in DBs, so they contain pointers to data, not the data itself. That is an scenario where the references shouldn't be invalidated.

Idk if it may be possible to introduce "semantic monitors". Say, a field within a class and an external container must be updated together. In practice the only time when I have needed to break the single ownership is for keeping internal data views. I know it is safe, but convincing Rust of that is painful.

codedokode

3 days ago

Rc is slightly expensive and Gc is super expensive. I would prefer a memory safety for free.

As for C, it leaves verifying the safety up to you, it is not easy at all and I would rather spend time on something else using safe languages.

pjmlp

2 days ago

Alternatives have existed since the 1960's, predating C's invention, and also after it came to be, the problem has been mainstream adoption of UNIX, and the side effects from that.