veltas
15 hours ago
Automatically translating C to unsafe Rust is pointless, the resultant code is harder to read and there's no improvement in understanding how to get the code maintainable and safe, that requires tons of manual work by someone with a deep understanding of the codebase.
Generally the Rust community as well don't seem to have an answer on how to do this incrementally. In business terms we have no idea how to do work slices with demonstrable value, so there's no way to keep this on track and cut losses if it becomes too much work. This also strongly indicates you're 'stuck' with Rust when you're done: if a better, more idiomatic C++ killer comes along later, it sounds like you're either going to have to rewrite the whole thing again or give up.
I'm definitely open to wisdom on this if anyone disagrees because it is valuable to me and probably most of the readers of this comment section.
pizza234
14 hours ago
> Automatically translating C to unsafe Rust is pointless, the resultant code is harder to read and there's no improvement in understanding how to get the code maintainable and safe, that requires tons of manual work by someone with a deep understanding of the codebase.
I have experience on a (nontrivial) translation of a "very unsafe" C codebase to Rust, and it's not true that there is no value in this type of work.
The first step, automatic translation from C to Rust via tools, immediately revealed bugs in the original codebase. This step alone makes the operation worth spending some time on.
Ports from C to Rust aren't a binary choice between "all safe" and no port at all. Some projects, for example ClamAV, are adopting a mixed approach - (part/most of) new code in Rust, and some translation of existing functionality to Rust.
In general, I think that automatic porting of C to Rust is, in the real world, an academic exercise. This is because C codebases designed without safety in mind simply need to be redesigned, so the problem is not really "how to port C to Rust"; it's first of all "how to redesign an unsafe C codebase into a safe one". Additionally, I believe that in such cases, maintaining the implementation details is impossible - unsafety is a design, after all.
I personally advocate for very precisely scoped ports, where it can be beneficial (safety and stability); where that's not possible, I agree, better abandon early.
Diggsey
11 hours ago
IMO, safety and "idiomatic-ness" of Rust code are two separate concerns, with the former being easier to automate.
In most C code I've read, the lifetimes of pointers are not that complicated. They can't be that complicated, because complex lifetimes are too error prone without automated checking. That means those lifetimes can be easily expressed.
In that sense, a fairly direct C to Rust translation that doesn't try to generate idiomatic Rust, but does accurately encode the lifetimes into the type system (i.e. replacing pointers with references and Box) is already a huge safety win, since you gain automatic checking of the rules you were already implicitly following.
Here's an example of the kind of unidiomatic-but-safe Rust code I mean: https://play.rust-lang.org/?version=stable&mode=debug&editio...
If that can be automated (which seems increasingly plausible) then the need to do such a translation incrementally also goes away.
Making it idiomatic would be a case of recognising higher level patterns that couldn't be abstracted away in C, but can be turned into abstractions in Rust, and creating those abstractions. That is a more creative process that would require something like an LLM to drive, but that can be done incrementally, and provides a different kind of value from the basic safety checks.
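As a rough sketch of what "replacing pointers with references and Box" can look like (the C `node` list and the function names are made up for illustration and are not taken from the playground link above):

```rust
// Hypothetical C original, assumed for illustration:
//   struct node { int value; struct node *next; };
//   struct node *push(struct node *head, int value);
//   int sum(const struct node *head);

/// The owning pointer `struct node *next` becomes `Option<Box<Node>>`.
pub struct Node {
    pub value: i32,
    pub next: Option<Box<Node>>,
}

/// Takes ownership of the old head and returns the new one,
/// mirroring the C function that returns the new head pointer.
pub fn push(head: Option<Box<Node>>, value: i32) -> Option<Box<Node>> {
    Some(Box::new(Node { value, next: head }))
}

/// The borrowed pointer `const struct node *` becomes `Option<&Node>`.
pub fn sum(head: Option<&Node>) -> i32 {
    let mut total = 0;
    let mut cur = head;
    // Deliberately kept as a C-style pointer walk, not an iterator chain.
    while let Some(node) = cur {
        total += node.value;
        cur = node.next.as_deref();
    }
    total
}
```

Nothing here is idiomatic (no iterator, no `impl` block), but the ownership that the C code only implied is now checked by the compiler.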
zozbot234
11 hours ago
> In that sense, a fairly direct C to Rust translation that doesn't try to generate idiomatic Rust, but does accurately encode the lifetimes into the type system (i.e. replacing pointers with references and Box) is already a huge safety win, since you gain automatic checking of the rules you were already implicitly following.
Unfortunately, there's a lot of non-trivial C code that really does not come close to following the rules of existing Safe Rust, even at their least idiomatic. Giving up on idiomaticness can be very helpful at times, but it's far from a silver bullet. For example, much C code that uses "shared mutable" data makes no effort to either follow the constraints of Rust Cell<T> (which, loosely speaking, require get or set operations to be tightly self-contained, where the whole object is accessed in one go) or check for the soundness of ongoing borrows at runtime à la RefCell<T> - the invariants involved are simply implied in the flow of the C code. Such code must be expressed using unsafe in Rust. Even something as simple (to C coders) as a doubly-linked list involves a kind of fancy "static Rc" where two pointers jointly "own" a single list node. Borrowing patterns can be decoupled and/or "branded" in a way that needs "qcell" or the like in Rust, which we still don't really know how to express idiomatically, etc.
This is not to say that you can't translate such patterns to some variety of Rust, but it will be non-trivial and involve some kind of unsafe code.
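To make the Cell<T> / RefCell<T> distinction concrete, here is a minimal sketch using only the standard library. The function names are invented for the example:

```rust
use std::cell::{Cell, RefCell};

/// With `Cell<T>`, every access is a self-contained get or set of the whole
/// value, so two shared references to the same cell can both "mutate" it.
pub fn bump_twice(counter: &Cell<u32>, alias: &Cell<u32>) -> u32 {
    counter.set(counter.get() + 1);
    alias.set(alias.get() + 1);
    counter.get()
}

/// With `RefCell<T>`, borrows can stay open across other code, and overlap
/// is detected at runtime instead of being implied by control flow.
pub fn second_borrow_is_rejected(data: &RefCell<Vec<i32>>) -> bool {
    let _writer = data.borrow_mut(); // first mutable borrow stays live
    let rejected = data.try_borrow_mut().is_err(); // second one is refused
    rejected
}
```

C code that holds a long-lived pointer into the middle of an object while other code writes to it fits neither model without restructuring, which is where `unsafe` comes in.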
zozbot234
15 hours ago
> Generally the Rust community as well don't seem to have an answer on how to do this incrementally.
You can very much translate C to Rust on a function-by-function basis, the only issue is at the boundary where you're either left with unsafe interfaces or a "safe" but slow interop. But this is inherent since soundness is a global property, even a tiny bit of wrong unsafe code can spoil it all unless you do things like placing your untrusted code in a separate sandbox. So you can do the work incrementally, but much of the advantage accrues at the end.
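A minimal sketch of what one translated function might look like at that boundary, assuming a hypothetical C function `int buf_max(const int *data, size_t len, int *out)` that the rest of the C code keeps calling:

```rust
use std::os::raw::c_int;

/// Safe core: this is the part that benefits from being in Rust.
pub fn max_of(values: &[c_int]) -> Option<c_int> {
    values.iter().copied().max()
}

/// C-ABI shim with the same shape as the hypothetical C original.
/// To export it under this symbol name you would also add the `no_mangle`
/// attribute (spelled `#[unsafe(no_mangle)]` in the 2024 edition).
///
/// # Safety
/// If non-null, `data` must point to `len` readable ints and `out` must be
/// writable. The compiler cannot check this; C callers have to uphold it.
pub unsafe extern "C" fn buf_max(data: *const c_int, len: usize, out: *mut c_int) -> c_int {
    if data.is_null() || out.is_null() {
        return -1;
    }
    let values = unsafe { std::slice::from_raw_parts(data, len) };
    match max_of(values) {
        Some(m) => {
            unsafe { *out = m };
            0
        }
        None => -1,
    }
}
```

The shim is still an unsafe interface: the pointer checks catch null, but nothing stops a C caller from passing a wrong `len`.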
pizza234
14 hours ago
> You can very much translate C to Rust on a function-by-function basis, the only issue is at the boundary
Absolutely not. There are many restrictions of Rust that will prevent that. Lifetimes and global state come to mind first. Think about returning a pointer to something owned by the caller - this can require massive cascading changes all over the codebase to be fixed.
zozbot234
14 hours ago
These are restrictions of idiomatic Safe Rust. You can use either unsafe Rust or, in many cases, less idiomatic but still Safe Rust to sidestep them. (For instance, "aliasable mutable" but otherwise valid references which can often be expressed as &Cell<T>, etc.)
You might still need a "massive cascading change" later on to make the code properly idiomatic once you have Rust on both sides of the boundary, but that's just a one-time thing and quite manageable.
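For the &Cell<T> case, a small sketch (the C function `add_into` is hypothetical, the `Cell` calls are plain standard library):

```rust
use std::cell::Cell;

/// Hypothetical C original:
///   void add_into(int *dst, const int *src) { *dst += *src; }
/// C callers may pass the same pointer twice. A `(&mut i32, &i32)` signature
/// would forbid that; `&Cell<i32>` allows it and stays in Safe Rust.
pub fn add_into(dst: &Cell<i32>, src: &Cell<i32>) {
    dst.set(dst.get() + src.get());
}

/// A caller holding a plain `&mut i32` can get an aliasable view of it.
pub fn double_in_place(x: &mut i32) {
    let cell = Cell::from_mut(x);
    add_into(cell, cell); // both parameters alias the same integer
}

/// The same trick works element-wise on a buffer.
pub fn prefix_sum_in_place(buf: &mut [i32]) {
    let cells = Cell::from_mut(buf).as_slice_of_cells();
    for i in 1..cells.len() {
        add_into(&cells[i], &cells[i - 1]);
    }
}
```

This only covers `Copy`-style whole-value access; it doesn't help when the C code hands out pointers into the interior of a larger object.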
pizza234
9 hours ago
> You can use either unsafe Rust or, in many cases, less idiomatic but still Safe Rust to sidestep them. (For instance, "aliasable mutable" but otherwise valid references which can often be expressed as &Cell<T>, etc.)
There's no doubt that one can convert C into unsafe Rust - C2Rust can automatically convert an entire C codebase into unsafe Rust.
The problem is that after such a step (which is certainly valuable), converting the code to safe Rust is typically a lot of work, which is the point of the academic research in question. Half baked code, using safety workarounds, doesn't provide any value to a project.
adgjlsfhk1
13 hours ago
unsafe rust still has to follow invariants, you're just promising the compiler that it does
zozbot234
12 hours ago
Yes, clearly it's a matter of using different facilities that may only be accessible to Unsafe Rust, and changing the interface accordingly. But to state that Rust as a whole has such restrictions is not correct.
sevensor
12 hours ago
Surely if you do this, you just end up expressing your C design in different syntax?
Doing the right thing means writing different functions with different signatures. Incrementalism here is very hard, and the smallest feasible bottom up replacement for existing functionality may be uncomfortably large. Top down is easier but it tends to lock in the incumbent design.
zozbot234
12 hours ago
> Surely if you do this, you just end up expressing your C design in different syntax?
Using different syntax is not pointless: the syntax allows you to express limited invariants that are expected to be comprehensively upheld by the surrounding C code. These invariants will initially be extremely broad (e.g. "this function must always get a $VALID pointer as input", for whatever values of $VALID), since they cannot be automatically checked; but they can gradually become stricter as more and more of the codebase is rewritten to be memory safe. Does this sometimes involve "cascading changes"? Yes, but much smaller than a from-scratch 100% rewrite into Safe Rust.
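A sketch of how such an invariant might tighten over time, with a made-up `Header` struct and function names chosen only for the example:

```rust
#[repr(C)]
pub struct Header {
    pub version: u32,
    pub length: u32,
}

/// Stage 1: the invariant lives only in the docs.
///
/// # Safety
/// `hdr` must be non-null, aligned, and point to an initialized `Header`.
pub unsafe fn payload_len_raw(hdr: *const Header) -> u32 {
    unsafe { (*hdr).length }
}

/// Stage 2: once the callers are Rust, the same invariant is carried by the
/// reference type and checked by the compiler.
pub fn payload_len(hdr: &Header) -> u32 {
    hdr.length
}

/// Shim kept for callers that still hold raw pointers: null is now handled,
/// everything else is still the caller's promise.
///
/// # Safety
/// If non-null, `hdr` must be aligned and point to an initialized `Header`.
pub unsafe fn payload_len_compat(hdr: *const Header) -> Option<u32> {
    unsafe { hdr.as_ref() }.map(payload_len)
}
```

The signature change from `*const Header` to `&Header` is the "cascading" part, but it only touches the callers of this one function.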
the__alchemist
12 hours ago
My 2C: What we need isn't a translator, but painless FFI. The available FFI tools like cc and bindgen produce working results most of the time, but they need [manual] wrapping.
It's kind of a similar situation (although a bit more complicated) exposing Rust libs in Python; PyO3/maturin do the job, but you have to manually wrap.
So... I would like tools that call C code from rust, but with slices etc instead of pointers.
zozbot234
12 hours ago
> I would like tools that call C code from rust, but with slices etc instead of pointers.
A slice is just a bundle of pointer + size. C raw interfaces vary on how they express the "size" part, so the point of wrapping is translating that information into whatever bespoke way is expected by the code you're working with.
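Roughly what that wrapping amounts to for the "pointer plus explicit count" convention. `scale_raw` is a stand-in written in Rust so the sketch runs without a C toolchain; in real code it would be a bindgen-generated declaration:

```rust
/// Stand-in for a C routine with the assumed signature
///   void scale(const int32_t *src, int32_t *dst, uint32_t n);
unsafe extern "C" fn scale_raw(src: *const i32, dst: *mut i32, n: u32) {
    for i in 0..n as usize {
        unsafe { *dst.add(i) = *src.add(i) * 2 };
    }
}

/// Safe wrapper: the "size" half of each slice is translated into the
/// count argument this particular C API expects.
pub fn scale(src: &[i32], dst: &mut [i32]) {
    assert_eq!(src.len(), dst.len(), "buffers must have the same length");
    assert!(src.len() <= u32::MAX as usize, "length must fit in u32");
    let n = src.len() as u32;
    // SAFETY: both pointers come from live slices of exactly `n` elements,
    // and `&mut` guarantees `dst` does not overlap `src`.
    unsafe { scale_raw(src.as_ptr(), dst.as_mut_ptr(), n) }
}
```

A tool can't generate this automatically because the header alone doesn't say that `n` applies to both buffers; that knowledge comes from the docs.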
the__alchemist
11 hours ago
Good insight! I guess I don't really understand why we can't use native types then. I don't want to keep having to write these:
  pub fn fir_q31(
      s: &mut sys::arm_fir_instance_q31,
      input: &[i32],
      output: &mut [i32],
      block_size: usize,
  ) {
      // void arm_fir_q31 (
      //     const arm_fir_instance_q31 * S,
      //     const float32_t * pSrc,
      //     float32_t * pDst,
      //     uint32_t blockSize
      // )
      // Parameters
      //   [in]  S          points to an instance of the floating-point FIR filter structure
      //   [in]  pSrc       points to the block of input data
      //   [out] pDst       points to the block of output data
      //   [in]  blockSize  number of samples to process
      // Returns none
      compiler_fence(Ordering::SeqCst);
      unsafe {
          sys::arm_fir_q31(s, input.as_ptr(), output.as_mut_ptr(), block_size as u32);
      }
  }
IshKebab
10 hours ago
It's not pointless. For a start it frees you from the C toolchain so things like cross-compilation and WASM become much easier.
Secondly, it's a sensible first step in the tedious manual work of idiomatic porting. I'm guessing you didn't read the article but it's about automating some of this step too.
krater23
2 hours ago
The big bloaty part is the Rust toolchain, not the C toolchain. But that aside, you are now free from a C toolchain and have unmaintainable automatically generated unsafe Rust code. Don't see a win there.