Porting C to Rust for a Fast and Safe AV1 Media Decoder

66 points posted 4 days ago
by steveklabnik

21 Comments

xt00

4 days ago

Great work by the authors, and great to see this kind of effort documented in detail. Video codecs are complicated, so it's good to see that somebody took this on. It would be cool if this eventually led to a presentation along the lines of "here are the bad patterns to avoid if you want to be able to port your C code to Rust someday," covering both the things you mentioned in the post and things outside what you actually had to deal with on this project. I'm also glad you mentioned how long this took; it's nice to get a sense of how hard these ports are.

Vecr

4 days ago

Did they check the assembly code and the exported C API for aliasing-related problems and other C/Rust differences?

For example, in a similar translation there was UB here: https://gitlab.com/sequoia-pgp/sha1collisiondetection/-/comm...

And here: https://gitlab.com/sequoia-pgp/sha1collisiondetection/-/comm...

I thought there was a fancier/weirder programming error in that crate but I can't find it right now.

kkysen

3 days ago

I don't think we make either of those errors. Obviously if the C caller does something weird, all bets are off, but we don't mutate through `&T`s or leave memory uninitialized. `MaybeUninit` is only used in a few isolated and carefully checked places. Most of the other buffers are zero-initialized, which the kernel usually does for free.
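A minimal Rust sketch of the two patterns described above. It assumes nothing about the decoder's actual code, and the function names `zeroed_plane` and `ramp` are hypothetical. `vec![0u8; len]` lets the allocator request already-zeroed memory (for large allocations this typically ends up as fresh pages the OS has zeroed), while the `MaybeUninit` case stays contained: every slot is written before the vector's length is set.

```rust
/// Zero-initialized buffer. `vec![0u8; len]` asks the allocator for zeroed
/// memory, which for large sizes is typically backed by pages the OS
/// already zeroed, so the initialization is often close to free.
fn zeroed_plane(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

/// A contained `MaybeUninit` use (hypothetical example): the spare capacity
/// is exposed as `&mut [MaybeUninit<u16>]`, each slot is written exactly once,
/// and only then is the length set, so no uninitialized value is ever read.
fn ramp(len: usize) -> Vec<u16> {
    let mut v: Vec<u16> = Vec::with_capacity(len);
    for (i, slot) in v.spare_capacity_mut()[..len].iter_mut().enumerate() {
        slot.write(i as u16);
    }
    // SAFETY: the loop above initialized exactly `len` elements.
    unsafe { v.set_len(len) };
    v
}
```

The design choice is to keep the `unsafe` surface to one `set_len` call whose justification is visible a few lines above it.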

Vecr

3 days ago

I'm thinking more about the differences between how C and Rust use noalias, but I can't find an example of it going wrong in real translated code currently.
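To illustrate the noalias difference with a hypothetical function (`add_into` is not taken from the decoder): in C, `void add_into(int32_t *dst, const int32_t *src, size_t n)` may legally be called with `dst == src`. A Rust port taking `&mut [i32]` and `&[i32]` promises the two never overlap, and rustc can pass that promise on to LLVM as `noalias`. Safe Rust cannot even construct the overlapping call, but unsafe FFI glue that turned overlapping C pointers into those two reference types would be UB. One option, sketched here under those assumptions, is a separate in-place entry point for the aliasing case.

```rust
/// Port of the non-overlapping case. `&mut [i32]` and `&[i32]` are
/// guaranteed not to alias, which is what lets the compiler optimize freely.
fn add_into(dst: &mut [i32], src: &[i32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = d.wrapping_add(*s);
    }
}

/// The C function's `dst == src` case, expressed as its own safe function
/// instead of forcing aliasing pointers into `&mut` and `&` at the FFI boundary.
fn double_in_place(buf: &mut [i32]) {
    for x in buf.iter_mut() {
        *x = x.wrapping_add(*x);
    }
}
```

Wrapping arithmetic is used here only to mirror typical C integer behavior without panicking on overflow in debug builds.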

snvzz

3 days ago

An easier and more immediately obvious approach would be to sandbox the codecs, so that they cannot do anything other than what they need to do.

Well-engineered systems like Genode focus on this approach, leveraging capabilities.

kkysen

3 days ago

This is already done in most places, like browsers. But they want to remove the sandbox for performance reasons.

snvzz

2 days ago

This is nonsense. The cost of IPC in and out of the codec process is negligible relative to the cost of encoding/decoding video.

Koshkin

4 days ago

So… the implementation in C is… slow and unsafe?

> these decoders are a dangerous source of bugs

Hm… I thought code in any language could be such “source.”

timschmidt

4 days ago

Codecs, like browsers, handle malformed and potentially hostile data non-stop. And like browsers, they tend to be widely deployed. Further, like cryptographic software, they do a lot of parsing and complex math on their inputs. Writing safe parsers is notoriously difficult, and the task is ideally suited to memory-safe languages and parser generators. All this adds up to a high likelihood of idiosyncratic bugs, which are attractive to exploit.
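As a small illustration of why memory-safe parsing helps, here is a sketch of a bounds-checked reader for leb128, the variable-length integer AV1 uses for OBU sizes. It follows the leb128 process in the AV1 specification (at most 8 bytes, 7 payload bits per byte, value must fit in 32 bits) and returns `None` on truncated or out-of-range input instead of reading past the end of the buffer. The function name `read_leb128` is hypothetical; this is not the decoder's real parser.

```rust
/// Reads one leb128 value from the start of `data`.
/// Returns the decoded value and the number of bytes consumed, or `None`
/// if the input is truncated or the value is not a valid AV1 leb128.
fn read_leb128(data: &[u8]) -> Option<(u32, usize)> {
    let mut value: u64 = 0;
    for i in 0..8 {
        // `get` returns None past the end of the slice, so a truncated
        // stream becomes an error rather than an out-of-bounds read.
        let byte = *data.get(i)?;
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            // The spec requires the decoded value to fit in 32 bits.
            return if value <= u64::from(u32::MAX) {
                Some((value as u32, i + 1))
            } else {
                None
            };
        }
    }
    // Continuation bit still set on the 8th byte: treated as invalid.
    None
}
```

In C, the equivalent loop is correct only if every caller remembers to pass and check the remaining length; here a missing check is simply not expressible without `unsafe`.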

This is why Apple disables most codecs in iMessage in Lockdown Mode.

Already__Taken

3 days ago

The video decoder ecosystem is obscure, opaque, diverse, highly privileged, largely untested, and highly exposed—a dangerous combination.

from the hyperlink you copied...

Koshkin

3 days ago

Point being, using a different programming language is going to change that?

timschmidt

3 days ago

Microsoft: 70 percent of all security bugs are memory safety issues: https://www.zdnet.com/article/microsoft-70-percent-of-all-se...

Rust's Memory Safety Model: An Evaluation of Its Effectiveness in Preventing Common Vulnerabilities: https://www.sciencedirect.com/science/article/pii/S187705092...

"...can Rust achieve the memory-safety promise? This paper studies the question by surveying 186 real-world bug reports collected from several origins which contain all existing Rust CVEs (common vulnerability and exposures) of memory-safety issues by 2020-12-31. We manually analyze each bug and extract their culprit patterns. Our analysis result shows that Rust can keep its promise that all memory-safety bugs require unsafe code..."

You can't change the environment codecs are expected to run in, or the data they are used to process, but you can use effective tools to limit the sorts of bugs you are likely to run into, and isolate them to the relatively small areas marked `unsafe`. Exploitable C-style memory errors simply aren't possible in memory-safe languages like Python, for instance, but Python leaves performance on the table, and performance matters for codecs. Rust provides a high degree of memory safety while performing as well as C or C++.
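A minimal sketch of what confining `unsafe` to a small, auditable area looks like (the function `sum_prefix` is hypothetical, not from the decoder): the safe wrapper checks the invariant once, and the single `unsafe` read relies only on that check. In practice, iterating over `buf[..n]` usually gets the same bounds-check elimination with no `unsafe` at all; the point here is only that a reviewer has one commented line to audit instead of every caller.

```rust
/// Sums the first `n` bytes of `buf`, or returns `None` if `n` is out of range.
/// Callers cannot trigger an out-of-bounds read, because the invariant is
/// checked here, inside the safe function, before any `unsafe` code runs.
fn sum_prefix(buf: &[u8], n: usize) -> Option<u64> {
    if n > buf.len() {
        return None;
    }
    let mut total: u64 = 0;
    for i in 0..n {
        // SAFETY: i < n and n <= buf.len() (checked above), so `i` is in bounds.
        let byte = unsafe { *buf.get_unchecked(i) };
        total += u64::from(byte);
    }
    Some(total)
}
```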

Koshkin

3 days ago

But Rust will not, by itself, make them any less:

> obscure, opaque, diverse, highly privileged, largely untested, and highly exposed

timschmidt

3 days ago

Rust addresses opaqueness and untestedness directly, by enforcing the language's constraints at compile time. Rust addresses the privileged nature of the code and its exposure directly, by disallowing the most common classes of bugs used to gain that privileged execution. Obscurity and diversity can both benefit from Rust's integration with Cargo and the resulting ease of sharing common code.

In short, expressing this code in an appropriate language can absolutely address and even go some way toward correcting each of those issues.

angiosperm

4 days ago

If you translated it back to C or C++ afterward, it would be just as safe, and easier to deploy.

pornel

4 days ago

Rust projects are pretty easy to deploy. It's just LLVM underneath, and the product is similar to clang-built code. You get a library, static or dynamic, that you can link with anything that can link with C.
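A minimal sketch of exposing Rust code through a C ABI (the `dec_*` names are hypothetical). With `crate-type = ["staticlib", "cdylib"]` under `[lib]` in Cargo.toml, Cargo produces a static library (`.a`/`.lib`) and a shared library that C code links like any other. The `#[unsafe(no_mangle)]` spelling requires Rust 1.82 or newer; older toolchains write `#[no_mangle]`.

```rust
use std::slice;

/// C declaration: int32_t dec_add_saturating(int32_t a, int32_t b);
#[unsafe(no_mangle)]
pub extern "C" fn dec_add_saturating(a: i32, b: i32) -> i32 {
    a.saturating_add(b)
}

/// C declaration: size_t dec_count_nonzero(const uint8_t *data, size_t len);
/// A null `data` pointer is treated as an empty buffer.
///
/// # Safety
/// If non-null, `data` must point to `len` readable bytes.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn dec_count_nonzero(data: *const u8, len: usize) -> usize {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller guarantees `data` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    bytes.iter().filter(|&&b| b != 0).count()
}
```

The raw-pointer boundary stays in one function; everything after `from_raw_parts` is ordinary safe Rust operating on a slice.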

Rust projects are much easier to build, especially when supporting multiple platforms. I've converted projects to Rust to make them easier to build and deploy.

johnisgood

2 days ago

Does Cargo reuse dependencies today? Last time I tried to build medium-sized Rust projects, it pulled hundreds of dependencies each time, even the same ones. They took up too much space and took too long to build.

steveklabnik

2 days ago

Yes, and it always has. If you do a second build, it will not rebuild those dependencies again.

> even the same ones.

Rust supports multiple versions of the same dependency, so that may be what you observed.

maximilianburke

4 days ago

That would only work if the C code were treated as a generated artifact and never edited directly. If the C code is edited directly, it will be just as susceptible to unsafe changes as before.

acdha

4 days ago

Comparably easy to deploy, perhaps, but easier? Is there some scenario you have in mind where that would be worth the overhead of such a complex extra step?