In my previous post, I spent some time talking about the idea of
unsafe abstractions. At the end of the post, I mentioned that Rust
does not really have any kind of official guidelines for what kind of
code is legal in an unsafe block and what is not. What this means in
practice is that people wind up writing what seems reasonable and
checking it against what the compiler does today. This is of course a
risky proposition since it means that if we start doing more
optimization in the compiler, we may well wind up breaking unsafe code
(the code would still compile; it would just not execute like it used
to).
Now, of course, merely having published guidelines doesn’t entirely
change that dynamic. It does allow us to assign blame
to the unsafe
code that took actions it wasn’t supposed to take. But at the end of
the day we’re still causing crashes, so that’s bad.
This is partly why I have advocated that I want us to try and arrive
at guidelines which are human friendly. Even if we have published
guidelines, I don’t expect most people to read them in practice. And
fewer still will read past the introduction. So we had better be sure
that reasonable code
works by default.
Interestingly, there is something of a tension here: the more unsafe
code we allow, the less the compiler can optimize. This is because it
would have to be conservative about possible aliasing and (for
example) avoid reordering statements. We’ll see some examples of this
as we go.
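As a first, very simplified illustration of the aliasing point (this sketch is my own; the function name and details are illustrative), consider two raw pointers that happen to refer to the same location. A compiler that assumed the pointers were disjoint could reorder the write past the read and compute the wrong answer; a conservative compiler must keep the accesses in program order:

```rust
// Hedged sketch: raw pointers in Rust are allowed to alias, so the
// compiler cannot freely reorder these accesses.
unsafe fn write_then_read(dst: *mut i32, src: *const i32) -> i32 {
    *dst = 10; // if dst and src alias, this write must happen first
    *src       // observes the write above when the pointers alias
}

fn main() {
    let mut x = 0;
    let p = &mut x as *mut i32;
    // Deliberately pass the same location as both arguments.
    let v = unsafe { write_then_read(p, p as *const i32) };
    assert_eq!(v, 10);
}
```

The more aliasing like this we permit in unsafe code, the fewer reorderings the compiler can legally perform.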
Still, to some extent, I think it’s possible for us to have our cake
and eat it too. In this blog post, I outline a proposal to leverage
unsafe abstraction boundaries to inform the compiler where it can be
aggressive and where it must be conservative. The heart of the
proposal is the intuition that:
- when you enter the unsafe boundary, you can rely on the Rust type
system invariants holding;
- when you exit the unsafe boundary, you must ensure that the Rust
type system invariants are restored;
- in the interim, you can break a lot of rules (though not all the
rules).
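To make those three rules a bit more concrete, here is a toy example of my own (the function and its details are illustrative, not drawn from any particular API): a safe function whose unsafe interior works with raw pointers, outside the borrow checker's view, but which restores the type system invariants before control returns to safe code.

```rust
fn swap_ints(a: &mut i32, b: &mut i32) {
    // Entry: we rely on the caller's invariants -- `a` and `b` are
    // valid, initialized, and (being two &mut) disjoint.
    unsafe {
        // Interior: raw-pointer reads and writes, with no help from
        // the borrow checker.
        let pa = a as *mut i32;
        let pb = b as *mut i32;
        let tmp = *pa;
        *pa = *pb;
        *pb = tmp;
        // Exit: both locations again hold initialized i32 values, so
        // the invariants the caller relies on are restored.
    }
}

fn main() {
    let (mut x, mut y) = (1, 2);
    swap_ints(&mut x, &mut y);
    assert_eq!((x, y), (2, 1));
}
```

Inside the block the code may bend the rules; at the boundary, safe callers must be able to proceed as if the rules were never bent.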
I call this the Tootsie Pop model: the idea is that an unsafe
abstraction is kind of like a Tootsie Pop. There is a gooey candy
interior, where the rules are squishy and the compiler must be
conservative when optimizing. This is separated from the outside world
by a hard candy exterior, which is the interface, and where the rules
get stricter. Outside of the pop itself lies the safe code, where the
compiler ensures that all rules are met, and where we can optimize
aggressively.
One can also compare the approach to what would happen when writing a
C plugin for a Ruby interpreter. In that case, your plugin can assume
that the inputs are all valid Ruby objects, and it must produce valid
Ruby objects as its output, but internally it can cut corners and use
C pointers and other such things.
In this post, I will elaborate a bit more on the model, and in
particular cover some example problem cases and talk about the grey
areas that still need to be hammered out.