Introduction

So yeah, I am writing a Wayland compositor (actually, more like a library for compositors). Since I am in this rabbit hole already, I thought I might as well document my journey through it.

This series isn't meant to be a tutorial on how to write a Wayland compositor in Rust (although, if this project ends up being successful, then I would need to write one of those too). It's more of a record of my struggle with Rust and/or Wayland. Consequently, these chapters aren't going to be in a logical order; I am just writing things down as I encounter them.

Why?

Well, there are a couple of reasons. Understanding code, even code you wrote yourself, is hard. Having a written record will help future me understand why certain design decisions were made.

And since I don't consider myself an expert on Rust, and definitely not one on Wayland either, I hope other people reading this can point out things I missed and perhaps lead to a better design.

Lastly, it's difficult to remain motivated on a big project like this. Knowing there are other people caring about this project would help me a lot. How do I gather some attention without something concrete to show? Well, I write about it.

OK, but why a Wayland compositor?

Because it's fun, and because I want to see Wayland and Rust thrive.

Personally, I maintain a small compositor for X11, so when I say the X11 design is showing some age, I am speaking from experience. On the other hand, the Wayland design does look a lot better. And it's also where most of the effort seems to be focused - I want in on all the fun too!

And since the compositor is security critical, Rust seems to be a good candidate to write it in. Additionally I want to try out the new, shiny async/await feature of Rust as well.

Prior art

Obviously, I am not the first person to write a Wayland compositor in Rust. People have tried/been trying to do the same thing. Learning from others is a good idea, so I think I need to talk a bit about them.

First of all, yes, I am aware of the way-cooler project, the failed attempt at writing a Wayland compositor in Rust. Many may point to that and say Rust isn't a suitable language choice. But, not to sound arrogant, I think I can do better. If nothing else, at least I can learn from their experiences.

And people have indeed been able to do better! The Smithay project! Not sure if you have heard about it, but it is a suite of Rust libraries for writing Wayland compositors and clients. In other words, its goals overlap with mine.

So why am I starting my own library? Without going into details (as that's what the rest of this series is about), this project has, in general, a different design style to Smithay. It also has a smaller scope, which I hope could lead to a clearer design.

Expectation (of the reader)

The reader (you) is expected to have some moderate understanding of Rust. But I don't expect you to know anything about Wayland. Basically, me when I started this project.

Expectation (of me/of this project)

So, where is this project currently at?

Well, have a look:

Voilà! Stuff on the screen! Other than that, pretty much nothing else works. Windows are upside down, no mouse/keyboard, things aren't scaled correctly, yada yada.

There is still a long way to go. And there is no guarantee this project will succeed. I hope it does; if not, I hope at least what I wrote here is entertaining.

The Rust borrow checker is annoying, Pt. 2

OK, quick Wayland primer. One of the central concepts in Wayland is the object. Each client connected to the Wayland server has a set of objects bound to it. A client can invoke functions on those objects by sending a message to the server with the right object ID.

Let's have a look at how we might implement this in Rust. First of all, a set of objects indexed via their object IDs, seems easy:

#![allow(unused)]
fn main() {
let objects: HashMap<u32, Object> = HashMap::new();
}

Now let's handle the messages:

#![allow(unused)]
fn main() {
let object_id = get_message_object_id();
let object = objects.get(&object_id).unwrap();
// To decode a message, we need to know the type of the object.
let message = decode_message(object.interface);
object.handle_message(message);
}
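
To make get_message_object_id a bit less abstract, here is a rough sketch of pulling the object ID out of a message header. It relies on the Wayland wire format, where every message starts with two 32-bit words in host byte order: the object ID, then a word holding the message size in its upper 16 bits and the opcode in its lower 16 bits. The names MessageHeader and parse_header are made up for this post and are not part of any real API.

```rust
/// Hypothetical header type; the field names are mine, not from a library.
#[derive(Debug, PartialEq)]
struct MessageHeader {
    object_id: u32,
    opcode: u16,
    size: u16,
}

/// Parses the first 8 bytes of a message, or returns `None` if there are
/// fewer than 8 bytes available.
fn parse_header(bytes: &[u8]) -> Option<MessageHeader> {
    if bytes.len() < 8 {
        return None;
    }
    // Both words are encoded in the host's byte order.
    let word0 = u32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let word1 = u32::from_ne_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Some(MessageHeader {
        object_id: word0,
        // Lower 16 bits: which request of the interface is being invoked.
        opcode: (word1 & 0xffff) as u16,
        // Upper 16 bits: total message size in bytes, header included.
        size: (word1 >> 16) as u16,
    })
}
```

Note that the header alone doesn't say how to decode the arguments that follow it. That depends on the interface of the object, which is why the object has to be looked up first.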

Looks straightforward. But a problem arises when our handle_message implementation needs access to the set of objects. For example, wl_compositor.create_surface creates new surfaces, so it needs to be able to insert into objects.

But we can't change handle_message(message) to handle_message(&mut objects, message), because we are already borrowing objects immutably via object.

And it's not as simple as putting objects in a RefCell either:

#![allow(unused)]
fn main() {
let objects_borrow = objects.borrow();
let object = objects_borrow.get(&object_id).unwrap();
object.handle_message(objects.borrow_mut(), message); // Fails, because a borrow already exists
}
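
The RefCell failure happens at runtime rather than at compile time, so it can be observed directly. Below is a small sketch, assuming a made-up Object type with a single interface field. It uses try_borrow_mut, which returns an Err where borrow_mut would panic.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a Wayland object.
struct Object {
    interface: &'static str,
}

/// Builds a map holding one object with ID 1.
fn setup() -> RefCell<HashMap<u32, Object>> {
    let mut map = HashMap::new();
    map.insert(1, Object { interface: "wl_compositor" });
    RefCell::new(map)
}

/// Returns `true` if a mutable borrow of `objects` can be taken while the
/// shared borrow that keeps `object` alive is still held.
fn can_mutate_while_object_is_borrowed(objects: &RefCell<HashMap<u32, Object>>) -> bool {
    let objects_borrow = objects.borrow();
    let object = objects_borrow.get(&1).unwrap();
    // `try_borrow_mut` reports the conflict as an `Err` instead of panicking.
    let succeeded = objects.try_borrow_mut().is_ok();
    // `object` is still in use here, so the shared borrow cannot end earlier.
    let _len = object.interface.len();
    succeeded
}
```

Calling can_mutate_while_object_is_borrowed(&setup()) returns false: the shared borrow is alive for as long as object is, which is exactly the window in which handle_message would run.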

Similarly, it's also a problem if handle_message wants to modify an object inside objects, simply because we can't pass a &mut objects to handle_message, so there is no way of getting mutable references to objects. This forces the object to implement interior mutability. Which isn't the end of the world, but also not very nice.

Hopefully you can see this is a genuine problem, not simply because I am terrible at writing Rust. Now let's look at some solutions I've come up with.

Solutions

Solution 1

Make Object Rc<Object>, i.e.

#![allow(unused)]
fn main() {
let objects: HashMap<u32, Rc<Object>> = HashMap::new();
}

This solves the first problem - because we can clone the Rc<Object> and thus don't need to keep borrowing objects - but doesn't solve the second.

This also makes the lifetime of Objects unpredictable. Previously, objects were freed when objects was dropped; now because they are behind Rc, they can hang around indefinitely.
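
Here is a minimal sketch of solution 1, again with a made-up Object type and a handler that only knows how to create a surface. The dispatch function and its new_id parameter are assumptions for illustration. It returns the cloned handle so the lifetime issue is visible from the outside.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for a Wayland object.
struct Object {
    interface: &'static str,
}

impl Object {
    /// Hypothetical handler: inserts a new object into the map, the way
    /// `wl_compositor.create_surface` would.
    fn handle_message(&self, objects: &mut HashMap<u32, Rc<Object>>, new_id: u32) {
        if self.interface == "wl_compositor" {
            objects.insert(new_id, Rc::new(Object { interface: "wl_surface" }));
        }
    }
}

/// Dispatches one message and hands back the `Rc` that was cloned for it.
fn dispatch(objects: &mut HashMap<u32, Rc<Object>>, object_id: u32, new_id: u32) -> Rc<Object> {
    // Cloning the `Rc` means the borrow of `objects` ends with this statement.
    let object = Rc::clone(objects.get(&object_id).unwrap());
    // `objects` is free again, so it can be passed mutably.
    object.handle_message(objects, new_id);
    object
}
```

If the caller keeps the returned Rc and then drops the map, the object is still alive, with a strong count of 1. That is the "hang around indefinitely" problem in miniature. Note that handle_message still only gets &self, so mutating the object itself would need interior mutability.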

Solution 2

Remove the object from objects, put it back after handling the message.

This solves both problems, but carries some mental baggage. handle_message might call into other parts of the compositor toolbox, which might make the assumption that the object is still in objects. This seems error-prone.

This also has the same problem as solution 3: it does extra HashMap lookups for the re-insertion.
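
A minimal sketch of solution 2, assuming a made-up Object type with a counter so there is something to mutate. The handler returns whether it could still find its own object in the map, which shows the "mental baggage" part.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Wayland object.
struct Object {
    interface: &'static str,
    messages_handled: u32,
}

impl Object {
    /// Hypothetical handler with mutable access to both `self` and the map.
    /// Returns whether this object could still be found in `objects`.
    fn handle_message(
        &mut self,
        objects: &mut HashMap<u32, Object>,
        own_id: u32,
        new_id: u32,
    ) -> bool {
        self.messages_handled += 1;
        if self.interface == "wl_compositor" {
            objects.insert(new_id, Object { interface: "wl_surface", messages_handled: 0 });
        }
        objects.contains_key(&own_id)
    }
}

fn dispatch(objects: &mut HashMap<u32, Object>, object_id: u32, new_id: u32) -> bool {
    // First lookup: take the object out, so nothing borrows the map anymore.
    let mut object = objects.remove(&object_id).unwrap();
    let was_visible = object.handle_message(objects, object_id, new_id);
    // Second lookup: put the object back where it was.
    objects.insert(object_id, object);
    was_visible
}
```

Here dispatch returns false: while the handler runs, the object being handled is missing from the map, and any code that looks it up by ID during that window will not find it.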

Solution 3

Make handle_message not take self, i.e.

#![allow(unused)]
fn main() {
Object::handle_message(&mut objects, message);
}

This solves both problems, but if handle_message needs access to the object itself, it will have to get it again from objects, essentially doing extra HashMap lookups.
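
A minimal sketch of solution 3. The snippet above leaves out how the handler knows which object the message is for, so I am assuming the object ID is passed along as an extra parameter. Object and its fields are made up as before.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Wayland object.
struct Object {
    interface: &'static str,
    messages_handled: u32,
}

impl Object {
    /// No `self`: the handler gets the whole map plus the target's ID.
    fn handle_message(objects: &mut HashMap<u32, Object>, object_id: u32, new_id: u32) {
        // The extra lookup: fetch the object again to reach its state.
        let this = objects.get_mut(&object_id).unwrap();
        this.messages_handled += 1;
        let is_compositor = this.interface == "wl_compositor";
        // `this` is not used past this point, so the map can be mutated.
        if is_compositor {
            objects.insert(new_id, Object { interface: "wl_surface", messages_handled: 0 });
        }
    }
}
```

Both the object and the map can be mutated, but never at the same time: the reference from get_mut has to be dead before the insert, and getting to the object again afterwards costs another lookup.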

Solution 4

Cache modifications to objects while it is borrowed, i.e.

#![allow(unused)]
fn main() {
let mut objects_modifications = ObjectsModification::new();
object.handle_message(&mut objects_modifications, message);
objects.apply(objects_modifications);
}

This solves the first problem, but can be a little counter-intuitive, as modifications made to the set of objects won't be immediately visible.

And this doesn't solve the second problem.
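
A minimal sketch of what ObjectsModification could look like. The fields and the apply_to method are my assumptions (a plain HashMap has no apply method, so the replay lives on the change log here), and replaying removals before insertions is a simplification that ignores the order in which changes were recorded.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Wayland object.
struct Object {
    interface: &'static str,
}

/// Hypothetical change log: insertions and removals recorded during dispatch.
#[derive(Default)]
struct ObjectsModification {
    inserted: Vec<(u32, Object)>,
    removed: Vec<u32>,
}

impl ObjectsModification {
    /// Replays the recorded changes onto the real map.
    fn apply_to(self, objects: &mut HashMap<u32, Object>) {
        for id in self.removed {
            objects.remove(&id);
        }
        for (id, object) in self.inserted {
            objects.insert(id, object);
        }
    }
}

impl Object {
    /// Hypothetical handler: records the new object instead of inserting it.
    fn handle_message(&self, changes: &mut ObjectsModification, new_id: u32) {
        if self.interface == "wl_compositor" {
            changes.inserted.push((new_id, Object { interface: "wl_surface" }));
        }
    }
}

/// Returns how many objects were visible after the handler ran but before
/// the recorded changes were applied.
fn dispatch(objects: &mut HashMap<u32, Object>, object_id: u32, new_id: u32) -> usize {
    let mut changes = ObjectsModification::default();
    let object = objects.get(&object_id).unwrap();
    object.handle_message(&mut changes, new_id);
    let visible_before_apply = objects.len();
    changes.apply_to(objects);
    visible_before_apply
}
```

Starting from a map with one wl_compositor object, dispatch returns 1 even though the handler "created" a surface. The new object only shows up in the map after apply_to has run.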

Conclusion

Which one is the best solution? I don't know. I am inclined to go with solution 3 as it's the least bad, but that's not the main point here - I wanted to present a case where Rust's lifetime mechanism meets a real world use case, and the problems that come with it. Although not an unsolvable problem, all of the solutions presented have some awkwardness to them, and that is the kind of trade-off one might face when working with Rust.

I am not an expert on Rust, so please let me know if you think there is a better solution that I missed.

You probably should avoid putting lifetime parameters on traits

Having a lifetime parameter attached to a trait makes it much easier to accidentally create an unusable trait - I only realized after a lot of hardship fighting with the borrow checker.

Here is an example I encountered recently.

I tried to make my code generic over some sort of "lockable" types. Like a Mutex<T>, which you can call .lock() on, and after you locked it there is a set of methods you can call on T. I didn't want to lock the user into a predetermined Mutex type, so I came up with this trait:

#![allow(unused)]
fn main() {
trait Lockable {
    type Guard<'a>: Locked<'a> where Self: 'a;
    fn lock(&self) -> Self::Guard<'_>;
}
}

(Quick side note, if you don't know, this where Self: 'a bound is forced by the compiler, whether it's actually needed or not. You can see this issue for more details.)

And for the Locked<'a>, it implements a set of predefined methods. Naively, I wrote down this trait for it:

#![allow(unused)]
fn main() {
trait Locked<'a> {
    type Iter: Iterator<Item = &'a u32> + 'a;

    /// A scapegoat example to illustrate the problem I am going to have.
    /// This is not that weird a method to have - imagine a `Mutex<HashMap>`,
    /// after locking it you might want to iterate over its keys.
    fn iter(&'a self) -> Self::Iter;
}
}

OK, this seems innocent enough, what would happen if we try to use it?

#![allow(unused)]
fn main() {
fn test<T: Lockable>(t: &T) {
    let x = t.lock();
    x.iter();
}
}

Should work, right?

No. Instead, we get this very confusing error:

error[E0597]: `x` does not live long enough
  --> src/lib.rs:19:5
   |
19 |     x.iter();
   |     ^^^^^^^^ borrowed value does not live long enough
20 | }
   | -
   | |
   | `x` dropped here while still borrowed
   | borrow might be used here, when `x` is dropped and runs the destructor for type `<T as Lockable>::Guard<'_>`

What is going on here? If we take the error message at face value, it doesn't make a lot of sense: x is dropped while still borrowed, OK. Why was it still borrowed? Because it was used later. For what? For its destructor, because it was dropped later. What?

It's a head-scratcher, isn't it? Indeed, this error took me quite some time to decipher, but I think I can explain what it actually means.

First, let me put the elided lifetime parameter back into fn lock:

#![allow(unused)]
fn main() {
fn lock<'a>(&'a self) -> Self::Guard<'a>;
}

When we call t.lock(), the compiler must figure out a lifetime to assign to 'a. This lifetime is used in the return type Self::Guard<'a>, and all the compiler knows about Self::Guard<'a> is that it implements Locked<'a>. Locked<'a> is a trait, which means it is invariant w.r.t. lifetime 'a. (If you don't know what variance is, see here, and here.)

And here lies the problem, in the context of our function test:

#![allow(unused)]
fn main() {
fn test<T: Lockable>(t: &T) {
    let x = t.lock();
    // ...
}
}

Let's call the type of x T::Guard<'lock>. 'lock is the lifetime of the implicit borrow of t that happened when we called t.lock(). Since x borrows from this lifetime (because of the signature of fn lock), 'lock must last longer than x.

Based on the trait bound, T::Guard<'lock> implements Locked<'lock>. And because of the trait's invariance w.r.t. its lifetime, Guard implementing Locked<'lock> doesn't mean it implements Locked<'shorter> for any 'shorter lifetime. So when we call x.iter(), whose signature is iter(&'a self), 'a can only be 'lock, and it can only return Locked::Iter: 'lock. Which means x is actually borrowed for 'lock! It's borrowed for a lifetime that is actually longer than its own lifetime! (I think it's fair to say rustc's diagnostic here, while not being wrong, could use some polish.)

And once we have figured this out, the solution is simple. One way is to do away with the lifetime parameter on Locked. For example, we can make it:

#![allow(unused)]
fn main() {
type Guard<'a>: Locked + 'a;
}

The whole example:

#![allow(unused)]
fn main() {
trait Lockable {
    // Unlike `Trait<'a>`, `Trait + 'a` is covariant w.r.t. `'a`.
    type Guard<'a>: Locked + 'a where Self: 'a;
    fn lock(&self) -> Self::Guard<'_>;
}
trait Locked {
    type Iter<'a>: Iterator<Item = &'a u32> + 'a where Self: 'a;
    fn iter(&self) -> Self::Iter<'_>;
}
fn test<T: Lockable>(t: &T) {
    let x = t.lock();
    x.iter();
}
}
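
To check that the fixed traits can actually be implemented, here is a sketch of a concrete implementation on top of std's Mutex. The Numbers and NumbersGuard wrappers and the sum function are made up for this example. I used a wrapper type so that calling lock on it goes through the trait rather than Mutex's own inherent lock method.

```rust
use std::sync::{Mutex, MutexGuard};

trait Lockable {
    type Guard<'a>: Locked + 'a where Self: 'a;
    fn lock(&self) -> Self::Guard<'_>;
}
trait Locked {
    type Iter<'a>: Iterator<Item = &'a u32> + 'a where Self: 'a;
    fn iter(&self) -> Self::Iter<'_>;
}

/// Hypothetical lockable container of numbers.
struct Numbers(Mutex<Vec<u32>>);
/// Hypothetical guard type wrapping std's `MutexGuard`.
struct NumbersGuard<'a>(MutexGuard<'a, Vec<u32>>);

impl Lockable for Numbers {
    type Guard<'a> = NumbersGuard<'a> where Self: 'a;
    fn lock(&self) -> Self::Guard<'_> {
        // Panics if the mutex is poisoned; good enough for a sketch.
        NumbersGuard(self.0.lock().unwrap())
    }
}

impl<'g> Locked for NumbersGuard<'g> {
    // The items borrow from the guard, not from the `Numbers` behind it.
    type Iter<'a> = std::slice::Iter<'a, u32> where Self: 'a;
    fn iter(&self) -> Self::Iter<'_> {
        self.0.iter()
    }
}

/// Generic code in the same shape as `test`, but returning something.
fn sum<T: Lockable>(t: &T) -> u32 {
    let x = t.lock();
    let total = x.iter().sum::<u32>();
    total
}
```

With Numbers(Mutex::new(vec![1, 2, 3])), sum returns 6. The result is bound to a local before returning so that the iterator is gone before the guard x is dropped at the end of the function.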

And looking back, it's now clear our Locked<'a>::iter function was wrong: the Locked iterator yields items that live as long as the Lockable type, yet if you think about the semantics of a lock, Locked really should only yield items that live as long as the Guard.

(If you really want to keep the lifetime on Locked<'a>, there is a way:

#![allow(unused)]
fn main() {
trait Lockable {
    type Guard<'a>: Locked<'a> where Self: 'a;
    fn lock(&self) -> Self::Guard<'_>;
}
trait Locked<'a> {
    type Iter<'b>: Iterator<Item = &'b u32> + 'b where 'a: 'b, Self: 'b;
    fn iter<'b>(&'b self) -> Self::Iter<'b> where 'a: 'b;
}
}

I am not trying to show that a lifetime parameter is absolutely unworkable, I just want to say it is often an unexpected trap for newcomers. Plus, there are other ways this invariance can be annoying.)

I ended up not needing such a trait at all, but I think this is a really good example of why having a lifetime parameter on a trait might not be a very good idea. In fact, most of the traits in std don't have an explicit lifetime parameter. There are cases where the use of a lifetime parameter can be justified, serde::de::Deserialize is such an example. But in general, you probably should think twice before using it.

Updates:

2023-Mar-05: Fixed an inaccuracy in the explanation of the error. Thanks to this awesome explanation by Daniel H-M.

January 2023 updates

I have implemented surface scaling and fixed coordinate system conversion between wayland surface coordinates and screen coordinates. So now everything is finally the right size and the right side up:

demo

Behind the scenes, I fixed the implementation of sync'd wayland subsurfaces. The previous implementation was actually completely wrong.

Next, I will try to implement mouse and keyboard input support. Once that is done, I plan to make the repository public.

The Adventure of Creating a Wayland Compositor Toolbox in Rust