Typestates in Rust (2018)

cdirkx · on Oct 31, 2019

When const generics are fully implemented (they are partially usable on nightly rust now) this will enable an even stronger version of this pattern, allowing you the full power of enums to represent the state.

  enum SenderState {
    ReadyToSendHello,
    HasSentHello,
    HasSentNumber,
    HasReceivedNumber
  }

  Sender<const S: SenderState> {
    ...
  }

  impl Sender<{SenderState::ReadyToSendHello}>{
    ...
  }

One can then give the individual states extra parameters:

  HasSentNumber {
    number: u32
  }

(Note that in this case that doesn't make much sense, the number that is sent is more associated data then an actual type parameter. There is no real difference, at the type level, between HasSentNumber { number: 3 } and HasSentNumber { number: 6 }, and the compiler generating two types for this would be unnecessary. It is only an example of the syntax.)

oleganza · on Nov 1, 2019

You can do session types already in stable Rust without const generics. Each "enum variant" can be its own separate type, and such types can only be instantiated by a method on the type representing the previous state, which also consumes that previous state.

We've used them in our Bulletproofs MPC (multi-party computation) where cryptographic requirement is not to replay the protocol from the mid-point. Since the user is supposed to perform these transitions within their application on top of network messages, such strongly-typed API guarantees that, if their program has compiled successfully, then:

1) Steps are performed in correct order. 2) The protocol cannot be replayed from the intermediate state.

We have wrote about it here: https://medium.com/interstellar/bulletproofs-pre-release-fcb... - scroll to "strongly-typed multiparty computation".

oleganza · on Nov 3, 2019

Here is updated link to the original article on MPC by Cathie Yun: https://medium.com/@cathieyun/bulletproof-multi-party-comput...

cdirkx · on Oct 31, 2019

Re: associated data for a state, that's possible too!

  trait StateData {
    type Data;
  }

  struct Sender<const S: SenderState> {
    ... // sender implementation
    state_data: <Self as StateData>::Data
  }

  impl<const S: SenderState> StateData for Sender<{S}> {
    // default: no associated data
    default type Data = ();  
  }

  struct HasSentNumberData {
    sent: u32
  }

  // using a specialized impl (also an unstable feature).
  impl StateData for Sender<{SenderState::HasSentNumber}> {
    // instead of a complete struct we could also have done Data = u32, 
    // but a separate struct would enable a nicer constructor, default values, helper methods and so on.
    // (ommitted in this example though)
    type Data = HasSentNumberData;
  }

  struct HasReceivedNumberData {
    sent: u32,
    received: u32
  }

  impl StateData for Sender<{SenderState::HasReceivedNumber}> {
    type Data = HasReceivedNumberData;
  }

Example usage:

  impl Sender<{SenderState::HasReceivedNumber}> {
    ...

    fn validate_received_number(&self) -> bool {
      self.state_data.received == self.state_data.sent   
    }
  }  


  let sender = Sender::<{SenderState::HasReceivedNumber}> { state_data: HasReceivedNumberData { sent: 3, received: 28 } };
  assert_eq!(sender.validate_received_number() == false) // client sent wrong number!

Playground link to see this in action: https://play.rust-lang.org/?version=nightly&mode=release&edi...

Munksgaard · on Nov 1, 2019

As others have mentioned, and as the post itself also was updated to say, you don't need const generics to support this pattern: https://github.com/Munksgaard/session-types, http://munksgaard.me/papers/laumann-munksgaard-larsen.pdf

cdirkx · on Nov 1, 2019

Yes you don't need const generics, after all that is what the entire post is about.

I was merely offering a perspective on how the const generics might offer an alternative way to express the same ideas, and possibly change the idiomatic way to write them. Type level values, especially enums, IMHO offer an intuitive way to encode what is going on.

hardwaregeek · on Oct 31, 2019

A good sign in a type system for me is that the types are accurately modeling behavior that I find in day to day code. To the point where moving back to a language that lacks this feature feels weirdly unsafe or not expressive enough. Going back to C APIs, with their states that need to be read via bit masks or through special integer return values, is just painful after using discriminated unions. Likewise being able to have multiple owners to a value is a little weird after using Rust.

Also what's the deal with the fi ligature in the code? It throws off the kerning and holds no purpose.

reificator · on Oct 31, 2019

> Also what's the deal with the fi ligature in the code? It throws off the kerning and holds no purpose.

I'm not seeing a ligature there, or at least not an obvious one. What browser/platform are you using?

hardwaregeek · on Oct 31, 2019

Interesting. The CSS is using `-moz-font-feature-settings: "liga" on` for some reason. I'm using Firefox on macOS. I assume you're not?

reificator · on Oct 31, 2019

Firefox on Windows. Do you have a screenshot? Maybe I'm just blind.

hardwaregeek · on Oct 31, 2019

Here you go: https://gfycat.com/angrycanineboa

Click the gear if it's too small for you.

reificator · on Oct 31, 2019

Font-Family for me is "SFMono-Regular","Liberation Mono","Roboto Mono",Menlo,Monaco,Consolas,"Courier New",Courier,monospace

Looks like I'm falling all the way back to consolas. I'm pretty sure SFMono comes with macOS, which explains the platform differences between us.

That might explain why the fi ligature has been a common complaint lately, as more people start using San Francisco in their sites.

saagarjha · on Oct 31, 2019

> I'm pretty sure SFMono comes with macOS, which explains the platform differences between us.

Not in a way that would be relevant here. I’d expect this font stack to fall back to Menlo on macOS.

reificator · on Nov 1, 2019

And does Menlo have this ligature? I don't have a mac to test with.

saagarjha · on Nov 1, 2019

It does, but it only works in Safari if I add font-feature-settings: "liga" on.

stouset · on Oct 31, 2019

Just checked on Safari in Catalina, I don't see an `fl` ligature.

frio · on Oct 31, 2019

We do something like this to encode our Redux states at compile-time (using TypeScript, obviously). Previously, using regular JS, the Redux devtools made tracking down incorrectly implemented reducers/state transition reasonably straightforward -- but you still had to trigger a bug before you knew you had to track it down, and implement tests etc.

This kind of design pattern measurably saves us time; it reduces the volume of unit tests we need to write/update when we make changes (we still test, just... with less fear), and it prevents newer developers from making mistakes. I haven't played with Rust (beyond a couple of toy projects) yet, and articles like this remind me I'm looking forward to sinking my teeth in over the Christmas break.

amykyta · on Nov 1, 2019

Cool! Are you referring to the phantom types technique? I am interested in how you implement this in typescript. Do you have a blog post about it you can refer me to? Or even just a gist?

frio · on Nov 2, 2019

I'll have to read up on that. We just leverage enums and Typescript's slightly more strict type checking to state that the store can be in only certain states, and use the excellent typesafe-actions library. I'll write up a blog post. There are probably flaws in our methodology but when we took some new grads on recently, it helped a lot :).

dmkolobov · on Nov 1, 2019

This is one of my favorite patterns from Haskell, which I've known as "type-level programming". One of the most common examples is a "sized vector"[1] which allows compile-time bounds checking. This is achieved by annotating each vector type with a phantom type variable representing its size.

Nice to see something like this in Rust! One thing that's a bit of a bummer, and I'm sure there are very good reasons for this, is that we HAVE to use every type argument of struct in its definition. If this restriction were to be relaxed, we wouldn't need the "state" field in the struct at all, and we could make the state type variable truly "phantom".

[1] https://www.schoolofhaskell.com/user/konn/prove-your-haskell...

EDIT: Typos.

kccqzy · on Nov 1, 2019

No. Singletons in Haskell are the single most over-engineered and ugliest thing I've seen. If you are going this route, you might as well use a real dependently typed language like Agda or Coq, and the latter supports extracting simple Haskell code from your Coq code while keeping your proofs in Coq.

Indexed types have their uses but just think about this: for your length-indexed vector example, what if you want to read a file and put its lines into such a vector? Your return type needs the length but it's not known until runtime. Okay you use existential quantification, but you now also need a runtime representation of your types. That's done through singletons. But that destroys Haskell's phase separation and means the inefficient ways you use to prove things and encode naturals now leak to the runtime! Have fun with your unary numbers then.

choudanu4 · on Nov 1, 2019

Does the PhantomData type in Rust do what you want? Or are you talking about something slightly different?

https://doc.rust-lang.org/nomicon/phantom-data.html

https://doc.rust-lang.org/std/marker/struct.PhantomData.html

So

  struct Sender<S> {
    /// Actual implementation of network I/O.
    inner: SenderImpl;  
    /// 0-sized field, doesn't exist at runtime.
    state: S;
  }

I believe could become

  struct Sender<S> {
    /// Actual implementation of network I/O.
    inner: SenderImpl;  
    /// 0-sized field, doesn't exist at runtime.
    _marker: PhantomData<S>;
  }

I've never used this PhantomData personally, so this might be wrong. Cheers!

dmkolobov · on Nov 1, 2019

It does! But you still have to actually "use" it, meaning that it appears as a field in the definition, and you have pass a PhantomData value whenever you create new instances of your data type.

In Haskell, you can omit these fields entirely, and achieve the same thing just by annotating the function.

For example, in Haskell, we can have

  data Const a b = Const a

whereas in Rust, it would be:

  struct Const<A, B> {
      konst: A, 
      // does not exist at run-time
      discard: PhantomData<B>
  }

Rusky · on Nov 1, 2019

The reason for this requirement is lifetime subtyping: https://doc.rust-lang.org/nomicon/subtyping.html

Type parameter variance is inferred from usage (e.g. covariant for normal fields, contravariant for function arguments) and without a usage there's no way to infer it.

likeliv · on Nov 1, 2019

There is an example of indexed vecteur in the rustc source code https://github.com/rust-lang/rust/blob/master/src/librustc_i...

nitnelave · on Oct 31, 2019

I wrote something similar in C++, completely compile time that disappears at runtime: https://www.fluentcpp.com/2019/09/24/expressive-code-for-sta...

Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.

pcwalton · on Oct 31, 2019

> Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.

Not soundly. Such static analysis can be trivially defeated using e.g. virtual methods.

nitnelave · on Oct 31, 2019

True, it's not bulletproof. However, with a bit of discipline it's possible to use it safely. And anyway, it's already an improvement over just documentation like "don't call this until you called that"

RcouF1uZ4gsC · on Oct 31, 2019

Basically, they are using Rust to encode a state machine using types. This is brilliant! The nice thing is that the transitions are determined at compile time so there can be performance benefit as well as compile time correctness testing.

Really neat techniques!

cies · on Oct 31, 2019

https://hoverbear.org/blog/rust-state-machine-pattern/

Here another article, but with a more classical state machine impl.

reificator · on Oct 31, 2019

Another example, implementing IMAP with Rust's affine types.

https://insanitybit.github.io/2016/05/30/beyond-memory-safet...

staticassertion · on Nov 1, 2019

Hey, I wrote that article. There's now an actual, functional IMAP library that uses these techniques here:

https://github.com/jonhoo/rust-imap

I did not contribute to that library, but it's nice to see a practical implementation. They use the same approach, essentially, as I describe in my blog.

pedrow · on Nov 1, 2019

Maybe not everyone will know this but Typestate was one of the original 'headline features' of Rust when Graydon Hoare started it. It was based on the Strom/Yemeni paper[0] for the NIL language. There's a mention of in this SO answer from 2010[1] and in the LtU discussion[2]. I'm not sure why it was de-emphasised, I think it didn't work as well as anticipated in the 'real world'

[0]: http://www.cs.cmu.edu/~aldrich/papers/classic/tse12-typestat...

[1]: https://stackoverflow.com/questions/3210025/what-is-typestat...

[2]: http://lambda-the-ultimate.org/node/4009

dhash · on Nov 1, 2019

Scala/Haskell has the same thing, and it's an amazing feature. The proper type is that of

    trait IndexedStateT[A, B, C]

Which signifies a typelevel state machine moving from state A to state B emitting a value of type C.

I can only speak for scala, but i'm assuming haskell has singleton and literal types as well. Meaning that code like this works great.

    object DoorOpen
    object DoorClosed
   
    class Door {
      def open: IndexedState[DoorClosed.type, DoorOpen.type, Unit]
      def close: IndexedState[DoorOpen.type, DoorClosed.type, Unit]
    }

   val d = Door()
   for {
      _ <- d.open() //works
      _ <- d.close() // works
      // _ <- d.close() //compile error
   }

By my understanding of the article, it uses the borrow/move state to implement the state transistion. Is this generalizable to arbitrary state machines, or only a simple 2-state one?

Rusky · on Nov 1, 2019

It generalizes to arbitrary state machines- check out the reference near the end to session types.

Each state is a type, methods consume their receiver (the "move state") and return the new state. You can't accidentally keep around a copy of the old state.

unlinked_dll · on Oct 31, 2019

This is a really useful pattern when you want to have rigidly defined/enforced state transitions to ensure that the data is never in an invalid state for a given operation.

It's pretty awful to deal with when you're unsure of what the state machine should look like, or if there needs to be a lot more flexibility in how the data is accessed. Maintainability nightmare.

An example of this I ran into is a data processing pipeline architecture where each vertex of the processing graph had a processing function called in a loop on its own dedicated thread. Using the type state pattern helped clearly define the "life cycle" of each vertex and enforce it, which provided for some powerful synchronization guarantees (e.g. we could provide some elements of memory safety even when loading things through shared libraries). If you dug into it you could break things, but that would be more work than just following the pattern.

gameswithgo · on Oct 31, 2019

Are there other languages that can do the trick done with "close()" on the file there? A type with a method that can make the type compile time unusable after being called!

that is pretty neat.

stouset · on Oct 31, 2019

Interestingly enough, if your `close` logic is implemented in the `drop` method, `close` itself can be written as:

    fn close(self) { }

...which is—not incidentally—the exact implementation of `std::mem::drop`(https://doc.rust-lang.org/beta/std/mem/fn.drop.html).

nimish · on Nov 1, 2019

Any dependently typed language worth its salt will be able to model a state machine in its type system.

Haskell can do it with some effort, maybe for simple stuff phantom types and GADTs are enough. With linear types the explicit "consumption" can be modeled.

Too · on Nov 1, 2019

Most languages have some variant of this where close is equal to variable going unaccessible. C# IDisposable, Python with-statement, Java try/resources, C++ RAII destructors, etc.

None are really as powerful but they come close and cover many use cases.

Sharlin · on Oct 31, 2019

There are, but they’re mostly research languages or otherwise experimental.

aliceryhl · on Nov 1, 2019

It's a big part of the whole idea behind Rust really.

amedvednikov · on Nov 1, 2019

> 2. we have seeked in a file that was already closed.

> The second error, however, is much harder to catch. Most programming languages support the necessary features to make this error hard, typically by closing the file upon destruction or at the end of a higher-order function call, but the only non-academic language that I know of that can actually entirely prevent that error is Rust.

Why not simply add an `is_closed` flag and throw an error if it is?

msl · on Nov 1, 2019

> Why not simply add an `is_closed` flag and throw an error if it is?

That's a way of doing it at runtime. The article describes catching the error at compile time.

amedvednikov · on Nov 1, 2019

I see, thanks.

aliceryhl · on Nov 1, 2019

The compiler is not able to catch that, and that the compiler can catch it is the whole idea behind type states.

justryry · on Nov 1, 2019

This is one my favorite features of the language and I believe to be fairly unique. It can make for some slick and safe state machine like code.

I do wish that we had reliable RVO so that this could come at zero cost.

nixpulvis · on Oct 31, 2019

https://github.com/Munksgaard/session-types

jbritton · on Oct 31, 2019

This line caught my eye.

my_file.open(); // Error: this may fail.

1) If this can fail, then it should be a compile error to not test the result code.

2) IMHO it would be nice if there was something like Python's with statement to correctly close a file.

    with open(filename, 'r') as f:
        f.read()
    # f.close() invoked automatically here

This prevents trying to close a file that is not opened.

The idea of encoding a state machine into the types seems interesting.

andolanra · on Oct 31, 2019

You should probably have read the rest of the blog post, because in Rust it is a compiler error to not test the result, and the post specifically calls this out. Specifically, look for this section:

    let mut my_file = MyFile::open(path)?;
    // Note the `?` above. It's a simple operator that asks
    // the compiler to check whether the operation succeeded.
    // The *only* way to obtain a `MyFile` object is to
    // have a successful `MyFile::open`.

    // At this line, `my_file` is a `MyFile`, which means
    // that we may use it.

Thaxll · on Nov 1, 2019

Is it though? Does the compiler not compile when you don't check / use the error result?

andolanra · on Nov 1, 2019

If you were to open a file that didn't exist and then not use the result at all, then you get a warning (which can be made an error, if you want):

    File::open("foo");
    // warning: unused `Result` that must be used

But that's also not necessarily an invalid program, it's just not a very useful one. Failing to open a file doesn't interrupt the program in any way, so File::open here will return either a file handle or an error, neither of which get used. However, if you open a file, don't check for an error, but still try to use the file, then your program won't compile:

    File::open("foo").read(&mut buffer);
    // error: method `read` not found in `Result<File, Error>`

You need to be explicit about what to do in case the file didn't open, and Rust won't let you use your file handle unless you explicitly indicate what to do in that error case!

kccqzy · on Nov 1, 2019

If you don't use the result, how would you be able to get the handle that represents the open file?

inferiorhuman · on Nov 1, 2019

You wouldn't. In this case a more likely example would be writing to the file. You don't have to check the return value (in C or Rust) to do something useful, but you should. In Rust you'll get a warning (or error depending on how your workspace is setup). In C, you won't get anything. It's also worth noting that the Result enum can be and is used for things beyond file I/O.

james-mcelwain · on Oct 31, 2019

In actual Rust APIs, opening a file returns a result, which needs to be handled or unwrapped in order to operate on the file.

All resources in Rust are implicitly like your Python with example -- when they go out of scope and destructors run, the file will be closed.

paulddraper · on Oct 31, 2019

> it would be nice if there was something like Python's with statement to correctly close a file.

In C++ or Rust, non-memory resource management is arguably even easier than Python.

RAII: Resource Acquisition Is Initialization.

(Apologies for the C++; can do the same thing in Rust.)

    ManagedFile myfile("path.txt");
    myfile.read();

`myfile` gets released whenever the scope finishes.

That said, the C++ and Rust stdlibs don't look like that because opening and closing files is not exception-free and unlike Python, C++ and Rust don't have or don't prefer non-explicit error handling.

strictfp · on Oct 31, 2019

1) A typesafe variant of this example, caught by the compiler, is what he implements further down 2) There is. Rust has the Drop trait, which guarantees destruction when going out of scope. Even better than `with` statements since the API doesn't require anything special from the user.

ChrisSD · on Oct 31, 2019

It's interesting to take note of the actual Rust `File` object: https://doc.rust-lang.org/std/fs/struct.File.html

`open` is a static function on the `File` object and an instance of `File` does not have a `close` method at all. So, as in this article, an instance of `File` can only be created if `open` succeeded and the file will be closed when dropped (either implicitly or explicitly). For example:

    fn main() -> std::io::Result<()> {
        // a file object is only created if File::create is successful
        // otherwise main returns an error
        let mut file = File::create("foo.txt")?;
        file.write_all(b"Hello, world!")?;
        Ok(())
    } // file closed here, no close method needed.

This contrasts with many other languages where a class instance can be in an invalid state. This is something the typestate pattern helps to avoid.

ridiculous_fish · on Oct 31, 2019

Why isn't there a close method? It seems strange to omit this, given that close can return an error.

edit: also closing stdin/stdout is common.

winstonewert · on Oct 31, 2019

You can see here some discussion of that question:

https://github.com/rust-lang/rust/issues/32255 https://github.com/rust-lang/rust/issues/59567

ChrisSD · on Oct 31, 2019

There's a sync_all method that will make sure all data and metadata is written or else return an error. This may be used at any time, not just when closing the file, and it is expensive so it does make sense to have a separate method for that case.

empath75 · on Oct 31, 2019

It automatically closes once it’s out of scope.

pornel · on Nov 1, 2019

In Rust this is achieved using `if let` for anything that uses `Result` and `Drop` (it's kinda neat how a few orthogonal features fit nicely together like that):

    if let Ok(f) = File::open(filename) {
        f.read();
        // drop(f) automatic here
    }

But Rust's move semantics give you extra flexibility, because you don't have to close it if you don't want to.

    let keep = None;
    if let Ok(f) = File::open(filename) {
        f.read();
        if random() {
           keep = Some(f);
        }
    }
    // the file *may* be usable beyond the first scope,
    // and it's still dropped correctly.

Values aren't dropped simply at the end of their initial scope (as in `with` or on-stack RAII), but after their last use, and language semantics allow the compiler to track globally where the last use is.

steveklabnik · on Nov 1, 2019

(drop types are still dropped at the end of lexical scope, even post NLL.)

pornel · on Nov 1, 2019

I've meant that if a value is moved, it won't be dropped in the scope it came from, but the one it's been moved to.

gameswithgo · on Oct 31, 2019

When you get a result type back in Rust you have to check it to use the value it contains. The worst you can do is call unwrap(), which says "Im assuming this will not fail, panic if I am wrong". But you can't forget to check.

MaulingMonkey · on Oct 31, 2019

RE 1: Several languages (including Rust!) have annotations to force you to test the result code. However, they generally leave my_file in scope even in the error path, which makes bugs easier to write, and which means the File type is forced to handle all the complexity of handling the case where the file wasn't successfully loaded for it's entire API surface.

RE 2: Rust takes the C++ approach of allowing types to define their own cleanup which gets auto-invoked when the object goes out of scope, instead of forcing the calling code to worry about it. Specifically, Rust types can implement the auto-invoked "Drop" trait, basically equivalent to C++'s destructors. You only need to call close if you want to close a file early (e.g. before the end of the scope), in Rust, which is fairly rare.

I vastly prefer this approach, but it admittedly doesn't play nicely with the... less deterministic lifetimes of objects in a garbage collected system, so I can understand why Python, C#, etc. have more explicit scope syntax for cleanup.

caleb-allen · on Oct 31, 2019

Feels similar to Kotlin's sealed classes

james-mcelwain · on Oct 31, 2019

They are similar to Kotlin's sealed classes, except there's no concept in Kotlin of an object "consuming" itself, which is necessary for maintaining safety in the API.

For example, in Kotlin, if a method on state A returns state B, there's no way to invalidate all existing references to state A. Normally this invariant would be enforced on the caller's side, or perhaps by throwing an IllegalStateException if state A is called after producing state B.

Nycto · on Oct 31, 2019

Adding to this, I see two other elements that make this tough to replicate in Kotlin:

1. Type erasure

2. Using sealed classes requires instantiation, while the Rust version is zero overhead.

paulddraper · on Nov 1, 2019

> 1. Type erasure

There is no need to access type information at runtime here.

Congeec · on Oct 31, 2019

also Monad in Haskell

TheAsprngHacker · on Oct 31, 2019

Huh? A monad isn't a sealed class, it's a generic type equipped with the functions bind and return...