When const generics are fully implemented (they are partially usable on nightly rust now) this will enable an even stronger version of this pattern, allowing you the full power of enums to represent the state.
One can then give the individual states extra parameters:
HasSentNumber {
number: u32
}
(Note that in this case that doesn't make much sense, the number that is sent is more associated data then an actual type parameter. There is no real difference, at the type level, between HasSentNumber { number: 3 } and HasSentNumber { number: 6 }, and the compiler generating two types for this would be unnecessary. It is only an example of the syntax.)
You can do session types already in stable Rust without const generics. Each "enum variant" can be its own separate type, and such types can only be instantiated by a method on the type representing the previous state, which also consumes that previous state.
We've used them in our Bulletproofs MPC (multi-party computation) where cryptographic requirement is not to replay the protocol from the mid-point. Since the user is supposed to perform these transitions within their application on top of network messages, such strongly-typed API guarantees that, if their program has compiled successfully, then:
1) Steps are performed in correct order.
2) The protocol cannot be replayed from the intermediate state.
Re: associated data for a state, that's possible too!
trait StateData {
type Data;
}
struct Sender<const S: SenderState> {
... // sender implementation
state_data: <Self as StateData>::Data
}
impl<const S: SenderState> StateData for Sender<{S}> {
// default: no associated data
default type Data = ();
}
struct HasSentNumberData {
sent: u32
}
// using a specialized impl (also an unstable feature).
impl StateData for Sender<{SenderState::HasSentNumber}> {
// instead of a complete struct we could also have done Data = u32,
// but a separate struct would enable a nicer constructor, default values, helper methods and so on.
// (ommitted in this example though)
type Data = HasSentNumberData;
}
struct HasReceivedNumberData {
sent: u32,
received: u32
}
impl StateData for Sender<{SenderState::HasReceivedNumber}> {
type Data = HasReceivedNumberData;
}
Yes you don't need const generics, after all that is what the entire post is about.
I was merely offering a perspective on how the const generics might offer an alternative way to express the same ideas, and possibly change the idiomatic way to write them. Type level values, especially enums, IMHO offer an intuitive way to encode what is going on.
A good sign in a type system for me is that the types are accurately modeling behavior that I find in day to day code. To the point where moving back to a language that lacks this feature feels weirdly unsafe or not expressive enough. Going back to C APIs, with their states that need to be read via bit masks or through special integer return values, is just painful after using discriminated unions. Likewise being able to have multiple owners to a value is a little weird after using Rust.
Also what's the deal with the fi ligature in the code? It throws off the kerning and holds no purpose.
We do something like this to encode our Redux states at compile-time (using TypeScript, obviously). Previously, using regular JS, the Redux devtools made tracking down incorrectly implemented reducers/state transition reasonably straightforward -- but you still had to trigger a bug before you knew you had to track it down, and implement tests etc.
This kind of design pattern measurably saves us time; it reduces the volume of unit tests we need to write/update when we make changes (we still test, just... with less fear), and it prevents newer developers from making mistakes. I haven't played with Rust (beyond a couple of toy projects) yet, and articles like this remind me I'm looking forward to sinking my teeth in over the Christmas break.
Cool! Are you referring to the phantom types technique? I am interested in how you implement this in typescript. Do you have a blog post about it you can refer me to? Or even just a gist?
I'll have to read up on that. We just leverage enums and Typescript's slightly more strict type checking to state that the store can be in only certain states, and use the excellent typesafe-actions library. I'll write up a blog post. There are probably flaws in our methodology but when we took some new grads on recently, it helped a lot :).
This is one of my favorite patterns from Haskell, which I've known as "type-level programming". One of the most common examples is a "sized vector"[1] which allows compile-time bounds checking. This is achieved by annotating each vector type with a phantom type variable representing its size.
Nice to see something like this in Rust! One thing that's a bit of a bummer, and I'm sure there are very good reasons for this, is that we HAVE to use every type argument of struct in its definition. If this restriction were to be relaxed, we wouldn't need the "state" field in the struct at all, and we could make the state type variable truly "phantom".
No. Singletons in Haskell are the single most over-engineered and ugliest thing I've seen. If you are going this route, you might as well use a real dependently typed language like Agda or Coq, and the latter supports extracting simple Haskell code from your Coq code while keeping your proofs in Coq.
Indexed types have their uses but just think about this: for your length-indexed vector example, what if you want to read a file and put its lines into such a vector? Your return type needs the length but it's not known until runtime. Okay you use existential quantification, but you now also need a runtime representation of your types. That's done through singletons. But that destroys Haskell's phase separation and means the inefficient ways you use to prove things and encode naturals now leak to the runtime! Have fun with your unary numbers then.
It does! But you still have to actually "use" it, meaning that it appears as a field in the definition, and you have pass a PhantomData value whenever you create new instances of your data type.
In Haskell, you can omit these fields entirely, and achieve the same thing just by annotating the function.
For example, in Haskell, we can have
data Const a b = Const a
whereas in Rust, it would be:
struct Const<A, B> {
konst: A,
// does not exist at run-time
discard: PhantomData<B>
}
Type parameter variance is inferred from usage (e.g. covariant for normal fields, contravariant for function arguments) and without a usage there's no way to infer it.
Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.
> Although the language itself doesn't guarantee that the value is not used again after a move, good static analyzers will provide a warning in that case, so it can still be safely used.
Not soundly. Such static analysis can be trivially defeated using e.g. virtual methods.
True, it's not bulletproof. However, with a bit of discipline it's possible to use it safely. And anyway, it's already an improvement over just documentation like "don't call this until you called that"
Basically, they are using Rust to encode a state machine using types. This is brilliant! The nice thing is that the transitions are determined at compile time so there can be performance benefit as well as compile time correctness testing.
I did not contribute to that library, but it's nice to see a practical implementation. They use the same approach, essentially, as I describe in my blog.
Maybe not everyone will know this but Typestate was one of the original 'headline features' of Rust when Graydon Hoare started it. It was based on the Strom/Yemeni paper[0] for the NIL language. There's a mention of in this SO answer from 2010[1] and in the LtU discussion[2].
I'm not sure why it was de-emphasised, I think it didn't work as well as anticipated in the 'real world'
Scala/Haskell has the same thing, and it's an amazing feature. The proper type is that of
trait IndexedStateT[A, B, C]
Which signifies a typelevel state machine moving from state A to state B emitting a value of type C.
I can only speak for scala, but i'm assuming haskell has singleton and literal types as well. Meaning that code like this works great.
object DoorOpen
object DoorClosed
class Door {
def open: IndexedState[DoorClosed.type, DoorOpen.type, Unit]
def close: IndexedState[DoorOpen.type, DoorClosed.type, Unit]
}
val d = Door()
for {
_ <- d.open() //works
_ <- d.close() // works
// _ <- d.close() //compile error
}
By my understanding of the article, it uses the borrow/move state to implement the state transistion. Is this generalizable to arbitrary state machines, or only a simple 2-state one?
It generalizes to arbitrary state machines- check out the reference near the end to session types.
Each state is a type, methods consume their receiver (the "move state") and return the new state. You can't accidentally keep around a copy of the old state.
This is a really useful pattern when you want to have rigidly defined/enforced state transitions to ensure that the data is never in an invalid state for a given operation.
It's pretty awful to deal with when you're unsure of what the state machine should look like, or if there needs to be a lot more flexibility in how the data is accessed. Maintainability nightmare.
An example of this I ran into is a data processing pipeline architecture where each vertex of the processing graph had a processing function called in a loop on its own dedicated thread. Using the type state pattern helped clearly define the "life cycle" of each vertex and enforce it, which provided for some powerful synchronization guarantees (e.g. we could provide some elements of memory safety even when loading things through shared libraries). If you dug into it you could break things, but that would be more work than just following the pattern.
Are there other languages that can do the trick done with "close()" on the file there? A type with a method that can make the type compile time unusable after being called!
Any dependently typed language worth its salt will be able to model a state machine in its type system.
Haskell can do it with some effort, maybe for simple stuff phantom types and GADTs are enough. With linear types the explicit "consumption" can be modeled.
Most languages have some variant of this where close is equal to variable going unaccessible. C# IDisposable, Python with-statement, Java try/resources, C++ RAII destructors, etc.
None are really as powerful but they come close and cover many use cases.
> 2. we have seeked in a file that was already closed.
> The second error, however, is much harder to catch. Most programming languages support the necessary features to make this error hard, typically by closing the file upon destruction or at the end of a higher-order function call, but the only non-academic language that I know of that can actually entirely prevent that error is Rust.
Why not simply add an `is_closed` flag and throw an error if it is?
You should probably have read the rest of the blog post, because in Rust it is a compiler error to not test the result, and the post specifically calls this out. Specifically, look for this section:
let mut my_file = MyFile::open(path)?;
// Note the `?` above. It's a simple operator that asks
// the compiler to check whether the operation succeeded.
// The *only* way to obtain a `MyFile` object is to
// have a successful `MyFile::open`.
// At this line, `my_file` is a `MyFile`, which means
// that we may use it.
If you were to open a file that didn't exist and then not use the result at all, then you get a warning (which can be made an error, if you want):
File::open("foo");
// warning: unused `Result` that must be used
But that's also not necessarily an invalid program, it's just not a very useful one. Failing to open a file doesn't interrupt the program in any way, so File::open here will return either a file handle or an error, neither of which get used. However, if you open a file, don't check for an error, but still try to use the file, then your program won't compile:
File::open("foo").read(&mut buffer);
// error: method `read` not found in `Result<File, Error>`
You need to be explicit about what to do in case the file didn't open, and Rust won't let you use your file handle unless you explicitly indicate what to do in that error case!
You wouldn't. In this case a more likely example would be writing to the file. You don't have to check the return value (in C or Rust) to do something useful, but you should. In Rust you'll get a warning (or error depending on how your workspace is setup). In C, you won't get anything. It's also worth noting that the Result enum can be and is used for things beyond file I/O.
> it would be nice if there was something like Python's with statement to correctly close a file.
In C++ or Rust, non-memory resource management is arguably even easier than Python.
RAII: Resource Acquisition Is Initialization.
(Apologies for the C++; can do the same thing in Rust.)
ManagedFile myfile("path.txt");
myfile.read();
`myfile` gets released whenever the scope finishes.
That said, the C++ and Rust stdlibs don't look like that because opening and closing files is not exception-free and unlike Python, C++ and Rust don't have or don't prefer non-explicit error handling.
1) A typesafe variant of this example, caught by the compiler, is what he implements further down
2) There is. Rust has the Drop trait, which guarantees destruction when going out of scope. Even better than `with` statements since the API doesn't require anything special from the user.
`open` is a static function on the `File` object and an instance of `File` does not have a `close` method at all. So, as in this article, an instance of `File` can only be created if `open` succeeded and the file will be closed when dropped (either implicitly or explicitly). For example:
fn main() -> std::io::Result<()> {
// a file object is only created if File::create is successful
// otherwise main returns an error
let mut file = File::create("foo.txt")?;
file.write_all(b"Hello, world!")?;
Ok(())
} // file closed here, no close method needed.
This contrasts with many other languages where a class instance can be in an invalid state. This is something the typestate pattern helps to avoid.
There's a sync_all method that will make sure all data and metadata is written or else return an error. This may be used at any time, not just when closing the file, and it is expensive so it does make sense to have a separate method for that case.
In Rust this is achieved using `if let` for anything that uses `Result` and `Drop` (it's kinda neat how a few orthogonal features fit nicely together like that):
if let Ok(f) = File::open(filename) {
f.read();
// drop(f) automatic here
}
But Rust's move semantics give you extra flexibility, because you don't have to close it if you don't want to.
let keep = None;
if let Ok(f) = File::open(filename) {
f.read();
if random() {
keep = Some(f);
}
}
// the file *may* be usable beyond the first scope,
// and it's still dropped correctly.
Values aren't dropped simply at the end of their initial scope (as in `with` or on-stack RAII), but after their last use, and language semantics allow the compiler to track globally where the last use is.
When you get a result type back in Rust you have to check it to use the value it contains. The worst you can do is call unwrap(), which says "Im assuming this will not fail, panic if I am wrong". But you can't forget to check.
RE 1: Several languages (including Rust!) have annotations to force you to test the result code. However, they generally leave my_file in scope even in the error path, which makes bugs easier to write, and which means the File type is forced to handle all the complexity of handling the case where the file wasn't successfully loaded for it's entire API surface.
RE 2: Rust takes the C++ approach of allowing types to define their own cleanup which gets auto-invoked when the object goes out of scope, instead of forcing the calling code to worry about it. Specifically, Rust types can implement the auto-invoked "Drop" trait, basically equivalent to C++'s destructors. You only need to call close if you want to close a file early (e.g. before the end of the scope), in Rust, which is fairly rare.
I vastly prefer this approach, but it admittedly doesn't play nicely with the... less deterministic lifetimes of objects in a garbage collected system, so I can understand why Python, C#, etc. have more explicit scope syntax for cleanup.
They are similar to Kotlin's sealed classes, except there's no concept in Kotlin of an object "consuming" itself, which is necessary for maintaining safety in the API.
For example, in Kotlin, if a method on state A returns state B, there's no way to invalidate all existing references to state A. Normally this invariant would be enforced on the caller's side, or perhaps by throwing an IllegalStateException if state A is called after producing state B.