I'm curious if you have any insight into what a realistic lower limit on Idris compile-time speed looks like. Right now even moderately sized Idris programs can be quite slow to compile, slower even than other dependently-typed languages (see e.g. https://github.com/AndrasKovacs/smalltt). How much of this is intrinsic to Idris' language design and how much of it is due to lack of optimization in the implementation?
I reckon we can push the raw performance quite far, but it might be a challenging environment to program in: the ergonomics of programming with linear types aren't figured out yet, and it would involve a lot of memory management. While safe, reasoning about memory can be quite hard, as anyone learning to handle the borrow checker in Rust can tell you.
It's currently one of my projects to use QTT to statically enforce performance invariants and optimisations. At the moment I'm looking at partial evaluation of programs, with explicit annotations to control which parts need to be partially evaluated and which parts need to remain unevaluated.
Regarding the benchmarks there are two things to address to make them go faster:
- the idris parser is actually pretty slow so we should get working on that
- the examples use explicit runtime bindings for type parameters which force the types to be allocated and available at runtime, which destroys the performance of some programs (stlc notably)
Idris code isn't expected to be written that way, since linearity annotations also allow you to _remove_ values from the runtime and make dependent programs go faster: if you were concerned about runtime performance, you would use _runtime erased_ type variables instead of _runtime relevant_ variables. Some of this is due to the foreign nature of QTT, and we should do a better job of teaching how to use it for performance.
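To make that concrete, here's a minimal sketch (illustrative names, not taken from the benchmarks) of the difference between a runtime relevant and a runtime erased index:

import Data.Vect

-- n is runtime relevant: it is passed along at runtime, so it can
-- simply be returned (at the cost of existing at runtime at all)
lengthRelevant : {n : Nat} -> Vect n a -> Nat
lengthRelevant {n} _ = n

-- n is runtime erased (multiplicity 0): it exists only for the type
-- checker, so returning it directly is a compile error and the length
-- has to be recomputed by walking the vector
lengthErased : {0 n : Nat} -> Vect n a -> Nat
lengthErased [] = Z
lengthErased (_ :: xs) = S (lengthErased xs)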
On the other hand, someone could argue "if idris is that smart, can't it warn about / perform those optimisations automatically?" And maybe that's the way to go, but we need more eyes and brains looking at the problem and coming up with prototype solutions to make progress. Idris is still a small project, so we don't have enough people working on it to implement those smart features.
I might have misunderstood both comments, but I have the impression the OP was asking about speed of compilation and you're mostly talking about the performance of the compiled code?
Indeed, my answer was focused on runtime performance, thanks for pointing it out. However, they are related, for two reasons:
- idris is written in idris, so any improvement in the language results in faster compile times; if more things are annotated with 0 in the compiler, things will compile faster.
- if some variable is only important at compile-time, the compiler doesn't have to look at it with very much scrutiny. That saves you a lot of time when typechecking
> if some variable is only important at compile-time, the compiler doesn't have to look at it with very much scrutiny. That saves you a lot of time when typechecking
Right, but if some code is annotated with unrestricted multiplicity, but still not actually used anywhere in a function body, does this make compilation of that code slower than if that same code was annotated with zero multiplicity?
Oh fascinating. Are you saying that multiplicity annotations can affect compile time speed (in particular giving a laxer multiplicity than what is required)?
Yes. Multiplicity 0 removes the value from the runtime, avoiding allocations and garbage collection. That's particularly handy for type-level indices and type-level programs in general.
Technically speaking, multiplicity 1 (linear variables) can be subject to arbitrary inlining, though that's not implemented in the compiler.
I'm working on other multiplicities that could allow you to describe which values are "compile-time" available, and therefore subject to optimisations
I'm working through Type-Driven Development with Idris right now and I can only recommend it (it's written by the creator of Idris).
However, it's still based on Idris 1 (which didn't have QTT). The basics haven't changed much though (some examples have to be modified slightly, mostly related to making some arguments explicit that used to be implicit).
Idris seems like a really interesting language. Examples from the book include:
- typesafe printf (it will check at compile time that the correct number of arguments with the correct types are passed; see the sketch after this list).
- proving that functions are total (roughly that means that they terminate for every input)
- writing infinitely running programs as a stream of actions, meaning that the majority of the program can be written as a total function and only the top-level function actually consuming the actions needs to be partial
- views with dependent pattern matching for traversing structures safely in several ways
- safe division
- ability to define all the effects your system uses instead of using an unrestricted IO type
etc.
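For a flavour of the typesafe printf example, here's a rough sketch along the lines of what the book builds up (abbreviated to two format specifiers, so the details differ from the book's version):

data Format = FInt Format | FStr Format | FLit Char Format | FEnd

toFormat : List Char -> Format
toFormat ('%' :: 'd' :: rest) = FInt (toFormat rest)
toFormat ('%' :: 's' :: rest) = FStr (toFormat rest)
toFormat (c :: rest) = FLit c (toFormat rest)
toFormat [] = FEnd

-- The type of printf is computed from the *value* of the format string
PrintfType : Format -> Type
PrintfType (FInt f) = Int -> PrintfType f
PrintfType (FStr f) = String -> PrintfType f
PrintfType (FLit _ f) = PrintfType f
PrintfType FEnd = String

printfAux : (f : Format) -> String -> PrintfType f
printfAux (FInt f) acc = \i => printfAux f (acc ++ show i)
printfAux (FStr f) acc = \s => printfAux f (acc ++ s)
printfAux (FLit c f) acc = printfAux f (acc ++ pack [c])
printfAux FEnd acc = acc

printf : (s : String) -> PrintfType (toFormat (unpack s))
printf s = printfAux (toFormat (unpack s)) ""

-- printf "%s: %d" has type String -> Int -> String, so
-- printf "%s: %d" "answer" 42 compiles, while printf "%s: %d" 42 does not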
While the examples are super interesting I don't think that I yet fully grasp all the possible things you could do with dependent types (or what sorts of limitations they have). I guess that would come with more practice.
Dependent Types can encode all of math (see the Lean math library for example). So there doesn’t really seem to be any limit to what you can do with Dependent Types. However it might of course be impractical to actually do it in some cases.
Of course, you can encode all of maths with it, but only in certain specific ways and not others.
For example, it seems like Idris doesn't really support subtyping unless I've missed something obvious (you can use typeclasses, but that doesn't allow you to use heterogeneous collections or anything). I believe it's also a requirement that all functions used in type signatures are total.
> but only in certain specific ways and not others.
Yes of course. All languages have specific syntax and semantics. That doesn’t change the fact that you can encode all of math in it which is a huge deal. It allows you to prove code correct (see CompCert, seL4, CertOS etc.)
And yes you can’t prove anything unless the functions you use are total. That’s a basic requirement for any language where you want to use the type system to prove correct code. Allowing non-total functions would allow you to prove false. And you can prove anything you want from false.
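A minimal illustration, with `Void` (the empty type) standing in for false:

total
bogus : Void
bogus = bogus
-- Rejected by the totality checker because it doesn't terminate. If it
-- were accepted as total, `absurd bogus` would "prove" any proposition.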
Sure, I wasn't doubting that you can prove anything in Idris or Coq or ... – same as you can write every program in any Turing complete language (not 100% strictly true because of non-functional requirements, but you get what I mean).
But in terms of practical developments, these "restrictions" can matter. Especially when you have specific maintenance concerns. And I'm talking more about programming than theorem proving (Idris is, after all, more geared towards the former than the latter unlike e.g. Coq or Lean).
I've had a problem in the back of my head for a while and I knew it would probably be a bit cumbersome to do in Haskell (although maybe it's possible with some advanced extensions?). I thought that dependent types might solve that problem, but I don't yet see how I could do that, at least in Idris (possibly I'm missing some very obvious way). I think I'd really need subtyping to make it work in a non-cumbersome way. (Or else, possibly, some code generation for the boilerplate part.)
That’s interesting. I have never heard of a problem that could be solved in a sub-typed language but not in a Dependently Typed language. I am honestly interested. Perhaps give an example?
It's not that you can't solve it, obviously. It's about the elegance of the approach.
The context is a mathematical problem solver. Mathematical terms form hierarchies (possibly not a single one), and certain operations only make sense for certain types of terms (e.g. inverting a matrix), while others apply to all terms (e.g. simplifying an expression). I want to be able to verify this at compile time and at the same time, the function signatures should be "easy" enough that operations can be composed through operators (i.e. manually passing proof terms probably wouldn't work as well).
In a language like Haskell, I could probably use wrapper types, e.g.
data Matrix = ...
data Integral = ...
...
data Term = MatrixT Matrix | IntegralT Integral | ...
but that's cumbersome especially if you have lots of types of terms.
I'm thinking that in Idris, maybe I could do something like:
data TermType = MatrixT Nat Nat Type | IntegralT | ...

data Term : TermType -> Type where
  Matrix : Vect n (Vect m a) -> Term (MatrixT n m a)
  Integral : Term t -> Term IntegralT
which would probably allow me to specifically write functions that only work on matrices (or on square matrices) etc., but it still only gives me one level of hierarchy... in case I find out that I need to group some terms together for a particular operation, I would probably be out of luck.
This is all based on work that I previously did, but where all these checks did unfortunately not happen at compile time, so every operation needed to have explicit runtime type checks (this was in a language with subtyping though).
The fundamental issue in this case is that linearity in QTT is a property of the parameter bindings and _not_ the type itself, meaning we can't easily express linearity as a _global_ property of the type.
On the other hand, only affecting the parameter bindings makes it really easy to use runtime-erased/comptime stuff without reimplementing a bunch of common types as runtime- and comptime-only versions.
ATS chooses the other path and makes linearity and runtime-erasure a part of the type (not without paying a hefty syntactic cost however).
I thought I'd read over `Data.Linear.Array` before, but maybe not? My intuition for linear types would be that for any function that uses the array you basically almost always want to return a pair of the value and new linear version of the array (i.e. `Res a (const (arr t))`). The only exceptions are a set of "terminal" operations that convert it to a non-linear type (in this case probably the things that end up returning an `IArray`).
So I'm a bit confused why e.g. `read` is `(1 _ : arr t) -> Int -> Maybe t` and not `(1 _ : arr t) -> Int -> Res (Maybe t) (const (arr t))`.
But also, isn't the `newArray` issue fixed by just rewrapping the output?
data Unrestricted : Type -> Type where
  MkUnrestricted : a -> Unrestricted a

newArray : (size : Int) -> (1 _ : (1 _ : arr t) -> Unrestricted a) -> Unrestricted a

-- You'll need this if you actually make use of the outer linear multiplicity
-- in the function argument to newArray,
-- which is also why newArray returns Unrestricted
useUnrestricted : (1 _ : Unrestricted a) -> (a -> b) -> b
useUnrestricted (MkUnrestricted x) f = f x
This is e.g. how Linear Haskell does it. The unrestricted multiplicity of the argument to `MkUnrestricted` prevents the bug here by only allowing you to output non-linear things (so it rejects wrapping the linear input in `MkUnrestricted`; you have to first convert it via a terminal operation into something else).
The entire `Data.Linear.Array` seems to be in need of a once-over unless I'm misunderstanding something.
EDIT: Never mind, reading too quickly, I see the mread now that has the signature I'd imagine it should.
I had a quick read of the paper and I think I got about 10% of the content - which is better than my usual average. My take-away: linear types allow you to express conditions such as this function must not be called with an empty list; QTT extends linear types and allows you to specify runtime dependencies between functions in the type system. I got the rough idea of being able to express a state machine in the type system - it gives you the ability to say that this function can only be called in this state, and that to enter the state, these functions must be called in this order. How did I do? Is that the rough gist of the paper?
Linear logic is logic where any value must be used exactly once, i.e. if a=3 and you say b:=a, then b becomes 3 and a becomes undefined. The value is moved rather than copied. That is useful for describing e.g. resource management systems. If I transfer a resource to you, then you have it but I no longer do. Linear types basically means a type system that enforces linear logic.
Baker's paper is visionary in that it announces a link between linear logic and move semantics for resources more than a decade in advance. As we know this was put into practice by C++ and Rust. Compared to linear types, it amounts to operating a change of perspective from the quantitative interpretation (is it copied? is it dropped?) to a qualitative one (how is it copied? how is it dropped?). This is very much in spirit of linear logic (though it takes a bit of work to relate these ideas to Girard's mathematics). It answers questions that linear types were never able to, such as how to mix linearity and control. I take this paper as an example that there is more to linear logic than linear types!
Disclaimer: I'm not very bright, and don't write Idris
My understanding of "Linear Types" is that they are geared towards declaring values/types that can be used only once.
If this sounds weird, I thought so too. Except it turns out there are a number of (fairly common) scenarios in which you might want to ensure that a value can only be used once.
I've drunk too much so I can't conjure many to mind, but what I can think of is something like a buffer in a lexer, where you want to ensure that a token is consumed once and then not available.
There are some decent examples in the Tweag repo for the GHC Linear Types proposal.
> I've drunk too much so I can't conjure many to mind
The OP has some examples. One of them is creating verified state machines (i.e. stating invariants that different states and transitions should obey all the way up to entirely specifying the abstract state machine). The paper uses linear types to replace Idris 1's ST type. Another is enforcing order of messages to be sent in a bidirectional communication protocol. There's also a lot more low-level examples (make sure that files get closed after use, make sure mutable variables are used correctly, etc.).
In general, invariants of the form "verify that this happens after that happens" can be enforced by linear types, which covers a large amount of what happens in stateful computing. (Note that this phrase has two meanings, both of which are covered: verify that e.g. a clean-up action always occurs once an action requiring clean-up has happened, and verify that e.g. an action occurs only after another action, never before.)
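Here's a small sketch of the "clean-up must happen" flavour in Idris 2. The API is made up for illustration (these are not Idris's real file primitives):

data Handle : Type where
  MkHandle : Handle

withFile : String -> ((1 _ : Handle) -> IO a) -> IO a
withFile _ k = k MkHandle

close : (1 _ : Handle) -> IO ()
close MkHandle = pure ()

ok : IO ()
ok = withFile "log.txt" (\h => close h)

-- Rejected: h has multiplicity 1 but would be used zero times,
-- i.e. the handle would be leaked
-- leak : IO ()
-- leak = withFile "log.txt" (\h => pure ())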
As a practical example Rust uses a less powerful version of linear types called affine types for its ownership system. Values are moved by default in assignments and function calls, and they cannot be used (in their original location) after the move.
Rust already has the ability to make you use a value at most once, so the useful extension for Rust would be "must-use types", which is an extension towards linear types (since linear types would be able to be represented in this system easily enough).
It would actually solve a lot of issues in Rust, but unfortunately it would be an insane amount of work to introduce, because such types would invade a lot of the current type system.
There is the #[must_use] attribute for function return values, which throws compiler warnings when results are unused. It is not a real part of the type system though, but it serves as a useful lint.
Scala 3 is the only "functional" language that made me not feel like an idiot, and it's great for understanding "Dependent Types" as a layman.
It gives the example of a key in a database, which can return multiple types of values:
type DB = (k: Key) => Option[k.Value]
Where the type of "Key" is:
trait Key { type Value }
You can declare values of type "Key" (each fixing its own "Value" type member), and attempting to look them up in the DB can result in more than one kind of type, dependent on the value being looked up.
I wouldn't call those dependent types, exactly, because it's still happening entirely at the type level. The defining characteristic of dependent type systems is that they allow mixing of type-level and value-level expressions. To use the classic example of appending one vector to another, Idris lets us express the type:
app : Vect n a -> Vect m a -> Vect (n + m) a
where we're using the value-level addition operator to express the type of the result; specifically, that appending a Vect of length m to a Vect of length n will give a Vect of length (n + m). [1]
They are dependent types since they do depend on runtime values (notice in `DB` the `k` is a runtime value), rather than just types. They are, however, very restricted compared to more "full-spectrum" dependently typed languages.
The difference is that your keys are types. Dependent types are about generics where the keys are allowed to be values. You cannot directly write
Optional<1> or Optional<"Name"> in Java; you can only have types as arguments.
C++ comes a bit closer in that it allows some generic arguments to be ints (std::array<…, 4>), but it still does not allow most values as generic type arguments.
Yeah the `Key` and `DB` example, while somewhat differing in semantics, can be emulated by Java.
Here's an example that extends it slightly and can't be emulated by Java.
// I forget if I need to curry the second argument in a separate argument list
// I'm not next to a computer with scalac at the moment, so I'll just use separate argument lists
type DB = (k: Key) => k.IsVerified => Option[k.Value]

trait Key {
  type Value
  val keyValue: Value
  case class IsVerified(underlyingBool: Boolean)
}

val myDB: DB =
  key => isVerified => isVerified match
    case key.IsVerified(bool) =>
      if bool then
        // return the actual value
        Some(key.keyValue)
      else
        None

def verifyKey(key: Key): key.IsVerified =
  // (isValidKey is assumed to be defined elsewhere)
  if isValidKey(key) then
    key.IsVerified(true)
  else
    key.IsVerified(false)

val key0: Key = ...
val key1: Key = ...

val key0Verification: key0.IsVerified = verifyKey(key0)

// Fails to compile
// You're trying to cheat!
// You only verified key0 and are trying to use its verification to bypass
// verifying key1
myDB(key1)(key0Verification)
In theory you can do this kind of thing in Java, but you have to give each key its own type (cumbersome because Java doesn't have singleton types) and carry the type parameters arbitrarily far back through the call graph. E.g. imagine writing a function that appends 3 length-indexed vectors together - in a dependently-typed language you write this as:
def append[T, M, N, L](first: Vec[T, M], second: Vec[T, N], third: Vec[T, L]): Vec[T, M + N + L] = ...
In Java you'd have to write this as:
<T, M, N, L, P, R> Vec<T, R> append(Vec<T, M> first, Vec<T, N> second, Vec<T, L> third, IsSum<M, N, P> isSum1, IsSum<P, L, R> isSum2) { ... }
And as you write more and more code you have exponentially more type parameters, because you have no way to just evaluate functions of type parameters, so you have to pass markers that represent all of your type relationships right from the initial input part of your program all the way down.
> And as you write more and more code you have exponentially more type parameters, because you have no way to just evaluate functions of type parameters, so you have to pass markers that represent all of your type relationships right from the initial input part of your program all the way down
This isn't really the thing that dependent types provide though. If you have type-level computation without dependent types that's enough to avoid this problem (and e.g. Scala has this). However, dependent types are when the `+` in your first append is the same `+` as the usual runtime `+` rather than a separate type-level function that only works with types, not runtime values. Scala does not have this. Its dependent types are much more limited.
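A small illustration of the difference in Idris: the `+` in the type below is the ordinary runtime addition on `Nat`, applied to an argument that could just as well come from user input:

import Data.Vect

-- (n + n) in the type is the same (+) evaluated on values at runtime
twice : (n : Nat) -> a -> Vect (n + n) a
twice n x = replicate n x ++ replicate n x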
Scala has only a very limited version of dependent types. Even its "dependent function types" are a more restricted version of general dependent function types. Full dependent types can in a certain sense be even simpler than non-dependently-typed languages (e.g. you no longer need a separate notion of generics).
A really nice thing in Haskell beyond dependent types is refinement types. Not sure if they exist in Scala - I'm sure someone has hacked something together somewhere.
> linear types allow you to express conditions such as this function must not be called with an empty list
I don't think that's a particularly enlightening example, mostly because we can implement such a function without any linear or dependent types.
Consider that a list (of some element type T) could contain zero Ts, or one T, or two Ts, etc. i.e.
List<T> = 1 + T + (T × T) + (T × T × T) + ...
Where:
- `1` is a type containing only one value, like 'Unit', 'Null', 'Nil', etc. We use this to represent the empty list.
- 'X + Y' is the (tagged) union of types X and Y; AKA a sum type, or an Either
- 'X × Y' is the type of tuples containing an X and a Y; AKA a product type, or a Pair
Non-empty lists are almost the same, except we don't want to allow the empty list. Hence we need to get rid of the '1' type, to get something like this:
NonEmptyList<T> = T + (T × T) + (T × T × T) + ...
Just using the normal rules of arithmetic, we can see that NonEmptyList<T> is the same as List<T>, except everything has been multiplied by T:
NonEmptyList<T> = T + (T × T) + (T × T × T) + (T × T × T × T) + ...
= (T × 1) + (T × T) + (T × T × T) + (T × T × T × T) + ...
= T × ( 1 + T + ( T × T) + ( T × T × T) + ...)
= T × List<T>
Hence we can implement NonEmptyList<T> as the type 'T × List<T>', i.e. tuples containing a T and a List<T>. Intuitively, those are the "head" and "tail" of the non-empty list; or equivalently, Cons constructs a tuple, and non-empty lists are those whose outermost constructor is always Cons.
Note that we can go the other way too, if our language provided non-empty lists and we wanted to allow empty ones:
List<T> = 1 + NonEmptyList<T>
This time, a value of type List<T> is either a unit value (representing the empty list), or a non-empty list. This pattern of "adding one" to a type is usually called 'Maybe' or 'Option':
List<T> = Maybe<NonEmptyList<T>>
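Here's a minimal sketch of that encoding in Idris (names are illustrative; the base library's Data.List1 implements the same idea):

data NonEmptyList t = MkNonEmpty t (List t)

-- head is total here: there is no empty case to handle
safeHead : NonEmptyList t -> t
safeHead (MkNonEmpty x _) = x

-- "adding one" back: Maybe (NonEmptyList t) is isomorphic to List t
asList : Maybe (NonEmptyList t) -> List t
asList Nothing = []
asList (Just (MkNonEmpty x xs)) = x :: xs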
Btw, this is extremely practical. For instance, you can then easily emulate other languages (such as SQL) natively within the type system and get all the benefits of the compiler/IDE, rather than having to use a library with custom syntax, or a raw SQL string whose syntax/semantic errors only show up very late, at runtime.
This sounds very neat in principle, but it doesn’t seem very easy to understand the code in practice. In particular:
In the run length encoding example, I don’t understand how the runtime data structure is defined. There is a RunLength data type, but its definition looks more like a Haskell-style curried function definition than a data definition. It seems to call the ‘rep’ function and the list concatenation operator, but how do you get a datatype out of that? If you drew a boxes and arrows diagram showing how the data is stored, what would it look like?
I’m wondering if this is like how you can use a closure to store data in captured local variables? Or perhaps like lazy evaluation in Haskell?
The definition of RunLength looks like two functions for uncompressing a list, given that you already have all the data from within the compressed list. The output of the function is a type containing the uncompressed data.
The input parameters to the "functions" appear to be data elements from within the compressed list. Because that's what they are. But this wasn't obvious to me.
I think the weird bit is when the tags and fields within a datatype are represented using the same syntax as a function's name and parameters. This punning threw me off. I suppose the same is true in Haskell, but you don't write type constructors using '->' between the fields, so this punning is less obvious. (And Haskell's syntax for product types is itself less familiar to me than a 'struct' definition in a C-like language.)
A sibling comment correctly lays out what the type means, and I'll elaborate a little more here.
> how the data is stored
Thinking of how concretely the data is stored is a little bit tricky. I find it easier to build an intuition by looking at what kind of values the type signature lets us build, and which values are disallowed.
runLengthOfAnEmptyList : RunLength []
runLengthOfAnEmptyList = Empty
-- Note that 1 is shorthand for (S Z)
runLengthOfAListWithOneElement : RunLength [ 1 ]
runLengthOfAListWithOneElement = Run 0 1 Empty
-- The definition of RunLength places restrictions on how to
-- construct it.
-- For example, this will not compile.
runLengthOfAListWithOneElementBrokenVersion : RunLength [ 1 ]
runLengthOfAListWithOneElementBrokenVersion = Run 1 1 Empty
--
-- This violates the type signature of RunLength
-- Notice Run 1 1 Empty which has changed the 0 to a 1
--
-- As a reminder this is the definition of Run
-- Run : ( n : Nat ) -> ( x : ty ) -> ( rle : RunLength more ) ->
-- RunLength ( rep ( S n ) x ++ more )
--
-- n is set to 1, x is set to 1, and Empty means that more is set to []
-- Making the necessary substitutions results in:
-- RunLength ( rep ( S n ) x ++ more ) -- No substitutions yet
-- RunLength ( rep ( S 1 ) 1 ++ [] ) -- S 1 is the same thing as S (S Z)
-- RunLength ( 1 :: 1 :: [] ++ [] ) -- Substituting rep ( S 1 ) 1
-- RunLength ( 1 :: 1 :: [] ) -- ++ [] doesn't change our list at all
-- RunLength [ 1, 1 ] -- Syntax sugar for 1 :: 1 :: []
--
-- But this is a contradiction! Since we said in our type annotation that we
-- should have RunLength [ 1 ], not RunLength [ 1, 1 ]!
However, I agree with you that this style of dependently typed programming is confusing and ultimately, in my view, a bad representation of how dependent types should be used in production code (if any full-spectrum dependently typed language gains enough adoption to be used in large production projects).
Here's the same idea in an alternative, but (roughly) equivalent, encoding, which hopefully lays it out more clearly.
RuntimeEncodedList : Type -> Type
RuntimeEncodedList t = List (Nat, t)

uncompress : RuntimeEncodedList t -> List t
uncompress runtimeEncodedList = case runtimeEncodedList of
  [] => []
  (n, elem) :: restOfList => rep (n + 1) elem ++ uncompress restOfList

compress : List t -> RuntimeEncodedList t
compress = ...

uncompressIsTheInverseOfCompress : uncompress (compress xs) = xs
uncompressIsTheInverseOfCompress = ...
I hope this is a much plainer and easier to understand representation, with a similarly strong invariant. The `...` are of course eliding a lot, but my point is that they can be elided, without really affecting comprehension of the overall code (which is not the case with the other representation).
This also nicely avoids a problem I see in a lot of dependently typed code, where the structural representation of a data structure is intertwined with the invariants it is supposed to uphold. Not only does this result in increasingly confusing code as the invariants get more sophisticated, and awkwardly privilege certain invariants over others, it makes modifying the representation in even the smallest ways a nightmarish test of resolve. This is because everywhere you want to create the data structure, you must also prove that the invariants hold. Sometimes this proof is trivial, sometimes it is difficult. But you must always do it; you cannot bail out with some sort of "believe me" escape hatch (where you perhaps use an external solver to prove it instead), because you could then potentially crash or even segfault your program, even if your code would otherwise fulfill the invariant through some other reasoning the compiler cannot see, since the proof is an integral part of the runtime representation of the data structure.
This sort of proof burden means that it can be a Herculean task to swap out the underlying representation of the data structure for a more efficient one (e.g. you're going to be in for a hard time if you've written code with the `RunLength` type and all of a sudden, for speed reasons, you'd prefer a representation of run-length-encoded lists that groups all the run lengths in one array and stores all the actual values in another array). At least having zero multiplicity annotations on parts of the dependently typed data structure makes things a bit easier.
I instead advocate for a clean separation where the overwhelming majority of data types in a dependently-typed program should be non-dependently-typed and dependent types used instead merely to state properties about those data types independently.
The line of thought you are expressing here really reminds me of the work on Zombie, a research language that Stephanie Weirich, among many others, worked on - in particular the dissertation research by Chris Casinghino. He gave a talk at Boston Haskell [1] that makes some really well-argued points in favor of your perspective in many respects.
Ooooo... even on a brief skim through this looks like a great talk (although I guess that's an easier thing to say when the speaker mainly agrees with your own viewpoint). Will watch in more detail. Thanks!
Isn't it a bit odd that the type signature "contains" the uncompressed data? There would seem to be no point in compressing data this way.
I suppose laziness makes it somewhat more practical. The concrete type may only exist conceptually.
It's also unclear what constraint is being enforced. Proving that all run-length encodings decompress to some list doesn't seem interesting? This is just as true of your encoding. I suppose it would be more interesting if not every possible value of the compressed data structure were valid, such as if it contained a cached checksum that needed to match the data. We could show that all compress functions compute the checksum correctly.
But couldn't do that by having a single compress function and proving that it works? Why have more than one?
Also, it seems like verifying such a checksum would be weird. In the presence of data corruption, you would have a proof that that checksum matches the data, but by computing the checksum you could show that it doesn't match the data.
Apparently type-checking proofs aren't really about concretely showing that data is consistent? If you rely on the proof, you assume correct data rather than verifying it. A sufficiently smart compiler might optimize away your attempted verification?
> Isn't it a bit odd that the type signature "contains" the uncompressed data? There would seem to be no point in compressing data this way.
I agree it's a bit tricky to understand (which is why I don't like that representation), but this has nothing to do with laziness (Idris is a strict language). It's a more general phenomenon that values in a type signature do not actually have to exist at runtime. They are purely notational: they live in the type checker's "brain" and have no runtime representation. At runtime, `compress` really does compress the list without keeping a copy of it anywhere. For any type `MyType someValue`, there's no a priori reason `someValue` should actually have any runtime representation. See e.g.
data MyType : Int -> Type where
  TheOnlyThingOfMyType : MyType 5

-- There's no way to actually access the 5 at runtime
-- It only lives in type signatures
someValue : MyType 5
someValue = TheOnlyThingOfMyType

-- This addition takes place at compile time, not runtime
someOtherValue : MyType (4 + 1)
someOtherValue = TheOnlyThingOfMyType

thisFailsToCompile : MyType (4 + 2)
thisFailsToCompile = TheOnlyThingOfMyType
There is likewise no reason to believe something like `SomeOtherType String` actually has a string in it anywhere (this pattern is known as a phantom type) without knowing more about `SomeOtherType`. All the type parameter does is constrain how functions can be called that involve the overall type. In this sense `List String` having an actual `String` you can pull out of it at runtime can be thought of as the exception rather than the norm (and indeed isn't always the case, e.g. if you have an empty list).
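Here's a quick phantom-type sketch (made-up names): the first parameter of `Tagged` never appears in any constructor field, so no value of it exists at runtime; it only constrains which values can be mixed.

data Tagged : Type -> Type -> Type where
  MkTagged : a -> Tagged tag a

data User : Type where
data Order : Type where

userId : Tagged User Int
userId = MkTagged 42

-- Rejected: Tagged User Int is not Tagged Order Int
-- orderId : Tagged Order Int
-- orderId = userId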
In this case, you can use `RunLength`'s list parameter to state more invariants, even if the list doesn't exist at runtime.
-- If two lists compress to the same thing, then they must've been the same to begin with
compressionIsLossless : RunLength xs = RunLength ys -> xs = ys
-- Here follows the proof of that statement
-- If the proof looks confusing, https://shuangrimu.com/posts/language-agnostic-intro-to-dependent-types.html might help
compressionIsLossless runLengthsAreEqual = case runLengthsAreEqual of
  Refl => Refl
Switching back to my representation...
> But couldn't do that by having a single compress function and proving that it works? Why have more than one?
I only have one compress function (I was just too lazy to write it out and used ellipses instead).
> Proving that all run-length encodings decompress to some list doesn't seem interesting? This is just as true of your encoding.
The constraint is fairly simple, but very strong:
uncompressIsTheInverseOfCompress : uncompress (compress xs) = xs
uncompressIsTheInverseOfCompress = ... -- Too lazy to write out the actual proof at the moment
This is the fundamental property all lossless compression schemes must satisfy.
You could also of course add all sorts of other properties.
-- Even though it won't inflate list length, it can result in more memory usage due to the tuples
compressWontInflateSize : (length (compress xs) <= length xs) = True
compressWontInflateSize = ... -- Again too lazy to write out the proof
-- Idris doesn't have a general exists statement so we need to provide a
-- concrete example
compressMakesAtLeastOneListSmaller : (length (compress [1, 1, 1]) < length [1, 1, 1]) = True
compressMakesAtLeastOneListSmaller = Refl
-- This proof is easy, just tell Idris to actually do the calculation (at compile-time, not run-time!)
> Apparently type-checking proofs aren't really about concretely showing that data is consistent? If you rely on the proof, you assume correct data rather than verifying it. A sufficiently smart compiler might optimize away your attempted verification?
I'm not sure what you mean here. In order to show that your data is consistent you need to describe what you mean by consistency somehow. You can do that either within the definition of a type itself (which I do not like) or you can do that by explaining how different functions are supposed to relate to each other (e.g. how compress, uncompress, and length should be related to each other), which I like more.
As for "optimizing away verification," that's the point of Idris' 0 multiplicity. It marks something as "compile-time-only" that has no runtime representation and hence no runtime performance impact.
I haven't written enough dependently typed programs to know what's best, but here's an argument in favour of the OP version :)
I agree that your version is easier to understand. But with the other version the types at least guide you in the implementation of the compress/uncompress. In your version you can return any `List t`.
For the sake of discussion I'm going to say something stronger than I believe, but which I don't think gets enough airtime, so I want to present its most extreme version: having sufficient type direction that there is only one possible implementation of a function is a bad thing. It represents a tightly constrained data type that will be painful to make even the smallest changes to down the road, and it duplicates code for no real benefit.
The definition of `RunLength` does not do away with the need to define `uncompress`; all it does is move it into the definition of `RunLength`. In fact my version of `RuntimeEncodedList` and `uncompress` is less overall code than `RunLength` and `uncompress`. Indeed, the only reason the compiler can synthesize `uncompress` is because it duplicates `RunLength`! So really the compiler hasn't reduced the amount of work that needs to be done. Just like I could've messed up my definition of `uncompress`, its equivalent in `RunLength` could also have been written incorrectly. We haven't reduced the amount of code that needs to be checked by a human for correctness (in the absence of other invariants), only moved it from a function definition to a data type definition.
By moving it into a function, however, we can then selectively add as many additional properties as we want (such as the `uncompress(compress xs) == xs` invariant), without having to worry about privileging one over any other.
Thank you for your comment, I haven't thought about it from this point of view before. What do you think about simpler datatypes like Vector? This can also be seen as a `List` with the constraint that `Vector n A = (l : List A) * length l = n`.
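(For reference, here's that constraint written out in Idris, where `**` builds a dependent pair:)

VectorAsList : Nat -> Type -> Type
VectorAsList n a = (l : List a ** length l = n)

example : VectorAsList 2 Int
example = ([1, 2] ** Refl)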
I personally balk even at sized vectors. It's too easy to accidentally find yourself writing out long proofs of arithmetic theorems that could easily be verified some other way (e.g. using a SAT solver). Although Idris 2 makes great strides here, since zero multiplicity annotations on the vector lengths make bailing out a more feasible option.
Instead of
append : Vector n a -> Vector m a -> Vector (n + m) a
I'd prefer just
append : List a -> List a -> List a
and a separate theorem
appendSumsLength : (xs : List a) -> (ys : List a) -> length (append xs ys) = length xs + length ys
Sure it's wordier in the short-term, but I think it pays dividends in longer term maintenance.
To be clear though this is a minority position. Other programmers will argue that putting the size parameter in the data type itself makes a lot of proofs essentially trivial and that at least in the case of sized vectors, the issue of "privileging a single invariant" is less of an issue, because the only defining feature of a polymorphic list is its length. I personally disagree with the merits of this trade-off for most production code, but that's starting to get into the weeds.
I'm a PhD student working on Idris and I used to work with Edwin on this. If you want more info or want to see more practical applications of Idris 2, dependent types, or QTT, here are a couple of links:
- idris2 is written in idris2 so that's a good example of a big program using dependent types https://github.com/idris-lang/Idris2
- we have some very interesting libraries in the works: https://github.com/idris-lang/Idris2/wiki/Libraries
- my own projects in idris include a purely declarative server library where you give the API and the server is completely implemented and instantiated from your description https://gitlab.com/avidela/recombine