Oh god no, this draft is actually pretty good, don't ruin it by infecting it with context. Context is the worst library in Go: it's a pile of hacks to get cancellation and goroutine-local storage by manually passing new heap-allocated objects through every function interface in every library, from low-level IO up to task management. It infects everything, obliterating separation of concerns and making even libraries that should have nothing at all to do with timeouts include code for it. And it includes an untyped key-value store implemented as a linked-list of pairs (!!!!), because why not?
If you can't tell, I don't like context. I've said before [0] I really hope that Go 2 comes up with an actual solution to the cancellation and task-local storage problems and deprecates context. Some comments in that thread pointed to alternatives that looked pretty decent, I wonder what state they're in these days.
I quite like the concept of contexts, I just wish they were implicit rather than explicit. The idea of a function taking place in a dynamic context makes perfect sense (because, after all, it does); the only thing which doesn't really make sense is having to pass it along by hand. Imagine how terrible it would be if we had to explicitly pass the return stack everywhere[0]!
It needs to 'infect' (as you put it) everything because it needs to pervade everything, so a library can pass it through to anything it calls. This is a good argument for making it implicit.
> And it includes an untyped key-value store implemented as a linked-list of pairs (!!!!), because why not?
That is just a cons chain, famous for example as the foundation of the Lisp programming language, and there is nothing wrong with it. Note that it is unfair to say that it is untyped: the values themselves are strongly typed, and the cells too are typed — interface{} (or T in Lisp terms) is still a type!
A cons chain has advantages for inheritance of values in a DAG of calls. It is not necessarily the most efficient, but simplicity is a virtue.
0: Continuation-passing style is both really powerful and well-nigh unreadable, for a reason.
I love contexts. They're a simple solution to a complicated problem. They take up space in function signatures, but they're sooo easy to read and to use.
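For readers coming from other languages, the canonical pattern looks something like this (a minimal, self-contained sketch; fetchUser is a made-up function):

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // fetchUser stands in for any operation that may block; it honours
    // cancellation by selecting on ctx.Done().
    func fetchUser(ctx context.Context, id int) (string, error) {
        select {
        case <-time.After(50 * time.Millisecond): // simulated work
            return fmt.Sprintf("user-%d", id), nil
        case <-ctx.Done():
            return "", ctx.Err()
        }
    }

    func main() {
        // The caller picks the deadline; every callee just threads ctx through.
        ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
        defer cancel()

        name, err := fetchUser(ctx, 42)
        fmt.Println(name, err)
    }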
I'm familiar with the Go team's compatibility plan for "Go 2", or whatever it will be called, which is why I used "deprecate", not "delete". Though I agree that there's no requirement to have a new major version in order to introduce big fixes like this.
Everything old is new again. This challenge is basically why monads exist: the monad allows you to cleanly separate the state in which the algorithm is running from the algorithm itself. Reminds me of: https://philipnilsson.github.io/Badness10k/posts/2017-05-07-...
I wonder what poor abstraction the Go authors will come up with instead?
Every "monadic solution" prints the same code block without explaining how it would work, the types of the various variables, the semantics of the <- operator. I didn't leave the page with an understanding of how monads achieve these tasks.
Monads are just monoids in the category of endofunctors! /s
The issue is pretty much a language barrier. All these articles talking about benefits of / interesting ways to use monads are written by people who speak the language, assuming the reader speaks the language as well. As with many functional programming topics, the fundamentals aren't incredibly easy to wrap your head around. But if you already understood monads, can you imagine how annoying it would be if every resource, discussion, article, etc. relating to them started with a pages-long introduction on What is a Monad?
In Haskell, this is a fundamental topic. Monads are used everywhere. If people always explained how they worked when discussing them, it would be like looking up sorting algorithms and having every algorithm description start with a long-winded explanation of how for loops work.
> But if you already understood monads, can you imagine how annoying it would be if every resource, discussion, article, etc. relating to them started with a pages-long introduction on What is a Monad?
I agree that it would be unreasonable for every article that uses monads to describe what they are. But I don't think it would be unreasonable for every one of them to link to another article that does explain them.
Would you say the same about for loops? Depending on your audience, it's totally reasonable to expect they know certain things. Further, figuring out what to recommend isn't always easy, and I'd usually rather authors put their efforts into presenting the material that they have to share.
I just skimmed but IIUC, the article is a little more of an in-joke than an explanation and does seem to expect its audience to already understand (or maybe be intrigued enough to learn more from other sources?).
That said, I can try to explain here:
Haskell has a bit of syntax available called "do notation". You can write Haskell without it, but it makes some things read better (as a matter of common but not universal opinion).
There's a simple, purely syntactic translation from do notation into regular application of functions. "Syntactic sugar causes cancer of the semicolon." There are four rules, none of which is complicated, and only two of which are relevant here:
First, a single expression is just that expression, nothing magic happens.
do
expr
simply becomes
expr
Next, the arrows:
do
x <- m
... more stuff, which might use x ...
becomes
bind m (\ x -> do
... more stuff, which might use x ...
)
or in a few other syntaxes:
bind(m, x => do ... more stuff, which might use x ...)
(bind m (lambda (x) do ... more stuff, which might use x ...))
m.bind(|x| { do ... more stuff, which might use x ... })
bind(m, [] (auto x) { do ... more stuff, which might use x ... })
That internal do is then expanded recursively.
So to translate the whole block that's repeated throughout the article:
do
a <- getData
b <- getMoreData a
c <- getMoreData b
d <- getEvenMoreData a c
print d
becomes
bind getData (\ a -> do
b <- getMoreData a
c <- getMoreData b
d <- getEvenMoreData a c
print d)
which becomes
bind getData (\ a ->
bind (getMoreData a) (\ b -> do
c <- getMoreData b
d <- getEvenMoreData a c
print d))
which then becomes:
bind getData (\ a ->
bind (getMoreData a) (\ b ->
bind (getMoreData b) (\ c -> do
d <- getEvenMoreData a c
print d)))
and then:
bind getData (\ a ->
bind (getMoreData a) (\ b ->
bind (getMoreData b) (\ c ->
bind (getEvenMoreData a c) (\ d -> do
print d))))
and finally
bind getData (\ a ->
bind (getMoreData a) (\ b ->
bind (getMoreData b) (\ c ->
bind (getEvenMoreData a c) (\ d ->
print d))))
Which is "just" a bunch of chained functions combining lambdas.
So... why bother? and how does it do so many different things? and how does it know which to do? and what does it even do?
The key is that we're overloading "bind", picking the behavior that we want. You could do this in most languages by passing in a choice of function. We could have "do" take a parameter, like
do(bind)
or in an OO language you might hang bind on the objects involved. We often see this for particular instances - for instance, .then for promises/futures.
Haskell does it with a mechanism called "type classes", where you can specify an implementation of an interface for a given type, and the compiler will figure out which implementation to provide. This is very similar to Traits in rust, implicits in Scala, etc. You can usually avoid specifying the types manually because of inference.
So in Haskell, Monad is an interface providing two functions (in real Haskell these are spelled >>= and return/pure, but I'll stick with bind and pure here):
class Monad m where
  bind :: m a -> (a -> m b) -> m b
  pure :: a -> m a
We've already come across `bind`, which lets us operate "inside" a context in a way that combines the contexts.
The other function included, `pure`, takes a value and gives it the minimal possible context. What that means is implied by the "monad laws", which tell us how `bind` and `pure` must interact.
For each type that "is a monad", we tell the Haskell compiler how that type implements the interface:
-- optionality:
instance Monad Maybe where
  bind Nothing  _ = Nothing
  bind (Just x) f = f x
  pure x = Just x
-- state:
newtype State s a = State (s -> (a, s))
instance Monad (State s) where
  -- pure gives the State action that produces x and leaves the state unchanged
  pure x = State (\ s -> (x, s))
  -- bind threads the state through the chained actions
  bind (State f0) f1 = State (\ s ->
    let (x, s') = f0 s
        State f2 = f1 x
    in f2 s')
-- lists:
instance Monad [] where
  pure x = [x]
  -- bind for lists is concatMap
  bind []     _ = []
  bind (x:xs) f = f x ++ bind xs f
... that got long. Please ask any questions you're left with :)
newtype Reader r a = Reader { runReader :: r -> a }
instance Monad (Reader r) where
  pure x = Reader (\ _ -> x)
  m >>= f = Reader (\ r -> runReader (f (runReader m r)) r)
which allows us to define:
getContext :: Reader r r
getContext = Reader (\ r -> r)
and which spares us threading the context manually.
I don't think that's such a big deal, so long as there's only one thing we're threading.
In some languages, we can be generic over "contexts which provide the thing I want, whether they do other things or not", which is sometimes a much bigger win.
Context is like Schrödinger's Cat: you don't know if the Context you're working in is alive or dead (i.e. cancelled, timed out, etc.), but you have to keep passing it forward, one intermediate network call after the next, as long as its status is uncertain. Only when the request fully returns, or is actively cancelled, do you know whether the request is alive or dead.
The Maybe monad deals with the same issue. You have a series of function calls, each of which might or might not be legal, because you never know if you actually have the parameter for the next function call. Maybe you do, maybe you don't.
If you take the naive approach to solving the issue, you pass along Schrödinger's Cat each step along the way. You intertwine the concern of the algorithm you're trying to write with the uncertainty of whether you're carrying a live cat or a dead one. It can work, but it's ugly.
The monadic approach allows you to separate these concerns. You write your algorithm as if you know for a fact that the cat is alive. If the cat were ever to be revealed to be dead along the path, it doesn't matter, the monad separated the consequences of dealing with the live or dead cat from the rest of your algorithm. The rest of your algorithm simply doesn't get run.
Context is the same. You get to write code as if the context is always valid. You don't have to worry about the context being cancelled, or timed out, or anything else. You take all those concerns and deal with them separately, in one place, where they can be neatly handled.
The whole point of the monadic pattern is to propagate state in such a way that it doesn't interfere with the pure algorithm which you're trying to write. You write the pure algorithm separately, and then use it within the monadic context of Context, so to speak.
How would that be helpful? A context deadline is a fixed point in time. You really want the context (and its deadline) to be specified at the operation scope, not at the file or at the entire filesystem level. And you might have a deadline on a read but be willing to wait forever for a close, so you don't really want the deadline to be attached to the file handle, either.
As long as something like `(f FS) WithContext(ctx context.Context) FS` exists and has overwrite/replace semantics, then that could be used to do more narrow scoping of particular requests, no? Ie. `fs.WithContext(ctxT1).Read()` vs `fs.WithContext(ctxT2).Close()`.
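A rough sketch of that wrapper idea, assuming the fs.FS shape from the draft (ctxFS and WithContext are hypothetical names, not part of the proposal):

    package fsctx

    import (
        "context"
        "io/fs"
    )

    // ctxFS binds a context to an underlying filesystem.
    type ctxFS struct {
        inner fs.FS
        ctx   context.Context
    }

    // WithContext returns a copy of fsys bound to ctx. Wrapping an
    // already-wrapped filesystem replaces the previous context
    // (overwrite/replace semantics, as suggested above).
    func WithContext(fsys fs.FS, ctx context.Context) fs.FS {
        if c, ok := fsys.(ctxFS); ok {
            fsys = c.inner
        }
        return ctxFS{inner: fsys, ctx: ctx}
    }

    // Open checks the context before delegating; a fuller implementation
    // would also watch ctx during reads.
    func (c ctxFS) Open(name string) (fs.File, error) {
        if err := c.ctx.Err(); err != nil {
            return nil, &fs.PathError{Op: "open", Path: name, Err: err}
        }
        return c.inner.Open(name)
    }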
Awesome to see some work on filesystem APIs, but oh boy... you can really see the challenge posed by the backwards compatibility promise: aliases in the os package to definitions in io/fs, the duplication of http.FileServer.
Much of the criticism I have seen towards Go has been in regard to the poor design of its filesystem APIs (example: https://news.ycombinator.com/item?id=22443363). So rethinking this might make the language significantly more attractive for some. Especially considering the addition of generics, I am becoming much more interested in the language again.
I’m kind of surprised they don’t mention afero at all, which was one of the first big packages to abstract filesystem access. Shoutout, as it makes unit testing filesystem ops a piece of cake!
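For the curious, a test against afero's in-memory filesystem looks roughly like this (written from memory, so check the package docs for the exact signatures):

    package mypkg

    import (
        "testing"

        "github.com/spf13/afero"
    )

    func TestReadConfig(t *testing.T) {
        // No real disk involved: everything lives in an in-memory Fs.
        memfs := afero.NewMemMapFs()
        if err := afero.WriteFile(memfs, "/etc/app.conf", []byte("debug=true"), 0644); err != nil {
            t.Fatal(err)
        }

        data, err := afero.ReadFile(memfs, "/etc/app.conf")
        if err != nil || string(data) != "debug=true" {
            t.Fatalf("unexpected result: %q, %v", data, err)
        }
    }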
I would like to put it out there that I'm working on a pathlib library that is attempting to solve a lot of the problems that this design draft is addressing.
This (recent but buried) comment suggests replacing FileInfo in the ReadDirFS interface with a DirEntry type, for reasons of performance and future extensibility:
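For illustration, such a DirEntry type might look roughly like this (a sketch, not necessarily what the linked comment proposes); the performance win comes from deferring the full stat behind Info():

    package sketch

    import "io/fs"

    // DirEntry defers the (potentially expensive) stat until it is asked for.
    type DirEntry interface {
        Name() string               // base name of the entry
        IsDir() bool                // usually known from the readdir record itself
        Type() fs.FileMode          // type bits only, no full stat required
        Info() (fs.FileInfo, error) // full FileInfo, possibly requiring a stat
    }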
Every filesystem ever has the concept of file attributes.
The problem is the exact details of what file attributes are supported vary widely from system to system.
The best approach is to support a subset which is reasonably common across platforms – file type[1], modification timestamps, file size in bytes, etc – and an extension mechanism to enable platform-specific attributes (see the sketch after the footnote below).
Java NIO handles this reasonably well with the java.nio.file.attribute package[2] in my opinion. (Not sure how easy it would be to port the concepts of that to Go though.)
IANA has a registry of OS-specific facts (i.e. file attributes) and OS-specific file types[3] – this is for use of FTP MLST and MLSD commands[4] but the registry is rather empty because that RFC doesn't appear to have got much adoption. It is a good idea though.
[1] There is a standard list of file types most platforms support – regular file, directory, link – but there are lots of special file types specific to various platforms (e.g. named pipes, UNIX domain sockets, BSD whiteouts, NTFS junctions), plus some filesystems have different subtypes of regular files or directories. (For example, on IBM z/OS, a "regular file" could be a UNIX file, a VSAM dataset, or a non-VSAM dataset, and the latter two both have several subtypes; similarly, z/OS has UNIX directories, but PDS(E) could also be viewed as a non-UNIX type of directory.)
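In Go terms, the "common subset plus extensions" idea could be expressed with extension interfaces, roughly like this (all names here are hypothetical):

    package sketch

    import "time"

    // Attrs is the portable subset most platforms can supply.
    type Attrs interface {
        Size() int64
        ModTime() time.Time
        IsDir() bool
    }

    // PosixAttrs exposes POSIX-specific metadata; callers discover it via type assertion.
    type PosixAttrs interface {
        Attrs
        Uid() int
        Gid() int
    }

    // WindowsAttrs exposes Windows-specific attribute bits.
    type WindowsAttrs interface {
        Attrs
        FileAttributes() uint32 // e.g. FILE_ATTRIBUTE_HIDDEN
    }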
Maybe they can just put in double-glob (recursive) support in one go, as the embedded files stuff needs it too. Wonder how that will work from a backwards compatibility perspective? Do asterisks have to be escaped to begin with?
Returning large arrays is the kind of bad idiom you see in languages lacking abstract iterator constructs, generators, or lexical closures. Go isn't such a minimalist language; in fact, closures are one area where it excels, a fact often overlooked by those obsessed with generics. It's perplexing why they'd add an interface that accumulates everything into a giant array by default, especially in a context you know will cause performance problems, used by a community with many inexperienced programmers (including ones, e.g. in devops, not accustomed to thinking about these details) who will invariably fall into this trap.
What's annoying is that filepath.Walk(), something that you'd think was the performance-minded approach, also loads all the filenames into a slice to sort it for some reason.
If an API is going to arbitrarily sort filenames, at least let us pass in a comparator function so we have some control. That would actually be a decent improvement to its API.
What is this entire subthread even about? (*os.File).Readdir does not, by default, even compile: the caller has to provide a parameter saying how many entries to read at a time, and reading "all of them" is an intentional choice.
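For example, reading a directory in batches rather than all at once (a minimal sketch; the path is made up):

    package main

    import (
        "fmt"
        "io"
        "log"
        "os"
    )

    func main() {
        f, err := os.Open("/usr/bin") // any large-ish directory
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        for {
            // Readdir(n) with n > 0 returns at most n entries per call,
            // and io.EOF once the directory is exhausted.
            infos, err := f.Readdir(100)
            for _, fi := range infos {
                fmt.Println(fi.Name(), fi.Size())
            }
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
        }
    }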
Ah. Perhaps the issue was ReadDirFS.ReadDir and GlobFS.Glob proposals, both of which are one-shot. I don't write much Go so missed that both os.ReadDir and ReadDirFile.ReadDir support streaming.
>why they'd add an interface that accumulates everything into a giant array by default
While I can't speak for Go's authors' mindset, here's my view:
One array of records is about as simple an interface as possible, and it's good enough for many use cases. Contrast that with reading the directory via repeated readdir() calls or equivalent, which is fraught with hidden pitfalls: you end up being responsible for sorting, for resuming the syscall in case of a signal or other interruption, for handling changes to the list of files during the read... and probably other concerns beyond that.
The suggestion isn't repeated readdir calls, it is some kind of iterator, which would specify and be implemented to solve the problems you mention (which already have to be dealt with inside the array-returning readdir call, by the way).
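One hypothetical shape for such an iterator (not something from the draft, purely illustrative): a caller loops on Next until io.EOF, and the implementation decides how to batch the underlying readdir calls and whether to sort.

    package sketch

    import "io/fs"

    // DirIter yields directory entries one at a time, so memory use stays
    // bounded regardless of directory size.
    type DirIter interface {
        // Next returns the next entry, or io.EOF when the directory is exhausted.
        Next() (fs.DirEntry, error)
        Close() error
    }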
I just tested os.Readdir() on a friend's budget Windows laptop which has a consumer grade SATA SSD running on the slow cygwin over the ancient NTFS format.
os.Readdir() fetched 10,000 file names into a slice in 0.2s.
That's more than fine for most use cases and certainly doesn't look like "one of the stupidest, most broken interfaces I've ever seen".
Especially given that it's not the only way to iterate over the files of a directory in Go. There's filepath.Walk() for one.
> The files are walked in lexical order, which makes the output deterministic but means that for very large directories Walk can be inefficient.
If it's fine for "most use cases," that's all well and good. Just implement the correct interface, which is fine for ALL use cases, and then implement this sugar on top of that.
Go already has an entire namespace devoted to making things easier for "pretty common use cases": ioutil. That's exactly where this version of Readdir belongs.
Otherwise, you'd better be 110% sure of this:
> Even when you know that you're dealing with a small number of files, and performance isn't critical?
Because if you turn out to be wrong down the line, your only option is now to rewrite the whole thing in a less-stupid language. Might as well just save yourself the trouble and do that from the beginning.
And what if I'm searching through a large directory? Your response doesn't address the argument being made.
Creating an API where the entire set of data is returned from a data source, without an option to limit the result, is bad. It means that, if I have to read a directory with 2000 files (and this is quite possible! - think /bin, etc.) the function will have to allocate a slice with 2000 entries, and then call stat on all of them.
Stat is not free, especially on certain filesystems (NFS!), nor is reading directory entries.
Yeah, it seems similar to `ioutil` functions which aren't necessarily the best approach in all cases, but are intended to make certain common tasks trivial for many cases.
You can also use io_uring and at least do all the stats in a single syscall (or none, depending on what flags you create the ring with, but for normal use it would be one call).
Ah yes, forgot about uring. Apparently readdir was "next on the list" at the end of last year: https://twitter.com/mathjock/status/1205996402930855937 But I don't see it in the tree. I'm guessing sockets got more priority.
Let’s not do participation trophies though, Pike and Thompson were the original authors of Go. Both of them should know better.
It’s fine to include an inefficient version if you also have a ‘proper’ way to fall back on when performance does matter. And if performance does not matter, why are you even using Go? Funnily enough, Python, Ruby, and Node do expose performant ways to do this in their stdlibs, so...
Yeah, nope. There are many appropriate levels of abstraction in between “do this the dumbest way possible” and “write raw unix syscalls using a library outside of the standard distribution”.
No, this is literally the dumbest way possible to accomplish the task. It is as close to a brute force approach as you can get, I can see no way in which it could be less efficient without being malicious. And it is the only way of accomplishing this task in the standard library of a language which prides itself on performance and maintainability.
Also, thanks for the ad hominem, perhaps you shouldn’t attach your self worth to a programming language and see how that works out for you.
This is great. I've had challenges related to filesystem abstraction for years. I scoured the go packages on github for a solution, and found afero[1], but at the time that package had been languishing for years without anyone merging PRs, many of which were critical fixes. It's been picking up again recently so maybe it's better now, but at the time I needed a solution so I built abfs[2].
Abfs is conceptually identical to the Go draft design, including the concept of "extension interfaces", although at the fs interface level rather than the file interface level.

One of the problems I ran into early was the need for an easy-to-use file handle. The `net/http` FileSystem interface shows that you must always implement a custom Open function and a custom File; an object that implements `net/http` File is not adequate, it must actually be cast to http.File. Because of this I've found it burdensome to allow flexibility in the definition of the File interface. On the one hand, specific functionality not obviously available on a generic File interface must be looked for using type assertion, and the consequences of not finding it are poorly defined (should we return an error, should we proceed with some workaround, etc.); on the other, functions that return a file handle have many possible choices, making un-anticipated interoperability unlikely.

So instead I opted for always returning an `absfs/absfs` File interface, but returning ErrNotImplemented for functions that are not supported by an implementation. I don't like this, but I like it better than having to do a lot of interface unwrapping to get to the same place, and having it be an error makes the consequences more concrete.
Nevertheless, I'm a million percent behind having a filesystem abstraction in the go standard library. It is immediately useful in testing to redirect potentially costly io operations. It is composable, allowing you to do very little work to wrap a file system with mutexes, timeouts, and other gating mechanisms. It allows you to support a transactional file system by spawning a FileSystem from another FileSystem. Caching, copy on write, layering, and arbitrary data transformations are all much easier to reason about and implement using a prototypical filesystem as the model. Cheers, thanks for this Russ and Rob! From where I'm standing it would be one of the most valuable improvements to the go ecosystem possible at this point!
The best thing about Go is how things are designed with the future in mind, avoiding hype.
Instead of a language with more features than one can count, we have something really concise, yet powerful. This is something I miss a lot in Go, and I think it can perhaps have an even greater impact than generics for most people (at least I feel like this is my case), yet it is only being considered now, after ideas matured in the community and people developed ways around it (see https://github.com/golang/go/issues/35950).
The outcome will probably be something that will be robust and stable and once in the language will likely last a long time unchanged without feeling awkward.
I'm viewing this from outside of the ecosystem, but nothing I've seen from Go externally shows any sort of "thinking ahead", so much as "building what Google needs and wants to have".
Things like no package management, no vendoring, importing modules directly from GitHub without any concept of versioning, the confusing mess that is $GOPATH, the awkward handling of errors, and so on. Lots of things that were fixed, but shouldn't have been broken in the first place.
Go certainly has use and functionality and benefits (goroutines sound awesome), but Go itself seems like kind of an awkward mess when viewed outside of the context of "internal Google tool".
> importing modules directly from GitHub without any concept of versioning
You set the versions in your `go.mod` file.
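A minimal go.mod, for anyone who hasn't seen one (module path and versions are made up):

    module example.com/myapp

    go 1.16

    require (
        github.com/example/logging v1.2.3
        github.com/example/storage v0.4.0
    )

The exact versions are recorded there (and hashed in go.sum), so "importing from GitHub" no longer means "whatever master happens to be today".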
> Go itself seems like kind of an awkward mess when viewed outside of the context of "internal Google tool".
Go is actually really nice from a community standpoint. WebRTC is driven by Google and is magnitudes worse. I deal with a wonderful mix of arrogance and ignorance. I appreciate the great tech they bankroll and try not to let the other stuff bother me.
The fact that Go eventually fixed things doesn't really speak to the top-level comment on this thread that "The best thing about Go is how things are designed with the future in mind, avoiding hype."
That these things needed fixing (especially in cases where previous ways of doing things had to be discarded entirely) seems to go against the idea that there was thinking ahead in those areas.
I think the 'in favor of Go' argument is that the Go authors defer decisions they don't yet know the answer to. They would rather leave something unimplemented than do it the wrong way.
But maybe you are right and they didn't anticipate these issues! I have no idea either way :)
I appreciate the answer "I don't know" when it's the right answer. And I think the Go team is an excellent example of holding steady on "I don't know" until the answer arrives, despite all the flak they've received for taking their time. I also think the way they've decided to go about announcing/framing these recent drafts is smart; they're learning from the past.
You can compare golang's package/version management with .NET's. It's obvious that the latter's designers got kicked in their delicate regions enough in the 90's that they made it a first-class priority from day one.
Probably doesn't help that at google when a package breaks they'll just force some faceless drone to make it work. And since they don't have customers in the traditional sense that works for them.
> Things like no package management, no vendoring, importing modules directly from GitHub without any concept of versioning, the confusing mess that is $GOPATH
You just said "no dependency management" 4 times in a row.
Maybe you should try it... what about that? I bet you will change your mind. Both posts that you have linked are pretty much "meh... I need something to complain about" posts. I've never really been affected by the issues they are screaming so loudly about, and believe me, I did not only write some toy projects [1]
go is mega awesome and I love going back to the projects where I use it. It's a simple language that makes it very easy to write good code. And the library ecosystem is pretty good as well, especially when it comes to networking stuff.
I did. Contrary a point above, I vendored dependencies in Go in 2012 the old school way: copy them into a vendor directory. It worked fine for end-user binaries, and you didn't have to worry about painfully long builds like you can get doing that with C++.
The Go authors have explicitly stated they don't do things until they figure out the right way to do them. No language gets it all right out of the gate, but Go got pretty close with e.g. its standard library.
So he complains about stuff that is already fixed? Shall we now go back to the first version of ffmpeg? Or PHP pre PHP 5? Or Java pre Java 8? Or C++ pre C++11?
Come on.
He should try golang and then come back, instead of relying on blog posts to form his opinion.
The argument being made is that Go's design isn't so much "forward thinking" as it is driven by Google's interests.
What they are not stating is that Go is a conservative language with a standard library hampered by that conservatism, nor are they making a judgment on the current state of the language.
If you are going to criticize someone's post, give them the respect of actually reading their argument, and not just writing a knee-jerk reaction.
It's sad when someone calls a remake of a failed language invented in 1968 a language 'designed with the future in mind', and other people agree with him.
I'm sorry to say, but I like PHP better than Go, and PHP is in the top 5 of languages I use that I hate (most to least: Go, Rust, JS, Java, PHP). You might say those are the most popular languages, but so are my top 3 languages. From least to most liked: D, Zig, C++, Python (3) and C#.
Do people really sing the praises of File? Or, do they curse its many and dangerous rough edges?
Like you I read this proposal through that lens, but I actually don't see a line from Google's File abstraction to this. Is this Go abstraction really forward-compatible with cloud-native filesystems? And when I say forward compatible I mean would it work with Google's decade-old Colossus? I think it isn't because it puts Stat in the required interface and Stat returns FileInfo, many of the fields of which might be meaningless in a cloud filesystem (like mode, which is a unixism, and IsDir, which doesn't make sense for non-hierarchical filesystems).
If you can't tell I'm not much of a fan of trying to abstract over filesystems. Mostly they don't resemble each other at all, except in some extremely high level concepts.
So first, what exactly I meant to say is that:
Go's design most closely resembles Google's internal C++ APIs.
Then, when I saw people praise Go's APIs, I felt that some of the people who worked on those C++ APIs at Google were missing some credit. That feeling derives from the above.
That is not to say that this File design is already being praised (nor that it does not deserve praise), or that it should be criticized (nor that it does not have problems).
As for your example contradicting my claim, that "Stat returns FileInfo, many of the fields of which might be meaningless in a cloud filesystem":
1. This has to be there because a File API has to be compatible with OS files (Unixy), and Google's API does the same thing.
2. Incompatibility comes from requiring information that is not present in another scenario. The mere presence of such attributes does not prevent the API from being applied to cloud file systems.
And further, cloud file systems do have modes and directories...
IsDir can return false for filesystems that don't have that concept, which would of course be the correct value in that case. Mode and ModTime are more concerning, but practically, FileMode(0) and time.Time{} can be used respectively in implementations that have no use for them.
In terms of the ideal of preferring types as a means to constrain functionality, I think FileInfo fields should have been expressed as granularly as possible using extension interfaces in this case, but practically I'd prefer putting some placeholder values in FileInfo for missing functionality over the clunky way these extensions have to be implemented in Go.
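For instance, a FileInfo for a flat object store might fill the unsupported fields with placeholders, roughly like this (objectInfo is a made-up type):

    package sketch

    import (
        "io/fs"
        "time"
    )

    // objectInfo implements fs.FileInfo for a backend with no modes or mtimes.
    type objectInfo struct {
        name string
        size int64
    }

    func (o objectInfo) Name() string       { return o.name }
    func (o objectInfo) Size() int64        { return o.size }
    func (o objectInfo) Mode() fs.FileMode  { return 0 }           // no meaningful mode
    func (o objectInfo) ModTime() time.Time { return time.Time{} } // no meaningful mtime
    func (o objectInfo) IsDir() bool        { return false }       // flat namespace
    func (o objectInfo) Sys() interface{}   { return nil }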
Maybe Ken Thompson and Rob Pike worked on those C++ APIs, too. They certainly don't like C++, but they also had to use it, and created Go out of frustration.
A network file system could use the same interfaces plus timeout settings.