Oh god no, this draft is actually pretty good, don't ruin it by infecting it with context. Context is the worst library in Go: it's a pile of hacks to get cancellation and goroutine-local storage by manually passing new heap-allocated objects through every function interface in every library, from low-level IO up to task management. It infects everything, obliterating separation of concerns and making even libraries that should have nothing at all to do with timeouts include code for it. And it includes an untyped key-value store implemented as a linked-list of pairs (!!!!), because why not?
If you can't tell, I don't like context. I've said before [0] I really hope that Go 2 comes up with an actual solution to the cancellation and task-local storage problems and deprecates context. Some comments in that thread pointed to alternatives that looked pretty decent, I wonder what state they're in these days.
I quite like the concept of contexts, I just wish they were implicit rather than explicit. The idea of a function taking place in a dynamic context makes perfect sense (because, after all, it does); the only thing which doesn't really make sense is having to pass it along by hand. Imagine how terrible it would be if we had to explicitly pass the return stack everywhere[0]!
It needs to 'infect' (as you put it) everything because it needs to pervade everything, so a library can pass it through to anything it calls. This is a good argument for making it implicit.
> And it includes an untyped key-value store implemented as a linked-list of pairs (!!!!), because why not?
That is just a cons chain, famous for example as the foundation of the Lisp programming language, and there is nothing wrong with it. Note that it is unfair to say that it is untyped: the values themselves are strongly typed, and the cells too are typed — interface{} (or T in Lisp terms) is still a type!
A cons chain has advantages for inheritance of values in a DAG of calls. It is not necessarily the most efficient, but simplicity is a virtue.
0: Continuation-passing style is both really powerful and well-nigh unreadable, for a reason.
I love contexts. They're a simple solution to a complicated problem. They take up space in function signatures, but they're sooo easy to read and to use.
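For readers coming from other languages, the canonical pattern looks something like this (a minimal, self-contained sketch; fetchUser is a made-up function):

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // fetchUser stands in for any operation that may block; it honours
    // cancellation by selecting on ctx.Done().
    func fetchUser(ctx context.Context, id int) (string, error) {
        select {
        case <-time.After(50 * time.Millisecond): // simulated work
            return fmt.Sprintf("user-%d", id), nil
        case <-ctx.Done():
            return "", ctx.Err()
        }
    }

    func main() {
        // The caller picks the deadline; every callee just threads ctx through.
        ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
        defer cancel()

        name, err := fetchUser(ctx, 42)
        fmt.Println(name, err)
    }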
I'm familiar with the Go team's compatibility plan for "Go 2", or whatever it will be called, which is why I used "deprecate", not "delete". Though I agree that there's no requirement to have a new major version in order to introduce big fixes like this.
Everything old is new again. This challenge is basically why monads exist: the monad allows you to cleanly separate the state in which the algorithm is running from the algorithm itself. Reminds me of: https://philipnilsson.github.io/Badness10k/posts/2017-05-07-...
I wonder what poor abstraction the Go authors will come up with instead?
Every "monadic solution" prints the same code block without explaining how it would work, the types of the various variables, the semantics of the <- operator. I didn't leave the page with an understanding of how monads achieve these tasks.
Monads are just monoids in the category of endofunctors! /s
The issue is pretty much a language barrier. All these articles talking about benefits of / interesting ways to use monads are written by people who speak the language, assuming the reader speaks the language as well. As with many functional programming topics, the fundamentals aren't incredibly easy to wrap your head around. But if you already understood monads, can you imagine how annoying it would be if every resource, discussion, article, etc. relating to them started with a pages-long introduction on What is a Monad?
In Haskell, this is a fundamental topic. Monads are used everywhere. If people always explained how they worked when discussing them, it would be like looking up sorting algorithms and having every algorithm description start with a long-winded explanation of how for loops work.
> But if you already understood monads, can you imagine how annoying it would be if every resource, discussion, article, etc. relating to them started with a pages-long introduction on What is a Monad?
I agree that it would be unreasonable for every article that uses monads to describe what they are. But I don't think it would be unreasonable for every one of them to link to another article that does explain them.
Would you say the same about for loops? Depending on your audience, it's totally reasonable to expect they know certain things. Further, figuring out what to recommend isn't always easy, and I'd usually rather authors put their efforts into presenting the material that they have to share.
I just skimmed but IIUC, the article is a little more of an in-joke than an explanation and does seem to expect its audience to already understand (or maybe be intrigued enough to learn more from other sources?).
That said, I can try to explain here:
Haskell has a bit of syntax available called "do notation". You can write Haskell without it, but it makes some things read better (as a matter of common but not universal opinion).
There's a simple, purely syntactic translation from do notation into regular application of functions. "Syntactic sugar causes cancer of the semicolon." There are four rules, none of which is complicated, and only two of which are relevant here:
First, a single expression is just that expression, nothing magic happens.
do
expr
simply becomes
expr
Next, the arrows:
do
x <- m
... more stuff, which might use x ...
becomes
bind m (\ x -> do
... more stuff, which might use x ...
)
or in a few other syntaxes:
bind(m, x => do ... more stuff, which might use x ...)
(bind m (lambda (x) do ... more stuff, which might use x ...))
m.bind(|x| { do ... more stuff, which might use x ... })
bind(m, [] (auto x) { do ... more stuff, which might use x ... })
That internal do is then expanded recursively.
So to translate the whole block that's repeated throughout the article:
do
a <- getData
b <- getMoreData a
c <- getMoreData b
d <- getEvenMoreData a c
print d
becomes
bind getData (\ a -> do
b <- getMoreData a
c <- getMoreData b
d <- getEvenMoreData a c
print d)
which becomes
bind getData (\ a ->
bind (getMoreData a) (\ b -> do
c <- getMoreData b
d <- getEvenMoreData a c
print d))
which then becomes:
bind getData (\ a ->
bind (getMoreData a) (\ b ->
bind (getMoreData b) (\ c -> do
d <- getEvenMoreData a c
print d)))
and then:
bind getData (\ a ->
bind (getMoreData a) (\ b ->
bind (getMoreData b) (\ c ->
bind (getEvenMoreData a c) (\ d -> do
print d))))
and finally
bind getData (\ a ->
bind (getMoreData a) (\ b ->
bind (getMoreData b) (\ c ->
bind (getEvenMoreData a c) (\ d ->
print d))))
Which is "just" a bunch of chained functions combining lambdas.
So... why bother? and how does it do so many different things? and how does it know which to do? and what does it even do?
The key is that we're overloading "bind", picking the behavior that we want. You could do this in most languages by passing in a choice of function. We could have "do" take a parameter, like
do(bind)
or in an OO language you might hang bind on the objects involved. We often see this for particular instances - for instance, .then for promises/futures.
Haskell does it with a mechanism called "type classes", where you can specify an implementation of an interface for a given type, and the compiler will figure out which implementation to provide. This is very similar to Traits in rust, implicits in Scala, etc. You can usually avoid specifying the types manually because of inference.
So in Haskell, Monad is an interface providing two functions (in real Haskell these are spelled >>= and return/pure, but I'll stick with bind and pure here):
class Monad m where
  bind :: m a -> (a -> m b) -> m b
  pure :: a -> m a
We've already come across `bind`, which lets us operate "inside" a context in a way that combines the contexts.
The other function included, `pure`, takes a value and gives it the minimal possible context. What that means is implied by the "monad laws", which tell us how `bind` and `pure` must interact.
For each type that "is a monad", we tell the Haskell compiler how that type implements the interface:
-- optionality:
instance Monad Maybe where
  bind Nothing  _ = Nothing
  bind (Just x) f = f x
  pure x = Just x
-- state:
newtype State s a = State (s -> (a, s))
instance Monad (State s) where
  -- pure gives the State action that produces x and leaves the state unchanged
  pure x = State (\ s -> (x, s))
  -- bind threads the state through the chained actions
  bind (State f0) f1 = State (\ s ->
    let (x, s') = f0 s
        State f2 = f1 x
    in f2 s')
-- lists:
instance Monad [] where
  pure x = [x]
  -- bind for lists is concatMap
  bind []     _ = []
  bind (x:xs) f = f x ++ bind xs f
... that got long. Please ask any questions you're left with :)
newtype Reader r a = Reader { runReader :: r -> a }
instance Monad (Reader r) where
  pure x = Reader (\ _ -> x)
  m >>= f = Reader (\ r -> runReader (f (runReader m r)) r)
which allows us to define:
getContext :: Reader r r
getContext = Reader (\ r -> r)
and which spares us threading the context manually.
I don't think that's such a big deal, so long as there's only one thing we're threading.
In some languages, we can be generic over "contexts which provide the thing I want, whether they do other things or not", which is sometimes a much bigger win.
Context is like Schrödinger's Cat: you don't know if the Context you're working in is alive or dead (i.e. cancelled, timed out, etc.), but you have to keep passing it forward, one intermediate network call after the next, as long as its status is uncertain. Only when the request fully returns, or is actively cancelled, do you know whether the request is alive or dead.
The Maybe monad deals with the same issue. You have a series of function calls, each of which might or might not be legal, because you never know if you actually have the parameter for the next function call. Maybe you do, maybe you don't.
If you take the naive approach to solving the issue, you pass along Schrödinger's Cat each step along the way. You intertwine the concern of the algorithm you're trying to write with the uncertainty of whether you're carrying a live cat or a dead one. It can work, but it's ugly.
The monadic approach allows you to separate these concerns. You write your algorithm as if you know for a fact that the cat is alive. If the cat were ever to be revealed to be dead along the path, it doesn't matter, the monad separated the consequences of dealing with the live or dead cat from the rest of your algorithm. The rest of your algorithm simply doesn't get run.
Context is the same. You get to write code as if the context is always valid. You don't have to worry about the context being cancelled, or timed out, or anything else. You take all those concerns and deal with them separately, in one place, where they can be neatly handled.
The whole point of the monadic pattern is to propagate state in such a way that it doesn't interfere with the pure algorithm which you're trying to write. You write the pure algorithm separately, and then use it within the monadic context of Context, so to speak.
How would that be helpful? A context deadline is a fixed point in time. You really want the context (and its deadline) to be specified at the operation scope, not at the file or at the entire filesystem level. And you might have a deadline on a read but be willing to wait forever for a close, so you don't really want the deadline to be attached to the file handle, either.
As long as something like `(f FS) WithContext(ctx context.Context) FS` exists and has overwrite/replace semantics, then that could be used to do more narrow scoping of particular requests, no? Ie. `fs.WithContext(ctxT1).Read()` vs `fs.WithContext(ctxT2).Close()`.
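A rough sketch of that wrapper idea, assuming the fs.FS shape from the draft (ctxFS and WithContext are hypothetical names, not part of the proposal):

    package fsctx

    import (
        "context"
        "io/fs"
    )

    // ctxFS binds a context to an underlying filesystem.
    type ctxFS struct {
        inner fs.FS
        ctx   context.Context
    }

    // WithContext returns a copy of fsys bound to ctx. Wrapping an
    // already-wrapped filesystem replaces the previous context
    // (overwrite/replace semantics, as suggested above).
    func WithContext(fsys fs.FS, ctx context.Context) fs.FS {
        if c, ok := fsys.(ctxFS); ok {
            fsys = c.inner
        }
        return ctxFS{inner: fsys, ctx: ctx}
    }

    // Open checks the context before delegating; a fuller implementation
    // would also watch ctx during reads.
    func (c ctxFS) Open(name string) (fs.File, error) {
        if err := c.ctx.Err(); err != nil {
            return nil, &fs.PathError{Op: "open", Path: name, Err: err}
        }
        return c.inner.Open(name)
    }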
Awesome to see some work on filesystem APIs, but oh boy... you can really see the challenge posed by the backwards compatibility promise: aliases in the os package to definitions in io/fs, the duplication of http.FileServer.
Much of the criticism I have seen towards Go has been in regard to the poor design of its filesystem APIs (example: https://news.ycombinator.com/item?id=22443363). So rethinking this might make the language significantly more attractive for some. Especially considering the addition of generics, I am becoming much more interested in the language again.
I’m kind of surprised they don’t mention afero at all, which was one of the first big packages to abstract filesystem access. Shoutout, as it makes unit testing filesystem ops a piece of cake!
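For the curious, a test against afero's in-memory filesystem looks roughly like this (written from memory, so check the package docs for the exact signatures):

    package mypkg

    import (
        "testing"

        "github.com/spf13/afero"
    )

    func TestReadConfig(t *testing.T) {
        // No real disk involved: everything lives in an in-memory Fs.
        memfs := afero.NewMemMapFs()
        if err := afero.WriteFile(memfs, "/etc/app.conf", []byte("debug=true"), 0644); err != nil {
            t.Fatal(err)
        }

        data, err := afero.ReadFile(memfs, "/etc/app.conf")
        if err != nil || string(data) != "debug=true" {
            t.Fatalf("unexpected result: %q, %v", data, err)
        }
    }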
I would like to put it out there that I'm working on a pathlib library that is attempting to solve a lot of the problems that this design draft is addressing.
This (recent but buried) comment suggests replacing FileInfo in the ReadDirFS interface with a DirEntry type, for reasons of performance and future extensibility:
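For illustration, such a DirEntry type might look roughly like this (a sketch, not necessarily what the linked comment proposes); the performance win comes from deferring the full stat behind Info():

    package sketch

    import "io/fs"

    // DirEntry defers the (potentially expensive) stat until it is asked for.
    type DirEntry interface {
        Name() string               // base name of the entry
        IsDir() bool                // usually known from the readdir record itself
        Type() fs.FileMode          // type bits only, no full stat required
        Info() (fs.FileInfo, error) // full FileInfo, possibly requiring a stat
    }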
Every filesystem ever has the concept of file attributes.
The problem is the exact details of what file attributes are supported vary widely from system to system.
The best approach is to support a subset which is reasonably common across platforms – file type[1], modification timestamps, file size in bytes, etc – and an extension mechanism to enable platform-specific attributes (see the sketch after the footnote below).
Java NIO handles this reasonably well with the java.nio.file.attribute package[2] in my opinion. (Not sure how easy it would be to port the concepts of that to Go though.)
IANA has a registry of OS-specific facts (i.e. file attributes) and OS-specific file types[3] – this is for use of FTP MLST and MLSD commands[4] but the registry is rather empty because that RFC doesn't appear to have got much adoption. It is a good idea though.
[1] There is a standard list of file types most platforms support – regular file, directory, link – but there are lots of special file types specific to various platforms (e.g. named pipes, UNIX domain sockets, BSD whiteouts, NTFS junctions), plus some filesystems have different subtypes of regular files or directories. (For example, on IBM z/OS, a "regular file" could be a UNIX file, a VSAM dataset, or a non-VSAM dataset, and the latter two both have several subtypes; similarly, z/OS has UNIX directories, but PDS(E) could also be viewed as a non-UNIX type of directory.)
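In Go terms, the "common subset plus extensions" idea could be expressed with extension interfaces, roughly like this (all names here are hypothetical):

    package sketch

    import "time"

    // Attrs is the portable subset most platforms can supply.
    type Attrs interface {
        Size() int64
        ModTime() time.Time
        IsDir() bool
    }

    // PosixAttrs exposes POSIX-specific metadata; callers discover it via type assertion.
    type PosixAttrs interface {
        Attrs
        Uid() int
        Gid() int
    }

    // WindowsAttrs exposes Windows-specific attribute bits.
    type WindowsAttrs interface {
        Attrs
        FileAttributes() uint32 // e.g. FILE_ATTRIBUTE_HIDDEN
    }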
Maybe they can just put in double-glob (recursive) support in one go, as the embedded files stuff needs it too. Wonder how that will work from a backwards compatibility perspective? Do asterisks have to be escaped to begin with?
Returning large arrays is the kind of bad idiom you see in languages lacking abstract iterator constructs, generators, or lexical closures. Go isn't such a minimalist language; in fact, closures are one area where it excels, a fact often overlooked by those obsessed with generics. It's perplexing why they'd add an interface that accumulates everything into a giant array by default, especially in a context you know will cause performance problems, used by a community with many inexperienced programmers (including ones, e.g. in devops, not accustomed to thinking about these details) who will invariably fall into this trap.
What's annoying is that filepath.Walk(), something that you'd think was the performance-minded approach, also loads all the filenames into a slice to sort it for some reason.
If an API is going to arbitrarily sort filenames, at least let us pass in a comparator function so we have some control. That would actually be a decent improvement to its API.
What is this entire subthread even about? (*os.File).Readdir does not, by default, even compile: the caller has to provide a parameter saying how many entries to read at a time, and reading "all of them" is an intentional choice.
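For example, reading a directory in batches rather than all at once (a minimal sketch; the path is made up):

    package main

    import (
        "fmt"
        "io"
        "log"
        "os"
    )

    func main() {
        f, err := os.Open("/usr/bin") // any large-ish directory
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        for {
            // Readdir(n) with n > 0 returns at most n entries per call,
            // and io.EOF once the directory is exhausted.
            infos, err := f.Readdir(100)
            for _, fi := range infos {
                fmt.Println(fi.Name(), fi.Size())
            }
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
        }
    }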
Ah. Perhaps the issue was ReadDirFS.ReadDir and GlobFS.Glob proposals, both of which are one-shot. I don't write much Go so missed that both os.ReadDir and ReadDirFile.ReadDir support streaming.
>why they'd add an interface that accumulates everything into a giant array by default
While I can't speak for Go's authors' mindset, here's my view:
One array of records is about as simple an interface as possible, and it's good enough for many use cases. Contrast that with reading the directory via repeated readdir() calls or equivalent, which is fraught with hidden pitfalls: you end up being responsible for sorting, for resuming the syscall in case of a signal or other interruption, for handling changes to the list of files during the read... and probably other concerns beyond that.
The suggestion isn't repeated readdir calls, it is some kind of iterator, which would specify and be implemented to solve the problems you mention (which already have to be dealt with inside the array-returning readdir call, by the way).
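One hypothetical shape for such an iterator (not something from the draft, purely illustrative): a caller loops on Next until io.EOF, and the implementation decides how to batch the underlying readdir calls and whether to sort.

    package sketch

    import "io/fs"

    // DirIter yields directory entries one at a time, so memory use stays
    // bounded regardless of directory size.
    type DirIter interface {
        // Next returns the next entry, or io.EOF when the directory is exhausted.
        Next() (fs.DirEntry, error)
        Close() error
    }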
I just tested os.Readdir() on a friend's budget Windows laptop which has a consumer grade SATA SSD running on the slow cygwin over the ancient NTFS format.
os.Readdir() fetched 10,000 file names into a slice in 0.2s.
That's more than fine for most use cases and certainly doesn't look like "one of the stupidest, most broken interfaces I've ever seen".
Especially given that it's not the only way to iterate over the files of a directory in Go. There's filepath.Walk() for one.
> The files are walked in lexical order, which makes the output deterministic but means that for very large directories Walk can be inefficient.
If it's fine for "most use cases," that's all well and good. Just implement the correct interface, which is fine for ALL use cases, and then implement this sugar on top of that.
Go already has an entire namespace devoted to making things easier for "pretty common use cases": ioutil. That's exactly where this version of Readdir belongs.
Otherwise, you'd better be 110% sure of this:
> Even when you know that you're dealing with a small number of files, and performance isn't critical?
Because if you turn out to be wrong down the line, your only option is now to rewrite the whole thing in a less-stupid language. Might as well just save yourself the trouble and do that from the beginning.
And what if I'm searching through a large directory? Your response doesn't address the argument being made.
Creating an API where the entire set of data is returned from a data source, without an option to limit the result, is bad. It means that, if I have to read a directory with 2000 files (and this is quite possible! - think /bin, etc.) the function will have to allocate a slice with 2000 entries, and then call stat on all of them.
Stat is not free, especially on certain filesystems (NFS!), nor is reading directory entries.
Yeah, it seems similar to `ioutil` functions which aren't necessarily the best approach in all cases, but are intended to make certain common tasks trivial for many cases.
You can also use io_uring and at least do all the stats in a single syscall (or none, depending on what flags you create the ring with, but for normal use it would be one call).
Ah yes, forgot about uring. Apparently readdir was "next on the list" at the end of last year: https://twitter.com/mathjock/status/1205996402930855937 But I don't see it in the tree. I'm guessing sockets got more priority.
Let’s not do participation trophies though, Pike and Thompson were the original authors of Go. Both of them should know better.
It’s fine to include an inefficient version if you also have a ‘proper’ way to fall back on when performance does matter. And if performance does not matter, why are you even using Go? Funnily enough, Python, Ruby, and Node do expose performant ways to do this in their stdlibs, so...
Yeah, nope. There are many appropriate levels of abstraction in between “do this the dumbest way possible” and “write raw unix syscalls using a library outside of the standard distribution”.
No, this is literally the dumbest way possible to accomplish the task. It is as close to a brute force approach as you can get, I can see no way in which it could be less efficient without being malicious. And it is the only way of accomplishing this task in the standard library of a language which prides itself on performance and maintainability.
Also, thanks for the ad hominem, perhaps you shouldn’t attach your self worth to a programming language and see how that works out for you.
This is great. I've had challenges related to filesystem abstraction for years. I scoured the go packages on github for a solution, and found afero[1], but at the time that package had been languishing for years without anyone merging PRs, many of which were critical fixes. It's been picking up again recently so maybe it's better now, but at the time I needed a solution so I built abfs[2].
Abfs is conceptually identical to the Go draft design, including the concept of "extension interfaces", although at the fs interface level rather than the file interface level.

One of the problems I ran into early was the need for an easy-to-use file handle. The `net/http` FileSystem interface shows that you must always implement a custom Open function and a custom File; an object that implements `net/http` File is not adequate, it must actually be cast to http.File. Because of this I've found it burdensome to allow flexibility in the definition of the File interface. On the one hand, specific functionality not obviously available on a generic File interface must be looked for using type assertion, and the consequences of not finding it are poorly defined (should we return an error, should we proceed with some workaround, etc.); on the other, functions that return a file handle have many possible choices, making un-anticipated interoperability unlikely.

So instead I opted for always returning an `absfs/absfs` File interface, but returning ErrNotImplemented for functions that are not supported by an implementation. I don't like this, but I like it better than having to do a lot of interface unwrapping to get to the same place, and having it be an error makes the consequences more concrete.
Nevertheless, I'm a million percent behind having a filesystem abstraction in the go standard library. It is immediately useful in testing to redirect potentially costly io operations. It is composable, allowing you to do very little work to wrap a file system with mutexes, timeouts, and other gating mechanisms. It allows you to support a transactional file system by spawning a FileSystem from another FileSystem. Caching, copy on write, layering, and arbitrary data transformations are all much easier to reason about and implement using a prototypical filesystem as the model. Cheers, thanks for this Russ and Rob! From where I'm standing it would be one of the most valuable improvements to the go ecosystem possible at this point!
The best thing about Go is how things are designed with the future in mind, avoiding hype.
Instead of a language with more features than one can count, we have something really concise, yet powerful. This is something I miss a lot in Go, and I think it can perhaps have an even greater impact than generics for most people (at least I feel like this is my case), yet it is only being considered now, after ideas matured in the community and people developed ways around it (see https://github.com/golang/go/issues/35950).
The outcome will probably be something that will be robust and stable and once in the language will likely last a long time unchanged without feeling awkward.
I'm viewing this from outside of the ecosystem, but nothing I've seen from Go externally shows any sort of "thinking ahead", so much as "building what Google needs and wants to have".
Things like no package management, no vendoring, importing modules directly from GitHub without any concept of versioning, the confusing mess that is $GOPATH, the awkward handling of errors, and so on. Lots of things that were fixed, but shouldn't have been broken in the first place.
Go certainly has use and functionality and benefits (goroutines sound awesome), but Go itself seems like kind of an awkward mess when viewed outside of the context of "internal Google tool".
> importing modules directly from GitHub without any concept of versioning
You set the versions in your `go.mod` file.
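A minimal go.mod, for anyone who hasn't seen one (module path and versions are made up):

    module example.com/myapp

    go 1.16

    require (
        github.com/example/logging v1.2.3
        github.com/example/storage v0.4.0
    )

The exact versions are recorded there (and hashed in go.sum), so "importing from GitHub" no longer means "whatever master happens to be today".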
> Go itself seems like kind of an awkward mess when viewed outside of the context of "internal Google tool".
Go is actually really nice from a community standpoint. WebRTC is driven by Google and is magnitudes worse. I deal with a wonderful mix of arrogance and ignorance. I appreciate the great tech they bankroll and try not to let the other stuff bother me.
The fact that Go eventually fixed things doesn't really speak to the top-level comment on this thread that "The best thing about Go is how things are designed with the future in mind, avoiding hype."
That these things needed fixing (especially in cases where previous ways of doing things had to be discarded entirely) seems to go against the idea that there was thinking ahead in those areas.
I think the 'in favor of Go' argument is that the Go authors defer decisions they don't yet know the answer to. They would rather leave something unimplemented than do it the wrong way.
But maybe you are right and they didn't anticipate these issues! I have no idea either way :)
I appreciate the answer "I don't know" when it's the right answer. And I think the Go team is an excellent example of holding steady on "I don't know" until the answer arrives, despite all the flak they've received for taking their time. I also think the way they've decided to go about announcing/framing these recent drafts is smart; they're learning from the past.
You can compare golang's package/version management with .NET's. It's obvious that the latter's designers got kicked in their delicate regions enough in the 90's that they made it a first-class priority from day one.
Probably doesn't help that at google when a package breaks they'll just force some faceless drone to make it work. And since they don't have customers in the traditional sense that works for them.
> Things like no package management, no vendoring, importing modules directly from GitHub without any concept of versioning, the confusing mess that is $GOPATH
You just said "no dependency management" 4 times in a row.
Maybe you should try it... what about that? I bet you will change your mind. Both posts that you have linked are pretty much "meh... I need something to complain about" posts. I've never really been affected by the issues they are screaming so loudly about, and believe me, I did not only write some toy projects [1]
go is mega awesome and I love going back to the projects where I use it. It's a simple language that makes it very easy to write good code. And the library ecosystem is pretty good as well, especially when it comes to networking stuff.
I did. Contrary a point above, I vendored dependencies in Go in 2012 the old school way: copy them into a vendor directory. It worked fine for end-user binaries, and you didn't have to worry about painfully long builds like you can get doing that with C++.
The Go authors have explicitly stated they don't do things until they figure out the right way to do them. No language gets it all right out of the gate, but Go got pretty close with e.g. its standard library.
So he complains about stuff that is already fixed? Shall we now go back to the first version of ffmpeg? Or PHP pre PHP 5? Or Java pre Java 8? Or C++ pre C++11?
Come on.
He should try golang and then come back, instead of relying on blog posts to form his opinion.
The argument being made is that Go's design isn't so much "forward thinking" as it is driven by Google's interests.
What they are not stating is that Go is a conservative language with a standard library hampered by that conservatism, nor are they making a judgment on the current state of the language.
If you are going to criticize someone's post, give them the respect of actually reading their argument, and not just writing a knee-jerk reaction.
It's sad when someone calls a remake of a failed language invented in 1968 a language 'designed with the future in mind', and other people agree with him.
I'm sorry to say, but I like PHP better than Go, and PHP is in the top 5 of languages I use that I hate (most to least: Go, Rust, JS, Java, PHP). You might say those are the most popular languages, but so are my top 3 languages. From least to most liked: D, Zig, C++, Python (3) and C#.
Do people really sing the praises of File? Or, do they curse its many and dangerous rough edges?
Like you I read this proposal through that lens, but I actually don't see a line from Google's File abstraction to this. Is this Go abstraction really forward-compatible with cloud-native filesystems? And when I say forward compatible I mean would it work with Google's decade-old Colossus? I think it isn't because it puts Stat in the required interface and Stat returns FileInfo, many of the fields of which might be meaningless in a cloud filesystem (like mode, which is a unixism, and IsDir, which doesn't make sense for non-hierarchical filesystems).
If you can't tell I'm not much of a fan of trying to abstract over filesystems. Mostly they don't resemble each other at all, except in some extremely high level concepts.
So first, what exactly I meant to say is that:
Go's design most closely resembles Google's internal C++ APIs.
Then, when I saw people praise Go's APIs, I felt that some of the people who worked on those C++ APIs at Google were missing some credit. That feeling derives from the above.
That is not to say that this File design is already being praised (nor that it does not deserve praise), or that it should be criticized (nor that it does not have problems).
As for your example contradicting my claim, that "Stat returns FileInfo, many of the fields of which might be meaningless in a cloud filesystem":
1. This has to be there because a File API has to be compatible with OS files (Unixy), and Google's API does the same thing.
2. Incompatibility comes from requiring information that is not present in another scenario. The mere presence of such attributes does not prevent the API from being applied to cloud file systems.
And further, cloud file systems do have modes and directories...
IsDir can return false for filesystems that don't have that concept, which would of course be the correct value in that case. Mode and ModTime are more concerning, but practically, FileMode(0) and time.Time{} can be used respectively in implementations that have no use for them.
In terms of the ideal of preferring types as a means to constrain functionality, I think FileInfo fields should have been expressed as granularly as possible using extension interfaces in this case, but practically I'd prefer putting some placeholder values in FileInfo for missing functionality over the clunky way these extensions have to be implemented in Go.
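For instance, a FileInfo for a flat object store might fill the unsupported fields with placeholders, roughly like this (objectInfo is a made-up type):

    package sketch

    import (
        "io/fs"
        "time"
    )

    // objectInfo implements fs.FileInfo for a backend with no modes or mtimes.
    type objectInfo struct {
        name string
        size int64
    }

    func (o objectInfo) Name() string       { return o.name }
    func (o objectInfo) Size() int64        { return o.size }
    func (o objectInfo) Mode() fs.FileMode  { return 0 }           // no meaningful mode
    func (o objectInfo) ModTime() time.Time { return time.Time{} } // no meaningful mtime
    func (o objectInfo) IsDir() bool        { return false }       // flat namespace
    func (o objectInfo) Sys() interface{}   { return nil }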
Maybe Ken Thompson and Rob Pike worked on those C++ APIs, too. They certainly don't like C++, but they also had to use it, and created Go out of frustration.
A network file system could use the same interfaces plus timeout settings.