> A quick look at the code implies that various data structures are used under the hood for what looks like one data structure in the language. That's a lot of complexity! I'm not sure that's a tradeoff I'd be happy to make. It makes it harder to reason about performance.
I don't think that's true. A persistent array map is a single data structure. It's just a persistent data structure optimized for non-destructive (functional) updates. On a high-level it's not too different from finger trees which may be more famous. When I used to write Clojure, I didn't feel any need to reason about its internals for performance reasons or any other reasons. It simply feels natural.
The actual details of how the underlying data structures work aren't that hard to grasp (see: https://youtu.be/wASCH_gPnDw?t=1817), but in practice you don't even need to go that far.
At a high level there are really just a few things you need to keep in mind, the biggest being the big-O guarantees of the core data structure interfaces, which are listed here: https://clojure.org/reference/data_structures. Those guarantees hold across the different underlying implementations, and there are plenty of things you can tune to get a 2x, 10x, or 50x improvement in performance before resorting to anything that depends on which specific data structure is being used under the hood.
Isn't it similar to what happens in other languages?
For example, in C++ std::maps are typically implemented as red-black trees, std::vectors allocate 50% (sometimes 100%) more memory than requested when they grow, etc. All of that happens under the bonnet in order to fulfill the performance guarantees of the language.
IIRC, one of the most widely quoted memes in the Clojure space goes along the lines of "simple != easy". Or better: (not= :simple :easy)
Traditionally, most C++ implementations use red-black trees, but the standard doesn't require it; all the data structure requirements are described via big-O notation.
Kind of: if tomorrow someone comes up with funky zombie trees with the same big-O characteristics, or even better ones, they can ship them in their C++ standard library.
And then some clever C++ developers whose code depends on red-black tree behaviours would cry that their cheese was moved and their code doesn't work as before, even though nothing prevented funky zombie trees from being shipped instead.
I think the author's referring to the multiple implementations of persistent maps (i.e. APersistentMap). Clojure's generated docs[1] list four implementations (PersistentArrayMap, PersistentHashMap, PersistentStructMap, PersistentTreeMap), and this Stack Overflow answer[2] explains that the latter two are rarely used; the former two make up the bulk of actual APersistentMap instances. The language's core functions for persistent maps switch the implementation used under the hood as they see fit for performance: basically, a PersistentArrayMap if the number of keys is small, and a PersistentHashMap otherwise. You can see this, for example, in PersistentArrayMap's implementation of assoc[3], which replaces itself with a PersistentHashMap once the number of keys has grown over the hardcoded threshold of 16.
So like you say, your experience as an "end user" makes it seem like just one data structure, at least in interface, though in implementation it's doing some performance optimization which, I'll grudgingly grant the author, does make performance harder to reason about in principle. Though in my (very unserious) experience, I've enjoyed languages that take a possibly incomplete stab at getting performance right over ones that draw a line leaving it entirely out of scope and declare victory.
I'll qualify that a little and share that I have been bitten by these internals before, though the fault was mine: one time, I was unwittingly relying on an implementation detail of PersistentArrayMap, which is that it preserves insertion order. PersistentHashMap doesn't, and my code was mysteriously breaking once the number of keys grew beyond 16, until several hours later I learned much of what I've just shared above.
> does make performance harder to reason about in principle.
the thing is, you don't reason about performance. you measure it.
Once you've obtained measurements, you can make decisions based on the outcome of said measurements.
If the runtime (and I'm not just talking about Clojure here) makes a decision about performance and it's suboptimal, your measurements _should_ reveal it - for example, by increasing your input size, the performance graph should show a clear inflection at the point where this happens.
The real question is whether you _can_ force the runtime to do something at the programmer's behest, rather than it being a hidden decision you cannot control or change. I believe in Clojure you can choose the backing data structure, but I'm not sure if you could do it after the fact?
> the thing is, you don't reason about performance. you measure it.
This is only half-true. You do reason about it first and profile later. For example, I want to know whether my assignment operator in C++ will run in constant time.