> A quick look at the code implies that various data structures are used under the hood for what looks like one data structure in the language. That's a lot of complexity! I'm not sure that's a tradeoff I'd be happy to make. It makes it harder to reason about performance.
I don't think that's true. A persistent array map is a single data structure. It's just a persistent data structure optimized for non-destructive (functional) updates. On a high-level it's not too different from finger trees which may be more famous. When I used to write Clojure, I didn't feel any need to reason about its internals for performance reasons or any other reasons. It simply feels natural.
The actual details of how the underlying data structures work aren't that hard to grasp (see: https://youtu.be/wASCH_gPnDw?t=1817), but in practice you don't even need to go that far.
At a high level there are really just a few things you need to keep in mind, the biggest being the big-O guarantees of the core data structure interfaces, which are listed here: https://clojure.org/reference/data_structures. Those guarantees hold across the different underlying implementations, and there are plenty of things you can tune to get a 2x, 10x, or 50x improvement in performance before resorting to anything that depends on which specific data structure is being used under the hood.
Isn't it similar to what happens in other languages?
For example, in C++ std::maps are typically implemented as red-black trees, std::vectors allocate 50% (sometimes 100%) more memory than requested when they grow, etc. All of that happens under the bonnet in order to fulfill the performance guarantees of the language.
IIRC, one of the most widely quoted memes in the Clojure space goes along the lines of "simple != easy". Or better: (not= :simple :easy)
Traditionally, most C++ implementations use red-black trees, but the standard doesn't require it; all the data structure requirements are described via big-O notation.
Kind of: if tomorrow someone comes up with funky zombie trees with the same big-O characteristics, or even better ones, they can ship them in their C++ standard library.
And then some clever C++ developers whose code depends on red-black tree behaviours would cry that their cheese was moved and their code doesn't work as before, even though nothing prevented funky zombie trees from being shipped instead.
I think the author's referring to the multiple implementations of persistent maps (i.e. APersistentMap). Clojure's generated docs[1] list four implementations (PersistentArrayMap, PersistentHashMap, PersistentStructMap, PersistentTreeMap), and this Stack Overflow answer[2] explains that the latter two are rarely used; the former two make up the bulk of actual APersistentMap instances. The language's core functions for persistent maps switch the implementation used under the hood as they see fit for performance: basically, a PersistentArrayMap if the number of keys is small, and a PersistentHashMap otherwise. You can see this, for example, in PersistentArrayMap's implementation of assoc[3], which replaces itself with a PersistentHashMap once the number of keys has grown over the hardcoded threshold of 16.
So like you say, your experience as an "end user" makes it seem like just one data structure, at least in interface, though in implementation it's doing some performance optimization which, I'll grudgingly grant the author, does make performance harder to reason about in principle. Though in my (very unserious) experience, I've enjoyed languages that take a possibly incomplete stab at getting performance right over ones that draw a line leaving it entirely out of scope and declare victory.
I'll qualify that a little and share that I have been bitten by these internals before, though the fault was mine: one time, I was unwittingly relying on an implementation detail of PersistentArrayMap, which is that it preserves insertion order. PersistentHashMap doesn't, and my code was mysteriously breaking once the number of keys grew beyond 16, until several hours later I learned much of what I've just shared above.
> does make performance harder to reason about in principle.
the thing is, you don't reason about performance. you measure it.
Once you've obtained measurements, you can make decisions based on the outcome of said measurements.
If the runtime (and I'm not just talking about Clojure here) makes a decision about performance and it's suboptimal, your measurements _should_ reveal it - for example, by increasing your input size, the performance graph should show a clear inflection at the point where this happens.
The real question is whether you _can_ force the runtime to do something at the programmer's behest, rather than it being a hidden decision you cannot control or change. I believe in Clojure you can choose the backing data structure, but I'm not sure if you could do it after the fact?
> the thing is, you don't reason about performance. you measure it.
This is only half-true. You do reason about it first and profile later. For example, I want to know whether my assignment operator in C++ will run in constant time.