Crazy to me that garbage collection systems are a subject of continual evolution and discussion. You'd think it would be "solved" by now... Or is that just in Java world?
Not an expert, but bear in mind that there can't really be one single thing called a 'good' garbage collector, because the behaviour of the programs that produce the garbage can be hugely varied, the garbage so produced can likewise be hugely varied, and how you want to recover it can depend on the platform you're running on (if embedded, you probably want reference counting given the tight memory; in other circumstances you might be very happy just to leak everything and let the OS clean up at the end. Edit: yes, it's sometimes ok to leak).
As somebody says elsewhere, it's a multidimensional issue and a single-dimensional garbage collector can't be a brilliant match for everything.
The book "Garbage Collection" by Richard Jones, recently discussed on HN, is something I'd highly recommend you buy and read.
It’s just as “solved” as regular memory allocators are for C/C++ — it’s constantly evolving, with many competing solutions. TBB, tcmalloc, jemalloc, mimalloc, etc etc etc.
Memory management is hard, and in a world where memory sizes keep increasing, it's expected and perfectly normal that new techniques keep being developed.
I'm not sure what "solved" would mean for GC. Can't there always be improvements in maintainability, pause duration, correctness, throughput, etc.? Many dimensions to continually improve on.
Unless someone comes up with a no-pause, no-compute-power, no-code, fully correct GC?
There are some dimensions yes, but I do wonder if GC research is starting to approach the point of being "solved" well enough that we'll see a drop in investment. Not yet, clearly, but approaching that territory.
If you look at the very latest JVMs you have three GCs which are spread over throughput-optimized (Parallel), a middle-ground jack of all trades (G1), and latency-optimized (ZGC). They're a lot more self-configuring than in the past, so no matter what needs your app has, it's usually just one CLI switch to get something fairly optimal.
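For concreteness, the kind of switch being talked about looks roughly like this (these are the standard HotSpot selection flags; app.jar is just a placeholder, and which collector is the default depends on the JDK version):

    java -XX:+UseSerialGC   -jar app.jar   # tiny heaps / single-core machines
    java -XX:+UseParallelGC -jar app.jar   # throughput-optimized
    java -XX:+UseG1GC       -jar app.jar   # middle-ground default
    java -XX:+UseZGC        -jar app.jar   # latency-optimized

Everything beyond that one flag is largely left to the collector's own heuristics.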
But ZGC is going generational, at which point it will have significantly higher throughput than before. Also, all the JVM GCs can handle massive heaps these days (like, terabyte scale). So at that point the choice might become even simpler: just use ZGC all the time unless you can tolerate higher pauses and want to reclaim some CPU for the app, in which case go with parallel. And in many cases the right tradeoff between latency and performance could be inferred automatically from the frameworks and libraries that are on the classpath, meaning that for most apps even needing to make that choice could go away.
If you have a GC that doesn't pause, can handle terabyte sized heaps, has great throughput, is largely or completely self-configuring, is portable, open source, supports dozens of languages and has clean code, there are really very few metrics left to optimize after that. It seems like HotSpot is nearly there, especially when combined with GraalVM for language support.
> If you have a GC that doesn't pause, can handle terabyte sized heaps, has great throughput, is largely or completely self-configuring, is portable, open source, supports dozens of languages and has clean code
That's basically ZGC Generational + GraalVM. So, that's code that's available now, albeit you'd have to compile it from version control (GraalVM stable releases don't support ZGC yet, and released ZGC isn't generational yet). Give it another year or so and that'll be shipping out of the box in ordinary JVMs.
The hardware still evolves, so at the least, garbage collectors will have to be tuned or self-tune themselves for that.
I also think it’s unlikely that a GC can be written that handles both terabyte sized heaps on machines with hundreds of cores and megabyte sized heaps on single-CPU systems well.
Single GC, no. Single VM, yes. HotSpot serial GC is designed for the latter use case. I forgot to mention it before but it's still there and supported, I think even selected automatically in some cases.
But there aren't so many single core systems with megabyte sized heaps anymore. Even watches are multi-core these days.
Are there any production-ready GCs that have strictly bounded latency? I'm not aware of any, and that means that GC is still too immature for use in about 95% of the computers in my house, the deeply embedded ones.
Metronome did this for embedded hard realtime use cases. I don't know much about it. ZGC has bounded pause latency at least but it's not really meant for very small machines. Android has used a pauseless GC for a while.
Strictly bounded anything (hard real-time) is an entirely different category, with very few uses for the average person - and contrary to what one might think from its name, it is not “fast” at all. It’s mostly about “calculating the trajectory of that incoming projectile should definitely happen in 0.3s”, which is many orders of magnitude more than what a modern computer would take to do 3 multiplications.
But to answer your question, there is JamaicaVM and many niche JVM implementations with hard real time GCs — but it’s simply false that a GC would be somehow a problem for most of your devices.
Hard real-time has a huge amount of uses for the average person. I just made toast -- the software-controlled thermostat in my toaster oven has (slow) hard bounds on its sense-to-control latency to avoid dangerous overheating. Before that, I brushed my teeth -- the brushed motor controller detects a motor stall by increased current if I brush too hard and stall it, shuts down the motor to avoid overheating (with a hard real-time sense-to-actuate constraint), and beeps to inform me. Then I spent some time on HN, with a laptop with PMICs and other microcontrollers that have hard real-time constraints all over the place. By far the majority of my computer interactions so far today have been with computers that have hard real-time constraints, and I haven't even gotten in my car yet!
I think it was a tongue in cheek reply, as it is indeed a no-code, no-pause, no-compute power GC that does eventually reclaim the garbage (at process exit).
It’s meant for short-lived apps where you simply wouldn’t get an OOM either way. I don’t know of an objective definition of what a GC is; one I quite like is “gives the illusion of infinite memory”, and by that definition it isn’t a GC.
But there are also conservative GCs, RCs, etc., that fail other definitions (refcounting leaves circular references behind, conservative GCs have false positives). Like many things in CS, there isn’t an objective definition.
Theoretically someone could perhaps one day discover the theoretically optimal GC algorithm, and someone else could write the TeX of GCs using that algorithm, and then perhaps we'd have the final GC.
Extremely unlikely that such an algorithm exists, of course, and even if it did, looking at sorting quickly suggests that improvements will keep being found even after the best asymptotic complexity has been achieved.
Lots of things in CS are actually cyclic: RAM is expensive, so implement swapping to HDD. RAM gets cheaper, so load all the data there and instead optimize your code for the CPU. Data gets bigger, so you need to go back to swapping again. Suddenly we get SSDs and drive access is a lot faster, so invent something new but similar to the old stuff.
And also in system design, computer architecture and engineering. T. H. Myer and I. E. Sutherland coined this cyclical behaviour the "Wheel of Reincarnation" in their paper "On the Design of Display Processors." IMHO a paper that is still well worth reading:
Optimal memory behavior and lifetime is highly application specific, so the behavior of an automatic system likewise will be better for some applications than others.
In Java, the goal is to have something that performs reasonably well, without degenerate cases, for a wide variety of applications and services. It also has to handle cases where a monolithic code base results in a multitude of different memory behaviors. Requiring application code to think too hard about ideal memory behavior would itself be a negative impact.
The research field of garbage collectors is a study of trade-offs.
A highly tuned, application-specific memory strategy will likely outperform Java, but the goal is to not require anyone to build such a thing for most software.
> You'd think it would be "solved" by now... Or is that just in Java world?
I'm not sure what you mean but I'll try to relate. Automatic garbage collection is a problem not just in the Java world. It's also needed in JavaScript, Python, Go, Haskell, etc.
Citation needed, but I suspect the Java Virtual Machine ecosystem has the most advanced GC algorithms in the world. Even if that's not true, I can tell you with certainty that other language environments have adopted features many years after Java did, such as JavaScript V8 introducing concurrent generational copying in 2017 ( https://v8.dev/blog/orinoco-parallel-scavenger ). I'm sure there are examples in Go as well where its GC gained features that Java had a decade earlier.
Memory management for C is not itself a solved problem. Not only is there a lot of performance to squeeze out of malloc itself (the benchmarks at https://github.com/microsoft/mimalloc exemplify the variance between implementations), but it's up to the programmer to implement memory management in the large in an efficient way, which is not an easy task. One sure mark of a slow C program is one with a ton of mallocs and frees strewn all over.
If we can know exactly where a particular object’s lifetime ends, then indeed we can write more efficient programs than possible with a GC, but that in the general case is not something we can determine at compile time (Rust’s lifetimes are a strict subset of the general case).
Anything more dynamic will require some “ad hoc, informally-specified, bug-ridden, slow implementation of a GC”.
Rust also relies on a GC for the more dynamic lifetimes in Rust programs (reference counting), it just doesn't call it a GC, and while Rust's GC has a lower footprint than tracing GCs, it has worse performance and it suffers from cycle problems. I wouldn't call it "bug ridden" (perhaps "limited" is a better term), but it is slow.
Some think that reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly compared to the generational ZGC presented here. If you could determine the time when collection would take place with a refcounting GC, i.e. when the last reference is cleared, you wouldn't need a refcounting GC. It is unpredictable pretty much by definition.
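To illustrate the cycle problem mentioned above, here's a minimal sketch of naive reference counting written as a hypothetical counted node in Java (RcNode is invented purely for illustration; Java's own GC is tracing and reclaims such cycles just fine):

    // Hypothetical illustration: with naive refcounting the two counts below
    // never reach zero, so neither node would ever be freed.
    final class RcNode {
        final String name;
        int refCount = 1;      // one reference held by the creator
        RcNode other;          // a counted reference to another node

        RcNode(String name) { this.name = name; }

        void pointTo(RcNode target) {
            target.refCount++; // taking a new counted reference
            this.other = target;
        }

        void dropExternalReference() {
            refCount--;
            // the count stays at 1 because of the reference held by the other node
        }

        public static void main(String[] args) {
            RcNode a = new RcNode("a");
            RcNode b = new RcNode("b");
            a.pointTo(b);
            b.pointTo(a);               // cycle: a -> b -> a
            a.dropExternalReference();
            b.dropExternalReference();
            System.out.println(a.refCount + " " + b.refCount); // prints "1 1": leaked
        }
    }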
> Rust's GC (...) has worse performance [than tracing GC]
That's an unsubstantiated claim.
While reference counting may be slower than tracing in the general case, Rust's reference counting is not the same thing as general reference counting in languages like Swift or Python. There are two reasons:
- The vast majority (99%+) of objects in Rust don't need to be managed by reference counting at all.
- Rust relies heavily on move semantics to avoid updating the ref-counts, so the vast majority of refcounted objects see very few (sometimes just one) refcount updates.
> reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly compared to the generational ZGC presented here.
Where does ZGC claim nanosecond latency?
But even if it did, it would still be worse, because refcounting doesn't do pauses at all. A pause means stopping all application threads. Spending X nanoseconds freeing memory on one thread does not count as a pause, because the system can still make progress.
But reference counting is just a tiny subcomponent of Rust's GC. The majority of Rust's automatic memory management is based on affine types and static code analysis, not refcounting. You're just picking subcomponents to make the Java solution look better, but what matters for developers is the whole picture.
And BTW, I wouldn't be so sure about throughput either.
ZGC is capable of allocation rates of the order of only ~5 GB/s, which is way below the theoretical memory bandwidth. I bet jemalloc can do much better.
I thought I made it clear in my last comment that I'm not claiming that Java's memory management is somehow "better" than Rust's. I was merely pointing out that even Rust has a GC, and it's not a good one, but it might certainly be good enough for Rust's needs as it doesn't depend on it. Overall, obviously Rust and Java choose very different tradeoffs in their memory management strategy.
> ZGC is capable of allocation rates of the order of only ~5 GB/s
That was the non-generational ZGC. Indeed, ZGC's max throughput was well below that of ParallelGC and G1. One of the main goals of generational ZGC is to increase the throughput, and as you can see in the video, they got a 4x increase in throughput.
The statement was that ZGC (or any JVM GC for that matter) is better than Rust’s GC, which is just naive ref counting. This is true - if we were to create a program that allocates the same amount in both implementations, Java would win by a huge margin.
Have you ever tested it yourself? I tested it and no, it didn't win by a huge margin. malloc+free was only a tad slower than new with the default Java GC (stop-the-world, parallel), but the default GC is much faster than G1 and ZGC, so I would not be surprised if malloc could beat those modern ones in an allocation-rate benchmark.
Do you have a public benchmark for that? It’s quite hard to write a benchmark that actually measures what we are interested in. Not doubting you, I would just be interested.
It does give better performance predictability! Decrementing a refcount may be either nearly free or expensive, but crucially the work can happen only during that particular call, nowhere else.
It does not. Maintaining the counter requires more work than tracing GCs do, and you don't know in advance when the last reference is dropped. True, you get more promptness in deallocation (not the same as predictability), which reduces the footprint but comes at the expense of compaction, the lack of which in most refcounting GCs also adds unpredictability. In contrast, in the common case, allocation with a compacting, tracing GC is a pointer bump and there's no overhead at all to copying references or other memory operations. During collection cycles, GC barriers are triggered, which add overhead to things like mutating pointers or reading pointers from objects in the heap.
There's a great paper (linked from this post https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/unifie...) showing how refcounting GCs can match the performance of tracing GCs -- in fact, how they can each have the properties of the other -- but that requires sophistication in the implementation of refcounting GCs that most of them don't have. So it is true that a sophisticated refcounting GC and a sophisticated tracing GC can exhibit similar behaviour, but while OpenJDK's tracing GCs are very sophisticated (especially ZGC and G1), most refcounting GCs in the wild are quite unsophisticated. It may certainly be true that unsophisticated refcounting GCs have more desirable properties than unsophisticated tracing GCs, but in practice the choice offered is between unsophisticated refcounting GCs and very sophisticated tracing GCs. The latter beat the former on most counts except for footprint overhead.
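To make the "pointer bump" point above concrete, here is a rough conceptual sketch in Java of what a thread-local allocation buffer's fast path amounts to (this is a model for illustration, not HotSpot's actual code):

    // Conceptual TLAB-style fast path: allocation is one addition plus a bounds
    // check; the slow path (refill or collection) only runs when the buffer is full.
    final class BumpAllocator {
        private long top;       // next free "address" in the buffer (simulated)
        private final long end; // end of the buffer

        BumpAllocator(long start, long size) {
            this.top = start;
            this.end = start + size;
        }

        // Returns the start of the new object, or -1 to signal the slow path.
        long allocate(long size) {
            long newTop = top + size;
            if (newTop > end) return -1;
            long obj = top;
            top = newTop;
            return obj;
        }
    }

The refcounting counterpart has to touch a counter on every ownership change, which is the asymmetry the comment above is describing.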
Ah, it's that topic again. The whole point "but refcounting requires more maintenance per object than GC" is totally moot when there are only 0.01% of objects being refcounted (in Rust or C++) vs 100% objects traced by GC in Java.
I never claimed otherwise. Java's and Rust's overall memory management strategies are far too different to compare. I was talking specifically about refcounting vs tracing.
> But what you call promptness is predictability from the PoV of the programmer, no?
I don't think so. You don't know when memory will be freed or how long that will take, only that it will be freed as soon as possible. The end result is lower footprint, but that's about it.
Rust's lifetimes are a strict subset of the general case, but they cover a good majority of use-cases with affine types - which means trivial allocation and deallocation. The number of times I had to use lifetime specifiers or dynamic lifetimes (RefCell/Rc/Arc) is extremely low, so low that it is not even a blip on the profile. Also, unlike most GC languages, Rust favors stack allocation over heap allocation, which makes most Rust programs' allocation rates several orders of magnitude lower than those of any Java program I've seen (at the expense of more inter-stack copying, but that is extremely cheap because the top of the stack is almost always in L1 cache).
In my experience most applications do need an escape hatch from a strictly tree-like lifetime pattern, it really is limiting.
Object layout is the primary reason for the performance differences between low-level languages and managed ones — Java uses a thread-local allocation buffer, which is for all practical purposes just as hot as the stack. What is more expensive is arrays of references and the like, but that will hopefully be solved by value types at some point.
EDIT: What I’m trying to say is that there is very little overhead to creating short-lived objects in Java.
> EDIT: What I’m trying to say is that there is very little overhead to creating short-lived objects in Java.
I've been working on high-performance Java software (databases) for many years now, and the overhead of creating short-lived objects is bearable only because we go to great lengths not to create those objects, writing Java in a kind of C style: no objects, just primitives as much as you can.
The total overhead is not just bumping the pointer and zeroing memory but everything that happens when the GC runs out of the slab. By allocating short-lived objects you're making the GC collect earlier and do more work overall. Also, the faster you allocate, the more objects survive the collection and the more objects must be copied. That process is very expensive and also very complex.
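As a hypothetical illustration of that style (all names invented for the example): instead of allocating an object per reading, the data is packed into primitive arrays, so the hot path allocates nothing at all.

    // "C-style Java": the add() hot path performs no allocation, so it creates
    // no short-lived garbage and no extra GC pressure.
    final class Readings {
        // The object-per-reading alternative would be something like
        // record Reading(long timestamp, double value) {} allocated per sample.

        private final long[] timestamps;
        private final double[] values;
        private int count;

        Readings(int capacity) {
            this.timestamps = new long[capacity];
            this.values = new double[capacity];
        }

        void add(long timestamp, double value) {
            timestamps[count] = timestamp;
            values[count] = value;
            count++;
        }

        double valueAt(int i) {
            return values[i]; // no boxing, no object headers, cache-friendly layout
        }
    }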
This can be an interesting inquiry. Sad that you have to ruin it with snark.
I guess in some sense it was "solved" more than a decade ago in Azul's C4 (Continuously Concurrent Compacting Collector). I wonder how a generational ZGC will compare against it.
Java was released in 1995. I wonder if you have ever updated any software that came out before then? Surely you don't update your OS. Windows/Linux/Mac all came out before 1995, and must surely be "solved" by now. In fact you probably still use Windows 95. Surely Windows 98 or any later version can't be any better?
What's crazy to me is the sort of cognitive dissonance that you must have to hold a view like that, and still apply software updates at all.
Since garbage-collection-based memory models are much more complex and require a lot of software engineering, it is not a surprise that we have many iterations.
That's backwards. One proof of Rice's theorem relies on the fact that, if you could decide any interesting property of programs in general, then you could solve the halting problem; it doesn't follow that, if you could solve the halting problem, then you could decide any interesting property.
It happens to be true that, if you could solve the halting problem, then you could decide any interesting property of programs; but this is simply because the antecedent is false.
Being able to turn an interesting property solver into a halting problem solver is not the same thing as being able to turn a halting property solver into an interesting problem solver.
My claim, or at least my intent, was that you can solve an interesting-property decider about as well as you can solve the halting problem, i.e. not in a satisfying way (no false negatives or false positives, even with unlimited iterations).
You can solve it to some extent (e.g. does the program terminate within X steps).
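A sketch of that decidable variant, assuming the program is modelled as a step function (all names here are invented for illustration):

    import java.util.function.BooleanSupplier;

    // Run a computation for at most maxSteps steps. "HALTED" is a real answer;
    // "UNKNOWN" says nothing about what happens beyond maxSteps, which is why
    // this doesn't contradict the halting problem.
    final class BoundedChecker {
        enum Verdict { HALTED, UNKNOWN }

        // `step` performs one step of the program and returns true once it has finished.
        static Verdict haltsWithin(BooleanSupplier step, long maxSteps) {
            for (long i = 0; i < maxSteps; i++) {
                if (step.getAsBoolean()) return Verdict.HALTED;
            }
            return Verdict.UNKNOWN;
        }
    }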
But this is vacuous and unrelated to the halting problem or Rice's Thm. A program that emits "Yes" on every input will decide some property of programs with zero false negatives.
"Interesting property", as defined by Rice's Thm, is "any semantic property that is not either always true or always false for all program executions." It does not mean "interesting" as in "interesting to humans."
And I'm talking about the decider, not the program being fed as input.
I'm not a computer scientist, but I am aware Interesting doesn't mean interesting to humans.
> And I'm talking about the decider
EDIT: If by decider you mean the program being tested for having property P, that's a particular case and not applicable in general, ergo not important (a program that's a NOP can be shown to leak no memory as well).
So the property P you are testing always returns true? Then it's not interesting per your definition. Leaking memory however is interesting.
No that's not what I mean by decider. The decider is the program that takes the other program as input and tells you whether that input program has the semantic property you want.
The only non-interesting semantic properties of programs are the trivial ones: those that hold for every program or for no program.
> A program that emits "Yes" on every input will decide some property of programs with zero false negatives.
Followed by
> any semantic property that is not either always true or always false for all program executions
So what you said is your decider always outputs true. That's not an interesting property, because it's true for all inputs.
Btw what is the point of your argument? It still doesn't refute my initial statement. If you want to determine an interesting property of a program, like leaks, automatic lifetime calculation, or whether the output is the square of the input, you run into Rice's theorem, which makes the problem undecidable. In other words, it's impossible to get a correct yes and no algorithmically.