Thanks for responding. My experience with tracing GC at scale is exclusively in the realm of .Net, and RC exclusively with C++ smart pointers. That matches your “sophisticated vs crude” contrast.
The experience with .NET is that GC impact was difficult to profile and correct, and also “lumpy”, although that may have been before GC tuning. GC would dominate performance profiles in heavy async code, though these days that can be corrected with ValueTasks and other zero-allocation techniques.
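To make that concrete, here is a rough sketch of the kind of zero-alloc rewrite I mean (the cache/fetch names are made up for illustration): a ValueTask<T>-returning method whose hot, synchronously-completing path doesn't allocate a Task at all.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class QuoteCache
{
    private readonly ConcurrentDictionary<string, decimal> _cache = new();

    public ValueTask<decimal> GetQuoteAsync(string symbol)
    {
        // Hot path: a cache hit completes synchronously and allocates nothing.
        if (_cache.TryGetValue(symbol, out var price))
            return new ValueTask<decimal>(price);

        // Slow path: fall back to a genuinely async fetch (allocates as usual).
        return new ValueTask<decimal>(FetchAndCacheAsync(symbol));
    }

    private async Task<decimal> FetchAndCacheAsync(string symbol)
    {
        await Task.Delay(10);      // stand-in for real I/O
        var price = 42.0m;         // stand-in for the fetched value
        _cache[symbol] = price;
        return price;
    }
}
```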
For C++-style ref counting, the impact was a continuous % load that was simple to profile (and therefore improve), although the ref counting still needed to be stripped from the hot paths.
The biggest issue I’ve hit between the two modes, though, is how they behave when hitting memory limits. Tracing GC appears to take an order-of-magnitude perf hit when memory becomes scarce, while ref counting does not suffer in this way. This is enough for me to personally dislike tracing GC, as that failure state is particularly problematic.
Same experience doing database system programming in Java. The number of performance issues caused by GC and the amount of code complication introduced in order to avoid them is mind-blowing. We’re managing memory manually in critical parts. But Java’s ergonomics around manual memory management (aka off-heap) are just terrible. There are moments I think I’d be more productive in pure C because of that. Not C++, not Rust, but pure C.
When you hit memory limits, .NET’s GC implementation would perform much more frequent, invasive, and aggressive collections, including LOH compaction to reduce the memory watermark, which leads to longer GC pauses, though this is rarely seen in such a bad way on modern versions with e.g. SRV GC.
The most scalable way to address this is usually to just allocate less and use ValueTasks with pooling where applicable (frequent asynchronous yields). I’m certain that if you built a .NET 8-based solution you would see user-written code dominate the heap allocation profile, as most hot internal async paths use said state machine box pooling + ValueTask<T>[0] and may be entirely allocation-free.
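To illustrate the opt-in side of that pooling (the class and method names here are hypothetical; the attribute and builder are the ones shipped in .NET 6+):

```csharp
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

class ChunkReader
{
    // Opting a method into the pooling builder means its async state machine
    // box is rented from and returned to a pool instead of being a fresh heap
    // allocation on every asynchronous completion.
    [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))]
    public async ValueTask<int> ReadChunkAsync(Stream source, byte[] buffer)
    {
        // Frequent asynchronous yields are exactly where this pays off.
        return await source.ReadAsync(buffer);
    }
}
```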
> When you hit memory limits, .NET’s GC implementation would perform much more frequent, invasive, and aggressive collections, including LOH compaction to reduce the memory watermark, which leads to longer GC pauses, though this is rarely seen in such a bad way on modern versions with e.g. SRV GC.
The trouble with server GC mode is that there is then no natural back pressure. If the processing is not CPU-bound, memory allocation can grow unbounded. This is not something that happens with RC because, again, the memory-management cost is inlined with task processing. The service may not be capable of as much throughput, but it doesn’t take out the entire server either.
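One way to reintroduce that back pressure explicitly (just a sketch, not from the discussion above; the capacity and names are arbitrary) is to bound the work queue yourself, e.g. with System.Threading.Channels, so producers wait instead of the heap growing:

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;

class RequestPipeline
{
    // A bounded channel caps how many items can be in flight; once it is
    // full, producers wait rather than piling more allocations onto the heap.
    private readonly Channel<string> _work = Channel.CreateBounded<string>(
        new BoundedChannelOptions(1_000)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

    public ValueTask EnqueueAsync(string request) =>
        _work.Writer.WriteAsync(request);   // awaits while the channel is full

    public async Task ConsumeAsync()
    {
        await foreach (var request in _work.Reader.ReadAllAsync())
        {
            // process request here
        }
    }
}
```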
> The most scalable way to address this is usually to just allocate less and use ValueTasks with pooling where applicable (frequent asynchronous yields). I’m certain that if you built a .NET 8-based solution you would see user-written code dominate the heap allocation profile, as most hot internal async paths use said state machine box pooling + ValueTask<T>[0] and may be entirely allocation-free.
Absolutely; I think it’s relatively simple to write servers that scale using modern .NET; the memory-allocation foot-guns when dealing with asynchronous code are now well understood, and the tooling is good. I am compressing ~15 years of experience into that previous post.
It’s probably the case that a tracing GC is the better choice for most modern applications, excepting memory-constrained devices (like phones), so long as you design with memory in mind.
You are correct: the sustained-load heap size of SRV GC has been a known pain point, particularly exacerbated after beefy Windows Server hosts fell out of fashion and got replaced by 512Mi Linux containers.
Work has gone into this with each release, through .NET Core 2.1 and 3.1 and then .NET 5, 6, 7, and 8, to make it play nicer with systems under tighter memory limits.
The two major features that address this are Regions[0] (.NET 6/7) and DATAS[1] (.NET 8). The former is enabled by default everywhere except macOS, and the latter is available opt-in either via the env var DOTNET_GCDynamicAdaptationMode=1 or the MSBuild property GarbageCollectionAdaptationMode: 1 (see more in [1]).
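For reference, the MSBuild flavour of the DATAS opt-in is just a project property (equivalent to the env var above):

```xml
<!-- .NET 8 csproj: opt in to DATAS -->
<PropertyGroup>
  <GarbageCollectionAdaptationMode>1</GarbageCollectionAdaptationMode>
</PropertyGroup>
```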
DATAS has been shown to significantly reduce sustained (or, especially, idle) heap size for some particularly problematic workloads (but not all; sometimes you just have a lot of live objects). I definitely recommend giving it a try if this is still relevant to you.
The TL;DR of what DATAS does: dynamic heap count scaling and much smarter heap up/down-sizing depending on allocation rate/frequency and the anticipated throughput impact of adjusting those.