Thanks for responding. My experience with tracing GC at scale is exclusively in the realm of .Net, and RC exclusively with C++ smart pointers. That matches your “sophisticated vs crude” contrast.
The experience with .NET is that GC impact was difficult to profile and correct, and also “lumpy”, although that may have been before GC tuning. GC would dominate performance profiles in heavy async code, though these days that can be corrected with ValueTasks and other zero-allocation techniques.
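To make that concrete, here is a rough sketch of the kind of zero-alloc rewrite I mean (the cache/fetch names are made up for illustration): a ValueTask<T>-returning method whose hot, synchronously-completing path doesn't allocate a Task at all.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class QuoteCache
{
    private readonly ConcurrentDictionary<string, decimal> _cache = new();

    public ValueTask<decimal> GetQuoteAsync(string symbol)
    {
        // Hot path: a cache hit completes synchronously and allocates nothing.
        if (_cache.TryGetValue(symbol, out var price))
            return new ValueTask<decimal>(price);

        // Slow path: fall back to a genuinely async fetch (allocates as usual).
        return new ValueTask<decimal>(FetchAndCacheAsync(symbol));
    }

    private async Task<decimal> FetchAndCacheAsync(string symbol)
    {
        await Task.Delay(10);      // stand-in for real I/O
        var price = 42.0m;         // stand-in for the fetched value
        _cache[symbol] = price;
        return price;
    }
}
```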
For C++-style ref counting, the impact was a continuous % load that was simple to profile (and therefore improve), although the ref counting still needed to be stripped from the hot paths.
The biggest issue I’ve hit between the two modes, though, is how they behave when hitting memory limits. Tracing GC appears to take an order-of-magnitude perf hit when memory becomes scarce, while ref counting does not suffer in this way. This is enough for me to personally dislike tracing GC, as that failure state is particularly problematic.
Same experience doing database system programming in Java. The number of performance issues caused by GC and the amount of code complication introduced in order to avoid them is mind-blowing. We’re managing memory manually in critical parts. But Java’s ergonomics around manual memory management (aka off-heap) are just terrible. There are moments I think I’d be more productive in pure C because of that. Not C++, not Rust, but pure C.
When you hit memory limits, .NET’s GC implementation would perform much more frequent, invasive, and aggressive collections, including LOH compaction to reduce the memory watermark, which leads to longer GC pauses, though this is rarely seen in such a bad way on modern versions with e.g. SRV GC.
The most scalable way to address this is usually to just allocate less and use ValueTasks with pooling where applicable (frequent asynchronous yields). I’m certain that if you built a .NET 8-based solution you would see user-written code dominate the heap allocation profile, as most hot internal async paths use said state machine box pooling + ValueTask<T>[0] and may be entirely allocation-free.
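To illustrate the opt-in side of that pooling (the class and method names here are hypothetical; the attribute and builder are the ones shipped in .NET 6+):

```csharp
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

class ChunkReader
{
    // Opting a method into the pooling builder means its async state machine
    // box is rented from and returned to a pool instead of being a fresh heap
    // allocation on every asynchronous completion.
    [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))]
    public async ValueTask<int> ReadChunkAsync(Stream source, byte[] buffer)
    {
        // Frequent asynchronous yields are exactly where this pays off.
        return await source.ReadAsync(buffer);
    }
}
```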
> When you hit memory limits, .NET’s GC implementation would perform much more frequent, invasive, and aggressive collections, including LOH compaction to reduce the memory watermark, which leads to longer GC pauses, though this is rarely seen in such a bad way on modern versions with e.g. SRV GC.
The trouble with server GC mode is that there is then no natural back pressure. If the processing is not CPU-bound, memory allocation can grow unbounded. This is not something that happens with RC because, again, the memory-management cost is inlined with task processing. The service may not be capable of as much throughput, but it doesn’t take out the entire server either.
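One way to reintroduce that back pressure explicitly (just a sketch, not from the discussion above; the capacity and names are arbitrary) is to bound the work queue yourself, e.g. with System.Threading.Channels, so producers wait instead of the heap growing:

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;

class RequestPipeline
{
    // A bounded channel caps how many items can be in flight; once it is
    // full, producers wait rather than piling more allocations onto the heap.
    private readonly Channel<string> _work = Channel.CreateBounded<string>(
        new BoundedChannelOptions(1_000)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

    public ValueTask EnqueueAsync(string request) =>
        _work.Writer.WriteAsync(request);   // awaits while the channel is full

    public async Task ConsumeAsync()
    {
        await foreach (var request in _work.Reader.ReadAllAsync())
        {
            // process request here
        }
    }
}
```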
> The most scalable way to address this is usually to just allocate less and use ValueTasks with pooling where applicable (frequent asynchronous yields). I’m certain that if you built a .NET 8-based solution you would see user-written code dominate the heap allocation profile, as most hot internal async paths use said state machine box pooling + ValueTask<T>[0] and may be entirely allocation-free.
Absolutely; I think it’s relatively simple to write servers that scale using modern .NET; the memory-allocation foot-guns when dealing with asynchronous code are now well understood, and the tooling is good. I am compressing ~15 years of experience into that previous post.
It’s probably the case that a tracing GC is the better choice for most modern applications, excepting memory-constrained devices (like phones), so long as you design with memory in mind.
You are correct: the sustained-load heap size of SRV GC has been a known pain point, particularly exacerbated after beefy Windows Server hosts fell out of fashion and got replaced by 512Mi Linux containers.
Work has gone into this with each release, through .NET Core 2.1 and 3.1 and then .NET 5, 6, 7, and 8, to make it play nicer with systems under tighter memory limits.
The two major features that address this are Regions[0] (.NET 6/7) and DATAS[1] (.NET 8). The former is enabled by default everywhere except macOS, and the latter is available opt-in either via the env var DOTNET_GCDynamicAdaptationMode=1 or the MSBuild property GarbageCollectionAdaptationMode: 1 (see more in [1]).
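For reference, the MSBuild flavour of the DATAS opt-in is just a project property (equivalent to the env var above):

```xml
<!-- .NET 8 csproj: opt in to DATAS -->
<PropertyGroup>
  <GarbageCollectionAdaptationMode>1</GarbageCollectionAdaptationMode>
</PropertyGroup>
```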
DATAS has been shown to significantly reduce sustained (or, especially, idle) heap size for some particularly problematic workloads (but not all; sometimes you just have a lot of live objects). I definitely recommend giving it a try if this is still relevant to you.
The TL;DR of what DATAS does: dynamic heap count scaling and much smarter heap up/down-sizing depending on allocation rate/frequency and the anticipated throughput impact of adjusting those.