Using ltrace to debug a memory leak

scottlamb · on June 15, 2016

Is there a convenient way to get a heap profile in Rust? I'd expect it shouldn't be too hard: I'm used to using pprof with tcmalloc, jemalloc's heap profiler produces output for the same pprof-style tooling, and Rust uses jemalloc by default.

pprof heap profiles make this kind of thing easy to debug. pprof draws a call graph using graphviz. Boxes are scaled according to the total byte size, so if there's a large leak you can pull up the graph and the giant box all but says "you idiot, the problem is right here".

hywel · on June 15, 2016

This is my favourite kind of post to read: a tool I don't know well, and someone using it enthusiastically to solve a real-world problem.

masklinn · on June 15, 2016

> At this point I was tired so I fell asleep. This isn't done, of course -- we still need to chase the Rust program to figure out why that allocation never gets freed. But it's a start!

The type of .boxed() would probably help. I would expect a .boxed to return a Box<_> (e.g. Vec::into_boxed_slice(v) returns a Box<[_]>) which would get RAII-deallocated. So either .boxed() returns a raw pointer for some reason, or it properly returns a Box but leaks internally.
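
To make the two possibilities concrete, here is a minimal sketch. It is not the code from the article: `Tracked` and both functions are hypothetical names made up for illustration. It only shows that a plain `Box` is freed when it goes out of scope, while a pointer obtained from `Box::into_raw` stays allocated until something turns it back into a `Box`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical payload that counts how many times it has been dropped.
struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Number of drops observed when an owned Box simply goes out of scope.
fn drops_with_owned_box() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let _boxed = Box::new(Tracked { drops: Rc::clone(&drops) });
        // `_boxed` is dropped at the end of this scope, freeing the heap allocation.
    }
    drops.get()
}

/// Drops observed (before, after) a raw pointer is converted back into a Box.
fn drops_with_raw_pointer() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    // Box::into_raw gives up ownership; nothing frees the value automatically now.
    let raw: *mut Tracked = Box::into_raw(Box::new(Tracked { drops: Rc::clone(&drops) }));
    let before = drops.get();
    // Safety: `raw` came from Box::into_raw and is reclaimed exactly once.
    unsafe { drop(Box::from_raw(raw)) };
    (before, drops.get())
}

fn main() {
    println!("owned box drops: {}", drops_with_owned_box());
    println!("raw pointer drops (before, after): {:?}", drops_with_raw_pointer());
}
```

If the `Box::from_raw` line were missing, the allocation would never be released, which is the "returns a raw pointer" scenario.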

wyldfire · on June 15, 2016

If the memory isn't getting deallocated, there's a good chance that it's not technically a leak but rather unbounded growth. "Oh, oops, the refcount will never go to zero for this item I allocated."

masklinn · on June 15, 2016

Also possible, but Rust's Rc/Arc are RAII'd, so that shouldn't happen either; you don't normally need to decref manually.
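
A small sketch of what "RAII'd" means here, using only the standard library's `Rc` and `Weak`. The function names are hypothetical, chosen for illustration. The strong count goes down when a clone leaves scope, and the value is freed when the last owner is dropped, with no manual decref anywhere.

```rust
use std::rc::{Rc, Weak};

/// Strong counts observed (inside the inner scope, after it has ended).
fn counts_across_scope() -> (usize, usize) {
    let first = Rc::new(String::from("payload"));
    let inside = {
        let _second = Rc::clone(&first); // count goes up to 2
        Rc::strong_count(&first)
        // `_second` is dropped here, which decrements the count automatically.
    };
    (inside, Rc::strong_count(&first))
}

/// True if the value is gone once the last strong owner has been dropped.
fn freed_after_last_owner() -> bool {
    let owner = Rc::new(String::from("payload"));
    // A Weak observes the value without keeping it alive.
    let observer: Weak<String> = Rc::downgrade(&owner);
    drop(owner);
    observer.upgrade().is_none()
}

fn main() {
    println!("counts (inside, after): {:?}", counts_across_scope());
    println!("freed after last owner: {}", freed_after_last_owner());
}
```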

wyldfire · on June 15, 2016

Yeah, I could've been clearer. I meant "Oh, oops, I stored a reference to every single transaction ever, so we'll never reap that memory (because it will never get its refcount decremented automatically)."

It's not a leak but this kind of thing is often detected as a positive slope in a process' RSS graph and it's common to refer to it as a "memory leak" because from this symptom it's indistinguishable from an actual leak.
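
A rough sketch of that scenario, with hypothetical `Transaction` and `TransactionLog` types invented for illustration. Every handle is still reachable and would be freed correctly, but the log keeps one strong reference per transaction, so memory only grows until the log itself goes away.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical unit of work carrying some heap data.
struct Transaction {
    payload: Vec<u8>,
}

/// Hypothetical log that keeps a strong reference to every transaction.
struct TransactionLog {
    entries: Vec<Rc<Transaction>>,
}

/// Processes `n` transactions and returns weak observers for each of them.
fn handle_transactions(log: &mut TransactionLog, n: usize) -> Vec<Weak<Transaction>> {
    let mut observers = Vec::new();
    for _ in 0..n {
        let tx = Rc::new(Transaction { payload: vec![0u8; 1024] });
        observers.push(Rc::downgrade(&tx));
        log.entries.push(Rc::clone(&tx));
        // `tx` is dropped here, but the clone in the log keeps the value alive.
    }
    observers
}

/// Number of observed transactions whose value has not been dropped yet.
fn live_count(observers: &[Weak<Transaction>]) -> usize {
    observers.iter().filter(|w| w.upgrade().is_some()).count()
}

/// Total payload bytes currently retained through the log.
fn retained_bytes(log: &TransactionLog) -> usize {
    log.entries.iter().map(|tx| tx.payload.len()).sum()
}

fn main() {
    let mut log = TransactionLog { entries: Vec::new() };
    let observers = handle_transactions(&mut log, 100);
    println!("live: {}, bytes: {}", live_count(&observers), retained_bytes(&log));
    drop(log); // releasing the log finally releases every transaction
    println!("live after dropping the log: {}", live_count(&observers));
}
```

On an RSS graph this looks the same as a leak, even though dropping the log reclaims everything.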

viscanti · on June 15, 2016

That's still a memory leak. It's just caused by a logic error that a compiler couldn't ever catch.

wyldfire · on June 15, 2016

Well, you might think it's a distinction without a difference but in fact it's not a leak. A leak only occurs when the handle one might use to deallocate memory/resources is itself deallocated.

Whether "I added one FramistanObject to the FramistanObjectLog for every single HTTPConnection" is a logic error or not depends on what the intended design was.

okasaki · on June 15, 2016

Valgrind calls that an indirect leak (stuff that wasn't freed at exit but is referenced by stuff that also wasn't freed).

yoodenvranx · on June 15, 2016

I wish there were a blog or website that collected these kinds of debugging stories. It would be a very valuable resource for programmers.

tokenrove · on June 15, 2016

https://github.com/danluu/debugging-stories is a good start

yoodenvranx · on June 15, 2016

Thanks for the link! I was kind of thinking about creating such a blog as a side project but now that I see that link I am a bit demotivated -.-

zellyn · on June 15, 2016

The author's website is a pretty good start :-)

Also check out https://rachelbythebay.com/

majewsky · on June 15, 2016

Does Valgrind work with Rust binaries? If yes, then this would make everything much easier. You just compile in debug mode and say `valgrind --leak-check=full`, and it will give you stacktraces to the allocations that were not freed.

masklinn · on June 15, 2016

> Does Valgrind work with Rust binaries?

Technically yes, but for allocations you need to run nightly and either alloc_system (disable jemalloc) or rebuild nightly with JEMALLOC_FLAGS='--enable-valgrind', otherwise valgrind misses jemalloc's allocations, which is more or less all of them: https://github.com/rust-lang/rust/issues/28224

And the second fix may or may not keep working as jemalloc 5 apparently removes valgrind support: https://github.com/jemalloc/jemalloc/issues/369

makomk · on June 15, 2016

Neat! Hadn't come across this tool before but it let me track down a PulseAudio crash that's been bugging me for months. Turns out that it was unloading a shared library whilst executing code from that shared library, which meant that the inevitable crash happened in code that was no longer loaded, and that was why I couldn't get a meaningful backtrace at the time of the crash. The call to dlclose() right before the crash was a huge hint; knowing that was the last library function called made it easy to set a breakpoint and catch it in the act.

Thanks for the tip!

doomrobo · on June 15, 2016

I'd be curious to see the full source code. A memory leak from an owned pointer should be impossible unless you're using `unsafe` somewhere in your code or in a library you're pulling in.

kibwen · on June 16, 2016

Technically it's possible to leak a Box without using `unsafe`, since the std::mem::forget function is not marked as an unsafe function (because it itself can't cause memory unsafety, and because there are (incredibly convoluted) ways to create an equivalent function entirely in safe code).

However, the only reason you'd have for using std::mem::forget would be because you're doing unsafe things elsewhere in the program (as discussed at http://doc.rust-lang.org/std/mem/fn.forget.html ), so you're correct that in practice there's almost always an `unsafe` block somewhere that's the culprit.
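
For illustration, a minimal sketch of the `std::mem::forget` case with no `unsafe` anywhere. `Tracked` and the two functions are hypothetical names. The destructor runs for a normally scoped `Box`, but never runs for a forgotten one, so that heap allocation is never released.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical payload that records whether its destructor has run.
struct Tracked {
    dropped: Rc<Cell<bool>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// True if the destructor ran when the Box went out of scope normally.
fn dropped_after_scope_exit() -> bool {
    let dropped = Rc::new(Cell::new(false));
    {
        let _boxed = Box::new(Tracked { dropped: Rc::clone(&dropped) });
    }
    dropped.get()
}

/// True if the destructor ran after the Box was passed to std::mem::forget.
fn dropped_after_forget() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let boxed = Box::new(Tracked { dropped: Rc::clone(&dropped) });
    // Safe code: the Box is consumed without running Drop or freeing its memory.
    std::mem::forget(boxed);
    dropped.get()
}

fn main() {
    println!("dropped after scope exit: {}", dropped_after_scope_exit());
    println!("dropped after forget: {}", dropped_after_forget());
}
```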