Sorry state of dynamic libraries on Linux (macieira.org)
88 points by d0mine on Jan 16, 2012 | hide | past | favorite | 26 comments


All this for the possibility of interposition? Yes, it seems so. The impact is there for this little-known and little-used feature. Instead of optimising for the common-case scenario where the symbols are not overridden, the ABI optimises for the corner case.

Using LD_PRELOAD isn't so rare and strange that you can propose throwing it out without quantifying its performance cost. I can't recall offhand why I needed it (I would guess Valgrind or Massif) but I've used it several times as a developer. What exactly is the payoff for giving it up? It can't be bigger than the current performance difference between statically and dynamically linked executables, can it?


> All this for the possibility of interposition? Yes, it seems so. The impact is there for this little-known and little-used feature. Instead of optimising for the common-case scenario where the symbols are not overridden, the ABI optimises for the corner case.

It also looks like PIE+PIC is required if you want a secure system with ASLR: <http://blog.flameeyes.eu/2009/11/02/the-pie-is-not-exactly-a....

Flameeyes (Gentoo dev) has sent patches to all the main library developers to make sure only the necessary symbols are exposed and as much data as possible is marked read-only. I think this effort is more valuable than proposing a very unlikely ABI change.


Of course, a bunch of this work is sabotaged by the stupid default `LDFLAGS` you get if you use `pkg-config` and some package (like `gmodule`) throws random stuff in there like `-Wl,--export-dynamic`, which is totally unnecessary for at least 99% of executables, which only ever used `gmodule` indirectly in the first place...


LD_PRELOAD is used by fakeroot, artsdsp, esddsp, alsadsp to name some of the most popular programs that use it.


Only for overriding parts of libc, which already cannot be 'unexported'. The article's main beef is with internal program symbols also being made indirect (AIUI a patch adding a new visibility option landed in gcc years ago that more or less automatically fixed this up, but I may be wrong)

Edit: rushing out the door here, last sentence was referring to this: http://gcc.gnu.org/wiki/Visibility


I started writing a comment before the site went down (hah, I knew it would be WordPress) and now I'll probably forget to post it there so I'll just post it here.

Is the address of externalFunction that is stored in the GOT really resolved to the address of the stub and not the final real address of the function? Because it has to be for data (or code simply wouldn't work), and I don't see why function symbols would be any different.

Also, C and C++ say that a function has the same address in all translation units, so compared addresses ought to match regardless of PIC, shared libraries, or symbol overriding; if they don't, it's probably a bug in the compiler/linker (or you're using a nonstandard option that breaks this guarantee, e.g. symbol hiding.) And actually this should require that the address in the GOT is the final address of the function.

By the way, any CPU that has static destination branch prediction will predict as well for a double indirect call as a single indirect call (assuming of course the addresses don't change.) The cost is taking up two entries in the branch prediction tables, another L1I cacheline for the stub, and a hiccup in instruction decoding which may or may not have a real effect depending on the code before and after.

> If there’s a reason for getting the address indirectly like this, I have yet to find it.

It should be because of PIC, and the fact that PC-relative addressing on x86_64 has only a ±2GB displacement: if your final binary is over 2GB, the linker might be unable to put the symbol within range of the offset and would fail. Whereas for calls the linker can just insert a stub if this happens and no one's the wiser. Disabling PIC results in "movq $externalFunction, externalVariable(%rip)" for me.

But -mcmodel=small is the default, which should contradict this explanation...

EDIT: so I just tried a test and it appears the GOT on Linux really does contain the address of the stub. what the fuck

On OS X it contains the real address.


C and C++ say that a function has the same address in all translation units

Until perhaps very recently, the ISO C and C++ standards didn't actually support dynamic linking. (They didn't officially support multithread concurrency either but flexibility is one of those languages' strengths).

if your final binary is over 2GB the linker could fail to put the symbol within range of the offset and fail

I have had to code around this limitation too but it didn't turn out to be that hard in practice.

How about we optimize for the case where the final binary is 2GB or smaller? :-)


> Until perhaps very recently, the ISO C and C++ standards didn't actually support dynamic linking. (They didn't officially support multithread concurrency either but flexibility is one of those languages' strengths).

How so? They certainly didn't mention it but they shouldn't have to - describing the final linked behaviour is enough and means that whether it was statically or dynamically linked doesn't matter if it produces the same run-time behaviour. Which resulted in a huge mess in the linker for C++.

> How about we optimize for the case where the final binary is 2GB or smaller? :-)

I agree, but compilers should be standards-compliant by default and any such optimizations should be under non-default flags (e.g. -fvisibility-inlines-hidden). But -mcmodel=small is already the default...


I'd seen references to possible standards issues with dynamic linking, but hadn't really thought about it until you pointed it out: taking the address of a function or data object with linkage no longer naturally returns the same address in different translation units.

Believe it or not, in C++ a pointer to an object or function is valid as a non-type template parameter. Heck, I bet you can even partially specialize on it.


I'm very curious to know how you manage to need a 2GB final binary?

- Huge use of template metaprogramming ?

- Generated code ?

- Maybe it's only for the debug build with all symbols?


One can include arbitrary amounts of data in binaries. I don't see this done much on ELF systems, but consider all the Windows installers that package entire applications into a single .exe.


Yes, when you call a function you often jump to the stub. When you take the address of the function, it does something more complicated for PIC executables. Try examining the assembly of the following code:

    void func1(void);
    void *func2(void) { return func1; }
You'll notice that unlike function calls, the code here differs with PIC enabled. I think your assumption is that function calls and function addresses in C use the same address, which is not true.


My assumption (in your example) was actually that with PIC, func2 would return the address of the real func1 and a hypothetical 'func1();' would call a stub that calls the real func1. On OS X, this is true. On Linux, func2 returns the address of the stub, and I see no reason why (DYLD_INSERT_LIBRARIES works fine on OS X.)

This is actually the specific thing that prompted the author to investigate and write up this whole post. It's so what the fuck to me that I didn't believe him at all until I tried it myself.

EDIT: actually, I finally realized one potential benefit to this: it lets you resolve the function's address completely lazily. But again, variables can't benefit from this, so they need to be resolved immediately. I really wonder whether the load-time gain is actually worth the runtime hit for this case...


Website down, looks like reddit pummelled his server before HN did.

Google cache: http://webcache.googleusercontent.com/search?q=cache:http://...

G+ discussion with the author: https://plus.google.com/108138837678270193032/posts/No8T7VLo...


While we're complaining about the sorry state of Linux, something it's desperately lacking (for those of us who ship binary-only libraries and executables) is an equivalent of Windows's PDBs or OS X's dSYMs for post-mortem debugging without bloating the shipping binaries.


hi. linux does have this.

you just strip the debug symbols out (and put them somewhere safe). then write a .gnu_debuglink section to the stripped ELF binary naming the debug file, with a CRC of it.

once something bad happens: you just take the core dump, the symbols you have tucked away, and you are able to debug just fine.


Thanks to you and others for jumping in and pointing this out! There's nothing better than being corrected when it means learning something new and solving a long-standing problem.


This is great! Seems like this may have just become available in the last few years?


While not as straightforward as on Windows (which isn't developer oriented), it's possible on Linux and standard on most distributions (just look at all the -dbg packages of debian-based distributions). A little googling found these instructions involving objcopy to separate debug symbols: http://www.technovelty.org/code/debug-info.html


See http://linux.die.net/man/1/objcopy -> --only-keep-debug

Then you can open the core file as usual: gdb executable core (just be sure to have debug files in the same directory)


Have you looked at Google Breakpad? They have some support for this:

http://code.google.com/p/google-breakpad/wiki/LinuxStarterGu...


Yes please. Even for those of us who just deploy binaries we would really like to have a way to put them back together with the symbols and source.

99.9% of the time we don't need to, but in that 0.1% it makes a big difference.


Is this the symbols-embedded-in-the-binary thing? Disk space is cheap, and they don't use extra RAM.


The bloating can be extraordinary. When you're shipping libraries to discerning customers (programmers, not end users), perception really does matter.

A related issue which is more about the compiler than the operating system is that PDBs with Visual C++ on Windows are much more useful with highly optimized binaries than symbols (either embedded or separate) are with GCC-optimized binaries. This is understandable when you consider the different development cultures, but that doesn't make it any less of a problem for us. :)


For an interesting use of RPATH headers to make portable Linux binaries (and shared libraries), have a look at this script that I used to build Python 2.7.2 and a whole pile of 3rd-party libraries: https://github.com/wavetossed/pybuild

I think that things like RPATH and LD_PRELOAD are exactly what shared libraries should be doing. The reason for shared libraries in the modern age is increased flexibility.


I think it was here (on HN) that I read a comment suggesting shared libs in Unix originated at Sun Microsystems, due to some politics in their work with the X Window System (IIRC). I've searched for that story since, but not found it -- I'm guessing somebody reading this may know what I'm talking about. Retell the story?



