Signalfd Is Useless (2015) (ldpreload.com)
69 points by thunderbong on May 25, 2024 | 24 comments


The author is unhappy with how signals work, and proposes some really wonky ways to get the behavior he thinks is correct. But for a programmer, it's a lot easier to just do things the way the designers of signals intended.

For example, multiple SIGCHLD coalesce, so you are supposed to call waitpid in a loop:

  int status;
  pid_t child;
  while ((child = waitpid(-1, &status, WNOHANG)) > 0) {
      // handle zombie; one SIGCHLD may stand for several exits
  }
This API is mature, and obvious problems like the ones the author points out have already been solved before. The author seems to reject the established practice on purely aesthetic grounds, which is a... choice. But not one you should make, if you want your life to be easy.


waitpid(-1) in a program composed of libraries means that loop steals PIDs from other code in the same process that manages its own children and waits on them by PID. Those callers lose access to the exit status, and PID recycling can then cause hangs from waiting on the wrong PID.

It can work in a self-contained program, but not in anything complicated.

For children specifically pidfds are more reliable. But that doesn't help with other signals.


Yes, waitpid(-1) doesn't compose, but lots of things in the C world don't compose:

- using library A which uses, say, libuv, alongside library B which uses libevent (two different event loops)

- forking while holding locks

- two libraries using two different thread pools -- concurrency policy is a global concern. (and this one isn't specific to C)

FWIW the solution I use in a shell is simple - create a Waiter object that wraps waitpid(), and only code that has a reference to the waiter can do anything with processes.

It's basically like making an event loop object and passing it around, rather than making it global.

This is also an argument for libraries not doing I/O -- they should be pure. They should be parameterized by I/O, including starting threads, etc.

I haven't ever wanted to use a library that starts processes behind my back


> using library A which uses say libuv, and one that uses libevent (two different event loops)

It's not ideal, but I don't see a reason this wouldn't work if you ran the two event loops on separate threads

> forking while holding locks

Yes. You can't use fork in a multi-threaded program, unless it is followed by an exec. Which is one reason forking is somewhat rare in modern code.

> two libraries using two different thread pools -- concurrency policy is a global concern.

That doesn't break the semantics of the program.

> create a Waiter object that wraps waitpid(), and only code that has a reference to the waiter can do anything with processes.

This doesn't help if you use a library that calls waitpid directly.

This is actually probably more of a problem in non-C code, where the standard library is likely to have an abstraction that calls waitpid on child processes. So handling SIGCHLD can interfere with other code waiting for a child to finish.


Everything about signals makes more sense if you understand that they're modeled on hardware interrupts. They're the same mechanism, but for userspace code instead of kernel code. Preempting and taking over the stack, coalescing (without which a flood generated faster than they could be consumed could consume unbounded memory) - all the same upsides and downsides. IOW, they're not message passing.

(Which makes sense, considering they date from the 70s (research UNIX era) - that's not much later than the first modern code for handling hardware interrupts, which date roughly circa the Apollo program.)

And similar workarounds for those limitations: threaded interrupts (although in practice, we just punt to workqueue context in the kernel) vs. sigaltstack or signalfd with a dedicated thread.

Not as much code has to deal with them directly anymore; they're really only used in the interfaces that date way back, but the concept of edge (and vs. level) triggered interrupts is still worth understanding.


What I don’t understand is: the kernel has trauma from its rough childhood, but it chooses to perpetuate the cycle of abuse by imposing IRQs onto userspace when it could, like, not? The model for signals makes perfect sense, but the choice to keep doing things this way remains as mystifying as ever.


You have to understand - rolling out new APIs tends to be _quite_ the hassle, you need to get it signed off by a lot of people, and rightly so - rolling out new APIs that involve _design_ work is doubly a hassle, and changing the semantics of signals would be quite the change

and since signals are only for an ancient part of the API that really isn't used for anything new (new code yes, but people aren't doing anything new conceptually with that stuff) - people are going to take a "why bother" approach. Lot of work for debatable gain.

I'm working on new syscalls for exposing and using filehandles (because we can't rely on st_ino for inode number uniqueness anymore, 64 bits isn't sufficient for uniqueness with subvolumes, or for stacking filesystems), and some related work for exposing and traversing subvolumes (and mountpoints) in a clean standardized way - and they're probably going to be 6 month projects (granted, not my main project right now). It's just a _lot_ of work to do this stuff right when it involves fundamental APIs.


I use signalfd when I can, and this argument doesn't make much sense to me. The main goal of most signals you'd use it with -- like SIGCHLD/SIGPIPE -- is to wake you up and tell you that something happened, not to give you all the information. If you get a SIGCHLD, you can call wait() to get information. If you get SIGPIPE you can poll your file descriptors (though you might be doing that anyway so the signal isn't very useful). For the purpose of waking you up, coalescing isn't a problem -- only missed notifications are a problem, and signalfd handles those correctly.

Of course, for "real" signals that you must handle synchronously, like SIGSEGV, signalfd is less useful and makes less sense (and arguably something closer to Windows's SEH might make more sense). It's an odd historical artifact of Unix that SIGCHLD and SIGSEGV use the same mechanism.


The point of the article is that signalfd could and probably should be more useful than it is.

Even if you don't worry about coalescing, it would be a lot more useful if you didn't have to separately set the signal mask, and reset it for child processes.

> If you get a SIGCHLD, you can call wait() to get information

But you don't know how many signals actually happened, so what you actually need to do is call waitpid in non-blocking mode in a loop until you don't get an actual pid back. Or poll pidfd file descriptors.


While my C is familiar yet rusty, enough to raise an eyebrow at this post but not to elaborate better than the other comments in this thread, stuff like this feels like a sign of a mentality I often see in software that goes roughly like “this piece of software doesn’t do what I think it should do, therefore it is bad.”

I used to think like this, but I find it more valuable to adopt the mindset “this may have been useful to somebody or else it wouldn’t have been written this way” (which really falls apart with enterprise software sometimes, I am aware), which allows me to think more critically about what I am implementing.


In the current world it is also more valuable to read a title like "Signalfd Is Useless" as "A usecase where Signalfd could be made more useful"


Most of this article seems like mere unfamiliarity with how interrupts are generally processed. The point is you're just getting a flag indicating that you need to poll. You do the least possible in the interrupt handler so the rest of your program is disrupted as little as possible. The advantages of this aren't so super necessary when you're already event driven using select() and its descendants. But it's also not terribly hard to work into that paradigm, awkwardness of the loopback fd pattern in directly-written C notwithstanding.

System calls returning early with EINTR is the exact example from Worse is Better, so that's been debated to death.

And FWIW it seems like SA_SIGINFO itself is more of the problem, especially if (when?) there isn't a good way to poll that information elsewhere.


Pretentious title with little substance to justify the claim. The inherited signal disposition mask with the blocked signals can be restored in a forked child process, before an exec. That it makes a shell-spawning function like system() even less desirable doesn’t render it “useless”.


The whole signal mask / handler thing (and inherited FDs to a certain extent) certainly make a function to "just launch another program" harder to write.


It’s an essential part of the Unix process model. The whole design feels dated and centered around shells and terminals.

A more productive endeavour would be to propose a new design without all these warts. Not cherry-picking and whining about random aspects, without seeing the big picture.


I fail to see how creating a new thread which waits forever and ensuring it is the only one which receives signals (by blocking them in all other threads, which requires unblocking them when starting child processes just like with signals) is any better than signalfd. Also now your program is multithreaded. Good luck.


The author is/was working on MIO, which is a “cross-platform event-handling library for Rust”. And in Rust, all programs are basically always considered to be multi-threaded. The borrow checker applies consistent rules whether you spawn any additional threads or not, the built-in test runner is multi-threaded even if your program is not, etc, etc.


(2015)


We now have pidfd, CLONE_PIDFD and friends, which should be helpful for handling children.


libcs make it hard to use because they don't allow programs to call clone. So you have to go through libc-sanctioned wrappers (which have only been arriving very recently and have limitations) or jump through hoops[0] to get the pidfd from the child.

[0] https://github.com/rust-lang/rust/blob/21e6de7eb64c09102de3f...


Yeah, the nominal libc restrictions are definitely pretty annoying. In my own programs, I've just taken to calling SYS_clone (or SYS_clone3) directly and treating it like _fork() w.r.t. which functions are safe to call from the child. It's not my problem if the libc doesn't want to define some proper way to do it, and both glibc and musl gave up on current-process caching years ago. Though of course, the Rust project has to be more conservative.

(Linux libcs, or at least glibc, have a weird position overall. "We aren't responsible for our APIs working if a raw syscall has changed the program state", but also "we won't provide wrappers for new kernel functionality, just use a raw syscall". Which is it? "Just don't use any new functionality unless it's scoped to an fd"?)


> I've just taken to calling SYS_clone (or SYS_clone3) directly and treating it like _fork() w.r.t. which functions are safe to call from the child.

Things get really spicy when you want to use CLONE_VFORK to get fast process spawning and pidfds at the same time[0]. I think technically any syscall through libc would be illegal after that because errno is thread-local state and the vfork child isn't allowed to touch that. Regular vfork() handles this by updating thread state.

And it's not just libc. The kernel devs are imo a bit too lax when it comes to their API specs. E.g. I recently ran into an issue around the specification of the close() syscall. Unlike write() its manpage doesn't have the "other errors may occur" caveat, and yet FUSE can cause arbitrary errors to be returned from close(), including EBADF. When I requested clarification[1] they were neither willing to call the FUSE behavior a bug nor update the docs.

[0] https://sourceware.org/bugzilla/show_bug.cgi?id=26371 [1] https://lore.kernel.org/all/0a0a1218-a513-419b-b977-5757a146...


> Regular vfork() handles this by updating thread state.

It doesn't, though? At least not on x86 [0] [1]. Stuff like this is why I'm inclined to regard the libc rules as fictions of dubious utility.

> And it's not just libc. The kernel devs are imo a bit too lax when it comes to their API specs. E.g. I recently ran into an issue around the specification of the close() syscall. Unlike write() its manpage doesn't have the "other errors may occur" caveat, and yet FUSE can cause arbitrary errors to be returned from close(), including EBADF. When I requested clarification[1] they were neither willing to call the FUSE behavior a bug nor update the docs.

I mean, FUSE is by no means the only offender with syscall return values. Before execve()ing your program, I can install a seccomp filter that makes any syscall return any errno. Even infallible operations like sched_yield() can be made to return an error. So I operate on the principle that the results I get from a Linux syscall are whatever the environment wants me to see: it's the environment's responsibility not to do something totally schizophrenic.

(Well, except for the possibilities where a less-privileged FUSE mount could confuse a more-privileged process. But the others in that thread want to see specific scenarios for that, which makes sense to me. In any case, if you have a privileged process that you don't want less-privileged filesystems to blow up, you likely want special isolated handling for all syscalls accessing it.)

[0] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/uni...

[1] https://git.musl-libc.org/cgit/musl/tree/src/process/x86_64/...


> It doesn't, though? At least not on x86 [0] [1]. Stuff like this is why I'm inclined to regard the libc rules as fictions of dubious utility.

Hrm, interesting. At least the musl author claimed[0] that using clone invalidates the thread state in a way that (presumably) vfork() wouldn't.

> I mean, FUSE is by no means the only offender with syscall return values. Before execve()ing your program, I can install a seccomp filter that makes any syscall return any errno.

I would put Seccomp and FUSE in different buckets. Seccomp is more like ptrace, it's hooking right into the process like a debugger. If it's actively sabotaging your process you have already lost, it could make mmap return the same pointer twice for example. FUSE is different, the kernel sits between the process and the fuse server. So the kernel is in a position to uphold its API contract.

[0] https://github.com/rust-lang/rust/issues/89522#issuecomment-...



