I think the schedulers have a few bugs. The multi-core one is the scheduler under:
> The simplest way to achieve multi-core scheduling, is to do exactly as before. Having a global queue of tasks that are ready to run, and run them once a core is ready:
I also think the deadline scheduler has a bug: sometimes the pink blob will just skip the queue entirely. It seems like its deadline is "0s away" until it has moved all the way to the queue, so if it is mid-move and the CPU becomes available, it looks like the most runnable task. (This happens with the other colors too, under the same conditions, but you'll see it with pink more often, since pink spends far more time in that state, moving from I/O to waiting to run. I did catch other colors doing it as well.)
And the one at the top would occasionally double-schedule a core (multiple dots would animate onto a single core): https://i.imgur.com/bPHgPJ0.mp4
This is great information. Especially the libuv internals.
The article uses longjmp to explain coroutines. The native stack is doing a lot of heavy lifting there... What would the implementation look like if you reified the stack?
This aside also left me wanting more:
> Rust's async transforms a block of code into a state machine that is not run until you await it.
Would be interesting to read about the internals of that!
I recently asked a question related to all this on the new language implementation stack exchange.
There is a 1:1 correspondence between a generator/coroutine and an iterator with internal state. I suggest looking at the PHP generator RFC[1] and the Concurrency from the Ground Up talk[2].
Excellent overview of scheduling with code examples and animations.
One minor correction about Erlang: the scheduler doesn't get invoked only on function calls. It will be invoked as soon as the lightweight process consumes a certain number of allowed operations. So even if you have one huge function that never calls others and just computes something, it will still consume operations and will be preempted. Some internal C utility functions consume a virtual number of ops as well and may yield.
Even some BIFs and NIFs (a BIF is mostly just a built-in NIF) can yield [1] and choose to reschedule themselves later. Inside the C code it is voluntary, of course. An example of this can be seen when running hash functions [2].
Another interesting idea: a NIF that cannot reschedule itself with a yield can still signal the scheduler that it consumed some number of "reductions" after doing lengthy work. So if, say, it ran for 10 msec, it might want to bump the reduction counter a bit with enif_consume_timeslice. Otherwise a tight recursive loop calling the same NIF could still effectively block the VM.
> Running a thread per client is famously known as The C10K Problem.
Gonna have to be that guy: the C10K problem was "how to serve 10,000 clients from a single server", specifically in the 2000-ish time frame. The article links to the C10K problem page, which discusses the tradeoffs involved in at least five high-level strategies[1], only one of which is running a thread per client[2].
Yeah, there was talk of the C10M problem. Just like C10K was considered solved with event loops like epoll, kqueue, and IOCP, C10M is generally considered solved with techniques that colocate your network stack and your business-layer code.
So DPDK on one side, keeping everything in user space and talking directly to the NIC; or, for static resources, Netflix-style techniques with optimized pathways for kTLS + sendfile to keep the data plane entirely in kernel space. With that you avoid the copies from a socket buffer into the network device's packet buffers.
I like the animation at the top, except that it's meaningless unless you know what the different parts represent. If you scroll down far enough, you will eventually find this:
> Each circle is a task. The white progress circle around tasks is the time left to run until the task is blocked.
> Purple box - The queue holding tasks ready to run.
> Green box - The CPU.
> Gray box - Tasks blocked on something (e.g. I/O).
It would be great to have that description right at the top with the first animation.
I have my own animation here, if you scroll down. You can see the parsing I do for an async syntax (scroll to the heading named Definition); you can edit the syntax and it reparses and displays the parsed output. Sorry for the mess, it's kind of a place where I work on ideas. Further down is my graph renderer.
https://processes3.replit.app/
I want to auto parallelize async tasks.
I am working on a multithreaded runtime in C. It is incomplete, but it can communicate between threads in roughly 60 nanoseconds at 6 million requests per second, or closer to LMAX throughput at 20 million requests per second with a latency of 90-120 nanoseconds.
In my experience, mutexes do not scale.
I am inspired by bulk synchronous parallel, where threads synchronize in a rhythm without mutexes, and by a lock-free algorithm built on semaphores and observation rather than mutation.
I run two io_urings in different threads: one thread does send and another thread does recv. One io_uring talks to an epoll instance for network buffer readiness with EPOLLOUT. This means sending and receiving are full duplex; you can send to and receive from different clients in parallel. I call it split IO.
I plan to write up all my learnings soon.
My HTTP server, adapted from the loti examples, gets a throughput that scales with the number of connections; I am at 7,000 requests per second. I am not nginx.
I use mailboxes and double buffering for thread-safe communication. Each thread can communicate with another thread through its mailbox, like Erlang.
I plan to add coroutines based on Marce Coll's coroutines. I can deschedule a coroutine when there is an await.