'I like C, but I have to admit that, sometimes, “The Old Man of Programming” can be a bit of a killjoy. This is one of the most exciting eras in computer history, but lately, C’s acting like he doesn’t even want to have a good time. While the cool kids like Ruby and Haskell are living it up, C’s over in the corner obsessing over bits and bytes and memory alignment and pointers and the stack and machine architecture and unreachable allocations and multiple indirections…and it’s…kind of a buzzkill. You want to tell him, “Dude! Lighten up! This is supposed to be fun!”
But of course C can’t relax. He’s holding the entire computing universe on his shoulders, after all. It’s not turtles all the way down — it’s turtles all the way down, __then C__. Not a lot of room for fun underneath that, is there?
Is there?
Well. Here’s the secret: C does let loose sometimes. And after being bottled up for so long, when the release finally does come, it’s pretty much an F5 of crazy.'
I've programmed in C a fair amount, but I am incapable of understanding the construct contained therein. Specifically, the interaction of the switch and the loop and all those fall-through cases. I would absolutely love it if some C-guru hn-er could describe it in layman's terms?
It's really less complex than it appears, especially once you grok that switch/case is just a computed goto; cases are just labels.
So, the idea behind Duff's Device is just to unroll a loop, dealing in an elegant (some would say abhorrent - they're wrong) way with the "leftover" portion.
So, the canonical Duff's is copying shorts, with the copy loop unrolled to eight copy statements inside the loop. If you want to copy, say 47 shorts, the loop would need to iterate five times, and there would be seven shorts left to copy.
Duff's uses the switch to jump into the middle of the loop just far enough to copy those seven leftover shorts (47 % 8 = 7). The do-while then proceeds to run through the five full iterations.
Modern compilers will do that for you now, though there are still applications for similar code.
Okay, this definitely helps but I'm still struggling (sorry), I'm probably focusing too much on minutiae but:
* At the static level, what does it mean to start the loop at `case 0` then fall through to the termination of the loop? Part of me is surpisued that it even compiles (again, sorry); it's like the mechancis of the loop itself become dependent on the runtime input, at which point my brain begins to frazzle gently.
* Why is the `to` pointer never incremented? Is that some super-clever thing that I'm not seeing or is it just left out of the algorithm for clarity?
- case 0 is first, because this indicates that the modulo operation in the switch statement (count % 8) has no remainder, i.e., all 8 copy instructions should be executed. So, this is first because of the properties of the modulo operation.
The basic idea is that the switch jumps into the body of the loop somewhere, after that it can do blocks of the same function over and over without having to do the conditional check to exit the loop as often.
For example, if you're copying memory, and you want to copy say, 9 bytes then you'd jump into the loop, copy 1 byte, run the conditional check, realize that you're not done, and then copy 8 more bytes, run the conditional check, realize you're done and exit the loop. For the 8*n times you run through the "unrolled" loop after that first pass through, there are a lot less conditional branches, so many processors can execute those instructions faster. That's the idea, anyway.
That is a really good description (and I love the `crappy flowchart`, thanks). But I think what I'm mostly taking away from this is that my brain struggles with the more serious aspects of C-style imperative programming.
It's really just sort of a weird optimization trick. You shouldn't use it in a regular program unless you have a very, very good reason. The fact that it works at all (ie, that you can inject the start of a loop into the middle of a switch statement) is surprising to most people. I've shown this to people who've been programming since before I was born, and they struggled to understand it at first since they had no idea that it was syntactically possible to do that -- it's not the kind of thing you normally think of when writing a program in C. If you know an assembly language and how C maps onto assembly, it's pretty easy to make sense of though, after you get over the shock of the syntax permitting this particular construction.
The other two comments are right, but the context you might be missing is why you would unroll a loop. Due to cache behavior (and other low-level details), doing something eight times and then checking if it needs to be done another eight times has less overhead than checking after each loop iteration. Why eight? It was probably a sweet spot in time vs. code size, at some point. (Processor caches are larger now.)
Duff's device just gets the (remainder of N/8) steps out of the way the first time through, then drops down to looping eight at a time. If it seems more complicated than that, you're probably overthinking it. It's "just" a creative abuse of C syntax, a bunch of offsets and gotos.
Sometimes low-level optimization like this makes a huge difference, but make sure it's a hotspot first, and that the compiler isn't already doing those things for you. Measurements will keep you objective.
Also, if you're doing a lot with C, check out Lua!
Thanks. Lua is on "my list" actually, Fortunately I am only forced to work in C for one particular project (sorry if that sounds negative, C-heads, I'm just a messy enough person to require garbage collection to stop me from messing up too bad).
'I like C, but I have to admit that, sometimes, “The Old Man of Programming” can be a bit of a killjoy. This is one of the most exciting eras in computer history, but lately, C’s acting like he doesn’t even want to have a good time. While the cool kids like Ruby and Haskell are living it up, C’s over in the corner obsessing over bits and bytes and memory alignment and pointers and the stack and machine architecture and unreachable allocations and multiple indirections…and it’s…kind of a buzzkill. You want to tell him, “Dude! Lighten up! This is supposed to be fun!”
But of course C can’t relax. He’s holding the entire computing universe on his shoulders, after all. It’s not turtles all the way down — it’s turtles all the way down, __then C__. Not a lot of room for fun underneath that, is there?
Is there?
Well. Here’s the secret: C does let loose sometimes. And after being bottled up for so long, when the release finally does come, it’s pretty much an F5 of crazy.'