This is interesting, but as someone who's not very knowledgeable about C & assembly, I'm not sure what's going on. Could anyone explain what this does?
Even the article writer doesn't need it: as far as I understand, he never presents a use case that would benefit from such techniques:
"It's mostly intended as an example of how these tricks work."
Now, if you have an OS component, it's convenient to have a mechanism for easily hooking an externally exposed API function. The code he presents reserves a special spot in each function that is prepared to be hooked, and other code does the redirection.
If you write applications and not OS components, I can't imagine you'd ever need such code.
A very simple model of a computer to think about is one that has two components: a memory and a CPU. Picture the memory as a big array that contains both code and data (and there's no way of telling which is which); picture the CPU as a black box that has a pointer into the array, called a program counter or a PC.
And here's how the CPU runs:
* Read `mem[PC]`.
* Decode that value into an instruction.
* Run the instruction.
* Increment PC by 1.
* Repeat.
Now that's not a very interesting CPU since you can't really implement conditionals or loops or functions like that because the PC just keeps incrementing. So make sure one of your instructions is a jump, which lets you skip the increment and assign an arbitrary value to PC.
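To make the model concrete, here's a minimal sketch of that CPU in C. The opcode names and numbers, the single `acc` register, and the `max_steps` guard are all made up for illustration; no real CPU is encoded this way.

```c
#include <assert.h>

/* Hypothetical opcodes for the toy machine. A memory cell holds either
   one of these or a plain number; nothing marks which is which. */
enum { OP_HALT = 0, OP_NOP = 1, OP_INC = 2, OP_JUMP = 3 };
enum { MEM_SIZE = 16 };

/* Runs the program in mem starting at PC 0 and returns the accumulator.
   max_steps is a safety limit so a looping program still terminates. */
int run(const int mem[MEM_SIZE], int max_steps)
{
    int pc = 0;   /* program counter: an index into mem */
    int acc = 0;  /* the machine's only register */

    for (int step = 0; step < max_steps; step++) {
        assert(pc >= 0 && pc < MEM_SIZE);
        int instr = mem[pc];              /* read mem[PC] */
        switch (instr) {                  /* decode, then run */
        case OP_HALT:
            return acc;
        case OP_INC:
            acc++;
            pc += 1;                      /* increment PC by 1 */
            break;
        case OP_JUMP:
            assert(pc + 1 < MEM_SIZE);
            pc = mem[pc + 1];             /* skip the increment, assign PC */
            break;
        default:                          /* OP_NOP and anything unknown */
            pc += 1;
            break;
        }
    }
    return acc;
}
```

With the program `{ OP_INC, OP_JUMP, 4, OP_INC, OP_INC, OP_HALT }`, `run` returns 2 rather than 3, because the jump skips the `OP_INC` in cell 3.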
Now, the original trace implementation was this:
if (_point) printf(_args);
This translates roughly into this pseudo-assembler code:
read mem[address of _point] into register
branch-if-zero register label
push _args onto stack
jump to printf
label:
[rest of the function]
Lots of new, unexplained stuff here so here's a quick rundown:
* Registers are a small set of variables provided by the CPU that let you store values and access them an order of magnitude faster than regular memory. Throwing it over to Wikipedia: http://en.wikipedia.org/wiki/Processor_register
* Labels are just a way for us to mark places in assembly code instead of using indices into the code's memory. The assembler will figure out the indices for us and rewrite the labels to be numbers.
* Branch if zero! Think of branch-if-zero taking two arguments: a register and a label. If the register is zero, it jumps to the label. If the register is nonzero, nothing happens.
* The stack is a place chosen by the operating system where arguments to a function are passed. I'm being vague because there's a lot to say: http://en.wikipedia.org/wiki/Call_stack
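If the pseudo-assembler is hard to follow, the same steps can be written in C with `goto`, one line per instruction. This is only a sketch: `trace_point` stands in for `_point`, and `traced_add` and `trace_calls` are hypothetical names added so the effect can be observed.

```c
#include <stdio.h>

/* Hypothetical stand-in for _point: nonzero means tracing is on. */
static int trace_point = 0;

/* Hypothetical counter of how many times the trace actually fired. */
static int trace_calls = 0;

/* Same logic as `if (_point) printf(_args);`, spelled out step by step. */
int traced_add(int a, int b)
{
    int reg = trace_point;      /* read mem[address of _point] into register */
    if (reg == 0) goto label;   /* branch-if-zero register label */
    /* push _args onto stack, then jump to printf (really a call, so
       control comes back here afterwards) */
    printf("traced_add(%d, %d)\n", a, b);
    trace_calls++;
label:
    return a + b;               /* [rest of the function] */
}
```

Note that the read and the branch happen on every call, whether or not tracing is on.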
So, back to the code. There are two problems here, as laid out by the post's author:
* In constrained environments (the example in the post is the Linux kernel), an extra read for tracing is expensive. Real computers have caches: if you read mem[0x1234] a hundred times, the CPU will keep that value around so later reads are faster (much faster) than the first. Reading `mem[address of _point]` means one less slot in the cache, which depending on what you're writing can be unacceptable. There's so much more to be said about caches: http://en.wikipedia.org/wiki/CPU_cache
* Branch if zero! Real CPUs are optimized to decode and run a bunch of instructions in parallel. (This is called pipelining.) Branches are kryptonite because CPUs don't know ahead of time whether the jump will occur or not so they have to guess which instructions to pipeline. More more more to be said: http://en.wikipedia.org/wiki/Instruction_pipeline (especially the Complications section) and http://en.wikipedia.org/wiki/Branch_predictor
So the rest of the blog post is devoted to writing some assembly code that gets around those two problems. There's a lot of time spent in the details, but here's a very high-level overview:
Replace the original trace implementation with one instruction: `nop`, which is a no-op, an instruction that doesn't do anything. It's a placeholder.
At runtime, if tracing is enabled, rewrite the `nop` to be a jump to a function. That function contains the actual code that calls `printf`.
This is possible because the CPU doesn't distinguish between code and data. Just like you can manipulate data at runtime, you can also find and manipulate code. And, as this is used for good here, it can also be used for evil: Imagine taking advantage of a bug in a program to make it give you control of its code. Then you can rewrite the code to email you secret information or to make 100 HTTP requests a second to a server you don't like.
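Here's a rough sketch of the nop-to-jump rewrite on a toy machine in C, not the post's actual code. Everything in it is an assumption for illustration: the opcodes, the memory layout, and `OP_TRACE`, which just bumps a counter in place of calling `printf`. The placeholder is two `nop` cells wide so that a jump and its target fit in its place.

```c
#include <assert.h>

/* Hypothetical opcodes and layout for the toy machine. */
enum { OP_HALT = 0, OP_NOP = 1, OP_INC = 2, OP_JUMP = 3, OP_TRACE = 4 };
enum { MEM_SIZE = 8, TRACE_SITE = 0, AFTER_SITE = 2, TRACE_ROUTINE = 4 };

struct result { int acc; int traces; };

/* Fills mem with a tiny "function" plus an out-of-line trace routine. */
void load_program(int mem[MEM_SIZE])
{
    const int image[MEM_SIZE] = {
        OP_NOP, OP_NOP,       /* cells 0-1: placeholder at the trace site */
        OP_INC,               /* cell 2: rest of the function */
        OP_HALT,              /* cell 3 */
        OP_TRACE,             /* cell 4: trace routine (stands in for printf) */
        OP_JUMP, AFTER_SITE,  /* cells 5-6: jump back into the function */
        OP_HALT               /* cell 7: unused */
    };
    for (int i = 0; i < MEM_SIZE; i++)
        mem[i] = image[i];
}

/* Code is just data in mem, so enabling tracing is two ordinary writes. */
void enable_trace(int mem[MEM_SIZE])
{
    mem[TRACE_SITE] = OP_JUMP;
    mem[TRACE_SITE + 1] = TRACE_ROUTINE;
}

/* Restores the placeholder. */
void disable_trace(int mem[MEM_SIZE])
{
    mem[TRACE_SITE] = OP_NOP;
    mem[TRACE_SITE + 1] = OP_NOP;
}

/* Runs from PC 0 until OP_HALT or until max_steps instructions have run. */
struct result run(const int mem[MEM_SIZE], int max_steps)
{
    struct result r = { 0, 0 };
    int pc = 0;

    for (int step = 0; step < max_steps; step++) {
        assert(pc >= 0 && pc < MEM_SIZE);
        switch (mem[pc]) {
        case OP_HALT:
            return r;
        case OP_INC:
            r.acc++;
            pc += 1;
            break;
        case OP_TRACE:
            r.traces++;
            pc += 1;
            break;
        case OP_JUMP:
            assert(pc + 1 < MEM_SIZE);
            pc = mem[pc + 1];
            break;
        default:              /* OP_NOP and anything unknown */
            pc += 1;
            break;
        }
    }
    return r;
}
```

With tracing off, the function's own code contains no flag read and no conditional branch, only two `nop`s. After `enable_trace`, the same run detours through the trace routine once and still produces the same `acc`.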
Everything else is bookkeeping, making sure the compiler, the linker, and the operating system are all on board with this plan.