This article cannot be trusted. The version of FreeBSD they use in the test is in 'debug mode'. The speed of every critical system is impacted while debug symbols are enabled. That the author fails to mention this critical fact casts doubt on the veracity of the entire article.
MALLOC_PRODUCTION was set and INVARIANTS and WITNESS were disabled on the 10th, a full week before RC1 was built. Wouldn't be much of a release candidate if it wasn't built using release flags, would it?
Most of the benchmarks look to be more about compiler version and flags than OS, though. Lots of CPU-bound tasks which barely even touch the kernel, whoopee. The Stream numbers were quite interesting, perhaps indicating that that particular microbenchmark loves NUMA awareness, which is hardly surprising but perhaps not entirely representative of many real-world workloads.
The benchmarks might have been more exciting if they measured scaling to multiple CPUs, given that's where most of the advances in modern OSes are going. Some Apache/MySQL/PgSQL/Varnish/Squid numbers might have been nice too, given that server tasks are a major focus of both OSes.
What exactly do you mean by "debug mode"?
FreeBSD enables some standard debugging features, such as WITNESS, during the development (-current) and beta stages, but those have been disabled for 8-RC1 and later.
I understand they were testing out-of-the-box stuff, but I wish they had upgraded to a newer gcc in the FreeBSD install and recompiled some of those programs. I'd really like to get a feel for the actual differences between the OSes without extra variables like compiler and 3rd-party libs getting in the way. Such a thing would really be a kernel+libc comparison then.
Those differences that "get in the way" are actually relevant in the real world, where people will use those systems as they are and not after upgrading and recompiling core components.
John The Ripper is the most famous password cracker in the world, presumably in the benchmark because it's a good example of a heavily optimized compute-bound application.
Which is to say, just the sort of application for which OS differences are unlikely to matter much. (Compared with, say, what compiler happened to be used to build the application, or what else was running on the machine at the same time.)
Unfortunately, many of the benchmarks they used seem to be of this sort. Not that it matters much, since the author of the article offered basically no analysis or explanation of any of the results, just a series of bar charts, each followed by a single paragraph trying to find a superficially different way of paraphrasing the numbers in the bar chart.
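For anyone who wants to see the compute-bound point for themselves, here is a rough sketch (not taken from the article or its test suite) that splits a task's CPU time into user and system time. The function names `cpu_split`, `compute_bound` and `syscall_bound` are made up for illustration, and the loop sizes are arbitrary. A task that spends nearly all of its time in user space leaves little for the kernel to speed up or slow down.

```python
import os

def cpu_split(work):
    """Run work() and return (user_seconds, system_seconds) of CPU time it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system

def compute_bound():
    # Pure arithmetic in user space; the loop itself makes no system calls.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def syscall_bound():
    # Many tiny writes; each os.write() call enters the kernel.
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        for _ in range(200_000):
            os.write(fd, b"x")
    finally:
        os.close(fd)

if __name__ == "__main__":
    for name, work in (("compute_bound", compute_bound),
                       ("syscall_bound", syscall_bound)):
        user, system = cpu_split(work)
        print(f"{name}: user={user:.3f}s system={system:.3f}s")
```

On most systems I would expect the first task to report almost no system time and the second to report a noticeably larger share, but the exact split depends on the machine, the interpreter and the timer resolution, so treat the numbers as indicative only.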
Meh. Scheduler? I don't know. I clicked this link by accident, saw "John the Ripper", and just decided to clear up what JtR was and how it could be a benchmark program. I think OS benchmarks are themselves kind of silly.