>> The notion of standard deviation has confused hordes of scientists
> What an assertion! It also proved to be very useful for hordes of scientists... what about some examples of confused scientists ?
He is exaggerating, for sure. But the point is valid: the mean absolute deviation (MAD) is often very different from the standard deviation (STD), and the MAD is more intuitive; it has a natural geometrical interpretation, while the STD's squaring of distances makes it more complex.
And yes, this confuses people in some cases, including scientists. Many scientists are not statistical experts; they use tools as they were taught, and they often assume the MAD is approximately the STD, because it usually is, except in the rare cases when it is not. I've seen examples of those people in grad school; he is not making this up.
The STD is far easier to analyze mathematically. That is the huge value it brings - squaring is an operation you can take the derivative of, but absolute value you cannot. The STD also gives us nice properties, like the easily provable fact that the variance of a sum of independent variables is the sum of their variances.
The MAD, however, is nicer for reporting data since it is more intuitive. I think he makes a valid point that the STD is used more frequently than it should be.
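To make the gap concrete, here is a rough sketch (Python with numpy; the distributions are just illustrative choices of mine, not anything from the article):

    import numpy as np

    rng = np.random.default_rng(0)

    # Well-behaved (roughly Gaussian) data: MAD and STD track each other closely.
    gaussian = rng.normal(loc=0.0, scale=1.0, size=100_000)

    # Heavy-tailed data (Student t with 2 degrees of freedom): a handful of huge
    # outliers inflate the STD far more than the MAD, and the STD estimate itself
    # is unstable because the theoretical variance is infinite.
    heavy = rng.standard_t(df=2, size=100_000)

    for name, x in [("gaussian", gaussian), ("heavy-tailed", heavy)]:
        mad = np.mean(np.abs(x - x.mean()))   # mean absolute deviation
        std = x.std()                          # standard deviation
        print(f"{name:12s}  MAD={mad:7.3f}  STD={std:7.3f}  STD/MAD={std / mad:5.2f}")

For roughly Gaussian data the ratio STD/MAD sits near sqrt(pi/2) ~ 1.25; for the heavy-tailed sample it is much larger and jumps around from run to run.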
> Ok so he asserted that people should just use mean deviation instead of mean of squares. Guess what though, taking the squares have a purpose: it penalizes big deviations so two situations which have the same mean deviation but one is more stable have different standard deviations.
His point is that many people are not aware of that property and do not want it.
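To make that concrete, a toy example (a rough Python/numpy sketch; the numbers are made up): two samples with the same mean absolute deviation but different standard deviations, because squaring weights the single big deviation more heavily.

    import numpy as np

    # Two made-up, zero-mean samples with the same mean absolute deviation.
    a = np.array([-2.0, 2.0, -2.0, 2.0])   # steady, moderate deviations
    b = np.array([-4.0, 0.0, 0.0, 4.0])    # mostly quiet, occasional big swings

    for name, x in [("a", a), ("b", b)]:
        mad = np.mean(np.abs(x - x.mean()))
        std = x.std()                       # population form, divide by N
        print(name, "MAD =", mad, " STD =", round(std, 3))
    # a: MAD = 2.0  STD = 2.0
    # b: MAD = 2.0  STD = 2.828   <- squaring penalizes the big deviations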
I'm relieved to hear from others who didn't love his book; now I don't feel so left out. I checked out Black Swan from the library a while back. It seemed like Taleb's smug prose and mudslinging were writing a check his evidence couldn't cash, but I couldn't stand more than the first 50 pages, so I never got to find out. My girlfriend read all of it and sort of gave me the cliff notes; she thought I wasn't missing much.
If I ever met anyone in real life who was as impressed by his book as the inexplicably fawning blogs and reviews I have seen, I would give it another try, but I sorta feel like I've been had. At least it was a library book, so I didn't give him 12 bucks.
Anybody here love his stuff, wanna convince me to try again?
I couldn't make it through 50 pages of Black Swan. I haven't read his newest book.
Fooled by Randomness[1] was very good. It's more of a set of cautionary tales, mostly pulled from Wall Street, about how to think about chance and making sure you're judging all events against what is expected by chance.
The thing about Taleb is that all of his success hinges on a single, long-ago tail event that few people even remember. It defines his entire outlook, and he often assumes that people who interview him or read his articles know this. The article in question notwithstanding, if you mentally insert "when discussing tail events" before his claims, many of them make much more sense :)
What is hilarious is that I have been scratching my head over the exact opposite phenomenon!
I actually love Taleb's books (Black Swan, Antifragile). And I work with "data scientists", most with Ph.D.s, many in statistics.
I always talk about his book, but I cannot for the life of me find a single person who's even read it. I think: these are some of the ONLY books about stats on the NY Times best-seller lists (up until Nate Silver). And yet all these professionals have not only not read them, but barely even heard of them?
My hypothesis was that it is more popular on the East Coast than the West Coast (of the US). Are you from either of those places? I am originally from the east coast but work on the west coast. To hand wave a bit, I feel like west coast people are less into "ideas" and more into actions and experiences. Taleb's ideas do have somewhat of a nytimes-ish new yorker-ish east coast culture flavor. And a lot of people working in Silicon Valley are not really that interested in philosophy.
On the subject of his writing, I can totally see that people can be turned off by his writing. He can be arrogant and insulting. I find it kind of funny, but that's a matter of taste.
A few things I remember from his books that I really liked:
- The story of Myron Scholes, Nobel prize winner and namesake of the Black-Scholes equation, which I learned about in computational finance in college. He went on to be a founding partner of "Long Term Capital Management", a fund set up to monetize these ideas, and it promptly lost billions of dollars. http://en.wikipedia.org/wiki/Myron_Scholes
That's not interesting? I think the difference between theory and practice is intensely interesting, and Taleb has a lot to say about it.
- I largely agree with his philosophy that people who claim to know more than they do cause more harm than good. The downfall of Alan Greenspan and Bernanke was their arrogance. They thought they could control the economy, but they couldn't, and they caused millions of people real harm.
- Respect for the old. For all its virtues, Silicon Valley does have a severe case of "neomania". Taleb's ideas about things that last apply to software too. Unix and C are going to be around a lot longer than say Hadoop or Puppet/Chef.
- The philosophy of fragility also applies to software in a straightforward way. Most people know this now, but you should continually expose your software to users and the market, not build up grand ideas in your head.
- Actions over knowledge, i.e. people who know how to do things but cannot explain them.
I could list a half a dozen more important ideas but I'll stop there.
I did write in my comments about this book that he overreached with his "trilogy" idea for Antifragile. But I do like how he draws together a lot of seemingly disparate ideas that are philosophically related.
FWIW, I'm from the west coast, but have spent the last several years in New York working with natural sciences students and post-docs. I might be what you would call an "ideas" person. Abstraction appeals to me. People in our department seem happier when their work is closer to physical observations. Measuring stuff with yard sticks and radar: good. Getting big piles of data from other people, and doing stats: OK. Fitting a parameterized model to someone else's data: healthy skepticism. Of course healthy skepticism is generally cultivated.
It wouldn't be surprising if our friends' reading lists (and reactions) depend on what they do for a living. Did you meet a lot of people in finance or economics on the east coast? Maybe people's professions are spatially correlated.
My peers seem to treat models cautiously (including their own), and have tended to respond to abstract economic ideas with measured skepticism. I can't speak for them, but I sometimes perceive that economic arguments are suspected of being insufficiently empirical and subject to ulterior motives. Which, it seems, is at least a part of what Taleb is complaining about. Obviously these things can be true of any kind of argument; I'm just reporting my impression. Anyhow, it would be easier to listen to Taleb if his tone were more restrained.
Most of the economists I have paid attention to (not nearly as many as I would like) seemed inclined to provocation. Sowell in "Basic Economics", and Friedman in his speeches (I haven't read his papers), tend to poke fun at their fellow citizens, for example. I think they sometimes alienate those outside their discipline because of this. It makes reading fun though. I find Friedman very amusing, and I think so does he.
I am a programmer in Silicon Valley, but I was more interested in philosophy/mathematics when I was young (I was raised and educated on the east coast). I do feel there is a cultural difference -- not sure precisely what it is though.
I guess I share Taleb's problems with models. Even before I read Taleb I would say to myself "the map is not the territory", particularly with regard to software abstractions. I think the space between the model and reality is where you find a lot of interesting things (including the ability to make a lot of money).
I also share Taleb's skepticism about economics. The core problem is that it's not really a predictive science. It's a lot of people talking about stuff. Did those ideas help anyone? You can make a good case that they hurt a lot of people. If they are so smart, why aren't they rich? The Scholes case is a great example of that.
I'm currently reading "The Signal and the Noise" by Nate Silver, which is actually a fantastic complement to Taleb's books. They say very much the same things, in very different ways. The good part is that you will not be turned off by Silver's prose -- he's humble and very readable. I didn't follow 538 at all, and didn't pay all that much attention to the 2012 election, but I can tell that his writing skill was a big reason he became so popular.
To give an example, Taleb talks over and over about "negative knowledge" -- what not to do, what things don't work, etc. And Nate Silver says the same thing. To make accurate predictions and models, you have to be aware of known classes of mistakes, cognitive biases, etc., and not fall into those traps. People often think that they need to improve themselves by learning more. But for a reasonably smart person, the bottleneck to your effectiveness is actually thinking that you know something you don't.
I am also an "ideas" person but I share the utilitarianism and empiricism of Taleb. There has to be "skin in the game", as he says. There are so many ideas out there, and generally most philosophical arguments (and journalism, advocacy, etc.) boil down to confused semantics. So the way to find truth is through actions and experiments. Economics fails these tests for truth.
EDIT: Nate Silver talks about this paper: http://www.plosmedicine.org/article/info:doi/10.1371/journal... This would resonate with Taleb quite a bit. Most ideas are false, including published ones. It would actually violate economic theory if that weren't the case -- if most science was true -- because scientists have bad incentives (something I know from direct experience).
> and the MAD is more intuitive, it has a natural geometrical interpretation
It's less intuitive, not more intuitive, at least for me. And the standard deviation definitely has a more geometric interpretation than MAD. If you measure a hundred samples, and you want to figure out how much they differ from the expected values, what could be more intuitive than Euclidean distance? But most people never bother to try and extend their intuition about ℝ^3 to ℝ^100 to realize how simple standard deviation truly is.
What is being advocated here is the use of the L_1 norm (MAD) over the familiar L_2 norm (standard deviation). Everybody knows and understands L_2, and L_2 has a lot of desirable properties.
This makes no sense. If you want to know how much they differ from the expected value, you first define what you mean by difference (i.e. L1 or L2 norm) and then measure it somehow.
The standard deviation is an estimator for the square root of the expected value of this difference, when the difference is chosen to be the SQUARED L_1 distance (and in 1-D the L_1 and L_2 distances coincide, so this is also the squared L_2 distance).
The MAD simply takes the mean of the L_1 distance.
While the standard deviation is proportional to the L_2 distance between the vector of samples and an equal-sized vector with every coordinate set to the mean, that is not an intuitive expression based on the problem statement.
> that's not an intuitive expression based on the problem statement
That's the danger I'm talking about when you start using the word "intuitive". Intuition is relative, and someone who works with mathematics or statistics will develop a mathematical intuition about things. Just like if you're an experienced driver you'll intuitively know when other drivers are about to change lanes, even before they signal.
I think of L_2 as more intuitive because physical space uses the L_2 norm.
The other thing that makes L_1 counterintuitive is that if you measure the absolute deviation from the mean, then you aren't minimizing the deviation—in order to do that, you have to choose the median.
In other words, you say "this is the center" and "this is the measure of how far away everything is from the center", but you could have picked a different center which has a lower distance from your data. Counterintuitive.
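A small sketch of that point (Python/numpy, made-up data): sweep candidate "centers" and see which one each criterion picks.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])        # made-up, skewed data
    centers = np.linspace(0, 110, 11001)              # candidate "centers", step 0.01

    mean_abs = np.abs(x[:, None] - centers).mean(axis=0)    # L1 criterion
    mean_sq = ((x[:, None] - centers) ** 2).mean(axis=0)    # L2 criterion

    print(centers[mean_abs.argmin()])   # ~3.0  -- the median minimizes mean |deviation|
    print(centers[mean_sq.argmin()])    # ~22.0 -- the mean minimizes mean deviation^2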
Ahhh, I see what you mean, I think. Or maybe not. If I have 3 sample points, [1,1,1], then the standard deviation is 0, but where do you take your Euclidean distance?
If I'm not mistaken, with 3 sample points a, b and c that have an average mu, then
sigma = |(a, b, c) - (mu, mu, mu)| / sqrt(3) (L2 norm in R3, divided by sqrt(n))
Is that the geometric interpretation you're referring to? It's neat, but the mu vector feels a bit artificial.
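For what it's worth, a tiny numeric check of that interpretation (Python/numpy, points made up; note the 1/sqrt(n) scaling):

    import numpy as np

    x = np.array([1.0, 4.0, 7.0])   # made-up sample points a, b, c
    mu = x.mean()                   # 4.0

    # L2 distance from (a, b, c) to (mu, mu, mu), scaled by 1/sqrt(n)
    geometric = np.linalg.norm(x - mu) / np.sqrt(len(x))
    print(geometric, x.std())       # both print 2.449... (the population STD)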
> The STD is far more easy to analyze in a mathematical way. That is the huge value it brings
This is also what I have read (I have essentially no experience with statistics).
> squaring is an operation you can take the derivative of, but absolute value you cannot
We can talk without wild, obviously false hyperbole. It's trivial to take the derivative of the absolute value function. Working with it is more difficult, but taking the derivative is as easy as anything in math: d/dx |x| = x/|x| = sign(x), for x != 0.
"Easy as anything in math"? Now that's "obviously false hyperbole". You can't just brush away that discontinuity, specially when it's on the minimum (which is what people most often care about). Anyway, this is formalized in the notion of subderivative. http://en.wikipedia.org/wiki/Subderivative
You'll learn to take the derivative of |x| in high school calculus, shortly after learning how to take derivatives in general. It's not complex, and the fact that it's not defined at the origin doesn't mean there's a problem in taking it; it just means the function behaves "badly" there. I don't understand the focus on the discontinuity anyway; in working with the function, I'd be more concerned about the fact that it's piecewise defined.
Consider a variant on the absolute value function:
f(x) = x (when x >= 0); sin x (when x <= 0)
It and its first and second derivatives are all continuous (after which it comes apart at the origin), but if you were working with the derivative, even though it's differentiable everywhere, you'd have to constantly be aware of whether the origin was in your domain. The great advantage of working with x^2 instead is that it behaves the same way everywhere, not that its derivative is continuous.
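If it helps, here is a quick check of that claim (a sketch using sympy, comparing the one-sided derivatives of the two branches at the origin; the branch names are mine):

    import sympy as sp

    x = sp.symbols('x')
    right_branch = x            # f(x) for x >= 0
    left_branch = sp.sin(x)     # f(x) for x <= 0

    # Compare the one-sided derivatives at the origin, order by order.
    for k in range(1, 4):
        r = sp.diff(right_branch, x, k).subs(x, 0)
        l = sp.diff(left_branch, x, k).subs(x, 0)
        print(f"order {k}: left = {l}, right = {r}")
    # order 1: left = 1, right = 1     (matches)
    # order 2: left = 0, right = 0     (matches)
    # order 3: left = -1, right = 0    (comes apart at the origin)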
This occurred to me after the editing window for my other reply passed, but...
The tangent function has discontinuities at x = pi/2 and every interval of pi from it. When taking the derivative, is it more fruitful to say "you can't just brush away those discontinuities", or "sec^2 x"? And this is a derivative all calculus students are required to know!
Oh, completely agreed. I was enlightened when I read that standard deviations were defined the way they were in order to make analysis easier; it seems more than reasonable to me. But there's no difficulty in just taking the derivative of |x|, and I hate seeing random misinformation out there.
> MAD, however is nicer for reporting data since it is more intuitive.
I have a hard time fathoming how anyone could think such a thing. This is math: we should be using things because they map to reality in some way, not because they are aesthetically pleasing.
The standard deviation maps to processes where the "importance" of changes is proportional to their square. For example, electrical power is proportional to the square of voltage, so AC power systems are conventionally rated by standard deviation: my 120 V power outlet has a standard deviation of its potential of 120 volts (which is the same as its RMS value, since the mean is zero).
Standard deviation is also commonly used in situations where we cannot put a number on the importance of the deviation but we know it is big.
The mean absolute deviation is useful for quantities whose deviations matter in direct proportion to their size. For example, if we have a hundred lamps and we measure their optical power outputs, the MAD would be a useful measure of their variation. Optical power is already in units of oomph.
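To illustrate the AC example above (a rough Python/numpy sketch; 170 V is just the peak amplitude that corresponds to a 120 V RMS sine wave):

    import numpy as np

    t = np.linspace(0, 1, 1_000_000, endpoint=False)    # one second of samples
    v = 170.0 * np.sin(2 * np.pi * 60 * t)               # 60 Hz mains, ~170 V peak

    print(np.std(v))              # ~120.2 -- the "120 V" rating (mean is 0, so STD == RMS)
    print(np.mean(np.abs(v)))     # ~108.2 -- the MAD is a noticeably different number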
> The standard deviation maps to processes where the "importance" of changes is proportional to their square.
Sure, but the point of the article is that the STD is used very commonly in places where that does not make sense. For example, it is common to see things like "the height of the test subjects was 170 cm (STD 5 cm)".
Because of the central limit theorem, many distributions encountered in science are approximately Gaussian, which is parameterized by its mean and standard deviation. According to Wikipedia: "Height is sexually dimorphic and statistically it is more or less normally distributed, but with heavy tails."
On top of that, we have well-understood and easy-to-compute estimators for the standard deviation. The sample variance is not a bad estimator at all; the only real disagreement is whether you divide by N or N-1.
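To make the N vs N-1 point concrete, numpy's std exposes the choice via its ddof parameter (the sample here is made up):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(loc=170, scale=5, size=10)   # a small made-up sample

    print(np.std(x, ddof=0))   # divide by N   (the "population" form)
    print(np.std(x, ddof=1))   # divide by N-1 (the usual sample form, slightly larger)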