
While the mean deviation as presented is slightly nicer than sigma for intuitive purposes, it isn't as appropriate (iirc) for statistical tests based on the normal and t-distributions, which are built around the standard deviation.

More importantly, it doesn't fix the real problem, which is that the mean and standard deviation don't tell you everything you need to know about a data set, but people often like to pretend they do. It's not rare to read a paper in the soft sciences that might have been improved if the authors had reported the skewness, kurtosis, or similar statistics that could shed light on the phenomenon they're investigating. These statistics can hint at, for instance, a bimodal distribution, which could indicate a heterogeneous population of responders and non-responders to a drug. That's just one example.
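To make that concrete, here is a rough sketch in Python (standard library only) of two data sets that share the same mean and standard deviation but have very different shapes. The numbers are made up for illustration: `bimodal` stands in for a population split evenly into non-responders and responders, and `unimodal` is a single bump rescaled to match it. The helper names `standardized_moment` and `summary` are my own, not from any library.

```python
from math import sqrt
from statistics import fmean, pstdev


def standardized_moment(xs, k):
    """k-th standardized central moment, using the population SD."""
    m = fmean(xs)
    s = pstdev(xs)
    return fmean([((x - m) / s) ** k for x in xs])


def summary(xs):
    """Mean and SD plus a few statistics that describe shape."""
    m = fmean(xs)
    return {
        "mean": m,
        "sd": pstdev(xs),
        "mean_abs_dev": fmean([abs(x - m) for x in xs]),
        "skewness": standardized_moment(xs, 3),
        "excess_kurtosis": standardized_moment(xs, 4) - 3.0,
    }


# Made-up example: half the subjects score 0, half score 10.
bimodal = [0.0] * 50 + [10.0] * 50

# Made-up example: one symmetric bump (binomial-shaped counts),
# rescaled so that its mean is 5 and its population SD is 5.
counts = [1, 6, 15, 20, 15, 6, 1]
scale = 5.0 / sqrt(1.5)  # the unscaled bump has variance 1.5
unimodal = [
    5.0 + scale * u
    for u, c in zip(range(-3, 4), counts)
    for _ in range(c)
]

for name, data in (("bimodal", bimodal), ("unimodal", unimodal)):
    print(name, summary(data))
```

Both samples report a mean of 5 and an SD of 5, and both are symmetric, so the skewness is zero in each case. The excess kurtosis is what separates them here: about -2 for the two-cluster sample versus about -0.33 for the single bump. The mean absolute deviation differs as well (5 versus roughly 3.83). With unequal group sizes the skewness would also move away from zero, but none of these numbers proves bimodality on its own.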

I'm not a statistician, so some of this might be a bit off.



Whenever you try to describe a large data set with a single number, you lose a lot of information, as you said. Having more measurements helps, but I think the larger point is that we don't have to do this anymore: we could publish the entire data set instead.

Without computers, this would be a waste of paper, but transmitting the data electronically is cheap.

So why argue over the measurements? Publish the data and my software can give me any measurement I'm interested in.
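As a minimal sketch of what I mean, assuming the authors shipped a CSV alongside the paper: the file contents, the `response` column name, and the helper functions below are all hypothetical, and an inline string stands in for the published file so the snippet runs on its own.

```python
import csv
import io
from statistics import fmean, median, stdev

# Hypothetical stand-in for a data file published with a paper.
PUBLISHED_CSV = """subject,response
1,0.2
2,0.1
3,9.8
4,10.1
5,0.3
6,9.9
"""


def load_column(text, column):
    """Parse CSV text and return one column as a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader]


def mean_abs_deviation(xs):
    """Average absolute distance from the mean."""
    m = fmean(xs)
    return fmean([abs(x - m) for x in xs])


responses = load_column(PUBLISHED_CSV, "response")

# The reader, not the author, picks which measurements to compute.
measures = {
    "mean": fmean,
    "median": median,
    "sd": stdev,  # sample standard deviation
    "mean_abs_dev": mean_abs_deviation,
}
report = {name: fn(responses) for name, fn in measures.items()}

for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Adding another measurement is one more entry in `measures`, which is the whole point: with the raw data in hand, nobody has to agree in advance on which summary to report.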



