> Publishing introduces a systematic bias, because it's difficult to get published when p > 0.05 (or whatever the disciplinary standard is).
That explains why the p-values above 0.05 are rare compared to values below 0.05. But it fails to explain why p-values above 0.02 are rare compared to values below 0.02.
I agree with your point from your previous post that lower p-values are harder to obtain than higher ones, at least if one is looking at all possible causal relationships, but there are at least two plausible causes for the inversion seen in publishing. The first is a general preference for lower p-values on the part of publishers and their reviewers (by 'general' I mean not just at the 0.05 value); the second is that researchers do not pick what to study at random - they use their expertise and existing knowledge to guide their investigations.
Is that enough to tip the curve the other way across the range of p-values? Well, something is, and I am open to alternative suggestions.
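For what it's worth, the second cause alone can produce a curve that falls as p rises. Here is a minimal simulation sketch, assuming a two-sample t-test setting in which half of the tested hypotheses are true with a modest effect size; both parameters are invented for illustration, not estimated from the chart:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_p_values(n_studies=20_000, n=30, true_effect_rate=0.5, effect_size=0.5):
    """Two-sample t-tests where a fraction of tested hypotheses are true.

    Researchers who choose hypotheses well (high true_effect_rate) generate
    many small p-values; testing pure noise would give a uniform spread.
    """
    has_effect = rng.random(n_studies) < true_effect_rate
    shift = np.where(has_effect, effect_size, 0.0)
    a = rng.normal(0.0, 1.0, size=(n_studies, n))
    b = rng.normal(0.0, 1.0, size=(n_studies, n)) + shift[:, None]
    return stats.ttest_ind(a, b, axis=1).pvalue

# Count studies per 0.01-wide bin below p = 0.10: the counts fall steadily
# as p rises, before any publication filter is applied at all.
p = simulate_p_values()
counts, edges = np.histogram(p, bins=0.01 * np.arange(11))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"p in [{lo:.2f}, {hi:.2f}): {c}")
```

In this sketch, setting true_effect_rate = 0 flattens the bins back to uniform, so the downward slope comes entirely from the hypothesis selection, not from the test itself.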
One other point: the datum immediately below 0.05 would normally be considered an outlier, but the fact that it sits next to a discontinuity (actual or perceived) makes that call less clear. Personally, I suspect it is not an accidental outlier, but given that it does not produce much distortion in the overall trend, I am less inclined to see the 0.05 threshold (actual or perceived) as a problem than I was before I saw this chart.
> Personally, I suspect it is not an accidental outlier, but given that it does not produce much distortion in the overall trend, I am less inclined to see the 0.05 threshold (actual or perceived) as a problem than I was before I saw this chart.
Don't be fooled by the line someone drew on the chart. There's no particular reason to view this as a smooth nonlinear relationship except that somebody clearly wanted you to do that when they prepared the chart.
I could describe the same data, with different graphical aids, as:
- uniform distribution ("75 papers") between an eyeballed p = .02 and p = .05
- large spike ("95 papers") at exactly p = .0499, just below the threshold
- sharp decline between p = .05 and p = .06
- uniform distribution ("19 papers") from p = .06 to p = .10
- bizarre, elevated sawtooth distribution between p = .01 and p = .02
And if I describe it that way, the spike just below .05 is having exactly the effect you'd expect, drawing papers away from their rightful place somewhere above .05. If the p-value chart were a histogram like all the others instead of a scatterplot with a misleading visual aid, it would look pretty similar to the other charts.
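To make the comparison concrete, here is a minimal sketch that draws the same numbers both ways. The per-bin counts are eyeballed from the verbal description above - illustrative assumptions, not values read off the original chart:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative per-bin paper counts matching the description above:
# sawtooth below .02, flat plateau to ~.045, spike just below .05,
# sharp decline past .05, low flat tail to .10. All numbers invented.
width = 0.005
left = 0.01 + width * np.arange(18)          # bin left edges, .01 to .095
centers = left + width / 2
counts = np.array([110, 90,                  # sawtooth, .01-.02
                   75, 75, 75, 75, 75,       # plateau, .02-.045
                   95,                       # spike, bin just below .05
                   30, 22,                   # sharp decline, .05-.06
                   19, 19, 19, 19, 19, 19, 19, 19])  # flat tail, .06-.10

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left: scatterplot with a smooth curve through the points - the
# presentation objected to above. The cubic is an arbitrary smooth fit.
coeffs = np.polyfit(centers, counts, deg=3)
ax1.scatter(centers, counts)
ax1.plot(centers, np.polyval(coeffs, centers))
ax1.set_title("Scatter + smooth curve")

# Right: exactly the same numbers as a plain histogram-style bar chart.
ax2.bar(left, counts, width=width, align="edge")
ax2.set_title("Same data as a histogram")

for ax in (ax1, ax2):
    ax.set_xlabel("reported p-value")
ax1.set_ylabel("papers")
plt.tight_layout()
plt.show()
```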
Well, you could take this mode of analysis to its conclusion for each dataset and describe every datum by its difference from its predecessor and successor, but would that help? I took it as significant that you wrote "...but it's an outlier from what is otherwise a regular pattern that clearly shows that smaller p-values are more likely to occur than larger ones are" (my emphasis), and that is what I am responding to.
I think we are both, in our own ways, making the point that there is more going on here than the spike just below 0.05 - namely, the regular pattern that you identified in your original post. If we differ, it seems to be because I think that pattern is explicable.
WRT p-values of 0.05: I almost said, but did not, that if you curve-fitted the data above and below 0.05 independently, there would be a gap between the two fits at 0.05 - maybe even if you left out the value immediately below 0.05. No doubt some gap would appear at other candidate thresholds too, but I am guessing it would peak at 0.05. If I have time in the near future, I may try it, along the lines of the sketch below. If you do, and find that I am wrong, I will be happy to recant.
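For what it's worth, here is a minimal sketch of that test, assuming binned paper counts as input. The counts are the same illustrative placeholders as in the histogram sketch above, not the real data, and the linear fits are just the simplest choice:

```python
import numpy as np

def gap_at_threshold(centers, counts, threshold, deg=1):
    """Fit separate polynomials to the bins on each side of `threshold`
    and return the jump between their predictions at the threshold."""
    below = centers < threshold
    above = ~below
    if below.sum() <= deg or above.sum() <= deg:
        return np.nan                        # too few bins for a fit
    # To test the "even without the adjacent value" variant, drop the
    # bin immediately below `threshold` from `below` before fitting.
    fit_lo = np.polyfit(centers[below], counts[below], deg)
    fit_hi = np.polyfit(centers[above], counts[above], deg)
    return np.polyval(fit_lo, threshold) - np.polyval(fit_hi, threshold)

# Same illustrative placeholder counts as the histogram sketch above.
width = 0.005
centers = 0.01 + width * np.arange(18) + width / 2
counts = np.array([110, 90, 75, 75, 75, 75, 75, 95, 30, 22,
                   19, 19, 19, 19, 19, 19, 19, 19])

# Sweep candidate thresholds; the guess above is that the gap peaks at .05.
for t in 0.02 + width * np.arange(13):
    print(f"threshold {t:.3f}: gap {gap_at_threshold(centers, counts, t):+.1f}")
```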