Study: Packages Sealed with ‘Atheist’ Tape 10 Times More Likely to Disappear (time.com)
75 points by bado on March 29, 2013 | 34 comments


My all-time favorite link to share in Hacker News discussions about a preliminary research finding is the article "Warning Signs in Experimental Design and Interpretation" by Peter Norvig, LISP hacker and director of research at Google, on how to interpret scientific research.

http://norvig.com/experiment-design.html

As other comments have already pointed out, it's not clear from such a small sample size that the observed results can be generalized as a description of reality. If people are worried about this issue (and it does seem like a legitimate concern), try running lots of replication studies, with variations in study methodology and variations in experimental conditions, and see what happens. If I randomly receive a package, I'm suspicious in general, but maybe other people are glad to get surprises in the mail.


Commenters on the blog pointed out a pretty glaring problem in the experimental setup: the control boxes were sealed with tape that had no writing on it at all, which opens up the possibility that package handlers were responding to the presence of text rather than to its content. A human handler might be concerned about terrorism or something, and an OCR system might just get confused. Commenters suggested labeling controls with a neutral word, and also trying labels such as "Allah" for comparison.


I agree that there are some things they could have done wrong that aren't explicitly ruled out in http://www.atheistberlin.com/study:

Were the packages the same weight or were some of them heavier? Were the packages given over to the same post office, or were some packages given to one office and some to another? What order were the packages handed over to the post office?

Assuming that there were no errors like that, this does appear to be a well-designed experiment, and they are using rigorous statistical methods that are appropriate for the tests, so it is valid to draw the causal inference that packages of that weight, handed over on that day to that post office, were delivered later because they had the atheist tape on them.

It would be hard to generalise that further and say that every package delivered from anywhere to the US on any day will be delayed.

It is also not necessarily possible to jump from the valid causal inference above to "USPS employs bigots" without further experimentation - perhaps the difference occurred before the packages reached USPS? Perhaps USPS employees occasionally have to make a judgement call about the last packages to go into a crate, and they are more likely to put in a package that might be time sensitive than one with tape that implies it probably isn't (it would be interesting to see the impact of other labelled tapes, like Christian, Muslim, Toys, Business Documents, Baby Formula, Shoes and so on).


> perhaps the difference occurred before the packages reached USPS?

Or after. If we're searching for bigots, it's not hard to imagine that packages could have been placed in plain view and illegally removed by people unrelated to USPS.


I agree, in terms of the science in question. However, I will still give you 5:1 odds that this particular phenomenon is real and not due to a small sample size. Let me know if you want to take that action. :)


178 packages to test a system that is designed to give the same result every single time. How is that a "small sample size"?

I picked a random article from Nature Medicine. They used groups of 6 mice, you know, the living organisms, to test the result of some obscure DNA change on bone formation, you know, the process with hundreds of intermediate stages and a bazillion environmental factors. These studies are rarely if ever reproduced.

But it's here that all the theoretical scientists come out of the woodwork to give some offhand remark about how it has a "too small sample size" or didn't control for X.


Not to disagree with your sentiment entirely, but my wife, whose career was spent as a laboratory animal vet, is frothing at the mouth over your comment.

Studies using mice are WAY more controlled than you would expect. The mice are custom bred, genetically modified, and have specifically tuned immune systems. They cost tens of thousands of dollars each (this is not a joke). Since mice reproduce so quickly (and breeding them is so profitable), these things can be basically custom ordered to exactly your specifications - with statistically insignificant genetic differences between each one. They come with paperwork showing their exact genetic details, they are RFID tagged in some cases, and they are tested repeatedly before being used in studies - genetically and otherwise.

Regarding environment, these animals are kept in extremely controlled conditions. Forced airflow is repeatedly filtered and tested; the water comes from highly filtered sources into each cage. Animal food and bedding is irradiated to kill any microbes, and is generally highly regulated. Each animal room contains multiple "sentinel" animals, which are exposed to the same environment as the test subjects (and controls) but are tested to make sure that environmental factors don't impact the study.

Additionally, each mouse cage (about the size of a shoebox) holds 5 adult mice; racks of these cages connected to water/air contain ~144 cages. A study with only 6 mice is highly unlikely (although not impossible). With such a large number of animals per square foot, and such a high concentration of grad students, studies are repeated ad nauseam - more so than is ever publicized (mistakes do get made).

Again, not to say that your view of this little "experiment" with tape is invalid, but animal studies (especially mice) are way more controlled than you could ever imagine.


I didn't want to express that these studies don't take the utmost precautions to guarantee a reproducible and significant result, using animals that are essentially clones of each other and all the things you describe.

But at the end, results come down to whether a statistical test found that their findings are statistically significant. They don't exactly try to establish a causal nexus: they admit ignorance over the controlling factors.

ATHEIST did just that: they ran a bunch of statistical tests, and they all yielded p-values lower than what passes for statistical significance in that Nature article.


Do you keep a repertoire of your posts handy in an emacs buffer?


It's a plain text file that is routinely open in my text editor. (My browser is open all day in screen 1, and the text editor is open all day in screen 2.)


poor experimental design/interpretation. great PR. if only the shoes themselves somehow harkened back to the rebellious brand image that this press created...


Thank you!

It is very obvious that this is a flawed study to begin with as it's being conducted by a company with a very obvious conflict of interest. For example, I hadn't heard of "atheist" shoes until now... PR stunt? Possibly!

The USPS is under serious attack by conservatives already, do we really need to lull liberals into hating it too with this strange sneaker campaign?

USPS has taken no government funding for decades and is the second largest employer in the US after Walmart. It's time they got some help.


Some of you may not be aware, but legislation was drafted by Republicans to force the USPS to pay too much into the civil service retirement fund. They have not taken any government funding for decades, and currently there is more money than is required in the fund to pay the pensions of every postal worker.

Because of this, the USPS is having to take austerity measures, like ending Saturday delivery. But it will only get worse.

That's the reason why they don't have money. It's not because they are inefficient. Another reason is that every time they try to do something competitive against UPS or FedEx, lobbyists get it shot down.

They have been operating at a severe handicap.

By the way, I'm an atheist, and the idea of "atheist shoes" makes me cringe a bit.


I'd like to see the actual survey data. I'm not even sure how to interpret this:

"Thus the ad hoc study: The company says it sent two packages each to 89 people (178 packages total) canvassing nearly every U.S. state — one package with the Atheist-branded tape, one without. And this is where the results suggest something fishy: According to Atheist Shoes, company-branded packages took on average three days longer to reach their destination and were 10 times more likely to disappear outright."

What does "10 times more likely to disappear outright" mean? Are they saying that 10 of the 89 Atheist packages disappeared, and only 1 of the 89 non-atheist packages disappeared?


I can't find the previous submission, but someone pointed out that one of the Atheist-labelled packages was lost for almost 40 days: that alone skewed the Atheist package average so much that it can almost account for the entire difference.

edit: I misremembered about the difference it made, here's the submission: https://news.ycombinator.com/item?id=5442728
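
A rough back-of-the-envelope check of how far a single late package can move an average. This is only a sketch: it assumes (the article does not say) that the 40-day package was counted among the 89 Atheist-tape packages in the mean, and the function name is made up for illustration.

```python
# How far can ONE very late package move a group mean?
# ASSUMPTION: all 89 Atheist-tape packages are included in the average.
N_PACKAGES = 89         # packages per group, per the Time article
OUTLIER_DAYS = 40       # "almost 40 days" for the one very late package
REPORTED_GAP_DAYS = 3   # average delay reported by the company


def max_mean_shift(outlier_days: float, n: int) -> float:
    """Upper bound on the change in the mean caused by one package.

    Swapping a package that would have taken d >= 0 days for one that
    took `outlier_days` changes the mean by (outlier_days - d) / n,
    which can never exceed outlier_days / n.
    """
    return outlier_days / n


shift = max_mean_shift(OUTLIER_DAYS, N_PACKAGES)
print(f"one outlier shifts the mean by at most {shift:.2f} days")
print(f"share of the reported gap: {shift / REPORTED_GAP_DAYS:.0%}")
```

Under that assumption the one package accounts for less than half a day of the reported three-day gap, which fits the correction in the edit above.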


Check out the comments on their own web site: http://www.atheistberlin.com/study

Not sure it will completely sate you but it's a start.


Thanks - their comments (below the infographic) are awesome - very level headed and it looks like they are going to publish a peer-reviewed paper.



Agreed - the sample size is also really small for reading much into slower arrival times. If only 1 or 2 packages took a long time, it's likely they just found an outlier.


According to http://www.atheistberlin.com/study they got a significant Wilcoxon Signed Rank Test at p < 0.01. The Wilcoxon Signed Rank Test is nonparametric (i.e. sacrifices some statistical power in order to not make assumptions about the underlying distribution of the data) so is meaningful even if there is a long tail of packages that take longer due to circumstances outside of the study variable.


Sample size is something that must be interpreted in the presence of power. You can make a solid conclusion with a very small sample size if the true difference in arrival times is very large, given that the assumptions of the hypothesis test hold (t-test can be a little ridiculous with some of its assumptions sometimes).

In the original article, one of the footnotes mentioned that they tested the data using Wilcoxon's Signed-Rank test, which mitigates a lot of the impact of single outliers.

I'd love to see the raw data though, to try a method that is even less sensitive to outliers (the sign test). If the difference between the groups is as large as the article would lead us to believe, the loss of power should not present any problem.
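
To make the outlier point concrete, here is a minimal sketch of an exact two-sided sign test in plain Python. The paired differences below are HYPOTHETICAL numbers invented for illustration; they are not the study's data, which has not been published in this thread.

```python
from math import comb
from statistics import mean


def sign_test_p(diffs):
    """Exact two-sided sign test on paired differences (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Probability of a split at least this lopsided under a fair coin,
    # doubled for a two-sided test and capped at 1.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# HYPOTHETICAL differences in days (Atheist tape minus plain tape).
diffs = [1, 2, 1, 3, -1, 2, 1, 2, 1, 1, -2, 3]
# Same data, but the last package is delayed by 40 days instead of 3.
with_outlier = diffs[:-1] + [40]

print("mean:", mean(diffs), "->", mean(with_outlier))
print("sign test p:", sign_test_p(diffs), "->", sign_test_p(with_outlier))
```

The mean jumps when the outlier is added, while the sign test p-value does not move at all, because the test only looks at the direction of each difference and ignores its size.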


Yeah, that 3 days could either be damning, or absolutely nothing. Until we can examine the data this is a non-story despite how juicy we may want it to be.


Yep pretty much. In the original article (not from time.com) it was 9 and 1. So I am not sure where the number came from.
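
For what it's worth, the arithmetic on 9 versus 1 can be sketched as follows. The exact test here treats each recipient as a pair and ASSUMES no recipient lost both packages (so there are 10 discordant pairs); that assumption is mine, not something the article states.

```python
from math import comb

N_PER_GROUP = 89     # packages per group, per the Time article
LOST_ATHEIST = 9     # lost packages with Atheist tape (original article)
LOST_CONTROL = 1     # lost packages with plain tape (original article)

# Ratio of the two loss rates: 9x, not 10x.
risk_ratio = (LOST_ATHEIST / N_PER_GROUP) / (LOST_CONTROL / N_PER_GROUP)

# Exact two-sided test on the discordant pairs (McNemar-style).
# ASSUMPTION: no recipient lost both packages, so every loss is a
# discordant pair and there are 9 + 1 = 10 of them.
n = LOST_ATHEIST + LOST_CONTROL
tail = min(LOST_ATHEIST, LOST_CONTROL)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n)

print(f"risk ratio: {risk_ratio:.1f}")
print(f"exact two-sided p-value: {p_value:.4f}")
```

So the counts give a ratio of 9, and under the stated assumption a split that lopsided would be fairly unlikely by chance alone, though with only 10 losses the estimate of the ratio itself is very imprecise.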


I realize this is purely anecdotal, but my uncle worked for the USPS as a mail carrier in a major East Coast city for decades and saw people fired for really small violations. In one case, a guy with many years of seniority was fired because he stole some spam-mail coupons that a company had mailed to a non-existent address.

I'm not saying the USPS is a model of efficiency and I'm sure there are regional variations on strictness and oversight, but it's pretty likely that any USPS employee who loses (or "loses") a greater-than-average number of packages is risking their job.

I don't doubt that there are people who'd object to a package marked "ATHEIST" but how many USPS employees would be so offended they'd risk their jobs?

USPS workers often work there for decades and a lot of USPS employees have an eye towards staying there until retirement. They also get good health benefits, a nice government pension, and you don't need a higher education diploma to get hired by the USPS in the first place. A USPS employee is unlikely to have a lot of comparable job opportunities after spending years with the USPS, so I have a hard time believing so many of them would be like, "ATHEIST? Screw this!" and simply punt a package they disliked off of a bridge.


This could be sabotage, but maybe somebody is just doing it for kicks. Either way, it's really too bad about all these lost soles.


The heels responsible should foot the bill and be given the boot.


Almost as witty as the article headline announcing British politician Michael Foot being placed in charge of a committee on decommissioning nuclear weapons:

"Foot heads arms body"


Study: Controversy is 10 times more likely to sell shoes


I don't care if I get downvotes for saying this, but I think this story is full of crap.

Such a small sample size could only be considered acceptable if there were overwhelming consistency.

Also, the USPS system is not a black box; it is well known that they optimize for cost, not speed. E.g., if the addresses happen not to have the 9-digit zip, the packages will have to go through manual sorting, which is probably understaffed.

I'd say no real atheist would trumpet for this crap quasi-science, only those who pretend to be one just for making a quick buck would.



If one had several thousand dollars to waste it'd be interesting to run a larger test with a range of deities as well as "atheist", "flying spaghetti monster", the little fish symbol and other religious and non-religious words and symbology. I'd throw in stuff like Mickey Mouse, Obama and George Bush images in there to have some fun.

The results could be very interesting.


Not really though. Does finding that many religious people in the US are insanely intolerant really prove anything?


It proves that God is a third variable?

Hidden. Unknowable. The connecting thread between two completely unrelated incidents.


Sad.



