Torturing Data in the Name of Nonsense
BY GARY SMITH
Spiritualism began more than 150 years ago with the three Fox sisters: Leah, Margaret, and Kate. People who attended their séances reported that the deceased used rapping sounds to communicate with the living. (Margaret eventually admitted that the mysterious sounds were made by the sisters cracking their toe joints!) The subsequent decades have seen an amazing array of mind-over-matter tricks involving entertainers jangling tambourines, bending spoons, and abusing other props.
Nowadays, in our age of big data and big computers, peer-reviewed research is often used to demonstrate the existence of implausible mental powers. Every study of implausible mental feats that I’ve looked at—and I’ve looked at many—provides further evidence of the wisdom of Nobel-laureate Ronald Coase’s wry observation, “If you torture the data long enough, it will confess.” I will illustrate this statistical mischief with several life-and-death examples.
Scary Days
The British Medical Journal, one of the world’s top medical journals, published a study provocatively titled, “The Hound of the Baskervilles Effect,” referring to Sir Arthur Conan Doyle’s story in which Charles Baskerville dies of a heart attack while he is being pursued down a dark alley by a vicious dog:
The dog, incited by its master, sprang over the wicket-gate and pursued the unfortunate baronet, who fled screaming down the yew alley. In that gloomy tunnel it must indeed have been a dreadful sight to see that huge black creature, with its flaming jaws and blazing eyes, bounding after its victim. He fell dead at the end of the alley from heart disease and terror.
The study’s authors argued that Japanese and Chinese Americans are similarly susceptible to heart attacks on the fourth day of every month because, in Japanese, Mandarin, and Cantonese, the words for four and death are pronounced very similarly. Four is an unlucky number for many Asian Americans, but are they really so superstitious and fearful that the fourth day of the month (which, after all, happens every month) is as terrifying as being chased down a dark alley by a ferocious dog?
I looked at the Baskervilles study (isn’t the BS acronym tempting?) and found that the authors examined California data for Japanese and Chinese Americans who died of coronary disease. Of those deaths that occurred on the third, fourth, and fifth days of the month, I found that 33.9 percent were on day 4, a share that is neither substantively nor statistically significantly different from the 33.3 percent expected if deaths were spread evenly across the three days. So, how did the Baskervilles study come to the opposite conclusion? The authors tortured the data. There are dozens of categories of heart disease, and they reported results only for the five categories in which more than one-third of the deaths occurred on day 4. Unsurprisingly, attempts by other researchers to replicate their results failed.
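To see why reporting only the favorable categories is so misleading, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the 30 categories, the 200 deaths per category, and the function names are hypothetical, and none of the numbers come from the California data. Each simulated death is equally likely to fall on day 3, 4, or 5, so there is no day-4 effect at all, yet the cherry-picked categories still appear to show one.

```python
import random


def simulate_day4_counts(n_categories, deaths_per_category, rng):
    """Count day-4 deaths in each category when days 3, 4, and 5 are equally likely."""
    counts = []
    for _ in range(n_categories):
        # randrange(3) returns 0, 1, or 2; treat 1 as "died on day 4".
        day4 = sum(1 for _ in range(deaths_per_category) if rng.randrange(3) == 1)
        counts.append(day4)
    return counts


def day4_share(counts, deaths_per_category):
    """Pooled fraction of deaths that fell on day 4."""
    return sum(counts) / (len(counts) * deaths_per_category)


# Hypothetical sizes, chosen only for illustration.
N_CATEGORIES = 30
DEATHS_PER_CATEGORY = 200

rng = random.Random(0)  # fixed seed so the sketch is repeatable
counts = simulate_day4_counts(N_CATEGORIES, DEATHS_PER_CATEGORY, rng)

# Honest analysis: pool every category.
overall = day4_share(counts, DEATHS_PER_CATEGORY)

# Tortured analysis: keep only categories whose day-4 share exceeds one-third.
selected = [c for c in counts if c / DEATHS_PER_CATEGORY > 1 / 3]
cherry_picked = day4_share(selected, DEATHS_PER_CATEGORY)

print(f"All {N_CATEGORIES} categories pooled: {overall:.1%} on day 4")
print(f"Only the {len(selected)} favorable categories: {cherry_picked:.1%} on day 4")
```

The pooled share stays close to one-third, while the share in the selected categories is above one-third by construction. The selection rule, not fear of the number four, produces the apparent effect.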
Illustration in this article by Anna Maltese
Cruel Parents
Another study compared the age at death of people with positive initials (like ACE), negative initials (like ASS), and neutral initials (like AGW). The authors reported that, on average, compared to people with neutral initials, males with positive initials lived 4.5 more years, males with negative initials lived 2.8 fewer years, and females with positive initials lived 3.4 more years, while females with negative initials showed no difference. A 4.5-year difference in life expectancy is larger than the difference between the United States and Venezuela and almost as large as the difference between the United States and Algeria. There are plausible explanations for the differences between the United States and Venezuela or Algeria. There is no comparable scientific explanation for why initials might have such large effects on life expectancy.
There were two problems with this study. One is the suspicious labeling of initials. Why did their list of negative initials include BUG but not FAG, DUD but not DUM, HOG but not FAT? Perhaps they labeled initials to fit the data. The second problem is that they compared the age at death of people who died in the same year, instead of people who were born in the same year. Suppose, for example, that mortality rates are identical for people with positive and negative initials, but that negative initials are more common today than in the past. If so, people with negative initials were born more recently, on average, so those among them who died in any given year will tend to be younger than the people with other initials who died that year, even though their initials had no effect on how long they lived.
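The second problem can be checked with a little arithmetic. The sketch below is a toy model, not the study's data: the birth counts, the uniform distribution of ages at death, and the function names are all hypothetical. Both groups face exactly the same chances of dying at each age; the only difference is that negative initials become more common in later birth years.

```python
def expected_lifespan(death_prob_by_age):
    """Average age at death for anyone born into this mortality schedule."""
    return sum(age * prob for age, prob in death_prob_by_age.items())


def mean_age_among_deaths_in_year(death_year, births_by_year, death_prob_by_age):
    """Average age at death among the people who died in one calendar year."""
    total_deaths = 0.0
    total_age = 0.0
    for birth_year, births in births_by_year.items():
        age = death_year - birth_year
        deaths = births * death_prob_by_age.get(age, 0.0)
        total_deaths += deaths
        total_age += deaths * age
    return total_age / total_deaths


# Hypothetical mortality: death is equally likely at every age from 50 to 90,
# and it is the SAME schedule for both groups.
ages = range(50, 91)
death_prob_by_age = {age: 1 / len(ages) for age in ages}

# Hypothetical births: neutral initials are equally common every year,
# while negative initials grow more common over time.
birth_years = range(1900, 1951)
neutral_births = {year: 1000 for year in birth_years}
negative_births = {year: 100 + 10 * (year - 1900) for year in birth_years}

lifespan = expected_lifespan(death_prob_by_age)
neutral_mean = mean_age_among_deaths_in_year(1990, neutral_births, death_prob_by_age)
negative_mean = mean_age_among_deaths_in_year(1990, negative_births, death_prob_by_age)

print(f"Expected lifespan for either group, by birth cohort: {lifespan:.1f}")
print(f"Mean age of neutral-initial deaths in 1990: {neutral_mean:.1f}")
print(f"Mean age of negative-initial deaths in 1990: {negative_mean:.1f}")
```

In this toy model, everyone has the same expected lifespan, yet the people with negative initials who died in 1990 are younger on average than the people with neutral initials who died that year. Comparing people born in the same year removes the gap, which is why that comparison is the appropriate one.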