Hedgehog visits the physician

In August, the Open Science Collaboration published a summary of its replication project. From the abstract:

Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study.

They got about one-third success when they used significance (rejecting $H_0$) as the criterion for replication success, and around one-half when meta-analytic methods were used to assess it. All replication studies had good power, so if the original findings were real one would expect around 95% replication success. So hypothesis testing does not work in psychology. What do we do now?
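The arithmetic behind that expectation can be sketched in a few lines. This is only a rough illustration, not the Collaboration's analysis: it assumes every replication has the same power (the 95% figure above), a significance level of 0.05, and it ignores complications such as inflated original effect sizes. The function names are made up for this example.

```python
def expected_replication_rate(power, true_fraction=1.0, alpha=0.05):
    """Expected share of replications that reject H0.

    power         -- assumed probability that a replication detects a real effect
    true_fraction -- assumed share of original findings that are real
    alpha         -- assumed significance level (false-positive rate)
    """
    # Real effects are detected with probability `power`;
    # null effects still come out significant with probability `alpha`.
    return true_fraction * power + (1.0 - true_fraction) * alpha


def implied_true_fraction(observed_rate, power, alpha=0.05):
    """Invert the formula above: the share of real effects implied by an observed rate."""
    return (observed_rate - alpha) / (power - alpha)


if __name__ == "__main__":
    # If every original finding were real, the success rate would equal the power.
    print(expected_replication_rate(power=0.95))
    # Share of real effects implied by a one-third success rate under these assumptions.
    print(implied_true_fraction(observed_rate=1 / 3, power=0.95))
```

Under these simplifying assumptions, a one-third success rate is compatible with only about a third of the original effects being real, which is the gap the rest of this post is about.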

One popular excuse for doing nothing is the following, provided by Lisa Feldman Barrett:

Suppose you have two well-designed, carefully run studies, A and B, that investigate the same phenomenon. They perform what appear to be identical experiments, and yet they reach opposite conclusions. Study A produces the predicted phenomenon, whereas Study B does not. We have a failure to replicate. Does this mean that the phenomenon in question is necessarily illusory? Absolutely not. If the studies were well designed and executed, it is more likely that the phenomenon from Study A is true only under certain conditions. The scientist’s job now is to figure out what those conditions are, in order to form new and better hypotheses to test.

This explanation reminds me of a joke about a hedgehog:

Hedgehog goes to the physician and complains that he and his wife want a child, but his wife is not getting pregnant.

Physician: "Do you and your wife eat carrots?"

Hedgehog: "No, we don't."

Physician: "But you have to. Without eating carrots you can't conceive a child."

Hedgehog leaves, but returns a week later.

Hedgehog: "Doctor, it doesn't work. My wife is still not pregnant."

Physician: "Do you and your wife drink mint tea?"

Hedgehog: "No, we don't."

Physician: "But you have to. Without drinking mint tea you can't conceive a child."

Hedgehog leaves, but once again, returns a week later.

Hedgehog: "Doctor, it doesn't work. My wife is still not pregnant."

Physician: "Are you and your wife having sex?"

Hedgehog: "No, we aren't."

Physician: "But you have to. Without having sex with your wife you can't conceive a child."

If we accept Feldman Barrett's view of replication, we give researchers a license to make spurious claims without having to provide solid evidence. The situation resembles that of the physician in the joke above: the physician is allowed to make up explanations without backing them up. The hedgehog, just like the replicating researchers, is asked to test the claims, and if he fails, he is simply asked to try something else until he succeeds. The set of propositions about getting pregnant is not infinite, so the physician will, at some point, hit upon a solution to the hedgehog's problem. To suppress such guessing, the hedgehog should ask the physician up front for all the necessary conditions that must be satisfied in order to conceive a child. This is what the Open Science Collaboration does before each replication: they contact the authors of the original study and ask them which potential moderators would alter the results. The researchers then attempt to control those moderators in the replication.

The joke illustrates another point. The physician's suggestions are surprising and counter-intuitive: carrots and mint tea are easily obtainable, yet no one would assume that they affect the probability of conception. Similarly, most of the studies in high-profile psychology journals (including those in the replication attempt) present surprising and counter-intuitive findings. After the iterative addition of moderators, the findings become less robust, more context-dependent and even trivial. In the end, the physician hits upon a suggestion that solves the hedgehog's problem. Unfortunately, one doesn't need to ask a physician to get this kind of suggestion.

Psychological phenomena may be context-dependent and fragile, but in that case one should guard against making claims about robust and context-independent effects and against using experiments designed to test such effects. Unfortunately, many of the studies that fail to replicate were conceived to test hypotheses that posit robust and context-independent effects.

Mozgostroje
