The ‘replication crisis’ across science
For those of us who share a faith in science, the growing claims of a “replication crisis” in research come as something of a shock. The ability to reproduce experimental research is one of the hallmarks of the scientific method. The idea is that if you reproduce the methods of an experiment, and you find the same results, those results become believable. But new evidence suggests that when you actually try to replicate the findings of many studies, you can’t, raising questions about the credibility of the evidence and leading some to argue that science is broken. More concerning still, some of the studies that can’t be replicated have been highly influential and spawned a great deal of subsequent science that is now clouded in doubt.
Research in psychology has come in for the most scrutiny recently, with the Open Science Collaboration trying to replicate the findings of 100 studies published previously in leading psychology journals. The original psychology studies covered everything from how humans make decisions, to why we display political bumper stickers, to whether pride can be correctly read on our faces. Ninety-seven per cent of the original studies had statistically significant results. When replicated by the collaboration, only 36 per cent of the studies had statistically significant results. And when the replication studies did find a similar result, the size of the effect or the strength of an association tended to be less impressive. Overall the Open Science Collaboration concluded in 2015 that a “large proportion of replications produced weaker evidence”, and “there is still more work to do to verify whether we know what we think we know”.
It’s not just psychology. Work in medicine also suggests findings of many important laboratory studies cannot be replicated. In 2011, researchers inside the German drug company Bayer published a paper in Nature Reviews Drug Discovery titled “Believe it or not”. Concerned about how much they might rely on earlier findings about potential new drugs, the Bayer group collected data from 67 in-house projects that had attempted to replicate previous studies, mainly about cancer, but also on women’s health and heart disease. In only 20 to 25 per cent of those projects had Bayer’s new studies confirmed the results of previously published research. “Surprisingly,” they wrote, “even publications in prestigious journals or from several independent groups did not ensure reproducibility.” Searching for explanations, they speculated that intense pressure to publish could have caused sloppiness in the original experiments or how they were reported, and noted that positive results tended to get published more than negative ones.
A year later, in 2012, Nature reported a similar experience inside the drug company Amgen. Scientists tried to confirm the results of what were deemed to be 53 landmark pre-clinical cancer studies. Findings were confirmed in only six of the 53. In the six studies where results could be successfully reproduced, the original researchers tended to pay close attention to their method and biases, and ensured complete reporting of results. For the rest, where results couldn’t be replicated, the original researchers were not routinely blinded to whether data was from the experimental or the control group, opening up potential for serious bias in the results. This is not just an academic concern. As the Nature authors pointed out, some of the papers that couldn’t be replicated had “spawned an entire field”, with hundreds of follow-up papers that didn’t seek to confirm the original results. “More troubling,” they wrote, “some of the research has triggered a series of clinical studies – suggesting that many patients had subjected themselves to a trial of a regimen or agent that probably wouldn’t work.”
More broadly, there are estimates that perhaps half of all pre-clinical studies – not just in cancer – suffer from this problem of non-reproducibility, which means in the United States, according to a recent study in PLOS Biology, “approximately $28 billion a year is spent on research that cannot be replicated”. The authors cited reasons including problems in study designs, biological agents, laboratory protocols and reporting of results. Other researchers have pointed out that many studies – not just pre-clinical ones – can’t be replicated because there’s simply not enough information in the published paper to allow someone else to repeat the study. Moreover, science is a human and often corporate affair with a systemic bias towards producing positive results. Researchers tend to want to produce novel and upbeat findings, rather than publish a negative result or run an elegantly reproducible method, while sponsors tend to want – and get – favourable findings for studies of their products.
Some are using these fresh findings about replication to help build a wider argument that contemporary science is in deep trouble. In an article titled “Scientific Regress”, published this month in the religion and public life journal First Things, software engineer William Wilson attacks a “bloated scientific bureaucracy” and critiques a “scientism” that worships scientists and treats their evidence as the new “holy book”. Along with psychology and medicine, physics is also targeted, with Wilson pointing out scientists had to retract much-celebrated claims in 2011 that subatomic particles could travel faster than the speed of light. It’s worth noting that First Things is published by an institute founded to “confront the ideology of secularism”. But Wilson’s article makes some interesting points and has inspired others to make grander claims that the so-called “replication crisis” is a sign that “science is broken”.
There is little doubt that non-reproducibility is a major contemporary concern, and that much published science is poor quality, conflicts of interest are endemic and the peer-review system designed to protect scientific integrity is wanting. But a dose of healthy scepticism may help address any perceptions that science itself is in some way in crisis.
The Open Science Collaboration psychology project seen as confirming a widespread problem with replication has itself been the subject of scientific criticism. An article in Science in March pointed to flaws in the collaboration’s method, and argued its findings should lead to the opposite conclusion – that “the reproducibility of psychological science is quite high” and there is no evidence of a “replication crisis”. The studies from the Bayer and Amgen scientists are small and in-house, and while their results are worrying, neither project was run as a rigorous scientific study. The team that announced the faster-than-light claims retracted them the following year, citing faulty timing equipment. And even the big psychology replication project carried a strong warning in its 2015 article: “Any temptation to interpret these results as a defeat for psychology, or science more generally, must contend with the fact that this project demonstrates science behaving as it should.”
Further evidence that science is behaving as it should arises from the host of strategies under way to deal with the problem of replication, including crowdsourced collaborations. With millions of dollars from philanthropic foundations, the Center for Open Science – which hosted the psychology project – is currently hosting another major attempt to replicate studies, this time in cancer research. It has also produced guidelines for greater transparency in research processes, which are gaining traction with many scientific journals. There is even discussion of replicating experimental findings in independent labs before publication.
Nobel prize-winner Daniel Kahneman has weighed in with an open letter. “It is good form to pretend that science is a purely rational activity, an objective and unemotional search for the truth,” he writes, “but of course we all know that this image is a myth.” Given the harsh human reality of science, Kahneman is calling for a new replication etiquette under which study authors are guaranteed a role in any attempts to replicate their work.
More broadly, there are reforms under way to try to improve the quality of scientific research, and the way it’s reported and used. In medicine we’ve seen the rising importance of the systematic review, or “meta-analysis”, a higher form of evidence that offers a summary of all relevant studies, so that decisions made by doctors, patients or public officials are not based on a single study alone. But even systematic reviews are not perfect: they often summarise biased evidence, and reviewers, like everyone else, have biases, too. In one example, two different teams summarised the same evidence – about whether screening for depression was valuable – and came to totally different conclusions.
As Robyn Williams, presenter of ABC’s The Science Show, suggests, part of the problem is that researcher interest in replication is generally very low. “You don’t tend to get the Nobel prize for redoing someone else’s work,” he tells me.
Williams agrees there’s a problem with replication, but rejects the idea science is broken. For him, a suite of reforms pushing more transparency – and the work of critics such as his nephew and best-selling author of Bad Science, Ben Goldacre – are “helping the process of the truth being revealed eventually”.
This article was first published in the print edition of The Saturday Paper on May 28, 2016 as "Doubt masks replicas".