- Registered Replication Reports
- Article Type Description
- Instructions for Authors
- Instructions for Reviewers
- Ongoing Replication Projects
- Replication failures in psychology not due to differences in study populations
- Under scrutiny
- Replication chain
- Not a doomsayer
- More social science studies just failed to replicate. Here’s why this is good
- Rigorous retests of social science studies often yield less impressive results
- It’s not always clear why a study doesn’t replicate. Science is hard.
- The “replication crisis” in psychology has been going on for years now. And scientists are reforming their ways.
- Leaning into the replication crisis: Why you should consider conducting replication research
- About the author
Registered Replication Reports
Replicability is a cornerstone of science. Yet replication studies rarely appear in psychology journals.
The new Registered Replication Reports article type in Advances in Methods and Practices in Psychological Science fortifies the foundation of psychological science by publishing collections of replications that follow a shared and vetted protocol. It is motivated by the following principles:
• Psychological science should emphasize findings that are robust, replicable, and generalizable.
• Direct replications are necessary to estimate the true size of an effect.
• Well-designed replication studies should be published regardless of the size of the effect or statistical significance of the result.
Traditional psychology journals emphasize theoretical and empirical novelty rather than reproducibility. When journals consider a replication attempt, the process can be an uphill battle for authors.
Given the challenges associated with publishing replication attempts, researchers have little incentive to conduct such studies in the first place.
Yet, only with multiple replication attempts can we adequately estimate the true size of an effect.
A central goal of publishing Registered Replication Reports is to encourage replication studies by modifying the typical submission and review process. Authors submit a detailed description of the method and analysis plan.
The submitted plan is then sent to the author(s) of the replicated study for review. Because the proposal review occurs before data collection, reviewers have an incentive to make sure that the planned replication conforms to the methods of the original study.
Consequently, the review process is more constructive than combative. Once the replication plan is accepted, it is posted publicly, and other laboratories can follow the same protocol in conducting their own replications of the original result.
Those additional replication proposals are vetted by the editors to make sure they conform to the approved protocol.
The results of the replication attempts are then published together in Advances in Methods and Practices in Psychological Science as a Registered Replication Report.
Crucially, the results of the replication attempts are published regardless of the outcome, and the protocol is predetermined and registered in advance. The conclusion of a Registered Replication Report should avoid categorizing each result as a success or failure to replicate.
Instead, it should focus on the cumulative estimate of the effect size. Together with the separate results of each replication attempt, the journal will publish a figure illustrating the measured effects from each study and a meta-analytic effect size estimate.
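The cumulative estimate described above can be sketched in a few lines of code. This is a minimal fixed-effect (inverse-variance weighted) pooling; the effect sizes and variances below are invented for illustration, not taken from any actual report:

```python
from math import sqrt

def fixed_effect_meta(effects, variances):
    """Fixed-effect (inverse-variance weighted) meta-analysis.

    effects:   per-lab effect size estimates (e.g., Cohen's d)
    variances: sampling variance of each estimate
    Returns the pooled estimate, its standard error, and a 95% CI.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)

# Three hypothetical replication attempts of the same finding:
pooled, se, ci = fixed_effect_meta([0.40, 0.10, 0.25], [0.04, 0.02, 0.05])
print(f"pooled d = {pooled:.2f} (SE {se:.2f}), 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

A published Registered Replication Report would more plausibly use a random-effects model that allows for between-lab heterogeneity; this fixed-effect version shows only the simplest possible form of pooling.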
The details of the protocol, including any stimuli or code provided by the original authors or replicating laboratories as well as data from each study, will be available on the Open Science Framework (OSF) website and will be linked from the published report and the APS website for further inspection and analysis by other researchers. Once all the replication attempts have been collected into a final report, the author(s) of the original article will be invited to submit a short, peer-reviewed commentary on the collection of replication attempts.
This publication model provides many broader benefits to psychological science:
- Because the registered replication attempts are published regardless of outcome, researchers have an incentive to replicate classic findings before beginning a new line of research extending those findings.
- Subtleties of methodology that rarely appear in method sections of traditional journals will emerge from the constructive review process because original authors will have an incentive to make them known (i.e., helping to make sure the replications are designed properly).
- Multiple labs can attempt direct replications of the same finding, and all such replication attempts will be interlinked, providing a cumulative estimate of the true size of the effect.
- The emphasis on estimating effect sizes, rather than on the dichotomous characterization of a replication attempt as a success or failure based on statistical significance, could lead to greater awareness of the shortcomings of traditional null-hypothesis significance testing.
- Authors and journalists will have a source for vetted, robust findings, and a stable estimate of the effect size for controversial findings.
- Researchers may hesitate to publish a surprising result from a small-sample study without first verifying that result with an adequately powered design.
Article Type Description
A Registered Replication Report consists of a collection of independently conducted, direct replications of an original study, all of which follow a shared, predetermined protocol.
The collection of replications will be published as a single article in Advances in Methods and Practices in Psychological Science, and all researchers contributing replications will be listed as authors.
The initial submission will be only the plan (as the results will not have been collected yet), but the final publication will include the following:
- A brief introduction explaining the importance of the original study and why a more precise, cumulative estimate of the size and robustness of the reported effect will benefit the field.
- A detailed description of the shared protocol used by all replication teams.
- A figure showing the effect sizes measured by each replication team, along with a meta-analytic estimate of the effect size. (This figure will be generated by the editors, in consultation with experts in meta-analytic techniques.)
- Brief descriptions of the results and analyses for each individual replication attempt (written separately by each replication team).
- A brief discussion of the cumulative findings.
The author of the original article that was the focus of the collected replications will be offered an opportunity to submit a short, peer-reviewed commentary on the Registered Replication Report.
The published report will link to more extensive reports from each replicating lab on the Open Science Framework website, and all replicating labs are expected to post the data from their replication attempts.
Additional replications completed after the initial registered replication report appears in print should be posted on the Open Science Framework, and those results may be incorporated into a meta-analytic effect size estimate published in AMPPS.
Instructions for Authors
Detailed instructions for authors can be found here.
Instructions for Reviewers
Detailed instructions for reviewers can be found here.
Ongoing Replication Projects
A list of ongoing replication projects and instructions for joining a project can be found here. You can also get announcements and updates from the editors by joining the Registered Replication Report Google group.
Replication failures in psychology not due to differences in study populations
A large-scale effort to replicate results in psychology research has rebuffed claims that failures to reproduce social-science findings might be down to differences in study populations.
The drive recruited labs around the world to try to replicate the results of 28 classic and contemporary psychology experiments. Only half were reproduced successfully using a strict threshold for significance set at P < 0.0001 (the P value is a common test for judging the strength of scientific evidence).
The initiative sampled populations from across six continents, and the team behind the effort says that its overall findings suggest that the culture or setting of the group of participants is not an important factor in whether results can be replicated.
The reproducibility of research results — and psychology particularly — has come under scrutiny in recent years. Several efforts have tried to repeat published findings in a variety of fields, with mixed outcomes.
The latest effort, called Many Labs 2, was led by psychologist Brian Nosek of the Center for Open Science in Charlottesville, Virginia. Nosek and his colleagues designed their project to address major criticisms of previous replication efforts — including questions about sampling and the assertion that research protocols might not be carried out properly in reproducibility attempts.
Researchers obtained the original materials used in each experiment, and asked experts — in many cases, the original authors of the studies — to review their experimental protocols in advance.
Sixty different labs in 36 countries and territories then redid each experiment, providing combined sample sizes that were, on average, 62 times larger than the original ones.
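Sample-size multiples like these follow from basic power arithmetic: a much stricter significance threshold demands many more participants to detect the same effect. A rough sketch under a normal approximation (the effect size, thresholds, and power level below are illustrative, not the project's actual settings):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=1e-4, power=0.95):
    """Approximate per-group sample size for a two-sided, two-sample
    test of a standardized mean difference d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the threshold
    z_power = z.inv_cdf(power)           # value needed to hit target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# The same effect needs far more participants at P < 0.0001 than at P < 0.05:
print(n_per_group(0.3, alpha=0.05), "->", n_per_group(0.3, alpha=1e-4))
```

The jump in required sample size is why replication consortia pool participants across dozens of labs rather than rerunning a study at its original scale.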
The results of the effort are posted today as a preprint [1] and are scheduled to be published in Advances in Methods and Practices in Psychological Science.
“We wanted to address the common reaction that, of course the replication failed because the conditions changed, and people are different,” says Nosek. “It’s a possible explanation, but not a satisfying one, because we don’t know why that difference is important.”
Even under these conditions, the results of only 14 of the 28 experiments were replicated, and the researchers determined that the diversity of the study populations had little effect on the failures. “Those that failed tended to fail everywhere,” says Nosek.
For successful replication attempts, the picture was more complicated. The results of these studies varied somewhat across replication attempts, but overall that variation was relatively small.
“Heterogeneity occurs, but it is not as big as we think, and is not a plausible explanation for why some studies fail to replicate,” says Nosek. “It closes off one of the obvious alternative explanations.”
Many Labs 2 is the latest in a series of six large-scale replication efforts in psychology. It focused on a range of studies, none of which had been looked at by other big reproducibility projects.
They include classic studies such as psychologist Daniel Kahneman’s 1981 work [2] on framing effects, a form of cognitive bias in which people react differently to a particular choice depending on how it is presented (the study was successfully replicated), and modern research, including work [3] by Yoel Inbar in 2009 showing that people who were more likely to experience feelings of disgust tended to be more homophobic.
The attempt to replicate Inbar’s study failed with the strict significance criterion, which surprised Nosek. “I had high confidence in that one because it’s related to things I study myself.”
Inbar, a psychologist at the University of Toronto Scarborough in Canada, who took part in Many Labs 2, was also surprised that his work failed to replicate, but he doesn’t question the outcome. “We could have just gotten lucky, since the original sample size was small, or attitudes may have shifted over time,” he says.
Inbar says that there were also weaknesses in his original study. For instance, he used data initially collected by a colleague for another study.
The focus on reproducibility in recent years means that Inbar, like many psychologists, has changed how he works in an effort to produce more-reliable results. “These days, I would never take an opportunistic secondary analysis like that,” he says.
Not a doomsayer
Replication projects such as Nosek’s do not establish the overall replication rate in a field, because the studies chosen for replication are not a representative sample.
Nor do they answer the question of what a ‘good’ replication rate would be. Researchers are not aiming for a perfect score.
“Achieving 100% reproducibility on initial findings would mean that we are being too conservative and not pushing the envelope hard enough,” says Nosek.
A previous Many Labs project [4] successfully replicated 10 of 13 studies, while other projects have found replication rates as low as 36%. Of the 190 studies examined in the 6 large-scale efforts combined, 90 were successfully replicated, for a rate of 47%.
That seems too low to Inbar. “If we only have a coin-flip chance to replicate with a large sample size, that feels wrong,” he says.
But Fritz Strack, a psychologist at the University of Würzburg in Germany, is not sure that such replication projects reveal anything useful about the state of psychology.
Rather, he says, each replication teaches us more about what might be affecting the result.
“Instead of declaring yet another classical finding a ‘false positive’, replicators should identify the conditions under which an effect can and cannot be obtained,” he adds.
Nosek counters that ongoing replication efforts are important for two reasons: to ensure that the replication results are themselves replicable, and to address criticisms of previous work, as this one did. “That is how science advances: evidence, criticism, more evidence to examine the viability of the criticisms,” he says.
- Correction 19 November 2018: An earlier version of this story included an incorrect reference for the reproducibility paper.
1. Klein, R. A. et al. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/9654g (2018).
2. Tversky, A. & Kahneman, D. Science 211, 453–458 (1981).
3. Inbar, Y., Pizarro, D., Knobe, J. & Bloom, P. Emotion 9, 435–439 (2009).
4. Klein, R. A. et al. Soc. Psychol. 45, 142–152 (2014).
More social science studies just failed to replicate. Here’s why this is good
Psychologists are still wondering: “What’s going on in there?” They’re just doing it with greater rigor. Enis Aksoy/Getty Creative Images
One of the cornerstone principles of science is replication.
This is the idea that experiments need to be repeated to find out if the results will be consistent. The fact that an experiment can be replicated is how we know its results contain a nugget of truth.
Without replication, we can’t be sure.
For the past several years, social scientists have been deeply worried about the replicability of their findings.
Incredibly influential, textbook findings in psychology, like the “ego depletion” theory of willpower and the “marshmallow test,” have been bending or breaking under rigorous retests.
And the scientists have learned that what they used to consider commonplace methodological practices were really just recipes to generate false positives. This period has been called the “replication crisis” by some.
And the reckoning is still underway. Recently, a team of social scientists — spanning psychologists and economists — attempted to replicate 21 findings published in the most prestigious general science journals: Nature and Science.
Some of the retested studies have been widely influential in science and in pop culture, like a 2011 paper on whether access to search engines hinders our memories, or work on whether reading books improves a child’s theory of mind (meaning their ability to understand that other people have thoughts and intentions different from their own).
On Monday, they’re publishing their results in the journal Nature Human Behaviour. Here’s their take-home lesson: Even studies that are published in the top journals should be taken with a grain of salt until they are replicated. They’re initial findings, not ironclad truth. And they can be really hard to replicate, for a variety of reasons.
Rigorous retests of social science studies often yield less impressive results
The scientists who ran the 21 replication tests didn’t just repeat the original experiments — they made them more rigorous. In some cases, they increased the number of participants by a factor of five, and preregistered their study and analysis designs before a single participant was brought into the lab.
All the original authors (save for one group that couldn’t be reached) signed off on the study designs too. Preregistering is a promise not to deviate from the plan, which keeps researchers from injecting bias into the results.
Here are the results: 13 of the 21 results replicated.
But perhaps just as notable: Even among the studies that did pass, the effect sizes (that is, the measured difference between the experimental group and the control group, or the size of the change the experimental manipulation made) decreased by around half, meaning that the original findings likely overstated the power of the experimental manipulation.
“Overall, our study shows statistically significant scientific findings should be interpreted rather cautiously until they have been replicated, even if they have been published in the most renowned journals,” Felix Holzmeister, an Austrian economist and one of the study co-authors, says.
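For readers unfamiliar with the term, the “effect size” in question is typically a standardized mean difference such as Cohen’s d. A minimal sketch, with invented scores standing in for real data:

```python
from statistics import mean, stdev

def cohens_d(experimental, control):
    """Standardized mean difference between two groups, using the
    average of the two sample variances as the pooled SD."""
    pooled_sd = ((stdev(experimental) ** 2 + stdev(control) ** 2) / 2) ** 0.5
    return (mean(experimental) - mean(control)) / pooled_sd

# Hypothetical scores from an experimental and a control group:
d = cohens_d([5.1, 6.0, 7.2, 6.5, 5.8], [4.0, 5.2, 4.8, 5.5, 4.4])
print(f"d = {d:.2f}")
```

When a replication’s effect “decreases by half,” it means this number comes out around half as large in the retest, even though both groups were measured the same way.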
It’s not always clear why a study doesn’t replicate. Science is hard.
Many of the papers that were retested contained multiple experiments. Only one experiment from each paper was tested. So these failed replications don’t necessarily mean the theory behind the original findings is totally bunk.
For instance, the famous “Google Effects on Memory” paper — which found that we often don’t remember things as well when we know we can search for them online — did not replicate in this study. But the experiment chosen was a word-priming task (i.e., does thinking about the internet make it harder to retrieve information), and not the more real-world experiment that involved actually answering trivia statements.
And other research since has bolstered that paper’s general argument that access to the internet is shifting the relationship we have with, and the utility of, our own memories.
There could be a lot of reasons a result doesn’t replicate. One is that the experimenters doing the replication messed something up.
Another reason can be that the study stumbled on a false positive.
One of the experiments that didn’t replicate was from University of Kentucky psychologist Will Gervais. The experiment tried to see if getting people to think more rationally would make them less willing to report religious belief.
“In hindsight, our study was outright silly,” Gervais says. They had people look at a picture of Rodin’s The Thinker or another statue. They thought The Thinker would nudge people to think harder.
“When we asked them a single question on whether they believe in God, it was a really tiny sample size, and barely significant … I’d like to think it wouldn’t get published today,” Gervais says. (And, you know, this study was published in Science, a top journal.)
In other cases, a study may not replicate because the target — the human subjects — has changed. In 2012, MIT psychologist David Rand published a paper in Nature on human cooperation.
The experiment involved online participants playing an economics game. He argues that a lot of online study participants have since grown familiar with this game, which makes it a less useful tool to probe real-life behaviors.
His experiment didn’t replicate in the new study.
Finding out why a study didn’t replicate is hard work. But it’s exactly the type of work, and thinking, that scientists need to be engaged in. The point of this replication project, and others like it, is not to call out individual researchers.
“It’s a reminder of our values,” says Brian Nosek, a psychologist and the director of the Center for Open Science, who collaborated on the new study. Scientists who publish in top journals should know their work may be checked up on.
It’s also important, he notes, to know that social science’s inability to be replicable is in itself a replicable finding.
Often, when studies don’t replicate, it’s not that the effort totally disproves the underlying hypothesis. And it doesn’t mean the original study authors were frauds. But replication results do often significantly change the story we tell about the experiment.
For instance, I recently wrote about a replication effort of the famous “marshmallow test” studies, which originally showed that the ability to delay gratification early in life is correlated with success later on. A new paper found this correlation, but when the authors controlled for factors like family background, the correlation went away.
Here’s how the story changed: Delay of gratification is not a unique lever to pull to positively influence other aspects of a person’s life. It’s a consequence of bigger-picture, harder-to-change components of a person.
In science, too often, the first demonstration of an idea becomes the lasting one. Replications are a reminder that in science, this isn’t supposed to be the case. Science ought to embrace and learn from failure.
The “replication crisis” in psychology has been going on for years now. And scientists are reforming their ways.
The “replication crisis” in psychology, as it is often called, started around 2010, when a paper using completely accepted experimental methods was published purporting to find evidence that people were capable of perceiving the future, which is impossible. This prompted a reckoning: Common practices drawing on small samples of college students were found to be insufficient to find true experimental effects.
Scientists thought if you could find an effect in a small number of people, that effect must be robust. But often, significant results from small samples turn out to be statistical flukes. (For more on this, read our explainer on p-values.)
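That “statistical fluke” dynamic can be demonstrated with a short simulation: give many small experiments a modest true effect, keep only the results that cross the significance bar, and the surviving effect estimates are inflated well above the truth. All parameters here are invented for illustration:

```python
import random
from statistics import mean, stdev

def mean_significant_effect(true_d=0.2, n=20, sims=2000, z_crit=1.96, seed=1):
    """Simulate many small two-group experiments with true effect true_d,
    and return the average observed effect among the 'significant' ones."""
    rng = random.Random(seed)
    significant = []
    for _ in range(sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n)]
        pooled_sd = ((stdev(control) ** 2 + stdev(treated) ** 2) / 2) ** 0.5
        d = (mean(treated) - mean(control)) / pooled_sd
        # normal-approximation significance test of the observed difference
        if abs(d) * (n / 2) ** 0.5 > z_crit:
            significant.append(d)
    return mean(significant)

# With n = 20 per group, published "winners" look far bigger than d = 0.2:
print(mean_significant_effect())
```

This selection effect (sometimes called the winner’s curse) is one reason replications, which publish regardless of outcome, tend to find smaller effects than the originals.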
The crisis intensified in 2015 when a group of psychologists, which included Nosek, published a report in Science with evidence of an overarching problem: When 270 psychologists tried to replicate 100 experiments published in top journals, only around 40 percent of the studies held up.
The remainder either failed or yielded inconclusive data. And again, the replications that did work showed weaker effects than the original papers.
The studies that tended to replicate had more highly significant results compared to the ones that just barely crossed the threshold of significance.
Another important reason to do replications, Nosek says, is to get better at understanding what types of studies are most ly to replicate, and to sharpen scientists’ intuitions about what hypotheses are worthy of testing and which are not.
As part of the new study, Nosek and his colleagues added a prediction component. A group of scientists took bets on which studies they thought would replicate and which they thought wouldn’t. The bets largely tracked with the final results.
In the chart published with the study, the yellow dots are the studies that did not replicate, and they were all unfavorably ranked by the prediction market survey.
“These results suggest [there’s] something systematic about papers that fail to replicate,” Anna Dreber, a Stockholm-based economist and one of the study co-authors, says.
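A common way to score such forecasts is the Brier score, the mean squared gap between predicted probabilities and what actually happened. A toy version, with invented market prices rather than the study’s real data:

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between forecast probabilities and binary
    outcomes (1 = replicated, 0 = did not). Lower is better; an
    uninformative 50/50 forecast always scores 0.25."""
    pairs = list(zip(predicted_probs, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Five hypothetical studies: market prices vs. replication outcomes
score = brier_score([0.9, 0.8, 0.3, 0.2, 0.6], [1, 1, 0, 0, 1])
print(score)  # well below 0.25, i.e., better than chance
```

A market whose bets “largely track the final results,” as reported above, would land well below the 0.25 coin-flip baseline on this measure.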
One thing that stands out: Many of the papers that failed to replicate sound a little too good to be true. Take this 2010 paper that finds simply washing hands negates a common human bias.
When we make a tough choice, we often look back on the choice we passed on unfavorably and are biased to find reasons to justify our decision.
Washing hands in an experiment “seems to more generally remove past concerns, resulting in a metaphorical ‘clean slate’ effect,” the study’s abstract stated.
It all sounds a little too easy, too simple — and it didn’t replicate.
All that said, there are some promising signs that social science is getting better. More and more scientists are preregistering their study designs.
This prevents them from cherry-picking results and analyses that are more favorable to their favored conclusions.
Journals are getting better at demanding larger subject pools in experiments and are increasingly insisting that scientists share all the underlying data of their experiments for others to assess.
“The lesson of this project,” Nosek says, “is a very positive message of reformation. Science is going to get better.”
Leaning into the replication crisis: Why you should consider conducting replication research
As a student, you may have heard buzz about the replication crisis in psychology. Some of the more sensational headlines paint a bleak picture of modern research, and depending on whom you ask, some individuals are optimistic while others are demoralized. So what is going on, and is it really a crisis?
The dialogue around replication ignited in 2015 when Brian Nosek’s lab reported that after replicating 100 studies from three psychology journals, researchers were unable to reproduce a large portion of findings. This report was controversial because it called into question the validity of research shared in academic journals.
Publication in high profile journals requires the research to be subjected to a rigorous peer-review process. At this point, it is assumed the conclusions shared are trustworthy and others can now replicate or build upon the work.
Following the Nosek study, more labs began to conduct replications and a disturbing trend emerged: a large portion of studies across multiple disciplines in science failed the replication test.
Replication is vital to psychology because studying human behavior is messy. There are numerous extraneous variables that can result in bias if researchers are not vigilant. Replication helps verify that the presence of a behavior at one point in time is not due to chance.
The report that the Open Science Collaboration (2015) put forth did not undermine the peer-review process per se; rather, it highlighted a problem within the research culture. Journals were more likely to publish innovative studies than replication studies.
Following the trends of the journals, researchers who require publications in order to advance their careers are unlikely to conduct a replication.
As a result, without continued investigation, the exploratory studies can be treated as established lines of research rather than fledgling inquiries.
In response to the replication crisis, more individuals have been embracing the movement of transparency in research.
The Open Science Framework (OSF) and the Society for the Improvement of Psychological Science (SIPS) have created opportunities for researchers to brainstorm means of strengthening research practices and provide avenues to share replication results.
Given these changes, I would argue the issue of replication was not a crisis, but an awakening for researchers who had become complacent about the toxic elements of the research culture. Highlighting the issue resulted in dialogue and change.
It is a perfect example of the dynamic nature of science and captures the essence of how a career in research can be intellectually stimulating, rewarding and sometimes frustrating.
From a student perspective, engaging in replication research is a useful tool to develop your own research skills. I have found that many students have misconceptions about how to conduct research. Some common behaviors include:
- Assuming their idea is unique, but not conducting a thorough literature search to determine what is established.
- Proposing studies that are too complicated or have design faults.
- Lack of awareness of ethics or the approval process needed to conduct experiments.
- Lack of experience in regard to data entry or statistical analysis.
- Desire to change practices mid-study to increase participant compliance.
The replication movement has presented a unique opportunity for undergraduate researchers to provide meaningful contributions to science by bolstering evidence needed to substantiate exploratory findings.
I teach a research seminar at Central Oregon Community College that requires teams to complete a replication study provided by the Collaborative Replication and Education Project (associated with OSF).
Reviewers give feedback before and after data collection, identifying problematic areas and ensuring the study is an appropriate replication. Completed projects are shared on the website, and exemplary studies are eligible for research badges. The process of replication requires students to slow down and analyze strategies.
Over the course of the term, the student understanding of the process matures as groups question the choices of the original researchers. It is a high impact, low-risk educational environment because students learn valuable lessons whether or not the replication is successful.
Replication studies may not offer rewards for professionals, but there are direct incentives for students.
Former seminar students have been able to add research experience to resumes, which, in turn, has allowed them to secure competitive positions in labs upon transfer to four-year institutions.
My students have also reported feeling more prepared for upper division courses and more confidence in their abilities to conduct individual research.
If you would like to learn more about the replication movement or how you can begin a replication study, I suggest beginning with the reference below and visiting the websites of the organizations listed in this article.
Website for the Open Science Framework
Website for the Society for the Improvement of Psychological Science
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. DOI: 10.1126/science.aac4716
About the author
Andria Woodell, PhD, is a professor at Central Oregon Community College in Bend, Oregon. She teaches a variety of courses, including introductory psychology, developmental, social, and positive psychology, and the psychology of violence.
She provides trainings to local and regional groups on psychological issues that arise in the workplace. She co-developed the COCC Teaching and Learning Center and is the current coordinator of the COCC Teaching Externship.
Beyond the classroom, her primary focus is preparing community college students for a successful transition into the university system.
She co-advises the COCC collaborative learning program, which develops students’ professional skills and provides research/leadership opportunities beyond the classroom.
In 2017, she earned the COCC Faculty Achievement award which is given to a faculty member who has demonstrated significant achievement in classroom teaching, leadership and professional excellence.