Volume 23, Number 3/4
ReplicationWiki: Improving Transparency in Social Sciences Research [1]
Jan H. Höffler
University of Göttingen, Germany
jhoeffl [at] uni-goettingen.de
In empirical social sciences research, the material needed for full replication is publicly available for only a small minority of studies. The number of replication studies published in academic journals is even smaller. Our wiki documents the results of more than 300 replications, so far mainly in economics. It includes a database of more than 2,600 empirical studies. For each study we provide information about the availability of material for replication. This helps instructors to identify practical examples for courses focusing on empirical methods or on particular topics. Furthermore, it gives researchers better access to information on previous studies they can build on or compare their work with. We provide an overview of journals and their policies regarding data availability and publication of replications. The project has attracted interest from various fields and is open for expansion.
Keywords: Replication, Social Sciences, Education, Documentation, Data Archiving, Statistical Software
Replication refers to the duplication of published results (McCullough et al., 2006). While in other sciences replicability is regarded as a fundamental principle for research and a prerequisite for the publication of results, in empirical social sciences it is still not treated as a top priority. Results are based on data and calculations that usually do not get published and that are not routinely checked. Although the literature has warned for decades about the dangers of neglecting replication, the big picture has not changed much. Replicability should be the standard, not the exception. However, only a small minority of journals in economics have introduced policies that should help to ensure the replicability of their published results. Even in the few cases of mandatory online archives for data and code used for calculations, replicability cannot be taken for granted at all (McCullough et al., 2006). McCullough and Vinod (2003) put it very clearly: "Research that is not replicable is not science and cannot be trusted either as a part of the profession's accumulated body of knowledge or as a basis for policy." By improving the visibility of replication studies we hope to change the attitude towards replications, so that researchers see them as natural and do not feel singled out when others look at their work more closely.
In their 2000 reply to a replication of their controversial 1994 finding of a case where the introduction of minimum wages had caused an increase in employment (Neumark and Wascher, 2000), David Card and Alan B. Krueger follow this approach, pointing out: "Replication and reanalysis are important endeavors in economics, especially when new findings run counter to conventional wisdom". Economic knowledge about the effects of minimum wages on employment has greatly benefited from this debate, as has the econometric theory on how to identify such effects. In the same way, economics has benefited from the debate about the success of development aid inspired by Burnside and Dollar (2000), its replication and extension by Easterly, Levine and Roodman (2004), and the subsequent reply by Burnside and Dollar (2004). Such progress is much facilitated by sharing the material used for research. Some researchers might also be better at giving helpful comments and improving ideas others had than at introducing entirely new thoughts of their own. These skills, which can be equally valuable, would otherwise be wasted. [2]
Hamermesh (2007) calls attention to the fact that a very large number of findings in economics are supported by data from just one country, the United States, even in those cases that have relevance internationally. This is not surprising: the United States is the country in which the highest number of researchers are located, most high-profile journals are published there, and it has many institutions with a long history of high-quality data collection. However, one cannot generally assume that findings based on a specific period of time and one country will also hold elsewhere or in a different period of time.
The information we collected about the geographic origin of data provides further evidence for the dominance of US data in the economics literature. Of the 1,602 studies for which we collected information on geographic origin of data, we found 931 with data from the United States, but we found no, or very few, articles containing data from many other countries. Even for important economies the numbers are comparatively small. Sixty-eight studies had data from India, 60 from the UK, 45 from Canada, 33 from Italy, 27 from Sweden, 27 from China, 25 from Japan, 24 from Spain, 22 from Australia, 19 from Kenya, 17 from Norway, 16 from Switzerland, 13 from Israel, six from Russia, and five from South Africa; a search for the string "Africa" gives 14 further results. A significant number of studies indicate that data from more than one country is used, and not all countries are listed.
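The counts reported above can be turned into shares with a few lines of arithmetic. The following Python sketch only restates the figures from the text; the variable and function names are illustrative and are not part of the wiki. Because a study can use data from more than one country, the shares need not sum to 100 percent.

```python
# Counts of studies by geographic origin of data, copied from the text above.
# The names used here are illustrative assumptions, not wiki identifiers.
TOTAL_WITH_ORIGIN = 1602  # studies for which origin information was collected

STUDIES_BY_COUNTRY = {
    "United States": 931, "India": 68, "UK": 60, "Canada": 45,
    "Italy": 33, "Sweden": 27, "China": 27, "Japan": 25,
    "Spain": 24, "Australia": 22, "Kenya": 19, "Norway": 17,
    "Switzerland": 16, "Israel": 13, "Russia": 6, "South Africa": 5,
}


def share_percent(country: str) -> float:
    """Percentage of studies with origin information that use data from `country`."""
    return 100.0 * STUDIES_BY_COUNTRY[country] / TOTAL_WITH_ORIGIN


def other_listed_total() -> int:
    """Combined count for all listed countries except the United States.

    A study using data from several countries is counted once per country.
    """
    return sum(n for c, n in STUDIES_BY_COUNTRY.items() if c != "United States")


if __name__ == "__main__":
    ranked = sorted(STUDIES_BY_COUNTRY.items(), key=lambda kv: -kv[1])
    for country, n in ranked:
        print(f"{country:15s} {n:4d}  {share_percent(country):5.1f}%")
    print("Other listed countries combined:", other_listed_total())
```

On these figures the United States accounts for well over half of the studies with origin information, and its count is more than twice the combined count of the other fifteen listed countries.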
Neither English as the official language of the country nor size of the economy can explain these numbers. Availability of quality data and funding for research will play a role. A study based on publications until 2005 found a bias for US data in top journals and a bias for European data in second tier journals (Das et al., 2013). The study found that GDP per capita is the main driving factor for use of data from a country. Some of the variation in the wiki data cannot be explained by this, e.g., the underrepresentation of Japan or the relatively high number of studies with Kenyan data. One factor is the low number of economics journals published outside of the United States and Europe and the lack of their acceptance in the field. For example, the Thomson Reuters Social Sciences Citation Index for 2014 lists eight journals from Latin America, and they rank between 297 and 332 of the 333 journals. This should be investigated further, and strategies are needed to overcome the misrepresentation of different economic realities in different locations in the economic literature.
Findings that are not tested by others in the field are less reliable. Findings should also be tested in different locations because they may not hold in different contexts. When unreliable findings lead to wrong policies, the cost to societies can be very high.
2 The Wiki
By providing an infrastructure for sharing information about replications we facilitate the dissemination of insights derived from replication studies. In contrast to previous efforts such as the report on the American Economic Review Data Availability Compliance Project (Glandon, 2011), for our project we provide, as a basis, a replicable review paper on replicability in which we give an account of which studies were tested and which results were found in each case.
In the database of our wiki project we provide information about more than 2,600 empirical studies, especially with regard to the availability of material for their replication. As of March 15, 2017, for 965 of these studies, data, code, and a README for replication are marked as available in a journal archive. For many further studies at least data or code are marked as available in the journal archives or elsewhere online. One can browse by JEL codes, methods, data type, software, geographical origin of data, data sources used, keywords, and authors.
Three hundred thirty-nine replication studies are listed. For each there are fields for information on the type and result of replication, raw data availability, whether the original results are called into question, and whether the original authors accepted or rejected the results in cases where they replied. For replication type, the question is whether the same or different code and data were used as in the original study. Replication success is defined as obtaining the same results, and alternatively one can state that no results could be obtained, i.e., the study was not replicable, or it was only partially successful. For raw data availability the alternatives are: the final dataset can be exactly replicated from raw data; raw data can be used to replicate a dataset that is not substantially different from the final dataset used for the publication; raw data is somewhat available but insufficient to replicate the final dataset; or raw data is not available at all.
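The fields described above could be represented roughly as follows. This is a minimal sketch in Python and not the wiki's actual data model: the class, enum, and field names are hypothetical and were chosen only to mirror the categories named in the text.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class ReplicationResult(Enum):
    SUCCESSFUL = "same results obtained"
    PARTIALLY_SUCCESSFUL = "only partially successful"
    NOT_REPLICABLE = "no results could be obtained"


class RawDataAvailability(Enum):
    EXACT = "final dataset can be exactly replicated from raw data"
    CLOSE = "replicated dataset not substantially different from final dataset"
    INSUFFICIENT = "raw data somewhat available but insufficient"
    NOT_AVAILABLE = "raw data not available at all"


class AuthorReply(Enum):
    ACCEPTED = "original authors accepted the results"
    REJECTED = "original authors rejected the results"


@dataclass
class ReplicationEntry:
    """One replication study and its categorization (hypothetical schema)."""
    original_study: str
    replication_study: str
    same_code: bool          # replication type: same code as the original?
    same_data: bool          # replication type: same data as the original?
    result: ReplicationResult
    raw_data: RawDataAvailability
    calls_original_into_question: bool
    author_reply: Optional[AuthorReply] = None  # None if the authors did not reply


def count_by_result(entries: Iterable[ReplicationEntry]) -> Counter:
    """Tally entries by replication result."""
    return Counter(entry.result for entry in entries)
```

Keeping the author reply optional reflects that this field only applies in cases where the original authors replied. Since the text stresses that all categorizations are suggestions, any such schema would have to remain easy to revise.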
All these categorizations are suggestions and can be changed by the community, both the general categorization options and the values assigned for every individual study. The more replications we look at, the more we see that each case has its particularities and that there need to be clear rules for how to categorize. These rules will have to be constantly refined in order to do justice to all cases that will arise. It is crucial to understand that the wiki community does not, and cannot, judge who is right in cases where replications come to conclusions different from those of the original studies. The wiki should not be seen as an endorsement of all the replications covered.
There are also, of course, replications of questionable quality, and it surprised us to find during our project that replications are themselves often not replicable. Journals usually do not require making code and data available for replications, which makes it difficult for readers to come to their own conclusions. Often findings different from those of the original study arise because several changes are made to the original calculations simultaneously, and it is not always clear what influence each of these changes had. Ideally, discussions could help to find a consensus, but the wiki is still far away from achieving this, and scientific consensus can also change over time. The wiki can only be used to document the state of the literature and provide a platform for discussion. It should be noted that there are no property rights, everyone can contribute, and everyone's contributions can be edited by everyone else.
In cooperation with the RePEc bibliographic database dedicated to economics, studies in the ReplicationWiki are linked to their pages and those link back, which significantly raised the number of views of our pages. We want to contribute to the identification of studies the scientific community regards as especially important for replication, as already practised by the 3ie project for the replication of impact evaluations in development economics.
Finally, we started a working paper series for replications that also accepts very short contributions by students (Wohlfarth, 2012; Zakula, 2012; Weißer, 2014), and we provide an instructional video on replication. [3] Our teaching resources [4] can be freely used by any interested institutions or individuals such that everyone can participate in the improvement of replicability in empirical econometrics.
3 Journals Investigated, Replication Policies, and Wiki Features
So far, we focus on six journals that provide data of empirical studies in online archives. Five are journals of the American Economic Association that all follow a similar data availability policy: the American Economic Review (see Bernanke, 2004) and the four American Economic Journals (Macroeconomics, Microeconomics, Applied Economics, and Economic Policy). The sixth journal, the Journal of Political Economy, adopted the American Economic Review's data availability policy. It is unclear to us why the Papers and Proceedings of the American Economic Review were for many years held to a lesser scientific standard and were exempted from the data availability policy of the American Economic Review. Fortunately, this changed in 2015.
The Biometrical Journal was the first journal we became aware of that has a Reproducible Research Editor who checks the replicability of published results, at least from the material provided, even though that journal does not require its authors to contribute to its data archive. So far very few journals have followed suit. It is our experience that students, even PhD students at prestigious faculties, are typically not aware of the fact that reviewers do not routinely check the results of the studies they referee. The American Journal of Political Science announced in 2015 that an independent research institute would verify replication material for all empirical studies before publication in that journal, "to guarantee that they do, in fact, properly reproduce the analysis results". This seems to us a convincing step to ensure replicability, because in-house checks of policies often seem to fail: we encountered a number of journals that officially have a mandatory replication policy but in whose archives little can in fact be found (Höffler, forthcoming).
A further problem seems to be even more widespread and more difficult to solve: even for the few journals that ask their authors to provide information about how the final data that get archived were obtained from the raw data, no details, or insufficient details, are supplied. There seems to be a lack of standards for issues such as how to document data cleaning and the merging of different data sets. Furthermore, many institutions that provide data frequently change datasets and do not archive each version of them. It seems promising to us that the project DataCite is using the DOI System to assign digital object identifiers to datasets, and we hope this will become an established standard.
The leading journal with a regular section focusing on replication of published studies is the Journal of Applied Econometrics (Pesaran, 2003). As this journal typically publishes technically more demanding studies, and it does not require its authors to archive the code they used to obtain their results, we considered its material insufficient, at least for our bachelor students. However, the replications published there inspired us, and we regarded that journal's data archive as such a valuable resource that we included most of the studies it published in our wiki dataset.
In our wiki, we give an overview of journal policies on replication. To date, only a minority of journals have introduced mandatory online archives for data and code used for quantitative empirical studies (Huschka and Wagner, 2012). To our knowledge, no journal has found a convincing strategy to achieve transparency of data cleaning and to deal with other issues concerning the manipulation of raw data. Very few journals regularly publish replication studies, most prominently the Journal of Applied Econometrics. The initiative to start a replication journal could help to improve the outlets for replication studies (Zimmermann, 2015). This project requires the collective work of the community, since a large amount of replication work is impossible for any single group to find in an acceptable amount of time, given that the results are often just mentioned as asides in published studies. In particular, when it comes to reproductions, i.e., empirical work on the same question as in a previous study but with different data or methodology, specific expertise is needed for each subfield of economics in order to assess the results in the context of the existing literature.
For the reasons described above, we invite further discussion about how policies should be designed in order to ensure replicability. At the time of this writing, several blogs have reported on the wiki, [5] it has been cited various times in the literature (not only in economics but also in the political sciences, empirical law, and international studies), it is recommended by various scientific institutions, [6] and its pages have been viewed more than 2.3 million times. One hundred thirty-nine researchers have registered, but contributions from outside of our team have been very limited.
The feature for voting on which studies are most relevant to be replicated has not been widely used so far. We think that this could change if we get further endorsements from key players such as associations of economists or leading scientists and include a more comprehensive part of the empirical economics literature for which replication material is available. With our experience, we know how to expand the number of articles and journals covered at low cost, and the more research is covered the more useful the project can be for researchers.
We are trying to expand our network and motivate those who teach seminars to include their results in the wiki. For this reason we held a workshop with the Berkeley Institute for Transparency in the Social Sciences and the Young Scholar Initiative (YSI) of the Institute for New Economic Thinking, directly after the 2016 Annual Meeting of the American Economic Association. We also organized a conference session on replication during the YSI Plenary Meeting later that year.
Combining our project with other projects could also help. It might be that many researchers are hesitant to register with their name, a restriction we had thought necessary to avoid the risk of anonymous allegations or even libel, in particular after the website Science Fraud had to be shut down due to legal threats (Frezza, 2013). We are considering lifting that restriction at least temporarily until more users are active. To curtail abuse of anonymity we may suggest using "sighted versions" that indicate if changes have been looked at and are appropriate. [7]
4 Related Research
In order to develop standards for how to make research replicable and how to write replication studies it is important that the community has knowledge of other projects that facilitate the sharing of material for empirical research. Examples include the Harvard Dataverse (Crosas, 2011), Research Compendia, and the RunMyCode page (Stodden et al., 2012). The project Teaching Integrity in Economic Research (TIER) set a protocol for documentation of work with statistical data that can serve researchers as orientation (Ball and Medeiros, 2012).
Additionally, projects from related disciplines that focus on replication and from which economists can learn include the replication project in psychology that collects information about replication studies (Spellman, 2012; Psych FileDrawer) and the Reproducibility Projects of the Center for Open Science, one of which was adapted for cancer biologists. Especially with regard to teaching, the psychologists' perspective is very helpful to us (Frank and Saxe, 2012).
5 Conclusion and Outlook
Much remains to be done to ensure full replicability of quantitative empirical research. Journals that have data availability policies need to enforce them, and such policies need to become a universal standard in our discipline. Additionally, standards are needed for accomplishing replication. In order to reduce the amount of work each journal would need to undertake to establish standards, there should be a common effort to locate and publicize existing standards. Also, the results of replication studies need to be documented so that it is easy to compare and evaluate them. This initiative may have a better outcome if it is removed from journals and their editors, as they may be subject to conflicts of interest (Laband and Piette, 1994). Crowd-sourcing that enables everyone to comment and make a contribution seems to us to be an approach that avoids such conflicts.
Replication work can be a good starting point for research by young scholars. An increased availability of data and code from various journal archives would facilitate this. Further efforts are needed to increase the access to data from countries other than the United States, which should be used to test findings from other parts of the world.
In our further research we investigate how replicability of published studies influences citations (Höffler, forthcoming). It seems plausible to us that datasets of easily replicable research will also be used by other scientists, who will then cite the original work. We found evidence indicating that this is the case, and we see it as another incentive for both authors and journals to make research replicable.
For their advice and experience we thank the participants of our seminars and our 2016 workshop, our research assistants, Chris Müris, Dwayne Benjamin, Albert Berry, Thomas Kneib, Steffen Kühnel, Lore Vandewalle, Xiaohua Yu, Jesse Rothstein, and those who discussed and commented at the presentations of previous versions of this work at the 2014 Annual Meeting of the American Economic Association in Philadelphia, the Econometric Society European Meeting in Toulouse, 2014, the 2014 Research Transparency Forum of the Berkeley Initiative for Transparency in the Social Sciences, the 2015 American Economic Association Conference on Teaching and Research in Economic Education in Minneapolis, and the 2016 Annual Meeting of the American Economic Association in San Francisco, amongst others.
The research for this project was funded by the Institute for New Economic Thinking (INET).
Notes
[1] This paper was presented at the First International Workshop on Reproducible Open Science 2016, October 31, 2016.
[2] We thank Prof. Albert Berry of the University of Toronto for this comment.
[3] Kneib (2012) gives an introduction to replication in general and to our project (in German).
[4] See Höffler, Jan H. (2014), and the respective presentation slides in our page on replication in teaching.
[5] Blogs include: Econometrics Beat: Dave Giles' Blog; Economist's View by Mark Thoma; EDaWaX, European Data Watch Extended; Berkeley Institute for Transparency in the Social Sciences (BITSS); The RePEc Blog by Christian Zimmermann; Statistical Modeling, Causal Inference, and Social Science by Andrew Gelman; The Grumpy Economist, John Cochrane's blog; and Development Impact Blog of the World Bank by Markus Goldstein.
[6] The American Economic Association Resources for Economists on the Internet, the Economics Network, The German National Library of Economics ZBW Leibniz Information Centre for Economics, and the University of Cambridge, Marshall Library.
[7] The feature of "patrolled edits" is already in use. Pages are highlighted in yellow in the list of new pages and have a red (!) symbol in the list of recent changes until a user with the special right to do so has marked them as reviewed.
References
Ball, Richard, Norm Medeiros (2012), "Teaching Integrity in empirical research: A protocol for documenting data management and analysis", Journal of Economic Education, 43(2), 182-9. https://doi.org/10.1080/00220485.2012.659647
Bernanke, Ben (2004), "Editorial Statement", American Economic Review, 94(1), 404.
Burnside, Craig, David Dollar (2000), "Aid, policies and growth", American Economic Review, 90(4), 847-68. https://doi.org/10.1257/aer.90.4.847
Burnside, Craig, David Dollar (2004), "Aid, policies and growth: reply", American Economic Review, 94(3), 781-4. https://doi.org/10.1257/0002828041464524
Card, David, Alan B. Krueger (1994), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania", American Economic Review, 84(4), 772-93.
Card, David, Alan B. Krueger (2000), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Reply", American Economic Review, 90(5), 1397-420. https://doi.org/10.1257/aer.90.5.1397
Crosas, Mercè (2011), "The Dataverse Network®: An Open-Source Application for Sharing, Discovering and Preserving Data", D-Lib Magazine, 17(1/2). https://doi.org/10.1045/january2011-crosas
Das, Jishnu, Quy-Toan Do, Karen Shaines, Sowmya Srikant (2013), "U.S. and them: The Geography of Academic Research", Journal of Development Economics, 105, 112-130. https://doi.org/10.1016/j.jdeveco.2013.07.010
Easterly, William, Ross Levine, David Roodman (2004), "Aid, Policies, and Growth: Comment", American Economic Review, 94(3), 774-80. https://doi.org/10.1257/0002828041464560
Frank, Michael C., Rebecca Saxe (2012), "Teaching Replication", Perspectives on Psychological Science, 7(6), 600-4. https://doi.org/10.1177/1745691612460686
Frezza, Bill (2013), "A Barrage Of Legal Threats Shuts Down Whistleblower Site, Science Fraud", Forbes 1/09/2013.
Glandon, Philip (2011), "Appendix to the Report of the Editor: Report on the American Economic Review Data Availability Compliance Project", American Economic Review: Papers & Proceedings, 101(3), 695-9. https://doi.org/10.1257/aer.101.3.684
Hamermesh, Daniel S. (2007), "Viewpoint: Replication in economics", Canadian Journal of Economics, 40(3), 715-33. https://doi.org/10.1111/j.1365-2966.2007.00428.x
Höffler, Jan H. (forthcoming), "Replication and Economics Journal Policies", American Economic Review Papers & Proceedings. https://doi.org/10.1257/aer.p20171032
Höffler, Jan H. (2014), "Teaching Replication in Quantitative Empirical Economics", Replication Working Paper No. 4.
Huschka, Denis, Gert G. Wagner (2012), "Data accessibility is not sufficient for making replication studies a matter of course", RatSWD Working Paper Series, No. 194.
Kneib, Thomas (2012), "Replication in Empirical Economics", presentation (in German) at the Pluralist Event in addition to the Annual Conference of the German Economic Association, organized by the committee Real World Economics.
Laband, David N., Michael J. Piette (1994), "Favoritism versus search for good papers: Empirical evidence regarding the behavior of journal editors", Journal of Political Economy, 102(1), 194-203. https://doi.org/10.1086/261927
McCullough, B.D., K. A. McGeary, T. D. Harrison (2006), "Lessons from the JMCB Archive", Journal of Money, Credit and Banking, 38(4), 1093-107. https://doi.org/10.1353/mcb.2006.0061
McCullough, B.D., Hrishikesh D. Vinod (2003), "Verifying the Solution from a Nonlinear Solver: A Case Study", American Economic Review, 93(3), 873-92. https://doi.org/10.1257/000282803322157133
Neumark, David, William Wascher (2000), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Comment", American Economic Review, 90(5), 1362-96. https://doi.org/10.1257/aer.90.5.1362
Pesaran, Hashem (2003), "Introducing a replication section", Journal of Applied Econometrics, 18(1), 111. https://doi.org/10.1002/jae.709
Spellman, Barbara A. (2012), "Introduction to the Special Section on Research Practices", Perspectives on Psychological Science, 7(6), 655-6. https://doi.org/10.1177/1745691612465075
Stodden, Victoria, Christophe Hurlin, Christophe Prignon (2012), "RunMyCode.Org: A Novel Dissemination and Collaboration Platform for Executing Published Computational Results", Analyzing and Improving Collaborative eScience with Social Networks (eSoN 12); Workshop with IEEE e-Science 2012; Monday, 8 October 2012, Chicago, IL, USA. https://doi.org/10.2139/ssrn.2147710
Thomson Reuters Social Sciences Citation Index (2014).
Weißer, Christoph (2014), "Replication in the narrow sense of 'Financial Stability, the Trilemma, and International Reserves' (Obstfeld, Shambaugh & Taylor 2010)", Replication Working Paper No. 3.
Wohlfarth, Paul (2012), "Replication in the narrow sense of Banzhaf/Walsh (2008)", Replication Working Paper No. 2.
Zakula, Björn (2012), "Narrow Replication of Ashcraft (2005): Are Banks Really Special?", Replication Working Paper No. 1.
Zimmermann, Christian (2015), "On the Need for a Replication Journal", Federal Reserve Bank of St. Louis Working Paper 2015-016A.
About the Author
Jan H. Höffler studied at University of Mannheim, Humboldt-University Berlin, University of Toronto and the Graduate Institute, Geneva, before starting the ReplicationWiki at the University of Göttingen. He is editor of the University of Göttingen Replication Working Paper Series and the RePEc bibliography on replication as well as catalyst of the Berkeley Institute for Transparency in the Social Sciences.