Journals and their policies on Research Data Sharing
Paul Sturges, Centre for Information Management, Loughborough University, Loughborough LE11 3TU, UK, email@example.com;
Marianne Bamkin, Centre for Research Communication, Nottingham University, Nottingham NG7 2NR, UK, firstname.lastname@example.org;
Jane Anders, Centre for Research Communication, Nottingham University, Nottingham NG7 2NR, UK, email@example.com;
Azhar Hussain, Centre for Research Communication, Nottingham University, Nottingham NG7 2NR, UK, firstname.lastname@example.org.
It is widely agreed that sharing research data is important both for research transparency and for the potential re-use of data in further research. Yet despite the weight of positive comment, the mechanisms by which sharing might be effectively implemented remain topics for discussion rather than functioning aspects of the research world. The JoRD project at Nottingham University* confirmed doubts as to whether the principle of data sharing is operative in practice. The research concentrated on the role of research journals in mandating and enabling sharing. The team looked at the data sharing policies of journals in the expectation that these would provide good guidance on data structures and metadata, and direct authors to suitable web-linked repositories. In fact they found that the current state of journal data sharing policies could be described, kindly, as patchy and inconsistent.
There are numerous authoritative statements in favour of data sharing. The International Council for Science (ICSU, 2004), the Organisation for Economic Co-operation and Development (OECD, 2007) and the UK Royal Society (Royal Society, 2012) have made firm statements on the topic, calling for openness and freely available access to publicly-funded research data. Similarly, funding bodies such as the US National Academy of Sciences (2003) expect data to be made openly accessible. There is also the Brussels Declaration (STM, 2007), which nevertheless reflects the unease of the publishing industry about open deposit of accepted manuscripts in rights-protected archives.
There is also previous research on the potential of journal data sharing policies. In the mid-1990s McCain (1995) surveyed 850 journals, discovering that only 132 had identifiable policies. A smaller survey of medical journals by Shriger et al. (2006) found contradictory approaches and little strong guidance. Since then there has been a series of important papers by Piwowar, usually with Chapman (including Piwowar and Chapman, 2008b; Piwowar, 2010; Piwowar and Chapman, 2010a; Piwowar and Chapman, 2010b). Perhaps the most significant is Piwowar and Chapman (2008a), which builds on McCain’s work, using data on gene expression microarrays to explore policies in depth. The article classifies policies according to their strength (strong, weak, non-existent), and examines the relationship of policy strength to the journal’s impact rating and the number of instances of data submission that can be identified. More recently, the PARSE project (Kuipers and van der Hoeven, 2009) has produced helpful data on attitudes to data sharing, and a strong viewpoint on what needs to be done (Smit, 2011; Smit and Gruttemeier, 2011).
Data policies were sought on the webpages of a broadly representative sample of 400 journals. Once a data policy had been located, its content was coded under categories such as: what, when and where to deposit data; accessibility of data; types of data; monitoring of compliance; consequences of non-compliance; and policy strength, based on the definition of strong and weak journal policies in Piwowar and Chapman (2008a). A stakeholder consultation was used to supplement the lessons of the journal survey, using simple qualitative methods to establish the views of key stakeholders. Views were elicited from publishers, funding agencies, data services, librarians, research administrators and managers on the principles underlying data sharing, the drivers for change, and the challenges faced in effecting change.
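The coding of each located policy into categories can be pictured as one record per journal. The following Python sketch is illustrative only: the field names, the `Strength` labels and the sample journals are assumptions made for this example, not the instrument actually used in the survey, and the rule for judging a policy strong or weak (taken from Piwowar and Chapman, 2008a) is not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strength(Enum):
    """Policy strength labels; the labels follow the text, the coding rule is not modelled."""
    NONE = "no policy"
    WEAK = "weak"
    STRONG = "strong"


@dataclass
class PolicyRecord:
    """One coded journal policy. Field names are hypothetical; None means 'not mentioned'."""
    journal: str
    strength: Strength
    what_to_deposit: Optional[str] = None
    when_to_deposit: Optional[str] = None
    where_to_deposit: Optional[str] = None
    access_terms: Optional[str] = None
    data_types: Optional[str] = None
    compliance_monitoring: Optional[str] = None
    sanctions: Optional[str] = None


def tally_strength(records):
    """Count how many journals fall under each strength label."""
    return Counter(r.strength for r in records)


def share_mentioning(records, field_name):
    """Fraction of journals that have a policy and say something under the given category."""
    with_policy = [r for r in records if r.strength is not Strength.NONE]
    if not with_policy:
        return 0.0
    hits = sum(1 for r in with_policy if getattr(r, field_name) is not None)
    return hits / len(with_policy)


# Invented sample records, for illustration only (not survey data).
sample = [
    PolicyRecord("Journal A", Strength.STRONG,
                 where_to_deposit="named repository",
                 sanctions="publication withheld"),
    PolicyRecord("Journal B", Strength.WEAK, when_to_deposit="on submission"),
    PolicyRecord("Journal C", Strength.NONE),
    PolicyRecord("Journal D", Strength.WEAK),
]

strength_counts = tally_strength(sample)
sanction_share = share_mentioning(sample, "sanctions")
```

Holding each category as an optional field keeps "the policy is silent on this point" distinct from any particular answer, which is the distinction the survey findings turn on.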
The survey of journals revealed that approximately half the journals examined had no data sharing policy at all. Of the journal policies found, more than three quarters were by our definition weak, with only the remainder deemed to be strong. Significantly, the journals with high impact factors tended to have the strongest policies. Not only were low impact journals less likely to have any data sharing policy, but the policies they did have were less likely to mandate data sharing. Few of the policies were specific as to where the data should be deposited. Statements on expectations as to access were notably lacking, with only 28 of the policies commenting on this. Four of these talked of free access, two of open access and eighteen of low cost access. Perhaps most damning of all, only one policy discussed the inclusion of metadata with deposits. On the question of when the data should be deposited (either before publication or when publication occurred) there was again a lack of consistency and direction. About half of the policies that said something specific on this mentioned depositing data along with the submission of the article, with about a quarter indicating that the data should be available for the peer review process. Only 10% of the policies mentioned sanctions in the event of non-compliance with requirements on deposit.
The stakeholder consultation revealed low levels of mutual understanding between the various interested groups. Stakeholders made assumptions about each other’s views and actions and had evidently made little attempt to investigate the broader landscape. This goes a long way towards explaining the lack of consistency between policies revealed by the survey of journals. Although all stakeholders purported to be in favour of sharing data and were willing to list the benefits of data sharing, they all raised caveats and concerns and identified barriers. Researchers were not yet sharers by instinct, which underlines the importance of policy clarity in changing behaviour. Many researchers were simply not aware of data repositories, and those who were showed concern about their general infrastructure. It is clear that they need a journal data policy which states whether the data should be deposited in a named repository with a trusted content policy, whether a permanent URL should be used, and whether any data citation style is required. The timing of the release of data raises an interesting point. Researchers did not seem overly concerned about the point in the publication process at which the data should be made openly accessible, but rather about the point in the research process. Articles are written not only at the conclusion of a study, but also at intervals while the research is in progress. It might well not be felt appropriate to release the data relating to research still in progress, for very obvious reasons.
The publishers, who present policy to authors on their websites and in the pages of their journals (or very often do not), revealed anxieties over the capacity of the current digital infrastructure to allow data to be reliably linked to articles if the data were distributed amongst a variety of databases and other repositories. Some of them were also not confident that their own databases would be viable alternative places of deposit, because of the increasing file size of research data deposits and the consequent requirement for greater storage capacity. This offers research institutions and funders the opportunity to take the archiving issue in hand, through appropriate data repositories and libraries, but they need to mediate this process through clear, enforceable policy articulated via the journals.
A series of other anxieties emerged from the consultation. Both researchers and publishers considered that it would be difficult to deposit and link data in the original state in which it was gathered. There was a need for data to undergo a certain basic level of refinement before it might be shared. Raw qualitative data, for instance, might well be recorded in ways only truly understood by the data gatherer. The currency of data was also an issue, with the danger that some data might be too out of date by the time of publication to be of value for subsequent research. This difficulty relates to a wider requirement, identified by the publishers, that linked data in a journal article should be fit for use and replicable. In the past data has sometimes been saved unstructured, without sufficient metadata, and in formats which have subsequently become irretrievable. These anxieties need to be addressed at the policy level.
The statements of principle on research data sharing have been made. The case is more or less unanswerable. However, the means to make sharing effective are currently lacking. The authors of this article are firmly convinced that the necessary intervention in the publication process should be made by the research journals in the form of a data sharing policy. The JoRD project was in a position to both cumulate the content of existing policies and to develop ideas on the design of a policy on the basis of qualitative research. Our work on policy design is reported elsewhere. What we present here is evidence that needs to be fed into the process of policy creation so that researchers will have no doubts about what they are required to do to meet not merely the requirements of the journals, but those of the journal publishers and the whole research community that they serve.
ICSU (International Council for Science) (2004) ICSU Report of the CSPR Assessment Panel on Scientific Data and Information. Paris: ICSU.
Kuipers, T. and van der Hoeven, J. (2009) PARSE: Insight into issues of permanent access to the records of science in Europe. Survey report. Brussels: European Commission.
McCain, K. (1995) Mandating sharing: journal policies in the natural sciences. Science Communication 16, 403-431.
National Academy of Sciences (2003). Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. Retrieved Mar. 3 2014 from URL: http://www.nap.edu/catalog/10613.html
OECD (Organisation for Economic Co-operation and Development) (2007) OECD Principles and Guidelines for Access to Research Data from Public Funding. Paris: OECD.
Piwowar, H. and Chapman, W. (2008a) A review of journal policies for sharing research data. In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 – Proceedings of the 12th International Conference on Electronic Publishing (ELPUB) June 25-27 2008, Toronto Canada. Retrieved Mar. 3 2014 from URL: http://ocs.library.utoronto.ca/index.php/Elpub/2008/paper/view/684
Piwowar, H. and Chapman, W. (2008b) Identifying data sharing in biomedical literature. AMIA Annual Symposium Proceedings, 596-600. Retrieved Mar. 3 2014 from URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655927
Piwowar, H. and Chapman, W. (2010a) Public sharing of research datasets: a pilot study of associations. Journal of Informetrics 4(2) 148-156. Retrieved Mar. 3 2014 from URL http://www.sciencedirect.com/science/article/pii/S1751157709000881
Piwowar, H. and Chapman, W. (2010b) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers. Journal of Biomedical Discovery and Collaboration 5, 7-20. Retrieved Mar. 3 2014 from URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990274
Piwowar, H. (2010) Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE 6(7). Retrieved Mar. 3 2014 from URL: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018657
Royal Society (2012) Science as an open enterprise: summary report, June 2012. London: Royal Society. Retrieved Mar. 3 2014 from URL: http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE-Summary.pdf
Shriger, D. et al (2006) The content of medical journal instructions for authors. Annals of Emergency Medicine 48(6), 742-749.
Smit, E. and Gruttemeier, H. (2011) Are scholarly publications ready for the data era? Suggestions for best practice guidelines and common standards for the integration of data and publications. New Review of Information Networking 16(1) 54-70.
Smit, E. (2011) Abelard and Heloise: why data and publications belong together. D-Lib Magazine 17(1-2). Retrieved Mar. 3 2014 from URL: http://www.dlib.org/dlib/january11/smit/01smit
STM (International Association of Scientific, Technical and Medical Publishers) (2007) Brussels Declaration. Retrieved Mar. 3 2014 from URL: http://www.stm-assoc.org/brussels-declaration/
NOTE * The JoRD project was funded by JISC (www.jisc.ac.uk).