A rather long post, but quite a brief summary

Here is a summary of the project so far.

Sharing the data generated by research projects is increasingly recognised as an academic priority by funders, researchers and publishers. Scientific organisations such as the US National Academy of Sciences have raised the issue of the sharing policies set out by academic journals, urging journals to make clear statements of those policies. The publishing community, for its part, broadly supports the principle of open and accessible research data, whilst expressing concerns over the intellectual property implications of archiving shared data.

The JoRD Project was a feasibility study on the possible shape of a central service on journal research data policies, funded by the UK's JISC under its Managing Research Data Programme. It was carried out by the Centre for Research Communications at the University of Nottingham (UK), with contributions from the Research Information Network and Mark Ware Consulting Ltd. The project used a mix of methods to examine the scope and form of a sustainable, international service that would collate and summarise journal policies on research data for the use of researchers, managers of research data and other stakeholders. The purpose of the service would be to provide a ready reference source of easily accessible, standardised, accurate and clear guidance and information on the journal policy landscape relating to research data. The specific objectives of the study were: to identify the current state of journal data sharing policies; to investigate the views and practices of stakeholders; to develop an overall view of stakeholder requirements and possible service specifications; to explore the market base for a JoRD Policy Bank Service; and to investigate and recommend sustainable business models for the development of a JoRD Policy Bank Service.

A review of relevant literature showed evidence that scientific institutions are attempting to draw attention to the importance of journal data policies, and a sense that the scientific community is generally in favour of data sharing. At the same time, it appears that more needs to be done to convince the publishing world of the need for greater consistency in data policy and author guidelines, particularly on vital questions such as when and where authors should deposit data for sharing.

Studies of the journal policies which currently exist have found that a large percentage of journals do not have a policy on data sharing, and that there are great inconsistencies between journal data sharing policies. Whilst some journals offer little guidance to authors, others stipulate specific compliance mechanisms. A valuable distinction is made in some policies between two categories of data: integral data, which directly support the arguments and conclusions of the article, and supplementary data, which enhance the article but are not essential to its argument. What we considered to be the most significant study on journal policies (Piwowar & Chapman, 2008) defined journal data sharing policies as “strong”, “weak” or “non-existent”. A strong policy mandates the deposit of data as a condition of publication, whereas a weak policy merely requests the deposit of data. The indication from previous studies that researchers’ data sharing behaviour is similarly inconsistent was confirmed by our online survey. However, there is general assent to the concept of data sharing, and many researchers would be prepared to submit data for sharing along with the articles they submit to journals.

We then investigated a substantial sample of journal policies to establish our own picture of the policy landscape. A selection of 400 international and national journals was purposively chosen to represent the top 200 most cited journals (high impact journals) and the bottom 200 least cited (low impact journals), shared equally between Science and Social Science, based on the Thomson Reuters citation index. Each policy we identified relating to these journals was broken down into different aspects, such as: what, when and where to deposit data; accessibility of data; types of data; monitoring of data compliance; and consequences of non-compliance. These were then systematically entered onto a matrix for comparison; where no policy was found, this was indicated on the matrix. Policies were categorised as either “weak”, merely requesting that data be shared, or “strong”, stipulating that data must be shared.
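
For readers who think in code, here is a minimal sketch of how such a comparison matrix and the strong/weak/non-existent categorisation could be represented. The field names, journals and values below are hypothetical illustrations only, not the project's actual schema or data:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JournalPolicy:
    """One row of the comparison matrix: the aspects recorded per journal (hypothetical schema)."""
    journal: str
    has_policy: bool                    # False -> no data sharing policy found
    deposit_required: bool = False      # True if the policy mandates data deposit
    what_to_deposit: Optional[str] = None
    when_to_deposit: Optional[str] = None
    where_to_deposit: Optional[str] = None
    compliance_monitored: bool = False

def categorise(policy: JournalPolicy) -> str:
    """Apply the strong/weak/non-existent categorisation described in the text."""
    if not policy.has_policy:
        return "non-existent"
    return "strong" if policy.deposit_required else "weak"

# Hypothetical example rows, not real journal data
matrix = [
    JournalPolicy("Journal A", has_policy=True, deposit_required=True,
                  where_to_deposit="community repository"),
    JournalPolicy("Journal B", has_policy=True),
    JournalPolicy("Journal C", has_policy=False),
]

for row in matrix:
    print(f"{row.journal}: {categorise(row)}")
```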

Approximately half the journals examined had no data sharing policy. Of the policies we did find, just over three quarters were assessed as weak and just under one quarter as strong (76%:24%). The high impact journals were found to have the strongest policies; fewer low impact journals included a data sharing policy at all, and those that did were less likely to stipulate data sharing, merely suggesting that it may be done. The policies generally gave little guidance on the stage of the publishing process at which data are expected to be shared.

Throughout the project, representatives from publishing and other stakeholders were consulted in different ways. Representatives of publishing were selected from a cross-section of different types of publishing house; the researchers we consulted were self-selected through open invitations by way of the JoRD Blog. Nine of them attended a focus group and 70 answered an online survey. They were drawn from every academic discipline and ranged over a total of 36 different subject areas. During the later phases of the study, a selection of representatives of stakeholder organisations was asked to explore the potential of the proposed JoRD service and to comment on possible business models. These included publishers, librarians, representatives of data centres or repositories, and other interested individuals. This aspect of the investigation included a workshop session with representatives of leading journal publishers in order to assess the potential for funding a JoRD Policy Bank service. Subsequently, an analysis of comparator services and organisations was performed, using interviews and desk research.

Our conclusion from the various aspects of the investigation was that although the idea of making scientific data openly accessible for sharing is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these is the lack of a consolidated infrastructure for the easy sharing of data: in consequence, researchers quite simply do not know how to share their data. At the present juncture, when policies are either not available or provide inadequate guidance, researchers acknowledge a need for the kind of information that a policy bank would supply. The market base for a JoRD policy bank service would be the research community, and researchers did indicate that they believed such a service would be used.

Four levels of possible business models for a JoRD service were identified and finally put to a range of stakeholders, who found it hard to identify a clear-cut service level that would be self-sustaining. The funding models of similar services and organisations were also investigated. In consequence, an exploratory two-phase implementation of a service is suggested. The first phase would involve developing a database of data sharing policies, engaging with stakeholders and supporting third-party API development, with the intention of building use to the level at which a second, self-sustaining phase would become possible.
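
The proposed service and its API were never specified beyond this outline, so the following is only a speculative sketch of how a third party might query a hypothetical JoRD policy bank. The endpoint URL, query parameter and response fields are all invented for illustration:

```python
import json
import urllib.request

# Hypothetical endpoint: JoRD never published an API, so this URL and the
# response fields are invented purely to illustrate the policy bank idea.
JORD_API = "https://jord.example.org/api/v1/policies"

def fetch_policy(issn: str) -> dict:
    """Look up the summarised data sharing policy for a journal by ISSN."""
    with urllib.request.urlopen(f"{JORD_API}?issn={issn}") as resp:
        return json.load(resp)

# An imagined response, mirroring the aspects recorded in the study's matrix:
example_response = {
    "issn": "1234-5678",
    "journal": "Example Journal of Data Studies",
    "policy_strength": "strong",            # strong | weak | non-existent
    "deposit_required": True,
    "when_to_deposit": "on acceptance",
    "where_to_deposit": ["community repository"],
    "compliance_monitored": False,
}
print(json.dumps(example_response, indent=2))
```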

Thomson Reuters Web of Knowledge – Data Citation Index

DATA CITATION INDEX

The Data Citation Index is coming soon to Web of Knowledge℠.

See how Thomson Reuters is working to help solve the issues of discovery, attribution and measurement in data sharing to support:

  • Advancing scholarship
  • Increasing transparency
  • Promoting work in new ways
  • Curbing double-funding

Find out more here:

http://app.info.science.thomsonreuters.biz/e/es.aspx?s=1556&e=545485&elq=57ca697c40444efa88c746a827127e7d

Incentivisation and Data Sharing – why should I cite the data I have used?

DATA CITATION

National Archive of Computerized Data on Aging (NACDA)

While browsing data archives, we found the following amongst the NACDA pages concerning why re-used data should be cited:

Citing data files in publications based on those data is important for several reasons:

  • Other researchers may want to replicate findings and need the citation to identify/locate the data.
  • Citations are harvested by key social sciences indexes, such as Web of Science, providing credit to the researchers.
  • Data producers and funding agencies can track citations to measure impact.

http://www.icpsr.umich.edu/icpsrweb/NACDA/studies/4248/detail

These statements demonstrate the incentivisation process for people to share their data and make it available for re-use. Benefits accrue back to the original researcher(s) for having shared their data, and the discipline itself gains impact.

WHAT A DATA CITATION MIGHT LOOK LIKE

Examples

United States Department of Commerce. Bureau of the Census, and United States Department of Labor. Bureau of Labor Statistics. Current Population Survey: Annual Demographic File, 1987 [Computer file]. ICPSR08863-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-02-03. doi:10.3886/ICPSR08863

Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O’Malley, and John E. Schulenberg. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2007 [Computer File]. ICPSR22480-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2008-10-29. doi:10.3886/ICPSR22480

News from America, “U-M, Sloan Foundation to enhance open access to research data”

“Professional associations, journals, data repositories and funding agencies must work together to make the entire scientific venture more transparent and to encourage broader access to research data,” said ICPSR Director George Alter. “The first step is to give scientists who produce important research data the recognition they deserve.”

U-M, Sloan Foundation to enhance open access to research data (http://www.ur.umich.edu/update/archives/121002/sloan)

The University of Michigan’s Inter-university Consortium for Political and Social Research and the Alfred P. Sloan Foundation are working together to promote open access to research data and to improve the link between published works and the background data.

In particular, the ICPSR will be working with stakeholders within the social sciences to improve:

  • Data citation
  • Transparency of research
  • Collaboration across scientific fields to study sustainable funding models for data repositories

Our survey work with JoRD has indicated that Social Sciences journals are behind Science journals in having policies on data sharing and archiving. This project has the potential to address this imbalance.

Inter-university Consortium for Political and Social Research

http://www.icpsr.umich.edu/icpsrweb/landing.jsp

Alfred P. Sloan Foundation

http://www.sloan.org/

Some differences between the Sciences and the Social Sciences

Humanities and Social Sciences – what’s different compared to Science, Technology and Medicine (STM)?

 

The following summarises the key points of comparison from an article in Research Information:

  • Attitude to information? – One fifth of researchers in the life sciences and physical sciences rated print versions of current journal issues as useful for their research; in the Arts and Humanities the figure was three fifths.
  • Funding of the research sectors? – Unlike STM, much research in the humanities and social sciences is produced by individual researchers without the support of a specific project grant (which would otherwise cover publication costs). There is more funding in STM.
  • Journal prices? – Usually higher in STM fields than in the Humanities or Social Sciences.
  • Type of publication? – Humanities researchers generally value books rather than journals.
  • Where publishing? – STM’s main conduit for research dissemination is the academic journal; for the Humanities and Social Sciences it is more of a mixed model.
  • What’s being written? – Researchers in the Humanities and Social Sciences tend to write long-form publications; their arguments need more space and they value the extended argument.
  • Time sensitivity? – Publication of Humanities research is often less time sensitive (e.g. you haven’t cured a disease for which people need to know the results quickly).

(from Pool, R. Open to debate – Information access in the Social Sciences and Humanities. Research Information. April/May 2010. Issue 47, pp.12-14)

Report from The Royal Society – Science as an Open Enterprise

Key Points of Relevance to the JoRD Project from a report by The Royal Society:

Science as an open enterprise (June 2012)

The full report can be found at the following location:

http://royalsociety.org/policy/projects/science-public-enterprise/report/

Areas for action

Six key areas for action are highlighted in the report:

  • Scientists need to be more open among themselves and with the public and media
  • Greater recognition needs to be given to the value of data gathering, analysis and communication
  • Common standards for sharing information are required to make it widely usable
  • Publishing data in a reusable form to support findings must be mandatory
  • More experts in managing and supporting the use of digital data are required
  • New software tools need to be developed to analyse the growing amount of data being gathered

Data analysis

The report highlights the results of the following study:

Alsheikh-Ali, A.A., Qureshi, W., Al-Mallah, M.H. and Ioannidis, J.P. (2011) Public availability of published research data in high-impact journals. PLoS ONE 6(9): e24357. Epub 2011 Sep 7. PMID: 21915316. Freely available via PubMed Central.

 
Returning to the original article, the following is found:

Abstract

BACKGROUND:

There is increasing interest to make primary data from published research publicly available. We aimed to assess the current status of making research data available in highly-cited journals across the scientific literature.

METHODS AND RESULTS:

We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.

CONCLUSION:

A substantial proportion of original research papers published in high-impact journals are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals. Journals should adopt more routinely policies for data sharing, expanding the types of data that are subject to public sharing policies with the ultimate target of covering all types of data. Moreover, it is essential to develop mechanisms for journals to ensure that existing data availability policies are consistently followed by researchers and published research findings are easily reproducible.
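
As a quick sanity check, the percentages quoted in the abstract above follow directly from the paper counts it reports:

```python
# Figures as reported in the abstract above (Alsheikh-Ali et al., 2011)
total_papers = 500
no_policy = 149                      # papers not subject to any data availability policy
covered = total_papers - no_policy   # 351 papers covered by some policy
non_adherent = 208                   # covered papers that did not fully adhere
full_raw_data = 47                   # papers depositing full primary raw data online

print(f"{no_policy / total_papers:.0%} not subject to any policy")     # ~30%
print(f"{non_adherent / covered:.0%} of covered papers non-adherent")  # ~59%
print(f"{full_raw_data / total_papers:.0%} deposited full raw data")   # ~9%
```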

Literature Review – Articles Relevant to the Field

This bibliography of useful literature has been sitting in the draft section for some months, but as our study has now finished and the feasibility study report is in the hands of Jisc, we are practising what we preach and passing on our information to others who may be interested in this area. I am sorry, but it is a rather long list and looks tedious and boring.

More data will follow in the next few weeks.

LITERATURE REVIEW

An early paper on journal policies.

McCain, K. (1995) Mandating sharing: journal policies in the natural sciences. Science Communication 16, 403-431.

Baseline paper on journal policies (and examples of the other work of Piwowar and Chapman on data sharing).

Piwowar, H. and Chapman, W. (2008) A review of journal policies for sharing research data. In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 – Proceedings of the 12th International Conference on Electronic Publishing (ELPUB), June 25-27 2008, Toronto, Canada. Available at http://ocs.library.utronto.ca/index.php/Elpub/2008/paper/view/684

Piwowar, H. and Chapman, W. (2008) Identifying data sharing in biomedical literature. AMIA Annual Symposium Proceedings, 596-600. Available at http://www.ncbi.nih.gov/pmc/articles/PM2655927

Piwowar, H. and Chapman, W. (2010) Public sharing of research datasets: a pilot study of associations. Journal of Informetrics 4(2) 148-156. Available at http://www.sciencedirect.com/science/article/pii/S1751157709000881

Piwowar, H. and Chapman, W. (2010) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers. Journal of Biomedical Discovery and Collaboration 5, 7-20. Available at http://www.ncbi.nih.gov/pmc/articles/PMC2990274

Piwowar, H. (2010) Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE 6(7): e18657. Available at http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018657

Most recent work on best practice for scholarly publishing.

Schriger, D. et al (2006) The content of medical journal instructions for authors. Annals of Emergency Medicine 48(6), 742-749.

Looked at 166 journals and found contradictory policies and little guidance on methodological and statistical issues.

Smit, E. and Gruttemeier, H. (2011) Are scholarly publications ready for the data era? Suggestions for best practice guidelines and common standards for the integration of data and publications. New Review of Information Networking 16(1) 54-70.

Smit, E. (2011) Abelard and Heloise: why data and publications belong together. D-Lib Magazine 17(1-2). Available at http://www.dlib.org/dlib/january11/smit/01smit

Recent broad explorations of the issues.

Schriger, D. et al (2006) From submission to publication: a retrospective review of the tables and figures in a cohort of randomised controlled trials submitted to the British Medical Journal. Annals of Emergency Medicine 48(6) 750-756.

Carpenter, T. (2009) Journal article supplementary materials: a Pandora’s box of issues needing best practices. Against the Grain 21(6) 84-85.

Neylon, C. (2009) Scientists lead the push for open data sharing. Research Information 41, 22-23.

Hodson, S. (2009) Data-sharing culture has changed. Research Information 45, p.12.

Fisher, J. and Fortmann, L. (2010) Governing the data commons: policy, practice and the advancement of science. Information and Management 47(4) 237-245.

Bizer, C., Heath, T. and Berners-Lee, T. (2009) Linked data – the story so far. International Journal on Semantic Web and Information Systems, Special Issue on Linked Data. Available at http://linkeddata.org/docs/ijswis-special-issue

Hrynaszkiewicz, I. (2011). The need and drive for open data in biomedical publishing. Serials 24(1) 31-37.

Bechhofer, S. et al (2011) Why linked data is not enough for scientists. Future Generation Computer Systems (forthcoming as of Aug 2011)

Kauppinen, T. and Espindola, G. (2011) Linked open science – communicating, sharing and evaluating data, methods and results for executable papers. Procedia Computer Science 4, 726-731.

Linked Open Science (LOS) has four ‘silver bullets’: 1. publication of data using Linked Data principles; 2. open source and need-based environments; 3. use of cloud computing; 4. Creative Commons.

Parsons, M. (2011) Expert Report on Data Policy – Open Access. Available at http://151.1.219.218/57883ed7-88bc-4e6f-92ed-3af6e96600be.pdf.

Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A.U., Wu, L., Read, E., Manoff, M. and Frame, M. (2011) Data sharing by scientists: practices and perceptions. PLoS ONE 6(6): e21101. Available at http://www.plosone.org/article/info:doi/10.1371/journal.pone.0021101

Borgman, C. (2012) The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6) 1059-1078.

Selected specific studies on aspects of data archiving and sharing.

Hrynaszkiewicz, I. and Altman, D. (2009). Towards agreement on best practice for publishing raw clinical trial data. Trials 10(17) 1-5. Available at http://www.biomedcentral.com/content/pdf/1745-6215-10-17.pdf

Groves, T. (2009) Managing UK research data for future use. BMJ 338: b1252. Available at http://www.bmj.com/content/338/bmj.b1252.Full?tab=response-form/

De Roure, D. et al. (2009) Towards open science: the myExperiment approach. Concurrency and Computation: Practice and Experience (submitted 2009). Available at http://eprints.soton.ac.uk/267270/

Elman, C., Kapiszewski, D. and Vinuela, L. (2010) Qualitative data archiving: rewards and challenges. PS: Political Science & Politics 43, 23-27. doi:10.1017/S104909651099077X

Moore, R. and Anderson, W. (2010) ASIS&T Research Data Access and Preservation Summit: conference summary. Bulletin of the American Society for Information Science and Technology 36(6) 42-45.

Pienta, A. et al (2010) The enduring value of social science research: the use and reuse of primary research data. In: The Organisation, Economics and Policy of Scientific Research Workshop, Torino, Italy, April 2010. Available at http://www.carloalberto.org/files/brick_dime_strike_workshopagenda_april2010/.pdf

Eschenfelder, K. and Johnson, A. (2011) The limits of sharing: controlled data collections. Proceedings of the American Society for Information Science & Technology 48(1) 1-10.

Neveol, A. et al (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27(23) 3306-3312.

Ingwersen, P. and Chavan, V. (2011) Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 12(S3).

Korjonen, M. (2012) Clinical trial information: developing an effective model of dissemination and a framework to improve transparency. UCL PhD thesis. Available at http://discovery.ucl.ac.uk/1344051/

Bibliography

Bailey, C. (2012) Research Data Curation Bibliography. Houston: Digital Scholarship. Available at http://digital-scholarship.org/rdcb/rdcb.htm

Approaches the question from a library/archive perspective.

Themes from the literature – obligations in the Life Sciences

 

LITERATURE REVIEW – USEFUL NOTES

The notes given below refer to the following publication:

National Academy of Sciences (2003). Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences.

Obtained online at http://www.nap.edu/catalog/10613.html

Some of the pertinent findings and discussion materials of the Committee on Responsibilities of Authorship in the Biological Sciences are given below.

 **************************

1. Community Standards and the Sharing of Materials and Data

The Uniform Principle for Sharing Integral Data and Materials Expeditiously (UPSIDE) is given as follows (p.4):

Community standards for sharing publication-related data and materials should flow from the general principle that the publication of scientific information is intended to move science forward. More specifically, the act of publishing is a quid pro quo in which authors receive credit and acknowledgment in exchange for disclosure of their scientific findings. An author’s obligation is not only to release data and materials to enable others to verify or replicate published findings (as journals already implicitly or explicitly require) but also to provide them in a form on which other scientists can build with further research. All members of the scientific community – whether working in academia, government, or a commercial enterprise – have equal responsibility for upholding community standards as participants in the publication system, and all should be equally able to derive benefits from it.

 **************************

2. Community Principles – Expectations of Authors

In addition to the UPSIDE statement, the committee identified five corollary principles associated with sharing publication-related data. These are given as follows (pp.5-7):

DATA AND SOFTWARE

Principle 1. Authors should include in their publications the data, algorithms, or other information that is central or integral to the publication – that is, whatever is necessary to support the major claims of the paper and would enable one skilled in the art to verify or replicate the claims.

Principle 2. If central or integral information cannot be included in the publication for practical reasons (for example, because a dataset is too large), it should be made freely (without restriction on its use for research purposes and at no cost) and readily accessible through other means (for example, on-line). Moreover, when necessary to enable further research, integral information should be made available in a form that enables it to be manipulated, analyzed, and combined with other scientific data.

Principle 3. If publicly accessible repositories for data have been agreed on by a community of researchers and are in general use, the relevant data should be deposited in one of these repositories by the time of publication.

MATERIALS

Principle 4. Authors of scientific publications should anticipate which materials integral to their publication are likely to be requested and should state in the “Materials and Methods” section or elsewhere how to obtain them.

Principle 5. If a material integral to a publication is patented, the provider of the material should make the material available under a license for research use.

 **************************

3. Community recommendations – for all those who participate in the publication process

This includes:

  • Academic, government and industrial scientists
  • Scientific societies, publishers and editors of scientific journals
  • Institutions and organisations that conduct and fund scientific research

The committee made the following recommendations for future discussion in the workshop (pp.5-12):

Recommendation 1. The scientific community should continue to be involved in crafting appropriate terms of any legislation that provides additional database protection.

Recommendation 2. It is appropriate for scientific reviewers of a paper submitted for publication to help identify materials that are integral to the publication and likely to be requested by others.

Recommendation 3. It is not acceptable for the provider of publication-related material to demand an exclusive license to commercialise a new substance that a recipient makes with the provider’s material or to require collaboration or coauthorship of future publications.

Recommendation 4. The merits of adopting a standard MTA should be examined closely by all institutions engaged in technology transfer, and efforts to streamline the process should be championed.

Recommendation 5. As a best practice, participants in the publication process should commit to a limit of 60 days to complete the negotiation of publication-related MTAs and transmit the requested materials and data.

Recommendation 6. Scientific journals should clearly and prominently state (in the instructions for authors and on their Websites) their policies for distribution of publication-related materials, data, and other information. Policies for sharing materials should include requirements for depositing materials in an appropriate repository. Policies for data sharing should include requirements for deposition of complex datasets in appropriate databases and for the sharing of software and algorithms integral to the findings being reported. The policies should also clearly state the consequences for authors who do not adhere to the policies and the procedure for registering complaints about noncompliance.

Recommendation 7. Sponsors of research and research institutions should clearly and prominently state their policies for distribution of publication-related materials and data by their grant or contract recipients or employees.

Recommendation 8. If an author does not comply with a request for data or materials in a reasonable time period and the requestor has contacted the author to determine if extenuating circumstances may have caused the delay, it is acceptable for the requestor to contact the journal in which the paper was published. If that course of action is unsuccessful in due course, the requestor may reasonably contact the author’s university or other institution or the funder of the research in question for assistance.

Recommendation 9. Funding organisations should provide the recipients of research grants and contracts with the financial resources needed to support dissemination of publication-related data and materials.

Recommendation 10. Authors who have received data or materials from other investigators should acknowledge such contributions appropriately.

 **************************

4. Identified problem areas with data sharing due to rapid changes in the life-sciences discipline

The following problems were highlighted at the time – 2003 (p.18):

  • Disagreement and uncertainty about the responsibilities of authors to share data and materials
  • A sense that, in practice, publication-related materials and data are not always readily available to researchers who desire access to them
  • Suggestions that standards for sharing are not being enforced
  • Controversy over seemingly different application of journal policies to different authors
  • Questions about how standards and policies apply to various types of data and materials, such as large databases and software
  • Suggestions that standards for sharing may be in conflict with federal legislation that encourages commercialisation of the results of federally funded research
  • The prospect that new legal protections for databases, particularly in Europe, will complicate the development of comprehensive and consistent standards
  • Uncertainty as to whether academic investigators should be treated differently from industry investigators with regard to the provision of access to their publication-related data or materials