Literature Review – Articles Relevant to the Field

This bibliography of useful literature has been sitting in the draft section for some months, but as our study has now finished and the feasibility study report is in the hands of JISC, we are practising what we preach and passing on our information to others who may be interested in this area. I apologise for the rather long list, which may look tedious.

More data will follow in the next few weeks.

LITERATURE REVIEW

An early paper on journal policies.

McCain, K. (1995) Mandating sharing: journal policies in the natural sciences. Science Communication 16, 403-431.

Baseline paper on journal policies, followed by examples of other work by Piwowar and Chapman on data sharing.

Piwowar, H. and Chapman, W. (2008) A review of journal policies for sharing research data. In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 – Proceedings of the 12th International Conference on Electronic Publishing (ELPUB), June 25-27 2008, Toronto, Canada. Available at http://ocs.library.utoronto.ca/index.php/Elpub/2008/paper/view/684

Piwowar, H. and Chapman, W. (2008) Identifying data sharing in biomedical literature. AMIA Annual Symposium Proceedings, 596-600. Available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655927

Piwowar, H. and Chapman, W. (2010) Public sharing of research datasets: a pilot study of associations. Journal of Informetrics 4(2), 148-156. Available at http://www.sciencedirect.com/science/article/pii/S1751157709000881

Piwowar, H. and Chapman, W. (2010) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers. Journal of Biomedical Discovery and Collaboration 5, 7-20. Available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990274

Piwowar, H. (2011) Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE 6(7), e18657. Available at http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018657

Most recent work on best practice for scholarly publishing.

Schriger, D. et al (2006) The content of medical journal instructions for authors. Annals of Emergency Medicine 48(6), 742-749.

Looked at 166 journals and found contradictory policies and little guidance on methodological and statistical issues.

Smit, E. and Gruttemeier, H. (2011) Are scholarly publications ready for the data era? Suggestions for best practice guidelines and common standards for the integration of data and publications. New Review of Information Networking 16(1) 54-70.

Smit, E. (2011) Abelard and Heloise: why data and publications belong together. D-Lib Magazine 17(1-2). Available at http://www.dlib.org/dlib/january11/smit/01smit

Recent broad explorations of the issues.

Schriger, D. et al (2006) From submission to publication: a retrospective review of the tables and figures in a cohort of randomised controlled trials submitted to the British Medical Journal. Annals of Emergency Medicine 48(6), 750-756.

Carpenter, T. (2009) Journal article supplementary materials: a Pandora’s box of issues needing best practices. Against the Grain 21(6) 84-85.

Neylon, C. (2009) Scientists lead the push for open data sharing. Research Information 41, 22-23.

Hodson, S. (2009) Data-sharing culture has changed. Research Information 45, p.12.

Fisher, J. and Fortmann, L. (2010) Governing the data commons: policy, practice and the advancement of science. Information and Management 47(4) 237-245.

Bizer, C., Heath, T. and Berners-Lee, T. (?) Linked data – the story so far. International Journal on Semantic Web and Information Systems, Special Issue on Linked Data. Available at http://linkeddata.org/docs/ijswis-special-issue

Hrynaszkiewicz, I. (2011) The need and drive for open data in biomedical publishing. Serials 24(1), 31-37.

Bechhofer, S. et al (2011) Why linked data is not enough for scientists. Future Generation Computer Systems (forthcoming as of Aug 2011)

Kauppinen, T. and Espindola, G. (2011) Linked open science – communicating, sharing and evaluating data, methods and results for executable papers. Procedia Computer Science 4, 726-731.

Linked Open Science (LOS) proposes four ‘silver bullets’: (1) publication of data using Linked Data principles; (2) open-source and need-based environments; (3) use of cloud computing; (4) Creative Commons.

Parsons, M. (2011) Expert Report on Data Policy – Open Access. Available at http://151.1.219.218/57883ed7-88bc-4e6f-92ed-3af6e96600be.pdf.

Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A.U., Wu, L., Read, E., Manoff, M. and Frame, M. (2011) Data sharing by scientists: practices and perceptions. PLoS ONE 6(6), e21101. Available at http://www.plosone.org/article/info:doi/10.1371/journal.pone.0021101

Borgman, C. (2012) The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6) 1059-1078.

Selected specific studies on aspects of data archiving and sharing.

Hrynaszkiewicz, I. and Altman, D. (2009) Towards agreement on best practice for publishing raw clinical trial data. Trials 10(17), 1-5. Available at http://www.biomedcentral.com/content/pdf/1745-6215-10-17.pdf

Groves, T. (2009) Managing UK research data for future use. BMJ 338, b1252. Available at http://www.bmj.com/content/338/bmj.b1252.Full?tab=response-form/

De Roure, D. et al (2009) Towards open science: the myExperiment approach. Concurrency and Computation: Practice and Experience (submitted 2009). Available at http://eprints.soton.ac.uk/267270/

Elman, C., Kapiszewski, D. and Vinuela, L. (2010) Qualitative data archiving: rewards and challenges. PS: Political Science & Politics 43, 23-27. doi:10.1017/S104909651099077X

Moore, R. and Anderson, W. (2010) ASIS&T Research Data Access and Preservation Summit: conference summary. Bulletin of the American Society for Information Science and Technology 36(6) 42-45.

Pienta, A. et al (2010) The enduring value of social science research: the use and reuse of primary research data. In: The Organisation, Economics and Policy of Scientific Research Workshop, Torino, Italy, April 2010. Available at http://www.carloalberto.org/files/brick_dime_strike_workshopagenda_april2010/.pdf

Eschenfelder, K. and Johnson, A. (2011) The limits of sharing: controlled data collections. Proceedings of the American Society for Information Science & Technology 48(1) 1-10.

Neveol, A. et al (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27(23) 3306-3312.

Ingwersen, P. and Chavan, V. (2011) Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 12(S3).

Korjonen, M. (2012) Clinical trial information: developing an effective model of dissemination and a framework to improve transparency. UCL PhD thesis. Available at http://discovery.ucl.ac.uk/1344051/

Bibliography

Bailey, C. (2012) Research Data Curation Bibliography. Houston: Digital Scholarship. Available at http://digital-scholarship.org/rdcb/rdcb.htm

Approaches the question from a library/archive perspective.


Stakeholder Consultation

A crucial component of the JoRD project is now under way. Central to building the case for the JoRD policy bank is an in-depth consultation with stakeholders who have an interest in the policies and practices of academic journal publishers with regard to data produced by researchers. These stakeholders naturally include publishers and journals, but also other players such as research funders, research administrators, data managers and librarians.

The consultation is intended to build on other strands of JoRD which are identifying and categorising current relevant data policies. We will thus seek to tease out the thinking and philosophy that underlies these policies, views about how these might develop and perceptions of what research data represents in the context of the publication process.

In the first instance, the consultation takes the form of semi-structured interviews with fifteen selected individuals. The questions that frame the interviews, reflecting the sort of issues outlined above, are attached here for information. The individuals concerned, ten of whom come from the publishing world, have now been contacted and, with the exception of a couple from whom final confirmation is expected, all have readily expressed an interest and agreed to take part. An interview schedule is being drawn up for the period 17 September to 12 October; indeed, as of today (20 September), the first interviews have already taken place.

The tight timescale of the project, along with budgetary constraints, limits the number of interviews that can realistically be carried out by mid-October. However, to capitalise on the positive reactions that the project has generated, and to broaden as far as possible the range of views being gathered, we are also asking an additional ten or so individuals to provide written responses to the interview questions. We thus aim to collate and synthesise the thoughts of about 25 key people. These will reflect a good diversity of circumstances: within the publishing world, we aim to represent the standpoints of commercial and learned society publishers; open-access and subscription-based publishers; small and large organisations; university presses; and individual journals, where these have policies that are distinct from those of their parent publishing houses. The non-publishing organisations to be consulted will include RCUK, HEFCE, DCC, JISC and ARMA, and hopefully, to provide a non-UK perspective, the Australian National Data Service.

The outputs from the interviews and responses to the questions will be synthesised into an interim report, to be produced during the second half of October. This in turn will form the basis of a discussion at an expert workshop, which will flesh out and refine salient points that will have emerged. The event is expected to take place at a date to be confirmed in late October or during the first half of November. Several of the interviewees have already agreed in principle to take part. More about this in a later post, once matters have progressed in the initial phases of the consultation.

Stéphane Goldstein

Invitation to a Focus Group – Monday 8th October (evening) – Nottingham

Do you live near Nottingham? Would you like to take part in the JoRD project and have your say?

Research Data, Data Sharing, and the Policies of Journals

The project is arranging a focus group evening in central Nottingham on the evening of Monday 8th October.

Do you generate research data? Do you share your data? How do you share it? How do you feel about this? Do you re-analyse the research data of other people? Would you welcome a policy bank of journal research data sharing policies? What are the issues involved in this area?

If you live in the East Midlands and want to come along to discuss the topic we would be pleased to hear your views.

This event has been scheduled as part of the programme of the Nottingham Cafe Scientifique et Culturel which operates from a Meetup site.

Details of the event can be found at the following link on the Meetup Group site:

http://www.meetup.com/nottingham-culture-cafe-sci/events/82740942/

In brief, the details are:

Date: Monday 8th October (evening)

Time: 8.30 for 8.45 p.m.

Venue: Lord Roberts Public House (Basement), 24 Broad Street, Nottingham (same street as the Broadway Cinema in the Hockley/Lace Market area).

RSVP: by email to Melanie Heeley to register your interest (melanie.heeley@nottingham.ac.uk)

Themes from the literature – obligations in the Life Sciences

 

LITERATURE REVIEW – USEFUL NOTES

The notes which are given below refer to the following publication:

National Academy of Sciences (2003). Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences.

Obtained online at http://www.nap.edu/catalog/10613.html

Some of the pertinent findings and discussion materials of the Committee on Responsibilities of Authorship in the Biological Sciences are given below.

 **************************

1. Community Standards and the Sharing of Materials and Data

The Uniform Principle for Sharing Integral Data and Materials Expeditiously (UPSIDE) is given as follows (p.4):

Community standards for sharing publication-related data and materials should flow from the general principle that the publication of scientific information is intended to move science forward. More specifically, the act of publishing is a quid pro quo in which authors receive credit and acknowledgment in exchange for disclosure of their scientific findings. An author’s obligation is not only to release data and materials to enable others to verify or replicate published findings (as journals already implicitly or explicitly require) but also to provide them in a form on which other scientists can build with further research. All members of the scientific community – whether working in academia, government, or a commercial enterprise – have equal responsibility for upholding community standards as participants in the publication system, and all should be equally able to derive benefits from it.

 **************************

2. Community Principles – Expectations and Authors

In addition to the UPSIDE statement, the committee identified five corollary principles associated with sharing publication-related data. These are given as follows (pp.5-7):

DATA AND SOFTWARE

Principle 1. Authors should include in their publications the data, algorithms, or other information that is central or integral to the publication – that is, whatever is necessary to support the major claims of the paper and would enable one skilled in the art to verify or replicate the claims.

Principle 2. If central or integral information cannot be included in the publication for practical reasons (for example, because a dataset is too large), it should be made freely (without restriction on its use for research purposes and at no cost) and readily accessible through other means (for example, on-line). Moreover, when necessary to enable further research, integral information should be made available in a form that enables it to be manipulated, analyzed, and combined with other scientific data.

Principle 3. If publicly accessible repositories for data have been agreed on by a community of researchers and are in general use, the relevant data should be deposited in one of these repositories by the time of publication.

MATERIALS

Principle 4. Authors of scientific publications should anticipate which materials integral to their publication are likely to be requested and should state in the “Materials and Methods” section or elsewhere how to obtain them.

Principle 5. If a material integral to a publication is patented, the provider of the material should make the material available under a license for research use.

 **************************

3. Community recommendations – for all those who participate in the publication process

This includes:

  • Academic, government and industrial scientists
  • Scientific societies, publishers and editors of scientific journals
  • Institutions and organisations that conduct and fund scientific research

The committee made the following recommendations for future discussion in the workshop (pp.5-12):

Recommendation 1. The scientific community should continue to be involved in crafting appropriate terms of any legislation that provides additional database protection.

Recommendation 2. It is appropriate for scientific reviewers of a paper submitted for publication to help identify materials that are integral to the publication and likely to be requested by others.

Recommendation 3. It is not acceptable for the provider of publication-related material to demand an exclusive license to commercialise a new substance that a recipient makes with the provider’s material or to require collaboration or coauthorship of future publications.

Recommendation 4. The merits of adopting a standard MTA should be examined closely by all institutions engaged in technology transfer, and efforts to streamline the process should be championed.

Recommendation 5. As a best practice, participants in the publication process should commit to a limit of 60 days to complete the negotiation of publication-related MTAs and transmit the requested materials and data.

Recommendation 6. Scientific journals should clearly and prominently state (in the instructions for authors and on their Websites) their policies for distribution of publication-related materials, data, and other information. Policies for sharing materials should include requirements for depositing materials in an appropriate repository. Policies for data sharing should include requirements for deposition of complex datasets in appropriate databases and for the sharing of software and algorithms integral to the findings being reported. The policies should also clearly state the consequences for authors who do not adhere to the policies and the procedure for registering complaints about noncompliance.

Recommendation 7. Sponsors of research and research institutions should clearly and prominently state their policies for distribution of publication-related materials and data by their grant or contract recipients or employees.

Recommendation 8. If an author does not comply with a request for data or materials in a reasonable time period and the requestor has contacted the author to determine if extenuating circumstances may have caused the delay, it is acceptable for the requestor to contact the journal in which the paper was published. If that course of action is unsuccessful in due course, the requestor may reasonably contact the author’s university or other institution or the funder of the research in question for assistance.

Recommendation 9. Funding organisations should provide the recipients of research grants and contracts with the financial resources needed to support dissemination of publication-related data and materials.

Recommendation 10. Authors who have received data or materials from other investigators should acknowledge such contributions appropriately.

 **************************

4. Identified problem areas with data sharing due to rapid changes in the life-sciences discipline

The following problems were highlighted at the time of writing in 2003 (p.18):

  • Disagreement and uncertainty about the responsibilities of authors to share data and materials
  • A sense that, in practice, publication-related materials and data are not always readily available to researchers who desire access to them
  • Suggestions that standards for sharing are not being enforced
  • Controversy over seemingly different application of journal policies to different authors
  • Questions about how standards and policies apply to various types of data and materials, such as large databases and software
  • Suggestions that standards for sharing may be in conflict with federal legislation that encourages commercialisation of the results of federally funded research.
  • The prospect that new legal protections for databases, particularly in Europe, will complicate the development of comprehensive and consistent standards.
  • Uncertainty as to whether academic investigators should be treated differently from industry investigators with regard to the provision of access to their publication-related data or materials

 

Initial thoughts on what a model data sharing policy might contain….

A POSSIBLE START FOR A ‘MODEL DATA SHARING POLICY’

Based on the work done on the initial methodology for the survey, the information gathered during the survey, and ideas from reading the following publication:

National Academy of Sciences (2003). Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences.

Obtained online at http://www.nap.edu/catalog/10613.html

an initial ‘Model Data Sharing Policy’ for journals has been drafted as follows:

MODEL POLICY

  • GENERAL POLICY STATEMENT OR PREMISE – (e.g. Nature Publishing Group = “An inherent principle of publication is that others should be able to replicate and build upon the author’s published claims”. Chemical Society Reviews = “The RSC’s Electronic Supplementary Information (ESI) service is a free facility that enables authors to enhance and increase the impact of their articles”.)
  • WHOSE DATA SHARING POLICY IS IT? – (e.g. journal’s own, publisher’s own, society/association’s own, or refer to the ethics of the discipline with a link to an external policy such as the American Psychological Association – Ethical Guidelines for Authorship)
  • WHAT IS TO BE MADE AVAILABLE – (e.g. PRIMARY MATERIALS [usually integral to the article] = data, materials, software, other, and SUPPLEMENTARY MATERIALS [usually to enhance an article] = multimedia, spreadsheets etc)
  • GUIDELINES FOR DATA FORMATS FOR EACH TYPE OF DATA – (discipline/external guidelines such as MIAME, internal journal guidelines such as multimedia file sizes and formats)
  • OTHER INSTRUCTIONS RELATED TO THE DATA – (e.g. how references to a crystal structure (the data) should appear in the actual article, whether multiple datasets should be combined, provide clear file names for supplementary material – metadata/DOIs)
  • REQUIRED OR REQUESTED OR OTHER – (for each type of data mentioned state whether it is a requirement of publication or only a suggestion, or even whether the journal prefers to limit the data as in the cases of some supplementary material type policies)
  • WHERE DIFFERENT TYPES OF DATA ARE TO BE HELD – (EXTERNAL= e.g. repository, database, website, and provide the web links to information about the online databases, INTERNAL= e.g. with online journal)
  • WHERE TO STATE WHAT DATA IS AVAILABLE AND HOW TO ACCESS IT – (e.g. state it in the Methods section, provide link to it from the online article)
  • WHEN IT IS TO BE MADE AVAILABLE – (e.g. pre-publication, to reviewers etc)
  • EMBARGO PERIODS – (are these allowed, why, how long for?)
  • ACCESSIBILITY OF DATA  – (open access, free, low cost, or other levels of restrictions)
  • WHAT OTHER TERMS AND CONDITIONS OF ACCESS TO THE DATA COULD OPERATE? – (e.g. related to the rights of the recipient to use the material, Material Transfer Agreements?)
  • ARE ANY EXCEPTIONS TO THE DATA POLICY ALLOWABLE? – (what would these be, and to whom should they be referred for vetting? E.g. Journal of the National Cancer Institute – “authors are not expected to share materials that are difficult to obtain and cannot be propagated, nor are they expected to provide materials for commercial use”)
  • HOW MONITORING OF COMPLIANCE WILL OCCUR – (e.g. using accession numbers, other identifiers given by public databases, etc)
  • CONSEQUENCES OF NON COMPLIANCE WITH POLICY – (e.g. article not published, later retraction of article, refusal to publish future articles by that author)
  • HOW COMPLAINTS FROM OTHER RESEARCHERS ARE HANDLED IF THEIR REQUESTS ARE NOT MET – (how the journal will handle this)
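To make the checklist easier to apply when recording journal policies, its headings could be captured as one record per policy. The sketch below is a minimal illustration only, assuming Python dataclasses; every class name, field name and enum value is hypothetical and is not a JoRD schema. The `missing_fields` helper lists checklist items left empty, which could flag gaps in a journal's stated policy.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List, Optional


class Requirement(Enum):
    """Hypothetical: whether a journal requires, requests or limits a data type."""
    REQUIRED = "required"
    REQUESTED = "requested"
    LIMITED = "limited"


class Location(Enum):
    """Hypothetical: where a given type of data is to be held."""
    EXTERNAL = "external"  # e.g. repository, database, website
    INTERNAL = "internal"  # e.g. with the online journal


@dataclass
class DataItem:
    """One type of material covered by a policy (hypothetical structure)."""
    kind: str                                 # e.g. "data", "software", "multimedia"
    primary: bool                             # True = integral, False = supplementary
    requirement: Requirement
    location: Location
    format_guidelines: Optional[str] = None   # e.g. "MIAME"


@dataclass
class ModelDataPolicy:
    """The draft model-policy checklist as a single record (hypothetical)."""
    premise: str
    owner: str                                # journal, publisher, society or external
    items: List[DataItem] = field(default_factory=list)
    other_instructions: Optional[str] = None
    where_stated: Optional[str] = None        # e.g. "Methods section"
    when_available: Optional[str] = None      # e.g. "at submission, to reviewers"
    embargo_days: Optional[int] = None        # None = embargo not addressed
    access_level: Optional[str] = None        # e.g. "open access", "low cost"
    access_terms: Optional[str] = None        # e.g. "Material Transfer Agreement"
    exceptions: List[str] = field(default_factory=list)
    compliance_monitoring: Optional[str] = None  # e.g. "accession numbers"
    non_compliance_consequences: List[str] = field(default_factory=list)
    complaints_procedure: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of checklist fields that were left empty (None, "" or [])."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "", [])]


# Example record with most checklist items deliberately left blank.
policy = ModelDataPolicy(
    premise=("An inherent principle of publication is that others should be able "
             "to replicate and build upon the author's published claims"),
    owner="publisher",
    items=[DataItem("data", True, Requirement.REQUIRED, Location.EXTERNAL, "MIAME")],
    compliance_monitoring="accession numbers",
)
print(policy.missing_fields())
```

Keeping an unset embargo as `None` rather than `0` in this sketch keeps "no embargo permitted" distinct from "policy silent on embargoes".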

Journal Research Data Policies – Survey

Carrying out the Survey

There are four main working spreadsheets for the survey of Top/Bottom Science/Social Science journals, covering approximately 400 journals.

Each journal has now been reviewed and the results collated into the appropriate spreadsheet in accordance with the example given on the Project Data page of the JoRD Blog.

The survey of Journal Research Data Policies incorporates:

  • the top 100 and bottom 100 Science Journals
  • the top 100 and bottom 100 Social Science Journals

according to Thomson Reuters’ Journal Citation Reports.
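The sampling step can be sketched as follows. This is a minimal illustration that assumes each JCR subject edition is available as a list of (title, impact factor) pairs and that "top" and "bottom" mean highest and lowest impact factor; the function name and the placeholder journal data are hypothetical, not real JCR figures. Two editions with 100 top and 100 bottom titles each give four spreadsheets of 100 journals, 400 in total.

```python
from typing import Dict, List, Tuple


def top_and_bottom(ranked: List[Tuple[str, float]], n: int = 100) -> Tuple[List[str], List[str]]:
    """Return the n highest- and n lowest-ranked titles from (title, score) pairs."""
    if len(ranked) < 2 * n:
        raise ValueError("edition too small: top and bottom samples would overlap")
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)  # highest first
    top = [title for title, _ in ordered[:n]]
    bottom = [title for title, _ in ordered[-n:]]
    return top, bottom


# Placeholder editions: 500 made-up titles each, scored 1.0 to 500.0.
placeholder_editions: Dict[str, List[Tuple[str, float]]] = {
    edition: [(f"{edition} journal {i}", float(i)) for i in range(1, 501)]
    for edition in ("Science", "Social Science")
}

spreadsheets: Dict[str, List[str]] = {}
for edition, ranked in placeholder_editions.items():
    top, bottom = top_and_bottom(ranked, n=100)
    spreadsheets[f"Top {edition}"] = top
    spreadsheets[f"Bottom {edition}"] = bottom

total = sum(len(titles) for titles in spreadsheets.values())
print(len(spreadsheets), "spreadsheets,", total, "journals")
```

Ties at a cut-off are broken here by the original list order (Python's sort is stable, including with `reverse=True`); a real survey would need to decide that rule explicitly.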

This will help to shed light on the current state of data sharing policies within various journals.

Finding a Data Sharing Policy

Data sharing policies are likely to be found in:

  • Instructions for Authors
  • Author Guidelines
  • Submission Information
  • Publishing Policy
  • Open Access Policy
  • Data Sharing Policy
  • Data Accessibility Policy
  • Supplementary publication procedure
  • Supplementary online material
  • Ethical Guidelines

Some Preliminary Notes on the Data Collected From the Survey

  • Science journals – many top-rated journals have STRONG POLICIES relating to known/named data repositories which issue accession numbers (or similar) for the datasets deposited, e.g. GenBank.
  • Science/Social Science journals – with STRONG POLICIES are also likely to have one or more ‘supplementary data’ type policies (WEAK POLICIES).
  • STRONG POLICIES – usually monitored by Accession Number (or something related to the external storage of the data), which needs to be in the manuscript on submission – pre-monitoring rather than post-publication monitoring.
  • WEAK POLICIES – are mainly Supplementary Data type – and the data is mainly stored with the journal itself (although occasionally a link to an existing repository can be given).
  • WEAK POLICIES – Multimedia figures heavily in WEAK Supplementary Data type policies.
  • Some policies operate at publisher level – so there is a generic policy for many of the titles (e.g. Annual Reviews). Occasionally though there are policies specific to the individual journal for a given publisher (depending on the nature of the data/journal/editorial board).
  • Some journals did not provide EXPLICIT Data Sharing instructions in their Author Guidelines. This may be because authors were instructed to follow guidelines on other sites or links that mention data sharing (e.g. ethical policies, or the guidelines of the American Psychological Association); such guidance is part of the ethical landscape of the discipline rather than of the individual journal.
  • Bottom-rated journals seemed less likely to have data sharing policies.
  • Some journals did not seem to have any obvious Author Guidelines at all (let alone Data Guidelines).
  • Some journals had broken links, so policies were unavailable on the day of review.
  • Does the nature of the data have an impact? Falsifiable/experimental data, with named repositories and clear data formats, may lead to more policies.
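The STRONG/WEAK distinction in these notes could be applied consistently across the spreadsheets with a simple rule. The following is a minimal sketch under stated assumptions: the three yes/no fields, the rule that only a named external repository plus an accession number required at submission counts as STRONG, and the function names are all hypothetical choices for illustration, not the survey's actual coding scheme. Because a journal with a STRONG policy often also has a supplementary-data policy, the function reports the strongest label that applies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class PolicyObservation:
    """Hypothetical fields a reviewer might record for one journal."""
    names_external_repository: bool   # e.g. GenBank named as the place to deposit
    requires_accession_number: bool   # identifier must be in the manuscript on submission
    offers_supplementary_data: bool   # data or multimedia stored with the journal itself


def classify(obs: PolicyObservation) -> str:
    """Return the strongest label that applies: STRONG, WEAK or NONE."""
    if obs.names_external_repository and obs.requires_accession_number:
        return "STRONG"   # pre-publication monitoring via accession number
    if obs.offers_supplementary_data or obs.names_external_repository:
        return "WEAK"     # supplementary-data style, or repository named but not monitored
    return "NONE"


def tally(observations: List[PolicyObservation]) -> Counter:
    """Count labels across one spreadsheet of journals."""
    return Counter(classify(obs) for obs in observations)


# Placeholder observations for three made-up journals.
sample = [
    PolicyObservation(True, True, True),
    PolicyObservation(False, False, True),
    PolicyObservation(False, False, False),
]
print(tally(sample))
```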

JoRD

Welcome to the JoRD (Journal Research Data Policy Bank) project blog. JoRD is a JISC-funded initiative conducting a feasibility study into the scope and shape of a sustainable service that will collate and summarise journal policies on research data, providing researchers, managers of research data and other stakeholders with an easy source of reference to help them understand and comply with these policies.

This initiative is funded as part of the JISC Digital Infrastructure Programme.