Data comes in all sorts of shapes and sizes

The JoRD project did not set out to define the term “data” (or its singular form, “datum”). This was a fortunate choice, because one message that came across clearly from all the participants in our study is that data can take many forms. The recent Royal Society report, “Science as an Open Enterprise” (http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE.pdf), includes a glossary of data terms that illustrates the ways in which the term “data” can be used. For example:

  • big data – data that requires massive computing power to process
  • broad data – structured big data
  • data set – a collection of information held in electronic form
  • linked data – data that has been allocated a unique identifying number to be able to access it from an electronic storage facility

… and those are just a few of the terms that it explains. The word “data” is defined as “Qualitative or quantitative statement or numbers that are (or assumed to be) factual”. The researchers who took part in this study considered that their data took more forms than just statements or numbers.

Researchers described the data that their research generated as software, video footage, geodata, geological maps, ontologies, web services and data models, as can be seen in the table below. This multitude of forms makes it difficult for publishers to include data in their online published articles. The publishers said that linked data in a journal article should be “fit for use” and “replicable”, and they consider that data in many different formats is “messy” and is currently not supplied with sufficient metadata. Another consideration is the resulting file size of an article if the publisher stores the embedded data on its own servers. Data repositories and data centres are the more practical method of data storage, with published articles incorporating links to that data.

This is one reason for journals to have a data policy, and a good argument for collecting those policies and making them accessible in a centralised resource: a JoRD Policy Bank Service.

Table: researchers’ descriptions of their data, covering qualitative (documents and text), quantitative (figures), visual (images) and virtual (software or protocols) data

  • Collection of examiner reports and questions, supervisory reports, letters and other documentary evidence
  • Dataset of measurements and statistical analyses
  • Digitised textual sources
  • Excavation, field observation, environmental monitoring, software to collate, mine and analyse
  • Excel sheets
  • Focus group, interview transcripts, some footage of people using computers, digital photographs
  • Geodata
  • Geologic maps, chemical and isotopic analyses of Earth materials, GIS datasets
  • Interview transcripts
  • Ontologies
  • Reports
  • Visualization
  • Web services, data models and specifications

News from America, “U-M, Sloan Foundation to enhance open access to research data”

“Professional associations, journals, data repositories and funding agencies must work together to make the entire scientific venture more transparent and to encourage broader access to research data,” said ICPSR Director George Alter. “The first step is to give scientists who produce important research data the recognition they deserve.”

U-M, Sloan Foundation to enhance open access to research data (http://www.ur.umich.edu/update/archives/121002/sloan)

The University of Michigan’s Inter-university Consortium for Political and Social Research (ICPSR) and the Alfred P. Sloan Foundation are working together to promote open access to research data and to improve the link between published works and the underlying data.

In particular, the ICPSR will be working with stakeholders within the social sciences to improve:

  • Data citation
  • Transparency of research
  • Collaboration across scientific fields to study sustainable funding models for data repositories

Our survey work with JoRD has indicated that Social Sciences journals are behind Science journals in having policies on data sharing and archiving. This project has the potential to address this imbalance.

Inter-university Consortium for Political and Social Research

http://www.icpsr.umich.edu/icpsrweb/landing.jsp

Alfred P. Sloan Foundation

http://www.sloan.org/

Initial thoughts on what a model data sharing policy might contain….

A POSSIBLE START FOR A ‘MODEL DATA SHARING POLICY’

Drawing on the work done on the initial survey methodology, the information found during the survey, and ideas taken from the following publication:

National Academy of Sciences (2003). Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences.

Available online at http://www.nap.edu/catalog/10613.html

an initial ‘Model Data Sharing Policy’ for journals has been attempted as follows:

MODEL POLICY

  • GENERAL POLICY STATEMENT OR PREMISE – (e.g. Nature Publishing Group = “An inherent principle of publication is that others should be able to replicate and build upon the author’s published claims”. Chemical Society Reviews = “The RSC’s Electronic Supplementary Information (ESI) service is a free facility that enables authors to enhance and increase the impact of their articles”.)
  • WHOSE DATA SHARING POLICY IS IT? – (e.g. journal’s own, publisher’s own, society/association’s own, or refer to the ethics of the discipline with a link to an external policy such as the American Psychological Association – Ethical Guidelines for Authorship)
  • WHAT IS TO BE MADE AVAILABLE – (e.g. PRIMARY MATERIALS [usually integral to the article] = data, materials, software, other, and SUPPLEMENTARY MATERIALS [usually to enhance an article] = multimedia, spreadsheets etc)
  • GUIDELINES FOR DATA FORMATS FOR EACH TYPE OF DATA – (discipline/external guidelines such as MIAME, internal journal guidelines such as multimedia file sizes and formats)
  • OTHER INSTRUCTIONS RELATED TO THE DATA – (e.g. how references to a crystal structure (the data) should appear in the actual article, whether multiple datasets should be combined, provide clear file names for supplementary material – metadata/DOIs)
  • REQUIRED OR REQUESTED OR OTHER – (for each type of data mentioned state whether it is a requirement of publication or only a suggestion, or even whether the journal prefers to limit the data as in the cases of some supplementary material type policies)
  • WHERE DIFFERENT TYPES OF DATA ARE TO BE HELD – (EXTERNAL= e.g. repository, database, website, and provide the web links to information about the online databases, INTERNAL= e.g. with online journal)
  • WHERE TO STATE WHAT DATA IS AVAILABLE AND HOW TO ACCESS IT – (e.g. state it in the Methods section, provide link to it from the online article)
  • WHEN IT IS TO BE MADE AVAILABLE – (e.g. pre-publication, to reviewers etc)
  • EMBARGO PERIODS – (are these allowed, why, how long for?)
  • ACCESSIBILITY OF DATA – (open access, free, low cost, or other levels of restrictions)
  • WHAT OTHER TERMS AND CONDITIONS OF ACCESS TO THE DATA COULD OPERATE? – (e.g. related to the rights of the recipient to use the material, Material Transfer Agreements?)
  • ARE ANY EXCEPTIONS TO THE DATA POLICY ALLOWABLE? – (what would these be, and to whom should they be referred for vetting? E.g. Journal of the National Cancer Institute – “authors are not expected to share materials that are difficult to obtain and cannot be propagated, nor are they expected to provide materials for commercial use”)
  • HOW MONITORING OF COMPLIANCE WILL OCCUR – (e.g. using accession numbers, other identifiers given by public databases, etc)
  • CONSEQUENCES OF NON-COMPLIANCE WITH POLICY – (e.g. article not published, later retraction of article, refusal to publish future articles by that author)
  • HOW COMPLAINTS FROM OTHER RESEARCHERS ARE HANDLED IF THEIR REQUESTS ARE NOT MET – (how the journal will handle this)
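
If the elements above were to be recorded in a centralised resource, one option would be to hold each journal’s policy as a structured record so that gaps are easy to spot. The Python sketch below is only an illustration of that idea. Every class, field and value name in it is hypothetical, chosen to mirror the headings of the model policy, and none of it is a format defined by JoRD.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List, Optional


class Requirement(Enum):
    """Whether depositing a type of data is required, requested or other."""
    REQUIRED = "required"
    REQUESTED = "requested"
    OTHER = "other"


@dataclass
class DataTypeRule:
    """Per-data-type elements of the model policy (hypothetical names)."""
    data_type: str                          # what is to be made available
    requirement: Requirement                # required, requested or other
    held: str                               # "external" or "internal"
    format_guideline: Optional[str] = None  # e.g. a discipline guideline


@dataclass
class JournalDataPolicy:
    """One journal's policy; None or an empty list means 'not stated'."""
    journal: str
    premise: Optional[str] = None
    policy_owner: Optional[str] = None
    rules: List[DataTypeRule] = field(default_factory=list)
    other_instructions: Optional[str] = None
    availability_statement: Optional[str] = None
    when_available: Optional[str] = None
    embargo: Optional[str] = None
    accessibility: Optional[str] = None
    access_terms: Optional[str] = None
    exceptions: Optional[str] = None
    compliance_monitoring: Optional[str] = None
    non_compliance_consequences: Optional[str] = None
    complaints_procedure: Optional[str] = None

    def missing_elements(self) -> List[str]:
        """Return the names of policy elements that have not been filled in."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == []:
                missing.append(f.name)
        return missing


# Illustrative record for a made-up journal (not real survey data).
policy = JournalDataPolicy(
    journal="Example Journal",
    premise="Others should be able to replicate and build upon published claims.",
    rules=[DataTypeRule("crystal structure", Requirement.REQUIRED, "external")],
)
print(policy.missing_elements())
```

Listing the unfilled elements in this way would let a policy bank report which parts of the model policy a given journal has not yet addressed.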