A rather long post, but quite a brief summary

Here is a summary of the project so far.

Sharing the data generated by research projects is increasingly recognised as an academic priority by funders, researchers and publishers. The policies on sharing set out by academic journals have been raised as an issue by scientific organisations such as the US National Academy of Sciences, which urges journals to make clear statements of their sharing policies. On the other hand, the publishing community expresses concerns over the intellectual property implications of archiving shared data, whilst broadly supporting the principle of open and accessible research data.

The JoRD Project was a feasibility study on the possible shape of a central service on journal research data policies, funded by the UK JISC under its Managing Research Data Programme. It was carried out by the Centre for Research Communications at Nottingham University (UK) with contributions from the Research Information Network and Mark Ware Consulting Ltd. The project used a mix of methods to examine the scope and form of a sustainable, international service that would collate and summarise journal policies on research data for the use of researchers, managers of research data and other stakeholders. The purpose of the service would be to provide a ready reference source of easily accessible, standardised, accurate and clear guidance and information on the journal policy landscape relating to research data. The specific objectives of the study were: to identify the current state of journal data sharing policies; to investigate the views and practices of stakeholders; to develop an overall view of stakeholder requirements and possible service specifications; to explore the market base for a JoRD Policy Bank Service; and to investigate and recommend sustainable business models for the development of a JoRD Policy Bank Service.

A review of relevant literature showed evidence that scientific institutions are attempting to draw attention to the importance of journal data policies, and a sense that the scientific community in general is in favour of the concept of data sharing. At the same time, more needs to be done to convince the publishing world of the need for greater consistency in data policy and author guidelines, particularly on vital questions such as when and where authors should deposit data for sharing.

Studies of existing journal policies found that a large percentage of journals do not have a policy on data sharing, and that there are great inconsistencies between the journal data sharing policies that do exist. Whilst some journals offer little guidance to authors, others stipulate specific compliance mechanisms. A valuable distinction is made in some policies between two categories of data: integral, which directly supports the arguments and conclusions of the article, and supplementary, which enhances the article but is not essential to its argument. What we considered to be the most significant study on journal policies (Piwowar & Chapman, 2008) defined journal data sharing policies as “strong”, “weak” or “non-existent”. A strong policy mandates the deposit of data as a condition of publication, whereas a weak policy merely requests the deposit of data. The indication from previous studies that researchers’ data sharing behaviour is similarly inconsistent was confirmed by our online survey. However, there is general assent to the data sharing concept, and many researchers would be prepared to submit data for sharing along with the articles they submit to journals.

We then investigated a substantial sample of journal policies to establish our own picture of the policy landscape. A selection of 400 international and national journals was purposefully chosen to represent the 200 most cited journals (high impact journals) and the 200 least cited journals (low impact journals), equally shared between Science and Social Science, based on the Thomson Reuters citation index. Each policy we identified relating to these journals was broken into different aspects, such as: what, when and where to deposit data; accessibility of data; types of data; monitoring of data compliance; and consequences of non-compliance. These were then systematically entered into a matrix for comparison. Where no policy was found, this was indicated on the matrix. Policies were categorised as being either “weak”, only requesting that data is shared, or “strong”, stipulating that data must be shared.
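As a rough illustration of how one row of such a comparison matrix could be recorded and categorised, here is a minimal sketch. It is not the tooling used in the project: the `PolicyRecord` class, its field names and the `classify` and `strength_shares` helpers are hypothetical, and the rule simply mirrors the weak/strong distinction described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PolicyRecord:
    """One hypothetical matrix row; all field names are assumptions."""
    journal: str
    has_policy: bool
    deposit_mandatory: bool = False  # True if the policy says data must be shared
    what: Optional[str] = None       # what data to deposit
    when: Optional[str] = None       # at which stage to deposit it
    where: Optional[str] = None      # where to deposit it


def classify(record: PolicyRecord) -> str:
    """Label a record as 'none', 'weak' (requests) or 'strong' (mandates)."""
    if not record.has_policy:
        return "none"
    return "strong" if record.deposit_mandatory else "weak"


def strength_shares(records: List[PolicyRecord]) -> Dict[str, float]:
    """Percentage of weak and strong policies among journals that have a policy."""
    labels = [classify(r) for r in records if r.has_policy]
    if not labels:
        return {}
    return {
        label: 100 * labels.count(label) / len(labels)
        for label in ("weak", "strong")
    }
```

Journals without a policy are kept in the matrix but left out of the weak/strong percentages, which matches the way the proportions are reported against the policies that were actually found.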

Approximately half the journals examined had no data sharing policy. Of the policies we found, just over three quarters were assessed as weak and just under one quarter as strong (76% : 24%). The high impact journals were found to have the strongest policies, whereas fewer low impact journals included a data sharing policy, and those policies were less likely to stipulate data sharing, merely suggesting that it may be done. The policies generally give little guidance on the stage of the publishing process at which data is expected to be shared.

Throughout the project, representatives from publishing and other stakeholders were consulted in different ways. Representatives of publishing were selected from a cross section of different types of publishing house; the researchers we consulted were self-selected through open invitations by way of the JoRD Blog. Nine of them attended a focus group and 70 answered an online survey. They were drawn from every academic discipline and ranged over a total of 36 different subject areas. During the later phases of the study, a selection of representatives of stakeholder organisations was asked to explore the potential of the proposed JoRD service and to comment on possible business models. These included publishers, librarians, representatives of data centres or repositories, and other interested individuals. This aspect of the investigation included a workshop session with representatives of leading journal publishers in order to assess the potential for funding a JoRD Policy Bank service. Subsequently an analysis of comparator services and organisations was performed, using interviews and desk research.

Our conclusion from the various aspects of the investigation was that although the idea of making scientific data openly accessible for sharing is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these obstacles is the lack of a consolidated infrastructure for the easy sharing of data. In consequence, researchers quite simply do not know how to share their data. At present, when policies are either not available or provide inadequate guidance, researchers acknowledge a need for the kind of information that a policy bank would supply. The market base for a JoRD policy bank service would be the research community, and researchers did indicate they believed such a service would be used.

Four levels of possible business models for a JoRD service were identified and put to a range of stakeholders. These stakeholders found it hard to identify a clear-cut option of service level that would be self-sustaining. The funding models of similar services and organisations were also investigated. In consequence, an exploratory two-phase implementation of a service is suggested. The first phase would be the development of a database of data sharing policies, engagement with stakeholders and third-party API development, with the intention of building use to the level at which a second phase, a self-sustaining model, would be possible.

Barriers to sharing data

There is a stereotypical image of a covetous academic who is dedicated to their work and hoards the data for their research, so that no-one else will achieve the acclaim for their life’s work. Presumably this stereotype arose from such stories as Isaac Newton and Gottfried Leibniz having a major dispute over which of them first discovered calculus. In hindsight, both of them discovered it independently and both deserved acclaim. Charles Darwin kept the data behind “On the Origin of Species” to himself for very many years, before being persuaded to publish what turned out to be a popular science book of its day.

But we are not in the 17th or 19th centuries; we are in the age of information, the Internet and global networks, where collaboration has become respected. Teams of scientists are now rewarded, for example the Manchester University physicists Andre Geim and Kostya Novoselov, who won the Nobel Prize in Physics for their work on graphene. The Royal Society report “Science as an Open Enterprise” (http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE.pdf) describes how an outbreak of E. coli which originated in Hamburg was contained by the work of scientists in four continents who posted their analysis of the bacterium onto open source sites. The genetic sequencing of the bacterium was completed by scientists in Hamburg and China, and was then posted onto an open source site with an open data licence. In July of last year the European Commission published a press release outlining the measures that it will take to improve open access to scientific information produced in Europe, because the Commission feels that open access to data will improve research and development, and increase knowledge and competitiveness in Europe (“Scientific data: open access to research results will boost Europe’s innovation capacity” http://europa.eu/rapid/press-release_IP-12-790_en.htm).

Such openness and swift communication are expected by today’s researchers. However, an EU study found that only 25% of researchers openly share their data. The researchers who participated in our study expressed the desire to share their data. Some were already sharing, but others found that although they wanted to share, it was not easy to achieve. Many felt that there were barriers put in their way, one of which involved the old stereotype: they were not expected to share. For example, funding bodies may well be encouraging researchers to give open access to data that was paid for from public funds, but researchers believe that they will not get funding for using data that someone else has collected, although it would be an efficient and economical way of carrying out research. Researchers also reported that universities attract funding for new projects, not for re-use of data, and that there is more interest in publishing new research than replication studies.

Practical reasons were also mentioned. Personal barriers to sharing data were listed as:

  • Not knowing where to deposit data
  • Lack of time and resources to undertake the deposit of data
  • Confidentiality and sensitivity of data, restrictions from funding body or breaking trust with research participants

Barriers in the wider scientific environment were reported as the difficulty in accessing data repositories because of lack of standardisation, and a poorly supported data sharing environment. It would seem that there are two main barriers to be crossed before the open sharing of data is completely commonplace. First, the stereotype of the data-hugging scientist must disappear from the minds of researchers, funders, Higher Education Institutions and publishing houses. Secondly, the infrastructure of data deposit sites, including how, when and where to deposit data, has to be fully resolved, publicised and implemented. Once again, it would appear that a JoRD Policy Bank Service would be of great value to researchers, because it would supply a central resource on how, when and where to share data, contribute to improving the data-depositing infrastructure and remove one barrier to the open access of data.

Summary of workshop, discussion about the nature of JoRD

Here is another summary of the concluding discussion that took place at the workshop on 13th November. This is about the expectations and perceptions of publishers concerning the nature of the JoRD Policy Bank service.

A prominent consideration of the publishers was that JoRD should be an authoritative resource, such that a JoRD compliance stamp, or quality mark, could be displayed on journals’ websites. There was discussion that for JoRD to be authoritative, the content of the database should be added, updated and maintained by the JoRD team. It was mentioned that publishers might initially populate the database, but ongoing maintenance would be the responsibility of JoRD. However, there should be a guarantee that the content is accurate, and publishers would need to commit to providing policies that are machine readable so that they can be automatically harvested.

It was suggested that the operational database should not be merely a static catalogue or encyclopaedia. It was requested that the non-compliance of a journal with a data sharing policy, or with a funder’s policy, could be flagged and reported to the publisher, although it was queried whether that was the remit of the service or of the publishers themselves. Similarly, it was questioned whether the service would mediate user complaints, and proposed that it would engage with complaints concerning policies only. To maintain functionality, it was asked whether there could be automatic URL checking which would send an alert to the publisher if links were broken. Notifications of policy changes would also be a useful function.

The service website should include a model data policy framework or an example of a standard data policy, and offer guidance and advice to journals and funders about policy development. However, the processing and ratification of a model policy could be a time-consuming process for some publishers. It was asked whether repository policies would also be included, and there was mention of compliance with the OpenAIRE European repository network. The website should also contain:

  • Links to the publishers’ web pages
  • Dates of the records
  • Lists of links to repositories
  • A set of criteria for data-hosting repositories

It should look inviting but businesslike, and be simple and clear yet sufficiently detailed.

Methods of funding the service and the benefits of membership were considered. For example, would only the policies of members of the service be entered into the database? Would there be different levels of membership or different service options that publishers could choose? And would there be extra costs for extra services? One such service could be to hold historical records and persistent records of former policies. The publishers said that they would be prepared to pay for a service that is transparent and would save them time.

Other comments included:

  • Would the service be a member of the World Data System?
  • Could it be released in Beta?
  • There are around 400 to 600 titles to enter initially
  • When set up the service could be studied to discover its effectiveness and impact
  • Further consultation may be needed

Preliminary Results of Online Questionnaire

The online questionnaire closed on Monday 5th November and had been answered by 70 researchers. The survey comprised 20 questions asking for information about the researcher, their data sharing habits, their opinions of the possibility of openly sharing their data and the utility of a policy bank service. The first ten questions were as follows:

  • What is your academic discipline?
  • What is your subject?
  • How long have you been a researcher?
  • In which part of the world is your research institution based?
  • Do you generate research data/materials/programs etc?
  • What kind of data/materials/programs do you generate?
  • Where do you currently store your digital data?
  • Where do you currently store your non-digital data?
  • How accessible are your data/materials/programs to other researchers?
  • Are your data/materials/programs etc sharing habits going to change in the future?

Most of the respondents worked in the disciplines of Science or Social Science. However, there were representatives from a substantial range of fields, which means that the self-selecting sample was drawn from a cross-section of research disciplines. The most frequently listed subject was some variety of Information Studies. Around 33% of respondents were actively working on a PhD or MPhil, and roughly 30% had been post-qualification researchers for between 5 and 14 years. The respondents were overwhelmingly based in Europe and nearly all of them considered that they generated some sort of data, which was mainly qualitative, although there was an equal balance between textual and numerical data. Most people stored digital data on their own computer and on a work server. The favoured form of other digital storage was Dropbox. However, when it came to non-digital data, many more people stored that at their workplace. Surprisingly, around 56% of respondents already share their data, albeit with their colleagues. Slightly more researchers thought that they were unlikely to change their sharing habits (approx. 37%) than thought that they would change them (36%).

The fewest respondents were from the field of Economics, one respondent was studying for an MSc, and fewer respondents had been working as researchers for over 15 years. Geographically, a very small number of respondents were based in South America and Africa, and very few people answered that they did not generate any data. Visual data was the least commonly generated form. Few respondents stored digital data in a disciplinary digital repository or archive, or non-digital data at an external repository. One respondent appeared to destroy all raw data after research publication. None of the respondents answered that they shared data with no-one, although certain researchers shared only with their research partner. A few considered that they would share less of their data in future, while a small number of researchers were not able to share because of the sensitive nature of the data.

Questions 11 – 20 will be analysed and reported next week.