Another week, another presentation

Early this morning, well before normal working hours, the dedicated Centre for Research Communications employees, Marianne and Jane, entered the special media communication room that houses the video conferencing equipment so that they could jointly present “Publisher Interest towards a role for Journals in Data Sharing: The Findings of the JoRD Project”. In the true spirit of global access and the digital world, they presented in Nottingham, UK, and the presentation was seen at the ELPUB conference in Karlskrona, Sweden. We are pleased to report that the Nottingham technology worked really well, but a fellow presenter, also speaking through Adobe Connect, had difficulties with her connection and transmitted the sound of a large aircraft passing over the room where she was speaking. Jane and Marianne had chosen the high-tech route because a tram line and bridge are currently being noisily constructed outside their office window; had they decided to present from their own computers, there would have been the sound of heavy machinery moving, beeps and rumbles, drilling and clangs.

Here is the link for the PowerPoint slides:

JoRDELPUB

 

 

 

 


Prezis from the presentations

Last week was busy for the JoRD team. Jane did the presentation for ANDS, and Marianne appeared twice at Oxford: once to present a brief summary of the JoRD project to the Jisc-organised “Now and Future of Data Publishing” event and, later in the week, to give a selection of the project findings to the Dryad Members meeting. The links to both Oxford presentations follow, each with a text summary.

The JoRD Project and its implications for repositories

http://prezi.com/ytir00evayoj/the-jord-project-and-implications-for-repositories/?kw=view-ytir00evayoj&rc=ref-5597897

JoRD and the implications for data sharing and repositories

1. The project was Jisc funded to explore the possibility of setting up a self-sustaining database and service to collate and summarise academic journal policies on the deposition of data associated with published articles
2. Current belief that openly accessible research data is a good thing because it drives science forward
3. Aims: a Jisc-funded project to look at the possibility of setting up a central resource of journal instructions to authors about sharing the data on which articles are based
4. Objectives
• Investigate current state of Journal data policies
• Investigate current data sharing views and habits
5. Landscape of data sharing: there has always been data published in printed journals in the form of charts and tables
6. But digital data becomes a problem, where should it be stored? In a repository? On a website? Embedded into articles?
7. This is a journal data policy, it is an instruction to authors of where to share or deposit research data that is relevant to a published article
8. We initially analysed 230 research data policies and found many inconsistencies and a lack of standardisation
9. Some journals were vague about the form of data to be deposited, others were more precise
10. Some journals were specific about where the data should be deposited, most were less so.
11. Go back to the policy and explain
12. We spoke to these stakeholder groups and we found a number of dichotomies
13. Taking researchers first, they said that they would be happy to share their data (with certain caveats, which I will not go into here). These were the reasons they gave for sharing data
14. However, when we asked how much they shared and where, most of them only shared with colleagues. Only a small number mentioned that they put their data into repositories
15. We asked them why that was, and their replies ranged over: no time, don’t know where, difficulty of accessing institutional repositories. And that current research models do not value and encourage data sharing (a PhD researcher stated that he felt that if he shared his data during the course of the research, he might be “gazumped”, meaning that should someone publish research on his chosen topic, the thesis would no longer be unique and therefore would no longer be credited)
16. The publishers also showed a dichotomy: although they appreciated the benefits of sharing data, they felt that their servers would have difficulty holding the quantity of data included in each article and that repositories were the right place. However, there was some discussion about the long-term availability of repositories. They have not yet been proven, but the publishing houses have been around for a long time
17. Worries about links, etc
18. Academic librarians and Repository managers, no conflicting concerns, practicality
19. Data sharing landscape is a mess
20. How could a JoRD service improve the infrastructure?
• Develop a model data policy framework, which takes into account the concerns of all the stakeholders
21. Improved policies save the time of publishers and authors, more consistent
22. Address the fears of IP, data citation etc, eliminating dichotomies, improving the infrastructures, creates order
• Implications for repositories, authors know where data can be deposited to be shared and re-used, more will do so.

The JoRD Project: Now and Future

http://prezi.com/ork2eo_6lb7x/the-jord-project-now-and-the-future/?kw=view-ork2eo_6lb7x&rc=ref-5597897

The JoRD Project: now and future

1. JISC-funded feasibility study of a central resource of research journal data policies
2. Looked at what the service should include and whether it could pay for itself
3 and 4. Tried to answer two questions
• Can Journal data policies encourage deposition of data?
• Will a JoRD service help publicly funded data to be shared and re-used?
5. Why bother? When an author publishes she is trading her intellectual property with a publisher, as part of a transaction and there are certain obligations on both sides, this can include data linked to the article. Author needs to know and understand what to do with it (reading the small print)
6. Needed to find out three things
• Understand current journal data policies
• Would anyone bother to use the service
• Could it generate sufficient income for development, building and maintenance?
7. We analysed some journal data policies in depth
8. Looked at 371 journals
9. What was in the policies? Main areas were data type, when to deposit, and where
10. Little requirement for open access or compliance or consequences for non compliance
11. That does not provide an argument that journal data policies will help open data sharing
12 to 18. But there are signs that the situation is changing
• More publishers are considering data policies
• Elsevier Journal of the future
• Rise of data journals
• Apparent upward trend of journals with data policies
19. If there were a JoRD Service, would anyone use it?
20. All the stakeholders said that they would
21. For a variety of reasons BUT
22. They all wanted different things…
23. …apart from these, difficult to build one service
24. And will anyone pay for it?
25. Resounding no, except from publishers if the service was all singing and dancing
26. So, how does a JoRD service stand?
27. Now, with few policies stipulating deposit of data and stakeholders not financially contributing,
28. BUT… Let’s think of the future? The landscape is changing
29. Funders are asking for data plans to be included in funding bids
30. Universities are installing data management systems
31. Increase of data journals
32. And expectation that data should be included in articles
33. We have an opportunity to build a high quality database of existing journal data policies, which can be added to and maintained to a high level, with a simple user interface. Establish a user base and develop a sustainable business model which can be implemented at a later stage.
34. JoRD is the future, and we should build it now when the quantity of data is smaller and the cost will be lower
35. Before the data deluge comes

A rather long post, but quite a brief summary

Here is a summary of the project so far.

Sharing the data which is generated by research projects is increasingly being recognised as an academic priority by funders, researchers and publishers. The issue of the policies on sharing set out by academic journals has been raised by scientific organisations, such as the US National Academy of Sciences, which urges journals to make clear statements of their sharing policies. On the other hand, the publishing community expresses concerns over the intellectual property implications of archiving shared data, whilst broadly supporting the principle of open and accessible research data.

The JoRD Project was a feasibility study on the possible shape of a central service on journal research data policies, funded by the UK JISC under its Managing Research Data Programme. It was carried out by the Centre for Research Communications at Nottingham University (UK) with contributions from the Research Information Network and Mark Ware Consulting Ltd. The project used a mix of methods to examine the scope and form of a sustainable, international service that would collate and summarise journal policies on research data for the use of researchers, managers of research data and other stakeholders. The purpose of the service would be to provide a ready reference source of easily accessible, standardised, accurate and clear guidance and information on the journal policy landscape relating to research data. The specific objectives of the study were: to identify the current state of journal data sharing policies; to investigate the views and practices of stakeholders; to develop an overall view of stakeholder requirements and possible service specifications; to explore the market base for a JoRD Policy Bank Service; and to investigate and recommend sustainable business models for the development of a JoRD Policy Bank Service.

A review of relevant literature showed evidence that scientific institutions are attempting to draw attention to the importance of journal data policies and a sense that the scientific community in general is in favour of the concept of data sharing.  At the same time it seems to be the case that more needs to be done to convince the publishing world of the need for greater consistency in data policy and author guidelines, particularly on vital questions such as when and where authors should deposit data for sharing.

The study of journal policies which currently exist found that a large percentage of journals do not have a policy on data sharing, and that there are great inconsistencies between journal data sharing policies. Whilst some journals offered little guidance to authors, others stipulated specific compliance mechanisms. A valuable distinction is made in some policies between two categories of data: integral, which directly supports the arguments and conclusions of the article, and supplementary, which enhances the article but is not essential to its argument. What we considered to be the most significant study on journal policies (Piwowar & Chapman, 2008) defined journal data sharing policies as “strong”, “weak” or “non-existent”. A strong policy mandates the deposit of data as a condition of publication, whereas a weak policy merely requests the deposit of data. The indication from previous studies that researchers’ data sharing behaviour is similarly inconsistent was confirmed by our online survey. However, there is general assent to the data sharing concept, and many researchers would be prepared to submit data for sharing along with the articles they submit to journals.

We then investigated a substantial sample of journal policies to establish our own picture of the policy landscape. A selection of 400 international and national journals was purposefully chosen to represent the 200 most cited journals (high impact journals) and the 200 least cited (low impact journals), equally shared between Science and Social Science, based on the Thomson Reuters citation index. Each policy we identified relating to these journals was broken into different aspects, such as: what, when and where to deposit data; accessibility of data; types of data; monitoring of data compliance; and consequences of non-compliance. These were then systematically entered into a matrix for comparison. Where no policy was found, this was indicated on the matrix. Policies were categorised as either “weak”, only requesting that data is shared, or “strong”, stipulating that data must be shared.
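For readers who like to see this kind of comparison matrix in concrete terms, here is a minimal sketch in Python. It is an illustration only, not the tool we used: the field names, the `classify` and `strength_shares` helpers and the example journals are all hypothetical, and the judgement of whether a policy mandates deposit is assumed to have been made by a person reading the policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PolicyRecord:
    """One row of a hypothetical policy matrix (all field names are illustrative)."""
    journal: str
    impact_group: str               # "high" or "low"
    policy_text: Optional[str]      # None when no policy was found
    mandates_deposit: bool = False  # reader's judgement: must data be shared?


def classify(record: PolicyRecord) -> str:
    """Label a row as "none", "weak" (requests sharing) or "strong" (requires it)."""
    if record.policy_text is None:
        return "none"
    return "strong" if record.mandates_deposit else "weak"


def strength_shares(records: List[PolicyRecord]) -> Dict[str, float]:
    """Proportion of strong and weak policies among journals that have a policy."""
    labels = [classify(r) for r in records]
    with_policy = [label for label in labels if label != "none"]
    if not with_policy:
        return {}
    return {
        label: with_policy.count(label) / len(with_policy)
        for label in ("strong", "weak")
    }


# Made-up example rows, purely for illustration.
example_matrix = [
    PolicyRecord("Journal A", "high", "Data must be deposited.", True),
    PolicyRecord("Journal B", "high", "Authors are encouraged to share data."),
    PolicyRecord("Journal C", "low", "Data may be shared on request."),
    PolicyRecord("Journal D", "low", "Sharing is welcomed."),
    PolicyRecord("Journal E", "low", None),
]

if __name__ == "__main__":
    for row in example_matrix:
        print(row.journal, row.impact_group, classify(row))
    print(strength_shares(example_matrix))
```

Keeping "no policy" as its own label, rather than leaving the row out, means the matrix can report both how many journals have a policy at all and how strong the existing policies are.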

Approximately half the journals examined had no data sharing policy. Of the policies we found, we assessed just over three quarters as weak and just under one quarter as strong (76%:24%). The high impact journals were found to have the strongest policies, whereas not only did fewer low impact journals include a data sharing policy, but those policies were also less likely to stipulate data sharing, merely suggesting that it may be done. The policies generally give little guidance on the stage of the publishing process at which data is expected to be shared.

Throughout the duration of the project, representatives from publishing and other stakeholders were consulted in different ways. Representatives of publishing were selected from a cross section of different types of publishing house; the researchers we consulted were self-selected through open invitations by way of the JoRD Blog. Nine of them attended a focus group and 70 answered an online survey. They were drawn from every academic discipline and ranged over a total of 36 different subject areas. During the later phases of the study, a selection of representatives of stakeholder organisations was asked to explore the potential of the proposed JoRD service and to comment on possible business models. These included publishers, librarians, representatives of data centres or repositories, and other interested individuals. This aspect of the investigation included a workshop session with representatives of leading journal publishers in order to assess the potential for funding a JoRD Policy Bank service. Subsequently an analysis of comparator services and organisations was performed, using interviews and desk research.

Our conclusion from the various aspects of the investigation was that although the idea of making scientific data openly accessible for sharing is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these obstacles is the lack of a consolidated infrastructure for the easy sharing of data. In consequence, researchers quite simply do not know how to share their data. At the present juncture, when policies are either not available or provide inadequate guidance, researchers acknowledge a need for the kind of information that a policy bank would supply. The market base for a JoRD policy bank service would be the research community, and researchers did indicate they believed such a service would be used.

Four levels of possible business models for a JoRD service were identified, and these were then put to a range of stakeholders. These stakeholders found it hard to identify a clear-cut service level that would be self-sustaining. The funding models of similar services and organisations were also investigated. In consequence, an exploratory two-phase implementation of a service is suggested. The first phase would be the development of a database of data sharing policies, engagement with stakeholders and third-party API development, with the intention of building use to the level at which a second phase, a self-sustaining model, would be possible.

Some interesting news on Open Data

I’ve been away from the desk for a few days. Here is some of the open data news I found upon my return:

From January 2013, the BMJ will require a commitment to make the relevant anonymised patient-level data available on reasonable request before it publishes clinical trial results

http://www.bmj.com/content/345/bmj.e7304

Erin C. McKiernan, a researcher working primarily in experimental and theoretical neuroscience, asks Who owns research data and the rights to publish?

http://emckiernan.wordpress.com/2012/10/24/who-owns-research-data-and-the-rights-to-publish-it/

http://emckiernan.wordpress.com/2012/10/31/who-owns-research-data-and-the-rights-to-publish-part-ii/

And finally, what are the Benefits of Open Data – Impact on Economic Research

http://oanow.org/2012/11/the-benefits-of-open-data-impact-on-economic-research/

JISCMRD Programme Progress Workshop / DCC Institutional Engagements Workshop, Wed 24-Thu 25 October, Nottingham

Last week we attended the JISCMRD Programme meeting, here in Nottingham. This was an opportunity for projects on the JISC Managing Research Data Programme 2011-13 and DCC Institutional Engagements and associate projects to meet up and discuss progress.

The event covered

  • Institutional RDM policies; developing an institutional strategy and an ‘EPSRC’ roadmap.
  • Managing active data: storage, access, academic dropbox services.
  • Data management planning: developing good practice and providing effective support.
  • Data repositories and storage: options for repository service solutions.
  • Training & guidance.
  • Triage and handover: what to keep and where to entrust it?  Selection and appraisal; deposit and handover.
  • Business case: covering roles, responsibility, costing, sustainability, advocacy etc.
  • Data catalogues: metadata profiles, identifiers.

In addition to presentations, most projects also brought a poster detailing their progress. We will tell you all about our poster next week.

It was very interesting to hear from various projects around the UK about how they are going about implementing data management plans and data repositories at their institutions.

Of particular interest to the JoRD project was PRIME.

PRIME, along with PREPARDE and ourselves, is part of the JISC Managing Research Data: Innovative Research Data Publication strand.

PRIME (Publisher, repository and institutional metadata exchange) will be looking into automated ways for publishers, repositories and institutions to share metadata about datasets.

An example use case would be that a researcher submits a data paper to a metajournal, which in turn shares the metadata on the dataset with subject and institutional repositories.

#jiscmrd #ukdcc

Literature Review – Articles Relevant to the Field

This bibliography of useful literature has been sitting in the draft section for some months, but as our study has now finished, and the feasibility study report is in the hands of Jisc, we are practising what we preach and passing on our information to others who may be interested in this area. I am sorry, but it is a rather long list and looks tedious and boring.

More data will follow in the next few weeks.

LITERATURE REVIEW

An early paper on journal policies.

McCain, K. (1995) Mandating sharing: journal policies in the natural sciences. Science Communication 16, 403-431.

Baseline paper on journal policies (and examples of the other work of Piwowar and Chapman on data sharing).

Piwowar, H. and Chapman, W. (2008)  A review of journal policies for sharing research data   In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 – Proceedings of the 12th International Conference on Electronic Publishing (ELPUB) June 25-27 2008, Toronto Canada. Available at http://ocs.library.utronto.ca/index.php/Elpub/2008/paper/view/684

Piwowar, H. and Chapman, W. (2008) Identifying data sharing in biomedical literature. AMIA Annual Symposium Proceedings, 596-600. Available at http://www.ncbi.nih.gov/pmc/articles/PM2655927

Piwowar, H. and Chapman, W. (2010) Public sharing of research datasets: a pilot study of associations. Journal of Informetrics 4(2) 148-156. Available at http://www.sciencedirect.com/science/article/pii/S1751157709000881

Piwowar, H. and Chapman, W. (2010) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers. Journal of Biomedical Discovery and Collaboration 5, 7-20. Available at http://www.ncbi.nih.gov/pmc/articles/PMC2990274

Piwowar, H. (2010) Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS One 6:7 07. Available at http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018657

Most recent work on best practice for scholarly publishing.

Schriger, D. et al (2006) The content of medical journal instructions for authors. Annals of Emergency Medicine 48(6), 742-749.

Looked at 166 journals and found contradictory policies and little guidance on methodological and statistical issues.

Smit, E. and Gruttemeier, H. (2011) Are scholarly publications ready for the data era? Suggestions for best practice guidelines and common standards for the integration of data and publications. New Review of Information Networking 16(1) 54-70.

Smit, E. (2011) Abelard and Heloise: why data and publications belong together. D-Lib Magazine 17(1-2). Available at http://www.dlib.org/dlib/january11/smit/01smit

Recent broad explorations of the issues.

Schriger, D. et al (2006) From submission to publication: a retrospective review of the tables and figures in a cohort of randomised controlled trials submitted to the British Medical Journal. Annals of Emergency Medicine 48(6) 750-756.

Carpenter, T. (2009) Journal article supplementary materials: a Pandora’s box of issues needing best practices. Against the Grain 21(6) 84-85.

Neylon, C. (2009) Scientists lead the push for open data sharing. Research Information 41, 22-23.

Hodson, S. (2009) Data-sharing culture has changed. Research Information 45, p.12.

Fisher, J. and Fortmann, L. (2010) Governing the data commons: policy, practice and the advancement of science. Information and Management 47(4) 237-245.

Bizer, C., Heath, T. and Berners-Lee, T. ( ? ) Linked data – the story so far. International Journal on Semantic Web and Information Systems. Special Issue on Linked Data. Available at http://linkeddata.org/docs/ijswis-special-issue

Hrynaszkiewicz, I. (2011). The need and drive for open data in biomedical publishing. Serials 24(1) 31-37.

Bechhofer, S. et al (2011) Why linked data is not enough for scientists. Future Generation Computer Systems (forthcoming as of Aug 2011)

Kauppinen, T. and Espindola, G. (2011) Linked open science – communicating, sharing and evaluating data, methods and results for executable papers. Procedia Computer Science 4, 726-731.

LOS has 4 ‘silver bullets’ 1. Publication of data using Linked Data principles 2. Open source and need-based environments, 3. Cloud computing use, 4. Creative commons.

Parsons, M. (2011) Expert Report on Data Policy – Open Access. Available at http://151.1.219.218/57883ed7-88bc-4e6f-92ed-3af6e96600be.pdf.

Tenopir, Carol, Suzie Allard, Kimberly Douglass, Arsev Umur Aydinoglu, Lei Wu, Eleanor Read, Maribeth Manoff, and Mike Frame. “Data Sharing by Scientists: Practices and Perceptions.” PLoS ONE 6, no. 6 (2011): e21101. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0021101

Borgman, C. (2012) The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6) 1059-1078.

Selected specific studies on aspects of data archiving and sharing.

Hrynaszkiewicz, I. and Altman, D. (2009). Towards agreement on best practice for publishing raw clinical trial data. Trials 10(17) 1-5. Available at http://www.biomedcentral.com/content/pdf/1745-6215-10-17.pdf

Groves, T. (2009) Managing UK research data for future use. BMJ 338 b1252. Available at http:www.bmj.com/content/338/bmj.b1252.Full?tab=response-form/

De Roure, D. et al. (2009) Towards open science: the myexperiment approach. Concurrency and Computation: Practice and Experience (submitted 2009). Available at http://eprints.soton.ac.uk/267270/

Colin Elman, Diana Kapiszewski and Lorena Vinuela (2010). Qualitative Data Archiving: Rewards and Challenges. PS: Political Science & Politics, 43 , pp 23-27 doi:10.1017/S104909651099077X

Moore, R. and Anderson, W. (2010) ASIS&T Research Data Access and Preservation Summit: conference summary. Bulletin of the American Society for Information Science and Technology 36(6) 42-45.

Planta, A. et al (2010) The enduring value of social science research: the use and reuse of primary research data. In: The Organisation, economics and Policy of scientific Research Workshop, Torino, Italy, April 2010. Available at http://www.carloalberto.org/files/brick_dime_strike_workshopagenda_april2010/.pdf

Eschenfelder, K. and Johnson, A. (2011) The limits of sharing: controlled data collections. Proceedings of the American Society for Information Science & Technology 48(1) 1-10.

Neveol, A. et al (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27(23) 3306-3312.

Ingwersen, P. and Chavan, V. (2011) Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 12(S3).

Korjonen, M. (2012) Clinical trial information: developing an effective model of dissemination and a framework to improve transparency. UCL PhD thesis. Available at http://discovery.ucl.ac.uk/1344051/

Bibliography

Bailey, C. (2012) Research Data Curation Bibliography. Houston: Digital Scholarship. Available at http://digital-scholarship.org/rdcb/rdcb.htm

Approaches the question from a library/archive perspective.