Data Citation: Data, Journals and Academic Publishers Webinar on YouTube

Early last Tuesday morning I got up to give a presentation for the Australian National Data Service (ANDS) as part of their Data Citation: Data, Journals and Academic Publishers webinar.

The full webinar, including a talk from Dr. Fiona Murphy of PREPARDE, can be seen and heard here.


Online survey results part two

The second set of questions in the online survey asked for researchers' opinions about data sharing and the usefulness of a data policy bank service. They were as follows:

  • Where do you access or locate the research output of other researchers?
  • In your opinion what are the key drivers behind increasing access to research data?
  • In your opinion what are the main problems associated with sharing research data?
  • What do you think about linking a publication with digital data that are integral to its main conclusions?
  • What do you think about linking an article with supplementary material that enhances the article?
  • Do you think that journals should provide digital data sharing policies?
  • Do you think there would be benefits in having a service offering information about journal research data policies?
  • Would you use a service of this kind?
  • What information should be included in a policy bank service?
  • Do you have any other comments?

Most of the respondents locate other researchers' data through colleagues or within their own institution or organisation, and feel that the four most important drivers for increasing access to data are:

  • Openness
  • Accountability
  • Increased access to data
  • Increased efficiency of research resources

The most frequently expressed concern was the attribution of intellectual property rights to the data being shared. The next most frequently expressed issue was that current institutional models, and the mindsets of institutions and some individuals, create barriers to sharing data. However, just over one-third of respondents (35%) considered that linking published online journal articles to the digital data integral to their main conclusions would be useful and should be mandatory.

Linking articles to supplementary data that enhance the article was considered useful by more respondents (43%), although its usefulness would depend on the context of the data shared. Over 74% of researchers considered that journals should provide data sharing policies, and a similar percentage (73%) thought that a service offering information about those policies would be of benefit because it would be a central resource. Nearly 80% of respondents said that they would use such a service, either to gather data or as a means of selecting where to publish their work. Many ideas were suggested for what to include in a policy bank, including:

  • Clarity and simplicity of use
  • Archiving URLs
  • Guidelines
  • Usage licences (eg Creative Commons)

Eight researchers commented that they considered the initiative important.

The fewest respondents said that they gather other researchers' data from their own blog or from hard copy data sets. The concerns expressed about sharing data were those of trust, confidentiality and the need to overcome existing mindsets and institutional barriers. A small number of researchers felt that sharing data would affect the future of research and that certain conditions would have to be fulfilled before data could be shared. A very low number of people (3%) said that linking data to main conclusions was neither useful nor necessary, that they would only be interested in a published article and not in any additional material, and that journals should not provide data sharing policies. One researcher commented that further research on the topic, including a trial, would help them decide whether published data sharing policies would be of personal benefit.

Three percent of respondents thought that there would be no benefit to a data policy bank service, because it is not needed or not feasible, or because journals would have conflicting ethos. Twenty-one percent considered that they would not use such a service because they did not find it relevant, and one researcher stated that they would prefer to deal directly with the journal.

On balance, more respondents appear to favour data sharing, hold positive opinions of the JoRD policy bank service and would find it useful than feel that there is no need or use for such a service.

Preliminary Results of Online Questionnaire

The online questionnaire closed on Monday 5th November, having been answered by 70 researchers. The survey comprised 20 questions asking for information about the researcher, their data sharing habits, their opinions of the possibility of openly sharing their data and the utility of a policy bank service. The first ten questions were as follows:

  • What is your academic discipline?
  • What is your subject?
  • How long have you been a researcher?
  • In which part of the world is your research institution based?
  • Do you generate research data/materials/programs etc?
  • What kind of data/materials/programs do you generate?
  • Where do you currently store your digital data?
  • Where do you currently store your non-digital data?
  • How accessible are your data/materials/programs to other researchers?
  • Are your data/materials/programs etc sharing habits going to change in the future?

Most of the respondents worked in the disciplines of Science or Social Science; however, there were representatives from a substantial range of fields, which means that the self-selecting sample came from a cross-section of research disciplines. The most frequently listed subject was some variety of Information Studies. Around 33% of respondents were actively working on a PhD or MPhil, and roughly 30% had been post-qualification researchers for between 5 and 14 years. The respondents were overwhelmingly based in Europe, and nearly all of them considered that they generated some sort of data, which was mainly qualitative, although there was an equal balance between textual and numerical data. Most people stored digital data on their own computer and on a work server. The favoured form of other digital storage was Dropbox. However, when it came to non-digital data, many more people stored it at their workplace. Surprisingly, around 56% of respondents already share their data, albeit with their colleagues. Slightly more researchers thought that they were unlikely to change their sharing habits (approx. 37%) than thought that they would change them (36%).

The fewest respondents were from the field of Economics, one respondent was studying for an MSc, and fewer respondents had been working as researchers for over 15 years. Geographically, a very small number of respondents were based in South America and Africa, and very few people answered that they did not generate any data. Visual data was the least frequently generated form. Few respondents stored digital data in a disciplinary digital archive, or non-digital data at an external repository. One respondent appeared to destroy all raw data after research publication. No respondent answered that they shared data with no one, although certain researchers shared only with their research partner. A few considered that they would share less of their data in future, while a small number of researchers were not able to share because of the sensitive nature of the data.

Questions 11 – 20 will be analysed and reported next week.


Literature Review – Articles Relevant to the Field

This bibliography of useful literature has been sitting in the draft section for some months, but as our study has now finished and the feasibility study report is in the hands of Jisc, we are practising what we preach and passing on our information to others who may be interested in this area. I am sorry, but it is a rather long list and looks tedious and boring.

More data will follow in the next few weeks.


An early paper on journal policies.

McCain, K. (1995) Mandating sharing: journal policies in the natural sciences. Science Communication 16, 403-431.

Baseline paper on journal policies (and examples of the other work of Piwowar and Chapman on data sharing).

Piwowar, H. and Chapman, W. (2008) A review of journal policies for sharing research data. In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 – Proceedings of the 12th International Conference on Electronic Publishing (ELPUB) June 25-27 2008, Toronto Canada. Available at

Piwowar, H. and Chapman, W. (2008) Identifying data sharing in biomedical literature. AMIA Annual Symposium Proceedings, 596-600. Available at

Piwowar, H. and Chapman, W. (2010) Public sharing of research datasets: a pilot study of associations. Journal of Informetrics 4(2) 148-156. Available at

Piwowar, H. and Chapman, W. (2010) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers. Journal of Biomedical Discovery and Collaboration 5, 7-20. Available at

Piwowar, H. (2011) Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE 6(7) e18657. Available at

Most recent work on best practice for scholarly publishing.

Schriger, D. et al (2006) The content of medical journal instructions for authors. Annals of Emergency Medicine 48(6), 742-749.

Looked at 166 journals and found contradictory policies and little guidance on methodological and statistical issues.

Smit, E. and Gruttemeier, H. (2011) Are scholarly publications ready for the data era? Suggestions for best practice guidelines and common standards for the integration of data and publications. New Review of Information Networking 16(1) 54-70.

Smit, E. (2011) Abelard and Heloise: why data and publications belong together. D-Lib Magazine 17(1-2). Available at

Recent broad explorations of the issues.

Schriger, D. et al (2006) From submission to publication: a retrospective review of the tables and figures in a cohort of randomised controlled trials submitted to the British Medical Journal. Annals of Emergency Medicine 48(6) 750-756.

Carpenter, T. (2009) Journal article supplementary materials: a Pandora’s box of issues needing best practices. Against the Grain 21(6) 84-85.

Neylon, C. (2009) Scientists lead the push for open data sharing. Research Information 41, 22-23.

Hodson, S. (2009) Data-sharing culture has changed. Research Information 45, p.12.

Fisher, J. and Fortmann, L. (2010) Governing the data commons: policy, practice and the advancement of science. Information and Management 47(4) 237-245.

Bizer, C., Heath, T. and Berners-Lee, T. ( ? ) Linked data – the story so far. International Journal on Semantic Web and Information Systems. Special Issue on Linked Data. Available at

Hrynaszkiewicz, I. (2011). The need and drive for open data in biomedical publishing. Serials 24(1) 31-37.

Bechhofer, S. et al (2011) Why linked data is not enough for scientists. Future Generation Computer Systems (forthcoming as of Aug 2011)

Kauppinen, T. and Espindola, G. (2011) Linked open science – communicating, sharing and evaluating data, methods and results for executable papers. Procedia Computer Science 4, 726-731.

LOS has 4 ‘silver bullets’: 1. publication of data using Linked Data principles, 2. open source and need-based environments, 3. cloud computing use, 4. Creative Commons.

Parsons, M. (2011) Expert Report on Data Policy – Open Access. Available at

Tenopir, C. et al (2011) Data sharing by scientists: practices and perceptions. PLoS ONE 6(6) e21101.

Borgman, C. (2012) The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6) 1059-1078.

Selected specific studies on aspects of data archiving and sharing.

Hrynaszkiewicz, I. and Altman, D. (2009). Towards agreement on best practice for publishing raw clinical trial data. Trials 10(17) 1-5. Available at

Groves, T. (2009) Managing UK research data for future use. BMJ 338 b1252. Available at

De Roure, D. et al. (2009) Towards open science: the myexperiment approach. Concurrency and Computation: Practice and Experience (submitted 2009). Available at

Elman, C., Kapiszewski, D. and Vinuela, L. (2010) Qualitative data archiving: rewards and challenges. PS: Political Science & Politics 43, 23-27. doi:10.1017/S104909651099077X

Moore, R. and Anderson, W. (2010) ASIS&T Research Data Access and Preservation Summit: conference summary. Bulletin of the American Society for Information Science and Technology 36(6) 42-45.

Pienta, A. et al (2010) The enduring value of social science research: the use and reuse of primary research data. In: The Organisation, Economics and Policy of Scientific Research Workshop, Torino, Italy, April 2010. Available at

Eschenfelder, K. and Johnson, A. (2011) The limits of sharing: controlled data collections. Proceedings of the American Society for Information Science & Technology 48(1) 1-10.

Neveol, A. et al (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27(23) 3306-3312.

Ingwersen, P. and Chavan, V. (2011) Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 12(S3).

Korjonen, M. (2012) Clinical trial information: developing an effective model of dissemination and a framework to improve transparency. UCL PhD thesis. Available at


Bailey, C. (2012) Research Data Curation Bibliography. Houston: Digital Scholarship. Available at

Approaches the question from a library/archive perspective.