Prezis from the presentations

Last week was busy for the JoRD team. Jane did the presentation for ANDS, and Marianne appeared twice at Oxford: once to present a brief summary of the JoRD project to the Jisc-organised “Now and Future of Data Publishing” event and, later in the week, to give a selection of the project findings to the Dryad Members meeting. The links to both the Oxford presentations follow, with a text summary.

The JoRD Project and its implications for repositories

JoRD and the implications for data sharing and repositories

1. The project was Jisc funded to explore the possibility of setting up a self-sustaining database and service to collate and summarise academic journal policies on the deposition of data associated with published articles
2. Current belief that openly accessible research data is a good thing because it drives science forward
3. Aims: a Jisc-funded project to look at the possibility of setting up a central resource of journal instructions to authors about sharing the data on which articles are based
4. Objectives
• Investigate current state of Journal data policies
• Investigate current data sharing views and habits
5. Landscape of data sharing: there has always been data published in printed journals in the form of charts and tables
6. But digital data becomes a problem: where should it be stored? In a repository? On a website? Embedded into articles?
7. This is a journal data policy: an instruction to authors on where to share or deposit research data that is relevant to a published article
8. We initially analysed 230 research data policies and found many inconsistencies and a lack of standardisation
9. Some journals were vague about the form of data to be deposited, others were more precise
10. Some journals were specific about where the data should be deposited, most were less so.
11. Go back to the policy and explain
12. We spoke to these stakeholder groups and found a number of dichotomies
13. Taking researchers first, they said that they would be happy to share their data (with certain caveats, which I will not go into here). These were the reasons they gave for sharing data
14. However, when we asked how much they shared and where, most of them only shared with colleagues. Only a small number mentioned that they put their data into repositories
15. We asked them why that was, and their replies ranged over: no time, don’t know where, and difficulty of accessing institutional repositories. They also said that current research models do not value and encourage data sharing. (A PhD researcher stated that he felt that if he shared his data during the course of the research he might be “gazumped”, meaning that should someone else publish research on his chosen topic, his thesis would no longer be unique and therefore would no longer be credited.)
16. The publishers also showed a dichotomy: while they appreciated the benefits of sharing data, they felt that their servers would have difficulty holding the quantity of data included in each article and that repositories were the right place for it. However, there was some discussion about the long-term availability of repositories, which have not yet been proven, whereas the publishing houses have been around for a long time
17. Worries about links, etc
18. Academic librarians and Repository managers, no conflicting concerns, practicality
19. Data sharing landscape is a mess
20. How could a JoRD service improve the infrastructure?
• Develop a model data policy framework, which takes into account the concerns of all the stakeholders
21. Improved, more consistent policies save the time of publishers and authors
22. Address the fears of IP, data citation etc, eliminating dichotomies, improving the infrastructures, creates order
• Implications for repositories, authors know where data can be deposited to be shared and re-used, more will do so.

The JoRD Project: Now and Future

The JoRD Project: now and future

1. JISC-funded feasibility study of a central resource of research journal data policies
2. Looked at what the service should include and whether it could pay for itself
3. And 4 Tried to answer two questions
• Can Journal data policies encourage deposition of data?
• Will a JoRD service help publicly funded data to be shared and re-used?
5. Why bother? When an author publishes, she is trading her intellectual property with a publisher as part of a transaction, and there are certain obligations on both sides. These can include data linked to the article. The author needs to know and understand what to do with it (reading the small print)
6. Needed to find out three things
• Understand current journal data policies
• Would anyone bother to use the service
• Could it generate sufficient income for development, building and maintenance?
7. We analysed some journal data policies in depth
8. Looked at 371 journals,
9. What was in the policies? Main areas were data type, when to deposit, and where
10. Little requirement for open access or compliance or consequences for non compliance
11. That does not provide an argument that journal data policies will help open data sharing
12. And 18 But there are signs that the situation is changing
• More publishers are considering data policies
• Elsevier Journal of the future
• Rise of data journals
• Apparent upward trend of journals with data policies
19. If there were a JoRD Service, would anyone use it?
20. All the stakeholders said that they would
21. For a variety of reasons BUT
22. They all wanted different things…
23. …apart from these, difficult to build one service
24. And will anyone pay for it?
25. Resounding no, except from publishers if the service was all singing and dancing
26. So, how does a JoRD service stand?
27. Now, with few policies stipulating deposit of data and stakeholders not financially contributing,
28. BUT… Let’s think of the future? The landscape is changing
29. Funders are asking for data plans to be included in funding bids
30. Universities are installing data management systems
31. Increase of data journals
32. And expectation that data should be included in articles
33. We have an opportunity to build a high-quality database of existing journal data policies, which can be added to and maintained to a high level, with a simple user interface. Establish a user base and develop a sustainable business model which can be implemented at a later stage.
34. JoRD is the future, and we should build it now when the quantity of data is smaller and the cost will be lower
35. Before the data deluge comes


A rather long post, but quite a brief summary

Here is a summary of the project so far.

Sharing the data which is generated by research projects is increasingly being recognised as an academic priority by funders, researchers and publishers. The issue of the policies on sharing set out by academic journals has been raised by scientific organisations, such as the US National Academy of Sciences, which urges journals to make clear statements of their sharing policies. On the other hand, the publishing community expresses concerns over the intellectual property implications of archiving shared data, whilst broadly supporting the principle of open and accessible research data.

The JoRD Project was a feasibility study on the possible shape of a central service on journal research data policies, funded by the UK JISC under its Managing Research Data Programme. It was carried out by the Centre for Research Communications at Nottingham University (UK) with contributions from the Research Information Network and Mark Ware Consulting Ltd. The project used a mix of methods to examine the scope and form of a sustainable, international service that would collate and summarise journal policies on research data for the use of researchers, managers of research data and other stakeholders. The purpose of the service would be to provide a ready reference source of easily accessible, standardised, accurate and clear guidance and information on the journal policy landscape relating to research data. The specific objectives of the study were: to identify the current state of journal data sharing policies; to investigate the views and practices of stakeholders; to develop an overall view of stakeholder requirements and possible service specifications; to explore the market base for a JoRD Policy Bank Service; and to investigate and recommend sustainable business models for the development of a JoRD Policy Bank Service.

A review of relevant literature showed evidence that scientific institutions are attempting to draw attention to the importance of journal data policies and a sense that the scientific community in general is in favour of the concept of data sharing.  At the same time it seems to be the case that more needs to be done to convince the publishing world of the need for greater consistency in data policy and author guidelines, particularly on vital questions such as when and where authors should deposit data for sharing.

The study of journal policies which currently exist found that a large percentage of journals do not have a policy on data sharing, and that there are great inconsistencies between journal data sharing policies. Whilst some journals offered little guidance to authors, others stipulated specific compliance mechanisms. A valuable distinction is made in some policies between two categories of data: integral, which directly supports the arguments and conclusions of the article, and supplementary, which enhances the article but is not essential to its argument. What we considered to be the most significant study on journal policies (Piwowar & Chapman, 2008) defined journal data sharing policies as “strong”, “weak” or “non-existent”. A strong policy mandates the deposit of data as a condition of publication, whereas a weak policy merely requests the deposit of data. The indication from previous studies that researchers’ data sharing behaviour is similarly inconsistent was confirmed by our online survey. However, there is general assent to the data sharing concept, and many researchers would be prepared to submit data for sharing along with the articles they submit to journals.

We then investigated a substantial sample of journal policies to establish our own picture of the policy landscape. A selection of 400 international and national journals was purposefully chosen to represent the top 200 most cited journals (high impact journals) and the bottom 200 least cited (low impact journals), equally shared between Science and Social Science, based on the Thomson Reuters citation index. Each policy we identified relating to these journals was broken into different aspects such as: what, when and where to deposit data; accessibility of data; types of data; monitoring data compliance; and consequences of non-compliance. These were then systematically entered onto a matrix for comparison. Where no policy was found, this was indicated on the matrix. Policies were categorised as either “weak”, only requesting that data is shared, or “strong”, stipulating that data must be shared.
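The comparison matrix and the weak/strong categorisation described above can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions: the field names, the `PolicyRecord` class and the example rows are all hypothetical and are not taken from the project's actual matrix.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyRecord:
    """One row of a hypothetical comparison matrix (field names are assumptions)."""
    journal: str
    has_policy: bool
    mandates_deposit: bool = False  # True when data must be shared
    where_to_deposit: Optional[str] = None
    when_to_deposit: Optional[str] = None


def classify(record: PolicyRecord) -> str:
    """Return "none", "weak" or "strong" using the distinction in the text."""
    if not record.has_policy:
        return "none"
    return "strong" if record.mandates_deposit else "weak"


def tally(records):
    """Count how many records fall into each category."""
    return Counter(classify(r) for r in records)


# Invented example rows, not taken from the study sample.
sample = [
    PolicyRecord("Journal A", has_policy=True, mandates_deposit=True,
                 where_to_deposit="named repository"),
    PolicyRecord("Journal B", has_policy=True),
    PolicyRecord("Journal C", has_policy=False),
]
print(tally(sample))
```

Keeping "no policy" as its own category, rather than dropping those journals, mirrors the way the absence of a policy was recorded on the matrix.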

Approximately half the journals examined had no data sharing policy. Just over three quarters of the policies we found we assessed as weak and only just under one quarter we deemed to be strong (76%: 24%). The high impact journals were found to have the strongest policies, whereas not only did fewer low impact journals include a data sharing policy, but those policies were also less likely to stipulate data sharing, merely suggesting that it may be done. The policies generally give little guidance on the stage of the publishing process at which data is expected to be shared.

Throughout the duration of the project, representatives from publishing and other stakeholders were consulted in different ways. Representatives of publishing were selected from a cross section of different types of publishing house; the researchers we consulted were self selected through open invitations by way of the JoRD Blog. Nine of them attended a focus group and 70 answered an online survey. They were drawn from every academic discipline and ranged over a total of 36 different subject areas. During the later phases of the study, a selection of representatives of stakeholder organisations was asked to explore the potential of the proposed JoRD service and to comment on possible business models. These included publishers, librarians, representatives of data centres or repositories, and other interested individuals. This aspect of the investigation included a workshop session with representatives of leading journal publishers in order to assess the potential for funding a JoRD Policy Bank service. Subsequently an analysis of comparator services and organisations was performed, using interviews and desk research.

Our conclusion from the various aspects of the investigation was that although the idea of making scientific data openly accessible for sharing is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these obstacles is the lack of a consolidated infrastructure for the easy sharing of data. In consequence, researchers quite simply do not know how to share their data. At the present juncture, when policies are either not available, or provide inadequate guidance, researchers acknowledge a need for the kind of information that a policy bank would supply. The market base for a JoRD policy bank service would be the research community, and researchers did indicate they believed such a service would be used.

Four levels of possible business models for a JoRD service were identified and finally these were put to a range of stakeholders. These stakeholders found it hard to identify a clear cut option of service level that would be self sustaining. The funding models of similar services and organisations were also investigated. In consequence, an exploratory two phase implementation of a service is suggested. The first phase would be the development of a database of data sharing policies, engagement with stakeholders, third party API development with the intention to build use to the level at which a second phase, a self sustaining model, would be possible.

What to put in an ideal JoRD service

The Feasibility Study has been asking researchers, representatives of publishing houses, repository staff and librarians about their image of an ideal JoRD service, to give some indication of how to build a resource that will be useful. So far, the ideal service that would meet the desires of all the stakeholders would not only include a database containing the details of every journal data sharing policy, cross-matched with funders’ requirements and lists of suitable repositories, but would also employ a team of staff to constantly update the database, provide customer service and advice about best practice, and give educational workshops and seminars. This would be ideal, but expensive, and ideals cannot always be reached, at least not initially.

So, who wants what out of the service? These are the service requirements each stakeholder group suggested.

Researchers would like the service to:

  • Have a clear, visual user friendly website with technical support, and information about the service and its scope
  • Include summaries of policies, RCUK baseline policies, compliance statistics
  • Include the URL of journal policy
  • Provide contact details of researchers

Researchers told us that they would use the service to find the journal which is right for their data and funder’s requirements, find appropriate repositories and to look for openly accessed data.

Publishers asked for:

  • A simple attractive web page
  • An authoritative resource
  • Compliance monitoring and sanction information
  • Technical error reporting
  • Guidance about best practice, current issues, changes and trends and a model policy
  • A policy grading system
  • Levels of membership

Publishers said that they would use the service to gather competitor intelligence, as a source of advice, and as a central resource for information about funders’ requirements and accredited repositories.

Both researchers and publishers wanted:

  • Guidelines about data submission, such as copyright, use licensing, ethical clearance, restrictions and embargoes, and file format
  • URLs of places where data can be archived and retrieved

As far as other stakeholders are concerned, librarians considered that the service could give publication and funding compliance guidance for researchers as well as support research data management policies. Funders thought that the service could track the development of journal data policies and influence the data sharing behaviour of researchers. Representatives of repositories thought that a central data policy bank would be a resource where they could check consistency and compliance of journal data policies and possibly identify partner journals. It seems that a JoRD Policy Bank Service would have something to offer for everyone in the research industry. The quest now, as in all research activity, is finding someone who will pay, so that the ideal service will not be such a distant dream.

Data comes in all sorts of shapes and sizes

The JoRD project has not set out to define the term “data” (or the singular form of the word, “datum”). This was a fortunate choice, because one of the messages that has clearly come across from all the participants of our study is that data can take many forms. The recent Royal Society Report, “Science as an Open Enterprise”, includes a glossary of data terms which illustrates the ways in which the term “data” can be used. For example:

  • big data – data that requires massive computing power to process
  • broad data – structured big data
  • data set – a collection of  information held in electronic form
  • linked data – data that has been allocated a unique identifying number to be able to access it from an electronic storage facility

… and those are just a few terms that it explains. The word “Data” is defined as “Qualitative or quantitative statement or numbers that are (or assumed to be) factual”. The researchers that were part of this study considered that their data took more forms than just statements or numbers.

Researchers described the data that their research generated as software, video footage, geodata, geological maps, ontologies, web services and data models, as can be seen in the table below. The multitude of forms makes it difficult for publishers to include data in their online published articles. The publishers said that linked data in a journal article should be “fit for use” and “replicable”, and consider that data in many different formats is “Messy” and is currently not supplied with sufficient metadata. Another consideration is the resulting file size of an article if the publisher saves the embedded data on their own servers. Data repositories and data centres are the more practical method of data storage, with published articles incorporating linked data.

That is one reason for journals to have a data policy, and a good argument for those policies to be collected and made accessible in a centralised resource, a JoRD Policy Bank Service.

Researchers’ descriptions of data, classified as qualitative (documents and text), quantitative (figures), visual data (images) or virtual data (software or protocols):

  • Collection of examiner reports and questions, supervisory reports, letters and other documentary evidence
  • Dataset of measurements and statistical analyses
  • Digitised textual sources
  • Excavation, field observation, environmental monitoring, software to collate, mine and analyse
  • Excel sheets
  • Focus group, interview transcripts, some footage of people using computers, digital photographs
  • Geologic maps, chemical and isotopic analyses of Earth materials, GIS datasets
  • Interview transcripts
  • Web services, data models and specifications

Summary of workshop, discussion about the nature of JoRD

Here is another summary of the concluding discussion that took place at the workshop on 13th November. This is about the expectations and perceptions of publishers concerning the nature of the JoRD Data Bank service.

A prominent consideration of the publishers was that JoRD should be an authoritative resource, such that a JoRD compliance stamp, or quality mark, could be displayed on journals’ websites. There was discussion that for JoRD to be authoritative, the content of the database should be added, updated and maintained by the JoRD team. It was mentioned that publishers might initially populate the database, but ongoing maintenance would be the responsibility of JoRD. However, there should be a guarantee that the content is accurate, and publishers would need to commit to providing machine-readable policies so that they can be harvested automatically.
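To make the idea of a machine-readable policy concrete, here is a minimal sketch of what a harvested record might look like and how a harvester could check it. No format was agreed at the workshop, so the JSON layout, the field names in `REQUIRED_FIELDS` and the example values are all assumptions made purely for illustration.

```python
import json

# Hypothetical minimum set of fields; not an agreed JoRD schema.
REQUIRED_FIELDS = {"journal", "publisher", "policy_url",
                   "deposit_required", "last_updated"}


def parse_policy(text: str) -> dict:
    """Parse one JSON policy record and check the assumed required fields."""
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("policy record must be a JSON object")
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"policy record is missing fields: {sorted(missing)}")
    if not isinstance(record["deposit_required"], bool):
        raise ValueError("deposit_required must be true or false")
    return record


# Invented example record with placeholder names and URL.
example = """
{
  "journal": "Example Journal",
  "publisher": "Example Press",
  "policy_url": "https://journal.example/data-policy",
  "deposit_required": true,
  "last_updated": "2012-11-13"
}
"""
policy = parse_policy(example)
print(policy["journal"], policy["deposit_required"])
```

Rejecting incomplete records at harvest time is one way the accuracy guarantee raised by the publishers could be supported, although the checks a real service would need are likely to be more extensive.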

It was suggested that the operational database should not be merely a static catalogue or encyclopaedia. It was requested that the non-compliance of a journal with a data sharing policy, or with a funder’s policy, could be flagged and reported to the publisher, although it was queried whether that was the remit of the service or of the publishers themselves. Similarly, it was questioned whether the service would mediate user complaints, and proposed that it would engage with complaints concerning policies only. To maintain functionality, it was asked whether there could be automatic URL checking, which would send an alert to the publisher if links were broken. Updates on policy changes would also be a useful function.
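The automatic URL checking mentioned above could work along the following lines. This is a sketch only: the function names are hypothetical, the rule that a status of 400 or above counts as broken is an assumption, and the alerting step is left out. The status lookup is passed in as a parameter so that the example can be run without network access.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # no answer at all (DNS failure, timeout, refused)


def find_broken_links(urls, get_status=http_status):
    """Return (url, status) pairs for links that look broken."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


# Offline demonstration with invented URLs and canned status codes.
canned = {
    "https://journal.example/data-policy": 200,
    "https://journal.example/old-policy": 404,
}
print(find_broken_links(canned, get_status=canned.get))
```

In a running service the list returned by `find_broken_links` would be the input to whatever alert is sent to the publisher.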

The service website should include a model data policy framework or an example of a standard data policy and offer guidance and advice to journals and funders about policy development. However, the processing and ratification of a model policy could be a time consuming process to some publishers. It was asked whether repository policies would also be included, and there was mention of compliance with the OpenAIRE European repository network. The website should also contain:

  • Links to the publishers’ web pages
  • Dates of the records
  • Lists of links to repositories
  • Set of criteria for data hosting repository

It should look inviting, but businesslike and be simple and clear, but be sufficiently detailed.

Methods of funding the service and the benefits of membership were considered. For example, would only the policies of members of the service be entered into the database? Would there be different levels of membership or different service options that publishers could choose? And would there be extra costs for extra services? One such service could be to hold historical records and persistent records of former policies. The publishers said that they would be prepared to pay for a service that is transparent and would save them time.

Other comments included:

  • Would the service be a member of the World Data System?
  • Could it be released in Beta?
  • There are around 400 to 600 titles to enter initially
  • When set up the service could be studied to discover its effectiveness and impact
  • Further consultation may be needed

Very brief summary of JoRD workshop

On Tuesday 13th November some of the JoRD team met with representatives of several well known journal publishers for a workshop session to discuss a number of points concerning the potential JoRD data bank service. This is a very potted summary of the discussions that took place. If any of the attendees are reading this and feel that their comments have not been correctly interpreted, then please comment to correct any misunderstandings.

Preservation of and sustained access to published supplementary material: The current situation
The group perceived that at present there are a variety of issues that impede the maintenance of data added to an on-line journal as supplementary material, or even the practice of including data within an article. The areas where difficulties lie include:
• Technology
• Data repositories
• Embargoes
• Peer review
• Licensing
• Copyright
Unstable URLs, PDF formats and usable forms of preserved data present technological problems that need to be solved to ensure that data can be accessed in the long term. However, transferring data to new formats has fewer difficulties. Data may be linked to external repositories, but they present a problem because they each have different policies and practices. Embargoes placed on data release complicate matters, as there is no standard for their length. To overcome these issues, an alternative solution would be not to include the data file with the article but to add information on how it can be obtained directly from the researcher. However, on-line journals will be upgrading to enriched HTML and should therefore commit to include data.

The group were concerned about the peer review of data, which is currently “Ad Hoc”. It was queried whether peer reviewers have time to examine data alongside judging arguments, and it was suggested that data be reviewed by the research community. Currently publishers’ practices concerning licensing and copyrighting of data as supplementary material vary greatly. However, EU legislation does not allow data to be copyrighted. Authors could be offered choices of licensing, and work is being done to define data and on forms of data citation. However, publishers do feel a duty of care to the knowledge that they publish.

About data repositories: Advantages and disadvantages
Ideally, publishers would like repositories to be a searchable archive that manages data and collects retrospectively, such as the library of Columbia University gathering data for PLOS.


Advantages:

  • The situation for publishers would be made simpler should data be held in external repositories
  • Technically more able to deal with digital data
  • Guidelines about re-depositing data if closed
  • Institutional repositories could manage data then aggregate it as in Australia

Disadvantages:

  • May want to take over from publishers
  • Not currently ready for influx of data
  • Funding may not be sustained
  • Discovery issues

Solutions to any of the issues posed above are not given in this post, but there is opportunity for you to comment. The remainder of the discussion focused on the structuring and content of a JoRD Policy Bank service, which will be summarised in the next post.

Online survey results part two

The second set of questions in the online survey asked for the opinions of researchers about data sharing and the usefulness of a data policy bank service. They are as follows:

  • Where do you access or locate the research output of other researchers?
  • In your opinion, what are the key drivers behind increasing access to research data?
  • In your opinion what are the main problems associated with sharing research data?
  • What do you think about linking a publication with digital data that are integral to its main conclusions?
  • What do you think about linking an article with supplementary material that enhances the article?
  • Do you think that journals should provide digital data sharing policies?
  • Do you think there would be benefits in having a service offering information about journal research data policies?
  • Would you use a service of this kind?
  • What information should be included in a policy bank service?
  • Do you have any other comments?

Most of the respondents locate other researchers’ data from colleagues or in their own institution or organisation and feel that the four most important key drivers to increasing access to data are:

  • Openness
  • Accountability
  • Increased access to data
  • Increased efficiency of research resources

The most frequently expressed concern is the attribution of intellectual property rights to the data being shared. The next most frequently expressed issue is that current institutional and establishment models, and the mindsets of institutions and some individuals, create barriers to sharing data. However, just over one-third of respondents (35%) consider that linking digital data that are integral to the main conclusions of articles in published online journals would be useful and should be mandatory.

Linking articles to supplementary data to enhance the article was considered useful by more respondents (43%) but it would also depend on the context of the data shared. Over 74% of researchers considered that journals should provide data sharing policies and a similar percentage (73%) thought that such a service would be of benefit, because it would be a central resource. Nearly 80% of respondents said that they would use such a service, either to gather data, or as a means of selecting where to publish their work. Many ideas of what to include in a policy data bank were suggested, which included:

  • Clarity and simplicity of use
  • Archiving URLs
  • Guidelines
  • Usage licences (eg Creative Commons)

Eight researchers commented that they considered the initiative important.

The fewest respondents said that they gather other research data from their own blog, or from hard copy data sets. The concerns expressed about sharing data were those of trust, confidentiality and the need to overcome existing mindsets and institutional barriers. A small number of researchers felt that sharing data would affect the future of research and that before sharing data certain conditions would have to be fulfilled. A very low number of people (3%) said that linking data to main conclusions was not useful and unnecessary; that they would only be interested in a published article, not in any additional material; and that journals should not provide data sharing policies. One researcher commented that further research about the topic, with a trial, would help their decision as to whether published data sharing policies would be of personal benefit.

Three percent of respondents thought that there would be no benefit to a data policy bank service, because it is not needed, not feasible or there would be conflicting journal ethos. Twenty one percent considered that they would not use such a service because they did not find it relevant and one researcher stated that they would prefer to deal directly with the journal.

On balance, it appears that more respondents are pro-data sharing, have positive opinions about the JoRD policy bank service and would find it useful, than respondents who feel that there is no need or use for such a service.

Overview of policy types from the Science journals in the sample

Policy Types – Science Publications

The following sections describe the different policy types found in our analysis of the sample of Science publications.

Integral – Data/Materials/Software (Integral to your article)

Various policies talk about the data, materials and software etc that have been generated or used in the study, which would be integral to the study findings and necessary for subsequent study replication/verification purposes or to enable other researchers to build on the findings. These are illustrated below.

1. Data Release and Materials Release Policies


Cold Spring Harbor Laboratory Press: Genome Research (Top Science)

This is a clearly laid out and extensive ‘Life Science’ type policy denoting that it is a condition of publication of the journal that materials required to replicate the work must be made freely available – this principle needs to be agreed to on acceptance. Data should also be made as freely accessible as possible prior to publication. There are clear guidelines about the location of materials and a whole set of weblinks are given for the locations of the following types of material:

  • Sequence data
  • Genotype/Phenotype and genomic variation data
  • Microarray data
  • Proteomics and molecular interactions

Accession numbers must be included in the abstract. The policy says that if reasonable requests are not honoured then researchers should contact the Editor.

BioMed Central: Multidisciplinary Respiratory Medicine (Bottom Science)

In the Instructions for Authors, the following data types are listed in their policy under ‘Data and Materials release’; these are also fairly typical for Life Science disciplines:

  • Nucleotide Sequences
  • Protein Sequences
  • Mass spectrometry
  • Structures
  • Chemical structures and assays
  • Functional genomics data (such as microarray, RNA-seq or ChIP-seq data)
  • Computational Modelling
  • Plasmids

For each data type, the policy names the databases for storing the data and gives weblinks for ease of access. External guidelines for the data, such as MIAME, are given where appropriate. These materials are classed as “readily reproducible” and are to be made “freely available to any scientist wishing to use them for non-commercial purposes”. This ‘life science’ type policy only seems unusual in its timescale for inclusion of the Accession Number, which is in time to be included in the published article (rather than, say, with the submitted manuscript).

Cell Press: A range of publications, e.g. Cell (Top Science)

Cell Press publications have a ‘Distribution of Materials and Data’ section which states that it is a term and condition of publishing for authors to be willing to distribute any materials (cells, DNA, antibodies, reagents, organisms, mouse strains, ES cells) and protocols. Structures should also have their relevant information lodged with named or appropriate databases. MIAME guidelines should be followed as appropriate. Authors should contribute additional data/materials to appropriate databases and repositories. Accession numbers are required.

The Royal Society of Chemistry: Chemical Society Reviews (Top Science)

RSC journals have very comprehensive guidelines for both single crystal and powder diffraction data.  In the case of the former, authors should prepare their work in CIF (Crystallographic Information File) format. For single crystal work, structural information should be deposited with the Cambridge Crystallographic Data Centre (CCDC) and upon submission of the manuscript the CCDC reference numbers will be requested. Powder diffraction data may be submitted as a CIF file via the RSC submissions service.

Nature Publishing Group (includes all journals published by Nature which have Nature in the title)

NPG has very full guidelines for ‘Availability of data and materials’, ‘Sharing Materials’, and ‘Sharing data sets’. They refer to various named repositories and databases for many types of materials and data. Guidelines such as MIAME are noted. There is also a comprehensive Further Reading list which encompasses Nature Journal editorials on these topics.

2. Data/Materials Sharing as the ‘Ethical Guidelines’ of the discipline

Several publishers/publications refer to ‘ethical guidelines’ which are part of the landscape of the discipline concerned. Professional conduct means that data/materials should be made available for appropriate researchers to allow for further analysis and review.

 American Chemical Society: Chemical Reviews (Top Science)

The American Chemical Society publishes a number of journals and has created a set of “Ethical Guidelines to Publication of Chemical Research”. Authors wishing to publish in journals such as Chemical Reviews are expected to follow these ethical guidelines. Part of the ‘Ethical Obligations of Authors’ states that “When requested, the authors should make every reasonable effort to provide data, methods, and samples of unusual materials … to other researchers” and “Authors are encouraged to submit their data to a public database, where available”.

American Physical Society: Reviews of Modern Physics (Top Science)

See under: Ethics and Values (Guidelines for Professional Conduct) – Research Results:

“The results of research should be recorded and maintained in a form that allows analysis and review”

Elsevier: (e.g.  Progress in Polymer Science – Top Science)

Ethics in Research Publication – Data access and retention:

“Authors may be asked to provide the raw data in connection with a paper for editorial review, and should be prepared to provide public access to such data (consistent with the ALPSP-STM Statement on Data and Databases), if practicable, and should in any event be prepared to retain such data for a reasonable time after publication.”

3. Data Sharing – not necessarily mandatory

BMJ Group – British Medical Journal (Top Science)

Authors are ‘encouraged’ to link their articles to external databases (no hosting to be done by BMJ) and then include a ‘data sharing statement’ at the end of the manuscript. This statement should state if data sharing is available or not, and if it is, where to obtain the information. BMJ is also interested in the informed consent of the research participants and reference to this should also be made in the statement.

Data sharing is thus not mandatory for the journal, but the journal recognises that it may be made mandatory by others, such as certain funders.

4. Database Linking – connecting with external databases

Elsevier – Current Opinion in Cell Biology (Top Science)

In the Author Information Pack, Elsevier draw attention to linking to external databases that help to build a better understanding of the described research:

“Elsevier encourages authors to connect articles with external databases”.

This is very vague, and is rather more ‘encouraging’ than ‘mandatory’.

 Nature Reviews – e.g. Neuroscience, Molecular Cell Biology (Top Science)

Nature Reviews journals in any case publish reviews of existing data in different fields: “Proteins, protein domains, genes and diseases are linked to specific pages in relevant and high-quality public databases”.

5. Links to materials on an author’s institutional website

American Physiological Society: Physiological Reviews (Top Science)

This journal permits one of the authors to provide a working URL from their institutional website (links to additional datasets and/or detailed methods and protocols) which is to be given in an Endnote in the manuscript – under the proviso that it is recognised that this material is not peer-reviewed and may be updated from time to time. It is for readers seeking to replicate or expand on the work.

Supplementary Materials

Supplementary materials are frequently of the request/suggest type and are lodged with the journal; they are mainly of the ‘enhancing your article’ type and often include multimedia. There are, however, exceptions to this general idea of article ‘enhancement’, as some supplementary materials could actually be classed as ‘integral’ to the article’s findings.

1. Request/Suggest type – and happy to accept it – usually submitted with the manuscript – published with the journal

These policies are essentially similar to those prevalent in the Social Sciences, especially where the publisher is the same (e.g. Springer publications such as Proceedings of the National Academy of Sciences, India Section B: Biological Sciences – Bottom Science).

Cell Press (See for example Immunity – Top Science)

They see ‘supplemental information’ as a useful resource, but recognise that it needs to be managed by structure and limits. The material is considered to be “additional or secondary support for the main conclusions” (thus implying not of the integral type). They require information to be submitted according to three headings: 1. Supplemental Data, 2. Supplemental Experimental Procedures, 3. Supplemental References. They give file formats and sizes. Alongside this, Immunity also has a Distribution of Materials and Data policy. Immunity is one of the journals to have more than one data policy.

Annual Reviews (See for example Astronomy and Astrophysics – Top Science)

A comprehensive ‘Supplemental Materials Policy’.  Preparation guidelines are provided, along with acceptable and unacceptable file types. This material is to be “supportive but not primary”.

Nature publications (e.g. Cell Biology – Top Science)

Supplementary information is not copy-edited; modifications after publication require a formal correction; the guidelines must be followed or publication may be delayed; and each piece of supplementary material must be referred to at least once in the text of the main article. There is a comprehensive set of guidelines for SI.

The Lancet (e.g. Infectious Diseases, Neurology – Top Science)

Unlike other publications, which refer to ‘supplementary’ or ‘supplemental’ material, publications by The Lancet tend to refer to ‘Guidelines for web extra material’; however, these cover fairly standard things such as text, tables, data, drug names, references, figures, and audio/video material. It is preferred that this material is submitted as one PDF with the paper, and it will be peer-reviewed.

The American Astronomical Society: Astrophysical Journal Supplement Series

The AAS have a policy on machine readable tables (MRT) whereby lengthy tables should be moved to MRT format. There are full guidelines about this.

2. Hosting this material is new to us

The Canadian Field Naturalist (Bottom Science) – “Supplementary Material”

“Supplementary material is a new feature for CFN so we do not know which file formats can and cannot be accepted; please consult our journal Manager with any question about specific formats”

This journal is just starting out on the process and has yet to clarify its procedures.

3. Supporting Information – but ‘essential’ or ‘central’ for understanding the main points of the article – with journal

Wiley Online Library: Angewandte Chemie International Edition (Top Science)

From the ‘Supporting Information’ section: although the information is classed under the heading of ‘supporting’, it is actually deemed ‘essential to understanding the article’ and includes “experimental procedures, spectroscopic data, graphics etc”, rather than just enhancing the article. There is a blurring here of supporting material with integral material. This is interesting, as the same journal also has a policy about Crystal Structure Analysis, under which crystallographic data should not be sent as Supporting Information but should be lodged with the named Data Centres, and deposition numbers must be supplied with the manuscript.

American Society of Clinical Oncology: Journal of Clinical Oncology (Top Science)

This journal “requires that large data sets central to the premise of a manuscript be submitted along with the original work as a supplemental file”. It does also state that data which can be submitted to a public database should be deposited and accession numbers provided.

4. Supplementary Information – which should not be the sole evidence for the article

Wiley Online Library: Ecology Letters (Top Science)

From the ‘Online Supplementary Information’ section – the journal clearly states that “the material published on the internet cannot be used as sole evidence for the print version of the article”. This implies that more integral data – the evidence base for the findings – should also be available elsewhere.

5. Supplemental Materials – only at the Editor’s discretion

Wiley Online Library: CA: A Cancer Journal for Clinicians (Top Science)

This journal states that Supplemental Materials presented as Appendices are not permitted and should be placed within the manuscript or eliminated. Supplemental materials are published at the Editor’s discretion. This journal is not really encouraging concerning the use of supplementary materials.

American Society for Microbiology: Microbiology and Molecular Biology Reviews

The Supplemental Material section states “Please avoid supplemental material”. It is an Editorial decision if any is to be published.

6. Supplementary Information – carefully controlling the volume of SI

Nature: Neuroscience, Immunology  (Top Science)

With respect to these journals, the publisher states that, since SI is proliferating and can be unwieldy, “we have therefore decided to carefully control the volume of Supplementary Information”.

New Data – Which is the Actual Publication itself


 IngenieraQuimica – Chemical Engineering (Bottom Social Science):

Under their ‘Write for the Site’ section they include:

“Post, articles, images related to chemical engineering, software or spreadsheets that you have prepared.”  Here the data to be shared becomes the article.

Overview of policy types from the Social Science journals in the sample

Policy Types – Social Sciences

 Integral – Data/Materials/Software (Integral to your article)

Like the Sciences, some Social Science publications also have policies for integral data.

1.  Integral data – but weak policy

This is the type of policy that ought to be strong, since compliance should be, and indeed can be, monitored; yet it is worded in quite weak terms, implying that deposit is optional.


a) Elsevier – Schizophrenia Research (Top Social Science)

This policy refers to DNA sequences and GenBank Accession numbers – which in the case of strong policies are used to monitor that the data has been deposited.

However, the policy says “Many Elsevier journals cite “gene accession numbers”” and “Elsevier authors wishing to enable other scientists to use the accession numbers….” – which are not statements indicating that the data must be deposited as a requirement of publication.

b) The Royal College of Psychiatrists – The British Journal of Psychiatry (Top Social Science)

Under ‘Access to Data’ their policy states:

“If the study includes original data, at least one author must confirm that he or she had full access to all the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analysis. We strongly encourage authors to make their source data publicly available.”

This is the entirety of what it states. Whilst it appears to be strong, there is no indication of any monitoring, and no recommendation as to how the data should be made accessible or where it could be stored.

2. Integral data – the journal refers you to external Ethical Guidelines

a) Sage – Personality and Social Psychology Review (Top Social Science).

The Submission Guidelines refer you to the ethical guidelines of the American Psychological Association – these ethical guidelines contain the following statement:

8.14 Sharing Research Data for Verification
(a) After research results are published, psychologists do not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose, provided that the confidentiality of the participants can be protected and unless legal rights concerning proprietary data preclude their release. This does not preclude psychologists from requiring that such individuals or groups be responsible for costs associated with the provision of such information.

Here the journal policy refers you to data sharing policies that are part of the ethical landscape of the discipline.

The same obviously applies to any journal published by the American Psychological Association itself, e.g. Psychological Methods (Top Social Science).

b) Sage – American Sociological Review (Published in association with the American Sociological Association – Top Social Science)

Under the Manuscript Submission procedures, this journal refers you to the ethical guidelines of the American Sociological Association – these ethical guidelines contain the following statement:

“Sociologists make their data available after completion of a project or its major publications, except where proprietary agreements with employers, contractors, or clients preclude such accessibility or when it is impossible to share data and protect the confidentiality of the data or the anonymity of research participants (e.g. raw field notes or detailed information from ethnographic interviews)”

3. Integral Data – but refers to the analysis of pre-existing datasets

a) Physicians Postgraduate Press – The Journal of Clinical Psychiatry (official journal of the American Society for Clinical Psychopharmacology)

See under:  ‘Analyses of Preexisting Datasets’ – Here the author is not necessarily the creator of the original dataset but is required to provide details about how the dataset in question can be accessed.

Supplementary Materials (Enhancing your article)

1. Request/Suggest type – and happy to accept it – submitted with the manuscript – published with the journal


a) Taylor and Francis publications – “Adding multimedia and supplementary content to your article” (generic to the publications of the publisher)

  • Reviewed in connection with Journal of Spanish Cultural Studies and Asia Pacific Journal of Social Work and Development (Bottom Social Science Journals)

This policy makes a range of suggestions about what types of material would enhance the article and is happy to accept the material to be published with the journal.

This policy refers to Animations, Movie Files, Sound files, Text files, and Supplementary Material (which is pertinent to and supports the article).

A range of file formats, file sizes and other instructions are provided. The material must be submitted with the manuscript.

The policy is weak. The material is not a ‘requirement’ of publication.

Both Elsevier (Video Data and Supplementary Data) and the American Psychological Association Publications (Multimedia Files) have a similar generic policy on data which enhances articles.

b) Springer Publications – “Electronic Supplementary Material” (generic to publisher)

  • Reviewed  in connection with Asia Europe Journal (Bottom Social Science Journal)

This generic policy similarly refers to Audio, Video, and Animations. But it does also make mention of more specialised formats such as .pdb (chemical), .wrl (VRML) and .tex.

This policy also refers to the “Accessibility” of the provided content (related to catering for disabilities etc).

c) Springer Publications: Studies in East European Thought – “Electronic Supplementary Material”

This also makes mention of large original data such as additional tables.

Wiley Blackwell also have a “Supporting Information” type policy which contains Multimedia elements but also refers to “native datasets and specialist software” (possibly moving into the integral data arena as in the section below).

2. Request/Suggest type – and happy to accept it – but you can also link to an external database or repository (but not your own website)

a) Maney Publishing: London Journal – “Supplementary Material” (Bottom Social Science)

Formats and instructions are given.

3. ‘Supplemental Type’ Materials – which should really be described as ‘Integral’ Materials

a) Project HOPE – Health Affairs – “Supplemental Materials” (Top Social Science)

Some of the materials are probably described as supplemental (and thus supplementary to the article itself) because they will be deposited with the journal. However, the material they refer to is not of the multimedia type (which would enhance the article) but concerns “supplying information that is necessary to evaluate the credibility of their work” and probably should therefore be described as ‘integral’. There is a definition issue at work here.

The journal is particularly keen on the full details of any regressions which have been used.

Other ‘supplemental materials’ “may” be submitted and are therefore properly of the ‘request/suggest’ type.

b) Lippincott Williams & Wilkins  – Epidemiology – “Online Supplemental Material” (Top Social Science)

Underneath the section on ‘Online Supplemental Material’ the journal makes reference to Questionnaires – which should also be provided as online supplemental material. As these are foundational to the actual dataset, they are properly classed as integral materials. This is one of the few mentions of a questionnaire in a data/materials policy. Questionnaires are emphasised separately here as they are very frequent research tools in the Social Sciences but they are only mentioned once in the policies under review in the JoRD project.


  • Appropriate Databases and Repositories for the Social Sciences are seldom mentioned in the policies unless the journal discipline is more scientifically orientated. This raises the question of where Social Science related materials would be stored if policies were to be made STRONG in the Social Sciences. What are the qualitative data databases that need to be referred to? What would the equivalent of an Accession number be?
  • Not much mention is made of specifically Social Science types of data, e.g. Transcripts of Interviews, Focus Groups, Questions and Questionnaires, although some of this data may be implicit in Multimedia policies (e.g. videos of scenarios under investigation in the article). The deposit of such data is complicated by the need to gain the permission of the participants who appear in the videos and recordings of interviews (see below).
  • There is a debate in the Social Sciences about the nature of ‘data’ itself: is data ‘out there’ waiting to be found (positivist assumptions), or is it ‘constructed’ in a ‘reflexive’ manner between researcher and participant? Also, can the context of a previous research study be transferred to the new study, or does the new researcher bring a new reflexivity to the data in question?
  • The data landscape in the Humanities and Social Sciences is complicated by the need to respect the anonymity of individual human subjects, who may be recognisable from raw data such as field notes. Can totally raw data be provided in the Social Sciences? (This may also apply to Science journals and patient data, though, as some patients may be recognisable from their symptoms.)

Literature Review – Articles Relevant to the Field

This bibliography of useful literature has been sitting in the draft section for some months, but as our study has now finished, and the feasibility study report is in the hands of Jisc, we are practising what we preach and passing on our information to others who may be interested in this area. I am sorry, but it is a rather long list and looks tedious and boring.

More data will follow in the next few weeks.


An early paper on journal policies.

McCain, K. (1995) Mandating sharing: journal policies in the natural sciences. Science Communication 16, 403-431.

Baseline paper on journal policies (and examples of the other work of Piwowar and Chapman on data sharing).

Piwowar, H. and Chapman, W. (2008)  A review of journal policies for sharing research data   In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 – Proceedings of the 12th International Conference on Electronic Publishing (ELPUB) June 25-27 2008, Toronto Canada. Available at

Piwowar, H. and Chapman, W. (2008) Identifying data sharing in biomedical literature. AMIA Annual Symposium Proceedings, 596-600. Available at

Piwowar, H. and Chapman, W. (2010) Public sharing of research datasets: a pilot study of associations. Journal of Informetrics 4(2) 148-156. Available at

Piwowar, H. and Chapman, W. (2010) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers. Journal of Biomedical Discovery and Collaboration 5, 7-20. Available at

Piwowar, H. (2010) Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE 6(7). Available at

Most recent work on best practice for scholarly publishing.

Schriger, D. et al (2006) The content of medical journal instructions for authors. Annals of Emergency Medicine 48(6), 742-749.

Looked at 166 journals and found contradictory policies and little guidance on methodological and statistical issues.

Smit, E. and Gruttemeier, H. (2011) Are scholarly publications ready for the data era? Suggestions for best practice guidelines and common standards for the integration of data and publications. New Review of Information Networking 16(1) 54-70.

Smit, E. (2011) Abelard and Heloise: why data and publications belong together. D-Lib Magazine 17(1-2). Available at

Recent broad explorations of the issues.

Schriger, D. et al (2006) From submission to publication: a retrospective review of the tables and figures in a cohort of randomised controlled trials submitted to the British Medical Journal. Annals of Emergency Medicine 48(6) 750-756.

Carpenter, T. (2009) Journal article supplementary materials: a Pandora’s box of issues needing best practices. Against the Grain 21(6) 84-85.

Neylon, C. (2009) Scientists lead the push for open data sharing. Research Information 41, 22-23.

Hodson, S. (2009) Data-sharing culture has changed. Research Information 45, p.12.

Fisher, J. and Fortmann, L. (2010) Governing the data commons: policy, practice and the advancement of science. Information and Management 47(4) 237-245.

Bizer, C., Heath, T. and Berners-Lee, T. ( ? ) Linked data – the story so far. International Journal on Semantic Web and Information Systems. Special Issue on Linked Data. Available at

Hrynaszkiewicz, I. (2011). The need and drive for open data in biomedical publishing. Serials 24(1) 31-37.

Bechhofer, S. et al (2011) Why linked data is not enough for scientists. Future Generation Computer Systems (forthcoming as of Aug 2011)

Kauppinen, T. and Espindola, G. (2011) Linked open science – communicating, sharing and evaluating data, methods and results for executable papers. Procedia Computer Science 4, 726-731.

LOS has four ‘silver bullets’: 1. publication of data using Linked Data principles; 2. open source and need-based environments; 3. cloud computing use; 4. Creative Commons.

Parsons, M. (2011) Expert Report on Data Policy – Open Access. Available at

Tenopir, C. et al (2011) Data sharing by scientists: practices and perceptions. PLoS ONE 6(6) e21101.

Borgman, C. (2012) The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6) 1059-1078.

Selected specific studies on aspects of data archiving and sharing.

Hrynaszkiewicz, I. and Altman, D. (2009). Towards agreement on best practice for publishing raw clinical trial data. Trials 10(17) 1-5. Available at

Groves, T. (2009) Managing UK research data for future use. BMJ 338 b1252. Available at

De Roure, D. et al. (2009) Towards open science: the myexperiment approach. Concurrency and Computation: Practice and Experience (submitted 2009). Available at

Elman, C., Kapiszewski, D. and Vinuela, L. (2010) Qualitative data archiving: rewards and challenges. PS: Political Science & Politics 43, 23-27. doi:10.1017/S104909651099077X

Moore, R. and Anderson, W. (2010) ASIS&T Research Data Access and Preservation Summit: conference summary. Bulletin of the American Society for Information Science and Technology 36(6) 42-45.

Pienta, A. et al (2010) The enduring value of social science research: the use and reuse of primary research data. In: The Organisation, Economics and Policy of Scientific Research Workshop, Torino, Italy, April 2010. Available at

Eschenfelder, K. and Johnson, A. (2011) The limits of sharing: controlled data collections. Proceedings of the American Society for Information Science & Technology 48(1) 1-10.

Neveol, A. et al (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27(23) 3306-3312.

Ingwersen, P. and Chavan, V. (2011) Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 12(S3).

Korjonen, M. (2012) Clinical trial information: developing an effective model of dissemination and a framework to improve transparency. UCL PhD thesis. Available at


Bailey, C. (2012) Research Data Curation Bibliography. Houston: Digital Scholarship. Available at

Approaches the question from a library/archive perspective.