JISCMRD Programme Progress Workshop / DCC Institutional Engagements Workshop, Wed 24-Thu 25 October, Nottingham

Last week we attended the JISCMRD Programme meeting here in Nottingham. This was an opportunity for projects on the JISC Managing Research Data Programme 2011-13, DCC Institutional Engagements and associate projects to meet up and discuss progress.

The event covered:

  • Institutional RDM policies; developing an institutional strategy and an ‘EPSRC’ roadmap.
  • Managing active data: storage, access, academic dropbox services.
  • Data management planning: developing good practice and providing effective support.
  • Data repositories and storage: options for repository service solutions.
  • Training & guidance.
  • Triage and handover: what to keep and where to entrust it?  Selection and appraisal; deposit and handover.
  • Business case: covering roles, responsibility, costing, sustainability, advocacy etc.
  • Data catalogues: metadata profiles, identifiers.

In addition to presentations, most projects also brought a poster detailing their progress. We will tell you all about our poster next week.

It was very interesting to hear from various projects around the UK about how they are implementing data management plans and data repositories at their institutions.

Of particular interest to the JoRD project was PRIME.

PRIME, along with PREPARDE and ourselves, is part of the JISC Managing Research Data: Innovative Research Data Publication strand.

PRIME (Publisher, repository and institutional metadata exchange) will be looking into automated ways for publishers, repositories and institutions to share metadata about datasets.

An example use case would be that a researcher submits a data paper to a metajournal, which in turn shares the metadata on the dataset with subject and institutional repositories.
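
To make that use case a little more concrete, here is a minimal sketch in Python. It is only an illustration of the flow described above, not PRIME's actual design: the class names, the three metadata fields and the placeholder identifier are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record describing a dataset."""
    title: str
    creator: str
    identifier: str  # placeholder value below, not a real identifier


@dataclass
class Repository:
    """Hypothetical repository that keeps every metadata record it receives."""
    name: str
    records: list = field(default_factory=list)

    def receive(self, metadata: DatasetMetadata) -> None:
        self.records.append(metadata)


@dataclass
class Metajournal:
    """Hypothetical metajournal that forwards metadata to registered repositories."""
    repositories: list = field(default_factory=list)

    def submit_data_paper(self, metadata: DatasetMetadata) -> int:
        # Share the dataset metadata with each registered repository.
        for repo in self.repositories:
            repo.receive(metadata)
        # Report how many repositories were notified.
        return len(self.repositories)


subject_repo = Repository("subject repository")
institutional_repo = Repository("institutional repository")
journal = Metajournal([subject_repo, institutional_repo])

shared = journal.submit_data_paper(
    DatasetMetadata("Example dataset", "A. Researcher", "doi:10.0000/example")
)
```

After the submission, both repositories hold the same record, so the researcher only has to describe the dataset once.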

Stakeholder Consultation – Researchers and Public Engagement



The Focus Group was carried out as one of the normal meetings of the Nottingham Café Scientifique et Culturel on the evening of Monday 8th October 2012. This society meets for the purposes of ‘public engagement’ with the latest ideas arising in science and culture. The audience mainly comprises academics, professionals, and students, and therefore has an interest in understanding research and associated matters. As the ‘general public’ they are also interested in how public money is spent on research, and what happens to the outputs of this research.

Prior to the Focus Group starting, the purpose of the Focus Group was explained to the participants. They were then asked to sign a consent form and given a sheet of suggested questions which they could refer to throughout the Focus Group. They were asked to provide either their experiences or opinions related to this area.

The following topics arose during the course of the Focus Group Meeting.


The focus group reported a variety of academic, practical (e.g. professional purposes such as obtaining community data to submit funding bids), and personal research projects (e.g. a database of the output of personal research interests in an academic field). The focus group participants seemed to have a data sharing mindset and overall felt that data should be shared.


People wondered where data should be submitted so that it did not get lost. This is important because the data is a public record, often produced at the public’s expense.

What is the best method of finding data?

Journals – People still publish via journals, people are used to this model, and it means that people then know where to look for research output.

Use of Google Scholar – Google Scholar can help with locating studies. But Google itself provides a search list which shows the items that are most frequently consulted, rather than necessarily showing those which are of better quality.

Institutional Repositories – However, these are not consistent from one organisation to another; they have different methods and the software can be configured differently. IRs may lead to searching Google instead.



At what point in the research process should the data be shared?

Should there be a choice about the timing of release?

Raw Data – Should the data be in its ‘raw’ state or should it be contextualised by the researcher first? The data in some of its early states may not be comprehensible or usable by others. In these states it could be liable to misuse. It may be better to release the data once it is determined that there are no errors in it which could lead to unreliable studies by other researchers.

Interpretation before release – If people are still processing the data, they may feel the need to interpret it before sharing it. They may thus wait until the PhD, or other report, is finished before going public with the actual data. To maximise the number of publications they can produce from it, people would not necessarily want to share their data before their publications are out.

The nature of the data – Whether shared data is useful may depend on what people want to use the data for, and on the nature of the data itself. Is this more of an issue for qualitative data, which is based on the interpretation of the researcher, than for quantitative data?

Relevance of the data – Should the data be released while it is still of interest? Old data may lose its relevance or appeal.

Peer Review – The data should be available during the review process to enable peer reviewers to check it. This could however be a time-consuming process. Not all reviewers may feel they have the time to check the data as well as the article to which it relates.


New outcomes – Other people may be able to produce fresh interpretations of the data to advance the subject. Different researchers may find patterns that other people have missed.

Preservation – data which is copied and updated by others is more likely to be preserved; it is also more likely to be checked and is thus more reliable.

Ensuring reliability – e.g. making pharmaceutical data open ensures that it is not ‘rubbish’ (see the arguments of Ben Goldacre).

Producing a sharing culture – everyone sharing their data means that people cannot ‘bury’ flawed research.

Collaboration/Comprehensivity – sharing a personal database of research means that other people would be able to contribute; one person cannot collect all the data necessary for the project. This would then lead to a comprehensive database.

Pooling data – sharing data would enable data to be pooled from different sources.


Confidentiality – Issues of confidentiality were raised which would make some data difficult to share.

Infrastructure – Lack of infrastructure in the researcher’s organisation may deter data sharing.

Preservation – Data formats: some are not straightforward; digital data may have been stored in formats that are no longer used (floppy discs for example); more reliable formats are needed; readers for obsolete data types may be required. What would assist with data preservation? (e.g. more reliable formats such as tablets of stone, the web).

Time – It is time-consuming to prepare data for sharing.

Value judgments – Who is qualified to make a judgment on what data should be preserved, as not everything can be preserved? Who should have the job of filtering other people’s minds? Will this lead to value judgments being made about some forms of data?

Knowledge is power – it is also access to future funding. People may be concerned about sharing data if it means that it is used by others in a way which prevents them from obtaining future funding to continue with the line of research.

Misuse – future analyses may be incorrect, or cherry-picking of the data may take place to aid a particular argument – and data which does not support the argument can be ignored.

Processed data – people may claim that the data has been fiddled with (processed in some unreliable way).

Lack of Knowledge of how to share data – Someone reported that they did not know how to share data but would like to be able to do this.

Information Overload – A data sharing culture may mean that eventually there is too much information out there to manage successfully.

New research – Research could become a process of analysing old datasets rather than producing new data. Science would then become a process of interpretation.

Different languages – This could be a barrier to collaboration and sharing.

Ownership disputes – There could be disputes between authors as to who owns the data.

Verification studies – Funders do not want to fund them; they are seen as low status and not worthwhile. Journals do not want to publish straightforward replication studies; they value newness, but newness does not mean that a study is necessarily worthwhile. Again there are value judgements being made here, but not by the researchers themselves. The researchers are at the joint mercy of funders and publishers.

New models of data sharing – The way data is shared changes frequently (e.g. CD v iTunes model); people have to keep up to date with the environment of sharing.

Financial models – Publishers need to make money; might this impede the process of data sharing? OA needs to find a way of being sustainable.


Data Citation – This ensures that all data re-use is cited so that the original researcher(s) get(s) the credit for the data they have produced.

How to incentivise? – Given that University promotion is based on new research and high impact journals, how can researchers be incentivised to share their data if they perceive that this may weaken their professional progress?

Peer review – Peer review of data could lead to public attributions of merit.


Free information – One attendee wanted to make their personal research available – but wanted the access to be free.

Researcher pays? – This seems like vanity publishing to one of the attendees.


Someone mentioned that they could not find statistical data to back up their research but anecdotal, qualitative research supported their assumption. If they had waited for the supporting figures it would have taken too long to set their community project in motion. This is why community groups are now commissioning research.

Overview of policy types from the Science journals in the sample

Policy Types – Science Publications

The following sections set out the different policy types identified from an analysis of the sample of Science publications.

Integral – Data/Materials/Software (Integral to your article)

Various policies talk about the data, materials and software etc that have been generated or used in the study, which would be integral to the study findings and necessary for subsequent study replication/verification purposes or to enable other researchers to build on the findings. These are illustrated below.

1. Data Release and Materials Release Policies


Cold Spring Harbor Laboratory Press: Genome Research (Top Science)

This is a clearly laid out and extensive ‘Life Science’ type policy denoting that it is a condition of publication of the journal that materials required to replicate the work must be made freely available – this principle needs to be agreed to on acceptance. Data should also be made as freely accessible as possible prior to publication. There are clear guidelines about the location of materials and a whole set of weblinks are given for the locations of the following types of material:

  • Sequence data
  • Genotype/Phenotype and genomic variation data
  • Microarray data
  • Proteomics and molecular interactions

Accession numbers must be included in the abstract. The policy says that if reasonable requests are not honoured then researchers should contact the Editor.

BioMed Central: Multidisciplinary Respiratory Medicine (Bottom Science)

In the Instructions for Authors, the following data types are listed in their policy under ‘Data and Materials release’; these are also fairly typical for Life Science disciplines:

  • Nucleotide Sequences
  • Protein Sequences
  • Mass spectrometry
  • Structures
  • Chemical structures and assays
  • Functional genomics data (such as microarray, RNA-seq or ChIP-seq data)
  • Computational Modelling
  • Plasmids

Each data type specifies the named databases for storing the data and gives weblinks for ease of access. External guidelines for the data, such as MIAME, are given where appropriate. These materials are classed as “readily reproducible” and are to be made “freely available to any scientist wishing to use them for non-commercial purposes”. This ‘life science’ type policy only seems unusual in its timescale for inclusion of the Accession Number, which must be supplied in time to be included in the published article (rather than, say, with the submitted manuscript).

Cell Press: A range of publications including e.g. Cell (Top Science)

Cell Press publications have a ‘Distribution of Materials and Data’ section which states that it is a term and condition of publishing for authors to be willing to distribute any materials (cells, DNA, antibodies, reagents, organisms, mouse strains, ES cells) and protocols. Structures should also have their relevant information lodged with named or appropriate databases. MIAME guidelines should be followed as appropriate. Authors should contribute additional data/materials to appropriate databases and repositories. Accession numbers are required.

The Royal Society of Chemistry: Chemical Society Reviews (Top Science)

RSC journals have very comprehensive guidelines for both single crystal and powder diffraction data. In the case of the former, authors should prepare their work in CIF (Crystallographic Information File) format. For single crystal work, structural information should be deposited with the Cambridge Crystallographic Data Centre (CCDC) and upon submission of the manuscript the CCDC reference numbers will be requested. Powder diffraction data may be submitted as a CIF file via the RSC submissions service.

Nature Publishing Group (includes all journals published by Nature which have Nature in the title)

NPG has very full guidelines for ‘Availability of data and materials’, ‘Sharing Materials’, and ‘Sharing data sets’. They refer to various named repositories and databases for many types of materials and data. Guidelines such as MIAME are noted. There is also a comprehensive Further Reading list which encompasses Nature Journal editorials on these topics. http://www.nature.com/authors/policies/availability.html

2. Data/Materials Sharing as the ‘Ethical Guidelines’ of the discipline

Several publishers/publications refer to ‘ethical guidelines’ which are part of the landscape of the discipline concerned. Professional conduct means that data/materials should be made available for appropriate researchers to allow for further analysis and review.

American Chemical Society: Chemical Reviews (Top Science)

The American Chemical Society publishes a number of journals and has created a set of “Ethical Guidelines to Publication of Chemical Research”. Authors wishing to publish in journals such as Chemical Reviews are expected to follow these ethical guidelines. Part of the ‘Ethical Obligations of Authors’ states that “When requested, the authors should make every reasonable effort to provide data, methods, and samples of unusual materials … to other researchers” and “Authors are encouraged to submit their data to a public database, where available”.

American Physical Society: Reviews of Modern Physics (Top Science)

See under: Ethics and Values (Guidelines for Professional Conduct) – Research Results:

“The results of research should be recorded and maintained in a form that allows analysis and review”

Elsevier: (e.g. Progress in Polymer Science – Top Science)

Ethics in Research Publication – Data access and retention:

“Authors may be asked to provide the raw data in connection with a paper for editorial review, and should be prepared to provide public access to such data (consistent with the ALPSP-STM Statement on Data and Databases), if practicable, and should in any event be prepared to retain such data for a reasonable time after publication.”

3. Data Sharing – not necessarily mandatory

BMJ Group – British Medical Journal (Top Science)

Authors are ‘encouraged’ to link their articles to external databases (no hosting to be done by BMJ) and then include a ‘data sharing statement’ at the end of the manuscript. This statement should state if data sharing is available or not, and if it is, where to obtain the information. BMJ is also interested in the informed consent of the research participants and reference to this should also be made in the statement.

Data sharing is thus not mandatory to the journal, but the journal recognises that it could be mandatory according to certain funders etc.

4. Database Linking – connecting with external databases

Elsevier – Current Opinion in Cell Biology (Top Science)

In the Author Information Pack, Elsevier draws attention to linking to external databases that help to build a better understanding of the described research:

“Elsevier encourages authors to connect articles with external databases”.

This is very vague, and is rather more ‘encouraging’ than ‘mandatory’.

Nature Reviews – e.g. Neuroscience, Molecular Cell Biology (Top Science)

Nature Reviews are journals which publish reviews of existing data in different fields in any case – “Proteins, protein domains, genes and diseases are linked to specific pages in relevant and high-quality public databases”.

5. Links to materials on an author’s institutional website

American Physiological Society: Physiological Reviews (Top Science)

This journal permits one of the authors to provide a working URL from their institutional website (links to additional datasets and/or detailed methods and protocols) which is to be given in an Endnote in the manuscript – under the proviso that it is recognised that this material is not peer-reviewed and may be updated from time to time. It is for readers seeking to replicate or expand on the work.

Supplementary Materials

Supplementary materials are frequently of the request/suggest type and are lodged with the journal. They are mainly of the ‘enhancing your article’ type and often include multimedia. There are, however, exceptions to this general idea of article ‘enhancement’, as some of the supplementary materials could actually be classed as ‘integral’ to the article’s findings.

1. Request/Suggest type – and happy to accept it – usually submitted with the manuscript – published with the journal

These policies are essentially similar to those which are prevalent in the Social Sciences, especially where the publisher is the same (e.g. Springer Publications such as Proceedings of the National Academy of Sciences, India Section B: Biological Sciences – Bottom Science).

Cell Press (See for example Immunity – Top Science)

They see ‘supplemental information’ as a useful resource, but recognise that it needs to be managed by structure and limits. The material is considered to be “additional or secondary support for the main conclusions” (thus implying not of the integral type). They require information to be submitted according to three headings: 1. Supplemental Data, 2. Supplemental Experimental Procedures, 3. Supplemental References. They give file formats and sizes. Alongside this, Immunity also has a Distribution of Materials and Data policy. Immunity is one of the journals to have more than one data policy.

Annual Reviews (See for example Astronomy and Astrophysics – Top Science)

A comprehensive ‘Supplemental Materials Policy’. Preparation guidelines are provided, along with acceptable and unacceptable file types. This material is to be “supportive but not primary”.

Nature publications (e.g. Cell Biology – Top Science)

Supplementary information is not copy-edited, and modifications after publication require a formal correction. The guidelines for supplementary information must be followed or publication may be delayed, and each piece of supplementary material must be referred to at least once in the text of the main article. There is a comprehensive set of guidelines for SI.

The Lancet (e.g. Infectious Diseases, Neurology – Top Science)

Unlike other publications which refer to ‘supplementary’ or ‘supplemental’ material, publications by The Lancet tend to refer to ‘Guidelines for web extra material’. However, these cover fairly standard things such as text, tables, data, drug names, references, figures, and audio/video material. It is preferred that this material is submitted as one PDF with the paper, and it will be peer-reviewed.

The American Astronomical Society: Astrophysical Journal Supplement Series

The AAS have a policy on machine readable tables (MRT) whereby lengthy tables should be moved to MRT format. There are full guidelines about this.

2. Hosting this material is new to us

The Canadian Field Naturalist (Bottom Science) – “Supplementary Material”

“Supplementary material is a new feature for CFN so we do not know which file formats can and cannot be accepted; please consult our journal Manager with any question about specific formats”

This journal is just starting out on the process and has yet to clarify its procedures.

3. Supporting Information – but ‘essential’ or ‘central’ for understanding the main points of the article – with journal

Wiley Online Library: Angewandte Chemie International Edition (Top Science)

From the ‘Supporting Information’ section: although the information is classed under the heading of ‘supporting’, it is actually deemed ‘essential’ to understanding the article, rather than just enhancing it, and includes “experimental procedures, spectroscopic data, graphics etc”. There is a blurring here of supporting material with integral material. This is interesting because the same journal also has a policy on Crystal Structure Analysis, under which crystallographic data should not be sent as Supporting Information but should be lodged with the named Data Centres, with deposition numbers supplied with the manuscript.

American Society of Clinical Oncology: Journal of Clinical Oncology (Top Science)

This journal “requires that large data sets central to the premise of a manuscript be submitted along with the original work as a supplemental file”. It does also state that data which can be submitted to a public database should be deposited and accession numbers provided.

4. Supplementary Information – which should not be the sole evidence for the article

Wiley Online Library: Ecology Letters (Top Science)

From the ‘Online Supplementary Information’ section – the journal clearly states that “the material published on the internet cannot be used as sole evidence for the print version of the article”. This implies that more integral data – the evidence base for the findings – should also be available elsewhere.

5. Supplemental Materials – only at the Editor’s discretion

Wiley Online Library: CA: A Cancer Journal for Clinicians (Top Science)

This journal states that Supplemental Materials presented as Appendices are not permitted and should be placed within the manuscript or eliminated. Supplemental materials are published at the Editor’s discretion. This journal is not really encouraging concerning the use of supplementary materials.

American Society for Microbiology: Microbiology and Molecular Biology Reviews

The Supplemental Material section states “Please avoid supplemental material”. It is an Editorial decision if any is to be published.

6. Supplementary Information – carefully controlling the volume of SI

Nature: Neuroscience, Immunology (Top Science)

With respect to these journals, the publisher notes that SI is proliferating and can be unwieldy, and states: “we have therefore decided to carefully control the volume of Supplementary Information”.

New Data – Which is the Actual Publication itself


 IngenieraQuimica – Chemical Engineering (Bottom Social Science):

Under their ‘Write for the Site’ section they include:

“Post, articles, images related to chemical engineering, software or spreadsheets that you have prepared.” Here the data to be shared becomes the article.

Overview of policy types from the Social Science journals in the sample

Policy Types – Social Sciences

Integral – Data/Materials/Software (Integral to your article)

Like the Sciences, some Social Science publications also have policies for integral data.

1. Integral data – but weak policy

This type of policy ought to be strong: compliance should be monitored, and indeed can be monitored. However, it is worded in quite weak terms, implying that authors can comply if they want to.


a) Elsevier – Schizophrenia Research (Top Social Science)

This policy refers to DNA sequences and GenBank Accession numbers – which in the case of strong policies are used to monitor that the data has been deposited.

However, the policy says “Many Elsevier journals cite “gene accession numbers”” and “Elsevier authors wishing to enable other scientists to use the accession numbers….” – which are not statements indicating that the data must be deposited as a requirement of publication.

b) The Royal College of Psychiatrists – The British Journal of Psychiatry (Top Social Science)

Under ‘Access to Data’ their policy states:

“If the study includes original data, at least one author must confirm that he or she had full access to all the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analysis. We strongly encourage authors to make their source data publicly available.”

This is the entirety of what it states and whilst it appears to be strong there is no indication of any monitoring or recommendations as to how the data should be made accessible. There is no recommendation here as to where data could be stored.

2. Integral data – the journal refers you to external Ethical Guidelines

a) Sage – Personality and Social Psychology Review (Top Social Science).

The Submission Guidelines refer you to the ethical guidelines of the American Psychological Association – these ethical guidelines contain the following statement:

8.14 Sharing Research Data for Verification
(a) After research results are published, psychologists do not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose, provided that the confidentiality of the participants can be protected and unless legal rights concerning proprietary data preclude their release. This does not preclude psychologists from requiring that such individuals or groups be responsible for costs associated with the provision of such information. http://www.apa.org/ethics/code/index.aspx?item=11

Here the journal policy refers you to data sharing policies that are part of the ethical landscape of the discipline.

This also obviously refers to any publication that is actually published by the American Psychological Association – e.g. Psychological Methods (Top Social Science)

b) Sage – American Sociological Review (Published in association with the American Sociological Association – Top Social Science)

Under the Manuscript Submission procedures, this journal refers you to the ethical guidelines of the American Sociological Association – these ethical guidelines contain the following statement:

“Sociologists make their data available after completion of a project or its major publications, except where proprietary agreements with employers, contractors, or clients preclude such accessibility or when it is impossible to share data and protect the confidentiality of the data or the anonymity of research participants (e.g. raw field notes or detailed information from ethnographic interviews)”

3. Integral Data – but refers to the analysis of pre-existing datasets

a) Physicians Postgraduate Press – The Journal of Clinical Psychiatry (official journal of the American Society for Clinical Psychopharmacology)

See under: ‘Analyses of Preexisting Datasets’ – Here the author is not necessarily the creator of the original dataset but is required to provide details about how the dataset in question can be accessed.

Supplementary Materials (Enhancing your article)

1. Request/Suggest type – and happy to accept it – submitted with the manuscript – published with the journal


a) Taylor and Francis publications – “Adding multimedia and supplementary content to your article” (generic to the publications of the publisher)

  • Reviewed in connection with Journal of Spanish Cultural Studies and Asia Pacific Journal of Social Work and Development (Bottom Social Science Journals)

This policy makes a range of suggestions about what types of material would enhance the article and is happy to accept the material to be published with the journal.

This policy refers to Animations, Movie Files, Sound files, Text files, and Supplementary Material (material that is pertinent to and supports the article).

A range of file formats, file sizes and other instructions are provided. The material must be submitted with the manuscript.

The policy is weak. The material is not a ‘requirement’ of publication.

Both Elsevier (Video Data and Supplementary Data) and the American Psychological Association Publications (Multimedia Files) have a similar generic policy on data which enhances articles.

b) Springer Publications – “Electronic Supplementary Material” (generic to publisher)

  • Reviewed in connection with Asia Europe Journal (Bottom Social Science Journal)

This generic policy similarly refers to Audio, Video, and Animations. But it does also make mention of more specialised formats such as .pdb (chemical), .wrl (VRML) and .tex.

This policy also refers to the “Accessibility” of the provided content (related to catering for disabilities etc).

c) Springer Publications: Studies in East European Thought – “Electronic Supplementary Material”

This also makes mention of large original data such as additional tables.

Wiley Blackwell also have a “Supporting Information” type policy which contains Multimedia elements but also refers to “native datasets and specialist software” (possibly moving into the integral data arena as in the section below).

2. Request/Suggest type – and happy to accept it – but you can also link to an external database or repository (but not your own website)

a) Maney Publishing: London Journal – “Supplementary Material” (Bottom Social Science)

Formats and instructions are given.

3. ‘Supplemental Type’ Materials – which should really be described as ‘Integral’ Materials

a) Project HOPE – Health Affairs – “Supplemental Materials” (Top Social Science)

Some of the materials are probably described as supplemental (and thus supplementary to the article itself) because they will be deposited with the journal. However, the material they refer to is not of the multimedia type (which would enhance the article) but concerns “supplying information that is necessary to evaluate the credibility of their work” and probably should therefore be described as ‘integral’. There is a definition issue at work here.

The journal is particularly keen on the full details of any regressions which have been used.

Other ‘supplemental materials’ “may” be submitted and are therefore properly of the ‘request/suggest’ type.

b) Lippincott Williams & Wilkins – Epidemiology – “Online Supplemental Material” (Top Social Science)

Underneath the section on ‘Online Supplemental Material’ the journal makes reference to Questionnaires – which should also be provided as online supplemental material. As these are foundational to the actual dataset, they are properly classed as integral materials. This is one of the few mentions of a questionnaire in a data/materials policy. Questionnaires are emphasised separately here as they are very frequent research tools in the Social Sciences but they are only mentioned once in the policies under review in the JoRD project.


  • Mention of appropriate Databases and Repositories for Social Sciences is not much in evidence in the policies unless the journal discipline is more scientifically orientated. This raises the question of where Social Science related materials would be stored if policies were to be made STRONG in the Social Sciences. What are the qualitative data databases that need to be referred to? What would the equivalent of an Accession number be?
  • Not much mention is made of specifically Social Science types of data – e.g. Transcripts of Interviews, Focus Groups, Questions and Questionnaires – although some of the data may be implicit in Multimedia policies (e.g. videos of scenarios under investigation in the article). The deposit of such data is complicated by the need to gain the permission of participants who appear in the videos and recordings of interviews (see the point on anonymity below).
  • There is a debate in the Social Sciences about the nature of ‘data’ itself: whether data is ‘out there’ waiting to be found (positivist assumptions), or ‘constructed’ in a ‘reflexive’ manner between researcher and participant. Also, can the context of a previous research study be transferred to the new study, or does the new researcher bring a new reflexivity to the data in question?
  • The data landscape in the Humanities and Social Sciences is complicated by the need to respect the anonymity of individual human subjects, who may be recognisable from raw data such as field notes. Can totally raw data be provided in the Social Sciences? (This may also apply to Science journals and patient data, as some patients may be recognisable from their symptoms.)

National Archive of Computerized Data on Aging (NACDA)

Browsing data archives generally, the following was found amongst the pages of NACDA concerning why re-used data should be cited:

Citing data files in publications based on those data is important for several reasons:

  • Other researchers may want to replicate findings and need the citation to identify/locate the data.
  • Citations are harvested by key social sciences indexes, such as Web of Science, providing credit to the researchers.
  • Data producers and funding agencies can track citations to measure impact.


These statements demonstrate the incentivisation process for people to share their data and make it available for re-use. Benefits accrue back to the original researcher(s) for having shared their data, and the discipline itself also becomes more impactful.



News from America, “U-M, Sloan Foundation to enhance open access to research data”

“Professional associations, journals, data repositories and funding agencies must work together to make the entire scientific venture more transparent and to encourage broader access to research data,” said ICPSR Director George Alter. “The first step is to give scientists who produce important research data the recognition they deserve.”

U-M, Sloan Foundation to enhance open access to research data (http://www.ur.umich.edu/update/archives/121002/sloan)

The University of Michigan’s Inter-university Consortium for Political and Social Research (ICPSR) and the Alfred P. Sloan Foundation are working together to promote open access to research data and improve the link between published works and the background data.

In particular, the ICPSR will be working with stakeholders within the social sciences to improve:

  • Data citation
  • Transparency of research
  • Collaboration across scientific fields to study sustainable funding models for data repositories

Our survey work with JoRD has indicated that Social Sciences journals are behind Science journals in having policies on data sharing and archiving. This project has the potential to address this imbalance.

Some differences between the Sciences and the Social Sciences

Humanities and Social Sciences – what’s different compared to Science, Technology and Medicine (STM)?


The following summarises the key points of comparison from an article in Research Information:

  • Attitude to information? – one fifth of researchers in the life sciences and physical sciences rated print versions of current journal issues as useful for their research. In Arts and Humanities the figure was three fifths.
  • Funding of the research sectors? – Unlike STM, much research in the humanities and social sciences is produced by individual researchers without the support of a specific project grant, so there is no grant funding to cover publication costs. There is more funding in STM.
  • Journal prices – usually higher in STM fields than in the Humanities or Social Sciences.
  • Type of publication? – Humanities researchers generally value books rather than journals.
  • Where publishing? – STM’s main conduit for research dissemination is the academic journal. For Humanities and Social Sciences it is more of a mixed model.
  • What’s being written? – Humanities and Social Sciences researchers tend to write long-form publications because their thoughts need more space. They value the extended argument.
  • Time sensitivity? – publication of Humanities research is often less time sensitive (e.g. you haven’t cured a disease for which people need to know the results quickly)

(from Pool, R. Open to debate – Information access in the Social Sciences and Humanities. Research Information. April/May 2010. Issue 47, pp.12-14)

Report from The Royal Society – Science as an Open Enterprise

Key Points of Relevance to the JoRD Project from a report by The Royal Society:

Science as an open enterprise (June 2012)

The full report can be found at the following location:


Areas for action

Six key areas for action are highlighted in the report:

  • Scientists need to be more open among themselves and with the public and media
  • Greater recognition needs to be given to the value of data gathering, analysis and communication
  • Common standards for sharing information are required to make it widely usable
  • Publishing data in a reusable form to support findings must be mandatory
  • More experts in managing and supporting the use of digital data are required
  • New software tools need to be developed to analyse the growing amount of data being gathered

Data analysis

The report gives the highlights of the results of the following study:

Public availability of published research data in high-impact journals

Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JP.

PLoS One. 2011;6(9):e24357. Epub 2011 Sep 7.

Returning to the original article the following is found:



There is increasing interest to make primary data from published research publicly available. We aimed to assess the current status of making research data available in highly-cited journals across the scientific literature.


We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.


A substantial proportion of original research papers published in high-impact journals are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals. Journals should adopt more routinely policies for data sharing, expanding the types of data that are subject to public sharing policies with the ultimate target of covering all types of data. Moreover, it is essential to develop mechanisms for journals to ensure that existing data availability policies are consistently followed by researchers and published research findings are easily reproducible.
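
The percentages quoted in the abstract above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the abstract, while the variable names and the `pct` helper are our own and do not come from the paper.

```python
# Counts copied from the abstract of Alsheikh-Ali et al. (2011), quoted above.
journals_total = 50
journals_with_statement = 44
papers_total = 500
papers_no_policy = 149
papers_non_adherent = 208
papers_full_raw_data = 47


def pct(part, whole):
    """Return part/whole as a percentage, rounded to the nearest whole number."""
    return round(100 * part / whole)


# Derived counts: papers covered by some policy, and those that adhered to it.
papers_with_policy = papers_total - papers_no_policy
papers_adherent = papers_with_policy - papers_non_adherent

summary = {
    "journals_with_statement_pct": pct(journals_with_statement, journals_total),
    "papers_no_policy_pct": pct(papers_no_policy, papers_total),
    "papers_with_policy": papers_with_policy,
    "non_adherent_pct_of_covered": pct(papers_non_adherent, papers_with_policy),
    "papers_adherent": papers_adherent,
    "full_raw_data_pct": pct(papers_full_raw_data, papers_total),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the 59% non-adherence figure is a share of the 351 papers covered by a policy, whereas the 9% figure for full raw data is a share of all 500 papers, so the two percentages are not directly comparable.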