The end

Sadly, this will be the last JoRD project post. A lot has happened since the start of this project. The active part of the project finished last February, but ripples of activity have continued. There have been more presentations about JoRD at conferences in Hamburg, Turkey and Helsinki (SWIB13 Hamburg 27th Nov 2013: Journal research data sharing policies: what they tell us about linked data potential, oral presentation only; IATUL, Helsinki 2nd June 2014. Access to research data: addressing the problem through data sharing policies, paper; IMCW Antalya Turkey, 24-26 Nov. Abstract only). As well as this activity, we have been quietly working away trying to get a paper published about Journal Research Data policies, including a model policy that a journal may choose to adopt. Happily, the paper has been accepted by JASIST (http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2330-1643), and will be published in due course. Meanwhile, you can read the peer-reviewed version on Nottingham E-Prints (http://eprints.nottingham.ac.uk/3185/).

Researchers have approached us wanting to know more about the project, and the fact that it has not passed into oblivion suggests that there was a real need for the research and a desire for the product that would result from it.

Here is a brief summary of the project: The Journal Research Data (JoRD) project was formed to test the feasibility of developing a service that would collate and summarise the policies of academic journals about data which is associated with the articles that they publish. In order to complete that task:

  • A large sample of journals were examined and their data policies analysed
  • All the stakeholders concerned were consulted
  • Literature about journal data policies was reviewed
  • Business cases were explored
  • All the gathered data was analysed and recommendations were made about the prospective service

The main outcome of the project was a feasibility study report which was presented to Jisc, but it also generated presentations and posters at six international conferences. Promoting the project in that way has contributed to its impact: the blog is still being viewed and occasional requests for more information are still being received at the CRC. The issue of data management is becoming increasingly important, and the objectivity of the research, which took the views of publishers, researchers and academic library staff into consideration, gave the project value and validity. The project was a success because its thorough evaluation, based on first-hand evidence, resulted in a workable recommendation for a trial JoRD service. Attached to this blog are the documents, or extracts from the documents, which show the evidence collected and the analysis that took place.

Going back to basics – reusing data

It is almost a year since the first set of data was gathered to analyse journal articles, and now the benefits of saving data well are becoming apparent. Two things are happening that mean we are getting the basic figures out, dusting them off and looking at them again. The first is a paper about the development of a model journal research data policy, which is being co-authored by the JoRD team members, and the second is a response to certain questions that various people are asking.

The idea of creating a model policy emerged from the mass of data that was being found in the analytical process, and it was based on what journals were already doing, and on suggestions from the report “Sharing Publication-related Data and Materials: Responsibilities of Authorship in the Life Sciences” (Committee on Responsibilities of Authorship in the Biological Sciences, 2003, http://www.nap.edu/openbook.php?isbn=0309088593). The report was the outcome of a workshop in the United States which involved biological scientists. The five principles and ten recommendations stated in the report were strongly in favour of open access to the data that underpins the research reported in published articles. A summary of the principles and recommendations can be found here: http://www.councilscienceeditors.org/files/scienceeditor/v26n6p192-193.pdf. The report suggested that the data could either be included in the article, or deposited in a reputable repository and linked to the article. The first model data policy was therefore based on the rather patchy and inconsistent set of policies that were found, from fewer than half the journals we analysed, and on a report which was biased towards one scientific discipline. It was decided to compare the initial model data policy with the needs of the stakeholders, which were examined at a later stage in the JoRD project. This has entailed not only going over the data gathered from the stakeholder interviews and questionnaire, but also digging retrospectively into the reasons why the initial model criteria were chosen.

The second reason for examining the basic data has come from interesting questions asked by a number of bodies that know about the JoRD project, and therefore assume that the JoRD team are experts in the field of Journal research data policies, an assumption that is becoming increasingly true as more questions are answered. In order for the questions to be answered, the data needed to be looked at from a different perspective. For example, to answer “How many journals make sharing a requirement of publication?” the original data set was re-examined and journals counted, because the original analysis was looking at the number of policies, some journals having up to three different data policies. Here follows a table with figures from a journal perspective:

Results of Journal Survey

Total no. of journals surveyed: 371
Total no. of journals with data sharing policies: 162
Total no. of journals that make sharing a requirement of publication: 31
Total no. of journals that enforce the policies: 27
Total no. of journals that state consequences for non-compliance: 7
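
The recount described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual procedure: the policy records, the journal names and the helper names `journals_requiring_sharing` and `share_of_surveyed` are invented for the example, and only the totals in `SURVEY_COUNTS` are taken from the table.

```python
from collections import defaultdict

# Hypothetical policy-level records (journal, policy strength). The names and
# values are invented for illustration; a journal may have several policies.
policy_records = [
    ("Journal A", "strong"),
    ("Journal A", "weak"),
    ("Journal A", "weak"),
    ("Journal B", "weak"),
    ("Journal C", "strong"),
]


def journals_requiring_sharing(records):
    """Count each journal once if at least one of its policies is strong."""
    strengths_by_journal = defaultdict(set)
    for journal, strength in records:
        strengths_by_journal[journal].add(strength)
    return sum(1 for s in strengths_by_journal.values() if "strong" in s)


# Journal-level totals copied from the table above.
SURVEY_COUNTS = {
    "surveyed": 371,
    "with data sharing policies": 162,
    "sharing required for publication": 31,
    "policies enforced": 27,
    "consequences stated": 7,
}


def share_of_surveyed(count, total=SURVEY_COUNTS["surveyed"]):
    """Return a count as a percentage of all surveyed journals (one decimal)."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    print(journals_requiring_sharing(policy_records), "example journals require sharing")
    for label, count in SURVEY_COUNTS.items():
        print(f"{label}: {count} ({share_of_surveyed(count)}%)")
```

On these figures, about 44% of the surveyed journals had a data sharing policy and about 8% made sharing a requirement of publication.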

This process is an illustration of the way that well organised data, saved safely and, as in this case, in digital form, can be re-used after a particular project has ended. Surely it is generally after research has been concluded that questions arise and the iterative process of dipping in and out of data to validate or extend the research then begins. The moral of this blog post? Manage your data well because you never know what you will be asked.

Another week, another presentation

Early this morning, well before normal work time, the dedicated Centre for Research Communication employees, Marianne and Jane, entered the special media communication room which contains the video conferencing equipment so that they could jointly present “Publisher Interest towards a role for Journals in Data Sharing: The Findings of the JoRD Project”. In the true spirit of global access and the digital world, they presented in Nottingham, UK and the presentation was seen at the ELPUB conference in Karlskrona, Sweden. We are pleased to report that the Nottingham technology worked really well, but a fellow presenter, also speaking through Adobe Connect, had difficulties with her connection and transmitted the sound of a large aircraft which was passing over the room where she was speaking. Jane and Marianne had chosen the high-tech route because a tram line and bridge are currently being noisily constructed outside their office window, and had they decided to present from their computer, there would have been the sound of heavy machinery moving, beeps and rumbles, drilling and clangs.

Here is the link for the PowerPoint slides:

JoRDELPUB

Prezis from the presentations

Last week was busy for the JoRD team. Jane did the presentation for ANDS, and Marianne appeared twice at Oxford, once to present a brief summary of the JoRD project to the Jisc organised “Now and Future of Data Publishing” event, and later in the week, to give a selection of the project findings to the Dryad Members meeting. The links to both the Oxford presentations follow, with a text summary.

The JoRD Project and its implications for repositories

http://prezi.com/ytir00evayoj/the-jord-project-and-implications-for-repositories/?kw=view-ytir00evayoj&rc=ref-5597897

JoRD and the implications for data sharing and repositories

1. The project was Jisc funded to explore the possibility of setting up a self-sustaining database and service to collate and summarise academic journal policies on the deposition of data associated with published articles
2. Current belief that openly accessible research data is a good thing because it drives science forward
3. Aims Jisc funded project to look at the possibility of setting up a central resource of journal instructions to authors about sharing the data on which articles are based
4. Objectives
• Investigate current state of Journal data policies
• Investigate current data sharing views and habits
5. Landscape of data sharing There has always been data published in printed journals in the form of charts and tables
6. But digital data becomes a problem, where should it be stored? In a repository? On a website? Embedded into articles?
7. This is a journal data policy, it is an instruction to authors of where to share or deposit research data that is relevant to a published article
8. We initially analysed 230 research data policies and found many inconsistencies and a lack of standardisation
9. Some journals were vague about the form of data to be deposited, others were more precise
10. Some journals were specific about where the data should be deposited, most were less so.
11. Go back to the policy and explain
12. We spoke to these stakeholder groups and we found a number of dichotomies
13. Taking researchers first, they said that they would be happy to share their data (with certain caveats, which I will not go into here). These were the reasons they gave for sharing data
14. However, when we asked how much they shared and where, most of them only shared with colleagues. Only a small number mentioned that they put their data into repositories
15. We asked them why that was, and their replies ranged over: no time, don’t know where, difficulty of accessing institutional repositories, and that current research models do not value and encourage data sharing (A PhD researcher stated that he felt that if he shared his data during the course of the research, he might be “gazumped”, meaning that should someone publish research on his chosen topic, the thesis would no longer be unique and therefore the doctoral thesis would no longer be credited)
16. The publishers also showed a dichotomy: although they appreciated the benefits of sharing data, they felt that their servers would have difficulty holding the quantity of data included in each article and that repositories were the right place. However, there was some discussion about the long-term availability of repositories. They have not yet been proven, but the publishing houses have been around for a long time
17. Worries about links, etc
18. Academic librarians and Repository managers, no conflicting concerns, practicality
19. Data sharing landscape is a mess
20. How could a JoRD service improve the infrastructure?
• Develop a model data policy framework, which takes into account the concerns of all the stakeholders
21. Improved policies save the time of publishers and authors, more consistent
22. Address the fears of IP, data citation etc, eliminating dichotomies, improving the infrastructures, creates order
• Implications for repositories, authors know where data can be deposited to be shared and re-used, more will do so.

The JoRD Project: Now and Future

http://prezi.com/ork2eo_6lb7x/the-jord-project-now-and-the-future/?kw=view-ork2eo_6lb7x&rc=ref-5597897

The JoRD Project: now and future

1. JISC funded feasibility study central resource of research Journal data policies
2. Looked at what the service should include and whether it could pay for itself
3. And 4 Tried to answer two questions
• Can Journal data policies encourage deposition of data?
• Will a JoRD service help publicly funded data to be shared and re-used?
5. Why bother? When an author publishes she is trading her intellectual property with a publisher, as part of a transaction and there are certain obligations on both sides, this can include data linked to the article. Author needs to know and understand what to do with it (reading the small print)
6. Needed to find out three things
• Understand current journal data policies
• Would anyone bother to use the service
• Could it generate sufficient income for development, building and maintenance?
7. We analysed some journal data policies in depth
8. Looked at 371 journals,
9. What was in the policies? Main areas were data type, when to deposit, and where
10. Little requirement for open access or compliance or consequences for non compliance
11. That does not provide an argument that journal data policies will help open data sharing
12. And 18 But there are signs that the situation is changing
• More publishers are considering data policies
• Elsevier Journal of the future
• Rise of data journals
• Apparent upward trend of journals with data policies
19. If there were a JoRD Service, would anyone use it?
20. All the stakeholders said that they would
21. For a variety of reasons BUT
22. They all wanted different things…
23. …apart from these, difficult to build one service
24. And will anyone pay for it?
25. Resounding no, except from publishers if the service was all singing and dancing
26. So, how does a JoRD service stand?
27. Now, with few policies stipulating deposit of data and stakeholders not financially contributing,
28. BUT… Let’s think of the future? The landscape is changing
29. Funders are asking for data plans to be included in funding bids
30. Universities are installing data management systems
31. Increase of data journals
32. And expectation that data should be included in articles
33. We have an opportunity to build a high quality database of existing journal data policies, which can be added to and maintained to a high level with a simple user interface. Establish a user base and develop a sustainable business model which can be implemented at a later stage.
34. JoRD is the future, and we should build it now when the quantity of data is smaller and the cost will be lower
35. Before the data deluge comes

Data Citation: Data, Journals and Academic Publishers Webinar on YouTube

Early in the morning last Tuesday, I got up to give a presentation for the Australian National Data Service (ANDS) as part of their Data Citation: Data, Journals and Academic Publishers webinar.

The full webinar, including a talk from Dr. Fiona Murphy of PREPARDE, can be seen and heard here.

Jane

Some news after a long silence

The JoRD team have been distracted by other projects recently, while the feasibility study report was being read, digested and commented upon. After some useful suggestions by Simon Hodson of Jisc (http://www.jisc.ac.uk/contactus/staff/simonhodson.aspx) and Andrew Treloar of ANDS (Australian National Data Service, http://www.ands.org.au/contact.html) the report is now revised and ready to be submitted. While the report was being revised, the team have been working hard to disseminate the findings from the project by sending off abstracts to a number of conferences and accepting invitations for presentations. The team will be very active over the next three months and one or other team member will be speaking in the following places at the following times:

ANDS Data Citation webinar:

Tuesday May 21 4pm-5pm Eastern time (7am-8am British Summer Time)

to reserve a webinar seat go to https://www4.gotomeeting.com/register/517778383

Now and Future of Data Publishing, a Symposium:

Wednesday May 22nd,  St Anne’s College, Oxford

More information can be found at http://researchdata.jiscinvolve.org/wp/2013/02/19/the-now-and-future-of-data-publishing-a-symposium-22-may-2013-oxford-uk/

Dryad Members’ Meeting:

Friday May 24th, St Anne’s College, Oxford

The schedule for this event can be seen at http://datadryad.org/pages/membershipMeeting

ELPUB 13:

Friday June 14th

Two of us will be presenting remotely from Nottingham to the conference in Sweden

http://www.bth.se/com/elpub2013.nsf/pages/start

OA18:

Wednesday 19th to Friday 21st June,  University of Geneva

At least one team member will be attending and available to talk and answer questions about JoRD

http://indico.cern.ch/conferenceDisplay.py?ovw=True&confId=211600

LIBER 2013:

Wednesday 26th June to Saturday 29th June, Munich, Germany

The programme can be found here http://www.liber2013.de/index.php?id=55

OR13:

Monday 8th to Friday 12th July, Charlottetown, Prince Edward Island

There will be a poster and 24/7 presentation

Information about the conference can be found at http://or2013.net/

As you can see, it is a hectic schedule, and the team will be frantically writing presentations for the next few weeks.

A rather long post, but quite a brief summary

Here is a summary of the project so far.

Sharing the data which is generated by research projects is increasingly being recognised as an academic priority by funders, researchers and publishers. The issue of the policies on sharing set out by academic journals has been raised by scientific organisations, such as the US National Academy of Sciences, which urges journals to make clear statements of their sharing policies. On the other hand, the publishing community expresses concerns over the intellectual property implications of archiving shared data, whilst broadly supporting the principle of open and accessible research data.

The JoRD Project was a feasibility study on the possible shape of a central service on journal research data policies, funded by the UK JISC under its Managing Research Data Programme. It was carried out by the Centre for Research Communications at Nottingham University (UK) with contributions from the Research Information Network and Mark Ware Consulting Ltd. The project used a mix of methods to examine the scope and form of a sustainable, international service that would collate and summarise journal policies on research data for the use of researchers, managers of research data and other stakeholders. The purpose of the service would be to provide a ready reference source of easily accessible, standardised, accurate and clear guidance and information on the journal policy landscape relating to research data. The specific objectives of the study were: to identify the current state of journal data sharing policies; to investigate the views and practices of stakeholders; to develop an overall view of stakeholder requirements and possible service specifications; to explore the market base for a JoRD Policy Bank Service; and to investigate and recommend sustainable business models for the development of a JoRD Policy Bank Service.

A review of relevant literature showed evidence that scientific institutions are attempting to draw attention to the importance of journal data policies and a sense that the scientific community in general is in favour of the concept of data sharing.  At the same time it seems to be the case that more needs to be done to convince the publishing world of the need for greater consistency in data policy and author guidelines, particularly on vital questions such as when and where authors should deposit data for sharing.

The study of journal policies which currently exist found that a large percentage of journals do not have a policy on data sharing, and that there are great inconsistencies between journal data sharing policies. Whilst some journals offered little guidance to authors, others stipulated specific compliance mechanisms. A valuable distinction is made in some policies between two categories of data: integral, which directly supports the arguments and conclusions of the article, and supplementary, which enhances the article but is not essential to its argument. What we considered to be the most significant study on journal policies (Piwowar & Chapman, 2008) defined journal data sharing policies as “strong”, “weak” or “non-existent”. A strong policy mandates the deposit of data as a condition of publication, whereas a weak policy merely requests the deposit of data. The indication from previous studies that researchers’ data sharing behaviour is similarly inconsistent was confirmed by our online survey. However, there is general assent to the data sharing concept, and many researchers would be prepared to submit data for sharing along with the articles they submit to journals.

We then investigated a substantial sample of journal policies to establish our own picture of the policy landscape. A selection of 400 international and national journals were purposefully chosen to represent the top 200 most cited journals (high impact journals), and the bottom 200 least cited (low impact journals), equally shared between Science and Social Science, based on the Thomson Reuters citation index.  Each policy we identified relating to these journals was broken into different aspects such as: what, when and where to deposit data; accessibility of data; types of data; monitoring data compliance and consequences of non compliance. These were then systematically entered onto a matrix for comparison. Where no policy was found, this was indicated on the matrix. Policies were categorised as either being “weak”, only requesting that data is shared, or “strong”, stipulating that data must be shared.
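
One way to picture a row of that matrix is as a small record per policy. The sketch below is an assumption about how such a row could be modelled, not the matrix the project actually used: the class `PolicyRecord`, its field names, the function `summarise` and the example values are all hypothetical, and only the recorded aspects and the weak/strong distinction come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyRecord:
    """One hypothetical matrix row: the aspects recorded for a single policy."""

    journal: str
    what_to_deposit: Optional[str] = None
    when_to_deposit: Optional[str] = None
    where_to_deposit: Optional[str] = None
    accessibility: Optional[str] = None
    data_types: Optional[str] = None
    compliance_monitoring: Optional[str] = None
    non_compliance_consequences: Optional[str] = None
    sharing_mandatory: bool = False  # True when the policy says data must be shared

    def strength(self) -> str:
        """Classify the policy as 'strong' (must share) or 'weak' (requested)."""
        return "strong" if self.sharing_mandatory else "weak"


def summarise(matrix):
    """Count journals with no policy, and the weak and strong policies found."""
    summary = {"no policy": 0, "weak": 0, "strong": 0}
    for records in matrix.values():
        if not records:
            summary["no policy"] += 1
        for record in records:
            summary[record.strength()] += 1
    return summary


# Invented example: an empty list marks a journal where no policy was found.
matrix = {
    "Journal A": [
        PolicyRecord("Journal A", where_to_deposit="a public repository",
                     sharing_mandatory=True)
    ],
    "Journal B": [PolicyRecord("Journal B", what_to_deposit="supporting data")],
    "Journal C": [],
}

if __name__ == "__main__":
    print(summarise(matrix))
```

Keeping a list of records per journal allows for journals that have more than one policy, while an empty list records that a journal was checked and no policy was found.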

Approximately half the journals examined had no data sharing policy. About three quarters of the policies we found were assessed as weak and about one quarter as strong (76%:24%). The high impact journals were found to have the strongest policies, whereas fewer low impact journals included a data sharing policy, and those policies were less likely to stipulate data sharing, merely suggesting that it may be done. The policies generally give little guidance on the stage of the publishing process at which data is expected to be shared.

Throughout the duration of the project, representatives from publishing and other stakeholders were consulted in different ways. Representatives of publishing were selected from a cross section of different types of publishing house; the researchers we consulted were self selected through open invitations by way of the JoRD Blog. Nine of them attended a focus group and 70 answered an online survey. They were drawn from every academic discipline and ranged over a total of 36 different subject areas. During the later phases of the study, a selection of representatives of stakeholder organisations was asked to explore the potential of the proposed JoRD service and to comment on possible business models. These included publishers, librarians, representatives of data centres or repositories, and other interested individuals. This aspect of the investigation included a workshop session with representatives of leading journal publishers in order to assess the potential for funding a JoRD Policy Bank service. Subsequently an analysis of comparator services and organisations was performed, using interviews and desk research.

Our conclusion from the various aspects of the investigation was that although the idea of making scientific data openly accessible for sharing is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these obstacles is the lack of a consolidated infrastructure for the easy sharing of data. In consequence, researchers quite simply do not know how to share their data. At the present juncture, when policies are either not available, or provide inadequate guidance, researchers acknowledge a need for the kind of information that a policy bank would supply. The market base for a JoRD policy bank service would be the research community, and researchers did indicate they believed such a service would be used.

Four levels of possible business models for a JoRD service were identified and finally these were put to a range of stakeholders. These stakeholders found it hard to identify a clear cut option of service level that would be self sustaining. The funding models of similar services and organisations were also investigated. In consequence, an exploratory two phase implementation of a service is suggested. The first phase would be the development of a database of data sharing policies, engagement with stakeholders, and third party API development, with the intention of building use to the level at which a second phase, a self sustaining model, would be possible.

Librarians, data and JoRD

So far this blog has commented on what researchers think and what publishers and journals are currently doing. The final part of the stakeholder consultation comprises interviews that were held with academic librarians, which explored their thoughts on open access research data, the role of librarians in working with open data, and a JoRD policy bank service. The librarians agreed with the views of the other stakeholders that wider access to research data is beneficial. However, they showed a deeper understanding of the infrastructure required to store and access data and considered the problem of selecting which data should be preserved. In their experience, institutional practice is not advancing in line with policies, and, as information specialists, librarians considered that they have the skills necessary to improve the situation.

Librarians anticipated that their expertise could be used for the following roles:

  •   Meta-data management and structure of data
  •   Data licensing
  •   Inclusion of data in institutional repositories
  •   Data management advice and training
  •   Co-ordination with other university support departments, for example, IT, record management and research office.
  •   Enabling compliance

Librarians were also positive about the concept of a JoRD Policy Bank service, but considered that it would be a useful addition to some existing services, for example RoMEO or JISC Collections Knowledge Base+, thereby creating a single point of reference for broad advice on data management and publication. As with the views of other stakeholders, librarians considered that one function of a JoRD service would be to compare journal policies with funders’ requirements, but they also suggested that some co-funded projects would need guidance should the funders’ policies differ. They also suggested that JoRD should rate journal policies on aspects such as usability and access of data.

The shape of a JoRD policy bank service?

We have established that researchers would certainly use a JoRD service, and publishers, repository managers and librarians would all find their own uses for the service. It has already been blogged that an ideal service that contains every item requested by stakeholders would be an expensive and extensive project, so what sort of service could be offered? Four options were devised and market tested on an assortment of stakeholders: academic librarians, publishers, repository managers, researchers, funders and representatives from similar data initiatives. The options were as follows:

  • Basic – an online searchable database of journal data policies, similar in approach to RoMEO
  • Enhanced – an online searchable database of journal data policies with additional data integration such as funder policies, lists of recommended repositories, or institutional policies
  • Advisory – as Basic and Enhanced services with the addition of research and advisory services, for example guides and instructions, best practice, model policy and language, updates
  • Database with Application Programming Interface (API) – as Basic and Enhanced but with no or minimal web interface, and with an API which would allow third parties to use data and develop applications

Most of the people interviewed thought that the basic option was the option they would use. Here is a table to show that: Possible value propositions. However, the basic option was thought too basic to generate any income, and some groups considered that it had limited value on its own. The enhanced service seemed to be favoured by publishers; for example, the inclusion of funder policies would be more valuable to them than other publishers’ journal data policies. The Advisory service was the option that most people thought would be the greatest value for money, but participants cited other advisory services that could provide the same function as that aspect of JoRD. Finally, the high quality database with API and strong invitation for third party apps was thought of as being a practical way to create an enhanced service. Unfortunately, none of the options emerged from the consultation as the optimum service which would generate its own income.

So, the shape of a JoRD service is still unknown and the method of funding is still unknown, but what has been achieved is that now there are no unknown unknowns.

What is linked data?

The fact that data comes in all sorts of shapes and sizes has already been blogged about, but what is the concern about adding data into online journals? After all, printed journals have included data in the shape of graphs or tables for a great many years. The problem now is that the journal article and its corresponding data are no longer in the flat two dimensional world of a piece of paper, but are part of the multi-dimensional world of the internet, where the data is linked to something else. Linked data, according to Bizer, Heath and Berners-Lee (http://linkeddata.org/docs/ijwis-special-issue), is the method by which data is connected, structured and published on the web, resulting in a “web of data”. Linked data “refers to data published on the web in such a way that it is machine readable, its meaning is explicitly defined, it is linked to other external data sets and can in turn be linked to from external data sets”.
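
To make the idea of an article linked to its data a little more concrete, here is a minimal sketch in Python. It is an illustration rather than anything the JoRD project produced: it assumes the schema.org vocabulary written as JSON-LD, and the identifiers under example.org, the title and the function name `link_article_to_dataset` are placeholders.

```python
import json


def link_article_to_dataset(article_id, dataset_id, title):
    """Describe an article and the external data set it is based on."""
    return {
        "@context": "https://schema.org",  # shared vocabulary that defines the terms
        "@type": "ScholarlyArticle",
        "@id": article_id,
        "name": title,
        # The link from the article to a data set held elsewhere.
        "isBasedOn": {"@type": "Dataset", "@id": dataset_id},
    }


# Placeholder identifiers, invented for the example.
record = link_article_to_dataset(
    "https://example.org/articles/123",
    "https://example.org/repository/datasets/456",
    "An example article",
)

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
```

The typed statements make the record machine readable, the shared vocabulary defines what each term means, and the data set identifier is the link to an external data set that others can link to in turn.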

Before the data is published and linked, it has to be put somewhere. Most of our research participants said that they store their data in a personal storage system, either their own work or home computer, or on a portable storage device. While, of course, such spaces may be linked to the internet, it is rather like keeping the data in a filing cabinet: although anyone can go and find the data, they have to search very hard or ask the data keeper to give it to them. Data therefore has to be uploaded to a space that is openly accessible, which could be a university repository, a subject repository, a web page, or even the publisher’s own servers.

Again, this is not as simple as it seems. First you have to choose your repository and ensure that it will accept your sort of data. Once safely held in a repository, the data must be permanently linked and archived. As digital repositories are relatively new things, there is the question of what happens if the repository you have chosen has to close. Where will the data go? If the data is uploaded onto the publisher’s server, do they have the capacity to hold all the data for all the journals that they publish, as well as all the articles? Suddenly the storage needs of a single article can become top heavy. At the moment there are no very clear answers to these concerns, so guidelines and methods of best practice need to be agreed before all data can be truly linked.