Season’s Greetings

The JoRD team have been beavering away this week writing up the report for the feasibility study, but I am afraid that you will have to wait until after the holidays to find out the outcomes. There will be no posts for a few weeks while the office is closed and we are all refreshing our brains and enjoying our many Christmas pursuits.

 

So, on behalf of the JoRD team, I wish you all a Happy Christmas and a Prosperous New Year!

Data comes in all sorts of shapes and sizes

The JoRD project has not set out to define the term “data” (or its singular form, “datum”). This was a fortunate choice, because one message that has come across clearly from all the participants in our study is that data can take many forms. The recent Royal Society report, “Science as an Open Enterprise” (http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE.pdf), includes a glossary of data terms which illustrates the ways in which the term “data” can be used. For example:

  • big data – data that requires massive computing power to process
  • broad data – structured big data
  • data set – a collection of information held in electronic form
  • linked data – data that has been allocated a unique identifying number to be able to access it from an electronic storage facility

… and those are just a few of the terms that it explains. The word “data” is defined as “Qualitative or quantitative statement or numbers that are (or assumed to be) factual”. The researchers who took part in this study considered that their data took more forms than just statements or numbers.

Researchers described the data that their research generated as software, video footage, geodata, geological maps, ontologies, web services and data models, as can be seen in the table below. This multitude of forms makes it difficult for publishers to include data in their on-line articles. The publishers said that linked data in a journal article should be “fit for use” and “replicable”; they consider data in many different formats to be “messy”, and say that it is currently not supplied with sufficient meta-data. Another consideration is the resulting file size of an article if the publisher saves the embedded data on their own servers. Data repositories and data centres are the more practical method of data storage, with published articles incorporating linked data.

That is one reason for journals to have a data policy, and a good argument for those policies to be collected and made accessible in a centralised resource: a JoRD Policy Bank Service.

Researchers’ descriptions of data
Table columns: Qualitative (documents and text); Quantitative (figures); Visual data (images); Virtual data (software or protocols)

  • Collection of examiner reports and questions, supervisory reports, letters and other documentary evidence
  • Dataset of measurements and statistical analyses
  • Digitised textual sources
  • Excavation, field observation, environmental monitoring, software to collate, mine and analyse
  • Excel sheets
  • Focus group, interview transcripts, some footage of people using computers, digital photographs
  • Geodata
  • Geologic maps, chemical and isotopic analyses of Earth materials, GIS datasets
  • Interview transcripts
  • Ontologies
  • Reports
  • Visualization
  • Web services, data models and specifications

Other data initiatives that are out in the world

To find out whether any other projects, products or services already perform the same function as JoRD, we carried out a quick survey of other data initiatives and the services they offer. So far 29 have been identified, although there may well be others. Most of them are known to be current, ongoing initiatives, but some seem to have started and then not been updated for a while. They are mainly funded by universities from around the world, and at least four demonstrate successful international collaboration between universities. Many UK initiatives are JISC funded. Three are funded by governments, one being an international initiative. Eleven are subject specific.

Only five of the initiatives indicate that they can advise researchers about data policies and guidelines, and four deal with best practice. Nine are concerned with linked data. Fortunately, none of them appear to be supplying the type of service that JoRD would deliver. Here are some details of the most interesting projects.

  • DaMaRo (http://damaro.oucs.ox.ac.uk/) is an initiative between JISC and Oxford University to create the University’s data management policy and build the infrastructure needed to comply with it. It is associated with DataFlow (http://www.dataflow.ox.ac.uk/) and DataBank (http://www.dataflow.ox.ac.uk/index.php/about/about-databank.), which are being developed by Oxford University and the Bodleian Library to provide open source infrastructure that will aid the storage of large data sets and create DOIs for them.
  • DRYAD (http://datadryad.org/) is a digital repository that is supported by many international scientific societies. It has been created through open source development and facilitates data storage and retrieval, provides advice on best practice, links data and assigns DOIs.
  • Global Biodiversity Information Facility or GBIF (http://www.gbif.org/) provides infrastructure and links to biodiversity data.
  • SPQR (http://spqr.cerch.kcl.ac.uk/) provides links and meta-data searches for ancient documents.
  • KAPTUR (http://www.vads.ac.uk/kaptur/) is a new project run by a consortium of art universities to capture, preserve and produce best practice for managing data in unusual formats, such as sketchbooks or textile designs.

A more explanatory table can be found here.

Chart of Data initiatives

If you have any further information about the initiatives in the table, or you know about others, then please respond by leaving a comment.