Data Publication

Realising FAIR: Data Publication

In this module, we will explore the FAIR data principles and their main elements as they relate to data publication. We will:

  • Identify relevant tools to advance towards FAIR data
  • Discuss data publication, best practices and restrictions

Data Repository

One way to make the data of your project Findable and Accessible to a broad audience is to publish it in a data repository. Data repositories help you comply with the FAIR principles and make it easy to apply some of their main elements, such as licences and persistent identifiers. A data repository is meant for sharing data in the long term, so what you deposit is generally a finalised dataset or script.

The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.

Selecting what data and code to preserve

Preserve data and code that are:

  • Needed to verify findings you have published in a paper (see also journal/university requirements)

  • Important (taking into account the quality, originality, size, scale and innovative nature of the research)

  • Unique (based on non-repeatable observations)

  • You may need to weigh up the cost of collecting the data again against the cost of making the data FAIR and storing it

Search for a discipline-specific data repository

If you would like to look for a discipline-specific data repository, you can use:

Data repositories facilitate sustainable access to your data. For more information, watch this short video on ‘Sustainability of data use’ (from the Open Data Governance and Use, released under the CC-BY-NC-SA 4.0 licence).

Data/Software Articles

You can also share your data and code via a data or software article.

Software Sharing

Sharing software only via platforms like GitHub is not sufficient to adhere to the FAIR principles. GitHub and GitLab do not assign a persistent identifier (such as a DOI) to the software, and they do not have a long-term preservation policy as data repositories do. You can share your software on GitHub or GitLab and also deposit a snapshot or version of the code in a data repository.
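To illustrate what a persistent identifier offers: a DOI can be turned into a link through the https://doi.org resolver, and that link stays the same wherever the deposited snapshot is hosted. The Python sketch below is a minimal illustration only. The function name doi_to_url and the loose DOI pattern are assumptions made for this example and are not part of any repository's tooling.

```python
import re

# Loose, illustrative pattern for a DOI: "10.", a 4-9 digit registrant
# code, "/", then a non-empty suffix. This is an assumption for the
# example, not a complete validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a bare DOI string."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# DOI of the Turing Way illustration cited earlier in this module.
print(doi_to_url("10.5281/zenodo.3332807"))
# prints: https://doi.org/10.5281/zenodo.3332807
```

A plain repository address such as a GitHub URL does not match the pattern and raises a ValueError here, which mirrors the point above: the address of a code hosting platform is a location, not a persistent identifier.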

Check out Papers with Code to see how other researchers have shared their code!

Why not the supplemental materials?

  • Data control: supplementary materials are managed by the publisher and cannot be updated, whereas most data repositories support versioning
  • Interoperability: supplementary materials do not always allow the original format, which makes it difficult to integrate and interact with the data
  • Availability: supplementary materials are difficult to access if the article is behind a paywall (they are also not included in the DOI, so the links can break!)
  • Impact: Data should be a primary research output!
  • Publisher requirements: Some publishers recommend using a data repository instead
  • Not FAIR: Data/Software available in supplemental materials is not considered to be FAIR (Findable, Accessible, Interoperable, Reusable)

See also: The Push to Replace Journal Supplements with Repositories
