11  Realising FAIR: Data Publication

Estimated time: 15 minutes

In this module, we will focus on the realisation of FAIR data. We will explore some best practices and tools that can facilitate the implementation of the FAIR principles. At the end of this module you should be able to:

11.1 Data access

It is important to reflect about data access at the beginning of your project to ensure that the right people have access to the right data.

One way to make the data of your project Findable and Accessible to a broad audience is to publish it in a data repository. Data repositories help you comply with the FAIR principles and make it easy to apply some of their main elements, such as licences and persistent identifiers.

The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.

The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.

In the next video we will delve into data access during and after a research project. You will also be introduced to the topic of data publication and data repositories.

TU Delft researchers can always use the 4TU.ResearchData repository (free up to 1 TB per researcher per year)

If you would like to look for a disciplinary specific data repository, you can use:

Data repositories will facilitate sustainable access to your data. For more information, watch this short video on ‘Sustainability of data use’ (from the Open Data Governance and Use - released under the CC-BY-NC-SA 4.0 license)

Image by Errant Science

11.2 Data/Software Articles

You can also share your data and code via a data or software article.

11.3 Software Sharing

Only read this section if you’re working with code in your project, or if Research Software is one of the main outputs of your projects!

Before you get started with writing your code, you might want to look for code and software that others have already written and shared. This could be work by a predecessor in your team, a code snippet you found online, or software that is published and archived. When reusing research software, there are some relevant factors to consider, such as, where to find and how to evaluate the reusability of the available software? Maurits walks you through those considerations in the following video:

Finding and reusing code (4 minutes)

Once you get started writing your code or developing your software, other best practices and management tools will be needed. If you have collected code snippets from different sources, debugging the code might be challenging, and as the development process goes along, you might want to compare current versions to earlier versions.

The same as with research data, you should always think ‘who might be reusing your code?’ At the very least, it is your past, present and future selves. Therefore, implementing best practices to make your software FAIR and reproducible since the beginning of the project is very relevant! In the next video, Maurits will provide an overview of good practices for developing reproducible and reusable code:

Developing research software (7 minutes)

Once you are ready to publish your research work, the data and the code also needs to be published as you have already learned in the ‘Data Access and Data Publication’ video. If you need some extra encouragement to publish your research software or piece of code, in the following video Maurits will tell you why you should publish your code as open source and how to do so following the FAIR principles for research software:

Publishing Research Software (5.28 minutes)

Helpful resources:

11.3.1 FAIR software

Just sharing software via platforms like GitHub is not sufficient to adhere to the FAIR principles. GitHub/GitLab does not assign a persistent identifier/DOI to the software, and GitHub/GitLab does not have a long term preservation policy like data repositories do. You can share your software on GitHub/GitLab and share a snapshot/version of the code in a data repository as well.

Checkout papers with code to see how other researchers have shared their code!

11.4 Why not the supplemental materials?

  • Check this slide on some downsides of sharing your data/code in supplemental materials

11.5 Assessing FAIR datasets (optional)

Choose one of the datasets below and assess whether they are following the FAIR principles:

  • Findable
    • From the title to the dataset was it an easy way?
    • Does the dataset have a DOI to make it more Findable?
    • Does it have enough metadata to easily find it?
  • Accessible
    • Is it clear for you if the data is accessible for re-use?
    • Are you able to download the data for your own use?
  • Interoperable
    • Is the dataset in a file format that you can easily re-use?
  • Re-usable
    • If you come across this dataset, how likely is it that you can easily use it?
    • Does the dataset have enough documentation for you to understand the context of the data?
    • Does it have a clear usage licence? Which one?

Data set 1: “A large-scale dataset for modeling a highly renewable European electricity system” DOI: https://doi.org/10.5281/zenodo.999150

Data set 2: “Qualitative coding of 12 semi-structure interviews on food behaviour context and food reporting engagement” DOI: https://doi.org/10.4121/uuid:02b93c7c-545d-4501-b375-6db1aff039c6

Data set 3: “Highly structured 3D pyrolytic carbon electrodes derived from additive manufacturing technology” https://doi.org/10.11583/DTU.13549487.v1

Data set 4: “HANZE netCDF data on land use, population, GDP and wealth in Europe, 1870-2020” DOI: https://doi.org/10.4121/uuid:e027194f-a804-4f81-973e-8f7be9c25a99

Data set 5: “Kimberlina Groundwater Quality Simulations” DOI: https://doi.org/10.18141/1603336

Data set 7: Adaptive laboratory evolution reveals general and specific chemical tolerance mechanisms and enhances biochemical production https://github.com/biosustain/chemical-tolerance-supplementary

Dataset 11: The genetic architecture of biofilm formation in a clinical isolate of Saccharomyces cerevisiae DOI: https://doi.org/10.5061/dryad.mn71g

Dataset 12: CoralModel: A Python-based model that resembles the biophysical interactions on a coral reef DOI: https://doi.org/10.4121/19164869.v1

Dataset 14: Designing Quantum Networks Using Preexisting Infrastructure DOI: https://doi.org/10.4121/14914806.v1

Dataset 15: Modelling dynamics in protein crystal structures by ensemble refinement DOI: https://doi.org/10.5061/dryad.5n01h

Dataset 17: QDataSet, quantum datasets for machine learning - Publication DOI: https://doi.org/10.1038/s41597-022-01639-1 Link to dataset: https://cloudstor.aarnet.edu.au/plus/s/rxYKXBS7Tq0kB8o

11.6 References:

Finding and reusing code. Video recording from TU Delft MOOC Open Science: Sharing Your Research with the World. Presenter: Dr. Maurits Kok. Credits: TU Delft Extension School, TU Delft New Media Center, TU Delft Digital Competence Center. Licence: CC-BY-NC-SA

Developing research software. Video recording from TU Delft MOOC Open Science: Sharing Your Research with the World. Presenter: Dr. Maurits Kok. Credits: TU Delft Extension School, TU Delft New Media Center, TU Delft Digital Competence Center. Licence: CC-BY-NC-SA

Publishing Research Software. Video recording from TU Delft MOOC Open Science: Sharing Your Research with the World. Presenter: Dr. Maurits Kok. Credits: TU Delft Extension School, TU Delft New Media Center, TU Delft Digital Competence Center. Licence: CC-BY-NC-SA

Colavizza, Giovanni, Iain Hrynaszkiewicz, Isla Staden, Kirstie Whitaker, and Barbara McGillivray. 2020. “The Citation Advantage of Linking Publications to Research Data.” Edited by Jelte M. Wicherts. PLOS ONE 15 (4): e0230416. https://doi.org/10.1371/journal.pone.0230416.