RDM Essentials

Essentials for Research Data

At the end of this module you should be able to:

  • Identify different types of research data
  • Recognise what is considered confidential data in research
  • Understand what research data management (RDM) entails within a research project
  • Store and back up the research data of your project in a secure manner

Research Data definition

Research data is any information that has been collected, observed, generated or created to validate research findings. Depending on your discipline, research data can be collected or produced in different ways: captured in real time (sensors, images), collected with laboratory instruments, or derived from interviews or numerical simulations, among others. Research data can be digital, such as tabular data, videos, algorithms, scripts, transcripts, and codebooks. It can also be non-digital, for example laboratory samples, sketchbooks, and prototypes.

Research Data in ecology

Tip

Ecological systems are dynamic and heterogeneous, interacting with numerous factors that sculpt natural history and that investigators cannot completely control. Observations may be highly dependent on spatial and temporal context, making them very difficult to reproduce, but computational reproducibility can still be achieved. (Powers and Hampton 2018)

User-friendly tools such as Kepler or VisTrails help document provenance, and Morpho provides an easy way to create standardized, machine-readable metadata using the Ecological Metadata Language (EML).
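Computational reproducibility depends on knowing exactly which input data and software environment produced a result. As a minimal sketch in Python (not a feature of Kepler, VisTrails or Morpho), the example below writes a small provenance record next to an analysis. The function names `provenance_record` and `write_provenance`, and the fields in the record, are hypothetical choices made for illustration.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def provenance_record(data_path, seed):
    """Describe one analysis run: input checksum, environment and random seed.

    The field names are illustrative assumptions, not a metadata standard.
    """
    # A checksum lets others verify that they analyse the same input file.
    digest = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    return {
        "input_file": Path(data_path).name,
        "sha256": digest,
        "python_version": platform.python_version(),
        "platform": sys.platform,
        "random_seed": seed,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_provenance(record, out_path):
    """Save the record as human-readable JSON alongside the results."""
    Path(out_path).write_text(json.dumps(record, indent=2), encoding="utf-8")
```

A script could call `provenance_record` on each input file before the analysis starts and store the JSON output with the results. This does not replace structured metadata such as EML; it only captures the bare minimum needed to rerun the same computation.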

(Tulloch et al. 2018): A decision tree for assessing the risks and benefits of publishing biodiversity data

(Culina et al. 2020): Low availability of code in ecology: A call for urgent action

Confidential data

There are multiple types of confidential data that you might be working with during your research project. Some examples include:

  • Personal data (information about an identified or identifiable natural person)
  • National security data (such as nuclear research)
  • Data falling under export control regulations
  • Confidential data received from commercial, or other external partners
  • Data related to competitive advantage (for example, patent, IP)
  • Data which could lead to reputation/brand damage (for example, climate change, personal information, animal research)
  • Politically sensitive data (such as research commissioned by public authorities, research on societal issues)

When working with confidential data, you need additional security measures to make sure that your data is not accidentally released. You may need to sign a Non-Disclosure Agreement (NDA); if so, make sure it still allows you to publish the results after review. If you work with confidential data, it may not be possible to publicly share all your research results. Plan for this in advance so that you can share what is possible, or use restricted access so that the data still adheres to the FAIR principles.

Personal Data

Only read these materials if you work with personal data (data that can identify a person):

A visualisation of how sensitive data requires an additional track or process before (parts of) it can be shared. Tools that can help make research based on sensitive data reproducible include encryption, consent, de-identification, synthetic data and data safe havens.

The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.
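One of the tools named in the illustration, de-identification, can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not a complete anonymisation procedure: it drops direct identifiers and replaces a participant ID with a keyed hash (HMAC-SHA256), so that the same participant always receives the same pseudonym. The function names, field names and the 16-character pseudonym length are hypothetical. Pseudonymised records may still count as personal data, so this step alone does not make a dataset safe to share.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    secret_key must be bytes and should be stored separately from the data;
    anyone holding it can recompute the pseudonym of a known identifier.
    """
    mac = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; the length is an arbitrary illustrative choice.
    return mac.hexdigest()[:16]


def deidentify(rows, id_field, drop_fields, secret_key):
    """Drop direct identifiers and pseudonymise the ID field of each row.

    rows is a list of dictionaries, for example as read by csv.DictReader.
    """
    cleaned = []
    for row in rows:
        # Keep only the fields that are not listed as direct identifiers.
        kept = {k: v for k, v in row.items() if k not in drop_fields}
        kept[id_field] = pseudonymise(str(row[id_field]), secret_key)
        cleaned.append(kept)
    return cleaned
```

Using a keyed hash rather than a plain hash means that someone without the key cannot simply hash a list of candidate IDs to reverse the pseudonyms. Indirect identifiers (for example rare combinations of age, location and occupation) are not handled here and need separate review.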

Relevant RDM steps within a research project

In the following presentation you can go through a simplified cycle which can represent your project. Have a look at the RDM questions you might ask at each step of research.

02-1_ Module-2_handout_RDM_steps_Interactive_image

References

Culina, Antica, Ilona van den Berg, Simon Evans, and Alfredo Sánchez-Tójar. 2020. “Low Availability of Code in Ecology: A Call for Urgent Action.” PLOS Biology 18 (7): e3000763. https://doi.org/10.1371/journal.pbio.3000763.
Powers, Stephen M., and Stephanie E. Hampton. 2018. “Open Science, Reproducibility, and Transparency in Ecology.” Ecological Applications 29 (1). https://doi.org/10.1002/eap.1822.
Tulloch, Ayesha I. T., Nancy Auerbach, Stephanie Avery-Gomm, Elisa Bayraktarov, Nathalie Butt, Chris R. Dickman, Glenn Ehmke, et al. 2018. “A Decision Tree for Assessing the Risks and Benefits of Publishing Biodiversity Data.” Nature Ecology & Evolution 2 (8): 1209–17. https://doi.org/10.1038/s41559-018-0608-1.