Documentation

Realising FAIR: Data Documentation

In this module, we will discover the FAIR data principles and their main elements as they relate to documentation:

  • Recognise the relevance of documentation and be familiar with diverse tools to document data

Documentation tools

Documentation is vital to making your work understandable, which is, in turn, necessary for your work to be reusable. Documentation is important not only for data, but for your projects in general, including the code you write. Your documentation should provide context for your project and its data and should, for example, provide information about the data collection, structure, and ownership.

Documentation can be a description of the process (how the data was collected), a description of the data (metadata or data about data), or data provenance (data ownership/history).

README files

Write README files in an open format such as .txt or .md (Markdown).

Make it clear what the README file is documenting: if you’re using it to document a project, place the README file in the root folder, or if it’s documenting a file, add the name of that file to the title of the README file. It’s also a good idea to add this information in the text of the README file.

Structure the README with defined sections:

  • General information

  • Methodological information

  • Sharing and access information

Tip: take some time to create a template that you can re-use with multiple projects, datasets or files!
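As one way to act on this tip, the sketch below generates a reusable README skeleton with the three sections listed above. It is a minimal illustration, not a prescribed template: the function name `render_readme`, the prompts under each section, and the heading wording are all assumptions made for this example and should be adapted to your own project.

```python
# Hypothetical README skeleton generator. The section names come from the
# list above; the prompts under each section are illustrative placeholders.
SECTIONS = {
    "General information": [
        "Title of the project, dataset or file",
        "Name and contact details of the owner",
    ],
    "Methodological information": [
        "How the data was collected",
        "How the data is structured",
    ],
    "Sharing and access information": [
        "Who may access the data and under which conditions",
    ],
}


def render_readme(title, documented_file=None):
    """Return Markdown text for a README skeleton.

    If documented_file is given, its name is placed in the heading so it is
    clear which file the README describes.
    """
    if documented_file:
        heading = f"README for {documented_file}"
    else:
        heading = f"README: {title}"
    lines = [f"# {heading}", ""]
    for section, prompts in SECTIONS.items():
        lines.append(f"## {section}")
        lines.append("")
        for prompt in prompts:
            lines.append(f"- {prompt}: TODO")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print a skeleton for a project-level README.
    print(render_readme("My project"))
```

Saving the output as README.md in the root folder of a project gives every new project the same starting structure, with the TODO markers showing what still needs to be filled in.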

Data Dictionaries / Code Books (tabular data)

Codebooks are used to explain variables in tabular data.

A codebook should include:

  • Variable name

  • Variable explanation

  • Explanation of levels, especially for categorical and ordinal variables

  • Measurement unit

  • Allowed values

  • How missing information is coded

If possible, use standard names for variables!
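The sketch below shows how the elements listed above could be stored as a small machine-readable codebook and used to check values. Everything specific in it is an assumption made for illustration: the column names, the two example variables, the `..` notation for numeric ranges, the `;` separator for categories, and the `-999` missing code are not part of any standard.

```python
import csv
import io

# Hypothetical codebook columns, one per element listed above.
FIELDS = ["variable", "explanation", "levels", "unit", "allowed_values", "missing_code"]

# Two illustrative entries; the variables and codes are invented examples.
CODEBOOK = [
    {
        "variable": "age",
        "explanation": "Age at time of measurement",
        "levels": "",
        "unit": "years",
        "allowed_values": "18..99",  # numeric range, written low..high
        "missing_code": "-999",
    },
    {
        "variable": "smoker",
        "explanation": "Self-reported smoking status",
        "levels": "0 = no; 1 = yes",
        "unit": "",
        "allowed_values": "0;1",  # categories separated by semicolons
        "missing_code": "-999",
    },
]


def codebook_to_csv(entries):
    """Serialise codebook entries to CSV text, an open plain-text format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def is_allowed(entry, value):
    """Return True if value is the missing code or an allowed value."""
    if value == entry["missing_code"]:
        return True
    spec = entry["allowed_values"]
    if ".." in spec:
        low, high = (float(part) for part in spec.split(".."))
        try:
            number = float(value)
        except ValueError:
            return False
        return low <= number <= high
    return value in spec.split(";")


if __name__ == "__main__":
    print(codebook_to_csv(CODEBOOK))
```

Keeping the codebook as a CSV file next to the data means it can be read by people and by scripts alike, and a check such as `is_allowed` can catch values that fall outside what the codebook documents.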

Git

  • Git is software to keep track of different versions of your files: a mini file-system taking ‘snapshots’ of your files.

  • You add comments to keep track of the changes.

  • If you ever break your file, you can go back to one of the old snapshots of the file – one that worked!

  • Although git is mostly used to version control scripts (code), it can be used effectively with all plain text files.

  • Git isn’t great for (big) binary files: images, videos, Word files.
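To make the last point concrete, the sketch below lists files in a folder that may be poor candidates for Git because they look binary or are large. It is only a rough helper under stated assumptions: the function names, the 1 MB size threshold, and the heuristic of looking for a NUL byte in the first 8000 bytes are choices made for this example, not rules defined by Git.

```python
from pathlib import Path


def looks_binary(path, probe_bytes=8000):
    """Heuristic: treat a file as binary if its first bytes contain NUL."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(probe_bytes)


def files_to_review(root, max_bytes=1_000_000):
    """Return (relative path, size in bytes) for binary-looking or large files.

    The size threshold is an arbitrary example value. Anything inside a
    .git folder is skipped.
    """
    root = Path(root)
    flagged = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if ".git" in relative.parts or not path.is_file():
            continue
        size = path.stat().st_size
        if looks_binary(path) or size > max_bytes:
            flagged.append((relative.as_posix(), size))
    return flagged


if __name__ == "__main__":
    # Review the current folder before adding files to a repository.
    for name, size in files_to_review("."):
        print(f"{name}\t{size} bytes")
```

Files flagged this way are not forbidden in a repository; the list is just a prompt to decide whether they belong under version control or are better stored elsewhere.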

GitHub / GitLab

  • Git is great for keeping track of locally stored files, but these can be hard for others to access!

  • GitHub and GitLab are web-based tools for working collaboratively on your version-controlled files.

  • Both GitHub and GitLab provide free options:

    • GitHub offers free private repositories to academics.

    • GitLab offers private repositories where you can control who has access by default.

Template to set up a GitHub (or GitLab) repository with all the relevant documentation about the project

Tools to add metadata

(Electronic) Lab Notebooks

Some labs make use of Electronic Lab Notebooks to manage data. Ideally everyone is using the same solution.

Labfolder is available to Max Planck researchers.

The Turing Way presents an overview of some of the resources that are available to help you make a choice, or you can use the ELN finder.

There are also electronic notes such as:

Paper Lab Notebooks

In case you prefer to stick to paper, here are some organisation tips:

  • State the name of the researcher and period of use on the cover of the notebook
  • Number all pages consecutively
  • All entries should be written with permanent ink (no pencils)
  • Standard language should be English
  • Each record requires a date
  • Make sure that it is possible to separate the different experiments recorded (use meaningful titles)
  • Add a note where the raw data linked to the experiment will be stored and the name of the corresponding data file
  • Do not keep loose pages in your notebook. Attach any relevant piece of paper or picture securely to the notebook

Make sure you scan/digitise your paper notebook so that you have a backup copy!

Software/code documentation

References