10 Realising FAIR: Data Documentation
Estimated time: 60 minutes
In this module, we will focus on the realisation of FAIR data. We will explore some best practices and tools that can facilitate the implementation of the FAIR principles. At the end of this module you should be able to:
- Recognise the relevance of documentation and be familiar with diverse tools to document data
- Identify relevant tools to advance towards FAIR data
To complete this module:
- Watch the presentation about Data Documentation tools
- Check out the relevant optional assignments and tools
10.1 Documentation tools
Now that you know more about how to organise your data and name your files, we can talk about documentation. Documentation is vital to making your work understandable, which is, in turn, necessary for your work to be reusable. Documentation is important not only for data, but for your projects in general, including the code you write. Your documentation should provide context for your project and its data and should, for example, provide information about the data collection, structure, and ownership.
In the next video we will talk about different methods and tools you can use to document projects, datasets, and code. We will also briefly talk about metadata.
- 04-2_Module-4_Video_Presentation_DataDocumentation (12 minutes, audio will play automatically in full presentation mode)
- Read more about README files
10.2 Assignment: Create a ReadMe file (optional)
Estimated time: 15 minutes
- Download the Cornell template for a ReadMe file.
- Adapt and start filling in the template to describe a dataset of your project. You are allowed to delete and add new fields. Not all of the questions in the template will be relevant to all of the data you work with.
- If you are unsure about the meaning of a component of the ReadMe file, go to: Guide to writing “ReadMe” style metadata - Cornell University
- For example, an ORCID is a persistent, unique identifier for you as a researcher. You can have a look at Esther’s ORCID to see what this profile looks like when you have several research outputs. You can set up your own ORCID if you do not have one yet.
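If you prefer to start from a script rather than a downloaded file, a ReadMe skeleton can also be generated. The following is only a minimal sketch: the field names in `README_FIELDS` and the helper names are hypothetical and are not taken from the Cornell template, so replace them with the fields you actually need.

```python
from datetime import date
from pathlib import Path

# Hypothetical field names for illustration only; adapt them to the
# template you actually use (for example the Cornell template).
README_FIELDS = {
    "Title of dataset": "",
    "Author name": "",
    "Author ORCID": "",
    "Date of data collection": "",
    "Ownership and licence": "",
    "File overview": "",
    "Methods of data collection": "",
}


def build_readme(fields):
    """Return plain-text ReadMe content with one 'Field: value' line per entry.

    Empty values are replaced by a visible placeholder so that missing
    information is easy to spot later.
    """
    lines = ["ReadMe created on " + date.today().isoformat(), ""]
    for name, value in fields.items():
        lines.append(f"{name}: {value or '<to be filled in>'}")
    return "\n".join(lines) + "\n"


def write_readme(folder, fields):
    """Write README.txt into `folder` (created if needed) and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "README.txt"
    target.write_text(build_readme(fields), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Preview the skeleton without writing anything to disk.
    print(build_readme(README_FIELDS))
```

Keeping the fields in one dictionary makes it easy to delete or add fields, in the same way you would adapt the template by hand.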
10.3 Assignment: Data Dictionaries (for tabular data - optional)
Estimated time: 15 minutes
- Watch a video on data dictionaries
- Go to a tabular dataset you are working on or have worked on.
- In an Excel spreadsheet, start creating your data dictionary. Save the data dictionary in the same folder as the dataset you’re describing. List all the variables that you find in the dataset in column A. Then, add a clear and understandable description of each variable in column B.
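For a dataset with many variables, the first column of the data dictionary can be pre-filled automatically. The sketch below assumes the dataset is a CSV file whose first row holds the variable names, and it writes the dictionary as a CSV file (which Excel can open) rather than a native Excel workbook. The function names and the `_data_dictionary.csv` suffix are hypothetical choices.

```python
import csv
from pathlib import Path


def build_dictionary_rows(variable_names):
    """Return data dictionary rows: a header row, then one row per variable.

    The first column holds the variable name and the second column is an
    empty description that still has to be written by hand.
    """
    rows = [["variable", "description"]]
    for name in variable_names:
        rows.append([name, ""])
    return rows


def create_data_dictionary(dataset_path):
    """Write a data dictionary skeleton in the same folder as a CSV dataset."""
    dataset_path = Path(dataset_path)

    # Assumption: the first row of the CSV file contains the variable names.
    with dataset_path.open(newline="", encoding="utf-8") as handle:
        variable_names = next(csv.reader(handle))

    # Hypothetical naming convention: <dataset name>_data_dictionary.csv
    dictionary_path = dataset_path.with_name(
        dataset_path.stem + "_data_dictionary.csv"
    )
    with dictionary_path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(build_dictionary_rows(variable_names))
    return dictionary_path
```

The descriptions in the second column are deliberately left empty: only someone who knows the data can explain what each variable means.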
Note that you can also create a package for your data with Frictionless Data.
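To give an idea of what such a package contains, the sketch below builds a minimal descriptor by hand, following the general shape of a Data Package `datapackage.json` file (a list of resources, each with a name, a path, and a schema listing its fields). It does not use the Frictionless tooling itself, the function name and example values are hypothetical, and every field type is set to `"string"` as a placeholder, so check the result against the Data Package specification before relying on it.

```python
import json


def build_descriptor(package_name, file_name, variable_names, descriptions):
    """Return a dictionary shaped like a minimal Data Package descriptor.

    `descriptions` maps variable names to the text from your data
    dictionary; variables without an entry get an empty description.
    """
    fields = [
        {
            "name": name,
            # Placeholder: types are not inferred in this sketch.
            "type": "string",
            "description": descriptions.get(name, ""),
        }
        for name in variable_names
    ]
    return {
        "name": package_name,
        "resources": [
            {
                "name": package_name,
                "path": file_name,
                "schema": {"fields": fields},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    descriptor = build_descriptor(
        "measurements",
        "measurements.csv",
        ["sample_id", "temperature_c"],
        {"temperature_c": "Air temperature in degrees Celsius"},
    )
    # Saving this text as datapackage.json next to the data completes the sketch.
    print(json.dumps(descriptor, indent=2))
```

The descriptor carries the same information as the data dictionary, but in a machine-readable form that tools can validate.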
10.4 Assignment: Electronic Lab Notebooks (optional)
Estimated time: 20 minutes
TU Delft offers two different Electronic Lab Notebooks, eLABJournal and RSpace. Watch the short videos introducing them:
- eLABJournal intro (4:11 minutes)
- RSpace intro (1:43 minutes)
Try out one of the Electronic Lab Notebooks:
To access eLABJournal:
- Go to https://tudelft.elabjournal.com/
- Select ‘log in with your institute account’ (below the login form)
- Log in with your netID
Note: new ELN groups must be reported to support[at]elabnext.com to get full access to eLABJournal.
To access RSpace:
- Go to https://rspace.tudelft.nl
- The first time you log in, after authentication you will be directed to the RSpace signup page, which is pre-populated with some information released from your netID. Confirm there that you want an account. (On subsequent logins, you will go straight to RSpace after authenticating with your netID.)
Answer the following questions to see if the Electronic Lab Notebooks are helpful to your work:
- Was there any specific feature in one or both of the ELNs that would be particularly useful?
- What are the pros and cons that you can already think of when using an ELN for documenting your lab work?
10.4.1 Other Electronic Lab Notebook tools
If RSpace or eLABJournal are not ideal for your workflow, you can also try out some of the following tools that have been brought up by previous course participants:
10.5 Paper Lab Notebooks
In case you prefer to stick to paper, here are some organisation tips:
- State the name of the researcher and period of use on the cover of the notebook
- Number all pages consecutively
- All entries should be written with permanent ink (no pencils)
- Standard language should be English
- Each record requires a date
- Make sure that it is possible to separate the different experiments recorded (use meaningful titles)
- Add a note where the raw data linked to the experiment will be stored and the name of the corresponding data file
- Do not keep loose pages in your notebook. Securely attach any relevant piece of paper or picture to the notebook
Make sure you scan your paper notebook so that you have a backup copy!
10.6 Software/code documentation
- See the example used in the presentation using JupyterLab
- Template to set up a GitHub (or GitLab) repository with all the relevant documentation about the project
- CodeRefinery materials on Code Documentation
- Use the CodeMeta creator to create a .json metadata file to add to your repo
- Barbara Vreede’s presentation on Best Practices for Writing Reproducible Code
- Jupyter Notebooks
- Aim For Understandability If You Want To Write Good Research Software
- For Julia, you can use DrWatson to create consistent folder structures. This is particularly useful for simulation data.
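As a small illustration of two of the ideas above, the sketch below shows functions documented with docstrings that produce a minimal `codemeta.json` file. It is not the output of the CodeMeta creator: the selection of fields is an assumption, the function names are hypothetical, and a file made with the creator will usually be more complete.

```python
import json
from pathlib import Path


def build_codemeta(name, description, given_name, family_name, license_url=None):
    """Return a minimal CodeMeta-style metadata dictionary.

    Parameters
    ----------
    name : str
        Name of the software project.
    description : str
        One or two sentences on what the software does.
    given_name, family_name : str
        Name of the (first) author.
    license_url : str, optional
        URL of the licence; left out of the metadata when not given.

    Returns
    -------
    dict
        Metadata that can be serialised to ``codemeta.json``.
    """
    metadata = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "author": [
            {
                "@type": "Person",
                "givenName": given_name,
                "familyName": family_name,
            }
        ],
    }
    if license_url is not None:
        metadata["license"] = license_url
    return metadata


def write_codemeta(repo_dir, metadata):
    """Write ``metadata`` as ``codemeta.json`` in ``repo_dir`` and return the path."""
    target = Path(repo_dir) / "codemeta.json"
    target.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return target
```

The docstrings state what each argument means and what is returned, which is the minimum a future user (including yourself) needs in order to reuse the function without reading its body.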