9  Realising FAIR: Data organisation

Estimated time: 45 minutes

In this module, we will focus on the realisation of FAIR data. We will explore some best practices and tools that can facilitate the implementation of the FAIR principles. At the end of this module you should be able to:

Activities
  • Watch the videos about ‘Data organisation’ and complete the quiz
  • Try out Cookiecutter, share your folder structure on GitHub discussion #127 Folder Structure, and comment on another person’s folder structure

9.1 Data Organisation

Good data documentation starts with good organisation and file naming of the data of your project! Good data organisation on your computer or your Project Data (U:) drive will allow you to make the data Findable both for yourself and for your collaborators who have access to the data. Making the data findable also indirectly improves the re-usability of the data. You need to be able to find the data before you can re-use it! Implementing a good folder structure, data organisation and a meaningful file naming convention are first steps towards making data FAIR.

Title: anatomy of a file name. A file name that shows 'New_Final_final_new-23_fuckingfinal. jpeg'. For each component a box with explanation is added. For the first final the description 'the finished file' is added. For the second final the box says 'the finished file wihtout spelling mistakes. For New-final-final the box says 'the finished file after the client made a last minute complete change of direction. For the added new the box says 'the finished file after everyone in the pointless meeting managed to add in their own change. The 23 box says 'the finished file after was presumably 23 emails'. For fucking final the box says 'the finished version at the point that you stopped caring'

Image by Chaz Hutton

In the next video, we will go through some best practices to help you organise the data of your project efficiently. These best practices are using a good folder structure, tagging files (if possible) and an appropriate file naming convention to enhance the findability of the data in your directories.

Image by Allison Horst

9.1.1 Presentation Resources

The Turing Way illustration by Scriberia. CC-BY 4.0. DOI: 10.5281/zenodo.3332807

9.1.2 Quiz Data Organisation

Test your knowledge after watching the ‘Data organisation’ video. Think about whether the following statements are True or False (see answers below).

  1. Good data organisation makes the data Findable for yourself and your collaborators.
  2. Thinking about data organisation at the beginning of your project can save you a lot of time later on.
  3. A Research Compendium does not follow a Hierarchical structure.
  4. In a Research Compendium data, methods, and outputs should be clearly separated.
  5. Consistency is not relevant in data organisation.

Tweet by @varsha_khodiyar

9.2 Assignment: Using Cookiecutter

Estimated time: 20 minutes

You can also manually set up a folder structure (go straight to step 4 below)! Cookiecutter is especially helpful to people who program/Python-users or if you have to set up a lot of folder structures for smaller projects (for example, with students).

Note that you need to have Python and pip (or Anaconda) installed to easily follow the video. To install Anaconda:

  • Windows (instruction video)
    • Open https://www.anaconda.com/products/individual#download-section with your web browser.
    • Download the Anaconda for Windows installer with Python 3. (If you are not sure which version to choose, you probably want the 64-bit Graphical Installer Anaconda3-…-Windows-x86_64.exe)
    • Install Python 3 by running the Anaconda Installer, using all of the defaults for installation except make sure to check Add Anaconda to my PATH environment variable.
  • Linux
    • Open https://www.anaconda.com/products/individual#download-section with your web browser.
    • Download the Anaconda Installer with Python 3 for Linux.
    • Open a terminal window and navigate to the directory where the executable is downloaded (e.g., cd ~/Downloads).
    • Type 'bash Anaconda3- and then press Tab to autocomplete the full file name. The name of file you just downloaded should appear.
    • Press Enter (or Return depending on your keyboard). You will follow the text-only prompts. To move through the text, press Spacebar.
    • Type Yes and press Enter to approve the license.
    • Press Enter to approve the default location for the files.
    • Type Yes and press Enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
    • Close the terminal window.
  • Mac (instruction video)
    • Open https://www.anaconda.com/products/individual#download-section with your web browser.
    • Download the Anaconda Installer with Python 3 for macOS (you can either use the Graphical or the Command Line Installer).
    • Install Python 3 by running the Anaconda Installer using all of the defaults for installation.
  1. Watch the video on setting up a research compendium using Cookiecutter (3:36 min).

  2. See the instructions on how to clone the Cookiecutter template

    1. Information about hidden folders on Windows

    2. You can also find a visualisation of the template on the GitHub repository

  3. Work on your own to populate the template with information/files from your project.

  4. Share your folder structure on GitHub discussion #127 Folder Structure and comment on another person’s folder structure

  5. Discuss with your peers how similar your own folder structure is to the one you created using Cookiecutter. What additional elements are there in the Cookiecutter project? Is there anything that the Cookiecutter template is missing?

If you didn’t find this template useful, you can browse these templates to see if one of these fits your needs better:

  1. Data Science
  2. Python template
  3. Reproducible Science
  4. More templates
Problems with Cookiecutter?

You can also download the template from GitHub. Note that if you’d like to reuse this template a lot, it will be easier to do it as described above.

  1. Go to “https://github.com/bvreede/good-enough-project

  2. Go to Code -> Download ZIP

  3. Extract all files in the folder where you’d like to save it, and rename the folder {{cookiecutter.project_slug}} to the name of your project

Sharing your folder structure

You can use the tree command to generate a list of your folder structure and its files.

Image by ErrantScience

9.3 Quiz Answers

  1. True / 2. True / 3. False / 4. True / 5. False

9.4 References

Community, The Turing Way. 2022. The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research. Zenodo. https://doi.org/10.5281/ZENODO.3233853.