Spreadsheets are great for certain purposes, but there are some things you need to keep in mind!
Spreadsheets are great for:
Data entry
Tables for publications
Generate summary statistics
Figures
Spreadsheets can be not so great:
Manual data entry is liable to mistakes (copy pasting wrong cells, accidental deletions)
It is not always machine readable. Software cannot process information that relies on context, notes in the margin, spatial layout or field formatting (bold/italics/colours)
Excel tries to convert everything into a date. Even human genes. (Watch a short talk on ‘Spreadsheet Problems In Genomic Research’ for more fun, or read the paper!)
Set up a spreadsheet
Be consistent
Write dates like YYYYMMDD (consider to split info)
Do not leave empty values (use NA)
It is difficult for empty cells to know if a value is missing, and blanks can be confusing when spaces/tabs are used as delimiters
Whatever null value you use, use it consistently and indicate it in the README
Put as few information possible in a single cell and one observation per row
Create a data dictionary that describes the spreadsheet and any cleaning steps you took
Leave the raw data alone!
Do not use formatting (colours, font, bolding)
Use data validation to avoid errors (OpenRefine)
Avoid empty cells (not clear whether data was not collected or whether nothing was measured - use undefined, NA, 0, or null instead).
See the extended version, based on Broman and Woo 2018 & Carpentries Spreadsheets for Ecologists or Social Scientists).
Tidy Data
Data validation
- Excel support page on data validation
- Check manually whether your data is consistent, complete and correct:
- If a column should contain only numeric values or characters, check that there are no non-numeric values or non-character
- Check for consistency in names, unit of measurements, data type and so on
- Check if there are any empty cells and replace them with your chosen null value (see above)
Accessibility (Data Curation Primer)
Comma- or Tab-Separated Value (CSV/TSV) formats are not only best for preservation, but for accessibility as well. For more information:
Make your Excel documents accessible to people with disabilities (Microsoft Office)
Excel Tips (Accessibility and Usability at Penn State)
Create Accessible Spreadsheets (General Services Administration of the 49 U.S. - focused on Excel)
Tools to work with tabular/spreadsheet data
WhatTheDuck (allows users to upload CSV files, store them in tables, and perform SQL queries on the data)