Codebooks

Research Data Management
Author

Esther Plomp

Published

November 21, 2023

This post contains some pointers on how to set up a codebook that accompanies a dataset, so that others can reuse the data. This summary is based on Horstmann et al. 2020.

See also the spreadsheet post for tips on how to manage spreadsheet files/data.

Purpose of a codebook/data dictionary

A codebook ensures that the dataset can be interpreted and reused in the future. A useful test: could another researcher analyse and interpret the data without any information beyond what is provided in the codebook?

How?

Useful codebooks are both human- and machine-readable (machine-readable meaning that the information can be accessed using automated approaches). They should also be consistent: every variable is referenced in the codebook, the same structure is used for all variables, and there are no empty or undefined cells.
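Because a machine-readable codebook can be checked automatically, the three consistency rules can be tested with a short script. The sketch below is a minimal illustration, assuming the codebook is stored as a CSV file with one row per variable; the column names (`variable`, `label`, `type`, `units`, `values`) and the function `check_codebook` are hypothetical, not a standard template. Note the explicit `NA` for a variable without units, rather than an empty cell.

```python
import csv
import io

# Hypothetical codebook columns: adapt these to your own template.
REQUIRED_FIELDS = ["variable", "label", "type", "units", "values"]


def check_codebook(codebook_rows, data_columns):
    """Return a list of consistency problems found in a codebook."""
    problems = []
    documented = set()
    for i, row in enumerate(codebook_rows, start=1):
        # Same structure for every variable: all required fields are present.
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            problems.append(f"row {i}: missing fields {missing}")
        # No empty/undefined cells (csv.DictReader gives None for short rows).
        empty = [f for f in REQUIRED_FIELDS
                 if f in row and not (row[f] or "").strip()]
        if empty:
            problems.append(f"row {i}: empty cells {empty}")
        documented.add(row.get("variable"))
    # Every variable in the data is referenced in the codebook.
    for col in data_columns:
        if col not in documented:
            problems.append(f"variable '{col}' is not in the codebook")
    return problems


# Hypothetical example: 'score' has no units and 'group' is undocumented.
codebook_csv = """variable,label,type,units,values
age,Age of participant,integer,years,18-99
sex,Sex of participant,categorical,NA,1=female; 2=male
score,Test score,float,,0-100
"""
data_header = ["age", "sex", "score", "group"]

rows = list(csv.DictReader(io.StringIO(codebook_csv)))
for problem in check_codebook(rows, data_header):
    print(problem)
```

Running this reports the empty `units` cell for `score` and the missing entry for `group`. The same check could be run on the real data file's header row every time the dataset is updated.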

At a minimum, the codebook should list every variable included in the data (it may also describe additional variables), and it should cover all variables referred to in the dataset or article. For each variable you can explain the numerical values, range, units of measurement, sources, classification schemes used, or the labels of the questions asked.
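To make these elements concrete, the sketch below models a single codebook entry in Python and uses it to check and label raw values. It is only an illustration: the `Variable` class, its field names, and the questionnaire item are hypothetical, and a real codebook would usually live in a spreadsheet or a metadata standard rather than in code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """One codebook entry. Field names are illustrative, not a standard."""
    name: str
    description: str                 # e.g. the question exactly as it was asked
    units: str = "NA"                # explicit "NA" rather than an empty cell
    minimum: Optional[float] = None  # allowed range for numerical values
    maximum: Optional[float] = None
    codes: dict = field(default_factory=dict)  # numeric code -> label
    source: str = "NA"               # instrument, classification scheme, etc.

    def check(self, value):
        """Return True if a raw value is allowed by this entry."""
        if self.codes:
            return value in self.codes
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True

    def label(self, value):
        """Translate a numeric code into its label (unchanged if uncoded)."""
        return self.codes.get(value, value)


# Hypothetical entries: a numerical variable with a range, and a coded one.
codebook = {
    "age": Variable("age", "Age at time of interview", units="years",
                    minimum=18, maximum=99),
    "smoker": Variable("smoker", "Do you currently smoke?",
                       codes={0: "no", 1: "yes", 9: "refused to answer"},
                       source="hypothetical questionnaire item Q12"),
}

record = {"age": 42, "smoker": 9}
for name, value in record.items():
    entry = codebook[name]
    print(name, entry.check(value), entry.label(value))
```

Documenting the meaning of codes such as `9` is what lets a reuser tell a refusal apart from a genuine answer without contacting the original researcher.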

If you use a spreadsheet as a codebook, apply the same formatting principles as for any other spreadsheet data.
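As a minimal sketch, assuming common spreadsheet conventions (a single header row, one row per variable, one column per attribute, no merged cells, and an explicit `NA` instead of blank cells), a codebook can be written to a plain CSV file with Python's standard `csv` module. The column names and entries are hypothetical.

```python
import csv
import io

# Hypothetical codebook content: one dict per variable.
entries = [
    {"variable": "age", "description": "Age at time of interview",
     "units": "years", "values": "18-99"},
    {"variable": "smoker", "description": "Do you currently smoke?",
     "units": "NA", "values": "0=no; 1=yes; 9=refused"},
]

fieldnames = ["variable", "description", "units", "values"]

# Write to memory here; use open("codebook.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()       # a single header row
writer.writerows(entries)  # one row per variable, one column per attribute
print(buffer.getvalue())
```

A CSV file like this opens in any spreadsheet program while staying machine-readable, which suits both audiences of a codebook.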

Examples

More information