Research data without proper documentation becomes a barrier to reproducibility and collaboration. This tutorial teaches you to document, summarize, and validate your research data using R, focusing on practical skills that make your work more transparent and reusable.
Tutorial website: https://lmu-osc.github.io/data-dictionary-R/
By the end of this tutorial, you will be able to:
- Create data dictionaries that clearly describe your variables and datasets (both manually and automatically)
- Use summary statistics to identify data quality issues and understand your data's characteristics
- Implement automated validation workflows to catch errors systematically
- Generate professional reports that combine documentation, validation results, and code
This tutorial assumes you have completed (or are familiar with) the following LMU OSC tutorials:
This repository is structured as a Quarto website that is deployed to GitHub Pages automatically via GitHub Actions.
Contributions are welcome! Please submit a pull request for typos, errors, and other small corrections, or open an issue for larger changes or suggestions.
The overall project is licensed under the CC BY-SA 4.0 license found at LICENSE; all code snippets are additionally licensed under the CC0 1.0 Universal license found at LICENSE-CODE.
Why two licenses? The CC BY-SA 4.0 license covers the website content, while the CC0 1.0 Universal license covers code and configuration files. Dual licensing is common practice for websites whose code snippets may be reused in other projects, particularly because the CC BY-SA 4.0 license is not intended for software.
If you use this tutorial, please cite it as indicated in the CITATION.cff file.