Generate mock testing data from recodeflow metadata (variables.csv and variable-details.csv).
MockData creates realistic mock data for testing harmonisation workflows across recodeflow projects (CHMS, CCHS, etc.). It reads variable specifications from metadata files and generates appropriate categorical and continuous variables with correct value ranges, tagged NAs, and reproducible seeds.
- Metadata-driven: Uses existing variables.csv and variable-details.csv - no duplicate specifications needed
- Recodeflow-standard: Supports all recodeflow notation formats (database-prefixed, bracket, mixed)
- Metadata validation: Tools to check metadata quality
- Universal: Works across CHMS, CCHS, and future recodeflow projects
- Tested: 224 tests covering parsers, helpers, and generators
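To make the three notation formats concrete, here is a minimal base R sketch of how a variableStart string might be resolved to a raw variable name for one cycle. It is not the package's parse_variable_start(): the helper name raw_name_for_cycle() is hypothetical, the example strings are illustrative rather than taken from the shipped metadata, and it assumes that a bracketed entry acts as the default when no database-prefixed entry matches the cycle.

```r
# Hypothetical helper for illustration; not part of MockData.
# Returns the raw variable name that applies to `cycle`, or NA if none matches.
raw_name_for_cycle <- function(variable_start, cycle) {
  # Split the comma-separated entries and trim surrounding whitespace
  parts <- trimws(strsplit(variable_start, ",", fixed = TRUE)[[1]])

  # Database-prefixed entries look like "cycle1::alc_11"
  prefix <- paste0(cycle, "::")
  prefixed <- parts[startsWith(parts, prefix)]
  if (length(prefixed) > 0) {
    return(substring(prefixed[1], nchar(prefix) + 1))
  }

  # Bracket entries look like "[alc_11]"; assumed here to be the default name
  bracketed <- parts[grepl("^\\[.*\\]$", parts)]
  if (length(bracketed) > 0) {
    return(gsub("^\\[|\\]$", "", bracketed[1]))
  }

  NA_character_
}

# Illustrative inputs, one per notation format
raw_name_for_cycle("cycle1::alc_11, cycle2::alc_12", "cycle2")  # database-prefixed
raw_name_for_cycle("[alcdwky]", "cycle1")                       # bracket
raw_name_for_cycle("cycle1::var_a, [var_b]", "cycle3")          # mixed
```

The sketch only covers the three formats named above; any other entry form would need its own handling.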
# Install from local directory
devtools::install_local("~/github/mock-data")
# Or install from GitHub (once published)
# devtools::install_github("your-org/MockData")
Note: Package vignettes are in Quarto format (.qmd). To build vignettes locally, you need Quarto installed. Quarto is the team standard going forward.
library(MockData)
# Load metadata (CHMS example with sample data)
variables <- read.csv(
  system.file("extdata/chms/chmsflow_sample_variables.csv", package = "MockData"),
  stringsAsFactors = FALSE
)
variable_details <- read.csv(
  system.file("extdata/chms/chmsflow_sample_variable_details.csv", package = "MockData"),
  stringsAsFactors = FALSE
)
# Get variables for a specific cycle
cycle1_vars <- get_cycle_variables("cycle1", variables, variable_details)
# Get unique raw variables to generate
raw_vars <- get_raw_variables("cycle1", variables, variable_details)
# Create empty data frame
df_mock <- data.frame(id = 1:1000)
# Generate a categorical variable
result <- create_cat_var("alc_11", "cycle1", variable_details, variables,
                         length = 1000, df_mock = df_mock, seed = 123)
if (!is.null(result)) {
  df_mock <- cbind(df_mock, result)
}
# Generate a continuous variable
result <- create_con_var("alcdwky", "cycle1", variable_details, variables,
                         length = 1000, df_mock = df_mock, seed = 123)
if (!is.null(result)) {
  df_mock <- cbind(df_mock, result)
}
Validation and testing scripts are located in mockdata-tools/:
# Validate metadata quality
Rscript mockdata-tools/validate-metadata.R
# Test all cycles
Rscript mockdata-tools/test-all-cycles.R
# Compare different approaches
Rscript mockdata-tools/create-comparison.R
See mockdata-tools/README.md for detailed documentation.
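As a rough idea of what a metadata quality check can look like, the sketch below flags variables that are listed in the variables table but have no rows in the variable details table. It is not one of the scripts in mockdata-tools/: the function name find_undetailed_variables() is hypothetical, the toy tables are made up for illustration, and it assumes both tables carry a `variable` column.

```r
# Hypothetical check for illustration; not one of the mockdata-tools/ scripts.
# Assumes both tables have a `variable` column.
find_undetailed_variables <- function(variables, variable_details) {
  # Variables declared in `variables` that never appear in `variable_details`
  setdiff(unique(variables$variable), unique(variable_details$variable))
}

# Toy metadata for illustration only
toy_variables <- data.frame(
  variable = c("alc_11", "alcdwky", "var_without_details"),
  stringsAsFactors = FALSE
)
toy_variable_details <- data.frame(
  variable = c("alc_11", "alc_11", "alcdwky"),
  stringsAsFactors = FALSE
)

missing_details <- find_undetailed_variables(toy_variables, toy_variable_details)
print(missing_details)
```

A non-empty result would point to variables that a generator could not build from metadata alone.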
- Parsers (R/mockdata-parsers.R):
  - parse_variable_start(): Extracts raw variable names from variableStart
  - parse_range_notation(): Handles range syntax like [7,9], [18.5,25), else
- Helpers (R/mockdata-helpers.R):
  - get_cycle_variables(): Filters metadata by cycle
  - get_raw_variables(): Returns unique raw variables with harmonisation groupings
  - get_variable_details_for_raw(): Retrieves category specifications
  - get_variable_categories(): Extracts valid category codes
- Generators:
  - create_cat_var() (R/create_cat_var.R): Generates categorical variables with tagged NA support
  - create_con_var() (R/create_con_var.R): Generates continuous variables with realistic distributions
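The range syntax mentioned for the parsers ([7,9], [18.5,25), else) follows interval notation, where a square bracket includes the endpoint and a round bracket excludes it. The base R sketch below shows one way such a string could be turned into bounds and applied to values. It is an illustration under that assumption, not the package's parse_range_notation(); the names parse_interval_sketch() and in_interval() and the returned list fields are hypothetical.

```r
# Hypothetical helpers for illustration; not part of MockData.
parse_interval_sketch <- function(x) {
  x <- trimws(x)
  if (identical(x, "else")) {
    return(list(type = "else"))
  }

  # First and last characters carry the inclusive/exclusive information
  open <- substr(x, 1, 1)
  close <- substr(x, nchar(x), nchar(x))
  bounds <- trimws(strsplit(substr(x, 2, nchar(x) - 1), ",", fixed = TRUE)[[1]])

  if (!(open %in% c("[", "(")) || !(close %in% c("]", ")")) || length(bounds) != 2) {
    stop("Unrecognised range notation: ", x)
  }

  list(
    type = "interval",
    min = as.numeric(bounds[1]),
    max = as.numeric(bounds[2]),
    min_inclusive = open == "[",
    max_inclusive = close == "]"
  )
}

# Vectorised membership test against a parsed interval
in_interval <- function(value, rng) {
  lower_ok <- if (rng$min_inclusive) value >= rng$min else value > rng$min
  upper_ok <- if (rng$max_inclusive) value <= rng$max else value < rng$max
  lower_ok & upper_ok
}

rng <- parse_interval_sketch("[18.5,25)")
in_interval(c(18.5, 24.9, 25), rng)
```

Keeping the inclusive flags separate from the numeric bounds means a generator can sample within the bounds and then drop or resample values that land on an excluded endpoint.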
# Run all tests
devtools::test()
# Run specific test file
testthat::test_file("tests/testthat/test-mockdata.R")
This package is part of the recodeflow ecosystem. See CONTRIBUTING.md for details.
MIT License - see LICENSE file for details.