-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
110 lines (81 loc) · 4.48 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# panelcleaner
<!-- badges: start -->
[](https://www.tidyverse.org/lifecycle/#experimental)
[](https://github.com/nyuglobalties/panelcleaner/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
Sometimes during data collection, survey structures may change over time due to a litany of potential issues. In order to combine these datasets together into a long format, panelcleaner attempts to identify as many issues as possible prior to binding datasets together by thoroughly documenting the state of each variable for each wave. Moreover, with the assistance of [`rcoder`](https://github.com/Global-TIES-for-Children/rcoder), categorical data can be easily recoded into a single, homogenized coding while not losing the labels or any associated metadata.
## Usage
```{r, include=FALSE}
library(panelcleaner)
library(magrittr)
set.seed(9381)
```
panelcleaner is built around a _panel mapping file_, which is a spreadsheet like this:
```{r echo=FALSE, warning=FALSE}
panel_map_csv <- tibble::tribble(
~ name_1, ~ description_1, ~ coding_1, ~ name_2, ~ description_2, ~ coding_2, ~ panel, ~ homogenized_name, ~ homogenized_coding,
"id", "Participant ID", "", "id", "Partcipant ID", "", "test_panel", "id", "",
"question1", "The first question", "", "q1", "The first question", "", "test_panel", "question_1", "",
"question2", "The second question", "coding(code('Yes', 1), code('No', 0))", "q2", "The second question", "coding(code('Yes', 'YES'), code('No', 'NO'))", "test_panel", "question_2", "coding(code('Yes', 1), code('No', 0))"
)
kableExtra::kable(panel_map_csv)
```
The "name", "description", and "coding" columns refer to how the data was structured in waves 1 & 2. The "code" columns are specific to categorical variables, and the `coding()` objects (written out as strings to be interpreted later) represent the categorical level structure. See [rcoder](https://github.com/Global-TIES-for-Children/rcoder) for more information.
Once the mapping is defined, the homogenization process can begin. The first step in the process is to gather waves of data into a _panel_ object:
```{r, warning=FALSE}
wave1 <- tibble::tibble(
id = LETTERS[1:4],
question1 = sample(1:100, 4, replace = TRUE),
question2 = sample(0:1, 4, replace = TRUE)
)
wave2 <- tibble::tibble(
id = LETTERS[1:4],
q1 = sample(1:100, 4, replace = TRUE),
q2 = sample(c("NO", "YES"), 4, replace = TRUE)
)
test_panel <- enpanel("test_panel", wave1, wave2)
test_panel
```
The panel name is "test_panel". Since multiple panels can be defined in the same mapping file, a panel name must be defined to distinguish which mapping belongs to which panel. Moreover, the panel is an _unhomogenized_ panel, which means that waves cannot yet be bound together to form a single, long-format dataset.
The next step is to attach a panel mapping to the created panel:
```{r, warning=FALSE}
panel_map <-
panel_map_csv %>%
panel_mapping(waves = c(1, 2))
test_panel <-
test_panel %>%
add_mapping(panel_map)
test_panel
```
`panel_mapping()` needs information about the specific waves mapped out, so we pass the loaded CSV `data.frame` through `panel_mapping()` to check that the CSV adheres to the panel mapping spec. Then we add the mapping to the panel so that the homogenization process can refer to it. Finally, we homogenize the panel.
```{r, warning=FALSE}
homogenized_panel <-
test_panel %>%
homogenize_panel() %>%
bind_waves()
homogenized_panel
```
To extract the underlying data, use `as.data.frame()` to convert the homogenized panel to a `mapped_df`, a `data.frame`-like object that carries the panel mapping as an attribute.
```{r}
str(as.data.frame(homogenized_panel))
```
## Installation
As `panelcleaner` does not exist on CRAN yet, you can install the latest development version of this package via:
``` r
# install.packages("remotes")
remotes::install_github("Global-TIES-for-Children/panelcleaner")
```
## Contributing
Please note that the `panelcleaner` project is released with a [Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.