# Filter out subjects with sex mismatch {#exclude-sex-mismatch-subjects}
We exclude subjects whose self-reported sex does not match their genetically determined sex from both the processed demographic data and the master event table. The IDs of these subjects are also saved, because they are used when preparing the primary care data in the next chapter.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval=F)
```
Load packages.
```{r, message = F}
library(tidyverse)
library(data.table)
library(lubridate)
```
Load the reformatted UKB assessment center data, which contains each subject's genetically determined sex. The dataset is stored in the object `sampleqc`, and the genetically determined sex is in the column `genetic_sex`.
```{r}
sampleqc <- readRDS("generated_data/sampleQC_UKB.RDS")
```
Load the curated master UKB event table and the demographic table.
```{r}
pre_demog_sel <- readRDS("generated_data/pre_demog_sel.RDS")
pre_all_ukb_events_tab <- readRDS("generated_data/pre_all_ukb_events_tab.RDS")
```
Get the IDs of subjects who should be excluded because their self-reported sex and genetically determined sex disagree. Everyone in the demographic table has self-reported sex information. However, if a subject is missing genetically determined sex, we cannot ascertain whether there is a mismatch, so these subjects are assumed to have consistent sex information and are retained.
```{r, message=F}
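# Rows where either value is NA give an NA comparison and are dropped by
# filter(), so subjects without genetic sex are not flagged as mismatches.
sex_mismatch_subject_ids <- pre_demog_sel %>%
  select(f.eid, SEX) %>%
  full_join(sampleqc %>% select(f.eid, genetic_sex), by = "f.eid") %>%
  filter(SEX != genetic_sex) %>% pull(f.eid)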
```
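The treatment of missing genetic sex follows from how `filter()` handles `NA`: the comparison `SEX != genetic_sex` evaluates to `NA` when either value is missing, and `filter()` keeps only rows where the condition is `TRUE`. The following is a minimal sketch on toy data. The IDs and sex labels are made up for illustration and are not assumed to match the actual UKB coding.

```{r}
library(dplyr)

# Toy tables; IDs and sex labels are hypothetical
toy_demog <- tibble(f.eid = c(1, 2, 3), SEX = c("Female", "Male", "Female"))
toy_qc <- tibble(f.eid = c(1, 2, 3), genetic_sex = c("Female", "Female", NA))

# Same pattern as above: join on the subject ID, keep rows that disagree
toy_mismatch_ids <- toy_demog %>%
  full_join(toy_qc, by = "f.eid") %>%
  filter(SEX != genetic_sex) %>%
  pull(f.eid)

# Only subject 2 is flagged; subject 3 (missing genetic sex) is not
toy_mismatch_ids
```

If subjects with missing genetic sex should instead be excluded, the condition would need an explicit `is.na(genetic_sex)` term; that is not what this chapter does.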
Filter the demographic table.
```{r}
demog_sel <- pre_demog_sel %>% filter(!(f.eid %in% sex_mismatch_subject_ids))
```
Filter the master event table.
```{r}
all_ukb_events_tab <- pre_all_ukb_events_tab %>% filter(!(f.eid %in% sex_mismatch_subject_ids))
```
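A quick sanity check after filtering can confirm that no excluded ID remains and that only rows belonging to excluded subjects were lost. The sketch below uses toy data; the table contents and the `event` column name are hypothetical.

```{r}
library(dplyr)

# Toy inputs; values are made up for illustration
toy_ids_to_drop <- c(2, 5)
toy_events <- tibble(
  f.eid = c(1, 2, 2, 3, 5),
  event = c("a", "b", "c", "d", "e")
)

# Same filtering pattern as used for the demographic and event tables
toy_events_filtered <- toy_events %>%
  filter(!(f.eid %in% toy_ids_to_drop))

# No excluded ID should remain in the filtered table
stopifnot(!any(toy_events_filtered$f.eid %in% toy_ids_to_drop))

# The number of rows removed should equal the number of rows
# that belonged to excluded subjects
stopifnot(
  nrow(toy_events) - nrow(toy_events_filtered) ==
    sum(toy_events$f.eid %in% toy_ids_to_drop)
)
```

The same two checks could be applied to `demog_sel` and `all_ukb_events_tab` against `sex_mismatch_subject_ids`.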
Save the sex-mismatch subject IDs, the filtered demographic table, and the filtered master event table.
```{r}
saveRDS(sex_mismatch_subject_ids,"generated_data/sex_mismatch_subject_ids.RDS")
saveRDS(demog_sel,"generated_data/demog_selected.RDS")
saveRDS(all_ukb_events_tab,"generated_data/all_ukb_events_tab.RDS")
```