-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
141 lines (107 loc) · 5.37 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(fig.path = "man/figures/README-")
```
```{r echo=FALSE, message=FALSE, warning=FALSE, eval = FALSE}
devtools::load_all()
```
# gutsandcages
The goal of `gutsandcages` is to estimate statistical power for microbiome
experiments in mice and rats.
## Installation
You can install the development version with:
```{r eval = FALSE}
# install.packages("devtools")
devtools::install_github("PennChopMicrobiomeProgram/gutsandcages")
```
## Example
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(pwr)
library(ggplot2)
library(gutsandcages)
library(future)
```
The `gutsandcages` library computes statistical power by simulation. For a
given experimental design and effect size, our goal is to estimate the fraction
of trials in which the null hypothesis will be rejected.
To demonstrate use of the package, we'll run simulations on a very simple
experimental design, one that we might encounter in statistics 101. Then, we'll
compare the result from our simulations with the textbook formula for a t-test.
In our simple experiment, we have 10 mice in each group, one per cage. Here's
the textbook formula for statistical power, as implemented in the `pwr`
library. The effect size, *d*, is the difference in means between the two
groups divided by the standard deviation (which we assume to be the same in
both groups).
```{r}
textbook_power <- tibble(d = seq(0.5, 2, by = 0.01)) %>%
group_by(d) %>%
mutate(power = pwr.t.test(n = 10, d = d)$power)
```
In `gutsandcages`, we first specify the experimental design.
```{r}
expt <- make_expt(ncage_treatment = 10, mice_per_cage_treatment = 1)
```
Then, we use this object to run simulations at each value of the effect size in two steps.
First, we get the overall test result in a tibble using `get_test_res`,
and then use `get_power` function to calculate the power.
We use a small number of simulations to make the computation more speedy and to
give you a feel for the uncertainty in computing statistical power by
simulation.
```{r}
simulated_power <- get_power(expt, d = seq(0.5, 2, by = 0.01), p = 0, nsim = 20)
```
```{r basicpower}
simulated_power %>%
pivot_longer(names_to = "approach", values_to = "power", cols = -d) %>%
mutate(approach = str_replace(approach,"power", "simulated")) %>%
mutate(approach = str_replace(approach, "t_test_power", "textbook")) %>%
ggplot(aes(x = d, y = power, color = approach)) +
geom_line() +
scale_color_brewer(palette = "Paired") +
theme_bw()
```
Because simulation takes long time, `get_power` also implement features that you could run this in parallel on your local computer.
Here is an example:
```{r eval= FALSE}
library(future)
plan(multisession)
sim_test<- get_power(expt,d = seq(0.5, 2, by = 0.01), p = 0, nsim = 200)
```
For mouse experiment, it is common for mice to share a cage. The gut microbiome of the mice that share the same cage
are affected by each other. In linear mixed model, we call it random effect. Therefore, when we have more than one mice per cage, we used a linear mixed model to account for the random effect of the cage.
when you specify your experimental design, make sure to specify number of cages if you have shared cages.
The function `get_power` will gather the information and use a linear mixed model instead a linear model to calculate the power. Also it is important for the model to know how much cage affected gut microbiome, we specify the propotion of the randome effect with p, which is also referred as intraclass correlation coefficient, more details here https://bookdown.org/anshul302/HE902-MGHIHP-Spring2020/Random.html#ICCintro
When you have one mouse per cage, p = 0, when you have multiple mice per cage, p is within the range of 0 to 1. it is the proportion of the effect on cage.
```{r eval = FALSE}
expt2 <- make_expt(ncage_treatment = 10, mice_per_cage_treatment = 3)
power <- get_power(expt2, seq(0.2, 3, by= 0.4), p = c(0.6,0.9), nsim = 20)
```
Let's look at how different ICC affect the power. Compare different ICC, from low to high (0.1, 0.5, 0.9)
Black dashed line is reference power from t test ncage = 10, and red dashed line is t test ncage = 50
```{r}
expt_icc <- make_expt(ncage_treatment = 10, mice_per_cage_treatment = 5)
plan(multisession)
icc_r <- get_power(expt_icc, seq(0.5, 3, by = 0.1), c(0.1, 0.5, 0.9), nsim = 1000) %>%
mutate(ICC = paste0(rep(c("low","medium", "high"), time = n()/3))) %>%
select(-t_test_power)
no_icc_r <- get_power(make_expt(ncage_treatment = 10, mice_per_cage_treatment = 1), seq(0.5, 3, by = 0.1), p = 0, nsim = 1000) %>%
mutate(p = 0) %>%
mutate(ICC = "No")
no_icc_r_m <- no_icc_r %>% select(-t_test_power)
no_icc_50 <- get_power(make_expt(ncage_treatment = 50, mice_per_cage_treatment = 1), seq(0.5, 3, by = 0.1), p = 0, nsim = 1000) %>%
rename(t_test_power2 = t_test_power)
rbind(icc_r, no_icc_r_m) %>%
mutate(ICC = factor(ICC, levels = c("No","low","medium", "high"))) %>%
left_join(no_icc_r %>% select(d, t_test_power), by = "d") %>%
left_join(no_icc_50 %>% select(d, t_test_power2), by = "d") %>%
ggplot(aes(x = d, y = power, color = ICC)) +
geom_line() +
scale_color_brewer(palette = "Dark2") +
theme_bw() +
geom_line(aes(y=t_test_power), linetype = "dashed", color = "black") +
geom_line(aes(y=t_test_power2), linetype = "dashed", color = "darkred")
```