-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCausal_Graph_Model_Analysis.R
195 lines (155 loc) · 8.01 KB
/
Causal_Graph_Model_Analysis.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
"
Full Causal Graph for Bug inspection accuracy in the context of Experiment-2
Verification of the assumptions encoded in the grpah:
1- List of marginal independences (unconditional)
2- List of conditional independences
3- Adjustments to identify direct effect paths
4- Adjustments to identify total effect paths
5- Adjustments to identify indirect effect paths
6- Sensitivity analysis
6.1 Change of priors
6.2 Change of node edges
"
library(ggdag)
library( dagitty )
library (ggm)
library(tidyr)
library(devtools)
dag <- ' dag {
bb="-0.5,-0.5,0.5,0.5";
Accuracy [outcome,pos="0.061,-0.054"];
Programmer.Skill [exposure,pos="-0.327,-0.120"];
Task.Type [exposure,pos="-0.037,-0.225"];
Code.Complexity [exposure, pos="-0.327,-0.225"];
Answer.Type [pos="-0.327,-0.054"];
Confidence [pos="-0.142,-0.054"];
Duration [pos="-0.037,-0.120"];
Explanation [pos="0.061,-0.179"];
Difficulty [pos="-0.142,-0.179"];
Answer.Type -> Confidence;
Code.Complexity -> Difficulty;
Confidence -> Accuracy;
Duration -> Confidence;
Explanation -> Accuracy;
Explanation -> Duration;
Difficulty -> Duration;
Difficulty -> Explanation;
Programmer.Skill -> Confidence;
Programmer.Skill -> Difficulty;
Task.Type -> Explanation;
Task.Type -> Difficulty;
}'
graph <- dagitty(dag)
plot(graph)
tidy_dagitty(graph)
ggdag(graph, layout = "circle")
#check layout and color options:
#https://cran.r-project.org/web/packages/ggdag/vignettes/intro-to-ggdag.html
tidy_dagitty(graph, layout = "fr") %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_node(col="lightgrey") +
geom_dag_text(col="black") +
geom_dag_edges() +
theme_dag()
impliedConditionalIndependencies(graph)
"INDEPENDENCIES IMPLIED BY THE CAUSAL GRAPH
After applying the d-separation criterion to all pairs of variables, we
obtained a set of independencies that are implied by the causal graph model."
"Marginal independencies
Some marginal independencies exist by definition, because the covariates are exposure
variables that were randomized in the experiment design.
Expected marginal independencies because of the randomized task design
Code.Complexity _||_ Task.Type
Expected marginal independencies because of the randomized task allocation
Answer.Type _||_ Code.Complexity
Answer.Type _||_ Difficulty
Answer.Type _||_ Duration
Answer.Type _||_ Explanation
Answer.Type _||_ Programmer.Skill
Answer.Type _||_ Task.Type
Code.Complexity _||_ Programmer.Skill
Programmer.Skill _||_ Task.Type
Conditional Independencies
Related to the Outcome Variable
Below are equivalent sets of independencies. They are equivalent in a sense that
the results (independence) should hold by conditioning on any of the different
sets of covariates listed. These equivalences will be used to verify if the
graph against the data.
Accuracy _||_ Answer.Type | Confidence, Duration, Programmer.Skill
Accuracy _||_ Answer.Type | Confidence, Difficulty, Duration, Task.Type
Accuracy _||_ Answer.Type | Confidence, Explanation
Accuracy _||_ Code.Complexity | Difficulty, Programmer.Skill, Task.Type
Accuracy _||_ Code.Complexity | Difficulty, Explanation, Programmer.Skill
Accuracy _||_ Code.Complexity | Duration, Explanation, Programmer.Skill
Accuracy _||_ Code.Complexity | Confidence, Difficulty, Duration, Task.Type
Accuracy _||_ Code.Complexity | Confidence, Explanation
Accuracy _||_ Difficulty | Duration, Explanation, Programmer.Skill
Accuracy _||_ Difficulty | Confidence, Explanation
Accuracy _||_ Duration | Confidence, Explanation
Accuracy _||_ Programmer.Skill | Confidence, Difficulty, Duration, Task.Type
Accuracy _||_ Programmer.Skill | Confidence, Explanation
Accuracy _||_ Task.Type | Difficulty, Explanation, Programmer.Skill
Accuracy _||_ Task.Type | Duration, Explanation, Programmer.Skill
Accuracy _||_ Task.Type | Confidence, Explanation
Dependent variables related
Code.Complexity _||_ Confidence | Duration, Programmer.Skill
Code.Complexity _||_ Confidence | Difficulty, Explanation, Programmer.Skill
Code.Complexity _||_ Confidence | Difficulty, Programmer.Skill, Task.Type
Code.Complexity _||_ Duration | Difficulty, Explanation
Code.Complexity _||_ Duration | Difficulty, Task.Type
Code.Complexity _||_ Explanation | Difficulty, Task.Type
Confidence _||_ Difficulty | Duration, Programmer.Skill
Confidence _||_ Explanation | Difficulty, Duration, Task.Type
Confidence _||_ Explanation | Duration, Programmer.Skill
Confidence _||_ Task.Type | Difficulty, Explanation, Programmer.Skill
Confidence _||_ Task.Type | Duration, Programmer.Skill
Duration _||_ Programmer.Skill | Difficulty, Task.Type
Duration _||_ Programmer.Skill | Difficulty, Explanation
Duration _||_ Task.Type | Difficulty, Explanation
Explanation _||_ Programmer.Skill | Difficulty, Task.Type
"
"IMPLIED OPEN and CLOSED CAUSAL PATHS
A causal path is a chain of cause-effect dependencies that link the expousure variable with the outcome variable.
Each path contains a covariate only once. Causal paths can be opened or closed.
Opened means that exposure and outcome variables are d-connected through that path.
Closed means that exposure and outcome variables are d-separated through that path.
A path is closed if at least one of the covariates in the path is a collider node, otherwise,
the path is opened. A collider is node that has at least two parents.
We can 'open' these paths by conditioning on the collider itself (i.e. on M, in this instance),
or any covariate that is a 'descendant' of the collider.
Conditioning is necessary to close backdoor paths, which can bias the estimate of the
effect of the exposure on the outcome. Assuming that we have a hypothesis about a set of
Z variables that we would like to condition on. The path p between exposure e and outcome o
will be closed in the following two situations:
the path p with a set of nodes x, z, y contains a chain x -> z -> y or a fork x <- Z -> y and z is in Z
the path p with a set of nodes x, z, y contains a chain x -> Z <- y and z is NOT in Z
We compute it for the exposure variables.
We can also verify these paths against the data.
"
paths(graph,from=exposures(graph),to=outcomes(graph),directed = TRUE)
#{EMPTY}
"If we considered all exposures (i.e., set their values), all directed paths are empty.
This means that only setting the value of this variables we cannot compute
the causal effect on the outcome variable. We need to set the value of other variables.
"
paths(graph,from=exposures(graph),to=outcomes(graph),directed = FALSE)
# [1] "Programmer.Skill -> Confidence -> Accuracy"
# [2] "Programmer.Skill -> Difficulty -> Duration -> Confidence -> Accuracy"
# [3] "Programmer.Skill -> Difficulty -> Explanation -> Accuracy"
# [4] "Programmer.Skill -> Difficulty -> Explanation -> Duration -> Confidence -> Accuracy"
paths(graph,c("Programmer.Skill"),"Accuracy",directed = FALSE)
# [1] "Programmer.Skill -> Confidence -> Accuracy"
# [2] "Programmer.Skill -> Confidence <- Duration <- Difficulty -> Explanation -> Accuracy"
# [3] "Programmer.Skill -> Confidence <- Duration <- Difficulty <- Task.Type -> Explanation -> Accuracy"
# [4] "Programmer.Skill -> Confidence <- Duration <- Explanation -> Accuracy"
# [5] "Programmer.Skill -> Difficulty -> Duration -> Confidence -> Accuracy"
# [6] "Programmer.Skill -> Difficulty -> Duration <- Explanation -> Accuracy"
# [7] "Programmer.Skill -> Difficulty -> Explanation -> Accuracy"
# [8] "Programmer.Skill -> Difficulty -> Explanation -> Duration -> Confidence -> Accuracy"
# [9] "Programmer.Skill -> Difficulty <- Task.Type -> Explanation -> Accuracy"
# [10] "Programmer.Skill -> Difficulty <- Task.Type -> Explanation -> Duration -> Confidence -> Accuracy"
# $open [1] TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE
adjustmentSets(graph,exposure = "Task.Type",outcome = "Accuracy",effect = c("direct"))
#{ Confidence, ExplanationSize }
#{ Duration, ExplanationSize }
#{ Difficulty }