Skip to content

Commit 928453b

Browse files
committed
Conversion to R Notebook, published to RPubs
1 parent c8d7ffb commit 928453b

9 files changed

+38772
-1256
lines changed

.ipynb_checkpoints/FraminghamReshaping-checkpoint.ipynb

+5,317
Large diffs are not rendered by default.

Datasets/framingham.csv

+11,628
Large diffs are not rendered by default.

Datasets/framingham_ms.csv

+10,133
Large diffs are not rendered by default.

FraminghamReshaping.ipynb

+74-74
Large diffs are not rendered by default.

MSM Framingham Data.ipynb

+459
Large diffs are not rendered by default.

Markov Multi-State Model - Framingham Data.ipynb

-1,182
This file was deleted.

Markov.Rmd

+256
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,256 @@
1+
---
2+
title: A Markov Multi-State Model using the Framingham Data
3+
output: html_document
4+
---
5+
6+
Measuring stage duration and transition probability & intensity with the msm package.
7+
8+
The framingham_ms dataset contains 10,132 state observations, defined as:
9+
10+
- State 1: No Disease
11+
- State 2: Hypertension
12+
- State 3: Cardiovascular disease
13+
- State 4: Death
14+
- State 99: Censored
15+
16+
The original Framingham Data were reshaped using Python 3 in a notebook entitled FraminghamReshaping.ipynb
17+
18+
msm Package Citation:
19+
Jackson CH (2011). “Multi-State Models for Panel Data: The msm Package for R.” Journal of Statistical Software, 38(8), 1–29. doi:10.18637/jss.v038.i08.
20+
21+
Kyle P. Rasku MS BSN RN
22+
23+
```{r}
24+
library("msm")
25+
frmghm_ms <- read.csv("Datasets/framingham_ms.csv", header = TRUE, sep = ",")
26+
frmghm_ms
27+
```
28+
29+
## Define states and transitions allowed for model
30+
31+
Provide initial values representing a guess that there is an equal probability of progression, recovery or death (qrr = - SUM where s ne r of qrs).
32+
33+
Or, supply the option "gen.inits=TRUE" in the msm function call. This sets the initial values for non-zero entries of the Q matrix (transition intensity matrix) to the maximum likelihood estimates under the assumption that transitions take place only at observation times.
34+
35+
For transparency, I show the state table and MLE estimated Q matrix, and then set that to the qmatrix arg. given to msm.
36+
37+
```{r}
38+
twoway4.q <- rbind(c(0, 0.166, 0.166, 0.166), c(0, 0, 0.25, 0.25), c(0, 0, 0, 0.50), c(0,0,0,0))
39+
statetable.msm(STATE, RANDID, data=frmghm_ms)
40+
Q = crudeinits.msm(STATE ~ YEARS, RANDID, data=frmghm_ms, censor = 99, censor.states = c(1,2,3), qmatrix=twoway4.q)
41+
Q
42+
```
43+
44+
```{r}
45+
# Each initial transition block must add up to -qrr (0.5) based on the assumption
46+
# In state 1 there are 3 possible transitions - to hypertension (2), to cvd (3), or death (4)
47+
# In state 2 there are 2 possible transitions - to cvd (3), or death (4)
48+
# In state 3 there are 2 possible transition - to hypertension (2), or death (4)
49+
# In state 4 there are no further transitions possible (obviously)
50+
51+
52+
rownames(Q) <- colnames(Q) <- c("Well", "Hypertensive", "CVD", "Death")
53+
```
54+
55+
```{r}
56+
frm.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, qmatrix = Q, censor = 99,
57+
censor.states = c(1,2,3), exacttimes=TRUE)
58+
```
59+
60+
```{r}
61+
frm.msm
62+
```
63+
64+
## Interpreting the Transition Intensities
65+
66+
- Mean time in the Well state: 1/0.048348 = 20.7 years
67+
- Mean time in the Hypertensive state: 1/0.021638 = 46.2 years
68+
- Mean time in the CVD state: 1/0.150787 = 6.6 years
69+
70+
71+
- From the Well state, the likelihood of transition to Hypertensive is: 0.026774 (2.7%)
72+
- From the Well state, the likelihood of transition to CVD is: 0.009887 (1%)
73+
- From the Well state, the likelihood of transition to Death is 0.011687 (1.2%)
74+
75+
76+
- From the Hypertensive state, the likelihood of transition to CVD is: 0.011434 (1.1%)
77+
- From the Hypertensive state, the likelihood of transition to Death is: 0.010204 (1%)
78+
79+
80+
- From the CVD state, the likelihood of transition to Death is: 0.150787 **(15%)**
81+
82+
```{r}
83+
# Display the Transition Probability Matrix, P(t) over an interval of t=1 (in this case, 1 year)
84+
# ci = "normal" computes a confidence interval for P(t) by repeated sampling from the asymptotic
85+
# normal distribution of the maximum likelihood estimates of the log(qrs)
86+
# Based on a default 1000 samples, converged to within 2 significant figures
87+
88+
# NOTE:
89+
# ci = "boot" would instead compute intervals using nonparametric bootstrap resampling, drawn with replacement
90+
# the model is refitted repeatedly to estimate the sampling uncertainty surrounding the estimates
91+
# more accurate, but slower
92+
93+
pmatrix.msm(frm.msm, t = 1, ci = "normal")
94+
```
95+
96+
## Transition Probabilities
97+
98+
- The probability of being Well 1 year from now, given Well is 95%
99+
- The probability of being Hypertensive 1 year from now, given Well is 2.6%
100+
- The probability of being CVD 1 year from now, given Well is 1%
101+
- The probability of being Dead 1 year from now, given Well is 1.2%
102+
103+
104+
- The probability of being Hypertensive 1 year from now, given Hypertensive is 98%
105+
- The probability of being CVD 1 year from now, given Hypertensive is 1%
106+
- The probability of being Dead 1 year from now, given Hypertensive is 1%
107+
108+
109+
- The probability of being CVD 1 year from now, given CVD is 86%
110+
- The probability of being Dead 1 year from now, given CVD is 14%
111+
112+
```{r}
113+
# A model with covariates: sex, age, diabetes (yes or no) and smoker (yes or no)
114+
115+
frm.cov.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, covariates = ~ AGE + SEX + DIABETES + CURSMOKE,
116+
qmatrix = Q, method = "BFGS", exacttimes = TRUE, censor = 99,
117+
censor.states = c(1,2,3), control = list(fnscale=30000, maxit=10000))
118+
```
119+
120+
```{r}
121+
# Display hazard ratios for each covariate on each transition with 95% confidence intervals
122+
hazard.msm(frm.cov.msm)
123+
```
124+
125+
## Interpreting the (significant) Hazard Ratios
126+
127+
128+
- 1 year increase in age is associated with a 5% increased risk of CVD onset (Well -> CVD)
129+
- 1 year increase in age is associated with a 8% increased risk of Death (on average) (Well -> Death)
130+
- 1 year increase in age is associated with a 3.2% increased risk of CVD onset, if Hypertensive (HTN -> CVD)
131+
- 1 year increase in age is associated with a 4.4% increased risk of Death, if Hypertensive (HTN -> Death)
132+
- 1 year increase in age is associated with a 5.8% increased risk of Death, if CVD (CVD -> Death)
133+
134+
135+
- Being Female increases the risk of Hypertension onset (Well -> HTN) by 39% on average.
136+
- Being Female decreases the risk of CVD onset (Well -> CVD) by 51% on average.
137+
- Being Female decreases the risk of Death (Well -> Death) by 19% on average.
138+
- Being Female decreases the risk of CVD, given HTN (HTN -> CVD) by 45% on average.
139+
- Being Female decreases the risk of Death, given HTN (HTN -> Death) by 33% on average.
140+
141+
142+
- An initial diagnosis of Diabetes increases the risk of CVD onset (Well -> CVD) by an avg. of **170%**
143+
- An initial diagnosis of Diabetes increases the risk of CVD onset, if Hypertensive (HTN -> CVD) by an avg. of **187%**
144+
- An initial diagnosis of Diabetes increases the risk of Death for those who already have HTN by 103%.
145+
- An initial diagnosis of Diabetes increases the risk of Death for those who already have CVD by 57%.
146+
147+
148+
- Being a smoker is associated with a 27% lower risk of becoming Hypertensive if Well (these are younger people).
149+
- Being a smoker is associated with a 26% increased risk of CVD, if Well.
150+
- Being a smoker is associated with a 39% increased risk of Dying, if Well.
151+
- Being a smoker is associated with a 29% increased risk of CVD, if Hypertensive.
152+
- Being a smoker is associated with a 25% increased risk of Death, if CVD.
153+
154+
```{r}
155+
# Calculating the Transition Intensity Matrix for specified covariates
156+
# Age: 40, Primary Diagnosis: Diabetes & Smoker
157+
# Note: Avg age is 50
158+
159+
qmatrix.msm(frm.cov.msm, covariates = list(AGE = 40, DIABETES = 1, CURSMOKE = 1))
160+
```
161+
162+
#### Comparing this patient to the average patient
163+
164+
- Average patient's likelihood of transition from Well -> HTN 0.026774 (2.7%), for this patient it is **9.7%**
165+
- Average patient's likelihood of transition from Well -> CVD 0.009887 (1%), for this patient it is **5.3%**
166+
- Average patient's likelihood of transition from HTN -> CVD 0.011434 (1.1%), for this patient it is **5.9%**
167+
- Average patient's likelihood of transition from CVD -> Death 0.150787 (15%), for this patient it is **6.6%**
168+
169+
170+
- Average patient's mean time in Well state 20.7 yrs, this patient **13.4 yrs**
171+
- Average patient's mean time in Hypertensive state 46.2 yrs, this patient **12.3 yrs**
172+
- Average patient's mean time in CVD state 6.6 yrs, this patient **15.2 yrs**
173+
174+
```{r}
175+
# Does the model with covariates fit significantly better than the one without?
176+
# Compare the likelihood ratio statistic to Chi-square distribution with 24 degrees of freedom
177+
178+
lrtest.msm(frm.msm, frm.cov.msm)
179+
```
180+
181+
The p-value is highly significant.
182+
183+
## When Q is piecewise-constant
184+
185+
Transition probabilities can be calculated in closed form by summing the likelihood over the unknown observed state at the times when the covariates change.
186+
187+
```{r}
188+
# Fitting a model where all intensities change 12 years after the beginning of the study
189+
# Divides data into 2 time periods: -Inf to 12 yrs, and 12 yrs to Inf
190+
# The study lasted a little more than 24 years, so this is about the halfway point
191+
192+
frm.pci.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, qmatrix = Q,
193+
pci = 12, method = "BFGS", exacttimes = TRUE,
194+
censor = 99, censor.states = c(1, 2, 3),
195+
control = list(fnscale=30000, maxit=10000))
196+
197+
# Is this data truly time-inhomogenous?
198+
lrtest.msm(frm.msm, frm.pci.msm)
199+
```
200+
201+
It is very likely that the data is time-inhomogenous.
202+
203+
```{r}
204+
hazard.msm(frm.pci.msm)
205+
```
206+
207+
## Diagnostic Plots
208+
209+
Comparing model predictions with Kaplan-Meier curves
210+
211+
```{r}
212+
par(mfrow = c(2, 2))
213+
plot.survfit.msm(frm.msm, main = "frm.msm: no covariates", mark.time = FALSE, legend.pos=c(0,0))
214+
plot.survfit.msm(frm.cov.msm, main = "frm.cov.msm: covariates", mark.time = FALSE, legend.pos=c(0,0))
215+
plot.survfit.msm(frm.pci.msm, mark.time = FALSE, legend.pos=c(0,0))
216+
title("frm.pci.msm: time-inhomogeneous", line = 2)
217+
title("(12 year change point)", line = 1)
218+
frm.pci2.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, qmatrix = Q,
219+
pci = c(6,12,18), method = "BFGS", exacttimes = TRUE,
220+
censor = 99, censor.states=c(1, 2, 3),
221+
control = list(fnscale=30000, maxit=10000))
222+
plot.survfit.msm(frm.pci2.msm, mark.time = FALSE, legend.pos=c(0,0))
223+
title("frm.pci2.msm: time-inhomogeneous", line = 2)
224+
title("(6, 12 and 18 year change points)", line = 1)
225+
```
226+
227+
The fit is much improved after adding censoring!
228+
229+
Adding additional time-change points doesn't appear to help the fit, but covariates likely do, although the differences between the first two graphs are not readily apparent, the differences in the model fits are.
230+
231+
## Comparing Observed and Expected Prevalence
232+
233+
Works best when individuals are actually observed at the computed times, otherwise assumptions are made such as individuals are only observed at these times, or midpoints are assumed.
234+
235+
The observed prevalence of a state is simply calculated as the number of individuals known to be in that state, divided by the number of individuals whose state is known at that time, which ignores the information from individuals censored at earlier times (root of Kaplan-Meier estimation :))
236+
237+
```{r}
238+
# Need to look at how this is implemented by Gentleman et al. 1994 using prevalence.msm, and plot.prevalence.msm
239+
```
240+
241+
## None of these models give an adequate fit
242+
243+
A more complex pattern of time-dependence or allowing transition intensities to depend on covariates would likely yield a better fit.
244+
245+
TO DO => Figure out how to allow transition intensities to depend on covariates!
246+
247+
## It is also possible to calculate the influence of each individual on the MLE
248+
249+
Using scoreresid.msm
250+
251+
## Extensions of msm and limitations
252+
253+
For continuously-observed processes: mstate (deWreede et al. 2010)
254+
255+
For Random Effects models (unexplained heterogeneity in transition intensities between individuals) - calculating likelihood often intractable with a few exceptions: tracking model - random effect acts on all intensities simultaneously (Satten 1999), or a discrete random effects distribution (Cook et al 2004)
256+

Markov.html

+10,894
Large diffs are not rendered by default.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
name: Document
2+
title:
3+
username:
4+
account: rpubs
5+
server: rpubs.com
6+
hostUrl: rpubs.com
7+
appId: https://api.rpubs.com/api/v1/document/1044678/fc4148f2394f498bb71224dd261a2ca3
8+
bundleId: https://api.rpubs.com/api/v1/document/1044678/fc4148f2394f498bb71224dd261a2ca3
9+
url: http://rpubs.com/publish/claim/1044678/c8f634d9a24342269183bd913a205582
10+
when: 1684798169.12327
11+
lastSyncTime: 1684798169.12329

0 commit comments

Comments
 (0)