-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathProject_Code.Rmd
194 lines (134 loc) · 5.48 KB
/
Project_Code.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
---
title: "Project"
author: "Gema Vidal_7526, Tadaishi Yatabe-Rodriguez"
date: "March 6, 2017"
output: word_document
---
Loading packages
```{r, eval=FALSE}
library("astsa")
library("forecast")
library("nlme")
```
Setting working directory
```{r, eval=FALSE}
# from PC
setwd("/Users/gvidal/Box Sync/Vet/MPVM & PhD/PhD 2016_2017 Winter Quarter ([email protected])/STA 137/Project")
# from Mac
#setwd("/Users/gemavidal/Box Sync/Vet/MPVM & PhD/PhD 2016_2017 Winter Quarter ([email protected])/STA 137/Project")
# From Tada's PC
setwd("C:/Users/tyatabe/OneDrive/Docs/PhD Epi/Winter_17/Time series/Project/Time-series/")
df <- read.table("bostonArmedRobberies.txt", header = FALSE)
head(df)
dim(df)
```
1. Use graphical techniques to inspect the data. Describe the behaviour of the data. Mention any features you think may be important for analysis and forecasting.
```{r, eval=FALSE}
plot(df$V1, df$V2, type = "l", xlab = "Time", ylab = "Number of Armed Robberies", main = "Number of Monthly Armed Robberies in Boston")
```
Variance: unequal variance
Trend: increasing trend over time
Seasonality:
2. In what way(s) is the data not stationary? Use transformations and/or differencing to make the series stationary. Include a time plot of the transformed and/or differenced data. If you chose a transformation, use this transformation for the remainder steps.
```{r, eval=FALSE}
robs_sqrt = round(sqrt(df$V2), 2)
robs_log = round(log(df$V2), 2)
robs_third = (df$V2)^(1/3)
robs_subs = (df$V2)^(-1)
df_trans = data.frame(robs_sqrt, robs_log, robs_third, robs_subs)
# use tada's code that includes the titles of the plots.
par(mfrow = c(2,1))
for (d in df_trans) {
plot(df$V1, d, type = "l", xlab = "Time")
}
# we have to decide how we want to make the data stationary (transformations? differencing?)
```
3. Using the tranformed and/or differenced data from the previous part, obtain the sample ACF and PACF plots, as well as plots of the raw periodogram, and its smoothed version. Comment on the plots. Use the plots to make a preliminary guess for an appropriate ARIMA model. Keep in mink that differencing plays a part in determining whether the model should be ARMA or ARIMA.
```{r, eval=FALSE}
# sample ACF and PACF
Acf(robs_third, lag.max = 30)
Pacf(robs_third, lag.max = 30)
# raw periodogram and smoothed periodogram
```
4. Fit the model form the previous step. Include the model, and the parameter estimates. Plot the fitted values and the observed values on the same plot.
```{r, eval=FALSE}
# how to know what model to fit? Looking at ACF and PACF?
fit =
```
5. Examine the residuals. Provide necessary plots and/or hypothesis test results. Do the residuals resemble Gaussian white noise?
```{r, eval=FALSE}
# residuals
res = fit$residuals
res
ts.plot(res)
# sample ACF and PACF
par(mfrow = c(2,1))
Acf(res)
Pacf(res)
# Box-Ljung test
Box.test(res, lag=10, type="Ljung")
# histogram and normal QQ-plot
par(mfrow = c(2,1))
hist(res)
qqnorm(res)
qqline(res)
```
6. Use AICc to select an ARIMA model for the (possible transformed) data. Keep in mind that differencing should be incorporated into the model. It is fine to use the function auto.arima() here. It is enough to consider p = 0, ..., 8, q = 0, ...,8, and d = 0, 1, 2. Include the chose model, and provide parameter estimates and their standard errors.
```{r, eval=FALSE}
```
7. Inspect the residuals of this model. Provide necessary plots and/or hypothesis test results. Do the residuals resemble Gaussian white noise?
```{r, eval=FALSE}
```
8. Plot the (theoretical) spectral density of the final model together with the smoothed periodogram. Comment on the plots. Describe the method you chose for smoothing the preriodogram.
```{r, eval=FALSE}
```
9. Now remove the data for 1975 (the last 10 observations). Using only data from 1966 - 1975 (the first 108 observations) fit an ARIMA model using AICc. Again it is fine to use auto.arima(). Then do the following.
```{r, eval=FALSE}
dim(df)
# taking the first 158 observations.
subset_robs = df[1:108, ]
subset_fsc = df[109:nrow(df), ]
```
a. Write down the chosen model. Include parameter estimates
```{r, eval=FALSE}
fit = auto.arima()
```
b. Inspect the residuals of this model. Do they resemble Gaussian white noise?
```{r, eval=FALSE}
# residuals
res = fit$residuals
res
ts.plot(res)
# sample ACF and PACF
par(mfrow = c(2,1))
Acf(res)
Pacf(res)
# Box-Ljung test
Box.test(res, lag=10, type="Ljung") # p-value: 0.5725
# histogram and normal QQ-plot
par(mfrow = c(2,1))
hist(res)
qqnorm(res)
qqline(res)
```
c. Compute point forecasts of the values for January through October 1975. If you used a transformation, then be sure to compute forecasts of the original data, not the transmformed data.
```{r, eval=FALSE}
fst = predict(fit, n.ahead = 10)
fst$pred
```
d.Make a time plot of the entire data set, the point forecasts, and 95% prediction intervals. Make another plot of just the observed values from 1975 along with the point forecasts and the prediction intervals. Commnent on the forecast performance.
```{r, eval=FALSE}
# confirm this code
n = 158
h = 5
par(mfrow = c(1,1))
ts.plot(temp[,2])
polygon(x = c(n+(1:h), n+(h:1)), y = c(upper, lower[h:1]), col = 'lightblue', border = NA)
points(x = n+1:h, y = fst$pred, col = 'purple', type = 'b', pch = 19)
# zoom in
ts.plot(subset_fsc[,2], ylim = c(0.30, 0.48))
points(seq(1,5), fst$pred, col = 'purple', type = 'b', pch = 19)
upper = fst$pred + fst$se
lower = fst$pred - fst$se
polygon(x = c(seq(1,5), seq(5,1)), y = c(upper, lower[h:1]), col = 'lightblue', border = NA)
```