Commit fcdeb09

committed
findind d paragraph
1 parent 2f1f13f commit fcdeb09

File tree

7 files changed

+181
-127
lines changed

content/posts/finance/stock_prediction/ARIMA/arima_example.ipynb

Lines changed: 32 additions & 33 deletions
Large diffs are not rendered by default.

content/posts/finance/stock_prediction/ARIMA/index.md

Lines changed: 62 additions & 30 deletions
@@ -77,11 +77,12 @@ from statsmodels.tsa.arima.model import ARIMA
7777
from statsmodels.tsa.stattools import adfuller
7878
from sklearn.metrics import mean_squared_error
7979
import yfinance as yf
80+
import seaborn as sns
8081

8182
# Download stock data
8283
ticker = "AAPL"
83-
start_date = "2010-01-01"
84-
end_date = "2023-06-23"
84+
start_date = "2018-01-01"
85+
end_date = "2024-06-23"
8586
data = yf.download(ticker, start=start_date, end=end_date)
8687

8788
# Prepare the data
@@ -92,40 +93,38 @@ def test_stationarity(timeseries):
9293
result = adfuller(timeseries, autolag='AIC')
9394
print('ADF Statistic:', result[0])
9495
print('p-value:', result[1])
96+
return result[1]
9597

96-
# if p-value is > 0.05, it means the series is not stationary.
97-
test_stationarity(ts)
98-
99-
# If non-stationary, difference the series
100-
ts_diff = ts.diff().dropna()
101-
test_stationarity(ts_diff)
98+
# Plot the time-series
99+
plt.figure(figsize=(12,6))
100+
plt.plot(ts.index[:], ts.values[:], label='Observed')
101+
plt.title(f'{ticker} Stock Price ')
102+
# plt.legend()
103+
plt.tight_layout()
104+
plt.show()
102105

103-
# Fit ARIMA model
104-
model = ARIMA(ts_diff, order=(1,1,1))
105-
results = model.fit()
106-
print(results.summary())
107106

108-
# Forecast
109-
forecast = results.forecast(steps=30)
107+
d, ts_diff = 0, ts  # defaults when the series is already stationary
p_val = test_stationarity(ts)
110108

111-
# Plot the results
112-
plt.figure(figsize=(12,6))
113-
plt.plot(ts.index[-100:], ts.values[-100:], label='Observed')
114-
plt.plot(forecast.index, forecast.values, color='r', label='Forecast')
115-
plt.fill_between(forecast.index,
116-
forecast.conf_int().iloc[:, 0],
117-
forecast.conf_int().iloc[:, 1],
118-
color='pink', alpha=0.3)
119-
plt.title(f'{ticker} Stock Price Prediction')
120-
plt.legend()
121-
plt.show()
109+
if p_val > 0.05:
110+
# If non-stationary, difference the series
111+
ts_diff = ts.diff().dropna()
112+
p_val = test_stationarity(ts_diff)
113+
d = 1
114+
if p_val > 0.05:
115+
ts_diff = ts.diff().diff().dropna()
116+
p_val = test_stationarity(ts_diff)
117+
d = 2
122118

123-
# Evaluate the model
124-
mse = mean_squared_error(ts.diff().dropna()[-30:], forecast)
125-
print(f'Mean Squared Error: {mse}')
119+
print(f"\nd = {d}")
126120
```
121+
> *Output:*
122+
>
123+
> d = 1
127124
128-
This script downloads stock data, checks for stationarity, fits an ARIMA model, makes predictions, and evaluates the model's performance.
125+
![png](images/time_series.png)
126+
127+
This script downloads stock data, plots the series, and checks for stationarity with the ADF test, differencing the series until the test passes. In this case, as expected from the plot, the time series is not stationary. Hence, *d* has to be greater than or equal to 1.
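
The nested checks can also be written as a loop that starts from `d = 0` and stops as soon as the test passes. The following is a minimal sketch, not the post's code: `choose_d`, `difference`, and the `p_value_fn` parameter are hypothetical names introduced here, and the test is passed in as a function so that the loop itself stays independent of any particular library.

```python
from typing import Callable, List, Sequence, Tuple


def difference(values: Sequence[float]) -> List[float]:
    """Return the first difference of a sequence (one element shorter)."""
    return [b - a for a, b in zip(values, values[1:])]


def choose_d(
    values: Sequence[float],
    p_value_fn: Callable[[Sequence[float]], float],
    alpha: float = 0.05,
    max_d: int = 2,
) -> Tuple[int, List[float]]:
    """Difference until p_value_fn reports p <= alpha, or max_d is reached.

    p_value_fn is a hypothetical hook: any function that maps a series to
    the p-value of a unit-root test.
    """
    current = [float(v) for v in values]
    d = 0  # stays 0 when the original series already passes the test
    while d < max_d and p_value_fn(current) > alpha:
        current = difference(current)
        d += 1
    return d, current
```

With statsmodels installed, the ADF test from the script could presumably be plugged in as `choose_d(ts, lambda s: adfuller(s, autolag='AIC')[1])`, which returns both the order and the differenced values.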
129128

130129
## 5. Model Selection and Diagnostic Checking
131130

@@ -148,7 +147,38 @@ Determining the optimal ARIMA parameters involves a combination of statistical t
148147
* Fine-tune with Information Criteria:
149148
- Use AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to compare different models.
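
To make the comparison concrete, here is a small sketch of how the two criteria are computed from a fitted model's log-likelihood. The candidate orders, log-likelihoods, parameter counts, and sample size below are made-up illustrative numbers, not results from the AAPL data; in practice a fitted statsmodels ARIMA result exposes `aic` and `bic` attributes directly.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: (order, log-likelihood, estimated parameters incl. noise variance).
candidates = [
    ((1, 1, 0), -1502.3, 2),
    ((1, 1, 1), -1500.9, 3),
    ((2, 1, 2), -1500.1, 5),
]
n_obs = 1600  # hypothetical sample size

scores = [(order, aic(ll, k), bic(ll, k, n_obs)) for order, ll, k in candidates]
best_aic = min(scores, key=lambda s: s[1])[0]
best_bic = min(scores, key=lambda s: s[2])[0]

for order, a, b in scores:
    print(order, round(a, 1), round(b, 1))
print("lowest AIC:", best_aic, "| lowest BIC:", best_bic)
```

Lower is better for both criteria. Because BIC's penalty grows with the sample size, it can prefer a smaller model than AIC does, which is what happens with these illustrative numbers.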
150149

151-
### Finding d values from plots
150+
### Finding d parameter from plots
151+
Since stationarity was already checked in the previous section, this section mainly serves as a graphical aid to understanding. Moreover, autocorrelation plots can suggest better values of *d* that the ADF test alone does not reveal.
152+
153+
```python
154+
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
155+
156+
plt.rcParams.update({'figure.figsize':(15,10), 'figure.dpi':80})
157+
158+
# Work on a copy of the downloaded data
159+
df = data.copy()
160+
161+
# Original Series
162+
fig, axes = plt.subplots(3, 2, sharex=False)
163+
axes[0, 0].plot(df.index, df.Close); axes[0, 0].set_title('Original Series - '+ticker)
164+
plot_acf(df.Close, ax=axes[0, 1], lags=len(df)-1, color='k', auto_ylims=True)
165+
166+
# 1st Differencing
167+
axes[1, 0].plot(df.index, df.Close.diff()); axes[1, 0].set_title('1st Order Differencing')
168+
plot_acf(df.Close.diff().dropna(), ax=axes[1, 1], lags=len(df)//7 - 2, color='k', auto_ylims=True)
169+
170+
# 2nd Differencing
171+
axes[2, 0].plot(df.index, df.Close.diff().diff()); axes[2, 0].set_title('2nd Order Differencing')
172+
plot_acf(df.Close.diff().diff().dropna(), ax=axes[2, 1], lags=len(df)//7 - 3, color='k', auto_ylims=True)
173+
174+
plt.tight_layout()
175+
plt.show()
176+
```
177+
![png](images/find_d.png)
178+
179+
Indeed, from the plot, *d=2* is probably a better choice, since only a few autocorrelation coefficients exceed the confidence threshold.
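
The idea of counting coefficients above the threshold can be sketched numerically. The example below is an illustration under simplifying assumptions: it uses a synthetic random walk rather than the AAPL series, and a flat approximate 95% band of ±1.96/√n, whereas `plot_acf` may draw a different, lag-dependent band. The helper names `sample_acf` and `count_significant` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def sample_acf(values: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0..r_max_lag (r_0 is always 1)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def count_significant(acf: Sequence[float], n: int) -> int:
    """Count lags k >= 1 with |r_k| above the flat band 1.96 / sqrt(n)."""
    band = 1.96 / math.sqrt(n)
    return sum(1 for r in acf[1:] if abs(r) > band)


random.seed(0)
# Synthetic random walk (illustrative data, not real prices).
walk: List[float] = []
total = 0.0
for _ in range(500):
    total += random.gauss(0.0, 1.0)
    walk.append(total)

first_diff = [b - a for a, b in zip(walk, walk[1:])]

counts: Dict[str, int] = {}
for name, series in [("level", walk), ("1st difference", first_diff)]:
    acf = sample_acf(series, max_lag=20)
    counts[name] = count_significant(acf, len(series))
    print(f"{name}: {counts[name]} of 20 lags outside the band")
```

A series that still needs differencing typically shows many lags outside the band, while an adequately differenced one shows only a few.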
180+
181+
### Finding p parameter from plots
152182

153183

154184
### Grid Search
@@ -175,6 +205,8 @@ def grid_search_arima(ts, p_range, d_range, q_range):
175205
best_order = grid_search_arima(ts_diff, range(3), range(2), range(3))
176206
```
177207

208+
209+
178210
## 6. Limitations and Considerations
179211

180212
While ARIMA models can be powerful for time series prediction, they have limitations:

public/index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

public/posts/finance/stock_prediction/arima/arima_example.ipynb

Lines changed: 32 additions & 33 deletions
Large diffs are not rendered by default.
