index.qmd

---
title: "Sign and Zero Restrictions: Optimism Shock on the Australian Business Cycle"
author: "Adam Wang"

execute:
  echo:    false
  # eval:    false
  warning: false
  cache:   true

bibliography: references.bib
---

<!-- latex shortcuts -->

\def\*#1{\mathbf{#1}}
\def\e{\boldsymbol{\varepsilon}}

> **Abstract.** This article investigates the effects of optimism shocks on the Australian economy using a Bayesian Structural Vector Autoregression (BSVAR) model. We implement the sign and zero restrictions algorithm proposed by @arias2018inference to identify the optimism shock. Impulse response functions (IRF) and forecast error variance decomposition (FEVD) are used to analyse the effects of the optimism shock on five key macroeconomic variables: productivity, stock prices, consumption, real interest rate and hours worked.
>
> **Keywords.** Bayesian Structural VAR, sign restrictions, zero restrictions, optimism shock

<!-- [Replication Package](https://www.econometricsociety.org/publications/econometrica/2018/03/01/inference-based-structural-vector-autoregressions-identified) -->

# Introduction

**Objective**: The goal of this research project is to implement the sign and zero restrictions algorithm proposed by @arias2018inference in the [bsvarSIGNs](https://github.com/bsvars/bsvarSIGNs) package, and apply it to analyse the optimism shock to Australian economy.

**Question**: Does optimism shocks drive the business cycles in Australian economy?

**Motivation**: In macroeconomics, there has been a longstanding belief that fluctuations in business cycles can be largely attributed to episodes of optimism and pessimism. An optimism shock is defined as a positive shock to stock prices that does not affect productivity. Using a penalty function approach (PFA), @beaudry2011mood finds significant evidence that optimism shocks are a key driver of business cycles in the United States as it creates a simultaneous boom in consumption and hours worked. But, @arias2018inference argue that PFA imposes additional restrictions, and they find less significant results using an importance sampler algorithm. This research project aims to extend the analysis to the Australian economy by implementing the importance sampler algorithm, and compare the results with United-States data.

# Data

<!-- load data -->

```{r include=FALSE}
library(readrba)
library(readabs)
library(tidyquant)
library(tseries)
library(tidyverse)
library(knitr)
library(zoo)


consumer_price_index = read_rba(series_id = 'GCPIAG')

productivity         = read_rba(series_id = 'GNFPROSQI')

asx200               = tq_get('^AXJO', from = '1994-01-01')
# aord               = tq_get('^AORD', from = '1994-01-01')
# stock_prices       = asx200 / consumer_price_index

retail_turnover      = read_abs(series_id = 'A3348585R')
# consumption        = retail_turnover / consumer_price_index

cash_rate            = read_rba(series_id = 'FIRMMCRI')
inflation            = read_rba(series_id = 'GCPIAGSAQP')
# real_interest_rate = cash_rate - inflation

hours_worked         = read_abs(series_id = 'A85389461V')
```

<!-- concat datasets -->

```{r include=FALSE}
df_consumer_price_index = consumer_price_index |> 
  select(date, value) |>
  rename(consumer_price_index = value) |> 
  mutate(date = as.yearmon(date))

df_productivity = productivity |> 
  select(date, value) |>
  rename(productivity = value) |> 
  mutate(date = as.yearmon(date))

df_asx200 = asx200 |>
  rename(asx200 = close) |>
  na.locf(fromLast = TRUE) |>
  mutate(ym = as.yearmon(date)) |>
  slice_max(date, by = ym) |>
  select(date, asx200) |>
  mutate(date = as.yearmon(date))

# df_aord = aord |> 
#   rename(aord = close) |> 
#   na.locf(fromLast = TRUE) |> 
#   mutate(ym = as.yearmon(date)) |> 
#   slice_max(date, by = ym) |> 
#   select(date, aord) |> 
#   mutate(date = as.yearmon(date))

df_retail_turnover = retail_turnover |>
  select(date, value) |>
  rename(retail_turnover = value) |> 
  mutate(date = as.yearmon(date))

df_cash_rate = cash_rate |>
  select(date, value) |>
  rename(cash_rate = value) |> 
  mutate(date = as.yearmon(date))

df_inflation = inflation |>
  select(date, value) |>
  rename(inflation = value) |> 
  mutate(date = as.yearmon(date))

df_hours_worked = hours_worked |>
  select(date, value) |>
  rename(hours_worked = value) |> 
  mutate(date = as.yearmon(date))

df = 
  merge(df_consumer_price_index, df_productivity, by = 'date') |>
  merge(df_asx200, by = 'date') |>
  merge(df_retail_turnover, by = 'date') |>
  merge(df_cash_rate, by = 'date') |>
  merge(df_inflation, by = 'date') |>
  merge(df_hours_worked, by = 'date') |> 
  mutate(productivity       = log(productivity),
         stock_prices       = log(asx200 / consumer_price_index),
         consumption        = log(retail_turnover / consumer_price_index),
         real_interest_rate = cash_rate - inflation,
         hours_worked       = log(hours_worked)
         ) |> 
  select(date, productivity, stock_prices, consumption, real_interest_rate, hours_worked)
```

All data are collected from the Reserve Bank of Australia (RBA), the Australian Bureau of Statistics (ABS) and Yahoo Finance. The sample period covers 1994 Q3 to 2023 Q4. Following @beaudry2011mood, we select the following five variables for our analysis

- **Productivity**: non-farm labour productivity per hour (source: RBA, series ID GNFPROSQI).

```{r}
#| fig-cap: "Productivity"
library(ggplot2)

covid_index = 103

df |> 
  ggplot(aes(x = date, y = productivity)) +
  geom_line() +
  geom_vline(xintercept = df$date[covid_index], color = "red", linetype = "dashed") +
  annotate("text", x = df$date[covid_index+8], y = min(df$productivity), label = "Covid-19", color = "red") +
  theme_bw()
```


- **Stock prices**: end-of-period ASX 200 index (source: Yahoo Finance, ticker symbol \^AXJO), divided by the consumer price index.

```{r}
#| fig-cap: "Stock prices"
df |> 
  ggplot(aes(x = date, y = stock_prices)) +
  geom_line() +
  geom_vline(xintercept = df$date[covid_index], color = "red", linetype = "dashed") +
  annotate("text", x = df$date[covid_index+8], y = min(df$stock_prices), label = "Covid-19", color = "red") +
  theme_bw()
```

- **Consumption**: retail turnover (source: ABS, series ID A3348585R), divided by the consumer price index.

```{r}
#| fig-cap: "Consumption"
df |> 
  ggplot(aes(x = date, y = consumption)) +
  geom_line() +
  geom_vline(xintercept = df$date[covid_index], color = "red", linetype = "dashed") +
  annotate("text", x = df$date[covid_index+8], y = min(df$consumption), label = "Covid-19", color = "red") +
  theme_bw()
```

- **Real interest rate**: over-night cash rate nets inflation (source: RBA, series ID FIRMMCRI and GCPIAGSAQP).

```{r}
#| fig-cap: "Real interest rate"
df |> 
  ggplot(aes(x = date, y = real_interest_rate)) +
  geom_line() +
  geom_vline(xintercept = df$date[covid_index], color = "red", linetype = "dashed") +
  annotate("text", x = df$date[covid_index+8], y = min(df$real_interest_rate), label = "Covid-19", color = "red") +
  theme_bw()
```

- **Hours worked**: total hours worked (source: ABS, series ID A85389611R).

```{r}
#| fig-cap: "Hours worked"
df |> 
  ggplot(aes(x = date, y = hours_worked)) +
  geom_line() +
  geom_vline(xintercept = df$date[covid_index], color = "red", linetype = "dashed") +
  annotate("text", x = df$date[covid_index+8], y = min(df$hours_worked), label = "Covid-19", color = "red") +
  theme_bw()
```

The first two variables (productivity and stock prices) are chosen to identify the optimism shock, the last three variables (consumption, real interest rate and hours worked) are chosen to capture the business cycle dynamics as in standard macroeconomic theory.

To capture multiplicative relationships in macroeconomic time series and percentage change interpretation, all variables are log transformed (except for real interest rate). A preview of first 6 rows of the concatenated dataset is shown below.

```{r}
library(kableExtra)
library(tidyverse)

kable(head(df), digits = 4, align = "c")
```

## ACF and PACF plot

```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-acf-plot
#| fig-cap: "ACF Plots"
Y = df |> 
  select(-date) |> 
  ts(start = c(year(min(df$date)), quarter(min(df$date))), frequency = 4)

N = ncol(Y)

par(mfrow = c(2, N - 2))
for (i in 1:ncol(Y)) {
  acf(Y[, i], main = colnames(Y)[i])
}
```

The autocorrelation function (ACF) plot shows all variables have a consistent pattern of autocorrelation, this suggests that the time series are non-stationary. Stationarity is formally tested using the Augmented Dickey-Fuller test in the next section.

```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-pacf-plot
#| fig-cap: "PACF Plots"
par(mfrow = c(2, N - 2))
for (i in 1:ncol(Y)) {
  pacf(Y[, i], main = colnames(Y)[i])
}
```

The partial autocorrelation function (PACF) plot shows that the partial autocorrelation of all variables is significant at lag 1, real interest rate is also significant at lag 2. Therefore, choosing a lag length for the VAR model greater than or equal to 2 is reasonable, following convention for quarterly data, we will adopt a lag length of 4 for the VAR model.

## Augmented Dickey-Fuller test

### Level

All five variables are non-stationary at 5% significance level base on the Augmented Dickey-Fuller test.

```{r}
p_value   = sapply(1:N, \(i) adf.test(Y[, i])$p.value)
variable  = colnames(Y)

adf       = cbind(variable, p_value) |> 
  data.frame() |> 
  mutate(p_value = round(as.numeric(p_value), 4)) |> 
  mutate(non_stationary = as.numeric(p_value > 0.05))

kable(adf, digits = 4)
```

### First difference

Applying Augmented Dickey-Fuller test to the first difference of the variables, we find that all variables are stationary at 5% significance level. Therefore, all variables are integrated of order one $I(1)$ and it is reasonable to put them in a VAR system without further transformation.

```{r}
Y_diff    = diff(Y)
p_value   = sapply(1:N, \(i) adf.test(Y_diff[, i])$p.value)
variable  = colnames(Y)

cbind(variable, p_value) |> 
  data.frame() |> 
  mutate(p_value = round(as.numeric(p_value), 4)) |> 
  mutate(non_stationary = as.numeric(p_value > 0.05)) |> 
  kable(digits = 4)

```

# Model

## Specification

Adopting notations from @rubio2010structural, the SVAR model is specified as follows.

The endogenous variables are

$$
\*y_t = [\text{productivity}_t,\ \text{stock prices}_t,\ \text{consumption}_t,\ \text{real interest rate}_t,\ \text{hours worked}_t]'
$$

### Structural form

$$
\begin{align*}
\*y_t' \*A_0 &= \sum_{l=1}^{p} \*y_{t-l}'\*A_l + \*c + \e_t' \\
\e_t | \*Y_{t-1} &\overset{\text{iid}}{\sim} \mathcal{N}_N(\*0, \*I)
\end{align*}
$$

where $\*y_t$ is an $N\times1$ vector of endogenous variables, $\e_t$ is an $N\times1$ vector of exogenous structural shocks, $\*A_l$ is an $N\times N$ matrix of parameters with $\*A_0$ invertible, $\*c$ is an $1\times N$ vector of parameters, and $p$ is the lag length, and $T$ is the sample size. This can be compactly written as

$$
\begin{align*}
\*y_t' \*A_0 &= \*x_t' \*A_+ + \e_t'
\end{align*}
$$

where $\*A_+ = [\*A_1'\ \cdots\ \*A_p'\ \hspace{4mm}\*c']$ and $\*x_t = [\*y_{t-1}'\ \cdots\ \*y_{t-p}'\hspace{4mm}\ 1]$. The dimension of $\*A_+$ is $K\times N$ where $K=Np+1$.

In matrix form,

$$
\begin{align*}
\*Y \*A_0 &= \*X \*A_+ + \e \\
\e | \* X &\sim \mathcal{MN}_{T\times N}(\*0, \*I_N, \*I_T)
\end{align*}
$$

where $\*Y = [\*y_1\ \cdots\ \*y_T]'$,\ $\*X = [\*x_1\ \cdots\ \*x_T]'$, and $\e = [\e_1\ \cdots\ \e_T]'$.

The matrices $\*A_0$ and $\*A_+$ are structural parameters.

### Reduced form

$$
\begin{align*}
\*y_t' &= \*x_t' \*B + \*u_t' \\
\*u_t | \*Y_{t-1} &\overset{\text{iid}}{\sim} \mathcal{N}_N(\*0, \*\Sigma)
\end{align*}
$$

where $\*B = \*A_ + \*A_0^{-1},\ \*u_t' = \e_t' \*A_0^{-1}$, and

$$
\*\Sigma = \mathbb{E}[\*u_t\*u_t'] = (\*A_0^{-1})' (\*A_0^{-1}) = (\*A_0 \*A_0')^{-1}
$$

In matrix form,

$$
\begin{align*}
\*Y &= \*X \*B + \*u \\
\*u | \* X &\sim \mathcal{MN}_{T\times n}(\*0, \*\Sigma, \*I_T)
\end{align*}
$$

where $\*u = [\*u_1\ \cdots\ \*u_T]'$.

The matrices $\*B$ and $\*\Sigma$ are reduced-form parameters.

### Orthogonal reduced-form parameterization

Since SVAR model are identified up to a rotation matrix $\*Q$, we can explicitly specified the reduced-form model as

$$
\*y_t' = \*x_t' \*B + \e_t' \*Q' h(\*\Sigma)
$$

Where $\*Q'h(\*\Sigma) = \*A_0^{-1}$ or $\*Q=h(\*\Sigma) \*A_0$, and $h$ is some differentiable decomposition, one specific choice is the upper triangular Cholesky decomposition. 

Then, we can define a mapping $f_h$ between the reduced-form parameters $(\*B, \*\Sigma, \*Q)$ and structural-form parameters $(\*A_0, \*A_+)$ as

$$
\begin{align*}
f_h(\*A_0, \*A_+) &= (
  \underbrace{\*A_+ \*A_0^{-1}}_\*B,
  \underbrace{(\*A_0 \*A_0')^{-1}}_{\*\Sigma},
  \underbrace{h((\*A_0 \*A_0')^{-1}) \*A_0}_\*Q
  ) \\
f_h^{-1}(\*B, \*\Sigma, \*Q) &= (
  \underbrace{h(\*\Sigma)^{-1} \*Q}_{\*A_0},
  \underbrace{\*B h(\*\Sigma)^{-1} \*Q}_{\*A_+}
  )
\end{align*}
$$

## Algorithm

### Reduced form

The first step is to sample the reduced-form parameters ($\*B$, $\*\Sigma$). Adopting the conjugate Normal-Inverse-Wishart prior,

$$
\begin{align*}
\*B|\*\Sigma &\sim \mathcal{MN}_{K\times N}(\underline{\*B}, \underline{\*V},\*\Sigma) \\
\*\Sigma &\sim \mathcal{IW}_N(\underline{\*S}, \underline{\nu})
\end{align*}
$$

and let

$$
\begin{align*}
\hat{\*B} &= (\*X'\*X)^{-1}\*X'\*Y \\
\*R &= (\*Y-\*X\hat{\*B})'(\*Y-\*X\hat{\*B})
\end{align*}
$$

the conjugate posterior distribution can be derived from 

\begin{align*}
p(\*B,\*\Sigma|\*Y) 

&\propto |\*\Sigma|^{-T/2}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \*\Sigma^{-1}(\*Y-\*X\*B)'(\*Y-\*X\*B) \right] \right\} \\
&\quad\times |\*\Sigma|^{-T/2}\exp\left\{ -\frac{1}{2}\text{tr}[\*\Sigma^{-1}(\*B-\underline{\*B})'\underline{\*V}^{-1}(\*B-\underline{\*B})] \right\} \\
&\quad\times |\*\Sigma|^{-(\underline\nu+N+1)/2}\exp\left\{ -\frac{1}{2} \text{tr}(\underline{\*S}\*\Sigma^{-1}) \right\} \\

&\propto |\*\Sigma|^{-T/2}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \left( \*R+(\*B-\hat{\*B})'\*X'\*X(\*B-\hat{\*B}) \right) \*\Sigma^{-1} \right] \right\} \\
&\quad\times |\*\Sigma|^{-T/2}\exp\left\{ -\frac{1}{2}\text{tr}[(\*B-\underline{\*B})'\underline{\*V}^{-1}(\*B-\underline{\*B})\*\Sigma^{-1}] \right\} \\
&\quad\times |\*\Sigma|^{-(\underline\nu+N+1)/2}\exp\left\{ -\frac{1}{2} \text{tr}(\underline{\*S}\*\Sigma^{-1}) \right\} \\

&\propto |\*\Sigma|^{-T/2}\exp\left\{ -\frac{1}{2} \text{tr}\left[ \left( (\*B-\hat{\*B})'\*X'\*X(\*B-\hat{\*B})+(\*B-\underline{\*B})'\underline{\*V}^{-1}(\*B-\underline{\*B}) \right) \*\Sigma^{-1} \right] \right\} \\
&\quad\times |\*\Sigma|^{-(\underline\nu+T+N+1)/2}\exp\left\{ -\frac{1}{2}\text{tr}\left[ (\*R+\underline{\*S})\*\Sigma^{-1} \right] \right\} \\

&=|\*\Sigma|^{-T/2}\exp\left\{ -\frac{1}{2} \text{tr}\left[ (\*B-\overline{\*B})'\overline{\*V}^{-1}(\*B-\overline{\*B}) \*\Sigma^{-1} \right] \right\} \\
&\quad\times |\*\Sigma|^{-(\underline\nu+T+N+1)/2}\exp\left\{ -\frac{1}{2}\text{tr}\left[ (\*R+\underline{\*S}+\hat{\*B}'\*X'\*X\hat{\*B}+\underline{\*B}'\underline{\*V}^{-1}\underline{\*B}-\overline{\*B}'\overline{\*V}^{-1}\overline{\*B})\*\Sigma^{-1} \right] \right\} \\

&=|\*\Sigma|^{-T/2}\exp\left\{ -\frac{1}{2} \text{tr}\left[ (\*B-\overline{\*B})'\overline{\*V}^{-1}(\*B-\overline{\*B}) \*\Sigma^{-1} \right] \right\} \\
&\quad\times |\*\Sigma|^{-(\underline\nu+T+N+1)/2}\exp\left\{ -\frac{1}{2}\text{tr}\left[ (\*Y'\*Y+\underline{\*S}+\underline{\*B}'\underline{\*V}^{-1}\underline{\*B}-\overline{\*B}'\overline{\*V}\overline{\*B})\*\Sigma^{-1} \right] \right\} \\

&=p(\*B|\*\Sigma,\*Y)\times p(\*\Sigma|\*Y)
\end{align*}

Therefore, the posterior distribution of the reduced-form parameters is given by

$$
\begin{align*}
\*B|\*\Sigma,\*Y &\sim \mathcal{MN}_{K\times N}(\overline{\*B}, \overline{\*V},\*\Sigma) \\
\*\Sigma|\*Y &\sim \mathcal{IW}_N(\overline{\*S}, \overline{\nu})
\end{align*}
$$

where

$$
\begin{align*}
\overline{\*B} &= \overline{\*V}(\*X'\*Y+\underline{\*V}^{-1}\underline{\*B}) \\
\overline{\*V} &= (\*X'\*X+\underline{\*V}^{-1})^{-1} \\
\overline{\*S} &= \underline{\*S}+\*Y'\*Y+\underline{\*B}'\underline{\*V}^{-1}\underline{\*B}-\overline{\*B}'\overline{\*V}^{-1}\overline{\*B} \\
\overline{\nu} &= \underline{\nu}+T
\end{align*}
$$

### Structural form

To perform zero and sign restrictions, we need an algorithm to sample from the posterior distribution of the structural parameters ($\*A_0$, $\*A_+$) conditional on the zero and sign restrictions.

However, the set of structural parameters satisfying the zero restrictions is of Lebesgue measure zero in the set of all structural parameters (akin to $\mathbb{P}(X=x)=0$ for continuous $X$). Luckily, we can sample the set of structural parameters satisfying the sign restrictions conditional on satisfying the zero restrictions.

Here is a high level outline of the algorithm proposed by @arias2018inference:

1. Sample reduced-form parameters $\*B,\*\Sigma\sim \mathcal{NIW}(\overline{\*B},\overline{\*V},\overline{\*S},\overline{\*\nu})$.
2. Sample $\*Q$ conditional on the zero restrictions and set $(\*A_0, \*A_+) = f_h^{-1}(\*B, \*\Sigma, \*Q)$.
3. If the sign restrictions are satisfied, keep $(\*A_0, \*A_+)$ and compute an importance weight, otherwise discard.
4. Repeat steps 1-3 until the desired number of samples is obtained.
5. Resample with replacement using the importance weights.

Where step 5 (importance sampling) is needed to manipulate the density induced by step 1 to the desired Nomral-Generalized-Normal posterior density $\mathcal{NGN}$.

To be explicit, starting with a Uniform-Normal-Inverse-Wishart prior distribution, the importance weight in step 3 is given by (posterior condition on $\*Y$ is dropped for brevity):

$$
\begin{align*}
\frac{\mathcal{NGN}(\*A_0,\*A_+|\mathcal Z, \mathcal S)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q)v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}
&=\frac{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal Z, \mathcal S)v_{f_h}(\*A_0,\*A_+)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q)v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}\\
&\propto\frac{|\text{det}(\*A_0)|^{-(2N+K+1)}}{v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}
\end{align*}
$$

where $\mathcal Z$ denotes zero restrictions and $\mathcal S$ denotes sign restrictions. The numerator is the target density, and the denominator is the proposal density from steps 1-2.

### Implementations

For computational efficiency, main functions in the ``bsvarSIGNs`` package are written in C++. The first function is compute posterior parameters as derived above.

```cpp
Rcpp::List niw_cpp(
    const arma::mat& Y,
    const arma::mat& X,
    const Rcpp::List prior
) {

  const int T  = Y.n_rows;
  
  mat prior_B  = as<mat>(prior["B"]);
  mat prior_V  = as<mat>(prior["V"]);
  mat prior_S  = as<mat>(prior["S"]);
  int prior_nu = as<int>(prior["nu"]);
  
  // analytic solutions
  mat prior_V_inv = inv_sympd(prior_V);
  mat post_V_inv  = prior_V_inv + X.t() * X;
  mat post_V      = inv_sympd(post_V_inv);
  mat post_B      = post_V * (X.t() * Y + prior_V_inv * prior_B);
  
  // marginal posterior of Sigma
  mat post_S  = prior_S + Y.t() * Y + prior_B.t() * prior_V_inv * prior_B - post_B.t() * post_V_inv * post_B;
  post_S      = symmatu(post_S);
  int post_nu = prior_nu + T;
  
  return List::create(
    Named("B")  = post_B,
    Named("V")  = post_V,
    Named("S")  = post_S,
    Named("nu") = post_nu
  );
}
```

The second function is to draw from the matrix normal distribution.

```cpp
arma::mat rmatnorm_cpp(
    const arma::mat& M,
    const arma::mat& U,
    const arma::mat& V
) {
  
  mat X = mat(size(M), fill::randn);
  return M + chol(U).t() * X * chol(V);
}
```

To draw from the inverse Wishart distribution, we use the ``iwishrnd`` function from the ``RcppArmadillo`` package. Using these three functions together, we are able to estimate the reduced-form BVAR model.

The following code calculates the volume element $v_{(g\circ f_h)|\mathcal Z}$ in the resampling step for the structural parameters

```cpp
double log_volume_element(
    const arma::field<arma::mat>& Z, //Field of matrices representing narrative restrictions
    const arma::mat&              A0,
    const arma::mat&              Aplus
) {
  colvec vec_structural = join_vert(vectorise(A0), vectorise(Aplus));
  
  // Compute the derivative of the narrative restrictions wrt the structural parameters
  mat Dz  = Df([Z](const colvec& x) { return zero_restrictions(Z, x); }, vec_structural);
  
  // Compute derivative of combined mapping function g_fh_vec wrt the structural parameters
  mat Dgf = Df([Z](const colvec& x) { return g_fh_vec(Z, x); }, vec_structural);
  
  // Compute the null space of the derivative of the narrative restrictions
  mat DN  = Dgf * null(Dz);
  
  // Logarithm of determinant, then take the real part
  return 0.5 * log_det(DN.t() * DN).real();
}
```

### Comparison with PFA

The PFA uses a loss function to find a rotation matrix $\*Q$ that satisfies the zero restrictions and satisfies or close to satisfying the sign restrictions, thus it is not an exact solution. In comparison, the importance sampler algorithm is an exact solution satisfying all the restrictions.

Mathematically, the PFA solves the following minimization problem:

$$
\begin{aligned}
q^* &= {\arg\min}_q \Psi(q) \quad s.t. (1)\ q'q=1,\ (2)\ R_{zero}q=0
\end{aligned}
$$

where $q$ is a column of $\*Q$. This step is analogous to steps 2-3 of the importance sampler algorithm. Intuitively, by minimizing $\Psi(q)$ we can find something close to satisfying the sign restrictions, condition (1) makes sure $\*Q$ is orthogonal and condition (2) makes sure $\*Q$ satisfies the zero restrictions.

Detailed definitions of $\Psi$ and $R_{zero}$ is beyond the scope of this project, but it is evident that the minimization problem gives an unique solution and therefore the PFA is not set-identifying $\*Q$.

### Extension

Besides the zero and sign restrictions, another popular identification scheme proposed by @antolin2018narrative is to impose narrative restrictions on the structural shocks and historical decomposition. For example, restricting a structural shock to be negative during some period, and we will apply this to restrict the optimism shock during the Covid-19 pandemic.

But, the difficulty is that the narrative restrictions $\mathcal{R}$ requires another resample with some importance weight:

$$
\frac{1}{\omega(\*B,\*\Sigma,\*Q)} \propto \frac{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal S, \mathcal R)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q)}
$$

To combine these two identifications, we need to calculate the importance weight when **all** of the three restrictions, zero, sign, and narrative, are present. Intuitively, it is the product of the importance weight for the narrative restrictions and the importance weight for the zero and sign restrictions.

Here is an preliminary proof. The insight is to notice that $\frac{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal Z, \mathcal S, \mathcal R)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal S, \mathcal R)}\propto 1$ since adding zero restrictions does not change the likelihood and only truncates uniform conditional prior $\pi(\*Q|\*B,\*\Sigma)$.

$$
\begin{align*}
\frac{\mathcal{NGN}(\*A_0,\*A_+|\mathcal Z, \mathcal S, \mathcal R)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q)v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}
=&\frac{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal Z, \mathcal S, \mathcal R)v_{f_h}(\*A_0,\*A_+)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q)v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}\\
=&\frac{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal S, \mathcal R)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q)}
\frac{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal Z, \mathcal S, \mathcal R)}{\mathcal{UNIW}(\*B,\*\Sigma,\*Q|\mathcal S, \mathcal R)}\\
&\times
\frac{v_{f_h}(\*A_0,\*A_+)}{v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}\\
\propto&\frac{1}{\omega(\*B,\*\Sigma,\*Q)}\frac{v_{f_h}(\*A_0,\*A_+)}{v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}\\
\propto&\frac{1}{\omega(\*B,\*\Sigma,\*Q)}\frac{|\text{det}(\*A_0)|^{-(2N+K+1)}}{v_{(g\circ f_h)|\mathcal Z}(\*A_0,\*A_+)}
\end{align*}
$$

The following code calculates $\omega(\*B,\*\Sigma,\*Q)$ for the narrative restrictions

```cpp
double weight_narrative(
    const int&                    T,
    arma::mat                     sign_narrative,
    const arma::cube&             irf
) {
  
  const int M         = 1e+04;  // number of draws to approximate normal distribution
  
  double    n_success = 1.0e-15;
  
  cube      Z(irf.n_rows, sign_narrative.col(5).max() + 1, M, fill::randn);
  
  // change all starting period to the first period
  // since we use the same M draws for all narrative restrictions
  sign_narrative.col(4) = ones(sign_narrative.n_rows, 1);
  
  for (int m=0; m<M; m++) {
    if (match_sign_narrative(Z.slice(m), sign_narrative, irf)) {
      n_success++;
    }
  }
  return M / n_success;
}
```

@arias2018inference provides a sufficient condition for the partial identification of the first $k$ shocks: at least $N-j$ zero restrictions and at least 1 sign restriction on the IRFs to the $j$-th shock for $1\le j\le k$. At this stage, there is no theoretical result how to incorporate narrative restrictions in this sufficient condition. Generally, narrative restrictions are used to sharpen the inference (i.e. narrower highest density interval) and does not make the structural shocks exactly identified.

If given some narrative restrictions there is an uniquely identified $\*Q^*$, step 3 of the importance sampling algorithm would never work, since the probability a randomly sampled $\*Q=\*Q^*$ is $0$. In the later empirical work, we show the algorithm can still work with narrative restrictions, therefore at least the narrative restrictions considered here are not binding. But, the advantage of narrative (and sign) restrictions is that they are less controversial and the result can be agreed upon by more researchers.

## Simulation

### Reduced form

To test the validity of our code, we simulate 1,000 observations from a bi-variate Gaussian random walk process with the covariance matrix equal to the identity matrix of order 2.

Then, we compute 1,000 posterior draws from a SVAR model with a constant term and one lag, using the default identification scheme: positive sign restrictions on the diagonal of $\*A_0$.

```{r include=FALSE}
# devtools::install_github("bsvars/bsvarSIGNs")
library(bsvarSIGNs)

set.seed(123)

# simulate data
N = 2
T = 1000

sim_Y = apply(matrix(rnorm(T*N), ncol = N), 2, cumsum)

# Initialize a Bayesian structural VAR model using simulated data `sim_Y` and lag order `p = 1`
specification  = specify_bsvarSIGN$new(sim_Y, p = 1)
posterior      = estimate(specification, S = 1000)
```

The posterior mean of the $\*B$ is

```{r}
mean_B = posterior$posterior$A |> 
  apply(c(1, 2), mean) |> 
  t()

rownames(mean_B) = c("y1_lag", "y2_lag", "constant")
colnames(mean_B) = c("y1", "y2")

kable(mean_B, digits = 4)
```

The posterior mean of $\*\Sigma$ is

```{r}
mean_Sigma = apply(posterior$posterior$Sigma, c(1, 2), mean)

rownames(mean_Sigma) = c("y1", "y2")
colnames(mean_Sigma) = c("y1", "y2")

kable(mean_Sigma, digits = 4)
```

They are close to the true values of the simulated data.

### Structural form

Suppose the true structural model is

$$
\begin{bmatrix}
-1 & 1\\
1 & 0\\
\end{bmatrix}
\begin{bmatrix}
y_{1,t}\\
y_{2,t}\\
\end{bmatrix}
=
\begin{bmatrix}
-1 & 1\\
1 & 0\\
\end{bmatrix}
\begin{bmatrix}
y_{1,t-1}\\
y_{2,t-1}\\
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{1,t}\\
\varepsilon_{2,t}\\
\end{bmatrix},
\begin{bmatrix}
\varepsilon_{1,t}\\
\varepsilon_{2,t}\\
\end{bmatrix}
\sim
\mathcal{N}(
\begin{bmatrix}
0 \\
0
\end{bmatrix}
,
\begin{bmatrix}
1 & 0 \\
0 & 1 \\
\end{bmatrix}
)
$$

equivalently, we can simulate 1,000 observations from the reduced-form

$$
\begin{bmatrix}
y_{1,t}\\
y_{2,t}\\
\end{bmatrix}
=
\begin{bmatrix}
1 & 0\\
0 & 1\\
\end{bmatrix}
\begin{bmatrix}
y_{1,t-1}\\
y_{2,t-1}\\
\end{bmatrix}
+
\begin{bmatrix}
u_{1,t}\\
u_{2,t}\\
\end{bmatrix},
\begin{bmatrix}
u_{1,t}\\
u_{2,t}\\
\end{bmatrix}
\sim
\mathcal{N}(
\begin{bmatrix}
0 \\
0
\end{bmatrix}
,
\begin{bmatrix}
1 & 1 \\
1 & 2 \\
\end{bmatrix}
)
$$

```{r include=FALSE}
# devtools::install_github("bsvars/bsvarSIGNs")
library(bsvarSIGNs)

set.seed(123)

# simulate data
N = 2
T = 1000

U     = mvtnorm::rmvnorm(T, rep(0, N), matrix(c(1, 1, 1, 2), N, N))
sim_Y = apply(U, 2, cumsum)

sign_irf = array(matrix(c(0, 1, 1, 1), 2, 2), dim = c(2, 2, 1))
zero_irf = matrix(c(1, 0, 0, 0), 2, 2)

specification  = specify_bsvarSIGN$new(sim_Y, 
                                       p = 1, 
                                       sign_irf = sign_irf,
                                       zero_irf = zero_irf
                                       )
posterior      = estimate(specification, S = 1000)
```

Putting zero and sign restrictions on the inverse of the structural matrix

$$
\begin{bmatrix}
-1 & 1\\
1 & 0\\
\end{bmatrix}^{-1}
=
\begin{bmatrix}
0 & 1\\
1 & 1\\
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
0 & +\\
+ & +\\
\end{bmatrix}
$$

Posterior mean of 1,000 draws of the structural matrix is

```{r}
posterior$posterior$B |> 
  apply(c(1, 2), mean) |> 
  t() |> 
  round(4)
```

## Identification

The following restrictions are imposed on the contemporaneous impulse response to identify the optimism shock.

| Productivity | Stock prices | Consumption | Real interest rate | Hours worked |
|--------------|--------------|-------------|--------------------|--------------|
| 0            | Positive     | Unrestricted| Unrestricted       | Unrestricted |

The identification strategy is based on the assumption that the optimism shock positively affects stock prices, and has no contemporaneous effect on productivity.

## Interpretation

Two popular methods to interpret the SVAR model are impulse response function (IRF) and forecast error variance decomposition (FEVD) [@kilian2017structural, Chap 4].

### IRF 

The impulse response function (IRF) of the SVAR model is used to interpret the effect of the optimism shock on the endogenous variables. Specifically, we are interested in whether a positive optimism shock leads to a simultaneous boom in consumption and hours worked (as in the United States).

Mathematically, the response of the $i$-th variable to the $j$-th shock at horizon $k$ is given by the element at row $i$ columns $j$ of $\*\Theta_k$, where $\*\Theta_k$ is defined recursively as

\begin{aligned}&\*\Theta_{0} = \left(\mathbf{A}_{0}^{-1}\right)^{\prime},\quad\*\Theta_{h} = \sum_{l=1}^{h}\bigl(\mathbf{A}_{\ell}\mathbf{A}_{0}^{-1}\bigr)^{\prime}\*\Theta_{h-\ell},\quad\mathrm{} 1 \leq h \leq p,\\&\*\Theta_{h} = \sum_{\ell=1}^{p}\bigl(\mathbf{A}_{\ell}\mathbf{A}_{0}^{-1}\bigr)^{\prime}\*\Theta_{h-\ell},\quad\mathrm{} p < h < \infty.\end{aligned}

### FEVD

The forecast error variance decomposition (FEVD) is used to quantify the relative importance of the optimism shock in explaining the variability of a $h$-step ahead forecast of a particular variable. For example, we will examine the proportion of the variability of consumption and hours worked explained by the optimism shock.

Mathematically, the $i$-th variable's forecast error variance decomposition of the $j$-th shock at horizon $h$ is given by

$$
\text{FEVD}_j^i(h) = \frac{\text{MSFE}_{j}^{i}(h)}{\sum_{n=1}^{N}\text{MSFE}_{j}^{n}(h)}, \quad \text{MSFE}_{j}^{i}(h) = \sum_{l=0}^{h-1} \*\Theta_{ij,l}^{2}
$$

# Results

## United States

```{r include=FALSE}
data(optimism)

zero_irf          = matrix(0, nrow = 5, ncol = 5)
zero_irf[1, 1]    = 1
sign_irf          = array(0, dim = c(5, 5, 1))
sign_irf[2, 1, 1] = 1

specification = specify_bsvarSIGN$new(
  optimism*100,
  p        = 4,
  sign_irf = sign_irf,
  zero_irf = zero_irf
)

posterior = estimate(specification, S = 10000)
irf       = compute_impulse_responses(posterior, horizon = 40)
```

First, we apply the package to the data from the United States, and replicate the results in @arias2018inference.

### History

Historical values of the optimism shock is

```{r us-shock}
optimism_shock = posterior$posterior$shocks[1, , ] |> 
  apply(1, quantile, c(0.16, 0.50, 0.84)) |> 
  t()

bsvars::plot_ribbon(
  optimism_shock,
  probability = 0.68,
  main = "optimism shock",
  bty  = "n",
  xaxt = "n",
  )
abline(h = 0)
# abline(v = covid_index - 4 - 1, lty = 5)

label_index = seq(1, dim(optimism_shock)[1], 20)
axis(1, at = label_index, 
     labels = as.yearqtr(index(optimism))[label_index + 4 + 1])
```

Most of the historical values of the optimism shock are not significantly different from 0, perhaps putting only one zero and one sign restriction is not enough to identify the optimism shock.

### IRF

The impulse response functions are

```{r us-irf}
#| fig-cap: "US IRF (median and 68% highest density interval)"
# devtools::install_github("bsvars/bsvars")

# plot irf of one shock
plot_irf1 = function(irf) {
  N       = dim(irf)[1]
  ylab    = rep("Percent", N)
  ylab[4] = "Percentage Point"
  
  par(mfrow = c(2, 3))
  
  for (i in 1:N) {
    bsvars::plot_ribbon(
      irf[i, 1, , ],
      probability = 0.68,
      main = colnames(Y)[i],
      xlab = "Quarter",
      ylab = ylab[i],
      bty  = "n",
      )
    abline(h = 0)
  }
}

plot_irf1(irf)
```

which is very close to the ones in the original paper. Although consumption and hours worked have a boom in the median IRF, they are not significantly different from 0. Therefore, base on this identification strategy, the optimism shock does not drive business cycles in the US.

### FEVD

```{r}
compute_fevd = function(irf, horizon = 40) {
  N    = dim(irf)[1]
  S    = dim(irf)[4]
  fevd = array(NA,c(N,N,horizon,S))

  for (s in 1:S){
   
    for (i in 1:(horizon)){
      
      for (n in 1:N){
        for (nn in 1:N){
          fevd[n,nn,i,s]  = sum(irf[n,nn,1:i,s]^2)
        }
      }
      fevd[,,i,s]         = diag(1/apply(fevd[,,i,s],1,sum))%*%fevd[,,i,s]
    }
  }

  fevd        = 100*fevd
  class(fevd) = "fevd"
  fevd
}
```

Share of FEVD attributed to the optimism shock at horizon 40 (median 50%) and its 68% highest probability interval (16% and 84%) are

```{r}
fevd = compute_fevd(irf, 40)

fevd_40 = 
  fevd[, 1, 40, ] |> 
  apply(1, quantile, c(0.16, 0.50, 0.84)) |> 
  t() |>
  round(2)

rownames(fevd_40) = colnames(Y)
kable(fevd_40)
```

which is very close to the ones in the original paper. As the median values are relatively small (<20%), the optimism shock does not contribute significantly to the FEVD of consumption and hours worked in the US.

## Australia

All subsequent analysis are performed on the Australian data.

```{r include=FALSE}
zero_irf          = matrix(0, nrow = 5, ncol = 5)
zero_irf[1, 1]    = 1
sign_irf          = array(0, dim = c(5, 5, 1))
sign_irf[2, 1, 1] = 1

specification = specify_bsvarSIGN$new(
  Y*100,
  p        = 4,
  sign_irf = sign_irf,
  zero_irf = zero_irf
)

posterior = estimate(specification, S = 10000)
```

### History

Historical values of the optimism shock is

```{r au-shock}
optimism_shock = posterior$posterior$shocks[1, , ] |> 
  apply(1, quantile, c(0.16, 0.50, 0.84)) |> 
  t()

bsvars::plot_ribbon(
  optimism_shock,
  probability = 0.68,
  main = "optimism shock",
  bty  = "n",
  xaxt = "n",
  )
abline(h = 0)
abline(v = covid_index - 4 - 1, lty = 5)

label_index = seq(1, dim(optimism_shock)[1], 20)
axis(1, at = label_index, 
     labels = as.yearqtr(index(Y))[label_index + 4 + 1])
```

Where the vertical dashed line indicates the period 2020 Q1. Although we have not impose any restriction here, the optimism shock is still significantly negative in 2020 Q1 i.e. the 68% highest density interval does not include zero, this is likely due to the Covid-19 pandemic.

### IRF

The impulse response functions are

```{r au-irf}
#| fig-cap: "Australian IRF (median and 68% highest density interval)"

irf = compute_impulse_responses(posterior, horizon = 40)
plot_irf1(irf)
```

Comparing to the US, the optimism shock has a much smaller effect on consumption and hours worked in Australia, only stock prices has a significant effect in the short run, other variables all have wide and symmetric IRFs around zero. Therefore, base on the same identification strategy, we conclude the optimism shock does not drive business cycles in Australia as well. 

### FEVD

Share of FEVD attributed to the optimism shock at horizon 40 (median 50%) and its 68% highest probability interval (16% and 84%) are

```{r}
fevd = compute_fevd(irf, 40)

fevd_40 = 
  fevd[, 1, 40, ] |> 
  apply(1, quantile, c(0.16, 0.50, 0.84)) |> 
  t() |>
  round(2)

rownames(fevd_40) = colnames(Y)
kable(fevd_40)
```

As the median values are relatively small (<20%), the optimism shock does not contribute significantly to the FEVD of consumption and hours worked in Australia, which is a second evidence that the optimism shock does not drive business cycles in Australia.

## Extension 1: narrative restriction

Now we introduce an additional narrative restriction that

> the optimism shock is **negative** when Covid-19 hits Australia in 2020 Q1.

Since no one see the pandemic coming, and when it happens most people feel pessimistic due to many reasons. For example, being locked down in the house, losing jobs, and feeling uncertain about the future. An Australian national survey conducted in April 2020 [@fisher2020mental] finds 27.6% of people have clinically significant symptoms of depression, and 59.2% reported being more irritable. Therefore, it is reasonable to assume that the optimism shock is negative when Covid-19 hits Australia.

```{r include=FALSE}
zero_irf          = matrix(0, nrow = 5, ncol = 5)
zero_irf[1, 1]    = 1
sign_irf          = array(0, dim = c(5, 5, 1))
sign_irf[2, 1, 1] = 1

# need to subtract the number of lags
sign_narrative    = matrix(c(1, -1, 1, 1, covid_index - 4, 0), ncol = 6)

specification = specify_bsvarSIGN$new(
  Y*100,
  p              = 4,
  sign_irf       = sign_irf,
  sign_narrative = sign_narrative,
  zero_irf       = zero_irf
)

posterior = estimate(specification, S = 10000)
irf       = compute_impulse_responses(posterior, horizon = 40)
```

### History

Historical values of the optimism shock is

```{r au-ext-shock}
optimism_shock = posterior$posterior$shocks[1, , ] |> 
  apply(1, quantile, c(0.16, 0.50, 0.84)) |> 
  t()

bsvars::plot_ribbon(
  optimism_shock,
  probability = 0.68,
  main = "optimism shock",
  bty  = "n",
  xaxt = "n",
  )
abline(h = 0)
abline(v = covid_index - 4 - 1, lty = 5)

label_index = seq(1, dim(optimism_shock)[1], 20)
axis(1, at = label_index, 
     labels = as.yearqtr(index(Y))[label_index + 4 + 1])
```

Where the vertical dashed line indicates the period 2020 Q1. Now all draws of the optimism shock are negative in 2020 Q1 due to the explicit narrative restriction, if we expand the highest density interval to 100% the interval will still not include zero. Comparing with the historical values of the optimism shock without the narrative restriction, here the highest density intervals are narrower but their overall shape are similar.

One interesting observation is that we see another two standard deviation pessimism shock around 2008, we did not impose any explicit narrative restriction here, but the model is likely picking up the effect of the global financial crisis.

### IRF

The impulse response functions are

```{r au-ext-irf}
#| fig-cap: "Australian IRF with narrative restriction (median and 68% highest density interval)"

plot_irf1(irf)
```

Although the median IRFs are slightly different from the previous case, the difference is not significant. After imposing the additional narrative restriction, the optimism shock still has no significant impact of the IRFs. Therefore, the conclusion that optimism shock does not drive business cycles in Australia is robust to the narrative restriction on Covid-19.

### FEVD

Share of FEVD attributed to the optimism shock at horizon 40 (10 years) are

```{r}
fevd = compute_fevd(irf, 40)

fevd_40 = 
  fevd[, 1, 40, ] |> 
  apply(1, quantile, c(0.16, 0.50, 0.84)) |> 
  t() |>
  round(2)

rownames(fevd_40) = colnames(Y)
kable(fevd_40)
```

As the median values are relatively small (<20%), the optimism shock does not contribute significantly to the FEVD of consumption and hours worked in Australia. This is similar the the previous case without the narrative restriction on Covid-19, and is another evidence that the optimism shock does not drive business cycles in Australia.

## Extension 2: hierachical prior $\lambda$

Previous results have been implicitly relied on the choice of the hyperparameter $\lambda$ in the Minnesota prior:

$$
\underline{\*V} = \text{diag}\left[\lambda^2 * \text{rep}(1,p)\otimes\psi \quad 10^6 \right] 
$$

where we set $\lambda = 0.2$ and $\psi$ as the residual variances from independent AR(1) models. Now we impose a Gamma hierachical prior on $\lambda$ such that it peaks at around 0.2, specifically with scale and shape parameters:

$$
\lambda \sim \Gamma\left(\frac{9 + \sqrt{17}}{8}, \frac{8}{5(1 + \sqrt{17})}\right)
$$

@giannone2015prior proposed a Metropolis algorithm to sample a vector of hyperparameters $\gamma=(\mu,\delta,\lambda,\psi)$ where $\mu$ and $\delta$ are for dummy observation priors that will not be discussed here. A simplified version of their algorithm to sample $\lambda$ is as follows:

1. Initialize $\lambda$ by maximizing its log posterior, store the numerical Hessian matrix (scalar) $W$.
2. Sample $\lambda^*\sim\mathcal{N}(\lambda^{(s-1)},W)$.
3. Accept $\lambda^{(s)}=\lambda^*$ with probability $\alpha = \min\left\{ 1, \frac{p(\lambda^*|\*Y)}{p(\lambda^{(s-1)}|\*Y)} \right\}$.

Given the resulting sample $\left\{ \lambda^{(s)} \right\}$, we can then proceed to sample $\*B,\*\Sigma,\*Q$ by first taking a uniform draw from $\left\{ \lambda^{(s)} \right\}$.

The following `R` code implements the above algorithm.

```{r echo=TRUE}
niw_prior = function(Y,
                     p,
                     non_stationary,
                     lambda = 0.2) {
  T = nrow(Y)
  N = ncol(Y)
  K = 1 + N * p
  
  B           = matrix(0, K, N)
  B[1:N, 1:N] = diag(non_stationary)
  
  sigma2                  = sapply(1:N, \(i) summary(lm(Y[2:T, i] ~ Y[1:(T - 1), i]))$sigma^2)
  V                       = matrix(0, K, K)
  V[K, K]                 = 1e+6
  V[1:(K - 1), 1:(K - 1)] = diag(lambda^2 * kronecker((1:p)^-2, sigma2^-1))
  
  S  = diag(sigma2)
  nu = N + 2
  
  list(B = B, V = V, S = S, nu = nu)
}


log_det = function(A) {
  determinant(A, logarithm = TRUE)$modulus
}

dlambda = function(lambda, p, Y, X) {
  
  N     = ncol(Y)
  T     = nrow(Y)
  prior = niw_prior(Y, p, rep(1, N), lambda)
  
  b     = prior$B
  Omega = prior$V
  Psi   = prior$S
  d     = prior$nu
  
  inv_Omega = solve(Omega)
  Bhat      = solve(t(X) %*% X + inv_Omega) %*% (t(X) %*% Y + inv_Omega %*% b)
  ehat      = Y - X %*% Bhat
  
  lprior    = dgamma(
    lambda,
    shape = (9 + sqrt(17)) / 8,
    scale = 8 / (5 * (1 + sqrt(17))),
    log = TRUE
  )
  
  llike     = -N / 2 * log_det(Omega)
  llike     = llike - N / 2 * log_det(t(X) %*% X + inv_Omega)
  A         = Psi + t(ehat) %*% ehat + t(Bhat - b) %*% inv_Omega %*% (Bhat - b)
  llike     = llike - (T + d) / 2 * log_det(A)
  
  return(lprior + llike)
}


sample_lambda = function(S, p, Y, X) {
  init   = 0.2
  result = optim(
    init,
    \(lambda) {
      nlogp = -dlambda(lambda, p, Y, X)
      if (!is.finite(nlogp)) {
        nlogp = 1e+6
      }
      return(nlogp)
    },
    method  = "L-BFGS-B",
    lower = init / 100,
    upper = init * 100,
    hessian = TRUE,
    # control = list(trace = 1, maxit = 1e4)
  )
  
  lambda    = numeric(S)
  lambda[1] = result$par
  dlambda   = result$value
  
  c         = 1
  W         = c * 1 / result$hessian
  
  for (s in 2:S) {
    new_lambda  = rnorm(1, lambda[s - 1], sqrt(W))
    new_dlambda = dlambda(new_lambda, p, Y, X)
    
    if (runif(1) < exp(new_dlambda - dlambda)) {
      lambda[s] = new_lambda
      dlambda   = new_dlambda
    } else {
      lambda[s] = lambda[s - 1]
    }
  }
  
  return(lambda)
}
```

### Convergence diagnostics

We sample 12,000 draws of $\lambda$ and discard the first 2,000 draws as burn-in. From the trace plot, we can see that the chain has converged to a stationary distribution.

```{r}
p      = 4
data   = bsvars::specify_data_matrices$new(Y, p, NULL)
lambda = sample_lambda(12000, p, t(data$Y), t(data$X))[2000:12000]

plot(lambda, type = "l")
```

From the histogram, we can see that the posterior distribution of $\lambda$ is centered around to 0.13, which shows the default choice of 0.2 is reasonable. On the other hand, since 0.13 is smaller than 0.2, the hirechical prior suggests more shrinkage than the default choice.

```{r}
hist(lambda, breaks = 100)
```

### IRF

```{r include=FALSE}
zero_irf          = matrix(0, nrow = 5, ncol = 5)
zero_irf[1, 1]    = 1
sign_irf          = array(0, dim = c(5, 5, 1))
sign_irf[2, 1, 1] = 1

# need to subtract the number of lags
sign_narrative    = matrix(c(1, -1, 1, 1, covid_index - 4, 0), ncol = 6)

specification = specify_bsvarSIGN$new(
  Y*100,
  p              = 4,
  sign_irf       = sign_irf,
  sign_narrative = sign_narrative,
  zero_irf       = zero_irf
)

hyper      = kronecker(t(rep(1, length(lambda))), specification$prior$hyper)
hyper[3, ] = lambda
specification$prior$hyper = hyper

posterior = estimate(specification, S = 10000)
irf       = compute_impulse_responses(posterior, horizon = 40)
```

The impulse response functions are

```{r au-ext2-irf}
#| fig-cap: "Australian IRF with hierachical prior (median and 68% highest density interval)"

plot_irf1(irf)
```

For simplicity, we only report the IRF in this extension. As the IRF is very similar to the previous results, our interpretation remains the same. By imposing the hierarchical prior on $\lambda$, we let the data to choose the level of shrinkage and this shows the conclusion is robust to a more flexible prior.

# Conclusion

In this report, we implement the sign and zero restrictions algorithm from @arias2018inference, extend it with narrative restrictions from @antolin2018narrative and hierarchical priors from @giannone2015prior. We replicate the result in the original paper using the US data, then apply the algorithm to the Australian data and found that the conclusion of optimism shock does not drive business cycles holds in the Australian economy. Furthermore, the conclusion is robust to (1) a narrative restriction on pessimism shock during the Covid-19 pandemic and (2) imposing a hierarchical prior on the shrinkage parameter $\lambda$.

# References