
Panel Analysis of Nonstationarity in Idiosyncratic and Common Components

Steve Bronder

2014-10-21

The purpose of this package is to perform the Panel Analysis of Nonstationarity in Idiosyncratic and Common Components (PANIC) from Bai and Ng (2004, 2010). When working with large dimensional panels, standard pooled and aggregated nonstationarity tests tend to over-reject the null hypothesis due to:

Curse of dimensionality
Cross-correlation in the panel structure
Low power with large N or large T

Instead of testing the data directly, PANIC estimates a factor model to derive the common and idiosyncratic components of the panel. Using the BIC3 from Bai and Ng (2002), it is possible to determine the number of common components that reduce cross-correlation in the error term. In this vignette we perform PANIC on three levels of aggregation of the National Income and Product Accounts in order to test these aggregates for aggregation bias.

Vignette Info

This vignette uses the functions panic10() and panic04() available through the PANICr package. The package can be downloaded from GitHub with the install_github() function from the devtools package.


install.packages("devtools")
library(devtools)
install_github("stevo15025/PANICr")

These functions estimate a factor model on each level of aggregation, derive the common and idiosyncratic components, and then compute several pooled test statistics. One of several benefits of PANIC is that reducing cross-correlation allows valid pooling of individual statistics, so panel tests can be run that have reasonable power.

Performing the factor analysis using BIC3, the criterion for determining the number of factors in our approximate factor model, allows us to determine whether the nonstationarity is pervasive, variable specific, or both.

Data

The data we use are gathered from the Price Indexes for Personal Consumption Expenditures by Type of Product available from the BEA. The data are monthly from 1959 to 2013¹. At this point we run the data through X-13².

After extracting each sector we divide the series into three separate levels of aggregation, from the highest level of aggregation to the lowest. To turn this dataset into year-on-year inflation we compute \(\log(p_{t}/p_{t-12})\). The data are available already cleaned and manipulated as NIPAagg1, NIPAagg2, and NIPAagg3. The dimensions of the aggregates are (N=12,T=639), (N=46,T=639), and (N=160,T=639), respectively.
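As a small illustration of the transformation above, the following sketch computes year-on-year log inflation for a single monthly price index. The function name yoy_inflation and the synthetic price series are assumptions made for the example; they are not part of PANICr.

```r
# Year-on-year log inflation: log(p_t / p_{t-12}) for a monthly price index.
# yoy_inflation is a hypothetical helper, not a PANICr function.
yoy_inflation <- function(p, lag = 12) {
  stopifnot(is.numeric(p), length(p) > lag, all(p > 0))
  # diff(log(p), lag) equals log(p_t) - log(p_{t-lag}) = log(p_t / p_{t-lag})
  diff(log(p), lag = lag)
}

# Synthetic index growing 0.2 percent per month for three years
p <- 100 * 1.002^(0:35)
infl <- yoy_inflation(p)
length(infl)  # the first 12 observations are lost to the lag
```

Note that the transformation shortens each series by 12 observations.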

Model

Consider a factor analytic model:

\(X_{it} = D_{it} + \lambda_{i}' F_{t} + e_{it}\)

where \(D_{it}\) is a polynomial trend function, \(F_{t}\) is an \(r\times{1}\) vector of common factors, and \(\lambda_{i}\) is a vector of factor loadings. The panel \(X_{it}\) is the sum of a deterministic component \(D_{it}\), a common component \(\lambda_{i}' F_{t}\), and an error \(e_{it}\) that is largely idiosyncratic. A factor model with \(N\) variables has \(N\) idiosyncratic components, but a smaller number of common factors.
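The decomposition can be sketched on simulated data. The example below follows the approach described in Bai and Ng (2004) for the intercept-only case: difference the data, extract principal components from the differenced panel, and cumulate the results back to levels. It is a simplified illustration in base R with assumed names and a single simulated factor, not the code used inside panic04().

```r
set.seed(1)
n_time  <- 200
n_units <- 20

# Simulated panel: one nonstationary common factor plus stationary errors
f_true <- cumsum(rnorm(n_time))            # random-walk factor
lambda <- runif(n_units, 0.5, 1.5)         # factor loadings
e_true <- matrix(rnorm(n_time * n_units), n_time, n_units)
X <- outer(f_true, lambda) + e_true        # X_it = lambda_i * F_t + e_it

# Step 1: first-difference the panel
dX <- diff(X)                              # (T-1) x N

# Step 2: principal components of the differenced data (r = 1 assumed known)
r  <- 1
sv <- svd(dX)
f_diff   <- sqrt(nrow(dX)) * sv$u[, 1:r, drop = FALSE]  # scaled so f'f / (T-1) = I
lam_hat  <- crossprod(dX, f_diff) / nrow(dX)            # N x r loadings
e_diff   <- dX - f_diff %*% t(lam_hat)                  # differenced idiosyncratic part

# Step 3: cumulate back to levels
F_hat <- apply(f_diff, 2, cumsum)          # estimated common factor
e_hat <- apply(e_diff, 2, cumsum)          # estimated idiosyncratic components

# The factor is identified only up to sign and scale
abs(cor(F_hat[, 1], f_true[-1]))
```

Unit root tests would then be applied to F_hat and to each column of e_hat separately.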

The order of the trend function \(D_{it}\) is denoted \(P\). In PANIC 2004, when the number of common factors is greater than one, \(P=1\) and the deterministic trend has an individual specific fixed effect and time trend. When the number of common factors is equal to one, \(P=0\) and there is only an individual specific fixed effect. When the number of common factors is zero, \(P=0\) and neither is included.

PANIC 2010 examines the data with ADF models A, B, and C. A assumes no deterministic component, B assumes a constant to allow for a fixed effect, and C allows a constant and a trend. Note that this is different from P: P describes the data generating process, while Models A, B, and C impose these constraints inside the ADF test.

The benefit of this factor model is that, if the number of factors has been correctly determined, the error term will be largely idiosyncratic and the common components will explain the largest share of the variance of the data. To determine the approximate number of factors we use the BIC3 from Bai and Ng (2002) such that:

\(BIC3 = V(k,\hat{F}^k)+k\hat{\sigma}^2 \left(\frac{(N+T-k)\ln(NT)}{NT}\right)\)

\(V(k,\hat{F}^k)\) is the average residual variance when k factors are assumed for each cross-section unit. \(\hat{\sigma}^2\) is the mean of the squared error term over N and T.
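A minimal sketch of how such a criterion can be evaluated is shown below. It uses the form of BIC3 given in Bai and Ng (2002), in which the penalty term multiplies \(k\hat{\sigma}^2\), and it takes \(\hat{\sigma}^2\) to be the residual variance at the largest number of factors considered. The simulated panel and all variable names are assumptions for illustration; this is not the routine PANICr runs internally.

```r
set.seed(42)
n_time  <- 200
n_units <- 30
r_true  <- 2     # number of factors used to simulate the data
k_max   <- 8     # largest number of factors considered

# Simulated stationary panel with two common factors
fac   <- matrix(rnorm(n_time * r_true), n_time, r_true)
loads <- matrix(rnorm(n_units * r_true), n_units, r_true)
X <- fac %*% t(loads) + matrix(rnorm(n_time * n_units), n_time, n_units)
X <- scale(X, center = TRUE, scale = FALSE)

# With principal components, the residual sum of squares after k factors
# equals the total sum of squares minus the first k squared singular values
sv       <- svd(X)
total_ss <- sum(X^2)
V <- sapply(0:k_max, function(k) {
  (total_ss - sum(sv$d[seq_len(k)]^2)) / (n_time * n_units)
})

sigma2_hat <- V[k_max + 1]   # residual variance at k_max
k_grid     <- 0:k_max
penalty    <- (n_units + n_time - k_grid) * log(n_units * n_time) /
              (n_units * n_time)
bic3  <- V + k_grid * sigma2_hat * penalty
k_hat <- k_grid[which.min(bic3)]
k_hat
```

The criterion trades the drop in residual variance from adding a factor against a penalty that grows with k, so the minimizer is the estimated number of factors.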

Once we have this model we perform ADF-style pooled tests on the idiosyncratic and common components. panic04() and panic10() ask for nfac, the number of estimated factors, k1, the maximum lag allowed in the ADF test, and jj, the criterion used to determine the number of factors in our approximate factor model. To determine the lag of the ADF test Bai and Ng (2002) suggest \(4(\sqrt{\frac{T}{100}})\).

I utilize Bai and Ng’s (2002) third information criterion to determine the number of common factors. The procedure is sensitive to underestimating nfac, so it is better to overestimate the number of factors. jj is an integer from 1 through 8. Choices 1 through 7 are IC(1), IC(2), IC(3), AIC(1), BIC(1), AIC(3), and BIC(3), respectively. Choosing 8 sets the number of factors equal to the number of columns whose sum of eigenvalues is less than or equal to .5. panic10() also has the option to run models on demeaned or non-demeaned data (TRUE or FALSE), which returns model C in the first case and models A and B in the second.

PANIC 2004

With this information it is now appropriate to start running our tests.

library(PANICr)
data(NIPAagg1)
data(NIPAagg2)
data(NIPAagg3)
# Positional arguments as described above: data, nfac = 9, k1 = 7, jj = 8
agg1.04 <- panic04(NIPAagg1,9,7,8)

Aggregate One Results from PANIC 2004

Test	Values
Pooled Demeaned	151.287625586434	18.3723862242089
Pooled Idiosyncratic	131.157425663919	15.4668421381828
Pooled Cointegration test	197.117010015667	24.9872880834621

Common Test
-5.719
-4.345

Above is what all of the results will look like for panic04(). These tests are based on ADF tests, so our null hypothesis is nonstationarity and our alternative is stationarity. Pooled Demeaned is our pooled test statistic on the demeaned dataset. With a critical value of 2.87 we reject the null and conclude stationarity.

Our pooled idiosyncratic test is the pooled test on the idiosyncratic component from our factor analysis. This test has a critical value of 1.64 (Bai and Ng (2004)) at the .05 level and so we reject our null and conclude stationarity. For the Pooled Cointegration test we assume a null that the estimated number of common factors is correct and an alternative that the number of common factors chosen is not equal to the true number of common factors.

If we have properly estimated the number of common factors then there should be no cointegration inside of our common components. Cointegration is a problem because it puts our ADF tests on the common components into question. For this test statistic we refer back to the table on page 1135 of Bai and Ng (2004). For convenience, I have posted these critical values below.

Critical Values for Cointegration test

m	.01	.05	.10
1	-20.151	-13.730	-11.022
2	-31.621	-23.535	-19.923
3	-41.064	-32.296	-28.399
4	-48.501	-40.442	-36.592
5	-58.383	-48.617	-44.111
6	-66.978	-57.040	-52.312

For our results above we look at m = 2, because we have two common components. We reject the null at the .05 level, but do not reject at the .01 level. Our test on the common components has a critical value of -2.86 at the .05 level. For our example above we reject the null and conclude stationarity. Below we repeat this process for aggregates two and three.

agg2.04 <- panic04(NIPAagg2,9,7,8)

Aggregate Two Results from PANIC 2004

Test	Values
Pooled Demeaned	611.71868977349	38.8866151821883
Pooled Idiosyncratic	522.753968510507	32.2555763707432
Pooled Cointegration test	490.814016498057	29.8749129074792

Common Test
-6.282
-5.915
-4.684

agg3.04 <- panic04(NIPAagg3,5,7,3)

Aggregate Three Results from PANIC 2004

Test	Values
Pooled Demeaned	2115.1699605466	73.452641227304
Pooled Idiosyncratic	1757.4869117784	58.9466781059138
Pooled Cointegration test	2799.94347308076	101.223874316303

Common Test
-5.936
-6.874
-7.459
-7.442
-6.794
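The two values printed for each pooled test above appear to be related by the standardization of a Fisher-type pooled statistic, \((P - 2N)/\sqrt{4N}\), which is the form of pooled test described in Bai and Ng (2004). This is an observation from the printed numbers rather than a statement about the package internals. The sketch below checks it for the aggregate one idiosyncratic test, assuming N = 12; the helper name standardize_pooled is hypothetical.

```r
# Standardize a Fisher-type pooled statistic: (P - 2N) / sqrt(4N)
# standardize_pooled is a hypothetical helper, not a PANICr function.
standardize_pooled <- function(chi_stat, n_units) {
  (chi_stat - 2 * n_units) / sqrt(4 * n_units)
}

# First value reported for "Pooled Idiosyncratic" in aggregate one
pooled_idio <- 131.157425663919
z_idio <- standardize_pooled(pooled_idio, n_units = 12)
z_idio

# One-sided test at the .05 level; qnorm(0.95) is the 1.64 critical value
reject_null <- z_idio > qnorm(0.95)
reject_null
```

Under this reading, the second value in each row is the one to compare with the normal critical value.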

PANIC 2010

In 2010 Bai and Ng released two new tests. One test estimates the pooled autoregressive coefficient, and one uses a sample moment in order to account for structural breaks. The estimate of the pooled autoregressive coefficient is based on Moon and Perron, from here on called MP. This test has three different models dubbed A, B, and C.

Model C assumes an incidental trend as well as a fixed effect, Model B imposes a fixed effect, and Model A assumes no incidental trend or fixed effect. These tests are asymptotically normal and reject our null hypothesis of nonstationarity at positive or negative 1.96.

The function panic10() also includes a pooled autoregressive test that is based on a bias corrected pooled autoregressive coefficient for the idiosyncratic errors estimated by PANIC. This test is asymptotically normal and based on a demeaned data generating process (DGP).

The test that accounts for structural breaks is known as the PMSB test, as it is a panel version of the modified Sargan-Bhargava test (MSB). An interesting thing to note about this test is that it is lower-tailed. Instead of rejecting the null of nonstationarity when the statistic goes above a critical value, we reject the null when it falls below a critical value.
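To see why rejection happens in the lower tail, the sketch below computes a simplified univariate MSB statistic, \(\sqrt{T^{-2}\sum_t x_{t-1}^2 / s^2}\), for a simulated random walk and a simulated stationary series. As an assumption made to keep the example short, \(s^2\) is the plain mean of squared first differences rather than a long-run variance estimate, and the statistic is not pooled or standardized as the PMSB in panic10() is.

```r
# Simplified univariate MSB statistic (illustrative only).
# s2 uses the mean squared first difference; a long-run variance
# estimator would normally be used here.
msb_stat <- function(x) {
  n  <- length(x)
  s2 <- mean(diff(x)^2)
  sqrt(sum(x[-n]^2) / (n^2 * s2))
}

set.seed(123)
n <- 500
random_walk <- cumsum(rnorm(n))   # nonstationary under the null
white_noise <- rnorm(n)           # stationary alternative

msb_rw <- msb_stat(random_walk)
msb_wn <- msb_stat(white_noise)
c(random_walk = msb_rw, white_noise = msb_wn)
```

For a stationary series the sum of squared levels grows at rate T rather than T squared, so the statistic shrinks toward zero, which is why small values count as evidence against nonstationarity.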

The function panic10() contains a parameter demean. When this is set to TRUE, the function uses a data generating process (DGP) that demeans the data and runs only model C of the MP tests, the pooled tests, PMSB, and the LM test from the original PANIC (2004).

Setting this parameter to FALSE runs a DGP that does not demean the data and returns models A and B of the MP test, PMSB, and the pooled ADF test of PANIC 2004. For each test you receive two test statistics. As is common with tests of nonstationarity, we must reject with both test statistics in order to conclude stationarity.

agg1.10.d <- panic10(NIPAagg1,12,7,7,TRUE)

Aggregate One Results from PANIC 2010 Demeaned Tests

Pool Test	P	MP Test	Model C
Pa	2.67946018164147	ta	-3.76959077124406
Pb	4.23816556345715	tb	-2.94085466263801

PMSB	rho1	04 Pool LM
5.076	0.9962	-1.199
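The MP Model C statistics in the table above can be checked against the two-sided normal critical value with a few lines of R. The helper reject_two_sided is a hypothetical name introduced for this example; the rule of requiring both statistics to reject follows the description given earlier.

```r
# Two-sided test against the standard normal: reject when |stat| exceeds
# qnorm(1 - level / 2), which is about 1.96 at the .05 level.
# reject_two_sided is a hypothetical helper, not a PANICr function.
reject_two_sided <- function(stat, level = 0.05) {
  abs(stat) > qnorm(1 - level / 2)
}

# MP Model C statistics for aggregate one, copied from the table above
mp_model_c <- c(ta = -3.76959077124406, tb = -2.94085466263801)

rejections <- reject_two_sided(mp_model_c)
rejections

# Conclude stationarity only when both statistics reject
conclude_stationarity <- all(rejections)
conclude_stationarity
```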

As an example, for the demeaned tests above, we do not reject the null for the PMSB test, but we do reject for the MP model C test and the pooled test. However, for the PANIC 2004 LM test we do not reject the null. This is most likely because the LM test does not take into account the structural shift in the data. Below are the results for aggregates two and three as well as the asymptotic critical percentiles for the PMSB. I was unable to find the critical values for Bai and Ng’s 2010 test, so we use the critical values from Stock (1990) for the MSB test.

Asymptotic critical percentiles for the PMSB

Percentile	Demeaned	Detrended
.025	.17405	.015250
.05	.19144	.16449
.10	.21426	.18050

agg2.10.d <- panic10(NIPAagg2,12,7,7,TRUE)

Aggregate Two Results from PANIC 2010 Demeaned Tests

Pool Test	P	MP Test	Model C
Pa	-25.8547458418636	ta	-2.26642977540598
Pb	-10.3350902614155	tb	-1.10444796408497

PMSB	rho1	04 Pool LM
-4.069	0.9985	27.98

agg3.10.d <- panic10(NIPAagg3,12,7,8,TRUE)

Aggregate Three Results from PANIC 2010 Demeaned Tests

Pool Test	P	MP Test	Model C
Pa	-36.7995444340426	ta	-3.4444255000788
Pb	-15.0178637121261	tb	-1.58906392647526

PMSB	rho1	04 Pool LM
-6.039	0.9985	52.06

agg1.10.nd <- panic10(NIPAagg1,12,7,8,FALSE)

Aggregate One Results from PANIC 2010 Non-Demeaned Tests

MP	Model A	Model B
ta	-18.6986081207265	0.496952725573856
tb	-5.83235690939432	0.242043809710246

PMSB	rho1	04 Pool ADF
-2.088	1	14.97

Above are the results of PANIC 2010’s non-demeaned DGP for aggregate one. We reject for the PMSB test, MP test Model A, and the pooled ADF test from PANIC 2004. However, we do not reject for Model B.

agg2.10.nd <- panic10(NIPAagg2,12,7,8,FALSE)

Aggregate Two Results from PANIC 2010 Non-Demeaned Tests

MP	Model A	Model B
ta	-41.6496402167829	-0.687013376283987
tb	-11.6726242443272	-0.243108562817226

PMSB	rho1	04 Pool ADF
-3.753	0.9996	30.84

agg3.10.nd <- panic10(NIPAagg3,12,7,8,FALSE)

Aggregate Three Results from PANIC 2010 Non-Demeaned Tests

MP	Model A	Model B
ta	-55.4162521481366	-2.29876504766966
tb	-15.7271993582764	-0.714690391494631

PMSB	rho1	04 Pool ADF
-5.146	0.9991	57.72

Interpreting Results

What does this mean? All of these tests are on the idiosyncratic component, so they can only tell us whether or not nonstationarity lies outside of the common component. It is important to run panic04() in order to test whether or not the common components are nonstationary.

MCMC functions for PANIC 2004

One thing we may be curious about is the distribution of each of our test statistics. For example, since we are studying aggregation bias, if the probability of rejecting the null for aggregate one is much higher than for aggregate two or three, this may be a good indicator of whether aggregation bias exists within the data. To find these distributions we use an MCMC technique known as Gibbs sampling.

Gibbs sampling is a Markov chain Monte Carlo algorithm for obtaining a sequence of observations which are approximated, in our case, from an inverse gamma distribution. We use this technique to derive the marginal distribution for each test statistic as a means of statistical inference.

To ensure stationarity of our Markov chain and independence of resamples during our Gibbs process, we discard the first 50000 iterations as burn-in before gathering samples (mcmc=100000). In addition, we thin the samples by keeping only every tenth draw. In the code below, seed specifies where we start in R’s random number generator. lambda.start gives the starting values for the factor loading matrix lambda. psi.start gives the starting values for the uniquenesses. l0 is the mean of the independent normal prior on the factor loadings. L0 is the precision (inverse variance) of the independent normal prior on the factor loadings. For the rest of these parameter specifications please see the help file for MCMCpack’s MCMCfactanal() function.

mcmcagg1.04 <- MCMCpanic04(NIPAagg1, 9, 7, 8, burn = 50000, mcmc = 100000, thin = 10,
verbose = 0,seed = NA, lambda.start = NA, psi.start = NA,
l0 = 0, L0 = 0, a0 = 0.001, b0 = 0.001, std.var = TRUE)

mcmcagg2.04 <- MCMCpanic04(NIPAagg2, 9, 7, 8, burn = 50000, mcmc = 100000, thin = 10,
verbose = 0,seed = NA, lambda.start = NA, psi.start = NA,
l0 = 0, L0 = 0, a0 = 0.001, b0 = 0.001, std.var = TRUE)
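The roles of burn, mcmc, and thin in the calls above can be illustrated with a toy Gibbs sampler. The sketch below samples the mean and variance of a normal sample, using a flat prior on the mean and an inverse gamma prior with parameters a0 and b0 on the variance. The model, the function name gibbs_normal, and the small iteration counts are assumptions chosen for illustration; the factor model sampled by MCMCpanic04() is considerably more involved.

```r
# Toy Gibbs sampler for y_i ~ N(mu, sigma2) with a flat prior on mu and an
# inverse gamma(a0, b0) prior on sigma2. gibbs_normal is a hypothetical name.
gibbs_normal <- function(y, burn = 500, mcmc = 1000, thin = 10,
                         a0 = 0.001, b0 = 0.001) {
  n      <- length(y)
  sigma2 <- var(y)                       # starting value
  n_keep <- mcmc %/% thin
  draws  <- matrix(NA_real_, n_keep, 2,
                   dimnames = list(NULL, c("mu", "sigma2")))
  for (iter in seq_len(burn + mcmc)) {
    # mu | sigma2, y ~ N(mean(y), sigma2 / n)
    mu <- rnorm(1, mean(y), sqrt(sigma2 / n))
    # sigma2 | mu, y ~ inverse gamma(a0 + n / 2, b0 + sum((y - mu)^2) / 2)
    sigma2 <- 1 / rgamma(1, shape = a0 + n / 2,
                         rate = b0 + 0.5 * sum((y - mu)^2))
    post <- iter - burn                  # iterations since the end of burn-in
    if (post > 0 && post %% thin == 0) {
      draws[post / thin, ] <- c(mu, sigma2)   # keep every thin-th draw
    }
  }
  draws
}

set.seed(2014)
y <- rnorm(200, mean = 2, sd = 1)
draws <- gibbs_normal(y)
nrow(draws)   # mcmc / thin retained draws
```

Burn-in draws are discarded entirely, and thinning reduces the retained sample to mcmc / thin draws, which lowers autocorrelation and memory use.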

With large datasets, caution should be used regarding the number of variables allowed in the factor model. A large number can cause the function to run extremely slowly and be memory intensive. We thin more heavily for our third aggregate in order to avoid errors. Currently (October 19th, 2014) my laptop’s 8 GB of RAM is not sufficient to perform the tests on the third aggregate. We can see what the function call would look like here.

mcmcagg3.04 <- MCMCpanic04(NIPAagg3, 9, 7, 8, burn = 80000, mcmc = 100000, thin = 25,
verbose = 0,seed = NA, lambda.start = NA, psi.start = NA,
l0 = 0, L0 = 0, a0 = 0.001, b0 = 0.001, std.var = TRUE)

After running aggregates one and two we receive the test statistics for each chain of the MCMC. Turning these back into mcmc objects allows us to use the coda package’s built-in functions for analyzing Markov chains. We use the summary() function from the coda package for MCMCs. The second element of the list returned by this function gives us the quantiles for each chain.

library(coda)
adf.mcmc1 <- as.mcmc(mcmcagg1.04$adf.mcmc)
adf.mcmc2 <- as.mcmc(mcmcagg2.04$adf.mcmc)
summary(adf.mcmc1)[[2]]

##            2.5%     25%     50%     75%   97.5%
## adf50a  125.231 125.231 125.231 125.231 125.231
## adf50b   14.611  14.611  14.611  14.611  14.611
## adf30a   40.492  73.013  93.852 114.141 146.481
## adf30b    2.380   7.074  10.082  13.011  17.679
## Common1  -4.555  -3.297  -2.803  -2.388  -1.596
## Common2  -4.503  -3.297  -2.797  -2.385  -1.601

summary(adf.mcmc2)[[2]]

##           2.5%     25%     50%     75%   97.5%
## adf50a 601.745 601.745 601.745 601.745 601.745
## adf50b  38.143  38.143  38.143  38.143  38.143
## adf30a 424.355 488.003 515.543 541.600 581.584
## adf30b  24.921  29.665  31.718  33.660  36.641
##         -6.085  -4.534  -3.790  -3.246  -2.480
##         -5.668  -3.917  -3.373  -3.032  -2.436
##         -6.162  -4.491  -3.683  -3.229  -2.566

Investigating the quantiles for each test statistic gives us the critical value at each. For adf50a and b the test statistic holds as in Bai and Ng (2004) and the value is the same for each quantile. For adf30a and b we see the quantiles for each distribution are very different. Similarly, for each of the common components, aggregate two appears to have a much higher chance of rejecting the null.

library(ggplot2)
library(reshape2)
melt.adf1.mcmc <- melt(mcmcagg1.04$adf.mcmc[,5:6])

## No id variables; using all as measure variables

adf.density1 <- ggplot(data = melt.adf1.mcmc, aes(x=value)) + geom_density(aes(fill=variable), alpha = 0.4)+geom_vline(xintercept=-2.86)


adf.density1

[Figure: density plots of the two common-component test statistics for aggregate one, with a vertical line at the critical value of -2.86]

Above and below are graphical representations of the probability distributions of our common tests for aggregates one and two, respectively. This is a nice feature of the MCMC functions, as we can graphically display the probability of rejecting at our critical value (black line) for each component of each aggregate. It’s pretty clear that aggregate one is much less likely to reject our null hypothesis, while aggregate two is much more likely.

[Figure: density plots of the common-component test statistics for aggregate two]
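The visual comparison can also be put into numbers by computing the share of MCMC draws that fall below the critical value. The sketch below uses simulated draws as a stand-in, since the actual chains are not reproduced here; the function name rejection_prob and the parameters of the simulated draws are assumptions for illustration.

```r
# Share of draws that fall below the ADF critical value, i.e. the estimated
# probability of rejecting the null of nonstationarity.
# rejection_prob is a hypothetical helper, not a PANICr function.
rejection_prob <- function(draws, crit = -2.86) {
  mean(draws < crit)
}

# Hypothetical stand-in for one column of mcmcagg1.04$adf.mcmc
set.seed(7)
hypothetical_draws <- rnorm(10000, mean = -2.8, sd = 0.75)

p_reject <- rejection_prob(hypothetical_draws)
p_reject
```

With the real output, the same function could be applied to each common-component column, for example with apply(mcmcagg1.04$adf.mcmc[, 5:6], 2, rejection_prob).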

Now we run the MCMC version of PANIC10’s non-demeaned test statistics. Due to computational constraints we cannot currently run aggregate 3.

mcmcagg1.10nd<- MCMCpanic10(NIPAagg1, 9, 7, 8, burn = 50000, mcmc = 100000, thin = 10,
               verbose = 0,seed = NA, lambda.start = NA, psi.start = NA,
               l0 = 0, L0 = 0, a0 = 0.001, b0 = 0.001, std.var = TRUE,demean=FALSE)

mcmcagg2.10nd<- MCMCpanic10(NIPAagg2, 9, 7, 8, burn = 50000, mcmc = 100000, thin = 10,
               verbose = 0,seed = NA, lambda.start = NA, psi.start = NA,
               l0 = 0, L0 = 0, a0 = 0.001, b0 = 0.001, std.var = TRUE,demean=FALSE)

mcmcagg3.10nd<- MCMCpanic10(NIPAagg3, 9, 7, 8, burn = 80000, mcmc = 100000, thin = 20,
               verbose = 0,seed = NA, lambda.start = NA, psi.start = NA,
               l0 = 0, L0 = 0, a0 = 0.001, b0 = 0.001, std.var = TRUE,demean=FALSE)

As with PANIC04’s MCMC function, we can turn our results into mcmc objects and run the summary function from coda to get the quantiles for each.

adf.mcmc1.10nd <- as.mcmc(mcmcagg1.10nd)
adf.mcmc2.10nd <- as.mcmc(mcmcagg2.10nd)
summary(adf.mcmc1.10nd)[[2]]

##                 2.5%       25%      50%       75%     97.5%
## model A ta  -25.6086  -15.4765  -10.421   -6.3613   -1.6982
## model A tb   -6.8383   -5.1564   -4.127   -3.0915   -1.2256
## model B ta -693.1939 -487.0977 -356.294 -235.6158 -100.5458
## model B tb  -23.1988  -18.2119  -15.855  -13.5050   -8.9987
## PMSB         -2.1865   -1.9880   -1.858   -1.6737   -0.9920
## rho           0.2986    0.4648    0.588    0.7226    0.8773

summary(adf.mcmc2.10nd)[[2]]

##                 2.5%       25%      50%       75%     97.5%
## model A ta  -42.4917  -37.0618  -33.266  -29.6033  -21.5014
## model A tb   -8.8367   -8.1918   -7.735   -7.2605   -6.0665
## model B ta -769.2484 -577.6176 -492.919 -419.5287 -296.5864
## model B tb  -37.6510  -31.3923  -28.251  -24.8387  -17.5479
## PMSB         -2.2558   -2.1515   -2.098   -2.0368   -1.8725
## rho           0.4969    0.5987    0.643    0.6815    0.7324

Below are the probability distributions for the PMSB test.

PMSB.test <- cbind(mcmcagg1.10nd[,5],mcmcagg2.10nd[,5])
colnames(PMSB.test) <- c("Agg1 PMSB","Agg2 PMSB")
melt.adf1.mcmc10 <- melt(PMSB.test)
adf.density1.10 <- ggplot(data =melt.adf1.mcmc10, aes(x=value)) + geom_density(aes(fill=Var2),alpha = 0.4)
adf.density1.10

[Figure: density plots of the PMSB test statistic for aggregates one and two]

¹ T = 660
² X-13 is a software program available from the U.S. Census Bureau that seasonally adjusts multiple time series using the X-13ARIMA-SEATS process.
