## 1 Introduction

Testing for unit roots in systems of equations has been an active area of research for at least the last three decades. The principal aim of this research has been to increase the power of unit root tests by utilizing the cross-sectional dimension of multiple time series. In this way, power gains can be made by increasing the overall number of observations while using relatively short time series. This approach is often preferable to the use of long univariate time series, which are likely to undergo structural changes.

One of the earliest unit root tests in systems of equations was the test by Levin, Lin, and Chu (2002). This test assumes a common autoregressive parameter for all time series in the equation system and consequently pools the data. The assumption of a common parameter, however, imposes a restriction that limits the use of the test for heterogeneous time series. Im, Pesaran, and Shin (2003) presented the IPS test, which relaxed this assumption and modeled the individual time series using separate linear trends. Their suggested test statistic was the average of the *t*-statistics from the individual equations. However, implicit in this method is the assumption that all the time series are of similar length, i.e. that the data are balanced. The test has also been shown to be sensitive to cross-sectional dependency (see Li and Shukur, 2013, for example).

Another panel unit root test that allows for heterogeneous panels was presented by Maddala and Wu (1999) and Choi (2001). This test combines evidence from several independent tests using their *p*-values and has its basis in the method found in Fisher (1932). If $P_i$ is the *p*-value for the *i*th unit root test, then, under the null hypothesis,

$$P = -2\sum_{i=1}^{N}\ln P_i$$

has a $\chi^{2}$ distribution, with degrees of freedom equal to twice the number of the individual tests (and therefore, their *p*-values). The Maddala and Wu test does not require balanced data, can be conducted on *p*-values obtained from any unit root test, and is less sensitive to correlation across time series than the IPS unit root test (see Maddala and Wu, 1999).
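As a minimal illustration of the Fisher-type combination described above, the sketch below computes $P = -2\sum\ln P_i$ and its upper-tail $\chi^{2}_{2N}$ *p*-value using only the Python standard library. The function names are hypothetical, the input *p*-values are made up, and the closed-form tail probability is valid only for even degrees of freedom, which is always the case for this statistic.

```python
import math


def chi2_sf_even_df(x, dof):
    """Upper-tail probability of a chi-square with an even number of degrees of freedom.

    Uses the closed form P(X > x) = exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k          # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combined_test(p_values):
    """Fisher / Maddala-Wu combination: P = -2 * sum(log p_i) ~ chi-square(2N) under H0."""
    p_values = list(p_values)
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    dof = 2 * len(p_values)
    return statistic, dof, chi2_sf_even_df(statistic, dof)


# Hypothetical p-values from five individual unit root tests.
stat, dof, p_combined = fisher_combined_test([0.03, 0.20, 0.45, 0.08, 0.60])
print(f"P = {stat:.3f}, df = {dof}, combined p-value = {p_combined:.4f}")
```

A useful sanity check is that a single *p*-value is returned unchanged, since $-2\ln p \sim \chi^{2}_{2}$ has survival function $e^{-x/2}$.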

The tests described above belong to a group of tests referred to as first-generation unit root tests in the panel data literature. These tests depend on the assumption that there is no correlation between the individual time series in the equation system, an assumption that rarely holds in practice. Consequently, tests that account for correlation between time series in equation systems have been proposed. These are often referred to as second-generation unit root tests. The cross-sectionally augmented Im, Pesaran and Shin test (Pesaran 2007), hereafter referred to as CIPS, is perhaps the most popular of the second-generation unit root tests. Results from using CIPS on short time series will be investigated and compared to those from the proposed unit root test.

We suggest a wavelet variance ratio unit root test for a system of equations. Monte Carlo simulations show that the proposed test is powerful and robust to correlation between time series. We derive the limiting distribution of the wavelet variance ratio test statistic in the cases where the alternatives have no deterministic components, as well as when testing against trend stationarity (stationarity around a non-zero mean and time trend). The limiting distribution is presented under the condition that the lengths of the time series increase, but with a fixed number of time series. Results from the Monte Carlo simulations show that the wavelet-based test retains its nominal size for all of the data generating processes (DGPs) considered, and has higher power than CIPS.

Finally, we demonstrate the usefulness of the test using an empirical application on evaluating the Purchasing Power Parity theory for the Group of 7 countries. Evidence from this evaluation points to different countries following different specifications, with some having stationary exchange rate series.

## 2 Methodology

### 2.1 Variance ratio unit root tests and the wavelet filters

There has been considerable research into testing the random walk and martingale difference hypotheses, mainly in the context of asset prices. Of particular interest is the model where the error term is an uncorrelated process, which is common in financial time series. Consider the Random Walk Model given by,

Variance ratio unit root tests use the fact that, for a random walk, the variance of the *k*th difference of the series is a linear function of the difference order *k*. The test statistics of these tests are, therefore, based on estimators of the ratio of the variance of the *k*th difference, scaled by 1/*k*, to the variance of the first difference.

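Let $y_t$ be the time series of interest. The variance ratio at lag $k$ is given as (see Cochrane (1988)),

$$VR(k) = \frac{\operatorname{Var}(y_t - y_{t-k})}{k\operatorname{Var}(y_t - y_{t-1})} = 1 + 2\sum_{i=1}^{k-1}\left(1 - \frac{i}{k}\right)\rho_i,$$

where $\rho_i$ is the lag $i$ autocorrelation coefficient of the first differences of the series, $\Delta y_t = y_t - y_{t-1}$.

The limit of the variance ratio as $k \to \infty$ (see Cochrane 1988) is given as follows,

$$\lim_{k\to\infty}VR(k) = 1 + 2\sum_{i=1}^{\infty}\rho_i = \frac{2\pi f_{\Delta y}(0)}{\operatorname{Var}(\Delta y_t)},$$

where $f_{\Delta y}(0) = (2\pi)^{-1}\sum_{j=-\infty}^{\infty}\gamma_j$ is the spectral density of $\Delta y_t$ at the zero frequency and $\gamma_j$ is the lag $j$ autocovariance of $\Delta y_t$. An estimate of $f_{\Delta y}(0)$ can be based on the sample autocorrelations of $\Delta y_t$. When $y_t$ is a random walk with serially uncorrelated increments, the expected value of the variance ratio should be close to 1 for all lags $k$. The variance ratio will be less than 1 when the first differences are negatively autocorrelated, as they are for a mean-reverting series, indicating the rejection of the null hypothesis of a serially uncorrelated random walk.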
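A minimal NumPy sketch of the variance ratio $VR(k) = 1 + 2\sum_{i=1}^{k-1}(1 - i/k)\rho_i$, computed from the sample autocorrelations of $\Delta y_t$. It is an illustration under our own choices (plain sample autocorrelations, no bias correction, no standardization into a test statistic), not the estimator of any particular paper, and the function names are hypothetical. For a stationary AR(1) in levels with coefficient 0.5, the first differences have autocorrelations $\rho_i = -0.5^{i}/2$, so the population value of $VR(5)$ is 0.3875.

```python
import numpy as np


def autocorrelations(x, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag of x."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[lag:], xc[:-lag]) / denom for lag in range(1, max_lag + 1)])


def variance_ratio(y, k):
    """VR(k) = 1 + 2 * sum_{i=1}^{k-1} (1 - i/k) * rho_i, with rho_i taken from diff(y)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    dy = np.diff(np.asarray(y, dtype=float))
    rho = autocorrelations(dy, k - 1)
    weights = 1.0 - np.arange(1, k) / k
    return 1.0 + 2.0 * np.sum(weights * rho)


rng = np.random.default_rng(0)
eps = rng.standard_normal(20_000)
random_walk = np.cumsum(eps)          # unit root, uncorrelated increments
ar1 = np.zeros_like(eps)              # stationary AR(1) with coefficient 0.5
for t in range(1, eps.size):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

print(variance_ratio(random_walk, 5))  # close to 1
print(variance_ratio(ar1, 5))          # well below 1 (population value 0.3875)
```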

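Other variance ratio tests are those suggested by Tanaka (1990) and Kwiatkowski et al. (1992). The test statistic for the variance ratio test given in Kwiatkowski et al. (1992) is,

$$\hat{\eta} = \frac{1}{T^{2}\hat{\sigma}^{2}}\sum_{t=1}^{T}S_t^{2},$$

where $\hat{\sigma}^{2}$ is a consistent estimator of the long-run variance of $\hat{e}_t$ and

$$S_t = \sum_{j=1}^{t}\hat{e}_j,$$

where the detrended series is given as

$$\hat{e}_t = y_t - \hat{\mu} - \hat{\beta}t,$$

with $\hat{\mu}$ and $\hat{\beta}$ the least squares estimates from regressing $y_t$ on a constant and a linear time trend.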

In the frequency domain, the use of variance ratios for unit root testing is motivated by the fact that the spectrum of a unit root process peaks at frequencies near zero and declines rapidly as frequency increases. As a consequence, the largest proportion of the variance is found in the lowest frequency bands. Suitable test statistics can, therefore, be based on the relative distribution of the variance with regard to frequency. For this to be feasible, the spectral variance needs to be decomposed in order to obtain the proportions of the variance contributed by the different frequency intervals. The Discrete Wavelet Transform (DWT) is a variance-preserving transform, which decomposes the spectral variance on a scale-by-scale basis using filtering operations. The transform outputs two vectors: a vector of DWT wavelet coefficients and a vector of scaling coefficients. The wavelet coefficients describe the changes at each scale, i.e. the details resulting from differences within each scale. The scaling coefficients, on the other hand, describe averages at each scale, i.e. the smooth resulting from averaging at each scale. The scale of the transform, which is inversely related to frequency, refers to the number of recursive decompositions. Each recursive iteration from the second onwards decomposes the scaling coefficients from the preceding iteration.

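The DWT has its filters operate on non-overlapping values, which means that the input time series have to be of dyadic length, i.e. $T = 2^{J}$ for some positive integer $J$. The maximal overlap DWT (MODWT) filters overlapping values using circular filtering and can therefore be applied to time series of any length. The Haar MODWT wavelet filter is given as,

$$\tilde{h}_{j,l} = \begin{cases} 2^{-j}, & l = 0, \dots, 2^{j-1}-1, \\ -2^{-j}, & l = 2^{j-1}, \dots, 2^{j}-1, \end{cases}$$

for scale $j$. The scaling filter is given as,

$$\tilde{g}_{j,l} = 2^{-j}, \qquad l = 0, \dots, 2^{j}-1.$$

The Haar wavelet filter, therefore, approximates an ideal band-pass filter with the nominal pass-band $[2^{-(j+1)}, 2^{-j}]$ and the Haar scaling filter approximates an ideal low-pass filter with the nominal pass-band $[0, 2^{-(j+1)}]$.

The $j$th level wavelet and scaling coefficients are defined as follows, respectively:

$$W_{t,j} = \sum_{l=0}^{2^{j}-1}\tilde{h}_{j,l}\,y_{(t-l) \bmod T}, \qquad V_{t,j} = \sum_{l=0}^{2^{j}-1}\tilde{g}_{j,l}\,y_{(t-l) \bmod T}, \qquad t = 0, 1, \dots, T-1.$$

A useful property of the Haar MODWT transform is,

$$y_t = \sum_{j=1}^{J_0}W_{t,j} + V_{t,J_0},$$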

where *J*_{0} is an arbitrary scale less than or equal to the maximum resolution of the time series. This property implies that the time series itself (not only its variance) can be additively decomposed into its wavelet and scaling coefficients.
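To make the filtering concrete, the sketch below implements the level-$j$ Haar MODWT by circular filtering with $\tilde{h}_{j,l} = \pm 2^{-j}$ and $\tilde{g}_{j,l} = 2^{-j}$, then checks the additive decomposition $y_t = \sum_{j=1}^{J_0}W_{t,j} + V_{t,J_0}$ and energy preservation numerically. It is a direct, unoptimized NumPy illustration with hypothetical function names, not a replacement for a dedicated wavelet library. A series of length 37 is used on purpose to show that the MODWT does not need dyadic lengths.

```python
import numpy as np


def haar_modwt_filters(j):
    """Haar MODWT wavelet (h) and scaling (g) filters at level j, each of length 2**j."""
    half = 2 ** (j - 1)
    scale = 2.0 ** -j
    h = np.concatenate([np.full(half, scale), np.full(half, -scale)])
    g = np.full(2 * half, scale)
    return h, g


def haar_modwt(y, j):
    """Level-j coefficients via circular filtering:
    W[t] = sum_l h[l] * y[(t - l) mod T],  V[t] = sum_l g[l] * y[(t - l) mod T]."""
    y = np.asarray(y, dtype=float)
    h, g = haar_modwt_filters(j)
    W = np.zeros_like(y)
    V = np.zeros_like(y)
    for l in range(h.size):
        shifted = np.roll(y, l)  # shifted[t] = y[(t - l) mod T]
        W += h[l] * shifted
        V += g[l] * shifted
    return W, V


rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(37))  # any length works, not just 2**J

J0 = 3
details = [haar_modwt(y, j)[0] for j in range(1, J0 + 1)]
smooth = haar_modwt(y, J0)[1]

# Additive decomposition: y_t = sum_j W_{t,j} + V_{t,J0}
print(np.allclose(sum(details) + smooth, y))
# Energy (variance) preservation of the MODWT
print(np.isclose(sum(np.sum(d ** 2) for d in details) + np.sum(smooth ** 2), np.sum(y ** 2)))
```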

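For univariate time series, the wavelet variance ratio unit root test introduced by Fan and Gençay (2010) uses a normalized version of the test statistic given below,

$$\hat{S}_{T,1} = \frac{\sum_{t=0}^{T-1}V_{t,1}^{2}}{\sum_{t=0}^{T-1}V_{t,1}^{2} + \sum_{t=0}^{T-1}W_{t,1}^{2}},$$

where $V_{t,1}$ and $W_{t,1}$ are the first-level Haar MODWT scaling and wavelet coefficients of $y_t$.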

The numerator is the contribution to the variance from the first level scaling coefficients, and the denominator is the total variance partitioned into the parts contributed by the scaling and wavelet coefficients of the first scale, respectively. The variances of the scaling and wavelet coefficients are given as,

respectively.
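The first-scale ratio described above can be sketched as follows, assuming circular Haar filtering, $W_{t,1} = (y_t - y_{t-1})/2$ and $V_{t,1} = (y_t + y_{t-1})/2$ with $y_{-1} \equiv y_{T-1}$. The sketch computes only the raw energy share of the scaling coefficients, not the normalized statistic of Fan and Gençay (2010), and the function names are hypothetical. For white noise the two parts carry roughly equal energy, so the share is near 0.5, while for a random walk the scaling coefficients dominate and the share approaches 1.

```python
import numpy as np


def haar_modwt_level1(y):
    """First-level Haar MODWT wavelet and scaling coefficients (circular filtering)."""
    y = np.asarray(y, dtype=float)
    lagged = np.roll(y, 1)                          # lagged[t] = y[(t - 1) mod T]
    return (y - lagged) / 2.0, (y + lagged) / 2.0   # W_{t,1}, V_{t,1}


def scaling_energy_share(y):
    """sum V_{t,1}^2 / (sum V_{t,1}^2 + sum W_{t,1}^2)."""
    W1, V1 = haar_modwt_level1(y)
    v_energy = np.sum(V1 ** 2)
    return v_energy / (v_energy + np.sum(W1 ** 2))


rng = np.random.default_rng(2)
white_noise = rng.standard_normal(10_000)
random_walk = np.cumsum(rng.standard_normal(2_000))
print(scaling_energy_share(white_noise))  # near 0.5
print(scaling_energy_share(random_walk))  # close to 1
```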

Under the unit root null hypothesis, it can be seen that

The limiting distribution of the test statistic, which is a normalized version of

Under the alternative hypothesis both

Li and Shukur (2013) proposed using

### 2.2 The wavelet variance ratio unit root test for a system of equations

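Consider the system of equations without deterministic terms for simplicity,

$$y_{it} = \phi_i y_{i,t-1} + u_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T,$$

where $y_{it}$ is the time series of interest and $u_{it}$ is a zero mean weakly stationary error term, i.e. $u_{it} = \psi(L)\varepsilon_{it}$ with $\psi(1) \neq 0$, $i$ indexes the individual equation, and $t$ indexes time. Also, $u_{it}$ and $u_{kt}$ are uncorrelated for $i \neq k$, and $\varepsilon_{it}$ is an iid, zero mean process with variance $\sigma_i^{2}$. $\psi(L)$ is the lag polynomial that relates the response of $u_{it}$ to $\varepsilon_{it}$.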

The unit root hypothesis for the system,

and the alternative hypothesis is,

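Let the matrix of time series be denoted by,

$$\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_N],$$

so that the Haar MODWT scaling and wavelet coefficients for the first scale decomposition are given by,

$$\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_N], \qquad \mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_N],$$

where $\mathbf{v}_i$ is the vector of the scaling coefficients of the series $\mathbf{y}_i$ ($i = 1, 2, \dots, N$), i.e. the $T \times 1$ vector stacking the first-scale scaling coefficients $V_{ti,1}$, and $\mathbf{w}_i$ is the vector of the wavelet coefficients of series $\mathbf{y}_i$ ($i = 1, 2, \dots, N$), i.e. the $T \times 1$ vector stacking the first-scale wavelet coefficients $W_{ti,1}$.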

A wavelet variance ratio unit root test can be based on the following,

where,

is the total variance of the system and,

is the variance contributed by the first scale wavelet coefficients.

Under the null hypothesis (where all the series in the equation system are $I(1)$), $\mathbf{V}^{T}\mathbf{V}$ is a diagonal matrix with diagonal elements of order $O_p(T^{2})$. The diagonal elements of $\mathbf{V}^{T}\mathbf{V}$ will dominate those of $\mathbf{W}^{T}\mathbf{W}$, which are of order $O_p(T)$. The test statistic will, therefore, take on larger values under the null hypothesis than under the alternative. For white noise series, for example,

since $\mathbf{V}^{T}\mathbf{V}$ and $\mathbf{W}^{T}\mathbf{W}$ then have the same expected value. Because the diagonal elements of $\mathbf{V}^{T}\mathbf{V}$ grow at rate $T^{2}$ while those of $\mathbf{W}^{T}\mathbf{W}$ grow at rate $T$, VR is not bounded under the null hypothesis. A suitably normalized test statistic is given as follows,

where

The limiting distribution of the test statistic under the null hypothesis is shown in the following theorem whose proof is given in the Appendix.

**Theorem 1.** *The limiting distribution of* VR_{M} *under the null hypothesis is given as follows,*

where *N* is the number of equations.

Since the test statistic is the sum of variance ratios, a Central Limit Theorem could be invoked by normalizing the sum and letting *N* → ∞, but this is not pursued in this paper, as we are mainly interested in the limit as *T* → ∞ with *N* fixed.

Many unit root tests suffer from a loss of power when tested against alternatives that are trend stationary. As a consequence, efficient detrending methods (see Schmidt and Phillips, 1992) are required to retain power. We use the detrending techniques suggested by Fan and Gençay (2010) and, as in their work, we restrict our scope to the cases where the models specified under the alternative hypotheses have non-zero means and linear trends only.

The model including deterministic components is given as,

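For equation $i$, the null hypothesis $H_0: \phi_i = 1$ is the unit root hypothesis, while under $H_A$, $|\phi_i| < 1$ is the hypothesis of stationarity. Following Fan and Gençay (2010), when $\alpha = 0$, we consider the demeaned series $y_{it} - \bar{y}_i$, where $\bar{y}_i$ is the sample mean of $y_{it}$ for individual $i$, and when $\alpha \neq 0$, we consider the correspondingly detrended series for individual $i$.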

We consider two test statistics, computed for the cases *α* = 0 and *α* ≠ 0 respectively (see Eqn. (1)). Their limiting distributions are given by Theorem 2 below. The derivations of these limiting distributions, which are also given in the Appendix, are similar to the derivation of the distribution in Theorem 1, except that detrended Brownian motions are used.

**Theorem 2.** *The limiting distributions of the two test statistics under* $H_0$ *are given as,*

*respectively,* where ⇒ denotes convergence in the associated probability measure and $V(r) = W(r) - rW(1)$.

### 2.3 Comparison unit root test

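The small sample properties of VR_{M} are compared to those of CIPS. Pesaran (2007) constructs the CIPS test based on the following model:

$$y_{it} = (1-\phi_i)\mu_i + \phi_i y_{i,t-1} + u_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T,$$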

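where the initial values $y_{i0}$ are fixed. A single common factor with individual-specific factor loadings is specified for the error term,

$$u_{it} = \lambda_i f_t + \epsilon_{it}.$$

The common factor, $f_t$, is assumed to be stationary and serially uncorrelated, and without loss of generality, its variance is fixed at 1, i.e. $\operatorname{Var}(f_t) = 1$. The factor loadings, $\lambda_i$, are themselves random variables. $\epsilon_{it}$, $\lambda_i$ and $f_t$ are assumed to be mutually independent.

Pesaran (2007) proposes a test that augments the standard Dickey-Fuller test with the cross-sectional averages, resulting in the following Cross-sectionally Augmented Dickey-Fuller (CADF) estimating equation,

$$\Delta y_{it} = a_i + b_i y_{i,t-1} + c_i\bar{y}_{t-1} + d_i\Delta\bar{y}_t + e_{it},$$

where $\bar{y}_t = N^{-1}\sum_{j=1}^{N}y_{jt}$ are the cross-sectional averages, and lags of $\Delta y_{it}$ and $\Delta\bar{y}_t$ can be added to the regression to account for serial correlation. The CADF statistic is the *t*-ratio of the least squares estimate of $b_i$.

Letting CADF_{i} represent the CADF statistic for equation *i*, CIPS is the average of the CADFs over all the equations,

$$\text{CIPS} = \frac{1}{N}\sum_{i=1}^{N}\text{CADF}_i.$$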

The small sample properties of CIPS are studied using Monte Carlo simulation in Pesaran (2007). The test is shown to be largely free of size distortions, even in the presence of strong cross-sectional dependence and serial correlation, and to have good power properties for sample sizes between 50 and 100 for the DGPs considered therein.

In the following section, we examine the performance of VR_{M}, and make comparisons with that of CIPS for time series of lengths 20–100, using systems of 5 and 10 equations. The size and power of the tests are compared in cases where there is neither cross-sectional dependency nor serial correlation (hereafter called DGP 1), in the presence of weak cross-sectional correlation but no serial correlation (hereafter called DGP 2), in the case where there is strong cross-sectional correlation but no serial correlation (hereafter called DGP 3), and in the case with both strong cross-sectional correlation and strong serial correlation (hereafter called DGP 4). The choice of DGPs follows that of Pesaran (2007).
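A compact sketch of the CADF regression and the CIPS average is given below. It assumes no lag augmentation (adequate for serially uncorrelated errors) and, by default, an intercept in the CADF regression; it reports the plain average of the *t*-ratios on $y_{i,t-1}$ and does not implement the truncated variant also proposed by Pesaran (2007). The function names are hypothetical, and the result has to be compared with tabulated CIPS critical values such as those reported later in this paper.

```python
import numpy as np


def cadf_tstat(y_i, y_bar, intercept=True):
    """t-ratio on y_{i,t-1} in the CADF regression (no lag augmentation):
    dy_it = a_i + b_i * y_{i,t-1} + c_i * ybar_{t-1} + d_i * d(ybar)_t + e_it."""
    dy = np.diff(y_i)                                   # dy_it for t = 1, ..., T-1
    regressors = [y_i[:-1], y_bar[:-1], np.diff(y_bar)]
    if intercept:
        regressors.insert(0, np.ones_like(dy))
    X = np.column_stack(regressors)
    beta, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (dy.size - X.shape[1])         # OLS error variance
    cov = s2 * np.linalg.inv(X.T @ X)
    k = 1 if intercept else 0                           # column holding y_{i,t-1}
    return beta[k] / np.sqrt(cov[k, k])


def cips(Y, intercept=True):
    """Average of the individual CADF t-ratios; Y is a T x N array (columns = series)."""
    Y = np.asarray(Y, dtype=float)
    y_bar = Y.mean(axis=1)                              # cross-sectional average at each t
    stats = [cadf_tstat(Y[:, i], y_bar, intercept) for i in range(Y.shape[1])]
    return float(np.mean(stats))


# Example: a stationary panel with a strong common factor (hypothetical parameters).
rng = np.random.default_rng(0)
T, N = 200, 10
f = rng.standard_normal(T)
gamma = rng.uniform(-1.0, 3.0, N)
Y = np.zeros((T, N))
for t in range(1, T):
    Y[t] = 0.5 * Y[t - 1] + gamma * f[t] + rng.standard_normal(N)
print(cips(Y))  # compare with the tabulated CIPS critical values
```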

## 3 Monte Carlo simulations

Monte Carlo simulations were used to study the size and power properties of the two unit root tests in small sample sizes. The design of the Monte Carlo experiments is discussed next.

### 3.1 Design of the Monte Carlo experiment

Following Pesaran (2007), time series are simulated using the following DGPs

Cross-sectional correlation is introduced using a single common factor, denoted by *f _{t}*, which represents the unobserved common factor effect.

Table 1 shows the experimental factors and the ranges over which they are varied. The nominal test size is held at 5% as per convention.

Factors that vary for the different DGPs.

| Factor | Symbol | Design |
|---|---|---|
| Nominal size | π_{0} | 0.05 |
| Number of iterations | I | 10,000 (size); 50,000 (critical values); 10,000 (power) |
| Number of equations | N | 5, 10 |
| Number of observations | T | 20, 30, 50, 100 |
| Common factor loadings (cross-sectional correlation) | γ_{i} | 0 (no correlation); ∼U[0, 0.2] (weak); ∼U[−1, 3] (strong) |
| AR parameter for serial correlation | ρ_{i} | ∼U[0.2, 0.4] |
| AR parameter for alternatives | ϕ_{i} | ∼U[0.85, 0.95] |

The common factor loadings are sampled from the uniform distribution with parameters U[0, 0.2] and U[−1, 3] for weak and strong cross-sectional dependence, respectively. This corresponds to cross-sectional correlations between the equations of 1% and 50% on average, respectively.
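The simulation design can be sketched as below. The DGP equations are not reproduced in the text above, so the form used here, $y_{it} = (1-\phi_i)\mu_i + \phi_i y_{i,t-1} + u_{it}$ with $u_{it} = \gamma_i f_t + \xi_{it}$ and $\xi_{it} = \rho_i\xi_{i,t-1} + \varepsilon_{it}$, is an assumption modelled on the Pesaran (2007) design; only the parameter ranges are taken from Table 1. In this sketch the loadings and AR parameters are redrawn on every call, the first 50 observations are discarded as burn-in, and `rejection_rate` assumes left-tailed tests (reject when the statistic falls below the critical value), which matches the ordering of the critical values reported for both tests. All names are hypothetical.

```python
import numpy as np


def simulate_panel(T, N, dgp, null=True, mu=None, burn_in=50, rng=None):
    """Simulate a T x N panel from an assumed Pesaran (2007)-style DGP.

    y_it = (1 - phi_i) * mu_i + phi_i * y_{i,t-1} + u_it
    u_it = gamma_i * f_t + xi_it,   xi_it = rho_i * xi_{i,t-1} + eps_it
    """
    rng = np.random.default_rng() if rng is None else rng
    if dgp == 1:
        gamma = np.zeros(N)                    # no cross-sectional correlation
    elif dgp == 2:
        gamma = rng.uniform(0.0, 0.2, N)       # weak dependence
    elif dgp in (3, 4):
        gamma = rng.uniform(-1.0, 3.0, N)      # strong dependence
    else:
        raise ValueError("dgp must be 1, 2, 3 or 4")
    rho = rng.uniform(0.2, 0.4, N) if dgp == 4 else np.zeros(N)   # serial correlation (DGP 4)
    phi = np.ones(N) if null else rng.uniform(0.85, 0.95, N)     # unit root vs. alternative
    mu = np.zeros(N) if mu is None else np.asarray(mu, dtype=float)

    n = T + burn_in
    f = rng.standard_normal(n)                 # common factor, iid N(0, 1)
    eps = rng.standard_normal((n, N))          # idiosyncratic shocks, iid N(0, 1)
    y = np.zeros((n + 1, N))                   # y[0] is the fixed initial value (zero)
    xi = np.zeros(N)
    for t in range(1, n + 1):
        xi = rho * xi + eps[t - 1]
        u = gamma * f[t - 1] + xi
        y[t] = (1.0 - phi) * mu + phi * y[t - 1] + u
    return y[burn_in + 1:]                     # drop the initial value and burn-in


def rejection_rate(statistics, critical_value):
    """Share of replications rejecting H0 in a left-tailed test (reject when stat < cv)."""
    return float(np.mean(np.asarray(statistics) < critical_value))


panel = simulate_panel(T=50, N=5, dgp=4, null=True, rng=np.random.default_rng(42))
print(panel.shape)
```

Empirical size is then the rejection rate of a test statistic over replications generated with `null=True`, and empirical power the rate over replications generated with `null=False`.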

## 4 Results and discussions

### 4.1 Empirical test sizes and power

Monte Carlo simulations were conducted for two purposes: to study the small sample performance of VR_{M} by comparing it with CIPS, and to study the robustness of VR_{M} to cross-sectional and serial dependence. We examine these aspects in the case where testing is against an alternative that has zero mean and no time trend, and in the case where testing is against an alternative that is stationary around a non-zero mean only. Both cases have *α* = 0 but differ in their specification of *μ*_{i} (see Eqn. (1)).

### 4.1.1 Case I. No deterministic terms

The 1%, 5% and 10% critical values for VR_{M} and CIPS in Case I are reported in the table below.

Table 3 shows the test sizes for the four DGPs given earlier. There is no evidence of size distortions for any of the DGPs for either test.

Case I (no deterministic components): critical values.

| N | T | VR_{M} (1%) | CIPS (1%) | VR_{M} (5%) | CIPS (5%) | VR_{M} (10%) | CIPS (10%) |
|---|---|---|---|---|---|---|---|
| 5 | 20 | 105.55 | −2.29 | 220.86 | −1.92 | 321.06 | −1.72 |
| 5 | 30 | 66.35 | −2.28 | 135.69 | −1.91 | 196.64 | −1.72 |
| 5 | 50 | 38.35 | −2.24 | 77.86 | −1.91 | 111.35 | −1.71 |
| 5 | 100 | 19.09 | −2.23 | 38.25 | −1.89 | 54.62 | −1.72 |
| 10 | 20 | 693.17 | −1.97 | 1185.68 | −1.69 | 1529.02 | −1.55 |
| 10 | 30 | 398.98 | −1.95 | 637.69 | −1.70 | 799.70 | −1.56 |
| 10 | 50 | 214.04 | −1.95 | 333.68 | −1.70 | 422.6 | −1.56 |
| 10 | 100 | 100.18 | −1.93 | 154.13 | −1.71 | 192.14 | −1.57 |

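Note: Each individual series is generated using the DGP given above, with $f_t$ and $\varepsilon_{it}$ ∼ iid N(0, 1).

For CIPS, the critical values are calculated from the regression of $\Delta y_{it}$ on $y_{i,t-1}$, $\bar{y}_{t-1}$ and $\Delta\bar{y}_t$, without an intercept.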

For the

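The empirical power of the two tests is also displayed in Table 3. For the case where there are no deterministic components, it is clear that the wavelet-based test has considerably higher power than CIPS, with an empirical power of 1.000 for all DGPs and sample sizes considered.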

Case I (no deterministic components): test sizes and power.

| | T | DGP 1: VR_{M} | DGP 1: CIPS | DGP 2: VR_{M} | DGP 2: CIPS | DGP 3: VR_{M} | DGP 3: CIPS | DGP 4: VR_{M} | DGP 4: CIPS |
|---|---|---|---|---|---|---|---|---|---|
| Size, N = 5 | 20 | 0.055 | 0.054 | 0.050 | 0.050 | 0.050 | 0.058 | 0.049 | 0.054 |
| | 30 | 0.049 | 0.051 | 0.051 | 0.051 | 0.051 | 0.050 | 0.048 | 0.051 |
| | 50 | 0.050 | 0.051 | 0.050 | 0.050 | 0.052 | 0.054 | 0.046 | 0.051 |
| | 100 | 0.051 | 0.049 | 0.053 | 0.053 | 0.508 | 0.055 | 0.046 | 0.049 |
| Size, N = 10 | 20 | 0.050 | 0.048 | 0.046 | 0.055 | 0.048 | 0.053 | 0.047 | 0.040 |
| | 30 | 0.049 | 0.050 | 0.051 | 0.052 | 0.051 | 0.057 | 0.046 | 0.046 |
| | 50 | 0.050 | 0.051 | 0.049 | 0.049 | 0.053 | 0.053 | 0.049 | 0.035 |
| | 100 | 0.047 | 0.055 | 0.046 | 0.051 | 0.051 | 0.063 | 0.044 | 0.041 |
| Power, N = 5 | 20 | 1.000 | 0.134 | 1.000 | 0.132 | 1.000 | 0.142 | 1.000 | 0.091 |
| | 30 | 1.000 | 0.232 | 1.000 | 0.211 | 1.000 | 0.234 | 1.000 | 0.134 |
| | 50 | 1.000 | 0.508 | 1.000 | 0.496 | 1.000 | 0.503 | 1.000 | 0.313 |
| | 100 | 1.000 | 0.960 | 1.000 | 0.956 | 1.000 | 0.952 | 1.000 | 0.837 |
| Power, N = 10 | 20 | 1.000 | 0.194 | 1.000 | 0.188 | 1.000 | 0.191 | 1.000 | 0.101 |
| | 30 | 1.000 | 0.357 | 1.000 | 0.362 | 1.000 | 0.366 | 1.000 | 0.194 |
| | 50 | 1.000 | 0.796 | 1.000 | 0.782 | 1.000 | 0.788 | 1.000 | 0.519 |
| | 100 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.988 |

### 4.1.2 Case II. Non-zero mean and no time trend

The critical values for VR_{M} and CIPS in Case II are reported in the table below.

Case II (intercept only): critical values.

| N | T | VR_{M} (1%) | CIPS (1%) | VR_{M} (5%) | CIPS (5%) | VR_{M} (10%) | CIPS (10%) |
|---|---|---|---|---|---|---|---|
| 5 | 20 | 1.07 | −2.99 | 1.22 | −2.59 | 1.32 | −2.40 |
| 5 | 30 | 1.00 | −2.91 | 1.17 | −2.56 | 1.27 | −2.39 |
| 5 | 50 | 0.96 | −2.87 | 1.12 | −2.55 | 1.22 | −2.38 |
| 5 | 100 | 0.92 | −2.86 | 1.09 | −2.54 | 1.19 | −2.38 |
| 10 | 20 | 2.58 | −2.61 | 2.84 | −2.35 | 2.98 | −2.22 |
| 10 | 30 | 2.41 | −2.58 | 2.67 | −2.33 | 2.83 | −2.21 |
| 10 | 50 | 2.28 | −2.55 | 2.56 | −2.33 | 2.71 | −2.22 |
| 10 | 100 | 2.20 | −2.53 | 2.47 | −2.32 | 2.64 | −2.22 |

Note: Each individual series is generated using the DGP given above, with $f_t$ and $\varepsilon_{it}$ ∼ iid N(0, 1).

For CIPS, the critical values are calculated from the regression of $\Delta y_{it}$ on a constant, $y_{i,t-1}$, $\bar{y}_{t-1}$ and $\Delta\bar{y}_t$.

For the

Table 5 shows the test sizes for the four DGPs. Again, there is no evidence of size distortions for any of the DGPs for either test.

Table 5 also displays the power of the two tests. Both tests show low power for the smallest sample sizes (*T* = 20, 30), but power increases with sample size. Both tests lose power when strong cross-sectional and serial correlation are present together (DGP 4). For the five-equation system, the wavelet-based test has higher power than CIPS for *T* = 50 and *T* = 100. For the smaller sample sizes, both tests show similar (but low) power. As expected, the ten-equation system shows more power than the five-equation system for both tests.

Case II (intercept only): test sizes and power.

| | T | DGP 1: VR_{M} | DGP 1: CIPS | DGP 2: VR_{M} | DGP 2: CIPS | DGP 3: VR_{M} | DGP 3: CIPS | DGP 4: VR_{M} | DGP 4: CIPS |
|---|---|---|---|---|---|---|---|---|---|
| Size, N = 5 | 20 | 0.047 | 0.048 | 0.050 | 0.051 | 0.045 | 0.050 | 0.037 | 0.050 |
| | 30 | 0.048 | 0.049 | 0.055 | 0.047 | 0.049 | 0.056 | 0.042 | 0.044 |
| | 50 | 0.050 | 0.047 | 0.049 | 0.051 | 0.047 | 0.049 | 0.045 | 0.044 |
| | 100 | 0.051 | 0.052 | 0.050 | 0.049 | 0.053 | 0.058 | 0.041 | 0.051 |
| Size, N = 10 | 20 | 0.047 | 0.054 | 0.050 | 0.050 | 0.046 | 0.058 | 0.049 | 0.055 |
| | 30 | 0.050 | 0.051 | 0.045 | 0.051 | 0.049 | 0.060 | 0.044 | 0.044 |
| | 50 | 0.047 | 0.051 | 0.049 | 0.050 | 0.044 | 0.054 | 0.050 | 0.044 |
| | 100 | 0.051 | 0.049 | 0.050 | 0.053 | 0.052 | 0.055 | 0.048 | 0.038 |
| Power, N = 5 | 20 | 0.089 | 0.079 | 0.089 | 0.086 | 0.085 | 0.089 | 0.078 | 0.077 |
| | 30 | 0.157 | 0.111 | 0.151 | 0.113 | 0.157 | 0.119 | 0.147 | 0.087 |
| | 50 | 0.368 | 0.227 | 0.385 | 0.224 | 0.363 | 0.235 | 0.350 | 0.156 |
| | 100 | 0.905 | 0.729 | 0.918 | 0.706 | 0.886 | 0.722 | 0.885 | 0.533 |
| Power, N = 10 | 20 | 0.098 | 0.093 | 0.097 | 0.092 | 0.092 | 0.100 | 0.098 | 0.073 |
| | 30 | 0.178 | 0.159 | 0.184 | 0.165 | 0.171 | 0.165 | 0.170 | 0.099 |
| | 50 | 0.511 | 0.373 | 0.535 | 0.371 | 0.479 | 0.391 | 0.491 | 0.225 |
| | 100 | 0.986 | 0.964 | 0.993 | 0.959 | 0.976 | 0.954 | 0.977 | 0.842 |

The power advantage of the wavelet-based unit root test over CIPS may be explained by differences in effective sample sizes. While the wavelet-based test also loses power when the series are demeaned or detrended, CIPS requires the estimation of several parameters for each individual time series, which reduces the effective sample size and hence the power.

## 5 Empirical application

Purchasing Power Parity (PPP) has been heavily researched in international economics because of its central role in building macroeconomic models. There are two different versions of PPP: absolute PPP, which refers to the situation where the nominal exchange rate between two currencies is equal to the ratio of the price levels of the two corresponding countries, and relative PPP, which takes into account factors such as trade barriers (tariff and non-tariff barriers), transportation costs, and product differentiation across countries. The empirical literature has focused on the relative version of PPP, which is the weaker version of the theory. In the relative version, the rate of depreciation of a currency equals the difference between the inflation rate in that currency's country and the inflation rate in the comparison country, making the real exchange rate constant.
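A small numerical sketch of relative PPP in logs, assuming the usual definition of the log real exchange rate $q_t = s_t + p^{*}_t - p_t$, where $s_t$ is the log nominal exchange rate (domestic currency per unit of foreign currency) and $p_t$ and $p^{*}_t$ are the log domestic and foreign price levels. All numbers are hypothetical. When the currency depreciates at exactly the log inflation differential, $q_t$ is constant, which is the kind of stationarity the unit root tests in this paper look for in real exchange rate data.

```python
import numpy as np


def log_real_exchange_rate(nominal_rate, domestic_prices, foreign_prices):
    """q_t = s_t + p*_t - p_t, all in logs (s_t: domestic currency per unit of foreign)."""
    return np.log(nominal_rate) + np.log(foreign_prices) - np.log(domestic_prices)


# Hypothetical example: 5% domestic and 2% foreign inflation per period, and a
# nominal rate that depreciates in line with the inflation differential.
periods = np.arange(10)
P = 100 * 1.05 ** periods          # domestic price level
P_star = 100 * 1.02 ** periods     # foreign price level
S = 1.5 * (1.05 / 1.02) ** periods # nominal exchange rate

q = log_real_exchange_rate(S, P, P_star)
print(np.ptp(q))                   # essentially zero: the real exchange rate is constant

# Relative PPP in growth rates: depreciation = domestic inflation - foreign inflation (in logs)
print(np.allclose(np.diff(np.log(S)), np.diff(np.log(P)) - np.diff(np.log(P_star))))
```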

The conventional procedure when evaluating PPP is to test the null hypothesis that the real exchange rate series has a unit root against the alternative hypothesis of stationarity. Rejection of the null hypothesis indicates support for the PPP theory. Initial studies using augmented Dickey-Fuller (ADF) unit root tests suggested by Dickey and Fuller (1979) showed little evidence supporting PPP in the long run. An example of such a study is Taylor (1988), where the conclusions were very unfavorable to PPP as a long-run equilibrium condition. Other examples of such studies include Corbae and Ouliaris (1988), Layton and Stark (1990), Corbae and Ouliaris (1991), and Bahmani-Oskooee (1993, 1995). However, Frankel and Rose (1996) noted that a non-rejection of the null hypothesis may be due to the low statistical power of the unit root tests, which is mainly caused by the short span of the available data. Glen (1992), Lothian and Taylor (1996), and Taylor (2002), among others, suggest that longer time series could be used to provide indirect evidence to support PPP. However, these long-span studies have also been criticized (see Hegwood and Papell, 1998, for example) on the grounds that structural breaks or shifts in the equilibrium exchange rate are likely to occur over such long time spans, thereby biasing the results. An alternative approach, which can be used to increase the statistical power, is to utilize the cross-sectional dimension of multiple time series. Examples of studies using panel unit root tests are Cheung, Chinn, and Fujii (2006) and Murray and Papell (2002, 2005).

As an empirical application, we use the PPP theory and compare the evidence found from using CIPS and the proposed wavelet-based test.

The CIPS test is the conventional Pesaran (2007) test. The

Empirical example.

| | Full sample: VR_{M} | Full sample: CIPS | Post-Bretton Woods: VR_{M} | Post-Bretton Woods: CIPS |
|---|---|---|---|---|
| Test statistic | 1.562* | −2.006 | 1.661* | −2.078 |

A star indicates significance at the conventional 5% level of significance.

## 6 Summary and conclusions

A unit root test for a system of equations is introduced in this paper. The proposed test extends the wavelet variance ratio unit root test of Fan and Gençay (2010) to multiple equation time series.

Monte Carlo simulations show that the proposed test has higher power compared to CIPS (Pesaran 2007) for time series of short length (between 20 and 100 observations), and systems of 5 and 10 equations. The test is also shown to be robust to cross-sectional dependency and serial correlation for the DGPs considered in this paper.

We demonstrate its usefulness through an empirical application on evaluating the PPP theory for the G7 countries. Evidence from this evaluation points to different countries following different specifications, with some having stationary exchange rate series.

The proposed unit root test is simple to apply and interpret, and could prove to be useful to the practitioner who is faced with a system of 10 or fewer equations and time series of lengths up to 100. For larger systems of equations or systems with longer time series, any of the existing unit root tests should provide adequate power.

The first two authors would like to acknowledge the contribution of the third author, Ghazi Shukur, who passed away during the review period of the manuscript. We would also like to thank the anonymous reviewers for their suggestions.

Consider the first-order autoregressive model:

$$y_{it} = \phi_i y_{i,t-1} + u_{it},$$

where $y_{it}$ is the time series of interest, $i = 1, \dots, N$ are the individual time series, $t = 1, \dots, T$ indexes time, and $u_{it}$ is a weakly stationary zero mean process with finite long-run variance. Here we consider the cases with no deterministic terms as well as where the alternative is trend stationary around a non-zero mean and time trend.

Consider *N* time series that have no cross-sectional correlation but are possibly autocorrelated,

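The Haar MODWT scaling and wavelet coefficients for the first scale decomposition are given by,

$$\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_N], \qquad \mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_N],$$

where $\mathbf{v}_i$ is the vector of the scaling coefficients of the series $\mathbf{y}_i$ ($i = 1, 2, \dots, N$), i.e. the $T \times 1$ vector stacking the first-scale scaling coefficients $V_{ti,1}$, and $\mathbf{w}_i$ is the vector of the wavelet coefficients of series $\mathbf{y}_i$ ($i = 1, 2, \dots, N$), i.e. the $T \times 1$ vector stacking the first-scale wavelet coefficients $W_{ti,1}$.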

The total variance in the system of equations can be expressed in terms of the Haar MODWT coefficients as follows:

The contribution to the total variance due to the wavelet and scaling coefficients is given by,

A unit root test statistic can, therefore, be based on the following ratio,

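Under the unit root null hypothesis, the diagonal elements of $\mathbf{V}^{T}\mathbf{V}$ are of order $O_p(T^{2})$, whereas those of $\mathbf{W}^{T}\mathbf{W}$ are of order $O_p(T)$, and the ratio will be small.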

Under the null hypothesis,

as *T* → ∞ since there is no cross-correlation i.e.

Then,

For the MODWT transform,

Also, for this wavelet transform,

From the asymptotic theory for unit root processes (see Hamilton (1994), p. 486) and the Continuous Mapping Theorem (CMT) (see Billingsley 1968),

where $W_i(r)$ denotes the standard Brownian motion associated with $u_{it}$.

Also, for the Haar MODWT wavelet filter,

where $\upsilon_{yi,1}$ denotes the wavelet variance at unit scale for time series $i$ (see Percival 1995).

Using the CMT, the limiting distribution of the variance ratio for each individual series is,

where the long-run variance of $u_{it}$ (for time series $i$) is given by,

$$\omega_i^{2} = \sum_{j=-\infty}^{\infty}\gamma_{i,j},$$

*γ*_{i, j} is the lag *j* autocovariance for time series *i*, and ⇒ is used to denote convergence of the associated probability measure.

The two nuisance parameters in the limiting distribution, $\operatorname{E}(W_{ti,1}^{2})$ and $\omega_i^{2}$, can be estimated as follows.

1. $\operatorname{E}(W_{ti,1}^{2})$ is the wavelet variance at unit scale of the Haar MODWT. Its consistent estimator (see Percival (1995)) is given by

   $$\hat{\upsilon}_{yi,1} = \frac{1}{T-1}\sum_{t=1}^{T-1}W_{ti,1}^{2}.$$

   This wavelet variance estimator for the Haar MODWT avoids boundary effects, i.e. distortions in the coefficients at the ends of the time series caused by the circular filtering operations, by excluding the boundary coefficient $W_{0i,1}$.
2. For the long-run variance, $\omega_i^{2}$, estimation can be made in one of two ways (see Zivot and Wang, 2006, for details on long-run variance estimation):
   - (a) Parametric approach. For time series $i$, since $u_{it}$ is a linear process, it follows that

     $$\sum_{j=-\infty}^{\infty}\gamma_{j} = \sigma^{2}\left(\sum_{j=0}^{\infty}\psi_{j}\right)^{2} = \sigma^{2}\psi(1)^{2},$$

     giving $\omega_i^{2} = \sigma_i^{2}\psi(1)^{2}$. When $u_{it}$ is ARMA($p$, $q$), then

     $$\psi(1) = \frac{1+\theta_1+\dots+\theta_q}{1-\varphi_1-\dots-\varphi_p} = \frac{\theta(1)}{\varphi(1)},$$

     which gives $\omega_i^{2} = \sigma_i^{2}\theta(1)^{2}/\varphi(1)^{2}$, where $\sigma_i^{2}$ is the variance of the error of the ARMA model for time series $i$. Making substitutions using the estimates of the parameters of the ARMA($p$, $q$) process gives a consistent estimate of $\omega_i^{2}$. A second parametric approach is to approximate the ARMA($p$, $q$) process with a higher order AR($p^{*}$) process,

     $$u_{it} = \varphi_{i,1}u_{i,t-1} + \dots + \varphi_{i,p^{*}}u_{i,t-p^{*}} + \epsilon_{it},$$

     and then estimate the long-run variance as $\omega_i^{2} = \sigma_i^{2}/\varphi^{*}(1)^{2}$, where $\varphi^{*}(1) = 1 - \varphi_{i,1} - \dots - \varphi_{i,p^{*}}$.
   - (b) Semi-parametric method using a kernel function. One possible semi-parametric estimator of the long-run variance is the Newey and West (1987) estimator, which is the weighted covariance function

     $$\hat{\omega}_i^{2} = \hat{\gamma}_{i,0} + 2\sum_{\ell=1}^{L}w_{i,\ell}\hat{\gamma}_{i,\ell},$$

     where $w_{i,\ell}$ are the weights for time series $i$, $\hat{\gamma}_{i,\ell}$ are the autocovariances for time series $i$, and $L$ is the truncation lag or bandwidth parameter, such that $L = O(T^{1/3})$ (see Andrews 1991). Newey and West use the Bartlett weights, $w_{\ell} = 1 - \frac{\ell}{L+1}$, with $L = \lfloor 4(T/100)^{2/9}\rfloor$.
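The nuisance-parameter estimators listed above can be sketched in a few lines of NumPy. This is an illustration with hypothetical function names: the Newey and West estimator uses Bartlett weights and the bandwidth rule quoted above, the parametric version assumes the AR coefficients and error variance have already been estimated elsewhere, and the wavelet variance drops the circular boundary coefficient at $t = 0$.

```python
import math
import numpy as np


def newey_west_bandwidth(T):
    """Bandwidth rule L = floor(4 * (T / 100) ** (2 / 9))."""
    return int(math.floor(4.0 * (T / 100.0) ** (2.0 / 9.0)))


def newey_west_lrv(u, L=None):
    """Bartlett-kernel long-run variance: gamma_0 + 2 * sum_{l=1}^{L} (1 - l/(L+1)) * gamma_l."""
    u = np.asarray(u, dtype=float)
    T = u.size
    L = newey_west_bandwidth(T) if L is None else L
    uc = u - u.mean()
    lrv = np.dot(uc, uc) / T                      # gamma_0
    for lag in range(1, L + 1):
        gamma = np.dot(uc[lag:], uc[:-lag]) / T   # gamma_lag
        lrv += 2.0 * (1.0 - lag / (L + 1.0)) * gamma
    return lrv


def ar_lrv(sigma2, ar_coefs):
    """Parametric long-run variance sigma^2 / phi(1)^2, with phi(1) = 1 - sum(phi_j)."""
    return sigma2 / (1.0 - float(np.sum(ar_coefs))) ** 2


def haar_wavelet_variance_unit_scale(y):
    """(1/(T-1)) * sum_{t=1}^{T-1} W_{t,1}^2, excluding the circular boundary coefficient."""
    y = np.asarray(y, dtype=float)
    W1 = (y[1:] - y[:-1]) / 2.0                   # W_{t,1} for t = 1, ..., T-1 (no wrap-around)
    return np.mean(W1 ** 2)


rng = np.random.default_rng(5)
T = 5_000
eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]                # AR(1) errors with long-run variance 4
print(newey_west_lrv(u), ar_lrv(1.0, [0.5]))
print(haar_wavelet_variance_unit_scale(np.cumsum(u)))
```

The Bartlett estimator is biased downwards in finite samples for persistent errors, which is one reason the parametric route can be attractive when an AR approximation fits well.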


The nuisance parameters are eliminated from the limiting distribution by normalizing the variance ratio with the ratio of the consistent estimates of the nuisance parameters,

giving the result,

The test statistic is, therefore,

where

Let

where

so that

where *i*.

For the case where the time series are efficiently detrended, the following result holds (see Kwiatkowski et al. (1992), for example)

where *V*(*r*) is a standard Brownian bridge given as *V*(*r*) = *W*(*r*) − *rW*(1), and *W*(*r*) is Brownian motion.

The rest of the proof follows that for Theorem 1. Starting with Eqn. (2), replacing $W_i(r)$ with $V_i(r)$ yields the limiting distributions given in Theorem 2.

## References

Andrews, D. W. 1991. “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation.” Econometrica 59 (3): 817–858.

Bahmani-Oskooee, M. 1993. “Purchasing Power Parity Based on Effective Exchange Rate and Cointegration: 25 LDCs’ Experience With its Absolute Formulation.” World Development 21 (6): 1023–1031.

Bahmani-Oskooee, M. 1995. “Real and Nominal Effective Exchange Rates for 22 LDCs: 1971:1–1990:4.” Applied Economics 27 (7): 591–604.

Billingsley, P. 1968. Convergence of Probability Measures. New York: John Wiley & Sons.

Breitung, J. 2002. “Nonparametric Tests for Unit Roots and Cointegration.” Journal of Econometrics 108 (2): 343–363.

Cheung, Y.-W., M. D. Chinn, and E. Fujii. 2006. “The Chinese Economies in Global Context: The Integration Process and its Determinants.” Journal of the Japanese and International Economies 20 (1): 128–153.

Choi, I. 2001. “Unit Root Tests for Panel Data.” Journal of International Money and Finance 20 (2): 249–272.

Cochrane, J. H. 1988. “How Big is the Random Walk in GNP?” Journal of Political Economy 96 (5): 893–920.

Corbae, D., and S. Ouliaris. 1988. “Cointegration and Tests of Purchasing Power Parity.” The Review of Economics and Statistics 70: 508–511.

Corbae, D., and S. Ouliaris. 1991. “A Test of Long-run Purchasing Power Parity Allowing for Structural Breaks.” Economic Record 67 (1): 26–33.

Dickey, D. A., and W. A. Fuller. 1979. “Distribution of the Estimators for Autoregressive Time Series with a Unit Root.” Journal of the American Statistical Association 74 (366a): 427–431.

Fan, Y., and R. Gençay. 2010. “Unit Root Tests with Wavelets.” Econometric Theory 26 (5): 1305–1331.

Fisher, R. 1932. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

Frankel, J. A., and A. K. Rose. 1996. “A Panel Project on Purchasing Power Parity: Mean Reversion Within and Between Countries.” Journal of International Economics 40 (1): 209–224.

Gençay, R., F. Selçuk, and B. J. Whitcher. 2001. An Introduction to Wavelets and Other Filtering Methods in Finance and Economics. San Diego: Academic Press.

Glen, J. D. 1992. “Real Exchange Rates in the Short, Medium, and Long Run.” Journal of International Economics 33 (1–2): 147–166.

Granger, C. 1966. “The Typical Spectral Shape of an Economic Variable.” Econometrica 34 (1): 150–161.

Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.

Hegwood, N. D., and D. H. Papell. 1998. “Quasi Purchasing Power Parity.” International Journal of Finance & Economics 3 (4): 279–289.

Im, K. S., M. H. Pesaran, and Y. Shin. 2003. “Testing for Unit Roots in Heterogeneous Panels.” Journal of Econometrics 115 (1): 53–74.

Kwiatkowski, D., P. C. Phillips, P. Schmidt, and Y. Shin. 1992. “Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root: How Sure are we that Economic Time Series have a Unit Root?” Journal of Econometrics 54 (1–3): 159–178.

Layton, A. P., and J. P. Stark. 1990. “Cointegration as an Empirical Test of Purchasing Power Parity.” Journal of Macroeconomics 12 (1): 125–136.

Levin, A., C.-F. Lin, and C.-S. J. Chu. 2002. “Unit Root Tests in Panel Data: Asymptotic and Finite-sample Properties.” Journal of Econometrics 108 (1): 1–24.

Li, Y., and G. Shukur. 2013. “Testing for Unit Roots in Panel Data Using a Wavelet Ratio Method.” Computational Economics 41 (1): 59–69.

Lothian, J. R., and M. P. Taylor. 1996. “Real Exchange Rate Behavior: The Recent Float From the Perspective of the Past Two Centuries.” Journal of Political Economy 104 (3): 488–509.

Maddala, G. S., and S. Wu. 1999. “A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test.” Oxford Bulletin of Economics and Statistics 61 (S1): 631–652.

Murray, C. J., and D. H. Papell. 2002. “The Purchasing Power Parity Persistence Paradigm.” Journal of International Economics 56 (1): 1–19.

Murray, C. J., and D. H. Papell. 2005. “Do Panels Help Solve the Purchasing Power Parity Puzzle?” Journal of Business & Economic Statistics 23 (4): 410–415.

Newey, W. K., and K. D. West. 1987. “A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation-consistent Covariance Matrix.” Econometrica 55: 703–708.

Percival, D. B. 1995. “On Estimation of the Wavelet Variance.” Biometrika 82 (3): 619–631.

Percival, D. B., and A. T. Walden. 2000. Wavelet Methods for Time Series Analysis (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge: Cambridge University Press.

Pesaran, M. H. 2007. “A Simple Panel Unit Root Test in the Presence of Cross-section Dependence.” Journal of Applied Econometrics 22 (2): 265–312.

Schmidt, P., and P. C. Phillips. 1992. “LM Tests for a Unit Root in the Presence of Deterministic Trends.” Oxford Bulletin of Economics and Statistics 54 (3): 257–287.

Stock, J. H. 1994. “Unit Roots, Structural Breaks and Trends.” In Handbook of Econometrics, edited by R. Engle and D. McFadden, Vol. 4, chapter 46, 2752–2753. Amsterdam: North-Holland.

Tanaka, K. 1990. “Testing for a Moving Average Unit Root.” Econometric Theory 6 (4): 433–444.

Taylor, M. P. 1988. “An Empirical Examination of Long-run Purchasing Power Parity Using Cointegration Techniques.” Applied Economics 20 (10): 1369–1381.

Taylor, A. M. 2002. “A Century of Purchasing Power Parity.” Review of Economics and Statistics 84 (1): 139–150.

Zivot, E., and J. Wang. 2006. Modelling Financial Time Series with S-Plus. New York: Springer-Verlag.