A wavelet-based variance ratio unit root test for a system of equations

In this paper, we suggest a unit root test for a system of equations using a spectral variance decomposition method based on the Maximal Overlap Discrete Wavelet Transform. We obtain the limiting distribution of the test statistic and study its small sample properties using Monte Carlo simulations. We find that, for multiple time series of small lengths, the wavelet-based method is robust to size distortions in the presence of cross-sectional dependence. The wavelet-based test is also more powerful than the cross-sectionally augmented IPS (CIPS) unit root test (Pesaran, M. H. 2007. "A Simple Panel Unit Root Test in the Presence of Cross-section Dependence." Journal of Applied Econometrics 22 (2): 265–312.) for time series with between 20 and 100 observations, using systems of 5 and 10 equations. We demonstrate the usefulness of the test through an application evaluating the Purchasing Power Parity theory for the Group of 7 countries and find support for the theory, whereas the test by Pesaran (2007) finds no such support.


Introduction
Testing for unit roots in systems of equations has been an active area of research for at least the last three decades. The principal aim of this research has been to increase the power of unit root tests by utilizing the cross-sectional dimension of multiple time series. In this way, power gains can be made by increasing the overall number of observations while using relatively short time series. This approach is often preferable to the use of long univariate time series, which are likely to undergo structural changes.
One of the earliest unit root tests in systems of equations was the test by Levin, Lin, and Chu (2002). This test assumes a common autoregressive parameter for all time series in the equation system and consequently pools the data. The assumption of a common parameter, however, imposes a restriction that limits the use of the test for heterogeneous time series. Im, Pesaran, and Shin (2003) presented the IPS test, which relaxed this assumption and modeled the individual time series using separate linear trends. Their suggested test statistic was the average of the t-statistics from the individual equations. However, implicit in this method is the assumption that all the time series are of similar length, i.e. that the data are balanced. The test has also been revealed to be sensitive to cross-sectional dependency (see Li and Shukur, 2013, for example).
Another panel unit root test that allows for heterogeneous panels was presented by Maddala and Wu (1999) and Choi (2001). This test combines evidence from several independent tests using their p-values and has its basis in the method found in Fisher (1932). If p_i is the p-value for the ith unit root test, then −2 Σ_{i=1}^{N} log p_i has an exact χ² distribution, with degrees of freedom equal to twice the number of individual tests (and therefore, their p-values). The Maddala–Wu unit root test does not require balanced data, can be conducted on p-values obtained from any unit root test, and is less sensitive to correlation across time series compared to the IPS unit root test (see Maddala and Wu, 1999).
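The Fisher combination just described is simple to compute. A minimal sketch in Python (the function name `fisher_combined` is ours; the χ² tail probability comes from `scipy.stats.chi2`):

```python
import numpy as np
from scipy.stats import chi2

def fisher_combined(pvalues):
    """Maddala-Wu / Choi combination: under the joint null of N independent
    unit root tests, -2 * sum(log p_i) has an exact chi-square distribution
    with 2N degrees of freedom. Returns the statistic and its p-value."""
    p = np.asarray(pvalues, dtype=float)
    stat = -2.0 * np.sum(np.log(p))
    return stat, chi2.sf(stat, df=2 * len(p))
```

Smaller individual p-values raise the combined statistic, and the panels need not be balanced since each p-value can come from a test on a series of different length.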
The tests described above belong to a group of tests referred to as the first generation unit root tests in the panel data literature. These tests depend on the assumption that there is no correlation between the individual time series in the equation system, an assumption that rarely holds in practice. Consequently, tests that account for correlation between time series in equation systems have been proposed. These are often referred to as the second generation unit root tests. The cross-sectionally augmented Im, Pesaran and Shin test, hereafter referred to as CIPS (Pesaran 2007), is perhaps the most popular of the second generation unit root tests. Results from using CIPS on time series of short lengths will be investigated and compared to those from the proposed unit root test.
We suggest a wavelet variance ratio unit root test for a system of equations. Monte Carlo simulations show that the proposed test is powerful and robust to correlation between time series. We derive the limiting distribution of the wavelet variance ratio test statistic in the cases where the alternatives have no deterministic components, as well as when testing against trend stationarity (stationarity around a non-zero mean and time trend). The limiting distribution is presented under the condition that the lengths of the time series increase, but with a fixed number of time series. Results from the Monte Carlo simulations show that the wavelet-based test retains its nominal size for all of the data generating processes (DGPs) considered, and has better power compared to CIPS.
Finally, we demonstrate the usefulness of the test using an empirical application on evaluating the Purchasing Power Parity theory for the Group of 7 countries. Evidence from this evaluation points to different countries following different specifications, with some having stationary exchange rate series.

Variance ratio unit root tests and the wavelet filters
There has been considerable research into testing the random walk and martingale difference hypotheses, mainly in the context of asset prices. Of particular interest is the model where the error term is an uncorrelated process, which is common in financial time series. Consider the random walk model y_t = y_{t−1} + u_t, where u_t = ψ(L)ε_t is a stationary process and ε_t is white noise.
Variance ratio unit root tests use the fact that, for a unit root time series, the variance of the kth difference of the series is an increasing linear function of the difference, k. The test statistics of these tests are, therefore, based on estimators of the ratio of variances at different lags to that at lag 1.
Let σ²_k = Var(y_t − y_{t−k})/k and Δy_t = y_t − y_{t−1}. The relation between σ²_k and the autocorrelation coefficients of Δy_t is given as (see Cochrane (1988))

σ²_k / σ²_1 = 1 + 2 Σ_{i=1}^{k−1} (1 − i/k) ρ_i,

where ρ_i is the lag-i autocorrelation coefficient of the first differences of the {y_t}_{t=0}^{T} series. This type of variance ratio unit root test is essentially a specification test using the null hypothesis H₀: ρ_i = 0 for all i. The variance ratio (see Cochrane 1988) is given as

VR(k) = σ̂²(k)/σ̂²(1),

which for large k approaches 2π f_Δy(0)/σ̂²(1), where σ̂²(1) is an unbiased estimate of the variance at lag 1 and f_Δy(0) is the spectral density estimator of Δy_t at the zero frequency. An estimate of f_Δy(0) can be based on the sample autocorrelations of Δy_t. When the time series has a unit root, the expected value of the variance ratio should be close to 1 for all lags k. The variance ratio will be less than 1 when the first differences are correlated, indicating the rejection of the null hypothesis of a serially uncorrelated random walk.
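The ratio can be estimated directly from the kth and first differences of the series. A minimal numpy sketch (the function name `variance_ratio` is ours), illustrating that the ratio stays near 1 for a random walk and falls well below 1 for stationary data:

```python
import numpy as np

def variance_ratio(y, k):
    """Cochrane-type variance ratio VR(k) = [Var(y_t - y_{t-k}) / k] /
    Var(y_t - y_{t-1}). Approximately 1 at all lags k for a random walk
    with serially uncorrelated increments; below 1 when the first
    differences are negatively autocorrelated (mean reversion)."""
    y = np.asarray(y, dtype=float)
    sig_k = np.var(y[k:] - y[:-k], ddof=1) / k   # variance of k-differences, per lag
    sig_1 = np.var(np.diff(y), ddof=1)           # variance of first differences
    return sig_k / sig_1

rng = np.random.default_rng(42)
walk = np.cumsum(rng.standard_normal(5000))      # unit root: VR(k) near 1
noise = rng.standard_normal(5000)                # stationary levels: VR(k) near 1/k
```

For iid levels, Var(y_t − y_{t−k}) = 2σ² for every k, so VR(k) ≈ 1/k, which is why the statistic discriminates sharply even at moderate lags.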
Other variance ratio tests are those suggested by Tanaka (1990) and Kwiatkowski et al. (1992). The test statistic for the variance ratio test given in Kwiatkowski et al. (1992) is

η̂ = T⁻² Σ_{t=1}^{T} S²_t / σ̂²,

where S_t is the partial sum of the {e_t}_{t=0}^{T} process and σ̂² estimates the long-run variance, which, in the case of serial dependence, can be estimated using semi-parametric kernel-based methods, e.g. the Newey–West estimator (Newey and West, 1987). When testing against trend stationary alternatives, the test statistic is computed for the detrended series, where the detrended series is given as ỹ_t = (y_t − d̂_t), and d̂_t is an estimate of the deterministic component, for example the sample average in the case of the null being stationarity about a non-zero mean. The statistic of Kwiatkowski et al. given above tests for stationarity, i.e. its null hypothesis is stationarity. Breitung (2002) reversed the roles of the null and alternative hypotheses and proposed using the variance ratio as a unit root test where the null hypothesis is non-stationarity. Used in this way, its limiting distribution under the null hypothesis (see Breitung (2002), Proposition 3) does not depend on the long-run variance, as the long-run variance cancels out in the variance ratio. This removes the need for the kernel function selection and tuning parameter optimization necessary for estimating long-run variances.
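Breitung's idea of letting the long-run variance cancel can be illustrated with a small sketch (the function name is ours; demeaning is used here as one simple adjustment for deterministic components):

```python
import numpy as np

def breitung_vr(y):
    """Breitung (2002)-style variance ratio: sum(S_t^2) / (T^2 * sum(y_t^2)),
    with S_t the partial sums of the (demeaned) series. The long-run
    variance enters numerator and denominator alike and cancels, so no
    kernel or bandwidth choice is needed. The statistic is O_p(1) under a
    unit root and converges to zero under stationarity."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                 # simple demeaning
    S = np.cumsum(y)                 # partial sums S_t
    T = y.size
    return (S @ S) / (T**2 * (y @ y))
```

Reversing the KPSS roles as Breitung did, large values of this ratio are consistent with the unit root null and small values with stationarity.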
In the frequency domain, the use of variance ratios for unit root testing is motivated by the fact that the spectrum of a unit root process peaks at the near-zero frequencies and tails off exponentially. As a consequence, the largest proportion of the variance is found in the lowest frequency bands. Suitable test statistics can, therefore, be based on the relative distribution of the variance with regard to frequency. For this to be feasible, the spectral variance needs to be decomposed in order to obtain the proportions of the variance contributed by the different frequency intervals. The Discrete Wavelet Transform (DWT) is a variance preserving transform, which decomposes the spectral variance on a scale-by-scale basis using filtering operations. The transform outputs two vectors: a vector of the DWT wavelet coefficients, and a vector of its scaling coefficients. The wavelet coefficients describe the changes at each scale, i.e. the details resulting from differences within each scale. The scaling coefficients, on the other hand, describe averages at each scale, i.e. the smooth resulting from averaging at each scale. The scale of the transform, which is inversely related to frequency, refers to the number of recursive decompositions. Each recursive iteration from the second onwards decomposes the scaling coefficients from the preceding iteration.
The DWT has its filters operate on non-overlapping values, which means that the input time series has to be of dyadic length (2^J, J = 1, 2, 3, …). In contrast, the Maximal Overlap DWT (MODWT) has its filters operate on overlapping values, which makes it possible to handle samples of any size. The MODWT, therefore, extracts more information on the local variation of the time series. Unlike wavelet functions with longer and smoother filters, the Haar MODWT does not suffer from boundary effects, i.e. the loss of coefficients that are subject to circular filtering operations at the ends of the time series. The transform also provides a better estimator of the wavelet variance (see Percival 1995) compared to the DWT. For these reasons, we use the Haar MODWT in this paper. For more details on wavelet filters and their properties, we refer to the texts by Percival and Walden (2000) and Gençay, Selçuk, and Whitcher (2001).
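At the unit (first) scale, the Haar MODWT amounts to half-sum and half-difference filters applied circularly. A small sketch (function name ours) showing the energy-preservation and additive-decomposition properties on a non-dyadic sample size:

```python
import numpy as np

def haar_modwt_level1(y):
    """Unit-scale Haar MODWT with circular filtering. The scaling (smooth)
    and wavelet (detail) coefficients are moving half-averages and
    half-differences:
        v_t = (y_t + y_{t-1}) / 2,   w_t = (y_t - y_{t-1}) / 2,
    with y_{-1} taken circularly as y_{T-1}. Works for any sample size T,
    not just dyadic lengths."""
    y = np.asarray(y, dtype=float)
    y_lag = np.roll(y, 1)            # circular boundary condition
    v = (y + y_lag) / 2.0            # scaling coefficients
    w = (y - y_lag) / 2.0            # wavelet coefficients
    return v, w
```

The circular Haar filters preserve total energy (Σv² + Σw² = Σy²) and reconstruct the series additively (v_t + w_t = y_t), which are exactly the properties exploited in the next section.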
For the Haar filter, the first-level wavelet and scaling coefficients are, respectively, w̃_{t,1} = (y_t − y_{t−1})/2 and ṽ_{t,1} = (y_t + y_{t−1})/2; higher levels are obtained by recursively filtering the scaling coefficients. A useful property of the Haar MODWT is that

y_t = ṽ_{t,J₀} + Σ_{j=1}^{J₀} w̃_{t,j},

where J₀ is an arbitrary scale less than or equal to the maximum resolution of the time series. This property implies that the time series itself (not only its variance) can be additively decomposed into its wavelet and scaling coefficients. For univariate time series, the wavelet variance ratio unit root test introduced by Fan and Gençay (2010) uses a normalized version of the statistic

VR_{T,1} = Σ_{t=0}^{T−1} ṽ²_{t,1} / (Σ_{t=0}^{T−1} ṽ²_{t,1} + Σ_{t=0}^{T−1} w̃²_{t,1}).

The numerator is the contribution to the variance from the first-level scaling coefficients, and the denominator is the total variance, partitioned into the parts contributed by the scaling and wavelet coefficients of the first scale, respectively. Under the unit root null hypothesis, it can be seen that Σ_{t=0}^{T−1} w̃²_{t,1} = O_p(T), and it is shown (see Fan and Gençay, 2010) that Σ_{t=0}^{T−1} ṽ²_{t,1} = O_p(T²), so that the variance ratio approaches 1. The limiting distribution of the test statistic, which is a normalized version of VR_{T,1}, is non-standard, and the critical values are obtained using Monte Carlo simulations. Under the alternative hypothesis, both Σ_{t=0}^{T−1} ṽ²_{t,1} and Σ_{t=0}^{T−1} w̃²_{t,1} are O_p(T) and the ratio VR_{T,1} will be less than 1. Li and Shukur (2013) proposed using this statistic in a panel data setting. Their test statistic is based on averaging the variance ratios of the individual panel units, i.e. V̄R = N⁻¹ Σ_{i=1}^{N} V̂R_{i,1}. The test was conducted on cross-sectionally correlated time series as well as on series that were decorrelated by wavestrapping. Monte Carlo simulation results showed that the wavelet-based panel unit root test is more powerful than the IPS test in the presence of correlation among the panel units. The test is also more robust than IPS to size distortions resulting from cross-sectional dependency, but is still over-sized.
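The unnormalized univariate ratio can be sketched as follows (our naming; the published test additionally normalizes by estimated nuisance parameters, which is omitted here):

```python
import numpy as np

def wavelet_vr1(y):
    """Unit-scale wavelet variance ratio in the spirit of Fan and Gencay
    (2010), unnormalized: the share of total energy carried by the scale-1
    scaling (low-frequency) coefficients. Close to 1 under a unit root,
    since sum(v^2) = O_p(T^2) dominates sum(w^2) = O_p(T); around 0.5 for
    white noise, where both scales carry equal energy on average."""
    y = np.asarray(y, dtype=float)
    y_lag = np.roll(y, 1)
    v = (y + y_lag) / 2.0          # scaling (smooth) coefficients
    w = (y - y_lag) / 2.0          # wavelet (detail) coefficients
    return (v @ v) / (v @ v + w @ w)

rng = np.random.default_rng(7)
walk = np.cumsum(rng.standard_normal(2000))   # unit root
noise = rng.standard_normal(2000)             # stationary
```

The contrast between the two regimes is what the normalized statistic, and its panel extension below, formalizes.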

The wavelet variance ratio unit root test for a system of equations
Consider, for simplicity, the system of equations without deterministic terms,

y_{it} = φ_i y_{i,t−1} + u_{it},

where y_it is the time series of interest and u_it is a zero-mean weakly stationary error term, i.e. u_{it} = ψ_i(L)ε_{it} = Σ_{j=0}^{∞} ψ_{ij} ε_{i,t−j}, with finite, non-zero long-run variance σ²_{εi} ψ_i(1)² < ∞ and ψ_i(1) ≠ 0; i indexes the individual equation, and t indexes time. Also, Cov(ε_{it}, ε_{kt}) = 0 for i ≠ k, and ε_it is an iid, zero-mean process with variance σ²_{εi}. ψ_i(L) is the lag polynomial that relates the response of u_it to ε_it. The unit root hypothesis for the system is H₀: |φ_i| = 1 for all i, and the alternative hypothesis is H_A: |φ_i| < 1. Let the matrix of time series be denoted by Y = [y_1, y_2, …, y_N], so that the Haar MODWT scaling and wavelet coefficient matrices for the first scale decomposition are given by V = [v_1, v_2, …, v_N] and W = [w_1, w_2, …, w_N], where v_i is the vector of the scaling coefficients of the series y_i (i = 1, 2, …, N), i.e. (ṽ_{i,1,1}, …, ṽ_{i,T,1}), and w_i is the vector of the wavelet coefficients of series y_i, i.e. (w̃_{i,1,1}, …, w̃_{i,T,1}). A wavelet variance ratio unit root test can be based on

VR = tr(VᵀV + WᵀW) / tr(WᵀW),

where tr(VᵀV + WᵀW) is the total variance of the system and tr(WᵀW) is the variance contributed by the first scale wavelet coefficients. Under the null hypothesis (where all the series in the equation system are I(1)), VᵀV is asymptotically a diagonal matrix with diagonal elements of order O_p(T²). The diagonal elements of VᵀV will dominate those of WᵀW, which are of order of convergence O_p(T). The test statistic will, therefore, take on larger values under the null hypothesis compared to the values under the alternative. For white noise errors, for example, VR is not bounded under the null hypothesis. A suitably normalized test statistic, VR_M, reweights the variance ratio of each equation by ν̂²_{i,1}/ω̂²_i, collected in a diagonal matrix Γ̂, where ν̂²_{i,1} and ω̂²_i are consistent estimates of the wavelet and long-run variances, respectively. The two variances, which enter the limiting distribution as nuisance parameters, are consistently estimated as is shown in the Appendix.
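As a rough illustration of the system statistic's behavior (our naming; this sketch uses the unnormalized ratio of total to wavelet energy and omits the Γ̂ reweighting by estimated wavelet and long-run variances):

```python
import numpy as np

def system_wavelet_vr(Y):
    """Unnormalized system-level wavelet variance ratio for an N x T panel:
    total unit-scale energy over the energy in the wavelet (detail)
    coefficients, tr(V'V + W'W) / tr(W'W), pooled across the N equations.
    Under the joint unit root null, tr(V'V) = O_p(T^2) dominates and the
    ratio grows without bound in T; for stationary series it stays near 2."""
    Y = np.asarray(Y, dtype=float)
    Y_lag = np.roll(Y, 1, axis=1)        # circular Haar filtering per row
    V = (Y + Y_lag) / 2.0                # scaling coefficients
    W = (Y - Y_lag) / 2.0                # wavelet coefficients
    tv = np.sum(V * V)                   # tr(V'V)
    tw = np.sum(W * W)                   # tr(W'W)
    return (tv + tw) / tw
```

The divergence of this ratio under the null is why the paper's VR_M works with a normalized version whose limit is free of the rate T.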
The limiting distribution of the test statistic under the null hypothesis is shown in the following theorem whose proof is given in the Appendix.

Theorem 1
The limiting distribution of VR_M under H₀ is given as follows, where W_i(r), i = 1, 2, …, N, are independent standard Brownian motions and N is the number of equations.
Since the test statistic is a sum of variance ratios, a central limit theorem could be invoked by normalizing the sum and letting N → ∞, but this is not pursued in this paper as we are mainly interested in the limit as T → ∞ with N fixed.
Many unit root tests suffer from a loss of power when tested against alternatives that are trend stationary. As a consequence, efficient detrending methods (see Schmidt and Phillips, 1992) are required to retain power. We use the detrending techniques suggested by Fan and Gençay (2010) and, as in their work, we restrict our scope to the cases where the models specified under the alternative hypotheses have non-zero means and linear trends only.
The model including deterministic components is given as

y_{it} = μ_i + α t + φ_i y_{i,t−1} + u_{it}.

For equation i, the null hypothesis H₀: φ_i = 1 is the unit root hypothesis, while under H_A, |φ_i| < 1 is the hypothesis of stationarity. Following Fan and Gençay (2010), when α = 0 we consider the demeaned series, i.e. each series less its sample mean. Let the test statistics be denoted by VR_M^μ and VR_M^τ for the cases where α = 0 and α ≠ 0, respectively (see Eqn. (1)). The limiting distributions of these statistics are given by Theorem 2 below. The derivations of these limiting distributions, which are also given in the Appendix, are similar to that given for the distribution of the test statistic in Theorem 1, except that demeaned and detrended Brownian motions are used.

Theorem 2
The limiting distributions of VR_M^μ and VR_M^τ (the demeaned and detrended cases, respectively) under H₀ are given as follows, where ⇒ denotes convergence in the associated probability measure.

Comparison unit root test
The small sample properties of VR_M are compared to those of CIPS. Pesaran (2007) constructs the CIPS test based on the following model:

y_{it} = (1 − φ_i)μ_i + φ_i y_{i,t−1} + u_{it},

where the initial values y_{i0} are fixed. A single common factor with individual-specific factor loadings is specified for the error term,

u_{it} = λ_i f_t + ε_{it},   i = 1, …, N,  t = 1, …, T,

where the ε_it are zero-mean errors with heterogeneous variances σ²_i. The common factor, f_t, is assumed to be stationary and serially uncorrelated and, without loss of generality, its variance is fixed at 1, i.e. σ²_f = 1. Cross-sectional correlations are introduced by the factor loadings λ_i, themselves random variables. ε_it, λ_i and f_t are assumed to be mutually independent. Pesaran (2007) proposes a test that augments the standard Dickey–Fuller test with the cross-sectional averages, resulting in the following Cross-sectionally Augmented Dickey–Fuller (CADF) estimating equation,

Δy_{it} = a_i + b_i y_{i,t−1} + c_i ȳ_{t−1} + d_i Δȳ_t + e_{it},

where ȳ_t are the cross-sectional averages, and lags of Δy_{it} and Δȳ_t may be included to whiten the residuals. The cross-sectional averages are used as proxies for the unobservable common factor. Letting CADF_i represent the CADF statistic for equation i, CIPS is the average of the CADF statistics over all the equations, CIPS = N⁻¹ Σ_{i=1}^{N} CADF_i. The small sample properties of CIPS are studied using Monte Carlo simulation in Pesaran (2007). The test is shown to be robust to size distortions even in the presence of strong cross-sectional dependence and serial correlation, and has good power properties for sample sizes of between 50 and 100 for the DGPs considered therein.
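A bare-bones sketch of the CADF regression and the CIPS average (function names ours; no augmentation lags, plain OLS standard errors):

```python
import numpy as np

def cadf_tstat(y, ybar):
    """t-statistic on y_{i,t-1} in the CADF regression (no lag augmentation):
        dy_t = a + b*y_{t-1} + c*ybar_{t-1} + d*dybar_t + e_t,
    where ybar is the cross-sectional average series used as a proxy for
    the unobserved common factor."""
    dy, dybar = np.diff(y), np.diff(ybar)
    X = np.column_stack([np.ones(dy.size), y[:-1], ybar[:-1], dybar])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = (resid @ resid) / (dy.size - X.shape[1])       # residual variance
    se_b = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])   # OLS s.e. of b
    return beta[1] / se_b

def cips(Y):
    """CIPS: average of the individual CADF t-statistics over the N rows."""
    ybar = Y.mean(axis=0)
    return np.mean([cadf_tstat(Y[i], ybar) for i in range(Y.shape[0])])
```

The statistic's null distribution is non-standard, so in practice it is compared against Pesaran's simulated critical values rather than normal ones.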
In the following section, we examine the performance of VR M , and make comparisons with that of CIPS for time series of lengths 20-100, using systems of 5 and 10 equations. The size and power of the tests are compared in cases where there is neither cross-sectional dependency nor serial correlation (hereafter called DGP 1), in the presence of weak cross-sectional correlation but no serial correlation (hereafter called DGP 2), in the case where there is strong cross-sectional correlation but no serial correlation (hereafter called DGP 3), and in the case with both strong cross-sectional correlation and strong serial correlation (hereafter called DGP 4). The choice of DGPs follows that of Pesaran (2007).

Monte Carlo simulations
Monte Carlo simulations were used to study the size and power properties of the two unit root tests in small sample sizes. The design of the Monte Carlo experiments is discussed next.

Design of the Monte Carlo experiment
Following Pesaran (2007), time series are simulated using the DGPs described below. Cross-sectional correlation is introduced using a single common factor, denoted by f_t, which represents the unobserved common factor effect. Table 1 shows the experimental factors and the ranges over which they are varied. The nominal test size is held at 5%, as per convention. The common factor loadings are sampled from the uniform distributions U[0, 0.2] and U[−1, 3] for weak and strong cross-sectional dependence, respectively. This corresponds to average cross-sectional correlations between the equations of 1% and 50%, respectively.
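A single Monte Carlo draw under this design can be sketched as follows (function name and argument conventions are ours):

```python
import numpy as np

def simulate_dgp(N, T, phi, lam_lo, lam_hi, seed=0):
    """One Monte Carlo draw: y_it = phi_i * y_{i,t-1} + lam_i * f_t + eps_it,
    with a single common factor f_t ~ iid N(0, 1), loadings
    lam_i ~ U[lam_lo, lam_hi] (U[0, 0.2] for weak, U[-1, 3] for strong
    cross-sectional dependence) and iid N(0, 1) idiosyncratic errors.
    phi may be a scalar or a length-N array; phi = 1 gives the null."""
    rng = np.random.default_rng(seed)
    lam = rng.uniform(lam_lo, lam_hi, size=N)
    f = rng.standard_normal(T)
    eps = rng.standard_normal((N, T))
    phi = np.broadcast_to(np.asarray(phi, dtype=float), (N,))
    y = np.zeros((N, T))
    for t in range(1, T):
        y[:, t] = phi * y[:, t - 1] + lam * f[t] + eps[:, t]
    return y
```

Size experiments use phi = 1 throughout; power experiments set phi below 1 for the stationary alternatives, and serial correlation can be added by replacing eps with a filtered error process.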

Results and discussion

Empirical test sizes and power
Monte Carlo simulations were conducted for two purposes: to study the small sample performance of VR_M by comparing it with CIPS, and to study the robustness of VR_M to cross-sectional and serial dependence. We examine these aspects both when testing against an alternative that has zero mean and no time trend, and when testing against an alternative that is stationary around a non-zero mean only. The same form of the test statistic is used in both cases, since both correspond to α = 0 and differ only in the specification of μ_i (see Eqn. (1)).

Case I. No deterministic terms
The 1%, 5% and 10% critical values for the VR_M and CIPS test statistics are shown in Table 2. These critical values correspond to the case where no deterministic terms are assumed. Table 3 shows the test sizes for the four DGPs given earlier. There is no evidence of size distortions for any of the DGPs for either test. Each individual series is generated using the DGP y_{it} = y_{i,t−1} + λ_i f_t + ε_{it} for i = 1, …, N; t = 1, …, T, with f_t and ε_it ∼ iid N(0, 1).
For CIPS, the critical values are calculated from the regression of Δy_{it} on y_{i,t−1}, ȳ_{t−1} and Δȳ_t. The cross-sectional mean is ȳ_t = N⁻¹ Σ_{i=1}^{N} y_{it}. For VR_M, the critical values are given by the quantiles of the empirical distribution of the test statistic, with no adjustment made for the deterministic components.
The empirical power of the two tests is also displayed in Table 3. For the case where there are no deterministic components, VR_M is clearly more powerful than the CIPS test for all the DGPs and sample sizes considered, as well as for both equation systems. The power of CIPS increases with sample size for all the DGPs. The increase in power with increasing sample size is slowest for the DGP that combines the strongest cross-sectional dependence and serial correlation (DGP 4). For the 5-equation system, the highest power achieved by the CIPS test is only 83.7%.

Case II. Non-zero mean and no time trend
The critical values for VR_M and CIPS are given in Table 4. These values correspond to the case where a non-zero mean but no time trend is assumed. For the CIPS test, the estimating equation fits an intercept but no time trend, and for VR_M the data are demeaned prior to performing the unit root test (see the discussion on tests against trend stationary alternatives on page 10). Each individual series is generated using the DGP y_{it} = y_{i,t−1} + λ_i f_t + ε_{it} for i = 1, …, N; t = 1, …, T, with f_t and ε_it ∼ iid N(0, 1). For CIPS, the critical values are calculated from the regression of Δy_{it} on a constant, y_{i,t−1}, ȳ_{t−1} and Δȳ_t. The cross-sectional mean is ȳ_t = N⁻¹ Σ_{i=1}^{N} y_{it}. For VR_M, the critical values are given by the quantiles of the empirical distribution of the test statistic for the demeaned series. Table 5 shows the test sizes for the four DGPs. Again, there is no evidence of size distortions for any of the DGPs using either test. Table 5 also displays the power of the two tests. Both tests show low power for the smallest sample sizes (T = 20, 30), but power increases with sample size. Both tests show decreasing power when strong cross-sectional and serial correlation are present. For the 5-equation system, VR_M has higher power than CIPS for sample sizes larger than 20. For the 10-equation system, VR_M is more powerful than CIPS for sample sizes of T = 50 and T = 100. For the smaller sample sizes, both tests show similar (but low) power. As expected, the 10-equation system yields more power than the 5-equation system for both tests.
VR_M has noticeably higher power than CIPS in the presence of both serial and cross-sectional correlation. The power advantage of the wavelet-based unit root test over CIPS can be explained by differences in effective sample sizes. While the wavelet-based test loses some power when the series are demeaned or detrended, CIPS requires the estimation of several parameters for each individual time series, which reduces the effective sample size (and hence power).

Empirical application
Purchasing Power Parity (PPP) has been heavily researched in international economics because of its central role in building macroeconomic models. There are two versions of PPP: absolute PPP, which refers to the situation where the nominal exchange rate between two currencies is equal to the ratio of the price levels of the two corresponding countries, and relative PPP, which takes into account factors such as trade barriers (tariff and non-tariff), transportation costs, and product differentiation across countries. The empirical literature has focused on the relative version of PPP, which is the weaker version of the macroeconomic theory. Under relative PPP, the rate of depreciation of a currency equals the difference between that country's price inflation and price inflation in the comparison country, making the real exchange rate constant.
The conventional procedure when evaluating PPP is to test the null hypothesis that the real exchange rate series has a unit root against the alternative hypothesis that it is stationary. Rejection of the null hypothesis indicates support for the PPP theory. Initial studies using the augmented Dickey-Fuller (ADF) unit root test of Dickey and Fuller (1979) showed little evidence supporting PPP in the long run. An example of such a study is Taylor (1988), where the conclusions were very unfavorable to PPP as a long-run equilibrium condition.
Other examples of such studies include Corbae and Ouliaris (1988), Layton and Stark (1990), Corbae and Ouliaris (1991), and Bahmani-Oskooee (1993). However, Frankel and Rose (1996) noted that a non-rejection of the null hypothesis may be due to the low statistical power of the unit root tests, which is mainly caused by the lack of data. Glen (1992), Lothian and Taylor (1996), and Taylor (2002), among others, suggest that longer time series could be used to provide indirect evidence to support PPP. However, these long-span studies also faced the criticism (see Hegwood and Papell, 1998, for example) that structural breaks or shifts in the equilibrium exchange rates are possibly generated during the long time span, thereby biasing the results. An alternative approach, which can be used to increase the statistical power, is to utilize the cross-sectional dimension of multiple time series. Examples of studies using panel unit root tests are Cheung, Chinn, and Fujii (2006) and Papell (2002, 2005).
As an empirical application, we use the PPP theory and compare the evidence found from using CIPS and VR_M. The data are the real exchange rates of the Group of 7 (G7) countries (source: Bruegel website: http://bruegel.org/) and cover the time span between 1960 and 2015. To avoid the potential bias from a structural break, we consider data from the post-Bretton Woods period (the period after currencies were unpegged from the dollar, spanning 1972 to 2015) in a separate analysis. The results are shown in Table 6. The data are demeaned so as to study the relative version of PPP. The results are in line with the simulation study: VR_M rejects the null hypothesis for both sample periods, whereas no support for PPP is found using CIPS for either sample period at the 5% significance level.
The CIPS test is the conventional Pesaran (2007) test. The VR_M test is based on the MODWT using the Haar filter. This transform, as noted in Section 2 when describing the wavelet filtering, is chosen since it extracts more information than the DWT and can handle samples of any size. By using the Haar filter we also avoid the loss of information due to boundary coefficients. The test, as discussed in Fan and Gençay (2010), builds on the observation by Granger (1966) that the spectral density of trending time series, such as real exchange rates, is characterized by significant power at low frequencies followed by an exponential decline at higher frequencies. The wavelet-based test for unit roots suggested by Fan and Gençay (2010) used this notion together with the ability of wavelets to decompose the variance of a time series at different frequencies.
Capitalizing on the idea of Granger (1966) and the decomposing ability of wavelets, Fan and Gençay (2010) constructed the variance ratio test that we generalize to systems of equations. The results from our simulation study and from the Fan and Gençay (2010) study indicate that this type of wavelet variance ratio test is more powerful than traditional parametric options such as the ADF test for univariate time series and the CIPS test for systems of equations. This is the main reason why the test is able to reject the null hypothesis of a non-stationary system of equations.

Summary and conclusions
A unit root test for a system of equations is introduced in this paper. The proposed test extends the wavelet variance ratio unit root test of Fan and Gençay (2010) to multiple equation time series.
Monte Carlo simulations show that the proposed test has higher power compared to CIPS (Pesaran 2007) for time series of short length (between 20 and 100 observations), and systems of 5 and 10 equations. The test is also shown to be robust to cross-sectional dependency and serial correlation for the DGPs considered in this paper.
We demonstrate its usefulness through an empirical application on evaluating the PPP theory for the G7 countries. Evidence from this evaluation points to different countries following different specifications, with some having stationary exchange rate series.
The proposed unit root test is simple to apply and interpret, and could prove to be useful to the practitioner who is faced with a system of 10 or fewer equations and time series of lengths up to 100. For larger systems of equations or systems with longer time series, any of the existing unit root tests should provide adequate power.

Acknowledgement
The first two authors would like to acknowledge the contribution of the third author, Ghazi Shukur, who passed away during the review period of the manuscript. We would also like to thank the anonymous reviewers for their suggestions.

A Appendix
Consider the first-order autoregressive model

y_{it} = z_{it}′β_i + φ_i y_{i,t−1} + u_{it},

where y_it is the time series of interest, i = 1, …, N indexes the individual time series, t = 1, …, T indexes time, z_it collects the deterministic components, and u_it is a weakly stationary zero-mean process with finite long-run variance.
Here we consider the cases with no deterministic terms as well as where the alternative is trend stationary around a non-zero mean and time trend.

Proof of Theorem 1
Consider N time series that have no cross-sectional correlation but are possibly autocorrelated. The Haar MODWT scaling and wavelet coefficients for the first scale decomposition are given by V = [v_1, …, v_N] and W = [w_1, …, w_N], where v_i is the vector of the scaling coefficients of the series y_i (i = 1, 2, …, N), i.e. (ṽ_{i,1,1}, …, ṽ_{i,T,1}), and w_i is the vector of the wavelet coefficients of series y_i, i.e. (w̃_{i,1,1}, …, w̃_{i,T,1}). The total variance in the system of equations can be expressed in terms of the Haar MODWT coefficients as

tr(T⁻¹ YᵀY) = tr(T⁻¹ VᵀV) + tr(T⁻¹ WᵀW).

The contributions to the total variance due to the scaling and wavelet coefficients are given by tr(T⁻¹(VᵀV)) and tr(T⁻¹(WᵀW)), respectively.
A unit root test statistic can, therefore, be based on the ratio

VR = tr(VᵀV + WᵀW) / tr(WᵀW).

Under the unit root null hypothesis, the diagonal elements of VᵀV are O_p(T²) while those of WᵀW are O_p(T), so the variance ratio takes on large values. For stationary processes, both terms are O_p(T) and the ratio remains small.
Under the null hypothesis, as T → ∞, the cross-product terms v_iᵀv_k and w_iᵀw_k (i ≠ k) are asymptotically negligible since there is no cross-correlation, so tr(VᵀV) and tr(WᵀW) are dominated by the own terms Σ_i v_iᵀv_i and Σ_i w_iᵀw_i. For the Haar MODWT, ṽ_{i,t,1} = (y_{it} + y_{i,t−1})/2 and w̃_{i,t,1} = (y_{it} − y_{i,t−1})/2, so that ṽ_{i,t,1} + w̃_{i,t,1} = y_{it}. From the asymptotic theory for unit root processes (see Hamilton (1994), p. 486) and the Continuous Mapping Theorem (CMT) (see Billingsley 1968),

T⁻² Σ_t ṽ²_{i,t,1} ⇒ ω²_i ∫₀¹ W_i(r)² dr,

where ω²_i is the long-run variance of u_it. Also, for the Haar MODWT wavelet filter,

T⁻¹ Σ_t w̃²_{i,t,1} →_p E(w̃²_{i,1}),

where E(w̃²_{i,1}) is the first scale wavelet variance for series i (see Percival 1995). Using the CMT, the limiting distribution of the variance ratio for each individual series is

T⁻¹ (v_iᵀv_i)/(w_iᵀw_i) ⇒ (ω²_i / E(w̃²_{i,1})) ∫₀¹ W_i(r)² dr,

where the long-run variance of u_it for time series i is given by ω²_i = Σ_{j=−∞}^{∞} γ_{i,j}, γ_{i,j} is the lag-j autocovariance for time series i, and ⇒ is used to denote convergence of the associated probability measure.
The two nuisance parameters in the limiting distribution, ω²_i and E(w̃²_{i,1}), can be consistently estimated as follows: 1. E(w̃²_{i,1}) is the wavelet variance at unit scale of the Haar MODWT. Its consistent estimator (see Percival (1995)) is given by ν̂²_{i,1} = T⁻¹ Σ_{t=1}^{T} w̃²_{i,t,1}. The wavelet variance estimator for the Haar MODWT avoids boundary effects, which is the loss of coefficients at the ends of time series as a result of circular filtering operations.

2.
For the long-run variance, ω²_i, estimation can be made in one of two ways (see Zivot and Wang, 2006, for details on long-run variance estimation): a. Parametric approach. For time series i, since u_it is a linear process, it follows that ω²_i = σ²_{εi} ψ_i(1)². When u_it is ARMA(p, q), then

ψ_i(1) = (1 + θ_{i,1} + … + θ_{i,q}) / (1 − φ_{i,1} − … − φ_{i,p}) = θ_i(1)/φ_i(1),

which gives ω²_i = σ²_{εi} θ_i(1)²/φ_i(1)², where σ²_{εi} is the variance of the error of the ARMA model for time series i. Making substitutions using the estimates of the parameters of the ARMA(p, q) process gives a consistent estimate of ω²_i. A second parametric approach is to approximate the ARMA(p, q) process with a higher order AR(p*) process, u_{it} = φ_{i,1} u_{i,t−1} + … + φ_{i,p*} u_{i,t−p*} + ε_{it}, and then estimate the long-run variance as ω̂²_i = σ̂²_{εi}/φ̂*_i(1)². b. Semi-parametric method using a kernel function: one possible semi-parametric estimator of the long-run variance is the Newey and West (1987) estimator, which is the weighted covariance function

ω̂²_i = γ̂_{i,0} + 2 Σ_{ℓ=1}^{L} w_{i,ℓ} γ̂_{i,ℓ},

where w_{i,ℓ} are the weights for time series i, γ̂_{i,ℓ} are the autocovariances for time series i, and L is the truncation lag or bandwidth parameter, such that L = O(T^{1/3}) (see Andrews 1991). Newey and West use the Bartlett weights w_ℓ = 1 − ℓ/(L + 1), with L = ⌊4(T/100)^{2/9}⌋.
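The semi-parametric estimator in (b) is straightforward to implement. A sketch with the Bartlett kernel and the truncation-lag rule quoted above (function name ours):

```python
import numpy as np

def newey_west_lrv(u):
    """Newey-West (1987) long-run variance with Bartlett weights
    w_l = 1 - l/(L + 1) and the truncation-lag rule
    L = floor(4 * (T/100)**(2/9))."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    T = u.size
    L = int(np.floor(4.0 * (T / 100.0) ** (2.0 / 9.0)))
    lrv = (u @ u) / T                      # gamma_0
    for l in range(1, L + 1):
        gamma_l = (u[l:] @ u[:-l]) / T     # lag-l autocovariance
        lrv += 2.0 * (1.0 - l / (L + 1.0)) * gamma_l
    return lrv
```

For iid errors the estimate is close to the ordinary variance, while positive serial correlation inflates the long-run variance, exactly the nuisance effect the weights ν̂²_{i,1}/ω̂²_i are designed to remove.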
The nuisance parameters are eliminated from the limiting distribution by normalizing each variance ratio with the ratio of the consistent estimates of the nuisance parameters, ν̂²_{i,1}/ω̂²_i. The test statistic, VR_M, is therefore the variance ratio weighted by Γ̂, where Γ̂ is a diagonal matrix with the main diagonal consisting of the weights ν̂²_{i,1}/ω̂²_i.

Proof of Theorem 2
Let ỹ_{it} represent the time series adjusted for the deterministic components, i.e.

ỹ_{it} = (y_{it} − d̂_{it}),

where d̂_{it} is the estimate of the deterministic component. Then, from the asymptotic theory for demeaned unit root processes (see Stock (1994), for example),

T⁻² Σ_t ṽ²_{i,t,1} ⇒ ω²_i ∫₀¹ W^μ_i(r)² dr,

where ω²_i is the long-run variance for equation i and W^μ(r) denotes demeaned Brownian motion. For the case where the time series are efficiently detrended, the corresponding result holds with detrended Brownian motion W^τ(r) in place of W^μ(r) (see Kwiatkowski et al. (1992), for example), where W(r) is standard Brownian motion.
The rest of the proof follows that of Theorem 1: starting from Eqn. (2) and replacing W_i(r) with W^μ_i(r) or W^τ_i(r) in the cases where the series have been demeaned or detrended, respectively, leads to the limiting distributions of the two test statistics, VR_M^μ and VR_M^τ, as given in Theorem 2.