# A simple solution of the spurious regression problem

Cindy Shin-Huei Wang and Christian M. Hafner

# Abstract

This paper develops a new estimator for cointegrating and spurious regressions by applying a two-stage generalized Cochrane-Orcutt transformation based on an autoregressive approximation framework, even though the exact form of the error term is unknown in practice. We prove that our estimator is consistent for a wide class of regressions. We further show that a convergent usual t-statistic based on our new estimator can be constructed for the spurious regression cases analyzed by Granger and Newbold (1974, "Spurious Regressions in Econometrics," Journal of Econometrics 2: 111–120) and Granger, Hyung, and Jeon (2001, "Spurious Regressions with Stationary Series," Applied Economics 33: 899–904). The implementation of our estimator is easy since it does not necessitate estimation of the long-run variance. Simulation results indicate the good statistical properties of the new estimator in small and medium samples; the simulations also cover a more general framework including multiple regressors and endogeneity.

JEL Classification: C22; C53

## 1 Introduction

The long-run relation between economic time series often plays an important role in macroeconomics and finance. Additionally, many macroeconomic and financial models imply that certain variables are cointegrated as defined by Engle and Granger (1987). Empirical tests, however, often fail to reject the null hypothesis of no cointegration, even for moderate sample sizes. One possible explanation of these test results is that the error term has a unit root. For instance, the error term may contain a unit root because of a nonstationary measurement error in one variable or nonstationary omitted variables (see Choi, Hu & Ogaki, 2008). In these cases, when the error term is a non-stationary I(1) process but structural parameters can be recovered, the regression is called a structural spurious regression. Another related issue that is pervasive in the time series literature is the danger of obtaining spurious correlation findings. A spurious correlation occurs when a pair of independent series, each of them nonstationary or strongly autoregressive, are found apparently to be related according to standard inference in an OLS regression.

Since the seminal contribution by Yule (1926) on nonsense correlations between time series, it has been shown that spurious regression results may occur not only for pairs of independent unit root processes (see Granger and Newbold 1974, for a simulation study and Phillips 1986, for a theoretical explanation) but also for other persistent processes, such as I(2) (Haldrup 1994), or even positively autocorrelated (stationary) autoregressive series (Granger, Hyung & Jeon, 2001). Some important applications of spurious regressions in economics and finance, although this list is by no means exhaustive, include Plosser, Schwert, and White (1982), Plosser and Schwert (1978), Ferson, Sarkissian, and Simin (2003), Hendry (1980), and Valkanov (2003).

The purpose of this paper is to propose a new estimator for cointegrating and spurious regressions that can successfully address both problems described above. More specifically, we propose a two-stage generalized Cochrane-Orcutt transformation estimator based on an autoregressive AR(k) approximation framework (henceforth CO-AR estimator) that can cover a wide class of regression models. The order k of the AR approximation is assumed to grow with the sample size but at a slower rate. In practice, it can be chosen according to classical selection criteria such as AIC or BIC. This generalization of existing Cochrane-Orcutt transformations in this framework, in particular that of Choi, Hu, and Ogaki (2008), is one of the main contributions of our paper. The CO-AR estimator turns out to be a robust procedure with respect to error specifications of unknown form even when the regressors and regressand are highly persistent (or possibly unit-root) processes.

Several papers have studied the problem of spurious regressions. Two recent contributions analyze an approach to correcting spurious regressions involving nonstationary regressors and error terms. Choi, Hu, and Ogaki (2008) propose the CHO-FGLS estimator that is consistent not just when the regression error is stationary but also when it is unit-root nonstationary.

Our estimator generalizes CHO-FGLS by approximating the unknown error term using an AR(k) model rather than AR(1). This implies that the CO-AR estimator can be successfully applied to a much wider class of regression models. In particular, unlike CHO-FGLS, the CO-AR estimator provides consistent and accurate estimation results when the error term follows a stationary (but potentially highly persistent) ARMA(p,q) process as well as a nonstationary ARIMA(p,1,q) process. The fact that the CO-AR estimator can cover general regression models is especially appealing in empirical work where the determination of the exact form of the error terms is often unknown, and the errors rarely follow a simple AR(1) process.

Our estimator differs from the CHO-FGLS estimator in that we use first differencing in the first step. This implies that the slope coefficients of the preliminary step are recovered using the GLS-corrected estimator rather than a standard OLS regression in levels. By doing so, the CO-AR estimator remains consistent, and is more robust than CHO-FGLS when the innovations are highly persistent.

Deng (2014) considers the spurious regression issue when the regressors follow an ARMA(1,1) process where both the AR root and the MA root are close to unity, and derives a t-test with a well-defined null distribution by employing the fixed-bandwidth long-run variance framework. Since the limiting distribution of Deng's t-statistic critically depends on the long-run variance estimator, its implementation requires the selection of a suitable bandwidth parameter and kernel function. In contrast, the CO-AR version of the t-statistic asymptotically follows the standard normal distribution and does not rely on a long-run variance estimator. The CO-AR t-test can therefore be easily implemented.

We now briefly describe the implementation of the two-stage CO-AR estimator. First, we start by taking the first difference of both the dependent and explanatory variables of our regression model, and use standard OLS to estimate the slope coefficients. Then, we compute residuals as the difference between the level of the dependent variable and the level of the explanatory variables multiplied by the estimated slope coefficients. Second, we fit an AR(k) model to these residuals, where the order k of the approximation is determined by minimizing the value of an information criterion. Berk (1974), Shibata (1980), and Bauer and Wagner (2008) show that an AR model can well approximate an I(0) and an I(1) process with increasing k. By doing so, we can filter the error term by an AR(k) model without the prior knowledge of the forms of the error term, and even if some I(0) or I(1) omitted variables are included in the error term. Third, we conduct a generalized Cochrane-Orcutt transformation (of order k) of both the dependent and independent variables. Finally, we run a standard OLS regression on the transformed variables.
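The steps above can be sketched in a few lines of code. The following is a minimal single-regressor illustration of the two-stage procedure under our own simplifying assumptions (no constant term, and a lag order k fixed in advance rather than selected by AIC or BIC); it is a sketch, not the authors' implementation.

```python
import numpy as np

def co_ar(y, x, k):
    """Two-stage CO-AR estimate of the slope in y_t = beta * x_t + u_t.

    Minimal single-regressor sketch: the lag order k is taken as given
    here (in practice it would be chosen by an information criterion).
    """
    # Stage 1: OLS on first differences gives a preliminary (GLS-corrected) slope.
    dy, dx = np.diff(y), np.diff(x)
    b0 = dx @ dy / (dx @ dx)
    # Residuals in levels based on the preliminary slope.
    u = y - b0 * x
    # Stage 2: fit an AR(k) model to the residuals by OLS.
    Z = np.column_stack([u[k - j:len(u) - j] for j in range(1, k + 1)])
    b_ar, *_ = np.linalg.lstsq(Z, u[k:], rcond=None)

    # Generalized Cochrane-Orcutt transform of y and x with the AR(k) weights.
    def transform(z):
        zt = z[k:].astype(float).copy()
        for j in range(1, k + 1):
            zt -= b_ar[j - 1] * z[k - j:len(z) - j]
        return zt

    yt, xt = transform(y), transform(x)
    # Final OLS on the transformed variables, with the usual t-statistic.
    beta = xt @ yt / (xt @ xt)
    resid = yt - beta * xt
    s2 = resid @ resid / len(resid)
    t_stat = beta / np.sqrt(s2 / (xt @ xt))
    return beta, t_stat

# Cointegration example: x is a random walk, u a stationary AR(1) error.
rng = np.random.default_rng(0)
T = 1000
x = np.cumsum(rng.standard_normal(T))
eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + eps[t]
y = 2.0 * x + u
beta, t_stat = co_ar(y, x, k=2)
```

In this cointegration example the final estimate should lie close to the true slope of 2; the reported t-statistic is the usual OLS t-statistic from the transformed regression.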

The main findings of the paper can be summarized as follows. First, we derive the consistency and the limit distribution of the CO-AR estimator when a) the dependent, the explanatory variables and the innovations are stationary, but potentially highly persistent, b) the dependent, the explanatory variables and the innovations are non-stationary I(1) series and c) the dependent and the explanatory variables are unit-root non-stationary series and the innovations are stationary. As a corollary, we show that the usual t-statistic of the slope coefficients constructed by our CO-AR estimation (CO-AR t-test) is convergent and follows the standard normal N(0, 1) distribution, even when the regressors and regressand are highly persistent or possibly non-stationary processes. In sum, our proposed CO-AR estimator represents a new solution to the spurious regression problem, as examined in the influential articles of Granger and Newbold (1974) and Granger, Hyung, and Jeon (2001).

Second, we investigate the finite sample performance of our CO-AR estimator in a simulation exercise. The Monte Carlo experiments confirm our theoretical findings. We find that in spurious regressions the conventional significance tests based on standard least squares estimation and inference are seriously biased towards rejection of the null hypothesis of no relationship (and hence acceptance of a spurious relationship), even when the series are generated as statistically independent. The t-test constructed by our CO-AR estimator, however, remains approximately distributed as a standard Normal distribution without using the long-run variance framework, even in samples as small as 50 observations. Hence, the simulation results indicate that the size control and the power pattern of our methodology are excellent even in small samples. In this sense, our CO-AR t-test can be viewed as a new test that avoids over-rejection of the null hypothesis and solves the spurious regression problem. We also find that in cointegrating relationships the CO-AR is as good as the other three conventional estimators, dynamic OLS (Saikkonen, 1991; Stock & Watson, 1993), GLS Corrected, and CHO-FGLS (Choi, Hu & Ogaki, 2008), when the innovation follows either a white noise or an AR(1) process, and significantly outperforms the CHO-FGLS when the innovation follows a stationary ARMA process or nonstationary ARIMA dynamics.

The rest of the paper is organized as follows. Section 2 starts by establishing the asymptotic properties of the new CO-AR estimator for cointegrating and spurious regressions, and constructing a convergent usual t-statistic in spurious regressions. Section 3 investigates the finite sample performance of the CO-AR estimator through Monte Carlo experiments. Section 4 contains some concluding comments. An Appendix provides proofs of the results given in the paper.

## 2 The statistics and main results

The objective of this section is to establish the asymptotic properties of the CO-AR estimator in both spurious and cointegrating regressions, and to construct a convergent usual t-statistic in spurious regressions as analyzed by Granger and Newbold (1974) and Granger, Hyung, and Jeon (2001).

### 2.1 Stationary regressions

Consider the regression Model I,

$$y_t = \beta x_t + u_t$$

where the regressor xt and the error term ut are independent and follow ARMA processes as defined by the following Assumption 1. For the sake of notational simplicity we do not include a constant in the regression and assume that xt is univariate. The independence assumption is strong as it implies, among other things, absence of endogeneity. We will discuss ways to generalize our framework to multiple regressors and endogeneity in Section 2.4.

### Assumption 1

The processes $x_t$ and $u_t$ follow independent ARMA($p$, $q$) processes of the form ($z_t$ is either $x_t$ or $u_t$)

$$\phi(L) z_t = \theta(L) e_t$$

where (i) the autoregressive (AR) and moving average (MA) polynomials $\phi(\cdot)$ and $\theta(\cdot)$ in the lag operator $L$ are assumed to have all roots outside the unit circle; (ii) $\phi(\cdot)$ and $\theta(\cdot)$ have no common roots; (iii) $e_t$ is an i.i.d. process with $E[e_t] = 0$, $E[e_t^2] = \sigma_e^2$, and $E[e_t^4] < \infty$.

Note that the lag polynomials of the models for $x_t$ and $u_t$ can be different and, in particular, can be of different order. Assumption 1 guarantees that the conditions of Theorem 2 of Berk (1974) hold and allows us to represent the ARMA process $u_t$ as

$$u_t = \sum_{j=1}^{\infty} b_j u_{t-j} + e_t$$

where $\sum_{j=1}^{\infty} j^s |b_j| < \infty$ for some $s > 1$, and to approximate $u_t$ by a fitted AR($k$) process as

$$u_t = \sum_{j=1}^{k} \hat{b}_j u_{t-j} + \hat{e}_{t,k} + o_p(1)$$

with k → ∞. The fact that we use an AR(k) approximation of increasing order is the key feature of our procedure and allows us to approximate a much wider class of serially correlated error processes than for example the AR(1) approximation used by Choi, Hu, and Ogaki (2008). As a consequence, we expect the new procedure to provide more efficient parameter estimates as well as parameter tests that are more robust with respect to wider patterns of error autocorrelation. Section 3 will investigate this idea via a detailed simulation study.

The following assumption gives conditions on the choice of k as the sample size increases.

### Assumption 2

The order $k = k_T$ is chosen such that

(a) $k \to \infty$ as $T \to \infty$;

(b) $k = o(T^{1/3})$; and

(c) $\sqrt{T} \sum_{j=k+1}^{\infty} |b_j| \to 0$ as $T \to \infty$.

Assumption 2(a) requires that the order of the fitted AR($k$) model increases as the sample size increases. Assumptions 2(b) and 2(c) give upper and lower bounds on the rate at which $k$ increases, based on the conditions of Berk (1974). Condition 2(c) acts as a lower bound since $k$ has to increase sufficiently fast that $\sum_{j=k+1}^{\infty} |b_j| = o(T^{-1/2})$. Shibata (1980) and Ing and Wei (2003) show that a model which minimizes the mean squared prediction error is of order $k^* = O(\log T)$, $1 \le k^* \le k_T$.
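In practice, Assumption 2 is operationalized by letting an information criterion pick k. The sketch below (our own minimal implementation, not the paper's code) selects the AR order of a residual series by BIC, fitting each candidate model by OLS on a common effective sample so that the criteria are comparable across orders:

```python
import numpy as np

def select_ar_order(u, k_max):
    """Pick the AR order of the series u by minimizing BIC.

    Each AR(k) model, k = 0..k_max, is fitted by OLS on the same
    effective sample, and scored with BIC = n*log(sigma2) + k*log(n).
    """
    n = len(u) - k_max  # common effective sample size
    best_k, best_bic = 0, np.inf
    for k in range(k_max + 1):
        if k == 0:
            resid = u[k_max:]
        else:
            Z = np.column_stack([u[k_max - j:len(u) - j] for j in range(1, k + 1)])
            b, *_ = np.linalg.lstsq(Z, u[k_max:], rcond=None)
            resid = u[k_max:] - Z @ b
        sigma2 = resid @ resid / n
        bic = n * np.log(sigma2) + k * np.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k

# Simulated AR(1) residual series with coefficient 0.8.
rng = np.random.default_rng(1)
T = 500
e = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + e[t]
k_hat = select_ar_order(u, k_max=8)
```

For this strongly autocorrelated AR(1) series, BIC should typically select a small order close to the true value of one.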

We are now in the position to introduce our two-stage Cochrane-Orcutt GLS estimator as follows:

Stage 1. We start by taking the first difference of the dependent and explanatory variables, and compute the standard OLS estimate of the coefficient $\beta$, denoted $\hat{\beta}_{GLSC}$, in the model

$$\Delta y_t = \beta \Delta x_t + \Delta u_t$$

where $\Delta = 1 - L$ denotes the difference operator. In the case of I(1) processes $y_t$ and $x_t$, to be discussed in the following sections, this procedure can be viewed as a GLS-corrected estimation as described by Choi, Hu, and Ogaki (2008). Note that when $x_t$ and $u_t$ are uncorrelated, the estimator $\hat{\beta}_{GLSC}$ is consistent.

Stage 2. Construct the series of fitted residuals $\hat{u}_t = y_t - \hat{\beta}_{GLSC} x_t$. Then, approximate the errors $\hat{u}_t$ by a finite-order AR($k$) model, i.e. $\hat{u}_t = \sum_{j=1}^{k} \hat{b}_j \hat{u}_{t-j} + \hat{e}_{t,k}$. Subsequently, conduct the following Cochrane-Orcutt transformation of the variables $x_t$ and $y_t$:

$$\tilde{y}_t = y_t - \sum_{j=1}^{k} \hat{b}_j y_{t-j}, \qquad \tilde{x}_t = x_t - \sum_{j=1}^{k} \hat{b}_j x_{t-j}.$$

Consider OLS estimation of the regression

$$\tilde{y}_t = \beta \tilde{x}_t + \tilde{u}_t$$

where $\tilde{u}_t = \hat{e}_{t,k} + o_p(1)$ is an asymptotically uncorrelated error term. The OLS estimator is computed as

$$\hat{\beta}_{CO\text{-}AR} = \left( \sum_{t=k+1}^{T} \tilde{x}_t^2 \right)^{-1} \sum_{t=k+1}^{T} \tilde{x}_t \tilde{y}_t.$$

Under our condition that xt and ut are independent ARMA processes, the following theorem gives the asymptotic properties of the CO-AR estimator of β.

### Theorem 1

If the data generating process satisfies Model I, and $x_t$ and $u_t$ are independent ARMA processes satisfying Assumption 1, then, under Assumption 2,

$$\hat{\beta}_{CO\text{-}AR} = \beta + O_p(T^{-1/2}).$$

Theorem 1 includes the particular case β = 0, in which case xt and yt are independent. We obtain the following corollary.

### Corollary 1

If $x_t$ and $y_t$ are independent ARMA processes satisfying Assumption 1, then, under Assumption 2, $\hat{\beta}_{CO\text{-}AR} = O_p(T^{-1/2})$.

Theorem 1 shows that the CO-AR estimator of $\beta$ is consistent and converges at rate $T^{1/2}$ when the data generating process satisfies Model I. The corollary indicates that when an ARMA process is regressed on another, independent ARMA process, the CO-AR estimator remains consistent at rate $T^{1/2}$. This asymptotic result improves on Deng (2014), where $\hat{\beta} = O_p(1)$. Simply speaking, the spirit of our methodology follows the suggestion of Granger, Hyung, and Jeon (2001) that the proper reaction to having a possible spurious relationship is to add lagged dependent and independent variables until the errors appear to be white noise.

We now construct the usual t-statistic to test the hypothesis $H_0: \beta = 0$ by using the CO-AR estimate of $\beta$ and the corresponding standard error $S_{\hat{\beta}} = \left( \hat{\sigma}_{\tilde{u}}^2 / \sum_{t=k+1}^{T} \tilde{x}_t^2 \right)^{1/2}$, where $\hat{\sigma}_{\tilde{u}}^2 = (T-k)^{-1} \sum_{t=k+1}^{T} \tilde{u}_t^2$ and $\tilde{u}_t = \tilde{y}_t - \hat{\beta}_{CO\text{-}AR} \tilde{x}_t$. The asymptotic properties of the t-statistic are given in the following theorem.

### Theorem 2

If xt and yt are independent ARMA processes satisfying Assumption 1, then, under Assumption 2,

$$t_\beta = \frac{\hat{\beta}_{CO\text{-}AR}}{S_{\hat{\beta}}} \xrightarrow{L} N(0,1).$$

The important implication of this theorem is that we can build a convergent t-statistic $t_\beta$ that, under the null hypothesis, is asymptotically normally distributed without having to estimate the long-run variance. The associated finite sample performance of the statistic will be analyzed in Section 3.

### 2.2 Spurious regressions with I(1) processes

Many macroeconomic models imply that certain variables are cointegrated as defined by Engle and Granger (1987). However, cointegration tests often fail to reject the null hypothesis of no cointegration for these variables. Choi, Hu, and Ogaki (2008) point out that one possible explanation for these empirical results is that the error is unit-root non-stationary due to a non-stationary measurement error in one variable or non-stationary omitted variables. Recall that a regression with nonstationary stochastic errors is termed “spurious” in the time series literature. In this section, we therefore consider the general case where both $x_t$ and $u_t$ are integrated of order one.

Consider the following Model (II):

$$y_t = \beta x_t + u_t$$

where xt and ut are independent I(1) processes. Again, we note that the independence assumption is strong, but refer to Section 2.4 for possible extensions. We make the following assumption about the dynamics of xt and ut.

### Assumption 3

The processes $x_t$ and $u_t$ follow independent ARIMA($p$, 1, $q$) processes of the form ($z_t$ is either $x_t$ or $u_t$)

$$\phi(L)(1 - L) z_t = \theta(L) e_t,$$

where (i) the autoregressive (AR) and moving average (MA) polynomials $\phi(\cdot)$ and $\theta(\cdot)$ are assumed to have all roots outside the unit circle; (ii) $\phi(\cdot)$ and $\theta(\cdot)$ have no common roots; (iii) $e_t$ is an i.i.d. process with $E[e_t] = 0$, $E[e_t^2] = \sigma_e^2$, and $E[e_t^4] < \infty$.

Again, the lag polynomials can be different for $x_t$ and $u_t$. By applying the CO-AR estimating procedure, one obtains a $\sqrt{T}$-consistent first stage estimator $\hat{\beta}_{GLSC}$ and residuals $\hat{u}_t = y_t - \hat{\beta}_{GLSC} x_t$. Then, the CO-AR estimator of $\beta$ is the OLS estimator of the regression $\tilde{y}_t = \beta \tilde{x}_t + \tilde{u}_t$. We summarize the asymptotic properties of the CO-AR estimate of $\beta$ in Model (II) in the next theorem.

### Theorem 3

If the data generating processes satisfy Model (II) and Assumption 3, then, under Assumption 2, $\hat{\beta}_{CO\text{-}AR} - \beta = O_p(T^{-1/2})$.

Theorem 3 not only implies that the CO-AR estimator is consistent (while the OLS estimator is inconsistent), but also shows that it is more general than the corrected GLS and CHO-FGLS estimators (see Choi, Hu & Ogaki, 2008). On the one hand, the corrected GLS is the ideal solution to the spurious regression problem when the error term follows a random walk. However, if the error term does not follow a pure random walk, differencing the data can result in a misspecified model. On the other hand, the CHO-FGLS estimator performs well for regressions with a stationary AR(1) error term. However, in empirical work the exact autocorrelation structure of the error terms is often unknown, and the errors rarely follow a simple AR(1) process. Therefore, in practice both the GLS and the CHO-FGLS estimators should be used with care. In contrast, the CO-AR estimator performs very well not only for stationary AR(1) or random walk error terms but also for more general innovations, such as stationary ARMA($p$, $q$) processes as well as nonstationary ARIMA($p$, 1, $q$) processes.

Similar to the stationary case, Theorem 3 also includes the particular case β = 0, in which case xt and yt are independent. We obtain the following Corollary.

### Corollary 2

If $x_t$ and $y_t$ are independent I(1) processes satisfying Assumption 3, then, under Assumption 2, $\hat{\beta}_{CO\text{-}AR} = O_p(T^{-1/2})$.

Thus, $\hat{\beta}_{CO\text{-}AR}$ is consistent in the important case of spurious regressions with integrated processes. Based on this result, we next show that a convergent t-statistic for $\beta$ can be constructed for the case of spurious regressions.

Since the Monte Carlo study by Granger and Newbold (1974) and the asymptotic theory developed by Phillips (1986), it is well known that the usual t-statistic for a regression between independent I(1) processes does not have a limiting distribution but diverges at the rate $T^{1/2}$ as the sample size $T$ increases. In this section, in a more general framework, we establish a convergent standard t-statistic by using the CO-AR estimation for a regression between two independent I(1) processes. In effect, we allow $y_t$ and $x_t$ to be rather general integrated processes whose first differences are weakly dependent and possibly heterogeneously distributed innovations. This includes a wide variety of data-generating mechanisms, such as the ARIMA($p$, 1, $q$) model. The following theorem shows that by using the CO-AR estimator, a convergent t-statistic can be constructed and is asymptotically normally distributed for a regression between two independent I(1) processes.
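This divergence is easy to reproduce numerically. The snippet below (an illustrative sketch of ours, not taken from the paper) regresses one simulated random walk on another, independent one, and records the average absolute OLS t-statistic, which grows roughly like the square root of the sample size:

```python
import numpy as np

rng = np.random.default_rng(2)

def abs_ols_t(T):
    """Absolute OLS t-statistic from regressing one independent
    random walk on another (no constant)."""
    x = np.cumsum(rng.standard_normal(T))
    y = np.cumsum(rng.standard_normal(T))
    b = x @ y / (x @ x)
    resid = y - b * x
    s2 = resid @ resid / (T - 1)
    return abs(b) / np.sqrt(s2 / (x @ x))

# Average |t| over 200 replications for two sample sizes.
avg_t = {T: np.mean([abs_ols_t(T) for _ in range(200)]) for T in (100, 1600)}
```

Both averages lie far above the conventional 1.96 threshold, and the average at T = 1600 is several times larger than at T = 100, illustrating the Granger-Newbold/Phillips divergence result.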

### Theorem 4

Assume that $x_t$ and $y_t$ are independent I(1) processes satisfying Assumption 3. Then, under Assumption 2, $t_\beta = \hat{\beta}_{CO\text{-}AR} / S_{\hat{\beta}} \xrightarrow{L} N(0,1)$.

Note also that using the CO-AR methodology to construct a convergent standard t-statistic avoids the difficult issues of choosing a suitable bandwidth parameter and a kernel function for long run variance estimation.

### 2.3 Cointegrating regressions

In this section we consider the asymptotic distribution of the CO-AR estimator under the assumption of cointegration, i.e. the error term ut in Model (II) is an I(0) process.

### Theorem 5

If the data generating processes $x_t$ and $y_t$ follow Model (II) but $u_t$ is an I(0) process satisfying Assumption 1, and $x_t$ and $u_t$ are independent processes, then, under Assumption 2, $\hat{\beta}_{CO\text{-}AR} - \beta = O_p(T^{-1})$ and

$$T(\hat{\beta} - \beta) \xrightarrow{d} \left( \int_0^1 V(r)^2 \, dr \right)^{-1} \int_0^1 V(r) \, dW(r)$$

where $V(r)$ is a Brownian motion with variance $\sum_{j=-\infty}^{\infty} E[v_t v_{t-j}]$, $v_t = \Delta x_t$, and $W(r)$ is a Brownian motion with variance $\sum_{j=-\infty}^{\infty} E[u_t u_{t-j}]$.

Theorem 5 shows that the CO-AR estimation method is also useful for cointegration analysis. For instance, in a cointegration model, the OLS estimator is consistent at the convergence rate $T$, but if the error term follows a stationary ARMA($p$, $q$) process, the OLS estimator may yield inaccurate estimates in finite samples, similar to the stationary case investigated by Granger, Hyung, and Jeon (2001). In large samples, CO-AR and OLS behave similarly under cointegration, and the corresponding asymptotic distributions are identical and given by Theorem 5. However, the lag selection procedure of the CO-AR estimator offers additional flexibility in small samples.

Again, our results use the strong assumption of independence between ut and xt. We now turn to possible generalizations in the next section.

### 2.4 Multiple regressors and endogeneity

We consider two extensions of the proposed procedure, the case of multiple explanatory variables, and that of possible endogeneity.

Consider the following regression

$$y_t = \beta' x_t + u_t$$

where $x_t$ is now a ($p \times 1$) vector of explanatory variables, and $\beta = (\beta_1, \ldots, \beta_p)'$ a vector of coefficients.

Suppose now that the error term $u_t$ is not necessarily uncorrelated with each of the explanatory variables, and that we have a set of instruments $w_t = (w_{1t}, \ldots, w_{pt})'$ which are uncorrelated with $u_t$ but correlated with the respective explanatory variable. If a particular $x_{it}$ is uncorrelated with $u_t$ we simply set $w_{it} = x_{it}$. Define the ($T \times p$) matrices $X = (x_1, \ldots, x_T)'$ and $W = (w_1, \ldots, w_T)'$, the ($(T-1) \times p$) matrices $\Delta X = (\Delta x_2, \ldots, \Delta x_T)'$ and $\Delta W = (\Delta w_2, \ldots, \Delta w_T)'$, the ($T \times 1$) vector $Y = (y_1, \ldots, y_T)'$ and the ($(T-1) \times 1$) vector $\Delta Y = (\Delta y_2, \ldots, \Delta y_T)'$.

In the first stage of the CO-AR procedure above, the regression $\Delta y_t = \beta' \Delta x_t + \Delta u_t$ is now estimated by the instrumental variable estimator $\hat{\beta}_{IV} = (\Delta W' \Delta X)^{-1} \Delta W' \Delta Y$, which remains consistent, unlike the usual OLS or GLS estimators.

In the second stage, as above we obtain fitted residuals after the first stage, then fit an AR(k) model and apply a Cochrane-Orcutt transformation to the series xt and yt to obtain the transformed x~t and y~t. The regression

$$\tilde{y}_t = \beta' \tilde{x}_t + \tilde{u}_t$$

is then estimated by the IV estimator

$$\hat{\beta}_{IV\text{-}COAR} = (\tilde{W}' \tilde{X})^{-1} \tilde{W}' \tilde{Y}$$

where $\tilde{X} = (\tilde{x}_1, \ldots, \tilde{x}_T)'$, $\tilde{W} = (\tilde{w}_1, \ldots, \tilde{w}_T)'$ and $\tilde{Y} = (\tilde{y}_1, \ldots, \tilde{y}_T)'$.

In the spurious regression case, the asymptotic covariance matrix of β^IV-COAR can be consistently estimated using

$$S = \hat{\sigma}_{\tilde{u}}^2 (\tilde{W}' \tilde{X})^{-1} (\tilde{W}' \tilde{W}) (\tilde{X}' \tilde{W})^{-1}$$

where $\hat{\sigma}_{\tilde{u}}^2 = T^{-1} \sum_t (\tilde{y}_t - \hat{\beta}_{IV\text{-}COAR}' \tilde{x}_t)^2$, and we can construct usual Wald-type statistics to test the hypothesis $H_0: \beta = \beta_0$ as

$$(\hat{\beta}_{IV\text{-}COAR} - \beta_0)' S^{-1} (\hat{\beta}_{IV\text{-}COAR} - \beta_0),$$

which, under $H_0$, has an asymptotic $\chi^2$ distribution with $p$ degrees of freedom. As before, we can also construct individual tests of the hypotheses $H_0: \beta_j = \beta_{j0}$, $j = 1, \ldots, p$, using the t-statistic

$$t_{\beta_j} = \frac{\hat{\beta}_{j,IV\text{-}COAR} - \beta_{j0}}{\sqrt{S_{jj}}}$$

where Sjj is the j-th element on the diagonal of S. Under H0 this statistic has an asymptotic standard normal distribution. We will provide additional simulation evidence in the next section to investigate the performance of our estimator in this more general setting.
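To illustrate, the scalar case of this IV version of the procedure can be sketched as follows. The data generating process here (a common shock entering both the regressor innovation and the error, with an instrument driven only by the exogenous part) is our own illustrative assumption, chosen so that the instrument is valid while a plain OLS regression on differences would be biased upward by the common shock:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2000
eta, xi, eps = rng.standard_normal((3, T))
x = np.cumsum(eta + xi)   # I(1) regressor, endogenous through the shock xi
w = np.cumsum(eta)        # instrument: correlated with x, uncorrelated with u
u = xi + eps              # stationary error, correlated with the innovations of x
y = 2.0 * x + u

# Stage 1: IV on first differences.
dy, dx, dw = np.diff(y), np.diff(x), np.diff(w)
b0 = (dw @ dy) / (dw @ dx)

# Stage 2: AR(k) fit on the level residuals, then the Cochrane-Orcutt transform.
k = 2
uh = y - b0 * x
Z = np.column_stack([uh[k - j:len(uh) - j] for j in range(1, k + 1)])
b_ar, *_ = np.linalg.lstsq(Z, uh[k:], rcond=None)

def transform(z):
    zt = z[k:].astype(float).copy()
    for j in range(1, k + 1):
        zt -= b_ar[j - 1] * z[k - j:len(z) - j]
    return zt

yt, xt, wt = transform(y), transform(x), transform(w)

# Final IV estimate on the transformed variables (scalar case of the formulas above).
beta_iv = (wt @ yt) / (wt @ xt)
resid = yt - beta_iv * xt
s2 = resid @ resid / len(resid)
se = np.sqrt(s2 * (wt @ wt)) / abs(wt @ xt)
t_stat = (beta_iv - 2.0) / se
```

The IV estimate should lie close to the true slope of 2; the standard error is the scalar version of the sandwich matrix S given in the text.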

## 3 Simulation results

In this section we use simulations to show, for several typical cases, that the test rejection probabilities of the CO-AR estimator are close to the nominal levels in small and medium samples. Moreover, we analyze the finite-sample properties of our CO-AR estimator in cointegrating relationships with serially correlated errors and in spurious regressions, and compare its performance to the other estimators (DOLS, GLSC, and CHO-FGLS) discussed in Choi, Hu, and Ogaki (2008).

### 3.1 Stationary and spurious regressions

We study the finite sample performance of the CO-AR t-statistic compared to the t-statistic of the standard least squares estimation and inference (OLS). By adopting the same experimental design as Granger, Hyung, and Jeon (2001), suppose that Xt and Yt are generated by the following independent processes:

• DGP 1: xt = et

• DGP 2: xt = 0.95xt−1 + et

• DGP 3: xt = 0.99xt−1 + et

• DGP 4: (1 − 0.9L)(1 − L)xt = (1 + 0.5L)et

• DGP 5: (1 − L)2xt = et

where xt stands for either Xt or Yt, and et is drawn from independent N(0, 1) populations. In words, Xt or Yt are generated respectively by a white noise process (DGP 1), a strongly positively autocorrelated autoregressive series (DGP 2 and 3), an ARIMA(1,1,1) process (DGP 4) and a nonstationary I(2) process (DGP 5). The number of iterations in each simulation is 20,000. To avoid the problem of fixing X0 and Y0, in each replication, the first 100 observations are discarded, and X−100 = Y−100 = 0. We consider sample sizes of 50, 100 and 500 observations. The lag length of the AR(k) approximation of the error term is determined by applying the Bayesian information criterion (BIC) to each replication with k ranging from 0 to some maximal order kmax, which is set respectively at 3 (when the sample consists of 50 observations), 4 (when the sample is 100), and 8 (when the sample is 500).
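The five designs can be simulated directly. The helper below is our own sketch of the experimental design described above; it generates one series from each DGP, discarding 100 start-up observations as in the text:

```python
import numpy as np

def simulate_dgp(which, T, rng, burn=100):
    """Generate one series of length T from DGP 1-5 of Granger, Hyung and
    Jeon (2001), discarding `burn` start-up observations."""
    n = T + burn
    e = rng.standard_normal(n)
    z = np.zeros(n)
    if which == 1:                 # white noise
        z = e
    elif which in (2, 3):          # AR(1) with phi = 0.95 or 0.99
        phi = 0.95 if which == 2 else 0.99
        for t in range(1, n):
            z[t] = phi * z[t - 1] + e[t]
    elif which == 4:               # ARIMA(1,1,1): (1-0.9L)(1-L)x = (1+0.5L)e
        dz = np.zeros(n)
        for t in range(1, n):
            dz[t] = 0.9 * dz[t - 1] + e[t] + 0.5 * e[t - 1]
        z = np.cumsum(dz)
    elif which == 5:               # I(2): (1-L)^2 x = e
        z = np.cumsum(np.cumsum(e))
    return z[burn:]

rng = np.random.default_rng(4)
series = {d: simulate_dgp(d, 500, rng) for d in range(1, 6)}
```

Each call returns an independent draw, so regressing a series from one call on a series from another reproduces the independence setting of Table 1.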

Table 1 shows the percentage of rejection of the null hypothesis of no linear relationship between Yt and Xt at the 5% critical value, i.e. absolute value of the t-statistic greater than 1.96, using both OLS (left) and the CO-AR estimator (right). The series Xt and Yt are generated independently employing DGP 1 to 5. First, the t-test based on the OLS estimate is well-behaved as long as either Xt or Yt are white noise processes. Second, as soon as both Xt and Yt display serial correlation over time, the OLS t-test is spuriously biased towards rejection. This becomes very serious when both series have strong temporal dependence. Moreover, the percentage of spurious relationships tends to increase with the sample size. Third, the empirical size of the t-test based on the CO-AR estimator is very close to the theoretical size of 5%. Put differently, our estimator can control almost perfectly for the spurious regression problem, especially for moderate and large sample sizes.

### Table 1:

Spurious regression with Normal distributions. The table displays the percentage of rejection, i.e. absolute value of t-value greater than 1.96. The order of the AR approximation is selected using BIC. The maximum number of lags to be used in the selection procedure is set respectively at 3 (when n = 50), 4 (when n = 100), 8 (when n = 500).

Percentage of |t| > 1.96:

| T | Estimator | X | Y1 | Y2 | Y3 | Y4 | Y5 |
|---|---|---|---|---|---|---|---|
| T = 50 | OLS | X1 | 5.2 | 5.4 | 5.7 | 5.5 | 5.8 |
| | | X2 | 5.7 | 56.5 | 60.5 | 61.6 | 61.8 |
| | | X3 | 5.8 | 60.1 | 65.0 | 65.5 | 66.3 |
| | | X4 | 5.5 | 60.2 | 65.5 | 67.2 | 67.5 |
| | | X5 | 5.5 | 62.5 | 67.8 | 68.1 | 68.4 |
| | CO-AR | X1 | 5.3 | 6.3 | 6.8 | 6.9 | 7.0 |
| | | X2 | 5.8 | 6.3 | 7.0 | 6.6 | 6.6 |
| | | X3 | 5.7 | 6.3 | 6.0 | 6.7 | 6.6 |
| | | X4 | 5.8 | 6.1 | 6.0 | 6.3 | 6.4 |
| | | X5 | 5.9 | 6.2 | 6.1 | 6.2 | 6.0 |
| T = 100 | OLS | X1 | 5.2 | 5.2 | 5.5 | 5.5 | 5.6 |
| | | X2 | 5.4 | 62.5 | 66.8 | 61.6 | 62.6 |
| | | X3 | 5.3 | 67.3 | 72.9 | 65.5 | 67.8 |
| | | X4 | 5.4 | 67.3 | 75.3 | 67.2 | 68.2 |
| | | X5 | 5.4 | 68.5 | 76.1 | 68.1 | 69.4 |
| | CO-AR | X1 | 5.1 | 5.5 | 5.6 | 6.1 | 6.2 |
| | | X2 | 4.9 | 5.9 | 5.1 | 6.0 | 6.1 |
| | | X3 | 5.2 | 5.7 | 5.0 | 6.0 | 6.2 |
| | | X4 | 5.6 | 5.6 | 5.0 | 5.6 | 5.5 |
| | | X5 | 5.7 | 5.7 | 5.1 | 5.4 | 5.1 |
| T = 500 | OLS | X1 | 4.8 | 5.1 | 5.0 | 5.0 | 5.0 |
| | | X2 | 5.2 | 65.5 | 72.4 | 73.8 | 74.1 |
| | | X3 | 4.9 | 72.2 | 84.4 | 85.8 | 86.4 |
| | | X4 | 4.8 | 73.7 | 85.5 | 89.6 | 89.3 |
| | | X5 | 5.0 | 74.2 | 85.7 | 89.9 | 90.1 |
| | CO-AR | X1 | 4.9 | 4.9 | 3.9 | 4.1 | 4.0 |
| | | X2 | 5.1 | 5.0 | 5.5 | 5.6 | 5.2 |
| | | X3 | 4.7 | 5.2 | 5.4 | 5.1 | 5.3 |
| | | X4 | 4.9 | 4.8 | 4.9 | 5.2 | 5.4 |
| | | X5 | 5.0 | 4.9 | 5.1 | 5.3 | 5.2 |

We have compared the size performance of the CO-AR t-test with that of setting the AR order $k = 1$ in all regressions (i.e. the CHO-FGLS approach). In the vast majority of cases, except those where the true error process was AR(1), the size of the CO-AR test was closer to the nominal size. The CHO-FGLS test turns out to be severely biased in sufficiently large samples when the serial correlation structure of the error term is more complicated than AR(1). Detailed results are available upon request.

To assess the robustness of the previous conclusions, we also use different distributions for the error terms, in particular a Student-t with five degrees of freedom and a Laplace distribution. All gave very similar results compared to Table 1.

We further consider the power performance of the CO-AR t-test by considering two cross-correlated time series {Yt} and {Xt}. We assume that xt and yt are generated with ρxy(j) = 0.2 for j = 0 and ρxy(j) = 0 for j ≠ 0 where ρxy(j) denotes the cross correlation function of xt and yt at lag j. All power estimates are above 60% at a nominal size of 5%. Detailed results are not reported but available upon request.

### 3.2 Finite sample performance of the four estimators

Consider the following regression model:

$$y_t = \beta x_t + \gamma' v_t + e_t \qquad (1)$$

where $v_t = (\Delta x_{t-k}, \ldots, \Delta x_t)'$, $\gamma = (\gamma_0, \ldots, \gamma_k)'$, and $x_t$ is integrated of order 1, i.e. I(1). Note that the inference procedure for the coefficient $\beta$ differs substantially according to the assumptions on the error term $e_t$. When the error term $e_t$ is stationary, equation (1) is a cointegrating regression with potentially serially correlated errors. When the error term $e_t$ is a unit-root process, equation (1) is a spurious regression.

Choi, Hu, and Ogaki (2008) discuss three methods to estimate the structural parameters β:

• (1) Dynamic Ordinary Least Squares (DOLS): Regress $y_t$ on $x_t$ and $v_t$ to get $\hat{\beta}_{DOLS}$.

• (2) Corrected Generalized Least Squares (GLSC): Regress $\Delta y_t$ on $\Delta x_t$ and $\Delta v_t$ to get $\hat{\beta}_{GLSC}$.

• (3) Feasible Generalized Least Squares (FGLS): Let the residual from the DOLS regression be denoted by $\hat{e}_t$.

Then run the AR(1) regression $\hat{e}_t = \rho \hat{e}_{t-1} + u_t$ and compute the OLS coefficient $\hat{\rho}$. Apply the Cochrane-Orcutt transformation to the data:

$$\hat{y}_t = y_t - \hat{\rho} y_{t-1}, \qquad \hat{x}_t = x_t - \hat{\rho} x_{t-1}, \qquad \hat{v}_t = v_t - \hat{\rho} v_{t-1}$$

Finally, regress $\hat{y}_t$ on $\hat{x}_t$ and $\hat{v}_t$ to get $\hat{\beta}_{CHO\text{-}FGLS}$.
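For the scalar case without the leads-and-lags term $v_t$ (our own simplification, under which the DOLS step reduces to OLS in levels), the three CHO-FGLS steps can be sketched as:

```python
import numpy as np

# Simulated cointegrating regression: x is a random walk, the error is
# a stationary AR(1) with coefficient 0.5, and the true slope is 2.
rng = np.random.default_rng(5)
T = 1000
x = np.cumsum(rng.standard_normal(T))
e = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + e[t]
y = 2.0 * x + u

# Step 1: levels OLS (DOLS without leads/lags) and its residuals.
b_dols = x @ y / (x @ x)
eh = y - b_dols * x

# Step 2: AR(1) coefficient of the residuals.
rho = eh[:-1] @ eh[1:] / (eh[:-1] @ eh[:-1])

# Step 3: Cochrane-Orcutt transform and final OLS.
yh = y[1:] - rho * y[:-1]
xh = x[1:] - rho * x[:-1]
b_fgls = xh @ yh / (xh @ xh)
```

With this design, the estimated autoregressive coefficient should be close to 0.5 and the final slope estimate close to the true value of 2; the CO-AR procedure replaces the single AR(1) step with an AR(k) fit of data-driven order.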

This section compares the finite sample properties of $\hat{\beta}_{CO\text{-}AR}$ with those of $\hat{\beta}_{DOLS}$, $\hat{\beta}_{GLSC}$ and $\hat{\beta}_{CHO\text{-}FGLS}$. For ease of comparability, we adopt the same experimental design as Choi, Hu, and Ogaki (2008). More specifically, in the simulation we generate $v_t$ and $u_t$ from two independent standard normal distributions. The structural parameter is set to $\beta = 2$ and $\gamma v_t = 0.5 v_t$. The error process $e_t$ is specified as follows:

• DGP a: et = ut

• DGP b: et = 0.95et−1 + ut

• DGP c: et = et−1 + ut

• DGP d: et = 0.5et−1 + 0.5et−2 + ut

• DGP e: Δet = 0.95Δet−1 + ut

where $u_t \sim N(0, 1)$. In words, the error term is a white noise (DGP a), an AR(1) process with autoregressive coefficient 0.95 (DGP b), a random walk (DGP c), an AR(2) process with a unit root (DGP d), and an ARIMA(1,1,0) process (DGP e). Note that the first two cases correspond to a cointegrating relationship. The number of iterations in each simulation is 5000, and in each replication 100 + n observations are generated (n = 50, 100, and 500), of which the first 100 observations are discarded. The lag length of the AR(k) approximation of the error term is determined by applying the Bayesian information criterion to each replication. The maximum number of lags to be used in the selection procedure is set respectively at 3 (when n = 50), 4 (when n = 100), and 8 (when n = 500). Table 2 shows the bias and the root mean square error (RMSE) of all four estimators. As we would expect, the DOLS estimator is the best one when the error process is DGP a, while the GLS Corrected is the best estimator when the error process is DGP c. As shown by Choi, Hu, and Ogaki (2008), the CHO-FGLS estimator is almost as good as the DOLS estimator in cointegrating relationships and significantly outperforms the DOLS estimator in spurious regressions. Hence, the CHO-FGLS estimation is a robust procedure with respect to error specifications.

### Table 2:

The bias and RMSE of the four estimators. The order of the AR approximation is selected using the BIC. The maximum number of lags in the selection procedure is set at 3 (n = 50), 4 (n = 100), and 8 (n = 500). The number of replications is 5000.

| DGP | T | DOLS Bias | DOLS RMSE | GLSC Bias | GLSC RMSE | CHO-FGLS Bias | CHO-FGLS RMSE | CO-AR Bias | CO-AR RMSE |
|---|---|---|---|---|---|---|---|---|---|
| (a) | 50 | 0.0005 | 0.047 | −0.001 | 0.210 | 0.0007 | 0.048 | −0.0002 | 0.083 |
| | 100 | 0 | 0.023 | −0.003 | 0.144 | 0.0001 | 0.023 | −0.0008 | 0.045 |
| | 500 | 0 | 0.004 | −0.0003 | 0.063 | 0 | 0.004 | 0 | 0.012 |
| (b) | 50 | −0.003 | 0.512 | −0.004 | 0.180 | −0.004 | 0.232 | −0.003 | 0.181 |
| | 100 | 0.001 | 0.316 | −0.004 | 0.128 | −0.004 | 0.142 | −0.004 | 0.124 |
| | 500 | −0.000 | 0.083 | −0.000 | 0.061 | −0.000 | 0.048 | −0.000 | 0.049 |
| (c) | 50 | 0.012 | 2.175 | −0.004 | 0.180 | −0.009 | 0.639 | −0.004 | 0.185 |
| | 100 | 0.007 | 1.674 | −0.005 | 0.127 | −0.006 | 0.366 | −0.0051 | 0.130 |
| | 500 | −0.023 | 1.129 | −0.000 | 0.061 | −0.002 | 0.113 | −0.000 | 0.061 |
| (d) | 50 | 0.008 | 1.446 | −0.003 | 0.170 | −0.008 | 0.500 | −0.006 | 0.207 |
| | 100 | 0.005 | 1.114 | −0.004 | 0.116 | −0.003 | 0.323 | −0.004 | 0.108 |
| | 500 | −0.016 | 0.752 | −0.000 | 0.051 | −0.003 | 0.138 | −0.000 | 0.047 |
| (e) | 50 | 0.305 | 38.92 | −0.006 | 0.743 | −0.064 | 10.47 | −0.002 | 0.198 |
| | 100 | 0.124 | 30.72 | −0.0112 | 0.553 | −0.015 | 5.608 | −0.001 | 0.135 |
| | 500 | −0.459 | 21.91 | −0.0059 | 0.267 | −0.023 | 1.088 | −0.000 | 0.062 |

The CO-AR estimator is robust with respect to the order of integration of the error terms. If the error is white noise, the RMSE of CO-AR is similar to that of the CHO-FGLS and DOLS estimators. If the error follows a highly persistent AR(1) process (DGP b) or a unit-root process (DGP c), the RMSE of CO-AR tends to be slightly smaller than that of the CHO-FGLS and GLS-corrected estimators, and much smaller than that of DOLS. Not surprisingly, when the error process follows more complex unit-root dynamics (DGPs d and e), the CO-AR estimator performs very well compared to the three other estimators considered by Choi, Hu, and Ogaki (2008), especially for large sample sizes.

The adequacy of an approximate model for the DGP of the error term depends on the choice of the order of the AR approximation. In particular, different lag selection methods may lead to drastically different conclusions regarding the finite sample performance of the CO-AR estimator. Therefore, we also investigate the sensitivity of our Monte Carlo results to the choice of the selection criterion by applying the Akaike information criterion (AIC), the Hannan-Quinn information criterion (HQIC), and the "true" order of the approximation (i.e. zero lags for DGP a, one lag for DGPs b and c, and two lags for DGPs d and e). To economize on space we do not report these results in detail, but discuss some important aspects. Overall, the results are broadly in line with the findings displayed in Table 2. The performance of the CO-AR estimator is remarkably robust across the different information criteria. However, when the error term is white noise, the RMSE of the CO-AR estimator using the "true" lag length is halved compared with BIC lag selection. Moreover, the performance of the BIC is broadly similar to that of the HQIC, and both criteria slightly dominate the AIC in terms of the RMSE metric.
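The lag selection step can be illustrated as follows. This is our own minimal sketch (the criteria are written in the standard residual-variance form, and k = 0 is allowed so that a white-noise error can be selected), not the code used to produce the tables:

```python
import numpy as np

def select_ar_order(e, kmax, criterion="bic"):
    """Pick the AR order k in {0, ..., kmax} minimizing AIC, BIC or HQIC,
    computed from the OLS residual variance on a common sample."""
    y = e[kmax:]                            # common effective sample
    n = len(y)
    penalty = {"aic": 2.0,
               "bic": np.log(n),
               "hqic": 2.0 * np.log(np.log(n))}[criterion]
    best_k, best_ic = 0, np.inf
    for k in range(kmax + 1):
        if k == 0:
            sigma2 = np.mean(y ** 2)
        else:
            # j-th column holds the j-th lag of e on the common sample
            X = np.column_stack([e[kmax - j:-j] for j in range(1, k + 1)])
            b, *_ = np.linalg.lstsq(X, y, rcond=None)
            sigma2 = np.mean((y - X @ b) ** 2)
        ic = n * np.log(sigma2) + penalty * k
        if ic < best_ic:
            best_k, best_ic = k, ic
    return best_k

# Demo: order selection for a persistent AR(1) series
rng = np.random.default_rng(2)
u = rng.standard_normal(1200)
x = np.zeros(1200)
for t in range(1, 1200):
    x[t] = 0.9 * x[t - 1] + u[t]
k_ar = select_ar_order(x[200:], 4, "bic")
```

For a strongly autocorrelated series like this one, any of the three criteria selects at least one lag.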

### 3.3 Multiple regressors and endogeneity

We finally consider the case of two regressors, one of which is correlated with the error term, to see how our procedure performs under endogeneity. We generate 5000 replications of the following processes:

$$\begin{aligned}
y_t &= 0.6\, x_{1t} + 0.6\, x_{2t} + u_t\\
x_{1t} &= 0.95\, x_{1,t-1} + 0.5\, \varepsilon_{1,t-1} + \varepsilon_{1t}\\
x_{2t} &= 0.7\, x_{2,t-1} + 0.3\, \varepsilon_{2,t-1} + \varepsilon_{2t}\\
u_t &= 0.8\, u_{t-1} + 0.2\, \varepsilon_{3,t-1} + \varepsilon_{3t}.
\end{aligned}$$

The error terms $\varepsilon_{it}$ are iid over time and standard normally distributed, but the contemporaneous correlation between $\varepsilon_{1t}$ and $\varepsilon_{3t}$ is 0.5. We assume the presence of an instrumental variable $w_t$, also iid standard normal, which has a correlation of 0.8 with $\varepsilon_{1t}$. To summarize, the vector $(\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t}, w_t)'$ is drawn from a Gaussian distribution with mean zero and covariance matrix $\Sigma$ given by

$$\Sigma = \begin{pmatrix} 1 & 0 & 0.5 & 0.8 \\ 0 & 1 & 0 & 0 \\ 0.5 & 0 & 1 & 0 \\ 0.8 & 0 & 0 & 1 \end{pmatrix}$$
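Draws of $(\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t}, w_t)'$ with this covariance matrix can be generated via a Cholesky factorization of $\Sigma$; a short illustrative sketch (our own code, not part of the original design):

```python
import numpy as np

# Covariance matrix of (eps1, eps2, eps3, w) from the text
Sigma = np.array([[1.0, 0.0, 0.5, 0.8],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.5, 0.0, 1.0, 0.0],
                  [0.8, 0.0, 0.0, 1.0]])

def draw_errors(T, rng):
    """Draw T iid observations of (eps1, eps2, eps3, w) ~ N(0, Sigma)."""
    L = np.linalg.cholesky(Sigma)           # Sigma is positive definite
    return rng.standard_normal((T, 4)) @ L.T

rng = np.random.default_rng(3)
eps = draw_errors(20000, rng)
C = np.corrcoef(eps.T)                      # sample correlation matrix
```

With 20,000 draws the sample correlations reproduce the targets (0.5 between $\varepsilon_1$ and $\varepsilon_3$, 0.8 between $\varepsilon_1$ and $w$, zero elsewhere) up to sampling error.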

In Table 3 we report the results of the t-statistic for testing H0: β1 = 0.6 and H0: β2 = 0.6. While the test is slightly over-sized in small samples, it performs well in moderate and large samples, and the empirical size is not significantly different from the nominal size.

### Table 3:

Size of the t-tests and RMSE of the estimation error in the case of two regressors and partial endogeneity. The lag order k was chosen according to the AIC criterion. The number of replications is 5000.

| | T = 50 | T = 100 | T = 200 | T = 500 | T = 1000 |
|---|---|---|---|---|---|
| $\beta_1$ size | 7.56% | 6.74% | 5.78% | 5.54% | 5.22% |
| $\beta_1$ RMSE | 9.27% | 2.02% | 0.90% | 0.33% | 0.16% |
| $\beta_2$ size | 10.68% | 8.74% | 6.36% | 5.80% | 5.02% |
| $\beta_2$ RMSE | 4.70% | 1.20% | 0.54% | 0.21% | 0.10% |

To sum up, this paper develops a new robust estimator for structural parameters in dynamic regressions. The simulation study indicates that the CO-AR estimator is particularly useful in most applied situations in macroeconomics and finance where the researcher does not know the exact form of the error term.

## 4 Concluding remarks

This paper proposes a new estimation method, namely CO-AR estimation, for cointegrating and spurious regressions by applying a two-stage generalized Cochrane-Orcutt transformation based on the autoregressive approximation framework developed by Berk (1974). We prove that our CO-AR estimator is consistent. We further show that a convergent usual t-statistic based on the CO-AR estimator (the CO-AR t-test), asymptotically distributed as N(0, 1), can be constructed for the spurious regression cases analyzed by Granger and Newbold (1974) and Granger, Hyung, and Jeon (2001). More importantly, the CO-AR t-test does not rely on an estimator of the long-run variance and can therefore be easily implemented. Moreover, the CO-AR estimation method turns out to be robust with respect to error specifications even when the regressors and regressand are highly persistent (or possibly unit-root) processes. The fact that the CO-AR estimator covers general regression models is especially appealing in empirical work, where the exact form of the error terms is often unknown and the errors rarely follow a simple AR(1) process. Finally, the simulation results indicate that the finite sample performance of the CO-AR methodology is promising even for samples as small as 50 observations.

Of course, several important issues are not considered in this paper and deserve further study. For instance, future work should investigate the choice of k, the order of the AR model for approximating I(1) processes, allowing alternative selection criteria, such as the modified AIC and BIC developed in Bauer and Wagner (2008), or a different maximum number of lags in the selection procedure. Moreover, we believe that the AR approximation framework could be applied to the issue of spurious regressions involving other persistent processes, such as I(2) series.

# Acknowledgement

We appreciate useful comments from Cheng Hsiao, Yong-Miao Hong, Wolfgang Härdle, Hashem Pesaran, Mark Watson, and the participants of the Sir Clive Granger Memorial Conference, the Econometrics Workshop at Humboldt University Berlin, and the European Econometrics Meeting in Oslo.

## A Appendix

### A.1 Proof of Theorem 1

First note that $\hat\beta_{\text{GLSC}}$ is consistent under our assumptions, see e.g. Robinson (1998). Then $\hat u_t = y_t - \hat\beta_{\text{GLSC}}\, x_t = u_t - (\hat\beta_{\text{GLSC}} - \beta)\, x_t = u_t + o_p(1)$. By Theorem 2 of Berk (1974), we note that an ARMA(p, q) model represented by an infinite autoregression can be approximated by an AR(k) model, where $k = k_T$ and where the autoregressive coefficients satisfy mild summability constraints. Thus, $\hat u_t$ can be approximated by an AR(k) model, i.e. $\hat u_t = \sum_{j=1}^{k} \hat b_j \hat u_{t-j} + \hat e_{tk}$ with $\hat b(k) - b(k) = O_p(T^{-1/2})$ for $k = o(T^{1/3})$, where $\hat b(k) = (\hat b_1, \ldots, \hat b_k)'$ is the OLS estimate of $b(k)$ and $\hat e_{tk} = e_t + o_p(1)$.

Recall the notation $\tilde y_t = y_t - \sum_{j=1}^{k}\hat b_j y_{t-j}$ and $\tilde x_t = x_t - \sum_{j=1}^{k}\hat b_j x_{t-j}$. Define $X_{tk} = (x_t, x_{t-1}, \ldots, x_{t-k})'$, $U_{tk} = (u_t, u_{t-1}, \ldots, u_{t-k})'$ and $\hat b = (1, -\hat b_1, \ldots, -\hat b_k)'$, such that $\tilde x_t = \hat b' X_{tk}$ and $\tilde u_t = \hat b' U_{tk}$. Then we obtain

$$\hat\beta_{\text{CO-AR}} - \beta = \Big[\sum_{t=k+1}^{T} \tilde x_t^2\Big]^{-1}\Big[\sum_{t=k+1}^{T} \tilde x_t \tilde u_t\Big] = \big[\hat b' \hat\Gamma_k \hat b\big]^{-1}\big[\hat b' \hat\Psi_k \hat b\big]$$

where $\hat\Gamma_k = \frac{1}{T}\sum_{t=k+1}^{T} X_{tk} X_{tk}'$ and $\hat\Psi_k = \frac{1}{T}\sum_{t=k+1}^{T} X_{tk} U_{tk}'$. Since $\hat\Gamma_0 > 0$ almost surely, it follows by Proposition 5.1.1 of Brockwell and Davis (1991) that, for $k \ge 1$, $\lambda_{\min}(\hat\Gamma_k) > 0$ almost surely. Furthermore, by Proposition 7.3.4 of Brockwell and Davis (1991), $\hat\Gamma_k - \Gamma_k = O_p(T^{-1/2})$, where $\Gamma_k = E[X_{tk} X_{tk}']$. Due to the independence of $X_{tk}$ and $U_{tk}$ it also follows by standard arguments that $\hat\Psi_k = O_p(T^{-1/2})$. This implies that $\hat\beta_{\text{CO-AR}} - \beta = O_p(T^{-1/2})$ as stated.  □

### A.2 Proof of Theorem 2

Under $H_0: \beta = 0$, we have $\hat\beta_{\text{CO-AR}} = \big(T^{-1}\sum_{t=k+1}^{T}\tilde x_t^2\big)^{-1} T^{-1}\sum_{t=k+1}^{T}\tilde x_t \tilde u_t$. First, using a result from the proof of Theorem 1, $T^{-1}\sum_{t=k+1}^{T}\tilde x_t^2 \xrightarrow{p} b'\Gamma_k b$. Since $\tilde u_t = \hat e_{tk} + o_p(1) = e_t + o_p(1)$, the terms $\tilde x_t \tilde u_t$ have finite variance and are asymptotically uncorrelated, such that a central limit theorem applies: $T^{-1/2}\sum_{t=k+1}^{T}\tilde x_t \tilde u_t \xrightarrow{L} N(0, b'\Gamma_k b\, \sigma_{\tilde u}^2)$. Hence, under $H_0$, $\sqrt{T}\,\hat\beta_{\text{CO-AR}} \xrightarrow{L} N(0, \sigma_\beta^2)$, where $\sigma_\beta^2 = (b'\Gamma_k b)^{-1}\sigma_{\tilde u}^2$. Next, note that $\hat\sigma_{\tilde u}^2$ is a consistent estimator of $\sigma_{\tilde u}^2$ due to the assumption of finite fourth moments of $e_t$. This implies that $T S_{\hat\beta}^2$ is a consistent estimator of $\sigma_\beta^2$, which then yields an asymptotically standard normally distributed t-statistic.  □

### A.3 Proof of Theorem 3

First, note that $\hat u_t = u_t - (\hat\beta_{\text{GLSC}} - \beta)\, x_t$, and that the second term does not converge to zero, unlike in the case where $x_t$ is stationary. However, the AR(k) approximation of $\hat u_t$ can be equivalently represented as

$$\hat u_t - \hat u_{t-1} = \hat\psi_1 \hat u_{t-1} + \sum_{j=2}^{k} \hat\psi_j\, \Delta \hat u_{t-j+1} + \hat e_{tk}$$

where $\hat\psi_1 = \sum_{i=1}^{k}\hat b_i - 1$ and $\hat\psi_j = -\sum_{i=j}^{k}\hat b_i$, $j = 2, \ldots, k$. The leading term of $\hat u_t$ is integrated, while $\Delta\hat u_t = \Delta u_t - (\hat\beta_{\text{GLSC}} - \beta)\, \Delta x_t$ is stationary. Under our assumptions, $\hat\psi_1 = O_p(T^{-1})$ and $\hat\psi_j = \psi_j + O_p(T^{-1/2})$, $j = 2, \ldots, k$.
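The reparameterization can be checked numerically. With the sign convention $\hat\psi_1 = \sum_{i=1}^{k}\hat b_i - 1$ and $\hat\psi_j = -\sum_{i=j}^{k}\hat b_i$, the ADF-type form reproduces the levels AR(k) regression exactly; the following self-check (our own code, with arbitrary stable AR(3) coefficients) verifies the identity:

```python
import numpy as np

# Check that  Du_t = psi1*u_{t-1} + psi2*Du_{t-1} + psi3*Du_{t-2} + e_t
# with psi1 = b1+b2+b3-1, psi2 = -(b2+b3), psi3 = -b3 is algebraically
# identical to the levels AR(3): u_t = b1 u_{t-1}+b2 u_{t-2}+b3 u_{t-3}+e_t.
b = np.array([0.5, 0.3, 0.1])
rng = np.random.default_rng(0)
u = np.zeros(200)
for t in range(3, 200):
    u[t] = b @ u[t - 3:t][::-1] + rng.standard_normal()

psi1 = b.sum() - 1.0
psi2, psi3 = -(b[1] + b[2]), -b[2]
lhs = u[3:] - u[2:-1]                                   # Du_t
rhs = (psi1 * u[2:-1]
       + psi2 * (u[2:-1] - u[1:-2])                     # Du_{t-1}
       + psi3 * (u[1:-2] - u[:-3]))                     # Du_{t-2}
e = u[3:] - (b[0] * u[2:-1] + b[1] * u[1:-2] + b[2] * u[:-3])
max_gap = float(np.max(np.abs(lhs - rhs - e)))          # ~ machine precision
```

Since the two forms are algebraically identical, the gap is zero up to floating-point rounding.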

In a similar way, the filtered series x~t and y~t can be represented as

$$\tilde x_t = x_t - (1 + \hat\psi_1)\, x_{t-1} - \sum_{j=2}^{k}\hat\psi_j\, \Delta x_{t-j+1}, \qquad \tilde y_t = y_t - (1 + \hat\psi_1)\, y_{t-1} - \sum_{j=2}^{k}\hat\psi_j\, \Delta y_{t-j+1}$$

Define the processes

$$x_t^{*} = \Delta x_t - \sum_{j=2}^{k}\hat\psi_j\, \Delta x_{t-j+1}\tag{2}$$
$$y_t^{*} = \Delta y_t - \sum_{j=2}^{k}\hat\psi_j\, \Delta y_{t-j+1}\tag{3}$$

which are stationary by construction. Since $\hat\psi_1 = O_p(T^{-1})$, we have $\tilde x_t = x_t^{*} - \hat\psi_1 x_{t-1} = x_t^{*} + O_p(T^{-1/2})$, and thus $\{\tilde x_t\}$ converges in probability to a stationary process. The same holds for $\tilde y_t = y_t^{*} + O_p(T^{-1/2})$. This implies that the CO-AR estimator, i.e. the OLS estimator of $\beta$ in the regression $\tilde y_t = \beta \tilde x_t + \eta_t$, is $\sqrt{T}$-consistent as stated.  □

### A.4 Proof of Theorem 4

We can write the regression $\tilde y_t = \beta \tilde x_t + \eta_t$ as $y_t^{*} = \beta x_t^{*} + \eta_t + O_p(T^{-1/2})$, where $x_t^{*}$ and $y_t^{*}$ are the stationary processes given by (2) and (3). Then the same arguments as in the proof of Theorem 2 apply to the CO-AR estimator of $\beta$, and we obtain the required result.  □

### A.5 Proof of Theorem 5

The first stage regression yields an estimator $\hat\beta$ with $T(\hat\beta - \beta) \xrightarrow{d} N(0, Z)$. At the second stage, an AR(k) model is fitted to the residuals $\hat u_t$. By assumption, the stationary sequence $\{u_t\}$ has the representation $u_t = \sum_{j=1}^{\infty} b_j u_{t-j} + e_t$ with exponentially decaying coefficients, i.e. there are constants $C, \kappa > 0$ such that $|b_j| \le C \exp(-\kappa j)$ for all $j \ge 1$. This implies that $\sum_{j=1}^{\infty} |j|^s |b_j| < \infty$ for any $s > 1$, so that we can use the results of Lemma 2.1 of Phillips and Ouliaris (1990).

Lemma 3 of Berk (1974) implies that $\hat b(k) \xrightarrow{p} b(k)$, where $\hat b(k) = (\hat b_1, \ldots, \hat b_k)'$ is the coefficient vector of the approximating model. The CO-AR estimator can be written as

$$T(\hat\beta - \beta) = \Big(\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_t^2\Big)^{-1}\frac{1}{T}\sum_{t=1}^{T}\tilde x_t \tilde u_t.$$

First, by Lemma 2.1 of Phillips and Ouliaris (1990), we have

$$\frac{1}{\sqrt{T}}\, \tilde x_{[Tr]} \xrightarrow{d} B(1)\, V(r)$$

where $r \in (0, 1)$, $B(1) = 1 - \sum_{j=1}^{\infty} b_j$, $V(r)$ is a Brownian motion with variance $\sum_{j=-\infty}^{\infty} E[v_t v_{t-j}]$, and $v_t = \Delta x_t$. Hence,

$$\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_t^2 \xrightarrow{d} B(1)^2 \int_0^1 V(r)^2\, dr.$$

Second, we have

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]}\tilde u_t \xrightarrow{d} B(1)\, W(r)$$

where $W(r)$ is a Brownian motion with variance $\sum_{j=-\infty}^{\infty} E[u_t u_{t-j}]$. Hence,

$$\frac{1}{T}\sum_{t=1}^{T}\tilde x_t \tilde u_t \xrightarrow{d} B(1)^2 \int_0^1 V(r)\, dW(r).$$

Therefore, we obtain $\hat\beta - \beta = O_p(T^{-1})$ as stated, and the asymptotic distribution of the CO-AR estimator is given by

$$T(\hat\beta - \beta) \xrightarrow{d} \Big(\int_0^1 V(r)^2\, dr\Big)^{-1}\int_0^1 V(r)\, dW(r)$$

which is identical to the asymptotic distribution of the OLS estimator.  □

### References

Bauer, D., and M. Wagner. 2008. "Autoregressive Approximations of Multiple Frequency I(1) Processes." Working paper, Economics Series, Institute for Advanced Studies, Vienna.

Berk, K. N. 1974. "Consistent Autoregressive Spectral Estimates." The Annals of Statistics 2: 489–502.

Brockwell, P. J., and R. A. Davis. 1991. Time Series: Theory and Methods. 2nd ed. New York, NY: Springer-Verlag.

Choi, C.-Y., L. Hu, and M. Ogaki. 2008. "Robust Estimation for Structural Spurious Regressions and a Hausman-Type Cointegration Test." Journal of Econometrics 142: 327–351.

Deng, A. 2014. "Understanding Spurious Regression in Financial Economics." Journal of Financial Econometrics 12: 122–150.

Engle, R. F., and C. W. J. Granger. 1987. "Cointegration and Error Correction: Representation, Estimation, Testing." Econometrica 55 (2): 251–276.

Ferson, W. E., S. Sarkissian, and T. Simin. 2003. "Spurious Regressions in Financial Economics?" Journal of Finance 58 (4): 1393–1414.

Granger, C. W. J. 2001. "Spurious Regressions." In A Companion to Theoretical Econometrics, edited by B. Baltagi. Oxford: Blackwell Publishers.

Granger, C. W. J., and P. Newbold. 1974. "Spurious Regressions in Econometrics." Journal of Econometrics 2: 111–120.

Granger, C. W. J., N. Hyung, and H. Jeon. 2001. "Spurious Regressions with Stationary Series." Applied Economics 33: 899–904.

Haldrup, N. 1994. "The Asymptotics of Single-Equation Cointegration Regressions with I(1) and I(2) Variables." Journal of Econometrics 63: 153–181.

Hendry, D. F. 1980. "Econometrics: Alchemy or Science?" Economica 47: 387–406.

Ing, C.-K., and C.-Z. Wei. 2003. "On Same-Realization Prediction in an Infinite-Order Autoregressive Process." Journal of Multivariate Analysis 85: 130–155.

Phillips, P. C. B. 1986. "Understanding Spurious Regressions in Econometrics." Journal of Econometrics 33: 311–340.

Phillips, P. C. B., and S. Ouliaris. 1990. "Asymptotic Properties of Residual Based Tests for Cointegration." Econometrica 58: 165–193.

Plosser, C. I., and G. W. Schwert. 1978. "Money, Income and Sunspots: Measuring Economic Relationships and the Effects of Differencing." Journal of Monetary Economics 4: 637–660.

Plosser, C. I., G. W. Schwert, and H. White. 1982. "Differencing as a Test of Specification." International Economic Review 23: 535–552.

Saikkonen, P. 1991. "Asymptotically Efficient Estimation of Cointegration Regressions." Econometric Theory 7: 1–21.

Shibata, R. 1980. "Asymptotically Efficient Selection of the Order of the Model for Estimating Parameters of Linear Processes." Annals of Statistics 8: 147–164.

Stock, J. H., and M. W. Watson. 1993. "A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems." Econometrica 61: 783–820.

Valkanov, R. 2003. "Long-Horizon Regressions: Theoretical Results and Applications." Journal of Financial Economics 68: 201–232.

Yule, G. U. 1926. "Why Do We Sometimes Get Nonsense Correlations Between Time Series? A Study in Sampling and the Nature of Time Series." Journal of the Royal Statistical Society 89: 1–64.