Selecting between causal and noncausal models with quantile autoregressions


 We propose a model selection criterion to detect purely causal from purely noncausal models in the framework of quantile autoregressions (QAR). We also present asymptotics for the i.i.d. case with regularly varying distributed innovations in QAR. This new modelling perspective is appealing for investigating the presence of bubbles in economic and financial time series, and is an alternative to approximate maximum likelihood methods. We illustrate our analysis using hyperinflation episodes of Latin American countries.


Motivation
Mixed causal and noncausal time series models have been recently used in order (i) to obtain a stationary solution to explosive autoregressive processes, (ii) to improve forecast accuracy, (iii) to model expectation mechanisms implied by economic theory, (iv) to interpret non-fundamental shocks resulting from the asymmetric information between economic agents and econometricians, (v) to generate non-linear features from simple linear models with non-Gaussian disturbances, (vi) to test for time reversibility. When the distribution of innovations is known, a non-Gaussian likelihood approach can be used to discriminate between lag and lead polynomials of the dependent variable. For instance, the R package MARX developed by (Hecq, Lieb, and Telg 2017a) estimates univariate mixed models under the assumption of a Student's t-distribution with v degrees of freedom (see also Saikkonen 2011, 2013) as well as the Cauchy distribution as a special case of the Student's t when v = 1. Gouriéroux and Zakoian (2016) privilege the latter distribution to derive analytical results. Gouriéroux and Zakoian (2015); Fries and Zakoian (2017) provide an additional flexibility to involve some skewness by using the family of alpha-stable distributions. However, all those aforementioned results require the estimation of a parametric distributional form. In this article we take another route.
The objective of this paper is to select between causal and noncausal models without using parametric distributional assumptions. To achieve that, we adopt a quantile regression (QR) framework and apply quantile autoregressions (QCAR hereafter) (Koenker and Xiao 2006) on candidate models. Although we obviously also require non-Gaussian innovations in time series, we do not make any parametric distributional assumption about the innovations. By using quantile regressions, we consider a statistic called the sum of rescaled absolute residuals (SRAR hereafter) to measure model performances and reveal properties of time series. Remarkably we find that SRAR cannot always favour a model uniformly along quantiles. This issue is common for time series of asymmetric distributed innovations, which causes confusion in model selection and calls for a robust statistic to meet the goal. Considering that, we propose to aggregate the SRAR information over quantiles. It is worth mentioning that when coefficients are constant in the underlying model with a symmetrically i.i.d. error term, the aggregate SRAR criterion is equivalently to select between forward and backward conditional mean models (termed by Gourieroux and Zakoian (2017)). However, the aggregate SRAR is a measure based on the whole dynamics of the underlying process, which is not dominated by the conditional mean information any more. This characteristic of the aggregate SRAR criterion indeed makes it robust in model selection even for some general situations such as with asymmetric distributed innovations. Another remark on this paper is that our method is restricted to the model framework of purely causal or noncausal autoregressions without other explanatory variables, thereby this method can be used to questions like asset pricing of exchange rate where current exchange rate is associated with future exchange rates. However, it cannot be used to questions like the Taylor (1993) rule which associates the dynamics of the nominal interest rate with dynamics of some endogenous variables (e.g. inflation).
The rest of this paper is constructed as follows. Section 2 introduces mixed causal and noncausal models and our research background. In Section 3, we propose quantile autoregressions in the time reverse version called quantile noncausal autoregressions (QNCAR) along with a generalized asymptotic theorem in a stable law for both QCAR and QNCAR. Section 4 brings out a common issue in the model selection through SRAR comparison. The use of the aggregate SRAR over all quantiles as a new model selection criterion is then proposed with the shape of SRAR curves being analysed. Furthermore, we illustrate our analysis using hyperinflation episodes of four Latin American countries in Section 6. Section 7 concludes this paper.

Causal and noncausal time series models
Brockwell and Davis introduce in their texbooks (Brockwell and Davis 2016;Brockwell, Davis, and Fienberg 1991) a univariate noncausal specification as a way to rewrite an autoregressive process with explosive roots into a process in reverse time with roots outside the unit circle. This noncausal process possesses a stable forward looking solution whereas the explosive autoregressive process in direct time does not. This approach can be generalized to allow for both lead and lag polynomials. This is the so called mixed causal-noncausal univariate autoregressive process for y t that we denote MAR(r, s) where π(L) = 1−π 1 L−…−π r L r ,ϕ(L −1 ) = 1−ϕ 1 L −1 −…−ϕ s L −s . L is the usual backshift operator that creates lags when raised to positive powers and leads when raised to negative powers, i.e. L j y t = y t−j and L −j y t = y t + j . The roots of both polynomials are assumed to lie strictly outside the unit circle, that is π(z) = 0 and ϕ(z) = 0 for |z| > 1 and therefore has an infinite two sided moving average representation. We also have that E(|ε t | δ ) < ∞ for δ > 0 1 and the Laurent expansion parameters are such that ∑ ∞ i −∞ |a i | δ < ∞. The representation (2) is sometimes clearer than (1) to motivate the terminology "causal/noncausal". Indeed those terms refer to as the fact that y t depends on a causal (resp. noncausal) component With this in mind, it is obvious that an autoregressive process with explosive roots will be defined as noncausal.
Note that in (1), the process y t is a purely causal MAR(r, 0), also known as the conventional causal AR(r) process, when ϕ 1 = … = ϕ s = 0, while the process is a purely noncausal MAR(0, s) when π 1 = … = π r = 0. A crucial point of this literature is that innovation terms ε t must be i.i.d. non-Gaussian to ensure the identifiability of a causal from a noncausal specification (Breidt et al. 1991). The departure from Gaussianity is not as such an ineptitude as a large part of macroeconomic and financial time series display nonlinear and non-normal features.
We have already talked in Section 1 about the reasons for looking at models with a lead component. Our main motivation in this paper lies in the fact that MAR(r, s) models with non-Gaussian disturbances are able to replicate non-linear features (e.g. bubbles, asymmetric cycles) that previously were usually obtained by highly nonlinear models. As an example, we simulate in Figure 1 for 200 observations. 2 One can observe asymmetric cycles and multiple bubbles. 3 Once a distribution or a group of distributions is chosen, the parameters in π(L)ϕ(L −1 ) can be estimated. Assuming for instance a non-standardized t-distribution for the innovation process, the parameters of mixed causal-noncausal autoregressive models of the form (1) can be consistently estimated by the approximate maximum likelihood (AML) method (Hecq, Lieb, and Telg 2016). Let (ε 1 ,…,ε T ) be a sequence of i.i.d. zero mean t-distributed random variables, then its joint probability density function can be characterized as where Γ(⋅) denotes the gamma function. The corresponding (approximate) log-likelihood function conditional on the observed data y = (y 1 , … , y T ) can be formulated as where p = r + s and ε t = π(L)ϕ(L −1 )y t − α is replaced by a nonlinear function of the parameters when expanding the product of polynomials. The distributional parameters are collected in λ = [σ, ν]′, with σ representing the scale parameter and ν the degrees of freedom. α denotes an intercept that can be introduced in model (1). Thus, the AML estimator corresponds to the solution θ ML arg max θ∈Θ l y (θ y), with θ=[ϕ′, φ′, λ′]′ and Θ is a permissible parameter space containing the true value of θ, say θ 0 , as an interior point. Since an analytical solution of the score function is not directly available, gradient based numerical procedures can be used to find θ ML . If ν > 2, and hence E(|ε t | 2 ) < ∞, the AML estimator is T √ -consistent and asymptotically normal. Lanne and Saikonen (2011) also show that a consistent estimator of the limiting covariance matrix is obtained from the standardized Hessian of the log-likelihood. For the estimation of the parameters and the standard innovations as well as for the selection of mixed causal-noncausal models we can also follow the procedure proposed by Hecq, Lieb, and Telg (2016). However, the AML estimation is based on a parametric form of the innovation term in (1), which makes this method not flexible enough to adapt uncommon distributions as complex in reality. To be more practical and get rid of strong distribution assumptions on innovations, in next section we adopt quantile regression methods with some properties discussed there. This paper only focuses on purely causal and noncausal models. Koenker and Xiao (2006) have introduced a quantile autoregressive model of order p denoted as QAR(p) which is formulated as the following form:

QCAR & QNCAR
where u t is a sequence of i.i.d. standard uniform random variables. In order to emphasize the causal characteristic of this kind of autoregressive models, we refer to (6) as QCAR(p) hereafter. Provided that the righthand side of (6) is monotone increasing in u t , the τ−th conditional quantile function of y t can be written as If an observed time series {y t } T t 1 can be written into a QCAR( p) process, its parameters as in (7) can be obtained from the following minimization problem.
where ρ τ (u): = u(τ−I(u < 0)) is called the check function, x t ′: = [1, y t−1 , … , y t−p ], and θ′: = [θ 0 ,θ 1 ,…,θ t−p ]. We define the sum of rescaled absolute residuals (SRAR) for each pair of (τ, θ) as Substitute (9) into (8) and write the minimization problem (8) as The estimation consistency and asymptotic normality in the minimization problem (8) have been provided by Koenker and Xiao (2006). A modified simplex algorithm proposed by Barrodale and Roberts (1973) can be used to solve the minimization, and in practise parameters for each τ−th quantile can be obtained, for instance, through the rq() function from the quantreg package in R or in EViews.

QNCAR
A QNCAR( p) specification is introduced here as the noncausal counterpart of the QCAR( p) model by reversing time, explicitly as follows: Analogously to the QCAR(p), the estimation of the QNCAR(p) goes through solving where for the simplicity of notations, we use θ(τ) to denote the estimate in quantile noncausal autoregression. Drawing on the asymptotic derived by Koenker and Xiao (2006), we present the following theorem for QNCAR(p) based on three assumptions which are made to ensure covariance stationarity of the time series (by (A1) and (A2)) and the existence of quantile estimates (by (A3)).
Remark. There is an issue in the estimation consistency of QCAR(p) as reported by Fan and Fan (2010). This is due to the violation on the monotonicity requirement of the right side of (6) in u t but not exclusively the monotonicity of θ i (u t ) in u t . So to recover an AR(p) process of coefficients θ i (u t ) (i = 0,…,p) monotonic in u t , quantile autoregression is not a 100% match tool unless the monotonicity requirement is met beforehand. This issue is also illustrated in Section 4.1.
Theorem 1. A QNCAR(p) model can be written in the following vectorized companion form: , satisfying the following assumptions: (A3): F y t |xt+1 (⋅) ≔ P[y t < ⋅ y t+1 , y t+2 , …, y t+p ] has derivative f y t |xt+1 (⋅) which is uniformly integrable on U and nonzero with probability one. Then, where Σ: The above result can be further simplified into Corollary 2 by adding the following assumption: Corollary 2. Under assumptions (A1), (A2), (A3) and (A4), where As can be seen, QCAR( p) and QNCAR( p) generalize the classical purely causal and purely noncausal models respectively by allowing random coefficients on lag or lead regressors over time. Corollary 2 provides additional results when the same coefficients except the intercept are used to generate each quantile. However, A. Hecq and L. Sun: Quantile noncausal autoregressions the moment requirement in (A1) is very strict for heavy tailed time series. In order to study noncausality by QAR in heavy tailed distributions, we have to show its applicability when weakening the assumption (A1). This goal is achieved by Theorem 3 which presents the asymptotic behaviour of the QAR estimator for a classical purely noncausal model. Similarly, the asymptotic in a classical purely causal model follows right after reversing time.

Theorem 3. (Asymptotics in regularly varying distributed innovations).
Under Assumption (A4), a purely noncausal AR(p) of the following form where ϕ(L −1 ) = 1−ϕ 1 L −1 −…−ϕ p L −p , also satisfies the following assumptions: where L(x) is slowly varying at ∞and 0 < α < 2. There is a sequence {a T } satisfying The roots of the polynomial ϕ(z)are greater than one, such that y t can be written into where where ϕ τ ≔ F −1 (τ) aT , ϕ 1 , …, ϕ p , Ω: = [ω ik ] being a p×p matrix with entry ω ik ≔ ∑ ∞ j 0 c j c j+|k−i| at the ith row and the kth column, {S α (s)}being a process of stable distributions with index α which are independent of Brownian motions {W(s)}. In this theorem the intercept regressor in QNCAR(p) is changed to a T such that x′t: = [a T , y t , y t+1 , … , y t+p−1 ].

Proof. See the appendix
Heuristically, next we restrict our focus on the classical models and explore consequences of causality misspecification in quantile regressions. Figure 2 displays a simulated sample following this data generating process (DGP). Figure 3 is the SRAR(τ) of each candidate model along quantiles, indicating their goodness of fit. The two SRAR curves almost overlap at every quantile, which implies no discrimination between QCAR and QNCAR in Gaussian innovations, in line with results in the OLS case. The Gaussian distribution is indeed time reversible, weak and strict stationary. Its first two moments characterize the whole distribution and consequently every quantile. Note that we obtain similar results for a stationary noncausal AR(p) process with i.i.d. Gaussian {ε t }. The results are not reported to save space.

Causal and noncausal models with Student's t distributed innovations
Things become different if we depart from Gaussianity. Suppose now a causal AR(1) process y t = α + βy t−1 + ε t with again [α, β] = [1, 0.5] but where {ε t } are i.i.d. Student's t−distributed with 2 degrees of freedom (hereafter using shorthand notation: t(2)). Figure 4 depicts a simulation in this AR(1) with T = 200. Applying QCAR and QNCAR respectively on this series results in the SRAR curves displayed in Figure 5. The distance between the two curves is obvious compared to the Gaussian case, favouring the causal specification at almost all quantiles. Figure 6 is the SRAR plot of a purely noncausal process with i.i.d. Cauchy innovations. The noncausal specification is preferred in the SRAR comparison.  It seems now that applying the SRAR comparison at one quantile, such as the median, is sufficient for model identification, but it is not true in general. In Section 4, we will spot an identification issue in the SRAR plots, the true model even having higher SRAR values at certain quantiles than the misspecified model. So far we have applied QCAR and its counterpart QNCAR on the classical purely causal or noncausal models with symmetrically i.i.d. innovations. Within this restricted scope, the conditional mean models of those data generating processes only differ from their conditional quantile models in the intercept term. And we see that model selection by the SRAR comparison gives uniform decisions along quantiles. However, such a model selection is not always that clear in practise. For example, in the empirical study later, we will encounter an identification issue which can be seen by checking the SRAR plots. In the next section, we will present this   identification issue with some possible reasons, and propose a robust model selection criterion called the aggregate SRAR to cope with this issue.

SRAR as a model selection criterion
It is natural to think about SRAR as a model selection criterion since a lower SRAR means a better goodness of fit in quantile regressions. However, SRAR is a function of quantile, which raises a question on which quantile to be considered for model selection. It is empirically common to see an identification problem by checking the SRAR plots, which gives different model selections at certain quantiles and makes a selection unreliable if only one quantile is considered. In this section, we discuss this issue and propose a more robust model selection criterion based on aggregating SRARs.

Identification issue spotted from the SRAR plots
First let us see some possible model settings causing the identification problem in SRAR plots. The first case is linked to the existence of multi-regimes in coefficients.
Suppose a regime-shift model is specified as follows: where {ε t } is an i.i.d. innovation process with cumulative probability function F(⋅), and β t is defined as follows: with τ * ∈ (0, 1) and β 1 < β 2 . In essence, the regime shift of β t depends on the quantile occurrence of ε t which is indexed by τ t : = F(ε t ) with {τ t } being i.i.d. in the standard uniform distribution. If { y t } can be negative, then there is a problem in using QNCAR to recover the coefficients in the underlying model (19) because the τ-th regime is not necessary to produce the τ-th conditional quantile of y t . So the comonotonicity condition of the linear quantile regressions (19) is not satisfied. However, by restricting to the non-negative region of the covariate y t+1 (also see Fan and Fan 2010) we force the regression model to satisfy the comonotonicity requirement without losing its association with {τ t }. The obtained estimator is also  A. Hecq and L. Sun: Quantile noncausal autoregressions consistent to the true coefficients in (19). 5 QCAR (or QNCAR) with such a restriction, hereafter denoted as RQCAR (or RQNCAR) shorthand for restricted quantile causal autoregression (or restricted quantile noncausal autoregression), is formulated as follows: where I is the set restricting the quantile regression on a particular information set. The restriction is usually imposed in order for quantile regressions to meet the comonotonicity condition. To study the regime-shift model (19), we restrict the QNCAR on non-negative covariates, i.e. I t : x t ≥ 0 { } . Figure 8 shows four SRAR curves estimated from QCAR, QNCAR, RQCAR and RQNCAR. We consider a time series {y t } 600 t 1 simulated from the model (19) with τ * = 0.7, β 1 = 0.2, β 2 = 0.8 and i.i.d. innovation process in t (3), Figure 8 illustrates such an identification issue in which the SRAR curve from a true model is not always lower than one from misspecification. Applying restriction helps to enlarge the SRAR difference between a true model and a misspecified time direction.
The second case we investigate is the presence of skewed distributed innovations. Let us consider a time series {y t } following a purely noncausal AR(1): y t = 0.8y t+1 + ε t with {ε t } i.i.d. in a demeaned skewed t distribution with skewing parameter γ = 2 and v = 3 degrees of freedom (hereafter t(v, γ) is the shorthand notation for a skewed t-distribution). The probability density function of t(v, γ) (see Francq and Zakoïan 2007) is defined as where f t (⋅) is the probability density function of the symmetric t(v) distribution. Figure 9 shows four SRAR curves obtained from the estimation of the QCAR, the QNCAR, the RQCAR and the RQNCAR respectively. The 5 An alternative is to use simultaneous linear quantile regressions (see Tokdar and Kadane 2012; Liu 2019) which assumes a base quantile function for all coefficients in a (reparametrized) regression model. The dependence structure of these coefficients and priors on parameters in the base quantile function are further assumed for simulation, model fitting and posterior distribution summary. We do not include this method here as this paper aims to avoid assumptions on parametrizing underlying conditional distributions in the model setting. Additionally, regarding the data in our empirical study, very few negative points are observed. So the concern on losing data by such a restriction can be addressed.
curves from the QNCAR and the RQNCAR almost overlap each other, which confirms our understanding that the monotonicity requirement is met in the true model. The estimations and the corresponding SRAR curves should be the same unless many observations are omitted by the restriction. On the other hand, the SRAR curve gets enlarged from the QCAR to the RQCAR, which is very reasonable as the feasible set in the QCAR is larger and the misspecification is not ensured to satisfy the monotonicity requirement. Again we see this identification problem from the SRAR plot. Remarkably, the SRAR curve from a true model can be higher at certain quantiles than the one from a misspecified model. Consequently the SRAR comparison relying only on particular quantiles, such as the least absolute deviation (LAD) method for the median only is not robust in general. Therefore, we propose a new model selection criterion in next subsection by including the information over all quantiles.

The aggregate SRAR criterion
Based on the same number of explanatory variables in QCAR and QNCAR with a fixed sample size in quantile regressions, the best model is supposed to exhibit the highest goodness of fit among candidate models.
Similarly to the R-squared criterion in the OLS, when turning to quantile regressions, we are led to use the SRAR criterion for model selection. The aggregate SRAR is regarded as an overall performance of a candidate model over all quantiles such as: There are many ways to calculate this integral. One way is to approximate the integral by the trapezoidal rule.
Another way is to sum up SRARs over a fine enough quantile grid with equal weights. In other words, this aggregation is regarded as an average of performances (SRAR(τ), τ∈(0, 1)) of a candidate model. In practise, there is almost no difference in model selection between the two aggregation methods.
As equal weights are used on all quantiles in the aggregate SRAR above, people may argue to use a different weighting scheme. The weighting scheme indeed can be different as when weight being one for one quantile and zero for others is to select a conditional quantile model. We agree that the weighting scheme can be customized in justice of users' purpose. The equal-weight scheme proposed here is inspired to calculate the area under the SRAR curve over quantiles when we check the SRAR plot. Intuitively, such areas are directly linked to model performance when we concerns the whole dynamics of the underlying process. And we compare models by viewing the gap between their SRAR curves, which is the difference between the areas under their SRAR curves. This leads us to the aggregate SRAR measure.
Performances of the SRAR model selection criteria in Monte Carlo simulations are reported in Table 1. It shows the average frequencies with which we find the correct model based on the SRAR criterion per quantile and the aggregate SRAR criterion. The sample size T is 200 and each reported number is based on 2000 Monte Carlo simulations. Columns of Table 1 refer to as a particular distribution previously illustrated in this paper. As observed, the aggregate SRAR criterion performs very well even in situations with the identification issue. The Gaussian distribution being weakly and strictly stationary we cannot obviously discriminate between causal and noncausal specifications leading to an average frequency of around 50% to detect the correct model.

Shape of SRAR curves
By observing SRAR plots, we see that SRAR curves vary when the underling distribution varies. It is interesting to investigate the reasons. In this subsection, we will provide some insights on the slope and concavity of SRAR y t (τ, θ(τ)) curves under assumptions (A1), (A2), (A3) and (A4). Since ρ τ (y t −x t ' θ) is a continuous function in θ ∈ R (p+1) , by the continuous mapping theorem and τ, θ(τ)), we know that We also know that Therefore instead of directly deriving the shape of a SRAR y t (τ, θ(τ)) curve, we look at the properties of its intrinsic curve SRAR εt (τ, F −1 (τ)). We derive the first and second order derivatives of SRAR εt (τ, F −1 (τ)) with respect to τ in order to determine the shape of SRAR y t (τ, θ(τ)).

The slope property
One major difference between SRAR curves in a plot is their slopes. We can compute the first-order derivative of SRAR with respect to τ if the derivative exists. Under the following assumption: (A7) : The inverse distribution function F −1 (⋅) of innovation ε t is continuous and differentiable on (0, 1) to the second order; we can then take the first-order derivative of SRAR εt (τ, F −1 (τ))with respect to τ.
To manifest this result with a fixed T, we take expectation and obtain which can be regarded as the underlying guideline for the slope of a SRAR curve. Before interpreting this result, let us derive the second-order derivative of SRAR εt (τ, F −1 (τ)) with respect to τ and make an interpretation together.
Divide the above second order central difference by Δτ 2 , and take the limit Δτ↓0. It gives us the last line of which is obtained similarly to (25). To interpret this result, we take expectation and get the following: where the inequality holds with probability one since f(ε) > 0with probability one in the assumption (A1). Now we have the expectation of which can be regarded as the underlying guideline for the concavity of a SRAR curve. Together with the slope information, it implies that SRAR curves are always in arch shapes, going upward and then downward, with a peak point at E[ε t ] F −1 (τ). We can also know the skewness of ε t from the location of the peak point: ε t is left-skewed when the SRAR curve reaches its peak in the region τ < 0.5, or right-skewed when the peak in τ > 0.5. If ε t is symmetrically distributed, its SRAR curve is symmetric, and vice versa.
Plotting SRAR is a way to present the goodness of fit in quantile regressions for each candidate model. Quantile regressions are the path to get residuals for SRAR calculation. As we know and provide unbiased consistent estimation for true models. To study the estimation in misspecification we adopt the concept of binding function (Dhaene, Gourieroux, and Scaillet 1998). Binding function is defined as a mapping from coefficients in the true model to pseudo-true coefficients in a misspecified model.
The estimator of a pseudo-true coefficient in quantile regression for a misspecified QCAR(p) or QNCAR(p) converges to a limiting value which is characterized into the binding function. It is difficult to derive the binding functions explicitly in a general case so that they are studied by means of simulations (see Gouriéroux and Jasiak 2017). Suppose a noncausal AR(1): y t =π 1 y t+1 + ε t , with {ε t } i.i.d. t(ν) for v = 1, 3, 5 and 10. It is observed that the binding function in the misspecified QCAR(1) varies with two factors: (i) the distribution of ε t and (ii) the distance function in regression which is the check function ρ τ (⋅) in quantile regression. Figure 10, Figure 11 and Figure 12 illustrate the effect of those factors. Each point is an average value of estimates based on 1000   simulations and 600 observations. Since t(ν) is symmetric, the estimation results are in the same pattern for negative true coefficient region and (1−τ) th-quantile regression as in these three figures. Sometimes the binding function is not injective, which is evidenced in Figure 10 and Figure 11 for small absolute true coefficients. The non-injectivity of the binding function for Cauchy distributed innovations is also illustrated in Gouriéroux and Jasiak (2017) result that disables encompassing tests. On the other hand, we see that on Figure 12 the injectivity of binding functions seems recovered at τ = 10%. In the case of Cauchy distributed innovations, there are no binding functions from extreme quantile regressions like 0.1 th-or 0.9 th-quantile regression because the estimate is not convergent. Although a value for π 1 ∈(0,1) is plotted in Figure 12, it is just the average of binding function estimates for π 1 for illustration.

The model specification
The motivation of our empirical analysis comes from the rational expectation (RE) hyperinflation model originally proposed by Cagan (1956) and investigated by several authors (see e.g. Adam and Szafarz 1992;Broze and Szafarz 1985). We follow Broze and Szafarz (1985) notations with In (33), m t d and p t respectively denote the logarithms of money demand and price, x t is the disturbance term summarizing the impact of exogenous factors. E(p t+1 |I t ) is the rational expectation, when it is equal to conditional expectation, of p t+1 at time t based on the information set I t . Assuming that the money supply m t s = z t is exogenous, the equilibrium m t d = m t s provides the following equation for prices ϕ E p t+1 I t + u t . Broze and Szafarz (1985) show that a forward-looking recursive solution of this model exists when x t is stationary and ϕ < 1. The deviation from that solution is called the bubble B t with p t ∑ ∞ i 0 ϕ i E(u t+i |I t )] + B t . Finding conditions under which this process has rational expectation equilibria (forward and or backward looking) is out of the scope of our paper. We only use this framework to illustrate the interest of economists for models with leads components. Under a perfect foresight scheme E(p t+1 |I t ) = p t+1 we obtain the purely noncausal model withε t u t . In the more general setting, for instance when E(p t+1 |I t ) = p t+1 + v t with v t a martingale difference, the new disturbance term isε t v t + u t . Empirically, a specification with one lead only might be too restrictive to capture the underlying dynamics of the observed variables. We consequently depart from the theoretical model proposed above and we consider empirical specifications with more leads or lags. Luoto (2013, 2017) and Telg (2017a, 2017b) in the context of the new hybrid Keynesian Phillips curve assume for instance thatε t is a MAR(r−1, s−1) process such as where ε t is iid and c an intercept term. Inserting (35) in (34) we observe that ifε t is a purely noncausal model (i.e. a MAR(0, s−1) with ρ(L) = 1) we obtain a noncausal MAR(0, s) motion for prices 1 − ϕL −1 p t π L −1 −1 (c + ε t ), We would obtain a mixed causal and noncausal model if ρ(L) ≠ 1. Our guess that the same specification might in some circumstances empirically (although not mathematically as the lag polynomial does not annihilate the lead polynomial) gives rise to a purely causal model in small samples when the autoregressive part dominates the lead component. The above illustration presents a context of pure causal and noncausal models so that we can apply our approach to give an empirical analysis. It would be interesting to extend our modelling to investigate theoretical models with both forward and backward behaviours such as backward-and forward-looking Taylor rule for instance. To do that however we have to introduce additional regressors and extend the approach of Hecq, Issler, and Telg (2020) to quantile regressions, which can be further investigated by future research and is out of the scope of this paper.

The data and unit root testing
We consider seasonally unadjusted quarterly Consumer Price Index (CPI) series for four Latin American countries: Brazil, Mexico, Costa Rica and Chile. Monthly raw price series are downloaded at the OECD database for the largest span available (in September 2018). Despite the fact that quarterly data are directly available at OECD, we do not consider those series as they are computed from the unweighted average over three months of the corresponding quarters. Hence, these data are constructed using a linear filter, leading to undesirable properties for the detection of mixed causal and noncausal models (see Hecq, Telg, and Lieb 2017a, b on this specific issue). As a consequence, we use quarterly data computed by point-in-time sampling from monthly variables. The first observation is 1969Q1 for Mexico, 1970Q1 for Chile, 1976Q1 for Costa Rica and 1979Q4 for Brazil. Our last observation is 2018Q2 for every series. We do not use monthly data in this paper as monthly inflation series required a very large number of lags to capture their dynamic feature. Moreover, the detection of seasonal unit roots in the level of monthly price series was quite difficult.
Applying seasonal unit root tests (here HEGY tests, see Hylleberg et al. 1990) with a constant, a linear trend and deterministic seasonal dummies, we reject (see Table 2 in which a * denotes a rejection of the null unit root hypothesis at a specific frequency corresponding to 5% significance level) the null of seasonal unit roots in each series whereas we do not reject the null of a unit root at the zero frequency. The implementation of the unit root tests here is concerned with conditional mean models of the raw data to ensure that we process the data and use its weakly stationary time series in quantile regressions for analysis. The unit root testing can also been done per quantile (Koenker and Xiao 2004) to relate short-term explosiveness of time series to unit-root quantile models, which is an interesting perspective to treat explosive time series and alternative to causal and noncausal modelling. We do not go deeper in the unit-root direction for this paper but with its outlook on future research.
The number of lags of the dependent variable used to whiten for the presence of autocorrelation is chosen by AIC. From these results we compute quarterly inflation rates for the four countries in annualized rate, i.e. Δ ln P t i × 400.Next we carry out a regression of Δ ln P t i × 400 on seasonal dummies to capture the potential presence of deterministic seasonality. The null of no deterministic seasonality is not rejected for the four series. Figure 13 displays quarterly inflation rates and it illustrates the huge inflation episodes that the countries had faced. Among the four inflation rates, Brazil and Mexico show the typical pattern closer to the intuitive notion of what a speculative bubble is, namely a rapid increase of the series until a maximum value is reached before the bubble bursts. Table 3 reports for each quarterly inflation rates the autoregressive model obtained using the Hannan-Quinn information criterion. Given our results on the binding function (see also Gouriéroux and Jasiak 2017) it is safer to determine the pseudo-true autoregressive lag length using such an OLS approach than using quantile regressions or using maximum likelihood method. Indeed there is the risk that a regression in direct time from a noncausal DGP provides an underestimation of the lag order for some distributions (e.g. the Cauchy) and some values of the parameters.

Empirical findings and identification of noncausal models
Estimating autoregressive univariate models gives the lag length range from p = 1 for Brazil to p = 7 for the Chilean inflation rate. The p − values of the Breush-Pagan LM test (see column labelled LM[1−2]) for the null of no-autocorrelation after having included those lags show that we do not reject the null in every four cases. On the other hand, we reject the null of normality (Jarque-Bera test) in the disturbances of each series. We should consequently be able to identify causal from noncausal models. From columns skew. and kurt. it emerges that the residuals are skewed to the left for Brazil and Mexico and skewed to the right for Chile and Costa Rica. Heavy tails are present in each series. At a 5% significance level we reject the null of no ARCH (see column ARCH[1−2]) for Brazil and Mexico. Gouriéroux and Zakoian (2017) have derived the closed form conditional moments of a misspecified causal model obtained from a purely noncausal process with alpha stable disturbances. They show that the conditional mean (in direct time) is a random walk with a time varying conditional variance in the Cauchy case. This result would maybe favour the presence of a purely noncausal specification for Brazil and Mexico as the null of no ARCH is rejected. But this assertion must be carefully evaluated and tested, for instance using our comparison of quantile autoregressions in direct and reverse time. The results by the Q(N)CAR are reported in Table 4, and the RQ(N)CAR produces the same results. Each cell of Table 4 provides the selection frequency of MAR(0, p) or MAR(p, 0) identified by the SRAR at quantiles 0.1, 0.3, 0.5, 0.7, 0.9 as well as the aggregated SRAR. Figure 14 displays the SRAR curves from 0.05th-quantile to 0.95thquantile by the Q(N)CAR for the four economies respectively, similarly to the ones by the RQ(N)CAR with restriction on non-negative regressors. As observed, the identification problem is raised in the SRAR plots. Especially in the SRAR plot for Brazil, it is hard to trust a model from evidence at single quantiles. However, the aggregate SRAR criterion comes to help in this situation from an overall perspective. We conclude that Brazil, Mexico and Costa Rica are better characterized as being purely noncausal while Chile being purely causal according to the aggregate SRAR criterion.

Conclusions
This paper introduces a new way to select between causal and noncausal models by comparing residuals from quantile autoregressions developed by Koenker and Xiao (2006) and from the time-reverse specifications. To adapt to heavy tailed distributions, we generalize the quantile autoregression theory for regularly varying A. Hecq and L. Sun: Quantile noncausal autoregressions distributions. This also confirms the validity of quantile autoregressions in analysing heavy tailed time series, such as explosive or bubble-type dynamics. It is natural to consider SRAR as a model selection criterion in the quantile regression framework. However due to the identification problem spotted in the SRAR plots as presented in this paper, we propose to use the aggregate SRAR criterion for model selection. The robustness in its performance has been seen from all the results in this paper. It is worth mentioning that when coefficients are constant in the underlying model with a symmetrically i.i.d. error term, the aggregate SRAR criterion is equivalently to select between forward and backward conditional mean models (termed by Gourieroux and Zakoian (2017)). However, the aggregate SRAR is a measure based on the whole dynamics of the underlying process, which is not dominated by the conditional mean information any more. This characteristic of the aggregate SRAR criterion indeed makes it robust in model selection even for some general situations such as with asymmetric distributed innovations. In the empirical study on the inflation rates of four Latin American countries, we found that the purely noncausal specification is favoured in three cases.
Finally some possible extensions of our approach can be to the identification of mixed models in addition to purely causal and noncausal specifications, to enhancing QCAR and QNCAR with some explanatory variables in order to investigate the Taylor (1993) rule, and to investigating the unit-root testing per quantile for QCAR as well as QNCAR. Also, a formal testing on SRAR differences would require the application of a bootstrap approach which is beyond the scope of this paper but in our outlook for the future research.
The limiting distribution of Z (1) T (ν) can also be deduced in using (40) as follows.