BY-NC 4.0 license Open Access Published by De Gruyter December 17, 2019

Penalized Averaging of Parametric and Non-Parametric Quantile Forecasts

Jan G. De Gooijer and Dawit Zerom


We propose a hybrid penalized averaging for combining parametric and non-parametric quantile forecasts when faced with a large number of predictors. This approach goes beyond the usual practice of combining conditional mean forecasts from parametric time series models with only a few predictors. The hybrid methodology adopts the adaptive LASSO regularization to simultaneously reduce predictor dimension and obtain quantile forecasts. Several recent empirical studies have considered a large set of macroeconomic predictors and technical indicators with the goal of forecasting the S&P 500 equity risk premium. To illustrate the merit of the proposed approach, we extend the mean-based equity premium forecasting into the conditional quantile context. The application offers three main findings. First, combining parametric and non-parametric approaches adds quantile forecast accuracy over and above the constituent methods. Second, a handful of macroeconomic predictors are found to have systematic forecasting power. Third, different predictors are identified as important when considering lower, central and upper quantiles of the equity premium distribution.

1 Introduction

The increasing availability of a large database of time series variables across diverse disciplines of business and economics has motivated a new strand of forecasting research catered to data rich environments. For example, Ma, Fildes, and Huang (2016) use variable selection and model estimation by a multistage LASSO regression, followed by a scheme to generate out-of-sample forecasts in the field of inventory management. Jiang et al. (2018) obtain forecasts of a large set of Australian macroeconomic predictors using a wide variety of dimension reduction methods. Another forecasting study in high-dimensions is by Exterkate et al. (2011) who consider least angle regression, a shrinkage and variable selection method proposed by Efron et al. (2004), and multi-response sparse regression (Similä and Tikka 2006). The above examples are far from exhaustive, and the literature is growing rapidly.

Mimicking a similar trend in non-forecasting studies, most high-dimensional forecasting research focuses mainly on conditional mean forecasts (Garcia, Medeiros, and Vasconcelos 2017; Konzen and Ziegelmann 2016). Further, the modeling framework of existing high-dimensional studies is often based on the assumption that high-dimensional data come from a linear data generating process (DGP); see, e.g., Bonaccolto, Caporin, and Paterlini (2018) and Bayer (2018). In an attempt to provide modeling flexibility, a few studies have introduced extensions. For instance, Bai and Ng (2008) and Giovannelli (2012) introduce nonlinear principal component methods within the conditional mean context. Recently, deviating from parametric modeling approaches, Chen et al. (2018) introduce non-parametric based high-dimensional averaging.

With the goal of offering yet another modeling option in the presence of high-dimensional predictors, we consider the case of conditional quantile forecasts. In particular, we further extend the non-parametric averaging method of De Gooijer and Zerom (2019). Similar to that of Chen et al. (2018), the approach of De Gooijer and Zerom (2019) rests on the idea of combining or averaging a large number of possibly misspecified non-parametric quantile forecasts to form an approximation of a target conditional quantile. In addition to incorporating possible nonlinearities in the effect of exogenous predictors, the non-parametric approach has robustness to distributional assumptions (such as normality) of the innovation process in fully specified parametric models.

Given our focus on quantile forecasting, it is also of practical interest to explore if the non-parametric averaging approach can be complemented by parametric models. To motivate such a hybrid approach, consider the case of conditional quantile estimation of financial data such as stock returns. In financial applications, conditional quantiles play an essential role in risk assessment. For example, evaluation of Value-at-Risk is a conditional quantile estimation problem. Although the literature on estimating conditional quantiles is large, the overwhelming majority of the approaches rely on the assumption that returns follow a fully specified conditional distribution (such as normality or t). As a result, the estimation of conditional quantiles is equivalent to estimating the conditional variance (volatility) of returns. The massive literature on GARCH models (volatility modeling) reflects the popularity of such an approach. In addition to their computational convenience, GARCH-type models have proven highly successful in modeling financial data. For instance, one key practical appeal of the GARCH family of models is that they can parsimoniously capture the persistent influence of long past shocks. In contrast, the averaging method of De Gooijer and Zerom (2019) only allows dependence up to certain autoregressive lags. In other words, the non-parametric method considers a truncated quantile auto-regression (possibly nonlinear) approximation of persistence. If GARCH-type behavior (i.e. infinite ARCH) is indeed an important feature, the averaging method will not completely capture it even with a large autoregressive term.

An important implication of the foregoing discussion is that non-parametric approaches do not necessarily include all parametric models as special cases. Thus, the role of non-parametric and parametric approaches can be complementary. Motivated by this observation, we propose augmenting the non-parametric approach by parametric models. To illustrate the merit of the proposed hybrid quantile averaging approach, we apply it to quantile forecasting of the S&P 500 equity risk premium using a large data set of predictors involving both macroeconomic variables and technical indicators. Our findings show that the hybrid averaging method provides more accurate quantile forecasts than several benchmark methods including the non-parametric averaging method of De Gooijer and Zerom (2019). These findings hold for the full out-of-sample forecasting period as well as for periods of expansion and contraction of the U.S. economy.

The remainder of the paper is organized as follows. Section 2 provides details of the proposed hybrid averaging method including various methodological and numerical issues of the penalization methods. Section 3 describes the empirical data and forecasting procedure. Then, in Section 4, we conduct an extensive out-of-sample forecasting experiment and compare the proposed hybrid quantile averaging method with six, non-hybrid, alternative models/methods. Finally, Section 5 concludes.

2 Methodology

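We consider the set of observations $\{Y_t\}_{t=1}^{N}$ obtained from a strictly stationary time series process $\{Y_t, t \in \mathbb{Z}\}$ that depends on $q_y \ge 1$ past values of $Y_t$, and on a $q_z$-dimensional vector $\mathbf{Z}_t = (Z_{1,t}, \ldots, Z_{q_z,t})'$ that consists of exogenous, possibly lagged, stationary time series. Let $\mathbf{X}_t = (\mathbf{Y}_{t-1}', \mathbf{Z}_t')' \in \mathbb{R}^{q}$ where $\mathbf{Y}_{t-1} = (Y_{t-1}, \ldots, Y_{t-q_y})'$, and $q = q_y + q_z$. Given the observed data set $\{(\mathbf{X}_t, Y_t)\}_{t=1}^{N}$, our goal is to obtain the $h$-step ($h \ge 1$) out-of-sample $\tau$th ($0 < \tau < 1$) conditional quantile of $Y_{N+h}$ given $\mathbf{X}_t = \mathbf{X}_N$, which we denote by $Q_{Y_{N+h}}(\tau \mid \mathbf{X}_N)$. To this end, we define the associated process $\{(\mathbf{X}_t, Y_t)\} \in \mathbb{R}^{q} \times \mathbb{R}$ where the components of the predictor vector $\mathbf{X}_t = (X_{1,t}, \ldots, X_{q,t})'$ are given by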




With this re-formulation, we obtain $Q_{Y_{N+h}}(\tau \mid \mathbf{X}_N)$ directly via


where $F_{Y_t}(\cdot \mid \mathbf{x})$ is the conditional distribution of $Y_t$ given $\mathbf{X}_t = \mathbf{x}$.

However, even for moderate dimension $q$, non-parametrically estimating $Q_{Y_t}(\tau \mid \mathbf{X}_{N-q_y+1})$ is very challenging due to the curse of dimensionality. Therefore, extending Chen et al. (2018) into the conditional quantile context, De Gooijer and Zerom (2019) suggested approximating $Q_{Y_t}(\tau \mid \mathbf{x})$ by a linear combination (averaging) of marginal conditional quantiles $\theta_j(\tau \mid x_j) = \inf\{w : F_{Y_t}(w \mid X_{j,t} = x_j) \ge \tau\}$, i.e.


where $\hat{\theta}_{0,j}(\tau \mid x_j)$ is some nonparametric estimate of $\theta_j(\tau \mid x_j)$ and $\gamma_{0,j}(\tau)$ are the weights depending on the quantile level $\tau$. The key idea of this approach is that the non-parametrically estimated marginal conditional quantile functions carry information about the target $Q_{Y_t}(\tau \mid \mathbf{x})$. At the same time, all marginal functions are unlikely to be equally informative. The averaging will assign higher weights to better (more informative) candidate functions. The conditional quantile approximation (2) is non-parametric because we do not impose a parametric structure on the candidate marginal quantile functions $\theta_j(\tau \mid x_j)$.
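The averaging idea can be illustrated with a minimal sketch. It assumes a Gaussian-kernel-weighted empirical distribution as the "some nonparametric estimate" of each marginal quantile; the text does not specify the estimator, so the kernel, the bandwidth, the toy data and the function names `kernel_weighted_quantile` and `averaged_quantile` are all hypothetical choices made for illustration.

```python
import math


def kernel_weighted_quantile(x_obs, y_obs, x0, tau, bandwidth):
    """Estimate the tau-th quantile of Y given one predictor X_j = x0.

    Inverts a Gaussian-kernel-weighted empirical distribution function,
    i.e. returns the smallest observed w with F_hat(w | x0) >= tau.
    The kernel and bandwidth are illustrative assumptions.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in x_obs]
    total = sum(weights)
    cumulative = 0.0
    for y, w in sorted(zip(y_obs, weights)):
        cumulative += w / total
        if cumulative >= tau:
            return y
    return max(y_obs)  # guard against floating-point round-off


def averaged_quantile(marginal_quantiles, gammas):
    """Linear combination of marginal quantile estimates with weights gamma_j."""
    return sum(g * theta for g, theta in zip(gammas, marginal_quantiles))


# Toy data: one response and two predictors over six periods (illustrative only).
y = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
x1 = [0.1, 0.2, 0.9, 0.3, 0.8, 0.2]
x2 = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]

# One marginal ("pilot") quantile estimate per predictor, then their average.
theta_hat = [
    kernel_weighted_quantile(x1, y, x0=0.25, tau=0.5, bandwidth=0.2),
    kernel_weighted_quantile(x2, y, x0=1.0, tau=0.5, bandwidth=0.2),
]
q_tilde = averaged_quantile(theta_hat, gammas=[0.7, 0.3])
```

Each marginal estimate conditions on a single predictor only, which is what sidesteps the curse of dimensionality; the weights, fixed by hand here, are what the penalized estimation step is meant to determine.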

2.1 A Hybrid Approach

We assume that $m$ candidate parametric models are available with the corresponding estimated marginal quantile functions $\{\hat{\theta}_{i,1}(\tau \mid x_{1,t}), \ldots, \hat{\theta}_{i,q^*}(\tau \mid x_{q^*,t})\}$ ($i = 1, \ldots, m$). For each parametric model $i$, we define a hybrid extension of $\tilde{Q}_{Y_t}(\tau \mid \mathbf{x})$ by


where $\{\gamma_{0,j}(\tau)\}_{j=1}^{q}$ and $\{\gamma_{i,j}(\tau)\}_{j=1}^{q^*}$ are sets of weights depending on the quantile level $\tau$, which we summarize by the set $\{\gamma^{(H)}_{i,u}(\tau)\}_{u=1}^{q+q^*}$ ($i = 1, \ldots, m$). In the equity premium quantile forecasting application (see Section 3), $q^* = q_z$. Let the $i$th design matrix $X_t$ consist of the set of $q + q^*$ potential predictors:


its $u$th component will be denoted by $\hat{\theta}^{(H)}_{i,u}(\tau \mid X_{u,t})$ ($u = 1, \ldots, q + q^*$). So for the $i$th model the $h$-step ahead hybrid quantile forecast of $Y_{N+h}$ is given by


where $\hat{\tilde{M}}^{(H)}_{i,\tau}(h)$ is an estimate of the set of selected nonzero quantile predictors $\tilde{M}^{(H)}_{i,\tau}(h) = \{u : \theta_{i,u}(\tau) \neq 0\}$ associated with $\tilde{Q}^{(H)}_{Y_t,i}(\tau \mid \mathbf{X}_{N-q_y+1})$, and $\hat{\gamma}^{(H)}_{i,u}(\tau)$ are the corresponding hybrid quantile regression estimates, or weights. The key idea is to use a set of marginal conditional quantiles as “pilot” forecasts of $Y_{t+h}$ and later discard irrelevant predictors via appropriate penalization.

2.2 Obtaining $\hat{\tilde{M}}^{(H)}_{i,\tau}(h)$

Let $\hat{\boldsymbol{\theta}}^{(H)}_{i}(\tau \mid \mathbf{X}_t) = (\hat{\theta}^{(H)}_{i,1}(\tau \mid X_{1,t}), \ldots, \hat{\theta}^{(H)}_{i,q+q^*}(\tau \mid X_{q+q^*,t}))'$ denote a $(q + q^*) \times 1$ vector of estimates of the combined marginal quantiles at quantile level $\tau$ using the $i$th parametric model. These estimates and the approximation $\tilde{Q}_{Y_t,i}(\tau \mid \mathbf{x})$ jointly serve as a basis to obtain sparse penalized quantile regression estimators of the $(q + q^*)$-dimensional vector of hybrid prediction weights $\boldsymbol{\gamma}^{(H)}_{i}(\tau) = (\gamma^{(H)}_{i,1}(\tau), \ldots, \gamma^{(H)}_{i,q+q^*}(\tau))'$ ($i = 1, \ldots, m$). To this end, we consider the following weighted $L_1$-penalized quantile estimator


where $\rho_\tau(z) = \{\tau - I(z < 0)\}z$ is the quantile check function, $I(\cdot)$ the indicator function, $\lambda_n > 0$ is a tuning parameter, depending on $n$, $\mathbf{w}_i = (w_{i,1}, \ldots, w_{i,q+q^*})'$ is a vector of nonnegative weights, and $\|\mathbf{w}_i \circ \boldsymbol{\gamma}_i(\tau)\|_1$ is the weighted $L_1$ penalty term, with $\circ$ denoting the Hadamard (or direct) product of two vectors. Given the minimizer of eq. (5), we define the set of relevant predictors at quantile level $\tau$, forecast horizon $h$, and using parametric model $i$, by


where $\hat{\gamma}^{(H)}_{i,u}(\tau)$ ($u = 1, \ldots, q + q^*$) is an estimate of the $u$th component of $\boldsymbol{\gamma}^{(H)}_{i}(\tau)$.
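As a rough illustration of the ingredients described in this subsection, the sketch below evaluates a weighted $L_1$-penalized check-loss objective for a given weight vector and reads off the set of nonzero weights. It only evaluates the objective and does not minimize it. The $1/n$ scaling of the loss term, the tolerance used to decide "nonzero", and all function names are assumptions made for this example.

```python
def check_loss(z, tau):
    """Quantile check function rho_tau(z) = {tau - I(z < 0)} * z."""
    return (tau - (1.0 if z < 0 else 0.0)) * z


def penalized_objective(y, theta, gamma, w, tau, lam):
    """Average check loss of the combined forecast plus a weighted L1 penalty.

    y     : list of n responses
    theta : n rows, each holding the pilot quantile forecasts for one period
    gamma : combination weights, one per pilot forecast
    w     : nonnegative penalty weights, same length as gamma
    The 1/n scaling of the loss term is an assumption of this sketch.
    """
    n = len(y)
    fit = sum(
        check_loss(y_t - sum(g * th for g, th in zip(gamma, row)), tau)
        for y_t, row in zip(y, theta)
    ) / n
    penalty = lam * sum(w_u * abs(g) for w_u, g in zip(w, gamma))
    return fit + penalty


def selected_set(gamma_hat, tol=1e-8):
    """Indices (1-based, as in the text) of the nonzero estimated weights."""
    return [u for u, g in enumerate(gamma_hat, start=1) if abs(g) > tol]


# Toy evaluation: two periods, one pilot forecast, LASSO-type weights w = 1.
value = penalized_objective(
    y=[1.0, -1.0], theta=[[1.0], [0.0]], gamma=[1.0], w=[1.0], tau=0.5, lam=0.1
)
relevant = selected_set([0.0, 0.3, 0.0, -0.2])
```

In practice the objective would be handed to a dedicated solver; the point here is only how the check loss, the penalty weights and the selected set fit together.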

2.3 Choice of Penalty

When $\mathbf{w} = \mathbf{1}_{q+q^*}$, a vector of ones, the last term of eq. (5) becomes the LASSO (least absolute shrinkage and selection operator) penalty function $p_{\lambda_n}(|\gamma^{(H)}_{i,u}(\tau)|) = \lambda_n |\gamma^{(H)}_{i,u}(\tau)|$ ($u = 1, \ldots, q + q^*$; $i = 1, \ldots, m$). Note that LASSO assigns the same penalty to each coefficient $\gamma^{(H)}_{i,u}(\tau)$, regardless of whether predictor effects are relevant or irrelevant. This may result in estimation inefficiency and model selection inconsistency (Wang and Leng 2008). The adaptive LASSO (ada-LASSO), on the other hand, assigns coefficient-specific penalty weights, so that irrelevant predictors can be penalized more heavily than relevant ones. In the empirical part of this study, we adopt ada-LASSO with $\hat{\gamma}^{(\mathrm{ini})}_{i,u}(\tau) = \hat{\gamma}^{(\mathrm{LASSO})}_{i,u}(\tau)$.[1] In order to optimize eq. (5) with ada-LASSO, we use a proximal alternating direction method of multipliers algorithm contained in the R-FHDQR package.[2] As shown by Gu et al. (2018), this algorithm drastically reduces the computational burden as compared to other algorithms for penalized quantile regression in high-dimensions.
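To make the difference between the two penalties concrete, the sketch below builds penalty weights from an initial estimate. It assumes the common adaptive-LASSO form in which each weight is the inverse of the absolute initial coefficient raised to a power; the exponent `delta`, the small guard `eps` against division by zero, and the function name are assumptions and need not match the exact weights used in the paper.

```python
def adaptive_lasso_weights(gamma_init, delta=1.0, eps=1e-6):
    """Penalty weights w_u = 1 / (|gamma_init_u| + eps) ** delta.

    gamma_init : initial estimates, e.g. from a plain LASSO fit
    delta, eps : illustrative assumptions, not values taken from the text
    """
    return [1.0 / (abs(g) + eps) ** delta for g in gamma_init]


# Plain LASSO corresponds to a vector of ones.
lasso_weights = [1.0, 1.0, 1.0]

# With adaptive weights, a coefficient that the initial fit set to zero
# receives a much larger penalty than one with a sizeable initial estimate.
ada_weights = adaptive_lasso_weights([0.5, 0.0, -0.25])
```

A predictor dropped by the initial LASSO fit is therefore very unlikely to re-enter, while predictors with large initial coefficients are shrunk less.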

2.4 Numerical Issues and Choice of λn

Medeiros and Mendes (2014, 2016) studied the asymptotic properties of ada-LASSO when the errors of a linear regression model are non-Gaussian and may be conditionally heteroskedastic. These authors derived asymptotic properties of sign consistency for ada-LASSO. Moreover, the ada-LASSO estimator has oracle properties; see also Audrino and Camponovo (2017). Recently, Han and Tsay (2019) investigated the properties of the LASSO estimators of a linear regression model in the presence of serial dependence (AR models) in both the covariate vector and the errors. In particular, these authors provide model selection consistency of LASSO estimators under certain conditions on the tuning parameter λn.

To select the tuning parameter $\lambda_n$, we consider the following prediction-based criterion. For a given $\lambda_n$, let $\hat{\boldsymbol{\gamma}}^{(H)}_{\lambda_n}(\tau)$ be the $(q + q^*)$-dimensional vector of penalized estimates, and $|\hat{\tilde{M}}^{(H)}_{\lambda,\tau}(h)|$ be the corresponding number of non-zero estimates. Then, we choose $\lambda_n$ as the value that minimizes the following high-dimensional BIC criterion


Using simulations, Sherwood and Wang (2016) report that $\mathrm{QBIC}_\tau$ can be effective in terms of finite-sample prediction accuracy and predictor selection. De Gooijer and Zerom (2019, Sec. 3.4) show results of a robustness check on $\lambda_n$ for the LASSO penalty in a time series penalized averaging framework.
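The selection step can be sketched as a grid search. The criterion below uses a generic form often seen for high-dimensional quantile BIC, namely the log of the total check loss plus a complexity term proportional to the number of selected predictors. The exact penalty term, the multiplier `c_n`, the candidate grid and the function names are assumptions of this sketch and are not taken from the text.

```python
import math


def qbic(total_check_loss, n_selected, n_obs, c_n):
    """log(total check loss) + n_selected * log(n) / (2n) * c_n.

    c_n is a user-chosen multiplier that grows with the problem size;
    its form here is an assumption, not the paper's definition.
    """
    return math.log(total_check_loss) + n_selected * math.log(n_obs) / (2.0 * n_obs) * c_n


def select_lambda(candidates, n_obs, c_n):
    """Pick the lambda with the smallest criterion value.

    candidates : iterable of (lam, total_check_loss, n_selected) tuples,
                 one per penalized fit on the lambda grid
    """
    best = min(candidates, key=lambda c: qbic(c[1], c[2], n_obs, c_n))
    return best[0]


# Hypothetical fits: a small lambda keeps many predictors with low loss,
# a large lambda keeps none but fits poorly.
grid = [(0.01, 50.0, 20), (0.1, 52.0, 5), (1.0, 80.0, 0)]
chosen = select_lambda(grid, n_obs=168, c_n=math.log(56))
```

The criterion trades a slightly higher in-sample check loss against a sparser set of pilot forecasts, which is why an intermediate value of the tuning parameter can win.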

3 Models, Data, and Forecasting Procedure

3.1 Parametric Models

In Section 4, we evaluate the ability of the hybrid conditional quantile approach in forecasting the one-step ahead ($h = 1$) risk premium of the monthly S&P 500 index, denoted by $R_t$, using a large set of potential predictors $Z_{j,t}$. In particular, we adopt the following $m = 4$ parametric DGPs.

  1. Time-varying mean (TVM) model with constant volatility:

  2. Constant, or prevailing mean (PM) model with EGARCHZ volatility (PM-EGARCHZ):

  3. Time-varying mean model with EGARCH volatility (TVM-EGARCH):

  4. Time varying mean model with EGARCHZ volatility (TVM-EGARCHZ):


Models (8)–(11) were analyzed by Cenesizoglu and Timmermann (2012) in the context of conditional mean prediction. Clearly, models (8)–(10) are all nested within model (11). However, too many model parameters can lead to efficiency losses in the out-of-sample forecasting performance due to estimation uncertainty and model misspecification. So, we gain flexibility by considering a wider set of models than just using model (11). There is a large literature in finance exploring the predictability of the conditional mean and/or variance of stock returns; see, e.g., Rapach and Zhou (2013) for a review.
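To illustrate how a fitted mean and volatility model translates into a parametric quantile forecast, the following sketch assumes Gaussian innovations, so that the quantile is the mean forecast plus the volatility forecast times a standard normal quantile. The Gaussian assumption, the numerical inputs and the function name are choices made for this example only.

```python
from statistics import NormalDist


def location_scale_quantile(mean_forecast, vol_forecast, tau):
    """tau-th conditional quantile under an assumed Gaussian innovation.

    mean_forecast : one-step-ahead conditional mean from the fitted model
    vol_forecast  : one-step-ahead conditional standard deviation
    """
    return mean_forecast + vol_forecast * NormalDist().inv_cdf(tau)


# Hypothetical monthly forecasts (in percent): mean 0.5, volatility 4.0.
q10 = location_scale_quantile(0.5, 4.0, 0.1)
q50 = location_scale_quantile(0.5, 4.0, 0.5)
q90 = location_scale_quantile(0.5, 4.0, 0.9)
```

Under this assumption a predictor that moves only the volatility leaves the median unchanged but shifts the lower and upper quantiles in opposite directions, which is one reason different predictors can matter at different quantile levels.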

3.2 Data

To help obtain the one-step-ahead quantile forecasts for $R_t$, we consider a total of 28 exogenous predictors $Z_{j,t}$ ($j = 1, \ldots, 28$) consisting of 14 macroeconomic predictors and 14 technical indicators covering the time period 1951:01 – 2016:12 (792 observations). The macroeconomic predictors are summarized as follows: dividend-price ratio (DP); dividend yield (DY); earnings-price ratio (EP); dividend-payout ratio (DE); equity risk premium volatility (RVOL); book-to-market ratio (BM); net equity expansion (NTIS); treasury bill rate (TBL); long-term yield (LTY); long-term return (LTR); term spread (TMS); default yield spread (DFY); default return spread (DFR); and inflation (INFL). These predictors have been the subject of conditional mean forecasting studies by Welch and Goyal (2008), Cenesizoglu and Timmermann (2012), and Pedersen (2015), among others. Neely et al. (2014) apply single-variable mean regressions to predict $R_t$ using the above data set. Lima and Meng (2017) and Meligkotsidou et al. (2014, 2019) consider the 14 macroeconomic predictors within the context of quantile prediction.[3]

The 14 technical indicators fall into three categories:

  1. A moving average (MA) rule that generates a buy (sell) signal $S_{i,t}$ with $S_{i,t} = 1$ if $\mathrm{MA}_{s,t} \ge \mathrm{MA}_{\ell,t}$; 0 otherwise. Here $\mathrm{MA}_{j,t} = (1/j)\sum_{i=0}^{j-1} P_{t-i}$ for $j = s, \ell$, with $P_t$ the level of the stock price index, and $s$ ($\ell$) the length of the short (long) MA ($s < \ell$). The resulting MA indicator is denoted by MA($s, \ell$) with $s = 1, 2, 3$ and $\ell = 9, 12$.

  2. A momentum (MOM) rule where $S_{i,t} = 1$ if $P_t \ge P_{t-m}$; 0 otherwise. The momentum indicator is denoted by MOM($m$) with $m = 9, 12$.

  3. A trading volume rule using MAs of “on-balance” volumes (OBVs) defined as $\mathrm{OBV}_t = \sum_{k=1}^{t} \mathrm{VOL}_k D_k$ where $\mathrm{VOL}_k$ is a measure of the trading volume during period $k$ and $D_k$ is a binary variable that takes a value of 1 if $P_k - P_{k-1} \ge 0$ and –1 otherwise. Then $S_{i,t} = 1$ if $\mathrm{MA}^{\mathrm{OBV}}_{s,t} \ge \mathrm{MA}^{\mathrm{OBV}}_{\ell,t}$; 0 otherwise. Here $\mathrm{MA}^{\mathrm{OBV}}_{j,t} = (1/j)\sum_{i=0}^{j-1} \mathrm{OBV}_{t-i}$ for $j = s, \ell$. The corresponding indicator is denoted by VOL($s, \ell$) with $s = 1, 2, 3$ and $\ell = 9, 12$.
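The three rules above can be sketched in a few lines. The code uses 0-based list positions for time, starts the on-balance volume at zero for the first observation (which has no previous price), and uses made-up price and volume series; these conventions and the function names are assumptions of the sketch and are unrelated to the Matlab code of Neely et al. (2014).

```python
def moving_average(series, j, t):
    """MA_{j,t} = (1/j) * sum_{i=0}^{j-1} series[t - i], with t a 0-based index."""
    if t < j - 1:
        raise ValueError("not enough history for this window length")
    return sum(series[t - i] for i in range(j)) / j


def ma_signal(prices, s, l, t):
    """MA(s, l): 1 if the short MA is at least the long MA, else 0."""
    return 1 if moving_average(prices, s, t) >= moving_average(prices, l, t) else 0


def mom_signal(prices, m, t):
    """MOM(m): 1 if the current price is at least the price m periods ago."""
    if t < m:
        raise ValueError("not enough history for this momentum lag")
    return 1 if prices[t] >= prices[t - m] else 0


def on_balance_volume(prices, volumes):
    """Cumulative sum of VOL_k * D_k, with D_k = 1 if P_k - P_{k-1} >= 0 else -1."""
    obv, running = [0.0], 0.0  # first period has no previous price (assumption)
    for k in range(1, len(prices)):
        d = 1 if prices[k] - prices[k - 1] >= 0 else -1
        running += volumes[k] * d
        obv.append(running)
    return obv


def vol_signal(prices, volumes, s, l, t):
    """VOL(s, l): the MA rule applied to on-balance volume instead of prices."""
    obv = on_balance_volume(prices, volumes)
    return 1 if moving_average(obv, s, t) >= moving_average(obv, l, t) else 0


# The parameter grid in the text gives 6 + 2 + 6 = 14 indicators.
names = (
    [f"MA({s},{l})" for s in (1, 2, 3) for l in (9, 12)]
    + [f"MOM({m})" for m in (9, 12)]
    + [f"VOL({s},{l})" for s in (1, 2, 3) for l in (9, 12)]
)

# Toy series: 13 months of steadily rising prices with constant volume.
prices = list(range(100, 113))
volumes = [10] * 13
signals = (ma_signal(prices, 1, 9, 12), mom_signal(prices, 9, 12),
           vol_signal(prices, volumes, 1, 9, 12))
```

All three rules produce binary signals, so each technical indicator enters the predictor set as a 0/1 series.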

The monthly macroeconomic predictors are available from Amit Goyal’s website. The technical indicators can be constructed using Matlab code made available with the paper of Neely et al. (2014). The S&P 500 volume data, used to compute the monthly VOL indicators, can be downloaded from Yahoo Finance.

3.3 Forecasting Procedure

We generate one-step ahead quantile forecasts of $R_{n+1}$ with a recursive window scheme. Our first forecast origin/base is 1964:12 and hence the first sample covers the period t ∈ [1951:01, 1964:12] (n = 168). By recursive, we mean that our combined conditional quantile procedure is repeated by extending the forecast origin, one observation at a time, up to and including 2016:11. Thus, our last time series sample covers the period 1951:01–2016:11 (n = 791). With such a recursive design, our forecasts cover the period 1965:01–2016:12 and a total of F = 624 conditional quantile forecasts are generated.
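The recursive (expanding) window design can be sketched as a simple loop. The forecaster passed in below is a placeholder that just returns the last observation; in the paper this step would be the full combined conditional quantile procedure. The function name and the dummy data are assumptions for illustration.

```python
def recursive_forecasts(y, n_first, forecaster):
    """One-step-ahead forecasts with an expanding estimation window.

    y          : full series of length N
    n_first    : size of the first estimation sample
    forecaster : callable mapping the history y[0:n] to a forecast of y[n]
    """
    forecasts = []
    for n in range(n_first, len(y)):
        history = y[:n]                        # observations 1, ..., n
        forecasts.append(forecaster(history))  # forecast of observation n + 1
    return forecasts


# Dummy series with the sample sizes used in the text: N = 792, first n = 168.
series = list(range(792))
out = recursive_forecasts(series, 168, forecaster=lambda history: history[-1])
n_forecasts = len(out)  # the sample size n runs from 168 to 791
```

With 792 observations and a first sample of 168, the loop runs over sample sizes 168 to 791 and therefore produces 792 − 168 = 624 forecasts, matching F in the text.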

In Section 4, we report recursive prediction results for the following seven models/methods. Note that forecasts are for each quantile level τ ∈ {0.1, 0.5, 0.9}. Methods 1–5 are logically connected in the sense that they are based on the concept of averaging or combining forecasts. For example, Method 1 allocates all the weight to one predictor. Methods 2–5 assign weights (equal or varying) to all the predictors. Methods 6 and 7 are popular approaches used in the quantile regression literature. Unlike Methods 6 and 7, penalized averaging does not require additivity.

  1. Benchmark Model (BM). For each parametric model i ∈ {1, 2, 3, 4} (corresponding to (8)–(11)) and each time period t ∈ {1951:01–2016:11}, we select the predictor from the set $Z_{j,t}$ (j ∈ {1, …, 28}) that produces the lowest quantile forecast error. This results in F total forecast errors, and the average of these F values will be reported as a measure of accuracy, see Section 3.4. In this way, we will have four forecast benchmarks, denoted by BM-i (i = 1, …, 4).

  2. Equal weighting (EW). For each parametric model i ∈ {1, 2, 3, 4} and each time period t ∈ {1951:01–2016:11}, we obtain quantile forecasts by equally averaging the quantile forecasts obtained from the 28 predictors. In this way, we will have four equally weighted forecast methods, denoted by EW-i (i = 1, …, 4). Unlike the benchmark model, all predictors contribute equally to quantile forecasts regardless of their relative forecast accuracy.

  3. Penalized Averaging (PA). Forecasts are based on weighted averaging similar to the EW approach, but the weights are data-driven and obtained by penalized quantile averaging. In terms of the proposed hybrid approach, see eq. (3), we only consider the second component (or the parametric part). Thus, PA is a special case of the hybrid method. In this way, we will have four methods, denoted by PA-i (i = 1, …, 4).

  4. Penalized Averaging of Nonparametric quantiles (PA-NP). This approach is also a special case of the hybrid method, see eq. (3), where only the non-parametric component is considered. This approach was proposed in De Gooijer and Zerom (2019).

  5. Hybrid (H). This is the hybrid approach introduced in this paper, see eq. (3), where parametric and non-parametric approaches are combined. In this way, there are four hybrid methods, denoted by H-i (i = 1, …, 4), and 56 potential predictors.

  6. Penalized Linear Quantile Regression (P-Lin-QR). This approach is well known in the literature. It does not allow for nonlinearities in the predictors.

  7. Additive QR (Ad-QR). This approach is a generalization of approach (6) where each additively entered predictor has a non-parametric (possibly nonlinear) effect. Unlike the approaches (3), (4), (5) and (6), we do not conduct predictor selection for the Ad-QR. Instead, we use the predictor selection results of approach (4) and implement a low-dimensional additive model using the R-gamboost package. To by-pass predictor selection, we set the hyperparameter (called mstop) at 5,000.

3.4 Evaluating Predicted Quantiles

Let $e_t^{(\cdot)} = Y_{N+1} - \hat{Q}^{(\cdot)}_{Y_t}(\tau \mid \mathbf{X}_{N-q_y+1})$ (t = 1, …, n) be the one-step ahead out-of-sample quantile prediction error (QPE) at quantile level τ, where the superscript $(\cdot)$ refers to any one of the seven forecast approaches. In this paper, we evaluate the accuracy of the quantile forecasts using an average of the check loss function of QPE values, i.e.


which is an estimate of the expected loss $L_\tau^{(\cdot)} = E[\rho_\tau(Y_{N+1} - Q^{(\cdot)}_{Y_t}(\tau \mid \mathbf{X}_{N-q_y+1}))]$. $\hat{L}_\tau^{(\cdot)}$ weights the difference between the observed value $Y_{N+1}$ and the forecasted quantile $\hat{Q}^{(\cdot)}_{Y_t}(\tau \mid \mathbf{X}_{N-q_y+1})$ by (1 – τ) when $Y_{N+1}$ is lower than the τth quantile, and by τ when $Y_{N+1}$ exceeds the quantile. Hence, eq. (12) is a natural way to evaluate quantile forecasts; see, e.g., Giacomini and Komunjer (2005).
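The asymmetric weighting described above is easy to check numerically. The sketch below computes the average check loss of a sequence of quantile forecasts; the function names and the toy numbers are assumptions for illustration.

```python
def check_loss(z, tau):
    """Quantile check function rho_tau(z) = {tau - I(z < 0)} * z."""
    return (tau - (1.0 if z < 0 else 0.0)) * z


def average_check_loss(actuals, quantile_forecasts, tau):
    """Mean check loss of the quantile prediction errors e = actual - forecast."""
    errors = [y - q for y, q in zip(actuals, quantile_forecasts)]
    return sum(check_loss(e, tau) for e in errors) / len(errors)


# At tau = 0.9, a forecast that is one unit too low (actual exceeds it)
# is penalized with weight tau, one that is one unit too high with 1 - tau.
loss_too_low = average_check_loss([1.0], [0.0], tau=0.9)
loss_too_high = average_check_loss([0.0], [1.0], tau=0.9)
```

For an upper quantile, underpredicting is therefore roughly nine times as costly as overpredicting by the same amount, and the reverse holds at τ = 0.1.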

We also make pairwise comparisons between each of the four BM forecast errors $e^{(\mathrm{BM})}_{t,i} = Y_{N+1} - \hat{Q}^{(\mathrm{BM})}_{\tau,i}$ ($i = 1, \ldots, 4$; $N \equiv n = 168, \ldots$) and the quantile forecast errors obtained from the hybrid method. In particular, we assess their differences via a Diebold–Mariano (DM) type test statistic. To this end, let $e_t^{(A)}$ and $e_t^{(B)}$ be the associated one-step ahead QPEs for the pair (A, B) of methods. Then, for fixed τ, the null- and alternative hypotheses of interest are, respectively,


The corresponding loss differential is defined as


The null hypothesis that method A produces as accurate forecasts as method B can be tested using the test statistic


where $\bar{D}_\tau^{(A,B)}$ is the average over $t$ of $D_{t,\tau}^{(A,B)}$. Under the alternative hypothesis, we specify a one-sided test (right tail), so that rejection of the null indicates that method A is more accurate than method B. For $h = 1$, a consistent estimator of $\mathrm{Var}(\bar{D}_\tau^{(A,B)})$ is given by the sample variance of $D_{t,\tau}^{(A,B)}$. For $h > 1$, one may use the Newey–West estimator for the variance.[4]
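A DM-type comparison for $h = 1$ can be sketched as follows. The sketch assumes that the loss differential is the check loss of method B minus that of method A (so that large positive values favor A, in line with the right-tailed test), that the variance of the average differential is the sample variance divided by the number of forecasts, and that the statistic is compared with a standard normal distribution. These conventions, the function names and the toy errors are assumptions for illustration.

```python
import math
from statistics import NormalDist, mean, variance


def check_loss(z, tau):
    """Quantile check function rho_tau(z) = {tau - I(z < 0)} * z."""
    return (tau - (1.0 if z < 0 else 0.0)) * z


def dm_test(errors_a, errors_b, tau):
    """Right-tailed DM-type test of equal check-loss accuracy (h = 1).

    Returns (statistic, p_value). Large positive statistics indicate that
    method A has lower check loss than method B. Assumes the loss
    differentials are not all identical (nonzero sample variance).
    """
    d = [check_loss(eb, tau) - check_loss(ea, tau)
         for ea, eb in zip(errors_a, errors_b)]
    f = len(d)
    stat = mean(d) / math.sqrt(variance(d) / f)
    p_value = 1.0 - NormalDist().cdf(stat)
    return stat, p_value


# Toy quantile prediction errors: method A is visibly more accurate than B.
errors_a = [0.1, -0.1, 0.2, -0.2, 0.1, -0.15]
errors_b = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8]
stat, p_value = dm_test(errors_a, errors_b, tau=0.5)
```

For horizons beyond one step the loss differentials are serially correlated, so the plain sample variance used here would need to be replaced by a long-run variance estimate.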

4 Forecast Results

Throughout this section, we present predictor selection and forecasting results for the full sample and for periods of contraction and expansion of the U.S. business cycle, as dated by the U.S. National Bureau of Economic Research. This provides insight into the relative strength of the proposed combined method in forecasting conditional quantiles of $R_t$ during each time period. For the full sample period F = 624, while F = 534 (90) for the expansion (contraction) periods.

4.1 Results

4.1.1 Predictor Selection

Table 1 shows predictor selection frequencies of five models/methods for the full sample period. To ease interpretation, we only report those predictors which are selected at least 5 % of the time for the three quantile levels τ.

Table 1: Frequency of predictors which are selected at least 5 % of the time. Full prediction sample.

Quantile level τ = 0.1
BM-1 10.3 17.0 10.1 12.8 27.7
BM-2 13.3 8.7 17.1 8.7 6.1 10.7 5.9 6.7
BM-3 21.5 6.1 15.1 5.8 18.1 14.1 5.8
BM-4 6.4 11.2 18.6 19.1 15.7 6.6 6.4
PA-1 12.9 10.9 14.8 8.6 6.6 5.7 5.1 11.3 10.2
PA-2 34.0 5.5 10.1 5.8 6.6 5.1 5.0 12.5
PA-3 6.1 14.9 8.2 12.6 13.7 6.4 5.4 8.4 5.8
PA-4 17.3 9.0 14.1 15.3 6.9 10.0 10.7 5.4
PA-NP 5.3 10.7 12.7 7.3 7.5 5.8 8.9 7.9
P-Lin-QR 7.0 7.2 9.3 9.1 6.4 8.6 7.4 9.2 5.8
H-1 6.4* 8.0* 9.6* 7.4* 7.1* 6.3* 9.9* 6.0*
H-2 5.7* 5.1 7.8* 8.2* 5.0 7.2 5.1* 6.2* 8.7* 5.4*
H-3 5.1* 7.1* 7.6* 8.8* 6.5* 5.6* 8.1* 6.1*
H-4 5.8* 7.0 7.2* 5.4* 8.0* 7.1 5.0* 6.6* 8.6* 5.4*
Quantile level τ = 0.5
BM-1 10.1 17.6 10.6 13.1 26.3
BM-2 8.3 34.8 9.9 9.5 26.0 5.9
BM-3 16.5 5.3 25.8 11.1 23.7
BM-4 5.6 19.2 13.8 14.4 25.0
PA-1 17.0 7.2 8.3 30.6 9.7 7.0 8.6
PA-3 44.0 9.2 5.4 8.0 9.9 6.4 6.8
PA-4 44.7 10.1 13.5 8.0 7.5
PA-NP 12.4 18.1 6.7 21.3 11.7 6.0 15.7
P-Lin-QR 16.6 12.8 21.8 15.7 9.8 5.7 6.0
H-1 5.4* 9.0* 10.7* 10.4* 6.3, 5.9* 6.3* 5.1* 8.6*
H-2 6.1* 10.2* 11.7* 11.1* 13.1* 5.7* 8.3* 5.7* 9.2*
H-3 5.9* 5.4* 10.0* 10.8* 6.2* 6.6 8.1* 8.9*
H-4 5.8* 9.4* 10.7* 10.9* 6.2* 6.7 8.1* 8.9*
Quantile level τ = 0.9
BM-1 8.0 11.7 7.0 8.8 20.3 9.2 10.6 8.6
BM-2 19.7 25.5 12.3 17.6 12.8
BM-3 29.2 12.4 34.0 15.6
BM-4 27.8 21.4 29.2
PA-NP 7.6 11.3 10.6 5.7 5.7 5.1 5.1
P-Lin-QR 6.1 6.3 11.5 8.0 7.6 8.3 5.3 9.8 7.2
H-1 6.3* 14.5* 9.7* 7.4 9.7* 6.1* 8.9*
H-2 6.5* 15.1 10.7* 10.0* 8.8* 13.2 9.3*
H-3 6.6* 14.8, 8.6* 10.3* 8.5 10.6* 8.0* 7.0*
H-4 6.6* 15.4 11.2* 10.1* 8.7* 13.1 9.4*

Notes: (i) BM = Benchmark Model, PA = Penalized Averaging, PA-NP = Penalized Averaging with NP marginal quantiles, P-Lin-QR = Penalized linear model with quantile regression, H = Hybrid; (ii) * denotes NP marginal predictors selected by the hybrid method.

Five important observations follow from Table 1. First, it is clear that only a small subset of the total number of macroeconomic predictors plays a role in the selection of the final set of predictors at all quantile levels. In particular this applies to the set of predictors {RVOL, NTIS, TBL, LTR, TMS}. Second, only a few technical indicators are selected as a part of the overall set of predictors. Overall, MA(3,9), MOM(12), VOL(1,12), and VOL(3,9) contribute to the forecasting performance of the hybrid quantile averaging approach. But as can be seen from the selection frequencies, their contribution is relatively low. These results differ from the study by Neely et al. (2014), who showed that technical indicators provide complementary information to conditional mean forecasting. Third, we find considerable differences in the selection of predictors at the three quantile levels. For instance, INFL contributes to the forecasting performance of all models/methods at τ = 0.1 while at τ = 0.9 the selection frequency of this variable is below 5 %. Also the macroeconomic predictor DP is selected frequently at τ = 0.9 by the hybrid approach and hardly anywhere else by other models/methods. Fourth, very few macroeconomic predictors have more than 10 % selection scores. Some exceptions are DE (44.0 %) with method PA-3, RVOL (44.7 %) with method PA-4, and TBL (34.8 %) with method BM-2. Lastly, nonparametric marginal predictors are frequently selected by the hybrid approach, as denoted by the superscript *. Contributory predictors obtained from parametric conditional quantiles are RVOL, DFY and DFR at τ = 0.1, LTR and DFR at τ = 0.5, and RVOL, TBL, LTY, and DFR at τ = 0.9.

4.1.2 Forecasting Performance

Table 2 presents ratios of tick loss functions, $\rho_\tau(e_t^{(\mathrm{Method})})/\rho_\tau(e_t^{(\mathrm{BM}\text{-}i)})$ ($i = 1, \ldots, 4$), averaged over F = 624 predictions, i.e. the full prediction sample. Values less (greater) than one indicate that a particular method is more (less) accurate than a particular benchmark model (BM).

Table 2: Comparing conditional quantile averaging methods. The entries are ratios of tick loss functions, $\rho_\tau(e_t^{(\mathrm{Method})})/\rho_\tau(e_t^{(\mathrm{BM}\text{-}i)})$, averaged over F = 624 predictions (i = 1, …, 4). Bold entries show the lowest ratios for each τ value and each BM model.

Method τ = 0.1 τ = 0.5 τ = 0.9

Notes: (i) EW = Equal Weighting, BM = Benchmark Model, PA = Penalized Averaging; PA-NP = Penalized Averaging with NP marginal quantiles, P-Lin-QR = Penalized linear model with quantile regression and Ad-QR = Additive quantile regression; (ii) For the purpose of this table, the entries are measured up to three decimal figures such that more than one ratio of tick loss functions may have the lowest value.

The hybrid approach has the best forecasting results relative to the BM models and across all values of τ. Indeed, without any exception, the gains are considerable. These gains are also present over all (semi-)parametric models and methods. The ability of the hybrid approach to capture nonlinear effects (e.g. volatility) in a simple way is a likely reason for this success.

Two other results in Table 2 are noteworthy. First, the EW-2–EW-4 quantile forecasts are better than the forecasts obtained from BM-1 to BM-4 at all quantile levels. This is interesting, since the EW combination approach uses fixed (non-random) weights of conditional quantile forecasts across all time periods while the BM approach uses random weights obtained by choosing the best predictor at each time period. Second, the Ad-QR approach has lower tick loss function values than the P-Lin-QR approach across almost all BM models and τ values. This is another indication that allowing for nonlinearities in a quantile forecasting approach can lead to systematic improvements.

Applying the test statistic Dτ to the pairs (H-i, BM-i) (i = 1, …, 4) resulted in very small p-values at all quantile levels τ, indicating that there are no benefits in using the BM approach. We also computed Dτ for pairwise comparisons between the hybrid approach and all other models/methods.[5] In all cases, the p-values do not exceed the 5 % nominal significance level. Hence, in summary, we conclude that the hybrid averaging approach yields statistically more accurate quantile forecasts than the (semi-)parametric models/methods.

4.2 Expansion and Contraction

Table 3 presents model selection frequencies of the hybrid approach for the expansion (Panel A) and contraction (Panel B) periods. As expected from the results presented in Table 1, the list of selected predictors no longer includes the predictors EP, BM, LTY, MOM(9), and VOL(2,12). Panel A indicates that the set of macroeconomic predictors {DY, DE, RVOL, NTIS, TBL, LTR, TMS} has considerable forecasting power at all quantile levels. On the other hand, Panel B indicates that during periods of recession (contraction), the set of macroeconomic predictors {LTR, TMS, DFY, INFL} contributes to the forecasting performance of the hybrid quantile averaging approach. So, except for LTR, there is no overlap between the two sets, given the 5 % selection threshold. Further, we see from Table 3 that only a few technical indicators contribute to the prediction performance of the hybrid approach. Overall, the list of selected predictors in Table 3 is not markedly different from the one given in Table 1, and the selection frequencies are about the same. It is also evident from Table 3 that the conditional parametric quantile estimates of the predictors RVOL, LTR, DFR, and INFL all contribute to the forecasting power of the hybrid approach.

Table 4 presents averaged ratios of the tick loss functions $\rho_\tau(e_t^{(\mathrm{Method})})/\rho_\tau(e_t^{(\mathrm{BM}\text{-}i)})$ ($i = 1, \ldots, 4$) for the periods of expansion and contraction. For the expansion period (Panel A), the bold entries indicate that the hybrid quantile approach has higher predictive power than all other methods/models irrespective of the quantile level τ. These findings are in line with the earlier evidence for the full sample period. Turning to the contraction period (Panel B), we see similar improvements of the hybrid quantile approach in the case of τ = 0.1 and τ = 0.9. For τ = 0.5, it appears that the prediction performance of Ad-QR is slightly “better” than the four H methods. So the hybrid forecasting results are quite robust in the above two sub-sample periods.

Table 3: Frequency of predictors which are selected at least 5 % of the time. Expansion and contraction period.

Panel A: Expansion
0.5 H-1 5.2* 9.5* 10.8* 10.6* 6.7, 5.6* 6.4* 5.0* 8.7*
H-3 5.7* 5.5* 10.0* 11.1* 7.0, 5.8* 8.2* 9.9*
H-4 5.6* 9.9* 10.7* 11.1* 7.1, 5.8* 8.2* 9.0*
H-3 6.6* 4.7, 8.0* 10.8* 8.6 11.2* 8.2* 7.3*
Panel B: Contraction

Notes: (i) H = Hybrid; (ii) * denotes NP marginal predictors selected by the hybrid method.

Table 4:

Comparing conditional quantile averaging methods for the expansion and contraction period. The entries are ratios of thick loss functions, ρτ(et(Method))/ρτ(et(BM-i)), averaged over F = 534 (expansion) and F = 90 (contraction) predictions (i = 1, …, 4). Embolded entries show the lowest ratios for each τ value, each BM model, and each time period.

Method   τ = 0.1   τ = 0.5   τ = 0.9
Panel A: Expansion
Panel B: Contraction

Notes: (i) EW = Equal Weighting, BM = Benchmark Model, PA = Penalized Averaging, PA-NP = Penalized Averaging with NP marginal quantiles, P-Lin-QR = Penalized linear model with quantile regression, and Ad-QR = Additive quantile regression; (ii) For the purpose of this table, the entries are measured up to three decimal figures, so more than one ratio of tick loss functions may have the lowest value.

5 Conclusions

We proposed a hybrid approach to combine relevant information from parametric and semiparametric quantile forecasts in high dimensions. It rests on the idea of combining, or averaging, misspecified candidate parametric and nonparametric quantile forecasts, which in this study are the marginal quantiles, to form an approximation of the true combined conditional quantile. One advantage of the approach is that the weights of the combined quantile forecasts are treated as unknown and estimated from the data, as opposed to methods with known, fixed combinations of predictors. Indeed, we find overwhelming empirical evidence that, in terms of quantile forecasting performance, our hybrid approach works well in identifying relevant predictors from a large set of macroeconomic predictors and technical indicators and, more importantly, results in improved combined out-of-sample forecasts over (semi-)parametric models/methods. We also provided some insights as to where the gain of the hybrid method comes from. Further, we have seen that very different forecast results emerge in the right tail of the conditional quantile distribution. These results are mainly based on the prediction performance of the set of 14 macroeconomic variables, while there is hardly any evidence supporting the quantile predictive performance of the 14 technical indicators.

These empirical results raise questions for further research. In particular, it may be of interest to gain further insight into the forecasting performance of the hybrid approach via a controlled Monte Carlo experiment. One may also explore the possibility of making economic gains from utilizing information in the tails of the return distribution via application of the hybrid approach. Extending the paper to multi-step-ahead conditional quantile forecasting within the current hybrid framework is another topic worth investigating.


We would like to thank two anonymous referees for their helpful comments and suggestions.


Audrino, F., and L. Camponovo. 2017. “Oracle Properties, Bias Correction, and Bootstrap Inference for Adaptive Lasso for Time Series M-estimators.” Journal of Time Series Analysis 39: 111–28.

Bai, J., and S. Ng. 2008. “Forecasting Economic Time Series Using Targeted Predictors.” Journal of Econometrics 146: 304–17.

Bayer, S. 2018. “Combining Value-at-Risk Forecasts Using Penalized Quantile Regressions.” Econometrics and Statistics 8: 56–77.

Bonaccolto, G., M. Caporin, and S. Paterlini. 2018. “Asset Allocation Strategies Based on Penalized Quantile Regression.” Computational Management Science 15: 1–32.

Cenesizoglu, T., and A. Timmermann. 2012. “Do Return Prediction Models add Economic Value?” Journal of Banking & Finance 36: 2974–87.

Chen, J., D. Li, O. Linton, and Z. Lu. 2018. “Semiparametric Ultra-high Dimensional Model Averaging of Nonlinear Dynamic Time Series.” Journal of the American Statistical Association 113: 919–32.

De Gooijer, J. G., and D. Zerom. 2019. “Semiparametric Quantile Averaging in the Presence of High-Dimensional Predictors.” International Journal of Forecasting 35: 891–909.

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. 2004. “Least Angle Regression.” The Annals of Statistics 32: 407–99.

Exterkate, P., D. Van Dijk, C. Heij, and P. J. F. Groenen. 2011. “Forecasting the Yield Curve in a Data-Rich Environment Using the Factor-Augmented Nelson–Siegel Model.” Journal of Forecasting 32: 193–215.

Garcia, M. G. P., M. C. Medeiros, and G. F. R. Vasconcelos. 2017. “Real-Time Inflation Forecasting with High-Dimensional Models: The Case of Brazil.” International Journal of Forecasting 33: 679–93.

Giacomini, R., and I. Komunjer. 2005. “Evaluation and Combination of Conditional Quantile Forecasts.” Journal of Business and Economic Statistics 23: 416–31.

Giovannelli, A. 2012. “Nonlinear Forecasting Using Large Datasets: Evidence on US and Euro Area Economies?” CEIS Tor Vergata, Research Paper Series, Vol. 10, Issue 13, No. 255.

Gu, Y., J. Fan, L. Kong, S. Ma, and H. Zou. 2018. “ADMM for High-dimensional Sparse Penalized Quantile Regression.” Technometrics 60: 319–31.

Han, Y., and R. S. Tsay. 2019. “High-Dimensional Linear Regression for Dependent Data with Applications to Nowcasting.” Statistica Sinica (forthcoming).

Jiang, B., G. Athanasopoulos, R. J. Hyndman, A. Panagiotelis, and F. Vahid. 2018. “Macroeconomic Forecasting for Australia Using a Large Number of Predictors.” Working paper 17/02, Department of Econometrics and Business Statistics, Monash Business School.

Kong, E., and Y. Xia. 2014. “An Adaptive Composite Quantile Approach to Dimension Reduction.” The Annals of Statistics 42: 1657–88.

Konzen, E., and F. A. Ziegelmann. 2016. “LASSO-type Penalties for Covariate Selection and Forecasting in Time Series.” Journal of Forecasting 35: 592–612.

Lee, E. R., H. Noh, and B. U. Park. 2014. “Model Selection via Bayesian Information Criterion for Quantile Regression Models.” Journal of the American Statistical Association 109: 216–29.

Lima, L. R., and F. Meng. 2017. “Out-of-Sample Return Predictability: A Quantile Combination Approach.” Journal of Applied Econometrics 32: 877–95.

Ma, S., R. Fildes, and T. Huang. 2016. “Demand Forecasting with High Dimensional Data: The Case of SKU Retail Sales Forecasting with Intra- and Inter-Category Promotional Information.” European Journal of Operational Research 249: 245–57.

Medeiros, M. C., and E. F. Mendes. 2014. “Penalized Estimation of Semi-Parametric Additive Time-Series Models.” In Essays in Nonlinear Time Series Econometrics, edited by N. Haldrup, M. Meitz, and P. Saikkonen, 215–237. Oxford Scholarship Online.

Medeiros, M. C., and E. F. Mendes. 2016. “ℓ1-Regularization of High-Dimensional Time-Series Models with Non-Gaussian and Heteroskedastic Errors.” Journal of Econometrics 191: 255–71.

Meligkotsidou, L., E. Panopoulou, I. Vrontos, and S. Vrontos. 2014. “A Quantile Regression Approach to Equity Premium Prediction.” Journal of Forecasting 33: 558–76.

Meligkotsidou, L., E. Panopoulou, I. Vrontos, and S. Vrontos. 2019. “Quantile Forecast Combinations in Realised Volatility Prediction.” Journal of the Operational Research Society (forthcoming).

Neely, C. J., D. E. Rapach, J. Tu, and G. Zhou. 2014. “Forecasting the Equity Risk Premium: The Role of Technical Indicators.” Management Science 60: 1772–91.

Pedersen, T. Q. 2015. “Predictable Return Distributions.” Journal of Forecasting 34: 114–32.

Rapach, D. E., and G. Zhou. 2013. “Forecasting Stock Returns.” In Handbook of Economic Forecasting, edited by G. Elliott, and A. Timmermann, Vol. 2, 328–383, Part A. North-Holland: Elsevier.

Sherwood, B., and L. Wang. 2016. “Partially Linear Additive Quantile Regression in Ultra-High Dimension.” The Annals of Statistics 44: 288–317.

Similä, T., and J. Tikka. 2006. “Common Subset Selection of Inputs in Multiresponse Regression.” In Proceedings of the IEEE International Joint Conference on Neural Networks, Vancouver, Canada, 1908–1915.

Wang, H., and C. Leng. 2008. “A Note on Adaptive Group Lasso.” Computational Statistics and Data Analysis 52: 5277–86.

Welch, I., and A. Goyal. 2008. “A Comprehensive Look at the Empirical Performance of Equity Premium Prediction.” Review of Financial Studies 21: 1455–508.

West, K. D. 2006. “Forecast Evaluation.” In Handbook of Economic Forecasting, edited by G. Elliott, C. W. J. Granger, and A. Timmermann, Vol. 1, 99–134. North-Holland: Elsevier.

Published Online: 2019-12-17

© 2019 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.