Published by De Gruyter, February 1, 2018

Bayesian estimation of Gegenbauer long memory processes with stochastic volatility: methods and applications

Andrew Phillip, Jennifer S.K. Chan and Shelton Peiris


This paper discusses a time series model which has generalized long memory in the mean process with stochastic volatility errors, and develops a new Bayesian posterior simulator that couples advanced posterior maximisation techniques with traditional latent stochastic volatility estimation procedures. Details are provided on the estimation process, data simulation, and out-of-sample performance measures. We conduct several rigorous simulation studies and verify our results for in-sample and out-of-sample behaviour. We further compare the goodness of fit of the generalized process to the standard long memory model by considering two empirical studies on the US Consumer Price Index (CPI) and the US equity risk premium (ERP).


We thank two anonymous referees and the editor for their constructive comments and invaluable suggestions. These have improved the quality and readability of the paper.

Appendix A

Tuning the proposal distribution of [u, d]

In order to achieve high efficiency when sampling [u, d], we tune the precision parameter of the proposal distribution(s). An acceptance rate that is too high can indicate that the proposal variance is too low, so that the sampler keeps accepting values close to the current value. An acceptance rate that is too low can indicate that the proposal variance is too high, so that the sampler keeps rejecting proposals and the chain does not move.

Gelman et al. (2013) note that care must be taken when tuning to avoid convergence to the wrong distribution. Since the updating rule is dependent on our previous simulation steps, the transition probabilities are more complicated than before. The chain may move more quickly through flat parts of the distribution and more slowly through non-smooth parts of the distribution. This would result in incorrect sampling of the entire distribution. The general advice to rectify such a situation is to tune in one phase of the sampling, and make the relevant inferences from a second phase where no tuning is performed. We follow this advice and tune only in the burn-in period.

We calculate the acceptance rate over every 250 MCMC iterates. If this acceptance rate is below 15% or above 50%, then we update our tuning parameters $c_u$ and $c_d$ according to:


where $\Phi^{-1}(\cdot)$ is the inverse Normal CDF, $p_{\text{optimal}}$ is the optimal acceptance rate and $p_{\text{current}}$ is the current acceptance rate. Roberts and Rosenthal (2001) prove that an acceptance rate between 15% and 50% is at least 80% efficient. We choose an optimal acceptance rate of 23.4%, following the seminal work of Roberts, Gelman, and Gilks (1997).

This procedure is repeated every 250 iterations, and $p_{\text{current}}$ resets after each block of 250 MCMC iterates (i.e. from 1 to 250, from 251 to 500, …). After the burn-in has completed, we record only one acceptance rate, which is what is reported in all of our inferences.
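The batch-wise tuning described above can be sketched as follows. This is a minimal illustration, not the authors' code: the rescaling rule inside `retune_scale` is one commonly used choice and is an assumption here, as is treating the tuning parameter as a proposal standard deviation. The names `retune_scale` and `tune_random_walk`, and the standard normal target in the usage note, are hypothetical.

```python
import math
import random
from statistics import NormalDist


def retune_scale(c_current, p_current, p_optimal=0.234, lower=0.15, upper=0.50):
    """Rescale a proposal standard deviation from a batch acceptance rate.

    Assumed rule (not taken from the paper):
        c_new = c_old * Phi^{-1}(p_optimal / 2) / Phi^{-1}(p_current / 2)
    """
    if lower <= p_current <= upper:
        return c_current  # acceptance rate is already in the accepted band
    # Keep the rate away from 0 and 1 so the inverse CDF stays finite and non-zero.
    p = min(max(p_current, 0.01), 0.99)
    inv = NormalDist().inv_cdf
    return c_current * inv(p_optimal / 2.0) / inv(p / 2.0)


def tune_random_walk(log_target, x0, c0, burn_in=5000, batch=250, seed=0):
    """Random-walk Metropolis that retunes its scale every `batch` burn-in iterates."""
    rng = random.Random(seed)
    x, c, accepted = x0, c0, 0
    for it in range(1, burn_in + 1):
        proposal = x + c * rng.gauss(0.0, 1.0)
        # 1 - U lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(x):
            x, accepted = proposal, accepted + 1
        if it % batch == 0:
            c = retune_scale(c, accepted / batch)
            accepted = 0  # the current acceptance rate resets after each batch
    return x, c
```

For example, `tune_random_walk(lambda v: -0.5 * v * v, 0.0, 0.01)` starts with a scale that is far too small for a standard normal target and enlarges it during burn-in; after burn-in the scale would be held fixed.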

Appendix B

Estimating h

We discuss here the estimation of the latent variable $h=[h_1,\ldots,h_T]'$. Modifying this to cater for alternative means in the observation equation is trivial, so we discuss the estimation of the GARMA-SV model only, in order to focus on the relevant derivations. In essence, we modify the precision sampler of Chan (2013) to exploit the banded structure of $p(h\mid\alpha,\beta,\sigma^{2})$. First, we seek a linear expression for h:


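where $A\cdot B$ refers to the Hadamard product of A and B. Let $Y^{*}=\log G_J^{2}Y^{2}=[y_1^{*},\ldots,y_T^{*}]'$ and $\varepsilon^{*}=[\log\varepsilon_1^{2},\ldots,\log\varepsilon_T^{2}]'$ for notational convenience. The sampling of $\varepsilon^{*}$ is highly non-standard, as each element now follows a log-$\chi_1^{2}$ distribution. Kim, Shephard, and Chib (1998) suggest approximating this using an offset Gaussian mixture representation. Essentially, the probability density function is approximated as:

$$f(\varepsilon_t^{*})\approx\sum_{i=1}^{K}p_i\,\phi(\varepsilon_t^{*};\mu_i,\sigma_i^{2}),\qquad \phi(x;m,s^{2})\ \text{the Gaussian density with mean } m \text{ and variance } s^{2}.$$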

Each $p_i$ is the probability of the $i$th mixture component. The authors estimate $K, p_i, \mu_i, \sigma_i^{2}$ by matching the first four moments of the true theoretical distribution. This is performed using non-linear least squares optimization until the approximating density is within a small distance of the true density.

Kim, Shephard, and Chib (1998) find satisfactory results with K = 7 mixture components; however, Omori et al. (2007) remark that K = 10 gives a more reliable fit when leverage effects are considered. Although we do not consider leverage effects in our work, we favour this more conservative approach and use the parameters shown in Table 5.

Table 5:

K = 10 mixture components as found in Omori et al. (2007).


Evidently, these parameters do not need to be estimated during each MCMC sweep since they are independent of all other parameters in the sampler. It should be noted that the mixture density can be written in terms of a component indicator variable $s_t$ such that $P(s_t=i)=p_i$. Therefore, it is computationally cheap to sample the mixture components, which are denoted as s.

It is worthwhile to reinforce here that s is a T × 1 vector, and we sample $s_t$ for each time point. Each $s_t$ is independent, so that $p(s\mid Y^{*},h)=\prod_{t=1}^{T}p(s_t\mid y_t^{*},h_t)$. Since $s_t$ is discrete, it is easy to sample using the slice sampler.
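Because the indicators are conditionally independent, all T of them can be drawn in one vectorised pass. The sketch below is an illustration under stated assumptions rather than the authors' routine: it uses a plain inverse-CDF draw from the normalised component weights instead of a slice sampler, it assumes the residual $y_t^{*}-h_t$ follows the Gaussian mixture, and the function name `sample_indicators` and the three-component mixture in the usage note are hypothetical placeholders (not the K = 10 values of Table 5).

```python
import numpy as np


def sample_indicators(y_star, h, p, mu, sig2, rng):
    """Draw s_t for every t from P(s_t = i | y*_t, h_t), which is taken to be
    proportional to p_i * N(y*_t - h_t; mu_i, sig2_i).

    Returns the sampled indices (0-based) and the T x K matrix of
    normalised component probabilities.
    """
    resid = (y_star - h)[:, None] - mu[None, :]              # T x K residuals
    log_w = (np.log(p)
             - 0.5 * np.log(2.0 * np.pi * sig2)
             - 0.5 * resid ** 2 / sig2)
    log_w -= log_w.max(axis=1, keepdims=True)                # guard against underflow
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    cdf = np.cumsum(w, axis=1)
    u = rng.random(len(y_star))
    s = (u[:, None] > cdf).sum(axis=1)                       # inverse-CDF draw per row
    return np.minimum(s, len(p) - 1), w
```

A call such as `sample_indicators(y_star, h, np.array([0.3, 0.4, 0.3]), np.array([-10.0, 0.0, 10.0]), np.ones(3), np.random.default_rng(0))` returns one component index per time point; with the actual mixture one would pass the ten probabilities, means and variances instead.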

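Once $s_t$ has been sampled, we are able to sample $\varepsilon^{*}$ as:

$$\varepsilon^{*}\mid s\sim N(\mu_\varepsilon,\Sigma_\varepsilon),$$

where $\mu_\varepsilon=(\mu_{s_1},\ldots,\mu_{s_T})'$, $\Sigma_\varepsilon=\operatorname{diag}(\sigma_{s_1}^{2},\ldots,\sigma_{s_T}^{2})$. Hence, it is clear to see that

$$Y^{*}\mid s,h\sim N(h+\mu_\varepsilon,\Sigma_\varepsilon).$$

So the log-likelihood of $Y^{*}$ is

$$\log p(Y^{*}\mid s,h)=-\frac{T}{2}\log(2\pi)-\frac{1}{2}\sum_{t=1}^{T}\log\sigma_{s_t}^{2}-\frac{1}{2}(Y^{*}-h-\mu_\varepsilon)'\Sigma_\varepsilon^{-1}(Y^{*}-h-\mu_\varepsilon).$$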

Recall that


which can be written out in matrix notation as






So that

$$h\mid\alpha,\beta,\sigma^{2}\sim N\left(H_\phi^{-1}\tilde{\alpha},\,(H_\phi'\Sigma_h^{-1}H_\phi)^{-1}\right)$$



where $\mu_h=H_\phi^{-1}\tilde{\alpha}$. Moreover, it is clear to see that $|(H_\phi'\Sigma_h^{-1}H_\phi)^{-1}|=\sigma^{2T}/(1-\beta^{2})$. So the log-density of h can be expressed as:


Therefore, the log of the full conditional distribution of h is:

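$$
\begin{aligned}
\log p(h\mid Y^{*},\alpha,\beta,\sigma^{2})
&\propto \log p(Y^{*}\mid s,h,\alpha,\beta,\sigma^{2})+\log p(h\mid\alpha,\beta,\sigma^{2})\\
&= -\tfrac{1}{2}(Y^{*}-h-\mu_\varepsilon)'\Sigma_\varepsilon^{-1}(Y^{*}-h-\mu_\varepsilon)-\tfrac{1}{2}\log\frac{\sigma^{2T}}{(1-\beta^{2})}-\tfrac{1}{2}(h-\mu_h)'(H_\phi'\Sigma_h^{-1}H_\phi)(h-\mu_h)\\
&\propto (Y^{*}-h-\mu_\varepsilon)'\Sigma_\varepsilon^{-1}(Y^{*}-h-\mu_\varepsilon)+(h-\mu_h)'(H_\phi'\Sigma_h^{-1}H_\phi)(h-\mu_h)\\
&= (Y^{*}-\mu_\varepsilon)'\Sigma_\varepsilon^{-1}(Y^{*}-\mu_\varepsilon)-(Y^{*}-\mu_\varepsilon)'\Sigma_\varepsilon^{-1}h-h'\Sigma_\varepsilon^{-1}(Y^{*}-\mu_\varepsilon)+h'\Sigma_\varepsilon^{-1}h\\
&\quad +h'H_\phi'\Sigma_h^{-1}H_\phi h-h'H_\phi'\Sigma_h^{-1}H_\phi\mu_h-\mu_h'H_\phi'\Sigma_h^{-1}H_\phi h+\mu_h'H_\phi'\Sigma_h^{-1}H_\phi\mu_h\\
&\propto -(Y^{*}-\mu_\varepsilon)'\Sigma_\varepsilon^{-1}h-h'\Sigma_\varepsilon^{-1}(Y^{*}-\mu_\varepsilon)+h'\Sigma_\varepsilon^{-1}h+h'H_\phi'\Sigma_h^{-1}H_\phi h-h'H_\phi'\Sigma_h^{-1}\tilde{\alpha}-\tilde{\alpha}'\Sigma_h^{-1}H_\phi h\\
&= h'(\Sigma_\varepsilon^{-1}+H_\phi'\Sigma_h^{-1}H_\phi)h-2h'\left[\Sigma_\varepsilon^{-1}(Y^{*}-\mu_\varepsilon)+H_\phi'\Sigma_h^{-1}\tilde{\alpha}\right].
\end{aligned}
\tag{14}
$$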

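Now, consider some multivariate Gaussian distribution $\theta\sim N(\mu_\theta,\Sigma_\theta)$ with log PDF

$$\log p(\theta)\propto-\frac{1}{2}(\theta-\mu_\theta)'\Sigma_\theta^{-1}(\theta-\mu_\theta)\propto-\frac{1}{2}\left(\theta'\Sigma_\theta^{-1}\theta-2\theta'\Sigma_\theta^{-1}\mu_\theta\right).\tag{15}$$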

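If we compare (14) with (15), then it is clear to see that

$$\Sigma_\theta^{-1}=\Sigma_\varepsilon^{-1}+H_\phi'\Sigma_h^{-1}H_\phi,\qquad \mu_\theta=\Sigma_\theta\left[\Sigma_\varepsilon^{-1}(Y^{*}-\mu_\varepsilon)+H_\phi'\Sigma_h^{-1}\tilde{\alpha}\right].$$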

Finally, the posterior distribution of h can be sampled as a block from
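A block draw from a Gaussian that is specified through its precision matrix, in the spirit of the precision sampler of Chan (2013) mentioned earlier, can be sketched as follows. This is a dense-matrix illustration under stated assumptions, not the authors' implementation: the function names `precision_sample` and `ar1_precision` are hypothetical, and the AR(1) form assumed for $H_\phi$ and $\Sigma_h$ is chosen only because it reproduces the determinant $\sigma^{2T}/(1-\beta^{2})$ stated above. A production version would exploit the banded structure with sparse or banded Cholesky routines.

```python
import numpy as np


def ar1_precision(T, beta, sigma2):
    """Prior precision H' Sigma_h^{-1} H for an assumed stationary AR(1) state.

    H has ones on the diagonal and -beta on the first subdiagonal;
    Sigma_h = diag(sigma2 / (1 - beta^2), sigma2, ..., sigma2).
    """
    H = np.eye(T) - beta * np.eye(T, k=-1)
    inv_var = np.full(T, 1.0 / sigma2)
    inv_var[0] = (1.0 - beta ** 2) / sigma2
    return H.T @ (inv_var[:, None] * H)


def precision_sample(K, b, rng):
    """Draw once from N(K^{-1} b, K^{-1}) using the Cholesky factor of K.

    Returns the draw and the mean K^{-1} b.
    """
    L = np.linalg.cholesky(K)                        # K = L L'
    mean = np.linalg.solve(L.T, np.linalg.solve(L, b))
    z = rng.standard_normal(len(b))
    # L'^{-1} z has covariance (L L')^{-1} = K^{-1}.
    return mean + np.linalg.solve(L.T, z), mean
```

In the notation of (14), one would pass `K` as $\Sigma_\varepsilon^{-1}+H_\phi'\Sigma_h^{-1}H_\phi$ and `b` as $\Sigma_\varepsilon^{-1}(Y^{*}-\mu_\varepsilon)+H_\phi'\Sigma_h^{-1}\tilde{\alpha}$, so that the whole vector h is drawn in a single step.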


Appendix C

Estimating the marginal likelihood

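Suppose we want to compare a set of models $\{M_1,\ldots,M_K\}$ in a Bayesian setting. The frequentist is able to use the classical log likelihood ratio test, whose statistic is asymptotically distributed as a $\chi^{2}$ with degrees of freedom equal to the difference in the number of parameters between the two models. In Bayesian analysis, we use the Bayes Factor, which is given by

$$BF_{ij}=\frac{p(y\mid M_i)}{p(y\mid M_j)},$$

where

$$p(y\mid M_k)=\int p(y\mid\theta,M_k)\,p(\theta\mid M_k)\,d\theta.\tag{17}$$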

The quantity in (17) is called the marginal likelihood (ML). It can be shown to be asymptotically equivalent to the Schwarz Information Criterion. The estimation of the ML is typically nontrivial, and as such, we use the importance sampler to do so. See Kroese, Taimre, and Botev (2011) for a textbook treatment of the importance sampler. In general, the importance sampler estimates the quantity:

$$\ell=E_f[H(X)]=\int H(x)f(x)\,dx,$$

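where H is a real-valued function and f is the density of a random vector X. Now suppose that r is another density such that Hf is dominated by r, that is, $r(x)=0$ implies $H(x)f(x)=0$. We can represent ℓ as

$$\ell=\int H(x)\frac{f(x)}{r(x)}\,r(x)\,dx=E_r\left[H(X)\frac{f(X)}{r(X)}\right].$$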

Consequently, if $X_1,\ldots,X_N$ are independent and identically distributed samples from the importance density r(x), then

$$\hat{\ell}=\frac{1}{N}\sum_{i=1}^{N}H(X_i)\frac{f(X_i)}{r(X_i)}$$

is an unbiased, simulation-consistent estimator of ℓ. We can apply the importance sampling scheme (18) to the marginal likelihood (17) so that:

$$p(y\mid M_k)=\int\frac{p(y\mid\theta,M_k)\,p(\theta\mid M_k)}{r(\theta)}\,r(\theta)\,d\theta.$$

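The ML is estimated as

$$\hat{p}(y\mid M_k)=\frac{1}{R}\sum_{i=1}^{R}\frac{p(y\mid\theta^{(i)},M_k)\,p(\theta^{(i)}\mid M_k)}{r(\theta^{(i)})},$$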

where each $\theta^{(i)}$ is a draw from the importance density, $r(\theta^{(i)})$, and R is the total number of draws.
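An importance sampling estimate of a marginal likelihood is usually computed on the log scale to avoid underflow. The sketch below is a generic illustration, not the estimator code used in the paper: the function names (`log_mean_exp`, `log_marginal_likelihood_is`, `log_norm_pdf`) and the conjugate normal toy model in the usage note are assumptions introduced here, chosen because the exact marginal likelihood of that toy model is available for comparison.

```python
import numpy as np


def log_mean_exp(a):
    """Numerically stable log of the average of exp(a)."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))


def log_marginal_likelihood_is(log_lik, log_prior, log_r, draws):
    """Log of (1/R) * sum_i p(y | theta_i) p(theta_i) / r(theta_i),
    where `draws` holds R samples from the importance density r."""
    log_w = np.array([log_lik(t) + log_prior(t) - log_r(t) for t in draws])
    return log_mean_exp(log_w)


def log_norm_pdf(x, m, v):
    """Log density of N(m, v) evaluated at x (elementwise)."""
    return -0.5 * (np.log(2.0 * np.pi * v) + (x - m) ** 2 / v)
```

As a usage example, take $y_i\sim N(\theta,1)$ with prior $\theta\sim N(0,1)$: the posterior is $N(\sum_i y_i/(n+1),\,1/(n+1))$, and using it (or a slightly wider normal) as `log_r` gives an estimate that can be checked against the closed-form log marginal likelihood. When the importance density equals the posterior exactly, every weight equals the marginal likelihood and the estimator has zero variance, which illustrates why a density close to the posterior is preferred.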

The choice of r is important. Ideally, we would use the posterior as it carries all the information that we need, but the normalizing constant is not known. We instead use something as close as possible: $p(\hat{\theta})$, where $p(\theta)$ is the prior defined below in (20), and $\hat{\theta}$ is the posterior mean of all parameters. It can be shown that this density minimizes the Kullback-Leibler distance to the posterior.

In the case of the GARFIMA-SV model, it is easy to see that $p(\theta^{(i)})$ is




where $\Phi(x;m,s^{2})$ is the normal CDF with mean m and variance $s^{2}$ evaluated at x, and $\mathbb{1}$ is an indicator variable that is equal to 1 when |u| < 1, and 0 otherwise. Note that we restrict the support of [u, d] and β to $\mathbb{1}_{ud}$ and (−1, 1), with normalizing constants $[\Delta_u,\Delta_d]$ and $\Delta_\beta$ respectively, to ensure the estimation of the marginal likelihood is not biased.

We work with logarithms for ease of computation. Thus,


Now, the observed-likelihood (or the integrated-likelihood) p(y|θ) is calculated as


This, again, is a nontrivial quantity to estimate. We once again use an importance sampling scheme to estimate the term $p(y\mid\theta,M_k)$ in (19):


It can be shown that a good approximating density for $p(h^{(i)})$ is $p(h\mid y,\theta)$. Hence,


where $\tilde{h}$ is the mode of h, and $\Gamma=G_JVG_J$. The mode can be found using a search method, such as a Newton-Raphson scheme.
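A Newton-Raphson search for the mode of a log-volatility vector can be sketched as follows. This is an illustration under simplifying assumptions, not the scheme used in the paper: it assumes the plain stochastic volatility observation equation $y_t\mid h_t\sim N(0,e^{h_t})$ with a Gaussian prior on h given by a mean `mu_h` and precision matrix `P`, and the function name `sv_mode` is hypothetical. The same iteration applies to other concave log-densities once the gradient and Hessian are replaced.

```python
import numpy as np


def sv_mode(y, mu_h, P, tol=1e-10, max_iter=100):
    """Newton-Raphson for the mode of
        log p(h | y) = const - 0.5 * sum(h + y^2 exp(-h)) - 0.5 (h - mu_h)' P (h - mu_h).

    Returns the mode and the negative Hessian evaluated there.
    """
    def log_post(h):
        d = h - mu_h
        return -0.5 * np.sum(h + y ** 2 * np.exp(-h)) - 0.5 * d @ P @ d

    def grad(h):
        return -0.5 * (1.0 - y ** 2 * np.exp(-h)) - P @ (h - mu_h)

    def neg_hessian(h):
        return P + np.diag(0.5 * y ** 2 * np.exp(-h))

    h = np.array(mu_h, dtype=float)
    for _ in range(max_iter):
        g = grad(h)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(neg_hessian(h), g)      # Newton direction
        t = 1.0
        # Halve the step only if it would clearly lower the log-density.
        while log_post(h + t * step) < log_post(h) - 1e-12 and t > 1e-8:
            t *= 0.5
        h = h + t * step
    return h, neg_hessian(h)
```

The returned negative Hessian can serve as the precision matrix of a Gaussian approximation centred at the mode, which is the kind of approximating density the importance sampling step requires.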

Appendix D

Simulation study diagnostics

Figure 7: Each graph depicts the mean Sample Autocorrelation of 1000 MCMC runs of $\hat{u}$ for various values of [u, d]. The most notable observation is that when d is low, and as $u\rightarrow 0.\dot{9}$, the convergence of u to its true value gets slower. This is an expected result, since as d → 0, the process has less information, and becomes “less long-memory.” Furthermore, this slow decay is not a result of boundary issues, since we do not see the same slow decay with d = 0.45, which too is 0.05 units away from the boundary.

Figure 8: Each graph depicts the mean Sample Autocorrelation of 1000 MCMC runs of $\hat{d}$ for various values of [u, d]. Similarly to u, d exhibits slow decay for low values of d.
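The quantity plotted in these figures, a sample autocorrelation averaged over repeated MCMC runs, can be computed along the following lines. This is a minimal sketch with hypothetical function names (`sample_acf`, `mean_acf`); it uses the usual biased autocorrelation estimator, which is an assumption since the exact estimator behind the figures is not stated.

```python
import numpy as np


def sample_acf(chain, max_lag):
    """Sample autocorrelation of one chain at lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array(
        [1.0 if k == 0 else (x[:-k] @ x[k:]) / denom for k in range(max_lag + 1)]
    )


def mean_acf(chains, max_lag):
    """Average the sample autocorrelation over several runs (one chain per row)."""
    return np.mean([sample_acf(c, max_lag) for c in chains], axis=0)
```

A slowly decaying mean autocorrelation indicates that successive draws are highly dependent, so more iterations are needed for the same effective sample size.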

Table 6:

Gelman-Rubin statistics for the in-sample simulation study using T = 1500. All parameters have a statistic close to 1, which is suggestive of convergence.
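For reference, a basic version of the Gelman-Rubin statistic for a single scalar parameter can be computed from several parallel chains as sketched below. This is the classical potential scale reduction factor without chain splitting or rank normalisation; whether the reported values use exactly this variant is an assumption, and the function name `gelman_rubin` is hypothetical.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of equal length n.

    `chains` is an m x n array holding post-burn-in draws of one parameter.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled variance estimate
    return np.sqrt(var_hat / W)
```

Values close to 1 indicate that the between-chain and within-chain variability agree, while values well above 1 indicate that the chains have not yet mixed over the same region.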



Artiach, M., and J. Arteche. 2012. “Doubly Fractional Models for Dynamic Heteroscedastic Cycles.” Computational Statistics & Data Analysis 56: 2139–2158.

Atkeson, A., and L. E. Ohanian. 2001. “Are Phillips Curves Useful for Forecasting Inflation?” Federal Reserve Bank of Minneapolis. Quarterly Review-Federal Reserve Bank of Minneapolis 25: 2.

Aye, G. C., M. Balcilar, R. Gupta, N. Kilimani, A. Nakumuryango, and S. Redford. 2014. “Predicting BRICS Stock Returns Using ARFIMA Models.” Applied Financial Economics 24: 1159–1166.

Baillie, R. T. 1996. “Analysing Inflation by the Fractionally Integrated ARFIMA-GARCH Model.” Journal of Applied Econometrics 11: 23–40.

Baillie, R. T., T. Bollerslev, and H. O. Mikkelsen. 1996. “Fractionally Integrated Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics 74: 3–30.

Bardet, J.-M., G. Lang, G. Oppenheim, A. Philippe, and M. S. Taqqu. 2003. Theory and Applications of Long-Range Dependence, 579–623. Berlin: Springer Science+Business Media.

Bhardwaj, G., and N. R. Swanson. 2006. “An Empirical Investigation of the Usefulness of ARFIMA Models for Predicting Macroeconomic and Financial Time Series.” Journal of Econometrics 131: 539–578.

Bordignon, S., M. Caporin, and F. Lisi. 2007. “Generalised Long-Memory GARCH Models for Intra-daily Volatility.” Computational Statistics & Data Analysis 51: 5900–5912.

Bos, C. S., S. J. Koopman, and M. Ooms. 2014. “Long Memory with Stochastic Variance Model: A Recursive Analysis for US Inflation.” Computational Statistics & Data Analysis 76: 144–157.

Carlos, J. C., and L. A. Gil-Alana. 2016. “Testing for Long Memory in the Presence of Non-linear Deterministic Trends with Chebyshev Polynomials.” Studies in Nonlinear Dynamics & Econometrics 20: 57–74.

Chan, J. C. 2013. “Moving Average Stochastic Volatility Models with Application to Inflation Forecast.” Journal of Econometrics 176: 162–172.

Chan, N. H., and W. Palma. 1998. “State Space Modeling of Long-Memory Processes.” The Annals of Statistics 26: 719–740.

Cheung, Y.-W. 1993. “Long Memory in Foreign-Exchange Rates.” Journal of Business & Economic Statistics 11: 93–101.

Chib, S., and E. Greenberg. 1994. “Bayes Inference in Regression Models with ARMA(p, q) Errors.” Journal of Econometrics 64: 183–206.

Conrad, C., and M. Karanasos. 2005. “Dual Long Memory in Inflation Dynamics Across Countries of the Euro Area and the Link Between Inflation Uncertainty and Macroeconomic Performance.” Studies in Nonlinear Dynamics & Econometrics 9: 1–36.

Crato, N., and P. Rothman. 1994. “Fractional Integration Analysis of Long-Run Behavior for US Macroeconomic Time Series.” Economics Letters 45: 287–291.

Dissanayake, G., M. Peiris, and T. Proietti. 2016. “State Space Modeling of Gegenbauer Processes with Long Memory.” Computational Statistics & Data Analysis 100: 115–130.

Doornik, J. A., and H. Hansen. 2008. “An Omnibus Test for Univariate and Multivariate Normality.” Oxford Bulletin of Economics and Statistics 70: 927–939.

Gelman, A. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1: 515–534.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. Hoboken, NJ: CRC Press.

Geweke, J., and G. Amisano. 2011. “Hierarchical Markov Normal Mixture Models with Applications to Financial Asset Returns.” Journal of Applied Econometrics 26: 1–29.

Geweke, J., and S. Porter-Hudak. 1983. “The Estimation and Application of Long-Memory Time Series Models.” Journal of Time Series Analysis 4: 221–238.

Gil-Alana, L. A. 2002. “Modelling the Persistence of Unemployment in Canada.” International Review of Applied Economics 16: 465–477.

Gil-Alana, L. A., and J. Toro. 2002. “Estimation and Testing of ARFIMA Models in the Real Exchange Rate.” International Journal of Finance & Economics 7: 279–292.

Goldman, E., J. Nam, H. Tsurumi, and J. Wang. 2013. “Regimes and Long Memory in Realized Volatility.” Studies in Nonlinear Dynamics & Econometrics 17: 521–549.

Granger, C. W. 1980. “Long Memory Relationships and the Aggregation of Dynamic Models.” Journal of Econometrics 14: 227–238.

Granger, C. W., and R. Joyeux. 1980. “An Introduction to Long-Memory Time Series and Fractional Differencing.” Journal of Time Series Analysis 1: 15–29.

Gray, H. L., N.-F. Zhang, and W. A. Woodward. 1989. “On Generalized Fractional Processes.” Journal of Time Series Analysis 10: 233–257.

Hauser, M., and R. Kunst. 1998. “Fractionally Integrated Models With ARCH Errors: With an Application to the Swiss 1-Month Euromarket Interest Rate.” Review of Quantitative Finance and Accounting 10: 95–113.

Hosking, J. 1981. “Fractional Differencing.” Biometrika 68: 165–176.

Huang, F.-x., Y. Zhao, and T.-s. Hou. 2009. “Long Memory and Leverage Effect of Euro Exchange Rate Based on ARFIMA-FIEGARCH.” In Management Science and Engineering, 2009, 1416–1421.

Hwang, E., and D. W. Shin. 2014. “Infinite-Order, Long-Memory Heterogeneous Autoregressive Models.” Computational Statistics & Data Analysis 76: 339–358.

Iglesias, P., H. Jorquera, and W. Palma. 2006. “Data Analysis Using Regression Models with Missing Observations and Long-Memory: An Application Study.” Computational Statistics & Data Analysis 50: 2028–2043.

Jacquier, E., N. G. Polson, and P. E. Rossi. 1994. “Bayesian Analysis of Stochastic Volatility Models.” Journal of Business & Economic Statistics 12: 371–389.

Jacquier, E., N. G. Polson, and P. E. Rossi. 2004. “Bayesian Analysis of Stochastic Volatility Models with Fat-Tails and Correlated Errors.” Journal of Econometrics 122: 185–212.

Kass, R. E., and A. E. Raftery. 1995. “Bayes Factors.” Journal of the American Statistical Association 90: 773–795.

Kim, S., N. Shephard, and S. Chib. 1998. “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models.” Review of Economic Studies 65: 361–393.

Kroese, D. P., T. Taimre, and Z. I. Botev. 2011. Handbook of Monte Carlo Methods. Hoboken, NJ: John Wiley & Sons.

Lahiani, A., and O. Scaillet. 2009. “Testing for Threshold Effect in ARFIMA Models: Application to US Unemployment Rate Data.” International Journal of Forecasting 25: 418–428.

Lillo, F., and J. D. Farmer. 2004. “The Long Memory of the Efficient Market.” Studies in Nonlinear Dynamics & Econometrics 8: 1–33.

Lopes, S. R., and T. S. Prass. 2013. “Seasonal FIEGARCH Processes.” Computational Statistics & Data Analysis 68: 262–295.

Mandelbrot, B. 1969. “Long-Run Linearity, Locally Gaussian Process, H-Spectra and Infinite Variances.” International Economic Review 10: 82–111.

Mandelbrot, B. B., and J. W. Van Ness. 1968. “Fractional Brownian Motions, Fractional Noises and Applications.” SIAM Review 10: 422–437.

Mascagni, M., and A. Srinivasan. 2004. “Parameterizing Parallel Multiplicative Lagged-Fibonacci Generators.” Parallel Computing 30: 899–916.

Mikhail, O., C. J. Eberwein, and J. Handa. 2006. “Estimating Persistence in Canadian Unemployment: Evidence from a Bayesian ARFIMA.” Applied Economics 38: 1809–1819.

Omori, Y., S. Chib, N. Shephard, and J. Nakajima. 2007. “Stochastic Volatility with Leverage: Fast and Efficient Likelihood Inference.” Journal of Econometrics 140: 425–449.

Rainville, E. D. 1960. Special Functions. New York: Macmillan.

Reisen, V. A., A. L. Rodrigues, and W. Palma. 2006. “Estimation of Seasonal Fractionally Integrated Processes.” Computational Statistics & Data Analysis 50: 568–582.

Robert, C. 1995. “Simulation of Truncated Normal Variables.” Statistics and Computing 5: 121–125.

Roberts, G. O., and J. S. Rosenthal. 2001. “Optimal Scaling for Various Metropolis-Hastings Algorithms.” Statistical Science 16: 351–367.

Roberts, G. O., A. Gelman, and W. R. Gilks. 1997. “Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms.” The Annals of Applied Probability 7: 110–120.

So, M. K., K. Lam, and W. K. Li. 1998. “A Stochastic Volatility Model with Markov Switching.” Journal of Business & Economic Statistics 16: 244–253.

So, M. K. P., W. K. Li, and K. Lam. 2002. “A Threshold Stochastic Volatility Model.” Journal of Forecasting 21: 473–500.

Sowell, F. 1992. “Modeling Long-Run Behavior with the Fractional ARIMA Model.” Journal of Monetary Economics 29: 277–302.

Stock, J. H., and M. W. Watson. 2007. “Why Has U.S. Inflation Become Harder to Forecast?” Journal of Money, Credit and Banking 39: 3–33.

Taylor, S. 1986. Modelling Financial Time Series. Chichester; New York: Wiley.

Turkyilmaz, S., and M. Balibey. 2014. “Long Memory Behavior in the Returns of Pakistan Stock Market: ARFIMA-FIGARCH Models.” International Journal of Economics and Financial Issues 4: 400–410.

Wang, J. J., S. T. B. Choy, and J. S. Chan. 2013. “Modelling Stochastic Volatility Using Generalized t-distribution.” Journal of Statistical Computation and Simulation 83: 340–354.

Supplemental Material

The online version of this article offers supplementary material (DOI:

Published Online: 2018-2-1

©2018 Walter de Gruyter GmbH, Berlin/Boston