Skip to content
Publicly Available Published by De Gruyter March 8, 2016

Robust estimation of nonstationary, fractionally integrated, autoregressive, stochastic volatility

  • Mark J. Jensen EMAIL logo


Empirical volatility studies have discovered nonstationary, long-memory dynamics in the volatility of the stock market and foreign exchange rates. This highly persistent, infinite variance, but still mean reverting, behavior is commonly found with nonparametric estimates of the fractional differencing parameter, d, for financial volatility. In this paper, a fully parametric Bayesian estimator, robust to nonstationarity, is designed for the fractionally integrated, autoregressive, stochastic volatility (SV-FIAR) model. Joint estimates of the autoregressive and fractional differencing parameters of volatility are found via a Bayesian, Markov chain Monte Carlo (MCMC) sampler. Like [Jensen, M. J. 2004. “Semiparametric Bayesian Inference of Long-memory Stochastic Volatility.” Journal of Time Series Analysis 25: 895–922.], this MCMC algorithm relies on the wavelet representation of the log-squared return series. Unlike the Fourier transform, where a time series must be a stationary process to have a spectral density function, wavelets can represent both stationary and nonstationary processes. As long as the wavelet has a sufficient number of vanishing moments, this paper’s MCMC sampler will be robust to nonstationary volatility and capable of generating the posterior distribution of the autoregressive and long-memory parameters of the SV-FIAR model regardless of the value of d. Using simulated and empirical stock market return data, we find our Bayesian estimator producing reliable point estimates of the autoregressive and fractional differencing parameters with reasonable Bayesian confidence intervals for either stationary or nonstationary SV-FIAR models.

JEL Classification: C11; C14; C22

1 Introduction

Long-memory, the behavior where the correlation in a times series decays at a slow hyperbolic rate as the interval between observations increases, is a prevalent feature of the volatility of financial returns. Nonparametric estimates of volatility such as realized volatility (Andersen et al. 2001a,b, 2003) along with different power transformations of absolute returns (Ding, Granger, and Engle 1993; Granger and Ding 1996) have exhibited long-memory behavior in a wide array of financial asset returns. Fractionally integrated versions of the ARCH and EGARCH model (Baillie, Bollerslev, and Mikkelsen 1996; Bollerslev and Mikkelsen 1996, 1999) and the stochastic volatility model (Breidt, Crato, and de Lima 1998; Harvey 1998) also find long-memory in the volatility of returns.

Different degrees of long-memory has been found in the estimates of the fractional differencing parameter d for realized volatility where stationary values of d are found taking on values in the range of 0.35–0.45 (see Andersen et al. 2001a,b, 2003). Others have found volatility’s fractional order of integration to be significantly greater than the mean-reverting and stationary threshold value of 1/2. Using a FIGARCH model of the S&P 500 composite stock index, Bollerslev and Mikkelsen (1996, 1999) find d to be near 0.6. Hurvich and Ray (2003) find the d in the daily series of the Deutsch Mark-US Dollar exchange rate to equal 0.56 with a fractionally integrated stochastic volatility model. Using a different time period, Harvey (1998) also finds the volatility of the Deutsch Mark-US Dollar exchange rate following a non-stationary, mean-reverting, long-memory process whose variance is infinite by estimating d to be 0.87 with a fractionally integrated stochastic volatility model.

A long-memory process does not have a finite ordered, moving average representation for positive values of d. Instead, its infinite ordered, moving average representation converges when d<1/2, but diverges when 1/2≤d<1 (see Cheung and Lai 1993; Bae, Jensen, and Murdock 2005). This divergences when d is greater than one-half leads to a non-stationary, mean-reverting, long-memory process whose variance is infinite. Such nonstationary values of d present problems for the existing suite of time-varying, conditional, long-memory, variance model estimators. A fractional differencing parameter between one-half and one causes a unit root type volatility model to be unreliable since differencing volatility results in a series that is to close to being non-invertible. Baillie, Bollerslev, and Mikkelsen (1996) and Bollerslev and Mikkelsen (1996, 1999) estimate the fractionally integrated ARCH class of models by truncating the long-memory process’s infinite moving average average representation. The size of the approximation error from this truncation grows as the value of d get larger, and, hence, should in general be avoided regardless of d being greater than one-half or not.

Besides volatility not having a variance when d is greater than one-half, the spectral density function of the volatility process also does not exist. Hence, Fourier based semiparametric estimators of a fractionally integrated stochastic volatility model, for example, Deo and Hurvich (2001), Hurvich and Ray (2003) and Arteche (2004), also suffer when d is greater than one-half. Hurvich, Moulines and Soulier (2005) and Frederiksen and Nielsen (2008) overcome a d greater than one-half by computing the pseudo-spectral density function. They prove the frequency domain, local Whittle estimator of d to be a consistent estimator of d when applied to log-squared returns. However, these frequentist, local Whittle estimators are only asymptotically normal when 0≤d≤3/4 and have been shown to be heavily biased in simulation studies.

In this paper we are interested in providing the Bayesian joint posterior distribution of the short and long-memory parameters under both stationary and nonstationary, fractionally integrated, autoregressive, stochastic volatility (SV-FIAR). Short memory volatility dynamics are governed by stationary values of the autoregressive parameter. Stationary, and nonstationary, long-memory volatility behavior is determined by the fractional differencing parameter. We obtain the posterior distribution of the SV-FIAR model’s autoregression and long-memory parameters by constructing a highly efficient Markov chain Monte Carlo (MCMC) algorithm in the wavelet domain to produce draws from the parameters joint posterior distribution.

Bayesian methods have become more and more the norm in estimating autoregressive, stochastic volatility models (see the Introduction to Shephard 2005). This dominance has occurred because the Bayesian approach integrates away the unobservable volatilities to produce likelihood-based estimates of the volatility process’ unknown parameters. A Bayesian approach is also appealing because, unlike the above frequentist approaches, which rely on asymptotic results, posterior inference is exact and only conditions on the observed assets returns. These advantages have already been applied by Jensen (2004) in the posterior inference of stationary, long-memory, stochastic volatility, but have not yet been applied to the non-stationary case.

The novelty of our Bayesian approach is its use of the Daubechies (1988) class of wavelets to transform a nonstationary, highly persistent, volatility process into the wavelet domain where its wavelet coefficients are nearly uncorrelated but always with finite variance. While the Fourier series is a basis for stationary processes, wavelets provide a basis for the class of stationary and nonstationary processes. Wavelet analysis is the analysis of change where the wavelet coefficient measures the amount of information lost at a point in time when a weighted moving average of larger and larger order is applied to the time series; e.g. for a first differenced time series the original time series can be found by adding back the value of the wavelet coefficients to the differenced times series. It is this decorrelating feature along with the wavelet being a basis function for nonstationary processes that makes wavelet analysis ideal for estimating stationary and nonstationary SV-FIAR models.

We apply our Bayesian sampler to stationary and nonstationary SV-FIAR model quasi-return data. We also us the sampler to estimate a SV-FIAR model for daily stock market return data. For the artificially generated SV-FIAR return data our Bayesian estimator produces good point estimates of the autoregressive parameter and of stationary and nonstationary values of the fractional differencing parameter. Empirically, we find strong evidence of non-stationary, mean reverting, long-memory behavior in 30 years worth of volatility dynamics from the Center for Research in Security Prices value-weighted stock portfolio index. Bollerslev and Mikkelsen (1996) also found similar non-stationary, long-memory behavior in the conditional, hetroskedasticity of stock market returns. However, as mentioned above their approach truncates the long-memory process’s infinite moving average representation causing their estimate of d to capture only part of the long-memory, persistence of volatility. The approach we propose here does not truncate the moving average representation of long-memory and as such our estimator allows the autoregressive term to model the short-term dynamics of volatility and not be contaminated by the left over persistence not modeled by the truncated approximation.

The contents of the paper are as follows. In Section 2 we introduce the fractionally integrated, autoregressive, stochastic volatility model where volatility is highly persistence, situations where the variance of volatility is infinite but the volatility process is still mean-reverting, and others where volatility is nonstationary. Bayesian inference of the stochastic volatility model is explained in Section 3. Section 4 then provides the necessary wavelet theory to construct the paper’s sampling algorithm. We recommend either Mallat (1999) or Percival and Walden (2000) to those interested in a more complete description of the statistical theory behind wavelet analysis. Section 5 constructs the Markov chain Monte Carlo simulator using a normal mixture representation the log-squared returns wavelet coefficients unknown distribution. Applications of the sampler are found in Section 6 where the fully parametric Bayesian wavelet estimator is applied to simulated SV-FIAR models and daily compounded stock market return data. Lastly, Section 7 summarizes our findings.

2 SV-FIAR model

Define the fractionally integrated, autoregressive, stochastic volatility model (SV-FIAR) as

(2)(1ϕB)(1B)dvt=σηηt,   t=1,,T,

where at time t the mean corrected return from holding a financial instrument is yt. The series vt is the unobservable log-volatility of returns where Eq. (2) shows vt following a fractionally integrated, first-order autoregressive process. We ignore the possible presence of leverage effects (see Nelson 1991; Harvey and Shephard 1996) by assuming the innovations, ξt and ηt, to be uncorrelated, standard normal processes. The parameter ση is the standard deviation of log-volatility and σ is the modal instantaneous volatility. To ensure stationarity and invertiblility in the autoregressive dynamics of vt, the absolute value of the autoregressive parameter, ϕ, is assumed to be less than one. Lastly, we denote the parameter vector of the SV-FIAR model as β=(σ, ση, d, ϕ)′.

2.1 Stationary behavior

As the product of two independent stochastic processes, one being the Gaussian white noise process, ξt, whose variance is known to be one, and the other being exp{vt/2}, all the odd moments of yt will be zero, whereas its second and fourth order moments depend directly on the corresponding moments of vt. Because ηt is assumed to be standard normal, exp{vt/2} is distributed log-normal with a mean of one. If the variance of vt exists it follows that the unconditional variance, Var[yt]=σ2exp{σv2/2}, where σv2Var[vt]. When the variance of vt does not exist, yt continues to be a martingale difference process but its unconditional variance is no longer defined. Hence, stationarity of yt will depend on the variance of vt. If the variance of vt is finite (infinite), then yt will be a (strictly) stationary process.

Long-memory processes like vt were independently introduced by Granger and Joyeux (1980) and Hosking (1981).[1] These long-memory processes are defined in terms of the fractional differencing operator, (1–B)d, whose binomial expansion equals

(1B)d=j=0πj(d)Bj,   where πj(d)=0<kjk1dkjd1Γ(d),   as j,

Bj is the lag operator, Bjxtx(tj), j=0, 1, 2, …, and Γ(·) equals the gamma function. Inverting the differencing operator results in


where ψj(d)=πj(–d)~jd−1/Γ(d), as j→∞, and F(·) is the hypergeometric function (see Gradshteyn and Ryzhik 1994, p. 1066). When 0<d<1/2, the ψjs are not absolutely summable, instead, the sequence {ψj} is square-summable; i.e. j=0ψj2< (see Beran 1994).

Combining our assumption of |ϕ|<1 with the square summability condition, 0<d<1/2, it directly follows that vt is a unique, mean zero, covariance stationary process (see Brockwell and Davis 1993, Theorem 13.2.2). Under these assumptions the latent volatility process, vt, exhibits the variance


and the long-memory behavior captured by the slow hyperbolic rate of decay of its autocovariance

E[vtvt+s]|s|2d1   ass,

where as~bs indicates as/bsc, for some positive constant c, and the exponential rate of increase at frequency zero of its power spectrum

(4)Sv(ω)=ση22π1|1ϕeiω|2|1eiω|2d,=ση22π1|1ϕeiω|2[4sin2ω]d,ση22π1|1ϕ|2ω2d   as ω0.

2.2 Strictly stationary behavior

Past research on long-memory stochastic volatility models has primarily focused on the case where vt is a weakly stationary, long memory process with no autoregressive terms; i.e. 0<d≤1/2 and ϕ=0 (see Deo and Hurvich 2001; Jensen 2004; Hurvich, Moulines, and Soulier 2005). This papers contributions to the analysis of a (strictly) stationary long-memory, autoregressive, volatility process where 0≤d<1/2 (1/2≤d<1). When d≥1/2, the sequence {ψj} is no longer square-summable, hence, σv2 will be undefined (Granger and Joyeux 1980) and neither vt nor yt will be weakly stationary. However, it follows from the moving average representation of volatility


and the definition of the hypergeometric function F(d, 1, 1; B), that when d<1 the effect of the innovations, ηti, i=0, 1, …, dies out over time since the hypergeometric function, F(d, 1, 1; 1)=0, when d<1. In other words, vt is a mean reverting process even when 1/2≤d<1. Thus, vt and yt are strictly stationary processes when 1/2≤d<1. On the other hand, the hypergeometric function F(d, 1, 1; 1) diverges when d≥1, causing past shocks to vt to persistent infinitely into the future.

3 Bayesian analysis

Because yt is comprised of the two innovations ξt and ηt, the likelihood function consists of a mixture distribution over the latent v


where y=(y1, …, yT)′ and v1=(v1, …, vT)′. Because vt enters through the variance of the density of yt, f(y|β) has no analytical solution. It follows from the non-analytic nature of the likelihood that the SV-FIAR model has neither a maximum likelihood estimator of β, nor Bayesian posterior distribution, π(β|y)∝π(β)f(y|β), where π(β) is the parameter vector’s prior. Fortunately, this can be overcome with a Bayesian approach where the unknown β is augmented with the unobservable volatilities, v, and a Monte Carlo Markov chain (MCMC) sampling algorithm is designed to produce draws from π(β, v|y) (see Tanner and Wong 1987; Tierney 1994). Drawing β(i) and v(i) from π(β, v|y), where i=1, …, M represent the different draws, posterior inference can be made with the Monte Carlo approximation, π(β|y)=f(β|v,y)f(v|y)dvM1i=1Mf(β|v(i),y)f(v(i)|y) and by the law of iterative expectation, M1i=1Mβ(i)E[β|y], as M→∞ (see Gelfand and Smith 1990).

Designing a MCMC algorithm involves producing draws of β and v from the intractable posterior distribution π(β, v|y). This task can be made tractable by sampling first from the posterior of β conditional on a particular value of v, π(β|y, v), and then drawing v with a multi-state sampler of the posterior distribution of v conditional on the sampled value of β, in other words, π(v|y, β). By iteratively sampling from these conditional marginal distributions one produces draws from the joint posterior, π(β, v|y).[2]

Sampling from π(v|y, β) presents a challenge when v is a short-memory process (see Chib, Nardari, and Shephard 2002), but especially so when vt is a long-memory process. Faced with the computing and memory costs associated with sampling a long-memory process (see Chan and Palma 1998), we build on the sampler by Jensen (2004) for the fractionally integrated stochastic volatility model. Our sampler and that of Jensen both utilize the nearly independent behavior of the fractionally differenced process’s wavelet coefficients to efficiently sample the latent volatilities. The wavelet coefficients for the SV-FIAR process are independently distributed random variables with a variance equal the value of the power spectrum integrated over limits determined by the support of the wavelet transform.

4 Wavelet representation

Before we construct our multi-state sampler of the latent volatility’s wavelet coefficients, we first provide the necessary wavelet theory to understand how and why sampling in the wavelet domain algorithm is robust to nonstationary long-memory.[3] Wavelets are a redundant family of self-similar basis functions in the sense that all the basis functions have the same shape but are stretched, compressed, and time shifted versions of one another. Based on the idea of stretching and compressing the basis function wavelets are essentially a band-pass filter whose support at different dilations partitions the frequency space.

A wavelet is defined as any L2(ℜ) function, ψ(t), that satisfies the admissibility condition


where ψ^ denotes the Fourier transform of ψ. Since ψ is a square integrable function, the admissibility condition is satisfied only if ψ^(0)=0, i.e. ψ(t)dt=0 and ψ^ has sufficiently fast decay as |ω|→∞.[4] A useful pedagogical example of a wavelet that facilitate the findings of this paper is the ideal, high-bandpass, wavelet whose frequency response is

(6)ψ^(ω)={1if |ω|(π,2π],0otherwise.

Let ψj,n(t)=2j/2ψ(2jtn) represent the dilations and translations of the wavelet ψ, where jZ={0, ±1, ±2, …} is the scaling parameter and nZ is the translation parameter. It follows from the Fourier properties for dilated and translated functions that ψ^j,n(ω)=2j/2ei2jωnψ^(2jω) so that the frequency response of the dilated ideal, high-bandpass, wavelet in Eq. (6) equals

ψ^j,0(ω)={2j/2if |ω|(2jπ,2j+1π],0otherwise.

Because the Fourier transform of the translated wavelet ψj,n includes the complex exponential term, ei2jωn,ψ^j,n, does not equal ψ^j,0, but their supports are the same.

Daubechies (1988) shows {ψj,n}j,nZ, to be a complete orthonormal basis of L2 such that any x(t)∈L2(ℜ) can be represented as


where Wj,n=x(t)ψj,n(t)dt,j, nZ, are the wavelet coefficients of x(t).

Drawing on the ideal high-bandpass wavelet, the wavelet representation of x(t) in Eq. (7) essentially partitions the frequency space into the dyadic blocks, (±2jπ, ±2j+1π]. For each positive integer value of the scaling parameter, j, the dyadic frequency block moves down an octave (frequencies twice as small) from those with scale j–1, with half the range of frequencies but with log constant length. Small (large) values of j are associated with dyadic blocks of large (small) support over high (low) frequency octaves.

4.1 Discrete wavelet filter

In theory the wavelet coefficient, Wj,n, equals the convolution of x(t) with ψj,n, but as an empirical time series x(t) is only observed at a discrete set of observations. Empirically, Wj,n is calculated with a two-channel filter bank.[5] Thus, one never needs an analytical formula for ψ. Instead all the calculations are performed in terms of a filter bank of coefficients.[6]

A two-channel filter bank representation of the wavelet transform consists of the low-pass filter


where {gl}l=0L1 are non-zero filter coefficients of positive even length L. The matching high-bandpass filter to the wavelet transforms two-channel filter is


where hl=(–1)lgL–1–l. Daubechies (1988) provides a sufficient set of non-zero values for {hl}l=0L1 where ψ is compactly supported with the smallest possible support, L–1, for a wavelet possessing L/2 vanishing moments and whose regularity (number of derivatives and support size) increases linearly with L. Wavelets constructed with the filter coefficients of Daubechies are referred to the Daubechies class of wavelets with order L.

The low-pass filter coefficients, {gl}, are equivalent to a moving average filter that smooth the high frequency traits of the series. For the Daubechies class of wavelets these low-pass filter coefficients have the squared-gain function


As L increase 𝒢 converges to the squared-gain function of a ideal low-pass filter supported on [–π, π] (see Lai 1995).

The high-bandpass filter coefficients {hl} constitute a two-stage filter comprised of a first-state L/2-differencing operator and a second-stage weighted moving average. These two-stages are better understood in terms of the squared gain function of {hl}


where 𝒟L/2(ω)=(4sin2ω)L/2 is the square modulus of the transfer function for the L/2-order differencing operator, (1–B)L/2, and the squared gain function


is approximately a low-pass filter. As L increases ℋ approaches the squared gain function of an ideal, high-pass, filter with support on the octave, (π/2, π].

The function ϕ(·) is referred to as the scaling function since it does just that. Like the ideal, high-bandpass, wavelet, one can imagine a ideal, low-bandpass, scaling function whose frequency response is

ϕ^(ω)={1if ω[π,π],0otherwise.

Note that unlike the wavelet, ϕ(·) does not require any vanishing moments. However, we normalize ϕ(·) so that ϕ(t)dt=1, i.e. lgl2=1.

In a similar manner to the definition of ψj,n, define the dilations and translations of ϕ(ω) to be ϕj,n=2j/2ϕ(2jtn), where j, nZ. The filter bank definition of ψj,n can then be written as:


Using this high-bandpass filter definition of ψj,n it follows that:


where Vj,n=2j/2x(t)ϕ(2jn)dt is the scaling coefficient. Thus, computing Wj,n in this manner requires knowledge of {Vj−1,n}nZ.

Calculation of the scaling coefficient, Vj,n, can be performed by writing ϕj,n in terms of the lowpass filter as:


Convoluting x(t) with the above equation we find that Vj,n equals:


Thus, both Vj,n and Wj,n are calculated recursively from the smallest to the largest scale with the simple multiplication and addition operators of a two-channel filter bank.[7]

As an example of the two-channel filter wavelet transform suppose we observe the time series x(t) at t=1, 2, …, 2max. Let V0,n be the output from applying the low-pass filter to x(t) at the lowest possible scale, i.e. let V0,n=x(n) for n=1, …, 2max.[8] The value of Wj,n and Vj,n for j=1, 2, …, max and n=1, 2, …, 2max−m are recursively calculated from the x(t) s by applying over and over the filters of Eqs. (12) and (13).

This recursive algorithm known as the Fast Wavelet Transform is illustrated in Figure 1 where x=(x(1),,x(2max)),Vj=(Vj,1,,Vj,2maxj), and Wj=(Wj,1,,Wj,2maxj). The wavelet coefficients for any value of j represents the information lost when Vj−1 is filtered by {gl}, i.e. Wj contains the details or information needed to obtain Vj−1 from Vj. The box

in Figure 1 represents the decimation of the output from the filter by 2, i.e. discarding the observations with odd time stamps. By their definition the ideal, low, and high-bandpass, filters, ϕ(·) and ψ(·), include this decimation. Because the filters coefficients {gl} and {hl} are both applied to Vj, twice as many observations as the length of Vj are created. Only half the output from the two-channel filter is needed to completely represent or recover Vj.

Figure 1: Schematic representation of the Fast Wavelet Transform.
Figure 1:

Schematic representation of the Fast Wavelet Transform.

Because of the orthogonality of the filter banks for the Daubechies wavelet, the down arrows in Figure 1 can be reversed to synthesize x from its wavelet coefficients, Wj, j=1, …, max. In the wavelet synthesis, adding the details of Wj to the smoothed series Vj provides us with a representation of x at the next degree of resolution Vj−1 with the highest resoluteness being the actual series x.

Though not as fast the Fast Wavelet Transform in computing the wavelet coefficients, there exists a cleaner definition of Wj,n and Vj,n in terms of the series x(t) and the filter banks {gl} and {hl}. They are


where {hj,l} and {gj,l} are filter coefficients at scale j associated with the Daubechies class of wavelets and Lj=(2j–1)(L–1)+1. Note that for j=1, h1,l=hl and g1,l=hl. From the definition of Wj,n and Vj,n in Eqs. (12) and (13) and the squared-gain functions of Eqs. (10) and (11), if follows that the squared-gain functions of {gj,l} and {hj,l} equal


where ℋ1=ℋ and 𝒢1=𝒢. Since 𝒢j and ℋj are the product of dilated 𝒢 and ℋ, as L increases 𝒢j approaches the ideal low-pass filter with the support [–π/2j, π/2j] and ℋj approaches the ideal band-pass filter with support on the intervals (±π/2j, ±π/2j−1].

These squared-gain functions also have a nice intuitive flavor to them that follows from the FWT schematic in Figure 1. Skipping V1 and W1, because their squared-gain functions follow directly from (10) and (11), the scaling coefficients V2 and wavelet coefficients W2 are computed by, respectively filtering V1 by {gl} and {hl} and then decimating the filtered output by two. Since the filtered output of x has already been decimated by two in order to obtain V1, applying {gl} and {hl} to V1 is equivalent to applying filters with the squared-gain functions 𝒢(2ω) and ℋ(2ω). Since V1 is the output from filtering x with coefficients whose squared-gain function is 𝒢(ω), the squared-gain function for the filter of x with output V2 is equal to the product of the squared-gain function for the filter x to V1, and the squared-gain function of the filter V1 to V2; i.e. 𝒢2(ω)=𝒢(ω)𝒢(2ω). Likewise, a filter applied to x whose output is W2 has a squared-gain function equal to 𝒢(ω) ℋ(2ω); i.e. ℋ2. The squared-gain functions generalizes to Vj and Wj being the output from filtering Vj−1 with {gl} and {hl}, respectively.

4.2 Wavelet variance and covariance

Let vt be the process defined in Eq. (2) but whose fractional differencing parameter d is allowed to take on any positive real values; i.e. vt can be either (strictly) stationary or nonstationary. Denote by Wj(v) the 2max/j length wavelet coefficient vector of vt at scale j constructed with the Daubechies wavelet having L/2 vanishing moments. By Percival (1995) Theorem 1, the wavelet coefficients, W1,n(v), where n=1, …, 2max/2, will be a mean zero process with variance


From the findings of Section 2.2 we know the variance of vt does not exist when d≥1/2. However, when dL/2 the integrand [4 sin2ω]L/2−d is bounded by one so the variance of W1,n(v) will be finite. Because a wavelet requires L≥2, the Daubechies class of wavelets transforms vt into a stationary process of wavelet coefficients regardless of vt being stationary (d<1/2) or strictly stationary (1/2≤d<1). Furthermore, when d≥1, in other words, when vt is a nonstationary explosive process, W1,n(v) will be stationary with finite variance when the wavelet transform is performed with any L-ordered, Daubechies, wavelet where the number of vanishing moments, L/2, is greater than d. This stationarity result generalizes to wavelet coefficients having scale parameters j>1 where


See Percival (1995, Section 6).

From Eq. (18) the variance of W1,n is independent of the value n; i.e. all the wavelet coefficients of equal scale j have the same variance. Wavelet coefficients of the same scale j are thus homoskedastic but hetroskedastic over different scales. Empirical work with Wj,n thus requires computing this variance at each scale, j, using a analytic formula similar to Eq. (18). Given the summation operator this would involve numerically evaluating the integral a number of times for different values of l. A very computationally costly calculation.

An alternative method to computing the variance of Wj,n(v) is to recall that ℋj is approximately equal to the squared-gain function of an ideal bandpass filter supported on the octave (π/2j, π/2j−1]. Employing this approximation in the calculation of the wavelet variances leads to numerically evaluating:


This approximation becomes more exact as L increases, or in other words, the number of vanishing moments L/2 increases. However, simulation studies have shown that even for small values of L this approximation is very good (Percival and Walden 2000, Ch. 8).

Drawing on the ideal bandpass nature of the wavelet, McCoy and Walden (1996) and Jensen (1999, 2000) show that, like the Fourier function, the wavelet is a near diagonalizing operator to a stationary ARFIMA processes covariance matrix. If the number of vanishing moments of the wavelet exceed the differencing parameter d the wavelet coefficient’s covariance between and across scales is equal to zero. As a result we can treat the wavelet coefficients Wj,n(v) as being a nearly independent stationary process with mean zero and variance approximately equal to Eq. (19).

5 Bayesian analysis with wavelets

The SV-FIAR model can be converted into a infinite dimensional, state-space model by squaring the returns, yt, and applying the logarithmic transformation, to obtain the SV-FIAR model’s linear offset representation:


where yt=log(yt2+c),zt=logξt2+1.2704, and logξt2 is distributed logχ(1)2 with mean –1.2704 and variance π2/2. The offset constant c is introduced to reduce the explosive nature of the log-squared returns that occurs when returns are close to or equal to zero (see Fuller 1996, pp. 495–496; Kim, Shephard, and Chib 1998). Throughout this article we will stick to a offset representation with c=0.0005.

We project the SV-FIAR linear offset series, yt, into the time-scale space of the wavelet domain and arrive at the linear relationship between the wavelet coefficients of y* and v:


where j=1, …, max≡[log2T], with [·] being the integer portion of the argument, n=1, …, T/2j, and Wj,n(y),Wj,n(logσ2), Wj,n(v) and Wj,n(z) are the wavelet coefficients of y*, log σ2, v and z, respectively. Note that when the number of observations T is a power of two the number of dilated and translated wavelet coefficients equals T–1. For convenience we assume that T is always a power of 2, however, when the data set is not a power of two in length we follow the practice of Ogden (1997) and pad the actual data up to a power of two by appending in reverse the observed data.

Being a constant unknown parameter, log σ2 wavelet coefficients are all zero.[9] Hence, we drop Wj,n(logσ2) from Eq. (21) and write Wj,n(y) as


As a “near” diagonalizing operator the wavelet transformed latent volatilities are normally distributed[10]


where W(v)=(W1(v),,Wmax(v)),. From the results of Section 4.2 on the wavelet variances and covariances of a fractionally integrated, autoregressive process, the wavelet coefficients covariance matrix is approximately



(24)σj2=2j+1ση22π2jπ2j+1π1|1ϕeiω|2|1eiω|2ddω,   j=1,,max,

and Ij is the T/2j×T/2j identity matrix.

By projecting y* into the time-scale space of the wavelet basis, Section 3 objective of drawings from π(β, v|y) changes to designing a MCMC sampler that draws β=(ϕ,d,ση2) and W(v) from the SV-FIAR wavelet domain posterior distribution, π(β,W(v)|W(y)), given the observed log-squared returns entire vector of wavelet coefficients, W(y). The near independent, multivariate, Gaussian distribution of W(v) affords us an easy and efficient way of sampling the latent volatilities. Instead of inefficiently drawing the highly correlated v, the volatilities wavelet coefficient vector, W(v), can be quickly and efficiently sampled from the posterior


where f(W(y)|Wj,n(v),β) is the likelihood function and the conditional prior π(Wj,n(v)|β) is defined by Eq. (23). We will detail the first and second moments of this conditional normal posterior distribution in Section 5.3, but for now it should be clear that the independent wavelet coefficients, Wj,n(v), enables us to sidestep the difficult task of sampling the highly correlated vt.

5.1 Sampling distribution

The conditional distributions, β|W(y),W(v) and W(v)|W(y),β are intractable since the sampling distribution of Wj,k(y) is not normal. Unlike the time-domain where the zt in Eq. (20) is known to be distributed log-chi-square, the distribution of Wj,k(z) is not known. Thus, we are unable to use the method of Kim, Shephard, and Chib (1998) to set the parameters of a mixture of normal distributions to match the first four moments of π(Wj,k(z)). Instead, we choose to use a Bayesian semiparametric estimator with a Dirichlet process prior to estimate and fix the approximate mixture distribution representation of π(W(z)).[11]

A sample of z with T=4096 observations is generated by applying the log-square transformation to a random draw from the standard normal distribution and adding 1.2704 to the simulate series. The wavelet transform is then applied to z to produce a realization of W(z). A nonparametric Bayesian estimator with a Dirichlet process prior applied to W(z) results in a second-order mixture of normal distributions where the mixture probabilities are π=(0.798, 0.202)′ and the means and variances of the two mixture clusters are μ=(0.269, –0.994)′ and ν=(1.732, 3.245)′, respectively. Thus, the date generating process for W(z) is approximated by the normal mixture distribution


Under this mixture representation of Wj,k(z) distribution, the likelihood for W(y) equals


where sj,n=1, 2, such that π2≡Prob(Sj,n=1)=0.798 and π2≡Prob(sj,n=2)=0.202, is the mixture cluster assignment variable and the assignment vector is s={sj,n}, where j=1, …, max and n=1, …, 2max/j. As the assignment variable of the j-scale, n-translated, wavelet coefficient of y* to a particular mixture cluster, the sj,n are unknown binomial distributed random variables whose prior is the above discrete mixture probabilities.

5.2 Prior

To complete our Bayesian sampler we still need to specify the form of the prior π(β). We first assume that all the unknown parameters are mutually independent; i.e.


To ensure the value of d can produce both stationary and nonstationary, long-memory behavior in volatility we choose an non-informative, uniform prior for π(d) with the support (0, 2). For the autoregressive parameter ϕ we also choose a non-informative prior that is uniform over the region where autoregressive processes are stationary, |ϕ|<1. Lastly, we assume our initial knowledge concerning ση2 is accurately reflected by the diffuse, inverse gamma distribution


with hyperparameters δ0=2 and v0=0.02.

5.3 Sampler

To sample from the desired joint posterior distribution, β,W(v),s|W(y), we design a MCMC sampler that generalizes the Metropolis-Hastings (MH) algorithm found in Jensen (2004). Here we compute the latent wavelet coefficient’s variances by numerically evaluating the integral found in Eq. (24). Our hybrid MCMC algorithm is as follows

  1. Initialize s, β, W(v).

  2. Jointly sample β and W(v) from β,W(v)|W(y),s by drawing,

    1. β from β|W(y),s, and then drawing,

    2. W(v) from W(v)|W(y),β,s.

  3. Sample s from s|W(y),W(v).

  4. Repeat Steps 2 and 3.

The MCMC algorithm begins in Step 1 with the initialization of the mixture distribution’s assignment vector, s, parameter vector, β, and latent volatilities wavelet coefficients, W(v). Step 2 is the method of composition approach to jointly sampling, β and W(v). In Step 2(a), β is sampled from β|W(y),s, where W(v) has been marginalized out, and in Step 2(b), W(v) is drawn from W(v)|W(y),s. This reduced blocking scheme of drawing β,W(v)|W(y),s eliminates the functional dependency that exists between W(v) and its parameters, β. In contrast, sampling from β|W(y),W(v),s, and W(v)|W(y),β,s, produces draws of W(v) and β that are strongly correlated. Lastly, Step 3 in the above sampler, involves independently sampling the sj,n, j=1, …, max, n=1, …, 2max−j, from the set {1, 2} where the probability of selecting the ith element equals

P(sj,n=i|Wj,n(y),Wj,n(v))πifN(Wj,n(y)|Wj,n(v)+μi,νi),   i=1,2,

where π1=0.789 and π2=0.202.

The conditional distribution, β|W(y),s, in Step 2(a) is proportional to


where from the likelihood function in Eq. (25)


with σj2 being defined in Eq. (24).

Eq. (27) is a nonstandard distribution and hence, draws from β|W(y),s cannot be made from a standard distribution. We choose to employ the tailored Metropolis-Hasting (MH) algorithm to sample this intractable distribution.[12] With the tailored MH sampler we make a candidate draw of β|W(y),s by sampling from the multivariate Student-t density ft(·|m, (η–2)V, η), whose mean vector is m, with scale matrix, (η–2)V, and η degrees of freedom. In practice we set the Student-t mean and covariance equal to


whose values are obtained with the quasi-Newton maximization algorithm developed by Broyden, Fletcher, Goldfarb, Shanno initialized at the value of β from the MCMC samplers previous sweep. The degrees of freedom is set to η=10.

Denote the candidate draw from the Student-t distribution as β′. The draw, β′, is accepted as a realization from β|W(y),s with Metropolis-Hastings probability


where β is the value of the parameter from the previous sweep. In other words, β′ will be accepted with probability α(β,β|W(y),s) as a realization from β|W(y),s, or conversely, β will be kept with probability 1α(β,β|W(y),s) as the sampler’s current realization.

Because of the wavelets approximate independent nature, draws in Step 2(b) from the multivariate distribution W(v)|W(y),β,s can be sampled individually from the tractable univariate distributions


where π(Wj,n(v)|β)N(0,σj2) and the likelihood function is


for j=1. …, max, and j=1, …, 2max−j. After completing the square the conditional latent wavelet coefficients is found to be independently distributed



W¯j,n=(1/σsj,n21/σsj,n2+1/σj2)W¯j,n(y),   υjn2=11/σsj,n2+1/σj2,

and W¯j,n(y)=Wj,n(y)μsj,n.

6 Applications

In this section we report the results from our MCMC algorithm when applying it to simulated and empirical daily stock return data. Daubechies (1992, p. 198) least asymmetric wavelet with L=8 nonzero filter coefficients (four vanishing moments) is used in every instance. A number of issues need to be taken into consideration when choosing the particular family of wavelets and its number of vanishing moments; issues like boundary effects, possible statistical artifacts caused by a wavelet with a large number of vanishing moments, and how well the wavelet approximates an ideal bandpass filter. We choose the Daubechies least asymmetric class of wavelets with four vanishing moments because it favorably addresses each of these items while possessing the extraordinary theoretical property of diagonalizing a long-memory process’s covariance matrix (Percival and Walden 2000, pp. 346–349; Whitcher, Guttorp, and Percival 2000).

6.1 Quasi-returns

We generate and estimate a collection of SV-FIAR models having T=4096 observations and consisting of 12 different combinations of the autoregressive and differencing parameter values; ϕ=0, 0.5, 0.75, 0.99 and d=0, 0.45, 0.6, 1.0.[13] These particular values of ϕ and d generate a wide variety of time series behavior in the volatility process, vt, including weak to strong persistent autoregressive dynamics, stationary and strict stationary, long-memory, and explosive, nonstationary unit root behavior. Each of the 12 simulated return series is generated using the instantaneous volatility value of σ2=1 and the volatility of volatility ση2=0.01. In robustness tests, we found that the estimates of d and ϕ were unaffected by larger values for σ and ση but we choose not to report them here.

Table 1 lists the parameter estimates and their summary statistics for the 12 different data generating SV-FIAR models. Each simulated SV-FIAR model result is separated in the table by horizontal lines. To encourage convergence of the MCMC algorithm to the target posterior density, for each simulated series, only the last 5000 draws of β and W(v) out of 6000 MCMC sweeps are used to make inference concerning the parameter values. In the second through sixth column of Table 1 we list the model’s true parameter values, ση2,d and ϕ1, their posterior mean which equal the sample mean of the MCMC draws, their numerical standard error (NSE), the sampler’s inefficiency measure (Ineff.) for the particular parameter, and the 5 and 95 percentiles (in parenthesis) of the parameter’s posterior distribution. Finishing out the table in the seventh column is the tailored Metropolis-Hasting rejection rate, 1–α(β, β′), for the candidate draw from ββ|W(y),s.

Table 1:

Bayesian estimates (6000 draws with the first 1000 draws discarded) of the autoregressive and fractional differencing parameter for a simulated SV-FIAR model where NSE is the numerical standard error, Ineff. is the inefficiency factor, 90% Conf. denotes the 5 and 95 percentiles of the posterior distribution, and 1–α(β, β′) is the MH rejection rate.

yt=exp{vt/2}ξt,(1ϕB)(1B)dvt=0.01ηt,t=1, …, 4096
TruthMeanNSEIneff.90% Conf.1–α(β, β′)
d00.19370.03176.180(0.0630, 0.3372)0.375
ϕ0.990.97150.00077.593(0.9489, 0.9870)
d1.01.01560.01877.185(0.9140, 1.1232)0.453
ϕ00.30800.28479.469(–0.0510, 0.6187)
d0.450.24500.098710.367(0.0384, 0.4094)0.427
ϕ0.750.87000.029114.486(0.7812, 0.9479)
d0.450.36310.148217.405(0.1538, 0.5174)0.516
ϕ0.50.58730.358118.247(0.3018, 0.8630)
d0.450.40820.01572.592(0.2368, 0.5487)0.284
ϕ0–0.12770.16672.734(–0.5802, 0.3947)
d0.60.49250.04416.677(0.3255, 0.6476)0.299
ϕ0.750.83870.01839.588(0.7330, 0.9141)
d0.60.49900.160719.875(0.2862, 0.6424)0.538
ϕ0.50.72420.225826.533(0.5649, 0.8919)
d0.60.62110.02315.766(0.4949, 0.7410)0.371
ϕ00.05350.36887.364(–0.3726, 0.4621)

Note that some of the results are from misspecified models when the true value of either ϕ or d is zero.

The inefficiency measure indicates how slow our MCMC sampler converges to the target posterior density by measuring the level of dependency between the drawn βs. The higher the inefficiency the greater the number of draws of β needed to reach a particular level of Monte Carlo accuracy. The inefficiency measure for a parameter is:


where K(·) is Parzen’s filter (see Percival and Walden 1998, p. 265), ρ(·)is the sample autocorrelation function of the drawn parameter, N is the number of draws (N=5000), and J is the largest lag at which the autocorrelation function is computed (In the simulations below we find that the MCMC draws correlogram decays so rapidly that J=100 is adequate.).

The measure of inefficiency quantifies the loss associated with using correlated draws to make inference concerning the parameter as opposed to truly independent draws. This inefficiency measure for each unknown is also used to compute the numerical standard error (NSE) of the parameter’s posterior mean by taking the square root of the product between the parameter’s inefficiency and the sample variance of its draws (Geweke 1992).

The parameter inefficiency measures and MH rejection rates in Table 1 are indicative of a MCMC sampler that is producing weakly correlated draws from over the entire support of the posterior distribution. In the simulations the efficiency measures are no bigger than 30 and in most cases less than ten. Small inefficiency measures support our decision to jointly sample the latent volatilities wavelet coefficients and the parameters with the two step composition approach of first sampling from β|W(y),s with the tailored MH algorithm followed by individually drawing the latent volatilities wavelet coefficients.

The tailored MH portion of the sampler mixes well with the conditional sampler of the parameter vector accepting nearly half of the candidate draws of d and ϕ. Acceptance rates of this size for a distribution with the dimension of β|W(y),s are indicative of a accurate Metropolis-Hastings algorithm that has converged and is drawing from the desired target distribution. It also shows that the sampler is not getting stuck in any particular region of the distribution but is spanning the distributions entire domain.

From the parameter estimates reported in Table 1 our Bayesian estimator is capable of distinguishing between a unit root (d=1) in volatility and strongly persistent autoregressive behavior (ϕ=0.99). When the simulated returns are generated from a SV-FIAR model with d=0 and ϕ=0.99, the sampler correctly assigns the autoregressive persistence to ϕ and not d by finding ϕ to equal 0.9715.[14] At the other end of the persistence spectrum, when the return series is generated with a volatility process where d=1 and ϕ=0, our estimate of the differencing parameter captures the nonstationary unit root behavior of volatility by estimating d=1.0156 and ϕ=0.3080.

The ability to distinguish between, and assign autoregressive and long memory persistence to their rightful parameter, also holds when ϕ and d move away from the extreme cases of d=1, ϕ=0, and d=0, ϕ=0.99. However, as the level of persistence assigned to ϕ increases a portion of the long-memory dynamics gets assigned by the estimator to ϕ and not to d. This reallocation of persistence away from d to ϕ causes d to be under estimated. Examples of this are seen in the quasi-returns where ϕ=0.75. When volatility is generated with ϕ=0.75 and d=0.45, the fractional differencing parameter estimate is d=0.245. When the true value of d is increased to 0.6, our estimate of d also increases but only to 0.4925. This is still less than the true value but noticeably less than when the true factional differencing parameter was d=0.45.

Under estimating d also occurs, but to a smaller degree, when volatility is simulated with ϕ=0.5. Estimates of d when volatility is generated with ϕ=0.5 and d=0.45, 0.6 are, respectively, 0.3631 and 0.499. The only simulated series where the fractional differencing parameter is over estimated occurs when volatility is void of a autoregressive term (ϕ=0). Even though misspecified with a non-zero autoregressive parameter, the estimate of d and ϕ are still very close to their true values. Hence, the simulation results suggest our estimator is capturing some of the strongly persistent, low frequency, long-memory dynamics of volatility in its estimate of ϕ. The residual low frequency energy not estimated by ϕ is then captured in the estimate of d.

Our wavelet based estimator of the fractional differencing parameter of latent volatility is also robust to values of d that lead to volatility being nonstationary; i.e. when d is greater than 1/2. For each of the quasi SV-ARFI returns simulated with d=0.6, 1.0 the 90% posterior probability interval of d contains the true value of d. In addition, when volatility is simulated with a nonstationary, long-memory model with no short-memory autoregressive term the SV-FIAR model’s posterior mean of d is within 0.02 of its true value. From these findings one can set aside their worries about over differencing the log-squared returns when estimating a nonstationary, stochastic volatility process with our time-scale wavelet space estimator.

The posterior draws {d(i)} and {ϕ(i)} also provide good inference concerning the autoregressive and long-memory dynamics of volatility. In only three out the 12 simulated SV-FIAR return series does the 90% Bayesian probability interval fail to contain the true parameter value. Two of these are the probability intervals of ϕ when the true autoregressive parameter is 0.99 and 0.5. The third is the probability interval for d when the fractional differencing parameter is 0.45 and the autoregressive parameter was 0.75. Even in these three instances the 90% probability interval for the other parameter contains that parameters actual value. So the posterior uncertainty around the parameter estimate can, in general, be trusted to separate out autoregressive dynamics from long-memory.

6.2 Market returns

In this section we report the results from applying our nonstationary robust, Bayesian estimator of the SV-FIAR model to the continuously compounded daily returns of the Center for Research in Security Prices (CRSP) value weighted portfolio index. The returns are corrected for the effects of stock splits and dividends and consist of 8192 observations from July 22, 1974 to December 29, 2006.[15]

In Table 2 we report the summary statistics of the MCMC draws for the SV-FI model, where the autoregressive parameter ϕ is set equal to zero, and the SV-FIAR model. A total of 6000 sweeps of the MCMC algorithm are made of which the final 5000 draws are used in our posterior analysis of d and ϕ. The MH-rejection rates in Table 2 suggest the β sampler is mixing in a similar manner to the quasi-return cases. The rejection rate is 21% for the SV-FI model and 31% when ϕ is estimated. By being small the inefficiency measures for d and ϕ are again giving us confidence that our sampler is working efficiently and producing draws from the entire posterior distribution.

Table 2:

SV-FIAR model estimates for compounded daily returns of the Center for Stock Return Prices value weighted portfolio over the period from July 22, 1974 to December 29, 2006 (T=8192).

ModelβMeanNVARIneff.90% Conf.1–α(β, β′)
SV-FId0.58960.00181.8660(0.5291, 0.6505)0.2053
SV-FIARd0.65650.00503.2559(0.5799, 0.7336)0.3133
ϕ–0.56340.09074.5891(–0.7945, –0.2455)

In Figure 2 the posterior draws of d for the SV-FI model are plotted, along with the histogram, and the first 20 lags of the draws correlogram. Figure 3 graphs the posterior draws, histogram and lagged autocorrelations for the d and ϕ from the SV-FIAR model. Since the efficiency factors are less than 5, the decay in the correlograms of Figures 2 and 3 show very little correlations between the parameter draws. This give us comfort that our sampler has converged to the underlying posterior distribution and statistical inference on the unknown value of d and ϕ is possible with these draws.

Figure 2: The 5000 MCMC draws of d, their histogram and correlogram for the SV-FI model of daily compounded returns of the Center for Stock Return Prices value weighted portfolio over the period from July 22, 1974 to December 29, 2006 (T=8192).
Figure 2:

The 5000 MCMC draws of d, their histogram and correlogram for the SV-FI model of daily compounded returns of the Center for Stock Return Prices value weighted portfolio over the period from July 22, 1974 to December 29, 2006 (T=8192).

Figure 3: The 5000 MCMC draws of d and ϕ1, their histograms and correlograms for the SV-FIAR model of daily compounded returns of the Center for Stock Return Prices value weighted portfolio over the period from July 22, 1974 to December 29, 2006 (T=8192).
Figure 3:

The 5000 MCMC draws of d and ϕ1, their histograms and correlograms for the SV-FIAR model of daily compounded returns of the Center for Stock Return Prices value weighted portfolio over the period from July 22, 1974 to December 29, 2006 (T=8192).

From the posterior draws of the SV-FI model’s fractional differencing parameter the density of the posterior distribution of d is centered around 0.5896 with a 90% Bayesian probability interval of (0.5291, 0.6505). Such values for the fractional differencing parameter suggests volatility is more persist than had previously been thought, so much so that volatility is a mean-reverting, long-memory process but whose variance is infinite.

When the autoregressive coefficient is included in the model, the value of d increases. This suggests volatility of the stock market is even more persistent and nonstationary than inferred by the SV-FI model. Past short-memory, auto-regresssive, stochastic volatility models of the equity market have generally found ϕ to be near one. Hence, one might expect that by including the autoregressive parameter ϕ in the SV-FIAR model it would take away some of the persistence associated with the fractionally differencing operator. However, the opposite occurs. In the SV-ARFI model d increases to 0.66 and the 90% probability interval of d lies over the larger values of (0.59, 0.73). The posterior mean of the autoregressive ϕ, equals –0.56 and the 90% probability interval of (–0.79, –0.25) infers a short-memory dynamics opposite to that of persistence. The histograms of ϕ in Figure 3 suggest its posterior distribution is not even supported over positive values and hence, persistent dynamics.

Our negative estimate of ϕ stands in contrast to the findings of Bollerslev and Mikkelsen (1996), who also find the conditional hetroskedasticity of daily stock market returns following a non-stationary, long-memory, process but with an autoregression parameter that is positive. Unlike the approach we take here of first fitting a long-memory model of the conditional variance and then adding the autogressive term, Bollerslev and Mikkelsen first estimate a short-memory autogressive model finding, as expected, ϕ to be very close to one. When they include the fractional differencing operator in their model it captures the non-stationary persistent behavior of the conditional hetroskedasticity in an estimate of d greater than one-half. This comes at the expense of ϕ whose value declines but continues to be positive and large relative to the unit circle. Hence, in Bollerslev and Mikkelsen (1996) the highly persistent dynamics of conditional hetroskedasticity is captured by both the fractional differencing operator and the autoregressive term.

One possible explanation for our ϕ being opposite in sign to Bollerslev and Mikkelsen (1996) is found in their finite series representation of the fractional differencing operators infinite series. By truncating the fractional differencing operator they implicitly assume shocks associated with the dropped terms have no effect on current levels of volatility. The implications of this is there will be less persistence associated with the truncated series relative to the actual fractional differencing operator. Since the persistence associated with the autoregressive term is uneffected by the truncation, the autogressive term picks up the slack and tries to capture this lost persistence through a larger value of ϕ.

Our approach to estimating d does not appoximate the fractional differencing operator. Instead, it uses the entire wavelet representation of the fractional differening operator so that our estimator of d captures all of the persistence associated with the fractional differencing operator. Our approach leaves no leftover long-memory behavior in the volatility for the autogressive term to pick up. The ϕ estimated with our wavelet domain approach is able to model the short term dynamics of volatility and is not contaminated by having to mop up any left over long-memory behavior lost from approximating the fractional differencing operator. With our results, we can safely say market volatlity is a highly persistent, nonstationary, mean-reverting process, whose short-run dynamics consist of a back and forth, flipping dynamic.

7 Conclusion

In this paper we have designed a quick and efficient Markov chain Monte Carlo sampler of the fractional differencing and autoregressive parameters from the posterior distribution of a weakly and strictly stationary SV-FIAR model. The key contributions of our MCMC sampler is the joint estimation of the autoregressive and fractional differencing parameter from within the wavelet domain, the ability of the Bayesian, wavelet domain, estimator to handle fractionally integrated volatility that are either weakly stationary processes with finite variance, or strictly stationary processes whose variance is infinite but still mean-reverting. Simulations show that the proposed MCMC sampler is a reliable estimator of the SV-FIAR model’s unknown parameters. Applying the SV-FIAR model’s Bayesian, wavelet domain estimator to CRSP market returns shows the long-memory observed in the volatility of stock market returns is of the highly persistent, infinite variance, but mean-reverting type that comes from the fractional differencing parameter being larger than 1/2.

Corresponding author: Mark J. Jensen, Federal Reserve Bank of Atlanta – Research-Georgia, USA, e-mail:


Andersen, T., T. Bollerslev, F. X. Diebold, and H. Ebens. 2001a. “The Distribution of Realized Stock Return Volatility.” Journal of Financial Economics 61: 43–76.10.1016/S0304-405X(01)00055-1Search in Google Scholar

Andersen, T., T. Bollerslev, F. X. Diebold, and P. Labys. 2001b. “The Distribution of Realized Exchange Rate Volatility.”Journal of the American Statistical Association 96: 42–55.10.1198/016214501750332965Search in Google Scholar

Andersen, T., Bollerslev, T., Diebold, F. X., and P. Labys. 2003. “Modeling and Forecasting Realized Volatility.” Econometrica 71: 529–626.10.3386/w8160Search in Google Scholar

Arteche, J. 2004. “Gaussian Semiparametric Estimation in Long Memory in Stochastic Volatility and Signal Plus Noise Models.” Journal of Econometrics 119: 131–154.10.1016/S0304-4076(03)00158-1Search in Google Scholar

Bae, S. K., M. J. Jensen, and S. G. Murdock. 2005. “Long-run Neutrality in a Fractionally Integrated Model.” Journal of Macroeconomics 27: 257–274.10.1016/j.jmacro.2003.11.019Search in Google Scholar

Baillie, R. T. 1996. “Long-memory Processes and Fractional Integration in Econometrics.” Journal of Econometrics 73: 5–59.10.1016/0304-4076(95)01732-1Search in Google Scholar

Baillie, R. T., T. Bollerslev, and H. O. Mikkelsen. 1996. “Fractionally Integrated Generalized Autoregressive Conditional Hetroskedasticity.” Journal of Econometrics 74: 3–30.10.1016/S0304-4076(95)01749-6Search in Google Scholar

Beran, J. 1994. Statistics for Long-Memory Processes. New York: Chapman & Hall.Search in Google Scholar

Bollerslev, T., and H. O. Mikkelsen. 1996. “Modeling and Pricing Long Memory in Stock Market Volatility.” Journal of Econometrics 73: 151–184.10.1016/0304-4076(95)01736-4Search in Google Scholar

Bollerslev, T., and H. O. Mikkelsen. 1999. “Long-term Equity Anticipation Securities and Stock Market Dynamics.” Journal of Econometrics 92: 75–99.10.1016/S0304-4076(98)00086-4Search in Google Scholar

Breidt, F. J., N. Crato, and P. de Lima. 1998. “On the Detection and Estimation of Long-memory in Stochastic Volatility.” Journal of Econometrics 83: 325–348.10.1016/S0304-4076(97)00072-9Search in Google Scholar

Brockwell, P. J., and R. A. Davis. 1993. Time Series: Theory and Methods. 2nd ed. New York: Springer.Search in Google Scholar

Chan, N. H., and W. Palma. 1998. “State Space Modeling of Long-memory Processes.” Annals of Statistics 26: 719–740.10.1214/aos/1028144856Search in Google Scholar

Cheung, Y. W., and K. S. Lai. 1993. “A Fractional Cointegration Analysis of Purchasing Power parity.” Journal of Business and Economic Statistics 11: 103–112.Search in Google Scholar

Chib, S. 2001. “Markov Chain Monte Carlo Methods: Computation and Inference.” In: Handbook of Econometrics, Vol. 5, edited by J. J. Heckman and E. Leamer, 3569–3649. Amsterdam: Elsevier Science.10.1016/S1573-4412(01)05010-3Search in Google Scholar

Chib, S., and E. Greenberg. 1995. “Understanding the Metropolis-Hastings Algorithm.” American Statistician 49: 327–335.Search in Google Scholar

Chib, S., and E. Greenberg. 1996. “Markov Chain Monte Carlo Simulation Methods in Econometrics.” Econometric Theory 12: 409–431.10.1017/S0266466600006794Search in Google Scholar

Chib, S., and E. Greenberg. 1998. “Analysis of Multivariate Probit Models.” Biometrika 85: 347–361.10.1093/biomet/85.2.347Search in Google Scholar

Chib, S., F. Nardari, and N. Shephard. 2002. “Markov Chain Monte Carlo Methods for Stochastic Volatility Models.” Journal of Econometrics 108: 281–316.10.1016/S0304-4076(01)00137-3Search in Google Scholar

Daubechies, I. 1988. “Orthonormal Bases of Compactly Supported Wavelets.” Communication in Pure Applied Mathematics 41: 909–996.10.1137/1.9781611970104.ch6Search in Google Scholar

Daubechies, I. 1992. Ten Lectures on Wavelets. Philadelphia: SIAM.10.1137/1.9781611970104Search in Google Scholar

Deo, R. S., and C. M. Hurvich. 2001. “On the Log Periodogram Regression Estimator of the Memory Parameter in Long-memory Stochastic Volatility Models.” Econometric Theory 17: 686–710.10.1017/S0266466601174025Search in Google Scholar

Ding, Z., C. W. J. Granger, and R. F. Engle. 1993. “A Long Memory Property of Stock Market Returns and a New Model.” Journal of Empirical Finance 1: 83–106.10.1016/0927-5398(93)90006-DSearch in Google Scholar

Frederiksen, P., and M. Ø. Nielsen. 2008. “Bias-reduced Estimation of Long-memory Stochastic Volatility.” Journal of Financial Econometrics 6: 496–512.10.1093/jjfinec/nbn009Search in Google Scholar

Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.10.1002/9780470316917Search in Google Scholar

Gallegati, M., and W. Simmler. 2014. Wavelet Applications in Economics and Finance. New York: Springer.10.1007/978-3-319-07061-2Search in Google Scholar

Gelfand, A. E., and A. F. M. Smith. 1990. “Sampling-based Approaches to Calculating Marginal Densities.” Journal of the American Statistical Association 85: 398–409.10.21236/ADA208388Search in Google Scholar

Geweke, J. 1992. “Evaluating the Accuracy of Sampling-based Approaches to the Calculation of Posterior Moments.” In: Bayesian Statistics, edited by J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, 169–193. New York, NY: Oxford University.10.21034/sr.148Search in Google Scholar

Gradshteyn, I. S., and I. M. Ryzhik. 1994. Table of Integrals, Series, and Products, translated from the Russian. Translation fifth edited by A. Jeffrey, San Diego: Academic Press.Search in Google Scholar

Granger, C. W. J., and Z. Ding. 1996. “Varieties of Long Memory Models.” Journal of Econometrics 73: 61–77.10.1016/0304-4076(95)01733-XSearch in Google Scholar

Granger, C. W. J., and R. Joyeux. 1980. “An Introduction to Long-memory Time Series Models and Fractional Differencing.” Journal of Time Series Analysis 4: 221–237.10.1017/CCOL052179207X.018Search in Google Scholar

Harvey, A. C. 1998. “Long-memory in Stochastic Volatility” In: Forecasting Volatility in the Financial Market, edited by J. Knight, and S. Sachell. London: Butterworth-Heineman.10.1016/B978-075066942-9.50018-2Search in Google Scholar

Harvey, A. C., and N. Shephard. 1996. “Estimation of an Asymmetric Stochastic Volatility Model for Asset Returns.” Journal of Business and Economic Statistics 14: 429–434.10.1080/07350015.1996.10524672Search in Google Scholar

Hirano, K. 2002. “Semiparametric Bayesian Inference in Autoregressive Panel Data Models.” Econometrica 70: 781–799.10.1111/1468-0262.00305Search in Google Scholar

Hosking, J. R. M. 1981. “Fractional Differencing.” Biometrika 68: 165–176.10.1093/biomet/68.1.165Search in Google Scholar

Hosking, J. R. M. 1984. “Modeling Persistence in Hydrological Time Series Using Fractional Differencing.” Water Resources Research 20: 1898–1908.10.1029/WR020i012p01898Search in Google Scholar

Hurvich, C. M., and B. K. Ray. 2003. “The Local Whittle Estimator of Long-memory Stochastic Volatility.” Journal of Financial Econometrics 1: 445–470.10.1093/jjfinec/nbg018Search in Google Scholar

Hurvich, C. M., E. Moulines, and P. Soulier. 2005. “Estimating Long Memory in Volatility.” Econometrica 73: 1283–1328.10.1111/j.1468-0262.2005.00616.xSearch in Google Scholar

In, F., and S. Kim. 2013. An Introduction to Wavelet Theory in Finance: A Wavelet Multiscale Approach. Singapore: World Scientific Publishing.10.1142/8431Search in Google Scholar

Jensen, M. J. 1999. “An Approximate Wavelet MLE of Short and Long-memory Parameters.” Studies in Nonlinear Dynamics and Econometrics 3: 239–353.10.2202/1558-3708.1051Search in Google Scholar

Jensen, M. J. 2000. “An Alternative Maximum Likelihood Estimator of Long-memory Processes Using Compactly Supported Wavelets.” Journal of Economic Dynamics and Control 24: 361–387.10.1016/S0165-1889(99)00010-XSearch in Google Scholar

Jensen, M. J. 2004. “Semiparametric Bayesian Inference of Long-memory Stochastic Volatility.” Journal of Time Series Analysis 25: 895–922.10.1111/j.1467-9892.2004.00384.xSearch in Google Scholar

Jensen, M. J., and J. M. Maheu. 2010. “Bayesian Semiparametric Stochastic Volatility Modeling.” Journal of Econometrics 157: 306–316.10.1016/j.jeconom.2010.01.014Search in Google Scholar

Kim, S., N. Shephard, and S. Chib. 1998. “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models.” Review of Economic Studies 65: 361–393.10.1111/1467-937X.00050Search in Google Scholar

Lai, M. J. 1995. “On the Digital Filter Associated with Daubechies’ Wavelets.” IEEE Transactions on Signal Processing 43: 2203–2205.10.1109/78.414786Search in Google Scholar

Mallat, S. 1999. A Wavelet Tour of Signal Processing. 2nd ed. New York: Academic Press.10.1016/B978-012466606-1/50008-8Search in Google Scholar

McCoy, E. J., and A. T. Walden. 1996. “Wavelet Analysis and Synthesis of Stationary Long-memory Processes.” Journal of Computational and Graphical Statistics 5: 26–56.Search in Google Scholar

Nelson, D. B. 1991. “Conditional Hetroskedasticity in Asset Pricing: A New Approach.” Econometrica 59: 347–370.10.2307/2938260Search in Google Scholar

Ogden, R. T. 1997. “On Preconditioning the Data for the Wavelet Transform when the Sample Size is Not a Power of Two.” Communication in Statistics B 26: 267–285.Search in Google Scholar

Percival, D. B. 1995. “On Estimation of the Wavelet Variance.” Biometrika 82: 619–631.10.1093/biomet/82.3.619Search in Google Scholar

Percival, D. B., and A. T. Walden. 1998. Spectral Analysis for Physical Applications. Cambridge, UK: Cambridge University Press.Search in Google Scholar

Percival, D. B., and A. T. Walden. 2000. Wavelet Methods for Time Series Analysis. Cambridge, UK: Cambridge University Press.10.1017/CBO9780511841040Search in Google Scholar

Shephard, N. 2005. Stochastic Volatility. Oxford, UK: Oxford University Press.Search in Google Scholar

Strang, G., and T. Nguyen. 1996. Wavelets and Filter Banks. Boston: Wellesley-Cambridge Press.Search in Google Scholar

Tanner, M. A., and W. H. Wong. 1987. “The Calculation of Posterior Distributions by Data Augmentation.” Journal of the American Statistical Association 82: 528–550.10.1080/01621459.1987.10478458Search in Google Scholar

Tierney, L. 1994. “Markov Chains for Exploring Posterior Distributions.” Annals of Statistics 21: 1701–1762.10.1214/aos/1176325750Search in Google Scholar

Whitcher, B., P. Guttorp, and D. B. Percival. 2000. “Wavelet Analysis of Covariance with Application to Atmospheric Time Series.” Journal of Geophysical Research 105: 14941–14962.10.1029/2000JD900110Search in Google Scholar

Supplemental Material:

The online version of this article (DOI: 10.1515/snde-2014-0116) offers supplementary material, available to authorized users.

Published Online: 2016-3-8
Published in Print: 2016-9-1

©2016 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 8.2.2023 from
Scroll Up Arrow