Copula multivariate GARCH model with constrained Hamiltonian Monte Carlo

Abstract The Copula Multivariate GARCH (CMGARCH) model is based on a dynamic copula function with time-varying parameters. It is particularly suited for modelling dynamic dependence of non-elliptically distributed financial returns series. The model allows for capturing more flexible dependence patterns than a multivariate GARCH model and also generalizes static copula dependence models. Nonetheless, the model is subject to a number of parameter constraints that ensure positivity of variances and covariance stationarity of the modeled stochastic processes. As such, the resulting distribution of parameters of interest is highly irregular, characterized by skewness, asymmetry, and truncation, hindering the applicability and accuracy of asymptotic inference. In this paper, we propose Bayesian analysis of the CMGARCH model based on Constrained Hamiltonian Monte Carlo (CHMC), which has been shown in other contexts to yield efficient inference on complicated constrained dependence structures. In the CMGARCH context, we contrast CHMC with traditional random-walk sampling used in the previous literature and highlight the benefits of CHMC for applied researchers. We estimate the posterior mean, median and Bayesian confidence intervals for the coefficients of tail dependence. The analysis is performed in an application to a recent portfolio of S&P500 financial asset returns.


Introduction
Modeling, predicting, and understanding the volatility of nancial time series lies in the core of nancial econometrics. Volatility prediction has a very wide range of applications, such as in portfolio optimization, risk management, asset allocation, and asset pricing. Since the seminal papers [12] and [5] introducing the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) modeling framework, voluminous research has been conducted on the topic, encompassing various model extensions and generalizations. The usual assumption in multivariate GARCH (MGARCH) models is that the conditional joint distribution of the returns follows a multivariate normal or multivariate t-distribution (see e.g. the survey [3]). However, these elliptical distribution models require a very strong symmetry of the data and may not be appropriate in many circumstances.
Researchers have been addressing two key aspects of volatility analysis: looking for the most appropriate model speci cation and selecting the most e cient approach for inference and prediction. In this paper, for the modeling aspect we adopt the copula multivariate GARCH (CMGARCH) model [20], featuring very appeal-ing dynamic dependence structure for nancial time series. We then focus on the latter aspect of inference and prediction by proposing Bayesian inference via Constrained Hamiltonian Monte Carlo (CHMC), which we show is particularly suited for practical inference relative to traditional random-walk based Markov Chain Monte Carlo (RW-MCMC) methods utilized for the CMGARCH in the previous literature [2]. In addition to parameter inference, we show that CHMC facilitates computationally e cient estimation of the posterior mean, median and Bayesian con dence intervals for the coe cients of tail dependence.
Traditional approaches measure dependence by linear correlation, quanti ed by the Pearson's correlation coe cient. However, linear correlation expresses a symmetric linear notion of dependence. Modern asset returns present nonlinear, non-Gaussian, and dynamic features. Copula functions, a powerful tool originating from mathematical analysis, can capture the dependence structures among nancial time series without the standard linear dependence and multivariate i.i.d return distribution restrictions. Since the 1990s, copulabased analysis have entered the core of dependence modeling in nance [10,26].
A recent stream of literature joins the dynamic MGARCH modeling framework with the copula dependence structure. The key advantage of this approach is that the individual marginal densities of the returns can be de ned separately from their dependence structure. Using this approach, many non-elliptical and exible multivariate distributions can be obtained, capturing the corresponding features of nancial portfolio series. In this context, most researchers have speci ed the marginal series following univariate GARCH processes and captured the dependence structure between them by a copula function to form the CMGARCH model [7,16,18,23,31]. However, the CMGARCH model requires a set of parameter constraints maintaining positivity of variances and covariance stationarity of the modeled stochastic processes. The model likelihood is then highly irregular, skewed, asymmetric, and truncated in regions of relatively high posterior density, hindering the applicability and accuracy of asymptotic inference [18,23]. Perhaps for this reason the number of application of CMGARCH has so far remained relatively limited. We contribute to this stream of literature by focusing on the practical aspects of computationally e cient model implementation.
The remainder of the paper is organized as follows. In Section 2 we lay out the CMGARCH model, including background on GARCH modeling and copula functions. In Section 3 we detail Constrained Hamiltonian Monte Carlo. In Section 4 we contrast CHMC with traditional RW-MCMC in the context of CMGARCH in an empirical application to a portfolio of S&P500 equity returns, quantifying the coe cients of tail dependence. Section 5 concludes.

Copula MGARCH
We will rst lay out the background of dynamic GARCH modeling and copula functions, and then describe the copula multivariate GARCH model considered in this paper. Bayesian analysis of the model with CHMC will follow next.

. GARCH Framework
Denote by y it the marginal nancial series for asset i = , ..., N and time t = , ..., T in an N-dimensional portfolio. Then y it follows a GARCH(1,1) model if where h it is the conditional variance of y it given the information set I i,t− = {y i,t− , y i,t− , ...}, and ε it are independent and identically distributed random variables with zero mean. The parameter constraints are required to ensure positivity of h it and covariance stationarity of y it , respectively. We assume that ε it follow the Student t-distribution, ε it ∼ tν i , with ν i degrees of freedom, zero mean and variance ν i /(ν i − ). The density of ε it , f (ε it ) is thus given by This distributional assumption is typical for modelling fat tails in univariate time series [5]. The conditional distribution function of each marginal series is then

. Copulas
A N-dimensional copula C(u , ..., u N ) is a multivariate distribution function in the unit hypercube [ , ] N , with uniform U( , ) marginal distributions. By the Sklar's Theorem, every joint distribution F(x , ..., x N ), whose marginals are given by F (x ), ..., F N (x N ), can be written as Furthermore, if the marginal distributions are continuous, then there is a unique copula associated to the joint distribution, F, that can be obtained from Conversely, given an N-dimensional copula, C(u , ..., u N ), and N univariate distributions, F (x ), ..., F N (x N ), the function (5) is an N-variate distribution function with margins F (x ), ..., F N (x N ) whose corresponding density function is given by where f i denotes the marginal density functions and c is the copula density function derived from (6) as .
The principal characteristic of a copula function is its ability to decompose the joint distribution into two parts: marginal distributions and dependence structure. Di erent dependence structures can combine the same marginal distributions into di erent joint distributions. Similarly, di erent marginal distributions under the same dependence structure can also lead to di erent joint distributions. Many parsimonious parametric families of copula functions have been proposed in the literature [26]. A fundamental case is the Gaussian copula, obtained from the multivariate normal distribution with correlation matrix R, given by where u = (u , ..., u N ) and Φ − is the inverse of the cumulative distribution function of the univariate standard normal distribution. The Gaussian copula assumes that there is no dependence in the tails of the distribution. Tail dependence re ects the limiting proportion of one variable reaching extreme values given that the other related variable has reached an extreme as well. Formally, it is de ned by [32] as follows.

De nition 1 (Tail dependence (2-dimensional case)). Let Z = (X, Y). The random vector Z is upper tail depen
provided that the limit exists. If λ U = , the Z is upper tail independent. λ U is called the upper tail dependence coe cient. Similarly, the lower tail dependence coe cient λ L is de ned as provided that the limit exist. If λ L = , the Z is lower tail independent, and is lower tail dependent if λ L > .
Such dependencies are of interest in nancial economics since nancial markets exhibit higher correlation in times of distress [9], hence using the Gaussian copula would not be appropriate since it assumes this tail dependency to be null. It is therefore often more useful to consider the t-copula, which is obtained from the multivariate t-distribution with η degrees of freedom and correlation matrix, R, given by where t − η denotes the inverse of the cumulative distribution function of the standard univariate Student-t distribution with η degrees of freedom. The Gaussian copula results as a special case of the t-copula when η goes to in nity. As the t-copula is of an elliptically symmetric distribution, the two measures λ U and λ L coincide and can be denoted simply by λ. While the Gaussian copula is characterized by zero tail dependence, λ = , [11] provide the following simple formula to represent the tail dependence coe cient λ of the t-copula: for a pair of random variables with correlation coe cient ρ.

. Copula Multivariate GARCH Model
We consider the CMGARCH model setup of [2]. The dependence structure between the marginal series is described by a time-varying t-copula¹ function with η degrees of freedom, as de ned in (8), with density at each time t given by Rt is the joint density of the standard multivariate Student-t distribution with η degrees of freedom and correlation matrix R t and f t η is the density of the standard univariate t-distribution with η degrees of freedom. The parameter matrix R t of the t-copula varies through time according to where a and b are nonnegative parameters, R is a time-invariant N × N positive de nite parameter matrix with unit diagonal elements and Ψ t− is an N × N matrix whose (i, j)-th element is given by gives the sample correlation of the copula-transformed error term ε it from equation (1), The dynamics in (10) are based on the correlation matrix evolution proposed by [33] in a multivariate GARCH model. The advantage of this speci cation is that the parameter matrix R t is a well-de ned positive de nite correlation matrix with unit diagonal elements, without the need to use any transformation such as the logistic function considered e.g. in [27] and [7]. In order to maintain stationarity, the model is constrained with where r ij is the (i, j)-th element of the parameter matrix R. Using (7), the log-likelihood function of the CMGARCH model is given by Figure 1 shows the model likelihood for (α , β ) with the constraint (4), α + β < , evidenced by the lack of likelihood contours in the upper right hand section of the graph. Such constraint would result in redundancies for traditional MCMC methods as any proposal falling into the region where the constraint is violated would be rejected. In contrast, CHMC proposals are re ected o the constraint barrier and the proposal is made every time in the region where the constraint is not violated, as detailed below.
As in [2], for the parameters µ i , ω i , α i , β i we assume a uniform prior over their respective domains of stationarity given by the constraints (3)-(4). For the copula parameters a, b, R we also assume a uniform prior distribution restricted to satisfy the constraints (11)- (13). For the degrees of freedom of the t-distribution ν i we assume a half-right side Cauchy prior for ν i > , and we assume the same prior for the degrees of freedom parameter η. All parameters are sampled jointly in each iteration.

Constrained Hamiltonian Monte Carlo
The purpose of traditional Markov Chain Monte Carlo (MCMC) methods is to formulate a Markov chain on the parameter space Θ for which, under certain conditions, π(θ) ∈ P θ is the invariant (also called 'equilibrium' or 'long-run') distribution. The Markov chain of draws of θ can be used to construct simulation-based estimates of the required integrals, and functionals Q(θ) of θ that are expressed as integrals. These functionals include objects of interest for inference on θ such as quantiles of π(θ).
In a generic MCMC scheme, the Markov chain sampling mechanism speci es a method for generating a sequence of random variables {θ k } K k= , starting from an initial point θ , in the form of conditional distributions for the draws θ k+ |θ k ∼ G(θ k ). Under relatively weak regularity conditions [29], the average of the Markov chain converges to the expectation under the stationary distribution: A Markov chain with this property is called ergodic. As a means of approximation the analyst relies on large but nite number of draws K ∈ N which the analyst selects in applications.
G(θ k ) can be obtained from a given model and its corresponding likelihood π(θ). Typically, π(θ) has a complicated form which precludes direct sampling in which case the Metropolis-Hastings (M-H) principle is usually used for θ k+ |θ k from G(θ k ); see [6] for a detailed overview. The M-H algorithm proceeds according to the following steps. Suppose we have a proposal-generating density q(θ * k+ |θ k ) where θ * k+ is a proposed state given the current state θ k of the Markov chain. The M-H principle stipulates that θ * k+ be accepted as the next state θ k+ with the acceptance probability otherwise θ k+ = θ k . Then, the Markov chain satis es the so-called detailed balance condition which is su cient for ergodicity. α(θ * k+ , θ k ) is the probability of the move θ k |θ * k+ if the dynamics of the proposal generating mechanism are reversed. The proposal-generating density q(θ * k+ |θ k ) can be chosen to be sampled conveniently even though it may be di cult or expensive to sample from π(θ). The popular Gibbs sampler arises as a special case when the M-H sampler is factored into conditional densities. The proposal draws θ * k+ |θ k from q(θ * k+ |θ k ) in (14) are generated in one step. We will refer to this procedure as random walk MCMC (RW-MCMC).

. CHMC Principle
Hamiltonian (or Hybrid) Monte Carlo (HMC) originates in the physics literature where it was introduced as a fast method for molecular dynamics simulations [8]. It has since become an important tool for Bayesian statistical inference [4,17,22,24,25] with applications in a number of diverse elds including statistical physics [1,14] and computational chemistry [34]. Detailed description of the CHMC mechanism from the statistics perspective is provided in [25]. Nonetheless, its usage in economics and nance has so far been relatively scarce. HMC and its constrained version, CHMC, applies to a general class of models that is parametrized by a Euclidean vector θ ∈ Θ for which all information in the sample is contained in the model likelihood π(θ) assumed known up to an integrating constant. Formally, this class of models is characterized by a family P θ of probability measures on a measurable space (Θ, B) where B is the Borel σ-algebra.
In contrast to a traditional MCMC scheme utilizing one-step proposal draws, Hamiltonian Monte Carlo (HMC) uses a sequence of many proposal steps whereby the last step in the sequence is designated as the proposal draw. The proposal is thus drawn in a relatively distant location from the current draw, which facilitates e cient exploration of the parameter space with the resulting Markov chain. The proposal sequence is constructed using di erence equations of the law of motion that result in high acceptance probability even for distant proposals. The parameter space Θ is augmented with a set of independent auxiliary stochastic parameters γ ∈ Γ that ful ll a supplementary role in the proposal algorithm, facilitating the directional guidance of the proposal mechanism. The proposal sequence takes the form {θ k , γ k } L = starting from the current state (θ k , γ k ) = (θ k , γ k ) and yielding a proposal (θ k , γ k ) = (θ L k , γ L k ). The detailed balance is then satis ed using the acceptance probability In constrained HMC (CHMC), constraints are embedded in the HMC proposal mechanism via "hard walls" representing a barrier against which the proposal sequence, simulating a particle movement, bounces o elastically. Thus, violating a constraint along the proposal sequence does not result in proposal rejection but rather an altered course of the proposal path, eliminating any associated redundancies. Heuristically, the constraint is checked at each step of the proposal sequence and if it is violated then the trajectory of the sequence is re ected o the hard wall posed by the constraint. This facilitates e cient exploration of the parameter space even in the presence of highly complex parameter constraints.
Let γ ∼ Φ(γ; , M) where Φ denotes the Gaussian distribution with mean vector 0 and covariance matrix M. Denote the constraint on the parameter space by a density kernel exp C k (θ) where if the constraint is satis ed C k > s.t. lim k→∞ C k = ∞ if the constraint is violated Denote the joint density of (θ, γ) by π(θ, γ). The negative of the logarithm of the joint density of (θ, γ) is given by the constrained Hamiltonian equation where d = dim(θ) = dim(γ). In the physics literature, the Hamiltonian H(θ, γ) represents the total energy of the (θ, γ)-system where θ denotes the position (or state) variable and − ln π(θ) describes its potential energy. γ is the momentum variable with kinetic energy γ M − γ/ where M is a constant, symmetric, positive-de nite "mass" matrix, typically set to the identity matrix. A canonical iteration of Constrained Hamiltonian Monte Carlo (CHMC) is performed in the following steps: 1. Draw an initial auxiliary parameter vector γ k ∼ Φ(γ; , M); 2. Transition from (θ k , γ k ) to (θ L k , γ L k ) = (θ * k+ , γ * k+ ) according to the constrained Hamiltonian dynamics; 3. Accept (θ * k+ , γ * k+ ) with probability otherwise keep (θ k , γ k ) as the next MC draw.
The constraints on the parameter space Θ are re ected in Step 2. We will now describe each step in detail.
Step 1 updates the system akin to a RW draw in order to make the resulting Markov chain (θ k , γ k ) R k= irreducible and aperiodic [17].
Step 2 constructs a sequence {θ k , γ k } L = according to the Hamiltonian dynamics starting from the current state (θ k , γ k ), with the newly drawn γ k , and setting the last member of the sequence as the CHMC new state proposal (θ * k+ , γ * k+ ) = (θ L k , γ L k ). The transition from (θ k , γ k ) to (θ L k , γ L k ) via the proposal sequence {θ k , γ k } L = taken according to the discretized Hamiltonian dynamics described below is fully deterministic, placing a Dirac delta probability mass δ(θ k , γ k ) = on each (θ k , γ k ) conditional on (θ k , γ k ). The CHMC acceptance probability in (17) is speci ed in terms of the di erence between the Hamiltonian (16) evaluated at the initial (θ k , γ k ) and at the proposal (θ * k+ , γ * k+ ). The role of the Hamiltonian dynamics is to ensure that the acceptance probability (17) for (θ * k+ , γ * k+ ) is kept close to 1. This is achieved by maintaining the di erence −H(θ * k+ , γ * k+ )+ H(θ k , γ k ) close to zero throughout the sequence {θ k , γ k } L = . This property can be achieved by conceptualizing θ and γ as functions of continuous time t and specifying their evolution using the Hamiltonian dynamics for i = , . . . , d. For any discrete time interval of duration s, (18)- (19) de ne a mapping Ts from the state of the system at time t to the state at time t + s. The di erential equations (18)- (19) are generally solved by numerical methods, typically the Stormer-Verlet (or leapfrog) numerical integrator [21]. For each step of {θ k , γ k } L = , CHMC discretizes the Hamiltonian dynamics (18)- (19) as follows: for some small ε ∈ R rst take a half-ε step in γ then take a full ε step in θ and check the constraint at θ(t + ε) for each dimension i of θ. If the constraint is violated for any i then set γ i (t + ε/ ) = −γ i (t + ε/ ) reversing the proposal dynamics and take further steps in θ i until the constraint is satis ed, then nish with another half-ε step in γ γ(t + ε) = γ(t + ε/ ) + (ε/ )∇ θ ln π(θ(t + ε)).
Intuitively, the proposal trajectory bounces o the "walls" given by the constraint. Since a move in γ only occurs conditional on θ for which the constraint is satis ed in which case C k (θ) = , and the move in θ only depends on ∂H(θ, γ)/∂γ i which does not involve C k (θ), the exact functional form of C k (θ) is inconsequential as long as it is di erentiable in θ to de ne valid Hamiltonian dynamics. Even though ln π(θ k ) can deviate substantially from ln π(θ k ), the additional Hamiltonian terms parametrized by γ in (16) (19). In contrast, without the additional parametrization with γ, if only ln π(θ k ) were to be used in the proposal mechanism as is the case in RW style samplers, the M-H acceptance probability would often drop to zero relatively quickly.
In the next Section we apply CHMC to the CMGARCH model of Section 2.

Application
We apply CHMC and RW-MCMC to the daily closing prices of S&P500 equities for the period 02/Jan/2015 to 29/Jun/2018, with T = , obtained from the Compustat database. We chose two companies with the highest overall transaction volumes: Citigroup (C) and Bank of America (BAC). The log return series are plotted in Figure 2 and summary statistics are presented in Table 1.
In theory, the dimensionality of the portfolio is only limited by the sample size. Nonetheless, in practice the portfolio size for feasible implementation will also be restricted by the computing power available. Our implementation was run with a Coarray Fortran 2008 code using Intel 2016 compiler on 3.1 GHz desktop computer². We set m = and ν i = for i = , ..., N. In RW-MCMC we tuned the step sizes for each parameter to achieve transition acceptance rate of about 30% [30]. In CHMC tuned the step sizes to achieve transition acceptance rate of about 60%. Theoretical analysis of optimal step sizes and acceptance rates for HMC is provided in [4].
We used L = as the length of the proposal sequence in CHMC. Even though each iteration takes longer to complete for CHMC than for RW-MCMC, the resulting parameter draws exhibit superior mixing properties for the former, even when normalized per unit of time. In order to make both methods comparable from the user's perspective, we ran both procedures for 10 minutes of wallclock time and discarded one extra initial minute of runtime as a burn-in part of the chain. CHMC completed just over 1,100 iterations while RW-MCMC completed just over 31,000 iterations. In order to minimize autocorrelation of the chain and render the output comparable to CHMC in terms of size per unit of time, we thinned the RW-MCMC chain by retaining every 30-th iteration. Table 2 reports comparison of the resulting chains in terms of e ective sample size (ESS), evaluated by the R package coda [28]. ESS of a Bayesian Markov chain parameter draw is the number of e ectively independent draws from the posterior distribution to which the Markov chain draw is equivalent [19]. The ESS has been commonly used as a standard measure for the quality of the posterior approximation for individual chains.
Markov chain convergence diagnostics, obtained using the R package coda, for both methods are reported in Table 3. Speci cally, we evaluated the z-scores of the [13] convergence test standardized z-score and p-values of the [15] stationary distribution test. As evidenced by the z-scores, while for CHMC all chains have converged within the burn-in section, for RW-MCMC half of the parameter chains have failed to reach convergence. The p-values output con rms that both methods drew chains from a stationary distribution.
We further visually illustrate the CHMC versus RW-MCMC comparison in Figures 3-5 in the Appendix, showing the trace plots of the Markov chains. The CHMC trace (left column) mixes very well, exploring the tails of the posterior, while the RW-MCMC trace (right column) su ers from relatively poorer mixing with scant tail exploration. This is consistent with the outcomes of the convergence diagnostics reported above. Overall, .
our results favor the use of the CHMC as an e ective method of inference for the complex posterior of the CMGARCH model. .
We used the CHMC output to evaluate the posterior mean value of the tail dependence index (9) as where the superscript (k) indicated the k-th draw. We then used the posterior distribution of the tail dependence index to quantify its 95% Bayesian Credible Set (BCS). The posterior mean was thus 0.176, and the 95% BCS (0.157, 0.194).

Conclusions
In this paper we build on a recent stream of literature merging the dynamic MGARCH modeling framework with the copula dependence structure, yielding a family of exible yet parsimonious non-elliptical multivariate distributions for modeling the features of nancial portfolio series. The main advantage of this approach is that the individual marginal densities of the series can be de ned separately from their dependence structure. However, the resulting CMGARCH model requires a set of parameter constraints maintaining positivity of variances and covariance stationarity of the modeled stochastic processes. The model likelihood is then highly irregular, skewed, asymmetric, and truncated in regions of relatively high posterior density, hindering the applicability and accuracy of asymptotic inference. In this paper, we show that the Constrained Hamiltonian Monte Carlo exhibits a number of advantages over traditional random-walk based Markov Chain Monte Carlo methods utilized for the CMGARCH in the previous literature. Our analysis is performed in the context of an application to modelling a portfolio of S&P500 equity returns, quantifying the coe cient of tail dependence.