Bayesian Estimation of Generalized Partition of Unity Copulas

This paper proposes two (Metropolis-Hastings) algorithms to estimate Generalized Partition of Unity Copulas (GPUC), a new class of nonparametric copulas that includes the versatile Bernstein copula as a special case. Additionally, a prior distribution for the parameter matrix of GPUCs is established via importance sampling, and an algorithm to sample such matrices is introduced. Finally, simulation studies show the effectiveness of the presented algorithms.


Introduction
Copulas are attractive tools to model multivariate dependence structures in a flexible way. They are widely applied in fields where it is crucial to have a good working model of the joint distribution, e.g. in finance or portfolio management. In the past years, various families of copulas have been suggested, ranging from the classic families that are generated by inversion of multivariate distributions, like the Gaussian or t-copula, to the class of Archimedean copulas that use generator functions to create copulas. To add even more flexibility in higher dimensions, all these copula families can be mixed using vines [3] or, within the class of Archimedean copulas, bivariate copulas can be organized hierarchically [10,20].
A disadvantage of parametric copula families is the obvious risk of misspecification. A misspecified copula will result in misleading statistical inference. An obvious alternative to parametric copula models is to use nonparametric copulas that directly capture the characteristics of the data. The simplest way is, of course, the empirical copula. Another nonparametric copula is the Bernstein copula [1], which builds on Bernstein polynomials to smooth the estimated copula density. Bernstein copulas can be estimated by the EM algorithm [7].
Recently, [18] and [15] developed a more general family of nonparametric copulas, the so-called Generalized Partition of Unity Copulas (GPUC). In contrast to Bernstein copulas, the copula density over the unit hypercube is no longer approximated by a finite mixture of functions but by an infinite mixture. Bernstein copulas are nested as a special case of GPUCs. [16] expand the concept of GPUCs to continuous partitions of the unit hypercube and, hence, a continuous parameter space. In this paper we tackle the question of how to fit GPUCs resulting from a discrete partition of unity. [8] propose an MCMC-based estimation approach suited for all copulas that can be parameterized by a doubly stochastic matrix. To fit the copula, they indirectly draw the doubly stochastic parameter matrix from its Hilbert space representation in a random walk Metropolis-Hastings sampler.
This paper proposes and applies Bayesian methods to estimate GPUCs. The rest of the paper proceeds as follows. Section 2 reviews the Generalized Partition of Unity Copulas and their properties. Section 3 presents two Bayesian Markov chain Monte Carlo algorithms to sample from the posterior distribution of the GPUC parameters. In Section 4, the algorithms are compared in a simulation study. Our substantive contribution is an empirical application in Section 5. We consider the joint return behaviour of two of the leading cryptocurrencies: Bitcoin and Ethereum. We show how GPUCs can be applied to nonparametrically model the dependence between their exchange rates. Finally, Section 6 concludes.

Generalized Partition of Unity Copulas
The concept of Generalized Partition of Unity Copulas (GPUC) was first introduced by [18] and applied in the context of risk management [17]. We briefly review the definition and main properties of GPUCs. A GPUC is a bivariate copula¹ defined by the density function

c(u, v) = Σ_{i=1}^∞ Σ_{j=1}^∞ (M_ij / (α_i γ_j)) ϕ_i(u) φ_j(v),   (1)

where α_i = ∫_0^1 ϕ_i(u) du and γ_j = ∫_0^1 φ_j(v) dv, and the nonnegative parameter matrix M has row sums Σ_j M_ij = α_i and column sums Σ_i M_ij = γ_j. The generating functions ϕ_i(u) and φ_j(v) can be considered as probability mass functions of discrete random variables over the positive integers with parameters u and v, and hence,

Σ_{i=1}^∞ ϕ_i(u) = 1 and Σ_{j=1}^∞ φ_j(v) = 1 for all u, v ∈ [0, 1].

For simplicity we consider GPUCs with ϕ_i(u) = φ_i(u) (and hence α_i = γ_i) for all i, i.e. copulas with identical generating functions. Note that this does not imply symmetry since the parameter matrix M need not be symmetric. The generating function may have no additional parameter, but it may also depend on one or more parameters. We denote the generic parameter (vector) as θ and write ϕ_{θ,i} for the generating function.
The independence copula is nested by GPUCs if M_ij = α_i α_j for all i, j, since then the double sum in (1) factorizes into (Σ_i ϕ_i(u))(Σ_j φ_j(v)) = 1. Equation (1) denotes the fully parametrized form of the GPUC with an infinite-dimensional parameter matrix M. For modelling and estimation purposes, we restrict the parameter matrix M so that only a finite m × m block of its elements remains free while the remaining elements are fixed; we call the resulting copula density the reduced form (2). In general, reduced form GPUCs do not nest the independence copula and, hence, care must be taken when specifying the size of the parameter matrix M before estimation.² The aim of this paper is to estimate the parameters of a reduced form GPUC. Obviously, the natural parametrization of GPUCs is in terms of θ and M. However, it turns out that this choice is inconvenient for estimation since the row and column sums of M may depend on θ. We therefore decided to parametrize the GPUCs in a different fashion.
Let D_m denote the space of m × m doubly stochastic matrices. The reduced form GPUC is then parametrized by θ and a matrix D ∈ D_m.
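Doubly stochastic matrices can be generated numerically, e.g. to initialize a sampler. A minimal sketch using Sinkhorn normalization (our illustration; this is not the importance-sampling prior construction mentioned in the abstract, and the function name is ours):

```python
import numpy as np

def sinkhorn_doubly_stochastic(m, n_iter=500, rng=None):
    """Produce an (approximately) doubly stochastic m x m matrix by
    alternately normalizing the rows and columns of a random positive matrix."""
    rng = np.random.default_rng(rng)
    D = rng.random((m, m)) + 1e-12        # strictly positive starting matrix
    for _ in range(n_iter):
        D /= D.sum(axis=1, keepdims=True)  # make row sums equal to 1
        D /= D.sum(axis=0, keepdims=True)  # make column sums equal to 1
    return D

D = sinkhorn_doubly_stochastic(4, rng=1)
```

After enough alternating normalizations, both the row and column sums are equal to one up to numerical tolerance.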

2.1. Binomial generating function
The generating function has no additional parameter but only depends on m:

ϕ_i(u) = binom(m−1, i−1) u^{i−1} (1−u)^{m−i} for i = 1, . . . , m, and ϕ_i(u) = 0 for i > m.

Note that we shifted the usual support of the binomial distribution such that the smallest possible value is 1. This generator implies that α_i = 1/m for i = 1, . . . , m. The resulting copulas are Bernstein copulas (see also [19]) with density

c(u, v) = m Σ_{i=1}^m Σ_{j=1}^m d_ij ϕ_i(u) ϕ_j(v).

Since α_i = 0 for i > m this also equals the reduced form copula.
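The shifted binomial generator and the resulting Bernstein copula density are easy to evaluate directly. A minimal sketch (function names are ours; the independence check uses the fact that D with all entries 1/m yields the independence copula):

```python
import numpy as np
from math import comb

def bernstein_basis(u, m):
    """phi_i(u), i = 1..m: pmf of a Binomial(m-1, u) variable shifted by 1."""
    i = np.arange(m)                                   # exponents 0..m-1
    coef = np.array([comb(m - 1, k) for k in range(m)])
    return coef * u**i * (1 - u)**(m - 1 - i)

def bernstein_copula_density(u, v, D):
    """c(u,v) = m * phi(u)^T D phi(v) for a doubly stochastic matrix D."""
    m = D.shape[0]
    return m * bernstein_basis(u, m) @ D @ bernstein_basis(v, m)

m = 5
D = np.full((m, m), 1 / m)   # uniform D -> independence copula, density = 1
```

With this D, the density equals one everywhere, which is a quick sanity check of the normalization constant m.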

2.2. Negative binomial generating function
The generating function has a single shape parameter β > 0. For i = 1, 2, . . .

ϕ_{β,i}(u) = (Γ(i−1+β) / (Γ(β) (i−1)!)) (1−u)^β u^{i−1},

i.e. a negative binomial distribution shifted such that its smallest possible value is 1. The negative binomial distribution is an attractive generating function for many applications since it yields copulas with upper tail dependence [18]. In addition, negative binomial GPUC copulas can be represented as mixtures of beta distributions, offering an appealing way to simulate them. The row and column sums are

α_i = β / ((i+β−1)(i+β)).

The copula density of the full form follows from (1), and the reduced form (2) is obtained by restricting the free parameters to a finite m × m block. For integer β the infinite sum can be explicitly evaluated as a finite sum [18]. Since we do not restrict the posterior distribution of β to have integer support, we approximate the infinite sum by a sufficiently long finite sum.
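The shifted negative binomial pmf and its integrals α_i can be implemented in a few lines. A minimal sketch (our reading of the generator in [18]; the closed form for α_i follows from the beta integral and telescopes to one):

```python
import numpy as np
from math import lgamma, log, exp

def nb_phi(u, beta, i):
    """Shifted negative binomial pmf over i = 1, 2, ... with parameter u:
    phi_{beta,i}(u) = Gamma(i-1+beta)/(Gamma(beta)(i-1)!) * (1-u)^beta * u^(i-1)."""
    k = i - 1
    logp = (lgamma(k + beta) - lgamma(beta) - lgamma(k + 1)
            + beta * log(1.0 - u) + k * log(u))
    return exp(logp)

def nb_alpha(beta, i):
    """alpha_i = integral of phi_{beta,i}(u) over (0,1) = beta/((i+beta-1)(i+beta))."""
    return beta / ((i + beta - 1.0) * (i + beta))
```

The partition-of-unity property (the pmf sums to one in i) and Σ_i α_i = 1 can both be verified numerically by truncating the infinite sums.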

Estimation algorithm
Suppose we observe a sample Y = (u1, v1), . . . , (un, vn) from a GPUC with parameter (vector) θ and parameter matrix D of known dimension m × m. We assume that there is no prior knowledge about θ and D and, hence, we use flat priors that only incorporate the restriction that the parameters must lie inside the parameter space, e.g. that θ = β > 0 for the negative binomial generator.
To draw samples from the joint posterior distribution p(θ, D|Y) we set up a standard Gibbs sampler iterating between the conditional posteriors p(θ|D, Y) and p(D|θ, Y). Let θ^(r) and D^(r) denote the r-th draw from the respective posteriors, where r = 1, . . . , B + R, with B being the length of the burn-in phase and R the number of Monte Carlo iterations. If the generating function has no parameter θ, there is no need to iterate between the two Gibbs steps. Instead, one then only draws from p(D|Y). However, in the following description of the algorithm we presuppose that the generating function has a parameter (vector) θ.
The Gibbs step for p(θ|D, Y) is simple. We generate a draw from p(θ|D, Y) using the Metropolis-Hastings algorithm. The proposal for iteration r + 1 of the Markov chain is

θ_pr = θ^(r) + ε with ε ∼ N(0, σ_ε²).

If θ is a parameter vector, then ε can be drawn from a multivariate normal distribution. The proposal is accepted with probability min{1, exp[ℓ(θ_pr) − ℓ(θ^(r))]}, where the logarithm of the likelihood is

ℓ(θ) = Σ_{t=1}^n log c(u_t, v_t; θ, D^(r)),

with the copula density defined as in (2). If the proposal is accepted we set θ^(r+1) = θ_pr; if it is rejected then θ^(r+1) = θ^(r). If the proposal is outside the parameter space its likelihood is zero, and the proposal is rejected. Drawing from the high-dimensional p(D|θ, Y) is more intricate. We make use of the approach suggested by [6] and adapt it in two different ways as explained in the next subsections³.
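The θ-step can be sketched as a short random-walk Metropolis-Hastings update. A minimal illustration (our function names; the toy log-likelihood is a stand-in for the GPUC log-likelihood, which would plug in at the same place):

```python
import numpy as np

def mh_step_theta(theta, log_lik, sigma_eps, rng):
    """One random-walk MH update for a scalar parameter theta > 0 under a
    flat prior: proposals outside the parameter space are rejected outright."""
    prop = theta + rng.normal(0.0, sigma_eps)
    if prop <= 0.0:                        # outside parameter space
        return theta, False                # likelihood zero -> reject
    log_accept = log_lik(prop) - log_lik(theta)
    if np.log(rng.random()) < log_accept:
        return prop, True
    return theta, False

# toy target: log-likelihood of N(2, 1), truncated to theta > 0
rng = np.random.default_rng(0)
theta, draws = 1.0, []
for _ in range(4000):
    theta, _ = mh_step_theta(theta, lambda b: -0.5 * (b - 2.0) ** 2, 1.0, rng)
    draws.append(theta)
```

Discarding a burn-in portion, the chain's mean should be close to the toy target's mean of about two.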

3.1. Random Walk Metropolis-Hastings
In our random walk Metropolis-Hastings (RW-MH) algorithm the proposal for the doubly stochastic matrix is D_pr = D^(r) + hF, where the random matrix F is constructed as follows: Draw two distinct rows i1, i2 ∈ {1, . . . , m} and two distinct columns j1, j2 ∈ {1, . . . , m}, and set F_{i1,j1} = F_{i2,j2} = 1 and F_{i1,j2} = F_{i2,j1} = −1. All other elements of F are set to zero. The step size h is drawn from a uniform distribution on the interval [−g, g]. By construction, D_pr is a doubly stochastic matrix as long as all its elements are nonnegative. The proposal is accepted with probability (4). If the proposal is accepted we set D^(r+1) = D_pr; if it is rejected then D^(r+1) = D^(r). For evaluating the copula density at D_pr it is not necessary to compute c_B(u, v) again as it does not depend on the parameter matrix D_pr but only on θ, which does not change in this Gibbs step. In this algorithm, adding hF ensures the symmetry of the random walk proposal distribution. The scaling parameter g is set such that the acceptance ratio is relatively stable. While this algorithm is simple, draws from the posterior of the parameter matrix D do not mix well since it might take many iterations until a given element of D is randomly chosen and altered. This effect is the more severe the larger m is.
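The sum-preserving property of the proposal is easy to see in code. A minimal sketch of the F-construction (function name ours): the +1/+1/−1/−1 pattern adds zero to every row and column, so D + hF keeps all margins intact.

```python
import numpy as np

def propose_rw(D, g, rng):
    """RW-MH proposal for a doubly stochastic matrix: perturb a 2x2 index
    pattern so that all row and column sums of D are preserved; proposals
    with negative entries are rejected later in the MH step."""
    m = D.shape[0]
    i1, i2 = rng.choice(m, size=2, replace=False)   # two distinct rows
    j1, j2 = rng.choice(m, size=2, replace=False)   # two distinct columns
    F = np.zeros((m, m))
    F[i1, j1] = F[i2, j2] = 1.0
    F[i1, j2] = F[i2, j1] = -1.0
    h = rng.uniform(-g, g)
    return D + h * F

rng = np.random.default_rng(0)
D_pr = propose_rw(np.full((4, 4), 0.25), 0.05, rng)
```

Row and column sums of the proposal remain exactly one regardless of which indices and step size are drawn.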

3.2. Random Blocking Random Walk Metropolis-Hastings
To overcome poor mixing, we propose an alternative random blocking sampling scheme along the lines of [5], who introduced random blocking in the DSGE framework. [5] propose to divide the random walk Metropolis-Hastings algorithm into two steps. The first step consists of randomly generating blocks of parameters. The second step is the classical RW-MH over the randomly blocked parameters. Both the blocks and parameters are drawn anew in each iteration. For DSGE models, "[g]enerating random blocks [...] is [...] straightforward and does not require comment" [5, p. 22]. In the setting of this paper, however, the blocking method is essential. For ease of exposition we assume that m is even. Our random blocking random walk Metropolis-Hastings (RBRW-MH) algorithm for the parameter matrix D can be summarized as follows.

1. Initialize the indicator matrix R as the m × m matrix of ones.
2. Draw a row pair i, k (i ≠ k) and a column pair j, l (j ≠ l) among the elements still marked in R, i.e. with R_ij = 1.
3. Then set R_ij = R_kl = R_il = R_kj = 0.
4. Set F equal to the m × m zero matrix.
5. Set F_ij = F_kl = 1 and F_il = F_kj = −1.
6. Draw h from a uniform distribution on [−g, g] and compute the proposal D_pr = D^(r) + hF.
7. Accept the proposal with probability (4). If any element of D_pr is negative, do not accept. If the proposal is accepted we set D^(r) = D_pr.
8. If all elements of R are zero, set D^(r+1) = D^(r) and stop the algorithm. If not, continue with step 2.

Since m is assumed to be even, all elements of D are subjected to the algorithm after m²/4 steps. This algorithm generates proposals for all elements of D in the Gibbs step, and consequently achieves much better mixing than RW-MH.
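The steps above can be sketched as one sweep over the indicator matrix R. A minimal illustration (our function names; `log_post` is a placeholder for the conditional log-posterior of D, here replaced by a flat function for the sanity check):

```python
import numpy as np

def rbrw_sweep(D, g, log_post, rng):
    """One Gibbs step of the random-blocking RW-MH: R tracks which elements
    of D have not yet been part of a proposal; the sweep ends once every
    element has been visited (a sketch of steps 1-8 above)."""
    m = D.shape[0]
    R = np.ones((m, m), dtype=bool)                      # step 1
    while R.any():
        i = rng.choice(np.flatnonzero(R.any(axis=1)))    # step 2: row with
        j = rng.choice(np.flatnonzero(R[i]))             #   a marked element
        k = rng.choice([r for r in range(m) if r != i])
        l = rng.choice([c for c in range(m) if c != j])
        R[i, j] = R[k, l] = R[i, l] = R[k, j] = False    # step 3
        F = np.zeros((m, m))                             # step 4
        F[i, j] = F[k, l] = 1.0                          # step 5
        F[i, l] = F[k, j] = -1.0
        D_pr = D + rng.uniform(-g, g) * F                # step 6
        # step 7: reject negative entries, otherwise MH accept/reject
        if (D_pr >= 0).all() and np.log(rng.random()) < log_post(D_pr) - log_post(D):
            D = D_pr
    return D                                             # step 8

rng = np.random.default_rng(0)
D_new = rbrw_sweep(np.full((4, 4), 0.25), 0.05, lambda X: 0.0, rng)
```

Every accepted move preserves double stochasticity, so the matrix returned by a sweep still has unit row and column sums.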

3.3. Adaptive proposal distributions
Allowing the scaling factors σε and g to vary (with respect to the iteration in the Markov chain), the sampling algorithms can be used in an adaptive fashion. Determining a sufficiently large value of g is crucial both for achieving a well-mixing Markov chain and for effectively exploring the full posterior distribution. However, too large values lead to a large fraction of rejected proposals.
The adaptation strategy is based on acceptance ratios. In the case of the RW-MH, g is decreased once the acceptance ratio falls below 0.15. For the RBRW-MH algorithm we suggest to decrease g if more than 95% of the four-parameter swaps are rejected. As to σε we adapt it such that the acceptance ratio of the θ-proposals is roughly 0.4.
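The decrease-only rule for g can be written as a one-line update. A minimal sketch (the shrink factor 0.8 is an illustrative choice of ours, not taken from the text):

```python
def adapt_step_size(g, acceptance_ratio, threshold=0.15, shrink=0.8):
    """Decrease-only adaptation of the RW step size g: shrink g whenever
    the acceptance ratio falls below the threshold, otherwise leave it."""
    return g * shrink if acceptance_ratio < threshold else g
```

For the RBRW-MH variant, the same rule would be applied with the threshold replaced by the 5% acceptance bound on the four-parameter swaps.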

Simulation study
In this section we investigate the flexibility of the Bayesian estimation approach by fitting GPUCs to simulated data drawn from known parametric copulas. We consider (i) the normal copula with relatively high correlation (0.7), and (ii) a Gumbel copula with parameter α = 2. The normal copula displays no tail dependence; the upper tail dependence index of the Gumbel copula is 2 − 2^{1/α} ≈ 0.586.
For each simulation, we draw n = 5000 observations (u1, v1), . . . , (un, vn) from the parametric copula. Then we run either the RW-MH or the RBRW-MH approach over a burn-in phase of B iterations and R subsequent iterations to draw from the joint posterior of the GPUC. We set the dimension of the matrix D to m × m. The distributions of the random walk increments are governed by a step size g inversely proportional to m (for the elements of D), both for the binomial and negative binomial generators, and by σε for β in case of the negative binomial generator (the parameter β is absent in GPUCs with the binomial generator). For the simulation study we did not adapt these parameters but kept them fixed.
To visualize that the MCMC approach yields draws from the posterior distribution, figure 1 shows the histogram of β (top left panel), the trace plot for β (top right panel), and the trace plots of two elements of the matrix D (bottom panels) when fitting a GPUC with a negative binomial generator to the observations from a Gumbel copula. Apparently the algorithm has converged to the posterior distribution after the burn-in period. Since mixing is only moderately strong, we show the thinned plots (with thinning parameter 10).
As the true copulas are known in the simulations, we can compute the distance of the nonparametrically estimated copulas from the true ones. As distance measure we employ the absolute distance of the copula densities,

d(c, ĉ) = ∫₀¹ ∫₀¹ |c(u, v) − ĉ(u, v)| du dv,

where c is the true copula density (i.e. either the normal copula or the Gumbel copula in our simulations), and ĉ is the GPUC density with the posterior mean matrix D̄ = R⁻¹ Σ_{r=1}^R D^(r) (and, for the negative binomial generator, similarly the posterior mean of β). The results show that the GPUC with a binomial generator fits both the normal copula and the Gumbel copula better than the GPUC with a negative binomial generator, even though it cannot reflect the upper tail dependence of the Gumbel copula.
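The absolute distance can be approximated on a grid over the unit square. A minimal sketch (midpoint rule; the discretization choice and function names are ours):

```python
import numpy as np

def l1_distance(c_true, c_hat, n_grid=200):
    """Midpoint-rule approximation of the integral over (0,1)^2 of
    |c(u,v) - c_hat(u,v)|, where both arguments are vectorized densities."""
    t = (np.arange(n_grid) + 0.5) / n_grid   # midpoints of a uniform grid
    U, V = np.meshgrid(t, t)
    return np.abs(c_true(U, V) - c_hat(U, V)).mean()

# sanity check: distance between the independence density c = 1 and the
# (non-copula) surface 2u is integral of |1 - 2u|, which equals 1/2
d = l1_distance(lambda u, v: np.ones_like(u), lambda u, v: 2.0 * u)
```

The midpoint rule is exact here because |1 − 2u| is piecewise linear and symmetric around u = 1/2.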

Empirical illustration: Returns of cryptocurrencies
Cryptocurrencies constitute a new asset class and are an interesting portfolio component for investors. The two most widely traded cryptocurrencies are Bitcoin and Ethereum. Recently, much research has been conducted about the statistical properties of the return distributions of cryptocurrencies. The return is defined as the logarithmic change in the exchange rate against the US dollar.
All studies consistently find that: returns are clearly non-normally distributed; the mean return as well as the volatility (as measured by the standard deviation) are an order of magnitude larger than for traditional currencies; the tails are very heavy. [13] estimates a tail index of slightly less than 2, implying that the variance does not exist. Other studies find a tail index above 2, e.g. [14], [4] or [22]. [14] also estimate the extremal index to quantify the amount of clustering of extreme returns. The extremal index tends to be smaller than for normal currencies, showing that extreme movements of the Bitcoin price are more clustered. [22] examine more time series properties and find volatility clustering, i.e. the autocorrelation of returns is small while the autocorrelation of absolute returns is larger. Volatility clustering is also documented and modeled by [11].
There is consensus in the literature that cryptocurrencies cannot be regarded as normal currencies. [2] argue that Bitcoin does not have the defining properties of money: it is not a general medium of exchange, nor is it a widely used accounting unit, nor can it act as a store of value. They argue that Bitcoin is an asset for speculation. [9] find that Bitcoin returns behave more like stock returns than like currency returns. [12] detect a significant momentum effect and attention effect for Bitcoin returns. They conclude that Bitcoin does not behave like a traditional currency.
If cryptocurrencies are added to a portfolio, it is important to have an accurate model of their joint return distribution that mirrors its relevant features. We illustrate how GPUCs can help model the (bivariate) dependence structure of Bitcoin and Ethereum both at the level of daily prices and at the high-frequency level. Figure 3 shows the daily returns. Extreme daily returns of more than 10 (or even 20) percent in absolute terms happen much more often than for traditional currencies. Volatility clusters are also discernible. This conforms with small, slightly negative return autocorrelations and autocorrelations of the absolute returns of 0.159 (Bitcoin) and 0.203 (Ethereum). Figure 4 shows the scatterplot of the empirical edfs (i.e. the ranks normalized to the unit interval) of Bitcoin and Ethereum returns. The plot indicates that there is a rather strong positive dependence between the daily returns. However, sometimes extremely large Bitcoin returns can coincide with extremely small Ethereum returns and vice versa. Further, a visual inspection of the plot suggests that there is lower tail dependence but no upper tail dependence.
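The rank transform behind the empirical-edf scatterplots is a one-liner. A minimal sketch (a standard construction; the function name is ours):

```python
import numpy as np

def pseudo_observations(x):
    """Map a sample to normalized ranks rank_i/(n+1) in (0,1) -- the
    coordinates used for empirical-edf scatterplots (distinct values assumed)."""
    x = np.asarray(x)
    ranks = np.argsort(np.argsort(x)) + 1   # 1-based ranks
    return ranks / (len(x) + 1)

u = pseudo_observations([3.2, 1.1, 2.5])
```

Dividing by n + 1 rather than n keeps all pseudo-observations strictly inside the unit interval, which avoids boundary problems when evaluating copula densities.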
We fit nonparametric GPUCs both with binomial and negative binomial generators to the data. Since the negative binomial generator implies copulas with upper tail dependence, we rotate the empirical observations to transform their apparent lower tail dependence into upper tail dependence. Figure 5 shows the nonparametrically estimated copula density of the joint returns (left panel: binomial generator, right panel: negative binomial generator). Note that the negative binomial generator induces a markedly stronger upper tail dependence while the binomial generator implies a more symmetric tail behaviour.

5.1. High-frequency return data
Cryptocurrencies are continuously traded worldwide. Our high-frequency dataset⁴ records the prices and returns at 10-minute intervals. While the return autocorrelations remain small, the autocorrelation of absolute returns is higher than for daily data (0.245 and 0.242). Figures 6 and 7 show the high-frequency time series of prices and returns. Figure 8 shows the scatterplot of the empirical edfs of Bitcoin and Ethereum returns. The structure appears to be more symmetric than for daily data.
The estimated copula densities are depicted in figure 9 (left: binomial generator, right: negative binomial generator). The nonparametrically estimated densities are rather dissimilar. The tail dependence symmetry is reflected by the GPUC with binomial generator but to a much lesser extent by the GPUC with negative binomial generator. We conclude that GPUCs with a negative binomial generator do not fit copulas with lower (or upper and lower) tail dependence sufficiently accurately.

Concluding Remarks
This paper proposes Bayesian methods to estimate generalized partition of unity copulas (GPUCs). While the approach is applicable to general GPUCs, we explicitly consider GPUCs with (a) binomial generating functions (i.e. Bernstein copulas) and (b) negative binomial generating functions. GPUCs with a negative binomial generator exhibit upper tail dependence, which prima facie makes them attractive for many fields of application.
We introduce a parametrization for the GPUCs that enables us to use random walk Metropolis-Hastings algorithms. To improve the mixing properties of the sampler, we suggest random blocking for the elements of the parameter matrix. The random walk step sizes can be chosen in an adaptive fashion. In a simulation study, we show that GPUCs with a binomial generator (Bernstein copulas) fit both the normal and Gumbel copula better than GPUCs with a negative binomial generator for a moderate sample size of 5000. This holds true despite the upper tail dependence of the Gumbel copula, which cannot be reproduced by Bernstein copulas unless m is large. The apparent advantage of negative binomial generators turns out not to be pertinent in (at least some) practical applications. Our empirical illustration shows that the main features of the joint distribution of daily and high-frequency returns of the cryptocurrencies Bitcoin and Ethereum are well captured by GPUCs. Again we find that GPUCs with a negative binomial generator tend to fit the empirical joint distribution of daily and high-frequency returns less closely than Bernstein copulas.