# Nonparametric C- and D-vine-based quantile regression

Marija Tepegjozova, Jing Zhou, Gerda Claeskens and Claudia Czado
From the journal Dependence Modeling

# Abstract

Quantile regression is a field of steadily growing importance in statistical modeling. It complements linear regression, since computing a range of conditional quantile functions provides more accurate modeling of the stochastic relationship among variables, especially in the tails. We introduce a nonrestrictive and highly flexible nonparametric quantile regression approach based on C- and D-vine copulas. Vine copulas allow for separate modeling of the marginal distributions and the dependence structure in the data and can be expressed through a graphical structure consisting of a sequence of linked trees. This way, we obtain a quantile regression model that overcomes typical issues of quantile regression such as quantile crossings and collinearity, as well as the need for transformations and interactions of variables. Our approach incorporates a two-step ahead ordering of variables, maximizing the conditional log-likelihood of the tree sequence while taking into account the next two tree levels. We show that the nonparametric conditional quantile estimator is consistent. The performance of the proposed methods is evaluated in both low- and high-dimensional settings using simulated and real-world data. The results support the superior prediction ability of the proposed models.

MSC 2010: 62H05; 62G08; 62G05

## 1 Introduction

As a robust alternative to ordinary least squares regression, which estimates the conditional mean, quantile regression [38] focuses on the conditional quantiles. This method has been studied extensively in statistics, economics, and finance. The pioneering work of Koenker [35] investigated linear quantile regression systematically. It presented properties of the estimators, including asymptotic normality and consistency, under various assumptions such as independence of the observations, independent and identically distributed (i.i.d.) errors with continuous distribution, and predictors with bounded second moment. Subsequent extensions of linear quantile regression have been intensively studied, for example quantile regression in the Bayesian framework [66], for longitudinal data [34], for time-series models [63], for high-dimensional models with an $\ell_1$-regularizer [7], nonparametric estimation by kernel-weighted local linear fitting [65], by additive models [37,20], etc. The theoretical analysis of the above-mentioned extensions rests on additional assumptions, such as i.i.d. samples (see for example Belloni and Chernozhukov [7], Yu and Jones [65]) or data generated by a known additive function (see for example Koenker [37,34]). Such assumptions, which guarantee the performance of the proposed methods for certain data structures, cause concerns in applications due to the uncertainty about the structure of real-world data. Bernard and Czado [8] addressed further potential concerns, such as quantile crossings and model misspecification when the dependence structure of the response variable and the predictors does not follow a Gaussian copula. Flexible models that assume neither homoscedasticity nor a linear relationship between the response and the predictors are therefore of interest.
Recent research addressing this issue includes quantile forests [2,42,44], inspired by the earlier work on random forests [9], and modeling conditional quantiles using copulas (see also Chen et al. [13], Noh et al. [50,51]).

Vine copulas in the context of conditional quantile prediction have been investigated by Kraus and Czado [41] using drawable vine copulas (D-vines), and by Chang and Joe [11] and, most recently, Zhu et al. [67] using restricted regular vines (R-vines). The approach of Chang and Joe [11] is based on first finding the locally optimal regular vine structure among all predictors and then adding the response to each selected tree in the vine structure as a leaf, as also done by Bauer and Czado [5] in the context of non-Gaussian conditional independence testing. The procedure in Chang and Joe [11] allows for a recursive determination of the response quantiles, which is restricted through the prespecified dependence structure among the predictors. The latter need not be the structure maximizing the conditional response likelihood, which is the main focus in a regression setup. The approach of Kraus and Czado [41], in contrast, is based on optimizing the conditional log-likelihood, selecting predictors sequentially until no improvement of the conditional log-likelihood is achieved. Such an approach based on the conditional response likelihood is better suited to determining the associated response quantiles. Furthermore, the intensive simulation study in Kraus and Czado [41] showed the superior performance of D-vine copula-based quantile regression compared to various quantile regression methods, i.e. linear quantile regression [38], boosting additive quantile regression [35,37,20], nonparametric quantile regression [43], and semiparametric quantile regression [51]. In parallel to our work, Zhu et al. [67] proposed an extension of this D-vine-based forward regression to a restricted R-vine forward regression with comparable performance to the D-vine regression. Thus, the D-vine quantile regression will be our benchmark model.

The article is organized as follows. Section 2 introduces the general setup, the concept of C-vine and D-vine copulas, and the nonparametric approach for estimating copula densities. Section 3 describes the vine-based approach for quantile regression. The new two-step ahead forward selection algorithms are described in Section 4. We investigate in Proposition 4.1 the consistency of the conditional quantile estimator for given variable orders. The finite sample performance of the vine-based conditional quantile estimator is evaluated in Section 5 by several quantile-related measurements in various simulation settings. We apply the newly introduced algorithms to low- and high-dimensional real data in Section 6. In Section 7, we conclude and discuss possible directions of future research.

## 2 Theoretical background

Consider the random vector $X = (X_1,\dots,X_d)^\top$ with observed values $x = (x_1,\dots,x_d)^\top$, joint distribution function $F$ and density $f$, and marginal distribution and density functions $F_{X_j}$ and $f_{X_j}$ for $X_j$, $j=1,\dots,d$. Sklar's theorem [57] allows us to represent any multivariate distribution in terms of its marginals $F_{X_j}$ and a copula $C$ encoding the dependence structure. In the continuous case, $C$ is unique and satisfies $F(x) = C(F_{X_1}(x_1),\dots,F_{X_d}(x_d))$ and $f(x) = c(F_{X_1}(x_1),\dots,F_{X_d}(x_d)) \prod_{j=1}^{d} f_{X_j}(x_j)$, where $c$ is the density function of the copula $C$. To characterize the dependence structure of $X$, we transform each $X_j$ to a uniform random variable $U_j$ by applying the probability integral transform, i.e. $U_j := F_{X_j}(X_j)$, $j=1,\dots,d$. Then the random vector $U = (U_1,\dots,U_d)^\top$ with observed values $(u_1,\dots,u_d)^\top$ has a copula as joint distribution, denoted by $C_{U_1,\dots,U_d}$, with associated copula density $c_{U_1,\dots,U_d}$. While the catalogue of bivariate parametric copula families is large, this is not true for $d > 2$. Therefore, conditioning is applied to construct multivariate copulas using only bivariate copulas as building blocks. Joe [28] gave the first pair copula construction for $d$ dimensions in terms of distribution functions, whereas Bedford and Cooke [6] independently developed constructions in terms of densities, together with a graphical building plan called a regular vine tree structure. It consists of a set of linked trees $T_1,\dots,T_{d-1}$ (edges in tree $T_j$ become nodes in tree $T_{j+1}$) satisfying a proximity condition, which allows us to identify all possible constructions. Each edge of the trees is associated with a pair copula $C_{U_i,U_j;U_D}$, where $D$ is a subset of indices not containing $i, j$. In this case, the set $\{i,j\}$ is called the conditioned set, while $D$ is the conditioning set.
A joint density using the class of vine copulas is then the product of all pair copulas identified by the tree structure evaluated at appropriate conditional distribution functions F X j X D and the product of the marginal densities f X j , j = 1 , , d . A detailed treatment of vine copulas together with estimation methods and model choice approaches are given, for example in the studies of Joe [30] and Czado [17].
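Sklar's decomposition can be made concrete in a bivariate Gaussian toy example. The following sketch (Python for illustration; the paper's own implementation is in R) checks numerically that the joint density equals the copula density evaluated at the marginal distribution functions times the marginal densities.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Sklar's theorem in a bivariate Gaussian toy example:
#   f(x1, x2) = c(F_{X1}(x1), F_{X2}(x2)) * f_{X1}(x1) * f_{X2}(x2),
# where c is the Gaussian copula density with correlation rho.
rho = 0.6
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def gaussian_copula_density(u1, u2):
    # Copula density: joint density of the normal scores divided by the
    # standard normal marginal densities.
    z1, z2 = norm.ppf(u1), norm.ppf(u2)
    return joint.pdf([z1, z2]) / (norm.pdf(z1) * norm.pdf(z2))

x1, x2 = 0.3, -0.8
lhs = joint.pdf([x1, x2])                                   # joint density f(x1, x2)
rhs = gaussian_copula_density(norm.cdf(x1), norm.cdf(x2)) \
      * norm.pdf(x1) * norm.pdf(x2)                         # Sklar factorization
assert abs(lhs - rhs) < 1e-10
```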

Since we are interested in simple copula-based estimation methods for conditional quantiles, we restrict ourselves to two subclasses of the regular vine tree structure, namely the D- and C-vine structures. We show later that these structures allow us to express conditional distributions and quantiles in closed form. In the D-vine tree structure, all trees are paths, i.e. all nodes have degree at most 2. Nodes with degree 1 are called leaf nodes. A C-vine structure occurs when all trees are stars with a root node in the centre. The right and left panels of Figure 1 illustrate a D-vine and a C-vine tree sequence in four dimensions, respectively.

### Figure 1

C-vine tree sequence (left panel) and a D-vine tree sequence (right panel) in four dimensions.

For these subclasses we can easily give the corresponding vine density [17, Chapter 4]. For a D-vine density there exists a permutation $s_1,\dots,s_d$ of $1,\dots,d$ such that

(1) $f(x_1,\dots,x_d) = \left[ \prod_{j=1}^{d-1} \prod_{i=1}^{d-j} c_{U_{s_i},U_{s_{i+j}};U_{s_{i+1}},\dots,U_{s_{i+j-1}}}\left( F_{X_{s_i}\mid X_{s_{i+1}},\dots,X_{s_{i+j-1}}}(x_{s_i}\mid x_{s_{i+1}},\dots,x_{s_{i+j-1}}),\; F_{X_{s_{i+j}}\mid X_{s_{i+1}},\dots,X_{s_{i+j-1}}}(x_{s_{i+j}}\mid x_{s_{i+1}},\dots,x_{s_{i+j-1}}) \right) \right] \prod_{k=1}^{d} f_{X_{s_k}}(x_{s_k}),$

while for a C-vine density the following representation holds

(2) $f(x_1,\dots,x_d) = \left[ \prod_{j=1}^{d-1} \prod_{i=1}^{d-j} c_{U_{s_j},U_{s_{j+i}};U_{s_1},\dots,U_{s_{j-1}}}\left( F_{X_{s_j}\mid X_{s_1},\dots,X_{s_{j-1}}}(x_{s_j}\mid x_{s_1},\dots,x_{s_{j-1}}),\; F_{X_{s_{j+i}}\mid X_{s_1},\dots,X_{s_{j-1}}}(x_{s_{j+i}}\mid x_{s_1},\dots,x_{s_{j-1}}) \right) \right] \prod_{k=1}^{d} f_{X_{s_k}}(x_{s_k}).$

To determine the conditional distributions $F_{X_j\mid X_D}$ needed in Eqs. (1) and (2) for the appropriate choices of $j$ and $D$, the recursion discussed in Joe [28] is available. Writing $u_j = F_{X_j\mid X_D}(x_j\mid x_D)$ for $j=1,\dots,d$, we can express them as compositions of h-functions. These are defined in general as $h_{U_i\mid U_j;U_D}(u_i\mid u_j; u_D) := \frac{\partial}{\partial u_j} C_{U_i,U_j;U_D}(u_i, u_j; u_D)$. Additionally, in Eqs. (1) and (2) we made the simplifying assumption ([17], Section 5.4), that is, the copula $C_{U_i,U_j;U_D}$ does not depend on the specific conditioning value $u_D$, i.e. $C_{U_i,U_j;U_D}(u_i,u_j;u_D) = C_{U_i,U_j;U_D}(u_i,u_j)$. The dependence on $u_D$ in Eqs. (1) and (2) is then completely captured by the arguments of the pair copulas. This assumption is often made for tractability reasons in higher dimensions (Haff et al. [27] and Stoeber et al. [58]). It further implies that the h-function satisfies $h_{U_i\mid U_j;U_D}(u_i\mid u_j;u_D) = \frac{\partial}{\partial u_j} C_{U_i,U_j;U_D}(u_i,u_j) = C_{U_i\mid U_j;U_D}(u_i\mid u_j)$ and is independent of $u_D$.
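As an illustration of the h-function, consider the bivariate Gaussian pair copula, for which $h_{U_i\mid U_j}(u_i\mid u_j) = \Phi\big((\Phi^{-1}(u_i) - \rho\,\Phi^{-1}(u_j))/\sqrt{1-\rho^2}\big)$ is a well-known closed form. The sketch below (Python for illustration, not the paper's R code) checks it against a central finite difference of the copula distribution function.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.5

def h_gauss(ui, uj):
    # Closed-form h-function of the bivariate Gaussian copula:
    # h(ui | uj) = d/d(uj) C(ui, uj) = Phi((zi - rho*zj) / sqrt(1 - rho^2)).
    zi, zj = norm.ppf(ui), norm.ppf(uj)
    return norm.cdf((zi - rho * zj) / np.sqrt(1.0 - rho**2))

def C_gauss(ui, uj):
    # Gaussian copula CDF via the bivariate normal CDF of the normal scores.
    biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return biv.cdf([norm.ppf(ui), norm.ppf(uj)])

# A central finite difference of C in its second argument recovers h.
ui, uj, eps = 0.4, 0.7, 5e-3
numeric = (C_gauss(ui, uj + eps) - C_gauss(ui, uj - eps)) / (2 * eps)
assert abs(numeric - h_gauss(ui, uj)) < 1e-2
```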

### 2.1 Nonparametric estimators of the copula densities and h-functions

There are many methods to estimate a bivariate copula density $c_{U_i,U_j}$ nonparametrically. Examples are the transformation estimator [12], the transformation local likelihood estimator [22], the tapered transformation estimator [61], the beta kernel estimator [12], and the mirror-reflection estimator [24]. Among these kernel estimators, the transformation local likelihood estimator [22] was found by Nagler et al. [47] to have the overall best performance. The estimator is implemented in the R packages kdecopula [45] and rvinecopulib [49] using Gaussian kernels. We review its construction in Appendix A. To satisfy the copula definition, it is rescaled to have uniform margins.
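A minimal sketch of the transformation estimator [12], the simplest of the estimators listed above (the local likelihood variant used in the paper refines the same idea): map the copula data to the z-scale, apply a standard Gaussian kernel density estimator there, and transform the fitted density back to the u-scale. Python for illustration only; the paper's implementation relies on the R packages named above.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(1)
# Simulate pseudo-observations from a Gaussian copula with rho = 0.6.
rho = 0.6
z_sample = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=2000)
u_sample = norm.cdf(z_sample)

def transformation_estimator(u_data):
    # Transform the copula data to the z-scale, run a Gaussian KDE there,
    # and transform the estimated density back to the u-scale.
    kde = gaussian_kde(norm.ppf(u_data).T)
    def c_hat(u1, u2):
        z1, z2 = norm.ppf(u1), norm.ppf(u2)
        return kde([[z1], [z2]])[0] / (norm.pdf(z1) * norm.pdf(z2))
    return c_hat

c_hat = transformation_estimator(u_sample)
# The true Gaussian copula density at (0.5, 0.5) equals 1 / sqrt(1 - rho^2).
true_center = 1.0 / np.sqrt(1.0 - rho**2)
assert abs(c_hat(0.5, 0.5) - true_center) < 0.3
```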

As mentioned above, the simplifying assumption implies that $h_{U_i\mid U_j;U_D}(u_i\mid u_j;u_D)$ is independent of the specific value of $u_D$. Thus, it is sufficient to show how the h-function $h_{U_i\mid U_j}(u_i\mid u_j) = C_{U_i\mid U_j}(u_i\mid u_j)$ can be estimated nonparametrically. For this, we use the estimator

$\hat C_{U_i\mid U_j}(u_i\mid u_j) = \int_0^{u_i} \hat c_{U_i,U_j}(\tilde u_i, u_j)\, \mathrm{d}\tilde u_i,$

where $\hat c_{U_i,U_j}$ is one of the aforementioned nonparametric estimators of the bivariate copula density of $(U_i, U_j)$, for which it holds that $\hat c_{U_i,U_j}$ integrates to 1 and has uniform margins.
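The integral above can be evaluated by simple numerical quadrature. The sketch below (Python, illustrative) integrates a copula density on a grid; using the known Gaussian copula density as a stand-in for $\hat c_{U_i,U_j}$, it recovers the closed-form Gaussian h-function.

```python
import numpy as np
from scipy.stats import norm

rho = 0.5

def c_gauss(u1, u2):
    # Bivariate Gaussian copula density (known, used here in place of a
    # nonparametric estimate c_hat).
    z1, z2 = norm.ppf(u1), norm.ppf(u2)
    expo = -(rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2) / (2 * (1 - rho**2))
    return np.exp(expo) / np.sqrt(1 - rho**2)

def h_from_density(c_hat, ui, uj, grid_size=200):
    # C(ui | uj) = int_0^{ui} c(t, uj) dt via the trapezoidal rule, starting
    # slightly inside the boundary where a density estimate may be unstable.
    t = np.linspace(1e-4, ui, grid_size)
    vals = np.array([c_hat(s, uj) for s in t])
    return 0.5 * np.sum((vals[1:] + vals[:-1]) * np.diff(t))

ui, uj = 0.6, 0.3
approx = h_from_density(c_gauss, ui, uj)
# Closed-form Gaussian h-function for comparison.
exact = norm.cdf((norm.ppf(ui) - rho * norm.ppf(uj)) / np.sqrt(1 - rho**2))
assert abs(approx - exact) < 0.01
```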

## 3 Vine-based quantile regression

In the general regression framework, the predictive ability of a set of variables $X = (X_1,\dots,X_p)^\top$ for a response $Y \in \mathbb{R}$ is studied. The main interest of vine-based quantile regression is to predict the $\alpha \in (0,1)$ quantile $q_\alpha(x_1,\dots,x_p) := F^{-1}_{Y\mid X_1,\dots,X_p}(\alpha\mid x_1,\dots,x_p)$ of the response variable $Y$ given $X$, by using a copula-based model of $(Y, X)^\top$. As shown in Kraus and Czado [41], this can be expressed as

(3) $F^{-1}_{Y\mid X_1,\dots,X_p}(\alpha\mid x_1,\dots,x_p) = F_Y^{-1}\left( C^{-1}_{V\mid U_1,\dots,U_p}\left( \alpha\mid F_{X_1}(x_1),\dots,F_{X_p}(x_p) \right) \right),$

where $C_{V\mid U_1,\dots,U_p}$ is the conditional distribution function of $V := F_Y(Y)$ given $U_j := F_{X_j}(X_j) = u_j$ for $j=1,\dots,p$, with corresponding density $c_{V\mid U_1,\dots,U_p}$, and $C_{V,U_1,\dots,U_p}$ denotes the $(p+1)$-dimensional copula associated with the joint distribution of $(Y, X)^\top$. In view of Section 2, we have $d = p+1$. An estimate of $q_\alpha(x_1,\dots,x_p)$ can be obtained using estimated marginal quantile functions $\hat F_Y^{-1}$, $\hat F_{X_j}^{-1}$, $j=1,\dots,p$, and the estimated inverse conditional distribution function $\hat C^{-1}_{V\mid U_1,\dots,U_p}$, giving $\hat q_\alpha(x_1,\dots,x_p) = \hat F_Y^{-1}\big( \hat C^{-1}_{V\mid U_1,\dots,U_p}(\alpha\mid \hat F_{X_1}(x_1),\dots,\hat F_{X_p}(x_p)) \big)$.
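To make the composition in Eq. (3) concrete, consider a single predictor linked to the response by a Gaussian pair copula with known $\rho$ and standard normal margins; every ingredient then has a closed form. The sketch (Python, illustrative; not the paper's estimator) also shows that quantiles produced this way are monotone in $\alpha$, so they cannot cross.

```python
import numpy as np
from scipy.stats import norm

# One predictor, Gaussian pair copula with known rho, standard normal margins:
# every ingredient of q_alpha(x) = F_Y^{-1}(C^{-1}_{V|U}(alpha | F_X(x))) is
# available in closed form.
rho = 0.7

def h_inv(alpha, u):
    # Inverse h-function of the Gaussian copula: C^{-1}_{V|U}(alpha | u).
    return norm.cdf(rho * norm.ppf(u) + np.sqrt(1 - rho**2) * norm.ppf(alpha))

def q_alpha(alpha, x):
    # Compose the inverse response margin with the inverse conditional copula.
    return norm.ppf(h_inv(alpha, norm.cdf(x)))

x = 1.2
alphas = np.array([0.1, 0.5, 0.9])
est_q = np.array([q_alpha(a, x) for a in alphas])
# Here Y | X = x is N(rho * x, 1 - rho^2), so the true quantiles are known.
true_q = rho * x + np.sqrt(1 - rho**2) * norm.ppf(alphas)
assert np.allclose(est_q, true_q, atol=1e-8)
assert np.all(np.diff(est_q) > 0)   # monotone in alpha: no quantile crossing
```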

In general, $C_{V,U_1,\dots,U_p}$ can be any $(p+1)$-dimensional multivariate copula; however, for certain vine structures the corresponding conditional distribution function $C_{V\mid U_1,\dots,U_p}$ can be obtained in closed form, without numerical integration. For D-vine structures this is possible and has already been used by Kraus and Czado [41]. Tepegjozova [59] showed that this is also the case for certain C-vine structures. More precisely, a copula $C_{V,U_1,\dots,U_p}$ with D-vine structure allows us to express $C_{V\mid U_1,\dots,U_p}$ in closed form if and only if the response $V$ is a leaf node in the first tree of the tree sequence. For a C-vine structure we require that the node containing the response variable $V$ in its conditioned set is not a root node in any tree. Additional flexibility in using such D- and C-vine structures is achieved by allowing for nonparametric pair copulas as building blocks.

The order of the predictors within the tree sequence is itself a free parameter, with direct impact on the target function $C_{V\mid U_1,\dots,U_p}$ and thus on the corresponding prediction performance for $q_\alpha(x_1,\dots,x_p)$. We therefore recall the concept of a node order for C- and D-vine copulas introduced by Tepegjozova [59]. A D-vine copula $C_D$ has order $O_D(C_D) = (V, U_{i_1},\dots,U_{i_p})$ if the response $V$ is the first node of the first tree $T_1$ and $U_{i_k}$ is the $(k+1)$th node of $T_1$, for $k=1,\dots,p$. A C-vine copula $C_C$ has order $O_C(C_C) = (V, U_{i_1},\dots,U_{i_p})$ if $U_{i_1}$ is the root node of the first tree $T_1$, $U_{i_2}U_{i_1}$ is the root node of the second tree $T_2$, and $U_{i_k}U_{i_{k-1}};U_{i_1},\dots,U_{i_{k-2}}$ is the root node of the $k$th tree $T_k$ for $k=3,\dots,p-1$.

Our goal now is to find an optimal order for the D- or C-vine copula model with respect to a fit measure that quantifies the explanatory power of a model. One such measure is the estimated conditional copula log-likelihood. For $N$ i.i.d. observations $v := (v^{(1)},\dots,v^{(N)})^\top$ and $u_j := (u_j^{(1)},\dots,u_j^{(N)})^\top$, $j=1,\dots,p$, of the random vector $(V, U_1,\dots,U_p)^\top$, we fit a C- or D-vine copula $\hat C$ with order $O(\hat C) = (V, U_1,\dots,U_p)$ using nonparametric pair copulas. The fitted conditional log-likelihood can then be determined as

$\mathrm{cll}(\hat C, v, (u_1,\dots,u_p)) = \sum_{n=1}^{N} \ln \hat c_{V\mid U_1,\dots,U_p}(v^{(n)}\mid u_1^{(n)},\dots,u_p^{(n)}) = \sum_{n=1}^{N} \left[ \ln \hat c_{V,U_1}(v^{(n)}, u_1^{(n)}) + \sum_{j=2}^{p} \ln \hat c_{V,U_j;U_1,\dots,U_{j-1}}\left( \hat C_{V\mid U_1,\dots,U_{j-1}}(v^{(n)}\mid u_1^{(n)},\dots,u_{j-1}^{(n)}),\; \hat C_{U_j\mid U_1,\dots,U_{j-1}}(u_j^{(n)}\mid u_1^{(n)},\dots,u_{j-1}^{(n)}) \right) \right].$

Penalizations for model complexity when parametric pair copulas are used can be added as shown in the study by Tepegjozova [59]. To define an appropriate penalty in the case of using nonparametric pair copulas is an open research question (see also Section 7).
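The recursive structure of the conditional log-likelihood can be sketched for $p = 2$ with Gaussian pair copulas and known (partial) correlations; for Gaussian data this decomposition reproduces the conditional density of the trivariate normal exactly. Python for illustration; the function names below are not from the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

def log_c_gauss(u1, u2, r):
    # Log density of the bivariate Gaussian copula with correlation r.
    z1, z2 = norm.ppf(u1), norm.ppf(u2)
    return -0.5 * np.log(1 - r**2) \
        - (r**2 * (z1**2 + z2**2) - 2 * r * z1 * z2) / (2 * (1 - r**2))

def h_gauss(u1, u2, r):
    # Gaussian h-function C_{1|2}(u1 | u2).
    return norm.cdf((norm.ppf(u1) - r * norm.ppf(u2)) / np.sqrt(1 - r**2))

def cll(v, u1, u2, r_v1, r_12, r_v2_1):
    # ln c_{V,U1}(v, u1) + ln c_{V,U2;U1}(C_{V|U1}(v|u1), C_{U2|U1}(u2|u1)),
    # summed over observations; equals sum_n ln c_{V|U1,U2}(v | u1, u2).
    a = h_gauss(v, u1, r_v1)
    b = h_gauss(u2, u1, r_12)
    return np.sum(log_c_gauss(v, u1, r_v1) + log_c_gauss(a, b, r_v2_1))

# Check against the trivariate Gaussian conditional density for one draw.
S = np.array([[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]])  # (V, U1, U2)
zv, z1, z2 = np.random.default_rng(0).multivariate_normal(np.zeros(3), S)
v, u1, u2 = norm.cdf([zv, z1, z2])
r_v2_1 = (0.5 - 0.6 * 0.4) / np.sqrt((1 - 0.6**2) * (1 - 0.4**2))  # partial corr.
lhs = cll(v, u1, u2, 0.6, 0.4, r_v2_1)
beta = np.linalg.solve(S[1:, 1:], S[0, 1:])
mu, s2 = beta @ np.array([z1, z2]), 1.0 - S[0, 1:] @ beta
rhs = norm.logpdf(zv, mu, np.sqrt(s2)) - norm.logpdf(zv)   # ln c_{V|U1,U2}
assert abs(lhs - rhs) < 1e-8
```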

## 4 Forward selection algorithms

Given a set of $p$ predictors, there are $p!$ different orders, which uniquely determine $p!$ C-vines and $p!$ D-vines. Fitting and comparing all of them is computationally intractable. Thus, the idea is to have an algorithm that sequentially chooses the elements of the order, so that at every step the resulting model for predicting the conditional quantiles has the highest conditional log-likelihood. Building upon the idea of Kraus and Czado [41] for the one-step ahead D-vine regression, we propose an algorithm that allows for more flexibility and is less greedy, given the intention to obtain a globally optimal C- or D-vine fit. The algorithm builds the C- or D-vine step by step, starting with an order consisting of only the response variable $V$. Each step adds one of the predictors to the order based on the improvement of the conditional log-likelihood, while taking into account the possibility of future improvement, i.e. extending our view two steps ahead in the order. As discussed in Section 2.1, the pair copulas at each step are estimated nonparametrically, in contrast to the parametric approach of Kraus and Czado [41]. We present the implementation for both C-vine- and D-vine-based quantile regression in a single algorithm, in which the user decides whether to fit a C-vine or a D-vine model based on background knowledge of the dependence structure in the data. Implementation for large data sets is computationally challenging; therefore, randomization is introduced to guarantee computational efficiency in high dimensions.

### 4.1 Two-step ahead forward selection algorithm for C- and D-vine-based quantile regression

Input and data preprocessing: Consider $N$ i.i.d. observations $y := (y^{(1)},\dots,y^{(N)})$ and $x_j := (x_j^{(1)},\dots,x_j^{(N)})$, $j=1,\dots,p$, from the random vector $(Y, X_1,\dots,X_p)^\top$. The input data are on the x-scale; in order to fit bivariate copulas, we need to transform them to the u-scale using the probability integral transform. Since the marginal distributions are unknown, $F_Y$ and $F_{X_j}$, $j=1,\dots,p$, are estimated using a univariate nonparametric kernel density estimator from the R package kde1d [48]. This results in the pseudo copula data $\hat v^{(n)} := \hat F_Y(y^{(n)})$ and $\hat u_j^{(n)} := \hat F_{X_j}(x_j^{(n)})$, for $n=1,\dots,N$, $j=1,\dots,p$. The normalized marginals (z-scale) are defined as $Z_j := \Phi^{-1}(U_j)$, $j=1,\dots,p$, and $Z_V := \Phi^{-1}(V)$, where $\Phi$ denotes the standard normal distribution function.
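A sketch of this preprocessing step. The paper estimates the margins with the kernel estimator from the R package kde1d; as a simple stand-in, the Python sketch below uses empirical ranks (rescaled by $N+1$ to stay strictly inside $(0,1)$) for the u-scale and $\Phi^{-1}$ for the z-scale.

```python
import numpy as np
from scipy.stats import norm, rankdata

def to_pseudo_obs(x):
    # Rank-based probability integral transform, rescaled by N + 1 so the
    # pseudo-observations stay strictly inside (0, 1).
    x = np.asarray(x, dtype=float)
    return rankdata(x) / (len(x) + 1)

rng = np.random.default_rng(0)
x = rng.exponential(size=500)       # x-scale data with a non-normal margin
u = to_pseudo_obs(x)                # u-scale (pseudo copula data)
z = norm.ppf(u)                     # z-scale (normalized margins)
assert 0 < u.min() and u.max() < 1
assert np.all(np.isfinite(z))
```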

Step 1: To reduce computational complexity, we perform a pre-selection of the predictors based on Kendall's $\tau$. This is motivated by the fact that Kendall's $\tau$ is rank-based, hence invariant with respect to monotone transformations of the marginals, and can be expressed in terms of pair copulas. Using the pseudo copula data $(\hat v, \hat u_j) = \{(\hat v^{(n)}, \hat u_j^{(n)}) \mid n=1,\dots,N\}$, estimates $\hat\tau_{VU_j}$ of the Kendall's $\tau$ values between the response $V$ and all possible predictors $U_j$, $j=1,\dots,p$, are obtained. For a given $k \leq p$, the $k$ largest estimates $\hat\tau_{VU_j}$ are selected and the corresponding indices $q_1,\dots,q_k$ are identified, such that $\hat\tau_{VU_{q_1}} \geq \hat\tau_{VU_{q_2}} \geq \dots \geq \hat\tau_{VU_{q_k}} \geq \hat\tau_{VU_{q_{k+1}}} \geq \dots \geq \hat\tau_{VU_{q_p}}$. The parameter $k$ is a hyper-parameter and therefore subject to tuning. To obtain a parsimonious model, we suggest a $k$ corresponding to 5–20% of the total number of predictors. The $k$ candidate predictors and the corresponding candidate index set of step 1 are defined as $U_{q_1},\dots,U_{q_k}$ and $K_1 = \{q_1,\dots,q_k\}$, respectively. For all $c \in K_1$ and $j \in \{1,\dots,p\}\setminus\{c\}$, the candidate two-step ahead C- or D-vine copulas are defined as the three-dimensional copulas $C^1_{c,j}$ with order $O(C^1_{c,j}) = (V, U_c, U_j)$. The first predictor is added to the order based on the conditional log-likelihood of the candidate two-step ahead C- or D-vine copulas $C^1_{c,j}$, given as

$\mathrm{cll}(C^1_{c,j}, \hat v, (\hat u_c, \hat u_j)) = \sum_{n=1}^{N} \left[ \ln \hat c_{V,U_c}(\hat v^{(n)}, \hat u_c^{(n)}) + \ln \hat c_{V,U_j;U_c}\left( \hat h_{V\mid U_c}(\hat v^{(n)}\mid \hat u_c^{(n)}),\; \hat h_{U_j\mid U_c}(\hat u_j^{(n)}\mid \hat u_c^{(n)}) \right) \right].$

For each candidate predictor $U_c$, the maximal two-step ahead conditional log-likelihood at step 1 is defined as $\mathrm{cll}^1_c := \max_{j\in\{1,\dots,p\}\setminus\{c\}} \mathrm{cll}(C^1_{c,j}, \hat v, (\hat u_c, \hat u_j))$, $c \in K_1$. Finally, the index $t_1$ is chosen as $t_1 := \arg\max_{c\in K_1} \mathrm{cll}^1_c$, and the corresponding candidate predictor $U_{t_1}$ is selected as the first predictor added to the order. An illustration of the vine tree structure of the candidate two-step ahead copulas $C^1_{c,j}$, in the case of fitting a D-vine model with order $O_D(C^1_{c,j}) = (V, U_c, U_j)$, is given in Figure 2. The current optimal fit after the first step is the C- or D-vine copula $C^1$ with order $O(C^1) = (V, U_{t_1})$.
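Step 1's pre-selection can be sketched as follows (Python, illustrative; ranking by the absolute value of Kendall's $\tau$ is an assumption of this sketch):

```python
import numpy as np
from scipy.stats import kendalltau

def candidate_set(v, U, k):
    # Indices of the k predictors with the largest |Kendall's tau| with the
    # response; v is (n,), U is (n, p), both on the u-scale.
    taus = np.array([abs(kendalltau(v, U[:, j])[0]) for j in range(U.shape[1])])
    return np.argsort(taus)[::-1][:k]

rng = np.random.default_rng(2)
n, p = 300, 6
U = rng.uniform(size=(n, p))
v = 0.9 * U[:, 2] + 0.1 * rng.uniform(size=n)   # response driven mainly by U_3
K1 = candidate_set(v, U, k=2)
assert K1[0] == 2       # the truly relevant predictor ranks first
```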

### Figure 2

V is fixed as the first node of T 1 and the first candidate predictor to be included in the model, U c (grey), is chosen based on the conditional log-likelihood of the two-step ahead copula including the predictor U j (grey filled).

Step $r$: After $r-1$ steps, the current optimal fit is the C- or D-vine copula $C^{r-1}$ with order $O(C^{r-1}) = (V, U_{t_1},\dots,U_{t_{r-1}})$; at each previous step $i$, the order of the current optimal fit was sequentially extended with the predictor $U_{t_i}$, $i=1,\dots,r-1$. At the $r$th step, the next candidate predictor is to be included. To do so, the set of potential candidates is narrowed based on a partial dependence measure. Defining a partial Kendall's $\tau$ is not straightforward and requires the notion of a partial copula, which is the average of the conditional copula over the conditioning values (see Gijbels and Matterne [23] and references therein). In addition, its computation in the case of multivariate conditioning is very demanding and still an open research problem. Therefore, we take a pragmatic view and base our candidate selection on partial correlation. Due to the assumption of Gaussian margins inherent in Pearson's partial correlation, the estimates are computed on the z-scale. Estimates of the empirical Pearson's partial correlation, $\hat\rho_{Z_V,Z_j;Z_{t_1},\dots,Z_{t_{r-1}}}$, between the normalized response variable $V$ and the available predictors $U_j$, $j \in \{1,\dots,p\}\setminus\{t_1,\dots,t_{r-1}\}$, are obtained. As in the first step, a set of $k$ candidate predictors is selected based on the largest values of $\hat\rho_{Z_V,Z_j;Z_{t_1},\dots,Z_{t_{r-1}}}$, with corresponding indices $q_1,\dots,q_k$. The $k$ candidate predictors and the corresponding candidate index set of step $r$ are defined as $U_{q_1},\dots,U_{q_k}$ and $K_r = \{q_1,\dots,q_k\}$, respectively. For all $c \in K_r$ and $j \in \{1,\dots,p\}\setminus\{t_1,\dots,t_{r-1},c\}$, the candidate two-step ahead C- or D-vine copulas are defined as the copulas $C^r_{c,j}$ with order $O(C^r_{c,j}) = (V, U_{t_1},\dots,U_{t_{r-1}}, U_c, U_j)$.
There are $k(p-r)$ different candidate two-step ahead C- or D-vine copulas $C^r_{c,j}$ (we have $k$ candidates for the one-step ahead extension $U_c$ and, for each, $p-(r-1)-1 = p-r$ two-step ahead extensions $U_j$). Their corresponding conditional log-likelihood functions are given as

$\mathrm{cll}(C^r_{c,j}, \hat v, (\hat u_{t_1},\dots,\hat u_{t_{r-1}}, \hat u_c, \hat u_j)) = \mathrm{cll}(C^{r-1}, \hat v, (\hat u_{t_1},\dots,\hat u_{t_{r-1}})) + \sum_{n=1}^{N} \ln \hat c_{V,U_c;U_{t_1},\dots,U_{t_{r-1}}}\left( \hat C_{V\mid U_{t_1},\dots,U_{t_{r-1}}}(\hat v^{(n)}\mid \hat u_{t_1}^{(n)},\dots,\hat u_{t_{r-1}}^{(n)}),\; \hat C_{U_c\mid U_{t_1},\dots,U_{t_{r-1}}}(\hat u_c^{(n)}\mid \hat u_{t_1}^{(n)},\dots,\hat u_{t_{r-1}}^{(n)}) \right) + \sum_{n=1}^{N} \ln \hat c_{V,U_j;U_{t_1},\dots,U_{t_{r-1}},U_c}\left( \hat C_{V\mid U_{t_1},\dots,U_{t_{r-1}},U_c}(\hat v^{(n)}\mid \hat u_{t_1}^{(n)},\dots,\hat u_{t_{r-1}}^{(n)}, \hat u_c^{(n)}),\; \hat C_{U_j\mid U_{t_1},\dots,U_{t_{r-1}},U_c}(\hat u_j^{(n)}\mid \hat u_{t_1}^{(n)},\dots,\hat u_{t_{r-1}}^{(n)}, \hat u_c^{(n)}) \right).$

The $r$th predictor is then added to the order based on the maximal two-step ahead conditional log-likelihood at step $r$, $\mathrm{cll}^r_c$, defined as

(4) $\mathrm{cll}^r_c := \max_{j\in\{1,\dots,p\}\setminus\{t_1,\dots,t_{r-1},c\}} \mathrm{cll}\left( C^r_{c,j}, \hat v, (\hat u_{t_1},\dots,\hat u_{t_{r-1}}, \hat u_c, \hat u_j) \right), \quad c \in K_r.$

The index $t_r$ is chosen as $t_r := \arg\max_{c\in K_r} \mathrm{cll}^r_c$, and the predictor $U_{t_r}$ is selected as the $r$th predictor of the order. An illustration of the vine tree structure of the candidate two-step ahead copulas $C^r_{c,j}$ for a D-vine model with order $O_D(C^r_{c,j}) = (V, U_{t_1},\dots,U_{t_{r-1}}, U_c, U_j)$ is given in Figure 3. After this step, the current optimal fit is the C- or D-vine copula $C^r$ with order $O(C^r) = (V, U_{t_1},\dots,U_{t_r})$. The iterative procedure is repeated until all predictors are included in the order of the C- or D-vine copula model.
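The overall two-step ahead forward selection can be condensed into the following skeleton, where `cll` is a placeholder for the fitted conditional log-likelihood of a C- or D-vine with the given order and `candidates` stands for the $K_r$ pre-selection; both names are hypothetical, not from the paper's implementation.

```python
import numpy as np

def two_step_ahead_order(p, candidates, cll):
    # order: predictors chosen so far (the order after the response V).
    order, remaining = [], set(range(p))
    while remaining:
        K = candidates(order, remaining)          # candidate index set K_r
        best, best_score = None, -np.inf
        for c in K:
            rest = remaining - {c}
            # Score c by its best two-step ahead extension (V, ..., c, j);
            # with nothing left to look ahead to, score the one-step order.
            score = max((cll(order + [c, j]) for j in rest),
                        default=cll(order + [c]))
            if score > best_score:
                best, best_score = c, score
        order.append(best)
        remaining.discard(best)
    return order

# Toy score: position-discounted weights, so relevant predictors come first.
w = [0.1, 0.9, 0.5]
toy_cll = lambda order: sum(w[o] / (i + 1) for i, o in enumerate(order))
order = two_step_ahead_order(3, lambda o, r: r, toy_cll)
assert order == [1, 2, 0]
```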

### Figure 3

In step r , the current optimal fit, C r 1 (black), is extended by one more predictor, U c (grey), to obtain the new current optimal fit C r (black and grey), based on the conditional log-likelihood of the two-step ahead copula C c , j r which also includes the predictor U j (grey filled) (In the figure, we use the shortened notation U t 1 : r 1 instead of writing U t 1 , , U t r 1 and we use U t 1 : r 1 , c instead of U t 1 , , U t r 1 , U c .).

#### 4.1.1 Additional variable reduction in higher dimensions

The above search procedure requires calculating $p-r$ conditional log-likelihoods for each candidate predictor at a given step $r$. This leads to calculating a total of $(p-r)\cdot k$ conditional log-likelihoods, where $k$ is the number of candidates. For large $p$, this procedure causes a heavy computational burden. Hence, the idea is to reduce the number of conditional log-likelihoods calculated for each candidate predictor. This is achieved by reducing the size of the set over which the maximal two-step ahead conditional log-likelihood $\mathrm{cll}^r_c$ in Eq. (4) is computed: instead of the full set $\{1,\dots,p\}\setminus\{t_1,\dots,t_{r-1},c\}$, the maximum can be taken over an appropriate subset. This subset can be chosen based on the largest Pearson's partial correlations in absolute value, denoted $\hat\rho_{Z_V,Z_j;Z_{t_1},\dots,Z_{t_{r-1}},Z_c}$, by random selection, or by a combination of the two. The selection method and the size of the reduction are user-decided.
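The residual-based computation of Pearson's partial correlation on the z-scale can be sketched as follows (Python, illustrative):

```python
import numpy as np

def partial_corr(zv, zj, Z_cond):
    # Pearson partial correlation of zv and zj given the columns of Z_cond,
    # computed from the residuals of two least-squares fits.
    X = np.column_stack([np.ones(len(zv)), Z_cond])
    res_v = zv - X @ np.linalg.lstsq(X, zv, rcond=None)[0]
    res_j = zj - X @ np.linalg.lstsq(X, zj, rcond=None)[0]
    return np.corrcoef(res_v, res_j)[0, 1]

# Sanity check on a trivariate Gaussian with known partial correlation.
rng = np.random.default_rng(3)
S = np.array([[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), S, size=20000)
est = partial_corr(Z[:, 0], Z[:, 2], Z[:, 1:2])
true = (0.5 - 0.6 * 0.4) / np.sqrt((1 - 0.6**2) * (1 - 0.4**2))
assert abs(est - true) < 0.03
```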

### 4.2 Consistency of the conditional quantile estimator

The conditional quantile function on the original scale in Eq. (3) requires the inverse of the marginal distribution function of $Y$. Following Kraus and Czado [41] and Noh et al. [50], the marginal distribution functions $F_Y$ and $F_{X_j}$, $j=1,\dots,p$, are estimated nonparametrically to reduce the bias caused by model misspecification. Examples of nonparametric estimators of the marginal distributions $F_Y$ and $F_{X_j}$ are the continuous kernel smoothing estimator [52] and the transformed local likelihood estimator in the univariate case [21]. Using a Gaussian kernel, both estimators are strongly uniformly consistent. When all inverse h-functions are also estimated nonparametrically, we establish the consistency of the conditional quantile estimator $\hat F^{-1}_{Y\mid X_1,\dots,X_p}$ in Proposition 4.1 for fixed variable orders. By showing uniform consistency, Proposition 4.1 gives an indication of the performance of the conditional quantile estimator $\hat F^{-1}_{Y\mid X_1,\dots,X_p}$ for fixed variable orders, obtained by combining consistent estimators of $F_Y$, $F_{X_j}$, and the bivariate copula densities. Under this consistency guarantee, the numerical performance of $\hat F^{-1}_{Y\mid X_1,\dots,X_p}$ is investigated by extensive simulation studies in Section 5.

### Proposition 4.1

Let the inverses of the marginal distribution functions $F_Y$ and $F_{X_j}$, $j=1,\dots,p$, be uniformly continuous and estimated nonparametrically, and let the inverses of the h-functions composing the inverse conditional distribution function $C^{-1}_{V\mid U_1,\dots,U_p}$ be uniformly continuous and estimated nonparametrically in the interior of the support of the bivariate copulas, i.e. on $[\delta, 1-\delta]^2$, $\delta \to 0^+$.

1. If the estimators $\hat F_Y^{-1}$, $\hat F_{X_j}^{-1}$, $j=1,\dots,p$, of the inverse marginal distribution functions are strongly uniformly consistent on the support $[\delta, 1-\delta]$, $\delta \to 0^+$, and the estimators of the inverse h-functions composing $C^{-1}_{V\mid U_1,\dots,U_p}$ are strongly uniformly consistent, then the estimator $\hat F^{-1}_{Y\mid X_1,\dots,X_p}(\alpha\mid x_1,\dots,x_p)$ is also strongly uniformly consistent.

2. If the estimators $\hat F_Y^{-1}$, $\hat F_{X_j}^{-1}$, $j=1,\dots,p$, of the inverse marginal distribution functions are at least weakly consistent, and the estimators of the inverse h-functions are also at least weakly consistent, then the estimator $\hat F^{-1}_{Y\mid X_1,\dots,X_p}(\alpha\mid x_1,\dots,x_p)$ is weakly consistent.

For more details on uniformly continuous functions, see Bartle and Sherbert ([4], Section 5.4) and Kolmogorov and Fomin ([39], p. 109, Def. 1). For a definition of strong uniform consistency, i.e. convergence with probability one, see Ryzin [54], Silverman [56], and Durrett ([19], p. 16); for a definition of weak consistency, i.e. convergence in probability, see Durrett ([19], p. 53). The strong uniform consistency result in Proposition 4.1 additionally requires that all estimators $\hat F_Y^{-1}$, $\hat F_{X_j}^{-1}$, $j=1,\dots,p$, are strongly uniformly consistent on a truncated compact interval $[\delta, 1-\delta]$, $\delta \to 0^+$. Although not directly used in the proof of Proposition 4.1 in Appendix B, the truncation is an essential condition for guaranteeing the strong uniform consistency of all estimators of the inverse marginal distributions (i.e. of the quantile functions); see Cheng [15,14] and Van Keilegom and Veraverbeke [60].

## 5 Simulation study

The proposed two-step ahead forward selection algorithms for C- and D-vine-based quantile regression from Section 4.1 are implemented in the statistical language R [53]. The D-vine one-step ahead algorithm is implemented in the R package vinereg [46]. The simulation study in Kraus and Czado [41] showed that the D-vine one-step ahead forward selection algorithm performs better than, or similarly to, other state-of-the-art quantile regression methods: boosting additive quantile regression [20,36], nonparametric quantile regression [43], semiparametric quantile regression [51], and linear quantile regression [38]. Thus, we use the one-step ahead algorithm as the benchmark method in the simulation study. We consider the simulation settings given below, each replicated $R = 100$ times. In each replication, we randomly generate $N_{\mathrm{train}}$ samples used for fitting the appropriate nonparametric vine-based quantile regression models. Additionally, $N_{\mathrm{eval}} = \frac{1}{2} N_{\mathrm{train}}$ samples for settings (a)–(f), and $N_{\mathrm{eval}} = N_{\mathrm{train}}$ samples for settings (g) and (h), are generated for predicting conditional quantiles from the fitted models. Settings (a)–(f) are designed to test the quantile prediction accuracy of nonparametric C- or D-vine quantile regression when $p \ll N$; hence, we set $N_{\mathrm{train}} = 1000$ or $300$. Settings (g) and (h) test quantile prediction accuracy when $p > N$; hence, we set $N_{\mathrm{train}} = 100$.

(a) Simulation setting M5 from Kraus and Czado [41]:

$Y = \sqrt{\left| 2X_1 - X_2 + 0.5 \right|} + (-0.5X_3 + 1)\,(0.1X_4^3) + \sigma\varepsilon,$

with $\varepsilon \sim N(0,1)$, $\sigma \in \{0.1, 1\}$, $(X_1,X_2,X_3,X_4)^\top \sim N_4(0,\Sigma)$, and the $(i,j)$th component of the covariance matrix given by $(\Sigma)_{i,j} = 0.5^{|i-j|}$.

(b) $(Y, X_1, \ldots, X_5)^T$ follows a mixture of two six-dimensional t copulas with degrees of freedom equal to 3 and mixture probabilities 0.3 and 0.7. The association matrices $R_1$, $R_2$ and the marginal distributions are recorded in Table 1.

### Table 1

Association matrices of the multivariate t-copula and marginal distributions for setting (b)

$$R_1 = \begin{pmatrix} 1 & 0.6 & 0.5 & 0.6 & 0.7 & 0.1 \\ 0.6 & 1 & 0.5 & 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 1 & 0.5 & 0.5 & 0.5 \\ 0.6 & 0.5 & 0.5 & 1 & 0.5 & 0.5 \\ 0.7 & 0.5 & 0.5 & 0.5 & 1 & 0.5 \\ 0.1 & 0.5 & 0.5 & 0.5 & 0.5 & 1 \end{pmatrix}, \quad R_2 = \begin{pmatrix} 1 & -0.3 & -0.5 & -0.4 & -0.5 & -0.1 \\ -0.3 & 1 & 0.5 & 0.5 & 0.5 & 0.5 \\ -0.5 & 0.5 & 1 & 0.5 & 0.5 & 0.5 \\ -0.4 & 0.5 & 0.5 & 1 & 0.5 & 0.5 \\ -0.5 & 0.5 & 0.5 & 0.5 & 1 & 0.5 \\ -0.1 & 0.5 & 0.5 & 0.5 & 0.5 & 1 \end{pmatrix}$$

| $Y$ | $X_1$ | $X_2$ | $X_3$ | $X_4$ | $X_5$ |
|---|---|---|---|---|---|
| $N(0,1)$ | $t_4$ | $N(1,4)$ | $t_4$ | $N(1,4)$ | $t_4$ |
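Sampling the latent mixture in setting (b) only needs a Cholesky factor and the representation of a multivariate t as a Gaussian vector scaled by an independent chi-square draw. Below is a minimal pure-Python sketch of ours (the helper names `cholesky`, `rmvt`, and `rmix_t` are illustrative); transforming the draws to copula scale via the $t_3$ CDF and then to the margins of Table 1 via their quantile functions is omitted:

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def rmvt(R, df, rng=random):
    """One multivariate t draw: X = Z / sqrt(W/df), Z ~ N(0, R), W ~ chi^2_df."""
    L = cholesky(R)
    z = [rng.gauss(0.0, 1.0) for _ in R]
    zc = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(R))]
    w = 2.0 * rng.gammavariate(df / 2.0, 1.0)  # chi^2_df as Gamma(df/2, scale 2)
    return [v / math.sqrt(w / df) for v in zc]

def rmix_t(R1, R2, df=3, p1=0.3, rng=random):
    """Draw from the two-component t mixture with weights p1 and 1 - p1."""
    return rmvt(R1 if rng.random() < p1 else R2, df, rng)
```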
(c) Linear and heteroscedastic [11]: $Y = 5(X_1 + X_2 + X_3 + X_4) + 10(U_1 + U_2 + U_3 + U_4)\varepsilon$, where $(X_1, X_2, X_3, X_4)^T \sim N_4(0, \Sigma)$ with $(\Sigma)_{i,j} = 0.5^{\mathbf{1}\{i \neq j\}}$, $\varepsilon \sim N(0, 0.5)$, and $U_j$, $j = 1, \ldots, 4$, are obtained from the $X_j$'s by the probability integral transform.

(d) Nonlinear and heteroscedastic [11]: $Y = U_1 U_2 e^{1.8 U_3 U_4} + 0.5(U_1 + U_2 + U_3 + U_4)\varepsilon$, where $U_j$, $j = 1, \ldots, 4$, are probability integral transforms of $N_4(0, \Sigma)$ with $(\Sigma)_{i,j} = 0.5^{\mathbf{1}\{i \neq j\}}$, and $\varepsilon \sim N(0, 0.5)$.
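Settings (c) and (d) rely on the probability integral transform $U_j = \Phi(X_j)$, which maps each standard normal margin to a uniform. The sketch below is ours, for setting (d), assuming the equicorrelation reading $(\Sigma)_{i,j} = 0.5$ for $i \neq j$ and interpreting $N(0, 0.5)$ as variance $0.5$ (both assumptions on our part):

```python
import math
import random

def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_setting_d(rng=random):
    """One draw (Y, U_1, ..., U_4) under our reading of setting (d)."""
    # Equicorrelated N_4(0, Sigma) with off-diagonal 0.5:
    # X_j = sqrt(0.5) * (Z_0 + Z_j) has unit variance and pairwise correlation 0.5.
    z0 = rng.gauss(0.0, 1.0)
    x = [math.sqrt(0.5) * (z0 + rng.gauss(0.0, 1.0)) for _ in range(4)]
    u = [std_normal_cdf(xj) for xj in x]   # probability integral transform
    eps = rng.gauss(0.0, math.sqrt(0.5))   # N(0, 0.5) read as variance 0.5
    y = u[0] * u[1] * math.exp(1.8 * u[2] * u[3]) + 0.5 * sum(u) * eps
    return y, u
```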

(e) R-vine copula [17]: $(V, U_1, \ldots, U_4)^T$ follows an R-vine distribution with the pair copulas given in Table 2.

### Table 2

Pair copulas of the R-vine $C_{V, U_1, U_2, U_3, U_4}$, with their family, parameter, and Kendall's $\tau$ for setting (e)

| Tree | Edge | Conditioned; Conditioning | Family | Parameter | Kendall's $\tau$ |
|---|---|---|---|---|---|
| 1 | 1 | $U_1, U_3;$ | Gumbel | 3.9 | 0.74 |
| 1 | 2 | $U_2, U_3;$ | Gauss | 0.9 | 0.71 |
| 1 | 3 | $V, U_3;$ | Gauss | 0.5 | 0.33 |
| 1 | 4 | $V, U_4;$ | Clayton | 4.8 | 0.71 |
| 2 | 1 | $V, U_1; U_3$ | Gumbel(90) | 6.5 | 0.85 |
| 2 | 2 | $V, U_2; U_3$ | Gumbel(90) | 2.6 | 0.62 |
| 2 | 3 | $U_3, U_4; V$ | Gumbel | 1.9 | 0.48 |
| 3 | 1 | $U_1, U_2; V, U_3$ | Clayton | 0.9 | 0.31 |
| 3 | 2 | $U_2, U_4; V, U_3$ | Clayton(90) | 5.1 | 0.72 |
| 4 | 1 | $U_1, U_4; V, U_2, U_3$ | Gauss | 0.2 | 0.13 |
(f) D-vine copula [59]: $(V, U_1, \ldots, U_5)^T$ follows a D-vine distribution with the pair copulas given in Table 3.

### Table 3

Pair copulas of the D-vine $C_{V, U_1, U_2, U_3, U_4, U_5}$, with their family, parameter, and Kendall's $\tau$ for setting (f)

| Tree | Edge | Conditioned; Conditioning | Family | Parameter | Kendall's $\tau$ |
|---|---|---|---|---|---|
| 1 | 1 | $V, U_1;$ | Clayton | 3.00 | 0.60 |
| 1 | 2 | $U_1, U_2;$ | Joe | 8.77 | 0.80 |
| 1 | 3 | $U_2, U_3;$ | Gumbel | 2.00 | 0.50 |
| 1 | 4 | $U_3, U_4;$ | Gauss | 0.20 | 0.13 |
| 1 | 5 | $U_4, U_5;$ | Indep. | 0.00 | 0.00 |
| 2 | 1 | $V, U_2; U_1$ | Gumbel | 5.00 | 0.80 |
| 2 | 2 | $U_1, U_3; U_2$ | Frank | 9.44 | 0.65 |
| 2 | 3 | $U_2, U_4; U_3$ | Joe | 2.78 | 0.49 |
| 2 | 4 | $U_3, U_5; U_4$ | Gauss | 0.20 | 0.13 |
| 3 | 1 | $V, U_3; U_1, U_2$ | Joe | 3.83 | 0.60 |
| 3 | 2 | $U_1, U_4; U_2, U_3$ | Frank | 6.73 | 0.55 |
| 3 | 3 | $U_2, U_5; U_3, U_4$ | Gauss | 0.29 | 0.19 |
| 4 | 1 | $V, U_4; U_1, U_2, U_3$ | Clayton | 2.00 | 0.50 |
| 4 | 2 | $U_1, U_5; U_2, U_3, U_4$ | Gauss | 0.09 | 0.06 |
| 5 | 1 | $V, U_5; U_1, U_2, U_3, U_4$ | Indep. | 0.00 | 0.00 |
(g) Similar to setting (a),

$$Y = \sqrt{\left|2X_1 - X_2 + 0.5\right|} + (-0.5X_3 + 1)\left(0.1X_4^3\right) + (X_5, \ldots, X_{110})(0, \ldots, 0)^T + \sigma\varepsilon,$$

where $(X_1, \ldots, X_{110})^T \sim N_{110}(0, \Sigma)$ with the $(i,j)$th component of the covariance matrix $(\Sigma)_{i,j} = 0.5^{|i-j|}$, $\varepsilon \sim N(0,1)$, and $\sigma \in \{0.1, 1\}$.

(h) Similar to (g), $Y = (X_1^3, \ldots, X_{110}^3)\beta + \varepsilon$, where $(X_1, \ldots, X_{10})^T \sim N_{10}(0, \Sigma_A)$ with the $(i,j)$th component of the covariance matrix $(\Sigma_A)_{i,j} = 0.8^{|i-j|}$, and $(X_{11}, \ldots, X_{110})^T \sim N_{100}(0, \Sigma_B)$ with $(\Sigma_B)_{i,j} = 0.4^{|i-j|}$. The first ten entries of $\beta$ form a descending sequence from 2 to 1.1 with decrement 0.1, and the remaining entries are equal to 0. We assume $\varepsilon \sim N(0, \sigma)$ with $\sigma \in \{0.1, 1\}$.

Since the true regression quantiles are difficult to obtain in most settings, we consider the averaged check loss [41,40] and the interval score [11,25], instead of the out-of-sample averaged mean squared error used in Kraus and Czado [41], to evaluate the performance of the estimation methods. For a chosen $\alpha \in (0,1)$, the averaged check loss is defined as

(5) $\widehat{\mathrm{CL}}_{\alpha} = \dfrac{1}{R}\sum\limits_{r=1}^{R} \dfrac{1}{N_{\text{eval}}}\sum\limits_{n=1}^{N_{\text{eval}}} \left\{ \gamma_{\alpha}\!\left( Y_{r,n}^{\text{eval}} - \hat{q}_{\alpha}\!\left( X_{r,n}^{\text{eval}} \right) \right) \right\},$

where $\gamma_{\alpha}(u) = u\left(\alpha - \mathbf{1}\{u < 0\}\right)$ is the check loss function.
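In code, the check loss and its average over one evaluation set are one-liners. The sketch below (function names are ours) evaluates a single replication $r$; the outer average over the $R$ replications is a further mean:

```python
def check_loss(u, alpha):
    """Check (pinball) loss gamma_alpha(u) = u * (alpha - 1{u < 0})."""
    return u * (alpha - (1.0 if u < 0.0 else 0.0))

def averaged_check_loss(y_eval, q_hat, alpha):
    """Mean check loss of the residuals Y - q_hat(X) over one evaluation set."""
    return sum(check_loss(y - q, alpha) for y, q in zip(y_eval, q_hat)) / len(y_eval)
```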

The interval score for the $(1-\alpha) \times 100\%$ prediction interval is defined as

(6) $\widehat{\mathrm{IS}}_{\alpha} = \dfrac{1}{R}\sum\limits_{r=1}^{R} \dfrac{1}{N_{\text{eval}}}\sum\limits_{n=1}^{N_{\text{eval}}} \Big\{ \left( \hat{q}_{1-\frac{\alpha}{2}}(X_{r,n}^{\text{eval}}) - \hat{q}_{\frac{\alpha}{2}}(X_{r,n}^{\text{eval}}) \right) + \dfrac{2}{\alpha}\left( \hat{q}_{\frac{\alpha}{2}}(X_{r,n}^{\text{eval}}) - Y_{r,n}^{\text{eval}} \right) \mathbf{1}\{ Y_{r,n}^{\text{eval}} \le \hat{q}_{\frac{\alpha}{2}}(X_{r,n}^{\text{eval}}) \} + \dfrac{2}{\alpha}\left( Y_{r,n}^{\text{eval}} - \hat{q}_{1-\frac{\alpha}{2}}(X_{r,n}^{\text{eval}}) \right) \mathbf{1}\{ Y_{r,n}^{\text{eval}} > \hat{q}_{1-\frac{\alpha}{2}}(X_{r,n}^{\text{eval}}) \} \Big\},$

and smaller interval scores are better.
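For a single observation, the interval score rewards narrow intervals and penalizes misses by $2/\alpha$ times the distance to the violated endpoint. A sketch with our own helper names, for a single replication:

```python
def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]:
    interval width plus 2/alpha times the distance to a violated endpoint."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score

def averaged_interval_score(y_eval, q_lower, q_upper, alpha):
    """Mean interval score over one evaluation set."""
    return sum(interval_score(y, l, u, alpha)
               for y, l, u in zip(y_eval, q_lower, q_upper)) / len(y_eval)
```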

### Table 4

Out-of-sample prediction scores $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$ for settings (a)–(f) with $N_{\text{train}} = 300$ and $N_{\text{train}} = 1{,}000$

The first four score columns refer to $N_{\text{train}} = 300$, the last four to $N_{\text{train}} = 1{,}000$.

| Setting | Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ |
|---|---|---|---|---|---|---|---|---|---|
| (a), $\sigma = 0.1$ | D-vine One-step | 55.54 | 0.66 | 0.16 | 0.51 | 55.89 | 0.67 | 0.15 | 0.50 |
| | D-vine Two-step | 43.33 | 0.47 | 0.10 | 0.41 | 40.74 | 0.45 | 0.09 | 0.37 |
| | C-vine One-step | 53.51 | 0.64 | 0.16 | 0.49 | 54.52 | 0.66 | 0.15 | 0.49 |
| | C-vine Two-step | 42.01 | 0.45 | 0.10 | 0.40 | 40.04 | 0.44 | 0.09 | 0.37 |
| (a), $\sigma = 1$ | D-vine One-step | 154.35 | 1.63 | 0.45 | 1.62 | 162.12 | 1.70 | 0.43 | 1.66 |
| | D-vine Two-step | 148.53 | 1.57 | 0.45 | 1.56 | 156.77 | 1.63 | 0.42 | 1.62 |
| | C-vine One-step | 151.60 | 1.61 | 0.45 | 1.60 | 160.78 | 1.68 | 0.43 | 1.65 |
| | C-vine Two-step | 148.41 | 1.56 | 0.45 | 1.56 | 156.79 | 1.63 | 0.42 | 1.62 |
| (b) | D-vine One-step | 118.75 | 1.29 | 0.42 | 1.30 | 125.33 | 1.37 | 0.40 | 1.36 |
| | D-vine Two-step | 119.10 | 1.30 | 0.42 | 1.30 | 125.24 | 1.36 | 0.40 | 1.36 |
| | C-vine One-step | 119.08 | 1.30 | 0.41 | 1.30 | 125.12 | 1.36 | 0.40 | 1.36 |
| | C-vine Two-step | 118.90 | 1.30 | 0.42 | 1.30 | 125.30 | 1.36 | 0.40 | 1.36 |
| (c) | D-vine One-step | 2908.90 | 30.54 | 8.55 | 30.42 | 3064.78 | 31.69 | 8.15 | 31.47 |
| | D-vine Two-step | 2853.52 | 30.21 | 8.70 | 29.95 | 3041.95 | 31.61 | 8.20 | 31.26 |
| | C-vine One-step | 2859.23 | 30.24 | 8.59 | 29.95 | 3046.52 | 31.64 | 8.18 | 31.25 |
| | C-vine Two-step | 2850.10 | 30.19 | 8.64 | 29.84 | 3042.46 | 31.62 | 8.20 | 31.23 |
| (d) | D-vine One-step | 86.40 | 0.92 | 0.24 | 0.91 | 91.11 | 0.96 | 0.22 | 0.95 |
| | D-vine Two-step | 83.54 | 0.90 | 0.24 | 0.88 | 89.56 | 0.96 | 0.22 | 0.92 |
| | C-vine One-step | 84.99 | 0.91 | 0.24 | 0.90 | 90.40 | 0.96 | 0.22 | 0.94 |
| | C-vine Two-step | 83.33 | 0.90 | 0.24 | 0.87 | 89.47 | 0.96 | 0.22 | 0.92 |
| (e) | D-vine One-step | 10.59 | 0.11 | 0.03 | 0.11 | 10.49 | 0.11 | 0.03 | 0.11 |
| | D-vine Two-step | 10.32 | 0.10 | 0.03 | 0.11 | 10.26 | 0.09 | 0.02 | 0.11 |
| | C-vine One-step | 10.23 | 0.11 | 0.03 | 0.10 | 10.02 | 0.10 | 0.02 | 0.10 |
| | C-vine Two-step | 10.35 | 0.10 | 0.03 | 0.11 | 10.33 | 0.10 | 0.02 | 0.11 |
| (f) | D-vine One-step | 13.79 | 0.16 | 0.04 | 0.14 | 13.70 | 0.16 | 0.04 | 0.14 |
| | D-vine Two-step | 8.44 | 0.09 | 0.02 | 0.08 | 8.28 | 0.09 | 0.02 | 0.08 |
| | C-vine One-step | 12.62 | 0.14 | 0.04 | 0.13 | 12.23 | 0.13 | 0.04 | 0.13 |
| | C-vine Two-step | 9.09 | 0.10 | 0.02 | 0.09 | 8.93 | 0.09 | 0.02 | 0.08 |

Lower values, indicating better performance, are highlighted in bold. Scenarios in which there is an improvement through the second step and scenarios in which the models perform similarly are marked separately.

### Table 5

Out-of-sample prediction scores $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$ for settings (g)–(h) with $N_{\text{train}} = 100$

The first four score columns refer to $\sigma = 0.1$, the last four to $\sigma = 1$.

| Setting | Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ |
|---|---|---|---|---|---|---|---|---|---|
| (g) | D-vine One-step | 19.63 | 0.26 | 0.25 | 0.23 | 53.38 | 0.69 | 0.67 | 0.65 |
| | D-vine Two-step | 20.48 | 0.26 | 0.26 | 0.25 | 52.17 | 0.68 | 0.65 | 0.63 |
| | C-vine One-step | 19.73 | 0.25 | 0.25 | 0.24 | 53.62 | 0.69 | 0.67 | 0.65 |
| | C-vine Two-step | 19.79 | 0.25 | 0.25 | 0.25 | 52.35 | 0.67 | 0.65 | 0.64 |
| (h) | D-vine One-step | 558.36 | 6.92 | 6.98 | 7.04 | 554.18 | 6.87 | 6.93 | 6.99 |
| | D-vine Two-step | 529.51 | 6.46 | 6.62 | 6.78 | 531.30 | 6.64 | 6.64 | 6.64 |
| | C-vine One-step | 514.08 | 6.05 | 6.43 | 6.81 | 512.96 | 6.39 | 6.41 | 6.44 |
| | C-vine Two-step | 479.66 | 5.87 | 6.00 | 6.12 | 483.92 | 6.05 | 6.05 | 6.05 |

Lower values, indicating better performance, are highlighted in bold. Scenarios in which there is an improvement through the second step and scenarios in which the models perform similarly are marked separately.

The proposed two-step algorithms, as compared to the one-step algorithms, are computationally more intensive. We present the averaged computation time over R = 100 replications on 100 parallel cores (Xeon Gold 6140 CPUs @ 2.6 GHz) in settings (g) and (h), where $p > N_{\text{train}}$, for the one-step ahead and the two-step ahead approaches. The two high-dimensional settings have similar computational times, since the computational cost depends on the number of pair copula estimations and the number of candidates, which are the same for settings (g) and (h); hence, we report a single averaged computation time for both. The average computation time for the one-step ahead (C- and D-vine) approach is 83.01 minutes, in contrast to 200.28 minutes for the two-step ahead (C- and D-vine) approach. With the variable reduction from Section 4.1.1, the two-step algorithms roughly double the time consumption of the one-step algorithms in exchange for prediction accuracy.

## 6 Real data examples

We test the proposed methods on two real data sets: the Concrete data set from Yeh [64], corresponding to $p \ll N$, and the riboflavin data set from Bühlmann and van de Geer [10], corresponding to $p > N$. For both, the performance of the four competing algorithms is evaluated by the averaged check loss defined in Eq. (5) at the 5, 50, and 95% quantile levels and by the 95% prediction interval score defined in Eq. (6), randomly splitting the data set into training and evaluation sets 100 times.

### 6.1 Concrete data set

The concrete data set was initially used in Yeh [64] and is available at the UCI Machine Learning Repository [18]. The data set has 1,030 samples in total. Our objective is quantile prediction of the concrete compressive strength, which is a highly nonlinear function of age and ingredients. The predictors are age (AgeDay, counted in days) and seven physical measurements of the concrete ingredients (given in kg per m$^3$ of mixture): cement (CementComp), blast furnace slag (BlastFur), fly ash (FlyAsh), water (WaterComp), superplasticizer, coarse aggregate (CoarseAggre), and fine aggregate (FineAggre). We randomly split the data set into a training set with 830 samples and an evaluation set with 200 samples; the random splitting is repeated 100 times. The performance of the proposed C- and D-vine two-step ahead quantile regression, compared to the C- and D-vine one-step ahead quantile regression, is evaluated by the measures reported in Table 6 after the 100 repetitions of model fitting. It is not unexpected that the results of the four algorithms are more distinct than in most simulation settings, given the small number of predictors. However, there is an improvement in the performance of the two-step ahead approach compared to the one-step ahead approach for both C- and D-vine-based models. Also, the C-vine structure seems more appropriate for modeling the dependence in this data set. Finally, out of all models, the C-vine two-step ahead algorithm performs best in terms of the out-of-sample prediction scores $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$ on the Concrete data set, as seen in Table 6.

Figure 4 shows the marginal effect plots, based on the fitted quantiles from the C-vine two-step model, for the three most influential predictors. The marginal effect of a predictor is its expected impact on the quantile estimator, where the expectation is taken over all other predictors. It is estimated using all fitted conditional quantiles, smoothed over the predictor under consideration.
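The averaging step described above can be approximated by simple binning: average the fitted conditional quantiles within bins of one predictor, which empirically integrates out the remaining predictors. A rough sketch of ours (binning in place of the smoother actually used, which the text does not specify):

```python
def marginal_effect(x_values, fitted_quantiles, bins=20):
    """Crude marginal-effect curve: average the fitted conditional quantiles
    within equal-width bins of one predictor; returns (bin center, mean) pairs."""
    lo, hi = min(x_values), max(x_values)
    width = (hi - lo) / bins or 1.0  # guard against a constant predictor
    sums, counts = [0.0] * bins, [0] * bins
    for x, q in zip(x_values, fitted_quantiles):
        b = min(int((x - lo) / width), bins - 1)
        sums[b] += q
        counts[b] += 1
    return [(lo + (b + 0.5) * width, sums[b] / counts[b])
            for b in range(bins) if counts[b] > 0]
```

Plotting such curves for the 0.05, 0.5, and 0.95 quantile levels against one predictor at a time reproduces the style of Figure 4.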

### Table 6

Concrete data set: out-of-sample prediction scores $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$

| Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ |
|---|---|---|---|---|
| D-vine One-step | 1032.32 | 10.75 | 2.76 | 10.52 |
| D-vine Two-step | 987.10 | 10.54 | 2.78 | 9.82 |
| C-vine One-step | 976.75 | 10.65 | 2.70 | 9.45 |
| C-vine Two-step | 967.00 | 10.52 | 2.64 | 9.45 |

The best performing model is highlighted in bold.

### Figure 4

Marginal effect plots for the three most influential predictors on the concrete compressive strength for α values of 0.05 (red colour), 0.5 (green colour), and 0.95 (blue colour).

### Table 7

Riboflavin data set: out-of-sample prediction scores $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$

| Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ |
|---|---|---|---|---|
| D-vine One-step | 33.83 | 0.44 | 0.42 | 0.41 |
| D-vine Two-step | 30.57 | 0.44 | 0.38 | 0.33 |
| C-vine One-step | 34.52 | 0.49 | 0.43 | 0.38 |
| C-vine Two-step | 28.59 | 0.41 | 0.36 | 0.30 |

The best performing model is highlighted in bold.

### Table 8

Ten most influential gene expressions on the conditional quantile function, ranked based on their position in the order

| Model / Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| D-vine One-step | GGT | YCIC | MTA | RPSE | YVAK | THIK | ANSB | SPOVB | YVZB | YQJB |
| D-vine Two-step | MTA | RPSE | THIK | YMFE | YCIC | sigM | PGM | YACC | YVQF | YKPB |
| C-vine One-step | GGT | YCIC | MTA | RPSE | HIT | BFMBAB | PHRC | YBAE | PGM | YHEF |
| C-vine Two-step | MTA | RPSE | THIK | YCIC | YURU | PGM | sigM | YACC | YKRM | ASNB |

### Figure 5

Marginal effect plots for the ten most influential predictors on the log-transformed Bacillus subtilis production rate for α = 0.5 .

## 7 Summary and discussion

In this article, we introduce a two-step ahead forward selection algorithm for nonparametric C- and D-vine copula-based quantile regression. Inclusion of future information, obtained through considering the next tree in the two-step ahead algorithm, yields a significantly less greedy sequential selection procedure in comparison to the already existing one-step ahead algorithm for D-vine-based quantile regression in Kraus and Czado [41]. We extend the vine-based quantile regression framework to include C-vine copulas, providing an additional choice for the dependence structure. Furthermore, for the first time, nonparametric bivariate copulas are used to construct vine copula-based quantile regression models. The nonparametric estimation overcomes the problem of possible family misspecification in the parametric estimation of bivariate copulas and allows for even more flexibility in dependence estimation. Additionally, under mild regularity conditions, the nonparametric conditional quantile estimator is shown to be consistent.

The extensive simulation study, including several settings and data sets with different dimensions, strengths of dependence, and tail dependencies, shows that the two-step ahead algorithm outperforms the one-step ahead algorithm in most scenarios. The results for the Concrete and riboflavin data sets are especially interesting, as the C-vine two-step ahead algorithm shows a significant improvement over the other algorithms. These findings provide strong evidence of the need to model dependence structures following a C-vine copula. In addition, the two-step ahead algorithm allows controlling the computational intensity independently of the data dimension, through the number of candidate predictors and the additional variable selection discussed in Section 5. Thus, fitting vine-based quantile regression models in high dimensions becomes feasible. As seen in several simulation settings, there is a significant gain from introducing dependence structures other than the D-vine. A further research direction is the development of similar forward selection algorithms for R-vine tree structures while optimizing the conditional log-likelihood.

At each step of the vine building stage, we compare equal-sized models with the same number of variables, and the conditional log-likelihood is well suited for such a comparison. Other questions, such as choosing between C-vine, D-vine, or R-vine structures, call for information criteria. When maximum likelihood estimation is employed at all stages, the selection criteria of Akaike (AIC) [1], the Bayesian information criterion (BIC) [55], and the focused information criterion (FIC) [16] can be used immediately. Ko et al. [33] studied FIC and AIC specifically for the selection of parametric copulas. The copula information criterion in the spirit of the AIC by Gronneberg and Hjort [26] can be used for selection among copula models with empirically estimated margins, while Ko and Hjort [32] studied such a criterion for parametric copula models. We plan a deeper investigation of the use of information criteria for nonparametrically estimated copulas, and for vines in particular. Such a study is beyond the scope of this article, but it would also be interesting for deriving stopping criteria when building vines.

Nonparametrically estimated vines offer considerable flexibility; their parametric counterparts, on the other hand, enjoy simplicity. An interesting route for further research is to combine parametric and nonparametric components in the construction of the vines in a way that brings the most benefit. This trade-off should be made tangible through criteria that provide guidance about which components should be modeled nonparametrically and which are best modeled parametrically. For the choice between a parametric and a nonparametric model, such criteria have been investigated by Jullum and Hjort [31] via the focused information criterion. This and alternative methods taking the effective degrees of freedom into account are worth further investigation for vine copula models.

# Acknowledgements

The authors would like to thank the editor and the two referees for their comments, which helped to improve the manuscript. This work was supported by the Deutsche Forschungsgemeinschaft [DFG CZ 86/6-1], the Research Foundation Flanders, and KU Leuven internal fund C16/20/002. The resources and services used in this work were provided by the VSC (Flemish Supercomputer Centre), funded by the Research Foundation Flanders (FWO) and the Flemish Government.

Conflict of interest: Prof. Claudia Czado is an Honorary Board Member of "Dependence Modeling", although she had no involvement in the final decision.

## Appendix A Construction of the transformation local likelihood estimator of the copula density

Let the $N \times 2$ transformed sample matrix be

(A.1) $D = (S, T),$

where the transformed samples are $D_n = (S_n, T_n) = \left(\Phi^{-1}(U_i^{(n)}), \Phi^{-1}(U_j^{(n)})\right)$, $n = 1, \ldots, N$, and $\Phi$ denotes the cumulative distribution function of the standard Gaussian distribution. The logarithm of the density $f_{S,T}$ of the transformed samples $(S_n, T