On kernel-based estimation of conditional Kendall's tau: finite-distance bounds and asymptotic behavior

We study nonparametric estimators of conditional Kendall's tau, a measure of concordance between two random variables given some covariates. We prove non-asymptotic bounds with explicit constants that hold with high probability. We provide "direct proofs" of the consistency and the asymptotic law of conditional Kendall's tau. A simulation study evaluates the numerical performance of such nonparametric estimators.


Introduction
In the field of dependence modeling, it is common to work with dependence measures. Contrary to usual linear correlations, most of them have the advantage of being defined without any condition on moments, and of being invariant to changes in the underlying marginal distributions. Such summaries of information are very popular and can be explicitly written as functionals of the underlying copulas: Kendall's tau, Spearman's rho, Blomqvist's coefficient, etc. See Nelsen [9] for an introduction. In particular, Kendall's tau is a well-known dependence measure in [−1, 1] which quantifies the positive or negative dependence between two random variables X_1 and X_2. Denoting by C_{1,2} the (assumed unique) underlying copula of (X_1, X_2), their Kendall's tau can be directly defined as
\[ \tau_{1,2} = \mathbb{P}\big( (X_{1,1}-X_{2,1})(X_{1,2}-X_{2,2}) > 0 \big) - \mathbb{P}\big( (X_{1,1}-X_{2,1})(X_{1,2}-X_{2,2}) < 0 \big), \]
where (X_{i,1}, X_{i,2}), i = 1, 2, are two independent versions of (X_1, X_2). This measure is then interpreted as the probability of observing a concordant pair minus the probability of observing a discordant pair.
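As a quick numerical illustration of this pair-counting definition, the following short Python helper (our own toy code, not part of the paper) estimates the unconditional Kendall's tau by comparing concordant and discordant pairs:

```python
import numpy as np

def kendall_tau(x1, x2):
    """Empirical Kendall's tau: share of concordant minus discordant pairs."""
    n = len(x1)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sign((x1[i] - x1[j]) * (x2[i] - x2[j]))
    return total / (n * (n - 1) / 2.0)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + rng.normal(size=200)   # positively dependent sample
print(kendall_tau(x, y))       # positive: concordant pairs dominate
```

With continuous distributions, ties occur with probability zero, so the `sign` term is almost surely ±1 for each pair.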
Similar dependence measures can be introduced in a conditional setup, when a p-dimensional covariate Z is available. The goal is now to model the dependence between the two components X_1 and X_2, given the vector of covariates Z. Logically, we can invoke the conditional copula C_{1,2|Z=z} of (X_1, X_2) given Z = z for any point z ∈ R^p (see Patton [10,11]), and the corresponding conditional Kendall's tau would simply be defined as
\[ \tau_{1,2|Z=z} = \mathbb{P}\big( (X_{1,1}-X_{2,1})(X_{1,2}-X_{2,2}) > 0 \,\big|\, Z_1 = Z_2 = z \big) - \mathbb{P}\big( (X_{1,1}-X_{2,1})(X_{1,2}-X_{2,2}) < 0 \,\big|\, Z_1 = Z_2 = z \big), \]
where (X_{i,1}, X_{i,2}, Z_i), i = 1, 2, are two independent versions of (X_1, X_2, Z). As above, this is the probability of observing a concordant pair minus the probability of observing a discordant pair, conditionally on Z_1 and Z_2 being both equal to z. Indeed, if Z_1 and Z_2 have two different values, then concordant/discordant pairs do not bring any information about the dependence between X_1 and X_2 for any fixed value of the conditioning covariate. Note that, as conditional copulas themselves, conditional Kendall's taus are invariant w.r.t. increasing transformations of the conditional margins X_1 and X_2, given Z.
Of course, if Z is independent of (X_1, X_2) then, for every z ∈ R^p, the conditional Kendall's tau τ_{1,2|Z=z} is equal to the (unconditional) Kendall's tau τ_{1,2}. A weaker sufficient condition for this to happen is the so-called "simplifying assumption" about the conditional copula, i.e. "C_{1,2|Z=z} does not depend on the choice of z", a key assumption for vine modeling in particular: see [1] or [8] for a discussion, and [3] for a review and a presentation of formal tests of this hypothesis. However, in general, there is no reason why the mapping z ↦ τ_{1,2|Z=z}, or any other conditional dependence measure, should stay constant. And conditional Kendall's taus are of interest per se because they allow one to summarize the evolution of the dependence between X_1 and X_2 when the covariate Z is changing. Some conditional dependence measures and their estimates have been introduced in the literature a few years ago ([7], [15], [6]), but their properties have not yet been studied in depth. Note that our τ_{1,2|Z=z} should not be confused with the so-called "conditional Kendall's tau" in the case of truncated data, as in Tsai [14].
Interestingly, dependence measures are of interest for the purpose of estimating copula models too. Indeed, several popular parametric families of copulas have a simple one-to-one mapping between their parameter and the associated Kendall's tau (or Spearman's rho): Gaussian, Student with a fixed degree of freedom, Clayton, Gumbel and Frank copulas, etc. Getting back in a conditional framework, assume that the conditional copula C 1,2|Z=z belongs to such a parametric family, say a Gaussian copula with a parameter ρ(z). Then, by estimating the conditional Kendall's tau τ 1,2|Z=z , we get an estimate of the corresponding parameter ρ(z), and finally of the conditional copula itself.
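For instance, for the Gaussian (and Student) family one has τ = (2/π) arcsin(ρ), and for the Clayton family τ = θ/(θ + 2). Inverting these classical formulas turns an estimate of the conditional Kendall's tau into an estimate of the conditional copula parameter; a minimal sketch (our own helper functions, for illustration only):

```python
import math

def rho_from_tau_gaussian(tau):
    """Invert tau = (2/pi) * arcsin(rho), valid for Gaussian/Student copulas."""
    return math.sin(math.pi * tau / 2.0)

def theta_from_tau_clayton(tau):
    """Invert tau = theta / (theta + 2), valid for Clayton copulas, tau in (0, 1)."""
    return 2.0 * tau / (1.0 - tau)

# An estimate of tau_{1,2|Z=z} thus yields an estimate of rho(z) directly:
print(rho_from_tau_gaussian(0.5))   # sin(pi/4) ~ 0.7071
```

The same plug-in logic applies pointwise in z: estimate τ_{1,2|Z=z}, then map it to the parameter of the chosen family.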
Until now, the theoretical properties of conditional Kendall's tau estimates have been obtained "in passing" in the literature, as a by-product of the weak convergence of conditional copula processes ([15]) or as intermediate quantities to be "plugged in" ([5]). Therefore, such properties have been stated under unnecessarily demanding assumptions, in particular assumptions related to the estimation of conditional margins, even though the latter is not required. In this paper, we will directly study nonparametric estimates τ̂_{1,2|z} without relying on the theory/inference of copulas. We will thus establish their main usual properties as statistical estimates: exponential bounds in probability, consistency, asymptotic normality.
In Section 2, different kernel-based estimators of the conditional Kendall's tau are proposed. In Section 3, the theoretical properties of the latter estimators are proved, first with finite-distance bounds and then from an asymptotic point of view. A short simulation study is provided in Section 4. Proofs are postponed to the appendix.
2 Definition of several kernel-based estimators of τ_{1,2|z}

Let (X_{i,1}, X_{i,2}, Z_i), i = 1, …, n, be an i.i.d. sample distributed as (X_1, X_2, Z), with n ≥ 2. Assuming continuous underlying distributions, there are several equivalent ways of defining the conditional Kendall's tau:
\[ \tau_{1,2|Z=z} = 4\, \mathbb{P}\big( X_{1,1} < X_{2,1}, X_{1,2} < X_{2,2} \,\big|\, Z_1 = Z_2 = z \big) - 1 \tag{1} \]
\[ = \mathbb{P}\big( (X_{1,1}-X_{2,1})(X_{1,2}-X_{2,2}) > 0 \,\big|\, Z_1 = Z_2 = z \big) - \mathbb{P}\big( (X_{1,1}-X_{2,1})(X_{1,2}-X_{2,2}) < 0 \,\big|\, Z_1 = Z_2 = z \big) \]
\[ = 1 - 4\, \mathbb{P}\big( X_{1,1} < X_{2,1}, X_{1,2} > X_{2,2} \,\big|\, Z_1 = Z_2 = z \big). \]
Motivated by each of the latter expressions, we introduce several kernel-based estimators of τ_{1,2|Z=z}:
\[ \hat\tau^{(1)}_{1,2|Z=z} := 4 \sum_{i \ne j} w_{i,n}(z)\, w_{j,n}(z)\, \mathbb{1}\{X_{i,1} < X_{j,1},\, X_{i,2} < X_{j,2}\} - 1, \]
\[ \hat\tau^{(2)}_{1,2|Z=z} := \sum_{i \ne j} w_{i,n}(z)\, w_{j,n}(z) \Big( \mathbb{1}\{(X_{i,1}-X_{j,1})(X_{i,2}-X_{j,2}) > 0\} - \mathbb{1}\{(X_{i,1}-X_{j,1})(X_{i,2}-X_{j,2}) < 0\} \Big), \]
\[ \hat\tau^{(3)}_{1,2|Z=z} := 1 - 4 \sum_{i \ne j} w_{i,n}(z)\, w_{j,n}(z)\, \mathbb{1}\{X_{i,1} < X_{j,1},\, X_{i,2} > X_{j,2}\}, \]
where 𝟙 denotes the indicator function and (w_{i,n}(z)) is a sequence of weights given by
\[ w_{i,n}(z) := \frac{K_h(Z_i - z)}{\sum_{j=1}^n K_h(Z_j - z)}, \tag{2} \]
with K_h(·) := h^{−p} K(·/h) for some kernel K on R^p, and h = h(n) denotes a usual bandwidth sequence that tends to zero when n → ∞. In this paper, we have chosen usual Nadaraya-Watson weights. Obviously, there are alternatives (local linear, Priestley-Chao, Gasser-Müller weights, etc.) that would lead to different theoretical results.
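To fix ideas, here is a minimal Python sketch of the Nadaraya-Watson weights and of the concordance-minus-discordance estimator, assuming p = 1 and a Gaussian kernel (both illustrative choices of ours; any kernel satisfying the assumptions of Section 3 would do):

```python
import numpy as np

def nw_weights(Z, z, h):
    """Nadaraya-Watson weights w_{i,n}(z) = K_h(Z_i - z) / sum_j K_h(Z_j - z).

    The kernel's normalizing constant cancels in the ratio, so it is omitted."""
    k = np.exp(-0.5 * ((Z - z) / h) ** 2)   # Gaussian kernel, p = 1
    return k / k.sum()

def tau_hat2(X1, X2, Z, z, h):
    """Conditional Kendall's tau estimate: weighted concordance minus
    discordance over pairs i != j (sign(0) = 0 drops the diagonal/ties)."""
    w = nw_weights(Z, z, h)
    S = np.sign((X1[:, None] - X1[None, :]) * (X2[:, None] - X2[None, :]))
    return float(w @ S @ w)
```

For a comonotone sample the estimator equals 1 − Σ_i w_{i,n}(z)², reflecting the removal of the diagonal terms i = j.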
The estimators τ̂^{(1)}_{1,2|Z=z}, τ̂^{(2)}_{1,2|Z=z} and τ̂^{(3)}_{1,2|Z=z} look similar, but they are nevertheless different, as shown in Proposition 1. These differences are due to the fact that the τ̂^{(k)}_{1,2|Z=z} do not all take values in the same interval: for instance, τ̂^{(3)}_{1,2|Z=z} takes its values in [−1 + 2s_n, 1], where s_n := Σ_{i=1}^n w_{i,n}(z)². Moreover, there exists a direct relationship between these estimators, given by the following proposition.
Proposition 1. For every z, τ̂^{(1)}_{1,2|Z=z} + s_n = τ̂^{(2)}_{1,2|Z=z} = τ̂^{(3)}_{1,2|Z=z} − s_n, where s_n := Σ_{i=1}^n w_{i,n}(z)².

This proposition is proved in Section A.1.
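This relationship can be checked numerically. The sketch below (p = 1, Gaussian kernel, both illustrative conventions of ours) computes the three estimators and s_n on simulated data and verifies the two identities:

```python
import numpy as np

def nw_weights(Z, z, h):
    """Nadaraya-Watson weights with a Gaussian kernel (p = 1 sketch)."""
    k = np.exp(-0.5 * ((Z - z) / h) ** 2)
    return k / k.sum()

def cond_taus(X1, X2, Z, z, h):
    """Return (tau1, tau2, tau3, s_n): the three estimators at z, plus s_n."""
    w = nw_weights(Z, z, h)
    n = len(Z)
    t1 = t2 = t3 = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            ww = w[i] * w[j]
            t1 += 4.0 * ww * (X1[i] < X1[j] and X2[i] < X2[j])
            t2 += ww * np.sign((X1[i] - X1[j]) * (X2[i] - X2[j]))
            t3 += 4.0 * ww * (X1[i] < X1[j] and X2[i] > X2[j])
    return t1 - 1.0, t2, 1.0 - t3, float(np.sum(w ** 2))

rng = np.random.default_rng(1)
Z = rng.uniform(size=50)
X1, X2 = rng.normal(size=50), rng.normal(size=50)
t1, t2, t3, s_n = cond_taus(X1, X2, Z, z=0.5, h=0.2)
print(np.isclose(t1 + s_n, t2), np.isclose(t2, t3 - s_n))  # True True
```

The identities hold exactly (up to floating-point error) because, in the absence of ties, the weighted masses of concordant and discordant pairs sum to 1 − s_n.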
Note that none of the latter estimators depends on any estimation of conditional marginal distributions.
In other words, we only have to choose the weights w_{i,n} conveniently to obtain an estimator of the conditional Kendall's tau. This is coherent with the fact that conditional Kendall's taus are invariant with respect to conditional marginal distributions. Moreover, note that, in the definition of our estimators, the inequalities are strict (there are no terms corresponding to the cases i = j). This is in line with the definition of (conditional) Kendall's tau itself through concordant/discordant pairs of observations.
The definition of τ̂^{(1)}_{1,2|Z=z} can be motivated as follows. For j = 1, 2, let F̂_{j|Z}(·|Z = z) be an estimator of the conditional cdf of X_j given Z = z. Then, a usual estimator of the conditional copula of X_1 and X_2 given Z = z is
\[ \hat C_{1,2|Z}(u_1, u_2 | z) := \sum_{i=1}^n w_{i,n}(z)\, \mathbb{1}\big\{ \hat F_{1|Z}(X_{i,1}|Z=z) \le u_1,\, \hat F_{2|Z}(X_{i,2}|Z=z) \le u_2 \big\}; \]
see [15] or [6], e.g. The latter estimator of the conditional copula can be plugged into (1) to define an estimator of the conditional Kendall's tau itself:
\[ 4 \int \hat C_{1,2|Z}(u_1, u_2 | z)\, d\hat C_{1,2|Z}(u_1, u_2 | z) - 1. \tag{3} \]
If the functions F̂_{j|Z}(·|Z = z) are increasing, this reduces to τ̂^{(1)}_{1,2|Z=z}. Veraverbeke et al. [15], Subsection 3.2, introduced their estimator of τ_{1,2|z} by (3). By the functional Delta-Method, they deduced its asymptotic normality as a by-product of the weak convergence of the process √(nh)(Ĉ_{1,2|Z}(·, ·|z) − C_{1,2|Z}(·, ·|z)) when Z is univariate. In our case, we will obtain the theoretical properties of τ̂^{(1)}_{1,2|Z=z} under weaker conditions by a more direct analysis. We could justify τ̂^{(3)}_{1,2|Z=z} in a similar way by considering conditional survival copulas.
Let us define g_1, g_2, g_3 by
\[ g_1(X_i, X_j) := 4\, \mathbb{1}\{X_{i,1} < X_{j,1},\, X_{i,2} < X_{j,2}\} - 1, \]
\[ g_2(X_i, X_j) := \mathbb{1}\{(X_{i,1}-X_{j,1})(X_{i,2}-X_{j,2}) > 0\} - \mathbb{1}\{(X_{i,1}-X_{j,1})(X_{i,2}-X_{j,2}) < 0\}, \]
\[ g_3(X_i, X_j) := 1 - 4\, \mathbb{1}\{X_{i,1} < X_{j,1},\, X_{i,2} > X_{j,2}\}, \]
where, for i = 1, …, n, we set X_i := (X_{i,1}, X_{i,2}). Clearly, τ_{1,2|Z=z} = E[g_k(X_1, X_2) | Z_1 = Z_2 = z] for every k = 1, 2, 3. Denote by τ̂^{(k),(−i,−j)}_{1,2|Z=·} the estimator τ̂^{(k)}_{1,2|Z=·} that is made with our dataset where the i-th and j-th observations have been removed. As a consequence, the random function τ̂^{(k),(−i,−j)}_{1,2|Z=·} is independent of (X_i, Z_i) and (X_j, Z_j). The bandwidth ĥ can then be chosen as the minimizer of the cross-validation criterion
\[ CV_k(h) := \sum_{i \ne j} \Big( g_k(X_i, X_j) - \hat\tau^{(k),(-i,-j)}_{1,2|Z=Z_i} \Big)^2, \]
for k = 1, 2, 3, where the leave-two-out estimator is computed with bandwidth h. A similar criterion can be proposed for the rescaled version τ̂^{(4)}_{1,2|Z=·}.
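A possible implementation of such a leave-two-out criterion is sketched below, under our own illustrative conventions (p = 1, Gaussian kernel, the score g_2, and a random subsample of pairs to keep the cost manageable; the paper's exact criterion may differ in details):

```python
import numpy as np

def tau_hat2(X1, X2, Z, z, h, skip=()):
    """tau^(2)-type estimator at z, leaving out the observations in `skip`."""
    keep = np.array([i for i in range(len(Z)) if i not in skip])
    x1, x2, zz = X1[keep], X2[keep], Z[keep]
    k = np.exp(-0.5 * ((zz - z) / h) ** 2)
    w = k / k.sum()
    S = np.sign((x1[:, None] - x1[None, :]) * (x2[:, None] - x2[None, :]))
    return float(w @ S @ w)

def cv_bandwidth(X1, X2, Z, grid, n_pairs=100, seed=0):
    """Pick h in `grid` minimizing a leave-two-out cross-validation score."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    sel = rng.choice(len(pairs), size=min(n_pairs, len(pairs)), replace=False)
    scores = []
    for h in grid:
        s = 0.0
        for idx in sel:
            i, j = pairs[idx]
            g2 = np.sign((X1[i] - X1[j]) * (X2[i] - X2[j]))
            s += (g2 - tau_hat2(X1, X2, Z, Z[i], h, skip=(i, j))) ** 2
        scores.append(s)
    return grid[int(np.argmin(scores))]
```

Subsampling pairs is purely a computational device; the full double sum over i ≠ j gives the criterion as written above.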

Finite-distance bounds
Hereafter, we will consider the behavior of conditional Kendall's tau estimates given Z = z, where z belongs to some fixed open subset Z of R^p. For the moment, let us state an instrumental result that is of interest per se. Let
\[ \hat f_Z(z) := \frac{1}{n} \sum_{i=1}^n K_h(Z_i - z) \]
be the usual kernel estimator of the density f_Z of the conditioning variable Z. Note that the estimators τ̂^{(k)}_{1,2|Z=z}, k = 1, …, 4, are well-behaved only whenever f̂_Z(z) > 0. Denote the joint density of (X, Z) by f_{X,Z}. In our study, we need some usual conditions of regularity.
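For concreteness, the estimator f̂_Z can be sketched as follows (p = 1 and a Gaussian kernel, both illustrative choices of ours):

```python
import numpy as np

def f_hat_Z(Z, z, h):
    """Kernel density estimator: the mean of K_h(Z_i - z), K_h := K(./h)/h."""
    u = (Z - z) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    return float(np.mean(K) / h)

rng = np.random.default_rng(2)
Z = rng.uniform(size=5000)            # true density f_Z = 1 on (0, 1)
print(f_hat_Z(Z, 0.5, 0.1))           # close to the true value 1
```

In practice one checks that f̂_Z(z) stays away from zero before forming the ratio-type estimators of the previous section.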
Assumption 3.1. The kernel K is bounded, and set C_K := ‖K‖_∞. It is symmetrical and satisfies ∫K = 1 and ∫|K| < ∞. This kernel is of order α for some integer α > 1: for all j = 1, …, α − 1 and every set of indices i_1, …, i_j in {1, …, p}, ∫ K(u) u_{i_1} ⋯ u_{i_j} du = 0.

Proposition 2 below states that f̂_Z(z) is, for all z ∈ Z, strictly positive with a probability larger than an explicit level tending to one. The latter proposition is proved in Section A.2. It guarantees that our estimators τ̂^{(k)}_{1,2|z}, k = 1, …, 4, are well-behaved with a probability close to one.
Moreover, assume that f_{X,Z} is integrable and that there exists a finite constant C_{XZ,α} > 0 bounding, for every z ∈ Z and every n, the corresponding α-th order remainder terms.

Proposition 3 (Exponential bound for the estimated conditional Kendall's tau). Under the previous assumptions, for any z ∈ Z and every k = 1, 2, 3, the deviation |τ̂^{(k)}_{1,2|Z=z} − τ_{1,2|Z=z}| satisfies an exponential bound with explicit constants, where c_1 := c_3 := 4 and c_2 := 2.
Remark 4. In Propositions 2 and 3, the lower bound f_{Z,min} can be replaced by the real density f_Z(z) when it is positive. Moreover, when the support of K is included in [−c, c]^p for some c > 0, f_{Z,max} can be replaced by the local bound sup_{z̃ ∈ V(z,ε)} f_Z(z̃), denoting by V(z, ε) a closed ball of center z and any radius ε > 0. When the relevant lower bound is positive, the results above apply, replacing 2/3 by 4/3 in the denominators.
This proposition is proved in Section A.3. As a corollary, it yields the weak consistency of τ̂^{(k)}_{1,2|Z=z} for every z ∈ Z, under the assumptions of Proposition 3 and if nh^{2p} → ∞ (set t = 1 and t′ > 0 sufficiently small).
In the next section, some asymptotic results will be stated, including consistency under weaker assumptions.
This property is proved in Section A.4.
Proposition 6 (Uniform consistency). Under Assumption 3.1, assume that nh_n^{2p}/log n → ∞, that lim K(t)|t|^p = 0 when |t| → ∞, that K is Lipschitz, that f_Z and z ↦ τ_{1,2|Z=z} are continuous on a bounded set Z, and that there exists a constant f_{Z,min} > 0 such that f_Z ≥ f_{Z,min} on Z. Then the estimators are uniformly consistent on Z. This property is proved in Section A.5.

To derive the asymptotic law of this estimator, we will assume additional regularity conditions (Assumptions 3.4 and 3.5).

Proposition 7 (Joint asymptotic normality at different points). Let z′_1, …, z′_{n′} be fixed points in a set Z ⊂ R^p. Assume 3.1, 3.4, 3.5, that the z′_i are distinct, and that f_Z and z ↦ f_{X,Z}(x, z) are continuous on Z, for every x. Then, as n → ∞, the vector (nh^p)^{1/2}(τ̂_{1,2|Z=z′_i} − τ_{1,2|Z=z′_i})_{i=1,…,n′} is asymptotically Gaussian with an explicit n′ × n′ real covariance matrix H. This proposition is proved in Section A.6.
Remark 8. The results of Propositions 5 and 7 apply to the "rescaled" versions of τ̂^{(k)}_{1,2|z}, k = 1, 2, 3, too. Indeed, under our assumptions, it can easily be proved by Markov's inequality that s_n = Σ_{i=1}^n w_{i,n}(z)² = O_P((nh^p)^{−1}) for any z, so that s_n tends to zero in probability. Then, by Slutsky's theorem, we get an asymptotic equivalence between the limiting laws of our estimated Kendall's taus.

4 Simulation study

We consider two settings. In Setting 1, the support of the covariate Z is bounded, namely [0, 1]; in Setting 2, Z = R and the law of Z has full support.
These simple frameworks allow us to compare the numerical properties of our different estimators in different parts of the space, in particular when Z is close to zero or one, i.e. when the conditional Kendall's tau is close to −1 or to 1. We compute the different estimators τ̂^{(k)}, k = 1, …, 4, together with their pointwise bias, standard deviation and MSE. We also consider their integrated versions w.r.t. the usual Lebesgue measure on the whole support of z, respectively denoted by IBias, ISd and IMSE. Some results concerning these integrated measures are reported in the tables. For every n, the best results seem to be obtained with α_h = 1.5 and the fourth (rescaled) estimator, particularly in terms of bias. This is not so surprising, because the estimators τ̂^{(k)}, k = 1, 2, 3, do not have the right support at a finite distance. Note that this comparative advantage of τ̂^{(4)} in terms of bias decreases with n, as expected. In terms of integrated variance, all the considered estimators behave more or less similarly, particularly when n ≥ 500.
To illustrate our results for Setting 1 (resp. Setting 2), the functions z ↦ Bias(z), Sd(z) and MSE(z) have been plotted on Figures 1-2 (resp. Figures 3-4), both with our empirically optimal choice α_h = 1.5. We can note that, considering the bias, the estimator τ̂^{(4)} behaves similarly to τ̂^{(1)} when the true τ is close to −1, and similarly to τ̂^{(3)} when the true Kendall's tau is close to 1. But globally, the best pointwise estimator is clearly obtained with the rescaled version τ̂^{(4)}_{1,2|Z=·}, after a quick inspection of MSE levels, even if the differences between our four estimators weaken for large sample sizes. The comparative advantage of τ̂^{(4)}_{1,2|z} appears more clearly with Setting 2 than with Setting 1. Indeed, in the former case, the support of Z's distribution is the whole real line. Then f̂_Z does not suffer any more from the boundary bias phenomenon, contrary to what happened with Setting 1. As a consequence, the biases induced by the definitions of τ̂^{(k)}_{1,2|z}, k = 1, 3, appear more strikingly in Figure 3, for instance: when z is close to −1 (resp. 1), the biases of τ̂^{(1)}_{1,2|z} (resp. τ̂^{(3)}_{1,2|z}) and τ̂^{(4)}_{1,2|z} are close, while the bias of τ̂^{(3)}_{1,2|z} (resp. τ̂^{(1)}_{1,2|z}) is a lot larger. Since the squared biases are here significantly larger than the variances in the tails, τ̂^{(4)}_{1,2|z} provides the best estimator globally, considering "both sides" together. But even in the center of Z's distribution, the latter estimator behaves very well.
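As an indication of how such a Monte-Carlo experiment can be set up, here is a compact sketch under our own illustrative design (not the paper's exact Settings 1-2): Z is uniform, and the conditional copula of (X_1, X_2) given Z = z is Gaussian with ρ(z) = 2z − 1, so that the true conditional Kendall's tau is τ_{1,2|Z=z} = (2/π) arcsin(2z − 1):

```python
import numpy as np

def tau_hat2(X1, X2, Z, z, h):
    """tau^(2)-type estimator with Gaussian Nadaraya-Watson weights (p = 1)."""
    k = np.exp(-0.5 * ((Z - z) / h) ** 2)
    w = k / k.sum()
    S = np.sign((X1[:, None] - X1[None, :]) * (X2[:, None] - X2[None, :]))
    return float(w @ S @ w)

def sample(n, rng):
    """Conditional Gaussian copula with rho(z) = 2z - 1, Gaussian margins."""
    Z = rng.uniform(size=n)
    rho = 2.0 * Z - 1.0
    e1, e2 = rng.normal(size=n), rng.normal(size=n)
    return e1, rho * e1 + np.sqrt(1.0 - rho ** 2) * e2, Z

rng = np.random.default_rng(3)
X1, X2, Z = sample(2000, rng)
z0 = 0.75
true_tau = 2.0 / np.pi * np.arcsin(2.0 * z0 - 1.0)   # = 1/3
print(tau_hat2(X1, X2, Z, z0, h=0.1), true_tau)
```

Repeating the sampling step over many replications and a grid of z values yields the Bias(z), Sd(z) and MSE(z) curves discussed above.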
Lemma 9 (Exponential inequality for U-statistics). Let g be a measurable function with values in [a, b], and set θ := E[g(X_1, …, X_m)] and σ² := Var[g(X_1, …, X_m)] for i.i.d. random variables X_1, …, X_m. Then, for any t > 0 and n ≥ m,
\[ \mathbb{P}\Big( \binom{n}{m}^{-1} \sum_c g(X_{i_1}, \ldots, X_{i_m}) - \theta \ge t \Big) \le \exp\Big( - \frac{\lfloor n/m \rfloor\, t^2}{2\sigma^2 + (2/3)(b-a)t} \Big), \]
where Σ_c denotes summation over all subgroups of m distinct integers (i_1, …, i_m) of {1, …, n}.

A.1 Proof of Proposition 1
Since there are no ties a.s., for every i ≠ j,
\[ \mathbb{1}\{(X_{i,1}-X_{j,1})(X_{i,2}-X_{j,2}) > 0\} + \mathbb{1}\{(X_{i,1}-X_{j,1})(X_{i,2}-X_{j,2}) < 0\} = 1 \quad a.s., \]
so that the weighted masses of concordant and discordant pairs sum to Σ_{i≠j} w_{i,n}(z) w_{j,n}(z) = 1 − s_n. Rewriting each τ̂^{(k)}_{1,2|Z=z} in terms of these two masses yields the claimed identities.
It remains to prove Lemma 10. Use the usual decomposition into a stochastic component and a bias: f̂_Z(z) − f_Z(z) = (f̂_Z(z) − E[f̂_Z(z)]) + (E[f̂_Z(z)] − f_Z(z)). We first bound the bias from above. The map u ↦ f_Z(z + hu) has at least the same regularity as f_Z, so it is α-differentiable. By a Taylor-Lagrange expansion, we get, for some real number t_{z,u} ∈ (0, 1) and by Assumption 3.1, that every term of order i < α vanishes. Second, the stochastic component may be written as n^{-1} Σ_{i=1}^n (g(Z_i) − E[g(Z_i)]), where g(Z_i) := K_h(Z_i − z). Apply Lemma 9 with m = 1 and the latter g(Z_i). Here, we have b = −a = h^{−p} C_K, θ = E[g(Z_1)] ≥ 0 and Var[g(Z_1)] ≤ h^{−p} f_{Z,max} ∫K², and we get the announced bound.

A.3 Proof of Proposition 3
We show the result for k = 1. The two other cases can be proven in the same way.
Consider the decomposition below; the conclusion will follow from the next two lemmas. The ratio f²_Z(z)/f̂²_Z(z) and the sum Σ_{1≤i,j≤n} S_{i,j} will be bounded separately.
Proof : Applying the mean value inequality to the function x → 1/x 2 , we get the inequality 1/f 2 where f * Z lies betweenf Z (z) and f Z (z). By Lemma 10, we obtain Therefore, on this event, and then f Z,min /2 ≤ f * Z . Combining the previous inequalities, we finally get 1 and we deduce the result.
) is negative and negligible. It will be denoted by −∆ n < 0. Now, let us deal with the main term, that is decomposed as a stochastic component and a bias component.
First, let us deal with the bias. Simple calculations provide, if i = j, because, for every z, We apply the Taylor-Lagrange formula to the function φ x1,x2,u,v (t) := f X,Z x 1 , z + thu f X,Z x 2 , z + thv .
With obvious notations, this yields using Assumption 3.4, we get Second, the stochastic component will be bounded from above. Indeed, with the function g defined by We can now apply Lemma 9 to the sum of theg i,j . With its notations, θ = E g i,j = 0. Moreover, and the same upper bound applies forg i,j (invoke Cauchy-Schwartz inequality). Here, we choose b = −a = Then, for every t > 0, we obtain The latter inequality and (4) conclude the proof.

A.4 Proof of Proposition 5
Let us note that τ 1,2|Z=z = E g k (X 1 , X 2 ) Z 1 = z, Z 2 = z for every k = 1, 2, 3, and that our estimators with the weights (2) can be rewritten asτ for any measurable bounded function g, with the residual diagonal term ǫ n := By Bochner's lemma (see Bosq and Lecoutre [2]), ǫ n is O P ((nh p ) −1 ), and it will be negligible compared to U n (1). Since the reasoning will be exactly the same for every estimator τ (k) 1,2|z , i.e. for every function g k , k = 1, 2, 3, we omit the sub-index k. Then, the functions g k will be simply denoted by g.
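The rewriting above can be verified numerically. The sketch below uses p = 1, a Gaussian kernel, the score g_2, and our reading of the notation ε_n := n^{-2} Σ_i K_h²(Z_i − z), and checks that the weighted-pair form and the U-statistic form coincide:

```python
import numpy as np

# Check of tau_hat = U_n(g) / (U_n(1) + eps_n), with
# U_n(g) := n^{-2} sum_{i != j} g(X_i, X_j) K_h(Z_i - z) K_h(Z_j - z)
# and eps_n := n^{-2} sum_i K_h(Z_i - z)^2.

rng = np.random.default_rng(4)
n, z, h = 40, 0.5, 0.2
Z = rng.uniform(size=n)
X1, X2 = rng.normal(size=n), rng.normal(size=n)

Kh = np.exp(-0.5 * ((Z - z) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
w = Kh / Kh.sum()                                   # Nadaraya-Watson weights
G = np.sign((X1[:, None] - X1[None, :]) * (X2[:, None] - X2[None, :]))

tau_w = float(w @ G @ w)                            # sum_{i != j} w_i w_j g_2(X_i, X_j)
U_g = float(Kh @ G @ Kh) / n ** 2                   # diagonal of G is zero
U_1 = (Kh.sum() ** 2 - np.sum(Kh ** 2)) / n ** 2    # U_n(1): i != j terms only
eps = np.sum(Kh ** 2) / n ** 2
print(np.isclose(tau_w, U_g / (U_1 + eps)))         # True
```

The identity is exact because (Σ_i K_h(Z_i − z))² = n²(U_n(1) + ε_n), so the denominators of the two forms agree.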
The expectation of our U-statistic is then obtained by applying Bochner's lemma to z ↦ ∫ g(x_1, x_2) f_{X|Z=z}(x_1) f_{X|Z=z}(x_2) dx_1 dx_2 = τ_{1,2|Z=z}, which is a continuous function by assumption.
for some positive constants C_0, C_1, C_2, by setting g^{(l)}((X_i, Z_i), (X_j, Z_j)) := g_k(X_i, X_j) K_h(t_l − Z_i) K_h(t_l − Z_j).
A.6 Proof of Proposition 7

Let g*(x_1, x_2) := (g_k(x_1, x_2) + g_k(x_2, x_1))/2 for some index k = 1, 2, 3 that will be implicit in the proof. We now study the joint behavior of (τ̂_{1,2|Z=z′_i} − τ_{1,2|Z=z′_i})_{i=1,…,n′}. We will follow Stute [13]'s approach, applying the same ideas with a multivariate conditioning variable z and studying the joint distribution of U-statistics at several conditioning points. As in the proof of Proposition 5, the estimator with the weights given by (2) can be rewritten as τ̂_{1,2|Z=z′_i} := U_{n,i}(g*)/(U_{n,i}(1) + ε_{n,i}), where U_{n,i}(g) is defined as above with z replaced by z′_i, for any bounded measurable function g : R^4 → R. Moreover, sup_{i=1,…,n′} |ε_{n,i}| = O_P(n^{−1} h^{−p}). By a limited expansion of f_{X,Z} w.r.t. its second argument, and under Assumption 3.4, we easily check that E[U_{n,i}(g)] = τ_{1,2|Z=z′_i} + r_{n,i}, where |r_{n,i}| ≤ C_0 h_n^α / f²_Z(z′_i), for some constant C_0 that is independent of i. Now, we prove the joint asymptotic normality of (U_{n,i}(g))_{i=1,…,n′}. The Hájek projection Û_{n,i}(g) of U_{n,i}(g) satisfies Û_{n,i}(g) := (2/n) Σ_{j=1}^n g_{n,i}(X_j, Z_j) − θ_n, where θ_n := E[U_{n,i}(g)] and g_{n,i} denotes the conditional expectation of the U-statistic kernel given one of its arguments.

Lemma 13. Under the assumptions of Proposition 7, for any measurable bounded function g, the covariance matrix of ((nh^p)^{1/2} Û_{n,i}(g))_{i=1,…,n′} converges to an explicit limit M_∞(g), with entries indexed by 1 ≤ i, j ≤ n′.

This lemma is proved in Section A.7. Similarly as in the proof of Lemma 2.2 in Stute [13], for every i = 1, …, n′ and every bounded symmetrical measurable function g, we have (nh^p)^{1/2} Var[Û_{n,i}(g) − U_{n,i}(g)] = o(1), which implies (nh^p)^{1/2}(U_{n,i}(g) − E[U_{n,i}(g)])_{i=1,…,n′} →^D N(0, M_∞(g)), as n → ∞.