Estimation of the tail-index in a conditional location-scale family of heavy-tailed distributions

Abstract We introduce a location-scale model for conditional heavy-tailed distributions when the covariate is deterministic. First, nonparametric estimators of the location and scale functions are introduced. Second, an estimator of the conditional extreme-value index is derived. The asymptotic properties of the estimators are established under mild assumptions and their finite sample properties are illustrated both on simulated and real data.


Introduction
The literature on extreme-value analysis of independent and identically distributed observations is well developed, see for instance [3,12,26]. However, the regression point of view has been less extensively studied. The goal is to describe how tail characteristics, such as extreme quantiles or small exceedance probabilities of the quantity of interest Y, may depend on some explanatory variable x. Furthermore, as noted in [3, Chapter 7], such covariate information makes it possible to combine datasets from different sources, which may lead to better point estimates and thus improved inference.
A parametric approach is considered in [35], where a linear trend is fitted to the expectation of the extreme-value distribution. We also refer to [11] for other examples of parametric models. Turning to semi-parametric models, [28] proposed to mix a non-parametric estimation of the trend with a parametric assumption on Y given x. Similarly, a semi-parametric estimator of γ is introduced in [2] as γ(ψ(βᵗx)), where ψ is a known link function and β is interpreted as a vector of regression coefficients. Fully non-parametric estimators have first been introduced in [6,10] through respectively local polynomial and spline models. We also refer to [12, Theorem 3.5.2] for the approximation of the nearest neighbors distribution using the Hellinger distance and to [13] for the study of their asymptotic distribution. Focusing on the estimation of the tail-index of the conditional distribution of Y given x, moving windows and nearest neighbors approaches are developed respectively in [15,16] in a fixed design setting. Kernel methods are proposed in [8,9,19,20,24] to tackle the random design case. Finally, these methods have been adapted to the situation where the covariate is a random field or infinite-dimensional, see respectively [33] and [17,18].
The aim of our work is to estimate, in a semi-parametric way, the tail-index γ in a location-scale model for conditional heavy-tailed distributions. The so-called conditional tail-index is assumed to be constant, while the location and scale parameters depend on the covariate, in a fixed design setting. The underlying idea of this model is to achieve a balance between the flexibility of non-parametric approaches (for the location and scale functions) and the stability of parametric estimators (for the conditional tail-index) compared to purely non-parametric ones. This intuition has also been implemented in [30]: an extreme-value distribution with constant extreme-value index is fitted to standardized rainfall maxima. Here, we introduce a statistical framework to assess the benefits of such approaches in terms of convergence rates of the estimators. This paper is organized as follows. The location-scale model for heavy-tailed distributions is introduced in Section 2. The associated inference procedures are described in Section 3. Asymptotic results are provided in Section 4, while the finite sample behaviour of the estimators is illustrated in Section 5 on simulated data and in Section 6 on insurance data. Proofs are postponed to the Appendix.

Conditional location-scale family of heavy-tailed distributions
Let Y be a real random variable. We assume that the conditional survival function of Y given x ∈ [0, 1] can be written as
F̄_Y(y | x) = F̄_Z((y − a(x))/b(x)),    (1)
for y ≥ y₀(x) > a(x). The functions a : [0, 1] → R and b : [0, 1] → R⁺ are referred to as the location and scale functions respectively, while F̄_Z is the survival function of a real random variable Z which is assumed to be heavy-tailed:
F̄_Z(z) = z^(−1/γ) ℓ(z).    (2)
Here, γ > 0 is called the conditional tail-index and ℓ is a slowly-varying function at infinity, i.e. for all λ > 0, lim_{z→∞} ℓ(λz)/ℓ(z) = 1.
F̄_Z is then said to be regularly varying at infinity with index −1/γ. This property is denoted for short by F̄_Z ∈ RV_{−1/γ}, see [5] for a detailed account on regular variation. Combining (1) and (2) yields, for y ≥ y₀(x) > a(x),
F̄_Y(y | x) = ((y − a(x))/b(x))^(−1/γ) ℓ((y − a(x))/b(x)),    (3)
where the functions a(·), b(·) and the conditional tail-index γ are unknown. We thus obtain a semi-parametric location-scale model for the (heavy) tail of Y given x. The main assumption is that the conditional tail-index γ is independent of the covariate. On the one hand, the proposed semi-parametric modeling offers more flexibility than purely parametric approaches. On the other hand, assuming a constant conditional tail-index γ should yield more reliable estimates in small-sample contexts than purely nonparametric approaches. A similar idea is developed in [30]: an extreme-value distribution with constant extreme-value index is fitted to standardized rainfall maxima. In the following, a fixed design setting is adopted, and thus the covariate x is supposed to be nonrandom. Model (1) can be rewritten as
Y = a(x) + b(x) Z,    (4)
where x ∈ [0, 1] and Z is a random variable distributed according to (2). Starting with an n-sample {(Y₁, x₁), . . . , (Y_n, x_n)} from (4), it is clear that, since Z is not observed, a(·) and b(·) may only be estimated up to additive and multiplicative factors. This identifiability issue can be fixed by introducing some constraints on F̄_Z. To this end, for all α ∈ (0, 1), consider the αth (upper) quantile of Z, q_Z(α) := F̄_Z^←(α), and assume there exist 0 < µ₁ < µ₂ < µ₃ < 1 such that
q_Z(µ₂) = 0 and q_Z(µ₁) − q_Z(µ₃) = 1.    (5)
From (4), it straightforwardly follows that, for all α ∈ (0, 1), the conditional quantile of Y given x is
q_Y(α | x) = a(x) + b(x) q_Z(α),    (6)
and therefore the location and scale functions are defined in a unique way by
a(x) = q_Y(µ₂ | x) and b(x) = q_Y(µ₁ | x) − q_Y(µ₃ | x),    (7)
for all x ∈ [0, 1]. This remark is the starting point of the inference procedure.
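To fix ideas, model (4) together with a normalization of the type (5) can be simulated directly. In the sketch below (Python), the functions a(·), b(·), the levels µ_j and the Pareto-type choice of Z are illustrative assumptions of ours: a standard Pareto variable is shifted and rescaled so that constraints of the form q_Z(µ₂) = 0 and q_Z(µ₁) − q_Z(µ₃) = 1 hold, and a fixed-design sample is drawn.

```python
import numpy as np

rng = np.random.default_rng(42)
gamma = 0.5                        # conditional tail-index, constant in x
mu1, mu2, mu3 = 0.25, 0.50, 0.75   # hypothetical normalizing levels

# Survival quantile of a standard Pareto variable: q0(alpha) = alpha**(-gamma)
q0 = lambda alpha: alpha ** (-gamma)

# Shift and rescale so that q_Z(mu2) = 0 and q_Z(mu1) - q_Z(mu3) = 1
shift, scale = q0(mu2), q0(mu1) - q0(mu3)
qZ = lambda alpha: (q0(alpha) - shift) / scale

a = lambda x: 1.0 + np.sin(2 * np.pi * x)   # hypothetical location function a(.)
b = lambda x: 1.5 + 0.5 * x                 # hypothetical positive scale function b(.)

n = 2000
x = np.arange(1, n + 1) / n                 # equidistant fixed design x_i = i/n
Z = (rng.uniform(size=n) ** (-gamma) - shift) / scale   # iid normalized heavy-tailed Z
Y = a(x) + b(x) * Z                         # sample from the model Y = a(x) + b(x) Z
```

Any heavy-tailed Z can be renormalized in this way, which is why such constraints remove the identifiability issue without restricting the model.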

Inference
Let {(Y₁, x₁), . . . , (Y_n, x_n)} be an n-sample from (4): Y_i = a(x_i) + b(x_i) Z_i, where Z₁, . . . , Z_n are independent and identically distributed (iid) from (2). For the sake of simplicity, it is assumed that the design points are equidistant: x_i = i/n for all i = 1, . . . , n and x₀ := 0. This assumption could be weakened to a control on the maximal spacing between consecutive design points, see [1,32]. A three-stage inference procedure is adopted.
(i) First, let q̂_n,Y(α | x) be a nonparametric estimator of the conditional quantile q_Y(α | x), where α ∈ (0, 1) and x ∈ [0, 1]. In view of (7), the location and scale functions are estimated for all x ∈ [0, 1] by
â_n(x) = q̂_n,Y(µ₂ | x) and b̂_n(x) = q̂_n,Y(µ₁ | x) − q̂_n,Y(µ₃ | x).    (8)
(ii) Second, the non-observed Z₁, . . . , Z_n can be estimated by the residuals
Ẑ_i = (Y_i − â_n(x_i))/b̂_n(x_i),    (9)
for all i = 1, . . . , n. In practice, nonparametric estimators can suffer from boundary effects [7,31] and therefore only design points sufficiently far from 0 and 1 are considered. Let us denote by I_n the set of indices associated with such design points and set m_n = card(I_n).
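Steps (i) and (ii) can be sketched as follows, assuming a kernel-inversion conditional quantile estimator in the spirit of (11)-(12) below; the kernel choice, the boundary handling and the function names are ours, purely for illustration.

```python
import numpy as np

def quartic(u):
    # Quartic (biweight) kernel, supported on [-1, 1]
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u ** 2) ** 2, 0.0)

def cond_quantile(alpha, x0, x, Y, h):
    """Smallest y such that the kernel estimate of P(Y > y | x = x0) drops below alpha."""
    w = quartic((x0 - x) / h)
    w = w / w.sum()                        # normalized kernel weights at x0
    order = np.argsort(Y)
    cdf = np.cumsum(w[order])              # weighted conditional cdf at the sorted Y_i
    j = np.searchsorted(cdf, 1.0 - alpha)  # survival <= alpha  <=>  cdf >= 1 - alpha
    return Y[order][min(j, len(Y) - 1)]

def location_scale_residuals(x, Y, h, mu1, mu2, mu3):
    """Estimates of a(.) and b(.) on interior design points, and the residuals."""
    n = len(Y)
    interior = [i for i in range(n) if h <= x[i] <= 1.0 - h]  # avoid boundary effects
    a_hat = np.array([cond_quantile(mu2, x[i], x, Y, h) for i in interior])
    b_hat = np.array([cond_quantile(mu1, x[i], x, Y, h)
                      - cond_quantile(mu3, x[i], x, Y, h) for i in interior])
    Z_hat = (Y[interior] - a_hat) / b_hat
    return a_hat, b_hat, Z_hat
```

By construction b_hat is nonnegative, since the conditional quantile is nonincreasing in the survival level.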
(iii) Finally, let (k_n) be an intermediate sequence of integers, i.e. such that 1 ≤ k_n ≤ n, k_n → ∞ and k_n/n → 0 as n → ∞. The (k_n + 1) top order statistics associated with the pseudo-observations Ẑ_i, i ∈ I_n, are denoted by Ẑ_{m_n−k_n,m_n} ≤ · · · ≤ Ẑ_{m_n,m_n}. The conditional tail-index is estimated using a Hill-type statistic:
γ̂_n = (1/k_n) Σ_{i=1}^{k_n} (log Ẑ_{m_n−i+1,m_n} − log Ẑ_{m_n−k_n,m_n}).    (10)
This estimator is similar to the Hill estimator [29] but, in our context, it is built on non-iid pseudo-observations. The proposed procedure relies on the choice of an estimator of the conditional quantiles. Here, a kernel estimator of F̄_Y(y | x) is considered (see for instance [32]): for all (x, y),
F̂_n,Y(y | x) = (1/n) Σ_{i=1}^{n} K_h(x − x_i) 1{Y_i > y},    (11)
where 1{·} is the indicator function, K_h(·) := K(·/h)/h with K a density function on R called a kernel, and h = h_n is a nonrandom sequence called the bandwidth, such that h_n → 0 as n → ∞. The corresponding conditional quantile estimator is obtained by inversion:
q̂_n,Y(α | x) = inf{y : F̂_n,Y(y | x) ≤ α}.    (12)
In this context, I_n = {⌈nh⌉, . . . , n − ⌈nh⌉} and m_n = n − 2⌈nh⌉ + 1. Remark that I_n is properly defined for all large n since h < 1/2 eventually. Nonparametric regression quantiles obtained by inverting a kernel estimator of the conditional distribution function have been extensively investigated, see, for example, [4,34,36], among others.
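Step (iii) then reduces to a classical Hill computation on the pseudo-observations. A minimal sketch (the function name is ours; note that residuals may be negative under a normalization with q_Z(µ₂) = 0, but the k_n + 1 largest ones, which are the only ones entering the statistic, are positive when k_n is small relative to m_n):

```python
import numpy as np

def hill_on_residuals(Z_hat, k):
    """Hill-type statistic computed on the k + 1 top order statistics of Z_hat."""
    zs = np.sort(np.asarray(Z_hat))
    anchor = zs[-(k + 1)]          # Z_{m-k, m}, the (k+1)-th largest pseudo-observation
    # (1/k) * sum over the k largest of log Z_{m-i+1, m} - log Z_{m-k, m}
    return np.mean(np.log(zs[-k:]) - np.log(anchor))
```

On an exactly Pareto-distributed sample, this statistic is an unbiased estimator of the tail-index; on residuals, it inherits an additional error coming from the estimation of a(·) and b(·).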

Main results
The following general assumptions are required to establish our results. The first one gathers all the conditions defining a conditional location-scale family of heavy-tailed distributions. (A.1) (Y₁, x₁), . . . , (Y_n, x_n) are independent observations from the conditional location-scale family of heavy-tailed distributions defined by (1), (2) and (5). The functions a(·) and b(·) are continuous on [0, 1] and the survival function F̄_Z(·) is continuously differentiable on R with associated density f_Z(·) = −F̄_Z′(·). Under (A.1), the quantile function q_Z(·) exists and we let H_Z(·) := 1/f_Z(q_Z(·)) denote the quantile density function and U_Z(·) = q_Z(1/·) the tail quantile function of Z. The second assumption is a Lipschitz condition on the conditional survival function of Y. Lemma 1 in the Appendix provides sufficient conditions on a(·), b(·) and F̄_Z(·) ensuring that it is verified.
The next assumption is standard in the nonparametric kernel estimation framework.
Finally, the so-called second-order condition is introduced (see for instance [26, eq. (3.2.5)]): for all λ > 0,
lim_{z→∞} (U_Z(λz)/U_Z(z) − λ^γ)/A(z) = λ^γ (λ^ρ − 1)/ρ,    (A.4)
where γ > 0, ρ < 0 and A is a positive or negative function such that A(z) → 0 as z → ∞. The rationale behind (A.4) is the following. From [5, Theorem 1.5.12], it is clear that (2) is equivalent to U_Z ∈ RV_γ, that is, U_Z(λz)/U_Z(z) → λ^γ as z → ∞ for all λ > 0. The role of the second-order condition is thus to control the rate of the previous convergence thanks to the function A(·). Moreover, it can be shown that |A| is regularly varying with index ρ, see [26, Lemma 2.2.3]. It is then clear that ρ, referred to as the second-order parameter, is a crucial quantity, tuning the rate of convergence of most extreme-value estimators, see [26, Chapter 3] for examples. Our first result states the joint asymptotic normality of the estimators (8) of the location and scale parameters at a point t_n ∈ (0, 1) not too close to the boundaries of the unit interval.
where the coefficients of the matrix D are given by A uniform consistency result can also be established: Theorem 2 will prove useful to show that the residuals Ẑ_i are close to the unobserved Z_i, i = 1, . . . , n. This justifies the computation of the Hill estimator (10) on the residuals. Our final main result provides the asymptotic normality of this conditional tail-index estimator.
It appears that our methodology is able to estimate the tail-index in the conditional location-scale family at the same rate 1/√k_n as in the iid case, see [25] for a review. As expected, the conditional location-scale family is a more favorable situation than the purely nonparametric framework for the estimation of the conditional tail-index, where the rate of convergence 1/√(k_n h) is impacted by the covariate. Up to logarithmic factors, the constraint is then k_n = o(n^((−2ρ/(1−2ρ)) ∧ (2/3))). If ρ ≥ −1, the rate of convergence of γ̂_n is thus n^(−ρ/(1−2ρ)), which is the classical rate for estimators of the tail-index, see for instance [27, Remark 3]. Let us also remark that, since nh/(k_n log n) → ∞ and since b(·) is lower bounded under (A.1), Theorem 1 and Theorem 3 entail the joint convergence of the three estimators, where the coefficients of the matrix E are given by E_{3,3} = 1 and E_{i,j} = 0 if i ∈ {1, 2} or j ∈ {1, 2}. The joint limiting distribution is degenerate since γ̂_n converges at a slower rate than â_n(t_n) and b̂_n(t_n).
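As a back-of-the-envelope check on these rates, assuming |A(t)| behaves like a constant times t^ρ, balancing the standard deviation 1/√k_n of the Hill-type statistic against its bias, of order A(n/k_n), recovers the exponents discussed above:

```latex
% Sketch of the bias-variance balance, assuming |A(t)| \asymp t^{\rho}, \rho < 0.
\sqrt{k_n}\,A(n/k_n) = O(1)
\iff k_n^{1/2-\rho}\, n^{\rho} = O(1)
\iff k_n = O\!\big(n^{-2\rho/(1-2\rho)}\big),
\quad\text{so that}\quad
\sqrt{k_n} \asymp n^{-\rho/(1-2\rho)}.
\]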

Illustration on simulated data
The finite-sample performance of the estimators of the location and scale functions, as well as of the conditional tail-index, is illustrated on simulated data from model (4).
In all the experiments, N replications of a dataset of size n are considered. The kernel function K is chosen to be the quartic (or biweight) kernel K(u) = (15/16)(1 − u²)² 1{|u| ≤ 1} and the bandwidth is fixed to a constant value h. The estimated location and scale functions a(·) and b(·) are compared with the mean estimates ā_n(·) and b̄_n(·) on the top-right (b) and bottom-left (c) panels respectively. Finally, the estimated conditional tail-indices γ̂_{n,i}, i = 1, . . . , N, the mean estimated value γ̄_n and the true conditional tail-index are displayed as functions of k_n on the bottom-right panels (d). As expected, it appears on Figures 1(a)-3(a) that the tail heaviness of Y | x decreases as ν increases. The estimation accuracy of the location and scale functions does not seem to be sensitive to ν, see Figures 1(b,c)-3(b,c). On the contrary, it appears on Figures 1(d)-3(d) that large values of ν yield a large bias in the estimation of the conditional tail-index. This trend was expected since the conditional second-order parameter is the main driver of the bias, as explained in Section 4, and since |ρ| = 2/ν for a Student distribution: small values of |ρ| in (A.4) entail a high bias in extreme-value estimators such as Hill's statistic. A way to mitigate this bias could be to replace the conditional tail-index estimator (10) by a bias-reduced Hill-type estimator, see for instance [25].
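One replication of the full three-stage procedure can be sketched end-to-end as follows, assuming Student-distributed noise as suggested by the role of ν above; all tuning values (n, h, k, the µ_j, a(·), b(·)) are illustrative choices of ours, not those of the paper, and the Hill-type statistic retains a noticeable finite-sample bias for Student noise at moderate k.

```python
import numpy as np

rng = np.random.default_rng(1)

def quartic(u):
    # Quartic (biweight) kernel, supported on [-1, 1]
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u ** 2) ** 2, 0.0)

def cond_quantile(alpha, x0, x, Y, h):
    # Smallest y such that the kernel estimate of P(Y > y | x0) drops below alpha
    w = quartic((x0 - x) / h)
    w = w / w.sum()
    order = np.argsort(Y)
    cdf = np.cumsum(w[order])
    return Y[order][min(np.searchsorted(cdf, 1.0 - alpha), len(Y) - 1)]

# One replication; every tuning value below is an illustrative choice
n, h, k = 1000, 0.1, 80
nu = 2                                   # Student noise: true tail-index is 1/nu
mu1, mu2, mu3 = 0.25, 0.50, 0.75

x = np.arange(1, n + 1) / n              # equidistant fixed design
a = lambda t: 0.5 + 0.5 * t              # hypothetical location function
b = lambda t: 1.0 + 0.5 * t              # hypothetical scale function
Y = a(x) + b(x) * rng.standard_t(df=nu, size=n)   # heavy-tailed sample

# Steps (i)-(ii): location/scale estimates and residuals on interior design points
interior = [i for i in range(n) if h <= x[i] <= 1.0 - h]
a_hat = np.array([cond_quantile(mu2, x[i], x, Y, h) for i in interior])
b_hat = np.array([cond_quantile(mu1, x[i], x, Y, h)
                  - cond_quantile(mu3, x[i], x, Y, h) for i in interior])
Z_hat = (Y[interior] - a_hat) / b_hat

# Step (iii): Hill-type statistic on the top order statistics of the residuals
zs = np.sort(Z_hat)
gamma_hat = np.mean(np.log(zs[-k:]) - np.log(zs[-(k + 1)]))
```

Averaging gamma_hat over N independent replications, as in the experiments described above, gives the Monte-Carlo summaries displayed in the figures.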

Real data example
We consider here a dataset on motorcycle insurance policies and claims over the period 1994-1998, collected from the former Swedish insurance provider Wasa. The dataset is available from www.math.su.se/GLMbook and the R package insuranceData. We focus on two variables: the claim severity Y (defined as the ratio of the claim cost by the number of claims for each given policyholder) in SEK, and the age x of the policyholder in years. Removing missing data and applying an affine transformation to the covariate result in n pairs (x_i, Y_i). Some graphical diagnostics have been performed in [22] to check that the heavy-tailed assumption makes sense for Y. Our goal is to estimate the conditional extreme quantile q_Y(α_n | x) where nα_n → 0 and x ∈ (0, 1). Two estimators are considered. The first one relies on the semi-parametric model via (6),
q̂_n,Y(α_n | x) = â_n(x) + b̂_n(x) q̃_n,Z(α_n),
and on the Weissman estimator [37] applied to the pseudo-observations Ẑ_i, i ∈ I_n:
q̃_n,Z(α_n) = Ẑ_{m_n−k_n,m_n} (k_n/(m_n α_n))^γ̂_n.
The second one is the nonparametric conditional Weissman estimator introduced in [9]:
q̌_n,Y(α_n | x) = q̂_n,Y(k_n/m_n | x) (k_n/(m_n α_n))^γ̂_n(x),
where q̂_n,Y(k_n/m_n | x) is defined in (12) and γ̂_n(x) is an estimator of the conditional tail-index. Here, we selected a recent estimator introduced in [22], denoted by γ̂_{k_n}(x) in the previously mentioned paper. As in Section 5, the same normalizing parameters µ₁, µ₂ and µ₃ are used, the quartic kernel is employed, and the bandwidth h is chosen by the cross-validation procedure implemented in R as h.cv. The estimated location and scale functions are superimposed on the dataset in Figure 4. The residuals are then computed according to (9).
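The Weissman extrapolation underlying the first estimator can be sketched as follows (Python; the function and argument names are ours, and the anchor and order-statistic conventions are one plausible reading of the formulas above):

```python
import numpy as np

def weissman_quantile(Z_hat, a_hat_x, b_hat_x, k, alpha_n):
    """Semi-parametric extreme quantile: a_hat(x) + b_hat(x) * q_tilde_Z(alpha_n)."""
    zs = np.sort(np.asarray(Z_hat))
    m = len(zs)
    anchor = zs[-(k + 1)]                                   # Z_{m-k, m}
    gamma_hat = np.mean(np.log(zs[-k:]) - np.log(anchor))   # Hill-type statistic
    qZ_tilde = anchor * (k / (m * alpha_n)) ** gamma_hat    # Weissman extrapolation
    return a_hat_x + b_hat_x * qZ_tilde
```

The extrapolation step is the same for both estimators; they differ in whether a single tail-index, estimated from the residuals, or a covariate-dependent one is plugged into the exponent.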
To confirm that the location-scale model (3) is appropriate, Figure 5 displays a quantile-quantile plot of the weighted log-spacings within the top of the residuals against the quantiles of the standard exponential distribution. Formally, let W_{i,m_n} = i log(Ẑ_{m_n−i+1,m_n}/Ẑ_{m_n−i,m_n}), 1 ≤ i ≤ k_n − 1, denote the weighted log-spacings computed from the consecutive top order statistics of the residuals. It is known that, if Ẑ is heavy-tailed with tail-index γ, then the W_{i,m_n} are approximately independent copies of an exponential random variable with mean γ, see for instance [3]. Here, the number of upper order statistics k_n is fixed by a visual inspection of the Hill plot (not reproduced here). The relationship appearing on Figure 5 is approximately linear, which constitutes graphical evidence that the heavy-tail assumption (2) on Z makes sense and that the choice of k_n is appropriate.
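The weighted log-spacings entering this diagnostic take a few lines to compute; by the Rényi representation, for an exactly Pareto-tailed sample they are iid exponential with mean γ (the function name is ours):

```python
import numpy as np

def weighted_log_spacings(Z_hat, k):
    """W_{i,m} = i * log(Z_{m-i+1,m} / Z_{m-i,m}) for 1 <= i <= k - 1."""
    zs = np.sort(np.asarray(Z_hat))
    m = len(zs)
    i = np.arange(1, k)
    # zs[m - i] is the i-th largest order statistic, zs[m - i - 1] the next one down
    return i * np.log(zs[m - i] / zs[m - i - 1])
```

Plotting the sorted spacings against standard exponential quantiles, as in Figure 5, should then produce an approximately straight line of slope γ under the heavy-tail assumption.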
Finally, the two conditional quantile estimators q̂_n,Y(α_n | ·) and q̌_n,Y(α_n | ·) are graphically compared in Figure 6 for α_n = 1/n. Both of them yield level curves with similar shapes, located above the sample. Unsurprisingly, the estimator q̂_n,Y(α_n | ·) based on the location-scale model has a smoother behavior than q̌_n,Y(α_n | ·), since it relies on the assumption that the tail-index does not depend on the covariate.

A.1 Auxiliary lemmas
We begin by providing some sufficient conditions under which (A.2) holds.
The Lipschitz assumption on log F̄_Z yields, for all (t, s) ∈ [0, 1]² and y ∈ R: then, as n → ∞, Proof. Consider the expansion As a consequence, for all y_n ∈ C, Let us now turn to the second term. We have, for all y_n ∈ C, in view of (13). Finally, collecting (13) and (14), the conclusion follows.
As a consequence of Lemma 2, the asymptotic bias and variance of the estimator (11) of the conditional survival function can be derived.
(ii) Let us consider the expansion: Let us write uniformly on (s₁, s₂) ∈ [x_{i−1}, x_i]² and i = 1, . . . , n. It thus follows that Replacing in T_{n,1}, we obtain: Applying Lemma 2 twice and recalling that nh → ∞ as n → ∞ entails Similarly, and the conclusion follows, under the assumption lim inf F̄_Y(y_n | t_n) > 0.
The next lemma controls the error between each unobserved random variable Z_i and its estimate Ẑ_i, for all i = 1, . . . , n.

Lemma 4. Assume (A.1), (A.2) and (A.3) hold. Let I_n = {⌈nh⌉, . . . , n − ⌈nh⌉} and suppose nh/log n → ∞ and nh³/log n → 0 as n → ∞. Then, for all i ∈ I_n,

Proof. Remark that for all i ∈ I_n, one has Let us define, for all i ∈ I_n, On the one hand, Theorem 2 entails On the other hand, Again, Theorem 2 shows that the following uniform consistency holds: for all ϵ > 0, there exists M(ϵ) > 0 such that Now, for n large enough, (nh/log n)^(1/2) > M(ϵ), so that

A.2 Preliminary results
Let ∨ (resp. ∧) denote the maximum (resp. the minimum). The next proposition provides a joint asymptotic normality result for the estimator (11). Proof. Let us first remark that, for all j ∈ {1, . . . , J}, in view of (6), the sequence y_{j,n} = a(t_n) + b(t_n)(q_Z(α_j) + ϵ_{j,n}) is bounded since ϵ_{j,n} → 0 as n → ∞ and since a(·) and b(·) are continuous functions defined on compact sets. Besides, from (1), F̄_Y(y_{j,n} | t_n) = F̄_Z(q_Z(α_j) + ϵ_{j,n}) → α_j > 0 as n → ∞ and thus the assumptions of Lemma 3(i,ii) are satisfied. Let β ≠ 0 in R^J, J ≥ 1, and consider the random variable Let us first consider the random term: where S_{n,i} is defined by (15) in the proof of Lemma 3, and where Σ^(i,n) is the matrix whose coefficients are defined, for (k, l) ∈ {1, . . . , J}², by Σ^(i,n)_{k,l} = cov(1{Y_i > y_{k,n}}, 1{Y_i > y_{l,n}}). In view of (16), where φ is the function φ(y, y′ | t) := F̄_Y(y ∨ y′ | t) F_Y(y ∧ y′ | t), with F_Y = 1 − F̄_Y. Replacing in (17) yields var(Γ_{n,1}) = βᵗC^(n)β, where C^(n) is the covariance matrix whose coefficients are defined by Applying Lemma 2 twice and recalling that nh → ∞ entails As a result, where B^(n)_{k,l} = φ(y_{k,n}, y_{l,n} | t_n) = F̄_Y(y_{k,n} ∨ y_{l,n} | t_n) F_Y(y_{k,n} ∧ y_{l,n} | t_n). Let us remark that, in view of (6), as n → ∞. Thus, assuming for instance k < l implies α_k > α_l and thus q_Z(α_k) < q_Z(α_l), leading to y_{k,n} < y_{l,n} for n large enough. More generally, y_{k,n} ∨ y_{l,n} = y_{k∨l,n} and y_{k,n} ∧ y_{l,n} = y_{k∧l,n} for n large enough and thus B^(n)_{k,l} = F̄_Y(y_{k∨l,n} | t_n) F_Y(y_{k∧l,n} | t_n).
From (1) and (6), we have, in view of the continuity of F̄_Z, As a result, B^(n)_{k,l} → B_{k,l} = α_{k∨l}(1 − α_{k∧l}) as n → ∞ and therefore The proof of the asymptotic normality of Γ_{n,1} is based on the Lyapounov criterion for triangular arrays of independent random variables: as n → ∞. Let us first remark that, for all i = 1, . . . , n, the random variable T_{i,n} is bounded: and it is thus clear that (19) holds under the assumption nh → ∞. As a result, Let us now turn to the nonrandom term Lemma 3(i) together with the assumptions nh³ → 0 and nh → ∞ as n → ∞ entails Finally, collecting (20) and (21), √(nh) Γ_n converges to a centered Gaussian random variable with variance ‖K‖₂² βᵗBβ, and the result follows.
The following proposition provides the joint asymptotic normality of the estimator (12) of conditional quantiles. It can be read as an adaptation of classical results [4,34,36] to the location-scale setting.
where, for j = 1, . . . , J, Let us first examine the nonrandom term v_{j,n}. In view of (1) and (6), it follows that Since F̄_Z(·) is differentiable, for all j ∈ {1, . . . , J}, there exists θ_{j,n} ∈ (0, 1) such that In view of the continuity of f_Z(·) and since s_j/√(nh) → 0 as n → ∞, it follows that Let us now turn to the random variable V_{j,n}. For all j = 1, . . . , J, let where ϵ_{j,n} → 0 as n → ∞. Then, Proposition 1 entails that converges to a centered Gaussian random vector with covariance matrix ‖K‖₂² B. Taking account of (22) yields that W_n converges to the cumulative distribution function of a centered Gaussian distribution with covariance matrix ‖K‖₂² C, evaluated at (s₁, . . . , s_J), which is the desired result.
The following proposition provides a uniform consistency result for the estimator (12) of conditional quantiles of Y given a sequence of design points (not too close to the boundaries 0 and 1).
Proof. Let ϵ ∈ (0, 1) and α ∈ (0, 1). Define v_n = (nh/log n)^(1/2) and, for all i ∈ I_n, let q±_{i,n} = q_Y(α | x_i) ± M(ϵ, α) b(x_i)/v_n. Let us consider the expansion: Let us focus on the term δ⁺_n. The assumption nh/log n → ∞ entails that v_n → ∞ as n → ∞ and thus q⁺_{i,n} is bounded. Therefore, Lemma 3(i) yields in view of the assumption nh³/log n → 0 as n → ∞. As a preliminary result, In addition, where, for all j = 1, . . . , n, the random variables are independent, centered and bounded: Besides, q⁺_{i,n} → q_Y(α | x_i) as n → ∞ and thus F̄_Y(q⁺_{i,n} | x_i) → α as n → ∞ in view of the continuity of F̄_Y(· | x_i). It follows that, for n large enough, Collecting (23)-(25) yields
δ⁺_n ≤ n exp(−(1 − log(ϵ/2)) log n) = exp(log(ϵ/2) log n) ≤ ϵ/2,
for n large enough. The proof that δ⁻_n ≤ ϵ/2 follows the same lines. As a conclusion, we have shown that, for all α ∈ (0, 1) and ϵ ∈ (0, 1), there exists M(ϵ, α) > 0 such that which is the desired result.

A.3 Proofs of main results
The proof of Theorem 1 directly relies on Proposition 2: applying Proposition 2 with J = 3, α₁ = µ₁, α₂ = µ₂ and α₃ = µ₃ yields Therefore, Ãξ_n →_d N(0_{R³}, ‖K‖₂² ÃCÃᵗ), and the conclusion follows from standard calculations.
Theorem 2 is a straightforward consequence of Proposition 3. Proof of Theorem 2. The first part of the result is a consequence of Proposition 3 applied with α = µ₂. Similarly, and the conclusion follows from Proposition 3 successively applied with α = µ₁ and α = µ₃.
in view of the assumption nh/(k_n log n) → ∞ as n → ∞. Let us now focus on Υ_{2,n}. Remarking that m_n ∼ n as n → ∞, it is clear that m_n/k_n → ∞ as n → ∞. Besides, since |A| ∈ RV_ρ, we thus have A(m_n/k_n) ∼ A(n/k_n) as n → ∞. Therefore, √k_n A(m_n/k_n) → λ as n → ∞ and, since Z₁, . . . , Z_n are iid from (2), classical results on the Hill estimator apply, see for instance [26, Theorem 3.2.5], leading to The conclusion follows by combining (26) and (27).