Bregman superquantiles. Estimation methods and applications

Abstract In this work, we extend some parameters built on a probability distribution introduced before to the case where the proximity between real numbers is measured by a Bregman divergence. This leads to the definition of the Bregman superquantile (which can be connected with several works in economics, see for example [18] or [9]). Axioms of a coherent measure of risk discussed previously (see [31] or [3]) are studied in the case of the Bregman superquantile. Furthermore, we deal with the asymptotic properties of a Monte Carlo estimator of the Bregman superquantile. Several numerical tests confirm the theoretical results and an application illustrates the potential interest of the Bregman superquantile.


Introduction
The aim of this article is to define and to study properties and estimation procedures for the Bregman extension of the superquantile defined in [16] (see also [15], [14] and references therein). We first recall the conditions for a measure of risk to be coherent and present the superquantile as a partial response to this problem. In Section 2, we introduce the Bregman superquantile and study the axioms of a coherent measure of risk for this quantity. In Section 3 we are interested in estimating this Bregman superquantile: we introduce a plug-in estimator and study its consistency and asymptotic normality. All the proofs are deferred to Section 5.

Coherent measures of risk
Let X be a real-valued random variable and let F_X be its cumulative distribution function. We denote, for u ∈ ]0, 1[, the quantile function q_X(u) := inf{x : F_X(x) ≥ u}. A usual way to quantify the risk associated with X is to consider, for a given number α ∈ ]0, 1[ close to 1, its lower quantile q_α := F_X^{-1}(α). But the quantile is not sub-additive, a property considered important in some applications (e.g. finance, see [2]). Thus Rockafellar introduced in [13] a new quantity, called therein the superquantile, that satisfies this property. The superquantile is defined by:

Q_α := (1/(1 − α)) ∫_α^1 F_X^{-1}(u) du.

Sub-additivity is not the only interesting property for a measure of risk (for example for financial applications). Following Rockafellar in [13] we define:

Date: May 27, 2014.
Definition 1.1. Let R be a measure of risk and X and X′ be two real-valued random variables. We say that R is coherent if, and only if, it satisfies the five following properties:
i) Constant invariance: let C ∈ R; if X = C (a.s.) then R(X) = C.
ii) Positive homogeneity: ∀λ > 0, R(λX) = λR(X).
iii) Monotonicity: if X ≤ X′ (a.s.) then R(X) ≤ R(X′).
iv) Sub-additivity: R(X + X′) ≤ R(X) + R(X′).
v) Closedness: if ‖X_h − X‖_2 → 0 and R(X_h) ≤ 0 for all h, then R(X) ≤ 0.
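To make these definitions concrete, here is a small numerical sketch (ours, not from the paper) of the empirical quantile and superquantile; on independent Gaussian samples it also illustrates the sub-additivity that the superquantile enjoys.

```python
import numpy as np

def quantile(sample, alpha):
    """Lower empirical quantile: inf{x : F_n(x) >= alpha}."""
    xs = np.sort(np.asarray(sample, dtype=float))
    return xs[int(np.ceil(alpha * len(xs))) - 1]

def superquantile(sample, alpha):
    """Empirical superquantile: mean of the order statistics above level alpha."""
    xs = np.sort(np.asarray(sample, dtype=float))
    return xs[int(np.floor(alpha * len(xs))):].mean()

rng = np.random.default_rng(0)
x, y = rng.normal(size=100_000), rng.normal(size=100_000)
# superquantile(x, a) always dominates quantile(x, a); for these two
# independent samples, superquantile is also empirically sub-additive.
```

For two independent standard Gaussian samples, superquantile(x + y, 0.95) stays below superquantile(x, 0.95) + superquantile(y, 0.95), while the plain quantile offers no such guarantee in general.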

Bregman superquantiles
In this section the aim is to build a general measure of risk that satisfies some of the regularity axioms stated in Definition 1.1. These quantities will be built by using a dissimilarity measure between real numbers, the Bregman divergence.
2.1. Bregman divergence, mean and superquantile. In this section we first recall the definition of the Bregman mean of a probability measure µ (see [3]) and define the measure of risk that we will study. To begin with, we recall the definition of the Bregman divergence that is used to build the Bregman mean. Let γ be a strictly convex function, R-valued on R. As usual we set dom γ := {x ∈ R : γ(x) < +∞}.
For the sake of simplicity we assume that dom γ is a nonempty open set and that γ is closed, proper and differentiable on the interior of dom γ (see [12]). From now on we always consider functions γ satisfying this hypothesis. The Bregman divergence d_γ associated to γ (see [4]) is the function defined on dom γ × dom γ by

d_γ(x, x′) := γ(x) − γ(x′) − γ′(x′)(x − x′).

The Bregman divergence is not a distance, as it is not symmetric. Nevertheless, as it is non-negative and vanishes if, and only if, the two arguments are equal, it quantifies proximity in dom γ. Let us recall some classical examples of such a divergence.
• Euclidean: γ(x) = x² on R; we obviously obtain, for x, x′ ∈ R, d_γ(x, x′) = (x − x′)².
• Geometric: γ(x) = x ln(x) − x on ]0, +∞[, so that γ′(x) = ln(x).
• Harmonic: γ(x) = x − ln(x) on ]0, +∞[, so that γ′(x) = (x − 1)/x.

Let µ be a probability measure whose support is included in dom γ, such that µ(cl(dom γ) \ dom γ) = 0 and γ′ is integrable with respect to µ. Following [3], we first define the Bregman mean as the unique point b in the support of µ satisfying

(1) b = arg min_{a ∈ dom γ} ∫ d_γ(a, x) µ(dx).

In fact, we replace the L² minimization in the definition of the mathematical expectation by the minimization of the Bregman divergence. Existence and uniqueness come from the convexity properties of d_γ with respect to its first argument. By differentiating, it is easy to see that

γ′(b) = ∫ γ′(x) µ(dx), i.e. b = γ′^{-1}( ∫ γ′(x) µ(dx) ).

Hence, coming back to our three previous examples, we obtain the classical mean in the first example (Euclidean case), the geometric mean exp(∫ ln(x) µ(dx)) in the second one, and the harmonic mean [∫ x^{-1} µ(dx)]^{-1} in the third one. Notice that, as the Bregman divergence is not symmetric, we have to pay attention to the order of the arguments in the definition of the Bregman mean: minimizing a → ∫ d_γ(x, a) µ(dx) instead would always give back the classical mean. We turn now to the definition of our new measure of risk.
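As a quick sanity check, the identity b = γ′^{-1}(∫ γ′ dµ) can be evaluated on a finite sample; the following sketch (ours, with γ′ and its inverse hard-coded for the three examples above) recovers the arithmetic, geometric and harmonic means.

```python
import numpy as np

# Bregman mean b solves gamma'(b) = E[gamma'(X)], i.e. b = (gamma')^{-1}(E[gamma'(X)]).
# Each entry: (gamma', inverse of gamma') for the three divergences in the text.
GENERATORS = {
    "euclidean": (lambda x: x,           lambda y: y),          # gamma(x) = x^2 (up to scaling)
    "geometric": (lambda x: np.log(x),   lambda y: np.exp(y)),  # gamma'(x) = ln(x)
    "harmonic":  (lambda x: (x - 1) / x, lambda y: 1 / (1 - y)),# gamma'(x) = (x-1)/x
}

def bregman_mean(sample, name):
    """Bregman mean of an empirical measure for one of the generators above."""
    gprime, gprime_inv = GENERATORS[name]
    return gprime_inv(np.mean(gprime(np.asarray(sample, dtype=float))))
```

On the sample {1, 2, 4}, the three choices return the arithmetic mean 7/3, the geometric mean 2 and the harmonic mean 12/7, as expected.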
Definition 2.1. The Bregman superquantile Q^{dγ}_α of X is defined by

Q^{dγ}_α := γ′^{-1}( (1/(1 − α)) ∫_α^1 γ′(F_X^{-1}(u)) du ).

In words, Q^{dγ}_α satisfies (1) taking for µ the distribution of X conditionally on X ≥ F_X^{-1}(α). We now write Q^{dγ}_α for the Bregman superquantile of the law of X when there is no ambiguity, and Q^{dγ}_α(X) if we need to distinguish Bregman superquantiles of different laws.

2.2.
Coherence of the Bregman superquantile. The following proposition gives some conditions under which the Bregman superquantile is a coherent measure of risk.

Proposition 2.2. Fix α in ]0, 1[.
i) Any Bregman superquantile satisfies the properties of constant invariance and monotonicity.
ii) The Bregman superquantile associated to γ is homogeneous if, and only if, γ′(x) = ln(x) or γ′(x) = (x^β − 1)/β for some β ≠ 0.
iii) If γ′ is concave and sub-additive, then the sub-additivity and closedness axioms both hold.
The proof of this proposition, like all the others, is deferred to Section 5.
To conclude, under some regularity assumptions on γ, the Bregman superquantile is a coherent measure of risk.Let us take some examples.

Examples and counter-examples.
• Example 1: x → x² satisfies all the hypotheses, but it is already known that the classical superquantile is sub-additive.
• Example 2: The geometric and harmonic Bregman functions satisfy hypotheses i) and ii). Moreover, their derivatives are x → γ′(x) = ln(x) and x → γ′(x) = (x − 1)/x, which are concave but sub-additive only on [1, +∞[. Then the harmonic and geometric functions satisfy iii) not for every couple of random variables, but only for couples (X, X′) such that, denoting Z := X + X′, we have min(q^X_α, q^{X′}_α, q^Z_α) > 1.
• Counter-example 3: Sub-additivity does not hold in the general case. Indeed, let γ(x) = exp(x) be our convex function and let us consider the uniform law on [0, 1].
For α = 0.95 and λ = 2, a direct computation shows that Q^{dγ}_α(X + X) > 2 Q^{dγ}_α(X), so sub-additivity does not hold. We can also notice that, for λ = 4, Q^{dγ}_α(λX) ≠ λ Q^{dγ}_α(X), so homogeneity does not hold either. This is coherent with our proposition, since the derivative of γ = exp is not one of those presented in ii) of Proposition 2.2.
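These two failures can be checked numerically. For the uniform law, γ′(F_{λX}^{-1}(u)) = exp(λu), so the defining integral has a closed form; the sketch below (function name ours) evaluates it.

```python
import math

def bregman_superquantile_exp_uniform(lam, alpha=0.95):
    """Bregman superquantile (gamma = exp) of lam * U, U uniform on [0, 1].

    Since F^{-1}_{lam U}(u) = lam * u, the defining relation gives
    Q = ln( (1/(1-alpha)) * integral_alpha^1 exp(lam * u) du ),
    and the integral equals (exp(lam) - exp(lam * alpha)) / lam.
    """
    integral = (math.exp(lam) - math.exp(lam * alpha)) / lam
    return math.log(integral / (1.0 - alpha))

q1 = bregman_superquantile_exp_uniform(1.0)  # Q(X)
q2 = bregman_superquantile_exp_uniform(2.0)  # Q(2X) = Q(X + X)
q4 = bregman_superquantile_exp_uniform(4.0)  # Q(4X)
```

One finds Q(X + X) > 2 Q(X) (sub-additivity fails) and Q(4X) ≠ 4 Q(X) (homogeneity fails), consistent with Proposition 2.2 ii).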

Estimation and asymptotics of the Bregman superquantile
In this section the aim is to estimate the Bregman superquantile. We introduce a Monte Carlo estimator and study its asymptotic properties. Under regularity assumptions on the functions γ and F_X^{-1}, the estimator of the Bregman superquantile is consistent and asymptotically Gaussian. All along this section, we consider a function γ satisfying our usual properties and a real-valued random variable X such that γ′(X) is right-integrable.
3.1. Monte Carlo estimator. Assume that we have at hand an i.i.d. sample (X_1, . . ., X_n) with the same distribution as X. If we wish to estimate Q^{dγ}_α, we may use the following empirical estimator:

Q̂^{dγ}_α := γ′^{-1}( (1/(n(1 − α))) Σ_{i=⌊nα⌋+1}^{n} γ′(X_(i)) ),

where (X_(1), . . ., X_(n)) is the re-ordered sample built with (X_1, . . ., X_n).
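A runnable version of this estimator (our sketch; we normalize by the tail length, which matches n(1 − α) up to rounding) reads:

```python
import numpy as np

def bregman_superquantile_mc(sample, alpha, gprime, gprime_inv):
    """Plug-in estimator: apply (gamma')^{-1} to the empirical mean of
    gamma'(X) over the order statistics above level alpha."""
    xs = np.sort(np.asarray(sample, dtype=float))
    tail = xs[int(np.floor(alpha * len(xs))):]
    return gprime_inv(gprime(tail).mean())
```

For the exponential distribution of parameter 1 with γ′ the identity, this is the classical superquantile, whose true value at level α is 1 − ln(1 − α); with the geometric choice γ′ = ln, Jensen's inequality forces the estimate to be no larger than the classical one.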
3.2. Asymptotics. In this section, we present the following theorem, which studies the asymptotic behavior of the Bregman superquantile estimator.
Theorem 3.1. Let α ∈ ]0, 1[ be close to 1 and X be a real-valued random variable. Let (X_1, . . ., X_n) be an independent sample with the same distribution as X.
i) We assume that γ is twice differentiable and that the cumulative distribution function F_X is C¹. We denote by f_X the density and we suppose that f_X > 0 on its support. We moreover suppose that the derivative of γ′ ∘ F_X^{-1}, which we denote l_γ, is non-decreasing and o((1 − t)^{-2}) in the neighborhood of 1.

Then the estimator Q̂^{dγ}_α converges in probability to Q^{dγ}_α.
ii) More strongly, we assume that γ is three times differentiable, the cumulative distribution function F_X is C², f_X > 0 on its support, and the second derivative of γ′ ∘ F_X^{-1}, which we denote L_γ, is non-decreasing and O((1 − t)^{-m_L}) for some 1 < m_L < 5/2, in the neighborhood of 1. Then the estimator is asymptotically normal: √n (Q̂^{dγ}_α − Q^{dγ}_α) converges in law to a centered Gaussian distribution, whose variance is made explicit in the proof.

Remark 3.2. Easy calculations give equivalent expressions of this limit. The second part of the theorem shows the asymptotic normality of the Bregman superquantile estimator. We can then use Slutsky's lemma to build confidence intervals, since our estimator Q̂^{dγ}_α is consistent.

To prove this theorem we use results on the asymptotic behavior of the classical superquantile. We have the fundamental link between these two quantities:

(2) Q^{dγ}_α(X) = γ′^{-1}( Q_α(γ′(X)) ).

To prove our theorem, we first prove the following proposition on the asymptotic behavior of the superquantile (which amounts to dealing with the Bregman superquantile for γ′ equal to the identity). Then we apply this proposition to the sample (Z_1, . . ., Z_n) where Z_i := γ′(X_i). We conclude by applying the continuous mapping theorem for the consistency and the delta method for the asymptotic normality (both with the regular function γ′^{-1}).

Proposition 3.3. Let α ∈ ]0, 1[ be close to 1 and X be a right-integrable real-valued random variable. Let (X_1, . . ., X_n) be an independent sample with the same distribution as X.
i) We assume that the cumulative distribution function F_X is C¹. We denote by f_X the density and we suppose that f_X > 0 on its support. We moreover suppose that the derivative of the quantile function, which we denote l, is non-decreasing and o((1 − t)^{-2}) in the neighborhood of 1. Then the estimator Q̂_α converges in probability to Q_α.
ii) More strongly, we assume that the cumulative distribution function F_X is C², f_X > 0 on its support, and the second derivative of the quantile function, which we denote L, is non-decreasing and O((1 − t)^{-m_L}) for some 1 < m_L < 5/2, in the neighborhood of 1. Then the estimator is asymptotically normal.

3.3.
Examples of asymptotic behavior for the classical superquantile. Our hypotheses are easy to check in practice. Let us give two examples of the asymptotic behavior of the superquantile estimator: the exponential distribution of parameter 1 and the Pareto distribution.
3.3.1. Exponential distribution. In this case, we have F_X^{-1}(t) = −ln(1 − t) on [0, 1[.
• Consistency: In the neighborhood of 1, l(t) = 1/(1 − t) = o((1 − t)^{-2}) and l is non-decreasing; then the superquantile estimator is consistent.
• Asymptotic normality: In the neighborhood of 1, L(t) = 1/(1 − t)² = O((1 − t)^{-2}) with 2 < 5/2, and L is non-decreasing. Then the asymptotic normality theorem applies.

3.3.2. Pareto distribution. Here, we consider the Pareto law of parameter a > 0: F_X^{-1}(t) = (1 − t)^{-1/a} on [1, +∞[.
• Consistency: l(t) = (1/a)(1 − t)^{-1/a − 1} = o((1 − t)^{-2}) as soon as a > 1. Then, l is non-decreasing near 1, and the consistency holds.
• Asymptotic normality: in the neighborhood of 1, L(t) = (1/a)(1/a + 1)(1 − t)^{-1/a − 2}, and L is non-decreasing around 1. The asymptotic normality is true if, and only if, a > 2.
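The two thresholds can be read off directly from the quantile function (a short derivation we add for completeness):

```latex
F_X^{-1}(t) = (1-t)^{-1/a}, \qquad
l(t) = \tfrac{1}{a}(1-t)^{-\frac{1}{a}-1}, \qquad
L(t) = \tfrac{1}{a}\Big(\tfrac{1}{a}+1\Big)(1-t)^{-\frac{1}{a}-2},

l(t) = o\big((1-t)^{-2}\big) \iff \tfrac{1}{a}+1 < 2 \iff a > 1,

L(t) = O\big((1-t)^{-m_L}\big),\ m_L < \tfrac{5}{2}
\iff \tfrac{1}{a}+2 < \tfrac{5}{2} \iff a > 2.
```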

3.4. Examples of asymptotic behavior of the Bregman superquantile.
Let us now study the same examples for the Bregman superquantile. For the exponential distribution, the conclusion is the same. Nevertheless, for the Pareto distribution, we can find a function γ such that the estimator of the Bregman superquantile is asymptotically normal without conditions on a. In this sense, the Bregman superquantile is a more interesting measure of risk than the superquantile.
• Consistency. In this case, we have γ′ ∘ F_X^{-1}(t) = ln(−ln(1 − t)), so in the neighborhood of 1, l_γ(t) = 1/((1 − t)(−ln(1 − t))), which is non-decreasing and o((1 − t)^{-2}). The estimator is consistent.

3.4.2. Pareto distribution. Let us now study the case of the Pareto law with the harmonic Bregman function. We have γ′ ∘ F_X^{-1}(t) = 1 − (1 − t)^{1/a}.
• Consistency. l_γ(t) = (1/a)(1 − t)^{1/a − 1} = o((1 − t)^{-2}), and the required monotonicity holds near 1. The estimator is consistent.
• Asymptotic normality. Similarly, L_γ(t) is of order (1 − t)^{1/a − 2} = O((1 − t)^{-(2 − 1/a)}), with 2 − 1/a < 5/2 for every a > 0. The estimator is consistent and asymptotically normal for every a > 0.

4.1. Proof of Proposition 2.2. Proof. For the first part of point i), the constant invariance is obvious. Let us show the monotonicity property. We first show that the classical superquantile is non-decreasing; then (2) and the monotonicity of γ′^{-1} and γ′ will allow us to conclude. [11] shows that the superquantile of the law of X satisfies:

Q_α(X) = min_{c ∈ R} { c + (1/(1 − α)) E[(X − c)_+] }.

Let X and X′ be two random variables such that X ≤ X′ (a.s.). Using the previous result and the monotonicity of the function x → (x − c)_+ when c is fixed, we get E[(X − c)_+] ≤ E[(X′ − c)_+] for every c, hence Q_α(X) ≤ Q_α(X′).

Let us now deal with ii). For every (measurable) function f, we denote by Q_α(f(X)) the superquantile of f(X). Let X and X′ be two real-valued random variables. The Bregman superquantile associated to γ is Q^{dγ}_α(X) = γ′^{-1}(Q_α(γ′(X))). According to Definition 1.1, it is homogeneous if, for every λ > 0, Q^{dγ}_α(λX) = λ Q^{dγ}_α(X). As γ′ and x → (γ′(x) − γ′(1))/γ′′(1) yield the same superquantiles, one may assume without loss of generality that γ′(1) = 0 and that γ′′(1) = 1.

First, it is easy to check that the given condition is sufficient. For simplicity, we write φ = γ′. If φ(x) = ln(x), then φ^{-1}(y) = exp(y) and homogeneity follows by a direct computation. For the other implication, let x > 0. Let X be a random variable with a two-point distribution P supported by {a, x}, where a = x ∧ 1, with an atom of mass p. The homogeneity property and the assumption φ(1) = 0 yield a functional equation. By assumption, the expressions on both sides are smooth in p and x. Taking the derivative in p at p = 0, and then differentiating with respect to x, one gets Equation (3). Let ψ be defined on R by ψ(y) = ln(φ′(exp(y))). One readily checks that Equation (3) yields ψ(ln(x) + ln(λ)) = ψ(ln(x)) + ψ(ln(λ)). This equation holds for every x, λ > 0, which is well known to imply the linearity of ψ: there exists a real number β such that ψ(y) = βy for all y ∈ R. Thus φ′(exp(y)) = exp(βy), that is, φ′(x) = x^β for all x > 0.
For β = −1, one obtains φ(x) = ln(x). Otherwise, taking into account the constraint φ(1) = 0, this yields φ(x) = (x^{β+1} − 1)/(β + 1).

Let us show iii). Let X and X′ be two real-valued random variables. Since γ is convex, γ′ is non-decreasing, so it is equivalent to show the sub-additivity for Q^{dγ}_α or after composing by γ′. We set S := X + X′ and use the concavity of γ′. The sub-additivity hypothesis on γ′ is now useful, because it allows us to use the same argument as in the proof of the sub-additivity of the classical superquantile (see [2]).
Finally, we show the closedness under the same hypotheses as before. Let (X_h)_{h>0} satisfy the hypotheses of the closedness axiom. By sub-additivity, it is enough to control the Bregman superquantile of the difference to conclude. Thanks to the concavity of γ′, we can use Jensen's inequality, and we conclude with the Cauchy-Schwarz inequality.

4.2. Proof of Proposition 3.3: asymptotic behavior of the plug-in estimator of the superquantile.

4.3. Mathematical tools.
We first give some technical or classical results that we will use in the forthcoming proofs.
4.3.1. Ordered statistics and the Beta function. Let us recall some results on ordered statistics (see [8]). Let (Y_i)_{i=1,...,n+1} be an independent sample from the standard exponential distribution. It is well known that

(4) (Y_1 + · · · + Y_i)/(Y_1 + · · · + Y_{n+1})

has the same distribution as the i-th order statistic of an i.i.d. sample of size n uniformly distributed on [0, 1], that is, the Beta law of parameters (i, n − i + 1), denoted B(i, n − i + 1). It is also known that we have the equality in law (X_(1), . . ., X_(n)) = (F_X^{-1}(U_(1)), . . ., F_X^{-1}(U_(n))), where (U_(i)) are the order statistics of a uniform sample. The Beta distribution has the following density, for strictly positive numbers a and b:

f_{a,b}(x) = x^{a−1}(1 − x)^{b−1}/B(a, b), x ∈ [0, 1],

where B(a, b) = ∫_0^1 x^{a−1}(1 − x)^{b−1} dx is the Beta function. A classical property of the Beta function is B(a, b) = Γ(a)Γ(b)/Γ(a + b), which generalizes the factorials to non-integer arguments (for instance half-integers, which will be used below): for integers, B(i, n − i + 1) = (i − 1)!(n − i)!/n!.
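The representation (4) is easy to check by simulation (a sketch of ours, not from the paper): the i-th order statistic of n uniforms and the exponential partial-sum ratio should both have the Beta mean i/(n + 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, i, reps = 20, 5, 20_000

# Direct construction: i-th order statistic of n i.i.d. uniforms.
u_order = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, i - 1]

# Representation (4): ratio of partial sums of n + 1 standard exponentials.
y = rng.exponential(size=(reps, n + 1))
ratio = y[:, :i].sum(axis=1) / y.sum(axis=1)

# Both samples should match the Beta(i, n - i + 1) mean i/(n + 1) = 5/21.
```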
Lemma 4.1. Let δ be a real number strictly greater than 1. Then the normalized sums considered below remain bounded exactly when δ ∈ ]1, 3/2].

Proof. Let δ > 1. We have to characterize the δ for which the quantity n^{δ − 3/2} Σ_i (1 − i/(n + 1))^{-δ}-type sums remain bounded when n goes to infinity. We set j := n + 1 − i.
The sum then behaves, when n goes to infinity, like a multiple of ζ(δ), where ζ is the Riemann zeta function, which is finite for arguments strictly greater than 1.
Finally, our problem is equivalent to characterizing the set of δ strictly greater than 1 such that n^{δ − 3/2} is bounded. This set is clearly ]1, 3/2].

4.3.3.
A corollary of the Lindeberg-Feller theorem. To prove the asymptotic normality, we use a central limit theorem which is a corollary of the Lindeberg-Feller theorem (see Lemma 1 in [5]).

Proof of i) of Proposition 3.3. We want to show the consistency of the estimator.
In the sequel we omit the index X in F_X^{-1} because there is no ambiguity. Let us introduce the two following quantities, A_n and B_n, decomposing the estimation error.
A_n converges to 0 in L¹. Indeed, denoting by F_n the empirical cumulative distribution function, we know thanks to 4.3.1 that F(X_(i)) is distributed like the i-th order statistic of a uniform sample. Thus, defining U_(i) with law B(i, n + 1 − i), the corresponding terms are equal in law.
By the mean value theorem, there exists w_n(i) ∈ ]U_(i), i/(n + 1)[ such that the increments are controlled by l(w_n(i)); when n is large enough, this is bounded using the non-decreasingness of l in the neighborhood of 1. Let us now deal with the two terms in the maximum. As l(t) = o((1 − t)^{-2}) in the neighborhood of 1, for ε > 0 there exists N such that for n ≥ N and i ∈ [⌊nα⌋, n] the required bound holds; using the density of the Beta distribution, (5) allows us to conclude that this contribution vanishes when n goes to infinity. The second term in the maximum can be handled in the same way.

Finally, the two sums converge to 0 when n goes to infinity thanks to Lemma 4.1. So A_n converges to 0 in L¹ when n goes to infinity, hence in probability. Let us now study the term B_n, the gap between the Riemann sum and the integral of F^{-1}, to show its a.s. convergence to 0.

Remark 4.3. To begin with, it is easy to show the convergence when the sum and the integral are truncated at 1 − ε, for every ε > 0 (it is the convergence of a Riemann sum of a continuous function).
Let us fix ε > 0. To begin with, we separate the sum into two parts, denoting the first part S¹_n and the second part S²_n, in order to exhibit the remainder we need to control.
Since the quantile function is non-decreasing on [α, 1], we can bound the sum from below and above; we denote by C¹_n and C²_n the two terms of the first line and by D¹_n and D²_n those of the third line.
Then it remains to show that the terms involving S²_n converge to 0 to conclude (the convergence of D¹_n − S¹_n and C¹_n − S¹_n to 0 is true thanks to Remark 4.3). As l(t) = o((1 − t)^{-2}) in the neighborhood of 1, for ε > 0 there exists N such that for n ≥ N the remainder terms are small. Finally, S²_n − D²_n converges to 0 a.s., and so does B_n. We have shown that A_n + B_n converges to 0 in probability. So, under our hypotheses, the superquantile estimator is consistent.
Remark 4.4. With the same kind of proof, we can show a stronger convergence result under the stronger hypotheses on the quantile function of ii) of Proposition 3.3. We will use it in the next part.

Proof of ii) of Proposition 3.3: asymptotic normality of the plug-in estimator.
Let us prove the asymptotic normality of the estimator of the superquantile.To begin with, we make some technical remarks.
Remark 4.5. The hypothesis on L implies that there exists m_l := m_L − 1 < 3/2 such that l(t) = O((1 − t)^{-m_l}) in the neighborhood of 1.

Proof. The proof stands in three steps. First, we reformulate and simplify the problem and apply the Taylor-Lagrange formula. Then, we show that the second-order term converges to 0 in probability. In the third step, we identify the limit of the first-order term.
Step 1: Taylor-Lagrange formula. Let us first omit the factor (1 − α)^{-1}. The problem is to study the convergence in law of the normalized error. To begin with, we have already noticed (Remarks 4.4 and 4.5) that some terms are negligible; Slutsky's lemma then allows us to study only the convergence in law of the main term. The quantile function is C², so we can apply the Taylor-Lagrange formula at order 2. Using the same reasoning as in the proof of i), we introduce U_(i), a random variable distributed as B(i, n + 1 − i). Working with an equality in law, we obtain a decomposition whose first-order term we call √n Q_n and whose second-order term we call R_n.
Step 2: The second-order term converges to 0 in probability. Let us show that R_n converges to 0 in probability. Thanks to Markov's inequality, it is enough to bound its first absolute moment. Since the function L is non-decreasing in the neighborhood of 1, for n large enough we can bound the second-order increments by a maximum of two terms. As before, we study the two terms in the maximum separately. First, we use the variance of the Beta distribution. According to Remark 4.5, our hypothesis on L gives that for every ε′ > 0 there exists a rank N such that, for n ≥ N and i ∈ [⌊nα⌋, n], the required bound holds. The convergence to 0 of the first term in the maximum then follows from Lemma 4.1.

We now have to deal with the second term in the maximum. Using the hypothesis, we have a bound for ε′ and n large enough. Using the Beta distribution of U_(i), we can write the corresponding expectation explicitly; we call this last quantity I^i_n. To keep the exponent n − i − 5/2 + 1 non-negative, we split the sum and deal only with the terms for i from ⌊nα⌋ to n − 2. Using (6), we set E^i_n accordingly; developing E^i_n, we find its order when n goes to infinity. Let us study the term involving B(i, n + 1 − i): using (5) and (8), and since each i can be written as i = ⌊nβ⌋ with β < 1, n − i goes to infinity when n goes to infinity and we can apply the Stirling formula.

Combining these estimates gives the order of I^i_n when n goes to infinity. Finally, for n large enough, thanks to our hypothesis on L, the second term in the maximum converges to 0 by Lemma 4.1.

Remark 4.6. The terms for i = n − 1 and i = n are of smaller order than the sum, so they converge to 0 too. Finally, the second-order term R_n converges to 0 in probability. We can now deal with the first-order term.
Step 3: Identification of the limit. Our goal is to find the limit of √n Q_n. Let us reorganize the expression of Q_n to get a more classical form (a sum of independent random variables) and to be allowed to use Proposition 4.2.
Denoting by G^n_j the relevant partial sums, we invert the two sums in the expression of Q_n. The law of large numbers gives that the normalizing factor tends to 1 when n goes to infinity. Then, thanks to Slutsky's lemma, we only need to study the remaining weighted sum, with weights α_{j,n} given by one formula for j ≤ ⌊nα⌋ + 1 and another for j ≥ ⌊nα⌋ + 2. Let us check the hypotheses of Proposition 4.2. To begin with, let us show that σ²_n converges a.s. We find an equivalent of σ²_n and study the convergence of this equivalent.
Let us work with the two terms which depend on G^n_j: the first term can be developed, and the second one simplified, which yields an equivalent of σ²_n when n goes to infinity.
Let us show that this equivalent converges to σ², using that m_L < 5/2 and m_l < 3/2.

Remark 4.7. It is here that we need the hypothesis L(t) = O((1 − t)^{-m_L}) with m_L < 5/2 instead of L(t) = o((1 − t)^{-5/2}). Indeed, when m_L = 5/2, and so m_l = 3/2, the primitives are different and the integral is not finite.
As we have already seen, the results on Riemann sums in dimension 2 give, by continuity of (x, y) → (min(x, y) − xy)/(f(F^{-1}(x)) f(F^{-1}(y))), that for all β < 1 the truncated sums converge to the corresponding double integral. We have to study the rest of the sum to conclude. Let us fix β close to 1 and deal with the remainder.
In this case, monotonicity allows us to conclude with the Lebesgue theorem. Let us now deal with the second hypothesis, about the maximum of the α_{i,n}. For j ≤ ⌊nα⌋ + 1, we have α_{j,n} = ((n − ⌊nα⌋ + 1) G^n_{⌊nα⌋+1} − H_n)/(n(n + 1)). Then, with the previous computations, when n is large this is equivalent to ((n + 1) G^n_j − T_n)/n². So we are in the same situation as in the previous case, replacing α by j; the previous arguments are still valid, and the second hypothesis holds. We can apply Proposition 4.2 and conclude that √n Q_n converges in law to a centered Gaussian with variance

σ² = ∫_α^1 ∫_α^1 (min(x, y) − xy) l(x) l(y) dx dy.

Finally, we multiply everything by (1 − α)^{-1} to find the true limit.

Step 4: Conclusion
Slutsky's lemma allows us to combine the previous steps to conclude.

3.4.1. Exponential law.
Let us detail the example of the exponential distribution with the geometric Bregman function. We have γ′(x) = ln(x).