Evaluating Range Value at Risk Forecasts

The debate over which quantitative risk measure to choose in practice has mainly focused on the dichotomy between Value at Risk (VaR) -- a quantile -- and Expected Shortfall (ES) -- a tail expectation. Range Value at Risk (RVaR) is a natural interpolation between these two prominent risk measures, which constitutes a tradeoff between the sensitivity of the latter and the robustness of the former, turning it into a practically relevant risk measure on its own. As such, there is a need to statistically validate RVaR forecasts and to compare and rank the performance of different RVaR models, tasks subsumed under the term 'backtesting' in finance. The predictive performance is best evaluated and compared in terms of strictly consistent loss or scoring functions, that is, functions which are minimised in expectation by the correct RVaR forecast. Much like ES, it has recently been shown that RVaR does not admit strictly consistent scoring functions, i.e., it is not elicitable. Mitigating this negative result, this paper shows that a triplet of RVaR with two VaR components at different levels is elicitable. We characterise the class of strictly consistent scoring functions for this triplet. Additional properties of these scoring functions are examined, including the diagnostic tool of Murphy diagrams. The results are illustrated with a simulation study, and we put our approach in perspective with respect to the classical approach of trimmed least squares in robust regression.


Introduction
In the field of quantitative risk management, the last one or two decades have seen a lively debate about which monetary risk measure (Artzner et al., 1999) is best in (regulatory) practice. The debate has mainly focused on the dichotomy between Value at Risk (VaR_β) on the one hand and Expected Shortfall (ES_β) on the other hand, at some probability level β ∈ (0, 1) (see Section 2 for definitions). Mirroring the historical joust between median and mean as centrality measures in classical statistics, VaR_β, basically a quantile, is esteemed for its robustness, while ES_β, a tail expectation, is deemed attractive due to its sensitivity and the fact that it satisfies the axioms of a coherent risk measure (Artzner et al., 1999). We refer the reader to Embrechts et al. (2014) and Emmer et al. (2015) for comprehensive academic discussions, and to Bank for International Settlements (2014) for a regulatory perspective in banking. Cont et al. (2010) considered the issue of statistical robustness of risk measure estimates in the sense of Hampel (1971). They showed that a risk measure cannot be both robust and coherent. As a compromise, they proposed the risk measure 'Range Value at Risk', RVaR_{α,β}, at probability levels 0 < α < β < 1. It is defined as the average of all VaR_γ with γ between α and β (see Section 2 for definitions). As limiting cases, one obtains RVaR_{β,β} = VaR_β and RVaR_{0,β} = ES_β, which presents RVaR_{α,β} as a natural interpolation of VaR_β and ES_β. Quantifying its robustness in terms of the breakdown point and following the arguments provided in Huber and Ronchetti (2009, p. 59), RVaR_{α,β} has a breakdown point of min{α, 1 − β}, placing it between the very robust VaR_β (with a breakdown point of min{β, 1 − β}) and the entirely non-robust ES_β (breakdown point 0). This means it is a robust -- and hence, not coherent -- risk measure, unless it degenerates to RVaR_{0,β} = ES_β (or if 0 ≤ α < β = 1).
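To make the interpolation concrete, here is a small numerical sketch (our own illustration, not part of the paper's simulation study; the helper names are hypothetical) computing empirical VaR and RVaR from a sample, with RVaR_{α,β} approximated by averaging empirical quantiles on a grid between α and β:

```python
import numpy as np

def empirical_var(sample, alpha):
    # Empirical VaR_alpha: the alpha-quantile of the sample
    # (gains positive, losses negative, as in the paper's sign convention)
    return np.quantile(sample, alpha)

def empirical_rvar(sample, alpha, beta, grid_size=2001):
    # Empirical RVaR_{alpha,beta}: average of VaR_gamma over gamma in [alpha, beta]
    gammas = np.linspace(alpha, beta, grid_size)
    return np.quantile(sample, gammas).mean()

rng = np.random.default_rng(1)
y = rng.standard_normal(100_000)

# RVaR interpolates between VaR_beta (alpha close to beta) and ES_beta (alpha close to 0)
print(empirical_rvar(y, 0.099, 0.1))  # close to VaR_0.1 of N(0,1), about -1.28
print(empirical_rvar(y, 0.001, 0.1))  # approaches ES_0.1 of N(0,1) (about -1.76) as the lower level shrinks to 0
```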
Moreover, RVaR belongs to the wide class of distortion risk measures (Kusuoka, 2001). For further contributions to robustness in the context of risk measures, we refer the reader to Krätschmer et al. (2012, 2014), Kou et al. (2013), Embrechts et al. (2015) and Zähle (2016). Since the influential article of Cont et al. (2010), RVaR has gained increasing attention in the risk management literature -- see Embrechts et al. (2018a,b) for extensive studies -- as well as in econometrics (Barendse, 2020), where RVaR sometimes goes under the alternative denomination Interquantile Expectation. For the symmetric case β = 1 − α > 1/2, RVaR_{α,1−α} is known under the term α-trimmed mean in classical statistics, and it constitutes an alternative to, and interpolation of, the mean and the median as centrality measures; see Lugosi and Mendelson (2019) for a recent study and a multivariate extension of the trimmed mean. It is closely connected to the α-Winsorized mean, see (2.4).
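The connection to the trimmed mean is easy to check numerically. The following sketch (ours, under the assumption that RVaR_{α,1−α} equals the average of the quantiles VaR_γ for γ ∈ [α, 1 − α]) compares the α-trimmed mean with a quantile-grid approximation of RVaR_{α,1−α}:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(100_000)
alpha = 0.1

# alpha-trimmed mean: discard the lowest and highest alpha-fraction, average the rest
s = np.sort(y)
k = int(alpha * len(s))
trimmed_mean = s[k:len(s) - k].mean()

# RVaR_{alpha, 1-alpha}: average of the quantiles VaR_gamma, gamma in [alpha, 1-alpha]
rvar_sym = np.quantile(y, np.linspace(alpha, 1 - alpha, 2001)).mean()

print(trimmed_mean, rvar_sym)  # both close to 0 and to each other
```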
How should one evaluate the predictive performance of point forecasts x_t for a statistical functional T, such as the mean, median or a risk measure, of the (conditional) distribution of a quantity of interest y_t? It is commonly measured in terms of the average realised score (1/n) ∑_{t=1}^n S(x_t, y_t) for some scoring or loss function S, using the orientation 'the smaller the better'. Consequently, the loss function S should be strictly consistent for T in the sense that T(F) = arg min_x ∫ S(x, y) dF(y): correct predictions are honoured and encouraged in the long run. E.g., the squared loss S(x, y) = (x − y)^2 is consistent for the mean, and the absolute loss S(x, y) = |x − y| is consistent for the median. If a functional admits a strictly consistent score, it is called elicitable (Osband, 1985; Lambert et al., 2008; Gneiting, 2011). By definition, elicitable functionals allow for M-estimation and have natural estimation paradigms in regression frameworks (Dimitriadis et al., 2020, Section 2), such as quantile regression (Koenker and Bassett, 1978; Koenker, 2005) or expectile regression (Newey and Powell, 1987). Elicitability is crucial for meaningful forecast evaluation (Engelberg et al., 2009; Murphy and Daan, 1985; Gneiting, 2011). In the context of probabilistic forecasts with distributional forecasts F_t or density forecasts f_t, (strictly) consistent scoring functions are often referred to as (strictly) proper scoring rules, such as the log-score S(f, y) = − log f(y). In quantitative finance, and particularly in the debate about which risk measure is best in practice, elicitability has gained considerable attention (Emmer et al., 2015; Ziegel, 2016; Davis, 2016). Especially, the role of elicitability for backtesting purposes has been highly debated (Gneiting, 2011; Acerbi and Székely, 2014, 2017). It has been clarified that elicitability is central for comparative backtesting (Nolde and Ziegel, 2017).
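The defining property T(F) = arg min_x ∫ S(x, y) dF(y) is easy to visualise empirically. The following minimal sketch (our own, not from the paper) approximates the expected squared and absolute losses on a grid for a skewed distribution, so that the two minimisers separate into the mean and the median:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=200_000)  # skewed, so mean (1.0) != median (log 2)

grid = np.linspace(0.0, 3.0, 601)
sq_risk = [np.mean((x - y) ** 2) for x in grid]    # expected squared loss
abs_risk = [np.mean(np.abs(x - y)) for x in grid]  # expected absolute loss

# the argmins recover the mean and the median, respectively
print(grid[np.argmin(sq_risk)], grid[np.argmin(abs_risk)])
```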
Not all functionals are elicitable. Osband (1985) showed that an elicitable functional necessarily has convex level sets (CxLS). Variance and ES generally do not have CxLS (Weber, 2006; Gneiting, 2011), therefore failing to be elicitable. The revelation principle (Osband, 1985; Gneiting, 2011) asserts that any bijection of an elicitable functional is elicitable. This implies that the pair (mean, variance) -- being a bijection of the first two moments -- is elicitable despite the variance failing to be elicitable. Similarly, Fissler and Ziegel (2016) showed that the pair (VaR_β, ES_β) is elicitable, with the structural difference that the revelation principle is not applicable in this instance. This gave rise to the more general finding that the minimal expected score and its minimiser are always jointly elicitable (Brehmer, 2017; Frongillo and Kash, 2020).
Recently, Wang and Wei (2020, Theorem 5.3) showed that RVaR_{α,β}, 0 < α < β < 1, similarly to ES_α, fails to have the CxLS property, which rules out its elicitability. In contrast, they observe that the identity (1.1) and the CxLS property of the pair (VaR_α, ES_α) imply the CxLS property of the triplet (VaR_α, VaR_β, RVaR_{α,β}) (Wang and Wei, 2020, Example 4.6), leading to the question whether this triplet is elicitable or not. Invoking the elicitability of (VaR_α, ES_α), the identity at (1.1) and the revelation principle establishes the elicitability of the quadruples (VaR_α, VaR_β, ES_α, RVaR_{α,β}) and (VaR_α, VaR_β, ES_β, RVaR_{α,β}). This approach has already been used in the context of regression in Barendse (2020). A fortiori, we show that the triplet (VaR_α, VaR_β, RVaR_{α,β}) is elicitable (Theorem 3.2) under weak regularity conditions. Practically, this opens the way to meaningful forecast performance comparison, and in particular comparative backtests, of this triplet, as well as to a regression framework. Theoretically, this shows that the elicitation complexity (Lambert et al., 2008; Frongillo and Kash, 2020), or elicitation order, of RVaR_{α,β} is at most 3. Moreover, requiring only VaR-forecasts besides the RVaR-forecast is particularly advantageous compared to additionally requiring an ES-forecast, since the triplet (VaR_α(F), VaR_β(F), RVaR_{α,β}(F)), 0 < α < β < 1, exists and is finite for any distribution F, whereas ES_α(F) and ES_β(F) only exist if the (left) tail of the distribution F is integrable. As RVaR_{α,β} is often used for robustness purposes, safeguarding against outliers and heavy-tailedness, this advantage is important.
We would like to point out the structural difference between the elicitability result for (VaR_α, VaR_β, RVaR_{α,β}) provided in this paper and the one concerning (VaR_α, ES_α) in Fissler and Ziegel (2016), as well as the more general results of Frongillo and Kash (2020) and Brehmer (2017). While ES_α corresponds to the negative of a minimum of an expected score which is strictly consistent for VaR_α, it turns out that RVaR_{α,β} can be represented as the difference of minima of expected strictly consistent scoring functions for VaR_α and VaR_β (Proposition 3.1). As a consequence, the class of strictly consistent scoring functions for the triplet (VaR_α, VaR_β, RVaR_{α,β}) turns out to be less flexible than the one for (VaR_α, ES_α); see Remark 3.7 for details. In particular, there is essentially no translation invariant or positively homogeneous scoring function which is strictly consistent for (VaR_α, VaR_β, RVaR_{α,β}); see Section 4.
The paper is organised as follows. In Section 2, we introduce the relevant notation and definitions concerning RVaR, scoring functions and elicitability. The main results establishing the elicitability of the triplet (VaR_α, VaR_β, RVaR_{α,β}) (Theorems 3.2 and 3.5) and related findings are presented in Section 3. Section 4 shows that there are basically no strictly consistent scoring functions for (VaR_α, VaR_β, RVaR_{α,β}) which are positively homogeneous or translation invariant. In Section 5, we establish a mixture representation of the strictly consistent scoring functions in the spirit of Ehm et al. (2016). This result allows forecasts to be compared simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams. We demonstrate the applicability of our results and compare the discrimination ability of different scoring functions in a simulation study presented in Section 6. The paper finishes in Section 7 with a discussion of our results in the context of M-estimation and a comparison to other suggestions in the statistical literature, in particular variants of trimmed least squares procedures (Koenker and Bassett, 1978; Ruppert and Carroll, 1980; Rousseeuw, 1984).

Definition of Range Value at Risk
There are different sign conventions in the literature on risk measures. In this paper we use the following convention: if a random variable Y models losses and gains, then positive values of Y represent gains and negative values of Y losses. Moreover, if ρ is a risk measure, we assume that ρ(Y) ∈ R corresponds to the maximal amount of money one can withdraw such that the position Y − ρ(Y) is still acceptable. Hence, negative values of ρ correspond to risky positions. In the sequel, let F_0 be the class of probability distribution functions on R. Recall that the α-quantile of F ∈ F_0 is the set q_α(F) = {x ∈ R | F(x−) ≤ α ≤ F(x)}, where F(x−) denotes the left limit of F at x.

Definition 2.1. The Value at Risk of F ∈ F_0 at level α ∈ [0, 1] is defined as VaR_α(F) = inf q_α(F).
Hence, provided that ES_α(F) and ES_β(F) are finite, one obtains the identity (1.1). If F has a finite left tail (∫_{−∞}^0 |y| dF(y) < ∞), then one could use the right hand side of (1.1) as a definition of RVaR_{α,β}(F). However, in line with our discussion in the introduction, RVaR_{α,β}(F) always exists and is finite for 0 < α < β < 1, even if the right hand side of (1.1) is not defined.
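On the assumption that identity (1.1) reads RVaR_{α,β}(F) = (β ES_β(F) − α ES_α(F)) / (β − α) in the paper's sign convention (the displayed equation is not reproduced in this excerpt), it can be verified on simulated data; the helper names below are our own:

```python
import numpy as np

def es(sample, beta):
    # Lower-tail ES_beta = (1/beta) * int_0^beta VaR_gamma d(gamma),
    # approximated by averaging the lowest beta-fraction of order statistics
    s = np.sort(sample)
    k = int(np.floor(beta * len(s)))
    return s[:k].mean()

def rvar(sample, alpha, beta):
    # Average of the quantiles VaR_gamma on a grid between alpha and beta
    gammas = np.linspace(alpha, beta, 2001)
    return np.quantile(sample, gammas).mean()

rng = np.random.default_rng(7)
y = rng.standard_normal(200_000)
a, b = 0.05, 0.20

lhs = rvar(y, a, b)
rhs = (b * es(y, b) - a * es(y, a)) / (b - a)
print(lhs, rhs)  # the two sides agree up to sampling/discretisation error
```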
Interestingly, Embrechts et al. (2018b, Theorem 2) establish that RVaR can be written as an inf-convolution of VaR and ES at appropriate levels. This result amounts to a sup-convolution in our sign convention. Also note that our parametrisation of RVaR_{α,β} differs from theirs.

Elicitability and scoring functions
Using the decision-theoretic framework of Gneiting (2011), we introduce the following notation. Let F ⊆ F_0 be some generic subclass, and A ⊆ R^k an action domain. Whenever we consider a functional T : F → A, we tacitly assume that T(F) is well-defined for all F ∈ F and is an element of A. We say that a function a : R → R is F-integrable if it is measurable and ∫ |a(y)| dF(y) < ∞ for all F ∈ F. Similarly, a function g : A × R → R is called F-integrable if g(x, ·) is F-integrable for all x ∈ A. If g is F-integrable, we define the map ḡ : A × F → R, ḡ(x, F) := ∫ g(x, y) dF(y). If g : A × R → R is sufficiently smooth in its first argument, we denote the m-th partial derivative of g(·, y) by ∂_m g(·, y).
Definition 2.4. A scoring function is an F-integrable map S : A × R → R. It is F-consistent for a functional T : F → A if S̄(T(F), F) ≤ S̄(x, F) for all x ∈ A and for all F ∈ F. It is strictly F-consistent for T if it is F-consistent and S̄(T(F), F) = S̄(x, F) implies x = T(F). A functional T : F → A is elicitable on F if it possesses a strictly F-consistent scoring function.
Definition 2.5. Two scoring functions S, S̃ : A × R → R are equivalent if there is some a : R → R and some λ > 0 such that S̃(x, y) = λS(x, y) + a(y) for all (x, y) ∈ A × R. They are strongly equivalent if additionally a ≡ 0.
This equivalence relation preserves (strict) consistency: if S is (strictly) F-consistent for T and if a is F-integrable, then S̃ is also (strictly) F-consistent for T. Closely related to the concept of elicitability is the notion of identifiability.
In contrast to Gneiting (2011), we consider point-valued functionals only; for a recent comprehensive study on the elicitability of set-valued functionals, we refer to the literature. For the sake of completeness, we list in the Appendix some assumptions used in Section 3 which were originally introduced in Fissler and Ziegel (2016). Wang and Wei (2020, Theorem 5.3) show that for 0 < α < β < 1, RVaR_{α,β} (and also the pairs (VaR_α, RVaR_{α,β}) and (VaR_β, RVaR_{α,β})) does not have CxLS on F_dis, the class of distributions with bounded and discrete support. Hence, invoking that CxLS are necessary for elicitability and identifiability, RVaR_{α,β} and the pairs (VaR_α, RVaR_{α,β}) and (VaR_β, RVaR_{α,β}) fail to be elicitable and identifiable on F_dis. Our novel contribution is that the triplet (VaR_α, VaR_β, RVaR_{α,β}), however, is elicitable and identifiable, subject to mild conditions. We follow the notation of Gneiting (2011).

Elicitability and identifiability results
Proposition 3.1. Let 0 < α < β < 1 and let S_γ(x, y) = (1{y ≤ x} − γ)x − 1{y ≤ x}y, γ ∈ (0, 1), denote the standard γ-quantile loss. Then, for any F with ∫_{−∞}^0 |y| dF(y) < ∞,

RVaR_{α,β}(F) = (1/(β − α)) ( min_{x∈R} S̄_α(x, F) − min_{x∈R} S̄_β(x, F) ).

Proof. The proof is standard, observing that min_{x∈R} S̄_γ(x, F) = S̄_γ(VaR_γ(F), F) = −γ ES_γ(F), which follows from the representation (2.3).
The following theorem establishes a rich class of (strictly) consistent scoring functions for the triplet T = (VaR_α, VaR_β, RVaR_{α,β}); on restricted action domains, the class gets even broader.
are increasing. If moreover φ is strictly convex and the functions at (3.4) and (3.5) are strictly increasing, then S is strictly F_α ∩ F_β-consistent for T.
with a strict inequality under the conditions for strict consistency and if (x_1, x_2) ≠ (t_1, t_2).

Remark 3.3. Provided condition (iii) in Theorem 3.2 holds, φ is strictly convex, and G_{1,x_3} and G_{2,x_3} are strictly increasing, the scoring function S given at (3.3) is strictly consistent. Making use of (2.4) and the revelation principle (Osband, 1985; Gneiting, 2011; Fissler, 2017), Theorem 3.2 also provides a rich class of strictly consistent scoring functions for the α-Winsorized mean. The following proposition is useful to construct examples; see Section 6.
Proposition 3.4. Let S be of the form (3.3) with a (strictly) convex and non-constant function φ, and functions g 1 , g 2 such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.2 is satisfied. Then the following holds: (i) The subgradient φ ′ of φ is necessarily bounded and the one-sided derivatives of g 1 and g 2 are necessarily bounded from below.
(ii) S is strongly equivalent to a scoring function S̃ of the form (3.3) with a (strictly) convex function φ̃ such that φ̃′ is bounded, and strictly increasing functions g̃_1, g̃_2 whose one-sided derivatives are bounded from below by one, such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.2 is satisfied.
Proof. (i) The proof is similar to the one of Corollary 5.5 in Fissler and Ziegel (2016): φ′ is necessarily bounded, and the one-sided derivatives of g_1 and g_2 are necessarily bounded from below. (ii) If we replace φ, g_1, g_2 at (3.3) by strongly equivalent choices φ̃, g̃_1, g̃_2, then S does not change. Also, φ̃ is (strictly) convex if and only if φ is (strictly) convex. Furthermore, conditions (ii) and (iii) of Theorem 3.2 hold for φ, g_1, g_2 if and only if they hold for φ̃, g̃_1 and g̃_2. By part (i) of the proposition, φ′ is bounded. Therefore, we can assume without loss of generality that φ̃′ is bounded as claimed.

Invoking the inequality (2.2), the triplet (VaR_α, VaR_β, RVaR_{α,β}) can only attain values in the domain A_0 = {x ∈ R^3 | x_1 ≤ x_3 ≤ x_2}. Therefore, we call A_0 the maximal sensible action domain. Issuing forecasts for T outside A_0, thus violating (2.2), would be irrational, corresponding to, say, negative variance forecasts. Still, the scoring functions of the form (3.3) allow for the evaluation of forecasts violating (2.2). Striving for a necessary characterisation of (strictly) consistent scoring functions for (VaR_α, VaR_β, RVaR_{α,β}), it is immediate to realise that there is flexibility on [c_min, c_max]^3 \ A_0, since one could possibly set the score to infinity there and would still preserve (strict) consistency. Therefore, it is not astonishing that a necessary characterisation works only on domains A ⊆ A_0. The key to such a necessary characterisation is Osband's principle (Fissler and Ziegel, 2016, Theorem 3.2), originating from the seminal dissertation of Osband (1985). Since it exploits a first-order condition of the minimisation of the expected score, the main assumptions of the result consist of smoothness assumptions on the expected score as well as richness assumptions on the underlying class of distributions F; see the Appendix for the detailed technical formulations and a discussion of these conditions. We introduce the class F_cont ⊂ F_0 of distribution functions which are continuously differentiable with a strictly positive derivative, i.e., density.
(Clearly F_cont ⊂ F_γ ∩ F_(γ) for any γ ∈ (0, 1).) For any A ⊆ R^3, we denote the projection on the r-th component by A′_r = {x_r ∈ R | x ∈ A}.

Proof. First note that V satisfies assumption (V3) on F ⊆ F_cont. Let F ∈ F with derivative f and let x ∈ int(A). Then one obtains the partial derivatives of V̄(·, F) for its components m ∈ {1, 2, 3}. Since we assume that S̄(·, F) is twice continuously differentiable for any F ∈ F, the second-order partial derivatives need to commute. Let t = T(F).
The above calculations imply that the Hessian of the expected score, ∇²S̄(x, F), at its minimiser x = t = T(F), is a diagonal matrix with entries c_1(t_1, t_3)f(t_1), c_2(t_2, t_3)f(t_2), and c_3(t_3). As a second-order condition, ∇²S̄(t, F) must be positive semi-definite. Invoking the surjectivity of T once again, this shows that c_1, c_2, c_3 ≥ 0. More to the point, invoking the continuous differentiability of the expected score and the fact that S is strictly F-consistent for T, one obtains that for any F ∈ F with t = T(F) and for any v ∈ R^3, v ≠ 0, there exists an ε > 0 such that (d/ds)S̄(t + sv, F) is negative for all s ∈ (−ε, 0), zero for s = 0, and positive for all s ∈ (0, ε). For v = e_3 = (0, 0, 1)^⊺, this means that for any F ∈ F with t = T(F) there is an ε > 0 such that (d/ds)S̄(t + se_3, F) = c_3(t_3 + s)s has the same sign as s for all s ∈ (−ε, ε). Therefore, c_3(t_3 + s) > 0 for all s ∈ (−ε, ε) \ {0}. Using the surjectivity of T and invoking a compactness argument, c_3 attains 0 only finitely many times on any compact interval. Recall that int(A)′_3 is an open interval. Hence, it can be approximated by an increasing sequence of compact intervals. Therefore, c_3^{−1}({0}) is at most countable and therefore a Lebesgue null set. With similar arguments one can show that, for any x_3, the sets {x_r | b_r(x_r, x_3) = 0}, r ∈ {1, 2}, are at most countable and therefore also Lebesgue null sets.
Finally, recognising that V is locally bounded, one obtains that S is almost everywhere of the form (3.3). Moreover, it holds almost everywhere that φ′′ = c_3 and g′_m = b_m for m ∈ {1, 2}. Hence, φ is strictly convex and the functions at (3.4) and (3.5) are strictly increasing.
Combining Theorems 3.2 and 3.5, one can show that the scoring functions given at (3.3) are essentially the only strictly consistent scoring functions for the triplet (VaR_α, VaR_β, RVaR_{α,β}) on the action domain A.

Corollary 3.6. Under the conditions of Theorem 3.5, a scoring function S : A × R → R is strictly F-consistent for T = (VaR_α, VaR_β, RVaR_{α,β}), 0 < α < β < 1, if and only if it is of the form (3.3) almost everywhere, satisfying conditions (i), (ii), (iii). Moreover, the function φ′ : [c_min, c_max] → R is necessarily bounded.
Proof. For the proof it suffices to show that for r ∈ {1, 2}, the function G_{r,x_3} defined at (3.4) and (3.5) is not only increasing but strictly increasing on A′_{r,x_3} for any x_3. This follows since φ′ is increasing; if the function G_{1,x′_1} is strictly increasing, then the first inequality in the corresponding estimate is strict. The argument for G_{2,x_3} works analogously.
Remark 3.7. Note the structural difference of Theorems 3.2 and 3.5 to Frongillo and Kash (2020, Theorem 1), Brehmer (2017, Proposition 4.14) and in particular Fissler and Ziegel (2016, Theorem 5.2 and Corollary 5.5). Our functional of interest, RVaR_{α,β} with 0 < α < β < 1, is not a minimum of an expected scoring function -- or Bayes risk -- but a difference of minima of two such Bayes risks. Indeed, while ES_β(F) = −(1/β) S̄_β(VaR_β(F), F), we have that RVaR_{α,β}(F) = (1/(β − α)) ( S̄_α(VaR_α(F), F) − S̄_β(VaR_β(F), F) ). This structural difference is reflected in the minus sign appearing at (3.4). In particular, it means that the functions g_1 and g_2 cannot identically vanish if we want to ensure strict consistency of S, whereas the corresponding functions in Theorem 5.2 of Fissler and Ziegel (2016) can. Concrete examples for choices of the functions g_1, g_2, and φ for the scoring function S at (3.3) are given and discussed in Section 6.
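This Bayes-risk-difference structure can be checked numerically. The sketch below (our own illustration) uses the quantile loss S_γ(x, y) = (1{y ≤ x} − γ)x − 1{y ≤ x}y recalled in Section 5 and the representation RVaR_{α,β}(F) = (S̄_α(VaR_α(F), F) − S̄_β(VaR_β(F), F)) / (β − α) discussed in this remark:

```python
import numpy as np

def pinball(x, y, level):
    # Quantile loss S_level(x, y) = (1{y <= x} - level) x - 1{y <= x} y (Section 5)
    ind = (y <= x).astype(float)
    return (ind - level) * x - ind * y

rng = np.random.default_rng(3)
y = rng.standard_normal(200_000)
a, b = 0.1, 0.3

# minimal expected scores, attained at the respective empirical quantiles
sbar_a = pinball(np.quantile(y, a), y, a).mean()
sbar_b = pinball(np.quantile(y, b), y, b).mean()

rvar_from_risks = (sbar_a - sbar_b) / (b - a)                 # Bayes-risk difference
rvar_direct = np.quantile(y, np.linspace(a, b, 2001)).mean()  # average of quantiles
print(rvar_from_risks, rvar_direct)  # the two agree up to sampling/discretisation error
```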

Translation invariance and homogeneity
There are many choices for the functions g_1, g_2, and φ appearing in the formula for the scoring function S at (3.3). Often, these choices can be limited by imposing secondary desirable criteria on S. In this section we show that, unfortunately, standard criteria (Patton, 2011; Nolde and Ziegel, 2017; Fissler and Ziegel, 2019) such as translation invariance and positive homogeneity are not fruitful for RVaR.
If one is interested in scoring functions with an action domain of the form A = {x ∈ R^3 | c_min ≤ x_1 ≤ x_3 ≤ x_2 ≤ c_max} possessing the additional property of translation invariant score differences, the only sensible choice is c_min = −∞, c_max = ∞, amounting to the maximal action domain A_0. Similarly, for scoring functions with positively homogeneous score differences, the most interesting choices for action domains are A = A_0 and its restrictions A_0^− and A_0^+ to negative and positive forecasts, respectively.

Proposition 4.1 (Translation invariance). Under the conditions of Theorem 3.5 there are no strictly F-consistent scoring functions for (VaR_α, VaR_β, RVaR_{α,β}) on A_0 with translation invariant score differences.
Proof. Using Theorem 3.5, any strictly F-consistent scoring function for T must be of the form (3.3), where in particular φ is strictly convex, twice differentiable, and φ′ is bounded. Assume that S has translation invariant score differences, that is, the corresponding function Ψ vanishes. Then, for all x ∈ A_0 and for all z, y ∈ R, the score differences are unaffected by a common shift, so φ′′ needs to be constant. Since φ is strictly convex, this constant is strictly positive, which contradicts the boundedness of φ′ on the unbounded domain A_0.

The proof of Proposition 4.1 closely follows the one of Proposition 4.10 in Fissler and Ziegel (2019). The fact that the latter assertion entails a positive result has the following background: the strictly consistent scoring function for (VaR_α, ES_α) given in Fissler and Ziegel (2019, Proposition 4.10) works only on a very restricted action domain. To guarantee strict consistency on such an action domain, one would need a refinement of Theorem 3.2 in the spirit of Fissler and Ziegel (2020, Proposition 2). However, since such a positive result on a quite restricted action domain is practically irrelevant, we dispense with such a refinement and only state the relevant negative result here.
Proposition 4.2 (Positive homogeneity). Under the conditions of Theorem 3.5 there are no strictly F-consistent scoring functions for (VaR_α, VaR_β, RVaR_{α,β}) with positively homogeneous score differences on the action domains A_0, A_0^− or A_0^+.

Proof. Using Theorem 3.5, any strictly F-consistent scoring function for T must be of the form (3.3), where in particular φ is strictly convex, twice differentiable, and φ′ is bounded. Assume that S has positively homogeneous score differences of some degree b ∈ R, that is, the corresponding function Ψ vanishes. Therefore, (4.1) holds for all x ∈ A, for all y ∈ R and all c > 0. For the sake of brevity, we only consider the case A = A_0^−, the other cases being similar.
for any x_3 > 0. Due to the strict convexity of φ, we need that φ′′(−1) > 0. However, this is incompatible with the boundedness properties of φ′ established in Proposition 3.4(i). Our result complements Nolde and Ziegel (2017), who characterise homogeneous strictly consistent scoring functions for the pair (VaR_β, ES_β). Since they use a different sign convention for VaR and ES than we do in this paper, their choice of the action domain R × (0, ∞) corresponds to our choice A_0^−. When interpreting RVaR_{α,β} as a risk measure, negative values of RVaR are the more interesting and relevant ones, using our sign convention. Inspecting the proofs of Proposition 4.2 and of Proposition 3.4(i), one makes the following observation: for b ≥ 1, Nolde and Ziegel (2017) state an impossibility result for their choice of action domain. In fact, the problem occurring in our context is that φ′ is not bounded from below. In Proposition 3.4 this property is implied by the fact that the function G_{2,x_3} at (3.5) is increasing. And it is exactly such a condition that is also present for strictly consistent scoring functions for the pair (VaR_β, ES_β); see Theorem 5.2 in Fissler and Ziegel (2016). On the other hand, the complication for b < 1 stems from the fact that φ′ is not bounded from above. This condition is related to the monotonicity of G_{1,x_3} at (3.4). Such a condition is not present for strictly consistent scoring functions for the pair (VaR_β, ES_β). Correspondingly, there can be homogeneous and strictly consistent scoring functions for b < 1 for this pair (Nolde and Ziegel, 2017), while this is not possible for the triplet (VaR_α, VaR_β, RVaR_{α,β}).

Mixture representation of scoring functions
When forecasts are compared and ranked with respect to consistent scoring functions, one has to be aware that in the presence of non-nested information sets, model misspecification and/or finite samples, the ranking may depend on the chosen consistent scoring function (Patton, 2020). In the specific case of (VaR α , VaR β , RVaR α,β ), the forecast ranking may depend on the specific choice for the functions g 1 , g 2 , and φ appearing in Theorem 3.2. A possible remedy to this problem is to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams as introduced by Ehm et al. (2016). Murphy diagrams are based on the fact that the class of all consistent scoring functions can be characterised as a class of mixtures of elementary scoring functions that depend on a low-dimensional parameter. The following theorem provides such a mixture representation for the scoring functions at (3.3). The applicability is illustrated in Section 6. Recall that S α (x, y) = (1{y ≤ x} − α)x − 1{y ≤ x}y.
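For the two VaR components, the elementary scores are the extremal quantile scores of Ehm et al. (2016), S_θ(x, y) = (1{y < x} − α)(1{θ < x} − 1{θ < y}). A sketch of a Murphy-diagram computation for the α-quantile component follows (our own code, anticipating the data generating process of Section 6; the hard-coded constant −1.2816 ≈ Φ^{-1}(0.1) is an assumption of the sketch):

```python
import numpy as np

def elem_quantile_score(x, y, alpha, theta):
    # Extremal quantile score at threshold theta, following Ehm et al. (2016):
    # (1{y < x} - alpha) * (1{theta < x} - 1{theta < y}); pointwise nonnegative
    return ((y < x).astype(float) - alpha) * ((theta < x).astype(float) - (theta < y).astype(float))

rng = np.random.default_rng(5)
N = 100_000
mu = rng.standard_normal(N)
y = mu + rng.standard_normal(N)
alpha = 0.1
q = -1.2816                       # approx Phi^{-1}(0.1), hard-coded for the sketch

f = mu + q                        # ideal conditional VaR_0.1 forecast
h = np.full(N, np.sqrt(2.0) * q)  # forecast based on the unconditional N(0, 2)

thetas = np.linspace(-4.0, 4.0, 81)
murphy_f = [elem_quantile_score(f, y, alpha, t).mean() for t in thetas]
murphy_h = [elem_quantile_score(h, y, alpha, t).mean() for t in thetas]
# the ideal forecaster's Murphy curve lies (weakly) below the unconditional one
```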
Theorem 5.1. Let 0 < α < β < 1. Any scoring function S : [c_min, c_max]^3 × R → R of the form (3.3), with a : R → R chosen such that S(y, y, y, y) = 0, can be written as a mixture (5.1) of elementary scores L¹_v, L²_v and L³_v with respect to locally finite mixing measures H_1, H_2 and H_3.

Proof. An increasing function h : [c_min, c_max] → R can always be written as an integral of indicator functions with respect to a locally finite measure. Using the arguments from Proposition 3.4, it is no loss of generality to show the assertion for a score S such that λ(β − α) = − inf_x φ′(x) = sup_x φ′(x) and the one-sided derivatives of g_1, g_2 are bounded from below by λ > 0.
Then, there is a measure H_3 on [c_min, c_max] with H_3([c_min, c_max]) = 2λ(β − α), which is strictly positive if and only if φ is strictly convex. Using Fubini's theorem together with (3.3), (5.2) and Proposition 3.4, it is straightforward to check that a scoring function of the form (3.3) can be written as in (5.1) with L³_v replaced by L̃³_v and with locally finite measures H̃_1, H̃_2 on [c_min, c_max] instead of H_1, H_2, such that H̃_i(A) ≥ λL(A) for i = 1, 2 and for all Borel sets A ⊆ R, and with the measure H_3.

Table 1: Examples of scoring functions. In all cases we choose g_1(x_1) = x_1 and g_2(x_2) = x_2. The parameters c_1, c_2 ∈ R satisfy c_1 < c_2.
Integrating v → L¹_v with respect to λL, we obtain the function λ(S_α(x_1, y) + αy), and analogously for L²_v. Using that H_3([c_min, c_max]) = 2λ(β − α) yields the claim with the normalisation given in the statement of the theorem. The scoring functions L¹_v and L²_v are consistent for VaR at levels α and β, respectively. The scoring function L³_v is of the form (3.3) with g_1(x) = g_2(x) = x/(2β − 2α) and φ(x) = |x − v|/2, which renders it a consistent scoring function for (VaR_α, VaR_β, RVaR_{α,β}). The converse statement follows by direct computations.

Simulations
This simulation study illustrates the usage of consistent scoring functions for the triplet (VaR_α, VaR_β, RVaR_{α,β}) when comparing the predictive performances of different forecasts for this triplet, e.g., in the context of comparative backtests (Nolde and Ziegel, 2017). Due to the negative results in Section 4, it is challenging to suggest concrete examples for the choices of the functions φ, g_1 and g_2 in (3.3). In Table 1, we give some first suggestions. The scoring function S_4 is in the spirit of the Huber loss (Huber, 1964, p. 79). It is only strictly consistent on [c_1, c_2]^3, but remains consistent on all of R^3. We illustrate the discrimination ability of the suggested scoring functions with a slightly extended version of a simulation example of Gneiting et al. (2007), which has also been considered in the literature.
We consider a data generating process (Y_t)_{t=1,...,N} given by Y_t = μ_t + u_t, where (μ_t)_{t=1,...,N} and (u_t)_{t=1,...,N} are mutually independent sequences of i.i.d. standard normal random variables. Suppose we have three different forecasters who provide point forecasts aiming at correctly specifying (VaR_α, VaR_β, RVaR_{α,β}) of the (conditional) distribution of Y_t. The first forecaster has access to μ_t and uses the correct conditional distribution N(μ_t, 1) for prediction; that is, they predict f_t = (f_1,t, f_2,t, f_3,t) for timepoint t, where ϕ and Φ denote the density and quantile function of the standard normal distribution, respectively. The second forecaster predicts g_t = (g_1,t, g_2,t, g_3,t), where g_1,t = f_1,t + ε_t, g_2,t = f_2,t + ε_t and g_3,t = f_3,t + ε_t, and where (ε_t)_{t=1,...,N} is independent normally distributed noise with mean zero and variance σ². The third forecaster, h_t = (h_1,t, h_2,t, h_3,t), bases their predictions on the unconditional distribution of Y_t, that is, N(0, 2). It is clear that the first forecaster dominates the second and the third forecaster; that is, they will be preferred under any consistent scoring function. Indeed, invoking Holzmann and Eulert (2014), in the case of the first and the second forecaster, the first one is ideal with respect to the information set σ(μ_t, ε_t), whereas the second one is based on the same information set but is not ideal. In the case of the first and the third forecaster, both forecasters are ideal, but the information set of the first forecaster, σ(μ_t), is larger than the one of the third forecaster, which is the trivial σ-algebra. It will depend on the size of the variance σ² whether the second or the third forecaster is preferred. Figures 1 and 2 provide Murphy diagrams of all forecasters computed from a sample of size N = 100'000, providing a good approximation of the population level.
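The data generating process and the three forecasters are straightforward to simulate. The sketch below (our own, with hypothetical variable names) scores, for brevity, only the two VaR components with the quantile loss S_γ from Section 5 rather than the full triplet scores S_1–S_4 of Table 1; the qualitative ranking of the forecasters is already visible:

```python
import numpy as np

def pinball(x, y, level):
    # Quantile loss S_level(x, y) = (1{y <= x} - level) x - 1{y <= x} y
    ind = (y <= x).astype(float)
    return (ind - level) * x - ind * y

rng = np.random.default_rng(11)
N = 100_000
alpha, beta, sigma = 0.1, 0.9, 0.5
mu = rng.standard_normal(N)
y = mu + rng.standard_normal(N)       # Y_t = mu_t + u_t
eps = rng.normal(0.0, sigma, N)       # forecaster 2's noise

qa, qb = -1.2816, 1.2816              # approx Phi^{-1}(0.1), Phi^{-1}(0.9)

forecasts = {
    "f": (mu + qa, mu + qb),                                        # ideal conditional
    "g": (mu + qa + eps, mu + qb + eps),                            # ideal plus noise
    "h": (np.full(N, np.sqrt(2.0) * qa), np.full(N, np.sqrt(2.0) * qb)),  # unconditional
}

scores = {}
for name, (x1, x2) in forecasts.items():
    scores[name] = pinball(x1, y, alpha).mean() + pinball(x2, y, beta).mean()
print(scores)  # with sigma = 0.5, the ranking is f < g < h (smaller is better)
```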
They are in line with our theoretical considerations above concerning the ranking of the three forecasts. We compare the predictive performances using Diebold-Mariano tests (Diebold and Mariano, 1995) based on the scoring functions in Table 1. We consider samples of size N = 250 and repeat our experiment 10'000 times. In the left panel of Table 2, we consider the case α = 1 − β = 0.1, where RVaR α,β is a trimmed mean. We report the ratio of rejections of the null hypothesis that forecaster i performs at most as well as forecaster j, i, j ∈ {1, 2, 3}, i ≠ j, evaluated in terms of the score S at significance level 0.05; a rejection thus favours the alternative that forecaster i outperforms forecaster j. E.g., for i = 1, j = 2, we consider the null hypothesis E[S(f t , Y t )] ≥ E[S(g t , Y t )] for all t = 1, . . . , N , or in short, f ⪰ g. Analogously, in the right panel of Table 2, we consider the case that α and β are both close to zero, namely α = 0.01 and β = 0.05, which is a setting that is relevant if RVaR α,β is used as a risk measure. For the scoring function S 4 , we experimented with the values c 1 and c 2 and report the results for the choices that worked best in our experiments. A systematic study of how to choose these two parameters goes beyond the scope of the present paper.
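A minimal one-sided Diebold-Mariano test on a series of realised score differences d t = S(x t , Y t ) − S(x' t , Y t ) can be sketched as follows. This assumes, for simplicity, uncorrelated score differences as in our i.i.d. setting; under serial dependence one would replace the plain variance estimate by a HAC estimator.

```python
import numpy as np
from scipy.stats import norm

def dm_test(d):
    """One-sided Diebold-Mariano test on score differences d_t.
    Null: E[d_t] >= 0 (first forecaster performs at most as well);
    small p-values favour the first forecaster."""
    d = np.asarray(d, dtype=float)
    n = d.size
    stat = np.sqrt(n) * d.mean() / d.std(ddof=1)  # studentised mean score difference
    return stat, norm.cdf(stat)                   # reject for strongly negative stat
```

If d t are the score differences of forecaster 1 minus forecaster 2, a p-value below 0.05 indicates that forecaster 1 significantly outperforms forecaster 2 under the chosen score.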
For the situation in the left panel of Table 2, concerning α = 1 − β = 0.1, we can see that forecaster 1 (2) outperforms forecaster 3 with a power of 1 (almost 1) for all scoring functions used. For a comparison of forecaster 1 and forecaster 2, the situation is more interesting: while forecaster 1 outperforms forecaster 2 with regard to all scoring functions considered, the power of the tests (and the associated discrimination ability of the scoring functions) varies substantially. While S 1 leads to an empirical power of 0.304 for the null hypothesis f ⪰ g, the score S 4 induces a power of 0.624 for the same null hypothesis. The right panel of Table 2, with the parameter choice α = 0.01 and β = 0.05, paints a different picture. The tests employing S 1 , S 2 and S 3 have similar power. In contrast, S 4 yields a considerably smaller power (0.393) for the null h ⪰ g than the other scores (≥ 0.874 in all cases). A more detailed study and comparison of other scoring functions and other situations is deferred to future work.  Table 2: Power of Diebold-Mariano tests at significance level 0.05 for the scoring functions in Table 1 in the case α = 1 − β = 0.1 (left panel), and α = 0.01, β = 0.05 (right panel). In the first case we chose −c 1 = c 2 = 12 for the scoring function S 4 , and c 1 = −5, c 2 = 1 in the second case. The null hypothesis is that the first-named forecaster performs at most as well as the second-named one for all t = 1, . . . , N , under the scoring function specified in the column label. We chose σ 2 = 0.5 2 for the forecaster g.

Implications for regression
After illustrating the usage of consistent scoring functions in forecast comparison and comparative backtesting in Section 6, we would like to outline how one can implement our results about the elicitability of the triplet (VaR α , VaR β , RVaR α,β ), 0 < α < β < 1, in a regression context. We then contrast our approach with other suggestions for regression of the α-trimmed mean (which can be generalised to RVaR α,β ). The most common alternative approaches in the literature on robust statistics are the trimmed least squares approach and a two-step estimation procedure using the Huber skipped mean.
7.1 A joint regression framework for (VaR α , VaR β , RVaR α,β )

Let (X t , Y t ) t∈N be a time series with the usual notation that Y t denotes some real-valued response variable and X t is a d-dimensional vector of regressors. Let Θ ⊆ R^k be some parameter space and M : R^d × Θ → R^3 a parametric model for T = (VaR α , VaR β , RVaR α,β ), 0 < α < β < 1. We assume a correct model specification, that is, we assume that there is a unique θ 0 ∈ Θ such that T (F Yt|Xt ) = M (X t , θ 0 ) P-a.s. for all t ∈ N, (7.1) where F Yt|Xt denotes the conditional distribution of Y t given X t . That means, M (X t , θ 0 ) models jointly the conditional VaR α , VaR β and the conditional RVaR α,β . Let S be a strictly consistent scoring function of the form (3.3) and suppose the sequence (X t , Y t ) t∈N satisfies suitable mixing conditions (White, 2001, Corollary 3.48), which hold in particular under independence. Then one obtains, under additional moment conditions, that as n → ∞,
n −1 Σ n t=1 ( S(M (X t , θ), Y t ) − E[S(M (X t , θ), Y t )] ) → 0 P-a.s.
It is essentially this Law of Large Numbers result which allows for consistent parameter estimation with the empirical M -estimator θ̂ n = arg min θ∈Θ n −1 Σ n t=1 S(M (X t , θ), Y t ); see e.g. van der Vaart (1998), Huber and Ronchetti (2009), Nolde and Ziegel (2017) and Dimitriadis et al. (2020) for details.
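To make the estimation step concrete, the following sketch minimises an empirical score with the structure of (3.3) for the unconditional (intercept-only) triplet. The concrete choices φ(z) = √(1 + z²) and g 1 (z) = g 2 (z) = 2z/(β − α) are our own illustrative assumptions, selected so that the first-order conditions identify (VaR α , VaR β , RVaR α,β ) on the whole real line; they are not the scores S 1 – S 4 of Table 1.

```python
import numpy as np
from scipy.optimize import minimize

def triplet_score(x, y, alpha, beta):
    """Empirical mean of a scoring function for (VaR_alpha, VaR_beta, RVaR)
    in the structure of (3.3), with our illustrative choices:
    g1(z) = g2(z) = 2 z / (beta - alpha), phi(z) = sqrt(1 + z^2)."""
    x1, x2, x3 = x
    i1 = (y <= x1).astype(float)
    i2 = (y <= x2).astype(float)
    c = 2.0 / (beta - alpha)
    # v is an unbiased plug-in for RVaR when x1, x2 are the correct quantiles
    v = (x1 * (i1 - alpha) - x2 * (i2 - beta) + y * (i2 - i1)) / (beta - alpha)
    phi = np.sqrt(1.0 + x3 ** 2)
    dphi = x3 / phi
    s = ((i1 - alpha) * c * x1 - i1 * c * y
         + (i2 - beta) * c * x2 - i2 * c * y
         + dphi * (x3 - v) - phi)
    return s.mean()

def fit_triplet(y, alpha, beta):
    """M-estimate the unconditional triplet by minimising the empirical score,
    starting from the sample quantiles and the sample trimmed mean."""
    y = np.asarray(y, dtype=float)
    lo, hi = np.quantile(y, alpha), np.quantile(y, beta)
    start = [lo, hi, y[(y > lo) & (y <= hi)].mean()]
    res = minimize(triplet_score, start, args=(y, alpha, beta),
                   method="Nelder-Mead")
    return res.x
```

In a genuine regression setting one would replace the three constants by M(X t , θ) and minimise over θ; the derivative-free optimiser is used because the indicators make the empirical objective only piecewise smooth.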
In summary, we can see that the complication of this procedure is that one needs to model the components VaR α , VaR β , even if one is only interested in RVaR α,β . The advantage is that one can substantially deviate from an i.i.d. assumption on the data generating process. One can deal with serially dependent, though mixing, and nonstationary data. One only needs the semiparametric stationarity specified through (7.1).

Trimmed least squares
Most proposals for M -estimation and regression for RVaR α,β in the field of robust statistics focus on the α-trimmed mean, α ∈ (0, 1/2), corresponding to RVaR α,1−α . But they can often be extended to the general case 0 < α < β < 1 in a straightforward way. When this is the case, we describe the procedure in this more general manner. A majority of the proposals in the literature are commonly referred to as a trimmed least squares (TLS) approach. However, strictly speaking, TLS actually subsumes different, though closely related estimation procedures.
The first one was coined by Koenker and Bassett (1978) -- cf. Ruppert and Carroll (1980) -- and constitutes a two-step M -estimator: In a first step, the α- and β-quantile are determined via usual M -estimation. Then, all values below the former and above the latter are omitted and RVaR α,β is computed with an ordinary least squares approach. One can also express this procedure using order statistics. Using the notation from Subsection 7.1, an M -estimator for RVaR α,β is given by
ẑ n = arg min z∈R Σ i=⌊nα⌋+1 ⌊nβ⌋ (Y (i) − z) 2 ,
which is solved by the corresponding trimmed mean. Here, Y (1) ≤ · · · ≤ Y (n) are the order statistics of the sample Y 1 , . . . , Y n . While this procedure works for a simplistic regression model (ignoring the regressors X t and only modelling the intercept part), it is not clear how to use it in a more interesting regression context, where one is actually interested in the conditional distribution of Y t given X t rather than the unconditional distribution of Y t . Moreover, since this approach uses the order statistics of the entire sample Y 1 , . . . , Y n to implicitly estimate the α- and β-quantile, it requires that these quantiles be constant in time. Hence, heteroscedasticity (in time) can lead to problems, even if RVaR α,β is constant in time.
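In the intercept-only case, this order-statistic estimator reduces to the familiar trimmed mean. A minimal sketch, where the index convention ⌊nα⌋ < i ≤ ⌊nβ⌋ is our choice:

```python
import numpy as np

def rvar_trimmed_mean(y, alpha, beta):
    """Estimate RVaR_{alpha,beta} by averaging the order statistics
    Y_(i) with floor(n*alpha) < i <= floor(n*beta)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    lo, hi = int(np.floor(n * alpha)), int(np.floor(n * beta))
    return y[lo:hi].mean()
```

For α = 1 − β this is the classical α-trimmed mean; for α = 0.01, β = 0.05 it averages only the far left tail, matching the risk-measure setting of Section 6.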
The second procedure, least trimmed squares in the sense of Rousseeuw (1984), trims residuals rather than observations: given a parametric model m for the trimmed mean with residuals ε t (θ) = Y t − m(X t , θ), one minimises the sum of the smallest squared residuals, θ̂ n = arg min θ∈Θ Σ i=1 h (ε 2 (θ)) (i) , where (ε 2 (θ)) (1) ≤ · · · ≤ (ε 2 (θ)) (n) are the ordered squared residuals and h < n corresponds to the trimming proportion. While this procedure appears to be fairly similar to an ordinary least squares procedure with the respective computational advantages, one should recall that the trimming crucially depends on the choice of the parameter θ. That means that even if the model m is linear in the parameter θ, one generally obtains a non-convex objective function with several local minima. Interestingly, the trimming takes place only for residuals with large modulus. If the error distribution is symmetric, this procedure yields a consistent estimator for θ 0 in an i.i.d. setting. If one wants to relax the assumption on the error distribution and is interested in modelling RVaR α,β for general 0 < α < β < 1 in (7.2), one could come up with the following ad-hoc procedure: consider the order statistics of the residuals ε (1) (θ) ≤ · · · ≤ ε (n) (θ) and define an M -estimator via θ̂ n = arg min θ∈Θ n −1 Σ i=⌊nα⌋ ⌊nβ⌋ |ε (i) (θ)| 2 .
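A sketch of this ad-hoc trimmed-residual estimator for a simple linear model follows. Since the objective is non-convex in θ, we simply start a derivative-free optimiser from the OLS fit; the model, sample and trimming levels are our own toy example.

```python
import numpy as np
from scipy.optimize import minimize

def trimmed_residual_objective(theta, x, y, alpha, beta):
    """(1/n) * sum of squared residuals whose order statistics fall
    between ranks floor(n*alpha) and floor(n*beta)."""
    res = np.sort(y - theta[0] - theta[1] * x)   # ordered residuals eps_(i)(theta)
    n = res.size
    lo, hi = int(np.floor(n * alpha)), int(np.floor(n * beta))
    return np.mean(res[lo:hi] ** 2)

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 2000)
y = 1.0 + 2.0 * x + rng.standard_normal(2000)    # symmetric errors, true theta = (1, 2)
ols = np.polyfit(x, y, 1)                        # returns (slope, intercept)
fit = minimize(trimmed_residual_objective, [ols[1], ols[0]],
               args=(x, y, 0.1, 0.9), method="Nelder-Mead").x
```

With symmetric errors and symmetric trimming (α = 1 − β) the recovered intercept and slope are close to the truth, consistent with the discussion above; with asymmetric errors the trimming levels start to matter.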
This procedure takes into account the asymmetric nature of trimming when β ≠ 1 − α or when the error distribution is asymmetric. However, as outlined above, this procedure can lead to problems in the presence of heteroscedasticity or general nonstationarity of the error distribution, if the conditional VaR α and VaR β of Y t given X t depend on X t . We would like to point out that, at the cost of additionally modelling the α- and β-quantile, the procedure using our strictly consistent scoring functions for the triplet (VaR α , VaR β , RVaR α,β ) described in Subsection 7.1 does not rely on the usage of order statistics and can in general deal with heteroscedasticity. The only degree of 'stationarity' required is that specified through (7.1). Stationarity in particular is deemed too strong an assumption in the context of financial data; see Davis (2016). Finally, we would like to remark that there are further procedures belonging to the field of TLS. For instance, Atkinson and Cheng (1999) propose an adaptive procedure where the trimming parameter is data driven; see also Cerioli et al. (2018). However, we see no apparent way to use such procedures if one is interested in predefined trimming parameters α and β.

Connections to Huber loss and Huber skipped mean
In his seminal paper, Huber (1964) introduced the famous Huber loss S(x, y) = ρ(x − y) where ρ(t) = 1/2 t 2 for |t| ≤ k and ρ(t) = k|t| − 1/2 k 2 for |t| > k. Huber argues that "the corresponding [M-]estimator is related to Winsorizing" (Huber, 1964, p. 79). What has received significantly less attention -- maybe due to its lack of convexity -- is another loss function he considers on the same page of the paper, defined as S(x, y) = ρ(x − y) for ρ(t) = 1/2 t 2 for |t| ≤ k and ρ(t) = 1/2 k 2 for |t| > k. He writes about it: "the corresponding [M-]estimator is a trimmed mean" (ibidem).
One could define an asymmetric version of the latter loss function by using S k 1 ,k 2 (x, y) = ρ k 1 ,k 2 (x − y) with
ρ k 1 ,k 2 (t) = 1/2 k 1 2 for t < k 1 , 1/2 t 2 for k 1 ≤ t < k 2 , and 1/2 k 2 2 for t ≥ k 2 .
Assuming that F is continuous with density f for the sake of the simplicity of the argument, the corresponding first-order condition for a minimum of the expected score S̄ k 1 ,k 2 (x, F ) is equivalent to
E[(x − Y ) 1{k 1 ≤ x − Y < k 2 }] = 0.
Now a suggestion similar to Rousseeuw (1984, p. 876) is to consider this loss with k 1 = VaR β (F ) and k 2 = VaR α (F ) stemming from some pre-estimate. However, one can see that this first-order condition is generally not solved by x = RVaR α,β (F ). Again, if one is interested in M -estimation for the trimmed mean or, more generally, RVaR, one should use the scoring functions introduced at (3.3).
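The asymmetric skipped loss and its empirical M-estimator can be sketched as follows; comparing the resulting minimiser with the trimmed mean on a skewed sample illustrates numerically that the first-order condition is generally not solved by RVaR. The truncation points and samples below are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rho(t, k1, k2):
    """Asymmetric Huber-skipped loss: quadratic on [k1, k2), constant outside."""
    t = np.asarray(t, dtype=float)
    return np.where(t < k1, 0.5 * k1 ** 2,
                    np.where(t < k2, 0.5 * t ** 2, 0.5 * k2 ** 2))

def skipped_mean(y, k1, k2):
    """Minimise the empirical skipped loss x -> mean(rho(x - y))."""
    y = np.asarray(y, dtype=float)
    return minimize_scalar(lambda x: rho(x - y, k1, k2).mean(),
                           bounds=(y.min(), y.max()), method="bounded").x
```

For a symmetric sample and symmetric truncation points the skipped mean agrees with the trimmed mean; for a skewed sample (e.g. exponential data) the two estimates differ, in line with the failure of the first-order condition at RVaR.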
Assumption (F1). For every y ∈ R there exists a sequence (F n ) n∈N of distributions F n ∈ F that converges weakly to the Dirac-measure δ y such that the support of F n is contained in a compact set K for all n.
Assumption (VS1). Suppose that the complement of the set C := {(x, y) ∈ A × R | V (x, ·) and S(x, ·) are continuous at the point y} has (k + d)-dimensional Lebesgue measure zero.
Assumption (S2). For every F ∈ F, the function S̄(·, F ) is continuously differentiable and its gradient is locally Lipschitz continuous. Furthermore, S̄(·, F ) is twice continuously differentiable at t = T (F ) ∈ int(A).