The debate about which quantitative risk measure to choose in practice has mainly focused on the dichotomy between value at risk (VaR) and expected shortfall (ES). Range value at risk (RVaR) is a natural interpolation between VaR and ES, constituting a tradeoff between the sensitivity of ES and the robustness of VaR, which makes it a practically relevant risk measure in its own right.
Hence, there is a need to statistically assess, compare and rank the predictive performance of different RVaR models, tasks subsumed under the term “comparative backtesting” in finance.
This is best done in terms of
strictly consistent loss or scoring functions, i.e., functions which are minimized in expectation by the correct risk measure forecast.
Much like ES, RVaR does not admit strictly consistent scoring functions, i.e., it is not elicitable.
Mitigating this negative result, we show that a triplet of RVaR with two VaR-components is elicitable. We characterize all strictly consistent scoring functions for this triplet. Additional properties of these scoring functions are examined, including the diagnostic tool of Murphy diagrams. The results are illustrated with a simulation study, and we put our approach in perspective with respect to the classical approach of trimmed least squares regression.
Keywords: Backtesting; consistency; expected shortfall; point forecasts; scoring functions; trimmed mean
MSC 2010: 62C99; 62G35; 62P05; 91G70
Tobias Fissler is grateful to the Department of Mathematics at Imperial College London who funded his fellowship during which most of the work of this paper has been done. Johanna Ziegel is grateful for financial support from the Swiss National Science Foundation.
Introduction
In the field of quantitative risk management, the last one or two decades have seen a lively debate about which monetary risk measure [3] would be best in (regulatory) practice. The debate mainly focused on the dichotomy between value at risk (VaRβ{\operatorname{VaR}_{\beta}}) on the one hand and expected shortfall (ESβ{\operatorname{ES}_{\beta}}) on the other hand, at some probability level β∈(0,1){\beta\in(0,1)} (see Section 2 for definitions).
Mirroring the historical joust between median and mean as centrality measures in classical statistics, VaRβ{\operatorname{VaR}_{\beta}}, basically a quantile, is esteemed for its robustness, while ESβ{\operatorname{ES}_{\beta}}, a tail expectation, is deemed attractive due to its sensitivity and the fact that it satisfies the axioms of a coherent risk measure [3].
We refer the reader to [15, 17] for comprehensive academic discussions, and to [58] for a regulatory perspective in banking.
Cont, Deguest and Scandolo [8] considered the issue of statistical robustness of risk measure estimates in the sense of [30]. They showed that a risk measure cannot be both robust and coherent. As a compromise, they propose the risk measure “range value at risk”, RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} at probability levels 0<α<β<1{0<\alpha<\beta<1}.
It is defined as the average of all VaRγ{\operatorname{VaR}_{\gamma}} with γ between α and β (see Section 2 for definitions).
As limiting cases, one obtains RVaRβ,β=VaRβ{\operatorname{RVaR}_{\beta,\beta}=\operatorname{VaR}_{\beta}} and RVaR0,β=ESβ{\operatorname{RVaR}_{0,\beta}=\operatorname{ES}_{\beta}}, which presents RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} as a natural interpolation of VaRβ{\operatorname{VaR}_{\beta}} and ESβ{\operatorname{ES}_{\beta}}. Quantifying its robustness in terms of the breakdown point
and following the arguments provided in [33, p. 59], RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} has a breakdown point of min{α,1-β}{\min\{\alpha,1-\beta\}}, placing it between the very robust VaRβ{\operatorname{VaR}_{\beta}} (with a breakdown point of min{β,1-β}{\min\{\beta,1-\beta\}}) and the entirely non-robust ESβ{\operatorname{ES}_{\beta}} (breakdown point 0). This means it is a robust – and hence not coherent – risk measure, unless it degenerates to RVaR0,β=ESβ{\operatorname{RVaR}_{0,\beta}=\operatorname{ES}_{\beta}} (or if 0≤α<β=1{0\leq\alpha<\beta=1}).
Moreover, RVaR{\operatorname{RVaR}} belongs to the wide class of distortion risk measures [55, 52]. For further contributions to robustness in the context of risk measures, we refer the reader to [37, 38, 36, 16, 56].
Since the influential article [8], RVaR has gained increasing attention in the risk management literature – see [13, 14] for extensive studies – as well as in econometrics [5] where RVaR sometimes has the alternative denomination interquantile expectation.
For the symmetric case β=1-α>12{\beta=1-\alpha>\frac{1}{2}}, RVaRα,1-α{\operatorname{RVaR}_{\alpha,1-\alpha}} is known under the term α-trimmed mean in classical statistics and it constitutes an alternative to and interpolation of the mean and the median as centrality measures; see [40] for a recent study and a multivariate extension of the trimmed mean. It is closely connected to the α-Winsorized mean; see (2.3).
How is it possible to evaluate
the predictive performance of point forecasts Xt{X_{t}} for a statistical functional T, such as the mean, median or a risk measure, of the (conditional) distribution of a quantity of interest Yt{Y_{t}}?
It is commonly measured in terms of the average realized score $\frac{1}{n}\sum_{t=1}^{n}S(X_{t},Y_{t})$ for some loss or scoring function $S$, using the orientation the smaller the better.
Consequently, the loss function S should be strictly consistent for T in that T(F)=argminx∫S(x,y)dF(y){T(F)=\operatornamewithlimits{arg\,min}_{x}\int S(x,y)\,\mathrm{d}F(y)}: correct predictions are honored and encouraged in the long run,
e.g., the squared loss S(x,y)=(x-y)2{S(x,y)=(x-y)^{2}} is consistent for the mean, and the absolute loss S(x,y)=|x-y|{S(x,y)=\lvert x-y\rvert} is consistent for the median.
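As a hedged numerical illustration of consistency (not part of the original exposition), the following Python sketch evaluates the average realized score of constant forecasts on a simulated sample. The helper names `average_score`, `squared_loss` and `absolute_loss`, the exponential sample and the forecast grid are assumptions made for this example only.

```python
import random
import statistics


def average_score(score, forecast, observations):
    """Average realized score (1/n) * sum_t S(x, y_t) of a constant forecast x."""
    return sum(score(forecast, y) for y in observations) / len(observations)


def squared_loss(x, y):
    return (x - y) ** 2


def absolute_loss(x, y):
    return abs(x - y)


rng = random.Random(0)
# A skewed sample (odd size), so that mean and median differ clearly.
ys = [rng.expovariate(1.0) for _ in range(2001)]
sample_mean = statistics.fmean(ys)
sample_median = statistics.median(ys)

# Candidate forecasts: a grid on [0, 3] plus the two functionals themselves.
candidates = [i / 100 for i in range(301)] + [sample_mean, sample_median]

best_for_squared = min(candidates, key=lambda x: average_score(squared_loss, x, ys))
best_for_absolute = min(candidates, key=lambda x: average_score(absolute_loss, x, ys))

print(best_for_squared, sample_mean)
print(best_for_absolute, sample_median)
```

On the empirical distribution, the squared loss should single out the sample mean and the absolute loss the sample median, which is the sample analogue of the consistency statement above.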
If a functional admits a strictly consistent score, it is called elicitable [44, 39, 27]. By definition, elicitable functionals allow for M-estimation and have natural estimation paradigms in regression frameworks [11, Section 2] such as quantile regression [35, 34] or expectile regression [42].
Elicitability is crucial for meaningful forecast evaluation [18, 41, 27].
In the context of probabilistic forecasts with distributional forecasts $F_{t}$ or density forecasts $f_{t}$, (strictly) consistent scoring functions are often referred to as (strictly) proper scoring rules, such as the log-score $S(f,y)=-\log f(y)$ (see [29]).
In quantitative finance, and particularly in the debate about which risk measure is best in practice, elicitability has gained considerable attention [17, 57, 9]. Especially, the role of elicitability for backtesting purposes has been highly debated [27, 1, 2]. It has been clarified that elicitability is central for comparative backtesting [24, 43].
On the other hand, if one strives to validate forecasts, (strict) identification functions are crucial. Much like scoring functions, they are functions in the forecast and the observation, which, however, vanish in expectation at (and only at) the correct report.
Thus, they can be used to check (conditional) calibration [26, 43].
Not all functionals are elicitable or identifiable. Osband [44] showed that an elicitable or identifiable functional necessarily has convex level sets (CxLS): If T(F0)=T(F1)=t{T(F_{0})=T(F_{1})=t} for two distributions F0,F1{F_{0},F_{1}}, then T(Fλ)=t{T(F_{\lambda})=t} where Fλ=(1-λ)F0+λF1{F_{\lambda}=(1-\lambda)F_{0}+\lambda F_{1}}, λ∈(0,1){\lambda\in(0,1)}.
Variance and ES generally do not have CxLS [53, 27], therefore failing to be elicitable and identifiable.
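The failure of convex level sets for the variance can be seen in a two-point example. The sketch below is an illustration under our own assumptions: discrete distributions are encoded as dictionaries `{value: probability}`, and the helper names `mixture`, `mean` and `variance` are hypothetical. It contrasts the variance with the mean, whose level sets are convex.

```python
def mixture(f0, f1, lam):
    """Return (1 - lam) * F0 + lam * F1 for discrete laws given as {value: probability}."""
    mixed = {}
    for law, weight in ((f0, 1.0 - lam), (f1, lam)):
        for value, prob in law.items():
            mixed[value] = mixed.get(value, 0.0) + weight * prob
    return mixed


def mean(law):
    return sum(value * prob for value, prob in law.items())


def variance(law):
    m = mean(law)
    return sum(prob * (value - m) ** 2 for value, prob in law.items())


# Two point masses: both have variance 0, so they share a level set of the variance.
f0 = {0.0: 1.0}
f1 = {1.0: 1.0}
half_mix = mixture(f0, f1, 0.5)

# Two laws with common mean 0: their mixture has mean 0 as well.
g0 = {-1.0: 0.5, 1.0: 0.5}
g1 = {0.0: 1.0}
mean_mix = mixture(g0, g1, 0.3)

print(variance(f0), variance(f1), variance(half_mix))
print(mean(g0), mean(g1), mean(mean_mix))
```

The mixture of the two point masses has strictly positive variance, so the level set of the variance at 0 is not convex, whereas the mean of the second mixture stays at the common value.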
The revelation principle [44, 27, 19] asserts that any bijection of an elicitable/identifiable functional is elicitable/identifiable. This implies that the pair (mean, variance) – being a bijection of the first two moments – is elicitable and identifiable despite the variance failing to be so. Similarly, Fissler and Ziegel [21] showed that the pair (VaRβ,ESβ){(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})} is elicitable and identifiable, with the structural difference that the revelation principle is not applicable in this instance. This is followed by the more general finding that the minimal expected score and its minimizer are always jointly elicitable [6, 25].
Recently, Wang and Wei [51, Theorem 5.3] showed that $\operatorname{RVaR}_{\alpha,\beta}$, $0<\alpha<\beta<1$, similarly to $\operatorname{ES}_{\alpha}$, fails to have CxLS as a standalone measure, which rules out its elicitability and identifiability. In contrast, they observe that the identity
\[
\operatorname{RVaR}_{\alpha,\beta}=\frac{\beta\operatorname{ES}_{\beta}-\alpha\operatorname{ES}_{\alpha}}{\beta-\alpha},
\tag{1.1}
\]
which holds if $\operatorname{ES}_{\alpha}$ and $\operatorname{ES}_{\beta}$ are finite,
and the CxLS property of the pairs $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$, $(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})$ imply the CxLS property of the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ (see [51, Example 4.6]). This raises the question whether this triplet is elicitable and identifiable or not.
Given the elicitability and identifiability of $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$, identity (1.1) and the revelation principle establish the elicitability and identifiability of the quadruples $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{ES}_{\alpha},\operatorname{RVaR}_{\alpha,\beta})$ and $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$. This approach has already been used in the context of regression in [5].
Improving this result, we show that the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ is elicitable (Theorem 3.3) and identifiable (Proposition 3.1) under weak regularity conditions. Practically, our results open the way to model validation, to meaningful forecast performance comparison, and in particular to comparative backtests, of this triplet, as well as to a regression framework.
Theoretically, they show that the elicitation complexity [39, 25] or elicitation order [21] of RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} is at most 3.
Moreover, requiring only VaR-forecasts besides the RVaR-forecast is particularly advantageous in comparison to additionally requiring ES-forecasts since the triplet $(\operatorname{VaR}_{\alpha}(F),\operatorname{VaR}_{\beta}(F),\operatorname{RVaR}_{\alpha,\beta}(F))$, $0<\alpha<\beta<1$, exists and is finite for any distribution $F$, whereas $\operatorname{ES}_{\alpha}(F)$ and $\operatorname{ES}_{\beta}(F)$ are only finite if the (left) tail of the gains-and-loss distribution $F$ is integrable. As $\operatorname{RVaR}_{\alpha,\beta}$ is often used for robustness purposes, safeguarding against outliers and heavy-tailedness, this advantage is important.
We would like to point out the structural difference between the elicitability result for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ provided in this paper and the one concerning $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$ in [21] as well as the more general results of [25, 6]. While $\operatorname{ES}_{\alpha}$ corresponds to the negative of a minimum of an expected score which is strictly consistent for $\operatorname{VaR}_{\alpha}$, it turns out that $\operatorname{RVaR}_{\alpha,\beta}$ can be represented as the negative of a scaled difference of minima of expected strictly consistent scoring functions for $\operatorname{VaR}_{\alpha}$ and $\operatorname{VaR}_{\beta}$; see equations (3.1) and (3.2). As a consequence, the class of strictly consistent scoring functions for the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ turns out to be less flexible than the one for $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$; see Remark 3.9 for details.
In particular, there is essentially no translation invariant or positively homogeneous scoring function which is strictly consistent for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$; see Section 4.
The paper is organized as follows. In Section 2, we introduce the relevant notation and definitions concerning RVaR, scoring functions and elicitability. The main results are presented in Section 3,
establishing the elicitability of the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ (Theorem 3.3) and characterizing the class of strictly consistent scoring functions (Theorem 3.7), exploiting the identifiability result of Proposition 3.1.
Section 4 shows that there are basically no strictly consistent scoring functions for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ which are positively homogeneous or translation invariant. In Section 5, we establish a mixture representation of the strictly consistent scoring functions in the spirit of [12]. This result allows one to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams. We demonstrate the applicability of our results and compare the discrimination ability of different scoring functions in a simulation study presented in Section 6. The paper finishes in Section 7 with a discussion of our results in the context of M-estimation and a comparison with other suggestions in the statistical literature, namely variants of a trimmed least squares procedure [35, 49, 47].
Notation and definitions
Definition of range value at risk
There are different sign conventions in the literature on risk measures. In this paper, we use the following convention: if a random variable Y models the gains and losses, then positive values of Y represent gains and negative values of Y losses.
Moreover, if ρ is a risk measure, we assume that ρ(Y)∈ℝ{\rho(Y)\in\mathbb{R}} corresponds to the maximal amount of money one can withdraw such that the position Y-ρ(Y){Y-\rho(Y)} is still acceptable. Hence, negative values of ρ correspond to risky positions.
In the sequel, let ℱ0{\mathcal{F}_{0}} be the class of probability distribution functions on ℝ{\mathbb{R}}. Recall that the α-quantile, α∈[0,1]{\alpha\in[0,1]}, of F∈ℱ0{F\in\mathcal{F}_{0}} is defined as the set
qα(F)={x∈ℝ∣F(x-)≤α≤F(x)}{q_{\alpha}(F)=\{x\in\mathbb{R}\mid F(x-)\leq\alpha\leq F(x)\}}, where F(x-):=limt↑xF(t){F(x-):=\lim_{t\uparrow x}F(t)}.
Definition 2.1.
Value at risk of F∈ℱ0{F\in\mathcal{F}_{0}} at level α∈[0,1]{\alpha\in[0,1]} is defined by
VaRα(F)=infqα(F){\operatorname{VaR}_{\alpha}(F)=\inf q_{\alpha}(F)}.
For any α∈[0,1]{\alpha\in[0,1]} we introduce the following subclasses of ℱ0{\mathcal{F}_{0}}:
Distributions F∈ℱ(α){F\in\mathcal{F}^{(\alpha)}} have at least one solution to the equation F(x)=α{F(x)=\alpha};
distributions F∈ℱα{F\in\mathcal{F}^{\alpha}} have at most one solution to the equation F(x)=α{F(x)=\alpha}.
Definition 2.2.
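Range value at risk of $F\in\mathcal{F}_{0}$ at levels $0\leq\alpha\leq\beta\leq 1$ is defined by
\[
\operatorname{RVaR}_{\alpha,\beta}(F)=
\begin{cases}
\dfrac{1}{\beta-\alpha}\displaystyle\int_{\alpha}^{\beta}\operatorname{VaR}_{\gamma}(F)\,\mathrm{d}\gamma, & \text{if } \alpha<\beta,\\[1ex]
\operatorname{VaR}_{\beta}(F), & \text{if } \alpha=\beta.
\end{cases}
\]
Note that $\lim_{\alpha\uparrow\beta}\operatorname{RVaR}_{\alpha,\beta}(F)=\operatorname{VaR}_{\beta}(F)=\operatorname{RVaR}_{\beta,\beta}(F)$.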
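For an empirical distribution, the average of the quantiles over a range of levels can be computed exactly, because the empirical quantile function is piecewise constant. The following Python sketch is a minimal illustration under that assumption; the function names `empirical_var` and `empirical_rvar` and the toy sample are ours, not the paper's.

```python
import math


def empirical_var(sample, level):
    """Lower quantile inf{x : F_n(x) >= level} of the empirical distribution, 0 < level <= 1."""
    ys = sorted(sample)
    n = len(ys)
    # The small tolerance guards against floating-point noise in level * n.
    k = max(1, math.ceil(level * n - 1e-9))
    return ys[k - 1]


def empirical_rvar(sample, alpha, beta):
    """Average of the empirical VaR over levels in (alpha, beta); VaR at beta if alpha == beta."""
    if alpha == beta:
        return empirical_var(sample, beta)
    ys = sorted(sample)
    n = len(ys)
    total = 0.0
    for k, y in enumerate(ys, start=1):
        # The empirical quantile function equals y on the level interval ((k-1)/n, k/n].
        lo = max(alpha, (k - 1) / n)
        hi = min(beta, k / n)
        if hi > lo:
            total += y * (hi - lo)
    return total / (beta - alpha)


sample = [float(v) for v in range(1, 11)]
alpha, beta = 0.2, 0.8
var_alpha = empirical_var(sample, alpha)
var_beta = empirical_var(sample, beta)
rvar = empirical_rvar(sample, alpha, beta)
print(var_alpha, rvar, var_beta)
```

Since the quantile function is increasing in the level, the computed RVaR necessarily lies between the two VaR values, which the example reproduces.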
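The definition of RVaR and the fact that $\gamma\mapsto\operatorname{VaR}_{\gamma}(F)$ is increasing imply that
\[
\operatorname{VaR}_{\alpha}(F)\leq\operatorname{RVaR}_{\alpha,\beta}(F)\leq\operatorname{VaR}_{\beta}(F).
\tag{2.1}
\]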
For 0<α≤β<1{0<\alpha\leq\beta<1} and F∈ℱ0{F\in\mathcal{F}_{0}} one obtains that
(i) RVaRα,β(F)∈ℝ{\operatorname{RVaR}_{\alpha,\beta}(F)\in\mathbb{R}};
(ii) RVaR0,β(F)∈ℝ∪{-∞}{\operatorname{RVaR}_{0,\beta}(F)\in\mathbb{R}\cup\{-\infty\}} and it is finite if and only if ∫-∞0|y|dF(y)<∞{\int_{-\infty}^{0}\lvert y\rvert\,\mathrm{d}F(y)<\infty}; and
(iii) RVaRα,1(F)∈ℝ∪{∞}{\operatorname{RVaR}_{\alpha,1}(F)\in\mathbb{R}\cup\{\infty\}} and it is finite if and only if ∫0∞|y|dF(y)<∞{\int_{0}^{\infty}\lvert y\rvert\,\mathrm{d}F(y)<\infty}.
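Moreover, $\operatorname{RVaR}_{0,1}(F)$ exists if and only if
\[
\int_{-\infty}^{0}\lvert y\rvert\,\mathrm{d}F(y)<\infty\quad\text{or}\quad\int_{0}^{\infty}\lvert y\rvert\,\mathrm{d}F(y)<\infty,
\]
and then coincides with $\int y\,\mathrm{d}F(y)\in\mathbb{R}\cup\{\pm\infty\}$.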
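For $\alpha<\beta$ and provided that $\operatorname{RVaR}_{\alpha,\beta}(F)$ exists, it holds that
\begin{align}
\operatorname{RVaR}_{\alpha,\beta}(F)={}&\frac{1}{\beta-\alpha}\int_{(\operatorname{VaR}_{\alpha}(F),\operatorname{VaR}_{\beta}(F)]}y\,\mathrm{d}F(y)\notag\\
&+\frac{\operatorname{VaR}_{\alpha}(F)\bigl(F(\operatorname{VaR}_{\alpha}(F))-\alpha\bigr)-\operatorname{VaR}_{\beta}(F)\bigl(F(\operatorname{VaR}_{\beta}(F))-\beta\bigr)}{\beta-\alpha},
\tag{2.2}
\end{align}
using the usual conventions $F(-\infty)=0$, $F(\infty)=1$ and $0\cdot\infty=0\cdot(-\infty)=0$.
If $F\in\mathcal{F}^{(\alpha)}\cap\mathcal{F}^{(\beta)}$, then the correction terms in the second line of (2.2) vanish, yielding
\[
\operatorname{RVaR}_{\alpha,\beta}(F)=\frac{1}{\beta-\alpha}\int_{(\operatorname{VaR}_{\alpha}(F),\operatorname{VaR}_{\beta}(F)]}y\,\mathrm{d}F(y).
\]
Hence, provided that $\operatorname{ES}_{\alpha}(F)$ and $\operatorname{ES}_{\beta}(F)$ are finite, one obtains identity (1.1).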
If F has a finite left tail (∫-∞0|y|dF(y)<∞{\int_{-\infty}^{0}\lvert y\rvert\,\mathrm{d}F(y)<\infty}), then one could use the right-hand side of (1.1) as a definition of RVaRα,β(F){\operatorname{RVaR}_{\alpha,\beta}(F)}. However, in line with our discussion in the introduction, RVaRα,β(F){\operatorname{RVaR}_{\alpha,\beta}(F)} always exists and is finite for 0<α<β<1{0<\alpha<\beta<1} even if the right-hand side of (1.1) is not defined.
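The relation between RVaR and the two expected shortfalls, and the robustness of RVaR to the far left tail, can be checked on an empirical distribution. The sketch below assumes $\operatorname{ES}_{\beta}=\operatorname{RVaR}_{0,\beta}$ and compares RVaR with $(\beta\operatorname{ES}_{\beta}-\alpha\operatorname{ES}_{\alpha})/(\beta-\alpha)$; the names `average_quantile` and `check_identity` and the sample values are hypothetical choices for illustration.

```python
def average_quantile(sample, a, b):
    """(1 / (b - a)) times the integral of the empirical lower quantile function over (a, b)."""
    ys = sorted(sample)
    n = len(ys)
    total = 0.0
    for k, y in enumerate(ys, start=1):
        lo = max(a, (k - 1) / n)
        hi = min(b, k / n)
        if hi > lo:
            total += y * (hi - lo)
    return total / (b - a)


def check_identity(sample, alpha, beta):
    """Return RVaR, the ES-based expression, and ES at level alpha for the empirical law."""
    es_alpha = average_quantile(sample, 0.0, alpha)
    es_beta = average_quantile(sample, 0.0, beta)
    rvar = average_quantile(sample, alpha, beta)
    rhs = (beta * es_beta - alpha * es_alpha) / (beta - alpha)
    return rvar, rhs, es_alpha


sample = [-50.0, -3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0, 4.0]
rvar, rhs, es_alpha = check_identity(sample, 0.2, 0.8)

# Making the worst loss far more extreme moves ES but leaves RVaR untouched.
outlier_sample = [-5000.0] + sample[1:]
rvar_out, rhs_out, es_alpha_out = check_identity(outlier_sample, 0.2, 0.8)
print(rvar, rhs, es_alpha)
print(rvar_out, rhs_out, es_alpha_out)
```

The ES-based expression only makes sense when both expected shortfalls are finite, whereas the direct computation of RVaR never touches the observations below the lower level.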
Interestingly, [14, Theorem 2] establish that RVaR{\operatorname{RVaR}} can be written as an inf-convolution of VaR{\operatorname{VaR}} and ES{\operatorname{ES}} at appropriate levels.
This result amounts to a sup-convolution in our sign convention. Also note that our parametrization of RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} differs from theirs.
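Now, for $\alpha\in(0,\frac{1}{2})$, $\operatorname{RVaR}_{\alpha,1-\alpha}$ corresponds to the α-trimmed mean and has a close connection to the α-Winsorized mean $W_{\alpha}$ (see [33, pp. 57–59]) via
\[
W_{\alpha}(F)=(1-2\alpha)\operatorname{RVaR}_{\alpha,1-\alpha}(F)+\alpha\operatorname{VaR}_{\alpha}(F)+\alpha\operatorname{VaR}_{1-\alpha}(F).
\tag{2.3}
\]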
Using the decision-theoretic framework of [21, 27],
we introduce the following notation.
Let ℱ⊆ℱ0{\mathcal{F}\subseteq\mathcal{F}_{0}} be some generic subclass and let 𝖠⊆ℝk{\mathsf{A}\subseteq\mathbb{R}^{k}} be an action domain.
Whenever we consider a functional T:ℱ→𝖠{T\colon\mathcal{F}\to\mathsf{A}}, we tacitly assume that T(F){T(F)} is well-defined for all F∈ℱ{F\in\mathcal{F}} and is an element of 𝖠{\mathsf{A}}. Then T(ℱ){T(\mathcal{F})} corresponds to the image {T(F)∈𝖠∣F∈ℱ}{\{T(F)\in\mathsf{A}\mid F\in\mathcal{F}\}}. For any subset M⊆ℝk{M\subseteq\mathbb{R}^{k}} we denote with int(M){\operatorname{int}(M)} the largest open subset of M. Moreover, conv(M){\operatorname{conv}(M)} denotes the convex hull of the set M.
We say that a function $a\colon\mathbb{R}\to\mathbb{R}$ is $\mathcal{F}$-integrable if it is measurable and $\int\lvert a(y)\rvert\,\mathrm{d}F(y)<\infty$ for all $F\in\mathcal{F}$. Similarly, a function $g\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is called $\mathcal{F}$-integrable if $g(x,\cdot\,)\colon\mathbb{R}\to\mathbb{R}$ is $\mathcal{F}$-integrable for all $x\in\mathsf{A}$. If $g$ is $\mathcal{F}$-integrable, we define the map
\[
\bar{g}\colon\mathsf{A}\times\mathcal{F}\to\mathbb{R},\qquad\bar{g}(x,F)=\int g(x,y)\,\mathrm{d}F(y).
\]
If g:𝖠×ℝ→ℝ{g\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}} is sufficiently smooth in its first argument, we denote the m-th partial derivative of g(⋅,y){g(\,\cdot\,,y)} by ∂mg(⋅,y){\partial_{m}g(\,\cdot\,,y)}.
Definition 2.4.
A map S:𝖠×ℝ→ℝ{S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}} is an ℱ{\mathcal{F}}-consistent scoring function for T:ℱ→𝖠{T\colon\mathcal{F}\to\mathsf{A}} if it is
$\mathcal{F}$-integrable and if $\bar{S}(T(F),F)\leq\bar{S}(x,F)$ for all $x\in\mathsf{A}$ and $F\in\mathcal{F}$. It is strictly $\mathcal{F}$-consistent for $T$ if it is consistent and if $\bar{S}(T(F),F)=\bar{S}(x,F)$ implies that $x=T(F)$ for all $x\in\mathsf{A}$ and for all $F\in\mathcal{F}$.
A functional T:ℱ→𝖠{T\colon\mathcal{F}\to\mathsf{A}} is elicitable on ℱ{\mathcal{F}} if it possesses a strictly ℱ{\mathcal{F}}-consistent scoring function.
Definition 2.5.
Two scoring functions S,S~:𝖠×ℝ→ℝ{S,\widetilde{S}\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}} are equivalent if there is some a:ℝ→ℝ{a\colon\mathbb{R}\to\mathbb{R}} and some λ>0{\lambda>0} such that S~(x,y)=λS(x,y)+a(y){\widetilde{S}(x,y)=\lambda S(x,y)+a(y)} for all (x,y)∈𝖠×ℝ{(x,y)\in\mathsf{A}\times\mathbb{R}}.
They are proportional if they are equivalent with a≡0{a\equiv 0}.
This equivalence relation preserves (strict) consistency: If S is (strictly) ℱ{\mathcal{F}}-consistent for T and if a is ℱ{\mathcal{F}}-integrable, then S~{\widetilde{S}} is also (strictly) ℱ{\mathcal{F}}-consistent for T.
Closely related to the concept of elicitability is the notion of identifiability.
Definition 2.6.
A map V:𝖠×ℝ→ℝk{V\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}^{k}} is an ℱ{\mathcal{F}}-identification function for T:ℱ→𝖠{T\colon\mathcal{F}\to\mathsf{A}} if it is
$\mathcal{F}$-integrable and if $\bar{V}(T(F),F)=0$ for all $F\in\mathcal{F}$. It is a strict $\mathcal{F}$-identification function for $T$ if additionally $\bar{V}(x,F)=0$ implies that $x=T(F)$ for all $x\in\mathsf{A}$ and for all $F\in\mathcal{F}$.
A functional T:ℱ→𝖠{T\colon\mathcal{F}\to\mathsf{A}} is identifiable on ℱ{\mathcal{F}} if it possesses a strict ℱ{\mathcal{F}}-identification function.
In contrast to [27], we consider point-valued functionals only. For a recent comprehensive study on elicitability of set-valued functionals, we refer to [20].
Elicitability and identifiability results
Wang and Wei [51, Theorem 5.3] showed that for 0<α<β<1{0<\alpha<\beta<1},
RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} (and also the pairs (VaRα,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{RVaR}_{\alpha,\beta})} and (VaRβ,RVaRα,β){(\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})}) do not have CxLS on ℱdis{\mathcal{F}_{\mathrm{dis}}}, the class of distributions with bounded and discrete support.
Hence, by invoking that CxLS are necessary both for elicitability and for identifiability, RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} and the pairs (VaRα,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{RVaR}_{\alpha,\beta})} and (VaRβ,RVaRα,β){(\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})} are neither elicitable nor identifiable on ℱdis{\mathcal{F}_{\mathrm{dis}}}.
Our novel contribution is that the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$, however, is elicitable and identifiable, subject to mild conditions.
We use the notation Sα(x,y)=(𝟙{y≤x}-α)x-𝟙{y≤x}y{S_{\alpha}(x,y)=(\mathbb{1}\{y\leq x\}-\alpha)x-\mathbb{1}\{y\leq x\}y} and recall that Sα{S_{\alpha}} is ℱ{\mathcal{F}}-consistent for VaRα{\operatorname{VaR}_{\alpha}}
if ∫-∞0|y|dF(y)<∞{\int_{-\infty}^{0}\lvert y\rvert\,\mathrm{d}F(y)<\infty} for all F∈ℱ{F\in\mathcal{F}},
and strictly ℱ{\mathcal{F}}-consistent if furthermore ℱ⊆ℱα{\mathcal{F}\subseteq\mathcal{F}^{\alpha}} (see [27]).
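A small numerical check may help to see $S_{\alpha}$ at work. The Python sketch below, with the hypothetical helper names `s_alpha` and `expected_score` and a toy sample of our choosing, minimizes the expected score under an empirical distribution for which the equation $F(x)=\alpha$ has no solution, so that the α-quantile is a single point.

```python
def s_alpha(x, y, alpha):
    """S_alpha(x, y) = (1{y <= x} - alpha) * x - 1{y <= x} * y."""
    indicator = 1.0 if y <= x else 0.0
    return (indicator - alpha) * x - indicator * y


def expected_score(x, sample, alpha):
    """Expected score of the forecast x under the empirical distribution of the sample."""
    return sum(s_alpha(x, y, alpha) for y in sample) / len(sample)


sample = [float(v) for v in range(1, 11)]
alpha = 0.25
# For this sample, F(x) = 0.25 has no solution; the 0.25-quantile is the single point 3.
candidates = [v / 2 for v in range(0, 23)]  # 0.0, 0.5, ..., 11.0
best = min(candidates, key=lambda x: expected_score(x, sample, alpha))
print(best)
```

The minimizer over the candidate grid coincides with the lower 0.25-quantile of the sample, in line with the strict consistency of $S_{\alpha}$ on distributions with a unique quantile.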
Proposition 3.1.
For $0<\alpha<\beta<1$, the map $V\colon\mathbb{R}^{3}\times\mathbb{R}\to\mathbb{R}^{3}$ defined by
\[
V(x_{1},x_{2},x_{3},y)=
\begin{pmatrix}
\mathbb{1}\{y\leq x_{1}\}-\alpha\\
\mathbb{1}\{y\leq x_{2}\}-\beta\\
x_{3}+\frac{1}{\beta-\alpha}\bigl(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\bigr)
\end{pmatrix}
\tag{3.1}
\]
is an $\mathcal{F}^{(\alpha)}\cap\mathcal{F}^{(\beta)}$-identification function for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$, which is strict on $\mathcal{F}^{\alpha}\cap\mathcal{F}^{(\alpha)}\cap\mathcal{F}^{\beta}\cap\mathcal{F}^{(\beta)}$.
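To make the identification property concrete, the following Python sketch evaluates the expectation of a candidate identification function under a discrete uniform distribution. It assumes that the three components are $\mathbb{1}\{y\leq x_{1}\}-\alpha$, $\mathbb{1}\{y\leq x_{2}\}-\beta$ and $x_{3}+(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y))/(\beta-\alpha)$, a form consistent with the notation $S_{\alpha}$ introduced above; the function names and the test distribution are hypothetical.

```python
ALPHA, BETA = 0.2, 0.8


def s_level(x, y, level):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    indicator = 1.0 if y <= x else 0.0
    return (indicator - level) * x - indicator * y


def identification(x1, x2, x3, y):
    """Assumed identification function for (VaR_alpha, VaR_beta, RVaR_alpha,beta)."""
    v1 = (1.0 if y <= x1 else 0.0) - ALPHA
    v2 = (1.0 if y <= x2 else 0.0) - BETA
    v3 = x3 + (s_level(x2, y, BETA) - s_level(x1, y, ALPHA)) / (BETA - ALPHA)
    return (v1, v2, v3)


def expected_identification(x1, x2, x3, support):
    """Expectation of V under the uniform distribution on the given support points."""
    values = [identification(x1, x2, x3, y) for y in support]
    return tuple(sum(v[i] for v in values) / len(values) for i in range(3))


# Uniform law on {1, ..., 10}: F(2) = 0.2 and F(8) = 0.8, VaR values 2 and 8, RVaR 5.5.
support = [float(v) for v in range(1, 11)]
at_truth = expected_identification(2.0, 8.0, 5.5, support)
off_truth = expected_identification(2.0, 8.0, 5.0, support)
print(at_truth)
print(off_truth)
```

At the correct triplet all three expected components vanish up to rounding, while a misspecified RVaR report shifts the third component away from zero.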
The benefits of the identifiability result of Proposition 3.1 are two-fold. First, it facilitates (conditional) calibration backtests in the spirit of [43]. There, the null hypothesis is that a sequence of forecasts $(X_{1,t},X_{2,t},X_{3,t})$, measurable with respect to the most recent information $\mathcal{A}_{t-1}$, is correctly specified in the sense that
\[
\mathbb{E}\bigl[V(X_{1,t},X_{2,t},X_{3,t},Y_{t})\mid\mathcal{A}_{t-1}\bigr]=0\quad\text{almost surely for all } t.
\]
Clearly, such a conditional backtest can be conducted using any strict identification function.
By invoking [19, Proposition 3.2.1], any strict $\mathcal{F}^{\alpha}\cap\mathcal{F}^{(\alpha)}\cap\mathcal{F}^{\beta}\cap\mathcal{F}^{(\beta)}$-identification function for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ is given by
\[
(x,y)\mapsto H(x)V(x,y),
\]
where $V$ is given in (3.1) and $H\colon\mathbb{R}^{3}\to\mathbb{R}^{3\times 3}$ is a matrix-valued function whose determinant does not vanish.
Second, Proposition 3.1 enables the characterization result of strictly consistent scoring functions presented in Theorem 3.7.
The following theorem establishes a rich class of (strictly) consistent scoring functions $S\colon\mathbb{R}^{3}\times\mathbb{R}\to\mathbb{R}$ for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$. By a priori assuming the forecasts to be bounded with values in some cube $[c_{\min},c_{\max}]^{3}$, $-\infty\leq c_{\min}<c_{\max}\leq\infty$ (here and throughout the paper, we make the tacit convention that $[c_{\min},c_{\max}]:=[c_{\min},c_{\max}]\cap\mathbb{R}$ if $c_{\min}=-\infty$ or $c_{\max}=\infty$), the class gets even broader.
Theorem 3.3.
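For $0<\alpha<\beta<1$, the map $S\colon[c_{\min},c_{\max}]^{3}\times\mathbb{R}\to\mathbb{R}$ defined by
\begin{align}
S(x_{1},x_{2},x_{3},y)={}&\bigl(\mathbb{1}\{y\leq x_{1}\}-\alpha\bigr)g_{1}(x_{1})-\mathbb{1}\{y\leq x_{1}\}g_{1}(y)\notag\\
&+\bigl(\mathbb{1}\{y\leq x_{2}\}-\beta\bigr)g_{2}(x_{2})-\mathbb{1}\{y\leq x_{2}\}g_{2}(y)\notag\\
&+\phi'(x_{3})\Bigl(x_{3}+\frac{1}{\beta-\alpha}\bigl(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\bigr)\Bigr)-\phi(x_{3})+a(y)
\tag{3.3}
\end{align}
is an $\mathcal{F}$-consistent scoring function for $T=(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$
if the following conditions hold:
(i) $\phi\colon[c_{\min},c_{\max}]\to\mathbb{R}$ is convex with subgradient $\phi'$.
(ii) For all $x_{3}\in[c_{\min},c_{\max}]$ the functions
\begin{align}
G_{1,x_{3}}&\colon[c_{\min},c_{\max}]\to\mathbb{R},\quad x_{1}\mapsto g_{1}(x_{1})-x_{1}\phi'(x_{3})/(\beta-\alpha),\tag{3.4}\\
G_{2,x_{3}}&\colon[c_{\min},c_{\max}]\to\mathbb{R},\quad x_{2}\mapsto g_{2}(x_{2})+x_{2}\phi'(x_{3})/(\beta-\alpha)\tag{3.5}
\end{align}
are increasing.
(iii) The function $y\mapsto a(y)-\mathbb{1}\{y\leq x_{1}\}g_{1}(y)-\mathbb{1}\{y\leq x_{2}\}g_{2}(y)$ is $\mathcal{F}$-integrable for all $x_{1},x_{2}\in[c_{\min},c_{\max}]$.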
If moreover ϕ is strictly convex and the functions in G1,x3{G_{1,x_{3}}} and G2,x3{G_{2,x_{3}}} in (3.4) and (3.5) are strictly increasing for all x3∈[cmin,cmax]{x_{3}\in[c_{\min},c_{\max}]}, then S is strictly Fα∩Fβ{\mathcal{F}^{\alpha}\cap\mathcal{F}^{\beta}}-consistent for T.
Proof.
Let $(x_{1},x_{2},x_{3})\in\mathsf{A}$, $F\in\mathcal{F}$ and $(t_{1},t_{2},t_{3}):=T(F)$. Then, since $G_{1,x_{3}}$ is increasing, the map
\[
[c_{\min},c_{\max}]\times\mathbb{R}\ni(x_{1}',y)\mapsto S(x_{1}',x_{2},x_{3},y)
\]
is $\mathcal{F}$-consistent for $\operatorname{VaR}_{\alpha}$ and it is strictly $\mathcal{F}^{\alpha}$-consistent if $G_{1,x_{3}}$ is strictly increasing. Similar comments apply to the map $[c_{\min},c_{\max}]\times\mathbb{R}\ni(x_{2}',y)\mapsto S(t_{1},x_{2}',x_{3},y)$. Hence,
\begin{align}
\bar{S}(x_{1},x_{2},x_{3},F)-\bar{S}(t_{1},t_{2},t_{3},F)&\geq\bar{S}(t_{1},t_{2},x_{3},F)-\bar{S}(t_{1},t_{2},t_{3},F)\notag\\
&=\phi(t_{3})-\phi(x_{3})-\phi'(x_{3})(t_{3}-x_{3})\geq 0,
\tag{3.6}
\end{align}
since $\phi$ is convex. If $\phi$ is strictly convex and if $x_{3}\neq t_{3}$, the last inequality in (3.6) is strict.
∎
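The consistency argument can be checked numerically on a small discrete distribution. The Python sketch below assumes a score of the type discussed in the theorem with $g_{1}$ and $g_{2}$ equal to the identity, $a\equiv 0$ and $\phi(x_{3})=(\beta-\alpha)(2\log(\mathrm{e}^{x_{3}}+1)-x_{3})$, so that the score reads $S_{\alpha}(x_{1},y)+S_{\beta}(x_{2},y)+\phi'(x_{3})\bigl(x_{3}+(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y))/(\beta-\alpha)\bigr)-\phi(x_{3})$. This explicit form, the function names and the test distribution are assumptions of the example.

```python
import math

ALPHA, BETA = 0.25, 0.75


def s_level(x, y, level):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    indicator = 1.0 if y <= x else 0.0
    return (indicator - level) * x - indicator * y


def phi(x3):
    """Strictly convex phi with a logistic-type derivative."""
    return (BETA - ALPHA) * (2.0 * math.log1p(math.exp(x3)) - x3)


def phi_prime(x3):
    """Derivative of phi; equals (beta - alpha) * (2 e^x / (1 + e^x) - 1)."""
    return (BETA - ALPHA) * math.tanh(x3 / 2.0)


def score(x1, x2, x3, y):
    """Assumed score with g1 = g2 = identity and a = 0."""
    s_a = s_level(x1, y, ALPHA)
    s_b = s_level(x2, y, BETA)
    return s_a + s_b + phi_prime(x3) * (x3 + (s_b - s_a) / (BETA - ALPHA)) - phi(x3)


def expected_score(x, support):
    """Expected score of the triplet x under the uniform law on the support points."""
    return sum(score(x[0], x[1], x[2], y) for y in support) / len(support)


# Symmetric uniform law on {-4.5, ..., 4.5}: VaR_0.25 = -2.5, VaR_0.75 = 2.5, RVaR = 0.
support = [k - 5.5 for k in range(1, 11)]
grid = [(x1, x2, i / 2) for x1 in support for x2 in support for i in range(-4, 5)]
best = min(grid, key=lambda x: expected_score(x, support))
print(best)
```

For this distribution neither $F(x)=\alpha$ nor $F(x)=\beta$ has a solution, so both quantiles are unique and the grid search is expected to return the true triplet as the unique minimizer.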
Remark 3.4.
Provided condition (iii) in Theorem 3.3 holds and if $\phi$ is strictly convex and $G_{1,x_{3}}$ and $G_{2,x_{3}}$ are strictly increasing, then $S$ given in (3.3) is still strictly $\mathcal{F}$-consistent in the $\operatorname{RVaR}$-component for general $\mathcal{F}\subseteq\mathcal{F}_{0}$. That is, for $F\in\mathcal{F}$, every minimizer $(x_{1},x_{2},x_{3})$ of the expected score $x\mapsto\bar{S}(x,F)$ satisfies $x_{3}=\operatorname{RVaR}_{\alpha,\beta}(F)$.
By making use of (2.3) and the revelation principle [44, 27, 19], Theorem 3.3 also provides a rich class of strictly consistent scoring functions for (VaRα,VaR1-α,Wα){(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{1-\alpha},W_{\alpha})}, where Wα{W_{\alpha}} is the α-Winsorized mean.
The following proposition is useful to construct examples; see Section 6.
Proposition 3.5.
Let S be of the form (3.3) with a (strictly) convex and non-constant function ϕ, and functions g1{g_{1}}, g2{g_{2}} such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.3 is satisfied.
Then the following assertions hold:
(i) The subgradient $\phi'$ of $\phi$ is necessarily bounded and the one-sided derivatives of $g_{1}$ and $g_{2}$ are necessarily bounded from below.
(ii) $S$ is proportional to a scoring function $\tilde{S}$ of the form (3.3) with a (strictly) convex function $\tilde{\phi}$ such that $\tilde{\phi}'$ is bounded with
\[
-\inf_{x_{3}}\tilde{\phi}'(x_{3})=\sup_{x_{3}}\tilde{\phi}'(x_{3})=\beta-\alpha
\]
and strictly increasing functions $\tilde{g}_{1}$, $\tilde{g}_{2}$ such that their one-sided derivatives are bounded from below by one and such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.3 is satisfied.
Proof.
(i) The proof is similar to the one of [21, Corollary 5.5]: condition (ii) implies that for any $x_{3}\in[c_{\min},c_{\max}]$ and any $x<x'$ in $[c_{\min},c_{\max}]$,
\[
\frac{g_{1}(x')-g_{1}(x)}{x'-x}\geq\frac{\phi'(x_{3})}{\beta-\alpha}\quad\text{and}\quad\frac{g_{2}(x')-g_{2}(x)}{x'-x}\geq-\frac{\phi'(x_{3})}{\beta-\alpha}.
\]
Therefore, $\phi'$ is bounded, and the one-sided derivative of $g_{1}$ is bounded from below by $\sup_{x_{3}}\phi'(x_{3})/(\beta-\alpha)$, while the one-sided derivative of $g_{2}$ is bounded from below by $-\inf_{x_{3}}\phi'(x_{3})/(\beta-\alpha)$.
(ii) For any c∈ℝ{c\in\mathbb{R}}, if we replace ϕ by ϕ^:x↦ϕ(x)+cx{\widehat{\phi}:x\mapsto\phi(x)+cx}, g1{g_{1}} by g^1:x↦g1(x)+cx/(β-α){\widehat{g}_{1}:x\mapsto g_{1}(x)+cx/(\beta-\alpha)}, and g2{g_{2}} by g^2:x↦g2(x)-cx/(β-α){\widehat{g}_{2}:x\mapsto g_{2}(x)-cx/(\beta-\alpha)} in formula (3.3) for S, then S does not change. Also, ϕ^{\widehat{\phi}} is (strictly) convex if and only if ϕ is (strictly) convex. Furthermore, conditions (ii) and (iii) of Theorem 3.3 hold for (ϕ,g1,g2){(\phi,g_{1},g_{2})} if and only if they hold for (ϕ^,g^1,g^2){(\widehat{\phi},\widehat{g}_{1},\widehat{g}_{2})}. By part (i) of the proposition, ϕ′{\phi^{\prime}} is bounded. Therefore, we can assume without loss of generality that
since ϕ is non-constant.
Then the argument follows by setting S~=λβ-αS{\tilde{S}=\frac{\lambda}{\beta-\alpha}S}.
∎
Example 3.6.
Proposition 3.5 in combination with Theorem 3.3 yields a straightforward recipe to generate (strictly) consistent scoring functions for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$.
The main degree of flexibility is the choice of ϕ. For practical purposes, it can be easier to start with the choice of ϕ′{\phi^{\prime}}, which should be a (strictly) increasing and bounded function. A rich source for such functions is the class of (strictly increasing) cumulative distribution functions, which can easily be scaled to have an infimum of -(β-α){-(\beta-\alpha)} and a supremum of β-α{\beta-\alpha}.
Then ϕ can be obtained by integrating ϕ′{\phi^{\prime}}.
The simplest choice for g1{g_{1}} and g2{g_{2}} is the identity, i.e., g1(x1)=x1{g_{1}(x_{1})=x_{1}} and g2(x2)=x2{g_{2}(x_{2})=x_{2}}.
The only remaining degree of flexibility is then to add consistent scoring functions for VaRα{\operatorname{VaR}_{\alpha}} or for VaRβ{\operatorname{VaR}_{\beta}}.
Table 1 contains some examples for choices of ϕ′{\phi^{\prime}}.
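The recipe of starting from a distribution function can be sketched in a few lines of Python. The helper names `make_phi_prime` and `integrate`, the levels and the use of a trapezoidal rule are assumptions of this illustration; the normal-based choice is included only as a hypothetical second example of a bounded, strictly increasing derivative.

```python
import math

ALPHA, BETA = 0.1, 0.9


def make_phi_prime(cdf):
    """Scale a CDF so that phi' has infimum -(beta - alpha) and supremum beta - alpha."""
    return lambda x: (BETA - ALPHA) * (2.0 * cdf(x) - 1.0)


def integrate(f, lower, upper, steps=2000):
    """Composite trapezoidal rule; stands in for a closed-form antiderivative."""
    h = (upper - lower) / steps
    inner = sum(f(lower + i * h) for i in range(1, steps))
    return h * (0.5 * f(lower) + inner + 0.5 * f(upper))


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


phi_prime_logistic = make_phi_prime(logistic_cdf)
phi_prime_normal = make_phi_prime(normal_cdf)


def phi_logistic_closed_form(x):
    """Antiderivative of the logistic-based phi'."""
    return (BETA - ALPHA) * (2.0 * math.log1p(math.exp(x)) - x)


# phi(1) - phi(0), once by numerical integration of phi' and once in closed form.
numeric = integrate(phi_prime_logistic, 0.0, 1.0)
closed = phi_logistic_closed_form(1.0) - phi_logistic_closed_form(0.0)
print(numeric, closed)
```

Only differences of $\phi$ matter for comparing forecasts, so the choice of the integration constant is immaterial; the numerical route is a fallback when no convenient antiderivative is available.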
For illustrative purposes, let us discuss the score S1{S_{1}} from Table 1 more closely.
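Just as in the case of $S_{3}$, but less obviously so, the corresponding $\phi'$ is motivated by a distribution function. In this case, it is the logistic distribution function $\mathrm{e}^{x_{3}}/(1+\mathrm{e}^{x_{3}})$. Proper translation and scaling according to Proposition 3.5 leads to
\[
\phi'(x_{3})=(\beta-\alpha)\Bigl(\frac{2\mathrm{e}^{x_{3}}}{1+\mathrm{e}^{x_{3}}}-1\Bigr).
\]
An antiderivative of $\phi'$ is given by
$\phi(x_{3})=(\beta-\alpha)(2\log(\mathrm{e}^{x_{3}}+1)-x_{3})$. Therefore, upon choosing $a(y)=2y$, the explicit form of $S_{1}$ reads
\begin{align*}
S_{1}(x_{1},x_{2},x_{3},y)={}&S_{\alpha}(x_{1},y)+S_{\beta}(x_{2},y)+2y\\
&+\Bigl(\frac{2\mathrm{e}^{x_{3}}}{1+\mathrm{e}^{x_{3}}}-1\Bigr)\bigl((\beta-\alpha)x_{3}+S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\bigr)\\
&-(\beta-\alpha)\bigl(2\log(\mathrm{e}^{x_{3}}+1)-x_{3}\bigr).
\end{align*}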
The particular choice of a(y)=2y{a(y)=2y} can be beneficial with regard to integrability conditions: With this choice, S1{S_{1}} is F-integrable if and only if the right tail of F is integrable, i.e., if ∫0∞ydF(y)<∞{\int_{0}^{\infty}y\,\mathrm{d}F(y)<\infty}.
In a risk management context with our sign convention, the right tail corresponds to the gains, which are commonly less heavy-tailed than the losses.
While ϕ′{\phi^{\prime}} appearing in S2{S_{2}} can easily be integrated with an antiderivative of
the antiderivative of ϕ′{\phi^{\prime}} for S3{S_{3}} has no closed form solution, therefore requiring numerical integration.
The scoring function S4{S_{4}}, where ϕ′{\phi^{\prime}} is an increasing piecewise linear function which is strictly increasing only on [c1,c2]{[c_{1},c_{2}]}, is in the spirit of the Huber loss [32, p. 79]. It is only strictly consistent on [c1,c2]3{[c_{1},c_{2}]^{3}},
but remains consistent for all of ℝ3{\mathbb{R}^{3}}.
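To make the analogy with the Huber loss concrete, the following sketch builds one possible ϕ′ of this type together with its closed-form antiderivative. The parametrisation is hypothetical (a linear ramp from -(β-α) at c1 to β-α at c2, constant outside) and is not claimed to be the exact entry of Table 1; the function name and the numerical values are illustrative only.

```python
def make_huber_type(c1, c2, alpha, beta):
    """Return (phi_prime, phi) for a clipped linear phi_prime on [c1, c2]."""
    width = beta - alpha
    span = c2 - c1

    def phi_prime(x):
        # Linear on [c1, c2], constant (hence not strictly increasing) outside.
        u = min(max((x - c1) / span, 0.0), 1.0)
        return width * (2.0 * u - 1.0)

    def phi(x):
        # Antiderivative normalised to phi(c1) = 0: linear tails, quadratic middle.
        if x <= c1:
            return width * (c1 - x)
        if x >= c2:
            return width * (x - c2)
        u = (x - c1) / span
        return width * span * (u * u - u)

    return phi_prime, phi

phi_prime, phi = make_huber_type(-2.0, 1.0, 0.1, 0.3)
print(phi_prime(-3.0), phi_prime(0.0), phi_prime(2.5))
```

The resulting ϕ is convex on the whole real line but strictly convex only on [c1,c2], which mirrors the distinction between consistency and strict consistency made above.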
Table 1. Examples of scoring functions. In all cases we choose g1(x1)=x1{g_{1}(x_{1})=x_{1}} and g2(x2)=x2{g_{2}(x_{2})=x_{2}}. The parameters c1,c2∈ℝ{c_{1},c_{2}\in\mathbb{R}} satisfy c1<c2{c_{1}<c_{2}}, and Φ is the cumulative distribution function of a standard normal law.
We shall next establish the counterpart of Theorem 3.3, providing necessary conditions for strict consistency.
The main tool to derive such necessary conditions is Osband’s principle, originating from the seminal dissertation of Osband [44]; see also [27] for an accessible intuition.
We use the precise technical formulation of [21, Theorem 3.2].
It is no wonder that necessary conditions for strictly ℱ{\mathcal{F}}-consistent scores for T can only be obtained for action domains 𝖠⊆ℝ3{\mathsf{A}\subseteq\mathbb{R}^{3}} such that the surjectivity condition 𝖠={T(F):F∈ℱ}{\mathsf{A}=\{T(F)\colon F\in\mathcal{F}\}} holds.
By invoking inequality (2.1), any such action domain is necessarily a subset of
𝖠0={x∈ℝ3∣x1≤x3≤x2},{\mathsf{A}_{0}=\{x\in\mathbb{R}^{3}\mid x_{1}\leq x_{3}\leq x_{2}\},}
which we therefore call the maximal sensible action domain.
Issuing forecasts for T outside of 𝖠0{\mathsf{A}_{0}}, thus violating (2.1), would be irrational, comparable to, say, negative variance forecasts.
Still, the scoring functions of the form (3.3) allow for the evaluation of forecasts violating (2.1).
Besides the surjectivity assumption and further richness assumptions on the class of distributions ℱ{\mathcal{F}}, we need to impose smoothness conditions on the expected score so as to exploit the first-order conditions stemming from the minimization problem of strict consistency; see Section A for the detailed technical formulations and [21] for a discussion of these conditions.
We introduce the class ℱcontα⊂ℱα{\mathcal{F}^{\alpha}_{\mathrm{cont}}\subset\mathcal{F}^{\alpha}} of distributions in ℱα{\mathcal{F}^{\alpha}} which are continuously differentiable (and therefore also in ℱ(α){\mathcal{F}^{(\alpha)}}).
For any 𝖠⊆ℝ3{\mathsf{A}\subseteq\mathbb{R}^{3}}, we denote the projections on the r-th component by
Theorem 3.7.
Let ℱ⊆ℱcontα{\mathcal{F}\subseteq\mathcal{F}^{\alpha}_{\mathrm{cont}}}, 0<α<β<1{0<\alpha<\beta<1}, T=(VaRα,VaRβ,RVaRα,β):ℱ→𝖠⊆𝖠0{T=(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})\colon\mathcal{F}\to\mathsf{A}\subseteq\mathsf{A}_{0}}, and let V=(V1,V2,V3)⊺{V=(V_{1},V_{2},V_{3})^{\intercal}} be defined at (3.1).
If Assumptions (V1) and (F1) hold and (V1,V2)⊺{(V_{1},V_{2})^{\intercal}} satisfies Assumption (V4), then any strictly ℱ{\mathcal{F}}-consistent scoring function S:𝖠×ℝ→ℝ{S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}} for T that satisfies Assumptions (VS1) and (S2) is necessarily of the form (3.3) almost everywhere, where the functions Gr,x3:𝖠r,x3′→ℝ{G_{r,x_{3}}\colon\mathsf{A}^{\prime}_{r,x_{3}}\to\mathbb{R}}, r∈{1,2}{r\in\{1,2\}}, x3∈𝖠3′{x_{3}\in\mathsf{A}^{\prime}_{3}}, in (3.4) and (3.5) are strictly increasing and ϕ:𝖠3′→ℝ{\phi\colon\mathsf{A}^{\prime}_{3}\to\mathbb{R}} is strictly convex.
Proof.
First note that V satisfies Assumption (V3) on ℱ⊆ℱcontα{\mathcal{F}\subseteq\mathcal{F}^{\alpha}_{\mathrm{cont}}}.
Let F∈ℱ{F\in\mathcal{F}} with derivative f and let x∈int(𝖠){x\in\operatorname{int}(\mathsf{A})}. Then one obtains
and ∂rV¯1(x,F){\partial_{r}\bar{V}_{1}(x,F)} and ∂mV¯2(x,F){\partial_{m}\bar{V}_{2}(x,F)} vanish for r∈{2,3}{r\in\{2,3\}} and m∈{1,3}{m\in\{1,3\}}.
Applying [21, Theorem 3.2] yields the existence of continuously differentiable functions hlm:int(𝖠)→ℝ{h_{lm}\colon\operatorname{int}(\mathsf{A})\to\mathbb{R}}, l,m∈{1,2,3}{l,m\in\{1,2,3\}}, such that
for m∈{1,2,3}{m\in\{1,2,3\}}.
Since we assume that S¯(⋅,F){\bar{S}(\,\cdot\,,F)} is twice continuously differentiable for any F∈ℱ{F\in\mathcal{F}}, the second-order partial derivatives need to commute. Let t=T(F){t=T(F)}. Then ∂1∂2S¯(t,F)=∂2∂1S¯(t,F){\partial_{1}\partial_{2}\bar{S}(t,F)=\partial_{2}\partial_{1}\bar{S}(t,F)} is equivalent to
This needs to hold for all F∈ℱ{F\in\mathcal{F}}. The variation in the densities implied by Assumption (V4) in combination with the surjectivity of T yields that h12≡h21≡0{h_{12}\equiv h_{21}\equiv 0} on int(𝖠){\operatorname{int}(\mathsf{A})}. Similarly, evaluating ∂1∂3S¯(x,F)=∂3∂1S¯(x,F){\partial_{1}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{1}\bar{S}(x,F)} and ∂2∂3S¯(x,F)=∂3∂2S¯(x,F){\partial_{2}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{2}\bar{S}(x,F)} at x=t=T(F){x=t=T(F)} yields
So we are left with characterizing hmm{h_{mm}} for m∈{1,2,3}{m\in\{1,2,3\}}. Note that Assumption (V1) implies that for any x=(x1,x2,x3)∈int(𝖠){x=(x_{1},x_{2},x_{3})\in\operatorname{int}(\mathsf{A})} there are two distributions F1,F2∈ℱ{F_{1},F_{2}\in\mathcal{F}} such that
for all x∈int(𝖠){x\in\operatorname{int}(\mathsf{A})} and for all F∈ℱ{F\in\mathcal{F}} implies that ∂1h22≡∂2h11≡0{\partial_{1}h_{22}\equiv\partial_{2}h_{11}\equiv 0}.
Starting with ∂1∂3S¯(x,F)=∂3∂1S¯(x,F){\partial_{1}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{1}\bar{S}(x,F)} implies that
are linearly independent. Hence, we obtain that ∂1h33≡0{\partial_{1}h_{33}\equiv 0} and ∂3h11≡-h33/(β-α){\partial_{3}h_{11}\equiv-h_{33}/(\beta-\alpha)}. With the same argumentation and starting from ∂2∂3S¯(x,F)=∂3∂2S¯(x,F){\partial_{2}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{2}\bar{S}(x,F)}, one can show that
∂2h33≡0{\partial_{2}h_{33}\equiv 0} and ∂3h22≡h33/(β-α){\partial_{3}h_{22}\equiv h_{33}/(\beta-\alpha)}.
This means there exist functions
η1:{(x1,x3)∈ℝ2∣there exists (z1,z2,z3)∈int(𝖠), x1=z1, x3=z3}→ℝ,{\eta_{1}\colon\bigl\{(x_{1},x_{3})\in\mathbb{R}^{2}\mid\text{there exists }(z_{1},z_{2},z_{3})\in\operatorname{int}(\mathsf{A}),\,x_{1}=z_{1},\,x_{3}=z_{3}\bigr\}\to\mathbb{R},}
η2:{(x2,x3)∈ℝ2∣there exists (z1,z2,z3)∈int(𝖠), x2=z2, x3=z3}→ℝ,{\eta_{2}\colon\bigl\{(x_{2},x_{3})\in\mathbb{R}^{2}\mid\text{there exists }(z_{1},z_{2},z_{3})\in\operatorname{int}(\mathsf{A}),\,x_{2}=z_{2},\,x_{3}=z_{3}\bigr\}\to\mathbb{R},}
η3:int(𝖠)3′→ℝ,{\eta_{3}\colon\operatorname{int}(\mathsf{A})^{\prime}_{3}\to\mathbb{R},}
and some z∈int(𝖠)3′{z\in\operatorname{int}(\mathsf{A})^{\prime}_{3}} such that for any x=(x1,x2,x3)∈int(𝖠){x=(x_{1},x_{2},x_{3})\in\operatorname{int}(\mathsf{A})} it holds that
where ζr:int(𝖠)r′→ℝ{\zeta_{r}\colon\operatorname{int}(\mathsf{A})^{\prime}_{r}\to\mathbb{R}}, r∈{1,2}{r\in\{1,2\}}. Due to the fact that any component of T is mixture-continuous (for convex ℱ{\mathcal{F}}, a functional T:ℱ→ℝk{T\colon\mathcal{F}\to\mathbb{R}^{k}} is called mixture-continuous if for any F,G∈ℱ{F,G\in\mathcal{F}} the map [0,1]∋λ↦T((1-λ)F+λG){[0,1]\ni\lambda\mapsto T((1-\lambda)F+\lambda G)} is continuous) and since ℱ{\mathcal{F}} is convex and T surjective, the projection int(𝖠)3′{\operatorname{int}(\mathsf{A})^{\prime}_{3}} is an open interval. Hence, [min(z,x3),max(z,x3)]⊂int(𝖠)3′{[\min(z,x_{3}),\max(z,x_{3})]\subset\operatorname{int}(\mathsf{A})^{\prime}_{3}}.
Due to Assumptions (V3) and (S2), [21, Theorem 3.2] implies that η1,η2,η3{\eta_{1},\eta_{2},\eta_{3}} are locally Lipschitz continuous.
The above calculations imply that the Hessian of the expected score, i.e., ∇2S¯(x,F){\nabla^{2}\bar{S}(x,F)}, at its minimizer x=t=T(F){x=t=T(F)},
is a diagonal matrix with entries η1(t1,t3)f(t1){\eta_{1}(t_{1},t_{3})f(t_{1})}, η2(t2,t3)f(t2){\eta_{2}(t_{2},t_{3})f(t_{2})} and η3(t3){\eta_{3}(t_{3})}.
As a second-order condition, ∇2S¯(t,F){\nabla^{2}\bar{S}(t,F)} must be
positive semi-definite. By invoking the surjectivity of T once again, this shows that η1,η2,η3≥0{\eta_{1},\eta_{2},\eta_{3}\geq 0}. More to the point, invoking the continuous differentiability of the expected score and the fact that S is strictly ℱ{\mathcal{F}}-consistent for T,
one obtains that
for any F∈ℱ{F\in\mathcal{F}} with t=T(F){t=T(F)} and for any v∈ℝ3{v\in\mathbb{R}^{3}}, v≠0{v\neq 0}, there exists an ε>0{\varepsilon>0} such that
ddsS¯(t+sv,F){\frac{\mathrm{d}}{\mathrm{d}s}\bar{S}(t+sv,F)} is negative for all s∈(-ε,0){s\in(-\varepsilon,0)}, zero for s=0{s=0}, and positive for all s∈(0,ε){s\in(0,\varepsilon)}.
For v=e3=(0,0,1)⊺{v=e_{3}=(0,0,1)^{\intercal}}, this means that for any F∈ℱ{F\in\mathcal{F}} with t=T(F){t=T(F)} there is an ε>0{\varepsilon>0} such that
ddsS¯(t+se3,F)=η3(t3+s)s{\frac{\mathrm{d}}{\mathrm{d}s}\bar{S}(t+se_{3},F)=\eta_{3}(t_{3}+s)s}
has the same sign as s for all s∈(-ε,ε){s\in(-\varepsilon,\varepsilon)}.
Therefore, η3(t3+s)>0{\eta_{3}(t_{3}+s)>0} for all s∈(-ε,ε)∖{0}{s\in(-\varepsilon,\varepsilon)\setminus\{0\}}.
Using the surjectivity of T and invoking a compactness argument, η3{\eta_{3}} vanishes only finitely many times on any compact interval.
Recall that int(𝖠)3′{\operatorname{int}(\mathsf{A})^{\prime}_{3}} is an open interval. Hence, it can be approximated by an increasing sequence of compact intervals. Therefore, η3-1({0}){\eta_{3}^{-1}(\{0\})} is at most countable, and therefore a Lebesgue null set. With similar arguments, one can show that for any x3∈int(𝖠)3′{x_{3}\in\operatorname{int}(\mathsf{A})^{\prime}_{3}} the sets
are at most countable, and therefore also Lebesgue null sets.
Finally, using [23, Proposition 1 in the supplement] (recognizing that V is locally bounded), one obtains that S is almost everywhere of the form (3.3). Moreover, it holds almost everywhere that ϕ′′=η3{\phi^{\prime\prime}=\eta_{3}} and gm′=ζm{g_{m}^{\prime}=\zeta_{m}} for m∈{1,2}{m\in\{1,2\}}. Hence, ϕ is strictly convex and the functions at (3.4) and (3.5) are strictly increasing.
∎
Combining Theorems 3.3 and 3.7, one can show that the scoring functions given at (3.3) are essentially the only strictly consistent scoring functions for the triplet (VaRα,VaRβ,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})} on the action domain
𝖠={x∈ℝ3∣cmin≤x1≤x3≤x2≤cmax}{\mathsf{A}=\{x\in\mathbb{R}^{3}\mid c_{\min}\leq x_{1}\leq x_{3}\leq x_{2}\leq c_{\max}\}}
for some -∞≤cmin<cmax≤∞{-\infty\leq c_{\min}<c_{\max}\leq\infty}.
Under the conditions of Theorem 3.7, a scoring function S:𝖠×ℝ→ℝ{S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}} is strictly ℱ{\mathcal{F}}-consistent for T=(VaRα,VaRβ,RVaRα,β){T=(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})}, 0<α<β<1{0<\alpha<\beta<1}, if and only if it is of the form (3.3) almost everywhere, satisfying conditions (i)–(iii).
Moreover, the function ϕ′:[cmin,cmax]→ℝ{\phi^{\prime}\colon[c_{\min},c_{\max}]\to\mathbb{R}} is necessarily bounded.
Proof.
For the proof it suffices to show that for r∈{1,2}{r\in\{1,2\}}, Gr,x3{G_{r,x_{3}}} defined in (3.4) and (3.5) is not only increasing on 𝖠r,x3′{\mathsf{A}_{r,x_{3}}^{\prime}} for any x3∈𝖠3′{x_{3}\in\mathsf{A}_{3}^{\prime}}, but on 𝖠r′=[cmin,cmax]{\mathsf{A}_{r}^{\prime}=[c_{\min},c_{\max}]}. For x3∈[cmin,cmax]=𝖠3′{x_{3}\in[c_{\min},c_{\max}]=\mathsf{A}_{3}^{\prime}}, we have 𝖠1,x3′=[cmin,x3]{\mathsf{A}_{1,x_{3}}^{\prime}=[c_{\min},x_{3}]} and 𝖠2,x3′=[x3,cmax]{\mathsf{A}_{2,x_{3}}^{\prime}=[x_{3},c_{\max}]}. Let x3∈𝖠3′{x_{3}\in\mathsf{A}^{\prime}_{3}} and x1,x1′∈𝖠1′{x_{1},x_{1}^{\prime}\in\mathsf{A}^{\prime}_{1}} with x1<x1′{x_{1}<x_{1}^{\prime}}. If x1,x1′∈𝖠1,x3′{x_{1},x_{1}^{\prime}\in\mathsf{A}^{\prime}_{1,x_{3}}}, there is nothing to show. If however x3<x1′{x_{3}<x_{1}^{\prime}}, then x1,x1′∈𝖠1,x1′′{x_{1},x^{\prime}_{1}\in\mathsf{A}^{\prime}_{1,x^{\prime}_{1}}}. This means that
where the second inequality stems from the fact that ϕ′{\phi^{\prime}} is increasing.
If the function G1,x1′{G_{1,x^{\prime}_{1}}} is strictly increasing, then the first inequality is strict. The argument for G2,x3{G_{2,x_{3}}} works analogously.
∎
Remark 3.9.
Note the structural difference of Theorems 3.3 and 3.7 to [25, Theorem 1], [6, Proposition 4.14] and in particular [21, Theorem 5.2 and Corollary 5.5]. Our functional of interest, RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} with 0<α<β<1{0<\alpha<\beta<1}, is not a minimum of an expected scoring function (a Bayes risk), but a difference of minima of two expected scoring functions. Indeed, while ESβ(F)=-1βS¯β(VaRβ(F),F){\operatorname{ES}_{\beta}(F)=-\frac{1}{\beta}\bar{S}_{\beta}(\operatorname{VaR}_{\beta}(F),F)}, we have that
RVaRα,β(F)=-1β-α(S¯β(VaRβ(F),F)-S¯α(VaRα(F),F)).{\operatorname{RVaR}_{\alpha,\beta}(F)=-\frac{1}{\beta-\alpha}\bigl(\bar{S}_{\beta}(\operatorname{VaR}_{\beta}(F),F)-\bar{S}_{\alpha}(\operatorname{VaR}_{\alpha}(F),F)\bigr).}
This structural difference is reflected in the minus sign appearing at (3.4). In particular, it means that the functions g1{g_{1}} and g2{g_{2}} cannot identically vanish if we want to ensure strict consistency of S, whereas the corresponding function in [21, Theorem 5.2] may well be set to zero.
[25, Theorem 2] generalizes our results and presents an elicitability result for any linear combination of Bayes risks.
Translation invariance and homogeneity
There are many choices for the functions g1{g_{1}}, g2{g_{2}} and ϕ appearing in the formula for the scoring function S at (3.3). Often, these choices can be limited by imposing secondary desirable criteria on S.
For example, since T=(VaRα,VaRβ,RVaRα,β){T=(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})} is translation equivariant (meaning that T(FY+z)=T(FY)+z{T(F_{Y+z})=T(F_{Y})+z} for any constant z∈ℝ{z\in\mathbb{R}}) and positively homogeneous of degree 1 (meaning that T(FcY)=cT(FY){T(F_{cY})=cT(F_{Y})} for any c>0{c>0}), it would make sense if the forecast ranking were also invariant under a joint translation of the forecasts and the observations on the one hand, and a joint scaling of the forecasts and the observations on the other hand.
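This would require translation invariance of the score differences on the one hand, i.e.,
S(x1+z,x2+z,x3+z,y+z)-S(x1′+z,x2′+z,x3′+z,y+z)=S(x1,x2,x3,y)-S(x1′,x2′,x3′,y){S(x_{1}+z,x_{2}+z,x_{3}+z,y+z)-S(x_{1}^{\prime}+z,x_{2}^{\prime}+z,x_{3}^{\prime}+z,y+z)=S(x_{1},x_{2},x_{3},y)-S(x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime},y)}
for all (x1,x2,x3),(x1′,x2′,x3′)∈𝖠{(x_{1},x_{2},x_{3}),(x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime})\in\mathsf{A}} and y,z∈ℝ{y,z\in\mathbb{R}}.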
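On the other hand, it would require positively homogeneous score differences, that is, there is some b∈ℝ{b\in\mathbb{R}} such that
S(cx1,cx2,cx3,cy)-S(cx1′,cx2′,cx3′,cy)=cb(S(x1,x2,x3,y)-S(x1′,x2′,x3′,y)){S(cx_{1},cx_{2},cx_{3},cy)-S(cx_{1}^{\prime},cx_{2}^{\prime},cx_{3}^{\prime},cy)=c^{b}\bigl(S(x_{1},x_{2},x_{3},y)-S(x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime},y)\bigr)}
for all (x1,x2,x3),(x1′,x2′,x3′)∈𝖠{(x_{1},x_{2},x_{3}),(x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime})\in\mathsf{A}}, y∈ℝ{y\in\mathbb{R}} and for all c>0{c>0}.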
While translation invariance seems to be particularly important when RVaR is used as a location parameter, i.e., when α=1-β<12{\alpha=1-\beta<\frac{1}{2}}, corresponding to the α-trimmed mean, positively homogeneous score differences are relevant in a risk management context:
the forecast ranking should not depend on the unit in which the risk measures and the gains and losses are reported, be it, say, in euros or in euro cents.
We also refer to [45, 43, 22] for further motivations.
This section establishes that, unfortunately, there are no strictly consistent scoring functions for (VaRα,VaRβ,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})} which admit translation invariant or positively homogeneous score differences under practically relevant settings.
If one is interested in scoring functions with an action domain of the form
possessing the additional property of translation invariant score differences, the only sensible choice is cmin=-∞{c_{\min}=-\infty}, cmax=∞{c_{\max}=\infty}, amounting to the maximal action domain 𝖠0{\mathsf{A}_{0}}.
Similarly, for scoring functions with positively homogeneous score differences, the most interesting choices for action domains are
Proposition 4.1.
Under the conditions of Theorem 3.7 there are no strictly ℱ{\mathcal{F}}-consistent scoring functions for (VaRα,VaRβ,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})}, 0<α<β<1{0<\alpha<\beta<1}, on 𝖠0{\mathsf{A}_{0}} with translation invariant score differences.
Proof.
By using Theorem 3.7, any strictly ℱ{\mathcal{F}}-consistent scoring function for the functional
must be of the form (3.3), where in particular ϕ is strictly convex, twice differentiable and ϕ′{\phi^{\prime}} is bounded. Assume that S has translation invariant score differences. That means that the function
Therefore, ϕ′′{\phi^{\prime\prime}} needs to be constant. Since ϕ is strictly convex, this means that ϕ′(x3)=dx3+d′{\phi^{\prime}(x_{3})=dx_{3}+d^{\prime}} with d>0{d>0}.
But since 𝖠3′=ℝ{\mathsf{A}^{\prime}_{3}=\mathbb{R}}, ϕ′{\phi^{\prime}} is unbounded, which is a contradiction.
∎
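The negative result can be observed numerically for the logistic-based score S1 discussed in Section 3. The sketch below again assumes that (3.3) is the sum of the two quantile scores (g1 and g2 the identity), the term ϕ′(x3)(x3 + (Sβ(x2,y) - Sα(x1,y))/(β-α)), minus ϕ(x3), plus a(y) = 2y; all function names, forecasts and levels are hypothetical. It shifts two competing forecasts and the observation by the same amount z and prints the resulting score difference.

```python
import math

def s_quantile(level, x, y):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    hit = 1.0 if y <= x else 0.0
    return (hit - level) * x - hit * y

def score_s1(x1, x2, x3, y, alpha, beta):
    """Assumed form of S_1 with g1 = g2 = identity and a(y) = 2y."""
    width = beta - alpha
    phi = width * (2.0 * math.log1p(math.exp(x3)) - x3)
    phi_prime = width * math.tanh(x3 / 2.0)
    s_a, s_b = s_quantile(alpha, x1, y), s_quantile(beta, x2, y)
    return s_a + s_b + phi_prime * (x3 + (s_b - s_a) / width) - phi + 2.0 * y

def score_difference(f, f_alt, y, z, alpha, beta):
    """Score difference after shifting both forecasts and the observation by z."""
    shifted = [c + z for c in f]
    shifted_alt = [c + z for c in f_alt]
    return (score_s1(*shifted, y + z, alpha, beta)
            - score_s1(*shifted_alt, y + z, alpha, beta))

alpha, beta = 0.1, 0.3                   # illustrative levels
f, f_alt = (-1.3, -0.5, -0.9), (-1.3, -0.5, -0.6)  # forecasts differing in x3 only
y = -1.0
for z in (0.0, 1.0, 10.0):
    print(z, score_difference(f, f_alt, y, z, alpha, beta))
```

Because ϕ′ is bounded, it saturates for large arguments, so the score difference between the two forecasts fades as z grows instead of staying constant. This is the mechanism the proof exploits: a constant ϕ′′ would be needed for invariance, which is incompatible with a bounded ϕ′ on the whole real line.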
The proof of Proposition 4.1 closely follows the one of
[22, Proposition 4.10]. The fact that the latter assertion entails a positive result has the following background:
The strictly consistent scoring function for (VaRα,ESα){(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})} given in [22, Proposition 4.10] works only on a very restricted action domain. To guarantee strict consistency on such an action domain, one would need a refinement of Theorem 3.3 in the spirit of [23, Proposition 2 of the supplement]. However, since such a positive result on a quite restricted action domain is practically irrelevant, we dispense with such a refinement and only state the relevant negative result here.
Proposition 4.2 (Positive homogeneity).
Under the conditions of Theorem 3.7 there are no strictly ℱ{\mathcal{F}}-consistent scoring functions for (VaRα,VaRβ,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})}, 0<α<β<1{0<\alpha<\beta<1}, on 𝖠∈{𝖠0,𝖠0+,𝖠0-}{\mathsf{A}\in\{\mathsf{A}_{0},\mathsf{A}_{0}^{+},\mathsf{A}_{0}^{-}\}} with positively homogeneous score differences.
Proof.
By using Theorem 3.7, any strictly ℱ{\mathcal{F}}-consistent scoring function for the functional
must be of the form (3.3), where in particular ϕ is strictly convex, twice differentiable and ϕ′{\phi^{\prime}} is bounded. Assume that S has positively homogeneous score differences of some degree b∈ℝ{b\in\mathbb{R}}. That means that the function Ψ:(0,∞)×𝖠×𝖠×ℝ→ℝ{\Psi\colon(0,\infty)\times\mathsf{A}\times\mathsf{A}\times\mathbb{R}\to\mathbb{R}} defined by
For the sake of brevity, we only consider the case 𝖠=𝖠0-{\mathsf{A}=\mathsf{A}_{0}^{-}}, the other cases being similar.
Equation (4.1) implies that ϕ′′(-x3)=ϕ′′(-1)x3b-2{\phi^{\prime\prime}(-x_{3})=\phi^{\prime\prime}(-1)x_{3}^{b-2}} for any x3>0{x_{3}>0}.
Due to the strict convexity of ϕ, we need that ϕ′′(-1)>0{\phi^{\prime\prime}(-1)>0}. However, for b≥1{b\geq 1} we have infx3>0ϕ′(-x3)=-∞{\inf_{x_{3}>0}\phi^{\prime}(-x_{3})=-\infty}, and for b≤1{b\leq 1} we have supx3>0ϕ′(-x3)=∞{\sup_{x_{3}>0}\phi^{\prime}(-x_{3})=\infty}. Hence, ϕ′{\phi^{\prime}} cannot be bounded.
∎
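The unboundedness used in the last step of the proof can be made explicit by integrating the power-law form of ϕ′′. The following worked step is a sketch in the setting of the proof (the case 𝖠=𝖠0-); the integration variable t is introduced here for illustration only.

```latex
% Since d/dx_3 [ \phi'(-x_3) ] = -\phi''(-x_3) = -\phi''(-1) x_3^{b-2},
% integrating from 1 to x_3 > 0 gives
\phi'(-x_3)
  = \phi'(-1) - \phi''(-1)\int_{1}^{x_3} t^{\,b-2}\,\mathrm{d}t
  = \begin{cases}
      \phi'(-1) - \phi''(-1)\,\dfrac{x_3^{\,b-1}-1}{b-1}, & b \neq 1,\\[2ex]
      \phi'(-1) - \phi''(-1)\,\log x_3, & b = 1.
    \end{cases}
% With \phi''(-1) > 0:
%   b > 1:  x_3 \to \infty  gives  \phi'(-x_3) \to -\infty,
%   b < 1:  x_3 \to 0^{+}   gives  \phi'(-x_3) \to +\infty,
%   b = 1:  both limits occur.
```

In each case ϕ′ leaves every bounded set, which is what contradicts the boundedness required by Theorem 3.7.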
Remark 4.3.
The negative result of Proposition 4.2 should be compared with the results of
Nolde and Ziegel [43, Theorem C.3] characterizing homogeneous strictly consistent scoring functions for the pair (VaRβ,ESβ){(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})}. Since they use a different sign convention for VaR{\operatorname{VaR}} and ES{\operatorname{ES}} than we do in this paper, their choice of the action domain ℝ×(0,∞){\mathbb{R}\times(0,\infty)} corresponds to our choice 𝖠0-{\mathsf{A}_{0}^{-}}. When interpreting RVaRα,β{\operatorname{RVaR}_{\alpha,\beta}} as a risk measure, negative values of RVaR{\operatorname{RVaR}} are the more interesting and relevant ones, using our sign convention.
Inspecting the proofs of Proposition 4.2 and of Proposition 3.5 (i), one makes the following observation: for b≥1{b\geq 1}, Nolde and Ziegel [43] state an impossibility result for their choice of action domain. In fact, the problem occurring in our context is that ϕ′{\phi^{\prime}} is not bounded from below. In Proposition 3.5, boundedness from below is implied by the fact that the function G2,x3{G_{2,x_{3}}} at (3.5) is increasing. And it is exactly such a condition that is also present for strictly consistent scoring functions for the pair (VaRβ,ESβ){(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})}; see [21, Theorem 5.2]. On the other hand, the complication for b<1{b<1} stems from the fact that ϕ′{\phi^{\prime}} is not bounded from above. This condition is related to the monotonicity of G1,x3{G_{1,x_{3}}} at (3.4). Such a condition is not present for strictly consistent scoring functions for the pair (VaRβ,ESβ){(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})}. Correspondingly, there can be homogeneous and strictly consistent scoring functions for b<1{b<1} for this pair [43], while this is not possible for the triplet (VaRα,VaRβ,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})}.
Mixture representation of scoring functions
When forecasts are compared and ranked with respect to consistent scoring functions, one has to be aware that in the presence of non-nested information sets, model mis-specification and/or finite samples, the ranking may depend on the chosen consistent scoring function [46]. In the specific case of (VaRα,VaRβ,RVaRα,β){(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})}, the forecast ranking may depend on the specific choice for the functions g1{g_{1}}, g2{g_{2}}, and ϕ appearing in Theorem 3.3. A possible remedy to this problem is to compare forecasts
simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams as introduced by Ehm, Gneiting, Jordan and Krüger [12]. Murphy diagrams are based on the fact that the class of all consistent scoring functions can be characterized as a class of mixtures of elementary scoring functions that depend on a
low-dimensional parameter. The following theorem provides such a mixture representation for the scoring functions
at (3.3).
The applicability is illustrated in Section 6.
Recall that Sα(x,y)=(𝟙{y≤x}-α)x-𝟙{y≤x}y{S_{\alpha}(x,y)=(\mathbb{1}\{y\leq x\}-\alpha)x-\mathbb{1}\{y\leq x\}y}.
Theorem 5.1.
Let 0<α<β<1{0<\alpha<\beta<1}.
Any scoring function