Abstract
In this paper we give a partial answer to one of the central questions of statistics, namely, what optimal statistical decisions are and how they are related to (statistical) information theory. We illustrate why it is necessary to understand the structure of information divergences and their approximations, which may in particular be understood through deconvolution. Deconvolution of information divergences is illustrated in the exponential family of distributions, leading to tests that are optimal in the Bahadur sense. We provide a new approximation of I-divergences using the Fourier transform, the saddle-point approximation, and uniform convergence of Euler polygons. Uniform approximation of the deconvoluted parts of I-divergences is also discussed. Our approach is illustrated on a real-data example.
We gratefully acknowledge support from Fondecyt Proyecto Regular No. 1151441 and LIT-2016-1-SEE-023. This work was also supported by Grants P403/15/09663S and GA16-07089S of the Czech Science Foundation and by grant VEGA No. 2/0047/15. Support from the BELSPO IAP P7/06 StUDyS network is also gratefully acknowledged. The authors are very grateful to the Editor, the Associate Editor, and the anonymous referees for their valuable comments and careful reading.
Communicated by Gejza Wimmer
Appendix
A Maple implementation of the approximations (4.4) and (4.5) of the function LW(0, -e^{-t})
Euler_W0 :=
proc(x)
  local knot, f_knot;
  # a, b and n are global: the grid endpoints and the number of steps
  knot := a;
  f_knot := evalf(LambertW(0, -exp(-a)));  # exact value at the left endpoint
  # Euler steps for y' = -y/(1+y); the update equals f_knot - h*f_knot/(1+f_knot)
  while (b-a)/n < x - knot do
    knot := knot + (b-a)/n;
    f_knot := f_knot + (b-a)/n/(1 + f_knot) - (b-a)/n
  od;
  # linear piece on the last subinterval, slope -f_knot/(1+f_knot)
  RETURN( f_knot - f_knot/(1 + f_knot)*(x - knot) )
end proc
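For readers without Maple access, the same scheme can be sketched in Python. This is a transcription, not part of the original paper; `scipy.special.lambertw` supplies the exact value at the left endpoint, and the parameters `a`, `b`, `n` play the same role as the global grid parameters above (the default grid is our choice).

```python
import numpy as np
from scipy.special import lambertw

def euler_W0(x, a=1.5, b=5.0, n=10000):
    """Euler-polygon approximation of LW(0, -exp(-x)) on the grid [a, b].

    Mirrors the Maple procedure: y(t) = LW(0, -e^{-t}) satisfies
    y'(t) = -y/(1+y), and one Euler step of size h = (b-a)/n reads
    y <- y + h/(1+y) - h, which equals y - h*y/(1+y).
    """
    h = (b - a) / n
    knot = a
    f_knot = lambertw(-np.exp(-a), 0).real   # exact value at the left endpoint
    while h < x - knot:                      # walk the grid up to x
        knot += h
        f_knot += h / (1 + f_knot) - h
    # linear piece on the last subinterval, slope -f_knot/(1+f_knot)
    return f_knot - f_knot / (1 + f_knot) * (x - knot)
```

Note that `a` must be strictly greater than 1, since LW(0, -e^{-1}) = -1 makes the denominator 1 + y vanish.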
Proof of Lemma 3.1
It holds that
  d/dx (x - ln x) = 1 - 1/x,
which is negative on (0, 1) and positive on (1, ∞), so that x - ln x attains its global minimum 1 at x = 1, and
  lim_{x→0+} (x - ln x) = lim_{x→∞} (x - ln x) = +∞.
It is evident that for all t ≥ 1 there exist solutions x1(t) ∈ (0, 1] and x2(t) ∈ [1, ∞) of the equation x - ln x = t. For the sake of simplicity, we write x1 and x2 instead of x1(t) and x2(t) below.
Moreover, the equation x - ln x = t is equivalent to the equation (-x)e^{-x} = -e^{-t}, so that, according to the definition of LW, we have -x = LW(-e^{-t}), cf. (3.8). It now suffices to determine which branch of the multifunction LW contains each solution. Since x1 ∈ (0, 1] and x2 ∈ [1, +∞), i.e., -x1 ∈ [-1, 0) and -x2 ∈ (-∞, -1], it holds that
  x1(t) = -LW(0, -e^{-t})  and  x2(t) = -LW(-1, -e^{-t}),
and the proof is complete. □
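The two branches in Lemma 3.1 can be checked numerically. The sketch below is our illustration, assuming SciPy's branch convention `lambertw(z, k)`; it recovers both solutions of x - ln x = t for a given t ≥ 1.

```python
import numpy as np
from scipy.special import lambertw

def solutions(t):
    """Both roots of x - ln x = t for t >= 1, via the Lambert W branches."""
    x1 = -lambertw(-np.exp(-t), 0).real    # principal branch: x1 in (0, 1]
    x2 = -lambertw(-np.exp(-t), -1).real   # branch -1:        x2 in [1, oo)
    return x1, x2
```

For instance, for t = 2.5 both returned values satisfy the equation up to numerical precision.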
Proof of Lemma 3.2
The real function F(a, b) = b e^{b} - a is continuously differentiable in both variables. From the properties of the Lambert W-function we know that, for any a0 ∈ (-e^{-1}, 0), the values b01 = LW(0, a0) and b02 = LW(-1, a0) are real, and (a0, b01) and (a0, b02) are solutions of the equation F(a, b) = 0. Moreover,
  ∂F/∂b (a0, b0i) = e^{b0i}(1 + b0i) ≠ 0,  i = 1, 2,
since b01 > -1 and b02 < -1.
Applying the implicit function theorem at the points (a0, b01) and (a0, b02), a0 ∈ (-e^{-1}, 0), we find that there exist continuously differentiable functions bi: (-e^{-1}, 0) → ℝ, i = 1, 2, satisfying the equality F(a, bi(a)) = 0, i = 1, 2, for a ∈ (-e^{-1}, 0). By uniqueness of the branches, b1(a) = LW(0, a) and b2(a) = LW(-1, a); since -e^{-x} ∈ (-e^{-1}, 0) exactly for x > 1, the functions LW(0, -e^{-x}) and LW(-1, -e^{-x}) are continuously differentiable on the interval (1, +∞).
Differentiation of the equality (3.8), i.e., of LW(z) e^{LW(z)} = z, leads to
  LW′(z) e^{LW(z)} (1 + LW(z)) = 1,
from which
  LW′(z) = e^{-LW(z)} / (1 + LW(z)).
Multiplying both sides by LW(z) e^{LW(z)} = z, we get
  z LW′(z) = LW(z) / (1 + LW(z)),
so that
  LW′(z) = LW(z) / ( z (1 + LW(z)) ),  z ≠ 0, LW(z) ≠ -1.
Substituting z = -e^{-x} and using dz/dx = e^{-x} = -z, we obtain
  d/dx LW(k, -e^{-x}) = - LW(k, -e^{-x}) / (1 + LW(k, -e^{-x})),  k = 0, -1,
with the left-hand side being continuously differentiable wherever LW(k, -e^{-x}) ≠ -1, i.e., for x > 1. □
Note 6. By Lemmas 3.1 and 3.2 it is easy to derive the distribution function FY(t) and the density fY(t) of the random variable Y = X - ln X, where X ∼ Exp(1) and P(X < x) = 1 - e^{-x}. Indeed, for t ≥ 1,
  FY(t) = P(X - ln X ≤ t) = P(x1 ≤ X ≤ x2) = e^{-x1} - e^{-x2},
where x1 and x2 are the real numbers guaranteed by Lemma 3.1. We can therefore write
  FY(t) = e^{LW(0, -e^{-t})} - e^{LW(-1, -e^{-t})},
from which, differentiating with the help of Lemma 3.2, for all t ∈ [1, +∞)
  fY(t) = e^{LW(-1, -e^{-t})} LW(-1, -e^{-t}) / (1 + LW(-1, -e^{-t})) - e^{LW(0, -e^{-t})} LW(0, -e^{-t}) / (1 + LW(0, -e^{-t})),
being the expression (3.7).
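The closed form in Note 6 can be validated against simulation. The sketch below is our illustration (sample size, seed, and tolerance are arbitrary choices); it compares FY with the empirical distribution of Y = X - ln X for exponential draws.

```python
import numpy as np
from scipy.special import lambertw

def F_Y(t):
    """CDF of Y = X - ln X, X ~ Exp(1), for t >= 1 (Note 6)."""
    w0 = lambertw(-np.exp(-t), 0).real     # equals -x1(t)
    wm1 = lambertw(-np.exp(-t), -1).real   # equals -x2(t)
    return np.exp(w0) - np.exp(wm1)        # e^{-x1} - e^{-x2}

# Monte Carlo check with a fixed seed
rng = np.random.default_rng(0)
x = rng.exponential(size=200_000)
y = x - np.log(x)
empirical = np.mean(y <= 2.0)
```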
Proof of Lemma 4.1
a) We use mathematical induction to show that y(ti) < 0 for all i = 0, 1,…,n, and y(.) is increasing on the intervals [ti, ti+1]. As y(.) is piecewise linear, the assertion will follow.
1° Let i = 0; then y(t0) = LW(0, -e^{-a}) < 0 and for all t ∈ (t0, t1)
  y′(t) = - y(t0) / (1 + y(t0)) > 0,
i.e., y(.) is increasing on the interval [t0, t1].
2° Let i > 0 and assume that y(tj) < 0 for all j ≤ i and that y(.) is increasing on [tj-1, tj] for all j ≤ i. According to the induction assumption, y(ti) ≥ y(t0) > -1. Moreover, on (ti, ti+1) we again have y′(t) = - y(ti)/(1 + y(ti)) > 0, so y(.) is increasing there, and y(ti+1) = y(ti)(1 - (b-a)/(n(1 + y(ti)))) < 0, provided the step (b-a)/n does not exceed 1 + y(t0) ≤ 1 + y(ti).
b) This proof also uses mathematical induction. We will show that for all t ∈ (ti, ti+1], i = 0, 1, …, n-1, it holds that y(t) > u(t).
1° Let i = 0. Then, from the definition of y(t), we have y(t0) = u(t0) and
2° Let i > 0 and assume that y(t) ≤ u(t) for a certain t ∈ (ti, ti+1]. Then, from the induction assumption, we have y(ti) > u(ti). From the continuity of y(.) and u(.) and from the nonlinearity of u(.) there exists T ∈ (ti, ti+1], such that (T, u(T)) = (T, y(T)), i.e., the graphs of the functions u(.) and y(.) intersect and
Note that y(.) is increasing, u(T) = y(T), and the induction assumption implies u(ti) < y(ti). These facts yield that there exists T0 ∈ (ti, T) such that u(T0) = y(ti). Further,
and together with (7.3) we obtain
which contradicts the fact that the first derivative of u(.) is decreasing. □
Proof of Lemma 4.2
Since u(.) is increasing, it is sufficient to take x > y and prove the inequality without the absolute value. Lagrange’s mean value theorem assures that there exists t0 ∈ [y, x] such that
Since
so that, using Lemma 3.2, we get
Finally, from (7.4), (7.5) and (7.6), we get the assertion of Lemma 4.2. □
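Lemma 4.2 gives a Lipschitz-type bound |u(x) - u(y)| ≤ c|x - y| for u(t) = LW(0, -e^{-t}). Since the constant is not reproduced above, the sketch below (our assumption: the candidate constant is the left-endpoint derivative u′(a) = -u(a)/(1 + u(a)) from Lemma 3.2) simply checks numerically that the finite-difference slopes of u on a sample grid never exceed that value.

```python
import numpy as np
from scipy.special import lambertw

# illustrative grid endpoints, chosen by us
a, b = 1.5, 5.0
t = np.linspace(a, b, 100_001)
u = lambertw(-np.exp(-t), 0).real          # u(t) = LW(0, -e^{-t})
slopes = np.diff(u) / np.diff(t)           # finite-difference slopes of u
c = -u[0] / (1 + u[0])                     # u'(a) = -u(a)/(1+u(a)) by Lemma 3.2
```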
Proof of Theorem 4.1
Let us denote by di the difference between y(.) and u(.) at the grid points (knots), i.e., di = y(ti) - u(ti). Using Lagrange’s mean value theorem, we have, for i ≥ 0,
For i = 0 we use the facts that y(t0) = u(t0), i.e., d0 = 0, that u(.) is increasing, and that u(t0) > -1; hence
Now, again using Lagrange’s mean value theorem, we get
Let 0 < i < n and u(T) - y(ti) > 0. Then, using the fact that both functions u(.) and y(.) are increasing, y(t) ≥ u(t) > -1 for all t ∈ [a, b] (this follows from Lemma 4.1) and formula (7.7), we obtain that
and by the inequality
which follows directly from Lemma 4.2, we have
where
An iterative use of (7.9) leads to
From (7.8) it follows that
The function y(.) is linear on each interval [ti, ti+1], i = 0, 1, …, n-1, and u(.) is concave. Therefore, the function y(t) - u(t) is convex on each [ti, ti+1], and since it is continuous, it can easily be shown that its supremum is attained at one of the grid points ti or ti+1. Lemma 4.1 ensures that y(t) ≥ u(t) for all t ∈ [a, b]. It follows that
Finally, using (7.11) we get
which proves the uniform convergence. □
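The rate asserted in Theorem 4.1 can also be observed numerically. The sketch below (our illustration; the grid endpoints are our choice) computes the Euler polygon at the knots for y′ = -y/(1+y), verifies y ≥ u from Lemma 4.1, and checks that the maximal error shrinks when n is doubled.

```python
import numpy as np
from scipy.special import lambertw

def euler_knots(a, b, n):
    """Values y(t_i) of the Euler polygon for y' = -y/(1+y), y(a) = LW(0, -e^{-a})."""
    h = (b - a) / n
    y = np.empty(n + 1)
    y[0] = lambertw(-np.exp(-a), 0).real
    for i in range(n):
        y[i + 1] = y[i] - h * y[i] / (1 + y[i])   # one Euler step
    return y

def max_error(a, b, n):
    """sup_i |y(t_i) - u(t_i)| over the grid, with u(t) = LW(0, -e^{-t})."""
    t = np.linspace(a, b, n + 1)
    u = lambertw(-np.exp(-t), 0).real
    return np.max(np.abs(euler_knots(a, b, n) - u))
```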
Proof of Lemma 4.3
a) The proof follows the approach used in the proof of Lemma 4.1 a). In step 1° it is sufficient to use the fact that
and in step 2° the fact that
b) We proceed analogously to the proof of Lemma 4.1 b). In step 1°, the only difference is that we use the strict convexity of u(.). In step 2° we can easily show that
Proof of Lemma 4.4
Since u(.) is decreasing, it is enough to take x > y and to prove that u(y) - u(x) < c(x - y). Similarly as in the proof of Lemma 4.2 we get
Since
Moreover, according to Lemma 3.2 we have
Substituting (7.13) and (7.14) into (7.12), we finally get
which completes the proof. □
Proof of Theorem 4.2
In this proof we again denote by di the difference between u(.) and y(.) at the grid points (knots); however, now di = u(ti) - y(ti). Then, analogously to the proof of Theorem 4.1, we have
If i = 0, then y(t0) = u(t0) and d0 = 0. Further, u(.) is decreasing and u(t) < -1 for all t ∈ [a, b]. Similarly to the proof of Theorem 4.1 we have
Let 0 < i < n and y(ti) - u(T) > 0. Then, using the fact that both functions u(.) and y(.) are decreasing, y(t) < u(t) < -1 for all t ∈ [a, b] (see Lemma 4.3), and (7.15), similarly to the proof of Theorem 4.1 we get that the following equality holds, i.e.,
Finally, we use the inequality u(ti) - u(ti+1) < c(ti+1 - ti) = c ⋅ h, which follows directly from Lemma 4.4, to get
where
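An analogous check for Theorem 4.2 runs on the lower branch u(t) = LW(-1, -e^{-t}) (again with an illustrative grid of our choosing): the Euler polygon started at the exact value stays below u, in line with Lemma 4.3, and the maximal deviation decreases as the grid is refined.

```python
import numpy as np
from scipy.special import lambertw

def euler_knots_lower(a, b, n):
    """Euler polygon for y' = -y/(1+y) started at y(a) = LW(-1, -e^{-a})."""
    h = (b - a) / n
    y = np.empty(n + 1)
    y[0] = lambertw(-np.exp(-a), -1).real
    for i in range(n):
        y[i + 1] = y[i] - h * y[i] / (1 + y[i])   # one Euler step
    return y

def max_error_lower(a, b, n):
    """sup_i |y(t_i) - u(t_i)| over the grid, with u(t) = LW(-1, -e^{-t})."""
    t = np.linspace(a, b, n + 1)
    u = lambertw(-np.exp(-t), -1).real
    return np.max(np.abs(euler_knots_lower(a, b, n) - u))
```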
© 2018 Mathematical Institute Slovak Academy of Sciences