
Open Mathematics

formerly Central European Journal of Mathematics

Editor-in-Chief: Gianazza, Ugo / Vespri, Vincenzo


IMPACT FACTOR 2017: 0.831
5-year IMPACT FACTOR: 0.836

CiteScore 2018: 0.90

SCImago Journal Rank (SJR) 2018: 0.323
Source Normalized Impact per Paper (SNIP) 2018: 0.821

Mathematical Citation Quotient (MCQ) 2017: 0.32

ICV 2017: 161.82

Open Access
Online
ISSN
2391-5455
Volume 16, Issue 1


Majorization, “useful” Csiszár divergence and “useful” Zipf-Mandelbrot law

Naveed Latif
  • Corresponding author
  • Department of General Studies, Jubail Industrial College, Jubail, Industrial City 31961, Kingdom of Saudi Arabia
/ Đilda Pečarić / Josip Pečarić
  • Faculty of Textile Technology Zagreb, University of Zagreb, Prilaz Baruna Filipovića 28A, 10000, Zagreb, Croatia
  • RUDN University, 6 Miklukho-Maklay St, Moscow, 117198, Russia
Published Online: 2018-12-26 | DOI: https://doi.org/10.1515/math-2018-0113

Abstract

In this paper, we consider the definitions of the “useful” Csiszár divergence and the “useful” Zipf-Mandelbrot law associated with a real utility distribution, and give results for majorization inequalities by using monotonic sequences. We obtain equivalent statements between continuous convex functions and Green functions via majorization inequalities, the “useful” Csiszár functional and the “useful” Zipf-Mandelbrot law. By considering the “useful” Csiszár divergence in the integral case, we give results for the integral majorization inequality. Towards the end, some applications are given.

Keywords: “Useful” Csiszár divergence; “Useful” Zipf-Mandelbrot law; Majorization inequality; Convex functions; Green functions; Information theory

MSC 2010: 94A15; 94A17; 26A51; 26D15

1 Introduction and Preliminaries

Zipf’s law [1, 2, 3] and power laws in general [4, 5, 6] have attracted, and continue to attract, considerable attention in a wide variety of disciplines, from astronomy to demographics to software structure to economics to zoology, and even to warfare [7]. Typically one is dealing with integer-valued observables (number of objects, people, cities, words, animals, corpses), with n ∈ {1, 2, 3, …}. As noted in [8], sometimes the range of values is allowed to be infinite (at least in principle), and sometimes a hard upper bound N is fixed (e.g., the total population if one is interested in subdividing a fixed population into sub-classes). Particularly interesting probability distributions are the probability laws of the form:

  • Zipf’s law: $p_n \propto 1/n$;

  • power laws: $p_n \propto 1/n^z$;

  • hybrid geometric/power laws: $p_n \propto w^n/n^z$.

Distance or divergence measures are of key importance in different fields like theoretical and applied statistical inference and data processing problems such as estimation, detection, classification, compression, recognition, indexation, diagnosis and model selection etc. Traditionally, the information conveyed by observing X is measured by the entropy which is defined as (see [9, p.111])

$$H(p) := \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i},$$

and is associated with the distribution $p$, $p_i > 0$ $(1 \le i \le n)$, where $\sum_{i=1}^{n} p_i = 1$. A generalization of this is to attach a utility $q_i > 0$ to the outcome $x_i$ $(1 \le i \le n)$ and speak of the “useful” information measure

$$H(p;q) := \sum_{i=1}^{n} q_i p_i \log_2 \frac{1}{p_i},$$

which is associated with the utility distribution $q = (q_1, \ldots, q_n)$.

Bhaker and Hooda [10] (see also [9, p.112]) introduced the measures

$$E(p;q) := \frac{\sum_{k=1}^{n} q_k p_k \log_2 (1/p_k)}{\sum_{k=1}^{n} q_k p_k} \tag{1}$$

and

$$E_\alpha(p;q) := \frac{1}{1-\alpha} \log_2 \frac{\sum_{k=1}^{n} q_k p_k^{\alpha}}{\sum_{k=1}^{n} q_k p_k}, \quad 0 < \alpha \ne 1, \tag{2}$$

which have a number of useful properties. It is readily verified that these alterations leave intact the property that (2) reduces to (1) when $\alpha \to 1$. Also, if $q \equiv 1$, so that there are effectively no utilities, (1) and (2) reduce to Rényi’s entropies of order 1 and $\alpha$, respectively.
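As a small numerical illustration of the measures (1) and (2), the following Python sketch evaluates them for a short distribution. The function names and the sample values of p and q are illustrative assumptions and do not come from the paper.

```python
import math

def useful_entropy(p, q):
    """H(p;q) = sum_i q_i * p_i * log2(1/p_i)."""
    return sum(qi * pi * math.log2(1.0 / pi) for pi, qi in zip(p, q))

def bhaker_hooda(p, q):
    """E(p;q) of equation (1): H(p;q) divided by sum_k q_k p_k."""
    return useful_entropy(p, q) / sum(qi * pi for pi, qi in zip(p, q))

def bhaker_hooda_alpha(p, q, alpha):
    """E_alpha(p;q) of equation (2), for alpha > 0 and alpha != 1."""
    num = sum(qi * pi ** alpha for pi, qi in zip(p, q))
    den = sum(qi * pi for pi, qi in zip(p, q))
    return math.log2(num / den) / (1.0 - alpha)

# Hypothetical sample data: a probability vector and positive utilities.
p = [0.5, 0.3, 0.2]
q = [1.0, 2.0, 0.5]

e_one = bhaker_hooda(p, q)
e_near_one = bhaker_hooda_alpha(p, q, 1.0 + 1e-6)  # close to e_one
```

Evaluating `bhaker_hooda_alpha` for α close to 1 gives a value close to `bhaker_hooda`, in line with the limiting property stated above, and constant utilities recover the Shannon entropy in bits.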

Csiszár introduced the functional in [11] and later discussed it in [12]. Here, we consider “useful” Csiszár divergence (see [13, p.3], [9, 14, 15]):

Definition 1.1

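(“Useful” Csiszár divergence). Let $J \subset \mathbb{R}$ be an interval, and let $f : J \to \mathbb{R}$ be a function with distribution $p := (p_1, \ldots, p_n)$, associated with the utility distribution $u := (u_1, \ldots, u_n)$, where $p_i, u_i \in \mathbb{R}$ for $1 \le i \le n$, and let $q := (q_1, \ldots, q_n) \in \,]0, \infty[^n$ be such that

$$\frac{p_i}{q_i} \in J, \quad i = 1, \ldots, n. \tag{3}$$

Then we define the “useful” Csiszár divergence by

$$I_f(p, q, u) := \sum_{i=1}^{n} u_i q_i f\left(\frac{p_i}{q_i}\right). \tag{4}$$

Remark 1.2

One can easily see that if we substitute $u = \mathbf{1}$, then (4) becomes

$$I_f(p, q, \mathbf{1}) := I_f(p, q) = \sum_{i=1}^{n} q_i f\left(\frac{p_i}{q_i}\right).$$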

One can see the various results in information theory in [3, 16, 17].
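The “useful” Csiszár divergence (4) is straightforward to evaluate numerically. The sketch below is a minimal illustration: the helper names and the sample vectors are assumptions made for this example. It uses the classical generator f(x) = x log x, for which the unweighted functional is the Kullback-Leibler divergence.

```python
import math

def useful_csiszar(f, p, q, u=None):
    """I_f(p, q, u) = sum_i u_i * q_i * f(p_i / q_i), as in (4).

    With u omitted, all utilities are taken to be 1 (Remark 1.2).
    """
    if u is None:
        u = [1.0] * len(p)
    return sum(ui * qi * f(pi / qi) for pi, qi, ui in zip(p, q, u))

def x_log_x(x):
    """Convex generator whose unweighted divergence is Kullback-Leibler."""
    return x * math.log(x)

# Hypothetical sample data.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
u = [1.0, 2.0, 0.5]

kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
kl_via_csiszar = useful_csiszar(x_log_x, p, q)      # equals kl_direct
weighted = useful_csiszar(x_log_x, p, q, u)         # utility-weighted version
```

Passing `f` as a plain callable keeps the same routine usable for any of the generators that appear later in the applications.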

The following theorem is a generalization of the Classical Majorization Theorem known as Weighted Majorization Theorem and was proved by Fuchs in [19] (see also [20], [21, p.323]):

Theorem 1.3

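(Weighted Majorization Theorem). Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ be two decreasing real $n$-tuples such that $x_i, y_i \in J$ for $i = 1, \ldots, n$. Let $w = (w_1, \ldots, w_n)$ be a real $n$-tuple such that

$$\sum_{i=1}^{j} w_i y_i \le \sum_{i=1}^{j} w_i x_i \tag{5}$$

for $j = 1, 2, \ldots, n-1$, and

$$\sum_{i=1}^{n} w_i y_i = \sum_{i=1}^{n} w_i x_i. \tag{6}$$

Then for every continuous convex function $f : J \to \mathbb{R}$, we have the inequality

$$\sum_{i=1}^{n} w_i f(y_i) \le \sum_{i=1}^{n} w_i f(x_i). \tag{7}$$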

The following theorem is valid ([22, p.32]):

Theorem 1.4

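Let $f : J \to \mathbb{R}$ be a continuous convex function on an interval $J$, let $w$ be a positive $n$-tuple, and let $x, y \in J^n$ satisfy

$$\sum_{i=1}^{k} w_i y_i \le \sum_{i=1}^{k} w_i x_i \quad \text{for } k = 1, \ldots, n-1, \tag{8}$$

and

$$\sum_{i=1}^{n} w_i y_i = \sum_{i=1}^{n} w_i x_i. \tag{9}$$

  1. If $y$ is a decreasing $n$-tuple, then

    $$\sum_{i=1}^{n} w_i f(y_i) \le \sum_{i=1}^{n} w_i f(x_i). \tag{10}$$

  2. If $x$ is an increasing $n$-tuple, then

    $$\sum_{i=1}^{n} w_i f(x_i) \le \sum_{i=1}^{n} w_i f(y_i). \tag{11}$$

If $f$ is strictly convex and $x \ne y$, then (10) and (11) are strict.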
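To make the hypotheses of Theorem 1.4 concrete, the following Python sketch checks conditions (8) and (9) for given tuples and then compares the two sides of (10). The helper names and the numerical tuples are assumptions chosen only for illustration.

```python
def majorization_hypotheses(w, x, y, tol=1e-12):
    """Return True when (8) and (9) hold for the weights w and tuples x, y."""
    sum_x = sum_y = 0.0
    n = len(w)
    for k in range(n):
        sum_x += w[k] * x[k]
        sum_y += w[k] * y[k]
        # (8): weighted partial sums of y stay below those of x for k < n.
        if k < n - 1 and sum_y > sum_x + tol:
            return False
    # (9): the full weighted sums coincide.
    return abs(sum_x - sum_y) <= tol

def weighted_sum(f, w, z):
    """sum_i w_i * f(z_i)."""
    return sum(wi * f(zi) for wi, zi in zip(w, z))

# Hypothetical data: y is decreasing, so part 1 of the theorem applies.
w = [1.0, 2.0, 1.0]
x = [5.0, 3.0, 1.0]
y = [4.0, 3.0, 2.0]

def square(t):
    return t * t   # a continuous convex function

lhs = weighted_sum(square, w, y)   # 38.0
rhs = weighted_sum(square, w, x)   # 44.0
```

For this data the partial sums are 4 ≤ 5 and 10 ≤ 11, the totals are both 12, and the inequality (10) reads 38 ≤ 44.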

One can see the various generalizations of the majorization inequality and bounds for Zipf-Mandelbrot entropy in [23, 24, 25].

Benoit Mandelbrot [26] gave a generalization of Zipf’s law, now known as the Zipf-Mandelbrot law, which better accounts for the low-rank words in a corpus, where $k < 100$ [27]:

$$f(k) = \frac{C}{(k+q)^s},$$

and when $q = 0$, we get Zipf’s law.

For $n \in \mathbb{N}$, $q \ge 0$, $s > 0$, $k \in \{1, 2, \ldots, n\}$, in a more explicit form, the Zipf-Mandelbrot law (probability mass function) is defined by

$$f(k, n, q, s) := \frac{1/(k+q)^s}{H_{n,q,s}}, \quad \text{where} \quad H_{n,q,s} := \sum_{i=1}^{n} \frac{1}{(i+q)^s}.$$

Application of the Zipf-Mandelbrot law can also be found in linguistics [27], information sciences [28, 29] and ecological field studies [30].

In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just the distribution function of X, evaluated at x, is the probability that X takes a value less than or equal to x. For the Zipf-Mandelbrot law, the CDF is given by the ratio

$$\mathrm{CDF} := \frac{H_{k,t,s}}{H_{n,t,s}}. \tag{12}$$

The cumulative distribution function is an important application of majorization.
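The probability mass function and the ratio (12) can be computed directly from the generalized harmonic sum. The Python sketch below is only an illustration; the function names are assumptions introduced here.

```python
def generalized_harmonic(n, q, s):
    """H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))

def zipf_mandelbrot_pmf(k, n, q, s):
    """f(k, n, q, s) = (1 / (k + q)^s) / H_{n,q,s} for k in {1, ..., n}."""
    return 1.0 / ((k + q) ** s * generalized_harmonic(n, q, s))

def zipf_mandelbrot_cdf(k, n, t, s):
    """Ratio (12): H_{k,t,s} / H_{n,t,s}."""
    return generalized_harmonic(k, t, s) / generalized_harmonic(n, t, s)

# With q = 0 and s = 1 the law reduces to Zipf's law on {1, 2, 3}.
zipf_probs = [zipf_mandelbrot_pmf(k, 3, 0.0, 1.0) for k in (1, 2, 3)]
```

With n = 3, q = 0 and s = 1 the harmonic sum is 11/6, so the three probabilities are 6/11, 3/11 and 2/11.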

We consider the following definition of “useful” Zipf-Mandelbrot law (see [9, 11, 12, 14, 15]):

Definition 1.5

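(“Useful” Zipf-Mandelbrot law). Let $J \subset \mathbb{R}$ be an interval, and let $f : J \to \mathbb{R}$ be a function with $n \in \{1, 2, 3, \ldots\}$, $t_1 \ge 0$. Let also $q_i > 0$ be a distribution, associated with the utility distribution $u_i \in \mathbb{R}$ $(i = 1, \ldots, n)$, such that

$$\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}} \in J, \quad i = 1, \ldots, n. \tag{13}$$

Then we define the “useful” Zipf-Mandelbrot law by

$$I_f(i, n, t_1, s_1, q, u) := \sum_{i=1}^{n} u_i q_i f\left(\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}\right).$$

Remark 1.6

One can easily see that, for $u = \mathbf{1}$,

$$I_f(i, n, t_1, s_1, q, \mathbf{1}) = I_f(i, n, t_1, s_1, q) := \sum_{i=1}^{n} q_i f\left(\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}\right).$$

If we substitute $q_i = \dfrac{1}{(i+t_3)^{s_3} H_{n,t_3,s_3}}$, then

$$I_f(i, n, t_1, t_3, s_1, s_3) := \sum_{i=1}^{n} \frac{1}{(i+t_3)^{s_3} H_{n,t_3,s_3}} f\left(\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_1)^{s_1} H_{n,t_1,s_1}}\right).$$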
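The functional of Definition 1.5 and the substitution of Remark 1.6 can be sketched in a few lines of Python. The function names and parameter values below are assumptions for illustration; the second Zipf-Mandelbrot law plays the role of q, as in the remark.

```python
import math

def harmonic(n, t, s):
    """H_{n,t,s} = sum_{i=1}^{n} 1 / (i + t)^s."""
    return sum(1.0 / (i + t) ** s for i in range(1, n + 1))

def zipf_mandelbrot(n, t, s):
    """List of Zipf-Mandelbrot probabilities for i = 1, ..., n."""
    h = harmonic(n, t, s)
    return [1.0 / ((i + t) ** s * h) for i in range(1, n + 1)]

def useful_zm(f, n, t1, s1, q, u=None):
    """I_f(i, n, t1, s1, q, u) from Definition 1.5 (u defaults to all ones)."""
    if u is None:
        u = [1.0] * n
    h1 = harmonic(n, t1, s1)
    return sum(
        u[i - 1] * q[i - 1] * f(1.0 / (q[i - 1] * (i + t1) ** s1 * h1))
        for i in range(1, n + 1)
    )

def x_log_x(x):
    return x * math.log(x)

n = 20
q = zipf_mandelbrot(n, 2.0, 1.2)            # Remark 1.6 with (t3, s3) = (2.0, 1.2)
kl_like = useful_zm(x_log_x, n, 1.0, 1.5, q)  # Kullback-Leibler type value
```

With f(x) = x log x and u ≡ 1 this is the Kullback-Leibler divergence between the two laws, so it is non-negative and vanishes when both laws share the same parameters.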

This paper is organised as follows. In Section 2, we give results connecting the “useful” Csiszár divergence, the “useful” Zipf-Mandelbrot law and the majorization inequality when one or both sequences are monotonic, and we obtain some corollaries of these results. In Section 3, we present equivalent statements between continuous convex functions and the Green functions defined there. In Section 4, we give results for the integral majorization inequality by considering the integral form of the “useful” Csiszár divergence. Finally, in Section 5 we give some applications of the obtained results.

2 Main results

Let $p$ and $q$ be $n$-tuples such that $q_i > 0$ $(i = 1, \ldots, n)$, and define

$$\frac{p}{q} := \left(\frac{p_1}{q_1}, \frac{p_2}{q_2}, \ldots, \frac{p_n}{q_n}\right).$$

We start with the following theorem, which connects the “useful” Csiszár divergence with weighted majorization when one sequence is monotonic:

Theorem 2.1

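Let $J \subset \mathbb{R}$ be an interval, $f : J \to \mathbb{R}$ be a continuous convex function, $p_i, r_i$ $(i = 1, \ldots, n)$ be real numbers and $q_i, u_i$ $(i = 1, \ldots, n)$ be positive real numbers such that

$$\sum_{i=1}^{k} u_i r_i \le \sum_{i=1}^{k} u_i p_i \quad \text{for } k = 1, \ldots, n-1, \tag{14}$$

and

$$\sum_{i=1}^{n} u_i r_i = \sum_{i=1}^{n} u_i p_i, \tag{15}$$

with $\dfrac{p_i}{q_i}, \dfrac{r_i}{q_i} \in J$ $(i = 1, \ldots, n)$.

  1. If $\dfrac{r}{q}$ is decreasing, then

    $$I_f(r, q, u) \le I_f(p, q, u). \tag{16}$$

  2. If $\dfrac{p}{q}$ is increasing, then

    $$I_f(r, q, u) \ge I_f(p, q, u). \tag{17}$$

If $f$ is a continuous concave function, then the reverse inequalities hold in (16) and (17).

Proof

(a): We use Theorem 1.4 (a) with the substitutions $x_i := \dfrac{p_i}{q_i}$, $y_i := \dfrac{r_i}{q_i}$ and $w_i := u_i q_i > 0$ $(i = 1, \ldots, n)$. Then (8) and (9) become (14) and (15), and (10) gives (16).

(b): Part (b) follows from the same substitutions in Theorem 1.4 (b). □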

The following theorem connects the “useful” Csiszár divergence with the weighted majorization theorem when both sequences are decreasing:

Theorem 2.2

Let $J \subset \mathbb{R}$ be an interval, $f : J \to \mathbb{R}$ be a continuous convex function, $p_i, r_i, u_i$ $(i = 1, \ldots, n)$ be real numbers and $q_i$ $(i = 1, \ldots, n)$ be positive real numbers such that $\dfrac{p}{q}$ and $\dfrac{r}{q}$ are decreasing and satisfy (14) and (15), with $\dfrac{p_i}{q_i}, \dfrac{r_i}{q_i} \in J$ $(i = 1, \ldots, n)$. Then

$$I_f(r, q, u) \le I_f(p, q, u). \tag{18}$$

Proof

We use Theorem 1.3 with the substitutions $x_i := \dfrac{p_i}{q_i}$, $y_i := \dfrac{r_i}{q_i}$ and $w_i := u_i q_i$, where $q_i > 0$ $(i = 1, \ldots, n)$; then we get (18). □

The following two theorems give the connection between the “useful” Zipf-Mandelbrot law and the weighted majorization inequality:

Theorem 2.3

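Let $J \subset \mathbb{R}$ be an interval, $f : J \to \mathbb{R}$ be a continuous convex function, $u_i > 0$, $n \in \{1, 2, 3, \ldots\}$, $t_1, t_2 \ge 0$ and $s_1, s_2 > 0$ be such that

$$\sum_{i=1}^{k} \frac{u_i}{(i+t_2)^{s_2}} \le \frac{H_{n,t_2,s_2}}{H_{n,t_1,s_1}} \sum_{i=1}^{k} \frac{u_i}{(i+t_1)^{s_1}}, \quad k = 1, \ldots, n-1, \tag{19}$$

and

$$\sum_{i=1}^{n} \frac{u_i}{(i+t_2)^{s_2}} = \frac{H_{n,t_2,s_2}}{H_{n,t_1,s_1}} \sum_{i=1}^{n} \frac{u_i}{(i+t_1)^{s_1}}, \tag{20}$$

and also let $q_i > 0$ $(i = 1, \ldots, n)$ with

$$\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}, \ \frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}} \in J \quad (i = 1, \ldots, n).$$

  1. If $\dfrac{(i+t_2)^{s_2}}{(i+1+t_2)^{s_2}} \le \dfrac{q_{i+1}}{q_i}$ $(i = 1, \ldots, n)$, then

    $$I_f(i, n, t_2, s_2, q, u) := \sum_{i=1}^{n} u_i q_i f\left(\frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}}\right) \le I_f(i, n, t_1, s_1, q, u) := \sum_{i=1}^{n} u_i q_i f\left(\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}\right). \tag{21}$$

  2. If $\dfrac{(i+t_1)^{s_1}}{(i+1+t_1)^{s_1}} \ge \dfrac{q_{i+1}}{q_i}$ $(i = 1, \ldots, n)$, then

    $$\sum_{i=1}^{n} u_i q_i f\left(\frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}}\right) \ge \sum_{i=1}^{n} u_i q_i f\left(\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}\right). \tag{22}$$

If $f$ is a continuous concave function, then the reverse inequalities hold in (21) and (22).

Proof

(a) Let $p_i := \dfrac{1}{(i+t_1)^{s_1} H_{n,t_1,s_1}}$ and $r_i := \dfrac{1}{(i+t_2)^{s_2} H_{n,t_2,s_2}}$. Then

$$\sum_{i=1}^{k} u_i p_i := \sum_{i=1}^{k} \frac{u_i}{(i+t_1)^{s_1} H_{n,t_1,s_1}} = \frac{1}{H_{n,t_1,s_1}} \sum_{i=1}^{k} \frac{u_i}{(i+t_1)^{s_1}}, \quad k = 1, \ldots, n-1,$$

and similarly

$$\sum_{i=1}^{k} u_i r_i := \frac{1}{H_{n,t_2,s_2}} \sum_{i=1}^{k} \frac{u_i}{(i+t_2)^{s_2}}, \quad k = 1, \ldots, n-1,$$

leading to

$$\sum_{i=1}^{k} u_i r_i \le \sum_{i=1}^{k} u_i p_i \iff \sum_{i=1}^{k} \frac{u_i}{(i+t_2)^{s_2}} \le \frac{H_{n,t_2,s_2}}{H_{n,t_1,s_1}} \sum_{i=1}^{k} \frac{u_i}{(i+t_1)^{s_1}}, \quad k = 1, \ldots, n-1.$$

One can easily see that $p_i = \dfrac{1}{(i+t_1)^{s_1} H_{n,t_1,s_1}}$ is decreasing over $i = 1, \ldots, n$, and similarly $r_i$ too. Now we determine the behaviour of $\dfrac{r}{q}$ for $q_i > 0$ $(i = 1, 2, \ldots, n)$. Take

$$\frac{r_i}{q_i} = \frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}} \quad \text{and} \quad \frac{r_{i+1}}{q_{i+1}} = \frac{1}{q_{i+1} (i+1+t_2)^{s_2} H_{n,t_2,s_2}},$$

$$\frac{r_{i+1}}{q_{i+1}} - \frac{r_i}{q_i} = \frac{1}{H_{n,t_2,s_2}} \left[\frac{1}{q_{i+1} (i+1+t_2)^{s_2}} - \frac{1}{q_i (i+t_2)^{s_2}}\right] \le 0 \iff \frac{(i+t_2)^{s_2}}{(i+1+t_2)^{s_2}} \le \frac{q_{i+1}}{q_i},$$

which shows that $\dfrac{r}{q}$ is decreasing. So all the assumptions of Theorem 2.1 (a) are satisfied, and (16) gives (21).

(b) If we switch the roles of $r_i$ and $p_i$ in part (a), then (17) in Theorem 2.1 (b) gives (22). □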
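A numerical sanity check of Theorem 2.3 (a) can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the parameter values, the choice u ≡ 1, the uniform q and the convex function f(x) = x². The monotonicity condition is tested only for i = 1, …, n − 1, since the tuple q has n entries.

```python
def harmonic(n, t, s):
    """H_{n,t,s} = sum_{i=1}^{n} 1 / (i + t)^s."""
    return sum(1.0 / (i + t) ** s for i in range(1, n + 1))

def check_theorem_2_3a(f, n, t1, s1, t2, s2, q, u, tol=1e-12):
    """Return both sides of (21), or None if a hypothesis of part (a) fails."""
    h1, h2 = harmonic(n, t1, s1), harmonic(n, t2, s2)
    left = right = 0.0
    for k in range(1, n + 1):
        left += u[k - 1] / (k + t2) ** s2
        right += (h2 / h1) * u[k - 1] / (k + t1) ** s1
        if k < n and left > right + tol:      # condition (19)
            return None
    if abs(left - right) > 1e-9:              # condition (20)
        return None
    for i in range(1, n):                     # r/q decreasing, i = 1..n-1
        if ((i + t2) / (i + 1 + t2)) ** s2 > q[i] / q[i - 1] + tol:
            return None
    lhs = sum(u[i - 1] * q[i - 1] * f(1.0 / (q[i - 1] * (i + t2) ** s2 * h2))
              for i in range(1, n + 1))
    rhs = sum(u[i - 1] * q[i - 1] * f(1.0 / (q[i - 1] * (i + t1) ** s1 * h1))
              for i in range(1, n + 1))
    return lhs, rhs

n = 10
result = check_theorem_2_3a(lambda x: x * x, n, 1.0, 2.0, 1.0, 1.0,
                            [1.0 / n] * n, [1.0] * n)
```

For these values the hypotheses hold and the returned pair satisfies lhs ≤ rhs, as (21) predicts; swapping the two exponents violates (19) and the function returns `None`.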

Theorem 2.4

Let $J \subset \mathbb{R}$ be an interval, $f : J \to \mathbb{R}$ be a continuous convex function, $u_i \in \mathbb{R}$, $n \in \{1, 2, 3, \ldots\}$, $t_1, t_2 \ge 0$ and $s_1, s_2 > 0$ be such that (19), (20) and

  • $\dfrac{(i+t_1)^{s_1}}{(i+1+t_1)^{s_1}} \le \dfrac{q_{i+1}}{q_i}$ $(i = 1, \ldots, n)$,

  • $\dfrac{(i+t_2)^{s_2}}{(i+1+t_2)^{s_2}} \le \dfrac{q_{i+1}}{q_i}$ $(i = 1, \ldots, n)$,

hold, and also let $q_i > 0$ $(i = 1, \ldots, n)$ with

$$\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}, \ \frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}} \in J \quad (i = 1, \ldots, n).$$

Then the following inequality holds:

$$I_f(i, n, t_2, s_2, q, u) := \sum_{i=1}^{n} u_i q_i f\left(\frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}}\right) \le I_f(i, n, t_1, s_1, q, u) := \sum_{i=1}^{n} u_i q_i f\left(\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}\right). \tag{23}$$

Proof

Let $p_i := \dfrac{1}{(i+t_1)^{s_1} H_{n,t_1,s_1}}$ and $r_i := \dfrac{1}{(i+t_2)^{s_2} H_{n,t_2,s_2}}$. As in the proof of Theorem 2.3, $y = r/q$ is decreasing because $\dfrac{(i+t_2)^{s_2}}{(i+1+t_2)^{s_2}} \le \dfrac{q_{i+1}}{q_i}$ for $i = 1, \ldots, n$; similarly, $x = p/q$ is decreasing because $\dfrac{(i+t_1)^{s_1}}{(i+1+t_1)^{s_1}} \le \dfrac{q_{i+1}}{q_i}$ for $i = 1, \ldots, n$. So all the assumptions of Theorem 2.2 are satisfied, and (18) gives (23). □

The following two corollaries are obtained from Theorem 2.3 and Theorem 2.4, respectively, but use three Zipf-Mandelbrot laws with different parameters:

Corollary 2.5

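Let $J \subset \mathbb{R}$ be an interval, $f : J \to \mathbb{R}$ be a continuous convex function, $u_i > 0$, $n \in \{1, 2, 3, \ldots\}$, $t_1, t_2 \ge 0$ and $s_1, s_2 > 0$ be such that (19) and (20) hold and

$$\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_1)^{s_1} H_{n,t_1,s_1}}, \ \frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_2)^{s_2} H_{n,t_2,s_2}} \in J \quad (i = 1, \ldots, n).$$

  1. If $\dfrac{(i+1+t_2)^{s_2}}{(i+1+t_3)^{s_3}} \le \dfrac{(i+t_2)^{s_2}}{(i+t_3)^{s_3}}$ $(i = 1, \ldots, n)$, then

    $$I_f(i, n, t_2, s_2, t_3, s_3, u) := \sum_{i=1}^{n} \frac{u_i}{(i+t_3)^{s_3} H_{n,t_3,s_3}} f\left(\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_2)^{s_2} H_{n,t_2,s_2}}\right) \le I_f(i, n, t_1, s_1, t_3, s_3, u) := \sum_{i=1}^{n} \frac{u_i}{(i+t_3)^{s_3} H_{n,t_3,s_3}} f\left(\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_1)^{s_1} H_{n,t_1,s_1}}\right). \tag{24}$$

  2. If $\dfrac{(i+1+t_2)^{s_2}}{(i+1+t_3)^{s_3}} \ge \dfrac{(i+t_2)^{s_2}}{(i+t_3)^{s_3}}$ $(i = 1, \ldots, n)$, then

    $$\sum_{i=1}^{n} \frac{u_i}{(i+t_3)^{s_3} H_{n,t_3,s_3}} f\left(\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_2)^{s_2} H_{n,t_2,s_2}}\right) \ge \sum_{i=1}^{n} \frac{u_i}{(i+t_3)^{s_3} H_{n,t_3,s_3}} f\left(\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_1)^{s_1} H_{n,t_1,s_1}}\right). \tag{25}$$

If $f$ is a continuous concave function, then the reverse inequalities hold in (24) and (25).

Proof

(a) Let $p_i := \dfrac{1}{(i+t_1)^{s_1} H_{n,t_1,s_1}}$, $q_i := \dfrac{1}{(i+t_2)^{s_2} H_{n,t_2,s_2}}$ and $r_i := \dfrac{1}{(i+t_3)^{s_3} H_{n,t_3,s_3}}$; here $p_i$, $q_i$ and $r_i$ are decreasing over $i = 1, \ldots, n$. Now we investigate the behaviour of $\dfrac{r}{q}$. Take

$$\frac{r_i}{q_i} = \frac{(i+t_2)^{s_2} H_{n,t_2,s_2}}{(i+t_3)^{s_3} H_{n,t_3,s_3}} \quad \text{and} \quad \frac{r_{i+1}}{q_{i+1}} = \frac{(i+1+t_2)^{s_2} H_{n,t_2,s_2}}{(i+1+t_3)^{s_3} H_{n,t_3,s_3}},$$

$$\frac{r_{i+1}}{q_{i+1}} - \frac{r_i}{q_i} = \frac{(i+1+t_2)^{s_2} H_{n,t_2,s_2}}{(i+1+t_3)^{s_3} H_{n,t_3,s_3}} - \frac{(i+t_2)^{s_2} H_{n,t_2,s_2}}{(i+t_3)^{s_3} H_{n,t_3,s_3}} = \frac{H_{n,t_2,s_2}}{H_{n,t_3,s_3}} \left[\frac{(i+1+t_2)^{s_2}}{(i+1+t_3)^{s_3}} - \frac{(i+t_2)^{s_2}}{(i+t_3)^{s_3}}\right].$$

The right-hand side is non-positive by assumption, which shows that $\dfrac{r}{q}$ is decreasing; therefore Theorem 2.3 (a) gives (24).

(b) If we switch the roles of $\dfrac{r}{q}$ and $\dfrac{p}{q}$ in part (a) and use Theorem 2.3 (b), we get (25). □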

Corollary 2.6

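Let $J \subset \mathbb{R}$ be an interval, $f : J \to \mathbb{R}$ be a continuous convex function, $u_i \in \mathbb{R}$, $n \in \{1, 2, 3, \ldots\}$, $t_1, t_2 \ge 0$ and $s_1, s_2 > 0$ be such that (19), (20) and

  • $\dfrac{(i+t_1)^{s_1}}{(i+1+t_1)^{s_1}} \le \dfrac{(i+t_3)^{s_3}}{(i+1+t_3)^{s_3}}$ $(i = 1, \ldots, n)$,

  • $\dfrac{(i+t_2)^{s_2}}{(i+1+t_2)^{s_2}} \le \dfrac{(i+t_3)^{s_3}}{(i+1+t_3)^{s_3}}$ $(i = 1, \ldots, n)$,

hold, with

$$\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_1)^{s_1} H_{n,t_1,s_1}}, \ \frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_2)^{s_2} H_{n,t_2,s_2}} \in J \quad (i = 1, \ldots, n).$$

Then the following inequality holds:

$$I_f(i, n, t_2, s_2, t_3, s_3, u) := \sum_{i=1}^{n} \frac{u_i}{(i+t_3)^{s_3} H_{n,t_3,s_3}} f\left(\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_2)^{s_2} H_{n,t_2,s_2}}\right) \le I_f(i, n, t_1, s_1, t_3, s_3, u) := \sum_{i=1}^{n} \frac{u_i}{(i+t_3)^{s_3} H_{n,t_3,s_3}} f\left(\frac{(i+t_3)^{s_3} H_{n,t_3,s_3}}{(i+t_1)^{s_1} H_{n,t_1,s_1}}\right). \tag{26}$$

Proof

Let $p_i := \dfrac{1}{(i+t_1)^{s_1} H_{n,t_1,s_1}}$ and $r_i := \dfrac{1}{(i+t_2)^{s_2} H_{n,t_2,s_2}}$. As in the proof of Corollary 2.5, for $q_i > 0$ $(i = 1, 2, \ldots, n)$, $y = r/q$ is decreasing because $\dfrac{(i+t_2)^{s_2}}{(i+1+t_2)^{s_2}} \le \dfrac{(i+t_3)^{s_3}}{(i+1+t_3)^{s_3}}$ for $i = 1, \ldots, n$; similarly, $x = p/q$ is decreasing because $\dfrac{(i+t_1)^{s_1}}{(i+1+t_1)^{s_1}} \le \dfrac{(i+t_3)^{s_3}}{(i+1+t_3)^{s_3}}$ for $i = 1, \ldots, n$. Therefore all the assumptions of Theorem 2.4 are satisfied, and (23) gives (26). □

Remark 2.7

Theorem 2.1, Theorem 2.2, Theorem 2.3, Theorem 2.4, Corollary 2.5 and Corollary 2.6 can be stated for $u := \mathbf{1}$ as a special case; some of these special cases have been given in [14].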

3 “Useful” information measure via Green functions

Consider the Green function G1 defined on [ϑ1, ϑ2] × [ϑ1, ϑ2] by

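$$G_1(u,v) = \begin{cases} \dfrac{(u-\vartheta_2)(v-\vartheta_1)}{\vartheta_2-\vartheta_1}, & \vartheta_1 \le v \le u; \\[2mm] \dfrac{(v-\vartheta_2)(u-\vartheta_1)}{\vartheta_2-\vartheta_1}, & u \le v \le \vartheta_2. \end{cases} \tag{27}$$

The function $G_1$ is convex in $v$; since it is symmetric, it is also convex in $u$. The function $G_1$ is continuous in $v$ and continuous in $u$.

For any function $f : [\vartheta_1, \vartheta_2] \to \mathbb{R}$, $f \in C^2([\vartheta_1, \vartheta_2])$, we can easily show by integrating by parts that the following is valid:

$$f(u) = \frac{\vartheta_2 - u}{\vartheta_2 - \vartheta_1} f(\vartheta_1) + \frac{u - \vartheta_1}{\vartheta_2 - \vartheta_1} f(\vartheta_2) + \int_{\vartheta_1}^{\vartheta_2} G_1(u,v) f''(v)\,dv,$$

where the function $G_1$ is defined as above in (27) ([31]).

Let $[\vartheta_1, \vartheta_2] \subset \mathbb{R}$ and $d = 2, 3, 4, 5$. In 2017, Mehmood et al. [32] (see also [33]) introduced some new types of Green functions $G_d : [\vartheta_1, \vartheta_2] \times [\vartheta_1, \vartheta_2] \to \mathbb{R}$, defined as follows, and gave Lemma 3.1:

$$G_2(u,v) = \begin{cases} \vartheta_1 - v, & \vartheta_1 \le v \le u, \\ \vartheta_1 - u, & u \le v \le \vartheta_2, \end{cases} \tag{28}$$

$$G_3(u,v) = \begin{cases} u - \vartheta_2, & \vartheta_1 \le v \le u, \\ v - \vartheta_2, & u \le v \le \vartheta_2, \end{cases} \tag{29}$$

$$G_4(u,v) = \begin{cases} u - \vartheta_1, & \vartheta_1 \le v \le u, \\ v - \vartheta_1, & u \le v \le \vartheta_2, \end{cases} \tag{30}$$

$$G_5(u,v) = \begin{cases} \vartheta_2 - v, & \vartheta_1 \le v \le u, \\ \vartheta_2 - u, & u \le v \le \vartheta_2. \end{cases} \tag{31}$$

Lemma 3.1

Let $f : [\vartheta_1, \vartheta_2] \to \mathbb{R}$ be such that $f \in C^2([\vartheta_1, \vartheta_2])$, and let $G_d$ $(d = 2, 3, 4, 5)$ be the Green functions defined in (28), (29), (30) and (31). Then we have the following identities:

$$f(u) = f(\vartheta_1) + (u - \vartheta_1) f'(\vartheta_2) + \int_{\vartheta_1}^{\vartheta_2} G_2(u,v) f''(v)\,dv, \tag{32}$$

$$f(u) = f(\vartheta_2) + (u - \vartheta_2) f'(\vartheta_1) + \int_{\vartheta_1}^{\vartheta_2} G_3(u,v) f''(v)\,dv, \tag{33}$$

$$f(u) = f(\vartheta_2) - (\vartheta_2 - \vartheta_1) f'(\vartheta_2) + (u - \vartheta_1) f'(\vartheta_1) + \int_{\vartheta_1}^{\vartheta_2} G_4(u,v) f''(v)\,dv, \tag{34}$$

$$f(u) = f(\vartheta_1) + (\vartheta_2 - \vartheta_1) f'(\vartheta_1) - (\vartheta_2 - u) f'(\vartheta_2) + \int_{\vartheta_1}^{\vartheta_2} G_5(u,v) f''(v)\,dv. \tag{35}$$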
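The identities (32)-(35) can be checked numerically. The following Python sketch evaluates the right-hand sides with a composite midpoint rule; the function names, the test function f = exp, the interval [0, 1] and the step count are all assumptions chosen for illustration.

```python
import math

def green(d, u, v, th1, th2):
    """Green functions G_2, ..., G_5 of (28)-(31)."""
    lower = v <= u
    if d == 2:
        return (th1 - v) if lower else (th1 - u)
    if d == 3:
        return (u - th2) if lower else (v - th2)
    if d == 4:
        return (u - th1) if lower else (v - th1)
    if d == 5:
        return (th2 - v) if lower else (th2 - u)
    raise ValueError("d must be 2, 3, 4 or 5")

def boundary_terms(d, u, f, df, th1, th2):
    """Non-integral terms on the right-hand sides of (32)-(35)."""
    if d == 2:
        return f(th1) + (u - th1) * df(th2)
    if d == 3:
        return f(th2) + (u - th2) * df(th1)
    if d == 4:
        return f(th2) - (th2 - th1) * df(th2) + (u - th1) * df(th1)
    return f(th1) + (th2 - th1) * df(th1) - (th2 - u) * df(th2)

def reconstruct(d, u, f, df, d2f, th1, th2, steps=20000):
    """Right-hand side of identity (30 + d), with a midpoint-rule integral."""
    h = (th2 - th1) / steps
    integral = 0.0
    for j in range(steps):
        v = th1 + (j + 0.5) * h
        integral += green(d, u, v, th1, th2) * d2f(v)
    return boundary_terms(d, u, f, df, th1, th2) + integral * h

# f = exp, so f, f' and f'' coincide; each value should be close to exp(0.3).
values = [reconstruct(d, 0.3, math.exp, math.exp, math.exp, 0.0, 1.0)
          for d in (2, 3, 4, 5)]
```

Each of the four reconstructed values agrees with exp(0.3) up to the quadrature error, which is what Lemma 3.1 asserts for a twice continuously differentiable f.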

The following theorem gives the equivalent statements between continuous convex functions and Green functions via majorization inequality and “useful” Csiszár divergence.

Theorem 3.2

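Let $J \subset \mathbb{R}$ be an interval, $p_i, r_i$ $(i = 1, \ldots, n)$ be real numbers and $q_i, u_i$ $(i = 1, \ldots, n)$ be positive real numbers such that

$$\sum_{i=1}^{n} u_i r_i = \sum_{i=1}^{n} u_i p_i, \tag{36}$$

with $\dfrac{p_i}{q_i}, \dfrac{r_i}{q_i} \in J$ $(i = 1, \ldots, n)$. If $\dfrac{r}{q}$ is decreasing and $G_d$ $(d = 1, 2, 3, 4, 5)$ are defined as in (27)-(31), then the following statements are equivalent.

  1. For every continuous convex function $f : [\vartheta_1, \vartheta_2] \to \mathbb{R}$, we have

    $$I_f(p, q, u) - I_f(r, q, u) \ge 0. \tag{37}$$

  2. For all $v \in [\vartheta_1, \vartheta_2]$, we have

    $$I_{G_d}(p, q, u) - I_{G_d}(r, q, u) \ge 0, \quad d = 1, 2, 3, 4, 5. \tag{38}$$

Moreover, the statements remain equivalent if the inequality signs in (37) and (38) are both reversed.

Proof

The scheme of the proof is the same for each $d = 1, 2, 3, 4, 5$, so we give the proof only for $d = 5$.

(i) ⇒ (ii): Suppose statement (i) holds. As the function $G_5 : [\vartheta_1, \vartheta_2] \times [\vartheta_1, \vartheta_2] \to \mathbb{R}$ is convex and continuous, it satisfies (37), i.e.,

$$I_{G_5}(p, q, u) - I_{G_5}(r, q, u) \ge 0.$$

(ii) ⇒ (i): Let $f : [\vartheta_1, \vartheta_2] \to \mathbb{R}$ be a convex function such that $f \in C^2([\vartheta_1, \vartheta_2])$, and assume that statement (ii) holds. Then by Lemma 3.1, we have

$$f(x_i) = f(\vartheta_1) + (\vartheta_2 - \vartheta_1) f'(\vartheta_1) - (\vartheta_2 - x_i) f'(\vartheta_2) + \int_{\vartheta_1}^{\vartheta_2} G_5(x_i, v) f''(v)\,dv, \tag{39}$$

$$f(y_i) = f(\vartheta_1) + (\vartheta_2 - \vartheta_1) f'(\vartheta_1) - (\vartheta_2 - y_i) f'(\vartheta_2) + \int_{\vartheta_1}^{\vartheta_2} G_5(y_i, v) f''(v)\,dv. \tag{40}$$

From (39) and (40), with $x_i = p_i/q_i$ and $y_i = r_i/q_i$, we get

$$\begin{aligned} I_f(p, q, u) - I_f(r, q, u) &= \sum_{i=1}^{n} u_i q_i f\left(\frac{p_i}{q_i}\right) - \sum_{i=1}^{n} u_i q_i f\left(\frac{r_i}{q_i}\right) \\ &= -\sum_{i=1}^{n} u_i q_i \left(\vartheta_2 - \frac{p_i}{q_i}\right) f'(\vartheta_2) + \sum_{i=1}^{n} u_i q_i \left(\vartheta_2 - \frac{r_i}{q_i}\right) f'(\vartheta_2) \\ &\quad + \int_{\vartheta_1}^{\vartheta_2} \left[\sum_{i=1}^{n} u_i q_i G_5\left(\frac{p_i}{q_i}, v\right) - \sum_{i=1}^{n} u_i q_i G_5\left(\frac{r_i}{q_i}, v\right)\right] f''(v)\,dv. \end{aligned} \tag{41}$$

Using (36), we have

$$I_f(p, q, u) - I_f(r, q, u) = \int_{\vartheta_1}^{\vartheta_2} \left[\sum_{i=1}^{n} u_i q_i G_5\left(\frac{p_i}{q_i}, v\right) - \sum_{i=1}^{n} u_i q_i G_5\left(\frac{r_i}{q_i}, v\right)\right] f''(v)\,dv. \tag{42}$$

As $f$ is a convex function, $f''(v) \ge 0$ for all $v \in [\vartheta_1, \vartheta_2]$. Hence, using (38) in (42), we get (37).

Note that the existence of the second derivative of $f$ is not necessary ([21, p.172]): since a continuous convex function can be approximated uniformly by convex polynomials, this differentiability condition can be dropped. □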

The following theorem gives equivalent statements between continuous convex functions and Green functions via majorization inequality and “useful” Zipf-Mandelbrot law.

Theorem 3.3

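Let $n \in \{1, 2, 3, \ldots\}$, $t_1, t_2 \ge 0$ and $s_1, s_2 > 0$ be such that

$$\sum_{i=1}^{n} \frac{u_i}{(i+t_2)^{s_2}} = \frac{H_{n,t_2,s_2}}{H_{n,t_1,s_1}} \sum_{i=1}^{n} \frac{u_i}{(i+t_1)^{s_1}}, \tag{43}$$

with

$$\frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}}, \ \frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}} \in J \quad (i = 1, \ldots, n).$$

If $\dfrac{(i+t_2)^{s_2}}{(i+1+t_2)^{s_2}} \le \dfrac{q_{i+1}}{q_i}$ $(i = 1, \ldots, n)$ and $G_d$ $(d = 1, 2, 3, 4, 5)$ are defined as in (27)-(31), then the following statements are equivalent.

  1. For every continuous convex function $f : [\vartheta_1, \vartheta_2] \to \mathbb{R}$, we have

    $$I_f(i, n, t_1, s_1, q, u) - I_f(i, n, t_2, s_2, q, u) \ge 0. \tag{44}$$

  2. For all $v \in [\vartheta_1, \vartheta_2]$, we have

    $$I_{G_d}(i, n, t_1, s_1, q, u) - I_{G_d}(i, n, t_2, s_2, q, u) \ge 0, \quad d = 1, 2, 3, 4, 5. \tag{45}$$

Moreover, the statements remain equivalent if the inequality signs in (44) and (45) are both reversed.

Proof

(i) ⇒ (ii): The proof is similar to the proof of Theorem 3.2.

(ii) ⇒ (i): Let $f : [\vartheta_1, \vartheta_2] \to \mathbb{R}$ be a convex function such that $f \in C^2([\vartheta_1, \vartheta_2])$, and assume that statement (ii) holds. Then by Lemma 3.1, we have (39) and (40).

From (39) and (40), we get

$$\begin{aligned} I_f(i, n, t_1, s_1, q, u) - I_f(i, n, t_2, s_2, q, u) &= \sum_{i=1}^{n} u_i q_i f(\lambda_i) - \sum_{i=1}^{n} u_i q_i f(\mu_i) \\ &= -\sum_{i=1}^{n} u_i q_i (\vartheta_2 - \lambda_i) f'(\vartheta_2) + \sum_{i=1}^{n} u_i q_i (\vartheta_2 - \mu_i) f'(\vartheta_2) \\ &\quad + \int_{\vartheta_1}^{\vartheta_2} \left[\sum_{i=1}^{n} u_i q_i G_5(\lambda_i, v) - \sum_{i=1}^{n} u_i q_i G_5(\mu_i, v)\right] f''(v)\,dv, \end{aligned}$$

where

$$\lambda_i := \frac{1}{q_i (i+t_1)^{s_1} H_{n,t_1,s_1}} \quad \text{and} \quad \mu_i := \frac{1}{q_i (i+t_2)^{s_2} H_{n,t_2,s_2}}.$$

Using (43), we have

$$I_f(i, n, t_1, s_1, q, u) - I_f(i, n, t_2, s_2, q, u) = \int_{\vartheta_1}^{\vartheta_2} \left[\sum_{i=1}^{n} u_i q_i G_5(\lambda_i, v) - \sum_{i=1}^{n} u_i q_i G_5(\mu_i, v)\right] f''(v)\,dv. \tag{46}$$

As $f$ is a convex function, $f''(v) \ge 0$ for all $v \in [\vartheta_1, \vartheta_2]$. Hence, using (45) in (46), we get (44). □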

4 “Useful” information measure in integral form

The following theorem is a slight extension of Lemma 2 in [34] which is proved by Maligranda et al. (also see [35]):

Theorem 4.1

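Let $w$, $x$ and $y$ be positive functions on $[a, b]$. Suppose that $f : [0, \infty) \to \mathbb{R}$ is a convex function and that

$$\int_a^{\nu} y(t) w(t)\,dt \le \int_a^{\nu} x(t) w(t)\,dt, \quad \nu \in [a, b], \quad \text{and} \quad \int_a^b y(t) w(t)\,dt = \int_a^b x(t) w(t)\,dt.$$

  1. If $y$ is a decreasing function on $[a, b]$, then

    $$\int_a^b f(y(t)) w(t)\,dt \le \int_a^b f(x(t)) w(t)\,dt. \tag{47}$$

  2. If $x$ is an increasing function on $[a, b]$, then

    $$\int_a^b f(x(t)) w(t)\,dt \le \int_a^b f(y(t)) w(t)\,dt. \tag{48}$$

If $f$ is a strictly convex function and $x \ne y$ (a.e.), then (47) and (48) are strict.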

We consider “useful” Csiszár functional [11, 12] in integral form:

Definition 4.2

(“Useful” Csiszár divergence in integral form). Let $J := [\alpha, \beta] \subset \mathbb{R}$ be an interval, and let $f : J \to \mathbb{R}$ be a function with densities $p : [a, b] \to J$, $q : [a, b] \to (0, \infty)$, associated with the utility density $u : [a, b] \to J$, such that

$$\frac{p(x)}{q(x)} \in J, \quad \forall\, x \in [a, b].$$

Then we define the “useful” Csiszár divergence in integral form by

$$\hat{I}_f(p, q, u) := \int_a^b u(t) q(t) f\left(\frac{p(t)}{q(t)}\right) dt. \tag{49}$$

Remark 4.3

One can easily see that if we substitute $u(t) = 1$ for all $t \in [a, b]$, then (49) becomes

$$\hat{I}_f(p, q, 1) := \hat{I}_f(p, q) = \int_a^b q(t) f\left(\frac{p(t)}{q(t)}\right) dt.$$

Theorem 4.4

Let $J := [0, \infty) \subset \mathbb{R}$ be an interval, $f : J \to \mathbb{R}$ be a convex function and $p, q, r, u : [a, b] \to (0, \infty)$ be such that

$$\int_a^{\upsilon} u(t) r(t)\,dt \le \int_a^{\upsilon} u(t) p(t)\,dt, \quad \forall\, \upsilon \in [a, b], \tag{50}$$

and

$$\int_a^b u(t) r(t)\,dt = \int_a^b u(t) p(t)\,dt, \tag{51}$$

with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J, \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function on $[a, b]$, then

    $$\hat{I}_f(r, q, u) \le \hat{I}_f(p, q, u). \tag{52}$$

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function on $[a, b]$, then the inequality is reversed, i.e.

    $$\hat{I}_f(r, q, u) \ge \hat{I}_f(p, q, u). \tag{53}$$

If $f$ is a strictly convex function and $p(t) \ne r(t)$ (a.e.), then strict inequality holds in (52) and (53).

If $f$ is a concave function, then the reverse inequalities hold in (52) and (53). If $f$ is strictly concave and $p(t) \ne r(t)$ (a.e.), then the strict reverse inequalities hold in (52) and (53).

Proof

(i): We use Theorem 4.1 (i) with the substitutions $x(t) := \dfrac{p(t)}{q(t)}$, $y(t) := \dfrac{r(t)}{q(t)}$ and $w(t) := u(t) q(t) > 0$ for all $t \in [a, b]$, together with the fact that $\dfrac{r(t)}{q(t)}$ is a decreasing function; then we get (52).

(ii): With the same substitutions, Theorem 4.1 (ii) and the fact that $\dfrac{p(t)}{q(t)}$ is an increasing function give (53). □
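The integral functional (49) and the inequality (52) can be illustrated with a simple quadrature. In the Python sketch below, the midpoint rule, the function name and the concrete densities on [0, 1] are assumptions made for this example: u ≡ 1, q ≡ 1, p(t) = 1.8 − 1.6t and r(t) = 1.4 − 0.8t, so that (50) and (51) hold and r/q is decreasing.

```python
def useful_csiszar_integral(f, p, q, u, a, b, steps=10000):
    """Midpoint-rule approximation of (49): int_a^b u q f(p/q) dt."""
    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        t = a + (j + 0.5) * h
        total += u(t) * q(t) * f(p(t) / q(t))
    return total * h

def one(t):
    return 1.0

def p(t):
    return 1.8 - 1.6 * t   # integrates to 1 on [0, 1]

def r(t):
    return 1.4 - 0.8 * t   # integrates to 1 on [0, 1]; r/q is decreasing

def square(x):
    return x * x           # a convex generator

i_r = useful_csiszar_integral(square, r, one, one, 0.0, 1.0)
i_p = useful_csiszar_integral(square, p, one, one, 0.0, 1.0)
```

Here the partial integrals satisfy ∫₀^υ (p − r) dt = 0.4 υ (1 − υ) ≥ 0, and the computed values are approximately 1.0533 and 1.2133, in agreement with (52).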

Remark 4.5

We can give Theorem 4.4 for u(t) := 1 for all t ∈ [a, b] as special case which has been given in [36].

5 Applications

Here, we present several special cases of the previous results as applications.

The first case corresponds to the entropy of a continuous probability density (see [18, p.506]):

Definition 5.1

(Shannon Entropy). Let $p : [a, b] \to (0, \infty)$ be a positive probability density. Then the Shannon entropy of $p(x)$ is defined by

$$H(p(x), u(x)) := -\int_a^b u(x) p(x) \log p(x)\,dx, \tag{54}$$

and is associated with the utility density $u : [a, b] \to \mathbb{R}$, whenever the integral exists.

Note that there is no problem with the definition in the case of a zero probability, since

$$\lim_{x \to 0} x \log x = 0. \tag{55}$$

Corollary 5.2

Let $p, q, r, u : [a, b] \to (0, \infty)$ be functions satisfying (50) and (51) with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J := (0, \infty), \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function and the base of log is greater than 1, then we have the following estimate for the Shannon entropy of $q(t)$ associated with the utility density $u(t)$:

    $$\int_a^b u(t) q(t) \log \frac{r(t)}{q(t)}\,dt \ge H(q(t), u(t)). \tag{56}$$

    If the base of log is between 0 and 1, then the reverse inequality holds in (56).

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function and the base of log is greater than 1, then we have the following estimate for the Shannon entropy of $q(t)$ associated with the utility density $u(t)$:

    $$H(q(t), u(t)) \le \int_a^b u(t) q(t) \log \frac{p(t)}{q(t)}\,dt. \tag{57}$$

    If the base of log is between 0 and 1, then the reverse inequality holds in (57).

Proof

(i): Substituting $f(x) := -\log x$ and $p(t) := 1$ for all $t \in [a, b]$ in Theorem 4.4 (i), we get (56).

(ii): Switching the roles of $p(t)$ and $r(t)$, i.e., taking $r(t) := 1$ for all $t \in [a, b]$ and $f(x) := -\log x$ in Theorem 4.4 (ii), we get (57). □

The second case corresponds to the relative entropy or the Kullback-Leibler divergence between two probability densities associated with the utility density u(t):

Definition 5.3

(Kullback-Leibler Divergence). Let $p, q : [a, b] \to (0, \infty)$ be positive probability densities. Then the Kullback-Leibler (K-L) divergence between $p(t)$ and $q(t)$ is defined by

$$L(p(t), q(t), u(t)) := \int_a^b u(t) p(t) \log \frac{p(t)}{q(t)}\,dt,$$

and is associated with the utility density $u : [a, b] \to \mathbb{R}$.

Corollary 5.4

Let $p, q, r, u : [a, b] \to (0, \infty)$ be functions satisfying (50) and (51) with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J := (0, \infty), \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function and the base of log is greater than 1, then

    $$\hat{I}_{(-\log x)}(r, q, u) \le \hat{I}_{(-\log x)}(p, q, u). \tag{58}$$

    If the base of log is between 0 and 1, then the reverse inequality holds in (58).

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function and the base of log is greater than 1, then

    $$\hat{I}_{(-\log x)}(r, q, u) \ge \hat{I}_{(-\log x)}(p, q, u). \tag{59}$$

    If the base of log is between 0 and 1, then the reverse inequality holds in (59).

Proof

(i): Substituting $f(x) := -\log x$ in Theorem 4.4 (i), we get (58).

(ii): Substituting $f(x) := -\log x$ in Theorem 4.4 (ii), we get (59). □

In Information Theory and Statistics, various divergences are applied in addition to the Kullback-Leibler divergence.

Definition 5.5

(Variational Distance). Let $p, q : [a, b] \to (0, \infty)$ be positive probability densities. Then the variational distance between $p(t)$ and $q(t)$ is defined by

$$\hat{I}_v(p(t), q(t), u(t)) := \int_a^b u(t)\, |p(t) - q(t)|\,dt,$$

and is associated with the utility density $u : [a, b] \to \mathbb{R}$.

Corollary 5.6

Let $p, q, r, u : [a, b] \to (0, \infty)$ be functions satisfying (50) and (51) with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J := (0, \infty), \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function, then

    $$\hat{I}_v(r(t), q(t), u(t)) \le \hat{I}_v(p(t), q(t), u(t)). \tag{60}$$

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function, then the inequality is reversed, i.e.

    $$\hat{I}_v(r(t), q(t), u(t)) \ge \hat{I}_v(p(t), q(t), u(t)). \tag{61}$$

Proof

(i): Since $f(x) := |x - 1|$ is a convex function for $x \in \mathbb{R}^+$, substituting $f(x) := |x - 1|$ in Theorem 4.4 (i) gives

$$\int_a^b u(t) q(t) \left|\frac{r(t)}{q(t)} - 1\right| dt \le \int_a^b u(t) q(t) \left|\frac{p(t)}{q(t)} - 1\right| dt,$$

$$\int_a^b u(t) q(t) \frac{|r(t) - q(t)|}{|q(t)|}\,dt \le \int_a^b u(t) q(t) \frac{|p(t) - q(t)|}{|q(t)|}\,dt,$$

and since $q(t) > 0$, we get (60).

(ii): Substituting $f(x) := |x - 1|$ in Theorem 4.4 (ii), we get (61). □

Definition 5.7

(Hellinger Distance). Let $p, q : [a, b] \to (0, \infty)$ be positive probability densities. Then the Hellinger distance between $p(t)$ and $q(t)$ is defined by

$$\hat{I}_H(p(t), q(t), u(t)) := \int_a^b u(t) \left(\sqrt{p(t)} - \sqrt{q(t)}\right)^2 dt,$$

and is associated with the utility density $u : [a, b] \to \mathbb{R}$.

Corollary 5.8

Let $p, q, r, u : [a, b] \to (0, \infty)$ be functions satisfying (50) and (51) with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J := (0, \infty), \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function, then

    $$\hat{I}_H(r(t), q(t), u(t)) \le \hat{I}_H(p(t), q(t), u(t)). \tag{62}$$

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function, then the inequality is reversed, i.e.

    $$\hat{I}_H(r(t), q(t), u(t)) \ge \hat{I}_H(p(t), q(t), u(t)). \tag{63}$$

Proof

(i): Since $f(x) := (\sqrt{x} - 1)^2$ is a convex function for $x \in \mathbb{R}^+$, substituting $f(x) := (\sqrt{x} - 1)^2$ in Theorem 4.4 (i) gives

$$\int_a^b u(t) q(t) \left(\sqrt{\frac{r(t)}{q(t)}} - 1\right)^2 dt \le \int_a^b u(t) q(t) \left(\sqrt{\frac{p(t)}{q(t)}} - 1\right)^2 dt,$$

and since $q(t) > 0$, we get (62).

(ii): Substituting $f(x) := (\sqrt{x} - 1)^2$ in Theorem 4.4 (ii), we get (63). □

Definition 5.9

(Bhattacharyya Distance). Let $p, q : [a, b] \to (0, \infty)$ be positive probability densities. Then the Bhattacharyya distance between $p(t)$ and $q(t)$ is defined by

$$\hat{I}_B(p(t), q(t), u(t)) := \int_a^b u(t) \sqrt{p(t) q(t)}\,dt,$$

and is associated with the utility density $u : [a, b] \to \mathbb{R}$.

Corollary 5.10

Let $p, q, r, u : [a, b] \to (0, \infty)$ be functions satisfying (50) and (51) with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J := (0, \infty), \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function, then

    $$\hat{I}_B(p(t), q(t), u(t)) \le \hat{I}_B(r(t), q(t), u(t)). \tag{64}$$

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function, then the inequality is reversed, i.e.

    $$\hat{I}_B(p(t), q(t), u(t)) \ge \hat{I}_B(r(t), q(t), u(t)). \tag{65}$$

Proof

(i): Since $f(x) := -\sqrt{x}$ is a convex function for $x \in \mathbb{R}^+$, substituting $f(x) := -\sqrt{x}$ in Theorem 4.4 (i) gives

$$-\int_a^b u(t) q(t) \sqrt{\frac{r(t)}{q(t)}}\,dt \le -\int_a^b u(t) q(t) \sqrt{\frac{p(t)}{q(t)}}\,dt,$$

from which we get (64).

(ii): Substituting $f(x) := -\sqrt{x}$ in Theorem 4.4 (ii), we get (65). □

Definition 5.11

(Jeffreys Distance). Let $p, q : [a, b] \to (0, \infty)$ be positive probability densities. Then the Jeffreys distance between $p(t)$ and $q(t)$ is defined by

$$\hat{I}_J(p(t), q(t), u(t)) := \int_a^b u(t) \left(p(t) - q(t)\right) \ln \frac{p(t)}{q(t)}\,dt,$$

and is associated with the utility density $u : [a, b] \to \mathbb{R}$.

Corollary 5.12

Let $p, q, r, u : [a, b] \to (0, \infty)$ be functions satisfying (50) and (51) with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J := (0, \infty), \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function, then

    $$\hat{I}_J(r(t), q(t), u(t)) \le \hat{I}_J(p(t), q(t), u(t)). \tag{66}$$

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function, then the inequality is reversed, i.e.

    $$\hat{I}_J(r(t), q(t), u(t)) \ge \hat{I}_J(p(t), q(t), u(t)). \tag{67}$$

Proof

(i): Since $f(x) := (x - 1) \ln x$ is a convex function for $x \in \mathbb{R}^+$, substituting $f(x) := (x - 1) \ln x$ in Theorem 4.4 (i) gives

$$\int_a^b u(t) q(t) \left(\frac{r(t)}{q(t)} - 1\right) \ln \frac{r(t)}{q(t)}\,dt \le \int_a^b u(t) q(t) \left(\frac{p(t)}{q(t)} - 1\right) \ln \frac{p(t)}{q(t)}\,dt,$$

from which we get (66).

(ii): Substituting $f(x) := (x - 1) \ln x$ in Theorem 4.4 (ii), we get (67). □

Definition 5.13

(Triangular Discrimination). Let $p, q : [a, b] \to (0, \infty)$ be positive probability densities. Then the triangular discrimination between $p(t)$ and $q(t)$ is defined by

$$\hat{I}_\Delta(p(t), q(t), u(t)) := \int_a^b u(t) \frac{\left(p(t) - q(t)\right)^2}{p(t) + q(t)}\,dt,$$

and is associated with the utility density $u : [a, b] \to \mathbb{R}$.

Corollary 5.14

Let $p, q, r, u : [a, b] \to (0, \infty)$ be functions satisfying (50) and (51) with

$$\frac{p(t)}{q(t)}, \ \frac{r(t)}{q(t)} \in J := (0, \infty), \quad \forall\, t \in [a, b].$$

  1. If $\dfrac{r(t)}{q(t)}$ is a decreasing function, then

    $$\hat{I}_\Delta(r(t), q(t), u(t)) \le \hat{I}_\Delta(p(t), q(t), u(t)). \tag{68}$$

  2. If $\dfrac{p(t)}{q(t)}$ is an increasing function, then the inequality is reversed, i.e.

    $$\hat{I}_\Delta(r(t), q(t), u(t)) \ge \hat{I}_\Delta(p(t), q(t), u(t)). \tag{69}$$

Proof

(i): Since $f(x) := \dfrac{(x-1)^2}{x+1}$ is a convex function for $x \ge 0$, substituting $f(x) := \dfrac{(x-1)^2}{x+1}$ in Theorem 4.4 (i) gives

$$\int_a^b u(t) q(t) \frac{\left(r(t)/q(t) - 1\right)^2}{r(t)/q(t) + 1}\,dt \le \int_a^b u(t) q(t) \frac{\left(p(t)/q(t) - 1\right)^2}{p(t)/q(t) + 1}\,dt,$$

$$\int_a^b u(t) q(t) \frac{\left[(r(t) - q(t))/q(t)\right]^2}{(r(t) + q(t))/q(t)}\,dt \le \int_a^b u(t) q(t) \frac{\left[(p(t) - q(t))/q(t)\right]^2}{(p(t) + q(t))/q(t)}\,dt,$$

from which we get (68).

(ii): Substituting $f(x) := \dfrac{(x-1)^2}{x+1}$ in Theorem 4.4 (ii), we get (69). □

Remark 5.15

We can give all the results of section 5 for u(t) = 1 for all t ∈ [a, b] as a special case, which has been given in [36].
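The proofs in this section all rest on the same observation: for a suitable generator f, the integrand q·f(p/q) of (49) coincides pointwise with the integrand of the named divergence (for the Bhattacharyya case, with its negative). The Python sketch below checks this pointwise identity; the dictionary layout and the sample values are assumptions made for illustration.

```python
import math

# Each entry pairs a generator f(x) with the target integrand g(p, q),
# so that q * f(p / q) == g(p, q) for positive p and q.
GENERATORS = {
    "variational": (lambda x: abs(x - 1.0),
                    lambda p, q: abs(p - q)),
    "hellinger": (lambda x: (math.sqrt(x) - 1.0) ** 2,
                  lambda p, q: (math.sqrt(p) - math.sqrt(q)) ** 2),
    # f(x) = -sqrt(x) is convex; it yields minus the Bhattacharyya integrand.
    "bhattacharyya": (lambda x: -math.sqrt(x),
                      lambda p, q: -math.sqrt(p * q)),
    "jeffreys": (lambda x: (x - 1.0) * math.log(x),
                 lambda p, q: (p - q) * math.log(p / q)),
    "triangular": (lambda x: (x - 1.0) ** 2 / (x + 1.0),
                   lambda p, q: (p - q) ** 2 / (p + q)),
}

def max_mismatch(p, q):
    """Largest absolute gap between q * f(p/q) and g(p, q) over all entries."""
    return max(abs(q * f(p / q) - g(p, q)) for f, g in GENERATORS.values())

gap = max_mismatch(0.7, 0.2)   # should be at rounding-error level
```

Because the identity holds pointwise, multiplying by the utility density u(t) and integrating turns Theorem 4.4 into each of the corollaries above.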

Acknowledgement

The publication was supported by the Ministry of Education and Science of the Russian Federation (Agreement No. 02.a03.21.0008). This publication is partially supported by the Royal Commission (RC), Jubail Industrial College, Jubail, Kingdom of Saudi Arabia.

References

  • [1]

    Saichev A., Malevergne Y., Sornette D., Theory of Zipf’s law and beyond, in: Lecture notes in Economics and Mathematical systems 362, Berlin: Springer, 2009. Google Scholar

  • [2]

    Zipf G. K., The psychobiology of language, Cambridge, MA: Houghton-Mifflin, 1935. Google Scholar

  • [3]

    Zipf G. K., Human behavior and the principle of least effort, Reading, MA: Addison-Wesley, 1949. Google Scholar

  • [4]

    Baxter G., Frean M., Noble J., Rickerby M., Smith H., Visser M., Melton H., Tempero E., Understanding the shape of Java software, OOPSLA Proc. 21st Annual ACM SIGPLAN Conf. on Object-Oriented Programming Systems, Languages and Applications, Eds. Tarr P.L., Cook W.R., New York: ACM, 2006, 379-412. Google Scholar

  • [5]

    Clauset A., Shalizi C. R., Newman M. E. J., Power-law distributions in empirical data, SIAM Rev., 2009, 51, 661–703. CrossrefWeb of ScienceGoogle Scholar

  • [6]

    Newmann M. E. J., Power laws, Pareto distributions and Zipf’s law, Contemp. Phys., 2007, 46, 323-351. Google Scholar

  • [7]

    Richardson L. F., Statistics of deadly quarrels, Pacific Grove, CA: Boxwood Press, 1960. Google Scholar

  • [8]

    Visser M., Zipf’s law, power laws and maximmum entropy, New J. Phys., 2013, 15, 1-13. Google Scholar

  • [9]

    Matić M., Pearce C. E. M., Pečarić J., Some comparison theorems for the mean-value characterization of “useful” information measures, SEA Bull. Math., 1999, 23, 111-116. Google Scholar

  • [10]

    Bhaker U. S., Hooda D. S., Mean value characterization of ’useful’ information measures, Tamkang J. Math., 1993, 24, 383-394. Google Scholar

  • [11]

    Csiszár I., Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar., 1967, 2, 299-318. Google Scholar

  • [12]

    Csiszár I., Information measure: A critical survey, Trans. 7th Prague Conf. on Info. Th., Statist. Decis. Funct., Random Processes and 8th European Meeting of Statist., B, Academia Prague, 1978, 73-86. Google Scholar

  • [13]

    Horváth L., Pečarić Ð., Pečarić J., Estimations of f- and Rényi divergences by using a cyclic refinement of the Jensen’s inequality, Bull. Malays. Math. Sci. Soc., 2017.

  • [14]

    Latif N., Pečarić Ð., Pečarić J., Majorization, Csiszár divergence and Zipf-Mandelbrot law, J. Inequal. Appl., 2017, 1-15.

  • [15]

    Latif N., Pečarić Ð., Pečarić J., Majorization and Zipf-Mandelbrot law, submitted.

  • [16]

    Matić M., Pearce C. E. M., Pečarić J., Improvements of some bounds on entropy measures in information theory, Math. Inequal. Appl., 1998, 1, 295-304.

  • [17]

    Matić M., Pearce C. E. M., Pečarić J., On an inequality for the entropy of a probability distribution, Acta Math. Hungar., 1999, 85, 345-349.

  • [18]

    Matić M., Pearce C. E. M., Pečarić J., Shannon’s and related inequalities in information theory, Survey on classical inequalities, editor Themistocles M. Rassias, Kluwer Academic Publishers, 2000, 127-164.

  • [19]

    Fuchs L., A new proof of an inequality of Hardy-Littlewood-Pólya, Math. Tidsskr, 1947, 53-54.

  • [20]

    Marshall A. W., Olkin I., Arnold B. C., Inequalities: Theory of Majorization and Its Applications (Second Edition), Springer Series in Statistics, New York, 2011.

  • [21]

    Pečarić J., Proschan F., Tong Y. L., Convex functions, Partial Orderings and Statistical Applications, Academic Press, New York, 1992.

  • [22]

    Niculescu C. P., Persson L. E., Convex functions and their applications, A contemporary Approach, CMS Books in Mathematics, 23, Springer-Verlag, New York, 2006.

  • [23]

    Adil Khan M., Latif N., Pečarić J., Generalization of majorization theorem, J. Math. Inequal., 2015, 9(3), 847-872.

  • [24]

    Adil Khan M., Khalid S., Pečarić J., Refinements of some majorization type inequalities, J. Math. Inequal., 2013, 7(1), 73-92.

  • [25]

    Adil Khan M., Pečarić Ð., Pečarić J., Bounds for Shannon and Zipf-Mandelbrot entropies, Math. Methods Appl. Sci., 2017, 40(18), 7316-7322.

  • [26]

    Mandelbrot B., Information Theory and Psycholinguistics: A Theory of Words Frequencies, In Reading in Mathematical Social Science, (eds.) P. Lazarsfeld, N. Henry, Cambridge, MA: MIT Press, 1966.

  • [27]

    Montemurro M. A., Beyond the Zipf-Mandelbrot law in quantitative linguistics, 2001, URL: arXiv:cond-mat/0104066v2.

  • [28]

    Egghe L., Rousseau R., Introduction to Informetrics. Quantitative Methods in Library, Documentation and Information Science, Elsevier Science Publishers, New York, 1990.

  • [29]

    Silagadze Z. K., Citations and the Zipf-Mandelbrot Law, Complex Systems, 1997, (11), 487-499.

  • [30]

    Mouillot D., Lepretre A., Introduction of relative abundance distribution (RAD) indices, estimated from the rank-frequency diagrams (RFD), to assess changes in community diversity, Environmental Monitoring and Assessment, Springer, 2000, 63 (2), 279-295.

  • [31]

    Widder D. V., Completely convex function and Lidstone series, Trans. Am. Math. Soc., 1942, 51, 387-398.

  • [32]

    Mehmood N., Agarwal R. P., Butt S. I., Pečarić J., New generalizations of Popoviciu-type inequalities via new Green’s functions and Montgomery identity, J. Inequal. Appl., 2017, 108, 1-21.

  • [33]

    Butt S. I., Khan K. A., Pečarić J., Popoviciu type inequalities via Green function and Generalized Montgomery Identity, Math. Inequal. Appl., 2015, 18(4), 1519-1538.

  • [34]

    Maligranda L., Pečarić J., Persson L. E., Weighted Favard’s and Berwald’s inequalities, J. Math. Anal. Appl., 1995, 190, 248-262.

  • [35]

    Latif N., Pečarić J., Perić I., On Majorization, Favard and Berwald’s Inequalities, Annals of Functional Analysis, 2011, 2, no. 1, 31-50, ISSN: 2008-8752.

  • [36]

    Latif N., Pečarić Ð., Pečarić J., Majorization in Information Theory, JIASF, 2017, (8) 4, 42-56.

  • [37]

    Matić M., Pearce C. E. M., Pečarić J., Some refinements of Shannon’s inequalities, ANZIAM J. (formerly J. Austral. Math. Soc. Ser. B), 2002, 43, 493-511.

About the article

Received: 2017-12-08

Accepted: 2018-06-20

Published Online: 2018-12-26


Authors’ contributions: All authors contributed equally. All authors read and approved the final manuscript.

Competing interests: The authors declare that they have no competing interests.


Citation Information: Open Mathematics, Volume 16, Issue 1, Pages 1357–1373, ISSN (Online) 2391-5455, DOI: https://doi.org/10.1515/math-2018-0113.

© 2018 Latif et al., published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).