Sherman’s and related inequalities with applications in information theory

In this paper we give extensions of Sherman’s inequality considering the class of convex functions of higher order. As particular cases, we get an extended weighted majorization inequality as well as Jensen’s inequality, which have a direct connection to information theory. We use the obtained results to derive new estimates for Shannon’s and Rényi’s entropy, information energy, and some well-known measures between probability distributions. Using the Zipf–Mandelbrot law, we introduce new functionals to derive some related results.


Introduction and preliminaries
We start with a brief overview of divided differences and n-convex functions and give some basic results from the majorization theory.
For two vectors $x, y \in [\alpha, \beta]^l$, let $x_{[i]}$, $y_{[i]}$ denote the $i$th largest entries of $x$ and $y$, respectively. We say that $x$ majorizes $y$, in symbols $y \prec x$, if
$$\sum_{i=1}^{m} y_{[i]} \le \sum_{i=1}^{m} x_{[i]} \quad \text{for } m = 1, 2, \ldots, l-1, \qquad \text{and} \qquad \sum_{i=1}^{l} y_{i} = \sum_{i=1}^{l} x_{i}.$$
It is well known that $y \prec x$ iff $y = xA$ for some doubly stochastic matrix $A = (a_{ij}) \in M_l(\mathbb{R})$, i.e., a matrix with nonnegative entries whose row and column sums are all equal to 1. Moreover, $y \prec x$ implies
$$\sum_{i=1}^{l} \varphi(y_i) \le \sum_{i=1}^{l} \varphi(x_i)$$
for every continuous convex function $\varphi : [\alpha, \beta] \to \mathbb{R}$. This result, obtained by Hardy et al. (1929 [2]), is well known as the majorization inequality and plays an important role in the study of majorization theory.
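The definition of majorization and the majorization inequality can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper `majorizes`, the sample vector `x`, and the doubly stochastic matrix `A` are hypothetical choices made for illustration, and the convex function is taken to be the exponential.

```python
from math import exp, isclose

def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y (y ≺ x): the partial sums of the
    decreasingly sorted entries of x dominate those of y, and the totals agree."""
    if len(x) != len(y):
        return False
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sx = sy = 0.0
    for xi, yi in zip(xs[:-1], ys[:-1]):
        sx += xi
        sy += yi
        if sy > sx + tol:
            return False
    return isclose(sum(x), sum(y), abs_tol=1e-9)

# Hypothetical data: y = xA for a doubly stochastic matrix A.
x = [4.0, 2.0, 1.0]
A = [[0.5, 0.5, 0.0],
     [0.5, 0.25, 0.25],
     [0.0, 0.25, 0.75]]
y = [sum(x[i] * A[i][j] for i in range(3)) for j in range(3)]

# Majorization inequality for the convex function phi = exp.
lhs = sum(exp(t) for t in y)
rhs = sum(exp(t) for t in x)
print(majorizes(x, y), lhs <= rhs)
```

Any other continuous convex function can be substituted for `exp` in the last lines.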
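Sherman [3] considered the weighted concept of majorization $(y, b) \prec (x, a)$ between two vectors $x = (x_1, \ldots, x_m) \in [\alpha, \beta]^m$ and $y = (y_1, \ldots, y_l) \in [\alpha, \beta]^l$ with nonnegative weights $a = (a_1, \ldots, a_m)$ and $b = (b_1, \ldots, b_l)$. It is defined by the existence of a row stochastic matrix $A = (a_{ij}) \in M_{lm}(\mathbb{R})$, i.e., a matrix with $a_{ij} \ge 0$ and $\sum_{j=1}^{m} a_{ij} = 1$ for $i = 1, \ldots, l$, such that
$$y = xA^{T} \quad \text{and} \quad a = bA.$$
Under these conditions, Sherman's inequality
$$\sum_{i=1}^{l} b_i \varphi(y_i) \le \sum_{j=1}^{m} a_j \varphi(x_j) \tag{1.5}$$
holds for every convex function $\varphi : [\alpha, \beta] \to \mathbb{R}$, where $A^{T}$ denotes the transpose matrix.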
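A small numerical illustration of Sherman's weighted setting may help. The sketch below assumes the standard formulation (a row stochastic matrix $A$ with $y = xA^{T}$ and $a = bA$); the function names and the sample data are hypothetical and chosen only for illustration.

```python
from math import log

def sherman_pair(x, b, A):
    """Given x (length m), weights b (length l) and a row stochastic
    l-by-m matrix A, return y = x A^T and a = b A."""
    l, m = len(A), len(x)
    y = [sum(A[i][j] * x[j] for j in range(m)) for i in range(l)]
    a = [sum(b[i] * A[i][j] for i in range(l)) for j in range(m)]
    return y, a

def sherman_difference(phi, x, a, y, b):
    """Sum_j a_j phi(x_j) - Sum_i b_i phi(y_i); nonnegative for convex phi."""
    return (sum(aj * phi(xj) for aj, xj in zip(a, x))
            - sum(bi * phi(yi) for bi, yi in zip(b, y)))

# Hypothetical data with nonnegative weights and a row stochastic matrix.
x = [1.0, 2.0, 5.0]
b = [0.3, 0.7]
A = [[0.2, 0.5, 0.3],
     [0.6, 0.1, 0.3]]
y, a = sherman_pair(x, b, A)

gap = sherman_difference(lambda t: -log(t), x, a, y, b)
print(y, a, gap)
```

Taking a single row, `A = [a]` with `b = [1.0]`, reduces the same computation to Jensen's inequality, since then `y` has the single entry $\sum_j a_j x_j$.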
As a special case of Sherman's inequality, when $l = m$ and $a_j = b_i$ for all $i, j = 1, \ldots, m$, we get the weighted version of the majorization inequality
$$\sum_{i=1}^{m} a_i \varphi(y_i) \le \sum_{i=1}^{m} a_i \varphi(x_i).$$
Putting $\sum_{i=1}^{m} a_i = 1$ and $y_1 = y_2 = \cdots = y_m = \sum_{i=1}^{m} a_i x_i$, we get Jensen's inequality in the form
$$\varphi\left(\sum_{i=1}^{m} a_i x_i\right) \le \sum_{i=1}^{m} a_i \varphi(x_i). \tag{1.7}$$
We can get Jensen's inequality (1.7) directly from (1.5) by setting $l = 1$ and $b = (1)$.

The concept of majorization appears in many different fields of application, particularly in many branches of mathematics. A complete and superb reference on the subject is the monograph [4], and many results from the theory of majorization are directly or indirectly inspired by it.

In this paper we give extensions of Sherman's inequality by considering the class of convex functions of higher order. As a particular case, we get an extension of the weighted majorization inequality and of Jensen's inequality, which can be used to derive new estimates for some entropies and measures between probability distributions. Also, we use the Zipf-Mandelbrot law to illustrate the obtained results.

Some technical lemmas
In this section we present two technical lemmas. Each of them gives an identity that is used to obtain the main results.
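Let us consider the function $G : [\alpha, \beta] \times [\alpha, \beta] \to \mathbb{R}$ defined by
$$G(x, y) = \begin{cases} \dfrac{(x-\beta)(y-\alpha)}{\beta-\alpha}, & \alpha \le y \le x, \\[2mm] \dfrac{(y-\beta)(x-\alpha)}{\beta-\alpha}, & x \le y \le \beta, \end{cases} \tag{2.1}$$
which is Green's function of the boundary value problem
$$\frac{d^2}{dx^2}\varphi(x) = 0, \qquad \varphi(\alpha) = \varphi(\beta) = 0.$$
This function is convex and continuous with respect to both variables $x$ and $y$. Integration by parts easily yields that, for any function $\varphi \in C^2([\alpha, \beta])$, the following holds:
$$\varphi(x) = \frac{\beta - x}{\beta-\alpha}\,\varphi(\alpha) + \frac{x-\alpha}{\beta-\alpha}\,\varphi(\beta) + \int_{\alpha}^{\beta} G(x, y)\,\varphi''(y)\,dy.$$
Substituting this representation into Sherman's difference $\sum_{j=1}^{m} a_j \varphi(x_j) - \sum_{i=1}^{l} b_i \varphi(y_i)$ and using that $\sum_{j=1}^{m} a_j = \sum_{i=1}^{l} b_i$ and $\sum_{j=1}^{m} a_j x_j = \sum_{i=1}^{l} b_i y_i$ (both follow from $y = xA^{T}$, $a = bA$ with a row stochastic matrix $A$), the terms containing $\varphi(\alpha)$ and $\varphi(\beta)$ cancel, i.e., we get identity (2.3):
$$\sum_{j=1}^{m} a_j \varphi(x_j) - \sum_{i=1}^{l} b_i \varphi(y_i) = \int_{\alpha}^{\beta} \left(\sum_{j=1}^{m} a_j G(x_j, y) - \sum_{i=1}^{l} b_i G(y_i, y)\right) \varphi''(y)\,dy. \tag{2.3}$$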
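The Green's function representation of a $C^2$ function can be verified numerically. The sketch below assumes the standard form of Green's function for $\varphi'' = 0$, $\varphi(\alpha) = \varphi(\beta) = 0$, namely $G(x,y) = (x-\beta)(y-\alpha)/(\beta-\alpha)$ for $y \le x$ and $G(x,y) = (y-\beta)(x-\alpha)/(\beta-\alpha)$ for $x \le y$. The Simpson quadrature helper and the test function are illustrative choices, not taken from the paper.

```python
from math import exp

def green(x, y, alpha, beta):
    """Green's function of phi'' = 0 with phi(alpha) = phi(beta) = 0 (assumed standard form)."""
    if y <= x:
        return (x - beta) * (y - alpha) / (beta - alpha)
    return (y - beta) * (x - alpha) / (beta - alpha)

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    if hi == lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def green_representation(phi, d2phi, x, alpha, beta):
    """Linear interpolant of phi at alpha, beta plus the integral of G(x, .) * phi''."""
    linear = ((beta - x) * phi(alpha) + (x - alpha) * phi(beta)) / (beta - alpha)
    integrand = lambda y: green(x, y, alpha, beta) * d2phi(y)
    # Split at y = x, where G(x, .) has a kink, so Simpson's rule stays accurate.
    return linear + simpson(integrand, alpha, x) + simpson(integrand, x, beta)

# phi = exp, so phi'' = exp as well.
value = green_representation(exp, exp, 1.7, 1.0, 3.0)
print(value, exp(1.7))
```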
We use the Abel-Gontscharoff interpolation for two points with integral remainder to obtain another identity.
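Let $n, k \in \mathbb{N}$, $n \ge 2$, $0 \le k \le n-1$, and $\varphi \in C^n([\alpha, \beta])$. Then
$$\varphi(t) = Q_{n-1}(\alpha, \beta, \varphi, t) + R(\varphi, t), \tag{2.4}$$
where
$$Q_{n-1}(\alpha, \beta, \varphi, t) = \sum_{i=0}^{k} \frac{(t-\alpha)^i}{i!}\,\varphi^{(i)}(\alpha) + \sum_{j=0}^{n-k-2} \left[\sum_{i=0}^{j} \frac{(t-\alpha)^{k+1+i}(\alpha-\beta)^{j-i}}{(k+1+i)!\,(j-i)!}\right] \varphi^{(k+1+j)}(\beta)$$
is the Abel-Gontscharoff interpolating polynomial for two points of degree $n-1$, and the remainder is given by
$$R(\varphi, t) = \int_{\alpha}^{\beta} G_{n,k}(t, u)\,\varphi^{(n)}(u)\,du,$$
where
$$G_{n,k}(t, u) = \frac{1}{(n-1)!} \begin{cases} \displaystyle\sum_{i=0}^{k} \binom{n-1}{i} (t-\alpha)^i (\alpha-u)^{n-i-1}, & \alpha \le u \le t, \\[2mm] \displaystyle -\sum_{i=k+1}^{n-1} \binom{n-1}{i} (t-\alpha)^i (\alpha-u)^{n-i-1}, & t \le u \le \beta. \end{cases} \tag{2.5}$$
Further, for $\alpha \le u, t \le \beta$, the following inequalities hold:
$$(-1)^{n-k-1}\,\frac{\partial^i G_{n,k}(t, u)}{\partial t^i} \ge 0, \quad 0 \le i \le k, \qquad (-1)^{n-i}\,\frac{\partial^i G_{n,k}(t, u)}{\partial t^i} \ge 0, \quad k+1 \le i \le n-1. \tag{2.6}$$
For more information, see [5].

Now we use interpolation (2.4) on $\varphi''$ to obtain the second identity.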
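The two-point Abel-Gontscharoff representation can be sanity-checked numerically. The sketch below assumes the standard form of the interpolating polynomial and of the kernel $G_{n,k}$ found in the literature (see [5]): the polynomial matches $\varphi^{(i)}(\alpha)$ for $0 \le i \le k$ and $\varphi^{(i)}(\beta)$ for $k+1 \le i \le n-1$. The function names, the quadrature rule, and the test data are hypothetical.

```python
from math import comb, exp, factorial

def ag_polynomial(d_alpha, d_beta, n, k, alpha, beta, t):
    """Two-point Abel-Gontscharoff polynomial of degree n-1 (assumed standard form).
    d_alpha[i] = phi^(i)(alpha) and d_beta[i] = phi^(i)(beta) for i = 0..n-1."""
    total = sum((t - alpha) ** i / factorial(i) * d_alpha[i] for i in range(k + 1))
    for j in range(n - k - 1):
        coeff = sum((t - alpha) ** (k + 1 + i) * (alpha - beta) ** (j - i)
                    / (factorial(k + 1 + i) * factorial(j - i))
                    for i in range(j + 1))
        total += coeff * d_beta[k + 1 + j]
    return total

def ag_kernel(n, k, alpha, t, u):
    """Kernel G_{n,k}(t, u) of the integral remainder (assumed standard form)."""
    if u <= t:
        s = sum(comb(n - 1, i) * (t - alpha) ** i * (alpha - u) ** (n - i - 1)
                for i in range(k + 1))
    else:
        s = -sum(comb(n - 1, i) * (t - alpha) ** i * (alpha - u) ** (n - i - 1)
                 for i in range(k + 1, n))
    return s / factorial(n - 1)

def simpson(f, lo, hi, m=200):
    """Composite Simpson rule with an even number m of subintervals."""
    if hi == lo:
        return 0.0
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

# phi = exp, so every derivative is exp again.
n, k, alpha, beta, t = 4, 1, 0.0, 1.0, 0.6
d_alpha = [exp(alpha)] * n
d_beta = [exp(beta)] * n
integrand = lambda u: ag_kernel(n, k, alpha, t, u) * exp(u)
value = (ag_polynomial(d_alpha, d_beta, n, k, alpha, beta, t)
         + simpson(integrand, alpha, t) + simpson(integrand, t, beta))

# Sign pattern of the kernel on a grid: (-1)^(n-k-1) * G_{n,k} >= 0.
sign = (-1) ** (n - k - 1)
kernel_ok = all(sign * ag_kernel(n, k, alpha, t, alpha + (beta - alpha) * i / 50) >= -1e-15
                for i in range(51))
print(value, exp(t), kernel_ok)
```

For `n = 4`, `k = 1` the exponent `n - k - 1` is even, so the kernel itself is nonnegative on the grid.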
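Lemma 2 Let $n, k \in \mathbb{N}$, $n \ge 4$, $0 \le k \le n-3$, let $x$, $y$, $a$, $b$, and $A$ satisfy $y = xA^{T}$ and $a = bA$, and let $G$, $G_{n,k}$ be defined by (2.1), (2.5), respectively. Then, for every function $\varphi \in C^n([\alpha, \beta])$, the following identity holds:
$$\begin{aligned} \sum_{r=1}^{m} a_r \varphi(x_r) - \sum_{s=1}^{l} b_s \varphi(y_s) ={}& \sum_{i=0}^{k} \frac{\varphi^{(i+2)}(\alpha)}{i!} \int_{\alpha}^{\beta} \left(\sum_{r=1}^{m} a_r G(x_r, y) - \sum_{s=1}^{l} b_s G(y_s, y)\right) (y-\alpha)^{i}\,dy \\ &+ \sum_{j=0}^{n-k-4} \sum_{i=0}^{j} \frac{(\alpha-\beta)^{j-i}\,\varphi^{(k+3+j)}(\beta)}{(k+1+i)!\,(j-i)!} \int_{\alpha}^{\beta} \left(\sum_{r=1}^{m} a_r G(x_r, y) - \sum_{s=1}^{l} b_s G(y_s, y)\right) (y-\alpha)^{k+1+i}\,dy \\ &+ \int_{\alpha}^{\beta} \int_{\alpha}^{\beta} \left(\sum_{r=1}^{m} a_r G(x_r, y) - \sum_{s=1}^{l} b_s G(y_s, y)\right) G_{n-2,k}(y, u)\,\varphi^{(n)}(u)\,du\,dy. \end{aligned} \tag{2.7}$$

Proof If we apply formula (2.4) to the function $\varphi''$, which amounts to replacing $n$ with $n-2$ in (2.4), we get a representation of $\varphi''(y)$ as an interpolating polynomial plus an integral remainder with kernel $G_{n-2,k}$. Substituting this representation into (2.3) and applying Fubini's theorem to the last term, we obtain the required result.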

Extensions of Sherman's inequality
We start this section with an extension of Sherman's inequality to a more general class of n-convex functions.
If the reverse inequality in (3.1) holds, then also the reverse inequality in (3.2) holds.
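Proof Under the assumptions of the theorem, identity (2.7) holds. Since $\varphi$ is $n$-convex, we have $\varphi^{(n)} \ge 0$ on $[\alpha, \beta]$, and using assumption (3.1) in (2.7) we obtain (3.2).

Remark 2 Since we have $(-1)^{n-k-3} G_{n-2,k}(y, u) \ge 0$ by (2.6), in case $n - k$ is odd, instead of assumption (3.1), it is enough to assume that
$$\sum_{j=1}^{m} a_j G(x_j, y) - \sum_{i=1}^{l} b_i G(y_i, y) \ge 0, \quad y \in [\alpha, \beta].$$

The following extension of Sherman's inequality, under Sherman's condition of nonnegativity of vectors $a$, $b$, and matrix $A$, also holds.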
Remark 3 In case $n - k$ is even, the reverse inequality in (3.1) holds, i.e., the reverse inequality in (3.2) holds.
Theorem 3 Let all the assumptions of Theorem 2 be satisfied.
Proof (i) Under the assumptions, the nonnegativity of the right-hand side of (3.2) is obvious, i.e., the double inequality (3.3) holds.
(ii) The right-hand side of (3.2) can be written in the form $\sum_{j=1}^{m} a_j F(x_j) - \sum_{i=1}^{l} b_i F(y_i)$. Applying Sherman's inequality to $F$, we again get the nonnegativity of the right-hand side of (3.2), which is what we needed to prove.

Remark 4 Note that inequality (3.3) includes a new lower bound for Sherman's difference in the form
In particular, for $n = 4$, $k = 1$, the lower bound takes the form (3.6).

Using the notation $\|\cdot\|_p$ for the standard $p$-norm and applying the well-known Hölder inequality, we obtain the following result.
where $G(y) = \sum_{j=1}^{m} a_j G(x_j, y) - \sum_{i=1}^{l} b_i G(y_i, y)$ and (3.8)

Remark 5 In particular, if we set $l = m$ and $a_j = b_i$ for each $i, j = 1, \ldots, l$, then, as a direct consequence of the previous result, we obtain an extension of the majorization inequality.

Remark 6 By setting $l = 1$, $b = (1)$ in (3.7), we get, as a direct consequence, an extension of Jensen's inequality.
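The role of the function $G(y) = \sum_j a_j G(x_j, y) - \sum_i b_i G(y_i, y)$ can be illustrated numerically: for $\varphi \in C^2$, integrating $G(y)\,\varphi''(y)$ over $[\alpha, \beta]$ should reproduce Sherman's difference, and for nonnegative weights $G(y)$ itself should be nonnegative. The sketch below assumes the standard Green's function for $\varphi'' = 0$ with zero boundary values; all data, helper names, and the quadrature rule are hypothetical.

```python
from math import log

def green(x, y, alpha, beta):
    """Green's function of phi'' = 0 with zero boundary values (assumed standard form)."""
    if y <= x:
        return (x - beta) * (y - alpha) / (beta - alpha)
    return (y - beta) * (x - alpha) / (beta - alpha)

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    if hi == lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def sherman_kernel(t, x, a, y, b, alpha, beta):
    """G(t) = sum_j a_j G(x_j, t) - sum_i b_i G(y_i, t)."""
    return (sum(aj * green(xj, t, alpha, beta) for aj, xj in zip(a, x))
            - sum(bi * green(yi, t, alpha, beta) for bi, yi in zip(b, y)))

def integral_side(d2phi, x, a, y, b, alpha, beta):
    """Integrate G(t) * phi''(t) piecewise between the kinks of G."""
    knots = sorted(set([alpha, beta] + list(x) + list(y)))
    f = lambda t: sherman_kernel(t, x, a, y, b, alpha, beta) * d2phi(t)
    return sum(simpson(f, lo, hi) for lo, hi in zip(knots, knots[1:]))

# Hypothetical Sherman data: y = x A^T, a = b A with a row stochastic A.
alpha, beta = 1.0, 5.0
x = [1.0, 2.0, 5.0]
b = [0.3, 0.7]
A = [[0.2, 0.5, 0.3],
     [0.6, 0.1, 0.3]]
y = [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]
a = [sum(b[i] * A[i][j] for i in range(2)) for j in range(3)]

phi = lambda t: -log(t)          # convex on [alpha, beta]
d2phi = lambda t: 1.0 / t ** 2   # its second derivative
lhs = sum(aj * phi(xj) for aj, xj in zip(a, x)) - sum(bi * phi(yi) for bi, yi in zip(b, y))
rhs = integral_side(d2phi, x, a, y, b, alpha, beta)
kernel_nonneg = all(sherman_kernel(alpha + (beta - alpha) * i / 100, x, a, y, b, alpha, beta) >= -1e-12
                    for i in range(101))
print(lhs, rhs, kernel_nonneg)
```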

Applications in information theory
Throughout the rest of the paper, let $\alpha$, $\beta$ be real numbers such that $0 < \alpha < \beta$. By $X$ we denote a discrete random variable taking the values $x_1, x_2, \ldots$ with corresponding probabilities $p_1, p_2, \ldots$

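Shannon entropy [6] is defined by
$$H(X) = -\sum_{i=1}^{m} p_i \ln p_i.$$
It is well known that the maximum of the Shannon entropy is attained for the uniform distribution.

Proof Substituting $\xi_i$ in place of $x_i$ and $p_i$ in place of $a_i$ in (3.10), and choosing $\varphi(x) = -\ln x$, we obtain (4.1).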
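For concreteness, the Shannon entropy and its upper bound for a finite distribution can be computed as follows. This is a minimal sketch that assumes the natural logarithm (consistent with the choice $\varphi(x) = -\ln x$ in the proof above); the function name and the sample distribution are hypothetical.

```python
from math import log

def shannon_entropy(p):
    """Shannon entropy -sum p_i ln p_i of a finite distribution (natural log assumed)."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]
uniform = [1 / 3] * 3
print(shannon_entropy(p), shannon_entropy(uniform), log(3))
```

The uniform distribution on three points attains the value $\ln 3$, and any other distribution on three points stays below it.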
Proof Substituting $p_i^{\lambda-1}$ in place of $\xi_i$ in (4.1), we obtain the required result.
(ii) Substituting $\xi_i$ in place of $x_i$ and $p_i$ in place of $a_i$ in (3.10), and choosing $\varphi(x) = x^{\lambda}$, $\lambda \in (1, \infty)$, we obtain the required result.
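The substitutions above involve the powers $p_i^{\lambda-1}$ and $x^{\lambda}$, which are the building blocks of Rényi's entropy and of the information energy. The sketch below uses the standard textbook definitions, $H_\lambda = \frac{1}{1-\lambda} \ln \sum_i p_i^{\lambda}$ for $\lambda > 0$, $\lambda \ne 1$, and $\sum_i p_i^2$ for the information energy; these formulas are assumed here rather than quoted from the text, and the function names are hypothetical.

```python
from math import log

def renyi_entropy(p, lam):
    """Renyi entropy of order lam (lam > 0, lam != 1), natural log assumed."""
    if lam <= 0 or lam == 1:
        raise ValueError("order must be positive and different from 1")
    return log(sum(pi ** lam for pi in p if pi > 0)) / (1 - lam)

def information_energy(p):
    """Information energy sum p_i^2 (standard definition, assumed here)."""
    return sum(pi ** 2 for pi in p)

def shannon_entropy(p):
    """Shannon entropy with natural log, for comparison."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]
print(renyi_entropy(p, 2.0), information_energy(p), shannon_entropy(p))
```

For order 2 the Rényi entropy equals minus the logarithm of the information energy, and it does not exceed the Shannon entropy.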
Proof (i) Substituting p i in place of ξ i in (4.2) and taking into account that we get (4.4).
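Let $u = (u_1, \ldots, u_m)$, $v = (v_1, \ldots, v_m)$ be two positive probability distributions. The following measures are well known in information theory:
• Hellinger discrimination: $h^2(u, v) = \frac{1}{2} \sum_{i=1}^{m} \left(\sqrt{u_i} - \sqrt{v_i}\right)^2$
• $\chi^2$-divergence: $D_{\chi^2}(u, v) = \sum_{i=1}^{m} \frac{(u_i - v_i)^2}{v_i}$
• Triangular discrimination: $\Delta(u, v) = \sum_{i=1}^{m} \frac{(u_i - v_i)^2}{u_i + v_i}$

When $t = 0$ in the Zipf-Mandelbrot law, we get the so-called Zipf's law.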
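The three measures between probability distributions named above can be computed directly. The sketch below uses common textbook conventions, which are assumptions here: a factor $\frac{1}{2}$ in the Hellinger discrimination and division by the second argument in the $\chi^2$-divergence. Function names and sample data are hypothetical.

```python
from math import sqrt

def hellinger(u, v):
    """Hellinger discrimination, with the factor 1/2 convention (assumed)."""
    return 0.5 * sum((sqrt(ui) - sqrt(vi)) ** 2 for ui, vi in zip(u, v))

def chi_square(u, v):
    """Chi-square divergence sum (u_i - v_i)^2 / v_i (convention assumed)."""
    return sum((ui - vi) ** 2 / vi for ui, vi in zip(u, v))

def triangular(u, v):
    """Triangular discrimination sum (u_i - v_i)^2 / (u_i + v_i)."""
    return sum((ui - vi) ** 2 / (ui + vi) for ui, vi in zip(u, v))

u = [0.5, 0.5]
v = [0.25, 0.75]
print(hellinger(u, v), chi_square(u, v), triangular(u, v))
```

All three quantities vanish when the two distributions coincide, and only the $\chi^2$-divergence is asymmetric in its arguments.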
The Zipf-Mandelbrot law, as well as Zipf's law, has wide applications in many branches of science, such as linguistics [15], information sciences [16,17], and ecological field studies [18]. For more information, see also [15,19].
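As an illustration, the Zipf-Mandelbrot probability mass function can be tabulated as follows. The sketch assumes the usual parametrization $f(i) = \frac{1}{(i+t)^{s} H_{N,t,s}}$ with $H_{N,t,s} = \sum_{k=1}^{N} (k+t)^{-s}$; the parameter names $N$ and $s$ and the function name are assumptions, while $t = 0$ gives Zipf's law as stated above.

```python
def zipf_mandelbrot(N, t, s):
    """Probabilities f(i) = 1 / ((i + t)^s * H), i = 1..N, with H the normalizing sum.
    Parameter names N, t, s follow a common convention and are assumed here."""
    if N < 1 or t < 0 or s <= 0:
        raise ValueError("require N >= 1, t >= 0, s > 0")
    weights = [(i + t) ** (-s) for i in range(1, N + 1)]
    H = sum(weights)
    return [w / H for w in weights]

zipf = zipf_mandelbrot(3, 0.0, 1.0)        # Zipf's law: t = 0
shifted = zipf_mandelbrot(3, 2.0, 1.0)     # a positive shift flattens the distribution
print(zipf, shifted)
```

The resulting list is an ordinary probability vector, so it can be passed to any entropy or divergence function that accepts a finite distribution.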

Conclusions
In this paper we have given generalized results for Sherman's inequality by considering the class of convex functions of higher order. We obtained an extended weighted majorization inequality as well as Jensen's inequality as special cases directly connected to information theory. We used the obtained results to derive new estimates for Shannon's and Rényi's entropy, information energy, and some well-known measures between probability distributions. Using the Zipf-Mandelbrot law, we introduced new functionals to derive some related results.