About the sharpness of the Jensen inequality

The main aim of this paper is to give an improvement of a recent result on the sharpness of the Jensen inequality. The results given here are obtained using different Green functions and considering the case of a real Stieltjes measure, not necessarily positive. Finally, some applications involving various types of f-divergences and the Zipf–Mandelbrot law are presented.


Introduction
The Jensen inequality is one of the most famous and most important inequalities in mathematical analysis.
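For the reader's convenience, we recall its integral form: if $(\Omega, \mathcal{A}, \mu)$ is a probability space, $f : \Omega \to I$ is $\mu$-integrable, and $\varphi : I \to \mathbb{R}$ is convex, then
$$\varphi\left(\int_\Omega f \, d\mu\right) \le \int_\Omega \varphi(f) \, d\mu.$$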
In [2], some estimates of the sharpness of the Jensen inequality are given. In particular, the difference between the right-hand and the left-hand side of the Jensen inequality is estimated there, where $\varphi$ is a convex function of class $C^2$.
The authors in [2] expanded $\varphi(f(x))$ around a value $c = f(x_0)$, chosen arbitrarily so that it lies in the interior of the domain $I$ of $\varphi$, and as their first result they obtained inequalities that bound this difference in terms of $\varphi''$.
The main aim of our paper is to give an improvement of that result using various Green functions and considering the case of the real Stieltjes measure, not necessarily positive.
All the functions $G_k$ ($k = 0, 1, 2, 3, 4$) defined in (1)-(5) are convex and continuous with respect to both $s$ and $t$.
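A model example of a function of this type (stated here only as an illustration; the precise definitions are those in (1)-(5)) is the classical Green function of the two-point boundary value problem on $[\alpha, \beta]$,
$$G(s,t) = \begin{cases} \dfrac{(s-\beta)(t-\alpha)}{\beta-\alpha}, & \alpha \le t \le s, \\[4pt] \dfrac{(t-\beta)(s-\alpha)}{\beta-\alpha}, & s \le t \le \beta, \end{cases}$$
which is nonpositive, continuous, and convex in each variable on $[\alpha, \beta]^2$.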
Proof By integrating by parts, we obtain the first identity; the other identities are proved analogously.
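As a quick numerical sanity check of representations of this type, the following sketch (our own; the test function, interval, and sample points are illustrative) verifies that the classical Green function displayed above reproduces $\varphi$ from its boundary values and $\varphi''$:

```python
# Check the representation
#   phi(s) = ((beta - s)*phi(alpha) + (s - alpha)*phi(beta)) / (beta - alpha)
#            + integral_alpha^beta G(s, t) * phi''(t) dt
# for the classical two-point Green function G (illustrative choices throughout).
import numpy as np
from scipy.integrate import quad

alpha, beta = 0.0, 2.0

def G(s, t):
    """Classical Green function on [alpha, beta] x [alpha, beta]."""
    if t <= s:
        return (s - beta) * (t - alpha) / (beta - alpha)
    return (t - beta) * (s - alpha) / (beta - alpha)

phi = np.exp      # a convex C^2 test function
phi2 = np.exp     # its second derivative: (exp)'' = exp

for s in (0.3, 1.0, 1.7):
    linear = ((beta - s) * phi(alpha) + (s - alpha) * phi(beta)) / (beta - alpha)
    integral, _ = quad(lambda t: G(s, t) * phi2(t), alpha, beta, points=[s])
    assert abs(phi(s) - (linear + integral)) < 1e-8
print("representation verified at the sampled points")
```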
Remark 1 The result (7) given in the previous lemma represents a special case of the representation of the function ϕ using the so-called "two-point right focal" interpolating polynomial in the case where n = 2 and p = 0 (see [1]).
Using the results from the previous lemma, the authors in [16] and [17] gave a uniform treatment of the Jensen-type inequalities, allowing the measure to be real, not necessarily positive. In this paper, we give some further results in this direction.
Remark 2 We can get the same result using the Lagrange mean-value theorems from [16] and [17], which state that the corresponding conclusion holds for the functions $g$, $\varphi$, $\lambda$, $G_k$ ($k = 0, 1, 2, 3, 4$) defined as in the previous theorem whenever inequality (14), or the reverse inequality in (14), holds for all $s \in [\alpha, \beta]$.

The next result represents an improvement of the aforementioned result from [2]. Let $g$, $\varphi$, and $\lambda$ be as above, and let the Green functions $G_k$ be defined in (1)-(5). Let $x_0 \in [a, b]$ be arbitrarily chosen, and let $g(x_0) = c$. If for some $k \in \{0, 1, 2, 3, 4\}$ inequality (14), or the reverse inequality in (14), holds for all $s \in [\alpha, \beta]$, then the estimate (15) holds.

Proof Let $x_0 \in [a, b]$ be arbitrarily chosen, and let $g(x_0) = c$. The claim then follows from the representation of $\varphi$ given in Lemma 1.

Under the assumptions of the previous corollary, applying (22) in (15) and then using the triangle inequality, we get the corresponding estimate.
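To indicate the shape of this last estimate (a schematic only: the notation $\bar g$ for the $\lambda$-mean of $g$ is ours, and the two terms on the right are the quantities bounded by (15) and (22)), the triangle inequality gives
$$\left| \frac{\int_a^b \varphi(g(x)) \, d\lambda(x)}{\int_a^b d\lambda(x)} - \varphi(c) \right| \le \left| \frac{\int_a^b \varphi(g(x)) \, d\lambda(x)}{\int_a^b d\lambda(x)} - \varphi(\bar g) \right| + \left| \varphi(\bar g) - \varphi(c) \right|.$$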

Discrete case
The discrete Jensen inequality states that for a convex function $\varphi : I \to \mathbb{R}$, $I \subseteq \mathbb{R}$, an $n$-tuple $x = (x_1, \dots, x_n) \in I^n$ ($n \ge 2$), and a nonnegative $n$-tuple $r = (r_1, \dots, r_n)$ such that $\sum_{i=1}^n r_i > 0$, we have
$$\varphi\left(\frac{\sum_{i=1}^n r_i x_i}{\sum_{i=1}^n r_i}\right) \le \frac{\sum_{i=1}^n r_i \varphi(x_i)}{\sum_{i=1}^n r_i}.$$
In [16] and [17], we have a generalization of that result: the weights $r_i$ are allowed to be negative, with their sum different from $0$, under an additional condition on $r_i$, $x_i$ given in terms of the Green functions $G_k : [\alpha, \beta] \times [\alpha, \beta] \to \mathbb{R}$ defined in (1)-(5).
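The mechanism behind such conditions can be illustrated numerically. In the sketch below (entirely our own setup: the classical Green function displayed earlier, together with illustrative signed weights and nodes), the discrete Jensen difference is nonnegative because the Green-kernel combination playing the role of the condition is nonnegative for all $s$:

```python
# With signed weights r_i (sum != 0), the discrete Jensen difference
#   (1/R_n) * sum_i r_i*phi(x_i) - phi(xbar),  xbar = (1/R_n) * sum_i r_i*x_i,
# equals the integral of [ (1/R_n)*sum_i r_i*G(x_i, s) - G(xbar, s) ] * phi''(s),
# so it is nonnegative whenever that kernel is (illustrative setup, classical G).
import numpy as np

alpha, beta = 0.0, 2.0

def G(s, t):
    if t <= s:
        return (s - beta) * (t - alpha) / (beta - alpha)
    return (t - beta) * (s - alpha) / (beta - alpha)

r = np.array([1.5, -0.5, 1.0])   # signed weights, R_n = 2 != 0
x = np.array([0.2, 1.0, 1.8])    # nodes in [alpha, beta]
Rn = r.sum()
xbar = (r @ x) / Rn

s_grid = np.linspace(alpha, beta, 2001)
kernel = np.array([sum(ri * G(xi, s) for ri, xi in zip(r, x)) / Rn - G(xbar, s)
                   for s in s_grid])

phi = np.exp                      # convex, phi'' > 0
jensen_diff = (r @ phi(x)) / Rn - phi(xbar)
print("kernel >= 0 on the grid:", bool((kernel >= -1e-12).all()))
print("Jensen difference:", jensen_diff)   # positive here, as predicted
```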
To simplify the notation, we denote $R_n = \sum_{i=1}^n r_i$ and $\bar{x} = \frac{1}{R_n} \sum_{i=1}^n r_i x_i$. As we already know (from Lemma 1) how to represent every function $\varphi \in C^2[\alpha, \beta]$ in an adequate form using the previously defined functions $G_k$ ($k = 0, 1, 2, 3, 4$), by some calculation it is easy to show the discrete identity (24). Similarly to the integral case, applying the Hölder inequality to (24), we get the following result for the Green functions $G_k$ defined in (1)-(5) and exponents $p, q \in \mathbb{R}$, $1 \le p, q \le \infty$, such that $\frac{1}{p} + \frac{1}{q} = 1$. Setting $q = 1$ and $p = \infty$, if the positivity of the term $\frac{1}{R_n} \sum_{i=1}^n r_i G_k(x_i, s) - G_k(\bar{x}, s)$, or the reverse inequality in (27), holds for all $s \in [\alpha, \beta]$, then the corresponding one-sided bounds follow. Finally, let $c \in [a, b] \subseteq [\alpha, \beta]$ be arbitrarily chosen; then we have the following result.
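Schematically (in our notation, with the exact constants those of (24)), the Hölder step has the form
$$\left| \frac{1}{R_n} \sum_{i=1}^n r_i \varphi(x_i) - \varphi(\bar{x}) \right| = \left| \int_\alpha^\beta \left( \frac{1}{R_n} \sum_{i=1}^n r_i G_k(x_i, s) - G_k(\bar{x}, s) \right) \varphi''(s) \, ds \right| \le \left\| \frac{1}{R_n} \sum_{i=1}^n r_i G_k(x_i, \cdot) - G_k(\bar{x}, \cdot) \right\|_p \left\| \varphi'' \right\|_q,$$
since the affine part of the representation of $\varphi$ cancels in the Jensen difference.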
Some applications

Applications to Csiszár f-divergence
Divergences between probability distributions have been introduced to measure the difference between them. Many different types of divergences exist, for example, the f-divergence, the Rényi divergence, the Jensen-Shannon divergence, and so on (see, e.g., [8] and [18]). Divergences have numerous applications in many fields: anthropology and genetics, economics, ecological studies, music, signal processing, and pattern recognition. The Jensen inequality plays an important role in obtaining inequalities for divergences between probability distributions, and there are many papers dealing with inequalities for divergences and entropies (see, e.g., [7, 9, 12]).
In this section, we give some applications of our results, and we first introduce the basic notions.
Csiszár [3, 4] defined the f-divergence functional of two positive $n$-tuples $\mathbf{p}$ and $\mathbf{q}$ by $D_f(\mathbf{p}, \mathbf{q}) := \sum_{i=1}^n q_i f\left(\frac{p_i}{q_i}\right)$ for a convex function $f$. This definition can be generalized to a function $f : I \to \mathbb{R}$, $I \subset \mathbb{R}$, with $\frac{p_i}{q_i} \in I$ for $i = 1, \dots, n$, as follows (see also [7]).

Definition 2
Let $I \subset \mathbb{R}$ be an interval, and let $f : I \to \mathbb{R}$ be a function. Let $\mathbf{p} := (p_1, \dots, p_n) \in \mathbb{R}^n$ and $\mathbf{q} := (q_1, \dots, q_n) \in (0, \infty)^n$ be such that $\frac{p_i}{q_i} \in I$ for $i = 1, \dots, n$. Then let
$$\hat{D}_f(\mathbf{p}, \mathbf{q}) := \sum_{i=1}^n q_i f\left(\frac{p_i}{q_i}\right).$$
Now we apply Theorem 2 to $\hat{D}_f(\mathbf{p}, \mathbf{q})$ and get the following result.
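As a small computational illustration of the functional just defined (the helper name and the input vectors are our own), the choice $f(t) = (t-1)^2$ gives the Pearson $\chi^2$-divergence:

```python
# Generalized f-divergence from Definition 2:
#   D_f(p, q) = sum_i q_i * f(p_i / q_i),  with q_i > 0 and p_i/q_i in I.
import numpy as np

def f_divergence(f, p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.any(q <= 0):
        raise ValueError("q must be strictly positive")
    return float(np.sum(q * f(p / q)))

p = [0.2, 0.5, 0.3]                                  # illustrative inputs
q = [0.3, 0.3, 0.4]
chi2 = f_divergence(lambda t: (t - 1.0) ** 2, p, q)  # = sum (p_i - q_i)^2 / q_i
print(chi2)
```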
The second one corresponds to the relative entropy or Kullback-Leibler divergence between two probability distributions.
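In the notation of Definition 2, this is the standard choice $f(t) = t \log t$, for which
$$\hat{D}_f(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^n q_i \frac{p_i}{q_i} \log \frac{p_i}{q_i} = \sum_{i=1}^n p_i \log \frac{p_i}{q_i} = D_{\mathrm{KL}}(\mathbf{p} \, \| \, \mathbf{q}).$$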

Applications to Zipf-Mandelbrot law
The forthcoming results deal with the so-called Zipf-Mandelbrot law. George Kingsley Zipf (1902-1950) was a linguist who investigated the frequencies of different words in texts. The Zipf law is one of the basic laws in information science, bibliometrics, and linguistics (see [5]). In certain fields, like economics and econometrics, this distribution is known as Pareto's law, where it describes the distribution of wealth among the richest members of a community (see [5], p. 125). In the mathematical sense, these two laws are the same; the only difference is that they are applied in different contexts (see [6], p. 294). The same kind of distribution can also be found in other scientific disciplines, such as physics, biology, earth and planetary sciences, computer science, demography, and social sciences. For more information, we refer to [15].
Recall that the Zipf-Mandelbrot law is a discrete probability distribution with probability mass function
$$f(i; N, t, s) = \frac{1}{(i + t)^s H_{N,t,s}}, \quad i = 1, \dots, N,$$
where $N \in \mathbb{N}$, $t \ge 0$, $s > 0$, and $H_{N,t,s} = \sum_{j=1}^N \frac{1}{(j + t)^s}$. When $t = 0$, the Zipf-Mandelbrot law becomes the Zipf law. Now we apply our results for the distributions given in Theorem 3 to the Zipf-Mandelbrot law.
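For concreteness, here is a short sketch (our own helper; the parameter values are illustrative) of the Zipf-Mandelbrot law and of the Kullback-Leibler divergence between two such laws:

```python
# Zipf-Mandelbrot pmf: f(i; N, t, s) = 1 / ((i + t)^s * H_{N,t,s}),
# where H_{N,t,s} = sum_{j=1}^{N} 1 / (j + t)^s is the normalizing constant.
import numpy as np

def zipf_mandelbrot(N, t, s):
    i = np.arange(1, N + 1)
    w = 1.0 / (i + t) ** s
    return w / w.sum()                     # division by H_{N,t,s}

p = zipf_mandelbrot(N=100, t=1.5, s=1.2)   # illustrative parameters
q = zipf_mandelbrot(N=100, t=0.0, s=1.2)   # t = 0 recovers the plain Zipf law
kl = float(np.sum(p * np.log(p / q)))      # Kullback-Leibler divergence
print(p.sum(), kl)                         # p sums to 1; kl >= 0
```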
Here $\hat{D}_{\mathrm{id} \cdot f}(\mathbf{p}_1, \mathbf{p}_2)$ denotes the functional of Definition 2 taken with $\mathrm{id} \cdot f$ in place of $f$, where $\mathrm{id}$ is the identity function. Although it is a particular case of the result just given, here we also present a result for the Shannon entropy. Let the Green functions $G_k$ be defined in (1)-(5), and let $p, q \in \mathbb{R}$, $1 \le p, q \le \infty$, be such that $\frac{1}{p} + \frac{1}{q} = 1$. Then the analogous estimate holds for the Shannon entropy.
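For completeness (a standard computation in our notation, not a statement from the cited results): the Shannon entropy $H(\mathbf{p}) = -\sum_{i=1}^n p_i \log p_i$ of a positive probability $n$-tuple $\mathbf{p}$ is recovered from the functional of Definition 2 with the unit weight vector, since for $f(t) = t \log t$,
$$\hat{D}_f(\mathbf{p}, (1, \dots, 1)) = \sum_{i=1}^n p_i \log p_i = -H(\mathbf{p}).$$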