Generalized Jensen and Jensen–Mercer inequalities for strongly convex functions with applications
Journal of Inequalities and Applications volume 2024, Article number: 112 (2024)
Abstract
Strongly convex functions, a subclass of convex functions equipped with stronger properties, are employed to obtain several generalizations and improvements of the Jensen inequality and the Jensen–Mercer inequality. This paper additionally provides applications of the main results in the form of new estimates for so-called strong f-divergences: the Csiszár f-divergence for strongly convex functions f, together with its particular cases (the Kullback–Leibler divergence, \(\chi ^{2}\)-divergence, Hellinger divergence, Bhattacharya distance, Jeffreys distance, and Jensen–Shannon divergence). Furthermore, new estimates for the Shannon entropy are obtained, and new Chebyshev-type inequalities are derived.
1 Introduction
One of the extensions of convexity developed in the last century is the class of strongly convex functions, a subclass of convex functions (see [20] and, for more recent contributions, [10, 11, 18]).
Let us recall that a function \(f\colon [a,b]\subseteq \mathbb{R}\rightarrow \mathbb{R} \) is strongly convex with modulus \(c>0\) if
$$f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y)-c\lambda (1-\lambda )(x-y)^{2} \tag{1.1}$$
for all \(x,y\in \lbrack a,b]\) and \(\lambda \in \lbrack 0,1]\).
A function f that satisfies (1.1) with \(c=0\), i.e.,
$$f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y),$$
is convex in the usual sense. Obviously, strong convexity implies convexity, but the reverse implication is not true in general. For example, a linear function is convex but is not strongly convex.
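The distinction can be observed numerically. The following sketch (the choices of f, modulus, and grid are ours, not from the paper) verifies inequality (1.1), i.e. \(f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y)-c\lambda (1-\lambda )(x-y)^{2}\), for \(f(x)=x^{2}\), which is strongly convex with modulus \(c=1\), and shows that a linear function fails (1.1) for every \(c>0\):

```python
# Check the strong convexity inequality (1.1):
#   f(l*x + (1-l)*y) <= l*f(x) + (1-l)*f(y) - c*l*(1-l)*(x-y)**2
# Illustrative sketch only; f, c, and the grid are our own choices.

def strongly_convex_gap(f, c, x, y, lam):
    """Right side minus left side of (1.1); nonnegative iff (1.1) holds."""
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y) - c * lam * (1 - lam) * (x - y) ** 2
    return rhs - lhs

lams = [i / 10 for i in range(11)]
pts = [-2 + 4 * i / 10 for i in range(11)]

# f(x) = x^2 satisfies (1.1) with c = 1 (in fact with equality).
assert all(strongly_convex_gap(lambda t: t * t, 1.0, x, y, l) >= -1e-12
           for x in pts for y in pts for l in lams)

# A linear function is convex (c = 0 works) but not strongly convex:
# for any c > 0 the inequality already fails at x = 0, y = 1, lambda = 1/2.
assert strongly_convex_gap(lambda t: 3 * t + 1, 0.0, 0.0, 1.0, 0.5) >= -1e-12
assert strongly_convex_gap(lambda t: 3 * t + 1, 0.1, 0.0, 1.0, 0.5) < 0
```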
Compared with convex functions, strongly convex functions possess stronger versions of the analogous properties. One of their useful characterizations is given in the following lemma (see [23, p. 268], [11, 20], and the references therein).
Lemma 1
A function \(f\colon [a,b]\rightarrow \mathbb{R} \) is strongly convex with modulus \(c>0\) iff the function \(g\colon \lbrack a,b]\rightarrow \mathbb{R} \) defined by \(g(x)=f(x)-cx^{2}\) is convex.
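Lemma 1 lends itself to a quick numerical illustration; the following sketch uses our own choices of f, interval, and grid. For \(f(t)=e^{t}\) on \([0,1]\) we have \(f^{\prime \prime}(t)=e^{t}\geq 1\), so f is strongly convex there with modulus \(c=1/2\), and \(g(t)=f(t)-ct^{2}\) should pass a discrete midpoint-convexity test:

```python
# Lemma 1 sketch: f strongly convex with modulus c  <=>  g(x) = f(x) - c*x^2
# convex. Here f(t) = e^t on [0,1], where f'' = e^t >= 1, so modulus c = 1/2.
import math

c = 0.5
f = math.exp
g = lambda t: f(t) - c * t * t

pts = [i / 100 for i in range(101)]
# Midpoint convexity of g on the grid: g((x+y)/2) <= (g(x)+g(y))/2.
assert all(g((x + y) / 2) <= (g(x) + g(y)) / 2 + 1e-12
           for x in pts for y in pts)
```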
We further use the well-known theorem proved by Stolz [19, p. 25].
Theorem 1
(Stolz)
Let \(f\colon [a,b]\rightarrow \mathbb{R} \) be a convex function. Then f is continuous on \((a,b)\) and has finite left and right derivatives at each point of \((a,b)\). Both \(f_{-}^{\prime}\) and \(f_{+}^{\prime}\) are nondecreasing on \((a,b)\). Moreover, for all \(x,y\in (a,b)\), \(x< y\), we have
$$f_{+}^{\prime}(x)\leq \frac{f(y)-f(x)}{y-x}\leq f_{-}^{\prime}(y).$$
Strongly convex functions are accompanied by the corresponding Jensen inequality, which was proved in [20].
Theorem 2
Let a function \(f\colon (a,b)\rightarrow \mathbb{R} \) be strongly convex with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) with \(\bar{x}=\frac{1}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\). Then
$$f(\bar{x})\leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}-\bar{x})^{2}. \tag{1.3}$$
It is easily seen that for \(c=0\), inequality (1.3) becomes the Jensen inequality for convex functions:
$$f(\bar{x})\leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}). \tag{1.4}$$
Inequality (1.3) provides a better upper bound for \(f\left ( \bar{x}\right ) \) because of the nonnegativity of the term \(\frac{c}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}(x_{i}-\bar{x})^{2}\). Thus (1.3) is an improvement of (1.4) and is considered as its stronger variant.
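The refinement in (1.3), namely \(f(\bar{x})\leq \frac{1}{A_{n}}\sum a_{i}f(x_{i})-\frac{c}{A_{n}}\sum a_{i}(x_{i}-\bar{x})^{2}\), is easy to observe numerically. The sketch below uses our own test data: \(f(t)=e^{t}\) on \((0,1)\), where \(f^{\prime \prime}\geq 1\) gives modulus \(c=1/2\), and random weights:

```python
# Strong Jensen inequality (1.3) vs ordinary Jensen (1.4), numerically.
# Test data (f, c, weights, nodes) are our own illustrative choices.
import math
import random

random.seed(0)
f, c = math.exp, 0.5                      # f'' = e^t >= 1 on (0,1) => c = 1/2
x = [random.random() for _ in range(8)]   # nodes in (0,1)
a = [random.random() + 0.1 for _ in range(8)]  # positive weights
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A

jensen_rhs = sum(ai * f(xi) for ai, xi in zip(a, x)) / A              # (1.4)
refinement = (c / A) * sum(ai * (xi - xbar) ** 2 for ai, xi in zip(a, x))
strong_rhs = jensen_rhs - refinement                                  # (1.3)

# (1.3) holds and its bound is never worse than (1.4).
assert f(xbar) <= strong_rhs <= jensen_rhs
```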
Another Jensen-type inequality was established by Mercer [17]. Given a convex function \(f\colon (a,b)\rightarrow \mathbb{R} \) with \(m,M\in (a,b)\), \(m< M\), for \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\) and a nonnegative n-tuple \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) with \(\bar{x}=\frac{1}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\), the Jensen–Mercer inequality states that
$$f(m+M-\bar{x})\leq f(m)+f(M)-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}). \tag{1.5}$$
Numerous improvements and generalizations of (1.5) have been obtained since. Here we accentuate two such results. In [15] the authors proved that for a convex function \(f\colon (a,b)\rightarrow \mathbb{R}\), \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\), where \(m,M\in (a,b)\), \(m< M\), and a nonnegative n-tuple \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\), we have the inequalities
for all \(c,d\in \lbrack m,M]\).
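The Jensen–Mercer inequality (1.5), \(f(m+M-\bar{x})\leq f(m)+f(M)-\frac{1}{A_{n}}\sum a_{i}f(x_{i})\), can be sanity-checked numerically; the convex function and the tuples below are our own test data, not from the paper:

```python
# Numerical check of the Jensen-Mercer inequality (1.5) for a convex f.
# Test data are our own illustrative choices.
import random

random.seed(1)
f = lambda t: t * t           # convex on the whole real line
m, M = -1.0, 2.0
x = [random.uniform(m, M) for _ in range(10)]
a = [random.random() + 0.1 for _ in range(10)]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A

lhs = f(m + M - xbar)
rhs = f(m) + f(M) - sum(ai * f(xi) for ai, xi in zip(a, x)) / A
assert lhs <= rhs + 1e-12     # (1.5) holds
```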
Furthermore, the following variant of the Jensen–Mercer inequality was proved in [18] for strongly convex functions.
Theorem 3
Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function, and let \(m,M\in (a,b)\), \(m< M\). Let \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\), and let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple such that \({\textstyle \sum \nolimits _{i=1}^{n}} a_{i}=1\) with \(\bar{x}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\). Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then
For some recent results on the Jensen–Mercer inequality, see [1–3, 9, 12–14, 16, 24].
With the aim of providing new improvements and elaborating on existing results, the paper is organized in five sections. In Section 1, we recall a few results needed later: some on strongly convex functions and some well-known ones concerning convex functions. Sections 2 and 3 deal with the Jensen and Jensen–Mercer inequalities, both generalized by means of strongly convex functions. In Sect. 4, we discuss applications to Csiszár strong f-divergences introduced in [10], for which we provide new estimates and their particular types in the same manner. We also derive new estimates for the Shannon entropy. Section 5 deals with new Chebyshev-type inequalities.
2 The Jensen-type inequalities
We start this section with important properties of strongly convex functions, which are direct consequences of the characterizations given in Lemma 1 and Theorem 1.
Lemma 2
Let \(f\colon [a,b]\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Then it is continuous on \((a,b)\) and has finite left and right derivatives at each point of \((a,b)\). Both \(f_{-}^{\prime}\) and \(f_{+}^{\prime}\) are nondecreasing on \((a,b)\). Moreover, for all \(x,y\in (a,b)\), \(x< y\), we have
$$f_{+}^{\prime}(x)+c(y-x)\leq \frac{f(y)-f(x)}{y-x}\leq f_{-}^{\prime}(y)-c(y-x). \tag{2.1}$$
If f is differentiable, then \(f^{\prime}\) is strongly increasing on \((a,b)\), i.e., for all \(x,y\in (a,b)\), \(x< y\),
$$f^{\prime}(y)-f^{\prime}(x)\geq 2c(y-x). \tag{2.2}$$
Proof
Let id denote the identity function, i.e., \(id(t)=t\) for all \(t\in \lbrack a,b]\). Since f is strongly convex with modulus \(c>0\), the function \(g=f-c\cdot id^{2}\) is convex. Now, as an easy consequence of Theorem 1 applied to the convex function \(g=f-c\cdot id^{2}\), we get the first part of the statement.
If f is differentiable, then \(f^{\prime}(x)=f_{-}^{\prime}(x)=f_{+}^{\prime }(x)\) and \(f^{\prime}(y)=f_{-}^{\prime}(y)=f_{+}^{\prime}(y)\), and (2.1) implies (2.2). □
Bearing in mind the statement of the previous lemma, for a strongly convex function \(f\colon [a,b]\rightarrow \mathbb{R} \), by \(f^{\prime}(x)\), \(x\in (a,b)\), we mean any element of the interval \([f_{-}^{\prime}(x),f_{+}^{\prime }(x)]\). If f is differentiable, then \(f^{\prime}(x)=f_{-}^{\prime}(x)=f_{+}^{\prime}(x)\).
Furthermore, for a strongly convex function \(f\colon [a,b]\rightarrow \mathbb{R} \) with modulus \(c>0\), we have
$$f(y)\geq f(x)+f^{\prime}(x)(y-x)+c(y-x)^{2} \tag{2.3}$$
for all \(x,y\in (a,b)\). This inequality is an easy consequence of the characterization of convex functions via support lines (see [21, Theorem 1.6]) applied to the convex function \(g=f-c\cdot id^{2}\).
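The support-line inequality for strongly convex functions, \(f(y)\geq f(x)+f^{\prime}(x)(y-x)+c(y-x)^{2}\), can be verified on a grid; the sketch below again uses \(f(t)=e^{t}\) on \([0,1]\) with modulus \(c=1/2\) (our own test choice):

```python
# Support-line inequality for a strongly convex function:
#   f(y) >= f(x) + f'(x)*(y - x) + c*(y - x)**2.
# Sketch with f(t) = e^t on [0,1], where f'' = e^t >= 1 gives modulus c = 1/2.
import math

f, fprime, c = math.exp, math.exp, 0.5
pts = [i / 50 for i in range(51)]
assert all(f(y) >= f(x) + fprime(x) * (y - x) + c * (y - x) ** 2 - 1e-12
           for x in pts for y in pts)
```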
A generalization and an improvement of Jensen’s inequality (1.3) for strongly convex functions are given in the following theorem.
Theorem 4
Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\). Let \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\) and \(\hat {x}_{i}=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}\), \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then
Proof
Applying the triangle inequality \(\left \vert \left \vert u\right \vert -\left \vert v\right \vert \right \vert \leq \left \vert u-v\right \vert \) to (2.3), we get
Setting \(y=\hat{x}_{i}\) and \(x=x_{i}\), \(i\in \{1,\ldots,n\}\), from (2.5) we have
Now multiplying by \(a_{i}\), summing over i, \(i=1,\ldots,n\), and then dividing by \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), we get
By the triangle inequality (\(\left \vert \sum _{i=1}^{n}a_{i}z_{i}\right \vert \leq \sum _{i=1}^{n}a_{i}\left \vert z_{i}\right \vert \)), we also have
Now combining (2.6) and (2.7), we get (2.4). □
The following corollary is a direct consequence of Theorem 4.
Corollary 1
Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then
Proof
Setting \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (2.4) we get
Note that
Now combining (2.9) and (2.10), we get (2.8). □
Finally, in a similar manner, we obtain an inequality that serves as a counterpart to the Jensen inequality (1.3).
Theorem 5
Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then
Proof
With (2.3) slightly modified, for \(x=(1-\lambda _{i})\bar {x}+\lambda _{i}x_{i }\) and \(y=y_{i}\), \(i\in \{1,\ldots,n\}\), we have
Now multiplying by \(a_{i}\), summing over i, \(i=1,\ldots,n\), and then dividing by \(A_{n}>0\), we get
which is equivalent to (2.11). □
Again, a direct consequence of Theorem 5 follows by setting \(\lambda _{i}=0\) for \(i=1,\ldots,n\).
Corollary 2
Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then
Remark 1
Our results generalize and improve the main results obtained in [7, 8], which were related to convex functions.
3 The Jensen–Mercer-type inequalities
We embark on further investigation of the Jensen–Mercer inequality (1.5). Along the way, we generalize and improve results (1.6) from [15] and (1.7) from [18].
Theorem 6
Let a function \(f\colon (a,b)\rightarrow \mathbb{R} \) be strongly convex with modulus \(c>0\), and let \(m,M\in (a,b)\), \(m< M\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Suppose \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then
for all \(d,e\in \lbrack m,M]\).
Proof
Let \(\lambda _{i}\in \lbrack 0,1]\), \(x_{i}\in \lbrack m,M]\), and \(y_{i}=m+M-x_{i}\), \(i\in \{1,\ldots,n\}\). Then \(x_{i}\) and \(y_{i}\) can be written as convex combinations of m and M:
Applying (1.1) twice, we have
Further, applying (2.3), we get
which, combined with the previous inequality, implies
Furthermore, for \(x_{i},e\in \lbrack m,M]\), \(i\in \{1,\ldots,n\}\), by (2.3) we have
Using (3.3), we have
Since \(f^{\prime}\) is strongly increasing and \(x_{i}\leq M\), by (2.2) we have \(-f^{\prime}(M)\leq -f^{\prime}(x_{i})-2c(M-x_{i})\), i.e.,
Combining (3.4) and (3.5), we get
Finally, from (3.2) and (3.6) we have
Multiplying it by \(a_{i}\), summing over \(i,i=1,\ldots,n\), and then dividing by \(A_{n}>0\), we get (3.1). □
Remark 2
In particular, if we set \(A_{n}=1\) and \(d=m+M-\bar{x}\), then the first inequality in (3.1) reduces to (1.7) from [18]; hence (3.1) generalizes it. Furthermore, our result (3.1) improves (1.6) from [15].
As an easy consequence of the previous theorem, we get the following inequality of the Jensen–Mercer type.
Corollary 3
Let the assumptions of Theorem 6 hold. Then
Proof
Choosing \(e=\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\) and \(d=m+M-\bar{x}\), from (3.1) we get (3.7). □
4 Applications to strong f-divergences and the Shannon entropy
Let \(\mathcal{P}_{n}=\left \{ \mathbf{p}=(p_{1},\ldots,p_{n})\colon p_{1},\ldots,p_{n}>0,{\textstyle \sum \nolimits _{i=1}^{n}} p_{i}=1\right \} \) be the set of all complete finite discrete probability distributions. The restriction to positive distributions is only for convenience. If we take \(p_{i}=0\) for some \(i\in \left \{ 1,\ldots,n\right \} \), then in the following results, we need to interpret undefined expressions as \(f(0)=\lim _{t\rightarrow 0+}f(t)\), \(0f\left ( \frac{0}{0}\right ) =0\), and \(0f\left ( \frac{e}{0}\right ) =\lim _{\varepsilon \rightarrow 0+}\varepsilon f \left ( \dfrac{e}{\varepsilon}\right ) =e\lim _{t\rightarrow \infty} \frac{f(t)}{t}\), \(e>0\).
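The conventions above translate directly into code. The following sketch computes \(\sum _{i}p_{i}f(q_{i}/p_{i})\) with explicit handling of the zero cases; `f_divergence`, `f_at_0`, and `slope_at_inf` are hypothetical names of our own, not from the paper or any library:

```python
# Sketch of an f-divergence computation with the stated zero conventions:
#   0*f(0/0) = 0   and   0*f(e/0) = e * lim_{t->inf} f(t)/t.
# The limits depend on the generator f, so they are passed in by the caller.
import math

def f_divergence(f, q, p, f_at_0=None, slope_at_inf=None):
    total = 0.0
    for qi, pi in zip(q, p):
        if pi == 0 and qi == 0:
            continue                      # 0 * f(0/0) = 0
        if pi == 0:
            total += qi * slope_at_inf    # 0 * f(q_i/0) = q_i * lim f(t)/t
        elif qi == 0:
            total += pi * f_at_0          # p_i * f(0), f(0) = lim_{t->0+} f(t)
        else:
            total += pi * f(qi / pi)
    return total

# Kullback-Leibler generator f(t) = t*ln t: f(0+) = 0.
kl = f_divergence(lambda t: t * math.log(t),
                  [0.5, 0.5, 0.0], [0.25, 0.25, 0.5], f_at_0=0.0)
assert kl > 0     # q differs from p, so the divergence is strictly positive
```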
I. Csiszár [5] introduced an important class of statistical divergences by means of convex functions.
Definition 1
Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a convex function, and let \(\mathbf{p,q}\in \mathcal{P}_{n}\). The Csiszár f-divergence is defined as
$$D_{f}(\mathbf{q},\mathbf{p})=\sum _{i=1}^{n}p_{i}f\left ( \frac{q_{i}}{p_{i}}\right ). \tag{4.1}$$
It has deep and fruitful applications in various branches of science (see, e.g., [4, 22] with references therein) and is involved in the following Csiszár–Körner inequality (see [6]).
Theorem 7
Let \(\mathbf{p,q}\in \mathcal{P}_{n}\). If \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a convex function, then
$$D_{f}(\mathbf{q},\mathbf{p})\geq f(1). \tag{4.2}$$
Remark 3
If f is normalized, i.e., \(f(1)=0\), then from (4.2) it follows that
$$D_{f}(\mathbf{q},\mathbf{p})\geq 0. \tag{4.3}$$
Two distributions q and p are very similar if \(D_{f}(\mathbf{q},\mathbf{p})\) is very close to zero.
Recently, in [10] a new concept of f-divergences was introduced: when (4.1) is defined for a strongly convex function f, it is denoted by \(\tilde{D}_{f}(\mathbf{q},\mathbf{p})\) and is referred to as a strong f-divergence. Accordingly, in [10] the following improvement of the Csiszár–Körner inequality for strong f-divergences was obtained.
Theorem 8
Let \(\mathbf{p,q}\in \mathcal{P}_{n}\). If \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a strongly convex function with modulus \(c>0\), then
$$\tilde{D}_{f}(\mathbf{q},\mathbf{p})\geq f(1)+c\,\tilde{D}_{\varkappa ^{2}}(\mathbf{q},\mathbf{p}), \tag{4.4}$$
where \(\tilde{D}_{\varkappa ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \limits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}} \right ) ^{2}-1\).
Remark 4
Here \(\tilde{D}_{\varkappa ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \limits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}} \right ) ^{2}-1\) denotes the strong chi-squared distance obtained for the strongly convex function \(f(x)=(x-1)^{2}\) with modulus \(c=1\).
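The strong chi-squared distance admits the equivalent closed form \(\sum _{i}q_{i}^{2}/p_{i}-1\), which a short sketch (with our own test distributions) confirms against the generator form \(\sum _{i}p_{i}f(q_{i}/p_{i})\) for \(f(x)=(x-1)^{2}\):

```python
# Strong chi-squared distance: sum_i p_i*(q_i/p_i)^2 - 1, generated by
# f(x) = (x-1)^2 (strongly convex with modulus c = 1). Test data are ours.

def chi2(q, p):
    return sum(pi * (qi / pi) ** 2 for qi, pi in zip(q, p)) - 1.0

q = [0.5, 0.3, 0.2]
p = [0.25, 0.25, 0.5]

# Same value via the generator form sum_i p_i * f(q_i/p_i), f(x) = (x-1)^2:
# expanding gives sum p_i*(q_i/p_i)^2 - 2*sum q_i + sum p_i = chi2(q, p).
gen = sum(pi * ((qi / pi) - 1) ** 2 for qi, pi in zip(q, p))
assert abs(chi2(q, p) - gen) < 1e-12
assert chi2(q, q) == 0.0          # identical distributions give zero
```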
Additionally, if \(f(1)=0\), then from (4.4) we have
$$\tilde{D}_{f}(\mathbf{q},\mathbf{p})\geq c\,\tilde{D}_{\varkappa ^{2}}(\mathbf{q},\mathbf{p})\geq 0. \tag{4.5}$$
Inequalities (4.4) and (4.5) improve (4.2) and (4.3).
We further use the results from the previous sections to prove new estimates for strong f-divergences.
Corollary 4
Let \(\mathbf{p,q}\in \mathcal{P}_{n}\), \(r_{i}=1-\lambda _{i}\left ( 1-\frac{q_{i}}{p_{i}}\right )\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Then
In particular, we have
where \(\tilde{D}_{\varkappa ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \nolimits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}}\right ) ^{2}-1\).
If, in addition, f is normalized, then
Proof
Applying (2.4) to \(x_{i}=\frac{q_{i}}{p_{i}}\), \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\) and \(\hat{x}_{i}=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}=(1-\lambda _{i})+ \lambda _{i}\frac{q_{i}}{p_{i}}=1-\lambda _{i}\left ( 1- \frac{q_{i}}{p_{i}}\right ) =r_{i}\), \(i\in \{1,\ldots,n\}\), we get
which is equivalent to (4.6).
If \(\lambda _{i}=0\), \(i=1,\ldots,n\), then \(r_{i}=1\), \(i=1,\ldots,n\), and from (4.6) we get (4.7). If, in addition, f is normalized, i.e., \(f(1)=0\), then (4.7) implies (4.8). □
Corollary 5
Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\), and let \(\mathbf{p,q}\in \mathcal{P}_{n}\). Suppose \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a strongly convex function with modulus \(c>0\). Then
In particular,
If, in addition, f is normalized, then
Proof
Applying (2.11) to \(x_{i}=\frac{q_{i}}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\), we get
which is equivalent to (4.9).
Choosing \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (4.9) we get (4.10). Further, for a normalized function f, (4.10) implies (4.11). □
Corollary 6
Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Let \(\mathbf{p,q}\in \mathcal{P}_{n}\) with \(\frac{q_{i}}{p_{i}}\in \lbrack m,M]\), \(0< m< M\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then
for all \(d,e\in \lbrack m,M]\).
In particular,
If, in addition, f is normalized, then
Proof
Applying (3.1) to \(x_{i}=\frac{q_{i}}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\), we get (4.12).
In a particular case, for \(e=1\) and \(d=m+M-1\), from (4.12) we get (4.13). If, in addition, \(f(1)=0\), then (4.13) implies (4.14). □
Applying the previous corollaries to the corresponding generating strongly convex function f, we derive new estimates for some well-known divergences, which are particular cases of the strong f-divergence. Here we consider a few of the most commonly used divergences.
Example 1
The strong Kullback–Leibler divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by
where the generating function is \(f(t)=t\ln t\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{t}\), we have \(f^{\prime \prime }\geqslant \frac{1}{l}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{2l}\).
Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=t\ln t\) with \(c=\frac{1}{2l}\), we may derive new estimates for the strong Kullback–Leibler divergence \(\tilde{D}_{KL}(\mathbf{\mathbf{q},\mathbf{p}})\).
Example 2
The strong squared Hellinger divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by
where the generating function is \(f(t)=\left ( \sqrt{t}-1\right ) ^{2}\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{2\sqrt{t^{3}}}\), we have \(f^{\prime \prime}\geqslant \frac{1}{2\sqrt{l^{3}}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{4\sqrt{l^{3}}}\).
Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=\left ( \sqrt{t}-1\right ) ^{2}\) with \(c=\frac{1}{4\sqrt{l^{3}}}\), we may derive new estimates for the strong squared Hellinger divergence \(\tilde{D}_{h^{2}}(\mathbf{\mathbf{q},\mathbf{p}})\).
Example 3
The strong Bhattacharya distance of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by
where the generating function is \(f(t)=-\sqrt{t}\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{4\sqrt{t^{3}}}\), we have \(f^{\prime \prime}\geqslant \frac{1}{4\sqrt{l^{3}}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{8\sqrt{l^{3}}}\).
Applying inequalities (4.6), (4.7), (4.8), (4.9), (4.10), (4.11), and (4.12) to \(f(t)=-\sqrt{t}\) with \(c=\frac{1}{8\sqrt{l^{3}}}\), we may derive new estimates for the strong Bhattacharya distance \(\tilde{D}_{B}(\mathbf{\mathbf{q},\mathbf{p}})\).
Example 4
The strong Jeffreys distance of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by
where the generating function is \(f(t)=(t-1)\ln t\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{t+1}{t^{2}}\), we have \(f^{\prime \prime}\geqslant \frac{l+1}{l^{2}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{l+1}{2l^{2}}\).
Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=(t-1)\ln t\) with \(c=\frac{l+1}{2l^{2}}\), we may derive new estimates for the strong Jeffreys distance \(\tilde{D}_{J}(\mathbf{\mathbf{q},\mathbf{p}})\).
Example 5
The strong Jensen–Shannon divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by
where the generating function is \(f(t)=\frac{1}{2}\left ( t\ln \frac{2t}{1+t}+\ln \frac{2}{1+t}\right ) \) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{2t(1+t)}\), we have \(f^{\prime \prime}\geqslant \frac{1}{2l(1+l)}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{4l(1+l)}\).
Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=\frac{1}{2}\big ( t\ln \frac{2t}{1+t}+ \ln \frac{2}{1+t}\big ) \) with \(c=\frac{1}{4l(1+l)}\), we may derive new estimates for the strong Jensen–Shannon divergence \(\tilde{D}_{JS}(\mathbf{\mathbf{q},\mathbf{p}})\).
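Examples 1–5 all follow the same pattern: on \([m,l]\), a twice-differentiable generator with \(f^{\prime \prime}\geq 2c\) is strongly convex with modulus \(c=\frac{1}{2}\min _{[m,l]}f^{\prime \prime}\). The sketch below recovers the moduli above numerically; `modulus_on` and the grid are our own illustrative choices:

```python
# Moduli of strong convexity for the divergence generators in Examples 1-5:
# c = (1/2) * min of f'' on [m, l], approximated on a grid (sketch only).

def modulus_on(f2, m, l, steps=1000):
    return 0.5 * min(f2(m + (l - m) * k / steps) for k in range(steps + 1))

m, l = 0.1, 4.0
cases = {
    "Kullback-Leibler f(t)=t ln t":         lambda t: 1 / t,               # c -> 1/(2l)
    "squared Hellinger f(t)=(sqrt t - 1)^2": lambda t: 1 / (2 * t ** 1.5), # c -> 1/(4 l^{3/2})
    "Bhattacharya f(t)=-sqrt t":             lambda t: 1 / (4 * t ** 1.5), # c -> 1/(8 l^{3/2})
    "Jeffreys f(t)=(t-1) ln t":              lambda t: (t + 1) / t ** 2,   # c -> (l+1)/(2 l^2)
    "Jensen-Shannon":                        lambda t: 1 / (2 * t * (1 + t)),  # c -> 1/(4 l (1+l))
}
# All five second derivatives are decreasing on (0, inf), so the minimum on
# [m, l] sits at t = l, matching the closed-form moduli in the examples.
assert abs(modulus_on(cases["Kullback-Leibler f(t)=t ln t"], m, l) - 1 / (2 * l)) < 1e-9
assert abs(modulus_on(cases["Jeffreys f(t)=(t-1) ln t"], m, l) - (l + 1) / (2 * l * l)) < 1e-9
```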
We now consider the Shannon entropy [25], defined for a random variable X in terms of its probability distribution \(\mathbf{p}\in \mathcal{P}_{n}\) as
$$H(X)=-\sum _{i=1}^{n}p_{i}\ln p_{i}.$$
It quantifies the unevenness in p and satisfies the relation
$$H(X)\leq \ln n.$$
Using the results from the previous sections, we obtain new estimates for the Shannon entropy.
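As a numerical point of reference before the corollaries, the sketch below computes \(H(X)=-\sum _{i}p_{i}\ln p_{i}\) for a sample distribution of our own choosing and checks the classical bound \(0\leq H(X)\leq \ln n\), attained at the uniform distribution:

```python
# Shannon entropy (natural logarithm) and its classical bounds.
# The sample distribution is our own test data.
import math

def shannon_entropy(p):
    """H(X) = -sum_i p_i * ln(p_i) for a positive distribution p."""
    return -sum(pi * math.log(pi) for pi in p)

p = [0.5, 0.25, 0.125, 0.125]
n = len(p)
H = shannon_entropy(p)

assert 0.0 <= H <= math.log(n)                       # 0 <= H <= ln n
assert abs(shannon_entropy([1 / n] * n) - math.log(n)) < 1e-12  # max at uniform
```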
Corollary 7
Let \(l>0\), and let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in (0,l]\). Let \(\bar{p}_{i}=n-\lambda _{i}\left ( n-\frac{1}{p_{i}}\right ) \), \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then
In particular, we have
Proof
Applying (2.4) to the function \(f(t)=-\ln t\), \(t\in (0,l]\), strongly convex with modulus \(c=\frac{1}{2l^{2}}\), and \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i}\frac{1}{p_{i}}=n\) and \(\hat{x}_{i}=(1-\lambda _{i})\bar {x}+\lambda _{i}x_{i}=(1-\lambda _{i})n+ \lambda _{i}\frac{1}{p_{i}}=n-\lambda _{i}\left ( n-\frac{1}{p_{i}} \right ) =\bar{p}_{i}\), \(i\in \{1,\ldots,n\}\), we get
which is equivalent to (4.17).
Choosing \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (4.17) we get (4.18). □
Corollary 8
Let \(l>0\), let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in (0,l]\), and let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then
In particular, we have
Proof
Applying (2.11) to the strongly convex function \(f(t)=-\ln t\), \(t\in (0,l]\), with modulus \(c=\frac{1}{2l^{2}}\), and to \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i} \frac{1}{p_{i}}=n\), we get
which is equivalent to (4.19). If we choose \(\lambda _{i}=0\), \(i=1,\ldots,n\), then (4.19) implies (4.20). □
Corollary 9
Let \(0< m< l\), let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in \lbrack m,l]\), and let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then
for all \(d,e\in \lbrack m,l]\).
In particular, we have
Proof
Applying (3.1) to the strongly convex function \(f(t)=-\ln t\), \(t\in (0,l]\), with modulus \(c=\frac{1}{2l^{2}}\) and to \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i} \frac{1}{p_{i}}=n\), we get
which is equivalent to (4.21). Choosing \(e=n\) and \(d=m+l-n\), from (4.21) we get (4.22). □
5 New bounds for the Chebyshev functional
One of the fundamental inequalities in probability is the discrete Chebyshev inequality, which we quote in the following form (see [21]).
Theorem 9
Let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), and let \(\boldsymbol{p}=(p_{1},\ldots,p_{n})\) and \(\boldsymbol{q}=(q_{1},\ldots,q_{n})\) be monotonic real n-tuples in the same direction. Then
$$A_{n}\sum _{i=1}^{n}a_{i}p_{i}q_{i}\geq \left ( \sum _{i=1}^{n}a_{i}p_{i}\right ) \left ( \sum _{i=1}^{n}a_{i}q_{i}\right ). \tag{5.1}$$
If p and q are monotonic in the opposite direction, then we have the reverse inequality of (5.1).
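Theorem 9 can be illustrated numerically; the sketch below checks, with weights and tuples of our own choosing, that the normalized weighted correlation \(\frac{1}{A_{n}}\sum a_{i}p_{i}q_{i}-\frac{1}{A_{n}}\sum a_{i}p_{i}\cdot \frac{1}{A_{n}}\sum a_{i}q_{i}\) is nonnegative for comonotone tuples and nonpositive for oppositely monotone ones:

```python
# Chebyshev inequality (5.1) in normalized form, on our own test data:
# same-direction monotone tuples give a nonnegative functional,
# opposite-direction tuples reverse the sign.

def chebyshev_functional(a, p, q):
    A = sum(a)
    mean = lambda v: sum(ai * vi for ai, vi in zip(a, v)) / A
    return mean([pi * qi for pi, qi in zip(p, q)]) - mean(p) * mean(q)

a = [1.0, 2.0, 0.5, 1.5]      # nonnegative weights
p = [1, 2, 3, 4]              # nondecreasing
q = [0, 1, 1, 5]              # nondecreasing (same direction as p)

assert chebyshev_functional(a, p, q) >= 0          # (5.1)
assert chebyshev_functional(a, p, q[::-1]) <= 0    # reversed direction
```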
Many papers study the Chebyshev functional \(T(\boldsymbol{a;p,q})\), derived from the Chebyshev inequality (5.1) by subtracting its right-hand side from its left-hand side:
$$T(\boldsymbol{a;p,q})=A_{n}\sum _{i=1}^{n}a_{i}p_{i}q_{i}-\sum _{i=1}^{n}a_{i}p_{i}\sum _{i=1}^{n}a_{i}q_{i}, \tag{5.2}$$
and in the normalized form as
$$T(\boldsymbol{a;p,q})=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}p_{i}q_{i}-\frac{1}{A_{n}^{2}}\sum _{i=1}^{n}a_{i}p_{i}\sum _{i=1}^{n}a_{i}q_{i}. \tag{5.3}$$
By (5.1) we have \(T(\boldsymbol{a;p,q})\geq 0\) whenever p and q are monotonic in the same direction.
Using the results from Sect. 2, we obtain improvements of the Chebyshev inequality (5.1), i.e., we get new bounds for the Chebyshev functionals (5.2) and (5.3) without the assumption of monotonicity.
Corollary 10
Let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), and let \(\boldsymbol{p}=(p_{1},\ldots,p_{n})\) and \(\boldsymbol{q}=(q_{1},\ldots,q_{n})\) be real n-tuples with \(\bar{p}=\frac {1}{A_{n}}\sum _{i=1}^{n}a_{i}p_{i}\) and \(P_{n}=\sum _{i=1}^{n}p_{i}\). Then
In particular, we have
Proof
Combining inequalities (2.8) and (2.12), we have
Setting \(f^{\prime}(x_{i})=q_{i}\) and \(x_{i}=p_{i}\), \(i\in \{1,\ldots,n\}\), and using (5.6), we get (5.4).
If we set \(a_{i}=\frac{1}{n}\), \(i=1,\ldots,n\), then \(\bar{p}=\frac{1}{n}\sum _{i=1}^{n}p_{i}=\frac{P_{n}}{n}\), where \(P_{n}=\sum _{i=1}^{n}p_{i}\). Now inequality (5.5) immediately follows from (5.4). □
Data Availability
No datasets were generated or analysed during the current study.
References
Adil Khan, M., Husain, Z., Chu, Y.-M.: New estimates for Csiszár divergence and Zipf–Mandelbrot entropy via Jensen–Mercer’s inequality. Complexity 2020, 1–8 (2020)
Butt, S.I., Agarwal, P., Yousaf, S., Guirao, J.L.G.: Generalized fractal Jensen and Jensen–Mercer inequalities for harmonic convex function with applications. J. Inequal. Appl. 2022, 1 (2022)
Butt, S.I., Sayyari, Y., Agarwal, P., Nieto, J.J., Umar, M.: On some inequalities for uniformly convex mapping with estimations to normal distributions. J. Inequal. Appl. 2023, 89 (2023)
Crooks, G.E.: On measures of entropy and information. Tech. Note 009 v0.8 (2021)
Csiszár, I.: Information-type measures of difference of probability functions and indirect observations. Studia Sci. Math. Hung. 2, 299–318 (1967)
Csiszár, I., Körner, J.: Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York (1981)
Dragomir, S.S., Ionescu, N.M.: Some converse of Jensen’s inequality and applications. Rev. Anal. Numér. Théor. Approx. 23, 71–78 (1994)
Dragomir, S.S., Scarmozzino, F.P.: A refinement of Jensen’s discrete inequality for differentiable convex functions. RGMIA Res. Rep. Collect. 5(4) (2002)
Horváth, L.: Some notes on Jensen–Mercer’s type inequalities; extensions and refinements with applications. Math. Inequal. Appl. 24(4), 1093–1111 (2021)
Ivelić Bradanović, S.: Sherman’s inequality and its converse for strongly convex functions with applications to generalized f-divergences. Turk. J. Math. 6(43), 2680–2696 (2019)
Ivelić Bradanović, S.: Improvements of Jensen’s inequality and its converse for strongly convex functions with applications to strongly f-divergences. J. Math. Anal. Appl. 2(531), 1–16 (2024)
Ivelić, S., Matković, A., Pečarić, J.: On a Jensen–Mercer operator inequality. Banach J. Math. Anal. 5(1), 19–28 (2011)
Jarad, F., Sahoo, S.K., Nisar, K.S., Treanţă, S., Emadifar, H., Botmart, T.: New stochastic fractional integral and related inequalities of Jensen–Mercer and Hermite–Hadamard–Mercer type for convex stochastic processes. J. Inequal. Appl. 2023, 51 (2023)
Khan, A.R., Rubab, F.: Mercer type variants of the Jensen–Steffensen inequality. Rocky Mt. J. Math. 52(5), 1693–1712 (2022)
Klaričić Bakula, M., Matić, M., Pečarić, J.: On some general inequalities related to Jensen’s inequality. Int. Ser. Numer. Math. 157, 233–243 (2008)
Krnić, M., Lovričević, N., Pečarić, J.: On some properties of Jensen–Mercer’s functional. J. Math. Inequal. 6(1), 125–139 (2012)
Mercer, A.M.: A variant of Jensen’s inequality. JIPAM. J. Inequal. Pure Appl. Math. 4(4), 1–2 (2003)
Moradi, H.R., Omidvar, M.E., Adil Khan, M., Nikodem, K.: Around Jensen’s inequality for strongly convex functions. Aequ. Math. 92, 25–37 (2018)
Niculescu, C.P., Persson, L.E.: Convex Functions and Their Applications. A Contemporary Approach, 2nd edn. CMS Books in Mathematics, vol. 2. Springer, New York (2018)
Nikodem, K.: On Strongly Convex Functions and Related Classes of Functions, Handbook of Functional Equations, pp. 365–405. Springer, New York (2014)
Pečarić, J., Proschan, F., Tong, Y.L.: Convex Functions, Partial Orderings and Statistical Applications. Academic Press, New York (1992)
Polyanskiy, Y., Wu, Y.: Information Theory: From Coding to Learning. Cambridge University Press, Cambridge (2022)
Roberts, A.W., Varberg, D.E.: Convex Functions. Academic Press, New York (1973)
Sayyari, Y., Barsam, H.: Jensen–Mercer inequality for uniformly convex functions with some applications. Afr. Math. 34, 38 (2023)
Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1949)
Funding
This research is partially supported through KK.01.1.1.02.0027, a project cofinanced by the Croatian Government and the European Union through the European Regional Development Fund – the Competitiveness and Cohesion Operational Programme.
Author information
Contributions
Each author contributed the same level of work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ivelić Bradanović, S., Lovričević, N. Generalized Jensen and Jensen–Mercer inequalities for strongly convex functions with applications. J Inequal Appl 2024, 112 (2024). https://doi.org/10.1186/s13660-024-03189-z