Generalized Jensen and Jensen–Mercer inequalities for strongly convex functions with applications

Abstract

Strongly convex functions, a subclass of convex functions equipped with stronger properties, are employed in several generalizations and improvements of the Jensen inequality and the Jensen–Mercer inequality. This paper additionally provides applications of the main results in the form of new estimates for so-called strong f-divergences: the Csiszár f-divergence generated by a strongly convex function f, together with its particular cases (the Kullback–Leibler divergence, \(\chi ^{2}\)-divergence, Hellinger divergence, Bhattacharyya distance, Jeffreys distance, and Jensen–Shannon divergence). Furthermore, new estimates for the Shannon entropy are obtained, and new Chebyshev-type inequalities are derived.

1 Introduction

One of the extended approaches to convexity developed in the last century includes strongly convex functions as a subclass of convex functions (see [20] and for more recent contributions, [10, 11, 18]).

Let us recall that a function \(f\colon [a,b]\subseteq \mathbb{R}\rightarrow \mathbb{R} \) is strongly convex with modulus \(c>0\) if

$$ f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y)-c \lambda (1-\lambda )(x-y)^{2} $$
(1.1)

for all \(x,y\in \lbrack a,b]\) and \(\lambda \in \lbrack 0,1]\).
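For illustration (our numerical sketch, not part of the original argument), definition (1.1) can be tested pointwise; here we choose \(f(x)=2x^{2}\), which satisfies (1.1) with modulus \(c=2\) (in fact with equality):

```python
# Numerical check of strong convexity (1.1) for f(x) = 2*x**2,
# which is strongly convex with modulus c = 2 (equality holds in (1.1)).
def strongly_convex_at(f, c, x, y, lam):
    """Check inequality (1.1) at one triple (x, y, lam), up to rounding."""
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y) - c * lam * (1 - lam) * (x - y) ** 2
    return lhs <= rhs + 1e-12

f = lambda x: 2 * x ** 2
checks = [strongly_convex_at(f, 2.0, x, y, lam)
          for x in (0.0, 0.5, 1.3)
          for y in (-1.0, 0.7, 2.0)
          for lam in (0.0, 0.25, 0.5, 0.9, 1.0)]
print(all(checks))  # prints True
```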

A function f that satisfies (1.1) with \(c=0\), i.e.,

$$ f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y), $$
(1.2)

is convex in the usual sense. Obviously, strong convexity implies convexity, but the reverse implication does not hold in general: for example, a linear function is convex but not strongly convex.

Comparing with convex functions, the strongly convex ones possess stronger versions of the analogous properties. One of their useful characterizations is given in the following lemma (see [23, p. 268], [11, 20], and the references therein).

Lemma 1

A function \(f\colon [a,b]\rightarrow \mathbb{R} \) is strongly convex with modulus \(c>0\) iff the function \(g\colon \lbrack a,b]\rightarrow \mathbb{R} \) defined by \(g(x)=f(x)-cx^{2}\) is convex.

We further use the well-known theorem proved by Stolz [19, p. 25].

Theorem 1

(Stolz)

Let \(f\colon [a,b]\rightarrow \mathbb{R} \) be a convex function. Then f is continuous on \((a,b)\) and has finite left and right derivatives at each point of \((a,b)\). Both \(f_{-}^{\prime}\) and \(f_{+}^{\prime}\) are nondecreasing on \((a,b)\). Moreover, for all \(x,y\in (a,b)\), \(x< y\), we have

$$ f_{-}^{\prime}(x)\leq f_{+}^{\prime}(x)\leq f_{-}^{\prime}(y)\leq f_{+}^{ \prime}(y). $$

Strongly convex functions are accompanied by the corresponding Jensen inequality, which was proved in [20].

Theorem 2

Let a function \(f\colon (a,b)\rightarrow \mathbb{R} \) be strongly convex with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) with \(\bar{x}=\frac{1}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\). Then

$$ f\left ( \bar{x}\right ) \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f \left ( x_{i}\right ) -\frac{c}{A_{n}}{\displaystyle \sum \limits _{i=1}^{n}} a_{i}(x_{i}-\bar{x})^{2}. $$
(1.3)

It is easily seen that for \(c=0\), inequality (1.3) becomes the Jensen inequality for convex functions:

$$ f\left ( \bar{x}\right ) \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f \left ( x_{i}\right ) . $$
(1.4)

Inequality (1.3) provides a better upper bound for \(f\left ( \bar{x}\right ) \) because the term \(\frac{c}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}(x_{i}-\bar{x})^{2}\) is nonnegative. Thus (1.3) improves (1.4) and can be regarded as its stronger variant.
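The improvement can be observed numerically; the following sketch (our illustration) uses \(f(x)=e^{x}\) on \([0,1]\), where \(f^{\prime \prime}\geq 1\), so modulus \(c=\frac{1}{2}\) works, together with arbitrarily chosen data:

```python
import math

# Sketch comparison of the strong Jensen bound (1.3) with the plain
# Jensen bound (1.4) for f(x) = exp(x) on [0, 1], modulus c = 1/2.
f = math.exp
c = 0.5
x = [0.05, 0.3, 0.55, 0.8, 0.95]   # sample points in [0, 1]
a = [1.0, 2.0, 3.0, 2.0, 1.0]      # nonnegative weights
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A
jensen = sum(ai * f(xi) for ai, xi in zip(a, x)) / A                 # bound (1.4)
strong = jensen - c / A * sum(ai * (xi - xbar) ** 2
                              for ai, xi in zip(a, x))               # bound (1.3)
print(f(xbar) <= strong <= jensen)  # the strong bound is tighter
```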

Another Jensen-type inequality was established by Mercer [17]. Given a convex function \(f\colon (a,b)\rightarrow \mathbb{R} \) with \(m,M\in (a,b)\), \(m< M\), for \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\) and a nonnegative n-tuple \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) with \(\bar{x}=\frac{1}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\), the Jensen–Mercer inequality states that

$$ f\left ( m+M-\bar{x}\right ) \leq f(m)+f(M)-\frac{1}{A_{n}}{\displaystyle \sum _{i=1}^{n}} a_{i}f(x_{i}). $$
(1.5)
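As a quick numerical illustration (our sketch, with an arbitrarily chosen convex function and data), inequality (1.5) can be checked directly:

```python
# Sketch check of the Jensen-Mercer inequality (1.5) for the convex
# function f(x) = x**2 with [m, M] = [0, 2] and fixed sample data.
f = lambda x: x * x
m, M = 0.0, 2.0
x = [0.2, 0.7, 1.1, 1.9]
a = [1.0, 2.0, 2.0, 1.0]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A
lhs = f(m + M - xbar)
rhs = f(m) + f(M) - sum(ai * f(xi) for ai, xi in zip(a, x)) / A
print(lhs <= rhs + 1e-12)  # prints True
```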

Numerous improvements and generalizations of (1.5) have been obtained since. Here we accentuate two such results. In [15] the authors proved that for a convex function \(f\colon (a,b)\rightarrow \mathbb{R}\), \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\), where \(m,M\in (a,b)\), \(m< M\), and a nonnegative n-tuple \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\), we have the inequalities

$$\begin{aligned} & f(c)+f^{\prime}(c)\left ( m+M-c-\dfrac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\right ) \\ & \hspace{0.5cm} \leq f(m)+f(M)-\dfrac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \hspace{0.5cm} \leq f(d)+f^{\prime}(m)(m-d)+f^{\prime}(M)(M-d)-\dfrac {1}{A_{n}} \sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})(x_{i}-d) \end{aligned}$$
(1.6)

for all \(c,d\in \lbrack m,M]\).

Furthermore, the following variant of the Jensen–Mercer inequality was proved in [18] for strongly convex functions.

Theorem 3

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function, and let \(m,M\in (a,b)\), \(m< M\). Let \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\), and let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple such that \({\textstyle \sum \nolimits _{i=1}^{n}} a_{i}=1\) with \(\bar{x}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\). Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} f(m+M-\bar{x}) & \leq f(m)+f(M)-\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & -c\left [ 2(M-m)^{2}\sum _{i=1}^{n}a_{i}\lambda _{i}(1-\lambda _{i})+\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}\right ] . \end{aligned}$$
(1.7)

For some recent results on the Jensen–Mercer inequality, see [1–3, 9, 12–14, 16, 24].

With the aim of obtaining new improvements and elaborating on the existing results, the paper is divided into five sections. In Section 1, we recall a few results needed later: some on strongly convex functions and some well-known ones concerning convex functions. Sections 2 and 3 deal with the Jensen and Jensen–Mercer inequalities, both generalized by means of strongly convex functions. In Section 4, we discuss applications to the Csiszár strong f-divergences introduced in [10], for which, together with their particular types, we provide new estimates; we also derive new estimates for the Shannon entropy. Section 5 deals with new Chebyshev-type inequalities.

2 The Jensen-type inequalities

We start this section with important properties of strongly convex functions, which are direct consequences of the characterizations given in Lemma 1 and Theorem 1.

Lemma 2

Let \(f\colon [a,b]\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Then it is continuous on \((a,b)\) and has finite left and right derivatives at each point of \((a,b)\). Both \(f_{-}^{\prime}\) and \(f_{+}^{\prime}\) are nondecreasing on \((a,b)\). Moreover, for all \(x,y\in (a,b)\), \(x< y\), we have

$$ f_{-}^{\prime}(x)-2cx\leq f_{+}^{\prime}(x)-2cx\leq f_{-}^{\prime}(y)-2cy \leq f_{+}^{\prime}(y)-2cy. $$
(2.1)

If f is differentiable, then \(f^{\prime}\) is strongly increasing on \((a,b)\), i.e., for all \(x,y\in (a,b)\), \(x< y\),

$$ f^{\prime}(x)+2c(y-x)\leq f^{\prime}(y). $$
(2.2)

Proof

Let id denote the identity function, i.e., \(id(t)=t\) for all \(t\in \lbrack a,b]\). Since f is strongly convex with modulus \(c>0\), the function \(g=f-c\cdot id^{2}\) is convex by Lemma 1. Applying Theorem 1 to g, we get the first part of the statement.

If f is differentiable, then \(f^{\prime}(x)=f_{-}^{\prime}(x)=f_{+}^{\prime }(x)\) and \(f^{\prime}(y)=f_{-}^{\prime}(y)=f_{+}^{\prime}(y)\), and (2.1) implies (2.2). □

Bearing in mind the statement of the previous lemma, for a strongly convex function \(f\colon [a,b]\rightarrow \mathbb{R} \), by \(f^{\prime}(x)\), \(x\in (a,b)\), we mean any element of the interval \([f_{-}^{\prime}(x),f_{+}^{\prime }(x)]\). If f is differentiable, then \(f^{\prime}(x)=f_{-}^{\prime}(x)=f_{+}^{\prime}(x)\).

Furthermore, for a strongly convex function \(f\colon [a,b]\rightarrow \mathbb{R} \) with modulus \(c>0\), we have

$$ f(x)\geq f(y)+f^{\prime}(y)(x-y)+c(x-y)^{2} $$
(2.3)

for all \(x,y\in (a,b)\). This inequality is an easy consequence of the characterization of convex functions via support lines (see [21, Theorem 1.6]) applied to the convex function \(g=f-c\cdot id^{2}\).
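The support-line inequality (2.3) can also be illustrated numerically; in the sketch below (our example) we take \(f(x)=x^{4}+x^{2}\) on \([1,2]\), where \(f^{\prime \prime}(x)=12x^{2}+2\geq 14\), so \(f-7x^{2}\) remains convex there and modulus \(c=7\) is admissible:

```python
# Sketch check of the support-line inequality (2.3) for the strongly
# convex function f(x) = x**4 + x**2 on [1, 2] with modulus c = 7
# (f''(x) = 12*x**2 + 2 >= 14 = 2*c on that interval).
f = lambda x: x ** 4 + x ** 2
fp = lambda x: 4 * x ** 3 + 2 * x      # derivative of f
c = 7.0
pts = [1.0, 1.2, 1.5, 1.8, 2.0]
ok = all(f(x) >= f(y) + fp(y) * (x - y) + c * (x - y) ** 2 - 1e-12
         for x in pts for y in pts)
print(ok)  # prints True
```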

The following theorem contains a generalization and an improvement of Jensen’s inequality (1.3) for strongly convex functions.

Theorem 4

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\). Let \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\) and \(\hat {x}_{i}=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}\), \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} 0 & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \hat{x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert - \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})\left \vert f^{ \prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}} \sum _{i=1}^{n}a_{i}f(\hat{x}_{i}) \\ & -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}( \hat{x}_{i})(\bar{x}-x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.4)

Proof

Applying the triangle inequality \(\left \vert \left \vert u\right \vert -\left \vert v\right \vert \right \vert \leq \left \vert u-v\right \vert \) to (2.3), we get

$$\begin{aligned} & \left \vert \left \vert f(x)-f(y)-c(x-y)^{2}\right \vert -\left \vert f^{\prime}(y)\right \vert \left \vert (x-y)\right \vert \right \vert \\ & \leq \left \vert f(x)-f(y)-c(x-y)^{2}-f^{\prime}(y)(x-y)\right \vert \\ & =f(x)-f(y)-c(x-y)^{2}-f^{\prime}(y)(x-y). \end{aligned}$$
(2.5)

Setting \(y=\hat{x}_{i}\) and \(x=x_{i}\), \(i\in \{1,\ldots,n\}\), from (2.5) we have

$$\begin{aligned} & \left \vert \left \vert f(x_{i})-f(\hat{x}_{i})-c(1-\lambda _{i})^{2}( \bar {x}-x_{i})^{2}\right \vert -(1-\lambda _{i})\left \vert f^{ \prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \left \vert f(x_{i})-f(\hat{x}_{i})-(1-\lambda _{i})f^{\prime}( \hat {x}_{i})(\bar{x}-x_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2} \right \vert \\ & =f(x_{i})-f(\hat{x}_{i})-(1-\lambda _{i})f^{\prime}(\hat{x}_{i})( \bar {x}-x_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$

Now multiplying by \(a_{i}\), summing over i, \(i=1,\ldots,n\), and then dividing by \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), we get

$$\begin{aligned} & \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert \left \vert f(x_{i})-f( \hat {x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert -(1- \lambda _{i})\left \vert f^{\prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \hat{x}_{i})-(1-\lambda _{i})f^{\prime}(\hat{x}_{i})(\bar{x}-x_{i})-c(1- \lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert \\ & =\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(\hat{x}_{i}) \\ & -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}( \hat{x}_{i})(\bar{x}-x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.6)

By the triangle inequality (\(\left \vert \sum _{i=1}^{n}a_{i}z_{i}\right \vert \leq \sum _{i=1}^{n}a_{i}\left \vert z_{i}\right \vert \)), we also have

$$\begin{aligned} & \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \hat {x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert - \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})\left \vert f^{\prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert \left \vert f(x_{i})-f(\hat{x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert -(1-\lambda _{i})\left \vert f^{\prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert . \end{aligned}$$
(2.7)

Now combining (2.6) and (2.7), we get (2.4). □
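The chain (2.4) may be verified numerically; in the sketch below (our illustration) we take \(f(x)=e^{x}\) on \([0,1]\) with modulus \(c=\frac{1}{2}\) and arbitrarily chosen weights and parameters \(\lambda _{i}\):

```python
import math

# Numerical sketch of the chain (2.4) for f(x) = exp(x) on [0, 1]:
# f'' >= 1 there, so modulus c = 1/2 works.
f = fp = math.exp            # f and its derivative coincide for exp
c = 0.5
x = [0.1, 0.4, 0.7, 0.9]
a = [1.0, 2.0, 0.5, 1.5]
lam = [0.2, 0.5, 0.8, 0.3]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A
xhat = [(1 - li) * xbar + li * xi for li, xi in zip(lam, x)]
data = list(zip(a, x, xhat, lam))

s1 = sum(ai * abs(f(xi) - f(hi) - c * (1 - li) ** 2 * (xbar - xi) ** 2)
         for ai, xi, hi, li in data) / A
s2 = sum(ai * (1 - li) * abs(fp(hi)) * abs(xbar - xi)
         for ai, xi, hi, li in data) / A
middle = abs(s1 - s2)
upper = (sum(ai * (f(xi) - f(hi)) for ai, xi, hi, li in data) / A
         - sum(ai * (1 - li) * fp(hi) * (xbar - xi)
               for ai, xi, hi, li in data) / A
         - c * sum(ai * (1 - li) ** 2 * (xbar - xi) ** 2
                   for ai, xi, hi, li in data) / A)
print(0 <= middle <= upper + 1e-12)
```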

The following corollary is a direct consequence of Theorem 4.

Corollary 1

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then

$$\begin{aligned} 0 & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \bar{x})-c(x_{i}-\bar{x})^{2}\right \vert -\left \vert f^{\prime }( \bar{x})\right \vert \cdot \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert x_{i}-\bar{x}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-f(\bar{x})- \frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}-\bar{x})^{2}. \end{aligned}$$
(2.8)

Proof

Setting \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (2.4) we get

$$\begin{aligned} 0 & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \bar{x})-c(\bar{x}-x_{i})^{2}\right \vert -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i} \left \vert f^{\prime}(\bar{x})\right \vert \left \vert \bar {x}-x_{i} \right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}} \sum _{i=1}^{n}a_{i}f(\bar{x}) \\ & -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(\bar{x})(\bar{x}-x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.9)

Note that

$$ \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(\bar{x})(x_{i}- \bar{x})=f^{\prime}(\bar{x})\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}- \bar{x})=0. $$
(2.10)

Now combining (2.9) and (2.10), we get (2.8). □

Finally, in a similar manner, we obtain an inequality that serves as a counterpart to the Jensen inequality (1.3).

Theorem 5

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f((1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}) \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}(x_{i})(x_{i}-\bar{x})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.11)

Proof

Applying (2.3) with \(x=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}\) and \(y=x_{i}\), \(i\in \{1,\ldots,n\}\), we have

$$ f((1-\lambda _{i})\bar{x}+\lambda _{i}x_{i})-f(x_{i})\geq f^{\prime}(x_{i})(1-\lambda _{i})(\bar{x}-x_{i})+c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. $$

Now multiplying by \(a_{i}\), summing over i, \(i=1,\ldots,n\), and then dividing by \(A_{n}>0\), we get

$$\begin{aligned} & \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f((1-\lambda _{i})\bar{x}+ \lambda _{i}x_{i})-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \geq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}(x_{i})(\bar{x}-x_{i})+\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}( \bar {x}-x_{i})^{2}, \end{aligned}$$

which is equivalent to (2.11). □
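A numerical sanity check of (2.11) (our sketch, with the same strongly convex model function as before, \(f(x)=e^{x}\) on \([0,1]\), \(c=\frac{1}{2}\)):

```python
import math

# Sketch verification of inequality (2.11) for f(x) = exp(x) on [0, 1],
# strongly convex there with modulus c = 1/2.
f = fp = math.exp
c = 0.5
x = [0.15, 0.35, 0.6, 0.85]
a = [0.5, 1.0, 1.5, 2.0]
lam = [0.1, 0.4, 0.7, 1.0]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A
lhs = (sum(ai * f(xi) for ai, xi in zip(a, x)) / A
       - sum(ai * f((1 - li) * xbar + li * xi)
             for ai, li, xi in zip(a, lam, x)) / A)
rhs = (sum(ai * (1 - li) * fp(xi) * (xi - xbar)
           for ai, li, xi in zip(a, lam, x)) / A
       - c / A * sum(ai * (1 - li) ** 2 * (xbar - xi) ** 2
                     for ai, li, xi in zip(a, lam, x)))
print(lhs <= rhs + 1e-12)
```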

Again, a direct consequence of Theorem 5 follows by setting \(\lambda _{i}=0\) for \(i=1,\ldots,n\).

Corollary 2

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then

$$\begin{aligned} & 0\leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-f(\bar{x}) \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})x_{i}- \frac{1}{A_{n}^{2}}\sum _{i=1}^{n}a_{i}x_{i}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.12)
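The bound (2.12) can likewise be illustrated (our sketch, again with \(f(x)=e^{x}\) on \([0,1]\) and \(c=\frac{1}{2}\)):

```python
import math

# Sketch check of (2.12) for f(x) = exp(x) on [0, 1] with modulus c = 1/2.
f = fp = math.exp
c = 0.5
x = [0.1, 0.3, 0.6, 0.95]
a = [2.0, 1.0, 1.0, 1.0]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A
gap = sum(ai * f(xi) for ai, xi in zip(a, x)) / A - f(xbar)
bound = (sum(ai * fp(xi) * xi for ai, xi in zip(a, x)) / A
         - xbar * sum(ai * fp(xi) for ai, xi in zip(a, x)) / A
         - c / A * sum(ai * (xbar - xi) ** 2 for ai, xi in zip(a, x)))
print(0 <= gap <= bound + 1e-12)
```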

Remark 1

Our results generalize and improve the main results obtained in [7, 8], which were related to convex functions.

3 The Jensen–Mercer-type inequalities

We embark on further investigation of the Jensen–Mercer inequality (1.5). Along the way, we generalize and improve results (1.6) from [15] and (1.7) from [18].

Theorem 6

Let a function \(f\colon (a,b)\rightarrow \mathbb{R} \) be strongly convex with modulus \(c>0\), and let \(m,M\in (a,b)\), \(m< M\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Suppose \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then

$$\begin{aligned} & f(d)+f^{\prime}(d)\left ( m+M-d-\bar{x}\right ) +\frac{c}{A_{n}} \sum _{i=1}^{n}a_{i}(m+M-d-x_{i})^{2} \\ & +\frac{2c(M-m)^{2}}{A_{n}}\sum _{i=1}^{n}a_{i}\lambda _{i}(1- \lambda _{i}) \\ & \leq f(m)+f(M)-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\frac{1}{A_{n}} \sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})(x_{i}-e) \\ & -\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(M-x_{i})(3x_{i}-2e-M)+c(m-e)^{2} \end{aligned}$$
(3.1)

for all \(d,e\in \lbrack m,M]\).

Proof

Let \(\lambda _{i}\in \lbrack 0,1]\), \(x_{i}\in \lbrack m,M]\), and \(y_{i}=m+M-x_{i}\), \(i\in \{1,\ldots,n\}\). Then \(x_{i}\) and \(y_{i}\) can be written as the convex combinations

$$\begin{aligned} x_{i} & =\lambda _{i}m+(1-\lambda _{i})M, \\ y_{i} & =(1-\lambda _{i})m+\lambda _{i}M,\text{ \ \ }i\in \{1,\ldots,n\}. \end{aligned}$$

Applying (1.1) twice, we have

$$\begin{aligned} f(m+M-x_{i}) & =f((1-\lambda _{i})m+\lambda _{i}M) \\ & \leq (1-\lambda _{i})f(m)+\lambda _{i}f(M)-c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(m)+f(M)-\lambda _{i}f(m)+\lambda _{i}f(M)-f(M)-c\lambda _{i}(1- \lambda _{i})(M-m)^{2} \\ & =f(m)+f(M)-\left [ \lambda _{i}f(m)+(1-\lambda _{i})f(M)\right ] -c \lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(m)+f(M)-f(\lambda _{i}m+(1-\lambda _{i})M)-2c\lambda _{i}(1- \lambda _{i})(M-m)^{2} \\ & =f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$

Further, applying (2.3), we get

$$ f(d)+f^{\prime}(d)(m+M-x_{i}-d)+c(m+M-x_{i}-d)^{2}\leq f(m+M-x_{i}), $$

which, combined with the previous inequality, implies

$$\begin{aligned} & f(d)+f^{\prime}(d)(m+M-x_{i}-d)+c(m+M-x_{i}-d)^{2} \\ & \leq f(m+M-x_{i}) \\ & \leq f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.2)

Furthermore, for \(x_{i},e\in \lbrack m,M]\), \(i\in \{1,\ldots,n\}\), by (2.3) we have

$$\begin{aligned} f(m)-f(e) & \leq f^{\prime}(m)(m-e)+c(m-e)^{2}, \\ f(M)-f(x_{i}) & \leq f^{\prime}(M)(M-x_{i})+c(M-x_{i})^{2}. \end{aligned}$$
(3.3)

Using (3.3), we have

$$\begin{aligned} & f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(e)+f(m)-f(e)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-x_{i}) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-f^{\prime}(M)(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.4)

Since \(f^{\prime}\) is strongly increasing and \(x_{i}\leq M\), by (2.2) we have \(-f^{\prime}(M)\leq -f^{\prime}(x_{i})-2c(M-x_{i})\), i.e.,

$$\begin{aligned} & f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-f^{\prime}(M)(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\left [ f^{\prime}(x_{i})+2c(M-x_{i})\right ] (x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.5)

Combining (3.4) and (3.5), we get

$$\begin{aligned} & f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\left [ f^{\prime}(x_{i})+2c(M-x_{i})\right ] (x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e) \\ & -f^{\prime}(x_{i})(x_{i}-e)-2c(M-x_{i})(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.6)

Finally, from (3.2) and (3.6) we have

$$\begin{aligned} & f(d)+f^{\prime}(d)(m+M-x_{i}-d)+c(m+M-x_{i}-d)^{2} \\ & \leq f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e) \\ & -f^{\prime}(x_{i})(x_{i}-e)-2c(M-x_{i})(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$

Multiplying by \(a_{i}\), summing over \(i=1,\ldots,n\), and then dividing by \(A_{n}>0\), we get (3.1). □
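As a numerical sanity check of (3.1) (our sketch, with \(f(x)=e^{x}\), \([m,M]=[0,1]\), \(c=\frac{1}{2}\), and \(\lambda _{i}\) taken as in the proof, from \(x_{i}=\lambda _{i}m+(1-\lambda _{i})M\)):

```python
import math

# Sketch of the chain (3.1) for f(x) = exp(x), [m, M] = [0, 1], c = 1/2;
# lambda_i is determined by x_i = lambda_i*m + (1 - lambda_i)*M.
f = fp = math.exp
c = 0.5
m, M = 0.0, 1.0
x = [0.2, 0.5, 0.9]
a = [1.0, 1.0, 2.0]
lam = [(M - xi) / (M - m) for xi in x]
d, e = 0.25, 0.75                      # arbitrary points of [m, M]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A
mid = f(m) + f(M) - sum(ai * f(xi) for ai, xi in zip(a, x)) / A
low = (f(d) + fp(d) * (m + M - d - xbar)
       + c / A * sum(ai * (m + M - d - xi) ** 2 for ai, xi in zip(a, x))
       + 2 * c * (M - m) ** 2 / A * sum(ai * li * (1 - li)
                                        for ai, li in zip(a, lam)))
high = (f(e) + fp(m) * (m - e) + fp(M) * (M - e)
        - sum(ai * fp(xi) * (xi - e) for ai, xi in zip(a, x)) / A
        - c / A * sum(ai * (M - xi) * (3 * xi - 2 * e - M)
                      for ai, xi in zip(a, x))
        + c * (m - e) ** 2)
print(low <= mid <= high + 1e-12)
```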

Remark 2

In particular, if we set \(A_{n}=1\) and \(d=m+M-\bar{x}\), then the first inequality in (3.1) becomes (1.7) from [18]; hence (3.1) generalizes that result. Furthermore, (3.1) improves (1.6) from [15].

As an easy consequence of the previous theorem, we get the following inequality of the Jensen–Mercer type.

Corollary 3

Let the assumptions of Theorem 6 hold. Then

$$\begin{aligned} & f(m+M-\bar{x})+\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}+\frac{2c(M-m)^{2}}{A_{n}}\sum _{i=1}^{n}a_{i}\lambda _{i}(1-\lambda _{i}) \\ & \leq f(m)+f(M)-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \leq f(\bar{x})+f^{\prime}(m)(m-\bar{x})+f^{\prime}(M)(M-\bar{x})- \frac {1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})(x_{i}-\bar{x}) \\ & -\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(M-x_{i})(3x_{i}-2\bar{x}-M)+c(m- \bar {x})^{2}. \end{aligned}$$
(3.7)

Proof

Choosing \(e=\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\) and \(d=m+M-\bar{x}\), from (3.1) we get (3.7). □

4 Applications to strong f-divergences and the Shannon entropy

Let \(\mathcal{P}_{n}=\left \{ \mathbf{p}=(p_{1},\ldots,p_{n})\colon p_{1},\ldots,p_{n}>0,{\textstyle \sum \nolimits _{i=1}^{n}} p_{i}=1\right \} \) be the set of all complete finite discrete probability distributions. The restriction to positive distributions is only for convenience. If we take \(p_{i}=0\) for some \(i\in \left \{ 1,\ldots,n\right \} \), then in the following results, we need to interpret undefined expressions as \(f(0)=\lim _{t\rightarrow 0+}f(t)\), \(0f\left ( \frac{0}{0}\right ) =0\), and \(0f\left ( \frac{e}{0}\right ) =\lim _{\varepsilon \rightarrow 0+}\varepsilon f \left ( \dfrac{e}{\varepsilon}\right ) =e\lim _{t\rightarrow \infty} \frac{f(t)}{t}\), \(e>0\).

I. Csiszár [5] introduced an important class of statistical divergences by means of convex functions.

Definition 1

Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a convex function, and let \(\mathbf{p,q}\in \mathcal{P}_{n}\). The Csiszár f-divergence is defined as

$$ D_{f}(\mathbf{q},\mathbf{p})=\sum \limits _{i=1}^{n}p_{i}f\left ( \frac{q_{i}}{p_{i}}\right ) . $$
(4.1)

It has deep and fruitful applications in various branches of science (see, e.g., [4, 22] and the references therein) and is involved in the following Csiszár–Körner inequality (see [6]).

Theorem 7

Let \(\mathbf{p,q}\in \mathcal{P}_{n}\). If \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a convex function, then

$$ 0\leqslant D_{f}(\mathbf{q},\mathbf{p})-f\left ( 1\right ) . $$
(4.2)

Remark 3

If f is normalized, i.e., \(f(1)=0\), then from (4.2) it follows that

$$ 0\leqslant D_{f}(\mathbf{q},\mathbf{p})\text{ \ \ with \ \ }D_{f}(\mathbf{q},\mathbf{p})=0\text{ \ \ if and only if \ \ } \mathbf{q}=\mathbf{p}. $$
(4.3)

Two distributions q and p are very similar if \(D_{f}(\mathbf{q},\mathbf{p})\) is very close to zero.

Recently, in [10] a new concept of f-divergences was introduced: when (4.1) is defined for a strongly convex function f, it is denoted by \(\tilde{D}_{f}(\mathbf{q},\mathbf{p})\) and referred to as a strong f-divergence. Accordingly, in [10] the following improvement of the Csiszár–Körner inequality for strong f-divergences was obtained.

Theorem 8

Let \(\mathbf{p,q}\in \mathcal{P}_{n}\). If \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a strongly convex function with modulus \(c>0\), then

$$ 0\leqslant \tilde{D}_{f}(\mathbf{q},\mathbf{p})-f\left ( 1\right ) -c \tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}), $$
(4.4)

where \(\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \limits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}} \right ) ^{2}-1\).

Remark 4

Here \(\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \limits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}} \right ) ^{2}-1\) denotes the strong chi-squared distance, obtained for the strongly convex function \(f(x)=(x-1)^{2}\) with modulus \(c=1\).

Additionally, if \(f(1)=0\), then from (4.4) we have

$$ 0\leqslant c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}) \leqslant \tilde{D}_{f}(\mathbf{q},\mathbf{p}). $$
(4.5)

Inequalities (4.4) and (4.5) improve (4.2) and (4.3).
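These inequalities can be illustrated numerically; in the sketch below (our example) we take the normalized function \(f(t)=e^{t-1}-1\), which satisfies \(f(1)=0\) and, since \(f^{\prime \prime}(t)=e^{t-1}\geq e^{-1}\) for \(t>0\), is strongly convex on \((0,\infty )\) with modulus \(c=\frac{1}{2e}\):

```python
import math

# Sketch check of (4.4)-(4.5): f(t) = exp(t - 1) - 1 is normalized
# (f(1) = 0) and strongly convex on (0, inf) with modulus c = 1/(2e),
# since f''(t) = exp(t - 1) >= exp(-1) there.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
f = lambda t: math.exp(t - 1) - 1
c = 1 / (2 * math.e)
D_f = sum(pi * f(qi / pi) for pi, qi in zip(p, q))
D_chi2 = sum(pi * (qi / pi) ** 2 for pi, qi in zip(p, q)) - 1
print(0 <= c * D_chi2 <= D_f)  # prints True
```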

We further use the results from the previous sections to prove new estimates for strong f-divergences.

Corollary 4

Let \(\mathbf{p,q}\in \mathcal{P}_{n}\), \(r_{i}=1-\lambda _{i}\left ( 1-\frac{q_{i}}{p_{i}}\right )\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Then

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}}\right ) -f\left ( r_{i}\right ) -c(1-\lambda _{i})^{2}\left ( 1- \frac {q_{i}}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n}(1- \lambda _{i})\left \vert f^{\prime}\left ( r_{i}\right ) \right \vert \left \vert p_{i}-q_{i}\right \vert \right \vert \\ & \leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})-\sum _{i=1}^{n}p_{i}f(r_{i}) \\ & -\sum _{i=1}^{n}(1-\lambda _{i})f^{\prime}(r_{i})\left ( p_{i}-q_{i} \right ) -c\sum _{i=1}^{n}(1-\lambda _{i})^{2} \frac{\left ( p_{i}-q_{i}\right ) ^{2}}{p_{i}}. \end{aligned}$$
(4.6)

In particular, we have

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}}\right ) -f(1)-c\left ( \frac{q_{i}}{p_{i}}-1\right ) ^{2} \right \vert -\left \vert f^{\prime}(1)\right \vert \cdot \sum _{i=1}^{n} \left \vert p_{i}-q_{i}\right \vert \right \vert \\ & \leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})-f(1)-c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}), \end{aligned}$$
(4.7)

where \(\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \nolimits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}}\right ) ^{2}-1\).

If, in addition, f is normalized, then

$$ 0\leq \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}} \right ) -c\left ( \frac{q_{i}}{p_{i}}-1\right ) ^{2}\right \vert \leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})-c\tilde{D}_{\chi ^{2}}(\mathbf{q}, \mathbf{p}). $$
(4.8)

Proof

Applying (2.4) to \(x_{i}=\frac{q_{i}}{p_{i}}\), \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\) and \(\hat{x}_{i}=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}=(1-\lambda _{i})+ \lambda _{i}\frac{q_{i}}{p_{i}}=1-\lambda _{i}\left ( 1- \frac{q_{i}}{p_{i}}\right ) =r_{i}\), \(i\in \{1,\ldots,n\}\), we get

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}}\right ) -f\left ( r_{i}\right ) -c(1-\lambda _{i})^{2}\left ( 1- \frac {q_{i}}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n}p_{i}(1- \lambda _{i})\left \vert f^{\prime}\left ( r_{i}\right ) \right \vert \left \vert 1-\frac{q_{i}}{p_{i}}\right \vert \right \vert \\ & \leq \sum _{i=1}^{n}p_{i}f\left ( \frac{q_{i}}{p_{i}}\right ) - \sum _{i=1}^{n}p_{i}f\left ( r_{i}\right ) \\ & -\sum _{i=1}^{n}p_{i}(1-\lambda _{i})f^{\prime}\left ( r_{i}\right ) \left ( 1-\frac{q_{i}}{p_{i}}\right ) -c\sum _{i=1}^{n}p_{i}(1- \lambda _{i})^{2}\left ( 1-\frac{q_{i}}{p_{i}}\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.6).

If \(\lambda _{i}=0\), \(i=1,\ldots,n\), then \(r_{i}=1\), \(i=1,\ldots,n\), and from (4.6) we get (4.7). If, in addition, f is normalized, i.e., \(f(1)=0\), then (4.7) implies (4.8). □

Corollary 5

Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\), and let \(\mathbf{p,q}\in \mathcal{P}_{n}\). Suppose \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a strongly convex function with modulus \(c>0\). Then

$$\begin{aligned} & \tilde{D}_{f}(\mathbf{q},\mathbf{p})-\sum _{i=1}^{n}p_{i}f\left ( 1- \lambda _{i}\left ( 1-\frac{q_{i}}{p_{i}}\right ) \right ) \\ & \leq \sum _{i=1}^{n}(1-\lambda _{i})f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i}\right ) -c\sum _{i=1}^{n}p_{i}(1- \lambda _{i})^{2}\left ( 1-\frac{q_{i}}{p_{i}}\right ) ^{2}. \end{aligned}$$
(4.9)

In particular,

$$ \tilde{D}_{f}(\mathbf{q},\mathbf{p})-f\left ( 1\right ) \leq \sum _{i=1}^{n}f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i} \right ) -c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}). $$
(4.10)

If, in addition, f is normalized, then

$$ 0\leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})\leq \sum _{i=1}^{n}f^{ \prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i}\right ) -c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}). $$
(4.11)

Proof

Applying (2.11) to \(x_{i}=\frac{q_{i}}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\), we get

$$\begin{aligned} & \sum _{i=1}^{n}p_{i}f\left ( \frac{q_{i}}{p_{i}}\right ) -\sum _{i=1}^{n}p_{i}f\left ( 1-\lambda _{i}+\lambda _{i}\frac{q_{i}}{p_{i}} \right ) \\ & \leq \sum _{i=1}^{n}p_{i}(1-\lambda _{i})f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( \frac{q_{i}}{p_{i}}-1\right ) -c\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( 1-\frac{q_{i}}{p_{i}}\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.9).

Choosing \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (4.9) we get (4.10). Further, for a normalized function f, (4.10) implies (4.11). □

Corollary 6

Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Let \(\mathbf{p,q}\in \mathcal{P}_{n}\) with \(\frac{q_{i}}{p_{i}}\in \lbrack m,M]\), \(0< m< M\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & f(d)+f^{\prime}(d)\left ( m+M-d-1\right ) +c\sum _{i=1}^{n}p_{i} \left ( m+M-d-\frac{q_{i}}{p_{i}}\right ) ^{2} \\ & +2c(M-m)^{2}\sum _{i=1}^{n}p_{i}\lambda _{i}(1-\lambda _{i}) \\ & \leq f(m)+f(M)-\tilde{D}_{f}(\mathbf{q},\mathbf{p}) \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\sum _{i=1}^{n}p_{i}f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( \frac{q_{i}}{p_{i}}-e\right ) \\ & -c\sum _{i=1}^{n}p_{i}\left ( M-\frac{q_{i}}{p_{i}}\right ) \left ( 3 \frac{q_{i}}{p_{i}}-2e-M\right ) +c(m-e)^{2} \end{aligned}$$
(4.12)

for all \(d,e\in \lbrack m,M]\).

In particular,

$$\begin{aligned} & f(m+M-1)+c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})+2c(M-m)^{2}\sum _{i=1}^{n}p_{i}\lambda _{i}(1-\lambda _{i}) \\ & \leq f(m)+f(M)-\tilde{D}_{f}(\mathbf{q},\mathbf{p}) \\ & \leq f(1)+f^{\prime}(m)(m-1)+f^{\prime}(M)(M-1)-\sum _{i=1}^{n}f^{ \prime }\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i} \right ) \\ & -c\sum _{i=1}^{n}p_{i}\left ( M-\frac{q_{i}}{p_{i}}\right ) \left ( 3 \frac{q_{i}}{p_{i}}-2-M\right ) +c(m-1)^{2}. \end{aligned}$$
(4.13)

If, in addition, f is normalized, then

$$\begin{aligned} & f(m+M-1)+c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})+2c(M-m)^{2}\sum _{i=1}^{n}\lambda _{i}(1-\lambda _{i}) \\ & \leq f(m)+f(M)-\tilde{D}_{f}(\mathbf{q},\mathbf{p}) \\ & \leq f^{\prime}(m)(m-1)+f^{\prime}(M)(M-1)-\sum _{i=1}^{n}f^{\prime} \left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i}\right ) \\ & -c\sum _{i=1}^{n}p_{i}\left ( M-\frac{q_{i}}{p_{i}}\right ) \left ( 3 \frac{q_{i}}{p_{i}}-2-M\right ) +c(m-1)^{2}. \end{aligned}$$
(4.14)

Proof

Applying (3.1) to \(x_{i}=\frac{q_{i}}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\), we get (4.12).

In a particular case, for \(e=1\) and \(d=m+M-1\), from (4.12) we get (4.13). If, in addition, \(f(1)=0\), then (4.13) implies (4.14). □
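To make the Mercer-type bound (4.13) concrete, here is a minimal numerical sketch (all distributions and interval endpoints are illustrative editorial choices, not from the paper). It checks the lower bound of (4.13) with \(\lambda _{i}=0\) for the generating function \(f(t)=t\ln t\) of the Kullback–Leibler divergence, which satisfies \(f^{\prime \prime}(t)=\frac{1}{t}\geq \frac{1}{M}\) on \([m,M]\) and hence is strongly convex there with modulus \(c=\frac{1}{2M}\):

```python
import math

# Generating function of the Kullback-Leibler divergence; on [m, M] we have
# f''(t) = 1/t >= 1/M, so f is strongly convex there with modulus c = 1/(2M).
def f(t):
    return t * math.log(t)

p = [0.5, 0.5]                                   # illustrative distributions
q = [0.3, 0.7]
ratios = [qi / pi for pi, qi in zip(p, q)]
m, M = min(ratios), max(ratios)                  # all q_i/p_i lie in [m, M]
c = 1.0 / (2.0 * M)

D_f = sum(pi * f(qi / pi) for pi, qi in zip(p, q))         # strong f-divergence
D_chi2 = sum((qi - pi) ** 2 / pi for pi, qi in zip(p, q))  # chi^2-divergence

# (4.13) with lambda_i = 0: lower bound versus the middle expression
lower = f(m + M - 1) + c * D_chi2
middle = f(m) + f(M) - D_f
```

Note that \(f(m+M-1)=f(1)=0\) here, so the entire lower bound is carried by the strong-convexity correction \(c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})\).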

Applying the previous corollaries with the corresponding generating function f, strongly convex on a suitable subinterval, we derive new estimates for some well-known divergences that arise as particular cases of the strong f-divergence. Here we consider a few of the most commonly used ones.

Example 1

The strong Kullback–Leibler divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{KL}(\mathbf{\mathbf{q},\mathbf{p}})={\displaystyle \sum \limits _{i=1}^{n}} q_{i}\ln \left ( \frac{q_{i}}{p_{i}}\right ) , $$
(4.15)

where the generating function is \(f(t)=t\ln t\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{t}\), we have \(f^{\prime \prime }\geqslant \frac{1}{l}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{2l}\).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=t\ln t\) with \(c=\frac{1}{2l}\), we may derive new estimates for the strong Kullback–Leibler divergence \(\tilde{D}_{KL}(\mathbf{\mathbf{q},\mathbf{p}})\).
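A short sketch (with illustrative data of our choosing) computes (4.15) and confirms, via Lemma 1, that \(g(t)=f(t)-ct^{2}\) with \(c=\frac{1}{2l}\) is convex on \([m,l]\), since \(g^{\prime \prime}(t)=\frac{1}{t}-\frac{1}{l}\geq 0\) there:

```python
import math

# Strong Kullback-Leibler divergence (4.15) for illustrative distributions.
def kl_divergence(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

# Lemma 1 check: g(t) = t*ln(t) - c*t^2 with c = 1/(2l) should satisfy
# g''(t) = 1/t - 2c = 1/t - 1/l >= 0 on [m, l].
m, l = 0.2, 5.0
c = 1.0 / (2.0 * l)
grid = [m + k * (l - m) / 1000 for k in range(1001)]
g_second_min = min(1.0 / t - 2.0 * c for t in grid)

p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]
dkl = kl_divergence(q, p)
```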

Example 2

The strong squared Hellinger divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{h^{2}}(\mathbf{q,p})=\sum _{i=1}^{n}(\sqrt{p_{i}}-\sqrt{q_{i}})^{2}, $$

where the generating function is \(f(t)=\left ( \sqrt{t}-1\right ) ^{2}\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{2\sqrt{t^{3}}}\), we have \(f^{\prime \prime}\geqslant \frac{1}{2\sqrt{l^{3}}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{4\sqrt{l^{3}}}\).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=\left ( \sqrt{t}-1\right ) ^{2}\) with \(c=\frac{1}{4\sqrt{l^{3}}}\), we may derive new estimates for the strong squared Hellinger divergence \(\tilde{D}_{h^{2}}(\mathbf{\mathbf{q},\mathbf{p}})\).

Example 3

The strong Bhattacharya distance of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{B}(\mathbf{q,p})=-{\displaystyle \sum \limits _{i=1}^{n}} \sqrt{p_{i}q_{i}}, $$

where the generating function is \(f(t)=-\sqrt{t}\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{4\sqrt{t^{3}}}\), we have \(f^{\prime \prime}\geqslant \frac{1}{4\sqrt{l^{3}}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{8\sqrt{l^{3}}}\).

Applying inequalities (4.6), (4.7), (4.8), (4.9), (4.10), (4.11), and (4.12) to \(f(t)=-\sqrt{t}\) with \(c=\frac{1}{8\sqrt{l^{3}}}\), we may derive new estimates for the strong Bhattacharya distance \(\tilde{D}_{B}(\mathbf{\mathbf{q},\mathbf{p}})\).

Example 4

The strong Jeffreys distance of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{J}(\mathbf{q,p})={\displaystyle \sum \limits _{i=1}^{n}} (q_{i}-p_{i})\ln \frac{q_{i}}{p_{i}}=\tilde{D}_{KL}(\mathbf{\mathbf{q},\mathbf{p}})+\tilde{D}_{KL}(\mathbf{p,q}), $$

where the generating function is \(f(t)=(t-1)\ln t\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{t+1}{t^{2}}\), we have \(f^{\prime \prime}\geqslant \frac{l+1}{l^{2}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{l+1}{2l^{2}}\).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=(t-1)\ln t\) with \(c=\frac{l+1}{2l^{2}}\), we may derive new estimates for the strong Jeffreys distance \(\tilde{D}_{J}(\mathbf{\mathbf{q},\mathbf{p}})\).

Example 5

The strong Jensen–Shannon divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$\begin{aligned} \tilde{D}_{JS}(\mathbf{q},\mathbf{p}) & =\frac{1}{2}\left [ \sum \limits _{i=1}^{n}q_{i}\ln \frac{2q_{i}}{p_{i}+q_{i}}+\sum \limits _{i=1}^{n}p_{i}\ln \frac{2p_{i}}{p_{i}+q_{i}}\right ] \\ & =\frac{1}{2}\left [ \tilde{D}_{KL}\left ( \mathbf{\mathbf{q},}\frac{\mathbf{\mathbf{p+q}}}{2}\right ) +\tilde{D}_{KL}\left ( \mathbf{\mathbf{p},}\frac{\mathbf{\mathbf{p+q}}}{2}\right ) \right ] , \end{aligned}$$

where the generating function is \(f(t)=\frac{1}{2}\left ( t\ln \frac{2t}{1+t}+\ln \frac{2}{1+t}\right ) \) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{2t(1+t)}\), we have \(f^{\prime \prime}\geqslant \frac{1}{2l(1+l)}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{4l(1+l)}\).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=\frac{1}{2}\big ( t\ln \frac{2t}{1+t}+ \ln \frac{2}{1+t}\big ) \) with \(c=\frac{1}{4l(1+l)}\), we may derive new estimates for the strong Jensen–Shannon divergence \(\tilde{D}_{JS}(\mathbf{\mathbf{q},\mathbf{p}})\).
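The second-derivative bounds claimed in Examples 1–5 can be verified numerically on a grid; the following sketch (the interval endpoints are our illustrative choices) checks that each generating function satisfies \(f^{\prime \prime}\geq 2c\) on \([m,l]\), which by Lemma 1 yields the stated modulus:

```python
# Second derivatives of the generating functions of Examples 1-5, paired with
# the moduli claimed in the text; each should satisfy f'' >= 2c on [m, l].
m, l = 0.5, 4.0                                   # illustrative interval
cases = {
    "Kullback-Leibler": (lambda t: 1.0 / t,                  1.0 / (2 * l)),
    "Hellinger":        (lambda t: 1.0 / (2 * t ** 1.5),     1.0 / (4 * l ** 1.5)),
    "Bhattacharya":     (lambda t: 1.0 / (4 * t ** 1.5),     1.0 / (8 * l ** 1.5)),
    "Jeffreys":         (lambda t: (t + 1) / t ** 2,         (l + 1) / (2 * l ** 2)),
    "Jensen-Shannon":   (lambda t: 1.0 / (2 * t * (1 + t)),  1.0 / (4 * l * (1 + l))),
}
grid = [m + k * (l - m) / 2000 for k in range(2001)]
ok = all(min(f2(t) for t in grid) >= 2 * c - 1e-12 for f2, c in cases.values())
```

In every case the infimum of \(f^{\prime \prime}\) on \([m,l]\) is attained at \(t=l\), where it equals 2c exactly.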

We now consider the Shannon entropy [25], defined for a random variable X in terms of its probability distribution p as

$$ S(\mathbf{p})={\displaystyle \sum \limits _{i=1}^{n}} p_{i}\ln \frac{1}{p_{i}}=-{\displaystyle \sum \limits _{i=1}^{n}} p_{i}\ln p_{i}. $$
(4.16)

It quantifies the uncertainty inherent in p and satisfies the relation

$$ 0\leqslant S(\mathbf{p})\leqslant \ln n. $$

Using the results from the previous sections, we obtain new estimates for the Shannon entropy.

Corollary 7

Let \(l>0\), and let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in (0,l]\). Let \(\bar{p}_{i}=n-\lambda _{i}\left ( n-\frac{1}{p_{i}}\right ) \), \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} S(\mathbf{p}) & \leq S(\mathbf{p})+\left \vert \sum _{i=1}^{n}p_{i} \left \vert \ln p_{i}\bar{p}_{i}-\frac{1-\lambda _{i}}{2l^{2}}\left ( n- \frac{1}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n} \frac{p_{i}}{\bar{p}_{i}}(1-\lambda _{i})\left \vert n-\frac{1}{p_{i}}\right \vert \right \vert \\ & \leq \sum _{i=1}^{n}p_{i}\ln \bar{p}_{i}+\sum _{i=1}^{n} \frac{p_{i}}{\bar {p}_{i}}(1-\lambda _{i})\left ( n-\frac{1}{p_{i}} \right ) -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( n-\frac{1}{p_{i}} \right ) ^{2}. \end{aligned}$$
(4.17)

In particular, we have

$$\begin{aligned} S(\mathbf{p}) & \leq S(\mathbf{p})+\left \vert \sum _{i=1}^{n}p_{i} \left \vert \ln p_{i}\bar{p}_{i}-\frac{1}{2l^{2}}\left ( n- \frac{1}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n} \frac{p_{i}}{\bar{p}_{i}}\left \vert n-\frac {1}{p_{i}}\right \vert \right \vert \\ & \leq \sum _{i=1}^{n}p_{i}\ln \bar{p}_{i}+\sum _{i=1}^{n} \frac{p_{i}}{\bar {p}_{i}}\left ( n-\frac{1}{p_{i}}\right ) - \frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( n-\frac{1}{p_{i}}\right ) ^{2}. \end{aligned}$$
(4.18)

Proof

Applying (2.4) to the function \(f(t)=-\ln t\), \(t\in (0,l]\), strongly convex with modulus \(c=\frac{1}{2l^{2}}\), and \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i}\frac{1}{p_{i}}=n\) and \(\hat{x}_{i}=(1-\lambda _{i})\bar {x}+\lambda _{i}x_{i}=(1-\lambda _{i})n+ \lambda _{i}\frac{1}{p_{i}}=n-\lambda _{i}\left ( n-\frac{1}{p_{i}} \right ) =\bar{p}_{i}\), \(i\in \{1,\ldots,n\}\), we get

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert -\ln \frac{1}{p_{i}}+\ln \bar{p}_{i}-\frac{1-\lambda _{i}}{2l^{2}}\left ( n- \frac{1}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n}p_{i}(1- \lambda _{i})\left \vert \frac{1}{\bar {p}_{i}}\right \vert \left \vert n-\frac{1}{p_{i}}\right \vert \right \vert \\ & \leq -\sum _{i=1}^{n}p_{i}\ln \frac{1}{p_{i}}+\sum _{i=1}^{n}p_{i} \ln \bar {p}_{i}+\sum _{i=1}^{n}\frac{p_{i}}{\bar{p}_{i}}(1-\lambda _{i}) \left ( n-\frac{1}{p_{i}}\right ) \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( n- \frac {1}{p_{i}}\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.17).

Choosing \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (4.17) we get (4.18). □
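For \(\lambda _{i}=0\) each \(\bar{p}_{i}=n\), and since \(\sum _{i=1}^{n}\frac{p_{i}}{n}\left ( n-\frac{1}{p_{i}}\right ) =0\), the upper bound in (4.18) simplifies to \(S(\mathbf{p})\leq \ln n-\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( n-\frac{1}{p_{i}}\right ) ^{2}\), a sharpening of the classical bound \(S(\mathbf{p})\leq \ln n\). A quick numerical sketch (the distribution is an illustrative choice of ours):

```python
import math

# Sharpened entropy bound obtained from (4.18) with lambda_i = 0 (after the
# elementary simplification noted above):
#   S(p) <= ln(n) - (1/(2*l^2)) * sum_i p_i * (n - 1/p_i)^2
p = [0.1, 0.2, 0.3, 0.4]                 # illustrative distribution
n = len(p)
l = max(1.0 / pi for pi in p)            # guarantees 1/p_i in (0, l]
S = -sum(pi * math.log(pi) for pi in p)
upper = math.log(n) - sum(pi * (n - 1.0 / pi) ** 2 for pi in p) / (2 * l ** 2)
```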

Corollary 8

Let \(l>0\), let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in (0,l]\), and let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & \sum _{i=1}^{n}p_{i}^{2}(1-\lambda _{i})\left ( \frac{1}{p_{i}}-n \right ) +\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2} \left ( \frac{1}{p_{i}}-n\right ) ^{2} \\ & +\sum _{i=1}^{n}p_{i}\ln \left ( (1-\lambda _{i})n+ \frac{\lambda _{i}}{p_{i}}\right ) \\ & \leq S(\mathbf{p}). \end{aligned}$$
(4.19)

In particular, we have

$$ \ln n+1-n\sum _{i=1}^{n}p_{i}^{2}+\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i} \left ( \frac{1}{p_{i}}-n\right ) ^{2}\leq S(\mathbf{p}). $$
(4.20)

Proof

Applying (2.11) to the strongly convex function \(f(t)=-\ln t\), \(t\in (0,l]\), with modulus \(c=\frac{1}{2l^{2}}\), and to \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i} \frac{1}{p_{i}}=n\), we get

$$\begin{aligned} & -\sum _{i=1}^{n}p_{i}\ln \frac{1}{p_{i}}+\sum _{i=1}^{n}p_{i}\ln \left ( (1-\lambda _{i})n+\frac{\lambda _{i}}{p_{i}}\right ) \\ & \leq -\sum _{i=1}^{n}p_{i}(1-\lambda _{i})\left ( \frac{1}{p_{i}} \right ) ^{-1}\left ( \frac{1}{p_{i}}-n\right ) -\frac{1}{2l^{2}} \sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( \frac{1}{p_{i}}-n\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.19). If we choose \(\lambda _{i}=0\), \(i=1,\ldots,n\), then (4.19) implies (4.20). □
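The improvement over the trivial bound \(S(\mathbf{p})\geq 0\) is easy to observe numerically; a sketch of (4.20) with illustrative data of our choosing:

```python
import math

# Lower bound (4.20):
#   ln(n) + 1 - n*sum(p_i^2) + (1/(2*l^2))*sum(p_i*(1/p_i - n)^2) <= S(p)
p = [1 / 3, 2 / 3]                       # illustrative distribution
n = len(p)
l = max(1.0 / pi for pi in p)            # smallest admissible l here
S = -sum(pi * math.log(pi) for pi in p)
lower = (math.log(n) + 1 - n * sum(pi ** 2 for pi in p)
         + sum(pi * (1.0 / pi - n) ** 2 for pi in p) / (2 * l ** 2))
```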

Corollary 9

Let \(0< m< l\), let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in \lbrack m,l]\), and let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & \frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( m+l-d-\frac{1}{p_{i}} \right ) ^{2}-\frac{1}{d}\left ( m+l-d-n\right ) \\ & +\frac{(l-m)^{2}}{l^{2}}\sum _{i=1}^{n}\lambda _{i}(1-\lambda _{i})+ \ln \frac{ml}{d} \\ & \leq S(\mathbf{p}) \\ & \leq \frac{1}{m}(e-m)+\frac{1}{l}(e-l)+\sum _{i=1}^{n}p_{i}^{2} \left ( \frac{1}{p_{i}}-e\right ) +\ln \frac{ml}{e} \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( l-\frac{1}{p_{i}} \right ) \left ( \frac{3}{p_{i}}-2e-l\right ) + \frac{(m-e)^{2}}{2l^{2}} \end{aligned}$$
(4.21)

for all \(d,e\in \lbrack m,l]\).

In particular, we have

$$\begin{aligned} & \frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( n-\frac{1}{p_{i}} \right ) ^{2}+\frac{(l-m)^{2}}{l^{2}}\sum _{i=1}^{n}\lambda _{i}(1- \lambda _{i})+\ln \frac{ml}{m+l-n} \\ & \leq S(\mathbf{p}) \\ & \leq \frac{1}{m}(n-m)+\frac{1}{l}(n-l)+\sum _{i=1}^{n}p_{i}^{2} \left ( \frac{1}{p_{i}}-n\right ) +\ln \frac{ml}{n} \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( l-\frac{1}{p_{i}} \right ) \left ( \frac{3}{p_{i}}-2n-l\right ) + \frac{(m-n)^{2}}{2l^{2}}. \end{aligned}$$
(4.22)

Proof

Applying (3.1) to the strongly convex function \(f(t)=-\ln t\), \(t\in (0,l]\), with modulus \(c=\frac{1}{2l^{2}}\) and to \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i} \frac{1}{p_{i}}=n\), we get

$$\begin{aligned} & -\ln d-\frac{1}{d}\left ( m+l-d-n\right ) +\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( m+l-d-\frac{1}{p_{i}}\right ) ^{2} \\ & +\frac{(l-m)^{2}}{l^{2}}\sum _{i=1}^{n}\lambda _{i}(1-\lambda _{i}) \\ & \leq -\ln m-\ln l+\sum _{i=1}^{n}p_{i}\ln \frac{1}{p_{i}} \\ & \leq -\ln e-\frac{1}{m}(m-e)-\frac{1}{l}(l-e)+\sum _{i=1}^{n}p_{i} \left ( \frac{1}{p_{i}}\right ) ^{-1}\left ( \frac{1}{p_{i}}-e\right ) \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( l-\frac{1}{p_{i}} \right ) \left ( \frac{3}{p_{i}}-2e-l\right ) + \frac{(m-e)^{2}}{2l^{2}}, \end{aligned}$$

which is equivalent to (4.21). Choosing \(e=n\) and \(d=m+l-n\), from (4.21) we get (4.22). □
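With \(\lambda _{i}=0\) the lower bound in (4.22) reads \(\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( n-\frac{1}{p_{i}}\right ) ^{2}+\ln \frac{ml}{m+l-n}\leq S(\mathbf{p})\); a numerical sketch with illustrative \(\mathbf{p}\), m, l of our choosing:

```python
import math

# Lower bound of (4.22) with lambda_i = 0; p, m, l are illustrative and chosen
# so that every 1/p_i lies in [m, l] (hence d = m + l - n lies there as well).
p = [1 / 3, 2 / 3]
n = len(p)
m, l = 1.4, 3.2                          # 1/p_i in {1.5, 3.0}, a subset of [m, l]
S = -sum(pi * math.log(pi) for pi in p)
lower = (sum(pi * (n - 1.0 / pi) ** 2 for pi in p) / (2 * l ** 2)
         + math.log(m * l / (m + l - n)))
```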

5 New bounds for the Chebyshev functional

One of the fundamental inequalities in probability is the discrete Chebyshev inequality, which we quote in the following form (see [21]).

Theorem 9

Let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), and let \(\boldsymbol{p}=(p_{1},\ldots,p_{n})\) and \(\boldsymbol{q}=(q_{1},\ldots,q_{n})\) be monotonic real n-tuples in the same direction. Then

$$ \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}p_{i}q_{i}-\frac{1}{A_{n}^{2}} \sum _{i=1}^{n}a_{i}p_{i}\sum _{i=1}^{n}a_{i}q_{i}\geq 0. $$
(5.1)

If p and q are monotonic in the opposite direction, then we have the reverse inequality of (5.1).

Many papers study the Chebyshev functional \(T(\boldsymbol{a;p,q})\) associated with the Chebyshev inequality (5.1), namely the difference of the two sides of (5.1) scaled by \(A_{n}^{2}\):

$$ T(\boldsymbol{a;p,q})=A_{n}\sum _{i=1}^{n}a_{i}p_{i}q_{i}-\sum _{i=1}^{n}a_{i}p_{i}\sum _{i=1}^{n}a_{i}q_{i}, $$
(5.2)

and in the normalized form as

$$ \bar{T}(\boldsymbol{p,q})=\frac{1}{n}\sum _{i=1}^{n}p_{i}q_{i}- \frac{1}{n^{2}}\sum _{i=1}^{n}p_{i}\sum _{i=1}^{n}q_{i}. $$
(5.3)

By (5.1) we have

$$ T(\boldsymbol{a;p,q})\geq 0\text{ \ \ and \ \ }\bar{T}( \boldsymbol{p,q})\geq 0. $$
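A direct computation illustrates (5.2), (5.3), and the nonnegativity asserted by (5.1) for similarly ordered tuples (the weights and tuples below are illustrative):

```python
# Chebyshev functional (5.2) and its normalized form (5.3); both are
# nonnegative when p and q are monotonic in the same direction.
a = [1.0, 2.0, 1.5, 0.5]                          # nonnegative weights
p = sorted([0.3, 1.1, 0.7, 2.0])                  # both tuples increasing
q = sorted([0.2, 0.9, 1.4, 3.0])
n = len(p)
A = sum(a)
T = A * sum(ai * pi * qi for ai, pi, qi in zip(a, p, q)) \
    - sum(ai * pi for ai, pi in zip(a, p)) * sum(ai * qi for ai, qi in zip(a, q))
T_bar = sum(pi * qi for pi, qi in zip(p, q)) / n - sum(p) * sum(q) / n ** 2
```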

Using the results from Sect. 2, we obtain improvements of the Chebyshev inequality (5.1), i.e., we get new bounds for the Chebyshev functionals of types (5.2) and (5.3) without the assumption of monotonicity.

Corollary 10

Let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), and let \(\boldsymbol{p}=(p_{1},\ldots,p_{n})\) and \(\boldsymbol{q}=(q_{1},\ldots,q_{n})\) be real n-tuples with \(\bar{p}=\frac {1}{A_{n}}\sum _{i=1}^{n}a_{i}p_{i}\) and \(P_{n}=\sum _{i=1}^{n}p_{i}\). Then

$$\begin{aligned} 0 & \leq \frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(p_{i}-\bar{p})^{2} \\ & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(p_{i})-f(\bar{p})-c(p_{i}-\bar{p})^{2}\right \vert -\left \vert f^{\prime}( \bar {p})\right \vert \cdot \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert (p_{i}-\bar{p})\right \vert \right \vert \\ & +\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(p_{i}-\bar{p})^{2} \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(p_{i})-f(\bar{p}) \\ & \leq \frac{1}{A_{n}^{2}}T(\boldsymbol{a;p,q})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}( \bar{p}-p_{i})^{2}\leq \frac{1}{A_{n}^{2}}T(\boldsymbol{a;p,q}). \end{aligned}$$
(5.4)

In particular, we have

$$\begin{aligned} 0 & \leq \frac{c}{n}\sum _{i=1}^{n}\left ( p_{i}-\frac{P_{n}}{n} \right ) ^{2} \\ & \leq \left \vert \frac{1}{n}\sum _{i=1}^{n}\left \vert f(p_{i})-f\left ( \frac{P_{n}}{n}\right ) -c\left ( p_{i}-\frac{P_{n}}{n} \right ) ^{2}\right \vert -\left \vert f^{\prime}\left ( \frac{P_{n}}{n}\right ) \right \vert \cdot \frac{1}{n}\sum _{i=1}^{n} \left \vert \left ( p_{i}-\frac {P_{n}}{n}\right ) \right \vert \right \vert \\ & +\frac{c}{n}\sum _{i=1}^{n}\left ( p_{i}-\frac{P_{n}}{n}\right ) ^{2} \\ & \leq \frac{1}{n}\sum _{i=1}^{n}f(p_{i})-f\left ( \frac{P_{n}}{n} \right ) \\ & \leq \bar{T}(\boldsymbol{p,q})-\frac{c}{n}\sum _{i=1}^{n}\left ( p_{i}- \frac {P_{n}}{n}\right ) ^{2}\leq \bar{T}(\boldsymbol{p,q}). \end{aligned}$$
(5.5)

Proof

Combining inequalities (2.8) and (2.12), we have

$$\begin{aligned} 0 & \leq \frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}-\bar{x})^{2} \\ & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f(\bar{x})-c(x_{i}-\bar{x})^{2}\right \vert -\left \vert f^{\prime}( \bar {x})\right \vert \cdot \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert (x_{i}-\bar{x})\right \vert \right \vert \\ & +\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}-\bar{x})^{2} \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-f(\bar{x}) \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})x_{i}- \frac{1}{A_{n}^{2}}\sum _{i=1}^{n}a_{i}x_{i}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2} \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})x_{i}- \frac{1}{A_{n}^{2}}\sum _{i=1}^{n}a_{i}x_{i}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i}). \end{aligned}$$
(5.6)

Setting \(f^{\prime}(x_{i})=q_{i}\) and \(x_{i}=p_{i}\), \(i\in \{1,\ldots,n\}\) and using (5.6), we get (5.4).

If we set \(a_{i}=\frac{1}{n}\), \(i=1,\ldots,n\), then \(\bar{p}=\frac{1}{n}\sum _{i=1}^{n}p_{i}=\frac{P_{n}}{n}\), where \(P_{n}=\sum _{i=1}^{n}p_{i}\). Now inequality (5.5) immediately follows from (5.4). □
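As an illustration of (5.5), where both the data and the choice \(f(x)=e^{x}\) are ours: on \([0,2]\) we have \(f^{\prime \prime}=e^{x}\geq 1\), so f is strongly convex there with modulus \(c=\frac{1}{2}\), and with \(q_{i}=f^{\prime}(p_{i})\) the final inequalities compare the Jensen gap with the normalized Chebyshev functional:

```python
import math

# Final inequalities of (5.5) with a_i = 1/n and q_i = f'(p_i) for f(x) = exp(x),
# strongly convex on [0, 2] with modulus c = 1/2 (since f'' = exp >= 1 there):
#   (1/n)*sum f(p_i) - f(P_n/n) <= T_bar(p,q) - (c/n)*sum (p_i - P_n/n)^2 <= T_bar(p,q)
p = [0.1, 1.7, 0.4, 1.2]                 # illustrative real tuple in [0, 2]
n = len(p)
c = 0.5
q = [math.exp(pi) for pi in p]           # q_i = f'(p_i)
p_mean = sum(p) / n
jensen_gap = sum(math.exp(pi) for pi in p) / n - math.exp(p_mean)
T_bar = sum(pi * qi for pi, qi in zip(p, q)) / n - p_mean * sum(q) / n
sharpened = T_bar - c / n * sum((pi - p_mean) ** 2 for pi in p)
```

Note that no monotonicity of p and q is used: q inherits the ordering of p automatically through \(q_{i}=f^{\prime}(p_{i})\) only in this special choice.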

Data Availability

No datasets were generated or analysed during the current study.

References

  1. Adil Khan, M., Husain, Z., Chu, Y.-M.: New estimates for Csiszár divergence and Zipf–Mandelbrot entropy via Jensen–Mercer’s inequality. Complexity 2020, 1–8 (2020)


  2. Butt, S.I., Agarwal, P., Yousaf, S., Guirao, J.L.G.: Generalized fractal Jensen and Jensen–Mercer inequalities for harmonic convex function with applications. J. Inequal. Appl. 2022, 1 (2022)


  3. Butt, S.I., Sayyari, Y., Agarwal, P., Nieto, J.J., Umar, M.: On some inequalities for uniformly convex mapping with estimations to normal distributions. J. Inequal. Appl. 2023, 89 (2023)


  4. Crooks, G.E.: On measures of entropy and information. Tech. Note 009 v0.8 (2021)

  5. Csiszár, I.: Information-type measures of difference of probability functions and indirect observations. Studia Sci. Math. Hung. 2, 299–318 (1967)


  6. Csiszár, I., Körner, J.: Information Theory: Coding Theorem for Discrete Memoryless Systems. Academic Press, New York (1981)


  7. Dragomir, S.S., Ionescu, N.M.: Some converse of Jensen’s inequality and applications. Rev. Anal. Numér. Théor. Approx. 23, 71–78 (1994)


  8. Dragomir, S.S., Scarmozzino, F.P.: A Refinement of Jensen’s discrete inequality for differentiable convex functions. RGMIA Res. Rep. Collect. 5(4) (2002)

  9. Horváth, L.: Some notes on Jensen–Mercer’s type inequalities; extensions and refinements with applications. Math. Inequal. Appl. 24(4), 1093–1111 (2021)


  10. Ivelić Bradanović, S.: Sherman’s inequality and its converse for strongly convex functions with applications to generalized f-divergences. Turk. J. Math. 6(43), 2680–2696 (2019)


  11. Ivelić Bradanović, S.: Improvements of Jensen’s inequality and its converse for strongly convex functions with applications to strongly f-divergences. J. Math. Anal. Appl. 2(531), 1–16 (2024)


  12. Ivelić, S., Matković, A., Pečarić, J.: On a Jensen–Mercer operator inequality. Banach J. Math. Anal. 5(1), 19–28 (2011)


  13. Jarad, F., Sahoo, S.K., Nisar, K.S., Treanţă, S., Emadifar, H., Botmart, T.: New stochastic fractional integral and related inequalities of Jensen–Mercer and Hermite–Hadamard–Mercer type for convex stochastic processes. J. Inequal. Appl. 2023, 51 (2023)


  14. Khan, A.R., Rubab, F.: Mercer type variants of the Jensen–Steffensen inequality. Rocky Mt. J. Math. 52(5), 1693–1712 (2022)


  15. Klaričić Bakula, M., Matić, M., Pečarić, J.: On some general inequalities related to Jensen’s inequality. Int. Ser. Numer. Math. 157, 233–243 (2008)


  16. Krnić, M., Lovričević, N., Pečarić, J.: On some properties of Jensen–Mercer’s functional. J. Math. Inequal. 6(1), 125–139 (2012)


  17. Mercer, A.M.: A variant of Jensen’s inequality. JIPAM. J. Inequal. Pure Appl. Math. 4(4), 1–2 (2003)


  18. Moradi, H.R., Omidvar, M.E., Adil Khan, M., Nikodem, K.: Around Jensen’s inequality for strongly convex functions. Aequ. Math. 92, 25–37 (2018)


  19. Niculescu, C.P., Persson, L.E.: Convex Functions and Their Applications. A Contemporary Approach, 2nd edn. CMS Books in Mathematics, vol. 2. Springer, New York (2018)


  20. Nikodem, K.: On Strongly Convex Functions and Related Classes of Functions, Handbook of Functional Equations, pp. 365–405. Springer, New York (2014)


  21. Pečarić, J., Proschan, F., Tong, Y.L.: Convex Functions, Partial Orderings and Statistical Applications. Academic Press, New York (1992)


  22. Polyanskiy, Y., Wu, Y.: Lecture, Information Theory: From Coding to Learning. Cambridge University Press, Cambridge (2022)


  23. Roberts, A.W., Varberg, D.E.: Convex Functions. Academic Press, New York (1973)


  24. Sayyari, Y., Barsam, H.: Jensen–Mercer inequality for uniformly convex functions with some applications. Afr. Math. 34, 38 (2023)


  25. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1949)



Funding

This research is partially supported through KK.01.1.1.02.0027, a project cofinanced by the Croatian Government and the European Union through the European Regional Development Fund – the Competitiveness and Cohesion Operational Programme.

Author information


Contributions

Each author contributed the same level of work.

Corresponding author

Correspondence to Neda Lovričević.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Ivelić Bradanović, S., Lovričević, N. Generalized Jensen and Jensen–Mercer inequalities for strongly convex functions with applications. J Inequal Appl 2024, 112 (2024). https://doi.org/10.1186/s13660-024-03189-z

