# Approximations of Jensen divergence for twice differentiable functions

## Abstract

The Jensen divergence is used to measure the difference between two probability distributions, and it has been generalised to allow the comparison of more than two distributions. In this paper, we establish bounds for the generalised Jensen divergence of twice differentiable functions with bounded second derivatives. These bounds provide approximations of the Jensen divergence of a twice differentiable function by the Jensen divergence of simpler functions, such as the power functions and the paired entropies associated with the Havrda-Charvát functions.

MSC:26D15, 94A17.

## 1 Introduction

One of the most important applications of probability theory is finding an appropriate measure of distance (or difference) between two probability distributions [1]. Many such divergence measures have been widely studied and applied by mathematicians such as Burbea and Rao [2], Havrda and Charvát [3], Lin [4] and others.

In Burbea and Rao [2], a generalisation of the Jensen divergence is considered to allow the comparison of more than two distributions. If Φ is a function defined on an interval I of the real line $\mathbb{R}$, the (generalised) Jensen divergence between two elements $x=\left({x}_{1},\dots ,{x}_{n}\right)$ and $y=\left({y}_{1},\dots ,{y}_{n}\right)$ in ${I}^{n}$ (where $n\ge 1$) is given by the following equation (cf. Burbea and Rao [2]):

${\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right):=\sum _{i=1}^{n}\left[\frac{1}{2}\left[\mathrm{\Phi }\left({x}_{i}\right)+\mathrm{\Phi }\left({y}_{i}\right)\right]-\mathrm{\Phi }\left(\frac{{x}_{i}+{y}_{i}}{2}\right)\right]$
(1)

for all $x,y\in {I}^{n}$. Several measures have been proposed to quantify the difference (also known as the divergence) between two (or more) probability distributions. We refer to Grosse et al. [5], Kullback and Leibler [6] and Csiszár [7] for further references.
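
To make the definition concrete, the divergence (1) can be evaluated numerically in a few lines of Python; the function name and the test vectors below are our own illustration, not part of the paper.

```python
def jensen_divergence(phi, x, y):
    """Generalised Jensen divergence J_{n,Phi}(x, y) of equation (1)."""
    return sum(0.5 * (phi(xi) + phi(yi)) - phi(0.5 * (xi + yi))
               for xi, yi in zip(x, y))
```

For a convex Φ each summand is nonnegative by Jensen's inequality, so the divergence is nonnegative; for $\mathrm{\Phi }\left(t\right)={t}^{2}$ it reduces to $\frac{1}{4}{\sum }_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}$.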

We denote by ${S}_{n}$ the set of (discrete) probability distributions

${S}_{n}=\left\{\left({x}_{1},\dots ,{x}_{n}\right)\in {I}^{n}:\sum _{i=1}^{n}{x}_{i}=1\right\},\phantom{\rule{1em}{0ex}}I=\left[0,1\right].$

Utilising the family of functions, for $\alpha \in {\mathbb{R}}_{+}$,

${\mathrm{\Phi }}_{\alpha }\left(t\right):=\left\{\begin{array}{cc}{\left(\alpha -1\right)}^{-1}\left({t}^{\alpha }-t\right),\hfill & \alpha \ne 1,\hfill \\ tlogt,\hfill & \alpha =1,\hfill \end{array}$

introduced by Havrda and Charvát in [3] to define their entropies of degree α, Burbea and Rao [2] proposed the following family of Jensen divergences:

${\mathcal{J}}_{n,\alpha }\left(x,y\right):=\left\{\begin{array}{cc}{\left(\alpha -1\right)}^{-1}{\sum }_{i=1}^{n}\left[\frac{1}{2}\left({x}_{i}^{\alpha }+{y}_{i}^{\alpha }\right)-{\left(\frac{{x}_{i}+{y}_{i}}{2}\right)}^{\alpha }\right],\hfill & \alpha \ne 1,\hfill \\ \frac{1}{2}{\sum }_{i=1}^{n}\left[{x}_{i}log{x}_{i}+{y}_{i}log{y}_{i}-\left({x}_{i}+{y}_{i}\right)log\left(\frac{{x}_{i}+{y}_{i}}{2}\right)\right],\hfill & \alpha =1,\hfill \end{array}$

that can be defined on ${S}_{n}×{S}_{n}$ with the convention that $0log0=0$ for $\alpha \in {\mathbb{R}}_{+}$. We note that the divergence ${\mathcal{J}}_{n,1}$ is also known as the Jensen-Shannon divergence [8].
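
The family above is straightforward to implement; the sketch below (our own code, with the convention $0log0=0$ built in) covers both cases:

```python
import math

def burbea_rao(x, y, alpha):
    """Burbea-Rao family J_{n,alpha}(x, y); alpha = 1 is the
    Jensen-Shannon divergence, with the convention 0 log 0 = 0."""
    def xlogx(t):
        return t * math.log(t) if t > 0 else 0.0
    if alpha == 1:
        # (x+y) log((x+y)/2) = (x+y) log(x+y) - (x+y) log 2
        return 0.5 * sum(
            xlogx(xi) + xlogx(yi) - xlogx(xi + yi) + (xi + yi) * math.log(2.0)
            for xi, yi in zip(x, y))
    return sum(0.5 * (xi ** alpha + yi ** alpha) - (0.5 * (xi + yi)) ** alpha
               for xi, yi in zip(x, y)) / (alpha - 1)
```

For $\alpha =2$ the family reduces to $\frac{1}{4}{\sum }_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}$, which matches the divergence ${\mathcal{J}}_{n,2}$ appearing in Section 2.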

These measures have been applied in a variety of fields, for example, in information theory [9]. The Jensen divergence introduced in Burbea and Rao [2] has its applications in bioinformatics [10, 11], where it is usually utilised to compare two samples of healthy population (control) and diseased population (case) in detecting gene expression for a certain disease. We refer the readers to Dragomir [1] for the applications in other areas.

In a recent paper by Dragomir et al. [12], the authors found sharp upper and lower bounds for the Jensen divergence for various classes of functions Φ, including functions of bounded variation, absolutely continuous functions, Lipschitzian continuous functions, convex functions and differentiable functions. We recall some of these results in Section 2, which motivates the new results we obtain in this paper.

In this paper, we provide bounds for Jensen divergence of twice differentiable function Φ whose second derivative ${\mathrm{\Phi }}^{″}$ satisfies some boundedness conditions. These bounds provide approximations of the Jensen divergence ${\mathcal{J}}_{n,\mathrm{\Phi }}$ (cf. (1)) by the divergence of simpler functions such as the power functions (cf. Section 3) and the above mentioned family of Jensen divergences ${\mathcal{J}}_{n,\alpha }$ (cf. Section 4). Finally, we apply these bounds to some elementary functions in Section 5.

## 2 Definitions, notation and previous results

In this section, we provide definitions and notation that will be used in the paper. We also provide some results regarding sharp bounds for the generalised Jensen divergence as stated in Dragomir et al. [12].

### 2.1 Definitions and notation

Throughout the paper, for any real number $r>1$, we define ${r}^{\prime }$ to be its Hölder conjugate, that is, $1/r+1/{r}^{\prime }=1$.

Definition 1 (Bullen [13])

If s is an extended real number, the generalised logarithmic mean of order s of two positive numbers x and y is defined by

${\mathfrak{L}}^{\left[s\right]}\left(x,y\right):=\left\{\begin{array}{cc}{\left[\frac{{y}^{s+1}-{x}^{s+1}}{\left(s+1\right)\left(y-x\right)}\right]}^{1/s},\hfill & s\ne 0,-1,\hfill \\ \frac{1}{e}{\left(\frac{{y}^{y}}{{x}^{x}}\right)}^{1/\left(y-x\right)},\hfill & s=0,\hfill \\ \frac{y-x}{logy-logx},\hfill & s=-1,\hfill \end{array}\phantom{\rule{1em}{0ex}}x\ne y,$
(2)

and ${\mathfrak{L}}^{\left[s\right]}\left(x,x\right)=x$.

This mean is homogeneous and symmetric [[13], p.385]. In particular, there is no loss of generality in assuming $0<x<y$. Note also that

${\mathfrak{L}}^{\left[s\right]}\left(x,y\right)={\left({\int }_{0}^{1}{\left[\left(1-t\right)x+ty\right]}^{s}\phantom{\rule{0.2em}{0ex}}dt\right)}^{1/s}$

for $0<x<y$ and $s\in \left[1,\mathrm{\infty }\right)$. This mean generalises not only the logarithmic mean (when $s=-1$), which is particularly useful in the distribution of electrical charge on a conductor, but also the arithmetic mean (when $s=1$) and the geometric mean (when $s=-2$).
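
A numerical sketch of this mean (our code, assuming the standard piecewise definition from Bullen's handbook, with the identric mean at $s=0$) makes the special cases easy to check:

```python
import math

def gen_log_mean(x, y, s):
    """Generalised logarithmic mean L^[s](x, y) of two positive numbers."""
    if x == y:
        return float(x)
    if s == 0:                      # identric mean
        return (y ** y / x ** x) ** (1.0 / (y - x)) / math.e
    if s == -1:                     # logarithmic mean
        return (y - x) / (math.log(y) - math.log(x))
    return ((y ** (s + 1) - x ** (s + 1)) / ((s + 1) * (y - x))) ** (1.0 / s)
```

The cases named in the text are recovered: $s=1$ gives the arithmetic mean, $s=-2$ the geometric mean and $s=-1$ the logarithmic mean.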

We use the following notation for Lebesgue integrable functions: for any Lebesgue integrable function g on $\left[a,b\right]$ and $p\ge 1$, we define, for $a\le x\le y\le b$,

${\parallel g\parallel }_{\left[x,y\right],p}:={\left({\int }_{x}^{y}{|g\left(s\right)|}^{p}\phantom{\rule{0.2em}{0ex}}ds\right)}^{1/p};$
and for $g\in {L}_{\mathrm{\infty }}\left[a,b\right]$, we denote ${\parallel g\parallel }_{\left[x,y\right],\mathrm{\infty }}:={ess sup}_{s\in \left[x,y\right]}|g\left(s\right)|$.

We recall that a function $f:\left[a,b\right]\to \mathbb{R}$ is absolutely continuous on $\left[a,b\right]$ if and only if it is differentiable almost everywhere in $\left[a,b\right]$, the derivative ${f}^{\mathrm{\prime }}$ is Lebesgue integrable on this interval and $f\left(y\right)-f\left(x\right)={\int }_{x}^{y}{f}^{\mathrm{\prime }}\left(t\right)\phantom{\rule{0.2em}{0ex}}dt$ for any $x,y\in \left[a,b\right]$.

### 2.2 Previous results

In a recent paper by Dragomir et al. [12], the authors provide sharp upper and lower bounds for the Jensen divergence for various classes of functions Φ. Some results are stated in the following.

Theorem 2 (Dragomir et al. [12])

Assume that $\mathrm{\Phi }:\left[a,b\right]\to \mathbb{R}$ is absolutely continuous on $\left[a,b\right]$. Then we have the bounds

$\begin{array}{rl}|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)|& \le \frac{1}{2}×\left\{\begin{array}{cc}{\sum }_{i=1}^{n}|{y}_{i}-{x}_{i}|{\parallel {\mathrm{\Phi }}^{\mathrm{\prime }}\parallel }_{\left[{x}_{i},{y}_{i}\right],\mathrm{\infty }}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{\mathrm{\prime }}\in {L}_{\mathrm{\infty }}\left[a,b\right],\hfill \\ {\sum }_{i=1}^{n}|{y}_{i}-{x}_{i}{|}^{\frac{p-1}{p}}{\parallel {\mathrm{\Phi }}^{\mathrm{\prime }}\parallel }_{\left[{x}_{i},{y}_{i}\right],p}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{\mathrm{\prime }}\in {L}_{p}\left[a,b\right],p>1,\hfill \\ {\sum }_{i=1}^{n}{\parallel {\mathrm{\Phi }}^{\mathrm{\prime }}\parallel }_{\left[{x}_{i},{y}_{i}\right],1}\hfill \end{array}\\ \le \frac{1}{2}×\left\{\begin{array}{cc}{\parallel {\mathrm{\Phi }}^{\mathrm{\prime }}\parallel }_{\left[a,b\right],\mathrm{\infty }}{\sum }_{i=1}^{n}|{y}_{i}-{x}_{i}|\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{\mathrm{\prime }}\in {L}_{\mathrm{\infty }}\left[a,b\right],\hfill \\ {\parallel {\mathrm{\Phi }}^{\mathrm{\prime }}\parallel }_{\left[a,b\right],p}{\sum }_{i=1}^{n}|{y}_{i}-{x}_{i}{|}^{\frac{p-1}{p}}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{\mathrm{\prime }}\in {L}_{p}\left[a,b\right],p>1,\hfill \\ n{\parallel {\mathrm{\Phi }}^{\mathrm{\prime }}\parallel }_{\left[a,b\right],1}\hfill \end{array}\end{array}$
(3)

for any $x=\left({x}_{1},\dots ,{x}_{n}\right),y=\left({y}_{1},\dots ,{y}_{n}\right)\in {\left[a,b\right]}^{n}$.

Moreover, if the modulus of the derivative is convex, then we have the inequality

$\begin{array}{rl}|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)|& \le \frac{1}{4}\sum _{i=1}^{n}|{y}_{i}-{x}_{i}|\left[|{\mathrm{\Phi }}^{\mathrm{\prime }}\left(\frac{{x}_{i}+{y}_{i}}{2}\right)|+\frac{|{\mathrm{\Phi }}^{\mathrm{\prime }}\left({x}_{i}\right)|+|{\mathrm{\Phi }}^{\mathrm{\prime }}\left({y}_{i}\right)|}{2}\right]\\ \le \frac{1}{4}\sum _{i=1}^{n}|{y}_{i}-{x}_{i}|\left[|{\mathrm{\Phi }}^{\mathrm{\prime }}\left({x}_{i}\right)|+|{\mathrm{\Phi }}^{\mathrm{\prime }}\left({y}_{i}\right)|\right]\\ \phantom{\left(}\left(\le {\parallel {\mathrm{\Phi }}^{\mathrm{\prime }}\parallel }_{\left[a,b\right],\mathrm{\infty }}\delta \left(x,y\right)\right)\end{array}$
(4)

for any $x=\left({x}_{1},\dots ,{x}_{n}\right),y=\left({y}_{1},\dots ,{y}_{n}\right)\in {\left[a,b\right]}^{n}$, where $\delta \left(x,y\right)=\frac{1}{2}{\sum }_{i=1}^{n}|{y}_{i}-{x}_{i}|$.

The constant $1/4$ is best possible in both inequalities.
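As a quick numerical sanity check of (4) (our own illustration, not part of the original argument), take $\mathrm{\Phi }\left(t\right)={t}^{2}$ on $\left[0,1\right]$, whose derivative $2t$ has convex modulus:

```python
def jensen_divergence(phi, x, y):
    return sum(0.5 * (phi(xi) + phi(yi)) - phi(0.5 * (xi + yi))
               for xi, yi in zip(x, y))

x = [0.1, 0.4, 0.7]
y = [0.9, 0.2, 0.5]
dphi = lambda t: 2.0 * t                      # Phi'(t) for Phi(t) = t^2

lhs = abs(jensen_divergence(lambda t: t * t, x, y))
# second bound in (4): (1/4) sum |y_i - x_i| (|Phi'(x_i)| + |Phi'(y_i)|)
rhs = 0.25 * sum(abs(yi - xi) * (abs(dphi(xi)) + abs(dphi(yi)))
                 for xi, yi in zip(x, y))
```

Here $\mathit{\text{lhs}}=0.18$ and $\mathit{\text{rhs}}=0.58$, so the inequality holds with room to spare for these vectors.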

Some more assumptions for Φ lead to the following results.

Theorem 3 (Dragomir et al. [12])

Let $\mathrm{\Phi }:\left[a,b\right]\to \mathbb{R}$ be a differentiable function on the interval $\left[a,b\right]$ of real numbers.

(i)

If the derivative ${\mathrm{\Phi }}^{\mathrm{\prime }}$ is of bounded variation on $\left[a,b\right]$, then

$\begin{array}{rl}|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)|& \le \frac{1}{4}\sum _{i=1}^{n}|{y}_{i}-{x}_{i}||\underset{{x}_{i}}{\overset{{y}_{i}}{\bigvee }}\left({\mathrm{\Phi }}^{\mathrm{\prime }}\right)|\\ \le \frac{1}{4}\underset{a}{\overset{b}{\bigvee }}\left({\mathrm{\Phi }}^{\mathrm{\prime }}\right)\sum _{i=1}^{n}|{y}_{i}-{x}_{i}|\\ =\frac{1}{2}\underset{a}{\overset{b}{\bigvee }}\left({\mathrm{\Phi }}^{\mathrm{\prime }}\right)\delta \left(x,y\right)\end{array}$
(5)

for any $x=\left({x}_{1},\dots ,{x}_{n}\right),y=\left({y}_{1},\dots ,{y}_{n}\right)\in {\left[a,b\right]}^{n}$.

The constant $1/4$ is best possible in both inequalities (5).

(ii)

If the derivative ${\mathrm{\Phi }}^{\mathrm{\prime }}$ is K-Lipschitzian on $\left[a,b\right]$ with the constant $K>0$, then

$|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)|\le \frac{1}{8}K\sum _{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}=\frac{1}{2}K{\mathcal{J}}_{n,2}\left(x,y\right)$
(6)

for any $x=\left({x}_{1},\dots ,{x}_{n}\right),y=\left({y}_{1},\dots ,{y}_{n}\right)\in {\left[a,b\right]}^{n}$, where

${\mathcal{J}}_{n,2}\left(x,y\right)=\frac{1}{4}\sum _{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}.$

The constant $1/8$ is best possible in (6).
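For a concrete instance of (6) (again our own illustration): $\mathrm{\Phi }\left(t\right)=sint$ has the 1-Lipschitz derivative $cost$, so we may take $K=1$:

```python
import math

def jensen_divergence(phi, x, y):
    return sum(0.5 * (phi(xi) + phi(yi)) - phi(0.5 * (xi + yi))
               for xi, yi in zip(x, y))

x = [0.1, 1.0, 2.0]
y = [0.8, 0.3, 2.5]
K = 1.0                                       # Lipschitz constant of cos

lhs = abs(jensen_divergence(math.sin, x, y))
rhs = (K / 8.0) * sum((yi - xi) ** 2 for xi, yi in zip(x, y))
```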

Motivated by these results, we state bounds for ${\mathcal{J}}_{n,\mathrm{\Phi }}$ for twice differentiable functions Φ with some boundedness conditions for the second derivative in the next sections.

## 3 Approximating with Jensen divergence for power functions

In this section we provide some bounds for the generalised Jensen divergence of a twice differentiable function $\mathrm{\Phi }:I\subset \mathbb{R}\to \mathbb{R}$ whose second derivative ${\mathrm{\Phi }}^{″}$ is bounded above and below in the following sense:

$\gamma \le \frac{{t}^{2-p}}{p\left(p-1\right)}{\mathrm{\Phi }}^{″}\left(t\right)\le \mathrm{\Gamma }$
(7)

for some $\gamma <\mathrm{\Gamma }$ and $p\in \left(-\mathrm{\infty },0\right)\cup \left(1,\mathrm{\infty }\right)$ and all $t\in I$; and

$\delta \le \frac{{t}^{2-q}}{q\left(q-1\right)}{\mathrm{\Phi }}^{″}\left(t\right)\le \mathrm{\Delta }$
(8)

for some $\delta <\mathrm{\Delta }$, some $q\in \left(0,1\right)$ and all $t\in I$. These conditions enable us to provide approximations of the Jensen divergence for Φ via the power functions $f\left(t\right)={t}^{p}$ for $p\ne 0,1$ and $t\in {\mathbb{R}}_{+}$, i.e.

${\mathcal{J}}_{n,{\left(\cdot \right)}^{p}}\left(x,y\right)=\sum _{i=1}^{n}\left[\frac{1}{2}\left({x}_{i}^{p}+{y}_{i}^{p}\right)-{\left(\frac{{x}_{i}+{y}_{i}}{2}\right)}^{p}\right].$

Lemma 4 (Dragomir et al. [12])

Let $\mathrm{\Phi }:\left[a,b\right]\to \mathbb{R}$ be a differentiable function and let the derivative ${\mathrm{\Phi }}^{\prime }$ be absolutely continuous. Then

$\begin{array}{r}|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)|\\ \phantom{\rule{1em}{0ex}}\le \left\{\begin{array}{cc}\frac{1}{8}{\parallel {\mathrm{\Phi }}^{″}\parallel }_{\left[a,b\right],\mathrm{\infty }}{\sum }_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{″}\in {L}_{\mathrm{\infty }}\left[a,b\right],\hfill \\ \frac{{\parallel {\mathrm{\Phi }}^{″}\parallel }_{\left[a,b\right],r}}{{\left({r}^{\prime }+1\right)}^{1/{r}^{\prime }}{2}^{1+1/{r}^{\prime }}}{\sum }_{i=1}^{n}{|{y}_{i}-{x}_{i}|}^{1+1/{r}^{\prime }}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{″}\in {L}_{r}\left[a,b\right],r>1.\hfill \end{array}\end{array}$
(9)

We refer to [12] for the proof of the above lemma.

Lemma 5 Let $\mathrm{\Phi }:\left[a,b\right]\to \mathbb{R}$ be a twice differentiable function and $0<a<b$. If ${\mathrm{\Phi }}^{″}$ satisfies (7), then

${\parallel {\left(\mathrm{\Phi }-\frac{\gamma +\mathrm{\Gamma }}{2}{\left(\cdot \right)}^{p}\right)}^{″}\parallel }_{\left[a,b\right],\mathrm{\infty }}\le p\left(p-1\right)\frac{\mathrm{\Gamma }-\gamma }{2}max\left\{{a}^{p-2},{b}^{p-2}\right\};$
(10)

and

${\parallel {\left(\mathrm{\Phi }-\frac{\gamma +\mathrm{\Gamma }}{2}{\left(\cdot \right)}^{p}\right)}^{″}\parallel }_{\left[a,b\right],r}\le p\left(p-1\right)\frac{\mathrm{\Gamma }-\gamma }{2}{\left({\mathfrak{L}}^{\left[\left(p-2\right)r\right]}\left(a,b\right)\right)}^{p-2},\phantom{\rule{1em}{0ex}}r>1,$
(11)

where ${\mathfrak{L}}^{\left[s\right]}$ is the sth generalised logarithmic mean.

Proof Note that condition (7) is equivalent to

$\gamma p\left(p-1\right){t}^{p-2}\le {\mathrm{\Phi }}^{″}\left(t\right)\le \mathrm{\Gamma }p\left(p-1\right){t}^{p-2}$

since $p\left(p-1\right)>0$. This is also equivalent to

$|{\mathrm{\Phi }}^{″}\left(t\right)-p\left(p-1\right)\frac{\gamma +\mathrm{\Gamma }}{2}{t}^{p-2}|\le p\left(p-1\right)\frac{\mathrm{\Gamma }-\gamma }{2}{t}^{p-2}.$
(12)

We take the supremum of both sides to obtain (10). For $r>1$, taking the ${L}_{r}$-norm of both sides of (12), we have

$\begin{array}{rl}{\parallel {\mathrm{\Phi }}^{″}\left(t\right)-p\left(p-1\right)\frac{\gamma +\mathrm{\Gamma }}{2}{t}^{p-2}\parallel }_{\left[a,b\right],r}& ={\left({\int }_{a}^{b}|{\mathrm{\Phi }}^{″}\left(t\right)-p\left(p-1\right)\frac{\gamma +\mathrm{\Gamma }}{2}{t}^{p-2}{|}^{r}\phantom{\rule{0.2em}{0ex}}dt\right)}^{1/r}\\ \le p\left(p-1\right)\frac{\mathrm{\Gamma }-\gamma }{2}{\left({\int }_{a}^{b}{t}^{r\left(p-2\right)}\phantom{\rule{0.2em}{0ex}}dt\right)}^{1/r}\\ =p\left(p-1\right)\frac{\mathrm{\Gamma }-\gamma }{2}{\left({\mathfrak{L}}^{\left[r\left(p-2\right)\right]}\left(a,b\right)\right)}^{p-2},\end{array}$

which proves (11). □

Theorem 6 Let $\mathrm{\Phi }:\left[a,b\right]\to \mathbb{R}$ be a twice differentiable function and $0<a<b$. If ${\mathrm{\Phi }}^{″}$ satisfies (7), then

$\begin{array}{r}|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)-\frac{\gamma +\mathrm{\Gamma }}{2}{\mathcal{J}}_{n,{\left(\cdot \right)}^{p}}\left(x,y\right)|\\ \phantom{\rule{1em}{0ex}}\le \left\{\begin{array}{cc}\frac{1}{16}p\left(p-1\right)\left(\mathrm{\Gamma }-\gamma \right)max\left\{{a}^{p-2},{b}^{p-2}\right\}{\sum }_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{″}\in {L}_{\mathrm{\infty }}\left[a,b\right],\hfill \\ \frac{p\left(p-1\right)\left(\mathrm{\Gamma }-\gamma \right)}{{\left({r}^{\prime }+1\right)}^{1/{r}^{\prime }}{2}^{2+1/{r}^{\prime }}}{\left({\mathfrak{L}}^{\left[\left(p-2\right)r\right]}\left(a,b\right)\right)}^{p-2}{\sum }_{i=1}^{n}{|{y}_{i}-{x}_{i}|}^{1+1/{r}^{\prime }}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{″}\in {L}_{r}\left[a,b\right],r>1.\hfill \end{array}\end{array}$

Proof Since any differentiable function is absolutely continuous, we may employ Lemma 4. Combining this with Lemma 5, we have

$\begin{array}{r}|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)-\frac{\gamma +\mathrm{\Gamma }}{2}{\mathcal{J}}_{n,{\left(\cdot \right)}^{p}}\left(x,y\right)|\\ \phantom{\rule{1em}{0ex}}\le \left\{\begin{array}{c}\frac{1}{8}{\parallel {\left(\mathrm{\Phi }-\frac{\gamma +\mathrm{\Gamma }}{2}{\left(\cdot \right)}^{p}\right)}^{″}\parallel }_{\left[a,b\right],\mathrm{\infty }}{\sum }_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2},\hfill \\ \frac{1}{{\left({r}^{\prime }+1\right)}^{1/{r}^{\prime }}{2}^{1+1/{r}^{\prime }}}{\parallel {\left(\mathrm{\Phi }-\frac{\gamma +\mathrm{\Gamma }}{2}{\left(\cdot \right)}^{p}\right)}^{″}\parallel }_{\left[a,b\right],r}{\sum }_{i=1}^{n}{|{y}_{i}-{x}_{i}|}^{1+1/{r}^{\prime }},\hfill \end{array}\\ \phantom{\rule{1em}{0ex}}\le \left\{\begin{array}{c}\frac{1}{8}p\left(p-1\right)\frac{\mathrm{\Gamma }-\gamma }{2}max\left\{{a}^{p-2},{b}^{p-2}\right\}{\sum }_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2},\hfill \\ \frac{1}{{\left({r}^{\prime }+1\right)}^{1/{r}^{\prime }}{2}^{1+1/{r}^{\prime }}}p\left(p-1\right)\frac{\mathrm{\Gamma }-\gamma }{2}{\left({\mathfrak{L}}^{\left[\left(p-2\right)r\right]}\left(a,b\right)\right)}^{p-2}{\sum }_{i=1}^{n}{|{y}_{i}-{x}_{i}|}^{1+1/{r}^{\prime }},\hfill \end{array}\end{array}$

as desired. □
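To see Theorem 6 in action, here is a hedged numerical sketch with our own choice $\mathrm{\Phi }\left(t\right)={e}^{t}$ and $p=2$, for which condition (7) holds on $\left[a,b\right]$ with $\gamma ={e}^{a}/2$ and $\mathrm{\Gamma }={e}^{b}/2$:

```python
import math

def jensen_divergence(phi, x, y):
    return sum(0.5 * (phi(xi) + phi(yi)) - phi(0.5 * (xi + yi))
               for xi, yi in zip(x, y))

a, b, p = 0.5, 2.0, 2
gamma = math.exp(a) / (p * (p - 1))   # inf of t^{2-p} Phi''(t)/(p(p-1))
Gamma = math.exp(b) / (p * (p - 1))   # sup, since e^t is increasing

x = [0.6, 1.0, 1.8]
y = [1.9, 0.7, 0.9]
lhs = abs(jensen_divergence(math.exp, x, y)
          - 0.5 * (gamma + Gamma) * jensen_divergence(lambda t: t ** p, x, y))
# sup-norm branch of Theorem 6 (max{a^{p-2}, b^{p-2}} = 1 when p = 2)
rhs = (p * (p - 1) * (Gamma - gamma) / 16.0
       * max(a ** (p - 2), b ** (p - 2))
       * sum((yi - xi) ** 2 for xi, yi in zip(x, y)))
```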

We omit the proofs for the next results as they follow similarly to those of Lemma 5 and Theorem 6.

Lemma 7 Let $\mathrm{\Phi }:\left[a,b\right]\to \mathbb{R}$ be a twice differentiable function and $0<a<b$. If ${\mathrm{\Phi }}^{″}$ satisfies (8), then

${\parallel {\left(\mathrm{\Phi }-\frac{\delta +\mathrm{\Delta }}{2}{\left(\cdot \right)}^{q}\right)}^{″}\parallel }_{\left[a,b\right],\mathrm{\infty }}\le q\left(1-q\right)\frac{\mathrm{\Delta }-\delta }{2}max\left\{{a}^{q-2},{b}^{q-2}\right\};$
(13)

and

${\parallel {\left(\mathrm{\Phi }-\frac{\delta +\mathrm{\Delta }}{2}{\left(\cdot \right)}^{q}\right)}^{″}\parallel }_{\left[a,b\right],r}\le q\left(1-q\right)\frac{\mathrm{\Delta }-\delta }{2}{\left({\mathfrak{L}}^{\left[r\left(q-2\right)\right]}\left(a,b\right)\right)}^{q-2},\phantom{\rule{1em}{0ex}}r>1,$
(14)

where ${\mathfrak{L}}^{\left[s\right]}$ is the sth generalised logarithmic mean.

Theorem 8 Let $\mathrm{\Phi }:\left[a,b\right]\to \mathbb{R}$ be a twice differentiable function and $0<a<b$. If ${\mathrm{\Phi }}^{″}$ satisfies (8), then

$\begin{array}{r}|{\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)-\frac{\delta +\mathrm{\Delta }}{2}{\mathcal{J}}_{n,{\left(\cdot \right)}^{q}}\left(x,y\right)|\\ \phantom{\rule{1em}{0ex}}\le \left\{\begin{array}{cc}\frac{1}{16}q\left(1-q\right)\left(\mathrm{\Delta }-\delta \right)max\left\{{a}^{q-2},{b}^{q-2}\right\}{\sum }_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{″}\in {L}_{\mathrm{\infty }}\left[a,b\right],\hfill \\ \frac{q\left(1-q\right)\left(\mathrm{\Delta }-\delta \right)}{{\left({r}^{\prime }+1\right)}^{1/{r}^{\prime }}{2}^{2+1/{r}^{\prime }}}{\left({\mathfrak{L}}^{\left[\left(q-2\right)r\right]}\left(a,b\right)\right)}^{q-2}{\sum }_{i=1}^{n}{|{y}_{i}-{x}_{i}|}^{1+1/{r}^{\prime }}\hfill & \mathit{\text{if}}\phantom{\rule{0.1em}{0ex}}{\mathrm{\Phi }}^{″}\in {L}_{r}\left[a,b\right],r>1.\hfill \end{array}\end{array}$

## 4 Further approximations

In this section, we present approximations for ${\mathcal{J}}_{n,\mathrm{\Phi }}$ by utilising the family of Jensen divergences

${\mathcal{J}}_{n,\alpha }:={\left(\alpha -1\right)}^{-1}\sum _{i=1}^{n}\left[\frac{1}{2}\left({x}_{i}^{\alpha }+{y}_{i}^{\alpha }\right)-{\left(\frac{{x}_{i}+{y}_{i}}{2}\right)}^{\alpha }\right],\phantom{\rule{1em}{0ex}}\alpha \ne 1;$
(15)

and

${\mathcal{J}}_{n,1}\left(x,y\right):=\frac{1}{2}\sum _{i=1}^{n}\left[{x}_{i}log{x}_{i}+{y}_{i}log{y}_{i}-\left({x}_{i}+{y}_{i}\right)log\left(\frac{{x}_{i}+{y}_{i}}{2}\right)\right].$
(16)

Although ${\mathcal{J}}_{n,\alpha }$ is defined for $\alpha \in {\mathbb{R}}_{+}$ in [2], we may allow α to be negative in (15), and for $\alpha =0$, we define

${\mathcal{J}}_{n,0}\left(x,y\right):=\sum _{i=1}^{n}\left[log\left(\frac{{x}_{i}+{y}_{i}}{2}\right)-\frac{1}{2}\left(log{x}_{i}+log{y}_{i}\right)\right].$
(17)

Theorem 9 Let $\mathrm{\Phi }:I\subset \left(0,\mathrm{\infty }\right)\to \mathbb{R}$ be a twice differentiable function on I. If ${\mathrm{\Phi }}^{″}$ satisfies (7), then

$\gamma \left(p-1\right){\mathcal{J}}_{n,p}\left(x,y\right)\le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le \mathrm{\Gamma }\left(p-1\right){\mathcal{J}}_{n,p}\left(x,y\right)\phantom{\rule{1em}{0ex}}\mathit{\text{for any}}\phantom{\rule{0.1em}{0ex}}x,y\in {I}^{n}.$
(18)

Furthermore, if ${\mathrm{\Phi }}^{″}$ satisfies (8), then

$\delta \left(q-1\right){\mathcal{J}}_{n,q}\left(x,y\right)\ge {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\ge \mathrm{\Delta }\left(q-1\right){\mathcal{J}}_{n,q}\left(x,y\right)\phantom{\rule{1em}{0ex}}\mathit{\text{for any}}\phantom{\rule{0.1em}{0ex}}x,y\in {I}^{n}.$
(19)

Proof We consider the auxiliary function ${g}_{\gamma ,p}:I\to \mathbb{R}$ defined by ${g}_{\gamma ,p}\left(t\right)=\mathrm{\Phi }\left(t\right)-\gamma {t}^{p}$, where $p\in \left(-\mathrm{\infty },0\right)\cup \left(1,\mathrm{\infty }\right)$. We observe that ${g}_{\gamma ,p}$ is twice differentiable on I and the second derivative is given by

${g}_{\gamma ,p}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)={\mathrm{\Phi }}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)-\gamma p\left(p-1\right){t}^{p-2},\phantom{\rule{1em}{0ex}}t\in I.$
Utilising condition (7) and since $p\left(p-1\right){t}^{p-2}>0$ for $t\in I$, we deduce that ${g}_{\gamma ,p}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)\ge 0$ for any $t\in I$, which means that ${g}_{\gamma ,p}$ is convex on I. Since ${\mathcal{J}}_{n,g}\left(x,y\right)\ge 0$ for any convex function $g:I\to \mathbb{R}$, we can write

$\begin{array}{rl}0& \le {\mathcal{J}}_{n,{g}_{\gamma ,p}}\left(x,y\right)=\sum _{i=1}^{n}\left[\frac{{g}_{\gamma ,p}\left({x}_{i}\right)+{g}_{\gamma ,p}\left({y}_{i}\right)}{2}-{g}_{\gamma ,p}\left(\frac{{x}_{i}+{y}_{i}}{2}\right)\right]\\ =\sum _{i=1}^{n}\left[\frac{\mathrm{\Phi }\left({x}_{i}\right)+\mathrm{\Phi }\left({y}_{i}\right)}{2}-\mathrm{\Phi }\left(\frac{{x}_{i}+{y}_{i}}{2}\right)\right]-\gamma \sum _{i=1}^{n}\left[\frac{{x}_{i}^{p}+{y}_{i}^{p}}{2}-{\left(\frac{{x}_{i}+{y}_{i}}{2}\right)}^{p}\right]\\ ={\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)-\gamma \left(p-1\right){\mathcal{J}}_{n,p}\left(x,y\right),\end{array}$

and the first inequality in (18) is proved. To prove the second inequality in (18), we consider the auxiliary function ${g}_{\mathrm{\Gamma },p}:I\to \mathbb{R}$ with ${g}_{\mathrm{\Gamma },p}\left(t\right)=\mathrm{\Gamma }{t}^{p}-\mathrm{\Phi }\left(t\right)$, for which we perform a similar argument; and we omit the details.

Now, if $q\in \left(0,1\right)$ and if we consider the auxiliary function ${\psi }_{\delta ,q}:I\to \mathbb{R}$ with ${\psi }_{\delta ,q}\left(t\right)=\mathrm{\Phi }\left(t\right)-\delta {t}^{q}$, then ${\psi }_{\delta ,q}$ is twice differentiable and

${\psi }_{\delta ,q}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)={\mathrm{\Phi }}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)-\delta q\left(q-1\right){t}^{q-2}\le 0\phantom{\rule{1em}{0ex}}\mathit{\text{for all}}\phantom{\rule{0.2em}{0ex}}t\in I$
since $q\in \left(0,1\right)$. Therefore ${\psi }_{\delta ,q}$ is concave on I, which implies that ${\mathcal{J}}_{n,{\psi }_{\delta ,q}}\left(x,y\right)\le 0$ for any $x,y\in {I}^{n}$ and, as above, we obtain

${\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le \delta \sum _{i=1}^{n}\left[\frac{{x}_{i}^{q}+{y}_{i}^{q}}{2}-{\left(\frac{{x}_{i}+{y}_{i}}{2}\right)}^{q}\right]=\delta \left(q-1\right){\mathcal{J}}_{n,q}\left(x,y\right).$

The second inequality in (19) follows by considering the auxiliary function ${\psi }_{\mathrm{\Delta },q}:I\to \mathbb{R}$ with ${\psi }_{\mathrm{\Delta },q}\left(t\right)=\mathrm{\Delta }{t}^{q}-\mathrm{\Phi }\left(t\right)$, and we omit the details. This completes the proof. □
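Theorem 9 is easy to check numerically. In the sketch below (our own example, not from the paper) we take $\mathrm{\Phi }\left(t\right)={e}^{t}$ with $p=2$ on $\left[0.5,2\right]$, so that condition (7) holds with $\gamma ={e}^{0.5}/2$ and $\mathrm{\Gamma }={e}^{2}/2$:

```python
import math

def jensen_divergence(phi, x, y):
    return sum(0.5 * (phi(xi) + phi(yi)) - phi(0.5 * (xi + yi))
               for xi, yi in zip(x, y))

a, b, p = 0.5, 2.0, 2
gamma = math.exp(a) / (p * (p - 1))
Gamma = math.exp(b) / (p * (p - 1))

x = [0.6, 1.0, 1.8]
y = [1.9, 0.7, 0.9]
J_phi = jensen_divergence(math.exp, x, y)
# J_{n,p} of (15) carries the normalising factor (p-1)^{-1}
J_p = jensen_divergence(lambda t: t ** p, x, y) / (p - 1)

lower = gamma * (p - 1) * J_p
upper = Gamma * (p - 1) * J_p
```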

Theorem 10 Let $\mathrm{\Phi }:I\subset \left(0,\mathrm{\infty }\right)\to \mathbb{R}$ be a twice differentiable function on I. If there exist constants $\omega <\mathrm{\Omega }$ such that

$\omega \le {t}^{2}{\mathrm{\Phi }}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)\le \mathrm{\Omega }\phantom{\rule{1em}{0ex}}\mathit{\text{for any}}\phantom{\rule{0.1em}{0ex}}t\in I,$
(20)

then we have the bounds

$\omega {\mathcal{J}}_{n,0}\left(x,y\right)\le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le \mathrm{\Omega }{\mathcal{J}}_{n,0}\left(x,y\right)\phantom{\rule{1em}{0ex}}\mathit{\text{for any}}\phantom{\rule{0.1em}{0ex}}x,y\in {I}^{n}.$
(21)

If there exist constants $\lambda <\mathrm{\Lambda }$ such that

$\lambda \le t{\mathrm{\Phi }}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)\le \mathrm{\Lambda }\phantom{\rule{1em}{0ex}}\mathit{\text{for any}}\phantom{\rule{0.1em}{0ex}}t\in I,$
(22)

then we have the bounds

$\lambda {\mathcal{J}}_{n,1}\left(x,y\right)\le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le \mathrm{\Lambda }{\mathcal{J}}_{n,1}\left(x,y\right)\phantom{\rule{1em}{0ex}}\mathit{\text{for any}}\phantom{\rule{0.1em}{0ex}}x,y\in {I}^{n}.$
(23)

Proof Consider the auxiliary function ${g}_{\omega ,0}:I\to \mathbb{R}$ with ${g}_{\omega ,0}\left(t\right)=\mathrm{\Phi }\left(t\right)+\omega logt$. We observe that ${g}_{\omega ,0}$ is twice differentiable, and by (20) we have ${g}_{\omega ,0}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)={t}^{-2}\left({t}^{2}{\mathrm{\Phi }}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)-\omega \right)\ge 0$ for any $t\in I$; hence ${g}_{\omega ,0}$ is a convex function on I. Therefore we have ${\mathcal{J}}_{n,{g}_{\omega ,0}}\left(x,y\right)\ge 0$ for any $x,y\in {I}^{n}$, which implies that

$\begin{array}{rl}0& \le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)+\omega \sum _{i=1}^{n}\left[\frac{log{x}_{i}+log{y}_{i}}{2}-log\left(\frac{{x}_{i}+{y}_{i}}{2}\right)\right]\\ ={\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)-\omega {\mathcal{J}}_{n,0}\left(x,y\right)\end{array}$

and the first inequality in (21) is proved. Now, consider the auxiliary function ${g}_{\mathrm{\Omega },0}:I\to \mathbb{R}$ with ${g}_{\mathrm{\Omega },0}\left(t\right)=-\mathrm{\Omega }logt-\mathrm{\Phi }\left(t\right)$. Then ${g}_{\mathrm{\Omega },0}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)={t}^{-2}\left(\mathrm{\Omega }-{t}^{2}{\mathrm{\Phi }}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)\right)$ for any $t\in I$; and by (20) it is a convex function on I. By similar arguments, we deduce the second inequality in (21).

To prove the second part of the theorem, consider the auxiliary function ${g}_{\lambda ,1}:I\to \mathbb{R}$, ${g}_{\lambda ,1}\left(t\right)=\mathrm{\Phi }\left(t\right)-\lambda tlogt$. We observe that ${g}_{\lambda ,1}$ is twice differentiable and, by (22), ${g}_{\lambda ,1}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)={t}^{-1}\left(t{\mathrm{\Phi }}^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)-\lambda \right)\ge 0$ for all $t\in I$; hence ${g}_{\lambda ,1}$ is a convex function on I. The proof now follows along the lines outlined above, and the first part of (23) is proved. The second part of (23) follows by employing the auxiliary function ${g}_{\mathrm{\Lambda },1}:I\to \mathbb{R}$, ${g}_{\mathrm{\Lambda },1}\left(t\right)=\mathrm{\Lambda }tlogt-\mathrm{\Phi }\left(t\right)$; and this completes the proof. □

## 5 Applications to some elementary functions

We consider the approximations mentioned in Section 4 for some elementary functions.

We consider the function $\mathrm{\Phi }\left(t\right)={e}^{-t}$ for $t\in \left[a,b\right]\subset \left(0,1\right]$ and have the following bounds for all $x,y\in {\left[a,b\right]}^{n}$:

$\begin{array}{r}{a}^{2}{e}^{-a}{\mathcal{J}}_{n,0}\left(x,y\right)\le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le {b}^{2}{e}^{-b}{\mathcal{J}}_{n,0}\left(x,y\right),\\ a{e}^{-a}{\mathcal{J}}_{n,1}\left(x,y\right)\le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le b{e}^{-b}{\mathcal{J}}_{n,1}\left(x,y\right),\\ \frac{1}{2}{e}^{-b}{\mathcal{J}}_{n,2}\left(x,y\right)\le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le \frac{1}{2}{e}^{-a}{\mathcal{J}}_{n,2}\left(x,y\right).\end{array}$
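These three sandwiches are easy to verify numerically. In the sketch below (our own vectors on $\left[0.1,1\right]$), ${\mathcal{J}}_{n,0}$, ${\mathcal{J}}_{n,1}$ and ${\mathcal{J}}_{n,2}$ are evaluated via their defining functions $-logt$, $tlogt$ and ${t}^{2}$:

```python
import math

def jensen_divergence(phi, x, y):
    return sum(0.5 * (phi(xi) + phi(yi)) - phi(0.5 * (xi + yi))
               for xi, yi in zip(x, y))

a, b = 0.1, 1.0
x = [0.2, 0.5, 0.9]
y = [1.0, 1.0, 1.0]

J_phi = jensen_divergence(lambda t: math.exp(-t), x, y)
J0 = jensen_divergence(lambda t: -math.log(t), x, y)     # J_{n,0} of (17)
J1 = jensen_divergence(lambda t: t * math.log(t), x, y)  # J_{n,1} of (16)
J2 = 0.25 * sum((yi - xi) ** 2 for xi, yi in zip(x, y))  # J_{n,2}
```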

In what follows, we apply these bounds to the above function on the interval $\left[0.1,1\right]$, where $x=\left(0.2,0.25,0.3,\dots ,1\right)$ and $y=\left(1,\dots ,1\right)$ (cf. Figure 1).

Discussion In this example, the best lower approximation (amongst the three) is given by $\frac{1}{2}{e}^{-1}{\mathcal{J}}_{n,2}\left(x,y\right)$, and the best upper approximation is given by ${e}^{-1}{\mathcal{J}}_{n,1}\left(x,y\right)$, where $x=\left(0.2,0.25,0.3,\dots ,1\right)$ and $y=\left(1,\dots ,1\right)$. However, it remains an open question whether this is true in general.

We consider the Havrda-Charvát function ${\mathrm{\Phi }}_{\alpha }$ defined in Section 1, for which ${\mathrm{\Phi }}_{\alpha }^{\mathrm{\prime }\mathrm{\prime }}\left(t\right)=\alpha {t}^{\alpha -2}$ for $t>0$ and $\alpha \in {\mathbb{R}}_{+}$.

For $\alpha =1$, Theorem 10 yields the following bounds for all $x,y\in {\left[a,b\right]}^{n}$, $\left[a,b\right]\subset \left(0,\mathrm{\infty }\right)$:

$a{\mathcal{J}}_{n,0}\left(x,y\right)\le {\mathcal{J}}_{n,{\mathrm{\Phi }}_{1}}\left(x,y\right)\le b{\mathcal{J}}_{n,0}\left(x,y\right).$

For $\alpha >1$, we have the following bounds for all $x,y\in {\left[a,b\right]}^{n}$:

$\alpha {a}^{\alpha }{\mathcal{J}}_{n,0}\left(x,y\right)\le {\mathcal{J}}_{n,{\mathrm{\Phi }}_{\alpha }}\left(x,y\right)\le \alpha {b}^{\alpha }{\mathcal{J}}_{n,0}\left(x,y\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\alpha {a}^{\alpha -1}{\mathcal{J}}_{n,1}\left(x,y\right)\le {\mathcal{J}}_{n,{\mathrm{\Phi }}_{\alpha }}\left(x,y\right)\le \alpha {b}^{\alpha -1}{\mathcal{J}}_{n,1}\left(x,y\right).$

In Figure 2, we apply these bounds to the above function on the interval $\left[0.1,1\right]$, where $x=\left(0.2,0.201,0.202,\dots ,1\right)$, $y=\left(1,\dots ,1\right)$, $\alpha =3/2$.

We also have, for all $x,y\in {\left[a,b\right]}^{n}$,

$\gamma \left(p-1\right){\mathcal{J}}_{n,p}\left(x,y\right)\le {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\le \mathrm{\Gamma }\left(p-1\right){\mathcal{J}}_{n,p}\left(x,y\right)$

for $p\in \left(-\mathrm{\infty },0\right)\cup \left(1,\mathrm{\infty }\right)$, where

$\mathrm{\Gamma }=\alpha \frac{{b}^{\alpha -p}}{p\left(p-1\right)}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\gamma =\alpha \frac{{a}^{\alpha -p}}{p\left(p-1\right)}$

for $\alpha \ge p$ and $\left[a,b\right]\subset \left(0,\mathrm{\infty }\right)$.

In Figure 3, we apply these bounds to the above function on the interval $\left[0.1,1\right]$, where $x=\left(0.2,0.201,0.202,\dots ,1\right)$, $y=\left(1,\dots ,1\right)$, $\alpha =3$ and $p=3/2,2$.
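The sandwich above can also be checked numerically; the sketch below uses our own vectors with $\alpha =3$ and $p=3/2$, the values used in Figure 3:

```python
import math

def jensen_divergence(phi, x, y):
    return sum(0.5 * (phi(xi) + phi(yi)) - phi(0.5 * (xi + yi))
               for xi, yi in zip(x, y))

a, b = 0.1, 1.0
alpha, p = 3.0, 1.5
phi = lambda t: (t ** alpha - t) / (alpha - 1)       # Havrda-Charvat function
gamma = alpha * a ** (alpha - p) / (p * (p - 1))
Gamma = alpha * b ** (alpha - p) / (p * (p - 1))

x = [0.2, 0.5, 0.9]
y = [1.0, 1.0, 1.0]
J_phi = jensen_divergence(phi, x, y)
J_p = jensen_divergence(lambda t: t ** p, x, y) / (p - 1)  # J_{n,p} of (15)
```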

Similarly, we have, for all $x,y\in {\left[a,b\right]}^{n}$,

$\delta \left(q-1\right){\mathcal{J}}_{n,q}\left(x,y\right)\ge {\mathcal{J}}_{n,\mathrm{\Phi }}\left(x,y\right)\ge \mathrm{\Delta }\left(q-1\right){\mathcal{J}}_{n,q}\left(x,y\right)$

for $q\in \left(0,1\right)$ and $\alpha >1$, where

$\mathrm{\Delta }=\alpha \frac{{a}^{\alpha -q}}{q\left(q-1\right)}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\delta =\alpha \frac{{b}^{\alpha -q}}{q\left(q-1\right)}$

for $\left[a,b\right]\subset \left(0,\mathrm{\infty }\right)$. In Figure 4, we apply these bounds to the above function on the interval $\left[0.1,1\right]$, where $x=\left(0.2,0.201,0.202,\dots ,1\right)$, $y=\left(1,\dots ,1\right)$, $q=1/2$ and $\alpha =3$.

Discussion In this example, the best lower approximation (amongst the five) is given by $2{\left(0.1\right)}^{3/2}{\mathcal{J}}_{n,3/2}$, and the best upper approximation is given by $\left(3/2\right){\mathcal{J}}_{n,1}\left(x,y\right)$, where $x=\left(0.2,0.201,0.202,\dots ,1\right)$, $y=\left(1,\dots ,1\right)$. However, it remains an open question whether this is true in general.

## References

1. Dragomir SS: Some reverses of the Jensen inequality with applications. RGMIA Research Report Collection (Online) 2011., 14: Article ID v14a72

2. Burbea J, Rao CR: On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 1982, 28(3):489–495. 10.1109/TIT.1982.1056497

3. Havrda J, Charvát F: Quantification method of classification processes: concept of structural α -entropy. Kybernetika 1967, 3: 30–35.

4. Lin J: Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37(1):145–151. 10.1109/18.61115

5. Grosse I, Bernaola-Galvan P, Carpena P, Roman-Roldan R, Oliver J, Stanley HE: Analysis of symbolic sequences using the Jensen-Shannon divergence. Phys. Rev. E, Stat. Nonlinear Soft Matter Phys. 2002., 65(4): Article ID 041905. doi:10.1103/PhysRevE.65.041905

6. Kullback S, Leibler RA: On information and sufficiency. Ann. Math. Stat. 1951, 22: 79–86. 10.1214/aoms/1177729694

7. Csiszár I: Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hung. 1967, 2: 299–318.

8. Shannon CE: A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27: 379–423, 623–656.

9. Menendez ML, Pardo JA, Pardo L: Some statistical applications of generalized Jensen difference divergence measures for fuzzy information systems. Fuzzy Sets Syst. 1992, 52: 169–180. 10.1016/0165-0114(92)90047-8

10. Arvey AJ, Azad RK, Raval A, Lawrence JG: Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids Res. 2009, 37(16):5255–5266. 10.1093/nar/gkp576

11. Gómez RM, Rosso OA, Berretta R, Moscato P: Uncovering molecular biomarkers that correlate cognitive decline with the changes of Hippocampus’ gene expression profiles in Alzheimer’s disease. PLoS ONE 2010., 5(4): Article ID e10153. doi:10.1371/journal.pone.0010153

12. Dragomir SS, Dragomir NM, Sherwell D: Sharp bounds for the Jensen divergence with applications. RGMIA Research Report Collection (Online) 2011., 14: Article ID v14a47

13. Bullen PS Mathematics and Its Applications 560. In Handbook of Means and Their Inequalities. Kluwer Academic, Dordrecht; 2003.

## Author information

### Corresponding author

Correspondence to Eder Kikianty.

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

EK, SSD, ITD and DS contributed equally in all stages of writing the paper. All authors read and approved the final manuscript.



Kikianty, E., Dragomir, S.S., Dintoe, I.T. et al. Approximations of Jensen divergence for twice differentiable functions. J Inequal Appl 2013, 267 (2013). https://doi.org/10.1186/1029-242X-2013-267