The concept of Shannon entropy is the central notion of information theory, sometimes referred to as a measure of uncertainty. The entropy of a random variable is defined in terms of its probability distribution and can be shown to be a good measure of randomness or uncertainty. Shannon entropy allows one to estimate the average minimum number of bits needed to encode a string of symbols, based on the alphabet size and the frequency of the symbols.
Divergences between probability distributions have been introduced as measures of the difference between them. Various types of divergences exist, for instance the f-divergence (in particular, the Kullback–Leibler divergence, the Hellinger distance and the total variation distance), the Rényi divergence, the Jensen–Shannon divergence, and so forth (see [13, 21]). There are many papers dealing with inequalities and entropies; see, e.g., [8, 10, 20] and the references therein. Jensen's inequality plays a crucial role in several of these inequalities. However, Jensen's inequality deals with one type of data points, whereas Levinson's inequality deals with two types of data points.
The Zipf law is one of the basic laws of information science and has been widely used in linguistics. Zipf observed in 1932 that one can count how often each word appears in a text; if the words are ranked \((r)\) according to their frequency of occurrence \((f)\), then the product of these two numbers is a constant \((C)\): \(C = r \times f\). Apart from its use in information science and linguistics, the Zipf law has been applied to city populations, solar flare intensities, website traffic, earthquake magnitudes, the sizes of moon craters, and so on. In economics this distribution is known as the Pareto law, which analyzes the distribution of the wealthiest individuals in a community [6, p. 125]. These two laws are equivalent in the mathematical sense, but they are applied in different contexts [7, p. 294].
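For instance (an illustrative count, not taken from a real corpus), if the most frequent word of a text occurs 6000 times, the Zipf law with \(C = 6000\) predicts about 3000 occurrences for the word of rank 2 and about 2000 occurrences for the word of rank 3, since
$$ r \times f = C: \quad 1 \times 6000 = 2 \times 3000 = 3 \times 2000 = 6000. $$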
Csiszár divergence
In [4, 5] Csiszár gave the following definition.
Definition 1
Let f be a convex function from \(\mathbb{R}^{+}\) to \(\mathbb{R}^{+}\). Let \(\tilde{\mathbf{r}}, \tilde{\mathbf{q}} \in \mathbb{R}_{+}^{n}\) be such that \(\sum_{s=1}^{n}r_{s}=1\) and \(\sum_{s=1}^{n}q_{s}=1\). Then the f-divergence functional is defined by
$$\begin{aligned} I_{f}(\tilde{\mathbf{r}}, \tilde{\mathbf{q}}) := \sum_{s=1}^{n}q_{s}f \biggl( \frac{r_{s}}{q_{s}} \biggr). \end{aligned}$$
By defining the following:
$$\begin{aligned} f(0) := \lim_{x \rightarrow 0^{+}}f(x); \qquad 0f \biggl( \frac{0}{0} \biggr):=0; \qquad 0f \biggl(\frac{a}{0} \biggr):= \lim_{x \rightarrow 0^{+}}xf \biggl(\frac{a}{x} \biggr), \quad a>0, \end{aligned}$$
he stated that nonnegative probability distributions can also be used.
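For example (an illustration), the convex function \(f(x) = x\log (x)\) recovers the Kullback–Leibler divergence, since
$$ I_{f}(\tilde{\mathbf{r}}, \tilde{\mathbf{q}}) = \sum_{s=1}^{n}q_{s}\frac{r_{s}}{q_{s}}\log \biggl(\frac{r_{s}}{q_{s}} \biggr) = \sum_{s=1}^{n}r_{s}\log \biggl(\frac{r_{s}}{q_{s}} \biggr); $$
with \(\tilde{\mathbf{r}}= (\frac{1}{2}, \frac{1}{2} )\), \(\tilde{\mathbf{q}}= (\frac{1}{4}, \frac{3}{4} )\) and log base 2 this gives \(I_{f}(\tilde{\mathbf{r}}, \tilde{\mathbf{q}}) = \frac{1}{2}\log_{2}2 + \frac{1}{2}\log_{2}\frac{2}{3} \approx 0.21\).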
Using the definition of the f-divergence functional, Horváth et al. [9] gave the following functional.
Definition 2
Let I be an interval contained in \(\mathbb{R}\) and \(f: I \rightarrow \mathbb{R}\) be a function. Also let \(\tilde{\mathbf{r}}=(r_{1}, \ldots , r_{n})\in \mathbb{R}^{n}\) and \(\tilde{\mathbf{k}}=(k_{1}, \ldots , k_{n})\in (0, \infty )^{n}\) be such that
$$\begin{aligned} \frac{r_{s}}{k_{s}} \in I, \quad s= 1, \ldots , n. \end{aligned}$$
Then
$$\begin{aligned} \hat{I}_{f}(\tilde{\mathbf{r}}, \tilde{\mathbf{k}}) : = \sum _{s=1} ^{n}k_{s}f \biggl( \frac{r_{s}}{k_{s}} \biggr). \end{aligned}$$
Theorem 6
Let \(\tilde{\mathbf{r}}= (r_{1}, \ldots , r_{n} )\), \(\tilde{\mathbf{k}}= (k_{1}, \ldots , k_{n} )\) be in \((0, \infty )^{n}\) and \(\tilde{\mathbf{w}}= (w_{1}, \ldots , w_{m} )\), \(\tilde{\mathbf{t}}= (t_{1}, \ldots , t_{m} )\) be in \((0, \infty )^{m}\) such that
$$\begin{aligned} \frac{r_{s}}{k_{s}} \in I, \quad s = 1, \ldots , n, \end{aligned}$$
and
$$\begin{aligned} \frac{w_{u}}{t_{u}} \in I, \quad u = 1, \ldots , m. \end{aligned}$$
If
$$\begin{aligned}& \Biggl[\frac{1}{\sum_{u=1}^{m}t_{u}}\sum_{u=1}^{m} \frac{(w_{u})^{2}}{t _{u}}- \Biggl(\sum_{u=1}^{m} \frac{w_{u}}{\sum_{u=1}^{m}t_{u}} \Biggr)^{2} -\frac{1}{\sum_{v=1}^{n}k_{v}}\sum _{v=1}^{n}\frac{(r_{v})^{2}}{k_{v}} \\& \quad {}+ \Biggl(\sum_{v=1}^{n} \frac{r_{v}}{\sum_{v=1}^{n}k_{v}} \Biggr)^{2} \Biggr]f^{(2)}(\zeta _{2})\geq 0, \end{aligned}$$
(35)
then the following are equivalent.
(i) For every continuous 3-convex function \(f: I \rightarrow \mathbb{R}\),
$$ J_{\hat{f}}(r, w, k, t)\geq 0. $$
(36)
(ii)
$$ J_{G_{k}}(r, w, k, t) \geq 0, $$
(37)
where
$$\begin{aligned} J_{\hat{f}}(r, w, k, t) =&\frac{1}{\sum_{u=1}^{m}t_{u}} \hat{I}_{f}( \tilde{\mathbf{w}}, \tilde{\mathbf{t}})- f \Biggl(\sum _{u=1}^{m}\frac{w _{u}}{\sum_{u=1}^{m}t_{u}} \Biggr) -\frac{1}{\sum_{v=1}^{n}k_{v}} \hat{I}_{f}(\tilde{\mathbf{r}}, \tilde{ \mathbf{k}}) \\ &{}+f \Biggl(\sum_{v=1}^{n} \frac{r_{v}}{\sum_{v=1}^{n}k_{v}} \Biggr). \end{aligned}$$
(38)
Proof
Using \(p_{v} = \frac{k_{v}}{\sum_{v=1}^{n}k_{v}}\), \(x_{v} = \frac{r _{v}}{k_{v}}\), \(q_{u} = \frac{t_{u}}{\sum_{u=1}^{m}t_{u}}\) and \(y_{u} = \frac{w_{u}}{t_{u}}\) in Theorem 2, we get the required results. □
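For clarity, note that with these substitutions
$$\begin{aligned} \sum_{v=1}^{n}p_{v}f (x_{v} ) = \frac{1}{\sum_{v=1}^{n}k_{v}}\sum_{v=1}^{n}k_{v}f \biggl(\frac{r_{v}}{k_{v}} \biggr) = \frac{1}{\sum_{v=1}^{n}k_{v}}\hat{I}_{f}(\tilde{\mathbf{r}}, \tilde{\mathbf{k}}) \quad \text{and} \quad \sum_{v=1}^{n}p_{v}x_{v} = \sum_{v=1}^{n}\frac{r_{v}}{\sum_{v=1}^{n}k_{v}}, \end{aligned}$$
and analogously for \(q_{u}\) and \(y_{u}\), so the functional of Theorem 2 evaluated at these points is exactly \(J_{\hat{f}}(r, w, k, t)\) in (38).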
Shannon entropy
Definition 3
(See [9])
The Shannon entropy of a positive probability distribution \(\tilde{\mathbf{r}}=(r_{1}, \ldots , r_{n})\) is defined by
$$\begin{aligned} \mathcal{S} : = - \sum_{v=1}^{n}r_{v} \log (r_{v}). \end{aligned}$$
(39)
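For instance (an illustrative computation), for \(\tilde{\mathbf{r}}= (\frac{1}{2}, \frac{1}{4}, \frac{1}{4} )\) and log base 2,
$$ \mathcal{S} = -\tfrac{1}{2}\log_{2}\tfrac{1}{2}-\tfrac{1}{4}\log_{2}\tfrac{1}{4}-\tfrac{1}{4}\log_{2}\tfrac{1}{4} = \tfrac{1}{2}+\tfrac{1}{2}+\tfrac{1}{2} = \tfrac{3}{2} \text{ bits}. $$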
Corollary 6
Let \(\tilde{\mathbf{r}}= (r_{1}, \ldots , r_{n} )\) and \(\tilde{\mathbf{w}}= (w_{1}, \ldots , w_{m} )\) be probability distributions, \(\tilde{\mathbf{k}}= (k_{1}, \ldots , k_{n} )\) be in \((0, \infty )^{n}\) and \(\tilde{\mathbf{t}}= (t_{1}, \ldots , t_{m} )\) be in \((0, \infty )^{m}\). If
$$\begin{aligned}& \Biggl[\frac{1}{\sum_{u=1}^{m}t_{u}}\sum_{u=1}^{m} \frac{(w_{u})^{2}}{t _{u}}- \Biggl(\sum_{u=1}^{m} \frac{w_{u}}{\sum_{u=1}^{m}t_{u}} \Biggr)^{2} -\frac{1}{\sum_{v=1}^{n}k_{v}}\sum _{v=1}^{n}\frac{(r_{v})^{2}}{k_{v}} \\& \quad {}+ \Biggl(\sum_{v=1}^{n} \frac{r_{v}}{\sum_{v=1}^{n}k_{v}} \Biggr)^{2} \Biggr]\geq 0 \end{aligned}$$
(40)
and
$$ J_{G_{k}}(r, w, k, t)\leq 0, $$
(41)
then
$$ J_{s}(r, w, k, t) \leq 0, $$
(42)
where
$$\begin{aligned} J_{s}(r, w, k, t) =&\frac{1}{\sum_{u=1}^{m}t_{u}} \Biggl[\tilde{\mathcal{S}}+\sum_{u=1}^{m}w_{u} \log (t_{u}) \Biggr]+ \Biggl[\sum_{u=1} ^{m}\frac{w_{u}}{\sum_{u=1}^{m} t_{u}}\log \Biggl(\sum _{u=1}^{m}\frac{w_{u}}{\sum_{u=1}^{m}t_{u}} \Biggr) \Biggr] \\ &{}-\frac{1}{\sum_{v=1}^{n}k_{v}} \Biggl[\mathcal{S}+ \sum _{v=1}^{n}r_{v}\log (k_{v}) \Biggr] \\ &{} - \Biggl[\sum_{v=1}^{n} \frac{r_{v}}{ \sum_{v=1}^{n} k_{v}}\log \Biggl(\sum_{v=1}^{n} \frac{r_{v}}{\sum_{v=1} ^{n}k_{v}} \Biggr) \Biggr]; \end{aligned}$$
(43)
\(\mathcal{S}\) is defined in (39) and
$$ \tilde{\mathcal{S}} : = - \sum_{u=1}^{m}w_{u} \log (w_{u}). $$
If the base of the log is less than 1, then (41) and (42) hold in the reverse direction.
Proof
The function \(f(x) = -x\log (x)\) is 3-convex when the base of the log is greater than 1. So using \(f(x):= -x\log (x)\) in (35) and (36), we get the required results by Remark 1. □
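The entropy terms in (43) come from the following expansion: with \(f(x)= -x\log (x)\),
$$ \hat{I}_{f}(\tilde{\mathbf{w}}, \tilde{\mathbf{t}}) = -\sum_{u=1}^{m}t_{u}\frac{w_{u}}{t_{u}}\log \biggl(\frac{w_{u}}{t_{u}} \biggr) = \tilde{\mathcal{S}}+\sum_{u=1}^{m}w_{u}\log (t_{u}), $$
and similarly \(\hat{I}_{f}(\tilde{\mathbf{r}}, \tilde{\mathbf{k}}) = \mathcal{S}+\sum_{v=1}^{n}r_{v}\log (k_{v})\), so \(J_{\hat{f}}\) in (38) with this choice of f becomes \(J_{s}\) in (43).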
Rényi divergence and entropy
The Rényi divergence and the Rényi entropy are given in [19].
Definition 4
Let \(\tilde{\mathbf{r}}, \tilde{\mathbf{q}} \in \mathbb{R}_{+}^{n}\) be such that \(\sum_{1}^{n}r_{i}=1\) and \(\sum_{1}^{n}q_{i}=1\), and let \(\delta \geq 0\), \(\delta \neq 1\).
(a) The Rényi divergence of order δ is defined by
$$\begin{aligned} \mathcal{D}_{\delta }(\tilde{\mathbf{r}}, \tilde{ \mathbf{q}}) : = \frac{1}{\delta - 1} \log \Biggl(\sum _{i=1} ^{n}q_{i} \biggl( \frac{r_{i}}{q_{i}} \biggr)^{\delta } \Biggr). \end{aligned}$$
(44)
(b) The Rényi entropy of order δ of \(\tilde{\mathbf{r}}\) is defined by
$$\begin{aligned} \mathcal{H}_{\delta }(\tilde{\mathbf{r}}): = \frac{1}{1 - \delta } \log \Biggl( \sum_{i=1}^{n} r_{i}^{\delta } \Biggr). \end{aligned}$$
(45)
These definitions also hold for non-negative probability distributions.
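As a numerical illustration, take log base 2 and \(\delta = 2\). For \(\tilde{\mathbf{r}}= (\frac{1}{2}, \frac{1}{4}, \frac{1}{4} )\),
$$ \mathcal{H}_{2}(\tilde{\mathbf{r}}) = -\log_{2} \bigl(\tfrac{1}{4}+\tfrac{1}{16}+\tfrac{1}{16} \bigr) = \log_{2}\tfrac{8}{3} \approx 1.415, $$
while for \(\tilde{\mathbf{r}}= (\frac{1}{2}, \frac{1}{2} )\) and \(\tilde{\mathbf{q}}= (\frac{1}{4}, \frac{3}{4} )\),
$$ \mathcal{D}_{2}(\tilde{\mathbf{r}}, \tilde{\mathbf{q}}) = \log_{2} \biggl(\frac{(1/2)^{2}}{1/4}+\frac{(1/2)^{2}}{3/4} \biggr) = \log_{2}\tfrac{4}{3} \approx 0.415. $$
As \(\delta \rightarrow 1\), \(\mathcal{H}_{\delta }\) and \(\mathcal{D}_{\delta }\) tend to the Shannon entropy and the Kullback–Leibler divergence, respectively.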
Theorem 7
Let \(\tilde{\mathbf{r}} = (r_{1}, \ldots , r_{n})\), \(\tilde{\mathbf{k}} = (k_{1}, \ldots , k_{n}) \in \mathbb{R}_{+}^{n}\) and \(\tilde{\mathbf{w}} = (w_{1}, \ldots , w_{m})\), \(\tilde{\mathbf{t}} = (t_{1}, \ldots , t_{m}) \in \mathbb{R}_{+}^{m}\) be such that \(\sum_{v=1}^{n}r_{v}=1\), \(\sum_{v=1}^{n}k_{v}=1\), \(\sum_{u=1}^{m}w_{u}=1\) and \(\sum_{u=1}^{m}t_{u}=1\). If either \(1 < \delta \) and the base of the log is greater than 1, or \(\delta \in [0, 1)\) and the base of the log is less than 1, and if
$$\begin{aligned}& \Biggl[\sum_{u=1}^{m} \frac{(t_{u})^{2}}{w_{u}} \biggl(\frac{w_{u}}{t _{u}} \biggr)^{2\delta }- \Biggl(\sum_{u=1}^{m}t_{u} \biggl(\frac{w_{u}}{t _{u}} \biggr) ^{\delta } \Biggr)^{2} - \sum_{v=1}^{n}\frac{(k_{v})^{2}}{r _{v}} \biggl(\frac{r_{v}}{k_{v}} \biggr)^{2\delta } \\& \quad {}+ \Biggl(\sum_{v=1}^{n}k_{v} \biggl(\frac{r_{v}}{k_{v}} \biggr)^{\delta } \Biggr)^{2} \Biggr]\geq 0 \end{aligned}$$
(46)
and
$$\begin{aligned} \sum_{v=1}^{n}r_{v}G_{k} \biggl( \biggl(\frac{r_{v}}{k_{v}} \biggr)^{\delta -1}, s\biggr) -G_{k}\Biggl(\sum_{v=1}^{n}r_{v} \biggl(\frac{r_{v}}{k_{v}} \biggr)^{\delta -1}, s\Biggr) \geq& \sum _{u=1}^{m}w_{u}G_{k} \biggl( \biggl(\frac{w_{u}}{t_{u}} \biggr)^{ \delta -1}, s\biggr) \\ &{}-G_{k}\Biggl(\sum_{u=1}^{m}w_{u} \biggl(\frac{w_{u}}{t_{u}} \biggr)^{\delta -1}, s\Biggr), \end{aligned}$$
(47)
then
$$\begin{aligned} \sum_{v=1}^{n}r_{v} \log \biggl( \frac{r_{v}}{k_{v}} \biggr)-\mathcal{D}_{\delta }(\tilde{ \mathbf{r}}, \tilde{\mathbf{k}}) \geq \sum_{u=1}^{m}w_{u} \log \biggl( \frac{w_{u}}{t_{u}} \biggr)-\mathcal{D}_{\delta }(\tilde{ \mathbf{w}}, \tilde{\mathbf{t}}). \end{aligned}$$
(48)
If either \(1 < \delta \) and the base of the log is less than 1, or \(\delta \in [0, 1)\) and the base of the log is greater than 1, then (47) and (48) hold in the reverse direction.
Proof
We give the proof only for the case when \(\delta \in [0, 1)\) and the base of the log is greater than 1; the remaining cases are proved similarly.
Choosing \(I = (0, \infty )\) and \(f=\log \), we have that \(f^{(2)}(x)\) is negative and \(f^{(3)}(x)\) is positive, therefore f is 3-convex. So using \(f=\log \) and the substitutions
$$ p_{v} : = r_{v}, \qquad x_{v} : = \biggl( \frac{r_{v}}{k_{v}} \biggr)^{\delta - 1}, \quad v = 1, \ldots , n, $$
and
$$ q_{u} : = w_{u}, \qquad y_{u} : = \biggl( \frac{w_{u}}{t_{u}} \biggr)^{\delta - 1}, \quad u = 1, \ldots , m, $$
in the reverse of inequality (18) (by Remark 1), we have
$$\begin{aligned} (\delta -1)\sum_{v=1}^{n}r_{v} \log \biggl( \frac{r_{v}}{k_{v}} \biggr)- \log \Biggl(\sum _{v=1}^{n}k_{v}\biggl( \frac{r_{v}}{k_{v}}\biggr)^{\delta } \Biggr) \geq & (\delta -1)\sum _{u=1}^{m}w_{u} \log \biggl( \frac{w_{u}}{t_{u}} \biggr) \\ &{}-\log \Biggl(\sum_{u=1}^{m}t_{u} \biggl(\frac{w_{u}}{t_{u}}\biggr)^{\delta } \Biggr). \end{aligned}$$
(49)
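Here the second logarithm on each side of (49) uses the identity
$$ \sum_{v=1}^{n}r_{v} \biggl(\frac{r_{v}}{k_{v}} \biggr)^{\delta -1} = \sum_{v=1}^{n}k_{v} \biggl(\frac{r_{v}}{k_{v}} \biggr)^{\delta }, $$
which follows from \(r_{v} = k_{v}\frac{r_{v}}{k_{v}}\), together with the analogous identity for \(w_{u}\) and \(t_{u}\).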
Dividing (49) with \((\delta -1)\) and using
$$\begin{aligned}& \mathcal{D}_{\delta }(\tilde{\mathbf{r}}, \tilde{\mathbf{k}})= \frac{1}{\delta -1}\log \Biggl(\sum_{v=1}^{n}k_{v} \biggl(\frac{r _{v}}{k_{v}}\biggr)^{\delta } \Biggr), \\& \mathcal{D}_{\delta }(\tilde{\mathbf{w}}, \tilde{\mathbf{t}})= \frac{1}{\delta -1}\log \Biggl(\sum_{u=1}^{m}t_{u} \biggl(\frac{w _{u}}{t_{u}}\biggr)^{\delta } \Biggr), \end{aligned}$$
we get (48). □
Corollary 7
Let \(\tilde{\mathbf{r}} = (r_{1}, \ldots , r_{n}) \in \mathbb{R}_{+}^{n}\) and \(\tilde{\mathbf{w}} = (w_{1}, \ldots , w_{m}) \in \mathbb{R}_{+}^{m}\) be such that \(\sum_{v=1}^{n}r_{v}=1\) and \(\sum_{u=1}^{m}w_{u}=1\). Also let
$$\begin{aligned}& \Biggl[\sum_{u=1}^{m} \frac{1}{m^{2}w_{u}} (mw_{u} )^{2\delta }- \Biggl(\sum _{u=1}^{m}\frac{1}{m} (m w_{u} )^{\delta } \Biggr)^{2}- \sum _{v=1}^{n}\frac{1}{n^{2}r_{v}} (nr_{v} )^{2\delta } \\& \quad {}+ \Biggl(\sum_{v=1}^{n} \frac{1}{n} (n r_{v} )^{\delta } \Biggr)^{2} \Biggr] \geq 0 \end{aligned}$$
(50)
and
$$\begin{aligned} \sum_{v=1}^{n}r_{v}G_{k} \bigl((nr_{v})^{\delta -1}, s \bigr) -G_{k} \Biggl( \sum_{v=1}^{n}r_{v}(nr_{v})^{\delta -1}, s \Biggr) \geq& \sum_{u=1} ^{m}w_{u}G_{k} \bigl((mw_{u})^{\delta -1}, s \bigr) \\ &{}-G_{k} \Biggl(\sum_{u=1}^{m}w_{u}(mw_{u})^{\delta -1}, s \Biggr). \end{aligned}$$
(51)
If \(1 < \delta \) and the base of the log is greater than 1, then
$$ \sum_{v=1}^{n}r_{v} \log (r_{v})+\mathcal{H}_{\delta }( \tilde{\mathbf{r}})\geq \sum_{u=1}^{m}w_{u}\log (w_{u})+\mathcal{H}_{\delta }(\tilde{\mathbf{w}}). $$
(52)
The reverse inequality holds in (51) and (52) if the base of the log is less than 1.
Proof
Suppose \(\tilde{\mathbf{k}}= (\frac{\textbf{1}}{\textbf{n}}, \ldots , \frac{\textbf{1}}{\textbf{n}} )\) and \(\tilde{\mathbf{t}}= (\frac{\textbf{1}}{\textbf{m}}, \ldots , \frac{\textbf{1}}{ \textbf{m}} )\). Then from (44), we have
$$\begin{aligned} \mathcal{D}_{\delta } (\tilde{\mathbf{r}}, \tilde{\mathbf{k}}) = \frac{1}{\delta - 1} \log \Biggl(\sum_{v=1}^{n}n ^{\delta - 1}r_{v}^{\delta } \Biggr) = \log (n) + \frac{1}{\delta - 1}\log \Biggl(\sum_{v=1}^{n}r_{v}^{\delta } \Biggr) \end{aligned}$$
and
$$\begin{aligned} \mathcal{D}_{\delta } (\tilde{\mathbf{w}}, \tilde{\mathbf{t}}) = \frac{1}{\delta - 1} \log \Biggl(\sum_{u=1}^{m}m ^{\delta - 1}w_{u}^{\delta } \Biggr) = \log (m) + \frac{1}{\delta - 1}\log \Biggl(\sum_{u=1}^{m}w_{u}^{\delta } \Biggr). \end{aligned}$$
This implies
$$\begin{aligned} \mathcal{H}_{\delta }(\tilde{\mathbf{r}}) = \log (n) - \mathcal{D}_{\delta } \biggl(\tilde{\mathbf{r}}, \frac{\textbf{1}}{ \textbf{n}} \biggr) \end{aligned}$$
(53)
and
$$\begin{aligned} \mathcal{H}_{\delta }(\tilde{\mathbf{w}}) = \log (m) - \mathcal{D}_{\delta } \biggl(\tilde{\mathbf{w}}, \frac{\textbf{1}}{ \textbf{m}} \biggr). \end{aligned}$$
(54)
It follows from Theorem 7 with \(\tilde{\mathbf{k}}= \frac{\textbf{1}}{\textbf{n}}\) and \(\tilde{\mathbf{t}}= \frac{\textbf{1}}{\textbf{m}}\), together with (53) and (54), that
$$ \sum_{v=1}^{n}r_{v} \log (nr_{v})-\log (n)+\mathcal{H}_{ \delta }(\tilde{ \mathbf{r}})\geq \sum_{u=1}^{m}w_{u} \log (m w_{u})- \log (m)+\mathcal{H}_{\delta }(\tilde{ \mathbf{w}}). $$
(55)
After some simple calculations we get (52). □
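As a numerical check of (53), take \(n=2\), \(\tilde{\mathbf{r}}= (\frac{3}{4}, \frac{1}{4} )\), \(\delta =2\) and log base 2. Then
$$ \mathcal{D}_{2} \biggl(\tilde{\mathbf{r}}, \frac{\textbf{1}}{\textbf{2}} \biggr) = \log_{2} \biggl(2 \biggl(\frac{9}{16}+\frac{1}{16} \biggr) \biggr) = \log_{2}\frac{5}{4} \quad \text{and} \quad \mathcal{H}_{2}(\tilde{\mathbf{r}}) = -\log_{2} \biggl(\frac{9}{16}+\frac{1}{16} \biggr) = \log_{2}\frac{8}{5}, $$
and indeed \(\log_{2}\frac{8}{5} = \log_{2}2 - \log_{2}\frac{5}{4}\).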
Zipf–Mandelbrot law
In [14] the authors made several contributions to the analysis of the Zipf–Mandelbrot law, which is defined as follows.
Definition 5
The Zipf–Mandelbrot law is a discrete probability distribution depending on three parameters \(\mathcal{N} \in \{1, 2, \ldots \}\), \(\phi \in [0, \infty )\) and \(t > 0\), and is defined by
$$\begin{aligned} f(s; \mathcal{N}, \phi , t) : = \frac{1}{(s + \phi )^{t}\mathcal{H}_{\mathcal{N}, \phi , t}}, \quad s = 1, \ldots , \mathcal{N}, \end{aligned}$$
where
$$\begin{aligned} \mathcal{H}_{\mathcal{N}, \phi , t} = \sum_{\nu =1}^{ \mathcal{N}} \frac{1}{(\nu + \phi )^{t}}. \end{aligned}$$
If the total mass of the law is taken over all of \(\mathbb{N}\), then, for \(\phi \geq 0\), \(t > 1\) and \(s \in \mathbb{N}\), the density function of the Zipf–Mandelbrot law becomes
$$\begin{aligned} f(s; \phi , t) = \frac{1}{(s + \phi )^{t}\mathcal{H}_{ \phi , t}}, \end{aligned}$$
where
$$\begin{aligned} \mathcal{H}_{\phi , t} = \sum_{\nu =1}^{\infty } \frac{1}{( \nu + \phi )^{t}}. \end{aligned}$$
For \(\phi = 0\), the Zipf–Mandelbrot law becomes the Zipf law.
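For instance, with \(\mathcal{N}=3\), \(\phi =0\) and \(t=1\) we have \(\mathcal{H}_{3, 0, 1} = 1+\frac{1}{2}+\frac{1}{3} = \frac{11}{6}\), so
$$ \bigl(f(1; 3, 0, 1), f(2; 3, 0, 1), f(3; 3, 0, 1) \bigr) = \biggl(\frac{6}{11}, \frac{3}{11}, \frac{2}{11} \biggr), $$
and the rank–frequency product \(s \cdot f(s; 3, 0, 1)=\frac{6}{11}\) is constant, in accordance with the relation \(C = r \times f\) from the introduction.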
Theorem 8
Let \(\tilde{\mathbf{r}}\) and \(\tilde{\mathbf{w}}\) be Zipf–Mandelbrot laws.
If (50) and (51) hold for
\(r_{v}=\frac{1}{(v+\phi _{1})^{t_{1}} {\mathcal{H}_{\mathcal{N}, \phi _{1}, t_{1}}}}\), \(w_{u}=\frac{1}{(u+\phi _{2})^{t_{2}} {\mathcal{H}_{\mathcal{N}, \phi _{2}, t_{2}}}}\), where \(\phi _{1}, \phi _{2} \in [0, \infty )\) and \(t_{1}, t_{2} > 0\) are the parameters of the two laws, and if the base of the log is greater than 1, then
$$\begin{aligned}& \sum_{v=1}^{n}\frac{1}{(v+\phi _{1})^{t_{1}}{\mathcal{H}_{ \mathcal{N}, \phi _{1}, t_{1}}}}\log \biggl(\frac{1}{(v+\phi _{1})^{t_{1}}{\mathcal{H}_{\mathcal{N}, \phi _{1}, t_{1}}}} \biggr)+\frac{1}{1 - \delta }\log \Biggl( \frac{1}{\mathcal{H}_{\mathcal{N}, \phi _{1}, t_{1}}^{\delta }} \sum_{v=1}^{n} \frac{1}{(v + \phi _{1})^{\delta t_{1}}} \Biggr) \\& \quad \geq \sum_{u=1}^{m} \frac{1}{(u+\phi _{2})^{t_{2}}{\mathcal{H}_{ \mathcal{N}, \phi _{2}, t_{2}}}}\log \biggl(\frac{1}{(u+\phi _{2})^{t_{2}}{\mathcal{H}_{\mathcal{N}, \phi _{2}, t_{2}}}} \biggr) \\& \qquad {} +\frac{1}{1 - \delta } \log \Biggl(\frac{1}{\mathcal{H}_{\mathcal{N}, \phi _{2}, t_{2}}^{ \delta }}\sum _{u=1}^{m}\frac{1}{(u + \phi _{2})^{\delta t_{2}}} \Biggr). \end{aligned}$$
(56)
The inequality is reversed in (51) and (56) if the base of the log is less than 1.
Proof
The proof is similar to that of Corollary 7; using Definition 5 and the hypotheses given in the statement, we get the required result. □