Majorization, Csiszár divergence and Zipf-Mandelbrot law
- Naveed Latif^{1},
- Đilda Pečarić^{2} and
- Josip Pečarić^{3}
https://doi.org/10.1186/s13660-017-1472-2
© The Author(s) 2017
Received: 24 April 2017
Accepted: 9 August 2017
Published: 24 August 2017
Abstract
In this paper we show how the Shannon entropy is connected to the theory of majorization. Both are linked to the measure of disorder in a system, but the theory of majorization usually gives stronger criteria than the entropic inequalities. We give some generalized results for the majorization inequality using the Csiszár f-divergence. Applied to certain special convex functions, these results reduce to majorization inequalities in terms of the Shannon entropy and the Kullback-Leibler divergence. We give several applications by using the Zipf-Mandelbrot law.
Keywords
majorization inequality, Csiszár f-divergence, Shannon entropy, Kullback-Leibler divergence, Zipf-Mandelbrot law

MSC
94A15, 94A17, 26A51, 26D15

1 Introduction and preliminaries
Well over a century ago, measures were derived for assessing the distance between two models of probability distributions. Most relevant is Boltzmann's [1] concept of generalized entropy in physics and thermodynamics (see Akaike [2] for a brief review). Shannon [3] employed entropy in his famous treatise on communication theory. Kullback and Leibler [4] derived an information measure that happened to be the negative of Boltzmann's entropy, now referred to as the Kullback-Leibler (K-L) distance. The motivation for the Kullback-Leibler work was to provide a rigorous definition of information in relation to Fisher's sufficient statistics. The K-L distance has also been called the K-L discrepancy, divergence, information and number. These terms are synonyms; we use the term 'distance' in the material to follow.
Matić et al. [5, 8, 9] and [10] have worked extensively on Shannon's inequality and related inequalities in probability theory and information science. In [5, 10] they studied several aspects of Shannon's inequality, in both discrete and integral forms, presenting upper estimates of the difference between its two sides. Applications to bounds in information theory were also given.
Now we introduce the main mathematical theory explored in the presented work, the theory of majorization. It is a powerful and elegant mathematical tool which can be applied to a wide variety of problems, including in quantum mechanics. The theory of majorization is closely related to the notions of randomness and disorder. It allows us to compare two probability distributions and decide which one is more random. Let us now give the most general definition of majorization.
The following definition is given in [11], p.319. For two real n-tuples \(\mathbf{x}\) and \(\mathbf{y}\), we say that x majorizes y, written \(\mathbf{y} \prec \mathbf{x}\), if
$$ \sum_{i=1}^{k} y_{[i]} \leq \sum_{i=1}^{k} x_{[i]} \quad (k=1, \ldots, n-1) \quad\text{and}\quad \sum_{i=1}^{n} y_{[i]} = \sum_{i=1}^{n} x_{[i]}, $$
where \(x_{[i]}\) denotes the ith largest component of x.
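To make the partial-sum condition concrete, the following is a small numeric sketch of the standard majorization check; it is our own illustration (the function name `majorizes` and the sample tuples are not from the paper).

```python
def majorizes(x, y, tol=1e-12):
    """Check whether x majorizes y (y majorized by x): the partial sums of
    the decreasingly sorted components of x must dominate those of y, and
    the total sums must agree (the standard definition)."""
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px += a
        py += b
        if px < py - tol:
            return False
    return abs(px - py) <= tol  # equal totals

# The uniform distribution is majorized by every distribution on n points,
# reflecting that it is the "most disordered" one.
print(majorizes([0.7, 0.2, 0.1], [1/3, 1/3, 1/3]))  # True
print(majorizes([1/3, 1/3, 1/3], [0.7, 0.2, 0.1]))  # False
```

The first call returns True and the second False precisely because the uniform 3-tuple is the least spread-out distribution on three points.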
The following theorem, called the classical majorization theorem, is given in the monograph by Marshall et al. [12], p.11 (see also [11], p.320):
Theorem 1
Classical majorization theorem
Let \(\mathbf{x}\) and \(\mathbf{y}\) be two real n-tuples such that \(\mathbf{y} \prec \mathbf{x}\). Then
$$ \sum_{i=1}^{n} f (y_{i} ) \leq \sum_{i=1}^{n} f (x_{i} ) $$
holds for every continuous convex function f.
The following theorem is a generalization of Theorem 1, known as weighted majorization theorem, and was proved by Fuchs in [13] (see also [11], p.323):
Theorem 2
Weighted majorization theorem
Let \(\mathbf{x}\) and \(\mathbf{y}\) be two decreasing real n-tuples and \(w_{1}, \ldots, w_{n}\) real numbers such that
$$ \sum_{i=1}^{k} w_{i} y_{i} \leq \sum_{i=1}^{k} w_{i} x_{i} \quad (k=1, \ldots, n-1) \quad\text{and}\quad \sum_{i=1}^{n} w_{i} y_{i} = \sum_{i=1}^{n} w_{i} x_{i}. $$
Then, for every continuous convex function f,
$$ \sum_{i=1}^{n} w_{i} f (y_{i} ) \leq \sum_{i=1}^{n} w_{i} f (x_{i} ). $$
The following theorem is valid ([14], p.32).
Theorem 3
Let f be a continuous convex function on an interval I, let w be a positive n-tuple, and let \(\mathbf{x}, \mathbf{y} \in I^{n}\) satisfy \(\sum_{i=1}^{k} w_{i} y_{i} \leq \sum_{i=1}^{k} w_{i} x_{i}\) for \(k=1, \ldots, n-1\) and \(\sum_{i=1}^{n} w_{i} y_{i} = \sum_{i=1}^{n} w_{i} x_{i}\).
- (a) If y is a decreasing n-tuple, then
$$ \sum_{i=1}^{n} w_{i} f (y_{i} ) \leq \sum_{i=1}^{n} w_{i} f (x_{i} ). $$ (11)
- (b) If x is an increasing n-tuple, then
$$ \sum_{i=1}^{n} w_{i} f (x_{i} ) \leq \sum_{i=1}^{n} w_{i} f (y_{i} ). $$ (12)
It is common to take log to base 2 in the notions introduced above, but this is not essential in our investigations.
In Section 2, we present our main generalized results obtained from majorization inequality by using Csiszár f-divergence and then obtain corollaries in the form of Shannon entropy and the K-L distance. In Section 3, we give several applications using the Zipf-Mandelbrot law.
2 Csiszár f-divergence for majorization
Csiszár introduced the following notion in [15] and discussed it further in [16].
Definition 1
Let \(f: (0, \infty) \rightarrow \mathbb{R}\) be a convex function, and let \(\mathbf{p}\) and \(\mathbf{q}\) be positive probability distributions. The Csiszár f-divergence is defined by
$$ I_{f} (\mathbf{p}, \mathbf{q} ) := \sum_{i=1}^{n} q_{i} f \biggl(\frac{p_{i}}{q_{i}} \biggr). $$
Definition 2
Let \(I \subseteq \mathbb{R}\) be an interval, let \(f: I \rightarrow \mathbb{R}\) be a function, and let \(\mathbf{p}\) and \(\mathbf{q}\) be positive real n-tuples with \(\frac{p_{i}}{q_{i}} \in I\) (\(i=1, \ldots, n\)). The generalized f-divergence functional is defined by
$$ \hat{I}_{f} (\mathbf{p}, \mathbf{q} ) := \sum_{i=1}^{n} q_{i} f \biggl(\frac{p_{i}}{q_{i}} \biggr). $$
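The divergence functionals in this section all share the form \(\sum_{i} q_{i} f(p_{i}/q_{i})\), which appears explicitly in the proof of Theorem 4 below. As a numeric sketch of our own (the function name and sample tuples are illustrative, not from the paper), the following evaluates it for two standard convex generators:

```python
import math

def csiszar_divergence(f, p, q):
    """Csiszar f-divergence: sum_i q_i * f(p_i / q_i) for positive n-tuples p, q."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# f(x) = x log x generates the Kullback-Leibler distance,
# f(x) = (x - 1)^2 the chi-square divergence; both are convex with f(1) = 0.
kl  = csiszar_divergence(lambda x: x * math.log(x), p, q)
chi = csiszar_divergence(lambda x: (x - 1.0) ** 2, p, q)
print(kl > 0 and chi > 0)  # True: nonnegative, zero only when p == q
```

For probability distributions and a convex generator with f(1) = 0, both divergences vanish exactly when p = q.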
Motivated by the ideas in [5] and [10], in this paper we study and discuss the majorization results in the form of divergences and entropies. The following theorem is a generalization of the result given in [5], i.e., (15).
Theorem 4
- (a) If \(\frac{\mathbf{r}}{\mathbf{q}}\) is decreasing, then
$$ \hat{I}_{f} (\mathbf{r}, \mathbf{q} ) \leq \hat{I}_{f} (\mathbf{p}, \mathbf{q} ). $$ (19)
- (b) If \(\frac{\mathbf{p}}{\mathbf{q}}\) is increasing, then
$$ \hat{I}_{f} (\mathbf{p}, \mathbf{q} ) \leq \hat{I}_{f} (\mathbf{r}, \mathbf{q} ). $$ (20)
Proof
(a): We use Theorem 3(a) with substitutions \(x_{i}:=\frac {p_{i}}{q_{i}}\), \(y_{i}:=\frac{r_{i}}{q_{i}}\), \(w_{i}:= q_{i}\) and \(q_{i} >0\) (\(i=1, \ldots, n\)). Then we get (19).
We can prove part (b) with similar substitutions in Theorem 3(b). □
Theorem 5
- (a) If \(\frac{\mathbf{r}}{\mathbf{q}}\) is decreasing, then
$$ \hat{I}_{g} (\mathbf{r}, \mathbf{q} ) := \sum_{i=1}^{n} r_{i} g \biggl(\frac{r_{i}}{q_{i}} \biggr) \leq \hat{I}_{g} (\mathbf{p}, \mathbf{q} ). $$ (21)
- (b) If \(\frac{\mathbf{p}}{\mathbf{q}}\) is increasing, then
$$ \hat{I}_{g} (\mathbf{p}, \mathbf{q} ) \leq \hat{I}_{g} (\mathbf{r}, \mathbf{q} ). $$ (22)
Proof
(a): We use Theorem 3(a) with substitutions \(x_{i}=\frac {p_{i}}{q_{i}}\), \(y_{i}=\frac{r_{i}}{q_{i}}\), \(w_{i}= q_{i}\) as \(q_{i} >0\) (\(i=1, \ldots, n\)), and \(f(x):= xg(x)\). Then we get (21).
We can prove part (b) with similar substitutions in Theorem 3(b) for \(f(x):= xg(x)\). □
The theory of majorization and the notion of entropic measure of disorder are closely related. Based on this fact, the aim of this paper is to look for majorization relations connected to entropic inequalities. This is interesting for two main reasons. The first is that majorization relations are usually stronger than entropic inequalities, in the sense that they imply the entropic inequalities, while the converse is not true. The second is that, when we have a majorization relation between two different quantum states, we know that we can transform one of the states into the other using some unitary transformation. The concept of entropy alone would not allow us to prove such a property.
The Shannon entropy was introduced in the field of classical information. There are two ways of viewing it. Suppose we have a random variable X and we learn its value. From one point of view, the Shannon entropy quantifies the amount of information we gain once we learn the value of X (after measurement). From another point of view, it tells us the amount of uncertainty about X before we learn its value (before measurement).
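As a quick numeric illustration of the standard Shannon entropy \(H(\mathbf{q}) = -\sum_{i} q_{i} \log q_{i}\), here is a small sketch of our own, taking base 2 so that entropy is measured in bits:

```python
import math

def shannon_entropy(q, base=2):
    """H(q) = -sum_i q_i log(q_i); with base 2 the entropy is in bits."""
    return -sum(qi * math.log(qi, base) for qi in q if qi > 0)

# A fair coin carries exactly one bit of uncertainty before measurement.
print(shannon_entropy([0.5, 0.5]))        # 1.0
# A biased coin is less uncertain, hence has lower entropy.
print(shannon_entropy([0.9, 0.1]) < 1.0)  # True
```

The uniform distribution maximizes the entropy for a fixed number of outcomes, matching the "most disordered" reading above.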
We mention two special cases of the previous result.
The first case corresponds to the entropy of a discrete probability distribution.
Definition 3
The Shannon entropy of a positive probability distribution \(\mathbf{q} = (q_{1}, \ldots, q_{n})\) is defined by
$$ H(\mathbf{q}) := -\sum_{i=1}^{n} q_{i} \log q_{i}. $$
Corollary 1
- (a) If \(\frac{\mathbf{r}}{\mathbf{q}}\) is a decreasing n-tuple and the base of log is greater than 1, then the following estimate for the Shannon entropy of q holds:
$$ \sum_{i=1}^{n} q_{i} \log \biggl(\frac{r_{i}}{q_{i}} \biggr) \geq H(\mathbf{q}). $$ (25)
If the base of log is between 0 and 1, then the reverse inequality holds in (25).
- (b) If \(\frac{\mathbf{p}}{\mathbf{q}}\) is an increasing n-tuple and the base of log is greater than 1, then the following estimate for the Shannon entropy of q holds:
$$ H(\mathbf{q}) \leq \sum_{i=1}^{n} q_{i} \log \biggl(\frac{p_{i}}{q_{i}} \biggr). $$ (26)
If the base of log is between 0 and 1, then the reverse inequality holds in (26).
Proof
(a): Substitute \(f(x):= \log x\) and \(p_{i}=1\) (\(i=1, \ldots, n\)) in Theorem 4(a). Then we get (25).
We can prove the part (b) with similar substitutions for \(r_{i}=1\) (\(i=1, \ldots, n\)). □
Corollary 2
- (a) If r is a decreasing n-tuple and the base of log is greater than 1, then the following connection between the Shannon entropies of p and r holds:
$$ H(\mathbf{r}) \geq H(\mathbf{p}). $$ (27)
If the base of log is between 0 and 1, then the reverse inequality holds in (27).
- (b) If p is an increasing n-tuple and the base of log is greater than 1, then the following connection between the Shannon entropies of p and r holds:
$$ H(\mathbf{r}) \leq H(\mathbf{p}). $$ (28)
If the base of log is between 0 and 1, then the reverse inequality holds in (28).
Proof
(a): Substitute \(g(x):= \log x\) and \(q_{i}=1\) (\(i=1, \ldots, n\)) in Theorem 5(a). Then we get (27).
We can prove part (b) with similar substitutions. □
The second case corresponds to the relative entropy or the K-L distance between two probability distributions.
Definition 4
For positive probability distributions r and q, the Kullback-Leibler distance (relative entropy) is defined by
$$ L (\mathbf{r}, \mathbf{q} ) := \sum_{i=1}^{n} r_{i} \log \biggl(\frac{r_{i}}{q_{i}} \biggr). $$
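A minimal sketch of the K-L distance in the form \(\sum_{i} r_{i} \log(r_{i}/q_{i})\) used in (31); the helper name and sample tuples are our own illustration:

```python
import math

def kl_distance(r, q, base=2):
    """Kullback-Leibler distance L(r, q) = sum_i r_i log(r_i / q_i),
    in the form used in (31)."""
    return sum(ri * math.log(ri / qi, base) for ri, qi in zip(r, q) if ri > 0)

r = [0.5, 0.3, 0.2]
q = [1/3, 1/3, 1/3]
print(kl_distance(r, q) > 0)  # True: positive whenever r differs from q
print(kl_distance(r, r))      # 0.0
```

Against the uniform q, the K-L distance equals log(n) minus the Shannon entropy of r, so it measures how far r is from being maximally disordered.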
Corollary 3
- (a) If \(\frac{\mathbf{r}}{\mathbf{q}}\) is a decreasing n-tuple and the base of log is greater than 1, then
$$ \sum_{i=1}^{n} q_{i} \log \biggl(\frac{r_{i}}{q_{i}} \biggr) \geq \sum_{i=1}^{n} q_{i} \log \biggl(\frac{p_{i}}{q_{i}} \biggr). $$ (29)
If the base of log is between 0 and 1, then the reverse inequality holds in (29).
- (b) If \(\frac{\mathbf{p}}{\mathbf{q}}\) is an increasing n-tuple and the base of log is greater than 1, then
$$ \sum_{i=1}^{n} q_{i} \log \biggl(\frac{r_{i}}{q_{i}} \biggr) \leq \sum_{i=1}^{n} q_{i} \log \biggl(\frac{p_{i}}{q_{i}} \biggr). $$ (30)
If the base of log is between 0 and 1, then the reverse inequality holds in (30).
Proof
(a): Substitute \(f(x):= \log x\) in Theorem 4(a). Then we get (29).
We can prove part (b) with substitution \(f(x):= \log x\) in Theorem 4(b). □
Corollary 4
- (a) If \(\frac{\mathbf{r}}{\mathbf{q}}\) is a decreasing n-tuple and the base of log is greater than 1, then the following comparison inequality between the K-L distances of \((\mathbf{r}, \mathbf{q})\) and \((\mathbf{p}, \mathbf{q})\) holds:
$$ L (\mathbf{r}, \mathbf{q} ) := \sum_{i=1}^{n} r_{i} \log \biggl(\frac{r_{i}}{q_{i}} \biggr) \leq L (\mathbf{p}, \mathbf{q} ) := \sum_{i=1}^{n} p_{i} \log \biggl(\frac{p_{i}}{q_{i}} \biggr). $$ (31)
If the base of log is between 0 and 1, then the reverse inequality holds in (31).
- (b) If \(\frac{\mathbf{p}}{\mathbf{q}}\) is an increasing n-tuple and the base of log is greater than 1, then the following comparison inequality between the K-L distances of \((\mathbf{r}, \mathbf{q})\) and \((\mathbf{p}, \mathbf{q})\) holds:
$$ \sum_{i=1}^{n} r_{i} \log \biggl(\frac{r_{i}}{q_{i}} \biggr) \geq \sum_{i=1}^{n} p_{i} \log \biggl(\frac{p_{i}}{q_{i}} \biggr). $$ (32)
If the base of log is between 0 and 1, then the reverse inequality holds in (32).
3 Applications to the Zipf-Mandelbrot entropy
The term Zipfian distribution refers to a distribution of probabilities of occurrence that follows Zipf's law. Zipf's law is an experimental law, not a theoretical one; i.e. it describes an occurrence rather than predicting it from some kind of theory: the observation that, in many natural and man-made phenomena, the probability of occurrence of many random items starts high and tapers off. Thus, a few occur very often while many others occur rarely. The formal definition of this law is \(P_{n} = 1/n^{a}\), where \(P_{n}\) is the frequency of occurrence of the nth ranked item and a is close to 1.
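A brief sketch of our own (function name and vocabulary size are illustrative) of Zipfian probabilities obtained by normalizing \(1/n^{a}\) over a finite set of ranks:

```python
def zipf_frequencies(n_items, a=1.0):
    """Normalized Zipfian probabilities, P_n proportional to 1/n^a for ranks 1..n_items."""
    weights = [1.0 / (rank ** a) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

probs = zipf_frequencies(5)
# Rank 1 is twice as probable as rank 2, three times as probable as rank 3, ...
print([round(p / probs[0], 3) for p in probs])  # [1.0, 0.5, 0.333, 0.25, 0.2]
```

Frequency times rank is constant here, which is exactly the \(fr = C\) formulation discussed next.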
Converted to language, this means that the rank of a word (in terms of its frequency) is approximately inversely proportional to its actual frequency, and so produces a hyperbolic distribution. To put Zipf's law another way (see [18, 19]): \(fr = C\), where r is the rank of a word, f is the frequency of occurrence of that word, and C is a constant (the value of which depends on the subject under consideration). Essentially this shows an inverse proportional relationship between a word's frequency and its frequency rank. Zipf called this curve the 'standard curve'. Texts from natural languages do not, of course, behave with such absolute mathematical precision. They cannot, because, for one thing, any curve representing empirical data from large texts will be a stepped graph, since many non-high-frequency words will share the same frequency. But the overall consensus is that texts match the standard curve significantly well. Li [20] writes 'this distribution, also called Zipf's law, has been checked for accuracy for the standard corpus of the present-day English [Kučera and Francis] with very good results.' See Miller [21] for a concise summary of the match between actual data and the standard curve.
Zipf also studied the relationship between the frequency of occurrence of a word and its length. In The Psycho-Biology of Language, he stated that ‘it seems reasonably clear that shorter words are distinctly more favored in language than longer words.’
Apart from the use of this law in information science and linguistics, Zipf’s law is used in economics. This distribution in economics is known as Pareto’s law, which analyzes the distribution of the wealthiest members of the community [22], p.125. These two laws are the same in the mathematical sense, but they are applied in different contexts [23], p.294. The same type of distribution that we have in Zipf’s and Pareto’s law, also known as the power law, can also be found in other scientific disciplines, such as physics, biology, earth and planetary sciences, computer science, demography and the social sciences [24].
Application of the Zipf-Mandelbrot law can also be found in linguistics [26], information sciences [19, 23] and ecological field studies [27].
The cumulative distribution function (CDF) specifies the distribution of a random variable. In the case of a continuous distribution, it gives the area under the probability density function, and it can also be used to specify the distribution of multivariate random variables.
There are various applications of the CDF. For example, in learning to rank, the CDF arises naturally as a probability measure over inequality events of the type \(\{X \leq x\}\). The joint CDF lends itself to problems that are easily described in terms of inequality events in which statistical dependence relationships also exist among events. Examples of this type of problem include web search and document retrieval [28–31], predicting ratings of movies [32] or predicting multiplayer game outcomes with a team structure [33]. In contrast to the canonical problems of classification or regression, in learning to rank we are required to learn a mapping from inputs to inter-dependent output variables, so that we may wish to model both stochastic orderings of variable states and statistical dependence relationships between variables.
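For a discrete distribution the CDF is simply the running sum of the probabilities; a minimal sketch of our own (function name and pmf are illustrative):

```python
def cdf_from_pmf(pmf):
    """Cumulative distribution F(k) = P(X <= k) as the running sum of a discrete pmf."""
    out, acc = [], 0.0
    for p in pmf:
        acc += p
        out.append(acc)
    return out

F = cdf_from_pmf([0.5, 0.3, 0.2])
print(F)     # [0.5, 0.8, 1.0]
# The probability of the inequality event {X <= 2} is read off directly:
print(F[1])  # 0.8
```

The CDF is nondecreasing and ends at 1, which is what makes inequality events of the type \(\{X \leq x\}\) easy to evaluate.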
In the following application, we use two Zipf-Mandelbrot laws with different parameters.
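The Zipf-Mandelbrot law with parameters \((n, t, s)\), as used in the applications below, assigns probability \(1/((i+t)^{s} H_{n,t,s})\) to rank i, where \(H_{n,t,s} = \sum_{j=1}^{n} 1/(j+t)^{s}\). A small numeric sketch of our own (function names and parameter values are illustrative):

```python
def zm_harmonic(n, t, s):
    """Generalized harmonic number H_{n,t,s} = sum_{j=1}^{n} 1/(j+t)^s."""
    return sum(1.0 / (j + t) ** s for j in range(1, n + 1))

def zipf_mandelbrot(n, t, s):
    """Zipf-Mandelbrot probabilities f(i) = 1 / ((i+t)^s * H_{n,t,s}) for i = 1..n."""
    h = zm_harmonic(n, t, s)
    return [1.0 / ((i + t) ** s * h) for i in range(1, n + 1)]

pmf = zipf_mandelbrot(n=10, t=1.5, s=1.2)
print(abs(sum(pmf) - 1.0) < 1e-12)                  # True: a probability distribution
print(all(pmf[i] > pmf[i + 1] for i in range(9)))   # True: decreasing in rank
```

Setting t = 0 recovers the plain Zipfian distribution of the previous paragraphs.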
Application 1
- (a) If \(\frac{(i+t_{2})^{s_{2}}}{(i+1+t_{2})^{s_{2}}} \leq \frac{q_{i+1}}{q_{i}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$\begin{aligned} &\sum_{i=1}^{n} \frac{1}{(i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}} \log \biggl(\frac{1}{q_{i} (i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}} \biggr) \\ &\quad \leq \sum_{i=1}^{n} \frac{1}{(i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}} \log \biggl(\frac{1}{q_{i} (i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}} \biggr). \end{aligned}$$ (37)
If the base of log is between 0 and 1, then the reverse inequality holds in (37).
- (b) If \(\frac{(i+t_{1})^{s_{1}}}{(i+1+t_{1})^{s_{1}}} \geq \frac{q_{i+1}}{q_{i}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$\begin{aligned} &\sum_{i=1}^{n} \frac{1}{(i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}} \log \biggl(\frac{1}{q_{i} (i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}} \biggr) \\ &\quad \geq \sum_{i=1}^{n} \frac{1}{(i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}} \log \biggl(\frac{1}{q_{i} (i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}} \biggr). \end{aligned}$$ (38)
If the base of log is between 0 and 1, then the reverse inequality holds in (38).
Proof
(a) Substituting \(p_{i}:= \frac{1}{(i+t_{1})^{s_{1}}H_{n, t_{1}, s_{1}}}\) and \(r_{i}:= \frac{1}{(i+t_{2})^{s_{2}}H_{n, t_{2}, s_{2}}}\) (\(i=1, \ldots, n\)) in (31) of Corollary 4(a), we get (37).
(b) If we switch the roles of \(r_{i}\) and \(p_{i}\), then using (32) in Corollary 4(b) we get (38). □
The following application is a special case of the above result.
Application 2
Let p and r be Zipf-Mandelbrot laws with parameters \(n \in \{1, 2, \ldots\}\), \(t_{1}, t_{2} \geq 0\) and \(s_{1}, s_{2} > 0\), respectively, satisfying (36).
Application 3
- (a) If \(\frac{(i+t_{2})^{s_{2}}}{(i+1+t_{2})^{s_{2}}} \leq \frac{q_{i+1}}{q_{i}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$ \sum_{i=1}^{n} q_{i} \log \biggl(\frac{1}{q_{i} (i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}} \biggr) \geq \sum_{i=1}^{n} q_{i} \log \biggl(\frac{1}{q_{i} (i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}} \biggr). $$ (40)
If the base of log is between 0 and 1, then the reverse inequality holds in (40).
- (b) If \(\frac{(i+t_{1})^{s_{1}}}{(i+1+t_{1})^{s_{1}}} \geq \frac{q_{i+1}}{q_{i}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$ \sum_{i=1}^{n} q_{i} \log \biggl(\frac{1}{q_{i} (i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}} \biggr) \leq \sum_{i=1}^{n} q_{i} \log \biggl(\frac{1}{q_{i} (i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}} \biggr). $$ (41)
If the base of log is between 0 and 1, then the reverse inequality holds in (41).
Proof
The proof is similar to that of Application 1, with substitutions \(p_{i}:= \frac{1}{(i+t_{1})^{s_{1}}H_{n, t_{1}, s_{1}}}\) and \(r_{i}:= \frac{1}{(i+t_{2})^{s_{2}}H_{n, t_{2}, s_{2}}}\), applied in Corollary 3 instead of Corollary 4. □
The following result is a special case of Application 3.
Application 4
Application 5
- (a) If \(\frac{(i+t_{2})^{s_{2}}}{(i+1+t_{2})^{s_{2}}} \leq \frac{q_{i+1}}{q_{i}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$ \sum_{i=1}^{n} q_{i} \log \biggl(\frac{1}{q_{i} (i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}} \biggr) \geq H(\mathbf{q}). $$ (43)
If the base of log is between 0 and 1, then the reverse inequality holds in (43).
- (b) If \(\frac{(i+t_{1})^{s_{1}}}{(i+1+t_{1})^{s_{1}}} \geq \frac{q_{i+1}}{q_{i}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$ H(\mathbf{q}) \leq \sum_{i=1}^{n} q_{i} \log \biggl(\frac{1}{q_{i} (i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}} \biggr). $$ (44)
If the base of log is between 0 and 1, then the reverse inequality holds in (44).
Proof
(a) We can prove (43) by a method similar to that in Application 1, with substitutions \(p_{i}:= 1\) and \(r_{i}:= \frac{1}{(i+t_{2})^{s_{2}}H_{n, t_{2}, s_{2}}}\), applying Corollary 1(a) instead of Corollary 4(a).
(b) For this part, switch the roles of p and r in part (a), i.e., take \(p_{i}:= \frac{1}{(i+t_{1})^{s_{1}}H_{n, t_{1}, s_{1}}}\) and \(r_{i}:= 1\) (\(i=1, 2, \ldots, n\)), and apply Corollary 1(b) instead of Corollary 4(b) to get (44). □
Finally, in the following application, we use three Zipf-Mandelbrot laws with different parameters.
Application 6
- (a) If \(\frac{(i+1+t_{2})^{s_{2}}}{(i+1+t_{3})^{s_{3}}} \leq \frac{(i+t_{2})^{s_{2}}}{(i+t_{3})^{s_{3}}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$\begin{aligned} &\sum_{i=1}^{n} \frac{1}{(i+t_{3})^{s_{3}}H_{n, t_{3}, s_{3}}} \log \biggl(\frac{(i+t_{2})^{s_{2}}H_{n, t_{2}, s_{2}}}{(i+t_{3})^{s_{3}}H_{n, t_{3}, s_{3}}} \biggr) \\ &\quad \leq \sum_{i=1}^{n} \frac{1}{(i+t_{1})^{s_{1}}H_{n, t_{1}, s_{1}}} \log \biggl(\frac{(i+t_{2})^{s_{2}}H_{n, t_{2}, s_{2}}}{(i+t_{1})^{s_{1}}H_{n, t_{1}, s_{1}}} \biggr). \end{aligned}$$ (45)
If the base of log is between 0 and 1, then the reverse inequality holds in (45).
- (b) If \(\frac{(i+1+t_{2})^{s_{2}}}{(i+1+t_{3})^{s_{3}}} \geq \frac{(i+t_{2})^{s_{2}}}{(i+t_{3})^{s_{3}}}\) (\(i=1, \ldots, n\)) and the base of log is greater than 1, then
$$\begin{aligned} &\sum_{i=1}^{n} \frac{1}{(i+t_{3})^{s_{3}}H_{n, t_{3}, s_{3}}} \log \biggl(\frac{(i+t_{2})^{s_{2}}H_{n, t_{2}, s_{2}}}{(i+t_{3})^{s_{3}}H_{n, t_{3}, s_{3}}} \biggr) \\ &\quad \geq \sum_{i=1}^{n} \frac{1}{(i+t_{1})^{s_{1}}H_{n, t_{1}, s_{1}}} \log \biggl(\frac{(i+t_{2})^{s_{2}}H_{n, t_{2}, s_{2}}}{(i+t_{1})^{s_{1}}H_{n, t_{1}, s_{1}}} \biggr). \end{aligned}$$ (46)
If the base of log is between 0 and 1, then the reverse inequality holds in (46).
Proof
(a) Let \(p_{i}:= \frac{1}{(i+t_{1})^{s_{1}} H_{n, t_{1}, s_{1}}}\), \(q_{i}:= \frac{1}{(i+t_{2})^{s_{2}} H_{n, t_{2}, s_{2}}}\) and \(r_{i}:= \frac{1}{(i+t_{3})^{s_{3}} H_{n, t_{3}, s_{3}}}\). Here \(p_{i},q_{i}\) and \(r_{i}\) are decreasing over \(i=1, \ldots, n\). Now, we investigate the behavior of \(\frac{\mathbf{r}}{\mathbf{q}}\).
(b) If we replace \(\frac{\mathbf{r}}{\mathbf{q}}\) with \(\frac {\mathbf{p}}{\mathbf{q}}\) in part (a) and use Corollary 4(b), we get (46). □
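The monotonicity conditions in these applications can be checked numerically. The following sketch of our own takes \(t_{2} = t_{3} = t\), in which case the ratio \(r_{i}/q_{i}\) is proportional to \((i+t)^{s_{2}-s_{3}}\) and is therefore decreasing whenever \(s_{3} > s_{2}\), the situation of part (a); the function names and parameter values are illustrative.

```python
def zm_harmonic(n, t, s):
    # Generalized harmonic number H_{n,t,s} = sum_{j=1}^{n} 1/(j+t)^s.
    return sum(1.0 / (j + t) ** s for j in range(1, n + 1))

def zipf_mandelbrot(n, t, s):
    # Zipf-Mandelbrot probabilities 1 / ((i+t)^s * H_{n,t,s}) for i = 1..n.
    h = zm_harmonic(n, t, s)
    return [1.0 / ((i + t) ** s * h) for i in range(1, n + 1)]

n = 20
q = zipf_mandelbrot(n, t=1.0, s=1.0)   # plays the role of the law with (t2, s2)
r = zipf_mandelbrot(n, t=1.0, s=2.0)   # plays the role of the law with (t3, s3)

# With equal shifts, r_i/q_i is proportional to (i+t)^(s2-s3), which is
# decreasing here since s3 = 2 > s2 = 1.
ratio = [ri / qi for ri, qi in zip(r, q)]
print(all(ratio[i] >= ratio[i + 1] for i in range(n - 1)))  # True
```

Swapping the exponents makes the ratio increasing instead, which corresponds to the condition of part (b).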
4 Conclusions
In this paper we showed how the Shannon entropy is connected to the theory of majorization. Both are linked to the measure of disorder in a system, but the theory of majorization usually gives stronger criteria than the entropic inequalities. Based on this close relation, the aim of this paper was to look for majorization relations connected to entropic inequalities. We gave some generalized results for the Csiszár f-divergence via the majorization inequality and mentioned two special cases of these generalized results: the first corresponds to the entropy of a discrete probability distribution, and the second to the relative entropy, or Kullback-Leibler distance, between two probability distributions. The cumulative distribution function (CDF) is an important application of majorization, and we gave several applications using the Zipf-Mandelbrot law and the CDF.
Declarations
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
- Boltzmann, L: Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respective den Sätzen über das Wärmegleichgewicht. Wien. Ber. 76, 373-435 (1877)
- Akaike, H: Prediction and Entropy. Springer, New York (1985)
- Shannon, CE: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379-423 and 623-656 (1948)
- Kullback, S, Leibler, RA: On information and sufficiency. Ann. Math. Stat. 22(1), 79-86 (1951)
- Matić, M, Pearce, CEM, Pečarić, J: Shannon's and related inequalities in information theory. In: Rassias, TM (ed.) Survey on Classical Inequalities, pp. 127-164. Kluwer Academic, Norwell (2000)
- Mitrinović, DS, Pečarić, J, Fink, AM: Classical and New Inequalities in Analysis. Kluwer Academic, Dordrecht (1993)
- McEliece, RJ: The Theory of Information and Coding. Addison-Wesley, Reading (1977)
- Matić, M, Pearce, CEM, Pečarić, J: Improvements of some bounds on entropy measures in information theory. Math. Inequal. Appl. 1, 295-304 (1998)
- Matić, M, Pearce, CEM, Pečarić, J: On an inequality for the entropy of a probability distribution. Acta Math. Hung. 85, 345-349 (1999)
- Matić, M, Pearce, CEM, Pečarić, J: Some refinements of Shannon's inequalities. ANZIAM J. 43, 493-511 (2002)
- Pečarić, J, Proschan, F, Tong, YL: Convex Functions, Partial Orderings and Statistical Applications. Academic Press, New York (1992)
- Marshall, AW, Olkin, I, Arnold, BC: Inequalities: Theory of Majorization and Its Applications, 2nd edn. Springer Series in Statistics. Springer, New York (2011)
- Fuchs, L: A new proof of an inequality of Hardy-Littlewood-Pólya. Mat. Tidsskr. B, 53-54 (1947)
- Niculescu, CP, Persson, LE: Convex Functions and Their Applications. A Contemporary Approach. CMS Books in Mathematics, vol. 23. Springer, New York (2006)
- Csiszár, I: Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hung. 2, 299-318 (1967)
- Csiszár, I: Information measures: a critical survey. In: Trans. 7th Prague Conf. on Info. Th., Statist. Decis. Funct., Random Processes and 8th European Meeting of Statist., B, pp. 73-86. Academia, Prague (1978)
- Horváth, L, Pečarić, Ð, Pečarić, J: Estimations of f- and Rényi divergences by using a cyclic refinement of the Jensen inequality. Bull. Malays. Math. Sci. Soc. (2017). doi:10.1007/s40840-017-0526-4
- Adil Khan, M, Pečarić, Ð, Pečarić, J: Bounds for Shannon and Zipf-Mandelbrot entropies. Math. Methods Appl. Sci. (to appear)
- Silagadze, ZK: Citations and the Zipf-Mandelbrot law. Complex Syst. 11, 487-499 (1997)
- Li, W: Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Trans. Inf. Theory 38(6), 1842-1845 (1992)
- Miller, GA: Language and Communication. McGraw-Hill, New York (1951)
- Diodato, V: Dictionary of Bibliometrics. Haworth Press, New York (1994)
- Egghe, L, Rousseau, R: Introduction to Informetrics. Quantitative Methods in Library, Documentation and Information Science. Elsevier, New York (1990)
- Newman, MEJ: Power laws, Pareto distributions and Zipf's law. arXiv:cond-mat/0412004
- Mandelbrot, B: Information theory and psycholinguistics: a theory of word frequencies. In: Lazarsfeld, P, Henry, N (eds.) Readings in Mathematical Social Science. MIT Press, Cambridge (1966)
- Montemurro, MA: Beyond the Zipf-Mandelbrot law in quantitative linguistics (2001). arXiv:cond-mat/0104066v2
- Mouillot, D, Leprêtre, A: Introduction of relative abundance distribution (RAD) indices, estimated from the rank-frequency diagrams (RFD), to assess changes in community diversity. Environ. Monit. Assess. 63(2), 279-295 (2000)
- Burges, CJC, Shaked, T, Renshaw, E, Lazier, A, Deeds, M, Hamilton, N, Hullender, G: Learning to rank using gradient descent. In: Proceedings of the Twenty-Second International Conference on Machine Learning (ICML), pp. 89-96 (2005)
- Cao, Z, Qin, T, Liu, TY, Tsai, MF, Li, H: Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), pp. 129-136 (2007)
- Joachims, T: A support vector method for multivariate performance measures. In: Proceedings of the Twenty-Second International Conference on Machine Learning (ICML), pp. 377-384 (2005)
- Xia, F, Liu, TY, Wang, J, Zhang, W, Li, H: Listwise approach to learning to rank: theory and algorithms. In: Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), pp. 1192-1199 (2008)
- Rennie, J, Srebro, N: Fast maximum margin matrix factorization for collaborative prediction. In: Proceedings of the Twenty-Second International Conference on Machine Learning (ICML), pp. 713-719 (2005)
- Herbrich, R, Minka, TP, Graepel, T: TrueSkill™: a Bayesian skill rating system. Adv. Neural Inf. Process. Syst. 19, 569-576 (2007)