The Kolmogorov Distance between the Binomial and Poisson Laws: Efficient Algorithms and Sharp Estimates
- José A. Adell^{1},
- José M. Anoz^{1} and
- Alberto Lekuona^{1}
https://doi.org/10.1155/2009/965712
© José A. Adell et al. 2009
Received: 21 May 2009
Accepted: 9 October 2009
Published: 30 December 2009
Abstract
We give efficient algorithms, as well as sharp estimates, to compute the Kolmogorov distance between the binomial and Poisson laws with the same mean . Such a distance is eventually attained at the integer part of . The exact Kolmogorov distance for is also provided. The preceding results are obtained as a concrete application of a general method involving a differential calculus for linear operators represented by stochastic processes.
1. Introduction and Main Results
There is a huge amount of literature on estimates of different probability metrics between random variables, measuring the rates of convergence in various limit theorems, such as Poisson approximation and the central limit theorem. However, as far as we know, only a few papers are devoted to obtaining exact values for such probability metrics, even in the simplest and most paradigmatic examples. In this regard, we mention the results by Kennedy and Quine [1] giving the exact total variation distance between binomial and Poisson distributions, when their common mean is smaller than , approximately, as well as the efficient algorithm provided in the work of Adell et al. [2] to compute this distance for arbitrary values of . On the other hand, closed-form expressions for the Kolmogorov and total variation distances between some familiar discrete distributions with different parameters can be found in Adell and Jodrá [3]. Finally, Hipp and Mattner [4] have recently computed the exact Kolmogorov distance in the central limit theorem for symmetric binomial distributions.
The aim of this paper is to obtain efficient algorithms and sharp estimates in the highly classical problem of evaluating the Kolmogorov distance between binomial and Poisson laws having the same mean. The techniques used here are analogous to those in [2] dealing with the total variation distance between the aforementioned laws.
To state our main results, let us introduce some notation. Denote by the set of nonnegative integers, and , . If is a set of real numbers, stands for the indicator function of . For any , we set and . For any , the th forward differences of a function are recursively defined by , , , and .
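The recursion can be made concrete with a short sketch. It assumes the usual unit-step convention, Δ^0 f = f and Δf(x) = f(x + 1) − f(x), which is our reading of the definition whose formulas did not survive extraction; the function names are hypothetical. The recursive version is checked against the equivalent closed form Δ^k f(x) = Σ_{j=0}^{k} (−1)^{k−j} C(k, j) f(x + j).

```python
from math import comb
from typing import Callable

Func = Callable[[int], float]

def forward_difference(f: Func, k: int) -> Func:
    """k-th forward difference via the recursion Delta^0 f = f,
    Delta^(j+1) f(x) = Delta^j f(x + 1) - Delta^j f(x) (unit step assumed)."""
    if k == 0:
        return f
    g = forward_difference(f, k - 1)
    return lambda x: g(x + 1) - g(x)

def forward_difference_closed(f: Func, k: int, x: int) -> float:
    """Equivalent closed form: sum_j (-1)^(k-j) * C(k, j) * f(x + j)."""
    return sum((-1) ** (k - j) * comb(k, j) * f(x + j) for j in range(k + 1))

cube = lambda x: x ** 3
# Differences of x^3 at x = 2; the third one is 3! = 6 and the fourth vanishes.
print([forward_difference(cube, k)(2) for k in range(5)])  # [8, 19, 18, 6, 0]
```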
Throughout this note, it will be assumed that , , and . Let be a sequence of independent identically distributed random variables having the uniform distribution on . The random variable
has the binomial distribution with parameters and . Let be a random variable having the Poisson distribution with mean . Recall that the Kolmogorov distance between and is defined by
where
An efficient algorithm to compute is based on the zeroes of the second Krawtchouk and Charlier polynomials, which are the orthogonal polynomials with respect to the binomial and Poisson distributions, respectively. Interesting references for general orthogonal polynomials are the monographs by Chihara [5] and Schoutens [6].
The two zeroes of this polynomial are
As , , and , converges to the second Charlier polynomial with respect to defined by
the two zeroes of which are
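Because a degree-two orthogonal polynomial is determined, up to a constant factor, by being orthogonal to 1 and x, both pairs of zeroes have closed forms. In the sketch below we write n and p for the binomial parameters and λ = np (our notation, since the symbols were lost in extraction). The monic forms K_2(x) = x² − (2λ + 1 − 2p)x + n(n − 1)p² for the binomial law and C_2(x) = x² − (2λ + 1)x + λ² for the Poisson law are our own derivation from orthogonality; the normalization used in the paper may differ, but the zeroes do not depend on it. The script checks orthogonality numerically and illustrates the convergence of the Krawtchouk zeroes to the Charlier zeroes.

```python
from math import exp, lgamma, log, sqrt

def krawtchouk2_zeros(n: int, p: float) -> tuple[float, float]:
    """Zeroes of K_2(x) = x^2 - (2*lam + 1 - 2p) x + n(n - 1) p^2, lam = n p:
    lam + 1/2 - p -/+ sqrt(lam (1 - p) + (1/2 - p)^2)."""
    lam = n * p
    centre = lam + 0.5 - p
    radius = sqrt(lam * (1 - p) + (0.5 - p) ** 2)
    return centre - radius, centre + radius

def charlier2_zeros(lam: float) -> tuple[float, float]:
    """Zeroes of C_2(x) = x^2 - (2*lam + 1) x + lam^2: lam + 1/2 -/+ sqrt(lam + 1/4)."""
    centre, radius = lam + 0.5, sqrt(lam + 0.25)
    return centre - radius, centre + radius

def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(S_n = k) for S_n ~ Binomial(n, p), computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))

# Orthogonality check: E[K_2(S_n)] and E[S_n K_2(S_n)] vanish up to rounding.
n, p = 20, 0.1
x1, x2 = krawtchouk2_zeros(n, p)
e0 = sum(binomial_pmf(n, p, k) * (k - x1) * (k - x2) for k in range(n + 1))
e1 = sum(binomial_pmf(n, p, k) * k * (k - x1) * (k - x2) for k in range(n + 1))
print(e0, e1)

# Krawtchouk zeroes approach the Charlier zeroes as n grows with n p fixed.
print(krawtchouk2_zeros(10**6, 2e-6), charlier2_zeros(2.0))
```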
Finally, we denote by
and by
Our first main result is the following.
Theorem 1.1.
Looking at Figure 1 and taking into account (1.8), (1.9), and (1.12) we see the following. The number of computations needed to evaluate is approximately , that is, , approximately. This last quantity is relatively small, since approximates if and only if is close to zero. Moreover, the set has two points at most, whenever , and this happens if
As follows from (1.2), the natural way to compute the Kolmogorov distance is to look at the maximum absolute value of the function
From a computational point of view, the main question is how many evaluations of the probability differences are required to exactly compute . According to Theorem 1.1 and (1.8), the number of such evaluations is at least, and at most, approximately.
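For comparison, a brute-force computation evaluates the probability differences at every k = 0, 1, ..., n, that is, n + 1 evaluations, and keeps the largest one in absolute value. The sketch below is our own illustration, not the algorithm of Theorem 1.1; it assumes S_n ~ Binomial(n, p) and Z ~ Poisson(λ) with λ = np, and prints the maximizer beside the integer parts of the Charlier zeroes λ + 1/2 ± √(λ + 1/4) (the monic form x² − (2λ + 1)x + λ² is our assumption), so that the location of the maximum can be compared with them.

```python
from math import exp, floor, lgamma, log, sqrt

def cdf_gaps(n: int, p: float) -> list[float]:
    """g(k) = P(S_n <= k) - P(Z <= k), k = 0..n, for S_n ~ Binomial(n, p), Z ~ Poisson(n p).
    Probabilities are computed in log space to avoid overflow of binomial coefficients."""
    lam = n * p
    fb = fz = 0.0
    gaps = []
    for k in range(n + 1):
        fb += exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                  + k * log(p) + (n - k) * log(1 - p))
        fz += exp(-lam + k * log(lam) - lgamma(k + 1))
        gaps.append(fb - fz)
    return gaps

def kolmogorov_brute_force(n: int, p: float) -> tuple[int, float]:
    """Return (k*, d_K): the maximizer of |g| and the Kolmogorov distance.
    Scanning k = 0..n suffices, because for k >= n the gap is 1 - P(Z <= k),
    which decreases in k."""
    gaps = cdf_gaps(n, p)
    k_star = max(range(n + 1), key=lambda k: abs(gaps[k]))
    return k_star, abs(gaps[k_star])

n, p = 500, 0.004
lam = n * p
k_star, d_k = kolmogorov_brute_force(n, p)
charlier_zeros = (lam + 0.5 - sqrt(lam + 0.25), lam + 0.5 + sqrt(lam + 0.25))
print(k_star, d_k, [floor(z) for z in charlier_zeros])
```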
On the other hand, and converge, respectively, to and , as . Thus, Theorem 1.1 leads us to the following asymptotic result.
Corollary 1.2.
Unfortunately, is not uniformly bounded when varies in an arbitrary compact set. In fact, since , , and , , it can be verified that , when from the left, , or when from the right. This explains why and in Theorem 1.1 have no simple form in general.
Finally, it may be of interest to compare Theorem 1.1 and Corollary 1.2 with the exact value of the Kolmogorov distance in the central limit theorem for symmetric binomial distributions obtained by Hipp and Mattner [4]. These authors have shown that (cf. [4, Corollary ])
where is a standard normal random variable. Roughly speaking, (1.17) tells us that the Kolmogorov distance in this version of the central limit theorem is attained at the mean of the respective distributions, whereas, according to Theorem 1.1 and Corollary 1.2, the Kolmogorov distance in our Poisson approximation setting is attained at the mean the standard deviation of the corresponding distributions.
For small values of , we are able to give the following closed-form expression.
Corollary 1.3.
Corollary 1.3 can be seen as a counterpart of the total variation result established by Kennedy and Quine [1, Theorem ], stating that
for any and , where stands for the total variation distance.
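Since each set {0, 1, ..., k} is admissible in the supremum defining the total variation distance, the Kolmogorov distance never exceeds it, which is why the two results complement each other. The sketch below (illustrative code with hypothetical names, assuming λ = np) computes both distances exactly, using d_TV = (1/2) Σ_k |P(S_n = k) − P(Z = k)| and adding the Poisson mass beyond n, where the binomial probabilities vanish.

```python
from math import exp, lgamma, log

def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(S_n = k), S_n ~ Binomial(n, p), computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))

def poisson_pmf(lam: float, k: int) -> float:
    """P(Z = k), Z ~ Poisson(lam)."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def kolmogorov_and_total_variation(n: int, p: float) -> tuple[float, float]:
    """Exact (d_K, d_TV) between Binomial(n, p) and Poisson(n p)."""
    lam = n * p
    pb = [binomial_pmf(n, p, k) for k in range(n + 1)]
    pz = [poisson_pmf(lam, k) for k in range(n + 1)]
    # d_TV = (1/2) sum_k |P(S_n = k) - P(Z = k)|; for k > n only the Poisson mass remains.
    tail = max(0.0, 1.0 - sum(pz))
    d_tv = 0.5 * (sum(abs(a - b) for a, b in zip(pb, pz)) + tail)
    # d_K: largest gap between the two CDFs on k = 0..n (the gap only shrinks for k > n).
    d_k = fb = fz = 0.0
    for a, b in zip(pb, pz):
        fb, fz = fb + a, fz + b
        d_k = max(d_k, abs(fb - fz))
    return d_k, d_tv

for n, p in [(20, 0.03), (50, 0.04), (100, 0.01)]:
    d_k, d_tv = kolmogorov_and_total_variation(n, p)
    print(f"n={n} p={p}: d_K={d_k:.6f} <= d_TV={d_tv:.6f}")
```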
For any , , and , we denote by
where
Sharp estimates for the Kolmogorov distance are given in the following.
Theorem 1.4.
Upper bounds for the Kolmogorov distance in Poisson approximation for sums of independent random indicators have been obtained by many authors using different techniques. We mention the following estimates in the case at hand:
(Serfling [7]),
(Hipp [8]),
(Deheuvels et al. [9]),
and the constant in the last estimate is the best possible (cf. Roos [10]). It is readily seen from (1.23) that
On the other hand, it follows from Roos [10] and Lemma 2.1 below that
| n | p | S | H | D | R | A |
|---|---|---|---|---|---|---|
| 100 | 0.01 | 0.0050 | 0.007854 | 0.003091 | 0.003173 | 0.001916 |
| 200 | 0.01 | 0.0100 | 0.007854 | 0.002605 | 0.003173 | 0.001416 |
| 500 | 0.01 | 0.0250 | 0.007854 | 0.002656 | 0.003173 | 0.001454 |
| 1000 | 0.01 | 0.0500 | 0.007854 | 0.002603 | 0.003173 | 0.001396 |
| 200 | 0.005 | 0.0025 | 0.003927 | 0.001348 | 0.001376 | 0.000938 |
| 400 | 0.005 | 0.0050 | 0.003927 | 0.001105 | 0.001376 | 0.000692 |
| 1000 | 0.005 | 0.0125 | 0.003927 | 0.001131 | 0.001376 | 0.000714 |
| 2000 | 0.005 | 0.0250 | 0.003927 | 0.001104 | 0.001376 | 0.000687 |
On the other hand, the referee has drawn our attention to a recent paper by Vaggelatou [11], where the author obtains upper bounds for the Kolmogorov distance between sums of independent integer-valued random variables. Specializing Corollary in [11] to the case at hand, Vaggelatou gives the upper bound
| λ | n | V | A |
|---|---|---|---|
| 0.6 | 20 | 0.0054116 | 0.0059268 |
| 0.6 | 50 | 0.0020499 | 0.0021122 |
| 0.6 | 100 | 0.0010063 | 0.0010199 |
| 0.6 | 200 | 0.0004985 | 0.0005017 |
| 0.6 | 500 | 0.0001983 | 0.0001988 |
| 0.6 | 1000 | 0.0000990 | 0.0000991 |
| 0.9 | 20 | 0.0098476 | 0.0103392 |
| 0.9 | 50 | 0.0035463 | 0.0035676 |
| 0.9 | 100 | 0.0017095 | 0.0017106 |
| 0.9 | 200 | 0.0008390 | 0.0008388 |
| 0.9 | 500 | 0.0003318 | 0.0003317 |
| 0.9 | 1000 | 0.0001653 | 0.0001653 |
| 1 | 20 | 0.0114410 | 0.0117428 |
| 1 | 50 | 0.0040305 | 0.0040086 |
| 1 | 100 | 0.0019267 | 0.0019162 |
| 1 | 200 | 0.0009415 | 0.0009382 |
| 1 | 500 | 0.0003714 | 0.0003708 |
| 1 | 1000 | 0.0001848 | 0.0001847 |
| 2 | 20 | 0.0367148 | 0.0228808 |
| 2 | 50 | 0.0090741 | 0.0065533 |
| 2 | 100 | 0.0036183 | 0.0029671 |
| 2 | 200 | 0.0015808 | 0.0014156 |
| 2 | 500 | 0.0005777 | 0.0005510 |
| 2 | 1000 | 0.0002798 | 0.0002731 |
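Neither table lists the exact distance itself. The sketch below, our own brute-force scan of the two distribution functions rather than the algorithm of Theorem 1.1, evaluates it on the parameter grids of both tables (taking p = λ/n in the second one, with λ our name for the common mean), so that each column can be set against it; we do not reproduce its output here.

```python
from math import exp, lgamma, log

def kolmogorov_distance(n: int, p: float) -> float:
    """Exact sup_k |P(S_n <= k) - P(Z <= k)|, Z ~ Poisson(n p), by scanning k = 0..n.
    Beyond k = n the gap is 1 - P(Z <= k), which only decreases."""
    lam = n * p
    fb = fz = best = 0.0
    for k in range(n + 1):
        fb += exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                  + k * log(p) + (n - k) * log(1 - p))
        fz += exp(-lam + k * log(lam) - lgamma(k + 1))
        best = max(best, abs(fb - fz))
    return best

# First table: p fixed, n varying.
for p, ns in [(0.01, [100, 200, 500, 1000]), (0.005, [200, 400, 1000, 2000])]:
    for n in ns:
        print(f"n={n:5d} p={p:<6} exact={kolmogorov_distance(n, p):.6f}")

# Second table: mean lam fixed, n varying, p = lam / n.
for lam in [0.6, 0.9, 1.0, 2.0]:
    for n in [20, 50, 100, 200, 500, 1000]:
        print(f"lam={lam:<4} n={n:5d} exact={kolmogorov_distance(n, lam / n):.7f}")
```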
We finally establish that, for small values of , the Kolmogorov distance is attained at , that is, at , approximately. This completes the statement in Corollary 1.3.
Corollary 1.5.
Remark 1.6.
As far as upper bounds are concerned, the methods used in this paper can be adapted to cover more general cases of Poisson approximation (see, e.g., the Introduction in [2] and the references therein). However, obtaining efficient algorithms that lead to exact values is a more delicate question. As we will see in Section 2, especially in formula (2.1), such a problem rests on two main facts: first, the explicit form of the orthogonal polynomials associated with the random variables to be approximated, and, second, the relation between expectations involving forward differences and expectations involving these orthogonal polynomials. For instance, an explicit expression for the orthogonal polynomials associated with general sums of independent random indicators seems to be unknown.
2. The Proofs
The key tool to prove the previous results is the following formula established in [2, formula ( )]. For any function for which the expectations below exist, we have
and and are independent identically distributed random variables having the uniform distribution on , also independent of the sequence in (1.1).
Proof of Theorem 1.1.
To show (1.12) and (1.13), we apply the second equality in (2.1) to the function , thus obtaining by virtue of (1.3)
Again by (1.9) and (1.10), this means that , for any . This fact, in conjunction with (2.2) and (2.5), shows (2.6).
To prove (2.7), we distinguish the following two cases.
which implies that , for any . As before, this property shows (2.7).
On this occasion, we have . Since and the remaining inequalities in (2.9) are satisfied, we conclude as in the previous case that (2.7) holds. The proof is complete.
Proof of Corollary 1.3.
where is the convex function given by , . Since , the first inequality in (2.1) proves that the right-hand side in (2.11) is nonnegative. This, together with (2.10), shows that and completes the proof.
Let , , and . For any function , we have
where is defined in (1.20). Formula (2.12) can be found in Barbour et al. [12, Lemma ], whereas estimate (2.13) is established in Adell et al. [2, formula ( )]. Choosing , in (2.12), we consider the function
where and are defined in (1.23) and (1.28), respectively.
Lemma 2.1.
For any , one has . In addition, for any , one has .
Proof.
This completes the proof.
Proof of Theorem 1.4.
Thus, the conclusion follows from (2.16) and Lemma 2.1.
We have become aware that Boutsikas and Vaggelatou [13] have recently provided an independent proof of Lemma 2.1.
Proof of Corollary 1.5.
Therefore, applying (2.13) to the function , as well as Theorem 1.4, we obtain the desired conclusion.
Declarations
Acknowledgments
The authors thank the referees for their careful reading of the manuscript and for their remarks and suggestions, which greatly improved the final outcome. This work has been supported by Research Grants MTM2008-06281-C02-01/MTM and DGA E-64, and by FEDER funds.
References
- Kennedy JE, Quine MP: The total variation distance between the binomial and Poisson distributions. The Annals of Probability 1989, 17(1):396–400. doi:10.1214/aop/1176991519
- Adell JA, Anoz JM, Lekuona A: Exact values and sharp estimates for the total variation distance between binomial and Poisson distributions. Advances in Applied Probability 2008, 40(4):1033–1047. doi:10.1239/aap/1231340163
- Adell JA, Jodrá P: Exact Kolmogorov and total variation distances between some familiar discrete distributions. Journal of Inequalities and Applications 2006, 2006:-8.
- Hipp C, Mattner L: On the normal approximation to symmetric binomial distributions. Theory of Probability and Its Applications 2008, 52(3):516–523. doi:10.1137/S0040585X97983213
- Chihara TS: An Introduction to Orthogonal Polynomials. Gordon and Breach, New York, NY, USA; 1978:xii+249.
- Schoutens W: Stochastic Processes and Orthogonal Polynomials, Lecture Notes in Statistics. Volume 146. Springer, New York, NY, USA; 2000:xiv+163.
- Serfling RJ: Some elementary results on Poisson approximation in a sequence of Bernoulli trials. SIAM Review 1978, 20(3):567–579. doi:10.1137/1020070
- Hipp C: Approximation of aggregate claims distributions by compound Poisson distributions. Insurance: Mathematics & Economics 1985, 4(4):227–232. doi:10.1016/0167-6687(85)90032-0
- Deheuvels P, Pfeifer D, Puri ML: A new semigroup technique in Poisson approximation. Semigroup Forum 1989, 38(2):189–201.
- Roos B: Sharp constants in the Poisson approximation. Statistics & Probability Letters 2001, 52(2):155–168. doi:10.1016/S0167-7152(00)00208-X
- Vaggelatou E: A new method for bounding the distance between sums of independent integer-valued random variables. Methodology and Computing in Applied Probability. In press.
- Barbour AD, Holst L, Janson S: Poisson Approximation, Oxford Studies in Probability. Volume 2. The Clarendon Press, Oxford University Press, New York, NY, USA; 1992:x+277.
- Boutsikas MV, Vaggelatou E: A new method for obtaining sharp compound Poisson approximation error estimates for sums of locally dependent random variables. To appear in Bernoulli.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.