We give efficient algorithms, as well as sharp estimates, to compute the Kolmogorov distance between the binomial and Poisson laws with the same mean . Such a distance is eventually attained at the integer part of . The exact Kolmogorov distance for is also provided. The preceding results are obtained as a concrete application of a general method involving a differential calculus for linear operators represented by stochastic processes.
1. Introduction and Main Results
There is a huge amount of literature on estimates of different probability metrics between random variables, measuring the rates of convergence in various limit theorems, such as Poisson approximation and the central limit theorem. However, as far as we know, only a few papers are devoted to obtaining exact values for such probability metrics, even in the simplest and most paradigmatic examples. In this regard, we mention the results by Kennedy and Quine  giving the exact total variation distance between binomial and Poisson distributions, when their common mean is smaller than , approximately, as well as the efficient algorithm provided in the work of Adell et al.  to compute this distance for arbitrary values of . On the other hand, closed-form expressions for the Kolmogorov and total variation distances between some familiar discrete distributions with different parameters can be found in Adell and Jodrá . Finally, Hipp and Mattner  have recently computed the exact Kolmogorov distance in the central limit theorem for symmetric binomial distributions.
The aim of this paper is to obtain efficient algorithms and sharp estimates in the highly classical problem of evaluating the Kolmogorov distance between binomial and Poisson laws having the same mean. The techniques used here are analogous to those in  dealing with the total variation distance between the aforementioned laws.
To state our main results, let us introduce some notation. Denote by the set of nonnegative integers, and , . If is a set of real numbers, stands for the indicator function of . For any , we set and . For any , the th forward differences of a function are recursively defined by , , , and .
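As an illustration of the forward differences just introduced, the following minimal sketch assumes the standard convention: the zeroth difference is the function itself, the first difference at x is the value at x + 1 minus the value at x, and each higher difference is the first difference of the previous one. The name `forward_difference` is hypothetical and chosen only for this example.

```python
from typing import Callable


def forward_difference(phi: Callable[[int], float], k: int) -> Callable[[int], float]:
    """Return the k-th forward difference of phi (assumed standard convention).

    Order 0 is phi itself; order k is the first difference of order k - 1.
    """
    if k < 0:
        raise ValueError("the order k must be a nonnegative integer")
    if k == 0:
        return phi
    previous = forward_difference(phi, k - 1)
    # First difference of the (k-1)-th difference.
    return lambda x: previous(x + 1) - previous(x)


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    print(forward_difference(square, 1)(3))  # (3 + 1)**2 - 3**2 = 7
    print(forward_difference(square, 2)(3))  # constant second difference: 2
```

The recursion mirrors the recursive definition rather than aiming at speed; for large orders a binomial-coefficient formula would avoid the repeated evaluations.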
Throughout this note, it will be assumed that , , and . Let be a sequence of independent identically distributed random variables having the uniform distribution on . The random variable
has the binomial distribution with parameters and . Let be a random variable having the Poisson distribution with mean . Recall that the Kolmogorov distance between and is defined by
Observe that for any we have
An efficient algorithm to compute is based on the zeroes of the second Krawtchouk and Charlier polynomials, which are the orthogonal polynomials with respect to the binomial and Poisson distributions, respectively. Interesting references for general orthogonal polynomials are the monographs by Chihara  and Schoutens .
More precisely, let with , and . The second Krawtchouk polynomial with respect to is given by
The two zeroes of this polynomial are
As , , and , converges to the second Charlier polynomial with respect to defined by
the two zeroes of which are
Finally, we denote by
the smallest zero of and the greatest zero of , respectively (see Figure 1).
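To make the role of these zeroes concrete, here is a hedged sketch that uses the standard monic second-degree orthogonal polynomials: x(x − 1) − 2(n − 1)p·x + n(n − 1)p² for the binomial law with parameters n and p, and x(x − 1) − 2λx + λ² for the Poisson law with mean λ. The indexing and normalization used in the displays of this paper may differ, so the function names and parametrization below are assumptions made for illustration only.

```python
import math
from typing import Tuple


def krawtchouk2(n: int, p: float, x: float) -> float:
    """Monic second Krawtchouk polynomial for Binomial(n, p) (assumed form)."""
    return x * (x - 1) - 2 * (n - 1) * p * x + n * (n - 1) * p * p


def krawtchouk2_zeros(n: int, p: float) -> Tuple[float, float]:
    """Zeroes (n-1)p + 1/2 -/+ sqrt((n-1)p(1-p) + 1/4) of krawtchouk2."""
    centre = (n - 1) * p + 0.5
    half_width = math.sqrt((n - 1) * p * (1 - p) + 0.25)
    return centre - half_width, centre + half_width


def charlier2(lam: float, x: float) -> float:
    """Monic second Charlier polynomial for Poisson(lam) (assumed form)."""
    return x * (x - 1) - 2 * lam * x + lam * lam


def charlier2_zeros(lam: float) -> Tuple[float, float]:
    """Zeroes lam + 1/2 -/+ sqrt(lam + 1/4) of charlier2."""
    centre = lam + 0.5
    half_width = math.sqrt(lam + 0.25)
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    lam = 2.0
    print(charlier2_zeros(lam))
    # With p = lam / n the binomial zeroes approach the Poisson ones as n grows.
    for n in (10, 100, 10_000):
        print(n, krawtchouk2_zeros(n, lam / n))
```

Under these assumptions the two Poisson zeroes sit at roughly the mean plus or minus one standard deviation, shifted by 1/2.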
Our first main result is the following.
Let and . Then,
where is defined in (1.2),
Looking at Figure 1 and taking into account (1.8), (1.9), and (1.12) we see the following. The number of computations needed to evaluate is approximately , that is, , approximately. This last quantity is relatively small, since approximates if and only if is close to zero. Moreover, the set has two points at most, whenever , and this happens if
As follows from (1.2), the natural way to compute the Kolmogorov distance is to look at the maximum absolute value of the function
From a computational point of view, the main question is to ask how many evaluations of the probability differences are required to exactly compute . According to Theorem 1.1 and (1.8), the number of such evaluations is at least, and at most, approximately.
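For comparison with the evaluation counts discussed above, the following sketch computes the distance by brute force. It is not the algorithm of Theorem 1.1. It assumes that the distance is the supremum over k of |P(S ≤ k) − P(N ≤ k)|, with S binomial with parameters n and λ/n and N Poisson with mean λ, and it scans every k from 0 to n. The scan can stop at n because beyond that point the binomial distribution function equals 1 and the gap only shrinks. All function names are hypothetical.

```python
import math
from typing import Tuple


def binom_pmf(n: int, p: float, k: int) -> float:
    """P(S = k) for S ~ Binomial(n, p), computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(lam: float, k: int) -> float:
    """P(N = k) for N ~ Poisson(lam), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def kolmogorov_distance(n: int, lam: float) -> Tuple[float, int]:
    """Brute-force distance and the first integer k where it is attained."""
    if not 0 < lam < n:
        raise ValueError("this sketch assumes 0 < lam < n")
    p = lam / n
    best, best_k = 0.0, 0
    cdf_binom = cdf_poisson = 0.0
    for k in range(n + 1):  # n + 1 evaluations of the probability differences
        cdf_binom += binom_pmf(n, p, k)
        cdf_poisson += poisson_pmf(lam, k)
        gap = abs(cdf_binom - cdf_poisson)
        if gap > best:
            best, best_k = gap, k
    return best, best_k


if __name__ == "__main__":
    print(kolmogorov_distance(100, 3.0))
```

This baseline needs n + 1 evaluations of the probability differences regardless of λ, which is the cost that an approach based on the polynomial zeroes is meant to reduce.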
On the other hand, and converge, respectively, to and , as . Thus, Theorem 1.1 leads us to the following asymptotic result.
Let and . Let be the smallest integer such that and , for . Then, one has for any
Unfortunately, is not uniformly bounded when varies in an arbitrary compact set. In fact, since , , and , , it can be verified that , when from the left, , or when from the right. This explains why and in Theorem 1.1 have no simple form in general.
Finally, it may be of interest to compare Theorem 1.1 and Corollary 1.2 with the exact value of the Kolmogorov distance in the central limit theorem for symmetric binomial distributions obtained by Hipp and Mattner . These authors have shown that (cf. [4, Corollary ])
where is a standard normal random variable. Roughly speaking, (1.17) tells us that the Kolmogorov distance in this version of the central limit theorem is attained at the mean of the respective distributions, whereas according to Theorem 1.1 and Corollary 1.2, the Kolmogorov distance in our Poisson approximation setting is attained at the mean ± the standard deviation of the corresponding distributions.
For small values of , we are able to give the following closed-form expression.
Let . If , then
Corollary 1.3 can be seen as a counterpart of the total variation result established by Kennedy and Quine [1, Theorem ], stating that
for any and , where stands for the total variation distance.
For any , , and , we denote by
Sharp estimates for the Kolmogorov distance are given in the following.
Let , and . Then,
Upper bounds for the Kolmogorov distance in Poisson approximation for sums of independent random indicators have been obtained by many authors using different techniques. We mention the following estimates in the case at hand:
and the constant in the last estimate is best possible (cf. Roos ). It is readily seen from (1.23) that
On the other hand, it follows from Roos  and Lemma 2.1 below that
Such properties, together with simple numerical computations performed with Maple 9.01, show that estimate (1.22) is always better than the preceding ones for and . Numerical comparisons are exhibited in Table 1.
On the other hand, the referee has drawn our attention to a recent paper by Vaggelatou , where the author obtains upper bounds for the Kolmogorov distance between sums of independent integer-valued random variables. Specializing Corollary in  to the case at hand, Vaggelatou gives the upper bound
Comparing Corollary 1.3 and (1.22) with (1.31), we see the following. The constant in the main term of the order of in (1.22) is better than that in (1.31). The constant in the remainder term of the order of in (1.22) is uniformly bounded in , whereas is not. However, is better than for small values of (recall that Corollary 1.3 gives the exact distance for ). As a result, for moderate or large values of , estimate (1.31) is sometimes better than (1.22) for , approximately. Otherwise, Corollary 1.3 and (1.22) provide better bounds than (1.31). This is illustrated in Table 2.
We finally establish that, for small values of , the Kolmogorov distance is attained at , that is, at , approximately. This completes the statement in Corollary 1.3.
For any , one has
As far as upper bounds are concerned, the methods used in this paper can be adapted to cover more general cases referring to Poisson approximation (see, e.g., the Introduction in  and the references therein). However, obtaining efficient algorithms leading to exact values is a more delicate question. As we will see in Section 2, especially in formula (2.1), such a problem is based on two main facts: first, the explicit form of the orthogonal polynomials associated with the random variables to be approximated, and, second, the relation between expectations involving forward differences and expectations involving these orthogonal polynomials. For instance, an explicit expression for the orthogonal polynomials associated with general sums of independent random indicators seems to be unknown.
2. The Proofs
The key tool to prove the previous results is the following formula established in [2, formula ()]. For any function for which the expectations below exist, we have
and and are independent identically distributed random variables having the uniform distribution on , also independent of the sequence in (1.1).
Proof of Theorem 1.1.
Let , , and . The function defined in (1.4) decreases in and increases in . This property, together with definitions (1.2)–(1.4), readily implies the following. There are integers such that
As a consequence of (2.3), the function defined in (1.2) starts from , decreases in , increases in , decreases in , and tends to zero as . We therefore conclude that
To show (1.12) and (1.13), we apply the second equality in (2.1) to the function , thus obtaining by virtue of (1.3)
In view of (2.3), statements (1.12) and (1.13) will follow as soon as we show that
as well as
Observe that some of the sets in (2.6) and (2.7) could be empty. To this end, let with , and . Since the functions defined in (1.6) are increasing in , we have by virtue of (1.9) and (1.10)
Again by (1.9) and (1.10), this means that , for any . This fact, in conjunction with (2.2) and (2.5), shows (2.6).
To prove (2.7), we distinguish the following two cases.
Case 1 ().
By (1.6), (1.8), (1.9), and (1.10), we have
which implies that , for any . As before, this property shows (2.7).
Case 2 ().
In this case, we have . Since and the remaining inequalities in (2.9) are satisfied, we conclude as in the previous case that (2.7) holds. The proof is complete.
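The decrease, increase, decrease pattern used at the start of the preceding proof can be checked numerically. The sketch below assumes that the relevant pattern is that of k ↦ P(S ≤ k) − P(N ≤ k), with S binomial with parameters n and λ/n and N Poisson with mean λ, so that its turning points are the integers where the difference of probability mass functions changes sign. Function names are hypothetical, and the check is a numerical illustration rather than part of the proof.

```python
import math
from typing import List, Tuple


def pmf_gap(n: int, lam: float, k: int) -> float:
    """P(S = k) - P(N = k) for 0 <= k <= n, computed on the log scale."""
    p = lam / n
    log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log1p(-p))
    log_poisson = -lam + k * math.log(lam) - math.lgamma(k + 1)
    return math.exp(log_binom) - math.exp(log_poisson)


def turning_points(n: int, lam: float) -> Tuple[List[int], List[float]]:
    """Indices k where the pmf gap changes sign between k and k + 1,
    together with the running sums (the gap between distribution functions)."""
    gaps = [pmf_gap(n, lam, k) for k in range(n + 1)]
    signs = [1 if g > 0 else -1 for g in gaps]
    changes = [k for k in range(n) if signs[k] != signs[k + 1]]
    cdf_gap, running = [], 0.0
    for g in gaps:
        running += g
        cdf_gap.append(running)
    return changes, cdf_gap


def distance_two_ways(n: int, lam: float) -> Tuple[float, float]:
    """Distance from the turning points only, and from the full scan."""
    changes, cdf_gap = turning_points(n, lam)
    from_turning_points = max(abs(cdf_gap[k]) for k in changes)
    from_full_scan = max(abs(v) for v in cdf_gap)
    return from_turning_points, from_full_scan


if __name__ == "__main__":
    changes, _ = turning_points(50, 4.0)
    print(changes)
    print(distance_two_ways(50, 4.0))
```

When exactly two sign changes are found, only two values of the gap between distribution functions need to be compared, which is the structural fact that the exact algorithm exploits.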
Proof of Corollary 1.3.
For , (1.8) implies that , and, therefore, , as follows from Theorem 1.1. By (1.11), this in turn implies that
On the other hand, we have from (1.2)
where is the convex function given by , . Since , the first inequality in (2.1) proves that the right-hand side in (2.11) is nonnegative. This, together with (2.10), shows that and completes the proof.
Let , , and . For any function , we have
where is defined in (1.20). Formula (2.12) can be found in Barbour et al. [12, Lemma ]; whereas estimate (2.13) is established in Adell et al. [2, formula ()]. Choosing , in (2.12), we consider the function
Therefore, the function in (2.14) starts from , decreases in , increases in , and decreases to zero in . We therefore have from (2.14)
where and are defined in (1.23) and (1.28), respectively.
As shown in the following auxiliary result, it turns out that , . In this respect, we will need the well-known inequalities
for and .
For any , one has . In addition, for any , one has .
We will only show that , , with the proof of the remaining inequalities being similar. Let . Since the function defined in (1.8) is increasing and , we see that
As follows by calculus, in each interval , attains its minimum at the endpoints. On the other hand, converges to , as . Therefore, it will be enough to show that the sequence is decreasing, or, in other words, that
Simple numerical computations show that (2.19) holds for . Assume that . By (2.17), the left-hand side in (2.19) is bounded above by
This completes the proof.
Proof of Theorem 1.4.
Applying (2.13) to , , and using the reverse triangle inequality for the usual sup-norm, we obtain
Thus, the conclusion follows from (2.16) and Lemma 2.1.
We have become aware that Boutsikas and Vaggelatou have recently provided in  an independent proof of Lemma 2.1.
Proof of Corollary 1.5.
From (2.16) and the orthogonality of , we get
Therefore, applying (2.13) to the function , as well as Theorem 1.4, we obtain the desired conclusion.
Kennedy JE, Quine MP: The total variation distance between the binomial and Poisson distributions. The Annals of Probability 1989, 17(1):396–400. 10.1214/aop/1176991519
Adell JA, Anoz JM, Lekuona A: Exact values and sharp estimates for the total variation distance between binomial and Poisson distributions. Advances in Applied Probability 2008, 40(4):1033–1047. 10.1239/aap/1231340163
Vaggelatou E: A new method for bounding the distance between sums of independent integer-valued random variables. Methodology and Computing in Applied Probability. In press
Barbour AD, Holst L, Janson S: Poisson Approximation, Oxford Studies in Probability. Volume 2. The Clarendon Press, Oxford University Press, New York, NY, USA; 1992:x+277.
Boutsikas MV, Vaggelatou E: A new method for obtaining sharp compound Poisson approximation error estimates for sums of locally dependent random variables. To appear in Bernoulli
The authors thank the referees for their careful reading of the manuscript and for their remarks and suggestions, which greatly improved the final outcome. This work has been supported by Research Grants MTM2008-06281-C02-01/MTM and DGA E-64, and by FEDER funds.
Authors and Affiliations
Departamento de Métodos Estadísticos, Facultad de Ciencias, Universidad de Zaragoza, 50009, Zaragoza, Spain
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Adell, J.A., Anoz, J.M. & Lekuona, A. The Kolmogorov Distance between the Binomial and Poisson Laws: Efficient Algorithms and Sharp Estimates.
J Inequal Appl 2009, 965712 (2009). https://doi.org/10.1155/2009/965712