Explicit constants in the nonuniform local limit theorem for Poisson binomial random variables

In a recent paper the authors proved a nonuniform local limit theorem concerning normal approximation of the point probabilities P(S = k) when S = X_1 + X_2 + ... + X_n and X_1, X_2, ..., X_n are independent Bernoulli random variables that may have different success probabilities. However, their main result contained an undetermined constant, somewhat limiting its applicability. In this paper we give a nonuniform bound in the same setting but with explicit constants. Our proof uses Stein's method and, in particular, the K-function and concentration inequality approaches. We also prove a new uniform local limit theorem for Poisson binomial random variables that is used to help simplify the proof in the nonuniform case.


Introduction
Approximation of complicated distributions by simpler ones, on the basis of asymptotic theory, is a ubiquitous theme in probability and statistics. By far the most commonly used and well-known such result is the central limit theorem (CLT), which ensures the weak convergence of appropriately normalized sums of independent random variables to a standard normal distribution. Statisticians frequently invoke the CLT to construct approximate confidence intervals and hypothesis tests. Due to their widespread use, it is clearly important to understand the quality of commonly applied probability approximations as a function of the sample size.
In order to improve the quality of the normal approximation of an integer-valued random variable, it is standard to apply a continuity correction [1, 2]. Thus, if S is an integer-valued random variable with mean μ and variance σ², and Z_{μ,σ²} ∼ N(μ, σ²), a continuity-corrected normal approximation of P(a ≤ S ≤ b), a, b ∈ Z, is P(a − 0.5 ≤ Z_{μ,σ²} ≤ b + 0.5).
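The continuity correction just described is straightforward to implement; the following minimal Python sketch illustrates it for a Binomial sum (a special case of the Poisson binomial distribution below), with the parameters n = 30, p = 0.4 and the interval [10, 15] chosen purely for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cc_normal_approx(a, b, mu, sigma2):
    """Continuity-corrected normal approximation of P(a <= S <= b) for an
    integer-valued S with mean mu and variance sigma2, i.e.
    P(a - 0.5 <= Z_{mu,sigma2} <= b + 0.5)."""
    sigma = math.sqrt(sigma2)
    return norm_cdf((b + 0.5 - mu) / sigma) - norm_cdf((a - 0.5 - mu) / sigma)

# Illustration: S ~ Binomial(n = 30, p = 0.4), so mu = 12 and sigma2 = 7.2.
n, p = 30, 0.4
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(10, 16))
approx = cc_normal_approx(10, 15, n * p, n * p * (1 - p))
print(exact, approx)  # the two values agree closely
```

Without the ±0.5 correction the approximation P(10 ≤ Z ≤ 15) would noticeably undercount the interval's probability, which is why the correction is standard for integer-valued sums.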
Section 7.1 of [3] studies the accuracy of the normal approximation with continuity correction in the case where S = X_1 + ... + X_n and X_1, ..., X_n are independent Bernoulli random variables with distributions P(X_i = 1) = p_i = 1 − P(X_i = 0), p_i ∈ (0, 1). In this case, S is said to have a Poisson binomial distribution. It is shown, in their Theorem 7.1, that with σ² = Var(S) the total variation bound (1.1) holds, where d_TV is the total variation distance and Y is an integer-valued random variable with distribution (1.2) and Z ∼ N(0, 1). The random variable Y defined by (1.2) is said to have a discretized normal distribution with parameters μ and σ², written Y ∼ N_d(μ, σ²). The proof of (1.1) uses Stein's method and the zero bias coupling of [4]. [5] also considers discretized normal approximation via Stein's method, giving bounds in the total variation distance for a wide range of examples, including sums of locally dependent integer-valued random variables.
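The Poisson binomial point probabilities and the discretized normal comparison can be computed directly; a minimal sketch follows, assuming the common definition of the discretized normal, P(Y = k) = Φ((k + 0.5 − μ)/σ) − Φ((k − 0.5 − μ)/σ) (the display (1.2) is not reproduced above), and with the success probabilities chosen for illustration. The total variation distance is computed restricted to the support {0, ..., n}.

```python
import math

def poisson_binomial_pmf(ps):
    """P(S = k), k = 0..n, for S a sum of independent Bernoulli(p_i)
    random variables, by the standard dynamic-programming convolution."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)   # event X_i = 0
            new[k + 1] += q * p     # event X_i = 1
        pmf = new
    return pmf

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

ps = [0.2, 0.5, 0.7, 0.3, 0.6, 0.4, 0.8, 0.5]  # illustrative p_i
pmf = poisson_binomial_pmf(ps)
mu = sum(ps)
sigma = math.sqrt(sum(p * (1 - p) for p in ps))

# Discretized normal point probabilities on the same support.
dn = [norm_cdf((k + 0.5 - mu) / sigma) - norm_cdf((k - 0.5 - mu) / sigma)
      for k in range(len(pmf))]

# Total variation distance between the two laws, restricted to {0,...,n}.
tv = 0.5 * sum(abs(a - b) for a, b in zip(pmf, dn))
print(tv)  # small even for this modest sigma^2
```

The dynamic-programming recursion costs O(n²) and is exact up to floating-point error, which makes it a convenient baseline when checking bounds of the kind studied in this paper.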
In addition to considering central limit theorems and bounds in the total variation metric, we may analyze the accuracy of a local normal approximation of the point probabilities P(S = k), when S is integer-valued. Proving local limit theorems for a general integer-valued random variable is more delicate than proving central limit theorems, as conditions are required to ensure that S does not concentrate on a lattice of span greater than 1. For example, if S is a sum of random variables that are concentrated on the even integers, then P(S = k) = 0 for odd k, and a normal approximation for S cannot be expected to be successful uniformly over Z. Consequently, local limit theorems have been comparatively less studied than central limit theorems, although they came first in the historical development of probability [6].
Local limit theorems with uniform error bounds for sums of independent integer-valued random variables are studied in Chap. 7 of [7] via Fourier analysis of characteristic functions. Sufficient conditions are given that ensure the pointwise approximation error is O(1/σ²) uniformly in k ∈ Z, which is shown to be the optimal order for the error as a function of σ. However, explicit constants are not given in the error bounds, and much of the subsequent literature on local limit theorems presents uniform bounds using the O symbol without explicit constants. More recently, [8] and [9] give uniform bounds with explicit constants in the cases where S has a binomial and a Poisson binomial distribution, respectively.
Theorem 1.1 of [10] gives a nonuniform bound when S has a Poisson binomial distribution. It was shown that if σ² ≥ 1, then for each k ∈ Z ∩ [0, n] the bound (1.4) holds for some positive absolute constant C. The main novelty of this result is the nonuniformity in k, which makes explicit how the error decays the further k is into the tail of the distribution, an aspect lost in previous studies that give only uniform bounds. The presence of an undetermined constant in (1.4) somewhat limits the result's applicability. We remedy this here with the following explicit nonuniform bound.

Theorem 1.1 Let X_1, X_2, ..., X_n be jointly independent Bernoulli random variables such that P(X_i = 1) = 1 − P(X_i = 0) = p_i ∈ (0, 1), and let S = X_1 + ... + X_n, μ = ES, and σ² = Var(S). If … then … where C_1 = 3.15 + 7.39e, …

A trivial corollary of Theorem 1.1 is a value of the constant C appearing in (1.4), albeit under a slightly more restrictive condition on σ². For example, if σ² ≥ 5, 25, and 100, then one may take C = 38.6, 22.7, and 18.4, respectively.
Our proof of Theorem 1.1 uses Stein's method, in particular the K-function and concentration inequality approaches, which are both discussed in Sect. 2. In [10], (1.4) was proved using the zero bias coupling [4]. The use of the K-function approach here allows for a more direct determination of constants, as we avoid the need to prove an intermediate result concerning normal approximation of the zero biased random variable as in Theorem 3.1 of [10]. While the use of the K-function and concentration inequalities is a standard approach for proving quantitative Berry-Esseen bounds for sums of independent random variables [3, Chap. 3], and even for locally dependent random variables [11], this paper appears to be the first to use this approach to prove a local limit theorem. The zero bias coupling still plays a role when we derive concentration inequalities in Sect. 2.2. Although some previous studies have used Stein's method to prove local limit theorems in more general settings, they consider only uniform bounds with different approximating distributions such as the translated Poisson [12, 13] or symmetric binomial [14] distributions.
It is easily checked that the normal density function appearing in Theorem 1.1 may be replaced by the discretized normal distribution (1.2) at the cost of different constants, as we make explicit in Lemma 2.2 of Sect. 2.1. However, the formulation in terms of the normal density is in keeping with the classical literature on local limit theorems such as [7].
In our proof of Theorem 1.1, we will also make use of the following uniform local limit theorem, which we prove using the same basic approach as for Theorem 1.1.

Theorem 1.2 Under the same setup as Theorem 1.1 but assuming only σ² ≥ 1, we have …

We will not consider the question of whether a bound of the form (1.4) is optimal. It is conceivable that one could obtain a faster decaying function of k than the exponential decay in our result. Proving optimality of any such bound would likely involve more sophisticated techniques than those used in this paper, and we leave it as an interesting open problem for future research.
The remainder of the paper is structured as follows. Section 2 covers the appropriate background material, in particular Stein's method for local limit theorems as developed in [10], required for proving our main results, as well as giving some useful auxiliary lemmas. Section 3 gives the proof of our main result, Theorem 1.1, as well as that of Theorem 1.2, while proofs of some of the auxiliary results are given in Sect. 4.

Background and auxiliary results
In this section we cover the necessary prerequisites and give some auxiliary results required to prove Theorem 1.1. Section 2.1 introduces Stein's method for normal approximation and the setup of [10] required for local limit theorems. Section 2.2 introduces the zero bias coupling, which is used to derive various concentration inequalities. Section 2.3 considers properties of the solution of the Stein equation and its derivative while, finally, Sect. 2.4 introduces the K-function, which is our main technique for manipulating the Stein equation and proving Theorem 1.1.

Stein's method for local limit theorems
Let F be the set of absolutely continuous functions f : R → R such that f′ exists almost everywhere and E|f′(Z)| < ∞ where, here and for the remainder of the paper, Z ∼ N(0, 1). Stein's method for normal approximation revolves around the following characterization of the normal distribution. A random variable W has a standard normal distribution if and only if E f′(W) = E{W f(W)} for all f ∈ F. For a proof of this characterization, see Lemma 1 of [15] or Lemma 2.1 of [3]. Now let f := f_h be the bounded solution of the ordinary differential equation f′(w) − w f(w) = h(w) − Eh(Z) (2.2), with h ∈ H, where H is a class of test functions that will be chosen depending on the problem at hand. For example, suppose that we wish to bound the Kolmogorov distance between the random variable W, not necessarily normally distributed, and Z. The class of test functions in this case is the set of indicators h_x = 1_{(−∞, x]}, x ∈ R. The Kolmogorov metric gives a uniform bound on the absolute differences of the distribution functions of W and Z and is the appropriate metric to consider in order to prove the Berry-Esseen theorem [3, Chap. 3]. Replacing w by W in (2.2) and taking expectations, we see that a bound on the Kolmogorov metric may be obtained from a bound on |E{f′(W) − W f(W)}| (2.5). Boundedness properties of f and f′, together with various coupling techniques that have been developed [3, Chap. 2], mean that it is often more straightforward to obtain a bound from (2.5) than to work directly with (2.3).
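The characterization above can be illustrated numerically: the Stein discrepancy E[f′(W) − W f(W)] is near zero for standard normal samples and visibly nonzero otherwise. The following minimal sketch uses the arbitrary test function f(w) = sin w, whose derivative is cos w; the sample sizes and the shift 0.5 are illustrative choices.

```python
import math
import random

random.seed(42)

def stein_discrepancy(f, fprime, samples):
    """Monte Carlo estimate of E[f'(W) - W f(W)]; this vanishes for every
    suitable f exactly when W is standard normal."""
    return sum(fprime(w) - w * f(w) for w in samples) / len(samples)

normal = [random.gauss(0.0, 1.0) for _ in range(200_000)]
shifted = [w + 0.5 for w in normal]  # N(0.5, 1): not standard normal

d_normal = stein_discrepancy(math.sin, math.cos, normal)
d_shifted = stein_discrepancy(math.sin, math.cos, shifted)
print(d_normal, d_shifted)  # first near 0, second clearly bounded away from 0
```

A single test function cannot certify normality on its own, but a discrepancy that is large for some f, as for the shifted sample here, immediately rules it out, which is the direction Stein's method exploits.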
In order to utilize the Stein framework for our problem, we let W be a normalized version of S with mean 0 and unit variance; in particular, we let W = (S − μ)/σ. The set of test functions we consider is … (2.6) and our problem is to bound … The next result quantifies Eh(Z) and verifies that H defined in (2.6) is indeed an appropriate set of test functions for proving Theorem 1.1.

Lemma 2.1 Let x ∈ A_n and h_x …
Proof Part (a) is just a restatement of Lemma 4.1 (a) in [10]. We note that (a) implies (b) if |x| ≤ 1 since in this case … Thus, to prove (b), we may assume that |x| > 1.
If x > 1, then x − 1/σ > 0, and so d > 0 and |d| = d. In this case, then, … As a consequence of Lemma 2.1, we have that for each … and … where f := f_x is the bounded solution of (2.10). Our problem then is reduced to bounding |E{f′(W) − W f(W)}| with f the bounded solution of (2.10). Our approach to bounding this quantity is discussed in Sect. 2.4. Before we can deal with this problem, we need to acquire some further auxiliary results in Sects. 2.2 and 2.3.
We end this section by making explicit the fact that Theorems 1.1 and 1.2 imply analogous results with the discretized normal distribution replacing the normal density.

Lemma 2.2 Let Y have a discretized normal distribution with parameters μ and σ² …
Proof The proof follows in essentially the same way as that of Lemma 2.1, and we omit the details.

Concentration inequalities via the zero bias coupling
In this section we derive some concentration inequalities for P(a ≤ W ≤ b) and give bounds for the point probabilities. Recall that Y* is said to have the Y-zero biased distribution if E{Y f(Y)} = Var(Y) E f′(Y*) for all absolutely continuous functions f such that the above expectations exist. The notion of zero biasing was introduced in [4], where the existence of Y* was established for any mean zero random variable Y. For further applications of the zero bias coupling beyond Berry-Esseen bounds in the classical central limit theorem, see [16] and [17]. Throughout the remainder of this section, an asterisk * in the exponent of a random variable denotes a random variable with the corresponding zero biased distribution.
In our setting, from Lemma 2.1 of [4], W* may be constructed on the same space as W by setting W* = W − ξ_I + ξ*_I, where I is a random index with distribution P(I = i) = Var(ξ_i) = p_i(1 − p_i)/σ², and thus, as ξ_i and ξ*_i have the same support and W − W* = ξ_I − ξ*_I, we have that |W − W*| ≤ 1/σ. (2.12) [10] use (2.12) to prove a nonuniform bound on the local normal approximation of W* that forms one of the key steps in the proof of their Theorem 1.1. Here, we will prove concentration inequalities for W* and use these together with (2.12) to obtain concentration inequalities for W.
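This construction is concrete enough to simulate. The sketch below assumes two standard facts about the Poisson binomial setting (not displayed above): the zero bias distribution of the centered summand ξ_i = (X_i − p_i)/σ is uniform on (−p_i/σ, (1 − p_i)/σ), and the index I is chosen with P(I = i) = p_i(1 − p_i)/σ². It then checks the zero bias identity E{W f(W)} = E f′(W*) by Monte Carlo (with an arbitrary f(w) = sin w) and verifies |W − W*| ≤ 1/σ on every sampled pair; the p_i are illustrative.

```python
import math
import random

random.seed(0)
ps = [0.3, 0.6, 0.5, 0.8, 0.4, 0.7]  # illustrative success probabilities
sigma = math.sqrt(sum(p * (1 - p) for p in ps))
weights = [p * (1 - p) / sigma**2 for p in ps]  # P(I = i) = Var(xi_i)

def sample_W_and_Wstar():
    xs = [1 if random.random() < p else 0 for p in ps]
    xi = [(x - p) / sigma for x, p in zip(xs, ps)]
    W = sum(xi)
    i = random.choices(range(len(ps)), weights=weights)[0]
    # Zero bias of the two-point variable xi_i is uniform on its support gap.
    xi_star = random.uniform(-ps[i] / sigma, (1 - ps[i]) / sigma)
    return W, W - xi[i] + xi_star  # (W, W*)

N = 200_000
pairs = [sample_W_and_Wstar() for _ in range(N)]
lhs = sum(w * math.sin(w) for w, _ in pairs) / N    # E[W f(W)]
rhs = sum(math.cos(ws) for _, ws in pairs) / N      # E[f'(W*)]
gap = max(abs(w - ws) for w, ws in pairs)           # should not exceed 1/sigma
print(lhs, rhs, gap, 1 / sigma)
```

The coupled construction makes the closeness bound (2.12) visible directly: ξ_I and ξ*_I both live between −p_I/σ and (1 − p_I)/σ, so the swap moves W by less than 1/σ.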

(2.13)
We now use this to obtain uniform concentration inequalities for W.

Lemma 2.3 For all a ≤ b, we have … (2.14) and … (2.15), where …

Proof From (2.12), (2.13), and the fact that σ²_W = Var(W) = 1, we have … which is (2.14). For (2.15) we have that … ≥ 0.974, and so using this in (2.13) yields (2.15), as required.
Lemma 2.3 may be used to uniformly bound P(W = x), e.g., by writing P(W = x) = P(x − ε/2 ≤ W ≤ x + ε/2) for small positive ε and letting ε → 0⁺. However, this approach gives a worse constant than that of [18], which we state in Lemma 2.4 below together with an analogous result for P(W^(i) = x). As before, A_n denotes the support of W, and we will denote the support of W^(i) by A_n^(i).

Lemma 2.4 The following uniform bounds hold: … where S^(i) = S − X_i.
Proof The bound (2.16) is given in Lemma 1 of [18]. Now, as S^(i) is also a Poisson binomial random variable, we have from (2.16) and the fact that σ ≥ 1 that for each k ∈ [0, n − 1] ∩ Z, … which implies (2.17).
Lemma 2.5 and Corollary 2.1 below are nonuniform versions of Lemmas 2.3 and 2.4. Before stating these results, we recall from Lemma 3.3 of [10] that for each m ∈ N we have the bound E|W|^{2m} ≤ p(2m), uniformly in n, where p(2m) is the number of partitions of 2m, i.e., the number of ways that 2m may be written as a sum of positive integers irrespective of order. Since p(2m)^{1/(2m)} → 1 as m → ∞ [19, Sect. 6.4], it follows that, uniformly in n, … The same bound holds when W is replaced by W^(i). We also make use of the fact that if σ² ≥ A², then ξ_i ≤ 1/A for each 1 ≤ i ≤ n, and so by Lemma 8.1 in [3] with α = 1/A and B² = 1, we have that for each t > 0 … (2.20) In particular, letting t = 2m/(2m − 1) for m ∈ N, we find that, uniformly in n, … with the same bound holding when W is replaced by W^(i).
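The partition numbers p(n) used above, and their subexponential growth p(n)^{1/n} → 1, are easy to compute with the classic coin-change style dynamic program; the following sketch is purely illustrative and checks a few well-known values, including p(100) = 190569292.

```python
def partitions(n):
    """p(n): the number of partitions of n, i.e. the number of ways to write
    n as a sum of positive integers irrespective of order."""
    dp = [1] + [0] * n
    for part in range(1, n + 1):          # allow parts of size `part`
        for total in range(part, n + 1):
            dp[total] += dp[total - part]
    return dp[n]

small = [partitions(k) for k in (1, 2, 3, 4, 5, 6, 10)]
print(small)  # [1, 2, 3, 5, 7, 11, 42]

# p(n)^(1/n) decreases toward 1, reflecting the subexponential growth of p(n).
r10 = partitions(10) ** (1 / 10)
r100 = partitions(100) ** (1 / 100)
print(r10, r100)
```

The subexponential growth is precisely what makes the moment bound above useful: raising a 2m-th moment bound of p(2m) to the power 1/(2m) and letting m → ∞ loses nothing asymptotically.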

(2.23)
Proof Define the function g : R → R by … and … By Hölder's inequality, for all m ∈ N, … Letting m → ∞ and applying (2.19) and (2.21) with A = √5, we get … and hence … Now, as Eg′(W*) = E{W g(W)}, we get from (2.24) and (2.25) … The assumption a ≥ 1/σ was required in (2.27) to ensure that (2.26) could be applied. If 0 ≤ a < 1/σ, then a ∈ [0, 1/√5), and the result follows from Lemma 2.3 since … The proof of (2.23) follows in the same way except that, prior to (2.26), we must use that … and the fact that … (2.28)

Arguing as in the paragraph prior to Lemma 2.4, we obtain from Lemma 2.5 the following nonuniform bound on the point probabilities P(W = x), x ∈ A_n, and P(W^(i) = x), x ∈ A_n^(i).

The Stein equation
In this section we consider the properties of the function f, which is the bounded solution to the Stein equation (2.10). For the remainder of the paper, unless otherwise stated, it may be assumed that x ∈ A_n, where A_n is as defined in Sect. 2.1. We first recall the following basic properties of f from Lemma 3.2 in [10].
Lemma 2.6 Let f := f_x be the bounded solution of (2.10). Then (a) 0 ≤ f′(w) ≤ 1, w ∈ (x − 1/σ, x], (b) f is continuous, increasing on the interval (x − 1/σ, x] and decreasing otherwise, (c) if σ² ≥ 1, we have …

It was also shown in [10] that the term Nh appearing in (2.10) is bounded above by Cσ⁻¹e^{−|x|} for some absolute positive constant C. We now quantify the value of C.
Proof For (a), we divide the proof into three cases according to the value of x. … Since e^{−t²/2} ≤ e^{1/2}e^{−t} when t > 0, we have Nh ≤ … which holds for all σ > 0.
Case 3: x < −1/√5. In this case we have … valid for all σ > 0, where we used the fact that e^{−x²/2} ≤ e^{1/2}e^{−|x|}. This completes the proof of (a). For (b), noting that the working in Case 3 above holds whenever x ≤ 0, we have Nh ≤ …

We recall, from equation (3.6) of [10], that the unique bounded solution f of (2.10) may be written as … and so … and (2.34) … We will not need to make use of the explicit expression for f′(w) for w ∈ (x − 1/σ, x]; it will suffice to know that 0 ≤ f′(w) ≤ 1 in this case. For w ∉ (x − 1/σ, x], we know from Lemma 2.6 (b) that f′(w) < 0, and together with Lemma 2.4 in [3] we have −2 ≤ f′(w) ≤ 0 in this case. Our next result, Lemma 2.8, gives more detailed bounds on |f′(w)|. We first recall the standard Gaussian tail bounds [3, p. 37]: … (d)(i) follows in essentially the same way as (b)(ii) but now using (2.33) with Nh ≤ (σ√(2π))⁻¹e^{−x²/2} and e^{−7t²/32} ≤ e^{8/7}e^{−|t|}. (d)(ii) follows in essentially the same way as (b)(iii) but using (2.33). For (d)(iii), if w ≤ x − 1/σ, then w < 0, and we get the result in this case as in (c), while for w > 0, we get the result as in (a).
We now use Lemma 2.8 to give O(1/σ) bounds on |Ef′(W̃)| when W̃ is a random variable that is sufficiently close to W.

Lemma 2.9 If σ² ≥ 1 and W̃ is a random variable strictly between W^(i) − p_i/σ and W^(i) + (1 − p_i)/σ …

The K-function
As discussed in Sect. 2.1, our problem reduces to bounding |E{f′(W) − W f(W)}|, where f is the bounded solution of the Stein equation (2.10). To this end, define the functions K_i, 1 ≤ i ≤ n, by … By Fubini's theorem we find that … and so …
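The defining display for the K_i is not reproduced above; the sketch below assumes the standard K-function of Stein's method, K_i(t) = E[ξ_i(1{0 ≤ t ≤ ξ_i} − 1{ξ_i ≤ t < 0})], for which ∫ K_i(t) dt = Eξ_i², so that the K_i integrate to Var(W) = 1 in total. For the two-point summands ξ_i = (X_i − p_i)/σ this function has a simple closed form, checked numerically here with illustrative p_i.

```python
import math

def K(p, sigma, t):
    """K_i(t) = E[xi_i (1{0 <= t <= xi_i} - 1{xi_i <= t < 0})] for
    xi_i = (X_i - p)/sigma, X_i ~ Bernoulli(p), so xi_i takes the value
    (1 - p)/sigma with probability p and -p/sigma with probability 1 - p."""
    hi, lo = (1 - p) / sigma, -p / sigma
    if 0 <= t <= hi:
        return p * hi            # contribution of the event xi_i = hi
    if lo <= t < 0:
        return (1 - p) * (-lo)   # contribution of the event xi_i = lo
    return 0.0

ps = [0.3, 0.6, 0.5, 0.8]  # illustrative success probabilities
sigma = math.sqrt(sum(p * (1 - p) for p in ps))

integrals = []
for p in ps:
    lo, hi = -p / sigma, (1 - p) / sigma
    steps = 10_000
    h = (hi - lo) / steps
    integrals.append(sum(K(p, sigma, lo + (j + 0.5) * h) for j in range(steps)) * h)

# Each integral equals E[xi_i^2] = p(1 - p)/sigma^2; summed, they give Var(W) = 1.
print(sum(integrals))
```

Note that K_i is the constant p_i(1 − p_i)/σ on the interval [−p_i/σ, (1 − p_i)/σ] and zero outside it, which is what makes the K-function manipulations in the proofs below tractable in the Poisson binomial case.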

Proofs of main results
We now give our proofs of Theorems 1.1 and 1.2, starting with Theorem 1.2, which is then used to simplify the proof of Theorem 1.1. We use the K-function approach and notation from Sect. 2.4. Our problem is to bound the four terms (2.46)-(2.49), and we will consider each term in turn.

Proof of Theorem 1.2
Bounding (2.46): For (2.46), there is a random W̃ between W^(i) + ξ_i and W^(i) + t such that … Since for each i we only need to consider t such that |ξ_i − t| ≤ 1/σ, we may bound (2.46) as …

Bounding (2.47): As ξ_i = (1 − p_i)/σ with probability p_i and ξ_i = −p_i/σ with probability 1 − p_i, we have that … for some random variable W̃ strictly between W^(i) + (1 − p_i)/σ and W^(i) − p_i/σ. Now, by Lemma 2.9 (a) and the fact that p(1 − p) ∈ (0, 1/4] when p ∈ (0, 1), (2.47) may be bounded as …

Bounding (2.49): We first find an expression for the functions K_i, 1 ≤ i ≤ n. Since P(ξ_i = (1 − p_i)/σ) = p_i and P(ξ_i = −p_i/σ) = 1 − p_i, we have … and hence … Now we consider the value of … and let … Then we have x = x^(i) − p_i/σ with |p_i| < 1, and so … Thus we have … where … where I is a random index with distribution P(I = i) = … Thus we may bound (2.49) as … with the inequality following from the proof of Theorem 1.1 of [10]. From (2.16) we get that … as required.

Proof of Lemma 2.9
Proof Throughout the proof we set A_1 = (−∞, x − 1/σ], A_2 = (x − 1/σ, x], and A_3 = (x, ∞). We will bound Ef′(W̃) in two steps, first considering the case where W̃ ∈ A_2 and then W̃ ∉ A_2. We also use the facts from Lemma 2.6 that f′(w) ≤ 0 when w ∉ A_2 and f′(w) ≥ 0 when w ∈ A_2, as well as the fact that … To see this latter fact, recall that W^(i) takes values in the set A_n^(i) = A_n + p_i/σ, i.e., the support of W^(i) equals that of W translated by p_i/σ. Thus, for example, we cannot have W^(i) = x + p_i/σ, since then W̃ would lie in the interval (x, x + 1/σ), contradicting W̃ ∈ A_2. From (2.17) we have that … and we note that this holds for any x ∈ A_n.

Case 2: From Lemma 2.8 (a) and (b) and the fact that f′(w) < 0 for w ∈ A_1 ∪ A_3, we have that f′(w) ∈ (−|w|/σ − Nh, 0] when w ∈ A_1 ∪ A_3. Applying the Cauchy-Schwarz inequality gives … In this case Lemma 2.8 (a) and (c) provide a tighter bound on |f′(w)|, w ∈ A_1 ∪ A_3, than in Subcase 2.1, so that (4.2) still holds.
Subcase 2.3: x < 0. In this case applying Lemma 2.8 (d) shows that f′(w) ∈ (−|w|/σ − Nh, 0] for w ∈ A_1 ∪ A_3, and the result follows in the same way as when x ≥ 0. Thus from each subcase we see that (4.1) and (4.2) hold for all x ∈ A_n, which gives the result.
In this case we have from Lemma 2.6 that 0 ≤ f′(W̃) ≤ 1, and as in the proof of part (a), … where … Now, by Hölder's inequality, we have for each p ∈ N that … Letting p → ∞ and applying (2.19) and (2.20), we find … The result follows in a similar way when x ≤ −1/σ.
(a) We first consider the case where x ≥ 1/σ. For each p ∈ N, we have, by Hölder's inequality, that … (4.7) We will bound the second factor appearing on the right of (4.7) separately for W̃ ∉ A_2 and W̃ ∈ A_2, starting with the case W̃ ∉ A_2.
As in the proof of Lemma 2.9 (a), we note that when 0 ≤ x < 1/σ, Lemma 2.8 (a) and (c) provide a tighter bound on |f′(w)|, w ∈ A_1 ∪ A_3. This, together with the fact that (4.9) holds for all x ∈ A_n, gives the result when 0 ≤ x < 1/σ. The case x < 0 is dealt with in a similar way using part (d) of Lemma 2.8.
(b) Our strategy is slightly different than for part (a). We will consider the contributions to E{W^(i) f′(W̃)} when W̃ is in A_1, A_2, and A_3. As f′(W̃) is positive when W̃ ∈ A_2 and negative otherwise, together with the fact that |W̃ − W^(i)| ≤ 1/σ, we will be able to keep track of the signs of the various contributions and obtain some partial cancellation that would not be possible with a simple use of Hölder's inequality as in part (a).

(4.15)
For the second term in (4.11) we have … Thus, as the second term in (4.11) is negative, we have … and so together with (4.15) this implies …

Proof of Lemma 2.11
We assume that x ≥ 0, and the sets A_1, A_2, and A_3 are as in the proofs of the previous lemmas. From (2.32), we see that f is negative on A_1 and positive on A_3. From (2.32), the tail bound (2.35), and Lemma 2.7, we have that 0 ≤ f(W^(i) + t) 1_{A_3}(W^(i) + t) ≤ … Finally, when W^(i) + t ∈ A_2, we have that W^(i) = x − 1/σ + p_i/σ, and so …

then d_TV(L(S), L(Y)) := sup_{A ⊂ R} |P(S ∈ A) − P(Y ∈ A)| ≤ …

g(w) = 0 for w < a, g(w) = e^w(w − a) for a ≤ w ≤ b, and g(w) = e^w(b − a) for w > b, for which we have g′(w) ≥ 0 for w ∈ R and g′(w) ≥ e^a for w ∈ [a, b]. We also have 0 ≤ g(w) ≤ (b − a)e^w for w ∈ R. It follows that Eg′(W*) ≥ E{g′(W*) 1_{[a,b]}(W*)} ≥ e^a P(a ≤ W* ≤ b). (2.24)