
A nonuniform local limit theorem for Poisson binomial random variables via Stein’s method

Abstract

We prove a nonuniform local limit theorem concerning approximation of the point probabilities \(P(S=k)\), where \(S=\sum_{i=1}^{n}X_{i}\), and \(X_{1},\ldots ,X_{n}\) are independent Bernoulli random variables with possibly different success probabilities. Our proof uses Stein’s method, in particular, the zero bias transformation and concentration inequality approaches.

1 Introduction

In [1], Charles Stein introduced a powerful new method for bounding the approximation error in the central limit theorem and other normal approximations. Let \(X_{1},X_{2},\ldots ,X_{n}\) be independent random variables with finite third absolute moments, standardized so that \(\mathbb{E}X_{i} = 0\), \(1\leq i \leq n\), and \(\operatorname{Var}(\sum_{i=1}^{n}X_{i}) = 1\). Stein’s method yields a characteristic function-free proof of the Berry–Esseen theorem, i.e., that there exists an absolute constant C such that

$$ \sup_{x\in \mathbb{R}} \bigl\vert P(W\leq x) - \Phi (x) \bigr\vert \leq C\gamma , $$
(1.1)

where \(W=\sum_{i=1}^{n}X_{i}\), \(\gamma = \sum_{i=1}^{n}\mathbb{E}|X_{i}|^{3}\), and Φ is the standard normal distribution function. See Theorem 3.6 of [2] for a proof with absolute constant \(C=9.4\).

Nonuniform versions of the Berry–Esseen theorem, which are more informative than (1.1), have also been obtained by Stein’s method. For example, Chen and Shao [3] improve on the earlier bound of

$$ \bigl\vert P(W \leq x) - \Phi (x) \bigr\vert \leq \frac{C\gamma}{1 + \vert x \vert ^{3}} $$

from [4] and show that a similar result holds if we only assume the existence of second moments of the \(X_{i}\).

Local limit theorems, which quantify the accuracy of a normal approximation for the point probabilities \(P(S=k)\), \(k\in \mathbb{Z}\), when S is a sum of integer-valued random variables, have seldom been considered using Stein’s method. Suppose now that \(X_{1},X_{2},\ldots ,X_{n}\) are integer-valued random variables and set \(S = \sum_{i=1}^{n}X_{i}\). Formally, S is said to satisfy the local limit theorem if the quantity

$$ \triangle = \sup_{k\in \mathbb{Z}}\biggl\vert P(S = k) - \frac{1}{\sigma \sqrt{2\pi}}\exp \biggl\{ - \frac{(k-\mu )^{2}}{2\sigma ^{2}} \biggr\} \biggr\vert $$
(1.2)

satisfies \(\triangle = o(1/\sigma )\), where \(\mu = \mathbb{E}S\) and \(\sigma ^{2} = \operatorname{Var}(S)\). Here and throughout, we suppress the obvious dependence on n of quantities such as S, μ, and \(\sigma ^{2}\).

To understand the requirement that \(\triangle = o(1/\sigma )\), observe that (1.1) gives a bound on the difference between the distribution functions of \(W=(S-\mu )/\sigma \) and \(Z\sim N(0,1)\) that is proportional to \(\sigma ^{-3}\sum_{i=1}^{n}\mathbb{E}|X_{i}-\mu _{i}|^{3}\), where \(\mu _{i}=\mathbb{E} X_{i}\). In the typical situation where \(\sum_{i=1}^{n}\mathbb{E}|X_{i}-\mu _{i}|^{3} = O(n)\) and \(\sigma ^{-2} = O(n^{-1})\), the Berry–Esseen bound is \(O(1/\sigma )\), and it can also be shown that \(\triangle = O(1/\sigma )\) in this case. Thus the requirement that \(\triangle = o(1/\sigma )\) in the local limit theorem serves to ensure more refined information than is immediately available from the Berry–Esseen bound.
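
To make these rates concrete, the following small sketch (ours, purely illustrative; plain Python with no external dependencies) computes △ in (1.2) exactly by convolving the Bernoulli point masses. The product \(\sigma ^{2}\triangle \) stays bounded as n grows, so △ is indeed \(o(1/\sigma )\) here, consistent with the optimal rate \(O(1/\sigma ^{2})\) discussed below.

```python
# An illustrative sketch (not from the paper): compute Delta in (1.2)
# exactly for a Poisson binomial S by convolving the Bernoulli pmfs.
import math

def pb_pmf(p):
    """Exact pmf of S = X_1 + ... + X_n with X_i ~ Bernoulli(p_i)."""
    pmf = [1.0]
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - pi)
            nxt[k + 1] += mass * pi
        pmf = nxt
    return pmf

def delta(p):
    """Delta = sup_k |P(S=k) - (sigma sqrt(2 pi))^{-1} exp(-(k-mu)^2 / (2 sigma^2))|."""
    mu, var = sum(p), sum(q * (1 - q) for q in p)
    sigma = math.sqrt(var)
    return max(abs(mass - math.exp(-(k - mu) ** 2 / (2 * var)) / (sigma * math.sqrt(2 * math.pi)))
               for k, mass in enumerate(pb_pmf(p)))

for n in (50, 200, 800):
    p = [0.3 if i % 2 == 0 else 0.7 for i in range(n)]
    var = sum(q * (1 - q) for q in p)
    print(n, round(var * delta(p), 4))  # sigma^2 * Delta stays bounded, i.e., Delta = O(1/sigma^2)
```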

A historical overview of local limit theorems can be found in [5], where it is noted that they predate central limit theorems. The earliest such result, the DeMoivre–Laplace theorem [6, 7], establishes the local limit theorem in the case of sums of identically distributed Bernoulli random variables, i.e., where S is binomially distributed. In this case the definition of \(\triangle \) in (1.2) is modified by taking the supremum over \(k\in [0,n]\cap \mathbb{Z}\). The DeMoivre–Laplace theorem is also considered in [8], where it is shown that \(\triangle \leq 0.516/\sigma ^{2}\). Chapter 7 of [9] shows that under mild conditions, when S is a sum of independent integer-valued random variables, an approximation error of \(\triangle = O(1/\sigma ^{2})\) is optimal. Siripraparat and Neammanee [10] establish the optimal error of \(\triangle = O(1/\sigma ^{2})\) in the case of independent but not necessarily identically distributed Bernoulli random variables and give explicit constants in their bound. Siripraparat and Neammanee [11] generalize this work to sums of arbitrary independent integer-valued random variables. The proofs of these results typically involve Fourier analysis of characteristic functions.

Although Barbour et al. [12] use Stein’s method to prove local limit theorems, they consider a rather more general setup, which does not restrict them to local approximation of sums of integer-valued random variables. Consequently, the bounds they obtain are more complicated than expected when one considers sums, and when applied to the particular case of sums of independent integer-valued random variables, they do not yield the optimal rate of \(\triangle = O(1/\sigma ^{2})\), although the authors suggest how their methods can be adapted to yield the optimal rate in this case. Fang [13] uses Stein’s method to give bounds for the total-variation distance between an integer-valued random variable and a discretized normal distribution, although bounds in the local metric are not considered. Barbour and Choi [14] consider approximating the distribution of sums of integer-valued random variables by a translated Poisson distribution. They obtain nonuniform bounds in the total-variation metric that are roughly analogous to the classical results of [4] and [3] for the central limit theorem, although they do not consider local limit theorems.

All of the above studies involving local limit theorems consider only uniform bounds, so that some information regarding the quality of the normal approximation of \(P(S=k)\) for a specific fixed k is lost. In this paper, we prove a nonuniform local limit theorem when \(X_{1},X_{2},\ldots ,X_{n}\) are independent but not necessarily identically distributed Bernoulli random variables. In this case, \(S=\sum_{i=1}^{n}X_{i}\) is said to be a Poisson binomial random variable and follows a Poisson binomial distribution. Poisson binomial random variables, introduced in [15], have been used in a wide range of contexts, including finance [16], reliability analysis [17], and machine learning [18, 19]. A survey of the Poisson binomial distribution may be found in [20]. We now state our main result.

Theorem 1.1

Let \(X_{1}, X_{2},\ldots , X_{n}\) be jointly independent Bernoulli random variables such that \(P(X_{i}=1) = 1-P(X_{i} = 0)=p_{i} \in (0,1)\), and let \(S = \sum_{i=1}^{n}X_{i}\), \(\mu = \mathbb{E} S\), and \(\sigma ^{2} = \operatorname{Var}(S)\). If \(\sigma ^{2} \geq 1\), then for each \(k\in [0,n]\cap \mathbb{Z}\),

$$ \biggl\vert P(S = k) - \frac{1}{\sigma \sqrt{2\pi}}\exp \biggl\{ - \frac{(k-\mu )^{2}}{2\sigma ^{2}} \biggr\} \biggr\vert \leq \frac{Ce^{- \vert \frac{k-\mu}{\sigma} \vert }}{\sigma ^{2}} $$
(1.3)

for some positive absolute constant C.
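
A hedged numerical check of Theorem 1.1 (ours; the success probabilities below are an arbitrary random choice) computes the smallest admissible constant in (1.3) over all k from the exact pmf; it should remain of order one as n increases.

```python
# Track the empirical constant sigma^2 e^{|k-mu|/sigma} |P(S=k) - normal term| in (1.3).
import math, random

def pb_pmf(p):  # exact Poisson binomial pmf by convolution
    pmf = [1.0]
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - pi)
            nxt[k + 1] += mass * pi
        pmf = nxt
    return pmf

random.seed(0)
for n in (100, 400, 1600):
    p = [random.uniform(0.1, 0.9) for _ in range(n)]
    mu, var = sum(p), sum(q * (1 - q) for q in p)
    sigma = math.sqrt(var)
    C = max(var * math.exp(abs(k - mu) / sigma)
            * abs(mass - math.exp(-(k - mu) ** 2 / (2 * var)) / (sigma * math.sqrt(2 * math.pi)))
            for k, mass in enumerate(pb_pmf(p)))
    print(n, round(C, 4))  # the empirical constant in (1.3) stays of order one
```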

Our proof of Theorem 1.1 uses a combination of the zero-bias transformation of [21], the concentration inequality approach of [3], and some ideas from Chap. 7 of [2]. Our proof may also be modified to give the classical uniform local limit theorem with an explicit constant. Although we do not pursue this direction here, we intend to in a future note.

The remainder of the paper is structured as follows. Section 2 reviews the necessary background in Stein’s method and the zero-bias framework required for the remainder of the paper. Section 3 gives some auxiliary technical results needed for the proof of Theorem 1.1 in Sect. 4. The Appendix gives the proof of a lemma that is stated in Sect. 3.

2 Stein’s method

The starting point of Stein’s method is the following characterization of the standard normal distribution. If the random variable W has a standard normal distribution, then

$$ \mathbb{E} f'(W) - \mathbb{E} Wf(W) = 0 $$
(2.1)

for all absolutely continuous functions \(f:\mathbb{R}\to \mathbb{R}\) with \(\mathbb{E} |f'(W) | < \infty \). Conversely, if (2.1) holds for all bounded, continuous, and piecewise continuously differentiable functions f with \(\mathbb{E} |f'(Z) | < \infty \), \(Z\sim N(0,1)\), then W has a standard normal distribution.

Intuitively, if W is approximately standard normal, then for \(Z\sim N(0,1)\), \(\mathbb{E}h(W) -\mathbb{E}h(Z)\) should be close to zero for h in a sufficiently large class of test functions. Also, if W is in some sense close to Z, then from (2.1), \(\mathbb{E}f'(W)-\mathbb{E}Wf(W)\) should be close to zero. These two observations lead to consideration of the ordinary differential equation

$$ f'(w) - wf(w) = h(w) - \mathbb{E}h(Z), $$
(2.2)

known as the Stein equation, which may be solved for f by the method of integrating factors. For a given choice of h, with f the solution of the Stein equation (2.2), we see that bounding \(\mathbb{E}h(W) - \mathbb{E}h(Z)\) is equivalent to bounding \(\mathbb{E}\{f'(W) - Wf(W)\}\). The latter expectation often turns out to be easier to bound than the former, particularly when W is a sum of random variables.

To analyze the approximation error in the central limit theorem when W is a sum or sample mean, fix \(x\in \mathbb{R}\), and take \(h(w)=\mathbf{1}_{(-\infty , x]}(w)\). Then, replacing w by W and taking expectations, the right-hand side of (2.2) becomes \(P(W\leq x) - \Phi (x)\), which is the desired difference to be analyzed. The class of test functions to be used for our problem is identified in Sect. 3.1.

We will use the zero-bias framework of [21], which defines a new random variable \(W^{*}\) on the same space as W to assess the proximity of W to a normal random variable. When \(W\sim N(0, \sigma ^{2}_{ W})\), the characterizing equation analogous to (2.1) is \(\mathbb{E}Wf(W) = \sigma ^{2}_{ W}\mathbb{E}f'(W)\). This motivates the following definition.

Definition 1

Let W be a zero-mean random variable with finite nonzero variance \(\sigma ^{2}_{ W}\). We say that \(W^{*}\) has the W-zero bias distribution if for all differentiable f for which \(\mathbb{E}Wf(W)\) is finite,

$$ \mathbb{E}Wf(W) = \sigma ^{2}_{ W}\mathbb{E}f' \bigl(W^{*}\bigr). $$

Goldstein and Reinert [21] prove the existence and uniqueness of \(W^{*}\). Regarding the zero-bias transformation as the mapping \(W\to W^{*}\), a random variable with \(N(0, \sigma ^{2}_{ W})\) distribution is seen to be the unique fixed point of this transformation. If W is in some sense close to \(W^{*}\), then we expect W to be approximately normally distributed. Indeed, a key step in proving the nonuniform bound for S in Theorem 1.1 is showing that an analogous approximation holds for \(W^{*}\) when W is a sum of appropriately centered and scaled Bernoulli random variables.

An important example for our problem is when X is Bernoulli with \(P(X=1)=1-P(X=0)=p\). Although \(\mathbb{E}X = p\), so that \(X^{*}\) does not exist, we may calculate \((X-p)^{*}\) as follows. Letting \(Y=X-p\), which has variance \(\sigma ^{2}_{ Y}=p(1-p)\), we have

$$\begin{aligned} \mathbb{E}Yf(Y) &= \mathbb{E}\bigl[(X-p)f(X-p)\bigr] = p(1-p)f(1-p)-(1-p)pf(-p) \\ & = \sigma ^{2}_{ Y}\bigl[f(1-p) - f(-p)\bigr] =\sigma ^{2}_{ Y} \int _{-p}^{1-p}f'(u)\,du = \sigma ^{2}_{ Y}\mathbb{E}f'(U), \end{aligned}$$

where U is uniformly distributed on \([-p, 1-p]\), and thus \((X-p)^{*}\overset{d}{=}U[-p, 1-p]\).

A useful and easily verified property of the zero-bias transformation is that if \(\mathbb{E}X=0\) and \(a\neq 0\), then \((aX)^{*}=aX^{*}\) [2, p. 29]. Note now, for later use, that if \(X \sim \text{Bernoulli}(p)\), then for \(\sigma > 0\),

$$ \biggl(\frac{X-p}{\sigma} \biggr)^{*} \sim U \biggl[ \frac{-p}{\sigma}, \frac{1-p}{\sigma} \biggr], $$
(2.3)

where \(U[-p/\sigma , (1-p)/\sigma ]\) is the uniform distribution on the interval \([-p/\sigma , (1-p)/\sigma ]\).

The following fundamental result from [21] shows how \(W^{*}\) may be obtained when W is a sum of independent zero-mean random variables.

Lemma 2.1

Let \(\xi _{i}\), \(1\leq i \leq n\), be independent zero-mean random variables with \(\operatorname{Var}(\xi _{i})=\sigma _{i}^{2}\), and set \(W=\sum_{i=1}^{n}\xi _{i}\) with \(\operatorname{Var}(W) =\sigma ^{2}_{ W}\). Let I be a random index, independent of the \(\xi _{i}\), such that \(P(I=i)=\sigma _{i}^{2}/\sigma ^{2}_{ W}\). Then

$$ W^{*} \overset{d}{=} W - \xi _{I} + \xi ^{*}_{I}. $$
(2.4)
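
The construction in (2.4) is straightforward to simulate. The following hedged Monte Carlo sketch (ours) builds \(W^{*}\) for standardized Bernoulli summands using (2.3) and checks the defining identity of Definition 1 (here \(\sigma ^{2}_{ W} = 1\)) with the smooth test function \(f(w)=\sin w\):

```python
# Monte Carlo check of E[W f(W)] = E[f'(W*)] with f = sin, f' = cos.
import math, random
random.seed(1)

p = [0.2, 0.5, 0.5, 0.8, 0.3, 0.6]                 # arbitrary success probabilities
sigma = math.sqrt(sum(q * (1 - q) for q in p))
weights = [q * (1 - q) / sigma**2 for q in p]       # P(I = i) = sigma_i^2 / sigma_W^2

N = 200_000
lhs = rhs = 0.0
for _ in range(N):
    xs = [(float(random.random() < q) - q) / sigma for q in p]  # xi_i = (X_i - p_i)/sigma
    W = sum(xs)
    lhs += W * math.sin(W)                          # accumulates E[W f(W)]
    i = random.choices(range(len(p)), weights=weights)[0]
    xi_star = (random.random() - p[i]) / sigma      # (2.3): uniform on [-p_i/sigma, (1-p_i)/sigma]
    rhs += math.cos(W - xs[i] + xi_star)            # accumulates E[f'(W*)], W* as in (2.4)
print(lhs / N, rhs / N)  # the two estimates agree up to Monte Carlo error
```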

3 Auxiliary results

3.1 Selection of test functions and the Stein equation

In this section, we lay out our setup that is required to prove Theorem 1.1 using Stein’s method. As S has mean μ and variance \(\sigma ^{2}\), comparing its distribution to that of a standard normal random variable is inappropriate. For a random variable W with \(N(\mu , \sigma ^{2})\) distribution, the characterizing equation analogous to (2.1) is

$$ \sigma ^{2}\mathbb{E} f'(W) - \mathbb{E} \bigl[(W- \mu )f(W)\bigr]= 0. $$
(3.1)

Working with (3.1) and deriving properties of the solution f are more awkward than for (2.1) and moreover do not allow us to use the zero-bias framework. For these reasons, we temporarily reduce our problem to the zero-mean unit-variance case.

Thus, for the remainder of Sect. 3, we let \(W = (S-\mu )/\sigma \), so that \(W = \sum_{i=1}^{n}\xi _{i}\) with \(\xi _{i} = (X_{i}-p_{i})/\sigma \) and \(X_{i} \sim \text{Bernoulli}(p_{i})\). Then \(\mathbb{E} W= 0\), \(\operatorname{Var}(W) = 1\), and W takes values in the set \(A_{n} := \{(k-\mu )/\sigma : k \in [0,n]\cap \mathbb{Z}\}\). We may then write the difference to be analyzed in Theorem 1.1 as

$$ \biggl\vert P(S = k) - \frac{1}{\sigma \sqrt{2\pi}}\exp \biggl\{ - \frac{(k-\mu )^{2}}{2\sigma ^{2}} \biggr\} \biggr\vert = \biggl\vert P(W = x) - \frac{1}{\sigma \sqrt{2\pi}}e^{-x^{2}/2} \biggr\vert , $$
(3.2)

where \(x=(k-\mu )/\sigma \in A_{n}\) with \(k\in [0,n]\cap \mathbb{Z}\). Observing (3.2), we would like to select a test function h such that \(\mathbb{E} h(W) - \mathbb{E}h(Z) = P(W = x) - \phi (x)/\sigma \), where ϕ is the standard normal density function. To this end, for a given \(x\in \mathbb{R}\), define \(h_{x}(w)= \mathbf{1}_{(x-\frac{1}{\sigma}, x]}(w)\). Then, for \(x = (k-\mu )/\sigma \in A_{n}\), we have \(\mathbb{E} h_{x}(W) = P(x-1/\sigma < W \leq x) = P(W=x) = P(S = k)\). We would also like to have \(\mathbb{E} h_{x}(Z) = \phi (x)/\sigma \) when \(Z\sim N(0,1)\). Although this is not the case, we may show, as in the proof of Lemma 4.1, that \(\mathbb{E} h_{x}(Z)\) is equal to \(\phi (x)/\sigma \) plus a remainder of magnitude \(O(1/\sigma ^{2})\).

Having identified the appropriate class of test functions, we are then led to consider the corresponding Stein equation

$$ f_{x}'(w) - wf_{x}(w) = \mathbf{1}_{(x-\frac{1}{\sigma}, x]}(w) - Nh_{x}, $$
(3.3)

where here and throughout, \(Nh_{x} := \mathbb{E} h_{x}(Z)\), \(Z\sim N(0,1)\), and \(h_{x}(w)= \mathbf{1}_{(x-1/\sigma , x]}(w)\). Using the method of integrating factors, the unique bounded solution to (3.3) may be written in the two equivalent forms:

$$ f_{x}(w) = e^{w^{2}/2} \int _{-\infty}^{w} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt $$
(3.4)
$$ \hphantom{f_{x}(w)} = -e^{w^{2}/2} \int _{w}^{\infty} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt. $$
(3.5)

The equivalence of (3.4) and (3.5) follows from the fact that the difference of these two expressions is \((\sqrt{2\pi})e^{w^{2}/2}\mathbb{E} \{\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(Z) - Nh_{x} \} = 0\).

Expanding out the integrals in terms of Φ, the standard normal distribution function, we may write \(f_{x}\) as

$$ f_{x}(w) = \textstyle\begin{cases} (\sqrt{2\pi})Nh_{x} e^{w^{2}/2} [1 - \Phi (w) ], \quad & w > x, \\ (\sqrt{2\pi})e^{w^{2}/2} [\Phi (w)(1 - \Phi (x)) - \Phi (x- \frac{1}{\sigma})(1-\Phi (w)) ], \quad & w \in (x - \frac{1}{\sigma}, x], \\ -(\sqrt{2\pi})Nh_{x}e^{w^{2}/2}\Phi (w), \quad & w \leq x - \frac{1}{\sigma}. \end{cases} $$
(3.6)

As \(h_{x}\) is a bounded function and \(|h_{x}(w) - Nh_{x}| \leq 1\) for all w, we have, by Lemma 2.4 of [2], that \(|f_{x}(w)| \leq \sqrt{\pi /2}\) and \(| f_{x}'(w)| \leq 2\) for all w. In Lemma 3.2, we derive some more useful bounds for \(f_{x}\) given our specific choice of \(h_{x}\). We first give a symmetry property of the solution of the Stein equation, which is useful in simplifying some calculations.
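
For a quick numerical sanity check of (3.6) (ours; it assumes SciPy, and the values of σ and x below are arbitrary), one can evaluate the piecewise formula, confirm the Stein equation (3.3) away from the kink points, and observe the bound \(|f_{x}(w)| \leq 1/\sigma \) established in Lemma 3.2(c) below:

```python
# Evaluate the piecewise solution (3.6) and verify (3.3) by finite differences.
import math
from scipy.stats import norm

sigma, x = 3.0, 0.9
Nh = norm.cdf(x) - norm.cdf(x - 1 / sigma)
root2pi = math.sqrt(2 * math.pi)

def f(w):
    if w > x:
        return root2pi * Nh * math.exp(w * w / 2) * (1 - norm.cdf(w))
    if w > x - 1 / sigma:
        return root2pi * math.exp(w * w / 2) * (
            norm.cdf(w) * (1 - norm.cdf(x))
            - norm.cdf(x - 1 / sigma) * (1 - norm.cdf(w)))
    return -root2pi * Nh * math.exp(w * w / 2) * norm.cdf(w)

for w in (-1.0, 0.75, 2.0):       # away from the kinks at x - 1/sigma and x
    eps = 1e-5
    residual = (f(w + eps) - f(w - eps)) / (2 * eps) - w * f(w)  # f_x'(w) - w f_x(w)
    indicator = float(x - 1 / sigma < w <= x)
    print(w, round(residual, 6), round(indicator - Nh, 6))  # the two columns agree
    assert abs(f(w)) <= 1 / sigma  # consistent with Lemma 3.2(c)
```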

Lemma 3.1

The solution \(f_{x}\) of the Stein equation (3.3) satisfies

$$\begin{aligned} f_{-x+1/\sigma}(-w) &= -f_{x}(w). \end{aligned}$$
(3.7)

Proof

With \(Nh_{x} = P(x-1/\sigma < Z \leq x)\), using the fact that \(Z\overset{d}{=}-Z\) and Z is continuous, we have

$$ Nh_{-x+1/\sigma} = P(-x < Z \leq -x+1/\sigma ) = P(x-1/\sigma \leq -Z < x) = P(x-1/\sigma < Z \leq x) = Nh_{x}. $$

Writing the solution of the Stein equation as in (3.4), i.e.,

$$ f_{x}(w) = e^{w^{2}/2} \int _{-\infty}^{w} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt, $$

we get that

$$\begin{aligned} f_{-x+1/\sigma}(-w) &= e^{w^{2}/2} \int _{-\infty}^{-w} \bigl[\mathbf{1}_{(-x, -x+\frac{1}{\sigma}]}(t) - Nh_{-x+\frac{1}{\sigma}} \bigr] e^{-t^{2}/2}\,dt \\ &= e^{w^{2}/2} \int _{w}^{\infty} \bigl[\mathbf{1}_{[x-\frac{1}{\sigma}, x)}(u) - Nh_{x} \bigr] e^{-u^{2}/2}\,du \\ &= e^{w^{2}/2} \int _{w}^{\infty} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(u) - Nh_{x} \bigr] e^{-u^{2}/2}\,du = -f_{x}(w) \end{aligned}$$

by (3.5). □

We now give some basic bounds and properties of the function \(f_{x}\) and the term \(Nh_{x}\) appearing in the Stein equation (3.3).

Lemma 3.2

For \(x\in \mathbb{R}\), let \(f_{x}\) be the solution of (3.3). Then

  (a) \(0 \leq f'_{x}(w) \leq 1\), \(w \in (x-1/\sigma , x]\);

  (b) \(f_{x}\) is continuous, increasing on the interval \((x-1/\sigma , x]\), and decreasing otherwise;

  (c) if \(\sigma ^{2} \geq 1\), then

  $$ \bigl\vert f_{x}(w) \bigr\vert \leq \frac{1}{\sigma}, \quad w\in \mathbb{R}; $$
  (3.8)

  (d) if \(\sigma ^{2} \geq 1\), then

  $$ Nh_{x} = P(x-1/\sigma < Z \leq x) \leq \frac{Ce^{-|x|}}{\sigma} $$

  for some positive absolute constant C.

Proof

The proof is given in the Appendix. □

Remark 1

Although in proving Theorem 1.1, we will choose x to be of the form \(x=(k-\mu )/\sigma \), \(k\in [0,n]\cap \mathbb{Z}\), the bounds of Lemma 3.2 hold for arbitrary \(x\in \mathbb{R}\).

3.2 Concentration inequalities and other bounds

In this section, we derive some useful bounds and concentration inequalities that will be used in the proofs of Theorems 1.1 and 3.1. Theorem 3.1 provides a bound on the error in the local normal approximation of \(W^{*}\) and is crucial in our proof of Theorem 1.1. As in the statement of Theorem 1.1, we also assume that \(\sigma ^{2} \geq 1\). For the remainder of the paper, throughout the proofs, C denotes an absolute constant that may take different values in different places.

As in our problem W is a bounded random variable, \(\mathbb{E} W^{q}\) is finite for all \(q \geq 1\), and since \(\mathbb{E}W=0\) and \(\operatorname{Var}(W)=1\), it is trivial that \(\mathbb{E} W^{2} = 1\) for all n. Lemma 3.3 below gives a bound on \(\mathbb{E} W^{q}\), which is uniform in n when q is even.

Lemma 3.3

For all \(n\in \mathbb{N}\), we have

$$ \mathbb{E} W^{2m} \leq p(2m),\quad m\in \mathbb{N}, $$
(3.9)

where \(p(2m)\) denotes the number of partitions of 2m, i.e., the number of ways that 2m may be expressed as a sum of positive integers irrespective of order.

Proof

Let \(\mathcal{P}_{1}(2m)\) denote the set of partitions of 2m that do not contain 1. So \((r_{1},\ldots ,r_{k}) \in \mathcal{P}_{1}(2m)\) if and only if \(r_{1},\ldots , r_{k} \in \mathbb{N} \setminus \{1\}\) and \(\sum_{j=1}^{k}r_{j} = 2m\).

By expanding \(W^{2m}\) and taking expectations we see that each \((r_{1},\ldots ,r_{k}) \in \mathcal{P}_{1}(2m)\) gives a nonzero contribution to \(\mathbb{E} W^{2m}\) of the form

$$ \sum_{(i_{1},\ldots ,i_{k})\in \mathcal{A}_{k}^{n}}\prod_{j=1}^{k} \mathbb{E} \xi _{i_{j}}^{r_{j}}, $$
(3.10)

where \(\mathcal{A}_{k}^{n}\) is the set of k-tuples \((i_{1},\ldots ,i_{k})\) with positive integer entries at most n such that no two entries of the k-tuple are the same. Since \(\mathbb{E} \xi _{i} = 0\) for each i, we do not need to consider the terms where any \(r_{j} = 1\) as these terms give zero contribution to \(\mathbb{E}W^{2m}\). Also, since \(|\xi _{i}| \leq 1\) for each i, we have \(|\xi _{i_{j}}|^{r_{j}} \leq \xi _{i_{j}}^{2}\), and so (3.10) is bounded in absolute value by \((\sum_{i=1}^{n}\mathbb{E} \xi _{i}^{2})^{k} = 1\).

As the number of terms of the form (3.10) in the expansion of \(\mathbb{E} W^{2m}\) is no more than \(p(2m)\), we have \(\mathbb{E} W^{2m} \leq p(2m)\), as required. □

Lemma 3.4

There exists an absolute constant C such that for all n,

$$ \mathbb{E} e^{(1+|W|)^{2}} \leq C. $$
(3.11)

Proof

First, we show that there is an absolute constant C such that \(\mathbb{E} e^{W^{2}} \leq C\) for all n. By a Taylor expansion we have

$$ \mathbb{E} e^{W^{2}} = 1 + \sum_{m=1}^{\infty} \frac{\mathbb{E}W^{2m}}{m!} \leq 1 + \sum_{m=1}^{\infty} \frac{p(2m)}{m!} $$
(3.12)

by Lemma 3.3. The second sum in (3.12) does not depend on n, and it converges by the root test and the fact that \(\lim_{m\to \infty}p(2m)^{1/2m}=1\) [22, Sect. 6.4]. Thus we have that, uniformly in n, \(\mathbb{E} e^{W^{2}} \leq C < \infty \). Since \((1+|W|)^{2m} \leq 2^{2m}(1+|W|^{2m})\), we have \(\mathbb{E}(1+|W|)^{2m} \leq \max \{2^{2m+1}, 2^{2m+1}\mathbb{E} |W|^{2m} \}\). Again, applying Lemma 3.3 and the root test to the Taylor expansion of \(\mathbb{E} e^{(1+|W|)^{2}}\) gives the result. □
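
The partition numbers appearing above are easily generated by the standard coin-change dynamic program, and the series on the right-hand side of (3.12) can then be summed numerically; a small sketch (ours, purely illustrative):

```python
# Partition numbers p(k) and partial sums of 1 + sum_m p(2m)/m! from (3.12).
import math

def partition_numbers(n):
    """p[k] = number of partitions of k, for k = 0..n (coin-change DP)."""
    p = [1] + [0] * n
    for part in range(1, n + 1):
        for k in range(part, n + 1):
            p[k] += p[k - part]
    return p

p = partition_numbers(60)
print([p[2 * m] for m in range(1, 7)])   # 2, 5, 11, 22, 42, 77

total = 1.0
for m in range(1, 31):
    total += p[2 * m] / math.factorial(m)
    if m in (5, 10, 20, 30):
        print(m, round(total, 6))  # partial sums settle quickly: the series converges
```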

Remark 2

As W is a bounded random variable, it is trivial that the expectation in (3.11) is finite. However, as the support of W depends on n, the utility of Lemma 3.4 is that we can bound the expectation by the same constant for all n.

We now give two concentration inequalities that are used in the proofs of Theorems 1.1 and 3.1.

Lemma 3.5

For all \(a \leq b < \infty \),

$$ P(a \leq W \leq b) \leq b - a + \frac{2}{\sigma}, $$
(3.13)

and

$$ P(a \leq W \leq b) \leq C \biggl(b-a + \frac{1}{\sigma} \biggr)e^{-2a} $$
(3.14)

for some absolute positive constant C.

Proof

(3.13) follows immediately from Proposition 3.1 of [2] upon noting that since \(\sigma ^{2} \geq 1\), we have \(\mathbf{1}_{[ \vert \xi _{i} \vert > 1]} \equiv 0\) and \(\mathbf{1}_{[ \vert \xi _{i} \vert \leq 1]} \equiv 1\), and thus, since \(|X_{i} - p_{i}| \leq 1\),

$$ \beta _{3} := \sum_{i=1}^{n}\mathbb{E} \vert \xi _{i} \vert ^{3}\mathbf{1}_{[ \vert \xi _{i} \vert \leq 1]} = \frac{1}{\sigma ^{3}}\sum_{i=1}^{n}\mathbb{E} \vert X_{i}-p_{i} \vert ^{3} \leq \frac{1}{\sigma ^{3}}\sum_{i=1}^{n}\mathbb{E} \vert X_{i}-p_{i} \vert ^{2} = \frac{1}{\sigma}. $$

The proof of (3.14) is similar to that of Lemma 3.1 and Proposition 8.1 from [2], and so we only give the essential differences.

Set \(\gamma = \sum_{i=1}^{n}\mathbb{E}|\xi _{i}|^{3}\), so that we have \(\gamma = \sigma ^{-3}\sum_{i=1}^{n}\mathbb{E}|X_{i}-\mu _{i}|^{3} \leq \sigma ^{-3}\sum_{i=1}^{n}\mathbb{E}|X_{i}-\mu _{i}|^{2} = 1/ \sigma \). We observe that for each \(t>0\), \(\mathbb{E}e^{tW} \leq A < \infty \), for some constant A, which depends only on t. This may be verified in a similar way to the proof of Lemma 3.4 or by applying Lemma 8.1 of [2].

Let \(\delta =\gamma /2\) and define

$$ f(w) = \textstyle\begin{cases} 0 & \text{if } w < a - \delta , \\ e^{2w}(w-a+\delta ) & \text{if } a-\delta \leq w \leq b + \delta , \\ e^{2w}(b-a+2\delta ) & \text{if } w > b+\delta , \end{cases} $$

for which \(f'(w) \geq 0\) and \(f'(w) \geq e^{2w}\) for \(w\in (a-\delta , b+\delta )\). Arguing in the same way as in the lead up to (8.16) in [2], we find

$$ \mathbb{E} Wf(W) \geq e^{-2\delta}(H_{1} - H_{2}), $$
(3.15)

where

$$ H_{1} = \bigl[\mathbb{E} e^{2W}\mathbf{1}_{[a,b]}(W) \bigr] \Biggl[\sum_{i=1}^{n}\mathbb{E} \vert \xi _{i} \vert \min \bigl(\delta , \vert \xi _{i} \vert \bigr) \Biggr] $$

and

$$ H_{2} = \mathbb{E} \Biggl[e^{2W} \Biggl\vert \sum _{i=1}^{n} \bigl\{ \vert \xi _{i} \vert \min \bigl(\delta , \vert \xi _{i} \vert \bigr) - \mathbb{E} \vert \xi _{i} \vert \min \bigl( \delta , \vert \xi _{i} \vert \bigr) \bigr\} \Biggr\vert \Biggr]. $$

Using \(\min (x,y) \geq y - y^{2}/(4x)\) for \(x > 0\) and \(y > 0\), we get

$$ \sum_{i=1}^{n}\mathbb{E} \vert \xi _{i} \vert \min \bigl(\delta , \vert \xi _{i} \vert \bigr) \geq \sum_{i=1}^{n} \bigl\{ \mathbb{E}\xi _{i}^{2} - \mathbb{E} \vert \xi _{i} \vert ^{3}/(4 \delta ) \bigr\} = 1/2, $$

so that \(H_{1} \geq \frac{1}{2}e^{2a}P(a\leq W \leq b)\).

Also, arguing as in [2], we may bound \(H_{2}\) as \(H_{2} \leq C \delta \), where C is an absolute constant.

On the other hand, since \(0 \leq f(w) \leq e^{2w}(b-a+2\delta ) \) for all w, by the Cauchy–Schwarz inequality we have

$$ 0 \leq \mathbb{E}Wf(W) \leq (b-a+2\delta ) \bigl(\mathbb{E}W^{2} \bigr)^{1/2}\bigl( \mathbb{E}e^{4W}\bigr)^{1/2} \leq C(b-a+2\delta ), $$
(3.16)

where C is an absolute constant.

Rearranging (3.15) as \(H_{1} \leq e^{2\delta}\mathbb{E}Wf(W) + H_{2}\) and using our bounds for \(H_{1}\) and \(H_{2}\) together with (3.16) imply that

$$ \frac{1}{2}e^{2a}P(a \leq W \leq b) \leq C(b-a+\delta ), $$
(3.17)

which in turn implies (3.14) since \(\gamma \leq 1/\sigma \). □
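
A hedged Monte Carlo illustration of (3.13) (ours; the success probabilities below are an arbitrary choice) shows the empirical interval probabilities sitting well below the stated bound:

```python
# Empirically compare P(a <= W <= b) with the bound b - a + 2/sigma from (3.13).
import math, random
random.seed(3)

p = [0.1 + 0.8 * i / 29 for i in range(30)]   # success probabilities on a grid
sigma = math.sqrt(sum(q * (1 - q) for q in p))

N = 100_000
samples = [sum((float(random.random() < q) - q) / sigma for q in p) for _ in range(N)]

for a, b in ((-0.5, 0.5), (0.0, 0.1), (1.0, 1.5)):
    prob = sum(a <= w <= b for w in samples) / N
    print((a, b), round(prob, 4), "<=", round(b - a + 2 / sigma, 4))
```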

Remark 3

It is easy to check that (3.14) holds in the more general case that \(X_{1}, X_{2},\ldots \) is a uniformly bounded sequence, i.e., where there exists a constant A such that \(|X_{i}| \leq A\) for all i. If a and b are both negative, then the bound implied by (3.14) may not be useful as the factor \(e^{-a}\) may be large. In this case, as −W is also a sum of mean zero uniformly bounded random variables, we have

$$ P(a \leq W \leq b) = P\bigl( \vert b \vert \leq -W \leq \vert a \vert \bigr) \leq C \biggl(b-a + \frac{1}{\sigma} \biggr)e^{-2 \vert b \vert }. $$
(3.18)

3.3 Local approximation of \(W^{*}\)

We now prove a nonuniform bound concerning normal approximation of \(W^{*}\) that forms a key step in our proof of Theorem 1.1.

Theorem 3.1

If \(h_{x}(w)= \mathbf{1}_{(x-\frac{1}{\sigma}, x]}(w)\), \(x\in \mathbb{R}\), then

$$ \bigl\vert \mathbb{E}h_{x}\bigl(W^{*}\bigr) - Nh_{x} \bigr\vert \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}} $$

for some positive absolute constant C.

Proof

Throughout the proof, we write f and \(f'\) in place of \(f_{x}\) and \(f'_{x}\), where \(f_{x} \) is the solution of the Stein equation (3.3). From (3.3) we have

$$\begin{aligned} \bigl\vert \mathbb{E}h_{x}\bigl(W^{*}\bigr) - Nh_{x} \bigr\vert & = \bigl\vert \mathbb{E}\bigl\{ f'\bigl(W^{*}\bigr) - W^{*}f \bigl(W^{*}\bigr) \bigr\} \bigr\vert = \bigl\vert \mathbb{E}\bigl\{ Wf(W) - W^{*}f\bigl(W^{*}\bigr)\bigr\} \bigr\vert \\ & \leq \bigl\vert \mathbb{E}\bigl\{ W\bigl[f(W) - f\bigl(W^{*} \bigr) \bigr] \bigr\} \bigr\vert + \bigl\vert \mathbb{E}\bigl\{ f \bigl(W^{*}\bigr)\bigl[W-W^{*}\bigr] \bigr\} \bigr\vert . \end{aligned}$$
(3.19)

For the first term of (3.19), we use the fact that there exists a random \(\bar{W}\) lying between W and \(W^{*}\) such that \(f(W)-f(W^{*})=f'(\bar{W})(W-W^{*})\). From (2.4), \(W-W^{*}=\xi _{I} - \xi ^{*}_{I}\), and hence by (2.3), \(|W-W^{*}| \leq 1/\sigma \). Therefore we may bound the first term of (3.19) as

$$ \bigl\vert \mathbb{E}W \bigl[f(W) - f\bigl(W^{*}\bigr) \bigr] \bigr\vert \leq \frac{1}{\sigma}\sum_{i=1}^{3}\mathbb{E} \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{i}}(\bar{W}), $$
(3.20)

where \(A_{1} = (-\infty , x-1/\sigma ]\), \(A_{2} = (x-1/\sigma , x]\), and \(A_{3} = (x, \infty )\).

Since from (3.3), \(f'(w) = wf(w) -Nh_{x}\) for \(w\in A_{1}\) and \(|f(w)| = (\sqrt{2\pi})Nh_{x}e^{w^{2}/2}\Phi (w)\) in this case, we have

$$ \mathbb{E} \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{1}}(\bar{W}) \leq (\sqrt{2\pi})Nh_{x}\mathbb{E} \bigl\vert W\bar{W}e^{\bar{W}^{2}/2}\Phi (\bar{W}) \bigr\vert \mathbf{1}_{A_{1}}(\bar{W}) + Nh_{x}. $$
(3.21)

As \(|\bar{W}|\leq |W|+1/\sigma \leq |W|+1\), the Cauchy–Schwarz inequality, together with Lemmas 3.3 and 3.4, shows that \(\mathbb{E}|W\bar{W}e^{\bar{W}^{2}/2}|\) may be bounded by the same absolute constant for each n, since

$$\begin{aligned} \mathbb{E} \bigl\vert W\bar{W}e^{\bar{W}^{2}/2} \bigr\vert & \leq \mathbb{E} W^{2}e^{ \frac{1}{2}(1+ \vert W \vert )^{2}} + \mathbb{E} \bigl\vert We^{\frac{1}{2}(1+ \vert W \vert )^{2}} \bigr\vert \\ & \leq \bigl(\mathbb{E}W^{4}\bigr)^{1/2}\bigl( \mathbb{E}e^{(1+ \vert W \vert )^{2}}\bigr)^{1/2} + \bigl( \mathbb{E}W^{2} \bigr)^{1/2}\bigl(\mathbb{E}e^{(1+ \vert W \vert )^{2}}\bigr)^{1/2} \leq C. \end{aligned}$$

Thus from (3.21) and the nonuniform bound for \(Nh_{x}\) by Lemma 3.2(d) we get

$$ \mathbb{E} \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{1}}(\bar{W}) \leq \frac{Ce^{- \vert x \vert }}{\sigma}. $$
(3.22)

In exactly the same way, we also find

$$ \mathbb{E} \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{3}}(\bar{W}) \leq \frac{Ce^{- \vert x \vert }}{\sigma}. $$
(3.23)

For the second term in (3.20), since \(|W-\bar{W}| \leq 1/\sigma \), the event \(\bar{W}\in A_{2}\) implies \(W\in [x-2/\sigma , x+1/\sigma ]\). Since \(|f'|\leq 1\) on \(A_{2}\) by Lemma 3.2(a), we have

$$ \mathbb{E} \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{2}}(\bar{W}) \leq \max \bigl\{ \vert x-2/\sigma \vert , \vert x+1/\sigma \vert \bigr\} P(x-2/\sigma \leq W \leq x+1/\sigma ). $$
(3.24)

To bound (3.24) further, we now consider three subcases according to the signs of \(x-2/\sigma \) and \(x+1/\sigma \).

Subcase 1: \(x-2/\sigma \geq 0\). In this case, applying (3.14), we find

$$ \mathbb{E} \bigl\{ \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{2}}(\bar{W}) \bigr\} \leq \frac{C(x+1/\sigma )e^{-2(x-2/\sigma )}}{\sigma} \leq \frac{Ce^{-x}}{\sigma} $$

since \(xe^{-2x} \leq Ce^{-x}\) for \(x \geq 0\).

Subcase 2: \(x+1/\sigma \leq 0\). In this case, noting that \(|x-2/\sigma | = |x+1/\sigma | + 3/\sigma \) and applying (3.18), we have

$$ \mathbb{E} \bigl\{ \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{2}}(\bar{W}) \bigr\} \leq \frac{C \vert x-2/\sigma \vert e^{-2 \vert x+1/\sigma \vert }}{\sigma} \leq \frac{Ce^{- \vert x \vert }}{\sigma}. $$

Subcase 3: \(x-2/\sigma \leq 0 \leq x+1/\sigma \). In this case, we have \(|W| \leq 3/\sigma \), and so using (3.13) gives

$$ \mathbb{E} \bigl\{ \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{2}}(\bar{W}) \bigr\} \leq \frac{3}{\sigma}\cdot \frac{5}{\sigma} \leq \frac{Ce^{- \vert x \vert }}{\sigma}, $$

as \(|x| \leq 2/\sigma \leq 2\).

Considering all three subcases, we see that

$$ \mathbb{E} \bigl\{ \bigl\vert f'(\bar{W})W \bigr\vert \mathbf{1}_{A_{2}}(\bar{W}) \bigr\} \leq \frac{Ce^{- \vert x \vert }}{\sigma}, $$

and so together with (3.22) and (3.23), by (3.20) the first term of (3.19) satisfies

$$ \mathbb{E} \bigl\vert W\bigl[f(W) - f\bigl(W^{*}\bigr) \bigr] \bigr\vert \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}}. $$
(3.25)

Now we focus on the second term in (3.19). We have

$$ \mathbb{E} \bigl\vert f\bigl(W^{*}\bigr) \bigl(W-W^{*}\bigr) \bigr\vert = \mathbb{E} \bigl\vert f\bigl(W^{*}\bigr) \bigl(W-W^{*}\bigr) \bigr\vert \mathbf{1}_{A_{2}^{c}}\bigl(W^{*}\bigr) + \mathbb{E} \bigl\vert f\bigl(W^{*}\bigr) \bigl(W-W^{*}\bigr) \bigr\vert \mathbf{1}_{A_{2}}\bigl(W^{*}\bigr). $$

From (3.6), \(|f(W^{*})| \leq (\sqrt{2\pi})Nh_{x}e^{(W^{*})^{2}/2}\) when \(W^{*}\in A_{2}^{c}\), and so

$$ \mathbb{E} \bigl\vert f\bigl(W^{*}\bigr) \bigl(W-W^{*}\bigr) \bigr\vert \mathbf{1}_{A_{2}^{c}}\bigl(W^{*}\bigr) \leq \frac{(\sqrt{2\pi})Nh_{x}}{\sigma}\mathbb{E}e^{(W^{*})^{2}/2} \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}} $$

using Lemma 3.2(d) and the fact that \(|W^{*}| \leq |W| + 1/\sigma \) with Lemma 3.4.

Also, since \(|f(w)| \leq 1/\sigma \), we get

$$ \mathbb{E} \bigl\vert f\bigl(W^{*}\bigr) \bigl(W-W^{*}\bigr) \bigr\vert \mathbf{1}_{A_{2}}\bigl(W^{*}\bigr) \leq \frac{1}{\sigma ^{2}}P(x-2/\sigma \leq W \leq x+1/\sigma ) \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}} $$

as \(W^{*} \in A_{2}\) implies \(W \in [x-2/\sigma , x+1/\sigma ]\). Hence

$$ \mathbb{E} \bigl\vert f\bigl(W^{*}\bigr) \bigl(W-W^{*} \bigr) \bigr\vert \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}} , $$

which, together with (3.25), shows that

$$ \bigl\vert \mathbb{E}h_{x}\bigl(W^{*}\bigr) - Nh_{x} \bigr\vert \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}}, $$
(3.26)

as required. □

Remark 4

A uniform version of Theorem 3.1, i.e., showing that there is a constant C not depending on x such that \(|\mathbb{E}h_{x}(W^{*}) - Nh_{x}| \leq C/\sigma ^{2}\), may be obtained in a similar manner using, instead of (3.14), the concentration inequality \(P(a\leq W \leq b) \leq 2(1+a)^{-1}(b-a+(2+\sqrt{2})/\sigma )\) for \(0\leq a < b\). The proof of this concentration inequality follows the same basic structure as that of (3.14) and allows us to derive an explicit constant in the uniform case.

4 Proof of Theorem 1.1

Before proceeding to the proof of Theorem 1.1, we observe that with our choice of test functions, the result in Theorem 3.1 may be written as

$$ \biggl\vert P \biggl(x-\frac{1}{\sigma} < W^{*} \leq x \biggr) - P \biggl( x- \frac{1}{\sigma} < Z \leq x \biggr) \biggr\vert \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}}. $$
(4.1)

Specializing to values of x of the form \(x = \frac{k-\mu}{\sigma}\), \(k \in [0,n]\cap \mathbb{Z}\), (4.1) becomes

$$ \bigl\vert P \bigl(k-1 < \sigma W^{*} + \mu \leq k \bigr) - P (k-1 < \sigma Z + \mu \leq k ) \bigr\vert \leq \frac{Ce^{- \vert \frac{k-\mu}{\sigma} \vert }}{\sigma ^{2}}. $$
(4.2)

We define the integer-valued random variables \(Z_{\mu ,\sigma ^{2}}\) and \(W^{*}_{\mu ,\sigma ^{2}}\), which are discretizations of \(\sigma Z + \mu \) and \(\sigma W^{*} + \mu \), respectively, as

$$\begin{aligned} &P(Z_{\mu ,\sigma ^{2}} = k)= P \biggl(\frac{k-\mu -1}{\sigma} < Z \leq \frac{k-\mu}{\sigma} \biggr), \quad k\in \mathbb{Z}, \end{aligned}$$
(4.3)
$$\begin{aligned} &P\bigl(W^{*}_{\mu ,\sigma ^{2}} = k\bigr)= P \biggl( \frac{k-\mu -1}{\sigma} < W^{*} \leq \frac{k-\mu}{\sigma} \biggr), \quad k\in \mathbb{Z}. \end{aligned}$$
(4.4)
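
These discretizations are straightforward to compute; for instance (a minimal sketch, ours, assuming SciPy, with μ and σ chosen arbitrarily):

```python
# The pmf of Z_{mu,sigma^2} in (4.3): N(mu, sigma^2) mass of (k-1, k] placed at k.
from scipy.stats import norm

mu, sigma = 12.5, 2.0

def pmf_Z(k):
    return norm.cdf((k - mu) / sigma) - norm.cdf((k - 1 - mu) / sigma)

print(sum(pmf_Z(k) for k in range(-30, 60)))       # ~1.0: the intervals tile the line
print(round(pmf_Z(12), 5), round(pmf_Z(13), 5))    # masses near the mean
```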

The result of Theorem 3.1, specialized to \(x=(k-\mu )/\sigma \), \(k \in [0,n]\cap \mathbb{Z}\), may then be written as

$$ \bigl\vert P(Z_{\mu ,\sigma ^{2}} = k) - P\bigl(W^{*}_{\mu ,\sigma ^{2}} = k\bigr) \bigr\vert \leq \frac{Ce^{- \vert \frac{k-\mu}{\sigma} \vert }}{\sigma ^{2}}. $$
(4.5)

In Sect. 3.1, our main reason for working with the normalized sums W, rather than with the raw sums S, was to allow us to use the zero-bias framework of [21]. It is also more straightforward to derive properties of the solution to the simple Stein equation (2.2) compared to the general form in (3.1). We have now translated the results of Theorem 3.1, regarding the centered random variables W and Z, to statements about the uncentered random variables \(Z_{\mu ,\sigma ^{2}}\) and \(W^{*}_{\mu ,\sigma ^{2}}\) in (4.5).

For the remainder of this section, we will work directly with the raw uncentered sums S rather than with W. Now for fixed k, we define the test function \(g_{k}\) by \(g_{k}(w)= \mathbf{1}_{(k-1, k]}(w)\), so that for \(k\in \mathbb{Z}\), \(\mathbb{E}g_{k}(S) = P(S = k)\).

Lemma 4.1

If \(Z\sim N(\mu , \sigma ^{2})\) and \(g_{k}(w)= \mathbf{1}_{(k-1, k]}(w)\), then \(\mathbb{E}g_{k}(Z) = (\sigma \sqrt{2\pi})^{-1} \exp \{- \frac{(k-\mu )^{2}}{2\sigma ^{2}} \} + R\), where

$$\begin{aligned} \textit{(a)}\quad & \vert R \vert \leq \frac{1}{\sigma ^{2}\sqrt{2\pi e}}, \quad \textit{and} \\ \textit{(b)} \quad & \vert R \vert \leq \frac{C}{\sigma ^{2}}e^{- \vert \frac{k-\mu}{\sigma} \vert } \end{aligned}$$

for some positive absolute constant C.

Proof

By the mean value theorem for integrals we have

$$ Ng_{k}:= \mathbb{E}g_{k}(Z) = \int _{k-1}^{k} \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(t-\mu )^{2}}{2\sigma ^{2}}}\,dt = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(c-\mu )^{2}}{2\sigma ^{2}}} $$

for some \(c\in (k-1,k)\). Then with ϕ the \(N(\mu , \sigma ^{2})\) density function, by the mean value theorem

$$ \biggl\vert \frac{1}{\sigma \sqrt{2\pi}}e^{- \frac{(k-\mu )^{2}}{2\sigma ^{2}}} - \frac{1}{\sigma \sqrt{2\pi}}e^{- \frac{(c-\mu )^{2}}{2\sigma ^{2}}} \biggr\vert \leq \bigl\vert \phi '(d) \bigr\vert $$

for some \(d\in (c,k)\), as \(|k-c| \leq 1\). As the maximum absolute value of the gradient on the normal \(N(\mu , \sigma ^{2})\) density curve is \(1/(\sigma ^{2}\sqrt{2\pi e})\), this completes the proof of (a).

For (b), since \(|te^{-t^{2}/2}| \leq 2.21e^{-|t|}\) for all \(t\in \mathbb{R}\), we have

$$ \vert R \vert \leq \bigl\vert \phi '(d) \bigr\vert = \frac{1}{\sigma ^{2}\sqrt{2\pi}} \biggl\vert \biggl( \frac{d-\mu}{\sigma} \biggr)e^{-\frac{1}{2} ( \frac{d-\mu}{\sigma} )^{2} } \biggr\vert \leq \frac{C}{\sigma ^{2}}e^{- \vert \frac{d-\mu}{\sigma} \vert }, $$

which completes the proof as \(d \in (k-1, k)\). □
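
The bound in part (a) is easy to confirm numerically; in the sketch below (ours; it assumes SciPy, and the parameters are arbitrary), the worst remainder over k sits below \(1/(\sigma ^{2}\sqrt{2\pi e})\):

```python
# Check |R| = |Ng_k - density term| <= 1/(sigma^2 sqrt(2 pi e)) over a range of k.
import math
from scipy.stats import norm

mu, sigma = 30.0, 3.0
bound = 1 / (sigma**2 * math.sqrt(2 * math.pi * math.e))
worst = 0.0
for k in range(0, 61):
    Ngk = norm.cdf((k - mu) / sigma) - norm.cdf((k - 1 - mu) / sigma)
    density = math.exp(-(k - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    worst = max(worst, abs(Ngk - density))
print(worst, "<=", bound)  # the worst-case remainder respects the Lemma 4.1(a) bound
```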

Remark 5

We will use part (b) of Lemma 4.1 to obtain the nonuniform bound in Theorem 1.1; if only a uniform bound is required, then part (a) suffices.

We now give the proof of Theorem 1.1.

Proof

With \(Ng_{k}\) and R as in the proof of Lemma 4.1, by the triangle inequality we have

$$\begin{aligned} \biggl\vert P(S = k) - \frac{1}{\sigma \sqrt{2\pi}}e^{- \frac{(k-\mu )^{2}}{2\sigma ^{2}}} \biggr\vert \leq{}& \bigl\vert P(S = k) - Ng_{k} \bigr\vert + \vert R \vert \\ \leq{}& \bigl\vert P(S = k) - P\bigl(S^{(I)} + 1 = k\bigr) \bigr\vert \end{aligned}$$
(4.6)
$$\begin{aligned} & {}+ \vert P\bigl(S^{(I)} + 1 = k\bigr) - Ng_{k} \vert \\ & {}+ \frac{C}{\sigma ^{2}}e^{- \vert \frac{k-\mu}{\sigma} \vert }, \end{aligned}$$
(4.7)

where \(S^{(I)} = S - X_{I}\), and as in Lemma 2.1, I is a random index with distribution \(P(I=i) = \sigma ^{2}_{i}/\sigma ^{2}\).

We now consider each term, (4.6) and (4.7), in turn.

For (4.6), using Lemma 7.1 from [2], as in their proof of Theorem 7.1, we have

$$ \bigl\vert P(S=k) - P\bigl(S^{(I)} + 1 = k\bigr) \bigr\vert = \frac{1}{\sigma ^{2}} \Biggl\vert \mathbb{E} \Biggl\{ \Biggl[\sum_{i=1}^{n}(1-p_{i}) (X_{i}-p_{i}) \Biggr]\mathbf{1}(S=k) \Biggr\} \Biggr\vert . $$
(4.8)

Now, for each \(\gamma \in \mathbb{N}\), by Hölder’s inequality we have

$$\begin{aligned} \Biggl\vert \mathbb{E} \Biggl\{ \Biggl[\sum_{i=1}^{n}(1-p_{i}) (X_{i}-p_{i}) \Biggr]\mathbf{1}(S=k) \Biggr\} \Biggr\vert &\leq \mathbb{E} \Biggl\{ \Biggl\vert \sum_{i=1}^{n}(1-p_{i}) (X_{i}-p_{i}) \Biggr\vert \mathbf{1}(S=k) \Biggr\} \\ &\leq \Biggl(\mathbb{E} \Biggl\vert \sum_{i=1}^{n}(1-p_{i}) (X_{i}-p_{i}) \Biggr\vert ^{2\gamma} \Biggr)^{1/2\gamma}\bigl(P(S=k)\bigr)^{1-1/2\gamma}. \end{aligned}$$
(4.9)

Arguing as in the proof of Lemma 3.3, we find

$$ \Biggl(\mathbb{E} \Biggl\vert \sum_{i=1}^{n}(1-p_{i}) (X_{i}-p_{i}) \Biggr\vert ^{2 \gamma} \Biggr)^{1/2\gamma} \leq p(2\gamma )^{1/2\gamma}\sigma , $$
(4.10)

where \(p(2\gamma )\) is the number of partitions of 2γ.

Now we use the concentration inequality (3.14) to bound \(P(S=k)\). In the case \(k\neq \mu \), we may choose \(\epsilon \in (0,1)\) sufficiently small so that \((k-\epsilon /2-\mu )/\sigma \) and \((k+\epsilon /2-\mu )/\sigma \) are of the same sign. In the case that \((k-\epsilon /2-\mu )/\sigma > 0\), we have

$$\begin{aligned} P(S=k) &= P(k-\epsilon /2 \leq S \leq k+\epsilon /2) = P \biggl( \frac{k-\epsilon /2-\mu}{\sigma} \leq W \leq \frac{k+\epsilon /2-\mu}{\sigma} \biggr) \\ & \leq C \biggl(\frac{\epsilon}{\sigma} + \frac{1}{\sigma} \biggr)e^{- (\frac{k-\epsilon /2-\mu}{\sigma} )}, \end{aligned}$$

and so letting \(\epsilon \to 0^{+}\), we get \(P(S=k) \leq C\sigma ^{-1}e^{- (\frac{k-\mu}{\sigma} )} \).

In the case that \((k+\epsilon /2-\mu )/\sigma < 0\), from (3.18) we get that

$$ P(S=k) \leq C \biggl(\frac{\epsilon}{\sigma} + \frac{1}{\sigma} \biggr)e^{- |\frac{k+\epsilon /2-\mu}{\sigma} |}, $$

so that \(P(S=k) \leq C\sigma ^{-1}e^{- |\frac{k-\mu}{\sigma} |}\).

Finally, in the case \(k=\mu \), we have from (3.13) that for each \(\epsilon \in (0,1)\), \(P(S=k) \leq C\{(\epsilon + 1)/\sigma \}\), and so \(P(S=k) \leq C\sigma ^{-1} = C\sigma ^{-1}e^{- | \frac{k-\mu}{\sigma} |}\).

Hence for each k, we have

$$ P(S=k) \leq \frac{Ce^{- |\frac{k-\mu}{\sigma} |}}{\sigma}. $$
(4.11)

From (4.8) and (4.9) and our bounds (4.10) and (4.11) we get, upon letting \(\gamma \to \infty \) and using the fact that \(\lim_{\gamma \to \infty}p(2\gamma )^{1/2\gamma}=1\) [22, Sect. 6.4], that (4.6) may be bounded as

$$ \bigl\vert P(S=k) - P\bigl(S^{(I)} + 1 =k\bigr) \bigr\vert \leq \frac{Ce^{- \vert \frac{k-\mu}{\sigma} \vert }}{\sigma ^{2}}. $$

Now we consider (4.7). Recalling that \(W=\sum_{i=1}^{n}\xi _{i}\), where \(\xi _{i} = (X_{i}-p_{i})/\sigma \), and \(W^{*} \overset{d}{=} W-\xi _{I} + \xi _{I}^{*}\) with \(\xi _{i}^{*} \sim (U-p_{i})/\sigma \) and U uniform on \([0, 1]\), we see that

$$ P\bigl(W^{*}_{\mu ,\sigma ^{2}} = k\bigr) = P\bigl(k-1 < \sigma W^{*} + \mu \leq k\bigr) =P\bigl(k-1 < S^{(I)} + U \leq k \bigr) = P\bigl(S^{(I)} = k-1\bigr), $$

and so \(S^{(I)} + 1 \overset{d}{=} W^{*}_{\mu ,\sigma ^{2}}\). As we clearly also have \(P(Z_{\mu ,\sigma ^{2}} = k) = Ng_{k}\), (4.5) implies that

$$ \bigl\vert P\bigl(S^{(I)} + 1 = k\bigr) - Ng_{k} \bigr\vert \leq \frac{Ce^{- \vert \frac{k-\mu}{\sigma} \vert }}{\sigma ^{2}}, $$

completing the proof. □

Data availability

Not applicable.

References

  1. Stein, C.: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proc. Sixth Berkeley Symp. Math. Statist. Probab., vol. 2, pp. 583–602. Univ. California Press, California (1972)

  2. Chen, L.H.Y., Goldstein, L., Shao, Q.-M.: Normal Approximation by Stein’s Method. Probability and Its Applications. Springer, Heidelberg (2011)

  3. Chen, L.H.Y., Shao, Q.-M.: A non-uniform Berry–Esseen bound via Stein’s method. Probab. Theory Relat. Fields 120, 236–254 (2001)

  4. Bikelis, A.: An estimate of the remainder in a combinatorial central limit theorem. Liet. Mat. Rink. 6, 323–346 (1966) (in Russian)

  5. McDonald, D.: The local limit theorem: a historical perspective. J. Iran. Stat. Soc. 4, 73–86 (2005)

  6. DeMoivre, A.: The Doctrine of Chances: Or, a Method of Calculating the Probabilities of Events in Play, 2nd edn. Woodfall, London (1738)

  7. Laplace, P.S.: Théorie Analytique des Probabilités. Courcier, Paris (1812)

  8. Zolotukhin, A., Nagaev, S., Chebotarev, V.: On a bound of the absolute constant in the Berry–Esseen inequality for i.i.d. Bernoulli random variables. Mod. Stoch. Theory Appl. 5, 385–410 (2018)

  9. Petrov, V.V.: Sums of Independent Random Variables. de Gruyter, Berlin (1975)

  10. Siripraparat, T., Neammanee, K.: A local limit theorem for Poisson binomial random variables. ScienceAsia (2021)

  11. Siripraparat, T., Neammanee, K.: An improvement of convergence rate in the local limit theorem for integral-valued random variables. J. Inequal. Appl. 2021, 57 (2021)

  12. Barbour, A.D., Röllin, A., Ross, N.: Error bounds in local limit theorems using Stein’s method. Bernoulli 25, 1076–1104 (2019)

  13. Fang, X.: Discretized normal approximation by Stein’s method. Bernoulli 20, 1404–1431 (2014)

  14. Barbour, A.D., Choi, K.: A non-uniform bound for translated Poisson approximation. Electron. J. Probab. 9, 18–36 (2004)

  15. Poisson, S.D.: Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile: Précédées des Règles Générales du Calcul des Probabilités. Bachelier, Paris (1837)

  16. Duffie, D., Saita, L., Wang, K.: Multi-period corporate default prediction with stochastic covariates. J. Financ. Econ. 83, 635–665 (2007)

  17. Boland, P.J., Proschan, F.: The reliability of k out of n systems. Ann. Probab. 11, 760–764 (1983)

  18. Broderick, T., Pitman, J., Jordan, M.: Feature allocations, probability functions, and paintboxes. Bayesian Anal. 8, 801–836 (2013)

  19. Daskalakis, C., Diakonikolas, I., Servedio, R.: Learning Poisson binomial distributions. Algorithmica 72, 316–357 (2015)

  20. Tang, W., Tang, F.: The Poisson binomial distribution – old & new. Stat. Sci. 38, 108–119 (2023)

  21. Goldstein, L., Reinert, G.: Stein’s method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab. 7, 935–952 (1997)

  22. Andrews, G.E., Eriksson, K.: Integer Partitions. Cambridge University Press, Cambridge (2004)


Acknowledgements

The authors are grateful to two anonymous reviewers for helpful comments on the first draft of our manuscript and to the Ratchadapisek Somphot Fund at Chulalongkorn University for funding this work. GA also thanks the London Mathematical Society (Grant Ref: ECF-2022-01) for prior funding, which allowed him to visit KN at Chulalongkorn University to learn about Stein’s method.

Funding

This research is supported by Ratchadapisek Somphot Fund for Postdoctoral Fellowship, Chulalongkorn University.

Author information


Contributions

Both authors contributed equally to this research.

Corresponding author

Correspondence to Kritsana Neammanee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of Lemma 3.2

We will use the standard Gaussian tail inequalities

$$ \frac{we^{-w^{2}/2}}{(1+w^{2})\sqrt{2\pi}} \leq 1 -\Phi (w) \leq \frac{e^{-w^{2}/2}}{w\sqrt{2\pi}}, \quad w \geq 0. $$
(A.1)
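
These classical bounds are easily checked numerically (a quick sketch, ours, assuming SciPy):

```python
# Sanity check of the Gaussian tail inequalities (A.1) at a few points w > 0.
import math
from scipy.stats import norm

for w in (0.5, 1.0, 2.0, 4.0):
    lower = w * math.exp(-w * w / 2) / ((1 + w * w) * math.sqrt(2 * math.pi))
    upper = math.exp(-w * w / 2) / (w * math.sqrt(2 * math.pi))
    tail = norm.sf(w)  # 1 - Phi(w)
    print(w, lower <= tail <= upper)  # True at each point
```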

We also observe that by (3.3) we may write \(f'_{x}\) as

$$ f'_{x}(w) = we^{w^{2}/2} \int _{-\infty}^{w} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt + \mathbf{1}_{(x-\frac{1}{\sigma}, x]}(w) - Nh_{x} $$
(A.2)
$$ \hphantom{f'_{x}(w)} = -we^{w^{2}/2} \int _{w}^{\infty} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt + \mathbf{1}_{(x-\frac{1}{\sigma}, x]}(w) - Nh_{x}. $$
(A.3)

Proof

(a) In this case, \(\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(w)=1\).

Case 1: \(w \geq 0\).

For all \(t\in \mathbb{R}\), we have

$$ -Nh_{x} e^{-t^{2}/2} \leq \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2} \leq (1-Nh_{x}) e^{-t^{2}/2}, $$
(A.4)

so that integrating over \([w, \infty )\) gives

$$ -(\sqrt{2\pi}) \bigl[1 - \Phi (w) \bigr]Nh_{x} \leq \int _{w}^{\infty} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt \leq (\sqrt{2\pi}) \bigl[1 - \Phi (w) \bigr](1-Nh_{x}). $$

From (A.3) we get

$$\begin{aligned} & -(\sqrt{2\pi})w e^{w^{2}/2} \bigl[1 - \Phi (w) \bigr](1-Nh_{x}) + (1-Nh_{x}) \\ &\quad \leq f_{x}'(w) \leq (\sqrt{2\pi})w e^{w^{2}/2} \bigl[1 - \Phi (w) \bigr]Nh_{x} + (1-Nh_{x}), \end{aligned}$$

from which (A.1) gives \(0 \leq f_{x}'(w) \leq 1\).

Case 2: \(w < 0\).

Integrating (A.4) over \((-\infty , w)\) gives

$$ -(\sqrt{2\pi})\Phi (w)Nh_{x} \leq \int _{-\infty}^{w} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt \leq (\sqrt{2\pi})\Phi (w) (1-Nh_{x}), $$

implying

$$\begin{aligned} -(\sqrt{2\pi}) \bigl[1 - \Phi \bigl( \vert w \vert \bigr) \bigr]Nh_{x} &\leq \int _{-\infty}^{w} \bigl[\mathbf{1}_{(x-\frac{1}{\sigma}, x]}(t) - Nh_{x} \bigr] e^{-t^{2}/2}\,dt \end{aligned}$$
(A.5)
$$\begin{aligned} &\leq (\sqrt{2\pi}) \bigl[1 - \Phi \bigl( \vert w \vert \bigr) \bigr](1-Nh_{x}). \end{aligned}$$
(A.6)

Now using (A.2) and recalling \(w < 0\), (A.5) and (A.6) give

$$\begin{aligned} & (\sqrt{2\pi})w e^{w^{2}/2} \bigl[1 - \Phi \bigl( \vert w \vert \bigr) \bigr](1-Nh_{x}) + (1-Nh_{x}) \\ &\quad \leq f_{x}'(w) \leq -(\sqrt{2\pi})w e^{w^{2}/2} \bigl[1-\Phi \bigl( \vert w \vert \bigr) \bigr]Nh_{x} + (1-Nh_{x}) \end{aligned}$$

or

$$\begin{aligned} & -(\sqrt{2\pi}) \vert w \vert e^{w^{2}/2} \bigl[1 - \Phi \bigl( \vert w \vert \bigr) \bigr](1-Nh_{x}) + (1-Nh_{x}) \\ &\quad \leq f_{x}'(w) \leq (\sqrt{2\pi}) \vert w \vert e^{w^{2}/2} \bigl[1-\Phi \bigl( \vert w \vert \bigr) \bigr]Nh_{x} + (1-Nh_{x}), \end{aligned}$$

and so applying (A.1) gives \(0 \leq f_{x}'(w) \leq 1\).

(b) From (3.6) we see that \(f_{x}\) is piecewise continuous and is easily checked to be continuous at \(w=x-1/\sigma \) and \(w=x\). From (3.3) and (3.6) we get \(f'_{x}(w) = (\sqrt{2\pi})Nh_{x} we^{w^{2}/2}[1-\Phi (w)] - Nh_{x}\) for \(w > x\), and this is ≤0 for \(w < 0\) and also, by (A.1), for \(w\geq 0\). Hence \(f_{x}\) is decreasing on \((x, \infty )\). A similar argument shows that \(f_{x}\) is decreasing on \((-\infty , x-1/\sigma ]\). The fact that \(f_{x}\) is increasing over \((x-1/\sigma , x]\) follows from (a).

(c) Using (A.1), we have \(\lim_{w\to \infty}f_{x}(w) = \lim_{w\to -\infty}f_{x}(w) =0\). From parts (a) and (b) we know that \(f_{x}\) is continuous and increasing on the interval \(w \in (x-1/\sigma , x]\) and decreasing otherwise. It follows that the global maximum and minimum of \(f_{x}\) occur at \(w=x\) and \(w=x-1/\sigma \), respectively, and so

$$ f_{x}(x-1/\sigma ) \leq f_{x}(w) \leq f_{x}(x) \quad \text{for all } w\in \mathbb{R}. $$

By Lemma 3.1, to show that \(|f_{x}(w)| \leq 1/\sigma \), we may assume that \(x\geq 0\). We first obtain an upper bound for \(f_{x}\). We know that the global maximum of \(f_{x}\) occurs at \(w = x\) and equals \(f_{x}(x) = \sqrt{2\pi}Nh_{x} e^{x^{2}/2}[1 - \Phi (x)]\) and that

$$ 0 \leq f_{x}(x) \leq \biggl( \int _{x-1/\sigma}^{x} e^{-t^{2}/2}\,dt \biggr) e^{x^{2}/2}. $$
(A.7)

Case 1: \(x - \frac{1}{\sigma} \geq 0\).

Subcase 1.1: \(0 \leq x \leq \sigma /2\).

From (A.7) and the fact that \(1-\Phi (x) \leq 1/2\) we have

$$ 0 \leq f_{x}(x) \leq \frac{e^{-\frac{1}{2}(x - \frac{1}{\sigma})^{2}}e^{x^{2}/2}}{2\sigma} = \frac{e^{x/\sigma}e^{-\frac{1}{2\sigma ^{2}}}}{2\sigma} \leq \frac{e^{1/2}}{2\sigma} \leq \frac{1}{\sigma}. $$

Subcase 1.2: \(x > \sigma /2\).

As \(1 - \Phi (x) \leq (x\sqrt{2\pi})^{-1}e^{-x^{2}/2}\) and \(Nh_{x} = (\sqrt{2\pi})^{-1}\int _{x-1/\sigma}^{x}e^{-\frac{1}{2}t^{2}}\,dt \leq (\sigma \sqrt{2\pi})^{-1}\), we have

$$ 0 \leq f_{x}(x) \leq \frac{Nh_{x}}{x} \leq \frac{2Nh_{x}}{\sigma} \leq \frac{2}{\sigma ^{2}\sqrt{2\pi}} \leq \frac{1}{\sigma} $$

as \(\sigma ^{2} \geq 1\).

Case 2: \(x- \frac{1}{\sigma} \leq 0 \leq x\), i.e., \(0 \leq x \leq 1/\sigma \). In this case,

$$ 0 \leq f_{x}(x) \leq \sqrt{2\pi}Nh_{x} e^{x^{2}/2}\bigl(1-\Phi (x)\bigr) \leq \frac{e^{1/2\sigma ^{2}}}{2\sigma} \leq \frac{1}{\sigma} $$

as \(\sigma \geq 1\).

Now we obtain a lower bound for \(f_{x}\). The global minimum of \(f_{x}\) occurs at \(w = x - 1/\sigma \), and we have

$$ 0 > f_{x}(x - 1/\sigma ) = -\sqrt{2\pi}Nh_{x} e^{\frac{1}{2}(x - \frac{1}{\sigma})^{2}}\Phi (x - 1/\sigma ). $$

Case 1: \(x - \frac{1}{\sigma} \geq 0\).

In this case,

$$ Nh_{x} = \frac{1}{\sqrt{2\pi}} \int _{x-1/\sigma}^{x}e^{-\frac{1}{2}t^{2}}\,dt \leq \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{1}{2}(x-\frac{1}{\sigma})^{2}}, $$

and so

$$ \bigl\vert f_{x}(x-1/\sigma ) \bigr\vert \leq \frac{1}{\sigma}. $$

Case 2: \(x- \frac{1}{\sigma} \leq 0 \leq x\). In this case, using \(Nh_{x} \leq (\sigma \sqrt{2\pi})^{-1}\) and \(\Phi (x - 1/\sigma ) \leq 1/2\), we have

$$ \bigl\vert f_{x}(x-1/\sigma ) \bigr\vert \leq \frac{e^{\frac{1}{2}(x-1/\sigma )^{2}}}{2\sigma} \leq \frac{e^{1/2\sigma ^{2}}}{2\sigma} \leq \frac{e^{1/2}}{2\sigma} \leq \frac{1}{\sigma}. $$

Considering all cases above, we see that for each fixed \(x\in \mathbb{R}\), we have

$$ \bigl\vert f_{x}(w) \bigr\vert \leq \frac{1}{\sigma}, \quad w\in \mathbb{R}. $$

(d) We have \(Nh_{x} = (\sqrt{2\pi})^{-1}\int _{x-1/\sigma}^{x}e^{-t^{2}/2}\,dt\).

Case 1: \(x \geq 1\).

In this case, \(Nh_{x} \leq (\sigma \sqrt{2\pi})^{-1}e^{-\frac{1}{2}(x-1/\sigma )^{2}}\), and since \(\sigma \geq 1\) and \(x \geq 1\), we have \(e^{-\frac{1}{2}(x-1/\sigma )^{2}} = e^{-\frac{1}{2\sigma ^{2}}}e^{-x^{2}/2+x/\sigma} \leq e^{-x^{2}/2+x} \leq Ce^{-x}\).

Case 2: \(x \leq -1\).

In this case, \(Nh_{x} \leq (\sigma \sqrt{2\pi})^{-1}e^{-x^{2}/2}\) and \(e^{-x^{2}/2} \leq Ce^{-|x|}\).

Case 3: \(|x| < 1\).

In this case, as \(e^{-|x|} \in (e^{-1}, 1)\) and \(Nh_{x} \leq (\sigma \sqrt{2\pi})^{-1}\) for all x, it follows that \(Nh_{x} \leq \frac{Ce^{-|x|}}{\sigma}\) for \(|x| < 1\). □

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Auld, G., Neammanee, K. A nonuniform local limit theorem for Poisson binomial random variables via Stein’s method. J Inequal Appl 2024, 10 (2024). https://doi.org/10.1186/s13660-024-03087-4
