- Research Article
- Open Access
Bounds for Tail Probabilities of the Sample Variance
© V. Bentkus and M. Van Zuijlen. 2009
- Received: 11 February 2009
- Accepted: 20 June 2009
- Published: 9 August 2009
We provide bounds for tail probabilities of the sample variance. The bounds are expressed in terms of Hoeffding functions and are the sharpest known. They are designed with applications in auditing, as well as in the processing of environmental data, in mind.
- Convex Function
- Central Limit Theorem
- Sample Variance
- Elementary Calculation
- Point Distribution
for the mean, variance, and fourth central moment of , and assume that . Some of our results hold only for bounded random variables; in such cases we assume, without loss of generality, that . Note that is a natural condition in audit applications.
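For concreteness, the sample quantities involved can be computed as below. This is a minimal Python sketch under stated assumptions. The formulas were lost from this copy of the text, so the following are illustrative choices rather than the paper's definitions: the 1/(n-1) normalization of the sample variance, the 1/n normalization of the empirical fourth central moment, the function name `sample_moments`, and the toy data.

```python
from typing import Sequence


def sample_moments(xs: Sequence[float]) -> tuple[float, float, float]:
    """Return (mean, sample variance, empirical fourth central moment).

    Assumed normalizations: 1/(n-1) for the sample variance (unbiased form)
    and 1/n for the empirical fourth central moment.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    s2 = sum(d * d for d in dev) / (n - 1)
    m4 = sum(d ** 4 for d in dev) / n
    return mean, s2, m4


# Hypothetical audit-style data: values in [0, 1], mostly zero.
data = [0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 1.0, 0.0]
print(sample_moments(data))
```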
The paper is organized as follows. In the Introduction we describe the bounds and give comments and references. In Section 2 we obtain sharp upper bounds for the fourth moment. In Section 3 we prove all facts and results stated in the Introduction.
The restriction on the range of in (1.4) (resp., in (1.5) in cases where the condition is fulfilled) is natural. Indeed, for , due to the obvious inequality . Furthermore, in the case of we have for since (see Proposition 2.3 for a proof of the latter inequality).
Let us note that the known bounds (1.19)–(1.21) are the best possible within an approach based on analysis of the variance, the use of exponential functions, and an inequality of Hoeffding (see (3.3)), which allows one to reduce the problem to estimating tail probabilities for sums of independent random variables. Our improvement comes from a careful analysis of the fourth moment, which turns out to be quite complicated; see Section 2. Briefly, the results of this paper are as follows: we prove a general bound involving , , and the fourth moment ; this general bound implies all other bounds, in particular a new precise bound involving and ; we also provide bounds for lower tails ; and we compare the bounds analytically, mostly when is sufficiently large.
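The role of the fourth moment can be seen from a classical identity. This is a standard fact, not a result of this paper. For the unbiased sample variance s^2 of n i.i.d. observations with variance sigma^2 and fourth central moment mu_4, Var(s^2) = (mu_4 - sigma^4 (n-3)/(n-1)) / n. The sketch below checks this identity against an exact computation for Bernoulli data. The helper names are hypothetical.

```python
from math import comb


def var_of_sample_variance(sigma2: float, mu4: float, n: int) -> float:
    """Classical identity for the unbiased sample variance of n i.i.d. values:
    Var(s^2) = (mu4 - sigma^4 * (n - 3) / (n - 1)) / n."""
    return (mu4 - sigma2 ** 2 * (n - 3) / (n - 1)) / n


def bernoulli_var_s2_exact(p: float, n: int) -> float:
    """Var(s^2) for n i.i.d. Bernoulli(p) values, by exact summation over the
    number k of ones (for 0/1 data, s^2 = k(n - k) / (n(n - 1)))."""
    weights = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    values = [k * (n - k) / (n * (n - 1)) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(weights, values))
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values))


def bernoulli_moments(p: float) -> tuple[float, float]:
    """Variance and fourth central moment of a Bernoulli(p) variable."""
    return p * (1 - p), p * (1 - p) * (1 - 3 * p + 3 * p ** 2)


p, n = 0.05, 10
sigma2, mu4 = bernoulli_moments(p)
print(var_of_sample_variance(sigma2, mu4, n), bernoulli_var_s2_exact(p, n))
```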
From the mathematical point of view, the sample variance is one of the simplest nonlinear statistics. Known bounds for tail probabilities are designed with linear statistics in mind, possibly also for dependent observations. See the seminal paper of Hoeffding [1] published in JASA; for further developments see Talagrand [3], Pinelis [4, 5], Bentkus [6, 7], Bentkus et al. [8, 9], and others. Our intention is to develop tools useful in the setting of nonlinear statistics, using the sample variance as a test statistic.
Theorem 1.1 extends and improves the known bounds (1.19)–(1.21). We can derive (1.19)–(1.21) from this theorem since we can estimate the fourth moment via various combinations of and using the boundedness assumption .
To derive upper confidence bounds we need only estimates of the upper tail (see [2]). To estimate the upper tail, the condition is sufficient. The lower tail behaves differently: to estimate it, we really do need the assumption that is a bounded random variable.
For Theorem 1.1 implies the known bounds (1.19)–(1.21) for the upper tail of . It also implies the bounds (1.26)–(1.29) for the lower tail. The lower tail has a somewhat more complicated structure (compare (1.26)–(1.29) with their upper-tail counterparts (1.19)–(1.21)).
of survival functions (cf. definitions (1.13) and (1.14) of the related Hoeffding functions). Bounds expressed in terms of Hoeffding functions have a simple analytical structure and are easy to compute numerically.
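The sketch below illustrates how such functions are evaluated. It implements the classical bound of Hoeffding [1] for means of independent variables with values in [0, 1] and mean p. The bound is written as exp(-n K(p + t, p)), where K is the Bernoulli Kullback-Leibler divergence. The sketch also compares the bound with an exact binomial tail. Definitions (1.13) and (1.14) are not reproduced in this copy, so this parametrization is an assumption and may differ from the article's Hoeffding functions. The function names are hypothetical.

```python
from math import comb, exp, log


def kl_bernoulli(a: float, p: float) -> float:
    """Kullback-Leibler divergence K(a, p) between Bernoulli(a) and Bernoulli(p)."""
    out = 0.0
    if a > 0:
        out += a * log(a / p)
    if a < 1:
        out += (1 - a) * log((1 - a) / (1 - p))
    return out


def hoeffding_bound(n: int, p: float, t: float) -> float:
    """Hoeffding's bound for P(mean - p >= t), values in [0, 1], 0 < t < 1 - p:
    [(p/(p+t))^(p+t) * ((1-p)/(1-p-t))^(1-p-t)]^n = exp(-n K(p + t, p))."""
    return exp(-n * kl_bernoulli(p + t, p))


def binomial_upper_tail(n: int, p: float, k0: int) -> float:
    """Exact P(Binomial(n, p) >= k0)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k0, n + 1))


n, p, k0 = 20, 0.1, 6
print(binomial_upper_tail(n, p, k0), hoeffding_bound(n, p, k0 / n - p))
```

Bernoulli variables are the natural test case here. They are the extremal two-point laws for which the bound is derived.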
We give these constants for all our bounds, together with their numerical values in the following two cases.
Our new bounds substantially improve the known bounds. However, from the asymptotic point of view they still seem rather crude, and improving them further requires new methods and approaches. Preliminary computer simulations show that the asymptotic behavior has little to do with the behavior for small . This holds in applications where is finite and the random variables have small means and variances (as in auditing, where a typical value of is ). Therefore, bounds specifically designed for the case of finite have to be developed.
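The gap between asymptotic and finite-sample behavior for small means can be illustrated on a toy model. The sketch assumes Bernoulli(p) observations with small p as a stand-in for audit data. This is a hypothetical setup, not the authors' simulations. It compares the exact upper tail of s^2 with a normal (CLT-style) approximation that uses the exact mean and standard deviation of s^2.

```python
from math import comb, erfc, sqrt


def s2_law_bernoulli(p: float, n: int) -> list[tuple[float, float]]:
    """Exact law of s^2 (1/(n-1) normalization) for n i.i.d. Bernoulli(p)
    values: with k ones, s^2 = k(n - k) / (n(n - 1)). Returns (value, prob)."""
    return [(k * (n - k) / (n * (n - 1)), comb(n, k) * p ** k * (1 - p) ** (n - k))
            for k in range(n + 1)]


def exact_upper_tail(p: float, n: int, x: float) -> float:
    """Exact P(s^2 >= x); a tiny tolerance guards against rounding at atoms."""
    return sum(w for v, w in s2_law_bernoulli(p, n) if v >= x - 1e-12)


def normal_upper_tail(p: float, n: int, x: float) -> float:
    """Normal approximation of P(s^2 >= x) with the exact mean and sd of s^2."""
    law = s2_law_bernoulli(p, n)
    mean = sum(v * w for v, w in law)
    sd = sqrt(sum((v - mean) ** 2 * w for v, w in law))
    return 0.5 * erfc((x - mean) / (sd * sqrt(2.0)))


# Small mean, as in audit data (hypothetical parameters).
p, n = 0.01, 50
for x in (1e-9, 0.02, 0.04):
    print(x, exact_upper_tail(p, n, x), normal_upper_tail(p, n, x))
```

With these parameters, s^2 equals zero with probability (1 - p)^n, roughly 0.6. A symmetric normal law cannot reproduce this atom.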
Recall that we consider bounded random variables such that , and that we write and . In Lemma 2.1 we provide an optimal upper bound for the fourth moment of given a shift , a mean , and a variance . The maximizers of the fourth moment are either Bernoulli or trinomial random variables. It turns out that their distributions, say , are of the following three types (i)–(iii):
We omit the elementary calculations leading to (2.17); they reduce to solving systems of linear equations.
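The linear systems mentioned above can be illustrated as follows. Take a candidate support {a, b, c} in [0, 1]. The probabilities that match a prescribed mean and variance solve a 3 x 3 Vandermonde system, which has the closed Lagrange form used below. A brute-force grid search over supports then approximates the largest shifted fourth moment. This is a sketch under several assumptions:

- The shifted fourth moment is read as E(X - shift)^4.
- The names `trinomial_probs` and `max_shifted_fourth_moment` are hypothetical.
- The search is not the closed-form bound of Lemma 2.1.

Restricting to at most three support points agrees with the standard fact that, under two moment constraints, extremal distributions need at most three atoms.

```python
from itertools import combinations


def trinomial_probs(a, b, c, mean, var):
    """Probabilities of a law on distinct points {a, b, c} with given mean/variance.

    Lagrange form of the Vandermonde system:
    P(X = a) = E[(X - b)(X - c)] / ((a - b)(a - c)), and cyclically.
    Returns None if some probability is negative (no such law on {a, b, c}).
    """
    m2 = var + mean ** 2  # second raw moment

    def lagrange(x, y, z):
        return (m2 - (y + z) * mean + y * z) / ((x - y) * (x - z))

    probs = (lagrange(a, b, c), lagrange(b, a, c), lagrange(c, a, b))
    if min(probs) < -1e-12:
        return None
    return probs


def max_shifted_fourth_moment(mean, var, shift, grid=41):
    """Grid search over 3-point supports in [0, 1] for the largest
    E(X - shift)^4 among laws with the given mean and variance."""
    points = [i / (grid - 1) for i in range(grid)]
    best = (-1.0, None)
    for a, b, c in combinations(points, 3):
        probs = trinomial_probs(a, b, c, mean, var)
        if probs is None:
            continue
        value = sum(q * (x - shift) ** 4 for q, x in zip(probs, (a, b, c)))
        if value > best[0]:
            best = (value, ((a, b, c), probs))
    return best


value, (support, probs) = max_shifted_fourth_moment(0.3, 0.1, 0.0)
print(value, support, probs)
```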
which proves the lemma.
To prove Theorems 1.1 and 1.3 we apply Lemma 2.1 with . We state the bounds of interest as Corollary 2.2. To prove the corollary, it suffices to plug in Lemma 2.1 and, using (2.2)–(2.7), to calculate explicitly. We omit the related elementary but cumbersome calculations. The regions , , and are defined in (1.32).
satisfies . The function is convex. To see this, it suffices to check that restricted to straight lines is convex. Any straight line can be represented as with some . The convexity of on is equivalent to the convexity of the function of the real variable . It is clear that the second derivative is nonnegative since . Thus both and are convex.
Since both and are convex, the function attains its maximal value on the boundary of . Moreover, the maximal value of is attained on the set of extremal points of , which in our case is just the set of vertices of the cube . In other words, the maximal value of is attained when each of is either or . Since is a symmetric function, we can assume that the maximal value of is attained when and with some . Using (2.28), the corresponding value of is . Maximizing with respect to we get , if is even, and , if is odd, which we can rewrite as the desired inequality .
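The extremal-point argument can be checked by brute force for small n. The function maximized in the proof is not recoverable from this copy. As an illustrative stand-in, the sketch uses the sample variance of points in [0, 1], which is also convex and symmetric. Its maximum is therefore attained at a vertex with some number k of coordinates equal to 1. Maximizing k(n - k) gives n^2/4 for even n and (n^2 - 1)/4 for odd n.

```python
from itertools import product


def sample_variance(xs):
    """Unbiased sample variance; a convex, symmetric function of (x_1, ..., x_n)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def max_over_vertices(f, n):
    """Brute-force maximum of f over the 2^n vertices of the cube [0, 1]^n."""
    return max(f(v) for v in product((0.0, 1.0), repeat=n))


def max_by_count(n):
    """By symmetry only the number k of ones matters: s^2 = k(n-k) / (n(n-1))."""
    return max(k * (n - k) for k in range(n + 1)) / (n * (n - 1))


print(max_over_vertices(sample_variance, 5), max_by_count(5))
```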
Proof of Theorem 1.1.
The proof combines three ingredients: Hoeffding's observation (3.6), applied to the representation (3.8) of as a U-statistic; Chebyshev's inequality with exponential functions; and Proposition 3.2. Let us provide more details. We have to prove (1.22) and (1.24).
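The U-statistic step can be made concrete. The sample variance equals the average of h(X_i, X_j) = (X_i - X_j)^2 / 2 over all pairs i < j. Hoeffding [1] observed that such a statistic is an average, over all permutations, of means of floor(n/2) kernel terms built from disjoint pairs, that is, of sums of independent random variables. By convexity of the exponential, this allows exponential moments to be bounded by those of such sums. The sketch checks both identities in exact rational arithmetic for small samples. That (3.6) and (3.8) take precisely this form is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def kernel(x, y):
    """Degree-2 kernel of the sample variance: h(x, y) = (x - y)^2 / 2."""
    return (x - y) ** 2 / 2


def sample_variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def u_statistic(xs):
    """Average of h over all unordered pairs i < j."""
    n = len(xs)
    total = sum(kernel(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n))
    return total * 2 / (n * (n - 1))


def hoeffding_average(xs):
    """Average over all permutations of the mean of h over k = n // 2
    disjoint consecutive pairs (each such mean is a sum of independent terms)."""
    n = len(xs)
    k = n // 2
    total = Fraction(0)
    for perm in permutations(xs):
        total += sum(kernel(perm[2 * i], perm[2 * i + 1]) for i in range(k)) / k
    return total / factorial(n)


xs = [Fraction(1, 10), Fraction(0), Fraction(7, 10), Fraction(1, 2), Fraction(1)]
print(sample_variance(xs), u_statistic(xs), hoeffding_average(xs))
```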
To see that the third equality in (3.27) holds, it suffices to change the variable by . The fourth equality holds by definition (1.13) of the Hoeffding function since is a Bernoulli random variable with mean zero and such that . The relation (3.27) proves (3.25) and (1.22).
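The identification of an optimized exponential Chebyshev bound with a Hoeffding function can be checked numerically. The sketch takes a mean-zero two-point variable Y with values -var/b and b, with probabilities chosen so that its variance is var. It minimizes exp(-hnt) (E exp(hY))^n over h > 0 by ternary search and compares the result with the closed-form two-point bound of Hoeffding [1]. The parametrization and names are assumptions made for illustration and need not match definition (1.13).

```python
from math import exp, log


def two_point_closed_form(n, t, b, var):
    """Hoeffding's two-point bound for P(S_n / n >= t), 0 < t < b, where Y takes
    b with probability var / (b^2 + var) and -var / b otherwise."""
    r = b * b + var
    u = 1 + b * t / var
    v = 1 - t / b
    return exp(-n * (u * var / r * log(u) + v * b * b / r * log(v)))


def chernoff_numeric(n, t, b, var, h_max=200.0, iters=200):
    """inf over h > 0 of exp(-h n t) * (E exp(h Y))^n, by ternary search on the
    convex function g(h) = n * (log E exp(h Y) - h t)."""
    p = var / (b * b + var)  # P(Y = b); this choice makes E Y = 0, Var Y = var

    def g(h):
        return n * (log(p * exp(h * b) + (1 - p) * exp(-h * var / b)) - h * t)

    lo, hi = 0.0, h_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return exp(g((lo + hi) / 2))


case = (10, 0.2, 1.0, 0.05)  # (n, t, b, var), hypothetical values
print(chernoff_numeric(*case), two_point_closed_form(*case))
```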
The proof of (1.24) repeats that of (1.22), replacing everywhere and by and , respectively. The inequality in (3.23) has to be replaced by , which holds due to our assumption . Accordingly, the probability is now given by (1.25).
Using the definition of the Hoeffding function we see that the right-hand sides of (3.28) and (3.31) are equal.
Proof of Theorem 1.3.
which completes the proof of (1.7) and (1.8).
Figure 1 was produced by N. Kalosha; the authors thank him for his help. The research was supported by the Lithuanian State Science and Studies Foundation, Grant no. T-15/07.
- Hoeffding W: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 1963, 58: 13–30. 10.2307/2282952
- Bentkus V, van Zuijlen M: On conservative confidence intervals. Lithuanian Mathematical Journal 2003, 43(2): 141–160. 10.1023/A:1024210921597
- Talagrand M: The missing factor in Hoeffding's inequalities. Annales de l'Institut Henri Poincaré B 1995, 31(4): 689–702.
- Pinelis I: Optimal tail comparison based on comparison of moments. In High Dimensional Probability (Oberwolfach, 1996), Progress in Probability. Volume 43. Birkhäuser, Basel, Switzerland; 1998: 297–314.
- Pinelis I: Fractional sums and integrals of r-concave tails and applications to comparison probability inequalities. In Advances in Stochastic Inequalities (Atlanta, Ga, 1997), Contemporary Mathematics. Volume 234. American Mathematical Society, Providence, RI, USA; 1999: 149–168.
- Bentkus V: A remark on the inequalities of Bernstein, Prokhorov, Bennett, Hoeffding, and Talagrand. Lithuanian Mathematical Journal 2002, 42(3): 262–269. 10.1023/A:1020221925664
- Bentkus V: On Hoeffding's inequalities. The Annals of Probability 2004, 32(2): 1650–1673. 10.1214/009117904000000360
- Bentkus V, Geuze GDC, van Zuijlen M: Trinomial laws dominating conditionally symmetric martingales. Department of Mathematics, Radboud University Nijmegen; 2005.
- Bentkus V, Kalosha N, van Zuijlen M: On domination of tail probabilities of (super)martingales: explicit bounds. Lithuanian Mathematical Journal 2006, 46(1): 3–54.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.