The mean consistency of wavelet density estimators
Journal of Inequalities and Applications volume 2015, Article number: 111 (2015)
Abstract
Wavelet estimation has made great progress when the unknown density function belongs to a certain Besov space. However, in many practical applications, one does not know whether the density function is smooth or not. It therefore makes sense to consider the mean \(L_{p}\)-consistency of wavelet estimators for \(f\in L_{p}\) (\(1\leq p\leq\infty\)). In this paper, the authors construct wavelet estimators and analyze their \(L_{p}(\mathbb{R})\) performance. Under mild conditions on the family of wavelets, the estimators are shown to be \(L_{p}\) (\(1\leq p\leq\infty\))-consistent for both the noiseless and the additive noise models.
1 Introduction
Wavelet analysis plays an important role in both pure and applied mathematics, with applications such as signal processing, image compression, and the numerical solution of equations. One of its important applications is estimating an unknown density function from random samples [1–3]. Optimal convergence rate and consistency are two basic asymptotic criteria for the quality of an estimator. Substantial results on wavelet estimation in the \(L_{p}\) norm have been obtained by Donoho et al. [4] and others when the unknown density function belongs to a Besov space. However, in many practical applications, we do not know whether the density function is smooth or not [5]. Therefore, it is natural to consider the mean consistency of wavelet estimators, which means that \(E\|f-\hat{f}_{n}\|_{p} \) (\(1\leq p\leq\infty\)) converges to zero as the sample size n tends to infinity.
In 2005, Chacón and Rodríguez-Casal [6] discussed the mean \(L_{1}\)-consistency of the wavelet estimator based on random samples without any noise. In practice, however, the observed samples are often contaminated by random noise. Devroye [7] proved the mean consistency of the kernel deconvolution estimator in the \(L_{1}\) norm. Liu and Taylor [8] investigated the \(L_{\infty}\)-consistency of the kernel estimator. Ramírez and Vidakovic [9] proposed linear and nonlinear wavelet estimators and showed that they are \(L_{2}\)-consistent.
This paper studies the mean \(L_{p}\)-consistency of the wavelet estimator. In Section 2, we briefly describe the preliminaries on wavelet scaling functions and orthogonal projection kernels. In Section 3, for the classical model, the mean \(L_{p}\)-consistency is given, which generalizes Chacón’s theorem [6]. The last section deals with the \(L_{p}\)-consistency for the additive noise model.
2 Wavelet scaling function and orthonormal projection kernel
In this section, we shall recall some useful and well-known concepts and lemmas. As usual, \(L_{p}(\mathbb{R})\), \(p \geq1\), denotes the classical Lebesgue space on the real line ℝ.
Definition 2.1
(see [10])
A multi-resolution analysis (MRA) of \(L_{2}(\mathbb{R})\) is an increasing sequence of closed linear subspaces \(V_{j}\subset V_{j+1} \), \(j \in\mathbb{Z}\), called scaling spaces, satisfying:
- (i) \(\bigcap_{j\in\mathbb{Z}}V_{j}=\{0 \}\), \(\overline{\bigcup_{j\in\mathbb{Z}}V_{j}}=L_{2}(\mathbb{R})\);
- (ii) \(f(\cdot)\in V_{0}\) if and only if \(f(2^{j}\cdot)\in V_{j}\) for all \(j \in{ \mathbb{Z}}\);
- (iii) \(f(\cdot)\in V_{0}\) if and only if \(f(\cdot-k) \in V_{0}\) for all \(k \in{ \mathbb{Z}}\);
- (iv) there exists a function \(\varphi(\cdot) \in V_{0}\) such that \(\{\varphi(\cdot-k)\}\) is an orthonormal basis in \(V_{0}\). The function \(\varphi(\cdot)\) is called the scaling function.
It is easy to show that \(\{\varphi_{jk}(x),k\in\mathbb{Z}\}\) forms an orthonormal basis of \(V_{j}\), where \(\varphi_{jk}(x)=2^{j/2}\varphi (2^{j}x-k)\), \(j,k\in\mathbb{Z}\).
Condition S
There exists a bounded nonincreasing function \(\Phi(\cdot)\) such that \(\int\Phi (|u|)\,du<\infty\), and \(|\varphi(u)|\leq\Phi(|u|)\) (a.e.).
Condition S is not very restrictive. For example, the Meyer scaling functions satisfy that condition; compactly supported and bounded scaling functions do as well. Furthermore, Condition S implies that \(\varphi\in L_{1}(\mathbb{R})\cap L_{\infty}(\mathbb{R})\) and \(\operatorname{ess} \sup\sum_{k}|\varphi(x-k)|<\infty\). We denote \(\theta_{\varphi}(x)=\sum_{k}|\varphi(x-k)|\).
The following lemmas, which will be used later on, are taken from [2].
Lemma 2.2
If the scaling function φ satisfies \(\operatorname{ess} \sup\sum_{k\in\mathbb{Z}}|\varphi (x-k)|<\infty\), then for any sequence \(\{\lambda_{k}\}_{k \in\mathbb {Z}}\in l_{p}\), one has \(C_{1}\|\lambda\|_{l_{p}}2^{(\frac{j}{2}-\frac{j}{p})}\leq\|\sum_{k}\lambda_{k}\varphi_{j,k}\|_{p}\leq C_{2}\|\lambda\| _{l_{p}}2^{(\frac{j}{2}-\frac{j}{p})}\), where \(C_{1}=(\|\theta_{\varphi}\|^{\frac{1}{p}}_{\infty}\|\varphi \|^{\frac{1}{q}}_{1})^{-1}\), \(C_{2}=(\|\theta_{\varphi}\|^{\frac{1}{q}}_{\infty}\|\varphi\| ^{\frac{1}{p}}_{1})^{-1}\), \(1\leq p \leq\infty\), \(\frac{1}{p}+\frac {1}{q}=1\).
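For the Haar scaling function \(\varphi=1_{[0,1)}\) the translates \(\varphi_{j,k}\) have disjoint supports, so the two-sided bound of Lemma 2.2 holds with equality; the following short numerical check (our illustration, not part of the paper) confirms the scaling factor \(2^{j(\frac{1}{2}-\frac{1}{p})}\):

```python
import numpy as np

# Haar scaling function phi = 1_[0,1); phi_{j,k}(x) = 2^{j/2} phi(2^j x - k).
def phi_jk(x, j, k):
    u = 2**j * x - k
    return 2 ** (j / 2) * ((u >= 0) & (u < 1)).astype(float)

j, p = 3, 3.0
lam = np.array([1.0, -2.0, 0.5, 3.0])               # a finite l_p sequence
x = np.linspace(-1.0, 2.0, 300001)                  # fine grid containing the support
dx = x[1] - x[0]
f = sum(l * phi_jk(x, j, k) for k, l in enumerate(lam))
lp_norm = (np.sum(np.abs(f) ** p) * dx) ** (1 / p)  # numerical L_p norm

# Lemma 2.2: ||sum_k lam_k phi_{j,k}||_p ~ ||lam||_{l_p} 2^{j(1/2 - 1/p)};
# for Haar the translates have disjoint supports, so this holds with equality.
predicted = np.sum(np.abs(lam) ** p) ** (1 / p) * 2 ** (j / 2 - j / p)
assert abs(lp_norm - predicted) / predicted < 1e-3
```

For a general scaling function the supports overlap and only the two-sided inequality with the constants \(C_{1}\), \(C_{2}\) survives.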
Definition 2.3
(see [2])
If the scaling function φ satisfies \(\operatorname{ess} \sup\sum_{k}|\varphi (x-k)|<\infty\), the kernel function
\[K(x,y)=\sum_{k\in\mathbb{Z}}\varphi(x-k)\varphi(y-k)\]
is called the orthonormal projection kernel associated with φ.
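As an illustration of ours (not from the paper), for the Haar scaling function \(\varphi=1_{[0,1)}\) the kernel \(K(x,y)=\sum_{k}\varphi(x-k)\varphi(y-k)\) equals 1 exactly when x and y lie in the same unit interval, and its mass \(\int K(x,y)\,dy\) is 1 for every x, in line with Lemma 2.4(i) below:

```python
import numpy as np

# Orthonormal projection kernel K(x, y) = sum_k phi(x-k) phi(y-k) for the Haar
# scaling function phi = 1_[0,1): K(x, y) = 1 exactly when x and y fall in the
# same unit interval [k, k+1), and 0 otherwise.
def K(x, y, ks=range(-10, 11)):
    phi = lambda u: float(0.0 <= u < 1.0)
    return sum(phi(x - k) * phi(y - k) for k in ks)

# Check that int K(x, y) dy = 1 for a fixed x via a Riemann sum.
x0 = 0.37
ys = np.linspace(-5.0, 5.0, 20001)
integral = sum(K(x0, y) for y in ys) * (ys[1] - ys[0])
assert abs(integral - 1.0) < 1e-3
```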
For \(f\in L_{p}(\mathbb{R}) \) (\(1\leq p\leq\infty\)), if \(\operatorname{ess} \sup\sum_{k}|\varphi(x-k)|<\infty\), it is not hard to show that
\[K_{j}f(x):=\int K_{j}(x,y)f(y)\,dy=\sum_{k}\alpha_{jk}\varphi_{jk}(x),\]
where \(K_{j}(x,y)=2^{j}K(2^{j}x,2^{j}y)\), \(\alpha_{jk}=\int\varphi_{jk}(x)f(x)\,dx\).
Lemma 2.4
If the scaling function φ satisfies Condition S, then
- (i) \(\int K(x,y)\,dy=1\) (a.e.);
- (ii) \(|K(x,y)|\leq C_{1}\Phi (\frac{|x-y|}{C_{2}} )\) (a.e.), where \(C_{1}\), \(C_{2} \) are positive constants depending on Φ.
Let \(F(x)= C_{1}\Phi(\frac{|x|}{C_{2}})\), then \(F\in L_{1}(\mathbb{R})\cap L_{\infty}(\mathbb{R})\) and \(|K(x,y)|\leq F(x-y)\) (a.e.).
Lemma 2.5
If the scaling function φ satisfies Condition S, then for \(f\in L_{p}(\mathbb{R})\), \(1\leq p<\infty\), one has
\[\lim_{j\rightarrow\infty}\|K_{j}f-f\|_{p}=0.\]
The above result is also true if \(f\in L_{\infty}(\mathbb{R})\) is uniformly continuous.
Lemma 2.6
(Rosenthal’s inequality)
Let \(X_{1}, \ldots, X_{n}\) be independent random variables such that \(E(X_{i})=0\) and \(|X_{i}|< M\). Then for \(p\geq2\) there exists a constant \(C(p)>0\) such that
\[E\Biggl|\sum_{i=1}^{n}X_{i}\Biggr|^{p}\leq C(p)\Biggl(\sum_{i=1}^{n}E|X_{i}|^{p}+\Biggl(\sum_{i=1}^{n}EX_{i}^{2}\Biggr)^{p/2}\Biggr).\]
3 Mean consistency for \(L_{p}\) norm
In this section, based on the random sample without noise, we shall construct the wavelet estimator and give its \(L_{p}\)-consistency.
Let \(X_{1}, X_{2}, \ldots, X_{n} \) be independent identically distributed (i.i.d.) random samples without noise, let φ be a compactly supported scaling function, and define the wavelet estimator as follows:
\[\hat{f}_{n}(x)=\sum_{k}\hat{\alpha}_{jk}\varphi_{jk}(x),\qquad \hat{\alpha}_{jk}=\frac{1}{n}\sum_{i=1}^{n}\varphi_{jk}(X_{i}).\quad(1)\]
Obviously, one gets \(E\hat{\alpha}_{jk}=\frac{1}{n}\sum_{i=1}^{n}\int\varphi_{jk}(x)f(x)\,dx=\alpha_{jk}\), so each \(\hat{\alpha}_{jk}\) is an unbiased estimator of \(\alpha_{jk}\). On the other hand, one obtains \(\operatorname{Var}(\hat{\alpha}_{jk})=\frac{1}{n}\operatorname{Var}(\varphi_{jk}(X_{1}))\).
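As a sanity check of ours (not part of the paper), estimator (1) can be implemented with the Haar scaling function \(\varphi=1_{[0,1)}\), which is compactly supported and bounded; for uniform samples the empirical coefficients \(\hat{\alpha}_{jk}\) concentrate around \(\alpha_{jk}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear wavelet estimator (1) with the Haar scaling function phi = 1_[0,1):
# alpha_hat_{jk} = (1/n) sum_i phi_{jk}(X_i), phi_{jk}(x) = 2^{j/2} phi(2^j x - k).
def alpha_hat(samples, j, k):
    u = 2**j * samples - k
    return 2 ** (j / 2) * np.mean((u >= 0) & (u < 1))

# For f = Uniform(0, 1) and j = 2, the true coefficients are
# alpha_{2k} = 2^{j/2} * 2^{-j} = 0.5 for k = 0, 1, 2, 3.
X = rng.uniform(0.0, 1.0, size=100_000)
coeffs = [alpha_hat(X, 2, k) for k in range(4)]
assert max(abs(c - 0.5) for c in coeffs) < 0.02   # E alpha_hat_{jk} = alpha_{jk}
```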
Theorem 3.1
Let the scaling function \(\varphi(x)\) be compactly supported and bounded, and let \(\hat{f}_{n}(x)\) be the wavelet estimator defined in (1). If we take \(2^{j}\sim n^{\frac{1}{2}}\), then for any \(f\in L_{p}(\mathbb{R})\), \(1\leq p<\infty\), one has
\[\lim_{n\rightarrow\infty}E\|f-\hat{f}_{n}\|_{p}=0.\]
Note
The notation \(A\lesssim B\) indicates that \(A \leqslant c B\) with a positive constant c, which is independent of A and B. If \(A\lesssim B\) and \(B\lesssim A\), we write \(A\sim B\).
Proof
Due to \((E\| f-\hat{f}_{n}\|_{p})^{p}\leq E\| f-\hat{f}_{n}\|_{p}^{p}\), one only needs to consider \(E\| f-\hat{f}_{n}\|_{p}^{p}\).
Firstly, thanks to the triangle inequality and a convexity inequality, one can decompose \(E\| f-\hat{f}_{n}\|_{p}^{p}\) into a bias term and a stochastic term. That is,
\[E\|f-\hat{f}_{n}\|_{p}^{p}\lesssim\|f-E\hat{f}_{n}\|_{p}^{p}+E\|\hat{f}_{n}-E\hat{f}_{n}\|_{p}^{p}.\]
(i) For the bias term \(\| f-E\hat{f}_{n}\|_{p}^{p}\), one has \(E\hat{f}_{n}(x)=\sum_{k}\alpha_{jk}\varphi_{jk}(x)=K_{j}f(x)\), so \(\| f-E\hat{f}_{n}\|_{p}^{p}=\|f-K_{j}f\|_{p}^{p}\).
Since \(\varphi(x)\) satisfies Condition S and \(2^{j}\sim n^{\frac{1}{2}}\rightarrow\infty\), due to Lemma 2.4 and Lemma 2.5, one gets
\[\lim_{n\rightarrow\infty}\|f-E\hat{f}_{n}\|_{p}^{p}=\lim_{j\rightarrow\infty}\|f-K_{j}f\|_{p}^{p}=0.\]
(ii) For the stochastic term \(E\|\hat{f}_{n}-E\hat{f}_{n}\|_{p}^{p}\), note that
\[\hat{f}_{n}(x)-E\hat{f}_{n}(x)=\frac{1}{n}\sum_{i=1}^{n}\bigl(K_{j}(x,X_{i})-EK_{j}(x,X_{i})\bigr).\]
Denote \(Y_{i}=K_{j}(x,X_{i})-E K_{j}(x,X_{i})\); then \(\{Y_{i}\}\) are i.i.d. with \(EY_{i}=0\), and one obtains
\[E\|\hat{f}_{n}-E\hat{f}_{n}\|_{p}^{p}=\int\frac{1}{n^{p}}E\Biggl|\sum_{i=1}^{n}Y_{i}\Biggr|^{p}\,dx.\]
(i) For \(2\leq p<\infty\), Rosenthal’s inequality, Lemma 2.6, tells us that
and
then
Therefore, one gets
\[E\|\hat{f}_{n}-E\hat{f}_{n}\|_{p}^{p}\lesssim\biggl(\frac{2^{j}}{n}\biggr)^{p-1}+\biggl(\frac{2^{j}}{n}\biggr)^{\frac{p}{2}}.\]
Taking \(2^{j}\sim n^{\frac{1}{2}}\), one obtains the desired result \(\lim_{n\rightarrow\infty}E\|\hat{f}_{n}-E\hat{f}_{n}\|_{p}^{p}=0\).
(ii) For \(1\leq p<2\), let \(A=\{x\mid |\hat{f}_{n}-E\hat{f}_{n}|<1\}\) and \(B=\{x\mid |\hat{f}_{n}-E\hat{f}_{n}|\geq1\}\); then one has
\[E\|\hat{f}_{n}-E\hat{f}_{n}\|_{p}^{p}=E\int_{A}|\hat{f}_{n}-E\hat{f}_{n}|^{p}\,dx+E\int_{B}|\hat{f}_{n}-E\hat{f}_{n}|^{p}\,dx\leq E\|\hat{f}_{n}-E\hat{f}_{n}\|_{1}+E\|\hat{f}_{n}-E\hat{f}_{n}\|_{2}^{2}.\]
Since \(f\in L_{1}(\mathbb{R})\), the case \(p=2\) above guarantees that \(\lim_{n\rightarrow\infty}E\|\hat{f}_{n}-E\hat{f}_{n}\|_{2}^{2}=0\).
Moreover, \(E\|\hat{f}_{n}-E\hat{f}_{n}\|_{1}= \int\frac{1}{n} E|\sum_{i=1}^{n}Y_{i}|\,dx\); according to Rosenthal's inequality, Lemma 2.6 (with \(p=2\)), one has
\[E\|\hat{f}_{n}-E\hat{f}_{n}\|_{1}\leq\int\biggl(\frac{1}{n}EK_{j}^{2}(x,X_{1})\biggr)^{1/2}dx\leq\int\biggl(\frac{2^{j}}{n}(G_{j}\ast f)(x)\biggr)^{1/2}dx=\int A(x)\,dx,\]
where \(G(x)=F^{2}(x)\) and \(G_{j}(x)=2^{j}G(2^{j}x)\). On the other hand,
\[(G_{j}\ast f)(x)=\int G(t)f\biggl(x-\frac{t}{2^{j}}\biggr)dt\leq\|F\|_{\infty}\bigl(B(x)+C(x)\bigr),\]
where \(B(x)=f(x)\int F(t)\,dt\), \(C(x)=\int F(t) |f(x-\frac {t}{2^{j}})-f(x) |\,dt\). Then we get
One knows that \(\int C(x)\,dx=\int F(t)\|f(\cdot-\frac{t}{2^{j}})-f\|_{1}\,dt\); since translation is continuous in \(L_{1}(\mathbb{R})\) and \(F(t)\|f(\cdot-\frac{t}{2^{j}})-f\|_{1}\leq2\|f\|_{1}F(t)\in L_{1}(\mathbb{R})\), dominated convergence yields \(\lim_{n\rightarrow\infty}\int C(x)\,dx=0\). Next, since \(\int B(x)\,dx=\int f(x)\int F(t)\,dt \,dx=\|F\|_{1}\|f\|_{1}<\infty\), one has \(B\in L_{1}(\mathbb{R})\). By the Lebesgue dominated convergence theorem, one gets
It remains only to show that \(\lim_{n\rightarrow\infty }A(x)=0\). Since the function \(G(x)=F^{2}(x)\) is radially decreasing, \(\lim_{j\rightarrow\infty}(G_{j}\ast f)(x)=\|F\|_{2}^{2}f(x)\) for almost every x, and \(\| F\|_{2}^{2} f(x)\) is finite for almost all x, we have \(\lim_{n\rightarrow\infty}A(x)=\lim_{n\rightarrow\infty}(\frac{2^{j}}{n}G_{j}\ast f)^{1/2}=0\). Finally, we get \(\lim_{n\rightarrow\infty}E\|f-\hat{f}_{n}\|_{p}=0\).
□
Remark
Theorem 3.1 can be considered as a natural extension of Theorem 1 in [6].
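To illustrate Theorem 3.1 numerically, here is a small simulation (our own sketch, using the Haar scaling function and a triangular density chosen for illustration) in which the empirical \(L_{1}\) error decreases as n grows with \(2^{j}\sim n^{1/2}\):

```python
import numpy as np

rng = np.random.default_rng(42)

def haar_density_estimate(samples, grid, j):
    """Linear Haar wavelet estimator f_hat_n evaluated on a grid, resolution 2^j."""
    est = np.zeros_like(grid)
    for k in range(-1, 2**j + 1):               # translates covering [0, 1]
        u_s = 2**j * samples - k
        a_hat = 2 ** (j / 2) * np.mean((u_s >= 0) & (u_s < 1))
        u_g = 2**j * grid - k
        est += a_hat * 2 ** (j / 2) * ((u_g >= 0) & (u_g < 1))
    return est

# Target density f(x) = 2x on [0, 1]; Theorem 3.1 (p = 1) suggests the mean
# L_1 error shrinks as n grows when 2^j ~ n^{1/2}.
grid = np.linspace(0.0, 1.0, 2001)
f = 2 * grid

def l1_error(n):
    j = max(1, int(np.log2(np.sqrt(n))))        # choose 2^j ~ n^{1/2}
    X = rng.triangular(0.0, 1.0, 1.0, size=n)   # samples with density 2x on [0, 1]
    return np.mean(np.abs(f - haar_density_estimate(X, grid, j)))

err_coarse = l1_error(100)
err_fine = l1_error(100_000)
assert err_fine < err_coarse                    # error decreases with sample size
```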
Next we shall consider \(L_{\infty}\)-consistency.
Theorem 3.2
Let the scaling function \(\varphi(x)\) be bounded with \(\operatorname{supp} \varphi\subset[-A,A]\), and let \(\hat{f}_{n}(x)\) be the wavelet estimator defined in (1). If \(f\in L_{\infty}(\mathbb{R})\) is uniformly continuous and \(f(x)\lesssim\frac{1}{(1+|x|)^{2+\delta}} \) for some \(\delta>0\), then, taking \(2^{j}\sim n^{\frac{1}{4}}\), one gets
\[\lim_{n\rightarrow\infty}E\|f-\hat{f}_{n}\|_{\infty}=0.\]
Proof
The proof is similar to that of Theorem 3.1. We have
\[E\|f-\hat{f}_{n}\|_{\infty}\leq\|f-E\hat{f}_{n}\|_{\infty}+E\|\hat{f}_{n}-E\hat{f}_{n}\|_{\infty}.\]
Since φ satisfies Condition S and f is uniformly continuous, by Lemma 2.4 and Lemma 2.5, one gets \(\lim_{n\rightarrow\infty}\|f-E\hat{f}_{n}\|_{\infty}=\lim_{j\rightarrow\infty}\|f-K_{j}f\|_{\infty}=0\).
For the stochastic term, since \(\hat{f}_{n}-E\hat{f}_{n}=\sum_{k}(\hat{\alpha}_{jk}-\alpha_{jk})\varphi_{jk}\) and \(|\varphi_{jk}(x)|\leq2^{j/2}\|\varphi\|_{\infty}\), one has \(\|\hat{f}_{n}-E\hat{f}_{n}\|_{\infty}\lesssim 2^{j/2}\sum_{k}|\hat{\alpha}_{jk}-\alpha_{jk}|\). So one obtains
\[E\|\hat{f}_{n}-E\hat{f}_{n}\|_{\infty}\lesssim2^{j/2}\sum_{k}E|\hat{\alpha}_{jk}-\alpha_{jk}|.\]
According to Rosenthal’s inequality, Lemma 2.6, one has
Moreover, one has \(E\|\hat{f}_{n}-E\hat{f}_{n}\|_{\infty}\lesssim (\frac{2^{j}}{n})^{1/2}\sum_{k} (\int_{|t-k|\leq A}f(\frac {t}{2^{j}})\,dt )^{1/2}\) and, by the decay condition \(f(x)\lesssim(1+|x|)^{-(2+\delta)}\), \(\sum_{k} (\int_{|t-k|\leq A}f(\frac{t}{2^{j}})\,dt )^{1/2}\lesssim2^{j}\).
Therefore, one gets \(E\|\hat{f}_{n}-E\hat{f}_{n}\|_{\infty}\lesssim (\frac{2^{3j}}{n})^{1/2}\). Taking \(2^{j}\sim n^{\frac{1}{4}}\), one obtains \(\lim_{n\rightarrow\infty}E\|\hat{f}_{n}-E\hat{f}_{n}\|_{\infty}=0\).
□
4 Additive noise model
In practical situations, direct observations are not always available. One classical model is
\[Y_{i}=X_{i}+\epsilon_{i},\quad i=1,2,\ldots,n,\]
where \(X_{i}\) stands for the random samples with unknown density \(f_{X}\), and \(\epsilon_{i}\) denotes i.i.d. random noise, independent of \(X_{i}\), with density g. Estimating the density \(f_{X}\) from \(Y_{1},\ldots,Y_{n}\) is a deconvolution problem.
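The factorization of characteristic functions underlying the deconvolution approach, \(\tilde{f}_{Y}=\tilde{f}_{X}\tilde{g}\), can be checked empirically; the following sketch (ours, with Gaussian signal and noise chosen purely for illustration) compares the empirical characteristic function of Y with the product of the two known characteristic functions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Additive noise model Y = X + epsilon: the characteristic functions satisfy
# cf_Y(t) = cf_X(t) * cf_g(t).  Check empirically for X ~ N(0,1), eps ~ N(0,0.25).
n = 200_000
Y = rng.normal(0.0, 1.0, n) + rng.normal(0.0, 0.5, n)

def ecf(sample, t):
    """Empirical characteristic function (1/n) sum_i exp(i t Y_i)."""
    return np.mean(np.exp(1j * t * sample))

ts = [0.3, 1.0, 2.0]
# cf of N(0,1) is exp(-t^2/2); cf of N(0,0.25) is exp(-0.25 t^2/2).
max_dev = max(abs(ecf(Y, t) - np.exp(-0.5 * t**2) * np.exp(-0.125 * t**2)) for t in ts)
assert max_dev < 0.01
```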
In 2002, Fan and Koo [11] studied the wavelet estimation for random samples with smooth and super smooth noise over a Besov ball. In 2014, Li and Liu [12] considered the wavelet estimation for random samples with moderately ill-posed noise. In this section, we consider the mean \(L_{p}\)-consistency for \(f_{X}\in L_{p}(\mathbb{R})\) with additive noise.
The Fourier transform of \(f\in L_{1}(\mathbb{R}) \) is defined as follows:
\[\tilde{f}(t)=\int_{\mathbb{R}}f(x)e^{itx}\,dx.\]
It is well known that \(\tilde{f}_{Y}(t)=\tilde{f}_{X}(t)\tilde{g}(t)\). For \(\tilde{g}(t)\neq0 \) (\(\forall t\in\mathbb{R}\)), the wavelet estimator is given by
\[\hat{f}_{X,n}(x)=\sum_{k}\hat{\alpha}_{jk}\varphi_{jk}(x),\quad(13)\]
where
\[\hat{\alpha}_{jk}=\frac{1}{n}\sum_{i=1}^{n}(H_{j}\varphi)_{jk}(Y_{i}),\qquad(H_{j}\varphi)(x)=\frac{1}{2\pi}\int e^{-itx}\frac{\tilde{\varphi}(t)}{\tilde{g}(-2^{j}t)}\,dt,\quad(14)\]
and φ is the Meyer scaling function.
Lemma 4.1
If \(f_{X}\in L_{2}(\mathbb{R})\), then \(\hat{\alpha}_{jk}\) defined in (14) is an unbiased estimator of \(\alpha_{jk}\).
Proof
The Plancherel formula tells us that
On the other hand, one gets
Therefore, \(E\hat{\alpha}_{jk}=\alpha_{jk}\). □
The next two theorems deal with the different cases for \(p\geq2\) and \(1\leq p<2\), respectively.
Theorem 4.1
Let \(\varphi(x)\) be the Meyer scaling function, let \(|\tilde{g}(t)|\gtrsim(1+|t|^{2})^{-\frac{\beta}{2}}\) (\(\beta\geq0\)), and let \(\hat{f}_{X,n}(x)\) be the wavelet estimator defined in (13). If \(f_{X}\in L_{p}(\mathbb{R})\), \(2\leq p<\infty\), then, taking \(2^{j}\sim n^{\frac{1-\epsilon}{1+2\beta}}\) (\(\epsilon>0\)), one gets
\[\lim_{n\rightarrow\infty}E\|f_{X}-\hat{f}_{X,n}\|_{p}=0.\]
Proof
Similarly, one needs to consider a bias term and a stochastic term.
(i) For the bias term, one observes that
Note that \(\int\sum_{k}|(H_{j}\varphi)_{jk}(y)||\varphi _{jk}(x)| f_{Y}(y)\,dy\leq2^{j}\|H_{j}\varphi\|_{\infty}\|\theta_{\varphi}\|_{\infty}\| f_{Y}\| _{1}<\infty\), so summation and integration may be interchanged and \(E\hat{f}_{X,n}(x)=K_{j}f_{X}(x)\).
From Lemma 2.5, one gets \(\lim_{n\rightarrow\infty}\|f_{X}-E\hat{f}_{X,n}\|_{p}=\lim_{j\rightarrow\infty}\|f_{X}-K_{j}f_{X}\|_{p}=0\).
(ii) For the stochastic term, due to Lemma 2.2, it can be found that
\[E\|\hat{f}_{X,n}-E\hat{f}_{X,n}\|_{p}^{p}\lesssim2^{j(\frac{p}{2}-1)}\sum_{k}E|\hat{\alpha}_{jk}-\alpha_{jk}|^{p}.\]
Firstly, we estimate \(E|\hat{\alpha}_{jk}-\alpha_{jk}|^{p}\). We have
\[\hat{\alpha}_{jk}-\alpha_{jk}=\frac{1}{n}\sum_{i=1}^{n}Z_{ik},\]
where \(Z_{ik}=(H_{j}\varphi)_{jk}(Y_{i})-E(H_{j}\varphi)_{jk}(Y_{i})\) and \(E Z_{ik}=0\). Then \(E|\hat{\alpha}_{jk}-\alpha_{jk}|^{p}=\frac{1}{n^{p}}E|\sum_{i=1}^{n}Z_{ik}|^{p}\).
Thanks to Rosenthal’s inequality, Lemma 2.6, one has
One only needs to consider \(\sum_{k}(E|Z_{1k}|^{2})^{\frac {p}{2}}\). Define \(A=\int|(H_{j}\varphi)(y)|^{2}\,dy=2\pi\int_{\mathbb{R}}|\frac {\tilde{\varphi}(t)}{\tilde{g}(-2^{j}t)}|^{2}\,dt\lesssim\int |(1+|2^{j}t|)^{\beta}\tilde{\varphi}(t)|^{2}\,dt\lesssim2^{2j\beta}\), and
Moreover,
Note that \(e^{it2^{j}y}\frac{\tilde{\varphi}(t)}{\tilde{g}(-2^{j}t)}I_{[0,2\pi]}\in L_{2}[0,2\pi]\) and \(\{e^{-itk},k\in\mathbb{Z}\}\) is an orthogonal basis of \(L_{2}[0,2\pi]\); by the Parseval formula, one gets
\[\sum_{k}\biggl|\int_{0}^{\frac{4\pi}{3}}e^{it2^{j}y}\frac{\tilde{\varphi}(t)}{\tilde{g}(-2^{j}t)}e^{-itk}\,dt\biggr|^{2}\lesssim2^{2j\beta}.\]
Similarly, \(\sum_{k} |\int_{-\frac{4\pi }{3}}^{0}e^{it2^{j}y}\frac{\tilde{\varphi}(t)}{\tilde{g}(-2^{j}t)}e^{-itk}\,dt |^{2}\lesssim2^{2j\beta}\). Then \(\sum_{k}|(H_{j}\varphi)_{jk}(y)|^{2}\lesssim2^{j(2\beta +1)}\).
For the density function \(f_{Y}\in L_{1}(\mathbb{R})\cap L_{p}(\mathbb{R})\), \(2\leq p<\infty\), one has \(f_{Y}\in L_{p/2}(\mathbb{R})\). Moreover, \(\sum_{k}(E|Z_{1k}|^{2})^{\frac{p}{2}}\lesssim A^{\frac{p}{2}-1}2^{j(2\beta+1)}=2^{j(\beta p+1)}\). Therefore,
Then we get \(E\|\hat{f}_{X,n}-E\hat{f}_{X,n}\|_{p}^{p}\lesssim2^{j(\frac {p}{2}-1)}\frac{2^{j(\beta p+1)}}{n^{\frac{p}{2}}} ((\frac {2^{j}}{n})^{\frac{p}{2}-1}+1 ) \lesssim (\frac{2^{j(2\beta+1)}}{n} )^{\frac{p}{2}}\). Taking \(2^{j}\sim n^{\frac{1-\epsilon}{1+2\beta}} \) (\(\epsilon>0\)), one obtains \(\lim_{n\rightarrow\infty}E\|\hat{f}_{X,n}-E\hat{f}_{X,n}\| _{p}^{p}=0\). □
Theorem 4.2
Let \(\varphi(x)\) be the Meyer scaling function, let \(|\tilde{g}(t)|\gtrsim(1+|t|^{2})^{-\frac{\beta }{2}}\) (\(\beta\geq0\)), and let \(\hat{f}_{X,n}(x)\) be the estimator defined in (13). If \(f_{X}\in L_{2}(\mathbb{R})\cap L_{p}(\mathbb{R}) \) (\(1\leq p<2\)) and \(\operatorname{supp} f_{X} \subset[-B,B]\), then, taking \(2^{j}\sim n^{\frac{1-\epsilon}{2+2\beta}}\) (\(\epsilon>0\)), one has
\[\lim_{n\rightarrow\infty}E\|f_{X}-\hat{f}_{X,n}I_{[-B,B]}\|_{p}=0.\]
Proof
For the bias term, we get \(E\hat {f}_{X,n}I_{[-B,B]}(x)=K_{j}f_{X}I_{[-B,B]}(x)\); since \(\operatorname{supp}f_{X}\subset[-B,B]\), one has \(\|f_{X}-E\hat{f}_{X,n}I_{[-B,B]}\|_{p}\leq\|f_{X}-K_{j}f_{X}\|_{p}\).
So one gets \(\lim_{n\rightarrow\infty}\| f_{X}-E\hat {f}_{X,n}I_{[-B,B]}\|_{p}\leq\lim_{j\rightarrow\infty}\|f_{X}-K_{j} f_{X}\|_{p}=0\).
Next we only consider the stochastic term. For any \(1\leq p<2\),
Because \(\lim_{n\rightarrow\infty}E\|\hat {f}_{X,n}I_{[-B,B]}-E\hat{f}_{X,n}I_{[-B,B]}\|_{2}^{2}\leq\lim_{n\rightarrow\infty}E\|\hat{f}_{X,n}-E\hat{f}_{X,n}\|_{2}^{2}=0\), we only need to consider \(E\|\hat{f}_{X,n}I_{[-B,B]}-E\hat {f}_{X,n}I_{[-B,B]}\|_{1}\). Clearly,
define
then
We know that
now we estimate \(E|\hat{f}_{X,n}-E\hat{f}_{X,n}|\). Using Rosenthal’s inequality, Lemma 2.6, one gets
Then \(E\|\hat{f}_{X,n}I_{[-B,B]}-E\hat {f}_{X,n}I_{[-B,B]}\|_{1}\lesssim\frac{2^{j(1+\beta )}}{n^{1/2}}\); taking \(2^{j}\sim n^{\frac{1-\epsilon}{2+2\beta}}\), one gets \(\lim_{n\rightarrow\infty}E\|\hat{f}_{X,n}I_{[-B,B]}-E\hat{f}_{X,n}I_{[-B,B]}\|_{1}=0\).
□
Remark
If g is the Dirac delta function δ (so that \(\tilde{g}\equiv1\) and \(\beta=0\)), then the conclusions for the additive noise model reduce to the results for the classical noiseless model.
References
Hall, P, Patil, P: On wavelet methods for estimating smooth functions. Bernoulli 1, 41-58 (1995)
Härdle, W, Kerkyacharian, G, Picard, D, Tsybakov, A: Wavelets, Approximation and Statistical Applications. Springer, New York (1997)
Kerkyacharian, G, Picard, D: Density estimation in Besov spaces. Stat. Probab. Lett. 13, 15-24 (1992)
Donoho, DL, Johnstone, IM, Kerkyacharian, G, Picard, D: Density estimation by wavelet thresholding. Ann. Stat. 24, 508-539 (1996)
Devroye, L, Lugosi, G: Almost sure classification of density. J. Nonparametr. Stat. 14, 675-698 (2002)
Chacón, JE, Rodríguez-Casal, A: On the \(L_{1}\)-consistency of wavelet density estimates. Can. J. Stat. 33(4), 489-496 (2005)
Devroye, L: Consistent deconvolution in density estimation. Can. J. Stat. 17(2), 235-239 (1989)
Liu, MC, Taylor, RL: A consistent nonparametric density estimator for the deconvolution problem. Can. J. Stat. 17(4), 427-438 (1989)
Ramírez, P, Vidakovic, B: Wavelet density estimation for stratified size-biased sample. J. Stat. Plan. Inference 140(2), 419-432 (2010)
Daubechies, I: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
Fan, J, Koo, J: Wavelet deconvolution. IEEE Trans. Inf. Theory 48, 734-747 (2002)
Li, R, Liu, YM: Wavelet optimal estimations for a density with some additive noises. Appl. Comput. Harmon. Anal. 36, 416-433 (2014)
Acknowledgements
The authors thank Professor Youming Liu for his profound insight and helpful suggestions. This work was supported by National Natural Science Foundation of China (No. 11271038), CSC Foundation (No. 201308110227) and Fundamental Research Fund of BJUT (No. 006000514313002).
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.
Rights and permissions
Open Access This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Cite this article
Geng, Z., Wang, J. The mean consistency of wavelet density estimators. J Inequal Appl 2015, 111 (2015). https://doi.org/10.1186/s13660-015-0636-1