Wavelet optimal estimations for a two-dimensional continuous-discrete density function over \(L^{p}\) risk
Journal of Inequalities and Applications volume 2018, Article number: 279 (2018)
Abstract
The mixed continuous-discrete density model plays an important role in reliability, finance, biostatistics, and economics. Using wavelet methods, Chesneau, Dewan, and Doosti provided upper bounds for wavelet estimation under \(L^{2}\) risk for a two-dimensional continuous-discrete density function over Besov spaces \(B^{s}_{r,q}\). This paper deals with \(L^{p}\) (\(1\leq p < \infty\)) risk estimation over Besov spaces, which generalizes the Chesneau–Dewan–Doosti theorems. In addition, we provide the first lower bound for the \(L^{p}\) risk. It turns out that the linear wavelet estimator attains the optimal convergence rate for \(r \geq p\), and the nonlinear one offers optimal estimation up to a logarithmic factor.
1 Introduction
1.1 Introduction
Density estimation plays an important role in both statistics and econometrics. This paper considers a two-dimensional density estimation model defined over mixed continuous and discrete variables [2]. More precisely, let \((X_{1},Y_{1})\), \((X_{2},Y_{2}),\dots,(X_{n},Y_{n})\) be independent and identically distributed (i.i.d.) observations of a bivariate random variable \((X,Y)\), where X is a continuous random variable and Y is a discrete one. The joint density function of \((X,Y)\) is given by
\(f(x,v):=\frac{\partial}{\partial x}F(x,v)\)
with \(F(x,v)=P(X\leq x, Y=v)\) being the distribution function of \((X,Y)\). We are interested in estimating \(f(x,v)\) from \((X_{1},Y_{1})\), \((X_{2},Y_{2}),\dots, (X_{n},Y_{n})\). This continuous-discrete density model also arises in survival analysis, economics, and social sciences. For example, consider a series system with m components, which fails as soon as one of the components fails. Let X be the failure time of the system, and let Y be the component whose failure resulted in the failure of the system. Then \((X,Y)\) is a bivariate continuous-discrete random variable. For more examples, see [1] and [4].
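The series-system example can be made concrete by simulation. The sketch below assumes exponential component lifetimes (an illustrative choice not made in the text) and draws i.i.d. pairs \((X_{l}, Y_{l})\):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_series_system(n, rates):
    """Draw n i.i.d. observations (X, Y) from a series system with
    independent exponential component lifetimes (illustrative assumption):
    X is the system failure time (the minimum lifetime) and Y is the
    1-based index of the component whose failure caused it."""
    rates = np.asarray(rates, dtype=float)
    lifetimes = rng.exponential(1.0 / rates, size=(n, rates.size))
    X = lifetimes.min(axis=1)           # continuous variable
    Y = lifetimes.argmin(axis=1) + 1    # discrete variable in {1, ..., m}
    return X, Y

X, Y = sample_series_system(10_000, rates=[1.0, 2.0, 3.0])
# For exponential lifetimes, P(Y = v) is proportional to the v-th rate.
```

Under these assumptions, \((X, Y)\) is exactly the bivariate continuous-discrete variable described above.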
The conventional kernel method provides a good estimator for the continuous-discrete density function [1, 10, 14]. However, it is hard for kernel methods to attain optimal estimation for densities in Besov spaces. In addition, the complexity of bandwidth selection increases the difficulty of the kernel method.
Recently, wavelet methods have achieved remarkable success in density estimation [7, 8, 11, 12, 15] due to their time-frequency localization, multiscale decomposition, and fast numerical algorithms. In fact, wavelet estimation attains optimality for densities in Besov spaces, which avoids the disadvantage of kernel methods. Using the wavelet method, Chesneau et al. [2] constructed linear and nonlinear wavelet estimators for a two-dimensional continuous-discrete density function and derived their mean integrated squared error performance over Besov balls.
This paper addresses \(L^{p}\) (\(1\leq p<\infty\)) risk estimation on Besov balls by using wavelet bases, which generalizes the Chesneau–Dewan–Doosti theorems. It should be pointed out that a lower bound for the \(L^{p}\) risk of all estimators is derived here for the first time. It turns out that the linear wavelet estimator is optimal for \(r\geq p\), and the nonlinear one attains optimal estimation up to a logarithmic factor.
1.2 Notations and definitions
In this paper, we use the tensor product method to construct an orthonormal wavelet basis for \(L^{2}(\mathbb{R}^{2})\), which will be used in later discussions. With a one-dimensional Daubechies scaling function \(D_{2N}\) and a wavelet function \(\psi_{2N}\) (\(\psi_{2N}\) can be constructed from the scaling function \(D_{2N}\)), we construct two-dimensional tensor product wavelets φ, \(\psi^{1}\), \(\psi ^{2}\), and \(\psi^{3}\) as follows:
\(\varphi(x,y):=D_{2N}(x)D_{2N}(y)\), \(\psi^{1}(x,y):=D_{2N}(x)\psi_{2N}(y)\), \(\psi^{2}(x,y):=\psi_{2N}(x)D_{2N}(y)\), \(\psi^{3}(x,y):=\psi_{2N}(x)\psi_{2N}(y)\).
Then φ and \(\psi^{i}\) (\(i=1, 2, 3\)) are compactly supported in the time domain, because the Daubechies functions \(D_{2N}\) and \(\psi_{2N}\) are [5, 8].
Denote
\(\varphi_{j,k}(x,y):=2^{j}\varphi(2^{j}x-k_{1}, 2^{j}y-k_{2})\) and \(\psi^{i}_{j,k}(x,y):=2^{j}\psi^{i}(2^{j}x-k_{1}, 2^{j}y-k_{2})\)
for \(k=(k_{1}, k_{2})\in\mathbb{Z}^{2}\) and \(i=1, 2, 3\). Then for each \(f\in L^{2}(\mathbb{R}^{2})\), the expansion
\(f=\sum_{k\in\mathbb{Z}^{2}}\alpha_{j_{0},k}\varphi_{j_{0},k}+\sum_{j\geq j_{0}}\sum_{i=1}^{3}\sum_{k\in\mathbb{Z}^{2}}\beta^{i}_{j,k}\psi^{i}_{j,k}\)
holds in the \(L^{2}\) sense, where \(\alpha_{j,k}:=\langle f, \varphi_{j,k}\rangle\) and \(\beta^{i}_{j,k}:=\langle f, \psi_{j,k}^{i}\rangle\). As usual, let \(P_{j}\) be the orthogonal projection operator defined by
\(P_{j}f=\sum_{k\in\mathbb{Z}^{2}}\alpha_{j,k}\varphi_{j,k}.\)
Details on wavelet basis can be found in [5, 8]. A scaling function φ is called m-regular if \(\varphi\in C^{m}(\mathbb{R}^{2})\) and \(|D^{\alpha}\varphi(x)|\leq C(1+|x|^{2})^{-l}\) for each \(l\in\mathbb{Z}\) (\(|\alpha|=0, 1, \ldots, m\)). By the definition of tensor product wavelets we find that the scaling function φ is m-regular, since Daubechies’ function \(D_{2N}\) is smooth enough for large N.
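The tensor-product construction can be written out in a few lines. The sketch below uses the Haar pair in place of \((D_{2N}, \psi_{2N})\) purely because the Haar functions have closed forms; the construction itself is identical for any scaling/wavelet pair:

```python
import numpy as np

def phi1(t):
    """Haar scaling function (a stand-in for the Daubechies D_{2N})."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 1), 1.0, 0.0)

def psi1(t):
    """Haar wavelet (a stand-in for psi_{2N}), built from the scaling function."""
    return phi1(2 * t) - phi1(2 * t - 1)

# Two-dimensional tensor-product wavelets.
def phi(x, y):   return phi1(x) * phi1(y)    # scaling function
def psi_1(x, y): return phi1(x) * psi1(y)    # psi^1
def psi_2(x, y): return psi1(x) * phi1(y)    # psi^2
def psi_3(x, y): return psi1(x) * psi1(y)    # psi^3

def phi_jk(x, y, j, k1, k2):
    """Dilated/translated basis function:
    phi_{j,k}(x, y) = 2^j phi(2^j x - k1, 2^j y - k2)."""
    return 2.0**j * phi(2.0**j * x - k1, 2.0**j * y - k2)
```

A Riemann sum on a fine grid confirms that each \(\varphi_{j,k}\) has unit \(L^{2}\) norm and is orthogonal to the detail functions, as the orthonormal-basis property requires.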
One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain Hölder spaces and \(L^{2}\)-Sobolev spaces as particular examples. Throughout the paper, we work within a Besov space on a compact subset of \(\mathbb{R}^{2}\). The following lemma gives equivalent definitions of those spaces, which are fundamental in our discussions.
Lemma 1.1
([13])
Let φ be an m-regular orthonormal scaling function with the corresponding wavelets \(\psi^{i}\) (\(i=1, 2, 3\)). Let \(f\in L^{r}(\mathbb{R}^{2})\), \(\alpha_{j,k}=\langle f, \varphi_{j,k}\rangle\), \(\beta^{i}_{j,k}=\langle f, \psi^{i}_{j,k}\rangle\), \(1\leq r, q\leq\infty\), and \(0< s< m\). Then the following assertions are equivalent:
-
(i)
\(f\in B_{r,q}^{s}(\mathbb{R}^{2})\);
-
(ii)
\(\{2^{js}\|P_{j+1}f-P_{j}f\|_{r}\}_{j\geq0}\in l^{q}\);
-
(iii)
\(\|\{2^{j(s+1-\frac{2}{r})}\|\beta_{j,\cdot}\|_{r}\} _{j\geq0} \|_{q}< \infty\).
The Besov norm of f can be defined by
where \(\|\alpha_{j_{0},\cdot}\|_{r}^{r}:=\sum_{k\in\mathbb{Z}^{2}}|\alpha _{j_{0},k}|^{r}\) and \(\|\beta_{j,\cdot}\|_{r}^{r}:=\sum_{i=1}^{3} \sum_{k\in\mathbb{Z}^{2}}|\beta^{i}_{j,k}|^{r}\).
Here and further, \(A\lesssim B\) means that \(A\leq CB\) for some constant \(C>0\) independent of A and B, \(A\gtrsim B\) means \(B\lesssim A\), and \(A\sim B\) stands for both \(A\lesssim B\) and \(A\gtrsim B\).
Remark 1.1
By (i) and (ii) of Lemma 1.1 we observe that
for \(f\in B_{r,q}^{s}(\mathbb{R}^{2})\). Hence
Remark 1.2
When \(r\leq p\), Lemma 1.1(i) and (iii) imply that, for \(s'-\frac{2}{p}=s-\frac{2}{r}>0\),
\(B_{r,q}^{s}(\mathbb{R}^{2})\hookrightarrow B_{p,q}^{s'}(\mathbb{R}^{2}),\)
where \(A\hookrightarrow B\) stands for a Banach space A continuously embedded in another Banach space B. More precisely, \(\|u\|_{B}\leq C\| u\|_{A}\) (\(u\in A\)) for some constant \(C>0\).
Lemma 1.2
([13])
Let \(\varphi\in L^{2}(\mathbb{R}^{2})\) be a scaling function or a wavelet with \(\sup_{k\in\mathbb {Z}^{2}}|\varphi(x-k)|< \infty\). Then, for \(\lambda=\{\lambda_{k}\}\in l^{p}(\mathbb{Z}^{2})\) and \(1\leq p\leq\infty\),
\(\Vert \sum_{k\in\mathbb{Z}^{2}}\lambda_{k}\varphi_{j,k} \Vert _{p}\sim2^{2j(\frac{1}{2}-\frac{1}{p})}\Vert \lambda \Vert _{p}.\)
Here \(\|\lambda\|_{p}\) is the \(l^{p}(\mathbb{Z}^{2})\) norm of \(\lambda\in l^{p}(\mathbb{Z}^{2})\): \(\|\lambda\|_{p}:= (\sum_{k\in\mathbb{Z}^{2}}|\lambda_{k}|^{p} )^{\frac{1}{p}}\) for \(p<\infty\) and \(\|\lambda\|_{\infty}:=\sup_{k\in\mathbb{Z}^{2}}|\lambda_{k}|\).
1.3 Main results
In this subsection, we state our main results and discuss their relation to other work. To do that, we propose a new bivariate function \(f_{\ast}(x, y)\), which improves on the one in [2]. Define
with
where \({1}_{D}\) is the indicator function of a set D.
The construction of \(f_{\ast}\) follows the idea of Chesneau et al. [2] but differs from it: the weight \(u(y,v)\) equals the characteristic function \(1_{\{v-\frac{1}{2}\leq y< v+\frac{1}{2}\}}\) in [2]. A careful verification shows that our weight \(u(y,v)\) is differentiable with respect to y for each \(v\in\{1, 2, \ldots, m\}\). This modification of \(u(y,v)\) from the characteristic function to a smooth one makes \(f_{\ast}\) continuous in y. It is easy to see that, for any \(y=v\in\{1, 2, \ldots, m\}\),
Hence the problem reduces to constructing an estimator of \(f_{\ast}\). As in [2], we assume that \(f_{\ast}\) belongs to the space \(B_{r,q}^{s}(H, Q)\) or, equivalently, to the Besov ball
and that the support of \(f_{\ast}(x, \cdot)\) is contained in \([-Q, Q]\) for fixed v (\(Q>0\), \(v=1, 2, \ldots, m\)).
To introduce the wavelet estimators, we need the following estimators of the wavelet coefficients \(\alpha_{j,k}\) and \(\beta_{j,k}^{i}\):
\(\hat{\alpha}_{j,k}:=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\varphi_{j,k}(X_{l}, y)u(y, Y_{l})\,dy\), \(\hat{\beta}^{i}_{j,k}:=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\psi^{i}_{j,k}(X_{l}, y)u(y, Y_{l})\,dy\). (1.2)
Define \(\wedge_{j_{0}}:=\{k\in\mathbb{Z}^{2}, \operatorname{supp} f_{\ast}\cap \operatorname{supp} \varphi_{j_{0},k} \neq\emptyset\}\). When \(f_{\ast}\) and φ have compact supports, the cardinality of \(\wedge_{j}\) satisfies \(\sharp\wedge_{j}\lesssim2^{2j}\). Then the linear wavelet estimator of \(f_{\ast}\) is given by
\(\hat{f}^{{\mathrm{lin}}}_{n}:=\sum_{k\in\wedge_{j_{0}}}\hat{\alpha}_{j_{0},k}\varphi_{j_{0},k},\) (1.3)
where \(j_{0}\) is chosen such that \(2^{j_{0}}\sim n^{\frac{1}{2s'+1}}\), \(s':=s-(\frac{2}{r}-\frac{2}{p})_{+}\), and \(x_{+}:=\max\{x,0\}\).
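As a simplified illustration of the linear estimator, the sketch below implements its one-dimensional continuous-variable analogue with the Haar scaling function, for which the estimator reduces to a histogram with bins of width \(2^{-j_{0}}\); the two-dimensional estimator of the paper additionally integrates the weight \(u(y, v)\) against the second scaling factor:

```python
import numpy as np

def haar_linear_estimator(samples, j, kmin, kmax):
    """Linear wavelet density estimate at resolution level j with the Haar
    scaling function phi_{j,k}(x) = 2^{j/2} 1_{[k 2^{-j}, (k+1) 2^{-j})}(x).
    Returns the empirical coefficients alpha_hat[k] = (1/n) sum_l phi_{j,k}(X_l)
    and an evaluator for f_hat = sum_k alpha_hat[k] phi_{j,k}."""
    samples = np.asarray(samples, dtype=float)
    ks = np.arange(kmin, kmax + 1)
    alpha = np.array([
        2.0**(j / 2) * np.mean((samples >= k / 2.0**j) & (samples < (k + 1) / 2.0**j))
        for k in ks
    ])

    def f_hat(x):
        x = np.asarray(x, dtype=float)
        idx = np.floor(x * 2.0**j).astype(int)
        out = np.zeros_like(x)
        ok = (idx >= kmin) & (idx <= kmax)
        out[ok] = 2.0**(j / 2) * alpha[idx[ok] - kmin]
        return out

    return alpha, f_hat

rng = np.random.default_rng(1)
alpha, f_hat = haar_linear_estimator(rng.uniform(size=2000), j=3, kmin=0, kmax=7)
# The estimate integrates to the fraction of samples covered by the bins.
```

For samples supported on \([0,1)\), the estimate integrates exactly to one, and its value near any interior point is close to the true uniform density.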
To obtain a nonlinear estimator, we take \(j_{0}\) and \(j_{1}\) such that \(2^{j_{0}}\sim n^{\frac{1}{2m+1}}\) with \(m>s\) and \(2^{j_{1}}\sim \frac{n}{\ln n}\). Define \(\wedge_{j}:=\{k\in\mathbb{Z}^{2}, \operatorname{supp} f_{\ast}\cap \operatorname{supp} \psi^{i}_{j, k} \neq\emptyset\}\) and \(\lambda_{j}:=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\) (T is the constant described in Lemma 2.3). Then the nonlinear estimator is given by
\(\hat{f}^{{\mathrm{non}}}_{n}:=\sum_{k\in\wedge_{j_{0}}}\hat{\alpha}_{j_{0},k}\varphi_{j_{0},k}+\sum_{j=j_{0}}^{j_{1}}\sum_{i=1}^{3}\sum_{k\in\wedge_{j}}\hat{\beta}^{i}_{j,k}1_{\{|\hat{\beta}^{i}_{j,k}|>\lambda_{j}\}}\psi^{i}_{j,k}.\) (1.4)
From the definition of \(\hat{f}^{{\mathrm{non}}}_{n}\) we see that the nonlinear estimator has the advantage of being adaptive: its construction does not depend on the parameters s, r, q, and H.
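The thresholding rule inside the nonlinear estimator is plain hard thresholding with the level-dependent cutoff \(\lambda_{j}=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\). A minimal sketch (T = 1 is an arbitrary illustrative value, not the constant of Lemma 2.3):

```python
import numpy as np

def lambda_j(j, n, T=1.0):
    """Level-dependent threshold lambda_j = (T/2) 2^{-j/2} sqrt(ln n / n)."""
    return 0.5 * T * 2.0**(-0.5 * j) * np.sqrt(np.log(n) / n)

def hard_threshold(beta_hat, j, n, T=1.0):
    """Keep an empirical detail coefficient only when its magnitude
    exceeds lambda_j, as in the nonlinear estimator; otherwise set it
    to zero."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.where(np.abs(beta_hat) > lambda_j(j, n, T), beta_hat, 0.0)
```

Coefficients below the noise level \(\lambda_{j}\) are discarded, which is what makes the estimator adaptive to unknown smoothness.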
The following theorem gives a lower bound estimation for \(L^{p}\) risk.
Theorem 1.1
Let f̂ be an estimator of \(f_{\ast}\in B_{r,q}^{s}(H)\) with \(s>\frac{2}{r}\) and \(r, q\geq1\). Then there exists \(C>0\) such that, for \(1\leq p<\infty\),
\(\sup_{f_{\ast}\in B^{s}_{r,q}(H)}E\|\hat{f}-f_{\ast}\|^{p}_{p}\geq C\max \{n^{-\frac{sp}{2s+1}}, (\tfrac{\ln n}{n} )^{\frac{(s-\frac{2}{r}+\frac{2}{p})p}{2(s-\frac{2}{r})+1}} \}.\)
The upper bounds of the linear and nonlinear wavelet estimators are provided by Theorems 1.2 and 1.3, respectively.
Theorem 1.2
Let \(\hat{f}^{{\mathrm{lin}}}_{n}\) be the estimator of \(f_{\ast}\in B_{r,q}^{s}(H, Q)\) defined by (1.3) with \(1\leq r,q<\infty\) and \(s>0\). If the density of X is bounded, then for \(r\geq p\geq1\), or \(r\leq p<\infty\) and \(s>\frac{2}{r}\),
\(\sup_{f_{\ast}\in B^{s}_{r,q}(H,Q)}E \|\hat{f}^{{\mathrm{lin}}}_{n}-f_{\ast} \|^{p}_{p}\lesssim n^{-\frac{s'p}{2s'+1}}\)
with \(s'=s-(\frac{2}{r}-\frac{2}{p})_{+}\) and \(x_{+}:=\max(x,0)\).
Remark 1.3
If \(r\geq2\), \(p=2\), and \(s>0\), then \(s'=s\), and Theorem 1.2 reduces to Theorem 4.1 in [2]. In addition, Theorem 1.2 does not make any restriction on Q, so the assumptions are weaker than in [2]. Theorem 1.2 extends the corresponding theorem of [2] from \(p=2\) to \(p\in[1, \infty)\).
When \(r\geq p\), \(s'=s\), and the linear wavelet estimator \(\hat{f}^{{\mathrm{lin}}}_{n}\) attains optimality thanks to Theorems 1.1 and 1.2. However, the linear estimator does not offer optimal estimation for \(r< p\), because in that case \(s'< s\) and \(\frac{s'}{2s'+1}<\frac{s}{2s+1}\).
To give a suboptimal estimation for \(r< p\), we need the nonlinear wavelet estimators defined by (1.4).
Theorem 1.3
Let \(\hat{f}^{{\mathrm{non}}}_{n}\) be the estimator of \(f_{\ast}\in B_{r,q}^{s}(H, Q)\) defined by (1.4) with \(1\leq r,q<\infty\) and \(s>0\). If the density of X is bounded, then for \(r\geq p\geq1\), or \(r\leq p<\infty\) and \(s>\frac{2}{r}\),
\(\sup_{f_{\ast}\in B^{s}_{r,q}(H,Q)}E \|\hat{f}^{{\mathrm{non}}}_{n}-f_{\ast} \|^{p}_{p}\lesssim(\ln n)^{p} (\tfrac{\ln n}{n} )^{\alpha p}\)
with \(\alpha:=\min \{\frac{s}{2s+1}, \frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac {2}{r})+1}\}\).
Remark 1.4
Theorems 1.1 and 1.3 tell us that the nonlinear estimator is suboptimal up to a logarithmic factor. Moreover, if \(p=2\) and \(\{r\geq2, s>0\}\) or \(\{1\leq r<2, s>\frac{2}{r}\}\), then \(\alpha=\frac{s}{2s+1}\), and Theorem 1.3 is the same as Theorem 4.2 in [2] up to a logarithmic factor. Hence Theorem 1.3 can be considered as an extension of Theorem 4.2 in [2] from \(p=2\) to \(p\in[1, \infty)\).
Finally, our theorems can be extended to the multidimensional case as in [3] by using the technique developed in [9]. It remains a challenging problem to study the estimation of a multivariate continuous-discrete conditional density; we refer to [3] for further details.
2 Some lemmas
In this section we establish several lemmas needed for the proofs of our main theorems.
Lemma 2.1
Let \(\hat{\alpha}_{j,k}\) and \(\hat{\beta}_{j,k}\) be defined by (1.2). Then
for \(j\geq j_{0}\), \(k\in\mathbb{Z}^{2}\), and \(i=1, 2, 3\).
Proof
Denote \(c_{j,k_{2}}(v)=\int\phi_{j,k_{2}}(y)u(y,v)\,dy\). Then
Since \((X_{1},Y_{1})\), \((X_{2},Y_{2}), \dots, (X_{n},Y_{n})\) are independent and identically distributed, we have
By similar arguments, \(E (\hat{\beta}^{i}_{j,k})=\beta ^{i}_{j,k}\). This completes the proof of Lemma 2.1. □
To show Lemma 2.2, we introduce Rosenthal’s inequality.
Rosenthal’s inequality
([8])
Let \(X_{1}, X_{2}, \ldots, X_{n}\) be independent random variables such that \(EX_{l} =0\) and \(E |X_{l} |^{p}<\infty\) (\(l=1,2,\ldots, n\)). Then, with \(C_{p}>0\),
\(E \vert \sum_{l=1}^{n}X_{l} \vert ^{p}\leq C_{p} \{\sum_{l=1}^{n}E \vert X_{l} \vert ^{p}+ (\sum_{l=1}^{n}EX_{l}^{2} )^{\frac{p}{2}} \}\) for \(p\geq2\), and
\(E \vert \sum_{l=1}^{n}X_{l} \vert ^{p}\leq C_{p} (\sum_{l=1}^{n}EX_{l}^{2} )^{\frac{p}{2}}\) for \(1\leq p<2\).
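Rosenthal's inequality (in its standard form \(E|\sum_{l}X_{l}|^{p}\leq C_{p}\{\sum_{l}E|X_{l}|^{p}+(\sum_{l}EX_{l}^{2})^{p/2}\}\) for \(p\geq2\)) can be sanity-checked numerically. For \(p=4\) and i.i.d. Rademacher variables (\(EX_{l}=0\), \(EX_{l}^{2}=E|X_{l}|^{4}=1\)), the exact moment \(E|\sum_{l}X_{l}|^{4}=3n^{2}-2n\) is available; the constant \(C_{4}=3\) below is an illustrative choice that suffices in this case:

```python
import numpy as np

rng = np.random.default_rng(7)

# Monte Carlo fourth moment of S_n = X_1 + ... + X_n for Rademacher X_l.
n, trials = 20, 100_000
S = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1)
fourth_moment = np.mean(S ** 4)

# Rosenthal right-hand side with the illustrative constant C_4 = 3:
#   C_4 * (sum_l E|X_l|^4 + (sum_l E X_l^2)^2) = 3 * (n + n^2).
rosenthal_rhs = 3.0 * (n + n ** 2)
exact = 3 * n ** 2 - 2 * n   # exact E S_n^4 for Rademacher sums
```

The simulated moment matches the exact value \(3n^{2}-2n\) and sits below the Rosenthal bound, as expected.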
Lemma 2.2
Let \(\hat{\alpha}_{j,k}\) and \(\hat{\beta}_{j,k}\) be defined by (1.2). If the density of X is bounded, then there exists a constant \(C>0\) such that
for \(1\leq p<\infty\) and \(2^{j}\leq n\).
Proof
We only prove the first inequality, since the proof of the second is similar. By the definition of \(\hat{\alpha}_{j,k}\),
\(\hat{\alpha}_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\phi_{j,k_{1}}(X_{l})c_{j,k_{2}}(Y_{l}),\)
where \(c_{j,k_{2}}(Y_{l}):=\int_{\mathbb{R}} \phi_{j,k_{2}}(y)u(y,Y_{l})\,dy\), and ϕ is the one-dimensional Daubechies scaling function \(D_{2N}\). Since \(|u(y,v)|\leq2\), we obtain that
\(\vert c_{j,k_{2}}(v) \vert \leq2\int_{\mathbb{R}} \vert \phi_{j,k_{2}}(y) \vert \,dy\lesssim2^{-\frac{j}{2}}\) (2.1)
and
due to the boundedness of \(f_{X}\). Define \(\xi_{l}:=\phi _{j,k_{1}}(X_{l})c_{j,k_{2}}(Y_{l})-\alpha_{j,k}\). Then
It follows from Lemma 2.1 and Jensen’s inequality that
Hence (2.3) reduces to
thanks to (2.2). By the definition of \(\hat{\alpha}_{j,k}\) and \(\xi_{l}\), \(\hat{\alpha }_{j,k}-\alpha_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\xi_{l}\), where \(\xi_{1}, \xi_{2}, \ldots, \xi_{n}\) are independent because \((X_{1}, Y_{1})\), \((X_{2}, Y_{2}), \ldots, (X_{n}, Y_{n})\) are. On the other hand, Lemma 2.1 implies \(E(\xi_{l})=0\). Then Rosenthal's inequality leads to
By (2.4) we know that
for \(1\leq p<2\) and
for \(p\geq2\), thanks to the assumption \(2^{j}\leq n\). Combining these with (2.5), we obtain the desired conclusion
This completes the proof. □
To prove Lemma 2.3, we need the well-known Bernstein inequality.
Bernstein’s inequality
([8])
Let \(X_{1}, X_{2}, \ldots, X_{n}\) be i.i.d. random variables with \(E(X_{i})=0\) and \(\|X_{i}\|_{\infty }\leq M\). Then, for each \(\gamma>0\),
\(P ( \vert \tfrac{1}{n}\sum_{i=1}^{n}X_{i} \vert \geq\gamma )\leq 2\exp \{-\tfrac{n\gamma^{2}}{2 (EX_{i}^{2}+\frac{\gamma M}{3} )} \}.\)
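Bernstein's inequality in its standard form \(P(|\frac{1}{n}\sum_{i}X_{i}|\geq\gamma)\leq2\exp\{-\frac{n\gamma^{2}}{2(EX_{i}^{2}+\gamma M/3)}\}\) is easy to check by simulation; the uniform distribution below is an arbitrary test case:

```python
import numpy as np

rng = np.random.default_rng(42)

def bernstein_bound(n, gamma, sigma2, M):
    """Right-hand side of Bernstein's inequality for the sample mean of
    n i.i.d. centered variables with variance sigma2 and |X_i| <= M."""
    return 2.0 * np.exp(-n * gamma**2 / (2.0 * (sigma2 + gamma * M / 3.0)))

# X_i ~ Uniform[-1, 1]: E X_i = 0, E X_i^2 = 1/3, |X_i| <= 1.
n, gamma, trials = 200, 0.1, 20_000
means = rng.uniform(-1.0, 1.0, size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means) >= gamma)
bound = bernstein_bound(n, gamma, sigma2=1.0 / 3.0, M=1.0)
```

The empirical tail probability lies well below the bound, as it must.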
The next lemma is an extension of Proposition 4.2 in [2].
Lemma 2.3
Let \(2^{j}\leq\frac{n}{\ln n}\), and let \(\hat{\beta}_{j,k}^{i}\) (\(i=1,2,3\)) be defined in (1.2). If the density of X is bounded, then for each \(\varepsilon>0\), there exists \(T>0\) such that, for \(j\geq0\) and \(k\in\mathbb{Z}^{2}\),
\(P (\vert \hat{\beta}^{i}_{j,k}-\beta^{i}_{j,k} \vert \geq\tfrac{\lambda_{j}}{2} )\lesssim2^{-\varepsilon j}\) (2.6)
with \(\lambda_{j}=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\).
Proof
We only show (2.6) for \(i=1\). By the definition of \(\hat{\beta}^{1}_{j,k}\), \(\hat{\beta}^{1}_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}} \psi ^{1}_{j,k}(X_{l}, y)u(y, Y_{l})\,dy\), and
where \(d_{j,k_{2}}(Y_{l}):=\int_{\mathbb{R}} \psi_{j,k_{2}}(y)u(y, Y_{l})\,dy\) (ϕ and ψ stand for the one-dimensional Daubechies scaling function and wavelet function, respectively). Define \(\eta_{l}:=\phi_{j,k_{1}}(X_{l})d_{j,k_{2}}(Y_{l})-\beta^{1}_{j,k}\). Then \(\hat{\beta}^{1}_{j,k}-\beta^{1}_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\eta_{l}\) and \(E(\eta_{l})=0\) because \(\beta^{1}_{j,k}=E(\hat{\beta}^{1}_{j,k})= E [\phi_{j,k_{1}}(X_{l})d_{j,k_{2}}(Y_{l}) ]\).
Using (2.1) with ψ instead of ϕ, we get \(|d_{j,k_{2}}(Y_{l}) |\lesssim2^{-\frac{j}{2}}\). Note that \(|\phi_{j,k_{1}}(X_{l})|:=2^{\frac{j}{2}} |\phi(2^{j}X_{l}-k_{1})|\leq2^{\frac{j}{2}}\|\phi\|_{\infty}\). Then \(|\phi_{j,k_{1}}(X_{l})d_{j,k_{2}}(Y_{l}) |\lesssim1\) and \(|\beta^{1}_{j,k}|= |E [\phi _{j,k_{1}}(X_{l})d_{j,k_{2}}(Y_{l}) ] |\lesssim1\). Hence
By replacing \(c_{j,k_{2}}\) and \(\alpha_{j,k}\) with \(d_{j,k_{2}}\) and \(\beta^{1}_{j,k}\), respectively, arguments similar to (2.1)–(2.4) show that
Because \(\eta_{1}, \eta_{2}, \ldots, \eta_{n}\) are i.i.d. and \(E(\eta_{l})=0\) (\(l=1, 2, \ldots, n\)), Bernstein’s inequality tells us that
with \(\lambda_{j}=\frac{T}{2}2^{-\frac{1}{2}j}\sqrt{\frac{\ln n}{n}}\). This with (2.7)–(2.8) implies
because \(2^{\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\leq1\) by the assumption \(2^{j}\leq\frac{n}{\ln n}\). Note that \(\ln n> j\ln2\) due to \(n\geq2^{j}\ln n>2^{j}\). Hence
by choosing \(T>0\) such that \(\frac{T^{2}\ln2}{8 (C_{1}+\frac {C_{2}}{6}T )}>\varepsilon\). Then (2.9) reduces to
which concludes (2.6) with \(i=1\). Similarly, the conclusions with \(i=2,3\) hold. This completes the proof. □
At the end of this section, we introduce two classical lemmas, which are needed for the proof of lower bound.
Lemma 2.4
(Varshamov–Gilbert lemma, [11])
Let \(\Theta:= \{\varepsilon=(\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{m}) , \varepsilon_{i}\in\{0, 1\} \}\). Then there exists a subset \(\{\varepsilon^{0}, \varepsilon^{1}, \ldots, \varepsilon^{T}\}\) of Θ with \(\varepsilon^{0}=(0, 0, \ldots, 0)\) such that \(T\geq2^{\frac{m}{8}}\) and
\(\sum_{i=1}^{m} \vert \varepsilon^{a}_{i}-\varepsilon^{b}_{i} \vert \geq\tfrac{m}{8}\quad(0\leq a< b\leq T).\)
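For small m, a family with the pairwise separation required by the Varshamov–Gilbert lemma can be produced greedily. The sketch below demonstrates this with m = 8 and the (stronger) pairwise Hamming distance 2, which in particular exceeds the m/8 = 1 that the lemma needs:

```python
import itertools
import numpy as np

def greedy_separated_family(m, dmin):
    """Greedily collect vectors of {0,1}^m, starting from epsilon^0 = 0,
    keeping a candidate only if its Hamming distance to every vector
    already chosen is at least dmin (the lemma takes dmin = m/8)."""
    chosen = np.zeros((1, m), dtype=int)
    for eps in itertools.product((0, 1), repeat=m):
        eps = np.asarray(eps, dtype=int)
        if np.abs(chosen - eps).sum(axis=1).min() >= dmin:
            chosen = np.vstack([chosen, eps])
    return chosen

family = greedy_separated_family(8, 2)
```

By construction every pair of selected vectors is at Hamming distance at least 2, and the family is far larger than the \(2^{m/8}\) the lemma guarantees.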
To state Fano’s lemma, we introduce a concept: When P is absolutely continuous with respect to Q (denoted by \(P\ll Q\)), the Kullback divergence of P and Q between two measures P and Q is defined by
where \(p(x)\) and \(q(x)\) are the density functions of P and Q, respectively.
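For discrete measures the divergence is a finite sum, and the elementary bound \(\ln u\leq u-1\) yields \(K(P,Q)\leq\int\frac{(p(x)-q(x))^{2}}{q(x)}\,dx\), the chi-square bound used in Sect. 3. Both are easy to check numerically; the probability vectors below are arbitrary examples:

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback divergence K(P, Q) = sum_i p_i ln(p_i / q_i); requires
    q_i > 0 wherever p_i > 0, i.e. P << Q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi_square(p, q):
    """Chi-square distance sum_i (p_i - q_i)^2 / q_i, an upper bound on
    K(P, Q) obtained from ln u <= u - 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum((p - q) ** 2 / q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
```

The divergence is nonnegative, vanishes for identical distributions, and is dominated by the chi-square distance.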
Lemma 2.5
(Fano’s lemma, [6])
Let \((\Omega, \mathcal{F}, P_{k})\) (\(k=0, 1, \ldots, m\)) be probability spaces, and let \(A_{k}\in\mathcal{F}\). If \(A_{k}\cap A_{k'}=\emptyset\) for \(k\neq k'\), then, with \(A^{C}\) standing for the complement of A and \(\mathcal{K}_{m}:=\inf_{0\leq k\leq m}\frac{1}{m}\sum_{k'\neq k}K(P_{k}, P_{k'})\),
where \(K(P_{k}, P_{k'})\) is the Kullback divergence of \(P_{k}\) and \(P_{k'}\) (\(k, k'=0, 1, \ldots, m\)).
3 Proofs of lower bounds
We rewrite Theorem 1.1 as follows before giving its proof.
Theorem 3.1
Let \(\hat{f}_{n}\) be an estimator of \(f_{\ast}\in B_{r,q}^{s}(H)\) with \(s>\frac{2}{r}\) and \(1\leq r,q\leq \infty\). Then, for \(1\leq p<\infty\),
\(\sup_{f_{\ast}\in B^{s}_{r,q}(H)}E\|\hat{f}_{n}-f_{\ast}\|^{p}_{p}\gtrsim\max \{n^{-\frac{sp}{2s+1}}, (\tfrac{\ln n}{n} )^{\frac{(s-\frac{2}{r}+\frac{2}{p})p}{2(s-\frac{2}{r})+1}} \}.\)
Proof
As in Sect. 1, we take the two-dimensional tensor product wavelet
\(\psi^{1}(x, y)=D_{2N}(x)\psi_{2N}(y),\)
where \(D_{2N}(\cdot)\) and \(\psi_{2N}(\cdot)\) are the one-dimensional Daubechies scaling function and wavelet function, respectively. Then \(\psi^{1}\) is m-regular (\(m>s\)) for large N, and \(\operatorname{supp} \psi^{1}\subseteq[0,2N-1]\times[-N+1,N]\)
due to \(\operatorname{supp} D_{2N}\subseteq[0,2N-1]\) and \(\operatorname{supp} \psi_{2N}\subseteq[-N+1,N]\). Then there exists a compactly supported density function \(g_{0}\) such that \(\int_{\mathbb{R}^{2}} g_{0}(x)\,dx=1\), \(g_{0}(x)|_{[0,2N-1]\times[-N+1,N]}=c_{0}\), and \(g_{0}\in B_{r,q}^{s}(H)\). Define \(\Delta_{j}:=\Delta_{j}^{1}\times\Delta_{j}^{2}\) with
Then \(\sharp\Delta_{j}=2^{j}(2^{j}-1)\sim2^{2j}\) (\(\sharp\Delta_{j}\) denotes the cardinality of \(\Delta_{j}\)). Denote \(a_{j}:=2^{-(2s+1)j}\) and
\(g_{\varepsilon}:=g_{0}+a_{j}\sum_{k\in\Delta_{j}}\varepsilon_{k}\psi^{1}_{j,k},\quad \varepsilon=(\varepsilon_{k})_{k\in\Delta_{j}}, \varepsilon_{k}\in\{0, 1\}.\)
Obviously, the supports of \(\psi^{1}_{j,k}\) and \(\psi^{1}_{j,k'}\) are disjoint for \(k\neq k'\in\Delta_{j}\) and \(\operatorname{supp} \psi^{1}_{j,k}\subseteq \operatorname{supp} g_{0}\). When \((x, y)\in[0,2N-1]\times[-N+1,N]\),
for large j. On the other hand,
Hence \(g_{\varepsilon}\) is a bivariate density function for \(\varepsilon=(\varepsilon_{k})_{k\in\Delta_{j}}\).
Moreover, \(g_{\varepsilon}\in B^{s}_{r,q}(H)\). In fact, for \(\varepsilon_{k}\in\{0, 1\}\), \(\sum_{k\in\Delta_{j}} |\varepsilon _{k}|^{r}\leq2^{2j}\) and
By Lemma 1.1, \(\|a_{j}\sum_{k\in\Delta_{j}}\varepsilon_{k}\psi^{1}_{j,k}\| _{B^{s}_{r,q}}\leq H\). This with \(g_{0}\in B_{r,q}^{s}(H)\) implies \(g_{\varepsilon}\in B^{s}_{r,q}(H)\).
According to Lemma 2.4 (the Varshamov–Gilbert lemma), for \(\Omega= \{\varepsilon=(\varepsilon_{k})_{k\in\Delta_{j}}, \varepsilon_{k}\in\{0, 1\} \}\), there exists a subset \(\{\varepsilon^{(0)}, \varepsilon ^{(1)},\ldots,\varepsilon^{(M)} \}\) of Ω such that \(M\geq2^{\frac{2^{2j}}{8}}\), \(\varepsilon^{(0)}=(0,0,\ldots, 0)\), and for \(m,n=0,1,\ldots,M\), \(m\neq n\),
Denote \(\wedge':= \{ g_{\varepsilon^{(0)}}, g_{\varepsilon^{(1)}}, \ldots, g_{\varepsilon^{(M)}} \}\). Then \(\wedge'\subseteq\wedge\), and for \(g_{\varepsilon^{(m)}}, g_{\varepsilon^{(n)}}\in\wedge'\),
since the supports of \(\psi^{1}_{j,k}\) (\(k\in\Delta_{j}\)) are mutually disjoint. This with (3.1) leads to
Define
\(i=0, 1, 2, \ldots, M\). Then \(A_{\varepsilon^{(m)}}\cap A_{\varepsilon^{(n)}}=\emptyset\) for \(m\neq n\). Denote by \(P^{n}_{f}\) the probability measure with the density \(f^{n}(x,y):=\prod_{i=1}^{n}f(x_{i},y_{i})\). By the construction of \(g_{\varepsilon^{(i)}}\), \(P^{n}_{g_{\varepsilon^{(i)}}}\ll P^{n}_{g_{0}}\). Then it follows from Lemma 2.5 (Fano’s lemma) that
Furthermore,
Taking \(2^{j}\sim n^{\frac{1}{2(2s+1)}}\), we obtain that
with \(\mathcal{K}_{M}:=\inf_{0\leq v\leq M} \frac{1}{M}\sum_{i\neq v}K(P^{n}_{g_{\varepsilon^{(i)}}}, P^{n}_{g_{\varepsilon^{(v)}}})\). By the definition of Kullback divergence,
where we used the inequality \(\ln u\leq u-1\) (\(u>0\)) in the last step. Note that
and \(g_{0}(x_{1},y_{1})=c_{0}\) for \((x_{1},y_{1})\in[0,2N-1]\times [-N+1,N]\). Combining this with the Parseval identity, we reduce (3.3) to
Hence
On the other hand, \(2^{j}\sim n^{\frac{1}{2(2s+1)}}\) implies \(na_{j}^{2}\leq C\). Then it follows from \(M\geq2^{\frac {2^{2j}}{8}}\geq e^{{2^{2j}\frac{\ln2}{8}}}\) that
by choosing \(C>0\) such that \(C<\frac{\ln2}{16}c_{0}\). This with (3.2) leads to
Now, it remains to show that
Similarly to the proof of (3.5), we construct the family of density functions \(\{g_{k}, k\in\Delta_{j}\}\) as follows:
where \(a_{j}:=2^{-j(s+1-\frac{2}{r})}\). Obviously, \(\int_{\mathbb{R}^{2}} g_{k}(x,y) \,dx \,dy=\int_{\mathbb{R}^{2}} g_{0}(x,y) \,dx \,dy=1\), and
for large j since \(s>\frac{2}{r}\). Then \(g_{k}\) is a bivariate density function for fixed \(k\in\Delta_{j}\). From the proof of (3.5) we know that \(g_{0}\in B^{s}_{r,q}(H)\). This with
implies \(g_{k}\in B^{s}_{r,q}(H)\) for \(k\in\Delta_{j}\).
To prove (3.6), we need to show that
When \(k\neq k'\in\Delta_{j}\), \(\operatorname{supp} \psi^{1}_{j,k} \cap \operatorname{supp} \psi ^{1}_{j,k'} =\emptyset\) and
Moreover,
Define \(B_{k}:= \{\|\hat{f}_{n}-g_{k}\|_{p}<\frac{\delta _{j}}{2} \}\). Then \(B_{k}\cap B_{k'}=\emptyset\) (\(k\neq k'\)). According to Lemma 2.5 (Fano’s lemma), we find that
where \(M=\sharp\Delta_{j}\) and \(\mathcal{K}_{M}:=\inf_{0\leq v\leq M}\frac{1}{M}\sum_{k\neq v} K(P^{n}_{g_{k}}, P^{n}_{g_{v}})\leq\frac{1}{M}\sum_{k\neq 0}K(P^{n}_{g_{k}}, P^{n}_{g_{0}})\). Similar to (3.3)–(3.4), we conclude that
Hence \(\mathcal{K}_{M}\leq c^{-1}_{0}C_{1}na_{j}^{2}\). By taking \(2^{j}\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac {2}{r})+1}}\) we obtain that \(\ln2^{j}\geq C'\ln n\) and \(e^{-\mathcal{K}_{M}}\geq e^{-c^{-1}_{0}C_{1}na_{j}^{2}}\geq e^{-c_{0}^{-1}C\ln n}\), thanks to \(na_{j}^{2}\leq C_{2}\ln n\) (\(C=C_{1}C_{2}\)). Moreover, choosing \(C_{1}\) and \(C'\) such that \(C'>c_{0}^{-1}C\), we have
due to \(M\sim2^{2j}\). This with (3.8) implies \(\sup_{k\in\Delta_{j}}P_{g_{k}}^{n}(\|\hat{f}_{n}-g_{k}\|_{p}\geq\frac {\delta_{j}}{2}) \gtrsim1\). Furthermore,
Then the desired conclusion (3.7) follows from \(\delta_{j}:=2^{\frac{1}{p}}\|\psi^{1}\|_{p}2^{-j(s-\frac{2}{r}+\frac {2}{p})}\) and the choice of \(2^{j}\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac {2}{r})+1}}\). This completes the proof. □
4 Proofs of upper bounds
In this section, we prove the upper bounds for the wavelet estimators. The result for the linear estimator is derived first; we restate and prove Theorem 1.2 as Theorem 4.1.
Theorem 4.1
Let \(\hat{f}^{{\mathrm{lin}}}_{n}\) be the linear estimator of \(f_{\ast}\in B_{r,q}^{s}(H, Q)\) defined in (1.3) with \(1\leq r,q<\infty\) and \(s>0\). If the density of X is bounded, then for \(\{r\geq p\geq1\}\) or \(\{r\leq p<\infty\textit{ and } s>\frac{2}{r}\}\),
\(\sup_{f_{\ast}\in B^{s}_{r,q}(H,Q)}E \|\hat{f}^{{\mathrm{lin}}}_{n}-f_{\ast} \|^{p}_{p}\lesssim n^{-\frac{s'p}{2s'+1}}\)
with \(s'=s-(\frac{2}{r}-\frac{2}{p})_{+}\) and \(x_{+}:=\max\{x,0\}\).
Proof
When \(r\leq p\), \(s':=s-(\frac{2}{r}-\frac{2}{p})_{+}=s-\frac {2}{r}+\frac{2}{p}\) and \(B_{r,q}^{s}(\mathbb{R}^{2})\hookrightarrow B_{p,q}^{s'}(\mathbb{R}^{2})\) thanks to Remark 1.2. Then
When \(r>p\), since \(f_{\ast}\) has a compact support, so does \(\hat {f}_{n}^{{\mathrm{lin}}}\), because φ has the same property. By the Hölder inequality,
Because \(s'=s\) in that case, it is sufficient to prove that
for the conclusion of Theorem 4.1.
Recall that \(\hat{f}^{{\mathrm{lin}}}_{n}:=\sum_{k\in\wedge _{j_{0}}}\hat{\alpha}_{j_{0},k}\varphi_{j_{0},k}\). Then by Lemma 2.1 we conclude that
due to Lemma 1.2. It follows from Lemma 2.2 and \(\sharp\wedge_{j_{0}}\lesssim 2^{2j_{0}}\) that
thanks to the choice of \(2^{j_{0}}\sim n^{\frac{1}{2s'+1}}\).
On the other hand, by Lemma 2.1, \(E(\hat{f}^{{\mathrm{lin}}}_{n})=\sum_{k\in\wedge_{j_{0}}}\alpha _{j_{0},k}\varphi_{j_{0},k}=P_{j_{0}}f_{\ast}\). Combining this with \(f_{\ast}\in B_{p,q}^{s'}(\mathbb{R}^{2})\) and Remark 1.1, we get that
Taking \(2^{j_{0}}\sim n^{\frac{1}{2s'+1}}\), it is easy to show
which means that (4.1) holds. The proof is done. □
Next, we are in a position to prove the conclusion of the nonlinear one.
Theorem 4.2
Let \(\hat{f}^{{\mathrm{non}}}_{n}\) be the nonlinear estimator of \(f_{\ast}\in B_{r,q}^{s}(H, Q)\) defined in (1.4) with \(1\leq r,q<\infty\) and \(s>0\). If the density of X is bounded, then for \(\{r\geq p\geq1\}\) or \(\{r\leq p<\infty\textit{ and } s>\frac{2}{r}\}\),
\(\sup_{f_{\ast}\in B^{s}_{r,q}(H,Q)}E \|\hat{f}^{{\mathrm{non}}}_{n}-f_{\ast} \|^{p}_{p}\lesssim(\ln n)^{p} (\tfrac{\ln n}{n} )^{\alpha p}\)
with \(\alpha:=\min \{\frac{s}{2s+1}, \frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac {2}{r})+1} \}\).
Proof
We only need to prove the case \(r\leq p\). In fact, when \(r>p\), \(\hat{f}^{{\mathrm{non}}}_{n}\) has a compact support because φ, ψ, and \(f_{\ast}\) all have compact supports. By the Hölder inequality,
Using Theorem 4.2 for the case \(r=p\), we find that \(\sup_{f_{\ast}\in B_{r,q}^{s}(H,Q)} E\|\hat{f}^{{\mathrm {non}}}_{n}-f_{\ast}\|_{r}^{r}\lesssim(\ln n)^{r}(\frac{\ln n}{n})^{\alpha r}\), and therefore
It remains to estimate the case \(r\leq p\). Recall that
\(\hat{f}^{{\mathrm{non}}}_{n}=\sum_{k\in\wedge_{j_{0}}}\hat{\alpha}_{j_{0},k}\varphi_{j_{0},k}+\sum_{j=j_{0}}^{j_{1}}\sum_{i=1}^{3}\sum_{k\in\wedge_{j}}\hat{\beta}^{i}_{j,k}1_{\{|\hat{\beta}^{i}_{j,k}|>\lambda_{j}\}}\psi^{i}_{j,k}\)
with \(\lambda_{j}=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\). Denote \(f_{j_{0},j_{1}}:=\sum_{j=j_{0}}^{j_{1}}\sum_{i=1}^{3}\sum_{k\in\wedge_{j}} (\hat{\beta}^{i}_{j,k}1_{\{|\hat{\beta}^{i}_{j,k}|>\lambda_{j}\}}-\beta ^{i}_{j,k})\psi^{i}_{j,k}\). Then
From the proof of Theorem 4.1 we obtain that
and
due to \(2^{j_{0}}\sim n^{\frac{1}{2m+1}}\), \(2^{j_{1}}\sim\frac{n}{\ln n}\) and \(\alpha=\min \{\frac{s}{2s+1}, \frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1} \}\).
By \(f_{j_{0},j_{1}}:= \sum_{j=j_{0}}^{j_{1}}\sum_{i=1}^{3}\sum_{k\in\wedge_{j}} (\hat{\beta}^{i}_{j,k}1_{\{|\hat{\beta}^{i}_{j,k}|>\lambda_{j}\}}-\beta ^{i}_{j,k})\psi^{i}_{j,k}\) and Lemma 1.2,
On the other hand, it is easy to see that
and \(1_{\{|\hat{\beta}^{i}_{j,k}|\geq\lambda_{j}, |\beta ^{i}_{j,k}|<\lambda_{j}/2\}}\leq1_{\{|\hat{\beta}^{i}_{j,k}-\beta ^{i}_{j,k}|>\lambda_{j}/2\}}\). Then
with
When \(|\hat{\beta}^{i}_{j,k}|<\lambda_{j}\) and \(|\beta^{i}_{j,k}|>2\lambda_{j}\), \(|\hat{\beta}^{i}_{j,k}-\beta^{i}_{j,k} |\geq |\beta^{i}_{j,k}|-|\hat{\beta}^{i}_{j,k}|>{|\hat{\beta }^{i}_{j,k}|}/{2}\). Hence
Then (4.6) reduces to
By (4.4)–(4.5) and (4.7) it is sufficient to show
for the conclusion of Theorem 4.2.
To estimate \(T_{1}\), using the Hölder inequality, we find that
Note that \(E (1_{\{|\hat{\beta}^{i}_{j, k}-\beta^{i}_{j, k}|\geq \lambda_{j}/2\}} )= P (|\hat{\beta}^{i}_{j,k}-\beta^{i}_{j, k}|\geq\frac{\lambda _{j}}{2} )\leq 2^{-\varepsilon j}\) due to Lemma 2.3. Taking ε such that \(\varepsilon>p\), we conclude that
thanks to Lemma 2.2, \(\sharp\wedge_{j}\lesssim2^{2j}\) and the choice of \(j_{0}\). Hence (4.8) with \(\ell=1\) holds since \(\alpha\leq\frac {s}{2s+1}\).
To estimate \(T_{2}\) and \(T_{3}\), define
Recall that \(2^{j_{0}}\sim n^{\frac{1}{2m+1}}\), \(2^{j_{1}}\sim\frac {n}{\ln n}\) and \(\alpha:=\min \{ \frac{s}{2s+1}, \frac{s-\frac{2}{r}+\frac {2}{p}}{2(s-\frac{2}{r})+1} \}\). Then
Hence \(2^{j_{0}}\leq2^{j_{0}^{\ast}}\) and \(2^{j_{1}^{\ast}}\leq2^{j_{1}}\). Moreover, a simple computation shows that \(1-2\alpha\leq\frac{\alpha}{s-\frac{2}{r}+\frac{2}{p}}\), which implies \(2^{j_{0}^{\ast}}\leq2^{j_{1}^{\ast}}\).
Now, we estimate \(T_{2}\) by dividing \(T_{2}\) into
Since \(1_{\{|\hat{\beta}^{i}_{j, k}|\geq\lambda_{j},|\beta^{i}_{j, k}|\geq\lambda_{j}/2 \}}\leq1\), by Lemma 2.2 we know that
thanks to \(\sharp\wedge_{j}\lesssim2^{2j}\) and the choice of \(j_{0}^{*}\). To estimate \(t_{2}\), we observe that
This with Lemma 2.2 leads to
Note that \(\|\beta_{j, \cdot}\|_{r}\lesssim2^{-j(s+1-\frac{2}{r})}\) because of \(f_{\ast}\in B^{s}_{r,q}\) and Lemma 1.1. Then (4.11) reduces to
thanks to \(\lambda_{j}=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\). Denote \(\theta:=sr+\frac{r}{2}-\frac{p}{2}\). When \(\theta>0\), \(r>\frac{p}{2s+1}\) and
due to the choice of \(j_{0}^{\ast}\). In (4.13), we use the fact \(\alpha=\frac{s}{2s+1}\) in the case \(r>\frac{p}{2s+1}\).
To show (4.13) for \(\theta\leq0\), define \(r_{1}:=(1-2\alpha)p>0\). Then \(\alpha=\frac{s-\frac{2}{r}+\frac {2}{p}}{2(s-\frac{2}{r})+1}\leq\frac{s}{2s+1}\) and \(r\leq\frac{p}{2s+1}\leq(1-2\alpha)p=r_{1}\) because \(\theta\leq0\). The same arguments as (4.11) show that
It follows from \(f_{\ast}\in B^{s}_{r, q}\) and Lemma 1.1 that
due to \(r\leq r_{1}\). Therefore, similarly to (4.12), we get that
Note that \(\frac{p}{2}-2-(s-\frac{2}{r}+\frac{1}{2})r_{1}=0\) because of \(r_{1}=(1-2\alpha)p\) and \(\alpha=\frac{s-\frac{2}{r}+\frac {2}{p}}{2(s-\frac{2}{r})+1}\). Then
which implies that (4.13) holds for \(\theta\leq0\). The desired conclusion (4.8) with \(\ell=2\) follows from (4.9)–(4.10) and (4.13)–(4.14).
Finally, by splitting \(T_{3}\) into
we obtain that
thanks to \(\sharp\wedge_{j}\lesssim2^{2j}\) and the choice of \(\lambda_{j}\) and \(j_{0}^{\ast}\).
To estimate \(e_{2}\), we use the fact that \(1_{ \{|\hat{\beta}^{i}_{j, k}| \leq\lambda_{j}, |\beta^{i}_{j, k}|\leq2\lambda_{j} \}}\leq (\frac{2\lambda_{j}}{|\beta^{i}_{j, k}|} )^{p-r}\) because \(r\leq p\). Similarly to (4.11)–(4.13),
for \(\theta>0\), where \(\theta:=sr+\frac{r}{2}-\frac{p}{2}\). When \(\theta\leq0\), we rewrite \(e_{2}\) as follows:
Proceeding as in (4.11) and (4.12), we find that
This with the choice of \(2^{j_{1}^{\ast}}\sim(\frac{n}{\ln n})^{\frac {\alpha}{s-\frac{2}{r}+\frac{2}{p}}}\) leads to
due to \(\alpha=\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}\) for \(\theta\leq0\). When \(r\leq p\),
thanks to \(f_{\ast}\in B^{s}_{r, q}\) and Lemma 1.1. Therefore
Combining this with the choice of \(2^{j_{1}^{\ast}}\sim(\frac{n}{\ln n})^{\frac{\alpha}{s-\frac{2}{r}+\frac{2}{p}}}\), we observe that
This with (4.19) implies that (4.17) holds for \(\theta \leq0\). Hence
Therefore, the desired conclusion follows from (4.4)–(4.8) with \(\ell=1,2,3\), which completes the proof. □
References
Ahmad, I.A., Cerrito, P.B.: Nonparametric estimation of joint discrete-continuous probability densities with applications. J. Stat. Plan. Inference 41, 349–364 (1994)
Chesneau, C., Dewan, I., Doosti, H.: Nonparametric estimation of a two dimensional continuous-discrete density function. Stat. Methodol. 18, 64–78 (2014)
Chesneau, C., Doosti, H.: A note on the adaptive estimation of a conditional continuous-discrete multivariate density by wavelet methods. Chin. J. Math. 2016, 6204874 (2016)
Crowder, M.J.: Classical Competing Risks. Chapman & Hall, London (2001)
Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
DeVore, R., Kerkyacharian, G., Picard, D., Temlyakov, V.: Approximation methods for supervised learning. Found. Comput. Math. 6, 3–58 (2006)
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., Picard, D.: Density estimation by wavelet thresholding. Ann. Stat. 24, 508–539 (1996)
Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A.: Wavelets, Approximation and Statistical Applications. Springer, New York (1998)
Kou, J.K., Liu, Y.M.: An extension of Chesneau’s theorem. Stat. Probab. Lett. 108, 23–32 (2016)
Li, Q., Maasoumi, E., Racine, J.S.: A nonparametric test for equality of distributions with mixed categorical and continuous data. J. Econom. 148, 186–200 (2009)
Li, R., Liu, Y.M.: Wavelet optimal estimations for a density with some additive noises. Appl. Comput. Harmon. Anal. 32, 416–433 (2014)
Liu, Y.M., Wang, H.Y.: Convergence order of wavelet thresholding estimator for differential operators on Besov spaces. Appl. Comput. Harmon. Anal. 32, 342–356 (2012)
Meyer, Y.: Wavelets and Operators. Cambridge University Press, Cambridge (1992)
Ouyang, D., Li, Q., Racine, J.: Cross-validation and the estimation of probability distributions with categorical data. J. Nonparametr. Stat. 18, 69–100 (2006)
Zeng, X.C.: A note on wavelet deconvolution density estimation. Int. J. Wavelets Multiresolut. Inf. Process. 15(6), 1750055 (2017)
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 11771030, 11601030), the Beijing Natural Science Foundation (Grant No. 1172001), the Premium Funding Project for Academic Human Resources Development in Beijing Union University (Grant No. BPHR2018CZ10), and the Scientific Research Project of Beijing Municipal Education Commission (Grant No. KM201711417002).
Contributions
All authors finish this work together. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Hu, L., Zeng, X. & Wang, J. Wavelet optimal estimations for a two-dimensional continuous-discrete density function over \(L^{p}\) risk. J Inequal Appl 2018, 279 (2018). https://doi.org/10.1186/s13660-018-1868-7