Wavelet optimal estimations for a two-dimensional continuous-discrete density function over $L^{p}$ risk

The mixed continuous-discrete density model plays an important role in reliability, finance, biostatistics, and economics. Using wavelet methods, Chesneau, Dewan, and Doosti provided upper bounds for wavelet estimators under $L^{2}$ risk for a two-dimensional continuous-discrete density function over Besov spaces $B^{s}_{r,q}$. This paper deals with $L^{p}$ ($1\leq p < \infty$) risk estimations over Besov spaces, which generalizes the Chesneau-Dewan-Doosti theorems. In addition, we provide, for the first time, a lower bound for the $L^{p}$ risk.
It turns out that the linear wavelet estimator attains the optimal convergence rate for $r \geq p$, and the nonlinear one offers optimal estimation up to a logarithmic factor.


Introduction
Density estimation plays an important role in both statistics and econometrics. This paper considers a two-dimensional density estimation model defined over mixed continuous and discrete variables [2]. More precisely, let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be independent and identically distributed (i.i.d.) observations of a bivariate random variable $(X, Y)$, where $X$ is a continuous random variable and $Y$ is a discrete one. The joint density function of $(X, Y)$ is given by
\[ f(x, v) = \frac{\partial}{\partial x} F(x, v), \]
with $F(x, v) = P(X \leq x, Y = v)$ being the distribution function of $(X, Y)$. We are interested in estimating $f(x, v)$ from $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. This continuous-discrete density model also arises in survival analysis, economics, and social sciences. For example, consider a series system with $m$ components, which fails as soon as one of the components fails. Let $X$ be the failure time of the system, and let $Y$ be the component whose failure resulted in the failure of the system. Then $(X, Y)$ is a bivariate continuous-discrete random variable.
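The series-system example can be simulated in a few lines. The sketch below is only an illustration: the exponential lifetimes, the function name `simulate_series_system`, and the chosen rates are assumptions made here and are not part of the model in the paper.

```python
import numpy as np

def simulate_series_system(n, rates, seed=0):
    """Draw n i.i.d. observations (X, Y) from a series system.

    Assumption (not from the paper): component lifetimes are independent
    exponentials with the given rates.
    X = failure time of the system (minimum lifetime), continuous.
    Y = index in {1, ..., m} of the component that failed first, discrete.
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    # lifetimes[l, v] is the lifetime of component v + 1 in the l-th system
    lifetimes = rng.exponential(scale=1.0 / rates, size=(n, rates.size))
    x = lifetimes.min(axis=1)
    y = lifetimes.argmin(axis=1) + 1
    return x, y

x, y = simulate_series_system(n=1000, rates=[1.0, 2.0, 0.5])
```

Each pair `(x[l], y[l])` is one observation of the continuous-discrete variable $(X, Y)$ with $m = 3$.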
For more examples, see [1] and [4]. The conventional kernel method gives a nice estimation for the continuous-discrete density function [1,10,14]. However, it is hard to provide the optimal estimation for the densities in Besov spaces. In addition, the complexity of bandwidth selection increases the difficulty of the kernel method.
Recently, wavelet methods have made remarkable achievements in density estimation [7,8,11,12,15] due to their time and frequency localization, multiscale decomposition, and fast algorithms in numerical computations. In fact, wavelet estimation attains optimality for densities in Besov spaces, which avoids the disadvantage of kernel methods. Using the wavelet method, Chesneau et al. [2] constructed linear and nonlinear wavelet estimators for a two-dimensional continuous-discrete density function and derived their mean integrated squared error performance over Besov balls.
This paper addresses $L^{p}$ ($1 \leq p < \infty$) risk estimations on Besov balls by using wavelet bases, which generalizes the Chesneau-Dewan-Doosti theorems. It should be pointed out that a lower bound for the $L^{p}$ risk of all estimators is derived here for the first time. It turns out that the linear wavelet estimator is optimal for $r \geq p$ and the nonlinear one attains optimal estimation up to a logarithmic factor.

Notations and definitions
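In this paper, we use the tensor product method to construct an orthonormal wavelet basis for $L^{2}(\mathbb{R}^{2})$, which will be used in later discussions. With a one-dimensional Daubechies scaling function $D_{2N}$ and a wavelet function $\psi_{2N}$ ($\psi_{2N}$ can be constructed from the scaling function $D_{2N}$), we construct two-dimensional tensor product wavelets $\phi$, $\psi^{1}$, $\psi^{2}$, and $\psi^{3}$ as follows:
\[ \phi(x,y) := D_{2N}(x)D_{2N}(y), \quad \psi^{1}(x,y) := D_{2N}(x)\psi_{2N}(y), \quad \psi^{2}(x,y) := \psi_{2N}(x)D_{2N}(y), \quad \psi^{3}(x,y) := \psi_{2N}(x)\psi_{2N}(y). \]
Then $\phi$ and $\psi^{i}$ ($i = 1, 2, 3$) are compactly supported in the time domain, because the Daubechies functions $D_{2N}$ and $\psi_{2N}$ are [5,8]. Denote
\[ \phi_{j,k}(x,y) := 2^{j}\phi(2^{j}x - k_1, 2^{j}y - k_2), \quad \psi^{i}_{j,k}(x,y) := 2^{j}\psi^{i}(2^{j}x - k_1, 2^{j}y - k_2), \quad j \in \mathbb{Z},\ k = (k_1, k_2) \in \mathbb{Z}^{2}. \]
As usual, let $P_j$ be the orthogonal projection operator defined by
\[ P_j f := \sum_{k \in \mathbb{Z}^{2}} \langle f, \phi_{j,k} \rangle \phi_{j,k}. \]
Details on wavelet bases can be found in [5,8]. A scaling function $\phi$ is called $m$-regular if $\phi \in C^{m}(\mathbb{R}^{2})$ and $|D^{\alpha}\phi(x)| \leq C(1 + |x|^{2})^{-l}$ for each $l \in \mathbb{Z}$ ($|\alpha| = 0, 1, \ldots, m$). By the definition of tensor product wavelets we find that the scaling function $\phi$ is $m$-regular, since the Daubechies function $D_{2N}$ is smooth enough for large $N$.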
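The tensor product construction can be checked numerically in a toy setting. The sketch below replaces $D_{2N}$ and $\psi_{2N}$ by the two-sample discrete Haar filters (the case $N = 1$), which is an assumption made purely for illustration; the assignment of the three mixed products to $\psi^{1}$, $\psi^{2}$, $\psi^{3}$ is likewise a convention chosen here.

```python
import numpy as np

# Discrete Haar filters on two samples, used as a stand-in for the
# Daubechies pair (D_{2N}, psi_{2N}); Haar corresponds to N = 1.
scaling = np.array([1.0, 1.0]) / np.sqrt(2.0)   # plays the role of D_{2N}
wavelet = np.array([1.0, -1.0]) / np.sqrt(2.0)  # plays the role of psi_{2N}

# Tensor products: the first factor acts on x, the second on y.
phi = np.outer(scaling, scaling)
psi1 = np.outer(scaling, wavelet)
psi2 = np.outer(wavelet, scaling)
psi3 = np.outer(wavelet, wavelet)

# Gram matrix of pairwise inner products of the four 2x2 arrays.
basis = np.stack([f.ravel() for f in (phi, psi1, psi2, psi3)])
gram = basis @ basis.T
```

The Gram matrix equals the identity, so the four tensor products form an orthonormal system, which is the discrete counterpart of the orthonormality of $\phi$, $\psi^{1}$, $\psi^{2}$, $\psi^{3}$.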
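One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain Hölder spaces and $L^{2}$-Sobolev spaces as particular examples. Throughout the paper, we work within a Besov space on a compact subset of $\mathbb{R}^{2}$. The following lemma gives equivalent definitions of those spaces, which are fundamental in our discussions.

Lemma 1.1 ([13]) Let $\phi$ be an $m$-regular orthonormal scaling function with the corresponding wavelets $\psi^{i}$ ($i = 1, 2, 3$). If $f \in L^{r}(\mathbb{R}^{2})$, $\alpha_{j,k} = \langle f, \phi_{j,k} \rangle$, $\beta^{i}_{j,k} = \langle f, \psi^{i}_{j,k} \rangle$, $1 \leq r, q \leq \infty$, and $0 < s < m$, then the following assertions are equivalent:
(i) $f \in B^{s}_{r,q}(\mathbb{R}^{2})$;
(ii) $\{2^{js}\|P_j f - f\|_{r}\}_{j \geq 0} \in l^{q}$;
(iii) $\|\alpha_{j_0,\cdot}\|_{r} + \|\{2^{j(s+1-\frac{2}{r})}\|\beta_{j,\cdot}\|_{r}\}_{j \geq j_0}\|_{q} < \infty$.
The Besov norm of $f$ can be defined by
\[ \|f\|_{B^{s}_{r,q}} := \|\alpha_{j_0,\cdot}\|_{r} + \big\|\{2^{j(s+1-\frac{2}{r})}\|\beta_{j,\cdot}\|_{r}\}_{j \geq j_0}\big\|_{q}, \]
where $\|\alpha_{j_0,\cdot}\|_{r}^{r} := \sum_{k \in \mathbb{Z}^{2}} |\alpha_{j_0,k}|^{r}$ and $\|\beta_{j,\cdot}\|_{r}^{r} := \sum_{i=1}^{3}\sum_{k \in \mathbb{Z}^{2}} |\beta^{i}_{j,k}|^{r}$.

Remark 1.2 When $r \leq p$, Lemma 1.1(i) and (iii) imply that, for $s' - \frac{2}{p} = s - \frac{2}{r} > 0$,
\[ B^{s}_{r,q}(\mathbb{R}^{2}) \hookrightarrow B^{s'}_{p,q}(\mathbb{R}^{2}), \]
where $A \hookrightarrow B$ stands for a Banach space $A$ continuously embedded in another Banach space $B$. More precisely, $\|u\|_{B} \leq C\|u\|_{A}$ ($u \in A$) for some constant $C > 0$.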

Main results
In this subsection, we state our main results and discuss relations to some other work. To do that, we propose a new bivariate function $f^{*}(x, y)$, which improves on the one in [2]. Define
\[ f^{*}(x, y) := \sum_{v=1}^{m} f(x, v)\,u(y, v). \]
The construction of $f^{*}$ follows the idea proposed by Chesneau et al. [2] but differs from it in the weight: in [2], $u(y, v)$ equals the characteristic function $1_{\{v-\frac{1}{2} \leq y < v+\frac{1}{2}\}}$, whereas a careful verification shows that our weight $u(y, v)$ is differentiable with respect to $y$ for each $v \in \{1, 2, \ldots, m\}$. The modification of $u(y, v)$ from the characteristic function to a smooth one makes $f^{*}$ continuous in $y$. It is easy to see that, for any $y = v \in \{1, 2, \ldots, m\}$,
\[ f^{*}(x, v) = f(x, v). \]
Hence, the problem is converted to constructing an estimator of $f^{*}$. As in [2], we assume that $f^{*}$ belongs to the space $B^{s}_{r,q}(H, Q)$ or, equivalently, that $f^{*}$ belongs to a Besov ball.

To introduce the wavelet estimators, we need the estimators $\hat{\alpha}_{j,k}$ and $\hat{\beta}^{i}_{j,k}$ of the wavelet coefficients $\alpha_{j,k}$ and $\beta^{i}_{j,k}$, defined by (1.2). When $f^{*}$ and $\phi$ have compact supports, the cardinality of $\wedge_{j}$ is at most of order $2^{2j}$. Then the linear wavelet estimator of $f^{*}$ is given as follows:
\[ \hat{f}^{lin}_{n}(x, y) := \sum_{k \in \wedge_{j_0}} \hat{\alpha}_{j_0,k}\,\phi_{j_0,k}(x, y), \]
where $j_0$ is chosen such that $2^{j_0} \sim n^{\frac{1}{2s'+1}}$, $s' := s - (\frac{2}{r} - \frac{2}{p})_{+}$, and $x_{+} := \max\{x, 0\}$.

To obtain a nonlinear estimator, we take $j_0$ and $j_1$ such that $2^{j_1} \sim \frac{n}{\ln n}$ and $2^{j_0} \sim n^{\frac{1}{2m+1}}$, together with a threshold built from $\frac{\ln n}{n}$ and $T$ ($T$ is the constant described in Lemma 2.3). Then the nonlinear estimator $\hat{f}^{non}_{n}$ is given by (1.4). From the definition of $\hat{f}^{non}_{n}$ we find that the nonlinear estimator has the advantage of being adaptive, since its construction does not depend on the indices $s$, $r$, $q$, and $H$.
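The level choices described above can be turned into a small helper. This is a minimal sketch: the function names are hypothetical, and since a relation such as $2^{j_0} \sim n^{\frac{1}{2s'+1}}$ fixes $j_0$ only up to a constant, rounding to the nearest integer is an assumption made here.

```python
import math

def effective_smoothness(s, r, p):
    """s' = s - (2/r - 2/p)_+ with x_+ = max(x, 0)."""
    return s - max(2.0 / r - 2.0 / p, 0.0)

def linear_level(n, s, r, p):
    """Integer j0 with 2^{j0} close to n^{1/(2s'+1)} (rounding is an assumption)."""
    s_prime = effective_smoothness(s, r, p)
    return max(0, round(math.log2(n) / (2.0 * s_prime + 1.0)))

def nonlinear_levels(n, m):
    """Integers (j0, j1) with 2^{j0} ~ n^{1/(2m+1)} and 2^{j1} ~ n / ln n."""
    j0 = max(0, round(math.log2(n) / (2.0 * m + 1.0)))
    j1 = max(j0, round(math.log2(n / math.log(n))))
    return j0, j1
```

Note that `linear_level` needs $s$, $r$, and $p$, whereas `nonlinear_levels` needs only $n$ and the regularity $m$ of the wavelet, which reflects the adaptivity of the nonlinear estimator.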
The following theorem gives a lower bound estimation for the $L^{p}$ risk.
The upper bounds of the linear and nonlinear wavelet estimators are provided by Theorems 1.2 and 1.3, respectively.
When $r \geq p$, $s' = s$ and the linear wavelet estimator $\hat{f}^{lin}_{n}$ attains optimality thanks to Theorems 1.1 and 1.2. However, the linear estimator does not offer optimal estimation for $r < p$, because $s' < s$ and $\frac{s'}{2s'+1} < \frac{s}{2s+1}$ in this case. To give a suboptimal estimation for $r < p$, we need the nonlinear wavelet estimator defined by (1.4). In particular, we can extend the theorems to the multidimensional case as in [3] by using the technique developed in [9]. It is a challenging problem to study the estimation of a multivariate continuous-discrete conditional density. We refer to [3] for further details.
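The comparison of exponents for $r < p$ rests on the monotonicity of $x \mapsto \frac{x}{2x+1}$. The following short check is a worked step added for illustration; the numerical values of $s$, $r$, $p$ are an arbitrary example and not taken from the paper.

```latex
\[
\frac{d}{dx}\,\frac{x}{2x+1}
  = \frac{(2x+1) - 2x}{(2x+1)^{2}}
  = \frac{1}{(2x+1)^{2}} > 0 \qquad (x > 0),
\]
so $0 < s' < s$ implies $\frac{s'}{2s'+1} < \frac{s}{2s+1}$.
For instance, with $s = 2$, $r = 1$, $p = 2$:
\[
s' = 2 - \Big(\frac{2}{1} - \frac{2}{2}\Big)_{+} = 1,
\qquad
\frac{s'}{2s'+1} = \frac{1}{3} < \frac{2}{5} = \frac{s}{2s+1}.
\]
```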

Some lemmas
We shall show several lemmas in this section, which are needed for proofs of our main theorems.
To show Lemma 2.2, we introduce Rosenthal's inequality.
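Rosenthal's inequality ([8]) Let $X_1, X_2, \ldots, X_n$ be independent random variables such that $EX_l = 0$ and $E|X_l|^{p} < \infty$ ($l = 1, 2, \ldots, n$). Then, with $C_p > 0$,
\[ E\Big|\sum_{l=1}^{n} X_l\Big|^{p} \leq C_p\Big(\sum_{l=1}^{n} E|X_l|^{p} + \Big(\sum_{l=1}^{n} E|X_l|^{2}\Big)^{\frac{p}{2}}\Big) \quad (p \geq 2), \qquad E\Big|\sum_{l=1}^{n} X_l\Big|^{p} \leq C_p\Big(\sum_{l=1}^{n} E|X_l|^{2}\Big)^{\frac{p}{2}} \quad (1 \leq p \leq 2). \]

Lemma 2.2 Let $\hat{\alpha}_{j,k}$ and $\hat{\beta}^{i}_{j,k}$ be defined by (1.2). If the density of $X$ is bounded, then there exists a constant $C > 0$ such that

Proof We only prove the first inequality, since the second one is similar. By the definition of $\hat{\alpha}_{j,k}$,
\[ \hat{\alpha}_{j,k} = \frac{1}{n}\sum_{l=1}^{n} \varphi_{j,k_1}(X_l)\,c_{j,k_2}(Y_l), \]
where $c_{j,k_2}(Y_l) := \int_{\mathbb{R}} \varphi_{j,k_2}(y)\,u(y, Y_l)\,dy$, and $\varphi$ is the one-dimensional Daubechies scaling function $D_{2N}$. Since $|u(y, v)| \leq 2$, we obtain that due to the boundedness of $f_X$. Define $\xi_l := \varphi_{j,k_1}(X_l)\,c_{j,k_2}(Y_l) - \alpha_{j,k}$. Then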
To prove Lemma 2.3, we need the well-known Bernstein inequality.
At the end of this section, we introduce two classical lemmas, which are needed for the proof of the lower bound.
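To state Fano's lemma, we introduce a concept: when $P$ is absolutely continuous with respect to $Q$ (denoted by $P \ll Q$), the Kullback divergence between two measures $P$ and $Q$ is defined by
\[ K(P, Q) := \int p(x) \ln \frac{p(x)}{q(x)}\,dx, \]
where $p(x)$ and $q(x)$ are the density functions of $P$ and $Q$, respectively.

Lemma 2.5 (Fano's lemma, [6]) Let $(\Omega, \mathcal{F}, P_k)$ be probability spaces, and let $A_k \in \mathcal{F}$, $k = 0, 1, \ldots, m$. If $A_k \cap A_{k'} = \emptyset$ for $k \neq k'$, then
\[ \sup_{0 \leq k \leq m} P_k(A_k^{C}) \geq \min\Big\{\frac{1}{2},\ \sqrt{m}\,\exp\big(-3e^{-1} - K_m\big)\Big\}, \]
with $A^{C}$ standing for the complement of $A$ and
\[ K_m := \inf_{0 \leq v \leq m} \frac{1}{m}\sum_{k \neq v} K(P_k, P_v), \]
where $K(P_k, P_v)$ is the Kullback distance of $P_k$ and $P_v$ ($k, v = 0, 1, \ldots, m$).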
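For intuition, the Kullback divergence can be computed directly for discrete distributions. The sketch below is an illustration under that discrete assumption; the function names and the two example distributions are made up here. It also evaluates the chi-square quantity that bounds $K(P, Q)$ from above, a consequence of $\ln u \leq u - 1$ for $u > 0$.

```python
import numpy as np

def kullback(p, q):
    """K(P, Q) = sum_x p(x) ln(p(x)/q(x)) for discrete P << Q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):
        raise ValueError("P must be absolutely continuous w.r.t. Q")
    mask = p > 0  # terms with p(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi_square(p, q):
    """sum_x (p(x) - q(x))^2 / q(x), an upper bound for K(P, Q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = q > 0
    return float(np.sum((p[mask] - q[mask]) ** 2 / q[mask]))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
k_pq = kullback(p, q)
```

The bound follows from $\sum p \ln\frac{p}{q} \leq \sum p\,(\frac{p}{q} - 1) = \sum \frac{(p-q)^{2}}{q}$. For $n$ i.i.d. observations the divergence of the product measures is $n$ times the single-observation divergence.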

Proofs of lower bounds
We rewrite Theorem 1.1 as follows before giving its proof.
Theorem 3.1 Let $\hat{f}_{n}$ be an estimator of $f^{*} \in B^{s}_{r,q}(H)$ with $s > \frac{2}{r}$ and $1 \leq r, q \leq \infty$. Then, for
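Proof As in Sect. 1, we take the two-dimensional tensor product wavelet
\[ \psi^{1}(x, y) := D_{2N}(x)\,\psi_{2N}(y), \]
where $D_{2N}(\cdot)$ and $\psi_{2N}(\cdot)$ are the one-dimensional Daubechies scaling function and wavelet function, respectively. Then $\psi^{1}$ is $m$-regular ($m > s$) for large $N$, and $|\Delta_j| = 2^{j}(2^{j} - 1) \sim 2^{2j}$ ($|\Delta_j|$ denotes the cardinality of $\Delta_j$). Denote $a_j := 2^{-(2s+1)j}$ and
\[ g_{\varepsilon}(x, y) := g_0(x, y) + a_j \sum_{k \in \Delta_j} \varepsilon_k\,\psi^{1}_{j,k}(x, y), \quad \varepsilon = (\varepsilon_k)_{k \in \Delta_j},\ \varepsilon_k \in \{0, 1\}. \]
Obviously, the supports of $\psi^{1}_{j,k}$ and $\psi^{1}_{j,k'}$ are disjoint for $k \neq k' \in \Delta_j$ and $\operatorname{supp}\psi^{1}_{j,k} \subseteq \operatorname{supp} g_0$. Then $g_{\varepsilon}(x, y) \geq 0$ for large $j$. On the other hand,
\[ \int_{\mathbb{R}^{2}} g_{\varepsilon}(x, y)\,dx\,dy = \int_{\mathbb{R}^{2}} g_0(x, y)\,dx\,dy = 1. \]
Hence $g_{\varepsilon}$ is a bivariate density function for $\varepsilon = (\varepsilon_k)_{k \in \Delta_j}$.

Moreover, $g_{\varepsilon} \in B^{s}_{r,q}(H)$. In fact, for $\varepsilon_k \in \{0, 1\}$, $\sum_{k \in \Delta_j} |\varepsilon_k|^{r} \leq 2^{2j}$. By Lemma 1.1, $\|a_j \sum_{k \in \Delta_j} \varepsilon_k \psi^{1}_{j,k}\|_{B^{s}_{r,q}} \leq H$. This with $g_0 \in B^{s}_{r,q}(H)$ implies $g_{\varepsilon} \in B^{s}_{r,q}(H)$. According to Lemma 2.4 (Varshamov-Gilbert theorem), for since the supports of $\psi^{1}_{j,k}$ ($k \in \Delta_j$) are mutually disjoint. This with (3.1) leads to

Denote by $P^{n}_{f}$ the probability measure with the density $f^{n}(x, y) := \prod_{i=1}^{n} f(x_i, y_i)$. By the construction of $g_{\varepsilon^{(i)}}$, $P^{n}_{g_{\varepsilon^{(i)}}} \ll P^{n}_{g_0}$. Then it follows from Lemma 2.5 (Fano's lemma) that Furthermore, Taking $2^{j} \sim n^{\frac{1}{2(2s+1)}}$, we obtain that with $K_M := \inf_{0 \leq v \leq M} \frac{1}{M}\sum_{i \neq v} K(P^{n}_{g_{\varepsilon^{(i)}}}, P^{n}_{g_{\varepsilon^{(v)}}})$. By the definition of Kullback divergence, where we applied the inequality $\ln u \leq u - 1$ for $u > 0$ in the last inequality. Note that Combining this with the Parseval identity, we reduce (3.3) to On the other hand, $2^{j} \sim n$

Now, it remains to show that (3.6). Similarly to the proof of (3.5), we construct the family of density functions $\{g_k, k \in \Delta_j\}$ as follows:
\[ g_k(x, y) := g_0(x, y) + a_j\,\psi^{1}_{j,k}(x, y), \quad k \in \Delta_j, \]
where $a_j := 2^{-j(s+1-\frac{2}{r})}$. Obviously, $\int_{\mathbb{R}^{2}} g_k(x, y)\,dx\,dy = \int_{\mathbb{R}^{2}} g_0(x, y)\,dx\,dy = 1$, and $g_k(x, y)|_{[0,2N-1]\times[-N+1,N]} \geq c_0 - 2^{-j(s-\frac{2}{r})}\|\psi^{1}\|_{\infty} > 0$ for large $j$ since $s > \frac{2}{r}$. Then $g_k$ is a bivariate density function for fixed $k \in \Delta_j$. From the proof of (3.5) we know that $g_0 \in B^{s}_{r,q}(H)$. This with $\|a_j \psi^{1}_{j,k}\|_{B^{s}_{r,q}} \sim a_j 2^{j(s+1-\frac{2}{r})} \leq 1$ implies $g_k \in B^{s}_{r,q}(H)$ for $k \in \Delta_j$. To prove (3.6), we need to show that When $k \neq k' \in \Delta_j$, $\operatorname{supp}\psi^{1}_{j,k} \cap \operatorname{supp}\psi^{1}_{j,k'} = \emptyset$ and where $M = |\Delta_j|$ and
\[ K_M := \inf_{0 \leq v \leq M} \frac{1}{M}\sum_{k \neq v} K(P^{n}_{g_k}, P^{n}_{g_v}) \leq \frac{1}{M}\sum_{k \neq 0} K(P^{n}_{g_k}, P^{n}_{g_0}). \]
Similar to (3.3)-(3.4), we conclude that
\[ K(P^{n}_{g_k}, P^{n}_{g_0}) \leq n\int_{\mathbb{R}^{2}} g_0(x, y)^{-1}\,|g_k(x, y) - g_0(x, y)|^{2}\,dx\,dy \leq c_0^{-1} C_1 n a_j^{2}. \]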
Hence $K_M \leq c_0^{-1} C_1 n a_j^{2}$. By taking $2^{j} \sim (\frac{n}{\ln n})^{\frac{1}{2(s-\frac{2}{r})+1}}$ we obtain that $\ln 2^{j} \geq C \ln n$ and $e^{-K_M} \geq e^{-c_0^{-1} C_1 n a_j^{2}} \geq e^{-c_0^{-1} C' \ln n}$, thanks to $n a_j^{2} \leq C_2 \ln n$ ($C' = C_1 C_2$). Moreover, choosing $C_1$ and $C'$ such that $C > c_0^{-1} C'$, we have
\[ \sqrt{M}\,e^{-3e^{-1}} e^{-K_M} \gtrsim e^{\ln 2^{j}} e^{-3e^{-1}} e^{-K_M} \geq e^{C \ln n - c_0^{-1} C' \ln n - 3e^{-1}} \geq 1 \]
for large $n$, due to $M \sim 2^{2j}$. This with (3.8) implies sup k∈ j P n g k ( f ng k p ≥