Wavelet density estimation for mixing and size-biased data

This paper considers wavelet estimation for a multivariate density function based on mixing and size-biased data. We provide upper bounds for the mean integrated squared error (MISE) of wavelet estimators. It turns out that our results reduce to the corresponding theorem of Shirazi and Doosti (Stat. Methodol. 27:12–19, 2015), when the random sample is independent.


Introduction
Let $\{Y_i, i \in \mathbb{Z}\}$ be a strictly stationary random process defined on a probability space $(\Omega, \mathcal{F}, P)$ with the common density function

$$g(y) = \frac{\omega(y) f(y)}{\mu}, \quad y \in \mathbb{R}^d, \tag{1}$$

where $\omega$ denotes a known positive function, $f$ stands for an unknown density function of the unobserved random variable $X$ and $\mu = E\omega(X) = \int_{\mathbb{R}^d} \omega(y) f(y)\,dy < +\infty$. We want to estimate the unknown density function $f$ from a sequence of strong mixing data $Y_1, Y_2, \ldots, Y_n$.

When $Y_1, Y_2, \ldots, Y_n$ are independent and $d = 1$, Ramírez and Vidakovic [13] propose a linear wavelet estimator and show it to be $L^2$ consistent; Chesneau [1] considers the optimal convergence rates of a wavelet block thresholding estimator; Shirazi and Doosti [16] extend Ramírez and Vidakovic's [13] work to $d \ge 1$. Chesneau et al. [2] relax the independence assumption to both positively and negatively associated cases and show a convergence rate for the mean integrated squared error (MISE). An upper bound for wavelet estimation under $L^p$ ($1 \le p < +\infty$) risk in the negatively associated case is given by Liu and Xu [9].

This paper deals with the $d$-dimensional density estimation problem (1) when $Y_1, Y_2, \ldots, Y_n$ are strong mixing. We give upper bounds for the mean integrated squared error (MISE) of wavelet estimators. It turns out that our linear result reduces to Shirazi and Doosti's [16] theorem when the random sample is independent.
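The size-biased model can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes $d = 1$, a uniform density $f$ on $[0, 1]$, the hypothetical weight $\omega(y) = 1 + y$, and independent draws (the simplest strong mixing case). It samples from $g = \omega f / \mu$ by rejection and recovers $\mu$ from the identity $E[1/\omega(Y)] = 1/\mu$, which follows directly from the form of $g$. The names `sample_size_biased`, `omega` and `mu_hat` are illustrative.

```python
import numpy as np

def sample_size_biased(n, omega, omega_max, draw_x, rng):
    """Draw n points from g(y) = omega(y) f(y) / mu by rejection sampling.

    draw_x(rng) returns one draw from f; omega_max bounds omega from above.
    """
    out = []
    while len(out) < n:
        x = draw_x(rng)
        # Accept x with probability omega(x) / omega_max.
        if rng.uniform() * omega_max <= omega(x):
            out.append(x)
    return np.array(out)

rng = np.random.default_rng(0)
omega = lambda y: 1.0 + y              # hypothetical weight with 1 <= omega <= 2
y = sample_size_biased(20000, omega, 2.0, lambda r: r.uniform(), rng)

# Since E[1/omega(Y)] = 1/mu, the inverse of the sample mean estimates mu.
mu_hat = 1.0 / np.mean(1.0 / omega(y))
```

In this toy setting the true value is $\mu = \int_0^1 (1 + y)\,dy = 1.5$, so `mu_hat` should be close to 1.5 for large samples.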

Wavelets and Besov spaces
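As a central notion in wavelet analysis, multiresolution analysis (MRA, Meyer [11]) plays an important role in constructing a wavelet basis. An MRA is a sequence of closed subspaces $\{V_j\}_{j\in\mathbb{Z}}$ of the square integrable function space $L^2(\mathbb{R}^d)$ satisfying the following properties:

(i) $V_j \subseteq V_{j+1}$, $j \in \mathbb{Z}$. Here and after, $\mathbb{Z}$ denotes the integer set and $\mathbb{N} := \{n \in \mathbb{Z}, n \ge 0\}$;
(ii) $\overline{\bigcup_{j\in\mathbb{Z}} V_j} = L^2(\mathbb{R}^d)$;
(iii) $f(2\cdot) \in V_{j+1}$ if and only if $f(\cdot) \in V_j$ for each $j \in \mathbb{Z}$;
(iv) There exists a scaling function $\varphi \in L^2(\mathbb{R}^d)$ such that $\{\varphi(\cdot - k), k \in \mathbb{Z}^d\}$ forms an orthonormal basis of $V_0 = \overline{\operatorname{span}}\{\varphi(\cdot - k)\}$.

When $d = 1$, there is a simple way to define an orthonormal wavelet basis. Examples include the Daubechies wavelets with compact supports. For $d \ge 2$, the tensor product method gives an MRA $\{V_j\}$ of $L^2(\mathbb{R}^d)$ from a one-dimensional MRA. In fact, with a scaling function $\varphi$ of tensor products, we find $M = 2^d - 1$ wavelet functions $\psi^\ell$ ($\ell = 1, 2, \ldots, M$) such that, for each $f \in L^2(\mathbb{R}^d)$, the following decomposition

$$f = \sum_{k\in\mathbb{Z}^d} \alpha_{j,k}\varphi_{j,k} + \sum_{j'=j}^{\infty}\sum_{\ell=1}^{M}\sum_{k\in\mathbb{Z}^d} \beta^\ell_{j',k}\psi^\ell_{j',k}$$

holds in the $L^2(\mathbb{R}^d)$ sense, where $\alpha_{j,k} = \langle f, \varphi_{j,k}\rangle$, $\beta^\ell_{j,k} = \langle f, \psi^\ell_{j,k}\rangle$ and $\varphi_{j,k}(y) = 2^{jd/2}\varphi(2^j y - k)$, $\psi^\ell_{j,k}(y) = 2^{jd/2}\psi^\ell(2^j y - k)$.

Let $P_j$ be the orthogonal projection operator from $L^2(\mathbb{R}^d)$ onto the space $V_j$ with the orthonormal basis $\{\varphi_{j,k}(\cdot) = 2^{jd/2}\varphi(2^j\cdot - k), k \in \mathbb{Z}^d\}$. Then, for $f \in L^2(\mathbb{R}^d)$,

$$P_j f = \sum_{k\in\mathbb{Z}^d} \alpha_{j,k}\varphi_{j,k}.$$

A wavelet basis can be used to characterize Besov spaces. The next lemma provides equivalent definitions for those spaces, for which we need one more notation: a scaling function $\varphi$ is called $m$-regular if $\varphi \in C^m(\mathbb{R}^d)$ and $|D^\alpha\varphi(y)| \le c_\ell(1 + |y|^2)^{-\ell}$ for each $\ell \in \mathbb{Z}$ and each multi-index $\alpha \in \mathbb{N}^d$ with $|\alpha| \le m$.

Lemma 1.1 (Meyer [11]) Let $\varphi$ be $m$-regular, $\psi^\ell$ ($\ell = 1, 2, \ldots, M$, $M = 2^d - 1$) be the corresponding wavelets and $f \in L^p(\mathbb{R}^d)$. If $\alpha_{j,k} = \langle f, \varphi_{j,k}\rangle$, $\beta^\ell_{j,k} = \langle f, \psi^\ell_{j,k}\rangle$, $p, q \in [1, \infty]$, and $0 < s < m$, then the following assertions are equivalent:

(i) $f \in B^s_{p,q}(\mathbb{R}^d)$;
(ii) $\{2^{js}\|P_{j+1}f - P_j f\|_p\} \in l_q$;
(iii) $\{2^{j(s - \frac{d}{p} + \frac{d}{2})}\|\beta_j\|_p\} \in l_q$.

The Besov norm of $f$ can be defined by

$$\|f\|_{B^s_{p,q}} := \|(\alpha_{j_0})\|_p + \big\|\big(2^{j(s - \frac{d}{p} + \frac{d}{2})}\|\beta_j\|_p\big)_{j\ge j_0}\big\|_q \quad\text{with}\quad \|\beta_j\|_p^p = \sum_{\ell=1}^{M}\sum_{k\in\mathbb{Z}^d}|\beta^\ell_{j,k}|^p.$$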

Estimators and result
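In this paper, we require $\operatorname{supp} Y_i \subseteq [0, 1]^d$ in model (1). This is similar to Chesneau [1], Chesneau et al. [2], Liu and Xu [9]. Now we give the definition of strong mixing.

Definition A strictly stationary sequence of random vectors $\{Y_i\}_{i\in\mathbb{Z}}$ is said to be strong mixing if

$$\lim_{k\to\infty}\alpha(k) = \lim_{k\to\infty}\sup\big\{|P(A\cap B) - P(A)P(B)| : A \in \mathcal{F}^0_{-\infty},\ B \in \mathcal{F}^\infty_k\big\} = 0,$$

where $\mathcal{F}^0_{-\infty}$ denotes the $\sigma$-field generated by $\{Y_i\}_{i\le 0}$ and $\mathcal{F}^\infty_k$ the one generated by $\{Y_i\}_{i\ge k}$.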
Obviously, the independent and identically distributed (i.i.d.) data are strong mixing since P(A ∩ B) = P(A)P(B) and α(k) ≡ 0 in that case. Now, we provide two examples for strong mixing data.
Then it can be proved by Theorem 2 and Corollary 1 of Doukhan [5] on p. 58 that $\{X_t, t \in \mathbb{Z}\}$ is a strong mixing sequence.
with $l \times r$ and $l \times l$ matrices $A(k)$, $B(i)$ respectively, as well as $B(0)$ being the identity matrix. If the absolute values of the zeros of the determinant $\det P(z) := \det \sum_{i=0}^{p} B(i) z^i$ ($z \in \mathbb{C}$) are strictly greater than 1, then $\{Y(t), t \in \mathbb{Z}\}$ is strong mixing (Mokkadem [12]).
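The root condition above can be checked numerically in the simplest case. The sketch below is only an illustration under a simplifying assumption: it treats the scalar case $l = 1$, where $\det P(z)$ is the ordinary polynomial $\sum_{i=0}^{p} B(i) z^i$ with $B(0) = 1$. The helper name `roots_outside_unit_circle` is hypothetical.

```python
import numpy as np

def roots_outside_unit_circle(b):
    """Check that all zeros of P(z) = b[0] + b[1] z + ... + b[p] z^p satisfy |z| > 1.

    b lists the scalar coefficients B(0), ..., B(p) with B(0) = 1.
    """
    # np.roots expects the highest-degree coefficient first.
    coeffs = np.asarray(b, dtype=float)[::-1]
    zeros = np.roots(coeffs)
    return bool(np.all(np.abs(zeros) > 1.0))

ok_ar1 = roots_outside_unit_circle([1.0, -0.5])         # zero at z = 2
bad_ar1 = roots_outside_unit_circle([1.0, -1.2])        # zero at z = 1/1.2
ok_ar2 = roots_outside_unit_circle([1.0, -0.5, 0.06])   # zeros at z = 5 and z = 10/3
```

For genuinely matrix-valued $B(i)$ one would need the zeros of the determinant polynomial instead, which this sketch does not attempt.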
It is well known that a Lebesgue measurable function maps i.i.d. data to i.i.d. data. When dealing with strong mixing data, it seems necessary to require the function $\omega$ in (1) to be Borel measurable. A function $h$ on $\mathbb{R}^d$ is Borel measurable if $\{y \in \mathbb{R}^d, h(y) > c\}$ is a Borel set for each $c \in \mathbb{R}$. In that case, one can easily prove that $\{h(Y_i)\}$ remains strong mixing (see Guo [6]). This remark is important for the proofs of the lemmas in the next section.
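Before introducing our estimators, we formulate the following assumptions:

A1. The weight function $\omega$ has both positive upper and lower bounds, i.e., for $y \in [0, 1]^d$, $0 < c_1 \le \omega(y) \le c_2 < +\infty$.
A2. The strong mixing coefficient of $\{Y_i, i \in \mathbb{Z}\}$ satisfies $\alpha(k) = O(\gamma e^{-ck})$ with $\gamma > 0$, $c > 0$.
A3. The density $g_{(Y_1, Y_{k+1})}$ of $(Y_1, Y_{k+1})$ ($k \ge 1$) and the density $g$ of $Y_1$ satisfy, for $(y, y^*) \in [0, 1]^d \times [0, 1]^d$,

$$\sup_{k\ge 1}\ \sup_{(y, y^*)}\big|g_{(Y_1, Y_{k+1})}(y, y^*) - g(y)g(y^*)\big| \le c_3$$

with a constant $c_3 > 0$.

Assumption A1 is standard for the nonparametric density model with size-biased data, see Ramírez and Vidakovic [13], Chesneau [1], Liu and Xu [9]. Condition A3 can be viewed as a 'Castellana-Leadbetter' type condition in Masry [10].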
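We choose a $d$-dimensional scaling function

$$\varphi(y) = \varphi(y_1, \ldots, y_d) := D_{2N}(y_1)\cdot\ldots\cdot D_{2N}(y_d)$$

with $D_{2N}(\cdot)$ being the one-dimensional Daubechies scaling function. Then $\varphi$ is $m$-regular ($m > 0$) when $N$ gets large enough. Note that $D_{2N}$ has compact support $[0, 2N - 1]$ and the corresponding wavelet has compact support $[-N + 1, N]$. Consequently, for each level $j$, only finitely many translates $\varphi_{j,k}$ and $\psi^\ell_{j,k}$ have supports meeting $[0, 1]^d$; we write $\Lambda_j$ for the corresponding finite set of indices $k \in \mathbb{Z}^d$.

We introduce

$$\hat\mu_n = \Big[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\omega(Y_i)}\Big]^{-1}, \tag{3}$$

$$\hat\alpha_{j,k} = \frac{\hat\mu_n}{n}\sum_{i=1}^{n}\frac{\varphi_{j,k}(Y_i)}{\omega(Y_i)} \tag{4}$$

and

$$\hat\beta^\ell_{j,k} = \frac{\hat\mu_n}{n}\sum_{i=1}^{n}\frac{\psi^\ell_{j,k}(Y_i)}{\omega(Y_i)}. \tag{5}$$

Now, we define our linear wavelet estimator

$$f^{lin}_n(y) := \sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\varphi_{j_0,k}(y) \tag{6}$$

and the nonlinear wavelet estimator

$$f^{non}_n(y) := f^{lin}_n(y) + \sum_{j=j_0}^{j_1}\sum_{\ell=1}^{M}\sum_{k\in\Lambda_j}\hat\beta^\ell_{j,k}\, I\{|\hat\beta^\ell_{j,k}| \ge \kappa t_n\}\,\psi^\ell_{j,k}(y) \tag{7}$$

with $t_n := \sqrt{\frac{\ln n}{n}}$. The positive integers $j_0$ and $j_1$ are specified in the theorem, while the constant $\kappa$ will be chosen in the proof of the theorem.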
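To make the linear estimator concrete, here is a minimal sketch under several simplifying assumptions that are not in the paper: $d = 1$, the Haar scaling function (the Daubechies case $N = 1$, which is not $m$-regular for $m > 0$) in place of a general $D_{2N}$, a uniform target density $f$ on $[0, 1]$, the hypothetical weight $\omega(y) = 1 + y$, and independent data. The coefficients are computed as $\hat\alpha_{j,k} = \frac{\hat\mu_n}{n}\sum_i \varphi_{j,k}(Y_i)/\omega(Y_i)$ with $\hat\mu_n$ the inverse of the sample mean of $1/\omega(Y_i)$. All function and variable names are illustrative.

```python
import numpy as np

def haar_phi(y, j, k):
    """phi_{j,k}(y) = 2^{j/2} phi(2^j y - k), with phi the indicator of [0, 1)."""
    u = (2.0 ** j) * y - k
    return (2.0 ** (j / 2.0)) * ((u >= 0.0) & (u < 1.0))

def linear_wavelet_estimator(sample, omega, j0):
    """Return (mu_hat, alpha_hat, f_hat) for size-biased data on [0, 1)."""
    w = 1.0 / omega(sample)
    mu_hat = 1.0 / np.mean(w)
    ks = np.arange(2 ** j0)            # Haar translates supported in [0, 1)
    alpha_hat = np.array(
        [mu_hat * np.mean(haar_phi(sample, j0, k) * w) for k in ks]
    )

    def f_hat(y):
        y = np.asarray(y, dtype=float)
        return sum(a * haar_phi(y, j0, k) for a, k in zip(alpha_hat, ks))

    return mu_hat, alpha_hat, f_hat

rng = np.random.default_rng(1)
omega = lambda y: 1.0 + y              # hypothetical weight, bounded on [0, 1]
u = rng.uniform(size=20000)
# Inverse-CDF draw from g(y) = (1 + y) / 1.5, i.e. f uniform and mu = 1.5.
sample = -1.0 + np.sqrt(1.0 + 3.0 * u)

mu_hat, alpha_hat, f_hat = linear_wavelet_estimator(sample, omega, j0=3)
grid = np.linspace(0.05, 0.95, 10)
values = f_hat(grid)                   # should hover around f = 1
```

With the Haar basis the estimate integrates to one by construction, because the weights $\hat\mu_n/(n\,\omega(Y_i))$ sum to one; smoother Daubechies bases do not share this exact property.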
The nonlinear estimator in (7) with

Remark 1 When $d = 1$, $n^{-\frac{2s}{2s+1}}$ is the optimal convergence rate in the minimax sense for the standard nonparametric density model, see Donoho et al. [4].
Remark 2 When the strong mixing data $Y_1, Y_2, \ldots, Y_n$ reduce to independent and identically distributed (i.i.d.) data, the convergence rate of our linear estimator is the same as that of Theorem 3.1 in Shirazi and Doosti [16].
Remark 3 Compared with the linear wavelet estimator $f^{lin}_n$, the nonlinear estimator $f^{non}_n$ is adaptive, which means that neither $j_0$ nor $j_1$ depends on $s$, $p$, and $q$. On the other hand, the convergence rate of the nonlinear estimator remains the same as that of the linear one up to $(\ln n)^3$ when $p \ge 2$. However, it gets better for $1 \le p < 2$.
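The adaptivity of the nonlinear estimator comes from hard thresholding of the empirical wavelet coefficients at level $\kappa t_n$ with $t_n = \sqrt{\ln n / n}$. The sketch below shows only that thresholding step. The value $\kappa = 1.5$ and the coefficient values are arbitrary placeholders; the paper fixes $\kappa$ in the proof of the theorem, and the helper name `hard_threshold` is hypothetical.

```python
import numpy as np

def hard_threshold(beta_hat, n, kappa):
    """Keep coefficients with |beta_hat| >= kappa * t_n, set the others to zero."""
    t_n = np.sqrt(np.log(n) / n)
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.where(np.abs(beta_hat) >= kappa * t_n, beta_hat, 0.0)

# Placeholder coefficients; for n = 100 the threshold 1.5 * t_n is about 0.32.
kept = hard_threshold([0.5, -0.1, 0.3, -0.25], n=100, kappa=1.5)
```

Because the threshold depends only on $n$ and $\kappa$, no knowledge of the smoothness parameters $s$, $p$, $q$ enters this step.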

Some lemmas
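In this section, we provide some lemmas for the proof of the theorem. The following simple (but important) lemma holds.

Lemma 2.1 For the model (1),

$$E(\hat\mu_n^{-1}) = \mu^{-1}, \tag{9a}$$

$$E\Big[\frac{\mu\varphi_{j,k}(Y_i)}{\omega(Y_i)}\Big] = \alpha_{j,k}, \tag{9b}$$

$$E\Big[\frac{\mu\psi^\ell_{j,k}(Y_i)}{\omega(Y_i)}\Big] = \beta^\ell_{j,k}, \tag{9c}$$

where $\alpha_{j,k} = \int_{[0,1]^d} f(y)\varphi_{j,k}(y)\,dy$ and $\beta^\ell_{j,k} = \int_{[0,1]^d} f(y)\psi^\ell_{j,k}(y)\,dy$.

Proof We include a simple proof for completeness. By the definition (3) of $\hat\mu_n$, one has $\hat\mu_n^{-1} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{\omega(Y_i)}$, so that

$$E(\hat\mu_n^{-1}) = \frac{1}{n}\sum_{i=1}^{n} E\Big[\frac{1}{\omega(Y_i)}\Big] = E\Big[\frac{1}{\omega(Y_1)}\Big].$$

This with (1) leads to

$$E\Big[\frac{1}{\omega(Y_1)}\Big] = \int_{[0,1]^d}\frac{g(y)}{\omega(y)}\,dy = \frac{1}{\mu}\int_{[0,1]^d} f(y)\,dy = \frac{1}{\mu},$$

which concludes (9a). Using (1), one knows that

$$E\Big[\frac{\mu\varphi_{j,k}(Y_i)}{\omega(Y_i)}\Big] = \int_{[0,1]^d}\frac{\mu\varphi_{j,k}(y)}{\omega(y)}\,g(y)\,dy = \int_{[0,1]^d} f(y)\varphi_{j,k}(y)\,dy = \alpha_{j,k}.$$

This completes the proof of (9b). Similar arguments show (9c).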
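Proof One proves the second inequality only; the first one is similar. Here and below, $A \lesssim B$ means $A \le cB$ for some constant $c > 0$. By the definition of $\hat\beta^\ell_{j,k}$,

$$\hat\beta^\ell_{j,k} - \beta^\ell_{j,k} = \frac{\hat\mu_n}{\mu}\Big[\frac{\mu}{n}\sum_{i=1}^{n}\frac{\psi^\ell_{j,k}(Y_i)}{\omega(Y_i)} - \beta^\ell_{j,k}\Big] + \beta^\ell_{j,k}\,\hat\mu_n\Big(\frac{1}{\mu} - \frac{1}{\hat\mu_n}\Big). \tag{10}$$

Moreover, $|\beta^\ell_{j,k}| = |\int f(y)\psi^\ell_{j,k}(y)\,dy| \lesssim 1$ thanks to Hölder's inequality and orthonormality of $\{\psi^\ell_{j,k}\}$. On the other hand, $|\frac{\hat\mu_n}{\mu}| \lesssim 1$ and $|\hat\mu_n| \lesssim 1$ because of A1. Hence,

$$E|\hat\beta^\ell_{j,k} - \beta^\ell_{j,k}|^2 \lesssim E\Big|\frac{\mu}{n}\sum_{i=1}^{n}\frac{\psi^\ell_{j,k}(Y_i)}{\omega(Y_i)} - \beta^\ell_{j,k}\Big|^2 + E\Big|\frac{1}{\mu} - \frac{1}{\hat\mu_n}\Big|^2.$$

It follows from Lemma 2.1 and the definition of variance that

$$E|\hat\beta^\ell_{j,k} - \beta^\ell_{j,k}|^2 \lesssim \operatorname{var}\Big[\frac{1}{n}\sum_{i=1}^{n}\frac{\psi^\ell_{j,k}(Y_i)}{\omega(Y_i)}\Big] + \operatorname{var}\Big[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\omega(Y_i)}\Big]. \tag{11}$$

Note that Condition A1 implies var Then it suffices to show By the strict stationarity of $Y_i$, On the other hand, Davydov's inequality and A1 show that These with A2 give the desired conclusion (12),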

Now, the main work is to show
Clearly, By A1-A3 and (1), the first term of the above inequality is bounded by

It remains to show
where the assumption $2^{jd} \le n$ is needed. According to A1 and A3, Hence, On the other hand, Davydov's inequality and A1-A3 tell that Moreover, $\sum_{m=2^{jd}}^{n} |\operatorname{cov}($ This with (15) shows (14).
To prove the last lemma in this section, we need the following Bernstein-type inequality (Liebscher [7,8], Rio [14]).
Bernstein-type inequality Let $(Y_i)_{i\in\mathbb{Z}}$ be a strong mixing process with mixing coefficient $\alpha(k)$,

Let $\hat\beta^\ell_{j,k}$ be defined in (5) and $t_n = \sqrt{\frac{\ln n}{n}}$. If A1-A3 hold and $2^{jd} \le \frac{n}{(\ln n)^3}$, then there exists a constant $\kappa > 1$ such that

$$P\big\{|\hat\beta^\ell_{j,k} - \beta^\ell_{j,k}| > \kappa t_n\big\} \lesssim n^{-4}. \tag{16}$$

Proof According to the arguments of (10), Hence, it suffices to prove One shows the second inequality only, because the first one is similar and even simpler.
Define $\eta_i := \frac{\mu\psi^\ell_{j,k}(Y_i)}{\omega(Y_i)} - \beta^\ell_{j,k}$. Then $E(\eta_i) = 0$ thanks to (9c), and $\eta_1, \ldots, \eta_n$ are strong mixing with the mixing coefficients $\alpha(k) \le \gamma e^{-ck}$ because of Condition A2. By A1-A3, According to the arguments of (13), Then it follows from the Bernstein-type inequality with $m = u\ln n$ (the constant $u$ will be chosen later on) that Clearly, $64\,\frac{2^{jd/2}}{\kappa n t_n}\, n\gamma e^{-cm} \lesssim n e^{-cu\ln n}$ holds due to $t_n = \sqrt{\frac{\ln n}{n}}$, $2^{jd} \le \frac{n}{(\ln n)^3}$ and $m = u\ln n$. Choose $u$ such that $1 - cu < -4$; then the second term of (17) is bounded by $n^{-4}$. On the other hand, the first one of (17) has the following upper bound: thanks to $D_m \lesssim m$, $2^{jd} \le \frac{n}{(\ln n)^3}$ and $m = u\ln n$. Obviously, there exists a sufficiently large $\kappa > 1$ such that $\exp\{-\frac{\kappa^2\ln n}{64}(1 + \frac{1}{6}\kappa u)^{-1}\} \lesssim n^{-4}$. Finally, the desired conclusion (16) follows.

Proof of the theorem
This section proves the theorem. The main idea of the proof comes from Donoho et al. [4].