Nonlinear wavelet density estimation for biased data in Sobolev spaces

Abstract

In this paper, we consider the density estimation problem from independent and identically distributed (i.i.d.) biased observations. We develop an adaptive wavelet hard thresholding rule and evaluate its performance by considering the $L^p$ risk over Sobolev balls. We prove that our estimator attains a sharp rate of convergence and discuss its optimality.

MSC:49K40, 90C29, 90C31.

1 Introduction

In practice, it often happens that drawing a direct sample from a random variable $X$ is impossible. In this paper, we consider the problem of estimating the density function $f_X(x)$ without directly observing an i.i.d. sample $X_1, X_2, \ldots, X_n$. Instead, we observe samples $Y_1, Y_2, \ldots, Y_n$ of biased data with the following density function:

\[ f_Y(x) = \frac{g(x) f_X(x)}{\mu}, \tag{1.1} \]

where $g(x)$ is the so-called weight or bias function and $\mu = E(g(X))$. The purpose of this paper is to estimate the density function $f_X(x)$ from the samples $Y_1, Y_2, \ldots, Y_n$.
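To make the sampling model (1.1) concrete, the following is a minimal sketch under stated assumptions: $f_X$ is taken to be the uniform density on $[0,1]$ and the weight is the hypothetical choice $g(x) = 1 + x$ (neither comes from the paper). Biased observations are drawn by rejection sampling, and $\mu$ is recovered through the identity $E[1/g(Y)] = 1/\mu$, which follows directly from (1.1). The function names are illustrative only.

```python
import numpy as np

def sample_biased(n, g, g_max, rng):
    """Draw n samples Y with density g(x) f_X(x) / mu, where f_X is Uniform(0, 1).

    Rejection sampling: propose X ~ f_X and accept with probability g(X) / g_max.
    """
    out = np.empty(0)
    while out.size < n:
        x = rng.uniform(0.0, 1.0, size=2 * n)
        u = rng.uniform(0.0, 1.0, size=2 * n)
        out = np.concatenate([out, x[u * g_max <= g(x)]])
    return out[:n]

def estimate_mu(y, g):
    """Harmonic-mean estimate of mu = E g(X), based on E[1 / g(Y)] = 1 / mu."""
    return y.size / np.sum(1.0 / g(y))

def g(x):
    """Hypothetical weight function, bounded between 1 and 2 on [0, 1]."""
    return 1.0 + x

rng = np.random.default_rng(0)
y = sample_biased(20000, g, 2.0, rng)
mu_hat = estimate_mu(y, g)   # the true value here is mu = E(1 + X) = 1.5
```

The same harmonic-mean quantity reappears later as $\hat\mu$ in the definition of the estimator, which is why the unknown $\mu$ never has to be supplied by the user.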

Several examples of such biased data can be found in the literature. For instance, in [1] the distribution of the concentration of alcohol in the blood of intoxicated drivers is of interest; since a drunken driver has a larger chance of being arrested, the collected data are size-biased.

The density estimation problem for biased data (1.1) has been discussed in several papers. In 1982, Vardi [2] considered the nonparametric maximum likelihood estimation for $f_X(x)$. In 1991, Jones [3] discussed the mean squared error properties of the kernel density estimator. In 2004, Efromovich [4] developed the Efromovich-Pinsker adaptive Fourier estimator. It was based on a blockwise shrinkage algorithm and achieved the minimax rate of convergence under the $L^2$ risk over a Besov class $B^s_{2,2}$.

In 2010, Ramírez and Vidakovic [5] proposed a linear wavelet estimator and discussed its consistency for functions in $L^2[0,1]$ in the mean integrated squared error (MISE) sense. However, the wavelet estimator in [5] contained the unknown parameter $\mu$. In the same year, Christophe [6] constructed a nonlinear wavelet estimator and evaluated the $L^p$ risk in the Besov space $B^s_{r,q}$. However, the Sobolev spaces $W_r^N$ ($N \in \mathbb{N}^+$) with $r \ne 2$ are not special cases of the Besov spaces $B^s_{r,q}$.

In this paper, we consider the nonlinear hard thresholding wavelet density estimator for biased data in the Sobolev spaces $W_r^N$ ($N \in \mathbb{N}^+$). We give an upper bound for the minimax rate of convergence under the $L^p$ risk without particular restrictions on the parameters $r$ and $p$, and we discuss when this convergence rate is optimal.

2 Preliminaries

In this section, we shall recall some well-known concepts and lemmas.

2.1 Wavelets

In this paper, we always assume that the scaling function $\varphi$ is orthonormal, compactly supported and $N+1$ regular.

Definition 2.1 The scaling function $\varphi(x)$ is called $m$ regular if $\varphi(x)$ has continuous derivatives up to order $m$ and its corresponding wavelet $\psi(x)$ has vanishing moments of order $m$, i.e., $\int x^k \psi(x)\,dx = 0$, $k = 0, 1, \ldots, m-1$.

The following conditions about the scaling function φ and the kernel function K(x,y) will be very useful in the third section.

Condition (θ)

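The function $\theta_\varphi(x) = \sum_{k \in \mathbb{Z}} |\varphi(x-k)|$ is such that $\operatorname{ess\,sup}_{x \in \mathbb{R}} \theta_\varphi(x) < \infty$.

Condition H(N)

There exists an integrable function $F(x)$ such that for any $x, y \in \mathbb{R}$, $|K(x,y)| \le F(x-y)$, where $\int |x|^N F(x)\,dx < \infty$.

Condition M(N)

Condition H(N) is satisfied and $\int K(x,y)(y-x)^k\,dy = \delta_{0k}$, $k = 0, \ldots, N$, $x \in \mathbb{R}$.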

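For any $x \in \mathbb{R}$ and $j, k \in \mathbb{Z}$, write $\varphi_{jk}(x) := 2^{j/2}\varphi(2^j x - k)$ and $\psi_{jk}(x) := 2^{j/2}\psi(2^j x - k)$. Then for any $f \in L^r(\mathbb{R}) := \{f : \int_{\mathbb{R}} |f(x)|^r\,dx < \infty\}$, where $1 \le r < \infty$, we have the following expansion [7]:

\[ f(x) = \sum_{k \in \mathbb{Z}} \alpha_{J,k}\,\varphi_{J,k}(x) + \sum_{j \ge J}\sum_{k \in \mathbb{Z}} \beta_{j,k}\,\psi_{j,k}(x), \quad \text{a.e.}, \tag{2.1} \]

where

\[ \alpha_{J,k} = \int_{\mathbb{R}} f(x)\,\varphi_{J,k}(x)\,dx, \qquad \beta_{j,k} = \int_{\mathbb{R}} f(x)\,\psi_{j,k}(x)\,dx. \]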
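As an illustration of the dilated and translated families $\varphi_{jk}$ and $\psi_{jk}$, here is a small numerical sketch that uses the Haar system. This is an assumption made purely for brevity: the Haar scaling function is not $N+1$ regular in the sense of Definition 2.1, so it does not satisfy the standing assumptions of the paper, but it shows the orthonormality relations behind (2.1) with very little code. All function names are hypothetical.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def haar_psi(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return haar_phi(2.0 * x) - haar_phi(2.0 * x - 1.0)

def dilate(fn, j, k, x):
    """Return 2^{j/2} fn(2^j x - k), i.e. phi_{jk} or psi_{jk} evaluated at x."""
    return 2.0 ** (j / 2.0) * fn(2.0 ** j * np.asarray(x, dtype=float) - k)

def inner(u, v, dx):
    """Riemann-sum approximation of the L2 inner product on a uniform grid."""
    return float(np.sum(u * v) * dx)

# Midpoint grid on [0, 1); fine enough that low-level Haar functions are
# piecewise constant on whole groups of grid cells.
m = 2 ** 10
x = (np.arange(m) + 0.5) / m
dx = 1.0 / m

norm_phi = inner(dilate(haar_phi, 3, 2, x), dilate(haar_phi, 3, 2, x), dx)
cross_phi = inner(dilate(haar_phi, 3, 2, x), dilate(haar_phi, 3, 3, x), dx)
```

On this grid the inner products are computed exactly up to rounding, so `norm_phi` is the squared norm of $\varphi_{3,2}$ and `cross_phi` is the inner product of two different translates.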

2.2 Sobolev space

The Sobolev space $W_r^N(\mathbb{R})$ ($N \in \mathbb{N}^+$) is defined by $W_r^N(\mathbb{R}) := \{f : f \in L^r(\mathbb{R}),\ f^{(N)} \in L^r(\mathbb{R})\}$, equipped with the norm $\|f\|_{W_r^N} := \|f\|_r + \|f^{(N)}\|_r$. The Sobolev balls $\tilde W_r^N(A,L)$ are defined as follows:

\[ \tilde W_r^N(A,L) := \{ f \in W_r^N(\mathbb{R}) : f \text{ is a probability density function},\ |\operatorname{supp} f| \le A,\ \|f^{(N)}\|_r \le L \}, \]

where $|\operatorname{supp} f| \le A$ means that the support of $f$ has length at most $A$.

The following embeddings hold between Sobolev spaces and Besov spaces.

Lemma 2.1 [8]

Let $s > 0$ and $1 \le p, q, r \le \infty$. Then

  (i) $W_r^N \hookrightarrow B^N_{r\infty} \hookrightarrow B^{N-1/r}_{\infty\infty}$ for all $N > 1/r$;

  (ii) $B^s_{rq} \hookrightarrow B^{s'}_{pq}$ for all $r < p$, where $s' = s - 1/r + 1/p$.

Here $A \hookrightarrow B$ denotes that the Banach space $A$ is continuously embedded in the Banach space $B$, i.e., there exists a constant $c \ge 0$ such that $\|u\|_B \le c\|u\|_A$ for any $u \in A$.

2.3 Auxiliary lemmas

The following lemmas given by [9] will be used in the next section.

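Lemma 2.2 If the scaling function $\varphi$ satisfies Condition (θ), then for any sequence $\{\lambda_k\}_{k \in \mathbb{Z}}$ satisfying $\|\lambda\|_{l_p} := (\sum_k |\lambda_k|^p)^{1/p} < \infty$, we have

\[ C_1 \|\lambda\|_{l_p}\, 2^{\frac{j}{2} - \frac{j}{p}} \le \Big\| \sum_k \lambda_k \varphi_{j,k} \Big\|_p \le C_2 \|\lambda\|_{l_p}\, 2^{\frac{j}{2} - \frac{j}{p}}, \]

where $C_1 = (\|\theta_\varphi\|_\infty^{1/p} \|\varphi\|_1^{1/q})^{-1}$, $C_2 = (\|\theta_\varphi\|_\infty^{1/q} \|\varphi\|_1^{1/p})^{-1}$, $1 \le p \le \infty$ and $\frac1p + \frac1q = 1$.

Lemma 2.3 For some integer $N \ge 0$, if the kernel function $K(x,y)$ satisfies Conditions M(N) and H(N+1), and $f \in B^s_{pq}(\mathbb{R})$, where $1 \le p, q \le \infty$ and $0 < s < N+1$, then $\|K_j f - f\|_p = 2^{-js}\varepsilon_j$, where $\{\varepsilon_j\} \in l_q$.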

Lemma 2.4 (Rosenthal inequality)

Let $X_1, \ldots, X_n$ be independent random variables such that $E(X_i) = 0$ and $E(|X_i|^p) < \infty$. Then there exists a constant $C(p) > 0$ such that

\[ E\Big(\Big|\sum_{i=1}^n X_i\Big|^p\Big) \le C(p)\Big(\sum_{i=1}^n E(|X_i|^p) + \Big(\sum_{i=1}^n E(X_i^2)\Big)^{p/2}\Big), \quad p > 2, \]

\[ E\Big(\Big|\sum_{i=1}^n X_i\Big|^p\Big) \le \Big(\sum_{i=1}^n E(X_i^2)\Big)^{p/2}, \quad 0 < p \le 2. \]

Lemma 2.5 (Bernstein inequality)

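Let $X_1, X_2, \ldots, X_n$ be independent random variables such that $E(X_i) = 0$, $E(X_i^2) \le \sigma^2$ and $|X_i| \le M < \infty$. Then

\[ P\Big(\Big|\frac1n \sum_{i=1}^n X_i\Big| > \lambda\Big) \le 2\exp\Big(-\frac{n\lambda^2}{2(\sigma^2 + M\lambda/3)}\Big), \quad \forall \lambda > 0. \]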
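The Bernstein bound is easy to evaluate numerically. The sketch below computes its right-hand side and, as a sanity check, compares it with the exact tail probability for Rademacher signs ($X_i = \pm 1$ with probability $1/2$, so $\sigma^2 = 1$ and $M = 1$). The Rademacher example and the function names are assumptions chosen for illustration; they do not appear in the paper.

```python
import math

def bernstein_bound(n, lam, sigma2, M):
    """Right-hand side of Bernstein's inequality:
    2 exp(-n lam^2 / (2 (sigma2 + M lam / 3)))."""
    return 2.0 * math.exp(-n * lam ** 2 / (2.0 * (sigma2 + M * lam / 3.0)))

def rademacher_tail(n, lam):
    """Exact P(|mean of n Rademacher signs| > lam), from binomial counts.

    If b of the n signs equal +1, the sum is 2 b - n.
    """
    total = 0
    for b in range(n + 1):
        if abs(2 * b - n) > lam * n:
            total += math.comb(n, b)
    return total / 2 ** n

bound = bernstein_bound(100, 0.3, 1.0, 1.0)
exact = rademacher_tail(100, 0.3)
```

In this example the exact tail probability lies well below the bound, as the inequality requires.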

Remark In this paper, we often use the notation $A \lesssim B$ to indicate that $A \le cB$ with a positive constant $c$ that is independent of $A$ and $B$. If $A \lesssim B$ and $B \lesssim A$, we write $A \sim B$.

3 Main results

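In this paper, our hard thresholding wavelet density estimator is defined as follows:

\[ \hat f^{\mathrm{non}}_{n,X}(x) = \sum_k \hat\alpha_{j_0 k}\,\varphi_{j_0 k}(x) + \sum_{j=j_0}^{j_1} \sum_k \hat\beta^*_{jk}\,\psi_{jk}(x), \tag{3.1} \]

where

\[ \hat\alpha_{j_0 k} := \frac{\hat\mu}{n} \sum_{i=1}^n \frac{\varphi_{j_0 k}(Y_i)}{g(Y_i)}, \qquad \hat\beta_{jk} := \frac{\hat\mu}{n} \sum_{i=1}^n \frac{\psi_{jk}(Y_i)}{g(Y_i)}, \qquad \hat\mu := \frac{n}{\sum_{i=1}^n \frac{1}{g(Y_i)}}. \]

The hard thresholding wavelet coefficients are $\hat\beta^*_{jk} := \hat\beta_{jk}\, I\{|\hat\beta_{jk}| \ge \lambda\}$, where

\[ I\{|\hat\beta_{jk}| \ge \lambda\} := \begin{cases} 1, & |\hat\beta_{jk}| \ge \lambda, \\ 0, & |\hat\beta_{jk}| < \lambda. \end{cases} \]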
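The construction above can be sketched in a few lines of NumPy. The following is a minimal illustration, not the authors' implementation. It assumes the Haar wavelet (which is not $N+1$ regular and is used only to keep the code short), data supported on $[0,1)$, and levels `j0`, `j1` and a threshold `lam` that are passed in by the caller. All function names are hypothetical.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def haar_psi(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    return haar_phi(2.0 * x) - haar_phi(2.0 * x - 1.0)

def basis(fn, j, pts):
    """Matrix whose k-th row is fn_{jk}(pts) = 2^{j/2} fn(2^j pts - k), k = 0..2^j - 1."""
    ks = np.arange(2 ** j)[:, None]
    return 2.0 ** (j / 2.0) * fn(2.0 ** j * pts[None, :] - ks)

def estimate_density(y, g, j0, j1, lam, x):
    """Evaluate the hard-thresholded estimator (3.1) at the points x.

    y   : biased sample, a 1-D array with values in [0, 1)
    g   : vectorised weight function
    lam : a float threshold, or a function j -> threshold
    """
    inv_g = 1.0 / g(y)
    w = inv_g / np.sum(inv_g)            # equals mu_hat / (n g(Y_i))
    alpha = basis(haar_phi, j0, y) @ w   # empirical scaling coefficients
    f_hat = alpha @ basis(haar_phi, j0, x)
    for j in range(j0, j1 + 1):
        beta = basis(haar_psi, j, y) @ w  # empirical wavelet coefficients
        t = lam(j) if callable(lam) else lam
        beta = np.where(np.abs(beta) >= t, beta, 0.0)   # hard thresholding
        f_hat = f_hat + beta @ basis(haar_psi, j, x)
    return f_hat
```

A possible call is `estimate_density(y, g, 2, 5, 0.05, np.linspace(0.0, 0.99, 100))`. Note that the weights `w` sum to one, so the unknown $\mu$ never enters the computation: it is replaced by $\hat\mu$ automatically.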

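Suppose that the parameters $j_0$, $j_1$, $\lambda$ of the wavelet thresholding estimator (3.1) satisfy the following assumptions:

\[ 2^{j_0} \sim \begin{cases} \big((\ln n)^{\frac{p-r}{r}}\, n\big)^{\frac{1}{2N+1}}, & r > \frac{p}{2N+1}, \\ n^{\frac{1 - 2/p}{2(N - 1/r) + 1}}, & r \le \frac{p}{2N+1}, \end{cases} \tag{3.2} \]

\[ 2^{j_1} \sim \begin{cases} n^{\frac{N}{N'(2N+1)}}, & r > \frac{p}{2N+1}, \\ \big(\frac{n}{\ln n}\big)^{\frac{1}{2(N - 1/r) + 1}}, & r \le \frac{p}{2N+1}, \end{cases} \tag{3.3} \]

\[ \lambda = c\sqrt{\frac{j}{n}}, \tag{3.4} \]

where $c$ is a suitably chosen positive constant and $N' = N - 1/r + 1/p$.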
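For a given sample size, the orders in (3.2) to (3.4) translate into concrete integer levels. The sketch below is one hedged way to do this: rounding down to the nearest integer level and the default `c = 1.0` are assumptions of this example, since the assumptions above fix $2^{j_0}$, $2^{j_1}$ and $\lambda$ only up to constants. The function names are hypothetical.

```python
import math

def resolution_levels(n, N, r, p):
    """Integer levels (j0, j1) with 2^{j0}, 2^{j1} of the orders in (3.2)-(3.3)."""
    d = 2.0 * (N - 1.0 / r) + 1.0
    n_prime = N - 1.0 / r + 1.0 / p
    if r * (2 * N + 1) > p:                       # case r > p / (2N + 1)
        two_j0 = (math.log(n) ** ((p - r) / r) * n) ** (1.0 / (2 * N + 1))
        two_j1 = n ** (N / (n_prime * (2 * N + 1)))
    else:                                         # case r <= p / (2N + 1)
        two_j0 = n ** ((1.0 - 2.0 / p) / d)
        two_j1 = (n / math.log(n)) ** (1.0 / d)
    j0 = max(0, math.floor(math.log2(two_j0)))
    j1 = max(j0, math.floor(math.log2(two_j1)))   # rounding choice: keep j1 >= j0
    return j0, j1

def threshold(j, n, c=1.0):
    """lambda = c * sqrt(j / n) as in (3.4); c = 1.0 is only a placeholder."""
    return c * math.sqrt(j / n)

levels = resolution_levels(10000, 1, 2, 4)
```

For moderate $n$ the two levels can coincide after rounding, in which case the estimator uses a single detail level.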

Lemma 3.1 Suppose that there exist two constants $g_1$ and $g_2$ such that $0 < g_1 \le g(x) \le g_2 < \infty$ for $x \in \mathbb{R}$. Let $\alpha_{jk}$, $\beta_{jk}$ be the coefficients in the expansion (2.1) and let $\hat\alpha_{jk}$, $\hat\beta_{jk}$ be the estimators defined in (3.1). If $2^j \le n$, then for any $1 \le p < \infty$, we have

  (i) $E|\alpha_{jk} - \hat\alpha_{jk}|^p \lesssim n^{-p/2}$;

  (ii) $E|\beta_{jk} - \hat\beta_{jk}|^p \lesssim n^{-p/2}$.

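Proof (i) From the definition of $\hat\alpha_{jk}$ and the triangle inequality, we have

\[ \begin{aligned} |\hat\alpha_{j,k} - \alpha_{j,k}| &= \Big| \frac{\hat\mu}{n} \sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \Big| \\ &= \Big| \frac{\hat\mu}{\mu} \Big( \frac{\mu}{n} \sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \Big) + \hat\mu\,\alpha_{j,k} \Big( \frac1\mu - \frac1{\hat\mu} \Big) \Big| \\ &\le \Big| \frac{\hat\mu}{\mu} \Big| \, \Big| \frac{\mu}{n} \sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \Big| + |\hat\mu\,\alpha_{j,k}| \, \Big| \frac1{\hat\mu} - \frac1\mu \Big|. \end{aligned} \]

Since $g_1 \le g(y) \le g_2$, we have

\[ \hat\mu = \frac{n}{\sum_{i=1}^n \frac{1}{g(Y_i)}} \le g_2, \qquad \mu = E g(X) \ge g_1, \]

and

\[ |\alpha_{j,k}| \le \int |f_X(y)|\,|\varphi_{j,k}(y)|\,dy \le \Big( \int |f_X(y)|^2\,dy \Big)^{1/2} \Big( \int |\varphi_{j,k}(y)|^2\,dy \Big)^{1/2} \le A^{1/2} \|f\|_\infty. \]

Furthermore, by the embedding $W_r^N \hookrightarrow B^N_{r\infty} \hookrightarrow B^{N-1/r}_{\infty\infty}$ of Lemma 2.1, valid for any integer $N > 1/r$, we have $\|f\|_\infty \le \|f\|_{B^{N-1/r}_{\infty\infty}} \le \|f\|_{W_r^N} =: c$. Therefore, by the convexity inequality, we get

\[ \begin{aligned} E|\hat\alpha_{j,k} - \alpha_{j,k}|^p &\le E\Big( \frac{g_2}{g_1} \Big| \frac{\mu}{n} \sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \Big| + g_2 A^{1/2} c\, \Big| \frac1{\hat\mu} - \frac1\mu \Big| \Big)^p \\ &\le 2^{p-1} \max\Big\{ \frac{g_2}{g_1},\ g_2 A^{1/2} c \Big\}^p E\Big( \Big| \frac{\mu}{n} \sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \Big|^p + \Big| \frac1{\hat\mu} - \frac1\mu \Big|^p \Big) \\ &\lesssim E\Big| \frac{\mu}{n} \sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \Big|^p + E\Big| \frac1{\hat\mu} - \frac1\mu \Big|^p =: T_1 + T_2, \end{aligned} \]

where $T_1 := E\big| \frac{\mu}{n} \sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \big|^p$ and $T_2 := E\big| \frac1{\hat\mu} - \frac1\mu \big|^p$.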

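The terms $T_1$ and $T_2$ are estimated as follows. Firstly, let $\xi_i := \mu \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}$. The $\xi_i$ are i.i.d. and $E(\xi_i) = 0$, since by (1.1) $E\big(\mu \frac{\varphi_{j,k}(Y_i)}{g(Y_i)}\big) = \int \varphi_{j,k}(y) f_X(y)\,dy = \alpha_{j,k}$. Moreover, for any $m \ge 2$,

\[ E|\xi_i|^m = E\Big| \mu \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k} \Big|^m \le 2^{m-1} \Big( E\Big| \mu \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} \Big|^m + |\alpha_{j,k}|^m \Big), \]

where

\[ E\Big| \frac{\mu \varphi_{j,k}(Y_i)}{g(Y_i)} \Big|^m = E\Big( \Big| \frac{\mu^{m-1} \varphi_{j,k}^{m-2}(Y_i)}{g(Y_i)^{m-1}} \Big| \, \Big| \frac{\mu \varphi_{j,k}^2(Y_i)}{g(Y_i)} \Big| \Big) \le g_2^{m-1} g_1^{-m+1}\, 2^{\frac{j}{2}(m-2)}\, \|\varphi\|_\infty^{m-2}\, E\Big| \frac{\mu \varphi_{j,k}^2(Y_i)}{g(Y_i)} \Big|, \]

and

\[ E\Big| \frac{\mu \varphi_{j,k}^2(Y_i)}{g(Y_i)} \Big| = \int \frac{\mu \varphi_{j,k}^2(y)}{g(y)} f_Y(y)\,dy = \int \frac{\mu \varphi_{j,k}^2(y)}{g(y)} \cdot \frac{g(y) f_X(y)}{\mu}\,dy \le \|f\|_\infty \le \|f\|_{W_r^N} = c. \]

So, we have

\[ E\Big| \frac{\mu \varphi_{j,k}(Y_i)}{g(Y_i)} \Big|^m \le g_2^{m-1} g_1^{-m+1}\, 2^{\frac{j}{2}(m-2)}\, \|\varphi\|_\infty^{m-2}\, c. \]

Since $2^j \le n$, we obtain

\[ E|\xi_i|^m \le C\, 2^{j(m-2)/2} \lesssim n^{(m-2)/2}. \]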

By Rosenthal's inequality, we have

\[ T_1 = E\Big| \frac1n \sum_{i=1}^n \xi_i \Big|^p \lesssim n^{-p} \Big( n E|\xi_i|^p + n^{p/2} (E|\xi_i|^2)^{p/2} \Big) \lesssim n^{-p/2}. \tag{3.5} \]

To estimate the term $T_2$, let $\eta_i = \frac{1}{g(Y_i)} - \frac{1}{\mu}$. One easily computes $E(\eta_i) = 0$, and for any $m \ge 2$, $E|\eta_i|^m \le C$.

If $p \ge 2$, so that $1 - p \le -p/2$, Rosenthal's inequality gives

\[ T_2 = E\Big| \frac1n \sum_{i=1}^n \eta_i \Big|^p \lesssim n^{-p} \Big( n E|\eta_i|^p + n^{p/2} (E|\eta_i|^2)^{p/2} \Big) \lesssim n^{-p+1} + n^{-p/2} \lesssim n^{-p/2}. \tag{3.6} \]

If $1 \le p < 2$, we get

\[ T_2 = E\Big| \frac1n \sum_{i=1}^n \eta_i \Big|^p \le n^{-p} \Big( n^{p/2} (E|\eta_i|^2)^{p/2} \Big) \lesssim n^{-p/2}. \tag{3.7} \]

By (3.5), (3.6) and (3.7), we obtain

\[ E|\hat\alpha_{j,k} - \alpha_{j,k}|^p \lesssim T_1 + T_2 \lesssim n^{-p/2}. \]

(ii) The proof is similar to that of (i), so we omit it. □

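Lemma 3.2 If $j 2^j \le n$, then for any $\omega > 0$, there exists a constant $c > 0$ such that

\[ P\big( |\hat\beta_{jk} - \beta_{jk}| > \lambda \big) \lesssim 2^{-\omega j}, \qquad \lambda = c\sqrt{\frac{j}{n}}. \tag{3.8} \]

Proof We can easily get

\[ \hat\mu \le g_2, \qquad \mu \ge g_1, \qquad \frac1\mu \le g_1^{-1}, \qquad |\beta_{j,k}| \le A^{1/2} \|f\|_\infty \le A^{1/2} \|f\|_{W_r^N}. \]

Therefore,

\[ \begin{aligned} |\hat\beta_{j,k} - \beta_{j,k}| &= \Big| \frac{\hat\mu}{\mu} \Big( \frac{\mu}{n} \sum_{i=1}^n \frac{\psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k} \Big) + \hat\mu\,\beta_{j,k} \Big( \frac1\mu - \frac1{\hat\mu} \Big) \Big| \\ &\le \frac{g_2}{g_1} \Big| \frac1n \sum_{i=1}^n \Big( \frac{\mu \psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k} \Big) \Big| + g_2 A^{1/2} \|f_X\|_{W_r^N} \Big| \frac1n \sum_{i=1}^n \Big( \frac{1}{g(Y_i)} - \frac1\mu \Big) \Big| \\ &=: \frac{g_2}{g_1} \Big| \frac1n \sum_{i=1}^n \xi_i \Big| + g_2 A^{1/2} \|f_X\|_{W_r^N} \Big| \frac1n \sum_{i=1}^n \eta_i \Big|, \end{aligned} \]

where $\xi_i = \frac{\mu \psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k}$ and $\eta_i = \frac{1}{g(Y_i)} - \frac1\mu$. So, we get

\[ \begin{aligned} P\big( |\hat\beta_{j,k} - \beta_{j,k}| > \lambda \big) &\le P\Big( \frac{g_2}{g_1} \Big| \frac1n \sum_{i=1}^n \xi_i \Big| + g_2 A^{1/2} \|f_X\|_{W_r^N} \Big| \frac1n \sum_{i=1}^n \eta_i \Big| > \lambda \Big) \\ &\le P\Big( \frac{g_2}{g_1} \Big| \frac1n \sum_{i=1}^n \xi_i \Big| > \frac\lambda2 \Big) + P\Big( g_2 A^{1/2} \|f_X\|_{W_r^N} \Big| \frac1n \sum_{i=1}^n \eta_i \Big| > \frac\lambda2 \Big) \\ &= P\Big( \Big| \frac1n \sum_{i=1}^n \xi_i \Big| > \frac{\lambda g_1}{2 g_2} \Big) + P\Big( \Big| \frac1n \sum_{i=1}^n \eta_i \Big| > \frac{\lambda}{2 g_2 A^{1/2} \|f_X\|_{W_r^N}} \Big) =: P_1 + P_2, \end{aligned} \]

where $P_1 := P\big( |\frac1n \sum_{i=1}^n \xi_i| > \frac{\lambda g_1}{2 g_2} \big)$ and $P_2 := P\big( |\frac1n \sum_{i=1}^n \eta_i| > \frac{\lambda}{2 g_2 A^{1/2} \|f_X\|_{W_r^N}} \big)$.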

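Now, we estimate $P_1$. Clearly, $E\xi_i = 0$, and

\[ \begin{aligned} E\xi_i^2 &= E\Big( \frac{\mu \psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k} \Big)^2 \le 2\Big( E\Big| \frac{\mu \psi_{j,k}(Y_i)}{g(Y_i)} \Big|^2 + \beta_{j,k}^2 \Big) \\ &= 2\Big( E\Big( \Big| \frac{\mu \psi_{j,k}^2(Y_i)}{g(Y_i)} \Big| \frac{\mu}{g(Y_i)} \Big) + \beta_{j,k}^2 \Big) \le 2\Big( \frac{g_2}{g_1} \|f_X\|_{W_r^N} + A \|f_X\|_{W_r^N}^2 \Big) =: \sigma^2. \end{aligned} \]

Furthermore, we have

\[ |\xi_i| = \Big| \frac{\mu \psi_{j,k}(Y_i)}{g(Y_i)} - E\Big( \frac{\mu \psi_{j,k}(Y_i)}{g(Y_i)} \Big) \Big| \le 2 \cdot 2^{j/2} g_2 g_1^{-1} \|\psi\|_\infty. \]

By Bernstein's inequality, we obtain

\[ \begin{aligned} P\Big( \Big| \frac1n \sum_{i=1}^n \xi_i \Big| > \frac{\lambda g_1}{2 g_2} \Big) &\le 2\exp\Big( -\frac{n \lambda^2 g_1^2 / (4 g_2^2)}{2\big( \sigma^2 + \frac{\lambda g_1}{2 g_2} \cdot 2 \cdot 2^{j/2} g_2 g_1^{-1} \|\psi\|_\infty / 3 \big)} \Big) \\ &= 2\exp\Big( -\frac{n \cdot c^2 \frac{j}{n} \cdot g_1^2 / (4 g_2^2)}{2\big( \sigma^2 + \frac{g_1}{g_2}\, c \sqrt{j/n}\; 2^{j/2} g_2 g_1^{-1} \|\psi\|_\infty / 3 \big)} \Big) \\ &= 2\exp\Big( -\frac{c^2 j\, g_1^2 / (4 g_2^2)}{2\big( \sigma^2 + \sqrt{j 2^j / n}\, \|\psi\|_\infty c / 3 \big)} \Big). \end{aligned} \]

Since $j 2^j \le n$, we get

\[ P\Big( \Big| \frac1n \sum_{i=1}^n \xi_i \Big| > \frac{\lambda g_1}{2 g_2} \Big) \le 2\exp\Big( -\frac{c^2 j\, g_1^2 / (4 g_2^2)}{2( \sigma^2 + \|\psi\|_\infty c / 3 )} \Big) = 2\exp\Big( -\frac{c^2 g_1^2 / (4 g_2^2)}{2( \sigma^2 + \|\psi\|_\infty c / 3 )}\, j \Big). \]

Taking $c_1 > 0$ such that $\frac{c_1^2 g_1^2 / (4 g_2^2)}{2( \sigma^2 + \|\psi\|_\infty c_1 / 3 )} \ge \omega$, we have

\[ P_1 = P\Big( \Big| \frac1n \sum_{i=1}^n \xi_i \Big| > \frac{\lambda g_1}{2 g_2} \Big) \le 2 e^{-\omega j} \lesssim 2^{-\omega j}. \tag{3.9} \]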

Next, we estimate $P_2$. We have $E\eta_i = 0$; indeed,

\[ E\eta_i = E\Big( \frac{1}{g(Y_i)} \Big) - \frac1\mu = \int \frac{1}{g(y)} f_Y(y)\,dy - \frac1\mu = \int \frac{1}{g(y)} \cdot \frac{g(y) f_X(y)}{\mu}\,dy - \frac1\mu = \frac1\mu \int f_X(y)\,dy - \frac1\mu = 0, \]

and

\[ E\eta_i^2 = E\Big( \frac{1}{g(Y_i)} - \frac1\mu \Big)^2 \le 2\Big( E\Big| \frac{1}{g(Y_i)} \Big|^2 + \frac{1}{\mu^2} \Big) \le \frac{4}{g_1^2}, \qquad |\eta_i| = \Big| \frac{1}{g(Y_i)} - \frac1\mu \Big| = \Big| \frac{1}{g(Y_i)} - E\frac{1}{g(Y_i)} \Big| \le 2 g_1^{-1}. \]

By Bernstein's inequality, we obtain

\[ \begin{aligned} P\Big( \Big| \frac1n \sum_{i=1}^n \eta_i \Big| > \frac{\lambda}{2 g_2 A^{1/2} \|f_X\|_{W_r^N}} \Big) &\le 2\exp\Big( -\frac{n \big( \lambda / (2 g_2 A^{1/2} \|f_X\|_{W_r^N}) \big)^2}{2\big( \frac{4}{g_1^2} + \lambda g_1^{-1} / (3 g_2 A^{1/2} \|f_X\|_{W_r^N}) \big)} \Big) \\ &= 2\exp\Big( -\frac{n c^2 j / (4 n g_2^2 A \|f_X\|_{W_r^N}^2)}{2\big( \frac{4}{g_1^2} + c \sqrt{j/n} / (3 g_1 g_2 A^{1/2} \|f_X\|_{W_r^N}) \big)} \Big). \end{aligned} \]

Since $j \le n$, we get

\[ P\Big( \Big| \frac1n \sum_{i=1}^n \eta_i \Big| > \frac{\lambda}{2 g_2 A^{1/2} \|f_X\|_{W_r^N}} \Big) \le 2\exp\Big( -\frac{c^2 / (4 g_2^2 A \|f_X\|_{W_r^N}^2)}{2\big( \frac{4}{g_1^2} + c / (3 g_1 g_2 A^{1/2} \|f_X\|_{W_r^N}) \big)}\, j \Big). \]

Taking $c_2 > 0$ such that $\frac{c_2^2 / (4 g_2^2 A \|f_X\|_{W_r^N}^2)}{2\big( \frac{4}{g_1^2} + c_2 / (3 g_1 g_2 A^{1/2} \|f_X\|_{W_r^N}) \big)} \ge \omega$, we have

\[ P_2 = P\Big( \Big| \frac1n \sum_{i=1}^n \eta_i \Big| > \frac{\lambda}{2 g_2 A^{1/2} \|f_X\|_{W_r^N}} \Big) \le 2 e^{-\omega j} \lesssim 2^{-\omega j}. \tag{3.10} \]

Taking $c = \max\{c_1, c_2\}$, by (3.9) and (3.10), we have

\[ P\big( |\hat\beta_{j,k} - \beta_{j,k}| > \lambda \big) \le P_1 + P_2 \lesssim 2^{-\omega j}. \]

 □
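The condition on $c_1$ in the proof above is a quadratic inequality in the constant, so the smallest admissible value can be written in closed form. The sketch below solves it for the $P_1$ term only. The shorthand $a = g_1^2/(4 g_2^2)$ and $b = \|\psi\|_\infty/3$, the function names, and the numerical inputs are assumptions made for illustration; the paper itself only asserts that a suitable constant exists.

```python
import math

def bernstein_exponent(c, g1, g2, sigma2, psi_sup):
    """Factor multiplying j in the exponent of the bound for P_1:
    (c^2 g1^2 / (4 g2^2)) / (2 (sigma2 + psi_sup * c / 3))."""
    a = g1 ** 2 / (4.0 * g2 ** 2)
    b = psi_sup / 3.0
    return a * c ** 2 / (2.0 * (sigma2 + b * c))

def min_constant(omega, g1, g2, sigma2, psi_sup):
    """Smallest c > 0 with bernstein_exponent(c, ...) >= omega.

    The condition is a c^2 - 2 omega b c - 2 omega sigma2 >= 0; we return
    the positive root of the corresponding quadratic equation.
    """
    a = g1 ** 2 / (4.0 * g2 ** 2)
    b = psi_sup / 3.0
    disc = omega ** 2 * b ** 2 + 2.0 * a * omega * sigma2
    return (omega * b + math.sqrt(disc)) / a

# Hypothetical inputs: g1 = 1, g2 = 2, sigma^2 = 1, sup|psi| = 1, omega = 2.
c1 = min_constant(2.0, 1.0, 2.0, 1.0, 1.0)
```

Because the exponent is increasing in $c$ for $c > 0$, any constant larger than `c1` also satisfies the condition, which is what makes the choice $c = \max\{c_1, c_2\}$ legitimate.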

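Lemma 3.3 Suppose that there exist two constants $g_1$ and $g_2$ such that $0 < g_1 \le g(x) \le g_2 < \infty$ for $x \in \mathbb{R}$, and that $\hat\beta_{jk}$, $\hat\beta^*_{jk}$ are given by (3.1). Then

\[ E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4} \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases} \]

where $N' = N - 1/r + 1/p$ and $c_3$, $c_4$ are constants.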

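Proof By Lemma 2.2, we obtain

\[ E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p \le E \sum_{j=j_0}^{j_1} \Big\| \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p \lesssim E \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta^*_{jk} - \beta_{jk}|^p \Big)^{\frac1p}. \]

Furthermore, since $\hat\beta^*_{jk} = \hat\beta_{jk}\, I\{|\hat\beta_{jk}| > \lambda\}$, we have

\[ \begin{aligned} E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p \lesssim E\Big( & \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta_{jk} - \beta_{jk}|^p \big( I\{|\hat\beta_{jk}| > \lambda, |\beta_{jk}| \ge \tfrac\lambda2\} + I\{|\hat\beta_{jk}| > \lambda, |\beta_{jk}| < \tfrac\lambda2\} \big) \Big)^{\frac1p} \\ & + \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\beta_{jk}|^p \big( I\{|\hat\beta_{jk}| \le \lambda, |\beta_{jk}| \le 2\lambda\} + I\{|\hat\beta_{jk}| \le \lambda, |\beta_{jk}| > 2\lambda\} \big) \Big)^{\frac1p} \Big). \end{aligned} \]

Note that

\[ I\{|\hat\beta_{jk}| > \lambda, |\beta_{jk}| < \tfrac\lambda2\} \le I\{|\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2\}, \qquad I\{|\hat\beta_{jk}| \le \lambda, |\beta_{jk}| > 2\lambda\} \le I\{|\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2\}, \]

and if $|\hat\beta_{jk}| \le \lambda$ and $|\beta_{jk}| > 2\lambda$, we get $|\hat\beta_{jk} - \beta_{jk}| \ge |\beta_{jk}| - |\hat\beta_{jk}| > \frac{|\beta_{jk}|}{2}$, i.e., $|\beta_{jk}| < 2|\hat\beta_{jk} - \beta_{jk}|$. Therefore, we have

\[ \begin{aligned} E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p \lesssim{} & E \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta_{jk} - \beta_{jk}|^p I\{|\beta_{jk}| \ge \tfrac\lambda2\} \Big)^{\frac1p} \\ & + E \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta_{jk} - \beta_{jk}|^p I\{|\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2\} \Big)^{\frac1p} \\ & + \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\beta_{jk}|^p I\{|\beta_{jk}| \le 2\lambda\} \Big)^{\frac1p} =: W_1 + W_2 + W_3, \end{aligned} \]

where

\[ \begin{aligned} W_1 &:= E \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta_{jk} - \beta_{jk}|^p I\{|\beta_{jk}| \ge \tfrac\lambda2\} \Big)^{\frac1p}, \\ W_2 &:= E \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta_{jk} - \beta_{jk}|^p I\{|\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2\} \Big)^{\frac1p}, \\ W_3 &:= \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\beta_{jk}|^p I\{|\beta_{jk}| \le 2\lambda\} \Big)^{\frac1p}. \end{aligned} \]

(i) Firstly, we estimate $W_1$.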

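By Lemma 3.1, we have

\[ E|\hat\beta_{jk} - \beta_{jk}|^p \lesssim n^{-\frac p2}. \]

Using $I\{|\beta_{jk}| \ge \frac\lambda2\} \le \big( \frac{|\beta_{jk}|}{\lambda/2} \big)^r$ and Jensen's inequality, we obtain

\[ \begin{aligned} W_1 &= E \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta_{jk} - \beta_{jk}|^p I\{|\beta_{jk}| \ge \tfrac\lambda2\} \Big)^{\frac1p} \\ &\le \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k E|\hat\beta_{jk} - \beta_{jk}|^p I\{|\beta_{jk}| \ge \tfrac\lambda2\} \Big)^{\frac1p} \\ &\lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k n^{-\frac p2} \Big( \frac{|\beta_{jk}|}{\lambda/2} \Big)^r \Big)^{\frac1p} \\ &\lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)}\, n^{-\frac12}\, \lambda^{-\frac rp}\, \|\beta_{j\cdot}\|_r^{\frac rp}. \end{aligned} \]

By $\|\beta_{j\cdot}\|_r \lesssim 2^{-j(N + \frac12 - \frac1r)}$ and $\lambda \sim c\sqrt{\frac{\ln n}{n}}$, we have

\[ \begin{aligned} W_1 &\lesssim \sum_{j=j_0}^{j_1} n^{-\frac12}\, 2^{j(\frac12 - \frac1p)}\, 2^{-j(N + \frac12 - \frac1r)\frac rp} \Big( \frac{n}{\ln n} \Big)^{\frac{r}{2p}} = n^{-\frac12} \Big( \frac{n}{\ln n} \Big)^{\frac{r}{2p}} \sum_{j=j_0}^{j_1} 2^{-j\xi} \\ &\lesssim n^{\frac{r-p}{2p}} (\ln n)^{-\frac{r}{2p}} \Big( 2^{-j_0 \xi} I\{\xi > 0\} + 2^{-j_1 \xi} I\{\xi < 0\} + (j_1 - j_0 + 1) I\{\xi = 0\} \Big), \end{aligned} \]

where $\xi := \frac12 \big( \frac rp (2N+1) - 1 \big)$.

Using Lemma 3.1 and (3.4), we obtain

\[ W_1 \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4} \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases} \tag{3.11} \]

where $c_3$, $c_4$ are constants.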

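(ii) For

\[ W_3 := \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\beta_{jk}|^p I\{|\beta_{jk}| \le 2\lambda\} \Big)^{\frac1p}, \]

let $\xi := \frac12 \big( \frac rp (2N+1) - 1 \big)$. By $I\{|\beta_{jk}| \le 2\lambda\} \le \big( \frac{2\lambda}{|\beta_{jk}|} \big)^{p-r}$ (recall that $r < p$), we have

\[ W_3 \lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\beta_{jk}|^p \Big( \frac{2\lambda}{|\beta_{jk}|} \Big)^{p-r} \Big)^{\frac1p} = \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} (2\lambda)^{\frac{p-r}{p}} \Big( \sum_k |\beta_{jk}|^r \Big)^{\frac1p} = \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} (2\lambda)^{\frac{p-r}{p}} \|\beta_{j\cdot}\|_r^{\frac rp}. \]

Since $f_X \in W_r^N(\mathbb{R})$, we have $\|\beta_{j\cdot}\|_r \lesssim 2^{-j(N + \frac12 - \frac1r)}$. Taking $\lambda = c\sqrt{\frac jn} \sim c\sqrt{\frac{\ln n}{n}}$ and $j_1 - j_0 \sim C \ln n$, we have

\[ \begin{aligned} W_3 &\lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)}\, \lambda^{\frac{p-r}{p}}\, 2^{-j(N + \frac12 - \frac1r)\frac rp} \lesssim \Big( \frac{\ln n}{n} \Big)^{\frac{p-r}{2p}} \sum_{j=j_0}^{j_1} 2^{-j \frac12 [\frac rp (2N+1) - 1]} = \Big( \frac{\ln n}{n} \Big)^{\frac{p-r}{2p}} \sum_{j=j_0}^{j_1} 2^{-j\xi} \\ &\lesssim \Big( \frac{\ln n}{n} \Big)^{\frac{p-r}{2p}} \Big( 2^{-j_0 \xi} I\{\xi > 0\} + 2^{-j_1 \xi} I\{\xi < 0\} + (j_1 - j_0 + 1) I\{\xi = 0\} \Big). \end{aligned} \]

Note that $\xi > 0$ if and only if $r > \frac{p}{2N+1}$. When $\xi = 0$, i.e., $p = r(2N+1)$, we can compute $\frac{N'}{2(N - \frac1r) + 1} = \frac{p-r}{2p}$. Using (3.2), (3.3), we obtain

\[ W_3 \lesssim \begin{cases} \big( \frac{\ln n}{n} \big)^{\frac{p-r}{2p}} 2^{-j_0 \xi} = (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{p-r}{2p}} (j_1 - j_0 + 1) \lesssim (\ln n) \big( \frac{\ln n}{n} \big)^{\frac{p-r}{2p}}, & r = \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{p-r}{2p}} 2^{-j_1 \xi} = \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}. \end{cases} \tag{3.12} \]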
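The identity used in the boundary case $\xi = 0$ of the estimate for $W_3$, namely $\frac{N'}{2(N-1/r)+1} = \frac{p-r}{2p}$ when $p = r(2N+1)$, can be confirmed in exact rational arithmetic. The sketch below is an added check rather than part of the proof; the function names are hypothetical, and the observation that both sides equal $N/(2N+1)$ is a direct computation under $p = r(2N+1)$.

```python
from fractions import Fraction

def sparse_exponent(N, r, p):
    """N' / (2 (N - 1/r) + 1) with N' = N - 1/r + 1/p, computed exactly."""
    N, r, p = Fraction(N), Fraction(r), Fraction(p)
    n_prime = N - 1 / r + 1 / p
    return n_prime / (2 * (N - 1 / r) + 1)

def boundary_values(N, r):
    """At p = r (2N + 1), return the three quantities that should coincide."""
    p = r * (2 * N + 1)
    return (
        sparse_exponent(N, r, p),   # N' / (2 (N - 1/r) + 1)
        Fraction(p - r, 2 * p),     # (p - r) / (2 p)
        Fraction(N, 2 * N + 1),     # N / (2N + 1), the exponent in the case r > p/(2N+1)
    )

values = boundary_values(1, 2)      # N = 1, r = 2, hence p = 6
```

In particular the exponents of $n$ in the three cases of the bound agree at the boundary $r = p/(2N+1)$; only the logarithmic factors differ.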
(iii) Finally, we estimate

\[ W_2 := E \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k |\hat\beta_{jk} - \beta_{jk}|^p I\{|\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2\} \Big)^{\frac1p}. \]

Let $1 < q', q < \infty$ with $\frac1q + \frac1{q'} = 1$. Using Jensen's inequality and Hölder's inequality, we have

\[ \begin{aligned} W_2 &\le \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k E\big( |\hat\beta_{jk} - \beta_{jk}|^p I\{|\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2\} \big) \Big)^{\frac1p} \\ &\le \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k \big( E|\hat\beta_{jk} - \beta_{jk}|^{qp} \big)^{\frac1q} \big( E I^{q'}\{|\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2\} \big)^{\frac1{q'}} \Big)^{\frac1p} \\ &= \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( \sum_k \big( E|\hat\beta_{jk} - \beta_{jk}|^{qp} \big)^{\frac1q} \big( P( |\hat\beta_{jk} - \beta_{jk}| > \tfrac\lambda2 ) \big)^{\frac1{q'}} \Big)^{\frac1p}. \end{aligned} \]

By Lemma 3.1 and Lemma 3.2, and since the number of nonzero terms in the sum over $k$ is $O(2^j)$, we obtain

\[ W_2 \lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac1p)} \Big( 2^j n^{-\frac p2}\, 2^{-\frac{\omega j}{q'}} \Big)^{\frac1p} = n^{-\frac12} \sum_{j=j_0}^{j_1} 2^{j(\frac12 - \frac{\omega}{p q'})}. \]

Taking $\omega$ large enough that $\frac12 < \frac{\omega}{p q'}$, we get

\[ W_2 \lesssim n^{-\frac12}\, 2^{j_0(\frac12 - \frac{\omega}{p q'})} \le \sqrt{\frac{2^{j_0}}{n}}. \]

Taking $2^{j_0}$ as in (3.2), we have

\[ W_2 \lesssim \begin{cases} (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ n^{-\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}. \end{cases} \tag{3.13} \]

Putting (3.11), (3.12) and (3.13) together, we obtain

\[ E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4} \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases} \]

where $c_3$, $c_4$ are constants. □

Theorem 3.4 Let the scaling function $\varphi(x)$ be orthonormal, compactly supported and $N+1$ regular, and suppose that there exist two positive constants $g_1$ and $g_2$ such that $g_1 \le g(x) \le g_2$ for $x \in \mathbb{R}$. If $\hat f^{\mathrm{non}}_{n,X}$ is the nonlinear wavelet estimator in (3.1), and assumptions (3.2), (3.3) and (3.4) are satisfied, then for $1 \le r < p < \infty$ and $N > \frac1r$, we have

\[ \sup_{f_X \in \tilde W_r^N(A,L)} E\big\| \hat f^{\mathrm{non}}_{n,X} - f_X \big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4} \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases} \]

where $N' = N - 1/r + 1/p$ and $c_3$, $c_4$ are constants.
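To see how the three cases of the theorem depend on $(N, r, p)$, the following sketch classifies a parameter triple and returns the exponent $a$ such that the bound behaves like $n^{-a}$ up to logarithmic factors. The regime labels "dense", "boundary" and "sparse" are conventional names assumed here for readability and are not used in the paper; the function names are hypothetical as well.

```python
def regime(N, r, p):
    """Classify (N, r, p) by comparing r with p / (2N + 1).

    The comparison is done as r (2N + 1) versus p to avoid a division.
    """
    lhs = r * (2 * N + 1)
    if lhs > p:
        return "dense"       # r > p / (2N + 1)
    if lhs == p:
        return "boundary"    # r = p / (2N + 1)
    return "sparse"          # r < p / (2N + 1)

def rate_exponent(N, r, p):
    """Exponent a with upper bound of order n^{-a}, ignoring powers of ln n."""
    if regime(N, r, p) == "dense":
        return N / (2 * N + 1)
    n_prime = N - 1 / r + 1 / p
    return n_prime / (2 * (N - 1 / r) + 1)

example = (regime(1, 2, 4), rate_exponent(1, 2, 4))
```

For instance, with $N = 1$, $r = 2$ and $p = 4$ we have $r > p/(2N+1) = 4/3$, so the first case of the theorem applies with exponent $N/(2N+1) = 1/3$.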

Proof By the definition of $\hat f^{\mathrm{non}}_{n,X}$ in (3.1) and the expansion of $f_X$ in (2.1), one has

\[ \hat f^{\mathrm{non}}_{n,X} - f_X = \sum_k (\hat\alpha_{j_0 k} - \alpha_{j_0 k}) \varphi_{j_0 k} + \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} + P_{j_1+1} f_X - f_X. \]

Then

\[ E\big\| \hat f^{\mathrm{non}}_{n,X} - f_X \big\|_p \le E\Big\| \sum_k (\hat\alpha_{j_0 k} - \alpha_{j_0 k}) \varphi_{j_0 k} \Big\|_p + E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p + \| P_{j_1+1} f_X - f_X \|_p =: I_1 + I_2 + I_3, \]

where

\[ I_1 := E\Big\| \sum_k (\hat\alpha_{j_0 k} - \alpha_{j_0 k}) \varphi_{j_0 k} \Big\|_p, \qquad I_2 := E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p, \qquad I_3 := \| P_{j_1+1} f_X - f_X \|_p. \]

Firstly, we estimate $I_1$. By Lemma 2.2 and Jensen's inequality,

\[ I_1 \lesssim 2^{j_0(\frac12 - \frac1p)} E\Big( \sum_k |\hat\alpha_{j_0 k} - \alpha_{j_0 k}|^p \Big)^{\frac1p} \le 2^{j_0(\frac12 - \frac1p)} \Big( \sum_k E|\hat\alpha_{j_0 k} - \alpha_{j_0 k}|^p \Big)^{\frac1p}. \]

Since $f_X(x)$ and $\varphi(x)$ are compactly supported, the number of elements in $\{k : \alpha_{j_0 k} \ne 0\}$ is $O(2^{j_0})$. By Lemma 3.1, we have $E|\hat\alpha_{j_0 k} - \alpha_{j_0 k}|^p \lesssim n^{-\frac p2}$.

Therefore

\[ I_1 \lesssim 2^{j_0(\frac12 - \frac1p)} \Big( 2^{j_0} n^{-\frac p2} \Big)^{\frac1p} = \sqrt{\frac{2^{j_0}}{n}}. \]

Using (3.2), we have

\[ I_1 \lesssim \begin{cases} (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ n^{-\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}, \end{cases} \tag{3.14} \]

where $N' = N - \frac1r + \frac1p$.
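As a check on (3.14), the exponents can be worked out directly by substituting (3.2) into the bound $I_1 \lesssim \sqrt{2^{j_0}/n}$. The following short computation is a routine verification added for the reader and is not part of the original argument.

```latex
% Case r > p/(2N+1): 2^{j_0} \sim ((\ln n)^{(p-r)/r} n)^{1/(2N+1)}
\sqrt{\frac{2^{j_0}}{n}}
  \sim (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{\frac12\left(\frac{1}{2N+1}-1\right)}
  = (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{-\frac{N}{2N+1}}.

% Case r \le p/(2N+1): 2^{j_0} \sim n^{(1-2/p)/(2(N-1/r)+1)}
\sqrt{\frac{2^{j_0}}{n}}
  \sim n^{\frac12\left(\frac{1-2/p}{2(N-1/r)+1}-1\right)}
  = n^{-\frac{N-1/r+1/p}{2(N-1/r)+1}}
  = n^{-\frac{N'}{2(N-1/r)+1}}.
```

In the second case the simplification uses $1 - \frac2p - \big(2(N - \frac1r) + 1\big) = -2\big(N - \frac1r + \frac1p\big) = -2N'$.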

Next, we estimate

\[ I_3 := \| P_{j_1+1} f_X - f_X \|_p. \]

It is shown in [9] that if the scaling function $\varphi(x)$ is orthonormal, compactly supported and $N+1$ regular, then the associated kernel function $K(x,y) := \sum_k \varphi(x-k)\varphi(y-k)$ satisfies Conditions H(N+1) and M(N), and $K_j f(x) = P_j f(x)$.

By the embedding $\tilde W_r^N \hookrightarrow \tilde B^N_{r\infty} \hookrightarrow \tilde B^{N'}_{p\infty}$, where $N' = N - \frac1r + \frac1p$, we have $f_X \in \tilde B^{N'}_{p\infty}$. By Lemma 2.3, we have

\[ \| P_{j_1+1} f_X - f_X \|_p \lesssim 2^{-j_1 N'}. \]

Taking $2^{j_1}$ as in (3.3), we have

\[ I_3 \lesssim \begin{cases} n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}. \end{cases} \tag{3.15} \]

Finally, we estimate

\[ I_2 := E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p. \]

Using Lemma 3.3, we obtain

\[ E\Big\| \sum_{j=j_0}^{j_1} \sum_k (\hat\beta^*_{jk} - \beta_{jk}) \psi_{jk} \Big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4} \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}. \end{cases} \tag{3.16} \]

By (3.14), (3.15) and (3.16), we obtain

\[ \sup_{f_X \in \tilde W_r^N(A,L)} E\big\| \hat f^{\mathrm{non}}_{n,X} - f_X \big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4} \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}. \end{cases} \]

 □

4 Optimality

Now, we discuss the optimality of the rates of convergence. Using techniques similar to those in [10], we can obtain the following lower bound theorem.

Theorem 4.1 Let the scaling function $\varphi(x)$ be orthonormal, compactly supported and $N+1$ regular, and let $f_X \in \tilde W_r^N(A,L)$. If there exist two positive constants $g_1$ and $g_2$ such that $g_1 \le g(x) \le g_2$ for $x \in \mathbb{R}$, then

\[ \inf_{\hat f_{n,X}} \sup_{f_X \in \tilde W_r^N(A,L)} E\big\| \hat f_{n,X} - f_X \big\|_p \gtrsim \begin{cases} n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ \big( \frac{\ln n}{n} \big)^{\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}, \end{cases} \]

where the infimum is taken over all estimators $\hat f_{n,X}$, $1 \le r, p < \infty$ and $N > \frac1r$.

Remark The proof is very similar to that in [10], in which the author studied lower bounds for the convergence rates in Besov spaces for samples without bias.

Comparing Theorem 3.4 with Theorem 4.1, we can see that:

  (i) When $r < \frac{p}{2N+1}$, our nonlinear estimator attains the optimal rate.

  (ii) When $r = \frac{p}{2N+1}$, our convergence rate differs from the optimal rate by a logarithmic factor, so it is sub-optimal.

  (iii) When $r > \frac{p}{2N+1}$, the extra logarithmic factor is a penalty for the chosen wavelet thresholding, so our convergence rate is sub-optimal.

References

  1. Efromovich S: Nonparametric Curve Estimation: Methods, Theory, and Applications. Springer Series in Statistics. Springer, New York; 1999.

  2. Vardi Y: Nonparametric estimation in the presence of length bias. Ann. Stat. 1982, 10(2):616–620.

  3. Jones MC: Kernel density estimation for length-biased data. Biometrika 1991, 78(3):511–519.

  4. Efromovich S: Density estimation for biased data. Ann. Stat. 2004, 32: 1137–1161.

  5. Ramirez P, Vidakovic B: Wavelet density estimation for stratified size-biased sample. J. Stat. Plan. Inference 2010, 140(2):419–432.

  6. Christophe C: Wavelet block thresholding for density estimation in the presence of bias. J. Korean Stat. Soc. 2010, 39: 43–53.

  7. Kelly C, Kon MA, Raphael LA: Local convergence for wavelet expansion. J. Funct. Anal. 1994, 126: 102–138.

  8. Triebel H: Theory of Function Spaces. Birkhäuser, Basel; 1983.

  9. Härdle W, Kerkyacharian G, Picard D, Tsybakov A: Wavelets, Approximation and Statistical Applications. Springer, Berlin; 1997.

  10. Wang HY: Convergence rates of density estimation in Besov spaces. Appl. Math. 2011, 2(10):1258–1262.

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (No. 11271038) and Foundation of BJUT (No. 006000542213501).

Author information

Corresponding author

Correspondence to Jinru Wang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

WJR participated in the sequence alignment and drafted the manuscript. WM participated in the design of the study and performed the statistical analysis. ZY conceived of the study and participated in its design and coordination. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Wang, J., Wang, M. & Zhou, Y. Nonlinear wavelet density estimation for biased data in Sobolev spaces. J Inequal Appl 2013, 308 (2013). https://doi.org/10.1186/1029-242X-2013-308
