Asymptotics for $L_1$-wavelet method for nonparametric regression

Wavelets are particularly useful because of their natural adaptive ability to characterize data with intrinsically local properties. When the data contain outliers or come from a population with a heavy-tailed distribution, $L_1$-estimation should yield a better fit. In this paper, we propose an $L_1$-wavelet method for nonparametric regression and derive the asymptotic properties of the $L_1$-wavelet estimator, including the Bahadur representation, the rate of convergence and asymptotic normality. The rate of convergence is comparable with the optimal convergence rate of nonparametric estimation, and the method does not require the nonparametric function to be continuously differentiable.


Introduction
Consider the problem of estimating an underlying regression function from a set of noisy data. Nonparametric regression is the underlying framework, and it has the following standard form:
$$y_i = g(t_i) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1.1)$$
where $g$ is an unknown function on $[0,1]$, the $t_i$ are design points and the $\varepsilon_i$ are random errors. Least-squares methods certainly have some nice properties as regards Gaussian errors, but they do not perform well in the presence of extreme outliers, especially when the errors have a heavy-tailed distribution. More robust estimation methods are required.
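To fix ideas, the following minimal sketch simulates data from model (1.1) with heavy-tailed noise; the test function $g$, the equispaced design and the $t_3$-distributed errors are illustrative assumptions, not specifications from the paper.

```python
import numpy as np

# Minimal sketch: simulate model (1.1), y_i = g(t_i) + eps_i, with heavy tails.
rng = np.random.default_rng(0)

n = 256
t = (np.arange(n) + 0.5) / n          # equispaced design points in [0, 1]
g = lambda x: np.sin(4 * np.pi * x)   # hypothetical signal with local features

eps = rng.standard_t(df=3, size=n)    # i.i.d. errors, median 0, heavy-tailed (cf. (A1))
y = g(t) + eps                        # observed noisy samples
```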
The local median and M-estimators have been studied; see, for example, [7–13]. See also [14, 15] for more details on quantile regression and robust estimation, respectively. As pointed out in [9], among the many robust estimation methods, the $L_1$ method based on least absolute deviations behaves quite well: it downweights outliers, has unique solutions, and its influence function has no transition point (such as the additional parameter $c$ in Huber's $\rho_c$ function). The above methods basically require the unknown function $g$ to be highly smooth, but in practice this condition may not be satisfied. Indeed, the objects arising in some applied areas, such as signal and image processing, are frequently inhomogeneous.
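For reference, Huber's $\rho_c$ function and its transition point $c$, mentioned above, contrast with the $L_1$ (least absolute deviations) criterion as follows: Huber's influence function switches from linear to constant at $\pm c$, whereas the $L_1$ loss requires no such tuning parameter,
$$\rho_c(u) = \begin{cases} u^2/2, & |u| \le c, \\ c|u| - c^2/2, & |u| > c, \end{cases} \qquad \rho_{L_1}(u) = |u|, \qquad \psi_{L_1}(u) = \operatorname{sign}(u) \ (u \neq 0).$$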
In this paper, we consider the wavelet technique to recover the signal function $g$, based on the $L_1$ method, for the robust case.
We aim to study the asymptotic properties of the $L_1$-wavelet estimator for the nonparametric model (1.1). Wavelet estimation has been investigated by many authors. [20] gave the asymptotic bias and variance of the wavelet density estimator via wavelet-based reproducing kernels. Zhou and You [21] constructed wavelet estimators for varying-coefficient partially linear regression models, and established their asymptotic normality and some convergence rates. For varying-coefficient models, the convergence rate and asymptotic normality of wavelet estimators were considered by [22]; [23] provided the asymptotic bias and variance of the wavelet estimator of the regression function under a mixing stochastic process. Recently, Chesneau et al. [24] proposed nonparametric wavelet estimators of the quantile density function and established their consistency. Li and Xiao [25] considered a wavelet estimator of the mean regression function with strong mixing errors and investigated its rates of convergence by thresholding the empirical wavelet coefficients. Berry–Esseen type bounds for wavelet estimators in semiparametric regression models were studied by [26, 27]. For the nonparametric model (1.1), to the best of our knowledge, no study of $L_1$-wavelet estimators has been reported. For this model, the estimation should be combined with its special features.
In this paper, we develop an $L_1$-wavelet method for the nonparametric regression model (1.1), adopting wavelets to detect and represent localized features of the signal function $g$, and applying the $L_1$ criterion to achieve better recovery under outliers or heavy-tailed data. The advantage of the $L_1$-wavelet method is that it avoids the restrictive smoothness requirements imposed on the nonparametric function by traditional smoothing approaches, such as kernel and local polynomial methods, and robustifies the usual mean regression. Finally, we investigate the asymptotic properties of the $L_1$-wavelet estimator, including the Bahadur representation, the rate of convergence and asymptotic normality.
The paper is organized as follows. In Sect. 2, we provide the necessary background on wavelets and develop the $L_1$-wavelet estimation for model (1.1). Asymptotic properties of the $L_1$-wavelet estimators are presented in Sect. 3. Technical proofs are deferred to Sect. 4.

$L_1$-wavelet estimation
Wavelet analysis requires a description of two related and suitably chosen orthonormal basis functions: the scaling function $\phi$ and the wavelet $\psi$. A wavelet system is generated by dilation and translation of $\phi$ and $\psi$ through
$$\phi_{mk}(t) = 2^{m/2}\phi(2^m t - k), \qquad \psi_{mk}(t) = 2^{m/2}\psi(2^m t - k), \quad m, k \in \mathbb{Z}.$$
A multiresolution analysis of $L^2(\mathbb{R})$ consists of a nested sequence of closed subspaces $V_m$, $m \in \mathbb{Z}$, of $L^2(\mathbb{R})$, where $L^2(\mathbb{R})$ is the set of square integrable functions over the real line. Since $\{\phi(\cdot - k), k \in \mathbb{Z}\}$ is an orthogonal family of $L^2(\mathbb{R})$ and $V_0$ is the subspace it spans, $\{\phi_{0k}, k \in \mathbb{Z}\}$ and $\{\phi_{mk}, k \in \mathbb{Z}\}$ are orthonormal bases of $V_0$ and $V_m$, respectively. From the Moore–Aronszajn theorem [28], it follows that
$$E_0(t,s) = \sum_{k \in \mathbb{Z}} \phi(t-k)\phi(s-k)$$
is a reproducing kernel of $V_0$. By the self-similarity of multiresolution subspaces,
$$E_m(t,s) = 2^m E_0(2^m t, 2^m s)$$
is a reproducing kernel of $V_m$. Thus, the projection of $g$ on the space $V_m$ is given by
$$P_{V_m} g(t) = \int_0^1 E_m(t,s)\, g(s)\, ds.$$
This motivates us to define the $L_1$-wavelet estimator of $g$ by
$$\hat g(t) = \hat a, \qquad \hat a = \arg\min_{a} \sum_{i=1}^n |y_i - a| \int_{A_i} E_m(t,s)\, ds, \qquad (2.1)$$
where the $A_i$ are intervals that partition $[0,1]$, so that $t_i \in A_i$. One way of defining the intervals is to take $A_i = [s_{i-1}, s_i)$, with $s_i$ the midpoint between $t_i$ and $t_{i+1}$, $s_0 = 0$ and $s_n = 1$. For the $i$th sample point, define $e_i^+$ and $e_i^-$ to be the positive and negative parts of $e_i = y_i - a$. Then, with the noisy samples, problem (2.1) can be reduced to the following linear program:
$$\min_{a,\, e^+,\, e^-}\ \sum_{i=1}^n w_i\big(e_i^+ + e_i^-\big) \quad \text{subject to } a\mathbf{1}_n + e^+ - e^- = y,\ e^+ \ge 0,\ e^- \ge 0,$$
where $w_i = \int_{A_i} E_m(t,s)\, ds$, $\mathbf{1}_n$ is an $n$-dimensional vector whose components are all 1, $e^+ = (e_1^+, \ldots, e_n^+)^T$, $e^- = (e_1^-, \ldots, e_n^-)^T$ and $y = (y_1, \ldots, y_n)^T$. In addition, $\int_{A_i} E_m(t,s)\, ds$ can be calculated by the cascade algorithm given in [16]. Thus, the $L_1$-wavelet estimator can easily be obtained. This linear program is used only to compute the estimator; to establish the asymptotic properties, we work directly with (2.1).
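As a concrete illustration, the following sketch solves the linear program above with scipy.optimize.linprog. To keep the weights $w_i = \int_{A_i} E_m(t,s)\, ds$ in closed form, it uses the Haar scaling function $\phi = 1_{[0,1)}$, for which $E_m(t,s) = 2^m\, 1\{\lfloor 2^m t\rfloor = \lfloor 2^m s\rfloor\}$; the Haar choice and the equispaced intervals $A_i = [(i-1)/n, i/n)$ are assumptions made for the example only.

```python
import numpy as np
from scipy.optimize import linprog

def haar_weights(t, n, m):
    """Weights w_i = int_{A_i} E_m(t, s) ds for the Haar scaling function,
    where E_m(t, s) = 2^m * 1{floor(2^m t) == floor(2^m s)} and
    A_i = [(i - 1)/n, i/n)."""
    k = np.floor((2.0 ** m) * t)        # dyadic bin of order m containing t
    lo = np.arange(n) / n               # left endpoints of the A_i
    hi = (np.arange(n) + 1) / n         # right endpoints of the A_i
    # length of the overlap of A_i with [k/2^m, (k+1)/2^m)
    overlap = np.maximum(np.minimum(hi, (k + 1) / 2.0 ** m)
                         - np.maximum(lo, k / 2.0 ** m), 0.0)
    return (2.0 ** m) * overlap

def l1_wavelet_estimate(t, y, m):
    """Compute ghat(t) in (2.1) via the linear program:
    minimize sum_i w_i (e_i^+ + e_i^-) s.t. a*1 + e^+ - e^- = y, e^+, e^- >= 0."""
    n = len(y)
    w = haar_weights(t, n, m)
    c = np.concatenate(([0.0], w, w))   # cost on the variables [a, e^+, e^-]
    A_eq = np.hstack([np.ones((n, 1)), np.eye(n), -np.eye(n)])
    bounds = [(None, None)] + [(0, None)] * (2 * n)   # a is free, e^+/e^- >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[0]                     # the minimizing a is ghat(t)

# Example (with t, y from the simulation sketch in the Introduction):
# ghat = [l1_wavelet_estimate(ti, y, m=4) for ti in t]
```

With the Haar kernel the weights $w_i$ are nonnegative, and the minimizer is a local weighted median of the observations whose intervals $A_i$ meet the dyadic bin containing $t$; for smoother scaling functions the weights can be computed by the cascade algorithm of [16].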

Asymptotic properties
We begin with the following assumptions required to derive the asymptotic properties of the proposed estimator in Sect. 2.
(A1) The noise errors $\varepsilon_i$ are i.i.d. with median 0 and a continuous, positive density $f$ in a neighborhood of 0.
(A2) $g$ belongs to the Sobolev space $H^{\nu}(\mathbb{R})$ with order $\nu > 1/2$.
(A3) $g$ satisfies a Lipschitz condition of order $\gamma > 0$.
(A4) $\phi$ has compact support, belongs to the Schwarz space with order $l > \nu$, and satisfies a Lipschitz condition of order $l$.
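Here $H^{\nu}(\mathbb{R})$ in (A2) is the usual fractional Sobolev space; writing $\mathcal{F}g$ for the Fourier transform of $g$,
$$H^{\nu}(\mathbb{R}) = \Big\{ g \in L^2(\mathbb{R}) : \int_{\mathbb{R}} \big(1 + \omega^2\big)^{\nu} \big|(\mathcal{F}g)(\omega)\big|^2 \, d\omega < \infty \Big\}.$$
In particular, for $\nu > 1/2$, every $g \in H^{\nu}(\mathbb{R})$ has a continuous and bounded version, by the Sobolev embedding theorem.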
Our results are as follows.
where $\operatorname{sign}(\cdot)$ denotes the sign function.
This rate is comparable with the optimal convergence rate of nonparametric estimation in nonparametric models. Meanwhile, it coincides with the results of [21] (in probability) and [22] (almost surely) for any $t \in [0,1]$ based on least-squares wavelet estimators, but those results require $g$ to be continuously differentiable, that is, $g \in H^{\nu}(\mathbb{R})$ with $\nu > 3/2$.
To obtain an asymptotic expansion of the variance and an asymptotic normality result, we need to consider an approximation to $\hat g(t)$ based on its values at dyadic points of order $m$. That is, we define $\hat g_d(t) = \hat g(t^{(m)})$ with $t^{(m)} = \lfloor 2^m t\rfloor/2^m$, where $\lfloor z\rfloor$ denotes the largest integer not greater than $z$.
Remark 3.4 $\hat g_d(t)$ is the piecewise-constant approximation of $\hat g(t)$ at resolution $2^{-m}$. The reason for considering it is that the variance of $\hat g(t)$ is unstable as a function of $t$, because $\operatorname{var}(\hat g(t)) \approx 2^m n^{-1}\kappa(t_m)$ depends on $t_m = 2^m t - \lfloor 2^m t\rfloor$: if $t$ is nondyadic, then the sequence $t_m$ wanders around the unit interval and fails to converge. See also [16].
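As a small illustration (the helper name is ours), the dyadic rounding behind $\hat g_d$ is one line of code:

```python
import numpy as np

def dyadic_round(t, m):
    """t^(m) = floor(2^m t) / 2^m, the dyadic point of order m just below t."""
    return np.floor((2.0 ** m) * t) / (2.0 ** m)

# Example: dyadic_round(0.3, 4) == 0.25, so ghat_d(0.3) = ghat(0.25) at resolution 2^-4.
```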

Technical proofs
In order to prove the main results, we first present several lemmas.

Lemma 4.1 Suppose that (A4) holds. We have:
(i) $|E_0(t,s)| \le c_k/(1+|t-s|)^k$ and $|E_m(t,s)| \le 2^m c_k/(1+2^m|t-s|)^k$, where $k$ is a positive integer and $c_k$ is a constant depending on $k$ only.
The proofs of (i) and (ii) can be found in [16], and (iii) follows from (i); the proof of (iv) can be found in [30].
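For instance, the second bound in (i) follows from the first one together with the self-similarity relation $E_m(t,s) = 2^m E_0(2^m t, 2^m s)$ from Sect. 2:
$$|E_m(t,s)| = 2^m |E_0(2^m t, 2^m s)| \le \frac{2^m c_k}{(1 + |2^m t - 2^m s|)^k} = \frac{2^m c_k}{(1 + 2^m |t - s|)^k}.$$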

Lemma 4.2 Suppose that (A4)-(A5) hold and h(·) satisfies (A2)-(A3). Then
The proof of Lemma 4.2 follows easily from Theorem 3.2 of [16]; the proof of Lemma 4.3 can be found in [31], and that of Lemma 4.4 in [32]. Below, we give the proofs of the main results. The proof of Theorem 3.1 uses the idea of [32] and the convexity lemma (Lemma 4.4). To complete the proof of Theorem 3.2, it is enough to check a Lindeberg-type condition.
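For the reader's convenience, we recall the convexity lemma in the scalar form used below (this statement follows [32]). Let $\{A_n(\theta) : \theta \in \Theta\}$ be a sequence of random convex functions defined on an open convex subset $\Theta$ of $\mathbb{R}$, and suppose $A_n(\theta) \to A(\theta)$ in probability for each fixed $\theta \in \Theta$. Then
$$\sup_{\theta \in K} \big|A_n(\theta) - A(\theta)\big| \to 0 \quad \text{in probability, for each compact } K \subset \Theta,$$
and the limit function $A$ is necessarily convex.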

Proof of Theorem 3.1 (i) From (2.1), note that $\hat g(t) = \hat a$, where $\hat a$ minimizes the $L_1$ criterion in (2.1). Let $\hat\theta$ be the correspondingly centered and rescaled version of $\hat a$; then $\hat\theta$ minimizes a convex function $G_n(\theta)$. The idea behind the proof, as in [32], is to approximate $G_n(\theta)$ by a quadratic function whose minimum has an explicit expression, and then to show that $\hat\theta$ is close enough to this minimum to share its asymptotic behavior. We now set out to approximate $G_n(\theta)$ by a quadratic function of $\theta$. Write
$$G_n(\theta) = \mathbb{E}G_n(\theta) + W_n\theta + \big\{R_n(\theta) - \mathbb{E}R_n(\theta)\big\},$$
where $W_n$ does not depend on $\theta$ and $R_n(\theta)$ collects the remainder. Bounds (4.1)–(4.4) control $\mathbb{E}G_n(\theta)$ and the fluctuations $R_n(\theta) - \mathbb{E}R_n(\theta)$ in terms of $\delta_n = \max\{n^{-\gamma} + \eta_m, |\theta|\}$. Let $a_n = O_p\{n^{-\gamma} + \eta_m + 2^m n^{-(1+\gamma)}\}$. Combining (4.2)–(4.4), for each fixed $\theta$, we have
$$G_n(\theta) = f(0)\theta^2 + (W_n + a_n)\theta + o_p(1), \qquad (4.5)$$
with $a_n = o_p(1)$ uniformly. It is easy to see that $W_n$ has a bounded second moment and hence is stochastically bounded. Since the convex function $G_n(\theta) - (W_n + a_n)\theta$ converges in probability to the convex function $f(0)\theta^2$, it follows from the convexity lemma, Lemma 4.4, that, for every compact set $K$,
$$\sup_{\theta \in K}\big|G_n(\theta) - (W_n + a_n)\theta - f(0)\theta^2\big| = o_p(1). \qquad (4.6)$$
Thus, the quadratic approximation to the convex function $G_n(\theta)$ holds uniformly for $\theta$ in any compact set. So, using convexity again, the minimizer $\hat\theta$ of $G_n(\theta)$ converges in probability to the minimizer
$$\bar\theta = -\tfrac{1}{2} f^{-1}(0)(W_n + a_n), \qquad (4.7)$$
that is, $\hat\theta - \bar\theta = o_p(1)$. This assertion can be proved by some elementary arguments, similar to the proof of Theorem 1 in [32]. Based on (4.6), write
$$G_n(\theta) = (W_n + a_n)\theta + f(0)\theta^2 + r_n(\theta) = f(0)(\theta - \bar\theta)^2 - f(0)\bar\theta^2 + r_n(\theta), \qquad (4.8)$$
with $\sup_{\theta \in K} |r_n(\theta)| = o_p(1)$. Because $\bar\theta$ is stochastically bounded, the compact set $K$ can be chosen so that, with probability tending to one, it contains the closed ball $B_n$ with center $\bar\theta$ and radius $\delta$, thereby implying that
$$\Delta_n = \sup_{\theta \in B_n} |r_n(\theta)| = o_p(1).$$
Now consider the behavior of $G_n(\theta)$ outside $B_n$. Suppose $\theta = \bar\theta + \beta\mu$, with $\beta > \delta$ and $\mu$ a unit vector. Define $\theta^*$ as the boundary point of $B_n$ that lies on the line segment from $\bar\theta$ to $\theta$, i.e., $\theta^* = \bar\theta + \delta\mu$. Convexity of $G_n(\theta)$, (4.8) and the definition of $\Delta_n$ imply
$$\frac{\delta}{\beta} G_n(\theta) + \Big(1 - \frac{\delta}{\beta}\Big) G_n(\bar\theta) \ \ge\ G_n(\theta^*) \ \ge\ G_n(\bar\theta) + f(0)\delta^2 - 2\Delta_n.$$
It follows that, when $2\Delta_n < f(0)\delta^2$, which happens with probability tending to one, the minimum of $G_n(\theta)$ cannot occur at any $\theta$ with $|\theta - \bar\theta| > \delta$. This implies that, for any $\delta > 0$ and all sufficiently large $n$, the minimum of $G_n(\theta)$ must be achieved within $B_n$, i.e., $|\hat\theta - \bar\theta| \le \delta$ with probability tending to one. This completes the proof of (i).
From (4.12) and Lemma 4.1, one sees that the order is $O(2^m/n) \to 0$. This completes the proof of Theorem 3.2.