
\(L_{1}\)-Estimation for covariate-adjusted regression


Abstract

We study a covariate-adjusted regression (CAR) model proposed for situations in which both the predictors and the response in a regression model are not directly observable but are distorted by multiplicative factors determined by unknown functions of an observable covariate. By establishing a connection to varying-coefficient models, we present a local linear \(L_{1}\)-estimation method for the case where the underlying error distribution deviates from normality, and we propose robust estimators of the parameters in the underlying regression model. The consistency and asymptotic normality of the robust estimators are established. Since the limit distribution depends on unknown components of the errors, an empirical likelihood ratio method based on the \(L_{1}\) estimator is proposed and used to construct confidence intervals for the regression coefficients. Simulation results demonstrate the superiority of the proposed estimators over classical estimators when the underlying errors have heavy tails. The Pima Indian diabetes data set is analyzed to illustrate the performance of the proposed method; there, the response and predictors are potentially contaminated by body mass index.


Introduction

Covariate-adjusted regression (CAR) was initially proposed by Sentürk and Müller [18] for regression analysis in which both the response and the predictors are not directly observed: the available data are distorted by unknown functions of a common observable covariate. An example is the fibrinogen data collected on 69 hemodialysis patients, where the regression of fibrinogen level on serum transferrin level is of interest in Kaysen et al. [11]. Both response and predictor are known to be influenced in a multiplicative fashion by body mass index, defined as weight/height\(^{2}\). Based on this observation, Sentürk and Müller [18] suggested that the confounding variable affects the primary variables through a flexible multiplicative unknown function. Adjusting in this way, by dividing out the effect of body mass index identified as a common confounder, may remove non-negligible bias and lead to consistent estimators of the parameters of interest. For the simple case of two variables of interest, the underlying variables are

$$ Y=\frac{\tilde{Y}}{\psi (U)},\quad\quad X=\frac{\tilde{X}}{\phi (U)}, $$

where Y and X are the unobservable continuous variables of interest, assumed independent of U, while \(\tilde{Y}\) and \(\tilde{X}\) are the available distorted variables. U is an observed continuous scalar confounding variable, and \(\psi (\cdot )\) and \(\phi (\cdot )\) denote unknown smooth contaminating functions of U. The main goal is to uncover the relationship between the response Y and the covariate X, based on the confounding variable U and the contaminated variables \(\tilde{Y}\) and \(\tilde{X}\). Sentürk and Müller [18] considered the simple linear regression model

$$ Y=\gamma _{0}+\gamma _{1}X+e, $$

where \(\gamma _{0}\) and \(\gamma _{1}\) are unknown parameters and e is the error term, independent of \((X, U)\). A reasonable assumption for \(\psi (\cdot )\) and \(\phi (\cdot )\) is that the mean distorting effect vanishes, that is,

$$ E\bigl(\psi (U)\bigr)=1, \quad\quad E\bigl(\phi (U)\bigr)=1.$$

The central objective, based on the observation of the confounding variable U and the distorted observations \((\tilde{Y}, \tilde{X})\) in (1.1), is to estimate the unknown parameters \(\gamma _{0}\) and \(\gamma _{1}\).

To eliminate the effect caused by distortions, Sentürk and Müller [19] proposed the covariate-adjusted varying coefficient model (CAVCM) to target the covariate-adjusted relationship between longitudinal variables. Sentürk and Nguyen [20] proposed estimation procedures for the model based on the local polynomial smoothing technique (LP). Cui et al. [3] considered covariate-adjusted nonlinear regression and proposed a direct plug-in estimation procedure. Li et al. [12] studied covariate-adjusted partially linear regression models and obtained confidence intervals for the regression coefficients.

According to models (1.1)–(1.3), the regression of \(\tilde{Y}\) on \(\tilde{X}\) can be expressed as

$$ \tilde{Y}=\beta _{0}(U)+\beta _{1}(U) \tilde{X}+e(U),$$

where \(\beta _{0}(U)=\psi (U)\gamma _{0}\), \(\beta _{1}(U)=\gamma _{1}\psi (U)/\phi (U)\), \(e(U)=\psi (U)e\).

This is a varying coefficient model with heteroscedasticity, a useful extension of classical linear models. Varying coefficient models are widely used in diverse areas because they can significantly reduce modeling bias while avoiding the ‘‘curse of dimensionality’’; see, for example, Hastie and Tibshirani [8] and Fan and Zhang [5–7]. The least-squares (LS) method is the most popular approach in the vast literature on model (1.4), as it has favorable properties for a large class of error distributions. However, this method breaks down when the random error is adversely affected by outliers or heavy-tailed distributions, so a robust estimation method is desired. In this article, we propose robust coefficient estimation motivated by Tang and Wang [21], using a two-step procedure to estimate the unknown parameters. First, we use \(L_{1}\)-estimation based on a local linear fit to estimate the varying coefficients; because model (1.4) is heteroscedastic, the inference methods differ from the homoscedastic case. Second, the estimates of the unknown parameters are constructed as weighted averages of these functions. However, the limiting variance has a complex structure with several unknown components, so an estimated empirical log-likelihood approach is developed to construct confidence regions for the regression parameters; the empirical log-likelihood ratio is proved to be asymptotically standard chi-square.

The rest of this article is organized as follows. In Sect. 2, we describe the \(L_{1}\) estimation procedure and propose estimators of both the nonparametric and parametric components; we obtain the asymptotic results and discuss the efficiency of the estimators. In Sect. 3, we construct empirical-likelihood-based confidence regions for the parameters. Section 4 presents the hypothesis testing procedure. In Sects. 5 and 6, simulations and an empirical study are carried out to assess the performance of the proposed estimators and confidence regions. Section 7 concludes the paper with a discussion. The proofs of the theorems are deferred to the Appendix.

Estimation and asymptotic behavior

Consider a covariate-adjusted regression model in the following general form:

$$ \textstyle\begin{cases} Y=\boldsymbol{X}^{\tau }\boldsymbol{\gamma }+e, \\ \tilde{Y}=\psi (U)Y, \\ \tilde{X}_{r}=\phi _{r}(U)X_{r},\quad r=1,\ldots,p, \end{cases} $$

where \(\boldsymbol{\gamma }=(\gamma _{1},\ldots,\gamma _{p})^{\tau }\). Y and \(X_{r}\), \(r=1,\ldots,p\), are unobservable variables distorted by the smooth functions \(\psi (U)\) and \(\phi _{r}(U)\). \(\tilde{Y}\), \(\tilde{X}_{r}\), \(r=1,\ldots,p\), and the univariate confounder U are observable variables. e is a random error whose 0.5 quantile is zero, and \(E(\psi (U))=1\), \(E(\phi _{r}(U))=1\), \(r=1,\ldots,p\). Our goal is to estimate the unknown parameter γ consistently based on the observed data and to further establish asymptotic normality for the proposed estimators. The estimation of the regression coefficient γ is a two-step procedure similar to that in Sentürk and Müller [18]. From model (2.1), we rewrite the CAR model as the following CAVCM:

$$ \tilde{Y}=\tilde{\boldsymbol{X}}^{\tau }\boldsymbol{\beta }(U)+e(U),$$

where \(\boldsymbol{\beta }(U)=(\beta _{1}(U),\ldots,\beta _{p}(U))^{\tau }\), \(\beta _{r}(U)=\gamma _{r}\frac{\psi (U)}{\phi _{r}(U)}\), \(r=1,\ldots, p\), \(e(U)=\psi (U)e\).

In the first step, we employ \(L_{1}\)-estimation to estimate varying coefficients \(\boldsymbol{\beta }(U)\) based on local linear fit. For U in the neighborhood of u, we use a local linear approximation

$$ \beta _{r}(U)\approx \beta _{r}(u)+\beta ^{\prime }_{r}(u) (U-u) \overset{\triangle }{=}a_{r}+b_{r}(U-u) $$

for \(r=1,\ldots,p\).

Suppose that \(\{U_{i}, \tilde{\boldsymbol{X}}_{i}, \tilde{Y}_{i}\}\), \(i=1,\ldots, n\), are independent and identically distributed samples from model (2.1), \(\tilde{\boldsymbol{X}}_{i}=(\tilde{X}_{i1},\ldots,\tilde{X}_{ip})^{\tau }\). Let \((\hat{\boldsymbol{a}}^{\tau },\hat{\boldsymbol{b}}^{\tau })^{\tau }\) be the local linear \(L_{1}\)-estimate of \((\boldsymbol{a}^{\tau }, \boldsymbol{b}^{\tau })^{\tau }\) by minimizing

$$ \sum_{i=1}^{n} \bigl\vert \tilde{Y}_{i}-\tilde{\boldsymbol{X}}^{\tau }_{i}\bigl( \boldsymbol{a}+(U_{i}-u) \boldsymbol{b}\bigr) \bigr\vert K \bigl((U_{i}-u)/h\bigr),$$

where \(\boldsymbol{a}=(a_{1},\ldots,a_{p})^{\tau }\), \(\boldsymbol{b}=(b_{1},\ldots,b_{p})^{\tau }\).
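For concreteness, the weighted least-absolute-deviations problem above can be solved exactly as a linear program. The following is a minimal sketch (the function name and LP formulation are ours, not from the paper), using the Epanechnikov kernel and SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def local_linear_l1(U, X, Y, u, h):
    """Local linear L1-estimate at the point u: minimizes
    sum_i |Y_i - X_i'(a + (U_i - u) b)| * K((U_i - u)/h)."""
    t = (U - u) / h
    w = np.where(np.abs(t) < 1, 0.75 * (1 - t**2), 0.0)  # Epanechnikov kernel weights
    keep = w > 0
    Xk, Yk, wk, dk = X[keep], Y[keep], w[keep], U[keep] - u
    m, p = Xk.shape
    Z = np.hstack([Xk, Xk * dk[:, None]])          # design columns for (a, b)
    # LP variables: theta = (a, b), residual splits r+ >= 0, r- >= 0,
    # with Z theta + r+ - r- = Y; minimize sum_i w_i (r+_i + r-_i).
    c = np.concatenate([np.zeros(2 * p), wk, wk])
    A_eq = np.hstack([Z, np.eye(m), -np.eye(m)])
    bounds = [(None, None)] * (2 * p) + [(0, None)] * (2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=Yk, bounds=bounds, method="highs")
    theta = res.x
    return theta[:p], theta[p:2 * p]               # (a_hat, b_hat)
```

At each observed point \(U_i\) one calls `local_linear_l1` with \(u=U_i\) and keeps the first component as the raw estimate \(\hat{\boldsymbol{\beta }}(U_i)\).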

In the second step, from (C1), (C2), and (2.1),

$$ E(\tilde{X}_{r})=E(X_{r}), \quad\quad E\bigl(\beta _{r}(U) \tilde{X}_{r}\bigr)=\gamma _{r}E( \tilde{X}_{r}),\quad r=1, \ldots,p. $$

The unknown regression parameters \(\gamma _{r}\), \(r=1,\ldots,p\), are obtained as averages of raw estimates \(\hat{\beta }_{r}(U_{i})\). The estimates are given by

$$ \hat{\gamma }_{r}=\frac{1}{\bar{\tilde{X}}_{r}}\sum _{i=1}^{n} \frac{1}{n}\hat{\beta }_{r}(U_{i})\tilde{X}_{ir},\quad r=1,\ldots,p, $$

where \(\bar{\tilde{X}}_{r}=\frac{1}{n}\sum_{i=1}^{n}\tilde{X}_{ir}\).
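In matrix form the second step is a single weighted average; a sketch with our own names, where `beta_hat[i, r]` stands for the raw estimate \(\hat{\beta }_{r}(U_{i})\):

```python
import numpy as np

def car_gamma(beta_hat, X_tilde):
    """Second-step CAR estimator:
    gamma_hat_r = (1/n) sum_i beta_hat_r(U_i) * Xtilde_ir / mean_i(Xtilde_ir).
    beta_hat and X_tilde are (n, p) arrays."""
    return (beta_hat * X_tilde).mean(axis=0) / X_tilde.mean(axis=0)
```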

In this section, we establish the asymptotic properties of \(\hat{\gamma }_{r}\).

Theorem 1

Under the regularity conditions in the Appendix, if \(h\to 0\) and \(nh\to \infty \) as \(n\to \infty \), then

$$ \hat{\gamma }_{r}=\gamma _{r}+O_{p} \bigl(n^{-1/2}\bigr)+O_{p}(C_{n}), \quad r=1,\ldots,p, $$

where \(C_{n}=O_{p}(h^{2}+\log ^{1/2}(1/h)/(nh))\).

Theorem 2

Under the regularity conditions in the Appendix, if \(nh^{2}/\log (1/h)\to \infty \) and \(nh^{4}\to 0\) as \(n\to \infty \), then the asymptotic distribution of \(\hat{\gamma }_{r}\) is given by

$$ \sqrt{n}(\hat{\gamma }_{r}-\gamma _{r}) \stackrel{D}{\to }N\bigl(0,\sigma _{r}^{2}\bigr), \quad r=1,\ldots,p, $$


where

$$\begin{aligned} \sigma _{r}^{2} =& \bigl\{ \gamma _{r}^{2}E \bigl(X_{r}^{2}\bigr)\operatorname{var}\bigl(\psi (U)\bigr)+ \gamma _{r}^{2}\operatorname{var}(X_{r}) \\ & {} +2\gamma ^{2}_{r}\bigl[E\bigl(\phi _{r}(U)\psi (U)\bigr)E\bigl(X_{r}^{2}\bigr)-\bigl(E(X_{r}) \bigr)^{2}\bigr]+ \gamma _{r}^{2}\operatorname{var} \bigl(\phi _{r}(U)X_{r}\bigr) \bigr\} /\bigl\{ E(X_{r})\bigr\} ^{2}. \end{aligned}$$

The optimal bandwidth for estimating \(\hat{\beta }_{r}(\cdot )\) is \(h\sim n^{-1/5}\), which does not satisfy the conditions of Theorem 2. In order to obtain asymptotic normality for \(\hat{\gamma }_{r}\), undersmoothing of \(\hat{\beta }_{r}(\cdot )\) is necessary. This requirement has also been used in the literature on semiparametric models; see Carroll et al. [2] for a detailed discussion.

Empirical likelihood

Although we have obtained the asymptotic distribution of \(\hat{\gamma }_{r}\), the variance \(\sigma _{r}^{2}\) is complex and includes several unknown components that would have to be estimated. To resolve this difficulty, we propose an empirical likelihood method to construct a confidence interval for \(\gamma _{r}\). For more information on empirical likelihood estimation, we refer to Owen [16].

Note that \(E((\beta _{r}(U_{i})-\gamma _{r})\tilde{X}_{ir})=0\) for \(i=1,2,\ldots,n\), \(r=1,\ldots,p\) if \(\gamma _{r}\) is the true parameter. Hence, the problem of testing whether \(\gamma _{r}\) is the true parameter is equivalent to testing whether \(E((\beta _{r}(U_{i})-\gamma _{r})\tilde{X}_{ir})=0\). By Owen [15], to construct an empirical likelihood ratio function for \(\gamma _{r}\), we denote \(V_{i}(\gamma _{r})=(\beta _{r}(U_{i})-\gamma _{r})\tilde{X}_{ir}\). That is, we can define the profile empirical likelihood ratio function

$$\begin{aligned}& L_{n}(\gamma _{r})=-2\max \Biggl\{ \sum _{i=1}^{n}\log (np_{i})\Big| p_{i} \geq 0,\sum_{i=1}^{n}p_{i}=1,\sum _{i=1}^{n}p_{i}V_{i}( \gamma _{r})=0 \Biggr\} . \end{aligned}$$

It can be shown that \(L_{n}(\gamma _{r})\) is asymptotically chi-squared with 1 degree of freedom. However, \(L_{n}(\gamma _{r})\) cannot be directly used to make statistical inference on \(\gamma _{r}\) because \(L_{n}(\gamma _{r})\) contains the unknown \(\beta _{r}(\cdot )\). A natural way is to replace \(\beta _{r}(\cdot )\) by \(L_{1}\)-estimator \(\hat{\beta }_{r}(\cdot )\) and to replace \(V_{i}(\gamma _{r})\) by \(\hat{V}_{i}(\gamma _{r})\). Then an estimated empirical likelihood ratio function is defined by

$$\begin{aligned}& \hat{L}_{n}(\gamma _{r})=-2\max \Biggl\{ \sum _{i=1}^{n}\log (np_{i}) \Big| p_{i} \geq 0,\sum_{i=1}^{n}p_{i}=1,\sum _{i=1}^{n}p_{i} \hat{V}_{i}( \gamma _{r})=0 \Biggr\} . \end{aligned}$$

By the Lagrange multiplier method, \(\hat{L}_{n}(\gamma _{r})\) can be represented as

$$\begin{aligned}& \hat{L}_{n}(\gamma _{r})=2\sum _{i=1}^{n}\log \bigl(1+\lambda \hat{V}_{i}( \gamma _{r})\bigr), \end{aligned}$$

where λ is determined by

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n} \frac{\hat{V}_{i}(\gamma _{r})}{1+\lambda \hat{V}_{i}(\gamma _{r})}=0. \end{aligned}$$
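Since the left-hand side above is strictly decreasing in λ on the set where all \(1+\lambda \hat{V}_{i}>0\), λ can be found by one-dimensional root bracketing. A minimal sketch (function name ours):

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(V):
    """Estimated empirical log-likelihood ratio for scalar values V_i = V_hat_i(gamma_r):
    solves (1/n) sum_i V_i / (1 + lam V_i) = 0 for lam, then returns 2 sum_i log(1 + lam V_i)."""
    V = np.asarray(V, dtype=float)
    if V.min() >= 0 or V.max() <= 0:
        return np.inf              # zero lies outside the convex hull of {V_i}
    eps = 1e-10
    lo = (-1 + eps) / V.max()      # endpoints keep 1 + lam * V_i > 0 for every i
    hi = (-1 + eps) / V.min()
    g = lambda lam: np.mean(V / (1.0 + lam * V))
    lam = brentq(g, lo, hi)        # g is monotone, so bisection is safe
    return 2.0 * np.sum(np.log1p(lam * V))
```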

In the following, we show that \(\hat{L}_{n}(\gamma _{r})\) converges in distribution to the standard chi-square distribution with one degree of freedom.

Theorem 3

Under the conditions of Theorem 2, we have

$$\begin{aligned} \hat{L}_{n}(\gamma _{r})\stackrel{D}{\to } \chi ^{2}_{1}. \end{aligned}$$

According to Theorem 3, we construct a \((1-\alpha )\)-level confidence region of \(\gamma _{r}\):

$$ {\mathrm{CR}}_{\alpha }=\bigl\{ \gamma _{r}:\hat{L}_{n}( \gamma _{r})\le c_{\alpha }\bigr\} , $$

where \(c_{\alpha }\) satisfies \(P(\chi _{1}^{2}\le c_{\alpha })=1-\alpha \).
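In practice the region can be obtained by evaluating \(\hat{L}_{n}\) on a grid and keeping the points below the chi-square cutoff; a generic sketch (names ours), assuming the region is an interval:

```python
import numpy as np
from scipy.stats import chi2

def el_confidence_interval(Ln, grid, alpha=0.05):
    """Invert a likelihood-ratio function: CR = {g in grid : Ln(g) <= c_alpha},
    where P(chi2_1 <= c_alpha) = 1 - alpha. Returns the endpoints of the region."""
    c = chi2.ppf(1 - alpha, df=1)
    inside = np.array([g for g in grid if Ln(g) <= c])
    return (float(inside.min()), float(inside.max())) if inside.size else None
```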

Bootstrap test

It is often of practical interest to test for the significance of the regression coefficients. We consider the null hypothesis

$$\begin{aligned} H_{0}:\beta _{r}(u)=c_{r} ,\quad r=1, \ldots,p, \end{aligned}$$

where \(c_{r}\) is an unknown constant. Under the null hypothesis, the smooth estimator \(\hat{\beta }_{r}(u)\) of \(\beta _{r}(u)\) is expected to be close to a horizontal line. We average \(\{\hat{\beta }_{r}(U_{i})\}\) to obtain the estimator \(\hat{c}_{r}\) of the parameter \(c_{r}\). Similar to the statistic proposed by Cai, Fan, and Yao [1], the residual sum of squares under the null hypothesis is

$$ \mathrm{RSS}_{0}=n^{-1}\sum_{i=1}^{n} \Biggl\vert \tilde{Y}_{i}-\sum_{j\neq r}^{p} \hat{\beta }_{j}(U_{i})\tilde{X}_{ij}- \hat{c_{r}}\tilde{X}_{ir} \Biggr\vert . $$

Analogously, the residual sum of squares corresponding to model (2.2) is

$$ \mathrm{RSS}_{1}=n^{-1}\sum_{i=1}^{n} \Biggl\vert \tilde{Y}_{i}-\sum_{j=1}^{p} \hat{\beta }_{j}(U_{i})\tilde{X}_{ij} \Biggr\vert . $$

The goodness-of-fit test statistic is defined as

$$ T_{n}=(\mathrm{RSS}_{0}-\mathrm{RSS}_{1})/{{ \mathrm{RSS}_{1}}}={ \mathrm{RSS}}_{0}/\mathrm{RSS}_{1}-1, $$
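Given the first-step estimates, \(T_{n}\) is straightforward to compute; a sketch with our variable names, where `beta_hat[i, j]` stands for \(\hat{\beta }_{j}(U_{i})\):

```python
import numpy as np

def goodness_of_fit_Tn(Y_t, X_t, beta_hat, c_hat, r):
    """T_n = RSS0/RSS1 - 1 with L1 residual sums:
    RSS1 uses the full fit sum_j beta_hat_j(U_i) * Xt_ij;
    RSS0 replaces component r by the constant c_hat under H0."""
    fit1 = np.sum(X_t * beta_hat, axis=1)
    rss1 = np.mean(np.abs(Y_t - fit1))
    fit0 = fit1 + (c_hat - beta_hat[:, r]) * X_t[:, r]
    rss0 = np.mean(np.abs(Y_t - fit0))
    return rss0 / rss1 - 1.0
```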

and we reject the null hypothesis (4.1) for large values of \(T_{n}\). The null distribution of \(T_{n}\) is approximated by the distribution of \(T^{*}_{n}\) computed from bootstrap samples, and the p-value of the test is the relative frequency of the event \(\{T^{*}_{n}\ge T_{n}\}\) among the bootstrap replications.
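The p-value computation itself is generic; a minimal skeleton (names ours), assuming the caller supplies a routine that draws one bootstrap sample under \(H_{0}\) and returns \(T^{*}_{n}\):

```python
import numpy as np

def bootstrap_pvalue(T_obs, resample_stat, B=1000, seed=None):
    """Bootstrap p-value: relative frequency of T_n* >= T_n over B bootstrap samples.
    resample_stat(rng) must draw one bootstrap sample under H0 and return its T_n*."""
    rng = np.random.default_rng(seed)
    T_star = np.array([resample_stat(rng) for _ in range(B)])
    return float(np.mean(T_star >= T_obs))
```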

Simulation study

In this section, we carry out simulations to investigate the performance of our proposed methods as outlined in Sects. 2 and 3. We shall compare the finite sample performance of the LP procedure with our approach. The underlying (unobserved) multiple regression model considered is as follows:

$$\begin{aligned} Y=3+0.1X_{1}+2X_{2}-0.2X_{3}+e, \end{aligned}$$

where the predictors \(X_{1}\), \(X_{2}\), and \(X_{3}\) are distributed as \(N(2,1.5^{2})\), \(N(0.5,0.25^{2})\), and \(N(1,1)\), respectively. The confounding variable U is generated from a uniform \([0,1]\) distribution. The distorting functions considered are

$$ \psi (U)=(U+3)^{2}/a,\quad\quad \phi _{1}(U)=(U+10)/b, \quad\quad \phi _{3}(U)=(U+3)/c, $$

where \((a, b, c)=(12.339, 10.5, 3.5)\) for \(U\sim U[0, 1]\); the constants a, b, c are chosen so that the distorting functions satisfy the constraints in (1.3). To show the robustness of our estimators, the following error distributions are considered: \(N(0,0.25)\), \(t(3)\), and \(\operatorname{Cauchy}(0,0.2)\). For the weight function we use the Epanechnikov kernel, and the asymptotically optimal bandwidth h for LP has been derived in Sentürk and Müller [19]. Motivated by Kai et al. [10], a simple formula gives the asymptotically optimal bandwidth for \(L_{1}\): \(h_{L_{1}}=h_{\mathit{LP}}/\{4f(F^{-1}(0.5))\}^{1/5}\). We repeat the simulation 1000 times with sample sizes of 50, 100, and 200; the results are summarized in Table 1. When the error is normally distributed, the proposed \(L_{1}\) estimators perform nearly as well as the LP estimators, with slightly larger biases and standard deviations. For the two non-normal errors, however, the LP estimators do not perform as well, while the \(L_{1}\) estimators show a significant improvement.
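For reference, one draw from this simulation design can be generated as follows (a sketch; the excerpt does not state a distorting function for \(X_{2}\), so we leave \(X_{2}\) undistorted as an assumption):

```python
import numpy as np

def simulate_car_design(n, seed=None):
    """One sample from the Sect. 5 design: Y = 3 + 0.1 X1 + 2 X2 - 0.2 X3 + e,
    with psi(U) = (U+3)^2/12.339, phi_1(U) = (U+10)/10.5, phi_3(U) = (U+3)/3.5."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(0, 1, n)
    X1 = rng.normal(2, 1.5, n)
    X2 = rng.normal(0.5, 0.25, n)
    X3 = rng.normal(1, 1, n)
    e = rng.standard_t(3, n)                 # one of the three error laws considered
    Y = 3 + 0.1 * X1 + 2 * X2 - 0.2 * X3 + e
    Y_t = (U + 3)**2 / 12.339 * Y            # psi(U) * Y
    X1_t = (U + 10) / 10.5 * X1              # phi_1(U) * X1
    X3_t = (U + 3) / 3.5 * X3                # phi_3(U) * X3
    return U, np.column_stack([X1_t, X2, X3_t]), Y_t
```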

Table 1 Summary of bias and standard deviation over 1000 simulations

For sample sizes \(n=100\) and 200, samples are generated from the simulated data above, and for each sample the 95% confidence intervals are computed using the empirical likelihood; the results are reported in Table 2. As n increases, the coverage probabilities increase.

Table 2 Coverage probabilities of confidence regions when the nominal level is 0.95


Real data analysis

We illustrate the methodology via an application to the diabetes data set, which contains eight-dimensional patterns collected to understand the prevalence of diabetes and other cardiovascular risk factors. The 131 subjects analyzed here are females of Pima Indian heritage, at least 35 years old, who were screened for diabetes. These patients may have abnormal insulin action that prevents the body from normal utilization of glucose. Obesity is a risk factor in both diabetes and hypertension, and one purpose of the study is to identify risk factors for diabetes, among which is hypertension. Here we investigate the relationship between plasma glucose concentration (GLU) and a hypertensive measure, diastolic blood pressure (DBP), via the simple linear regression \(\mathit{GLU}=\gamma _{0}+\gamma _{1}\mathit{DBP}+e\). Body mass index (BMI), \(\mathit{BMI}={\mathrm{weight}}/{\mathrm{height}^{2}}\), is identified as a major factor significantly associated with elevated prevalence of hypertension and diabetes, so both the response and the predictor are potentially affected by BMI. Based on the confounding variable BMI and the contaminated variables GLU and DBP, the varying coefficient model has the form \(\mathit{GLU}=\beta _{0}(\mathit{BMI})+\beta _{1}(\mathit{BMI})\mathit{DBP}+e(\mathit{BMI})\).

The parameters \(\gamma _{0}\) and \(\gamma _{1}\) are estimated by the covariate-adjusted regression algorithm; three outliers are removed before the analysis. The p-values for the covariate-adjusted regression estimates are obtained from 1000 bootstrap samples. Under least-squares regression, DBP was close to being significant (\(p=0.056\)), while under covariate-adjusted regression it became significant (\(p=0.029\)). Thus, the covariate-adjusted regression model is more appropriate for these data than least-squares regression. We also compare the performance of the LP procedure with our approach. The LP estimates are \((\hat{\gamma }_{0}, \hat{\gamma }_{1})=(72.1511, 0.5972)\), and the \(L_{1}\) estimates are \((\hat{\gamma }_{0}, \hat{\gamma }_{1})=(74.2114, 0.6035)\). We estimate the standard deviations of \(\hat{\gamma }_{0}\) and \(\hat{\gamma }_{1}\) from 1000 bootstrap samples for both methods: for LP, \(\widehat{\mathit{s.d.}}(\hat{\gamma }_{0})=0.3129\) and \(\widehat{\mathit{s.d.}}(\hat{\gamma }_{1})=0.0819\); for \(L_{1}\), \(\widehat{\mathit{s.d.}}(\hat{\gamma }_{0})=0.2994\) and \(\widehat{\mathit{s.d.}}(\hat{\gamma }_{1})=0.0611\). The estimated parameters under LP and \(L_{1}\) modeling differ only slightly, but the \(L_{1}\) estimators have smaller standard errors than the LP approach, indicating better performance of \(L_{1}\) modeling. It is believed that the distortion effect of the obesity index on blood pressure differs from its effect on plasma glucose, and the distortion effect of the obesity index on GLU can be assessed directly from the estimated intercept function.


Conclusion

In this paper, we propose a robust and efficient procedure for CAR that improves on the earlier LP estimation when the underlying error distribution deviates from normality; asymptotic normality is established under regularity conditions. The procedure is a two-step estimation for CAR. First, we use \(L_{1}\)-estimation motivated by Tang and Wang [21] to estimate the varying coefficients based on a local linear fit. The performance of the smoothing technique chosen in the first step affects the overall performance of the CAR estimates in the second step; when the data contain outliers or come from populations with heavy-tailed distributions, \(L_{1}\)-estimation should yield better estimators. Second, the estimates of the unknown parameters are constructed as weighted averages of these functions. In addition, an estimated empirical log-likelihood approach is developed to construct confidence regions and intervals for the regression coefficients. Finally, it would be interesting to develop a robust and efficient variable selection procedure for CAR in the high-dimensional setting.


  1. Cai, Z., Fan, J., Yao, Q.: Functional-coefficient regression models for nonlinear time series. J. Am. Stat. Assoc. 95, 941–956 (2000)
  2. Carroll, R., Fan, J., Gijbels, I., Wand, M.: Generalized partially linear single-index models. J. Am. Stat. Assoc. 92, 477–489 (1997)
  3. Cui, X., Guo, W.S., Zhu, L.X.: Covariate-adjusted nonlinear regression. Ann. Stat. 37, 1839–1870 (2009)
  4. Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Chapman & Hall, London (1996)
  5. Fan, J., Zhang, W.: Statistical estimation in varying coefficient model. Ann. Stat. 27, 1491–1518 (1999)
  6. Fan, J., Zhang, W.: Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand. J. Stat. 27, 715–731 (2000)
  7. Fan, J., Zhang, W.: Statistical methods with varying coefficient models. Stat. Interface 1, 179–195 (2008)
  8. Hastie, T.J., Tibshirani, R.J.: Varying-coefficient models. J. R. Stat. Soc., Ser. B 55, 757–796 (1993)
  9. Kai, B., Li, R., Zou, H.: Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J. R. Stat. Soc., Ser. B 72, 49–69 (2010)
  10. Kai, B., Li, R., Zou, H.: New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann. Stat. 39, 305–332 (2011)
  11. Kaysen, G.A., Dubin, J.A., Müller, H.G., Mitch, W.E., Rosales, L.M., Levin, N.W., the Hemo Study Group: Relationships among inflammation nutrition and physiologic mechanisms establishing albumin levels in hemodialysis patients. Kidney Int. 61, 2240–2249 (2002)
  12. Li, F., Lin, L., Cui, X.: Covariate-adjusted partially linear regression models. Commun. Stat., Theory Methods 39, 1054–1074 (2010)
  13. Mack, Y., Silverman, B.: Weak and strong uniform consistency of kernel regression estimates. Probab. Theory Relat. Fields 61, 405–415 (1982)
  14. Owen, A.B.: Empirical likelihood ratio confidence regions. Ann. Stat. 18, 90–120 (1990)
  15. Owen, A.B.: Empirical likelihood for linear models. Ann. Stat. 19, 1725–1747 (1991)
  16. Owen, A.B.: Empirical Likelihood. Chapman & Hall, New York (2001)
  17. Pollard, D.: Asymptotics for least absolute deviation regression estimators. Econom. Theory 7, 186–199 (1991)
  18. Sentürk, D., Müller, H.G.: Covariate-adjusted regression. Biometrika 92, 75–89 (2005)
  19. Sentürk, D., Müller, H.G.: Inference for covariate-adjusted regression via varying coefficient models. Ann. Stat. 34, 654–679 (2006)
  20. Sentürk, D., Nguyen, D.V.: Estimation in covariate-adjusted regression. Comput. Stat. Data Anal. 50, 3294–3310 (2006)
  21. Tang, Q.G., Wang, J.D.: \(L_{1}\)-Estimation for varying coefficient models. Statistics 39, 389–404 (2005)



Acknowledgements

We sincerely thank the referees and the editor for their helpful comments and suggestions, which have improved this version of the manuscript.

Availability of data and materials

The Pima Indian diabetes data set is used for the empirical study. This data set is originally from the National Institute of Diabetes and Digestive and Kidney Diseases and is publicly available online.


Funding

This work is supported by the National Natural Science Foundation of China (No. 1187028, 11731015, 11571051, 11501241), the Natural Science Foundation of Jilin Province (No. 20180101216JC, 20170101057JC, 20150520053JH), and the Program for Changbaishan Scholars of Jilin Province (2015010).

Author information

Authors and Affiliations



YS and DW conceived the idea of the study; YS analyzed the data; DW interpreted the results; YS wrote the paper; all the authors discussed the results and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Dehui Wang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.



Appendix

To establish the main results given in Sects. 2 and 3, the following regularity conditions are imposed:

  1. (C1)

    Contaminating functions \(\psi (\cdot )\) and \(\phi _{r}(\cdot )\) are twice continuously differentiable, satisfying \(E(\psi (U_{i}))=1\), \(E(\phi _{r}(U_{i}))=1\), \(\phi _{r}(U_{i})>0\), \(\psi (U_{i})>0\), \(r=1,\ldots,p\), \(i=1,\ldots, n\).

  2. (C2)

    The variables X, U, e are mutually independent, and the variables Y, U are mutually independent, \(E(Y^{2})<\infty \), \(E(X_{r}^{2})<\infty \), \(r=1,\ldots,p\).

  3. (C3)

The random variable U has bounded support Ω; \(f_{U}(\cdot )\) denotes the density function of the covariate U.

  4. (C4)

    The kernel function \(K(\cdot )\) is symmetric with a compact support and satisfies a Lipschitz condition.

  5. (C5)

Let \(f(\cdot )\) and \(F(\cdot )\) denote the density function and the cumulative distribution function of the error e, respectively. \(f(\cdot )\) is bounded away from zero and has a continuous and uniformly bounded derivative, and \(F(0)=0.5\).

  6. (C6)

    \(E(\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^{\tau }| U=u)\) is nonsingular for all \(u\in \varOmega \).


These conditions are mild. Conditions (C1)–(C3) are assumed in Cui et al. [3], conditions (C4) and (C5) can be found in Tang and Wang [21], and condition (C6) can be found in Kai et al. [9].

In order to obtain our results, we first prove some lemmas.

Lemma 1

Let \((X_{1}, Y_{1}),\ldots, (X_{n}, Y_{n})\) be i.i.d. random vectors, where the \(Y_{i}\) are scalar random variables. Assume further that \(E \vert Y \vert ^{r}<\infty \) and that \(\sup_{x}\int \vert y \vert ^{r}f(x,y)\,dy<\infty \), where \(f(x, y)\) denotes the joint density of \((X, Y)\). Let \(K(\cdot )\) be a bounded positive function with bounded support, satisfying a Lipschitz condition. Then

$$\begin{aligned} \sup_{{\boldsymbol{X}}\in D} \Biggl\vert n^{-1}\sum _{i=1}^{n}\bigl\{ K_{h}(X_{i}-x)Y_{i}-E \bigl[K_{h}(X_{i}-x)Y_{i}\bigr] \bigr\} \Biggr\vert =O_{p} \biggl(\frac{\log ^{1/2}(1/h)}{\sqrt{nh}} \biggr), \end{aligned}$$

provided that \(n^{2\epsilon -1}h\to \infty \) for some \(\epsilon <1-r^{-1}\), where \(K_{h}(\cdot )=K(\cdot /h)/h\).

The proof of Lemma 1 can be found in Mack and Silverman [13].

Lemma 2

Under the regularity conditions (C1)–(C6), if \(h\to 0\) and \(nh\to \infty \) as \(n\to \infty \), then

$$ \hat{\boldsymbol{\beta }}(U)-\boldsymbol{\beta }(U)=\frac{1}{\sqrt{nh}}O_{p} \bigl(h^{2}+ \log ^{1/2}(1/h)/\sqrt{nh}\bigr). $$


Proof of Lemma 2

Let \(\boldsymbol{\eta }=(nh)^{1/2}(\boldsymbol{a}-\boldsymbol{\beta }(u))\), \(\boldsymbol{\zeta }=(nh)^{1/2}h(\boldsymbol{b}-\boldsymbol{\beta }^{\prime }(u))\), \(\Delta _{i}=(nh)^{-1/2}\tilde{\boldsymbol{X}}^{\tau }_{i}[ \boldsymbol{\eta }+h^{-1}(U_{i}-u) \boldsymbol{\zeta }]\), and \(r_{i}(u)=\tilde{\boldsymbol{X}}^{\tau }_{i}\{\boldsymbol{\beta }(U_{i})-\boldsymbol{\beta }(u)- \boldsymbol{\beta }^{\prime }(u)(U_{i}-u)\}\). Recall that \((\hat{\boldsymbol{a}}^{\tau },\hat{\boldsymbol{b}}^{\tau })^{\tau }\) minimizes

$$\begin{aligned}& \sum_{i=1}^{n} \bigl\vert \tilde{Y}_{i}-\tilde{\boldsymbol{X}}^{\tau }_{i}\bigl( \boldsymbol{a}+(U_{i}-u) \boldsymbol{b}\bigr) \bigr\vert K \bigl((U_{i}-u)/h\bigr), \\& \begin{aligned} \bigl(\hat{\boldsymbol{a}}^{\tau },\hat{ \boldsymbol{b}}^{\tau }\bigr)^{\tau }={}&\mathop{\mathrm{argmin}} _{\boldsymbol{a}, \boldsymbol{b}}\sum_{i=1}^{n} \bigl\vert \tilde{Y}_{i}- \tilde{\boldsymbol{X}}^{\tau }_{i} \bigl(\boldsymbol{a}+(U_{i}-u)\boldsymbol{b}\bigr) \bigr\vert K \bigl((U_{i}-u)/h\bigr) \\ ={}&\mathop{\mathrm{argmin}} _{\boldsymbol{a}, \boldsymbol{b}}\sum_{i=1}^{n} \bigl\{ \bigl\vert (nh)^{-1/2}\tilde{\boldsymbol{X}}^{\tau }_{i} \bigl[(nh)^{1/2}\bigl(\boldsymbol{a}- \boldsymbol{\beta }(u) \bigr) \\ &{}+h^{-1}(U_{i}-u) (nh)^{1/2}h\bigl( \boldsymbol{b}-\boldsymbol{\beta }^{\prime }(u)\bigr)\bigr] \\ &{}- \bigl[\tilde{\boldsymbol{X}}^{\tau }_{i}\bigl\{ \boldsymbol{\beta }(U_{i})-\boldsymbol{\beta }(u)- \boldsymbol{\beta }^{\prime }(u) (U_{i}-u)\bigr\} +e(U_{i}) \bigr] \bigr\vert \\ &{}- \bigl\vert \tilde{\boldsymbol{X}}^{\tau }_{i}\bigl\{ \boldsymbol{\beta }(U_{i})-\boldsymbol{\beta }(u)- \boldsymbol{\beta }^{\prime }(u) (U_{i}-u)\bigr\} +e(U_{i}) \bigr\vert \bigr\} K\bigl((U_{i}-u)/h\bigr) \\ ={}&\mathop{\mathrm{argmin}} _{\boldsymbol{a}, \boldsymbol{b}}\sum_{i=1}^{n} \bigl( \bigl\vert \Delta _{i}-r_{i}(u)-e(U_{i}) \bigr\vert - \bigl\vert r_{i}(u)+e(U_{i}) \bigr\vert \bigr)K\bigl((U_{i}-u)/h\bigr). \end{aligned} \end{aligned}$$

The last equality holds because the subtracted term does not depend on the optimization variables a and b. We apply the identity

$$\begin{aligned}& \vert x-y \vert - \vert x \vert =y\bigl(2I(x\le 0)-1\bigr)+2 \int _{0}^{y}\bigl\{ I(x\le s)-I(x\le 0)\bigr\} \,ds, \\& \sum_{i=1}^{n} \bigl( \bigl\vert r_{i}(u)+e(U_{i})-\Delta _{i} \bigr\vert - \bigl\vert r_{i}(u)+e(U_{i}) \bigr\vert \bigr)K \bigl((U_{i}-u)/h\bigr) \\& \quad =2\sum_{i=1}^{n}\Delta _{i}\bigl[I\bigl(e(U_{i})\le -r_{i}(u)\bigr)-1/2 \bigr]K\bigl((U_{i}-u)/h\bigr) \\& \quad\quad{} +2\sum_{i=1}^{n}K \bigl((U_{i}-u)/h\bigr) \int _{0}^{\Delta _{i}} \bigl[I\bigl(e(U_{i})+r_{i}(u) \le s\bigr)-I\bigl(e(U_{i})+r_{i}(u)\le 0\bigr) \bigr]\,ds \\& \quad =\bigl[2\boldsymbol{W}_{n}^{\tau }\bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau }\bigr)^{\tau }+2B_{n} \bigr], \end{aligned}$$


where

$$\begin{aligned}& \boldsymbol{W}_{n}^{\tau }=(nh)^{-1/2}\sum _{i=1}^{n}K\bigl((U_{i}-u)/h\bigr) \bigl[I \bigl(e_{i} \le -r_{i}(u)/\psi (U_{i})\bigr)-1/2 \bigr]\tilde{\boldsymbol{X}}_{i}^{\tau } \biggl(I_{p}, \frac{U_{i}-u}{h}I_{p} \biggr), \\& B_{n}=\sum_{i=1}^{n}K \bigl((U_{i}-u)/h\bigr) \int _{0}^{\Delta _{i}} \bigl[I\bigl(e(U_{i})+r_{i}(u) \le s\bigr)-I\bigl(e(U_{i})+r_{i}(u)\le 0\bigr) \bigr]\,ds. \end{aligned}$$
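The manipulation above uses Knight's identity \(\vert x-y\vert -\vert x\vert =y(2I(x\le 0)-1)+2\int _{0}^{y}\{I(x\le s)-I(x\le 0)\}\,ds\); it can be checked numerically, e.g.:

```python
import numpy as np

def knight_rhs(x, y, m=200001):
    """Right-hand side of Knight's identity, with the integral over [0, y]
    approximated by a signed Riemann sum on a fine grid (y may be negative)."""
    s = np.linspace(0.0, y, m)
    integrand = (x <= s).astype(float) - float(x <= 0)
    ds = y / (m - 1)
    integral = integrand[:-1].sum() * ds
    return y * (2.0 * float(x <= 0) - 1.0) + 2.0 * integral
```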

Since the \(L_{1}\)-loss is a special case of the quantile loss function at the 0.5 quantile, the rest of the proof is similar to that of Theorem 3.1 of Kai et al. [10]. By Lemma 1, we have

$$ B_{n}=E(B_{n})+O_{p}\bigl(\log ^{1/2}(1/h)/\sqrt{nh}\bigr). $$

The conditional expectation of \(B_{n}\) can be calculated as follows:

$$\begin{aligned}& E(B_{n}| U,\boldsymbol{X}) \\& \quad = \sum_{i=1}^{n}K \bigl((U_{i}-u)/h\bigr) \int _{0}^{\Delta _{i}}\bigl\{ F\bigl(s/\psi (U_{i})-r_{i}(u)/ \psi (U_{i})\bigr)-F \bigl(-r_{i}(u)/\psi (U_{i})\bigr)\bigr\} \,ds \\& \quad = \frac{1}{2}\bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{ \zeta }^{\tau }\bigr) \Biggl( \frac{1}{nh}\sum _{i=1}^{n}\frac{1}{\psi (U_{i})}K\bigl((U_{i}-u)/h \bigr)f\bigl(-r_{i}(u)/ \psi (U_{i})\bigr) \bigl(\tilde{ \boldsymbol{X}}_{i}^{\tau }, \tilde{\boldsymbol{X}}_{i}^{\tau }(U_{i}-u)/h \bigr)^{\tau } \\& \quad\quad{} \times \bigl(\tilde{\boldsymbol{X}}_{i}^{\tau }, \tilde{\boldsymbol{X}}_{i}^{\tau }(U_{i}-u)/h\bigr) \Biggr) \bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau } \bigr)^{\tau }+O_{p}\bigl(\log ^{1/2}(1/h)/ \sqrt{nh} \bigr) \\& \quad \stackrel{\triangle }{=} \frac{1}{2}\bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{ \tau }\bigr)S_{n}\bigl( \boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau } \bigr)^{\tau }+O_{p}\bigl( \log ^{1/2}(1/h)/\sqrt{nh} \bigr), \end{aligned}$$

where
$$\begin{aligned} S_{n} =&\frac{1}{nh}\sum_{i=1}^{n} \biggl\{ \frac{1}{\psi (U_{i})}K \biggl(\frac{U_{i}-u}{h} \biggr)f \bigl(-r_{i}(u)/\psi (U_{i})\bigr) \\ &{}\times\bigl(\tilde{\boldsymbol{X}}_{i}^{\tau },\tilde{ \boldsymbol{X}}_{i}^{\tau }(U_{i}-u)/h \bigr)^{\tau }\bigl(\tilde{\boldsymbol{X}}_{i}^{\tau }, \tilde{\boldsymbol{X}}_{i}^{\tau }(U_{i}-u)/h\bigr) \biggr\} . \end{aligned}$$

It can be shown that

$$ E(S_{n})=\frac{f_{U}(u)}{\psi (u)}S+O\bigl(h^{2} \bigr), $$

where \(S=\operatorname{diag} (f(0),\mu _{2}f(0))\otimes E(\tilde{\boldsymbol{X}} \tilde{\boldsymbol{X}}^{\tau }| U=u)\), \(\mu _{2}=\int {u^{2}}K(u)\,du\). Then

$$\begin{aligned}& \boldsymbol{W}_{n}^{\tau }\bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau }\bigr)^{\tau }+B_{n} \\& \quad = \boldsymbol{W}_{n}^{\tau }\bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau }\bigr)^{\tau }+E(B_{n})+O_{p} \bigl( \log ^{1/2}(1/h)/\sqrt{nh}\bigr) \\& \quad = \boldsymbol{W}_{n}^{\tau }\bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau }\bigr)^{\tau }+ \frac{f_{U}(u)}{2\psi (u)}\bigl(\boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau }\bigr)S\bigl( \boldsymbol{\eta }^{\tau }, \boldsymbol{\zeta }^{\tau }\bigr)^{\tau }+O_{p}\bigl(h^{2}+\log ^{1/2}(1/h)/ \sqrt{nh}\bigr). \end{aligned}$$
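To make the limit matrix \(S\) concrete, the sketch below computes \(\mu _{2}=\int u^{2}K(u)\,du\) for the Epanechnikov kernel (an illustrative choice, not fixed by the text) and assembles \(S=\operatorname{diag}(f(0),\mu _{2}f(0))\otimes E(\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^{\tau }| U=u)\). The error density value \(f(0)\) and the conditional moment matrix are placeholder assumptions.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

# mu_2 = int u^2 K(u) du, midpoint rule on the kernel support [-1, 1]
N = 200_000
u = -1.0 + (np.arange(N) + 0.5) * (2.0 / N)
mu2 = float(np.sum(u ** 2 * epanechnikov(u))) * (2.0 / N)   # exact value is 1/5

f0 = 1.0 / np.sqrt(2.0 * np.pi)            # f(0) for standard normal errors (assumption)
Exx = np.array([[1.0, 0.3], [0.3, 2.0]])   # placeholder for E(X X^T | U = u)
S = np.kron(np.diag([f0, mu2 * f0]), Exx)  # block-diagonal limit matrix
```

For the Epanechnikov kernel \(\mu _{2}=1/5\) exactly, which the midpoint rule reproduces to high accuracy; the Kronecker product yields the two diagonal blocks \(f(0)E(\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^{\tau }|U=u)\) and \(\mu _{2}f(0)E(\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^{\tau }|U=u)\).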

Similar to the proof of Theorem 3.1 of Kai et al. [10], by applying the convexity lemma of Pollard [17] and the quadratic approximation lemma of Fan et al. [4], the minimizer can be expressed as

$$ \bigl(\hat{\boldsymbol{\eta }}^{\tau },\hat{\boldsymbol{\zeta }}^{\tau }\bigr)^{\tau }=- \frac{\psi (u)}{f_{U}(u)}S^{-1} \boldsymbol{W}_{n}+O_{p}\bigl(h^{2}+\log ^{1/2}(1/h)/ \sqrt{nh}\bigr). $$

That is,

$$\begin{aligned} \hat{\boldsymbol{\eta }} =&-\frac{\psi (u)}{f_{U}(u)f(0)}\bigl(E\bigl(\tilde{ \boldsymbol{X}} \tilde{\boldsymbol{X}}^{\tau }| U=u\bigr) \bigr)^{-1} \sum_{i=1}^{n} \biggl\{ \frac{1}{\sqrt{nh}}K\bigl((U_{i}-u)/h\bigr) \\ &{}\times \bigl[I\bigl(e_{i}\le -r_{i}(u)/\psi (U_{i})\bigr)-1/2 \bigr] \tilde{\boldsymbol{X}}_{i} \biggr\} +O_{p}\bigl(h^{2}+\log ^{1/2}(1/h)/\sqrt{nh} \bigr). \end{aligned}$$

Hence, we get

$$\begin{aligned} \hat{\boldsymbol{\beta }}(u)-\boldsymbol{\beta }(u) =&- \frac{\psi (u)}{f_{U}(u)f(0)}\bigl(E\bigl( \tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^{\tau }| U=u\bigr)\bigr)^{-1} \sum_{i=1}^{n} \biggl\{ \frac{1}{nh}K\bigl((U_{i}-u)/h\bigr) \\ &{}\times \bigl[I\bigl(e_{i}\le -r_{i}(u)/\psi (U_{i})\bigr)-1/2 \bigr] \tilde{\boldsymbol{X}}_{i} \biggr\} +\frac{1}{\sqrt{nh}}O_{p}\bigl(h^{2}+\log ^{1/2}(1/h)/ \sqrt{nh}\bigr). \end{aligned}$$

It follows that the asymptotic expansion of \(\hat{\boldsymbol{\beta }}(U_{k})\) is

$$\begin{aligned} \hat{\boldsymbol{\beta }}(U_{k}) =&\boldsymbol{\beta }(U_{k})- \frac{\psi (U_{k})}{f_{U}(U_{k})f(0)}\bigl(E\bigl(\tilde{\boldsymbol{X}}_{k} \tilde{\boldsymbol{X}}_{k}^{\tau }| U_{k}\bigr) \bigr)^{-1} \sum_{i=1}^{n} \biggl\{ \frac{1}{nh}K\bigl((U_{i}-U_{k})/h\bigr) \\ & {} \times \bigl[I\bigl(e_{i}\le -r_{i}(U_{k})/ \psi (U_{i})\bigr)-1/2 \bigr] \tilde{\boldsymbol{X}}_{i} \biggr\} +\frac{1}{\sqrt{nh}}O_{p}\bigl(h^{2}+\log ^{1/2}(1/h)/ \sqrt{nh}\bigr) \end{aligned}$$

for \(k=1,\ldots, n\).
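In practice, the kernel-weighted \(L_{1}\) criterion behind these expansions is a linear program and can be solved exactly. Below is a minimal sketch (not the authors' implementation) using SciPy's `linprog` with an Epanechnikov kernel; both are assumed choices.

```python
import numpy as np
from scipy.optimize import linprog

def local_lad(y, X, U, u, h):
    """Local linear LAD fit at the point u.

    Minimizes sum_i w_i |y_i - Z_i^T theta| with Z_i = (X_i, X_i*(U_i - u))
    and w_i = K((U_i - u)/h), via the standard LP reformulation
    min sum_i w_i t_i  s.t.  -t_i <= y_i - Z_i^T theta <= t_i, t_i >= 0.
    """
    w = np.maximum(0.75 * (1.0 - ((U - u) / h) ** 2), 0.0)  # Epanechnikov weights
    Z = np.hstack([X, X * (U - u)[:, None]])                # local linear design
    n, p = Z.shape
    c = np.concatenate([np.zeros(p), w])                    # variables: theta (free), t >= 0
    A_ub = np.block([[Z, -np.eye(n)], [-Z, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0.0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]   # (a_hat, b_hat) stacked
```

The first block of the returned vector estimates \(\boldsymbol{\beta }(u)\) and the second block \(\boldsymbol{\beta }^{\prime }(u)\), matching \((\hat{\boldsymbol{a}},\hat{\boldsymbol{b}})\) in the minimization above.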

We split the second term in the previous expression into two parts \(R_{1k}+R_{2k}\), where

$$\begin{aligned} R_{1k}&=-\frac{\psi (U_{k})}{f_{U}(U_{k})f(0)}\bigl(E\bigl(\tilde{\boldsymbol{X}}_{k} \tilde{\boldsymbol{X}}^{\tau }_{k}| U_{k}\bigr) \bigr)^{-1} \sum_{i=1}^{n} \biggl\{ \frac{1}{nh}K\bigl((U_{i}-U_{k})/h\bigr) \\ &\quad {} \times \biggl[I\biggl(e_{i}\le -\frac{r_{i}(U_{k})}{\psi (U_{i})} \biggr)-I(e_{i} \le 0) \biggr] \tilde{\boldsymbol{X}}_{i} \biggr\} \end{aligned}$$

and
$$\begin{aligned} \begin{aligned} R_{2k}&=-\frac{\psi (U_{k})}{f_{U}(U_{k})f(0)}\bigl(E\bigl(\tilde{\boldsymbol{X}}_{k} \tilde{\boldsymbol{X}}^{\tau }_{k}| U_{k}\bigr) \bigr)^{-1} \sum_{i=1}^{n} \biggl\{ \frac{1}{nh}K\bigl((U_{i}-U_{k})/h\bigr) \\ &\quad {} \times \biggl[I(e_{i}\le 0)-\frac{1}{2} \biggr] \tilde{ \boldsymbol{X}}_{i} \biggr\} . \end{aligned} \end{aligned}$$

For \(R_{1k}\), we have
$$ E\bigl( \Vert R_{1k} \Vert ^{2}\bigr)=\frac{1}{n^{2}h^{2}} \sum_{i=1}^{n}E(R_{ii,1k})+ \frac{2}{n^{2}h^{2}}\sum_{i\ne j}E(R_{ij,1k}), $$

where
$$\begin{aligned}& \begin{aligned} R_{ii,1k}&=\frac{\psi ^{2}(U_{k})}{f_{U}^{2}(U_{k})f^{2}(0)} \tilde{ \boldsymbol{X}}_{i}^{\tau }\bigl(E\bigl(\tilde{ \boldsymbol{X}}_{k}\tilde{\boldsymbol{X}}_{k}^{\tau }| U_{k}\bigr)\bigr)^{-2} \tilde{\boldsymbol{X}}_{i} \\ &\quad{} \times \biggl[I\biggl(e_{i}\le -\frac{r_{i}(U_{k})}{\psi (U_{i})} \biggr)-I(e_{i} \le 0) \biggr]^{2}K^{2} \bigl((U_{i}-U_{k})/h\bigr) ,\end{aligned} \\& \begin{aligned} R_{ij,1k}&=\frac{\psi ^{2}(U_{k})}{f_{U}^{2}(U_{k})f^{2}(0)} \tilde{ \boldsymbol{X}}_{i}^{\tau }\bigl(E\bigl(\tilde{ \boldsymbol{X}}_{k}\tilde{\boldsymbol{X}}^{\tau }_{k}| U_{k}\bigr)\bigr)^{-2} \tilde{\boldsymbol{X}}_{j}K \bigl((U_{i}-U_{k})/h\bigr)K\bigl((U_{j}-U_{k})/h \bigr) \\ &\quad{} \times \bigl[I\bigl(e_{i}\le -r_{i}(U_{k})/ \psi (U_{i})\bigr)-I(e_{i}\le 0) \bigr] \bigl[I \bigl(e_{j}\le -r_{j}(U_{k})/\psi (U_{j})\bigr)-I(e_{j}\le 0) \bigr]. \end{aligned} \end{aligned}$$

By the fact

$$ \bigl[I\bigl(e_{i}\le -r_{i}(U_{k})/\psi (U_{i})\bigr)-I(e_{i}\le 0) \bigr]^{2}= \bigl\vert I\bigl(e_{i}\le -r_{i}(U_{k})/\psi (U_{i})\bigr)-I(e_{i}\le 0) \bigr\vert , $$

and assuming, without loss of generality, that \(-\frac{r_{i}(U_{k})}{\psi (U_{i})}>0\), we have

$$\begin{aligned}& E(R_{ii,1k}) \\& \quad = E\bigl\{ E(R_{ii,1k})| U_{k},U_{i}\bigr\} \\& \quad = E \biggl\{ \frac{\psi ^{2}(U_{k})}{f_{U}^{2}(U_{k})f^{2}(0)}E\bigl( \tilde{\boldsymbol{X}}_{i}^{\tau } \bigl[E\bigl(\tilde{\boldsymbol{X}}_{k}\tilde{\boldsymbol{X}}^{\tau }_{k}| U_{k}\bigr)\bigr]^{-2} \tilde{\boldsymbol{X}}_{i}| U_{i}\bigr) \\& \quad\quad{}\times \bigl[F\bigl(-r_{i}(U_{k})/\psi (U_{i})\bigr)-F(0) \bigr]K^{2}\bigl((U_{i}-U_{k})/h \bigr) \biggr\} \\& \quad = E \biggl\{ \frac{\psi ^{2}(U_{k})}{f_{U}^{2}(U_{k})f^{2}(0)\psi (U_{i})}E\bigl( \tilde{\boldsymbol{X}}_{i}^{\tau } \bigl[E\bigl(\tilde{\boldsymbol{X}}_{k}\tilde{\boldsymbol{X}}^{\tau }_{k}| U_{k}\bigr)\bigr]^{-2} \tilde{\boldsymbol{X}}_{i}| U_{i}\bigr) \\& \quad\quad{}\times f(\xi ) \bigl(-r_{i}(U_{k}) \bigr)K^{2}\bigl((U_{i}-U_{k})/h\bigr) \biggr\} \\& \quad \le M \int \int \frac{\psi ^{2}(U_{k})f_{U}(U_{i})}{f_{U}(U_{k})\psi (U_{i})}E\bigl( \tilde{\boldsymbol{X}}_{i}^{\tau } \bigl[E\bigl(\tilde{\boldsymbol{X}}_{k}\tilde{\boldsymbol{X}}^{\tau }_{k}| U_{k}\bigr)\bigr]^{-2} \tilde{\boldsymbol{X}}_{i}| U_{i}\bigr) \\& \quad\quad{}\times(U_{i}-U_{k})^{2}K^{2} \bigl((U_{i}-U_{k})/h\bigr)\,dU_{k} \,dU_{i}, \end{aligned}$$

where ξ lies between 0 and \(-\frac{r_{i}(U_{k})}{\psi (U_{i})}\) and \(M\) is a positive constant. Noting that \(K(\cdot )\) is a symmetric function, we have \(E(R_{ii,1k})=O(h^{3})\) uniformly in k. In the same spirit, we can prove \(E(R_{ij,1k})=O(h^{6})\) uniformly in k. It follows that

$$ E\bigl( \Vert R_{1k} \Vert ^{2}\bigr)= \frac{1}{n^{2}h^{2}}nO\bigl(h^{3}\bigr)+\frac{2}{n^{2}h^{2}}n(n-1)O \bigl(h^{6}\bigr)=O\bigl(h^{4}\bigr) $$

uniformly for k.

For \(R_{2k}\), noting that

$$ E \Biggl\{ \sum_{i=1}^{n}K \bigl((U_{i}-U_{k})/h\bigr) \bigl[I(e_{i}\le 0)-1/2 \bigr] \tilde{X}_{ir} \Biggr\} ^{2}=O(nh), $$

we have \(R_{2k}=O_{p}(\frac{1}{\sqrt{nh}})\).

Therefore, we can obtain (A.1). □
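The \(O(nh)\) variance bound used for \(R_{2k}\) can be checked by simulation. The uniform design, standard normal errors, and Epanechnikov kernel below are illustrative assumptions; for this design \(\operatorname{Var} (\sum_{i}K((U_{i}-u)/h)[I(e_{i}\le 0)-1/2] )\approx nh\,f_{U}(u)\int K^{2}(t)\,dt/4=0.15\,nh\), since \(\int K^{2}=3/5\) for the Epanechnikov kernel.

```python
import numpy as np

def kernel_sum_var(n, h, u=0.5, reps=400, seed=3):
    """Empirical variance of sum_i K((U_i - u)/h) * (1{e_i <= 0} - 1/2)."""
    rng = np.random.default_rng(seed)
    vals = np.empty(reps)
    for r in range(reps):
        U = rng.uniform(0.0, 1.0, n)
        e = rng.normal(0.0, 1.0, n)
        K = np.maximum(0.75 * (1.0 - ((U - u) / h) ** 2), 0.0)  # Epanechnikov
        vals[r] = np.sum(K * ((e <= 0.0) - 0.5))
    return vals.var()
```

With \(n=2000\) and \(h=0.1\) the empirical variance is close to \(0.15\,nh=30\), consistent with the sum being \(O_{p}(\sqrt{nh})\).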

Lemma 3

Under assumptions (C1)–(C6), if \(nh^{2}/\log (1/h)\to \infty \) and \(nh^{4}\to 0\) as \(n\to \infty \), we have

$$ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \hat{V}_{i}(\gamma _{r}) \stackrel{D}{\to }N\bigl(0,A( \gamma _{r})\bigr), $$

where\(A(\gamma _{r})=\gamma ^{2}_{r} E(\psi (U_{i})-\phi _{r}(U_{i}))^{2} E(X^{2}_{ir})\).


Proof

Some elementary calculations give

$$ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \hat{V}_{i}(\gamma _{r})= \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\bigl(\beta _{r}(U_{i})- \gamma _{r}\bigr) \tilde{X}_{ir}+ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\bigl(\hat{\beta }_{r}(U_{i})- \beta _{r}(U_{i})\bigr) \tilde{X}_{ir}. $$

By the central limit theorem for sums of independent and identically distributed random variables, we obtain

$$ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl(\beta _{r}(U_{i})-\gamma _{r}\bigr) \tilde{X}_{ir}\stackrel{D}{\to }N\bigl(0,A(\gamma _{r}) \bigr). $$

By Lemma 2, we can show that \(n^{-1/2}\sum_{i=1}^{n}(\hat{\beta }_{r}(U_{i})-\beta _{r}(U_{i})) \tilde{X}_{ir}\stackrel{P}{\to }0\). This together with (A.5) and (A.6) proves Lemma 3. □

Lemma 4

Under the conditions of Lemma 2, we have

$$ \frac{1}{n}\sum_{i=1}^{n} \hat{V}^{2}_{i}(\gamma _{r}) \stackrel{P}{\to }A( \gamma _{r}). $$

Proof

Write
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n} \hat{V}^{2}_{i}(\gamma _{r})={}&\frac{1}{n} \sum_{i=1}^{n}\bigl(\beta _{r}(U_{i})-\gamma _{r}\bigr)^{2} \tilde{X}^{2}_{ir}+ \frac{1}{n}\sum _{i=1}^{n}\bigl(\hat{\beta}_{r}(U_{i})- \beta _{r}(U_{i})\bigr)^{2} \tilde{X}^{2}_{ir} \\ &{} +\frac{2}{n}\sum_{i=1}^{n} \bigl(\hat{\beta}_{r}(U_{i})-\beta _{r}(U_{i}) \bigr) \bigl( \beta _{r}(U_{i})-\gamma _{r}\bigr) \tilde{X}^{2}_{ir} \\ =:{}&M_{1}+M_{2}+M_{3}. \end{aligned}$$

By the law of large numbers, we obtain \(M_{1}\stackrel{P}{\to }A(\gamma _{r})\). Hence, to prove Lemma 4, we only need to show that \(M_{l}\stackrel{P}{\to }0\), \(l=2,3\).

By conditions (C1)–(C3) and Lemma 2, we obtain

$$ \vert M_{2} \vert \leq \frac{1}{n}\max_{1\leq i\leq n} \bigl(\hat{\beta }_{r}(U_{i})- \beta _{r}(U_{i}) \bigr)^{2}\sum_{i=1}^{n} \tilde{X}^{2}_{ir}=o_{p}(1). $$

By a similar argument, for \(M_{3}\) we have

$$ \vert M_{3} \vert \leq \frac{2}{n}\max _{1\leq i\leq n} \bigl\vert \hat{\beta }_{r}(U_{i})- \beta _{r}(U_{i}) \bigr\vert \sum _{i=1}^{n}\bigl(\beta _{r}(U_{i})- \gamma _{r}\bigr) \tilde{X}^{2}_{ir}=o_{p}(1). $$

The proof is completed. □

Lemma 5

Under the assumptions of Theorem 3, we have

$$ \max_{1\leq i\leq n} \bigl\vert \hat{V}_{i}(\gamma _{r}) \bigr\vert =o_{p}\bigl(n^{1/2}\bigr). $$


Proof

Some elementary calculation yields

$$ \max_{1\leq i\leq n} \bigl\vert \hat{V}_{i}(\gamma _{r}) \bigr\vert \leq \max_{1\leq i\leq n} \bigl\vert \bigl(\beta _{r}(U_{i})-\gamma _{r}\bigr) \tilde{X}_{ir} \bigr\vert + \max_{1\leq i\leq n} \bigl\vert \bigl(\hat{\beta }_{r}(U_{i})-\beta _{r}(U_{i}) \bigr) \tilde{X}_{ir} \bigr\vert . $$

By conditions (C1) and (C2), we have \(\frac{1}{n}\sum_{i=1}^{n}(\beta _{r}(U_{i})-\gamma _{r})^{2} \tilde{X}^{2}_{ir}\stackrel{\mathit{a.s.}}{\to }A(\gamma _{r})<\infty \). This implies that \(\max_{1\leq i\leq n} \vert (\beta _{r}(U_{i})-\gamma _{r})\tilde{X}_{ir} \vert =o_{p}(n^{1/2})\). By Markov’s inequality, for any \(\kappa >0\),

$$\begin{aligned} P\Bigl\{ \max_{1\leq i\leq n} \bigl\vert \bigl(\hat{ \beta }_{r}(U_{i})-\beta _{r}(U_{i}) \bigr) \tilde{X}_{ir} \bigr\vert >\kappa n^{1/2} \Bigr\} &\leq \sum _{i=1}^{n} P\bigl\{ \bigl\vert \bigl(\hat{\beta }_{r}(U_{i})- \beta _{r}(U_{i})\bigr) \tilde{X}_{ir} \bigr\vert >\kappa \sqrt{n}\bigr\} \\ &\leq \frac{1}{n\kappa ^{2}}\sum_{i=1}^{n}E \bigl\{ \bigl[\hat{\beta }_{r}(U_{i})- \beta _{r}(U_{i})\bigr]^{2}\tilde{X}^{2}_{ir} \bigr\} \to 0. \end{aligned}$$

That is, \(\max_{1\leq i\leq n} \vert (\hat{\beta }_{r}(U_{i})-\beta _{r}(U_{i})) \tilde{X}_{ir} \vert =o_{p}(n^{1/2})\). The proof is completed. □

Lemma 6

Under the conditions of Theorem 3, we have

$$ \lambda =O_{p}\bigl(n^{-1/2}\bigr). $$


Proof

This lemma follows from Lemmas 3–5 by the same method as in Owen [14]; we omit the details. □

Proof of Theorem 1

The proposed estimator \(\hat{\gamma }_{r}\) can be written as

$$\begin{aligned} \hat{\gamma }_{r} =&\frac{1}{\bar{\tilde{X}}_{r}}\sum _{i=1}^{n} \frac{1}{n}\hat{\beta }_{r}(U_{i})\tilde{X}_{ir} \\ =&\frac{1}{\bar{\tilde{X}}_{r}}\frac{1}{n}\sum_{i=1}^{n} \bigl[ \hat{\beta }_{r}(U_{i})-\beta _{r}(U_{i}) \bigr]\tilde{X}_{ir} + \frac{1}{\bar{\tilde{X}}_{r}}\frac{1}{n}\sum _{i=1}^{n}\beta _{r}(U_{i}) \tilde{X}_{ir} \\ =&Q_{1}+Q_{2}. \end{aligned}$$

Applying Lemma 2, we have

$$\begin{aligned}& Q_{1}=O_{p}(C_{n})\frac{1}{\bar{\tilde{X}}_{r}} \frac{1}{n}\sum_{i=1}^{n} \tilde{X}_{ir}=O_{p}(C_{n}), \\& \begin{aligned} Q_{2}&=\frac{1}{\bar{\tilde{X}}_{r}}\frac{1}{n}\sum _{i=1}^{n} \biggl[\gamma _{r} \frac{\psi (U_{i})}{\phi _{r}(U_{i})}\phi _{r}(U_{i}) X_{ir} \biggr] \\ &=\gamma _{r}\frac{1}{\bar{\tilde{X}}_{r}}\frac{1}{n}\sum _{i=1}^{n} \psi (U_{i})X_{ir}. \end{aligned} \end{aligned}$$

Clearly, \(\bar{\tilde{X}}_{r}=E(X_{r})+O_{p}(n^{-1/2})\). Applying the law of large numbers to \(\frac{1}{n}\sum_{i=1}^{n}\psi (U_{i})X_{ir}\), we get \(Q_{2}=\gamma _{r}+O_{p}(n^{-1/2})\), and Theorem 1 holds. □
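A small simulation illustrating the identity behind \(Q_{2}\): with the true varying coefficient \(\beta _{r}(u)=\gamma _{r}\psi (u)/\phi _{r}(u)\) plugged in for its estimate, the weighted average recovers \(\gamma _{r}\). The distortion functions below are illustrative choices satisfying the usual CAR identifiability condition \(E\psi (U)=E\phi _{r}(U)=1\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma_r = 200_000, 1.5
U = rng.uniform(0.0, 1.0, n)

psi   = lambda u: 1.0 + 0.5 * (u - 0.5)   # distortion of the response (mean 1)
phi_r = lambda u: 1.0 - 0.4 * (u - 0.5)   # distortion of predictor r (mean 1)

X_r    = rng.normal(2.0, 1.0, n)          # latent (undistorted) predictor
Xt_r   = phi_r(U) * X_r                   # observed distorted predictor
beta_r = gamma_r * psi(U) / phi_r(U)      # varying coefficient implied by the CAR model

# gamma_hat = (n^{-1} sum_i beta_r(U_i) Xt_ir) / mean(Xt_r), with beta_hat replaced by beta_r
gamma_hat = np.mean(beta_r * Xt_r) / np.mean(Xt_r)
```

Since \(U\) is independent of \(X_{r}\) and the distortions have mean one, the numerator converges to \(\gamma _{r}E(X_{r})\) and the denominator to \(E(X_{r})\), so `gamma_hat` is close to 1.5.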

Proof of Theorem 2

Motivated by methodology in Sentürk and Müller [19], we show

$$ \sqrt{n} \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n}\hat{\beta }_{r}(U_{i})\tilde{X}_{ir}- \gamma _{r}E(X_{r}) \\ \bar{\tilde{X}}_{r}-E(X_{r}) \end{pmatrix}\stackrel{D}{\to }N\left ({ \mathbf{0}}, \begin{pmatrix} \varSigma _{r,11},&\varSigma _{r,12} \\ \varSigma _{r,21},&\varSigma _{r,22}\end{pmatrix} \right ), $$

where
$$\begin{aligned}& \varSigma _{r,11}=\gamma _{r}^{2}E \bigl(X_{r}^{2}\bigr)\operatorname{var}\bigl(\psi (U)\bigr)+ \gamma _{r}^{2}\operatorname{var}(X_{r}), \\& \varSigma _{r,12}=\varSigma _{r,21}=\gamma _{r}\bigl[E \bigl(\phi _{r}(U)\psi (U)\bigr)E\bigl(X_{r}^{2} \bigr)-\bigl(E(X_{r})\bigr)^{2}\bigr],\quad \quad \varSigma _{r,22}=\operatorname{var}\bigl(\phi _{r}(U)X_{r} \bigr). \end{aligned}$$

The asymptotic normality of \(\sqrt{n}(\hat{\gamma }_{r}-\gamma _{r})\) for \(r=0,\ldots,p\) will follow from (A.8) by a simple application of the δ-method, since \(\hat{\gamma }_{r}=\sum_{i=1}^{n}n^{-1}\hat{\beta }_{r}(U_{i}) \tilde{X}_{ir}/\bar{\tilde{X}}_{r}\) as defined in (2.4). In view of the Cramér–Wald device, we need only verify that, for any real numbers a and b,

$$ \sqrt{n}\Biggl\{ a\Biggl[\sum_{i=1}^{n} \bigl(\hat{\beta }_{r}(U_{i})\tilde{X}_{ir} \bigr)/n- \gamma _{r}E(X_{r})\Biggr]+b\bigl[\bar{ \tilde{X}}_{r}-E(X_{r})\bigr]\Biggr\} \stackrel{D}{\to }N \bigl(0,\sigma _{r}^{*2}\bigr), $$

where
$$\begin{aligned} \sigma ^{*2}_{r} =&a^{2}\gamma _{r}^{2}E\bigl(X_{r}^{2}\bigr) \operatorname{var}\bigl(\psi (U)\bigr)+a^{2} \gamma _{r}^{2} \operatorname{var}(X_{r}) \\ & {} +2ab\gamma _{r}\bigl[E\bigl(\phi _{r}(U)\psi (U)\bigr)E \bigl(X_{r}^{2}\bigr)-\bigl(E(X_{r}) \bigr)^{2}\bigr]+b^{2} \operatorname{var}\bigl(\phi _{r}(U)X_{r}\bigr). \end{aligned}$$

Note that
$$\begin{aligned}& \sqrt{n} \Biggl\{ a\Biggl[\sum_{i=1}^{n} \bigl(\hat{\beta }_{r}(U_{i})\tilde{X}_{ir} \bigr)/n- \gamma _{r}E(X_{r})\Biggr]+b\bigl[\bar{ \tilde{X}}_{r}-E(X_{r})\bigr] \Biggr\} \\& \quad = \sqrt{n} \Biggl\{ \frac{a}{n}\sum_{i=1}^{n} \bigl[\bigl(\hat{\beta }_{r}(U_{i})- \beta _{r}(U_{i})\bigr)\tilde{X}_{ir}\bigr]+ \frac{a}{n}\sum_{i=1}^{n}\bigl[\beta _{r}(U_{i}) \tilde{X}_{ir}\bigr]-{a\gamma _{r}E(X_{r})} \\& \quad\quad {} +\frac{b}{n}\sum_{i=1}^{n} \tilde{X}_{ir}-{ bE(X_{r})} \Biggr\} \\& \quad = \sqrt{n} \Biggl\{ \frac{a}{n}\sum_{i=1}^{n} \bigl[\bigl(\hat{\beta }_{r}(U_{i})- \beta _{r}(U_{i})\bigr)\tilde{X}_{ir}\bigr] \Biggr\} + \sqrt{n} \Biggl\{ \frac{a}{n}\sum_{i=1}^{n} \bigl[\beta _{r}(U_{i})\tilde{X}_{ir}\bigr]-a \gamma _{r}E(X_{r}) \\ & \quad\quad {} +\frac{b}{n}\sum_{i=1}^{n} \tilde{X}_{ir}-bE(X_{r}) \Biggr\} \\ & \quad = I_{1}+I_{2}. \end{aligned}$$

For \(I_{1}\), using Lemma 2 and the conditions in (C1)–(C6), we have

$$\begin{aligned}& \sqrt{n}\sum_{i=1}^{n}\frac{1}{n} \bigl(\bigl(\hat{\beta }_{r}(U_{i})-\beta _{r}(U_{i})\bigr) \tilde{X}_{ir}\bigr) \\ & \quad = \sqrt{n}\sum_{i=1}^{n} \frac{1}{n} \Biggl\{ - \frac{\psi (U_{i})}{f_{U}(U_{i})f(0)} \sum_{j=1}^{n} \biggl( \frac{1}{nh}K\bigl((U_{j}-U_{i})/h\bigr) \biggl[I(e_{j}\le 0)-\frac{1}{2} \biggr] \\ & \quad \quad{} \times \bigl(E\bigl(\tilde{\boldsymbol{X}}_{i}\tilde{ \boldsymbol{X}}_{i}^{\tau }| U_{i} \bigr)^{-1} \tilde{\boldsymbol{X}}_{j}\bigr)_{r} \biggr)+\frac{1}{\sqrt{nh}}O_{p}\bigl(h^{2}+ \log ^{1/2}(1/h)/ \sqrt{nh}\bigr) \Biggr\} \tilde{X}_{ir} \\ & \quad = \sqrt{n}\sum_{i=1}^{n} \frac{1}{n} \Biggl\{ - \frac{\psi (U_{i})}{f_{U}(U_{i})f(0)} \sum_{j=1}^{n} \biggl( \frac{1}{nh}K\bigl((U_{j}-U_{i})/h\bigr) \biggl[I(e_{j}\le 0)-\frac{1}{2} \biggr] \\ & \quad \quad{} \times \bigl(E\bigl(\tilde{\boldsymbol{X}}_{i}\tilde{ \boldsymbol{X}}_{i}^{\tau }| U_{i} \bigr)^{-1} \tilde{\boldsymbol{X}}_{j}\bigr)_{r} \biggr) \Biggr\} \tilde{X}_{ir} +o_{p}(1) \\ & \quad = G_{n}+o_{p}(1). \end{aligned}$$

Now, let us deal with \(G_{n}\). Notice that \(E(I(e_{j}\le 0)-1/2)=0\). Hence, we get

$$\begin{aligned} E\bigl(G_{n}^{2}\bigr) \le & \frac{C}{n}\sum _{j=1}^{n}E \Biggl\{ \sum_{i=1}^{n} \frac{1}{nh}K\bigl((U_{j}-U_{i})/h\bigr) \biggl[I(e_{j}\le 0)-\frac{1}{2} \biggr] \tilde{X}_{ir} \Biggr\} ^{2} \\ =&\frac{1}{(nh)^{2}}O(nh)=o(1). \end{aligned}$$

This implies \(I_{1}=o_{p}(1)\). By the central limit theorem,

$$ I_{2}\stackrel{D}{\to }N\bigl(0,\sigma _{r}^{*2} \bigr). $$

This completes the proof of Theorem 2. □

Proof of Theorem 3

Applying a Taylor expansion to (3.1) and using Lemmas 3–6, we obtain

$$ \hat{L}_{n}(\gamma _{r})=2\sum _{i=1}^{n}\bigl\{ \lambda \hat{V}_{i}( \gamma _{r})-\bigl[\lambda \hat{V}_{i}(\gamma _{r})\bigr]^{2}/2\bigr\} +o_{p}(1). $$

By Eq. (3.2), we have

$$\begin{aligned} 0&=\frac{1}{n}\sum_{i=1}^{n} \frac{\hat{V}_{i}(\gamma _{r})}{1+\lambda \hat{V}_{i}(\gamma _{r})} \\ &=\frac{1}{n}\sum_{i=1}^{n} \hat{V}_{i}(\gamma _{r})-\frac{1}{n}\sum _{i=1}^{n} \hat{V}^{2}_{i}(\gamma _{r})\lambda + \frac{1}{n}\sum_{i=1}^{n} \frac{\hat{V}^{3}_{i}(\gamma _{r})\lambda ^{2}}{1+\lambda \hat{V}_{i}(\gamma _{r})}. \end{aligned}$$

By Lemmas 3–6, the final term of (A.11) has norm bounded by

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n} \bigl\vert \hat{V}^{3}_{i}(\gamma _{r}) \bigr\vert \lambda ^{2} \bigl\vert 1+ \lambda \hat{V}_{i}(\gamma _{r}) \bigr\vert ^{-1}&\leq O_{p} \bigl(n^{-1}\bigr)\max_{1\leq i\leq n} \bigl\vert \hat{V}_{i}(\gamma _{r}) \bigr\vert \frac{1}{n}\sum _{i=1}^{n} \bigl\vert \hat{V}_{i}( \gamma _{r}) \bigr\vert ^{2} \\ &=O_{p}\bigl(n^{-1}\bigr)o_{p} \bigl(n^{1/2}\bigr)O_{p}(1) \\ &=o_{p}\bigl(n^{-1/2}\bigr). \end{aligned}$$

This, together with (A.11), yields

$$\begin{aligned}& \sum_{i=1}^{n}\bigl[\hat{V}_{i}( \gamma _{r})\lambda \bigr]^{2}=\sum _{i=1}^{n} \hat{V}_{i}(\gamma _{r})\lambda +o_{p}(1), \\& \lambda = \Biggl[\sum_{i=1}^{n} \hat{V}^{2}_{i}(\gamma _{r}) \Biggr]^{-1} \sum_{i=1}^{n}\hat{V}_{i}(\gamma _{r})+o_{p}\bigl(n^{-1/2}\bigr). \end{aligned}$$

Then, by (A.10), we have

$$ \hat{L}_{n}(\gamma _{r})= \Biggl[\frac{1}{\sqrt{n}}\sum _{i=1}^{n} \hat{V}_{i}(\gamma _{r}) \Biggr]^{2} \Biggl[\frac{1}{n}\sum _{i=1}^{n} \hat{V}^{2}_{i}(\gamma _{r}) \Biggr]^{-1}+o_{p}(1). $$

This, together with Lemmas 3 and 4, completes the proof. □
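In practice, the empirical likelihood ratio above is computed by solving the estimating equation (3.2) for λ numerically. The following is a minimal Newton-type sketch in the spirit of Owen's construction (not the authors' code); the step-halving guard keeps all weights \(1+\lambda \hat{V}_{i}\) positive.

```python
import numpy as np

def el_log_ratio(V, tol=1e-12, max_iter=200):
    """Return (2*sum log(1 + lam*V_i), lam), with lam solving sum V_i/(1+lam*V_i) = 0."""
    lam = 0.0
    for _ in range(max_iter):
        denom = 1.0 + lam * V
        g = np.sum(V / denom)               # estimating equation, cf. (3.2)
        dg = -np.sum((V / denom) ** 2)      # its derivative in lam (always < 0)
        new = lam - g / dg                  # Newton step
        while np.any(1.0 + new * V <= 0.0): # step-halving: stay in the domain of log
            new = 0.5 * (lam + new)
        if abs(new - lam) < tol:
            lam = new
            break
        lam = new
    return 2.0 * np.sum(np.log1p(lam * V)), lam
```

By the theorem, the returned statistic is asymptotically \(\chi ^{2}_{1}\) under the null, so comparing it with a \(\chi ^{2}_{1}\) quantile yields the empirical likelihood confidence interval for \(\gamma _{r}\).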

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Sun, Y., Wang, D. \(L_{1}\)-Estimation for covariate-adjusted regression. J Inequal Appl 2020, 75 (2020).



  • Covariate-adjusted regression
  • Least absolute deviation estimation
  • Asymptotic normality
  • Local linear estimate