Adaptive bridge estimation for high-dimensional regression models
Journal of Inequalities and Applications volume 2016, Article number: 258 (2016)
Abstract
In high-dimensional models, penalized methods are an effective tool for variable selection. We propose an adaptive bridge method and show that it has the oracle property. The effectiveness of the proposed method is demonstrated by numerical results.
Introduction
For the classical linear regression model \(Y=X\beta+\varepsilon\), we are interested in the problem of variable selection and estimation, where \(Y=(y_{1},y_{2},\ldots,y_{n})^{T}\) is the response vector, \(X=(X_{1},X_{2},\ldots,X_{p})=(x_{1},x_{2},\ldots,x_{n})^{T}=(x_{ij})_{n\times p}\) is an \(n\times p\) design matrix, and \(\varepsilon=(\varepsilon_{1},\varepsilon_{2},\ldots,\varepsilon_{n})^{T}\) is a random error vector. The main question is how to estimate the coefficient vector \(\beta\in\mathrm{R}^{p}\) when p increases with the sample size n and many elements of β equal zero. This problem can be cast as the minimization of a penalized least squares objective function
\[\hat{\beta}=\arg\min_{\beta}\Biggl\{\|Y-X\beta\|^{2}+\lambda\sum_{j=1}^{p}|\beta_{j}|^{\zeta}\Biggr\},\]
where \(\|\cdot\|\) is the \(l_{2}\) norm of the vector and λ is a tuning parameter. For \(\zeta>0\), β̂ is called the bridge estimator, proposed by Frank and Friedman [1]. There are two well-known special cases of the bridge estimator. If \(\zeta=2\), it is the ridge estimator of Hoerl and Kennard [2]; if \(\zeta=1\), it is the Lasso estimator of Tibshirani [3], which does not possess the oracle property in the sense of Fan and Li [4]. For \(0<\zeta\leq 1\), Knight and Fu [5] studied the asymptotic distributions of bridge estimators when the number of covariates is fixed and provided a theoretical justification for the use of bridge estimators to select variables. The bridge estimators can distinguish between the covariates whose coefficients are exactly zero and the covariates whose coefficients are nonzero. There is a large statistical literature on penalization-based methods. Examples include the SCAD by Fan and Li [4], the elastic net by Zou and Hastie [6], the adaptive lasso by Zou [7], the Dantzig selector by Candes and Tao [8], and the nonconcave MCP penalty by Zhang [9]. For bridge estimation, Huang et al. [10] extended the results of Knight and Fu [5] to infinite dimensional parameters and showed that for \(0<\zeta<1\) the bridge estimator can correctly select covariates with nonzero coefficients and that, under appropriate conditions, the bridge estimator enjoys the oracle property. Subsequently, Wang et al. [11] studied the consistency of the bridge estimator for a generalized linear model.
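In this paper, we consider the following penalized model:
\[\hat{\beta}=\arg\min_{\beta}Q(\beta),\qquad Q(\beta)=\|Y-X\beta\|^{2}+\lambda\sum_{j=1}^{p}\tilde{\omega}_{j}|\beta_{j}|^{\zeta}, \tag{1.1}\]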
where \(\tilde{\omega}=(\tilde{\omega}_{1},\tilde{\omega}_{2},\ldots,\tilde{\omega}_{p})^{T}\) is a given vector of weights. A common choice takes the initial estimator \(\tilde{\beta}=(\tilde{\beta}_{1},\tilde{\beta}_{2},\ldots,\tilde{\beta}_{p})^{T}\) to be the non-penalized MLE and sets \(\tilde{\omega}_{j}=|\tilde{\beta}_{j}|^{-1}\), \(j=1,2,\ldots,p\). The resulting β̂ is called the adaptive bridge estimator. We propose and study the adaptive bridge estimator method (abridge for short). We derive some theoretical properties of the adaptive bridge estimator for the case when p can increase to infinity with n. Under some conditions, and with a suitable choice of the tuning parameter, we show that the adaptive bridge estimator enjoys the oracle property; that is, the adaptive bridge estimator correctly selects the covariates with nonzero coefficients with probability converging to one, and the estimator of the nonzero coefficients has the same asymptotic distribution that it would have if the zero coefficients were known in advance.
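To make the objective concrete, the following is a minimal Python sketch that evaluates a penalized criterion of the form \(\|Y-X\beta\|^{2}+\lambda\sum_{j}\tilde{\omega}_{j}|\beta_{j}|^{\zeta}\) with weights \(\tilde{\omega}_{j}=|\tilde{\beta}_{j}|^{-1}\) taken from an unpenalized least-squares fit. The function names, the guard `eps`, and the toy data are illustrative assumptions, not part of the paper.

```python
import numpy as np

def adaptive_bridge_objective(beta, X, y, lam, zeta, weights):
    """Residual sum of squares plus lam * sum_j weights_j * |beta_j|**zeta."""
    residual = y - X @ beta
    penalty = lam * np.sum(weights * np.abs(beta) ** zeta)
    return float(residual @ residual + penalty)

def inverse_ols_weights(X, y, eps=1e-8):
    """Weights 1/|beta_tilde_j| from the unpenalized least-squares fit.

    eps is an illustrative guard against division by zero (an assumption).
    """
    beta_tilde, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 / np.maximum(np.abs(beta_tilde), eps)

# Toy data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta0 = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta0 + rng.standard_normal(50)

w = inverse_ols_weights(X, y)
q_true = adaptive_bridge_objective(beta0, X, y, lam=1.0, zeta=0.5, weights=w)
```

Coefficients whose initial estimates are close to zero receive large weights, so they are penalized more heavily than coefficients with large initial estimates.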
To our knowledge, the properties of the adaptive bridge have not been studied in the literature, and our results fill this gap. Compared with the results in Huang et al. [10] and Wang et al. [11], condition (A_{2}) (see Section 2) imposed on the true coefficients is much weaker. Moreover, Wang et al. [11] require the true coefficients to satisfy an additional covering number condition. In addition, Huang et al. [10] and Wang et al. [11] both use the LQA algorithm to compute the estimator. A shortcoming of the LQA algorithm is that once a variable is deleted at some step of the iteration, it has no chance of reappearing in the final model. To overcome this, we employ the MM algorithm, which improves stability.
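As an illustration of the MM idea, the sketch below iterates on an ε-smoothed version of the penalty, \(\tilde{\omega}_{j}(\beta_{j}^{2}+\epsilon)^{\zeta/2}\), which is majorized by a quadratic because \(t\mapsto t^{\zeta/2}\) is concave for \(0<\zeta\leq2\). This is a simplified stand-in written for this note, not the exact MM and Newton-Raphson scheme used in the paper; the smoothing constant `eps`, the function names, and the toy data are assumptions.

```python
import numpy as np

def smoothed_objective(beta, X, y, lam, zeta, weights, eps):
    """RSS plus the eps-smoothed weighted bridge penalty."""
    r = y - X @ beta
    return float(r @ r + lam * np.sum(weights * (beta**2 + eps) ** (zeta / 2)))

def mm_step(beta, X, y, lam, zeta, weights, eps):
    """Minimize the quadratic majorizer built at the current iterate."""
    # Slope of the concave map t -> t**(zeta/2) at t = beta_j**2 + eps.
    d = 0.5 * zeta * weights * (beta**2 + eps) ** (zeta / 2 - 1)
    return np.linalg.solve(X.T @ X + lam * np.diag(d), X.T @ y)

def adaptive_bridge_mm(X, y, lam, zeta=0.5, eps=1e-6, max_iter=200, tol=1e-8):
    """Run MM steps from the least-squares fit; returns (beta, objective history)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    weights = 1.0 / np.maximum(np.abs(beta), 1e-8)  # inverse initial estimates
    history = [smoothed_objective(beta, X, y, lam, zeta, weights, eps)]
    for _ in range(max_iter):
        beta_new = mm_step(beta, X, y, lam, zeta, weights, eps)
        history.append(smoothed_objective(beta_new, X, y, lam, zeta, weights, eps))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, history

# Toy data (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
beta0 = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y = X @ beta0 + rng.standard_normal(100)
beta_hat, history = adaptive_bridge_mm(X, y, lam=2.0)
```

Because each step exactly minimizes a majorizer, the smoothed objective cannot increase from one iterate to the next. With the smoothing in place, a coefficient that becomes very small is not frozen at exactly zero, so in principle it can still move away from zero in later iterations; in practice a small threshold would be needed to declare coefficients zero.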
The rest of the paper is organized as follows. In Section 2, we introduce the notation and assumptions needed for our results and present the main results. Section 3 presents some simulation results. The conclusion and the proofs of the main results are given in Sections 4 and 5.
Main results
For convenience, we first introduce some notation. Let \(\beta_{0}=(\beta_{01},\beta_{02},\ldots,\beta_{0p})^{T}\) be the true parameter, \(J_{1}=\{j:\beta_{0j}\neq0,j=1,2,\ldots,p\}\), and \(J_{2}=\{j:\beta_{0j}=0,j=1,2,\ldots,p\}\). The cardinality of the set \(J_{1}\) is denoted by q, and \(h_{1}=\min\{|\beta_{0j}|:j\in J_{1}\}\). Without loss of generality, we assume that the first q covariates (denoted by \(X_{(1)}\)) have nonzero coefficients and that \(X_{(2)}\) collects the covariates with zero coefficients; correspondingly, \(\beta_{0}=(\beta_{0(1)}^{T},\beta_{0(2)}^{T})^{T}\) and \(\hat{\beta}=(\hat{\beta}_{(1)}^{T},\hat{\beta}_{(2)}^{T})^{T}\). The quantities p, q, X, Y, β, and λ all depend on the sample size n; we suppress this dependence for convenience. In this paper, we only consider the statistical properties of the adaptive bridge for the case of \(p< n\); consequently we put \(p=O(n^{c_{2}})\), \(q=O(n^{c_{1}})\), \(\lambda=O(n^{\delta})\), where \(0\leq c_{1}< c_{2}<1\), \(\delta>0\). Following the terminology of Zhao and Yu [12], we define \(\hat{\beta}=_{s}\beta_{0}\) if and only if \(\operatorname{sgn}(\hat{\beta})=\operatorname{sgn}(\beta_{0})\), where the sign of a \(p\times1\) vector β is \(\operatorname{sgn}(\beta)=(\operatorname{sgn}(\beta_{1}),\operatorname{sgn}(\beta_{2}),\ldots,\operatorname{sgn}(\beta_{p}))^{T}\). For any symmetric matrix Z, denote by \(\lambda_{\mathrm{min}}(Z)\) and \(\lambda_{\mathrm{max}}(Z)\) the minimum and maximum eigenvalues of Z, respectively. Denote \(D:=\frac{X^{T}X}{n}\) and \(D=\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22}\end{pmatrix}\), where \(D_{11}=\frac{1}{n}X_{(1)}^{T}X_{(1)}\).
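The sign-equality relation \(\hat{\beta}=_{s}\beta_{0}\) can be checked numerically as in the short sketch below. The helper name `sign_equal` and the tolerance `zero_tol` are hypothetical choices made for illustration.

```python
import numpy as np

def sign_equal(beta_hat, beta0, zero_tol=0.0):
    """Return True when sgn(beta_hat) equals sgn(beta0) componentwise."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta0 = np.asarray(beta0, dtype=float)
    # Entries with |value| <= zero_tol are treated as exact zeros (an assumption).
    s_hat = np.where(np.abs(beta_hat) <= zero_tol, 0.0, np.sign(beta_hat))
    return bool(np.array_equal(s_hat, np.sign(beta0)))

print(sign_equal([1.2, 0.0, -0.3], [2.0, 0.0, -1.0]))  # True
print(sign_equal([1.2, 0.1, -0.3], [2.0, 0.0, -1.0]))  # False
```

Sign equality is stronger than recovering the support: the estimate must be zero exactly where the true coefficient is zero and must carry the correct sign elsewhere.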
Next, we state some assumptions which will be needed in the following results.
(A_{1}) The error term ε is i.i.d. with \(E(\varepsilon)=0\) and \(E(\varepsilon^{2k})<+\infty\), where \(k>0\). For the special case we denote \(E(\varepsilon^{2})=\sigma^{2}\).
(A_{2}) There exists a positive constant M such that \(h_{1}\geq Mn^{\alpha}\), where \(\max\{\frac{1}{2},\frac{c_{2}-1}{2},\frac{1}{2\zeta}\}<\alpha<\min\{c_{2}-\delta,\frac{c_{2}\delta\zeta}{1+\zeta}\}\) and \(\delta+\alpha+\frac{1}{2}\zeta< c_{2}\).
(A_{3}) Suppose \(\tau_{1}\) and \(\tau_{2}\) are the minimum and maximum eigenvalues of the matrix \(D_{11}\). There exist constants \(\tau_{10}\) and \(\tau_{20}\) such that \(0<\tau_{10}\leq\tau_{1}\leq\tau_{2}\leq\tau_{20}\), and the eigenvalues of \(\frac{1}{n}X^{T}\operatorname{var}(Y)X\) are bounded.
(A_{4}) Let \(g_{i}\) be the transpose of the ith row vector of \(X_{(1)}\), and suppose that \(\lim_{n\rightarrow\infty}n^{-\frac{1}{2}}\max_{1\leq i\leq n}g_{i}^{T}g_{i}=0\).
It is worth mentioning that condition (A_{1}) is much weaker than those in the literature where it is commonly assumed that the error term has Gaussian tail probability distribution. In this paper we allow ε to have a heavy tail. The regularity condition (A_{2}) is a common assumption for the nonzero coefficients, which can ensure that all important covariates could be included in the finally selected model. Condition (A_{3}) means that the matrix \(\frac {1}{n}X_{(1)}^{T}X_{(1)}\) is strictly positive definite. For condition (A_{4}), we will use it to prove the asymptotic normality of the estimators of the nonzero coefficients. In fact, if the nonzero coefficients have an upper bound, then we can easily verify condition (A_{4}).
Consistency of the estimation
Theorem 2.1 (Consistency of the estimation)
If \(0<\zeta<2\) and conditions (A_{1})–(A_{3}) hold, then there exists a local minimizer β̂ of \(Q(\beta)\) such that \(\|\hat{\beta}-\beta_{0}\|=O_{p}(n^{\frac{\delta+\alpha-c_{2}}{\zeta}})\).
Remark 2.1
By condition (A_{2}), we know that \(c_{2}-\delta-\alpha>0\), so the convergence rate in Theorem 2.1 is governed by the orders of the sample size and the tuning parameter. Theorem 2.1 extends the previous results.
Oracle property of the estimation
Theorem 2.2 (Oracle property)
If \(0<\zeta<1\) and conditions (A_{1})–(A_{4}) hold, then the adaptive bridge estimator satisfies the following properties.
(1) (Selection consistency) \(\lim_{n\rightarrow\infty}P\{\hat{\beta}=_{s}\beta_{0}\}=1\);
(2) (Asymptotic normality) \(\sqrt{n}s^{-1}u^{T}(\hat{\beta}_{(1)}-\beta_{0(1)})\stackrel{\mathrm{d}}{\longrightarrow}N(0,1)\), where \(s^{2}=\sigma^{2}u^{T}D_{11}^{-1}u\) for any \(q\times1\) vector u with \(\|u\|\leq1\).
Remark 2.2
By Theorems 2.1 and 2.2, we can easily see that the adaptive bridge is able to consistently identify the true model.
Simulation results
In this section we evaluate the performance of the adaptive bridge estimator proposed in (1.1) by simulation studies. Set \(\zeta=1/2\) and simulate the data from the model \(Y=X\beta+\varepsilon\), \(\varepsilon\sim N(0,\sigma^{2})\), where \(\sigma=1\) and \(\beta_{0(1)}=(2.5,2.5,2.5,3,3,3,1,1,1)^{T}\). The design matrix X is generated from a p-dimensional multivariate normal distribution with mean zero and a covariance matrix whose \((i,j)\)th component is \(\rho^{|i-j|}\), where we let \(\rho=0.5\) and \(\rho=0.9\), respectively. The following examples are considered.
Example 3.1
The sample size \(n=200\) and the covariates number \(p=50\).
Example 3.2
The sample size \(n=500\) and the covariates number \(p=80\).
Example 3.3
The sample size \(n=800\) and the covariates number \(p=100\).
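The data-generating design described above can be sketched as follows. The function name `simulate_example`, the seed handling, and the placement of the nine nonzero coefficients in the first nine positions are assumptions made for illustration; the coefficient values are those printed in the text.

```python
import numpy as np

def simulate_example(n, p, rho, sigma=1.0, seed=0):
    """Draw (X, y, beta0) with Cov(X)[i, j] = rho**|i - j| and Gaussian noise."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta0 = np.zeros(p)
    # Nonzero block as printed in the text; the remaining p - 9 entries are zero.
    beta0[:9] = [2.5, 2.5, 2.5, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0]
    y = X @ beta0 + sigma * rng.standard_normal(n)
    return X, y, beta0

# Setting of Example 3.1 with rho = 0.5.
X, y, beta0 = simulate_example(n=200, p=50, rho=0.5, seed=0)
```

The other two examples correspond to `simulate_example(500, 80, rho)` and `simulate_example(800, 100, rho)` with `rho` set to 0.5 or 0.9.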
We combine the minorization-maximization (MM) algorithm of Hunter and Li [13] with the Newton-Raphson method to compute the adaptive bridge (abridge) estimator, where the tuning parameter is selected by 5-fold cross-validation. We compare our results with those from the lasso [14], adaptive lasso (alasso), and bridge methods. To evaluate the performance of the estimators, we use four measures: \(L_{2}\)-loss, PE, C, and IC. The \(L_{2}\)-loss is the median of \(\|\hat{\beta}-\beta_{0}\|_{2}\) and measures estimation accuracy; PE is the prediction error, defined as the median of \(n^{-1}\|Y-X\hat{\beta}\|^{2}\). The other two measures quantify model selection performance: C and IC denote the average number of correctly selected zero covariates and the average number of incorrectly selected zero covariates, respectively. The numerical results are listed in Table 1 and Table 2, where υ equals the number of zero coefficients in the true model and the numbers in parentheses are the corresponding standard deviations, obtained from 500 replicates.
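The four measures can be computed per replicate as in the sketch below and then summarized across replicates (medians for \(L_{2}\)-loss and PE, averages for C and IC). The code assumes the common reading in which C counts true zero coefficients estimated as zero and IC counts nonzero coefficients estimated as zero; the function name and the tolerance `zero_tol` are hypothetical.

```python
import numpy as np

def replicate_metrics(beta_hat, beta0, X, y, zero_tol=1e-6):
    """Per-replicate L2 loss, prediction error, and zero-selection counts."""
    est_zero = np.abs(beta_hat) <= zero_tol   # coefficients estimated as zero
    true_zero = beta0 == 0                    # coefficients that are truly zero
    return {
        "l2_loss": float(np.linalg.norm(beta_hat - beta0)),
        "pe": float(np.sum((y - X @ beta_hat) ** 2) / len(y)),
        "C": int(np.sum(est_zero & true_zero)),
        "IC": int(np.sum(est_zero & ~true_zero)),
    }

# Tiny hand-checkable example (hypothetical numbers).
beta0 = np.array([1.0, 0.0, 0.0, 2.0])
beta_hat = np.array([0.9, 0.0, 0.1, 0.0])
X = np.eye(4)
y = X @ beta0
m = replicate_metrics(beta_hat, beta0, X, y)
```

In this toy case one true zero is recovered (C is 1) and one nonzero coefficient is wrongly set to zero (IC is 1).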
Note that in every case the adaptive bridge outperforms the other methods in sparsity; that is, it selects the smaller model. The prediction error of the adaptive bridge is slightly higher than that of the other methods, but in terms of estimation accuracy the adaptive bridge is still the winner, followed by the bridge. We also find that, as the sample size n grows, the adaptive bridge becomes better at correctly selecting the zero covariates for both \(\rho=0.5\) and \(\rho=0.9\). Meanwhile, as n increases, the estimation accuracy improves but the prediction error worsens. Additionally, when ρ increases, the prediction error increases, but the estimation accuracy decreases.
Conclusion
In this paper we have proposed the adaptive bridge estimator and presented some of its theoretical properties. Under some conditions, and with a suitable choice of the tuning parameter, we have shown that the adaptive bridge estimator enjoys the oracle property. The effectiveness of the proposed method is demonstrated by numerical results.
Proofs
Proof of Theorem 2.1
Following the idea in Fan and Li [4], we only need to prove that, for any \(\epsilon>0\), there exists a large constant C such that
\[P\Bigl\{\inf_{\|u\|=C}Q(\beta_{0}+\theta u)>Q(\beta_{0})\Bigr\}\geq1-\epsilon, \tag{5.1}\]
which means that with probability at least \(1-\epsilon\) there exists a local minimizer β̂ in the ball \(\{\beta_{0}+\theta u:\|u\|\leq C\}\).
First, let \(\theta=n^{\frac{\delta+\alpha-c_{2}}{\zeta}}\), then
where \(T_{1}=\lambda_{\mathrm{min}}(\frac{X^{T}X}{n})\theta^{2}n\|u\|^{2}\), \(T_{2}=n\theta u^{T}\frac{X^{T}(Y-X\beta_{0})}{n}\), and \(T_{3}=\lambda\sum_{j=1}^{p}\tilde{\omega}_{j}\theta^{\zeta}\|u\|^{\zeta}\).
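One way these three terms can arise is sketched below. The derivation assumes the objective \(Q(\beta)=\|Y-X\beta\|^{2}+\lambda\sum_{j}\tilde{\omega}_{j}|\beta_{j}|^{\zeta}\) with positive weights, and it uses the elementary bound \(|a+b|^{\zeta}-|a|^{\zeta}\geq-|b|^{\zeta}\), which holds for \(0<\zeta\leq1\). It is meant as a plausibility check on the decomposition, and the signs and constants are not claimed to match the authors' display (5.2).

```latex
\begin{aligned}
Q(\beta_{0}+\theta u)-Q(\beta_{0})
 &= \theta^{2}u^{T}X^{T}Xu-2\theta u^{T}X^{T}(Y-X\beta_{0})
   +\lambda\sum_{j=1}^{p}\tilde{\omega}_{j}
    \bigl(|\beta_{0j}+\theta u_{j}|^{\zeta}-|\beta_{0j}|^{\zeta}\bigr)\\
 &\geq \lambda_{\min}\Bigl(\tfrac{X^{T}X}{n}\Bigr)\theta^{2}n\|u\|^{2}
   -2n\theta u^{T}\tfrac{X^{T}(Y-X\beta_{0})}{n}
   -\lambda\sum_{j=1}^{p}\tilde{\omega}_{j}\theta^{\zeta}\|u\|^{\zeta}\\
 &= T_{1}-2T_{2}-T_{3}.
\end{aligned}
```

The first term is quadratic in \(\|u\|\) and positive, so the argument amounts to showing that it dominates the other two on the sphere \(\|u\|=C\) when C is large.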
For \(T_{2}\), set \(v=O_{P}(n^{\alpha})\) and by assumptions (A_{2}) and (A_{3}) we have
Hence
As for \(T_{3}\), using \(\|\tilde{\beta}-\beta_{0}\|=O_{P}((\frac{p}{n})^{1/2})\), \(\min|\beta_{0j}|\leq\max|\tilde{\beta}_{j}-\beta_{0j}|+\min|\tilde{\beta}_{j}|\), and assumption (A_{2}), we can obtain
This together with assumption (A_{2}) yields \(P\{\min|\tilde{\beta}_{j}|\geq\frac{1}{2}Mn^{\alpha}\}\rightarrow1\) (\(n\rightarrow\infty\)).
For \(v_{1}=\frac{2\lambda p}{Mn^{\alpha}}\), \(P\{\lambda\sum_{j=1}^{p}\tilde{\omega}_{j}\leq v_{1}\}\geq P\{\frac{\lambda p}{\min|\tilde{\beta}_{j}|}\leq v_{1}\}=P\{\min|\tilde{\beta}_{j}|\geq\frac{\lambda p}{v_{1}}\}\rightarrow1\) (\(n\rightarrow\infty\)), i.e., \(\lambda\sum_{j=1}^{p}\tilde{\omega}_{j}=O_{P}(\frac{\lambda p}{Mn^{\alpha}})\). Now with assumption (A_{2}) we conclude that
When \(0<\zeta<2\) and C is large enough, by (5.3) and (5.4) we see that (5.2) is determined by \(T_{1}\), so (5.1) holds. □
Proof of Theorem 2.2
(1) First of all, by the KKT condition we know that β̂ is the defined adaptive bridge estimator, if the following holds:
Let \(\hat{u}=\hat{\beta}-\beta_{0}\) and define \(V(u)=\sum_{i=1}^{n}(\varepsilon_{i}-X_{i}^{T}u)^{2}+\lambda\sum_{j=1}^{p}\tilde{\omega}_{j}|u_{j}+\beta_{0j}|^{\zeta}\); then \(\hat{u}=\arg\min_{u}V(u)\). Notice that \(\sum_{i=1}^{n}(\varepsilon_{i}-X_{i}^{T}u)^{2}=-2\varepsilon^{T}Xu+nu^{T}Du+\varepsilon^{T}\varepsilon\), which yields \(\frac{d[\sum_{i=1}^{n}(\varepsilon_{i}-X_{i}^{T}u)^{2}]}{du}\big|_{u=\hat{u}}=-2X^{T}\varepsilon+2nD\hat{u}:=2\sqrt{n}[D(\sqrt{n}\hat{u})-E]\), where \(E=\frac{X^{T}\varepsilon}{\sqrt{n}}\). Together with (5.5) and the fact \(\{|\hat{u}_{(1)}|<|\beta_{0(1)}|\}\subset\{\operatorname{sgn}(\hat{\beta}_{(1)})=\operatorname{sgn}(\beta_{0(1)})\}\), if û satisfies
where \(\bar{W}=(\tilde{\omega}_{1}|\hat{u}_{1}+\beta_{01}|^{\zeta-1}\operatorname{sgn}(\beta_{01}),\tilde{\omega}_{2}|\hat{u}_{2}+\beta_{02}|^{\zeta-1}\operatorname{sgn}(\beta_{02}),\ldots,\tilde{\omega}_{p}|\hat{u}_{p}+\beta_{0p}|^{\zeta-1}\operatorname{sgn}(\beta_{0p}))^{T}\), then we have \(\operatorname{sgn}(\hat{\beta}_{(1)})=\operatorname{sgn}(\beta_{0(1)})\) and \(\hat{\beta}_{(2)}=0\). Let
it follows that \(|D_{11}^{-1}E_{(1)}|+\frac{\lambda\zeta}{2\sqrt{n}}|D_{11}^{-1}\tilde{W}_{(1)}|<\sqrt{n}|\beta_{0(1)}|\). Denote \(A=\{|D_{11}^{-1}E_{(1)}|+\frac{\lambda\zeta}{2\sqrt{n}}|D_{11}^{-1}\tilde{W}_{(1)}|<\sqrt{n}|\beta_{0(1)}|\}\); we conclude that \(P\{\operatorname{sgn}(\hat{\beta})=\operatorname{sgn}(\beta_{0})\}\geq P\{A\}\), from which it follows that
where \(\xi=(\xi_{1},\xi_{2},\ldots,\xi_{q})^{T}=D_{11}^{-1}E_{(1)}\), \(Z=(Z_{1},Z_{2},\ldots,Z_{q})^{T}=D_{11}^{-1}\tilde{W}_{(1)}\). For \(I_{1}=P\{|\xi_{i}|\geq\frac{1}{2}\sqrt{n}|\beta_{0i}|,\exists i\in J_{1}\}\), we have \(E[(\xi_{i})^{2k}]<\infty\), \(\forall i\in J_{1}\). So its tail probability satisfies \(P\{|\xi_{i}|>t\}=O(t^{-2k})\), \(\forall t>0\), which yields
For \(I_{2}\), notice that \(1-I_{2}=P\{\frac{\lambda\zeta}{n}|Z_{i}|\leq|\beta_{0i}|,\exists i\in J_{1}\}\) and \(|Z_{i}|\leq\|D_{11}^{-1}\tilde{W}_{(1)}\|\leq\frac{1}{\tau_{1}}\|\tilde{W}_{(1)}\|\leq\frac{2\sqrt{q}h_{1}^{\zeta-1}}{\tau_{1}\min_{j\in J_{1}}|\tilde{\beta}_{j}|}\); then we can get
It follows that \(I_{2}\rightarrow0\) (\(n\rightarrow\infty\)). Together with (5.6) and (5.7), \(\lim_{n\rightarrow\infty}P\{\hat{\beta}=_{s}\beta_{0}\}=1\) holds. This completes the proof of the first part of Theorem 2.2.
(2) Let \(W=(\tilde{\omega}_{1}|\hat{\beta}_{1}|^{\zeta-1}\operatorname{sgn}(\hat{\beta}_{1}),\tilde{\omega}_{2}|\hat{\beta}_{2}|^{\zeta-1}\operatorname{sgn}(\hat{\beta}_{2}),\ldots,\tilde{\omega}_{p}|\hat{\beta}_{p}|^{\zeta-1}\operatorname{sgn}(\hat{\beta}_{p}))^{T}\). We can easily get \(\frac{\partial Q(\beta)}{\partial\beta_{j}}\big|_{\beta=\hat{\beta}}=0\), \(j\in J_{1}\), i.e., \(X_{(1)}^{T}(Y-X_{(1)}\hat{\beta}_{(1)})=X^{T}_{(1)}(X_{(1)}\beta_{0(1)}-X_{(1)}\hat{\beta}_{(1)}+\varepsilon)=\frac{\lambda\zeta}{2}W_{(1)}\), which yields \(D_{11}(\hat{\beta}_{(1)}-\beta_{0(1)})=\frac{X_{(1)}^{T}\varepsilon}{n}-\frac{\lambda\zeta}{2n}W_{(1)}\). It follows from the first part of Theorem 2.2 that \(\lim_{n\rightarrow\infty}P\{D_{11}(\hat{\beta}_{(1)}-\beta_{0(1)})=\frac{X_{(1)}^{T}\varepsilon}{n}-\frac{\lambda\zeta}{2n}W_{(1)}\}=1\); then we can see that, for any \(q\times1\) vector u with \(\|u\|\leq1\),
Notice that
where the second inequality holds because \(P\{\min_{j\in J_{1}}|\hat{\beta}_{j}|\geq\frac{1}{2}M_{1}n^{\alpha}\}\rightarrow1\) (\(n\rightarrow\infty\)), for \(M_{1}>0\). By \(\frac{1}{2}c_{1}(\zeta-1)+\alpha(\zeta-2)-\frac{1}{2}<0\), we obtain \(\frac{\lambda\zeta}{2\sqrt{n}}u^{T}D_{11}^{-1}W_{(1)}=o_{P}(1)\), which together with (5.8) yields
Denote \(s^{2}=\sigma^{2}u^{T}D_{11}^{-1}u\) and \(F_{i}=n^{-\frac{1}{2}}s^{-1}u^{T}D_{11}^{-1}g_{i}\); by assumption (A_{4}) and (5.9) we have \(\sqrt{n}s^{-1}u^{T}(\hat{\beta}_{(1)}-\beta_{0(1)})=\sum_{i=1}^{n}F_{i}\varepsilon_{i}+o_{P}(1)\stackrel{\mathrm{d}}{\longrightarrow}N(0,1)\). This completes the proof of the second part of Theorem 2.2. □
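As a quick check on the normalization, the variance of the leading term can be computed directly. The sketch below takes \(F_{i}=n^{-1/2}s^{-1}u^{T}D_{11}^{-1}g_{i}\), treats the design as fixed, and assumes i.i.d. errors with variance \(\sigma^{2}\), so that \(\sum_{i=1}^{n}g_{i}g_{i}^{T}=X_{(1)}^{T}X_{(1)}=nD_{11}\).

```latex
\begin{aligned}
\operatorname{Var}\Bigl(\sum_{i=1}^{n}F_{i}\varepsilon_{i}\Bigr)
 &= \sigma^{2}\sum_{i=1}^{n}F_{i}^{2}
  = \sigma^{2}n^{-1}s^{-2}\,u^{T}D_{11}^{-1}
    \Bigl(\sum_{i=1}^{n}g_{i}g_{i}^{T}\Bigr)D_{11}^{-1}u\\
 &= \sigma^{2}s^{-2}\,u^{T}D_{11}^{-1}D_{11}D_{11}^{-1}u
  = \frac{\sigma^{2}u^{T}D_{11}^{-1}u}{s^{2}} = 1 .
\end{aligned}
```

The unit variance explains the standard normal limit; condition (A_{4}) is the ingredient that keeps every single summand negligible, as required for a central limit argument of Lindeberg type.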
References
1. Frank, IE, Friedman, JH: A statistical view of some chemometrics regression tools. Technometrics 35, 109–148 (1993) (with discussion)
2. Hoerl, AE, Kennard, RW: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970)
3. Tibshirani, R: The lasso method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997)
4. Fan, J, Li, R: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
5. Knight, K, Fu, WJ: Asymptotics for lasso-type estimators. Ann. Stat. 28, 1356–1378 (2000)
6. Zou, H, Hastie, T: Regularization and variable selection via the elastic net. J. R. Stat. Soc., Ser. B 67, 301–320 (2005)
7. Zou, H: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
8. Candes, E, Tao, T: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351 (2007)
9. Zhang, C: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
10. Huang, J, Ma, S, Zhang, CH: Adaptive lasso for sparse high-dimensional regression models. Stat. Sin. 18, 1603–1618 (2008)
11. Wang, M, Song, L, Wang, X: Bridge estimation for generalized linear models with a diverging number of parameters. Stat. Probab. Lett. 80, 1584–1596 (2010)
12. Zhao, P, Yu, B: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)
13. Hunter, DR, Li, R: Variable selection using MM algorithms. Ann. Stat. 33, 1617–1642 (2005)
14. Efron, B, Hastie, T, Johnstone, I, Tibshirani, R: Least angle regression. Ann. Stat. 32, 407–499 (2004) (with discussion)
Acknowledgements
The research was supported by the NSF of Anhui Province (No. 1508085QA13) and the China Scholarship Council.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Chen, Z., Zhu, Y. & Zhu, C. Adaptive bridge estimation for high-dimensional regression models. J Inequal Appl 2016, 258 (2016). https://doi.org/10.1186/s13660-016-1205-y
DOI: https://doi.org/10.1186/s13660-016-1205-y
MSC
 62F12
 62E15
 62J05
Keywords
 adaptive bridge
 high-dimensionality
 variable selection
 oracle property
 penalized method
 tuning parameter