
A limited memory BFGS subspace algorithm for bound constrained nonsmooth problems


The subspace technique has been widely used to solve unconstrained and constrained optimization problems, and many results have been obtained. In this paper, a subspace algorithm combined with the limited memory BFGS update is proposed for large-scale nonsmooth optimization problems with box constraints. This algorithm ensures that all iterates are feasible and that the sequence of objective function values is decreasing. Moreover, rapid changes in the active set are allowed. The global convergence is established under some suitable conditions. Numerical results show that this method is very effective for large-scale nonsmooth box-constrained optimization, where the largest test problem has 11,000 variables.


Consider the following large-scale nonsmooth optimization problems:

$$ \min f(x),\quad\text{s.t.}\quad l\leq x \leq u, $$

where \(f(x):\Re^{n}\rightarrow\Re\) is supposed to be locally Lipschitz continuous and the number of variables n is supposed to be large. The vectors l and u represent the lower and the upper bounds on the variables, respectively.

Many practical optimization problems involve nonsmooth functions with a large number of variables (see, e.g., [1, 24]). The active-set method can be easily generalized when the objective function is nonsmooth. For example, Sreedharan [34] extends the method developed in [33] to solve nonsmooth problems with a special objective function and inequality constraint. Also, it is quite easy to generalize the ε-active-set method to the nondifferentiable case [25]. Yuan et al. [42, 45] use the two-point gradient method and the trust region method to solve nonsmooth problems. Wolfe [35] and Lemaréchal [22] took a giant stride forward in nonsmooth optimization with the bundle concept. Kiwiel [20] proposed a bundle variant, which is close to the bundle trust iteration method [32]. Some good results about the bundle technique can be found in [19, 21, 31]. The basic assumption of bundle methods is that at every point \(x\in\Re^{n}\), the value of the objective function \(f(x)\) and an arbitrary subgradient \(\xi \in\Re^{n}\) from the subdifferential can be evaluated [7] with

$$\partial f(x)=\operatorname{conv}\Bigl\{ \lim_{t\rightarrow\infty} \nabla f(x_{t})\mid x_{t}\rightarrow x\text{ and }\nabla f(x_{t})\text{ exists}\Bigr\} $$

where “conv” denotes the convex hull of a set. The idea behind bundle methods is to approximate \(\partial f(x)\) by gathering subgradient information from previous iterations into a bundle. At the moment, various versions of bundle methods are regarded as the most effective and reliable methods for nonsmooth optimization. However, bundle methods are efficient mainly for small- and medium-scale problems, because they need relatively large bundles to solve problems efficiently [19]. Haarala et al. (see, e.g., [14, 15]) introduce limited memory bundle methods for large-scale nonsmooth unconstrained and constrained minimization, which are a hybrid of the variable metric bundle methods and the limited memory variable metric methods, and some good results are obtained. The dimension of their test problems reaches one thousand variables. For unconstrained nonsmooth problems, Karmitsa et al. [18] tested and compared different methods from both groups, as well as some methods which may be considered hybrids of these two and/or some others, where the largest dimension of the nonsmooth test problems is 4000. Yuan et al. [38–40, 43, 46] present conjugate gradient algorithms for solving unconstrained nonsmooth problems, and nonsmooth problems with 60,000 variables are successfully solved.

Nonsmooth optimization problems are normally difficult to solve, even when they are unconstrained. Derivative-free methods, like Powell’s method [11] or genetic algorithms [13], may be unreliable and become inefficient as the dimension of the problem increases. The direct application of smooth gradient-based methods to nonsmooth problems may lead to a failure in optimality conditions, in convergence, or in gradient approximation [23]. Therefore, special tools for solving nonsmooth optimization problems are needed, especially for constrained nonsmooth problems. In this paper, we present a new algorithm that combines an active-set strategy with the gradient projection method. The active sets are identified at each iteration by a guessing technique, and the search direction in the free subspace is determined by the limited memory BFGS algorithm, which provides an efficient means for attacking large-scale nonsmooth bound constrained optimization problems. This paper has the following main attributes:


the sequence of nonsmooth objective function values is decreasing;


a limited memory BFGS method is given for nonsmooth problems; the iteration sequence \(\{x_{k}\}\) is feasible;


the global convergence of the new method is established;


large-scale nonsmooth problems (11,000 variables) are successfully solved.

This paper is organized as follows. In the next section, we briefly review some nonsmooth analysis, the L-BFGS method for unconstrained optimization, and the motivation based on these techniques. In Sect. 3, we describe the active-set algorithm for (1.1). The global convergence is established in Sect. 4. Numerical results are reported in Sect. 5. Throughout this paper, \(\|\cdot\|\) denotes the Euclidean norm of a vector or matrix.

Motivation based on nonsmooth analysis and the L-BFGS update

This section will state some results on nonsmooth analysis and the L-BFGS formula, respectively.

Results of convex analysis and nonsmooth analysis

Let \(f^{\mathrm{MY}}:\Re^{n}\rightarrow\Re\) be the so-called Moreau–Yosida regularization of f and be defined by

$$ f^{\mathrm{MY}}(x)=\min_{z\in\Re^{n}}\biggl\{ f(z)+\frac{1}{2\lambda} \Vert z-x \Vert ^{2}\biggr\} , $$

where λ is a positive parameter. Then it is not difficult to see that the problem (1.1) is equivalent to the following problem:

$$ \min_{x\in\mathbf{B}} f^{\mathrm{MY}}(x), \mathbf{B}=\{x\mid l\leq x\leq u\}. $$

The function \(f^{\mathrm{MY}}\) has some good properties: it is a differentiable convex function and has a Lipschitz continuous gradient even when the function f is nondifferentiable. The gradient function of \(f^{\mathrm{MY}}\) can be proved to be semismooth under some reasonable conditions [12, 30]. Based on these features, many algorithms have been proposed for (2.2) (see [2] etc.) when \(\mathbf{B}=\Re^{n}\). Set

$$\theta(z)=f(z)+\frac{1}{2\lambda} \Vert z-x \Vert ^{2} $$

and denote \(p(x)=\operatorname{argmin} \theta(z)\). Then \(p(x)\) is well-defined and unique since \(\theta(z)\) is strongly convex. By (2.1), \(f^{\mathrm{MY}}(x)\) can be expressed by

$$f^{\mathrm{MY}}(x)=f\bigl(p(x)\bigr)+\frac{1}{2\lambda} \bigl\Vert p(x)-x \bigr\Vert ^{2}. $$

In what follows, we denote the gradient of \(f^{\mathrm{MY}}\) by g. Some features about \(f^{\mathrm{MY}}(x)\) can be seen in [3, 8, 16]. The generalized Jacobian of \(f^{\mathrm{MY}}(x)\) and the property of BD-regular can be found in [6, 29], respectively. Some properties are given as follows.

(i) The function \(f^{\mathrm{MY}}\) is finite-valued, convex, and everywhere differentiable with

$$ g(x)=\nabla f^{\mathrm{MY}}(x)=\frac{x-p(x)}{\lambda}. $$

Moreover, the gradient mapping \(g:\Re^{n}\rightarrow\Re^{n}\) is globally Lipschitz continuous with modulus \(1/\lambda\), i.e.,

$$ \bigl\Vert g(x)-g(y) \bigr\Vert \leq\frac{1}{\lambda} \Vert x-y \Vert ,\quad \forall x,y\in \Re^{n}. $$

(ii) If g is BD-regular at x, which means all matrices \(V\in \partial_{B}g(x)\) are nonsingular, then there exist constants \(\mu_{1}>0\), \(\mu_{2}>0\) and a neighborhood Ω of x such that

$$d^{T}Vd\geq\mu_{1} \Vert d \Vert ^{2},\qquad \bigl\Vert V^{-1} \bigr\Vert \leq \mu_{2},\quad \forall d \in\Re^{n}, V\in\partial_{B}g(x). $$

It is obvious that \(f^{\mathrm{MY}}(x)\) and \(g(x)\) can be obtained through the optimal solution of \(\operatorname{argmin}_{z\in\Re^{n}}\theta(z)\). However, \(p(x)\), the minimizer of \(\theta(z)\), is difficult or even impossible to compute exactly. Hence we cannot use the exact value of \(p(x)\) to define \(f^{\mathrm{MY}}(x)\) and \(g(x)\), and numerical methods are often used to approximate it. In the following, we always suppose that the results (i)–(ii) hold unless otherwise stated.
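For some simple choices of f, the minimizer \(p(x)\) is available in closed form, which makes property (i) easy to check numerically. The following sketch assumes the separable function \(f(z)=\|z\|_{1}\), for which \(p(x)\) is the soft-thresholding operator; this choice of f and the function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def prox_l1(x, lam):
    """p(x) = argmin_z ||z||_1 + ||z - x||^2 / (2*lam), i.e. soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_yosida_l1(x, lam):
    """Return f_MY(x) and g(x) = (x - p(x)) / lam for the assumed f = ||.||_1."""
    p = prox_l1(x, lam)
    value = np.sum(np.abs(p)) + np.sum((p - x) ** 2) / (2.0 * lam)
    grad = (x - p) / lam
    return value, grad

lam = 0.5
x = np.array([2.0, -0.2, 0.0])
y = np.array([1.0, 0.3, -3.0])
fx, gx = moreau_yosida_l1(x, lam)
fy, gy = moreau_yosida_l1(y, lam)

# Lipschitz bound of property (i): ||g(x) - g(y)|| <= ||x - y|| / lam
lipschitz_ok = np.linalg.norm(gx - gy) <= np.linalg.norm(x - y) / lam
```

For this f, \(f^{\mathrm{MY}}\) is the Huber function, so the gradient is smooth even though f itself has a kink at zero.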

L-BFGS update

At every iteration \(x_{k}\), the L-BFGS method stores a small number (say m) of correction pairs \(\{s_{i},y_{i}\}\) (\(i=k-1,\ldots,k-m\)) to get \(H_{k+1}\), instead of storing the matrices \(H_{k}\) with

$$s_{k}=x_{k+1}-x_{k}, y_{k}=\nabla h_{k+1}-\nabla h_{k}, $$

where \(h(x):\Re^{n}\rightarrow\Re\) is a continuously differentiable function. In fact, this method is an adaptation of the BFGS method to large-scale problems (see [4, 5, 37, 41, 44] for details). The L-BFGS update formula is defined by

$$\begin{aligned} H_{k+1} =& V_{k}^{T} \bigl[V_{k-1}^{T} H_{k-1}V_{k-1}+ \rho_{k-1} s_{k-1} s_{k-1}^{T}\bigr] V_{k}+\rho_{k} s_{k} s_{k}^{T} \\ =&V_{k}^{T}V_{k-1}^{T}H_{k-1}V_{k-1}V_{k}+V_{k}^{T} \rho_{k-1} s_{k-1} s_{k-1}^{T} V_{k}+\rho_{k} s_{k} s_{k}^{T} \\ =&\cdots \\ =&\bigl[V_{k}^{T}\cdots V_{k-m+1}^{T} \bigr]H_{k-m+1}[V_{k-m+1}\cdots V_{k}] \\ &{}+ \rho_{k-m+1}\bigl[V_{k}^{T}\cdots V_{k-m+2}^{T}\bigr]s_{k-m+1}s_{k-m+1}^{T}[V_{k-m+2}\cdots V_{k}]+ \cdots+\rho_{k}s_{k}s_{k}^{T}, \end{aligned}$$

where \(\rho_{k}=\frac{1}{y_{k}^{T}s_{k}}\) and \(V_{k}=I-\rho_{k} y_{k} s_{k}^{T}\). These correction pairs contain information about the curvature of the function and, in conjunction with the BFGS formula, define the limited memory iteration matrix. This method often provides a fast rate of linear convergence and requires minimal storage.
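In practice the matrix \(H_{k+1}\) is never formed; its product with a vector is computed from the stored pairs. The sketch below uses the standard two-loop recursion for this product and compares it with a dense application of \(H\leftarrow V^{T}HV+\rho ss^{T}\). The function names, the scalar initial matrix `h0 * I`, and the test data are assumptions made for illustration.

```python
import numpy as np

def lbfgs_apply(v, s_list, y_list, h0=1.0):
    """Compute H v by the two-loop recursion; pairs are ordered oldest first."""
    q = np.array(v, dtype=float)
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)
    r = h0 * q
    # Second loop: oldest pair to newest pair.
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        b = rho * (y @ r)
        r += (a - b) * s
    return r

def bfgs_inverse_dense(s_list, y_list, h0=1.0):
    """Apply H <- V^T H V + rho s s^T pair by pair (dense, for checking only)."""
    n = len(s_list[0])
    H = h0 * np.eye(n)
    for s, y in zip(s_list, y_list):
        rho = 1.0 / (y @ s)
        V = np.eye(n) - rho * np.outer(y, s)
        H = V.T @ H @ V + rho * np.outer(s, s)
    return H

# Hypothetical data: y = A s with A positive definite, so that y^T s > 0.
rng = np.random.default_rng(0)
n = 6
A = np.diag(np.arange(1.0, n + 1))
S = [rng.standard_normal(n) for _ in range(3)]
Y = [A @ s for s in S]
v = rng.standard_normal(n)
Hv_recursive = lbfgs_apply(v, S, Y)
Hv_dense = bfgs_inverse_dense(S, Y) @ v
```

The recursion needs only inner products and vector updates with the m stored pairs, which is why the storage and cost per iteration stay linear in n.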

It is well known that the positive definiteness of the update matrix \(H_{k}\) is very important for the convergence analysis of the algorithm. Byrd et al. [4] show that the limited memory BFGS matrix has this property if the curvature condition \(s_{k}^{T}y_{k}>0\) is satisfied. Similarly, Powell [28] proposes that \(y_{k}\) should be defined by

$$\begin{aligned} y_{k}=\left \{ \textstyle\begin{array}{l@{\quad}l} y_{k},& \mbox{if } s_{k}^{T}y_{k}\geq0.2s_{k}^{T}B_{k}s_{k},\\ \theta_{k}y_{k}+(1-\theta_{k})B_{k}s_{k},& \mbox{otherwise}, \end{array}\displaystyle \right . \end{aligned}$$

where \(\theta_{k}=\frac{0.8s_{k}^{T}B_{k}s_{k}}{s_{k}^{T}B_{k}s_{k}-s_{k}^{T}y_{k}}\), \(B_{k}\) is an approximation of \(\nabla^{2} h(x_{k})\) and \(B_{k}=H_{k}^{-1}\).
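The damping rule above can be sketched in a few lines. The function name and the dense matrix `B` are assumptions for illustration; a limited memory code would apply \(B_{k}\) implicitly.

```python
import numpy as np

def powell_damped_y(s, y, B):
    """Return y unchanged if s^T y >= 0.2 s^T B s, otherwise the damped vector."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy >= 0.2 * sBs:
        return y
    theta = 0.8 * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * Bs

B = np.eye(2)
s = np.array([1.0, 0.0])
y_bad = np.array([-1.0, 0.0])    # violates the curvature condition
y_damped = powell_damped_y(s, y_bad, B)
y_good = powell_damped_y(s, np.array([1.0, 0.0]), B)
```

With this choice of \(\theta_{k}\), the damped vector satisfies \(s_{k}^{T}y_{k}=0.2s_{k}^{T}B_{k}s_{k}>0\), so the curvature condition holds after damping.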

Inspired by the Moreau–Yosida regularization and the limited memory technique, we will give a limited memory BFGS method for box-constrained optimization with a nonsmooth objective function. In the given algorithm, we also combine an active-set strategy with the gradient projection method. The techniques of the following algorithm are similar to those in Facchinei and Lucidi [10], Ni and Yuan [26], and Xiao and Wei [36], where the main difference lies in the treatment of the nonsmooth optimization problem.


Let the feasible region be \(\mathbf{B}=\{x\in\Re^{n}:l_{i}\leq x^{i} \leq u_{i},i=1,\ldots,n\}\), where \(x^{i}\) denotes the ith element of the vector x. A vector \(\overline{x}\in\mathbf{B}\) is said to be a stationary point for problem (2.2) if the following relations:

$$\begin{aligned} \left \{ \textstyle\begin{array}{l} l_{i}=\overline{x}^{i} \quad\Rightarrow \quad g^{i}(\overline{x})\geq0,\\ l_{i} < \overline{x}^{i}< u_{i} \quad\Rightarrow \quad g^{i}(\overline{x})=0,\\ \overline{x}^{i}=u_{i} \quad\Rightarrow\quad g^{i}(\overline{x})\leq 0, \end{array}\displaystyle \right . \end{aligned}$$

hold, where \(g^{i}\) is the ith element of the vector \(g(x)\). Following the observations of the above section, we first solve (2.2) and then carry its solution over to problem (1.1). The following iterative method is used to solve (2.2):

$$ x_{k+1}=x_{k}+\alpha_{k}d_{k},\quad k=0,1,2,\ldots, $$

where \(\alpha_{k}\) is a steplength and \(d_{k}\) is a search direction of \(f^{\mathrm{MY}}\) at \(x_{k}\). Let \(\overline{x}\in\mathbf{B}\) be a stationary point of problem (2.2) and define the active constraint set

$$ \overline{\varGamma}=\bigl\{ i:l_{i}= \overline{x}^{i}\bigr\} ,\qquad \overline{\varPsi}=\bigl\{ i: \overline{x}^{i}=u_{i}\bigr\} , $$

thus we can define the set of the free variables by

$$\overline{\varUpsilon}=\{1,\ldots,n\} \setminus(\overline{\varGamma }\cup \overline{\varPsi}). $$

Therefore (3.1) can be rewritten as

$$\begin{aligned} \left \{ \textstyle\begin{array}{l@{\quad}l} g^{i}(\overline{x})\geq0, &\forall i\in\overline{\varGamma},\\ g^{i}(\overline{x})=0 ,&\forall i\in\overline{\varUpsilon},\\ g^{i}(\overline{x})\leq0, &\forall i\in \overline{\varPsi}. \end{array}\displaystyle \right . \end{aligned}$$

It is reasonable to define the approximation \(\varGamma(x)\), \(\varUpsilon(x)\) and \(\varPsi(x)\) to Γ̅, ϒ̅ and Ψ̅, respectively:

$$ \begin{gathered} \varGamma(x)=\bigl\{ i:x^{i}\leq l_{i} +a_{i}(x)g^{i}(x)\bigr\} , \\ \varPsi(x)=\bigl\{ i:x^{i}\geq u_{i} +b_{i}(x)g^{i}(x)\bigr\} , \\ \varUpsilon(x)=\{1,\ldots,n\} \setminus\bigl(\varGamma(x)\cup\varPsi(x)\bigr),\end{gathered} $$

where \(a_{i}\) and \(b_{i}\) are nonnegative continuous functions that are bounded from above on \(\mathbf{B}\) and satisfy the following property: if \(x^{i}=l_{i}\) or \(x^{i}=u_{i}\), then \(a_{i}(x)>0\) or \(b_{i}(x)>0\), respectively. Similar to Theorem 3 in [10] for smooth optimization, we can get some results about \(\varGamma (x)\), \(\varUpsilon(x)\), \(\varPsi(x)\), Γ̅, ϒ̅, and Ψ̅ in nonsmooth problems.
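The estimates \(\varGamma(x)\), \(\varPsi(x)\) and \(\varUpsilon(x)\) translate directly into boolean masks. The sketch below assumes constant \(a_{i}(x)=b_{i}(x)=10^{-5}\), the values reported later in the experiments; the function name and the sample data are hypothetical.

```python
import numpy as np

def estimate_active_sets(x, g, l, u, a=1e-5, b=1e-5):
    """Boolean masks for Gamma(x), Psi(x) and Upsilon(x) with constant a_i, b_i."""
    lower = x <= l + a * g          # Gamma(x): estimated active lower bounds
    upper = x >= u + b * g          # Psi(x): estimated active upper bounds
    free = ~(lower | upper)         # Upsilon(x): estimated free variables
    return lower, upper, free

l = np.zeros(4)
u = np.ones(4)
x = np.array([0.0, 0.5, 1.0, 0.0])
g = np.array([2.0, 0.1, -3.0, -1.0])
lower, upper, free = estimate_active_sets(x, g, l, u)
```

The last component sits on its lower bound but has a negative gradient component, so it is classified as free: the estimate lets a variable leave a bound as soon as the gradient sign indicates that this decreases the objective.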

Theorem 3.1

For any feasible point x, \(\varGamma(x)\cap \varPsi(x)=\emptyset\). Suppose that strict complementarity holds and \(\overline{x}\) is a stationary point of problem (2.2). Then there exists a neighborhood of \(\overline{x}\) such that, for every feasible point x in this neighborhood, the following relations hold:

$$\varGamma(x)=\overline{\varGamma},\qquad \varUpsilon(x)=\overline{\varUpsilon },\qquad \varPsi(x)=\overline{\varPsi}. $$



Proof

For any feasible x, if \(k\in\varGamma(x)\), it is obvious that \(g^{k}(x)\geq0\) holds. Suppose that \(k\in \varPsi(x)\) holds as well; then \(u_{k}\geq x^{k}\geq u_{k}+b_{k}(x)g^{k}(x)\geq u_{k}\) is true. This implies that \(l_{k}=x^{k}=u_{k}\) and \(g^{k}(x)=0\), which is a contradiction. Hence \(\varGamma(x)\cap\varPsi(x)=\emptyset\) holds.

Now we prove the second conclusion of the theorem. If \(i\in \overline{\varGamma}\), then \(g^{i}(\overline{x})> 0\) holds by the definition of Γ̅ and strict complementarity. Since \(a_{i}\) is nonnegative, \(\overline{x}^{i}\leq l_{i} +a_{i}(\overline{x})g^{i}(\overline{x})\). Since both \(a_{i}\) and \(g^{i}\) are continuous at \(\overline{x}\), we deduce that \(i\in\varGamma(x)\) holds. So \(\overline{\varGamma}\subseteq\varGamma(x)\) is true.

Conversely, if \(i\in \varGamma(x)\), then by the definition of \(\varGamma(x)\), \(a_{i}(x)g^{i}(x)\geq x^{i}-l_{i}\geq0\) holds. Since \(a_{i}\) is nonnegative, \(g^{i}(x)> 0\) holds. Since \(g^{i}\) is continuous at \(\overline{x}\) and strict complementarity holds, we deduce that \(i\in \overline{\varGamma}\) holds. Thus \(\varGamma(x)\subseteq \overline{\varGamma}\) holds.

Therefore, \(\varGamma(x)=\overline{\varGamma}\) holds. In a similar way, we can obtain \(\varUpsilon(x)=\overline{\varUpsilon}\) and \(\varPsi(x)=\overline {\varPsi}\). The proof is complete. □

Theorem 3.1 shows that \(\varGamma(x)\), \(\varUpsilon(x)\) and \(\varPsi(x)\) are “good” estimates of Γ̅, ϒ̅, and Ψ̅, respectively. Next, we give the choices of the direction \(d_{k}\) and the stepsize \(\alpha_{k}\) at the current point \(x_{k}\in\mathbf{B}\). With the sets \(\varGamma_{k}=\varGamma(x_{k})\), \(\varUpsilon_{k}=\varUpsilon(x_{k})\) and \(\varPsi_{k}=\varPsi(x_{k})\), the search direction \(d_{k}=(d_{k}^{\varGamma_{k}},d_{k}^{\varUpsilon_{k}},d_{k}^{\varPsi_{k}})\) is chosen as

$$\begin{aligned}& d_{k}^{i}= l_{i}-x_{k}^{i},\quad i\in\varGamma_{k}; \end{aligned}$$
$$\begin{aligned}& d_{k}^{i}=u_{i}-x_{k}^{i},\quad i\in\varPsi_{k}; \end{aligned}$$
$$\begin{aligned}& d_{k}^{i}=-\bigl(Z\overline{H}_{k} Z^{T} g_{k}\bigr)^{i}, \quad i\in\varUpsilon_{k}, \end{aligned}$$

where \(d_{k}^{\varUpsilon_{k}}\) denotes the subspace direction for the inactive variables, \(\overline{H}_{k}=Z^{T} H_{k} Z\in \Re^{|\varUpsilon_{k}|\times|\varUpsilon_{k}|}\) is an approximation of the reduced inverse Hessian matrix, \(H_{k}\) is an approximation of the full space inverse Hessian matrix, and Z is the matrix with columns \(\{e^{i}\mid i\in\varUpsilon_{k}\} \) and \(e^{i}\) is the ith column of the identity matrix in \(\Re^{n\times n}\).
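A minimal sketch of this direction is given below. It takes the bound components as \(l_{i}-x_{k}^{i}\) and \(u_{i}-x_{k}^{i}\), the form used in the convergence analysis, and represents \(Z^{T}g_{k}\) by boolean indexing. The callable `apply_H`, which stands for multiplication by the reduced matrix \(\overline{H}_{k}\), and all names are assumptions for illustration.

```python
import numpy as np

def search_direction(x, g, l, u, lower, upper, free, apply_H=None):
    """Assemble d from the three index sets; apply_H acts on the free subspace."""
    d = np.zeros_like(x, dtype=float)
    d[lower] = l[lower] - x[lower]      # move estimated-active variables to l
    d[upper] = u[upper] - x[upper]      # move estimated-active variables to u
    g_free = g[free]                    # Z^T g: gradient on the free variables
    d[free] = -(g_free if apply_H is None else apply_H(g_free))
    return d

l = np.zeros(3)
u = np.ones(3)
x = np.array([0.0, 0.5, 1.0])
g = np.array([2.0, 0.1, -3.0])
lower = np.array([True, False, False])
upper = np.array([False, False, True])
free = np.array([False, True, False])

d_identity = search_direction(x, g, l, u, lower, upper, free)
d_scaled = search_direction(x, g, l, u, lower, upper, free,
                            apply_H=lambda v: 2.0 * v)
```

Only the free block requires a quasi-Newton product, so the cost of the direction is governed by \(|\varUpsilon_{k}|\) rather than n.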

For smooth optimization problems, several authors use the projected search for quadratic and nonlinear programming problems with bounds (see [27]). The projected search finds a steplength \(\alpha_{k}=\beta^{k}>0\) with sufficient decrease in the function \(\phi_{k} : \Re\rightarrow\Re\) such that

$$ \phi_{k}(\alpha)\leq\phi_{k}(0)+\sigma \nabla\phi_{k}(0)\alpha, $$

where \(\beta\in(0,1)\) and \(\sigma\in(0,\frac{1}{2})\) are constants,

$$\phi_{k} (\alpha)=f^{\mathrm{MY}}\bigl([x_{k}+\alpha d_{k}]^{+}\bigr), $$

\([\cdot]^{+}\) is the projection into B defined by

$$\begin{aligned}{} [x]^{+}= \left \{ \textstyle\begin{array}{l@{\quad}l} x^{i},& \text{if }l_{i}\leq x^{i} \leq u_{i},\\ l_{i},& \text{if }x^{i}< l_{i},\\ u_{i} ,&\text{if }x^{i}>u_{i}. \end{array}\displaystyle \right . \end{aligned}$$
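The projection and the backtracking search can be sketched as follows. The sketch assumes that \(\nabla\phi_{k}(0)\) is taken as \(g_{k}^{T}d_{k}\) and that trial steps are \(1,\beta,\beta^{2},\ldots\); the function names, the backtracking limit, and the smooth test function are illustrative assumptions.

```python
import numpy as np

def project(x, l, u):
    """Componentwise projection [x]^+ onto the box l <= x <= u."""
    return np.minimum(np.maximum(x, l), u)

def projected_armijo(phi, x, d, g, l, u, beta=0.1, sigma=0.1, max_backtracks=30):
    """Try alpha = 1, beta, beta^2, ... until the sufficient decrease test holds."""
    f0 = phi(x)
    slope = g @ d                       # assumed stand-in for grad phi_k(0)
    alpha = 1.0
    for _ in range(max_backtracks):
        x_new = project(x + alpha * d, l, u)
        if phi(x_new) <= f0 + sigma * alpha * slope:
            return alpha, x_new
        alpha *= beta
    return 0.0, x                       # no acceptable step found

# Hypothetical smooth test function 0.5 * ||x - c||^2 on the unit box.
c = np.array([2.0, -1.0])
phi = lambda z: 0.5 * np.sum((z - c) ** 2)
l = np.zeros(2)
u = np.ones(2)
x = np.array([0.5, 0.5])
g = x - c
alpha, x_new = projected_armijo(phi, x, -g, g, l, u)
```

Because every trial point is projected before it is evaluated, each accepted iterate is feasible by construction.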

In this paper, we also use this technique to determine the steplength \(\alpha_{k}\). Based on these choices of \(d_{k}\) and \(\alpha_{k}\), and letting \(\mathit{ms}\leq m\) be the number of correction pairs, we give the steps of this algorithm.


Up-to-date \((\mathit{ms}, \{\overline{s}_{k}\}, \{ \overline{y}_{k}\}, H_{0}, d, Z)\)

Step 1::


Step 2::

if \(\mathit{ms}=0\), \(d=H_{0}d\), return;

Step 3::

\(\alpha=(\overline{s}_{k}^{\mathit{ms}-1})^{T}d/(\overline{y}_{k}^{\mathit{ms}-1})^{T}\overline {s}_{k}^{\mathit{ms}-1}\); \(d=d-\alpha\overline{y}_{k}^{\mathit{ms}-1}\);

Step 4::

Call Up-to-date \((\mathit{ms}-1, \{\overline{s}_{k}\}, \{ \overline{y}_{k}\}, H_{0}, d, Z)\);

Step 5::

\(d=d+(\alpha-(d^{T}\overline{y}_{k}^{\mathit{ms}-1}/(\overline {y}_{k}^{\mathit{ms}-1})^{T}\overline{s}_{k}^{\mathit{ms}-1}))\overline{s}_{k}^{\mathit{ms}-1}\);

Step 6::


Now we state the algorithm for nonsmooth optimization problems (2.2) with bound constrained conditions.

Algorithm 1


Step 1: :

Given \(x_{0}\in\mathbf{B}\), constants \(\sigma\in (0,\frac{1}{2})\) and \(m\in(3,20)\), initial matrix θI, \(a_{i}(x)\) and \(b_{i}(x)\); set \(k=0\).

Step 2: :

Use (3.5) to determine \(\varGamma_{k}=\varGamma (x_{k})\), \(\varPsi_{k}=\varPsi(x_{k})\), and \(\varUpsilon_{k}=\varUpsilon(x_{k})\).

Step 3: :

According to (3.6)–(3.8), obtain \(d_{k}\).

Step 4: :

If \(d_{k}=0\), stop.

Step 5: :

Find \(\alpha_{k}\) by (3.9).

Step 6: :

Let \(x_{k+1}=[x_{k}+\alpha_{k}d_{k}]^{+}\) and get \(f^{\mathrm{MY}}(x_{k+1})\) and \(g(x_{k+1})\).

Step 7: :

Update \(H_{k}\) by (2.5).

Step 8: :

Set \(k=k+1\) and go to Step 2.


The given algorithm can be regarded as an extension of [36] from smooth optimization to nonsmooth optimization.
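The overall flow of Algorithm 1 can be illustrated on a toy problem. The sketch below is a simplification under stated assumptions, not the authors' implementation: it takes \(f(x)=\|x-c\|_{1}\) so that \(f^{\mathrm{MY}}\) and g are available in closed form, uses constant \(a_{i}=b_{i}\), and keeps \(\overline{H}_{k}=I\) in every iteration, so the L-BFGS update of Step 7 is omitted. All names are hypothetical.

```python
import numpy as np

def f_my_and_grad(x, c, lam):
    """Closed-form f_MY and g for the assumed f(z) = ||z - c||_1."""
    r = x - c
    p = c + np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    val = np.sum(np.abs(p - c)) + np.sum((p - x) ** 2) / (2.0 * lam)
    return val, (x - p) / lam

def solve_box(x, c, l, u, lam=1.0, a=1e-5, sigma=0.1, beta=0.1,
              tol=1e-10, max_iter=100):
    history = []
    for _ in range(max_iter):
        fx, g = f_my_and_grad(x, c, lam)
        history.append(fx)
        # Step 2: estimate the active sets.
        lower = x <= l + a * g
        upper = x >= u + a * g
        free = ~(lower | upper)
        # Step 3: search direction, with H_k = I on the free subspace.
        d = np.zeros_like(x)
        d[lower] = l[lower] - x[lower]
        d[upper] = u[upper] - x[upper]
        d[free] = -g[free]
        # Step 4: stop when the direction vanishes.
        if np.linalg.norm(d) <= tol:
            break
        # Steps 5-6: projected backtracking line search.
        alpha, accepted = 1.0, False
        for _ in range(30):
            x_new = np.clip(x + alpha * d, l, u)
            if f_my_and_grad(x_new, c, lam)[0] <= fx + sigma * alpha * (g @ d):
                accepted = True
                break
            alpha *= beta
        if not accepted:
            break
        x = x_new
    return x, history

c = np.array([3.0, -2.0, 0.4])
l = np.zeros(3)
u = np.ones(3)
x_star, history = solve_box(np.array([0.5, 0.5, 0.5]), c, l, u)
```

On this toy problem the minimizer of f over the box is the projection of c onto the box, which gives a simple reference value for the returned point.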

Global convergence

Assumption A

The matrix \(\overline{H}_{k}\) (\(k=1,2,\ldots\)) is positive definite, namely, there exist constants \(0<\lambda_{1}\leq\lambda_{2}\) satisfying

$$\lambda_{1} \Vert y \Vert ^{2} \leq y^{T}\overline{H}_{k}y\leq\lambda_{2} \Vert y \Vert ^{2}, \quad\text{for all nonzero }y\in \Re^{ \vert \varUpsilon_{k} \vert }. $$

Assumption B

The level set \(\phi=\{x\in\Re^{n}\mid f^{\mathrm{MY}}(x)\leq f^{\mathrm{MY}}(x_{0})\}\cap \mathbf{B}\) is compact and \(f^{\mathrm{MY}}\) is bounded from below.

Using proof techniques similar to those of [36] for smooth box-constrained optimization, we can obtain the following lemma, which we state without proof.

Lemma 4.1

Assume that \(d_{k}\neq0\) is defined by (3.6)–(3.8) and \(x_{k}\in\mathbf{B}\). Then we have

$$ \min\biggl\{ 1,\frac{ \Vert u-l \Vert _{\infty}}{ \Vert d_{k} \Vert _{\infty}}\biggr\} \geq \beta^{k}\geq \min\biggl\{ 1,\frac{\chi_{k}}{ \Vert d_{k} \Vert _{\infty}}\biggr\} $$

and

$$ \Vert d_{k} \Vert ^{2}\leq-\varrho g_{k}^{T}d_{k}, $$

where \(\varrho>0\) is a constant, \(\chi_{k}=\min\{ |a_{i}(x_{k})g_{k}^{i}|,|b_{i}(x_{k})g_{k}^{i}|,i\in \varUpsilon_{k},g_{k}^{i}\neq0\}\), \(\beta^{k}=\sup_{0\leq\omega\leq 1}\{\omega\mid l \leq x_{k} + \omega d_{k}\leq u\}\), and \(g_{k}^{i}\) is the ith element of \(g(x_{k})\). Moreover, \(x_{k}\) is a \(KKT\) point of (2.2) if and only if \(d_{k}=0\).

If \(x_{k}\) is not a \(KKT\) point, then by Lemma 4.1 it is easy to deduce that \(d_{k}\) is a descent direction.

Theorem 4.1

The sequence \(\{x_{k},d_{k},\overline{H}_{k}\}\) is generated by Algorithm 1. Let Assumptions A and B hold, and let \(\|Z^{T}\overline{H}_{k} Z\|\leq \zeta_{1} \) hold for all k with a constant \(\zeta_{1}>0\). Then every accumulation point of \(\{x_{k}\}\) is a \(KKT\) point of (2.2).


Proof

We will prove this theorem by contradiction. Let \(x_{*}\) be any accumulation point of \(\{x_{k}\}\); then there exists a subsequence \(\{x_{k_{i}}\}\), \(i=1,2,\dots\), satisfying \(\lim_{i\rightarrow \infty}x_{k_{i}}=x_{*}\). Suppose that \(x_{*}\) is not a \(KKT\) point. By (3.3) and (3.4), we can conclude that there exists \(j\in\overline{\varGamma}\) or \(j \in \overline{\varPsi}\) such that

$$ g(x_{*})^{j}< 0 \quad\text{or}\quad g(x_{*})^{j} >0 $$

or \(j \in\overline{\varUpsilon}\) satisfying

$$ g(x_{*})^{j}\neq0. $$

By the line search (3.9) and (4.2), we can deduce that the sequence \(\{f^{\mathrm{MY}}(x_{k})\}\) is decreasing. Since \(f^{\mathrm{MY}}\) is bounded from below by Assumption B, we have

$$ \infty>\sum_{k=1}^{\infty}\bigl(f^{\mathrm{MY}}(x_{k})-f^{\mathrm{MY}}(x_{k+1}) \bigr)\geq\sum_{k=1}^{\infty}-\sigma \alpha_{k}g_{k}^{T}d_{k}, $$

this implies that

$$\sum_{k=1}^{\infty}-\alpha_{k}g_{k}^{T}d_{k} < \infty. $$

In particular, we get

$$ \lim_{k\to\infty}-\alpha_{k}g_{k}^{T}d_{k}=0. $$

Using the definition of \(d_{k}\), setting \(\vartheta_{1}=\max_{x\in\mathbf {B}}\|g(x)\|^{2}\), and

$$\nu_{k}=\sum_{i\in \varGamma_{k}}\bigl(a_{i}(x_{k})\bigr)^{2}+\sum_{i\in\varPsi_{k}}\bigl(b_{i}(x_{k})\bigr)^{2},$$

we get

$$\begin{aligned} \Vert d_{k} \Vert ^{2} =& \bigl\Vert Z \overline{H}_{k}Z^{T}g_{k} \bigr\Vert ^{2}+\sum_{i\in \varGamma_{k}}\bigl(l_{i}-x_{k}^{i} \bigr)^{2} +\sum_{i\in\varPsi_{k}} \bigl(u_{i}-x_{k}^{i}\bigr)^{2} \\ \leq& \zeta_{1}^{2} \Vert g_{k} \Vert ^{2} + \sum_{i\in\varGamma_{k}}\bigl(a_{i}(x_{k}) g_{k}^{i}\bigr)^{2} +\sum _{i\in\varPsi_{k}}\bigl(b_{i}(x_{k})g_{k}^{i} \bigr)^{2} \\ \leq&\bigl(\zeta_{1}^{2}+\nu_{k}\bigr) \Vert g_{k} \Vert ^{2} \\ \leq& \bigl(\zeta_{1}^{2}+\nu_{k}\bigr) \vartheta_{1}. \end{aligned}$$

The above relation and (4.1) imply that there exists a constant \(\beta^{*}\in(0,1)\) such that

$$ \beta^{k}\geq\beta^{*},\quad \forall k. $$

Suppose that \(\alpha_{k}\) satisfies the line search (3.9). If \(\alpha_{k}< 0.1\beta^{*}\), then there exists an unacceptable steplength \(\alpha_{k,i}\leq 10\alpha_{k}\) with \(i\geq0\) satisfying

$$f^{\mathrm{MY}}(x_{k})+\alpha_{k,i}g_{k}^{T}d_{k}+ \zeta_{2}\alpha_{k,i}^{2} \Vert d_{k} \Vert ^{2}\geq f^{\mathrm{MY}}(x_{k}+ \alpha_{k,i}d_{k})\geq f^{\mathrm{MY}}(x_{k})+ \sigma\alpha_{k,i}g_{k}^{T}d_{k}, $$

where \(\zeta_{2}=\frac{1}{2}\max_{x\in\mathbf{B}}\|\nabla^{2}f^{\mathrm{MY}}(x)\| \). Then we obtain

$$\alpha_{k,i}\geq\frac{-(1-\sigma)g_{k}^{T}d_{k}}{\zeta_{2} \Vert d_{k} \Vert ^{2}}\geq \frac{(1-\sigma)}{\zeta_{2}\varrho}, $$

where the second inequality follows from (4.2). Therefore, we get

$$\alpha_{k}\geq\min\bigl\{ 0.1\alpha_{k,i},0.1\beta^{*}\bigr\} \geq \min\biggl\{ \frac{(1-\sigma)}{10\zeta_{2}\varrho},0.1\beta^{*}\biggr\} >0,\quad \forall k. $$

This together with (4.6) implies that

$$ \lim_{k\to\infty}g_{k}^{T}d_{k}=0. $$

By the definition of \(\varGamma_{k}\) and \(\varPsi_{k}\), we can conclude that every term of the right side in the following relation:

$$-d_{k}^{T}g_{k}=g_{k}^{T}Z \overline{H}_{k}Z^{T}g_{k}-\sum _{i\in \varGamma_{k}}\bigl(l_{i}-x_{k}^{i} \bigr)g_{k}^{i}-\sum_{i\in\varPsi_{k}} \bigl(u_{i}-x_{k}^{i}\bigr)g_{k}^{i} $$

is nonnegative. Thus, by (4.8), we get

$$\lim_{k\to\infty} \bigl\Vert Z^{T}g_{k} \bigr\Vert =0, \qquad\lim_{k\to\infty}\sum_{i\in\varGamma _{k}} \bigl(l_{i}-x_{k}^{i}\bigr)g_{k}^{i}=0,\qquad \lim_{k\to\infty}\sum_{i\in\varPsi _{k}} \bigl(u_{i}-x_{k}^{i}\bigr)g_{k}^{i}=0. $$

Now take some \(j \in\overline{\varUpsilon}\) such that (4.4) holds. By the above three relations, for all sufficiently large i, we get \(j\notin\varGamma(x_{k_{i}})\cup\varPsi(x_{k_{i}})\cup\varUpsilon (x_{k_{i}})\), which is a contradiction. This completes the proof. □


If the condition \(g(\overline{x})=0\) holds, then by (2.3) and the convexity of \(f^{\mathrm{MY}}(x)\), it is not difficult to get \(\overline{x}=p(\overline{x})\). Accordingly, the point \(\overline{x}\) is the unique optimal solution of (1.1).

Numerical results

In the experiments, all codes were written in MATLAB R2010a and run on a PC with an Intel(R) Core(TM) i3-3217U CPU at 1.80 GHz, 4.00 GB of RAM, and the Windows 7 operating system. Our experiments are performed on a set of nonlinear box-constrained nonsmooth problems from Karmitsa [17], which come with given initial points; these problems are listed in Table 1. In the experiments, we choose \(\sigma=\beta=0.1\), \(a_{i}(x)=b_{i}(x)=10^{-5}\) in (3.5), \(\theta=1\) so that the “basic matrix” in the limited memory BFGS method is the identity matrix I, and \(m=5\). For the subproblem (2.1), we use the PRP conjugate gradient algorithm, and its iteration and function evaluation counts are added to those of the main algorithm. The PRP conjugate gradient algorithm for (2.1) is listed as follows:

Table 1 Tested problems

PRP algorithm for subproblem (2.1)

Step 1::

Given \(z_{0}\), constants \(\lambda>0\), \(\tau_{0}>0\), and \(\iota\in (0,1)\), \(d_{0}^{\mathrm{sub}}=-g_{0}^{\mathrm{sub}}=-\frac{z_{0}-x_{0}}{\lambda}\), set \(k=0\).

Step 2::

If \(\|g_{k}^{\mathrm{sub}}\|<\iota\), stop.

Step 3::

Find \(\alpha_{k}^{\mathrm{sub}}\) satisfying the following Armijo line search:

$$f^{\mathrm{MY}}\bigl(z_{k}+\alpha_{k}^{\mathrm{sub}}d_{k}^{\mathrm{sub}} \bigr)-f^{\mathrm{MY}}(z_{k})\leq\alpha_{k}^{\mathrm{sub}} \tau _{0}\bigl(d_{k}^{\mathrm{sub}}\bigr)^{T}g_{k}^{\mathrm{sub}}, $$

where \(\alpha_{k}^{\mathrm{sub}}\) is the largest element of \(\{1,\frac{1}{2},\frac{1}{2^{2}},\frac {1}{2^{3}},\ldots\}\) satisfying the above inequality.

Step 4::

Let \(z_{k+1}=z_{k}+\alpha_{k}^{\mathrm{sub}}d_{k}^{\mathrm{sub}}\) and \(g_{k+1}^{\mathrm{sub}}=\frac{z_{k+1}-x_{0}}{\lambda}\).

Step 5::

If \(\|g_{k+1}^{\mathrm{sub}}\|<\iota\), stop; otherwise compute the search direction by

$$d_{k+1}^{\mathrm{sub}}=-g_{k+1}^{\mathrm{sub}}+ \frac{(g_{k+1}^{\mathrm{sub}}-g_{k}^{\mathrm{sub}})^{T} g_{k+1}^{\mathrm{sub}}}{ \Vert g_{k}^{\mathrm{sub}} \Vert ^{2}}d_{k}^{\mathrm{sub}}. $$

Step 6::

Set \(k=k+1\) and go to Step 3.

In the above algorithm, \(x_{0}\) follows Algorithm 1, \(\iota=1\text{e}{-}3\), and \(\tau_{0}=0.25\). Since the given algorithm is designed for large-scale nonsmooth problems, the dimensions of the test problems are 5000, 7000, 10,000, and 11,000. The following Himmelblau stop rule is used: If \(| f^{\mathrm{MY}}(x_{k})|> 1\text{e}{-}4\), let \(\mathit{stop}1=\frac{| f^{\mathrm{MY}}(x_{k})-f^{\mathrm{MY}}(x_{k+1})|}{| f^{\mathrm{MY}}(x_{k})|}\); otherwise, let \(\mathit{stop}1=| f^{\mathrm{MY}}(x_{k})-f^{\mathrm{MY}}(x_{k+1})|\). The program stops if \(\mathit{stop}1< 1\text{e}{-}4\) is satisfied. In the experiments, we find that different stop rules noticeably influence the number of iterations and the number of function evaluations, but not the final function value. Since the resulting iteration numbers are stable, we choose the Himmelblau stop rule. In order to show the efficiency of the given method, we also test the normal active-set algorithm with L-BFGS update and compare their performance. The columns of Table 2 have the following meanings:


the dimension of the problem;


the total number of iterations;


the total number of the function value;


the CPU time in second;


denotes the function value at the point when the program is stopped.
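The Himmelblau stop rule described above can be written as a small helper. The function name is hypothetical; the threshold \(1\text{e}{-}4\) is used both for switching between the relative and the absolute test and for the final comparison, as in the text.

```python
def himmelblau_stop(f_old, f_new, tol=1e-4):
    """Relative change if |f_old| > tol, absolute change otherwise."""
    change = abs(f_old - f_new)
    stop1 = change / abs(f_old) if abs(f_old) > tol else change
    return stop1 < tol

small_relative_change = himmelblau_stop(100.0, 99.999)
large_relative_change = himmelblau_stop(100.0, 99.0)
near_zero_values = himmelblau_stop(1e-5, 0.0)
```

Switching to the absolute test near zero avoids dividing by a very small function value.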

Table 2 Numerical results

From the numerical results in Table 2, it is not difficult to see that both methods are effective for solving these ten box-constrained nonsmooth problems. The iteration number and the function number do not change as the dimension increases, except for the problems Generalization of MAXQ and Chained Crescent I, which shows that the given algorithm is feasible and stable. Since the problems Chained Crescent I and Chained Crescent II have many similar properties, they attain the same optimal value for each given dimension. The CPU time is promising, although it grows as the dimension increases.

To directly show the performance of these two algorithms, the tool of Dolan and Moré [9] is used to analyze them. Figure 1 shows the performance of Algorithm 1 and the active-set algorithm relative to the CPU time in Table 2. It is not difficult to see that Algorithm 1 performs better, since it has the higher probability of being the optimal solver. Figure 1 shows that Algorithm 1 successfully solves 100% of the test problems at \(t\approx2\), whereas the active-set algorithm solves all of the test problems only at about \(t\approx41\). All in all, although the proposed method does not achieve as significant an improvement as we expected, we think that its enhancement is still noticeable.
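For readers unfamiliar with the tool of Dolan and Moré, the following sketch shows how a performance profile is computed from a table of costs. The timing matrix is made-up illustrative data and does not come from Table 2; the function name is also an assumption.

```python
import numpy as np

def performance_profile(times, taus):
    """times: (n_problems, n_solvers) costs; returns the fraction of problems
    each solver handles within a factor tau of the best solver."""
    times = np.asarray(times, dtype=float)
    best = times.min(axis=1, keepdims=True)     # best cost per problem
    ratios = times / best                       # performance ratios
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Hypothetical CPU times for 3 problems (rows) and 2 solvers (columns).
times = [[1.0, 2.0],
         [2.0, 2.0],
         [3.0, 12.0]]
profile = performance_profile(times, taus=[1.0, 2.0, 4.0])
```

The value of a solver's curve at \(t=1\) is the fraction of problems on which it is the fastest, and the smallest t at which the curve reaches 1 is the worst-case ratio to the best solver.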

Figure 1

Performance of cpu-Time of these two algorithms.


(i) It is well known that nonsmooth problems are very difficult to solve even when they are unconstrained, especially for large-scale problems. The Moreau–Yosida regularization technique is an effective tool for dealing with this difficulty. We use this technique to propose a subspace algorithm for solving box-constrained nonsmooth optimization problems.

(ii) In order to decrease the computational workload and obtain good numerical performance, the limited memory BFGS method is used in the given algorithm. The numerical performance on the test problems shows that the presented algorithm is very promising for large-scale problems. The dimensions range from 5000 to 11,000 variables, which are larger than those of the unconstrained nonsmooth problems of [18].

(iii) In the experiments, we find that different stop rules noticeably influence the iteration numbers and the function numbers, but not the final function values. Since the reason lies in the stop rule, further work is to find more suitable stop rules for the given algorithm.

(iv) Inspired by the ideas of [26] and [10], we extend their techniques to nonsmooth problems. The proof methods of this paper are similar to those of [36]. However, all three of these papers concentrate on continuously differentiable optimization problems.

(v) Considering the above discussions, we think there are at least three issues that could lead to improvements. The first is the constant m in the L-BFGS update formula, which could be adjusted. Another important point is the choice of the parameters in the active-set identification technique, since the values used here are not the only possible choice. The last and most important one comes from the numerical performance: are there better stop rules, other optimality conditions, and convergence conditions for nonsmooth problems? All of these aspects are left for future work.

Overall, we think that the method provides a valid approach for solving large-scale box-constrained nonsmooth problems, since the numerical performance is promising.


  1. Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Program. 3, 407–456 (2005)
  2. Birge, J.R., Qi, L., Wei, Z.: A general approach to convergence properties of some methods for nonsmooth convex optimization. Appl. Math. Optim. 38, 141–158 (1998)
  3. Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: A family of variable metric proximal methods. Math. Program. 68, 15–47 (1995)
  4. Byrd, R.H., Lu, P.H., Nocedal, J.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Stat. Comput. 16, 1190–1208 (1995)
  5. Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63, 129–156 (1994)
  6. Calamai, P., Moré, J.J.: Projected gradient methods for linearly constrained problems. Math. Program. 39, 93–116 (1987)
  7. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley-Interscience, New York (1983)
  8. Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Math. Program. 62, 261–273 (1993)
  9. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
  10. Facchinei, F., Júdice, J., Soares, J.: An active set Newton algorithm for large-scale nonlinear programs with box constraints. SIAM J. Optim. 8, 158–186 (1998)
  11. Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, Chichester (1987)
  12. Fukushima, M., Qi, L.: A global and superlinearly convergent algorithm for nonsmooth convex minimization. SIAM J. Optim. 6, 1106–1120 (1996)
  13. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading (1998)
  14. Haarala, M., Mäkelä, M.M.: Limited memory bundle algorithm for large bound constrained nonsmooth minimization problems. Reports of the Department of Mathematical Information Technology, Series B. Scientific Computing, No. B. 1/2006, University of Jyväskylä, Finland (2006)
  15. Haarala, M., Miettinen, K., Mäkelä, M.M.: New limited memory bundle method for large-scale nonsmooth optimization. Optim. Methods Softw. 19, 673–692 (2004)
  16. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II. Springer, Berlin (1983)
  17. Karmitsa, N.: Test problems for large-scale nonsmooth minimization. Reports of the Department of Mathematical Information Technology, Series B. Scientific Computing, No. B. 4/2007, University of Jyväskylä, Finland (2007)
  18. Karmitsa, N., Bagirov, A., Mäkelä, M.M.: Comparing different nonsmooth minimization methods and software. Optim. Methods Softw. 27, 131–153 (2012)
  19. Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics, vol. 1133. Springer, Berlin (1985)
  20. Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable optimization. Math. Program. 46, 105–122 (1990)
  21. Kiwiel, K.C.: Proximal level bundle methods for convex nondifferentiable optimization, saddle-point problems and variational inequalities. Math. Program. 69, 89–109 (1995)
  22. Lemaréchal, C.: Extensions diverses des méthodes de gradient et applications, Thèse d’Etat. Paris (1980)
  23. Lemaréchal, C.: Nondifferentiable optimization. In: Nemhauser, G.L., Rinnooy Kan, A.H.G., Todd, M.J. (eds.) Optimization, pp. 529–572. Elsevier, New York (1989)
  24. Majava, K., Haarala, N., Kärkkäinen, T.: Solving variational image denoising problems using limited memory bundle method. In: Proceedings of the 2nd International Conference on Scientific Computing and Partial Differential Equations and the First East Asia SIAM Symposium, Hong Kong (2005)
  25. Nguyen, V.H., Strodiot, J.J.: A linearly constrained algorithm not requiring derivative continuity. Eng. Struct. 6, 7–11 (1984)
  26. Ni, Q.: A subspace projected conjugate algorithm for large bound constrained quadratic programming. Numer. Math. J. Chin. Univ. 7, 51–60 (1998)
  27. Ni, Q., Yuan, Y.X.: A subspace limited memory quasi-Newton algorithm for large-scale nonlinear bound constrained optimization. Math. Comput. 66, 1509–1520 (1997)
  28. Powell, M.J.D.: A fast algorithm for nonlinearly constrained optimization calculations. In: Numerical Analysis, pp. 155–157 (1978)
  29. Qi, L.: Convergence analysis of some algorithms for solving nonsmooth equations. Math. Oper. Res. 18, 227–245 (1993)
  30. Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58, 353–367 (1993)
  31. Schramm, H.: Eine Kombination von Bundle- und Trust-Region-Verfahren zur Lösung nichtdifferenzierbarer Optimierungsprobleme. Bayreuther Mathematische Schriften, vol. 30 (1989)
  32. Schramm, H., Zowe, J.: A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J. Optim. 2, 121–152 (1992)
  33. Schultz, H.K.: A Kuhn–Tucker algorithm. SIAM J. Control Optim. 11, 438–445 (1973)
  34. Sreedharan, V.P.: A subgradient projection algorithm. J. Approx. Theory 35, 111–126 (1982)
  35. Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable convex functions. Math. Program. Stud. 3, 145–173 (1975)
  36. Xiao, Y., Wei, Z.: A new subspace limited memory BFGS algorithm for large-scale bound constrained optimization. Appl. Math. Comput. 185, 350–359 (2007)
  37. Yuan, G., Li, P., Lu, J.: The global convergence of the BFGS method with a modified WWP line search for nonconvex functions. Numer. Algorithms (2019, accepted)
  38. Yuan, G., Li, T., Hu, W.: A conjugate gradient algorithm for large-scale nonlinear equations and image restoration problems. Appl. Numer. Math. 147, 129–141 (2020)
  39. Yuan, G., Meng, Z., Li, Y.: A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 168, 129–152 (2016)
  40. Yuan, G., Sheng, Z.: Optimization Algorithm of Nonsmooth. Science Press, Beijing (2017)
  41. Yuan, G., Sheng, Z., Wang, P., Hu, W., Li, C.: The global convergence of a modified BFGS method for nonconvex functions. J. Comput. Appl. Math. 327, 274–294 (2018)
  42. Yuan, G., Wei, Z.: The Barzilai and Borwein gradient method with nonmonotone line search for nonsmooth convex optimization problems. Math. Model. Anal. 17, 203–216 (2012)
  43. Yuan, G., Wei, Z., Li, G.: A modified Polak–Ribière–Polyak conjugate gradient algorithm with nonmonotone line search for nonsmooth convex minimization. J. Comput. Appl. Math. 255, 86–96 (2014)
  44. Yuan, G., Wei, Z., Lu, X.: Global convergence of the BFGS method and the PRP method for general functions under a modified weak Wolfe–Powell line search. Appl. Math. Model. 47, 811–825 (2017)
  45. Yuan, G., Wei, Z., Wang, Z.: Gradient trust region algorithm with limited memory BFGS update for nonsmooth convex minimization. Comput. Optim. Appl. 54, 45–64 (2013)
  46. Yuan, G., Wei, Z., Yang, Y.: The global convergence of the Polak–Ribière–Polyak conjugate gradient algorithm under inexact line search for nonconvex functions. J. Comput. Appl. Math. 362, 262–275 (2019)



The author would like to thank the editor and the referees for their valuable comments, which greatly improved the paper.

Availability of data and materials

All data are included in this paper and may be freely used.


This work is supported by the High Level Innovation Teams and Excellent Scholars Program in Guangxi institutions of higher education (Grant No. [2019]52), the National Natural Science Foundation of China (Grant No. 11661009), the Guangxi Natural Science Key Fund (No. 2017GXNSFDA198046), and the National Social Science Key Foundation of China (Grant No. 17AJL012).

Author information




The main idea of this paper was proposed by XL. XL prepared the manuscript initially and performed all the steps of the proofs in this research. The author read and approved the final manuscript.

Corresponding author

Correspondence to Xiangrong Li.

Ethics declarations

Competing interests

There are no potential conflicts of interest.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


Cite this article

Li, X. A limited memory BFGS subspace algorithm for bound constrained nonsmooth problems. J Inequal Appl 2020, 135 (2020).



  • 65H10
  • 90C26


  • Nonsmooth
  • Large scale
  • Limited BFGS method
  • Global convergence