A limited memory BFGS subspace algorithm for bound constrained nonsmooth problems
Journal of Inequalities and Applications volume 2020, Article number: 135 (2020)
Abstract
The subspace technique has been widely used to solve unconstrained and constrained optimization problems, and many results have been obtained. In this paper, a subspace algorithm combined with the limited memory BFGS update is proposed for large-scale nonsmooth optimization problems with box constraints. This algorithm ensures that all iteration points are feasible and that the sequence of objective function values is decreasing. Moreover, rapid changes in the active set are allowed. Global convergence is established under suitable conditions. Numerical results show that this method is very effective for large-scale nonsmooth box-constrained optimization, where the largest dimension of the test problems is 11,000 variables.
Introduction
Consider the following large-scale nonsmooth optimization problem:
where \(f(x):\Re^{n}\rightarrow\Re\) is supposed to be locally Lipschitz continuous and the number of variables n is supposed to be large. The vectors l and u represent the lower and the upper bounds on the variables, respectively.
Many practical optimization problems involve nonsmooth functions with large numbers of variables (see, e.g., [1, 24]). The active-set method can be generalized straightforwardly when the objective function is nonsmooth. For example, Sreedharan [34] extends the method developed in [33] to solve nonsmooth problems with a special objective function and inequality constraints. Also, it is quite easy to generalize the ε-active-set method to the nondifferentiable case [25]. Yuan et al. [42, 45] use the two-point gradient method and the trust region method to solve nonsmooth problems. Wolfe [35] and Lemaréchal [22] initiated a giant stride forward in nonsmooth optimization with the bundle concept. Kiwiel [20] proposed a bundle variant that is close to the bundle trust region method [32]. Further results about the bundle technique can be found in [19, 21, 31], etc. The basic assumption of bundle methods is that at every point \(x\in\Re^{n}\), the value of the objective function \(f(x)\) and an arbitrary subgradient \(\xi \in\Re^{n}\) from the subdifferential can be evaluated [7] with
where “conv” denotes the convex hull of a set. The idea behind bundle methods is to approximate \(\partial f(x)\) by gathering subgradient information from previous iterations into a bundle. Various versions of bundle methods are currently regarded as the most effective and reliable methods for nonsmooth optimization, but they are efficient mainly for small- and medium-scale problems. This is explained by the fact that bundle methods need relatively large bundles to be capable of solving the problems efficiently [19]. Haarala et al. (see [14, 15], etc.) introduced limited memory bundle methods for large-scale nonsmooth unconstrained and constrained minimization, which are a hybrid of the variable metric bundle methods and the limited memory variable metric methods, and obtained good results; the dimension of their test problems reaches one thousand variables. For unconstrained nonsmooth problems, Karmitsa et al. [18] tested and compared different methods from both groups, as well as some methods that may be considered hybrids of these two and/or some others, where the largest dimension of the test nonsmooth problems is 4000. Yuan et al. [38–40, 43, 46] present conjugate gradient algorithms for solving unconstrained nonsmooth problems, with which nonsmooth problems with 60,000 variables are successfully solved.
Nonsmooth optimization problems are normally difficult to solve, even when they are unconstrained. Derivative-free methods, like Powell's method [11] or genetic algorithms [13], may be unreliable and become inefficient as the dimension of the problem increases. The direct application of smooth gradient-based methods to nonsmooth problems may lead to a failure in optimality conditions, in convergence, or in gradient approximation [23]. Therefore, special tools for solving nonsmooth optimization problems are needed, especially for constrained nonsmooth problems. In this paper, we present a new algorithm that combines an active-set strategy with the gradient projection method. The active sets are identified at each iteration by a guessing technique, and the search direction in the free subspace is determined by a limited memory BFGS algorithm, which provides an efficient means of attacking large-scale nonsmooth bound constrained optimization problems. This paper has the following main features:
 ♣:
the nonsmooth objective function is descent;
 ♣:
a limited memory BFGS method is given for nonsmooth problem; the iteration sequence \(\{x_{k}\}\) is feasible;
 ♣:
the global convergence of the new method is established;
 ♣:
large-scale nonsmooth problems (11,000 variables) are successfully solved.
This paper is organized as follows. In the next section, we briefly review some nonsmooth analysis, the L-BFGS method for unconstrained optimization, and the motivation based on these techniques. In Sect. 3, we describe the active-set algorithm for (1.1). The global convergence is established in Sect. 4. Numerical results are reported in Sect. 5. Throughout this paper, \(\|\cdot\|\) denotes the Euclidean norm of a vector or matrix.
Motivation based on nonsmooth analysis and the L-BFGS update
This section states some results on nonsmooth analysis and the L-BFGS formula, respectively.
Results of convex analysis and nonsmooth analysis
Let \(f^{\mathrm{MY}}:\Re^{n}\rightarrow\Re\) be the socalled Moreau–Yosida regularization of f and be defined by
where λ is a positive parameter. Then it is not difficult to see that the problem (1.1) is equivalent to the following problem:
The function \(f^{\mathrm{MY}}\) has some good properties: it is a differentiable convex function and has a Lipschitz continuous gradient even when the function f is nondifferentiable. The gradient function of \(f^{\mathrm{MY}}\) can be proved to be semismooth under some reasonable conditions [12, 30]. Based on these features, many algorithms have been proposed for (2.2) (see [2] etc.) when \(\mathbf{B}=\Re^{n}\). Set
and denote \(p(x)=\operatorname{argmin} \theta(z)\). Then \(p(x)\) is welldefined and unique since \(\theta(z)\) is strongly convex. By (2.1), \(f^{\mathrm{MY}}(x)\) can be expressed by
In what follows, we denote the gradient of \(f^{\mathrm{MY}}\) by g. Some features about \(f^{\mathrm{MY}}(x)\) can be seen in [3, 8, 16]. The generalized Jacobian of \(f^{\mathrm{MY}}(x)\) and the property of BDregular can be found in [6, 29], respectively. Some properties are given as follows.
(i) The function \(f^{\mathrm{MY}}\) is finitevalued, convex, and everywhere differentiable with
Moreover, the gradient mapping \(g:\Re^{n}\rightarrow\Re^{n}\) is globally Lipschitz continuous with modulus λ, i.e.,
(ii) If g is BDregular at x, which means all matrices \(V\in \partial_{B}g(x)\) are nonsingular, then there exist constants \(\mu_{1}>0\), \(\mu_{2}>0\) and a neighborhood Ω of x such that
It is obvious that \(f^{\mathrm{MY}}(x)\) and \(g(x)\) can be obtained from the optimal solution of \(\operatorname{argmin}_{z\in\Re^{n}}\theta(z)\). However, \(p(x)\), the minimizer of \(\theta(z)\), is difficult or even impossible to compute exactly. This means that we cannot use the exact value of \(p(x)\) to define \(f^{\mathrm{MY}}(x)\) and \(g(x)\), so numerical methods are often used to approximate it. In the following, we always suppose that the results (i)–(ii) hold unless otherwise noted.
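As an illustrative sketch (in Python, while the paper's experiments use MATLAB): for \(f(z)=\|z\|_{1}\), which is not a test problem from the paper, the minimizer \(p(x)\) of \(\theta(z)=f(z)+\|z-x\|^{2}/(2\lambda)\) is available in closed form (soft-thresholding), so \(f^{\mathrm{MY}}(x)\) and the gradient \(g(x)=(x-p(x))/\lambda\) can be evaluated exactly. In general \(p(x)\) must be approximated numerically, as done in Sect. 5.

```python
import numpy as np

def moreau_yosida_l1(x, lam=1.0):
    """Moreau-Yosida value and gradient for f(z) = ||z||_1, where
    p(x) = argmin_z f(z) + ||z - x||^2 / (2*lam) is soft-thresholding."""
    p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)  # p(x)
    f_my = np.abs(p).sum() + ((p - x) ** 2).sum() / (2.0 * lam)
    g = (x - p) / lam  # gradient of f^MY at x; also a subgradient of f at p(x)
    return f_my, g, p

f_my, g, p = moreau_yosida_l1(np.array([2.0, -3.0]), lam=1.0)
```

Although f itself is nonsmooth at the origin, \(f^{\mathrm{MY}}\) is differentiable everywhere, in line with property (i).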
LBFGS update
At every iteration \(x_{k}\), the L-BFGS method stores a small number (say m) of correction pairs \(\{s_{i},y_{i}\}\) (\(i=k-1,\ldots,k-m\)) to obtain \(H_{k+1}\), instead of storing the full matrices \(H_{k}\), with
where \(h(x):\Re^{n}\rightarrow\Re\) is a continuously differentiable function. In fact, this method is an adaptation of the BFGS method to large-scale problems (see [4, 5, 37, 41, 44] for details). The L-BFGS update formula is defined by
where \(\rho_{k}=\frac{1}{y_{k}^{T}s_{k}}\) and \(V_{k}=I-\rho_{k} y_{k} s_{k}^{T}\). These correction pairs contain information about the curvature of the function and, in conjunction with the BFGS formula, define the limited memory iteration matrix. This method often provides a fast rate of linear convergence and requires minimal storage.
It is well known that the positive definiteness of update matrix \(H_{k}\) is very important to analyze the convergence of the algorithm. Byrd et al. [4] show that the limited memory BFGS matrix has this property if the curvature \(s_{k}^{T}y_{k}>0\) is satisfied. Similarly Powell [28] proposes that \(y_{k}\) should be designed by
where \(\theta_{k}=\frac{0.8s_{k}^{T}B_{k}s_{k}}{s_{k}^{T}B_{k}s_{k}-s_{k}^{T}y_{k}}\), \(B_{k}\) is an approximation of \(\nabla^{2} h(x_{k})\), and \(B_{k}=H_{k}^{-1}\).
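A minimal sketch of this damping step (the function name is illustrative): when the curvature \(s_{k}^{T}y_{k}\) falls below \(0.2\,s_{k}^{T}B_{k}s_{k}\), the pair is replaced by a convex combination that restores \(s_{k}^{T}\overline{y}_{k}=0.2\,s_{k}^{T}B_{k}s_{k}>0\), so the update stays positive definite.

```python
import numpy as np

def powell_damped_y(s, y, B):
    """Powell's modification of y_k: if the curvature s'y is too small,
    blend y with B s so that s' y_bar = 0.2 * s' B s > 0."""
    sBs = s @ B @ s
    sy = s @ y
    if sy >= 0.2 * sBs:                 # curvature condition already holds
        return y
    theta = 0.8 * sBs / (sBs - sy)      # theta_k from the text
    return theta * y + (1.0 - theta) * (B @ s)

s, B = np.array([1.0, 0.0]), np.eye(2)
y_bar = powell_damped_y(s, np.array([-1.0, 0.0]), B)  # a negative-curvature pair
```

A short calculation confirms the design: with \(\theta_{k}\) as above, \(s^{T}\overline{y} = \theta\, s^{T}y + (1-\theta)\, s^{T}Bs = 0.2\, s^{T}Bs\).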
Inspired by the Moreau–Yosida regularization and the limited memory technique, we give a limited memory BFGS method for box-constrained optimization with a nonsmooth objective function. In the given algorithm, we also combine an active-set strategy with the gradient projection method. The techniques of the following algorithm are similar to those in Facchinei and Lucidi [10], Ni and Yuan [26], and Xiao and Wei [36]; the main difference lies in the treatment of the nonsmooth optimization problem.
Algorithm
Set the feasible region \(\mathbf{B}=\{x\in\Re^{n}:l_{i}\leq x^{i} \leq u_{i},i=1,\ldots,n\}\), where \(x^{i}\) denotes the ith element of the vector x. A vector \(\overline{x}\in\mathbf{B}\) is said to be a stationary point of problem (2.2) if the following relations:
hold, where \(g^{i}\) is the ith element of the vector \(g(x)\). Based on the observations of the above section, we first solve (2.2) and then carry its solution over to problem (1.1). The iterative method used to solve (2.2) is defined by
where \(\alpha_{k}\) is a steplength and \(d_{k}\) is a search direction of \(f^{\mathrm{MY}}\) at \(x_{k}\). Let \(\overline{x}\in\mathbf{B}\) be a stationary point of problem (2.2) and define the active constraint set
thus we can define the set of the free variables by
Therefore (3.1) can be rewritten as
It is reasonable to define the approximation \(\varGamma(x)\), \(\varUpsilon(x)\) and \(\varPsi(x)\) to Γ̅, ϒ̅ and Ψ̅, respectively:
where \(a_{i}\) and \(b_{i}\) are nonnegative continuous functions, bounded from above on B, with the property that if \(x^{i}=l_{i}\) or \(x^{i}=u_{i}\), then \(a_{i}(x)>0\) or \(b_{i}(x)>0\), respectively. Similar to Theorem 3 in [10] for smooth optimization, we can obtain the following results relating \(\varGamma(x)\), \(\varUpsilon(x)\), and \(\varPsi(x)\) to Γ̅, ϒ̅, and Ψ̅ in the nonsmooth case.
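As a sketch, the estimates can be computed componentwise. Constant thresholds \(a_{i}=b_{i}\) are assumed here (the experiments in Sect. 5 use a constant choice as well), together with the usual sign conventions \(g^{i}\geq0\) near a lower bound and \(g^{i}\leq0\) near an upper bound:

```python
import numpy as np

def estimate_active_sets(x, g, l, u, a=1e-5, b=1e-5):
    """Estimate Gamma (lower-active), Psi (upper-active), and the free set
    Upsilon at x from the gradient g of f^MY, using x_i <= l_i + a*g_i
    and x_i >= u_i + b*g_i as the activity tests."""
    lower = x <= l + a * g               # candidate lower-bound indices
    upper = (x >= u + b * g) & ~lower    # candidate upper-bound indices
    gamma = np.where(lower)[0]
    psi = np.where(upper)[0]
    upsilon = np.where(~lower & ~upper)[0]
    return gamma, upsilon, psi

x = np.array([0.0, 0.5, 1.0]); g = np.array([0.3, 0.1, -0.2])
l = np.zeros(3); u = np.ones(3)
gamma, upsilon, psi = estimate_active_sets(x, g, l, u, a=0.01, b=0.01)
```

In this toy call, the first variable sits at its lower bound with a positive gradient, the third at its upper bound with a negative gradient, and the second is free.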
Theorem 3.1
For any feasible point x, \(\varGamma(x)\cap \varPsi(x)=\emptyset\). Suppose that strict complementarity holds and x̅ is a stationary point of problem (2.2). Then there exists a neighborhood of x̅ such that, for every feasible point x in this neighborhood, the following relations:
hold.
Proof
For any feasible x, if \(k\in\varGamma(x)\), it is obvious that \(g^{k}(x)\geq0\) holds. Suppose that also \(k\in \varPsi(x)\); then \(u_{k}\geq x^{k}\geq u_{k}+b_{k}(x)g^{k}(x)\geq u_{k}\) is true, which implies \(x^{k}=u_{k}\), hence \(b_{k}(x)>0\) and \(g^{k}(x)=0\). Then \(k\in\varGamma(x)\) gives \(x^{k}\leq l_{k}\), so \(l_{k}=x^{k}=u_{k}\), which is a contradiction. Hence \(\varGamma(x)\cap\varPsi(x)=\emptyset\) holds.
Now we prove that the second conclusion of the theorem holds. If \(i\in \overline{\varGamma}\), by the definition of Γ̅ and strict complementarity, \(g^{i}(\overline{x})> 0\) holds. Since \(a_{i}\) is nonnegative, \(\overline{x}^{i}\leq l_{i} +a_{i}(\overline{x})g^{i}(\overline{x})\). Since both \(a_{i}\) and \(g^{i}\) are continuous at x̅, we deduce that \(i\in\varGamma(x)\) holds for all feasible x near x̅. So \(\overline{\varGamma}\subseteq\varGamma(x)\) is true.
Conversely, if \(i\in \varGamma(x)\), then by the definition of \(\varGamma(x)\), \(a_{i}(x)g^{i}(x)\geq x^{i}-l_{i}\geq0\) holds. Since \(a_{i}\) is nonnegative, \(g^{i}(x)> 0\) holds. Since \(g^{i}\) is continuous at x̅ and strict complementarity holds, we deduce that \(i\in \overline{\varGamma}\) holds. Thus \(\varGamma(x)\subseteq \overline{\varGamma}\) holds.
Therefore, \(\varGamma(x)=\overline{\varGamma}\) holds. By the similar way, we can obtain \(\varUpsilon(x)=\overline{\varUpsilon}\) and \(\varPsi(x)=\overline {\varPsi}\). The proof is complete. □
Theorem 3.1 shows that \(\varGamma(x)\), \(\varUpsilon(x)\), and \(\varPsi(x)\) are “good” estimates of Γ̅, ϒ̅, and Ψ̅, respectively. Next, we give the choices of the direction \(d_{k}\) and the stepsize \(\alpha_{k}\) at the current point \(x_{k}\in\mathbf{B}\). Given the sets \(\varGamma_{k}=\varGamma(x_{k})\), \(\varUpsilon_{k}=\varUpsilon(x_{k})\), and \(\varPsi_{k}=\varPsi(x_{k})\), the search direction \(d_{k}=(d_{k}^{\varGamma_{k}},d_{k}^{\varUpsilon_{k}},d_{k}^{\varPsi_{k}})\) is chosen as
where \(d_{k}^{\varUpsilon_{k}}\) denotes the subspace direction for the inactive variables, \(\overline{H}_{k}=Z^{T} H_{k} Z\in \Re^{\varUpsilon_{k}\times\varUpsilon_{k}}\) is an approximation of the reduced inverse Hessian matrix, \(H_{k}\) is an approximation of the full space inverse Hessian matrix, and Z is the matrix with columns \(\{e^{i}\mid i\in\varUpsilon_{k}\} \) and \(e^{i}\) is the ith column of the identity matrix in \(\Re^{n\times n}\).
For smooth optimization problems, several authors use the projected search for quadratic and nonlinear programming problems with bounds (see [27]). The projected search finds a steplength \(\alpha_{k}=\beta^{k}>0\) with sufficient decrease in the function \(\phi_{k} : \Re\rightarrow\Re\) such that
where \(\beta\in(0,1)\) and \(\sigma\in(0,\frac{1}{2})\) are constants,
\([\cdot]^{+}\) is the projection onto B defined by
In this paper, we also use this technique to determine the steplength \(\alpha_{k}\). Based on these selections of \(d_{k}\) and \(\alpha_{k}\), and letting \(\mathit{ms}\leq m\) be the number of stored correction pairs, we give the steps of this algorithm.
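The projection and the projected backtracking search can be sketched as follows (a minimal Python illustration; the sufficient-decrease test below is the standard projected Armijo condition, standing in for condition (3.9), which is not reproduced here):

```python
import numpy as np

def project_box(z, l, u):
    """[z]^+ : componentwise projection onto B = {x : l <= x <= u}."""
    return np.minimum(np.maximum(z, l), u)

def projected_search(phi, x, d, g, l, u, beta=0.1, sigma=0.1, max_back=30):
    """Backtrack alpha = beta^k until the projected trial point gives
    sufficient decrease: phi(x_new) <= phi(x) + sigma * g'(x_new - x)."""
    phi0, alpha = phi(x), 1.0
    for _ in range(max_back):
        x_new = project_box(x + alpha * d, l, u)
        if phi(x_new) <= phi0 + sigma * g @ (x_new - x):
            return alpha, x_new
        alpha *= beta
    return alpha, project_box(x + alpha * d, l, u)

phi = lambda z: 0.5 * float(z @ z)        # toy smooth stand-in for f^MY
x, g = np.array([0.8]), np.array([0.8])   # gradient of phi at x
alpha, x_new = projected_search(phi, x, -g, g, np.zeros(1), np.ones(1))
```

Note that the trial point is projected before the decrease test, so every candidate, and hence every iterate, stays feasible.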
Subalgorithm
Uptodate \((\mathit{ms}, \{\overline{s}_{k}\}, \{ \overline{y}_{k}\}, H_{0}, d, Z)\)
 Step 1::
\(d=Z'd\);
 Step 2::
if \(\mathit{ms}=0\), \(d=H_{0}d\), return;
 Step 3::
\(\alpha=(\overline{s}_{k}^{\mathit{ms}-1})^{T}d/(\overline{y}_{k}^{\mathit{ms}-1})^{T}\overline {s}_{k}^{\mathit{ms}-1}\); \(d=d-\alpha\overline{y}_{k}^{\mathit{ms}-1}\);
 Step 4::
Call Uptodate \((\mathit{ms}-1, \{\overline{s}_{k}\}, \{ \overline{y}_{k}\}, H_{0}, d, Z)\);
 Step 5::
\(d=d+(\alpha-(d^{T}\overline{y}_{k}^{\mathit{ms}-1}/(\overline {y}_{k}^{\mathit{ms}-1})^{T}\overline{s}_{k}^{\mathit{ms}-1}))\overline{s}_{k}^{\mathit{ms}-1}\);
 Step 6::
\(d=Z'd\).
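Assuming that d and the correction pairs have already been restricted to the free subspace (the role of the multiplications by Z in Steps 1 and 6), the recursion amounts to the standard L-BFGS two-loop scheme; a Python sketch:

```python
import numpy as np

def uptodate(ms, S, Y, h0, d):
    """Recursive sketch of Subalgorithm Uptodate: apply the reduced L-BFGS
    inverse-Hessian approximation to d, using the ms most recent correction
    pairs (columns of S and Y); h0 is the diagonal of the initial matrix."""
    if ms == 0:
        return h0 * d                                  # Step 2: d = H_0 d
    s, y = S[:, ms - 1], Y[:, ms - 1]
    alpha = (s @ d) / (y @ s)                          # Step 3
    d = uptodate(ms - 1, S, Y, h0, d - alpha * y)      # Step 4: recurse
    beta = (d @ y) / (y @ s)
    return d + (alpha - beta) * s                      # Step 5

S = np.array([[1.0], [0.0]]); Y = np.array([[2.0], [0.0]])
r = uptodate(1, S, Y, np.ones(2), np.array([2.0, 3.0]))
```

With the single pair above, the dense BFGS inverse is \(\operatorname{diag}(0.5,1)\), so applying it to \((2,3)^{T}\) gives \((1,3)^{T}\), matching the recursion.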
Now we state the algorithm for nonsmooth optimization problems (2.2) with bound constrained conditions.
Algorithm 1
(mainAlgorithm)
 Step 1: :

Given \(x_{0}\in\mathbf{B}\), constants \(\sigma\in (0,\frac{1}{2})\) and \(m\in(3,20)\), initial matrix θI, \(a_{i}(x)\) and \(b_{i}(x)\); set \(k=0\).
 Step 2: :

Using (3.5) to decide \(\varGamma_{k}=\varGamma (x_{k})\), \(\varPsi_{k}=\varPsi(x_{k})\), and \(\varUpsilon_{k}=\varUpsilon(x_{k})\).
 Step 3: :

Compute the search direction \(d_{k}\) by (3.6)–(3.8), where the subspace portion \(d_{k}^{\varUpsilon_{k}}\) is computed by Subalgorithm Uptodate.
 Step 4: :

If \(d_{k}=0\), stop.
 Step 5: :

Find \(\alpha_{k}\) by (3.9).
 Step 6: :

Let \(x_{k+1}=[x_{k}+\alpha_{k}d_{k}]^{+}\) and get \(f^{\mathrm{MY}}(x_{k+1})\) and \(g(x_{k+1})\).
 Step 7: :

Update \(H_{k}\) by (2.5).
 Step 8: :

Set \(k=k+1\) and go to Step 2.
Remark
The given algorithm can be regarded as an extension of [36] from smooth optimization to nonsmooth optimization.
Global convergence
Assumption A
The matrix \(\overline{H}_{k}\) (\(k=1,2,\ldots\)) is positive definite, namely, there exist constants \(0<\lambda_{1}\leq\lambda_{2}\) satisfying
Assumption B
The level set \(\phi=\{x\in\Re^{n}\mid f^{\mathrm{MY}}(x)\leq f^{\mathrm{MY}}(x_{0})\}\cap K\) is compact and \(f^{\mathrm{MY}}\) is bounded from below.
Similar to the proof techniques of [36] for smooth box-constrained optimization, we can obtain the following lemma; we state it without proof.
Lemma 4.1
Assume that \(d_{k}\neq0\) is defined by (3.6)–(3.8) and \(x_{k}\in\mathbf{B}\). Then we have
and
where \(\varrho>0\) is a constant, \(\chi_{k}=\min\{ a_{i}(x_{k})g_{k}^{i},b_{i}(x_{k})g_{k}^{i} : i\in \varUpsilon_{k},g_{k}^{i}\neq0\}\), \(\beta^{k}=\sup_{0\leq\omega\leq 1}\{\omega\mid l \leq x_{k} + \omega d_{k}\leq u\}\), and \(g_{k}^{i}\) is the ith element of \(g(x_{k})\). Moreover, \(x_{k}\) is a \(KKT\) point of (2.2) if and only if \(d_{k}=0\).
If \(x_{k}\) is not a \(KKT\) point, then by Lemma 4.1 it is easy to deduce that \(d_{k}\) is a descent direction.
Theorem 4.1
Let the sequence \(\{x_{k},d_{k},\overline{H}_{k}\}\) be generated by Algorithm 1, let Assumptions A and B hold, and let \(\Vert Z^{T}\overline{H}_{k} Z\Vert \leq \zeta_{1}\) hold for all k with a constant \(\zeta_{1}>0\). Then every accumulation point of \(\{x_{k}\}\) is a \(KKT\) point of (2.2).
Proof
We prove this theorem by contradiction. Let \(x_{*}\) be any accumulation point of \(\{x_{k}\}\); then there exists a subsequence \(\{x_{k_{i}}\}\) satisfying \(\lim_{i\rightarrow \infty}x_{k_{i}}=x_{*}\). Suppose that \(x_{*}\) is not a \(KKT\) point. By (3.3) and (3.4), we can conclude that there exists \(j\in\overline{\varGamma}\) or \(j \in \overline{\varPsi}\) such that
or \(j \in\overline{\varUpsilon}\) satisfying
By the line search (3.9) and (4.2), we can deduce that the sequence \(\{f^{\mathrm{MY}}(x_{k})\}\) is decreasing. Since \(f^{\mathrm{MY}}\) is bounded from below by Assumption B, we have
this implies that
In particular, we get
Using the definition of \(d_{k}\), setting \(\vartheta_{1}=\max_{x\in\mathbf {B}}\Vert g(x)\Vert^{2}\), and
we get
The above relation and (4.1) imply that there exists a constant \(\beta^{*}\in(0,1)\) such that
Suppose that \(\alpha_{k}\) satisfies the line search (3.9). If \(\alpha_{k}< 0.1\beta^{*}\), then there exists an unacceptable steplength \(\alpha_{k,i}\leq 10\alpha_{k}\) with \(i\geq0\) satisfying
where \(\zeta_{2}=\frac{1}{2}\max_{x\in\mathbf{B}}\Vert\nabla^{2}f^{\mathrm{MY}}(x)\Vert\). Then we obtain
where the second inequality follows from (4.2). Therefore, we get
This together with (4.6) implies that
By the definition of \(\varGamma_{k}\) and \(\varPsi_{k}\), we can conclude that every term on the right side of the following relation:
is larger than zero. Thus, by (4.8), we get
Therefore, there exists some \(j \in\overline{\varUpsilon}\) such that (4.4) holds. By the above three relations, for all sufficiently large i, we get \(j\notin\varGamma(x_{k_{i}})\cup\varPsi(x_{k_{i}})\cup\varUpsilon (x_{k_{i}})\), which is a contradiction. This completes the proof. □
Remark
If the condition \(g(\overline{x})=0\) holds, by (2.3) and the convexity of \(f^{\mathrm{MY}}(x)\), it is not difficult to get \(\overline{x}=p(\overline{x})\). Accordingly the point x̅ is the unique optimal solution of (1.1).
Numerical results
In the experiments, all codes were written in MATLAB R2010a and run on a PC with an Intel(R) Core(TM) i3-3217U CPU at 1.80 GHz, 4.00 GB of RAM, and the Windows 7 operating system. Our experiments are performed on a set of nonlinear box-constrained nonsmooth problems from Karmitsa [17] with the given initial points; these problems are listed in Table 1. In the experiment, we choose \(\sigma=\beta=0.1\), \(a_{i}(x)=b_{i}(x)=10^{-5}\) in (3.5), \(\theta=1\) with the “basic matrix” taken to be the identity matrix I in the limited memory BFGS method, and \(m=5\). The subproblem (2.1) is solved by the PRP conjugate gradient algorithm, whose iteration and function-evaluation counts are added to those of the main algorithm. The PRP conjugate gradient algorithm for (2.1) is as follows:
PRP algorithm for subproblem (2.1)
 Step 1::

Given \(z_{0}\), constants \(\lambda>0\), \(\tau_{0}>0\), and \(\iota\in (0,1)\); set \(d_{0}^{\mathrm{sub}}=-g_{0}^{\mathrm{sub}}\) with \(g_{0}^{\mathrm{sub}}=\frac{z_{0}-x_{0}}{\lambda}\); set \(k=0\).
 Step 2::

If \(\Vert g_{k}^{\mathrm{sub}}\Vert<\iota\), stop.
 Step 3::

Find \(\alpha_{k}^{\mathrm{sub}}\) satisfying the following Armijo line search:
$$f^{\mathrm{MY}}\bigl(z_{k}+\alpha_{k}^{\mathrm{sub}}d_{k}^{\mathrm{sub}} \bigr)-f^{\mathrm{MY}}(z_{k})\leq\alpha_{k}^{\mathrm{sub}} \tau _{0}\bigl(d_{k}^{\mathrm{sub}}\bigr)^{T}g_{k}^{\mathrm{sub}}, $$where \(\alpha_{k}^{\mathrm{sub}}\) is the largest element of \(\{1,\frac{1}{2},\frac{1}{2^{2}},\frac {1}{2^{3}},\ldots\}\) such that the above inequality holds.
 Step 4::

Let \(z_{k+1}=z_{k}+\alpha_{k}^{\mathrm{sub}}d_{k}^{\mathrm{sub}}\) and \(g_{k+1}^{\mathrm{sub}}=\frac{z_{k+1}-x_{0}}{\lambda}\).
 Step 5::

If \(\Vert g_{k+1}^{\mathrm{sub}}\Vert<\iota\), stop; otherwise compute the search direction by
$$d_{k+1}^{\mathrm{sub}}=-g_{k+1}^{\mathrm{sub}}+ \frac{(g_{k+1}^{\mathrm{sub}}-g_{k}^{\mathrm{sub}})^{T} g_{k+1}^{\mathrm{sub}}}{ \Vert g_{k}^{\mathrm{sub}} \Vert^{2} }d_{k}^{\mathrm{sub}}. $$  Step 6::

Set \(k=k+1\) and go to Step 3.
In the above algorithm, \(x_{0}\) follows Algorithm 1, \(\iota=1\mathrm{e}{-}3\), and \(\tau_{0}=0.25\). Since we aim to design the given algorithm to solve large-scale nonsmooth problems, the dimensions of the test problems are 5000, 7000, 10,000, and 11,000. The following Himmelblau stopping rule is used: if \(\vert f^{\mathrm{MY}}(x_{k})\vert> 1\mathrm{e}{-}4\), let \(\mathit{stop}1=\frac{\vert f^{\mathrm{MY}}(x_{k})-f^{\mathrm{MY}}(x_{k+1})\vert}{\vert f^{\mathrm{MY}}(x_{k})\vert}\); otherwise, let \(\mathit{stop}1=\vert f^{\mathrm{MY}}(x_{k})-f^{\mathrm{MY}}(x_{k+1})\vert\). The program stops if \(\mathit{stop}1< 1\mathrm{e}{-}4\) is satisfied. In the experiment, we found that different stopping rules noticeably influence the iteration and function-evaluation counts, but not the final function value. Since the iteration counts are stable under it, we choose the Himmelblau stopping rule. In order to show the efficiency of the given method, we also test the normal active-set algorithm with L-BFGS update and compare their performance. The columns of Table 2 have the following meaning:
 dim::
the dimension of the problem;
 NI::
the total number of iterations;
 NF::
the total number of function evaluations;
 cputime::
the CPU time in seconds;
 \(f(\overline{x})\)::
the function value at the final point x̅ when the program stops.
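The Himmelblau rule above can be sketched as follows (the helper name is illustrative, and the absolute values reflect the usual form of the rule):

```python
def himmelblau_stop(f_k, f_k1, tol=1e-4):
    """Himmelblau stopping rule: relative decrease of f^MY when the current
    value is not too small in magnitude, absolute decrease otherwise."""
    if abs(f_k) > tol:
        stop1 = abs(f_k - f_k1) / abs(f_k)
    else:
        stop1 = abs(f_k - f_k1)
    return stop1 < tol

done = himmelblau_stop(100.0, 99.999)   # relative decrease 1e-5 < 1e-4
```

Switching to an absolute test near zero avoids dividing by a vanishing function value, which is why the rule behaves stably across problems with different scales.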
From the numerical results in Table 2, it is not difficult to see that both methods are effective for solving these ten box-constrained nonsmooth problems. The iteration and function-evaluation counts do not change as the dimension increases, except for the problems Generalization of MAXQ and Chained Crescent I, which shows that the given algorithm is feasible and stable. Since the problems Chained Crescent I and Chained Crescent II share many properties, they attain the same optimal values for the corresponding dimensions. The CPU time remains acceptable, although it grows as the dimension increases.
To show the performance of these two algorithms directly, the profiling tool of Dolan and Moré [9] is used to analyze them. Figure 1 shows the performance of Algorithm 1 and the active-set algorithm relative to the CPU time of Table 2. It is not difficult to see that Algorithm 1 wins, since it has the higher probability of being the optimal solver. Figure 1 shows that Algorithm 1 successfully solves 100% of the test problems at \(t\approx2\), while the active-set algorithm solves all of the test problems only at about \(t\approx41\). All in all, although the proposed method does not achieve as significant an improvement as we expected, we think that the enhancement it provides is still noticeable.
Conclusions
(i) It is well known that nonsmooth problems are very difficult to solve, even when they are unconstrained, especially in the large-scale case. The Moreau–Yosida regularization technique is an effective tool for dealing with such problems. We use this technique and propose a subspace algorithm for solving box-constrained nonsmooth optimization problems.
(ii) In order to decrease the computational workload and obtain good numerical performance, the limited memory BFGS method is used in the given algorithm. The numerical performance on the test problems shows that the presented algorithm is promising for large-scale problems. The dimension ranges from 5000 to 11,000 variables, which is larger than that of the unconstrained nonsmooth problems of [18].
(iii) In the experiment, we found that different stopping rules obviously influence the iteration and function-evaluation counts, but not the final function values. The reason lies in the stopping rule itself; further work is to find more suitable stopping rules for the given algorithms.
(iv) Inspired by the ideas of [26] and [10], we extend their techniques to nonsmooth problems. The proof methods of this paper are similar to those of [36]. However, all three of these papers concentrate on continuously differentiable optimization problems.
(v) Considering the above discussions, we think there are at least three issues that could lead to improvements. The first is the constant m in the L-BFGS update formula, which could be adjusted. Another important point is the choice of the parameters in the active-set identification technique, since the values used here are not the only choice. The last and most important one concerns the numerical performance: are there better stopping rules, other optimality conditions, and other convergence conditions for nonsmooth problems? All of these aspects are left for future work.
Overall, we think that the method provides a valid approach for solving large-scale box-constrained nonsmooth problems, since its numerical performance is promising.
References
 1.
Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Program. 102, 407–456 (2005)
 2.
Birge, J.R., Qi, L., Wei, Z.: A general approach to convergence properties of some methods for nonsmooth convex optimization. Appl. Math. Optim. 38, 141–158 (1998)
 3.
Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: A family of variable metric proximal methods. Math. Program. 68, 15–47 (1995)
 4.
Byrd, R.H., Lu, P.H., Nocedal, J.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Stat. Comput. 16, 1190–1208 (1995)
 5.
Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63, 129–156 (1994)
 6.
Calamai, P., Moré, J.J.: Projected gradient methods for linearly constrained problems. Math. Program. 39, 93–116 (1987)
 7.
Clarke, F.H.: Optimization and Nonsmooth Analysis. WileyInterscience, New York (1983)
 8.
Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Math. Program. 62, 261–273 (1993)
 9.
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
 10.
Facchinei, F., Júdice, J., Soares, J.: An active set Newton algorithm for large-scale nonlinear programs with box constraints. SIAM J. Optim. 8, 158–186 (1998)
 11.
Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, Chichester (1987)
 12.
Fukushima, M., Qi, L.: A global and superlinearly convergent algorithm for nonsmooth convex minimization. SIAM J. Optim. 6, 1106–1120 (1996)
 13.
Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading (1989)
 14.
Haarala, M., Mäkelä, M.M.: Limited memory bundle algorithm for large bound constrained nonsmooth minimization problems. Reports of the Department of Mathematical Information Technology, Series B. Scientific Computing, No. B. 1/2006, University of Jyväskylä, Finland (2006)
 15.
Haarala, M., Miettinen, K., Mäkelä, M.M.: New limited memory bundle method for largescale nonsmooth optimization. Optim. Methods Softw. 19, 673–692 (2004)
 16.
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II. Springer, Berlin (1993)
 17.
Karmitsa, N.: Test problems for largescale nonsmooth minimization. Reports of the Department of Mathematical Information Technology, Series B. Scientific Computing, No. B. 4/2007, University of Jyväskylä, Finland (2007)
 18.
Karmitsa, N., Bagirov, A., Mäkelä, M.M.: Comparing different nonsmooth minimization methods and software. Optim. Methods Softw. 27, 131–153 (2012)
 19.
Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics, vol. 1133. Springer, Berlin (1985)
 20.
Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable optimization. Math. Program. 46, 105–122 (1990)
 21.
Kiwiel, K.C.: Proximal level bundle methods for convex nondifferentiable optimization, saddlepoint problems and variational inequalities. Math. Program. 69, 89–109 (1995)
 22.
Lemaréchal, C.: Extensions diverses des méthodes de gradient et applications. Thèse d'État, Paris (1980)
 23.
Lemaréchal, C.: Nondifferentiable optimization. In: Nemhauser, G.L., Rinnooy Kan, A.H.G., Todd, M.J. (eds.) Optimization, pp. 529–572. Elsevier, New York (1989)
 24.
Majava, K., Haarala, N., Kärkkäinen, T.: Solving variational image denoising problems using limited memory bundle method. In: Proceedings of the 2nd International Conference on Scientific Computing and Partial Differential Equations and the First East Asia SIAM Symposium, Hong Kong (2005)
 25.
Nguyen, V.H., Strodiot, J.J.: A linearly constrained algorithm not requiring derivative continuity. Eng. Struct. 6, 7–11 (1984)
 26.
Ni, Q.: A subspace projected conjugate algorithm for large bound constrained quadratic programming. Numer. Math. J. Chin. Univ. 7, 51–60 (1998)
 27.
Ni, Q., Yuan, Y.X.: A subspace limited memory quasi-Newton algorithm for large-scale nonlinear bound constrained optimization. Math. Comput. 66, 1509–1520 (1997)
 28.
Powell, M.J.D.: A fast algorithm for nonlinearly constrained optimization calculations. In: Numerical Analysis, pp. 144–157. Springer, Berlin (1978)
 29.
Qi, L.: Convergence analysis of some algorithms for solving nonsmooth equations. Math. Oper. Res. 18, 227–245 (1993)
 30.
Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58, 353–367 (1993)
 31.
Schramm, H.: Eine Kombination von Bundle- und Trust-Region-Verfahren zur Lösung nichtdifferenzierbarer Optimierungsprobleme. Bayreuther Mathematische Schriften, vol. 30 (1989)
 32.
Schramm, H., Zowe, J.: A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J. Optim. 2, 121–152 (1992)
 33.
Schultz, H.K.: A Kuhn–Tucker algorithm. SIAM J. Control Optim. 11, 438–445 (1973)
 34.
Sreedharan, V.P.: A subgradient projection algorithm. J. Approx. Theory 35, 111–126 (1982)
 35.
Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable convex functions. Math. Program. Stud. 3, 145–173 (1975)
 36.
Xiao, Y., Wei, Z.: A new subspace limited memory BFGS algorithm for largescale bound constrained optimization. Appl. Math. Comput. 185, 350–359 (2007)
 37.
Yuan, G., Li, P., Lu, J.: The global convergence of the BFGS method with a modified WWP line search for nonconvex functions. Numer. Algorithms (2019, accepted)
 38.
Yuan, G., Li, T., Hu, W.: A conjugate gradient algorithm for largescale nonlinear equations and image restoration problems. Appl. Numer. Math. 147, 129–141 (2020)
 39.
Yuan, G., Meng, Z., Li, Y.: A modified Hestenes and Stiefel conjugate gradient algorithm for largescale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 168, 129–152 (2016)
 40.
Yuan, G., Sheng, Z.: Optimization Algorithm of Nonsmooth. Science Press, Beijing (2017)
 41.
Yuan, G., Sheng, Z., Wang, P., Hu, W., Li, C.: The global convergence of a modified BFGS method for nonconvex functions. J. Comput. Appl. Math. 327, 274–294 (2018)
 42.
Yuan, G., Wei, Z.: The Barzilai and Borwein gradient method with nonmonotone line search for nonsmooth convex optimization problems. Math. Model. Anal. 17, 203–216 (2012)
 43.
Yuan, G., Wei, Z., Li, G.: A modified Polak–Ribière–Polyak conjugate gradient algorithm with nonmonotone line search for nonsmooth convex minimization. J. Comput. Appl. Math. 255, 86–96 (2014)
 44.
Yuan, G., Wei, Z., Lu, X.: Global convergence of the BFGS method and the PRP method for general functions under a modified weak Wolfe–Powell line search. Appl. Math. Model. 47, 811–825 (2017)
 45.
Yuan, G., Wei, Z., Wang, Z.: Gradient trust region algorithm with limited memory BFGS update for nonsmooth convex minimization. Comput. Optim. Appl. 54, 45–64 (2013)
 46.
Yuan, G., Wei, Z., Yang, Y.: The global convergence of the Polak–Ribière–Polyak conjugate gradient algorithm under inexact line search for nonconvex functions. J. Comput. Appl. Math. 362, 262–275 (2019)
Acknowledgements
The author would like to thank the editor and the referees for their valuable comments, which greatly improved this paper.
Availability of data and materials
All data are included in this paper and can be freely used.
Funding
This work is supported by the High Level Innovation Teams and Excellent Scholars Program in Guangxi institutions of higher education (Grant No. [2019]52), the National Natural Science Foundation of China (Grant No. 11661009), the Guangxi Natural Science Key Fund (No. 2017GXNSFDA198046), and the National Social Science Key Foundation of China (Grant No. 17AJL012).
Author information
Contributions
The main idea of this paper was proposed by XL. XL prepared the manuscript initially and performed all the steps of the proofs in this research. The author read and approved the final manuscript.
Ethics declarations
Competing interests
There are no potential conflicts of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, X. A limited memory BFGS subspace algorithm for bound constrained nonsmooth problems. J Inequal Appl 2020, 135 (2020). https://doi.org/10.1186/s13660-020-02398-6
MSC
 65H10
 90C26
Keywords
 Nonsmooth
 Large scale
 Limited memory BFGS method
 Global convergence