  • Research
  • Open Access

Two spectral gradient projection methods for constrained equations and their linear convergence rate

Journal of Inequalities and Applications20152015:8

https://doi.org/10.1186/s13660-014-0525-z

  • Received: 20 July 2014
  • Accepted: 12 December 2014

Abstract

Due to its simplicity and numerical efficiency for unconstrained optimization problems, the spectral gradient method has received increasing attention in recent years. In this paper, two spectral gradient projection methods for constrained equations are proposed, which combine the well-known spectral gradient method with the hyperplane projection method. The new methods are not only derivative-free, but also completely matrix-free, and consequently they can be applied to solve large-scale constrained equations. Under the condition that the underlying mapping of the constrained equations is Lipschitz continuous or strongly monotone, we establish the global convergence of the new methods. Compared with the existing gradient methods for solving such problems, the new methods possess a linear convergence rate under some error bound conditions. Furthermore, a relaxation factor γ is attached to the update step to accelerate convergence. Preliminary numerical results show that they are efficient and promising in practice.

Keywords

  • constrained equations
  • spectral gradient method
  • projection method
  • global convergence

1 Introduction

In this paper, we consider the problems of finding a solution of the following constrained equations, denoted by \(\operatorname{CES}(F,C)\),
$$ F\bigl(x^{*}\bigr)=0\quad \mbox{subject to}\quad x^{*}\in C, $$
(1)
where \(F: C\rightarrow R^{n}\) is a given continuous nonlinear mapping and C is a nonempty closed convex subset of \(R^{n}\). Obviously, when \(C=R^{n}\), (1) reduces to a system of nonlinear equations, which has been intensively studied by many scholars. The constrained system of equations (1) appears in a wide variety of problems in applied mathematics, and some important problems, such as economic equilibrium problems [1], power flow equations [2], and chemical equilibrium systems [3], can be reformulated as problems of the form (1).

Among the various numerical methods for solving \(\operatorname{CES}(F,C)\) [4–8], the gradient projection methods (GPMs) are among the most efficient, especially when the projection onto the feasible set C is easy to implement. For example, when C is the nonnegative orthant, a box, or a ball, GPMs require the lowest computational cost. In addition, the GPMs are also the simplest, because they do not need to store any matrix during the iteration process. Therefore, they are completely matrix-free, and consequently they can be applied to solve large-scale \(\operatorname{CES}(F,C)\).

It is well known that the spectral gradient method [9, 10] and the conjugate gradient method [11] are two efficient methods for solving large-scale unconstrained optimization problems due to their simplicity and low storage. Recently, combined with the projection technique, they have been extended to solve the constrained equations \(\operatorname{CES}(F,C)\) by some scholars [6, 7]. In [6], Yu et al. proposed a spectral gradient projection method for solving monotone \(\operatorname{CES}(F,C)\), which can be applied to nonsmooth constrained equations and works quite well even for large-scale problems. Quite recently, Liu et al. [7] developed two unified frameworks of sufficient descent conjugate gradient projection methods for solving monotone \(\operatorname{CES}(F,C)\), which also apply to large-scale nonsmooth constrained equations. However, the convergence rate of the methods in [6, 7] was not investigated, so whether they have a linear convergence rate remains an open problem. Can we design a spectral or conjugate gradient projection method with a linear convergence rate for \(\operatorname{CES}(F,C)\)? In this paper, we answer this question positively for the spectral gradient projection method. Note that, in [12], Dai and Liao proved a nice result for the spectral gradient method: they established the R-linear convergence of the spectral gradient method for strongly convex quadratics in any number of dimensions, and they also proved local R-linear convergence for general objective functions. The general minimization problem discussed in [12] is equivalent to a system of nonlinear equations under some mild conditions. For the system of constrained nonlinear equations, however, we shall establish the local R-linear convergence of the spectral gradient method in this paper. Therefore, our result extends the conclusion in [12] in some sense.

In fact, in this paper, motivated by the projection methods in [13, 14] and the spectral gradient method in [6], we propose two spectral gradient projection methods for solving nonsmooth constrained equations, which can be viewed as combinations of the well-known spectral gradient method and the famous hyperplane projection method, and they possess a linear convergence rate under some error bound conditions. The remainder of this paper is organized as follows. In the next section, we describe the new methods and present their global convergence analysis. The linear convergence rates of the new methods are established in Section 3. Numerical results are reported in Section 4. Finally, some final remarks are included in Section 5.

2 Algorithm and convergence analysis

First, we denote by \(\|x\|=\sqrt{x^{\top}x}\) the Euclidean norm. Let \({C^{*}}\) denote the solution set of \(\operatorname{CES}(F,C)\). Throughout this paper, we assume that:
  1. (A1)

    The solution set \({C^{*}}\) is nonempty.

     
  2. (A2)
    The mapping \(F(\cdot)\) is monotone on C, i.e.,
    $$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq0,\quad \mbox{for all } x,y\in C. $$
     
  3. (A3)
    The mapping \(F(\cdot)\) is Lipschitz continuous on C, i.e., there is a positive constant L such that
    $$ \bigl\Vert F(x)-F(y)\bigr\Vert \leq L\|x-y\|,\quad \mbox{for all } x,y\in C. $$
     
  4. (A4)
    The mapping \(F(\cdot)\) is strongly monotone on C, i.e., there is a positive constant η such that
    $$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq\eta\|x-y\| ^{2}, \quad \mbox{for all } x,y\in C. $$
    (2)
     
Obviously, (A4) implies (A2), and from (2) and the Cauchy-Schwarz inequality, we have
$$ \bigl\Vert F(x)-F(y)\bigr\Vert \geq\eta\|x-y\|, \quad \mbox{for all } x,y\in C. $$
(3)
Then let \(P_{C}(\cdot)\) denote the projection mapping from \(R^{n}\) onto the convex set C, i.e.,
$$ P_{C}(x)=\operatorname{argmin}\bigl\{ \Vert x-y\Vert |y\in C\bigr\} , $$
which has the following nonexpansive property:
$$ \bigl\Vert P_{C}(x)-P_{C}(y)\bigr\Vert \leq\|x-y\|,\quad \forall x,y\in R^{n}. $$
(4)
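When C is a simple set, the projection \(P_{C}\) has a closed form, which is what makes GPMs cheap. The following Python sketch (illustrative only, not part of the paper) implements the projections for the three sets mentioned in the Introduction and checks the nonexpansive property (4) numerically:

```python
import numpy as np

def proj_nonneg(x):
    # Projection onto the nonnegative orthant: clip negative entries at zero.
    return np.maximum(x, 0.0)

def proj_box(x, lo, hi):
    # Projection onto the box [lo, hi]: componentwise clipping.
    return np.clip(x, lo, hi)

def proj_ball(x, center, radius):
    # Projection onto a Euclidean ball: rescale toward the center if outside.
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)
# Nonexpansiveness (4): ||P_C(x) - P_C(y)|| <= ||x - y||.
assert np.linalg.norm(proj_nonneg(x) - proj_nonneg(y)) <= np.linalg.norm(x - y)
```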
Now, we review the spectral gradient method for the unconstrained minimization problem:
$$ \min f(x),\quad x\in R^{n}, $$
(5)
where \(f: R^{n}\rightarrow R\) is smooth and its gradient is available. The spectral gradient for solving (5) is an iterative method of the form
$$ x_{k+1}=x_{k}-\alpha_{k}\nabla f(x_{k}), $$
where \(\alpha_{k}\) is a step size defined by (see [9])
$$ \alpha_{k}^{\mathrm{I}}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}}\quad \mbox{or}\quad \alpha_{k}^{\mathrm{II}}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}}, $$
(6)
in which \(s_{k-1}=x_{k}-x_{k-1}\), \(y_{k-1}=\nabla f(x_{k})-\nabla f(x_{k-1})\). The step sizes (6) are called Barzilai-Borwein (BB) step sizes, and the corresponding gradient methods are spectral gradient methods. The spectral gradient with step size \(\alpha_{k}^{\mathrm{II}}\) has been extended to solve the constrained equations (1) by Yu et al. [6], however, as discussed in the Introduction, we do not know whether the method in [6] possesses the linear convergence rate. In the following, we will extend the spectral gradient with step size \(\alpha_{k}^{\mathrm{I}}\) and \(\alpha_{k}^{\mathrm {II}}\) to solve constrained equations (1) by some new type Armijo line searches, and we propose two spectral gradient projection methods, which are not only globally convergent, but also have a linear convergence rate.
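As a concrete illustration (ours, not from the paper), the two BB step sizes in (6) can be computed from consecutive iterates and gradients; on a strongly convex quadratic, both lie between the reciprocals of the extreme eigenvalues of the Hessian:

```python
import numpy as np

def bb_steps(x_prev, x_curr, g_prev, g_curr):
    """Barzilai-Borwein step sizes (6): alpha_I = s'y / y'y, alpha_II = s's / s'y."""
    s = x_curr - x_prev
    y = g_curr - g_prev
    return s @ y / (y @ y), s @ s / (s @ y)

# For f(x) = 0.5 x'Ax the gradient is Ax; with eigenvalues 1 and 4,
# both BB steps must lie in [1/4, 1].
A = np.diag([1.0, 4.0])
x0, x1 = np.array([1.0, 1.0]), np.array([0.5, 0.2])
a1, a2 = bb_steps(x0, x1, A @ x0, A @ x1)
assert 0.25 <= a1 <= 1.0 and 0.25 <= a2 <= 1.0
```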

The spectral gradient projection methods are stated as follows.

Algorithm 2.1

Step 0. Set an arbitrary initial point \(x_{0}\in{C}\), the parameters \(0<\rho<1\), \(0<\sigma<r<1\), \(0<\gamma<2\), and \(0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}\). Set the initial step size \(\beta_{0}=1\) and set \(k:=0\).

Step 1. If \(F(x_{k})=0\), then stop; otherwise, go to Step 2.

Step 2. Compute \(d_{k}\) by
$$ d_{k}=\left \{ \begin{array}{l@{\quad}l} -F(x_{k}),& \mbox{if } k=0, \\ -\theta_{k}F(x_{k}), &\mbox{if } k\geq1, \end{array} \right . $$
(7)
where
$$ \theta_{k}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}}, $$
(8)
which is similar to \(\alpha_{k}^{\mathrm{I}}\) defined in (6), \(y_{k-1}=F(x_{k})-F(x_{k-1})\), but \(s_{k-1}\) is defined by
$$ s_{k-1}=x_{k}-x_{k-1}+ry_{k-1}, $$
which is different from the standard definition of \(s_{k-1}\). Stop if \(d_{k}=0\); otherwise, go to Step 3.
Step 3. Find the trial point \(z_{k}=x_{k}+\alpha_{k} d_{k}\), where \(\alpha_{k}=\beta_{k}\rho^{m_{k}}\) with \(m_{k}\) being the smallest nonnegative integer m such that
$$ -\bigl\langle F(x_{k}+\alpha_{k} d_{k}),d_{k}\bigr\rangle \geq \sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
(9)
Step 4. Compute
$$ x_{k+1}=P_{C}\bigl[x_{k}-\gamma \xi_{k}F(z_{k})\bigr], $$
(10)
where
$$ \xi_{k}=\frac{\langle F(z_{k}), x_{k}-z_{k}\rangle }{\Vert F(z_{k})\Vert ^{2}}. $$
(11)
Choose an initial step size \(\beta_{k+1}\) such that \(\beta_{k+1}\in [\beta_{\mathrm{min}},\beta_{\mathrm{max}}]\). Set \(k:=k+1\) and go to Step 1.
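The steps above can be sketched in Python as follows. This is a minimal illustrative implementation, not the authors' Matlab code: the parameter values are those reported in Section 4, C is assumed to be \(R_{+}^{n}\) with its clipping projection, and the function and variable names are ours.

```python
import numpy as np

def spg_algorithm_2_1(F, proj, x0, rho=0.6, sigma=1e-4, r=1e-3, gamma=1.8,
                      beta=1.0, tol=1e-5, max_iter=1000):
    """Sketch of Algorithm 2.1: spectral gradient projection with relaxation gamma."""
    x = proj(np.asarray(x0, dtype=float))
    Fx = F(x)
    x_prev, F_prev = None, None
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            break
        # Direction (7)-(8): d_k = -theta_k F(x_k), with the modified s_{k-1}.
        if k == 0:
            d = -Fx
        else:
            y = Fx - F_prev
            s = x - x_prev + r * y              # modified s_{k-1}
            d = -(s @ y) / (y @ y) * Fx
        # Line search (9): backtrack until -<F(z_k), d_k> >= sigma ||F(x_k)||^2
        # (capped number of backtracks, since this is only a sketch).
        alpha = beta
        for _ in range(60):
            z = x + alpha * d
            Fz = F(z)
            if -(Fz @ d) >= sigma * (Fx @ Fx):
                break
            alpha *= rho
        # Projection step (10)-(11) with relaxation factor gamma.
        xi = (Fz @ (x - z)) / (Fz @ Fz)
        x_prev, F_prev = x, Fx
        x = proj(x - gamma * xi * Fz)
        Fx = F(x)
    return x, np.linalg.norm(Fx)

# Problem 1 from Section 4: F_i(x) = exp(x_i) - 1, C = R^n_+.
x_star, res = spg_algorithm_2_1(lambda x: np.exp(x) - 1.0,
                                lambda x: np.maximum(x, 0.0),
                                np.ones(100))
assert res <= 1e-5
```

On this problem the sketch terminates after a single projection step, consistent with the iteration counts reported for Algorithm 2.1 in Table 1.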

Algorithm 2.2

Step 0. Set an arbitrary initial point \(x_{0}\in{C}\), compute L, the Lipschitz constant of \(F(\cdot)\), choose the parameters \(0<\rho <1\), \(0< r<1\), \(0<\sigma<r^{2}/(L+r)\), \(0<\gamma<2\), and \(0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}\). Set the initial step size \(\beta_{0}=1\) and set \(k:=0\).

Step 1. If \(F(x_{k})=0\), then stop; otherwise, go to Step 2.

Step 2. Compute \(d_{k}\) by
$$d_{k}=\left \{ \begin{array}{l@{\quad}l}-F(x_{k}), &\mbox{if } k=0, \\ -\vartheta_{k}F(x_{k}), &\mbox{if } k\geq1, \end{array} \right . $$
where
$$\vartheta_{k}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}}, $$
which is similar to \(\alpha_{k}^{\mathrm{II}}\) defined in (6), \(s_{k-1}=x_{k}-x_{k-1}\), but \(y_{k-1}\) is defined by
$$ y_{k-1}=F(x_{k})-F(x_{k-1})+rs_{k-1}, $$
which is different from the standard definition of \(y_{k-1}\). Stop if \(d_{k}=0\); otherwise, go to Step 3.
Step 3. Find the trial point \(z_{k}=x_{k}+\alpha_{k} d_{k}\), where \(\alpha_{k}=\beta_{k}\rho^{m_{k}}\) with \(m_{k}\) being the smallest nonnegative integer m such that
$$ -\bigl\langle F(x_{k}+\alpha_{k} d_{k}),d_{k}\bigr\rangle \geq \sigma\|d_{k} \|^{2}. $$
(12)

Step 4. See Step 4 of Algorithm 2.1.
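The only substantive differences between the two algorithms are the direction coefficient and the line-search test. A small illustrative sketch (ours, under the same assumptions as above) of the Algorithm 2.2 direction, together with the bound from Remark 2.1:

```python
import numpy as np

def direction_2_2(x_prev, x_curr, F_prev, F_curr, r=1e-3):
    """Direction of Algorithm 2.2: d_k = -vartheta_k F(x_k), with
    vartheta_k = s's / s'y and the modified y_{k-1} = F(x_k) - F(x_{k-1}) + r s_{k-1}."""
    s = x_curr - x_prev
    y = F_curr - F_prev + r * s                 # modified y_{k-1}
    return -(s @ s) / (s @ y) * F_curr

# For a monotone mapping, s'y >= r ||s||^2 > 0, so the coefficient is well
# defined and ||d_k|| <= ||F(x_k)|| / r, as stated in Remark 2.1.
r = 1e-3
F = lambda x: np.exp(x) - 1.0                   # monotone mapping (Problem 1)
x0, x1 = np.array([0.5, 0.5]), np.array([0.2, 0.1])
d = direction_2_2(x0, x1, F(x0), F(x1), r=r)
assert np.linalg.norm(d) <= np.linalg.norm(F(x1)) / r
```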

The discussions of the global convergence and linear convergence rate of Algorithm 2.2 are similar to those of Algorithm 2.1. Therefore, in the following, we discuss Algorithm 2.1 in detail and only state the corresponding results for Algorithm 2.2.

Remark 2.1

For Algorithm 2.1, by (3), we have
$$\begin{aligned} s_{k-1}^{\top}y_{k-1}&=\langle x_{k}-x_{k-1}+ry_{k-1},y_{k-1} \rangle \\ &\leq\frac{1}{\eta}\|y_{k-1}\|^{2}+r\|y_{k-1} \|^{2} \\ &= \biggl(\frac{1}{\eta}+r \biggr)\|y_{k-1}\|^{2}. \end{aligned}$$
In addition, by the monotonicity of \(F(\cdot)\), we also have
$$ s_{k-1}^{\top}y_{k-1}\geq r\|y_{k-1} \|^{2}. $$
So we have from the above two inequalities and (7)
$$ r\bigl\Vert F(x_{k})\bigr\Vert \leq \|d_{k}\|\leq \biggl(\frac {1}{\eta}+r \biggr)\bigl\Vert F(x_{k})\bigr\Vert , $$
(13)
from which we can get \(\|F(x_{k})\|=0\) if \(\|d_{k}\|=0\), which means \(x_{k}\) is a solution of \(\operatorname{CES}(F,C)\). Thus, Algorithm 2.1 can also terminate when \(\|d_{k}\|=0\). Similarly, for Algorithm 2.2, by the Lipschitz continuity and monotonicity of \(F(\cdot)\), we can deduce that
$$\frac{\|F(x_{k})\|}{L+r}\leq\|d_{k}\|\leq\frac{\|F(x_{k})\|}{r}. $$

In what follows, we assume that \(\|F(x_{k})\|\neq0\) and \(\|d_{k}\|\neq0\), for all k, i.e., Algorithm 2.1 or Algorithm 2.2 generates an infinite sequence \(\{x_{k}\}\).

Remark 2.2

In (10), we attach a relaxation factor \(\gamma \in(0,2)\) to \(F(z_{k})\), based on our numerical experience.

Remark 2.3

The line search (9) differs from those of [6, 7]. It is well defined, as the following lemma shows.

Lemma 2.1

For all \(k\geq0\), there exists a nonnegative number \(m_{k}\) satisfying (9).

Proof

For the sake of contradiction, we suppose that there exists \(k_{0}\geq0\) such that (9) is not satisfied for any nonnegative integer m, i.e.,
$$-\bigl\langle F\bigl(x_{k_{0}}+\beta_{k_{0}}\rho^{m}d_{k_{0}} \bigr),d_{k_{0}}\bigr\rangle <\sigma\bigl\Vert F(x_{k_{0}})\bigr\Vert ^{2},\quad \forall m\geq0. $$
Letting \(m\rightarrow\infty\) and using the continuity of \(F(\cdot)\) yield
$$ -\bigl\langle F(x_{k_{0}}),d_{k_{0}}\bigr\rangle \leq \sigma \bigl\Vert F(x_{k_{0}})\bigr\Vert ^{2}. $$
(14)
On the other hand, by (7) and (13), we obtain
$$ -\bigl\langle F(x_{0}),d_{0}\bigr\rangle =\bigl\Vert F(x_{0})\bigr\Vert ^{2}>r\bigl\Vert F(x_{0}) \bigr\Vert ^{2} $$
and
$$ -\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2},\quad \forall k \geq1, $$
which together with (14) implies \(\sigma\geq r\); however, this contradicts the fact that \(\sigma< r\). Therefore the assertion of Lemma 2.1 holds. This completes the proof. □

For the line search (12), we have a similar result, in the following lemma.

Lemma 2.2

For all \(k\geq0\), there exists a nonnegative number \(m_{k}\) satisfying (12).

Proof

The lemma can be proved by contradiction, as in Lemma 2.1; we omit the proof for concision. This completes the proof. □

The step length \(\alpha_{k}\) and the norm of the function \(F(x_{k})\) satisfy the following property, which is an important result for proving the global convergence of Algorithm 2.1.

Lemma 2.3

Suppose that \(F(\cdot)\) is strongly monotone and let \(\{x_{k}\}\) and \(\{z_{k}\}\) be the sequences generated by Algorithm 2.1, then \(\{x_{k}\}\) and \(\{z_{k}\}\) are both bounded. Furthermore, we have
$$ \lim_{k\rightarrow\infty}\alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2}=0. $$
(15)

Proof

From (9), we have
$$ \bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle \geq\sigma \alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2}>0. $$
(16)
For any \(x^{*}\in C^{*}\), from (4), we have
$$\begin{aligned}& \bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \\& \quad = \bigl\Vert P_{C}\bigl[x_{k}-\gamma \xi_{k}F(z_{k})\bigr]-x^{*}\bigr\Vert ^{2} \\& \quad \leq \bigl\Vert x_{k}-\gamma\xi_{k}F(z_{k})-x^{*} \bigr\Vert ^{2} \\& \quad = \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-2\gamma \xi_{k}\bigl\langle F(z_{k}),x_{k}-x^{*}\bigr\rangle +\gamma^{2}\xi _{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2}. \end{aligned}$$
(17)
By the monotonicity of the mapping \(F(\cdot)\), we have
$$\begin{aligned}& \bigl\langle F(z_{k}),x_{k}-x^{*}\bigr\rangle \\& \quad = \bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle +\bigl\langle F(z_{k}),z_{k}-x^{*}\bigr\rangle \\& \quad \geq \bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle +\bigl\langle F\bigl(x^{*}\bigr),z_{k}-x^{*}\bigr\rangle \\& \quad = \bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle . \end{aligned}$$
(18)
Substituting (16) and (18) into (17), we have
$$\begin{aligned}& \bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \\& \quad \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-2\gamma \xi_{k}\bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle +\gamma ^{2}\xi_{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2} \\& \quad = \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\gamma(2- \gamma) \frac{\langle F(z_{k}),x_{k}-z_{k}\rangle ^{2}}{\Vert F(z_{k})\Vert ^{2}} \\& \quad \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}- \gamma(2-\gamma) \frac{\sigma^{2}\alpha_{k}^{2}\Vert F(x_{k})\Vert ^{4}}{\Vert F(z_{k})\Vert ^{2}} , \end{aligned}$$
(19)
which together with \(\gamma\in(0,2)\) indicates that, for all k,
$$ \bigl\Vert x_{k+1}-x^{*}\bigr\Vert \leq\bigl\Vert x_{k}-x^{*}\bigr\Vert , $$
(20)
which shows that the sequence \(\{x_{k}\}\) is bounded. By (13), \(\{d_{k}\}\) is bounded and so is \(\{z_{k}\}\). Then, by the continuity of \(F(\cdot)\), there exists a constant \(M>0\) such that \(\|F(z_{k})\|\leq M\), for all k. Therefore it follows from (19) that
$$\gamma(2-\gamma)\frac{\sigma^{2}}{M^{2}}\sum_{k=0}^{\infty}\alpha_{k}^{2}\bigl\Vert F(x_{k})\bigr\Vert ^{4}\leq\sum_{k=0}^{\infty}\bigl(\bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2}\bigr)<\infty, $$
which implies that the assertion (15) holds. The proof is completed. □

Lemma 2.4

Suppose that \(F(\cdot)\) is monotone and Lipschitz continuous and let \(\{x_{k}\}\) and \(\{z_{k}\}\) be the sequences generated by Algorithm 2.2, then \(\{x_{k}\}\) and \(\{z_{k}\}\) are both bounded. Furthermore, we have
$$\lim_{k\rightarrow\infty}\alpha_{k}\|d_{k} \|^{2}=0. $$

Proof

The conclusion is slightly different from (15), which results from the difference between the right-hand sides of the line searches (9) and (12). It can be proved in the same way as Lemma 2.3, and we omit the proof for concision. This completes the proof. □

Now, we establish the global convergence theorems for Algorithm 2.1 and Algorithm 2.2.

Theorem 2.1

Suppose that the conditions in Lemma  2.3 hold. Then the sequence \(\{x_{k}\}\) generated by Algorithm 2.1 globally converges to a solution of \(\operatorname{CES}(F,C)\).

Proof

We consider the following two possible cases.

Case 1: \(\liminf_{k\rightarrow\infty}\|F(x_{k})\|=0\), which together with the continuity of \(F(\cdot)\) implies that the sequence \(\{x_{k}\}\) has some accumulation point \(\bar{x}\) such that \(F(\bar{x})=0\). From (20), \(\{\|x_{k}-\bar{x}\|\}\) converges, and since \(\bar{x}\) is an accumulation point of \(\{x_{k}\}\), \(\{x_{k}\}\) must converge to \(\bar{x}\).

Case 2: \(\liminf_{k\rightarrow\infty}\|F(x_{k})\|>0\). Then by (15), it follows that \(\lim_{k\rightarrow\infty}\alpha_{k}=0\). Therefore, from the line search (9), for sufficiently large k, we have
$$ -\bigl\langle F\bigl(x_{k}+\beta_{k}\rho ^{m_{k}-1}d_{k}\bigr),d_{k}\bigr\rangle <\sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
(21)
Since \(\{x_{k}\}\) and \(\{d_{k}\}\) are both bounded, we can choose convergent subsequences; letting \(k\rightarrow\infty\) along these subsequences in (21), we obtain
$$ -\bigl\langle F(\bar{x}),\bar{d}\bigr\rangle \leq\sigma \bigl\Vert F(\bar{x})\bigr\Vert ^{2}, $$
(22)
where \(\bar{x}\), \(\bar{d}\) are limit points of corresponding subsequences. On the other hand, by (13), we obtain
$$ -\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2},\quad \forall k \geq1. $$
Letting \(k\rightarrow\infty\) in the above inequality, we obtain
$$ -\bigl\langle F(\bar{x}),\bar{d}\bigr\rangle \geq r\bigl\Vert F( \bar{x})\bigr\Vert ^{2}. $$
(23)
Thus, by (22) and (23), we get \(r\leq\sigma\), and this contradicts the fact that \(r>\sigma\). Therefore \(\liminf_{k\rightarrow\infty}\|F(x_{k})\|>0\) does not hold. This completes the proof. □

For Algorithm 2.2, we also have the following global convergence.

Theorem 2.2

Suppose that the conditions in Lemma  2.4 hold. Then the sequence \(\{x_{k}\}\) generated by Algorithm 2.2 globally converges to a solution of \(\operatorname{CES}(F,C)\).

Proof

Following a process similar to the proof for Theorem 2.1, we can get the desired conclusion. This completes the proof. □

3 Convergence rate

By Theorem 2.1 and Theorem 2.2, we know that the sequence \(\{x_{k}\}\) generated by Algorithm 2.1 or Algorithm 2.2 converges to a solution of \(\operatorname{CES}(F,C)\). In what follows, we always assume that \(x_{k}\rightarrow x^{*}\) as \(k\rightarrow\infty\), where \(x^{*}\in C^{*}\). To establish the local convergence rate of the sequence generated by Algorithm 2.1 or Algorithm 2.2, we need the following assumption.

Assumption 3.1

For \(x^{*}\in C^{*}\), there exist three positive constants δ, c, and L such that
$$ c\operatorname{dist}\bigl(x,C^{*}\bigr)\leq\bigl\Vert F(x)\bigr\Vert ,\quad \forall x\in N\bigl(x^{*},\delta\bigr) $$
(24)
and
$$ \bigl\Vert F(x)-F(y)\bigr\Vert \leq L\|x-y\|,\quad \forall x,y \in N\bigl(x^{*},\delta\bigr), $$
(25)
where \(\operatorname{dist}(x,C^{*})\) denotes the distance from x to the solution set \(C^{*}\), and
$$N\bigl(x^{*},\delta\bigr)=\bigl\{ x\in R^{n}|\bigl\Vert x-x^{*}\bigr\Vert \leq\delta\bigr\} . $$
Obviously, (A3) in Section 2 implies (25). Shrinking the constant c if necessary (the error bound (24) remains valid for any smaller positive constant), we may assume that
$$ 0<\frac{\gamma(2-\gamma)\sigma\alpha c^{2}\eta ^{2}}{ L^{2}(\beta_{\mathrm{max}}L(1+r\eta)+\eta)^{2}}<1. $$
(26)
Now, we analyze the convergence rate of the sequence \(\{x_{k}\}\) generated by Algorithm 2.1 or Algorithm 2.2 under the conditions (24) and (25).

Lemma 3.1

If (A4) and the conditions in Assumption  3.1 hold, then the sequence \(\{\alpha_{k}\}\) generated by the line search (9) is bounded below by a positive constant.

Proof

We only need to show that, for sufficiently large k, \(\alpha_{k}\) is bounded below by a positive constant. If \(\alpha_{k}=\beta_{k}\), then \(\alpha_{k}\geq\beta_{\mathrm{min}}\). Otherwise \(\alpha_{k}<\beta_{k}\), and by the construction of \(\alpha_{k}\), the trial step size \(\alpha_{k}\rho^{-1}\) violates (9), i.e.,
$$-\bigl\langle F\bigl(x_{k}+\alpha_{k}\rho^{-1}d_{k}\bigr),d_{k}\bigr\rangle <\sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
In addition, by (7), we have
$$-\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
Then, by the above two inequalities, we can obtain
$$ \bigl\langle F\bigl(x_{k}+\alpha_{k}\rho ^{-1}d_{k}\bigr)-F(x_{k}),d_{k} \bigr\rangle \geq(r-\sigma)\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
(27)
On the other hand, from (13) and (25), we have
$$ \bigl\langle F\bigl(x_{k}+\alpha_{k}\rho ^{-1}d_{k}\bigr)-F(x_{k}),d_{k} \bigr\rangle \leq\frac{L\alpha_{k}}{\rho}\|d_{k}\| ^{2}\leq \frac{L\alpha_{k}(1+r\eta)^{2}}{\rho\eta^{2}}\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
(28)
By (27) and (28), for k sufficiently large we obtain
$$\alpha_{k}\geq\frac{\rho(r-\sigma)\eta^{2}}{L(1+r\eta)^{2}}. $$
Therefore, there is a positive constant α such that
$$ \alpha_{k}\geq\alpha, $$
(29)
for all k. The proof is completed. □

Lemma 3.2

If (A2), (A3), and the conditions in Assumption  3.1 hold, then the sequence \(\{\alpha_{k}\}\) generated by the line search (12) is bounded below by a positive constant.

Proof

The proof is similar to that of Lemma 3.1, and we omit it for concision. This completes the proof. □

Theorem 3.1

In addition to the assumptions in Theorem  2.1, if conditions (24) and (25) hold, then the sequence \(\{ \operatorname{dist}(x_{k},C^{*})\}\) generated by Algorithm  2.1 converges locally to 0 at the Q-linear rate, hence the sequence \(\{x_{k}\}\) converges locally to \(x^{*}\) at the R-linear rate.

Proof

Let \(v_{k}\in C^{*}\) be the closest solution to \(x_{k}\). That is, \(\|x_{k}-v_{k}\| =\operatorname{dist}(x_{k},C^{*})\). By (19), we have
$$ \|x_{k+1}-v_{k}\|^{2}\leq \|x_{k}-v_{k}\|^{2}-\gamma (2-\gamma) \frac{\langle F(z_{k}), x_{k}-z_{k}\rangle^{2}}{\|F(z_{k})\|^{2}}. $$
(30)
For sufficiently large k, it follows from (13) and (25) that
$$\begin{aligned} \bigl\Vert F(z_{k})\bigr\Vert &=\bigl\Vert F(z_{k})-F(v_{k}) \bigr\Vert \\ &\leq L\Vert z_{k}-v_{k}\Vert \\ &\leq L\bigl(\Vert x_{k}-z_{k}\Vert +\Vert x_{k}-v_{k}\Vert \bigr) \\ &\leq L\bigl(\beta_{\mathrm{max}}\Vert d_{k}\Vert +\Vert x_{k}-v_{k}\Vert \bigr) \\ &\leq L \biggl(\frac{\beta_{\mathrm{max}}(1+r\eta)\Vert F(x_{k})\Vert }{\eta}+\Vert x_{k}-v_{k} \Vert \biggr) \\ &=L \biggl(\frac{\beta_{\mathrm{max}}(1+r\eta)\Vert F(x_{k})-F(v_{k})\Vert }{\eta }+\Vert x_{k}-v_{k}\Vert \biggr) \\ &\leq L \biggl(\frac{\beta_{\mathrm{max}}L(1+r\eta)}{\eta}+1 \biggr)\Vert x_{k}-v_{k} \Vert \\ &= L \biggl(\frac{\beta_{\mathrm{max}}L(1+r\eta)}{\eta}+1 \biggr)\operatorname {dist} \bigl(x_{k},C^{*}\bigr). \end{aligned}$$
Thus, from (9), (24), and (29), for sufficiently large k, we have
$$\bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle \geq\sigma\alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq\sigma\alpha\bigl\Vert F(x_{k})\bigr\Vert ^{2}\geq\sigma\alpha c^{2}\operatorname{dist}^{2} \bigl(x_{k},C^{*}\bigr). $$
Substituting the above two inequalities into (30) and from (26), we have
$$\operatorname{dist}^{2}\bigl(x_{k+1},C^{*}\bigr)\leq \|x_{k+1}-v_{k}\|^{2}\leq \biggl(1- \frac {\gamma(2-\gamma)\sigma\alpha c^{2}\eta^{2}}{ L^{2}(\beta_{\mathrm {max}}L(1+r\eta)+\eta)^{2}} \biggr)\operatorname{dist}^{2}\bigl(x_{k},C^{*} \bigr), $$
which implies that the sequence \(\{\operatorname{dist}(x_{k},C^{*})\}\) converges locally to 0 at the Q-linear rate. Therefore, the sequence \(\{x_{k}\}\) converges locally to \(x^{*}\) at the R-linear rate. The proof is completed. □

Theorem 3.2

In addition to the assumptions in Theorem  2.2, if conditions (24) and (25) hold, then the sequence \(\{ \operatorname{dist}(x_{k},C^{*})\}\) generated by Algorithm  2.2 converges locally to 0 at the Q-linear rate, hence the sequence \(\{x_{k}\}\) converges locally to \(x^{*}\) at an R-linear rate.

Proof

The proof is similar to that of Theorem 3.1, and we also omit it for concision. This completes the proof. □

4 Numerical results

In this section, we test Algorithm 2.1 and Algorithm 2.2 and compare them with the conjugate gradient projection method in [15]. We use the following three simple problems to test the efficiency of the three methods.

Problem 1

The mapping \(F(\cdot)\) is taken as \(F(x)=(f_{1}(x), f_{2}(x),\ldots,f_{n}(x))^{\top}\), where
$$f_{i}(x)=e^{x_{i}}-1, \quad \mbox{for } i=1,2,\ldots,n $$
and \({C}={R}_{+}^{n}\). Obviously, this problem has a unique solution \(x^{*}=(0,0,\ldots,0)^{\top}\).

Problem 2

The mapping \(F(\cdot)\) is taken as \(F(x)=(f_{1}(x), f_{2}(x),\ldots,f_{n}(x))^{\top}\), where
$$f_{i}(x)=x_{i}-\sin|x_{i}-1|,\quad \mbox{for } i=1,2,\ldots,n $$
and \({C}=\{x\in{R}_{+}^{n}|\sum_{i=1}^{n}x_{i}\leq n, x_{i}\geq0, i=1,2,\ldots,n\} \). Obviously, Problem 2 is nonsmooth at \(x=(1,1,\ldots,1)^{\top}\).
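The feasible set of Problem 2 is the intersection of the nonnegative orthant with a half-space. Its projection is not a simple clip; a standard approach (sketched below in Python as an illustration, not taken from the paper) clips first and, when the sum constraint is violated, falls back to the well-known sort-based projection onto the scaled simplex:

```python
import numpy as np

def proj_capped_simplex(x, b):
    """Projection onto {x >= 0, sum(x) <= b} (the set C of Problem 2 with b = n)."""
    p = np.maximum(x, 0.0)
    if p.sum() <= b:
        return p                     # sum bound inactive: clipping suffices
    # Sum bound active: project onto the simplex {x >= 0, sum(x) = b}
    # via the standard sort-and-threshold rule.
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u - (css - b) / np.arange(1, len(x) + 1) > 0)[0][-1]
    tau = (css[k] - b) / (k + 1)
    return np.maximum(x - tau, 0.0)

# Feasibility check, and a point already in C is left unchanged.
p = proj_capped_simplex(np.array([3.0, -1.0, 2.0]), 3.0)   # n = 3 here
assert p.min() >= 0 and p.sum() <= 3.0 + 1e-12
assert np.allclose(proj_capped_simplex(np.array([0.5, 0.5, 0.5]), 3.0),
                   [0.5, 0.5, 0.5])
```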

Problem 3

The problem is adapted from [13]. The mapping \(F(\cdot)\) is taken as \(F(x)=D(x)+Mx\), where \(D(x)\) and Mx are the nonlinear and linear parts of \(F(x)\), respectively. The components of \(D(x)\) are defined by \(D_{j}(x)=a_{j}\arctan(x_{j})\), where \(a_{j}\) is a random variable in \((0,100)\), and the matrix \(M=A^{\top}A+B\), where A is an \(n\times n\) matrix whose entries are randomly generated in the interval \((-1,1)\) and B is a skew-symmetric matrix generated in the same way. In addition, \({C}={R}_{+}^{n}\).
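A Python generator for this test mapping can be sketched as follows (the random intervals follow the text; the seed and function names are our assumptions). Since \(x^{\top}Bx=0\) for skew-symmetric B and \(\arctan\) is nondecreasing, the resulting F is monotone:

```python
import numpy as np

def problem3_F(n, seed=0):
    """Build F(x) = D(x) + Mx of Problem 3: D_j(x) = a_j * arctan(x_j),
    M = A'A + B with B skew-symmetric."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, 100.0, size=n)          # a_j in (0, 100)
    A = rng.uniform(-1.0, 1.0, size=(n, n))      # entries in (-1, 1)
    S = rng.uniform(-1.0, 1.0, size=(n, n))
    B = S - S.T                                  # skew-symmetric matrix
    M = A.T @ A + B
    return lambda x: a * np.arctan(x) + M @ x, M

F, M = problem3_F(20)
# x'Mx = ||Ax||^2 >= 0 because the skew part contributes nothing.
x = np.random.default_rng(1).standard_normal(20)
assert x @ (M @ x) >= -1e-10
```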

The codes are written in Matlab 7.0 and run on a personal computer with a 2.0 GHz CPU. The parameters used in Algorithm 2.1 and Algorithm 2.2 are set as \(\rho=0.6\), \(r=10^{-3}\), \(\sigma=10^{-4}\), and \(\gamma=1.8\) for Problem 1 and \(\gamma=1\) for Problems 2 and 3. The initial step size in Step 4 of Algorithm 2.1 or Algorithm 2.2 is set to \(\beta_{k+1}=1\). We stop the iteration if the iteration number exceeds 1,000 or the inequality \(\|F(x_{k})\|\leq10^{-5}\) is satisfied. The method in [15] (denoted by CGD) is implemented with the following parameters: \(\rho=0.1\), \(r=0.01\), \(\sigma =10^{-4}\), and \(\xi=1\).

For Problems 1 and 2, the initial point is set as \(x_{0}=\operatorname {ones}(n,1)\), and for Problem 3, the initial point is set as \(x_{0}=\operatorname{rand}(n,1)\). Tables 1-3 give the numerical results by Algorithm 2.1, Algorithm 2.2, and CGD with different dimensions, where Iter. denotes the iteration number, Fn denotes the number of function evaluations, and CPU denotes the CPU time in seconds when the algorithms terminate.
Table 1

Numerical results with different dimensions of Problem 1

Dimension | Method        | Iter. | Fn | CPU
----------|---------------|-------|----|-----
1,000     | Algorithm 2.1 | 1     | 5  | 0.02
1,000     | Algorithm 2.2 | 1     | 5  | 0.02
1,000     | CGD           | 11    | 51 | 0.03
5,000     | Algorithm 2.1 | 1     | 5  | 0.03
5,000     | Algorithm 2.2 | 1     | 5  | 0.03
5,000     | CGD           | 12    | 56 | 0.16
50,000    | Algorithm 2.1 | 1     | 5  | 0.14
50,000    | Algorithm 2.2 | 1     | 5  | 0.11
50,000    | CGD           | 13    | 60 | 1.33
100,000   | Algorithm 2.1 | 1     | 5  | 0.25
100,000   | Algorithm 2.2 | 1     | 5  | 0.25
100,000   | CGD           | 13    | 60 | 2.86

Table 2

Numerical results with different dimensions of Problem 2

Dimension | Method        | Iter. | Fn | CPU
----------|---------------|-------|----|-----
1,000     | Algorithm 2.1 | 10    | 59 | 0.05
1,000     | Algorithm 2.2 | 8     | 52 | 0.05
1,000     | CGD           | 12    | 59 | 0.06
5,000     | Algorithm 2.1 | 10    | 59 | 0.17
5,000     | Algorithm 2.2 | 8     | 52 | 0.16
5,000     | CGD           | 12    | 59 | 0.22
50,000    | Algorithm 2.1 | 11    | 64 | 1.58
50,000    | Algorithm 2.2 | 10    | 63 | 1.47
50,000    | CGD           | 12    | 59 | 1.55
100,000   | Algorithm 2.1 | 12    | 69 | 3.50
100,000   | Algorithm 2.2 | 10    | 63 | 3.19
100,000   | CGD           | 13    | 69 | 3.88

Table 3

Numerical results with different dimensions of Problem 3

Dimension | Method        | Iter. | Fn    | CPU
----------|---------------|-------|-------|------
100       | Algorithm 2.1 | 125   | 479   | 0.80
100       | Algorithm 2.2 | 141   | 743   | 1.39
100       | CGD           | 356   | 5,422 | 10.00
500       | Algorithm 2.1 | 215   | 873   | 19.48
500       | Algorithm 2.2 | 239   | 1,475 | 32.81
500       | CGD           | 270   | 2,700 | 63.13

The numerical results given in Tables 1-3 show that: (1) the three methods solve all the tested problems successfully; (2) for the two easy Problems 1 and 2, Algorithm 2.2 performs a little better than Algorithm 2.1 in CPU time, and both methods outperform CGD on all three criteria: Iter., Fn, and CPU; (3) for the difficult Problem 3, Algorithm 2.1 performs best among the three methods, and both Algorithm 2.1 and Algorithm 2.2 perform much better than CGD, especially in CPU time. From the above analysis, we conclude that Algorithm 2.1 and Algorithm 2.2 are better than CGD.

5 Conclusions

Two spectral gradient projection methods for solving constrained equations have been developed, which are not only derivative-free, but also completely matrix-free. Consequently, they can be applied to solve large-scale nonsmooth constrained equations. We established the global convergence without the requirement of differentiability of the equations, and presented the linear convergence rate under standard conditions. We also reported some numerical results to show the efficiency of the proposed methods.


Acknowledgements

The authors gratefully acknowledge the helpful comments and suggestions of the anonymous reviewers. This work is supported by the National Natural Science Foundation of China (71371139, 11302188), the Shanghai Shuguang Talent Project (13SG24), the Shanghai Pujiang Talent Project (12PJC069), and the Foundation of Teachers Professional Development of Zhejiang Provincial Visiting Scholar in Higher School.

Open Access This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

Authors’ Affiliations

(1)
School of Mathematics and Statistics, Zhejiang University of Finance and Economics, Xueyuan Street, Hangzhou, 310018, P.R. China
(2)
School of Economics and Management, Tongji University, Siping Street, Shanghai, 200092, P.R. China

References

  1. Dirkse, SP, Ferris, MC: MCPLIB: a collection of nonlinear mixed complementarity problems. Optim. Methods Softw. 5, 319-345 (1995)
  2. Wood, AJ, Wollenberg, BF: Power Generation, Operation, and Control. Wiley, New York (1996)
  3. Meintjes, K, Morgan, AP: A methodology for solving chemical equilibrium systems. Appl. Math. Comput. 22, 333-361 (1987)
  4. Qi, LQ, Tong, XJ, Li, DH: An active-set projected trust region algorithm for box constrained nonsmooth equations. J. Optim. Theory Appl. 120, 601-625 (2004)
  5. Ortega, JM, Rheinboldt, WC: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)
  6. Yu, ZS, Lin, J, Sun, J, Xiao, YH, Liu, LY, Li, ZH: Spectral gradient projection method for monotone nonlinear equations with convex constraints. Appl. Numer. Math. 59, 2416-2423 (2009)
  7. Liu, SY, Huang, YY, Jiao, HW: Sufficient descent conjugate gradient methods for solving convex constrained nonlinear monotone equations. Abstr. Appl. Anal. 2014, Article ID 305643 (2014)
  8. Sun, M, Liu, J: Three derivative-free projection methods for large-scale nonlinear equations with convex constraints. J. Appl. Math. Comput. (2014). doi:10.1007/s12190-014-0774-5
  9. Barzilai, J, Borwein, JM: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141-148 (1988)
  10. Birgin, EG, Martínez, JM, Raydan, M: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60, 1-21 (2014)
  11. Fletcher, R, Reeves, C: Function minimization by conjugate gradients. Comput. J. 7, 149-154 (1964)
  12. Dai, YH, Liao, LZ: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22, 1-10 (2002)
  13. Wang, CW, Wang, YJ, Xu, CL: A projection method for a system of nonlinear monotone equations with convex constraints. Math. Methods Oper. Res. 66, 33-46 (2007)
  14. Zheng, L: A new projection algorithm for solving a system of nonlinear equations with convex constraints. Bull. Korean Math. Soc. 50, 823-832 (2013)
  15. Xiao, YH, Zhu, H: A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 405, 310-319 (2013)

Copyright

© Liu and Duan; licensee Springer 2015
