
# A trust region spectral method for large-scale systems of nonlinear equations

Journal of Inequalities and Applications 2016, 2016:174

https://doi.org/10.1186/s13660-016-1117-x

• Received: 27 February 2016
• Accepted: 14 June 2016
• Published:

## Abstract

The spectral gradient method is one of the most effective methods for solving large-scale systems of nonlinear equations. In this paper, we propose a new trust region spectral method without gradient. The trust region technique is a globalization strategy in our method. The global convergence of the proposed algorithm is proved. The numerical results show that our new method is more competitive than the spectral method of La Cruz et al. (Math. Comput. 75(255):1429-1448, 2006) for large-scale nonlinear equations.

## Keywords

• nonlinear equations
• trust region
• spectral method
• large-scale problem

• 65H10
• 90C06

## 1 Introduction

In this paper we introduce a trust region spectral method for solving large-scale systems of nonlinear equations
$$F(x) =0,$$
(1)
where $$F: R^{n}\rightarrow R^{n}$$ is continuously differentiable, its Jacobian matrix $$J(x)\in R^{n\times n}$$ is sparse, and n is large. Large-scale systems of nonlinear equations arise in many applications, such as network-flow problems and discrete boundary value problems.
Many algorithms have been presented for solving the large-scale problem (1). Bouaricha et al.  proposed tensor methods. Bergamaschi et al.  proposed inexact quasi-Newton methods. The above methods need to calculate the Jacobian matrix or an approximation of it at each iteration. La Cruz and Raydan  introduced the spectral method for (1). The method uses the residual $$\pm F(x_{k})$$ as a search direction and the trial point at each iteration is $$x_{k}-\lambda_{k}F(x_{k})$$, where $$\lambda _{k}$$ is a spectral coefficient. $$\lambda_{k}$$ satisfies the Grippo-Lampariello-Lucidi (GLL) line search condition
$$f(x_{k}+\lambda_{k}d_{k})\leq\max _{0\leq j \leq M-1} f(x_{k-j})+\alpha \lambda_{k}\nabla f(x_{k})^{T}d_{k},$$
(2)
where $$f(x)= \frac{1}{2}\Vert F(x)\Vert ^{2}$$, M is a nonnegative integer, α is a small positive number and $$d_{k}=\pm F(x_{k})$$. This method also requires one to compute a directional derivative or a very good approximation of it at every iteration. Later La Cruz et al.  proposed a spectral method without gradient information, which uses a nonmonotone line search globalization strategy
$$f(x_{k}+\lambda_{k}d_{k})\leq\max _{0\leq j \leq M-1} f(x_{k-j})+\eta _{k}-\alpha \lambda_{k}^{2}f(x_{k}),$$
(3)
where $$\sum_{k} \eta_{k} \leq\eta<\infty$$. Meanwhile, conjugate gradient techniques have been developed for solving large-scale nonlinear equations (see ). In fact, spectral gradient, BFGS quasi-Newton, and conjugate gradient methods can solve large-scale optimization problems and systems of nonlinear equations (see ). The advantage of spectral methods is that the storage of certain matrices associated with the Hessian of objective functions can be avoided.
The purpose of this paper is to extend the spectral method for solving large-scale systems of nonlinear equations by using the trust region technique. For the traditional trust region methods , at each iterative point $$x_{k}$$, the trial step $$d_{k}$$ is obtained by solving the following trust region subproblem:
$$\min q_{k}(d) \quad\text{such that} \quad \Vert d\Vert \le \Delta_{k},$$
(4)
where $$q_{k}(d)=\frac{1}{2}\Vert F(x_{k})+J(x_{k})d\Vert ^{2}$$.

The above trust region methods are particularly effective for small to medium-sized systems of nonlinear equations; however, the computation and storage loads can greatly increase with increased dimension.

For the large-scale problems of nonlinear equations, we use $$\gamma_{k} I$$ as an approximation of $$J(x_{k})$$. At each iterative point $$x_{k}$$ in our method, the trial step $$d_{k}$$ is obtained by solving the following subproblem:
$$\min q_{k}(d)=\frac{1}{2}\Vert F_{k}+ \gamma_{k}d\Vert ^{2} \quad\text{such that} \quad \Vert d \Vert \le\Delta_{k},$$
(5)
where $$\gamma_{k}$$ is the spectral coefficient and $$F_{k}=F(x_{k})$$. The classic quasi-Newton equation is
$$B_{k+1}d_{k}=y_{k}.$$
(6)
Left-multiplying (6) by $$y_{k}^{T}$$ and setting $$B_{k+1}=\gamma _{k+1}I$$, we obtain
$$\gamma_{k+1}=\frac{y_{k}^{T}y_{k}}{y_{k}^{T}d_{k}},$$
(7)
where $$d_{k}=x_{k+1}-x_{k}$$ and $$y_{k}=F_{k+1}-F_{k}$$.
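Formula (7) needs only two inner products per iteration. The following is a minimal sketch in plain Python; the function name `spectral_coefficient` and the safeguard parameters `fallback` and `tiny` are assumptions made for illustration and are not part of the paper.

```python
def spectral_coefficient(d, y, fallback=1.0, tiny=1e-12):
    """Return gamma_{k+1} = (y^T y) / (y^T d), as in formula (7).

    `fallback` and `tiny` are illustrative safeguards (assumptions):
    if y^T d is numerically zero, the quotient is undefined.
    """
    yty = sum(yi * yi for yi in y)
    ytd = sum(yi * di for yi, di in zip(y, d))
    if abs(ytd) < tiny:
        return fallback
    return yty / ytd


# Toy check with F(x) = 3x, so that y = F(x1) - F(x0) = 3 d and J = 3 I.
d = [0.5, -1.0, 2.0]
y = [3.0 * di for di in d]
gamma = spectral_coefficient(d, y)
```

For a map whose Jacobian is a multiple of the identity, as in this toy check, the formula recovers that multiple, which is the sense in which $$\gamma_{k+1}I$$ approximates the Jacobian.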

The paper is organized as follows. Section 2 introduces the new algorithm. The convergence theory is presented in Section 3. Section 4 demonstrates preliminary numerical results on test problems.

## 2 New algorithm

In this section, we give a trust region spectral method for solving large-scale systems of nonlinear equations. Let $$d_{k}$$ be the solution of the trust region subproblem (5). We define the actual reduction as
$$\mathit{Ared}_{k}(d_{k})=f(x_{k})-f(x_{k}+d_{k}),$$
(8)
and the predicted reduction as
$$\mathit{Pred}_{k}(d_{k})=q_{k}(0)-q_{k}(d_{k}).$$
(9)
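Because the model Jacobian in (5) is $$\gamma_{k}I$$, the subproblem has a closed-form solution when $$\gamma_{k}\neq0$$: the unconstrained minimizer is $$-F_{k}/\gamma_{k}$$, and if it lies outside the trust region it is scaled back to the boundary. The sketch below uses this closed form together with definitions (8) and (9). The experiments in Section 4 use a dogleg method instead, and all function names here are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def solve_subproblem(F, gamma, delta):
    """Minimize 0.5*||F + gamma*d||^2 subject to ||d|| <= delta (gamma != 0)."""
    d = [-Fi / gamma for Fi in F]        # unconstrained minimizer -F/gamma
    nd = norm(d)
    if nd > delta:                       # outside the region: scale to the boundary
        d = [di * (delta / nd) for di in d]
    return d


def predicted_reduction(F, gamma, d):
    """Pred_k(d) = q_k(0) - q_k(d), formula (9)."""
    q0 = 0.5 * sum(Fi * Fi for Fi in F)
    qd = 0.5 * sum((Fi + gamma * di) ** 2 for Fi, di in zip(F, d))
    return q0 - qd


def actual_reduction(F_func, x, d):
    """Ared_k(d) = f(x) - f(x + d) with f = 0.5*||F||^2, formula (8)."""
    def f(z):
        return 0.5 * sum(Fi * Fi for Fi in F_func(z))
    return f(x) - f([xi + di for xi, di in zip(x, d)])


F = [3.0, -4.0]                              # ||F|| = 5
d_interior = solve_subproblem(F, 2.0, 10.0)  # step stays inside the region
d_boundary = solve_subproblem(F, 2.0, 1.0)   # step is cut back to ||d|| = 1
pred_boundary = predicted_reduction(F, 2.0, d_boundary)
```

No matrix is formed or stored in any of these routines, which is the storage advantage of the spectral approach for large n.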

Now we present our algorithm for solving (1). The algorithm is given as follows.

### Algorithm 1

Step 0.:

Choose $$0<\eta_{1} <\eta_{2} < 1$$, $$0<\beta_{1}<1 <\beta_{2}$$, $$\epsilon>0$$. Initialize $$x_{0}$$, $$0<\Delta_{0} <\bar{\Delta}$$. Set $$k:=0$$.

Step 1.:

Evaluate $$F_{k}$$, if $$\Vert F_{k}\Vert \leq\epsilon$$, then terminate.

Step 2.:

Solve the trust region subproblem (5) to obtain $$d_{k}$$.

Step 3.:
Compute
$$r_{k} = \frac{\mathit{Ared}_{k}(d_{k})}{\mathit{Pred}_{k}(d_{k})}.$$
(10)
If $$r_{k} < \eta_{1}$$, then set $$\Delta_{k}:= \beta_{1} \Delta_{k}$$ and go to Step 2. Otherwise, go to Step 4.
Step 4.:
$$x_{k+1}=x_{k} + d_{k}$$;
$$\Delta_{k + 1} = \textstyle\begin{cases} \min\{\beta_{2} \Delta_{k}, \bar{\Delta}\}, & \text{if } r_{k} \geq\eta_{2},\\ \Delta_{k}, & \text{otherwise}. \end{cases}$$
Compute $$\gamma_{k+1}$$ by (7). Set $$k:=k+1$$, go to Step 1.
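The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' MATLAB code, and it relies on several assumptions not stated in the paper: the initial value $$\gamma_{0}=1$$, a closed-form solution of (5) in place of the dogleg method, and two numerical safeguards marked in the comments. The demonstration problem $$F(x)=2x-1$$ is a toy example, not one of the test functions of Section 4.

```python
import math


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def trust_region_spectral(F_func, x0, gamma0=1.0, delta0=1.0, delta_max=10.0,
                          eta1=0.001, eta2=0.75, beta1=0.5, beta2=2.0,
                          eps=1e-5, max_iter=5000):
    x, gamma, delta = list(x0), gamma0, delta0   # gamma0 is an assumption
    F = F_func(x)
    for k in range(max_iter):
        if norm(F) <= eps:                       # Step 1: stopping test
            return x, k
        while True:
            # Step 2: closed-form solution of subproblem (5)
            d = [-Fi / gamma for Fi in F]
            nd = norm(d)
            if nd > delta:
                d = [di * delta / nd for di in d]
            # Step 3: ratio of actual to predicted reduction
            x_new = [xi + di for xi, di in zip(x, d)]
            F_new = F_func(x_new)
            ared = 0.5 * (norm(F) ** 2 - norm(F_new) ** 2)
            model = [Fi + gamma * di for Fi, di in zip(F, d)]
            pred = 0.5 * (norm(F) ** 2 - norm(model) ** 2)
            if pred <= 0.0 or delta < 1e-14:     # breakdown guard (assumption)
                return x, k
            r = ared / pred
            if r >= eta1:
                break
            delta *= beta1                       # shrink the region and retry
        # Step 4: accept the step, update the radius and gamma by (7)
        if r >= eta2:
            delta = min(beta2 * delta, delta_max)
        y = [a - b for a, b in zip(F_new, F)]
        ytd = sum(yi * di for yi, di in zip(y, d))
        if abs(ytd) > 1e-12:                     # keep old gamma otherwise (assumption)
            gamma = sum(yi * yi for yi in y) / ytd
        x, F = x_new, F_new
    return x, max_iter


# Toy linear system F(x) = 2x - 1 with solution x_i = 0.5.
x_star, iters = trust_region_spectral(lambda x: [2.0 * xi - 1.0 for xi in x],
                                      [0.0, 0.0, 0.0])
```

The default parameter values match those reported in Section 4. Each iteration costs one evaluation of F per trial step plus a few inner products.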

## 3 Convergence analysis

In this section, we prove the global convergence of Algorithm 1. The global convergence of Algorithm 1 needs the following assumptions.

### Assumption A

(1) The level set $$\Omega=\{x\in R^{n} \vert f(x)\leq f(x_{0}) \}$$ is bounded.

(2) The following relation holds:
$$\bigl\Vert [J_{k}-\gamma_{k}I]^{T} F_{k}\bigr\Vert =O\bigl(\Vert d_{k}\Vert \bigr).$$

Then we get the following lemmas.

### Lemma 3.1

$$\vert \mathit{Ared}_{k}(d_{k})-\mathit{Pred}_{k}(d_{k})\vert =O(\Vert d_{k}\Vert ^{2})$$.

### Proof

By (8) and (9), we have
\begin{aligned} \bigl\vert \mathit{Ared}_{k}(d_{k})-\mathit{Pred}_{k}(d_{k}) \bigr\vert =&\bigl\vert q_{k}(d_{k})-f(x_{k}+d_{k}) \bigr\vert \\ =&\frac{1}{2}\bigl\vert \Vert F_{k}+\gamma_{k}d_{k} \Vert ^{2}-\bigl\Vert F_{k}+J_{k}d_{k}+O \bigl(\Vert d_{k}\Vert ^{2}\bigr)\bigr\Vert ^{2}\bigr\vert \\ =&\bigl\vert \gamma_{k}F_{k}^{T}d_{k}-F_{k}^{T}J_{k}d_{k}+O \bigl(\Vert d_{k}\Vert ^{2}\bigr)+O\bigl(\Vert d_{k}\Vert ^{4}\bigr)\bigr\vert \\ \leq&\bigl\Vert [\gamma_{k}I-J_{k}]^{T}F_{k} \bigr\Vert \Vert d_{k}\Vert +O\bigl(\Vert d_{k} \Vert ^{2}\bigr) \\ =&O\bigl(\Vert d_{k}\Vert ^{2}\bigr). \end{aligned}
This completes the proof. □

Similar to Zhang and Wang , or Yuan et al. , we obtain the following result.

### Lemma 3.2

If $$d_{k}$$ is a solution of (5), then
$$\mathit{Pred}_{k}(d_{k}) \geq\frac{1}{2} \Vert \gamma_{k} F_{k}\Vert \min \biggl\{ \Delta_{k}, \frac{\Vert F_{k}\Vert }{\vert \gamma_{k}\vert } \biggr\} .$$
(11)

### Proof

Since $$d_{k}$$ is a solution of (5), for any $$\alpha\in[0,1]$$, it follows that
\begin{aligned} \mathit{Pred}_{k}(d_{k}) =& \frac{1}{2} \bigl(\Vert F_{k}\Vert ^{2} - \Vert F_{k} + \gamma_{k}d_{k}\Vert ^{2} \bigr) \\ \geq& \frac{1}{2} \biggl( \Vert F_{k}\Vert ^{2} - \biggl\Vert F_{k} -\gamma_{k}\frac{\alpha \Delta_{k}}{\Vert \gamma_{k} F_{k}\Vert } \gamma_{k} F_{k}\biggr\Vert ^{2} \biggr) \\ =& \alpha\Delta_{k} \Vert \gamma_{k} F_{k} \Vert -\frac{1}{2}\alpha^{2}\Delta _{k}^{2} \gamma_{k}^{2}. \end{aligned}
(12)
Then we have
\begin{aligned} \mathit{Pred}_{k}(d_{k}) \geq& \max_{0\leq\alpha\leq1} \biggl[ \alpha\Delta_{k} \Vert \gamma_{k} F_{k} \Vert -\frac{1}{2}\alpha^{2}\Delta_{k}^{2} \gamma_{k}^{2} \biggr] \\ \geq& \frac{1}{2}\Vert \gamma_{k} F_{k}\Vert \min \biggl\{ \Delta_{k}, \frac{\Vert F_{k}\Vert }{\vert \gamma_{k}\vert } \biggr\} . \end{aligned}
(13)
The proof is complete. □
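One way to verify the last inequality in (13) is a short case analysis. The symbols $$\varphi$$ and $$\alpha^{*}$$ are introduced here for convenience and do not appear in the original argument; $$\gamma_{k}\neq0$$ and $$F_{k}\neq0$$ are assumed.
\begin{aligned}& \varphi(\alpha)=\alpha\Delta_{k}\Vert \gamma_{k}F_{k}\Vert -\frac{1}{2}\alpha^{2}\Delta_{k}^{2}\gamma_{k}^{2}, \qquad \alpha^{*}=\frac{\Vert F_{k}\Vert }{\Delta_{k}\vert \gamma_{k}\vert } \quad\text{(unconstrained maximizer of } \varphi\text{)}, \\& \text{if } \alpha^{*}\le1: \quad \max_{0\leq\alpha\leq1}\varphi(\alpha)=\varphi\bigl(\alpha^{*}\bigr)=\frac{1}{2}\Vert F_{k}\Vert ^{2}=\frac{1}{2}\Vert \gamma_{k}F_{k}\Vert \frac{\Vert F_{k}\Vert }{\vert \gamma_{k}\vert }, \\& \text{if } \alpha^{*}>1: \quad \Delta_{k}\gamma_{k}^{2}< \Vert \gamma_{k}F_{k}\Vert , \text{ so } \max_{0\leq\alpha\leq1}\varphi(\alpha)=\varphi(1)=\Delta_{k}\Vert \gamma_{k}F_{k}\Vert -\frac{1}{2}\Delta_{k}^{2}\gamma_{k}^{2}>\frac{1}{2}\Delta_{k}\Vert \gamma_{k}F_{k}\Vert . \end{aligned}
In both cases the maximum is at least $$\frac{1}{2}\Vert \gamma_{k}F_{k}\Vert \min \{ \Delta_{k}, \Vert F_{k}\Vert /\vert \gamma_{k}\vert \}$$, which gives (11).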

### Lemma 3.3

Algorithm 1 does not cycle between Step 2 and Step 3 infinitely.

### Proof

If Algorithm 1 cycles between Step 2 and Step 3 infinitely, then for all $$i=1,2,\ldots$$ , we have $$x_{k+i}=x_{k}$$ and $$\Vert F_{k}\Vert > \epsilon$$, which implies that $$r_{k} < \eta_{1}$$ and $$\Delta_{k}\to0$$.

By Lemmas 3.1 and 3.2, we have
$$\vert r_{k}-1\vert =\frac{\vert \mathit{Ared}_{k}(d_{k})-\mathit{Pred}_{k}(d_{k})\vert }{\vert \mathit{Pred}_{k}(d_{k})\vert } \le\frac{2O(\Vert d_{k}\Vert ^{2})}{\Delta_{k} \Vert \gamma_{k} F_{k}\Vert } \to0.$$
(14)
Therefore, for k sufficiently large
$$r_{k} \ge\eta_{1},$$
(15)
this contradicts the fact that $$r_{k}<\eta_{1}$$. □

### Lemma 3.4

Let Assumption A hold and let $$\{x_{k}\}$$ be generated by Algorithm 1. Then $$\{x_{k}\}\subset\Omega$$. Moreover, $$\{ f(x_{k})\}$$ converges.

### Proof

By the definition of Algorithm 1, we have
$$r_{k} \ge\eta_{1} >0.$$
(16)
This implies
$$f (x_{k+1}) \le f (x_{k}) \le\cdots\le f(x_{0}).$$
Therefore, $$\{x_{k}\}\subset\Omega$$. Since $$\{f(x_{k})\}$$ is nonincreasing and bounded below by 0, it converges. □

The following theorem shows that Algorithm 1 is globally convergent under Assumption A.

### Theorem 3.5

Let Assumption A hold and let $$\{x_{k}\}$$ be generated by Algorithm 1. Then the algorithm either stops finitely or generates an infinite sequence $$\{x_{k}\}$$ such that
$$\lim_{k\to\infty} \Vert F_{k}\Vert =0.$$
(17)

### Proof

Assume that Algorithm 1 does not stop after finitely many steps. Suppose that (17) does not hold; then there exist a constant $$\varepsilon> 0$$ and a subsequence $$\{k_{j}\}$$ satisfying
$$\Vert F_{k_{j}}\Vert \ge\varepsilon.$$
(18)
Let $$K=\{k\vert \Vert F_{k}\Vert \ge\varepsilon\}$$.
Let $$S_{0} =\{k \vert r_{k} \geq\eta_{2} \}$$. Using Algorithm 1 and Lemma 3.2, we have
$$\sum_{k \in S_{0}}\bigl[f(x_{k})-f(x_{k+1}) \bigr]\ge\sum_{k\in S_{0}}\eta_{2} \cdot \mathit{Pred}_{k}(d_{k}) \ge\sum_{k \in K} \eta_{2} \cdot\frac{\varepsilon \vert \gamma _{k}\vert }{2} \min \biggl\{ \Delta_{k}, \frac{ \varepsilon}{ \vert \gamma_{k}\vert } \biggr\} .$$
By Lemma 3.4, we know that $$\{f(x_{k})\}$$ is convergent, then
$$\sum_{k \in S_{0}} \eta_{2} \cdot \frac{\varepsilon \vert \gamma_{k}\vert }{2} \min \biggl\{ \Delta_{k},\frac{ \varepsilon}{ \vert \gamma_{k}\vert } \biggr\} < \infty.$$
Thus, we have
$$\sum_{k\in S_{0}} \Delta_{k} < \infty.$$
(19)
From Steps 3-4 of Algorithm 1 it follows that
$$\Delta_{k+1}\leq\Delta_{k},$$
(20)
for all $$k\notin S_{0}$$, thus (19) means
$$\sum_{k\in K} \Delta_{k} < \infty.$$
(21)
Therefore there exists $$x^{*}$$ such that
$$\lim_{k\rightarrow\infty} x_{k} = x^{*}.$$
(22)
By (21), we have $$\Delta_{k}\rightarrow0$$, which implies
$$\mathit{Pred}_{k}(d_{k}) \geq\frac{\varepsilon \vert \gamma_{k}\vert }{2} \Delta_{k}$$
for all sufficiently large k. The fact that $$\vert \mathit{Ared}_{k}(d_{k}) - \mathit{Pred}_{k}(d_{k})\vert = O(\Vert d_{k}\Vert ^{2})$$ indicates that
$$\lim_{k\rightarrow\infty} r_{k} = 1,$$
which shows that, for sufficiently large k and $$k\in K$$,
$$\Delta_{k+1} \geq\Delta_{k}.$$
The above inequality contradicts (20). Thus, the conclusion follows. □

## 4 Numerical experiments

In this section, the recent spectral method in  is called Algorithm 2. We report results of some numerical experiments of Algorithms 1 and 2. We choose 14 test functions as follows (see [4, 6, 17]).

### Function 1

The trigonometric function
$$f_{i}(x)=n-\sum_{j=1}^{n}{\cos x_{j}}+i(1-\cos x_{i})-\sin x_{i},\quad i=1,2, \ldots,n.$$
Initial guess: $$x_{0}=-(\frac{1}{n},\ldots,\frac{1}{n})^{T}$$.

### Function 2

The discretized two-point boundary value problem
$$F(x)=Ax+\Phi(x),$$
where A is the $$n\times n$$ tridiagonal matrix given by
$$A=\left ( \textstyle\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} 8&-1 \\ -1&8&-1 \\ &\ddots&\ddots&\ddots\\ & &-1 &8&-1 \\ & & &-1&8 \end{array}\displaystyle \right ),$$
and $$\Phi=(\Phi_{1}(x),\Phi_{2}(x),\ldots,\Phi_{n}(x))^{T}$$ with $$\Phi _{i}(x)=\sin x_{i}-1$$, $$i=1,2,\ldots,n$$.

Initial guess: $$x_{0}=(50,0,\ldots,50,0)^{T}$$.

### Function 3

The Broyden tridiagonal function
\begin{aligned}& f_{i}(x)=(3-2x_{i})x_{i}-x_{i-1}-2x_{i+1}+1,\quad i=1,2,\ldots,n, \\& x_{0}=x_{n+1}=0. \end{aligned}
Initial guess: $$x_{0}=(-1,\ldots,-1)^{T}$$.
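As an illustration of how such a test function can be evaluated without forming any matrix, here is a minimal Python sketch of the Broyden tridiagonal function. The name `broyden_tridiagonal` is hypothetical, and the code uses 0-based indices, so `x[i]` corresponds to $$x_{i+1}$$ in the formula.

```python
def broyden_tridiagonal(x):
    """Residual vector of the Broyden tridiagonal function (Function 3)."""
    n = len(x)
    F = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0        # boundary value x_0 = 0
        right = x[i + 1] if i < n - 1 else 0.0   # boundary value x_{n+1} = 0
        F.append((3.0 - 2.0 * x[i]) * x[i] - left - 2.0 * right + 1.0)
    return F


# Residual at the stated initial guess for n = 5.
F0 = broyden_tridiagonal([-1.0] * 5)
```

Each component depends on at most three unknowns, so one evaluation costs O(n) operations.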

### Function 4

The Broyden banded function
\begin{aligned}& f_{i}(x)=x_{i}\bigl(2+5x_{i}^{2} \bigr)+1-\sum_{j\in J_{i}}x_{j}(1+x_{j}),\quad i=1,\ldots,n, \\& J_{i}=\bigl\{ j: j\neq i,\max(1,i-m_{l})\leq j \leq \min(n,i+m_{u})\bigr\} , \quad m_{l}=5, m_{u}=1. \end{aligned}
Initial guess: $$x_{0}=(-1,\ldots,-1)^{T}$$.
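The index set $$J_{i}$$ is easiest to read as a loop over a band around the diagonal. The sketch below assumes a lower bandwidth of 5 and an upper bandwidth of 1 (the parameters `ml` and `mu`, hypothetical names) and uses 0-based indices.

```python
def broyden_banded(x, ml=5, mu=1):
    """Residual vector of the Broyden banded function (Function 4)."""
    n = len(x)
    F = []
    for i in range(n):
        lo, hi = max(0, i - ml), min(n - 1, i + mu)
        # Sum over the band J_i, excluding the diagonal index j = i.
        s = sum(x[j] * (1.0 + x[j]) for j in range(lo, hi + 1) if j != i)
        F.append(x[i] * (2.0 + 5.0 * x[i] ** 2) + 1.0 - s)
    return F


# Residual at the stated initial guess for n = 8.
F0 = broyden_banded([-1.0] * 8)
```

At the initial guess every term $$x_{j}(1+x_{j})$$ vanishes, so each component reduces to $$x_{i}(2+5x_{i}^{2})+1$$ evaluated at $$x_{i}=-1$$.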

### Function 5

The variable dimensioned function
\begin{aligned}& f_{i}(x)=x_{i}-1,\quad i=1,2,\ldots,n-2, \\& f_{n-1}(x)=\sum_{j=1}^{n-2}j(x_{j}-1), \\& f_{n}(x)= \Biggl( \sum_{j=1}^{n-2}j(x_{j}-1) \Biggr)^{2}. \end{aligned}
Initial guess: $$x_{0}=(1-\frac{1}{n},1-\frac{2}{n},\ldots,0)^{T}$$.

### Function 6

The discrete boundary value function
\begin{aligned}& f_{1}(x)=2x_{1}+0.5h^{2}(x_{1}+h+1)^{3}-x_{2}, \\& f_{i}(x)=2x_{i}+0.5h^{2}(x_{i}+ih+1)^{3}-x_{i-1}-x_{i+1},\quad i=2,3, \ldots,n-1, \\& f_{n}(x)=2x_{n}+0.5h^{2}(x_{n}+nh+1)^{3}-x_{n-1}, \\& h=\frac{1}{n+1}. \end{aligned}
Initial guess: $$x_{0}=(h(h-1),h(2h-1),\ldots,h(nh-1))^{T}$$.

### Function 7

The logarithmic function
$$f_{i}(x)=\ln(x_{i}+1)-\frac{x_{i}}{n}, \quad i=1,2,3, \ldots,n.$$
Initial guess: $$x_{0}=(1,1,\ldots,1)^{T}$$.

### Function 8

The strictly convex function
$$f_{i}(x)=e^{x_{i}}-1, \quad i=1,2,3,\ldots,n.$$
Initial guess: $$x_{0}=(\frac{1}{n},\frac{2}{n},\ldots,1)^{T}$$.

### Function 9

The exponential function
\begin{aligned}& f_{1}(x)=e^{x_{1}-1}-1, \\& f_{i}(x)=i\bigl(e^{x_{i}-1}-x_{i}\bigr), \quad i=2,3, \ldots,n. \end{aligned}
Initial guess: $$x_{0}=(\frac{n}{n-1},\frac{n}{n-1},\ldots,\frac {n}{n-1})^{T}$$.

### Function 10

The extended Rosenbrock function (n is even). For $$i=1,2,\ldots,n/2$$,
\begin{aligned}& f_{2i-1}(x)=10\bigl(x_{2i}-x_{2i-1}^{2} \bigr), \\& f_{2i}(x)=1-x_{2i-1} . \end{aligned}
Initial guess: $$x_{0}=(-1.2,1,\ldots,-1.2,1)^{T}$$.
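A minimal Python sketch of the extended Rosenbrock residual follows. The function name is hypothetical, and indices are 0-based, so the pair `(x[i], x[i+1])` with even `i` plays the role of $$(x_{2i-1},x_{2i})$$ in the formula.

```python
def extended_rosenbrock(x):
    """Residual vector of the extended Rosenbrock function (n must be even)."""
    if len(x) % 2 != 0:
        raise ValueError("n must be even")
    F = [0.0] * len(x)
    for i in range(0, len(x), 2):
        F[i] = 10.0 * (x[i + 1] - x[i] ** 2)
        F[i + 1] = 1.0 - x[i]
    return F


F_start = extended_rosenbrock([-1.2, 1.0, -1.2, 1.0])  # stated initial guess, n = 4
F_root = extended_rosenbrock([1.0, 1.0, 1.0, 1.0])     # the all-ones vector is a root
```

The system decouples into n/2 independent two-variable blocks, so its Jacobian is block diagonal.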

### Function 11

The singular function
\begin{aligned}& f_{1}(x)=\frac{1}{3}x_{1}^{3}+ \frac{1}{2}x_{2}^{2}, \\& f_{i}(x)=-\frac{1}{2}x_{i}^{2}+ \frac{i}{3}x_{i}^{3}+\frac {1}{2}x_{i+1}^{2},\quad i=2,3, \ldots,n-1, \\& f_{n}(x)=-\frac{1}{2}x_{n}^{2}+ \frac{n}{3}x_{n}^{3}. \end{aligned}
Initial guess: $$x_{0}=(1,1,\ldots,1)^{T}$$.

### Function 12

The trigexp function
\begin{aligned}& f_{1}(x)=3x_{1}^{3}+2x_{2}-5+ \sin(x_{1}-x_{2})\sin(x_{1}+x_{2}), \\& f_{i}(x)=-x_{i-1}e^{x_{i-1}-x_{i}}+x_{i} \bigl(4+3x_{i}^{2}\bigr)+2x_{i+1}+\sin (x_{i}-x_{i+1})\sin(x_{i}+x_{i+1})-8, \\& \quad i=2,3,\ldots,n-1, \\& f_{n}(x)=-x_{n-1}e^{x_{n-1}-x_{n}}+4x_{n}-3. \end{aligned}
Initial guess: $$x_{0}=(0,0,\ldots,0)^{T}$$.

### Function 13

The extended Freudenstein and Roth function (n is even). For $$i=1,2,\ldots,n/2$$,
\begin{aligned}& f_{2i-1}(x)=x_{2i-1}+\bigl((5-x_{2i})x_{2i}-2 \bigr)x_{2i}-13, \\& f_{2i}(x)=x_{2i-1}+\bigl((1+x_{2i})x_{2i}-14 \bigr)x_{2i}-29 . \end{aligned}
Initial guess: $$x_{0}=(6,3,\ldots,6,3)^{T}$$.

### Function 14

The Troesch problem
\begin{aligned}& f_{1}(x)=2x_{1}+\varrho h^{2} \sinh(\varrho x_{1})-x_{2}, \\& f_{i}(x)=2x_{i}+\varrho h^{2} \sinh(\varrho x_{i})-x_{i-1}-x_{i+1},\quad i=2,3,\ldots,n-1, \\& f_{n}(x)=2x_{n}+\varrho h^{2} \sinh(\varrho x_{n})-x_{n-1}, \\& h=\frac{1}{n+1},\quad\quad \varrho=10. \end{aligned}
Initial guess: $$x_{0}=(0,0,\ldots,0)^{T}$$.

In the experiments, the parameters are chosen as $$\Delta_{0}=1$$, $$\bar {\Delta}=10$$, $$\epsilon=10^{-5}$$, $$\eta_{1}=0.001$$, $$\eta_{2}=0.75$$, $$\beta_{1} =0.5$$, $$\beta_{2} =2.0$$, $$M=10$$, $$\eta_{k}=1/(k+1)^{2}$$, $$\alpha=0.5$$, where ϵ is the stopping tolerance. The program is also stopped if the number of iterations exceeds 5,000. We obtain $$d_{k}$$ from (5) by the dogleg method in . The program is coded in MATLAB 2009a.

To compare the performance of the two algorithms, we use the performance profiles proposed by Dolan and Moré . The dimensions of the 14 test functions are 100, 1,000, and 10,000. From the numerical results, we plot performance profiles based on the total number of iterations and the CPU time, respectively.
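For readers unfamiliar with performance profiles, the following sketch shows how one is commonly computed: for each solver, the profile at $$\tau$$ is the fraction of problems on which that solver's cost is within a factor $$\tau$$ of the best cost achieved by any solver. The function name and the cost table are made up for illustration and are not the paper's data. Costs are assumed to be positive, and `math.inf` marks a failed run.

```python
import math


def performance_profile(costs, taus):
    """costs[s][p] is the cost of solver s on problem p (math.inf = failure).

    Returns profiles[s][t], the fraction of problems on which solver s is
    within a factor taus[t] of the best solver.
    """
    n_solvers, n_problems = len(costs), len(costs[0])
    best = [min(costs[s][p] for s in range(n_solvers)) for p in range(n_problems)]
    profiles = []
    for s in range(n_solvers):
        ratios = [costs[s][p] / best[p] for p in range(n_problems)]
        profiles.append([sum(r <= tau for r in ratios) / n_problems
                         for tau in taus])
    return profiles


# Hypothetical iteration counts for two solvers on four problems.
costs = [[10.0, 20.0, 30.0, math.inf],
         [20.0, 10.0, 30.0, 40.0]]
profiles = performance_profile(costs, taus=[1.0, 2.0, 4.0])
```

The value at $$\tau=1$$ is the fraction of problems on which a solver is the best or tied for best, and the limiting value for large $$\tau$$ is the fraction of problems it solves at all.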

Figure 1 shows that our algorithm is slightly better than Algorithm 2 in the total number of iterations for $$n=100$$. Figures 2 and 3 indicate that the two algorithms do not differ greatly in the total number of iterations for $$n=1{,}000$$ and $$n=10{,}000$$. Figures 4-6 show that our algorithm performs better than Algorithm 2 in CPU time on the 14 test problems. These preliminary numerical results indicate that our algorithm is competitive with Algorithm 2.

Figure 1 Performance profiles of the total number of iterations of the two algorithms ($$n=100$$).

Figure 2 Performance profiles of the total number of iterations of the two algorithms ($$n=1{,}000$$).

Figure 3 Performance profiles of the total number of iterations of the two algorithms ($$n=10{,}000$$).

Figure 4 Performance profiles of the CPU time of the two algorithms ($$n=100$$).

Figure 5 Performance profiles of the CPU time of the two algorithms ($$n=1{,}000$$).

Figure 6 Performance profiles of the CPU time of the two algorithms ($$n=10{,}000$$).

## Declarations

### Acknowledgements

We thank the reviewers and the editors for their valuable suggestions and comments which improve this paper greatly. This work is supported by the Science and Technology Foundation of the Department of Education of Hubei Province (D20152701) and the Foundations of Education Department of Anhui Province (KJ2016A651; 2014jyxm161). 