A conjugate gradient algorithm for large-scale unconstrained optimization problems and nonlinear equations

For large-scale unconstrained optimization problems and nonlinear equations, we propose a new three-term conjugate gradient algorithm under the Yuan–Wei–Lu line search technique. It combines the steepest descent method with the well-known conjugate gradient algorithm and uses both function information and information about the current point. It possesses the following properties: (i) the search direction has a sufficient descent feature and a trust region trait, and (ii) the proposed algorithm globally converges. Numerical results indicate that the proposed algorithm performs better than other similar optimization algorithms on the tested problems.


Introduction
It is well known that small- and medium-scale smooth problems are comparatively easy to handle, since many optimization algorithms are available for them, such as Newton, quasi-Newton, and bundle algorithms. However, these three algorithms fail to address large-scale optimization problems effectively because they need to store and compute the relevant matrices, whereas the conjugate gradient algorithm succeeds because of its simplicity and efficiency.
The optimization model is an important mathematical problem since it has been applied to various fields such as economics, engineering, and physics (see [1][2][3][4][5][6][7][8][9][10][11][12]). Fletcher and Reeves [13] successfully addressed large-scale unconstrained optimization problems on the basis of the conjugate gradient algorithm. The conjugate gradient algorithm has become increasingly popular because of its simplicity and low computational and memory requirements. In general, a good conjugate gradient algorithm combines a good conjugate gradient direction with an inexact line search technique (see [14][15][16][17][18]). At present, the conjugate gradient algorithm is mostly applied to smooth optimization problems, and thus, in this paper, we propose a modified LS conjugate gradient algorithm to solve large-scale nonlinear equations and smooth problems. The common algorithms for nonlinear equations include Newton and quasi-Newton methods (see [19][20][21]), gradient-based and CG methods (see [22][23][24]), trust region methods (see [25][26][27]), and derivative-free methods (see [28]), and all of them fail to address large-scale problems. The spectral gradient approach, the limited-memory quasi-Newton method, and the conjugate gradient algorithm are well-known methods suitable for large-scale problems. Li and Li [29] proposed various algorithms on the basis of a modified PRP conjugate gradient method, which successfully solve large-scale nonlinear equations.
A well-known mathematical model is given by
$$\min \{ f(x) \mid x \in \mathbb{R}^n \}, \tag{1.1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ and $f \in C^2$. This model is widely used in practice. However, it is a complex mathematical model since it needs to meet various conditions in the field [30][31][32][33]. Numerous in-depth studies have been conducted, and significant results have been obtained (see [14,34,35]). It is well known that the steepest descent algorithm is attractive since it is simple and its computational and memory requirements are low. Unfortunately, the steepest descent method sometimes fails to solve problems due to the "sawtooth phenomenon". To overcome this flaw, researchers presented the conjugate gradient method, which provides high performance with a simple form. In general, the iteration formula for (1.1) is
$$x_{k+1} = x_k + \alpha_k d_k, \tag{1.2}$$
where $x_{k+1}$ is the next iteration point, $\alpha_k$ is the step length, and $d_k$ is the search direction. The well-known weak Wolfe-Powell (WWP) line search technique is determined by
$$f(x_k + \alpha_k d_k) \le f_k + \varphi \alpha_k g_k^T d_k \tag{1.3}$$
and
$$g(x_k + \alpha_k d_k)^T d_k \ge \rho\, g_k^T d_k, \tag{1.4}$$
where $\varphi \in (0, 1/2)$, $\alpha_k > 0$, and $\rho \in (\varphi, 1)$. The direction $d_{k+1}$ is often defined by the formula
$$d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad d_0 = -g_0, \tag{1.5}$$
where $\beta_k \in \mathbb{R}$. An increasing number of efficient conjugate gradient algorithms have been proposed through different expressions of $\beta_k$ and $d_k$ (see [13,[36][37][38][39][40][41][42] etc.). The well-known PRP algorithm is given by
$$\beta_k^{PRP} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{\|g_k\|^2}, \tag{1.6}$$
where $g_k$, $g_{k+1}$, and $f_k$ denote $g(x_k)$, $g(x_{k+1})$, and $f(x_k)$, respectively; $g_{k+1} = g(x_{k+1}) = \nabla f(x_{k+1})$ is the gradient at the point $x_{k+1}$. It is well known that the PRP algorithm is efficient but has a shortcoming: it does not possess global convergence under the WWP line search technique. To address this problem, Yuan, Wei, and Lu [43] developed the following modification (YWL) of the normal WWP line search technique and obtained many fruitful theoretical results:
$$f(x_k + \alpha_k d_k) \le f_k + \iota \alpha_k g_k^T d_k + \alpha_k \min\left[-\iota_1 g_k^T d_k,\ \frac{\iota \alpha_k}{2} \|d_k\|^2\right] \tag{1.7}$$
and
$$g(x_k + \alpha_k d_k)^T d_k \ge \tau g_k^T d_k + \min\left[-\iota_1 g_k^T d_k,\ \iota \alpha_k \|d_k\|^2\right], \tag{1.8}$$
where $\iota \in (0, \frac{1}{2})$, $\alpha_k > 0$, $\iota_1 \in (0, \iota)$, and $\tau \in (\iota, 1)$. Further work can be found in [24].
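The line search conditions discussed above can be checked numerically for a trial step. The following is a minimal sketch, assuming the WWP conditions in their usual form and the YWL conditions in the form $f(x_k + \alpha_k d_k) \le f_k + \iota \alpha_k g_k^T d_k + \alpha_k \min[-\iota_1 g_k^T d_k, \iota \alpha_k \|d_k\|^2 / 2]$ and $g(x_k + \alpha_k d_k)^T d_k \ge \tau g_k^T d_k + \min[-\iota_1 g_k^T d_k, \iota \alpha_k \|d_k\|^2]$. The function names, the default parameter values, and the quadratic test function are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def wwp_holds(f, grad, x, d, alpha, phi=0.1, rho=0.9):
    """Check the weak Wolfe-Powell conditions for a trial step alpha."""
    g_d = grad(x) @ d                      # g_k^T d_k, negative for a descent direction
    x_new = x + alpha * d
    decrease = f(x_new) <= f(x) + phi * alpha * g_d
    curvature = grad(x_new) @ d >= rho * g_d
    return bool(decrease and curvature)

def ywl_holds(f, grad, x, d, alpha, iota=0.1, iota1=0.05, tau=0.9):
    """Check the YWL conditions in the form assumed in the lead-in."""
    g_d = grad(x) @ d
    dd = d @ d                             # ||d_k||^2
    x_new = x + alpha * d
    decrease = f(x_new) <= (f(x) + iota * alpha * g_d
                            + alpha * min(-iota1 * g_d, 0.5 * iota * alpha * dd))
    curvature = grad(x_new) @ d >= tau * g_d + min(-iota1 * g_d, iota * alpha * dd)
    return bool(decrease and curvature)

def beta_prp(g_new, g_old):
    """PRP parameter: g_{k+1}^T (g_{k+1} - g_k) / ||g_k||^2."""
    return g_new @ (g_new - g_old) / (g_old @ g_old)

# Illustrative test function (an assumption): f(x) = 0.5 * ||x||^2.
f = lambda x: 0.5 * (x @ x)
grad = lambda x: x
x0 = np.array([1.0, 1.0])
d0 = -grad(x0)                             # steepest descent direction

print(wwp_holds(f, grad, x0, d0, alpha=1.0))   # exact minimizer along d0
print(wwp_holds(f, grad, x0, d0, alpha=2.0))   # overshoots, no decrease
print(ywl_holds(f, grad, x0, d0, alpha=1.0))
```

For this quadratic, the step $\alpha = 1$ lands on the minimizer and passes both tests, while $\alpha = 2$ returns to the same function value and is rejected by the decrease condition.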
Building on the YWL line search technique, Yuan et al. studied the normal Armijo line search technique further and proposed an efficient modified Armijo line search technique:
$$f(x_k + \alpha_k d_k) \le f(x_k) + \lambda \alpha_k g_k^T d_k + \alpha_k \min\left[-\lambda_1 g_k^T d_k,\ \frac{\lambda \alpha_k}{2} \|d_k\|^2\right], \tag{1.9}$$
where $\lambda, \gamma \in (0, 1)$, $\lambda_1 \in (0, \lambda)$, and $\alpha_k$ is the largest number in $\{\gamma^k \mid k = 0, 1, 2, \ldots\}$ satisfying (1.9). In addition, much attention has been paid to three-term conjugate gradient formulas. Zhang et al. [44] proposed the well-known formula
$$d_{k+1} = -g_{k+1} + \frac{g_{k+1}^T y_k\, d_k - d_k^T g_{k+1}\, y_k}{\|g_k\|^2}, \qquad d_0 = -g_0. \tag{1.10}$$
Nazareth [45] proposed the formula
$$d_{k+1} = -y_k + \frac{y_k^T y_k}{y_k^T d_k} d_k + \frac{y_{k-1}^T y_k}{y_{k-1}^T d_{k-1}} d_{k-1}, \tag{1.11}$$
where $y_k = g_{k+1} - g_k$ and $s_k = x_{k+1} - x_k$. These two conjugate gradient methods have a sufficient descent property but do not have the trust region feature. To improve these methods, Yuan et al. [46,47] carried out further studies and obtained some good results. This inspires us to continue the study and extend the conjugate gradient methods to obtain better results. In this paper, we present a modified conjugate gradient algorithm, which has the following properties:
• The search direction has a sufficient descent feature and a trust region trait.
• Under mild assumptions, the proposed algorithm possesses global convergence.
• The new algorithm combines the steepest descent method with the conjugate gradient algorithm.
• Numerical results indicate that it performs better than other similar algorithms on the tested problems.
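The sufficient descent property of a three-term direction can be verified numerically. The sketch below assumes the three-term PRP direction of Zhang et al. [44] in its commonly stated form, $d_{k+1} = -g_{k+1} + (g_{k+1}^T y_k\, d_k - d_k^T g_{k+1}\, y_k)/\|g_k\|^2$, for which $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2$ holds identically. It does not implement the new direction proposed in this paper, and the function name and random test vectors are illustrative assumptions.

```python
import numpy as np

def three_term_prp_direction(g_new, g_old, d_old):
    """Three-term PRP direction (assumed form of the Zhang et al. formula)."""
    y = g_new - g_old                      # y_k = g_{k+1} - g_k
    denom = g_old @ g_old                  # ||g_k||^2
    beta = (g_new @ y) / denom             # PRP parameter
    theta = (g_new @ d_old) / denom        # coefficient of the third term
    return -g_new + beta * d_old - theta * y

# Arbitrary vectors: the identity below does not depend on any line search.
rng = np.random.default_rng(0)
g_old, g_new, d_old = rng.standard_normal((3, 5))
d_new = three_term_prp_direction(g_new, g_old, d_old)

# The two printed numbers agree up to rounding: g^T d = -||g||^2.
print(g_new @ d_new, -(g_new @ g_new))
```

The two extra terms cancel when multiplied by $g_{k+1}^T$, which is why the descent identity holds for any previous direction and any step length.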
The rest of the paper is organized as follows. The next section presents the necessary properties of the proposed algorithm. The global convergence is stated in Sect. 3. In Sect. 4, we report the corresponding numerical results. In Sect. 5, we introduce the large-scale nonlinear equations and express the new algorithm. Some necessary properties are listed in Sect. 6. The numerical results are reported in Sect. 7. Without loss of generality, $f(x_k)$ and $f(x_{k+1})$ are replaced by $f_k$ and $f_{k+1}$, and $\|\cdot\|$ is the Euclidean norm.
Step 4: Set the new iteration point $x_{k+1} = x_k + \alpha_k d_k$.
Step 6: If $\|g_{k+1}\| \le \varepsilon$, then stop. Otherwise, go to the next step.

New modified conjugate gradient algorithm
The conjugate gradient algorithm has been studied thoroughly, and rich theoretical results have been obtained. In light of this previous work, a sufficient descent feature is necessary for global convergence. Thus, under the YWL line search technique, we define the search direction of the new conjugate gradient algorithm by formula (2.1), whose parameters satisfy $\eta_i > 0$ ($i = 1, 2, 3, 4, 5$). The search direction is well defined, and its properties are stated in the next section. Now, we introduce a new conjugate gradient algorithm called Algorithm 2.1.
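To illustrate how a direction update, a line search, and a gradient-based stopping test fit together in an algorithm of this kind, here is a generic sketch of a conjugate gradient loop. It is not Algorithm 2.1: as stand-ins (assumptions), it uses the three-term PRP direction of Zhang et al. [44] instead of the new direction (2.1), and plain Armijo backtracking instead of the YWL line search. The function name `cg_minimize`, its parameters, and the quadratic test problem are hypothetical.

```python
import numpy as np

def cg_minimize(f, grad, x0, eps=1e-6, max_iter=2000, delta=1e-4, gamma=0.5):
    """Generic three-term CG loop with Armijo backtracking (illustrative only)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                 # first direction: steepest descent
    history = [f(x)]
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:       # stopping test on the gradient norm
            break
        # Armijo backtracking, a stand-in for the YWL line search.
        alpha, fx, g_d = 1.0, f(x), g @ d
        for _ in range(60):
            if f(x + alpha * d) <= fx + delta * alpha * g_d:
                break
            alpha *= gamma
        x_new = x + alpha * d              # new iteration point
        g_new = grad(x_new)
        y = g_new - g
        # Three-term PRP direction, a stand-in for the direction (2.1).
        d = -g_new + ((g_new @ y) * d - (g_new @ d) * y) / (g @ g)
        x, g = x_new, g_new
        history.append(f(x))
    return x, history

# Hypothetical test problem: a strictly convex quadratic.
A = np.diag(np.arange(1.0, 6.0))
b = np.ones(5)
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_end, history = cg_minimize(f, grad, np.zeros(5))
print(len(history) - 1, np.linalg.norm(grad(x_end)))
```

Because the stand-in direction always satisfies $g_k^T d_k = -\|g_k\|^2 < 0$, the backtracking loop accepts a step that decreases the objective, so the recorded function values do not increase.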

Important characteristics
This section lists some important properties of Algorithm 2.1, namely sufficient descent, the trust region feature, and global convergence, and gives the necessary proofs. Proof. It is obvious that formulas (3.1) and (3.2) hold for $k = 0$. Now consider the case $k \ge 1$. Similarly to (2.1), we have Thus, the statement is proved.
By (3.1) and (3.2), the algorithm has a sufficient descent feature and a trust region trait. To obtain the global convergence, we propose the following necessary assumptions.
The objective function $f \in C^2$ is bounded from below, and its gradient function $g$ is Lipschitz continuous, that is, there exists a constant $\zeta > 0$ such that
$$\|g(x) - g(y)\| \le \zeta \|x - y\|, \quad \forall x, y \in \mathbb{R}^n.$$
The existence and necessity of the step length $\alpha_k$ are established in [43]. In view of the discussion and established technique, the global convergence of the proposed algorithm is expressed as follows.
Summing these inequalities from $k = 0$ to $\infty$, under Assumption (ii), we obtain This means that Similarly to (1.8) and (3.1), we obtain Thus, we obtain the following inequality: where the last inequality is obtained since the gradient function is Lipschitz continuous. Then, we have By (3.6) we arrive at the conclusion $\lim_{k \to \infty} \|g_k\|^2 = 0$, as claimed.

Numerical results
In this section, we list the numerical results in terms of the algorithm characteristics NI, NFG, and CPU, where NI is the total number of iterations, NFG is the total number of objective function and gradient evaluations, and CPU is the calculation time in seconds.

Problems and test experiments
The tested problems listed in Table 1 stem from [48]. We also introduce two other algorithms in this section to measure the efficiency of the proposed algorithm on the tested problems. We denote the two algorithms as Algorithm 2 and Algorithm 3. They differ from Algorithm 2.1 only at Step 5: one is determined by (1.10), and the other is computed by (1.11).
The algorithm stops when one of the following conditions is satisfied: $\|g(x)\| < \epsilon$, the iteration number is greater than 2000, or stop1 $< e_2$, where $e_1 = e_2 = 10^{-5}$ and $\epsilon = 10^{-6}$. In Table 1, "No" and "problem" represent the index of the tested problem and the name of the problem, respectively.
To save space, we list only the data for dimension 9000; the remaining data are given in the attachment.

Results and discussion
The proposed algorithm (Algorithm 2.1) is more effective than the other algorithms since the values on its curve are the largest among the three curves. In Fig. 1, the curve of the proposed algorithm lies above the other curves. This means that the proposed algorithm solves the tested problems with fewer iterations; in addition, Algorithm 3 is better than Algorithm 2. In Fig. 2, the curve of the proposed algorithm starts from a large initial value, which indicates high efficiency, and it appears smoother than the others. The calculation time (CPU time) is one of the most important metrics of the efficiency of an algorithm. Fig. 3 shows that the proposed algorithm saves time compared to the other algorithms on the tested problems.
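Curves of the kind discussed above can be computed from a table of per-problem costs. The following is a minimal sketch, assuming that the figures are performance profiles in the usual sense: for each solver, the fraction of problems whose cost is within a factor $\tau$ of the best cost on that problem. The cost table below is hypothetical example data, not the paper's results, and `performance_profile` is an illustrative name.

```python
import numpy as np

def performance_profile(costs, taus):
    """Fraction of problems each solver handles within a factor tau of the best.

    costs: array of shape (n_problems, n_solvers); np.inf marks a failure.
    Returns an array of shape (len(taus), n_solvers).
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)    # best cost on each problem
    ratios = costs / best                      # performance ratios, >= 1
    return np.array([(ratios <= t).mean(axis=0) for t in taus])

# Hypothetical iteration counts for two solvers on four problems.
costs = np.array([[10, 20],
                  [30, 30],
                  [50, 25],
                  [np.inf, 40]])               # solver 0 fails on problem 4
profile = performance_profile(costs, taus=[1.0, 2.0])
print(profile)
# rows are tau = 1, 2; columns are solvers: [[0.5, 0.75], [0.75, 1.0]]
```

A curve that starts higher at $\tau = 1$ wins on more problems, and a curve that reaches 1.0 solves every problem in the test set, which is how statements about "larger" curves and curves "arriving at 1.0" are read.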

Nonlinear equations
The model of nonlinear equations is given by
$$h(x) = 0, \tag{5.1}$$
where $h$ is continuously differentiable and monotone and $x \in \mathbb{R}^n$, that is,
$$(h(x) - h(y))^T (x - y) \ge 0, \quad \forall x, y \in \mathbb{R}^n.$$
This model has received much attention since it significantly influences various fields such as physics and computer technology (see [1][2][3][8][9][10][11]), and it has resulted in many fruitful theories and good techniques (see [47,[50][51][52][53][54]). By mathematical calculations we obtain that (5.1) is equivalent to the model
$$\min F(x), \quad x \in \mathbb{R}^n, \tag{5.2}$$
where $F(x) = \frac{1}{2}\|h(x)\|^2$ and $\|\cdot\|$ is the Euclidean norm. We therefore concentrate on the mathematical model (5.2), since (5.1) and (5.2) have the same solution. In general, the iteration formula for (5.2) is $x_{k+1} = x_k + \alpha_k d_k$. We use the well-known line search technique (5.3) of [47,55], in which $\alpha_k = \max\{s, s\rho, s\rho^2, \ldots\}$, $s > 0$, $\rho \in (0, 1)$, and $\sigma > 0$. Solodov [56] proposed a projection proximal point algorithm in a Hilbert space that finds the zeros of set-valued maximal monotone operators. Ceng and Yao [57][58][59][60] studied related problems in Hilbert spaces and obtained successful results. Solodov and Svaiter [61] applied the projection technique to large-scale nonlinear equations. For the projection-based technique, the condition $h(w_k)^T (x_k - w_k) > 0$ with $w_k = x_k + \alpha_k d_k$ is required. The search direction is extremely important for the proposed algorithm since it largely determines the efficiency; likewise, the algorithm relies on a suitable line search technique. By the monotonicity of $h(x)$ we obtain
$$h(w_k)^T (x^* - w_k) \le 0,$$
where $x^*$ is a solution of $h(x^*) = 0$. We consider the hyperplane
$$\{x \in \mathbb{R}^n \mid h(w_k)^T (x - w_k) = 0\}.$$
This hyperplane separates the current iteration point $x_k$ from the zeros of the mathematical model (5.1). Then, we calculate the next iteration point $x_{k+1}$ by projecting the current point $x_k$ onto it. Therefore, we give the following formula for the next point:
$$x_{k+1} = x_k - \frac{h(w_k)^T (x_k - w_k)}{\|h(w_k)\|^2}\, h(w_k). \tag{5.5}$$
In [55], it is shown that formula (5.5) is effective since it yields good numerical results and has good theoretical properties. Thus, we introduce it here. The formula of the search direction $d_{k+1}$ involves parameters indexed by $i = 1, 2, 3$. Now, we express the specific content of the proposed algorithm.
Step 3: Find the step length $\alpha_k$ satisfying (5.3).
Step 4: Set the trial point $w_k = x_k + \alpha_k d_k$.
Step 6: Let $k := k + 1$ and return to Step 2.
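The hyperplane projection step can be sketched in a few lines. The code below is a simplified illustration, not Algorithm 5.1: as assumptions, it uses the direction $d_k = -h(x_k)$ instead of the three-term direction of this section, and a Solodov-Svaiter-type backtracking condition $-h(w_k)^T d_k \ge \sigma \alpha_k \|d_k\|^2$, which may differ from (5.3). The monotone test map $h(x) = 2x + \sin x$ (componentwise, with solution $x^* = 0$) and all names are hypothetical.

```python
import numpy as np

def project_onto_hyperplane(x, w, hw):
    """Project x onto the hyperplane {z : hw^T (z - w) = 0}."""
    return x - (hw @ (x - w)) / (hw @ hw) * hw

def projection_method(h, x0, eps=1e-8, max_iter=2000, s=1.0, rho=0.5, sigma=1e-4):
    """Projection iteration for monotone equations h(x) = 0 (illustrative only)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        hx = h(x)
        if np.linalg.norm(hx) <= eps:
            break
        d = -hx                            # simplified direction (assumption)
        alpha = s
        for _ in range(60):                # backtracking over s, s*rho, s*rho^2, ...
            w = x + alpha * d              # trial point w_k
            if -(h(w) @ d) >= sigma * alpha * (d @ d):
                break
            alpha *= rho
        hw = h(w)
        if np.linalg.norm(hw) <= eps:      # the trial point already solves h = 0
            x = w
            break
        x = project_onto_hyperplane(x, w, hw)
    return x

h = lambda x: 2.0 * x + np.sin(x)          # monotone, unique zero at the origin
x0 = np.array([1.0, -2.0])
x_end = projection_method(h, x0)
print(np.linalg.norm(x0), np.linalg.norm(x_end))
```

Since the hyperplane separates $x_k$ from the solution set, each projection brings the iterate no farther from $x^*$, which is the property the convergence analysis builds on.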

The global convergence of Algorithm 5.1
First, we make the following necessary assumptions.

Assumption 2
(i) The objective model of (5.1) has a nonempty solution set.
(ii) The function $h$ is Lipschitz continuous on $\mathbb{R}^n$, which means that there is a positive constant $L$ such that $\|h(x) - h(y)\| \le L \|x - y\|$, $\forall x, y \in \mathbb{R}^n$. (6.1)

By Assumption 2(ii) it is obvious that
where $\theta$ is a positive constant. Then, the necessary properties of the search direction are the following (we omit the proof): and Now, we give some lemmas, which we utilize to obtain the global convergence of the proposed algorithm. We obtain that $\sum_{k=0}^{\infty} \|x_{k+1} - x_k\|^2 < \infty$.
We state this lemma but omit its proof since it is similar to the proof in [61]. Proof. Suppose that Algorithm 5.1 does not terminate and that $\|h_k\| \to 0$ fails. This means that there exists a constant $\varepsilon_* > 0$ such that $\|h_k\| \ge \varepsilon_*$, $k \in \mathbb{N} \cup \{0\}$. (6.5) We prove the conclusion by contradiction. Suppose that a certain iteration index $k_*$ fails to meet the condition (5.3) of the line search technique. Without loss of generality, we denote the corresponding step length as $\alpha_{k_*}^{(l)}$, where $\alpha_{k_*}^{(l)} = \rho^l s$. This means that By (6.3) and Assumption 2(ii) we obtain

By (6.3) and (6.4) we have
By (6.6) we obtain It is obvious that this formula contradicts the definition of the step length $\alpha_{k_*}^{(l)}$. Thus, we conclude that the proposed line search technique is well defined. In other words, the line search technique generates a positive step length $\alpha_k$ in a finite number of backtracking steps. Based on this conclusion, we propose the following theorem on the global convergence of the proposed algorithm. Proof. We prove this by contradiction. Suppose that there exist a constant $\varepsilon_0 > 0$ and an index $k_0$ such that On the one hand, by (6.2) and (6.4) we obtain On the other hand, from (6.3) we have These inequalities indicate that the sequence $\{d_k\}$ is bounded. This means that there exist an accumulation point $d^*$ and a corresponding infinite index set $N_1$ such that By Lemma 6.1 we obtain that the sequence $\{x_k\}$ is bounded. Thus, there exist an infinite index set $N_2 \subset N_1$ and an accumulation point $x^*$ that satisfy By Lemmas 6.1 and 6.2 we obtain Since $\{d_k\}$ is bounded, we obtain By the definition of $\alpha_k$ we obtain the following inequality: where $\alpha_k^* = \alpha_k / \rho$. Now, we take the limit on both sides of (6.10) and (6.3) and obtain $h(x^*)^T d^* > 0$ and $h(x^*)^T d^* \le 0$.
The obtained contradiction completes the proof.

The results of nonlinear equations
In this section, we list the relevant numerical results for nonlinear equations with the objective function $h(x) = (f_1(x), f_2(x), \ldots, f_n(x))$, where the information on the relevant functions is listed in Table 1.

Problems and test experiments
To measure the efficiency of the proposed algorithm, in this section, we compare this method with (1.10) (as Algorithm 6) using the three characteristics "NI", "NG", and "CPU"; the remainder of Algorithm 6 is identical to Algorithm 5.1. "NI" is the number of iterations, "NG" is the number of function evaluations, and "CPU" is the time in seconds spent on the tested problems. In Table 1, "No" and "problem" denote the indices and the names of the test problems.
Stopping rule: If $\|g_k\| \le \varepsilon$ or the total number of iterations is greater than 2000, the algorithm stops.

No  Problem
11  Linear function-full rank
12  Penalty function
13  Variable dimensioned function
14  Extended Powell singular function
15  Tridiagonal system
16  Five-diagonal system
17  Extended Freudenstein and Roth function
18  Extended Wood problem
19  Discrete boundary value problem

The numerical results with the corresponding problem index are listed in Table 4. Then, by the technique in [49], the plots of the corresponding figures are presented for the two discussed algorithms.

Results and discussion
From the above figures, we conclude that the proposed algorithm compares favorably with similar optimization methods, given that algorithm (1.10) is itself a strong baseline. In Fig. 4 we see that the curve of the proposed algorithm quickly reaches the value 1.0, whereas the other curve approaches 1.0 slowly. This means that the proposed method is successful and efficient on the tested problems. The calculation time is one of the most essential characteristics in evaluating the efficiency of an algorithm. From Figs. 5 and 6, both algorithms perform well since their curves reach 1.0. This shows that the two algorithms solve all of the tested problems and that the proposed algorithm is efficient.

Conclusion
This paper focuses on three-term conjugate gradient algorithms and uses them to solve optimization problems and nonlinear equations. The given method has some good properties.
(i) The proposed three-term conjugate gradient formula possesses the sufficient descent property and the trust region feature without any conditions. The sufficient descent property makes the objective function value decrease, and then the iteration sequence $\{x_k\}$ converges to the global limit point. Moreover, the trust region feature simplifies the convergence proof of the presented algorithm.
(ii) The given algorithm can be used not only for normal unconstrained optimization problems but also for nonlinear equations. Both algorithms for these two problems have global convergence under general conditions.
(iii) Large-scale problems are solved by the given algorithms, which shows that the new algorithms are very effective.