A three-term conjugate gradient descent method with some applications

The stationary point of an optimization problem can be obtained via conjugate gradient (CG) methods without the second derivative. Many researchers have used these methods to solve applications in various fields, such as neural networks and image restoration. In this study, we construct a three-term CG method that fulfills the convergence analysis and a descent property. In the second term, we employ a Hestenes-Stiefel CG formula restricted to remain positive. The third term comprises the negative gradient, used as a search direction, multiplied by an accelerating expression. We also provide numerical results collected using a strong Wolfe line search with different sigma values over 166 optimization functions from the CUTEr library. The results show that the proposed approach is far more efficient than alternative prevalent CG methods regarding central processing unit (CPU) time, number of iterations, number of function evaluations, and number of gradient evaluations. Moreover, we present applications of the proposed three-term search direction to image restoration and compare the results with well-known CG methods with respect to the number of iterations, CPU time, and root-mean-square error (RMSE). Finally, we present three applications: regression analysis, image restoration, and solving systems of linear equations arising in electrical engineering.


Introduction
In order to determine the stationary point of an optimization problem, the nonlinear conjugate gradient (CG) method does not necessitate the second derivative or its approximation. The form we consider in the present investigation is
$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ and the gradient $g(x) = \nabla f(x)$ is available. Iterative approaches to solving (1) usually take the form
$$x_{k+1} = x_k + \alpha_k d_k, \qquad (2)$$
where $\alpha_k$ is obtained by an exact or inexact line search. An inexact line search, for instance the strong Wolfe-Powell (SWP) line search [1,2], is commonly used and may be expressed as
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k \qquad (3)$$
and
$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|. \qquad (4)$$
The weak Wolfe-Powell (WWP) line search is given by Equation (3) together with
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \qquad (5)$$
with $0 < \delta < \sigma < 1$.
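To make the SWP conditions concrete, the following is a minimal sketch (the helper name and the illustrative values of δ and σ are our own) that checks whether a trial step length satisfies both inequalities on a simple quadratic:

```python
import numpy as np

def satisfies_swp(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """Check the strong Wolfe-Powell (SWP) conditions for a trial step alpha."""
    g0 = grad(x)
    x_new = x + alpha * d
    # Sufficient decrease (Armijo) condition.
    armijo = f(x_new) <= f(x) + delta * alpha * (g0 @ d)
    # Strong curvature condition.
    curvature = abs(grad(x_new) @ d) <= sigma * abs(g0 @ d)
    return armijo and curvature

# Simple convex quadratic f(x) = 0.5 ||x||^2, for which the exact step is alpha = 1.
f = lambda x: 0.5 * (x @ x)
grad = lambda x: x
x = np.array([2.0, -1.0])
d = -grad(x)                                     # steepest-descent direction
print(satisfies_swp(f, grad, x, d, alpha=1.0))   # → True (exact minimizing step)
```

An overly long step (e.g. alpha = 5 here) violates the sufficient decrease condition and is rejected.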
The search direction $d_k$ with two terms is expressed as
$$d_k = -g_k + \beta_k d_{k-1},$$
where $g_k = g(x_k)$ and $\beta_k$ denotes the CG parameter. The most well-known CG parameters are divided into two groups. The first is an efficient group, which includes the Hestenes-Stiefel (HS) [3], Polak-Ribière-Polyak (PRP) [4], and Liu-Storey (LS) [5] methods:
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \qquad \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad \beta_k^{LS} = -\frac{g_k^T y_{k-1}}{d_{k-1}^T g_{k-1}},$$
where $y_{k-1} = g_k - g_{k-1}$. However, this group encounters a convergence problem if these values become negative [6]. In contrast, the second group is less efficient numerically but exhibits strong global convergence. This category includes the Fletcher-Reeves (FR) [7], Conjugate Descent (CD) [8], and Dai-Yuan (DY) [9] methods:
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad \beta_k^{CD} = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}}, \qquad \beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}.$$
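These parameters are simple ratios of inner products. The following sketch (hypothetical gradient vectors, our own function names) evaluates one representative from each group:

```python
import numpy as np

def beta_hs(g_new, g_old, d_old):
    """Hestenes-Stiefel: g_k^T y_{k-1} / (d_{k-1}^T y_{k-1})."""
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

def beta_prp(g_new, g_old):
    """Polak-Ribiere-Polyak: g_k^T y_{k-1} / ||g_{k-1}||^2."""
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

def beta_fr(g_new, g_old):
    """Fletcher-Reeves: ||g_k||^2 / ||g_{k-1}||^2."""
    return (g_new @ g_new) / (g_old @ g_old)

g_old = np.array([3.0, 4.0])
d_old = -g_old                    # the first direction is steepest descent
g_new = np.array([1.0, -2.0])
print(beta_hs(g_new, g_old, d_old), beta_prp(g_new, g_old), beta_fr(g_new, g_old))
```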
Dai and Liao [10] put forth the following conjugacy condition:
$$d_k^T y_{k-1} = -t\, g_k^T s_{k-1}, \qquad (8)$$
where $s_{k-1} = x_k - x_{k-1}$ and $t \ge 0$. For $t = 0$, Equation (8) becomes the classical conjugacy condition $d_k^T y_{k-1} = 0$. Utilizing the search direction and this conjugacy condition, they also presented the CG formula [10]
$$\beta_k^{DL} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}} - t\,\frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}.$$
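A sketch of the Dai-Liao parameter (our own helper; the vectors and the value of t are illustrative), which reduces to $\beta_k^{HS}$ when $t = 0$:

```python
import numpy as np

def beta_dl(g_new, g_old, d_old, s_old, t=0.1):
    """Dai-Liao: (g_k^T y_{k-1} - t g_k^T s_{k-1}) / (d_{k-1}^T y_{k-1})."""
    y = g_new - g_old
    return (g_new @ y - t * (g_new @ s_old)) / (d_old @ y)

g_old = np.array([3.0, 4.0])
d_old = -g_old
s_old = 0.5 * d_old               # s_{k-1} = x_k - x_{k-1} = alpha_{k-1} d_{k-1}
g_new = np.array([1.0, -2.0])
print(beta_dl(g_new, g_old, d_old, s_old))         # with t = 0.1
print(beta_dl(g_new, g_old, d_old, s_old, t=0.0))  # recovers beta_HS
```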
Based on Equation (8), many researchers have suggested the three-term CG methods given below. Let Eq. (10) represent the general form of the three-term CG search direction. A wide variety of choices is then obtained by replacing t in Eq. (10) with an appropriate term, as shown in Table 1.
Liu et al. [17] demonstrated how nonlinear monotone equations with nonconvex objectives may be addressed when the sufficient descent condition is met. Meanwhile, Liu et al. [18] created the three-term CG method given below and used it to solve Equation (1) while avoiding a condition of the form ( ) > ν ∈ (0, 1).
Yao et al. [19] suggested a three-term CG method with a new choice of $t_k$, selected to meet the descent condition under the SWP line search. Another theorem put forth by Yao et al. [19] states that if $t_k$ is close to $\frac{\|y_k\|^2}{y_k^T s_k}$, the search direction produces a zigzag search path. They therefore decided on the choice of $t_k$ given below.
Alhawarat et al. [20] presented a nonnegative CG formula with a new restart property applied at the beginning of the CG iteration.
where $\|\cdot\|$ denotes the Euclidean norm and $\mu_k$ is defined below. Similarly, Jiang et al. [21] suggested the CG method given below. To improve the efficiency of prior methods, they also constructed a restart criterion of the following form, where $0 < \xi < 1$.
Recently, Alhawarat et al. [22] presented a convex combination of two distinct search directions, with the parameters $\beta_k^{(1)}$ and $\beta_k^{(2)}$ selected as given below. The descent condition, also known as the downhill condition,
$$g_k^T d_k < 0, \qquad (13)$$
aids research on CG methods and is crucial to validating the global convergence analysis. Al-Baali [23] utilized the following version of (13) to demonstrate the convergence of the FR method:
$$g_k^T d_k \le -c\,\|g_k\|^2, \qquad (14)$$
where $c \in (0, 1)$. Equation (14) is known as the sufficient descent condition. It performs better than (13) because the magnitude of $g_k^T d_k$ can be controlled using $\|g_k\|^2$.

Proposed modified search direction (3TCGHS) and motivation
The main motivation for researchers in CG methods is to propose a nonnegative CG method with efficiency similar to that of PRP or HS, together with global convergence. In the following modification, we utilize the new search-direction term involving $g_{k-1}$ proposed by [17], with $\beta_k^{HS}$ restricted to be nonnegative, as given in Eq. (12). The procedure used to determine the stationary point of the optimization function is outlined in Algorithm 1.
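Since Eq. (12) is not reproduced legibly here, the following is only an illustrative sketch of a three-term direction in the same spirit: the HS parameter truncated at zero, plus a negative-gradient third term whose coefficient is one common choice that enforces $g_k^T d_k = -\|g_k\|^2$. The Armijo backtracking stands in for the paper's SWP line search for brevity; all names are ours, not the paper's 3TCGHS formulas.

```python
import numpy as np

def backtracking(f, g, x, d, delta=1e-4):
    """Simple Armijo backtracking (the paper uses a strong Wolfe search)."""
    alpha, fx, slope = 1.0, f(x), g @ d
    while f(x + alpha * d) > fx + delta * alpha * slope and alpha > 1e-12:
        alpha *= 0.5
    return alpha

def three_term_cg(f, grad, x0, tol=1e-8, max_iter=1000):
    """Three-term CG sketch: truncated HS second term plus a -g_k third term
    chosen so that g_k^T d_k = -||g_k||^2 (sufficient descent)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = backtracking(f, g, x, d)
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g
        denom = d @ y
        beta = max((g_new @ y) / denom, 0.0) if denom != 0.0 else 0.0
        gg = g_new @ g_new
        theta = beta * (g_new @ d) / gg if gg > 0.0 else 0.0
        d = -g_new + beta * d - theta * g_new   # three-term direction
        x, g = x_new, g_new
    return x

# Convex quadratic f(x) = 0.5 x^T A x - b^T x with minimizer A^{-1} b = (1, 0.2).
A = np.diag([1.0, 10.0])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(three_term_cg(f, grad, [0.0, 0.0]))
```

Multiplying the update for $d$ by $g_{new}$ confirms the sufficient descent identity: the $\beta$ and $\theta$ contributions cancel, leaving $-\|g_{new}\|^2$.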

Global convergence properties
The following assumption is imposed on the objective function.

Assumption 1. $f$ is a continuous and differentiable function in some neighborhood $W$ of the level set, and its gradient is Lipschitz continuous; that is, for every $x, y \in W$, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|.$$
Under this assumption, there must exist a positive constant $\eta$ such that $\|g(x)\| \le \eta$ for all $x \in W$.

The convergence properties of CG methods are typically established using the following lemma, proposed by Zoutendijk [24]; it covers multiple line searches, including the SWP and WWP line searches.

Lemma 3.1 Let Assumption 1 hold, and let $\alpha_k$ satisfy the WWP line search together with the descent condition (9) for any method of the form (2). Then
$$\sum_{k \ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty.$$
As the following theorem shows, the new formula fulfills the descent condition (9).

Theorem 3.1 Let the sequences $\{x_k\}$ and $\{d_k\}$ be generated by Equations (2) and (12), with the step length obtained using the line search given by Equations (3) and (4). Then the sufficient descent condition (11) is satisfied.
Proof Multiplying (12) by $g_k^T$ and applying the SWP line search yields the result. The proof is now complete.
Theorem 3.2 Let the sequence $\{x_k\}$ be generated by Equation (2), where $\alpha_k$ is the step length obtained by the SWP line search. Then the sufficient descent condition (11) holds.
Proof After multiplying (2) by $g_k^T$ and substituting the corresponding quantities, the claim follows. This completes the proof.
Gilbert and Nocedal [25] outlined a property, known as Property(*), that performs a specialized role in studies of CG formulas related to the PRP method. The property is described below.

Property(*) Consider a method of the form (2) and (6), and suppose that $0 < \gamma \le \|g_k\|$ for all $k \ge 1$. The method has Property(*) provided that constants $b > 1$ and $\lambda > 0$ exist such that, for every $k \ge 1$, $|\beta_k| \le b$, and $\|s_{k-1}\| \le \lambda$ implies $|\beta_k| \le \frac{1}{2b}$. The lemma below illustrates that $\beta_k^{HS}$ inherits Property(*); the proof is similar to that given by Gilbert and Nocedal [25].
Proof First, if $d_k = 0$, then $g_k = 0$ follows from the sufficient descent condition; hence we may assume $d_k \ne 0$. Define $u_k = d_k / \|d_k\|$. Since $u_k$ is a unit vector, the triangle inequality and $\delta_k \ge 0$ yield the bound in Eq. (18). Using the SWP conditions, we obtain two further inequalities, so that the inequality in Eq. (18) can be rewritten accordingly and $\nu \le T$. From Eq. (17), we have $\|u_k - u_{k-1}\| \le 2w$, and by Eqs. (16) and (15) we acquire the required bound. This completes the proof.
By Lemmas 4.1 and 4.2 in [10], we obtain the following result:

Numerical results and discussions
In this section, we provide numerical findings to validate the efficiency of the proposed search direction; details are provided in the Appendix. We used 166 test functions from the CUTEr library [26]. The functions can be downloaded in .SIF file format from https://www.cuter.rl.ac.uk/Problems/mastsif.shtml. We modified the CG-Descent 6.8 code to implement the proposed search direction and the DL+ method; the code is available for download at https://people.clas.ufl.edu/hager/software/. The computations were carried out on a host computer with an AMD A4-7210 CPU and 4 GB of RAM running Ubuntu 20.04. We compared the modified search direction with the DL+ and CG-Descent methods.

Application to image restoration
We applied Gaussian noise with a standard deviation of 25% to the original images. Next, we used 3TCGHS and the $\beta_k^{DL+}$ (Dai-Liao) CG algorithm to restore these images.
If the descent condition was met, the iteration proceeded; if not, we restarted the algorithm with the steepest-descent direction. We utilized the root-mean-square error (RMSE) between the restored image and the original true image to assess the quality of the restored image.
The restored image is denoted by $\varsigma_k$ and the true image by $\varsigma$. The RMSE measures the quality of the restored image, with lower values corresponding to higher quality. The stopping criterion uses $\omega = 10^{-3}$. Note that for $\omega = 10^{-4}$ or $\omega = 10^{-6}$, the RMSE remains constant, meaning that the same RMSE can be reached with a different number of iterations.
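The RMSE quality measure can be sketched as follows (the array shapes and pixel values are illustrative):

```python
import numpy as np

def rmse(restored, true_img):
    """Root-mean-square error between a restored image and the true image."""
    restored = np.asarray(restored, dtype=float)
    true_img = np.asarray(true_img, dtype=float)
    return float(np.sqrt(np.mean((restored - true_img) ** 2)))

true_img = np.array([[10.0, 20.0], [30.0, 40.0]])
noisy = true_img + 2.0                  # every pixel off by 2
print(rmse(noisy, true_img))            # → 2.0
```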
Table 2 compares 3TCGHS with the Dai-Liao CG algorithm through a series of numerical experiments.The RMSE, CPU time, and the number of iterations are all compared.It may be observed that the 3TCGHS method performed better than Dai-Liao with respect to CPU time, RMSE, and the number of iterations for most experimental tests.
Table 3 shows the outcomes of restoring destroyed images using Algorithm 1, indicating that it can be considered an efficient approach.

Application to a regression problem
Table 4 shows data on the prices and demand for some commodities over several years.
The data is similar to that used by [28].
The relation between $x$ and $y$ is parabolic; thus, the regression function can be defined as
$$y = w_0 + w_1 x + w_2 x^2,$$
where $w_0$, $w_1$, and $w_2$ are the regression parameters. We aim to determine these parameters using the least squares method.
This problem can be recast as the following unconstrained optimization problem:
$$\min_{w_0, w_1, w_2} \sum_{i} \left(y_i - w_0 - w_1 x_i - w_2 x_i^2\right)^2.$$
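The least-squares fit of the parabola can be sketched as below; the data here is a hypothetical exact parabola, not the Table 4 values:

```python
import numpy as np

# Hypothetical price/demand data lying on y = (x - 3)^2 + 6 = 15 - 6x + x^2
# (the actual Table 4 values are not reproduced here).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 7.0, 6.0, 7.0, 10.0])

# Design matrix for y ≈ w0 + w1*x + w2*x^2, solved by linear least squares;
# this minimizes the same sum of squared residuals as the unconstrained problem.
X = np.column_stack([np.ones_like(x), x, x * x])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)    # recovers w0 = 15, w1 = -6, w2 = 1
```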

Solving system of linear equations in electrical engineering
The main challenge is solving large systems of linear equations generated from linear circuits with many components. The first CG formula was suggested by Hestenes and Stiefel [3] in 1952 to solve linear equation systems of the form
$$Qx = b.$$
When the matrix $Q$ is symmetric and positive definite, solving this system may be regarded as minimizing the corresponding quadratic function
$$f(x) = \tfrac{1}{2} x^T Q x - b^T x.$$
To see the equivalence between the two formulations, differentiate $f(x)$ with respect to $x$ and set the gradient to zero: $\nabla f(x) = Qx - b = 0$. The following example illustrates using the CG method to solve a linear equation system generated from a circuit.
Example 1 [29,30]. Consider the circuit shown in Fig. 6. To create the loop equations, use loop analysis. Then, Algorithm 1 is applied to find the solution for the unknown currents.
Kirchhoff's Current Law (often abbreviated as KCL) asserts that all currents entering and leaving a node must sum to zero algebraically; it describes the flow of charge into and out of a wire junction point, or node. The circuit in Fig. 6 has four loops; thus, Kirchhoff's loop equations can be written as a system of equations of the form $Qx = b$, where $Q$ is a symmetric positive definite matrix. After simple calculations, we compute the corresponding quadratic function. Using Algorithm 1, we find the solution of Eq. (20) to be $x_1 = 0$, $x_2 = -0.5$, $x_3 = 0.5$, $x_4 = 0$, with function value $f = -2.5$.
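The classical Hestenes-Stiefel CG iteration for an SPD system can be sketched as follows; the 2×2 matrix below is hypothetical, not the circuit matrix of Fig. 6 (whose entries are not reproduced here):

```python
import numpy as np

def linear_cg(Q, b, tol=1e-10):
    """Classical CG for Qx = b with Q symmetric positive definite.
    Equivalent to minimizing f(x) = 0.5 x^T Q x - b^T x, since grad f = Qx - b."""
    x = np.zeros_like(b, dtype=float)
    r = b - Q @ x                 # residual = negative gradient
    d = r.copy()
    for _ in range(len(b)):       # at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        Qd = Q @ d
        alpha = (r @ r) / (d @ Qd)
        x = x + alpha * d
        r_new = r - alpha * Qd
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d
        r = r_new
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # hypothetical SPD matrix
b = np.array([1.0, 2.0])
x = linear_cg(Q, b)
print(x, Q @ x - b)                      # residual ≈ 0
```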

Conclusion
We have outlined a three-term CG method that satisfies both the convergence analysis and the descent condition under an SWP line search. Moreover, we have presented numerical results with different values of sigma, showing that the new search direction strongly outperformed alternative approaches with regard to the number of iterations and was very competitive in the number of function evaluations, gradient evaluations, and CPU time. Additionally, we have applied the new search direction to image restoration, regression analysis, and solving linear systems in electrical engineering. Algorithm 1 demonstrated its efficiency in restoring destroyed images from degraded pixel data. In addition, using Algorithm 1 to solve a system of linear equations is easier than other traditional methods, and in regression analysis we found Algorithm 1 useful for obtaining the values of the regression parameters. In future research, we intend to utilize CG methods in machine learning, mathematical problems in engineering, and neural networks.

Figure 1 Comparison of the proposed search direction with the DL+ method. We used an SWP line search to acquire the step length, with σ = 0.1 and δ = 0.01 for 3TCGHS and DL+, and the previously mentioned approximate Wolfe-Powell line search for CG-Descent. Figures 1-4 present all outcomes via the performance measure first used by Dolan and Moré [27]. From Figs. 1-4, it may be observed that the new search direction strongly outperformed DL+ in terms of the number of iterations, function evaluations, CPU time, and gradient evaluations. The following notations are used in the Appendix: No. iter: number of iterations; No. function: number of function evaluations; No. gradient: number of gradient evaluations.

Figure 3
Figure 2 Graph of the number of iterations

Figure 5
Figure 4 Graph of CPU time

Figure 6
Figure 6 The circuit of Example 1

Table 2
Numerical outcomes from images with Gaussian noise with a 25% standard deviation added to the original images using the Dai-Liao CG method as well as 3TCGHS

Table 3
Restoration of destroyed images of Coins, Cameraman, Moon, and Baboon by reducing z via Algorithm 1

Table 4
Data on demand and price; x: Price ($), y: Demand