 Research
 Open Access
The global proof of the Polak–Ribière–Polak algorithm under the YWL inexact line search technique
Journal of Inequalities and Applications, volume 2019, Article number: 195 (2019)
Abstract
This paper presents a new proof of the main result of the paper (Yuan et al. in Appl. Math. Model. 47:811–825, 2017). In the proof, the global convergence of the Polak–Ribière–Polak algorithm is established without the two assumptions \(d_{k}^{T}g_{k}<0\) and \(g_{k+1}^{T}d_{k}\leq \sigma _{1}g_{k}^{T}d_{k}\) that are needed in the above paper; that is, the same results hold under weaker conditions. Test problems of larger dimension are solved to compare the performance of the modified algorithm with that of the normal algorithm, and an application to a real-world engineering model demonstrates the effectiveness of the given conjugate gradient algorithm.
Introduction
Consider the unconstrained optimization problem
$$ \min_{x\in \Re ^{n}} f(x), $$(1.1)
where \(f: \Re ^{n}\rightarrow \Re \) and \(f\in C^{2}\). The Polak–Ribière–Polak (PRP) conjugate gradient (CG) method [16, 17] for (1.1) is designed by the following iterative formula:
$$ x_{k+1}=x_{k}+\alpha _{k}d_{k},\quad k=1,2,\ldots , $$(1.2)
where \(x_{k}\) is the kth iterative point, \(\alpha _{k}\) is a stepsize, and \(d_{k}\) is the search direction defined by
$$ d_{k+1}=-g_{k+1}+\beta _{k}^{\mathrm{PRP}} d_{k},\qquad d_{1}=-g_{1}, $$(1.3)
where \(g_{k+1}=\nabla f(x_{k+1})\) is the gradient of \(f(x)\) at the point \(x_{k+1}\), and \(\beta _{k}^{\mathrm{PRP}} \in \Re \) is a scalar defined by
$$ \beta _{k}^{\mathrm{PRP}}=\frac{g_{k+1}^{T}(g_{k+1}-g_{k})}{ \Vert g_{k} \Vert ^{2}}, $$
where \(g_{k}=\nabla f(x_{k})\) and \(\Vert \cdot \Vert \) denotes the Euclidean norm. The theoretical analysis and the numerical performance of the PRP method have been studied by many scholars (see [2, 3, 17,18,19, 22], etc.), and many modified algorithms based on the normal PRP formula have been proposed and have made great progress ([6, 8,9,10,11,12,13, 20, 21, 23,24,25, 27, 29, 30], etc.). The well-known weak Wolfe–Powell (WWP) inexact line search chooses \(\alpha _{k}\) satisfying
$$ f(x_{k}+\alpha _{k}d_{k}) \leq f(x_{k})+\delta \alpha _{k}g_{k}^{T}d_{k} $$(1.4)
and
$$ g(x_{k}+\alpha _{k}d_{k})^{T}d_{k} \geq \sigma g_{k}^{T}d_{k}, $$(1.5)
where \(\delta \in (0,1/2)\) and \(\sigma \in (\delta ,1)\). At present, the global convergence of the PRP CG algorithm for nonconvex functions under the WWP line search is a well-known open problem in optimization, and the counterexamples of [3, 18] explain why. Motivated by the idea of [3], a modified WWP line search technique was given by Yuan et al. [28]; it is designed by
$$ f(x_{k}+\alpha _{k}d_{k}) \leq f(x_{k})+\delta \alpha _{k}g_{k}^{T}d _{k}+\alpha _{k}\min \bigl[-\delta _{1}g_{k}^{T}d_{k},\delta \alpha _{k} \Vert d_{k} \Vert ^{2}\bigr] $$(1.6)
and
$$ g(x_{k}+\alpha _{k}d_{k})^{T}d_{k} \geq \sigma g_{k}^{T}d_{k}+\min \bigl[-\delta _{1}g_{k}^{T}d_{k},\delta \alpha _{k} \Vert d_{k} \Vert ^{2}\bigr], $$(1.7)
where \(\delta \in (0,1/2)\), \(\delta _{1}\in (0,\delta )\), and \(\sigma \in (\delta ,1)\). Here we call it the YWL line search. It is used not only for the PRP method but also for the BFGS quasi-Newton method (see [26, 28] for details). In the case \(\min [-\delta _{1}g(x_{k})^{T}d_{k},\delta \alpha _{k} \Vert d_{k} \Vert ^{2}]=\delta \alpha _{k} \Vert d_{k} \Vert ^{2}\), the global convergence of the PRP algorithm was established in [28] under the additional conditions \(d_{k}^{T}g_{k}<0\) and \(g_{k+1}^{T}d_{k}\leq \sigma _{1}g_{k}^{T}d_{k}\). This paper makes a further study and obtains a global convergence result similar to that of [28], without the conditions \(d_{k}^{T}g_{k}<0\) and \(g_{k+1}^{T}d_{k}\leq \sigma _{1}g_{k}^{T}d_{k}\), by a different proof. This paper has the following features:

The PRP algorithm for nonconvex functions with the YWL line search is globally convergent.

The global convergence is established under weaker conditions than those of the paper [28].

Problems of larger dimension are tested to show the performance of the proposed algorithm.
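As a concrete illustration, the scalar \(\beta _{k}^{\mathrm{PRP}}\) and the search-direction update described above can be sketched in a few lines of Python (a minimal sketch; the function names are ours):

```python
import numpy as np

def beta_prp(g_new, g_old):
    # beta_k^PRP = g_{k+1}^T (g_{k+1} - g_k) / ||g_k||^2
    return float(g_new @ (g_new - g_old)) / float(g_old @ g_old)

def next_direction(g_new, g_old, d_old):
    # d_{k+1} = -g_{k+1} + beta_k^PRP * d_k
    return -g_new + beta_prp(g_new, g_old) * d_old
```

When successive gradients are nearly equal, \(\beta _{k}^{\mathrm{PRP}}\approx 0\) and the direction resets toward the steepest-descent direction \(-g_{k+1}\); this automatic restart property is often cited as a reason for the good practical behavior of the PRP method.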
The next section states the algorithm and its global convergence. Section 3 reports the experiments, including normal unconstrained optimization problems and an engineering problem. A conclusion is given in the last section.
PRP algorithm and global convergence
The PRP algorithm with the modified WWP line search for nonconvex functions is listed as follows.
Algorithm 1
(The PRP CG algorithm under the YWL line search rule)
 Step 1::

Choose an initial point \(x_{1} \in \Re ^{n}\), \(\varepsilon \in (0,1)\), \(\delta \in (0,\frac{1}{2})\), \(\delta _{1}\in (0,\delta )\), \(\sigma \in (\delta ,1)\). Set \(d_{1}=-g_{1}=-\nabla f(x_{1})\), \(k:=1\).
 Step 2::

If \( \Vert g_{k} \Vert \leq \varepsilon \), stop.
 Step 3::

Compute the step size \(\alpha _{k}\) using the YWL line search rule (1.6) and (1.7).
 Step 4::

Let \(x_{k+1}=x_{k}+\alpha _{k}d_{k}\).
 Step 5::

If \( \Vert g_{k+1} \Vert \leq \varepsilon \), stop.
 Step 6::

Calculate the search direction
$$ d_{k+1}=-g_{k+1}+\beta _{k}^{\mathrm{PRP}} d_{k}. $$(2.1)
 Step 7::

Set \(k:=k+1\), and go to Step 3.
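The steps above can be sketched as follows. Since this sketch does not reproduce the YWL conditions (1.6) and (1.7), Step 3 is replaced by a plain backtracking Armijo search as a stand-in (an assumption of this sketch, not the paper's rule); everything else follows Algorithm 1.

```python
import numpy as np

def prp_cg(f, grad, x, eps=1e-6, max_iter=1200):
    """Sketch of Algorithm 1 with a backtracking stand-in for the YWL search."""
    g = grad(x)
    d = -g                                   # Step 1: d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:         # Steps 2/5: stopping test
            break
        # Stand-in for Step 3: backtracking Armijo search
        alpha, delta = 1.0, 0.1
        while f(x + alpha * d) > f(x) + delta * alpha * float(g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * d                    # Step 4
        g_new = grad(x)
        beta = float(g_new @ (g_new - g)) / float(g @ g)  # beta_k^PRP
        d = -g_new + beta * d                # Step 6: direction update (2.1)
        g = g_new                            # Step 7
    return x
```

On a strongly convex quadratic this sketch converges quickly; for the nonconvex case covered by the paper's theory, the YWL rule itself would be required.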
The following standard assumptions on the nonconvex function are needed.
Assumption i

(A)
The defined level set \(L_{0}=\{x\mid f(x) \leq f(x_{0})\}\) is bounded.

(B)
Let \(f(x)\) be twice continuously differentiable and bounded below, and let the gradient function \(g(x)\) be Lipschitz continuous; namely, there exists a constant \(L>0\) satisfying
$$ \bigl\Vert g(x)-g(y) \bigr\Vert \leq L \Vert x-y \Vert ,\quad x, y \in \Re ^{n}. $$(2.2)
Remark

(1)
Define a case by Case i: \(\min [-\delta _{1}g(x_{k})^{T}d_{k},\delta \alpha _{k} \Vert d_{k} \Vert ^{2}]=\delta \alpha _{k} \Vert d_{k} \Vert ^{2}\). This case means that
$$ -\delta _{1}g(x_{k})^{T}d_{k} \geq \delta \alpha _{k} \Vert d_{k} \Vert ^{2}\geq 0, $$which ensures that the modified WWP line search (1.6) and (1.7) is reasonable (see Theorem 2.1 in [28]). Then Algorithm 1 is well defined.

(2)
In [28], the global convergence of Algorithm 1 is established for Case i, and it needs not only the conditions of Assumption i but also
$$ d_{k}^{T}g_{k}< 0 $$and
$$ g_{k+1}^{T}d_{k}\leq \sigma _{1}g_{k}^{T}d_{k}. $$In this paper, we give another proof that needs only Assumption i.

(3)
Assumptions i(A) and i(B) imply that there exists a constant \(G^{*}>0\) such that
$$ \bigl\Vert g(x) \bigr\Vert \leq G^{*},\quad x \in L_{0}. $$(2.3)
Lemma 2.1
Let Assumption i hold. If there exists a positive constant \(\epsilon _{*}\) such that
$$ \Vert g_{k} \Vert \geq \epsilon _{*},\quad \forall k, $$(2.4)
then there exists a constant \(D^{*}>0\) satisfying
$$ \Vert d_{k} \Vert \leq D^{*},\quad \forall k. $$(2.5)
Proof
By (1.6), we get
then the following inequality
holds. Using Assumption i(A) and summing these inequalities from \(k=0\) to ∞, we have
Using Step 6 of Algorithm 1 and setting \(s_{k}=x_{k+1}-x_{k}=\alpha _{k}d_{k}\), we have
where the third inequality follows from (2.2) and (2.3), and the last inequality follows from (2.4). By the definition of Case i, we get
Thus, by (2.6), we get
Then we have
This implies that there exist a constant \(\varepsilon \in (0,1)\) and a positive integer \(k_{0}\geq 0\) satisfying
So, by (2.7), for all \(k>k_{0}\), we obtain
Let \(\omega ^{*}=\max \{ \Vert d_{1} \Vert , \Vert d_{2} \Vert ,\ldots , \Vert d_{k_{0}} \Vert ,\frac{G _{b}}{1-\varepsilon }+ \Vert d_{k_{0}} \Vert \}\). Therefore, we get
The proof is complete. □
Theorem 2.1
Let Assumption i hold. Then the relation
$$ \liminf_{k\rightarrow \infty } \Vert g_{k} \Vert =0 $$(2.9)
holds.
Proof
Suppose that (2.9) does not hold. Then there exists a constant \(\epsilon _{*}>0\) such that
$$ \Vert g_{k} \Vert \geq \epsilon _{*},\quad \forall k. $$(2.10)
Using Lemma 2.1, we get (2.5). In a way similar to (2.6), and using the case \(-\delta _{1}g(x_{k})^{T}d_{k} \geq \delta \alpha _{k} \Vert d_{k} \Vert ^{2}\), we have
which generates
Then we discuss the above relation by the following cases.
Case 1: \( \Vert d_{k} \Vert \rightarrow 0\), \(k\rightarrow \infty \). By (2.1), (2.3), (2.2), and (2.10), we have
Then we get (2.9).
Case 2: \(\alpha _{k} \rightarrow 0\), \(k\rightarrow \infty \). By (1.7), Remark (1), and the Taylor formula, we get
Combining this with the case \(-\delta _{1}g(x_{k})^{T}d_{k}\geq \delta \alpha _{k} \Vert d_{k} \Vert ^{2}\) leads to
So we have
This contradicts the case \(\alpha _{k} \rightarrow 0\) (\(k\rightarrow \infty \)). Then we also obtain (2.9). In all cases, we always have (2.9). The proof is complete. □
Numerical results
In this section, we report numerical experiments with the given algorithm and the normal PRP algorithm on large-scale unconstrained optimization problems. The problems are the same as in the paper [28], taken from [1, 7] with the given initial points, and are listed in Table 1; results identical to those of [28] are not repeated. Furthermore, we also report an experiment on a real-world engineering problem model solved by the given algorithm. We now test the algorithms and give the results as follows.
Normal unconstrained optimization problems
To clearly show the normal PRP algorithm, its detailed steps are presented as follows.
Algorithm 2
(The normal PRP CG algorithm)
 Step 1::

Choose an initial point \(x_{1} \in \Re ^{n}\), \(\varepsilon \in (0,1)\), \(\delta \in (0,\frac{1}{2})\), \(\sigma \in (\delta ,1)\). Set \(d_{1}=-g_{1}=-\nabla f(x_{1})\), \(k:=1\).
 Step 2::

If \( \Vert g_{k} \Vert \leq \varepsilon \), stop.
 Step 3::

Compute the step size \(\alpha _{k}\) using the WWP line search rule (1.4) and (1.5).
 Step 4::

Let \(x_{k+1}=x_{k}+\alpha _{k}d_{k}\).
 Step 5::

If \( \Vert g_{k+1} \Vert \leq \varepsilon \), stop.
 Step 6::

Calculate the search direction
$$ d_{k+1}=-g_{k+1}+\beta _{k}^{\mathrm{PRP}} d_{k}. $$(3.1)
 Step 7::

Set \(k:=k+1\), and go to Step 3.
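Algorithm 2 relies on the WWP conditions (1.4) and (1.5); a minimal acceptance test for a trial step size can be written as follows (a sketch; the function name is ours):

```python
import numpy as np

def wwp_accepts(f, grad, x, d, alpha, delta=0.1, sigma=0.9):
    """Return True iff alpha satisfies both WWP conditions (1.4) and (1.5)."""
    gTd = float(grad(x) @ d)
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * gTd   # (1.4)
    curvature = float(grad(x_new) @ d) >= sigma * gTd              # (1.5)
    return sufficient_decrease and curvature
```

Condition (1.4) rules out steps that are too long, while (1.5) rules out steps that are too short; together they bracket an acceptable interval of step sizes.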
The following Himmelblau stop rule and all parameters are the same as those of the paper [28].
Stop rules: If \( \vert f(x_{k}) \vert > e_{1}\), let \(\mathit{stop1}=\frac{ \vert f(x_{k})-f(x_{k+1}) \vert }{ \vert f(x_{k}) \vert }\); otherwise, let \(\mathit{stop1}= \vert f(x _{k})-f(x_{k+1}) \vert \). If \( \Vert g(x) \Vert < \epsilon \) or \(\mathit{stop1} < e_{2}\) holds, the program stops, where \(e_{1}=e_{2}=10^{-5}\), \(\epsilon =10^{-6}\).
Parameters: \(\delta =0.1\), \(\delta _{1}=0.05\), \(\sigma =0.9\).
Dimension: 30,000, 60,000, and 120,000 variables.
Experiments: All the programs were written in MATLAB 7.10 and run on a PC with a 1.80 GHz CPU and 4.00 GB of memory running the Windows 7 operating system.
Other cases: The program is also stopped if the number of iterations exceeds 1200. The current step size \(\alpha _{k}\) in the line search is accepted if the number of line search trials exceeds 10.
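The stop rule above can be sketched as (a minimal sketch; the function name is ours):

```python
def should_stop(f_k, f_k1, g_norm, e1=1e-5, e2=1e-5, eps=1e-6):
    # Himmelblau-type rule: relative decrease when |f(x_k)| is not tiny,
    # absolute decrease otherwise
    if abs(f_k) > e1:
        stop1 = abs(f_k - f_k1) / abs(f_k)
    else:
        stop1 = abs(f_k - f_k1)
    return g_norm < eps or stop1 < e2
```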
The columns of Table 2 have the following meanings:

No.: the number of the tested problem. Dim: the dimension of the tested problem.

Cputime: the CPU time in seconds. NI: the iteration number.

NFG: the total number of function and gradient evaluations.
The numerical results in Table 2 show that both algorithms are efficient for these practical problems. For most problems, the iteration number, the number of function and gradient evaluations, and the CPU time increase as the dimension grows. However, the CPU time sometimes decreases rather than increases, as for problems 4, 13, and 14 with Algorithm 2 and problems 1 and 16 with Algorithm 1; the reason may lie in the computer system. The numerical results indicate that Algorithm 1 is competitive with Algorithm 2, especially in CPU time, for most of the tested problems. To show the performance of the two algorithms directly, the tool of Dolan and Moré [4] is used, and Figs. 1–3 show their profiles with respect to NI, NFG, and Cputime, respectively. These three figures show a similar trend, so we only analyze Fig. 3, which concerns the CPU time. Figure 3 shows that Algorithm 1 is better than Algorithm 2, outperforming it by about 11%, and that Algorithm 1 is robust compared with Algorithm 2. In a word, Algorithm 1 provides noticeable advantages.
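The performance-profile comparison of Dolan and Moré [4] used above can be reproduced from a table of solver costs; a minimal sketch (assuming each problem is solved by at least one solver, with `np.inf` marking failures):

```python
import numpy as np

def performance_profile(T):
    """T[p, s] = cost (e.g., NI, NFG, or CPU time) of solver s on problem p.
    Returns the ratio matrix R and rho(tau, s), the fraction of problems
    that solver s solves within a factor tau of the best solver."""
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    R = T / best                          # performance ratios r_{p,s}
    def rho(tau, s):
        return float(np.mean(R[:, s] <= tau))
    return R, rho
```

Plotting rho(tau, s) against tau ≥ 1 gives curves like those in Figs. 1–3; a higher curve dominates.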
A real-world engineering problem: the Muskingum model
This subsection studies an application of the presented algorithm to a real-world engineering problem, namely the well-known hydrologic application often called the parameter estimation problem of the nonlinear Muskingum model. The Muskingum model is defined as follows.
Muskingum model [14]: The parameter estimation of the model is designed by
where \(x_{1}\) is the storage time constant, \(x_{2}\) is the weighting factor, and \(x_{3}\) is an additional parameter; at time \(t_{i}\) (\(i=1,2,\ldots ,n\)), n denotes the total number of time steps, Δt is the time step, and \(I_{i}\) and \(Q_{i}\) are the observed inflow discharge and observed outflow discharge, respectively. The Muskingum model, as a hydrologic routing method, is a popular model for flood routing, whose storage depends on the water inflow and outflow. This subsection uses actual observed data of the flood runoff process between Chenggouwan and Linqing of Nanyunhe in the Haihe Basin, Tianjin, China, where \(\Delta t=12\) (h). The detailed \(I_{i}\) and \(Q_{i}\) data of 1960, 1961, and 1964 can be found in [15]. In the numerical experiments, we set the initial point \(x=[0,1,1]^{T}\). The results are listed in Table 3.
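Since the objective function itself is not reproduced above, the following is only a hypothetical sketch of such a least-squares criterion, assuming the standard nonlinear Muskingum storage relation \(S_{i}=x_{1}[x_{2}I_{i}+(1-x_{2})Q_{i}]^{x_{3}}\) and a trapezoidal water balance; the exact objective used in [14] may differ in details:

```python
import numpy as np

def muskingum_objective(x, I, Q, dt=12.0):
    # Hypothetical criterion: squared mismatch between the storage change
    # implied by S_i = x1*[x2*I_i + (1-x2)*Q_i]**x3 and the trapezoidal
    # inflow-outflow balance over each time step (positive flows assumed).
    x1, x2, x3 = x
    S = x1 * (x2 * I + (1.0 - x2) * Q) ** x3
    balance = dt * ((I[:-1] + I[1:]) / 2.0 - (Q[:-1] + Q[1:]) / 2.0)
    residual = (S[1:] - S[:-1]) - balance
    return float(residual @ residual)
```

Minimizing such a criterion over \((x_{1},x_{2},x_{3})\) is the kind of computation reported in Table 3.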
Figures 4–6 show the data curves of 1960, 1961, and 1964 for the observed flows and the flows computed by Algorithm 1 when estimating the parameters of the nonlinear Muskingum model; they show that the given algorithm approximates these data well and that Algorithm 1 is effective for the nonlinear Muskingum model. The results of Table 3 and Figs. 4–6 support at least two conclusions: (1) Algorithm 1 can successfully solve the nonlinear Muskingum model because of its good approximation; (2) the points \(x_{1}\), \(x_{2}\), and \(x_{3}\) obtained by Algorithm 1 differ from those of the BFGS method and the HIWO method, which shows that the Muskingum model may have several optimal approximate points.
Conclusion
This paper studies the proof method and proposes a simple proof technique to obtain the global convergence of the known algorithm in the paper [28]. The following conclusions are obtained:

(1)
This paper gives a new proof of the result of [28] and obtains the same result under weaker conditions. The new proof technique is simpler than that of [28].

(2)
Problems of larger dimension than those in [28] are tested to show that the given algorithm is competitive with the normal algorithm. The nonlinear Muskingum model, which comes from a real-world engineering problem, is solved by the given algorithm to estimate its parameters, demonstrating that Algorithm 1 is very successful.

(3)
One interesting question for future work is whether there exist other proof methods that yield the global convergence of Algorithm 1.
References
 1.
Bongartz, I., Conn, A.R., Gould, N.I., Toint, P.L.: CUTE: constrained and unconstrained testing environment. ACM Trans. Math. Softw. 21, 123–160 (1995)
 2.
Dai, Y.: Analysis of conjugate gradient methods. Ph.D. thesis, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences (1997)
 3.
Dai, Y.: Convergence properties of the BFGS algorithm. SIAM J. Optim. 13, 693–701 (2003)
 4.
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
 5.
Geem, Z.W.: Parameter estimation for the nonlinear Muskingum model using the BFGS technique. J. Hydrol. Eng. 132, 474–478 (2006)
 6.
Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21–42 (1992)
 7.
Gould, N.I., Orban, D., Toint, P.L.: CUTEr and SifDec: a constrained and unconstrained testing environment, revised. ACM Trans. Math. Softw. 29, 373–394 (2003)
 8.
Hager, W., Zhang, H.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16, 170–192 (2005)
 9.
Hager, W., Zhang, H.: Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. 32, 113–137 (2006)
 10.
Mingqi, X., Rădulescu, V.D., Zhang, B.: Combined effects for fractional Schrödinger–Kirchhoff systems with critical nonlinearities. ESAIM Control Optim. Calc. Var. 24, 1249–1273 (2018)
 11.
Mingqi, X., Rădulescu, V.D., Zhang, B.: Nonlocal Kirchhoff diffusion problems: local existence and blowup of solutions. Nonlinearity 31, 3228–3250 (2018)
 12.
Mingqi, X., Rădulescu, V.D., Zhang, B.: Fractional Kirchhoff problems with critical Trudinger–Moser nonlinearity. Calc. Var. Partial Differ. Equ. 58, 57 (2019). https://doi.org/10.1007/s00526-019-1499-y
 13.
Mingqi, X., Rădulescu, V.D., Zhang, B.: A critical fractional Choquard–Kirchhoff problem with magnetic field. Commun. Contemp. Math. 21(4), 1850004 (2019)
 14.
Ouyang, A., Liu, L., Sheng, Z., Wu, F.: A class of parameter estimation methods for nonlinear Muskingum model using hybrid invasive weed optimization algorithm. Math. Probl. Eng. 2015, Article ID 573894 (2015)
 15.
Ouyang, A., Tang, Z., Li, K., Sallam, A., Sha, E.: Estimating parameters of Muskingum model using an adaptive hybrid PSO algorithm. Int. J. Pattern Recognit. Artif. Intell. 28, Article ID 1459003 (2014)
 16.
Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 9, 94–112 (1969)
 17.
Polak, E., Ribière, G.: Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Oper. 3, 35–43 (1969)
 18.
Powell, M.J.D.: Nonconvex minimization calculations and the conjugate gradient method. In: Numerical Analysis. Lecture Notes in Mathematics, vol. 1066, pp. 122–141. Springer, Berlin (1984)
 19.
Powell, M.J.D.: Convergence properties of algorithms for nonlinear optimization. SIAM Rev. 28, 487–500 (1986)
 20.
Sun, Y.: Indirect boundary integral equation method for the Cauchy problem of the Laplace equation. J. Sci. Comput. 71, 469–498 (2017)
 21.
Wei, Z., Yao, S., Liu, L.: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 183, 1341–1350 (2006)
 22.
Yuan, G.: Modified nonlinear conjugate gradient methods with sufficient descent property for large-scale optimization problems. Optim. Lett. 3, 11–21 (2009)
 23.
Yuan, G., Lu, X.: A modified PRP conjugate gradient method. Ann. Oper. Res. 166, 73–90 (2009)
 24.
Yuan, G., Lu, X., Wei, Z.: A conjugate gradient method with descent direction for unconstrained optimization. J. Comput. Appl. Math. 233, 519–530 (2009)
 25.
Yuan, G., Meng, Z., Li, Y.: A modified Hestenes and Stiefel conjugate gradient algorithm for largescale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 168, 129–152 (2016)
 26.
Yuan, G., Sheng, Z., Wang, B., Hu, W., Li, C.: The global convergence of a modified BFGS method for nonconvex functions. J. Comput. Appl. Math. 327, 274–294 (2018)
 27.
Yuan, G., Wei, Z., Li, G.: A modified Polak–Ribière–Polyak conjugate gradient algorithm for nonsmooth convex programs. J. Comput. Appl. Math. 255, 86–96 (2014)
 28.
Yuan, G., Wei, Z., Lu, X.: Global convergence of the BFGS method and the PRP method for general functions under a modified weak Wolfe–Powell line search. Appl. Math. Model. 47, 811–825 (2017)
 29.
Yuan, G., Wei, Z., Yang, Y.: The global convergence of the Polak–Ribière–Polyak conjugate gradient algorithm under inexact line search for nonconvex functions. J. Comput. Appl. Math. 362, 262–275 (2019). https://doi.org/10.1016/j.cam.2018.10.057
 30.
Yuan, G., Zhang, M.: A three-terms Polak–Ribière–Polyak conjugate gradient algorithm for large-scale nonlinear equations. J. Comput. Appl. Math. 286, 186–195 (2015)
Acknowledgements
The authors would like to express the sincerest appreciation to editors and the anonymous referees for their valuable comments that helped improve the manuscript.
Funding
This work is supported by the National Natural Science Foundation of China (Grant No. 11661009), the Guangxi Science Fund for Distinguished Young Scholars (No. 2015GXNSFGA139001), and the Guangxi Natural Science Key Fund (No. 2017GXNSFDA198046).
Author information
Contributions
XL organized this whole paper, TY wrote some parts of the proof, and XW did the experiments and wrote some parts of the proof. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Xiaoliang Wang.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
MSC
 90C26
Keywords
 PRP method
 Global convergence
 WWP line search
 Nonconvex functions
 Muskingum model