# New cautious BFGS algorithm based on modified Armijo-type line search

Zhong Wan, Shuai Huang and Xiao Dong Zheng

*Journal of Inequalities and Applications* **2012**, 2012:241

https://doi.org/10.1186/1029-242X-2012-241

© Wan et al.; licensee Springer 2012

**Received: **8 October 2011

**Accepted: **20 September 2012

**Published: **17 October 2012

## Abstract

In this paper, a new inexact line search rule is presented, which is a modified version of the classical Armijo line search rule. With a lower computational cost, a larger descent magnitude of the objective function is obtained at every iteration. In addition, the initial step size in the modified line search is adjusted automatically for each iteration. On the basis of this line search, a new cautious Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is developed. Under some mild assumptions, the global convergence of the algorithm is established for nonconvex optimization problems. Numerical results demonstrate that the proposed method is promising, especially in comparison with existing methods.

## 1 Introduction

In this paper, we consider the unconstrained optimization problem

$\underset{x\in {R}^{n}}{min}f(x),$ (1)

where $f:{R}^{n}\to R$ is a twice continuously differentiable function.

Amongst the variant methods to solve problem (1), it is well known that the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method has obtained great success both in theoretical research and in engineering applications. In this connection, we refer, for example, to the literature [1–11] and the references therein.

The BFGS method generates iterates by ${x}_{k+1}={x}_{k}+{\alpha}_{k}{d}_{k}$, where the search direction ${d}_{k}$ solves ${B}_{k}d=-{g}_{k}$, ${B}_{k}$ is an approximation of the Hessian of *f*, and ${g}_{k}$ is the value of *g* at ${x}_{k}$. At the new iterate point ${x}_{k+1}$, ${B}_{k}$ is updated by

${B}_{k+1}={B}_{k}-\frac{{B}_{k}{s}_{k}{s}_{k}^{T}{B}_{k}}{{s}_{k}^{T}{B}_{k}{s}_{k}}+\frac{{y}_{k}{y}_{k}^{T}}{{y}_{k}^{T}{s}_{k}},$ (3)

where ${s}_{k}={x}_{k+1}-{x}_{k}$, ${y}_{k}={g}_{k+1}-{g}_{k}$.

In the cautious BFGS (CBFGS) method [5], ${B}_{k}$ is updated by the formula (3) only when

$\frac{{y}_{k}^{T}{s}_{k}}{{\parallel {s}_{k}\parallel}^{2}}\ge \epsilon {\parallel {g}_{k}\parallel}^{\gamma},$ (5)

and is kept unchanged otherwise, where *ϵ* and *γ* are positive constants.
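The cautious update can be sketched as follows. This is a minimal illustration that applies the standard BFGS formula only when the Li-Fukushima curvature threshold ${y}_{k}^{T}{s}_{k}\ge \epsilon {\parallel {g}_{k}\parallel}^{\gamma}{\parallel {s}_{k}\parallel}^{2}$ holds; the function name, the defaults for `eps` and `gamma`, and passing $\parallel {g}_{k}\parallel$ explicitly are this example's choices:

```python
import numpy as np

def cautious_bfgs_update(B, s, y, g_norm, eps=1e-6, gamma=0.01):
    """Apply the BFGS formula to B only when the cautious condition
    y^T s >= eps * ||g||^gamma * ||s||^2 holds; otherwise keep B
    unchanged (a sketch of the cautious update rule)."""
    ys = y @ s
    if ys < eps * (g_norm ** gamma) * (s @ s):
        return B  # curvature information too weak: skip the update
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / ys
```

When the update is applied, the new matrix satisfies the secant condition ${B}_{k+1}{s}_{k}={y}_{k}$, which can be used as a quick sanity check.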

In this paper, we shall first present a modified Armijo-type line search rule. Then, on the basis of this line search, a new cautious BFGS algorithm is developed. It will be shown that in our line search, a larger descent magnitude of an objective function is obtained with lower cost of computation at every iteration. In addition, the initial step size is adjusted automatically at each iteration.

The rest of this paper is organized as follows. In Section 2, a modified Armijo-type inexact line search rule is presented and a new cautious BFGS algorithm is developed. Section 3 is devoted to establishing the global convergence of the proposed algorithm under some suitable assumptions. In Section 4, numerical results are reported to demonstrate the efficiency of the algorithm. Some conclusions are given in the last section.

## 2 Modified Armijo-type line search rule and new cautious BFGS algorithm

The classical Armijo line search determines a step size ${\alpha}_{k}$ satisfying

$f({x}_{k}+{\alpha}_{k}{d}_{k})\le f({x}_{k})+{\sigma}_{1}{\alpha}_{k}{g}_{k}^{T}{d}_{k},$ (6)

where ${\sigma}_{1}\in (0,1)$ is a given constant scalar. In a computer procedure, ${\alpha}_{k}$ in (6) is obtained by searching the set $\{\beta ,\beta \rho ,\beta {\rho}^{2},\dots \}$ for the largest component satisfying (6), where $\rho \in (0,1)$ and $\beta >0$ are given constant scalars.

Compared with other line search methods, the computer procedure of the Armijo line search is the simplest, and the computational cost of finding a feasible step size is very low, especially when $\rho \in (0,1)$ is close to 0. Its drawback is that each iteration may achieve only a small reduction of the objective function.
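The backtracking procedure for rule (6) can be sketched as follows; the function name and the parameter defaults are this example's choices:

```python
import numpy as np

def armijo(f, g, x, d, sigma1=0.1, beta=1.0, rho=0.5, max_backtracks=50):
    """Classical Armijo rule (6): return the largest alpha in
    {beta, beta*rho, beta*rho^2, ...} such that
    f(x + alpha*d) <= f(x) + sigma1 * alpha * g(x)^T d."""
    fx, gd = f(x), g(x) @ d
    alpha = beta
    for _ in range(max_backtracks):
        if f(x + alpha * d) <= fx + sigma1 * alpha * gd:
            return alpha
        alpha *= rho  # shrink the trial step and test again
    return alpha
```

For example, with $f(x)={x}^{2}$, $x=1$ and $d=-2$, the full step $\alpha =1$ is rejected and the first backtrack $\alpha =0.5$ is accepted.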

We now describe the modified Armijo-type line search (MALS). Suppose that *g* is a Lipschitz continuous function. Let *L* be the Lipschitz constant, and let ${L}_{k}$ be an approximation of *L*. Set

${\beta}_{k}=\frac{-{g}_{k}^{T}{d}_{k}}{{L}_{k}{\parallel {d}_{k}\parallel}^{2}}.$

The step size ${\alpha}_{k}$ is taken as the largest component of the set $\{{\beta}_{k},{\beta}_{k}\rho ,{\beta}_{k}{\rho}^{2},\dots \}$ such that

$f({x}_{k}+{\alpha}_{k}{d}_{k})\le f({x}_{k})+\sigma {\alpha}_{k}\left({g}_{k}^{T}{d}_{k}-\frac{1}{2}\mu {\alpha}_{k}{L}_{k}{\parallel {d}_{k}\parallel}^{2}\right)$ (7)

holds, where $\sigma \in (0,1)$, $\mu \in [0,+\mathrm{\infty})$, $\rho \in (0,1)$ are given constant scalars.
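The MALS can be sketched in code as follows. This minimal illustration assumes that rule (7) takes the form $f({x}_{k}+\alpha {d}_{k})\le f({x}_{k})+\sigma \alpha ({g}_{k}^{T}{d}_{k}-\frac{1}{2}\mu \alpha {L}_{k}{\parallel {d}_{k}\parallel}^{2})$ with an automatic initial trial step ${\beta}_{k}=-{g}_{k}^{T}{d}_{k}/({L}_{k}{\parallel {d}_{k}\parallel}^{2})$; the function name and parameter defaults are this example's choices:

```python
import numpy as np

def modified_armijo(f, x, d, gTd, Lk, sigma=0.5, mu=1.0, rho=0.5,
                    max_backtracks=50):
    """Modified Armijo-type line search (sketch of rule (7)):
    search {beta_k, beta_k*rho, beta_k*rho^2, ...} for the largest
    alpha with
      f(x + alpha*d) <= f(x) + sigma*alpha*(g^T d - 0.5*mu*alpha*Lk*||d||^2),
    starting from the self-scaled step beta_k = -g^T d / (Lk*||d||^2)."""
    fx = f(x)
    dd = d @ d
    alpha = -gTd / (Lk * dd)  # automatic initial step size
    for _ in range(max_backtracks):
        rhs = fx + sigma * alpha * (gTd - 0.5 * mu * alpha * Lk * dd)
        if f(x + alpha * d) <= rhs:
            return alpha
        alpha *= rho
    return alpha
```

Note that the extra negative term on the right-hand side makes the acceptance test stricter than (6), so the accepted step guarantees at least as much descent.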

In the following proposition, we show that the new line search (7) is well defined.

**Proposition 1** *Let* $f:{R}^{n}\to R$ *be a continuously differentiable function. Suppose that the gradient function g of f is Lipschitz continuous. Let* ${L}_{k}>0$ *be an approximation value of the Lipschitz constant. If* ${d}_{k}$ *is a descent direction of f at* ${x}_{k}$, *then there is an* $\alpha >0$ *in the set* $\{{\beta}_{k},{\beta}_{k}\rho ,{\beta}_{k}{\rho}^{2},\dots \}$ *such that the following inequality holds*:

$f({x}_{k}+\alpha {d}_{k})\le f({x}_{k})+\sigma \alpha \left({g}_{k}^{T}{d}_{k}-\frac{1}{2}\mu \alpha {L}_{k}{\parallel {d}_{k}\parallel}^{2}\right),$

*where* $\sigma \in (0,1)$, $\mu \in [0,+\mathrm{\infty})$, $\rho \in (0,1)$ *are given constant scalars*.

*Proof* In fact, we only need to prove that a step length *α* is obtained in finitely many steps. If it is not true, then for every sufficiently large positive integer *m*, we have

$f({x}_{k}+{\beta}_{k}{\rho}^{m}{d}_{k})>f({x}_{k})+\sigma {\beta}_{k}{\rho}^{m}\left({g}_{k}^{T}{d}_{k}-\frac{1}{2}\mu {\beta}_{k}{\rho}^{m}{L}_{k}{\parallel {d}_{k}\parallel}^{2}\right).$

Dividing both sides by ${\beta}_{k}{\rho}^{m}$ and letting $m\to \mathrm{\infty}$ yields ${g}_{k}^{T}{d}_{k}\ge \sigma {g}_{k}^{T}{d}_{k}$. From $\sigma \in (0,1)$, it follows that ${g}_{k}^{T}{d}_{k}\ge 0$. This contradicts the fact that ${d}_{k}$ is a descent direction. □

**Remark 1** Since the third term on the right-hand side of (7) is negative, it is easy to see that the obtained step size *α* ensures a larger descent magnitude of the objective function than that in (6).

It is noted that (7) reduces to (6) when $\mu =0$.

**Remark 2** In the MALS, the parameter ${L}_{k}$ should be estimated at each iteration. In this paper, for $k>1$, we choose

${L}_{k}=\frac{\parallel {y}_{k-1}\parallel}{\parallel {s}_{k-1}\parallel}.$

By the Lipschitz continuity of *g*, $\parallel {y}_{k-1}\parallel \le L\parallel {s}_{k-1}\parallel$, so that ${L}_{k}\le L$. Therefore, it is acceptable that ${L}_{k}$ is an approximation of *L*.

Based on Proposition 1, Remarks 1 and 2, a new cautious BFGS algorithm is developed for solving problem (1).

**Algorithm 1** (New cautious BFGS algorithm)

Step 1. Choose an initial point ${x}_{0}\in {R}^{n}$ and a positive definite matrix ${B}_{0}$. Choose $\sigma \in (0,1)$, $\mu \ge 0$, $\epsilon >0$ and ${L}_{0}>0$. Set $k:=0$.

Step 2. If $\parallel {g}_{k}\parallel \le \epsilon$, the algorithm stops. Otherwise, go to Step 3.

Step 3. Solve the linear system ${B}_{k}d=-{g}_{k}$ to obtain the search direction ${d}_{k}$.

Step 4. Determine a step size ${\alpha}_{k}$ satisfying (7).

Step 5. Set ${x}_{k+1}:={x}_{k}+{\alpha}_{k}{d}_{k}$. Compute ${s}_{k}$ and ${y}_{k}$. Update ${B}_{k}$ as ${B}_{k+1}$ by (5). Set $k:=k+1$, return to Step 2.
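Putting the pieces together, Algorithm 1 can be sketched as follows. This is a hedged sketch, not the authors' reference implementation: the exact form of rule (7), the ${L}_{k}$ estimate of Remark 2, the cautious threshold with the *γ* choice from Section 4, and all parameter defaults are assumptions of this example:

```python
import numpy as np

def ncbfgs(f, grad, x0, eps=1e-6, sigma=0.5, mu=1.0, rho=0.5,
           L0=1.0, cautious_eps=1e-6, max_iter=500):
    """Sketch of Algorithm 1: cautious BFGS with a modified
    Armijo-type line search and automatic L_k estimation."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)
    Lk = L0
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:                 # Step 2: stopping test
            break
        d = np.linalg.solve(B, -g)                   # Step 3: direction
        gTd, dd = g @ d, d @ d
        alpha, fx = -gTd / (Lk * dd), f(x)           # automatic initial step
        while (f(x + alpha * d) >
               fx + sigma * alpha * (gTd - 0.5 * mu * alpha * Lk * dd)
               and alpha > 1e-16):
            alpha *= rho                             # Step 4: backtrack
        s = alpha * d                                # Step 5: move and update
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        Lk = np.linalg.norm(y) / np.linalg.norm(s)   # Remark 2: L_k estimate
        gn = np.linalg.norm(g)
        gamma = 0.01 if gn >= 1 else 3.0             # gamma choice as in Sec. 4
        if y @ s >= cautious_eps * gn ** gamma * (s @ s):   # cautious test (5)
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
        x, g = x_new, g_new
    return x
```

On a simple strongly convex quadratic, this sketch drives the gradient norm below the tolerance in a few dozen iterations.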

## 3 Global convergence

In this section, we are going to prove the global convergence of Algorithm 1.

We need the following conditions.

**Assumption 1**

1. *The level set* $\mathrm{\Omega}=\{x\in {R}^{n}|f(x)\le f({x}_{0})\}$ *is bounded*.
2. *In some neighborhood N of* Ω, *f is continuously differentiable and its gradient is Lipschitz continuous, namely there exists a constant* $L>0$ *such that* $\parallel g(x)-g(y)\parallel \le L\parallel x-y\parallel ,\phantom{\rule{1em}{0ex}}\mathrm{\forall}x,y\in N.$ (14)
3. *The sequence* $\{{L}_{k}\}$ *satisfies* $0<{L}_{k}\le ML$, *where M is a positive constant*.

Before the statement of the global convergence, we first prove the following useful lemmas.

**Lemma 1** *Let* $\{{x}_{k}\}$ *be a sequence generated by Algorithm* 1. *If* ${L}_{k}>0$ *for each* $k\ge 0$, *then for any given initial point* ${x}_{0}$, *the following results hold*:

1. $\{{f}_{k}\}$ *is a decreasing sequence*;
2. $\{{x}_{k}\}\subset \mathrm{\Omega}$;
3. ${\sum}_{k=1}^{\mathrm{\infty}}({f}_{k}-{f}_{k+1})<+\mathrm{\infty}$.

*Proof* The first and second results follow directly from the condition ${L}_{k}>0$ for each $k\ge 0$, Proposition 1 and the definition of Ω. We only need to prove the third result. Since $\{{f}_{k}\}$ is decreasing and *f* is bounded below on the bounded level set Ω, the telescoping partial sums ${\sum}_{k=1}^{n}({f}_{k}-{f}_{k+1})={f}_{1}-{f}_{n+1}$ are bounded above, so the series converges. □

**Lemma 2** *Let* $\{{x}_{k}\}$ *be a sequence generated by Algorithm* 1, *and let* $\{{d}_{k}\}$ *be the sequence of search directions. If Assumption* 1 *holds*, *then*

*In particular*,

*Proof* Denote

By the Lipschitz continuity of *g*, it is obtained that

□

**Lemma 3** *Let* $\{{x}_{k}\}$ *be a sequence generated by Algorithm* 1. *Suppose that there exist constants* ${a}_{1},{a}_{2}>0$ *such that the following relations hold for infinitely many k*:

$\parallel {B}_{k}{s}_{k}\parallel \le {a}_{1}\parallel {s}_{k}\parallel ,\phantom{\rule{2em}{0ex}}{s}_{k}^{T}{B}_{k}{s}_{k}\ge {a}_{2}{\parallel {s}_{k}\parallel}^{2}.$ (32)

*Then*

$\underset{k\to \mathrm{\infty}}{lim\phantom{\rule{0.2em}{0ex}}inf}\parallel {g}_{k}\parallel =0.$ (33)

*Proof* Let Λ be the index set of *k* satisfying (32).

The desired result (33) is proved. □

Lemma 3 indicates that for the establishment of the global convergence, it suffices to show that (32) holds for infinitely many *k* in Algorithm 1. The following lemma gives sufficient conditions for (32) to hold (see Theorem 2.1 in [18]).

**Lemma 4** *Let* ${B}_{0}$ *be a symmetric and positive definite matrix, and let* ${B}_{k}$ *be updated by* (3). *Suppose that there are positive constants* ${m}_{1}$, ${m}_{2}$ (${m}_{1}<{m}_{2}$) *such that for all* $k\ge 0$,

$\frac{{y}_{k}^{T}{s}_{k}}{{\parallel {s}_{k}\parallel}^{2}}\ge {m}_{1}\phantom{\rule{1em}{0ex}}\mathit{\text{and}}\phantom{\rule{1em}{0ex}}\frac{{\parallel {y}_{k}\parallel}^{2}}{{y}_{k}^{T}{s}_{k}}\le {m}_{2}.$

*Then there exist constants* ${a}_{1}$, ${a}_{2}$ *such that for any positive integer t*, (32) *holds for at least* $[t/2]$ *values of* $k\in \{1,2,\dots ,t\}$.

**Theorem 1** *Let* $\{{x}_{k}\}$ *be a sequence generated by Algorithm* 1. *Under Assumption* 1, (33) *holds*.

*Proof* From Lemma 3, we only need to show that (32) holds for infinitely many *k*.

If $\tilde{K}$ is a finite set, then ${B}_{k}$ remains a constant matrix after a finite number of iterations. Hence, there are constants ${a}_{1}$, ${a}_{2}$ such that (32) holds for all *k* sufficiently large. The proof of the result is completed.

In the following, we prove (33) in the case that $\tilde{K}$ is an infinite set.

From (40), the inequality

holds for all $k\in \tilde{K}$.

From Lemma 4, it follows that there exist constants ${a}_{1}$, ${a}_{2}$ such that (32) holds for infinitely many *k*. By Lemma 3, this implies (33), contradicting the assumption made above.

The proof is completed. □

## 4 Numerical experiments

In this section, we report the numerical performance of Algorithm 1. The numerical experiments are carried out on a set of 16 test problems from [19]. We make comparisons with the cautious BFGS method associated with the ordinary Armijo line search rule.

In order to study the numerical performance of Algorithm 1, we record the CPU run time, the total number of function evaluations required in the line search process, and the total number of iterations for each algorithm.

As to the parameters in the cautious update (5), we let $\gamma =0.01$ if $\parallel {g}_{k}\parallel \ge 1$, and $\gamma =3$ if $\parallel {g}_{k}\parallel <1$.

The performance of algorithms and the solution results are reported in Table 1. In this table, we use the following denotations:

*Dim*: the dimension of the objective function;

*GV*: the gradient value of the objective function when the algorithm stops;

*NI*: the number of iterations;

*NF*: the number of function evaluations;

*CT*: the CPU run time;

CBFGS: the CBFGS method associated with Armijo line search rule;

NCBFGS: the new cautious BFGS method proposed in this paper.

**Table 1 Comparison of efficiency with the other method**

| Functions | Algorithm | Dim | GV | NI | NF | CT |
|---|---|---|---|---|---|---|
| Rosenbrock | CBFGS | 2 | 6.2782e-007 | 35 | 74 | 0.0310s |
| | NCBFGS | 2 | 1.1028e-007 | 40 | 70 | 0.0310s |
| Freudenstein and Roth | CBFGS | 2 | 7.9817e-007 | 28 | 82 | 0.0310s |
| | NCBFGS | 2 | 2.7179e-007 | 11 | 25 | 0.0320s |
| Beale | CBFGS | 2 | 7.2275e-007 | 40 | 55 | 0.0310s |
| | NCBFGS | 2 | 3.1136e-007 | 18 | 23 | 0.0470s |
| Brown badly | CBFGS | 2 | 7.7272e-007 | 36 | 223 | 0.0310s |
| | NCBFGS | 2 | 0 | 29 | 50 | 0.0620s |
| Broyden tridiagonal | CBFGS | 4 | 7.5723e-007 | 26 | 126 | 0.0320s |
| | NCBFGS | 4 | 3.8712e-007 | 15 | 21 | 0.0310s |
| Powell singular | CBFGS | 4 | 9.9993e-007 | 13,993 | 14,031 | 2.4530s |
| | NCBFGS | 4 | 9.4607e-007 | 31 | 38 | 0.0320s |
| Kowalik and Osborne | CBFGS | 4 | 9.9783e-007 | 3126 | 3128 | 2.1250s |
| | NCBFGS | 4 | 4.4454e-007 | 30 | 45 | 0.0470s |
| Brown almost-linear | CBFGS | 6 | 9.5864e-007 | 263 | 300 | 0.1100s |
| | NCBFGS | 6 | 1.2290e-007 | 22 | 30 | 0.0160s |
| Discrete boundary | CBFGS | 6 | 8.6773e-007 | 79 | 85 | 0.0470s |
| | NCBFGS | 6 | 3.3650e-007 | 14 | 17 | 0.0320s |
| Variably dimensioned | CBFGS | 8 | 3.4688e-008 | 7 | 51 | 0.0470s |
| | NCBFGS | 8 | 3.1482e-007 | 10 | 21 | 0.0320s |
| Extended Rosenbrock | CBFGS | 8 | 8.2943e-007 | 91 | 190 | 0.0470s |
| | NCBFGS | 8 | 7.7959e-007 | 99 | 149 | 0.0320s |
| Extended Powell singular | CBFGS | 8 | 9.9975e-007 | 6154 | 6199 | 1.4690s |
| | NCBFGS | 8 | 6.5685e-007 | 42 | 55 | 0.0630s |
| Brown almost-linear | CBFGS | 8 | 9.8392e-007 | 364 | 379 | 0.1880s |
| | NCBFGS | 8 | 4.8080e-007 | 20 | 27 | 0.0780s |
| Broyden tridiagonal | CBFGS | 9 | 4.4261e-007 | 38 | 86 | 0.0470s |
| | NCBFGS | 9 | 6.2059e-007 | 41 | 56 | 0.0310s |
| Linear-rank1 | CBFGS | 10 | - | - | - | - |
| | NCBFGS | 10 | 2.6592e-007 | 4 | 15 | 0.0310s |
| Linear-full rank | CBFGS | 12 | 9.5231e-007 | 18 | 36 | 0.0160s |
| | NCBFGS | 12 | 9.4206e-016 | 2 | 3 | 0.0150s |

Table 1 shows that the algorithm developed in this paper is promising. In some cases, it requires fewer iterations, fewer function evaluations, or less CPU time than the other algorithm to find an optimal solution with the same tolerance.

## 5 Conclusions

A modified Armijo-type line search with automatic adjustment of the initial step size has been presented in this paper. Combined with the cautious BFGS method, a new BFGS algorithm has been developed. Under some mild assumptions, its global convergence was established for nonconvex optimization problems. Numerical results demonstrate that the proposed method is promising.

## Declarations

### Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 71210003 and 71071162).

## References

1. Al-Baali M: Quasi-Newton algorithms for large-scale nonlinear least-squares. In *High Performance Algorithms and Software for Nonlinear Optimization*. Edited by: Pillo G, Murli A. Kluwer Academic, Dordrecht; 2003:1–21.
2. Al-Baali M, Grandinetti L: On practical modifications of the quasi-Newton BFGS method. *Adv. Model. Optim.* 2009, 11(1):63–76.
3. Gill PE, Leonard MW: Limited-memory reduced-Hessian methods for large-scale unconstrained optimization. *SIAM J. Optim.* 2003, 14:380–401. doi:10.1137/S1052623497319973
4. Guo Q, Liu JG: Global convergence of a modified BFGS-type method for unconstrained nonconvex minimization. *J. Appl. Math. Comput.* 2006, 21:259–267. doi:10.1007/BF02896404
5. Li DH, Fukushima M: On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. *SIAM J. Optim.* 2001, 11(4):1054–1064. doi:10.1137/S1052623499354242
6. Mascarenhas WF: The BFGS method with exact line searches fails for non-convex objective functions. *Math. Program.* 2004, 99:49–61. doi:10.1007/s10107-003-0421-7
7. Nocedal J: Theory of algorithms for unconstrained optimization. *Acta Numer.* 1992, 1:199–242.
8. Xiao YH, Wei ZX, Zhang L: A modified BFGS method without line searches for nonconvex unconstrained optimization. *Adv. Theor. Appl. Math.* 2006, 1(2):149–162.
9. Zhou W, Li D: A globally convergent BFGS method for nonlinear monotone equations without any merit function. *Math. Comput.* 2008, 77:2231–2240. doi:10.1090/S0025-5718-08-02121-2
10. Zhou W, Zhang L: Global convergence of the nonmonotone MBFGS method for nonconvex unconstrained minimization. *J. Comput. Appl. Math.* 2009, 223:40–47. doi:10.1016/j.cam.2007.12.011
11. Zhou W, Zhang L: Global convergence of a regularized factorized quasi-Newton method for nonlinear least squares problems. *Comput. Appl. Math.* 2010, 29(2):195–214.
12. Cohen AI: Stepsize analysis for descent methods. *J. Optim. Theory Appl.* 1981, 33:187–205. doi:10.1007/BF00935546
13. Dennis JE, Schnabel RB: *Numerical Methods for Unconstrained Optimization and Nonlinear Equations*. Prentice-Hall, Englewood Cliffs; 1983.
14. Shi ZJ, Shen J: New inexact line search method for unconstrained optimization. *J. Optim. Theory Appl.* 2005, 127(2):425–446. doi:10.1007/s10957-005-6553-6
15. Sun WY, Han JY, Sun J: Global convergence of nonmonotone descent methods for unconstrained optimization problems. *J. Comput. Appl. Math.* 2002, 146:89–98. doi:10.1016/S0377-0427(02)00420-X
16. Wolfe P: Convergence conditions for ascent methods. *SIAM Rev.* 1969, 11:226–235. doi:10.1137/1011036
17. Nocedal J, Wright SJ: *Numerical Optimization*. Springer, New York; 1999.
18. Byrd R, Nocedal J: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. *SIAM J. Numer. Anal.* 1989, 26:727–739. doi:10.1137/0726042
19. Moré JJ, Garbow BS, Hillstrom KE: Testing unconstrained optimization software. *ACM Trans. Math. Softw.* 1981, 7:17–41. doi:10.1145/355934.355936

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.