
# A modified nonmonotone BFGS algorithm for unconstrained optimization

Journal of Inequalities and Applications 2017, 2017:183

https://doi.org/10.1186/s13660-017-1453-5

• Received: 20 March 2017
• Accepted: 14 July 2017
• Published:

## Abstract

In this paper, a modified BFGS algorithm is proposed for unconstrained optimization. The proposed algorithm has the following properties: (i) a nonmonotone line search technique is used to obtain the step size $$\alpha_{k}$$ to improve the effectiveness of the algorithm; (ii) the algorithm possesses not only global convergence but also superlinear convergence for generally convex functions; (iii) the algorithm produces better numerical results than those of the normal BFGS method.

## Keywords

• BFGS update
• global convergence
• superlinear convergence
• nonmonotone

• 65K05
• 90C26

## 1 Introduction

Consider
$$\min \bigl\{ f(x) \vert x \in \Re^{n} \bigr\} ,$$
(1.1)
where $$f(x):\Re^{n}\rightarrow \Re$$ is continuously differentiable. Many similar problems can be transformed into the above optimization problem (see  etc.). The following iteration formula is used to address the iteration point of (1.1):
$$x_{k+1}=x_{k}+\alpha_{k}d_{k},\quad k=0, 1, 2,\ldots,$$
(1.2)
where $$x_{k}$$ is the kth iterative point, $$\alpha_{k}>0$$ is the step length, and $$d_{k}$$ is the search direction of f at $$x_{k}$$. The search direction $$d_{k}$$ determines the line search method (see ). The quasi-Newton method is defined by
$$B_{k}d_{k}+g_{k}=0,$$
(1.3)
where $$g_{k}=\nabla f(x_{k})$$, $$B_{k}$$ is the quasi-Newton update matrix, and the sequence $$\{B_{k}\}$$ satisfies the so-called quasi-Newton equation
$$B_{k+1}s_{k}=y_{k},$$
(1.4)
where $$s_{k}=x_{k+1}-x_{k}$$, $$y_{k}=g_{k+1}-g_{k}$$, and $$g_{k+1}= \nabla f(x_{k+1})$$. The following update of $$B_{k}$$:
$$B_{k+1}=B_{k}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}+ \frac{y _{k}y_{k}^{T}}{s_{k}^{T}y_{k}}$$
(1.5)
is the BFGS formula (Broyden , Fletcher , Goldfarb , and Shanno ), which is one of the most effective quasi-Newton methods. For convex functions, the BFGS method combined with an exact line search or certain special inexact line search techniques has global convergence (see  etc.) and superlinear convergence (see [33, 34] etc.). For general functions, Dai  constructed an example showing that the BFGS method can fail under inexact line search techniques. Mascarenhas  proved that the method may fail to converge even with the exact line search technique. To obtain global convergence of a BFGS method without the convexity assumption, Li and Fukushima [37, 38] proposed the following modified BFGS methods.

### Formula 1



The BFGS update formula is defined by
$$B_{k+1}=B_{k}+\frac{\delta_{k}\delta_{k}^{T}}{s_{k}^{T}\delta_{k}}- \frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}},$$
(1.6)
where $$\delta_{k}=y_{k}+(\max \{0,-\frac{y_{k}^{T}s_{k}}{\Vert s_{k}\Vert ^{2}}\}+\phi (\Vert g_{k}\Vert ))s_{k}$$ and the function $$\phi:\Re \rightarrow \Re$$ satisfies: (i) $$\phi (t)>0$$ for all $$t>0$$; (ii) $$\phi (t)=0$$ if and only if $$t=0$$; (iii) $$\phi (t)$$ is bounded whenever t lies in a bounded set. Using the definition of $$\delta_{k}$$, it is not difficult to obtain
$$\delta_{k}^{T}s_{k}\geq \max \bigl\{ s_{k}^{T}y_{k},\phi \bigl(\Vert g_{k}\Vert \bigr)\Vert s_{k}\Vert ^{2}\bigr\} >0.$$
This is sufficient to guarantee the positive definiteness of $$B_{k+1}$$ as long as $$B_{k}$$ is positive definite. Li and Fukushima proposed $$\phi (t)=\mu t$$ with some constant $$\mu >0$$.
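
The inequality above can be checked numerically. The following Python sketch is an illustration only, not code from Li and Fukushima: the helper names `dot` and `li_fukushima_delta`, the choice $$\mu =0.1$$, and the sample vectors are all assumptions made for this example. It builds $$\delta_{k}$$ with $$\phi (t)=\mu t$$ for a step on which $$y_{k}^{T}s_{k}<0$$ and confirms that $$\delta_{k}^{T}s_{k}$$ is still positive.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def li_fukushima_delta(y, s, g, mu=0.1):
    """delta_k = y_k + (max{0, -y^T s / ||s||^2} + phi(||g_k||)) s_k, phi(t) = mu * t."""
    ss = dot(s, s)
    g_norm = dot(g, g) ** 0.5
    shift = max(0.0, -dot(y, s) / ss) + mu * g_norm
    return [yi + shift * si for yi, si in zip(y, s)]


# Hypothetical data with negative curvature along s: y^T s = -2 < 0.
s = [1.0, 0.0]
y = [-2.0, 0.5]
g = [3.0, 4.0]  # ||g|| = 5, so phi(||g||) = 0.5 for mu = 0.1

delta = li_fukushima_delta(y, s, g)
lower_bound = max(dot(s, y), 0.1 * 5.0 * dot(s, s))
print(dot(delta, s), lower_bound)
```

Here the plain BFGS update (1.5) would lose positive definiteness because $$s_{k}^{T}y_{k}<0$$, whereas the shifted vector keeps the curvature term positive.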

### Formula 2



The BFGS update formula is defined by
\begin{aligned} B_{k+1}=\textstyle\begin{cases} B_{k}+\frac{\delta_{k}\delta_{k}^{T}}{s_{k}^{T}\delta_{k}}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}, & \mbox{if } \frac{\delta_{k}^{T}s_{k}}{\Vert s_{k}\Vert ^{2}}\geq \phi (\Vert g_{k}\Vert ), \\ B_{k},& \mbox{otherwise}, \end{cases}\displaystyle \end{aligned}
(1.7)
where $$\delta_{k}$$, ϕ and the properties are the same as those in Formula 1. For nonconvex functions, these two methods possess global convergence and superlinear convergence.

Some scholars have conducted further research to obtain a better approximation of the Hessian matrix of the objective function.

### Formula 3



The BFGS update formula is defined by
$$B_{k+1}=B_{k}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}+ \frac{y _{k}^{m*}{y_{k}^{m*}}^{T}}{s_{k}^{T}y_{k}^{m*}},$$
(1.8)
where $$y_{k}^{m*}=y_{k} + \frac{\rho_{k}}{\Vert s_{k}\Vert ^{2}}s_{k}$$ and $$\rho_{k}=2[f(x_{k})-f(x_{k}+\alpha_{k}d_{k})]+(g(x_{k}+\alpha_{k} d_{k})+g(x_{k}))^{T}s_{k}$$. This formula contains both gradient and function value information, so one may expect the resulting methods to outperform the normal BFGS method. In fact, practical computation shows that the method is better than the normal BFGS method and that it has some theoretical advantages (see [39, 40]). Under the WWP line search, Wei et al.  proposed the quasi-Newton method and established its superlinear convergence for uniformly convex functions. Its global convergence can be found in , but the method fails for general convex functions. One of the main reasons for the failure is that the matrix $$B_{k}$$ may not be positive definite for general convex functions. Byrd et al. [31, 32] showed that the positive definiteness of the matrix $$B_{k}$$ plays an important role in the convergence of the quasi-Newton algorithm. Yuan and Wei  first analyzed the global convergence and superlinear convergence of the modified BFGS formula in  using gradient and function value information for general convex functions. Based on equation (1.8), Yuan and Wei  proposed another BFGS formula.

### Formula 4



The BFGS update formula is defined by
$$B_{k+1}=B_{k}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}+ \frac{y _{k}^{m}{y_{k}^{m}}^{T}}{s_{k}^{T}y_{k}^{m}},$$
(1.9)
where $$y_{k}^{m}=y_{k} + \max \{\frac{\rho_{k}}{\Vert s_{k}\Vert ^{2}},0\}s_{k}$$. This modified method achieves global convergence and superlinear convergence for generally convex functions. Similar work was carried out earlier by Zhang et al. .

### Formula 5



The BFGS update formula is defined by
$$B_{k+1}=B_{k}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}+ \frac{y _{k}^{1*}{y_{k}^{1*}}^{T}}{s_{k}^{T}y_{k}^{1*}},$$
(1.10)
where $$y_{k}^{1*}=y_{k}+\bar{A}_{k}s_{k}$$, $$\bar{A}_{k}=\frac{6[f(x _{k})-f(x_{k}+\alpha_{k}d_{k})]+3(\nabla f(x_{k}+\alpha_{k} d_{k})+ \nabla f(x_{k}))^{T}s_{k}}{\Vert s_{k}\Vert ^{2}}$$. It is clear that the quasi-Newton equation (1.10) also contains both gradient and function value information, and it has been proved that the new formula has a higher order approximation to $$\nabla^{2} f(x)$$. Furthermore, Yuan et al.  extended a similar technique to $$y_{k}^{1*}$$ in a limited memory BFGS method, where global convergence is only obtained for uniformly convex functions. Several other modified quasi-Newton methods have been reported (see [23, 40, 44, 45]).
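
To make the role of $$\bar{A}_{k}$$ concrete, the following Python sketch evaluates it on two toy functions. It is a hedged illustration rather than the authors' implementation: the helper names `a_bar` and `y_one_star` and both test functions are assumptions introduced here. For a quadratic objective the correction vanishes, so (1.10) coincides with (1.5); for the one-dimensional quartic $$f(x)=x^{4}$$, the corrected vector $$y_{k}^{1*}$$ is much closer to $$\nabla^{2} f(x_{k+1})s_{k}$$ than $$y_{k}$$ is.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def a_bar(f_k, f_k1, g_k, g_k1, s):
    """A-bar_k = (6 [f(x_k) - f(x_{k+1})] + 3 (g_{k+1} + g_k)^T s_k) / ||s_k||^2."""
    g_sum = [a + b for a, b in zip(g_k1, g_k)]
    return (6.0 * (f_k - f_k1) + 3.0 * dot(g_sum, s)) / dot(s, s)


def y_one_star(g_k, g_k1, s, A):
    """y_k^{1*} = y_k + A-bar_k s_k, with y_k = g_{k+1} - g_k."""
    return [(b - a) + A * si for a, b, si in zip(g_k, g_k1, s)]


# Toy quadratic f(x) = x_1^2 + 2 x_2^2 (an assumption for this example).
def f_quad(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2


def g_quad(x):
    return [2.0 * x[0], 4.0 * x[1]]


xk, xk1 = [1.0, 1.0], [0.5, -1.0]
s_quad = [b - a for a, b in zip(xk, xk1)]
A_quad = a_bar(f_quad(xk), f_quad(xk1), g_quad(xk), g_quad(xk1), s_quad)

# Toy quartic f(x) = x^4, stepping from x_k = 1 to x_{k+1} = 2.
def f_quart(x):
    return x[0] ** 4


def g_quart(x):
    return [4.0 * x[0] ** 3]


A_quart = a_bar(f_quart([1.0]), f_quart([2.0]), g_quart([1.0]), g_quart([2.0]), [1.0])
y_star = y_one_star(g_quart([1.0]), g_quart([2.0]), [1.0], A_quart)
print(A_quad, A_quart, y_star)
```

In the quartic case $$y_{k}=28$$ and $$y_{k}^{1*}=46$$, while the exact value is $$\nabla^{2} f(x_{k+1})s_{k}=48$$, which is consistent with the higher-order approximation property stated above.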
The monotone line search technique is often used to determine the step size $$\alpha_{k}$$. One famous technique is the weak Wolfe-Powell (WWP) technique.
1. (i) WWP line search technique. $$\alpha_{k}$$ is determined by
$$f(x_{k}+\alpha_{k}d_{k})\leq f(x_{k})+\delta \alpha_{k}g_{k}^{T}d_{k},\qquad g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq \sigma g_{k}^{T}d_{k},$$
(1.11)
where $$0<\delta <\sigma <1$$. Recently, a modified WWP line search technique was proposed by Yuan, Wei, and Lu  to ensure that the BFGS and the PRP methods have global convergence for nonconvex functions, which solved these two open problems. However, monotonicity may generate a series of extremely small steps if the contours of the objective function are a family of curves with large curvature . A nonmonotone line search for unconstrained optimization was proposed by Grippo et al. in  and was further studied by . Grippo, Lampariello, and Lucidi  proposed the following nonmonotone line search, called the GLL line search.

2. (ii) GLL nonmonotone line search. $$\alpha_{k}$$ is determined by
\begin{aligned}& f(x_{k+1}) \leq \max_{0\leq j \leq M_{0}}f(x_{k-j})+ \epsilon_{1}\alpha _{k}g_{k}^{T}d_{k}, \end{aligned}
(1.12)
\begin{aligned}& g(x_{k+1})^{T}d_{k} \geq \max \bigl\{ \epsilon_{2}, 1-\bigl(\alpha_{k}\Vert d_{k} \Vert \bigr)^{p} \bigr\} g_{k}^{T}d_{k}, \end{aligned}
(1.13)
where $$p\in (-\infty,1)$$, $$k=0, 1, 2, \ldots$$, $$\epsilon_{1} \in (0,1)$$, $$\epsilon_{2} \in (0,\frac{1}{2})$$, and $$M_{0}$$ is a nonnegative integer. By combining this line search with the normal BFGS formula, Han and Liu  established global convergence for convex objective functions; the superlinear convergence was established by Yuan and Wei . Although these nonmonotone techniques perform well in many cases, the numerical performance depends to some extent on the choice of $$M_{0}$$ (see [47, 53, 54] for details). Zhang and Hager  presented another nonmonotone line search technique.

3. (iii) Zhang and Hager nonmonotone line search technique . In this technique $$\alpha_{k}$$ is found by
$$Q_{k+1}=\eta_{k}Q_{k}+1,\qquad C_{k+1}= \frac{\eta_{k}Q_{k}C_{k}+f(x_{k+1})}{Q_{k+1}},$$
(1.14)
where $$\eta_{k}\in [\eta_{\min },\eta_{\max }]$$, $$0\leq \eta_{\min } \leq \eta_{\max }\leq 1$$, $$C_{0}=f(x_{0})$$ and $$Q_{0}=1$$. It is easy to conclude that $$C_{k+1}$$ is a convex combination of $$C_{k}$$ and $$f(x_{k+1})$$. The numerical results show that this technique is more competitive than the nonmonotone method of , but its convergence analysis requires strong assumptions.
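
The three techniques differ mainly in the reference value against which a trial point is compared. The Python sketch below is an illustration under stated assumptions, not code from the cited papers: the function names, the sample history of function values, and the default parameters are hypothetical. It checks the WWP conditions (1.11), the GLL decrease condition (1.12), and the recursion (1.14). The Zhang and Hager acceptance test itself is not written out in the text above, so only the update of $$C_{k}$$ and $$Q_{k}$$ is implemented.

```python
def wwp_holds(f_k, f_new, gTd, g_new_Td, alpha, delta=0.1, sigma=0.9):
    """Both weak Wolfe-Powell conditions (1.11) for a trial step alpha."""
    return (f_new <= f_k + delta * alpha * gTd) and (g_new_Td >= sigma * gTd)


def gll_reference(f_history, M0):
    """max over the last M0 + 1 function values f(x_{k-j}), j = 0..M0."""
    return max(f_history[-(M0 + 1):])


def gll_decrease_holds(f_new, f_history, gTd, alpha, M0=8, eps1=0.1):
    """Nonmonotone sufficient decrease condition (1.12)."""
    return f_new <= gll_reference(f_history, M0) + eps1 * alpha * gTd


def zhang_hager_update(C, Q, f_new, eta):
    """One step of the recursion (1.14) for the averaged reference value."""
    Q_new = eta * Q + 1.0
    C_new = (eta * Q * C + f_new) / Q_new
    return C_new, Q_new


# Hypothetical history f(x_0), f(x_1), f(x_2) and a trial value f = 7.
f_history = [10.0, 4.0, 6.0]
monotone_ok = gll_decrease_holds(7.0, f_history, gTd=-2.0, alpha=1.0, M0=0)
nonmonotone_ok = gll_decrease_holds(7.0, f_history, gTd=-2.0, alpha=1.0, M0=8)

# With eta = 1 the value C_k becomes the running mean of the history.
C, Q = f_history[0], 1.0
for f_val in f_history[1:]:
    C, Q = zhang_hager_update(C, Q, f_val, eta=1.0)
print(monotone_ok, nonmonotone_ok, C, Q)
```

With $$M_{0}=0$$ the GLL test reduces to the monotone Armijo-type condition and rejects the trial point, whereas $$M_{0}=8$$ accepts it. In (1.14), $$\eta_{k}=0$$ gives $$C_{k+1}=f(x_{k+1})$$ and $$\eta_{k}=1$$ gives the average of all previous function values.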

Motivated by the above observations, we study the modified BFGS-type method of Yuan et al.  based on the formula (1.10). The modified BFGS-type method and the proposed algorithm have the following characteristics:
• The GLL line search technique is used in the algorithm to ensure good convergence.

• The major contribution of the new algorithm is an extension of the modified BFGS update from  and .

• Another contribution is the proof of global convergence for generally convex functions.

• The major aim of the proposed method is to establish the superlinear convergence and the global convergence for generally convex functions.

• The experimental problems, including both normal unconstrained optimization and engineering problems (benchmark problems), indicate that the proposed algorithm is competitive with the normal method.

This paper is organized as follows. In the next section, we present the algorithm. The global convergence and superlinear convergence are established in Section 3 and Section 4, respectively. Numerical results are reported in Section 5. In the final section, we present a conclusion. Throughout this paper, $$\Vert \cdot \Vert$$ denotes the Euclidean norm of a vector or matrix.

## 2 Algorithm

In this paper, we study the modified formula of  and obtain global convergence and superlinear convergence under generally convex conditions. The modified BFGS update of (1.10) is presented as
$$B_{k+1}^{*}=B_{k}^{*} - \frac{B_{k}^{*} s_{k} s_{k}^{T} B_{k}^{*}}{s _{k}^{T} B_{k}^{*} s_{k}} + \frac{y_{k}^{*} {y_{k}^{*}}^{T}}{{y_{k} ^{*}}^{T} s_{k}},$$
(2.1)
where $${y_{k}^{*}}=y_{k}+A_{k}^{*}s_{k}$$, $$A_{k}^{*}=\max \{\bar{A} _{k},0\}$$. The corresponding quasi-Newton equation is
$$B_{k+1}^{*}s_{k}=y_{k}^{*}.$$
(2.2)
By the definition of the convex property of f, $$s_{k}^{T}y_{k}^{*}>0$$ holds (see  in detail). Therefore, the update matrix $$B_{k+1}^{*}$$ from (2.1) inherits the positive definiteness of $$B_{k}^{*}$$ for generally convex functions. Now, we state the algorithm as follows.

### Algorithm 1

Mod-non-BFGS-A

Step 0:

Given a symmetric and positive definite matrix $$B_{0}^{*}$$ and an integer $$M_{0}>0$$, choose an initial point $$x_{0} \in \Re^{n}$$ and constants $$0<\varepsilon <1$$, $$0<\epsilon_{1}<\epsilon_{2}<1$$, $$p\in (-\infty,1)$$; set $$k:=0$$.

Step 1:

If $$\Vert g_{k}\Vert \leq \varepsilon$$, stop; otherwise, go to the next step.

Step 2:

Solve
$$B_{k}^{*}d_{k}+g_{k}=0$$
(2.3)
to obtain $$d_{k}$$.

Step 3:

The step length $$\alpha_{k}$$ is determined by the GLL conditions (1.12) and (1.13).

Step 4:

Let $$x_{k+1}=x_{k}+\alpha_{k}d_{k}$$.

Step 5:

Generate $$B_{k+1}^{*}$$ from (2.1) and set $$k:=k+1$$; go to Step 1.
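
The steps of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration and not the MATLAB code used for the experiments: the step length is found by plain backtracking that enforces only the decrease condition (1.12) and not the curvature condition (1.13), the update is skipped if $$s_{k}^{T}y_{k}^{*}\leq 0$$ as a numerical safeguard, and the function names, the dense linear solver, and the convex test function are all assumptions made for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def mod_nonmonotone_bfgs(f, grad, x0, M0=8, eps=1e-5, eps1=0.1, max_iter=200):
    n = len(x0)
    x = list(x0)
    B = [[float(i == j) for j in range(n)] for i in range(n)]  # Step 0: B_0* = I
    fx, g = f(x), grad(x)
    history = [fx]
    iters = 0
    while iters < max_iter and dot(g, g) ** 0.5 > eps:         # Step 1
        d = solve(B, [-gi for gi in g])                        # Step 2: (2.3)
        gTd = dot(g, d)
        ref = max(history[-(M0 + 1):])                         # GLL reference value
        alpha = 1.0
        for _ in range(60):                                    # Step 3, (1.12) only
            x_new = [xi + alpha * di for xi, di in zip(x, d)]
            f_new = f(x_new)
            if f_new <= ref + eps1 * alpha * gTd:
                break
            alpha *= 0.5
        g_new = grad(x_new)                                    # Step 4
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        ss = dot(s, s)
        if ss == 0.0:                                          # no progress possible
            break
        g_sum = [a + b for a, b in zip(g_new, g)]
        A_bar = (6.0 * (fx - f_new) + 3.0 * dot(g_sum, s)) / ss
        y_star = [yi + max(A_bar, 0.0) * si for yi, si in zip(y, s)]
        ys = dot(y_star, s)
        if ys > 0.0:                                           # Step 5: update (2.1)
            Bs = [dot(row, s) for row in B]
            sBs = dot(s, Bs)
            B = [[B[i][j] - Bs[i] * Bs[j] / sBs + y_star[i] * y_star[j] / ys
                  for j in range(n)] for i in range(n)]
        x, fx, g = x_new, f_new, g_new
        history.append(fx)
        iters += 1
    return x, iters


# Hypothetical convex test function with minimizer (1, -0.5).
def f(x):
    return (x[0] - 1.0) ** 4 + (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


def grad(x):
    return [4.0 * (x[0] - 1.0) ** 3 + 2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


x_min, n_iter = mod_nonmonotone_bfgs(f, grad, [0.0, 0.0])
print(x_min, n_iter)
```

A production implementation would add the curvature condition (1.13) to the line search and would typically update a factorization of $$B_{k}^{*}$$ instead of solving (2.3) from scratch at every iteration.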

## 3 Global convergence

The following assumptions are required to obtain the global convergence of Algorithm 1.

### Assumption A

1. (i) The level set $$L_{0}=\{x \mid f(x) \le f(x_{0}) \}$$ is bounded.

2. (ii) The objective function f is continuously differentiable and convex on $$L_{0}$$. Moreover, there exists a constant $$L\ge 0$$ satisfying
$$\bigl\Vert g(x)-g(y)\bigr\Vert \le L\Vert x-y \Vert ,\quad \forall x, y \in L_{0}.$$
(3.1)

Assumption A implies that there exist constants $$M>0$$ and $$\varrho >0$$ satisfying
$$\bigl\Vert G(x)\bigr\Vert \leq M,\qquad G(x)=\nabla^{2} f(x), \quad x\in L_{0},$$
and
$$\frac{\Vert y_{k}\Vert ^{2}}{s_{k}^{T}y_{k}}\leq \varrho,\quad k\geq 0\ (\mbox{see }).$$
(3.2)

### Lemma 3.1

Suppose Assumption A holds. Then there exists a constant $$M_{*}>0$$ such that
$$\frac{\Vert y^{*}_{k}\Vert ^{2}}{s^{T}_{k}y^{*}_{k}}\leq M_{*}.$$

The proof is similar to , so it is not presented here.

### Lemma 3.2

Let $$B_{k}^{*}$$ be updated by (2.1); then the relation
$$\det \bigl(B_{k+1}^{*}\bigr)=\det \bigl(B_{k}^{*} \bigr)\frac{(y_{k}^{*})^{T} s_{k}}{s_{k} ^{T} B_{k}^{*}s_{k}}$$
holds, where $$\det (B_{k}^{*})$$ denotes the determinant of $$B_{k}^{*}$$.
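
As a quick sanity check of Lemma 3.2 and of the quasi-Newton equation (2.2), the following Python sketch applies one update of the form (2.1) to a small example. The matrix, the vectors, and the helper names `bfgs_type_update` and `det2` are assumptions chosen for illustration; the check is numerical and does not replace a proof.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs_type_update(B, s, y_star):
    """Update (2.1): B - (B s)(B s)^T / (s^T B s) + y* y*^T / (y*^T s), B symmetric."""
    Bs = [dot(row, s) for row in B]
    sBs = dot(s, Bs)
    ys = dot(y_star, s)
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y_star[i] * y_star[j] / ys
             for j in range(n)] for i in range(n)]


def det2(B):
    """Determinant of a 2 x 2 matrix."""
    return B[0][0] * B[1][1] - B[0][1] * B[1][0]


# Hypothetical symmetric positive definite B_k* and vectors with y*^T s > 0.
B = [[2.0, 1.0], [1.0, 3.0]]
s = [1.0, 0.0]
y_star = [3.0, 1.0]

B_new = bfgs_type_update(B, s, y_star)
lhs = det2(B_new)
rhs = det2(B) * dot(y_star, s) / dot(s, [dot(row, s) for row in B])
secant_residual = [dot(row, s) - yi for row, yi in zip(B_new, y_star)]
print(lhs, rhs, secant_residual)
```

For this data both sides of the determinant relation equal 7.5, and $$B_{k+1}^{*}s_{k}-y_{k}^{*}$$ vanishes up to rounding.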

### Lemma 3.3

Assume that Assumption A holds and that sequence $$\{x_{k}\}$$ is generated by Algorithm 1. If
$$\liminf_{k\rightarrow \infty } \Vert g_{k}\Vert >0,$$
then there exists a constant $$\epsilon '>0$$ satisfying
$$\prod_{j=1}^{k} \gamma_{j} \geq \bigl(\epsilon '\bigr)^{k},\quad \textit{for all }k\geq 1,$$
where $$\gamma_{j}=\frac{-g_{j}^{T}d_{j}}{\Vert d_{j}\Vert }$$.

### Proof

For $$k=0$$, by the positive definiteness of $$B_{0}^{*}$$, we have $$s_{0}^{T}y_{0}^{*}>0$$. Then $$B_{1}^{*}$$ is generated by (2.1) and is positive definite. Assume that $$B_{k}^{*}$$ is positive definite for some $$k\geq 1$$; we prove that $$s_{k}^{T}y_{k}^{*}>0$$ holds by considering the following three cases.

Case 1: $$\bar{A}_{k}<0$$. By the definition of $$y_{k}^{*}$$, the convexity of $$f(x)$$, and Assumption A, we have
$$s_{k}^{T}y_{k}^{*}=s_{k}^{T}y_{k}>0.$$
Case 2: $$\bar{A}_{k}=0$$. By (1.13), (2.3), Assumption A, the definition of $$y_{k}^{*}$$, and the positive definiteness of $$B_{k}^{*}$$, we get
$$s_{k}^{T}y_{k}^{*}=s_{k}^{T}y_{k} \geq -(1-\sigma_{*})\alpha_{k}d_{k} ^{T}g_{k}=(1-\sigma_{*})\alpha_{k}d_{k}^{T}B_{k}^{*}d_{k}>0,$$
where $$\sigma_{*}\in (0,1)$$.

Case 3: $$\bar{A}_{k}>0$$ . The proof can be found in 

Similar to the proof of Theorem 3.1 in , we can establish the global convergence theorem of Algorithm 1. Here, we state the theorem but omit the proof. □

### Theorem 3.1

Let the conditions of Lemma 3.3 hold; then we have
$$\liminf_{k\rightarrow \infty } \Vert g_{k}\Vert =0.$$
(3.3)

## 4 Superlinear convergence analysis

Based on Theorem 3.1, we suppose that $$x^{*}$$ is the limit of the sequence $$\{x_{k}\}$$. To establish the superlinear convergence of Algorithm 1, the following additional assumption is needed.

### Assumption B

$$g(x^{*})=0$$ with $$x_{k}\rightarrow x^{*}$$. $$G(x^{*})$$ is positive definite and G is Hölder continuous at $$x^{*}$$, namely, for all x in a neighborhood of $$x^{*}$$, there exist constants $$u\in (0,1)$$ and $$\zeta \geq 0$$ satisfying
$$\bigl\Vert G(x)-G\bigl(x^{*}\bigr)\bigr\Vert \le \zeta \bigl\Vert x-x^{*}\bigr\Vert ^{u},$$
(4.1)
where $$G(x)=\nabla^{2} f(x)$$.

In a way similar to , we can obtain the superlinear convergence of Algorithm 1, which we state below without proof.

### Theorem 4.1

Let Assumptions A and B hold and let $$\{x_{k}\}$$ be generated by Algorithm 1. Then the sequence $$\{x_{k}\}$$ converges superlinearly to $$x^{*}$$.

## 5 Numerical results

This section reports the numerical results of Algorithm 1. All code was written in MATLAB 7.0 and run on a PC with a 2.60 GHz CPU, 256 MB of memory, and the Windows XP operating system. The parameters are chosen as $$\delta =0.1$$, $$\sigma =0.9$$, $$\varepsilon =10^{-5}$$, $$\epsilon_{1}=0.1$$, $$\epsilon_{2}=0.01$$, $$p=5$$, $$M_{0}=8$$, and the initial matrix $$B_{0}=I$$ is the identity matrix. Since the line search cannot always ensure the descent condition $$d_{k}^{T}g_{k}<0$$, an uphill search direction may occur in the numerical experiments, in which case the line search rule may fail. To avoid this, the step size $$\alpha_{k}$$ is accepted once the number of line search trials exceeds 25. The following Himmelblau stop rule is used: if $$\vert f(x_{k})\vert > e_{1}$$, let $$\mathit{stop}1=\frac{\vert f(x_{k})-f(x_{k+1})\vert }{\vert f(x_{k})\vert }$$; otherwise, let $$\mathit{stop}1=\vert f(x_{k})-f(x_{k+1})\vert$$. In the experiments, the program terminates if $$\Vert g(x)\Vert < \varepsilon$$ or $$\mathit{stop}1 < e_{2}$$ holds, where $$e_{1}=e_{2}=10^{-5}$$.
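
The stopping test described above can be written compactly. The Python sketch below is a minimal illustration of the rule as stated in this section; the function name `himmelblau_stop` and its argument names are assumptions, and the original experiments were run in MATLAB.

```python
def himmelblau_stop(f_k, f_k1, g_norm, eps=1e-5, e1=1e-5, e2=1e-5):
    """Return True if ||g|| < eps or the Himmelblau quantity stop1 < e2."""
    if abs(f_k) > e1:
        # Relative change when f(x_k) is not too close to zero.
        stop1 = abs(f_k - f_k1) / abs(f_k)
    else:
        # Absolute change otherwise, to avoid dividing by a tiny number.
        stop1 = abs(f_k - f_k1)
    return g_norm < eps or stop1 < e2


print(himmelblau_stop(100.0, 99.0, 1.0))        # large relative change, large gradient
print(himmelblau_stop(100.0, 99.9999999, 1.0))  # tiny relative change
print(himmelblau_stop(1e-6, 0.0, 1.0))          # absolute branch
```

Switching between the relative and the absolute change keeps the test meaningful both for problems whose optimal value is far from zero and for problems whose optimal value is zero.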

### 5.1  problems

It has been proved that  problems with initial points are an effective tool to estimate the performance of algorithms and are one of the most commonly used sets of optimization problems. Many scholars use these problems to assess their algorithms (see [23, 40, 42, 51]). In this paper, we also perform experiments on these problems. The detailed numerical results are listed in Table 1, where the columns of Table 1 have the following meaning:
• Problem: the name of the test problem;

• Dim: the dimension of the problem;

• NI: the total number of iterations;

• Time: the CPU time in seconds;

• NFG: $$NFG=NF+5NG$$, where NF and NG are the total numbers of function and gradient evaluations, respectively (see ).

In Table 1, ‘BFGS-WP’, ‘BFGS-Non’, ‘BFGS-WP-Zhang’, and ‘BFGS-M-Non’ stand for the normal BFGS formula with WWP rule, the normal BFGS formula with GLL rule, the modified BFGS equation (1.10) with WWP rule, and MN-BFGS-A, respectively. The numerical results in Table 1 indicate that the proposed method is competitive with the other three similar methods.

To directly illustrate the performance of these methods, we utilize the tool of Dolan and Moré  to analyze their efficiency. Figures 1, 2, and 3 show the performance profiles with respect to NI, NFG, and Time, respectively. According to these three figures, the MN-BFGS-A method has the best performance (the highest probability of being the optimal solver).

Figure 1 Performance profiles of these methods (NI).

Figure 2 Performance profiles of these methods (NFG).

Figure 3 Performance profiles of these methods (Time).

Figure 1 shows that BFGS-M-Non and BFGS-Non outperform BFGS-WP and BFGS-WP-Zhang on approximately 9% and 6% of the problems, respectively. The BFGS-WP-Zhang and BFGS-WP methods can successfully solve 94% and 91% of the test problems, respectively.

Figure 2 shows that BFGS-M-Non and BFGS-Non are superior to BFGS-WP and BFGS-WP-Zhang on approximately 12% and 9% of these problems, respectively. The BFGS-M-Non and BFGS-Non methods solve 100% of the test problems at $$t\approx 10$$. The BFGS-WP-Zhang and the BFGS-WP methods solve the test problems with probabilities of 91% and 88%, respectively.

Figure 3 shows that the success rates when using the BFGS-M-Non and BFGS-Non methods to address the test problems are higher than the success rates when using BFGS-WP and BFGS-WP-Zhang by approximately 6% and 9%, respectively. Additionally, the BFGS-M-Non and BFGS-Non algorithms can address almost all the test problems. Moreover, BFGS-WP-Zhang has better results than BFGS-WP.
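
For readers unfamiliar with performance profiles, the following Python sketch shows how such curves are commonly computed in the sense of Dolan and Moré: for each problem, the cost of a solver is divided by the best cost over all solvers, and the profile value at $$t$$ is the fraction of problems whose ratio is at most $$t$$. The solver names and cost values below are made-up placeholders, not data from Table 1, and the function name is an assumption. Costs are assumed positive, with `math.inf` marking a failure.

```python
import math


def performance_profile(costs, taus):
    """costs[solver] is a list of per-problem costs; returns profile values at each tau."""
    solvers = list(costs)
    n_prob = len(costs[solvers[0]])
    # Best (smallest) cost achieved on each problem by any solver.
    best = [min(costs[s][p] for s in solvers) for p in range(n_prob)]
    profile = {}
    for s in solvers:
        ratios = [
            costs[s][p] / best[p] if math.isfinite(costs[s][p]) else math.inf
            for p in range(n_prob)
        ]
        profile[s] = [sum(r <= tau for r in ratios) / n_prob for tau in taus]
    return profile


# Placeholder costs (for example NI) of two hypothetical solvers on three problems.
costs = {"A": [10.0, 20.0, math.inf], "B": [20.0, 10.0, 30.0]}
profile = performance_profile(costs, taus=[1.0, 2.0, 4.0])
print(profile)
```

The value at $$t=1$$ is the fraction of problems on which a solver is the best, and the limiting value for large $$t$$ is the fraction of problems it solves at all, which is how the percentages quoted above are read from the figures.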

### 5.2 Benchmark problems

The benchmark problems listed in Table 2 are widely applied in various practical engineering situations. A function is multimodal if it has two or more local optima. A function of p variables is separable if it can be rewritten as a sum of p functions of just one variable . Separability is closely related to the concept of epistasis or interrelation among the variables of a function. Non-separable functions are more difficult to optimize because the accuracy of the search direction depends on two or more variables. By contrast, separable functions can be optimized for each variable in turn. The problem is even more difficult if the function is multimodal. The search process must be able to avoid the regions around local minima in order to approximate, as closely as possible, the global optimum. The most complex case appears when the local optima are randomly distributed in the search space.
The dimensionality of the search space is another important factor in the complexity of the problem. A study of the dimensionality problem and its features was conducted by Friedman . To establish the same degree of difficulty in all cases, a search space of dimensionality $$p=30$$ is chosen for all the functions. In the experiment, we do not fix the value to $$p=30$$, namely, it can be larger than 30. The exact dimensions can be found in Table 3.

However, the effectiveness of one algorithm compared with another cannot be determined solely from the number of problems that it solves better. The ‘no free lunch’ theorem (see ) states that if two search algorithms are compared over all possible functions, their performance will be, on average, the same. As a result, attempting to find a perfect test set where all the functions are present to determine whether an algorithm is better than another algorithm for every function is a fruitless task. Therefore, when an algorithm is evaluated, we identify the types of problems where its performance is good to characterize the types of problems for which the algorithm is suitable. The authors previously studied functions to be optimized to construct a test set with a better selection of fewer functions (see [62, 63]). This enables us to draw conclusions about the performance of the algorithm depending on the type of function.

The above benchmark problems and the discussions of the choice of test problems for an algorithm can be found at
Many scholars use these problems to test numerical optimization methods (see [64, 65] etc.). Based on the above discussions, in this subsection, we test the four algorithms on the Benchmark problems. The test results are presented in Table 3, where $$x_{0}$$ denotes the initial point, $$x_{Sph10}=(-2,-2,\ldots,-2)$$, $$x_{Sph20}=(2,2,\ldots,2)$$, $$x_{Sph30}=(-2,0,-2,0,\ldots)$$, $$x_{Sph40}=(2,0,2,0,\ldots)$$, $$x_{SchDS10}=(-0.0001,-0.0001,\ldots,-0.0001)$$, $$x_{SchDS20}=(0.00001,0.00001,\ldots, 0.00001)$$, $$x_{SchDS30}=(-0.0001,0,-0.0001,0, \ldots)$$, $$x_{SchDS40}=(0.00001,0,0.00001,0,\ldots)$$, $$x_{Gri10}=(-21,-21,\ldots,-21)$$, $$x_{Gri20}=(32,32,\ldots,32)$$, $$x_{Gri30}=(-21,0,-21,0,\ldots)$$, $$x_{Gri40}=(32,0,32,0,\ldots)$$, $$x_{Ros10}=(1.45,1.45,\ldots,1.45)$$, $$x_{Ros20}=(2.1,2.1,\ldots,2.1)$$, $$x_{Ros30}=(1.45,0,1.45, 0,\ldots)$$, $$x_{Ros40}=(2.1,0,2.1,0,\ldots)$$, $$x_{Ack10}=(-0.002,-0.002,\ldots,-0.002)$$, $$x_{Ack20}=(0.004, 0.004,\ldots,0.004)$$, $$x_{Ack30}=(-0.002,0,-0.002,0,\ldots)$$, and $$x_{Ack40}=(0.004,0,0.004,0,\ldots)$$.
The numerical results in Table 3 show that the proposed algorithm performs the best among the four methods. The total CPU time of the proposed algorithm is the shortest. BFGS-Non performs better than BFGS-WP and BFGS-WP-Zhang, which is consistent with the results of . Additionally, BFGS-WP-Zhang performs better than BFGS-WP, which is consistent with the results of . To directly illustrate the performances of these four methods, we also use the tool of Dolan and Moré  to analyze the results with respect to NI and NFG in Table 3. Figures 4 and 5 show their performances.

Figure 4 Performance profiles of these methods (NI).

Figure 5 Performance profiles of these methods (NFG).

Figure 4 indicates that BFGS-WP can solve approximately 93% of the test problems and that the other three methods can solve all the problems. The proposed algorithm solves the problems in the shortest amount of time.

The performance in Figure 5 is similar to that in Figure 4. BFGS-WP can solve approximately 95% of the test problems, while the other methods can solve all the problems.

According to these two figures, the proposed algorithm has the best performance among these four methods, and the BFGS-WP performs the worst. In summary, based on the numerical results of the  and benchmark problems, the GLL nonmonotone line search with quasi-Newton update is more effective than the normal WWP line search with quasi-Newton update, which is consistent with the results of [47, 51]. Moreover, these numerical results indicate that the modified BFGS equation (1.10) is better than the normal BFGS update, which is consistent with the results of . Furthermore, the proposed algorithm is competitive with the related methods.

## 6 Conclusion

1. (i) This paper conducts a further study of the modified BFGS update formula in . The main contribution is the global convergence and superlinear convergence for generally convex functions. The numerical results show that the proposed method is competitive with other quasi-Newton methods for the test problems.

2. (ii) In contrast to  and , this paper achieves both superlinear and global convergence. Moreover, the convergence is obtained for generally convex functions, whereas the other two papers only obtained convergence for uniformly convex functions. The conditions of this paper are weaker than those of the previous research.

3. (iii) For further research, we should study the performance of the new algorithm under different stop rules and in different testing environments (such as ). Moreover, more numerical experiments for large practical problems should be performed in the future.

## Declarations

### Acknowledgements

The authors thank the referees for their valuable comments, which greatly improved their paper. 