A modified nonmonotone BFGS algorithm for unconstrained optimization
Xiangrong Li, Bopeng Wang and Wujie Hu
https://doi.org/10.1186/s13660-017-1453-5
© The Author(s) 2017
Received: 20 March 2017
Accepted: 14 July 2017
Published: 9 August 2017
Abstract
In this paper, a modified BFGS algorithm is proposed for unconstrained optimization. The proposed algorithm has the following properties: (i) a nonmonotone line search technique is used to obtain the step size \(\alpha_{k}\) to improve the effectiveness of the algorithm; (ii) the algorithm possesses not only global convergence but also superlinear convergence for generally convex functions; (iii) the algorithm produces better numerical results than those of the normal BFGS method.
Keywords
BFGS update; global convergence; superlinear convergence; nonmonotone
MSC
65K05; 90C26
1 Introduction
Formula 1 [37]
Formula 2 [38]
Some scholars have conducted further research to obtain a better approximation of the Hessian matrix of the objective function.
Formula 3 [39]
Formula 4 [41]
Formula 5 [42]
 (i) WWP line search technique. \(\alpha_{k}\) is determined by
$$ f(x_{k}+\alpha_{k}d_{k})\leq f(x_{k})+\delta \alpha_{k}g_{k}^{T}d_{k},\qquad g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq \sigma g_{k}^{T}d_{k}, $$(1.11)
where \(0<\delta <\sigma <1\). Recently, a modified WWP line search technique was proposed by Yuan, Wei, and Lu [46] to ensure that the BFGS and PRP methods converge globally for nonconvex functions; these two open problems have thus been solved. However, monotonicity may generate a series of extremely small steps if the contours of the objective function form a family of curves with large curvature [47]. Nonmonotone line searches for unconstrained optimization were proposed by Grippo et al. in [47–49] and further studied in [50]. Grippo, Lampariello, and Lucidi [47] proposed the following nonmonotone line search, called the GLL line search.
 (ii) GLL nonmonotone line search. \(\alpha_{k}\) is determined by
$$\begin{aligned}& f(x_{k+1}) \leq \max_{0\leq j \leq M_{0}}f(x_{k-j})+ \epsilon_{1}\alpha _{k}g_{k}^{T}d_{k}, \end{aligned}$$(1.12)
$$\begin{aligned}& g(x_{k+1})^{T}d_{k} \geq \max \bigl\{ \epsilon_{2}, 1-\bigl(\alpha_{k}\Vert d_{k} \Vert \bigr)^{p} \bigr\} g_{k}^{T}d_{k}, \end{aligned}$$(1.13)
where \(p\in (-\infty,1)\), \(k=0, 1, 2, \ldots\) , \(\epsilon_{1} \in (0,1)\), \(\epsilon_{2} \in (0,\frac{1}{2})\), and \(M_{0}\) is a nonnegative integer. By combining this line search with the normal BFGS formula, Han and Liu [51] established global convergence for convex objective functions; superlinear convergence was established by Yuan and Wei [52]. Although these nonmonotone techniques perform well in many cases, their numerical performance depends to some extent on the choice of \(M_{0}\) (see [47, 53, 54] for details). Zhang and Hager [55] presented another nonmonotone line search technique.
 (iii) Zhang and Hager nonmonotone line search technique [55]. In this technique \(\alpha_{k}\) is found from a nonmonotone condition based on the reference value \(C_{k}\), which is updated by
$$ Q_{k+1}=\eta_{k}Q_{k}+1,\qquad C_{k+1}= \frac{\eta_{k}Q_{k}C_{k}+f(x_{k+1})}{Q_{k+1}}, $$(1.14)
where \(\eta_{k}\in [\eta_{\min },\eta_{\max }]\), \(0\leq \eta_{\min } \leq \eta_{\max }\leq 1\), \(C_{0}=f(x_{0})\), and \(Q_{0}=1\). It is easy to see that \(C_{k+1}\) is a convex combination of \(C_{k}\) and \(f(x_{k+1})\). Numerical results show that this technique is more competitive than the nonmonotone method of [47], but its convergence analysis requires stronger assumptions.
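The averaging recursion (1.14) is simple to state in code. The sketch below (in Python, rather than the MATLAB used later for the experiments) assumes a fixed \(\eta_{k}\equiv \eta \in [\eta_{\min },\eta_{\max }]\); the function name is illustrative.

```python
# Sketch of the Zhang-Hager reference values C_0, C_1, ... from (1.14),
# assuming a constant eta_k = eta (an illustrative simplification).
def zhang_hager_refs(f_values, eta=0.85):
    """Given the sequence f(x_0), f(x_1), ..., return C_0, C_1, ..."""
    Q, C = 1.0, f_values[0]          # Q_0 = 1, C_0 = f(x_0)
    refs = [C]
    for fk1 in f_values[1:]:
        Q_next = eta * Q + 1.0       # Q_{k+1} = eta_k Q_k + 1
        C = (eta * Q * C + fk1) / Q_next
        Q = Q_next                   # C_{k+1} is a convex combination
        refs.append(C)               # of C_k and f(x_{k+1})
    return refs
```

With \(\eta =0\) the recursion collapses to \(C_{k}=f(x_{k})\), recovering a monotone Armijo-type reference value; with \(\eta =1\), \(C_{k}\) tends toward the running average of all function values.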

The GLL line search technique is used in the algorithm to ensure good convergence.

The major contribution of the new algorithm is an extension of the modified BFGS update from [43] and [42].

Another contribution is the proof of global convergence for generally convex functions.

The major aim of the proposed method is to establish the superlinear convergence and the global convergence for generally convex functions.

Numerical experiments on both standard unconstrained optimization problems and benchmark engineering problems indicate that the proposed algorithm is competitive with the normal method.
This paper is organized as follows. In the next section, we present the algorithm. The global convergence and superlinear convergence are established in Section 3 and Section 4, respectively. Numerical results are reported in Section 5. In the final section, we present a conclusion. Throughout this paper, \(\Vert \cdot \Vert \) denotes the Euclidean norm of a vector or matrix.
2 Algorithm
Algorithm 1
MNBFGSA (modified nonmonotone BFGS algorithm)
 Step 0::

Given a symmetric positive definite matrix \(B_{0}^{*}\) and an integer \(M_{0}>0\), choose an initial point \(x_{0} \in \Re^{n}\), \(0<\varepsilon <1\), \(0<\epsilon_{1}<\epsilon_{2}<1\), and \(p\in (-\infty,1)\). Set \(k:=0\).
 Step 1::

If \(\Vert g_{k}\Vert \leq \varepsilon\), stop; otherwise, go to the next step.
 Step 2::

Solve
$$ B_{k}^{*}d_{k}+g_{k}=0 $$(2.3)
to obtain \(d_{k}\).
 Step 3::

Determine the step size \(\alpha_{k}\) by the GLL nonmonotone line search (1.12) and (1.13).
 Step 4::

Let \(x_{k+1}=x_{k}+\alpha_{k}d_{k}\).
 Step 5::

Generate \(B_{k+1}^{*}\) from (2.1) and set \(k=k+1\); Go to Step 1.
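The steps above can be sketched as a single loop. Since the modified update (2.1) and the vector \(y_{k}^{*}\) are not reproduced in this excerpt, the sketch below substitutes the standard BFGS update with a curvature safeguard, and enforces the sufficient-decrease part (1.12) of the GLL search by backtracking, accepting the trial step after 25 attempts as described in Section 5; all names are illustrative.

```python
import numpy as np

def nonmonotone_bfgs(f, grad, x0, M0=8, delta=0.1, eps=1e-5, max_iter=500):
    """Sketch of Algorithm 1 with the standard BFGS update standing in
    for the paper's modified formula (2.1)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                    # B_0 = I (Step 0)
    f_hist = [f(x)]
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:      # Step 1: stopping test
            break
        d = np.linalg.solve(B, -g)        # Step 2: solve B_k d_k = -g_k
        # Step 3: backtrack until the GLL condition (1.12) holds,
        # accepting the current step after 25 trials
        f_ref = max(f_hist[-(M0 + 1):])
        alpha, gtd = 1.0, g @ d
        for _ in range(25):
            if f(x + alpha * d) <= f_ref + delta * alpha * gtd:
                break
            alpha *= 0.5
        s = alpha * d                     # Step 4: x_{k+1} = x_k + alpha_k d_k
        x = x + s
        y = grad(x) - g
        if s @ y > 1e-10:                 # Step 5: BFGS update, skipped when
            Bs = B @ s                    # the curvature condition fails
            B += np.outer(y, y) / (s @ y) - np.outer(Bs, Bs) / (s @ Bs)
        f_hist.append(f(x))
    return x
```

On a strictly convex quadratic, this loop reaches the minimizer in a few iterations.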
3 Global convergence
The following assumptions are required to obtain the global convergence of Algorithm 1.
Assumption A
 (i)
The level set \(L_{0}=\{x \mid f(x) \le f(x_{0}) \}\) is bounded.
 (ii)The objective function f is continuously differentiable and convex on \(L_{0}\). Moreover, there exists a constant \(L\ge 0\) satisfying$$ \bigl\Vert g(x)-g(y)\bigr\Vert \le L\Vert x-y \Vert ,\quad \forall x, y \in L_{0}. $$(3.1)
Lemma 3.1
The proof is similar to that in [41], so it is not presented here.
Lemma 3.2
Lemma 3.3
Proof
For \(k=0\), by the positive definiteness of \(B_{0}\), we have \(s_{0}^{T}y_{0}^{*}>0\). Then \(B_{1}\) is generated by (2.1), and \(B_{1}\) is positive definite. Assume that \(B_{k}\) is positive definite; for all \(k\geq 1\), we prove that \(s_{k}^{T}y_{k}^{*}>0\) holds by the following three cases.
Case 3: \(\bar{A}_{k}>0\). The proof can be found in [41].
Similar to the proof of Theorem 3.1 in [51], we can establish the global convergence theorem of Algorithm 1. Here, we state the theorem but omit the proof. □
Theorem 3.1
4 Superlinear convergence analysis
Based on Theorem 3.1, we suppose that \(x^{*}\) is the limit of the sequence \(\{x_{k}\}\). To establish the superlinear convergence of Algorithm 1, the following additional assumption is needed.
Assumption B
In a way similar to [41], we can obtain the superlinear convergence of Algorithm 1, which we state below without proof.
5 Numerical results
This section reports the numerical results of Algorithm 1. All code was written in MATLAB 7.0 and run on a PC with a 2.60 GHz CPU, 256 MB of memory, and the Windows XP operating system. The parameters are chosen as \(\delta =0.1\), \(\sigma =0.9\), \(\varepsilon =10^{-5}\), \(\epsilon_{1}=0.1\), \(\epsilon_{2}=0.01\), \(p=-5\), \(M_{0}=8\), and the initial matrix \(B_{0}=I\) is the identity matrix. Since the line search cannot ensure the descent condition \(d_{k}^{T}g_{k}<0\), an uphill search direction may occur in the numerical experiments, in which case the line search rule may fail. To avoid this, the step size \(\alpha_{k}\) is accepted if the number of line search trials exceeds 25. The Himmelblau stop rule is as follows: if \(\vert f(x_{k})\vert > e_{1}\), let \(\mathit{stop}1=\frac{\vert f(x_{k})-f(x_{k+1})\vert }{\vert f(x_{k})\vert }\); otherwise, let \(\mathit{stop}1=\vert f(x_{k})-f(x_{k+1})\vert \). In the experiments, the program terminates if \(\Vert g(x)\Vert < \varepsilon \) or \(\mathit{stop}1 < e_{2}\) holds, where \(e_{1}=e_{2}=10^{-5}\).
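The Himmelblau stop rule described above can be sketched compactly (a Python sketch with illustrative names; the experiments themselves were run in MATLAB):

```python
def himmelblau_stop(fk, fk1, gnorm, eps=1e-5, e1=1e-5, e2=1e-5):
    """Return True when the run should stop: either the gradient norm is
    below eps, or the decrease of f (relative when |f(x_k)| > e1,
    absolute otherwise) falls below e2."""
    if abs(fk) > e1:
        stop1 = abs(fk - fk1) / abs(fk)   # relative decrease
    else:
        stop1 = abs(fk - fk1)             # absolute decrease near f = 0
    return gnorm < eps or stop1 < e2
```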
5.1 Test problems from [57]
 Problem::

the name of the test problem;
 Dim::

the dimensions of the problem;
 NI::

the total number of iterations;
 Time::

the CPU time in seconds;
 NFG::

\(NFG=NF+5NG\), where NF and NG are the total number of function and gradient evaluations, respectively (see [47]).
Table 1 Numerical results
Problem  Dim  BFGSWP NI/NFG/Time  BFGSWPZhang NI/NFG/Time  BFGSNon NI/NFG/Time  BFGSMNon NI/NFG/Time 

ROSE  2  35/590/4.506480e−002  31/611/4.882020e−002  2/19/6.259000e−003  2/19/6.259000e−003 
FROTH  2  9/116/1.376980e−002  7/90/1.001440e−002  2/19/6.259000e−003  2/19/7.510800e−003 
BADSCP  2  43/706/5.507920e−002  43/706/5.507920e−002  8/264/2.753960e−002  8/264/2.753960e−002 
BADSCB  2  3/60/1.126620e−002  3/60/1.001440e−002  3/32/7.510800e−003  3/32/6.259000e−003 
BEALE  2  15/220/2.128060e−002  16/226/2.002880e−002  2/19/6.259000e−003  2/19/6.259000e−003 
JENSAM  2  2/42/1.126620e−002  2/42/1.001440e−002  2/19/6.259000e−003  2/19/8.762600e−003 
HELIX  3  34/483/4.381300e−002  23/325/3.004320e−002  169/2,191/2.090506e−001  87/1,163/1.114102e−001 
BARD  3  16/229/3.004320e−002  14/182/2.503600e−002  72/930/1.226764e−001  72/930/1.226764e−001 
GAUSS  3  2/19/6.259000e−003  2/19/6.259000e−003  2/19/7.510800e−003  2/19/7.510800e−003 
MEYER  3  2/42/1.376980e−002  2/42/1.251800e−002  2/32/1.126620e−002  2/32/1.251800e−002 
GULF  3  2/42/1.502160e−002  2/42/1.502160e−002  2/19/3.755400e−003  2/19/1.001440e−002 
BOX  3  2/42/1.251800e−002  2/42/1.126620e−002  2/19/7.510800e−003  2/19/8.762600e−003 
SING  4  20/280/2.503600e−002  18/269/2.503600e−002  2/19/6.259000e−003  2/19/7.510800e−003 
WOOD  4  19/271/2.628780e−002  20/289/2.753960e−002  2/19/6.259000e−003  2/19/6.259000e−003 
KOWOSB  4  21/295/3.505040e−002  23/324/3.630220e−002  83/1,077/1.314390e−001  104/1,345/1.664894e−001 
BD  4  17/244/3.505040e−002  19/276/3.880580e−002  2/19/7.510800e−003  2/19/1.001440e−002 
OSB1  5  2/42/2.128060e−002  2/42/1.877700e−002  2/19/7.510800e−003  2/19/1.001440e−002 
BIGGS  6  25/322/4.506480e−002  7/108/2.253240e−002  15/330/4.381300e−002  21/287/4.130940e−002 
OSB2  11  3/56/6.259000e−002  3/56/6.259000e−002  3/33/1.877700e−002  3/33/2.002880e−002 
WATSON  20  31/457/3.880580e−001  29/412/3.555112e−001  2/19/2.002880e−002  2/19/2.253240e−002 
ROSEX  100  229/3,704/1.268073e+000  276/4,359/1.512174e+000  2/19/1.126620e−002  2/19/1.251800e−002 
SINGX  400  65/922/1.174939e+001  155/2,375/2.844465e+001  2/19/2.065470e−001  2/19/2.115542e−001 
PEN1  400  2/47/7.247922e−001  2/47/7.310512e−001  2/19/1.940290e−001  2/19/1.927772e−001 
PEN2  200  2/25/6.884900e−002  2/25/6.634540e−002  2/19/6.008640e−002  2/19/6.384180e−002 
VARDIM  100  2/47/2.879140e−002  2/47/2.879140e−002  2/19/1.001440e−002  2/19/8.762600e−003 
TRIG  500  9/138/1.627340e+002  9/144/1.671604e+002  8/146/1.700345e+002  50/876/1.039274e+003 
BV  500  2/19/3.492522e−001  2/19/3.492522e−001  2/19/3.480004e−001  2/19/3.517558e−001 
IE  500  6/71/7.711088e+000  6/71/7.706081e+000  6/71/7.722354e+000  6/71/7.772426e+000 
TRID  500  53/760/1.622333e+001  50/727/1.501159e+001  564/7,325/1.690631e+002  564/7,325/1.692333e+002 
BAND  500  12/275/5.551733e+000  12/238/4.696754e+000  2/19/4.781876e−001  2/19/4.431372e−001 
LIN  500  2/19/4.719286e−001  2/19/4.744322e−001  2/19/4.806912e−001  2/19/4.719286e−001 
LIN1  500  3/32/9.363464e−001  3/32/9.388500e−001  3/31/9.050514e−001  3/31/9.025478e−001 
LIN0  500  3/32/1.165426e+000  3/32/1.161670e+000  3/31/1.119109e+000  3/31/1.130375e+000 
In Table 1, ‘BFGSWP’, ‘BFGSNon’, ‘BFGSWPZhang’, and ‘BFGSMNon’ stand for the normal BFGS formula with WWP rule, the normal BFGS formula with GLL rule, the modified BFGS equation (1.10) with WWP rule, and MNBFGSA, respectively. The numerical results in Table 1 indicate that the proposed method is competitive with the other three similar methods.
Figure 1 shows that BFGSMNon and BFGSNon outperform BFGSWP and BFGSWPZhang on approximately 9% and 6% of the problems, respectively. The BFGSWPZhang and BFGSWP methods can successfully solve 94% and 91% of the test problems, respectively.
Figure 2 shows that BFGSMNon and BFGSNon are superior to BFGSWP and BFGSWPZhang on approximately 12% and 9% of these problems, respectively. The BFGSMNon and BFGSNon methods solve 100% of the test problems at \(t\approx 10\). The BFGSWPZhang and BFGSWP methods solve 91% and 88% of the test problems, respectively.
Figure 3 shows that the success rates when using the BFGSMNon and BFGSNon methods to address the test problems are higher than the success rates when using BFGSWP and BFGSWPZhang by approximately 6% and 9%, respectively. Additionally, the BFGSMNon and BFGSNon algorithms can address almost all the test problems. Moreover, BFGSWPZhang has better results than BFGSWP.
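Figures 1-3 are performance profiles in the sense of Dolan and Moré [58]: for each solver s, \(\rho_{s}(\tau)\) is the fraction of problems on which the cost of s (NI, NFG, or CPU time) is within a factor τ of the best solver on that problem. A minimal sketch of the computation (illustrative names; failures marked with infinity):

```python
import numpy as np

def performance_profile(T, taus):
    """T[p, s]: cost of solver s on problem p (np.inf marks a failure).
    Returns rho[s, i]: the fraction of problems on which solver s is
    within a factor taus[i] of the best solver on that problem."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)     # best cost on each problem
    ratios = T / best                       # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])
```

Plotting each row of the returned array against τ reproduces a profile curve: the value at \(\tau =1\) is the share of problems on which a solver is best, and the limiting value for large τ is its success rate.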
5.2 Benchmark problems
Table 2 Definition of the benchmark problems and their features
Function  Definition  Multimodal?  Separable?  Regular? 

Sphere  \(f_{Sph}(x)=\sum_{i=1}^{p}x_{i}^{2}\)  no  yes  n/a 
\(x_{i}\in [-5.12,5.12]\), \(x^{*}=(0,0,\ldots,0)\), \(f_{Sph}(x^{*})=0\).  
Schwefel’s  \(f_{SchDS}(x)=\sum_{i=1}^{p}(\sum_{j=1}^{i}x_{j})^{2}\)  no  no  n/a 
\(x_{i}\in [-65.536,65.536]\), \(x^{*}=(0,0,\ldots,0)\), \(f_{SchDS}(x^{*})=0\).  
Griewank  \(f_{Gri}(x)=1+\sum_{i=1}^{p}\frac{x_{i}^{2}}{4{,}000}-\prod_{i=1}^{p}\cos \frac{x_{i}}{\sqrt{i}}\)  yes  no  yes 
\(x_{i}\in [-600,600]\), \(x^{*}=(0,0,\ldots,0)\), \(f_{Gri}(x^{*})=0\).  
Rosenbrock  \(f_{Ros}(x)=\sum_{i=1}^{p-1}[100(x_{i+1}-x_{i}^{2})^{2}+(x_{i}-1)^{2}]\)  no  no  n/a 
\(x_{i}\in [-2.048,2.048]\), \(x^{*}=(1,1,\ldots,1)\), \(f_{Ros}(x^{*})=0\).  
Ackley  \(f_{Ack}(x)=20+e-20 e^{-0.2\sqrt{\frac{1}{p}\sum _{i=1}^{p}x_{i}^{2}}}-e^{\frac{1}{p}\sum _{i=1}^{p}\cos (2\pi x_{i})}\)  yes  no  yes 
\(x_{i}\in [-30,30]\), \(x^{*}=(0,0,\ldots,0)\), \(f_{Ack}(x^{*})=0\). 
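For reference, the five benchmark functions of the table can be written out directly. The sketch below uses their standard forms, chosen to be consistent with the stated minima \(f(x^{*})=0\):

```python
import math

def sphere(x):
    return sum(xi * xi for xi in x)

def schwefel_ds(x):
    # Schwefel's double-sum: sum over i of (x_1 + ... + x_i)^2
    return sum(sum(x[:i + 1]) ** 2 for i in range(len(x)))

def griewank(x):
    s = sum(xi * xi for xi in x) / 4000.0
    prod = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return 1.0 + s - prod

def rosenbrock(x):
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))

def ackley(x):
    p = len(x)
    rms = math.sqrt(sum(xi * xi for xi in x) / p)
    return (20.0 + math.e
            - 20.0 * math.exp(-0.2 * rms)
            - math.exp(sum(math.cos(2.0 * math.pi * xi) for xi in x) / p))
```

Only Griewank and Ackley are multimodal; the other three are unimodal, which matches the features listed in the table.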
Table 3 Numerical results of the benchmark problems
Problem/ \(\boldsymbol{x_{0}}\)  Dim  BFGSWP NI/NFG/Time  BFGSWPZhang NI/NFG/Time  BFGSNon NI/NFG/Time  BFGSMNon NI/NFG/Time 

Sphere/\(x_{Sph10}\)  30  2/19/1.562500e−001  2/19/1.562500e−002  2/19/4.687500e−002  2/19/4.687500e−002 
500  2/19/2.031250e−001  2/19/3.125000e−001  2/19/2.656250e−001  2/19/2.187500e−001  
1,000  2/19/1.015625e+000  2/19/1.093750e+000  2/19/1.062500e+000  2/19/1.046875e+000  
Sphere/\(x_{Sph20}\)  30  2/19/0  2/19/0  2/19/0  2/19/0 
500  2/19/1.875000e−001  2/19/2.500000e−001  2/19/2.187500e−001  2/19/1.875000e−001  
1,000  2/19/9.531250e−001  2/19/1.046875e+000  2/19/1.031250e+000  2/19/1.218750e+000  
Sphere/\(x_{Sph30}\)  30  2/19/0  2/19/0  2/19/0  2/19/0 
500  2/19/2.031250e−001  2/19/2.812500e−001  2/19/2.343750e−001  2/19/1.718750e−001  
1,000  2/19/1.015625e+000  2/19/9.687500e−001  2/19/9.531250e−001  2/19/9.843750e−001  
Sphere/\(x_{Sph40}\)  30  2/19/0  2/19/0  2/19/0  2/19/0 
500  2/19/1.718750e−001  2/19/2.343750e−001  2/19/2.187500e−001  2/19/1.250000e−001  
1,000  2/19/9.218750e−001  2/19/1  2/19/1  2/19/1.015625e+000  
Schwefel’s/\(x_{SchDs10}\)  30  3/32/0  3/32/6.250000e−002  3/32/6.250000e−002  3/32/0 
50  3/32/0  3/32/0  3/32/6.250000e−002  3/32/6.250000e−002  
100  4/45/1.562500e−001  4/45/2.500000e−001  6/70/3.750000e−001  6/70/4.062500e−001  
Schwefel’s/\(x_{SchDs20}\)  30  2/19/6.250000e−002  2/19/0  2/19/0  2/19/0 
50  2/19/0  2/19/6.250000e−002  2/19/0  2/19/0  
100  3/32/1.875000e−001  3/32/1.250000e−001  3/32/1.875000e−001  3/32/1.718750e−001  
Schwefel’s/\(x_{SchDs30}\)  30  3/32/0  3/32/6.250000e−002  3/32/0  3/32/0 
50  3/32/6.250000e−002  3/32/0  3/32/0  3/32/6.250000e−002  
100  3/32/1.875000e−001  3/32/1.250000e−001  3/32/1.875000e−001  3/32/1.250000e−001  
Schwefel’s/\(x_{SchDs40}\)  30  2/19/0  2/19/0  2/19/0  2/19/0 
50  2/19/0  2/19/6.250000e−002  2/19/0  2/19/0  
100  2/19/6.250000e−002  2/19/6.250000e−002  2/19/1.250000e−001  2/19/6.250000e−002  
Griewank/\(x_{Gri10}\)  30  3/37/0  3/37/0  11/258/6.250000e−002  9/130/6.250000e−002 
500  2/24/5.781250e−001  2/24/5.312500e−001  2/24/5.781250e−001  2/24/6.406250e−001  
1,000  2/24/1.984375e+000  2/24/1.656250e+000  2/24/1.671875e+000  2/24/1.625000e+000  
Griewank/\(x_{Gri20}\)  30  4/75/0  4/75/4.687500e−002  4/59/0  4/58/0 
500  2/24/6.718750e−001  2/24/3.437500e−001  2/24/4.062500e−001  2/24/6.562500e−001  
1,000  2/24/1.765625e+000  2/24/1.796875e+000  2/24/1.859375e+000  2/24/1.640625e+000  
Griewank/\(x_{Gri30}\)  30  3/38/0  3/37/4.687500e−002  11/394/1.250000e−001  9/178/0 
500  2/24/5.625000e−001  2/24/5.468750e−001  2/24/5.625000e−001  2/24/5.781250e−001  
1,000  2/24/2.046875e+000  2/24/1.531250e+000  2/24/1.468750e+000  2/24/1.421875e+000  
Griewank/\(x_{Gri40}\)  30  15/200/6.250000e−002  19/249/6.250000e−002  9/502/6.250000e−002  18/446/1.250000e−001 
500  2/24/6.093750e−001  2/24/2.968750e−001  2/24/5.468750e−001  2/24/5.468750e−001  
1,000  2/24/1.843750e+000  2/24/1.468750e+000  2/24/1.828125e+000  2/24/1.781250e+000  
Rosenbrock/\(x_{Ros10}\)  30  34/483/1.406250e−001  5/116/0  2/19/0  2/19/0 
500  30/419/3.431250e+001  5/116/2.031250e+000  2/19/2.187500e−001  2/19/1.875000e−001  
1,000  28/393/2.136875e+002  6/152/2.207813e+001  2/19/1.078125e+000  2/19/9.375000e−001  
Rosenbrock/\(x_{Ros20}\)  30  30/467/9.375000e−002  5/121/0  2/19/0  2/19/0 
500  16/268/1.650000e+001  3/38/6.250000e−001  2/19/1.875000e−001  2/19/2.187500e−001  
1,000  17/286/1.181094e+002  3/38/3.453125e+000  2/19/1.062500e+000  2/19/9.062500e−001  
Rosenbrock/\(x_{Ros30}\)  30  8/134/0  7/141/0  2/19/0  2/19/0 
500  9/154/6.828125e+000  6/110/3.546875e+000  2/19/2.031250e−001  2/19/2.187500e−001  
1,000  7/115/3.090625e+001  5/92/1.373438e+001  2/19/1.125000e+000  2/19/1.156250e+000  
Rosenbrock/\(x_{Ros40}\)  30  8/140/0  5/102/0  2/19/6.250000e−002  2/19/0 
500  12/186/1.185938e+001  6/105/5.203125e+000  2/19/2.343750e−001  2/19/2.031250e−001  
1,000  15/226/101  6/105/2.275000e+001  2/19/1.062500e+000  2/19/1.015625e+000  
Ackley/\(x_{Ack10}\)  30  5/68/6.250000e−002  6/80/0  6/83/0  6/80/0 
500  5/67/2.343750e+000  5/64/1.937500e+000  5/67/2.046875e+000  5/68/2.171875e+000  
1,000  5/66/1.407813e+001  6/79/2.229688e+001  5/66/1.410938e+001  6/79/2.278125e+001  
Ackley/\(x_{Ack20}\)  30  2/42/0  2/42/0  7/99/6.250000e−002  7/97/6.250000e−002 
500  6/79/3.250000e+000  6/77/3.640625e+000  6/79/3.671875e+000  6/77/3.593750e+000  
1,000  5/66/1.354688e+001  5/63/1.443750e+001  5/65/1.423438e+001  5/66/1.429688e+001  
Ackley/\(x_{Ack30}\)  30  9/126/0  5/67/0  9/126/6.250000e−002  6/83/0 
500  6/88/3.500000e+000  4/50/1.187500e+000  6/88/3.437500e+000  6/78/2.828125e+000  
1,000  4/53/7.531250e+000  4/51/7.671875e+000  7/95/3.085938e+001  6/77/2.229688e+001  
Ackley/\(x_{Ack40}\)  30  4/56/6.250000e−002  4/57/6.250000e−002  8/108/0  7/92/4.687500e−002 
500  4/55/1.343750e+000  4/54/1.015625e+000  7/98/4.062500e+000  7/92/4.562500e+000  
1,000  6/84/2.232813e+001  6/79/2.256250e+001  6/84/2.310938e+001  6/77/2.254688e+001  
Total CPU Time  516.1562  161.5781  115.0938  115.0156 
However, the effectiveness of one algorithm relative to another cannot be determined solely from the number of problems that it solves better. The ‘no free lunch’ theorem (see [61]) states that if two search algorithms are compared over all possible functions, their average performance will be the same. As a result, attempting to construct a perfect test set containing all functions, in order to determine whether one algorithm is better than another for every function, is fruitless. Therefore, when evaluating an algorithm, we identify the types of problems on which it performs well in order to characterize the problems for which it is suitable. Previous studies of the functions to be optimized have constructed test sets with a better selection of fewer functions (see [62, 63]), which enables conclusions to be drawn about the performance of an algorithm depending on the type of function.
Figure 4 indicates that BFGSWP can solve approximately 93% of the test problems and that the other three methods can solve all the problems. The proposed algorithm solves the problems in the shortest amount of time.
The performance in Figure 5 is similar to that in Figure 4. BFGSWP can solve approximately 95% of the test problems, while the other methods can solve all the problems.
According to these two figures, the proposed algorithm has the best performance among the four methods, and BFGSWP performs the worst. In summary, based on the numerical results for the test problems from [57] and the benchmark problems, the GLL nonmonotone line search with a quasi-Newton update is more effective than the normal WWP line search with a quasi-Newton update, which is consistent with the results of [47, 51]. Moreover, these numerical results indicate that the modified BFGS equation (1.10) is better than the normal BFGS update, which is consistent with the results of [42]. Furthermore, the proposed algorithm is competitive with the related methods.
6 Conclusion
 (i)
This paper conducts a further study of the modified BFGS update formula in [43]. The main contribution is the global convergence and superlinear convergence for generally convex functions. The numerical results show that the proposed method is competitive with other quasi-Newton methods on the test problems.
 (ii)
In contrast to [42] and [43], this paper achieves both superlinear and global convergence. Moreover, the convergence is obtained for generally convex functions, whereas the other two papers only obtained convergence for uniformly convex functions. The conditions of this paper are weaker than those of the previous research.
 (iii)
For further research, the performance of the new algorithm should be studied under different stop rules and in different testing environments (such as [66]). Moreover, more numerical experiments on large practical problems should be performed in the future.
Declarations
Acknowledgements
The authors thank the referees for their valuable comments, which greatly improved their paper.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. Fu, Z, Wu, X, Guan, C, et al.: Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Trans. Inf. Forensics Secur. 11(12), 2706-2716 (2016)
2. Gu, B, Sheng, VS, Tay, KY, et al.: Incremental support vector learning for ordinal regression. IEEE Trans. Neural Netw. Learn. Syst. 26(7), 1403-1416 (2015)
3. Gu, B, Sun, X, Sheng, VS: Structural minimax probability machine. IEEE Trans. Neural Netw. Learn. Syst. 99, 1-11 (2016)
4. Li, J, Li, X, Yang, B, et al.: Segmentation-based image copy-move forgery detection scheme. IEEE Trans. Inf. Forensics Secur. 10(3), 507-518 (2015)
5. Pan, Z, Zhang, Y, Kwong, S: Efficient motion and disparity estimation optimization for low complexity multiview video coding. IEEE Trans. Broadcast. 61(2), 166-176 (2015)
6. Pan, Z, Lei, J, Zhang, Y, et al.: Fast motion estimation based on content property for low-complexity H.265/HEVC encoder. IEEE Trans. Broadcast. 99, 1-10 (2016)
7. Yuan, G, Lu, S, Wei, Z: A new trust-region method with line search for solving symmetric nonlinear equations. Int. J. Comput. Math. 88(10), 2109-2123 (2011)
8. Yuan, G, Meng, Z, Li, Y: A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 168(1), 129-152 (2016)
9. Yuan, G, Wei, Z: The Barzilai and Borwein gradient method with nonmonotone line search for nonsmooth convex optimization problems. Math. Model. Anal. 17(2), 203-216 (2012)
10. Yuan, G, Wei, Z, Li, G: A modified Polak-Ribière-Polyak conjugate gradient algorithm for nonsmooth convex programs. J. Comput. Appl. Math. 255, 86-96 (2014)
11. Yuan, G, Wei, Z, Lu, S: Limited memory BFGS method with backtracking for symmetric nonlinear equations. Math. Comput. Model. 54(1-2), 367-377 (2011)
12. Yuan, G, Wei, Z, Lu, X: A BFGS trust-region method for nonlinear equations. Computing 92(4), 317-333 (2011)
13. Yuan, G, Wei, Z, Wang, Z: Gradient trust region algorithm with limited memory BFGS update for nonsmooth convex minimization. Comput. Optim. Appl. 54(1), 45-64 (2013)
14. Yuan, G, Yao, S: A BFGS algorithm for solving symmetric nonlinear equations. Optimization 62(1), 85-99 (2013)
15. Yuan, G, Zhang, M: A three-terms Polak-Ribière-Polyak conjugate gradient algorithm for large-scale nonlinear equations. J. Comput. Appl. Math. 286, 186-195 (2015)
16. Yuan, G, Zhang, M: A modified Hestenes-Stiefel conjugate gradient algorithm for large-scale optimization. Numer. Funct. Anal. Optim. 34(8), 914-937 (2013)
17. Schropp, J: A note on minimization problems and multistep methods. Numer. Math. 78(1), 87-101 (1997)
18. Schropp, J: One-step and multistep procedures for constrained minimization problems. IMA J. Numer. Anal. 20(1), 135-152 (2000)
19. Wyk, DV: Differential optimization techniques. Appl. Math. Model. 8(6), 419-424 (1984)
20. Vrahatis, MN, Androulakis, GS, Lambrinos, JN, et al.: A class of gradient unconstrained minimization algorithms with adaptive stepsize. J. Comput. Appl. Math. 114(2), 367-386 (2000)
21. Yuan, G: Modified nonlinear conjugate gradient methods with sufficient descent property for large-scale optimization problems. Optim. Lett. 3(1), 11-21 (2009)
22. Yuan, G, Duan, X, Liu, W, et al.: Two new PRP conjugate gradient algorithms for minimization optimization models. PLoS ONE 10(10), e0140071 (2015)
23. Yuan, G, Wei, Z: New line search methods for unconstrained optimization. J. Korean Stat. Soc. 38(1), 29-39 (2009)
24. Yuan, G, Wei, Z: A trust region algorithm with conjugate gradient technique for optimization problems. Numer. Funct. Anal. Optim. 32(2), 212-232 (2011)
25. Yuan, G, Wei, Z, Zhao, Q: A modified Polak-Ribière-Polyak conjugate gradient algorithm for large-scale optimization problems. IIE Trans. 46(4), 397-413 (2014)
26. Broyden, C: The convergence of a class of double rank minimization algorithms. J. Inst. Math. Appl. 6(1), 222-231 (1970)
27. Fletcher, R: A new approach to variable metric algorithms. Comput. J. 13(2), 317-322 (1970)
28. Goldfarb, D: A family of variable metric methods derived by variational means. Math. Comput. 24(109), 23-26 (1970)
29. Shanno, DF: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(4), 647-650 (1970)
30. Broyden, CG, Dennis, JE, Moré, JJ: On the local and superlinear convergence of quasi-Newton methods. J. Inst. Math. Appl. 12(3), 223-245 (1973)
31. Byrd, RH, Nocedal, J: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Sci. Comput. 26(3), 727-739 (1989)
32. Byrd, RH: Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24(5), 1171-1190 (1987)
33. Dennis, JE: Quasi-Newton methods, motivation and theory. SIAM Rev. 19(1), 46-89 (1977)
34. Dennis, JE: A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comput. 28(126), 549-560 (1974)
35. Dai, YH: Convergence properties of the BFGS algorithm. SIAM J. Optim. 13(3), 693-701 (2002)
36. Mascarenhas, WF: The BFGS method with exact line searches fails for nonconvex objective functions. Math. Program. 99(1), 49-61 (2004)
37. Li, DH, Fukushima, M: A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 129(1-2), 15-35 (2001)
38. Li, DH, Fukushima, M: On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM J. Optim. 11(4), 1054-1064 (1999)
39. Wei, Z, Yu, G, Yuan, G, et al.: The superlinear convergence of a modified BFGS-type method for unconstrained optimization. Comput. Optim. Appl. 29(3), 315-332 (2004)
40. Wei, Z, Li, G, Qi, L: New quasi-Newton methods for unconstrained optimization problems. Appl. Math. Comput. 175(2), 1156-1188 (2006)
41. Yuan, G, Wei, Z: Convergence analysis of a modified BFGS method on convex minimizations. Comput. Optim. Appl. 47(2), 237-255 (2010)
42. Zhang, JZ, Deng, NY, Chen, LH: New quasi-Newton equation and related methods for unconstrained optimization. J. Optim. Theory Appl. 102(1), 147-167 (1999)
43. Yuan, G, Wei, Z, Wu, Y: Modified limited memory BFGS method with nonmonotone line search for unconstrained optimization. J. Korean Math. Soc. 47(4), 767-788 (2010)
44. Davidon, WC: Variable metric method for minimization. SIAM J. Optim. 1(1), 1-17 (1991)
45. Powell, MJD: A new algorithm for unconstrained optimization. In: Nonlinear Programming, pp. 31-65. Academic Press, New York (1970)
46. Yuan, G, Wei, Z, Lu, X: Global convergence of BFGS and PRP methods under a modified weak Wolfe-Powell line search. Appl. Math. Model. 47, 811-825 (2017)
47. Grippo, L, Lampariello, F, Lucidi, S: A nonmonotone line search technique for Newton’s method. SIAM J. Sci. Comput. 23(4), 707-716 (1986)
48. Grippo, L, Lampariello, F, Lucidi, S: A truncated Newton method with nonmonotone line search for unconstrained optimization. J. Optim. Theory Appl. 60(3), 401-419 (1989)
49. Grippo, L, Lampariello, F, Lucidi, S: A class of nonmonotone stabilization methods in unconstrained optimization. Numer. Math. 59(1), 779-805 (1991)
50. Liu, G, Han, J, Sun, D: Global convergence of the BFGS algorithm with nonmonotone linesearch. Optimization 34(2), 147-159 (1995)
51. Han, J, Liu, G: Global convergence analysis of a new nonmonotone BFGS algorithm on convex objective functions. Comput. Optim. Appl. 7(3), 277-289 (1997)
52. Yuan, GL, Wei, ZX: The superlinear convergence analysis of a nonmonotone BFGS algorithm on convex objective functions. Acta Math. Sin. Engl. Ser. 24(1), 35-42 (2008)
53. Raydan, M: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Sci. Comput. 7(1), 26-33 (1997)
54. Toint, PL: An assessment of nonmonotone linesearch techniques for unconstrained optimization. SIAM J. Sci. Comput. 17(3), 725-739 (2012)
55. Zhang, H, Hager, WW: A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 14(4), 1043-1056 (2006)
56. Powell, MJD: Some properties of the variable metric algorithm. In: Numerical Methods for Nonlinear Optimization, pp. 1-17. Academic Press, London (1972)
57. Moré, JJ, Garbow, BS, Hillstrom, KE: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17-41 (1981)
58. Dolan, ED, Moré, JJ: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201-213 (2002)
59. Hadley, G: Nonlinear and Dynamic Programming. Addison-Wesley, New Jersey (1964)
60. Friedman, JH: An overview of predictive learning and function approximation. In: Cherkassky, V, Friedman, JH, Wechsler, H (eds.) From Statistics to Neural Networks, Theory and Pattern Recognition Applications. NATO ASI Series F, vol. 136, pp. 1-61. Springer, Berlin (1994)
61. Wolpert, DH, Macready, WG: No free-lunch theorems for search. Technical Report 95-02-010, Santa Fe Institute (1995)
62. Salomon, R: Re-evaluating genetic algorithm performance under coordinate rotation of benchmark functions. Biosystems 39(3), 263-278 (1996)
63. Whitley, D, Mathias, K, Rana, S, Dzubera, J: Building better test functions. In: Eshelman, L (ed.) Sixth International Conference on Genetic Algorithms, pp. 239-246. Kaufmann, California (1995)
64. Yuan, G, Lu, X, Wei, Z: A conjugate gradient method with descent direction for unconstrained optimization. J. Comput. Appl. Math. 233(2), 519-530 (2009)
65. Yuan, G, Lu, X, Wei, Z: BFGS trust-region method for symmetric nonlinear equations. J. Comput. Appl. Math. 230(1), 44-58 (2009)
66. Gould, NIM, Orban, D, Toint, PL: CUTEr and SifDec: a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29(4), 373-394 (2003)