A modified nonmonotone BFGS algorithm for unconstrained optimization

In this paper, a modified BFGS algorithm is proposed for unconstrained optimization. The proposed algorithm has the following properties: (i) a nonmonotone line search technique is used to obtain the step size $\alpha_k$ to improve the effectiveness of the algorithm; (ii) the algorithm possesses not only global convergence but also superlinear convergence for generally convex functions; (iii) the algorithm produces better numerical results than those of the normal BFGS method.

where s k = x k+x k , y k = g k+g k , and g k+ = ∇f (x k+ ). The following update of B k : where δ k = y k + (max{, -y T k s k s k  } + φ( g k ))s k and function φ : → satisfies: (i) φ(t) >  for all t > ; (ii) φ(t) =  if and only if t = ; (iii) if t is in a bounded set, and φ(t) is bounded. Using the definition of δ k , it is not difficult to obtain δ T k s k ≥ max s T k y k , φ g k s k  > .
This is sufficient to guarantee the positive definiteness of $B_{k+1}$ as long as $B_k$ is positive definite. Li and Fukushima presented $\varphi(t) = \mu t$ with some constant $\mu > 0$. A related update has also been proposed, where $\delta_k$, $\varphi$, and their properties are the same as those in the formula above. For nonconvex functions, these two methods possess global convergence and superlinear convergence.
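To make the role of $\delta_k$ concrete, the following sketch computes the Li–Fukushima correction in Python with the choice $\varphi(t) = \mu t$; the value of $\mu$ and the function name are illustrative assumptions, not part of the original published method.

```python
import numpy as np

def li_fukushima_delta(y_k, s_k, g_k, mu=1e-6):
    """Compute delta_k = y_k + (max{0, -y_k^T s_k/||s_k||^2} + phi(||g_k||)) s_k
    with phi(t) = mu*t; mu = 1e-6 is an illustrative constant."""
    phi = mu * np.linalg.norm(g_k)                     # phi(||g_k||)
    correction = max(0.0, -(y_k @ s_k) / (s_k @ s_k)) + phi
    delta_k = y_k + correction * s_k
    # By construction delta_k^T s_k >= max{s_k^T y_k, phi*||s_k||^2} > 0
    # whenever g_k != 0, so the BFGS update stays positive definite.
    return delta_k
```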

Formula  ([]) The BFGS update formula is defined by
Some scholars have conducted further research to obtain a better approximation of the Hessian matrix of the objective function. where y m * k = y k + ρ k s k  s k and ρ k = [f (x k )f (x k + α k d k )] + (g(x k + α k d k ) + g(x k )) T s k . It is easy to conclude that this formula contains both gradient and function value information.
One may believe that the resulting methods will outperform the normal BFGS method. In fact, the practical computation shows that the method is better than the normal BFGS method and that it has some theoretical advantages (see [, ] where y m k = y k + max{ ρ k s k  , }s k . This modified method obtains global convergence and superlinear convergence for generally convex functions. The same work was previously performed by Zhang et al. [].
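A minimal sketch of this function-value modification, assuming the reconstruction of $\rho_k$ given above (the function name and argument layout are ours):

```python
import numpy as np

def modified_y(f_k, f_k1, g_k, g_k1, s_k):
    """y_k^m = y_k + max{rho_k/||s_k||^2, 0} * s_k, where
    rho_k = 2*(f_k - f_{k+1}) + (g_{k+1} + g_k)^T s_k."""
    y_k = g_k1 - g_k
    rho_k = 2.0 * (f_k - f_k1) + (g_k1 + g_k) @ s_k
    return y_k + max(rho_k / (s_k @ s_k), 0.0) * s_k
```

Note that taking the maximum with zero guarantees $s_k^T y_k^m \ge s_k^T y_k$, so the safeguard never weakens the curvature condition.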

Formula  ([]) The BFGS update formula is defined by
It is clear that the corresponding quasi-Newton equation (.) also contains both gradient and function value information, and it has been proved that the new formula gives a higher-order approximation to $\nabla^2 f(x)$. Furthermore, Yuan et al. considered a nonmonotone technique based on an averaged function value $C_k$; it is easy to conclude that $C_{k+1}$ is a convex combination of $C_k$ and $f(x_{k+1})$. The numerical results show that this technique is more competitive than the nonmonotone method of [], but it requires strong assumptions for the convergence analysis.
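The convex-combination property can be seen directly from a Zhang–Hager-style averaging step, sketched below; the weight $\eta_k$ and its default value are assumptions made for illustration:

```python
def update_C(C_k, Q_k, f_k1, eta_k=0.85):
    """One nonmonotone averaging step (a sketch):
        Q_{k+1} = eta_k*Q_k + 1,
        C_{k+1} = (eta_k*Q_k*C_k + f(x_{k+1})) / Q_{k+1}.
    Since eta_k*Q_k/Q_{k+1} + 1/Q_{k+1} = 1, C_{k+1} is a convex
    combination of C_k and f(x_{k+1})."""
    Q_k1 = eta_k * Q_k + 1.0
    return (eta_k * Q_k * C_k + f_k1) / Q_k1, Q_k1
```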
Motivated by the above observations, we study the modified BFGS-type method of Yuan et al. [] based on the formula (.). The modified BFGS-type method and the proposed algorithm have the following characteristics:
• The GLL line search technique is used in the algorithm to ensure good convergence (a sketch of this line search is given after this list).
• The major contribution of the new algorithm is an extension of the modified BFGS update from [] and [].
• Another contribution is the proof of global convergence and superlinear convergence for generally convex functions.
• The experimental problems, including both normal unconstrained optimization problems and engineering (benchmark) problems, indicate that the proposed algorithm is competitive with the normal method.
This paper is organized as follows. In the next section, we present the algorithm. The global convergence and the superlinear convergence are established in Section 3 and Section 4, respectively. Numerical results are reported in Section 5. In the final section, we present a conclusion. Throughout this paper, $\|\cdot\|$ denotes the Euclidean norm of a vector or matrix.
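For reference, the GLL condition mentioned in the first bullet accepts any step size $\alpha$ with $f(x_k + \alpha d_k) \le \max_{0 \le j \le m(k)} f(x_{k-j}) + \delta \alpha g_k^T d_k$. The backtracking sketch below uses illustrative parameter values ($\delta$, the contraction factor, the memory $M$, and the trial cap are not the paper's settings):

```python
import numpy as np

def gll_line_search(f, x_k, d_k, g_k, f_history, delta=1e-4, contract=0.5,
                    M=10, max_tries=30):
    """GLL nonmonotone Armijo backtracking (a sketch)."""
    f_max = max(f_history[-M:])          # largest of the last M function values
    alpha = 1.0
    for _ in range(max_tries):
        if f(x_k + alpha * d_k) <= f_max + delta * alpha * (g_k @ d_k):
            return alpha
        alpha *= contract                # backtrack
    return alpha                         # safeguard: accept the last trial step
```

The final `return alpha` mirrors the safeguard described later in the numerical section, where the current step is accepted once the number of line search trials exceeds a fixed bound.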

Algorithm
In this paper, we study the modified formula of [] and obtain global convergence and superlinear convergence under generally convex conditions. The modified BFGS update of (.) is presented as
$$B^*_{k+1} = B^*_k - \frac{B^*_k s_k s_k^T B^*_k}{s_k^T B^*_k s_k} + \frac{y^*_k y^{*T}_k}{y^{*T}_k s_k}.$$
By the convexity of $f$, $s_k^T y^*_k > 0$ holds (see [] for details). Therefore, the update matrix $B^*_{k+1}$ from (.) inherits the positive definiteness of $B^*_k$ for generally convex functions. Now, we state the algorithm as follows.
Step : g k ≤ ε, stop; Otherwise, go to the next step.
Step : Solve to obtain d k .
Step : Let x k+ = x k + α k d k .
Step : Generate B * k+ from (.) and set k = k + ; Go to Step .

Global convergence
The following assumptions are required to obtain the global convergence of Algorithm 1.
Assumption A The objective function $f$ is continuously differentiable and convex on the level set $L_0 = \{x : f(x) \le f(x_0)\}$. Moreover, the gradient $g$ is Lipschitz continuous; i.e., there exists a constant $L \ge 0$ satisfying
$$\|g(x) - g(y)\| \le L\|x - y\| \quad \text{for all } x, y \in L_0.$$
Assumption A, together with the boundedness of the level set $L_0$, implies that there exist constants $M > 0$ and $N > 0$ satisfying $\|g(x)\| \le M$ and $\|x\| \le N$ for all $x \in L_0$.

Lemma . Suppose Assumption A holds. Then there exists a constant M * >  such that
The proof is similar to [], so it is not presented here.
Lemma . Let B k be updated by (.); then the relation

Lemma . Assume that Assumption A holds and that sequence
Proof For k = , by the positive definiteness of B  , we have s T  y *  > . Then B  is generated by (.), and B  is positive definite. Assume that B k is positive definite; for all k ≥ , we prove that s T k y * k >  holds by the following three cases. Case :Ā k < . The definition of y * k , the convexity of f (x), and Assumption A generate Case :Ā k = . By (.), (.), Assumption A, the definition of y * k , and the positive definiteness of B k , we get where σ * ∈ (, ).

Superlinear convergence analysis
Based on Theorem ., we suppose that x * is the limit of the sequence {x k }. To establish the superlinear convergence of Algorithm , the following additional assumption is needed.
Assumption B $\nabla^2 f(x^*)$ is positive definite, and $\nabla^2 f$ is Hölder continuous at $x^*$; namely, for all $x$ in a neighborhood of $x^*$, there exist constants $u \in (0, 1]$ and $\zeta > 0$ satisfying
$$\|\nabla^2 f(x) - \nabla^2 f(x^*)\| \le \zeta \|x - x^*\|^{u}.$$
In a way similar to [], we can obtain the superlinear convergence of Algorithm 1, which we state as follows but whose proof we omit.
Theorem . Let Assumption A and B hold and {x k } be generated by Algorithm . Then the sequence {x k } superlinearly tends to x * .

Numerical results
This section reports the numerical results of Algorithm 1. All code was written in MATLAB and run on a PC under the Windows XP operating system. The parameters $\delta$, $\sigma$, $\varepsilon$, $p$, and $M_0$ are fixed constants, and the initial matrix $B_0 = I$ is the identity matrix. Since the line search cannot ensure the descent condition $d_k^T g_k < 0$, an uphill search direction may occur in the numerical experiments, in which case the line search rule may fail. To avoid this, the step size $\alpha_k$ is accepted once the number of line search trials exceeds a fixed bound. The Himmelblau stopping rule is used: if $|f(x_k)| > e_1$, let $\mathrm{stop1} = |f(x_k) - f(x_{k+1})| / |f(x_k)|$; otherwise, let $\mathrm{stop1} = |f(x_k) - f(x_{k+1})|$. In the experiment, if $\|g(x)\| < \varepsilon$ or $\mathrm{stop1} < e_2$ holds, where $e_1$ and $e_2$ are small tolerances, we end the program.
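The stopping test can be sketched as follows; the default tolerance values are illustrative placeholders rather than the paper's exact constants:

```python
def himmelblau_stop(f_k, f_k1, g_norm, eps=1e-5, e1=1e-5, e2=1e-5):
    """Himmelblau-style stopping test (a sketch; tolerances are illustrative).
    stop1 is a relative decrease when |f_k| is not too small, and an
    absolute decrease otherwise."""
    stop1 = abs(f_k - f_k1) / abs(f_k) if abs(f_k) > e1 else abs(f_k - f_k1)
    return g_norm < eps or stop1 < e2
```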

[57] problems
It has been proved that the problems of [] with given initial points are an effective tool for estimating the performance of algorithms, and they form one of the most commonly used sets of optimization test problems. Many scholars use these problems to assess their algorithms (see [, , , ]). In this paper, we also perform experiments on these problems. The detailed numerical results are listed in Table .

Figure 2 Performance profiles of these methods (NFG).
The results in Table  indicate that the proposed method is competitive with the other three similar methods.
To directly illustrate the performance of these methods, we utilize the tool of Dolan and Moré [] to analyze their efficiency. Figures ,  and  show the performance profiles with respect to NI, NFG, and CPU time, respectively. According to these three figures, the MN-BFGS-A method has the best performance (the highest probability of being the optimal solver), while the BFGS-WP-Zhang and BFGS-WP methods solve the test problems with lower probabilities. Figure  shows that the success rates of the BFGS-M-Non and BFGS-Non methods on the test problems are higher than those of BFGS-WP and BFGS-WP-Zhang. Additionally, the BFGS-M-Non and BFGS-Non algorithms can address almost all the test problems, and BFGS-WP-Zhang obtains better results than BFGS-WP.
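For completeness, the Dolan–Moré profile used in these figures can be computed as sketched below; the cost matrix layout and the convention that np.inf marks a failed run are our assumptions:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile (a sketch).
    T: (n_problems, n_solvers) array of costs (NI, NFG, or CPU time),
       with np.inf marking failures.
    Returns rho[t, s] = fraction of problems solver s solves within a
    factor taus[t] of the best solver on each problem."""
    ratios = T / T.min(axis=1, keepdims=True)     # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])
```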

Benchmark problems
The benchmark problems listed in Table  are widely applied in various practical engineering situations. A function is multimodal if it has two or more local optima. A function of $p$ variables is separable provided that it can be rewritten as a sum of $p$ functions, each of just one variable []. Separability is closely related to the concept of epistasis, or interrelation among the variables of a function. Non-separable functions are more difficult to optimize because the accuracy of the search direction depends on two or more variables; by contrast, separable functions can be optimized one variable at a time. The problem is even more difficult if the function is multimodal: the search process must avoid the regions around local minima in order to approximate the global optimum as closely as possible. The most complex case arises when the local optima are randomly distributed in the search space.
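To illustrate the separability notion with two standard test functions (the Sphere entry of Table  has domain $[-5.12, 5.12]$, $x^* = (0, 0, \ldots, 0)$, and $f_{Sph}(x^*) = 0$); Rosenbrock appears here only as a familiar non-separable example, not as a problem from the paper's table:

```python
import numpy as np

# Separable: Sphere is a sum of p one-variable terms, so each coordinate
# can be minimized independently (x* = 0, f(x*) = 0 on [-5.12, 5.12]^p).
def sphere(x):
    return np.sum(x ** 2)

# Non-separable: Rosenbrock couples consecutive variables, so an accurate
# search direction depends on several variables at once.
def rosenbrock(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)
```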
The dimensionality of the search space is another important factor in the complexity of the problem. A study of the dimensionality problem and its features was conducted by Friedman []. To establish the same degree of difficulty in all cases, a common search space dimension $p$ is usually chosen for all the functions. In our experiment, we do not fix the value of $p$; it can be larger, and the exact dimensions can be found in Table .
However, the effectiveness of one algorithm compared with another cannot be determined solely by the number of problems that it solves better. The 'no free lunch' theorem (see []) states that, if we compare two search algorithms over all possible functions, the performance of any two algorithms will be, on average, the same. As a result, attempting to find a perfect test set containing all possible functions, in order to determine whether one algorithm is better than another on every function, is a fruitless task. Therefore, when an algorithm is evaluated, we identify the types of problems where its performance is good in order to characterize the types of problems for which the algorithm is suitable. The authors previously studied functions to be optimized in order to construct a test set with a better selection of fewer functions (see [, ]). This enables us to draw conclusions about the performance of the algorithm depending on the type of function.
The above benchmark problems and discussions of the choice of test problems for an algorithm can be found at http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume/ortizboyera-html/node.html. Many scholars use these problems to test numerical optimization methods (see [, ], etc.). Based on the above discussion, in this subsection we test the four algorithms on the benchmark problems. The test results are presented in Table , where $x_0$ denotes the initial point. The numerical results in Table  show that the proposed algorithm performs the best among the four methods; its total CPU time is the shortest. BFGS-Non performs better than BFGS-WP and BFGS-WP-Zhang, which is consistent with the results of []. Additionally, BFGS-WP-Zhang performs better than BFGS-WP, which is consistent with the results of []. To directly illustrate the performances of these four methods, we again use the tool of Dolan and Moré [] to analyze the results with respect to NI and NFG in Table . Figures  and  show their performances. Figure  indicates that BFGS-WP solves only part of the test problems, whereas the other three methods solve all of them, and the proposed algorithm solves the problems in the shortest amount of time.
The performance in Figure  is similar to that in Figure : BFGS-WP solves only part of the test problems, while the other methods solve all of them.
According to these two figures, the proposed algorithm has the best performance among these four methods, and BFGS-WP performs the worst. In summary, based on the numerical results for both test sets, the proposed algorithm is efficient and competitive.