A class of derivative-free trust-region methods with interior backtracking technique for nonlinear optimization problems subject to linear inequality constraints

This paper focuses on a class of nonlinear optimization subject to linear inequality constraints with unavailable-derivative objective functions. We propose a derivative-free trust-region methods with interior backtracking technique for this optimization. The proposed algorithm has four properties. Firstly, the derivative-free strategy is applied to reduce the algorithm’s requirement for first- or second-order derivatives information. Secondly, an interior backtracking technique ensures not only to reduce the number of iterations for solving trust-region subproblem but also the global convergence to standard stationary points. Thirdly, the local convergence rate is analyzed under some reasonable assumptions. Finally, numerical experiments demonstrate that the new algorithm is effective.


Introduction
In this paper, we analyze the solution of following nonlinear optimization problem: where f (x) is a nonlinear twice continuously differentiable function, but its first-order or second-order derivatives are not explicitly available, A

Affine-scaling matrix for inequality constraints
The KKT system of (1) is where λ f ∈ m . A feasibility x * is said to be the stationary point for problem (1), if there exists a vector 0 ≤ λ f * ∈ m such that the KKT system (2) holds.
To solve this KKT system, some effective affine-scaling algorithms are designed. Reference [1] proposed an affine-scaling trust-region method with interior-point technique for bound-constrained semismooth equations. Reference [2] introduced affine-scaling interior-point Newton methods for bound-constrained nonlinear optimization. In particular, [3] proved the superlinear and quadratic convergence properties of affine-scaling interior-point Newton methods for bound optimization problems without strict complementarity assumption. Different affine-scaling matrix denotes different algorithm. In [4], the Dikin affine scaling was denoted by Moreover, diagonal matrix C f k def = diag{|λ f k |} was presented in [4]. Then λ f k could be obtained as a least-squares Lagrangian multiplier approximation computed by One efficient affine-scaling interior-point trust-region model is the one which is presented in [5] and [6], written in the form min q f k (p) = ∇f T k p + where ∇f (x k ) is the gradient of f (x) at the current iteration, H f k is either ∇ 2 f (x k ) or its approximation. Furthermore, ∇f T k h f k ≤ ε, where and ε is a small enough constant, is usually considered as the termination criterion in this class of algorithms.

Motivation
The above discussions illustrate that the affine-scaling interior-point trustregion method is an effective way to solve the nonlinear optimization problems with inequality constraints. The trust-region frame guarantees the stable numerical performance. However, in Eqs. (4)-(6) the first-and second-order derivatives play important roles during the computational process, which maybe fail to solve the optimization problems like (1). If both the feasibility and the stability of the algorithm need to be guaranteed, we should consider the derivative-free trust-region methods.

Derivative-free technique for trust-region subproblem
Since the first-or second-order derivatives of objective functions are not explicitly available, the derivative-free optimization algorithms have been favored by researchers for a time. The application forms of the derivative-free theory are devise [7,8] and widely applied. Reference [9] proposed a derivative-free algorithm for least-squares minimization, and proved the local convergence in [10]. Reference [11] presented a derivative-free approach to constrained multiobjective nonsmooth optimization. Reference [12] presented a higher-order contingent derivative of perturbation maps in multiobjective optimization.
In [13], Conn proposed an unconstrained derivative-free trust-region method. They constructed the trust-region subproblem Following this idea, we consider that Y k = {y 0 k , y 1 k , . . . , y t k } is an interpolation sample set around the current iteration point x k , and we construct the trust-region subproblem We should note that the gradient and Hessian in (5) and (7), (4) and (8), (6) and (9) are different. Meanwhile, since the algorithm in this paper adopts both the decrease direction p and the stepsize α to update the iteration point, we give a new definition of the error bounds between the objective function f (x k + αp) and the approximation function m(x k + αp) to ensure the global convergence. We shall show the details after assumption (A1).

Assumption
(A1) Suppose that a level set L(x 0 ) and a maximal radius max are given. Assume that f is twice continuously differentiable with Lipschitz continuous Hessian in an appropriate open domain containing the max neighborhood x∈L(x 0 ) B(x, max ) of the set L(x 0 ).

Definition 1
Given a function f satisfies (A1). M = {m : n → , m ∈ C 2 } is a set of model functions. If there exist positive constants κ ef , κ eg , κ eh , and κ blh , such that, for any x ∈ L(x 0 ), ∈ (0, max ], and α ∈ (0, 1], there is a model function m(x + αp) ∈ M, with Lipschitz continuous Hessian and corresponding Lipschitz constant bounded by κ blh , and such that: 1 the error between the Hessian of the model m(x + αp) and the Hessian of the function f (x + αp) satisfies 2 the error between the gradient of the model m(x + αp) and the gradient of the function f (x + αp) satisfies 3 the error between the model m(x + αp) and the function f (x + αp) satisfies Such a model m is called fully quadratic on B(x, ).
In this paper, we aim to present a class of derivative-free trust-region method for nonlinear programming with linear inequality constraints. The main features of this paper are: • We use the derivatives of approximation function m(x k + αp) to replace the derivatives of objective function f (x k + αp) to reduce the algorithm's requirement for gradient and Hessian of the iteration points. We solve an affine-scaling trust-region subproblem to find a feasible search direction in each iteration. • In the kth iteration, a feasible search direction p is obtained from an affine-scaling trust-region subproblem. Meanwhile, interior backtracking skill will be applied both for determining stepsize α and for guaranteeing the feasibility of iteration point. • We will show that the iteration points generated by the proposed algorithm could converge to the optimal points of (1). • Local convergence will be given under some reasonable assumptions. This paper is organized as follows: we describe a class of derivative-free trust-region method in Sect. 2. The main results including global convergence property and local convergence rate will be discussed in Sect. 3. The numerical results will be illustrated in Sect. 4. Finally, we give some conclusions.
Notation In this paper, · is the 2-norm for a vector and the induced 2-norm for a matrix. B ⊂ n is a closed ball and B(x, ) is the closed ball centered at x, with radius > 0. Y is a sample set and L(x 0 ) = {x ∈ n |f (x) ≤ f (x 0 ), Ax ≥ b} is the level set about the objective function f . We use the subscript f k and subscript m k to distinguish the relevant information between the original function and the approximate function. For example, H f k is the Hessian of f at kth iteration and H m k is the Hessian of m k at kth iteration.

A derivative-free trust region method with interior backtracking technique
To solve the optimization problem (1) with not all available first-or second-order derivatives, we design a derivative-free trust-region method. An affine-scaling matrix is denoted by (3) for linear inequality constraints. We chose a stepsize α k satisfying the following inequalities: Moreover, set where θ k ∈ (θ 0 , 1], for some 0 < θ 0 < 1. The θ k is to ensure the iterative points generated by the algorithm are strictly interior. Combining with (13a), (13b) and (14), this interior backtracking technique is to guarantee the feasibility of the iterative points. The algorithm possesses the trust-region property and the derivative-free technique is reflected in the trust-region subproblem (7) since the gradient g k and Hessian H m k come from the approximation function, which are different from ∇f k and H f k in (5), satisfying the error bounds (11) and (12). We adopt g T k h m k to be a termination criterion. Now we present the derivative-free trust-region method in detail (see Algorithm 1).
Remark 1 We add a backtracking interior line-search technique in the algorithm. It is helpful to reducing the number of iterations. Equation (13a) is used to guarantee the descent property of f (x) and (13b) ensures the feasibility of x k + α k p k .
Remark 2 The scalar α k , given in step 5, denotes the stepsize along p k to the boundary (13b) of the linear inequality constraints A key property of the scalar α k is that an arbitrary step α k p k to the point x k + α k p k does not violate any linear inequality constraints.

Remark 3 Let
The first-order necessary conditions of (7) implies that there exists v m k ≥ 0 such that In order to obtain a suitable approximation function, Algorithm 1 needs to update the objective function of the trust-region subproblem if necessary. The model-improvement algorithm is applied only if g T k h m k ≤ ε and at least one of the following holds: The model m(x k + αp) is not certifiably fully quadratic on B(x k , k ) or k > ι g T k h m k . It improves on the current approximate function m(x k + αp) to meet the requirements of the error bounds so that the model function becomes fully quadratic. We display the model-improvement Algorithm 1 Derivative-free trust-region method with interior backtracking technique 1 Initialization: Given x 0 , max > 0, 0 ∈ (0, max ], 0 ≤ η 0 ≤ η 1 < 1, (η 1 = 0) and θ 0 ∈ (0, 1) are constants. k := 0. 2 Construct model function: Let y k = x k , and obtain an interpolation point set (8) and (9). 3 Termination criterion: If g T k h m k > ε, then go to step 4. Consider the model m k on B(x k , k ), there are two cases would be happened: (a) If m k is not fully quadratic on B(x k , k ) or k > ι g T k h m k holds, construct another model to modified m k , k = min{max{˜ k , β g T k hm k }, k } and go to step 4. (b) Otherwise, we get the optimal point and stop. 4 Trust-region subproblem: solve trust-region subproblem (7) to find descent direction p k . 5 Step size: choose α k = 1, α, α 2 , . . . , until the inequalities (13a) and (13b) are satisfied. 6 Set s k = α k θ k p k by (14).
call Algorithm 2 to guarantee m k is fully quadratic.
(a) If m k is not fully quadratic, modify k . (b) Otherwise, set m k+1 = m k . 9 Trust-region radius update: set 10 Set k := k + 1 and go to step 2.

Main results and discussion
In this section, we mainly discuss some properties about the proposed algorithm, including the discussion of the error bounds, the sufficiently descent property, the global and local convergence properties. First of all, we make some necessary assumptions as follows.

Assumptions
(A2) The level set L(x 0 ) is bounded.
(A3) There exist positive constants κ g f and κ g m such that ∇f k ≤ κ g f and g k ≤ κ g m , respectively, for all

Error bounds
Observe first that some error bounds hold immediately.

Lemma 1
Suppose that (A1)-(A5), the error bounds (10)- (12) and the fact that k ≤ max hold. If m k is a fully quadratic model on B(x k , k ), then the following bound is true: Proof Using the theory of matrix perturbation analysis, Eqs. (4) and (8), we obtain where AA T is a positive definition matrix and D k is a diagonal matrix related with x k ∈ L(x 0 ). By (A2), there exists a constant κ λ > 0 such that (AA T + D k ) -1 A ≤ κ λ . Thus, from (6), (9) and the error bound (11), one has the fact that Clearly, the conclusion holds with Lemma 2 Suppose that (A1)-(A5), the error bounds (10)- (12) and the fact that k ≤ max hold. If m k is a fully quadratic model on B(x k , k ), for some constant κ 2 , one has Proof Using the triangle inequality, Cauchy-Schwarz inequality, (18), (A3), the error bounds (10)- (12) and the fact that α k ∈ (0, 1] and k ≤ max successively, we obtain which implies that the inequality (20) holds with κ 2 = κ eg κ g f max + κ g m κ h .

Lemma 3
Suppose that (A1)-(A5), the error bounds (10)- (12) and the fact that k ≤ max hold. If ∇f T k h f k = 0, then step 3 of Algorithm 1 will stop in a finite number of improvement steps.
Proof Now we should prove that ∇f T k h f k must be zero if the loop of Algorithm 2 is infinite.
In fact, there are two cases could cause Algorithm 2 to be implemented. One is that m k is not fully quadratic, the other is that the radius k > ι g T k h m k . Then set m (0) k = m k , and improve the model to be fully quadratic on B(x k , k ), which denoted by m (1) k .
Algorithm 2 will improve the model on B(x k , ω k ) and the resulting model is denoted by m (2) k . If m (2) k satisfies ι (g T k h m k ) (2) ≥ ω k , the procedure stops. If not, the radius should be multiplied by ω and Algorithm 2 will improve the model on B(x k , ω 2 k ), and go on. The only case for Algorithm 2 to be infinite is if By the choice of ω ∈ (0, 1) the above inequality means that ∇f T k h f k = 0. The conclusion shows us step 3 will stop in a finite number of improvements.

Sufficiently descent property
In order to guarantee the global convergence property of the proposed algorithm, it is necessary to show that a sufficiently descent condition is satisfied at the kth iteration. We obtained in [6] if step p k is the optimal point of the trust-region subproblem (7), there is a constant κ 3 > 0 such that Lemma 4 Suppose that (A1)-(A5) and the error bounds (10)- (12) hold. p k is the solution of the trust-region subproblem (7). Then there must exist an appropriate α k > 0 which satisfied inequalities (13a).
Proof We start by considering the maximal step-length along the trust-region subproblem descent direction that preserves sufficient feasibility in the sense of the (13a). Successively using the mean value theorem and (11), we obviously obtain where ξ k ∈ (x k , x k + s k ).
There are two cases that may be considered. The first is p T k ∇ 2 f (ξ k )p k ≤ 0. By canceling the last term of Eqs. κ λm κ Hm ≤ k for large enough k and the fact that k ≤ max , it is thus easy to see that there exists an α * = [ κ 3 (1-κ 1 )κ Hm κ λm κ eg max ] 1 2 > 0 such that (13a) holds. The second case is p T k ∇ 2 f (ξ k )p k > 0. Using the Cauchy-Schwarz inequality and the fact that α k ∈ (0, 1] and k ≤ max , we deduce that Thus the final conclusion obtained. We therefore see that it is reasonable to design line-search step criterion in step 5, which provided us a nonincreasing sequence {f (x k )}. (7). Suppose that (A1)-(A5) hold. Then there exists a positive constant κ 4 such that step p k satisfies the following sufficiently descent condition:

Lemma 5 Let step p k be the solution of the trust-region subproblem
for all g k , h m k , M m k , and k .

Global convergence
Every iteration point in the k + 1th iteration will be chosen on the region B(x k , α k k ).
Following the lemma one first shows that the current iteration must be successful if α k k is small enough.

Lemma 6
Suppose that (A1)-(A5) and the error bounds (10)- (12) hold. m k is fully quadratic on B(x k , k ), g T k h m k = 0 and where κ λ m is the bound of C m k , for all x ∈ L(x 0 ). Then the kth iteration is successful.

Lemma 7
Suppose that (A1)-(A5) and the error bounds (10)- (12) hold. If the number of successful iteration is finite, then Proof We consider that all the model-improving iterations before m k becomes fully quadratic are less than a constant N . Suppose that the current iteration is an iteration after a successful one. It means that an infinite number of iterations are acceptable or not nice. In these two cases, k is shrinking. Furthermore, k is reduced by a factor ζ at least once every N iterations, which implies k → 0. For the jth iteration, we denote the ith iteration after j by the index i j , then Using the triangle inequality, we obtain The following work is to show that all these terms on the right-hand side are converging to zero. Because of the Lipschitz continuity of ∇f and the fact that x i jx j → 0 the first and second terms converge to zero. The inequalities (10) and (11) imply the third and fourth terms on the right-hand side are converging to zero. According to Lemma 3, if g T i j h m i j 0 for small enough i j , i j would be a successful iteration, which yield a contradiction. Thus the last term converges to zero. Proof The key is that we may find a contradiction with the fact that {f (x k )} is a nonincreasing bounded sequence unless x k is a stationary point. We thus have to verify that there exists some > 0 such that {f (x k )} is not convergent under the assumption of g T k h m k ≥ 2 .
We observe from (13a), Lemma 4 and (21) that Thus from (24), two cases should be considered next, that is, and lim k→∞ k = 0.
We now start the proof of (25). On one hand, α k is accepted by (13b) the boundary of inequality constraints along p k . From Eq. (15) (17), we know that there exists λ m k+1 such that wherep i k and λ i m k+1 are the ith components of the vectorsp k and λ m k+1 , respectively. Hence, there exists j ∈ 1, . . . , m such that From (17), we have Since [ A -D 1 2 k ] T is full row rank for all x ∈ L(x 0 ), λ m k is bounded and m(x) is twice continuously differentiable, there exist κ 5 > 0 and κ 6 > 0 such that Using the fact that v m k ( k -p k p k ) = 0 and taking the norm to both sides of (17), we deduce that And noting (p k ;p k ) ≤ k , we can obtain Combining the assumption g T k h m k > 2 with k → 0 deduced from (24), it is clear from the fact M m k ≤ κ λ m κ H m that, for ∀k, Furthermore, k → 0 means that lim k→∞ p k = 0, from which we deduce that, for some 0 < θ 0 < 1 and θ k -1 = O( p k 2 ), the strictly feasible stepsize θ k ∈ (θ 0 , 1] → 1. From the above, we have already seen that (25) holds in the case that α k is determined by (13b). There is another case that α k is determined by (13a). In this case, we are able to verify that α k = 1 is acceptable when k sufficiently large. If not, must hold. Applying the Taylor series, (10)-(11), (A3) and the fact that k ≤ max , we deduce that where ξ k ∈ (x k , x k + s k ). This inequality is equivalent to the form of (1κ 1 )g T k p k + κ eg max + Moreover, (21) and g T k h m k ≥ 2 imply that Thus if k ≤ 2(1-κ 1 )κ 3 2κ eg max +κ eh max +κ Hm ≤ κ Hm we deduce from the inequality k k κ eg max + Clearly, a contradiction appears here. It implies that α k = 1 for k sufficiently large. Therefore (25) always holds.
On the other hand, we should prove that (26) is true. From step 3 of Algorithm 1, we know that By the assumption that g T k h m k ≥ 2 , we obtain k ≥ ι .
Whenever k falls below a constantκ 7 given bȳ the kth iteration is either successful or model-improving, and hence from step 9, we are able to deduce both that k+1 ≥ k and k+1 ≥ ζ k . Combining with the rules of step 9 we conclude that k+1 ≥ min{ι , ζκ 7 } = κ 7 . It means that k 0, if g T k h m k ≥ 2 . In conclusion, the sequence {f (x k )} is not convergent if we suppose that g T k h m k ≥ 2 , which contradicts the fact that {f (x k )} is a nonincreasing bounded sequence. It implies that lim inf k→+∞ g T k h m k = 0.

Lemma 9 For any subsequence {k
we also have Proof First, we note that, by (28), g T k i h m k i ≤ ε when i sufficiently large. Thus the criticality step ensures that the model m k i is a fully quadratic function on the ball B(x k i , k i ), with k i ≤ ι g T k i h m k i for all i sufficiently large (if ∇f T k i h f k i = 0). Then, using the bound (20) on the error between the terminal conditions of function and model, we have As a consequence, we have for all i sufficiently large. But g T k i h m k i → 0 implies (29) holds.
Then we obtain the global convergence derived from Lemmas 8 and 9.
Theorem 1 Suppose that (A1)-(A5), the error bounds (10)- (12) and (23) hold. Suppose furthermore that the strict complementarity of the problem (1) holds. Let {x k } ⊂ n be sequence generated by Algorithm 1. Then The above theorem shows us there exists a limit point that is first-order critical. In fact, we are able to prove that all limit points of the sequence of iterations are first-order critical.
Theorem 2 Suppose that (A1)-(A5), the error bounds (10)- (12) and (23) hold. Suppose furthermore that the strict complementarity of the problem (1) holds. Let {x k } ⊂ n be sequence generated by Algorithm 1. Then Proof We first obtained from Lemma 7 that the theorem holds in the case when S is finite. Hence, we will assume that S is infinite. For the purpose of deriving a contradiction, we suppose that there exists a subsequence {k i } of successful or acceptable iterations such that for sufficiently large i, with inequality (30) being retained.
We now restrict our attention to the set K corresponding to the subsequence of iterations whose indices are in the set where k i and i belong to the two subsequences given above in (31).
We know that g T k h m k ≥ 2 2 for k ∈ K. From Lemma 8 lim k→+∞ α k k = 0 and by Lemma 5 we conclude that for any large enough k ∈ K the iteration k is either successful if the model is fully quadratic or model-improving otherwise. Moreover, for each k ∈ K ∩ S we have and, for any such k large enough, k ≤ 2 κ hm κ λm . Hence, we have for k ∈ K ∩ S sufficiently large. Since for any k ∈ K large enough the iteration is either successful or model-improving and since for a model-improving iteration x k+1 = x k + s k , we have, for all i sufficiently large, Because the sequence {f (x k )} is bounded below and monotonic decreasing, we see that the right-hand side of this inequality must converge to zero, and we therefore obtain Since ∇f is Lipschitz continuity, we see that the first term of the above inequality for i large enough. This result contradicts (30), which implies the initial assumption is false and the theorem follows.

Local convergence
if the iteration points are not optimal. According to the definitions of error bounds in our algorithm, the gradient (or Hessian) of the model function must be bounded if there exists a constant such that the gradient (or Hessian) norm of the objective function is bounded. Therefore, most of the above test problems satisfy the assumptions (A2)-(A5). For example (HS21) ∇f (x) = 0.02x 1 2x 2 ≤ 980.0005, ∇ 2 f (x) = 0.02 0 0 2 = 2.
Of course, we will use the level set to limit the bound of ∇f (x) during program execution, which will be much smaller than this value. Even if the boundedness of the gradient and of the Hessian of the objective functions cannot be satisfied at the same time, at least the boundedness within the level set can be guaranteed.
We use the tool of Dolan and Moré [17] to analyze the efficiency of the given algorithm. Figures 1 and 2 show that Algorithm 1 is feasible and has the robust property.
Furthermore we test five simple linear inequality constrained optimization problems from [16] and compare the experiment results of different trust-region radius upper bound   Table 2 shows the experiment results, where nf represents the number of function evaluations, n is the dimension of the test problems and F means the algorithm terminated in the case that the iteration number exceeds the maximum number. The CPU times of the test problems are reported. Table 2 indicates that Algorithm 1 is executable to reach optimal point. The choice of max = 6 is made to enable us to carry out more gratifying results. But the results show that the number of iterations maybe higher than any other derivative-based algorithms. The reason we think is that the derivatives of most of the test problems we chose are available and a derivative-free technique may increase the number of executions; then higher iteration numbers are necessary.

Conclusions
In this paper, we propose an affine-scaling derivative-free method for linear inequality constrained optimizations.
(1) This algorithm is mainly designed to solve the unavailable derivatives optimization problems in engineering. The proposed algorithm adopts interior backtracking technique and possesses the trust-region property. (2) The global convergence is proved by using the definition of fully quadratic. It shows that the iteration points generated by the proposed algorithm could converge to the optimal points of (1). Meanwhile, we get the result that the local convergence rate of the proposed algorithm depends on p k . If p k becomes the quasi-Newton step, then the sequence x k generated by the algorithm converges to x * superlinearly. (3) The preliminary numerical experiments verify the new algorithm we proposed is feasible and effective for solving unavailable-derivative linear inequality constrained optimization problems.