Let Ω be the level set defined by
$$ \Omega= \bigl\{ x \mid \bigl\Vert e(x) \bigr\Vert \le \bigl\Vert e(x_{0}) \bigr\Vert \bigr\} . $$
(3.1)
As in [31, 32, 50], the following assumptions are needed to prove the global convergence of Algorithm 1.
Assumption A
(i) e is continuously differentiable on an open convex set \(\Omega_{1}\) containing Ω.
(ii) The Jacobian of e is symmetric, bounded, and positive definite on \(\Omega_{1}\), i.e., there exist positive constants \(M^{*}\geq m_{*}>0\) such that
$$ \bigl\Vert \nabla e(x) \bigr\Vert \le M^{*} \quad \forall x \in \Omega_{1} $$
(3.2)
and
$$ m_{*}\Vert d\Vert ^{2}\le d^{T}\nabla e(x)d\quad \forall x \in\Omega_{1},d\in \Re^{n}. $$
(3.3)
Assumption B
\(B_{k}\) is a good approximation to \(\nabla e_{k}\), i.e.,
$$ \bigl\Vert (\nabla e_{k}-B_{k})d_{k} \bigr\Vert \leq\epsilon_{*}\Vert e_{k}\Vert , $$
(3.4)
where \(\epsilon_{*}\in(0,1)\) is a small quantity.
Considering Assumption B and using the von Neumann lemma, we deduce that \(B_{k}\) is also bounded (see [31]).
Lemma 3.1
Let Assumption B hold. Then \(d_{k}\) is a descent direction of \(p(x)\) at \(x_{k}\), i.e.,
$$ \nabla p(x_{k})^{T}d_{k}\leq-(1- \epsilon_{*}) \bigl\Vert e(x_{k}) \bigr\Vert ^{2}. $$
(3.5)
Proof
By using (1.10), we get
$$\begin{aligned} \nabla p(x_{k})^{T}d_{k} =& e(x_{k})^{T} \nabla e(x_{k}) d_{k} \\ =&e(x_{k})^{T} \bigl[ \bigl(\nabla e(x_{k})-B_{k} \bigr)d_{k}-e(x_{k}) \bigr] \\ =&e(x_{k})^{T} \bigl(\nabla e(x_{k})-B_{k} \bigr)d_{k} - e(x_{k})^{T}e(x_{k}). \end{aligned}$$
(3.6)
Thus, we have
$$\begin{aligned} \nabla p(x_{k})^{T}d_{k}+ \bigl\Vert e(x_{k}) \bigr\Vert ^{2} =&e(x_{k})^{T} \bigl(\nabla e(x_{k})-B_{k} \bigr)d_{k} \\ \leq& \bigl\Vert e(x_{k}) \bigr\Vert \bigl\Vert \bigl(\nabla e(x_{k})-B_{k} \bigr)d_{k} \bigr\Vert . \end{aligned}$$
It follows from (3.4) that
$$\begin{aligned} \nabla p(x_{k})^{T}d_{k} \leq& \bigl\Vert e(x_{k}) \bigr\Vert \bigl\Vert \bigl(\nabla e(x_{k})-B_{k} \bigr)d_{k} \bigr\Vert - \bigl\Vert e(x_{k}) \bigr\Vert ^{2} \\ \leq& \epsilon_{*} \bigl\Vert e(x_{k}) \bigr\Vert ^{2}- \bigl\Vert e(x_{k}) \bigr\Vert ^{2} \\ =& -(1-\epsilon_{*}) \bigl\Vert e(x_{k}) \bigr\Vert ^{2}. \end{aligned}$$
(3.7)
The proof is complete. □
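To make Lemma 3.1 concrete, the following minimal Python check evaluates both sides of (3.5) on a toy problem. It is a sketch under stated assumptions: \(p(x)=\frac{1}{2}\Vert e(x)\Vert ^{2}\) (consistent with (1.10)), \(d_{k}\) obtained from \(B_{k}d_{k}=-e_{k}\) (consistent with (3.10) and (3.27) below), and an illustrative residual and Jacobian perturbation; none of these specific choices come from the paper.

```python
import numpy as np

# Toy residual with a symmetric positive definite Jacobian (Assumption A):
# e(x) = A x + 0.1 sin(x), so grad e(x) = A + 0.1 diag(cos(x)).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
e = lambda x: A @ x + 0.1 * np.sin(x)
jac = lambda x: A + 0.1 * np.diag(np.cos(x))

rng = np.random.default_rng(0)
x_k = rng.normal(size=2)
e_k = e(x_k)

# B_k: a small symmetric perturbation of the exact Jacobian, one way to mimic (3.4).
P = 1e-3 * rng.normal(size=(2, 2))
B_k = jac(x_k) + (P + P.T) / 2.0

d_k = np.linalg.solve(B_k, -e_k)        # B_k d_k = -e_k
grad_p = jac(x_k).T @ e_k               # grad p(x_k) = grad e(x_k)^T e(x_k)

# Effective epsilon_* realized by this particular B_k, cf. (3.4):
eps = np.linalg.norm((jac(x_k) - B_k) @ d_k) / np.linalg.norm(e_k)
lhs = grad_p @ d_k
rhs = -(1.0 - eps) * np.linalg.norm(e_k) ** 2
print(lhs <= rhs + 1e-12)               # the descent inequality (3.5) holds
```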
The following lemma shows that the line search technique (1.13) terminates in finitely many steps, so Algorithm 1 is well defined.
Lemma 3.2
Let Assumptions A and B hold. Then Algorithm 1 produces the next iterate \(x_{k+1}=x_{k}+\alpha_{k}d_{k}\) in a finite number of backtracking steps.
Proof
By Lemma 3.5 in [32], a finite number of backtracking steps yields
$$p(x_{k}+\alpha_{k}d_{k})\leq p(x_{k}) +\alpha_{k} \sigma e(x_{k})^{T}d_{k}, $$
from which, in view of the definition \(p(x_{l(k)})=\max_{0\leq j\leq m(k)}\{p(x_{k-j})\}\geq p(x_{k})\), we obtain (1.13). The proof is complete. □
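For readers who want to see the mechanism of (1.13), below is a minimal Python sketch of the nonmonotone backtracking rule, under stated assumptions: \(p(x)=\frac{1}{2}\Vert e(x)\Vert ^{2}\), a history window of length at most \(M+1\) realizing \(p(x_{l(k)})=\max_{0\leq j\leq m(k)}\{p(x_{k-j})\}\), and a shrink factor \(r\in(0,1)\) matching the trial step \(\alpha_{k}'=\alpha_{k}/r\) used later in (3.19); the function name and defaults are illustrative, not the paper's exact implementation.

```python
import numpy as np
from collections import deque

def nonmonotone_backtracking(p, x, d, e_x, p_hist, sigma=1e-4, r=0.5, max_back=60):
    """Find alpha with p(x + alpha*d) <= p(x_{l(k)}) + sigma*alpha*e_k^T d_k,
    i.e., the acceptance rule (1.13); p_hist holds p at the last m(k)+1 iterates."""
    p_max = max(p_hist)                  # p(x_{l(k)})
    slope = float(e_x @ d)               # e_k^T d_k < 0 by Lemma 3.1
    alpha = 1.0
    for _ in range(max_back):
        if p(x + alpha * d) <= p_max + sigma * alpha * slope:
            return alpha                 # finite termination (Lemma 3.2)
        alpha *= r                       # backtrack; the rejected trial is alpha/r, cf. (3.19)
    raise RuntimeError("descent condition violated; check d_k and sigma")

# The history window would be maintained as, e.g. (memory bound M = 5):
#   p_hist = deque([p(x0)], maxlen=6), appending p(x_{k+1}) after each step.
```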
Now we establish the global convergence theorem of Algorithm 1.
Theorem 3.1
Let Assumptions A and B hold, and \(\{\alpha_{k}, d_{k}, x_{k+1}, e_{k+1}\}\) be generated by Algorithm 1. Then
$$ \lim_{k\rightarrow\infty} \Vert e_{k}\Vert =0. $$
(3.8)
Proof
By the acceptance rule (1.13), we have
$$ p(x_{k+1})-p(x_{l(k)}) \leq\sigma \alpha_{k} e_{k}^{T}d_{k} < 0. $$
(3.9)
Using \(m(k+1)\leq m(k)+1\) and \(p(x_{k+1})\leq p(x_{l(k)})\), we obtain
$$p(x_{l(k+1)})\leq\max \bigl\{ p(x_{l(k)}),p(x_{k+1}) \bigr\} =p(x_{l(k)}). $$
This means that the sequence \(\{p(x_{l(k)})\}\) is nonincreasing; since it is bounded below by zero, it is convergent. By Assumptions A and B, arguing as in Lemma 3.4 of [32], it is not difficult to deduce that there exist constants \(b_{1}\geq b_{2}>0\) such that
$$ b_{2}\Vert d_{k}\Vert ^{2}\leq d_{k}^{T}B_{k}d_{k}=-e_{k}^{T}d_{k} \leq b_{1}\Vert d_{k}\Vert ^{2}. $$
(3.10)
By (1.13) and (3.10), for all \(k>M\), we get
$$\begin{aligned} p(x_{l(k)}) =&p(x_{l(k)-1}+\alpha_{l(k)-1}d_{l(k)-1}) \\ \leq&\max_{0\leq j\leq m(l(k)-1)} \bigl\{ p(x_{l(k)-j-1}) \bigr\} +\sigma \alpha_{l(k)-1}e_{l(k)-1}^{T}d_{l(k)-1} \\ \leq& \max_{0\leq j\leq m(l(k)-1)} \bigl\{ p(x_{l(k)-j-1}) \bigr\} -\sigma b_{2} \alpha_{l(k)-1}\Vert d_{l(k)-1}\Vert ^{2}. \end{aligned}$$
(3.11)
Since \(\{p(x_{l(k)})\}\) is convergent, from the above inequality, we have
$$\lim_{k\rightarrow\infty}\alpha_{l(k)-1}\Vert d_{l(k)-1}\Vert ^{2}=0. $$
This implies that either
$$ \liminf_{k\rightarrow\infty} \Vert d_{l(k)-1}\Vert =0 $$
(3.12)
or
$$ \liminf_{k\rightarrow\infty} \alpha_{l(k)-1}=0. $$
(3.13)
If (3.12) holds, following [40], by induction we can prove that
$$ \lim_{k\rightarrow\infty} \Vert d_{l(k)-j}\Vert =0 $$
(3.14)
and
$$\lim_{k\rightarrow\infty} p(x_{l(k)-j})=\lim_{k\rightarrow\infty} p(x_{l(k)}) $$
for any positive integer j. As \(k\geq l(k)\geq k-M\) and M is a positive constant, by
$$x_{k}=x_{k-M-1}+\alpha_{k-M-1}d_{k-M-1}+ \cdots+ \alpha_{l(k)-1}d_{l(k)-1} $$
and (3.14), it can be derived that
$$ \lim_{k\rightarrow\infty} p(x_{l(k)})=\lim _{k\rightarrow \infty} p(x_{k}). $$
(3.15)
According to (3.10) and the rule for accepting the step \(\alpha_{k}d_{k}\),
$$ p(x_{k+1})-p(x_{l(k)})\leq\alpha_{k} \sigma e_{k}^{T}d_{k}\leq -\alpha_{k} \sigma b_{2}\Vert d_{k}\Vert ^{2}. $$
(3.16)
Since \(p(x_{k+1})-p(x_{l(k)})\rightarrow0\) by (3.15) and the convergence of \(\{p(x_{l(k)})\}\), this means
$$\lim_{k\rightarrow\infty}\alpha_{k}\Vert d_{k}\Vert ^{2}=0, $$
which implies that
$$ \lim_{k\rightarrow\infty}\alpha_{k}=0 $$
(3.17)
or
$$ \lim_{k\rightarrow\infty} \Vert d_{k}\Vert =0. $$
(3.18)
If (3.18) holds, then, since \(B_{k}\) is bounded, \(\Vert e_{k}\Vert =\Vert B_{k}d_{k}\Vert \leq \Vert B_{k}\Vert \Vert d_{k}\Vert \rightarrow0\), and the conclusion of the theorem follows. If (3.17) holds, then the acceptance rule (1.13) implies that, for all sufficiently large k, the trial step \(\alpha_{k}'=\frac{\alpha_{k}}{r}\) was rejected, namely
$$\begin{aligned} p \bigl(x_{k}+\alpha_{k}'d_{k} \bigr)-p(x_{k}) \geq& p \bigl(x_{k}+\alpha_{k}'d_{k} \bigr)-p(x_{l(k)}) >\sigma\alpha_{k}'e_{k}^{T}d_{k}. \end{aligned}$$
(3.19)
By the Taylor expansion,
$$ p \bigl(x_{k}+\alpha_{k}'d_{k} \bigr)-p(x_{k})= \alpha_{k}' \nabla p(x_{k})^{T}d_{k}+o \bigl(\alpha_{k}' \Vert d_{k}\Vert \bigr). $$
(3.20)
Using (3.19) and (3.20), and arguing as in [32], we have
$$\nabla p(x_{k})^{T}d_{k}=e_{k}^{T} \nabla e(x_{k})d_{k}\leq\delta^{*} e_{k}^{T}d_{k}, $$
where \(\delta^{*} >0\) is a constant and \(\sigma< \delta^{*}\). So we get
$$ \bigl[\delta^{*}-\sigma \bigr] \alpha_{k}' e_{k}^{T}d_{k}+o \bigl(\alpha_{k}' \Vert d_{k}\Vert \bigr)\geq0. $$
(3.21)
Since \(\delta^{*}-\sigma>0\) and \(e_{k}^{T}d_{k}<0\), dividing (3.21) by \(\alpha_{k}'\Vert d_{k}\Vert \) and letting \(k\rightarrow\infty\) gives
$$ \lim_{k\rightarrow\infty}\frac{e_{k}^{T}d_{k}}{\Vert d_{k}\Vert }=0. $$
(3.22)
By (3.10), \(-e_{k}^{T}d_{k}\geq b_{2}\Vert d_{k}\Vert ^{2}\), so (3.22) forces
$$ \lim_{k\rightarrow\infty} \Vert d_{k}\Vert =0. $$
(3.23)
Using \(\Vert e_{k}\Vert =\Vert B_{k}d_{k}\Vert \leq \Vert B_{k}\Vert \Vert d_{k}\Vert \) and the boundedness of \(B_{k}\) again, we complete the proof. □
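Theorem 3.1 can be observed numerically. The driver below is a hedged sketch of the main loop of Algorithm 1, reusing `nonmonotone_backtracking` from the sketch after Lemma 3.2; taking \(B_{k}=\nabla e(x_{k})\) is only the simplest way to satisfy Assumption B here (a quasi-Newton update would be used in practice), and the test problem is illustrative.

```python
import numpy as np
from collections import deque

def algorithm1_demo(e, jac, x0, M=5, tol=1e-10, max_iter=100):
    p = lambda x: 0.5 * float(np.dot(e(x), e(x)))   # merit function (1.10)
    x = np.asarray(x0, dtype=float)
    p_hist = deque([p(x)], maxlen=M + 1)            # window for p(x_{l(k)})
    for _ in range(max_iter):
        e_k = e(x)
        if np.linalg.norm(e_k) <= tol:              # ||e_k|| -> 0, cf. (3.8)
            break
        B_k = jac(x)                                # simplest B_k; see lead-in
        d_k = np.linalg.solve(B_k, -e_k)            # B_k d_k = -e_k
        alpha = nonmonotone_backtracking(p, x, d_k, e_k, p_hist)
        x = x + alpha * d_k
        p_hist.append(p(x))
    return x, np.linalg.norm(e(x))

# Same toy symmetric system as before (Assumption A holds near the solution):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x_sol, res = algorithm1_demo(lambda x: A @ x + 0.1 * np.sin(x),
                             lambda x: A + 0.1 * np.diag(np.cos(x)),
                             x0=[5.0, -3.0])
print(res)  # the residual norm decreases toward zero, as (3.8) predicts
```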
Lemma 3.3
(See Lemma 4.1 in [31].)
Let e be continuously differentiable, and \(\nabla e(x)\) be nonsingular at \(x^{*}\) which satisfies \(e(x^{*})=0\). Let
$$ a\equiv\max \biggl\{ \bigl\Vert \nabla e \bigl(x^{*} \bigr) \bigr\Vert + \frac{1}{2c},2c \biggr\} , \qquad c= \bigl\Vert \nabla e \bigl(x^{*} \bigr)^{-1} \bigr\Vert . $$
(3.24)
If \(\Vert x_{k}-x^{*}\Vert \) is sufficiently small, then the inequality
$$ \frac{1}{a} \bigl\Vert x_{k}-x^{*} \bigr\Vert \leq \bigl\Vert e(x_{k}) \bigr\Vert \leq a \bigl\Vert x_{k}-x^{*} \bigr\Vert $$
(3.25)
holds.
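To motivate the constant a in (3.24), here is a brief sketch of the idea behind Lemma 4.1 in [31] (a reconstruction, not a quotation of its proof), under the assumption that the Taylor remainder is at most \(\frac{1}{2c}\Vert x_{k}-x^{*}\Vert \) for \(x_{k}\) near \(x^{*}\). Since \(e(x^{*})=0\),
$$e(x_{k})=\nabla e \bigl(x^{*} \bigr) \bigl(x_{k}-x^{*} \bigr)+o \bigl( \bigl\Vert x_{k}-x^{*} \bigr\Vert \bigr), $$
so
$$\bigl\Vert e(x_{k}) \bigr\Vert \leq \biggl( \bigl\Vert \nabla e \bigl(x^{*} \bigr) \bigr\Vert + \frac{1}{2c} \biggr) \bigl\Vert x_{k}-x^{*} \bigr\Vert $$
and
$$\bigl\Vert x_{k}-x^{*} \bigr\Vert \leq c \bigl\Vert \nabla e \bigl(x^{*} \bigr) \bigl(x_{k}-x^{*} \bigr) \bigr\Vert \leq c \bigl\Vert e(x_{k}) \bigr\Vert + \frac{1}{2} \bigl\Vert x_{k}-x^{*} \bigr\Vert , $$
which gives \(\Vert x_{k}-x^{*}\Vert \leq2c\Vert e(x_{k})\Vert \). Both bounds are covered by the maximum in (3.24).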
Theorem 3.2
Let the assumptions in Lemma 3.3 hold. Assume that there exists a sufficiently small \(\varepsilon_{0}>0\) such that \(\Vert B_{k}-\nabla e(x_{k})\Vert \leq\varepsilon_{0}\) for each k. Then the sequence \(\{x_{k}\}\) converges to \(x^{*}\) superlinearly for \(\alpha_{k}=1\). Moreover, if e is q-order smooth at \(x^{*}\) and there is a neighborhood U of \(x^{*}\) satisfying, for any \(x_{k}\in U\),
$$ \bigl\Vert \bigl[B_{k}-\nabla e \bigl(x^{*} \bigr) \bigr] \bigl(x_{k}-x^{*} \bigr) \bigr\Vert \leq\eta \bigl\Vert x_{k}-x^{*} \bigr\Vert ^{1+q}, $$
(3.26)
then \(x_{k}\rightarrow x^{*}\) with order at least \(1+q\), where η is a constant.
Proof
Since e is continuously differentiable and \(\nabla e(x)\) is nonsingular at \(x^{*}\), there exist a constant \(\gamma>0\) and a neighborhood U of \(x^{*}\) such that \(\nabla e(y)\) is nonsingular and
$$\max \bigl\{ \bigl\Vert \nabla e(y) \bigr\Vert , \bigl\Vert \nabla e(y)^{-1} \bigr\Vert \bigr\} \leq\gamma $$
for any \(y\in U\). Consider the following equality when \(\alpha_{k}=1\):
$$\begin{aligned} &B_{k} \bigl(x_{k+1}-x^{*} \bigr)+ \bigl[ \nabla e(x_{k}) \bigl(x_{k}-x^{*} \bigr)-B_{k} \bigl(x_{k}-x^{*} \bigr) \bigr] \\ &\qquad{}+ \bigl[e(x_{k})-e \bigl(x^{*} \bigr)-\nabla e(x_{k}) \bigl(x_{k}-x^{*} \bigr) \bigr] \\ &\quad=e(x_{k})+B_{k}d_{k}=0, \end{aligned}$$
(3.27)
where the second and the third terms are \(o(\Vert x_{k}-x^{*}\Vert )\). By the von Neumann lemma, since \(\nabla e(x_{k})\) is nonsingular, \(B_{k}\) is also nonsingular. Because \(\nabla e(y)\) is nonsingular with \(\max\{\Vert \nabla e(y)\Vert ,\Vert \nabla e(y)^{-1}\Vert \}\leq\gamma\) for any \(y\in U\), we obtain from Lemma 3.3
$$\bigl\Vert x_{k+1}-x^{*} \bigr\Vert =o \bigl( \bigl\Vert x_{k}-x^{*} \bigr\Vert \bigr)=o \bigl( \bigl\Vert e(x_{k}) \bigr\Vert \bigr),\quad \mbox{as }k\rightarrow \infty, $$
which means that the sequence \(\{x_{k}\}\) converges to \(x^{*}\) superlinearly for \(\alpha_{k}=1\).
If e is q-order smooth at \(x^{*}\), then we get
$$e(x_{k})-e \bigl(x^{*} \bigr)-\nabla e(x_{k}) \bigl(x_{k}-x^{*} \bigr)=O \bigl( \bigl\Vert x_{k}-x^{*} \bigr\Vert ^{q+1} \bigr). $$
Considering the second term of (3.27) as \(x_{k}\rightarrow x^{*}\) and using (3.26), we deduce that it is also \(O(\Vert x_{k}-x^{*}\Vert ^{q+1})\). Therefore, we have
$$\bigl\Vert x_{k+1}-x^{*} \bigr\Vert =O \bigl( \bigl\Vert x_{k}-x^{*} \bigr\Vert ^{q+1} \bigr), \quad\mbox{as }x_{k} \rightarrow x^{*}. $$
The proof is complete. □
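To illustrate Theorem 3.2, the short experiment below takes \(B_{k}=\nabla e(x_{k})\) and full steps \(\alpha_{k}=1\); for a Lipschitz-continuous Jacobian this choice satisfies (3.26) with \(q=1\), so the theorem predicts order at least 2. The test problem is the same illustrative system used earlier, for which \(x^{*}=0\).

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
e = lambda x: A @ x + 0.1 * np.sin(x)          # e(0) = 0, so x* = 0
jac = lambda x: A + 0.1 * np.diag(np.cos(x))   # Lipschitz near x*, so q = 1

x = np.array([0.5, -0.4])
errs = [np.linalg.norm(x)]
for _ in range(5):
    x = x + np.linalg.solve(jac(x), -e(x))     # B_k d_k = -e_k, alpha_k = 1
    errs.append(np.linalg.norm(x))

# Boundedness of ||x_{k+1} - x*|| / ||x_k - x*||^2 indicates order >= 2:
for k in range(len(errs) - 1):
    if errs[k] > 1e-14:                        # skip ratios at machine precision
        print(k, errs[k + 1] / errs[k] ** 2)
```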