
An alternating linearization bundle method for a class of nonconvex nonsmooth optimization problems

Abstract

In this paper, we propose an alternating linearization bundle method for minimizing the sum of a nonconvex function and a convex function, both of which are not necessarily differentiable. The nonconvex function is first locally “convexified” by imposing a quadratic term, and then a cutting-planes model of the local convexification function is generated. The convex function is assumed to be “simple” in the sense that finding its proximal-like point is relatively easy. At each iteration, the method solves two subproblems in which the functions are alternately represented by the linearizations of the cutting-planes model and the convex objective function. It is proved that the sequence of iteration points converges to a stationary point. Numerical results show the good performance of the method.

1 Introduction

In this paper, we consider the structured nonconvex minimization problem

$$ \min_{x\in\mathbb{R}^{n}} \bigl\{ F(x):=f(x)+h(x)\bigr\} , $$
(1)

where \(f: \mathbb{R}^{n}\rightarrow\mathbb{R}\) is possibly a nonconvex nonsmooth function and \(h: \mathbb{R}^{n}\rightarrow(-\infty,\infty]\) is a closed proper convex function.

Problems of the form (1) often appear in practice, for example in signal processing, image reconstruction, engineering, and optimal control. Three typical examples are given below.

Example 1

(Unconstrained transformation of a constrained problem)

Consider the constrained problem

$$ \min \bigl\{ f(x): x\in C\bigr\} , $$
(2)

where f is possibly a nonsmooth nonconvex function and C is a convex subset of \(\mathbb{R}^{n}\). Problem (2) can be written equivalently as

$$ \min_{x\in\mathbb{R}^{n}} f(x)+\imath_{C}(x), $$
(3)

where \(\imath_{C}\) is the indicator function of C, i.e., \(\imath _{C}(x)\) equals 0 on C and infinity elsewhere. Clearly, problem (3) is a special case of problem (1) with \(h(x)=\imath_{C}(x)\). We note that the proximal point of \(\imath_{C}\) can easily be calculated or even has a closed-form solution if C has some special structure.
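The "simple" structure is easy to see in this example: when C is a Euclidean ball, as in the test problems of Sect. 5, the proximal point of \(\imath_{C}\) is just the projection onto the ball. A minimal sketch for illustration (not part of the method itself):

```python
import numpy as np

def prox_indicator_ball(x, a, b):
    """Proximal point of the indicator of C = {x : ||x - a|| <= b}, i.e. the
    Euclidean projection of x onto C (independent of the prox parameter)."""
    d = x - a
    nd = np.linalg.norm(d)
    if nd <= b:
        return x.copy()
    return a + (b / nd) * d

# project the point (3, 3) onto the unit ball centred at the origin
print(prox_indicator_ball(np.array([3.0, 3.0]), np.zeros(2), 1.0))
```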

Example 2

(Nonconvex regularization of a convex function)

Consider the \(l_{q}\ (0< q<1)\) regularization problem

$$ \min_{x\in\mathbb{R}^{n}} \frac{1}{2} \Vert Ax-b \Vert ^{2}+\lambda \Vert x \Vert _{q}, $$
(4)

which has many practical applications in compressed sensing and imaging science (see e.g., [1]), where \(\Vert x \Vert _{q}=(\sum_{i=1}^{n} \vert x_{i} \vert ^{q})^{1/q}\). The objective function of problem (4) is also the sum of a convex function and a nonconvex function.
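For illustration only, a minimal sketch that evaluates the objective of (4); the data A, b, λ and q below are arbitrary placeholders:

```python
import numpy as np

def lq_objective(x, A, b, lam, q):
    """Evaluate (1/2)*||A x - b||^2 + lam*||x||_q with ||x||_q = (sum_i |x_i|^q)^(1/q)."""
    fit = 0.5 * np.linalg.norm(A @ x - b) ** 2
    reg = np.sum(np.abs(x) ** q) ** (1.0 / q)
    return fit + lam * reg

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # placeholder data
b = np.array([1.0, 1.0])
print(lq_objective(np.array([0.5, 0.0]), A, b, lam=0.1, q=0.5))
```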

Example 3

(Convex regularization of a nonconvex function)

Hare et al. [2] studied the function of the form

$$ F(x)=\sum^{n}_{i=1} \bigl\vert f_{i}(x) \bigr\vert +\frac{1}{2} \Vert x \Vert ^{2}, $$
(5)

where \(f_{i}(x): \mathbb{R}^{n}\rightarrow\mathbb{R}, i=1,\ldots,n\) are Ferrier polynomials defined as

$$f_{i}(x)=\bigl(ix_{i}^{2}-2x_{i}\bigr)+ \sum_{j=1}^{n}x_{j}. $$

It is well known that \(f(x)=\sum^{n}_{i=1} \vert f_{i}(x) \vert \) is a nonconvex nonsmooth function, and \(h(x)=\frac{1}{2} \Vert x \Vert ^{2}\) is a simple convex function.
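For illustration, a minimal sketch that evaluates F in (5) together with one subgradient; taking \(\operatorname{sign}(0)=0\) where some \(f_{i}\) vanishes is one admissible choice made in this sketch, not a prescription from [2]:

```python
import numpy as np

def regular_F(x):
    """Evaluate F(x) = sum_i |f_i(x)| + 0.5*||x||^2 for the Ferrier polynomials
    f_i(x) = (i*x_i^2 - 2*x_i) + sum_j x_j, i = 1,...,n, and return one subgradient
    g = sum_i sign(f_i(x))*grad f_i(x) + x."""
    n = x.size
    idx = np.arange(1, n + 1)
    f = idx * x**2 - 2.0 * x + x.sum()                  # f_i(x), i = 1,...,n
    F = np.abs(f).sum() + 0.5 * (x @ x)
    s = np.sign(f)
    # grad f_i(x) = (2*i*x_i - 2)*e_i + ones(n)
    g = s * (2.0 * idx * x - 2.0) + s.sum() * np.ones(n) + x
    return F, g

print(regular_F(np.ones(3)))   # F = 10.5, g = [4, 6, 8]
```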

The methods for minimizing the sum of two functions have been well studied during the past several decades, and different methods have been developed according to the structure of the two functions; see e.g., [3–12]. In particular, Kiwiel [9] proposed an alternating linearization bundle method for the sum of two convex functions, one of which is “simple” (i.e., minimizing this function plus a separable convex quadratic function is “easy”). Goldfarb et al. [7] proposed a fast alternating linearization method for the sum of two convex functions, both of which are “simple”. Li et al. [10] presented a proximal alternating linearization method for the sum of two nonconvex functions, based on the assumption that the proximal point of each of the two functions at a given point can easily be calculated. Attouch et al. [3] and Bolte et al. [4] considered a broad class of nonconvex and nonsmooth minimization problems that include, as a special case, minimizing the sum of two nonconvex functions; there the proximal alternating minimization technique is used and the Kurdyka–Łojasiewicz property is assumed.

In this paper, we consider minimizing the sum of a nonconvex function and a convex function of the form (1). In particular, we assume that f is lower-\(C^{2}\) and h is “simple” in the sense that minimizing h plus a quadratic term is relatively easy. The method presented in this paper can be viewed as a generalized version of the methods given in [9] and [13]. On one hand, we generalize the method of [9] from minimizing the sum of two convex functions to the sum of a nonconvex function and a convex function. On the other hand, we generalize the method of [13] from minimizing a single nonconvex nonsmooth function to minimizing the sum of two functions.

Our method will produce three sequences of points: \(\{z^{\ell}\}\), \(\{ y^{\ell}\}\) and \(\{x^{k(\ell)}\}\), where \(\{z^{\ell}\}\) is the sequence of proximal points, \(\{y^{\ell}\}\) is the sequence of trial points, and \(\{x^{k(\ell)}\}\) is the sequence of stability centers (i.e., \(x^{k(\ell)}\in\{y^{\ell}\}\) is the “best” point obtained so far at iteration \(\ell\), which will be abbreviated as \(x^{k}\) if there is no confusion). More precisely, our method will alternately solve the following two subproblems:

$$\begin{aligned} &z^{\ell+1}:=\arg\min\biggl\{ \check{\varphi}_{\ell}(\cdot)+\bar {h}_{\ell-1}(\cdot)+ \frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} , \end{aligned}$$
(6)
$$\begin{aligned} &y^{\ell+1}:=\arg\min\biggl\{ \bar{\varphi}_{\ell}(\cdot)+h(\cdot)+ \frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} , \end{aligned}$$
(7)

where \(\check{\varphi}_{\ell}\) is a cutting-planes model [14, 15] of the local convexification function of f at iteration \(\ell\), which is based on the idea of the redistributed proximal bundle method in [13] and will be made more precise later; \(\bar{h}_{\ell-1}\) is a linearization of h at iteration \(\ell-1\); \(\bar{\varphi}_{\ell}\) is a linearization of \(\check{\varphi }_{\ell}\); \(\mu_{\ell}\) is the proximal parameter. Our convergence analysis shows that, under suitable assumptions, any accumulation point of the sequence \(\{x^{k}\}\) is a stationary point of F if there is an infinite number of serious steps; otherwise, the last stability center is a stationary point of F.

This paper is organized as follows. In Sect. 2, we review some basic definitions and results required for this work. In Sect. 3, we present the alternating linearization bundle method for problem (1). Section 4 examines the convergence properties of the algorithm. Some preliminary numerical results are given in Sect. 5. The Euclidean inner product in \(\mathbb{R}^{n}\) is denoted by \(\langle x, y\rangle=x^{T}y\), and the associated norm by \(\Vert \cdot \Vert \).

2 Preliminaries

In this section, we recall some basic definitions and results that are closely relevant to our method, which can be found in [13, 16, 17].

  • The limiting subdifferential of f at \(\bar{x}\) is defined by

    $$\partial f(\bar{x}):=\limsup_{x\rightarrow\bar{x},\, f(x)\rightarrow f(\bar{x})} \hat{\partial}f(x), $$

    where \(\hat{\partial}f(\bar{x})\) is the regular subdifferential defined by

    $$\hat{\partial}f(\bar{x}):=\biggl\{ g\in\mathbb{R}^{n}: \liminf_{x\rightarrow\bar{x},\, x\neq\bar{x}}\frac{f(x)-f(\bar{x})-\langle g,x-\bar{x}\rangle}{ \Vert x-\bar{x} \Vert }\geq0\biggr\} . $$

    An element \(g\in\partial f(\bar{x})\) is called a subgradient of f at \(\bar{x}\).

  • The function f is prox-bounded if there exists \(R\geq0\) such that the function \(f(\cdot)+\frac{1}{2}R \Vert \cdot \Vert ^{2}\) is bounded below. The corresponding threshold is the smallest \(r_{pb}\geq0\) such that \(f(\cdot)+\frac{1}{2}R \Vert \cdot \Vert ^{2}\) is bounded below for all \(R>r_{pb}\).

  • The function f is lower-\(C^{2}\) on an open set V if for each \(\bar{x}\in V\) there is a neighborhood \(V'\) of \(\bar{x}\) upon which a representation \(f(x)=\max_{t\in T} f_{t}(x)\) holds, where T is a compact set and the functions \(f_{t}\) are of class \(C^{2}\) on \(V'\) such that \(f_{t}\), \(\nabla f_{t}\) and \(\nabla^{2}f_{t}\) depend continuously on \((t,x)\in T\times V'\).

  • The proximal point mapping of the function f at the point \(x\in\mathbb{R}^{n}\) is defined by

    $$p_{R}f(x):=\arg\min_{\omega\in\mathbb{R}^{n}} \biggl\{ f(\omega )+ \frac{1}{2}R \Vert \omega-x \Vert ^{2} \biggr\} . $$
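As a small worked instance of this definition (not taken from the references): for the one-dimensional function \(f(\omega)=\vert\omega\vert\),

$$p_{R}f(x)=\arg\min_{\omega\in\mathbb{R}} \biggl\{ \vert \omega \vert +\frac{1}{2}R(\omega-x)^{2} \biggr\} =\operatorname{sign}(x)\max \biggl\{ \vert x \vert -\frac{1}{R}, 0 \biggr\} , $$

so, for example, \(R=2\) and \(x=1\) give \(p_{R}f(1)=1/2\).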

Lemma 1

([16])

Suppose that the function f is lower-\(C^{2}\) on V and \(\bar{x}\in V\). Then there exist \(\varepsilon>0, K>0\), and \(\rho>0\) such that

  1. (i)

    for any point \(x^{0}\) and parameter \(R\geq\rho\) the function \(f+\frac{1}{2}R \Vert \cdot-x^{0} \Vert ^{2}\) is convex and finite valued on the closed ball \(\bar{B}_{\varepsilon }(\bar{x})\), and

  2. (ii)

    the function f is Lipschitz continuous with constant K on \(\bar{B}_{\varepsilon}(\bar{x})\).

Theorem 1

([16])

Suppose that the lower semicontinuous function f is prox-bounded with threshold \(r_{pb}\) and lower-\(C^{2}\) on V. Let \(\bar{x}\in V\) and let \(\varepsilon>0,K>0\) and \(\rho>0\) be given by Lemma 1. Then \(\bar{x}\) is a stationary point of f if and only if \(\bar {x}=p_{R}f(\bar{x})\) for any \(R>R_{\bar{x}}:=\max\{4K/\varepsilon, \rho, r_{pb}\}\).

Assumption 1

([13])

Given \(x^{0}\in\mathbb{R}^{n}\) and \(M_{0}\geq0\), there exist an open bounded set \(\mathcal{O}\) and a function H such that \(\mathcal{L}_{0}:=\{x\in\mathbb{R}^{n}: f(x)\leq f(x^{0})+M_{0}\}\subset\mathcal{O}\), and H is lower-\(C^{2}\) on \(\mathcal{O}\) with \(H\equiv f\) on \(\mathcal{L}_{0}\).

Theorem 2

([13])

For a function f satisfying Assumption 1, the following results hold:

  1. (i)

    The level set \(\mathcal{L}_{0}\) is nonempty and compact.

  2. (ii)

    There exists \(\rho^{\mathrm{id}}>0\) such that, for any \(\rho\geq\rho^{\mathrm{id}}\) and any given \(y\in\mathcal{L}_{0}\), the function \(f+\frac{1}{2}\rho \Vert \cdot-y \Vert ^{2}\) is convex on \(\mathcal{L}_{0}\).

  3. (iii)

    The function f is Lipschitz continuous on \(\mathcal{L}_{0}\).

3 The alternating linearization bundle method

3.1 Motivation and framework

The classic proximal point algorithm (see e.g. [18]) for solving problem (1) generates the new iterate by

$$ y^{\ell+1}=\arg\min \biggl\{ f(\cdot)+h(\cdot)+ \frac{1}{2}R_{\ell } \bigl\Vert \cdot-y^{\ell} \bigr\Vert ^{2} \biggr\} , $$
(8)

where \(R_{\ell}>0\) is the proximal parameter.

However, since f is a nonconvex function, solving problem (8) may not be easy and is usually as difficult as the original problem (1). Therefore, we will tackle the difficulty via the following three steps.

1. Generate the local convexification function of the nonconvex function f. Following the redistribution idea of [13], we split the prox-parameter \(R_{\ell}\) into two dynamic parameters \(\eta_{\ell}\) and \(\mu_{\ell}\) which are nonnegative and satisfy \(R_{\ell}=\eta_{\ell}+\mu_{\ell}\). Then the problem (8) (after replacing \(y^{\ell}\) by \(x^{k}\)) can be written as

$$ y^{\ell+1}=\arg\min\biggl\{ \varphi_{\ell}(\cdot)+h(\cdot)+\frac{1}{2}\mu _{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} , $$
(9)

where

$$ \varphi_{\ell}(\cdot)=f(\cdot)+\frac{1}{2} \eta_{\ell} \bigl\Vert \cdot -x^{k} \bigr\Vert ^{2} $$
(10)

is called the local convexification function of f, since it is convex whenever \(\eta_{\ell}\) is large enough (see Theorem 2).

2. Generate the cutting-planes model of \(\varphi_{\ell}\). Let \(\ell\) be the current iteration index, \(y^{i}\), \(i\in J_{\ell }\subseteq\{0,1,\ldots,\ell\}\) be trial points generated in the previous iterations, and \(g_{f}^{i}\in\partial f(y^{i})\). Define the cutting-planes model of \(\varphi_{\ell}\) by

$$ \check{\varphi}_{\ell}(\cdot)=\max_{i\in J_{\ell}} \biggl\{ f\bigl(y^{i}\bigr)+\frac{1}{2}\eta_{\ell} \bigl\Vert y^{i}-x^{k} \bigr\Vert ^{2}+ \bigl\langle g_{f}^{i}+\eta_{\ell}\bigl(y^{i}-x^{k} \bigr), \cdot-y^{i}\bigr\rangle \biggr\} , $$
(11)

where \(g_{f}^{i}=g_{f}(y^{i})\in\partial f(y^{i})\). Therefore, we obtain an approximate version of problem (9) as follows:

$$ y^{\ell+1}:=\arg\min\biggl\{ \check{\varphi}_{\ell}( \cdot)+h(\cdot )+\frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} . $$
(12)

3. Apply the alternating linearization bundle strategy to solve problem (12). Since problem (12) may still be difficult, motivated by the idea of the alternating linearization bundle method [9], we alternately solve the following two subproblems:

$$\begin{aligned} &z^{\ell+1}:=\arg\min\biggl\{ \check{\varphi}_{\ell}(\cdot)+\bar {h}_{\ell-1}(\cdot)+ \frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} , \end{aligned}$$
(13)
$$\begin{aligned} &y^{\ell+1}:=\arg\min\biggl\{ \bar{\varphi}_{\ell}(\cdot)+h(\cdot)+ \frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} . \end{aligned}$$
(14)

The above two subproblems are much easier to solve, since their objective functions alternately replace h and \(\check{\varphi}_{\ell}\) by the linear models \(\bar{h}_{\ell-1}\) and \(\bar{\varphi}_{\ell}\), respectively.
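To make the second subproblem concrete: since \(\bar{\varphi}_{\ell}\) is affine with gradient \(s_{\varphi}^{\ell}\), (14) amounts to a single proximal step on h. A minimal sketch, assuming (as in the test problems of Sect. 5) that h is the indicator of a Euclidean ball so that the proximal step is a projection:

```python
import numpy as np

def project_ball(x, a, b):
    """Euclidean projection onto C = {x : ||x - a|| <= b}."""
    d = x - a
    nd = np.linalg.norm(d)
    return x.copy() if nd <= b else a + (b / nd) * d

def solve_subproblem_14(x_k, s_phi, mu, a, b):
    """Since the model of the convexified f is affine with gradient s_phi, (14)
    reduces to the prox step y = prox_{h/mu}(x_k - s_phi/mu); with h the indicator
    of a ball this prox is the projection onto the ball."""
    return project_ball(x_k - s_phi / mu, a, b)

y_next = solve_subproblem_14(x_k=np.array([3.0, 3.0]), s_phi=np.array([1.0, -2.0]),
                             mu=2.0, a=np.zeros(2), b=1.0)
print(y_next)
```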

3.2 Further description via bundle terminologies

Bundle methods [19–21] are among the most robust and reliable methods for general nonsmooth optimization problems, and can be considered stabilized variants of the cutting-planes method [14, 15]. In general, for a convex function h, bundle methods store the trial points \(y^{i}\), \(i\in J_{\ell}\) with their function values and subgradients in a bundle of information:

$$ \bigcup_{i\in J_{\ell}} \bigl\{ \bigl(y^{i}, h\bigl(y^{i}\bigr), g^{i}_{h} \in \partial h\bigl(y^{i}\bigr)\bigr) \bigr\} , $$
(15)

and a point \(x^{k}:=x^{k(\ell)}\) (called the stability center) which is the “best” point obtained so far. A storage-saving form of (15) (relative to the current stability center \(x^{k}\)) is given by

$$\bigcup_{i\in J_{\ell}}\bigl\{ \bigl(e^{i,k}_{h}, g^{i}_{h}\in\partial _{e^{i,k}_{h}}h\bigl(x^{k} \bigr)\bigr)\bigr\} , $$

where \(\partial_{e}h\) is the e-subdifferential of h in convex analysis, and \(e^{i,k}_{h}\) are the linearization errors of h defined by

$$ e^{i,k}_{h}=h\bigl(x^{k}\bigr)- \bigl(h\bigl(y^{i}\bigr)+\bigl\langle g^{i}_{h}, x^{k}-y^{i}\bigr\rangle \bigr). $$
(16)

Following the notations above, the bundle information of the function \(\varphi_{\ell}(\cdot)\) can be written as (see also [13]):

$$ \bigcup_{i\in J_{\ell}}\bigl\{ \bigl(e^{i,k}_{f}, d^{k}_{i}, \Delta^{k}_{i}, g^{i}_{f}\bigr)\bigr\} \quad\text{with } \textstyle\begin{cases} e^{i,k}_{f}=f(x^{k})-(f(y^{i})+\langle g^{i}_{f}, x^{k}-y^{i} \rangle ), \\ d^{k}_{i}=\frac{1}{2} \Vert y^{i}-x^{k} \Vert ^{2}, \\ g^{i}_{f}\in\partial f(y^{i}), \\ \Delta^{k}_{i}=y^{i}-x^{k}, \end{cases} $$
(17)

where \(e^{i,k}_{f}\) and \(d^{k}_{i}\) are the linearization errors of f and \(\frac{1}{2} \Vert \cdot-x^{k} \Vert ^{2}\), respectively, \(g^{i}_{f}\) is a subgradient of f at \(y^{i}\), and \(\Delta^{k}_{i}\) is the gradient of \(\frac{1}{2} \Vert \cdot-x^{k} \Vert ^{2}\) at \(y^{i}\). From (17), we know that \(e^{i,k}_{f}\), \(d^{k}_{i}\) and \(\Delta^{k}_{i}\) depend on the point \(x^{k}\), so they should be updated whenever a new stability center is generated (details are given below).
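A minimal sketch of assembling the quadruple in (17) for one trial point; the toy data at the end are placeholders used only to check the signs:

```python
import numpy as np

def bundle_elements(x_k, f_xk, y_i, f_yi, g_fi):
    """Bundle quadruple (e, d, Delta, g) of (17) for a trial point y_i,
    taken relative to the current stability center x_k."""
    Delta = y_i - x_k                           # gradient of 0.5*||.-x_k||^2 at y_i
    d = 0.5 * (Delta @ Delta)                   # value of 0.5*||y_i - x_k||^2
    e = f_xk - (f_yi + g_fi @ (x_k - y_i))      # linearization error of f at x_k
    return e, d, Delta, g_fi

# toy check with the convex function f(x) = ||x||^2, g = 2x (e must be >= 0 here)
x_k, y_i = np.zeros(2), np.ones(2)
print(bundle_elements(x_k, 0.0, y_i, 2.0, 2.0 * y_i))
```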

By the optimality conditions of subproblem (13), there exists a multiplier vector \((\alpha^{\ell}_{i}, i\in J_{\ell})\in S^{\ell}\) such that

$$ z^{\ell+1}=x^{k}-\frac{1}{\mu_{\ell}} \biggl(\sum _{i\in J_{\ell }}\alpha^{\ell}_{i} \bigl(g^{i}_{f}+\eta_{\ell}\Delta^{k}_{i} \bigr)+g^{\ell -1}_{h} \biggr), $$
(18)

where \(S^{\ell}\) denotes the unit simplex in \(\mathbb{R}^{ \vert J_{\ell } \vert }\) and \(g^{\ell-1}_{h}=\nabla\bar{h}_{\ell-1}(z^{\ell+1})\).
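The paper does not prescribe a particular solver for (13). One standard option (a sketch under the stated assumptions, not the authors' implementation) is to solve the equivalent dual quadratic program over the simplex \(S^{\ell}\) and then recover \(z^{\ell+1}\) from (18): writing \(G_{i}=g^{i}_{f}+\eta_{\ell}\Delta^{k}_{i}\) and \(c_{i}=-(e^{i,k}_{f}+\eta_{\ell}d^{k}_{i})\) for the cut slopes and values at \(x^{k}\) (cf. (21), dropping the common constant \(f(x^{k})\)), the dual reads \(\min_{\alpha\in S^{\ell}} \frac{1}{2\mu_{\ell}} \Vert \sum_{i}\alpha_{i}G_{i}+s^{\ell-1}_{h} \Vert ^{2}-\sum_{i}\alpha_{i}c_{i}\).

```python
import numpy as np
from scipy.optimize import minimize

def solve_subproblem_13(x_k, mu, G, c, s_h):
    """Solve (13) via its dual over the unit simplex:
         min_{alpha >= 0, sum alpha = 1}  (1/(2*mu))*||G.T @ alpha + s_h||^2 - c @ alpha,
    where row i of G is g_f^i + eta*Delta_i^k and c_i = -(e_i + eta*d_i) is the cut
    value at x_k up to the common constant f(x_k).  The primal solution is then
    z = x_k - (G.T @ alpha + s_h)/mu, as in (18), and s_phi = mu*(x_k - z) - s_h, as in (29)."""
    m = G.shape[0]

    def q(alpha):
        v = G.T @ alpha + s_h
        return 0.5 / mu * (v @ v) - c @ alpha

    def dq(alpha):
        return (G @ (G.T @ alpha + s_h)) / mu - c

    res = minimize(q, np.full(m, 1.0 / m), jac=dq, method="SLSQP",
                   bounds=[(0.0, 1.0)] * m,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}])
    alpha = res.x
    z = x_k - (G.T @ alpha + s_h) / mu
    s_phi = mu * (x_k - z) - s_h
    return z, s_phi, alpha
```

Any solver for simplex-constrained quadratic programs could replace SLSQP here; it is used only because it is readily available.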

As the iterations proceed, the number of elements in the bundle may grow without bound, which could lead to serious storage and computational difficulties. The subgradient aggregation strategy [22] is the most popular and efficient way to overcome this difficulty. We use the notation \(g^{-\ell}_{\eta_{\ell}}\) to denote the aggregate subgradient, i.e.,

$$ g^{-\ell}_{\eta_{\ell}}:=\sum _{i\in J_{\ell}}\alpha^{\ell }_{i}\bigl(g^{i}_{f}+ \eta_{\ell}\Delta^{k}_{i}\bigr)\in \partial\check{ \varphi}_{\ell}\bigl(z^{\ell+1}\bigr). $$
(19)

Define the strongly active set of subgradients by

$$J^{\mathrm{act}}_{\ell}:=\bigl\{ i\in J_{\ell}: \alpha^{\ell}_{i}>0\bigr\} . $$

Then the corresponding aggregate bundle elements are given by

$$ \bigl(e^{-\ell}_{f}, d^{k}_{-\ell}, \Delta^{k}_{-\ell}, g^{-\ell}_{f}\bigr):= \sum_{i\in J_{\ell}}\alpha^{\ell }_{i} \bigl(e^{i,k}_{f},d^{k}_{i}, \Delta^{k}_{i},g^{i}_{f}\bigr) =\sum _{j\in J^{\mathrm{act}}_{\ell}}\alpha^{\ell }_{j} \bigl(e^{j,k}_{f},d^{k}_{j}, \Delta^{k}_{j},g^{j}_{f}\bigr). $$
(20)

Therefore

$$g^{-\ell}_{\eta_{\ell}}=\sum_{i\in J_{\ell}} \alpha^{\ell }_{i}\bigl(g^{i}_{f}+ \eta_{\ell}\Delta^{k}_{i}\bigr)=g^{-\ell}_{f}+ \eta_{\ell }\Delta^{k}_{-\ell}=\mu_{\ell} \bigl(x^{k}-z^{\ell+1}\bigr)-g^{\ell-1}_{h}. $$

Here, as in [13, 23], we use the negative index \(-\ell\) to denote the aggregate bundle elements, hence \(J_{\ell}\subseteq\{ -\ell,-\ell+1,\dots,0,1,\dots,\ell-1,\ell\}\) in general.

By making use of the notations above, the cutting-planes model \(\check {\varphi}_{\ell}\) in (11) can be rewritten as:

$$ \check{\varphi}_{\ell}(\cdot)=f\bigl(x^{k} \bigr)+\max_{i\in J_{\ell}}\bigl\{ -\bigl(e^{i,k}_{f}+ \eta_{\ell}d^{k}_{i}\bigr)+\bigl\langle g^{i}_{f}+\eta_{\ell }\Delta^{k}_{i}, \cdot-x^{k}\bigr\rangle \bigr\} . $$
(21)

Note that, for all \(j\in J^{\mathrm{act}}_{\ell}\) we have

$$ \check{\varphi}_{\ell}\bigl(z^{\ell+1}\bigr)=f \bigl(x^{k}\bigr)-e^{j,k}_{f}-\eta_{\ell }d^{k}_{j}+ \bigl\langle g^{j}_{f}+\eta_{\ell} \Delta^{k}_{j},z^{\ell +1}-x^{k}\bigr\rangle , $$
(22)

and the aggregate model of \(\check{\varphi}_{\ell}\) in \(J_{\ell}\) is

$$\tilde{\varphi}_{-\ell}(\cdot)=f\bigl(x^{k} \bigr)-e^{-\ell,k}_{f}-\eta_{\ell }d_{-\ell}^{k}+ \bigl\langle g^{-\ell}_{f}+\eta_{\ell} \Delta_{-\ell }^{k},\cdot-x^{k}\bigr\rangle . $$

For a new stability center \(x^{k+1}\), the bundle elements can be updated by (see [13])

$$\begin{aligned} \begin{aligned} &e^{i,k+1}_{f}=e^{i,k}_{f}+f \bigl(x^{k+1}\bigr)-\bigl(f\bigl(x^{k}\bigr)+\bigl\langle g^{i}_{f},x^{k+1}-x^{k}\bigr\rangle \bigr), \\ &d^{k+1}_{i}=d^{k}_{i}+ \frac{1}{2} \bigl\Vert x^{k+1}-x^{k} \bigr\Vert ^{2}+\bigl\langle \Delta ^{k}_{i},x^{k+1}-x^{k} \bigr\rangle , \\ &\Delta^{k+1}_{i}=\Delta^{k}_{i}+x^{k}-x^{k+1}. \end{aligned} \end{aligned}$$
(23)
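A minimal sketch of the update (23), applied to all stored cuts at once; the array shapes (one row per cut for Delta and g_f) are assumptions of this sketch:

```python
import numpy as np

def shift_bundle_to_new_center(e, d, Delta, g_f, x_new, x_old, f_new, f_old):
    """Update of the stored bundle elements in (23) when the stability center
    moves from x_old to x_new.  Here e and d are 1-D arrays with one entry per
    cut, while Delta and g_f are 2-D arrays with one row per cut."""
    step = x_new - x_old
    e_new = e + f_new - (f_old + g_f @ step)
    d_new = d + 0.5 * (step @ step) + Delta @ step
    Delta_new = Delta - step
    return e_new, d_new, Delta_new
```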

On the other hand, since f is possibly nonconvex, the linearization errors \(e^{i,k}_{f}+\eta_{\ell}d^{k}_{i}, i\in J_{\ell}\) may be negative, and therefore the model \(\check{\varphi}_{\ell}\) is not necessarily a lower approximation to \(\varphi_{\ell}\). In the case of \(e^{i,k}_{f}+\eta_{\ell}d^{k}_{i}\geq0\), one has

$$ g^{i}_{f}+\eta_{\ell} \Delta^{k}_{i}\in\partial_{e^{i,k}_{f}+\eta _{\ell}d^{k}_{i}}\check{ \varphi}_{\ell}\bigl(x^{k}\bigr). $$
(24)

In order to ensure that the linearization errors are all nonnegative, the convexification parameter \(\eta_{\ell}\) should be adjusted to asymptotically estimate the ideal convexity threshold \(\rho^{\mathrm{id}}\) in Theorem 2. Hare et al. [2] suggested a lower bound for \(\eta_{\ell }\) as follows:

$$ \eta^{\min}_{\ell}:=\max_{i\in J_{\ell},d^{k}_{i}>0}- \frac {e^{i,k}_{f}}{d^{k}_{i}}, $$
(25)

which guarantees that \(e^{i,k}_{f}+\eta d^{k}_{i}\geq0\) for all \(i\in J_{\ell}\) whenever \(\eta\geq\eta^{\min}_{\ell}\).
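A minimal sketch of (25) on a toy bundle, confirming that the shifted errors become nonnegative once \(\eta\geq\eta^{\min}_{\ell}\):

```python
import numpy as np

def eta_min(e, d):
    """Lower bound (25): the smallest eta making every shifted error e_i + eta*d_i
    nonnegative (only indices with d_i > 0 contribute)."""
    mask = d > 0
    return np.max(-e[mask] / d[mask]) if np.any(mask) else 0.0

e = np.array([0.3, -0.2, 0.1])    # linearization errors; may be negative since f is nonconvex
d = np.array([0.5, 0.4, 0.0])
eta = max(eta_min(e, d), 0.0)     # the algorithm keeps eta >= 0
print(eta, e + eta * d)           # all shifted errors are >= 0 once eta >= eta_min
```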

Finally, in our algorithm, we define the predicted descent \(\delta _{\ell}\) and the linearization error \(\varepsilon_{\ell}\) as follows:

$$\begin{aligned} &\delta_{\ell}:=f\bigl(x^{k}\bigr)+\frac{1}{2} \eta_{\ell} \bigl\Vert y^{\ell +1}-x^{k} \bigr\Vert ^{2}+h\bigl(x^{k}\bigr) -\bigl[ \bar{\varphi}_{\ell} \bigl(y^{\ell+1}\bigr)+h\bigl(y^{\ell+1}\bigr) \bigr], \end{aligned}$$
(26)
$$\begin{aligned} &\varepsilon_{\ell}:=F\bigl(x^{k}\bigr)-\bigl[ \bar{ \varphi}_{\ell}\bigl(x^{k}\bigr)+\bar {h}_{\ell} \bigl(x^{k}\bigr) \bigr]. \end{aligned}$$
(27)

For a fixed parameter \(\kappa\in(0,1)\), a descent step is taken if

$$ F\bigl(y^{\ell+1}\bigr)\leq F\bigl(x^{k}\bigr)- \kappa\delta_{\ell}, $$
(28)

holds, in which case the stability center is updated as \(x^{k+1}=y^{\ell+1}\). Otherwise, a null step occurs, and the aggregate linearization and the new linearization are used to produce a better model \(\check{\varphi}_{\ell+1}\).
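A minimal sketch of (26) and the test (28); all inputs are assumed to have been computed already by the surrounding algorithm:

```python
import numpy as np

def descent_test(F_y, F_xk, f_xk, h_xk, h_y, phibar_y, eta, y, x_k, kappa):
    """Predicted descent delta of (26) and the descent test (28); phibar_y denotes
    the value of the affine model of the convexified f at the trial point y."""
    delta = f_xk + 0.5 * eta * np.sum((y - x_k) ** 2) + h_xk - (phibar_y + h_y)
    return delta, F_y <= F_xk - kappa * delta
```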

3.3 The algorithm

Algorithm 1

Step 0. (Initialization). Select a starting point \(y^{0}\) and set \(x^{0}=y^{0}\). Set parameters \(M>0\), \(R_{0}>0\), \(\kappa\in(0,1)\), \(\epsilon\geq0\), and \(\Gamma\geq1\). Initialize the iteration counter \(\ell=0\), the descent step counter \(k:=k(\ell)=0\) with \(i_{0}=0\). Set \((\mu_{0}, \eta_{0})=(R_{0}, 0)\) and \(J_{0}:=\{0\}\). Compute \(f(x^{0}), g^{0}_{f}\in\partial f(x^{0})\) and the bundle information \((e^{0,0}_{f}, d^{0}_{0}, \triangle^{0}_{0}):=(0, 0, 0)\). Set \(s^{-1}_{h}=g^{0}_{h}\in\partial h(x^{0})\).

Step 1. Find \(z^{\ell+1}\) by solving subproblem (13), and set

$$ \bar{\varphi}_{\ell}(\cdot)=\check{\varphi}_{\ell} \bigl(z^{\ell+1}\bigr)+ \bigl\langle s_{\varphi}^{\ell}, \cdot-z^{\ell+1} \bigr\rangle \quad\textit{with } s_{\varphi}^{\ell}= \mu_{\ell}\bigl(x^{k}-z^{\ell +1}\bigr)-s_{h}^{\ell-1}. $$
(29)

Step 2. Find \(y^{\ell+1}\) by solving subproblem (14), and set

$$ \bar{h}_{\ell}(\cdot)=h\bigl(y^{\ell+1}\bigr)+\bigl\langle s_{h}^{\ell}, \cdot -y^{\ell+1} \bigr\rangle \quad\textit{with } s_{h}^{\ell}=\mu_{\ell} \bigl(x^{k}-y^{\ell +1}\bigr)-s_{\varphi}^{\ell}. $$
(30)

Step 3. (Stopping criterion). Compute \(f(y^{\ell +1})\), \(h(y^{\ell+1})\), \(g_{f}^{\ell+1}\in\partial f(y^{\ell+1})\) and \(g_{h}^{\ell+1}\in\partial h(y^{\ell+1})\). If \(\delta_{\ell} \leq \epsilon\), then STOP. Otherwise, compute the new bundle elements by

$$\begin{aligned} &\Delta^{k}_{\ell+1}:=y^{\ell+1}-x^{k},\qquad d^{k}_{\ell+1}:= \bigl\Vert \Delta ^{k}_{\ell+1} \bigr\Vert ^{2}/2, \\ &e^{\ell+1,k}_{f}:=f\bigl(x^{k}\bigr)-\bigl(f \bigl(y^{\ell+1}\bigr)+\bigl\langle g^{\ell +1}_{f}, \Delta^{k}_{\ell+1}\bigr\rangle \bigr). \end{aligned}$$

Select a new index set \(J_{\ell+1}\) satisfying

$$ J_{\ell+1}\supseteq\{\ell+1, i_{k}\} \quad\textit{and}\quad \textstyle\begin{cases} \textit{either } J_{\ell+1}\supseteq J_{\ell}^{\mathrm{act}},\\ \textit{or } J_{\ell+1}\supseteq\{-\ell\}. \end{cases} $$
(31)

Step 4. (Descent test). If (28) holds, declare a descent step, set \(k(\ell +1)=k+1, i_{k+1}=\ell+1, x^{k+1}=y^{\ell+1}\), and update the bundle elements by (23). Otherwise, declare a null step, and set \(k(\ell+1)=k(\ell)\).

Step 5. (Update η). Update the convexification parameter by

$$ \textstyle\begin{cases} \eta_{\ell+1}:=\eta_{\ell} \quad\textit{if } \eta_{\ell+1}^{\min }\leq\eta_{\ell}, \\ \eta_{\ell+1}:=\Gamma\eta_{\ell+1}^{\min} \quad \textit{and} \quad R_{\ell +1}:=\mu_{\ell}+\eta_{\ell+1} \quad \textit{otherwise}. \end{cases} $$
(32)

Step 6. (Update μ). If \(F(y^{\ell +1})>F(x^{k})+M\), then the objective increase is unacceptable: set \(\mu_{\ell+1}:=\Gamma\mu_{\ell}\) and loop to Step 1. Otherwise, set \(\mu_{\ell+1}:=\mu_{\ell}\).

Step 7. (Loop). Increase \(\ell\) by 1 and go to Step 1.
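A condensed sketch of the parameter bookkeeping in Steps 5 and 6 (simplified: the actual algorithm loops back to Step 1 whenever μ is increased, which is only signalled by a flag here):

```python
def update_parameters(eta, eta_min_next, mu, F_y, F_xk, M, Gamma):
    """Sketch of Steps 5-6: enlarge the convexification parameter only when the new
    lower bound exceeds it, and enlarge mu only when the trial point increased the
    objective by more than M; R is simply kept equal to mu + eta."""
    if eta_min_next > eta:
        eta = Gamma * eta_min_next
    increase_mu = F_y > F_xk + M        # "unacceptable" objective increase
    if increase_mu:
        mu = Gamma * mu
    R = mu + eta
    return eta, mu, R, increase_mu

print(update_parameters(eta=1.0, eta_min_next=2.0, mu=5.0,
                        F_y=12.0, F_xk=10.0, M=5.0, Gamma=2.0))
```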

Remark 1

(1) The predicted descent \(\delta_{\ell}\) and the linearization error \(\varepsilon_{\ell}\) are nonnegative (the details are given below); (2) in Step 6, if \(F(y^{\ell+1})>F(x^{k})+M\) holds, then the current model is considered “bad”, so we should become more “conservative” and therefore increase the proximal parameter μ by setting \(\mu_{\ell+1}=\Gamma\mu_{\ell}\). In the next section, we will prove that μ is increased only a finite number of times; (3) the parameters \(\mu_{\ell}, \eta_{\ell}\) in the algorithm eventually become stable.

Lemma 2

The predicted descent \(\delta_{\ell}\) and the linearization error \(\varepsilon_{\ell}\) are nonnegative, and satisfy

$$ \varepsilon_{\ell}=\delta_{\ell}-\frac{R_{\ell}+\mu_{\ell }}{2\mu_{\ell}^{2}} \bigl\Vert s^{\ell} \bigr\Vert ^{2}, \quad \textit{with } s^{\ell }=s_{\varphi}^{\ell}+s_{h}^{\ell}. $$
(33)

Proof

In Step 2, from (30) and (18) we know \(g^{\ell-1}_{h}=s^{\ell-1}_{h}\), hence

$$ g^{-\ell}_{\eta_{\ell}}={\mu_{\ell}} \bigl(x^{k}-z^{\ell+1}\bigr)-g^{\ell -1}_{h}={ \mu_{\ell}}\bigl(x^{k}-z^{\ell+1}\bigr)-s^{\ell-1}_{h}=s^{\ell }_{\varphi}. $$
(34)

Next, we prove \(\delta_{\ell}\geq0\) and \(\varepsilon_{\ell}\geq 0\). From (26) and (29) we have

$$\begin{aligned} \delta_{\ell}&=f\bigl(x^{k}\bigr)+ \frac{1}{2}\eta_{\ell} \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}+h\bigl(x^{k}\bigr)-\bigl[ \bar{ \varphi}_{\ell}\bigl(y^{\ell+1}\bigr)+ \bar{h}_{\ell} \bigl(y^{\ell+1}\bigr) \bigr] \\ &=f\bigl(x^{k}\bigr)+\frac{1}{2}\eta_{\ell} \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}+h \bigl(x^{k}\bigr)-\check{\varphi}_{\ell}\bigl(z^{\ell+1} \bigr)-\bigl\langle s^{\ell }_{\varphi}, y^{\ell+1}-z^{\ell+1} \bigr\rangle -h\bigl(y^{\ell+1}\bigr) \\ &=f\bigl(x^{k}\bigr)-\check{\varphi}_{\ell} \bigl(z^{\ell+1}\bigr)-\bigl\langle s^{\ell }_{\varphi}, y^{\ell+1}-z^{\ell+1}\bigr\rangle +\frac{1}{2} \eta_{\ell } \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}+h\bigl(x^{k}\bigr)-h\bigl(y^{\ell+1}\bigr). \end{aligned}$$
(35)

Letting \(j=-\ell\) in (22) and using (34), we obtain

$$\begin{aligned} &f\bigl(x^{k}\bigr)-\check{\varphi}_{\ell} \bigl(z^{\ell+1}\bigr)-\bigl\langle s^{\ell }_{\varphi}, y^{\ell+1}-z^{\ell+1}\bigr\rangle \\ &\quad=f\bigl(x^{k}\bigr)-\bigl[ f\bigl(x^{k} \bigr)-e^{-\ell}_{f}-\eta_{\ell}d^{k}_{-\ell }+ \bigl\langle g^{-\ell}_{f}+\eta_{\ell} \triangle^{k}_{-\ell},z^{\ell +1}-x^{k}\bigr\rangle \bigr]-\bigl\langle s^{\ell}_{\varphi}, y^{\ell+1}-z^{\ell +1} \bigr\rangle \\ &\quad=e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}- \bigl\langle s^{\ell}_{\varphi }, z^{\ell+1}-x^{k} \bigr\rangle -\bigl\langle s^{\ell}_{\varphi}, y^{\ell +1}-z^{\ell+1} \bigr\rangle \\ &\quad=e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}+ \bigl\langle s^{\ell}_{\varphi }, x^{k}-y^{\ell+1} \bigr\rangle . \end{aligned}$$
(36)

On the other hand, from (30), we have

$$\begin{aligned} -h\bigl(y^{\ell+1}\bigr)&=-h\bigl(y^{\ell+1}\bigr)-\bigl\langle s^{\ell}_{h},x^{k}-y^{\ell +1}\bigr\rangle + \bigl\langle s^{\ell}_{h},x^{k}-y^{\ell+1}\bigr\rangle \\ &=-\bar{h}_{\ell}\bigl(x^{k}\bigr)+\bigl\langle s^{\ell}_{h},x^{k}-y^{\ell+1}\bigr\rangle . \end{aligned}$$
(37)

Hence, by combining (36), (37) and (30), (35) can be written as

$$\begin{aligned} \delta_{\ell}={}&e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell }+ \bigl\langle s^{\ell}_{\varphi}, x^{k}-y^{\ell+1} \bigr\rangle +\frac {1}{2}\eta_{\ell} \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}+h \bigl(x^{k}\bigr)-\bar{h}_{\ell }\bigl(x^{k}\bigr) \\ &{} +\bigl\langle s^{\ell}_{h},x^{k}-y^{\ell+1} \bigr\rangle \end{aligned}$$
(38)
$$\begin{aligned} ={}&e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}+ \bigl\langle \mu_{\ell }\bigl(x^{k}-y^{\ell+1}\bigr), x^{k}-y^{\ell+1}\bigr\rangle +\frac{1}{2} \eta_{\ell } \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}+h\bigl(x^{k}\bigr)-\bar{h}_{\ell} \bigl(x^{k}\bigr) \\ ={}&e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}+ \frac{R_{\ell}+\mu _{\ell}}{2} \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}+h\bigl(x^{k}\bigr)-\bar{h}_{\ell } \bigl(x^{k}\bigr). \end{aligned}$$
(39)

In Step 5, the update for \(\eta_{\ell}\) is done to ensure \(\eta _{\ell}\geq\eta^{\min}_{\ell}\) for all iterations, so that \(e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}\geq0\). Therefore, the predicted descent \(\delta_{\ell}\geq0\) since h is convex.

For \(\varepsilon_{\ell}\), from (27), one has

$$\begin{aligned} \varepsilon_{\ell}&=F\bigl(x^{k}\bigr)-\bigl[ \bar{ \varphi}_{\ell}\bigl(x^{k}\bigr)+\bar {h}_{\ell} \bigl(x^{k}\bigr) \bigr] \\ &=f\bigl(x^{k}\bigr)+h\bigl(x^{k}\bigr)-\check{ \varphi}_{\ell}\bigl(z^{\ell+1}\bigr)-\bigl\langle s^{\ell}_{\varphi}, x^{k}-z^{\ell+1}\bigr\rangle - \bar{h}_{\ell }\bigl(x^{k}\bigr) \\ &=f\bigl(x^{k}\bigr)-\check{\varphi}_{\ell} \bigl(z^{\ell+1}\bigr)-\bigl\langle s^{\ell }_{\varphi}, x^{k}-z^{\ell+1}\bigr\rangle +h\bigl(x^{k}\bigr)- \bar{h}_{\ell }\bigl(x^{k}\bigr). \end{aligned}$$
(40)

Similar to (36), we have

$$\begin{aligned} &f\bigl(x^{k}\bigr)-\check{\varphi}_{\ell} \bigl(z^{\ell+1}\bigr)-\bigl\langle s^{\ell }_{\varphi}, x^{k}-z^{\ell+1}\bigr\rangle \\ &\quad=f\bigl(x^{k}\bigr)-\bigl[ f\bigl(x^{k} \bigr)-e^{-\ell}_{f}-\eta_{\ell}d^{k}_{-\ell }+ \bigl\langle g^{-\ell}_{f}+\eta_{\ell} \triangle^{k}_{-\ell},z^{\ell +1}-x^{k}\bigr\rangle \bigr] \\ &\qquad{}-\bigl\langle s^{\ell}_{\varphi}, x^{k}-z^{\ell +1} \bigr\rangle \\ &\quad=e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}- \bigl\langle s^{\ell}_{\varphi }, z^{\ell+1}-x^{k} \bigr\rangle -\bigl\langle s^{\ell}_{\varphi}, x^{k}-z^{\ell+1} \bigr\rangle \\ &\quad=e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}. \end{aligned}$$
(41)

Thus, we have

$$ \varepsilon_{\ell}=e^{-\ell}_{f}+ \eta_{\ell}d^{k}_{-\ell }+h\bigl(x^{k}\bigr)- \bar{h}_{\ell}\bigl(x^{k}\bigr)\geq0. $$
(42)

Equation (33) follows immediately from (39) and (42). □

From (33), we know that \(\delta_{\ell}\geq\varepsilon _{\ell}\). Therefore, if \(\delta_{\ell}\leq\epsilon\), then \(\varepsilon_{\ell}\leq\epsilon\). Hence, we only use \(\delta_{\ell }\leq\epsilon\) as the termination criterion in Step 3.

Lemma 3

The vectors \(s_{\varphi}^{\ell}\) and \(s_{h}^{\ell}\) of (29) and (30) are in fact subgradients, i.e.,

$$ s_{\varphi}^{\ell}\in\partial\check{ \varphi}_{\ell}\bigl(z^{\ell+1}\bigr) \quad\textit{and}\quad s_{h}^{\ell}\in\partial h\bigl(y^{\ell+1} \bigr). $$
(43)

Furthermore, we have

$$ \bar{\varphi}_{\ell}\leq\check{\varphi}_{\ell}\quad \textit{and}\quad \bar{h}_{\ell}\leq h. $$
(44)

Proof

Let \(\phi^{\ell}_{f}\) and \(\phi^{\ell}_{h}\) denote the objectives of (13) and (14), respectively, i.e.,

$$\begin{aligned} &\phi^{\ell}_{f}(\cdot):=\check{\varphi}_{\ell}( \cdot)+\bar {h}_{\ell-1}(\cdot)+\frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}, \end{aligned}$$
(45)
$$\begin{aligned} &\phi^{\ell}_{h}(\cdot):=\bar{\varphi}_{\ell}( \cdot)+h(\cdot )+\frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2} . \end{aligned}$$
(46)

By (13), (29) and the optimality condition of (45), we have

$$0\in\partial\check{\varphi}_{\ell}\bigl(z^{\ell+1} \bigr)+s_{h}^{\ell -1}+\mu_{\ell}\bigl(z^{\ell+1}-x^{k} \bigr)=\partial\check{\varphi}_{\ell }\bigl(z^{\ell+1}\bigr) -s^{\ell}_{\varphi}, $$

which implies \(s_{\varphi}^{\ell}\in\partial\check{\varphi}_{\ell }(z^{\ell+1})\). Similarly, by (14) and the optimality condition of (46), we obtain

$$0\in\partial h\bigl(y^{\ell+1}\bigr)+s_{\varphi}^{\ell}+ \mu_{\ell }\bigl(x^{k}-y^{\ell+1}\bigr)=\partial h \bigl(y^{\ell+1}\bigr)-s^{\ell}_{h}, $$

which implies \(s_{h}^{\ell}\in\partial h(y^{\ell+1})\). So (43) holds.

Equation (44) follows immediately from (43). □

4 Convergence

In this section, we will study the convergence properties of Algorithm 1. Firstly, based on the objective function of problem (1), we need to slightly modify Assumption 1 as follows.

Assumption 2

Given \(x^{0}\in\mathbb{R}^{n}\) and \(M_{0}\geq0\), there exist an open bounded set \(\mathcal{O}\) and a function H such that \(\mathcal{L}_{0}:=\{x\in\mathbb{R}^{n}: F(x)\leq F(x^{0})+M_{0}\}\subset\mathcal{O}\), and H is lower-\(C^{2}\) on \(\mathcal{O}\) with \(H\equiv f\) on \(\mathcal{L}_{0}\).

For convenience, we assume that Assumption 2 holds throughout the rest of the convergence analysis.

In addition, from [24] we know that, if f is a locally Lipschitz continuous function, then the subgradients of f are locally bounded, i.e.,

$$ \bigl\{ g^{\ell}_{f}\bigr\} \text{ is bounded if } \bigl\{ y^{\ell}\bigr\} \text{ is bounded.} $$
(47)

Further, as in [9], it follows that the model subgradients \(s^{\ell}_{\varphi}\) in (43) satisfy

$$ \bigl\{ s^{\ell}_{\varphi}\bigr\} \text{ is bounded if } \bigl\{ y^{\ell}\bigr\} \text{ is bounded.} $$
(48)

Remark 2

Note that (47) implies that \(\{g^{\ell}_{\varphi}:=g^{\ell }_{f}+\eta_{\ell}\Delta^{k}_{\ell}\}\) \((g^{\ell}_{\varphi}\in\partial\check{\varphi}_{\ell})\) is bounded if \(\{y^{\ell}\}\) is bounded, since \(\{\Delta^{k}_{\ell}\}\) in (17) is bounded whenever \(\{y^{\ell}\}\) is bounded. Since \(s^{\ell}_{\varphi}\in\partial\check{\varphi}_{\ell}\), we have \(s^{\ell}_{\varphi}\in \operatorname{conv}\{g^{j}_{\varphi}\}_{j\in J_{\ell}}\), thus \(\Vert s^{\ell}_{\varphi} \Vert \leq\max_{j\in J_{\ell}} \Vert g^{j}_{\varphi} \Vert \), and the model \(\check{\varphi}_{\ell}\) satisfies condition (48) automatically when (47) holds.

The following lemma shows the properties of the model function \(\check {\varphi}_{\ell}\), whose proof can be found in [13, 16].

Lemma 4

For the model function \(\check{\varphi}_{\ell}\) and convexification parameter \(\eta_{\ell}\), we have

  1. (i)

    \(\check{\varphi}_{\ell}\) is a convex function.

  2. (ii)

    If \(\eta_{\ell}\geq\eta_{\ell}^{\min}\), then

    $$ \check{\varphi}_{\ell}\bigl(x^{k}\bigr)\leq f \bigl(x^{k}\bigr). $$
    (49)
  3. (iii)

    If \(\eta_{\ell+1}=\eta_{\ell}\), and either \(J_{\ell +1}\supseteq J_{\ell}^{\mathrm{act}}\) or \(J_{\ell+1}\supseteq\{-\ell\}\), then

    $$\check{\varphi}_{\ell+1}(\cdot)\geq\check{\varphi}_{\ell } \bigl(z^{\ell+1}\bigr)+\bigl\langle s^{\ell}_{\varphi}, \cdot-z^{\ell+1}\bigr\rangle $$

    if \(y^{\ell+1}\) is a null step.

  4. (iv)

    If \(J_{\ell}\supseteq\{\ell\}\), then

    $$\check{\varphi}_{\ell}(\cdot)\geq f\bigl(y^{\ell}\bigr)+ \frac{1}{2}\eta _{\ell} \bigl\Vert y^{\ell}-x^{k} \bigr\Vert ^{2}+\bigl\langle g^{\ell}_{f}+ \eta_{\ell }\bigl(y^{\ell}-x^{k}\bigr), \cdot-y^{\ell}\bigr\rangle , $$

    for some \(g^{\ell}_{f}\in\partial f(y^{\ell})\).

  5. (v)

    If \(\eta_{\ell}\geq\rho^{\mathrm{id}}\), then

    $$ \check{\varphi}_{\ell}(\omega)\leq f(\omega)+ \frac{1}{2}\eta _{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2} \quad\textit{for all } \omega\in\mathcal{L}_{0}. $$
    (50)

From the updating rule in Step 5 of Algorithm 1, the convexification parameter \(\eta_{\ell}\) is either kept unchanged or increased. The following lemma, whose proof can be found in [13], shows that \(\eta_{\ell}\) becomes fixed after a finite number of iterations.

Lemma 5

There exist an index \(\ell_{1}\) and a positive constant \(\overline {\eta}>0\) such that

$$\eta_{\ell}\equiv\overline{\eta}, \quad\textit{for all } \ell\geq \ell_{1}. $$

Lemma 6

Suppose that there exists an integer K such that, for all \(\ell\geq K\), only null steps occur without increasing μ. Then the following results hold:

  1. (i)

    The sequences

    $$\begin{aligned} &\biggl\{ \phi^{\ell}_{f}\bigl(z^{\ell+1}\bigr)=\check{ \varphi}_{\ell}\bigl(z^{\ell +1}\bigr)+\bar{h}_{\ell-1} \bigl(z^{\ell+1}\bigr)+ \frac{1}{2}\mu_{\ell} \bigl\Vert z^{\ell+1}-x^{k} \bigr\Vert ^{2}\biggr\} _{\ell\geq K}, \\ &\biggl\{ \phi^{\ell}_{h}\bigl(y^{\ell+1}\bigr)=\bar{ \varphi}_{\ell}\bigl(y^{\ell +1}\bigr)+h\bigl(y^{\ell+1}\bigr)+ \frac{1}{2}\mu_{\ell} \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}\biggr\} _{\ell\geq K} \end{aligned}$$

    are nondecreasing and convergent.

  2. (ii)

    The sequences \(\{y^{\ell+1}\}\) and \(\{z^{\ell+1}\}\) are bounded, \(\Vert z^{\ell+1}-y^{\ell+1} \Vert \rightarrow0\) and \(\Vert z^{\ell+2}-y^{\ell+1} \Vert \rightarrow0\) as \(\ell\rightarrow \infty\).

Proof

First, we use partial linearizations of the subproblems to show that (i) holds. Fix \(\ell\geq K\). By the definitions in (13) and (29), we have \(\bar{\varphi}_{\ell}(z^{\ell+1})=\check{\varphi}_{\ell }(z^{\ell+1})\) and

$$ z^{\ell+1}=\arg\min\biggl\{ \bar{\phi}^{\ell}_{f}( \cdot):=\bar{\varphi }_{\ell}(\cdot)+\bar{h}_{\ell-1}(\cdot) + \frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} , $$
(51)

which follows from \(\nabla\bar{\phi}^{\ell}_{f}(z^{\ell+1})=0\). Since \(\bar {\phi}^{\ell}_{f}\) is quadratic and \(\bar{\phi}^{\ell}_{f}(z^{\ell+1})=\phi^{\ell}_{f}(z^{\ell +1})\), Taylor’s expansion gives

$$\begin{aligned} \bar{\phi}^{\ell}_{f}(\cdot)&=\bar{ \phi}^{\ell}_{f}\bigl(z^{\ell +1}\bigr)+\nabla\bar{ \phi}^{\ell}_{f}\bigl(z^{\ell+1}\bigr) \bigl( \cdot-z^{\ell+1}\bigr) +\frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-z^{\ell+1} \bigr\Vert ^{2} \\ &=\phi^{\ell}_{f}\bigl(z^{\ell+1}\bigr)+ \frac{1}{2}\mu_{\ell} \bigl\Vert \cdot -z^{\ell+1} \bigr\Vert ^{2}. \end{aligned}$$
(52)

Similarly, by the definitions in (14) and (30), we have \(\bar{h}_{\ell}(y^{\ell+1})=h(y^{\ell +1})\), and

$$\begin{aligned} &y^{\ell+1}=\arg\min\biggl\{ \bar{\phi}^{\ell}_{h}( \cdot):=\bar {\varphi}_{\ell}(\cdot)+\bar{h}_{\ell}(\cdot) + \frac{1}{2}\mu_{\ell} \bigl\Vert \cdot-x^{k} \bigr\Vert ^{2}\biggr\} , \end{aligned}$$
(53)
$$\begin{aligned} &\bar{\phi}^{\ell}_{h}(\cdot)=\phi^{\ell}_{h} \bigl(y^{\ell+1}\bigr)+\frac {1}{2}\mu_{\ell} \bigl\Vert \cdot-y^{\ell+1} \bigr\Vert ^{2}. \end{aligned}$$
(54)

Next, to bound the objective values of the linearized subproblems (51) and (53) from above, we use \(\bar{\varphi}_{\ell}\leq\check{\varphi}_{\ell}\), \(\bar{h}_{\ell-1}\leq h\) and \(\bar{h}_{\ell}\leq h\) from (44), together with \(\check{\varphi}_{\ell}(x^{k})\leq f(x^{k})\) from (ii) of Lemma 4:

$$\begin{aligned} &\phi^{\ell}_{f}\bigl(z^{\ell+1}\bigr)+\frac{1}{2} \mu_{\ell} \bigl\Vert x^{k}-z^{\ell +1} \bigr\Vert ^{2}=\bar{\phi}^{\ell}_{f}\bigl(x^{k} \bigr)\leq\check{\varphi}_{\ell}\bigl(x^{k}\bigr) +h \bigl(x^{k}\bigr)\leq F\bigl(x^{k}\bigr), \end{aligned}$$
(55)
$$\begin{aligned} &\phi^{\ell}_{h}\bigl(y^{\ell+1}\bigr)+\frac{1}{2} \mu_{\ell} \bigl\Vert x^{k}-y^{\ell +1} \bigr\Vert ^{2}=\bar{\phi}^{\ell}_{h}\bigl(x^{k} \bigr)\leq\check{\varphi}_{\ell}\bigl(x^{k}\bigr) +h \bigl(x^{k}\bigr)\leq F\bigl(x^{k}\bigr). \end{aligned}$$
(56)

From (14) and (51), we have \(\bar{\phi }_{f}^{\ell}\leq\phi^{\ell}_{h}\). On the other hand, since only null steps occur, we have \(x^{k+1}=x^{k}\); the algorithm then ensures that \(\mu_{\ell}=\mu_{\ell +1}\), and \(\bar{\varphi}_{\ell}\leq\check{\varphi}_{\ell+1}\) by (iii) of Lemma 4, so we obtain \(\bar{\phi}^{\ell}_{h}\leq\phi ^{\ell+1}_{f}\). By (52) and (54), we see that

$$\begin{aligned} &\phi^{\ell}_{f}\bigl(z^{\ell+1}\bigr)+\frac{1}{2} \mu_{\ell} \bigl\Vert y^{\ell +1}-z^{\ell+1} \bigr\Vert ^{2}=\bar{\phi}^{\ell}_{f}\bigl(y^{\ell+1} \bigr) \leq\phi^{\ell}_{h}\bigl(y^{\ell+1} \bigr), \end{aligned}$$
(57)
$$\begin{aligned} &\phi^{\ell}_{h}\bigl(y^{\ell+1}\bigr)+\frac{1}{2} \mu_{\ell} \bigl\Vert z^{\ell +2}-y^{\ell+1} \bigr\Vert ^{2}=\bar{\phi}^{\ell}_{h}\bigl(z^{\ell+2} \bigr) \leq\phi^{\ell+1}_{f}\bigl(z^{\ell+2} \bigr). \end{aligned}$$
(58)

In particular, from (57) and (58), we have the relation

$$\phi^{\ell}_{f}\bigl(z^{\ell+1}\bigr)\leq \phi^{\ell}_{h}\bigl(y^{\ell+1}\bigr)\leq \phi^{\ell+1}_{f}\bigl(z^{\ell+2}\bigr) $$

which implies that \(\{\phi^{\ell}_{f}(z^{\ell+1})\}_{\ell\geq K}\) and \(\{\phi^{\ell}_{h}(y^{\ell+1})\}_{\ell\geq K}\) are nondecreasing sequences. Together with the bound of \(F(x^{k})\) from (55) and (56), the convergence is established.

For (ii), we have proved in (i) the convergence of \(\{\phi^{\ell }_{f}(z^{\ell+1})\}\) and \(\{\phi^{\ell}_{h}(y^{\ell+1})\}\) for \(\ell\geq K\), so they must have a common limit, say \(\phi_{\infty}\leq F(x^{k})\), such that

$$ \phi^{\ell}_{f}\bigl(z^{\ell+1}\bigr)\rightarrow \phi_{\infty}, \qquad\phi^{\ell }_{h}\bigl(y^{\ell+1} \bigr)\rightarrow\phi_{\infty} $$
(59)

and, from (57) and (58), \(\Vert z^{\ell+1}-y^{\ell+1} \Vert \rightarrow0\) and \(\Vert z^{\ell +2}-y^{\ell+1} \Vert \rightarrow0\), while \(\{y^{\ell+1}\}\) and \(\{z^{\ell+1}\}\) are bounded from (55) and (56). Consequently, the sequences \(\{g^{\ell}_{f}\}\) and \(\{s^{\ell}_{\varphi}\}\) are bounded by (47) and (48). □

The following lemma shows that μ is increased only a finite number of times.

Lemma 7

Suppose that \(i_{k}\in J_{\ell}\), and let \(N_{\ell}\) denote the number of times μ has been increased. Then there exists a positive constant L such that

$$ N_{\ell}\leq \biggl\lceil \frac{\ln\frac{L^{2}}{M\mu_{0}}}{\ln\Gamma } \biggr\rceil , $$
(60)

where \(\lceil a\rceil\) is the smallest integer greater than or equal to a. As a result, there exists an index \(\ell_{2}\) such that

$$\mu_{\ell}=\bar{\mu}, \quad\textit{for all } \ell\geq\ell_{2}. $$

Proof

Let \(\ell_{r}\) be the iteration index at which μ is increased for the rth time. Then, for \(\ell_{r}+1\leq\ell<\ell_{r+1}\), we have

$$ \mu_{\ell}=\Gamma^{r}\mu_{0}. $$
(61)

Since \(i_{k}\in J_{\ell}\) and \(x^{k}=y^{i_{k}}\), taking \(i=i_{k}\) in (24) gives \(g^{i_{k}}_{f}\in\partial\check{\varphi}_{\ell}(x^{k})\), and it also holds that \(g_{f}^{i_{k}}\in\partial f(x^{k})\), so \(\Vert g_{f}^{i_{k}} \Vert \) is bounded. Hence

$$\begin{aligned} &p_{\mu_{\ell}}(\bar{\varphi}_{\ell}+h) \bigl(x^{k}\bigr) \\ &\quad=\arg\min\biggl\{ \bar{\varphi}_{\ell}(y)+h(y)+\frac{1}{2} \mu_{\ell} \bigl\Vert y-x^{k} \bigr\Vert ^{2} \biggr\} \\ &\quad\in\biggl\{ y \big| \bar{\varphi}_{\ell}(y)+h(y)+\frac{1}{2} \mu_{\ell} \bigl\Vert y-x^{k} \bigr\Vert ^{2} \leq\bar{\varphi}_{\ell}\bigl(x^{k}\bigr)+h \bigl(x^{k}\bigr)\biggr\} \\ &\quad\subseteq\biggl\{ y \big| \bar{\varphi}_{\ell}\bigl(x^{k}\bigr)+ \bigl\langle g_{f}^{i_{k}}, y-x^{k}\bigr\rangle +h \bigl(x^{k}\bigr)+\bigl\langle g^{k}_{h}, y-x^{k}\bigr\rangle \\ &\qquad{}+\frac{1}{2}\mu_{\ell} \bigl\Vert y-x^{k} \bigr\Vert ^{2}\leq\bar{\varphi}_{\ell }\bigl(x^{k} \bigr)+h\bigl(x^{k}\bigr)\biggr\} \\ &\quad=\biggl\{ y \big| \bigl\langle g_{f}^{i_{k}}, y-x^{k} \bigr\rangle +\bigl\langle g^{k}_{h},y-x^{k}\bigr\rangle +\frac{1}{2}\mu_{\ell} \bigl\Vert y-x^{k} \bigr\Vert ^{2}\leq0\biggr\} \\ &\quad=\biggl\{ y \big|\frac{1}{2}\mu_{\ell} \bigl\Vert y-x^{k} \bigr\Vert ^{2}\leq-\bigl\langle g_{f}^{i_{k}}+g^{k}_{h}, y-x^{k}\bigr\rangle \biggr\} \\ &\quad\subseteq\biggl\{ y \big| \frac{1}{2}\mu_{\ell} \bigl\Vert y-x^{k} \bigr\Vert ^{2}\leq \bigl\Vert g_{f}^{i_{k}}+g^{k}_{h} \bigr\Vert \bigl\Vert y-x^{k} \bigr\Vert \biggr\} \\ &\quad=\biggl\{ y \big| \bigl\Vert y-x^{k} \bigr\Vert \leq \frac{2 \Vert g_{f}^{i_{k}} \Vert +2 \Vert g^{k}_{h} \Vert }{\mu_{\ell }}\biggr\} . \end{aligned}$$

When \(\ell_{r}+1\leq\ell<\ell_{r+1}\), if \(y^{\ell+1}\) is a null step, then \(g^{\ell}_{f}\) is bounded by Lemma 6; otherwise \(y^{\ell+1}\) is a descent step and the corresponding subgradient \(g^{i_{k(\ell+1)}}\in\partial f(x^{k+1})\) is also bounded. Therefore, there exists a constant \(L>0\) such that \(\max\{ \Vert g_{f}^{i_{k}} \Vert , \Vert g_{f}^{\ell+1} \Vert , \Vert g^{k}_{h} \Vert , \Vert g^{\ell+1}_{h} \Vert \}\leq\frac{L}{2}\). Thus we have

$$y^{\ell+1}=p_{\mu_{\ell}}(\bar{\varphi}_{\ell}+h) \bigl(x^{k}\bigr)\in\biggl\{ y\Big| \bigl\Vert y-x^{k} \bigr\Vert \leq\frac{2L}{\mu_{\ell}}\biggr\} . $$

This together with (61) shows that

$$\begin{aligned} & \bigl\vert F\bigl(y^{\ell+1}\bigr)-F\bigl(x^{k}\bigr) \bigr\vert \\ &\quad= \bigl\vert f\bigl(x^{k}\bigr)+\bigl\langle g^{\ell+1}_{f}, y^{\ell+1}-x^{k}\bigr\rangle -f\bigl(x^{k}\bigr)+h \bigl(x^{k}\bigr)+\bigl\langle g^{\ell+1}_{h}, y^{\ell+1}-x^{k}\bigr\rangle -h\bigl(x^{k}\bigr) \bigr\vert \\ &\quad\leq \bigl\Vert g^{\ell+1}_{f}+g^{\ell+1}_{h} \bigr\Vert \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert \\ &\quad\leq2\cdot\frac{L}{2}\cdot\frac{2L}{\Gamma^{r}\mu_{0}}\leq M. \end{aligned}$$

Thus, if

$$r\geq\frac{\ln\frac{L^{2}}{M\mu_{0}}}{\ln\Gamma}, $$

then

$$F\bigl(y^{\ell+1}\bigr)\leq F\bigl(x^{k}\bigr)+M, \quad\forall \ell_{r}+1\leq\ell\leq\ell_{r+1}. $$

This means that the number of times \(N_{\ell}\) of increasing μ satisfies (60). The latter part of the lemma follows immediately from the above result. □

Theorem 3

If \(\delta_{\ell}=0\) and \(\eta_{\ell}\geq\rho^{\mathrm{id}}\), then \(x^{k}\) is a stationary point of F.

Proof

From (33), we have the relation

$$\delta_{\ell}=\varepsilon_{\ell}+\frac{R_{\ell}+\mu_{\ell }}{2\mu_{\ell}^{2}} \bigl\Vert s^{\ell} \bigr\Vert ^{2}. $$

If \(\delta_{\ell}=0\), then \(\varepsilon_{\ell}=0\) and \(s^{\ell }=0\), and therefore

$$ x^{k}=y^{\ell+1}=p_{\mu_{\ell}}(\bar{ \varphi}_{\ell}+h) \bigl(x^{k}\bigr). $$
(62)

From the last result of Theorem 1, we know that \(x^{k}\) is a stationary point of \(\bar{\varphi}_{\ell}+h\). In addition, from \(\varepsilon_{\ell}=0\) in (27), we have

$$f\bigl(x^{k}\bigr)+h\bigl(x^{k}\bigr)=\bar{ \varphi}_{\ell}\bigl(x^{k}\bigr)+\bar{h}_{\ell} \bigl(x^{k}\bigr). $$

This together with \(\bar{h}_{\ell}(x^{k})\leq h(x^{k})\) shows that

$$ f\bigl(x^{k}\bigr)\leq\bar{\varphi}_{\ell} \bigl(x^{k}\bigr). $$
(63)

On one hand, for \(\omega\in\mathcal{L}_{0}\), if \(\eta_{\ell}\geq \rho^{\mathrm{id}}\), we obtain by (62) and (63)

$$ F\bigl(x^{k}\bigr)=f\bigl(x^{k}\bigr)+h \bigl(x^{k}\bigr)\leq\bar{\varphi}_{\ell }\bigl(x^{k} \bigr)+h\bigl(x^{k}\bigr)\leq\bar{\varphi}_{\ell}(\omega)+h( \omega) +\frac{1}{2}\mu_{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2}. $$
(64)

From the convexity of \(\check{\varphi}_{\ell}\) and (50), we have

$$\bar{\varphi}_{\ell}(\omega)\leq\check{\varphi}_{\ell}(\omega ) \leq f(\omega)+\frac{1}{2}\eta_{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2}. $$

So, (64) can be written as

$$ F\bigl(x^{k}\bigr)\leq f(\omega)+\frac{1}{2} \eta_{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2}+h(\omega)+\frac{1}{2}\mu_{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2}. $$
(65)

On the other hand, for \(\omega\notin\mathcal{L}_{0}\), from (28) we can obtain

$$ F\bigl(x^{k}\bigr)\leq F\bigl(x^{0}\bigr) \leq F\bigl(x^{0}\bigr)+M\leq F(\omega)=F(\omega)+\frac {1}{2}R_{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2}. $$
(66)

Combining (64) and (66), we have

$$F\bigl(x^{k}\bigr)\leq F(\omega)+\frac{1}{2}R_{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2} \quad\text{for all } \omega\in\mathbb{R}^{n}. $$

Hence

$$x^{k}=p_{R_{\ell}}F\bigl(x^{k}\bigr), $$

which together with Theorem 1 shows that \(x^{k}\) is a stationary point of F. □

We are now in a position to present the main convergence result of our algorithm. As usual in bundle methods, two cases are considered: the algorithm generates a finite number of descent steps, or it generates an infinite number of descent steps. In this analysis we set the stopping parameter \(\epsilon=0\).

Theorem 4

Let η̄ be the stabilized value of the convexification parameter sequence and assume \(\bar{\eta}\geq\rho^{\mathrm{id}}\). Then the following mutually exclusive situations hold:

  1. (i)

    Algorithm 1 generates a finite number of descent steps followed by infinitely many null steps. Let \(\bar{x}\) be the last stability center. Then \(y^{\ell+1}\rightarrow\bar{x}\), and \(\bar{x}\) is a stationary point of F.

  2. (ii)

    Algorithm 1 generates an infinite sequence \(\{ x^{k}\}\) of stability centers. Then any accumulation point of \(\{x^{k}\}\) is a stationary point of F.

Proof

For (i), without loss of generality, we may assume \(\eta_{\ell }=\overline{\eta}\), \(\mu_{\ell}=\overline{\mu}\), and \(R_{\ell }=\overline{R}\) throughout. As shown in Lemma 6, the sequences \(\{y^{\ell}\}\) and \(\{ z^{\ell}\}\) are bounded, and \(\Vert y^{\ell}-z^{\ell} \Vert \rightarrow0\) and \(\Vert z^{\ell+1}-y^{\ell } \Vert \rightarrow0\) as \(\ell\rightarrow\infty\). Hence, if a subsequence satisfies \(y^{\ell_{i}}\rightarrow p\) as \(i\rightarrow\infty\), then also \(z^{\ell_{i}}\rightarrow p\) and \(z^{\ell_{i}+1}\rightarrow p\) as \(i\rightarrow\infty\). For \(\omega\in\mathcal{L}_{0}\) near p, by (50) and the convexity of h, we have

$$\begin{aligned} F(\omega)&=f(\omega)+h(\omega) \\ &\geq\check{\varphi}_{\ell_{i}}(\omega)-\frac{1}{2}\bar{\eta} \Vert \omega-\bar{x} \Vert ^{2}+h(\omega) \\ &\geq\bar{\varphi}_{\ell_{i}}(\omega)-\frac{1}{2}\bar{\eta} \Vert \omega-\bar{x} \Vert ^{2}+\bar{h}_{\ell_{i}-1}(\omega) \\ &=\check{\varphi}_{\ell_{i}}\bigl(z^{\ell_{i}+1}\bigr)+\bigl\langle s^{\ell _{i}}_{\varphi}, \omega-z^{\ell_{i}+1}\bigr\rangle - \frac{1}{2}\bar {\eta} \Vert \omega-\bar{x} \Vert ^{2}+h \bigl(y^{\ell_{i}}\bigr)+\bigl\langle s^{\ell _{i}-1}_{h}, \omega-y^{\ell_{i}}\bigr\rangle . \end{aligned}$$
(67)

Since \(x^{k}=\bar{x}\) and \(\mu_{\ell}=\bar{\mu}\), from (29) we have \(s^{\ell}_{\varphi}+s^{\ell -1}_{h}=\overline{\mu}(\bar{x}-z^{\ell+1})\), which is bounded; hence

$$\begin{aligned} &\bigl\langle s^{\ell_{i}}_{\varphi}, \omega-z^{\ell_{i}+1}\bigr\rangle +\bigl\langle s^{\ell_{i}-1}_{h}, \omega-y^{\ell_{i}} \bigr\rangle \\ &\quad=\bigl\langle s^{\ell_{i}}_{\varphi}, \omega-z^{\ell_{i}+1}\bigr\rangle +\bigl\langle s^{\ell_{i}-1}_{h}, \omega-z^{\ell_{i}+1}+z^{\ell _{i}+1}-y^{\ell_{i}} \bigr\rangle \\ &\quad=\bar{\mu}\bigl\langle \bar{x}-z^{\ell_{i}+1}, \omega-z^{\ell _{i}+1}\bigr\rangle +\bigl\langle s^{\ell_{i}-1}_{h}, z^{\ell_{i}+1}-y^{\ell _{i}} \bigr\rangle . \end{aligned}$$
(68)

Note that

$$\begin{aligned} &{-}\frac{1}{2}\bar{\eta} \Vert \omega-\bar{x} \Vert ^{2} \end{aligned}$$
(69)
$$\begin{aligned} &\quad=-\frac{1}{2}\bar{\eta} \bigl\Vert \omega-z^{\ell_{i}+1}+z^{\ell _{i}+1}- \bar{x} \bigr\Vert ^{2} \\ &\quad=-\frac{1}{2}\bar{\eta} \bigl\Vert \omega-z^{\ell_{i}+1} \bigr\Vert ^{2}-\frac {1}{2}\bar{\eta} \bigl\Vert z^{\ell_{i}+1}- \bar{x} \bigr\Vert ^{2} +\bar{\eta}\bigl\langle \bar{x}-z^{\ell_{i}+1}, \omega-z^{\ell _{i}+1}\bigr\rangle . \end{aligned}$$
(70)

Combining (68) and (70) with (67), we obtain

$$\begin{aligned} F(\omega)\geq{}&\check{\varphi}_{\ell_{i}}\bigl(z^{\ell_{i}+1}\bigr)- \frac{1}{2}\bar{\eta } \bigl\Vert z^{\ell_{i}+1}-\bar{x} \bigr\Vert ^{2}+h\bigl(y^{\ell_{i}}\bigr) +(\bar{\eta}+\bar{\mu})\bigl\langle \bar{x}-z^{\ell_{i}+1}, \omega -z^{\ell_{i}+1}\bigr\rangle \\ &{} +\bigl\langle s^{\ell_{i}-1}_{h}, z^{\ell_{i}+1}-y^{\ell_{i}} \bigr\rangle -\frac{1}{2}\bar{\eta} \bigl\Vert \omega-z^{\ell_{i}+1} \bigr\Vert ^{2} . \end{aligned}$$

By Claim (iv) of Lemma 4 written with \(\ell=\ell_{i}\) for \(\omega=z^{\ell_{i}+1}\), we have the following inequality:

$$\begin{aligned} \check{\varphi}_{\ell_{i}}\bigl(z^{\ell_{i}+1}\bigr)-\frac{1}{2} \overline {\eta} \bigl\Vert z^{\ell_{i}+1}-\bar{x} \bigr\Vert ^{2}\geq{}& f\bigl(y^{\ell_{i}}\bigr)+\frac{1}{2}\overline{ \eta} \bigl\Vert y^{\ell _{i}}-\bar{x} \bigr\Vert ^{2}- \frac{1}{2}\overline{\eta} \bigl\Vert z^{\ell _{i}+1}-\bar{x} \bigr\Vert ^{2} \\ &{} +\bigl\langle g^{\ell_{i}}_{f}+\overline{\eta} \bigl(y^{\ell_{i}}-\bar {x}\bigr), z^{\ell_{i}+1}-y^{\ell_{i}}\bigr\rangle . \end{aligned}$$

Then

$$\begin{aligned} F(\omega)={}&f(\omega)+h(\omega) \\ \geq{}&\check{\varphi}_{\ell_{i}}\bigl(z^{\ell_{i}+1}\bigr)- \frac{1}{2}\bar {\eta} \bigl\Vert z^{\ell_{i}+1}-\bar{x} \bigr\Vert ^{2}+h\bigl(y^{\ell_{i}}\bigr) -\frac{1}{2}\bar{\eta} \bigl\Vert \omega-z^{\ell_{i}+1} \bigr\Vert ^{2} \\ &{} +(\bar{\eta}+\bar{\mu})\bigl\langle \bar{x}-z^{\ell_{i}+1}, \omega-z^{\ell_{i}+1}\bigr\rangle +\bigl\langle s^{\ell_{i}-1}_{h}, z^{\ell _{i}+1}-y^{\ell_{i}}\bigr\rangle \\ \geq{}& f\bigl(y^{\ell_{i}}\bigr)+h\bigl(y^{\ell_{i}}\bigr)+ \frac{1}{2}\bar{\eta} \bigl\Vert y^{\ell_{i}}-\bar{x} \bigr\Vert ^{2} -\frac{1}{2}\bar{\eta} \bigl\Vert z^{\ell_{i}+1}- \bar{x} \bigr\Vert ^{2}-\frac {1}{2}\bar{\eta} \bigl\Vert \omega-z^{\ell_{i}+1} \bigr\Vert ^{2} \\ & {}+\bigl\langle g^{\ell_{i}}_{f}+\bar{\eta}\bigl(y^{\ell_{i}}- \bar {x}\bigr),z^{\ell_{i}+1}-y^{\ell_{i}}\bigr\rangle +(\bar{\eta}+\bar{ \mu})\bigl\langle \bar{x}-z^{\ell_{i}+1}, \omega -z^{\ell_{i}+1}\bigr\rangle \\ &{} +\bigl\langle s^{\ell_{i}-1}_{h}, z^{\ell_{i}+1}-y^{\ell_{i}} \bigr\rangle . \end{aligned}$$

From \(s^{\ell-1}_{h}=\overline{\mu}(\bar{x}-z^{\ell+1})-s^{\ell }_{\varphi}\) and the boundedness of the sequences \(\{\overline{\mu}(\bar {x}-z^{\ell+1})\}\), \(\{s^{\ell}_{\varphi}\}\), \(\{g^{\ell}_{f}\}\) and \(\{y^{\ell}\}\), we know that \(\{s^{\ell-1}_{h}\}\) and \(\{g^{\ell_{i}}_{f}+ \overline{\eta}(y^{\ell_{i}}-\bar{x})\}\) are bounded. Taking the limit as \(i\rightarrow\infty\), and using the continuity of f at p, we obtain

$$\begin{aligned} F(\omega)={}&f(\omega)+h(\omega) \\ \geq{}&\lim_{i\rightarrow\infty}\check{\varphi}_{\ell_{i}} \bigl(z^{\ell _{i}+1}\bigr)-\frac{1}{2}\bar{\eta} \Vert p-\bar{x} \Vert ^{2} +h(p)-\frac{1}{2}\bar{\eta} \Vert \omega-p \Vert ^{2}+\bar{R}\langle\bar {x}-p, \omega-p\rangle \\ \geq{}& f(p)+h(p)-\frac{1}{2}\bar{\eta} \Vert \omega-p \Vert ^{2}+\bar {R}\langle\bar{x}-p, \omega-p\rangle \\ ={}&F(p)-\frac{1}{2}\bar{\eta} \Vert \omega-p \Vert ^{2}+ \bar{R}\langle\bar {x}-p, \omega-p\rangle, \end{aligned}$$
(71)

for all \(\omega\in\mathcal{L}_{0}\) near p. Since \(\frac {1}{2}\overline{\eta} \Vert \omega-p \Vert ^{2}=o( \Vert \omega-p \Vert )\), the last inequality means that \(\overline{R}(\bar{x}-p)\in\partial F(p)\) by Definition 8.3 in [17], which implies \(p=p_{\bar{R}}F(\bar{x})\) by Theorem 1. Since f is continuous and condition (50) holds at all accumulation points of \(\{y^{\ell}\}\), then the entire sequence \(\{y^{\ell}\}\) converges to the proximal point \(p_{\bar{R}}F(\bar{x})\). Furthermore, evaluating the relations at \(\omega=p\) shows that the following equation holds for the entire sequence by Theorem 2 in [16]:

$$ \lim_{i\rightarrow\infty}\check{\varphi}_{\ell_{i}} \bigl(z^{\ell _{i}+1}\bigr)=f(p)+\frac{1}{2}\overline{\eta} \Vert p- \bar{x} \Vert ^{2}. $$
(72)

So as \(\ell\rightarrow\infty\), the whole sequence

$$\bigl\{ y^{\ell}\bigr\} \rightarrow p=p_{\bar{R}}F(\bar{x}) \quad\text{with } \check{\varphi}_{\ell}\bigl(z^{\ell+1}\bigr)\rightarrow f(p) +\frac{1}{2}\overline{\eta} \Vert p-\bar{x} \Vert ^{2}. $$

Thus, from (26) we have

$$\begin{aligned} \delta_{\ell} &=f(\bar{x})+\frac{1}{2}\bar{\eta} \bigl\Vert y^{\ell+1}-\bar{x} \bigr\Vert ^{2}+h(\bar{x})-\bigl[ \bar{ \varphi}_{\ell}\bigl(y^{\ell+1}\bigr)+h\bigl(y^{\ell +1}\bigr) \bigr] \\ &=f(\bar{x})+\frac{1}{2}\bar{\eta} \bigl\Vert y^{\ell+1}-\bar{x} \bigr\Vert ^{2}+h(\bar{x})-\check{\varphi}_{\ell} \bigl(z^{\ell+1}\bigr)-\bigl\langle s^{\ell }_{\varphi}, y^{\ell+1}-z^{\ell+1}\bigr\rangle -h\bigl(y^{\ell+1}\bigr) \\ &\rightarrow f(\bar{x})+\frac{1}{2}\bar{\eta} \Vert p-\bar{x} \Vert ^{2}+h(\bar{x})-f(p)-\frac{1}{2}\bar{\eta} \Vert p-\bar{x} \Vert ^{2}-h(p) \\ &=f(\bar{x})+h(\bar{x})-f(p)-h(p) \\ &=F(\bar{x})-F(p). \end{aligned}$$

Since a null step does not satisfy the descent test in Step 4 of the algorithm, we have \(F(y^{\ell+1})> F(\bar{x})-\kappa\delta_{\ell}\). Taking the limit as \(\ell\rightarrow\infty\) gives the relation \(F(p)\geq F(\bar{x})-\kappa(F(\bar{x})-F(p))\), so \(F(\bar{x})\leq F(p)\) because \(\kappa\in(0,1)\). But \(p=p_{\bar{R}}F(\bar{x})\) implies

$$F(p)+\frac{1}{2}\bar{R} \Vert p-\bar{x} \Vert ^{2}\leq F(\bar{x}), $$

which shows that \(\bar{x}=p\). That is, \(\bar{x}=p_{\bar{R}}F(\bar {x})\), so \(\bar{x}\) is a stationary point of F from Theorem 1.

For (ii), \(\mathcal{L}_{0}\) is a compact set, and the sequence \(\{ x^{k}\}\subset\mathcal{L}_{0}\), so it has an accumulation point, i.e., there exists some infinite set K such that \(x^{k}\rightarrow\hat {x}\in\mathcal{L}_{0}\) as \(K\ni k\rightarrow\infty\). Since \(x^{k+1}=y^{i_{k+1}}\), let \(j_{k}=i_{k+1}-1\) so that \(x^{k+1}=p_{\bar{\mu}}(\bar{\varphi}_{j_{k}}+h)(x^{k})\). The descent test

$$F\bigl(x^{k+1}\bigr)\leq F\bigl(x^{k}\bigr)-\kappa \delta_{j_{k}} $$

implies that, as \(k\rightarrow\infty\), either \(F(x^{k})\searrow -\infty\), or \(\delta_{j_{k}}\rightarrow0\). By Assumption 2, \(F(x^{k})\) is bounded below, therefore, \(\delta _{j_{k}}\rightarrow0\). From (39), this means that

$$\bigl\Vert y^{j_{k}+1}-x^{k} \bigr\Vert ,\qquad e^{-j_{k}}_{f}+\bar{\eta}d^{k}_{-j_{k}},\qquad h \bigl(x^{k}\bigr)-\bar{h}_{j_{k}}\bigl(x^{k}\bigr) $$

must converge to 0. By

$$\bigl\Vert z^{j_{k}+1}-x^{k} \bigr\Vert \leq \bigl\Vert z^{j_{k}+1}-y^{j_{k}+1} \bigr\Vert + \bigl\Vert y^{j_{k}+1}-x^{k} \bigr\Vert $$

and

$$\bigl\Vert z^{j_{k}+1}-y^{j_{k}+1} \bigr\Vert \rightarrow0 $$

in Lemma 6, we have

$$\bigl\Vert z^{j_{k}+1}-x^{k} \bigr\Vert \rightarrow0. $$

By (21), \(\check{\varphi}_{j_{k}}(z^{j_{k}+1})-f(x^{k})\rightarrow 0\) as \(k\rightarrow\infty\), from (29) we know that

$$\bar{\varphi}_{j_{k}}\bigl(y^{j_{k}+1}\bigr)=\check{\varphi }_{j_{k}}\bigl(z^{j_{k}+1}\bigr)+\bigl\langle s^{j_{k}}_{\varphi}, y^{j_{k}+1}-z^{j_{k}+1}\bigr\rangle . $$

Therefore,

$$\bar{\varphi}_{j_{k}}\bigl(y^{j_{k}+1}\bigr)-f\bigl(x^{k} \bigr)\rightarrow0 $$

as \(k\rightarrow\infty\). Consider now \(k\in K\). Since \(\Vert x^{k+1}-x^{k} \Vert = \Vert y^{j_{k}+1}-x^{k} \Vert \rightarrow0\), both \(x^{k+1}\) and \(x^{k}\) converge to \(\hat{x}\) as \(K\ni k\rightarrow \infty\) with

$$\bar{\varphi}_{j_{k}}\bigl(x^{k+1}\bigr)\rightarrow f(\hat{x}). $$

And from \(x^{k+1}=p_{\bar{\mu}}(\bar{\varphi}_{j_{k}}+h)(x^{k})\), \(\bar {\eta}\geq\rho^{\mathrm{id}}\) and (50), for all \(\omega\in\mathcal{L}_{0}\),

$$\begin{aligned} &\bar{\varphi}_{j_{k}}\bigl(x^{k+1}\bigr)+h\bigl(x^{k+1} \bigr)+\frac{1}{2}\bar{\mu} \bigl\Vert x^{k+1}-x^{k} \bigr\Vert ^{2} \\ &\quad\leq\bar{\varphi}_{j_{k}}(\omega)+h(\omega)+ \frac{1}{2}\bar{\mu } \bigl\Vert \omega-x^{k} \bigr\Vert ^{2} \\ &\quad\leq\check{\varphi}_{j_{k}}(\omega)+h(\omega)+\frac{1}{2}\bar{\mu } \bigl\Vert \omega-x^{k} \bigr\Vert ^{2} \\ &\quad\leq f(\omega)+h(\omega)+\frac{1}{2}\bar{R} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2}. \end{aligned}$$

Therefore, taking the limit as \(K\ni k\rightarrow\infty\), we obtain

$$f(\hat{x})+h(\hat{x})\leq f(\omega)+h(\omega)+\frac{1}{2}\bar{R} \Vert \omega-\hat{x} \Vert ^{2},\quad \text{for all } \omega\in \mathcal{L}_{0}. $$

On the other hand, \(\hat{x}\in\mathcal{L}_{0}\), and for any \(\omega\notin\mathcal{L}_{0}\) it follows that

$$F(\hat{x})\leq F\bigl(x^{0}\bigr)\leq F\bigl(x^{0}\bigr)+M< F(\omega)< F(\omega)+\frac {1}{2}\bar{R} \Vert \omega-\hat{x} \Vert ^{2}. $$

Hence,

$$F(\hat{x})\leq F(\omega)+\frac{1}{2}\bar{R} \Vert \omega-\hat{x} \Vert ^{2}, \quad\text{for all } \omega\in\mathbb{R}^{n}. $$

Therefore, \(\hat{x}=p_{\bar{R}}F(\hat{x})\) with \(\bar{R}\geq\rho^{\mathrm{id}}\), hence \(\hat{x}\) is a stationary point of F by Theorem 1. □

5 Numerical results

This section tests the practical effectiveness of Algorithm 1 on a set of nine problems. The first seven are generalized from the unconstrained versions in [25] by imposing suitable constraints; the remaining two nonconvex unconstrained problems, taken from [13, 26], are sums of a nonconvex function and a convex function.

All numerical experiments were implemented in MATLAB R2014a on a ThinkPad laptop running Windows 7. The first seven problems have the form of (2) with \(C=\{ x: \Vert x-a \Vert \leq b\}\), where \(a\in\mathbb{R}^{n}\) and \(b>0\) are specified below, and are transformed to the form of (3) by using the indicator function. The detailed data for the seven problems are listed below; a sample MATLAB function/subgradient oracle for CB2 is sketched after the list. For simplicity, we use the MATLAB notations ones(p,q) and zeros(p,q) for the p-by-q matrices of ones and zeros, respectively.
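For the ball \(C=\{x: \Vert x-a \Vert \leq b\}\), the proximal point of the indicator function \(\imath_{C}\) reduces to the Euclidean projection onto C, which has the closed form \(P_{C}(x)=a+\frac{b}{\max\{b,\Vert x-a \Vert \}}(x-a)\). A minimal MATLAB sketch of this subproblem follows; the function name proj_ball is ours, for illustration only.

function p = proj_ball(x, a, b)
% Euclidean projection onto the ball C = {x : ||x - a|| <= b}, i.e., the
% proximal point of the indicator function of C (for any positive prox
% parameter). The function name is illustrative, not part of the paper.
d  = x - a;
nd = norm(d);
if nd <= b
    p = x;                 % x already lies in C
else
    p = a + (b / nd) * d;  % pull x back to the boundary of C
end
end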

CB2: \(f(x)=\max\{x_{1}^{2}+x_{2}^{4}, (2-x_{1})^{2}+(2-x_{2})^{2}, 2e^{x_{2}-x_{1}}\}\), \(y^{0}=(3,3)^{T}\), \(a=(0,0)^{T}\), \(b=1\).

CB3: \(f(x)=\max\{x_{1}^{4}+x_{2}^{2}, (2-x_{1})^{2}+(2-x_{2})^{2}, 2e^{x_{2}-x_{1}}\}\), \(y^{0}=(3,3)^{T}\), \(a=(3,3)^{T}\), \(b=1\).

LQ: \(f(x)=\max\{-x_{1}-x_{2},-x_{1}-x_{2}+x_{1}^{2}+x_{2}^{2}-1\}\), \(y^{0}=(1,1)^{T}\), \(a=(1,-1)^{T}\), \(b=1\).

Mifflin1: \(f(x)=-x_{1}+20\max\{x_{1}^{2}+x_{2}^{2}-1, 0\}\), \(y^{0}=(1.5,0.5)^{T}\), \(a=(-2,2)^{T}\), \(b=1\).

Rosen–Suzuki: \(f(x)=\max_{1\leq i\leq4}f_{i}(x)\), \(y^{0}=(1,2.1,-3,-0.9)^{T}\), \(a=(1,2,3,4)^{T}\), \(b=2\), with

$$\begin{aligned} &f_{1}(x)=x_{1}^{2}+x_{2}^{2}+2x_{3}^{2}+x_{4}^{2}-5x_{1}-5x_{2}-21x_{3}+7x_{4}, \\ &f_{2}(x)=f_{1}(x)+10\bigl(x_{1}^{2}+x_{2}^{2}+x_{3}^{2}+x_{4}^{2}+x_{1}-x_{2}+x_{3}-x_{4}-8 \bigr), \\ &f_{3}(x)=f_{1}(x)+10\bigl(x_{1}^{2}+2x_{2}^{2}+x_{3}^{2}+2x_{4}^{2}-x_{1}-x_{4}-10 \bigr), \\ &f_{4}(x)=f_{1}(x)+10\bigl(2x_{1}^{2}+x_{2}^{2}+x_{3}^{2}+2x_{1}-x_{2}-x_{4}-5 \bigr). \end{aligned}$$

Shor: \(f(x)=\max_{1\leq i\leq10}\{d_{i}\sum^{5}_{j=1}(x_{j}-c_{ij})^{2}\}\), \(y^{0}=\texttt {zeros(10,1)}\), \(a=\texttt {zeros(10,1)}\), \(b=3\), \(d=(1,5,10, 2, 4, 3, 1.7, 2.5, 6, 3.5)^{T}\),

$$C=\left ( \begin{matrix} 0& 2& 1& 1& 3& 0& 1& 1& 0&1\\ 0& 1& 2& 4& 2& 2& 1& 0& 0&1\\ 0& 1& 1& 1& 1& 1& 1& 1& 2&2\\ 0& 1& 1& 2& 0& 0& 1& 2& 1&0\\ 0& 3& 2& 2& 1& 1& 1& 1& 0&0 \end{matrix} \right ). $$

MAXL: \(f(x)=\max_{1\leq i\leq20} \vert x_{i} \vert \), \(a = (-\texttt {ones(1,10)}, \texttt {ones(1,10)})^{T}\), \(b=4\), \(y^{0}=(1,1.1,3, 1.1,5,1.1,7,1.1,9,1.1,-11,0.1,-13,0.1,-15,0.1,-17,0.1,-19,0.1)^{T}\).
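To make the black-box oracle for these problems concrete, the following MATLAB sketch evaluates f and one subgradient for problem CB2; it uses the standard fact that the gradient of any active piece of a pointwise maximum is a subgradient. The function name cb2_oracle is ours, for illustration only.

function [fval, g] = cb2_oracle(x)
% Function value and one subgradient of CB2:
%   f(x) = max{ x1^2 + x2^4, (2 - x1)^2 + (2 - x2)^2, 2*exp(x2 - x1) }.
v = [x(1)^2 + x(2)^4;
     (2 - x(1))^2 + (2 - x(2))^2;
     2*exp(x(2) - x(1))];
[fval, i] = max(v);        % function value and the index of an active piece
switch i
    case 1, g = [2*x(1); 4*x(2)^3];
    case 2, g = [-2*(2 - x(1)); -2*(2 - x(2))];
    case 3, g = [-2*exp(x(2) - x(1)); 2*exp(x(2) - x(1))];
end
end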

The second set of two problems are:

Regular: \(F(x)=\sum_{i=1}^{n} \vert f_{i}(x) \vert +\frac{1}{2} \Vert x \Vert ^{2}\), where

$$f_{i}(x):=\bigl(ix_{i}^{2}-2x_{i}\bigr)+ \sum_{j=1}^{n}x_{j},\quad i=1,2, \ldots,n $$

are the Ferrier polynomials.
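For this problem a possible MATLAB oracle is sketched below; since each \(f_{i}\) is smooth with \(\nabla f_{i}(x)=\mathbf{1}+(2ix_{i}-2)e_{i}\), the vector \(\operatorname{sign}(f_{i}(x))\nabla f_{i}(x)\) is a subgradient of \(\vert f_{i}\vert\) at x. The function name regular_oracle is ours, for illustration only.

function [Fval, g] = regular_oracle(x)
% Objective value and one subgradient of the "Regular" problem:
%   F(x) = sum_i |f_i(x)| + 0.5*||x||^2,  f_i(x) = i*x_i^2 - 2*x_i + sum_j x_j.
x = x(:);                         % work with a column vector
n = length(x);
s = sum(x);
Fval = 0.5 * (x' * x);
g    = x;                         % gradient of the smooth term 0.5*||x||^2
for i = 1:n
    fi = i * x(i)^2 - 2 * x(i) + s;
    Fval = Fval + abs(fi);
    gi = ones(n, 1);              % gradient of f_i
    gi(i) = gi(i) + 2 * i * x(i) - 2;
    g = g + sign(fi) * gi;        % subgradient of |f_i| (zero if f_i(x) = 0)
end
end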

L-Mifflin: \(F(x)=2(x_{1}^{2}+x_{2}^{2}-1)+1.75 \vert x_{1}^{2}+x_{2}^{2}-1 \vert \).

The nonconvexity of the above two problems can be seen from Fig. 1.

Figure 1: The 3D images of Regular and L-Mifflin

In the tests, the parameters are selected as \(M=5\), \(R_{0}=10\), \(\kappa=0.3\), \(\epsilon=10^{-5}\), \(\Gamma=2\). The numerical results are reported in Tables 1, 2 and 3, where n denotes the dimension of the problem, NI the number of iterations, ND the number of descent steps, NF the number of function evaluations, \(x^{*}\) the approximate optimal solution, and \(F^{*}\) the approximate optimal objective value. Table 1 compares Algorithm 1 with PPBM (the algorithm in [27]) on the first seven problems; Algorithm 1 performs better than PPBM. In Table 2, we compare our algorithm with RedistProx [13] on problem Regular for various n; under the same stopping condition (terminate when NF exceeds 300), the approximate optimal values and accuracies of Algorithm 1 are better than those of RedistProx. Finally, Table 3 reports the approximate optimal solutions and values for problem L-Mifflin with different starting points. A minimal sketch of the descent test of Step 4 with these parameters is given after the tables.

Table 1 Numerical results for the first set of seven problems
Table 2 Numerical results for “Regular” compared with RedistProx
Table 3 Numerical results for “Regular” and “L-Mifflin”
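As indicated above, the following minimal MATLAB sketch shows how κ enters the serious/null-step decision of Step 4 (cf. the descent test used in the convergence analysis). The function name and variable names are ours, for illustration only.

function is_serious = descent_test(F_x, F_y, delta, kappa)
% Descent test of Step 4: accept y^{l+1} as the new stability center if
%   F(y^{l+1}) <= F(x^k) - kappa * delta_l.
% In the experiments kappa = 0.3 is used.
if nargin < 4
    kappa = 0.3;
end
is_serious = (F_y <= F_x - kappa * delta);
end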

References

  1. Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10), 707–710 (2007)
  2. Hare, W., Sagastizábal, C., Solodov, M.V.: A proximal bundle method for nonsmooth nonconvex functions with inexact information. Comput. Optim. Appl. 63(1), 1–28 (2016)
  3. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
  4. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
  5. Dinh, Q.T., Diehl, M.: Proximal methods for minimizing the sum of a convex function and a composite function (2011). arXiv:1105.0276
  6. Eckstein, J., Svaiter, B.F.: A family of projective splitting methods for the sum of two maximal monotone operators. Math. Program. 111(1), 173–199 (2007)
  7. Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Program. 141, 349–382 (2013)
  8. Kiwiel, K.C.: A method for minimizing the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 48(3), 437–449 (1986)
  9. Kiwiel, K.C.: An alternating linearization bundle method for convex optimization and nonlinear multicommodity flow problems. Math. Program. 130(1), 59–84 (2011)
  10. Li, D., Pang, L., Chen, S.: A proximal alternating linearization method for nonconvex optimization problems. Optim. Methods Softw. 29(4), 771–785 (2014)
  11. Mine, H., Fukushima, M.: A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33(1), 9–23 (1981)
  12. Tuy, H., Tam, B.T., Dan, N.D.: Minimizing the sum of a convex function and a specially structured nonconvex function. Optimization 28, 237–248 (1994)
  13. Hare, W.L., Sagastizábal, C.: A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim. 20(5), 2442–2473 (2010)
  14. Cheney, E.W., Goldstein, A.A.: Newton’s method for convex programming and Tchebycheff approximations. Numer. Math. 1, 253–268 (1959)
  15. Kelley, J.E.: The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math. 8, 703–712 (1960)
  16. Hare, W., Sagastizábal, C.A.: Computing proximal points of nonconvex functions. Math. Program. 116(1), 221–258 (2009)
  17. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
  18. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
  19. Lemaréchal, C.: An extension of Davidon methods to nondifferentiable problems. Math. Program. Stud. 3, 95–109 (1975)
  20. Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. Stud. 3, 145–173 (1975)
  21. Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.: Numerical Optimization: Theoretical and Practical Aspects, 2nd edn. Springer, Berlin (2006)
  22. Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics, vol. 1133. Springer, Berlin (1985)
  23. Kiwiel, K.C.: A method of centers with approximate subgradient linearizations for nonsmooth convex optimization. SIAM J. Optim. 18(4), 1467–1489 (2008)
  24. Kiwiel, K.C.: An algorithm for linearly constrained convex nondifferentiable minimization problems. J. Math. Anal. Appl. 105(2), 452–465 (1985)
  25. Lukšan, L., Vlček, J.: Test problems for nonsmooth unconstrained and linearly constrained optimization. Technical Report No. 798, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague (2000)
  26. Bagirov, A.M., Karmitsa, N., Mäkelä, M.: Introduction to Nonsmooth Optimization: Theory, Practice and Software. Springer, Cham (2014)
  27. Kiwiel, K.C.: A proximal-projection bundle method for Lagrangian relaxation, including semidefinite programming. SIAM J. Optim. 17(4), 1015–1034 (2006)


Acknowledgements

This work was supported by the National Natural Science Foundation of China (11761013, 11771383) and the Guangxi Natural Science Foundation (2013GXNSFAA019013, 2014GXNSFFA118001, 2016GXNSFDA380019).

Author information


Contributions

All authors read and approved the final manuscript. CT mainly contributed to the algorithm design and convergence analysis; JL mainly contributed to the convergence analysis and numerical results; and JJ mainly contributed to the idea of the method and algorithm design.

Corresponding author

Correspondence to Jinbao Jian.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Tang, C., Lv, J. & Jian, J. An alternating linearization bundle method for a class of nonconvex nonsmooth optimization problems. J Inequal Appl 2018, 101 (2018). https://doi.org/10.1186/s13660-018-1683-1

