Inertial proximal alternating minimization for nonconvex and nonsmooth problems
 Yaxuan Zhang^{1} and Songnian He^{1}
https://doi.org/10.1186/s13660-017-1504-y
© The Author(s) 2017
Received: 25 April 2017
Accepted: 8 September 2017
Published: 20 September 2017
Abstract
In this paper, we study the minimization of functions of the form \(L(x,y)=f(x)+R(x,y)+g(y)\), where f and g are both nonconvex and nonsmooth, and R is a smooth coupling function that we can choose. We present a proximal alternating minimization algorithm with inertial effect. The convergence analysis rests on a key auxiliary function H that guarantees a sufficient decrease property of the iterates. In fact, we prove that if H satisfies the Kurdyka-Łojasiewicz inequality, then every bounded sequence generated by the algorithm converges strongly to a critical point of L.
1 Introduction
For problem (2), we introduce a proximal alternating minimization algorithm with inertial effect and investigate the convergence of the generated iterates. Inertial proximal methods go back to [1, 2], where it was noticed that the discretization of a differential system of second order in time gives rise to a generalization of the classical proximal-point algorithm. The main feature of the inertial proximal algorithm is that the next iterate is defined by using the last two iterates. Recently, there has been an increasing interest in algorithms with inertial effect; see [3–12].
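The inertial mechanism can be sketched in a few lines. The following toy iteration is an illustrative sketch only, not the algorithm (3)-(4) studied in this paper: it applies an inertial proximal-point step to the simple convex model \(f(x)=x^{2}/2\), whose proximal map has the closed form \(\operatorname{prox}_{\lambda f}(v)=v/(1+\lambda)\). The parameter values for λ and α are arbitrary illustrative choices.

```python
def inertial_proximal_point(x0, lam=1.0, alpha=0.3, iters=100):
    """Inertial proximal-point iteration for f(x) = x**2 / 2.

    prox_{lam*f}(v) = v / (1 + lam)  (closed form for this particular f).
    The next iterate uses the LAST TWO iterates through the
    extrapolation x_k + alpha * (x_k - x_{k-1}).
    """
    x_prev, x = x0, x0                  # initialize x_{-1} = x_0
    for _ in range(iters):
        v = x + alpha * (x - x_prev)    # inertial extrapolation
        x_prev, x = x, v / (1.0 + lam)  # proximal step
    return x

print(inertial_proximal_point(5.0))  # approaches the minimizer 0
```

With alpha = 0 this reduces to the classical proximal-point method; the inertial term typically accelerates convergence but, as the paper's assumption (H6) reflects, the inertial and step-size parameters must be balanced against each other.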
 (H1):

\(f: \mathbb {R}^{n}\to(-\infty,\infty]\) and \(g: \mathbb {R}^{m}\to (-\infty,\infty]\) are proper lower semicontinuous functions;
 (H2):

\(R: \mathbb {R}^{n}\times\mathbb {R}^{m}\to\mathbb {R}\) is a continuously differentiable function;
 (H3):

∇R is Lipschitz continuous on bounded subsets of \(\mathbb {R}^{n}\times\mathbb {R}^{m}\);
 (H4):

\(\inf L>-\infty\);
 (H5):

\(0<\mu_{-}\leq\mu_{k}\leq\mu_{+}\), \(0<\lambda_{-}\leq\lambda_{k}\leq \lambda_{+}\), \(0\leq\alpha_{k}\leq\alpha\), \(0\leq\beta_{k}\leq\beta\);
 (H6):

\(\sigma>\max\{\alpha,\beta\}\cdot\max\{\lambda_{+},\mu_{+}\} \cdot(\sigma^{2}+1)\).
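Since the update rules (3)-(4) are referenced but not displayed in this excerpt, the following sketch shows one common form such an inertial proximal alternating scheme can take under (H1)-(H6): each block is minimized with a proximal regularization centered at an inertial extrapolation of the last two iterates. The model \(L(x,y)=\vert x\vert+\frac{1}{2}(x-y)^{2}+\vert y\vert\) and all parameter values are illustrative assumptions, not the paper's exact setting.

```python
def soft_step(b, c):
    """argmin_t |t| + (c/2)*t**2 - b*t, i.e. soft-threshold b/c at level 1/c."""
    t = b / c
    return (1.0 if t > 0 else -1.0) * max(abs(t) - 1.0 / c, 0.0)

def inertial_pam(x0, y0, lam=0.5, mu=0.5, alpha=0.3, beta=0.3, iters=60):
    """Hypothetical inertial proximal alternating minimization for
    L(x, y) = |x| + 0.5*(x - y)**2 + |y|   (critical point: x = y = 0).

    x-update: argmin_x |x| + 0.5*(x - y_k)**2
                       + (1/(2*lam)) * (x - (x_k + alpha*(x_k - x_{k-1})))**2
    The y-update is symmetric and uses the fresh x_{k+1} (Gauss-Seidel style).
    """
    x_prev, x = x0, x0
    y_prev, y = y0, y0
    for _ in range(iters):
        u = x + alpha * (x - x_prev)          # inertial term for the x-block
        x_prev, x = x, soft_step(y + u / lam, 1.0 + 1.0 / lam)
        v = y + beta * (y - y_prev)           # inertial term for the y-block
        y_prev, y = y, soft_step(x + v / mu, 1.0 + 1.0 / mu)
    return x, y

print(inertial_pam(1.0, 1.0))  # converges to the critical point (0, 0)
```

For this model both subproblems reduce to soft-thresholding, since the smooth part of each subproblem is a quadratic \( \frac{c}{2}t^{2}-bt \) with \(c=1+1/\lambda\) (resp. \(1+1/\mu\)).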
To prove the convergence of the algorithm under these assumptions, we construct a key function H, which is defined as in (11). Based on H, we can obtain a sufficient decrease property of the iterates, the existence of a subgradient lower bound for the iterate gap, and some important analytic features of the objective function. Finally, we can prove that every bounded sequence generated by the algorithm converges to a critical point of L if H satisfies the Kurdyka-Łojasiewicz inequality.
The rest of the paper is arranged as follows. In Section 2, we recall some elementary notions and facts of nonsmooth nonconvex analysis. In Section 3, we present a detailed proof of the convergence of the algorithm. In Section 4, we give a brief conclusion.
2 Preliminaries
In this section, we recall some definitions and results. Let \(\mathbb {N}\) be the set of nonnegative integers. For \(m\geq1\), the Euclidean scalar product and induced norm on \(\mathbb {R}^{m}\) are denoted by \(\langle\cdot ,\cdot\rangle\) and \(\Vert \cdot \Vert \), respectively.
It is known that both notions of subdifferentials coincide with the convex subdifferential if f is convex, that is, \(\hat{\partial}f(x)=\partial f(x)=\{v\in\mathbb {R}^{m}:f(y)\geq f(x)+\langle v,y-x\rangle, \forall y\in\mathbb {R}^{m}\}\). Notice that if f is continuously differentiable around \(x\in\mathbb {R}^{m}\), then we have \(\partial f(x)=\{\nabla f(x)\}\). Generally, the inclusion \(\hat{\partial}f(x)\subset\partial f(x)\) holds for each \(x\in\mathbb {R}^{m}\).
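A standard one-dimensional example, included here for illustration, shows that this inclusion can be strict:

```latex
\text{For } f(x) = -\lvert x\rvert \text{ at } x = 0:\qquad
\hat{\partial} f(0) = \emptyset,
\qquad
\partial f(0) = \{-1,\,1\},
```

since no vector \(v\) satisfies \(-\lvert y\rvert\geq\langle v,y\rangle+o(\lvert y\rvert)\) near 0, whereas the gradients \(-1\) and 1 attained at nearby nonzero points survive in the limit defining the limiting subdifferential.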
We also denote \(\operatorname{dist}(x,\Omega)=\inf_{y\in\Omega} \Vert x-y \Vert \) for \(x\in\mathbb {R}^{m}\) and \(\Omega\subset\mathbb {R}^{m}\).
Now let us recall the Kurdyka-Łojasiewicz property, which plays an important role in the proof of the convergence of our algorithm.
Definition 2.1
Kurdyka-Łojasiewicz property; see [13, 16]

Let \(f:\mathbb {R}^{m}\to(-\infty,\infty]\) be a proper lower semicontinuous function. We say that f satisfies the KL property at \(\bar{x}\in\operatorname{dom} \partial f\) if there exist \(\eta\in(0,+\infty]\), a neighborhood U of \(\bar{x}\), and a continuous concave function \(\varphi:[0,\eta)\to[0,+\infty)\) such that:
 (i)
\(\varphi(0)=0\);
 (ii)
φ is continuously differentiable on \((0,\eta)\) and continuous at 0;
 (iii)
\(\varphi'(s)>0\) for all \(s\in(0,\eta)\);
 (iv)
for all \(x\in U\cap\{x\in\mathbb {R}^{m}: f(\bar{x})< f(x)< f(\bar{x})+\eta\}\), we have the KL inequality
$$ \varphi' \bigl(f(x)-f(\bar{x}) \bigr) \operatorname{dist} \bigl(0, \partial f(x) \bigr)\geq1. $$
(7)
If f satisfies the KL property at each point in \(\operatorname{dom} \partial f\), then we call f a KL function.
It is worth mentioning that many functions in applied science are the KL functions (see [16]). In fact, semialgebraic functions, real subanalytic functions, semiconvex functions, and uniformly convex functions are all KL functions.
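For instance, the semialgebraic function \(f(x)=\lvert x\rvert\) satisfies the KL inequality (7) at \(\bar{x}=0\) with the simplest possible desingularizing function:

```latex
\varphi(s) = s,\quad \eta = +\infty,\quad U = \mathbb{R}:\qquad
\partial f(x) = \{\operatorname{sign}(x)\} \ \ (x \neq 0)
\;\Longrightarrow\;
\varphi'\bigl(f(x) - f(0)\bigr)\,\operatorname{dist}\bigl(0,\partial f(x)\bigr)
= 1 \cdot 1 \geq 1 .
```

Here \(\varphi'\equiv1\) and \(\operatorname{dist}(0,\partial f(x))=1\) for every \(x\neq0\), so (7) holds on all of \(U\cap\{0<f(x)<\eta\}\).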
The following result (see [16], Lemma 6) is crucial to our convergence analysis.
Lemma 2.1

Uniformized KL property; see [16], Lemma 6

Let \(\Omega\subset\mathbb {R}^{m}\) be a compact set, and let \(f:\mathbb {R}^{m}\to(-\infty,\infty]\) be a proper lower semicontinuous function that is constant on Ω and satisfies the KL property at each point of Ω. Then there exist \(\epsilon>0\), \(\eta>0\), and a continuous concave function \(\varphi:[0,\eta)\to[0,+\infty)\) such that:
 (i)
\(\varphi(0)=0\);
 (ii)
φ is continuously differentiable on \((0,\eta)\) and continuous at 0;
 (iii)
\(\varphi'(s)>0\) for all \(s\in(0,\eta)\);
 (iv)
for all \(\bar{x}\in\Omega\) and all \(x\in\{x\in\mathbb {R}^{m}: \operatorname{dist}(x,\Omega)<\epsilon\}\cap\{x\in\mathbb {R}^{m}: f(\bar{x})<f(x)<f(\bar{x})+\eta\}\), we have the KL inequality
$$ \varphi' \bigl(f(x)-f(\bar{x}) \bigr) \operatorname{dist} \bigl(0, \partial f(x) \bigr)\geq1. $$
(8)
We need the following two lemmas. The first one is often used in the context of Fejér monotonicity techniques for proving convergence results of classical algorithms for convex optimization problems or, more generally, for monotone inclusion problems (see [17]). The second one is easy to verify (see [12]).
Lemma 2.2
Let \(\{a_{n}\}_{n\in\mathbb {N}}\) and \(\{b_{n}\}_{n\in\mathbb {N}}\) be real sequences such that \(b_{n}\geq0\) for all \(n\in\mathbb {N}\), \(\{a_{n}\} _{n\in\mathbb {N}}\) is bounded below, and \(a_{n+1}+b_{n}\leq a_{n}\) for all \(n\in\mathbb {N}\). Then \(\{a_{n}\}_{n\in\mathbb {N}}\) is a monotonically decreasing and convergent sequence, and \(\sum_{n\in\mathbb {N}}b_{n}<+\infty\).
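The proof of Lemma 2.2 is a telescoping argument, sketched here for completeness: summing \(b_{n}\leq a_{n}-a_{n+1}\) gives

```latex
\sum_{n=0}^{N} b_{n}
\;\leq\; \sum_{n=0}^{N} (a_{n} - a_{n+1})
\;=\; a_{0} - a_{N+1}
\;\leq\; a_{0} - \inf_{n\in\mathbb{N}} a_{n}
\;<\; +\infty ,
```

while monotonicity together with boundedness below yields the convergence of \(\{a_{n}\}_{n\in\mathbb {N}}\).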
Lemma 2.3
Let \(\{a_{n}\}_{n\in\mathbb {N}}\) and \(\{b_{n}\}_{n\in\mathbb {N}}\) be nonnegative real sequences such that \(\sum_{n\in\mathbb {N}} b_{n}<\infty\) and \(a_{n+1}\leq a\cdot a_{n}+b\cdot a_{n-1}+b_{n}\) for all \(n\geq1\), where \(a\in\mathbb {R}, b\geq0\), and \(a+b<1\). Then \(\sum_{n\in\mathbb {N}} a_{n}<\infty\).
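A quick numerical sanity check of Lemma 2.3 (an illustrative sketch; the values \(a=0.5\), \(b=0.2\), and \(b_{n}=2^{-n}\) are arbitrary choices satisfying the hypotheses):

```python
def lemma_2_3_bound(a=0.5, b=0.2, n_terms=200):
    """Generate a sequence with a_{n+1} <= a*a_n + b*a_{n-1} + b_n
    (here taken with equality) for b_n = 2**-n, and return the partial sum.

    Lemma 2.3 guarantees sum(a_n) < infinity whenever a + b < 1.
    """
    assert a + b < 1
    seq = [1.0, 1.0]                    # a_0, a_1 (nonnegative)
    for n in range(1, n_terms):
        seq.append(a * seq[n] + b * seq[n - 1] + 2.0 ** (-n))
    return sum(seq)

print(lemma_2_3_bound())  # a finite partial sum; the full series converges
```

The partial sums stabilize as `n_terms` grows, consistent with \(\sum_{n}a_{n}<\infty\): when additionally \(a,b\geq0\), the homogeneous part decays geometrically because the positive root of \(t^{2}=at+b\) lies in \([0,1)\) whenever \(a+b<1\).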
3 The convergence of the algorithm
In this section, we prove the convergence of our algorithm. Motivated by [11] and [13], we divide the proof into three main steps, which are listed in the following three subsections, respectively.
Throughout, \(\{(x_{k},y_{k})\}\) denotes the sequence generated by (3)-(4).
3.1 A sufficient decrease property of the iterates
In this subsection, we construct the key function H and prove that the iterates have a sufficient decrease property.
Lemma 3.1
Proof
An elementary verification shows that \(m_{2}>m_{1}>0\) under assumption (H6). □
Remark 3.1
More precisely, we have the following lemma.
Lemma 3.2
3.2 Norm estimate of the subdifferential of H
In this subsection, we prove that there exists a subgradient lower bound for the iterate gap. First, we estimate the norm of the subdifferential of L.
Lemma 3.3
Proof
Hence the norm estimate can be immediately derived. □
The norm estimate of the subdifferential of H is a direct consequence of Lemma 3.3.
Lemma 3.4
Proof
The norm estimate, together with the closedness of the limiting subdifferential, is used to obtain the following convergence result for subsequences of \(\{(x_{k},y_{k})\}\).
Lemma 3.5
Preconvergence result
 (i)
\(\sum_{k=1}^{\infty} \Vert z_{k+1}-z_{k} \Vert ^{2}<\infty\); in particular, \(\Vert x_{k+1}-x_{k} \Vert \to0\) and \(\Vert y_{k+1}-y_{k} \Vert \to0\) as \(k\to\infty\);
 (ii)
the sequence \(\{L(x_{k},y_{k})\}\) is convergent;
 (iii)
the sequence \(\{H(z_{k+1},z_{k})\}\) is convergent;
 (iv)
if \(\{(x_{k},y_{k})\}\) has a cluster point \((x^{*},y^{*})\), then \((x^{*},y^{*})\in\operatorname{crit} L\).
Proof
On the other hand, Lemma 3.5(i) and Lemma 3.3 give \((p_{k_{j}+1},q_{k_{j}+1})\in\partial L(x_{k_{j}+1},y_{k_{j}+1})\) and \((p_{k_{j}+1},q_{k_{j}+1})\to0\) as \(j\to\infty\). Thus the closedness of the limiting subdifferential (see (6)) shows that \(0\in \partial L(x^{*},y^{*})\). □
3.3 Analytic property of the key function H
Denote by Ω the set of the cluster points of the sequence \(\{ (z_{k+1},z_{k})\}\).
Lemma 3.6
 (i)
Ω is nonempty, compact, and connected. Moreover, \(\operatorname{dist}((z_{k+1},z_{k}),\Omega)\to0\) as \(k\to\infty\).
 (ii)
\(\Omega\subset\operatorname{crit} H=\{(z^{*},z^{*}):z^{*}=(x^{*},y^{*})\in\operatorname{crit} L\}\).
 (iii)
H is finite and constant on Ω.
Proof
(i) This can be checked by elementary tools (see, e.g., [16]).
Theorem 3.1
Convergence
 (i)
\(\sum_{k=1}^{\infty} \Vert z_{k}-z_{k-1} \Vert <\infty\), that is, \(\sum_{k=1}^{\infty}( \Vert x_{k}-x_{k-1} \Vert + \Vert y_{k}-y_{k-1} \Vert )<\infty\);
 (ii)
\(\{(x_{k},y_{k})\}\) converges to a critical point \((x^{*},y^{*})\) of \(L(x,y)\).
Proof
According to Lemma 3.6, we consider an element \((x^{*},y^{*})\in\operatorname{crit} L(x,y)\) such that \((z^{*},z^{*})\in \Omega\), where \(z^{*}=(x^{*},y^{*})\). From the previous proof we can easily obtain that \(\lim_{k\to\infty }H(z_{k+1},z_{k})=H(z^{*},z^{*})\). Next, we prove the theorem in two cases.
Case 1. There exists a positive integer \(k_{0}\) such that \(H(z_{k_{0}+1},z_{k_{0}})=H(z^{*},z^{*})\) .
Since \(\{H(z_{k+1},z_{k})\}\) is decreasing, we know that \(H(z_{k+1},z_{k})=H(z^{*},z^{*})\) for all \(k\geq k_{0}\). This, together with the definition of H, shows that \(z_{k}=z_{k_{0}}\) for all \(k\geq k_{0}\), and the desired results follow.
Case 2. \(H(z_{k+1},z_{k})>H(z^{*},z^{*})\) for all \(k\in\mathbb {N}\) .

By Lemma 3.6, H is finite and constant on the compact set Ω; since H satisfies the KL property, Lemma 2.1 yields \(\epsilon>0\), \(\eta>0\), and a continuous concave function \(\varphi:[0,\eta)\to[0,+\infty)\) such that:
 (i)
\(\varphi(0)=0\);
 (ii)
φ is continuously differentiable on \((0,\eta)\) and continuous at 0;
 (iii)
\(\varphi'(s)>0\) for all \(s\in(0,\eta)\);
 (iv)
for all
$$\begin{aligned} (z,w)\in{}& \bigl\{ (z,w)\in\mathbb {R}^{n+m}\times\mathbb {R}^{n+m}: \operatorname{dist} \bigl((z,w),\Omega \bigr)< \epsilon \bigr\} \\ &{}\cap \bigl\{ (z,w)\in\mathbb {R}^{n+m}\times\mathbb {R}^{n+m}: H \bigl(z^{*},z^{*} \bigr)< H(z,w)< H \bigl(z^{*},z^{*} \bigr)+\eta \bigr\} , \end{aligned}$$
(18)
we have the KL inequality
$$\varphi' \bigl(H(z,w)-H \bigl(z^{*},z^{*} \bigr) \bigr) \operatorname{dist} \bigl(0, \partial H(z,w) \bigr)\geq1. $$
Notice that \(H(z_{k+1},z_{k})\to H(z^{*},z^{*})\) as \(k\to\infty\) and \(H(z_{k+1},z_{k})>H(z^{*},z^{*})\). Let \(k_{1}\) be such that \(H(z^{*},z^{*})< H(z_{k+1},z_{k})< H(z^{*},z^{*})+\eta\) for all \(k\geq k_{1}\). By Lemma 3.6(i) there exists \(k_{2}\) such that \(\operatorname{dist}((z_{k+1},z_{k}),\Omega)<\epsilon\) for all \(k\geq k_{2}\). Take \(k_{3}=\max\{k_{1},k_{2}\}\). Then, for all \(k\geq k_{3}\), \((z_{k+1},z_{k})\) belongs to the intersection in (18). Hence
$$\varphi' \bigl(H(z_{k+1},z_{k})-H \bigl(z^{*},z^{*} \bigr) \bigr)\operatorname{dist} \bigl(0,\partial H(z_{k+1},z_{k}) \bigr) \geq1, \quad \forall k\geq k_{3}. $$
4 Conclusion
Declarations
Acknowledgements
The authors are grateful to the referees for their valuable comments, which notably improved the presentation of this manuscript. The authors also thank Professor Qiaoli Dong for her helpful advice.
Funding
This research was supported by the National Natural Science Foundation of China (No. 61503385) and the Science Research Foundation of CAUC (No. 2011QD02S).
Authors’ contributions
All authors contributed to, read, and approved the manuscript.
Competing interests
We confirm that we have read SpringerOpen’s guidance on competing interests and that none of the authors has any financial or non-financial competing interests related to this manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 1. Alvarez, F: On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102-1119 (2000)
 2. Alvarez, F, Attouch, H: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3-11 (2001)
 3. Maingé, PE, Moudafi, A: Convergence of new inertial proximal methods for DC programming. SIAM J. Optim. 19(1), 397-413 (2008)
 4. Beck, A, Teboulle, M: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183-202 (2009)
 5. Ochs, P, Chen, Y, Brox, T, Pock, T: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7, 1388-1419 (2014)
 6. Bot, RI, Csetnek, ER, Hendrich, C: Inertial Douglas-Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472-487 (2015)
 7. Bot, RI, Csetnek, ER: An inertial alternating direction method of multipliers. Minimax Theory Appl. 1, 29-49 (2016)
 8. Chambolle, A, Dossal, C: On the convergence of the iterates of the ‘fast iterative shrinkage/thresholding algorithm’. J. Optim. Theory Appl. 166, 968-982 (2016)
 9. Chen, C, Ma, S, Yang, J: A general inertial proximal point algorithm for mixed variational inequality problems. SIAM J. Optim. 25, 2120-2142 (2015)
 10. Dong, QL, Lu, YY, Yang, JF: The extragradient algorithm with inertial effects for solving the variational inequality. Optimization 65(12), 2217-2226 (2016)
 11. Bot, RI, Csetnek, ER, Laszlo, SC: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4(1), 1-23 (2016)
 12. Bot, RI, Csetnek, ER: An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems. J. Optim. Theory Appl. 171(2), 600-616 (2016)
 13. Attouch, H, Bolte, J, Redont, P, Soubeyran, A: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 428-457 (2010)
 14. Mordukhovich, B: Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications. Springer, Berlin (2006)
 15. Rockafellar, RT, Wets, RJB: Variational Analysis. Fundamental Principles of Mathematical Sciences, vol. 317. Springer, Berlin (1998)
 16. Bolte, J, Sabach, S, Teboulle, M: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program., Ser. A 146(1-2), 459-494 (2014)
 17. Bauschke, HH, Combettes, PL: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)