Inertial proximal alternating minimization for nonconvex and nonsmooth problems

In this paper, we study the minimization problem of the type L(x, y) = f(x) + R(x, y) + g(y), where f and g are both nonconvex nonsmooth functions, and R is a smooth function we can choose. We present a proximal alternating minimization algorithm with inertial effect. We obtain the convergence by constructing a key function H that guarantees a sufficient decrease property of the iterates. In fact, we prove that if H satisfies the Kurdyka-Lojasiewicz inequality, then every bounded sequence generated by the algorithm converges strongly to a critical point of L.


Introduction
Nonconvex and nonsmooth optimization problems arise in many applied sciences, including statistics, machine learning, regression, and classification. One of the most practical and classical optimization problems is of the form
min_{x ∈ R^n} f(x) + g(x). (1)
In this paper, we study the problem in the nonconvex and nonsmooth setting, where f, g : R^n → (-∞, ∞] are proper lower semicontinuous functions. We aim at finding the critical points of
L(x, y) = f(x) + R(x, y) + g(y) (2)
(with R being smooth) and possibly solving the corresponding minimization problem (1). This can be seen by setting R(x, y) = (ρ/2)‖x − y‖², where ρ > 0 is a relaxation parameter. For problem (2), we introduce a proximal alternating minimization algorithm with inertial effect and investigate the convergence of the generated iterates. Inertial proximal methods go back to [, ], where it was noticed that the discretization of a differential system of second order in time gives rise to a generalization of the classical proximal-point algorithm. The main feature of the inertial proximal algorithm is that the next iterate is defined by using the last two iterates. Recently, there has been an increasing interest in algorithms with inertial effect; see [-].
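To make the phrase "the next iterate is defined by using the last two iterates" concrete, the following is a minimal sketch of the classical inertial proximal iteration x_{k+1} = prox_{λf}(x_k + α(x_k − x_{k−1})) for the convex toy function f(x) = |x|. The function names, the step size 0.5, and the inertial weight 0.3 are illustrative assumptions; this is not the two-block algorithm studied in this paper.

```python
def soft_threshold(v, t):
    """Proximal map of t*|.| evaluated at v (closed form)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def inertial_proximal_point(x0, step=0.5, alpha=0.3, iters=50):
    """Iterate x_{k+1} = prox_{step*|.|}(x_k + alpha*(x_k - x_{k-1})).

    Hypothetical helper for illustration; f(x) = |x| is an assumed toy objective.
    """
    x_prev, x = x0, x0  # start with zero momentum
    history = [x]
    for _ in range(iters):
        extrapolated = x + alpha * (x - x_prev)  # inertial step: uses the last two iterates
        x_prev, x = x, soft_threshold(extrapolated, step)
        history.append(x)
    return history


trajectory = inertial_proximal_point(5.0)
```

Setting alpha to 0 recovers the classical proximal-point iteration, which makes the role of the inertial term easy to isolate.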
Generally, we consider the problem
min L(x, y) = min {f(x) + R(x, y) + g(y)}
with x ∈ R^n and y ∈ R^m. In [], the authors proposed the alternating minimization algorithm
x_{k+1} ∈ argmin{L(u, y_k) + (1/(2λ_k))‖u − x_k‖² : u ∈ R^n},
y_{k+1} ∈ argmin{L(x_{k+1}, v) + (1/(2μ_k))‖v − y_k‖² : v ∈ R^m},
which can be viewed as a proximal regularization of a two-block Gauss-Seidel method for minimizing L. Inspired by [], we propose the algorithm
We need the following assumptions on the functions and parameters.
(H) f : R n → (-∞, ∞] and g : R m → (-∞, ∞] are proper lower semicontinuous functions; To prove the convergence of the algorithm under these assumptions, we construct a key function H, which is defined as in (). Based on H, we can obtain a sufficient decrease property of the iterates, the existence of a subgradient lower bound for the iterate gap, and some important analytic features of the objective function. Finally, we can prove that every bounded sequence generated by the algorithm converges to a critical point of L if H satisfies the Kurdyka-Lojasiewicz inequality.
The rest of the paper is arranged as follows. In Section 2, we recall some elementary notions and facts of nonsmooth nonconvex analysis. In Section 3, we present a detailed proof of the convergence of the algorithm. In Section 4, we give a brief conclusion.
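The proximal alternating scheme recalled above (without inertia) can be sketched on a toy instance as follows. Everything specific here is an assumption made for illustration: f(x) = τ‖x‖_0, g(y) = ½‖y − b‖² (chosen smooth only to keep the y-step in closed form), the coupling R(x, y) = (ρ/2)‖x − y‖², constant step sizes, and all function names. With these choices both subproblems have exact closed-form minimizers.

```python
def hard_threshold(c, tau, weight):
    """Global minimizer over scalar u of tau*[u != 0] + (weight/2)*(u - c)**2."""
    return c if 0.5 * weight * c * c > tau else 0.0


def objective(x, y, b, tau, rho):
    """L(x, y) = f(x) + R(x, y) + g(y) for the assumed toy choices."""
    f = tau * sum(1 for xi in x if xi != 0.0)                    # tau * ||x||_0
    r = 0.5 * rho * sum((xi - yi) ** 2 for xi, yi in zip(x, y))  # assumed quadratic coupling
    g = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, b))        # assumed data-fit term
    return f + r + g


def proximal_alternating_minimization(b, tau=0.1, rho=1.0, lam=1.0, mu=1.0, iters=30):
    x = [0.0] * len(b)
    y = list(b)
    values = [objective(x, y, b, tau, rho)]
    for _ in range(iters):
        # x-step: argmin_u f(u) + (rho/2)||u - y||^2 + (1/(2*lam))||u - x||^2.
        # The two quadratics merge into (w/2)||u - c||^2 + const.
        w = rho + 1.0 / lam
        x = [hard_threshold((rho * yi + xi / lam) / w, tau, w) for xi, yi in zip(x, y)]
        # y-step: argmin_v g(v) + (rho/2)||x - v||^2 + (1/(2*mu))||v - y||^2.
        y = [(bi + rho * xi + yi / mu) / (1.0 + rho + 1.0 / mu)
             for xi, yi, bi in zip(x, y, b)]
        values.append(objective(x, y, b, tau, rho))
    return x, y, values


x_final, y_final, values = proximal_alternating_minimization([2.0, 0.05, -1.5])
```

Because each block update minimizes L plus a nonnegative proximal term, the recorded objective values are nonincreasing, which is the non-inertial analogue of the sufficient decrease property used later.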

Preliminaries
In this section, we recall some definitions and results. Let N be the set of nonnegative integers. For m ≥ 1, the Euclidean scalar product and induced norm on R^m are denoted by ⟨·, ·⟩ and ‖·‖, respectively.
The domain of a function f : R^m → (-∞, ∞] is dom f = {x ∈ R^m : f(x) < +∞}. The Fréchet and limiting subdifferentials of f are defined at points x ∈ dom f, whereas for x ∉ dom f, we take ∂f(x) := ∅.
It is known that both notions of subdifferentials coincide with the convex subdifferential when f is convex. The Fermat rule reads in this nonsmooth setting as follows: if x ∈ R^m is a local minimizer of f, then 0 ∈ ∂f(x).
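For reference, the standard definitions of the domain, the Fréchet subdifferential, and the limiting subdifferential, in the usual variational-analysis form, can be written as below. The hat notation for the Fréchet subdifferential is an assumption about notation, not taken from this text.

```latex
\operatorname{dom} f = \{ x \in \mathbb{R}^m : f(x) < +\infty \},
\qquad
\hat{\partial} f(x) = \Bigl\{ v \in \mathbb{R}^m :
  \liminf_{y \to x,\ y \neq x}
  \frac{f(y) - f(x) - \langle v, y - x \rangle}{\| y - x \|} \ge 0 \Bigr\},

\partial f(x) = \bigl\{ v \in \mathbb{R}^m :
  \exists\, x_k \to x,\ f(x_k) \to f(x),\
  v_k \in \hat{\partial} f(x_k),\ v_k \to v \bigr\}.
```

A one-dimensional example shows that the two notions can differ in the nonconvex case: for f(x) = -|x| one has \hat{\partial} f(0) = \emptyset, while \partial f(0) = \{-1, 1\}.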
Denote by crit f = {x ∈ R^m : 0 ∈ ∂f(x)} the set of (limiting) critical points of f. Let us mention also the following subdifferential rule: if f : R^m → (-∞, ∞] is proper lower semicontinuous and g : R^m → R is a continuously differentiable function, then ∂(f + g)(x) = ∂f(x) + ∇g(x) for all x ∈ R^m. We also denote dist(x, Ω) = inf_{y∈Ω} ‖x − y‖ for x ∈ R^m and Ω ⊂ R^m.
Now let us recall the Kurdyka-Lojasiewicz (KL) property, which plays an important role in the proof of the convergence of our algorithm.
(ii) ϕ is continuously differentiable on (0, η) and continuous at 0;
If f satisfies the KL property at each point in dom ∂f, then we call f a KL function.
It is worth mentioning that many functions in applied science are KL functions (see []). In fact, semialgebraic functions, real subanalytic functions, semiconvex functions, and uniformly convex functions are all KL functions.
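For orientation, the KL property at a point \bar{x} \in \operatorname{dom} \partial f is usually stated in the following standard form. The neighborhood symbol U is an assumption about notation; conditions (i)-(iii) are the ones listed for \varphi in the convergence proof.

```latex
\exists\, \eta \in (0, +\infty],\ \text{a neighborhood } U \text{ of } \bar{x},\
\text{and a continuous concave } \varphi : [0, \eta) \to [0, +\infty)
\text{ such that}

\text{(i) } \varphi(0) = 0; \quad
\text{(ii) } \varphi \in C^1(0, \eta) \text{ and } \varphi \text{ is continuous at } 0; \quad
\text{(iii) } \varphi'(s) > 0 \ \ \forall s \in (0, \eta);

\text{(iv) } \varphi'\bigl(f(x) - f(\bar{x})\bigr)\,
\operatorname{dist}\bigl(0, \partial f(x)\bigr) \ge 1
\quad \forall x \in U \cap \{ x : f(\bar{x}) < f(x) < f(\bar{x}) + \eta \}.
```

As a simple check, f(x) = x^2 satisfies the inequality at \bar{x} = 0 with \varphi(s) = \sqrt{s}: for x \neq 0, \varphi'(x^2)\,|2x| = \frac{1}{2|x|} \cdot 2|x| = 1.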
The following result (see [], Lemma ) is crucial to our convergence analysis.

Lemma . Let
we have the KL inequality:
We need the following two lemmas. The first one was often used in the context of Fejér monotonicity techniques for proving convergence results of classical algorithms for convex optimization problems or, more generally, for monotone inclusion problems (see []). The second one is easy to verify (see []).
Lemma . Let {a n } n∈N and {b n } n∈N be real sequences such that b n ≥  for all n ∈ N, {a n } n∈N is bounded below, and a n+ + b n ≤ a n for all n ∈ N. Then {a n } n∈N is a monotonically decreasing and convergent sequence, and n∈N b n < +∞.
Lemma . Let {a n } n∈N and {b n } n∈N be nonnegative real sequences such that n∈N b n < ∞ and a n+ ≤ a · a n + b · a n- + b n for all n ≥ , where a ∈ R, b ≥ , and a + b < . Then n∈N a n < ∞.

The convergence of the algorithm
In this section, we prove the convergence of our algorithm. Motivated by [] and [], we divide the proof into three main steps, which are listed in the following three subsections, respectively.
Throughout, {(x_k, y_k)} denotes the sequence generated by ()-().

A sufficient decrease property of the iterates
In this subsection, we construct the key function H and prove that the iterates have a sufficient decrease property.
Proof Assumption (H) indicates that, for any r  , r  >  and (x,ȳ), (x,ŷ) ∈ R n × R m , the functions . Using the definition of x k+ and y k+ in () and (), we have This leads to Clearly assumption (H) implies that An elementary verification shows that m  > m  >  under assumption (H).
Remark . Based on Lemma ., we can define the new function  H(x, y) the key function.
More precisely, we have the following lemma.

Lemma . Let H(z, w) be defined as in (). Then under assumptions (H)-(H), we have
where z k = (x k , y k ), that is, the sequence {H(z k+ , z k )} is decreasing.
Proof Set m := m m  > . Then the result follows directly from () or ().

Norm estimate of the subdifferential of H
In this subsection, we prove that there exists a subgradient lower bound for the iterate gap. First, we estimate the norm of the subdifferential of L.

Lemma . Define
Then, under assumptions (H)-(H), (p_{k+1}, q_{k+1}) ∈ ∂L(x_{k+1}, y_{k+1}). Moreover, if {(x_k, y_k)} is bounded, then there exists a constant C > 0 such that
Proof According to the definition of x_{k+1} and y_{k+1} and the Fermat rule, we get Thus and ∈ ∇_y R(x_{k+1}, y_{k+1}) + ∂g(y_{k+1}) = ∂_y L(x_{k+1}, y_{k+1}).
Using assumption (H), we obtain that where is the Lipschitz constant of ∇R(x, y) on the bounded set {(x_k, y_k)}. Hence the norm estimate can be immediately derived.
The norm estimate of the subdifferential of H is a direct consequence of Lemma ..

Lemma . For all k ∈ N, H(z, w) has a subdifferential at (z_{k+1}, z_k) of the form
Moreover, there exists a constant C > 0 such that
Proof According to the definition of H(z, w), we get The rest is immediately obtained.
The norm estimate, together with the closedness of the limiting subdifferential, is used to obtain the following convergence of subsequences of {(x_k, y_k)}.

Lemma . (Preconvergence result)
Under assumptions (H)-(H), we have the following statements: . Then Lemma . gives a k+ + b k ≤ a k . Then assumption (H) ensures that a n is bounded below. Thus Lemma . implies (i) and (ii). Moreover, the definition of H(z, w) yields that Thus (iii) is derived from (i) and (ii).
On the other hand, the definition of x k+ shows that from which we get Hence where we have used assumption (H) and replaced x k , y k by x k j , y k j . Due to the fact that x k+x k →  from (i), we have x k j +x k j → . This, together with x k jx * → , yields x k j +x * → . Using the continuity of R(x, y) by assumption (H), the last inequality yields Therefore In a similar way, we can prove that lim j→∞ g(y k j ) = g(y * ). Combining with the continuity of R(x, y), we immediately obtain that On the other hand, Lemma .(i) and Lemma . give (p k j + , q k j + ) ∈ ∂L(x k j + , y k j + ) and (p k j + , q k j + ) → , j → ∞. Thus the closeness of the limiting subdifferential (see ()) indicates that  ∈ ∂L(x * , y * ).

Analytic property of the key function H
Denote by Ω the set of the cluster points of the sequence {(z_{k+1}, z_k)}. Thus crit H = {(z*, z*) : z* = (x*, y*) ∈ crit L}, and hence Ω ⊂ crit H.

Theorem . (Convergence) Assume that H(z, w) is a KL function and that the sequence
{(x k , y k )} is bounded. Then, under assumptions (H)-(H), we have Proof According to Lemma ., we consider an element (x * , y * ) ∈ crit L(x, y) such that (z * , z * ) ∈ , where z * = (x * , y * ). From the previous proof we can easily obtain that lim k→∞ H(z k+ , z k ) = H(z * , z * ). Next, we prove the theorem in two cases.
Case . There exists a positive integer k  such that H(z k  + , z k  ) = H(z * , z * ).
Since {H(z k+ , z k )} is decreasing, we know that H(z k+ , z k ) = H(z * , z * ) for all k ≥ k  . This, together with the definition of H, shows that z k = z k  for all k ≥ k  , and the desired results follow.
Case 2. H(z_{k+1}, z_k) > H(z*, z*) for all k.
Since H satisfies the KL property, Lemma . says that there exist ε, η > 0 and a concave function ϕ such that (i) ϕ(0) = 0; (ii) ϕ is continuously differentiable on (0, η) and continuous at 0; (iii) ϕ'(s) > 0 for all s ∈ (0, η); (iv) for all
Due to the concavity of ϕ, By Lemma . there exist a point ω_{k+1} ∈ ∂H(z_{k+1}, z_k) defined as in () and a constant C > 0 such that From Lemma . we have H(z_k, z_{k−1}) − H(z_{k+1}, z_k) ≥ m‖z_{k+1} − z_k‖². Thus
Set b_k = (C/m)(ϕ(H(z_k, z_{k−1}) − H(z*, z*)) − ϕ(H(z_{k+1}, z_k) − H(z*, z*))) ≥ 0 and a_k = ‖z_k − z_{k−1}‖ ≥ 0. Then () can be equivalently rewritten as
Since ϕ ≥ 0, we know that Σ_{k=1}^{N} b_k ≤ (C/m) ϕ(H(z_1, z_0) − H(z*, z*)) for all N ∈ N, and hence Σ_{k=1}^{∞} b_k < ∞. Note that from () we have So Lemma . gives that Σ_{k=1}^{∞} a_k < ∞, that is, Σ_{k=1}^{∞} ‖z_k − z_{k−1}‖ < ∞, which is equivalent to Σ_{k=1}^{∞} (‖x_k − x_{k−1}‖ + ‖y_k − y_{k−1}‖) < ∞. This indicates that {z_k} is a Cauchy sequence. So {z_k} = {(x_k, y_k)} is convergent. Let (x_k, y_k) → (x*, y*) as k → ∞. According to Lemma .(iv), it is clear that (x*, y*) is a critical point of L.
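The last steps of the argument rest on two elementary facts: the sum of the b_k telescopes, and summable increments force a Cauchy sequence. A short derivation is sketched below, writing \Delta_k := H(z_k, z_{k-1}) - H(z^*, z^*) and c > 0 for the constant factor in b_k. Both abbreviations are introduced here for illustration only, and the starting index 1 is an assumption.

```latex
% Telescoping: with b_k = c\,(\varphi(\Delta_k) - \varphi(\Delta_{k+1})) and \varphi \ge 0,
\sum_{k=1}^{N} b_k
  = c\,\bigl( \varphi(\Delta_1) - \varphi(\Delta_{N+1}) \bigr)
  \le c\,\varphi(\Delta_1)
  \qquad \text{for all } N.

% Finite length implies Cauchy: for p > q,
\| z_p - z_q \|
  \le \sum_{k=q+1}^{p} \| z_k - z_{k-1} \|
  \le \sum_{k=q+1}^{\infty} \| z_k - z_{k-1} \|
  \longrightarrow 0 \quad (q \to \infty).
```

The second display uses only the triangle inequality and the fact that the tail of a convergent series of nonnegative terms tends to zero.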

Conclusion
In this paper, we present a proximal alternating minimization algorithm with inertial effect for the minimization problem of the type L(x, y) = f (x) + R(x, y) + g(y), where f and g are both nonconvex nonsmooth functions, and R is a smooth function. We prove that every bounded sequence generated by the algorithm converges to a critical point of L.
The key point is to construct a function H (see ()) that satisfies the Kurdyka-Lojasiewicz inequality. It is worth mentioning that assumption (H) requires max{α, β} · max{λ + , μ + } <   , which can be achieved by appropriate choice of the parameters.