A modified inertial proximal alternating direction method of multipliers with dual-relaxed term for structured nonconvex and nonsmooth problem
Journal of Inequalities and Applications volume 2024, Article number: 117 (2024)
Abstract
In this research, we introduce a novel optimization algorithm termed the dual-relaxed inertial alternating direction method of multipliers (DR-IADM), tailored for handling nonconvex and nonsmooth problems. These problems are characterized by an objective function that is a composite of three elements: a smooth composite function combined with a linear operator, a nonsmooth function, and a mixed function of two variables. To facilitate the iterative process, we adopt a straightforward parameter selection approach, integrate inertial components within each subproblem, and introduce two relaxed terms to refine the dual variable update step. Under a set of reasonable assumptions, we establish the boundedness of the sequence generated by our DR-IADM algorithm. Furthermore, leveraging the Kurdyka–Łojasiewicz (KŁ) property, we demonstrate the global convergence of the proposed method. To validate the practicality and efficacy of our algorithm, we present numerical experiments that corroborate its performance. In summary, our contribution lies in proposing DR-IADM for a specific class of optimization problems, proving its convergence properties, and supporting the theoretical claims with numerical evidence.
1 Introduction
The present paper deals with the following nonconvex and nonsmooth problem as in [5]:
where \(F:{R}^{p} \rightarrow {R}\) is a Lipschitz differentiable function, \(G:{R}^{q} \rightarrow {R} \cup \{+\infty \}\) is a proper and lower semicontinuous function, \(H: R^{m} \times R^{q} \rightarrow R \) is a Fréchet differentiable function with Lipschitz continuous gradient, and \(A: R^{m} \rightarrow R^{p}\) is a linear operator. Many application problems can be modeled as (1.1), e.g., in compressed sensing [2, 9], matrix factorization [4], sparse approximations of signals and images [16, 19], and so on.
Obviously, when \(m=p\) and A is the identity operator, (1.1) can be written as
A general method for addressing problem (1.2) is the alternating minimization method, as mentioned in the literature [3, 17, 22]. In the context of nonconvex and nonsmooth problems, Bolte et al. investigated a proximal alternating linearized minimization (PALM) algorithm in [4]. Following this, Driggs et al. introduced a generic stochastic variant of the PALM algorithm in [10], which allows for various variance-reduced gradient approximations. The PALM algorithm is essentially a blockwise implementation of the well-known proximal forward–backward algorithm, as referenced in [8, 13],
where \(H:R^{m} \to R\) is Fréchet differentiable and possesses a Lipschitz continuous gradient. In the convex case, the alternating direction method of multipliers (ADMM) [1, 3] and the linearized ADMM [11, 18, 23, 24] have proven to be highly effective in solving problem (1.3). Following that, Bot et al. [6] introduced a proximal linearized ADMM algorithm, and Liu et al. [14] presented a two-block linearized ADMM and a multi-block parallel linearized ADMM for the nonconvex case.
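As a concrete illustration of the PALM template of [4] discussed above, consider the toy instance \(\min_{x,y} \frac{1}{2}\Vert x-c\Vert^{2}+\lambda \Vert y\Vert_{1}+\frac{1}{2}\Vert x-y\Vert^{2}\): each block update linearizes the coupling term at the current iterate and applies the proximal map of the remaining block function. The Python sketch below is our own illustration; the instance, step size, and stopping test are chosen for simplicity and are not taken from the paper.

```python
import numpy as np

def soft(v, t):
    """Proximal map of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
c = rng.standard_normal(20)
lam = 0.1
x, y = np.zeros(20), np.zeros(20)
t = 0.5   # step size 1/c_k; the coupling H(x, y) = 0.5||x - y||^2 has 1-Lipschitz block gradients

for _ in range(500):
    # x-step: gradient step on H(., y), then prox of t*F with F = 0.5||x - c||^2
    v = x - t * (x - y)
    x = (v + t * c) / (1.0 + t)
    # y-step: gradient step on H(x, .), then prox of t*lam*||.||_1
    w = y - t * (y - x)
    y = soft(w, t * lam)

# first-order stationarity in x: 0 = (x - c) + (x - y)
assert np.linalg.norm(x - c + (x - y)) < 1e-6
print("PALM stationarity residual:", np.linalg.norm(x - c + (x - y)))
```

At a fixed point of the x-update one recovers exactly \((x-c)+(x-y)=0\), which is what the final assertion checks numerically.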
For the problem denoted by equation (1.1), Bot [5] converted it into a three-block nonseparable problem by introducing an additional variable:
The augmented Lagrangian function \({L_{\beta }}:{R^{m}} \times {R^{q}} \times {R^{p}} \times {R^{p}} \to R \cup \{ + \infty \} \) associated with problem (1.4) reads
where u is the Lagrangian multiplier and β is the penalty parameter. Bot gave a proximal minimization algorithm (PMA) to solve it in [5], which takes the following iterative form:
where \(\tau > 0\), \(0 < \sigma < 1\). In [5], sufficient conditions are established to ensure that the sequence generated is bounded, and it is demonstrated that the global convergence is achieved in accordance with the Kurdyka–Łojasiewicz inequality.
Recently, numerous scholars have integrated the inertial effect with ADMM for various nonconvex problems to enhance convergence [15, 25]. For example, Le et al. [12] introduced an inertial Alternating Direction Method of Multipliers (iADMM) tailored for tackling a category of nonconvex, nonsmooth multi-block composite optimization challenges characterized by linear constraints. In [23], an inertial proximal partially symmetric ADMM was introduced by Wang for tackling linearly constrained multi-block nonconvex separable optimization problems. This method involves updating the Lagrange multiplier not once but twice and incorporates distinct relaxation factors [20, 21] at every iteration. For the problem (1.3), Chao et al. [7] combined the inertial technique with ADMM and employed the KŁ assumption to achieve global convergence in the nonconvex setting.
Motivated by the aforementioned algorithms, we present in this paper a dual-relaxed variant of the inertial proximal alternating direction method of multipliers, tailored to the nonconvex and nonsmooth problem (1.1). The key contributions of this paper are as follows:
(1) In contrast to the approach described in [5], our algorithm integrates the fundamental concepts of the ADMM with an inertial component applied uniformly across all subproblems, rather than selectively to certain subproblems. This strategic implementation significantly enhances convergence rates.
(2) In contrast to the conventional ADMM or its variants, our proposed algorithm introduces two relaxation terms (instead of merely one) during the dual variable update phase, which consequently establishes a novel iterative dynamic for the dual ascent procedure.
(3) We provide straightforward sufficient conditions for the boundedness of the sequence generated by our algorithm. Unlike other studies, there is no need to assume that the sequence generated by the algorithm is bounded a priori.
The structure of the paper is as follows. In Sect. 2, we compile a collection of useful definitions and results that serve as a foundation for our convergence analysis. In Sect. 3, we introduce a novel weak inertial proximal minimization algorithm and delve into its convergence properties. A numerical experiment aimed at validating the efficacy of the proposed algorithm is conducted in Sect. 4. The paper concludes with a summary of key points in Sect. 5.
2 Notation and preliminaries
In the following, \({R^{n}}\) stands for the n-dimensional Euclidean space,
where T stands for the transpose operation. For a set \(S \subset {R^{n}}\) and a point \(x \in {R^{n}}\), let \(\operatorname{dist}(x,S) = \inf _{y \in S} { \Vert {y - x} \Vert }\). If \(S = \emptyset \), we set \(\operatorname{dist}(x, S)=+\infty \) for all \(x \in {R^{n}}\).
Definition 2.1
(Lipschitz differentiability)
A function \(f\) is said to be \(L_{f}\)-Lipschitz differentiable if for all x, y we have
Lemma 2.1
([17] (Descent lemma))
Let \(f :{R}^{n}\rightarrow \mathrm{{R} }\) be Fréchet differentiable such that its gradient is Lipschitz continuous with constant \(\ell > 0\). Then, for all \(x,y\in{{R}^{n}}\) and every \(z\in [x,y ] := \{ (1-t )x+ty:t\in [0,1 ] \} \), it holds that
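The displayed estimate is the classical descent inequality \(f(y) \le f(x) + \langle \nabla f(x), y-x\rangle + \frac{\ell}{2}\Vert y-x\Vert^{2}\). As a quick numerical sanity check (our own illustration, not from the paper), one can verify it on a quadratic, where the Lipschitz constant of the gradient is the spectral norm of the Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 5))
Q = Q.T @ Q + np.eye(5)        # symmetric positive definite Hessian
ell = np.linalg.norm(Q, 2)     # Lipschitz constant of grad f

f = lambda v: 0.5 * v @ Q @ v
grad = lambda v: Q @ v

violations = 0
for _ in range(200):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    bound = f(x) + grad(x) @ (y - x) + 0.5 * ell * np.sum((y - x) ** 2)
    violations += f(y) > bound + 1e-9
assert violations == 0         # the descent-lemma bound never fails
```

For a quadratic the check is exact, since \((y-x)^{T}Q(y-x) \le \ell \Vert y-x\Vert^{2}\).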
Lemma 2.2
([23])
Suppose the sequence of real numbers \(\{ a_{k} \}_{k \ge 0}\) is bounded from below, \(\{ b_{k} \}_{k \ge 0}\) is a sequence of nonnegative real numbers, and for all \(k \ge 0\),
Then the following statements are valid:
(i) The sequence \(\{ a_{k} \}_{k \ge 0}\) is monotonically decreasing and convergent.
(ii) The sequence \(\{ b_{k} \}_{k \ge 0}\) is summable, namely \(\sum_{k \ge 0} b_{k}<\infty \).
Lemma 2.3
([23])
Let \(\{a_{k} \}_{k \in N}\) and \(\{b_{k} \}_{k \in N}\) be nonnegative real sequences such that \(\sum_{k \in N} b_{k}<\infty \) and \(a_{k+1} \leq a \cdot a_{k}+b \cdot a_{k-1}+b_{k}\) for all \(k \geq 1\), where \(a \in R\), \(b \geq 0\) and \(a+b<1\). Then \(\sum_{k \in N} a_{k}<\infty \).
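A quick numerical illustration of Lemma 2.3 (our own example): with \(a=0.5\), \(b=0.3\) (so \(a+b<1\)) and the summable perturbation \(b_{k}=1/k^{2}\), the recursion produces a summable sequence \(\{a_{k}\}\):

```python
# coefficients satisfying a + b < 1, as Lemma 2.3 requires
a, b = 0.5, 0.3
bk = [1.0 / k ** 2 for k in range(1, 3001)]   # summable perturbations
ak = [1.0, 1.0]                               # a_0, a_1 >= 0
for k in range(1, 3000):
    ak.append(a * ak[k] + b * ak[k - 1] + bk[k])

partial = sum(ak)
assert ak[-1] < 1e-4      # a_k vanishes
assert partial < 20.0     # partial sums stay bounded, consistent with summability
```

Summing the recursion gives the a priori bound \((1-a-b)\sum_{k} a_{k} \le a_{0}+a_{1}+\sum_{k} b_{k}\), which the numerical partial sum respects.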
We proceed to introduce a function that exhibits the Kurdyka–Łojasiewicz property. This particular class of functions will be integral in establishing the convergence outcomes for our recommended algorithm.
Definition 2.2
([2])
Let \(\eta \in (0,+\infty ]\). We use \(\Phi _{\eta}\) to denote the set of all concave and continuous functions \(\varphi :[0, \eta ) \rightarrow [0,+\infty )\). A function φ belonging to the set \(\Phi _{\eta}\) for \(\eta \in (0,+\infty ]\) is called a desingularization function if it satisfies the following conditions:
(i) \(\varphi (0)=0\).
(ii) φ is continuously differentiable on \((0, \eta )\) and continuous at 0.
(iii) \(\varphi ^{\prime}(s)>0\) for any \(s \in (0, \eta )\).
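A standard example (not taken from the paper): \(\varphi (s)=c\,s^{1-\theta}\) with \(c>0\) and \(\theta \in (0,1)\) satisfies (i)–(iii); this is the desingularization function associated with a Łojasiewicz exponent θ. A quick numerical check of the three conditions:

```python
import numpy as np

c, theta, eta = 2.0, 0.5, 1.0
phi = lambda s: c * s ** (1.0 - theta)

assert phi(0.0) == 0.0                         # (i) phi(0) = 0
s = np.linspace(1e-6, eta - 1e-6, 2000)
dphi = (1.0 - theta) * c * s ** (-theta)       # closed-form phi'(s)
assert np.all(dphi > 0)                        # (iii) phi' > 0 on (0, eta)
assert np.all(np.diff(phi(s), 2) <= 1e-12)     # concavity: second differences <= 0
```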
Definition 2.3
([2] (Kurdyka–Łojasiewicz property))
Let \(f: {R}^{n} \rightarrow {R} \cup \{+\infty \}\) be proper and lower semicontinuous. The function f is said to have the Kurdyka–Łojasiewicz (KŁ) property at a point \(\hat{v} \in \operatorname{dom} \partial f:= \{v \in {R}^{n}: \partial f(v) \neq \emptyset \}\) if there exist \(\eta \in (0,+\infty ]\), a neighborhood V of v̂, and a function \(\varphi \in \Phi _{\eta}\) such that
for any
If f satisfies the KŁ property at each point of \(\operatorname{dom} \partial f\), then f is called a KŁ function. Next, we recall the following result, which is called the uniformized KŁ property.
Lemma 2.4
([2] (Uniformized KŁ property))
Let Ω be a compact set and \(f: {R}^{n} \rightarrow {R} \cup \{+\infty \}\) be a proper and lower semicontinuous function. Assume that f is constant on Ω and satisfies the KŁ property at each point of Ω. Then there exist \(\varepsilon >0\), \(\eta >0\), and \(\varphi \in \Phi _{\eta}\) such that
for any \(\hat{v} \in \Omega \) and every element v in the intersection
Definition 2.4
([2] (Subdifferentials))
Let \(f: {R}^{n} \rightarrow (-\infty ,+\infty ]\) be a proper and lower semicontinuous function. (i) The Fréchet subdifferential of f at \(x \in \operatorname{dom} f\), written \(\widehat{\partial} f(x)\), is the set of all vectors \(u \in {R}^{n}\) satisfying
When \(x \notin \operatorname{dom} f\), we set \(\widehat{\partial} f(x)=\emptyset \).
(ii) The limiting-subdifferential, or simply the subdifferential, of f at \(x \in {R}^{n}\), written \(\partial f(x)\), is defined through the following closure process \(\partial f(x):= \{u \in {R}^{n}: \exists x^{k} \rightarrow x, f (x^{k} ) \rightarrow f(x) \text{ and } u^{k} \in \widehat{\partial} f (x^{k} ) \rightarrow u\text{ as } k \rightarrow \infty \} \).
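A standard one-dimensional illustration (not taken from the paper): for \(f(x)=|x|\), the Fréchet and limiting subdifferentials coincide and are given by

```latex
\partial f(x)=\widehat{\partial} f(x)=
\begin{cases}
\{1\}, & x>0,\\
[-1,1], & x=0,\\
\{-1\}, & x<0,
\end{cases}
```

so that \(0 \in \partial f(0)\), consistent with the origin being the global minimizer of f.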
3 Algorithm and its convergence
In this section, we put forward an inertial proximal minimization algorithm with dual relaxation for solving the optimization problem (1.1) and subsequently examine its convergence properties.
Algorithm 3.1
Let α, β, \(\tau >0\), \(0<\theta <1\). Choose starting points \(({x^{0}},{y^{0}},{z^{0}})=(x^{{1}}, y^{{1}},{z^{{1}}})\in {R^{m}} \times {R^{q}} \times {R^{p}}\) and \({u^{\mathrm{{1}}}} \in {R^{p}}\). For all \(k \geq 1\), the sequence \(\{ (x^{k}, y^{k}, z^{k}, u^{k} ) \}_{k \geq 0}\) is generated by:
where
Remark 3.1
Inertial terms \(\tau \Vert \cdot \Vert ^{2}\) are added to the y-, z-, and x-subproblems, respectively, and there are two relaxed terms \(\beta ({z^{k + 1}}-A{x^{k + 1}} )\) and \(2\tau ({z^{k + 1}} - z_{z}^{k})\) in the dual update step (3.1d). Hence we call our algorithm the dual-relaxed inertial proximal ADMM.
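Since the displayed updates (3.1a)–(3.1d) are not reproduced above, the following Python sketch shows only the general shape of an inertial proximal ADMM for the reformulation (1.4), on a toy convex instance with \(F(z)=\frac{1}{2}\Vert z-c\Vert^{2}\), \(G(y)=\lambda \Vert y\Vert_{1}\), \(H(x,y)=\frac{1}{2}\Vert x-y\Vert^{2}\). It is our own schematic, not DR-IADM itself: the extrapolation points are assumed to be of the form \(x^{k}+\theta (x^{k}-x^{k-1})\), the dual step is the plain ascent \(u^{k+1}=u^{k}+\beta (Ax^{k+1}-z^{k+1})\) without the two relaxed terms, and the subproblem details differ from (3.1a)–(3.1d).

```python
import numpy as np

def soft(v, t):
    """Prox of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
m, p = 8, 10
A = rng.standard_normal((p, m)) / np.sqrt(p)    # tall A with full column rank
c = rng.standard_normal(p)
lam, beta, tau, theta = 0.1, 2.0, 1.0, 0.1      # mild inertia theta

x = xp = np.zeros(m); y = yp = np.zeros(m); z = zp = np.zeros(p)
u = np.zeros(p)
M = (1.0 + tau) * np.eye(m) + beta * A.T @ A    # x-subproblem matrix

for _ in range(5000):
    # extrapolation points (assumed form of the inertial step)
    xb, yb, zb = x + theta * (x - xp), y + theta * (y - yp), z + theta * (z - zp)
    xp, yp, zp = x, y, z
    # y-step: min_y lam||y||_1 + 0.5||x - y||^2 + (tau/2)||y - yb||^2
    y = soft((x + tau * yb) / (1.0 + tau), lam / (1.0 + tau))
    # z-step: min_z 0.5||z - c||^2 - <u, z> + (beta/2)||Ax - z||^2 + (tau/2)||z - zb||^2
    z = (c + u + beta * A @ x + tau * zb) / (1.0 + beta + tau)
    # x-step: min_x 0.5||x - y||^2 + <u, Ax> + (beta/2)||Ax - z||^2 + (tau/2)||x - xb||^2
    x = np.linalg.solve(M, y - A.T @ u + beta * A.T @ z + tau * xb)
    # plain dual ascent (DR-IADM additionally relaxes this step)
    u = u + beta * (A @ x - z)

assert np.linalg.norm(A @ x - z) < 1e-2   # feasibility residual vanishes
```

Each subproblem here has a closed-form or linear-system solution because the toy instance is quadratic/ℓ1; in general the y-step requires the proximal map of G.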
We will analyze the convergence of Algorithm 3.1 under the following assumptions:
Assumption A
(i) The function F is Lipschitz differentiable, i.e.,
(ii) The function \(L_{\beta}\) is bounded from below and there exists a constant \(\underline{L}\) such that
(iii) For any fixed \(y \in {R^{q}}\), there exists \({\ell _{1}}(y) \ge 0\) such that
Furthermore, there exists \({\ell _{1, + }} > 0\) such that \(\mathop {\sup } _{y \in {R^{q}}} {\ell _{1}}(y) \le {\ell _{1, + }}\).
(iv) The parameters satisfy
(v) Let \(T:={R^{m}} \times {R^{q}} \times {R^{p}} \times {R^{p}}\). The set \(\{\omega \in T: L_{\beta}(\omega ) \leq L_{\beta}(\omega ^{1} )\}\) is bounded.
Lemma 3.1
By the definitions of \(z_{x}^{k}\), \(z_{y}^{k}\), \(z_{z}^{k}\), it holds that
Proof
By the definition of \(\Vert x^{k}-z_{x}^{k} \Vert ^{2}\), we have
Similarly, we get
The proof is completed. □
The following Lemmas 3.2 and 3.3 provide the descent properties of the key function defined in (3.11) and are important for the convergence.
Lemma 3.2
Suppose that Assumption A holds and let \({L_{\beta }}\) be defined as in (1.5). Then,
where
Proof
From (3.1a), (3.1b), and (3.1c), we have
and
respectively. Adding (3.3), (3.4) and (3.5) yields
By the definition of \(L_{\beta}\), we have
Then,
The optimality condition for (3.1b) implies
Combining (3.7) and (3.1d), we obtain
Hence,
Inserting the u-updating rule (3.1d), we get
From (3.9), (3.10), and (3.6), we have
which can be written as
where
The proof is completed. □
Remark 3.2
Obviously, Assumption A(iv) implies \(C_{1}>0\), \(C_{2}>0\), and \(C_{3}>0\).
Based on Lemma 3.2, we define the following key function (the regularized augmented Lagrangian function)
where \(\eta _{1}=3\beta (1 + \tau ){ \Vert A \Vert ^{2}} + \theta \tau \), \(\eta _{2}= \theta \tau \), and \(\eta _{3}=\frac{{2\tau {\theta ^{2}}}}{\beta } + \theta \tau \).
Let \(\hat{\omega}= ( {x,y,z,u,x',y',z'} )\), \({{\hat{\omega}}^{k}} = ( {{x^{k}},{y^{k}},{z^{k}},{u^{k}},{x^{k - 1}},{y^{k - 1}},{z^{k - 1}}} )\), and \({\omega ^{k}} = ( {{x^{k}},{y^{k}},{z^{k}},{u^{k}}} )\). The following lemma shows that the sequence \({ \{ {{{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} )} \}_{k \ge 1}}\) is decreasing, which is of great importance for the subsequent convergence analysis.
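Using the explicit formulas above, the three weights \(\eta_{1}, \eta_{2}, \eta_{3}\) are positive for any \(\beta, \tau, \theta > 0\); a one-line check with hypothetical parameter values (chosen only for illustration):

```python
beta, tau, theta, norm_A = 2.0, 1.0, 0.5, 1.5   # hypothetical values
eta1 = 3 * beta * (1 + tau) * norm_A ** 2 + theta * tau
eta2 = theta * tau
eta3 = 2 * tau * theta ** 2 / beta + theta * tau
assert min(eta1, eta2, eta3) > 0   # the regularization weights are all positive
```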
Lemma 3.3
(Descent property)
Suppose that Assumption A holds and let \({{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} )\) be defined as in (3.11). Then, with the constants \(C_{1},C_{2},C_{3} > 0\) from Lemma 3.2, we have
Proof
The result follows directly from Lemma 3.2 and Remark 3.2. The proof is completed. □
Theorem 3.1
(Boundedness)
Suppose that Assumption A holds and let \(\{ \omega ^{k} \}_{k \ge 0}\) be the sequence generated by Algorithm 3.1. Then the following statements are true:
(i) The sequence \(\{ {\hat{L}}_{\beta }({\hat{\omega}}^{k}) \}_{k \ge 1}\) is bounded from below and convergent.
(ii) One has
(iii) The sequence \({ \{ {{{ L}_{\beta }}({{ \omega }^{k}})} \}_{k \ge 1}}\) is convergent.
(iv) The sequence \({ \{ { ( {{x^{k}},{y^{k}},{z^{k}},{u^{k}}} )} \}_{k \ge 0}}\) is bounded.
Proof
For \(\eta _{1}>0\), \(\eta _{2}>0\), \(\eta _{3}>0\), one can obtain
that is,
From Assumption A(ii), we know that \(\hat{L}_{\beta} (\hat{\omega}^{k} )\ge \underline{L}\), which implies that the sequence \(\{ {\hat{L}}_{\beta }({\hat{\omega}}^{k}) \}_{k \ge 1}\) is bounded from below. Combining (3.12) and Lemma 2.2, it is easy to see that the sequence \({ \{ {{{\hat{L}}_{\beta }}({{\hat{\omega}}^{k}})} \}_{k \ge 1}}\) is convergent and also that
Then, according to (3.9), it follows that \({u^{k + 1}} - {u^{k}} \to 0\) as \(k \to \infty \). By the definition of \({ \{ {{{\hat{L}}_{\beta }}({{\hat{\omega}}^{k}})} \}_{k \ge 1}}\), we obtain that \(\{L_{\beta} (\omega ^{k} ) \}\) is convergent. From (3.12), we have \({{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} ) \le {{ \hat{L}}_{\beta }} ( {{{\hat{\omega}}^{1}}} )\) for all \(k>0\). In addition, \({{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{1}}} )={{ L}_{ \beta }} ( {{{ \omega }^{1}}} )\) since \(x^{0}=x^{1} \), \(y^{0}=y^{1}\), and \(z^{0}=z^{1}\). So, from (3.13), we get
Therefore, it follows that the sequence \({ \{ { ( {{x^{k}},{y^{k}},{z^{k}},{u^{k}}} )} \}_{k \ge 0}}\) generated by Algorithm 3.1 is bounded by Assumption A(v). The proof is completed. □
The next lemma provides upper estimates for the limiting subgradients of \(\hat{L}_{\beta} ({{{\hat{\omega}}^{k}}})\).
Lemma 3.4
Suppose that Assumption A holds. Denote \({\nu ^{k}} = ( {{x^{k}},{y^{k}},{z^{k}}} )\). Then there exists \(\zeta >0\) such that
Proof
Let \(k \ge 1\) be fixed. Applying the calculus rules of the limiting subdifferential, we get
By the optimality condition for (3.1c), we have
Substituting it into (3.15a) leads to
By the optimality condition for (3.1a), we have
Substituting it into (3.15b) leads to
Substituting (3.7) into (3.15c) leads to
Let \(D^{k}= ( {d_{x}^{k+1},d_{y}^{k+1},d_{z}^{k+1},d_{u}^{k+1},d_{x'}^{k+1},d_{y'}^{k+1},d_{z'}^{k+1}} )\), where
Then it follows that \(D^{k+1} \in \partial {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k+1}}} )\) and \(( {d_{x}^{k+1},d_{y}^{k+1},d_{z}^{k+1},d_{u}^{k+1}} ) \in \partial {{ L}_{\beta }} ( {{\omega ^{{k+1}}}} )\).
Thus \(\operatorname{dist}^{2} ( {0,\partial {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k+1}}} )} ) \le \Vert {{D^{{k+1}}}} \Vert ^{2}\). By Assumption A(iii), we have
Then, there exists \({\zeta _{1}}>0\) such that
Thus, by (3.9), there exists \(\zeta >0\) such that
For \({\nu ^{k}} = ( {{x^{k}},{y^{k}},{z^{k}}} )\), it follows that
Combining with (3.16), the latter gives
The proof is completed. □
Now we prove that any cluster point of \(\{ (x^{k}, y^{k}, z^{k}, u^{k} ) \}_{k \ge 0}\) is a KKT point of the optimization problem (1.1). Let Ω and Ω̂ denote the cluster point set of the sequences \(\{\omega ^{k} \}\) and \(\{\hat{\omega}^{k} \}\), respectively.
Theorem 3.2
(Subsequence convergence)
Suppose that Assumption A holds. Then we have that
(i) Ω̂ is nonempty, compact, and connected.
(ii) \({\operatorname{dist}} ( { {{{\hat{\omega}}^{k}}} ,\hat{\Omega}} ) \to 0\) as \(k \rightarrow \infty \).
(iii) If \({ \{ { ( {{x^{{k_{j}}}} ,{y^{{k_{j}}}} ,{z^{{k_{j}}}} ,{u^{{k_{j}}}}} )} \}_{j \ge 0}}\) is a subsequence of \({ \{ { ( {{x^{k}} ,{y^{k}} ,{z^{k}} ,{u^{k}}} )} \}_{k \ge 0}}\) that converges to \((x^{*} , y^{*} , z^{*} , u^{*})\) as \(j \rightarrow +\infty \) and \(\hat{\omega}\in \hat{\Omega}\), then
(iv) \(\hat{\Omega}\subset \operatorname{crit} {{\hat{L}}_{\beta }}\).
(v) The function \(\hat{L}_{\beta}\) takes on Ω̂ the value
Proof
By the definition of Ω and Ω̂, (i) and (ii) are trivial.
(iii) Let \(\{\omega ^{k_{j}} \}\) be a subsequence of \(\{\omega ^{k} \}\) such that \(\omega ^{k_{j}} \rightarrow \omega ^{*}\), \(j \rightarrow \infty \). Since \(L_{\beta}(\cdot )\) is lower semicontinuous, we have
On the other hand, the definition of \(x^{k+1}\) shows that
from which we get
Replacing \({x^{k}}\), \({y^{k}}\), \({z^{k+1}}\), \({u^{k}}\) by \({{x^{{k_{j}}}},{y^{{k_{j}}}},{z^{{k_{j}} + 1}},{u^{{k_{j}}}}}\), we get
Combining with Theorem 3.1(ii), it follows that
and then we have
which implies that
From \({z^{{k} + 1}} - {z^{{k}}} \to 0\) as \(k \to \infty \), it is easy to get
Then, we have
Therefore, from (3.18) and (3.19), it follows that
By the definition of \({ {{\hat{L}}_{\beta }} ( {{{{\hat{\omega}}^{k}}}} )}\), since \(\Vert \omega ^{k}-\omega ^{k-1} \Vert \to 0 \) as \(k \to \infty \) and the sequence \({ \{ {{{\hat{L}}_{\beta }}({{\hat{\omega}}^{k}})} \}_{k \ge 1}}\) is convergent, we have
(iv) For the sequence \(D^{k}\) defined in Lemma 3.4, for any \(j \ge 1\), we have \({D^{k_{j}}} \in \partial {{\hat{L}}_{\beta }} ( {\hat{\omega}^{k_{j}}} )\). Then it also holds that
and thus
The closedness criterion of the limiting subdifferential guarantees that \(0 \in \partial {{\hat{L}}_{\beta }} ( {\hat{\omega}^{*}} )\), or, in other words, \({\hat{\omega}^{*}} \in \operatorname{crit}({\hat{L}}_{\beta })\).
(v) Due to Theorem 3.1(ii) and the fact that \(\{u^{k} \}_{k \geq 0}\) is bounded, the sequences \({ \{ {{{\hat{L}}_{\beta }}({{\hat{\omega}}^{k}})} \}_{k \ge 0}}\) and \(\{F (z^{k} )+G (y^{k} )+H (x^{k}, y^{k} ) \}_{k \geq 0}\) have the same limit:
The conclusion follows by taking into consideration the statements (iii) and (iv). The proof is completed. □
Theorem 3.3
(Strong convergence)
Let \({\nu ^{k}} = ( {{x^{k}},{y^{k}},{z^{k}}} )\). Assume that \(\hat{L}_{\beta}\) is a KŁ function and Assumption A is satisfied. Then we have
(i) The sequence \(\{ {{\omega ^{k}}} \}\) has finite length, namely, \(\sum_{k=1}^{\infty} \Vert \omega ^{k+1}-\omega ^{k} \Vert < \infty \).
(ii) The sequence \(\{\omega ^{k} \}\) converges to a critical point \(\omega ^{*}\) of \(L_{\beta}\).
Proof
(i) From the proof of Theorem 3.2, it follows that \(\lim _{k \to + \infty } \hat{L}_{\beta} ( \hat{\omega}^{k} )=\hat{L}_{\beta} (\hat{\omega}^{*} )\). We consider two cases.
Case 1. There exists an integer \(k_{0}>0\) such that \(\hat{L}_{\beta} (\hat{\omega}^{k_{0}} )=\hat{L}_{\beta} (\hat{\omega}^{*} )\).
Since \(\{\hat{L}_{\beta} (\hat{\omega}^{k} ) \}\) is decreasing, we know that for all \(k>k_{0}\),
which implies that \(x^{k+1}=x^{k}\), \(y^{k+1}=y^{k}\), \(z^{k+1}=z^{k}\) for all \(k>k_{0}\). Then, from (3.6), we get \({{u^{k + 1}} = {u^{k}}}\) for all \(k>k_{0}+1\). Thus \(\omega ^{k+1}=\omega ^{k}\) for all sufficiently large k, and the result follows.
Case 2. One has \(\hat{L}_{\beta} (\hat{\omega}^{k} )>\hat{L}_{\beta} ( \hat{\omega}^{*} )\) for \(\forall k>0\).
Since \(\operatorname{dist} (\hat{\omega}^{k}, {\hat{\Omega}} ) \rightarrow 0\), for every \({\varepsilon _{1}}>0\) there exists \(k_{1}>0\) such that \(\operatorname{dist} (\hat{\omega}^{k}, {\hat{\Omega}} )<{\varepsilon _{1}}\) for all \(k>k_{1}\). Due to \(\lim _{k \to + \infty } \hat{L}_{\beta} ( \hat{\omega}^{k} )=\hat{L}_{\beta} (\hat{\omega}^{*} )\), for every \({\varepsilon _{2}}>0\) there exists \(k_{2}>0\) such that \(\hat{L}_{\beta} (\hat{\omega}^{k} )<\hat{L}_{\beta} ( \hat{\omega}^{*} )+{\varepsilon _{2}}\) for all \(k>k_{2}\). Therefore, for every \({\varepsilon _{1}}, {\varepsilon _{2}}>0\), when \(k>\widetilde{k}=\max \{k_{1}, k_{2} \}\), we have \(\operatorname{dist} (\hat{\omega}^{k}, {\hat{\Omega}} )<{ \varepsilon _{1}}\) and \(\hat{L}_{\beta} (\hat{\omega}^{*} )< \hat{L}_{\beta} (\hat{\omega}^{k} )<\hat{L}_{\beta} ( \hat{\omega}^{*} )+{\varepsilon _{2}}\). Since \(\{\omega ^{k} \}\) is bounded, by Theorem 3.2, we know that Ω̂ is a nonempty compact set and \(\hat{L}_{\beta}(\cdot )\) is constant on Ω̂. Applying Lemma 2.4, we deduce that, for all \(k>\tilde{k}\),
Since \(\varphi ^{\prime} (\hat{L}_{\beta} (\hat{\omega}^{k} )-\hat{L}_{\beta} (\hat{\omega}^{*} ) )>0\), then
Making use of the concavity of φ, we get that
Combining with the KŁ property, it follows that
By Lemma 3.4, we get
From Lemma 3.2, we have
where \(\eta =\min \{\eta _{1},\eta _{2},\eta _{3}\}\). Putting (3.21) and (3.22) into (3.20), we obtain
Set \(b_{k}=\frac{{\sqrt {\zeta}}}{\eta} (\varphi (\hat{L}_{\beta} (\hat{\omega}^{k} )-\hat{L}_{\beta} (\hat{\omega}^{*} ) )-\varphi (\hat{L}_{\beta} (\hat{\omega}^{k+1} )-\hat{L}_{\beta} (\hat{\omega}^{*} ) ) ) \geq 0\), \(a_{k}={ \Vert {{\nu ^{k}} - {\nu ^{k - 1}}} \Vert } \geq 0\). Then (3.23) can be equivalently rewritten as
Since \(\varphi \geq 0\), we know that
hence \(\sum_{k = 1}^{\infty }{{b_{k}}} < \infty \). Note that from (3.24) we have
So Lemma 2.3 gives that \(\sum_{k=1}^{\infty} a_{k}<\infty \). Then,
Combining it with (3.8), we get
(ii) Statement (i) indicates that \(\{\omega ^{k} \}\) is a Cauchy sequence, hence convergent. Let \(\omega ^{k} \rightarrow \omega ^{*}\) as \(k \rightarrow \infty \). According to Theorem 3.2(iv), it is clear that \(\hat{\omega}^{*} \in \hat{\Omega}\subset \operatorname{crit} {{\hat{L}}_{ \beta }}\), so \(\hat{\omega}^{*}\) is a critical point of \(\hat{L}_{\beta}\). Therefore, by the definition of \(\hat{L}_{\beta}\), \(\{\omega ^{k} \}\) converges to a critical point \(\omega ^{*}\) of \(L_{\beta}\). The proof is completed. □
4 Numerical experiments
In this section, we present two computational examples to contrast the efficacy of our methodology with the PMA technique detailed in [5]. The computational trials are executed in 64-bit MATLAB R2019b on a 64-bit computer equipped with an Intel(R) Core(TM) i7-6700HQ CPU operating at 2.6 GHz and 32 GB of RAM.
Example 4.1
We consider the following optimization problem:
which can be written as
Select random matrices \(A = {({a_{ij}})_{p \times m}}\) and \(B = {({b_{ij}})_{q \times m}}\), where \({a_{ij}},{b_{ij}} \in (0,1)\), and let m, p, q be three positive integers with \(m=q\). Take the initial points \(x^{0}=x^{-1}=\operatorname{zeros}(m,1)\), \(y^{0}=y^{-1}=\operatorname{zeros}(q,1)\), \(z^{0}=z^{-1}=\operatorname{zeros}(p,1)\), \(u^{0}=\operatorname{zeros}(p,1)\) for Algorithm 3.1. The parameters are set as \(l_{F}=1\), \(\tau =10\), \(\beta = 67\), \(\alpha = 6.6\times 10^{7}\), \(c_{1}=c_{2}=1\). The initial points for PMA in [5] are set to the same \(x^{0}\), \(y^{0}\), \(z^{0}\), \(u^{0}\), and the parameter is taken as \(\sigma =0.1\). Define \({ \Vert {Ax - z} \Vert ^{2}}\) as the error, and use \({ \Vert {Ax - z} \Vert ^{2}} < {10^{ - 4}}\) as the stopping criterion. The results are presented in Table 1 and, to provide a clear evaluation of the algorithm’s performance, we also depict the error curves; the respective outcomes are illustrated in Figs. 1 and 2. In the table, k denotes the number of iterations and s the computing time.
Considering Table 1 and Fig. 1, we observe that the inclusion of an inertial factor positively impacts the convergence of Algorithm 3.1. Furthermore, a comparison between Table 1 and Fig. 2 suggests that our algorithm requires fewer iterations and achieves convergence at a faster rate compared to the PMA. In summary, empirical evidence indicates that our algorithm, which incorporates an inertial approach, outperforms the PMA as reported in [5].
Example 4.2
In the second example, we consider the SCAD-\(l_{2}\) model, which takes the form
where \(A \in {R^{m \times n}}\), \(c \in {R^{m}}\), and \({{f_{k}}( \vert {{z_{i}}} \vert )}\) is defined by
with \(a>2\) and \(k>0\) being the knots of a quadratic spline function. We select random \(m \times n\) matrices \(A,D \sim N(0,1)\), and all columns are normalized. We select random sparse vectors in \(R^{m}\) with the density 0.01 as \(x^{*}\), \(y^{*}\) and the vector \(c=Mx^{*}-My^{*}+Q\) with the noise vector \(Q \sim N(0,10^{-3}I)\). For the sole purpose of showing the numerical efficiency, we fix the parameters \(k=3\), \(a=4\) as constants for \({{f_{k}}( \vert {{z_{i}}} \vert )}\). In addition, we set \(l_{F}=3\), \(\tau =10\), \(\beta =36.43 \), \(\alpha =1.99\times 10^{4}\) in Algorithm 3.1, and select \(\sigma =9.17\times 10^{-4}\), \(\beta =1.67\times 10^{3}\), \(\tau =4.33 \times 10^{4}\), \(\mu =276\) in PMA [5]. The initial points are selected as \(x^{0}=x^{-1}=\operatorname{zeros}(m,1)\), \(y^{0}=y^{-1}=\operatorname{zeros}(q,1)\), \(z^{0}=z^{-1}= \operatorname{rand}(p,1)\), \(u=\operatorname{ones}(p,1)\) in Algorithm 3.1, and \(x^{0}=\operatorname{zeros}(m,1)\), \(y^{0}=\operatorname{zeros}(q,1)\), \(z^{0}= \operatorname{rand}(p,1)\), \(u=\operatorname{ones}(p,1)\) in PMA [5]. The stopping criterion is taken as \(\mathrm{Error}={ \Vert {Ax - z} \Vert ^{2}} < {10^{ - 4}}\).
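The piecewise definition of \({{f_{k}}( \vert {{z_{i}}} \vert )}\) displayed above is not reproduced here; assuming it is the classical SCAD penalty of Fan and Li with knots k and ak, it reads \(f_{k}(t)=kt\) for \(t \le k\), \(f_{k}(t)=\frac{2akt-t^{2}-k^{2}}{2(a-1)}\) for \(k < t \le ak\), and \(f_{k}(t)=\frac{(a+1)k^{2}}{2}\) for \(t > ak\). A small Python implementation of this assumed form, checked at the experiment's values \(k=3\), \(a=4\):

```python
import numpy as np

def scad(z, k=3.0, a=4.0):
    """SCAD penalty in its classical (Fan-Li) form with knots k and a*k;
    this is an assumed reconstruction of the displayed definition."""
    z = np.abs(np.asarray(z, dtype=float))
    small = z <= k
    mid = (z > k) & (z <= a * k)
    return np.where(small, k * z,
           np.where(mid, (2 * a * k * z - z ** 2 - k ** 2) / (2 * (a - 1)),
                    (a + 1) * k ** 2 / 2))

# continuity at the knots and the flat tail (k=3, a=4)
assert np.isclose(scad(3.0), 9.0)      # linear branch: k*|z| at |z| = k
assert np.isclose(scad(12.0), 22.5)    # quadratic branch meets (a+1)k^2/2 at |z| = a*k
assert np.isclose(scad(100.0), 22.5)   # constant beyond a*k
```

The penalty is linear near the origin, quadratic in between, and constant for large arguments, which is what makes it a nonconvex, bias-reducing alternative to the \(l_{1}\) penalty.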
Figures 3 and 4 show the results of evolution of the Error with respect to iterations when we run Algorithm 3.1 and PMA in [5]. Figure 4 shows that the Error of Algorithm 3.1 decreases faster than that of PMA. One can see that for larger values of θ, Algorithm 3.1 has a smaller error value in Table 2 and Fig. 3.
5 Conclusion
This paper presents a dual-relaxed inertial proximal minimization algorithm designed for addressing a specific category of structured nonconvex and nonsmooth optimization problems. The objective function in these problems is characterized by being the sum of a composite function, a nonsmooth function, and a mixed function. The algorithm introduced herein features an update mechanism for each subproblem that incorporates inertial effects and employs two relaxed terms during the dual update phase. Additionally, the parameters within our algorithm are determined using a straightforward approach. Computational experiments demonstrate that our algorithm is both practical and effective.
Data Availability
No datasets were generated or analysed during the current study.
References
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)
Auslender, A.: Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables. Numer. Math. 18(3), 213–223 (1971)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Bot, R.I., Csetnek, E.R., Nguyen, D.K.: A proximal minimization algorithm for structured nonconvex and nonsmooth problems. SIAM J. Optim. 29(2), 1300–1328 (2019)
Bot, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45(2), 682–712 (2020)
Chao, M.T., Zhang, Y., Jian, J.B.: An inertial proximal alternating direction method of multipliers for nonconvex optimization. Int. J. Comput. Math. 98(6), 1199–1217 (2021)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Driggs, D., Tang, J., Liang, J.: SPRING: a fast stochastic proximal alternating method for non-smooth non-convex optimization (2020). arXiv:2002.12266
Glowinski, R., Marrocco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. ESAIM: Math. Model. Numer. Anal. 9(2), 41–76 (1975)
Hien, L.T.K., Phan, D.N., Gillis, N.: Inertial alternating direction method of multipliers for non-convex non-smooth optimization. Comput. Optim. Appl. 83, 247–285 (2022)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Liu, Q., Shen, X., Gu, Y.: Linearized ADMM for non-convex non-smooth optimization with convergence analysis. IEEE Access 7, 76131–76144 (2019)
Miao, L., Tang, Y., Wang, C.: A parameterized three-operator splitting algorithm for non-convex minimization problems with applications. J. Nonlinear Var. Anal. 8, 451–471 (2024)
Ochs, P., Brox, T., Pock, T.: iPiasco: inertial proximal algorithm for strongly convex optimization. J. Math. Imaging Vis. 53(2), 171–181 (2015)
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, San Diego (1970)
Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D, Nonlinear Phenom. 60(1–4), 259–268 (1992)
Shi, S., Fu, Z., Wu, Q.: On the average operators, oscillatory integrals, singular integrals and their applications. J. Appl. Anal. Comput. 14, 334–378 (2024)
Shi, S., Zhang, L.: Dual characterization of fractional capacity via solution of fractional p-Laplace equation. Math. Nachr. 293, 2233–2247 (2020)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
Wang, X.Q., Shao, H., Liu, P.J., Wu, T.: An inertial proximal partially symmetric ADMM-based algorithm for linearly constrained multi-block nonconvex optimization problems with applications. J. Comput. Appl. Math. 420 (2022)
Yang, J., Yuan, X.: Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math. Comput. 82(281), 301–329 (2013)
Yu, Y., Yin, T.C.: Strong convergence theorems for a nonmonotone equilibrium problem and a quasi-variational inclusion problem. J. Nonlinear Convex Anal. 25, 503–512 (2024)
Funding
The National Natural Science Foundation of China (72071130; 71901145); National Key R and D Program Key Special Projects (2023YFC3306103, 2023YFC3306105); Shanghai Philosophy and Social Sciences Research Project (2023EFX011); Youth Fund for Humanities and Social Sciences of the Ministry of Education (20YJC820030); The Major Project of Chinese Society of Criminology (FZXXH2022A02).
Author information
Contributions
Yang Liu wrote the main manuscript text and Yazheng Dang constructed the algorithm and proved the convergence. All authors reviewed the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Liu, Y., Wang, L. & Dang, Y. A modified inertial proximal alternating direction method of multipliers with dual-relaxed term for structured nonconvex and nonsmooth problem. J Inequal Appl 2024, 117 (2024). https://doi.org/10.1186/s13660-024-03197-z