# A new criterion for an inexact parallel splitting augmented Lagrangian method

## Abstract

In this paper, we study a computational method for solving variational inequality problems with a separable structure and linear constraints. We propose a new relaxed inexact criterion and a prediction-correction approach for the inexact parallel splitting augmented Lagrangian method, which make the resulting subproblems easier to solve. Under a mild condition, we prove global convergence and establish a worst-case convergence rate for the new inexact algorithm. Numerical experiments show the effectiveness and feasibility of the new inexact method.

## Introduction

In this paper, we consider the following variational inequality problem (VI) with the separable structure:

\begin{aligned} \bigl(v-v^{*}\bigr)^{T}Q\bigl(v^{*} \bigr)\geq0,\quad \forall v\in\Omega, \end{aligned}
(1)

with

\begin{aligned} &v=\left ( \begin{array}{@{}c@{}} x\\ y \end{array} \right ),\qquad Q(v)=\left ( \begin{array}{@{}c@{}} f(x)\\ g(y) \end{array} \right ), \quad\mbox{and}\\ &\Omega:=\bigl\{ v=(x,y) \mid Ax+By=b,x\in\mathcal{X},y\in\mathcal{Y}\bigr\} , \end{aligned}
(2)

where $$\mathcal{X}\subseteq \mathbb {R}^{n_{1}}$$ and $$\mathcal {Y}\subseteq \mathbb {R}^{n_{2}}$$ are nonempty, closed, and convex sets; $$A\in \mathbb {R}^{m\times n_{1}}$$ and $$B\in \mathbb {R}^{m\times n_{2}}$$ are given matrices; $$f : \mathcal{X} \rightarrow \mathbb {R}^{n_{1}}$$ and $$g: \mathcal{Y}\rightarrow \mathbb {R}^{n_{2}}$$ are given monotone mappings; $$b\in \mathbb {R}^{m}$$ is a given vector and $$n_{1}+n_{2}=n$$.
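As an illustration of how (1)-(2) typically arises (a standard special case, not taken from this paper), assume that $$\theta_{1}$$ and $$\theta_{2}$$ are differentiable convex functions; these names are hypothetical and are not used elsewhere in the text. The first-order optimality condition of the separable convex program

$$\min \bigl\{ \theta_{1}(x)+\theta_{2}(y) \mid Ax+By=b, x\in\mathcal{X}, y\in\mathcal{Y} \bigr\}$$

is then exactly the VI (1)-(2) with $$f=\nabla\theta_{1}$$ and $$g=\nabla\theta_{2}$$, and both mappings are monotone because the gradient of a differentiable convex function is monotone.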

Variational inequality problems with separable structures and linear constraints of the form (1)-(2) arise in a wide range of applications; see . For solving the VI, Glowinski and Marrocco  first proposed a Douglas-Rachford alternating direction method of multipliers (ADMM), which decomposes the original problem into smaller subproblems. The ADMM and its variants have been shown to be efficient for many problems. However, the ADMM may fail in practice, since the subproblems are often very difficult to solve exactly. Several strategies have therefore been proposed to overcome this drawback, such as adding a proximal regularization term and transforming the nonlinear equation. For more details, one can refer to [9, 10].

In , He proposed a parallel splitting augmented Lagrangian method (PSALM) for solving the VI (1)-(2). Each iteration of the PSALM solves the following subproblems:

\begin{aligned}& \widetilde{x}\in\mathcal{X},\quad \bigl\langle x-\widetilde{x},f( \widetilde {x})-A^{T}\bigl[\lambda^{k}-H\bigl(A \widetilde{x}+By^{k}-b\bigr)\bigr]\bigr\rangle \geq0, \quad\forall x\in \mathcal{X}, \end{aligned}
(3)
\begin{aligned}& \widetilde{y}\in\mathcal{Y},\quad \bigl\langle y-\widetilde{y},g( \widetilde {y})-B^{T}\bigl[\lambda^{k}-H\bigl(Ax^{k}+B \widetilde{y}-b\bigr)\bigr]\bigr\rangle \geq0, \quad\forall y\in\mathcal{Y}, \end{aligned}
(4)

where $$\lambda^{k}\in \mathbb {R}^{m}$$ is the Lagrange multiplier associated with the constraint in (2) and $$H\in \mathbb {R}^{m\times m}$$ is a positive definite matrix which plays the role of the penalty parameter for the violation of the linear constraint in (2). The PSALM differs from other splitting methods in that the subproblems (3) and (4) can be solved in parallel. Parallel computation is easy to implement and efficient for large-scale problems. However, the subproblems (3) and (4) are still very difficult to solve unless $$f(x)$$ and $$g(y)$$ have very particular structures. In , an inexact parallel splitting augmented Lagrangian method (IPSALM) was proposed, which solves the subproblems (3) and (4) approximately, so that the approximate solutions satisfy a certain inexact criterion and are available in closed form. The prediction step is generated, for $$\nu>0$$, by

\begin{aligned}& \hat{x}^{k}=P_{\mathcal{X}} \biggl\{ x^{k}- \frac {1}{r_{k}}\bigl[f\bigl(x^{k}\bigr)-A^{T}\bigl( \lambda^{k}-H\bigl(Ax^{k}+By^{k}-b\bigr)\bigr) \bigr] \biggr\} , \end{aligned}
(5)
\begin{aligned}& \hat{y}^{k}=P_{\mathcal{Y}} \biggl\{ y^{k}- \frac{1}{s_{k}}\bigl[g\bigl(y^{k}\bigr)-B^{T}\bigl( \lambda^{k}-H\bigl(Ax^{k}+By^{k}-b\bigr)\bigr) \bigr] \biggr\} , \end{aligned}
(6)

where $$r_{k}$$, $$s_{k}$$ are chosen to satisfy the following conditions:

\begin{aligned}& \bigl\| \xi_{x}^{k}+A^{T}HA \bigl(x^{k}-\hat{x}^{k}\bigr)\bigr\| \leq\nu r_{k} \bigl\| x^{k}-\hat {x}^{k}\bigr\| \quad\mbox{with } \xi_{x}^{k}:=f \bigl(x^{k}\bigr)-f\bigl(\hat{x}^{k}\bigr), \end{aligned}
(7)
\begin{aligned}& \bigl\| \xi_{y}^{k}+B^{T}HB \bigl(y^{k}-\hat{y}^{k}\bigr)\bigr\| \leq\nu s_{k} \bigl\| y^{k}-\hat {y}^{k}\bigr\| \quad\mbox{with } \xi_{y}^{k}:=g \bigl(y^{k}\bigr)-g\bigl(\hat{y}^{k}\bigr). \end{aligned}
(8)

In , Zhang et al. proposed another inexact criterion for generating the prediction step, that is,

\begin{aligned}& \bigl\langle x^{k}-\hat{x}^{k}, \xi_{x}^{k}\bigr\rangle + \bigl\| A\bigl(x^{k}- \hat{x}^{k}\bigr)\bigr\| _{H}^{2} \leq\nu r_{k}\bigl\| x^{k}-\hat{x}^{k}\bigr\| ^{2}, \end{aligned}
(9)
\begin{aligned}& \bigl\langle y^{k}-\hat{y}^{k}, \xi_{y}^{k}\bigr\rangle +\bigl\| B\bigl(y^{k}- \hat{y}^{k}\bigr)\bigr\| _{H}^{2} \leq\nu s_{k}\bigl\| y^{k}-\hat{y}^{k}\bigr\| ^{2}. \end{aligned}
(10)

These inexact methods have the common feature that the subproblems or relevant problems are solved approximately at each iteration. Therefore, the effectiveness of the inexact methods depends greatly on the involved inexact criteria used to solve the subproblem.

Motivated and inspired by the inexact criteria in [12, 13], in this paper, we present a new inexact criterion (see (17) and (19)) to solve the subproblems under a very relaxed restriction. The new criterion improves the upper bound of $$\langle x^{k}-\hat{x}^{k},\xi_{x}^{k}\rangle+ \|A(x^{k}-\hat {x}^{k})\|_{H}^{2}$$ and $$\langle y^{k}-\hat{y}^{k},\xi_{y}^{k}\rangle +\|B(y^{k}-\hat{y}^{k})\|_{H}^{2}$$. Thus, it reduces the computational load of the method considerably. Simultaneously, we also propose a prediction-correction approach in our algorithm analogous to . Numerical applications to the multiple-sets split feasibility problem (MSFP) and traffic equilibrium demonstrate that the proposed algorithm with the new criterion is very effective and feasible.

The rest of the paper is organized as follows. In Section 2, we summarize some concepts and properties, which are useful for further convergence analysis. In Sections 3 and 4, we present the inexact PSALM algorithm with the new criteria, and its global convergence and worst-case convergence rate, respectively. Preliminary numerical results of solving the multiple-sets split feasibility problem and the traffic equilibrium problem are presented in Section 5. Finally, we present a summary for our paper in Section 6.

## Preliminaries

In this section, we summarize some basic properties and concepts, which will be used in the convergence analysis below. Let G be a positive definite matrix. The G-norm of $$v\in \mathbb {R}^{n}$$ is defined by $$\|v\|_{G}:=\sqrt{v^{T}Gv}$$. In particular, $$\|v\|:=\sqrt{v^{T}v}$$ is the Euclidean norm of $$v\in \mathbb {R}^{n}$$, and $$\langle\cdot,\cdot\rangle$$ denotes the Euclidean inner product.

The following results are the well-known properties of the projection operator which will be used in the following analysis.

### Lemma 2.1

Let $$\Omega\subset \mathbb {R}^{n}$$ be a nonempty, closed, and convex set, and let $$P_{\Omega}[\cdot]$$ be the projection operator onto the set Ω under the Euclidean norm. Then, for any $$u,v\in \mathbb {R}^{n}$$ and $$w\in\Omega$$, we have

(1) $$\langle u -P_{\Omega}[u] ,P_{\Omega}[u]-w \rangle\geq0$$;

(2) $$\|P_{\Omega}[u]-P_{\Omega}[v]\|^{2}\leq\langle u-v,P_{\Omega }[u]-P_{\Omega}[v]\rangle$$;

(3) $$\| P_{\Omega}[u]-w\|^{2}\leq\|u-w\|^{2}-\| u-P_{\Omega}[u]\|^{2}$$.
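The three projection properties of Lemma 2.1 can be checked numerically. The following is a minimal sketch, assuming that Ω is a box $$[lo,hi]^{n}$$, for which the Euclidean projection is componentwise clipping. The helper names (`project_box`, `check_projection_properties`) and the random test data are illustrative assumptions, not part of the paper.

```python
import random


def project_box(u, lo, hi):
    """Euclidean projection of u onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(ui, lo), hi) for ui in u]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def check_projection_properties(trials=1000, n=3, lo=-1.0, hi=1.0, seed=0, tol=1e-9):
    """Return True if properties (1)-(3) of Lemma 2.1 hold on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        v = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        w = [rng.uniform(lo, hi) for _ in range(n)]  # w lies in the box
        pu, pv = project_box(u, lo, hi), project_box(v, lo, hi)
        # (1) <u - P[u], P[u] - w> >= 0
        if dot(sub(u, pu), sub(pu, w)) < -tol:
            return False
        # (2) ||P[u] - P[v]||^2 <= <u - v, P[u] - P[v]>
        diff = sub(pu, pv)
        if dot(diff, diff) > dot(sub(u, v), diff) + tol:
            return False
        # (3) ||P[u] - w||^2 <= ||u - w||^2 - ||u - P[u]||^2
        lhs = dot(sub(pu, w), sub(pu, w))
        rhs = dot(sub(u, w), sub(u, w)) - dot(sub(u, pu), sub(u, pu))
        if lhs > rhs + tol:
            return False
    return True
```

A random check of this kind does not prove the lemma; it only illustrates the inequalities on one simple convex set.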

### Lemma 2.2



Let $$\Omega\subset \mathbb {R}^{n}$$ be a nonempty, closed, and convex set. Let $$P_{\Omega}(\cdot)$$ be the projection operator onto Ω under the Euclidean norm. Then $$u^{*}$$ is a solution of VI $$(\Omega,F)$$ if and only if it satisfies

$$u^{*}=P_{\Omega}\bigl[u^{*}-\beta F \bigl(u^{*}\bigr)\bigr],\quad \forall\beta>0.$$
(11)
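The fixed-point characterization (11) can be illustrated on a small example. The sketch below assumes the mapping $$F(u)=u-c$$ on a box Ω, for which the solution of VI $$(\Omega,F)$$ is the projection of $$c$$ onto Ω. The mapping, the vector `c`, and the function names are hypothetical choices made only for this illustration.

```python
def project_box(u, lo, hi):
    """Euclidean projection of u onto the box [lo, hi]^n."""
    return [min(max(ui, lo), hi) for ui in u]


def fixed_point_residual(u, c, beta, lo, hi):
    """Max-norm of u - P[u - beta * F(u)] for the assumed mapping F(u) = u - c."""
    step = [ui - beta * (ui - ci) for ui, ci in zip(u, c)]
    p = project_box(step, lo, hi)
    return max(abs(pi - ui) for pi, ui in zip(p, u))


c = [2.0, -0.5, 0.3]
u_star = project_box(c, -1.0, 1.0)  # solution of VI(box, F) when F(u) = u - c

# The residual vanishes at the solution for every beta > 0 ...
residuals_at_solution = [fixed_point_residual(u_star, c, beta, -1.0, 1.0)
                         for beta in (0.1, 1.0, 10.0)]
# ... but not at a point that is not a solution.
residual_elsewhere = fixed_point_residual([0.0, 0.0, 0.0], c, 1.0, -1.0, 1.0)
```

The same residual, evaluated at the current iterate, is a common way to measure how far a point is from solving a VI.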

We recall the definitions of monotone, strongly monotone, and Lipschitz continuous mappings.

### Definition 2.1

Let F be a mapping defined on the closed convex set $$\Omega\subset \mathbb {R}^{n}$$. Then

(a) F is called monotone on Ω if

$$\bigl\langle u-v , F(u)-F(v) \bigr\rangle \geq0,\quad \forall u,v\in\Omega;$$

(b) F is called strongly monotone with modulus $$\mu>0$$ on Ω if

$$\bigl\langle u-v ,F(u)-F(v) \bigr\rangle \geq\mu\|u-v\|^{2}, \quad\forall u,v\in \Omega;$$

(c) F is called Lipschitz continuous on Ω if there exists a constant $$L>0$$ such that

$$\bigl\| F(u)-F(v)\bigr\| \leq L\|u-v\|, \quad\forall u,v \in\Omega.$$
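As a worked example of Definition 2.1 (an illustration with an assumed affine mapping, not taken from the paper), let $$F(u)=Mu+q$$ on $$\Omega=\mathbb {R}^{n}$$ with a hypothetical matrix $$M\in \mathbb {R}^{n\times n}$$ and vector $$q\in \mathbb {R}^{n}$$. Then

$$\bigl\langle u-v,F(u)-F(v)\bigr\rangle =(u-v)^{T}M(u-v)=(u-v)^{T}\frac{M+M^{T}}{2}(u-v).$$

Hence F is monotone if and only if the symmetric part $$\frac{1}{2}(M+M^{T})$$ is positive semidefinite, it is strongly monotone with modulus equal to the smallest eigenvalue of $$\frac{1}{2}(M+M^{T})$$ whenever that eigenvalue is positive, and it is Lipschitz continuous with constant $$L=\|M\|$$ (the spectral norm).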

In this paper, by attaching a Lagrange multiplier $$\lambda\in \mathbb {R}^{m}$$ to the linear constraint $$Ax+By=b$$, one obtains a compact form of the problem (1)-(2):

$$\bigl\langle w'-w,F(w) \bigr\rangle \geq0,\quad \forall w' \in\mathcal{W},$$
(12)

where

$$\mathcal{W}:=\mathcal{X}\times\mathcal{Y}\times \mathbb {R}^{m} \quad\mbox{and}\quad F(w):=\left ( \begin{array}{c} f(x)-A^{T}\lambda\\ g(y)-B^{T}\lambda\\ Ax+By-b \end{array} \right ) .$$
(13)

Note that the mapping F is monotone whenever f and g are monotone. In the sequel, the problem (12)-(13) will be denoted by MVI $$(\mathcal{W},F)$$.
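The monotonicity of F can be verified directly from (13). For any $$w=(x,y,\lambda)$$ and $$w'=(x',y',\lambda')$$ in $$\mathcal{W}$$, the terms involving the multiplier cancel:

$$\begin{aligned} \bigl\langle w-w',F(w)-F\bigl(w'\bigr)\bigr\rangle =&\bigl\langle x-x',f(x)-f\bigl(x'\bigr)\bigr\rangle +\bigl\langle y-y',g(y)-g\bigl(y'\bigr)\bigr\rangle \\ &{}-\bigl\langle A\bigl(x-x'\bigr)+B\bigl(y-y'\bigr),\lambda-\lambda'\bigr\rangle +\bigl\langle \lambda-\lambda',A\bigl(x-x'\bigr)+B\bigl(y-y'\bigr)\bigr\rangle \\ =&\bigl\langle x-x',f(x)-f\bigl(x'\bigr)\bigr\rangle +\bigl\langle y-y',g(y)-g\bigl(y'\bigr)\bigr\rangle \geq0. \end{aligned}$$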

### Remark 2.1

After attaching the Lagrange multiplier $$\lambda\in \mathbb {R}^{m}$$ to the linear constraint $$Ax+By=b$$, the VI (1)-(2) amounts to finding $$(x,y,\lambda)\in\mathcal{X}\times\mathcal{Y}\times \mathbb {R}^{m}$$ such that

\begin{aligned} \left \{ \begin{array}{@{}l} (x'-x)^{T}[f(x)-A^{T}\lambda]\geq0, \\ (y'-y)^{T}[g(y)-B^{T}\lambda]\geq0, \\ Ax+By-b=0, \end{array} \right . \quad\forall\bigl(x',y' \bigr)\in\mathcal{X}\times\mathcal{Y}. \end{aligned}

The above system is equivalent to the MVI $$(\mathcal{W},F)$$ (12)-(13).

Throughout the paper, we make the following assumptions:

(A1) The projections onto the convex sets $$\mathcal{X}$$ and $$\mathcal{Y}$$ under the Euclidean norm can be computed in closed form.

(A2) The mappings $$f(x)$$ and $$g(y)$$ are Lipschitz continuous on $$\mathcal{X}$$ and $$\mathcal{Y}$$, respectively. However, the Lipschitz constants are not necessarily known.

(A3) The solution set $$\mathcal{W}^{*}$$ of the MVI $$(\mathcal {W},F)$$ is nonempty.

## The inexact PSALM with new inexact criterion for MVI

In this section, we propose the inexact method for solving MVI $$(\mathcal{W},F)$$. To simplify the analysis, we introduce the matrices

\begin{aligned} R_{k}=r_{k}I_{n_{1}},\qquad S_{k}=s_{k}I_{n_{2}}, \quad\mbox{and}\quad G_{k}= \left ( \begin{array}{@{}c@{\quad}c@{\quad}c@{}} R_{k} & & \\ & S_{k} &\\ -A & -B &H^{-1} \end{array} \right ), \end{aligned}
(14)

where $$r_{k}>0$$ and $$s_{k}>0$$.

We now state our algorithm.

The inexact PSALM with new inexact criterion for MVI

Step 0 :

Given $$\nu\in(0,1)$$, $$\mu>1$$, $$\gamma\in(0,2)$$, $$\|A^{T}HA\|/\nu\geq r_{0}>0$$, $$\|B^{T}HB\|/\nu\geq s_{0}>0$$, $$\varepsilon>0$$. Let $$H\in \mathbb {R}^{m\times m}$$ be positive definite, $$w^{0}=(x^{0},y^{0},\lambda^{0})\in \mathbb {R}^{n_{1}}\times \mathbb {R}^{n_{2}}\times \mathbb {R}^{m}$$, and $$k=0$$.

Step 1 :

Prediction step: for a given $$w^{k}=(x^{k},y^{k},\lambda^{k})$$, compute $$\hat{\lambda}^{k}$$ via S1.1, and then generate the trial iterates $$\hat{x}^{k}$$ and $$\hat{y}^{k}$$ via S1.2 and S1.3 in parallel.

S1.1 :
$$\hat{\lambda}^{k}=\lambda ^{k}-H \bigl(Ax^{k}+By^{k}-b\bigr).$$
(15)
S1.2 :

Find the smallest nonnegative integer $$i_{k}$$ such that $$r_{k}=\mu^{i_{k}}r_{0}$$ and

$$\hat{x}^{k}=P_{\mathcal{X}} \biggl[x^{k}- \frac{1}{r_{k}}\bigl[f\bigl(x^{k}\bigr)-A^{T}\hat{ \lambda}^{k} \bigr] \biggr]$$
(16)

which satisfies

\begin{aligned} &\bigl\langle x^{k}-\hat{x}^{k},\xi _{x}^{k}\bigr\rangle +\bigl\| A\bigl(x^{k}- \hat{x}^{k}\bigr)\bigr\| _{H}^{2} \\ &\quad\leq\nu\biggl( r_{k}\bigl\| x^{k}-\hat{x}^{k}\bigr\| ^{2}+ \biggl\| Ax^{k}-A\hat{x}^{k}-\frac{1}{2}H^{-1} \bigl(\lambda ^{k}-\hat{\lambda}^{k}\bigr) \biggr\| _{H}^{2}\biggr). \end{aligned}
(17)
S1.3 :

Find the smallest nonnegative integer $$j_{k}$$ such that $$s_{k}=\mu^{j_{k}}s_{0}$$ and

$$\hat{y}^{k}=P_{\mathcal{Y}} \biggl[y^{k}- \frac{1}{s_{k}}\bigl[g\bigl(y^{k}\bigr)-B^{T}\hat{ \lambda}^{k} \bigr] \biggr]$$
(18)

which satisfies

\begin{aligned} &\bigl\langle y^{k}-\hat{y}^{k},\xi _{y}^{k}\bigr\rangle +\bigl\| B\bigl(y^{k}- \hat{y}^{k}\bigr)\bigr\| _{H}^{2} \\ &\quad\leq\nu\biggl( s_{k}\bigl\| y^{k}-\hat{y}^{k}\bigr\| ^{2}+ \biggl\| By^{k}-B\hat{y}^{k}-\frac{1}{2}H^{-1} \bigl(\lambda ^{k}-\hat{\lambda}^{k}\bigr) \biggr\| _{H}^{2} \biggr). \end{aligned}
(19)

Step 2 :

Convergence verification: if $$\|w^{k}-\hat{w}^{k}\|\leq \varepsilon$$, then stop; $$\hat{w}^{k}=(\hat{x}^{k},\hat{y}^{k},\hat{\lambda }^{k})$$ is an acceptable approximate solution.

Step 3 :

Correction step: generate the new iterate $$w^{k+1}$$ via

Form I:
$$w_{\mathbf{I}}^{k+1}=w^{k}-\gamma \alpha_{k}^{*}d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr),$$
(20)

where

$$d\bigl(w^{k},\hat{w}^{k},\xi^{k} \bigr)=G_{k}\bigl(w^{k}-\hat{w}^{k}\bigr)- \xi^{k} \quad\mbox{with } \xi^{k}=\left ( \begin{array}{@{}c@{}} \xi_{x}^{k} \\ \xi_{y}^{k}\\ 0 \end{array} \right )$$
(21)
or
Form II:
$$w_{\mathbf{II}}^{k+1}=P_{\mathcal {W}} \bigl[w^{k}-\gamma\alpha_{k}^{*}F\bigl( \hat{w}^{k}\bigr)\bigr].$$
(22)

Here, the set $$\mathcal{W}$$ and the mapping $$F(w)$$ are defined in (13), $$G_{k}$$ is defined in (14), and the step size $$\alpha_{k}^{*}$$ is determined by

$$\alpha_{k}^{*}=\frac{\phi (w^{k},\hat{w}^{k},\xi^{k})}{\|d(w^{k},\hat{w}^{k},\xi^{k})\|^{2}}$$
(23)

and

$$\phi\bigl(w^{k},\hat{w}^{k},\xi ^{k}\bigr):=\bigl\langle w^{k}-\hat{w}^{k},d \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr) \bigr\rangle .$$
(24)

Set $$k:=k+1$$ and go to Step 1.
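The steps above can be sketched in code for the simplest possible setting. The following is a minimal illustration, not the authors' implementation: it assumes scalar variables ($$n_{1}=n_{2}=m=1$$), $$A=B=1$$, a scalar penalty $$H=h>0$$, and $$\mathcal{X}=\mathcal{Y}=[lo,hi]$$, and it uses correction form I. The function name `inexact_psalm_scalar`, the default parameter values, and the starting point $$w^{0}=0$$ are all assumptions made for this sketch.

```python
import math


def clip(v, lo, hi):
    """Projection onto the interval [lo, hi]."""
    return min(max(v, lo), hi)


def inexact_psalm_scalar(f, g, b, h=1.0, lo=-10.0, hi=10.0, nu=0.9, mu=2.0,
                         gamma=1.5, r0=1.0, s0=1.0, eps=1e-9, max_iter=10000):
    """Scalar sketch of the inexact PSALM with A = B = 1 and H = h (assumed setting)."""
    x = y = lam = 0.0
    for _ in range(max_iter):
        lam_hat = lam - h * (x + y - b)          # S1.1, equation (15)
        dlam = lam - lam_hat
        half = dlam / (2.0 * h)                  # (1/2) H^{-1} (lam - lam_hat)

        def predict(z, op, t0):
            """S1.2 / S1.3: increase t = mu^i * t0 until criterion (17)/(19) holds."""
            t = t0
            while True:
                z_hat = clip(z - (op(z) - lam_hat) / t, lo, hi)   # (16)/(18)
                dz = z - z_hat
                xi = op(z) - op(z_hat)
                lhs = dz * xi + h * dz * dz
                rhs = nu * (t * dz * dz + h * (dz - half) ** 2)
                if lhs <= rhs:
                    return z_hat, dz, xi, t
                t *= mu

        # The two predictions are independent and could run in parallel.
        x_hat, dx, xi_x, r = predict(x, f, r0)
        y_hat, dy, xi_y, s = predict(y, g, s0)

        # Step 2: stop when the prediction barely moves the iterate.
        if math.sqrt(dx * dx + dy * dy + dlam * dlam) <= eps:
            return x_hat, y_hat, lam_hat

        # Step 3, form I: d = G_k (w - w_hat) - xi, see (14) and (21).
        d = (r * dx - xi_x, s * dy - xi_y, -dx - dy + dlam / h)
        phi = dx * d[0] + dy * d[1] + dlam * d[2]                 # (24)
        alpha = phi / (d[0] ** 2 + d[1] ** 2 + d[2] ** 2)         # (23)
        x -= gamma * alpha * d[0]                                 # (20)
        y -= gamma * alpha * d[1]
        lam -= gamma * alpha * d[2]
    return x, y, lam
```

As a usage example under the same assumptions, take $$f(x)=x-1$$, $$g(y)=y-2$$, and $$b=5$$, which corresponds to minimizing $$\frac{1}{2}(x-1)^{2}+\frac{1}{2}(y-2)^{2}$$ subject to $$x+y=5$$. Solving the optimality conditions by hand gives $$(x,y,\lambda)=(2,3,1)$$, and `inexact_psalm_scalar(lambda x: x - 1.0, lambda y: y - 2.0, b=5.0)` should return a point close to it. Note that form I does not project the iterate back onto $$\mathcal{X}\times\mathcal{Y}$$, so the sketch requires f and g to be defined outside the interval as well.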

### Remark 3.1

The prediction step of the proposed algorithm differs from both that of the method in  and that of the method in  in that we adopt a new criterion. Since $$f(x)$$ and $$g(y)$$ are Lipschitz continuous with constants $$L_{f}$$ and $$L_{g}$$, respectively, we have $$\|\xi_{x}^{k}\|\leq L_{f}\| x^{k}-\hat{x}^{k}\|$$ and $$\|\xi_{y}^{k}\|\leq L_{g}\|y^{k}-\hat{y}^{k}\|$$. Using the Cauchy-Schwarz inequality, we have

$$\bigl\langle x^{k}-\hat{x}^{k},\xi_{x}^{k} \bigr\rangle +\bigl\| A\bigl(x^{k}-\hat{x}^{k}\bigr)\bigr\| _{H}^{2}\leq \bigl(L_{f}+\bigl\| A^{T}HA\bigr\| \bigr)\bigl\| x^{k}-\hat{x}^{k}\bigr\| ^{2}.$$

Thus, the inequality (17) holds as long as

$$r_{k}\geq\frac{L_{f}+\|A^{T}HA\|}{\nu}.$$

Analogously, the inequality (19) holds when $$s_{k}$$ satisfies

$$s_{k}\geq\frac{L_{g}+\|B^{T}HB\|}{\nu}.$$

Thus, in the implementation of our algorithm, we choose the values of $$r_{k}$$ and $$s_{k}$$ to satisfy the following conditions, respectively:

$$r_{0}\leq r_{k}\leq r_{\max}:= \frac{L_{f}+\|A^{T}HA\|}{\nu^{2}} \quad\mbox{and}\quad s_{0}\leq s_{k}\leq s_{\max}:=\frac{L_{g}+\|B^{T}HB\|}{\nu^{2}}.$$
(25)
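For completeness, the bound used in Remark 3.1 follows from two elementary estimates, written here for the x-block (the y-block is identical):

$$\begin{aligned} \bigl\langle x^{k}-\hat{x}^{k},\xi_{x}^{k}\bigr\rangle \leq&\bigl\| x^{k}-\hat{x}^{k}\bigr\| \bigl\| \xi_{x}^{k}\bigr\| \leq L_{f}\bigl\| x^{k}-\hat{x}^{k}\bigr\| ^{2}, \\ \bigl\| A\bigl(x^{k}-\hat{x}^{k}\bigr)\bigr\| _{H}^{2} =&\bigl(x^{k}-\hat{x}^{k}\bigr)^{T}A^{T}HA\bigl(x^{k}-\hat{x}^{k}\bigr)\leq\bigl\| A^{T}HA\bigr\| \bigl\| x^{k}-\hat{x}^{k}\bigr\| ^{2}. \end{aligned}$$

Since the right-hand side of (17) is at least $$\nu r_{k}\|x^{k}-\hat{x}^{k}\|^{2}$$, the criterion is satisfied as soon as $$\nu r_{k}\geq L_{f}+\|A^{T}HA\|$$, so the search over $$i_{k}$$ in S1.2 terminates after finitely many trials even though $$L_{f}$$ is unknown.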

### Remark 3.2

We note that $$\hat{x}^{k}$$ and $$\hat {y}^{k}$$ obtained by (16) and (18) are actually solutions of the following VIs, respectively:

\begin{aligned}& \bigl\langle x-\hat{x}^{k}, f\bigl(x^{k} \bigr)-A^{T}\hat{\lambda}^{k}+R_{k}\bigl( \hat{x}^{k}-x^{k}\bigr)\bigr\rangle \geq0 ,\quad \forall x\in \mathcal{X}, \end{aligned}
(26)
\begin{aligned}& \bigl\langle y-\hat{y}^{k}, g\bigl(y^{k} \bigr)-B^{T}\hat{\lambda}^{k}+S_{k}\bigl( \hat{y}^{k}-y^{k}\bigr)\bigr\rangle \geq0 ,\quad \forall y\in \mathcal{Y}. \end{aligned}
(27)

Combining (15)-(19) and (26)-(27), we have

$$\bigl\langle w-\hat{w}^{k}, F\bigl(\hat {w}^{k}\bigr)-d\bigl(w^{k},\hat{w}^{k}, \xi^{k} \bigr) \bigr\rangle \geq0,\quad \forall w\in \mathcal{W}.$$
(28)
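The inequality (28) can be checked block by block from the definitions (13), (14), and (21). For the x-block and the λ-block of $$F(\hat{w}^{k})-d(w^{k},\hat{w}^{k},\xi^{k})$$ we obtain

$$\begin{aligned} f\bigl(\hat{x}^{k}\bigr)-A^{T}\hat{\lambda}^{k}-R_{k}\bigl(x^{k}-\hat{x}^{k}\bigr)+\xi_{x}^{k} =&f\bigl(x^{k}\bigr)-A^{T}\hat{\lambda}^{k}+R_{k}\bigl(\hat{x}^{k}-x^{k}\bigr), \\ A\hat{x}^{k}+B\hat{y}^{k}-b+A\bigl(x^{k}-\hat{x}^{k}\bigr)+B\bigl(y^{k}-\hat{y}^{k}\bigr)-H^{-1}\bigl(\lambda^{k}-\hat{\lambda}^{k}\bigr) =&Ax^{k}+By^{k}-b-H^{-1}\bigl(\lambda^{k}-\hat{\lambda}^{k}\bigr)=0, \end{aligned}$$

where the last equality is (15). The x-block is therefore exactly the mapping in (26), the y-block is obtained in the same way from (27), and the λ-block vanishes, so it contributes nothing to the inner product in (28).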

## Convergence

### The global convergence

In this section, we establish the convergence of our algorithm.

### Lemma 4.1

Let $$\hat{w}^{k}=(\hat{x}^{k},\hat{y}^{k},\hat{\lambda }^{k})$$ be generated by the prediction step from the given point $$w^{k}=(x^{k},y^{k},\lambda^{k})$$. If $$\|\hat {w}^{k}-w^{k}\|=0$$, then $$\hat{w}^{k}$$ is a solution of MVI $$(\mathcal{W},F)$$.

### Proof

Since $$\|\hat{w}^{k}-w^{k}\|=0$$, we have $$\hat{x}^{k}=x^{k}$$, $$\hat{y}^{k}=y^{k}$$, and $$\hat{\lambda}^{k}=\lambda^{k}$$. So, from (15) and (26)-(27), we obtain $$A\hat{x}^{k}+B\hat{y}^{k}-b=0$$ and

\begin{aligned}& \langle x'-\hat{x}^{k},f(\hat{x}^{k})-A^{T}\hat{\lambda}^{k} \rangle \geq0, \quad \forall x'\in\mathcal{X},\\& \langle y'-\hat{y}^{k},g(\hat{y}^{k})-B^{T}\hat{\lambda}^{k} \rangle \geq0, \quad \forall y'\in\mathcal{Y}, \end{aligned}

from which it follows that (12)-(13) hold for $$\hat{w}^{k}$$. The proof is completed. □

Lemma 4.1 justifies the stopping rule in Step 2: we terminate the algorithm if $$\|w^{k}-\hat {w}^{k}\|\leq\varepsilon$$ holds for some k.

### Lemma 4.2

Let $$\phi(w^{k},\hat{w}^{k},\xi^{k})$$ be the function defined in (24). Then, for any $$k\geq1$$, we have

$$\phi\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr)\geq \frac{\delta(1-\nu)}{2}\bigl(\bigl\| w^{k}- \hat{w}^{k}\bigr\| ^{2}+\bigl\| A\hat{x}^{k}+B\hat {y}^{k}-b\bigr\| ^{2}\bigr),$$
(29)

where $$\delta:=\min \{r_{0},s_{0},\lambda_{m}(H^{-1}),\lambda _{m}(H) \}$$, and $$\lambda_{m}(H^{-1})$$ and $$\lambda_{m}(H)$$ denote the smallest eigenvalue of $$H^{-1}$$ and H, respectively.

### Proof

By the definition of $$\phi(w^{k},\hat{w}^{k},\xi ^{k})$$, we get

\begin{aligned} \phi\bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr) =& \bigl\langle w^{k}-\hat{w}^{k},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr) \bigr\rangle \\ =& \bigl\| x^{k}-\hat{x}^{k}\bigr\| _{R_{k}}^{2} - \bigl\langle x^{k}-\hat{x}^{k},\xi _{x}^{k} \bigr\rangle + \bigl\| y^{k}-\hat{y}^{k}\bigr\| _{S_{k}}^{2} -\bigl\langle y^{k}-\hat{y}^{k},\xi _{y}^{k} \bigr\rangle \\ &{}-\bigl\langle Ax^{k}-A\hat{x}^{k},\lambda^{k}- \hat{\lambda}^{k}\bigr\rangle - \bigl\langle By^{k}-B \hat{y}^{k},\lambda^{k}-\hat{\lambda}^{k}\bigr\rangle +\bigl\| \lambda^{k}-\hat{\lambda}^{k} \bigr\| _{H^{-1}}^{2} \\ =&\bigl\| x^{k}-\hat{x}^{k}\bigr\| _{R_{k}}^{2}+ \biggl\| Ax^{k}-A\hat{x}^{k}-\frac {1}{2}H^{-1} \bigl(\lambda^{k}-\hat{\lambda}^{k}\bigr) \biggr\| _{H}^{2} \\ &{} -\bigl(\bigl\langle x^{k}-\hat{x}^{k}, \xi_{x}^{k}\bigr\rangle +\bigl\| Ax^{k}-A\hat {x}^{k}\bigr\| _{H}^{2} \bigr) \\ &{}+ \bigl\| y^{k}-\hat{y}^{k}\bigr\| _{S_{k}}^{2}+ \biggl\| By^{k}-B\hat{y}^{k}-\frac {1}{2}H^{-1} \bigl(\lambda^{k}-\hat{\lambda}^{k}\bigr) \biggr\| _{H}^{2} \\ &{}-\bigl(\bigl\langle y^{k}-\hat{y}^{k}, \xi_{y}^{k}\bigr\rangle +\bigl\| By^{k}-B\hat {y}^{k}\bigr\| _{H}^{2} \bigr) +\frac{1}{2}\bigl\| \lambda^{k}-\hat{\lambda}^{k} \bigr\| _{H^{-1}}^{2}. \end{aligned}

Using the inexact criteria (17) and (19), the inequality $$2(\|a\|_{H}^{2}+\|b\|_{H}^{2})\geq\|a+b\|_{H}^{2}$$, the identity $$H^{-1}(\lambda^{k}-\hat{\lambda}^{k})=Ax^{k}+By^{k}-b$$ from (15), and the definition of δ, we have

\begin{aligned} \phi\bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr) \geq&(1-\nu) \biggl(\bigl\| x^{k}-\hat{x}^{k}\bigr\| _{R_{k}}^{2}+ \biggl\| Ax^{k}-A\hat{x}^{k}-\frac{1}{2}H^{-1} \bigl(\lambda^{k}-\hat{\lambda}^{k}\bigr)\biggr\| _{H}^{2}\biggr)\\ &{}+(1-\nu) \biggl(\bigl\| y^{k}-\hat{y}^{k} \bigr\| _{S_{k}}^{2}+ \biggl\| By^{k}-B\hat{y}^{k}- \frac{1}{2}H^{-1}\bigl(\lambda^{k}-\hat{ \lambda}^{k}\bigr)\biggr\| _{H}^{2} \biggr) \\ &{}+\frac{1}{2}\bigl\| \lambda^{k}-\hat{\lambda}^{k} \bigr\| _{H^{-1}}^{2}\\ \geq&(1-\nu) \bigl(\bigl\| x^{k}-\hat{x}^{k} \bigr\| _{R_{k}}^{2}+\bigl\| y^{k}-\hat{y}^{k}\bigr\| _{S_{k}}^{2}\bigr)+\frac{1}{2}\bigl\| \lambda^{k}- \hat{\lambda}^{k}\bigr\| _{H^{-1}}^{2}\\ &{}+\frac{1-\nu}{2}\bigl\| Ax^{k}-A\hat{x}^{k}+By^{k}-B \hat {y}^{k}-H^{-1}\bigl(\lambda^{k}-\hat{ \lambda}^{k}\bigr)\bigr\| _{H}^{2} \\ \geq& \frac{1-\nu}{2}\bigl(\bigl\| x^{k}-\hat{x}^{k} \bigr\| _{R_{k}}^{2}+\bigl\| y^{k}-\hat {y}^{k} \bigr\| _{S_{k}}^{2}+\bigl\| \lambda^{k}-\hat{ \lambda}^{k}\bigr\| _{H^{-1}}^{2}+\bigl\| A\hat{x}^{k}+B\hat{y}^{k}-b\bigr\| _{H}^{2} \bigr)\\ \geq&\frac{\delta(1-\nu)}{2}\bigl(\bigl\| \hat{w}^{k}-w^{k} \bigr\| ^{2}+\bigl\| A\hat {x}^{k}+B\hat{y}^{k}-b \bigr\| ^{2} \bigr). \end{aligned}

The proof is completed. □
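The regrouping in the proof of Lemma 4.2 relies on completing the square in the H-norm. Writing $$a:=Ax^{k}-A\hat{x}^{k}$$ and $$\eta:=\lambda^{k}-\hat{\lambda}^{k}$$ (shorthand introduced only for this remark), we have

$$\biggl\| a-\frac{1}{2}H^{-1}\eta\biggr\| _{H}^{2}=\|a\|_{H}^{2}-\langle a,\eta\rangle+\frac{1}{4}\|\eta\|_{H^{-1}}^{2},$$

so that $$-\langle a,\eta\rangle=\|a-\frac{1}{2}H^{-1}\eta\|_{H}^{2}-\|a\|_{H}^{2}-\frac{1}{4}\|\eta\|_{H^{-1}}^{2}$$. Applying this to the x-block and to the y-block consumes $$\frac{1}{4}\|\eta\|_{H^{-1}}^{2}$$ twice, which is why only $$\frac{1}{2}\|\lambda^{k}-\hat{\lambda}^{k}\|_{H^{-1}}^{2}$$ remains in the last line of the identity for φ.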

### Lemma 4.3

Let $$\alpha_{k}^{*}$$ be defined in (23). Then, for any $$k\geq1$$, we get $$\alpha_{k}^{*}\geq c>0$$.

### Proof

By the analysis in Remark 3.1, we have

$$\bigl\| \xi_{x}^{k}\bigr\| \leq L_{f}\bigl\| x^{k}- \hat{x}^{k}\bigr\| \quad \mbox{and} \quad\bigl\| \xi_{y}^{k}\bigr\| \leq L_{g}\bigl\| y^{k}-\hat{y}^{k}\bigr\| .$$

Moreover, it follows from the triangle inequality and (25) that, for all $$k\geq0$$,

\begin{aligned} \bigl\| G_{k}\bigl(w^{k}-\hat{w}^{k}\bigr) \bigr\| \leq&r_{k}\bigl\| x^{k}-\hat{x}^{k}\bigr\| + s_{k} \bigl\| y^{k}-\hat{y}^{k}\bigr\| \\ &{}+\bigl\| Ax^{k}-A\hat{x}^{k}+ By^{k}-B \hat{y}^{k}- H^{-1}\bigl(\lambda^{k}-\hat { \lambda}^{k}\bigr)\bigr\| \\ \leq& r_{\max}\bigl\| x^{k}-\hat{x}^{k}\bigr\| + s_{\max}\bigl\| y^{k}-\hat{y}^{k}\bigr\| +\bigl\| A \hat{x}^{k}+B\hat{y}^{k}-b\bigr\| . \end{aligned}

Thus, there exists a positive constant C such that

\begin{aligned} \bigl\| d\bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr)\bigr\| =& \bigl\| G_{k}\bigl(w^{k}-\hat{w}^{k}\bigr)- \xi^{k}\bigr\| \\ \leq& \bigl\| G_{k}\bigl(w^{k}-\hat{w}^{k}\bigr)\bigr\| + \bigl\| \xi^{k}\bigr\| \\ \leq& C \bigl(\bigl\| x^{k}-\hat{x}^{k}\bigr\| + \bigl\| y^{k}- \hat{y}^{k}\bigr\| +\bigl\| A\hat {x}^{k}+B\hat{y}^{k}-b\bigr\| \bigr) \\ \leq&C\bigl(\bigl\| w^{k}-\hat{w}^{k}\bigr\| + \bigl\| A \hat{x}^{k}+B\hat{y}^{k}-b\bigr\| \bigr), \end{aligned}
(30)

where $$C:=\max \{r_{\max},s_{\max}, L_{f},L_{g},1 \}$$. So, according to (29) and (30) and the Cauchy-Schwarz inequality, we obtain

$$\alpha_{k}^{*}=\frac{\phi(w^{k},\hat{w}^{k},\xi^{k})}{\|d(w^{k},\hat {w}^{k},\xi^{k})\|^{2}}\geq c:= \frac{\delta(1-\nu)}{4C^{2}}>0.$$

□

Before proving the convergence of our algorithm, we explain why we choose the inexact criterion and the step size $$\alpha_{k}^{*}$$ given in (23). Let $$w^{*}=(x^{*},y^{*},\lambda^{*})$$ be a solution of MVI (12)-(13). To find a proper step size for the correction step, we denote by $$w_{\mathbf{I}}^{k+1}(\alpha)$$ and $$w_{\mathbf{II}}^{k+1}(\alpha)$$ the correction forms I and II with an undetermined step size α, respectively, i.e.

$$w_{\mathbf{I}}^{k+1}(\alpha ):=w^{k}-\alpha d \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr)$$
(31)

and

$$w_{\mathbf{II}}^{k+1}(\alpha ):=P_{\mathcal{W}} \bigl[w^{k}-\alpha F\bigl(\hat{w}^{k}\bigr) \bigr].$$
(32)

Moreover, we measure the improvement obtained by the correction step as follows:

$$\psi(\alpha)=\bigl\| w^{k}-w^{*}\bigr\| ^{2}-\bigl\| w^{k+1}(\alpha)-w^{*}\bigr\| ^{2}.$$
(33)

Naturally, we would like to choose as the step size of the correction step the value $$\alpha_{k}$$ that maximizes $$\psi(\alpha)$$.

### Theorem 4.1

Let $$w^{k+1}(\alpha)$$ be the correction step (31) or (32) with an undetermined step size, and $$\psi(\alpha)$$ be defined in (33). Then

$$\psi(\alpha)\geq2\alpha\phi\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)- \alpha^{2}\bigl\| d \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr) \bigr\| ^{2},$$
(34)

where $$d(w^{k},\hat{w}^{k},\xi^{k})$$ and $$\phi(w^{k},\hat{w}^{k},\xi^{k})$$ are defined in (21) and (24), respectively.

### Proof

We split the proof into two cases, for the correction forms I and II, respectively.

(I) First, we prove the assertion for the first correction form I. Since

\begin{aligned} \psi(\alpha) =&\bigl\| w^{k}-w^{*}\bigr\| ^{2}- \bigl\| w^{k+1}(\alpha)-w^{*}\bigr\| ^{2} \\ =& \bigl\| w^{k}-w^{*}\bigr\| ^{2}- \bigl\| w^{k}- \alpha d\bigl(w^{k},\hat{w}^{k},\xi ^{k} \bigr)-w^{*}\bigr\| ^{2} \\ =& 2\alpha\bigl\langle w^{k}-w^{*},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\rangle - \alpha^{2} \bigl\| d\bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr) \bigr\| ^{2}, \end{aligned}

we only need to prove the following inequality:

$$\bigl\langle w^{k}-w^{*},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\rangle \geq\phi \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr) =\bigl\langle w^{k}-\hat{w}^{k},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\rangle .$$

In fact, by using Lemma 2.2, we can change the inequality (28) into an equality:

$$\hat{w}^{k}=P_{\mathcal{W}} \bigl[\hat{w}^{k}-\bigl[F \bigl(\hat {w}^{k}\bigr)-d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr) \bigr] \bigr].$$

Setting $$u=\hat{w}^{k}-[F(\hat{w}^{k})-d(w^{k},\hat{w}^{k},\xi^{k})]$$ and $$w=w^{*}$$ in Lemma 2.1(1), we get

$$\bigl\langle \hat{w}^{k}-w^{*}, \hat{w}^{k}- \bigl[F\bigl(\hat{w}^{k}\bigr)-d\bigl(w^{k},\hat {w}^{k},\xi^{k}\bigr)\bigr]-\hat{w}^{k}\bigr\rangle \geq0,$$

that is,

$$\bigl\langle \hat{w}^{k}-w^{*}, d \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr)\bigr\rangle \geq\bigl\langle \hat{w}^{k}-w^{*},F\bigl(\hat {w}^{k}\bigr)\bigr\rangle .$$
(35)

Note that $$w^{*}\in\mathcal{W}^{*}$$ and the mapping $$F(w)$$ is monotone on $$\mathcal{W}$$. So, by (35) and the definition of $$\phi(w^{k},\hat {w}^{k},\xi^{k})$$, we obtain

\begin{aligned} \bigl\langle w^{k}-w^{*},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\rangle \geq& \bigl\langle w^{k}-\hat{w}^{k},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\rangle + \bigl\langle \hat{w}^{k}-w^{*},F\bigl(\hat{w}^{k}\bigr)\bigr\rangle \\ \geq& \phi\bigl(w^{k},\hat{w}^{k},\xi^{k} \bigr), \end{aligned}

which indicates that (34) holds for the correction form I.

(II) Now, we prove the assertion for the second correction form II. From (28), we obtain

$$\bigl\langle w-\hat{w}^{k} , F\bigl(\hat{w}^{k}\bigr)\bigr\rangle \geq\bigl\langle w-\hat {w}^{k},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\rangle .$$

Then it follows from $$w^{*}\in\mathcal{W}$$, the correction form II and Lemma 2.1(3) that

$$\bigl\| w^{k+1}(\alpha)-w^{*}\bigr\| ^{2}\leq \bigl\| w^{k}-\alpha F\bigl(\hat{w}^{k}\bigr)-w^{*}\bigr\| ^{2}-\bigl\| w^{k}-\alpha F\bigl(\hat{w}^{k} \bigr)-w^{k+1}(\alpha)\bigr\| ^{2}.$$

Consequently, we have

\begin{aligned} \psi(\alpha) =&\bigl\| w^{k}-w^{*}\bigr\| ^{2}- \bigl\| w^{k+1}(\alpha)-w^{*}\bigr\| ^{2}\\ \geq&\bigl\| w^{k}-w^{*}\bigr\| ^{2}+\bigl\| w^{k}- \alpha F\bigl(\hat{w}^{k}\bigr)-w^{k+1}(\alpha ) \bigr\| ^{2}-\bigl\| w^{k}-\alpha F\bigl(\hat{w}^{k} \bigr)-w^{*}\bigr\| ^{2} \\ =&\bigl\| w^{k}-w^{k+1}(\alpha)\bigr\| ^{2}+2\alpha\bigl\langle w^{k+1}(\alpha )-w^{*},F\bigl(\hat{w}^{k} \bigr) \bigr\rangle \\ \geq& \bigl\| w^{k}-w^{k+1}(\alpha)\bigr\| ^{2}+2\alpha\bigl\langle w^{k+1}(\alpha)-\hat {w}^{k},F\bigl( \hat{w}^{k}\bigr) \bigr\rangle \\ \geq& \bigl\| w^{k}-w^{k+1}(\alpha)\bigr\| ^{2}+2\alpha\bigl\langle w^{k+1}(\alpha)-\hat {w}^{k},d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\rangle \\ =& \bigl\| w^{k}-\alpha d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr)-w^{k+1}(\alpha)\bigr\| ^{2}+2\alpha\phi \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr)- \alpha^{2}\bigl\| d\bigl(w^{k},\hat {w}^{k}, \xi^{k}\bigr)\bigr\| ^{2}, \end{aligned}

which implies that (34) holds for the correction form II. The proof is completed. □

From Theorem 4.1, we may use the value of the maximum point $$\alpha_{k}^{*}$$ of the lower bound function $$h(\alpha)=2\alpha\phi (w^{k},\hat{w}^{k},\xi^{k})-\alpha^{2}\|d(w^{k},\hat{w}^{k},\xi^{k})\| ^{2}$$ as the approximate solution of α. Since $$h(\alpha)$$ is a quadratic function, it reaches its maximum at

$$\alpha_{k}^{*}=\frac{\phi(w^{k},\hat{w}^{k},\xi^{k} )}{\|d(w^{k},\hat {w}^{k},\xi^{k}) \|^{2}}.$$

Furthermore, we introduce a relaxation factor γ to the step size, and then set $$\alpha_{k}=\gamma\alpha_{k}^{*}$$ in the correction step (20) or (22). We obtain

\begin{aligned} \psi(\alpha_{k}) =&\psi\bigl(\gamma\alpha_{k}^{*} \bigr)\geq2\gamma \alpha_{k}^{*}\phi\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr) -\gamma^{2}\bigl( \alpha_{k}^{*}\bigr)^{2}\bigl\| d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr) \bigr\| ^{2} \\ =&\gamma(2-\gamma)\alpha_{k}^{*}\phi\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr). \end{aligned}

From Lemma 4.2, we have $$\phi( w^{k},\hat{w}^{k},\xi^{k})>0$$ if $$w^{k}\neq\hat{w}^{k}$$, and hence $$\alpha_{k}^{*}>0$$. The lower bound above is therefore positive exactly when $$\gamma(2-\gamma)>0$$, so the relaxation factor must satisfy $$\gamma\in(0,2)$$ at each iteration. Based on the above analysis, we obtain the following corollary of Theorem 4.1.

### Corollary 4.1

Let $$w^{*}\in\mathcal{W}^{*}$$ and the sequence $$\{w^{k}\}$$ be generated by the proposed algorithm. Then the following inequality holds:

\begin{aligned} \bigl\| w^{k+1}-w^{*}\bigr\| ^{2} \leq& \bigl\| w^{k}-w^{*} \bigr\| ^{2}-\gamma(2-\gamma)\alpha _{k}^{*}\phi \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr) \\ \leq&\bigl\| w^{k}-w^{*}\bigr\| ^{2}- \frac{c\delta\gamma(2-\gamma)(1-\nu)}{2}\bigl\| w^{k}-\hat{w}^{k}\bigr\| ^{2}. \end{aligned}

From Corollary 4.1, we note that $$\{w^{k}\}$$ is Fejér monotone. Using a similar proof procedure to  and , we can easily derive the following result.

### Theorem 4.2

The sequence $$\{w^{k}\}$$ as generated by the proposed algorithm converges to a solution of MVI $$(\mathcal{W},F)$$.

### Convergence rate

Now, we show the worst-case $$O(1/t)$$ convergence rate for the proposed algorithm.

### Lemma 4.4

For the given $$w^{k}\in\mathcal{W}$$, let $$\hat{w}^{k}$$ be generated by the proposed algorithm, and let the new iterate $$w^{k+1}$$ be updated by correction form I or II with $$\gamma>0$$. Then

\begin{aligned} &\bigl(w-\hat{w}^{k}\bigr)^{T}\gamma \alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr)+ \frac{1}{2}\bigl(\bigl\| w-w^{k}\bigr\| ^{2}-\bigl\| w-w^{k+1} \bigr\| ^{2}\bigr) \\ &\quad\geq\frac{1}{2}\gamma(2-\gamma)\alpha_{k}^{*}\phi \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr),\quad \forall w \in\mathcal{W}, \end{aligned}
(36)

where $$\phi(w^{k},\hat{w}^{k},\xi^{k})$$ is defined in (24).

### Proof

We divide our proof into two parts for correction form I and II, respectively.

(I) Due to (28)

$$\bigl(w-\hat{w}^{k}\bigr)^{T}F\bigl( \hat{w}^{k}\bigr)\geq\bigl(w-\hat{w}^{k} \bigr)^{T}d\bigl(w^{k},\hat {w}^{k}, \xi^{k}\bigr), \quad\forall w\in\mathcal{W},$$
(37)

and the correction form I

$$w^{k}-w^{k+1}=\gamma\alpha_{k}^{*}d \bigl(w^{k},\hat{w}^{k},\xi^{k}\bigr).$$

Using the identity

$$(a-b)^{T}(c-d)=\frac{1}{2}\bigl(\|a-d\|^{2}-\|a-c \|^{2}\bigr)+\frac{1}{2}\bigl(\|b-c\| ^{2}-\|b-d \|^{2}\bigr),$$

we have

\begin{aligned} \bigl(w-\hat{w}^{k}\bigr)^{T}\gamma \alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr) \geq& \bigl(w-\hat {w}^{k}\bigr)^{T}\bigl(w^{k}-w^{k+1} \bigr) \\ =&\frac{1}{2}\bigl(\bigl\| w-w^{k+1}\bigr\| ^{2}- \bigl\| w-w^{k}\bigr\| ^{2}\bigr) \\ &+\frac{1}{2}\bigl(\bigl\| w^{k}-\hat{w}^{k} \bigr\| ^{2}-\bigl\| w^{k+1}-\hat{w}^{k}\bigr\| ^{2} \bigr). \end{aligned}
(38)

By the correction form I (20) and the definition of $$\phi(w^{k},\hat{w}^{k},\xi^{k})$$, we obtain

\begin{aligned} &\bigl\| w^{k}-\hat{w}^{k}\bigr\| ^{2}- \bigl\| w^{k+1}-\hat{w}^{k}\bigr\| ^{2} \\ &\quad=\bigl\| w^{k}- \hat {w}^{k}\bigr\| ^{2} -\bigl\| w^{k}-\hat{w}^{k}- \gamma\alpha_{k}^{*}d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\| ^{2} \\ &\quad=2\gamma\alpha_{k}^{*}\bigl(w^{k}- \hat{w}^{k}\bigr)^{T}d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)- \gamma^{2}\bigl( \alpha_{k}^{*}\bigr)^{2}\bigl\| d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)\bigr\| ^{2} \\ &\quad=\gamma(2-\gamma) \bigl(\alpha_{k}^{*} \bigr)^{2}\bigl\| d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr)\bigr\| ^{2} \\ &\quad=\gamma(2-\gamma)\alpha_{k}^{*}\phi\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr), \end{aligned}
(39)

It follows from (38) and (39) that (36) holds for the correction form I.

(II) For the correction form II, we first split $$(w-\hat{w}^{k})^{T}\gamma\alpha_{k}^{*}F(\hat{w}^{k})$$ into the two terms

$$\bigl(w^{k+1}-\hat{w}^{k}\bigr)^{T} \gamma\alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr) \quad\mbox{and}\quad \bigl(w-w^{k+1}\bigr)^{T}\gamma \alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr).$$
(40)

First, we deal with the term $$(w^{k+1}-\hat{w}^{k})^{T}\gamma\alpha _{k}^{*}F(\hat{w}^{k})$$. Since the new iterate $$w^{k+1}\in\mathcal{W}$$, substituting $$w=w^{k+1}$$ into (37), we have

\begin{aligned} \bigl(w^{k+1}-\hat{w}^{k}\bigr)^{T} \gamma\alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr) \geq& \gamma\alpha_{k}^{*}\bigl(w^{k+1}- \hat{w}^{k}\bigr)^{T}d\bigl(w^{k}, \hat{w}^{k},\xi ^{k}\bigr) \\ =& \gamma\alpha_{k}^{*}\bigl(w^{k}- \hat{w}^{k}\bigr)^{T}d\bigl(w^{k}, \hat{w}^{k},\xi ^{k}\bigr) \\ &{}-\gamma\alpha_{k}^{*}\bigl(w^{k}-w^{k+1} \bigr)^{T}d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr). \end{aligned}
(41)

Using the definition of the step size $$\alpha_{k}^{*}$$, we get

$$\gamma\alpha_{k}^{*}\bigl(w^{k}- \hat{w}^{k}\bigr)^{T}d\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr) =\gamma\bigl(\alpha_{k}^{*} \bigr)^{2}\bigl\| d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr)\bigr\| ^{2}.$$

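Moreover, applying the identity $$a^{T}b=\frac{1}{2}(\|a\|^{2}+\|b\|^{2}-\|a-b\|^{2})$$ with $$a=w^{k}-w^{k+1}$$ and $$b=\gamma\alpha_{k}^{*}d(w^{k},\hat{w}^{k},\xi^{k})$$, we have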

\begin{aligned} -\gamma\alpha_{k}^{*}\bigl(w^{k}-w^{k+1} \bigr)^{T}d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr) =& -\frac{1}{2}\bigl(\bigl\| w^{k}-w^{k+1} \bigr\| ^{2}+\gamma^{2}\bigl(\alpha_{k}^{*} \bigr)^{2}\bigl\| d\bigl(w^{k},\hat{w}^{k}, \xi^{k}\bigr)\bigr\| ^{2}\bigr) \\ &{}+\frac{1}{2}\bigl\| w^{k}-w^{k+1}-\gamma \alpha_{k}^{*}d\bigl(w^{k},\hat {w}^{k},\xi^{k}\bigr)\bigr\| ^{2}. \end{aligned}

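Substituting the last two relations into (41) and dropping the nonnegative term $$\frac{1}{2}\|w^{k}-w^{k+1}-\gamma\alpha_{k}^{*}d(w^{k},\hat{w}^{k},\xi^{k})\|^{2}$$, we get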

\begin{aligned} \bigl(w^{k+1}-\hat{w}^{k}\bigr)^{T} \gamma\alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr) \geq& \frac{1}{2}\gamma(2-\gamma) \bigl(\alpha_{k}^{*} \bigr)^{2}\bigl\| d\bigl(w^{k},\hat{w}^{k},\xi ^{k}\bigr)\bigr\| ^{2}-\frac{1}{2}\bigl\| w^{k}-w^{k+1} \bigr\| ^{2} \\ =&\frac{1}{2}\gamma(2-\gamma)\phi\bigl(w^{k}, \hat{w}^{k},\xi^{k}\bigr)-\frac {1}{2} \bigl\| w^{k}-w^{k+1}\bigr\| ^{2}. \end{aligned}
(42)

Now, we turn to the second term $$(w-w^{k+1})^{T}\gamma\alpha _{k}^{*}F(\hat{w}^{k})$$ in (40). Since $$w^{k+1}$$ is updated by the correction form II (22), it is the projection of $$w^{k}-\gamma\alpha _{k}^{*}F(\hat{w}^{k})$$ onto $$\mathcal{W}$$. It then follows from Lemma 2.1(1) that

$$\bigl(w-w^{k+1}\bigr)^{T}\bigl(w^{k+1}-w^{k}+ \gamma\alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr)\bigr) \geq 0,\quad \forall w\in\mathcal{W},$$

and consequently, using the identity $$a^{T}b=\frac{1}{2}(\|a\|^{2}+\|b\| ^{2}-\|a-b\|^{2})$$, we obtain

\begin{aligned} \bigl(w-w^{k+1}\bigr)^{T}\gamma \alpha_{k}^{*}F\bigl(\hat{w}^{k}\bigr) \geq & \bigl(w-w^{k+1}\bigr)^{T}\bigl(w^{k}-w^{k+1} \bigr) \\ =&\frac{1}{2}\bigl(\bigl\| w-w^{k+1}\bigr\| ^{2}- \bigl\| w-w^{k}\bigr\| ^{2}\bigr)+\frac{1}{2}\bigl\| w^{k}-w^{k+1}\bigr\| ^{2}. \end{aligned}
(43)

Adding (42) and (43), we get (36). The proof is completed. □
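The proof above relies on two elementary norm identities: the three-point identity $$a^{T}b=\frac{1}{2}(\|a\|^{2}+\|b\|^{2}-\|a-b\|^{2})$$ used for (43), and the four-point identity behind the equality in (38). The following Python sketch checks both numerically on random vectors. The helper names (`three_point_gap`, `four_point_gap`) and the random test data are illustrative assumptions and are not part of the algorithm.

```python
import random

def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return [ui - vi for ui, vi in zip(u, v)]

def sqnorm(u):
    """Squared Euclidean norm."""
    return dot(u, u)

def three_point_gap(a, b):
    """a^T b - (||a||^2 + ||b||^2 - ||a - b||^2) / 2; zero in exact arithmetic."""
    return dot(a, b) - 0.5 * (sqnorm(a) + sqnorm(b) - sqnorm(sub(a, b)))

def four_point_gap(w, w_hat, w_k, w_next):
    """Difference between the two sides of the equality in (38)."""
    lhs = dot(sub(w, w_hat), sub(w_k, w_next))
    rhs = 0.5 * (sqnorm(sub(w, w_next)) - sqnorm(sub(w, w_k))) \
        + 0.5 * (sqnorm(sub(w_k, w_hat)) - sqnorm(sub(w_next, w_hat)))
    return lhs - rhs

# Hypothetical random data standing in for w, w_hat^k, w^k, w^{k+1}.
rng = random.Random(0)
vecs = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(4)]
gap3 = three_point_gap(vecs[0], vecs[1])
gap4 = four_point_gap(*vecs)
```

Both gaps vanish up to rounding error for any choice of vectors, which is why no assumption on $$w$$ beyond $$w\in\mathcal{W}$$ is needed in those two steps.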

### Theorem 4.3

For an integer $$t>0$$, there is a $$\hat {w}_{t}\in\mathcal{W}$$, which is a convex combination of the prediction iterates $$\hat{w}^{0}, \hat{w}^{1},\ldots,\hat{w}^{t}$$, satisfying

$$(\hat{w}_{t}-w)^{T}F(w)\leq\frac{1}{2\gamma\Gamma_{t}} \bigl\| w^{0}-w\bigr\| ^{2},\quad \forall w\in\mathcal{W},$$

where

$$\Gamma_{t}:=\sum_{k=0}^{t} \alpha_{k}^{*} \quad\textit{and}\quad \hat{w}_{t}:= \frac{1}{\Gamma_{t}} \sum_{k=0}^{t} \alpha_{k}^{*}\hat{w}^{k}.$$

### Proof

It follows from (29) and (36) that

$$\bigl\langle w-\hat{w}^{k},\alpha_{k}^{*}F\bigl( \hat{w}^{k}\bigr)\bigr\rangle +\frac {1}{2\gamma}\bigl\| w-w^{k} \bigr\| ^{2} \geq\frac{1}{2\gamma}\bigl\| w-w^{k+1}\bigr\| ^{2},\quad \forall w\in\mathcal{W}.$$

By the monotonicity of $$F(w)$$ and the above inequality, we have

$$\bigl\langle w-\hat{w}^{k},\alpha_{k}^{*}F(w) \bigr\rangle +\frac{1}{2\gamma}\bigl\| w-w^{k}\bigr\| ^{2} \geq \frac{1}{2\gamma}\bigl\| w-w^{k+1}\bigr\| ^{2}, \quad\forall w\in\mathcal{W}.$$
(44)

Summing the inequality (44) over $$k=0,1,\ldots,t$$, we get

$$\Biggl[ \Biggl( \sum_{k=0}^{t} \alpha_{k}^{*} \Biggr)w- \Biggl(\sum _{k=0}^{t}\alpha_{k}^{*} \hat{w}^{k} \Biggr) \Biggr]^{T}F(w)+ \frac{1}{2\gamma} \bigl\| w-w^{0}\bigr\| ^{2}\geq\frac{1}{2\gamma}\bigl\| w-w^{t+1}\bigr\| ^{2}\geq0,\quad \forall w\in\mathcal{W}.$$

Since $$\sum_{k=0}^{t}\alpha_{k}^{*}/\Gamma_{t}=1$$, $$\hat{w}_{t}$$ is a convex combination of $$\hat{w}^{0},\hat{w}^{1},\ldots,\hat{w}^{t}$$ and thus $$\hat{w}_{t}\in\mathcal{W}$$. By the definitions of $$\Gamma_{t}$$ and $$\hat{w}_{t}$$, we derive

$$(w-\hat{w}_{t})^{T}F(w)+\frac{1}{2\gamma\Gamma_{t}} \bigl\| w-w^{0}\bigr\| ^{2}\geq0.$$

The assertion follows from the above inequality immediately. □

Thus, we have established a worst-case $$O(1/t)$$ convergence rate of the proposed algorithm in the ergodic sense.
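To make the ergodic statement concrete, the following Python sketch forms the weighted average $$\hat{w}_{t}$$ and the quantity $$\Gamma_{t}$$ from Theorem 4.3, together with the right-hand side $$\frac{1}{2\gamma\Gamma_{t}}\|w^{0}-w\|^{2}$$ of the bound. The function names and the small input vectors are hypothetical; in an actual run the predictors $$\hat{w}^{k}$$ and step sizes $$\alpha_{k}^{*}$$ would be recorded from the iterations of the algorithm.

```python
def ergodic_average(predictors, step_sizes):
    """Return (w_hat_t, Gamma_t) as defined in Theorem 4.3.

    predictors: list of vectors w_hat^0, ..., w_hat^t (lists of floats)
    step_sizes: list of positive step sizes alpha_0^*, ..., alpha_t^*
    """
    if not predictors or len(predictors) != len(step_sizes):
        raise ValueError("need one positive step size per predictor")
    if any(alpha <= 0 for alpha in step_sizes):
        raise ValueError("step sizes must be positive")
    gamma_t = sum(step_sizes)
    dim = len(predictors[0])
    w_hat_t = [
        sum(alpha * w[i] for alpha, w in zip(step_sizes, predictors)) / gamma_t
        for i in range(dim)
    ]
    return w_hat_t, gamma_t

def rate_bound(w0, w, relax, gamma_t):
    """Right-hand side of Theorem 4.3: ||w0 - w||^2 / (2 * relax * Gamma_t).

    relax plays the role of the relaxation factor gamma in the correction step.
    """
    dist_sq = sum((p - q) ** 2 for p, q in zip(w0, w))
    return dist_sq / (2.0 * relax * gamma_t)

# Hypothetical two-step history, used only to illustrate the arithmetic.
w_hat_t, gamma_t = ergodic_average([[0.0, 0.0], [2.0, 4.0]], [1.0, 3.0])
bound = rate_bound([0.0, 0.0], [2.0, 0.0], relax=1.0, gamma_t=gamma_t)
```

The bound decays like $$1/\Gamma_{t}$$, so the $$O(1/t)$$ rate corresponds to $$\Gamma_{t}$$ growing at least linearly in $$t$$, which holds whenever the step sizes $$\alpha_{k}^{*}$$ stay bounded away from zero.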

## Numerical results

In this section, we illustrate the effectiveness of our algorithm by comparing it with some existing algorithms. We denote the method proposed in this paper by ‘NEW’, the alternating projection-based prediction-correction method in  by ‘HLQ’, the inexact parallel splitting augmented Lagrangian method in  by ‘TY’, and the simultaneous method in  by ‘ZHY’. All codes are written in Matlab and run on a machine with an i3-2100 CPU (3.1 GHz) and 3 GB of memory.

### Multiple-sets split feasibility problem

The multiple-sets split feasibility problem (MSFP) is to find a point in the intersection of a family of closed convex sets in one space such that its image under a certain operator lies in the intersection of another family of closed convex sets in the image space. The MSFP plays a significant role in diverse areas, such as image restoration, signal processing, and medical care; see . In this paper, we consider the constrained MSFP in the following form:

$$x^{*}\in\mathcal{X}\cap \Biggl(\bigcap _{i=1}^{t_{1}}C_{i} \Biggr) \quad\mbox{and}\quad Ax^{*}\in\mathcal{Y}\cap \Biggl(\bigcap_{j=1}^{t_{2}}Q_{j} \Biggr) .$$
(45)

Censor et al.  proposed the proximity function to measure the aggregate distance to the involved sets $$C_{i}$$’s and $$Q_{j}$$’s as

$$p(x):=\frac{1}{2}\sum_{i=1}^{t_{1}}a_{i} \bigl\| x-P_{C_{i}}(x)\bigr\| ^{2} +\frac{1}{2}\sum _{j=1}^{t_{2}}b_{j}\bigl\| Ax-P_{Q_{j}}(Ax) \bigr\| ^{2},$$
(46)

where $$a_{i}>0$$ ($$i=1,\ldots,t_{1}$$) and $$b_{j}>0$$ ($$j=1,\ldots,t_{2}$$) are coefficients that can be regarded as weights reflecting the importance attached to the sets. Note that the condition $$\sum_{i=1}^{t_{1}}a_{i}+\sum_{j=1}^{t_{2}}b_{j}=1$$ is usually assumed in practice. With the proximity function (46), Censor et al.  proposed the optimization model

$$\min\bigl\{ p(x) \mid x\in\mathcal{X}\bigr\}$$
(47)

to approximate the constrained MSFP and used the following projection gradient method to solve this model:

$$x^{k+1}=P_{\mathcal{X}} \bigl[ x^{k}-s\nabla p \bigl(x^{k}\bigr) \bigr],$$

where $$s>0$$ is the step size and the gradient of $$p(x)$$ is as follows:

$$\nabla p(x)=\sum_{i=1}^{t_{1}}a_{i} \bigl(x-P_{C_{i}}(x) \bigr)+\sum_{j=1}^{t_{2}}b_{j}A^{T} \bigl(Ax-P_{Q_{j}}(Ax) \bigr),$$

where $$P_{C_{i}}$$ and $$P_{Q_{j}}$$ are the projection mappings onto the sets $$C_{i}$$ and $$Q_{j}$$, respectively. Zhang et al. reformulated (45) as the following optimization model with separable structure:

$$\min\bigl\{ \theta_{1}(x)+\theta_{2}(y)\mid Ax-y=0, x\in \mathcal{X}, y\in \mathcal{Y}\bigr\} ,$$
(48)

where

$$\theta_{1}(x):=\frac{1}{2}\sum_{i=1}^{t_{1}}a_{i} \bigl\| x-P_{C_{i}}(x)\bigr\| ^{2} \quad\mbox{and} \quad\theta_{2}(y):= \frac{1}{2}\sum_{j=1}^{t_{2}}b_{j} \bigl\| y-P_{Q_{j}}(y)\bigr\| ^{2}.$$
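For orientation, the projected gradient iteration for model (47) recalled above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the experiments: the sets $$C_{i}$$ are taken to be Euclidean balls, the sets $$Q_{j}$$ boxes, and $$\mathcal{X}=\mathbb{R}^{n}_{+}$$; all function names are hypothetical, and the step size `s` is supplied by the caller (a value below $$2/L$$, with $$L$$ a Lipschitz constant of $$\nabla p$$, is the usual safe choice).

```python
import math

def proj_ball(x, center, radius):
    """Projection onto the ball {z : ||z - center|| <= radius}."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(x)
    return [ci + radius * d / dist for ci, d in zip(center, diff)]

def proj_box(y, lower, upper):
    """Projection onto the box {z : lower <= z <= upper}."""
    return [min(max(yi, lo), up) for yi, lo, up in zip(y, lower, upper)]

def matvec(A, x):
    """A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def mat_t_vec(A, y):
    """A^T y for a matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def proximity(x, A, balls, boxes, a, b):
    """Proximity function p(x) from (46)."""
    total = 0.0
    for ai, (c, r) in zip(a, balls):
        px = proj_ball(x, c, r)
        total += 0.5 * ai * sum((u - v) ** 2 for u, v in zip(x, px))
    Ax = matvec(A, x)
    for bj, (lo, up) in zip(b, boxes):
        py = proj_box(Ax, lo, up)
        total += 0.5 * bj * sum((u - v) ** 2 for u, v in zip(Ax, py))
    return total

def grad_p(x, A, balls, boxes, a, b):
    """Gradient of p: sum_i a_i (x - P_Ci x) + sum_j b_j A^T (Ax - P_Qj Ax)."""
    g = [0.0] * len(x)
    for ai, (c, r) in zip(a, balls):
        px = proj_ball(x, c, r)
        g = [gi + ai * (u - v) for gi, u, v in zip(g, x, px)]
    Ax = matvec(A, x)
    for bj, (lo, up) in zip(b, boxes):
        py = proj_box(Ax, lo, up)
        back = mat_t_vec(A, [u - v for u, v in zip(Ax, py)])
        g = [gi + bj * ti for gi, ti in zip(g, back)]
    return g

def projected_gradient(x0, A, balls, boxes, a, b, s, iters):
    """Iterate x <- P_X[x - s * grad p(x)] with X the nonnegative orthant."""
    x = list(x0)
    for _ in range(iters):
        g = grad_p(x, A, balls, boxes, a, b)
        x = [max(xi - s * gi, 0.0) for xi, gi in zip(x, g)]
    return x

# Toy two-dimensional instance (made-up data, one ball and one box).
A_toy = [[1.0, 0.0], [0.0, 1.0]]
balls_toy = [([1.0, 1.0], 1.0)]
boxes_toy = [([0.5, 0.5], [1.5, 1.5])]
x_toy = projected_gradient([5.0, 5.0], A_toy, balls_toy, boxes_toy,
                           [0.5], [0.5], s=1.0, iters=60)
```

Each iteration costs one projection per set plus two matrix-vector products, which is the baseline against which splitting methods for the separable model are usually compared.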

According to the first-order optimality condition, the constrained MSFP is equivalent to finding $$w^{*}=(x^{*},y^{*},\lambda^{*})\in\mathcal {W}:=\mathcal{X}\times\mathcal{Y}\times \mathbb {R}^{m}$$ such that, for all $$\tilde{w}=(\tilde{x},\tilde{y},\tilde{\lambda})\in\mathcal{W}$$,

$$\left \{ \begin{array}{@{}l} \langle\tilde{x}-x^{*},f(x^{*})-A^{T}\lambda^{*}\rangle\geq0,\\ \langle\tilde{y}-y^{*},g(y^{*})-\lambda^{*}\rangle\geq0,\\ \langle\tilde{\lambda}-\lambda^{*},Ax^{*}-y^{*}\rangle\geq0, \end{array} \right .$$
(49)

where

$$f(x):=\nabla\theta_{1}(x)=\sum_{i=1}^{t_{1}}a_{i} \bigl(x-P_{C_{i}}(x) \bigr) \quad\mbox{and}\quad g(y):=\nabla\theta_{2}(y)= \sum_{j=1}^{t_{2}}b_{j} \bigl(y-P_{Q_{j}}(y) \bigr).$$

From , Lemma 3, $$f(x)$$ and $$g(y)$$ are Lipschitz continuous on $$\mathcal{X}$$ and $$\mathcal{Y}$$ with Lipschitz constants $$L_{1}=\sum_{i=1}^{t_{1}}a_{i}$$ and $$L_{2}=\sum_{j=1}^{t_{2}}b_{j}$$, respectively.
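The Lipschitz constants $$L_{1}$$ and $$L_{2}$$ can be checked empirically. The Python sketch below evaluates $$f$$ for ball-shaped sets $$C_{i}$$ and $$g$$ for box-shaped sets $$Q_{j}$$ and records the largest observed ratio $$\|f(u)-f(v)\|/\|u-v\|$$ over random pairs. The set shapes, sampling ranges, weights, and all names are assumptions made for this illustration. The check reflects the fact that each map $$I-P_{C}$$ is nonexpansive for a closed convex set $$C$$.

```python
import math
import random

def proj_ball(x, center, radius):
    """Projection onto the ball {z : ||z - center|| <= radius}."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(x)
    return [ci + radius * d / dist for ci, d in zip(center, diff)]

def proj_box(y, lower, upper):
    """Projection onto the box {z : lower <= z <= upper}."""
    return [min(max(yi, lo), up) for yi, lo, up in zip(y, lower, upper)]

def weighted_residual(point, weights, projections):
    """sum_k weights[k] * (point - P_k(point)); this is f (or g) in the text."""
    out = [0.0] * len(point)
    for wgt, proj in zip(weights, projections):
        p = proj(point)
        out = [o + wgt * (u - v) for o, u, v in zip(out, point, p)]
    return out

def worst_lipschitz_ratio(mapping, dim, trials, rng):
    """Largest observed ||F(u) - F(v)|| / ||u - v|| over random pairs."""
    worst = 0.0
    for _ in range(trials):
        u = [rng.uniform(-20.0, 100.0) for _ in range(dim)]
        v = [rng.uniform(-20.0, 100.0) for _ in range(dim)]
        fu, fv = mapping(u), mapping(v)
        num = math.sqrt(sum((p - q) ** 2 for p, q in zip(fu, fv)))
        den = math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))
        worst = max(worst, num / den)
    return worst

rng = random.Random(0)
dim = 3
a = [0.1, 0.2, 0.3, 0.4]   # hypothetical weights a_i
b = [0.5, 0.5]             # hypothetical weights b_j
ball_projs = []
for _ in a:
    c = [rng.uniform(0.0, 10.0) for _ in range(dim)]
    r = rng.uniform(40.0, 50.0)
    ball_projs.append(lambda z, c=c, r=r: proj_ball(z, c, r))
box_projs = []
for _ in b:
    lo = [rng.uniform(10.0, 30.0) for _ in range(dim)]
    up = [rng.uniform(40.0, 80.0) for _ in range(dim)]
    box_projs.append(lambda z, lo=lo, up=up: proj_box(z, lo, up))

ratio_f = worst_lipschitz_ratio(lambda z: weighted_residual(z, a, ball_projs), dim, 200, rng)
ratio_g = worst_lipschitz_ratio(lambda z: weighted_residual(z, b, box_projs), dim, 200, rng)
```

In this setting `ratio_f` never exceeds `sum(a)` and `ratio_g` never exceeds `sum(b)`, in agreement with the stated constants.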

Now, we consider the special MSFP problem (45) tested in  with the sets

\begin{aligned}& C_{i}=\bigl\{ x\in \mathbb {R}^{n}\mid \|x-D_{i}\|\leq R_{i}\bigr\} ,\quad i=1,\ldots ,t_{1},\\& Q_{j}=\{y\in \mathbb {R}^{n}\mid L_{j}\leq y\leq U_{j}\},\quad j=1,\ldots,t_{2}, \end{aligned}

where $$D_{i}\in \mathbb {R}^{n}$$ is the center of the ball $$C_{i}$$, with entries randomly generated in $$(0,10)$$; $$R_{i}\in \mathbb {R}$$ is the radius of the ball $$C_{i}$$ and is randomly generated in $$(40,50)$$; $$L_{j}$$ and $$U_{j}$$ are the lower and upper bounds of the box $$Q_{j}$$, with entries randomly generated in $$(10,30)$$ and $$(40,80)$$, respectively. The linear operator $$A\in \mathbb {R}^{n\times n}$$ is generated randomly with eigenvalues in $$(10,20)$$. The constraint sets $$\mathcal{X}$$ and $$\mathcal{Y}$$ in (45) are $$\mathbb {R}^{n}_{+}$$ and $$\mathbb {R}^{n}$$, respectively.
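A test instance of this kind could be generated along the following lines. The Python sketch is only one possible reading of the description: the text does not say how $$A$$ was constructed, so the choice $$A=H\,\mathrm{diag}(e)\,H$$ with a random Householder reflection $$H$$ (which gives a symmetric matrix with prescribed eigenvalues $$e$$) is an assumption, as are the equal weights $$a_{i}=b_{j}=1/(t_{1}+t_{2})$$ and all function names.

```python
import random

def matrix_with_eigenvalues(eigs, rng):
    """Return A = H diag(eigs) H, where H = I - 2 v v^T / (v^T v).

    H is symmetric and orthogonal, so A is symmetric with eigenvalues eigs.
    """
    n = len(eigs)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    vv = sum(t * t for t in v)
    H = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
         for i in range(n)]
    return [[sum(H[i][k] * eigs[k] * H[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def make_msfp_instance(n, t1, t2, seed=0):
    """Random balls C_i, boxes Q_j and operator A with the ranges quoted in the text."""
    rng = random.Random(seed)
    balls = [([rng.uniform(0.0, 10.0) for _ in range(n)], rng.uniform(40.0, 50.0))
             for _ in range(t1)]
    boxes = [([rng.uniform(10.0, 30.0) for _ in range(n)],
              [rng.uniform(40.0, 80.0) for _ in range(n)])
             for _ in range(t2)]
    eigs = [rng.uniform(10.0, 20.0) for _ in range(n)]
    weight = 1.0 / (t1 + t2)  # assumed equal weights summing to one
    return {
        "balls": balls,            # list of (center D_i, radius R_i)
        "boxes": boxes,            # list of (lower L_j, upper U_j)
        "eigs": eigs,
        "A": matrix_with_eigenvalues(eigs, rng),
        "a": [weight] * t1,
        "b": [weight] * t2,
    }

inst = make_msfp_instance(n=4, t1=3, t2=2, seed=1)
```

Fixing the seed makes the instance reproducible, so that all compared methods can be run on identical data.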

We also use the same initial iterate $$x^{0}=\mathbf{0}$$ and $$y^{0}=\lambda^{0}=\mathbf{1}$$ for HLQ, TY, ZHY, and NEW, where $$\mathbf{0}$$ and $$\mathbf{1}$$ are the vectors whose elements are all 0 and all 1, respectively. The parameters of the tested methods are chosen as follows: $$\nu=0.95$$, $$H=\beta I$$ with $$\beta=0.0002$$, $$\gamma=1.2$$, $$\mu=1.8$$, and $$r_{0}=s_{0}=1$$. The stopping criterion $$\|\hat{w}^{k}-w^{k}\|\leq \epsilon$$ is used for HLQ, TY, ZHY, and NEW.

We report the numerical performance of the methods on the MSFP for different choices of $$t_{1}$$ and $$t_{2}$$ in Table 1. The data in Table 1 demonstrate the effectiveness of the proposed method and its superiority to HLQ, TY, and ZHY when the dimension of the MSFP or the number of sets is large.

### Traffic equilibrium problems

In this subsection, we apply the proposed method to traffic equilibrium problems with link capacity bounds, which have been well studied in the transportation literature. Since both the travel cost and the travel disutility are functions of the path flow $$\mathbf{x}$$, the traffic network equilibrium problem with link capacity bounds is to find a path flow $$\mathbf{x}^{*}\in\Pi$$ such that

$$\bigl\langle \mathbf{x}-\mathbf{x}^{*},f\bigl( \mathbf{x}^{*}\bigr)\bigr\rangle \geq0, \quad\forall \mathbf{x}\in\Pi ,$$
(50)

with

$$\Pi= \bigl\{ \mathbf{x}\in \mathbb {R}^{n}:A^{T}\mathbf{x} \leq b \bigr\} ,$$
(51)

where $$\mathbf{x}\in \mathbb {R}^{n}$$ represents the traffic flow on paths, b is the vector indicating the capacities on the links, $$A\in \mathbb {R}^{n\times m}$$ is the path-link indicating matrix, and f is the vector indicating the traffic flows on the links. By introducing the slack variable $$\mathbf{y}\geq0$$, the traffic equilibrium problem (50) is equivalent to

$$\bigl\langle \mathbf{x}-\mathbf{x}^{*},f\bigl(\mathbf{x}^{*} \bigr)\bigr\rangle \geq0,\quad \forall \mathbf{x}\in\Omega ,$$
(52)

with

$$\Omega= \bigl\{ \mathbf{x}\in \mathbb {R}^{n}:A^{T} \mathbf{x}+\mathbf {y}=b,\mathbf{x}\geq0,\mathbf{y}\geq0 \bigr\} ,$$
(53)

which is a special case of the VI (1)-(2) with $$g(\mathbf{y})\equiv0$$, $$B=I$$, $$\mathcal{X}=\mathbb {R}^{n}_{+}$$, and $$\mathcal{Y}=\mathbb {R}^{m}_{+}$$. In particular, we test Examples 7.4 and 7.5 in  to compare the new method with HLQ  and TY . For Example 7.4, $$n=49$$, $$m=28$$, and $$A\in \mathbb {R}^{49\times28}$$. For Example 7.5, $$n=55$$, $$m=37$$, and $$A\in \mathbb {R}^{55\times37}$$. As recommended in , we choose the parameters in HLQ and TY as $$\nu=0.95$$, $$H=\beta I$$ with $$\beta=1.1$$, $$\gamma=1.85$$, $$\mu =1.25$$, $$r_{0}=1$$, and $$s_{0}=1.1$$. In our algorithm, we choose $$r_{0}=s_{0}=1.25$$. The parameters differ between methods because they were selected so that each method achieves its better results. For all methods, we use the same initial point $$x^{0}=\mathbf{1}$$, $$y^{0}=\lambda^{0}=\mathbf{0}$$ and the same stopping criterion $$\|\hat{w}^{k}-w^{k}\|\leq\epsilon$$.
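The slack-variable reformulation (52)-(53) amounts to a simple computation: the link flows are $$A^{T}\mathbf{x}$$, and the slack is $$\mathbf{y}=b-A^{T}\mathbf{x}$$, which must be nonnegative. The Python sketch below shows this on a toy network with three paths and two links. The network, the flows, and the function names are made up for illustration and are unrelated to Examples 7.4 and 7.5.

```python
def link_flows(A, x):
    """Link flows A^T x, where A[p][l] = 1 if path p uses link l (n x m matrix)."""
    n, m = len(A), len(A[0])
    return [sum(A[p][l] * x[p] for p in range(n)) for l in range(m)]

def slack(A, x, b):
    """Slack variable y = b - A^T x from the reformulation (53)."""
    return [bl - fl for bl, fl in zip(b, link_flows(A, x))]

def is_feasible(A, x, b, tol=1e-12):
    """Check x >= 0 and y = b - A^T x >= 0, that is, (x, y) satisfies (53)."""
    return all(xi >= -tol for xi in x) and all(yl >= -tol for yl in slack(A, x, b))

# Toy data: path 0 uses link 0, path 1 uses both links, path 2 uses link 1.
A_toy = [[1, 0], [1, 1], [0, 1]]
b_toy = [30.0, 30.0]
y_ok = slack(A_toy, [10.0, 5.0, 20.0], b_toy)
y_bad = slack(A_toy, [10.0, 25.0, 20.0], b_toy)
```

A negative entry of the slack vector, as in `y_bad`, indicates a link whose capacity bound is violated by the given path flow.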

For Examples 7.4 and 7.5, the numerical results of HLQ, TY, and NEW are reported in Table 2, where ‘No. of iter.’, ‘No. of f eval.’, and ‘CPU’ denote the number of iterations, the number of function evaluations, and the CPU time in seconds, respectively.

From Table 2, we can see that the proposed method with the new criterion is comparable to HLQ and superior to TY when the capacity $$b=30$$. For the capacity $$b=40$$, Table 2 shows that our method is more effective than HLQ and TY in terms of the number of iterations, number of function evaluations, and the CPU time.

Moreover, we report the optimal link flow generated by the proposed method. As illustrated in , the absolute value of the Lagrange multiplier $$\lambda^{*}$$ actually means the toll that should be charged on the links to avoid congestion. The numerical results of Examples 7.4 and 7.5  with the capacity $$b=40$$ are reported in Tables 3 and 4, respectively. We can see that no toll is charged on the links whose flows are lower than their capacities.
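The observation that no toll is charged on uncongested links is a complementarity relation between the multiplier and the slack: for each link $$l$$, either $$\lambda_{l}^{*}=0$$ or the link flow equals its capacity. The Python sketch below reports the toll $$|\lambda_{l}|$$ per link and flags any link that violates this relation. The numbers are made-up toy values and not results from Tables 3 and 4; the function names and the tolerance are likewise assumptions.

```python
def link_report(A, x, lam, b):
    """Per-link tuples (flow, capacity, toll), with toll = |lambda_l|."""
    n, m = len(A), len(A[0])
    flows = [sum(A[p][l] * x[p] for p in range(n)) for l in range(m)]
    return [(fl, bl, abs(ll)) for fl, bl, ll in zip(flows, b, lam)]

def complementarity_violations(report, tol=1e-8):
    """Indices of links that carry a toll although flow is below capacity."""
    return [l for l, (flow, cap, toll) in enumerate(report)
            if toll > tol and cap - flow > tol]

# Toy data: link 0 is uncongested (no toll), link 1 is at capacity (toll 2.5).
A_toy = [[1, 0], [1, 1], [0, 1]]
x_toy = [10.0, 5.0, 25.0]
lam_toy = [0.0, -2.5]
b_toy = [30.0, 30.0]
report = link_report(A_toy, x_toy, lam_toy, b_toy)
violations = complementarity_violations(report)
```

An empty list of violations is consistent with the interpretation of $$|\lambda^{*}|$$ as a congestion toll, whereas a nonempty list would suggest that the iterate is not yet accurate enough.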

## Conclusions

In this paper, we studied a new inexact criterion for solving convex problems and variational inequalities with separable structures. Based on the prediction-correction approach, two correction forms were derived. Preliminary numerical results on the MSFP and traffic equilibrium problems indicate that our method is efficient in practice. However, the reported numerical experiments for the MSFP are confined to artificial, randomly generated data. In the future, we shall test the inexact criterion on real-world split inversion problems.

## References

1. Bertsekas, DP, Tsitsiklis, JN: Parallel and Distributed Computation: Numerical Methods. Prentice Hall, Englewood Cliffs (1989)
2. Chen, G, Teboulle, M: A proximal-based decomposition method for convex minimization problems. Math. Program. 64, 81-101 (1994)
3. Chen, YM, Hager, WW, Yashtini, M, Ye, XJ, Zhang, HC: Bregman operator splitting with variable stepsize for total variation image reconstruction. Comput. Optim. Appl. 54, 317-342 (2013)
4. Facchinei, F, Pang, JS: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, New York (2003)
5. Fukushima, M: Application of the alternating direction of multipliers to separable convex programming problems. Comput. Optim. Appl. 2, 93-111 (1992)
6. Glowinski, R, Le Tallec, P: Augmented Lagrangian and Operator Splitting Methods in Nonlinear Mechanics. SIAM Studies in Applied Mathematics. SIAM, Philadelphia (1989)
7. Nagurney, A, Zhang, D: Projection Dynamical Systems and Variational Inequalities with Applications. Kluwer Academic, Boston (1996)
8. Glowinski, R, Marrocco, A: Sur l’approximation par éléments finis d’ordre un et la résolution par pénalisation-dualité d’une classe de problèmes de Dirichlet nonlinéaires. Rev. Fr. Autom. Inform. Rech. Opér., Anal. Numér. 2, 41-76 (1975)
9. He, BS, Liao, LZ, Han, DR, Yang, H: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92(1), 103-118 (2002)
10. Martinet, B: Régularisation d’inéquations variationnelles par approximations successives. Rev. Fr. Inform. Rech. Opér. 4, 154-159 (1970)
11. He, BS: Parallel splitting augmented Lagrangian method for solving for monotone structured variational inequalities. Comput. Optim. Appl. 42, 195-212 (2009)
12. Tao, M, Yuan, XM: An inexact parallel splitting augmented Lagrangian method for monotone variational inequalities with separable structures. Comput. Optim. Appl. 52, 439-461 (2012)
13. Zhang, WX, Han, DR, Yuan, XM: An efficient simultaneous method for the constrained multiple-sets split feasibility problem. Comput. Optim. Appl. 52, 825-843 (2012)
14. Cai, XJ, Gu, GY, He, BS: On the $$O(1/t)$$ convergence rate of the projection and contraction methods for variational inequalities with Lipschitz continuous monotone operators. Comput. Optim. Appl. 57, 339-363 (2014)
15. He, BS, Liao, LZ, Qian, MJ: Alternating projection based prediction-correction methods for structured variational inequalities. J. Comput. Math. 24, 693-710 (2006)
16. Byrne, C: A unified treatment of some iterative algorithms in signal processing and image reconstruction. Inverse Probl. 20, 103-120 (2004)
17. Censor, Y, Elfving, T, Kopf, N, Bortfeld, T: The multiple-sets split feasibility problem and its applications for inverse problems. Inverse Probl. 21, 2071-2084 (2005)
18. Censor, Y, Bortfeld, T, Martin, B, Trofimov, A: A unified approach for inversion problems in intensity-modulated radiation therapy. Phys. Med. Biol. 51, 2353-2365 (2006)

## Acknowledgements

The authors thank the anonymous referees for their valuable comments and suggestions, which helped to improve the manuscript. This research was supported by the National Natural Science Foundation of China (Grants: 11171362, 11301567 and 11401058), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant number: 20120191110031), and the Fundamental Research Funds for the Central Universities (Grant: CDJXS12101104).

## Author information

### Corresponding author

Correspondence to Xipeng Kou.

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

XPK and SJL organized and wrote this paper. XYW examined all the steps of the proofs in this research and gave some advice. All authors read and approved the final manuscript.

## Rights and permissions

Open Access This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
