In this section, we establish a thresholding representation theory for the problem \((QP_{a}^{\lambda})\), which underlies the algorithm to be proposed. We then propose an iterative fraction thresholding algorithm (IFTA) to solve the problem \((QP_{a}^{\lambda})\) for all \(a>0\).
3.1 Thresholding representation theory
For any fixed parameters \(\lambda>0\), \(\mu>0\), \(a>0\) and any \(x\in\mathbb{R}^{n}\), let
$$ C_{1}(x)= \bigl\Vert F(x)x-b \bigr\Vert _{2}^{2}+\lambda P_{a}(x) $$
(19)
and
$$ C_{2}(x, y)=\mu \bigl\Vert F(y)x-b \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x)-\mu \bigl\Vert F(y)x-F(y)y \bigr\Vert _{2}^{2}+ \Vert x-y \Vert _{2}^{2}. $$
(20)
It is clear that \(C_{2}(x,x)=\mu C_{1}(x)\) for all \(\mu>0\).
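This identity can be verified numerically. In the sketch below, the matrix-valued map \(F\) and the fraction penalty \(P_{a}(x)=\sum_{i}\frac{a \vert x_{i} \vert }{1+a \vert x_{i} \vert }\) are illustrative assumptions; the identity \(C_{2}(x,x)=\mu C_{1}(x)\) itself holds for any choice of \(F\) and \(P_{a}\).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, mu, a = 5, 8, 0.3, 0.1, 2.0
A, c, b = rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=m)

def F(y):
    # illustrative matrix-valued map F: R^n -> R^{m x n} (an assumption of this sketch)
    return A + np.outer(c, y)

def P_a(x):
    # fraction penalty P_a(x) = sum_i a|x_i| / (1 + a|x_i|)
    return np.sum(a * np.abs(x) / (1.0 + a * np.abs(x)))

def C1(x):
    # objective (19)
    return np.linalg.norm(F(x) @ x - b) ** 2 + lam * P_a(x)

def C2(x, y):
    # surrogate (20)
    Fy = F(y)
    return (mu * np.linalg.norm(Fy @ x - b) ** 2 + lam * mu * P_a(x)
            - mu * np.linalg.norm(Fy @ x - Fy @ y) ** 2
            + np.linalg.norm(x - y) ** 2)

x = rng.normal(size=n)
print(np.isclose(C2(x, x), mu * C1(x)))  # True: the last two terms of (20) vanish at y = x
```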
Theorem 1
Let \(\lambda>0\) and \(0<\mu<L_{\ast}^{-1}\), where \(L_{\ast}>0\) satisfies \(\Vert F(x^{\ast })x-F(x^{\ast})x^{\ast} \Vert _{2}^{2}\leq L_{\ast} \Vert x-x^{\ast} \Vert _{2}^{2}\) for all \(x\in\mathbb{R}^{n}\). If \(x^{\ast}\) is the optimal solution of \(\min_{x\in\mathbb{R}^{n}}C_{1}(x)\), then \(x^{\ast}\) is also the optimal solution of \(\min_{x\in\mathbb{R}^{n}}C_{2}(x,x^{\ast})\), that is,
$$C_{2}\bigl(x^{\ast},x^{\ast}\bigr)\leq C_{2} \bigl(x,x^{\ast}\bigr) $$
for any \(x\in\mathbb{R}^{n}\).
Proof
By the definition of \(C_{2}(x, y)\), we have
$$\begin{aligned} C_{2}\bigl(x,x^{\ast}\bigr)&=\mu \bigl\Vert F \bigl(x^{\ast}\bigr)x-b \bigr\Vert _{2}^{2}+\lambda \mu P_{a}(x)-\mu \bigl\Vert F\bigl(x^{\ast}\bigr)x-F \bigl(x^{\ast}\bigr)x^{\ast} \bigr\Vert _{2}^{2}+ \bigl\Vert x-x^{\ast} \bigr\Vert _{2}^{2} \\ &\geq\mu \bigl\Vert F\bigl(x^{\ast}\bigr)x-b \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x) \\ &\geq \mu C_{1}\bigl(x^{\ast}\bigr) \\ &=C_{2}\bigl(x^{\ast},x^{\ast}\bigr), \end{aligned}$$
where the first inequality follows from \(\mu \Vert F(x^{\ast})x-F(x^{\ast})x^{\ast} \Vert _{2}^{2}\leq\mu L_{\ast} \Vert x-x^{\ast} \Vert _{2}^{2}\leq \Vert x-x^{\ast} \Vert _{2}^{2}\), and the second from the optimality of \(x^{\ast}\) for \(C_{1}\).
□
Theorem 2
For any \(\lambda>0\), \(\mu>0\) and any solution \(x^{\ast}\) of \(\min_{x\in\mathbb{R}^{n}}C_{1}(x)\), the problem \(\min_{x\in\mathbb{R}^{n}}C_{2}(x,x^{\ast})\) is equivalent to
$$ \min_{x\in\mathbb{R}^{n}} \bigl\{ \bigl\Vert x-B_{\mu}\bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}+ \lambda\mu P_{a}(x) \bigr\} , $$
(21)
where \(B_{\mu}(x^{\ast})=x^{\ast}+\mu F(x^{\ast})^{\top }(b-F(x^{\ast})x^{\ast})\).
Proof
By the definition, \(C_{2}(x,y)\) can be rewritten as
$$\begin{aligned} C_{2}\bigl(x,x^{\ast}\bigr)={}& \bigl\Vert x- \bigl(x^{\ast}-\mu F\bigl(x^{\ast}\bigr)^{\top}F \bigl(x^{\ast }\bigr)x^{\ast}+\mu F\bigl(x^{\ast} \bigr)^{\top}b\bigr) \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x)+\mu \Vert b \Vert _{2}^{2}+ \bigl\Vert x^{\ast} \bigr\Vert _{2}^{2} \\ &{}-\mu \bigl\Vert F\bigl(x^{\ast}\bigr)x^{\ast} \bigr\Vert _{2}^{2}- \bigl\Vert x^{\ast}-\mu F \bigl(x^{\ast }\bigr)^{\top}F\bigl(x^{\ast} \bigr)x^{\ast}+\mu F\bigl(x^{\ast}\bigr)^{\top}b \bigr\Vert _{2}^{2} \\ ={}& \bigl\Vert x-B_{\mu}\bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x)+\mu \Vert b \Vert _{2}^{2}+ \bigl\Vert x^{\ast} \bigr\Vert _{2}^{2}-\mu \bigl\Vert F\bigl(x^{\ast} \bigr)x^{\ast} \bigr\Vert _{2}^{2}- \bigl\Vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}, \end{aligned}$$
Since the last four terms do not depend on \(x\), minimizing \(C_{2}(x,x^{\ast})\) over \(x\in\mathbb{R}^{n}\) for any \(\lambda>0\), \(\mu>0\) is equivalent to
$$\min_{x\in\mathbb{R}^{n}} \bigl\{ \bigl\Vert x-B_{\mu} \bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x) \bigr\} . $$
□
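The completing-the-square step above can be sanity-checked numerically: the gap \(C_{2}(x,x^{\ast})-(\Vert x-B_{\mu}(x^{\ast})\Vert_{2}^{2}+\lambda\mu P_{a}(x))\) should be a constant independent of \(x\), so both problems share the same minimizer. The toy map \(F\) and parameter values below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, mu, a = 5, 8, 0.3, 0.1, 2.0
A, c, b = rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=m)
F = lambda y: A + np.outer(c, y)                        # illustrative matrix-valued map
P_a = lambda x: np.sum(a * np.abs(x) / (1 + a * np.abs(x)))  # fraction penalty

def C2(x, y):
    # surrogate (20)
    Fy = F(y)
    return (mu * np.linalg.norm(Fy @ x - b) ** 2 + lam * mu * P_a(x)
            - mu * np.linalg.norm(Fy @ (x - y)) ** 2
            + np.linalg.norm(x - y) ** 2)

x_star = rng.normal(size=n)
B = x_star + mu * F(x_star).T @ (b - F(x_star) @ x_star)  # B_mu(x*)

# the gap is the same for every x, confirming the equivalence with (21)
gaps = [C2(x, x_star) - (np.linalg.norm(x - B) ** 2 + lam * mu * P_a(x))
        for x in rng.normal(size=(4, n))]
print(np.allclose(gaps, gaps[0]))  # True
```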
Combining Theorem 1, Theorem 2 and Lemma 1, the thresholding representation of \((QP_{a}^{\lambda})\) can be written as
$$ x^{\ast}=G_{a,\lambda\mu}\bigl(B_{\mu} \bigl(x^{\ast}\bigr)\bigr), $$
(22)
where the operator \(G_{a,\lambda\mu}\) is obtained from Definition 1 by replacing λ with λμ. With the thresholding representation (22), the IFTA for solving the regularization problem \((QP_{a}^{\lambda})\) can be naturally defined as
$$ x^{k+1}=G_{a, \lambda\mu}\bigl(B_{\mu} \bigl(x^{k}\bigr)\bigr),\quad k=0,1,2,\ldots, $$
(23)
where \(B_{\mu}(x^{k})=x^{k}+\mu F(x^{k})^{\top}(b-F(x^{k})x^{k})\).
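Iteration (23) can be sketched as below. The thresholding operator \(G_{a,\lambda\mu}\) is passed in as a callable, since its closed form comes from Definition 1. In the usage part, a constant \(F\) and plain soft-thresholding are stand-ins for illustration only (soft-thresholding is not the fraction operator \(G_{a,\lambda\mu}\)); with these stand-ins the scheme reduces to the classical ISTA iteration.

```python
import numpy as np

def ifta(G, F, b, x0, mu, n_iter=200):
    """Sketch of iteration (23): x^{k+1} = G(B_mu(x^k)).

    G -- componentwise thresholding operator (a stand-in for G_{a, lam*mu}
         from Definition 1, supplied by the caller).
    F -- matrix-valued map y -> F(y).
    """
    x = x0.copy()
    for _ in range(n_iter):
        Fx = F(x)
        x = G(x + mu * Fx.T @ (b - Fx @ x))  # threshold the gradient-type step B_mu(x^k)
    return x

# usage sketch: constant F and soft-thresholding (ISTA-like stand-in)
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
mu = 0.9 / np.linalg.norm(A, 2) ** 2                       # step size below 1/||A||_2^2
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 1e-3, 0.0)
x_hat = ifta(soft, lambda y: A, b, np.zeros(20), mu, n_iter=500)
res = np.linalg.norm(A @ x_hat - b)                         # drops below the initial ||b||
```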
3.2 Adjusting the values for the regularization parameter \(\lambda>0\)
In this subsection, the cross-validation method (see [9, 10, 12]) is adopted to automatically adjust the value of the regularization parameter \(\lambda>0\). When some prior information (such as the sparsity level) is known for a regularization problem, this selection is more reasonable and intelligent. Suppose that the vector \(x^{\ast}\) of sparsity \(r\) is the optimal solution of the regularization problem \((QP_{a}^{\lambda})\); without loss of generality, assume
$$\bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{1}\geq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{2}\geq\cdots\geq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{r}\geq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{r+1}\geq \cdots \geq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{n}\geq0. $$
Then it follows from (14) that
$$\begin{aligned} &\bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{i}>t_{a,\lambda\mu}^{\ast}\quad\Leftrightarrow\quad i\in\{1,2,\ldots,r \}, \\ &\bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{i}\leq t_{a,\lambda\mu}^{\ast}\quad\Leftrightarrow\quad i\in\{r+1,r+2, \ldots,n\}, \end{aligned}$$
where \(t_{a,\lambda\mu}^{\ast}\) is obtained by replacing λ with λμ in \(t_{a,\lambda}^{\ast}\).
Since \(t_{a,\lambda\mu}^{2}\leq t_{a,\lambda\mu}^{\ast}\leq t_{a,\lambda\mu}^{1}\), we have
$$ \textstyle\begin{cases} \vert B_{\mu}(x^{\ast}) \vert _{r}\geq t_{a,\lambda\mu}^{\ast}\geq t_{a,\lambda\mu}^{2}=\sqrt{\lambda\mu}-\frac{1}{2a}; \\ \vert B_{\mu}(x^{\ast}) \vert _{r+1}< t_{a,\lambda\mu}^{\ast}\leq t_{a,\lambda \mu}^{1}=\frac{\lambda\mu}{2}a. \end{cases} $$
(24)
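For clarity, solving each inequality in (24) for \(\lambda\) gives
$$\begin{aligned} & \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{r+1}< \frac{\lambda\mu}{2}a \quad\Rightarrow\quad \lambda>\frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}, \\ & \sqrt{\lambda\mu}\leq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{r}+\frac{1}{2a}=\frac{2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1}{2a} \quad\Rightarrow\quad \lambda\leq\frac{(2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1)^{2}}{4a^{2}\mu}. \end{aligned}$$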
It follows that
$$ \frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}\leq\lambda\leq\frac {(2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1)^{2}}{4a^{2}\mu}. $$
(25)
From (25), we obtain
$$\lambda\in \biggl[\frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}, \frac {(2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1)^{2}}{4a^{2}\mu} \biggr]. $$
We denote by \(\lambda_{1}\) and \(\lambda_{2}\) the left and right endpoints of the above interval, respectively:
$$\lambda_{1}=\frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}\quad \text{and}\quad \lambda_{2}= \frac{(2a \vert B_{\mu}(x^{\ast }) \vert _{r}+1)^{2}}{4a^{2}\mu}. $$
A choice of λ is
$$\lambda=\textstyle\begin{cases} \lambda_{1} & \text{if } \lambda_{1}\leq\frac{1}{a^{2}\mu }; \\ \lambda_{2} & \text{if } \lambda_{1}>\frac{1}{a^{2}\mu}. \end{cases} $$
Since \(x^{\ast}\) is unknown and \(x^{k}\) is the best available approximation to \(x^{\ast}\), we can take
$$ \lambda=\textstyle\begin{cases} \lambda_{1,k}=\frac{2 \vert B_{\mu}(x^{k}) \vert _{r+1}}{a\mu} & \text{if } \lambda_{1,k}\leq\frac{1}{a^{2}\mu}; \\ \lambda_{2,k}=\frac{(2a \vert B_{\mu}(x^{k}) \vert _{r}+1)^{2}}{4a^{2}\mu} & \text{if } \lambda_{1,k}>\frac{1}{a^{2}\mu}, \end{cases} $$
(26)
in the \(k\)th iteration. That is, (26) can be used to automatically adjust the value of the regularization parameter \(\lambda>0\) during the iteration.
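As an implementation sketch of rule (26), assuming 0-based array indexing (so the paper's \(\vert B_{\mu}(x^{k}) \vert _{r}\) and \(\vert B_{\mu}(x^{k}) \vert _{r+1}\) are the entries `mags[r-1]` and `mags[r]` below) and a hypothetical helper name:

```python
import numpy as np

def choose_lambda(B, r, a, mu):
    """Sketch of the parameter rule (26); `choose_lambda` is a hypothetical name.

    B -- the vector B_mu(x^k);  r -- the target sparsity.
    With 0-based indexing, the paper's |B|_r and |B|_{r+1} are
    mags[r-1] and mags[r] below.
    """
    mags = np.sort(np.abs(B))[::-1]             # |B|_1 >= |B|_2 >= ... >= |B|_n
    lam1 = 2.0 * mags[r] / (a * mu)             # lambda_{1,k}
    if lam1 <= 1.0 / (a ** 2 * mu):
        return lam1
    return (2.0 * a * mags[r - 1] + 1.0) ** 2 / (4.0 * a ** 2 * mu)  # lambda_{2,k}
```

For example, with \(B=(3,2,1,0.1)\), \(r=2\) and \(a=\mu=1\), we get \(\lambda_{1,k}=2>1/(a^{2}\mu)=1\), so the rule falls into the \(\lambda_{2,k}\) branch and returns \((2\cdot2+1)^{2}/4=25/4\).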
Remark 1
Notice that (26) is valid for any \(\mu>0\) satisfying \(0<\mu \leq \Vert F(x^{k}) \Vert _{2}^{-2}\). In general, we can take \(\mu=\mu _{k}=\frac{1-\epsilon}{ \Vert F(x^{k}) \Vert _{2}^{2}}\) with any small \(\epsilon\in(0,1)\). In particular, the threshold value is \(t_{a,\lambda\mu}^{\ast}=\frac{\lambda\mu}{2}a\) when \(\lambda=\lambda_{1,k}\), and \(t_{a,\lambda\mu}^{\ast}=\sqrt{\lambda\mu}-\frac{1}{2a}\) when \(\lambda=\lambda_{2,k}\).
3.3 Iterative fraction thresholding algorithm (IFTA)
Based on the thresholding representation (23) and the analysis in Sect. 3.2, the proposed iterative fraction thresholding algorithm (IFTA) for the regularization problem \((QP_{a}^{\lambda})\) is described in Algorithm 1.
Remark 2
The convergence of IFTA is not proved in this paper; a theoretical convergence analysis is left for future work.