Efficiency of orthogonal super greedy algorithm under the restricted isometry property
Journal of Inequalities and Applications volume 2019, Article number: 124 (2019)
Abstract
We investigate the efficiency of the orthogonal super greedy algorithm (OSGA) for sparse recovery and approximation under the restricted isometry property (RIP). We first show that, under RIP conditions on the measurement matrix Φ and on the minimum magnitude of the nonzero coordinates of the signal, for an \(l_{2}\) bounded or \(l_{\infty }\) bounded noise vector e, with explicit stopping rules, OSGA can recover the support of an arbitrary K-sparse signal x from \(y=\varPhi x +e\) in at most K steps. Then, we investigate the error performance of OSGA in m-term approximation with respect to dictionaries satisfying the RIP in a separable Hilbert space. We establish a Lebesgue-type inequality for OSGA. Based on this inequality, we obtain the optimal rate of convergence for the sparse class induced by such dictionaries.
1 Introduction
Recovery and approximation by sparse linear combinations of elements from a fixed redundant family are frequently utilized in many application areas, such as image and signal processing, PDE solvers, and statistical learning; see [1]. In general, these problems are NP-hard. It is well known that greedy type algorithms are efficient approaches to solving them; see [2,3,4,5]. Among others, the orthogonal greedy algorithm (OGA) has been widely used in practice. OGA is a simple yet powerful algorithm for highly nonlinear sparse approximation that has attracted a large amount of research over its history; see [3, 6, 7] and the references therein.
In this paper, we consider the orthogonal super greedy algorithm (OSGA), which is more efficient than OGA from the viewpoint of computational complexity. The performance of OSGA in greedy approximation and signal recovery was analyzed in [8,9,10,11,12]. We further study the efficiency of OSGA from two aspects.
In Sect. 2, we study the efficiency of the OSGA in recovering N-dimensional sparse signals from linear measurements. This topic is also known in the literature as compressed sensing (CS) [13,14,15]. We show that OSGA can recover a K-sparse signal from noisy measurements in at most K steps. We remark that in the field of signal processing, the OSGA is also known as orthogonal multi-matching pursuit (OMMP) [4]. So, for the reader's convenience, we will use the term OMMP instead of OSGA in Sect. 2.
In Sect. 3, we study the error performance of the OSGA in the general context. We investigate the efficiency of OSGA for m-term approximation with regard to dictionaries in a real separable Hilbert space H. Assuming that the dictionary satisfies the RIP condition, we establish a Lebesgue-type inequality for OSGA which bounds the error of OSGA by the best m-term approximation error. Based on this inequality, we derive the sharp convergence rate of OSGA on the sparse class induced by dictionaries satisfying the RIP condition.
2 The efficiency of the OMMP in CS
In this section, we analyze the efficiency of the OMMP in compressed sensing.
We consider the following model. Suppose that \(x\in \mathbb{R}^{N}\) is an unknown N-dimensional signal and we wish to recover it from M noisy measurements y given by inner products with fixed vectors, that is,
$$ y=\varPhi x +e, \tag{2.1} $$
where Φ is a known \(M\times N \) measurement matrix with \(M\ll N\) and \(e\in \mathbb{R}^{M}\) is a vector of measurement errors. The error vector e may be zero, bounded, or Gaussian noise.
For \(x=(x_{j})_{j=1}^{N}\in \mathbb{R}^{N}\), define
and
The signal \(x\in \mathbb{R}^{N}\) is said to be K-sparse if \(\|x\|_{0}:=|\textrm{supp}(x)|=|\{i: x_{i}\neq 0\}|\leq K< N\). We study the performance of OMMP for the recovery of the support of the K-sparse signal x under model (2.1).
Greedy algorithms recover a K-sparse signal by iteratively constructing its support according to a greedy selection rule.
The OMMP is a stepwise forward selection method and is easy to implement; it has been widely used for signal recovery. Let us now recall the definition of OMMP(s) in an algorithmic way (see Algorithm 1).
OMMP(s) generalizes the orthogonal greedy algorithm (OGA) in the sense that it selects s indices in each iteration; it can therefore recover the sparse signal in fewer steps and further reduce the complexity. To investigate the efficiency of the OMMP in CS, we use the restricted isometry property (RIP) of Φ, which ensures the stable recovery of x from noisy measurements. This property was introduced by Candes and Tao [16, 17] as follows.
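The selection-and-projection loop of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `ommp`, the toy fixed-iteration stopping rule, and all parameter choices are ours, not the paper's.

```python
import numpy as np

def ommp(y, Phi, K, s):
    """Sketch of OMMP(s)/OSGA(s): per iteration, pick the s columns of Phi
    most correlated with the current residual, then re-project y onto the
    span of all columns selected so far.  Runs a fixed number K of
    iterations; noise-adapted stopping rules could replace the fixed count."""
    support = []
    r = y.astype(float).copy()
    coef = np.zeros(0)
    for _ in range(K):
        corr = np.abs(Phi.T @ r)
        corr[support] = -np.inf            # never reselect an index
        new = np.argsort(corr)[-s:]        # indices of the s largest correlations
        support.extend(int(i) for i in new)
        # orthogonal projection step: least squares on the selected columns
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat, sorted(support)
```

With \(s=1\) this reduces to the standard OMP/OGA iteration; larger s selects more indices per projection and so needs fewer least-squares solves.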
A matrix Φ is said to satisfy the RIP of order K if there exists a constant \(\delta \in (0,1)\) such that, for all K-sparse vectors x,
$$ (1-\delta ) \Vert x \Vert _{2}^{2}\leq \Vert \varPhi x \Vert _{2}^{2}\leq (1+\delta ) \Vert x \Vert _{2}^{2}. \tag{2.2} $$
In particular, the minimum of all δ satisfying (2.2) is referred to as an isometry constant \(\delta _{K}\).
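For small matrices, the isometry constant can be computed by brute force directly from this definition, since \(\delta _{K}\) is the largest deviation from 1 of an eigenvalue of \(\varPhi _{S}^{\ast }\varPhi _{S}\) over all supports S of size K. The helper below is our own illustration (exponential in N, so usable only for toy sizes, not a practical tool):

```python
import itertools
import numpy as np

def rip_constant(Phi, K):
    """Brute-force delta_K: for every K-column submatrix Phi_S, the
    eigenvalues of Phi_S^T Phi_S lie in [1 - delta_K, 1 + delta_K];
    take the worst eigenvalue deviation from 1 over all supports S."""
    N = Phi.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(N), K):
        eigs = np.linalg.eigvalsh(Phi[:, list(S)].T @ Phi[:, list(S)])
        delta = max(delta, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))
    return delta
```

A matrix with orthonormal columns has \(\delta _{K}=0\) for every K, while any unit-norm-column matrix has \(\delta _{2}<1\) unless two columns are collinear.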
We recall some results on the efficiency of OMMP based on the RIP. These results concern estimates of the upper bound of the order-sK RIP constant \(\delta _{sK}\). In the noiseless case, Liu and Temlyakov [8] proved that \(\delta _{sK}<\frac{\sqrt{s}}{(2+\sqrt{2}) \sqrt{K}}\) is a sufficient condition for OMMP(s) to successfully recover every K-sparse signal in at most K iterations. Then, Wang, Kwon, and Shim [10] relaxed the condition to \(\delta _{sK}<\frac{ \sqrt{s}}{3\sqrt{s}+\sqrt{K}}\). The condition was further relaxed to \(\delta _{sK}<\frac{\sqrt{s}}{2\sqrt{s}+\sqrt{K}}\) by Satpathi, Das, and Chakraborty [18], who also pointed out the possibility of extending the results to the noisy case. On the other hand, Wei [9] defined the restricted orthogonality constant \(\theta _{K,K'}\) and proved that \(\delta _{sK-s+1}+\sqrt{\frac{K}{s}}\theta _{sK-s+1,s}<1\) is a sufficient condition guaranteeing the perfect recovery of K-sparse signals by OMMP, and showed that this result implies that of [18]. Moreover, in [9] the performance of OMMP for support recovery from noisy measurements was also discussed.
Motivated by the above works, we investigate the OMMP in the general setting where noise is present, under RIP based conditions. We prove that if the sampling matrix Φ satisfies the RIP of order sK with isometry constant
$$ \delta _{sK}< 1+\frac{1}{2}\sqrt{\frac{K}{s}}-\frac{1}{2}\sqrt{\frac{K}{s}+4\sqrt{\frac{K}{s}}}, $$
then OMMP can recover every K-sparse signal in at most K steps under \(l_{2}\) bounded or \(l_{\infty }\) bounded noise. It is easy to check that
So our results improve and extend the results in [18].
When dealing with the noisy measurements, one of the key components in OMMP is the stopping rule which depends on the structure of the noise. In general, there are several natural stopping rules for OMMP.
- (1) Stop after a fixed number of iterations K.
- (2) For the \(l_{2}\) bounded noise case: the error vector e is bounded in \(l_{2}\) norm with \(\|e\|_{2}\leq b_{2}\); we set the stopping rule as \(\|r^{k}\|_{2}\leq b_{2}\).
- (3) For the \(l_{\infty }\) bounded noise case: the error vector e is bounded with \(\|\varPhi ^{\ast }e\|_{\infty }\leq b _{\infty }\); we set the stopping rule as
$$ \bigl\Vert \varPhi ^{\ast }r^{k} \bigr\Vert _{\infty } \leq \biggl(1+\frac{\sqrt{sK}\delta _{sK}}{1-\delta _{sK}} \biggr)b_{\infty }, $$
where \(\varPhi ^{\ast }\) is the transpose of Φ.
Since we mainly consider \(l_{2}\) bounded noise and \(l_{\infty }\) bounded noise, we use stopping rules (2) and (3) respectively. In Sect. 2.1, some notations are introduced and some consequences of restricted isometry properties are presented. In Sect. 2.2, we give the RIP based sufficient conditions for OMMP(s) with \(l_{2}\) bounded noise case by using the stopping rule (2). In Sect. 2.3, we establish the results in the \(l_{\infty }\) bounded noise case by using the stopping rule (3). In Sect. 2.4, we give the RIP based sufficient conditions for OMMP(s) with high probability in the Gaussian noise case.
2.1 Preliminary
The following notations will be used in this section. For a signal \(x\in \mathbb{R}^{N}\), let \(\varOmega =\{1,\ldots , N\}\) denote the index set of x and \(T=\mathrm{supp}(x)=\{i: x_{i}\neq 0\}\) the support of it.
Suppose that \(\phi _{1},\phi _{2},\ldots,\phi _{N}\) are the columns of the matrix Φ. We always assume \(\|\phi _{i}\|_{2}=1\), \(1\leq i\leq N\). Suppose that Λ is a subset of Ω; let \(\varPhi _{\varLambda }\) denote the matrix Φ restricted to the columns indexed by Λ. We define \(x_{\varLambda }\) analogously for a vector \(x\in \mathbb{R}^{N}\). Thus
We recall some useful consequences of the RIP.
Lemma 2.1
([5])
Suppose that Γ and Λ are two disjoint sets of indices. If the matrix Φ satisfies the RIP of order \(|\varGamma \cup \varLambda |\) with isometry constant \(\delta _{|\varGamma \cup \varLambda |}\), then for any \(x\in \mathbb{R}^{| \varLambda |}\) we have
and
Lemma 2.2
([6])
Let \(x\in \mathbb{R}^{N}\) be a K-sparse vector. Suppose that the matrix Φ satisfies the RIP of order K with isometry constant \(\delta _{K}\). Then, for \(\varLambda \subset \varOmega \),
where \(\varPhi ^{\dagger }=(\varPhi ^{\ast }\varPhi )^{-1}\varPhi ^{\ast }\) is the Moore–Penrose pseudo-inverse of Φ.
The residual vector after k steps can be written as
where \(P_{k}=\varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{\dagger }\) denotes the projection onto the linear space spanned by \(\{\phi _{i}, i\in \varLambda ^{k}\}\), \(r_{1}^{k}=(I-P_{k})\varPhi x \) is the signal part of the residual, and \(r_{2}^{k}=(I-P_{k})e\) is the noise part of the residual.
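This decomposition is easy to verify numerically. The snippet below (toy sizes and index sets of our own choosing) builds \(P_{k}\) from a pseudo-inverse, checks that \(r^{k}=r_{1}^{k}+r_{2}^{k}\), and also checks the explicit formula \(\varPhi ^{\dagger }=(\varPhi ^{\ast }\varPhi )^{-1}\varPhi ^{\ast }\) used in Lemma 2.2:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 20, 40
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm columns, as assumed

x = np.zeros(N)
x[[3, 9]] = [2.0, -1.0]                      # a 2-sparse signal
e = 0.01 * rng.standard_normal(M)            # measurement noise
y = Phi @ x + e

Lam = [3, 15]                                # a candidate index set Lambda^k
Sub = Phi[:, Lam]
P = Sub @ np.linalg.pinv(Sub)                # projection onto span{phi_i : i in Lam}

r = y - P @ y                                # residual r^k = (I - P_k) y
r1 = (np.eye(M) - P) @ (Phi @ x)             # signal part of the residual
r2 = (np.eye(M) - P) @ e                     # noise part of the residual
```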
Denote:
Suppose that \(S_{k}\subseteq \varOmega \setminus B_{k}\), \(|S_{k}|=s\), such that
The difference between OMMP(s) and the standard orthogonal matching pursuit (OMP) is that at each step OMMP(s) simultaneously selects s elements from the dictionary instead of one. For a sufficient condition guaranteeing that OMMP(s) selects at least one correct index at each iteration, we need a lower bound on \(M_{A_{k}}-m_{S_{k}}\). We therefore provide the following lemma, which plays an important role in our analysis.
Lemma 2.3
Let \(m_{S_{k}}:=\min_{i\in S_{k}} \{ |\langle r_{1}^{k},\phi _{i} \rangle |\}\) and \(M_{A_{k}}:=\max_{i\in A_{k}} \{|\langle r_{1}^{k},\phi _{i} \rangle |\}\). Assume \(A_{k}\neq \emptyset \). Then there hold
and
Proof
Using the fact \((I-P_{k})\varPhi _{\varLambda ^{k}}=\varPhi _{\varLambda ^{k}}- \varPhi _{\varLambda ^{k}}(\varPhi _{\varLambda ^{k}}^{\ast }\varPhi _{\varLambda ^{k}})^{-1} \varPhi _{\varLambda ^{k}}^{\ast }\varPhi _{\varLambda ^{k}}=0\), we express
as
Since \(\varLambda ^{k}\cap A_{k}=\emptyset \), applying (2.5) and (2.4) of Lemma 2.1, and the fact that for any two integers \(K\leq K'\) it holds
it follows that
By (2.9), (2.11), and Lemma 2.1, we have
Thus
Using (2.9) and Lemma 2.2, we obtain
Hence
Therefore Lemma 2.3 is proved. □
2.2 \(l_{2}\) bounded noise
We shall discuss the performance of OMMP(s) in the \(l_{2}\) bounded noise case. That is, the error vector e is bounded in \(l_{2}\) norm with \(\|e\|_{2}\leq b_{2}\) for some constant \(b_{2}>0\). With a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the K-sparse signal x, OMMP(s) can recover the support of x in at most K iterations.
Theorem 2.1
Suppose that \(\|e\|_{2}\leq b_{2}\) and Φ satisfies the RIP of order sK with the isometry constant \(\delta :=\delta _{sK}<1+ \frac{1}{2}\sqrt{\frac{K}{s}}-\frac{1}{2}\sqrt{\frac{K}{s}+4\sqrt{ \frac{K}{s}}}\). Then OMMP(s) with the stopping rule \(\|r^{k}\|_{2} \leq b_{2}\) recovers the support of any K-sparse signal \(x\in \mathbb{R}^{N}\) from \(y=\varPhi x +e\) provided that all the nonzero components \(x_{i}\) satisfy
Proof
Denote \(E_{k}=\max_{i\in \varOmega } \{|\langle r_{2}^{k},\phi _{i} \rangle |\}\). Recalling the definition of \(m_{S_{k}}\) and \(M_{A_{k}}\) in Lemma 2.3, there hold
and
It is clear that in order for OMMP(s) to select at least one correct index at this step, it is necessary to have
A sufficient condition to guarantee OMMP(s) to select at least one correct index at each iteration until all correct indices are selected is
That is,
holds for \(k=0,1,\ldots ,K-1\). We first provide an upper bound for \(E_{k}\). The assumption \(\|e\|_{2}\leq b_{2}\) yields that
Next we give a lower bound of \(M_{A_{k}}- m_{S_{k}}\). Using Lemma 2.3 and assumption (2.14), we have
This means that OMMP(s) will select at least one correct index at the \((k+1)\)-th iteration, and hence OMMP(s) can recover the support of x in at most K iterations. It remains to show that, under the stopping rule \(\|r^{k}\|_{2}\leq b_{2}\), OMMP(s) stops exactly when all the correct indices have been selected.
First, assume that \(A_{k}=\emptyset \), then \(T\subseteq \varLambda ^{k}\) and \((I-P_{k})\varPhi x=0\). Thus
So, when all the K correct indices are selected, the \(l_{2}\) norm of the residual is less than \(b_{2}\) and hence OMMP(s) stops. Now we show that OMMP(s) does not stop early. Suppose that OMMP(s) has run k steps for some \(k< K\). We will verify that \(\|r^{k}\|_{2}>b _{2}\), so OMMP(s) does not stop at the current step.
Secondly, assume that \(A_{k}\neq \emptyset \), then
where \(\overline{x}^{(k)}= (x_{A_{k}}\ -\varPhi _{\varLambda ^{k}} ^{\dagger }\varPhi _{A_{k}}x_{A_{k}} )^{\ast }\), and to derive the third inequality, we use the RIP condition. Hence OMMP(s) does not stop early, and the theorem is proved. □
2.3 \(l_{\infty }\) bounded noise
We now give the RIP based sufficient conditions for OMMP(s) with \(l_{\infty }\) bounded noise. Our result is the following theorem.
Theorem 2.2
Suppose that \(\|\varPhi ^{\ast }e\|_{\infty }\leq b_{\infty }\) and Φ satisfies the RIP of order sK with the same isometry constant as in Theorem 2.1. Then OMMP(s) with the stopping rule
finds the support of any K-sparse signal x if all the nonzero coefficients \(x_{i}\) satisfy
Proof
Denote \(E_{k}=\|\varPhi ^{\ast }(I-P_{k})e\|_{\infty }=\max_{i\in \varOmega } |\langle r_{2}^{k},\phi _{i} \rangle |\). We first prove that
holds for \(k=0,1,\ldots ,K-1\), where \(m_{S_{k}}\) and \(M_{A_{k}}\) are defined as in Lemma 2.3. It follows from the proof of Theorem 2.1 that inequality (2.18) can ensure OMMP recovers the true support of x in at most K steps.
We first provide an upper bound of \(E_{k}\). The assumption \(\|\varPhi ^{\ast }e\|_{\infty }\leq b_{\infty }\) and the fact \(\varPhi _{\varLambda ^{k}}^{\ast }(I-P_{k})e=0\) yield that
Here, to derive the third and fourth inequalities, we have used (2.4) and (2.5) of Lemma 2.1 respectively.
On the other hand, we can obtain a lower bound of \(M_{A_{k}}- m_{S _{k}}\). By using Lemma 2.3 and assumption (2.17), we have
The bounds (2.19) and (2.20) imply that (2.18) holds for \(k=0,1,\ldots ,K-1\). It remains to show that, under the stopping rule \(\|\varPhi ^{\ast }r^{k}\|_{\infty }\leq (1+\frac{\sqrt{sK}\delta _{sK}}{1- \delta _{sK}})b_{\infty }\), OMMP(s) stops exactly when all the correct indices have been selected.
First, assume that \(A_{k}=\emptyset \), then \(T\subseteq \varLambda ^{k}\) and \((I-P_{k})\varPhi x=0\). Thus
where the last inequality is deduced from (2.19). Hence, OMMP(s) stops when all the correct indices are selected.
Secondly, assume that \(A_{k}\neq \emptyset \), then
Here, to derive the third and fourth inequalities, we have used (2.8) of Lemma 2.3 and assumption (2.17), respectively. Hence OMMP(s) does not stop early, and the theorem is proved. □
2.4 Gaussian noise
As an application of the above results on the \(l_{2}\) and \(l_{\infty }\) bounded noise cases, we shall discuss the performance of OMMP(s) in recovering K-sparse signals under Gaussian noise. Gaussian noise is of particular interest in statistics since it is probably the best model of real noise when the noise source is particularly complex, cf. [19, 20].
The motivation for applying the results on the \(l_{2}\) bounded and \(l_{\infty }\) bounded noise cases to the Gaussian noise case comes from the following results of Cai, Xu, and Zhang [21], which show that Gaussian noise e is essentially bounded in both the \(l_{2}\) and \(l_{\infty }\) norms. They have shown that if e is zero-mean white Gaussian noise with covariance \(\sigma ^{2}I_{M\times M}\), that is, \(e\sim N(0,\sigma ^{2}I_{M\times M})\), then there hold
$$ P \bigl( \Vert e \Vert _{2}\leq \sigma \sqrt{M+2\sqrt{M\log M}} \bigr) \geq 1-\frac{1}{M} $$
and
$$ P \bigl( \bigl\Vert \varPhi ^{\ast }e \bigr\Vert _{\infty }\leq \sigma \sqrt{2\log N} \bigr)\geq 1-\frac{1}{2\sqrt{\pi \log N}}. $$
Combining the above results with those of Theorem 2.1 and Theorem 2.2, we immediately obtain the following theorems.
Theorem 2.3
Suppose that \(e\sim N(0,\sigma ^{2}I_{M\times M})\) and Φ satisfies the RIP of order sK with the same isometry constant as in Theorem 2.1 and nonzero components \(x_{i}\) satisfy
Then OMMP(s) with the stopping rule \(\|r^{k}\|_{2}\leq \sigma \sqrt{M+2\sqrt{M \log M}}\) finds the support of any K-sparse signal \(x\in \mathbb{R} ^{N}\) from \(y=\varPhi x +e\) with probability at least \(1-\frac{1}{M}\).
Theorem 2.4
Suppose that \(e\sim N(0,\sigma ^{2}I_{M\times M})\) and Φ satisfies the RIP of order sK with the same isometry constant as in Theorem 2.1 and nonzero components \(x_{i}\) satisfy
Then OMMP(s) with the stopping rule \(\|\varPhi ^{\ast }r^{k}\|_{\infty }\leq (1+\frac{\sqrt{sK}\delta }{1-\delta })\sigma \sqrt{2\log N}\) finds the support of any K-sparse signal \(x\in \mathbb{R}^{N}\) from \(y=\varPhi x +e\) with probability at least \(1-\frac{1}{2\sqrt{\pi \log N}}\).
Theorem 2.3 and Theorem 2.4 show that, with a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the K-sparse signal x, OMMP(s) can recover the support of x in at most K iterations, with high probability, from measurements with Gaussian noise.
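The \(l_{2}\) bound of [21] behind Theorem 2.3 can be probed by simulation. In the sketch below, the parameters M, σ, and the trial count are our illustrative choices; it draws Gaussian noise vectors and checks that the fraction whose norm stays under \(\sigma \sqrt{M+2\sqrt{M\log M}}\) is at least the guaranteed \(1-1/M\):

```python
import numpy as np

rng = np.random.default_rng(2)
M, sigma, trials = 64, 0.5, 2000
bound = sigma * np.sqrt(M + 2 * np.sqrt(M * np.log(M)))

# count draws e ~ N(0, sigma^2 I_M) whose l2 norm stays under the bound
hits = sum(
    np.linalg.norm(sigma * rng.standard_normal(M)) <= bound
    for _ in range(trials)
)
frac = hits / trials                 # should be at least 1 - 1/M
```

In practice the empirical fraction is noticeably larger than \(1-1/M\): the chi-square tail bound behind the guarantee is conservative.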
3 Error performance of the OSGA
In this section, we study the error performance of the OSGA in the general context. Let H be a real separable Hilbert space with an inner product \(\langle \cdot , \cdot \rangle \) and the norm \(\|x\|:=\langle x , x\rangle ^{1/2}\) for all \(x\in \mathit{H}\). Recall that a set \(\mathcal{D}=\{\phi _{i}\}_{i\in I}\) from H is called a dictionary if its linear span is dense in H.
Without loss of generality we shall assume that the dictionary \(\mathcal{D}\) is normalized, i.e., \(\|\phi _{i}\|=1\) for all \(i\in I\).
The standard measure of the approximation power of a dictionary is the error of the best m-term approximation. Given a dictionary \(\mathcal{D}\), we define the m-sparse class as
$$ \varSigma _{m}:= \biggl\{ g=\sum_{i\in T}c_{i}\phi _{i}: \sharp (T)\leq m \biggr\} . $$
The error of the best m-term approximation to a function \(f\in H \) from the dictionary \(\mathcal{D}\) is defined as
$$ \sigma _{m}(f):=\sigma _{m}(f,\mathcal{D}):=\inf_{g\in \varSigma _{m}} \Vert f-g \Vert . $$
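In the special case where \(\mathcal{D}\) is an orthonormal basis, \(\sigma _{m}(f)\) has a closed form: keep the m largest coefficients in magnitude, and the error is the \(l_{2}\) norm of the remaining tail. A sketch of this special case (the helper name is ours; for a general redundant dictionary no such formula exists):

```python
import numpy as np

def best_m_term_error(coeffs, m):
    """sigma_m(f) for an orthonormal basis: the l2 norm of all but the m
    largest-magnitude coefficients of f."""
    c = np.sort(np.abs(np.asarray(coeffs, dtype=float)))[::-1]
    return float(np.sqrt(np.sum(c[m:] ** 2)))
```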
Now we will use OSGA(s) to generate an m-term approximant \(f_{m}\) and estimate the error \(\|f-f_{m}\|\) in terms of \(\sigma _{m}(f)\).
So let us first recall the definition of the OSGA(s) (see Algorithm 2).
To investigate the error performance of OSGA, one needs to make some assumptions on the dictionary \(\mathcal{D}\). A simple and useful assumption is the coherence μ of a dictionary \(\mathcal{D}\), defined by
$$ \mu :=\sup_{i\neq j} \bigl\vert \langle \phi _{i},\phi _{j}\rangle \bigr\vert . $$
Dictionaries with coherence μ are called μ-coherent. It was proved in [8] that for μ-coherent dictionaries the OSGA provides the same (in the sense of order) upper bound for the error on the sparse class as the OGA. In this paper, we will improve these results under a weaker assumption on the dictionary, namely the RIP, which was formulated in a finite-dimensional context in Sect. 2. Here we require the definition of the RIP in terms of the dictionary instead of the measurement matrix, in an infinite-dimensional context:
A dictionary \(\mathcal{D}\) is said to satisfy RIP of order k if there exists a constant \(\delta \in (0,1)\) such that, for any subset \(J\subset I\) with \(\sharp (J)\leq k\) and any scalars \(a_{i}\), \(i\in J\), the following inequalities hold:
As before, the minimum of all δ satisfying (3.2) is referred to as an isometry constant \(\delta _{k}\).
It is known from [22] that if the dictionary \(\mathcal{D}\) has coherence μ, then it satisfies RIP of order k for \(k\leq \mu ^{-1}+1\) with isometry constant \(\delta _{k}\leq \mu (k-1)\), but not vice versa.
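Coherence is also cheap to compute, in contrast with the combinatorial cost of \(\delta _{k}\). The sketch below (our own illustration) computes \(\mu \) for a small unit-norm dictionary stored as matrix columns:

```python
import numpy as np

def coherence(Phi):
    """Mutual coherence mu: the largest |<phi_i, phi_j>| over i != j,
    for a dictionary with unit-norm columns."""
    G = Phi.T @ Phi
    np.fill_diagonal(G, 0.0)
    return float(np.max(np.abs(G)))
```

For \(k=2\) the bound \(\delta _{2}\leq \mu (2-1)=\mu \) is attained with equality, since each \(2\times 2\) Gram block with off-diagonal entry g has eigenvalues \(1\pm |g|\).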
Recently, under the RIP assumption, the authors of [7] proved the almost optimality of OGA by establishing a corresponding Lebesgue-type inequality, which quantifies the efficiency of OGA for individual elements. It is natural to ask whether one can establish a Lebesgue-type inequality for OSGA(s). We give a positive answer to this question: we prove the following inequality for OSGA(s) when the dictionary \(\mathcal{D}\) satisfies the RIP. To the best of our knowledge, this is the first Lebesgue-type inequality obtained for super greedy type algorithms.
Theorem 3.1
Given a parameter \(s\in \mathbb{N}\), there exist fixed constants \(A:=88\), \(\delta ^{\ast }:=1/10\), \(C:=8\) such that the following holds for all \(n\geq 0\): if \(\mathcal{D}\) is a dictionary in a Hilbert space H satisfying the RIP of order \((As+1)n\) with isometry constant \(\delta _{(As+1)n}\leq \delta ^{\ast }\), then for any target function \(f\in \mathit{H}\), we have
$$ \Vert f-f_{An} \Vert \leq C\sigma _{n}(f). $$
In Theorem 3.1, we remark that the values of A, \(\delta ^{ \ast }\), C for which the above result holds are coupled. For example, it is possible to obtain a smaller value of C at the price of a larger value of A or of a smaller value of \(\delta ^{\ast }\).
Based on this inequality, we will derive the convergence rate of OSGA on the sparse class induced by dictionaries satisfying RIP conditions.
The rest of this section is organized as follows. Sections 3.1 and 3.2 are devoted to the proof of Theorem 3.1. In Sect. 3.3, as an application of Theorem 3.1, we study the rate of convergence of OSGA on the sparse class induced by RIP dictionaries.
3.1 Reduction of the residual
To prove Theorem 3.1, we establish Lemma 3.1 below, which quantifies the reduction of the residuals generated by the OSGA(s) under the RIP condition. When \(s=1\), this lemma was proved in [7]; to deal with the case \(s>1\), however, we need some new techniques. In what follows, we denote by
the set of indices selected after k iterations of the OSGA(s) applied to the target element \(f\in \mathit{H}\).
Lemma 3.1
Let \((f_{k})_{k\geq 0}\) be the sequence of approximations generated by the OSGA(s) applied to f, and let \(g=\sum_{i\in T}z_{i}\phi _{i}\), \(\sharp (T)\leq n\). Then, if T is not contained in \(\varLambda _{k}\), we have
Lemma 3.1 quantifies the reduction of \(\|r_{k}\|\) at each iteration provided that T is not contained in \(\varLambda _{k}\) and that \(\|r_{k}\| \geq \|f-g\|\). Note that in the case when \(T\subseteq \varLambda _{k}\), we have \(\|r_{k}\| \leq \|f-g\|\).
Proof
We may assume that \(\|r_{k}\|\geq \|f-g\|\), otherwise inequality (3.3) is trivially satisfied. Denote
Then \(H_{k+1}\) is a direct sum of \(H_{k}\) and \(F_{k+1}\). Therefore,
It is clear that \(F_{k+1}\subset H_{k+1}\) implies
Now we estimate \(\|P_{F_{k+1}}(r_{k})\|\). Since the dictionary \(\mathcal{D}\) satisfies RIP of order s with isometry constant \(\delta _{s}\), we have
Thus
Using inequality (3.6), we can continue to estimate (3.4) as
Therefore to prove inequality (3.3), it suffices to prove that
To prove this, we first note that
This inequality can be written as
If we write \(f_{k}=\sum_{i\in \varLambda _{k}}c_{i}^{k}\phi _{i}\), \(\delta :=\delta _{\sharp (T\cup \varLambda _{k})}\), then the denominator of the right-hand side of inequality (3.8) satisfies the RIP
On the other hand, the numerator of the right-hand side of inequality (3.8) satisfies
Therefore we obtain
which implies (3.7). Thus we complete the proof of Lemma 3.1. □
The following proposition is an immediate consequence of Lemma 3.1.
Proposition 3.1
Assume that, for a given integer \(A>0\) and \(\delta ^{\ast }<1\), a dictionary \(\mathcal{D}\) satisfies RIP of order \((As+1)n\) with \(\delta _{(As+1)n}\leq \delta ^{\ast }\). If \(g=\sum_{i\in T} z_{i}\phi _{i}\), \(\sharp (T)\leq n\), then for any triple of non-negative integers \((j,m,L)\) such that \(\sharp (T\cup \varLambda _{j})\leq m\) and \(j+mL \leq (As+1)n\), we have
Proof
By Lemma 3.1, if \(g=\sum_{i\in T} z_{i}\phi _{i}\), \(\sharp (T) \leq n\), then for any triple of non-negative integers \((j,m,L)\) such that \(\sharp (T\cup \varLambda _{j})\leq m\) and \(j+mL\leq (As+1)n\), we have
where we have used the fact that \(\sharp (T\cup \varLambda _{l})\leq m\) for all \(l\geq j\). This implies inequality (3.9). Thus we complete the proof of Proposition 3.1. □
3.2 Lebesgue-type inequality for OSGA
We are now in a position to prove Theorem 3.1. Fix \(f\in \mathit{H}\). We first observe that the assertion of the theorem can be derived from the following lemma.
Lemma 3.2
If \(0\leq k< n\) satisfies
then there exists \(k<\tilde{k}\leq n\) such that
Indeed, assuming that Lemma 3.2 holds, we complete the proof of Theorem 3.1 as follows. We let k̃ be the largest integer in \(\{0,1,\ldots ,n\}\) for which \(\|r_{A\tilde{k}}\|\leq 2\sigma _{ \tilde{k}}(f)\). Since
such k̃ exists. If \(\tilde{k}< n \), then we must have \(\sigma _{\tilde{k}}(f)\leq 4\sigma _{n}(f)\), and therefore
We are therefore left with proving Lemma 3.2. For this, we fix
and \(0\leq k< n\) such that (3.10) holds. Let \(k< K\leq n\) be the first integer such that \(\sigma _{k} >4\sigma _{K}\). By the definition of \(\sigma _{K}(f)\), for any \(B>1\), there exists \(g=\sum_{j\in T} z _{j}\psi _{j}\), \(\sharp (T)=K\) such that
The significance of K is that on the one hand
while on the other hand
To apply Proposition 3.1 for the above g and \(j=Ak\), we need to bound \(\sharp (T\cup \varLambda _{Ak})\) with A yet to be specified. To this end, we write \(K=k+M\), with \(M>0\), and notice that if \(S\subset T\) is any set with \(\sharp (S)=M\) and \(g_{S}:=\sum_{j\in S} z _{j}\psi _{j}\), then
where we have used the fact that \(g-g_{S}\in \varSigma _{k}\). Using the RIP, we obtain the following lower bound for the coefficients of g: for any set \(S\subset T\) with \(\sharp (S)=M\),
Take for S the set \(S_{g}\) of the M smallest coefficients of g and note that then, for any more general \(S\subset T\) with \(\sharp (S) \geq M\), one has
and hence
Now we consider the particular set \(S:=T\backslash \varLambda _{Ak}\) satisfying \(\sharp (S)\geq M\). Combining the above bound with the RIP, we obtain
Since \(\delta ^{\ast }=\frac{1}{10}\) this gives the bound
where the second inequality is obtained by taking B sufficiently close to 1.
We proceed now to verifying Lemma 3.2 with \(\tilde{k}=K-1\) when \(K-1> k\) and with \(\tilde{k}=k+1\) otherwise. In the first case we can use the estimate in Proposition 3.1 with \(j=Ak\) in combination with (3.15) to deal with the term \(\|r_{Ak}\|\) in (3.9). When \(K=k+1\), however, we cannot bound \(\|r_{Ak}\|\) directly in terms of \(\sigma _{l}(f)\) for some \(l>k\). Accordingly, we use Proposition 3.1 in different ways in the two cases.
In the case where \(K-1> k\), i.e., \(M \geq 2\), we apply (3.9) with \(j=Ak\), \(m=11M\), and \(L=4\). Indeed \(Ak+Lm=Ak+44M\leq An\) holds for \(k+ M\leq n\) whenever \(A\geq 44\). Moreover, notice that for such A
whenever
This gives
where we have used (3.15) in the fourth inequality, and the last inequality follows by taking B sufficiently close to 1. We thus obtain (3.11) for the value \(\tilde{k}=K-1>k\).
In the case where \(K-1= k\), i.e., \(M =1\), we apply (3.9) with \(j=Ak\), \(m=11\), and \(L=8\). From (3.19), we know that \(\sharp (T \cup \varLambda _{Ak})\leq 11\) and \(An\geq A(k+1)\geq Ak+mL\) for A satisfying (3.20). This yields
This implies that \(\varLambda _{A(k+1)}\) contains T. In fact, if it missed one of the indices \(i\in T\), then we derive from the RIP condition
On the other hand, we know from (3.18) that
which for B sufficiently close to 1 is a contradiction since
This implies that \(\|r_{A(k+1)}\|\leq \sigma _{k+1}(f)\), and therefore inequality (3.11) holds for the value \(\tilde{k}=k+1\). This verifies Lemma 3.2 and hence completes the proof of Theorem 3.1.
3.3 Convergence rate on sparse class
As an application of Theorem 3.1, we derive the convergence rate of OSGA on the sparse class induced by a dictionary. First, we recall the definitions of these classes. For a general dictionary \(\mathcal{D}\) in H, we define the class of functions
and we define \(A_{1}(\mathcal{D})\) to be the closure of \(A_{1}^{0}( \mathcal{D})\) in H. It is well known that the class \(A_{1}( \mathcal{D})\) plays an important role in the study of greedy approximation with respect to dictionaries. In [23], DeVore and Temlyakov proved that, for an arbitrary dictionary \(\mathcal{D}\) in H, the OGA provides, after m iterations, an approximation of \(f\in A_{1}(\mathcal{D})\) with the following upper bound of the convergence rate:
Note that the rate \(m^{-1/2}\) is sharp since, when \(\mathcal{D}\) is an orthonormal basis of H, it is easy to find \(f_{0}\in A_{1}( \mathcal{D})\) such that
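One concrete computation in this direction (our illustrative numbers, not from the paper): for an orthonormal basis of \(\mathbb{R}^{N}\), the element with all N coefficients equal to \(1/N\) lies in \(A_{1}(\mathcal{D})\), and its best m-term approximation error is \(\sqrt{N-m}/N\), which at \(m=N/2\) equals exactly \(\frac{1}{2}m^{-1/2}\):

```python
import numpy as np

# f0 = sum_i (1/N) e_i in an orthonormal basis of R^N: the coefficients
# sum to 1 in absolute value, so f0 lies in A_1(D).  Keeping the m largest
# coefficients leaves an l2 tail of sqrt(N - m)/N.
N = 1024
m = N // 2
coeffs = np.full(N, 1.0 / N)
tail = np.sort(np.abs(coeffs))[: N - m]          # the N - m discarded coefficients
sigma_m = float(np.sqrt(np.sum(tail ** 2)))      # best m-term error
```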
Unlike for OGA, to get the same convergence rate on \(A_{1}(\mathcal{D})\) for OSGA, one must make some extra assumptions on the dictionary \(\mathcal{D}\). A specific feature of a dictionary, μ-coherence, allows one to run OSGA with the same rate of convergence. Assuming that the dictionary \(\mathcal{D}\) is μ-coherent, Liu and Temlyakov [8] proved the following theorem.
Theorem 3.2
Let \(\mathcal{D}\) be a dictionary in a Hilbert space H with coherence parameter μ. Then, for \(s\leq (2\mu )^{-1}\), OSGA(s) provides, after m iterations, an approximation of \(f\in A_{1}( \mathcal{D})\) with the following upper bound on the error:
Now we present our results. By an immediate consequence of Theorem 3.1, we get the following theorem.
Theorem 3.3
Given parameter \(s\in \mathbb{N}\), for all \(n\geq 0\), if \(\mathcal{D}\) is a dictionary in a Hilbert space H satisfying RIP of order \((88s+1)n\) with isometry constant \(\delta _{(88s+1)n}\leq 1/10\), then OSGA(s) provides, after m iterations, an approximation of \(f\in A_{1}(\mathcal{D})\), with the following error bound:
Proof
As discussed above, the results of DeVore and Temlyakov [23] imply the following estimate for the best m-term approximation error of functions \(f\in A_{1}(\mathcal{D})\):
Combining (3.22) with Theorem 3.1, we complete the proof of Theorem 3.3. □
References
DeVore, R.A.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)
Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Proc. 27th Asilomar Conference on Signals, Systems and Computers (1993)
Lin, J.H., Li, S.: Nonuniform support recovery from noisy measurements by orthogonal matching pursuit. J. Approx. Theory 165, 20–40 (2013)
Liu, E., Temlyakov, V.N.: Super greedy type algorithm. Adv. Comput. Math. 37, 493–504 (2012)
Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26, 301–321 (2009)
Cai, T., Wang, L.: Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory 57, 4680–4688 (2011)
Cohen, A., Dahmen, W., DeVore, R.A.: Orthogonal matching pursuit under the restricted isometry property. Constr. Approx. 45, 113–127 (2017)
Liu, E., Temlyakov, V.N.: The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans. Inf. Theory 58, 2040–2047 (2012)
Wei, D.: Analysis of orthogonal multi-matching pursuit under restricted isometry property. Sci. China Math. 57, 2179–2188 (2014)
Wang, J., Kwon, S., Shim, B.: Generalized orthogonal matching pursuit. IEEE Trans. Signal Process. 60, 6202–6216 (2012)
Wei, X.J., Ye, P.X.: Estimates of restricted isometry constant in super greedy algorithms. Int. J. Future Gener. Commun. Netw. 8, 137–144 (2015)
Yi, S., Li, B., Pan, W.L., Li, J.: Analysis of generalised orthogonal matching pursuit using restricted isometry constant. Electron. Lett. 14, 1020–1022 (2014)
Baraniuk, R.G.: Compressive sensing. IEEE Signal Process. Mag. 24, 118–121 (2007)
Candes, E.J.: Compressive sampling. In: Int. Congress of Mathematics, vol. 3, pp. 1433–1452 (2006)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)
Candes, E.J., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52, 5406–5425 (2006)
Satpathi, S., Das, R., Chakraborty, M.: Improving the bound on the RIP constant in generalized orthogonal matching pursuit. IEEE Signal Process. Lett. 20, 1074–1077 (2013)
Candes, E.J., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351 (2007)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–451 (2004)
Cai, T., Xu, G., Zhang, J.: On recovery of sparse signals via \(l_{1}\) minimization. IEEE Trans. Inf. Theory 55, 3388–3397 (2009)
Tropp, J.A.: Greedy is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50, 2231–2242 (2004)
DeVore, R.A., Temlyakov, V.N.: Some remarks on greedy algorithms. Adv. Comput. Math. 5, 173–187 (1996)
Acknowledgements
The authors thank the authors of all the referenced works.
Availability of data and materials
Not applicable.
Funding
This work is supported by the National Nature Science Foundation of China [Grant Nos. 11701411, 11271199, 11671213, 11401247].
Contributions
The authors contributed equally and significantly in writing this paper. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Wei, X., Ye, P. Efficiency of orthogonal super greedy algorithm under the restricted isometry property. J Inequal Appl 2019, 124 (2019). https://doi.org/10.1186/s13660-019-2075-x