Exact recovery of sparse multiple measurement vectors by $l_{2,p}$-minimization

Abstract

The joint sparse recovery problem is a generalization of the single measurement vector problem widely studied in compressed sensing. It aims to recover a set of jointly sparse vectors, i.e., vectors whose nonzero entries are concentrated at common locations. Meanwhile, $l_p$-minimization over matrices is widely used in algorithms designed for this problem, namely the $l_{2,p}$-minimization
$$\min_{X \in \mathbb{R}^{n\times r}} \|X\|_{2,p} \quad \text{s.t. } AX=B.$$
The main contribution of this paper consists of two theoretical results about this technique. The first is that for every multiple system of linear equations there exists a constant $p^{\ast}$ such that the original unique sparse solution can also be recovered by minimizing the $l_{2,p}$ quasi-norm whenever $0< p<p^{\ast}$. The second is an analytic expression for such a $p^{\ast}$. Finally, we present an example to confirm the validity of our conclusions, and we use numerical experiments to show that our results increase the efficiency of algorithms designed for $l_{2,p}$-minimization.


Introduction
In sparse information processing, one of the central problems is to recover a sparse solution of an underdetermined linear system, with applications such as visual coding [1], matrix completion [2], source localization [3], and face recognition [4]. Letting $A$ be an underdetermined matrix of size $m \times n$ and $b \in \mathbb{R}^m$ a vector representing some signal, the single measurement vector (SMV) problem is popularly modeled as the following $l_0$-minimization:
$$\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{s.t. } Ax = b, \qquad (1)$$
where $\|x\|_0$ denotes the number of nonzero elements of $x$. A natural extension of the single measurement vector is the joint sparse recovery problem, also known as the multiple measurement vector (MMV) problem, which arises naturally in source localization [3], neuromagnetic imaging [5], and equalization of sparse communication channels [6]. Instead of a single measurement $b$, we are given a set of $r$ measurements,
$$Ax^{(k)} = b^{(k)}, \quad k = 1, \ldots, r, \qquad (2)$$
in which the vectors $x^{(k)}$ ($k = 1, \ldots, r$) are jointly sparse, i.e., the solution vectors share a common support and have their nonzero entries concentrated at common locations. For a matrix $X = [x^{(1)} \ldots x^{(r)}]$, we write $X_{\mathrm{row}\,i}$ for the $i$th row of $X$ and define
$$\|X\|_{2,0} = \bigl|\{i : \|X_{\mathrm{row}\,i}\|_2 \neq 0\}\bigr|,$$
so that the MMV problem can be modeled as the following $l_{2,0}$-minimization problem:
$$\min_{X \in \mathbb{R}^{n \times r}} \|X\|_{2,0} \quad \text{s.t. } AX = B,$$
where $B = [b^{(1)} \ldots b^{(r)}] \in \mathbb{R}^{m \times r}$.
In this paper, we define the support of $X$ by $\mathrm{support}(X) = S = \{i : \|X_{\mathrm{row}\,i}\|_2 \neq 0\}$ and say that the solution $X$ is $k$-sparse when $|S| \leq k$; we also say that $X$ can be recovered by $l_{2,0}$-minimization if $X$ is the unique solution of the $l_{2,0}$-minimization problem.
It needs to be emphasized that we cannot regard the solution of the multiple measurement vector (MMV) problem as a combination of several solutions of single measurement vectors; i.e., the solution matrix $X$ of $l_{2,0}$-minimization is not always composed of the solution vectors of $l_0$-minimization. If, in Example 1, we treat $AX = B = [b_1\ b_2]$ as a combination of two single measurement vectors $Ax = b_1$ and $Ax = b_2$, it is easy to verify that the sparsest solutions of these two problems are $x_1 = [0.5\ 2\ 0\ 0]^T$ and $x_2 = [0\ 0\ 1\ 2]^T$. Setting $X^{\ast} = [x_1\ x_2]$, it is easy to check that $\|X^{\ast}\|_{2,0} = 4$. In fact, the system admits a jointly sparse solution $\widetilde{X}$ with $\|\widetilde{X}\|_{2,0} = 3 < \|X^{\ast}\|_{2,0} = 4$, so $X^{\ast}$ is not the solution of $l_{2,0}$-minimization. This simple Example 1 shows that the MMV problem asks for a jointly sparse solution, not a solution which is merely composed of sparse vectors. Therefore, the MMV problem is more complex than the SMV one and needs its own theoretical treatment. Inspired by $l_p$-minimization, a popular approach to finding the sparsest solution of the MMV problem is to solve the following $l_{2,p}$-minimization problem:
$$\min_{X \in \mathbb{R}^{n \times r}} \|X\|_{2,p} \quad \text{s.t. } AX = B,$$
where the mixed norm is defined by $\|X\|_{2,p}^p = \sum_{i=1}^n \|X_{\mathrm{row}\,i}\|_2^p$. In [7], $l_0$-minimization was proved to be NP-hard because of the discrete and discontinuous nature of $\|x\|_0$; therefore, solving $l_{2,0}$-minimization is NP-hard as well. Since $\|X\|_{2,0} = \lim_{p \to 0} \|X\|_{2,p}^p$, it is natural to consider $l_{2,p}$-minimization as a surrogate for the NP-hard $l_{2,0}$-minimization.
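To make the mixed norms above concrete, the following short Python sketch (illustrative only, not part of the original derivation) computes $\|X\|_{2,0}$ and $\|X\|_{2,p}^p$ for the stacked matrix $X^{\ast} = [x_1\ x_2]$ discussed in Example 1 and confirms that $\|X^{\ast}\|_{2,0} = 4$.

```python
import numpy as np

def row_norms(X):
    """l2 norm of every row of X."""
    return np.linalg.norm(X, axis=1)

def norm_2_0(X, tol=1e-12):
    """||X||_{2,0}: the number of nonzero rows of X."""
    return int(np.sum(row_norms(X) > tol))

def norm_2_p_pow(X, p):
    """||X||_{2,p}^p = sum_i ||X_row_i||_2^p (a quasi-norm power for 0 < p < 1)."""
    return float(np.sum(row_norms(X) ** p))

# Column-wise sparsest solutions quoted in the text:
x1 = np.array([0.5, 2.0, 0.0, 0.0])
x2 = np.array([0.0, 0.0, 1.0, 2.0])
X_star = np.column_stack([x1, x2])

print(norm_2_0(X_star))          # 4: the stacked matrix is not jointly sparse
print(norm_2_p_pow(X_star, 0.5)) # the corresponding l_{2,p}^p value
```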

Related work
Many researchers have contributed results on the existence, uniqueness, and other properties of solutions of $l_{2,p}$-minimization (see [8–11]). Eldar [12] gives a sufficient condition for MMV recovery when $p = 1$, and Unser [13] analyses some properties of the solution of $l_{2,p}$-minimization when $p = 1$. Foucart and Gribonval [9] studied the MMV setting when $r = 2$ and $p = 1$; they give a necessary and sufficient condition for a $k$-sparse matrix $X$ to be recoverable by $l_{2,p}$-minimization. Furthermore, Lai and Liu [10] consider the MMV setting when $r \geq 2$ and $p \in [0, 1]$; they improve the condition in [9] and give a necessary and sufficient condition when $r \geq 2$.
On the other hand, numerous algorithms have been proposed and studied for $l_{2,0}$-minimization (e.g., [14, 15]). Orthogonal matching pursuit (OMP) algorithms have been extended to the MMV problem [16], and convex optimization formulations with mixed norms extend the corresponding SMV formulations [17]. Hyder [15] provides a robust algorithm for joint sparse recovery which shows a clear improvement in both noiseless and noisy environments. Furthermore, there is a large body of excellent work (see [18–22]) presenting algorithms designed for $l_{2,p}$-minimization. However, it remains an important theoretical problem whether there exists a general equivalence relationship between $l_{2,p}$-minimization and $l_{2,0}$-minimization.
In the case $r = 1$, Peng [23] has given a definite answer to this theoretical problem: there exists a constant $p(A, b) > 0$ such that every solution of $l_p$-minimization is also a solution of $l_0$-minimization whenever $0 < p < p(A, b)$.
However, this conclusion is proved only for $r = 1$, so it is natural to extend it to the MMV problem. Furthermore, Peng only proves the existence of such a $p$; a computable expression for it is not given. Therefore, the main purpose of this paper is not only to prove the equivalence relationship between $l_{2,p}$-minimization and $l_{2,0}$-minimization, but also to present an analytic expression for such a $p$ in Sections 2 and 3.

Main contribution
In this paper, we focus on the equivalence relationship between $l_{2,p}$-minimization and $l_{2,0}$-minimization. Furthermore, an analytic expression for such a $p^{\ast}$ is needed in applications, especially in designing algorithms for $l_{2,p}$-minimization.
In brief, this paper answers two problems which need to be solved:
(I). There exists a constant $p^{\ast}$ such that every $k$-sparse $X$ can be recovered by both $l_{2,0}$-minimization and $l_{2,p}$-minimization whenever $0 < p < p^{\ast}$.
(II). We give an analytic expression for such a $p^{\ast}$ based on the restricted isometry property (RIP).
Our paper is organized as follows. In Section 2 we present some preliminaries which play a core role in the proof of our main theorem and prove the equivalence relationship between $l_{2,p}$-minimization and $l_{2,0}$-minimization. In Section 3 we focus on proving the other main result of this paper: an analytic expression for such a $p^{\ast}$. Finally, we summarize our findings in the last section.

Notation
For convenience, for $x \in \mathbb{R}^n$ we define its support by $\mathrm{support}(x) = \{i : x_i \neq 0\}$ and denote the cardinality of a set $S$ by $|S|$. Let $N(A) = \mathrm{Ker}(A) = \{x \in \mathbb{R}^n : Ax = 0\}$ be the null space of the matrix $A$. We use the subscript notation $x_S$ for the vector that is equal to $x$ on the index set $S$ and zero everywhere else, and $X_S$ for the matrix that keeps the rows of $X$ indexed by $S$ and is zero everywhere else. Let $X_{\mathrm{col}\,j}$ be the $j$th column of $X$ and $X_{\mathrm{row}\,i}$ the $i$th row of $X$, i.e., $X = [X_{\mathrm{col}\,1}, X_{\mathrm{col}\,2}, \ldots, X_{\mathrm{col}\,r}] = [X_{\mathrm{row}\,1}, X_{\mathrm{row}\,2}, \ldots, X_{\mathrm{row}\,n}]^T$ for $X \in \mathbb{R}^{n \times r}$. We use $\langle A, B \rangle = \mathrm{tr}(A^T B)$ and $\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}$.
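The notation above translates directly into code; the following small Python helpers (an illustrative sketch, not part of the paper) implement the row support, the row restriction $X_S$, the inner product $\langle A, B\rangle$, and the Frobenius norm.

```python
import numpy as np

def row_support(X, tol=1e-12):
    """support(X) = {i : ||X_row_i||_2 != 0} for a matrix X of shape (n, r)."""
    return set(np.flatnonzero(np.linalg.norm(X, axis=1) > tol))

def restrict_rows(X, S):
    """X_S: keep the rows indexed by S and set every other row to zero."""
    Y = np.zeros_like(X)
    idx = sorted(S)
    Y[idx, :] = X[idx, :]
    return Y

def frob_inner(A, B):
    """<A, B> = tr(A^T B)."""
    return float(np.trace(A.T @ B))

def frob_norm(A):
    """||A||_F = sqrt(sum_{i,j} |a_ij|^2)."""
    return float(np.sqrt(np.sum(A * A)))
```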

Equivalence relationship between $l_{2,p}$-minimization and $l_{2,0}$-minimization
At the beginning of this section, we introduce a very important property of the measurement matrix A.

Definition 1 ([24]) A matrix $A$ is said to have the restricted isometry property (RIP) of order $k$ with restricted isometry constant $\delta_k \in (0,1)$ if $\delta_k$ is the smallest constant such that
$$(1-\delta_k)\|x\|_2^2 \leq \|Ax\|_2^2 \leq (1+\delta_k)\|x\|_2^2$$
for every $k$-sparse vector $x \in \mathbb{R}^n$.

Next, we introduce another important concept, the M-null space constant (M-NSC), which is the key to proving the equivalence relationship between $l_{2,0}$-minimization and $l_{2,p}$-minimization.

Definition 2 (M-NSC) For a matrix $A \in \mathbb{R}^{m \times n}$, $p \in [0,1]$, $r \geq 1$, and $k < n$, the M-null space constant $h(p, A, r, k)$ is defined as
$$h(p, A, r, k) = \max_{|S| \leq k}\ \sup_{X \in (N(A))^r \setminus \{(0, \ldots, 0)\}} \frac{\|X_S\|_{2,p}^p}{\|X_{S^C}\|_{2,p}^p},$$
with the convention that $\|\cdot\|_{2,p}^p$ is replaced by $\|\cdot\|_{2,0}$ when $p = 0$.
The M-NSC provides a necessary and sufficient condition for a matrix to be the solution of $l_{2,0}$-minimization and $l_{2,p}$-minimization, and it is important for proving the equivalence relationship between these two models. Furthermore, we emphasize a few important properties of $h(p, A, r, k)$.

Proposition 1 For a given matrix $A$, the M-NSC $h(p, A, r, k)$ defined in Definition 2 is nondecreasing in $p \in [0, 1]$.
Proof The proof is divided into two steps.

Step 1. For $X \in \mathbb{R}^{n \times r}$ and a set $S \subset \{1, 2, \ldots, n\}$ with $|S| \leq k$, define
$$\theta(p, X, S) = \frac{\|X_S\|_{2,p}^p}{\|X_{S^C}\|_{2,p}^p},$$
so that the definition of $h(p, A, r, k)$ is equivalent to $h(p, A, r, k) = \max_{|S| \leq k} \sup_{X \in (N(A))^r \setminus \{(0, \ldots, 0)\}} \theta(p, X, S)$. Without loss of generality, order the rows of $X$ so that $\|X_{\mathrm{row}\,1}\|_2 \geq \|X_{\mathrm{row}\,2}\|_2 \geq \cdots \geq \|X_{\mathrm{row}\,n}\|_2$; for a fixed $X$, the maximum of $\theta(p, X, S)$ over $|S| \leq k$ is then attained at the set of indices of the $k$ largest row norms. For any $p \in (0, 1]$, the function $f(t) = t^{p}/t = t^{p-1}$ $(t > 0)$ is nonincreasing. Hence, for any $j \in \{k+1, \ldots, n\}$ and $i \in \{1, 2, \ldots, k\}$,
$$\frac{\|X_{\mathrm{row}\,j}\|_2^p}{\|X_{\mathrm{row}\,j}\|_2} \geq \frac{\|X_{\mathrm{row}\,i}\|_2^p}{\|X_{\mathrm{row}\,i}\|_2},$$
which we can rewrite as $\|X_{\mathrm{row}\,j}\|_2^p \,\|X_{\mathrm{row}\,i}\|_2 \geq \|X_{\mathrm{row}\,i}\|_2^p \,\|X_{\mathrm{row}\,j}\|_2$. Summing over $i \in \{1, \ldots, k\}$ and $j \in \{k+1, \ldots, n\}$ gives
$$\Biggl(\sum_{j=k+1}^{n} \|X_{\mathrm{row}\,j}\|_2^p\Biggr)\Biggl(\sum_{i=1}^{k} \|X_{\mathrm{row}\,i}\|_2\Biggr) \geq \Biggl(\sum_{i=1}^{k} \|X_{\mathrm{row}\,i}\|_2^p\Biggr)\Biggl(\sum_{j=k+1}^{n} \|X_{\mathrm{row}\,j}\|_2\Biggr),$$
and therefore $\theta(p, X, S) \leq \theta(1, X, S)$ for this maximizing set $S$.

Step 2. To prove that $h(pq, A, r, k) \leq h(p, A, r, k)$ for any $p \in (0, 1]$ and $q \in (0, 1)$. Applying the argument of Step 1 to the numbers $u_i = \|X_{\mathrm{row}\,i}\|_2^p$ with exponent $q$ (note that $u_1 \geq u_2 \geq \cdots \geq u_n \geq 0$), we obtain $\theta(pq, X, S) \leq \theta(p, X, S)$ whenever $S$ is the set of indices of the $k$ largest row norms of $X$. Since, for every fixed $X$ and every exponent, the maximum of $\theta$ over $|S| \leq k$ is attained at exactly this set, and since $h(p, A, r, k) = \max_{|S| \leq k} \sup_{X \in (N(A))^r \setminus \{(0, \ldots, 0)\}} \theta(p, X, S)$, taking the supremum yields $h(pq, A, r, k) \leq h(p, A, r, k)$, i.e., $h(p, A, r, k)$ is nondecreasing in $p$. The proof is completed.

Proposition 2 For a given matrix $A$, the M-NSC $h(p, A, r, k)$, as a function of $p$, has no jump discontinuity; in particular it is continuous at every $p_0 \in [0, 1)$.

Proof For convenience, we still use the function $\theta(p, X, S)$ defined in the proof of Proposition 1, and the following proof is divided into three steps.
Step 1. To prove that, for every $p_0 \in [0, 1]$, there exist $X' \in (N(A))^r$ and a set $S' \subset \{1, 2, \ldots, n\}$ with $|S'| \leq k$ such that $h(p_0, A, r, k) = \theta(p_0, X', S')$. Since $\theta(p, \lambda X, S) = \theta(p, X, S)$ for every $\lambda \neq 0$, the supremum over $(N(A))^r \setminus \{(0, \ldots, 0)\}$ may be restricted to the compact set $V = \{X \in (N(A))^r : \|X\|_F = 1\}$. Moreover, the choice of the set $S \subset \{1, 2, \ldots, n\}$ with $|S| \leq k$ is finite, so the supremum is attained.

Step 2. To prove that $\lim_{p \to p_0^-} h(p, A, r, k) = h(p_0, A, r, k)$ for $p_0 \in (0, 1]$. Suppose, on the contrary, that $\lim_{p \to p_0^-} h(p, A, r, k) \neq h(p_0, A, r, k)$. By Proposition 1, $h(p, A, r, k)$ is nondecreasing in $p \in [0, 1]$; therefore we can choose a sequence $\{p_n\}$ with $p_n \to p_0^-$ such that
$$h(p_n, A, r, k) \leq \lim_{p \to p_0^-} h(p, A, r, k) < h(p_0, A, r, k). \qquad (15)$$
By Step 1 there exist $X' \in (N(A))^r$ and $S' \subset \{1, 2, \ldots, n\}$ such that $h(p_0, A, r, k) = \theta(p_0, X', S')$, and since $\theta(\cdot, X', S')$ is continuous in $p$,
$$\lim_{n \to \infty} \theta(p_n, X', S') = \theta(p_0, X', S') = h(p_0, A, r, k). \qquad (16)$$
However, by the definition of $\theta(p, X, S)$ and $h$ we have $\theta(p_n, X', S') \leq h(p_n, A, r, k)$, so (15) and (16) contradict each other. Therefore $\lim_{p \to p_0^-} h(p, A, r, k) = h(p_0, A, r, k)$.

Step 3. To prove that $\lim_{p \to p_0^+} h(p, A, r, k) = h(p_0, A, r, k)$ for any $p_0 \in [0, 1)$. Consider a sequence $\{p_n\}$ with $p_0 \leq p_n < 1$ and $p_n \to p_0^+$. By Step 1 there exist $X_n \in V$ and $|S_n| \leq k$ such that $h(p_n, A, r, k) = \theta(p_n, X_n, S_n)$. Since the choice of $S \subset \{1, 2, \ldots, n\}$ with $|S| \leq k$ is finite, there exist subsequences $\{p_{n_i}\}$ of $\{p_n\}$ and $\{X_{n_i}\}$ of $\{X_n\}$ and a set $S'$ such that $\theta(p_{n_i}, X_{n_i}, S') = h(p_{n_i}, A, r, k)$.
Furthermore, since $X_{n_i} \in V$ and $V$ is compact, we can extract a further convergent subsequence; without loss of generality, assume $X_{n_i} \to X'$.
Therefore $h(p_{n_i}, A, r, k) = \theta(p_{n_i}, X_{n_i}, S') \to \theta(p_0, X', S')$. By the definition of $h(p_0, A, r, k)$ we have $\theta(p_0, X', S') \leq h(p_0, A, r, k)$, while monotonicity gives $h(p_{n_i}, A, r, k) \geq h(p_0, A, r, k)$; hence $\lim_{p \to p_0^+} h(p, A, r, k) = h(p_0, A, r, k)$. Combining Step 2 and Step 3, we see that it is impossible for $h(p, A, r, k)$ to have a jump discontinuity.
The proof is completed.
The concept of M-NSC is very important in this paper, and it offers tremendous help in describing the performance of $l_{2,0}$-minimization and $l_{2,p}$-minimization; however, the M-NSC is difficult to compute for a large-scale matrix. We show the M-NSC of Example 1 in Figure 1. Combining Propositions 1 and 2, we obtain the first main theorem, which establishes the equivalence relationship between $l_{2,0}$-minimization and $l_{2,p}$-minimization.
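Since the M-NSC is hard to compute exactly, a crude numerical lower bound can be obtained by randomly sampling matrices whose columns lie in $N(A)$ and taking the largest observed ratio in Definition 2. The sketch below is an assumption-laden illustration: it only yields a lower bound, and the enumeration of supports is feasible only for small $n$ and $k$.

```python
import numpy as np
from itertools import combinations

def mnsc_lower_bound(A, r, k, p, trials=500, seed=0):
    """Monte Carlo lower bound on h(p, A, r, k): sample X with columns in N(A)
    and take the largest ratio ||X_S||_{2,p}^p / ||X_{S^C}||_{2,p}^p over |S| = k."""
    rng = np.random.default_rng(seed)
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))
    N = Vt[rank:].T                       # columns form a basis of N(A)
    n, d = N.shape
    best = 0.0
    for _ in range(trials):
        X = N @ rng.standard_normal((d, r))      # every column lies in N(A)
        rho = np.linalg.norm(X, axis=1) ** p     # ||X_row_i||_2^p
        total = rho.sum()
        for S in combinations(range(n), k):
            num = rho[list(S)].sum()
            den = total - num
            if den > 1e-12:
                best = max(best, num / den)
    return best
```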

Theorem 1
If every $k$-sparse matrix $X$ can be recovered by $l_{2,0}$-minimization, then there exists a constant $p^{\ast}(A, B, r)$ such that $X$ can also be recovered by $l_{2,p}$-minimization whenever $0 < p < p^{\ast}(A, B, r)$.
Proof First, we prove that $h(0, A, r, k) < 1$ under the assumption. If $h(0, A, r, k) \geq 1$ for some fixed $r$ and $k$, then there exist $X \in (N(A))^r$ and a set $S \subset \{1, 2, \ldots, n\}$ with $|S| \leq k$ such that $\|X_S\|_{2,0} \geq \|X_{S^C}\|_{2,0}$. Let $B = AX_S$; since $X \in (N(A))^r$, we have $A(-X_{S^C}) = AX_S = B$, so $-X_{S^C}$ is a solution of $AY = B$ that is at least as sparse as the $k$-sparse matrix $X_S$, which contradicts the assumption that every $k$-sparse matrix is the unique sparsest solution.

By Propositions 1 and 2, since $h(p, A, r, k)$ is nondecreasing and continuous at the point $p = 0$, there exist a constant $p^{\ast}(A, B, r)$ and a small enough number $\delta > 0$ such that $h(p, A, r, k) \leq h(0, A, r, k) + \delta < 1$ for any $p \in (0, p^{\ast}(A, B, r))$.

Therefore, for a given $k$-sparse matrix $X^{\ast} \in \mathbb{R}^{n \times r}$ with support $S$ and any $0 < p < p^{\ast}(A, B, r)$, every other solution of $AX = B$ can be written as $X^{\ast} + Z$ with $Z \in (N(A))^r \setminus \{(0, \ldots, 0)\}$, and we have
$$\|X^{\ast} + Z\|_{2,p}^p = \|X^{\ast}_S + Z_S\|_{2,p}^p + \|Z_{S^C}\|_{2,p}^p \geq \|X^{\ast}\|_{2,p}^p - \|Z_S\|_{2,p}^p + \|Z_{S^C}\|_{2,p}^p > \|X^{\ast}\|_{2,p}^p.$$
The last inequality follows from $\|Z_S\|_{2,p}^p \leq h(p, A, r, k)\,\|Z_{S^C}\|_{2,p}^p < \|Z_{S^C}\|_{2,p}^p$, i.e., from $h(p, A, r, k) < 1$. Hence $X^{\ast}$ is the unique solution of $l_{2,p}$-minimization, and the proof is completed.

An analytic expression of such $p^{\ast}$
In Section 2 we proved that there exists a constant $p^{\ast}(A, B, r)$ such that $l_{2,p}$-minimization and $l_{2,0}$-minimization have the same solution. However, it is also important to give an analytic expression for $p^{\ast}(A, B, r)$. In this section we focus on deriving an analytic upper bound on $h(p, A, r, k)$; the equivalence between $l_{2,p}$-minimization and $l_{2,0}$-minimization then holds as long as $h(p, A, r, k) < 1$ is satisfied. In order to reach this goal, we postpone our main theorems and begin with some lemmas.

Lemma 1
For any $X \in \mathbb{R}^{n \times r}$ and $p \in (0, 1]$, we have
$$\|X\|_{2,p}^p \leq \|X\|_{2,0}^{\,1 - p/2}\, \|X\|_F^p.$$

Proof For any $X \in \mathbb{R}^{n \times r}$, without loss of generality assume that $\|X_{\mathrm{row}\,i}\|_2 = 0$ for $i \in \{\|X\|_{2,0} + 1, \ldots, n\}$, i.e., the nonzero rows come first. By Hölder's inequality with exponents $2/p$ and $2/(2-p)$,
$$\|X\|_{2,p}^p = \sum_{i=1}^{\|X\|_{2,0}} \|X_{\mathrm{row}\,i}\|_2^p \leq \Biggl(\sum_{i=1}^{\|X\|_{2,0}} \|X_{\mathrm{row}\,i}\|_2^2\Biggr)^{p/2} \|X\|_{2,0}^{\,1 - p/2} = \|X\|_{2,0}^{\,1 - p/2}\, \|X\|_F^p,$$
which is the claimed bound. Moreover, the auxiliary function $f(p)$ used below is monotone in $p \in (0, 1]$ with $f(p) \geq f(1) = \frac{\sqrt{2}}{2}$. The proof is completed.
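The Hölder bound in Lemma 1 (as reconstructed above) is easy to verify numerically; the following sketch checks it on random row-sparse matrices and is offered only as a sanity check of the reconstruction, not as part of the proof.

```python
import numpy as np

def check_lemma1(n=8, r=3, sparsity=4, p=0.5, trials=1000, seed=1):
    """Check ||X||_{2,p}^p <= ||X||_{2,0}^{1 - p/2} * ||X||_F^p on random matrices."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        X = np.zeros((n, r))
        rows = rng.choice(n, size=sparsity, replace=False)
        X[rows] = rng.standard_normal((sparsity, r))
        row_n = np.linalg.norm(X, axis=1)
        lhs = np.sum(row_n ** p)
        s = np.count_nonzero(row_n)                    # ||X||_{2,0}
        rhs = s ** (1 - p / 2) * np.linalg.norm(X, 'fro') ** p
        assert lhs <= rhs + 1e-9
    return True

print(check_lemma1())
```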

Lemma 4 For any $p \in (0, 1]$, the function $\varphi(p) = (1 - \frac{p}{2})^{\frac{1}{p} - \frac{1}{2}}$ is nondecreasing in $p$.
Proof We define $\varphi(p)$ on the interval $(0, 1]$ by $\varphi(p) = (1 - \frac{p}{2})^{\frac{1}{p} - \frac{1}{2}}$, so that
$$\ln \varphi(p) = \Bigl(\frac{1}{p} - \frac{1}{2}\Bigr) \ln\Bigl(1 - \frac{p}{2}\Bigr).$$
Taking the derivative of both sides with respect to $p$ and using the elementary bound $\ln(1 - x) \leq -x - \frac{x^2}{2}$ for $x \in (0, 1)$ with $x = p/2$,
$$\frac{d}{dp} \ln \varphi(p) = -\frac{\ln(1 - p/2)}{p^2} - \frac{1}{2p} \geq \frac{p/2 + p^2/8}{p^2} - \frac{1}{2p} = \frac{1}{8} > 0,$$
so $\varphi$ is (strictly) increasing on $(0, 1]$. The proof is completed.
Lemma 5 Let $X_1, X_2 \in \mathbb{R}^{n \times r}$ with $\mathrm{support}(X_1) \cap \mathrm{support}(X_2) = \emptyset$. Then $\langle X_1, X_2 \rangle = 0$ and consequently $\|X_1 + X_2\|_F^2 = \|X_1\|_F^2 + \|X_2\|_F^2$.

Proof According to the definition of the inner product of matrices,
$$\langle X_1, X_2 \rangle = \mathrm{tr}(X_1^T X_2) = \sum_{i=1}^{n} \bigl\langle (X_1)_{\mathrm{row}\,i}, (X_2)_{\mathrm{row}\,i} \bigr\rangle = 0,$$
since $\mathrm{support}(X_1) \cap \mathrm{support}(X_2) = \emptyset$ means that in every row at least one of the two factors is zero. Therefore
$$\|X_1 + X_2\|_F^2 = \|X_1\|_F^2 + 2\langle X_1, X_2 \rangle + \|X_2\|_F^2 = \|X_1\|_F^2 + \|X_2\|_F^2.$$
The proof is completed. Now we present another main contribution of this paper.
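A short numerical check of Lemma 5 (illustrative only): two matrices with disjoint row supports are orthogonal in the Frobenius inner product, so the Pythagorean identity holds.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 10, 4
X1 = np.zeros((n, r)); X1[:3] = rng.standard_normal((3, r))   # rows 0-2
X2 = np.zeros((n, r)); X2[5:8] = rng.standard_normal((3, r))   # rows 5-7

inner = np.trace(X1.T @ X2)                       # <X1, X2> = tr(X1^T X2)
lhs = np.linalg.norm(X1 + X2, 'fro') ** 2
rhs = np.linalg.norm(X1, 'fro') ** 2 + np.linalg.norm(X2, 'fro') ** 2
print(abs(inner) < 1e-12, abs(lhs - rhs) < 1e-9)  # True True
```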

Theorem 2
Given an underdetermined matrix $A \in \mathbb{R}^{m \times n}$ which satisfies the RIP of order $2k$ with constant $\delta_{2k}$, for any $p \in (0, 1]$ we obtain an upper bound $h(p, A, r, k) \leq M(\delta_{2k}, p)$, where $M(\delta_{2k}, p)$ is the explicit quantity constructed in the proof below.

Proof For any $X \in (N(A))^r \setminus \{(0, \ldots, 0)\}$, define a vector $x \in \mathbb{R}^n$ by $x = (\|X_{\mathrm{row}\,1}\|_2, \|X_{\mathrm{row}\,2}\|_2, \ldots, \|X_{\mathrm{row}\,n}\|_2)^T$, and consider the index sets $S_0, S_1, \ldots, S_t$ obtained by letting $S_0$ collect the indices of the $k$ largest entries of $x$ and letting $S_1, S_2, \ldots, S_t$ partition the remaining indices into consecutive blocks of size $k$ in decreasing order of magnitude. Since $X \in (N(A))^r$, it is obvious that $AX = A(X_{S_0} + X_{S_1} + \cdots + X_{S_t}) = (0, \ldots, 0)$, and hence $A(X_{S_0} + X_{S_1}) = -\sum_{i=2}^{t} AX_{S_i}$. Applying the RIP of order $2k$ together with Lemma 5 (the row supports $S_0$ and $S_1$ are disjoint), we obtain
$$(1 - \delta_{2k})\bigl(\|X_{S_0}\|_F^2 + \|X_{S_1}\|_F^2\bigr) \leq \|A(X_{S_0} + X_{S_1})\|_F^2 \leq \Biggl(\sum_{i=2}^{t} \|AX_{S_i}\|_F\Biggr)^2 \leq (1 + \delta_{2k})\Biggl(\sum_{i=2}^{t} \|X_{S_i}\|_F\Biggr)^2.$$
Next, we estimate $\sum_{i=2}^{t} \|X_{S_i}\|_F$. By the construction of the blocks, every row norm indexed by $S_i$ ($2 \leq i \leq t$) is dominated by the row norms indexed by $S_{i-1}$, which, together with Lemmas 2, 3, and 4, yields an estimate of $\sum_{i=2}^{t} \|X_{S_i}\|_F$ in terms of $\|X_{S_0^C}\|_{2,p}^p$ and $\delta_{2k}$. Combining the two estimates and applying Lemma 1 to $X_{S_0}$, namely $\|X_{S_0}\|_{2,p}^p \leq k^{1 - p/2} \|X_{S_0}\|_F^p$, we conclude, by the definition of the M-NSC, a bound of the form $h(p, A, r, k) \leq M(\delta_{2k}, p)$, where $M(\delta_{2k}, p)$ is an explicit function of $\delta_{2k}$ and $p$.

Theorem 3 Given a matrix $A \in \mathbb{R}^{m \times n}$ with $m \leq n$ satisfying the RIP of order $2k$, every $k$-sparse $X^{\ast}$ can also be recovered by $l_{2,p}$-minimization for any $0 < p < p^{\ast}$, where $p^{\ast}$ is determined by the condition $M(\delta_{2k}, p) < 1$.

Proof By Theorem 2, $h(p, A, r, k) \leq M(\delta_{2k}, p) < 1$ for any $0 < p < p^{\ast}$, so the recovery argument in the proof of Theorem 1 applies; the last inequality there follows from the fact that $M(\delta_{2k}, p) < 1$. The proof is completed.

Now we present one example to demonstrate the validity of our main contribution in this paper.

Example 2 We consider an underdetermined system $AX = B$. It is easy to verify that the unique sparse solution of $l_{2,0}$-minimization is a jointly sparse matrix, and the general solution of $AX = B$ can be expressed in terms of two parameters $s, t \in \mathbb{R}$. We now verify the result of Theorem 3. It is easy to get that $p^{\ast} = 0.6989$ since $\delta_4 = 0.5612$, and we show the cases $p = 0.6989$ and $p = 0.3$ in Figures 2 and 3.
It is obvious that $\|X\|_{2,p}$ attains its minimum at $s = t = 0$, which corresponds to the original solution of $l_{2,0}$-minimization.
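The value $\delta_4 = 0.5612$ above refers to the specific matrix of Example 2, which is not reproduced here. For small matrices the restricted isometry constant can be computed by brute force directly from Definition 1, as in the following sketch (the matrix below is random and only illustrates the computation).

```python
import numpy as np
from itertools import combinations

def rip_constant(A, k):
    """Brute-force delta_k from Definition 1:
    delta_k = max over |S| = k of max(1 - lambda_min, lambda_max - 1)
    for the Gram matrix A_S^T A_S. Feasible only for small n and k."""
    n = A.shape[1]
    delta = 0.0
    for S in combinations(range(n), k):
        idx = list(S)
        eig = np.linalg.eigvalsh(A[:, idx].T @ A[:, idx])
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 12))
A /= np.linalg.norm(A, axis=0)       # unit-norm columns
print(rip_constant(A, 2))
```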

Numerical experiment
Although $l_{2,p}$-minimization is difficult to solve, there are many algorithms designed for this problem. In this section we adopt two excellent algorithms presented in [18] and [21].

Algorithm 1 [21] Input: $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times r}$. 1: For $k = 1, 2, \ldots, r$: solve the $l_1$-minimization problem $x_k = \arg\min_{x \in \mathbb{R}^n} \|x\|_1$ s.t. $Ax = B_{\mathrm{col}\,k}$, and set $S_k = \mathrm{support}(x_k)$.

Algorithm 2 [18] Input: $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times r}$, and $p \in (0, 1]$.

Algorithm 3 [18] Input: $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times r}$, and $p \in (0, 1]$. 1: Compute the QR decomposition $A^T = Q_1 R$, where $Q_1 \in \mathbb{R}^{n \times m}$ and $R \in \mathbb{R}^{m \times m}$. 2: Set $P = I_n - Q_1 Q_1^T$ and $X^1 = Q_1 (R^{-1})^T B$. 3: For $k = 1, 2, \ldots$ until convergence, update $X^k$ according to the iteration given in [18].

The main reason why we choose these algorithms is not only their efficient performance and theoretical guarantees but also the feature that any $p \in (0, 1)$ can be used in them. The details of these algorithms are presented in Algorithms 1, 2, and 3. We need to underline the choice of the parameter $p$ in Algorithms 2 and 3: it is not simply "the smaller the better"; a reasonable $p$ plays a key role in these algorithms. Therefore, it is natural to ask whether these algorithms can do better by using our result.
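Steps 1 and 2 of Algorithm 3 are fully specified above and can be written down directly; the sketch below (a minimal illustration) computes the null-space projector $P$ and the feasible starting point $X^1 = Q_1 (R^{-1})^T B$, which indeed satisfies $AX^1 = B$. The iterative update of step 3 is given in [18] and is not reproduced here.

```python
import numpy as np

def algorithm3_init(A, B):
    """Initialization of Algorithm 3: thin QR of A^T gives a feasible X1 with
    A @ X1 = B and the projector P onto Ker(A)."""
    m, n = A.shape
    Q1, R = np.linalg.qr(A.T)             # A^T = Q1 R, Q1: n x m, R: m x m
    P = np.eye(n) - Q1 @ Q1.T             # projector onto the null space of A
    X1 = Q1 @ np.linalg.solve(R.T, B)     # X1 = Q1 (R^{-1})^T B
    return X1, P

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 12))
B = rng.standard_normal((5, 3))
X1, P = algorithm3_init(A, B)
print(np.allclose(A @ X1, B), np.allclose(A @ P, 0.0, atol=1e-10))  # True True
```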
In order to set a reference standard for these $l_{2,p}$-minimization algorithms, we also consider $l_{2,1}$-minimization,
$$\min_{X \in \mathbb{R}^{n \times r}} \|X\|_{2,1} \quad \text{s.t. } AX = B.$$
Unlike the case of other $p \in (0, 1)$, $l_{2,1}$-minimization is a convex optimization problem which can be solved efficiently; in particular, it can be transformed into a linear programming problem when $r = 1$. In this section we adopt Algorithm 1 for $l_{2,1}$-minimization, and we use a $256 \times 1024$ measurement matrix $A$ with $\delta_{100} = 0.8646$, for which our result gives $p^{\ast}(A) = 0.1507$. Firstly, we look at the relation between sparsity and recovery success ratio. As shown in Figures 4 and 5, the choice $p = 0.1507$ performs better than both the choice $p = 0.2$ and $l_{2,1}$-minimization in these two algorithms. The results show that our result helps these algorithms to increase their efficiency.
Secondly, we look at the relation between the relative error (RE) and sparsity. We define the relative error by
$$\mathrm{RE} = \frac{\|X^{\ast} - X_{\mathrm{sol}}\|_F}{\|X^{\ast}\|_F}.$$
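The relative error is straightforward to compute; in the sketch below, `solve_l2p` is a placeholder name for whichever $l_{2,p}$ solver (e.g., Algorithm 2 or 3) is being evaluated, not an actual function from [18] or [21].

```python
import numpy as np

def relative_error(X_true, X_sol):
    """RE = ||X* - X_sol||_F / ||X*||_F."""
    return np.linalg.norm(X_true - X_sol, 'fro') / np.linalg.norm(X_true, 'fro')

# Hypothetical usage:
# X_sol = solve_l2p(A, B, p=0.1507)      # placeholder solver
# print(relative_error(X_star, X_sol))
```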
As shown in Figure 6, our choice of $p$ performs better than a larger one. However, we need to emphasize that our result may not be the optimal choice; only a slightly larger value is permitted. In our experiments the choice $p = 0.16$ is still good, but the performance begins to deteriorate at $p = 0.19$.
Figure 6: The performance of Algorithm 2 (left) and Algorithm 3 (right).

Conclusion
In this paper we have studied the equivalence relationship between $l_{2,0}$-minimization and $l_{2,p}$-minimization, and we have given an analytic expression for such a $p^{\ast}$. Furthermore, it needs to be pointed out that the conclusions of Theorems 2 and 3 remain valid for the single measurement vector problem, i.e., $l_p$-minimization also recovers the original unique solution of $l_0$-minimization when $0 < p < p^{\ast}$.
However, the analytic expression for $p^{\ast}$ in Theorem 3 may not be optimal. In this paper we have considered general underdetermined matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times r}$ from a theoretical point of view, so the result can likely be improved for matrices $A$ and $B$ with particular structure. The authors believe that an answer to this problem would be an important step forward for the application of $l_{2,p}$-minimization. In conclusion, the authors hope that this paper, like a brick cast out to attract jade, will stimulate further and better work on this problem.