Analysis of the equivalence relationship between $l_{0}$-minimization and $l_{p}$-minimization

In signal processing theory, $l_{0}$-minimization is an important mathematical model. Unfortunately, $l_{0}$-minimization is NP-hard. The most widely studied approach to this NP-hard problem is based on solving $l_{p}$-minimization ($0< p\leq1$).
In this paper, we present an analytic expression of $p^{\ast}(A,b)$, which is formulated in terms of the dimension of the matrix $A\in\mathbb{R}^{m\times n}$, the eigenvalues of the matrix $A^{T}A$, and the vector $b\in\mathbb{R}^{m}$, such that every $k$-sparse vector $x\in\mathbb{R}^{n}$ can be exactly recovered via $l_{p}$-minimization whenever $0< p< p^{\ast}(A,b)$, that is, $l_{p}$-minimization is equivalent to $l_{0}$-minimization whenever $0< p< p^{\ast}(A,b)$. The advantage of our results is that the analytic expression and each of its parts can be easily calculated. Finally, we give two examples to confirm the validity of our conclusions.


Introduction
In sparse information theory, a central goal is to find the sparsest solutions of underdetermined linear systems, which arise in applications including visual coding [1], matrix completion [2], source localization [3], and face recognition [4]. All these problems are commonly modeled by the following $l_{0}$-minimization:
$$\min_{x\in\mathbb{R}^{n}}\|x\|_{0}\quad\text{s.t.}\quad Ax=b,\tag{1}$$
where $A\in\mathbb{R}^{m\times n}$ is an underdetermined matrix (i.e., $m<n$), and $\|x\|_{0}$ is the number of nonzero elements of $x$, which is commonly called the $l_{0}$-norm although it is not a true vector norm. If $x\in\mathbb{R}^{n}$ is the unique solution of $l_{0}$-minimization, we also say that $x$ can be recovered by $l_{0}$-minimization; we use these two statements interchangeably in this paper. Since $A$ has more columns than rows, the underdetermined linear system $Ax=b$ admits an infinite number of solutions. To find the sparsest one, much excellent theoretical work (see, e.g., [5, 6] and [7]) has been devoted to $l_{0}$-minimization. However, Natarajan [8] proved that $l_{0}$-minimization is NP-hard. Furthermore, it is combinatorially and computationally intractable to solve $l_{0}$-minimization directly because of its discrete and discontinuous nature. Therefore, many authors have put forward alternative strategies to get the sparsest solution (see, e.g., [5, 9-15]). Among these methods, the most popular one is $l_{p}$-minimization with $0<p\leq1$ introduced by Gribonval and Nielsen [16]:
$$\min_{x\in\mathbb{R}^{n}}\|x\|_{p}^{p}\quad\text{s.t.}\quad Ax=b,$$
where $\|x\|_{p}^{p}=\sum_{i=1}^{n}|x_{i}|^{p}$. In the literature, $\|x\|_{p}$ is still called the $p$-norm of $x$ though it is only a quasi-norm when $0<p<1$ (because in this case it violates the triangle inequality). Since $\|x\|_{0}=\lim_{p\to0}\|x\|_{p}^{p}$, $l_{0}$-minimization and $l_{p}$-minimization are collectively called $l_{p}$-minimization with $0\leq p\leq1$ in this paper.
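The limit relation between the $p$-th power of the $p$-norm and the $l_{0}$-norm can be observed numerically. The following is a minimal sketch, not part of the original paper; the function names `lp_power` and `l0_norm` and the sample vector are chosen here purely for illustration.

```python
def lp_power(x, p):
    """Return sum_i |x_i|^p, the p-th power of the l_p quasi-norm (p > 0)."""
    return sum(abs(v) ** p for v in x if v != 0)


def l0_norm(x):
    """Return the number of nonzero entries of x."""
    return sum(1 for v in x if v != 0)


# Hypothetical sample vector with three nonzero entries.
x = [3.0, 0.0, -0.5, 0.0, 2.0]

# As p decreases towards 0, sum_i |x_i|^p approaches the number of nonzeros.
for p in (1.0, 0.5, 0.1, 0.01, 0.001):
    print(p, lp_power(x, p))

print(l0_norm(x))  # 3
```

Each nonzero term $|x_{i}|^{p}$ tends to 1 as $p\to0$, which is why the sum approaches the count of nonzero entries.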
However, to get the sparsest solution of $Ax=b$ via $l_{p}$-minimization, we need certain conditions on $A$ and/or $b$, for example, the restricted isometry property (RIP) of $A$. A matrix $A$ is said to have the restricted isometry property of order $k$ with restricted isometry constant $\delta_{k}\in(0,1)$ if $\delta_{k}$ is the smallest constant such that
$$(1-\delta_{k})\|x\|_{2}^{2}\leq\|Ax\|_{2}^{2}\leq(1+\delta_{k})\|x\|_{2}^{2}$$
for every $k$-sparse vector $x\in\mathbb{R}^{n}$. There exist many sufficient conditions for exact recovery by $l_{1}$-minimization, such as $\delta_{3k}+3\delta_{4k}<2$ in [10], $\delta_{2k}<\sqrt{2}-1$ in [9], and $\delta_{2k}<2(3-\sqrt{2})/7$ in [11]. Cai and Zhang [17] showed that for any given $t\geq\frac{4}{3}$, the condition $\delta_{tk}<\sqrt{\frac{t-1}{t}}$ guarantees recovery of every $k$-sparse vector by $l_{1}$-minimization. From the definition of the $p$-norm it seems more natural to consider $l_{p}$-minimization with $0<p<1$ instead of $l_{0}$-minimization. Foucart [11] showed that the condition $\delta_{2k}<0.4531$ guarantees exact $k$-sparse recovery via $l_{p}$-minimization for any $0<p<1$. Chartrand [18] proved that if $\delta_{2k+1}<1$, then we can recover a $k$-sparse vector by $l_{p}$-minimization for some $p>0$ small enough. However, it should be pointed out that the problem of calculating $\delta_{2k}$ for a given matrix $A$ is still NP-hard.
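As a quick arithmetic check on the constants quoted above, the sketch below evaluates the thresholds on $\delta_{2k}$ from [9] and [11] and shows that $2(3-\sqrt{2})/7$ rounds to the value 0.4531 quoted for [11]. The Cai-Zhang threshold is evaluated under the assumption that the bound has the form $\sqrt{(t-1)/t}$; all variable and function names are illustrative only.

```python
import math

# Threshold on delta_{2k} quoted from [9]: sqrt(2) - 1.
threshold_ref9 = math.sqrt(2) - 1

# Threshold on delta_{2k} quoted from [11]: 2(3 - sqrt(2)) / 7.
threshold_ref11 = 2 * (3 - math.sqrt(2)) / 7


def cai_zhang_threshold(t):
    """Assumed form sqrt((t - 1) / t) of the bound on delta_{tk}, for t >= 4/3."""
    if t < 4 / 3:
        raise ValueError("the bound is stated for t >= 4/3")
    return math.sqrt((t - 1) / t)


print(round(threshold_ref9, 4))          # 0.4142
print(round(threshold_ref11, 4))         # 0.4531
print(round(cai_zhang_threshold(2), 4))  # 0.7071
```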
Recently, Peng, Yue, and Li [7] proved that there exists a constant $p(A,b)>0$ such that every solution of $l_{p}$-minimization is also a solution of $l_{0}$-minimization whenever $0<p<p(A,b)$. This result builds a bridge between $l_{p}$-minimization and $l_{0}$-minimization, and, importantly, the conclusion is not limited by the structure of the matrix $A$. However, the paper [7] does not give an analytic expression of $p(A,b)$, so choosing $p$ for $l_{p}$-minimization remains difficult.
As already mentioned, it is NP-hard to calculate $\delta_{2k}$ for a given matrix $A\in\mathbb{R}^{m\times n}$ and also to calculate the corresponding values of $p$. Moreover, the recoverability of every $k$-sparse vector by $l_{0}$-minimization is only a necessary condition for the existence of such a $\delta_{2k}$, and therefore the results based on $\delta_{2k}$ are of limited practical use.
We emphasize that although $l_{p}$-minimization is also difficult due to its nonconvexity and nonsmoothness, many algorithms have been designed to solve it; see, e.g., [11, 19] and [20]. Moreover, a reasonable range of $p$ is very important in these algorithms. In this paper, we aim to give a complete answer to this problem.
Our paper is organized as follows. In Section 2, we present some preliminaries on the $l_{p}$-null space property, which plays a core role in the proof of our main theorem. In Section 3, we prove the main results of this paper: we present an analytic expression of $p^{\ast}(A,b)$ such that every $k$-sparse vector $x\in\mathbb{R}^{n}$ can be exactly recovered via $l_{p}$-minimization with $0<p<p^{\ast}(A,b)$ as long as $x$ can be recovered via $l_{0}$-minimization. Finally, we summarize our findings in the last section.
For convenience, for $x\in\mathbb{R}^{n}$, its support is defined by $\operatorname{support}(x)=\{i : x_{i}\neq0\}$, and the cardinality of a set $\Omega$ is denoted by $|\Omega|$. Let $\operatorname{Ker}(A)=\{x\in\mathbb{R}^{n} : Ax=0\}$ be the null space of a matrix $A$, and denote by $\lambda_{\min^{+}}(A)$ the minimal nonzero absolute-value eigenvalue of $A$ and by $\lambda_{\max}(A)$ the maximal one. We denote by $x_{\Omega}$ the vector that is equal to $x$ on the index set $\Omega$ and zero elsewhere and by $A_{\Omega}$ the submatrix whose columns are the columns of $A$ indexed by $\Omega$. Let $\Omega^{c}$ be the complement of $\Omega$.
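The notation above maps directly onto a few small helper functions. The sketch below is illustrative only: it assumes 0-based indices, plain Python lists for vectors and row-major lists of lists for matrices, and the function names are not taken from the paper.

```python
def support(x):
    """Return the set of indices i with x[i] != 0."""
    return {i for i, v in enumerate(x) if v != 0}


def restrict(x, omega):
    """Return the vector equal to x on the index set omega and zero elsewhere."""
    return [v if i in omega else 0.0 for i, v in enumerate(x)]


def submatrix(A, omega):
    """Return the submatrix of A made of the columns indexed by omega."""
    cols = sorted(omega)
    return [[row[j] for j in cols] for row in A]


def complement(omega, n):
    """Return the complement of omega in {0, ..., n-1}."""
    return set(range(n)) - set(omega)


x = [0.0, 2.0, 0.0, -1.0]
A = [[1, 2, 3], [4, 5, 6]]
print(support(x))              # {1, 3}
print(restrict(x, {1}))        # [0.0, 2.0, 0.0, 0.0]
print(submatrix(A, {0, 2}))    # [[1, 3], [4, 6]]
print(complement({1, 3}, 4))   # {0, 2}
```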

Preliminaries
To investigate conditions under which $l_{0}$-minimization and $l_{p}$-minimization have the same unique solution, it is convenient to work with a necessary and sufficient condition for the solutions of $l_{0}$-minimization and $l_{p}$-minimization. Therefore, in this preliminary section, we focus on introducing such a condition, namely the $l_{p}$-null space property.

Definition 1 ([16]) A matrix
In the literature, the null space property usually means the $l_{1}$-null space property. We now indicate the relation between the $l_{p}$-null space property and exact recovery via $l_{p}$-minimization with $0\leq p\leq1$.
Theorem 1 ([16, 21]) Given a matrix $A\in\mathbb{R}^{m\times n}$ with $m\leq n$, every $k$-sparse vector $x\in\mathbb{R}^{n}$ can be recovered via $l_{p}$-minimization with $0\leq p\leq1$ if and only if $A$ satisfies the $l_{p}$-null space property of order $k$.
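For a toy case, the condition in Theorem 1 can be checked by brute force. The sketch below assumes the usual form of the $l_{p}$-null space property, namely $\|x_{\Omega}\|_{p}^{p}<\|x_{\Omega^{c}}\|_{p}^{p}$ for every nonzero $x\in\operatorname{Ker}(A)$ and every index set $\Omega$ with $|\Omega|\leq k$ (with the number of nonzeros in place of the sum when $p=0$), and it only handles the special case where the null space is spanned by a single vector `v`. The function names and the example vector are hypothetical.

```python
def lp_mass(values, p):
    """Return sum |v|^p for p > 0, or the number of nonzeros for p == 0."""
    if p == 0:
        return sum(1 for v in values if v != 0)
    return sum(abs(v) ** p for v in values)


def nsp_one_dim_kernel(v, k, p):
    """Check the l_p-null space property of order k when Ker(A) = span{v}.

    By homogeneity it suffices to test v itself, and the most demanding
    index set consists of the k entries of largest magnitude.
    """
    mags = sorted((abs(t) for t in v), reverse=True)
    return lp_mass(mags[:k], p) < lp_mass(mags[k:], p)


# Hypothetical kernel vector: the l_1 property of order 1 fails (3 is not < 3),
# while the property holds for p = 0.5 and for p = 0.
v = [1, 1, 1, -3]
print(nsp_one_dim_kernel(v, 1, 1.0))  # False
print(nsp_one_dim_kernel(v, 1, 0.5))  # True
print(nsp_one_dim_kernel(v, 1, 0))    # True
```

In this example, recovery of 1-sparse vectors is guaranteed for $p=0$ and $p=0.5$ but not for $p=1$, which illustrates why the admissible range of $p$ matters.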
Theorem 1 provides a necessary and sufficient condition for deciding whether a vector can be recovered by $l_{p}$-minimization with $0\leq p\leq1$, which is the most important advantage of the $l_{p}$-null space property. However, the $l_{p}$-null space property is difficult to check for a given matrix. To reach our goal, we recall the concept of the null space constant (NSC), which is closely related to the $l_{p}$-null space property and helps greatly in describing the performance of $l_{0}$-minimization and $l_{p}$-minimization.
Similarly to the $l_{p}$-null space property, the NSC can also be used to characterize the performance of $l_{p}$-minimization. Combining the definition of the NSC with the results in [23] and [22], we can derive the following corollaries.

Corollary 1 For any p ∈ [0, 1], h(p, A, k) < 1 is a sufficient and necessary condition for recovery of all k-sparse vectors via l p -minimization with
Proof The proof is very easy, and we leave it to the readers.

Corollary 2 Given a matrix
Proof (a) We assume that there exists a vector
According to the definition of $h(p,A,k)$, these two conclusions contradict $h(0,A,k)<1$, and therefore we have that $\|x\|_{0}\geq2k+1$ for any $x\in\operatorname{Ker}(A)\setminus\{0\}$.
(b) As has been proved in (a), we get that
Due to the integer values of $\|x\|_{0}$ and $k$, it is easy to get that
when $n$ is even.

Remark 1
In Corollary 2, we obtained an inequality relating $n$ and $k$ under the assumption $h(0,A,k)<1$. Furthermore, Foucart [23, p. 49, Chapter 2] showed another inequality relating $m$ and $k$: if every $k$-sparse vector $x\in\mathbb{R}^{n}$ can be recovered via $l_{0}$-minimization, then $m\geq2k$; since $k$ is an integer, it follows that $k\leq\lfloor\frac{m}{2}\rfloor$.
Remark 2 Chen and Gu [22] showed some important properties of h(p, is the smallest number of columns from A that are linearly dependent. Therefore, if h(0, A, k) < 1 for some fixed A and k, then there exists a constant p * such that h(p, A, k) < 1 for p ∈ [0, p * ), that is, every k-sparse vector can be recovered via both l 0 -minimization and l p -minimization for p ∈ (0, p * ), which is a corollary of the main theorem in [7].
Theorem 2 ([7]) There exists a constant $p(A,b)>0$ such that, when $0<p<p(A,b)$, every solution to $l_{p}$-minimization also solves $l_{0}$-minimization.
Theorem 2 is the main theorem in [7]. This theorem qualitatively proves the effectiveness of solving the original $l_{0}$-minimization problem via $l_{p}$-minimization, and it becomes more practical if $p(A,b)$ is computable. At the end of this section, we point out that the $l_{p}$-null space property and the NSC yield the following lemma, a necessary and sufficient condition that is similar to the RIP.

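Lemma 1 Given an underdetermined matrix $A\in\mathbb{R}^{m\times n}$ and an integer $k$, the inequality $h(0,A,k)<1$ holds if and only if there exist two constants $u$ and $w$ with $0<\lambda_{\min^{+}}(A^{T}A)\leq u^{2}\leq w^{2}\leq\lambda_{\max}(A^{T}A)$
such that
$$u\|x\|_{2}\leq\|Ax\|_{2}\leq w\|x\|_{2}$$
for every $2k$-sparse vector $x\in\mathbb{R}^{n}$.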
Proof Necessity. The proof is divided into two steps.
Step 1: Proof of the existence of u.
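To prove this result, we just need to prove that the set
$$V=\{\|Ax\|_{2} : x\in\mathbb{R}^{n},\ \|x\|_{2}=1,\ \|x\|_{0}\leq2k\}$$
has a nonzero infimum. Assume that $\inf V=0$. Then, for any $n\in\mathbb{N}^{+}$, there exists a $2k$-sparse vector $x_{n}$ with $\|x_{n}\|_{2}=1$ such that $\|Ax_{n}\|_{2}\leq n^{-1}$.
Furthermore, the bounded sequence $\{x_{n}\}$ has a convergent subsequence $\{x_{n_{j}}\}$, that is, $x_{n_{j}}\to x_{0}$, and $Ax_{0}=0$ because the function $y(x)=Ax$ is continuous.
Let $J(x_{0})=\{i : (x_{0})_{i}\neq0\}$. For any $i\in J(x_{0})$, there exists $N_{i}$ such that $(x_{n_{j}})_{i}\neq0$ when $j\geq N_{i}$. Let $N=\max_{i\in J(x_{0})}N_{i}$. Then, for any $i\in J(x_{0})$, we have $(x_{n_{j}})_{i}\neq0$ when $j\geq N$, so that $\|x_{n_{j}}\|_{0}\geq\|x_{0}\|_{0}$ and hence $\|x_{0}\|_{0}\leq2k$. Since $\|x_{0}\|_{2}=1$, the limit $x_{0}$ is a nonzero vector in $\operatorname{Ker}(A)$ with at most $2k$ nonzero entries, which contradicts the bound $\|x\|_{0}\geq2k+1$ for $x\in\operatorname{Ker}(A)\setminus\{0\}$ obtained in Corollary 2.
Therefore, there exists a constant $u>0$ such that $\|Ax\|_{2}\geq u\|x\|_{2}$ for any $x\in\mathbb{R}^{n}$ with $\|x\|_{0}\leq2k$.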
Step 2: Proof of $u^{2}\geq\lambda_{\min^{+}}(A^{T}A)$. According to the proof above, there exists a vector $\tilde{x}\in\mathbb{R}^{n}$ with $\|\tilde{x}\|_{0}\leq2k$ such that $\|A\tilde{x}\|_{2}=u\|\tilde{x}\|_{2}$. Let $V=\operatorname{support}(\tilde{x})$. It is easy to get that
$$u^{2}x^{T}x\leq x^{T}A_{V}^{T}A_{V}x$$
for all $x\in\mathbb{R}^{|V|}$. Therefore, the smallest eigenvalue of $A_{V}^{T}A_{V}$ is $u^{2}$ since $A_{V}^{T}A_{V}\in\mathbb{R}^{|V|\times|V|}$ is a symmetric matrix, and we can choose an eigenvector $z\in\mathbb{R}^{|V|}$ of eigenvalue $u^{2}$.
If $u^{2}<\lambda_{\min^{+}}(A^{T}A)$, then consider the vector $x'\in\mathbb{R}^{n}$ with $x'_{i}=z_{i}$ when $i\in V$ and zero otherwise. Therefore, it is easy to get that $A^{T}Ax'=u^{2}x'$, which contradicts the definition of $\lambda_{\min^{+}}(A^{T}A)$.
Finally, notice that $A^{T}A$ is a positive semidefinite matrix such that $\|Ax\|_{2}^{2}=x^{T}A^{T}Ax\leq\lambda_{\max}(A^{T}A)\|x\|_{2}^{2}$ for all $x\in\mathbb{R}^{n}$. So there exists a constant $w$ such that $\|Ax\|_{2}^{2}\leq w^{2}\|x\|_{2}^{2}$ for all $x$ with $\|x\|_{0}\leq2k$.
Sufficiency. Let a $k$-sparse vector $x^{*}$ be the unique solution of $l_{0}$-minimization. For any $k$-sparse vector $x_{1}$, the difference $x^{*}-x_{1}$ is $2k$-sparse, so we have that
$$u\|x^{*}-x_{1}\|_{2}\leq\|A(x^{*}-x_{1})\|_{2}.$$
Therefore, we get that $x^{*}=x_{1}$ as long as $x_{1}$ is a solution of $Ax=b$, that is, every $k$-sparse vector can be recovered by $l_{0}$-minimization (1), and this is equivalent to $h(0,A,k)<1$ by Corollary 1.

Main contribution
In this section, we address the problem posed above. By introducing a new technique and using the preparations provided in Section 2, we present an analytic expression of $p^{\ast}(A,b)$ such that every $k$-sparse vector $x$ can be recovered via $l_{p}$-minimization with $0<p<p^{\ast}(A,b)$ as long as it can be recovered via $l_{0}$-minimization. To this end, we begin with two lemmas.

Lemma 2
For any $x\in\mathbb{R}^{n}$ and $0<p\leq1$, we have that
Proof This result can be easily proved by Hölder's inequality.
Proof By the assumption on the matrix $A$, it is easy to get that
Since $\operatorname{support}(x_{1})\cap\operatorname{support}(x_{2})=\emptyset$, we have that
from which we get that
With the above lemmas in hand, we can now prove our main theorems.

Theorem 3 Given a matrix A ∈ R m×n with m ≤ n and
where $h^{*}(p,A,k)=$
Proof According to Theorem 1 and Corollary 2, it is easy to get that $\|x\|_{0}\geq2k+1$ for every $x\in\operatorname{Ker}(A)\setminus\{0\}$ since $h(0,A,k)<1$. Furthermore, according to Lemma 1, we can find constants $\lambda_{\min^{+}}(A^{T}A)\leq u^{2}\leq w^{2}\leq\lambda_{\max}(A^{T}A)$ such that $u\|x\|_{2}\leq\|Ax\|_{2}\leq w\|x\|_{2}$ for any $x\in\mathbb{R}^{n}$ with $\|x\|_{0}\leq2k$. We know that $\|x\|_{0}\geq2k+1$, so both $\Omega_{1}$ and $\Omega_{0}$ are not empty, and there are only two cases: (i) $\Omega_{0}$ and $\Omega_{i}$ ($i=2,\ldots,t-1$) all have $k$ elements except, possibly, $\Omega_{t}$.
Since $u\|x\|_{2}\leq\|Ax\|_{2}\leq w\|x\|_{2}$ for any $x$ with $\|x\|_{0}\leq2k$, we have that
Since
According to Lemma 3, for any $i\in\{2,3,\ldots,t\}$, we get that
Substituting inequalities (19) into (18), we have
By the definition of $x^{(1)}_{\Omega_{1}}$ and $x_{\Omega_{1}}$ it is easy to get that
and $x^{(2)}_{\Omega_{1}}$
Therefore, we get that
Substituting inequalities (20) and (23) into (17), we have that
For any $2\leq i\leq t$ and any element $a$ of $x_{\Omega_{i}}$, it is easy to get that $|a|^{p}\leq\frac{1}{k+1}\|x_{\Omega_{1}}\|_{p}^{p}$, so we have the inequalities:
and
Substituting inequalities (25) and (26) into (24), we derive that
Let $r=\frac{w^{2}}{u^{2}}$ and
Then we can rewrite inequality (27) as
so that
Therefore, we get that $\|x_{\Omega_{0}}\|_{2}\leq\frac{\sqrt{2}+1}{2}B$. According to Lemma 2, we have that
Substituting the expression of $B$ into inequality (29), we obtain that
We notice that the sets $\Omega_{0}$ and $\Omega_{i}$ ($i=2,\ldots,t-1$) all have $k$ elements and the set $\Omega_{1}$ has $k+1$ elements such that $tk+2\leq n\leq(t+1)k+1$, so we get that $t\leq\frac{n-2}{k}$. According to Lemma 1, we have that $r=\frac{w^{2}}{u^{2}}\leq\frac{\lambda_{\max}(A^{T}A)}{\lambda_{\min^{+}}(A^{T}A)}$. Substituting the inequalities into (30), we obtain
It is obvious that $\|x_{\Omega_{1}}\|_{p}\leq\|x_{\Omega_{0}^{c}}\|_{p}$, and therefore we get that
where $h^{*}(p,A,k)$ is given in (15). According to the definition of $h(p,A,k)$, we can get that $h(p,A,k)\leq h^{*}(p,A,k)$.
Theorem 3 presents a result that is very similar to the result in Theorem 1. However, it is worth pointing out that the constant $h^{*}(p,A,k)$ plays a central role in Theorem 3. In fact, we can treat $h^{*}(p,A,k)$ as an estimate of $h(p,A,k)$: the former is computable, whereas computing the latter is NP-hard, so $h^{*}(p,A,k)$ may be regarded as a practical improvement on $h(p,A,k)$. According to Theorem 1, if we take $k$ as the $l_{0}$-norm of the unique solution of $l_{0}$-minimization, then we obtain the main contribution as soon as the inequality $h^{*}(p,A,k)^{\frac{1}{p}}<1$ is satisfied.
Theorem 4 Let A ∈ R m×n be an underdetermined matrix of full rank, and denote * = | support(A T (AA T ) -1 b)|. If every k-sparse vector x can be recovered via l 0 -minimization, then x also can be recovered via l p -minimization with p ∈ (0, p * (A, b)), where 2 )( (λ-1)(n-3) and Combining Theorems 3 and 4, we have reached the major goals of this paper. The most important result in these two theorems is the analytic expression of p * (A, b), with which the specific range of p can be easily calculated.
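To illustrate what exact recovery via $l_{p}$-minimization means on a tiny system, the sketch below solves $l_{p}$-minimization exactly in the special case where the null space of $A$ is spanned by a single vector `v`. It is not the method of the paper: it relies on the observation that, for $0<p\leq1$, the objective $t\mapsto\|x_{\mathrm{part}}+tv\|_{p}^{p}$ is concave between consecutive breakpoints $t_{i}=-(x_{\mathrm{part}})_{i}/v_{i}$ and grows without bound as $|t|\to\infty$, so a global minimizer is attained at a breakpoint. The matrix, the particular solution `x_part`, and the function names are hypothetical, and exact rational arithmetic is used so that vanishing coordinates are exactly zero.

```python
from fractions import Fraction


def lp_cost(x, p):
    """Return sum_i |x_i|^p for 0 < p <= 1."""
    return sum(float(abs(c)) ** p for c in x)


def lp_min_one_dim_kernel(x_part, v, p):
    """Minimize ||x_part + t * v||_p^p over t when Ker(A) = span{v}, v != 0.

    Only the breakpoints t_i = -x_part[i] / v[i] need to be compared.
    """
    x_part = [Fraction(c) for c in x_part]
    v = [Fraction(c) for c in v]
    best_cost, best_x = None, None
    for a, d in zip(x_part, v):
        if d == 0:
            continue  # this coordinate never vanishes along the line
        t = -a / d
        x = [xi + t * vi for xi, vi in zip(x_part, v)]
        cost = lp_cost(x, p)
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
    return best_x


# Hypothetical example: A = [[1, 0, 1], [0, 1, 1]] and b = (2, 2).
# Ker(A) is spanned by v = (1, 1, -1); x_part = (2, 2, 0) solves Ax = b.
x_part = [2, 2, 0]
v = [1, 1, -1]
x_hat = lp_min_one_dim_kernel(x_part, v, p=0.5)
print([float(c) for c in x_hat])  # [0.0, 0.0, 2.0]
```

Here the minimizer for $p=0.5$ is the 1-sparse vector $(0,0,2)$, which is also the sparsest solution of the system.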
Next, we present two examples to demonstrate the validity of Theorem 4. We consider two matrices whose null spaces have different dimensions and compute the unique solution to $l_{p}$-minimization to verify whether it is the unique solution to $l_{0}$-minimization. Therefore, the $p$-norm of $x$ can be expressed as
In this paper, we have presented an analytic expression of $p^{\ast}(A,b)$ such that $l_{p}$-minimization is equivalent to $l_{0}$-minimization. Although it is NP-hard to find the global optimal solution of $l_{p}$-minimization, a local minimizer can be found in polynomial time [24]. Chen and Gu [22] proved that $h(p,A,k)<1$ is a necessary and sufficient condition for the global optimality of $l_{p}$-minimization. Therefore, we are confident that the sparse solution can be found by $l_{p}$-minimization with $0<p<p^{\ast}(A,b)$ as long as we start with a good initialization.
However, in this paper, we only consider the situation where $l_{0}$-minimization has a unique solution. The uniqueness assumption is vital for proving the main results. From Lemma 1 we see that the uniqueness assumption is equivalent to a certain double-inequality condition, which looks like the RIP. The evident difference between them is that the former is homogeneous whereas the latter is not. This implies that, unlike the RIP, the uniqueness assumption does not conflict with the equivalence of all linear systems $\lambda Ax=\lambda b$, $\lambda\in\mathbb{R}$. Therefore, we think that the uniqueness assumption and, equivalently, the resulting double-inequality condition in Lemma 1 can replace the RIP in many cases.
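The homogeneity remark can be made explicit with a short worked calculation. The sketch below assumes that the double inequality of Lemma 1 has the form $u\|x\|_{2}\leq\|Ax\|_{2}\leq w\|x\|_{2}$ for every $2k$-sparse $x$ and uses the standard RIP inequality of order $2k$; the scaling factor is taken to be nonzero.

```latex
% Assumption: u ||x||_2 <= ||Ax||_2 <= w ||x||_2 for every 2k-sparse x.
% For any \lambda \neq 0, the rescaled matrix \lambda A satisfies
\begin{align*}
|\lambda|\,u\,\|x\|_{2} \leq \|\lambda A x\|_{2} \leq |\lambda|\,w\,\|x\|_{2},
\end{align*}
% so the same type of condition holds with the constants |\lambda| u and
% |\lambda| w, and the ratio w / u is unchanged by the scaling.
%
% By contrast, if A has the RIP of order 2k with constant \delta_{2k}, then
\begin{align*}
\lambda^{2}(1-\delta_{2k})\|x\|_{2}^{2}
  \leq \|\lambda A x\|_{2}^{2}
  \leq \lambda^{2}(1+\delta_{2k})\|x\|_{2}^{2}.
\end{align*}
% Once \lambda^{2}(1-\delta_{2k}) \geq 2, every nonzero 2k-sparse x gives
% \|\lambda A x\|_{2}^{2} \geq 2\|x\|_{2}^{2}, which exceeds (1+\delta)\|x\|_{2}^{2}
% for every \delta in (0,1); hence \lambda A has no RIP constant in (0,1),
% although \lambda A x = \lambda b has the same solutions as A x = b.
```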