A result on the limiting spectral distribution of random matrices with unequal variance entries

A classical result in random matrix theory states that the empirical spectral distribution of a Wigner matrix whose entries have a common variance and satisfy other regularity assumptions converges almost surely to the semicircular law. In this paper, we relax the assumption that all entries have the same variance: when the average of the normalized sums of the variances in each row of the data matrix converges to a constant, we prove that the same limiting spectral distribution holds. A similar result for sample covariance matrices is also established. The proofs mainly depend on the Stein equation and the generalized Stein equation for independent random variables.


Introduction
Suppose $A_n$ is an $n \times n$ Hermitian matrix and $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote the real eigenvalues of $A_n$. The empirical spectral distribution function (ESD) of $A_n$ can be defined as
$$F^{A_n}(x) = \frac{1}{n} \sum_{i=1}^{n} I_{\{\lambda_i \le x\}},$$
where $I_A$ represents the indicator function on the set $A$. The limit distribution of $F^{A_n}(x)$ as $n \to \infty$, if it exists, is called the limiting spectral distribution (LSD) of $A_n$. Since most of the global spectral limiting properties of $A_n$ can be determined by its LSD, the LSD of large dimensional random matrices has attracted considerable interest among mathematicians, probabilists, and statisticians; one can refer to Wigner [15,16], Grenander and [...] distribution $F_{sc}$, whose density is given by
$$F_{sc}'(x) = \begin{cases} \dfrac{1}{2\pi}\sqrt{4 - x^2}, & |x| \le 2, \\ 0, & \text{otherwise}. \end{cases}$$
The LSD $F_{sc}$ is usually called the semicircular law in the literature. Grenander [6] proved that $F^{W_n} - F_{sc} \to 0$ in probability. Arnold [1,2] showed that $F^{W_n}$ converges to $F_{sc}$ almost surely. Pastur [12] removed the identically distributed assumption and considered the case where the entries on or above the diagonal of $X_n$ are independent real or complex random variables with mean zero and variance 1, not necessarily identically distributed, but satisfying the following Lindeberg type assumption: for any constant $\eta > 0$,
$$\lim_{n \to \infty} \frac{1}{n^2} \sum_{i,j} E\big(|X_{ij}|^2 I_{\{|X_{ij}| \ge \eta\sqrt{n}\}}\big) = 0. \qquad (1.1)$$
Then the ESD of $W_n$ converges almost surely to the semicircular law.

In all of the results above, the assumption that the entries of the Wigner matrix have a common variance is required. However, in practical applications, the uniform variance assumption is a strong condition. In this paper, we remove the uniform variance assumption and establish the same semicircular law under a milder assumption on the variances of the entries. In particular, the variances of the entries need not be equal to a common constant; we only assume that the average of the normalized sums of the variances in each row of the data matrix converges to a positive constant. The result reads as follows.

Theorem 1.1 Let $W_n = \frac{1}{\sqrt{n}} X_n$ be a Wigner matrix, and let the entries on or above the diagonal of $X_n$ be independent real or complex random variables, not necessarily identically distributed. Assume that all the entries of $X_n$ have mean zero and variance $E|X_{ij}|^2 = \sigma_{ij}^2$, where $\sigma_{ij}$ satisfies
$$\frac{1}{n} \sum_{i=1}^{n} \Big| \frac{1}{n} \sum_{j=1}^{n} \sigma_{ij}^2 - 1 \Big| \to 0 \quad \text{as } n \to \infty,$$
and that the assumption (1.1) holds. Then, almost surely, the ESD of $W_n$ converges weakly to the semicircular law.
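As a numerical illustration of Theorem 1.1, the following minimal sketch simulates a real symmetric matrix with unequal entry variances and compares the second and fourth moments of its ESD with those of the semicircular law (1 and 2). The checkerboard variance profile, the matrix size, the Gaussian entries and the function name `wigner_unequal_variance` are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def wigner_unequal_variance(n, rng):
    """Return W = X / sqrt(n) and the (hypothetical) variance profile sigma2."""
    i, j = np.indices((n, n))
    # Checkerboard profile: variances 0.5 and 1.5, symmetric in (i, j).
    # For even n every row averages exactly to 1.
    sigma2 = np.where((i + j) % 2 == 0, 0.5, 1.5)
    g = rng.standard_normal((n, n)) * np.sqrt(sigma2)
    # Keep independent entries on and above the diagonal, then symmetrize.
    x = np.triu(g) + np.triu(g, 1).T
    return x / np.sqrt(n), sigma2

rng = np.random.default_rng(0)
n = 400
w, sigma2 = wigner_unequal_variance(n, rng)

# Row-average condition of Theorem 1.1: (1/n) sum_i |(1/n) sum_j sigma2_ij - 1|.
row_condition = np.mean(np.abs(sigma2.mean(axis=1) - 1.0))

eigs = np.linalg.eigvalsh(w)
m2 = np.mean(eigs ** 2)  # semicircular law: second moment 1
m4 = np.mean(eigs ** 4)  # semicircular law: fourth moment 2

print(f"row condition = {row_condition:.3e}, m2 = {m2:.3f}, m4 = {m4:.3f}")
```

Although no single entry has variance 1 in this profile, the empirical moments should be close to those of the semicircular law, which is the behaviour the theorem predicts.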
Remark 1.1 The result of Theorem 1.1 can be extended to a more general one: when the average of the normalized sums in each row converges to a positive constant $\sigma^2$, then almost surely the LSD of $W_n$ is the general semicircular law with density
$$\begin{cases} \dfrac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - x^2}, & |x| \le 2\sigma, \\ 0, & \text{otherwise}. \end{cases}$$

Now we consider the LSD of a sample covariance matrix, which is also an important object in random matrix theory and multivariate statistics. Suppose $Y_n = (Y_{ij})_{n \times N}$ is a real or complex random matrix whose entries $Y_{ij}$ ($i = 1, \ldots, n$, $j = 1, \ldots, N$) are i.i.d. real or complex random variables with mean zero and variance 1. Write $Y_j = (Y_{1j}, \ldots, Y_{nj})'$ and [...], where $*$ represents the conjugate transpose; we usually consider the sample covariance matrix defined by
$$S_n = \frac{1}{N} Y_n Y_n^*.$$
The limiting spectral properties of large sample covariance matrices have generated a considerable amount of interest in statistics, signal processing and other disciplines. The first result on the LSD of $S_n$ is due to Marcenko and Pastur [10], who proved that when $\lim_{n \to \infty} n/N = y \in (0, \infty)$, the LSD of $S_n$ is the M-P law $F^{MP}_y(x)$ with density
$$\begin{cases} \dfrac{1}{2\pi x y}\sqrt{(b - x)(x - a)}, & a \le x \le b, \\ 0, & \text{otherwise}, \end{cases}$$
and has a point mass $1 - 1/y$ at the origin if $y > 1$, where $a = (1 - \sqrt{y})^2$ and $b = (1 + \sqrt{y})^2$.

There is also further work on the M-P law for sample covariance matrices, such as Bai and Yin [4], Grenander and Silverstein [7], Jonsson [8], Yin [17], Silverstein [13] and Silverstein and Bai [14]. A typical result (see Theorem 3.9 of Bai and Silverstein [3]) states that when the entries of $Y_n$ are independent random variables with mean zero and variance 1, $n/N \to y \in (0, \infty)$, and for any $\eta > 0$,
$$\frac{1}{\eta^2 n N} \sum_{i,j} E\big(|Y_{ij}|^2 I_{\{|Y_{ij}| \ge \eta\sqrt{N}\}}\big) \to 0,$$
then the ESD of $S_n$ tends to the M-P law $F^{MP}_y$ almost surely. Note that the assumption that the entries of $Y_n$ have a uniform variance 1 is also required in the proof. With the same motivation as for Theorem 1.1, we also consider removing the equal variance condition.
Similarly, we can get the following result.

Theorem 1.2
Assume that the entries of the random matrix $Y_n$ defined above are independent variables with mean zero and variance $E|Y_{ij}|^2 = \sigma_{ij}^2$, where [...] $\sigma_{ij}^2 - \sigma^2| \to 0$, and that the other assumptions remain unchanged. Then, almost surely, the LSD of $S_n = \frac{1}{N} Y_n Y_n^*$ is the general M-P law with density
$$\begin{cases} \dfrac{1}{2\pi x y \sigma^2}\sqrt{(\tilde{b} - x)(x - \tilde{a})}, & \tilde{a} \le x \le \tilde{b}, \\ 0, & \text{otherwise}, \end{cases}$$
and it has a point mass $1 - 1/y$ at the origin if $y > 1$, where $\tilde{a} = \sigma^2 (1 - \sqrt{y})^2$ and $\tilde{b} = \sigma^2 (1 + \sqrt{y})^2$.

The rest of the paper is organized as follows. The proofs of the main results are presented in Sect. 2. In the Appendix, some useful lemmas are listed. In the sequel, when there is no confusion, we drop the subscript $n$ in the notation of matrices for brevity. $A^*$ denotes the conjugate transpose of the matrix $A$, $\mathrm{tr}(A)$ denotes the trace of $A$, and $C$ denotes a positive constant, which may differ from one appearance to the next.
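As a numerical illustration of Theorem 1.2, the sketch below builds a sample covariance matrix from entries with unequal variances and compares the first two moments of its ESD with those of the M-P law with $\sigma^2 = 1$, namely $1$ and $1 + y$. The checkerboard variance profile (whose row and column averages both equal 1), the dimensions, the Gaussian entries and the function name `sample_covariance_unequal_variance` are assumptions made for illustration only.

```python
import numpy as np

def sample_covariance_unequal_variance(n, N, rng):
    """Return S = Y Y^T / N for an n x N matrix Y with a hypothetical variance profile."""
    i, j = np.indices((n, N))
    # Variances alternate between 0.5 and 1.5; for even n and N the
    # average variance in every row and every column is exactly 1.
    sigma2 = np.where((i + j) % 2 == 0, 0.5, 1.5)
    y = rng.standard_normal((n, N)) * np.sqrt(sigma2)
    return y @ y.T / N

rng = np.random.default_rng(1)
n, N = 200, 800
y_ratio = n / N  # plays the role of y = lim n/N

s = sample_covariance_unequal_variance(n, N, rng)
eigs = np.linalg.eigvalsh(s)

m1 = np.mean(eigs)       # M-P law with sigma^2 = 1: first moment 1
m2 = np.mean(eigs ** 2)  # M-P law with sigma^2 = 1: second moment 1 + y

print(f"m1 = {m1:.3f} (limit 1), m2 = {m2:.3f} (limit {1 + y_ratio:.3f})")
```

Matching two moments does not prove convergence of the ESD; it is only a quick sanity check that the unequal variances do not shift the bulk of the spectrum.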

Proofs
The Stieltjes transform method is mainly adopted to complete the proofs. For a distribution function $F(x)$, its Stieltjes transform can be defined as
$$s_F(z) = \int \frac{1}{x - z}\, dF(x), \quad z \in \mathbb{C}^+ = \{z \in \mathbb{C} : \Im z > 0\}.$$
Obviously, we can write the Stieltjes transform of the ESD $F^{A_n}(x)$ as
$$s_{F^{A_n}}(z) = \frac{1}{n}\, \mathrm{tr}(A_n - z I_n)^{-1},$$
where $I_n$ is the identity matrix of order $n$. The continuity theorem of the Stieltjes transform states that, for a sequence of functions of bounded variation $\{G_n\}$ with Stieltjes transforms $s_{G_n}(z)$, $G_n(-\infty) = 0$ for all $n$, and a function of bounded variation $G$, $G(-\infty) = 0$, with Stieltjes transform $s_G(z)$, $G_n$ converges vaguely to $G$ if and only if $s_{G_n}(z)$ converges to $s_G(z)$ for $z \in \mathbb{C}^+$. In view of the fact that the sequence of ESDs of Wigner matrices is tight (see Lytova and Pastur [9]), the weak convergence of the ESD can be obtained from the convergence of the corresponding Stieltjes transforms. Furthermore, if the LSD is a deterministic probability distribution function, then the almost sure convergence of the ESD can be obtained from the almost sure convergence of the Stieltjes transform, which is the basic idea of the following proofs.
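To make the two expressions for the Stieltjes transform concrete, the following sketch evaluates the transform of an ESD both through the resolvent trace and through the eigenvalues, and compares it with the Stieltjes transform of the semicircular law, which is the root of $s^2 + zs + 1 = 0$ with positive imaginary part. The test matrix (Gaussian entries with unit variance), the point $z$ and the helper names are assumptions for illustration.

```python
import numpy as np

def stieltjes_esd(a, z):
    """Stieltjes transform of the ESD of a Hermitian matrix a: (1/n) tr (a - z I)^{-1}."""
    n = a.shape[0]
    return np.trace(np.linalg.inv(a - z * np.eye(n))) / n

def stieltjes_semicircle(z):
    """Root of s^2 + z s + 1 = 0 lying in the upper half-plane (for Im z > 0)."""
    root = np.sqrt(complex(z * z - 4))
    candidates = [(-z + root) / 2, (-z - root) / 2]
    return max(candidates, key=lambda s: s.imag)

rng = np.random.default_rng(2)
n = 400
g = rng.standard_normal((n, n))
w = (np.triu(g) + np.triu(g, 1).T) / np.sqrt(n)  # real symmetric, unit variances

z = 0.5 + 1.0j
s_trace = stieltjes_esd(w, z)                       # resolvent form
s_eig = np.mean(1.0 / (np.linalg.eigvalsh(w) - z))  # integral against the ESD
s_limit = stieltjes_semicircle(z)

print(s_trace, s_eig, s_limit)
```

The first two values agree up to rounding because the ESD puts mass $1/n$ on each eigenvalue; the third is the limit that the proofs aim for.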

Proof of Theorem 1.1
Define $F^{W_n}$ to be the ESD of $W_n$ and $s_n(z)$ the Stieltjes transform of $F^{W_n}$. By the continuity theorem of the Stieltjes transform, we complete the proof of Theorem 1.1 by showing
$$s_n(z) \to s(z), \quad z \in \mathbb{C}^+, \ \text{a.s.}, \qquad (2.1)$$
where $s(z)$ is the Stieltjes transform of the semicircular law $F_{sc}$. The proof for a real-valued Wigner matrix is almost the same as that for a complex-valued Wigner matrix; that is, all the results as well as the main ingredients of the proofs in the real symmetric case remain valid in the Hermitian case with natural modifications. For the sake of simplicity, we confine ourselves to a real symmetric Wigner matrix. To this end, let $\hat{W}_n = \frac{1}{\sqrt{n}} \hat{X}_n$ be a Wigner matrix independent of $W_n$, where the entries of $\hat{X}_n = (\hat{X}_{ij})_{n \times n}$ are independent $N(0, 1)$ random variables. Define $F^{\hat{W}_n}$ to be the ESD of $\hat{W}_n$, and $\hat{s}_n(z)$ the Stieltjes transform of $F^{\hat{W}_n}$. By Theorem 2.9 of Bai and Silverstein [3], we know that, almost surely, the LSD of $\hat{W}_n$ is the semicircular law $F_{sc}(x)$, which means
$$\hat{s}_n(z) \to s(z), \quad z \in \mathbb{C}^+, \ \text{a.s.}$$
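Thus, (2.1) can be achieved by
$$\hat{s}_n(z) - s_n(z) \to 0, \quad z \in \mathbb{C}^+, \ \text{a.s.} \qquad (2.2)$$
In the sequel, we complete the proof of (2.2) in the following two steps.
(i) For any fixed $z \in \mathbb{C}^+$, $s_n(z) - E s_n(z) \to 0$, a.s. and $\hat{s}_n(z) - E \hat{s}_n(z) \to 0$, a.s.
(ii) For any fixed $z \in \mathbb{C}^+$, $E s_n(z) - E \hat{s}_n(z) \to 0$.

We begin with step (i). Define $W_k$ to be the major submatrix of order $(n - 1)$ obtained from $W_n$ by removing the $k$th row and column, and $\alpha_k$ to be the vector obtained from the $k$th column of $W_n$ by deleting the $k$th entry. Denote by $E_k(\cdot)$ the conditional expectation with respect to the $\sigma$-field generated by the random variables $\{X_{i,j}, i, j > k\}$, with the convention that $E_n s_n(z) = E s_n(z)$ and $E_0 s_n(z) = s_n(z)$. Then
$$s_n(z) - E s_n(z) = \sum_{k=1}^{n} \big[E_{k-1} s_n(z) - E_k s_n(z)\big] = \sum_{k=1}^{n} \gamma_k, \quad \gamma_k = \frac{1}{n}(E_{k-1} - E_k)\big[\mathrm{tr}(W_n - z I_n)^{-1} - \mathrm{tr}(W_k - z I_{n-1})^{-1}\big].$$
By Theorem A.5 of Bai and Silverstein [3], we know
$$\big|\mathrm{tr}(W_n - z I_n)^{-1} - \mathrm{tr}(W_k - z I_{n-1})^{-1}\big| \le \frac{1}{v}, \quad v = \Im z > 0,$$
which implies $|\gamma_k| \le 2/(nv)$. Since $\{\gamma_k, k \ge 1\}$ forms a martingale difference sequence, it follows from Lemma A.1 with $p = 4$ that
$$E|s_n(z) - E s_n(z)|^4 \le C\, E\Big(\sum_{k=1}^{n} |\gamma_k|^2\Big)^2 \le \frac{C}{n^2 v^4},$$
which, together with the Borel-Cantelli lemma, yields $s_n(z) - E s_n(z) \to 0$, a.s. for every fixed $z \in \mathbb{C}^+$. The same argument applies to $\hat{s}_n(z)$. Therefore, step (i) is completed.

Now we come to step (ii). We first introduce some notation: [...] By the facts that $X(1) = X$ and $X(0) = \hat{X}$, we can write
$$E s_n(z) - E \hat{s}_n(z) = [\ldots]. \qquad (2.3)$$
Write the $(i, j)$-entry of $G$ by $G_{ij}$ and the $(i, j)$-entry of $X(s)$ by $X_{ij}(s)$. Since the random variables $\hat{X}_{ij}$ are independent $N(0, 1)$ random variables, applying the Stein equation in Lemma A.2 with $\Phi = G_{ij}$, we have [...] where $D_{ij}(s) = \partial/\partial X_{ij}(s)$.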
On the other hand, as the random variables $X_{ij}$ are independent, we adopt the generalized Stein equation in Lemma A.3 to rewrite the second term in the parentheses on the r.h.s. of (2.3). To this end, we take $p = 1$ and $\Phi = G_{ij}$ in Lemma A.3. Note that $\kappa_1 = E X_{ij} = 0$ and $\kappa_2 = E|X_{ij}|^2$. Then we have [...] where $\wp_n$ is the set of $n \times n$ real symmetric matrices.
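The two identities used here can be checked numerically in one dimension. The sketch below assumes the usual forms of these lemmas, since their statements are not reproduced in this section: for $\xi \sim N(0, 1)$, $E[\xi \Phi(\xi)] = E[\Phi'(\xi)]$; and for a general centered $\xi$ with $p = 1$, $E[\xi \Phi(\xi)] = \kappa_2 E[\Phi'(\xi)] + \varepsilon$ with $|\varepsilon| \le C\, E|\xi|^3 \sup|\Phi''|$. The test function $\Phi(x) = \sin(tx)$ and the Rademacher variable are hypothetical choices made for illustration.

```python
import numpy as np

def phi(x, t):
    return np.sin(t * x)

def dphi(x, t):
    return t * np.cos(t * x)

t = 0.7

# Gaussian case: expectations under N(0, 1) via Gauss-Hermite quadrature
# (probabilists' weight exp(-x^2 / 2), normalized by sqrt(2 pi)).
nodes, weights = np.polynomial.hermite_e.hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)
lhs_gauss = np.sum(weights * nodes * phi(nodes, t))  # E[xi * Phi(xi)]
rhs_gauss = np.sum(weights * dphi(nodes, t))         # E[Phi'(xi)]

def rademacher_remainder(t):
    """E[xi Phi(xi)] - kappa_2 E[Phi'(xi)] for xi = +1 or -1 with probability 1/2."""
    # Here kappa_1 = 0, kappa_2 = 1, E|xi|^3 = 1 and sup|Phi''| = t^2.
    return np.sin(t) - t * np.cos(t)

print(lhs_gauss, rhs_gauss)
print(rademacher_remainder(t), rademacher_remainder(0.05) / 0.05 ** 3)
```

In the Gaussian case the two sides coincide, while in the non-Gaussian case a remainder appears that shrinks like $t^3$ as $t \to 0$. In the matrix setting the role of $t$ is played by the factor $n^{-1/2}$ in $W_n = n^{-1/2} X_n$, which is why the remainder terms can be controlled.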

Thus
$$E s_n(z) - E \hat{s}_n(z) = [\ldots]. \qquad (2.4)$$
By (3.25) in Lytova and Pastur [9], [...] (2.5) where $c_l$ is an absolute constant for every $l$. Let $l = 1$; then $|D_{ij} G_{ij}| \le c_1/v^3$, and [...] Since $E|X_{ij}|^2 = \sigma_{ij}^2$, and $\sigma_{ij}$ satisfies $\frac{1}{n} \sum_{i=1}^{n} \big| \frac{1}{n} \sum_{j=1}^{n} \sigma_{ij}^2 - 1 \big| \to 0$, we have [...] By the assumption (1.1), we select a sequence $\eta_n \downarrow 0$ as $n \to \infty$ such that [...] The convergence rate of $\eta_n$ can be as slow as desired. For definiteness, we may assume that $\eta_n > 1/\log n$ and $\eta_n \to 0$. Then we have [...] $o(1)$. And by (2.5), let $l = 2$; then [...] So we have [...] Since $\eta_n \to 0$ as $n \to \infty$, we have $II = o(1)$. This, together with (2.4) and (2.6), means that $E s_n(z) - E \hat{s}_n(z) \to 0$ for any fixed $z \in \mathbb{C}^+$.
Step (ii) is completed.
Combining steps (i) and (ii), we see that (2.2) is proved, and hence so is (2.1). Therefore, we have $F^{W_n} \xrightarrow{w} F_{sc}$, a.s., which completes the proof of Theorem 1.1.
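The almost sure convergence in step (i) of the proof above can be spelled out as follows. This is a sketch under two assumptions: that Lemma A.1 (the Burkholder inequality) has the standard form $E|\sum_k \gamma_k|^p \le C\, E(\sum_k |\gamma_k|^2)^{p/2}$, and that $v = \Im z$; only the bound $|\gamma_k| \le 2/(nv)$ is taken from the proof.

```latex
% Sketch of the Borel-Cantelli step; C is a generic positive constant.
\begin{align*}
\sum_{k=1}^{n} |\gamma_k|^2
  &\le n \cdot \frac{4}{n^2 v^2} = \frac{4}{n v^2}, \\
E\,|s_n(z) - E s_n(z)|^4
  &\le C\, E\Big(\sum_{k=1}^{n} |\gamma_k|^2\Big)^{2}
   \le \frac{16\,C}{n^2 v^4}, \\
P\big(|s_n(z) - E s_n(z)| \ge \varepsilon\big)
  &\le \frac{E\,|s_n(z) - E s_n(z)|^4}{\varepsilon^4}
   \le \frac{16\,C}{\varepsilon^4 n^2 v^4}, \\
\sum_{n=1}^{\infty} P\big(|s_n(z) - E s_n(z)| \ge \varepsilon\big)
  &\le \frac{16\,C}{\varepsilon^4 v^4} \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty .
\end{align*}
```

Since the last series converges for every $\varepsilon > 0$, the Borel-Cantelli lemma gives $s_n(z) - E s_n(z) \to 0$ almost surely for each fixed $z \in \mathbb{C}^+$. The choice $p = 4$ is what makes the tail probabilities summable; $p = 2$ would only give a bound of order $1/n$.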

Proof of Theorem 1.2
We again consider the real-valued sample covariance matrix case and follow a procedure similar to that of Theorem 1.1 to complete the proof. To this end, we first define $\hat{Y}_n = (\hat{Y}_{ij})_{n \times N}$ to be an $n \times N$ random matrix independent of $Y_n$, whose entries $\hat{Y}_{ij}$ are i.i.d. $N(0, 1)$ random variables. Write $\hat{S}_n = \frac{1}{N} \hat{Y}_n \hat{Y}_n^*$. We use $F^{S_n}$ and $F^{\hat{S}_n}$ to denote the ESDs of $S_n$ and $\hat{S}_n$, respectively. Let $m_n(z)$ and $\hat{m}_n(z)$ be the Stieltjes transforms of $F^{S_n}$ and $F^{\hat{S}_n}$, respectively.
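By Theorem 3.10 of Bai and Silverstein [3], we have
$$\hat{m}_n(z) \to m(z), \quad z \in \mathbb{C}^+, \ \text{a.s.},$$
where $m(z)$ is the Stieltjes transform of the standard M-P law $F^{MP}_y$. Thus, by the continuity theorem of the Stieltjes transform again, we complete the proof by showing that, for any fixed $z \in \mathbb{C}^+$,
(i) $m_n(z) - E m_n(z) \to 0$, a.s. and $\hat{m}_n(z) - E \hat{m}_n(z) \to 0$, a.s.
(ii) $E m_n(z) - E \hat{m}_n(z) \to 0$.

For (i), we argue as in Bai and Silverstein [3]. For the sake of completeness, we also give the proof. Let $E_k(\cdot)$ denote the conditional expectation given $\{Y_{k+1}, \ldots, Y_N\}$, with the convention that $E_N m_n(z) = E m_n(z)$ and $E_0 m_n(z) = m_n(z)$. Then
$$m_n(z) - E m_n(z) = \sum_{k=1}^{N} \big[E_{k-1} m_n(z) - E_k m_n(z)\big] = \sum_{k=1}^{N} \gamma_k, \quad \gamma_k = \frac{1}{n}(E_{k-1} - E_k)\big[\mathrm{tr}(S_n - z I_n)^{-1} - \mathrm{tr}(S_{nk} - z I_n)^{-1}\big].$$
Here $S_{nk} = S_n - \frac{1}{N} Y_k Y_k^*$, $Y_k$ is the $k$th column of $Y_n$, and
$$\big|\mathrm{tr}(S_n - z I_n)^{-1} - \mathrm{tr}(S_{nk} - z I_n)^{-1}\big| \le \frac{1}{v}, \quad v = \Im z > 0.$$
Note that $\{\gamma_k, k \ge 1\}$ forms a sequence of bounded martingale differences. By Lemma A.1 with $p = 4$, we have
$$E|m_n(z) - E m_n(z)|^4 \le C\, E\Big(\sum_{k=1}^{N} |\gamma_k|^2\Big)^2 \le \frac{C N^2}{n^4 v^4} = O(n^{-2}).$$
By the Borel-Cantelli lemma again, we see that almost surely $m_n(z) - E m_n(z) \to 0$. By the same argument, we get $\hat{m}_n(z) - E \hat{m}_n(z) \to 0$, a.s., which means (i) is completed.

Then we come to the proof of (ii). We first introduce some notation. For $0 \le s \le 1$, [...] By the same procedure as in (2.3), we have
$$E m_n(z) - E \hat{m}_n(z) = [\ldots]. \qquad (2.7)$$
It follows from Lemma A.2 with $\Phi = (V^*(s) U)_{ij}$ that [...] where $D_{ij}(s) = \partial/\partial V_{ij}(s)$.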
By Lemma A.3 with $p = 1$ and $\Phi = (V^*(s) U)_{ij}$ again, we see from $\kappa_1 = E Y_{ij} = 0$ and $\kappa_2 = E|Y_{ij}|^2$ that [...] where $M_{n,N}$ is the set of $n \times N$ real matrices. By (2.7), we have
$$E m_n(z) - E \hat{m}_n(z) = [\ldots]. \qquad (2.8)$$
The bound of $|D^r_{ij} (V^* U)_{ji}|$, $r = 1, 2$, is critical for the proof. Since $(V^* U)_{ij}$ is analytic in $z \in \mathbb{C}^+$, by the Cauchy inequality for the bound of derivatives of analytic functions in Lemma A.4, to get the bound of $D^r_{ij} (V^* U)_{ij}$, $r = 1, 2$, on any compact set of $\mathbb{C}^+$, it suffices to find the bound of $D^r_{ij} (V^* U)_{ij}$ on the compact set. By elementary calculations, we can get the derivatives of $V^* U$ with respect to the entries $V_{ij}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, N$, [...] As $U = (V V^* - z I_{n \times n})^{-1}$, this implies $\|U\| \le \frac{1}{v}$ and $|U_{ii}| \le \frac{1}{v}$. Define $\tilde{U} = (V^* V - z I_{N \times N})^{-1}$. We also have $\|\tilde{U}\| \le \frac{1}{v}$ [...] We also easily get $\frac{1}{nN} \sum_{i,j} E|Y_{ij}|^2 = 1 + o(1)$. Using (2.9) again, we see that $|E m_n(z) - E \hat{m}_n(z)| \le C \eta_n + o(1)$.
As $\eta_n \to 0$, we have $E m_n(z) - E \hat{m}_n(z) \to 0$ for any $z \in \mathbb{C}^+$, which completes the proof of (ii). Based on steps (i) and (ii), we conclude that $F^{S_n} \xrightarrow{w} F^{MP}_y$, a.s.
The proof of Theorem 1.2 is complete.

Appendix: Some lemmas
We will list several important lemmas in our proofs. The first one is the Burkholder inequality for a complex martingale difference sequence, which can be found in Burkholder [5].