Consistency of bootstrap approximation to the null distributions of local spatial statistics with application to house price analysis
Journal of Inequalities and Applications volume 2020, Article number: 217 (2020)
Abstract
With the increasing availability of spatially extensive georeferenced data, much attention has been paid to the use of local statistics to identify local patterns of spatial association, in which the null distributions of local statistics play an essential role in the related statistical inference. As a powerful tool to approximate the distribution of a statistic, the bootstrap method is used in this paper to derive null distributions of the commonly used local spatial statistics including local Getis and Ord’s \(G_{i}\), Moran’s \(I_{i}\) and Geary’s \(c_{i}\). Strong consistency of the bootstrap approximation to the null distributions of the statistics is proved under some mild conditions, and the Boston housing price data are analyzed to demonstrate the application of the theoretical results.
Introduction
Exploration of spatial association has long been recognized as an important issue in spatial data analysis. With the increasing availability of spatially extensive georeferenced data, and because of the geological and geographical diversity over a large region, a global structure of spatial association is no longer a realistic assumption for such a data set. Therefore, much attention has been paid to the use of local statistics to identify local patterns of spatial association. The most popular local spatial statistics are perhaps Getis and Ord’s \(G_{i}\) [11, 18] and Anselin’s LISAs [1]. Since their inception, these local statistics have been applied to a variety of fields for spatial data analysis (see, for example, [9, 10, 14, 24]).
In order to test for significance of local spatial association at a reference location, it is essential to derive the null distribution of the local statistics. Normal distributions have been used to approximate the null distributions of some local spatial statistics such as local Getis and Ord’s \(G_{i}\), Moran’s \(I_{i}\) and Geary’s \(c_{i}\) (see, for example, [1, 11, 18]). However, many empirical studies have shown that this approximation is sometimes problematic [1, 4, 5, 27]. Based on the distributional theory of quadratic forms in normal variables, some improved methods have been developed under the assumption that the spatial data are drawn from a normally distributed population (see, for example, [5, 13, 20–22]). Nevertheless, this assumption might be invalid for some real-world data sets. With the computational power of modern computers, the randomized permutation method, a resampling procedure that randomly relocates the data over the locations, is frequently employed to approximate the null distributions of local spatial statistics (see, for example, [1, 12, 17]). Recently, Yan et al. [26] suggested a bootstrap method, originally proposed by Efron [8], to approximate the null distributions of the spatiotemporal versions of local Getis and Ord’s \(G_{i}\), Moran’s \(I_{i}\) and Geary’s \(c_{i}\). They showed by simulations that both the bootstrap and the randomized permutation methods can accurately approximate the null distributions of the local statistics, while the bootstrap method seems more efficient than the randomized permutation method in terms of computational time. However, the theoretical validity of the bootstrap approximation remains to be investigated.
The main objective of this paper is to theoretically investigate the validity of the bootstrap approximation to the null distributions of local Getis and Ord’s \(G_{i}\), Moran’s \(I_{i}\) and Geary’s \(c_{i}\). Under some mild conditions, we prove that the bootstrap approximation is strongly consistent in terms of the Kolmogorov distance on the space of distribution functions. Moreover, the Monte Carlo implementation of the bootstrap approximation for statistical inference is given in detail by a case study of the Boston housing price data in order to demonstrate the application of the theoretical results.
The remainder of this paper is organized as follows: the main results are presented in the next section, and their proofs are given in Sect. 3. As an application example of the theoretical results, the Boston housing price data are analyzed in Sect. 4. The paper concludes with a brief summary.
Main results
Let \(F(x)\) be the population distribution and s be the coordinate of a geographical location. Given n locations \(s_{i}\) (\(i=1, 2, \ldots , n\)), let \(W=(w_{ij}(d))_{n\times n}\) be the symmetric spatial linkage matrix determined by the underlying spatial structure of the n locations or geographical units, where d is a prespecified distance threshold and \(w_{ij}(d) \) (\(j=1,2,\ldots ,n\)) are positive for all \(s_{j}\)’s within distance d of the location \(s_{i}\) excluding \(s_{j}=s_{i}\), and are zero for other \(s_{j}\)’s. Generally, the binary values, zero and one, are assigned to \(w_{ij}(d) \) (\(j=1,2,\ldots ,n\)) according to the above rule. At each location \(s_{j}\), draw independently \(X_{j}\) from the population distribution \(F(x)\), forming an independent and identically distributed (i.i.d.) sample \((X_{1},X_{2},\ldots ,X_{n})\) with \(X_{j}\) located at \(s_{j} \) (\(j=1,2,\ldots ,n\)).
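The binary linkage rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: locations are taken to be planar coordinates with Euclidean distance, and the function name `binary_linkage_matrix` and the toy coordinates are hypothetical.

```python
import math


def binary_linkage_matrix(coords, d):
    """Return the symmetric 0/1 matrix W with w_ij = 1 when j != i and dist(s_i, s_j) <= d."""
    n = len(coords)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= d:
                # Symmetric by construction; the diagonal stays zero.
                W[i][j] = W[j][i] = 1
    return W


# Toy example (hypothetical coordinates): three nearby points and one remote point.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = binary_linkage_matrix(coords, d=1.5)
# Row sums give the number of neighbors of each location, i.e. W_in squared.
neighbor_counts = [sum(row) for row in W]
```

With a binary matrix, the row sum for location \(s_{i}\) is the number of observations within distance d of \(s_{i}\), which is the quantity whose growth relative to n matters in the conditions of the theorems.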
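Given a reference location \(s_{i}\), after rescaling and/or recentering, the local Getis and Ord’s \(G_{i}\) [11], the local Moran’s \(I_{i}\) and Geary’s \(c_{i}\) [1] are, respectively, of the forms
\[G_{i}(d)=\frac{\frac{1}{W_{in}}\sum_{j=1}^{n} w_{ij}(d) (X_{j}-\bar{X} )}{\frac{1}{n-1}\sum_{j\neq i}X_{j}}, \tag{1}\]
\[I_{i}(d)=\frac{ (X_{i}-\bar{X} )\frac{1}{W_{in}}\sum_{j=1}^{n} w_{ij}(d) (X_{j}-\bar{X} )}{\frac{1}{n}\sum_{j=1}^{n} (X_{j}-\bar{X} )^{2}} \tag{2}\]
and
\[c_{i}(d)=\frac{\frac{1}{W_{in}^{2}}\sum_{j=1}^{n} w_{ij}(d) (X_{i}-X_{j} )^{2}}{\frac{1}{n}\sum_{j=1}^{n} (X_{j}-\bar{X} )^{2}}, \tag{3}\]
where \(\bar{X}=\frac{1}{n} \sum_{i=1}^{n} X_{i}\) and \(W_{in}=\sqrt{\sum_{j=1}^{n} w_{ij}(d)}\).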
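As a rough illustration, the three statistics could be computed as in the following Python sketch. It assumes the rescaled forms implied by the surrounding text: the numerators of \(G_{i}(d)\) and \(I_{i}(d)\) use the centered neighbor sum divided by \(W_{in}\), the denominator of \(G_{i}(d)\) is the mean of the observations other than \(X_{i}\), the numerator of \(c_{i}(d)\) is divided by \(W_{in}^{2}\), and \(I_{i}(d)\) and \(c_{i}(d)\) are scaled by \(\frac{1}{n}\sum_{j}(X_{j}-\bar{X})^{2}\). The function name `local_statistics` and the toy data are hypothetical.

```python
import math


def local_statistics(x, W, i):
    """Return (G_i, I_i, c_i) at reference index i under the assumed rescaled forms."""
    n = len(x)
    xbar = sum(x) / n
    k_i = sum(W[i])                     # equals W_in^2 = sum_j w_ij
    if k_i == 0:
        raise ValueError("reference location has no neighbors")
    w_in = math.sqrt(k_i)
    s2 = sum((v - xbar) ** 2 for v in x) / n            # (1/n) * sum (X_j - Xbar)^2
    lag = sum(W[i][j] * (x[j] - xbar) for j in range(n)) / w_in
    mean_others = sum(x[j] for j in range(n) if j != i) / (n - 1)
    g = lag / mean_others               # assumes a nonzero mean of the other observations
    moran = (x[i] - xbar) * lag / s2
    geary = sum(W[i][j] * (x[i] - x[j]) ** 2 for j in range(n)) / k_i / s2
    return g, moran, geary


# Toy data (hypothetical): four locations, the first three mutually linked.
W = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
x = [1.0, 2.0, 3.0, 6.0]
g, moran, geary = local_statistics(x, W, 0)
```

Passing the full matrix rather than a neighbor list keeps the code close to the notation \(w_{ij}(d)\); for large n a sparse neighbor list would be the more economical choice.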
Remark 1
For \(G_{i}(d)\), it is a natural assumption that \(E(X_{j})=\mu \neq 0\). Moreover, we modify the numerator in the \(G_{i}(d)\) statistic as \(X_{j}-\bar{X}\) instead of \(X_{j}\) in its original form to facilitate the forthcoming proof of the asymptotic property. This modification does not change the interpretation of the statistic.
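Let \(F_{n}\) denote the empirical distribution of the sample \((X_{1}, X_{2},\dots ,X_{n})\), that is,
\[F_{n}(x)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}_{\{X_{i}\leq x\}},\]
where \(\mathbb{I}_{\{A\}}\) is the indicator function of the event A. Let \((X_{1}^{*}, X_{2}^{*},\dots ,X_{n}^{*})\) be the bootstrap sample drawn from \(F_{n}(x)\) with replacement and be located at \((s_{1},s_{2},\ldots ,s_{n})\). The bootstrap scenarios of the local Getis and Ord’s \(G_{i}\), Moran’s \(I_{i}\) and Geary’s \(c_{i}\) are, respectively,
\[G_{i}^{*}(d)=\frac{\frac{1}{W_{in}}\sum_{j=1}^{n} w_{ij}(d) (X_{j}^{*}-\bar{X}^{*} )}{\frac{1}{n-1}\sum_{j\neq i}X_{j}^{*}}, \tag{4}\]
\[I_{i}^{*}(d)=\frac{ (X_{i}^{*}-\bar{X}^{*} )\frac{1}{W_{in}}\sum_{j=1}^{n} w_{ij}(d) (X_{j}^{*}-\bar{X}^{*} )}{\frac{1}{n}\sum_{j=1}^{n} (X_{j}^{*}-\bar{X}^{*} )^{2}} \tag{5}\]
and
\[c_{i}^{*}(d)=\frac{\frac{1}{W_{in}^{2}}\sum_{j=1}^{n} w_{ij}(d) (X_{i}^{*}-X_{j}^{*} )^{2}}{\frac{1}{n}\sum_{j=1}^{n} (X_{j}^{*}-\bar{X}^{*} )^{2}}, \tag{6}\]
where \(\bar{X}^{*}=\frac{1}{n} \sum_{i=1}^{n} X_{i}^{*}\).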
Throughout this paper, we use the notations P, E and Var to indicate the probability, expectation and variance calculated under \(F(x)\) and the notations \(P^{*}\), \(E^{*}\) and \(\mathrm{Var}^{*}\) to represent those computed under \(F_{n}(x)\). In what follows, we first introduce the consistency definition of bootstrap approximation to the distribution of a statistic and then give the main results of this article.
Definition 1
([7], Chap. 29)
Let F and G be two distributions on a sample space \(\mathscr{X}\) and \(\rho (F,G)\) be a metric on the space of distribution functions. Let \((X_{1},X_{2},\ldots ,X_{n})\) be i.i.d. random variables with the common distribution F. For a given statistic \(T=T(X_{1}, \ldots , X_{n}; F)\), let \(H_{n}(x)= P(T(X_{1}, X_{2}, \ldots , X_{n}; F) \leq x)\) and \(H_{n}^{*}(x)= P^{*}(T(X_{1}^{*}, X_{2}^{*}, \ldots , X_{n}^{*}; F_{n}) \leq x)\) be the distribution function of T and the bootstrap distribution function of \(T^{*}=T(X_{1}^{*}, X_{2}^{*}, \ldots ,X_{n}^{*}; F_{n})\), respectively. We say that the bootstrap approximation for T is weakly consistent under ρ if \(\rho (H_{n}, H_{n}^{*})\stackrel{P}{\longrightarrow} 0\) as \(n \to \infty \), where \(\stackrel{P}{\longrightarrow }\) denotes convergence in probability; we say that the bootstrap approximation for T is strongly consistent under ρ if \(\rho (H_{n}, H_{n}^{*})\stackrel{a.s.}{\longrightarrow } 0\) as \(n \to \infty \), where \(\stackrel{a.s.}{\longrightarrow }\) denotes convergence for almost all sample sequences of \(X_{1},X_{2},\ldots \) .
Several metrics such as the Kolmogorov distance and the Mallows distance can be employed to measure the consistency of the bootstrap approximation. The Kolmogorov distance defined by
\[\rho (F,G)=\sup_{x\in R} \vert F(x)-G(x) \vert \]
is commonly used, where \(R=(-\infty , +\infty )\). In this paper, the Kolmogorov distance is mainly used to investigate the strong consistency of the bootstrap approximation for the local Getis and Ord’s \(G_{i}\), Moran’s \(I_{i}\) and Geary’s \(c_{i}\) and the main results are summarized in the following theorems.
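In a simulation study neither \(H_{n}\) nor \(H_{n}^{*}\) is available in closed form, so one plausible way to examine the Kolmogorov distance numerically is to replace both by empirical distribution functions of simulated values. The Python sketch below computes the supremum distance between two empirical distribution functions; the function name `kolmogorov_distance` is hypothetical and the approach is an illustration, not part of the proofs.

```python
import bisect


def kolmogorov_distance(a, b):
    """sup over x of |F_a(x) - F_b(x)| for the empirical distribution functions of a and b."""
    sa, sb = sorted(a), sorted(b)
    # Both functions are right-continuous step functions, so the supremum
    # of their absolute difference is attained at one of the jump points.
    return max(
        abs(bisect.bisect_right(sa, t) / len(sa) - bisect.bisect_right(sb, t) / len(sb))
        for t in sa + sb
    )


# Toy samples (hypothetical values).
distance = kolmogorov_distance([1, 2, 3, 4], [3, 4, 5, 6])
```

The two samples do not need to have the same size, which is convenient when the number of Monte Carlo replications differs between the reference and bootstrap simulations.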
Theorem 1
Let \(W=(w_{ij}(d))_{n\times n}\) be the binary spatial linkage matrix of the geographical locations \(s_{j} \) (\(j=1,2,\ldots ,n\)) and \(W_{in}=\sqrt{\sum_{j=1}^{n} w_{ij}(d)}\). Let \((X_{1},X_{2},\ldots ,X_{n})\) be an i.i.d. sample drawn from a continuous distribution F with nonzero mean μ and positive variance \(\sigma ^{2}\). Given a reference location \(s_{i}\), if \(\frac{1}{n} W_{in}^{2} \rightarrow 0\) as \(n \rightarrow \infty \), then the bootstrap approximation for \(G_{i}(d)\) is strongly consistent under the Kolmogorov distance. That is,
\[\sup_{x\in R} \bigl\vert P^{*} \bigl(G_{i}^{*}(d)\leq x \bigr)-P \bigl(G_{i}(d)\leq x \bigr) \bigr\vert \stackrel{a.s.}{\longrightarrow } 0 \quad \text{as } n \rightarrow \infty .\]
Theorem 2
Let \(W=(w_{ij}(d))_{n\times n}\) be the binary spatial linkage matrix of the geographical locations \(s_{j} \) (\(j=1,2,\ldots ,n\)) and \(W_{in}=\sqrt{\sum_{j=1}^{n} w_{ij}(d)}\). Let \((X_{1},X_{2},\ldots ,X_{n})\) be an i.i.d. sample drawn from a continuous distribution F with mean μ and positive variance \(\sigma ^{2}\). Given a reference location \(s_{i}\), if \(\frac{1}{n}W_{in}^{2} \rightarrow 0\) as \(n \rightarrow \infty \), then the bootstrap approximation for \(I_{i}(d)\) is strongly consistent under the Kolmogorov distance. That is,
\[\sup_{x\in R} \bigl\vert P^{*} \bigl(I_{i}^{*}(d)\leq x \bigr)-P \bigl(I_{i}(d)\leq x \bigr) \bigr\vert \stackrel{a.s.}{\longrightarrow } 0 \quad \text{as } n \rightarrow \infty .\]
Theorem 3
Let \(W=(w_{ij}(d))_{n\times n}\) be the binary spatial linkage matrix of the geographical locations \(s_{j}\) (\(j=1,2,\ldots ,n\)) and \(W_{in}=\sqrt{\sum_{j=1}^{n} w_{ij}(d)}\). Let \((X_{1},X_{2},\ldots ,X_{n})\) be an i.i.d. sample drawn from a continuous distribution F with mean μ and positive variance \(\sigma ^{2}\). Given a reference location \(s_{i}\), the bootstrap approximation for \(c_{i}(d)\) is strongly consistent under the Kolmogorov distance. That is,
\[\sup_{x\in R} \bigl\vert P^{*} \bigl(c_{i}^{*}(d)\leq x \bigr)-P \bigl(c_{i}(d)\leq x \bigr) \bigr\vert \stackrel{a.s.}{\longrightarrow } 0 \quad \text{as } n \rightarrow \infty .\]
Proofs of the main results
Preliminaries and lemmas
To prove the theorems, the Mallows distance (see, for example, [3, 15, 16]) will be used because of its interesting properties relating to the Kolmogorov distance. Let \(\mathscr{F}_{p}\) be the set of distribution functions F with \(\int _{-\infty }^{\infty } \vert x \vert ^{p} \,dF(x)<\infty \). For \(F,G \in \mathscr{F}_{p}\), the Mallows distance between F and G is defined as
\[d_{p}(F,G)=\inf \bigl\{ E \vert X-Y \vert ^{p} \bigr\} ^{1/p},\]
where \(1 \leq p < \infty \) and the infimum is taken over the pairs \((X,Y)\) with the marginal distribution functions of X and Y being F and G, respectively. Throughout this paper, we also write \(d_{p}(F,G)\) and \([d_{p}(F,G) ]^{2}\) as \(d_{p}(X,Y)\) and \(d_{p}^{2}(X,Y)\), respectively, for ease of exposition.
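For distributions on the real line with \(p \geq 1\), the infimum in the Mallows distance is attained by pairing quantiles. For two samples of equal size this reduces to matching order statistics, which gives a simple way to compute the distance between two empirical distributions. The Python sketch below illustrates this special case only; the function name `mallows_distance` and the equal-size restriction are simplifying assumptions made here.

```python
def mallows_distance(a, b, p=2):
    """Mallows distance d_p between the empirical distributions of two equal-size samples."""
    if len(a) != len(b):
        raise ValueError("this sketch handles equal sample sizes only")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Pairing the k-th smallest value of a with the k-th smallest value of b
    # is the optimal coupling on the real line for p >= 1.
    sa, sb = sorted(a), sorted(b)
    return (sum(abs(u - v) ** p for u, v in zip(sa, sb)) / len(sa)) ** (1.0 / p)


# Toy samples (hypothetical values): b is a shifted by one unit.
shift_distance = mallows_distance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], p=2)
```

A pure location shift by a constant c gives a distance of \(\vert c \vert \) for every p, which makes a convenient sanity check.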
Lemma 1
([23], p. 12)
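Let \(X_{1},X_{2},\ldots \) be a sequence of random variables and X be a random variable with a continuous distribution function. If \(X_{n}\) converges to X in distribution, which we denote \(X_{n} \rightsquigarrow X\), then
\[\sup_{x\in R} \bigl\vert P(X_{n}\leq x)-P(X\leq x) \bigr\vert \rightarrow 0 \quad \text{as } n \rightarrow \infty .\]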
Lemma 2
([3])
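Let \(G_{n} \in \mathscr{F}_{p}\) and \(G \in \mathscr{F}_{p}\). Then \(d_{p}(G_{n},G)\to 0\) as \(n \to \infty \) if and only if both of the following conditions hold:
(i) \(G_{n}\rightarrow G\) weakly;
(ii) \(\int _{-\infty }^{\infty } \vert x \vert ^{p} \,dG_{n}(x)\rightarrow \int _{-\infty }^{\infty } \vert x \vert ^{p} \,dG(x)\).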
Remark 2
Let the distribution functions of \(X_{n}\) and X be \(G_{n}\) and G, respectively. Lemma 2 means that \(G_{n}\) converges to G in the Mallows distance \(d_{p}\) if and only if \(X_{n}\rightsquigarrow X\) and \(EX_{n}^{p} \to EX^{p}\).
Lemma 3
Let \(X_{1},X_{2},\ldots \) be an i.i.d. random variable sequence with the common distribution function \(F \in \mathscr{F}_{p}\). Let \(F_{n}\) be the empirical distribution function of \((X_{1},X_{2},\ldots , X_{n})\). Then
\[d_{p}(F_{n},F)\stackrel{a.s.}{\longrightarrow } 0 \quad \text{as } n \rightarrow \infty ,\]
where \(\stackrel{a.s.}{\longrightarrow }\) means that \(d_{p}(F_{n},F)\to 0\) for almost all sample sequences of \(X_{1},X_{2},\ldots \) .
Proof
By Lemma 2, it is sufficient to prove \(F_{n} \rightarrow F\) weakly and \(EX_{n}^{p} \rightarrow EX^{p}\). For each real x, let \(Y_{i}=\mathbb{I}_{\{X_{i} \leq x\}} \) (\(i=1,2,\ldots ,n\)). Since \((Y_{1},Y_{2},\ldots ,Y_{n})\) are i.i.d. random variables, we know from the strong law of large numbers that
\[F_{n}(x)=\frac{1}{n}\sum_{i=1}^{n}Y_{i}\stackrel{a.s.}{\longrightarrow } E(Y_{1})=F(x),\]
which indicates \(F_{n} \rightarrow F\) weakly. Similarly, \(EX_{n}^{p} \rightarrow EX^{p}\) can be obtained by using the strong law of large numbers for \((X_{1}^{p},X_{2}^{p},\ldots ,X_{n}^{p})\). □
Lemma 4
([3])
Let \((X_{1},X_{2},\ldots ,X_{n})\) and \((Y_{1},Y_{2},\ldots ,Y_{n})\) be two sets of independent random variables with their distribution functions belonging to \(\mathscr{F}_{p}\). Then, for constants \(a_{i} \) (\(1 \leq i\leq n\)), we have
Remark 3
The key to proving this lemma is Minkowski’s inequality (see Lemma 8.6 in Bickel and Freedman [3] for the details), which does not need the independence condition within either of the two sets of random variables. Therefore, the independence assumption on \((X_{1},X_{2},\ldots ,X_{n})\) as well as on \((Y_{1},Y_{2},\ldots ,Y_{n})\) is not indispensable for guaranteeing the conclusion of the lemma.
Lemma 5
Let \(X,X_{1},X_{2},\ldots \) be a sequence of random variables with their distribution functions belonging to \(\mathscr{F}_{p}\). If \(d_{p}(X_{n}, X) \to 0 \) as \(n \to \infty \), then
Proof
The conditions imply that the distribution functions of \(X^{2}, X_{1}^{2}, X_{2}^{2}, \ldots \) belong to \(\mathscr{F}_{q}\). From Lemma 2, we have (i) \(X_{n} \rightsquigarrow X\); and (ii) \(E (X_{n}^{p} ) \to E (X^{p} )\). The continuous mapping theorem ([23], p. 7) together with (i) yields (iii) \(X_{n}^{2} \rightsquigarrow X^{2}\). The lemma is then proved according to (ii), (iii) and Lemma 2. □
Lemma 6
Let \(X_{1},X_{2},\ldots \) be an i.i.d. random variable sequence drawn from F with finite variance \({\sigma }^{2}\). Let \(F_{n}\) and \((X_{1}^{*}, X_{2}^{*},\ldots ,X_{n}^{*} )\) be the empirical distribution function and the bootstrap sample of \((X_{1},X_{2},\ldots ,X_{n} )\), respectively. Then, for almost all sample sequences of \(X_{1},X_{2},\ldots \) ,
Proof
The condition \({\sigma }^{2} <\infty \) implies that \(E (X_{i})\triangleq \mu \) exists. By the strong law of large numbers, we have, for almost all sample sequences of \(X_{1},X_{2},\ldots \) ,
Then the desired result can be proved by the continuous mapping theorem. □
Lemma 7
If \(X_{n}\) and \(Y_{n}\) are independent random variables for each n, then \(X_{n}\rightsquigarrow X\) and \(Y_{n}\rightsquigarrow Y\) imply that \((X_{n},Y_{n})\rightsquigarrow (X,Y)\) with X and Y being independent.
Proof
Because \(X_{n}\) and \(Y_{n}\) are independent random variables for every n, we have \(F_{(X_{n},Y_{n})}=F_{X_{n}}F_{Y_{n}}\). It follows from \(X_{n}\rightsquigarrow X\) and \(Y_{n}\rightsquigarrow Y\) that \(F_{(X_{n},Y_{n})}\rightarrow F_{X}F_{Y}\) at all continuity points of \(F_{X}F_{Y}\). Then the lemma is proved. □
Proofs of the theorems
In the proofs of the theorems, the following two cases are considered separately because the methods of proof are essentially different for the two cases.
Case 1 Suppose that \(W_{in} =\sqrt{\sum_{j=1}^{n}w_{ij}(d)}< \infty \) as \(n \to \infty \), which means that the number of observations within the d-distance neighborhood of the reference location \(s_{i}\) is fixed when n is large enough. For a local spatial statistic, this case is possible if all newly added observations are placed outside the d-distance neighborhood of the reference location \(s_{i}\) after n reaches some finite integer, say, \(n_{0}\).
Case 2 Assume \(W_{in} \rightarrow \infty \) as \(n \rightarrow \infty \), which implies that the number of observations within distance d of the reference location \(s_{i}\) goes to infinity as \(n \rightarrow \infty \).
Proof of Theorem 1
Note that
Since \(\bar{X}\stackrel{a.s.}{\longrightarrow }\mu \) as \(n \rightarrow \infty \), we have
Furthermore, the numerators of \(G_{i}(d)\) and \(G_{i}^{*}(d)\) can be, respectively, expressed as
and
For any \(\varepsilon >0\), by the Chebyshev inequality and the assumption that \(\frac{1}{n} W_{in}^{2}\rightarrow 0\) as \(n \rightarrow \infty \), we obtain
which implies
Similarly, we have
where \(S_{n}^{2}=\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\bar{X} )^{2}\). Since \(S_{n}^{2} \stackrel{a.s.}{\longrightarrow } \sigma ^{2} <\infty \) according to the strong law of large numbers, we have, for almost all sample sequences of \(X_{1},X_{2},\ldots \) ,
which implies
In Case 1, from Eqs. (8) and (10) and the Slutsky theorem, we have
where \(W_{i n_{0}}=\sqrt{\sum_{j=1}^{n_{0}} w_{ij}(d)}\). Similarly, from Eqs. (9) and (11), it can be inferred that \(\frac{1}{ W_{in}}\sum_{j=1}^{n} w_{ij}(d) (X_{j}^{*}-\bar{X}^{*} )\) and \(\frac{1}{ W_{i n_{0}}}\sum_{j=1}^{n_{0}} w_{ij}(d) (X_{j}^{*}-\bar{X} )\) have the same limiting distribution for almost all sample sequences of \(X_{1},X_{2},\ldots \) . Moreover, according to Lemmas 2, 3 and 4, we obtain
which implies that the distribution of \(\frac{1}{ W_{i n_{0}}}\sum_{j=1}^{n_{0}} w_{ij}(d) (X_{j}^{*}-\bar{X} )\) converges to the distribution of \(Z_{0}\). Therefore, for almost all sample sequences of \(X_{1},X_{2},\ldots \) , we have
On the other hand, let \(G(d)\triangleq \frac{Z_{0}}{\mu }\) where \(\mu =E(X_{i})\neq 0\). From the Slutsky theorem, Eq. (12) and the fact that \(\frac{1}{n-1}\sum_{j\neq i}X_{j}\stackrel{a.s.}{\longrightarrow }\mu \) as \(n \rightarrow \infty \), we have \(G_{i}(d)\rightsquigarrow G(d)\). Then, according to Lemma 1 and noting that the distribution function of \(G(d)\) is continuous, we have
Similarly, from Eqs. (7) and (13), we have
Therefore, the above two equations and the triangle inequality yield the conclusion of the theorem.
In Case 2, given an n, suppose that there are \(k_{n}\) observation locations within distance d of \(s_{i}\), leading to \(W_{in} = \sqrt{k_{n}}\) and \(k_{n} \to \infty \) as \(n \to \infty \). Without loss of generality, let \((X_{1},X_{2},\ldots ,X_{k_{n}} )\) be located within distance d of \(s_{i}\). Note that \(X_{j}-\mu \) (\(j=1, \ldots , k_{n}\)) are i.i.d. random variables with finite variance \(\sigma ^{2}\) and \(k_{n} \rightarrow \infty \) as \(n \rightarrow \infty \). Therefore, according to the central limit theorem, we have
where Z stands for a random variable distributed as the normal distribution \(N(0,\sigma ^{2})\). It therefore follows from Eqs. (8) and (10) and the Slutsky theorem that
Similarly, because \(X_{j}^{*}-\bar{X} \) (\(j=1, \ldots , k_{n}\)) are conditionally i.i.d. random variables with variance \(S_{n}^{2}=\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\bar{X} )^{2}\) and \(k_{n} \rightarrow \infty \) as \(n \rightarrow \infty \), then, according to the central limit theorem and noting \(S_{n}^{2} \stackrel{a.s.}{\longrightarrow } \sigma ^{2}\) as \(n \rightarrow \infty \), we have, for almost all sample sequences of \(X_{1},X_{2},\ldots \) ,
This, together with Eqs. (9) and (11) and the Slutsky theorem, yields
for almost all sample sequences of \(X_{1},X_{2},\ldots \) . Let \(G(d)\triangleq \frac{Z}{\mu }\). With a similar derivation to that in Case 1, the theorem is then proved in this case. □
Proof of Theorem 2
Notice that the numerator of \(I_{i}(d)\) can be expressed as
where
Firstly, from Eq. (10) and \(\bar{X}- X_{i} \stackrel{a.s.}{\longrightarrow } \mu - X_{i}\), we have \(T_{1} \stackrel{P}{\longrightarrow } 0\) as \(n \rightarrow \infty \).
Secondly, as mentioned in the proof of Theorem 1, \(\frac{1}{ W_{in}} \sum_{j=1}^{n} w_{ij}(d) (X_{j}-\mu )\) converges to \(Z_{0}\) and Z in distribution as \(n \rightarrow \infty \) in Cases 1 and 2, respectively. Therefore, from \(\mu -\bar{X} \stackrel{a.s.}{\longrightarrow } 0\) and the Slutsky theorem, we have \(T_{2} \stackrel{P}{\longrightarrow } 0\) as \(n \rightarrow \infty \) in both cases.
Finally, because \((x,y)\mapsto yx\) is a continuous mapping, then, by Lemma 7 and the fact that \(X_{i}- \mu \) is independent of \(\frac{1}{ W_{in}}\sum_{j=1}^{n} w_{ij}(d) (X_{j}-\mu )\), we have \(T_{3} \rightsquigarrow (X_{i}- \mu ) Z_{0}\) and \(T_{3} \rightsquigarrow (X_{i}- \mu ) Z\) as \(n \rightarrow \infty \) in Cases 1 and 2, respectively.
By the Slutsky theorem and Eq. (16), we obtain, as \(n \rightarrow \infty \),
in Case 1 and
in Case 2.
Let \(I(d)\triangleq \frac{(X_{i}- \mu ) Z_{0}}{\sigma ^{2}}\) and \(I(d)\triangleq \frac{(X_{i}- \mu ) Z}{\sigma ^{2}}\) in Cases 1 and 2, respectively. Since \(\frac{1}{n}\sum_{j=1}^{n}(X_{j}-\bar{X})^{2} \stackrel{a.s.}{\longrightarrow } \sigma ^{2}\) as \(n \rightarrow \infty \), we know that \(I_{i}(d)\rightsquigarrow I(d)\) according to the Slutsky theorem. Therefore, Lemma 1 and the continuity of the distribution function of \(I(d)\) guarantee that
According to the triangle inequality, to prove Theorem 2, it is sufficient to prove
In a way similar to the treatment of the left-hand side of Eq. (16), we rewrite the numerator of \(I_{i}^{*}(d)\) as
where
First of all, we obtain \(T_{1}^{*}\stackrel{P^{*}}{\longrightarrow }0\) as \(n \rightarrow \infty \) according to Eq. (11) and \(\bar{X}^{*}-X_{i}^{*} \stackrel{a.s.}{\longrightarrow } \mu - X_{i}^{*}\) as \(n \rightarrow \infty \) for almost all sample sequences of \(X_{1},X_{2},\ldots \) .
Then, for almost all sample sequences of \(X_{1},X_{2},\ldots \) , we have
in Case 1 and
in Case 2. Furthermore, it follows from the Slutsky theorem and \(\bar{X}- \bar{X}^{*} \stackrel{a.s.}{\longrightarrow } 0\) as \(n \rightarrow \infty \) that \(T_{2}^{*}\stackrel{P^{*}}{\longrightarrow }0\) as \(n \rightarrow \infty \) in both cases.
Finally, it is known that
which implies \(X_{i}^{*}-\bar{X}\rightsquigarrow X_{i}-\mu \) as \(n \rightarrow \infty \). Then, according to Lemma 7 and the fact that \(X_{i}^{*}- \bar{X}\) is conditionally independent of \(\frac{1}{ W_{in}}\sum_{j=1}^{n} w_{ij}(d) (X_{j}^{*}-\bar{X})\), we obtain \(T_{3}^{*} \rightsquigarrow (X_{i}- \mu ) Z_{0}\) and \(T_{3}^{*} \rightsquigarrow (X_{i}- \mu ) Z\) as \(n \rightarrow \infty \) in Cases 1 and 2, respectively.
According to the Slutsky theorem, it follows from Lemma 6 and Eq. (17) that \(I_{i}^{*}(d) \rightsquigarrow I(d)\) as \(n \rightarrow \infty \) in both cases. Noting the continuity of the distribution function of \(I(d)\) and using Lemma 1 and the triangle inequality, Theorem 2 is then proved. □
Proof of Theorem 3
In Case 1, since \(W_{in}=\sqrt{\sum_{j=1}^{n} w_{ij}(d)}< \infty \) as \(n \rightarrow \infty \), we can write \(W_{in}=W_{in_{0}}=\sqrt{\sum_{j=1}^{n_{0}} w_{ij}(d)}\) for some positive integer \(n_{0}\). According to the triangle inequality, the Hölder inequality and Lemma 3, we have
Then it follows from Lemmas 4 and 5 that
which implies that both \(\frac{1}{W_{in}^{2}}\sum_{j=1}^{n} w_{ij}(d) (X_{i}^{*}-X_{j}^{*})^{2}\) and \(\frac{1}{W_{in}^{2}}\sum_{j=1}^{n} w_{ij}(d) (X_{i}-X_{j})^{2}\) converge to \(\frac{1}{W_{in_{0}}^{2}}\sum_{j=1}^{n_{0}} w_{ij}(d) (X_{i}-X_{j})^{2}\triangleq T\) in distribution as \(n \rightarrow \infty \).
Let \(c(d)=\frac{T}{\sigma ^{2}}\). From the fact that \(S_{n}^{2}=\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\bar{X} )^{2} \stackrel{a.s.}{\longrightarrow } \sigma ^{2}\) as \(n \rightarrow \infty \) and the Slutsky theorem, we have \(c_{i}(d)\rightsquigarrow c(d)\) as \(n \rightarrow \infty \). According to Lemma 1 and the continuity of the distribution function of \(c(d)\), we obtain
Similarly, from Lemma 6, we have
Then the theorem is proved by using the triangle inequality.
In Case 2, since \(W_{in}\rightarrow \infty \) as \(n \rightarrow \infty \), we can rewrite the numerator of \(c_{i}(d)\) as
where
With the same argument as in the proof of Theorem 1, we have
According to the strong law of large numbers, we obtain \(B \stackrel{a.s.}{\longrightarrow } 0\) as \(n \rightarrow \infty \).
It follows from the Markov inequality that
that is,
Then the Slutsky theorem together with the result that \(\frac{1}{W_{in}}\sum_{j=1}^{n} w_{ij}(d)(X_{j}- \mu )\rightsquigarrow Z\) as \(n \rightarrow \infty \) guarantees \(C \stackrel{P}{\longrightarrow } 0\) as \(n \rightarrow \infty \).
Applying the Slutsky theorem to Eq. (18), we have
Let \(c(d)\triangleq \frac{A}{\sigma ^{2}}\). From the Slutsky theorem and \(S_{n}^{2} \stackrel{a.s.}{\longrightarrow }\sigma ^{2}\), we obtain \(c_{i}(d)\rightsquigarrow c(d)\). Then, according to Lemma 1 and the assumption that the distribution function of \(c(d)\) is continuous, we obtain
By the triangle inequality for the Kolmogorov distance, it is then sufficient to prove
Similarly, the numerator of \(c_{i}^{*}(d)\) can be expressed as
where
Firstly, according to Lemmas 3, 4 and 5, we obtain
which implies that the distribution of \((X_{i}^{*} )^{2}-2\mu X_{i}^{*}+\mu ^{2}+\sigma ^{2}\) converges to the distribution of A. According to the strong law of large numbers, we therefore obtain \(A^{*}\rightsquigarrow A\) as \(n \rightarrow \infty \) for almost all sample sequences of \(X_{1},X_{2},\ldots \) .
Moreover, with the same argument as in the proof of Theorem 1, we write \(B^{*}\) as
Noting that \((X_{j}^{*} )^{2}- (\bar{X} )^{2}- S_{n}^{2} \) (\(j=1,\ldots ,n\)) are conditionally i.i.d. random variables and using the strong law of large numbers, we obtain \(B^{*}\stackrel{a.s.}{\longrightarrow }0\) as \(n \to \infty \) for almost all sample sequences of \(X_{1},X_{2},\ldots \) .
Finally, for any \(\varepsilon >0\), we obtain from the Markov inequality that
Because \(\frac{1}{n}\sum_{j=1}^{n}X_{j}^{2} \stackrel{a.s.}{\longrightarrow } \mu ^{2}+\sigma ^{2} < \infty \), we have, for almost all sample sequences of \(X_{1},X_{2},\ldots \) ,
That is, for almost all sample sequences of \(X_{1},X_{2},\ldots \) , it is true that
which implies that \(C^{*}\stackrel{P^{*}}{\longrightarrow }0\) as \(n \rightarrow \infty \) for almost all sample sequences.
In conclusion, according to the Slutsky theorem and Lemma 6, we obtain \(c_{i}^{*}(d) \rightsquigarrow c(d)\) as \(n \rightarrow \infty \) for almost all sample sequences of \(X_{1},X_{2},\ldots \) . Equation (19) is then proved according to Lemma 1. □
Remark 4
In the proofs of the three theorems, different approaches are used to prove the consistency of the bootstrap approximations for Cases 1 and 2. For Case 2, the distributions of each local statistic and its bootstrap scenario are bridged by the same normal distribution. Therefore, it can be inferred that the bootstrap approximation performs at least as well as the normal approximation in this case. For Case 1, however, the numerator of each statistic is the sum of a fixed number of random variables as \(n \rightarrow \infty \). The limit distribution of each statistic cannot be a normal distribution if the population from which the sample is drawn does not follow a normal distribution. Therefore, the normal approximation fails to approximate the null distribution of each statistic in this case, but the bootstrap approximation still works according to the proof of each theorem. This is possibly the main reason for the empirical finding, mentioned in the introduction, that the normal approximation is sometimes problematic. In practice, the neighbors of a reference location are generally very few relative to the sample size and, as mentioned above, the bootstrap method can provide a valid approximation to the null distribution of each local statistic. In summary, the bootstrap approximation outperforms the normal approximation, especially in practice.
Application to the spatial pattern detection of the Boston housing price data
In order to demonstrate the application of the bootstrap approximations, a real-world example based on the Boston housing price data is analyzed for the significance test of local spatial association. As mentioned in Remark 4, the bootstrap method can provide a valid approximation for the null distribution of each local statistic. However, for a local-statistic-based test with the bootstrap approximation, some other issues such as the Monte Carlo implementation of the bootstrap method and the multiple testing problem should be considered. The purpose of this section is to provide a full process of using the bootstrap approximation in practice.
Description of the data set and determination of the spatial linkage matrix
The Boston housing price data set, which is publicly available in the R package spdep (http://cran.r-project.org/), consists of observations of the median house value (in $1000) of owner-occupied homes and 13 explanatory variables in 506 US census tracts of the Boston area in 1970. Moreover, a list of influential neighbors for each tract is also attached, where a tract is an influential neighbor of another tract if these two tracts share a common part of the boundary.
Here, we chose the median house value, denoted by X henceforth, as the target variable to detect its spatial variation patterns based on the observations \(x_{1}, x_{2}, \ldots , x_{n}\) of X in the \(n=506\) census tracts. The spatial linkage matrix \(W=(w_{ij})_{n\times n}\) was obtained from the list of influential neighbors of each tract. Specifically, let \(w_{ij}=1\) if tract j is an influential neighbor of tract i; \(w_{ij}=0\) otherwise; and \(w_{ii}=0\) by convention. The number of neighbors of the 506 census tracts ranges from 1 to 8 with an average of 4.25, which is much smaller than the sample size \(n=506\).
First of all, we conducted the Kolmogorov–Smirnov test for the normality of the observations of the target variable X. The p-value of the test is \(p=0.0000\) to four decimal places, providing strong evidence of non-normality of the observations. As mentioned in Remark 4, the normal approximation to the null distributions of the three local statistics is therefore problematic for this data set, while the bootstrap approximation still works.
Monte Carlo implementation of the bootstrap distribution functions
In general, the exact bootstrap distribution of a statistic is difficult to derive, although it is theoretically known for a given sample drawn from the population. In practice, Monte Carlo simulation is commonly used to compute the bootstrap distribution of the statistic. Here, we take the Getis and Ord’s \(G_{i}\) statistic (we omit the distance threshold d in the statistic here because the spatial linkage matrix was determined without using it explicitly) as an example to show the Monte Carlo procedure. The procedure for the other two statistics is essentially the same.
Let \(x_{1}, x_{2}, \ldots , x_{n}\) be the observations of the target variable X with \(x_{i}\) located at the location \(s_{i}\). Given a reference location \(s_{i}\), the Monte Carlo procedure for approximating the bootstrap distribution of \(G_{i}\) is as follows.
Step 1. Draw with replacement a bootstrap sample \((x_{1}^{*}, x_{2}^{*}, \ldots , x_{n}^{*})\) from \((x_{1}, x_{2}, \ldots , x_{n})\). Specifically, for each of \(k=1, 2, \ldots , n\), draw a random number u from the uniform distribution \(U(0,1)\), and let \(x_{k}^{*}=x_{[nu]+1}\), where \([nu]\) denotes the integer part of nu.
Step 2. Compute the bootstrap value \(G_{i}^{*}\) of \(G_{i}\) according to Eq. (4).
Step 3. Repeat Steps 1 and 2 N times and obtain N bootstrap values of \(G_{i}\), which we denote \(G_{i(1)}^{*}, G_{i(2)}^{*}, \ldots , G_{i(N)}^{*}\).
Step 4. Compute the empirical distribution function of \(G_{i(1)}^{*}, G_{i(2)}^{*}, \ldots , G_{i(N)}^{*}\) and take it as an estimator of the bootstrap distribution of \(G_{i}\). That is, for each real number x, the bootstrap distribution function of \(G_{i}\) is approximated by
\[\hat{H}_{N}^{*}(x)=\frac{1}{N}\sum_{k=1}^{N}\mathbb{I}_{\{G_{i(k)}^{*}\leq x\}}. \tag{21}\]
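Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration rather than the code used for the case study: the form of \(G_{i}\) is the rescaled one assumed from the text (centered neighbor sum divided by \(W_{in}\), over the mean of the other observations), the function names `g_statistic`, `bootstrap_g` and `ecdf` are hypothetical, and the data are made-up values, not the Boston observations.

```python
import bisect
import math
import random


def g_statistic(x, w_i, i):
    """G_i in the assumed rescaled form; w_i is row i of the linkage matrix."""
    n = len(x)
    xbar = sum(x) / n
    w_in = math.sqrt(sum(w_i))
    numerator = sum(w * (v - xbar) for w, v in zip(w_i, x)) / w_in
    denominator = sum(v for j, v in enumerate(x) if j != i) / (n - 1)
    return numerator / denominator      # assumes a nonzero denominator, e.g. positive prices


def bootstrap_g(x, w_i, i, N, rng):
    """Steps 1-3: return the N bootstrap values of G_i in increasing order."""
    n = len(x)
    values = []
    for _ in range(N):
        # Step 1: x*_k = x_{[nu]+1}; with 0-based indexing this is x[int(n * u)].
        # The min() only guards against floating-point rounding at the upper end.
        x_star = [x[min(int(n * rng.random()), n - 1)] for _ in range(n)]
        # Step 2: bootstrap value of G_i computed from the resampled data.
        values.append(g_statistic(x_star, w_i, i))
    return sorted(values)


def ecdf(sorted_values, t):
    """Step 4: proportion of bootstrap values that are <= t."""
    return bisect.bisect_right(sorted_values, t) / len(sorted_values)


rng = random.Random(2020)               # fixed seed so that the run is reproducible
x = [12.5, 18.0, 22.3, 15.7, 30.1, 26.4, 19.9, 21.2]   # made-up values
w_i = [0, 1, 1, 0, 0, 1, 0, 0]          # hypothetical neighbors of location 0
boot = bootstrap_g(x, w_i, 0, N=999, rng=rng)
print(ecdf(boot, 0.0))
```

Sorting the bootstrap values once lets every later evaluation of the empirical distribution function run in logarithmic time, which helps when the function is queried at all n reference locations.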
Spatial association detection of the Boston housing price data
Alternative hypotheses and p-values of the tests
As pointed out by Getis and Ord [11], \(G_{i}\) measures the concentration or lack of concentration of the values associated with the variable X around the reference location \(s_{i}\). Therefore, \(G_{i}\) is commonly used to identify a location which is surrounded by large values or small values of X in its neighborhood. \(I_{i}\) and \(c_{i}\) can be employed to test whether the value of X located at the reference location \(s_{i}\) is similar (local positive autocorrelation) or dissimilar (local negative autocorrelation) to those located at its neighbors. To be specific, we mainly focused in this case study on using \(G_{i}\) to identify a location that is surrounded by large values located at its neighbors, that is, a location with an extremely large value of \(G_{i}\), and on testing local positive autocorrelation using \(I_{i}\) and \(c_{i}\), that is, identifying a location with an extremely large value of \(I_{i}\) or with an extremely small value of \(c_{i}\). These objectives amount to the \(G_{i}\)-, \(I_{i}\)- and \(c_{i}\)-based tests for the following alternative hypotheses, respectively:
\(\text{H}_{1G}\): a tract surrounded by its neighbors with high housing price;
\(\text{H}_{1I}\): a tract with the housing price being positively correlated to those in its neighbors;
\(\text{H}_{1c}\): a tract with the housing price being similar to those in its neighbors.
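The above alternative hypotheses all lead to one-sided tests. Specifically, the p-value of the \(G_{i}\) test derived by the bootstrap distribution in Eq. (21) is
\[ p_{G}=\frac{1}{N}\# \bigl\{ j: G_{i(j)}^{*}\geq G_{i}^{(0)} \bigr\} , \]
where \(G_{i}^{(0)}\) is the observed value of \(G_{i}\) at the reference location \(s_{i}\) and is computed according to Eq. (1) with the sample \((X_{1}, X_{2}, \ldots , X_{n})\) replaced by its observed value \((x_{1}, x_{2}, \ldots , x_{n})\). Similarly, the p-values of the \(I_{i}\) test and the \(c_{i}\) test are, respectively,
\[ p_{I}=\frac{1}{N}\# \bigl\{ j: I_{i(j)}^{*}\geq I_{i}^{(0)} \bigr\} \]
and
\[ p_{c}=\frac{1}{N}\# \bigl\{ j: c_{i(j)}^{*}\leq c_{i}^{(0)} \bigr\} , \]
where \(I_{i}^{(0)}\) and \(c_{i}^{(0)}\) are the observed values of \(I_{i}\) and \(c_{i}\) computed according to Eqs. (2) and (3), respectively, and \(I_{i(j)}^{*}\) and \(c_{i(j)}^{*}\) (\(j=1, 2, \ldots , N\)) are the corresponding bootstrap values.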
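The one-sided p-values can be computed from the bootstrap values as in the following sketch. It assumes that a p-value is the proportion of bootstrap values at least as extreme as the observed one, and the use of a weak rather than a strict inequality is an assumption of this illustration. The function names are hypothetical.

```python
import numpy as np

def upper_tail_p(boot_values, observed):
    """Proportion of bootstrap values >= observed (for the G_i and I_i tests)."""
    return float(np.mean(np.asarray(boot_values, dtype=float) >= observed))

def lower_tail_p(boot_values, observed):
    """Proportion of bootstrap values <= observed (for the c_i test)."""
    return float(np.mean(np.asarray(boot_values, dtype=float) <= observed))
```

The upper tail serves \(\text{H}_{1G}\) and \(\text{H}_{1I}\), for which large values of the statistic support the alternative, while the lower tail serves \(\text{H}_{1c}\), for which small values of \(c_{i}\) indicate similarity.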
Method for dealing with multiple testing problem
When a local statistic is used to identify local spatial association of georeferenced data, the test is generally performed at each location over the study region based on the same observations, which gives rise to the multiple testing problem. Therefore, a given overall significance level, say α, should be properly adjusted in order to control the overall type I error to be less than α. Although the commonly used Bonferroni and Šidák criteria can readily be used here for adjusting the overall significance level, both methods are very conservative, especially when the sample size is large [1]. Caldas and Singer [6] used the so-called false discovery rate (FDR) criterion, developed by Benjamini and Hochberg [2], to handle the multiple testing problem associated with local spatial statistics, and their results demonstrated that the FDR criterion is much more powerful than the Bonferroni and the Šidák methods. Therefore, the FDR criterion is employed here for dealing with the multiple testing problem in the analysis of the Boston housing price data with the \(G_{i}\), \(I_{i}\) and \(c_{i}\) statistics. In what follows, we introduce the FDR criterion in its general form.
Suppose that a total of K tests are simultaneously conducted based on a local statistic and the resultant p-values are \(p_{1}, p_{2},\ldots , p_{K}\), respectively. Sort the p-values in ascending order as \(p_{(1)}\leq p_{(2)} \leq \cdots \leq p_{(K)}\), and let
\[ k_{0}=\max \biggl\{ k: p_{(k)}\leq \frac{k}{K}\alpha \biggr\} , \]
where α is the given overall significance level. The adjusted significance level for each individual test is \(\alpha _{A}=\frac{k_{0}}{K} \alpha \).
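The FDR adjustment can be sketched in Python as follows, assuming the standard step-up rule of Benjamini and Hochberg [2], in which \(k_{0}\) is the largest k with \(p_{(k)}\leq k\alpha /K\). The function name `fdr_adjusted_level` is hypothetical, and returning \(k_{0}=0\) when no ordered p-value meets its threshold is a convention of this sketch.

```python
import numpy as np

def fdr_adjusted_level(p_values, alpha=0.05):
    """Return (k0, alpha_A) with alpha_A = k0 * alpha / K."""
    p_sorted = np.sort(np.asarray(p_values, dtype=float))  # p_(1) <= ... <= p_(K)
    K = len(p_sorted)
    ranks = np.arange(1, K + 1)
    # Positions k (0-based) at which p_(k) <= (k / K) * alpha holds
    passing = np.nonzero(p_sorted <= ranks * alpha / K)[0]
    k0 = int(passing[-1]) + 1 if passing.size else 0
    return k0, k0 * alpha / K

# Usage with four made-up p-values (illustrative numbers, not from the data)
k0, alpha_A = fdr_adjusted_level([0.01, 0.04, 0.03, 0.5], alpha=0.05)
```

In this small example only the smallest p-value meets its threshold (0.01 ≤ 0.0125), so \(k_{0}=1\) and \(\alpha _{A}=0.0125\).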
Testing results with analysis
For the Boston housing price data, the sample size is \(n=506\). For each of the three local statistics \(G_{i}\), \(I_{i}\) and \(c_{i}\), the bootstrap procedure was used to compute the p-value at each of the 506 tracts, with \(N=500\) bootstrap replications. The overall significance level was set to \(\alpha =0.05\). Using the FDR criterion, we obtained the adjusted significance levels \(\alpha _{A}^{G}=0.00198\) for \(G_{i}\), \(\alpha _{A}^{I}=0.00573\) for \(I_{i}\), and \(\alpha _{A}^{c}=0.01107\) for \(c_{i}\). The maps of the testing results are shown in Fig. 1, where the black areas represent the tracts whose p-values are less than the overall significance level \(\alpha =0.05\) (left-hand column) or less than the corresponding adjusted significance levels (right-hand column).
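As a quick consistency check on the reported levels, the relation \(\alpha _{A}=k_{0}\alpha /K\) with \(K=506\) and \(\alpha =0.05\) can be inverted to recover the value of \(k_{0}\) implied by each adjusted level. The short Python sketch below uses only the numbers reported above; the variable names are illustrative.

```python
K = 506        # number of tracts, one test per tract
alpha = 0.05   # overall significance level

# Adjusted significance levels as reported in the text
reported = {"G_i": 0.00198, "I_i": 0.00573, "c_i": 0.01107}

# Invert alpha_A = k0 * alpha / K and round to the nearest integer
implied_k0 = {name: round(level * K / alpha) for name, level in reported.items()}

# Recompute alpha_A from the integer k0 to compare with the reported values
recomputed = {name: k0 * alpha / K for name, k0 in implied_k0.items()}
```

The implied values are \(k_{0}=20\) for \(G_{i}\), \(k_{0}=58\) for \(I_{i}\) and \(k_{0}=112\) for \(c_{i}\), and the recomputed levels agree with the reported ones to five decimal places.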
The result of the \(G_{i}\) test (panels in the first row) shows that the tracts with high housing price concentration appear mainly in the middle western region. After the adjustment of the overall significance level, only a few tracts remain significant, that is, only a few tracts are surrounded by neighbors with high housing prices.
The result of the \(I_{i}\) test (panels in the second row) shows a similar pattern to that of the \(G_{i}\) test, especially under the overall significance level of \(\alpha =0.05\). That is, the tracts whose housing prices are similar to those of their respective neighbors are also located in the middle western region, except for some tracts in the middle eastern part. After the significance level is adjusted to \(\alpha _{A}^{I}=0.00573\), a belt region where positive spatial autocorrelation is significant is clearly shown. Combining the results of the \(G_{i}\) and \(I_{i}\) tests, we see that the tracts colored in black in both maps and their respective neighbors all share high housing prices, indicating “hot” spots of housing price in the Boston area.
The result of the \(c_{i}\) test (panels in the last row) demonstrates a totally opposite spatial pattern to that of the \(I_{i}\) test for the significant tracts, although both tests aim at detecting tracts that share a similar housing price with their respective neighbors. Given the foregoing analysis showing that the \(I_{i}\) test uncovers the tracts that share high housing prices with their respective neighbors, it can be inferred that the \(c_{i}\) test identifies the tracts that share low housing prices with their respective neighbors. According to the structures of the \(I_{i}\) and \(c_{i}\) statistics, the opposite spatial patterns identified by the two tests may imply that the high housing prices shared by a reference tract and its neighbors generally differ considerably from one another, while the low housing prices shared by a reference tract and its neighbors are relatively homogeneous. Moreover, it can be observed from the figure that the tracts sharing low housing prices with their respective neighbors are more scattered in space than those sharing high housing prices with their respective neighbors. That is to say, the “cool” spots of housing price are spatially dispersed, whereas the “hot” spots are spatially clustered.
Final remarks
There has been a growing interest in using local statistics to explore local patterns of spatial association in georeferenced data, in which the null distributions of the local statistics play a key role in the related statistical inference. Because the bootstrap method can account for non-normality of data and can easily be implemented with modern computers, we propose in this paper a bootstrap method to approximate the null distributions of the commonly used local spatial statistics of Getis and Ord’s \(G_{i}\), Moran’s \(I_{i}\) and Geary’s \(c_{i}\). More importantly, strong consistency of the bootstrap approximation is established, which provides not only a theoretical basis for using the bootstrap method to approximate the null distributions of these three statistics, but also some evidence that normal approximation sometimes fails to approximate the null distributions of these local statistics. Furthermore, the practical implementation of the bootstrap tests based on these local spatial statistics is illustrated in full by a case study of the Boston housing price data.
Methodologically, the bootstrap procedure can readily be used to approximate the null distributions of other local spatial statistics such as Ord and Getis’s LOSH statistic [19, 25]. However, establishing a common theoretical framework for the validity of the bootstrap approximation does not seem easy. Therefore, consistency of the bootstrap approximation for other local spatial statistics and, furthermore, the convergence rate of the current bootstrap approximation deserve to be investigated in future research.
References
1. Anselin, L.: Local indicators of spatial association—LISA. Geogr. Anal. 27(2), 93–115 (1995)
2. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57(1), 289–300 (1995)
3. Bickel, P.J., Freedman, D.A.: Some asymptotic theory for the bootstrap. Ann. Stat. 9(6), 1196–1217 (1981)
4. Bivand, R., Müller, W.G., Reder, M.: Power calculations for global and local Moran’s I. Comput. Stat. Data Anal. 53(8), 2859–2872 (2009)
5. Boots, B., Tiefelsdorf, M.: Global and local spatial autocorrelation in bounded regular tessellations. J. Geogr. Syst. 2(4), 319–348 (2000)
6. Caldas, M.C., Singer, B.H.: Controlling the false discovery rate: a new application to account for multiple and dependent tests in local statistics of spatial association. Geogr. Anal. 38(2), 180–208 (2006)
7. DasGupta, A.: Asymptotic Theory of Statistics and Probability. Springer, New York (2008)
8. Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Stat. 7(1), 1–26 (1979)
9. Getis, A.: Spatial filtering in a regression framework: examples using data on urban crime, regional inequality, and government expenditures. In: Anselin, L., Rey, S.J. (eds.) Perspectives on Spatial Data Analysis, pp. 191–202. Springer, Berlin (2010)
10. Getis, A., Griffith, D.A.: Comparative spatial filtering in regression analysis. Geogr. Anal. 34(2), 130–140 (2002)
11. Getis, A., Ord, J.K.: The analysis of spatial association by use of distance statistics. Geogr. Anal. 24(3), 189–206 (1992)
12. Hardisty, F., Klippel, A.: Analysing spatiotemporal autocorrelation with LISTAViz. Int. J. Geogr. Inf. Sci. 24(10), 1515–1526 (2010)
13. Leung, Y., Mei, C.L., Zhang, W.X.: Statistical test for local patterns of spatial association. Environ. Plan. A 35(4), 725–744 (2003)
14. Liu, X.Q., Sun, T.S., Li, G.P.: Spatial analysis of industry clusters based on local spatial statistics: a case study of Beijing manufacturing industry clusters. Sci. Geogr. Sin. 32(5), 530–535 (2012)
15. Major, P.: On the invariance principle for sums of independent identically distributed random variables. J. Multivar. Anal. 8(4), 487–517 (1978)
16. Mallows, C.L.: A note on asymptotic joint normality. Ann. Math. Stat. 43(2), 508–515 (1972)
17. McLaughlin, C.C., Boscoe, F.P.: Effects of randomization methods on statistical inference in disease cluster detection. Health Place 13(1), 152–163 (2007)
18. Ord, J.K., Getis, A.: Local spatial autocorrelation statistics: distributional issues and an application. Geogr. Anal. 27(4), 286–306 (1995)
19. Ord, J.K., Getis, A.: Local spatial heteroscedasticity (LOSH). Ann. Reg. Sci. 48(2), 529–539 (2012)
20. Tiefelsdorf, M.: Some practical applications of Moran’s I’s exact conditional distribution. Pap. Reg. Sci. 77(2), 101–129 (1998)
21. Tiefelsdorf, M.: The saddlepoint approximation of Moran’s I’s and local Moran’s \(I_{i}\)’s reference distributions and their numerical evaluation. Geogr. Anal. 34(3), 187–206 (2002)
22. Tiefelsdorf, M., Boots, B.: The exact distribution of Moran’s I. Environ. Plan. A 27(6), 985–999 (1995)
23. van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, New York (2000)
24. Xie, Z., Yan, J.: Detecting traffic accident clusters with network kernel density estimation and local spatial statistics: an integrated approach. J. Transp. Geogr. 31(5), 64–71 (2013)
25. Xu, M., Mei, C.L., Yan, N.: A note on the null distribution of the local spatial heteroscedasticity (LOSH) statistic. Ann. Reg. Sci. 52(3), 697–710 (2014)
26. Yan, N., Mei, C.L., Wang, N.: A unified bootstrap test for local patterns of spatiotemporal association. Environ. Plan. A 47(1), 227–242 (2015)
27. Zhang, T.: Limiting distribution of the G statistics. Stat. Probab. Lett. 78(12), 1656–1661 (2008)
Acknowledgements
The authors would like to thank the reviewer for his/her valuable comments and suggestions, which led to significant improvement on the manuscript.
Availability of data and materials
The real-world data set is available in the R package “spdep” linked to http://eran.rproject.org/.
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 11871056 and 11271296).
Author information
Contributions
CLM contributed the idea, formulated the methodology and wrote part of the original draft; SFX completed the theoretical proofs and wrote part of the original draft; FC performed the computation of the real-world example. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mei, CL., Xu, SF. & Chen, F. Consistency of bootstrap approximation to the null distributions of local spatial statistics with application to house price analysis. J Inequal Appl 2020, 217 (2020). https://doi.org/10.1186/s1366002002482x
MSC
 62G09
 62G20
Keywords
 Local spatial statistic
 Bootstrap
 Strong consistency
 Kolmogorov distance
 Mallows distance