Strong representation results of the Kaplan-Meier estimator for censored negatively associated data

Wu, Qunying; Chen, Pingyan

doi:10.1186/1029-242X-2013-340

Published: 25 July 2013

Qunying Wu¹ &
Pingyan Chen²

Journal of Inequalities and Applications volume 2013, Article number: 340 (2013)

Abstract

In this paper, we discuss the strong convergence rates and strong representations of the Kaplan-Meier estimator and the hazard estimator based on censored data when the survival times and the censoring times form negatively associated (NA) sequences. Under certain regularity conditions, we establish strong convergence rates for both estimators and show that each can be expressed as a mean of random variables plus a remainder of order $n^{- 1 / 2} \ln^{1 / 2} n$ a.s.

MSC:60F15, 60F05.

1 Introduction and main results

Let $\{T_{i}; i \geq 1\}$ be a sequence of true survival times. These random variables (r.v.s) are not assumed to be mutually independent; it is assumed, however, that they have a common unknown continuous marginal distribution function (d.f.) $F (x) = P (T_{i} \leq x)$ such that $F (0) = 0$. Let the r.v.s $T_{i}$ be censored on the right by the censoring r.v.s $Y_{i}$, so that one observes only $(Z_{i}, δ_{i})$, where

Z_{i} = \min (T_{i}, Y_{i}) := T_{i} \land Y_{i} \quad \text{and} \quad δ_{i} = I (T_{i} \leq Y_{i}), \quad i = 1, \dots, n .

Here and in the sequel, $I (A)$ is the indicator random variable of the event A. In this random censorship model, the censoring times $Y_{i}$, $i = 1, \dots, n$, are assumed to have the common distribution function $G (y) = P (Y_{i} \leq y)$ such that $G (0) = 0$; they are also assumed to be independent of the r.v.s $T_{i}$. The problem at hand is that of drawing nonparametric inference about F based on the censored observations $(Z_{i}, δ_{i})$, $i = 1, \dots, n$. For this purpose, define two stochastic processes on $[0, \infty)$ as follows:

N_{n} (t) = \sum_{k = 1}^{n} I (Z_{k} \leq t, δ_{k} = 1) = \sum_{k = 1}^{n} I (T_{k} \leq t \land Y_{k}),

the number of uncensored observations less than or equal to t, and

Y_{n} (t) = \sum_{k = 1}^{n} I (Z_{k} \geq t),

the number of censored or uncensored observations greater than or equal to t. The following nonparametric estimator ${\hat{F}}_{n}$ of F, due to Kaplan and Meier [1], is widely used to estimate F on the basis of the data $(Z_{i}, δ_{i})$:

{\hat{F}}_{n} (x) = 1 - \prod_{s \leq x} (1 - \frac{d N_{n} (s)}{Y_{n} (s)}),

where $d N_{n} (s) = N_{n} (s) - N_{n} (s -)$ .

Let L be the d.f. of the $Z_{i}$, and let $\bar{L} := 1 - L$. Since the sequences $\{T_{n}; n \geq 1\}$ and $\{Y_{n}; n \geq 1\}$ are independent, it follows that $L = 1 - \bar{F} \bar{G} = 1 - (1 - F) (1 - G)$. The empirical d.f. $L_{n} (t)$ of L is defined by

L_{n} (t) : = \frac{1}{n} \sum_{k = 1}^{n} I (Z_{k} < t) = 1 - \frac{Y_{n} (t)}{n} : = \frac{{\bar{Y}}_{n} (t)}{n},

where ${\bar{Y}}_{n} (t) = \sum_{k = 1}^{n} I (Z_{k} < t)$ .

Define (possibly infinite) times $τ_{F}$ , $τ_{G}$ and $τ_{L}$ by

τ_{F} = \inf \{y; F (y) = 1\}, \quad τ_{G} = \inf \{y; G (y) = 1\}, \quad τ_{L} = \inf \{y; L (y) = 1\} .

Then $τ_{L} = τ_{F} \land τ_{G}$. Set

F_{*} (t) = P (Z_{1} \leq t, δ_{1} = 1) = P (T_{1} \leq t \land Y_{1}),

and define the empirical d.f. of $F_{*}$ by

F_{* n} (t) : = \frac{1}{n} \sum_{k = 1}^{n} I (Z_{k} \leq t, δ_{k} = 1) = \frac{N_{n} (t)}{n} .

We have then

F_{*} (t) = \int_{0}^{\infty} F (t \land z) d G (z) = \int_{0}^{t} \bar{G} (z) d F (z),

and

d F_{*} (t) = \bar{G} (t) d F (t) .

Another question of interest in survival analysis is the estimation of the hazard function h defined as follows when it is further assumed that F has a density f:

h (x) := \frac{d}{d x} (- \log \bar{F} (x)) = f (x) / \bar{F} (x) \quad \text{for } F (x) < 1,

with $\bar{F} = 1 - F$ . The quantity

H (x) = - log \bar{F} (x) = \int_{0}^{x} \frac{d F (s)}{\bar{F} (s)} = \int_{0}^{x} \frac{d F_{*} (s)}{\bar{L} (s)}

(1.1)

is called the cumulative hazard function. The empirical cumulative hazard function ${\hat{H}}_{n} (x)$ is given by

{\hat{H}}_{n} (x) : = \int_{0}^{x} \frac{d N_{n} (s)}{Y_{n} (s)} = \int_{0}^{x} \frac{d F_{* n} (s)}{{\bar{L}}_{n} (s)},

(1.2)

where ${\bar{L}}_{n} (s) = 1 - L_{n} (s)$ .

Since $N_{n} (t)$ is a step function, and $d N_{n} (Z_{(k)}) = δ_{(k)}$ , $k = 1, 2, \dots, n$ , it can be easily seen that

{\hat{H}}_{n} (x) = \sum_{k = 1}^{n} \frac{I (Z_{(k)} \leq x, δ_{(k)} = 1)}{n - k + 1},

(1.3)

and

{\hat{F}}_{n} (x) = 1 - \prod_{k = 1}^{n} {(1 - \frac{δ_{(k)}}{n - k + 1})}^{I (Z_{(k)} \leq x)} = 1 - \prod_{k = 1}^{n} {(\frac{n - k}{n - k + 1})}^{I (δ_{(k)} = 1, Z_{(k)} \leq x)},

(1.4)

where $Z_{(1)} \leq Z_{(2)} \leq \dots \leq Z_{(n)}$ denote the order statistics of $Z_{1}, Z_{2}, \dots, Z_{n}$ , and $δ_{(i)}$ is the concomitant of $Z_{(i)}$ .
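The order-statistic forms (1.3) and (1.4) translate directly into a short computation. The following is a minimal sketch, assuming no ties among the observed times; the function name `km_and_hazard` and the toy data are illustrative and not part of the paper.

```python
def km_and_hazard(z, delta, x):
    """Evaluate the hazard estimator (1.3) and the Kaplan-Meier estimator (1.4) at x.

    z     : observed times Z_i = min(T_i, Y_i)
    delta : indicators, 1 if uncensored (T_i <= Y_i), 0 otherwise
    Assumes there are no ties among the Z_i.
    """
    n = len(z)
    # Sorting the pairs makes delta_(k) the concomitant of Z_(k).
    pairs = sorted(zip(z, delta))
    h_hat = 0.0   # running value of the sum in (1.3)
    surv = 1.0    # running value of the product in (1.4)
    for k, (zk, dk) in enumerate(pairs, start=1):
        if zk <= x and dk == 1:
            at_risk = n - k + 1            # Y_n(Z_(k)) when there are no ties
            h_hat += 1.0 / at_risk
            surv *= (at_risk - 1) / at_risk
    return h_hat, 1.0 - surv


# Toy data (hypothetical): four observations, the largest one censored.
z = [2.0, 1.0, 4.0, 3.0]
delta = [1, 1, 0, 1]
h, f = km_and_hazard(z, delta, 3.5)
```

For this toy sample the three uncensored times below 3.5 contribute the terms 1/4, 1/3 and 1/2 to the hazard sum, and the product in (1.4) is (3/4)(2/3)(1/2) = 1/4, so the estimate of F at 3.5 is 3/4.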

There is an extensive literature on the Kaplan-Meier estimator ${\hat{F}}_{n} (x)$ and the hazard estimator ${\hat{H}}_{n} (x)$ for censored independent observations; see, for example, Breslow and Crowley [2], Földes and Rejtö [3] and Gu and Lai [4]. Martingale methods for analyzing properties of ${\hat{F}}_{n} (x)$ are described in the monograph by Gill [5]. However, censored dependent data appear in a number of applications. For example, repeated measurements in survival analysis follow this pattern; see Kang and Koehler [6] or Wei et al. [7]. In the context of censored time series analysis, Shumway et al. [8] considered (hourly or daily) measurements of the concentration of a given substance subject to some detection limits, thus being potentially censored from the right. Ying and Wei [9], Lecoutre and Ould-Saïd [10], Cai [11] and Liang and Uña-Álvarez [12] studied the convergence of ${\hat{F}}_{n} (x)$ for stationary α-mixing data.

The main purpose of this paper is to study the strong convergence rates and strong representations of the Kaplan-Meier estimator and the hazard estimator based on censored data when the survival times and the censoring times form NA sequences (see the definition below). Under certain regularity conditions, we obtain strong convergence rates for both estimators and express each of them as a mean of random variables plus a remainder of order $n^{- 1 / 2} \ln^{1 / 2} n$ a.s.

Definition Random variables $X_{1}, X_{2}, \dots, X_{n}$, $n \geq 2$, are said to be negatively associated (NA) if for every pair of disjoint subsets $A_{1}$ and $A_{2}$ of $\{1, 2, \dots, n\}$,

\operatorname{cov} (f_{1} (X_{i}; i \in A_{1}), f_{2} (X_{j}; j \in A_{2})) \leq 0,

where $f_{1}$ and $f_{2}$ are both increasing in every variable (or both decreasing in every variable) and such that this covariance exists. A sequence of random variables $\{X_{i}; i \geq 1\}$ is said to be NA if every finite subfamily is NA.

Obviously, if $\{X_{i}; i \geq 1\}$ is a sequence of NA random variables and $\{f_{i}; i \geq 1\}$ is a sequence of nondecreasing (or of non-increasing) functions, then $\{f_{i} (X_{i}); i \geq 1\}$ is also a sequence of NA random variables.

This definition was introduced by Joag-Dev and Proschan [13]. Statistical tests depend heavily on the sampling scheme: for example, random sampling without replacement from a finite population is NA but not independent. NA sampling has wide applications, for instance in multivariate statistical analysis and reliability theory, and the limit behavior of NA random variables has therefore received increasing attention. We refer to Joag-Dev and Proschan [13] for fundamental properties, Matula [14] for the three series theorem, and Wu and Jiang [15, 16] for strong convergence results.
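As a small illustration of negative dependence under sampling without replacement, the sketch below computes the exact covariance of two draws from a finite population by enumerating all ordered pairs. It checks only one instance of the defining inequality (both $f_{1}$ and $f_{2}$ taken as the identity), so it illustrates the definition rather than proving negative association; the function name and the population are hypothetical.

```python
from itertools import permutations


def cov_without_replacement(population):
    """Exact Cov(X1, X2) for two draws without replacement from `population`."""
    # All ordered pairs of distinct positions are equally likely.
    pairs = list(permutations(population, 2))
    m = len(pairs)
    e1 = sum(a for a, _ in pairs) / m
    e2 = sum(b for _, b in pairs) / m
    e12 = sum(a * b for a, b in pairs) / m
    return e12 - e1 * e2


population = [1, 2, 3, 4, 5]
c = cov_without_replacement(population)
```

For a population of size N with variance σ² (divisor N), this covariance equals -σ²/(N - 1); with the population above, σ² = 2 and N = 5, so the covariance is -0.5.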

We give two lemmas, which are helpful in proving our theorems.

Lemma 1.1 (Yang [17], Lemma 1)

Let $\{X_{i}; i \geq 1\}$ be a sequence of negatively associated random variables with zero means and $| X_{i} | \leq b_{i}$ a.s. ($i = 1, 2, \dots$). Let $t > 0$ be such that $t \max_{1 \leq i \leq n} b_{i} \leq 1$. Then, for all $ε > 0$,

P (| \sum_{i = 1}^{n} X_{i} | \geq ε) \leq 2 \exp (- t ε + t^{2} \sum_{i = 1}^{n} E X_{i}^{2}) .

Lemma 1.2 Let $\{X_{i}; i \geq 1\}$ be a sequence of NA r.v.s with a common continuous d.f. F, and let $F_{n} (x) := \frac{1}{n} \sum_{i = 1}^{n} I (X_{i} < x)$ be the empirical d.f. based on the sample $X_{1}, \dots, X_{n}$. Then

\sup_{x \in \mathbb{R}} | F_{n} (x) - F (x) | = O (n^{- 1 / 2} \ln^{1 / 2} n) \quad \text{a.s.}

Proof The proof is similar to that of Lemma 4 in Yang [17]. □
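The quantity bounded in Lemma 1.2 can be computed exactly from a sample, which makes it easy to compare against the rate $n^{- 1 / 2} \ln^{1 / 2} n$ numerically. The sketch below is only an illustration: it uses independent uniform draws (independent r.v.s are trivially NA), assumes no ties, and the helper names `sup_deviation` and `a_n` are not from the paper.

```python
import math
import random


def sup_deviation(sample, cdf):
    """sup_x |F_n(x) - F(x)| for F_n(x) = (1/n) #{X_i < x} and a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    # F_n equals i/n at the i-th smallest point (0-based) and jumps to (i+1)/n
    # just to its right, so the supremum is attained at one of these values.
    return max(max(cdf(x) - i / n, (i + 1) / n - cdf(x)) for i, x in enumerate(xs))


def a_n(n):
    """The rate n^{-1/2} (ln n)^{1/2} from Lemma 1.2."""
    return math.sqrt(math.log(n) / n)


rng = random.Random(0)  # fixed seed, for reproducibility only
for n in (100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    d = sup_deviation(sample, lambda x: x)  # uniform d.f. on [0, 1]
    print(n, round(d, 4), round(a_n(n), 4))
```

A simulation of this kind cannot verify an almost sure rate; it only shows the two quantities side by side for a few sample sizes.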

Theorem 1.3 Let $\{T_{n}; n \geq 1\}$ and $\{Y_{n}; n \geq 1\}$ be two sequences of NA random variables, and suppose that the two sequences are independent of each other. Then, for any $0 < τ < τ_{L}$,

sup_{0 \leq t \leq τ} | {\hat{H}}_{n} (t) - H (t) | = O (a_{n}) a.s.

(1.5)

and

sup_{0 \leq t \leq τ} | {\hat{F}}_{n} (t) - F (t) | = O (a_{n}) a.s.,

(1.6)

here and in the sequel, $a_{n} = n^{- 1 / 2} {(ln n)}^{1 / 2}$ .

For positive reals z and t, and δ taking value 0 or 1, let

ξ (z, δ, t) = g (z \land t) - I (z \leq t, δ = 1) / \bar{L} (z),

(1.7)

where $g (x) = \int_{0}^{x} \frac{d F_{*} (s)}{{\bar{L}}^{2} (s)}$ .
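To make the function ξ in (1.7) concrete, consider the following worked example under an assumption that is not made in the paper: exponential marginals $\bar{F} (t) = e^{- λ t}$ and $\bar{G} (t) = e^{- μ t}$ with hypothetical rates λ, μ > 0, and $c := λ + μ$. In this case g and ξ have closed forms, and ξ is centered.

```latex
\begin{aligned}
\bar{L}(s) &= \bar{F}(s)\,\bar{G}(s) = e^{-cs}, \qquad
dF_{*}(s) = \bar{G}(s)\,dF(s) = \lambda e^{-cs}\,ds, \\
g(x) &= \int_{0}^{x} \frac{dF_{*}(s)}{\bar{L}^{2}(s)}
      = \int_{0}^{x} \lambda e^{cs}\,ds
      = \frac{\lambda}{c}\bigl(e^{cx} - 1\bigr), \\
\xi(z,\delta,t) &= \frac{\lambda}{c}\bigl(e^{c(z \land t)} - 1\bigr)
      - I(z \le t,\ \delta = 1)\,e^{cz}, \\
E\,g(Z_{1} \land t) &= \int_{0}^{t} g'(s)\,\bar{L}(s)\,ds = \lambda t, \qquad
E\,\frac{I(Z_{1} \le t,\ \delta_{1} = 1)}{\bar{L}(Z_{1})}
      = \int_{0}^{t} \frac{dF_{*}(s)}{\bar{L}(s)} = \lambda t = H(t),
\end{aligned}
```

so that $E ξ (Z_{1}, δ_{1}, t) = 0$ in this example.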

Theorem 1.4 Assume that the conditions of Theorem 1.3 hold. Then

{\hat{H}}_{n} (t) - H (t) = - \frac{1}{n} \sum_{i = 1}^{n} ξ (Z_{i}, δ_{i}, t) + r_{1 n} (t)

(1.8)

and

{\hat{F}}_{n} (t) - F (t) = - \frac{\bar{F} (t)}{n} \sum_{i = 1}^{n} ξ (Z_{i}, δ_{i}, t) + r_{2 n} (t),

(1.9)

where ${sup}_{0 \leq t \leq τ} | r_{i n} (t) | = O (a_{n})$ a.s. $i = 1, 2$ , $0 < τ < τ_{L}$ .
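The representation (1.8) can be explored numerically in a model where ξ is available in closed form. The sketch below assumes independent exponential survival and censoring times with hypothetical rates `lam` and `mu` (an i.i.d. sample is the simplest NA case, so this does not exercise genuinely dependent data). Under that assumption $H (t) = λ t$, $\bar{L} (s) = e^{- (λ + μ) s}$ and $g (x) = \frac{λ}{λ + μ} (e^{(λ + μ) x} - 1)$; all function names are illustrative.

```python
import math
import random


def hazard_estimate(z, delta, t):
    """Empirical cumulative hazard (1.3) at t, assuming no ties."""
    n = len(z)
    pairs = sorted(zip(z, delta))
    # The k-th smallest observation (0-based) has n - k observations at risk.
    return sum(1.0 / (n - k) for k, (zk, dk) in enumerate(pairs) if zk <= t and dk == 1)


def xi(z, d, t, lam, mu):
    """xi(z, delta, t) from (1.7) under the exponential assumption."""
    c = lam + mu
    g = lam / c * (math.exp(c * min(z, t)) - 1.0)
    jump = math.exp(c * z) if (z <= t and d == 1) else 0.0  # I(z<=t, delta=1) / L_bar(z)
    return g - jump


def remainder(n, t, lam=1.0, mu=0.5, seed=0):
    """r_{1n}(t) = H_hat_n(t) - H(t) + (1/n) sum_i xi(Z_i, delta_i, t)."""
    rng = random.Random(seed)
    T = [rng.expovariate(lam) for _ in range(n)]
    Y = [rng.expovariate(mu) for _ in range(n)]
    z = [min(a, b) for a, b in zip(T, Y)]
    d = [1 if a <= b else 0 for a, b in zip(T, Y)]
    linear_term = -sum(xi(zi, di, t, lam, mu) for zi, di in zip(z, d)) / n
    return hazard_estimate(z, d, t) - lam * t - linear_term


r = remainder(5000, 0.5)
a_n = math.sqrt(math.log(5000) / 5000)
```

Comparing `abs(r)` with `a_n` for several values of `n` and `t` gives a rough feel for the size of the remainder; it is an illustration at fixed t, not a check of the uniform almost sure bound.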

2 Proofs

Proof of Theorem 1.3 It is easy to see from Property P₇ of Joag-Dev and Proschan [13] that $\{Z_{n}; n \geq 1\}$ and $\{(Z_{n}, δ_{n}); n \geq 1\}$ are also two sequences of NA r.v.s. Therefore

sup_{t \geq 0} | L_{n} (t) - L (t) | = O (a_{n}) a.s.

(2.1)

and

sup_{t \geq 0} | F_{* n} (t) - F_{*} (t) | = O (a_{n}) a.s.

(2.2)

follow from Lemma 1.2 and the fact that both $L_{n}$ and $F_{* n}$ are empirical distribution functions of L and $F_{*}$ .

Now, by (1.1) and (1.2), let us write

\begin{aligned} {\hat{H}}_{n} (t) - H (t) = & \int_{0}^{t} \frac{d F_{* n} (s)}{{\bar{L}}_{n} (s)} - \int_{0}^{t} \frac{d F_{*} (s)}{\bar{L} (s)} \\ = & \int_{0}^{t} (\frac{1}{{\bar{L}}_{n} (s)} - \frac{1}{\bar{L} (s)}) d F_{*} (s) + \int_{0}^{t} \frac{d (F_{* n} (s) - F_{*} (s))}{{\bar{L}}_{n} (s)} \\ = & \int_{0}^{t} \frac{\bar{L} (s) - {\bar{L}}_{n} (s)}{{\bar{L}}_{n} (s) \bar{L} (s)} d F_{*} (s) + \frac{F_{* n} (t) - F_{*} (t)}{{\bar{L}}_{n} (t)} \\ - \int_{0}^{t} (F_{* n} (s) - F_{*} (s)) d {({\bar{L}}_{n} (s))}^{- 1} . \end{aligned}

(2.3)

Therefore, by the combination of equations (2.1) and (2.2), and ${\bar{L}}_{n} (τ) \to \bar{L} (τ) > 0$ , for $0 < τ < τ_{L}$ , we obtain

\begin{aligned} \sup_{0 \leq t \leq τ} | {\hat{H}}_{n} (t) - H (t) | \leq{} & \frac{\sup_{t \geq 0} | {\bar{L}}_{n} (t) - \bar{L} (t) |}{{\bar{L}}_{n} (τ) \bar{L} (τ)} (F_{*} (τ) - F_{*} (0)) + \frac{\sup_{t \geq 0} | F_{* n} (t) - F_{*} (t) |}{{\bar{L}}_{n} (τ)} \\ & + \sup_{t \geq 0} | F_{* n} (t) - F_{*} (t) | (\frac{1}{{\bar{L}}_{n} (τ)} - \frac{1}{{\bar{L}}_{n} (0)}) \\ \leq{} & \frac{\sup_{t \geq 0} | {\bar{L}}_{n} (t) - \bar{L} (t) |}{{\bar{L}}_{n} (τ) \bar{L} (τ)} + \frac{2 \sup_{t \geq 0} | F_{* n} (t) - F_{*} (t) |}{{\bar{L}}_{n} (τ)} \\ ={} & O (a_{n}) \quad \text{a.s.} \end{aligned}

Thus, (1.5) holds.

Now we prove (1.6). By (1.3) and (1.4),

\begin{array}{rcl} - {\hat{H}}_{n} (t) - ln (1 - {\hat{F}}_{n} (t)) & = & - \sum_{i = 1}^{n} \frac{I (δ_{(i)} = 1, Z_{(i)} \leq t)}{n - i + 1} - \sum_{i = 1}^{n} I (δ_{(i)} = 1, Z_{(i)} \leq t) ln \frac{n - i}{n - i + 1} \\ = & \sum_{i = 1}^{n} I (δ_{(i)} = 1, Z_{(i)} \leq t) (ln \frac{n - i + 1}{n - i} - \frac{1}{n - i + 1}) . \end{array}

Therefore, by combining the inequality $0 < ln \frac{x + 1}{x} - \frac{1}{x + 1} < \frac{1}{x (x + 1)}$ , $x > 0$ , and (2.1), for $0 < τ < τ_{L}$ , $0 \leq t \leq τ$ , we get that

\begin{aligned} 0 <{} & - {\hat{H}}_{n} (t) - \ln (1 - {\hat{F}}_{n} (t)) \leq \sum_{i = 1}^{n} I (δ_{(i)} = 1, Z_{(i)} \leq t) \frac{1}{(n - i) (n - i + 1)} \\ \leq{} & \sum_{i; Z_{(i)} \leq t} \frac{1}{(n - i) (n - i + 1)} = \sum_{i = 1}^{n - Y_{n} (t)} (\frac{1}{n - i} - \frac{1}{n - i + 1}) \\ ={} & \frac{1}{Y_{n} (t)} - \frac{1}{n} \leq \frac{1}{n} \frac{1}{\frac{Y_{n} (t)}{n}} = \frac{1}{n} \frac{1}{{\bar{L}}_{n} (t)} \\ ={} & O (\frac{1}{n}) . \end{aligned}

(2.4)
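Two elementary facts drive (2.4): the inequality $0 < \ln \frac{x + 1}{x} - \frac{1}{x + 1} < \frac{1}{x (x + 1)}$ for $x > 0$ (which follows from $\frac{u}{1 + u} < \ln (1 + u) < u$ with $u = 1 / x$), and the telescoping sum $\sum_{i = 1}^{m} (\frac{1}{n - i} - \frac{1}{n - i + 1}) = \frac{1}{n - m} - \frac{1}{n}$. A quick numerical sanity check, with illustrative function names, might look as follows.

```python
import math


def gap(x):
    """ln((x+1)/x) - 1/(x+1), the per-term difference bounded in (2.4)."""
    return math.log((x + 1) / x) - 1.0 / (x + 1)


def telescoped(n, m):
    """sum_{i=1}^{m} 1/((n-i)(n-i+1)) for m < n; should equal 1/(n-m) - 1/n."""
    return sum(1.0 / ((n - i) * (n - i + 1)) for i in range(1, m + 1))


# Check the two-sided inequality at a few points and the telescoping identity.
checks = [0.0 < gap(x) < 1.0 / (x * (x + 1)) for x in (0.1, 1.0, 5.0, 100.0)]
identity_error = abs(telescoped(10, 4) - (1.0 / 6 - 1.0 / 10))
```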

By (1.1), (1.5) and (2.4), using the Taylor expansion $e^{x} = 1 + x + o (x)$ as $x \to 0$, we obtain

\begin{array}{rcl} {\hat{F}}_{n} (t) - F (t) & = & 1 - F (t) - (1 - {\hat{F}}_{n} (t)) = (e^{- H (t)} - e^{- {\hat{H}}_{n} (t)}) + (e^{- {\hat{H}}_{n} (t)} - e^{ln (1 - {\hat{F}}_{n} (t))}) \\ = & e^{- H (t)} (1 - e^{- {\hat{H}}_{n} (t) + H (t)}) + e^{ln (1 - {\hat{F}}_{n} (t))} (e^{- {\hat{H}}_{n} (t) - ln (1 - {\hat{F}}_{n} (t))} - 1) \\ = & e^{- H (t)} ({\hat{H}}_{n} (t) - H (t) + o ({\hat{H}}_{n} (t) - H (t))) \\ + (1 - {\hat{F}}_{n} (t)) (- {\hat{H}}_{n} (t) - ln (1 - {\hat{F}}_{n} (t)) + o (- {\hat{H}}_{n} (t) - ln (1 - {\hat{F}}_{n} (t)))) \\ = & e^{- H (t)} ({\hat{H}}_{n} (t) - H (t)) + o (a_{n}) + O (\frac{1}{n}) \\ = & \bar{F} (t) ({\hat{H}}_{n} (t) - H (t)) + o (a_{n}) . \end{array}

(2.5)

Hence (1.6) follows from (2.5) and (1.5). This completes the proof of Theorem 1.3. □

Proof of Theorem 1.4 By (2.1),

\begin{array}{rcl} \frac{1}{{\bar{L}}_{n} (s)} - \frac{1}{\bar{L} (s)} & = & \frac{\bar{L} (s) - {\bar{L}}_{n} (s)}{{\bar{L}}^{2} (s)} - \frac{2}{\bar{L} (s)} + \frac{{\bar{L}}_{n} (s)}{{\bar{L}}^{2} (s)} + \frac{1}{{\bar{L}}_{n} (s)} \\ = & \frac{\bar{L} (s) - {\bar{L}}_{n} (s)}{{\bar{L}}^{2} (s)} + \frac{{({\bar{L}}_{n} (s) - \bar{L} (s))}^{2}}{{\bar{L}}^{2} (s) {\bar{L}}_{n} (s)} = \frac{1}{\bar{L} (s)} - \frac{{\bar{L}}_{n} (s)}{{\bar{L}}^{2} (s)} + O (a_{n}^{2}) . \end{array}

Thus, by (2.3),

\begin{array}{rcl} {\hat{H}}_{n} (t) - H (t) & = & \int_{0}^{t} (\frac{1}{{\bar{L}}_{n} (s)} - \frac{1}{\bar{L} (s)}) d F_{*} (s) + \int_{0}^{t} \frac{d (F_{* n} (s) - F_{*} (s))}{\bar{L} (s)} \\ + \int_{0}^{t} (\frac{1}{{\bar{L}}_{n} (s)} - \frac{1}{\bar{L} (s)}) d (F_{* n} (s) - F_{*} (s)) \\ = & (\int_{0}^{t} \frac{d F_{* n} (s)}{\bar{L} (s)} - \int_{0}^{t} \frac{{\bar{L}}_{n} (s)}{{\bar{L}}^{2} (s)} d F_{*} (s)) \\ + \int_{0}^{t} (\frac{1}{{\bar{L}}_{n} (s)} - \frac{1}{\bar{L} (s)}) d (F_{* n} (s) - F_{*} (s)) + O (a_{n}^{2}) \\ : = & I_{1} (t) + I_{2} (t) + O (a_{n}^{2}) . \end{array}

(2.6)

Noting that $F_{* n} (s) = \frac{N_{n} (s)}{n}$ and $N_{n} (s)$ is a step function, we get

\begin{aligned} I_{1} (t) ={} & \frac{1}{n} \sum_{i; Z_{(i)} \leq t} \frac{N_{n} (Z_{(i)}) - N_{n} (Z_{(i)} -)}{\bar{L} (Z_{(i)})} - \frac{1}{n} \int_{0}^{t} \frac{\sum_{i = 1}^{n} I (Z_{i} \geq s)}{{\bar{L}}^{2} (s)} d F_{*} (s) \\ ={} & \frac{1}{n} \sum_{i; Z_{(i)} \leq t} \frac{δ_{(i)}}{\bar{L} (Z_{(i)})} - \frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{t \land Z_{i}} \frac{d F_{*} (s)}{{\bar{L}}^{2} (s)} \\ ={} & \frac{1}{n} \sum_{i = 1}^{n} \frac{I (Z_{(i)} \leq t, δ_{(i)} = 1)}{\bar{L} (Z_{(i)})} - \frac{1}{n} \sum_{i = 1}^{n} g (t \land Z_{i}) \\ ={} & \frac{1}{n} \sum_{i = 1}^{n} \frac{I (Z_{i} \leq t, δ_{i} = 1)}{\bar{L} (Z_{i})} - \frac{1}{n} \sum_{i = 1}^{n} g (t \land Z_{i}) \\ ={} & - \frac{1}{n} \sum_{i = 1}^{n} ξ (Z_{i}, δ_{i}, t) . \end{aligned}

(2.7)

Therefore, to prove (1.8), it suffices to prove that $\sup_{0 \leq t \leq τ} | I_{2} (t) | = O (a_{n})$ a.s. for $0 < τ < τ_{L}$. Let us divide the interval $[0, τ]$ into subintervals $[x_{i}, x_{i + 1}]$, $i = 1, \dots, k_{n}$, where $k_{n} = O (a_{n}^{- 1})$ and $0 = x_{1} < x_{2} < \dots < x_{k_{n} + 1} = τ$ are such that $H (x_{i + 1}) - H (x_{i}) = O (a_{n})$. For $0 \leq t \leq τ$, it is easy to check that

\begin{aligned} I_{2} (t) ={} & \int_{0}^{t} (\frac{1}{{\bar{L}}_{n} (s)} - \frac{1}{\bar{L} (s)}) d (F_{* n} (s) - F_{*} (s)) \\ \leq{} & 2 \max_{1 \leq i \leq k_{n}} \sup_{y \in [x_{i}, x_{i + 1}]} | {\bar{L}}_{n}^{- 1} (y) - {\bar{L}}_{n}^{- 1} (x_{i}) - {\bar{L}}^{- 1} (y) + {\bar{L}}^{- 1} (x_{i}) | \\ & + \sup_{0 \leq x \leq τ} | {\bar{L}}_{n}^{- 1} (x) - {\bar{L}}^{- 1} (x) | \max_{1 \leq i \leq k_{n}} | F_{* n} (x_{i + 1}) - F_{* n} (x_{i}) - F_{*} (x_{i + 1}) + F_{*} (x_{i}) | \\ \leq{} & c \max_{1 \leq i \leq k_{n}} \sup_{y \in [x_{i}, x_{i + 1}]} | {\bar{L}}_{n} (y) - {\bar{L}}_{n} (x_{i}) - \bar{L} (y) + \bar{L} (x_{i}) | \\ & + c \max_{1 \leq i \leq k_{n}} | F_{* n} (x_{i + 1}) - F_{* n} (x_{i}) - F_{*} (x_{i + 1}) + F_{*} (x_{i}) | + O (a_{n}^{2}) \\ :={} & I_{21} + I_{22} + O (a_{n}^{2}) . \end{aligned}

(2.8)

To estimate $I_{21}$ , we further subdivide each $[x_{i}, x_{i + 1}]$ into subintervals $[x_{i j}, x_{i (j + 1)}]$ , $j = 1, \dots, b_{n}$ , where $b_{n} = O (k_{n}^{1 / 2}) = O (a_{n}^{- 1 / 2})$ such that $| \bar{L} (x_{i (j + 1)}) - \bar{L} (x_{i j}) | = O (a_{n}^{3 / 2})$ uniformly in i, j. Now, by (2.1) and $| {\bar{L}}_{n} (y) - {\bar{L}}_{n} (x_{i j}) | \leq 1 / n \leq O (a_{n}^{3 / 2})$ , for $y \in [x_{i j}, x_{i (j + 1)}]$ , it follows that

\begin{array}{rcl} I_{21} & = & max_{1 \leq i \leq k_{n}} sup_{y \in [x_{i}, x_{i + 1}]} | {\bar{L}}_{n} (y) - {\bar{L}}_{n} (x_{i}) - \bar{L} (y) + \bar{L} (x_{i}) | \\ \leq & max_{1 \leq i \leq k_{n}} max_{1 \leq j \leq b_{n}} sup_{y \in [x_{i j}, x_{i (j + 1)}]} | {\bar{L}}_{n} (x_{i j}) - {\bar{L}}_{n} (x_{i}) - \bar{L} (x_{i j}) + \bar{L} (x_{i}) | \\ + max_{1 \leq i \leq k_{n}} max_{1 \leq j \leq b_{n}} sup_{y \in [x_{i j}, x_{i (j + 1)}]} (| {\bar{L}}_{n} (y) - {\bar{L}}_{n} (x_{i j}) | + | - \bar{L} (y) + \bar{L} (x_{i j}) |) \\ \leq & max_{1 \leq i \leq k_{n}} max_{1 \leq j \leq b_{n}} | {\bar{L}}_{n} (x_{i j}) - {\bar{L}}_{n} (x_{i}) - \bar{L} (x_{i j}) + \bar{L} (x_{i}) | + O (a_{n}^{3 / 2}) . \end{array}

(2.9)

For $1 \leq i \leq k_{n}$, $1 \leq j \leq b_{n}$, $1 \leq k \leq n$, let $η_{k} = EI (Z_{k} \geq x_{i}) - I (Z_{k} \geq x_{i})$ and $ζ_{k} = I (Z_{k} \geq x_{i j}) - EI (Z_{k} \geq x_{i j})$. Then ${\bar{L}}_{n} (x_{i j}) - {\bar{L}}_{n} (x_{i}) - \bar{L} (x_{i j}) + \bar{L} (x_{i}) = \frac{1}{n} \sum_{k = 1}^{n} (η_{k} + ζ_{k})$, and $\{η_{k}\}$ and $\{ζ_{k}\}$ are NA sequences (each is a monotone function of the NA sequence $\{Z_{k}\}$) with $| η_{k} | \leq 1$, $| ζ_{k} | \leq 1$, $E η_{k} = E ζ_{k} = 0$, $E η_{k}^{2} \leq 1$, $E ζ_{k}^{2} \leq 1$.

Taking $t = a_{n}$ in Lemma 1.1 yields the following probability bound:

\begin{aligned} & P (\max_{1 \leq i \leq k_{n}} \max_{1 \leq j \leq b_{n}} | {\bar{L}}_{n} (x_{i j}) - {\bar{L}}_{n} (x_{i}) - \bar{L} (x_{i j}) + \bar{L} (x_{i}) | \geq 8 a_{n}) \\ & \quad \leq \sum_{i = 1}^{k_{n}} \sum_{j = 1}^{b_{n}} P (| \sum_{k = 1}^{n} (η_{k} + ζ_{k}) | \geq 8 n a_{n}) \\ & \quad \leq \sum_{i = 1}^{k_{n}} \sum_{j = 1}^{b_{n}} P (| \sum_{k = 1}^{n} η_{k} | \geq 4 n a_{n}) + \sum_{i = 1}^{k_{n}} \sum_{j = 1}^{b_{n}} P (| \sum_{k = 1}^{n} ζ_{k} | \geq 4 n a_{n}) \\ & \quad \leq \sum_{i = 1}^{k_{n}} \sum_{j = 1}^{b_{n}} 4 \exp (- 4 n a_{n}^{2} + n a_{n}^{2}) \\ & \quad = 4 k_{n} b_{n} e^{- 3 \ln n} \leq \frac{1}{n^{2}} . \end{aligned}

Using this bound and the Borel-Cantelli lemma, we deduce that $I_{21} = O (a_{n})$ a.s. The bound $I_{22} = O (a_{n})$ a.s. is obtained similarly, noting that $| F_{*} (x) - F_{*} (y) | \leq | \bar{L} (x) - \bar{L} (y) |$ for all x and y. Therefore, by (2.6)-(2.9), (1.8) holds, and (1.9) follows from (2.5) and (1.8). □
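For readers who want the arithmetic behind the exponential bound in the proof of Theorem 1.4 spelled out, the following worked computation uses only $a_{n} = n^{- 1 / 2} (\ln n)^{1 / 2}$, $k_{n} = O (a_{n}^{- 1})$, $b_{n} = O (a_{n}^{- 1 / 2})$ and Lemma 1.1 with $t = a_{n}$, $ε = 4 n a_{n}$ and $\sum_{k = 1}^{n} E η_{k}^{2} \leq n$. The final inequality is understood for all sufficiently large n.

```latex
\begin{aligned}
n a_{n}^{2} &= \ln n, \\
P\Bigl(\Bigl|\sum_{k=1}^{n} \eta_{k}\Bigr| \ge 4 n a_{n}\Bigr)
  &\le 2 \exp\bigl(-4 n a_{n}^{2} + n a_{n}^{2}\bigr)
   = 2 \exp(-3 \ln n) = 2 n^{-3}, \\
k_{n} b_{n} &= O\bigl(a_{n}^{-3/2}\bigr)
   = O\bigl(n^{3/4} (\ln n)^{-3/4}\bigr), \\
4\, k_{n} b_{n}\, n^{-3} &= O\bigl(n^{-9/4}\bigr) \le n^{-2}
   \quad \text{for large } n, \qquad
\sum_{n \ge 1} n^{-2} < \infty .
\end{aligned}
```

The summability in the last line is what allows the Borel-Cantelli lemma to be applied.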

Authors’ information

Qunying Wu, Professor, Doctor, working in the field of probability and statistics.

References

[1] Kaplan EM, Meier P: Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53: 457–481. 10.1080/01621459.1958.10501452
[2] Breslow N, Crowley J: A large sample study of the life table and product limit estimates under random censorship. Ann. Stat. 1974, 2: 437–453. 10.1214/aos/1176342705
[3] Földes A, Rejtö L: A LIL type result for the product limit estimator. Z. Wahrscheinlichkeitstheor. Verw. Geb. 1981, 56: 75–84. 10.1007/BF00531975
[4] Gu MG, Lai TL: Functional laws of the iterated logarithm for the product-limit estimator of a distribution function under random censorship or truncation. Ann. Probab. 1990, 18: 160–189. 10.1214/aop/1176990943
[5] Gill R: Censoring and Stochastic Integrals. Mathematical Centre Tracts 124. Math. Centrum, Amsterdam; 1980.
[6] Kang SS, Koehler KJ: Modification of the Greenwood formula for correlated failure times. Biometrics 1997, 53: 885–899. 10.2307/2533550
[7] Wei LJ, Lin DY, Weissfeld L: Regression analysis of multivariate incomplete failure times data by modelling marginal distributions. J. Am. Stat. Assoc. 1989, 84: 1064–1073.
[8] Shumway RH, Azari AS, Johnson P: Estimating mean concentrations under transformation for environmental data with detection limits. Technometrics 1988, 31: 347–356.
[9] Ying Z, Wei LJ: The Kaplan-Meier estimate for dependent failure time observations. J. Multivar. Anal. 1994, 50: 17–29. 10.1006/jmva.1994.1031
[10] Lecoutre JP, Ould-Saïd E: Convergence of the conditional Kaplan-Meier estimate under strong mixing. J. Stat. Plan. Inference 1995, 44: 359–369. 10.1016/0378-3758(94)00084-9
[11] Cai ZW: Estimating a distribution function for censored time series data. J. Multivar. Anal. 2001, 78: 299–318. 10.1006/jmva.2000.1953
[12] Liang HY, Uña-Álvarez J: A Berry-Esseen type bound in kernel density estimation for strong mixing censored samples. J. Multivar. Anal. 2009, 100: 1219–1231. 10.1016/j.jmva.2008.11.001
[13] Joag-Dev K, Proschan F: Negative association of random variables with applications. Ann. Stat. 1983, 11(1):286–295. 10.1214/aos/1176346079
[14] Matula PA: A note on the almost sure convergence of sums of negatively dependent random variables. Stat. Probab. Lett. 1992, 15: 209–213. 10.1016/0167-7152(92)90191-7
[15] Wu QY, Jiang YY: A law of the iterated logarithm of partial sums for NA random variables. J. Korean Stat. Soc. 2010, 39(2):199–206. 10.1016/j.jkss.2009.06.001
[16] Wu QY, Jiang YY: Chover’s law of the iterated logarithm for NA sequences. J. Syst. Sci. Complex. 2010, 23(2):293–302. 10.1007/s11424-010-7258-y
[17] Yang SC: Consistency of nearest neighbor estimator of density function for negative associated samples. Acta Math. Appl. Sin. 2003, 26(3):385–394.

Acknowledgements

Supported by the National Natural Science Foundation of China (11061012), project supported by Program to Sponsor Teams for Innovation in the Construction of Talent Highlands in Guangxi Institutions of Higher Learning ([2011] 47), and the Support Program of the Guangxi China Science Foundation (2012GXNSFAA053010, 2013GXNSFDA019001).

Author information

Authors and Affiliations

College of Science, Guilin University of Technology, Guilin, 541004, P.R. China
Qunying Wu
Department of Mathematics, Ji’nan University, Guangzhou, 510630, P.R. China
Pingyan Chen

Corresponding author

Correspondence to Qunying Wu.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

QW conceived of the study and drafted and completed the manuscript. PC participated in the discussion of the manuscript. QW and PC read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Wu, Q., Chen, P. Strong representation results of the Kaplan-Meier estimator for censored negatively associated data. J Inequal Appl 2013, 340 (2013). https://doi.org/10.1186/1029-242X-2013-340

Received: 04 February 2013
Accepted: 10 July 2013
