Open Access

Maximum likelihood estimators in linear regression models with Ornstein-Uhlenbeck process

Journal of Inequalities and Applications 2014, 2014:301

https://doi.org/10.1186/1029-242X-2014-301

Received: 5 February 2014

Accepted: 9 July 2014

Published: 19 August 2014

Abstract

The paper studies the linear regression model

$$y_t = x_t^{T}\beta + \varepsilon_t, \qquad t = 1, 2, \ldots, n,$$

where

$$d\varepsilon_t = \lambda(\mu - \varepsilon_t)\,dt + \sigma\,dB_t,$$

with parameters $\lambda, \sigma \in \mathbb{R}_+$, $\mu \in \mathbb{R}$, and $\{B_t, t \ge 0\}$ the standard Brownian motion. Firstly, the maximum likelihood (ML) estimators of β, λ and $\sigma^2$ are given. Secondly, under general conditions, the asymptotic properties of the ML estimators are investigated. Then, limiting distributions of the likelihood ratio test statistics for the hypothesis are given. Lastly, the validity of the method is illustrated by two real examples.

MSC:62J05, 62M10, 60J60.

Keywords

linear regression model; maximum likelihood estimator; Ornstein-Uhlenbeck process; asymptotic property; likelihood ratio test

1 Introduction

Consider the following linear regression model
$$y_t = x_t^{T}\beta + \varepsilon_t, \qquad t = 1, 2, \ldots, n,$$
(1.1)
where y t ’s are scalar response variables, x t ’s are explanatory variables, β is an m-dimensional unknown parameter, and { ε t } is an Ornstein-Uhlenbeck process, which satisfies the linear stochastic differential equation (SDE)
$$d\varepsilon_t = \lambda(\mu - \varepsilon_t)\,dt + \sigma\,dB_t$$
(1.2)

with parameters $\lambda, \sigma \in \mathbb{R}_+$, $\mu \in \mathbb{R}$, and $\{B_t, t \ge 0\}$ the standard Brownian motion.

The linear regression model is one of the most important and widely used models in the statistical literature and has attracted extensive investigation. For an ordinary linear regression model (when the errors are independent and identically distributed (i.i.d.) random variables), Wang and Zhou [1], Anatolyev [2], Bai and Guo [3], Chen [4], Gil et al. [5], Hampel et al. [6], Cui [7], Durbin [8] and Li and Yang [9] used various estimation methods to obtain estimators of the unknown parameters in (1.1) and discussed large or small sample properties of these estimators. Recently, linear regression with serially correlated errors has attracted increasing attention from statisticians and economists. One case of considerable interest is that of autoregressive errors; Hu [10], Wu [11], and Fox and Taqqu [12] established asymptotic normality of the estimators with the usual $\sqrt{n}$-normalization in the case of long-memory stationary Gaussian observation errors. Giraitis and Surgailis [13] extended this result to non-Gaussian linear sequences. Koul and Surgailis [14] established the asymptotic normality of the Whittle estimator in linear regression models with non-Gaussian long-memory moving average errors. Shiohama and Taniguchi [15] estimated the regression parameters in a linear regression model with autoregressive errors. Fan [16] investigated moderate deviations for M-estimators in linear models with ϕ-mixing errors.

The Ornstein-Uhlenbeck process was originally introduced by Ornstein and Uhlenbeck [17] as a model for particle motion in a fluid. In physical sciences, the Ornstein-Uhlenbeck process is a prototype of a noisy relaxation process, whose probability density function f ( x , t ) can be described by the Fokker-Planck equation (see Janczura et al. [18], Debbasch et al. [19], Gillespie [20], Ditlevsen and Lansky [21], Garbaczewski and Olkiewicz [22], Plastino and Plastino [23]):
$$\frac{\partial f(x,t)}{\partial t} = \frac{\partial}{\partial x}\bigl(\lambda(x-\mu)f(x,t)\bigr) + \frac{\sigma^{2}}{2}\,\frac{\partial^{2} f(x,t)}{\partial x^{2}}.$$

This process is now widely used in many areas of application. The main characteristic of the Ornstein-Uhlenbeck process is its tendency to return towards the long-term equilibrium μ. This property, known as mean reversion, is found in many real-life processes, e.g., in commodity and energy price processes (see Fasen [24], Yu [25], Geman [26]). There are a number of papers concerned with the Ornstein-Uhlenbeck process, for example, Janczura et al. [18], Zhang et al. [27], Rieder [28], Iacus [29], Bishwal [30], Shimizu [31], Zhang and Zhang [32], Chronopoulou and Viens [33], Lin and Wang [34] and Xiao et al. [35]. It is well known that the solution of model (1.2) is an autoregressive process. For constant, functional, or random coefficient autoregressive models, many authors (for example, Magdalinos [36], Andrews and Guggenberger [37], Fan and Yao [38], Berk [39], Goldenshluger and Zeevi [40], Liebscher [41], Baran et al. [42], Distaso [43] and Harvill and Ray [44]) used various estimation methods to obtain estimators and discussed asymptotic properties of these estimators, or investigated hypothesis testing.

Combining (1.1) and (1.2), we obtain that the more general process $y_t$ satisfies the SDE
$$dy_t = \lambda\bigl(L(t,\lambda,\mu,\beta) - y_t\bigr)\,dt + \sigma\,dB_t,$$
(1.3)

where $L(t,\lambda,\mu,\beta)$ is a time-dependent mean reversion level with three parameters. Thus, model (1.3) is a general Ornstein-Uhlenbeck process. Its special cases have gained much attention and have been applied to many fields such as economics, physics, geography, geology, biology and agriculture. Dehling et al. [45] considered maximum likelihood estimation for the model and proved strong consistency and asymptotic normality. Lin and Wang [34] established the existence of a successful coupling for a class of stochastic differential equations given by (1.3). Bishwal [30] investigated the uniform rate of weak convergence of the minimum contrast estimator in the Ornstein-Uhlenbeck process (1.3).

The solution of model (1.2) is given by
$$\varepsilon_t = e^{-\lambda t}\varepsilon_0 + \mu\bigl(1 - e^{-\lambda t}\bigr) + \sigma\int_0^{t} e^{\lambda(s-t)}\,dB_s,$$
(1.4)

where $\int_0^{t} e^{\lambda(s-t)}\,dB_s \sim N\Bigl(0, \dfrac{1-\exp(-2\lambda t)}{2\lambda}\Bigr)$.

In statistics and economics, the process is usually observed in discrete time. Therefore, by (1.4), the Ornstein-Uhlenbeck time series for $t = 1, 2, \ldots, n$ is given by
$$\varepsilon_t = e^{-\lambda d}\varepsilon_{t-1} + \mu\bigl(1 - e^{-\lambda d}\bigr) + \sigma\sqrt{\frac{1-e^{-2\lambda d}}{2\lambda}}\,\eta_t,$$
(1.5)

where the $\eta_t \sim N(0,1)$ are i.i.d. random errors and $d$ is an equidistant time lag, fixed in advance. Models (1.1) and (1.5) include many special cases, such as the linear regression model with constant coefficient autoregressive errors (when $\mu = 0$; see Hu [10], Wu [11], Maller [46], Pere [47] and Fuller [48]), Ornstein-Uhlenbeck time series or processes (when $\beta = 0$; see Rieder [28], Iacus [29], Bishwal [30], Shimizu [31] and Zhang and Zhang [32]), and constant coefficient autoregressive processes (when $\mu = 0$, $\beta = 0$; see Chambers [49], Hamilton [50], Brockwell and Davis [51] and Abadir and Lucas [52], etc.).
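To make the discretized formulation concrete, the following minimal sketch simulates data from models (1.1) and (1.5). The sample size, time lag $d$, design $x_t$, and parameter values are illustrative assumptions only; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) settings; none of these values come from the paper.
n, d = 500, 1.0                       # sample size and equidistant time lag d
beta0 = np.array([1.0, -0.5])         # true regression coefficients (m = 2)
lam0, mu0, sigma0 = 0.8, 0.0, 1.0     # lambda, mu, sigma in model (1.2)

# Assumed explanatory variables x_t (an intercept and a trend term)
x = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])

# Ornstein-Uhlenbeck errors observed at lag d, following model (1.5)
a = np.exp(-lam0 * d)                                                 # AR coefficient exp(-lambda*d)
s = sigma0 * np.sqrt((1.0 - np.exp(-2.0 * lam0 * d)) / (2.0 * lam0))  # innovation std. dev.
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = a * eps[t - 1] + mu0 * (1.0 - a) + s * rng.standard_normal()

# Responses from the linear regression model (1.1)
y = x @ beta0 + eps
```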

The paper discusses models (1.1) and (1.5). The organization of the paper is as follows. In Section 2 some estimators of β, λ and $\sigma^2$ are given by the quasi-maximum likelihood method. Under general conditions, the existence and consistency of the quasi-maximum likelihood estimators as well as asymptotic normality are investigated in Section 3. Hypothesis testing is discussed in Section 4. Some preliminary lemmas are presented in Section 5. The main proofs of the theorems are presented in Section 6, with two real examples in Section 7.

2 Estimation method

Without loss of generality, we assume that $\mu = 0$ and $\varepsilon_0 = 0$ in the sequel. Write the ‘true’ model as
$$y_t = x_t^{T}\beta_0 + e_t, \qquad t = 1, 2, \ldots, n$$
(2.1)
and
$$e_t = \exp(-\lambda_0 d)e_{t-1} + \sigma_0\sqrt{\frac{1-\exp(-2\lambda_0 d)}{2\lambda_0}}\,\eta_t,$$
(2.2)

where the $\eta_t \sim N(0,1)$ are i.i.d.

By (2.2), we have
$$e_t = \sigma_0\sqrt{\frac{1-\exp(-2\lambda_0 d)}{2\lambda_0}}\sum_{j=1}^{t}\exp\{-\lambda_0 d(t-j)\}\eta_j.$$
(2.3)
Thus $e_t$ is measurable with respect to the σ-field $\mathcal{H}_t$ generated by $\eta_1, \eta_2, \ldots, \eta_t$, and
$$Ee_t = 0, \qquad \operatorname{Var}(e_t) = \sigma_0^{2}\,\frac{1-\exp(-2\lambda_0 d)}{2\lambda_0}\sum_{j=1}^{t}\exp\{-2\lambda_0 d(t-j)\} = \sigma_0^{2}\,\frac{1-\exp(-2\lambda_0 d t)}{2\lambda_0}.$$
(2.4)
Using arguments similar to those of Rieder [28] or Maller [46], we get the log-likelihood of $y_2, y_3, \ldots, y_n$ conditional on $y_1$:
$$\Psi_n(\beta,\lambda,\sigma^{2}) = \log L_n = -\frac{n-1}{2}\log\Bigl(\frac{\pi\sigma^{2}}{\lambda}\Bigr) - \frac{n-1}{2}\log\bigl(1-\exp(-2\lambda d)\bigr) - \frac{\lambda}{\sigma^{2}(1-\exp(-2\lambda d))}\sum_{t=2}^{n}\bigl(\varepsilon_t-\exp(-\lambda d)\varepsilon_{t-1}\bigr)^{2}.$$
(2.5)
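For reference, a direct transcription of (2.5) into code reads as follows. This is a sketch only; it assumes the simulated arrays `y`, `x` and the lag `d` from the earlier example.

```python
import numpy as np

def cond_loglik(beta, lam, sig2, y, x, d):
    """Conditional log-likelihood Psi_n(beta, lambda, sigma^2) of y_2,...,y_n given y_1, as in (2.5)."""
    n = len(y)
    eps = y - x @ beta                              # epsilon_t = y_t - x_t' beta
    resid = eps[1:] - np.exp(-lam * d) * eps[:-1]   # epsilon_t - exp(-lambda d) epsilon_{t-1}
    one_m = 1.0 - np.exp(-2.0 * lam * d)
    return (-(n - 1) / 2.0 * np.log(np.pi * sig2 / lam)
            - (n - 1) / 2.0 * np.log(one_m)
            - lam / (sig2 * one_m) * np.sum(resid ** 2))
```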
We maximize (2.5) to obtain QML estimators, denoted by $\hat\sigma_n^{2}$, $\hat\beta_n$, $\hat\lambda_n$ (when they exist). The first derivatives of $\Psi_n$ may be written as
$$\frac{\partial\Psi_n}{\partial\sigma^{2}} = -\frac{n-1}{2\sigma^{2}} + \frac{\lambda}{\sigma^{4}(1-\exp(-2\lambda d))}\sum_{t=2}^{n}\bigl(\varepsilon_t-\exp(-\lambda d)\varepsilon_{t-1}\bigr)^{2},$$
(2.6)
$$\frac{\partial\Psi_n}{\partial\lambda} = \frac{n-1}{2\lambda} - \frac{(n-1)d\exp(-2\lambda d)}{1-\exp(-2\lambda d)} - \frac{2d\lambda\exp(-\lambda d)}{\sigma^{2}(1-\exp(-2\lambda d))}\sum_{t=2}^{n}\bigl(\varepsilon_t-\exp(-\lambda d)\varepsilon_{t-1}\bigr)\varepsilon_{t-1} - \frac{1-(1+2d\lambda)\exp(-2\lambda d)}{\sigma^{2}(1-\exp(-2\lambda d))^{2}}\sum_{t=2}^{n}\bigl(\varepsilon_t-\exp(-\lambda d)\varepsilon_{t-1}\bigr)^{2}$$
(2.7)
and
$$\frac{\partial\Psi_n}{\partial\beta} = \frac{2\lambda}{\sigma^{2}(1-\exp(-2\lambda d))}\sum_{t=2}^{n}\bigl(\varepsilon_t-\exp(-\lambda d)\varepsilon_{t-1}\bigr)\bigl(x_t-\exp(-\lambda d)x_{t-1}\bigr).$$
(2.8)
Thus $\hat\sigma_n^{2}$, $\hat\beta_n$, $\hat\lambda_n$ satisfy the following estimating equations:
$$\hat\sigma_n^{2} = \frac{2\hat\lambda_n}{(n-1)(1-\exp(-2\hat\lambda_n d))}\sum_{t=2}^{n}\bigl(\hat\varepsilon_t-\exp(-\hat\lambda_n d)\hat\varepsilon_{t-1}\bigr)^{2},$$
(2.9)
$$\frac{\hat\sigma_n^{2}\bigl(1-(1+2d\hat\lambda_n)\exp(-2\hat\lambda_n d)\bigr)}{2\hat\lambda_n} - \frac{2d\hat\lambda_n\exp(-\hat\lambda_n d)}{n-1}\sum_{t=2}^{n}\bigl(\hat\varepsilon_t-\exp(-\hat\lambda_n d)\hat\varepsilon_{t-1}\bigr)\hat\varepsilon_{t-1} - \frac{1-(1+2d\hat\lambda_n)\exp(-2\hat\lambda_n d)}{(1-\exp(-2\hat\lambda_n d))(n-1)}\sum_{t=2}^{n}\bigl(\hat\varepsilon_t-\exp(-\hat\lambda_n d)\hat\varepsilon_{t-1}\bigr)^{2} = 0$$
(2.10)
and
$$\sum_{t=2}^{n}\bigl(\hat\varepsilon_t-\exp(-\hat\lambda_n d)\hat\varepsilon_{t-1}\bigr)\bigl(x_t-\exp(-\hat\lambda_n d)x_{t-1}\bigr) = 0,$$
(2.11)
where
$$\hat\varepsilon_t = y_t - x_t^{T}\hat\beta_n.$$
(2.12)
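Numerically, the estimating equations (2.9)-(2.11) can be solved by profiling: for fixed λ, (2.11) is a linear system in β, (2.9) then gives $\sigma^2$, and the remaining one-dimensional problem in λ is handled by a line search. The sketch below follows this route; the search bounds for λ are arbitrary assumptions, and `y`, `x`, `d` refer to the simulated data above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def qml_fit(y, x, d, lam_bounds=(1e-4, 50.0)):
    """Sketch of QML estimation for models (2.1)-(2.2) by profiling beta and sigma^2 out of (2.5)."""
    n = len(y)

    def profile(lam):
        a = np.exp(-lam * d)
        yt = y[1:] - a * y[:-1]                               # y_t - exp(-lambda d) y_{t-1}
        xt = x[1:] - a * x[:-1]                               # x_t - exp(-lambda d) x_{t-1}
        beta = np.linalg.solve(xt.T @ xt, xt.T @ yt)          # solves (2.11) for fixed lambda
        resid = yt - xt @ beta
        sig2 = 2.0 * lam / ((n - 1) * (1.0 - np.exp(-2.0 * lam * d))) * np.sum(resid ** 2)  # (2.9)
        # -2 x profile log-likelihood, up to the constant (n-1)(log(pi)+1); cf. (4.5)
        crit = (n - 1) * (np.log(sig2) + np.log(1.0 - np.exp(-2.0 * lam * d)) - np.log(lam))
        return crit, beta, sig2

    res = minimize_scalar(lambda lam: profile(lam)[0], bounds=lam_bounds, method="bounded")
    _, beta_hat, sig2_hat = profile(res.x)
    return beta_hat, res.x, sig2_hat

# Example use with the simulated data above:
# beta_hat, lam_hat, sig2_hat = qml_fit(y, x, d)
```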

To obtain our results, the following conditions are sufficient (see Maller [46]).

(A1) $X_n = \sum_{t=2}^{n}x_tx_t^{T}$ is positive definite for sufficiently large $n$, and
$$\lim_{n\to\infty}\max_{1\le t\le n}x_t^{T}X_n^{-1}x_t = 0.$$
(2.13)
(A2)
$$\limsup_{n\to\infty}|\tilde\lambda|_{\max}\bigl(X_n^{-1/2}Z_nX_n^{-T/2}\bigr) < 1,$$
(2.14)

where $Z_n = \frac12\sum_{t=2}^{n}(x_tx_{t-1}^{T} + x_{t-1}x_t^{T})$ and $|\tilde\lambda|_{\max}(\cdot)$ denotes the maximum in absolute value of the eigenvalues of a symmetric matrix.

For ease of exposition, we shall introduce the following notation, which will be used later in the paper.

Let the $(m+1)$-vector $\theta = (\beta^{T}, \lambda)^{T}$. Define
$$S_n(\theta) = \sigma^{2}\frac{\partial\Psi_n}{\partial\theta} = \sigma^{2}\Bigl(\frac{\partial\Psi_n}{\partial\beta},\ \frac{\partial\Psi_n}{\partial\lambda}\Bigr),\qquad F_n(\theta) = -\sigma^{2}\frac{\partial^{2}\Psi_n}{\partial\theta\,\partial\theta^{T}}.$$
(2.15)
By (2.7) and (2.8), we get the components of F n ( θ )
$$-\sigma^{2}\frac{\partial^{2}\Psi_n}{\partial\beta\,\partial\beta^{T}} = \frac{2\lambda}{1-\exp(-2\lambda d)}\sum_{t=2}^{n}\bigl(x_t-\exp(-\lambda d)x_{t-1}\bigr)\bigl(x_t-\exp(-\lambda d)x_{t-1}\bigr)^{T} = \frac{2\lambda}{1-\exp(-2\lambda d)}X_n(\lambda),$$
(2.16)
$$-\sigma^{2}\frac{\partial^{2}\Psi_n}{\partial\beta\,\partial\lambda} = -\frac{2d\lambda\exp(-\lambda d)}{1-\exp(-2\lambda d)}\sum_{t=2}^{n}\bigl(\varepsilon_{t-1}x_t+\varepsilon_tx_{t-1}-2\exp(-\lambda d)x_{t-1}\varepsilon_{t-1}\bigr) - \frac{2\bigl[1-(1+2d\lambda)\exp(-2\lambda d)\bigr]}{(1-\exp(-2\lambda d))^{2}}\sum_{t=2}^{n}\bigl(\varepsilon_t-\exp(-\lambda d)\varepsilon_{t-1}\bigr)\bigl(x_t-\exp(-\lambda d)x_{t-1}\bigr)$$
(2.17)
and
σ 2 2 Ψ n λ 2 = σ 2 ( n 1 ) 2 λ 2 2 σ 2 ( n 1 ) d 2 exp ( 2 λ d ) ( 1 exp ( 2 λ d ) ) 2 + 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) t = 2 n ε t 1 2 + 2 d ( 1 d λ ( 1 + d λ ) exp ( 2 λ d ) ) exp ( λ d ) ( 1 exp ( 2 λ d ) ) 2 t = 2 n ( ε t exp ( λ d ) ε t 1 ) ε t 1 + 2 d exp ( λ d ) [ 1 ( 1 + 2 d λ ) exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 2 t = 2 n ( ε t exp ( λ d ) ε t 1 ) ε t 1 + 4 d exp ( 2 λ d ) [ d λ 1 + ( 1 + d λ ) exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 3 t = 2 n ( ε t exp ( λ d ) ε t 1 ) 2 = σ 2 ( n 1 ) 2 λ 2 2 σ 2 ( n 1 ) d 2 exp ( 2 λ d ) ( 1 exp ( 2 λ d ) ) 2 + 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) t = 2 n ε t 1 2 + 2 d exp ( λ d ) [ ( 2 d λ ) d λ exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 2 t = 2 n ( ε t exp ( λ d ) ε t 1 ) ε t 1 + 4 d exp ( 2 λ d ) [ d λ 1 + ( 1 + d λ ) exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 3 t = 2 n ( ε t exp ( λ d ) ε t 1 ) 2 .
(2.18)
Hence we have
$$F_n(\theta) = \begin{pmatrix}\dfrac{2\lambda}{1-\exp(-2\lambda d)}X_n(\lambda) & -\sigma^{2}\dfrac{\partial^{2}\Psi_n}{\partial\beta\,\partial\lambda}\\ \ast & -\sigma^{2}\dfrac{\partial^{2}\Psi_n}{\partial\lambda^{2}}\end{pmatrix},$$
(2.19)
where the $\ast$ indicates that the elements are filled in by symmetry. By (2.18), we have
$$E\Bigl\{-\sigma^{2}\frac{\partial^{2}\Psi_n}{\partial\lambda^{2}}\Big|_{\theta=\theta_0}\Bigr\} = (n-1)\sigma_0^{2}\Bigl\{\frac{1}{2\lambda_0^{2}} - \frac{2d\exp(-2\lambda_0 d)\bigl[1-(1+d\lambda_0)\exp(-2\lambda_0 d)\bigr]}{\lambda_0(1-\exp(-2\lambda_0 d))^{2}}\Bigr\} + \frac{2d^{2}\lambda_0\exp(-2\lambda_0 d)}{1-\exp(-2\lambda_0 d)}\sum_{t=2}^{n}Ee_{t-1}^{2} = (n-1)\sigma_0^{2}\,\frac{\bigl[1-(1+2d\lambda_0)\exp(-2\lambda_0 d)\bigr]^{2}}{2\lambda_0^{2}(1-\exp(-2\lambda_0 d))^{2}} + \frac{2d^{2}\lambda_0\exp(-2\lambda_0 d)}{1-\exp(-2\lambda_0 d)}\sum_{t=2}^{n}Ee_{t-1}^{2} =: \Delta_n(\theta_0,\sigma_0) = O(n).$$
(2.20)
Thus,
$$D_n = E\bigl(F_n(\theta_0)\bigr) = \begin{pmatrix}\dfrac{2\lambda_0}{1-\exp(-2\lambda_0 d)}X_n(\lambda_0) & 0\\ 0 & \Delta_n(\theta_0,\sigma_0)\end{pmatrix}.$$
(2.21)

3 Large sample properties of the estimators

Theorem 3.1 Suppose that conditions (A1)-(A2) hold. Then there is a sequence $A_n\to0$ such that, for each $A>0$, as $n\to\infty$, the probability
$$P\bigl\{\text{there are estimators }\hat\theta_n,\ \hat\sigma_n^{2}\text{ with }S_n(\hat\theta_n)=0\text{, and }(\hat\theta_n,\hat\sigma_n^{2})\in N_n'(A)\bigr\}\to 1.$$
(3.1)
Furthermore,
$$(\hat\theta_n,\hat\sigma_n^{2})\xrightarrow{p}(\theta_0,\sigma_0^{2}),\qquad n\to\infty,$$
(3.2)
where, for each $n = 1, 2, \ldots$, $A > 0$ and $A_n\in(0,\sigma_0^{2})$, we define the neighborhoods
$$N_n(A) = \bigl\{\theta\in\mathbb{R}^{m+1}: (\theta-\theta_0)^{T}D_n(\theta-\theta_0)\le A^{2}\bigr\}$$
(3.3)
and
$$N_n'(A) = N_n(A)\cap\bigl\{\sigma^{2}\in[\sigma_0^{2}-A_n,\ \sigma_0^{2}+A_n]\bigr\}.$$
(3.4)
Theorem 3.2 Suppose that conditions (A1)-(A2) hold. Then
$$\frac{1}{\hat\sigma_n}F_n^{T/2}(\hat\theta_n)(\hat\theta_n-\theta_0)\xrightarrow{D}N(0,I_{m+1}),\qquad n\to\infty.$$
(3.5)

In the following, we investigate some special cases of models (1.1) and (1.5). From Theorems 3.1 and 3.2, we obtain the following results; their proofs are omitted.

Corollary 3.1 If β = 0 , then
$$\frac{\sqrt{\Delta_n(\theta_0,\sigma_0)}}{\hat\sigma_n}(\hat\lambda_n-\lambda_0)\xrightarrow{D}N(0,1),\qquad n\to\infty.$$
(3.6)
Corollary 3.2 If β = 0 , then
$$\sqrt{n}\,(\hat\lambda_n-\lambda_0)\xrightarrow{D}N(0,\sigma_0^{2}),\qquad n\to\infty.$$
(3.7)
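As a practical aside (not part of the paper's development), Theorem 3.2 suggests Wald-type confidence intervals based on the observed information. The sketch below approximates the Hessian of (2.5) in $\theta = (\beta, \lambda)$ by central differences at the QML estimates from the earlier sketch; the step size `h` and the 95% level are arbitrary assumptions.

```python
import numpy as np

def num_hessian(f, theta, h=1e-5):
    """Central-difference Hessian of a scalar function f at theta (illustrative helper)."""
    p = len(theta)
    H = np.zeros((p, p))
    I = np.eye(p)
    for i in range(p):
        for j in range(p):
            H[i, j] = (f(theta + h * I[i] + h * I[j]) - f(theta + h * I[i] - h * I[j])
                       - f(theta - h * I[i] + h * I[j]) + f(theta - h * I[i] - h * I[j])) / (4.0 * h * h)
    return H

# theta_hat = np.append(beta_hat, lam_hat)      # estimates from the qml_fit sketch
# psi = lambda t: cond_loglik(t[:-1], t[-1], sig2_hat, y, x, d)
# H = num_hessian(psi, theta_hat)               # H approximates -F_n(theta_hat)/sigma_hat^2
# cov = np.linalg.inv(-H)                       # approximates sigma_hat^2 * F_n(theta_hat)^{-1}
# se = np.sqrt(np.diag(cov))
# wald_ci = np.column_stack([theta_hat - 1.96 * se, theta_hat + 1.96 * se])
```

By Theorem 3.2, $\hat\sigma_n^{2}F_n^{-1}(\hat\theta_n)$ plays the role of the asymptotic covariance of $\hat\theta_n$, which is what `np.linalg.inv(-H)` estimates here.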

4 Hypothesis testing

In order to fit a data set $\{y_t, t = 1, 2, \ldots, n\}$, we may use model (1.3) or an Ornstein-Uhlenbeck process with a constant mean level,
$$dy_t = \lambda(\mu - y_t)\,dt + \sigma\,dB_t.$$
(4.1)

If $\beta\ne0$, then we use model (1.3), namely models (1.1) and (1.2). If $\beta=0$, then we use model (4.1). How do we decide whether $\beta=0$ or $\beta\ne0$? In this section, we consider this hypothesis-testing question and obtain limiting distributions for likelihood ratio (LR) test statistics (see Fan and Jiang [53]).

Under the null hypothesis
$$H_0:\ \beta_0 = 0,\quad \lambda_0 > 0,\quad \sigma_0 > 0,$$
(4.2)
let $\hat\beta_{0n}$, $\hat\lambda_{0n}$, $\hat\sigma_{0n}^{2}$ be the corresponding ML estimators of β, λ, $\sigma^{2}$. Also let
$$\hat L_n = -2\Psi_n(\hat\beta_n,\hat\lambda_n,\hat\sigma_n^{2})$$
(4.3)
and
$$\hat L_{0n} = -2\Psi_n(\hat\beta_{0n},\hat\lambda_{0n},\hat\sigma_{0n}^{2}).$$
(4.4)
By (2.9) and (2.5), we have that
$$\hat L_n = (n-1)\log\Bigl(\frac{\pi\hat\sigma_n^{2}}{\hat\lambda_n}\Bigr) + (n-1)\log\bigl(1-\exp(-2\hat\lambda_n d)\bigr) + \frac{2\hat\lambda_n}{\hat\sigma_n^{2}(1-\exp(-2\hat\lambda_n d))}\sum_{t=2}^{n}\bigl(\hat\varepsilon_t-\exp(-\hat\lambda_n d)\hat\varepsilon_{t-1}\bigr)^{2} = (n-1)\log\Bigl(\frac{\pi\hat\sigma_n^{2}}{\hat\lambda_n}\Bigr) + (n-1)\log\bigl(1-\exp(-2\hat\lambda_n d)\bigr) + (n-1) = (n-1)(\log\pi+1) + (n-1)\log(\hat\sigma_n^{2}) + (n-1)\bigl(\log(1-\exp(-2\hat\lambda_n d)) - \log(\hat\lambda_n)\bigr).$$
(4.5)
And similarly,
$$\hat L_{0n} = (n-1)(\log\pi+1) + (n-1)\log(\hat\sigma_{0n}^{2}) + (n-1)\bigl(\log(1-\exp(-2\hat\lambda_{0n} d)) - \log(\hat\lambda_{0n})\bigr).$$
(4.6)
By (4.5) and (4.6), we have
$$\tilde d(n) = \hat L_{0n} - \hat L_n = (n-1)\log\Bigl(\frac{\hat\sigma_{0n}^{2}}{\hat\sigma_n^{2}}\Bigr) + (n-1)\Bigl(\log\frac{1-\exp(-2\hat\lambda_{0n}d)}{1-\exp(-2\hat\lambda_n d)} - \log\frac{\hat\lambda_{0n}}{\hat\lambda_n}\Bigr) = (n-1)\Bigl(\frac{\hat\sigma_{0n}^{2}}{\hat\sigma_n^{2}}-1\Bigr) + (n-1)\Bigl(\frac{1-\exp(-2\hat\lambda_{0n}d)}{1-\exp(-2\hat\lambda_n d)} - \frac{\hat\lambda_{0n}}{\hat\lambda_n}\Bigr) + o_p(1) = (n-1)\frac{\hat\sigma_{0n}^{2}-\hat\sigma_n^{2}}{\sigma_0^{2}} + (n-1)\Bigl(\frac{1-\exp(-2\hat\lambda_{0n}d)}{1-\exp(-2\hat\lambda_n d)} - \frac{\hat\lambda_{0n}}{\hat\lambda_n}\Bigr) + o_p(1) = (n-1)\frac{\hat\sigma_{0n}^{2}-\hat\sigma_n^{2}}{\sigma_0^{2}} + o_p(1).$$
(4.7)

Large values of $\tilde d(n)$ suggest rejection of the null hypothesis.

Theorem 4.1 Suppose that conditions (A1)-(A2) hold. If H 0 holds, then
$$\tilde d(n)\xrightarrow{D}\chi^{2}(m),\qquad n\to\infty.$$
(4.8)
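To illustrate Theorem 4.1 numerically (again a sketch under the same assumptions as the earlier code), one can fit the full model and the null model with β = 0, form $\tilde d(n) = \hat L_{0n} - \hat L_n$ from (4.5)-(4.6), and compare it with a $\chi^{2}(m)$ critical value.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def neg2_maxloglik(lam, sig2, n, d):
    """-2 x maximized log-likelihood in the profiled form of (4.5)/(4.6)."""
    return ((n - 1) * (np.log(np.pi) + 1.0) + (n - 1) * np.log(sig2)
            + (n - 1) * (np.log(1.0 - np.exp(-2.0 * lam * d)) - np.log(lam)))

def null_fit(y, d, lam_bounds=(1e-4, 50.0)):
    """QML fit under H0: beta = 0, i.e. a pure Ornstein-Uhlenbeck time series."""
    n = len(y)
    def profile(lam):
        a = np.exp(-lam * d)
        resid = y[1:] - a * y[:-1]
        sig2 = 2.0 * lam / ((n - 1) * (1.0 - np.exp(-2.0 * lam * d))) * np.sum(resid ** 2)
        return (n - 1) * (np.log(sig2) + np.log(1.0 - np.exp(-2.0 * lam * d)) - np.log(lam)), sig2
    res = minimize_scalar(lambda lam: profile(lam)[0], bounds=lam_bounds, method="bounded")
    return res.x, profile(res.x)[1]

# beta_hat, lam_hat, sig2_hat = qml_fit(y, x, d)   # full-model fit from the earlier sketch
# lam0_hat, sig2_0hat = null_fit(y, d)
# n, m = len(y), x.shape[1]
# d_tilde = neg2_maxloglik(lam0_hat, sig2_0hat, n, d) - neg2_maxloglik(lam_hat, sig2_hat, n, d)
# reject = d_tilde > chi2.ppf(0.95, df=m)          # Theorem 4.1: chi-square with m degrees of freedom
```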

5 Some lemmas

Throughout this paper, let C denote a generic positive constant which may take different values at each occurrence. To prove our main results, we first introduce the following lemmas.

Lemma 5.1 If condition (A1) holds, then for any $\lambda\in\mathbb{R}_+$ the matrix $X_n(\lambda)$ is positive definite for large enough $n$, and
$$\lim_{n\to\infty}\max_{1\le t\le n}x_t^{T}X_n^{-1}(\lambda)x_t = 0.$$
Proof Let $\tilde\lambda_1$ and $\tilde\lambda_m$ be the smallest and largest roots of $|Z_n - \tilde\lambda X_n| = 0$. Then from Ex. 22.1 of Rao [54],
$$\tilde\lambda_1\le\frac{u^{T}Z_nu}{u^{T}X_nu}\le\tilde\lambda_m$$
for unit vectors $u$. Thus by condition (A2) there are some $\delta\in(0,1)$ and $n_0(\delta)$ such that $n\ge n_0$ implies
$$|u^{T}Z_nu|\le(1-\delta)u^{T}X_nu.$$
(5.1)
By (2.16) and (5.1), we have
u T X n ( λ ) u = t = 2 n ( u T ( x t exp ( λ d ) x t 1 ) ) 2 t = 2 n ( u T x t ) 2 + min λ exp ( 2 λ d ) t = 2 n ( u T x t 1 ) 2 max λ exp ( λ d ) u T Z n u u T X n u + min λ exp ( 2 λ d ) u T X n u u T Z n u ( 1 + min λ exp ( 2 λ d ) ( 1 δ ) ) u T X n u = ( min λ exp ( 2 λ d ) + δ ) u T X n u = C ( λ , δ ) u T X n u .
(5.2)
By Rao [[54], p.60] and condition (A1), we have
$$\frac{(u^{T}x_t)^{2}}{u^{T}X_nu}\to 0.$$
(5.3)
From (5.3) and $C(\lambda,\delta)>0$,
$$x_t^{T}X_n^{-1}(\lambda)x_t = \sup_u\Bigl(\frac{(u^{T}x_t)^{2}}{u^{T}X_n(\lambda)u}\Bigr)\le\sup_u\Bigl(\frac{(u^{T}x_t)^{2}}{C(\lambda,\delta)\,u^{T}X_nu}\Bigr)\to 0.$$
(5.4)

 □

Lemma 5.2 The matrix $D_n$ is positive definite for large enough $n$, $E(S_n(\theta_0)) = 0$ and $\operatorname{Var}(S_n(\theta_0)) = \sigma_0^{2}D_n$.

Proof Note that X n ( λ 0 ) is positive definite and Δ n ( θ 0 , σ 0 ) > 0 . It is easy to show that the matrix D n is positive definite for large enough n. By (2.8), we have
$$\sigma_0^{2}E\Bigl(\frac{\partial\Psi_n}{\partial\beta}\Big|_{\theta=\theta_0}\Bigr) = \frac{2\lambda_0}{1-\exp(-2\lambda_0 d)}\sum_{t=2}^{n}E\bigl(e_t-\exp(-\lambda_0 d)e_{t-1}\bigr)\bigl(x_t-\exp(-\lambda_0 d)x_{t-1}\bigr) = \frac{2\lambda_0}{1-\exp(-2\lambda_0 d)}\,\sigma_0\sqrt{\frac{1-\exp(-2d\lambda_0)}{2\lambda_0}}\sum_{t=2}^{n}\bigl(x_t-\exp(-\lambda_0 d)x_{t-1}\bigr)E\eta_t = 0.$$
(5.5)
Note that $e_{t-1}$ and $\eta_t$ are independent, so we have $E(\eta_te_{t-1}) = 0$. Thus, by (2.7) and $E\eta_t = 0$, we have
$$E\Bigl(\frac{\partial\Psi_n}{\partial\lambda}\Big|_{\theta=\theta_0}\Bigr) = \frac{n-1}{2\lambda_0} - \frac{(n-1)d\exp(-2\lambda_0 d)}{1-\exp(-2\lambda_0 d)} - 0 - \frac{1-(1+2d\lambda_0)\exp(-2\lambda_0 d)}{\sigma_0^{2}(1-\exp(-2\lambda_0 d))^{2}}\,\sigma_0^{2}\,\frac{1-\exp(-2d\lambda_0)}{2\lambda_0}\sum_{t=2}^{n}E\eta_t^{2} = \frac{n-1}{2\lambda_0} - \frac{(n-1)d\exp(-2\lambda_0 d)}{1-\exp(-2\lambda_0 d)} - \frac{1-(1+2d\lambda_0)\exp(-2\lambda_0 d)}{2\lambda_0(1-\exp(-2\lambda_0 d))}(n-1) = 0.$$
(5.6)
Hence, from (5.5) and (5.6),
$$E\bigl(S_n(\theta_0)\bigr) = \sigma_0^{2}E\Bigl(\frac{\partial\Psi_n}{\partial\beta}\Big|_{\theta=\theta_0},\ \frac{\partial\Psi_n}{\partial\lambda}\Big|_{\theta=\theta_0}\Bigr) = 0.$$
(5.7)
By (2.8) and (2.20), we have
$$\operatorname{Var}\Bigl(\sigma_0^{2}\frac{\partial\Psi_n}{\partial\beta}\Big|_{\theta=\theta_0}\Bigr) = \operatorname{Var}\Bigl\{\frac{2\lambda_0}{1-\exp(-2\lambda_0 d)}\sum_{t=2}^{n}\bigl(e_t-\exp(-\lambda_0 d)e_{t-1}\bigr)\bigl(x_t-\exp(-\lambda_0 d)x_{t-1}\bigr)\Bigr\} = \frac{2\sigma_0^{2}\lambda_0}{1-\exp(-2\lambda_0 d)}\operatorname{Var}\Bigl\{\sum_{t=2}^{n}\bigl(x_t-\exp(-\lambda_0 d)x_{t-1}\bigr)\eta_t\Bigr\} = \frac{2\sigma_0^{2}\lambda_0}{1-\exp(-2\lambda_0 d)}X_n(\lambda_0).$$
(5.8)
Note that $\{\eta_te_{t-1},\ \mathcal{H}_t\}$ is a martingale difference sequence with
$$\operatorname{Var}(\eta_te_{t-1}) = E\eta_t^{2}\,Ee_{t-1}^{2} = Ee_{t-1}^{2},$$
so
Var ( σ 0 2 Ψ n λ | θ = θ 0 ) = E { σ 0 d exp ( λ 0 d ) 2 λ 0 1 exp ( λ 0 d ) t = 2 n η t e t 1 } 2 + E { σ 0 2 [ 1 ( 1 + 2 d λ 0 ) exp ( 2 d λ 0 ) ] 2 λ 0 ( 1 exp ( 2 λ 0 d ) ) t = 2 n ( η t 2 1 ) } 2 + 2 σ 0 3 d exp ( λ 0 d ) [ 1 ( 1 + 2 d λ 0 ) exp ( 2 d λ 0 ) ] λ 0 ( 1 exp ( 2 λ 0 d ) ) 3 2 E { t = 2 n η t e t 1 t = 2 n ( η t 2 1 ) } = 2 λ 0 σ 0 2 d 2 exp ( 2 λ 0 d ) 1 exp ( λ 0 d ) t = 2 n E e t 1 2 + { σ 0 2 [ 1 ( 1 + 2 d λ 0 ) exp ( 2 d λ 0 ) ] 2 λ 0 ( 1 exp ( 2 λ 0 d ) ) } 2 ( n 1 ) ( E η t 4 1 ) + 2 σ 0 3 d exp ( λ 0 d ) [ 1 ( 1 + 2 d λ 0 ) exp ( 2 d λ 0 ) ] λ 0 ( 1 exp ( 2 λ 0 d ) ) 3 2 ( t = 2 n E ( ( η t 2 1 ) η t e t 1 ) + t k E ( η t e t 1 ( η k 2 1 ) ) ) = 2 λ 0 σ 0 2 d 2 exp ( 2 λ 0 d ) 1 exp ( λ 0 d ) t = 2 n E e t 1 2 + 2 ( n 1 ) { σ 0 2 [ 1 ( 1 + 2 d λ 0 ) exp ( 2 d λ 0 ) ] 2 λ 0 ( 1 exp ( 2 λ 0 d ) ) } 2 = σ 0 2 Δ n ( θ 0 , σ 0 ) .
(5.9)
By (2.7), (2.8), and noting that e t 1 and η t are independent, we have
Cov ( σ 0 2 Ψ n β , σ 0 2 Ψ n λ ) | θ = θ 0 = σ 0 3 1 ( 1 + 2 d λ ) exp ( 2 λ d ) 2 λ 0 ( 1 exp ( 2 λ d ) ) 3 2 E ( t = 2 n η t 2 t = 2 n η t ( x t exp ( λ d ) x t 1 ) ) = σ 0 3 1 ( 1 + 2 d λ ) exp ( 2 λ d ) 2 λ 0 ( 1 exp ( 2 λ d ) ) 3 2 E η t 3 t = 2 n ( x t exp ( λ d ) x t 1 ) = 0 .
(5.10)

From (5.8)-(5.10), it follows that $\operatorname{Var}(S_n(\theta_0)) = \sigma_0^{2}D_n$. The proof is completed. □

Lemma 5.3 (Maller [55])

Let $W_n$ be a symmetric random matrix with eigenvalues $\tilde\lambda_j(n)$, $1\le j\le d$. Then
$$W_n\xrightarrow{p}I \iff \tilde\lambda_j(n)\xrightarrow{p}1,\qquad n\to\infty.$$
Lemma 5.4 For each A > 0 ,
$$\sup_{\theta\in N_n(A)}\bigl\|D_n^{-1/2}F_n(\theta)D_n^{-T/2}-\Phi_n\bigr\|\xrightarrow{p}0,\qquad n\to\infty$$
(5.11)
and also
$$\Phi_n\xrightarrow{D}\Phi,$$
(5.12)
$$\lim_{c\downarrow0}\limsup_{A\to\infty}\limsup_{n\to\infty}P\Bigl\{\inf_{\theta\in N_n(A)}\lambda_{\min}\bigl(D_n^{-1/2}F_n(\theta)D_n^{-T/2}\bigr)\le c\Bigr\}=0,$$
(5.13)
where
$$\Phi_n = \begin{pmatrix}\dfrac{\lambda(1-\exp(-2d\lambda_0))}{\lambda_0(1-\exp(-2d\lambda))}I_m & 0\\ 0 & \Delta_n(\theta_0,\sigma_0)^{-1}\Bigl(-\sigma^{2}\dfrac{\partial^{2}\Psi_n}{\partial\lambda^{2}}\Big|_{\theta=\theta_0}\Bigr)\end{pmatrix},\qquad \Phi = I_{m+1}.$$
(5.14)
Proof Let X n ( λ 0 ) = X n 1 2 ( λ 0 ) X n T 2 ( λ 0 ) be a square root decomposition of X n ( λ 0 ) . Then
D n = ( 2 λ 0 1 exp ( 2 d λ 0 ) X n 1 2 ( λ 0 ) 0 0 Δ n ( θ 0 , σ 0 ) ) ( 2 λ 0 1 exp ( 2 d λ 0 ) X n T 2 ( λ 0 ) 0 0 Δ n ( θ 0 , σ 0 ) ) = D n 1 2 D n T 2 .
(5.15)
Let θ N n ( A ) . Then
( θ θ 0 ) T D n ( θ θ 0 ) = 2 λ 0 1 exp ( 2 d λ 0 ) ( β β 0 ) T X n ( λ 0 ) ( β β 0 ) + ( λ λ 0 ) 2 Δ n ( θ 0 , σ 0 ) A 2 .
(5.16)
From (2.20), (2.21) and (5.14),
D n 1 2 F n ( θ ) D n T 2 Φ n = ( W 11 W 12 W 22 ) ,
(5.17)
where
W 11 = λ ( 1 exp ( 2 d λ 0 ) ) λ 0 ( 1 exp ( 2 d λ ) ) { X n 1 2 ( λ 0 ) X n ( λ ) X n T 2 ( λ 0 ) I m } ,
(5.18)
W 12 = 1 exp ( 2 d λ 0 ) 2 λ 0 X n 1 2 ( λ 0 ) ( σ 2 2 Ψ n β λ ) Δ n ( θ 0 , σ 0 )
(5.19)
and
W 22 = σ 2 2 Ψ n λ 2 σ 2 2 Ψ n λ 2 | θ = θ 0 Δ n ( θ 0 , σ 0 ) .
(5.20)
Let
N n β ( A ) = { β : 2 λ 0 1 exp ( 2 d λ 0 ) | ( β β 0 ) T X n 1 2 ( λ 0 ) | 2 A 2 }
(5.21)
and
N n λ ( A ) = { θ : | λ λ 0 | A Δ n ( θ 0 , σ 0 ) } .
(5.22)
As the first step, we will show that, for each A > 0 ,
sup θ N n θ ( A ) W 11 0 , n .
(5.23)
In fact, note that
W 11 = λ ( 1 exp ( 2 d λ 0 ) ) λ 0 ( 1 exp ( 2 d λ ) ) X n 1 2 ( λ 0 ) ( X n ( λ ) X n ( λ 0 ) ) X n T 2 ( λ 0 ) = λ ( 1 exp ( 2 d λ 0 ) ) λ 0 ( 1 exp ( 2 d λ ) ) X n 1 2 ( λ 0 ) ( T 1 + T 2 T 3 ) X n T 2 ( λ 0 ) ,
(5.24)
where
T 1 = t = 2 n ( exp ( d λ 0 ) exp ( d λ ) ) x t 1 ( x t exp ( d λ 0 ) x t 1 ) T , T 2 = t = 2 n ( exp ( d λ 0 ) exp ( d λ ) ) ( x t exp ( d λ 0 ) x t 1 ) x t T
and
T 3 = t = 2 n ( exp ( d λ ) exp ( d λ 0 ) ) 2 x t 1 x t 1 T .
Let u , v R d , | u | = | v | = 1 , and let u n T = u T X n 1 2 ( λ 0 ) , v n T = X n T 2 ( λ 0 ) v . By the Cauchy-Schwarz inequality, Lemma 5.1 and noting N n λ ( A ) , we have
| u n T T 1 v n | = | ( exp ( d λ 0 ) exp ( d λ ) ) t = 2 n u n T x t 1 ( x t exp ( d λ 0 ) x t 1 ) T v n | max | exp ( d λ 0 ) exp ( d λ ) | ( t = 2 n u n T x t x t T u n ) 1 2 ( t = 2 n v n T ( x t exp ( d λ 0 ) x t 1 ) ( x t exp ( d λ 0 ) x t 1 ) T v n ) 1 2 d | λ 0 λ | n max 1 t n ( x t T X n 1 ( λ 0 ) x t ) 1 C n Δ n ( θ 0 , σ 0 ) o ( 1 ) 0 .
(5.25)
Similar to the proof of T 1 , we easily obtain
| u n T T 2 v n | 0 .
(5.26)
By the Cauchy-Schwarz inequality, Lemma 5.1 and noting N n λ ( A ) , we have
| u n T T 3 v n | = | u n T t = 2 n ( exp ( d λ 0 ) exp ( d λ ) ) 2 x t 1 x t 1 T v n | max | exp ( d λ 0 ) exp ( d λ ) | 2 ( t = 2 n u n T x t x t T u n t = 2 n v n T x t x t T v n ) 1 2 n | λ 0 λ | 2 max 1 t n ( x t T X n 1 ( λ 0 ) x t ) n A 2 Δ n ( θ 0 , σ 0 ) o ( 1 ) 0 .
(5.27)

Hence, (5.23) follows from (5.24)-(5.27).

For the second step, we will show that
W 12 p 0 .
(5.28)
Note that
ε t = y t x t T β = x t T ( β 0 β ) + e t
(5.29)
and
ε t exp ( d λ 0 ) ε t 1 = ( x t exp ( d λ 0 ) x t 1 ) T ( β 0 β ) + σ 0 1 exp ( 2 d λ 0 ) 2 λ 0 η t .
(5.30)
Write
J = 1 exp ( 2 d λ 0 ) 2 λ 0 X n 1 2 ( λ 0 ) ( σ 2 2 Ψ n β λ ) = 1 exp ( 2 d λ 0 ) 2 λ 0 X n 1 2 ( λ 0 ) 2 d λ exp ( λ d ) 1 exp ( 2 λ d ) t = 2 n ( ε t 1 x t + ε t x t 1 2 exp ( λ d ) x t 1 ε t 1 ) 1 exp ( 2 d λ 0 ) 2 λ 0 X n 1 2 ( λ 0 ) 1 ( 1 + 2 d λ ) exp ( 2 λ d ) ( 1 exp ( 2 λ d ) ) 2 t = 2 n ( ε t exp ( λ d ) ε t 1 ) ( x t exp ( λ d ) x t 1 ) = 1 exp ( 2 d λ 0 ) 2 λ 0 2 d λ exp ( λ d ) 1 exp ( 2 λ d ) X n 1 2 ( λ 0 ) ( T 1 + T 2 + 2 T 3 + 2 T 4 + 2 T 5 ) 1 exp ( 2 d λ 0 ) 2 λ 0 1 ( 1 + 2 d λ ) exp ( 2 λ d ) ( 1 exp ( 2 λ d ) ) 2 X n 1 2 ( λ 0 ) T 6 ,
(5.31)
where
T 1 = t = 2 n x t 1 T ( β 0 β ) ( x t exp ( λ 0 d ) x t 1 ) , T 2 = t = 2 n ( x t exp ( λ 0 d ) x t 1 ) T ( β 0 β ) x t 1 , T 3 = t = 2 n ( exp ( λ 0 d ) exp ( λ d ) ) x t 1 T ( β 0 β ) x t 1 , T 4 = σ 1 exp ( 2 λ d ) 2 λ t = 2 n η t x t 1 , T 5 = t = 2 n e t 1 x t 1 , T 6 = σ 1 exp ( 2 λ d ) 2 λ t = 2 n η t ( x t exp ( λ d ) x t 1 ) .
For β N n β ( A ) and each A > 0 , we have
| ( β 0 β ) T x t | 2 = ( β 0 β ) T X n 1 2 ( λ 0 ) X n 1 2 ( λ 0 ) x t x t T X n T 2 ( λ 0 ) X n T 2 ( λ 0 ) ( β 0 β ) max 1 t n ( x t T X n 1 ( λ 0 ) x t ) ( β 0 β ) T X n ( λ 0 ) ( β 0 β ) A 2 max 1 t n ( x t T X n 1 ( λ 0 ) x t ) .
(5.32)
By (5.32) and Lemma 5.1, we have
sup β N n β ( A ) max 1 t n | ( β 0 β ) T x t | 0 , n , A > 0 .
(5.33)
Using the Cauchy-Schwarz inequality and (5.33), we obtain
u n T T 1 = t = 2 n u n T x t 1 T ( β 0 β ) ( x t exp ( λ 0 d ) x t 1 ) { t = 2 n ( x t 1 T ( β 0 β ) ) 2 } 1 2 { t = 2 n u n T ( x t exp ( λ 0 d ) x t 1 ) ( x t exp ( λ 0 d ) x t 1 ) T u n } 1 2 n max 1 t n | ( β 0 β ) T x t | = o ( n ) .
(5.34)
Using a similar argument as T 1 , we obtain that
u n T T 2 = o p ( n ) .
(5.35)
By the Cauchy-Schwarz inequality and (5.33), (5.25), we get
u n T T 3 = t = 2 n ( exp ( λ 0 d ) exp ( λ d ) ) x t 1 T ( β 0 β ) u n T x t 1 { t = 2 n ( exp ( λ 0 d ) exp ( λ d ) ) 2 ( x t 1 T ( β 0 β ) ) 2 t = 2 n ( u n T x t 1 ) 2 } 1 2 C | λ 0 λ | { t = 2 n ( x t 1 T ( β 0 β ) ) 2 t = 2 n ( u n T x t 1 ) 2 } 1 2 C A Δ n ( θ 0 , σ 0 ) n o ( 1 ) o ( n ) = o ( n ) .
(5.36)
By (5.25), we have
Var ( u n T T 4 ) = σ 2 1 exp ( 2 λ d ) 2 λ t = 2 n ( u n T x t 1 ) 2 = o ( n ) .
(5.37)
Thus, by the Chebychev inequality and (5.37),
u n T T 4 = o p ( n ) .
(5.38)
By Lemma 5.1 and (2.3), we have
Var ( u n T T 5 ) = Var ( t = 2 n u n T x t e t 1 ) = σ 0 2 1 exp ( 2 λ 0 d ) 2 λ 0 Var { j = 1 n 1 ( t = j + 1 n u n T x t exp { λ 0 d ( t 1 j ) } ) η j } = σ 0 2 1 exp ( 2 λ 0 d ) 2 λ 0 j = 1 n 1 ( t = j + 1 n u n T x t exp { λ 0 d ( t 1 j ) } ) 2 σ 0 2 1 exp ( 2 λ 0 d ) 2 λ 0 max 2 t n | u n T x t | j = 1 n 1 ( t = j + 1 n exp { λ 0 d ( t 1 j ) } ) 2 C max 2 t n | u n T x t | n = o ( n ) .
(5.39)
Thus, by the Chebychev inequality and (5.39),
u n T T 5 = o p ( n ) .
(5.40)
Using a similar argument as T 4 , we obtain
u n T T 6 = o p ( n ) .
(5.41)

Thus (5.28) follows immediately from (5.31), (5.34)-(5.36), (5.38), (5.40) and (5.41).

For the third step, we will show that
W 22 p 0 .
(5.42)
Write that
J = σ 2 2 Ψ n λ 2 σ 2 2 Ψ n λ 2 | θ = θ 0 = σ 2 ( n 1 ) 2 λ 2 2 σ 2 ( n 1 ) d 2 exp ( 2 λ d ) ( 1 exp ( 2 λ d ) ) 2 + 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) t = 2 n ε t 1 2 + 2 d exp ( λ d ) [ ( 2 d λ ) d λ exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 2 t = 2 n ( ε t exp ( λ d ) ε t 1 ) ε t 1 + 4 d exp ( 2 λ d ) [ d λ 1 + ( 1 + d λ ) exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 3 t = 2 n ( ε t exp ( λ d ) ε t 1 ) 2 σ 0 2 ( n 1 ) 2 λ 0 2 + 2 σ 0 2 ( n 1 ) d 2 exp ( 2 λ 0 d ) ( 1 exp ( 2 λ 0 d ) ) 2 2 d 2 λ 0 exp ( 2 λ 0 d ) 1 exp ( 2 λ 0 d ) t = 2 n e t 1 2 2 d exp ( λ 0 d ) [ ( 2 d λ 0 ) d λ 0 exp ( 2 λ 0 d ) ] ( 1 exp ( 2 λ 0 d ) ) 2 t = 2 n ( e t exp ( λ 0 d ) e t 1 ) e t 1 4 d exp ( 2 λ 0 d ) [ d λ 0 1 + ( 1 + d λ 0 ) exp ( 2 λ 0 d ) ] ( 1 exp ( 2 λ 0 d ) ) 3 t = 2 n ( e t exp ( λ 0 d ) e t 1 ) 2 .
(5.43)
By (3.3) and (3.4), we obtain that
T 1 = σ 2 ( n 1 ) 2 λ 2 σ 0 2 ( n 1 ) 2 λ 0 2 = n 1 2 λ 2 λ 0 2 ( σ 2 ( λ 0 2 λ 2 ) + λ 2 ( σ 2 σ 0 2 ) ) = o ( n )
(5.44)
and
T 2 = 2 σ 0 2 ( n 1 ) d 2 exp ( 2 λ 0 d ) ( 1 exp ( 2 λ 0 d ) ) 2 2 σ 2 ( n 1 ) d 2 exp ( 2 λ d ) ( 1 exp ( 2 λ d ) ) 2 = 2 d 2 ( n 1 ) ( 1 exp ( 2 λ 0 d ) ) 2 ( 1 exp ( 2 λ d ) ) 2 { σ 0 ( exp ( λ 0 d ) exp ( λ d ) ) + exp ( λ d ) ( σ 0 σ ) + exp ( λ d λ 0 d ) [ σ ( exp ( λ 0 d ) exp ( λ d ) ) + exp ( λ d ) ( σ σ 0 ) ] } ( σ 0 exp ( λ 0 d ) ( 1 exp ( 2 λ d ) ) + σ exp ( λ d ) ( 1 exp ( 2 λ 0 d ) ) ) = o ( n ) .
(5.45)
By (5.29), we have
T 3 = 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) t = 2 n ε t 1 2 2 d 2 λ 0 exp ( 2 λ 0 d ) 1 exp ( 2 λ 0 d ) t = 2 n e t 1 2 = 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) t = 2 n { ( x t T ( β 0 β ) ) 2 + 2 x t T ( β 0 β ) e t + e t 2 } 2 d 2 λ 0 exp ( 2 λ 0 d ) 1 exp ( 2 λ 0 d ) t = 2 n e t 1 2 = 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) t = 2 n ( x t T ( β 0 β ) ) 2 + 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) t = 2 n 2 x t T ( β 0 β ) e t + { 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) 2 d 2 λ 0 exp ( 2 λ 0 d ) 1 exp ( 2 λ 0 d ) } t = 2 n e t 1 2 = 2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) T 31 + 4 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) T 32 + T 33 .
(5.46)
By (5.32), it is easy to show that
T 31 = o ( n ) .
(5.47)
By Lemma 5.1, (2.3) and (5.32), we have
Var ( T 32 ) = Var ( t = 2 n x t T ( β 0 β ) e t ) = Var { j = 1 n 1 ( t = j + 1 n x t T ( β 0 β ) exp { λ 0 d ( t 1 j ) } ) η j } = j = 1 n 1 ( t = j + 1 n x t T ( β 0 β ) exp { λ 0 d ( t 1 j ) } ) 2 max 2 t n | x t T ( β 0 β ) | j = 1 n 1 ( t = j + 1 n exp { λ 0 d ( t 1 j ) } ) 2 C max 2 t n | x t T ( β 0 β ) | n = o ( n ) .
(5.48)
Thus by the Chebychev inequality and (5.48),
T 32 = o p ( n ) .
(5.49)
Write
2 d 2 λ exp ( 2 λ d ) 1 exp ( 2 λ d ) 2 d 2 λ 0 exp ( 2 λ 0 d ) 1 exp ( 2 λ 0 d ) = 2 d 2 ( 1 exp ( 2 λ d ) ) ( 1 exp ( 2 λ 0 d ) ) U ,
(5.50)
where
U = λ exp ( 2 λ d ) ( 1 exp ( 2 λ 0 d ) ) λ 0 exp ( 2 λ 0 d ) ( 1 exp ( 2 λ d ) ) .
Note that
U = λ exp ( 2 λ d ) ( exp ( 2 λ d ) exp ( 2 λ 0 d ) ) + ( λ ( exp ( 2 λ d ) exp ( 2 λ 0 d ) ) + ( λ λ 0 ) exp ( 2 λ 0 d ) ) ( 1 exp ( 2 λ d ) ) = o ( 1 ) ,
(5.51)
so we have
T 33 = o ( n ) .
(5.52)
Thus, by (5.46), (5.47), (5.49) and (5.52), we have
T 3 = o ( n ) .
(5.53)
By (5.29), we have
T 4 = 2 d exp ( λ d ) [ ( 2 d λ ) d λ exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 2 t = 2 n ( ε t exp ( λ d ) ε t 1 ) ε t 1 2 d exp ( λ 0 d ) [ ( 2 d λ 0 ) d λ 0 exp ( 2 λ 0 d ) ] ( 1 exp ( 2 λ 0 d ) ) 2 t = 2 n ( e t exp ( λ 0 d ) e t 1 ) e t 1 = 2 d exp ( λ d ) [ ( 2 d λ ) d λ exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 2 σ 1 exp ( 2 λ d ) 2 λ t = 2 n x t 1 T ( β 0 β ) η t + { 2 d exp ( λ d ) [ ( 2 d λ ) d λ exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 2 σ 1 exp ( 2 λ d ) 2 λ 2 d exp ( λ 0 d ) [ ( 2 d λ 0 ) d λ 0 exp ( 2 λ 0 d ) ] ( 1 exp ( 2 λ 0 d ) ) 2 σ 1 exp ( 2 λ d ) 2 λ } t = 2 n η t e t 1 = T 41 + T 42 .
(5.54)
It is easy to show that
T 41 = o ( n ) .
(5.55)
Note that { η t e t 1 , H t } is a martingale difference sequence, so we have
Var ( t = 2 n η t e t 1 ) = t = 2 n E e t 1 2 = Δ n ( θ 0 , σ 0 ) .
Hence,
T 42 = o ( n ) .
(5.56)
By (5.54)-(5.56), we have
T 4 = o ( n ) .
(5.57)
It is easily proved that
T 5 = 4 d exp ( 2 λ d ) [ d λ 1 + ( 1 + d λ ) exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 3 t = 2 n ( ε t exp ( λ d ) ε t 1 ) 2 4 d exp ( 2 λ 0 d ) [ d λ 0 1 + ( 1 + d λ 0 ) exp ( 2 λ 0 d ) ] ( 1 exp ( 2 λ 0 d ) ) 3 t = 2 n ( e t exp ( λ 0 d ) e t 1 ) 2 = { 4 d exp ( 2 λ d ) [ d λ 1 + ( 1 + d λ ) exp ( 2 λ d ) ] ( 1 exp ( 2 λ d ) ) 3 σ 1 exp ( 2 λ d ) 2 λ 4 d exp ( 2 λ 0 d ) [ d λ 0 1 + ( 1 + d λ 0 ) exp ( 2 λ 0 d ) ] ( 1 exp ( 2 λ 0 d ) ) 3 σ 0 1 exp ( 2 λ 0 d ) 2 λ 0 } t = 2 n η t 2 = o ( n ) .
(5.58)

Hence, (5.42) follows immediately from (5.43)-(5.45), (5.53), (5.57) and (5.58). This completes the proof of (5.11) from (5.17), (5.23), (5.28) and (5.42).

It is well known that $\frac{\lambda(1-\exp(-2d\lambda_0))}{\lambda_0(1-\exp(-2d\lambda))}\to1$ as $n\to\infty$. To prove (5.12), we need to show that
$$\Delta_n(\theta_0,\sigma_0)^{-1}\Bigl(-\sigma^{2}\frac{\partial^{2}\Psi_n}{\partial\lambda^{2}}\Big|_{\theta=\theta_0}\Bigr)\xrightarrow{p}1,\qquad n\to\infty.$$

This follows immediately from (2.20) and the Markov inequality.

Finally, we will prove (5.13). By (5.11) and (5.12), we have
$$D_n^{-1/2}F_n(\theta)D_n^{-T/2}\xrightarrow{p}I_{m+1},\qquad n\to\infty$$
(5.59)
uniformly in $\theta\in N_n(A)$ for each $A>0$. Thus, by Lemma 5.3,
$$\lambda_{\min}\bigl(D_n^{-1/2}F_n(\theta)D_n^{-T/2}\bigr)\xrightarrow{p}1,\qquad n\to\infty.$$
(5.60)

This implies (5.13). □

Lemma 5.5 (Hall and Heyde [56])

Let $\{S_{ni}, \mathcal{F}_{ni}, 1\le i\le k_n, n\ge1\}$ be a zero-mean, square-integrable martingale array with differences $X_{ni}$, and let $\eta^{2}$ be an a.s. finite random variable. Suppose that $\sum_iE\{X_{ni}^{2}I(|X_{ni}|>\varepsilon)\,|\,\mathcal{F}_{n,i-1}\}\xrightarrow{p}0$ for all $\varepsilon>0$, and $\sum_iE\{X_{ni}^{2}\,|\,\mathcal{F}_{n,i-1}\}\xrightarrow{p}\eta^{2}$. Then
$$S_{nk_n} = \sum_iX_{ni}\xrightarrow{D}Z,$$

where the r.v. $Z$ has the characteristic function $E\{\exp(-\frac12\eta^{2}t^{2})\}$.

6 Proof of theorems

Proof of Theorem 3.1 Take $A>0$ and let
$$M_n(A) = \bigl\{\theta\in\mathbb{R}^{m+1}: (\theta-\theta_0)^{T}D_n(\theta-\theta_0) = A^{2}\bigr\}$$
(6.1)
be the boundary of $N_n(A)$, and let $\theta\in M_n(A)$. Using (2.19) and the Taylor expansion, for each $\sigma^{2}>0$, we have
$$\Psi_n(\theta,\sigma^{2}) = \Psi_n(\theta_0,\sigma^{2}) + (\theta-\theta_0)^{T}\frac{\partial\Psi_n(\theta_0,\sigma^{2})}{\partial\theta} + \frac12(\theta-\theta_0)^{T}\frac{\partial^{2}\Psi_n(\tilde\theta,\sigma^{2})}{\partial\theta\,\partial\theta^{T}}(\theta-\theta_0) = \Psi_n(\theta_0,\sigma^{2}) + \frac{1}{\sigma^{2}}\Bigl[(\theta-\theta_0)^{T}S_n(\theta_0) - \frac12(\theta-\theta_0)^{T}F_n(\tilde\theta)(\theta-\theta_0)\Bigr],$$
(6.2)

where $\tilde\theta = a\theta + (1-a)\theta_0$ for some $0\le a\le1$.

Let $Q_n(\theta) = \frac12(\theta-\theta_0)^{T}F_n(\tilde\theta)(\theta-\theta_0)$ and $v_n(\theta) = \frac1AD_n^{T/2}(\theta-\theta_0)$. Take $c>0$ and $\theta\in M_n(A)$; by (6.2), we obtain
P { Ψ n ( θ , σ 2 ) Ψ n ( θ 0 , σ 2 )  for some  θ M n ( A ) } P { ( θ θ 0 ) T S n ( θ 0 ) Q n ( θ ) , Q n ( θ ) > c A 2  for some  θ M n ( A ) } + P { Q n ( θ ) c A 2  for some  θ M n ( A ) } P { v n T ( θ ) D n 1 2 S n ( θ 0 ) > c A  for some  θ M n ( A ) } + P { v n T ( θ ) D n 1 2 F n ( θ ˜ ) D n T 2 v n ( θ ) c  for some  θ M n ( A ) } P { | D n 1 2 S n ( θ 0 ) | > c A } + P { inf θ N n ( A ) λ min ( D n 1 2 F n ( θ ˜ ) D n T 2 ) c } .
(6.3)
By Lemma 5.2 and the Chebychev inequality, we obtain
$$P\bigl\{|D_n^{-1/2}S_n(\theta_0)|>cA\bigr\}\le\frac{\operatorname{Var}\bigl(D_n^{-1/2}S_n(\theta_0)\bigr)}{c^{2}A^{2}} = \frac{\sigma_0^{2}}{c^{2}A^{2}}.$$
(6.4)
Let $A\to\infty$, then $c\to0$, and using (5.13), we have
$$P\Bigl\{\inf_{\theta\in N_n(A)}\lambda_{\min}\bigl(D_n^{-1/2}F_n(\tilde\theta)D_n^{-T/2}\bigr)\le c\Bigr\}\to0.$$
(6.5)
By (6.3)-(6.5), we have
$$\lim_{A\to\infty}\liminf_{n\to\infty}P\bigl\{\Psi_n(\theta,\sigma^{2})<\Psi_n(\theta_0,\sigma^{2})\text{ for all }\theta\in M_n(A)\bigr\}=1.$$
(6.6)
By Lemma 5.3, $\lambda_{\min}(X_n(\lambda_0))\to\infty$ as $n\to\infty$. Hence $\lambda_{\min}(D_n)\to\infty$. Moreover, from (5.13), we have
$$\inf_{\theta\in N_n(A)}\lambda_{\min}\bigl(F_n(\theta)\bigr)\xrightarrow{p}\infty.$$
This implies that Ψ n ( θ , σ 2 ) is concave on N n ( A ) . Noting this fact and (6.6), we get
$$\lim_{A\to\infty}\liminf_{n\to\infty}P\Bigl\{\sup_{\theta\in M_n(A)}\Psi_n(\theta,\sigma^{2})<\Psi_n(\theta_0,\sigma^{2}),\ \Psi_n(\theta,\sigma^{2})\text{ is concave on }N_n(A)\Bigr\}=1.$$
(6.7)
On the event in the brackets, the continuous function Ψ n ( θ , σ 2 ) has a unique maximum in θ over the compact neighborhood N n ( A ) . Hence
$$\lim_{A\to\infty}\liminf_{n\to\infty}P\bigl\{S_n(\hat\theta_n(A))=0\text{ for a unique }\hat\theta_n(A)\in N_n(A)\bigr\}=1.$$
Moreover, there is a sequence A n such that θ ˆ n = θ ˆ ( A n ) satisfies
lim inf n P { S n ( θ ˆ n ) = 0  and  θ ˆ n  maximizes  Ψ n ( θ , σ 2 )  uniquely in  N n ( A ) } = 1 .
This θ ˆ n = ( β ˆ n , λ ˆ n ) is a QML estimator for θ 0 . It is clearly consistent, and
$$\lim_{A\to\infty}\liminf_{n\to\infty}P\bigl\{\hat\theta_n\in N_n(A)\bigr\}=1.$$

Since $\hat\theta_n = (\hat\beta_n, \hat\lambda_n)$ is the ML estimator of $\theta_0$, $\hat\sigma_n^{2}$ is an ML estimator of $\sigma_0^{2}$ by (2.9).

To complete the proof, we will show that $\hat\sigma_n^{2}\xrightarrow{p}\sigma_0^{2}$ as $n\to\infty$. If $\hat\theta_n\in N_n(A)$, then $\hat\beta_n\in N_n^{\beta}(A)$ and $\hat\lambda_n\in N_n^{\lambda}(A)$.

By (2.12) and (2.1), we have
$$\hat\varepsilon_t-\exp(-\hat\lambda_n d)\hat\varepsilon_{t-1} = \bigl(x_t-\exp(-\hat\lambda_n d)x_{t-1}\bigr)^{T}(\beta_0-\hat\beta_n) + \bigl(e_t-\exp(-\hat\lambda_n d)e_{t-1}\bigr).$$
(6.8)
By (2.9), (2.11) and (6.8), we have
( n 1 ) σ ˆ n 2 = 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( ε ˆ t exp ( λ ˆ n d ) ε ˆ t 1 ) 2 = 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( ε ˆ t exp ( λ ˆ n d ) ε ˆ t 1 ) { ( x t exp ( λ ˆ n d ) x t 1 ) T ( β 0 β ˆ n ) + ( e t exp ( λ ˆ n d ) e t 1 ) } = 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( ε ˆ t exp ( λ ˆ n d ) ε ˆ t 1 ) ( x t exp ( λ ˆ n d ) x t 1 ) T ( β 0 β ˆ n ) + 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( ε ˆ t exp ( λ ˆ n d ) ε ˆ t 1 ) ( e t exp ( λ ˆ n d ) e t 1 ) = 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( ε ˆ t exp ( λ ˆ n d ) ε ˆ t 1 ) ( e t exp ( λ ˆ n d ) e t 1 ) .
(6.9)
From (6.8), it follows that
t = 2 n { ( x t exp ( λ ˆ n d ) x t 1 ) T ( β 0 β ˆ n ) } 2 = t = 2 n ( ε ˆ t exp ( λ ˆ n d ) ε ˆ t 1 ) 2 2 t = 2 n ( ε ˆ t exp ( λ ˆ n d ) ε ˆ t 1 ) ( e t exp ( λ ˆ n d ) e t 1 ) + t = 2 n ( e t exp ( λ ˆ n d ) e t 1 ) 2 .
(6.10)
From (2.2), we get
t = 2 n ( e t exp ( λ ˆ n d ) e t 1 ) 2 = t = 2 n ( exp ( λ 0 d ) e t 1 + σ 0 1 exp ( 2 λ 0 d ) 2 λ 0 η t exp ( λ ˆ n d ) e t 1 ) 2 = σ 0 2 1 exp ( 2 λ 0 d ) 2 λ 0 t = 2 n η t 2 + t = 2 n ( exp ( λ 0 d ) exp ( λ ˆ n d ) ) 2 e t 1 2 + 2 σ 0 1 exp ( 2 λ 0 d ) 2 λ 0 t = 2 n ( exp ( λ 0 d ) exp ( λ ˆ n d ) ) η t e t 1 .
(6.11)
By (6.9)-(6.11), we have
( n 1 ) σ ˆ n 2 = 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( e t exp ( λ ˆ n d ) e t 1 ) 2 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( ( x t exp ( λ ˆ n d ) x t 1 ) T ( β 0 β ˆ n ) ) 2 = 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) σ 0 2 1 exp ( 2 λ 0 d ) 2 λ 0 t = 2 n η t 2 + 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( exp ( λ 0 d ) exp ( λ ˆ n d ) ) 2 e t 1 2 + 2 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) σ 0 1 exp ( 2 λ 0 d ) 2 λ 0 t = 2 n ( exp ( λ 0 d ) exp ( λ ˆ n d ) ) η t e t 1 2 λ ˆ n 1 exp ( 2 λ ˆ n d ) t = 2 n ( ( x t exp ( λ ˆ n d ) x t 1 ) T ( β 0 β ˆ n ) ) 2 = T 1 + T 2 + 2 T 3 T 4 .
(6.12)
By the law of large numbers and $\hat\lambda_n\xrightarrow{p}\lambda_0$, we have
$$\frac{1}{n-1}T_1 = \frac{2\hat\lambda_n}{1-\exp(-2\hat\lambda_n d)}\,\frac{1-\exp(-2\lambda_0 d)}{2\lambda_0}\,\sigma_0^{2}\,\frac{1}{n-1}\sum_{t=2}^{n}\eta_t^{2}\xrightarrow{p}\frac{2\lambda_0}{1-\exp(-2\lambda_0 d)}\,\frac{1-\exp(-2\lambda_0 d)}{2\lambda_0}\,\sigma_0^{2} = \sigma_0^{2}\qquad(n\to\infty).$$
(6.13)
By the Markov inequality, and noting that $ET_2\le CA^{2}$, we obtain
$$\frac{1}{n-1}T_2\xrightarrow{p}0\qquad(n\to\infty).$$
(6.14)
Since $\{(\exp(-\lambda_0 d)-\exp(-\hat\lambda_n d))\eta_te_{t-1},\ \mathcal{H}_t\}$ is a martingale difference sequence with
$$\operatorname{Var}\bigl\{(\exp(-\lambda_0 d)-\exp(-\hat\lambda_n d))\eta_te_{t-1}\bigr\} = \bigl(\exp(-\lambda_0 d)-\exp(-\hat\lambda_n d)\bigr)^{2}Ee_{t-1}^{2},$$
so we have
$$\operatorname{Var}(T_3) = \sum_{t=2}^{n}E\bigl((\exp(-\lambda_0 d)-\exp(-\hat\lambda_n d))\eta_te_{t-1}\bigr)^{2} = \sum_{t=2}^{n}\bigl(\exp(-\lambda_0 d)-\exp(-\hat\lambda_n d)\bigr)^{2}Ee_{t-1}^{2}\le C(\lambda_0-\hat\lambda_n)^{2}\sum_{t=2}^{n}Ee_{t-1}^{2}\le CA^{2}.$$
(6.15)
By the Chebychev inequality, we have
$$\frac{1}{n-1}T_3\xrightarrow{p}0\qquad(n\to\infty).$$
(6.16)
By (5.33), we have
$$T_4 = \sum_{t=2}^{n}\bigl(\bigl(x_t-\exp(-\hat\lambda_n d)x_{t-1}\bigr)^{T}(\beta_0-\hat\beta_n)\bigr)^{2}\le 2\sum_{t=2}^{n}\bigl(x_t^{T}(\beta_0-\hat\beta_n)\bigr)^{2} + 2\sum_{t=2}^{n}\bigl(\exp(-\hat\lambda_n d)x_{t-1}^{T}(\beta_0-\hat\beta_n)\bigr)^{2} = o(n).$$