Empirical likelihood confidence regions of the parameters in a partially single-index varying-coefficient model

Abstract In this paper, we investigate a partially single-index varying-coefficient model and propose two empirical log-likelihood ratio statistics for the unknown parameters in the model. The first statistic is asymptotically distributed as a weighted sum of independent chi-square variables under mild conditions. The second statistic, which incorporates an adjustment factor, is shown to be asymptotically standard chi-square under suitable conditions. Both statistics can be used to construct confidence regions for the parameters. A simulation study indicates that, as the sample size increases, the coverage probability of the proposed confidence region approaches the nominal level.


Introduction
Consider a partially single-index varying-coefficient model of the form Y = g^τ(β^τU)X + θ^τZ + ε, (1.1) where (U, X, Z) ∈ R^p × R^d × R^q is a vector of covariates, Y is the response variable, β is a p × 1 vector of unknown parameters, θ is a q × 1 vector of regression coefficients, g(·) is a d × 1 vector of unknown functions, and ε is a random error with E(ε | U, X, Z) = 0 and Var(ε | U, X, Z) = σ². Assume that ε and (U, X, Z) are independent. To ensure identifiability, it is often assumed that ‖β‖ = 1, where ‖·‖ denotes the Euclidean norm. Feng and Xue [1] considered the problem of model detection and estimation for the single-index varying-coefficient model; they identified the true model structure and obtained a new semiparametric model, the partially linear single-index varying-coefficient model. As is well known, the optimal parametric estimation rate is n^{-1/2}, while the optimal nonparametric estimation rate is the slower n^{-2/5}; if we treat a parametric component as a nonparametric component, the problems of data overfitting and efficiency loss will occur.
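For concreteness, model (1.1) can be simulated directly. The Python sketch below generates data satisfying the identifiability constraint ‖β‖ = 1; the particular choices of β, θ, the component functions of g(·), and all dimensions are illustrative assumptions of ours, not values taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d, q = 500, 2, 2, 1
beta = np.array([1.0, 2.0])
beta /= np.linalg.norm(beta)            # identifiability: ||beta|| = 1
theta = np.array([0.5])                 # q x 1 linear coefficient
U = rng.uniform(0.0, 1.0, (n, p))
X = rng.normal(size=(n, d))
Z = rng.normal(size=(n, q))
t = U @ beta                            # single index beta^T U_i
G = np.column_stack([np.sin(np.pi * t), np.cos(np.pi * t)])  # example g(t), d = 2
eps = 0.2 * rng.normal(size=n)          # error, independent of (U, X, Z)
Y = (G * X).sum(axis=1) + Z @ theta + eps   # model (1.1)
```

Any estimation procedure for (β, θ) can then be checked against the known truth on such synthetic data.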
Model (1.1), which may include cross-product terms of some components of X and Z, is easily interpreted in real applications because it has the features of the partially linear model, the single-index model and the varying-coefficient model, which make it quite general. Model (1.1) includes many other regression models as special cases, such as the linear model, the partially linear model, the varying-coefficient model, the single-index model, the partially linear single-index model and the single-index varying-coefficient model. The linear component θ^τZ provides a simple summary of the covariate effects that are of main scientific interest. The index β^τU enables us to simplify the treatment of the multiple auxiliary variables, and the functions g(·) enrich model flexibility. It is well known that in order to construct a confidence region for (β₀, θ₀) by the normal approximation method, it is necessary to construct a plug-in estimator of the asymptotic variance of the corresponding estimator, which involves the estimation of both parametric and nonparametric components. The empirical likelihood method avoids this shortcoming: its construction does not involve estimating the asymptotic variance of an estimator of (β₀, θ₀). In this paper, we first construct an empirical likelihood ratio function for (β₀, θ₀) by treating g(·) and its derivative ġ(·) as known functions.
As far as we know, there is little literature applying the empirical likelihood method to this model, although the method has been applied to a wide variety of models. In this paper, we consider constructing confidence regions for (β₀, θ₀), since the empirical likelihood method, introduced by Owen [2,3], has many advantages: it does not require the construction of a pivotal quantity, and it does not impose a prior constraint on the shape of the region. Owen [4] proved that the empirical log-likelihood ratio is asymptotically a standard chi-square variable when the empirical likelihood is applied to the linear regression model, so that it can be used to construct confidence regions for the regression parameter. Related studies on empirical likelihood include Wang and Rao [5], Wang, Linton and Härdle [6], Xue and Zhu [7-9], Zhu and Xue [10], You and Zhou [11], Qin and Zhang [12], Stute, Xue and Zhang [13], Xue [14,15], Wang et al. [16], Huang and Zhang [17], Wang and Xue [18], Lian [19], Xiao [20], Zhou, Zhao and Wang [21], Fang, Liu and Lu [22], Arteaga-Molina and Rodriguez-Poo [23], among others.
The rest of this article is organized as follows. In Section 2, we give an estimated empirical likelihood ratio and investigate the asymptotic properties of the proposed estimators. In Section 3, we give an adjusted empirical log-likelihood and derive its asymptotic distribution. Section 4 reports a simulation study. Proofs of the theorems and lemmas are deferred to Appendix A and Appendix B, respectively.

2. Methodology
Suppose that {(U_i, X_i, Z_i, Y_i); i = 1, …, n} is an i.i.d. sample from model (1.1), that is, Y_i = g^τ(β^τU_i)X_i + θ^τZ_i + ε_i, i = 1, …, n, where the ε_i are i.i.d. random errors with mean 0 and finite variance σ², independent of (U_i, X_i, Z_i). Our primary interest is to construct a confidence region for (β₀, θ₀). In order to construct an empirical likelihood ratio function for (β₀, θ₀), we introduce the auxiliary random vector
η_i(β, θ) = ({X_i^τ ġ(β^τU_i)}U_i^τ, Z_i^τ)^τ {Y_i − g^τ(β^τU_i)X_i − θ^τZ_i} w(β^τU_i),
where ġ(·) denotes the derivative of the function vector g(·), and w(·) is a bounded weight function with bounded support T_w, introduced here to control the boundary effect in the estimation of g(·) and ġ(·). For convenience, we take w(·) to be the indicator function of the set T_w. Note that E[η_i(β, θ)] = 0 if (β, θ) is the true parameter; hence the problem of testing whether (β, θ) is the true parameter is equivalent to testing whether E[η_i(β, θ)] = 0. By Owen [2], this can be done by using the empirical likelihood. That is, we can define the profile empirical likelihood ratio function
L_n(β, θ) = sup{ ∏_{i=1}^n (n p_i) : p_i ≥ 0, Σ_{i=1}^n p_i = 1, Σ_{i=1}^n p_i η_i(β, θ) = 0 }.
It can be shown that −2 log L_n(β₀, θ₀) is asymptotically chi-squared with p + q degrees of freedom. However, L_n(β, θ) cannot directly be used to make statistical inference on (β₀, θ₀) because it contains the unknown functions g(·) and ġ(·). A common approach is to replace g(·) and ġ(·) in L_n(β, θ) by their estimators and to define an estimated empirical likelihood function. When (β, θ) is known, model (1.1) can be treated as a varying-coefficient partially linear regression model. Then we can use the profile least squares methodology to estimate g(·) and ġ(·) via the local linear regression technique. The local linear estimators of g(t) and ġ(t) at a fixed (β, θ) are defined as g̃(t; β, θ) = ã and ġ̃(t; β, θ) = b̃, where (ã, b̃) minimize the sum of weighted squares
Σ_{i=1}^n [Y_i − θ^τZ_i − {a + b(β^τU_i − t)}^τX_i]² K_h(β^τU_i − t),
where K_h(·) = K(·/h)/h, K(·) is a kernel function, and h = h_n is a bandwidth sequence that decreases to 0 as n increases to ∞. Simple calculation yields closed-form expressions for ã and b̃ in terms of Y = (Y_1, …, Y_n)^τ and Z = (Z_1, …, Z_n)^τ. Let g̃(t; β, h) and ġ̃(t; β, h₂) denote the estimators of g(t) and ġ(t) with bandwidths h and h₂ = h_{2n}, respectively.
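As a minimal illustration of the local linear step, the following Python sketch estimates g(t₀) and ġ(t₀) in the simplified case d = 1, with the index values t_i = β^τU_i assumed already computed. The function name, the Epanechnikov kernel, and the synthetic data are our own choices, not the paper's.

```python
import numpy as np

def local_linear_vc(t0, t, x, y, h):
    """Local linear estimate of (g(t0), g'(t0)) in y_i = g(t_i) x_i + eps_i.

    t: index values beta^T U_i (precomputed); x: scalar covariate; h: bandwidth."""
    u = (t - t0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov kernel
    D = np.column_stack([x, x * (t - t0)])   # design columns for a and b
    A = D.T @ (D * w[:, None])               # weighted normal equations
    rhs = D.T @ (w * y)
    a_hat, b_hat = np.linalg.solve(A, rhs)
    return a_hat, b_hat                      # estimates of g(t0) and g'(t0)

rng = np.random.default_rng(0)
n = 2000
t = rng.uniform(-1.0, 1.0, n)
x = rng.normal(1.0, 0.2, n)
y = np.sin(np.pi * t) * x + 0.1 * rng.normal(size=n)
g_hat, gdot_hat = local_linear_vc(0.0, t, x, y, h=0.15)
# true values here: g(0) = 0 and g'(0) = pi
```

The full profile estimator additionally profiles out θ^τZ_i; the sketch only isolates the kernel-weighted least squares step.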
Therefore, let η̂_i(β, θ) denote η_i(β, θ) with g(β^τU_i) and ġ(β^τU_i) replaced by g̃(β^τU_i; β) and ġ̃(β^τU_i; β), respectively, for i = 1, …, n. An estimated empirical log-likelihood ratio function is then defined as
−2 log L̂(β, θ) = 2 Σ_{i=1}^n log{1 + λ^τ η̂_i(β, θ)},
where λ = λ(β, θ) is determined by
(1/n) Σ_{i=1}^n η̂_i(β, θ)/{1 + λ^τ η̂_i(β, θ)} = 0. (2.8)
Firstly, write G_β = (g^τ(β^τU_1)X_1, …, g^τ(β^τU_n)X_n)^τ. Hence we can derive an estimator G̃_β of G_β by (2.5). Here 0_d denotes the d-dimensional zero vector; let ξ_{i,β} and W_{i,β} denote ξ_{1,β} and W_{1,β} with t replaced by β^τU_i for i = 1, …, n. Hence we obtain an approximate linear model, in which I_n denotes the n × n identity matrix.
Secondly, we can use least squares theory to obtain (2.12), and we get the estimators of g(·) and ġ(·) at t = β^τu by substituting (2.12) into (2.5), as in (2.13). Thirdly, note that (2.13) and (2.12) are based on a known β. Under the constraint ‖β‖ = 1, we obtain an estimator β̂ of β₀ by minimizing (2.14). Obviously, solving (2.14) is equivalent, under the constraint ‖β‖ = 1, to solving the corresponding estimating equation. Finally, we get the final estimators θ̂, ĝ(·) and ġ̂(·) by replacing β with β̂ in θ̃, g̃(·) and ġ̃(·), respectively.
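Given the matrix of estimating-function values η̂_i, the Lagrange multiplier equation (2.8) and the log-likelihood ratio can be computed numerically by Newton ascent on Owen's dual problem. The following Python sketch is our own generic implementation, not code from the paper; it applies to any mean-zero estimating function.

```python
import numpy as np

def el_log_ratio(eta, tol=1e-10, max_iter=100):
    """-2 log empirical likelihood ratio for H0: E[eta_i] = 0.

    eta: (n, k) array whose i-th row is the estimating function at the
    hypothesized parameter.  Solves sum_i eta_i / (1 + lam' eta_i) = 0
    (the analogue of (2.8)) by damped Newton ascent on the concave dual."""
    n, k = eta.shape
    lam = np.zeros(k)
    for _ in range(max_iter):
        denom = 1.0 + eta @ lam
        grad = (eta / denom[:, None]).sum(axis=0)
        if np.max(np.abs(grad)) < tol:
            break
        hess = -(eta.T * (1.0 / denom**2)) @ eta   # negative definite
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while np.any(1.0 + eta @ (lam - t * step) <= 0):
            t /= 2.0                               # keep implied weights positive
        lam = lam - t * step
    return 2.0 * np.sum(np.log(1.0 + eta @ lam))

rng = np.random.default_rng(0)
eta = rng.normal(size=(200, 2))        # mean approximately 0: small ratio
stat_null = el_log_ratio(eta)
stat_shift = el_log_ratio(eta + 1.0)   # mean far from 0: large ratio
```

For i.i.d. rows as above, stat_null behaves like a χ²₂ variable; in the semiparametric setting of this paper the limit is instead the weighted sum of Theorem 2.1.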

2.1. Asymptotic properties
To obtain the asymptotic distribution of −2 log L̂(β₀, θ₀), we first give a set of conditions. These conditions are not very hard to satisfy; similar restrictions were also made by Härdle, Hall and Ichimura [24], Xia and Li [25], Wang and Xue [18], and Xue and Pang [26].
(C1) The density function f(t) of β^τU is bounded away from zero for t ∈ T_w and β near β₀, and satisfies a Lipschitz condition of order 1 on T_w, where T_w is the support of w(t).
(C2) The functions g_j(t), 1 ≤ j ≤ d, have continuous second derivatives on T_w, where g_j(t) is the jth component of g(t).
(C3) The kernel K(·) is a symmetric probability density function with bounded support, satisfies a Lipschitz condition of order 1, and ∫ t²K(t) dt ≠ 0.
The following theorem shows that −2 log L̂(β₀, θ₀) is asymptotically distributed as a weighted sum of independent χ²₁ variables.
Theorem 2.1. Suppose that conditions (C1)-(C3) hold. If (β₀, θ₀) is the true value of the parameter, then
−2 log L̂(β₀, θ₀) →_D w₁χ²_{1,1} + ⋯ + w_{p+q}χ²_{1,p+q}, (2.16)
where →_D denotes convergence in distribution, the χ²_{1,j}, 1 ≤ j ≤ p + q, are independent χ²₁ variables, and the weights w_j are the eigenvalues of V(β₀, θ₀) = B⁻¹(β₀, θ₀)A(β₀, θ₀).
In order to apply Theorem 2.1 to construct a confidence region for (β₀, θ₀), we have to estimate the unknown weights w_j consistently. By the plug-in method, A(β₀, θ₀) and B(β₀, θ₀) can be estimated consistently by Â(β̂, θ̂) and B̂(β̂, θ̂), where K₂(·) is a kernel function and b_n is a bandwidth with 0 < b_n → 0. This means that ŵ_j, the eigenvalues of V̂(β̂, θ̂) = B̂⁻¹(β̂, θ̂)Â(β̂, θ̂), consistently estimate w_j for j = 1, …, p + q. Let ĉ_{1−α} be the 1 − α quantile of the conditional distribution of the weighted sum ŝ = ŵ₁χ²_{1,1} + ⋯ + ŵ_{p+q}χ²_{1,p+q} given the data. Then we can define an approximate 1 − α confidence region for (β₀, θ₀) as
C_α = {(β, θ) : −2 log L̂(β, θ) ≤ ĉ_{1−α}}.
In practice, we can calculate the conditional distribution of the weighted sum ŝ, given the sample, by Monte Carlo simulation, repeatedly generating independent samples χ²_{1,1}, …, χ²_{1,p+q} from the χ²₁ distribution.
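The Monte Carlo step for ĉ_{1−α} is straightforward; a Python sketch (the function name and defaults are ours):

```python
import numpy as np

def weighted_chisq_quantile(w, alpha=0.05, n_mc=200_000, seed=1):
    """Monte Carlo (1 - alpha) quantile of s = sum_j w_j * chi2_{1,j}."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_mc, len(w)))
    s = (np.asarray(w) * z**2).sum(axis=1)   # chi2_1 variables as squared N(0,1)
    return np.quantile(s, 1.0 - alpha)

# sanity check: equal weights of 1 give a chi-square with len(w) degrees of
# freedom, and the chi2_3 0.95 quantile is about 7.815
q = weighted_chisq_quantile([1.0, 1.0, 1.0])
```

In applications the weights passed in would be the estimated eigenvalues ŵ_j of V̂(β̂, θ̂).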

3. Adjusted empirical likelihood
When we use Theorem 2.1 to construct confidence regions for (β₀, θ₀), the weights w_i need to be estimated, which decreases the accuracy of the confidence region. Let ρ(β₀, θ₀) = (p + q)/tr{V(β₀, θ₀)}, where V(β₀, θ₀) is defined in Theorem 2.1. By Rao and Scott [27], the distribution of ρ(β₀, θ₀) Σ_{i=1}^{p+q} w_i χ²_{1,i} can be approximated by χ²_{p+q}, the standard chi-square distribution with p + q degrees of freedom. Therefore, an improved Rao-Scott adjusted empirical log-likelihood ratio can be defined accordingly. However, the accuracy of this approximation depends on the w_i. Xue and Wang [28] proposed another adjusted empirical log-likelihood; building on the above approximation, the adjustment technique was developed by Wang and Rao [5]. Note that ρ̂(β, θ) can be written in a form involving A⁻, where A⁻ represents a generalized inverse of the matrix A. By examining the asymptotic expansion of −2 log L̂(β, θ), similarly to Xue and Wang [28], we can define an adjustment factor r̂(β, θ). The adjusted empirical log-likelihood ratio is defined by
ℓ̂_ad(β, θ) = r̂(β, θ){−2 log L̂(β, θ)}. (3.2)
The following theorem shows that the adjusted empirical log-likelihood ratio is asymptotically distributed as standard chi-square.
Theorem 3.1. Suppose that the conditions of Theorem 2.1 hold. If (β₀, θ₀) is the true value of the parameter, then ℓ̂_ad(β₀, θ₀) →_D χ²_{p+q}.
Invoking Theorem 3.1, ℓ̂_ad(β, θ) can be used to construct an approximate confidence region for (β₀, θ₀), namely
C*_α = {(β, θ) : ℓ̂_ad(β, θ) ≤ χ²_{p+q}(1 − α)},
where χ²_{p+q}(1 − α) is the 1 − α quantile of χ²_{p+q}. To apply Theorem 3.1, we only need to estimate the adjustment factor r̂(β, θ) by replacing (β, θ) with (β̂, θ̂); the value of −2 log L̂(β, θ) itself does not depend on this estimation. In practice, we can calculate the numerical value of ℓ̂_ad(β, θ) with the R package 'emplik' (http://cran.r-project.org/web/packages/emplik/).
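The Rao-Scott factor ρ = (p + q)/tr{V} is cheap to compute once estimates of A and B are available. A Python sketch (the matrices below are toy inputs chosen by us, not estimates from any model):

```python
import numpy as np

def rao_scott_factor(A_hat, B_hat):
    """Rao-Scott adjustment rho = k / tr(V), with V = B^{-1} A and k = p + q."""
    V = np.linalg.solve(B_hat, A_hat)   # B^{-1} A without forming the inverse
    return A_hat.shape[0] / np.trace(V)

# toy check: if A = 2B then V = 2I, so tr(V) = 2k and rho = k / (2k) = 1/2
B = np.array([[2.0, 0.3], [0.3, 1.0]])
A = 2.0 * B
rho = rao_scott_factor(A, B)
```

Multiplying the empirical log-likelihood ratio by such a factor rescales its mean to match that of χ²_{p+q}, which is the idea behind the approximation above.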

4. Simulation study
Consider a regression model of the form (1.1), with the weight function w(t) taken as the indicator function I_{[·,·]}(t) of an interval. The bandwidths were taken as ĥ = ĥ_opt n^{−·}(log n)^{−·} and ĥ₂ = ĥ_opt, respectively, where ĥ_opt is the optimal bandwidth obtained by generalized cross-validation (GCV). It is not hard to see that the bandwidth ĥ satisfies condition (C4).
Table 1. The coverage probabilities of the confidence regions for (β₀, θ₀) when the nominal level is 0.95 (columns: n, resampling times, coverage probability).
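A generalized cross-validation search of the kind used for ĥ_opt can be sketched as follows, here for a plain local linear mean regression rather than the varying-coefficient fit of this paper; all function names and data choices are ours. GCV(h) = n⁻¹RSS(h)/{1 − tr(S_h)/n}², where S_h is the smoother matrix.

```python
import numpy as np

def loclin_fit_all(t, y, h):
    """Local linear fits at every t_i; returns fitted values and tr(S_h)."""
    n = len(t)
    yhat = np.empty(n)
    tr_s = 0.0
    for i in range(n):
        u = (t - t[i]) / h
        w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov
        D = np.column_stack([np.ones(n), t - t[i]])
        A = D.T @ (D * w[:, None])
        s_row = np.linalg.solve(A, D.T * w)[0]   # i-th row of the smoother matrix
        yhat[i] = s_row @ y
        tr_s += s_row[i]
    return yhat, tr_s

def gcv_score(t, y, h):
    """GCV(h) = n^{-1} RSS(h) / (1 - tr(S_h)/n)^2."""
    n = len(t)
    yhat, tr_s = loclin_fit_all(t, y, h)
    return np.sum((y - yhat) ** 2) / n / (1.0 - tr_s / n) ** 2

rng = np.random.default_rng(2)
n = 200
t = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2.0 * np.pi * t) + 0.2 * rng.normal(size=n)
grid = [0.05, 0.1, 0.2, 0.4]
h_opt = min(grid, key=lambda h: gcv_score(t, y, h))
```

In practice a finer grid (or a one-dimensional minimizer) would be used, and the chosen ĥ_opt would then be deflated as described above to satisfy the undersmoothing condition.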
From Table 1, we can see that the coverage probability increases with n, gradually approaching the nominal level 0.95.
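The qualitative finding in Table 1 — Wilks-type coverage approaching the nominal level as the sample size grows — can be illustrated with an elementary Monte Carlo check. The sketch below uses a simple normal mean with a χ²₁ calibration, not the model of this paper; in the semiparametric setting the adjusted empirical log-likelihood ratio would play the role of the statistic.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 2000
chi2_1_95 = 3.841                 # chi2_1 0.95 quantile
cover = 0
for _ in range(reps):
    x = rng.normal(size=n)
    stat = n * x.mean() ** 2 / x.var(ddof=1)   # squared t-statistic
    cover += stat <= chi2_1_95
rate = cover / reps               # empirical coverage of the nominal 95% region
```

The empirical rate should sit close to 0.95, with the residual gap shrinking as n increases, which mirrors the pattern reported in Table 1.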

Appendix B: Proofs of lemmas
Proof of Lemma 5.1. This lemma is a direct extension of known results in nonparametric function estimation; its proof, which uses Theorem 2 of Einmahl and Mason [29], can be found in Wang and Xue [18]. We omit the details here.
To make the formulations more concise, we introduce some notation. Denote G = {g : T_w × B → R^d} and ‖g‖_G = sup_{t∈T_w, β∈B} ‖g(t; β)‖. From Lemma 5.1, we have ‖g̃ − g‖_G = o_P(1) and ‖ġ̃ − ġ‖_G = o_P(1). Hence we can assume that g̃ lies in G_δ with δ = δ_n → 0 and δ > 0, where G_δ = {g* ∈ G : ‖g* − g‖_G ≤ δ}.
Proof of Lemma 5.2. From (2.9) and (2.12), we obtain a decomposition into three terms. Consider the three terms on the right-hand side of the above equation. First, we show (5.6). Note that each entry of the matrices involved has a standard kernel-estimation form; hence the convergence holds uniformly for t ∈ T_w and β ∈ B_n, where ⊗ denotes the Kronecker product.
Second, we establish the analogous bound for the next term. Similarly to (5.4) and (5.7), we have the corresponding rate uniformly for t ∈ T_w and β ∈ B_n. Combining these bounds yields the stated result. Checking condition (C6), it is obvious that lim_{n→∞} √n (h² + {log n/(nh)}^{1/2})² = 0.