Open Access | CC BY 4.0 license | Published by De Gruyter, October 11, 2022

Robust estimation for varying coefficient partially functional linear regression models based on exponential squared loss function

Jun Sun and Wanrong Liu
From the journal Open Mathematics

Abstract

In this article, we present a new robust estimation procedure based on the exponential squared loss function for varying coefficient partially functional linear regression models, where the slope function and the nonparametric coefficients are approximated by functional principal component basis functions and B-splines, respectively. Under some mild conditions, the convergence rates of the resulting estimators are obtained. Simulation studies indicate that our proposed method achieves robustness against outliers or heavy-tailed error distributions and performs no worse than the popular least-squares estimation method in the normal error case. Finally, a real data example illustrates the application of the proposed method.

MSC 2010: 62G05; 62G35

1 Introduction

Varying coefficient models have become an important research topic in modern statistics and are frequently applied to problems in various scientific areas, such as economics, biomedical science, finance, medicine, and engineering. One attractive feature of varying coefficient models is their ability to capture dynamic patterns by allowing the coefficients to vary with a so-called index variable. Moreover, varying coefficient models retain the virtues of both parametric and nonparametric models; therefore, possible modeling biases can be reduced and the curse of dimensionality can be avoided. Recently, much effort has been devoted to the methodological, theoretical, and applied sides of varying coefficient models; we refer, for instance, to [1,2,3,4,5,6].

Nowadays, technological innovations allow us to create and store large-scale data sets. In many cases, these data sets can be viewed as functions of time, spatial location, or some other continuum; such data are termed functional data. In recent years, functional data analysis has attracted much attention among statisticians and practitioners. As the fundamental tool and mathematical framework for functional data analysis, the functional linear model (FLM), which characterizes the relationship between a scalar response and a functional predictor through a linear operator, is seeing a rise in popularity in practice. For a systematic coverage of statistical methods and inference for the FLM, see [7,8,9,10,11,12,13] and others. Although the FLM is useful, in many practical situations a simple linear relationship between the response variable and the functional covariate may not be sufficient. To improve the predictive power and interpretability of the FLM, several extensions have been proposed for the analysis of mixed data with complex structures. Recently, Peng et al. [14] incorporated some real-valued covariates into the FLM and proposed a mixed data model, namely, the varying coefficient partially functional linear regression model (VCPFLRM). The VCPFLRM takes the form

(1) $Y = Z^{\top}\beta(U) + \int_{\mathcal{T}} \alpha(t) X(t)\,dt + \varepsilon,$

where $Y$ is a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, and $Z = (Z_1, \ldots, Z_p)^{\top}$ and $U$ are the observed explanatory variables and the so-called index variable defined on the same probability space, respectively. $\beta(U) = (\beta_1(U), \ldots, \beta_p(U))^{\top}$ consists of $p$ unknown varying-coefficient functions, $\{X(t) : t \in \mathcal{T}\}$ is a zero-mean, second-order stochastic process valued in $H = L^2(\mathcal{T})$, the set of all square-integrable functions on $\mathcal{T}$ with inner product $\langle x, y \rangle = \int_{\mathcal{T}} x(t) y(t)\,dt$, $x, y \in H$, and norm $\|x\| = \langle x, x \rangle^{1/2}$. $\alpha(t) \in H$ is an unknown slope function, and the error term $\varepsilon$ is assumed to be independent of $(Z, U, X)$ with $E(\varepsilon \mid Z, U, X) = 0$. Without loss of generality, we assume that $U$ takes values in the unit interval $[0,1]$ and that $\mathcal{T} = [0,1]$ throughout. The VCPFLRM is flexible and includes some existing statistical models: when $\alpha(t) = 0$, it reduces to the varying coefficient model; when the $\beta_j(\cdot)$ are all constant, it reduces to the partially FLM [15]; when $p = 1$ and $Z_1 = 1$, it reduces to the semi-FLM [16].

To estimate the slope function $\alpha(\cdot)$ and the coefficient functions $\beta_j(\cdot)$, $j = 1, \ldots, p$, in the VCPFLRM, Peng et al. [14] proposed a least squares estimation procedure based on B-spline basis function approximations. Feng and Xue [17] proposed a profile least squares estimation using functional principal component (FPC) analysis and a local linear smoothing technique. However, all of these existing works are built on the least squares method and on the assumption of normally distributed errors, which makes them sensitive to outliers and heavy-tailed error distributions. Hence, robustness against outliers is a very important issue in the VCPFLRM. Recently, Wang et al. [18] investigated a new estimation method for linear regression models by minimizing the exponential squared loss (ESL) $\rho(t) = 1 - \exp(-t^2/h)$, where the tuning parameter $h \in (0, \infty)$ controls the degree of robustness and efficiency of the estimators. For large $h$, $1 - \exp(-t^2/h) \approx t^2/h$, so in this extreme case the proposed estimators are similar to the least squares estimators. For small $h$, outlying observations $t$ have only a small impact on the estimators; this method is therefore more robust to outliers than the least squares method. Due to this nice property, the method has been quickly extended to other statistical models for the sake of robust inference; the recent literature includes, but is not restricted to, Jiang et al. [19] for the varying coefficient partially nonlinear model, Song et al. [20] for the single index varying coefficient model, Yu et al. [21] for the semi-functional linear regression model, and Lv et al. [22] for the generalized varying coefficient partially linear model. Even though exponential-squared-loss-based estimation has been well developed, the corresponding theory and methodology for the VCPFLRM are still lacking, which is our impetus for applying this method to the VCPFLRM.
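To make this robustness mechanism concrete, the following minimal sketch (in Python, with illustrative values of $h$ chosen by us rather than prescribed by the paper) compares the ESL to the rescaled squared loss $t^2/h$: for large $h$ the two nearly coincide, while for small $h$ the ESL saturates at 1, so large residuals receive bounded influence.

```python
import numpy as np

def esl_loss(t, h):
    """Exponential squared loss rho(t) = 1 - exp(-t^2 / h)."""
    return 1.0 - np.exp(-t**2 / h)

residuals = np.array([0.1, 0.5, 1.0, 5.0, 50.0])   # the last two mimic outliers

for h in (10000.0, 1.0):
    # Large h: 1 - exp(-t^2/h) ~ t^2/h, i.e., essentially a rescaled least squares loss.
    # Small h: the loss saturates at 1, so huge residuals cannot dominate the fit.
    print(f"h = {h:>7}: ESL loss     = {np.round(esl_loss(residuals, h), 4)}")
    print(f"           scaled t^2/h = {np.round(residuals**2 / h, 4)}")
```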

The main objective of this article is to propose a new robust estimation procedure based on the ESL function for the VCPFLRM. To estimate $\alpha(\cdot)$ in the VCPFLRM, two approaches are commonly used: the FPC basis method and basis function expansions (such as B-spline approximation). The FPC basis has some desirable properties for approximating a smooth function: it is a data-driven technique in which the data are represented as linear combinations of functions estimated from the data themselves, and by controlling the order of the generalized Fourier coefficients with respect to the FPC basis, we can control the smoothness of the slope function. Owing to these advantages, we use B-splines and the FPC basis to approximate the varying coefficient functions and the slope function in model (1), respectively. With the help of these approximations, ESL-based estimation for the VCPFLRM becomes a problem of estimating the coefficients in the linear combinations. Under some mild regularity conditions, we show that the estimators of the coefficient functions and the slope function can achieve the optimal convergence rates. Moreover, our numerical studies and a real data application demonstrate that the proposed method is at least comparable to some existing works. It is worth pointing out that there are some essential differences between our proposed method and those of Wang et al. [18] and Yu et al. [21]. First, the VCPFLRM studied here is flexible enough to cover the classical linear regression model and the semi-functional linear regression model as special cases. Accordingly, a different approach is required to deal with the varying coefficient functions $\beta(\cdot)$, and this difference leads to different theoretical properties for the proposed estimators. Second, compared with the works of [18] and [21], although estimates of $\beta(\cdot)$ and $\alpha(\cdot)$ can be obtained by transforming the original VCPFLRM into a standard linear regression model, we need to choose extra tuning parameters via a data-driven procedure for the spline estimates of $\beta(\cdot)$, such as the number of interior knots, which controls the dimension of the spline spaces used to approximate the true varying coefficient functions.

The rest of the article is organized as follows. In Section 2, we describe the proposed estimation procedure for the VCPFLRM. In Section 3, we establish the theoretical properties of the resulting estimators. In Section 4, we develop an estimation algorithm and discuss the selection of tuning parameters. Section 5 illustrates the numerical performance of the proposed method through simulation experiments. A real data analysis is presented in Section 6, and concluding remarks follow in Section 7. All proofs of the theorems are provided in the Appendix.

2 Estimation procedure

Let $\{Y_i, Z_i, U_i, X_i\}_{i=1}^{n}$ be an independent and identically distributed sample generated from model (1). Motivated by [18], we estimate the slope function $\alpha(t)$ and the nonparametric coefficients $\beta(U)$ in the VCPFLRM by maximizing the following ESL function:

(2) $\sum_{i=1}^{n} \exp\left\{ -\dfrac{\left( Y_i - Z_i^{\top}\beta(U_i) - \int_0^1 \alpha(t) X_i(t)\,dt \right)^2}{h} \right\}.$

In what follows, we first approximate the nonparametric components $\beta(U)$ by B-splines. Let $B(u) = (B_1(u), \ldots, B_K(u))^{\top}$ be the B-spline basis functions, where $K = k_n + q + 1$ is the number of basis functions, $q$ is the degree of the spline, and $k_n$ is the number of interior knots, which controls the smoothness of the nonparametric functions. Thus, for any function $\beta_j(u)$, $j = 1, \ldots, p$, we have

(3) $\beta_j(u) \approx \sum_{k=1}^{K} \eta_{jk} B_k(u).$
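As a concrete illustration of this basis, here is a minimal sketch, assuming a recent SciPy version (BSpline.design_matrix), that evaluates the $K = k_n + q + 1$ B-spline basis functions on $[0,1]$ with equally spaced interior knots; the knot placement and default values are our illustrative choices.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(u, kn=5, q=3):
    """Evaluate the K = kn + q + 1 B-spline basis functions B_1(u),...,B_K(u) on [0, 1].

    kn equally spaced interior knots, spline degree q; returns a (len(u), K) matrix.
    """
    interior = np.linspace(0, 1, kn + 2)[1:-1]                 # kn interior knots
    knots = np.r_[np.zeros(q + 1), interior, np.ones(q + 1)]   # clamped knot vector
    u = np.clip(u, 1e-10, 1 - 1e-10)                           # stay strictly inside the support
    return BSpline.design_matrix(u, knots, q).toarray()        # (n, kn + q + 1)

# beta_j(u) is then approximated by bspline_basis(u) @ eta_j for a coefficient vector eta_j.
U = np.random.uniform(size=200)
print(bspline_basis(U).shape)   # (200, 9) for kn = 5, q = 3
```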

Note that it is possible to specify a different $K$ for each component, but we assume they are the same for simplicity. Next, we define the covariance function and the empirical covariance function of the random process $X(\cdot)$ as follows:

(4) $K(s,t) = \operatorname{Cov}(X(s), X(t)), \qquad \hat{K}(s,t) = \dfrac{1}{n}\sum_{i=1}^{n} X_i(s) X_i(t).$

Obviously, the covariance function $K(s,t)$ defines a linear operator that maps a function $f$ to $Kf$, given by $(Kf)(s) = \int K(t,s) f(t)\,dt$. We assume that the linear operator with kernel $K(s,t)$ is positive definite. Let $\lambda_1 > \lambda_2 > \cdots > 0$ and $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_{n+1} = \cdots = 0$ be the ordered eigenvalue sequences of the linear operators with kernels $K(s,t)$ and $\hat{K}(s,t)$, respectively, and let $\{\phi_j\}$ and $\{\hat{\phi}_j\}$ be the corresponding orthonormal sequences of eigenfunctions; clearly, the functions $\phi_1, \phi_2, \ldots$ form an orthonormal basis of $H$. The spectral decompositions of the covariance functions $K(s,t)$ and $\hat{K}(s,t)$ are then as follows:

(5) $K(s,t) = \sum_{j=1}^{\infty} \lambda_j \phi_j(s)\phi_j(t), \qquad \hat{K}(s,t) = \sum_{j=1}^{\infty} \hat{\lambda}_j \hat{\phi}_j(s)\hat{\phi}_j(t).$

According to the Karhunen-Loève representation, the two $L^2$-valued functions $X(t)$ and $\alpha(t)$ have the following expansions:

(6) $X(t) = \sum_{i=1}^{\infty} \xi_i \phi_i(t), \qquad \alpha(t) = \sum_{j=1}^{\infty} \gamma_j \phi_j(t),$

where $\xi_i = \int_0^1 X(t)\phi_i(t)\,dt$ and $\gamma_j = \int_0^1 \alpha(t)\phi_j(t)\,dt$. It follows that the $\xi_i$ are uncorrelated random variables with mean 0 and variance $E(\xi_i^2) = \lambda_i$. For more details, see [23].
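In practice, the curves are observed on a grid, and the eigendecomposition of $\hat{K}$ is computed from the discretized integral operator. The following minimal sketch, under the assumption of a fine regular grid on $[0,1]$ and Riemann-sum quadrature, extracts the leading empirical eigenvalues, eigenfunctions, and scores $\langle X_i, \hat{\phi}_j \rangle$; the helper name fpc_scores is ours.

```python
import numpy as np

def fpc_scores(X, m):
    """Empirical FPC decomposition of curves X (n x T matrix on a regular grid of [0, 1]).

    Returns the leading m estimated eigenvalues, eigenfunctions phi_hat (T x m),
    and scores <X_i, phi_hat_j> approximated by Riemann sums.
    """
    n, T = X.shape
    dt = 1.0 / T                              # grid spacing, used as quadrature weight
    Khat = X.T @ X / n                        # empirical covariance K_hat(s, t) on the grid
    evals, evecs = np.linalg.eigh(Khat * dt)  # discretized integral operator
    idx = np.argsort(evals)[::-1][:m]         # leading m eigenpairs
    lam = evals[idx]
    phi = evecs[:, idx] / np.sqrt(dt)         # rescale so that int phi_j(t)^2 dt = 1
    scores = X @ phi * dt                     # V_i = <X_i, phi_hat_j>
    return lam, phi, scores
```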

Let $\Pi_i = (Z_{i1}B_1(U_i), \ldots, Z_{i1}B_K(U_i), \ldots, Z_{ip}B_1(U_i), \ldots, Z_{ip}B_K(U_i))^{\top}$ and $\eta = (\eta_1^{\top}, \ldots, \eta_p^{\top})^{\top}$ with $\eta_j = (\eta_{j1}, \ldots, \eta_{jK})^{\top}$. Based on equations (3)–(6), model (1) can be rewritten as follows:

(7) $Y_i \approx \Pi_i^{\top}\eta + \sum_{j=1}^{m} \gamma_j \langle \phi_j, X_i \rangle + \varepsilon_i, \quad i = 1, \ldots, n,$

where $m \leq n$ is the truncation level, which trades off approximation error against variability and typically diverges with $n$. Replacing $\phi_j$ by $\hat{\phi}_j$ for $j = 1, \ldots, m$, model (7) can be rewritten as follows:

(8) $Y_i \approx \Pi_i^{\top}\eta + V_i^{\top}\gamma + \varepsilon_i, \quad i = 1, \ldots, n,$

where $V_i = (\langle X_i, \hat{\phi}_1 \rangle, \ldots, \langle X_i, \hat{\phi}_m \rangle)^{\top}$ and $\gamma = (\gamma_1, \ldots, \gamma_m)^{\top}$. Thus, the objective function (2) becomes

(9) $Q_n(\eta, \gamma) = \sum_{i=1}^{n} \exp\left\{ -\dfrac{(Y_i - \Pi_i^{\top}\eta - V_i^{\top}\gamma)^2}{h} \right\}.$

Denote the maximizer of (9) by $(\hat{\eta}, \hat{\gamma})$. Consequently, the estimated component functions are given by $\hat{\beta}_j(u) = B^{\top}(u)\hat{\eta}_j$ and $\hat{\alpha}(t) = \sum_{j=1}^{m} \hat{\gamma}_j \hat{\phi}_j(t)$.
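To connect the pieces, the following sketch assembles the stacked regressors of (8) and evaluates the objective (9). It reuses the hypothetical helpers bspline_basis and fpc_scores sketched above; the tuning values kn, q, and m are illustrative defaults, not values prescribed by the paper.

```python
import numpy as np

def build_design(Z, U, X, kn=5, q=3, m=4):
    """Assemble the combined design matrix [Pi, V] from equations (7)-(8).

    Pi stacks Z_{ij} * B_k(U_i) over j = 1..p and k = 1..K; V holds the FPC scores.
    """
    B = bspline_basis(U, kn, q)                    # (n, K) spline basis at the index values
    n, p = Z.shape
    Pi = (Z[:, :, None] * B[:, None, :]).reshape(n, -1)   # (n, p*K), j-major, k-minor
    _, _, V = fpc_scores(X, m)                     # (n, m) scores <X_i, phi_hat_j>
    return np.hstack([Pi, V])

def Q_n(theta, Psi, Y, h):
    """ESL objective (9): sum_i exp{-(Y_i - Psi_i^T theta)^2 / h} (to be maximized)."""
    r = Y - Psi @ theta
    return np.sum(np.exp(-r**2 / h))
```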

3 Theoretical results

In this section, we aim to establish the theoretical properties of the resulting estimators. First, we need some definitions and assumptions. We denote the true functions in model (1) by $\alpha_0(\cdot)$ and $\beta_{0j}(\cdot)$, $j = 1, \ldots, p$. We use $C$ to denote a generic positive constant that can vary from line to line. $a_n \asymp b_n$ means that $a_n / b_n$ is bounded away from zero and infinity as $n \to \infty$. The notation $\|\cdot\|$ denotes the $L^2$ norm for a function or the Euclidean norm for a vector. Let $S = \langle X(t), \alpha_0(t) \rangle$, $\psi_h(\varsigma) = \exp(-\varsigma^2/h)$, $F(z,u,s,h) = E(\psi_h''(\varepsilon) \mid Z = z, U = u, S = s)$, and $G(z,u,s,h) = E(\{\psi_h'(\varepsilon)\}^2 \mid Z = z, U = u, S = s)$. The following regularity conditions will be required; they are common, and some can be found in [14] and [24].

  (C1) $E\|X(\cdot)\|^4 \leq C < \infty$; $E[\xi_i^4] \leq C\lambda_i^2$ for $i \geq 1$; and, for some $\iota > 2$, $E(|Z_j|^{\iota}) < \infty$ for $j = 1, \ldots, p$.

  (C2) The eigenvalues $\lambda_i$ and score coefficients $\gamma_j$ satisfy the following conditions, respectively.

    (a) There exist a constant $C$ and $a > 1$ such that

      $C^{-1} i^{-a} \leq \lambda_i \leq C i^{-a}, \qquad \lambda_i - \lambda_{i+1} \geq C^{-1} i^{-a-1}, \quad i \geq 1.$

    (b) There exist a constant $C$ and $b > a/2 + 1$ such that

      $|\gamma_j| \leq C j^{-b}, \quad j \geq 1.$

  (C3) The tuning parameter $m$ satisfies $m \asymp n^{1/(a+2b)}$.

  (C4) The density function $f_U(u)$ of $U$ is bounded away from 0 and infinity on $[0,1]$. Furthermore, we assume that $f_U(u)$ is continuously differentiable on $(0,1)$.

  (C5) $\beta_j(\cdot) \in \mathcal{H}_r$, $j = 1, \ldots, p$, for some $r > 2$, where $\mathcal{H}_r$ is the collection of all functions on $[0,1]$ whose $d$th order derivative satisfies the Hölder condition of order $\varsigma$, with $r = d + \varsigma$ and $0 < \varsigma \leq 1$.

  (C6) $F(z,u,s,h)$ and $G(z,u,s,h)$ are continuous with respect to $(z,u,s)$, and $F(z,u,s,h) < 0$ for any $h > 0$.

  (C7) $E(\psi_h'(\varepsilon) \mid z,u,s) = 0$, and $E(\{\psi_h'(\varepsilon)\}^2 \mid z,u,s)$, $E(\{\psi_h'(\varepsilon)\}^3 \mid z,u,s)$, and $E(\psi_h''(\varepsilon) \mid z,u,s)$ are continuous with respect to $(z,u,s)$.

Theorem 1

Suppose that the regularity conditions (C1)–(C7) hold and that the number of interior knots satisfies $k_n \asymp n^{1/(2r+1)}$. Then we have

$\|\hat{\alpha}(\cdot) - \alpha_0(\cdot)\|^2 = O_p\left( n^{-\frac{2r}{2r+1}} + n^{-\frac{2b-1}{a+2b}} \right), \qquad \|\hat{\beta}_j(\cdot) - \beta_{0j}(\cdot)\|^2 = O_p\left( n^{-\frac{2r}{2r+1}} + n^{-\frac{2b-1}{a+2b}} \right), \quad j = 1, \ldots, p.$

Remark 1

Theorem 1 derives the convergence rates of the estimators $\hat{\alpha}(\cdot)$ and $\hat{\beta}_j(\cdot)$. When we take $m \asymp k_n \asymp n^{1/(2r+1)}$ and $\frac{1}{2} + a < r \leq \frac{a+2b-1}{2}$, then $\|\hat{\beta}_j(\cdot) - \beta_{0j}(\cdot)\|^2 = O_p\left(n^{-\frac{2r}{2r+1}}\right)$ and $\|\hat{\alpha}(\cdot) - \alpha_0(\cdot)\|^2 = O_p\left(n^{-\frac{2r-a}{2r+1}}\right)$; that is, $\hat{\beta}_j(\cdot)$ attains the optimal global convergence rate, but $\hat{\alpha}(\cdot)$ only attains $O_p\left(n^{-\frac{2r-a}{2r+1}}\right)$, so the introduction of the varying coefficient component affects the convergence rate of the slope function $\alpha(\cdot)$. When we take $m \asymp k_n \asymp n^{1/(a+2b)}$ and $r > \frac{a+2b-1}{2}$, we obtain $\|\hat{\beta}_j(\cdot) - \beta_{0j}(\cdot)\|^2 = O_p\left(n^{-\frac{a+2b-1}{a+2b}}\right)$ and $\|\hat{\alpha}(\cdot) - \alpha_0(\cdot)\|^2 = O_p\left(n^{-\frac{2b-1}{a+2b}}\right)$, so $\hat{\alpha}(\cdot)$ attains the same convergence rate as the estimators of [10] and [15], which is optimal in the minimax sense. However, the $\hat{\beta}_j(\cdot)$ do not attain the optimal global convergence rate, so the convergence rates of the varying coefficient components are in turn affected by $\alpha(\cdot)$. When we take $k_n \asymp n^{1/(2r+1)}$, $m \asymp n^{1/(a+2b)}$, and $r = \frac{2b-1}{2(a+1)}$, both estimators $\hat{\alpha}(\cdot)$ and $\hat{\beta}_j(\cdot)$ attain the optimal global convergence rates.

4 Computational issue

4.1 Algorithm implementation

In this subsection, we apply the Newton-Raphson algorithm to maximize the objective function (9), since the ESL is smooth with a continuous second-order derivative. To be more specific, let $\Psi_i = (\Pi_i^{\top}, V_i^{\top})^{\top}$, $\theta = (\eta^{\top}, \gamma^{\top})^{\top}$, and $Q_n(\theta) = \sum_{i=1}^{n} \exp\{ -(Y_i - \Psi_i^{\top}\theta)^2 / h \}$. The iterative algorithm is outlined as follows:

  1. Start with an initial estimator $\hat{\theta}^{(0)}$.

  2. Update the estimator $\hat{\theta}^{(k)}$ of $\theta$ by the iterative procedure

    $\hat{\theta}^{(k+1)} = \hat{\theta}^{(k)} - \left[ \dfrac{\partial^2 Q_n(\theta)}{\partial \theta\, \partial \theta^{\top}} \Big|_{\theta = \hat{\theta}^{(k)}} \right]^{-1} \dfrac{\partial Q_n(\theta)}{\partial \theta} \Big|_{\theta = \hat{\theta}^{(k)}}.$

  3. Iterate Step 2 until convergence and denote the final estimator of $\theta$ by $\hat{\theta}$; a sketch of this iteration is given below.
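The gradient and Hessian of $Q_n$ are available in closed form: with $r_i = Y_i - \Psi_i^{\top}\theta$, $\partial Q_n/\partial\theta = \sum_i (2r_i/h)e^{-r_i^2/h}\Psi_i$ and $\partial^2 Q_n/\partial\theta\,\partial\theta^{\top} = -\sum_i (2/h)(1 - 2r_i^2/h)e^{-r_i^2/h}\Psi_i\Psi_i^{\top}$. The following is a minimal sketch of the Newton-Raphson step; a production implementation would add step halving and safeguards against a singular Hessian.

```python
import numpy as np

def esl_newton(Psi, Y, h, theta0, max_iter=100, tol=1e-8):
    """Newton-Raphson maximization of Q_n(theta) = sum_i exp{-(Y_i - Psi_i^T theta)^2 / h}."""
    theta = theta0.copy()
    for _ in range(max_iter):
        r = Y - Psi @ theta
        w = np.exp(-r**2 / h)
        grad = Psi.T @ (2.0 * r * w / h)                        # dQ_n / d theta
        hess = -(Psi * ((2.0 / h) * w * (1.0 - 2.0 * r**2 / h))[:, None]).T @ Psi
        step = np.linalg.solve(hess, grad)
        theta = theta - step                                    # theta^{(k+1)} = theta^{(k)} - H^{-1} g
        if np.linalg.norm(step) < tol:
            break
    return theta
```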

4.2 Selection of tuning parameter h

In this subsection, we address how to select the tuning parameter $h$ for the proposed estimators in practice. For simplicity, we further assume that $\varepsilon$ is independent of $(Z, U, X)$, so that $F$ and $G$ do not depend on $(z, u, s)$ and can be written as $F(h)$ and $G(h)$. Following the advice of [19], in implementation we estimate $F(h)$ and $G(h)$ by

$\hat{G}(h) = \dfrac{1}{n}\sum_{i=1}^{n} \dfrac{4}{h^2}\left\{ \hat{\varepsilon}_i \exp\left( -\dfrac{\hat{\varepsilon}_i^2}{h} \right) \right\}^2, \qquad \hat{F}(h) = \dfrac{1}{n}\sum_{i=1}^{n} \dfrac{4}{h^2} \exp\left( -\dfrac{\hat{\varepsilon}_i^2}{h} \right)\left( \hat{\varepsilon}_i^2 - \dfrac{h}{2} \right),$

where $\hat{\varepsilon}_i = Y_i - \Pi_i^{\top}\hat{\eta} - V_i^{\top}\hat{\gamma}$, and $\hat{\eta}$ and $\hat{\gamma}$ are pilot estimates. In practice, one can use existing methods such as mean regression or median regression to obtain these pilot estimates. The optimal $h$ can then be chosen as

$\hat{h}_{\mathrm{opt}} = \arg\min_h \left\{ \dfrac{\hat{G}(h)}{\hat{F}^2(h)\,\hat{\sigma}^2} \right\},$

where $\hat{\sigma}$ is estimated from the pilot estimator. We then use the grid search method recommended in [25] to find $h$: the grid points for $h$ are taken as $0.5\hat{\sigma} \times 1.02^{j}$, $j = 1, 2, \ldots, k$, for some fixed $k$, such as $k = 50$ or $k = 100$. In our numerical analysis, we set $k = 100$.
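A minimal sketch of this grid search is given below, taking pilot residuals and $\hat{\sigma}$ as inputs; it directly implements the $\hat{F}(h)$, $\hat{G}(h)$, and grid formulas above.

```python
import numpy as np

def select_h(resid, sigma_hat, k=100):
    """Grid search for h minimizing G_hat(h) / (F_hat(h)^2 * sigma_hat^2).

    resid are pilot residuals (e.g., from a preliminary mean- or median-regression fit);
    the grid 0.5 * sigma_hat * 1.02^j, j = 1..k, follows the text.
    """
    best_h, best_val = None, np.inf
    for j in range(1, k + 1):
        h = 0.5 * sigma_hat * 1.02**j
        e = np.exp(-resid**2 / h)
        G = np.mean(4.0 / h**2 * (resid * e)**2)
        F = np.mean(4.0 / h**2 * e * (resid**2 - h / 2.0))
        val = G / (F**2 * sigma_hat**2)
        if val < best_val:
            best_h, best_val = h, val
    return best_h
```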

4.3 Extra parameter selections

To achieve good numerical performance, one also needs to choose the number of interior knots $k_n$ and the truncation parameter $m$ appropriately. Many methods, such as the generalized cross-validation selector [26] and the BIC selector [27], can be used. In this work, we use cubic splines in all numerical examples and select $k_n$ and $m$ with a BIC-type criterion, that is,

$\mathrm{BIC}(k_n, m) = \log\left[ \sum_{i=1}^{n} \left( 1 - \exp\left\{ -\dfrac{(Y_i - \Pi_i^{\top}\hat{\eta} - V_i^{\top}\hat{\gamma})^2}{\hat{h}_{\mathrm{opt}}} \right\} \right) \right] + \dfrac{\log n}{n}\left( p(k_n + 4) + m \right),$

where $\hat{\eta}$ and $\hat{\gamma}$ are the proposed estimators obtained by maximizing (9) with $k_n$ interior knots and the first $m$ principal components. In this work, we select the truncation parameter and the number of interior knots by minimizing the above BIC criterion over the ranges $1 \leq m \leq 8$ and $[n^{1/9}] \leq k_n \leq [8n^{1/9}]$, respectively, where $[\varpi]$ denotes the largest integer not greater than $\varpi$.
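A sketch of this selection loop follows; build_design and the fitting routine are the hypothetical helpers sketched earlier (e.g., esl_newton), and the grids are supplied by the caller.

```python
import numpy as np
from itertools import product

def bic_select(Y, Z, U, X, h_opt, m_grid, kn_grid, fit):
    """BIC-type selection of (kn, m) per Section 4.3.

    `fit(Psi, Y, h)` is any routine returning the ESL estimate theta_hat;
    cubic splines are assumed (q = 3), so each coefficient function uses kn + 4
    basis functions, matching the p*(kn + 4) + m penalty term.
    """
    n, p = len(Y), Z.shape[1]
    best = (np.inf, None)
    for kn, m in product(kn_grid, m_grid):
        Psi = build_design(Z, U, X, kn=kn, q=3, m=m)
        theta = fit(Psi, Y, h_opt)
        loss = np.sum(1.0 - np.exp(-(Y - Psi @ theta)**2 / h_opt))
        bic = np.log(loss) + np.log(n) / n * (p * (kn + 4) + m)
        if bic < best[0]:
            best = (bic, (kn, m))
    return best[1]
```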

5 Simulation experiments

In this section, we conduct simulations to compare the finite-sample performance of our proposed estimation method (ESL) with two other methods: the quantile regression method [24] with quantile $\tau = 0.5$ (QR) and the least squares (LS) method [14]. We generate data from the following model:

$Y = Z_1\beta_1(U) + Z_2\beta_2(U) + Z_3\beta_3(U) + \int_{\mathcal{T}} \alpha(t) X(t)\,dt + \varepsilon,$

where the coefficient functions are $\beta_1(u) = 2\sin(2\pi u)$, $\beta_2(u) = 8u(1-u)$, and $\beta_3(u) = 5.5 + 10\exp(2u-1)$; the covariate vector is $Z = (Z_1, Z_2, Z_3)^{\top}$, with each $Z_i$, $i = 1, 2, 3$, following the standard normal distribution and the correlation between $Z_i$ and $Z_j$ being $0.5^{|i-j|}$; and $U \sim U(0,1)$. For the functional linear component, we take the same form as [21], that is, $\alpha(t) = \sqrt{2}\sin(\pi t/2) + 3\sqrt{2}\sin(3\pi t/2)$ and $X(t) = \sum_{j=1}^{50} \xi_j v_j(t)$, where the $\xi_j$ are independently normal with mean 0 and variance $\lambda_j = 10\left((j-0.5)\pi\right)^{-2}$, and $v_j(t) = \sqrt{2}\sin\left((j-0.5)\pi t\right)$. To examine the robustness and efficiency of our proposed method, the following three error distributions are considered:

Case 1. ε follows the standard normal distribution, abbreviated as N ( 0 , 1 ) .

Case 2. $\varepsilon$ follows the standard $t$ distribution with three degrees of freedom, abbreviated as $t(3)$, which produces a heavy-tailed distribution.

Case 3. $\varepsilon$ follows a contaminated normal distribution, abbreviated as $0.9N(0,1) + 0.1N(0,10)$: 10% of the data come from $N(0,10)$ and the remaining 90% from $N(0,1)$. Obviously, the 10% portion can be regarded as outliers.
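The following sketch generates one data set under our reading of this design; the functional term is approximated by a Riemann sum on a grid of $[0,1]$, and the grid size is our choice.

```python
import numpy as np

def simulate(n, error="normal", T_grid=101, rng=None):
    """Generate one sample from the simulation model of Section 5 (our reading of it)."""
    rng = np.random.default_rng(rng)
    t = np.linspace(0, 1, T_grid)
    # Correlated covariates Z with Corr(Z_i, Z_j) = 0.5^{|i-j|}
    Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
    Z = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
    U = rng.uniform(size=n)
    # X(t) = sum_{j<=50} xi_j v_j(t), xi_j ~ N(0, 10((j-0.5)pi)^{-2})
    j = np.arange(1, 51)
    lam = 10.0 / ((j - 0.5) * np.pi) ** 2
    xi = rng.normal(scale=np.sqrt(lam), size=(n, 50))
    V = np.sqrt(2) * np.sin(np.outer(t, (j - 0.5) * np.pi))   # (T_grid, 50)
    X = xi @ V.T                                              # (n, T_grid)
    alpha = np.sqrt(2) * np.sin(np.pi * t / 2) + 3 * np.sqrt(2) * np.sin(3 * np.pi * t / 2)
    beta = np.column_stack([2 * np.sin(2 * np.pi * U),
                            8 * U * (1 - U),
                            5.5 + 10 * np.exp(2 * U - 1)])
    if error == "normal":
        eps = rng.normal(size=n)
    elif error == "t3":
        eps = rng.standard_t(3, size=n)
    else:  # contaminated normal 0.9 N(0,1) + 0.1 N(0,10)
        mask = rng.uniform(size=n) < 0.1
        eps = np.where(mask, rng.normal(scale=np.sqrt(10), size=n), rng.normal(size=n))
    # Riemann approximation of the integral of alpha(t) X(t) over [0, 1]
    Y = np.sum(Z * beta, axis=1) + (X * alpha).mean(axis=1) + eps
    return Y, Z, U, X
```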

To evaluate the performance of the estimators of the slope function $\alpha(\cdot)$ and the coefficient functions $\beta_j(\cdot)$, we define the following square root of average squared errors (RASE):

$\mathrm{RASE}_0 = \left\{ \dfrac{1}{n_{1,\mathrm{grid}}} \sum_{i=1}^{n_{1,\mathrm{grid}}} (\hat{\alpha}(t_i) - \alpha(t_i))^2 \right\}^{1/2}, \qquad \mathrm{RASE}_j = \left\{ \dfrac{1}{n_{2,\mathrm{grid}}} \sum_{i=1}^{n_{2,\mathrm{grid}}} (\hat{\beta}_j(u_i) - \beta_j(u_i))^2 \right\}^{1/2}, \quad j = 1, 2, 3,$

where $\{t_i, i = 1, \ldots, n_{1,\mathrm{grid}}\}$ and $\{u_i, i = 1, \ldots, n_{2,\mathrm{grid}}\}$ are the grid points at which $\hat{\alpha}(\cdot)$ and $\hat{\beta}_j(\cdot)$ are evaluated, respectively. In our simulation experiments, we simply set $n_{1,\mathrm{grid}} = 201$ and take $n_{2,\mathrm{grid}}$ equal to the sample size $n$ in each scenario. A total of 200 replications with $n = 100, 200, 600$ are performed. The mean, median, and standard error (SE) of the RASE values for the three error distributions are reported in Tables 1–3. Several observations can be made from Tables 1–3. First, when the error follows a standard normal distribution, the LS estimate slightly outperforms the QR and ESL estimates, and ESL performs slightly better than QR. Second, ESL is significantly superior to LS in the other two cases, which is expected because LS is sensitive to heavy-tailed distributions and outliers; ESL also appears slightly better than QR in most cases. Third, for a given error distribution, all the considered estimators improve as the sample size $n$ increases. Figure 1 shows the average curve estimates of $\beta_j(\cdot)$ and $\alpha(\cdot)$ when $\varepsilon \sim 0.9N(0,1) + 0.1N(0,10)$ and $n = 100$; the plots for the other cases are similar and are omitted. All these results indicate that our proposed method works well under the considered settings. To further examine the three methods, we also consider the Cauchy error distribution; the results are given in Figure 2. LS performs poorly in this case, since the LS estimates break down and lose consistency when the error is Cauchy.

Table 1

Simulation results for comparison under N ( 0 , 1 ) error distribution when the sample size n = 100 , 200 , 600

n Method Statistic RASE_0 RASE_1 RASE_2 RASE_3
100 LS Mean 0.4443 0.2916 0.2635 0.2743
Median 0.4100 0.2823 0.2596 0.2628
SE 0.1733 0.0844 0.0902 0.0948
QR Mean 0.4477 0.3480 0.3194 0.3263
Median 0.4205 0.3261 0.3155 0.3227
SE 0.1619 0.1036 0.1126 0.1154
ESL Mean 0.4354 0.3274 0.2952 0.3012
Median 0.4105 0.3146 0.2868 0.2878
SE 0.1472 0.1028 0.1124 0.1061
200 LS Mean 0.2952 0.2137 0.1679 0.1665
Median 0.2810 0.2081 0.1590 0.1614
SE 0.0977 0.0491 0.0600 0.0582
QR Mean 0.3285 0.2473 0.2094 0.2004
Median 0.3022 0.2427 0.1992 0.1901
SE 0.1133 0.0628 0.0752 0.0673
ESL Mean 0.3146 0.2328 0.1865 0.1845
Median 0.2972 0.2294 0.1773 0.1777
SE 0.1033 0.0539 0.0711 0.0620
600 LS Mean 0.1728 0.1529 0.0928 0.0921
Median 0.1639 0.1505 0.0917 0.0887
SE 0.0631 0.0266 0.0305 0.0299
QR Mean 0.1805 0.1785 0.1127 0.1126
Median 0.1682 0.1748 0.1101 0.1113
SE 0.0668 0.0284 0.0360 0.0371
ESL Mean 0.1762 0.1696 0.0993 0.1003
Median 0.1670 0.1664 0.0984 0.0989
SE 0.0624 0.0196 0.0321 0.0320
Table 2

Simulation results for comparison under t ( 3 ) error distribution when the sample size n = 100 , 200 , 600

n Method Statistic RASE_0 RASE_1 RASE_2 RASE_3
100 LS Mean 0.5620 0.4460 0.4235 0.4098
Median 0.4624 0.4126 0.3908 0.3735
SE 0.3453 0.2233 0.2226 0.1807
QR Mean 0.4670 0.4122 0.3796 0.3815
Median 0.4487 0.3772 0.3554 0.3543
SE 0.1551 0.1505 0.1506 0.1388
ESL Mean 0.4696 0.3964 0.3752 0.3777
Median 0.4403 0.3666 0.3473 0.3443
SE 0.1628 0.1452 0.1627 0.1637
200 LS Mean 0.3350 0.2967 0.2652 0.2621
Median 0.3080 0.2810 0.2608 0.2452
SE 0.1280 0.0896 0.0951 0.1034
QR Mean 0.3101 0.2688 0.2351 0.2395
Median 0.2884 0.2634 0.2256 0.2250
SE 0.1006 0.0654 0.0798 0.0913
ESL Mean 0.3151 0.2615 0.2267 0.2238
Median 0.2882 0.2522 0.2249 0.2114
SE 0.1044 0.0721 0.0767 0.0852
600 LS Mean 0.1944 0.2007 0.1579 0.1506
Median 0.1783 0.1929 0.1523 0.1430
SE 0.0887 0.0628 0.0572 0.0534
QR Mean 0.1736 0.1852 0.1287 0.1269
Median 0.1582 0.1807 0.1259 0.1256
SE 0.0631 0.0280 0.0420 0.0451
ESL Mean 0.1729 0.1747 0.1173 0.1150
Median 0.1616 0.1714 0.1106 0.1130
SE 0.0618 0.0247 0.0387 0.0381
Table 3

Simulation results for comparison under 0.9 N ( 0 , 1 ) + 0.1 N ( 0 , 10 ) error distribution when the sample size n = 100 , 200 , 600

n Method Statistic RASE_0 RASE_1 RASE_2 RASE_3
100 LS Mean 0.8708 0.8677 0.7990 0.8095
Median 0.6436 0.7485 0.6906 0.7061
SE 0.7456 0.5025 0.4587 0.4369
QR Mean 0.4723 0.4329 0.4056 0.4057
Median 0.4469 0.3674 0.3851 0.3685
SE 0.1562 0.2029 0.1706 0.1859
ESL Mean 0.4591 0.3460 0.3182 0.3211
Median 0.4318 0.3023 0.2916 0.2973
SE 0.1732 0.1951 0.1416 0.1487
200 LS Mean 0.4754 0.5300 0.5423 0.5536
Median 0.4003 0.5009 0.5111 0.5247
SE 0.2881 0.1910 0.2462 0.2259
QR Mean 0.3145 0.2692 0.2295 0.2424
Median 0.2941 0.2645 0.2217 0.2360
SE 0.1022 0.0647 0.0800 0.0795
ESL Mean 0.3073 0.2268 0.1875 0.1960
Median 0.2836 0.2180 0.1799 0.1903
SE 0.1086 0.0520 0.0637 0.0695
600 LS Mean 0.2555 0.3250 0.2904 0.2970
Median 0.2344 0.3193 0.2843 0.2930
SE 0.1128 0.0859 0.1081 0.1156
QR Mean 0.1744 0.1898 0.1267 0.1278
Median 0.1677 0.1858 0.1238 0.1244
SE 0.0584 0.0286 0.0406 0.0415
ESL Mean 0.1700 0.1638 0.1026 0.1053
Median 0.1596 0.1608 0.1008 0.1025
SE 0.0609 0.0271 0.0326 0.0351
Figure 1

The estimated curves of the functions $\beta_j(u)$, $j = 1, 2, 3$, and $\alpha(t)$ when $\varepsilon \sim 0.9N(0,1) + 0.1N(0,10)$ and $n = 100$.

Figure 2

The estimated curves of the functions $\beta_j(u)$, $j = 1, 2, 3$, and $\alpha(t)$ when $\varepsilon \sim$ Cauchy and $n = 100$.

To further compare the efficiency of the LS, QR, and ESL estimators of the nonparametric functions $\hat{g} = (\hat{\alpha}(\cdot), \hat{\beta}_1(\cdot), \hat{\beta}_2(\cdot), \hat{\beta}_3(\cdot))$, we consider the ratio of average squared errors (RT) defined as

$\mathrm{ASE}(\hat{g}) = \dfrac{1}{n_{1,\mathrm{grid}}} \sum_{i=1}^{n_{1,\mathrm{grid}}} (\hat{\alpha}(t_i) - \alpha(t_i))^2 + \dfrac{1}{n_{2,\mathrm{grid}}} \sum_{j=1}^{3} \sum_{i=1}^{n_{2,\mathrm{grid}}} (\hat{\beta}_j(u_i) - \beta_j(u_i))^2, \qquad \mathrm{RT}(\hat{g}) = \dfrac{\mathrm{ASE}(\hat{g}_{\mathrm{LS}})}{\mathrm{ASE}(\hat{g})},$

where g ˆ LS denotes the LS estimate of g , and g ˆ could be taken as the QR or ESL estimate. The corresponding results are summarized in Table 4.

Table 4

Mean values of the RT values

n Method N(0,1) t(3) 0.9N(0,1)+0.1N(0,10)
100 LS 1 1 1
QR 0.8919 1.1438 2.0084
ESL 0.9515 1.1743 2.4272
200 LS 1 1 1
QR 0.8646 1.1153 2.0176
ESL 0.9245 1.1504 2.3424
600 LS 1 1 1
QR 0.8821 1.1553 1.9079
ESL 0.9435 1.2241 2.1859

From Table 4, we see that the RTs of the ESL method are slightly smaller than 1 (though larger than those of the QR method) when the error is normally distributed; for the other two error distributions, the RTs of the ESL method exceed 1, which indicates that our proposed estimation method can significantly improve estimation efficiency.

6 A real data analysis

For illustration purposes, we apply the proposed methodology to analyze spectrometric data. The data set contains 215 chopped meat samples; each sample records the fat, protein, and moisture contents together with a spectral curve. The three contents are measured in percent and determined by analytic chemistry, and the spectral curve consists of absorbance records at 100 wavelengths. Our main interest is to explore the relationship between the fat content and the other factors. In the subsequent analysis, we treat the fat content as the response variable $Y$, the protein content as $Z$, the moisture content as $U$, and the spectral curve as $X(t)$. Beforehand, we standardize the response variable $Y$ and the two other contents $Z$ and $U$. As in [17], we use the following VCPFLRM to fit the data:

(10) $Y = \beta_0(U) + Z\beta_1(U) + \int_{850}^{1050} \alpha(t) X(t)\,dt + \varepsilon.$

For a fair evaluation, 129 randomly selected samples are used as training data to build the model, and the remaining 86 observations serve as validation data for prediction. Prediction performance is measured by the mean squared error proposed by [28], defined as $\mathrm{MSE} = \frac{1}{86}\sum_{i \in G}(Y_i - \hat{Y}_i)^2$, where $G$ is the validation data set. To verify the proposed robust estimation procedure, we re-analyze this data set after introducing some outliers into the training responses; specifically, $\iota$ of the training responses $Y$ are replaced by $Y + \kappa$ to act as high-leverage outliers (Case 1: $\kappa = 5$, $\iota = 10$; Case 2: $\kappa = 10$, $\iota = 10$; Case 3: $\kappa = 5$, $\iota = 20$). We then apply the three estimation procedures (LS, QR with quantile $\tau = 0.5$, and ESL) to the data set using the VCPFLRM (10). Based on 200 repetitions, the mean MSE values of the ESL, QR, and LS methods are 0.022, 0.023, and 0.706 for Case 1; 0.019, 0.032, and 2.378 for Case 2; and 0.035, 0.042, and 1.323 for Case 3, respectively. Consequently, our proposed ESL method outperforms the others in terms of prediction when the data set contains outliers. For comparison, we also fit the data with a functional linear regression model, considering only Case 2, that is, 10 high-leverage outliers in the training responses; over 200 repetitions, the mean MSE values based on ESL, QR, and LS are 0.067, 0.102, and 3.187. This result shows that our proposed model and estimation method can improve prediction accuracy when the responses are subject to large outliers.

7 Concluding remarks

In this article, we provide a robust and efficient estimation approach for VCPFLRMs based on the ESL function. The varying coefficients are approximated by B-splines, and the slope function is estimated by the FPC basis. The theoretical properties of the proposed estimators are established under some conditions. Furthermore, we develop a computational algorithm to solve the optimization problem and introduce a data-driven procedure to select the tuning parameters. The outstanding feature of the newly proposed method is that it can achieve good performance for a wide class of errors by selecting an appropriate tuning parameter $h$ based on the observed data. The good properties of this method have also been demonstrated through both numerical examples and real data analysis. As discussed in [28], FPC-based methods select data-driven basis functions relying on the decomposition of the covariance function of the functional predictor $X(t)$. To compensate for this defect, one could use B-spline basis functions to approximate the slope function, given their efficiency in function approximation and numerical computation. In addition, the method proposed here may be extended to other functional regression models, which could be considered in future research.

Acknowledgements

The authors are grateful to the editor, the associate editor, and the referees, whose comments and suggestions have greatly helped improve this article.

Conflict of interest: The authors state no conflict of interest.

Appendix

In the proofs, we use C to denote a generic constant that might assume different values at different places.

Proof of Theorem 1

Suppose $B^{\top}(u)\eta_{0j}$ is the best approximating spline function for $\beta_j(u)$. Under conditions (C4) and (C5), there is a constant $C$ such that

$\sup_{u \in [0,1]} |\beta_j(u) - B^{\top}(u)\eta_{0j}| \leq C k_n^{-r}.$

Let $\delta = n^{-r/(2r+1)} + n^{-(2b-1)/[2(a+2b)]}$, $s_1 = \delta^{-1}(\eta - \eta_0)$, and $s_2 = \delta^{-1}(\gamma - \gamma_0)$, with $\eta_0 = (\eta_{01}^{\top}, \ldots, \eta_{0p}^{\top})^{\top}$, $s = (s_1^{\top}, s_2^{\top})^{\top}$, and $\theta_0 = (\eta_0^{\top}, \gamma_0^{\top})^{\top}$. We want to show that for any given $\varrho > 0$, there exists a sufficiently large constant $C$ such that

(A1) $P\left\{ \sup_{\|s\| = C} Q_n(\theta_0 + \delta s) < Q_n(\theta_0) \right\} \geq 1 - \varrho.$

This implies that, with probability tending to 1, there is a local maximizer in the ball $\{\theta_0 + \delta s : \|s\| \leq C\}$. Using a Taylor expansion, we have

$Q_n(\theta_0 + \delta s) - Q_n(\theta_0) = -\delta \sum_{i=1}^{n} \psi_h'\{\varepsilon_i + Z_i^{\top}R_{1i} + R_{2i}\}\Omega_i + \dfrac{1}{2}\delta^2 \sum_{i=1}^{n} \psi_h''\{\varepsilon_i + Z_i^{\top}R_{1i} + R_{2i}\}\Omega_i^2 - \dfrac{1}{6}\delta^3 \sum_{i=1}^{n} \psi_h'''\{\zeta_i\}\Omega_i^3 \equiv I_1 + I_2 + I_3,$

where $\zeta_i$ lies between $\varepsilon_i + Z_i^{\top}R_{1i} + R_{2i}$ and $\varepsilon_i + Z_i^{\top}R_{1i} + R_{2i} - \delta\Omega_i$, $\Omega_i = \Pi_i^{\top}s_1 + V_i^{\top}s_2$, $R_{1i} = (\beta_{01}(U_i) - B^{\top}(U_i)\eta_{01}, \ldots, \beta_{0p}(U_i) - B^{\top}(U_i)\eta_{0p})^{\top}$ so that $Z_i^{\top}R_{1i} = \sum_{j=1}^{p} Z_{ij}\beta_{0j}(U_i) - \Pi_i^{\top}\eta_0$, and $R_{2i} = \langle X_i(t), \alpha_0(t) \rangle - V_i^{\top}\gamma_0$. From conditions (C4) and (C5) and Corollary 6.21 in [29], we can derive that $\|R_{1i}\| = O_p(k_n^{-r})$.

Next, we consider $R_{2i}$. By $\|\phi_j - \hat{\phi}_j\|^2 = O_p(n^{-1}j^2)$ (see, e.g., [15,21]), one has

$R_{2i}^2 = \left( \int_0^1 X_i(t)\alpha_0(t)\,dt - V_i^{\top}\gamma_0 \right)^2 \leq 2\left( \sum_{j=1}^{m} \langle X_i, \phi_j - \hat{\phi}_j \rangle \gamma_{0j} \right)^2 + 2\left( \sum_{j=m+1}^{\infty} \langle X_i, \phi_j \rangle \gamma_{0j} \right)^2 \equiv 2A_1 + 2A_2.$

For $A_1$, by conditions (C1) and (C2) and the Hölder inequality, we obtain

$A_1 = \left( \sum_{j=1}^{m} \langle X_i, \phi_j - \hat{\phi}_j \rangle \gamma_{0j} \right)^2 \leq c\, m \sum_{j=1}^{m} \|\phi_j - \hat{\phi}_j\|^2 \gamma_{0j}^2 \leq c\, m \sum_{j=1}^{m} O_p(n^{-1} j^{2-2b}) = O_p\left( n^{-\frac{a+4b-4}{a+2b}} \right) = o_p\left( n^{-\frac{2b-1}{a+2b}} \right).$

As for $A_2$, since

$E\left( \sum_{j=m+1}^{\infty} \langle X_i, \phi_j \rangle \gamma_{0j} \right) = 0, \qquad \operatorname{Var}\left( \sum_{j=m+1}^{\infty} \langle X_i, \phi_j \rangle \gamma_{0j} \right) = \sum_{j=m+1}^{\infty} \lambda_j \gamma_{0j}^2 \leq c \sum_{j=m+1}^{\infty} j^{-(a+2b)} = O\left( n^{-\frac{a+2b-1}{a+2b}} \right),$

one has $A_2 = O_p\left( n^{-\frac{a+2b-1}{a+2b}} \right)$. Taking these together, we have $R_{2i}^2 = O_p\left( n^{-\frac{2b-1}{a+2b}} \right)$.

Moreover, by a further Taylor expansion, we have

$\sum_{i=1}^{n} \psi_h'\{\varepsilon_i + Z_i^{\top}R_{1i} + R_{2i}\}\Omega_i = \sum_{i=1}^{n} \left\{ \psi_h'(\varepsilon_i) + \psi_h''(\varepsilon_i)[Z_i^{\top}R_{1i} + R_{2i}] + \dfrac{1}{2}\psi_h'''(\varepsilon_i^{*})[Z_i^{\top}R_{1i} + R_{2i}]^2 \right\}\Omega_i,$

where $\varepsilon_i^{*}$ lies between $\varepsilon_i$ and $\varepsilon_i + Z_i^{\top}R_{1i} + R_{2i}$. Invoking conditions (C1) and (C7), after some direct calculations we obtain $I_1 = O_p(n\delta^2\|s\|)$. For $I_2$, we can show that $I_2 = E\{F(Z, U, S, h)\}\, O_p(n\delta^2\|s\|^2)$; therefore, by choosing a sufficiently large $C$, $I_2$ dominates $I_1$ uniformly in $\|s\| = C$. Similarly, we can show that $I_3 = O_p(n\delta^3\|s\|^3)$. Since $k_n \asymp n^{1/(2r+1)}$ and $m \asymp n^{1/(a+2b)}$, it follows that $\delta\|s\| \to 0$ with $\|s\| = C$, which leads to $I_3 = o_p(I_2)$; hence $I_3$ is also dominated by $I_2$ on $\|s\| = C$. Therefore, by choosing a sufficiently large $C$ together with condition (C6), (A1) holds, and there exist local maximizers $\hat{\eta}$ and $\hat{\gamma}$ such that $\|\hat{\eta} - \eta_0\| = O_p(\delta)$ and $\|\hat{\gamma} - \gamma_0\| = O_p(\delta)$.

Let $\Xi = \int_0^1 B(u)B^{\top}(u)\,du$ and $R_{1j}(u) = \beta_{0j}(u) - B^{\top}(u)\eta_{0j}$. We have

$\|\hat{\beta}_j(\cdot) - \beta_{0j}(\cdot)\|^2 = \int_0^1 (\hat{\beta}_j(u) - \beta_{0j}(u))^2\,du = \int_0^1 \{B^{\top}(u)\hat{\eta}_j - B^{\top}(u)\eta_{0j} - R_{1j}(u)\}^2\,du \leq 2\int_0^1 \{B^{\top}(u)(\hat{\eta}_j - \eta_{0j})\}^2\,du + 2\int_0^1 R_{1j}(u)^2\,du = 2(\hat{\eta}_j - \eta_{0j})^{\top}\Xi(\hat{\eta}_j - \eta_{0j}) + 2\int_0^1 R_{1j}(u)^2\,du.$

Then, invoking $\|\Xi\| = O(1)$, a simple calculation yields $(\hat{\eta}_j - \eta_{0j})^{\top}\Xi(\hat{\eta}_j - \eta_{0j}) = O_p(\delta^2)$. In addition, it is easy to show that $\int_0^1 R_{1j}(u)^2\,du = O_p\left( n^{-\frac{2r}{2r+1}} \right)$. Thus, we have $\|\hat{\beta}_j(\cdot) - \beta_{0j}(\cdot)\|^2 = O_p(\delta^2)$.

Observe that

$\|\hat{\alpha}(t) - \alpha_0(t)\|^2 = \left\| \sum_{j=1}^{m} \hat{\gamma}_j \hat{\phi}_j - \sum_{j=1}^{\infty} \gamma_{0j}\phi_j \right\|^2 \leq 2\left\| \sum_{j=1}^{m} \hat{\gamma}_j \hat{\phi}_j - \sum_{j=1}^{m} \gamma_{0j}\phi_j \right\|^2 + 2\left\| \sum_{j=m+1}^{\infty} \gamma_{0j}\phi_j \right\|^2 \leq 4\left\| \sum_{j=1}^{m} (\hat{\gamma}_j - \gamma_{0j})\hat{\phi}_j \right\|^2 + 4\left\| \sum_{j=1}^{m} \gamma_{0j}(\hat{\phi}_j - \phi_j) \right\|^2 + 2\sum_{j=m+1}^{\infty} \gamma_{0j}^2 \equiv 4J_1 + 4J_2 + 2J_3.$

By condition (C2), the orthonormality of $\{\hat{\phi}_j\}$, and $\|\phi_j - \hat{\phi}_j\|^2 = O_p(n^{-1}j^2)$, one has

(A2) $J_1 = \left\| \sum_{j=1}^{m} (\hat{\gamma}_j - \gamma_{0j})\hat{\phi}_j \right\|^2 = \sum_{j=1}^{m} |\hat{\gamma}_j - \gamma_{0j}|^2 \leq \|\hat{\gamma} - \gamma_0\|^2 = O_p(\delta^2),$

(A3) $J_2 = \left\| \sum_{j=1}^{m} \gamma_{0j}(\hat{\phi}_j - \phi_j) \right\|^2 \leq m \sum_{j=1}^{m} \|\hat{\phi}_j - \phi_j\|^2 \gamma_{0j}^2 \leq \dfrac{m}{n}\, O_p\left( \sum_{j=1}^{m} j^2 \gamma_{0j}^2 \right) = O_p\left( n^{-1} m \sum_{j=1}^{m} j^{2-2b} \right) = O_p(n^{-1}m) = o_p\left( n^{-\frac{2b-1}{a+2b}} \right),$

and

(A4) $J_3 = \sum_{j=m+1}^{\infty} \gamma_{0j}^2 \leq C \sum_{j=m+1}^{\infty} j^{-2b} = O\left( n^{-\frac{2b-1}{a+2b}} \right).$

Then, combining equations (A2)–(A4), we have $\|\hat{\alpha}(t) - \alpha_0(t)\|^2 = O_p(\delta^2) = O_p\left( n^{-\frac{2r}{2r+1}} + n^{-\frac{2b-1}{a+2b}} \right)$. This completes the proof of Theorem 1.□

References

[1] J. Fan and W. Zhang, Statistical methods with varying coefficient models, Stat. Interface 1 (2008), 179–195, https://dx.doi.org/10.4310/SII.2008.v1.n1.a15.

[2] L. Feng, C. Zou, Z. Wang, X. Wei, and B. Chen, Robust spline-based variable selection in varying coefficient model, Metrika 78 (2015), 85–118, https://doi.org/10.1007/s00184-014-0491-y.

[3] C. Guo, H. Yang, and J. Lv, Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression, Statist. Papers 58 (2017), 1009–1033, https://doi.org/10.1007/s00362-015-0736-5.

[4] T. Hastie and R. Tibshirani, Varying-coefficient models, J. R. Stat. Soc. Ser. B. Methodol. 55 (1993), no. 4, 757–796, https://doi.org/10.1111/j.2517-6161.1993.tb01939.x.

[5] J. Huang, C. Wu, and L. Zhou, Varying-coefficient models and basis function approximations for the analysis of repeated measurements, Biometrika 89 (2002), no. 1, 111–128, https://doi.org/10.1093/biomet/89.1.111.

[6] H. Wang and Y. Xia, Shrinkage estimation of the varying coefficient model, J. Amer. Statist. Assoc. 104 (2009), no. 486, 747–757, https://doi.org/10.1198/jasa.2009.0138.

[7] T. Cai and P. Hall, Prediction in functional linear regression, Ann. Statist. 34 (2006), no. 5, 2159–2179, https://doi.org/10.1214/009053606000000830.

[8] T. Cai and M. Yuan, Minimax and adaptive prediction for functional linear regression, J. Amer. Statist. Assoc. 107 (2012), no. 499, 1201–1216, https://doi.org/10.1080/01621459.2012.716337.

[9] H. Cardot, F. Ferraty, and P. Sarda, Spline estimators for the functional linear model, Statist. Sinica 13 (2003), no. 3, 571–591, http://www.jstor.org/stable/24307112.

[10] P. Hall and J. Horowitz, Methodology and convergence rates for functional linear regression, Ann. Statist. 35 (2007), no. 1, 70–91, https://doi.org/10.1214/009053606000000957.

[11] K. Kato, Estimation in functional linear quantile regression, Ann. Statist. 40 (2012), no. 6, 3108–3136, https://doi.org/10.1214/12-AOS1066.

[12] J. Lei, Adaptive global testing for functional linear models, J. Amer. Statist. Assoc. 109 (2014), no. 506, 624–634, https://doi.org/10.1080/01621459.2013.856794.

[13] F. Yao, H. Müller, and J. Wang, Functional linear regression analysis for longitudinal data, Ann. Statist. 33 (2005), no. 6, 2873–2903, https://doi.org/10.1214/009053605000000660.

[14] Q. Peng, J. Zhou, and N. Tang, Varying coefficient partially functional linear regression models, Statist. Papers 57 (2016), 827–841, https://doi.org/10.1007/s00362-015-0681-3.

[15] H. Shin, Partial functional linear regression, J. Statist. Plann. Inference 139 (2009), no. 10, 3405–3418, https://doi.org/10.1016/j.jspi.2009.03.001.

[16] J. Zhou and M. Chen, Spline estimators for semi-functional linear model, Statist. Probab. Lett. 82 (2012), no. 3, 505–513, https://doi.org/10.1016/j.spl.2011.11.027.

[17] S. Feng and L. Xue, Partially functional linear varying coefficient model, Statistics 50 (2016), no. 4, 717–732, https://doi.org/10.1080/02331888.2016.1138954.

[18] X. Wang, Y. Jiang, M. Huang, and H. Zhang, Robust variable selection with exponential squared loss, J. Amer. Statist. Assoc. 108 (2013), no. 502, 632–643, https://doi.org/10.1080/01621459.2013.766613.

[19] Y. Jiang, Q. Ji, and B. Xie, Robust estimation for the varying coefficient partially nonlinear models, J. Comput. Appl. Math. 326 (2017), 31–43, https://doi.org/10.1016/j.cam.2017.04.028.

[20] Y. Song, L. Jian, and L. Lin, Robust exponential squared loss-based variable selection for high-dimensional single-index varying-coefficient model, J. Comput. Appl. Math. 308 (2016), 330–345, https://doi.org/10.1016/j.cam.2016.05.030.

[21] P. Yu, Z. Zhu, and Z. Zhang, Robust exponential squared loss-based estimation in semi-functional linear regression models, Comput. Stat. 34 (2019), 503–525, https://doi.org/10.1007/s00180-018-0810-2.

[22] J. Lv, H. Yang, and C. Guo, Robust smooth-threshold estimating equations for generalized varying-coefficient partially linear models based on exponential score function, J. Comput. Appl. Math. 280 (2015), 125–140, https://doi.org/10.1016/j.cam.2014.11.003.

[23] L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, Springer, New York, 2012, https://doi.org/10.1007/978-1-4614-3655-3.

[24] P. Yu, J. Du, and Z. Zhang, Varying-coefficient partially functional linear quantile regression models, J. Korean Stat. Soc. 46 (2017), no. 3, 462–475, https://doi.org/10.1016/j.jkss.2017.02.001.

[25] W. Yao, B. Lindsay, and R. Li, Local modal regression, J. Nonparametr. Stat. 24 (2012), no. 3, 647–663, https://doi.org/10.1080/10485252.2012.678848.

[26] J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. 96 (2001), no. 456, 1348–1360, https://doi.org/10.1198/016214501753382273.

[27] H. Wang, R. Li, and C. Tsai, Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika 94 (2007), no. 3, 553–568, https://doi.org/10.1093/biomet/asm053.

[28] L. Huang, H. Wang, H. Cui, and S. Wang, Sieve M-estimator for a semi-functional linear model, Sci. China Math. 58 (2015), no. 11, 2421–2434, https://doi.org/10.1007/s11425-015-5040-2.

[29] L. Schumaker, Spline Functions: Basic Theory, Wiley, New York, 1981.

Received: 2021-04-28
Revised: 2022-04-23
Accepted: 2022-09-11
Published Online: 2022-10-11

© 2022 Jun Sun and Wanrong Liu, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.