Accessible Published by De Gruyter February 21, 2014

Locally Efficient Estimation of Marginal Treatment Effects When Outcomes Are Correlated: Is the Prize Worth the Chase?

Alisa Stephens, Eric Tchetgen Tchetgen and Victor De Gruttola


Semiparametric methods have been developed to increase efficiency of inferences in randomized trials by incorporating baseline covariates. Locally efficient estimators of marginal treatment effects, which achieve minimum variance under an assumed model, are available for settings in which outcomes are independent. The value of the pursuit of locally efficient estimators in other settings, such as when outcomes are multivariate, is often debated. We derive and evaluate semiparametric locally efficient estimators of marginal mean treatment effects when outcomes are correlated; such outcomes occur in randomized studies with clustered or repeated-measures responses. The resulting estimating equations modify existing generalized estimating equations (GEE) by identifying the efficient score under a mean model for marginal effects when data contain baseline covariates. Locally efficient estimators are implemented for longitudinal data with continuous outcomes and clustered data with binary outcomes. Methods are illustrated through application to AIDS Clinical Trial Group Study 398, a longitudinal randomized clinical trial that compared the effects of various protease inhibitors in HIV-positive subjects who had experienced antiretroviral therapy failure. In addition, extensive simulation studies characterize settings in which locally efficient estimators result in efficiency gains over suboptimal estimators and assess their feasibility in practice.

1 Introduction

Semiparametric estimators are appealing because of their robustness to distributional assumptions and model misspecification. In the analysis of randomized trials, semiparametric theory has been used to develop estimators of treatment effects that improve efficiency of inferences by incorporating baseline covariates, where “baseline” describes data measured prior to randomization. In this paper, we present a semiparametric locally efficient estimator to improve efficiency of inferences in randomized experiments with correlated outcomes when baseline covariates are available. We begin with a review of current estimators for multivariate outcomes and then introduce our semiparametric locally efficient estimator.

Correlated outcomes are often observed in medical research studies, such as those that randomize clusters of subjects or that randomize individual subjects but collect repeated measures of the response. We denote the outcome for the ith independent randomized unit, i=1,,m, in such studies by the ni-dimensional response vector Yi=(Yi1,Yi2,,Yini)T, which may represent longitudinal measurements from a single subject or a set of responses from subjects within a cluster defined by a family, hospital, or class. Considering the substantial costs incurred by such studies, it is of interest to maximize efficiency in the estimation of treatment effects using all available data.

In general, studies collect data on i.i.d. observations Oi=(Yi,Ai,Xi), where Ai denotes a scalar treatment assignment to 1 of K possible treatments, and Xi is a matrix of baseline covariates. Throughout we allow ni to be fixed or random and assume ignorability when ni is random. Longitudinal data also include a time variable ti=(ti1,ti2,,tini)T denoting time points at which outcomes are measured. As in the case of unit size ni, we allow ti to be either fixed or random but ignorable. When repeated measures are taken on the same subject, baseline covariates are measured at tij=0; thus Xij=Xi for all j=1,2,,ni, resulting in a single level of baseline covariate information. Clustered data, however, may include pre-treatment covariates at the level of the group or the individual, creating two layers of auxiliary data. In the longitudinal context, we refer to the vector Yi as the subject, or independent unit and Yij as observation- or measurement-level data. For clustered data, we refer to Yi as cluster-level and Yij as individual-level observations.

Semiparametric estimation often involves specifying a restricted mean model. When estimating marginal treatment effects, a model for the expected outcomes given treatment assignment is usually assumed. Consequently, only data on the treatment and outcome are used in estimation. For example, in longitudinal studies, the marginal effect of treatment over time may be measured by assuming the restricted mean model


where f1(tij) is a function of ti. The main effect βA, which measures imbalance in E(Yij|Ai,tij) at baseline, is expected to be zero when randomization successfully balances covariate profiles across treatment arms. The post-baseline effect of treatment is measured by βA,t. Parameters βt and βA,t may be vector-valued, as the function describing the effect of time on expected outcomes may be of some polynomial form. Similarly, for clustered data, the semiparametric model


may be assumed, with treatment effects determined by inference on β1.

Estimating equations are determined by geometric arguments that distinguish parameters of interest, such as the treatment-outcome association (β) in the context of randomized studies, from other components needed to fully specify the data-generating distribution, which are represented by η. For parameters of interest, we aim to derive regular asymptotically linear (RAL) estimators, where an asymptotically linear estimator βˆ is one for which there exists a function ψ(Oi) such that


Regularity conditions ensure that variance bounds are well-defined and exclude superefficient estimators that have undesirable properties under local alternatives [1]. The function ψ(Oi) is called the influence function of βˆ and determines its limiting distribution. As eq. (3) suggests, any RAL estimator may be obtained by solving an influence function equation. To derive the class of estimating functions under an assumed model M, one first defines the nuisance scores log(Lη)/η for the data-generating distribution Lη; one then determines the subspace defined by the closed linear span of all scores of smooth parametric submodels Lη in model M. This nuisance tangent space is denoted by Λnuis [2]. The orthogonal complement of Λnuis, Λnuis defines the set {ψ(h):h}, indexed by h, which contains the set of influence functions of all RAL estimators [2, 3]. For correlated outcomes, the geometric arguments of semiparametric theory may be viewed as a generalization of the quasilikelihood approach of Liang and Zeger [4] in deriving generalized estimating equations (GEE). We denote as M1 the set of distributions of Wi=(Yi,Ai) with known treatment process satisfying eq. (1). Under model M1, Λnuis1=ψ(Wi;h,β)=h(Ai,ti){Yig(Ai,ti;β)} defines the estimating equations


for estimating the p-dimensional vector β. The index or weight h(Ai,ti) is a p×ni function of a random treatment variable Ai and time ti, and g(Ai,ti;β)={g(Ai,t0;β),g(Ai,t1;β),,g(Ai,tni;β)}T. We use bold g(Ai,ti;β) to denote the vector-valued mean function and g(Ai,tij;β) to represent its scalar components.

A locally efficient estimator of a semiparametric model is defined as an estimator that achieves the semiparametric efficiency bound (minimum asymptotic variance among all RAL estimators) at a given submodel for the data-generating law, but remains consistent outside the data-generating submodel [2], provided that the marginal model is correct. More explicitly, semiparametric models parametrize specific components of a data-generating process and leave others unspecified. Estimation may require working models of unspecified components; a locally efficient estimator achieves the semiparametric efficiency bound when such working models are correctly specified, but also remains consistent when only the parametric component is correctly specified. A semiparametric locally efficient estimator is determined by finding the optimal estimating function, referred to as the efficient score, which for GEE requires finding the optimal h(). When no baseline covariates are observed, Chamberlain [5] showed that the efficient score of β is obtained by setting h(Ai,ti)=DiTVi1, where Vi is the ni×ni variance–covariance matrix of Yi, and Di=g(Ai,ti;β)β. The estimator remains consistent, however, when a working covariance other than the true covariance is substituted into the estimating equations, thereby demonstrating that consistency is achievable outside of the data-generating law.

For model M2, defined by observations Oi, marginal model (1), and known treatment process, the set of influence functions was derived by Robins et al. [6] and arises by augmenting the influence functions of β under model M1. Augmented estimators are constructed by subtracting the orthogonal projection of the standard estimating function onto the span of the scores of the treatment mechanism from the standard estimating function [6, 7]. For correlated outcomes, Λnuis2={ψ(Oi,h,γ,β):h,γ}, and augmented GEE are


where for K-level treatment Ai, P(Ai=a)=πa. Fixing h(Ai,ti), the most efficient estimating function sets γk(Xi)=γkopt(Xi)=h(k,t)E(Y|Ai=k,Xi,t)g(k,t;β) [6, 810]. The augmentation therefore involves estimation of the conditional mean outcome regression model E(Yi|Xi,Ai), which may be related the marginal parameter of interest in the binary treatment setting by β=EE(Yi|Xi,Ai=1)E(Yi|Xi,Ai=0) under an identity link and β=logit[EE(Yi|Xi,Ai=1)]logit[E{E(Yi|Xi,Ai=0)}] under the logit link. When baseline covariates are predictive of the outcome augmentation reduces variability in estimated treatment effects, irrespective of the outcome distribution. For the longitudinal marginal model (1), if outcomes Yij are restricted to post-baseline measurements, the baseline measurement Yi0 may be utilized as a baseline covariate and included in Xi. The βA,t term is then no longer required to assess a post-baseline effect of treatment and may be removed from the model, leaving βA to capture the marginal treatment effect. The interaction term βA,t may still be required for correct model specification even when the baseline outcome is included as a covariate if the treatment effect varies in time.

Semiparametric locally efficient estimators of parameters in restricted mean models of marginal treatment effects have been implemented for univariate data in the presence of baseline covariates by Robins [8], Bang and Robins [11], van der Laan and Rubin [12], Tsiatis et al. [13], Zhang et al. [10], and Moore and van der Laan [14, 15]. In these developments, the choice of h() has no impact on the resulting asymptotic variance and is therefore not considered for deriving efficient estimators. For a univariate outcome, the model gs(Ai;β)=E[Yi|Ai] defined by a unique parameter for each treatment level is saturated, and the choice of h() is inconsequential. When Yi is multivariate, gs(Ai;β)=E[Yij|Ai] is not saturated because a single parameter β is shared across components of the vector E[Yi|Ai]. As a result, Λnuis2 provides a larger set of estimating functions than the orthogonal complement of the nuisance tangent space of the corresponding restricted mean model for a univariate outcome. Each element in Λnuis2 is indexed by h(); the choice of h() impacts efficiency, and the optimal h() must be found to achieve minimum variance. A similar discussion of h(), model saturation, and estimating functions was included in Neugebauer and van der Laan [16].

The efficient score in model M2 does not generally have the same optimal index h(Ai,ti) as the efficient score in model M1. When incorporating auxiliary covariates in the estimation of marginal treatment effects via augmented GEE, the choice h(Ai,ti)=DiTVi1, while resulting in a consistent estimator, is no longer optimal in model M2. The efficient score is determined by optimizing over all p×ni index functions h(Ai,ti) [6, 7, 9]. Robins [7] established general theory for deriving the efficient score of treatment effects in marginal structural models (MSMs) of time-dependent exposures, including the case of multivariate outcomes. Application of the Robins [7] theory to establish locally efficient estimators in specific settings, such as for randomized trials with correlated outcomes, requires further derivation. Additionally, the locally efficient estimators of Robins [7] were neither implemented nor evaluated for practical use. Models (1) and (2) may be viewed as examples of MSMs for a point exposure; the Robins [7] theory therefore equally applies. Although the efficient score may be obtained theoretically, it is often computationally intensive to calculate. Consequently, inefficient estimators are typically used. The suboptimal estimator based on augmenting GEE with h(Ai,ti)=DiTVi1 was shown to improve efficiency by Zhang et al. [10] within the context of the linear mixed model and Stephens et al. [17] for general continuous and binary outcomes. In subsequent text, we refer to unaugmented GEE (4) under model M1 with the index function h(Ai,ti)=DiTVi1 as standard GEE, and the suboptimal estimator obtained by augmenting standard GEE is referred to as simple augmented GEE. Here we show how to further improve on simple augmented GEE by deriving the corresponding semiparametric locally efficient estimator for model M2. We then evaluate the feasibility of achieving such improvement in practice.

The following section presents the efficient score and derives a locally efficient estimator of marginal treatment effects in randomized trials with correlated outcomes when auxiliary data are available as in model M2. We also discuss an implementation procedure detailing how to appropriately estimate each component of the efficient score. In Sections 3 and 4, we compare the derived semiparametric locally efficient estimator to standard and simple augmented GEE through a simulation study and application to the AIDS Clinical Trial Group Study 398, a randomized longitudinal HIV intervention trial.

2 Methods

2.1 The efficient score

We consider the setting of longitudinal data and note that results follow analogously for clustered data by omitting ti. Before presenting the main result, some additional notation is required. Conditioning on ti, the matrix h(Ai,ti) takes K possible values, which may be denoted by Kp×ni constant matrices h0(ti),h1(ti),,hK1(ti). For binary treatment, we have h1=h1(ti) and h0=h0(ti), which denote the index functions under treatment (A=1) and control (A=0), respectively. Let Δai(X)=E(Yi|Ai=a,Xi,ti)g(a,ti;β), the ni-dimensional vector of the difference in the conditional and marginal mean outcomes given time. Using this construction, let h=[h0,h1,,hK1], the complete index matrix of dimension p×Kni. Using a result from Newey and McFadden [18], we show in the supplementary material that the optimal index hopt(A,t) and resulting efficient score may be determined by solving a generalized information equality. Here, we present our main result:

Proposition 1. The efficient score for modelM2is


C=C1C2, where




As shown above, C is of dimension Kni×Kni and may be decomposed into the difference C=C1C2, where C1 is a block diagonal matrix with diagonal components πaV(Y|A=a,t). The block diagonal of C2 contains the matrices πa(1πa)EXΔa(X)ΔaT(X), and off-diagonal block components are determined by πaπaEXΔa(X)ΔaT(X).

When treatment is binary, C simplifies to


where π0=1π1. Inverting C analytically and letting ζa,a=EXΔa(X)ΔaT(X),


Expressing the optimal index as in eq. (7) demonstrates that hopt incorporates information on the treatment assignment and auxiliary covariates X through ζa,a, while the standard index hstd=DT(A)V(A)1 does not. The matrix ζa,a is by definition the covariance of E(Yi|Xi,a,ti) and E(Yi|Xi,a,ti), the expected outcomes given baseline covariates and treatment assignment to a and a, respectively. The optimal index hopt therefore boosts efficiency by incorporating information on the covariance in expected outcomes when weighting the residuals Yig(Ai;β) in the marginal model estimating equations. To implement locally efficient GEE for model M2, estimates of V(Yi|Ai,ti), E[Yi|Xi,Ai,ti], and ζa,a for all unique pairs of treatment levels {a,a}, including a=a, are needed. The next section details an estimation procedure for each component of hopt when Yi is continuous and g() is the identity link, or Yi is binary and g() is the inverse logit link.

2.2 Estimation of hopt

The semiparametric locally efficient estimator requires estimates of three additional parameters:

  1. 1.


  2. 2.


  3. 3.


These quantities may be linked by the law of total variance, V(Yi|Ai,ti)=E[V(Yi|Xi,Ai,ti)|Ai,ti]+V(E[Yi|Xi,Ai,ti]|Ai,ti). For the ith independent unit, the ni-dimensional vector E[Yi|Xi,Ai,ti] determines the ni×ni matrix V(E[Yi|Xi,Ai,ti]|Ai,ti) and ultimately impacts the form of the marginal variance matrix V(Yi|Ai,ti). Observing the relationship among each of these parameters provides guidance for estimation. For example, the working marginal covariance selected must be compatible with the working model chosen for E[Yi|Xi,Ai,ti]. More generally, the models for each component of hopt must be specified so that the model selected for one component does not preclude the choice of model chosen for another. One approach that ensures compatibility is to start by estimating E(Yij|Xij,Ai,tij) through an appropriate regression technique to provide an estimate Eˆ(Yi|Xi,Ai=a,ti). The conditional mean outcome may be modeled by


where Xij represents the collection of covariates for the jth measurement in the ith unit. The next step is to estimate the conditional expectation by noting how the model of E(Yij|Xij,Ai=a,ti) impacts the form of the matrix ζa,a. The final step is estimation of V(Yi|Ai,ti) by summing the estimates of E[V(Yi|Xi,Ai,ti)|Ai,ti] and V(E[Yi|Xi,Ai,ti]|Ai,ti).

2.2.1 General estimation of ζa,a and V(Y|A).

Since ζa,a is a covariance matrix, it may generally be estimated in a similar fashion to estimating the correlation parameters in standard GEE. Let ζa,a=R1/2SR1/2, where R is a ni×ni diagonal matrix with the jth diagonal component Rj,j=Cov(E[Yij|Xij,Ai=a,tij],E[Yij|Xij,Ai=a,tij]|Ai,tij)=νja,a, the covariance of the predicted outcomes of element j under treatments a and a, and S is a ni×ni correlation matrix with Sj,j=1 and Sj,j=f(τa,a) for a function f() of correlation parameter τa,a that denotes the correlation in the predicted outcomes of element j under treatment a and element j under treatment a. The parameters τa,a, which may be a vector, and νa,a=(ν1a,a,ν2a,a,,νnia,a) characterize the covariance in conditional mean outcomes under treatments a and a. Letting Δˆaij=Eˆ(Yij|Xij,Ai=a,tij)g(a,tij;βˆinit), where βˆinit is an initial estimate of β obtained, for example, by maximum likelihood inference for Generalized Linear Models, νja,a may be estimated by


where pξ is the dimension of the outcome regression parameter ξ. The correlation parameter τa,a is then estimated by the moment equations


For a=a, we obtain an estimate of ζa,a=V(E[Yi|Xi,Ai=a,ti]|Ai,ti).

As an alternative approach, one may also derive an expression of ζj,ja,a, the j,j element of ζa,a, which depends on ξ=(ξ0,ξA,ξtT,ξA,tT,ξXT,ξX,tT,ξA,XT)T and the covariance in baseline covariates. An empirical estimate of Cov(Xi) may then be substituted into this expression.

After estimating ζa,a, the conditional variance of Yi, V(Yi|Xi,Ai,ti), may be estimated using the correlation parameters from GEE based on the conditional mean model (8). Under homoscedasticity V(Yi|Xi,Ai,ti)=λ for all i. To ensure compatibility of all parameters, the marginal variance V(Yi|Ai,ti) is then estimated by Vˆ(Yi|Ai,ti)=ζˆa,a+λˆ, where ζˆa,a and λˆ are estimates of ζa,a and λ, respectively.

2.2.2 Estimation of ζa,a for clustered data or longitudinal data with ξX,t=0

For clustered data and longitudinal data with ξX,t=0 in eq. (8), calculating ζa,a is straightforward. When data are clustered, ξt=ξA,t=ξX,t=0, leaving E[Yij|Xij,Ai]=g(ξ0+ξAAi+ξXTXij+ξA,XTAiXij). In this setting, ζj,ja,a is calculated as ζj,ja,a=CovX{g(ξ0+ξAa+ξXTXij+ξA,XTaXij),g(ξ0+ξAa+ξXTXij+ξA,XTaXij)}. If auxiliary covariates Xij,Xij are equally correlated among subjects within a cluster ζj,ja,a=ρa,a for all j,j. This holds for all link functions g(). For longitudinal data when ξX,t=0 (i.e. the effects of baseline covariates on the conditional mean outcome do not vary over time) ζj,ja,a=CovX{g(ξ0+ξAa+ξtTf(tij)+ξA,tTaf(tij)+ξXTXi+ξA,XTaXi),g(ξ0+ξAa+ξtTtij+ξA,tTaf(tij)+ξXTXi+ξA,XTaXi}. If g() is the identity link, this reduces to ζj,ja,a=Cov(ξXTXi+ξA,XTaXi,ξXTXi+ξA,XTaXi)=ρa,a for all j,j. For clustered data, ρa,a is a constant that depends on Cov(Xij,Xij), ξXT, and ξA,X, whereas for longitudinal data, Cov(Xij,Xij) is replaced by Var(Xi) since Xij=Xi for all j.

2.2.3 Estimation of V(Yi|Ai=a) under a compatible standard form

In some special cases where summing E[V(Yi|Xi,Ai,ti)|Ai,ti] and V(E[Yi|Xi,Ai,ti]|Ai,ti) results in a marginal covariance matrix V(Yi|Ai,ti) with a standard form, e.g., exchangeable, V(Yi|Ai,ti) may be estimated directly while maintaining compatibility with E[Yi|Xi,Ai,ti] and ζa,a. As stated above, if individual-level covariates Xij are equally correlated among subjects within the ith cluster, the model E[Yij|Xij,Ai=a] imposes compound symmetry on ζa,a, where diagonal components depend on Var(Xij) and off-diagonal components are determined by Cov(Xij,Xij). If the conditional variance V(Yi|Xi,Ai) is also exchangeable, V(Yi|A,ti) is the sum of two exchangeable matrices and therefore also has an exchangeable structure. The optimal index hopt may then be calculated by estimating V(Yi|Ai,ti) directly as in standard or simple augmented GEE and using the above procedure to estimate ζa,a.

A consistent estimator of the asymptotic variance of βˆopt, the solution to the augmented estimating equations (5) evaluated under eq. (6), may be calculated using the sandwich variance formula of Huber [19].

3 Simulation study

Semiparametric locally efficient GEE for model M2 were compared to standard and simple augmented GEE through a simulation study. Simulations were completed for clustered data with continuous and binary outcomes and longitudinal data with continuous outcomes. Results are based on 1,000 Monte Carlo datasets.

3.1 Continuous outcomes

3.1.1 Clustered data

Data for m=500 clusters were generated, with ni = 2, 4, 6, 8, 10, 12 with equal probability for the first set of simulations and ni = 10, 20, 30, 40, 50 in the second set. Auxiliary covariates Xij1, Xij2, and Xij3 were each generated from a multivariate normal distribution with Var(Xij1)=2, Var(Xij2)=6, and Var(Xij3)=5. Correlation was induced among individual-level covariates within the same cluster by setting Cov(Xij1,Xij1)=ςX1, Cov(Xij2,Xij2)=ςX2, and Cov(Xij3,Xij3)=1. Covariance terms ςX1 and ςX2 were varied from 0.5 to 2 and 1.5 to 6, respectively, to evaluate the effect of auxiliary covariate correlation on the performance of locally efficient augmented GEE. At ςX1=0.5 and ςX2=1.5, covariates were weakly correlated among individuals in the same cluster, while at ςX1=5 and ςX2,4=6, covariates were perfectly correlated, thereby becoming cluster-level. The exact values considered for ςX1 and ςX2 were (0.5, 1, 1.5, 2) and (1.5, 3, 4.5, 6), for simulation sets 1–4 at each set of cluster sizes. Within the jth individual in the ith cluster, auxiliary covariates were independent. The treatment variable Ai was drawn from the Bernoulli distribution with p = 1/2. Clustered responses were generated from the following model, with individual-level error terms εijN(0,40) and cluster-level effects biN(0,σb2):Yij|Ai,Xij,bi=1.0+1.1Xij12+0.9Xij2+0.5Ai+bi+εij. The proportion of variability in Yij explained by auxiliary covariates Xij was held fixed at roughly 25%. Simulations were completed with σb2=0 and σb2=6, representing the case in which covariates account for all between-cluster heterogeneity (V(Y|X,A) independent) and the alternative of some intracluster correlation caused by an unmeasured variable (V(Y|X,A) exchangeable), respectively.

For each dataset, the marginal effect of treatment was estimated by fitting model (2) through standard, simple augmented, and locally efficient GEE for M2. The impact of misspecification on the locally efficient estimator and its efficiency relative to simple augmented and standard GEE was evaluated by fitting various models to estimate E(Y|X,A). The correct model for E(Y|X,A), denoted by “C” in tables and figures, was E(Yij|Xij,Ai)=ξ0+ξ1Xij12+ξ2Xij2+ξ3Ai, and two additional models: Model 1, “M1” = E(Yij|Xij,Ai)=ξ0+ξ1Xij1+ξ2Xij2+ξ3Ai, and Model 2, “M2” = E(Yij|Xij,Ai)=ξ0+ξ1Xij12+ξ2Xij2+ξ3Xij3+ξ4Ai. “Model 1” evaluated the impact of misspecifying the functional form of Xij1, while “Model 2” examined the effect of adding noise to the outcome regression model. All working covariance matrices were fit under exchangeable structure.

Efficiency comparisons relative to standard GEE are summarized in Figure 1(a) and 1(b), while the Monte Carlo Relative Efficiency (MCRE) of the locally efficient estimator for M2 to simple augmented GEE is given in Table 1. Small cluster figures are included in the supplementary material. Across all levels of correlation, augmented estimators resulted in increased efficiency compared to the unaugmented estimator (MCRE 1.25–11.6). For low correlation among Xij simple augmented and locally efficient augmented estimators performed similarly. Simple augmented GEE and locally efficient GEE for M2 also resulted in similar efficiency when the conditional mean model did not include the data-generating quadratic term Xij12, or the true conditional variance was exchangeable (MCRE locally efficient to simple augmented GEE 0.99–1.01). When correlation was increased among Xij within a cluster, the assumed conditional mean model included all important covariates in the correct functional form, and baseline covariates accounted for all within-subject correlation, locally efficient GEE for M2 gained in efficiency over the simple augmented GEE (MCRE locally efficient to simple augmented GEE 1.04–1.22). Increased covariance among auxiliary covariates also resulted in greater efficiency gains for any augmented GEE relative to the standard estimator. Trends were more pronounced for large average cluster size (average ni = 30 vs average ni = 7).

Figure 1 MCRE of locally efficient and simple augmented GEE relative to standard (unaugmented) GEE. Continuous clustered outcomes. Estimators corresponding to each curve are denoted by “estimator-outcome regression” using the abbreviations: Loc Eff – locally efficient, Simp – simple augmented, Std – standard, C – correct, M1–Model 1, M2–Model 2. All estimators use exchangeable working covariance for V(Y|A)$$V(Y|A)$$ and V{E(Y|X,A)}$$V\{E(Y|X,A)\} $$. The order of curves in the legend follows the order of curves in the figure, with sets of superimposed curves denoted by “()”, “[]”, or “{}”

Figure 1

MCRE of locally efficient and simple augmented GEE relative to standard (unaugmented) GEE. Continuous clustered outcomes. Estimators corresponding to each curve are denoted by “estimator-outcome regression” using the abbreviations: Loc Eff – locally efficient, Simp – simple augmented, Std – standard, C – correct, M1–Model 1, M2–Model 2. All estimators use exchangeable working covariance for V(Y|A) and V{E(Y|X,A)}. The order of curves in the legend follows the order of curves in the figure, with sets of superimposed curves denoted by “()”, “[]”, or “{}”

Table 1

MCRE of locally efficient augmented GEE to suboptimal augmented GEE: continuous clustered outcomes

Cluster size = 2, 4, 6, 8, 10, 12
Correlation among Xij
Cluster size = 10, 20, 30, 40, 50
Correlation amongXij

3.1.2 Longitudinal responses

For each Monte Carlo dataset, m = 500 longitudinal response vectors Yi were generated from the model Yij=1.5+1.1Xi12+0.9Xi2+1.0tij+1.0Ai+εij, where εijN(0,σε2), and Cov(εij,εij) had an AR1 structure with correlation parameter α=0.1,0.3, or 0.5 for different sets of simulations. Covariates Xi1 and Xi2 were normally distributed with mean 0 and variance σX12 and σX22, respectively. Variance parameters σε2, σX12, and σX22 were varied so that baseline covariates accounted for 10–60% of the variability in Y|A in increments of 10%. Subjects were randomly assigned to treatment (Ai = 1) with probability 1/2. For each subject ti=(ti1=1,ti2=2,,tini=ni), where ni varied from 1 to 8, as might be the case in a longitudinal study with staggered entry.

Standard GEE, simple augmented GEE, and locally efficient GEE for M2 were applied to each Monte Carlo dataset to estimate marginal treatment effects. All GEE were fit based on the marginal mean model E(Yij|Ai)=β0+β1Ai+β2tij with inferences on the treatment effect completed through β1. Standard and simple augmented GEE were applied to each Monte Carlo dataset with AR1, exchangeable, and true working covariance structures, with the true structure under the marginal model being a summation of AR1 and exchangeable matrices as described in Section 2. Locally efficient GEE for M2 were fit under the true covariance structure and a misspecified marginal AR1 working covariance. Baseline covariates were incorporated fitting several outcome regression models. We use “C” to denote the correct model E(Yij|Xij,Ai,tij)=ξ0+ξ1Ai+ξ2tij+ξ3Xi12+ξ4Xi2, which corresponds to the true data-generating mechanism; “M1” indicates the model E(Yij|Xi,Ai,tij)=ξ0+ξ1Ai+ξ2tij+ξ3Xi1+ξ4Xi2, omitting the exponent on Xi1; and “M2” is the model that includes a noisy covariate Xi3, such that E(Yij|Xi,Ai,tij)=ξ0+ξ1Ai+ξ2tij+ξ3Xi12+ξ4Xi2+ξ5Xi3.

Efficiency comparisons are summarized in Figure 2 and Table 2. Additional figures are given in the supplementary material. For well-specified variance components and conditional mean models, the locally efficient GEE for M2 was more efficient than the simple augmented GEE, with the difference in efficiency increasing with the percent variability explained by Xi (MCRE of locally efficient to simple augmented GEE 1.0–1.27). Similarly, all augmented estimators were more efficient than standard GEE, with efficiency gains from augmenting increasing with correlation in Y and X (MCRE of augmented GEE to standard GEE 1.36–5.28). For poorly specified conditional mean models, locally efficient GEE for M2 and simple augmented GEE were nearly equally efficient (MCRE of locally efficient to simple augmented GEE 0.97–1.0), but when the marginal variance was also misspecified locally efficient GEE were less efficient than simple augmented GEE (MCRE 0.88–0.99). This demonstrates that the locally efficient GEE for M2 is a bit more sensitive to working marginal covariance misspecification than simple augmented GEE. Among the simple augmented estimators, the estimator with the incorrect marginal AR1 working covariance resulted in the β1 estimate with the lowest variability. This illustrates an important distinction between locally efficient and suboptimal estimating functions. Considering estimators using a suboptimal index, misspecified models for parameters in the index may result in more efficient inferences than correctly specified models. For the locally efficient estimator, semiparametric asymptotic efficiency is achieved only in the absence of model misspecification of all parameters in hopt().

Figure 2 MCRE of locally efficient and simple augmented GEE relative to standard (unaugmented) GEE. Continuous longitudinal outcomes. Estimators corresponding to each curve are denoted by “estimator (marginal working covariance) outcome regression” using the abbreviations: Loc Eff – locally efficient, Simp – simple augmented, Std – standard, AR1–autoregressive(1) V(Y|A)$$V(Y|A)$$, true-exchangeable/AR1 for V{E(Y|X,A)}$$V\{E(Y|X,A)\} $$ and V(Y|X,A)$$V(Y|X,A)$$; C – correct, M1–Model 1;α$$\alpha $$ = 0.3. The order of curves in the legend follows the order of curves in the figure, with the set of superimposed curves denoted by “[]”

Figure 2

MCRE of locally efficient and simple augmented GEE relative to standard (unaugmented) GEE. Continuous longitudinal outcomes. Estimators corresponding to each curve are denoted by “estimator (marginal working covariance) outcome regression” using the abbreviations: Loc Eff – locally efficient, Simp – simple augmented, Std – standard, AR1–autoregressive(1) V(Y|A), true-exchangeable/AR1 for V{E(Y|X,A)} and V(Y|X,A); C – correct, M1–Model 1;α = 0.3. The order of curves in the legend follows the order of curves in the figure, with the set of superimposed curves denoted by “[]”

Table 2

MCRE of locally efficient augmented GEE to suboptimal augmented GEE: continuous longitudinal outcomes

Correlation between Y and X

3.2 Clustered binary data

As for continuous outcomes, data for m = 500 clusters of variable size were generated with ni = 2, 4, 6, 8, 10, 12 for small cluster settings and ni = 10, 20, 30, 40, 50 for the large cluster scenario. The binary treatment variable Ai was simulated from the Bernoulli(1/2) distribution. Individual-level covariates Xij1, Xij2, and Xij3 were each generated from a multivariate normal distribution with μXijk=0, σXij12=σXij32 = 2, σXij22=5, and Cov(Xijk,Xijk)=ςXk, inducing marginal correlation among individuals within the same cluster. Covariance parameters ςXk were varied to evaluate the impact of covariance in auxiliary covariates on the performance of augmented estimators, with ςX1=ςX3=0.5,1.0,1.5,2.0 and ςX3=1.25,2.5,3.75,5.0 for different sets of simulations. For low levels of ςXk, covariates were weakly correlated, while for ςXk=σXijk2, covariates were cluster-level. Binary outcomes were simulated from the model logit[E(Yij|Xij,Ai,bi)]=0.7Xij12+0.4Xij20.5Ai+bi, where bi was drawn from the bridge distribution for the logit link [20] with scale parameter θ. Simulations were completed with two values of the bridge distribution scale parameter, θ=1 and θ=0.8, representing settings in which all sources of between-cluster heterogeneity are measured through auxiliary covariates, or when unmeasured sources of between-cluster heterogeneity are present. A total of 16 sets of simulations were done, varying cluster size, correlation in X, and θ.

Standard, simple augmented, and locally efficient GEE for M2 were applied to each dataset and compared for efficiency. For each estimator, the restricted mean model of interest was Model 2 with g() the inverse logit link and β1 measuring the marginal effect of treatment. Among augmented estimators, four outcome regression models were considered: (1) “C” – Correct, E(Yij|Xij,Ai)=g(ξ0+ξ1Xij12+ξ2Xij2+ξ3Ai); (2) “M1” – Model 1, E(Yij|Xij,Ai)=g(ξ0+ξ1Xij1+ξ2Xij2+ξ3Ai); (3) “M2” – Model 2, E(Yij|Xij,Ai)=g(ξ0+ξ1Xij12+ξ2Xij2+ξ3Xij3+ξ4Ai); and (4) “M1 OLS” – Model 1 OLS, E(Yij|Xij,Ai)=ξ0+ξ1Xij1+ξ2Xij2+ξ3Xij3+ξ4Ai. With the exception of Model 4, which was fit through ordinary least squares (OLS), all outcome regression models were fit by logistic regression. All estimators were fit with exchangeable working covariances.

Large cluster results are shown in Figure 3(a) and 3(b) and Table 3, while small cluster results are included in the supplementary material. Conclusions are similar to those obtained for continuous outcomes. Efficiency improvement with augmented estimators relative to standard GEE increased with correlation in auxiliary covariates (MCRE 1.10–10.54), as did the additional efficiency gains for the locally efficient GEE for M2 over simple augmented GEE (MCRE 1.0–1.23). Simple and locally efficient augmented estimators were equally efficient for θ=0.8 or when conditional mean models left out important transformations, but differences in efficiency favoring the optimal estimator were observed for θ=1 and well-specified covariate-adjusted models.

Figure 3 MCRE of locally efficient and simple augmented GEE relative to standard (unaugmented) GEE. Binary clustered outcomes. Estimators corresponding to each curve are denoted by “estimator-outcome regression” using the abbreviations: Loc Eff – locally efficient, Simp – simple augmented, Std – standard, C – correct, M1–Model 1, M1 OLS – Model 1 OLS. All estimators use exchangeable working covariance for V(Y|A)$$V(Y|A)$$ and V{E(Y|X,A)}$$V\{E(Y|X,A)\} $$. The order of curves in the legend follows the order of curves in the figure, with sets of superimposed curves denoted by “()”and “[]”

Figure 3

MCRE of locally efficient and simple augmented GEE relative to standard (unaugmented) GEE. Binary clustered outcomes. Estimators corresponding to each curve are denoted by “estimator-outcome regression” using the abbreviations: Loc Eff – locally efficient, Simp – simple augmented, Std – standard, C – correct, M1–Model 1, M1 OLS – Model 1 OLS. All estimators use exchangeable working covariance for V(Y|A) and V{E(Y|X,A)}. The order of curves in the legend follows the order of curves in the figure, with sets of superimposed curves denoted by “()”and “[]”

Table 3

MCRE of locally efficient augmented GEE to suboptimal augmented GEE: binary clustered outcomes

Correlation between Y and X

3.3 Simulation study summary

Results from the various simulation settings provide insight into the performance of the locally efficient GEE for model M2 and its practical value. The locally efficient estimator theoretically achieves minimum asymptotic variance when all components of hopt() and the augmentation are correctly specified. The results show that achieving the efficiency bound is not robust to model misspecification of working covariances and conditional means; the locally efficient GEE for M2 was only more efficient than simple augmented GEE when all mean models included important covariates in the correct polynomial form, and the correct structure was specified for working covariances. Even under well-specified models, the locally efficient GEE only improved over the simple augmented GEE when the data-generating mechanism was such that the underlying conditional variance, V(Y|X,A) had a sparse structure, such as AR1 or independence. The difficulty of correctly specifying models for nuisance parameters, particularly covariances, as well as measuring all sources of correlation so that V(Y|X,A) is sparse present challenges for successfully implementing locally efficient estimators in real-world analysis. This challenge is further illustrated in the following section with application to AIDS Clinical Trial Group Study 398.

4 Application

The semiparametric locally efficient estimator of marginal treatment effects for correlated outcomes was applied to data from AIDS Clinical Trial Group Study 398 (ACTG 398) [21]. ACTG 398 was a multicenter, double-blind trial, in which 481 HIV-infected patients were randomized to one of four arms, (A) saquinavir, (B) indinavir, (C) nelfinavir, or (D) placebo based on their past protease inhibitor (PI) treatment. Patients were only randomized to drugs to which they had no prior exposure. Randomized treatments were added to a common antiviral regimen for all subjects. Subjects’ CD4 counts were measured at weeks 0 (baseline), 4, 8, and every 8 weeks thereafter until 48 weeks or dropout. Here, we apply the GEE estimators to compare the nelfinavir and placebo arms among patients who were eligible for both according to the stratified randomization scheme. Additional baseline covariates were age, sex, past PI use, past non-nucleoside reverse transcriptase inhibitor exposure, weight, Karnofsky score, intravenous drug use, and race/ethnicity. Weeks 4–32 of follow-up were included for analysis, with CD4 measurements at week 4 and beyond included as outcomes and week 0 CD4 included as a baseline covariate. Data were ~90% complete through week 32. In evaluating the effect of treatment on CD4, the marginal model was E(Yij|Ai)=β0+β1Ai+β2tij, where tij indicates the week of the jth measurement on the ith individual, and Ai was an indicator for the placebo arm. This model was chosen by minimizing the prediction error from 10-fold cross validation of several candidate parametric models that included categorical time, quadratic time, or an interaction of time and treatment. Since only follow-up measurements were modeled as outcomes, and no interaction was detected between treatment and time, the effect of treatment was captured by β1.

Standard, simple augmented, and locally efficient GEE for M2 were applied to estimate β1. Several candidate outcome regression models for augmented GEE were identified through model selection procedures. Cross validation was used to select the final model, E(Yij|Ai,Xi,ti)=ξ0+ξ1Ai+ξ2tij+ξ3CD40i+ξ4Sexi, where CD4ξ0 is baseline CD4. The Quasi-likelihood under the independence model criterion (QIC) goodness-of-fit statistic [22] was compared among GEE fit to unaugmented marginal and conditional models to guide the choice of working covariance structures. To enforce compatibility of the marginal variance, conditional variance, and outcome regression in fitting locally efficient augmented GEE, the additive estimate of the marginal covariance was used. The working conditional variance was chosen by selecting the covariance structure resulting in the lowest QIC when fitting GEE on the conditional mean model. Simple augmented GEE were computed under all possible working marginal covariance structures, including the additive estimator motivated by the locally efficient GEE.

Results are shown in Table 4. Regarding covariance selection, unstructured working covariance resulted in the lowest QIC for the conditional model (supplementary material), suggesting the semiparametric locally efficient estimator should be fit assuming an unstructured form of V(Yi|Xi,Ai). Several other covariance structures were also implemented for the locally efficient estimator to explore variance misspecification. Among simple augmented estimators, the additive marginal covariance obtained by summing the unstructured V(Y|X,A,t) and V(E[Y|X,A,t]|A,t) induced by the chosen conditional mean model resulted in lower variability than estimators using standard marginal covariance structures. Estimated treatment effects exhibited variability across estimators but fell within a range of one standard deviation within the class of estimator considered (standard, simple augmented, or semiparametric locally efficient). Comparing standard GEE with different working covariance models, the estimated difference in average CD4 for the placebo arm versus nelfinavir ranged from 9.9 to 20.17. The direction of the effect was reversed for estimators that incorporated baseline covariates, with average CD4 on the placebo arm 0.07–8.11 units lower than the nelfinavir arm. Treatment did not have a significant impact on CD4 at the 0.05 level for any of the estimators considered.

Table 4

Application of standard, simple augmented, and locally efficient augmented GEE to AIDS Clinical Trial Group Study 398

Standard (Ind)9.97120.7720.942
Standard (Exch)14.18220.5930.958
Standard (AR1)16.97720.2220.993
Standard (Un)20.17320.1561.000
Standard (Exch/Un)14.61520.3470.981
Simple Aug. (Ind)−8.1109.2034.797
Simple Aug. (Exch)−6.3858.9045.124
Simple Aug. (AR1)−3.0599.2444.754
Simple Aug. (Un)−0.0799.4114.587
Simple Aug. (Exch/Un)−5.9728.5715.530
Simple Aug. (Exch/AR1)−5.0488.9205.106
Loc. Eff. (Ind)−8.1109.2034.797
Loc. Eff. (Exch)−6.8218.9535.068
Loc. Eff. (Exch/AR1)−5.7159.0734.936
Loc. Eff. (Exch/Un)−6.2778.6015.492
Adjusted (Un)−6.6498.6215.467

Estimators that incorporated baseline covariates greatly increased efficiency, with SE(βˆ1)20 for standard GEE and SE(βˆ1)9 among augmented estimators (relative efficiency augmented to standard GEE 5.0). Simple augmented and locally efficient GEE for M2 resulted in similar efficiency – a result that may be explained by several factors: (1) Subjects had the same number of follow-up visits. For GEE, the index impacts efficiency most when the number of observations per unit is variable, (2) the unstructured conditional variance is not sparse, and (3) the components of hopt may be misspecified. As a benchmark for efficiency, we also fit unaugmented GEE assuming the conditional mean model E(Yij|Ai,Xi,ti)=β0+β1Ai+β2tij+β3CD40i+β4Sexi with an unstructured working covariance. This estimator represents the most efficient estimator of β1 that may be obtained using Xi, which requires assuming that the more restrictive conditional mean model is correct. From this estimator, we can determine that for this particular case, there is little additional efficiency to be gained by locally efficient GEE if simple augmented GEE are fit under the best working covariance (Table 4).

5 Discussion

We derived and implemented a closed-form expression of the efficient score and a semiparametric locally efficient estimator in model M2 for correlated outcomes. This estimator requires correct specification of the marginal mean model, but is consistent and asymptotically normal under misspecification of the variance or conditional mean working models. To avoid misspecification of the marginal mean model, we recommend several modeling approaches. In cluster randomized studies, misspecification of the marginal mean model may be avoided by including a unique parameter for each treatment level. Under exchangeable correlation structures, which are typically assumed, the marginal mean model is then correct. In longitudinal studies with a small number of visits, fixed effects for each visit time and an interaction of time and treatment may be used to avoid misspecification. When continuous time measurement prevents such a strategy, nonparametric methods may be used to suggest appropriate functional forms. Through simulation, we demonstrated that the semiparametric locally efficient estimator is more efficient than corresponding suboptimal estimators in certain settings, particularly when randomized units vary in size, baseline covariates account for a large portion of the within-unit correlation, and baseline covariates are at least moderately predictive of the outcome. In longitudinal studies, variable size may occur when studies have staggered entry or as subjects are lost to follow-up. The estimator derived is only semiparametric locally efficient in the first case, as the locally efficient estimator for incomplete data incorporates information on the missingness process.

There are several challenges to achieve semiparametric local efficiency, some of which stem from the parametric nature of the model of the marginal treatment effect parameter. Assuming the mean model is correct, accounting for correlation through measured covariates and correctly specifying the form of correlation can be difficult in practice. This challenge may be addressed through the use of scientific knowledge and covariance structure diagnostic tools, but is still likely to make local efficiency unachievable in most practical settings, rendering the simple augmented GEE the more useful option. When the marginal mean model is not correct, the semiparametric locally efficient estimator does not yield a consistent treatment parameter, regardless of the index used. The need for correct specification is yet another challenge to the presented estimator, but this challenge is common across all GEE estimators. Although theoretically possible, the prize of implementing semiparametric local efficiency for restricted mean models of marginal treatment effects with baseline covariates in the context of correlated outcomes is typically not worth the chase. The correlated outcome setting gives rise to the possibility of multiple estimators; the semiparametric locally efficient estimator does not offer much practical gain compared to the augmented estimator using the index function from standard GEE after taking into account the possibility of model misspecification of nuisance parameters.

There are several alternatives to the semiparametric locally efficient and suboptimal augmented estimators we consider. A nonparametric approach to modeling marginal treatment effects in longitudinal designs was shown in Neugebauer and van der Laan [16]. Compared to our semiparametric locally efficient estimator the nonparametric strategy has the advantage of providing an interpretable causal parameter when the marginal mean model is possibly misspecified, but it does not utilize information on the correlation in outcomes and therefore would not be locally efficient in the setting of a time-fixed treatment with a correctly specified marginal model for the mean. Another semiparametric approach considers estimation of the conditional mean model followed by marginalization (the G-formula), but this relies on correct specification of the conditional mean model, whereas augmented estimators do not. Despite the shortcomings of the optimal augmented estimator, large efficiency gains were shown for longitudinal analysis when the baseline level of the outcome was incorporated in estimation as an auxiliary covariate. Baseline levels of outcomes can be highly predictive of follow-up levels, suggesting that in the analysis of data from longitudinal studies, failing to incorporate baseline covariates can be highly inefficient. These results suggest the value of incorporating baseline covariates in both interim and final analyses of data from randomized clinical trials.

6 Supplementary material

The reader is referred to the on-line supplementary materials for the derivation of the efficient score and additional simulation and data analysis results.


This research was supported by NIH grants R37 51164, T32AI007358, R01AI104459-01A1, and 1R21ES019712-01.


1. NeweyWK. Semiparametric efficiency bounds. J Appl Econometrics1990;5:99135.Search in Google Scholar

2. BickelPJ, KlassenCA, RitovY, WellnerJA. Efficient and adaptive estimation for semiparametric models. Baltimore, MD: The Johns Hopkins University Press, 1993.Search in Google Scholar

3. van der VaartAW. Asymptotic statistics. Cambridge, UK: Cambridge University Press, 1998.Search in Google Scholar

4. LiangKY, ZegerSL. Longitudinal data analysis for discrete and continuous outcomes. Biometrics1986;42:12130.Search in Google Scholar

5. ChamberlainG. Asymptotic efficiency in semi-parametric models with censoring. J Econometrics1986;32:189218.Search in Google Scholar

6. RobinsJM, RotnitzkyA, ZhaoLP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc1994;89:84666.Search in Google Scholar

7. RobinsJM. Marginal structural models versus structural nested models as tools for causal inference. In: BerryD,HalloranME, editors. Statistical models in epidemiology: the environment and clinical trials, vol. 116. New York: Springer-Verlag, 1999:95134.Search in Google Scholar

8. RobinsJ. Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American Statistical Association Section on Bayesian Statistical Science 1999, 2000:610.Search in Google Scholar

9. van der LaanMJ, RobinsJM. Unified methods for censored longitudinal data and causality. New York: Springer-Verlag, 2003.Search in Google Scholar

10. ZhangM, TsiatisAA, DavidianM. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics2008;64:70715.Search in Google Scholar

11. BangH, RobinsJM. Doubly robust estimation in missing data and causal inference models. Biometrics2005;61:96272.Search in Google Scholar

12. van der LaanMJ, RubinD. Targeted maximum likelihood learning. Int J Biostat2006;2:140.Search in Google Scholar

13. TsiatisAA, DavidianM, ZhangM, LuX. Covariate adjustment for two-sample treatment comparisons for randomized clinical trials: a principled yet flexible approach. Stat Med2008;27:465877.Search in Google Scholar

14. MooreKL, van der LaanMJ. Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Stat Med2009;28:3964.Search in Google Scholar

15. MooreKL, van der LaanMJ. Application of time-to-event methods in the assessment of safety in clinical trials. In: PeaceKE, editor. Design, summarization, analysis & interpretation of clinical trials with time-to-event endpoints. New York: Chapman & Hall, 2009.Search in Google Scholar

16. NeugebauerR, van der LaanMJ. Nonparametric causal effects based on marginal structural models. J Stat Plann Inference2007;137:41934.Search in Google Scholar

17. StephensAJ, Tchetgen TchetgenEJ, De GruttolaV. Augmented gee for improving efficiency of inferences in cluster randomized trials by leveraging cluster and individual-level covariates. Stat Med2011;31:91530.Search in Google Scholar

18. NeweyWK, McFaddenD. Large sample estimation and hypothesis testing. Handbook of Econometrics 1994;4:2111245. Available at: in Google Scholar

19. HuberPJ. Robust estimation of a location parameter. Ann Math Stat1964;35:73101.Search in Google Scholar

20. WangA, LouisTA. Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function. Biometrika2003;90:76575.Search in Google Scholar

21. HammerSM, VaidaF, BennettK, HolohanMK, SheinerL, EronJ, et al. Dual vs. single protease inhibitor therapy following antiretroviral treatment failure. J Am Med Assoc2002;288:16980.Search in Google Scholar

22. PanW. Akaike’s information criterion in generalized estimating equations. Biometrics2001;57:12025.Search in Google Scholar

Published Online: 2014-2-21
Published in Print: 2014-5-1

© 2014 by Walter de Gruyter Berlin / Boston