
The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.


Double Bias: Estimation of Causal Effects from Length-Biased Samples in the Presence of Confounding

Ashkan Ertefaie / Masoud Asgharian / David A. Stephens
Published Online: 2015-03-21 | DOI: https://doi.org/10.1515/ijb-2014-0037


Length bias in survival data occurs in observational studies when, for example, subjects with shorter lifetimes are less likely to be present in the recorded data. In this paper, we consider estimating the causal exposure (treatment) effect on survival time from observational data when, in addition to the lack of randomization and consequent potential for confounding, the data constitute a length-biased sample; we hence term this a double-bias problem. We develop estimating equations that can be used to estimate the causal effect indexing the structural Cox proportional hazard and accelerated failure time models for point exposures in double-bias settings. The approaches rely on propensity score-based adjustments, and we demonstrate that estimation of the propensity score must be adjusted to acknowledge the length-biased sampling. Large sample properties of the estimators are established and their small sample behavior is studied using simulations. We apply the proposed methods to a set of, partly synthesized, length-biased survival data collected as part of the Canadian Study of Health and Aging (CSHA) to compare survival of subjects with dementia among institutionalized patients versus those recruited from the community and depict their adjusted survival curves.

Keywords: Causal Inference; Length-biased Sampling; Propensity Score; Cox Proportional Hazard model; Accelerated Failure Time model.

1 Introduction

In many observational studies, logistical or other constraints may render recruitment of disease-free patients for follow-up studies infeasible. In such cases, subjects who already experienced the initiation of the disease prior to recruitment (i.e. prevalent cases) are sampled. It is well known that subjects so recruited do not form a representative sample from the target population because subjects with longer survival times have a greater chance of being recruited into the study. When the disease has stationary incidence, the induced bias in sampling is called length bias [1, 2]. This bias in sampling can lead to bias in the estimation of an exposure effect of interest.

Length-biased sampling can affect the sampling distribution of the covariates, such that covariates associated with the longer survivors have a higher chance of being selected. Recently, Bergeron et al. [3], Shen et al. [4], Qin and Shen [5] and Ning et al. [6] studied analysis of covariates under biased sampling. Studies on length-biased sampling can be traced as far back as Wicksell [7], Fisher [8], Neyman [9], Cox and Lewis [1], Zelen and Feinleib [2] and Patil and Rao [10]. An updated review of the subject can be found in Asgharian et al. [11].

A second source of potential bias in estimation of treatment or exposure effects encountered in observational studies is confounding. In the simple case of binary exposure, when exposure is influenced by other predictors, individuals in each exposure group may have different characteristics, yielding imbalanced covariate distributions across the different groups. If the predictors also influence the outcome (say, survival time), this may also lead to bias in the estimated exposure effect. Under an assumption of no unmeasured confounding, a consistent exposure effect estimator can be obtained by two well-known methods: inverse probability of treatment weighting (IPTW) and propensity score regression (PSR). Weighted proportional hazard (PH) models for right-censored data were introduced by Binder [12] and Lin [13] in the survey sampling literature. Pugh, Robins, Lipsitz and Harrington [14] also presented a weighted PH estimating equation to adjust for missing covariates [15–18].

In a recent article, Ertefaie et al. [19] developed a method for estimating the propensity score in the presence of length-biased sampling. In this paper, we address estimation of total causal effects in the presence of both length-biased sampling and confounding, which we term the double-bias problem, in the analysis of survival data. Specifically, we develop augmented estimating equations based on PH and accelerated failure time (AFT) models that can be used to estimate the exposure effect. In both cases, the augmentation spaces are formed using the censoring mechanism to improve the efficiency.

The rest of this paper is organized as follows. In Section 2, we introduce concepts and notation used in the manuscript. Section 3 presents our proposed estimating equation for estimating the propensity score when data are subject to length-biased sampling. In Sections 4 and 5, we present our estimating equations to deal with length-biased sampling and confounding under PH and AFT modeling assumptions, respectively, together with the large sample properties of the resulting estimators. In Section 6, we examine the performance of the proposed approach via simulation, and, in Section 7, we apply our method to analyze a set of length-biased right-censored survival data collected as part of the Canadian Study of Health and Aging (CSHA) investigating the effect on survival of institutionalization; see Wolfson et al. [20].

2 Length-biased sampling

2.1 Notation

Our notation is similar to that of Ertefaie et al. [19]. Our data comprise n i.i.d samples of (X,Y,D,A,C,R) where D and X are the binary treatment variable and the vector of covariates, respectively. A is the time from the onset of the disease to the recruitment time and R covers the time from the recruitment time to the event (residual life time). Accordingly, the observed lifetime is defined as T=A+R. In the presence of right censoring, C is the censoring time measured from the recruitment to the loss to follow up. The observed survival time is Y=A+min(R,C). The variables with superscript pop represent the population variables; variables without pop denote the observed truncated variables. Figure 1 illustrates the different random quantities introduced in this section. The symbols and × denote a censored lifetime and an observed failure, respectively.

Figure 1: The data structure.

Let $F$ and $f$ be the distribution and density of $T^{pop}$, respectively. If the onset times are generated by a stationary Poisson process (the so-called stationarity assumption), then
$$F_{LB}(t)=\frac{\int_0^t s\,dF(s)}{\int_0^\infty s\,dF(s)}=\frac{1}{\mu}\int_0^t s\,dF(s)\quad\text{and}\quad f_{LB}(t)=\frac{t\,f(t)}{\mu},\tag{1}$$
if $F_{LB}$ has a corresponding absolutely continuous density $f_{LB}$, where $\mu$ is the mean survival time under $F$. Equation (1) is derived under a uniform truncation assumption.
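As a quick check of eq. (1), consider the illustrative case of exponentially distributed population lifetimes with rate $\lambda$ (an assumption made here only for the example, not a model used in the paper). The length-biased density is then a Gamma density with shape 2, and the mean lifetime in the prevalent cohort is twice the population mean:

```latex
f(t)=\lambda e^{-\lambda t},\qquad
\mu=\int_0^\infty t\,f(t)\,dt=\frac{1}{\lambda}
\;\Longrightarrow\;
f_{LB}(t)=\frac{t\,f(t)}{\mu}=\lambda^{2}\,t\,e^{-\lambda t},
\qquad
\int_0^\infty t\,f_{LB}(t)\,dt=\frac{2}{\lambda}=2\mu .
```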

For t>0, we define the process {N(t)} by N(t)=1(Y<t,δ=1) where δ is the censoring indicator (δ=1 indicating failure). We use small letters to refer to the possible values of the corresponding capital letter random variable. Throughout the manuscript, we make the following standard assumptions:

  • A1. The variable $(T^{pop},D^{pop},X^{pop})$ is independent of the calendar time of the onset of the disease.

  • A2. The disease has stationary incidence, i.e. the disease incidence occurs at a constant rate.

  • A3. The censoring time C is independent of (A,R,D,X).

2.2 Counterfactual outcomes

We define the causal effect of interest using the counterfactual framework introduced by Rubin [21]. The counterfactual values $(A(d),R(d),Y(d))$ represent the backward recurrence time, the forward recurrence time and the observed survival time, respectively, if $D=d$. Similarly, $T^{pop}(d)$ represents the counterfactual response. The observed response, $T^{pop}$, is defined as $D\,T^{pop}(1)+(1-D)\,T^{pop}(0)$.

We make the following standard causal assumptions to link the counterfactual outcome and the observed data [22, 23]:

  • 1. Consistency: $Y(D)=Y$.

  • 2. Ignorability: $Y(d)\perp D\mid X$.

  • 3. Positivity: $p_{D|X}(d\mid x)>0$, where $p_{D|X}(d\mid x)$ is the conditional probability of receiving treatment $d$ given $X=x$.

Assumption 1 means that the counterfactual outcome of a treatment corresponds to the actual outcome if assigned to that treatment. Assumption 2 means that within levels of X, treatment D is randomized. Assumption 3 ensures that there is enough overlap between treated and untreated groups for each possible value x. In what follows, we assume that these identifiability assumptions hold.

3 Propensity score estimation under length-biased sampling

Rosenbaum and Rubin [24] adjust for differences between exposed and unexposed groups using a scalar function of the measured covariates, the propensity score, which removes the bias induced by differences between these two groups of units. The propensity score, π(x), for binary exposure D is defined by π(x)=P(D=1|x), where x is a p-dimensional vector of covariates.

In general, the propensity score π(x) is unknown and needs to be estimated; it has also been shown that even if the propensity score is known, one may gain efficiency in estimating the average treatment effect (ATE) by estimating π(.) using the data available [25]. However, estimating the propensity score using a length-biased sample does not lead to a balancing score or create the desired pseudo-population in which the exposure is independent of covariates; indeed, it may induce even more bias than leaving the confounders unadjusted [19].

Assuming a logit model for the propensity score in the target population, we have
$$\pi(x,\alpha)=p(D^{pop}=1\mid X^{pop}=x)=\frac{\exp(x\alpha)}{1+\exp(x\alpha)},\tag{2}$$
where $\alpha$ is a $p\times1$ vector of parameters. Cheng and Wang [26] develop a method that consistently estimates the parameters of the propensity score from prevalent survival data. Their method requires correct specification of the conditional hazard model given the treatment and covariates. Ertefaie et al. [19] show that under assumptions A1–A3 this requirement can be removed, and propose the following estimating equation
$$U(\alpha)=\sum_{i=1}^{n}\frac{\delta_i\,x_i^{T}\,(d_i-\pi(x_i,\alpha))}{\hat w(y_i)}=0,\tag{3}$$
where $\hat w(y)=\int_0^y\hat S_C(s)\,ds$ and $\hat S_C$ is the Kaplan–Meier estimator of the survivor function of the residual censoring variable $C$. Note that the censored individuals contribute to this estimating equation through $\hat w(y)$. Ertefaie et al. [19] show that
$$\frac{1}{n}U(\alpha)=\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\delta_i\,x_i^{T}\,(d_i-\pi(x_i,\alpha))}{w(y_i)}+L(y_i,c_i,a_i,d_i,x_i)\right\}+o_p(n^{-1/2}),$$
where $L(y_i,c_i,a_i,d_i,x_i)$ is the augmentation element [18]. In this manuscript, we use eq. (3) to estimate the parameters of the propensity score. The term $L(y_i,c_i,a_i,d_i,x_i)$ augments the failure time of the censored subjects using the observed failure times. We present the form of this augmentation term in Appendix B.
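The following is a minimal sketch of how eq. (3) might be solved numerically, under simplifying assumptions that are ours and not the authors': a single covariate plus an intercept, no tied follow-up times, and plain Newton iterations without step control. The function names (`km_censoring_survival`, `w_hat`, `fit_propensity`) are hypothetical. The Kaplan–Meier step treats censoring as the event on the residual time scale, and the weights $1/\hat w(y_i)$ are applied to uncensored subjects only.

```python
import math

def km_censoring_survival(resid_times, delta):
    """Kaplan-Meier estimate of S_C from residual follow-up times (Y - A).

    Censoring (delta == 0) is the event of interest here.
    Returns the jump times and the value of S_C just after each jump.
    """
    order = sorted(range(len(resid_times)), key=lambda i: resid_times[i])
    at_risk, surv = len(order), 1.0
    times, values = [], []
    for i in order:
        if delta[i] == 0:                  # a censoring time is observed
            surv *= 1.0 - 1.0 / at_risk
            times.append(resid_times[i])
            values.append(surv)
        at_risk -= 1                       # subject leaves the risk set
    return times, values

def w_hat(y, times, values):
    """Integrate the step function S_C from 0 to y."""
    total, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, values):
        if t >= y:
            break
        total += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return total + prev_s * (y - prev_t)

def fit_propensity(x, d, y, a, delta, n_iter=50):
    """Solve eq. (3) for alpha = (intercept, slope) by Newton's method."""
    times, values = km_censoring_survival(
        [yi - ai for yi, ai in zip(y, a)], delta)
    # Only uncensored subjects enter the sum; each gets weight 1 / w_hat(y_i).
    rows = [(xi, di, 1.0 / w_hat(yi, times, values))
            for xi, di, yi, dl in zip(x, d, y, delta) if dl == 1]
    a0 = a1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di, wt in rows:
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * xi)))
            g0 += wt * (di - p)            # score, intercept component
            g1 += wt * xi * (di - p)       # score, slope component
            v = wt * p * (1.0 - p)
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det  # Newton step via 2x2 inverse
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1
```

With a binary covariate the model is saturated, so the fitted probabilities should coincide with the $1/\hat w$-weighted proportions of treated subjects in each covariate level, which gives a simple way to sanity-check the routine.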

4 Cox PH models

The hazard ratio (HR) is defined as the ratio of hazards in the exposed and unexposed groups. Qin and Shen [5] introduce a set of estimating equations to assess the effect of covariates on the survival time in the presence of length-biased sampling. Our proposed estimating equation is an adaptation of the estimating equation introduced by Qin and Shen [5] (under the PH model) which adjusts for the confounding as well as length-biased sampling. We derive an estimating equation which estimates the marginal treatment effect without the need of estimating the effect for other covariates on the survival time.

Under A1–A3 and identifiability assumptions, the density of a counterfactual failure time observed in the study under exposure $d$ can be expressed as
$$f(y(d),\delta=1)=\int f(y,\delta=1\mid D=d,x)\,f_{LB}(x)\,dx=\int\frac{f_d(y\mid x)\,w(y)\,f(x)}{\mu_d}\,dx=\frac{f_d(y)\,w(y)}{\mu_d},$$
where $\mu_d=\int t\,f_d(t)\,dt$ and $f_d(y)$ are the counterfactual densities of the survival time if all the individuals would have received the exposure $d$. The second equality follows as
$$p(Y\in(t,t+dt),\delta=1\mid d,x)=\frac{f(t\mid d,x)\,w(t)\,dt}{\mu_d(x)},\qquad f_{LB}(x)=\frac{\mu_d(x)\,f(x)}{\mu_d},$$
where $w(t)=\int_0^t S_C(s)\,ds$ and $\mu_d(x)=E[Y(d)\mid X=x]=E[Y\mid D=d,X=x]$ [35].

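Assuming the PH model for the counterfactual survival time, we have $\lambda_d(t)=\lambda_0(t)e^{\beta_0 d}$, and the parameter $e^{\beta_0}$ can be interpreted as a causal HR for the total effect of the treatment $D$. We propose the following estimating equation for $\beta_0$,
$$U(\beta)=\sum_{i=1}^{n}\int_0^s\frac{\mathbf{1}(D_i=d_i)}{p(D_i=d_i\mid X_i)}\left[D_i-\frac{\sum_{j=1}^{n}\frac{D_j}{\pi(X_j)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/w(Y_j)}{\sum_{j=1}^{n}\frac{\mathbf{1}(D_j=d_j)}{p(D_j=d_j\mid X_j)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/w(Y_j)}\right]dN_i(u).\tag{4}$$
In Appendix D, we show that eq. (4) corresponds to a score function of a pseudo-partial likelihood which can be presented as
$$I_\beta=\int\frac{\mathbf{1}(D=d)}{p(D=d\mid X)}\left[D-\frac{\sum_{j}^{n}\left[d_j e^{\beta d_j}S_{d_j}(u)/\mu_{d_j}\right]}{\sum_{j}^{n}\left[e^{\beta d_j}S_{d_j}(u)/\mu_{d_j}\right]}\right]dN(u),$$
where $S_d(y)=\int S(y\mid d,x)\,dF(x)$ is the counterfactual survival function of the survival time if all the individuals would have received the exposure $d$. The dependence of the estimating eq. (4) on the parametrization for the propensity score is shown by defining
$$U(\beta,\alpha)=\sum_{i=1}^{n}\int_0^s\frac{\mathbf{1}(D_i=d_i)}{p(D_i=d_i\mid X_i,\alpha)}\left[D_i-\frac{\sum_{j=1}^{n}\frac{D_j}{\pi(X_j,\alpha)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/w(Y_j)}{\sum_{j=1}^{n}\frac{\mathbf{1}(D_j=d_j)}{p(D_j=d_j\mid X_j,\alpha)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/w(Y_j)}\right]dN_i(u).\tag{5}$$
In the proof of Theorem 1 in the Appendix, we show that $U(\beta,\alpha)$ can also be written as
$$U_M(\beta,\alpha)=\sum_{i=1}^{n}\int_0^s\left[D_i-\frac{\sum_{j=1}^{n}\frac{D_j}{\pi(X_j,\alpha)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/w(Y_j)}{\sum_{j=1}^{n}\frac{\mathbf{1}(D_j=d_j)}{p(D_j=d_j\mid X_j,\alpha)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/w(Y_j)}\right]dM_i(u),\tag{6}$$
where
$$dM_i(u)=\frac{\mathbf{1}(D_i=d_i)}{p(D=d_i\mid X_i,\alpha)}\left[dN_i(u)-e^{\beta d_i}\frac{w(u)}{w(Y_i)}\,\delta_i\,I(Y_i>u)\,d\Lambda_0(u)\right].$$
The stochastic process $M(u)$ can be estimated by replacing $w(\cdot)$ and $\Lambda_0(\cdot)$ by their estimates, $\hat w(\cdot)$ and $\hat\Lambda_0(\cdot)$, respectively. In the proof of Theorem 1 given in Appendix B, we show that this stochastic process has mean zero.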
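To make the structure of the estimating function in eq. (4) concrete, here is a small sketch that evaluates it for a binary exposure and finds its root by bisection. It assumes the inverse probability weights $1/p(D_i=d_i\mid X_i)$ and the values $w(Y_i)$ have already been computed and are passed in as lists; the function names are hypothetical, and the $O(n^2)$ double loop is chosen for readability rather than speed.

```python
import math

def cox_double_bias_score(beta, y, d, delta, ipw, w):
    """Evaluate the estimating function of eq. (4) at beta.

    ipw[i] is 1 / p(D_i = d_i | X_i) and w[i] is w(Y_i); both are assumed
    to be supplied by the caller (e.g. from estimated models).
    """
    total = 0.0
    for i in range(len(y)):
        if delta[i] == 0:
            continue                      # dN_i has no jump for censored subjects
        s0 = s1 = 0.0
        for j in range(len(y)):
            # risk set at u = Y_i: uncensored subjects with Y_j >= Y_i
            if delta[j] == 1 and y[j] >= y[i]:
                r = ipw[j] * math.exp(beta * d[j]) / w[j]
                s0 += r
                s1 += r * d[j]
        total += ipw[i] * (d[i] - s1 / s0)
    return total

def solve_beta(y, d, delta, ipw, w, lo=-10.0, hi=10.0, tol=1e-10):
    """Find the root by bisection; the score is non-increasing in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cox_double_bias_score(mid, y, d, delta, ipw, w) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used because the ratio inside the bracket is a weighted mean of $D_j$ whose derivative in $\beta$ is a weighted variance, so the score is monotone; a root inside the search interval is assumed to exist.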

The following theorem addresses the asymptotic properties of the estimator of $\beta$ obtained from the estimating eq. (6) when $w(y)$ and $p(D=d\mid X)$ are replaced by their estimated values $\hat w(y)$ and $p(D=d\mid X,\hat\alpha)$, respectively. The parameters of the propensity score can be estimated using the estimating equation given in eq. (3). Define
$$M_C(s)=\mathbf{1}(Y-A<s,\delta=0)-\int_0^s\mathbf{1}(\min(Y-A,C)>u)\,d\Lambda_C(u),$$
where $\Lambda_C(\cdot)$ is the cumulative hazard function of the censoring variable. The stochastic process $M_C(s)$ can be estimated by replacing $\Lambda_C(\cdot)$ by its estimate, $\hat\Lambda_C(\cdot)$. The stochastic process $M_C(s)$ has mean zero,
$$E[M_C(s)]=E[\mathbf{1}(C<Y-A<s)]-\int_0^s E[\mathbf{1}(Y-A>u)\cdot\mathbf{1}(C>u)]\,d\Lambda_C(u)=\int_0^s S_C(u)\lambda_C(u)S_R(u)\,du-\int_0^s S_C(u)S_R(u)\,d\Lambda_C(u)=0,$$
where $S_R(u)$ is the survival function of the residual life time.

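Theorem 1 Let $\hat\beta_{Cox}$ be the exposure effect estimator obtained as the root of
$$\hat U(\beta,\alpha)=\sum_{i=1}^{n}\int_0^s\frac{\mathbf{1}(D_i=d_i)}{p(D_i=d_i\mid X_i,\hat\alpha)}\left[D_i-\frac{\sum_{j=1}^{n}\frac{D_j}{\pi(X_j,\hat\alpha)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/\hat w(Y_j)}{\sum_{j=1}^{n}\frac{\mathbf{1}(D_j=d_j)}{p(D_j=d_j\mid X_j,\hat\alpha)}e^{\beta D_j}\delta_j\mathbf{1}(Y_j\ge u)/\hat w(Y_j)}\right]dN_i(u).\tag{7}$$
Then under regularity conditions C.1–C.4, C.5.a and C.5.c listed in Appendix A,
$$\sqrt{n}\,(\hat\beta_{Cox}-\beta)\xrightarrow{d}N(0,\zeta(\beta,\alpha)),$$
where $\zeta(\beta,\alpha)$ is defined in Appendix B. Also, the estimating function $\hat U(\beta,\alpha)$ converges in probability to
$$U=\int_0^s\left[D-\frac{E\,S_1(\beta,\alpha,u)}{E\,S_0(\beta,\alpha,u)}\right]dM(u)+\int_0^s\frac{H_1(\beta,u)}{S_C(u)S_R(u)}\,dM_C(u),\tag{8}$$
where $H_1(\beta,u)$ and $S_d(\beta,\alpha,u)$ for $d=0,1$ are defined in Appendix B.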

Proof See Appendix B.

In the absence of length-biased sampling, augmented partial likelihood estimators have been proposed in Robins et al. [27] and van der Laan and Robins [28]. The function U in eq. (8) generalizes this idea to length-biased sampling settings. Note the second part of the summation in U is the augmentation element.

Remark: Parameter β measures the marginal association between the exposure and the hazard, which is not necessarily equal to the conditional association due to non-collapsibility.

5 Accelerated failure time models

Inspired by the AFT models introduced by Cox and Oakes [29], we consider a general form of AFT models, where we do not assume a known error distribution. Assuming the AFT model for the counterfactual survival time, we have
$$\log(T^{pop}(d))=\beta_0 d+\varepsilon,$$
and the parameter $\beta_0$ can be interpreted as a total treatment effect. Under causal identifiability assumptions and by the balancing property of the propensity score, the above model can be written in terms of the observed data as follows
$$\log(T^{pop})=\beta_0 D+\gamma_0\pi(X)+\varepsilon.\tag{9}$$
We refer to this model as the AFT propensity score regression (AFTPSR) model [30]. Higher order and interaction terms can also be included in the model if needed. While AFT models may suffer from lack of robustness with respect to the log transformation, they are often more interpretable [31].

5.1 AFT-weighted estimating equations

Another approach for correcting the bias induced by non-random assignment was suggested by Horvitz and Thompson [32] and Hájek and Dupač [33] who introduced estimators which weight the observed outcomes. The IPTW estimator adjusts for confounding by assigning a weight to each individual proportional to their chance of receiving the exposure they actually received [34, 35].

We generalize the IPTW estimator to account for length-biased sampling. In our setting, the weights are the reciprocal of the probability of being in the exposure group to which each individual is observed to belong. The estimating equation corresponding to IPTW is given by
$$U^{AFT}_{IPTW}=P_n\,\frac{\delta}{w(Y)}\left[\frac{D\log(Y)}{\pi}-\frac{(1-D)\log(Y)}{(1-\pi)}-\beta_0\right]=0,\tag{10}$$
where $P_n$ is the empirical average. This is a version of the complete case influence function introduced by Tsiatis [36], modified to take into account the censoring weight $w(Y)$.
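A minimal sketch of the weighted estimator implied by eq. (10) is given below. It assumes that the estimating function is centred at the treatment effect, so that the root has the closed form of a ratio of weighted sums over uncensored subjects; it also assumes that the propensity scores and the values $w(Y_i)$ have been estimated beforehand and are passed in as lists. The function name `iptw_aft_effect` is hypothetical.

```python
import math

def iptw_aft_effect(y, d, delta, pi, w):
    """Closed-form root of a centred version of eq. (10).

    y: observed survival times, d: binary exposure, delta: failure indicator,
    pi: propensity scores P(D=1|X), w: values w(Y_i). All lists are assumed
    to be precomputed; censored subjects (delta == 0) receive zero weight.
    """
    num = den = 0.0
    for yi, di, dl, pii, wi in zip(y, d, delta, pi, w):
        if dl == 0:
            continue
        omega = 1.0 / wi                           # length-bias/censoring weight
        contrast = (di * math.log(yi) / pii
                    - (1 - di) * math.log(yi) / (1.0 - pii))
        num += omega * contrast
        den += omega
    return num / den
```

For example, with one treated subject at $\log y=2$, one untreated subject at $\log y=1$, propensity scores of 0.5 and unit weights, the function returns the difference in mean log survival times, 1.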

Augmented IPTW (AIPTW), which is a more efficient version of IPTW, was introduced by Scharfstein et al. [37] and Lipsitz et al. [38]. Let $\mu_d(x,\theta)=E[\log(T)\mid D=d,X=x]$ for $d=0,1$. The corresponding estimating equation is given by
$$U^{AFT}_{AIPTW}=P_n\left\{\frac{\delta}{w(Y)}\left[m(X,D,Y)-\beta_0\right]+\int_0^s\frac{\kappa(t)\,dM_C(t)}{S_C(t)S_R(t)}\right\}=0,\tag{11}$$
where
$$\hat\kappa(t)=P_n\,\frac{\delta\,\mathbf{1}(Y>t)\,[m(X,D,Y)]\int_t^Y S_C(v)\,dv}{w^2(Y)}$$
and
$$m(X,D,Y)=\frac{D\,[\log(Y)-\mu_1(X,\theta)]}{\pi(X,\alpha)}-\frac{(1-D)\,[\log(Y)-\mu_0(X,\theta)]}{(1-\pi(X,\alpha))}+\mu_1(X,\theta)-\mu_0(X,\theta).$$
The causal effect estimator corresponding to the influence function in $U^{AFT}_{AIPTW}$ is called a double robust (DR) estimator in the sense that the estimator is consistent if either the propensity score model or the conditional response mean model is correctly specified [28, 36, 39, 40]. The influence function (11) is a member of the class of AIPTW influence functions and it has been shown that it is more efficient than eq. (10) [18]. In the proof of Theorem 2, we show how $U^{AFT}_{AIPTW}$ has been derived.
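The double robustness property can be seen from a short population-level calculation for the treated arm of $m(X,D,Y)$, sketched here without the length-bias and censoring weight $\delta/w(Y)$ and written with generic working models $\pi(X)$ and $\mu_1(X)$ (our shorthand, not the paper's notation). By consistency and ignorability,

```latex
E\!\left[\frac{D\,\{\log T^{pop}-\mu_1(X)\}}{\pi(X)}+\mu_1(X)\,\Big|\,X\right]
=\frac{P(D=1\mid X)}{\pi(X)}\,\Big\{E[\log T^{pop}(1)\mid X]-\mu_1(X)\Big\}+\mu_1(X).
```

If $\pi(X)=P(D=1\mid X)$, the right-hand side equals $E[\log T^{pop}(1)\mid X]$ whatever $\mu_1$ is; if instead $\mu_1(X)=E[\log T^{pop}(1)\mid X]$, the first term vanishes whatever $\pi$ is. The same argument applies to the untreated arm.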

5.2 Asymptotic properties of the WEE estimator

Theorem 2 presents the asymptotic properties of the DR treatment effect estimator obtained by eq. (11) in the presence of length-biased sampling using the AFT models when both the treatment assignment and w(.) are replaced by their estimated values.

Theorem 2 Let $\hat\beta^{AFT}_{DR}$ be a DR estimator corresponding to $U^{AFT}_{AIPTW}$. Then under regularity conditions C.1–C.4 and C.5.b, $\sqrt{n}\,(\hat\beta^{AFT}_{DR}-\beta)\xrightarrow{d}N(0,\eta(\theta,\alpha))$, where $\eta(\theta,\alpha)$ is defined in Appendix B.

Proof See Appendix B.

6 Simulation studies

We examine the performance of the proposed estimating equations for the Cox and the AFT models. In both cases, we simulate 1,000 datasets consisting of 200, 400 and 800 observations to study the performance of the proposed estimating equations for estimating the unmediated causal effect. Here, the censoring variable C is generated from a uniform distribution in the interval (0,τ) where the parameter τ is set such that it results in a desired censoring proportion. To create length-biased samples, we generate a variable A from a uniform distribution (0,ρ) and ignore those whose generated unbiased failure time is less than A.
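The sampling scheme described above can be sketched as follows. The sketch uses exponential population lifetimes purely for illustration (the paper's own failure time models are given in the following subsections), and the function name and default parameter values are hypothetical.

```python
import random

def draw_length_biased_sample(n, rate=1.0, rho=20.0, tau=3.0, seed=0):
    """Return n tuples (a, y, delta) from a simulated prevalent cohort.

    Population lifetimes are Exponential(rate) (an illustrative assumption),
    truncation times are Uniform(0, rho) and residual censoring times are
    Uniform(0, tau).
    """
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        t = rng.expovariate(rate)      # unbiased population lifetime
        a = rng.uniform(0.0, rho)      # time from onset to recruitment
        if t < a:
            continue                   # failed before recruitment: not sampled
        r = t - a                      # residual lifetime after recruitment
        c = rng.uniform(0.0, tau)      # residual censoring time
        delta = 1 if r <= c else 0     # 1 = failure observed
        y = a + min(r, c)              # observed survival time
        sample.append((a, y, delta))
    return sample
```

When `rho` is large relative to the lifetimes and censoring is negligible, the retained lifetimes should have a mean close to twice the population mean, which is the signature of length-biased sampling.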

6.1 Cox model

We generated the population failure times from the hazard model h(t|d,x)=0.2exp{dx1x2+dx1}, where D ∼ Bernoulli(expit{1+3x11x2}) with $X_1$ uniformly distributed on (0,1), $X_2\sim\text{Bernoulli}(0.5)$, and $\text{expit}(t)=e^t/(1+e^t)$. The true marginal treatment effect computed by Monte Carlo is $\theta_0=1.385$. We consider three different unadjusted scenarios: Unadjusted$^{lc}$ is an estimator for which neither the length-biased sampling nor the confounding is adjusted for, Unadjusted$^{c}$ is obtained by adjusting for the length-biased sampling but leaving the confounding unadjusted, and Unadjusted$^{l}$ is obtained by adjusting for the confounding while the length-biased sampling is left unadjusted. The estimating equations for these unadjusted cases are listed in Appendix E.

Table 1 summarizes the marginal estimated treatment effects and their standard errors. Our simulation results confirm that the proposed estimating eq. (7) under Cox model assumption adjusts for both confounding and the length-biased sampling and results in smaller MSE across different sample sizes.

Table 1: Cox proportional hazard model simulation study.

6.2 AFT model estimation

We consider a nonlinear failure time model and include an exposure effect modifier by adding the interaction term between the treatment and a confounder ($x_2$) as follows, log(Tpop)=2.5d+x21+2x2+x1+exp{x1/2}3dx2+ε, where $\varepsilon$ is uniformly distributed on (–1,1), $X_1$ is uniformly distributed on (0,1), $X_2\sim\text{Bernoulli}(0.5)$, and D ∼ Bernoulli(expit{2x13x2}). The estimated treatment effects and their standard errors are listed in Table 2. Similar to the previous section, we consider three different unadjusted scenarios. We have used a correct conditional mean model in the DR estimating equation. The DR estimator dominates the two other estimators in terms of the standard deviation and the MSE. Increasing the censoring proportion increases the variability of the PSR, IPTW and DR estimators, while they remain essentially unbiased. All the unadjusted estimators are biased, and in our parameter setting it seems that the failure to account for the length-biased sampling leads to a more biased estimator compared to Unadjusted$^{c}$. The estimating equations for the unadjusted cases are listed in Appendix E.

Table 2: Accelerated failure time simulation study.

7 Real data analysis: the Canadian study of health and aging

The CSHA, initiated in 1989, is a nationwide study on aging in Canada. One of the objectives of CSHA was to study dementia. The CSHA included three phases in 1991, 1996 and 2001. In the first phase, 10,263 individuals aged 65 or over were sampled at random across Canada, from both rural and urban areas, from communities and institutions for the elderly. Among the participants, 1,132 people were diagnosed with dementia. The ages of dementia onset were assessed from each individual’s medical history. We analyze the data collected during the first phase of the study which began in 1991 by sampling prevalent cases and examining the types of dementia: probable Alzheimer’s disease, possible Alzheimer’s disease and vascular dementia. The age at death or censoring was recorded for each subject from the time of screening, while the age at onset was ascertained retrospectively from caregivers using CAMDEX (Wolfson et al. [20]). Gender, level of education and the types of dementia are available as baseline covariates. The timescale for survival is set in years.

Table 3: CSHA data analysis (semiparametric AFT): estimation of the institutionalization effect on the survival time.

7.1 Exposure of interest: institutionalization

One of the collected covariates is the dichotomous institutionalization (exposure) indicator, which takes the value one if the subject is institutionalized at the time of sampling, and zero otherwise. We are interested in comparing survival of institutionalized subjects with dementia and subjects recruited from the community.

Since there are some covariates which confound the effect of the exposure on the survival time, the crude difference estimator will be biased. We estimate the effect of this covariate while having confounding and length-biased sampling as two sources of estimation bias using Cox PH, and semiparametric AFT models. Our data include 818 subjects (after excluding patients with missing information), of which 180 subjects were right censored [20]. The validity of the stationarity assumption has been shown to be reasonable by Addona and Wolfson [41] and Asgharian et al. [42].

In order to estimate the causal effect of institutionalization, we need to exclude those individuals whose date of institutionalization is after the onset of the disease. However, this information was not recorded in the dataset. We address this limitation using a multiple imputation approach to generate synthetic data on which the estimating equations can be used. Using an informed model, we generate a binary variable, $Z$, conditional on the age at onset, $X_1$, and the gender, $X_2$, that attempts to reveal whether institutionalization occurred prior to onset. Specifically, we used the model $\text{logit}(Z=1\mid x_1,x_2)=2+2\cdot\mathbf{1}(x_1>85)+0.65\cdot\mathbf{1}(74<x_1<84)+0.15x_2$ to generate $Z$, and then excluded patients with $Z=0$, i.e. those patients whose date of institutionalization is after their onset time. We parametrized the above model such that older patients and females, $x_2=1$, are more likely to be institutionalized before the onset of dementia. The values of the parameters are extracted from Carrière and Pelletier [43]. These authors estimate the relationship between sociodemographic characteristics and institutionalization of citizens of Canada. One of the limitations of our logistic model for $Z$ is that we do not have all the covariates that are used in Carrière and Pelletier [43], such as income and marital status. We use the above model to fill in the missing variable repeatedly and create a collection of 20 imputed data sets [44].

Table 4: CSHA data analysis (Cox model): estimation of the institutionalization effect on the survival time.

7.2 Semiparametric AFT models

Table 3 presents the estimated institutionalization effect on survival time obtained from the semiparametric estimating equations proposed in Section 5 under the AFT model. PSR is the estimator based on eq. (9), AWE is the weighted estimator based on eq. (10) and DR is the estimator based on eq. (11), whose asymptotic properties are given in Theorem 2. We consider three different unadjusted scenarios: Unadjusted$^{lc}$, Unadjusted$^{c}$ and Unadjusted$^{l}$. The results reveal that institutionalization has a significant positive effect on the survival time when estimated using AWE and PSR, while with the DR estimator the positive effect is significant at the 10% level. The unadjusted estimator shows a small negative effect. In other words, without adjusting for either the length-biased sampling or the confounding, we might incorrectly conclude that institutionalized subjects tend to have a shorter survival time.

7.3 Cox PH model

Although the residual analysis shows that AFT is a suitable model for this data set (Bergeron et al. [3]), we have also estimated the marginal institutionalization effect using the weighted estimating equation proposed for Cox models (Table 4). The proposed estimating equation for Cox models can be fitted using standard software, and is equivalent to the following command in R for the observed subset of data, $\delta_i=1$ for $i=1,\dots,m$: coxph(Surv(y, del) ~ d + offset(-log(hatwy)), weights = wpi, subset = (del == 1)), where y is the observed survival time, wpi is $\hat\varpi_c=d/\hat\pi+(1-d)/(1-\hat\pi)$ and hatwy is $\hat w(y)$, computed from the Kaplan–Meier estimate of the survivor function of the censoring variable. In this parameterization, the estimated coefficients indicate the increase/decrease in the hazard, while in the AFT model the coefficients indicate a decrease/increase in the survival time; hence coefficients of opposite sign in the AFT and PH models have the same interpretation. To determine whether the fitted Cox model adequately describes the data, we looked at the scaled Schoenfeld residuals plot for the Cox model (Figure 2). There appears to be a trend in the scaled Schoenfeld residuals for the institution indicator variable, which indicates a violation of the PH assumption.
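The role of the offset can be checked by writing out the score equation that a weighted Cox fit with case weights $\varpi_j$ and offsets $o_j$ solves for a single binary covariate, ignoring ties and restricting to subjects with $\delta_j=1$ (a sketch in our notation, not taken from the paper):

```latex
\sum_{i:\,\delta_i=1}\varpi_i\left[d_i-
\frac{\sum_{j:\,\delta_j=1}\varpi_j\,d_j\,e^{\beta d_j+o_j}\,\mathbf{1}(y_j\ge y_i)}
     {\sum_{j:\,\delta_j=1}\varpi_j\,e^{\beta d_j+o_j}\,\mathbf{1}(y_j\ge y_i)}\right]=0 .
```

Comparing the risk-set sums with those in eq. (7) shows that the two coincide when $e^{o_j}=1/\hat w(y_j)$, that is, when the offset is $o_j=-\log\hat w(y_j)$ and the case weight is the inverse probability of the exposure actually received.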

Figure 2: CSHA data: the scaled Schoenfeld residuals obtained by Cox model against transformed time for the estimated institutionalization effect.

Figure 3: CSHA data. Survival curves for adjusted and three different unadjusted scenarios. Unadjusted$^{lc}$: neither the length bias nor the confounding is adjusted for. Unadjusted$^{c}$: the length bias is adjusted for whereas the confounding is left unadjusted. Unadjusted$^{l}$: the confounding is adjusted for whereas the length bias is left unadjusted.

Figure 4: CSHA data. Adjusted (thick lines) and unadjusted (thin lines) survival curves for treated and untreated individuals.

7.4 Survival curves

We compute adjusted and unadjusted survival curves to compare survival with dementia over time between the exposure groups (Figure 3). Several methods have been proposed to adjust for the length-biased sampling, such as the nonparametric maximum likelihood estimator [45–47], the truncation product-limit estimator [48] and the maximum pseudo-partial likelihood estimator [49]. Here, we use the method introduced by Huang and Qin [50], which incorporates the information from the marginal distribution of the truncation time from disease onset to recruitment time. The bias induced by confounding can be adjusted for by creating a pseudo-population using the inverse probability of being in the group that the individuals actually belong to [51–53]. The adjusted survival curves show that the institutionalized patients tend to live longer, while the survival curves cross when unadjusted (Figure 3). Moreover, leaving the length-biased sampling unadjusted may lead to overestimation of the survival times, which is shown in Figure 4. This figure clearly depicts that the survival curve of the institutionalized individuals is always higher than that of those recruited from the community.

8 Concluding remarks

We have presented two different approaches to estimate the exposure effect from right-censored length-biased samples. The estimating equations adjust for two different types of bias at the same time. Our simulation and real data analysis results highlight the importance of adjusting for the two sources of bias; failure to adjust for either the length-biased sampling or the confounding may lead to misleading results.

We have focused on the stationary case. It would, however, be of interest to extend the method to the general left truncation where the left truncation distribution is unknown. This latter approach is robust against departure from stationarity, though it is less efficient when the stationarity assumption holds [45, 46, 54].


Here, we present the assumptions and proofs of the main and other auxiliary results.

Appendix A

The regularity conditions required for the Cox and the weighted AFT models:

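  • C.1 $\mu_d(\cdot)$ for $d=0,1$ is a twice continuously differentiable function.

  • C.2 $\pi(\cdot)$ is bounded away from zero and one ($\gamma<\pi(\cdot)<1-\gamma$ where $\gamma>0$).

  • C.3 $\sup[t:p(R>t)>0]\ge\sup[t:p(C>t)>0]=s$ and $p(\delta=1)>0$.

  • C.4 $\int_0^s\left[\left(\int_t^s S_C(v)\,dv\right)^2/\left(S_C^2(t)\,S_R(t)\right)\right]dS_C(t)<\infty$.

  • C.5

    • (a) $\int_0^s\tau_1^2(t)/\left(S_C^2(t)\,S_R(t)\right)dS_C(t)<\infty$ and $\int_0^s\tau^2(t)/\left(S_C^2(t)\,S_R(t)\right)dS_C(t)<\infty$

    • (b) $\int_0^s\kappa_d^2(t)/\left(S_C^2(t)\,S_R(t)\right)dS_C(t)<\infty$ and $\int_0^s\kappa_2^2(t)/\left(S_C^2(t)\,S_R(t)\right)dS_C(t)<\infty$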

    • (c)


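where
$$\tau_1(t)=E\left[\frac{D\,\delta\,e^{\beta D}\,\mathbf{1}(Y>t)\int_t^Y S_C(v)\,dv}{p(D=1\mid X)\,w^2(Y)}\right],\qquad \tau(t)=E\left[\frac{\delta\,e^{\beta D}\,\mathbf{1}(Y>t)\int_t^Y S_C(v)\,dv}{p(D=d\mid X)\,w^2(Y)}\right],$$
$$\kappa_d(t)=E\left[\frac{\mathbf{1}(D=d)\,\delta\,\mathbf{1}(Y>t)\,[\log(Y)-\mu_d(\pi,\theta)]\int_t^Y S_C(v)\,dv}{p(D=d\mid X)\,w^2(Y)}\right],$$
$$\kappa_2(t)=E\left[\frac{\delta\,\mathbf{1}(Y>t)\,[\mu_1(X,\theta)-\mu_0(X,\theta)-\beta]\int_t^Y S_C(v)\,dv}{p(D=d\mid X)\,w^2(Y)}\right],$$
$$v(u)=E\left[\frac{\delta\,X\,(D-\pi(X,\alpha))\,\mathbf{1}(Y>u)\int_u^Y S_C(v)\,dv}{w^2(Y)}\right].$$
Condition C.1 is a smoothness assumption on the mean function. C.2–C.3 are termed positivity assumptions, meaning that there is a positive chance that a subject falls in either the treatment or the control group and that a subject is not censored, respectively. C.3 is an identifiability condition [54], and C.4–C.5 are required to obtain an estimator with a finite variance.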

Appendix B: Proofs of Theorems 1 and 2

Proof of Theorem 1

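First, we show that the stochastic process $M(s)$ has a mean of zero:
$$M_i(s)=\frac{\mathbf{1}(D_i=d)}{p(D_i=d\mid X_i,\alpha)}\left[N_i(s)-\int_0^s e^{\beta D_i}\frac{w(y)}{w(Y_i)}\,\delta_i\,I(Y_i>y)\,d\Lambda_0(y)\right].$$
Using the density expressions derived in Section 4 and the no unmeasured confounder assumption,
$$E\left[\frac{\mathbf{1}(D_i=d)}{p(D_i=d\mid X_i,\alpha)}N_i(s)\right]=E\left[\frac{\mathbf{1}(D_i=d)}{p(D_i=d\mid X_i,\alpha)}\mathbf{1}(Y_i<s)\,\delta\right]=E\left[\mathbf{1}(Y_i(d)<s)\,\delta\right]=\int_0^s\frac{f_d(y)\,w(y)}{\mu_d}\,dy$$
and
$$E\left[\int_0^s\frac{\mathbf{1}(D_i=d)}{p(D_i=d\mid X_i,\alpha)}e^{\beta D_i}\frac{w(y)}{w(Y_i)}\,\delta_i\,I(Y_i>y)\,d\Lambda_0(y)\right]=E\left[\int_0^s e^{\beta d}\frac{w(y)}{w(Y_i)}\,\delta_i\,I(Y_i(d)>y)\,d\Lambda_0(y)\right]=\int_0^s e^{\beta d}\frac{w(y)\,S_d(y)}{\mu_d}\,d\Lambda_0(y).$$
Under the PH model, $f_d(y)=e^{\beta d}\lambda_0(y)S_d(y)$, so the two expectations coincide. Thus, $E[M_i(s)]=0$.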

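In Section 5, we have shown that the estimating equation $U_M(\beta,\alpha)$ given by eq. (13) is unbiased. We need to show that the two representations (12) and (13) are equal. By the definition of the stochastic process $M_i(s)$, we have

$$U_M(\beta,\alpha)=\sum_{i=1}^n\int_0^s\left\{D_i-\frac{S_1(\beta,\alpha,u)}{S_0(\beta,\alpha,u)}\right\}dM_i(u)$$

$$=U(\beta,\alpha)-\int_0^s\left\{\sum_{i=1}^n\frac{D_i}{\pi(X_i,\alpha)}\,w(u)\,e^{\beta D_i}\,\delta_i\,\frac{1(Y_i\ge u)}{w(Y_i)}-n\,\frac{S_1(\beta,\alpha,u)}{S_0(\beta,\alpha,u)}\,S_0(\beta,\alpha,u)\right\}d\Lambda_0(u)=U(\beta,\alpha),$$

where

$$S_1(\beta,\alpha,u)=n^{-1}\sum_{j=1}^n\frac{D_j}{\pi(X_j,\alpha)}\,w(u)\,e^{\beta D_j}\,\delta_j\,\frac{1(Y_j\ge u)}{w(Y_j)},\qquad S_0(\beta,\alpha,u)=n^{-1}\sum_{j=1}^n\frac{1(D_j=d_j)}{p(D_j=d_j\mid X_j,\alpha)}\,w(u)\,e^{\beta D_j}\,\delta_j\,\frac{1(Y_j\ge u)}{w(Y_j)}.$$

We have the following estimating equation when $w(y)$ is replaced by its estimate, $\hat w(y)$:

$$\hat U_M(\beta,\alpha)=\sum_{i=1}^n\int_0^s\left\{D_i-\frac{\hat S_1(\beta,\alpha,u)}{\hat S_0(\beta,\alpha,u)}\right\}dN_i(u),$$

where

$$\hat S_1(\beta,\alpha,u)=n^{-1}\sum_{j=1}^n\frac{D_j}{\pi(X_j,\alpha)}\,\hat w(u)\,e^{\beta D_j}\,\delta_j\,\frac{1(Y_j\ge u)}{\hat w(Y_j)},\qquad \hat S_0(\beta,\alpha,u)=n^{-1}\sum_{j=1}^n\frac{1(D_j=d_j)}{p(D_j=d_j\mid X_j,\alpha)}\,\hat w(u)\,e^{\beta D_j}\,\delta_j\,\frac{1(Y_j\ge u)}{\hat w(Y_j)}.$$

The estimating equation $\hat U_M(\beta,\alpha)$ can be written as

$$\sum_{i=1}^n\int_0^s\left\{D_i-\frac{E\,S_1(\beta,\alpha,u)}{E\,S_0(\beta,\alpha,u)}\right\}dM_i(u)+\sum_{i=1}^n\int_0^s\left\{\frac{S_1(\beta,\alpha,u)}{S_0(\beta,\alpha,u)}-\frac{\hat S_1(\beta,\alpha,u)}{\hat S_0(\beta,\alpha,u)}\right\}dN_i(u)+o_p(1).$$

Using the strong consistency of $\hat w(y)$ for $w(y)$ [55], we have

$$\frac{1}{\hat w(Y_j)}=\frac{1}{w(Y_j)}\left\{1+\frac{w(Y_j)-\hat w(Y_j)}{w(Y_j)}\right\}+o_p(1),\tag{12}$$

and following the martingale integral representation of $\sqrt{n}\,(\hat w(Y)-w(Y))$ introduced by Shen et al. [4] and Qin and Shen [5], we can show that

$$\int_0^s\left\{\frac{S_1(\beta,\alpha,u)}{S_0(\beta,\alpha,u)}-\frac{\hat S_1(\beta,\alpha,u)}{\hat S_0(\beta,\alpha,u)}\right\}dN_i(u)=\int_0^s\frac{H_1(\beta,u)}{S_C(u)\,S_R(u)}\,dM_{Ci}(u)+o_p(1),$$

where

$$H_1(\beta,u)=E_Y\left[\frac{E[\tau(\tilde Y,u)\,1(\tilde Y>y)]}{E[S_0(\beta,\alpha,y)]}\,\Big|\,Y=y\right]\quad\text{with}\quad\tau(Y,u)=\frac{D\,\delta\,w(y)\,e^{\beta D}\,1(Y>u)\int_u^Y S_C(v)\,dv}{\pi(X,\alpha)\,w^2(Y)},$$

where $\tilde Y$ is independent of and identically distributed to $Y$.

Now we can derive the asymptotic variance of our proposed estimator when $\alpha$ is replaced by $\hat\alpha$ in the propensity score model. Note

$$\tilde\psi(\alpha)=\frac{1}{n}\sum_{i=1}^n\tilde\psi_i(\alpha)=\frac{1}{n}\sum_{i=1}^n\delta_iX_i\,\frac{D_i-\pi(X_i,\alpha)}{\hat w(Y_i)}=\frac{1}{n}\sum_{i=1}^n\delta_iX_i\,\frac{D_i-\pi(X_i,\alpha)}{w(Y_i)}\left\{1+\frac{w(Y_i)-\hat w(Y_i)}{w(Y_i)}\right\}+o_p(1)$$

$$=\frac{1}{n}\sum_{i=1}^n\delta_iX_i\,\frac{D_i-\pi(X_i,\alpha)}{w(Y_i)}+\int_0^s\frac{\hat v(t)\,dM_C(t)}{S_C(t)\,S_R(t)},\tag{13}$$

where

$$\hat v(t)=\frac{1}{n}\sum_{i=1}^n\frac{\delta_i\,I(Y_i>t)\,X_i\,[D_i-\pi(X_i,\alpha)]\int_t^{Y_i}S_C(v)\,dv}{w^2(Y_i)}.$$

Hence, using the Taylor expansion and Theorem 1 in Pugh et al. [14],

$$\zeta(\beta,\alpha)=\left\{E\left[\frac{\partial U}{\partial\beta}\right]\right\}^{-2}\left[E\{U^2\}-E\{U\tilde\psi_i(\alpha)^\top\}\,E\{\tilde\psi_i(\alpha)\tilde\psi_i(\alpha)^\top\}^{-1}\,E\{\tilde\psi_i(\alpha)\,U\}\right],$$

where

$$U=\int_0^s\left\{D-\frac{E\,S_1(\beta,\alpha,u)}{E\,S_0(\beta,\alpha,u)}\right\}dM(u)+\int_0^s\frac{H_1(\beta,u)}{S_C(u)\,S_R(u)}\,dM_C(u).$$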

Note: The asymptotic variance ζ(β,α) may be estimated consistently by replacing the expectations in the expressions for U, τ1 and τ with expectations with respect to the empirical measure.
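The first-order expansion in eq. (12) can be checked directly. As a sketch, write $w=w(Y_j)$ and $\hat w=\hat w(Y_j)$ (shorthand introduced here only for brevity). The identity below is exact. Its last term is of second order in $\hat w-w$, which is what the remainder in eq. (12) absorbs once $\hat w$ is consistent for $w$, assuming in this sketch that $w$ is bounded away from zero at the observed times:

```latex
\frac{1}{\hat w}
  = \frac{1}{w} + \frac{w-\hat w}{w^{2}} + \frac{(w-\hat w)^{2}}{\hat w\,w^{2}}
  = \frac{1}{w}\left\{1+\frac{w-\hat w}{w}\right\} + \frac{(w-\hat w)^{2}}{\hat w\,w^{2}} .
```

Putting the three terms over the common denominator $\hat w\,w^{2}$ gives the numerator $\hat w w+\hat w(w-\hat w)+(w-\hat w)^{2}=w^{2}$, which confirms the identity.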

Proof of Theorem 2

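Following eq. (12) and using the martingale integral representation of $\sqrt{n}\,(\hat w(Y)-w(Y))$, we have

$$\frac{\delta}{\hat w(Y)}\{m(X,D,Y)-\beta_0\}=\frac{\delta}{w(Y)}\{m(X,D,Y)-\beta_0\}+\int_0^s\frac{\kappa(t)\,dM_C(t)}{S_C(t)\,S_R(t)}+o_p(1),$$

where

$$\kappa(t)=E\left[\frac{\delta\,1(Y>t)\,[m(X,D,Y)]\int_t^Y S_C(v)\,dv}{w^2(Y)}\right]$$

and

$$m(X,D,Y)=\frac{D\,[\log(Y)-\mu_1(X,\theta)]}{\pi(X,\alpha)}-\frac{(1-D)\,[\log(Y)-\mu_0(X,\theta)]}{1-\pi(X,\alpha)}+\mu_1(X,\theta)-\mu_0(X,\theta).$$

Hence, the generic elements of the class of influence functions $G^{(AFT)}$,

$$G^{(AFT)}=\left\{\varphi(Y,D,X):\ \frac{\delta}{w(Y)}\{m(X,D,Y)-\beta_0\}+\int_0^s\frac{\kappa(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right\},$$

can be written as $\{-\hat V_0(\theta)+\hat V_1(\theta)+\hat V_2(\theta)\}$, where

$$V_0(\theta,\alpha)=\frac{(1-D)\,\delta\,[\log(Y)-\mu_0(X,\theta)]}{(1-\pi(X,\alpha))\,w(Y)}+\int_0^s\frac{\kappa_0(t)\,dM_C(t)}{S_C(t)\,S_R(t)},$$

$$V_1(\theta,\alpha)=\frac{D\,\delta\,[\log(Y)-\mu_1(X,\theta)]}{\pi(X,\alpha)\,w(Y)}+\int_0^s\frac{\kappa_1(t)\,dM_C(t)}{S_C(t)\,S_R(t)},$$

$$V_2(\theta,\alpha)=\frac{\delta}{w(Y)}\,[\mu_1(X,\theta)-\mu_0(X,\theta)-\beta]+\int_0^s\frac{\kappa_2(t)\,dM_C(t)}{S_C(t)\,S_R(t)},$$

with

$$\kappa_d(t)=E\left[\frac{I(D=d)\,\delta\,I(Y>t)\,[\log(Y)-\mu_d(X,\theta)]\int_t^Y S_C(v)\,dv}{p(D=d\mid X,\alpha)\,w^2(Y)}\right],\quad\text{for } d=0,1,$$

$$\kappa_2(t)=E\left[\frac{\delta\,I(Y>t)\,[\mu_1(X,\theta)-\mu_0(X,\theta)-\beta]\int_t^Y S_C(v)\,dv}{w^2(Y)}\right].$$

In order to show that $G^{(AFT)}$ results in an unbiased estimator, we need to show that $E[V_0(\theta,\alpha)]=E[V_1(\theta,\alpha)]=E[V_2(\theta,\alpha)]=0$. We give the argument for $V_1(\theta,\alpha)$:

$$E[V_1(\theta,\alpha)]=E\left[\frac{D\,\delta\,[\log(Y)-\mu_1(X,\theta)]}{\pi(X,\alpha)\,w(Y)}+\int_0^s\frac{\kappa_1(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right]=E\left[\frac{D\,\delta\,[\log(Y)-\mu_1(X,\theta)]}{\pi(X,\alpha)\,w(Y)}\right]+E\left[\int_0^s\frac{\kappa_1(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right].\tag{14}$$

The first expectation on the RHS of eq. (14) is

$$E\left[\int_0^\infty\!\!\int_0^\infty f(Y=y,A=a,\delta=1,D=1\mid X=x)\times\frac{D\,[\log(y)-\mu_1(x,\theta)]}{\pi(x,\alpha)\,w(y)}\,da\,dy\right]$$

$$=E\left[\int_0^\infty\frac{1}{\pi(x,\alpha)}\,\frac{f(Y=y\mid D=1,X=x)\,w(y)}{\mu_1(x,\theta)}\,\frac{\mu_1(x,\theta)\,\pi(x,\alpha)}{\mu(x,\theta)}\times\frac{[\log(y)-\mu_1(x,\theta)]}{w(y)}\,dy\right]$$

$$=E\left[\frac{1}{\mu(x,\theta)}\int_0^\infty f(y\mid x,D=1)\,[\log(y)-\mu_1(x,\theta)]\,dy\right]=0,$$

where $\mu_d(x,\theta)=\int y\,f(y\mid D=d,x)\,dy$ and $\mu(x,\theta)=\int_0^\infty p(T_{pop}\ge a\mid x,\theta)\,da$. The second equality follows from

$$p(Y\in(t,t+dt),\,\delta=1\mid d,x)=\frac{f(t\mid d,x)\,w(t)\,dt}{\mu_d(x,\theta)}$$

and

$$p_{LB}(D=1\mid X=x)=\frac{\mu_1(x,\theta)\,p(D_{pop}=1\mid X_{pop}=x)}{\mu(x,\theta)},$$

where $p_{LB}(D=1\mid X=x)$ is the propensity score estimated from the length-biased sample and $\pi(x,\alpha)=p(D_{pop}=1\mid X_{pop}=x)$ is the true propensity score. The second expectation on the RHS of eq. (14) is also equal to zero since $E[M_C(s)]=0$. Similarly, we can show that $E[V_0(\theta,\alpha)]=E[V_2(\theta,\alpha)]=0$.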

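It can be shown that $V_0(\theta,\alpha)$ and $V_1(\theta,\alpha)$ are uncorrelated. Hence the asymptotic variance of the estimator is given by

$$\eta(\theta,\alpha)=E\left[V_1^2(\theta,\alpha)+V_0^2(\theta,\alpha)+V_2^2(\theta,\alpha)-2V_0(\theta,\alpha)V_2(\theta,\alpha)+2V_1(\theta,\alpha)V_2(\theta,\alpha)\right],$$

where

$$E\,V_1^2(\theta,\alpha)=E\left[\frac{D\,\delta\,[\log(Y)-\mu_1(X,\theta)]}{\pi(X,\alpha)\,w(Y)}+\int_0^s\frac{\kappa_1(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right]^2,$$

$$E\,V_0^2(\theta,\alpha)=E\left[\frac{(1-D)\,\delta\,[\log(Y)-\mu_0(X,\theta)]}{(1-\pi(X,\alpha))\,w(Y)}+\int_0^s\frac{\kappa_0(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right]^2,$$

$$E\,V_2^2(\theta,\alpha)=E\left[\frac{\delta}{w(Y)}\,[\mu_1(X,\theta)-\mu_0(X,\theta)-\beta]+\int_0^s\frac{\kappa_2(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right]^2,$$

and

$$E\,V_1(\theta,\alpha)V_2(\theta,\alpha)=E\left[\left\{\frac{D\,\delta\,[\log(Y)-\mu_1(X,\theta)]}{\pi(X,\alpha)\,w(Y)}+\int_0^s\frac{\kappa_1(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right\}\times\left\{\frac{\delta}{w(Y)}\,[\mu_1(X,\theta)-\mu_0(X,\theta)-\beta]+\int_0^s\frac{\kappa_2(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right\}\right],$$

$$E\,V_0(\theta,\alpha)V_2(\theta,\alpha)=E\left[\left\{\frac{(1-D)\,\delta\,[\log(Y)-\mu_0(X,\theta)]}{(1-\pi(X,\alpha))\,w(Y)}+\int_0^s\frac{\kappa_0(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right\}\times\left\{\frac{\delta}{w(Y)}\,[\mu_1(X,\theta)-\mu_0(X,\theta)-\beta]+\int_0^s\frac{\kappa_2(t)\,dM_C(t)}{S_C(t)\,S_R(t)}\right\}\right].$$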
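The structure of the asymptotic variance can be read off from the second moment of the combined influence function. As a sketch, assume the three components enter as $V_1-V_0+V_2$ (the sign pattern implied by the definition of $m(X,D,Y)$), that each has mean zero, and that $E[V_0V_1]=0$; the arguments $(\theta,\alpha)$ are suppressed here for brevity:

```latex
E\left[(V_1 - V_0 + V_2)^2\right]
  = E[V_1^2] + E[V_0^2] + E[V_2^2] - 2E[V_0V_1] + 2E[V_1V_2] - 2E[V_0V_2]
  = E[V_1^2] + E[V_0^2] + E[V_2^2] + 2E[V_1V_2] - 2E[V_0V_2].
```

Only the two cross moments involving $V_2$ survive, which is why they are the only cross terms that need to be estimated.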

Table 5:

Accelerated failure time simulation study when the propensity score is misspecified. PSR is the estimator based on eq. (9).

Appendix C: Misspecified propensity score or mean model

In this appendix, we study the performance of the DR AFT estimator (11) when either the propensity score or the mean model is misspecified. We use the same simulation model as in Section 6.2, changing only the treatment assignment model to $D\sim\mathrm{Bernoulli}(\mathrm{expit}\{2-3x_1-3x_2\})$. Table 5 shows results based on 500 data sets of sizes 200 and 800 with 0, 20 and 30 percent censoring. The superscripts m1 and m2 denote misspecification of the propensity score and of the mean model, respectively. The misspecified propensity score ignores the confounder $x_1$ and the misspecified mean model ignores the interaction term $dx_2$ (see Section 6.2). The results confirm that our estimator is doubly robust.
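The kind of propensity score misspecification described above, dropping one confounder from the logistic model, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the coefficients in `true_alpha`, the standard normal covariates, and the helper names `expit`, `fit_logistic` and `log_likelihood` are hypothetical and are not the settings of Section 6.2. The sketch also ignores length-biased sampling and censoring, so it shows only the misspecification step, not the adjusted propensity score estimator.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(design, d, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a logistic regression; returns the coefficients."""
    alpha = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ alpha)
        gradient = design.T @ (d - p)
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

def log_likelihood(design, d, alpha):
    """Bernoulli log-likelihood of a logistic model."""
    eta = design @ alpha
    return np.sum(d * eta - np.logaddexp(0.0, eta))

rng = np.random.default_rng(0)
n = 800
x1, x2 = rng.normal(size=n), rng.normal(size=n)

# Hypothetical assignment model (illustrative coefficients only).
true_alpha = np.array([0.5, -1.0, -1.0])
full = np.column_stack([np.ones(n), x1, x2])
d = rng.binomial(1, expit(full @ true_alpha))

# Correctly specified model uses x1 and x2; the misspecified one drops x1.
reduced = np.column_stack([np.ones(n), x2])
alpha_full = fit_logistic(full, d)
alpha_reduced = fit_logistic(reduced, d)

pi_full = expit(full @ alpha_full)
pi_reduced = expit(reduced @ alpha_reduced)
```

The fitted values `pi_reduced` play the role of a misspecified propensity score that could be plugged into a weighted estimator to examine its sensitivity, alongside `pi_full` from the correctly specified model.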

Appendix D: Derivation of the score function Iβ

The score function $I_\beta$ is derived from the following pseudo-partial likelihood, after adjusting the risk sets for the confounding and the length-biased sampling:

$$L_P(\beta)=\prod_i\left[\frac{\exp(\beta d_i)}{\sum_{j=1}^n\exp(\beta d_j)\,1(y_j\ge y_i\ge a_j)\,1(D_j=d_j)/p(D_j=d_j\mid x_j)}\right]^{\delta_i/p(D_i=d_i\mid x_i)},$$

where $1(y_j\ge y_i\ge a_j)/p(D_j=d_j\mid x_j)$ represents the risk set adjusted for both length-biased sampling and confounding. Following Shen et al. [4] and Qin and Shen [5], we estimate the denominator by

$$\sum_{j=1}^n\exp(\beta d_j)\,\delta_j\,\frac{1(D_j=d_j)\,1(y_j\ge y_i)}{p(D_j=d_j\mid x_j)\,w(y_j)},$$

where the focus is on the uncensored subjects and the risk set is inversely weighted by $w(y_j)$. Note that, under assumptions A1–A3, we have

$$E\left[\frac{1(D=d)}{p(D=d\mid X)}\cdot\frac{\delta\,1(Y\ge y)}{w(Y)}\right]=\frac{S_d(y)}{\mu_d},$$

which justifies the form of the score function $I_\beta$.
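The inversely weighted risk-set sum, and the score obtained by differentiating the log pseudo-partial likelihood with that sum plugged in, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exposure is binary, the propensities `p` (each subject's probability of the exposure actually received) and the weights `w` (values of $w(y_j)$) are supplied as known arrays although in practice both are estimated, and the function names `weighted_score` and `solve_beta` as well as the use of bisection are choices made for this sketch.

```python
import numpy as np

def weighted_score(beta, y, d, delta, p, w):
    """Derivative in beta of the log pseudo-partial likelihood with the
    risk-set sum estimated from uncensored subjects weighted by 1/(p_j w_j)."""
    y, d, delta, p, w = (np.asarray(a, dtype=float) for a in (y, d, delta, p, w))
    omega = delta / (p * w)                       # weight of subject j in a risk set
    at_risk = (y[None, :] >= y[:, None]).astype(float)  # at_risk[i, j] = 1(y_j >= y_i)
    hazard = np.exp(beta * d) * omega
    s0 = at_risk @ hazard                         # weighted risk-set sum
    s1 = at_risk @ (hazard * d)                   # same sum restricted to exposed subjects
    events = delta > 0                            # only uncensored subjects contribute
    return np.sum((d[events] - s1[events] / s0[events]) / p[events])

def solve_beta(y, d, delta, p, w, lo=-10.0, hi=10.0, tol=1e-10):
    """Root of the score by bisection; the score is non-increasing in beta."""
    f_lo = weighted_score(lo, y, d, delta, p, w)
    f_hi = weighted_score(hi, y, d, delta, p, w)
    if f_lo * f_hi > 0:
        raise ValueError("score does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = weighted_score(mid, y, d, delta, p, w)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

For a toy data set with `y = [1, 2, 3, 4]`, `d = [1, 0, 1, 0]`, no censoring, constant propensities and constant weights, the score reduces to a quadratic in $e^{\beta}$, namely $x^2-x-4=0$, so the root can be checked by hand.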

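Appendix E: Cox and AFT estimating equations when either the confounding or the length-biased sampling is ignored

  • 1.

    Estimating equation for the Cox model when length bias is left unadjusted: $$\sum_{i=1}^n\int_0^s\frac{1(D_i=d_i)}{p(D_i=d_i\mid X_i,\alpha)}\left\{D_i-\frac{\sum_{j=1}^n\frac{D_j}{\pi(X_j,\alpha)}\,e^{\beta D_j}\,\delta_j\,1(Y_j\ge u)}{\sum_{j=1}^n\frac{1(D_j=d_j)}{p(D_j=d_j\mid X_j,\alpha)}\,e^{\beta D_j}\,\delta_j\,1(Y_j\ge u)}\right\}dN_i(u)=0.$$

  • 2.

    Estimating equation for the Cox model when the confounding is left unadjusted: $$\sum_{i=1}^n\int_0^s\left\{D_i-\frac{\sum_{j=1}^n D_j\,e^{\beta D_j}\,\delta_j\,1(Y_j\ge u)/w(Y_j)}{\sum_{j=1}^n e^{\beta D_j}\,\delta_j\,1(Y_j\ge u)/w(Y_j)}\right\}dN_i(u)=0.$$

  • 3.

    Estimating equation for the AFT model when length bias is left unadjusted: $$\sum_{i=1}^n\delta_i\left\{\frac{D_i\log(Y_i)}{\pi_i}-\frac{(1-D_i)\log(Y_i)}{1-\pi_i}-\beta\right\}=0.$$

  • 4.

    Estimating equation for the AFT model when the confounding is left unadjusted: $$\sum_{i=1}^n\frac{\delta_i\,D_i}{w(Y_i)}\left\{\log(Y_i)-\beta D_i\right\}=0.$$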
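Because items 3 and 4 are linear in $\beta$, each has a closed-form root. The sketch below assumes item 3 is read as $\sum_i\delta_i\{D_i\log(Y_i)/\pi_i-(1-D_i)\log(Y_i)/(1-\pi_i)-\beta\}=0$ and item 4 as $\sum_i\delta_i D_i\{\log(Y_i)-\beta D_i\}/w(Y_i)=0$ with binary $D_i$. The function names are hypothetical, and the propensities `pi` and weights `w` are taken as given arrays rather than estimated.

```python
import numpy as np

def aft_ignoring_length_bias(y, d, delta, pi):
    """Root of item 3: propensity-weighted contrast of log times averaged
    over uncensored subjects, with no length-bias weights."""
    y, d, delta, pi = (np.asarray(a, dtype=float) for a in (y, d, delta, pi))
    contrast = d * np.log(y) / pi - (1.0 - d) * np.log(y) / (1.0 - pi)
    return np.sum(delta * contrast) / np.sum(delta)

def aft_ignoring_confounding(y, d, delta, w):
    """Root of item 4: length-bias-weighted least squares of log(Y) on D
    among uncensored subjects, with no propensity weights (D is binary)."""
    y, d, delta, w = (np.asarray(a, dtype=float) for a in (y, d, delta, w))
    weight = delta * d / w
    return np.sum(weight * np.log(y)) / np.sum(weight * d)
```

Comparing these two roots with the doubly adjusted estimate on the same data gives a direct view of how much each source of bias contributes.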


References

  • 1.

    Cox DR, Lewis P. The statistical analysis of series of events. Monographs on applied probability and statistics. London: Chapman and Hall, 1966.

  • 2.

    Zelen M, Feinleib M. On the theory of screening for chronic diseases. Biometrika 1969;56:601–14.

  • 3.

    Bergeron PJ, Asgharian M, Wolfson DB. Covariate bias induced by length-biased sampling of failure times. J Am Stat Assoc 2008;103:737–42.

  • 4.

    Shen Y, Ning J, Qin J. Analyzing length-biased data with semiparametric transformation and accelerated failure time models. J Am Stat Assoc 2009;104:1192–202.

  • 5.

    Qin J, Shen Y. Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics 2010;66:382–92.

  • 6.

    Ning J, Qin J, Shen Y. Non-parametric tests for right-censored data with biased sampling. J R Stat Soc Ser B (Stat Methodol) 2010;72:609–30.

  • 7.

    Wicksell SD. The corpuscle problem: a mathematical study of a biometric problem. Biometrika 1925;17:84–99.

  • 8.

    Fisher RA. The effect of methods of ascertainment upon the estimation of frequencies. Ann Hum Genet 1934;6:13–25.

  • 9.

    Neyman J. Statistics–servant of all science. Science 1955;122:401–6.

  • 10.

    Patil GP, Rao CR. Weighted distributions and size-biased sampling with applications to wildlife populations and human families. Biometrics 1978;34:179–89.

  • 11.

    Asgharian M, Wolfson C, Wolfson DB. Analysis of biased survival data: the Canadian study of health and aging and beyond. Stat Action Can Outlook 2014;193–208.

  • 12.

    Binder D. Fitting Cox’s proportional hazards models from survey data. Biometrika 1992;79:139–47.

  • 13.

    Lin D. On fitting Cox’s proportional hazards models to survey data. Biometrika 2000;87:37–47.

  • 14.

    Pugh M, Robins J, Lipsitz S, Harrington D. Inference in the Cox proportional hazards model with missing covariate data. Technical report, Harvard School of Public Health, Dept. of Biostatistics, 1993.

  • 15.

    Chen H, Little R. Proportional hazards regression with missing covariates. J Am Stat Assoc 1999;94:896–908.

  • 16.

    Luo X, Tsai W, Xu Q. Pseudo-partial likelihood estimators for the Cox regression model with missing covariates. Biometrika 2009;96:617–33.

  • 17.

    Qi L, Wang C, Prentice R. Weighted estimators for proportional hazards regression with missing covariates. J Am Stat Assoc 2005;100:1250–63.

  • 18.

    Rotnitzky A, Robins JM. Inverse probability weighting in survival analysis. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics, 2nd ed. New York: Wiley, 2005.

  • 19.

    Ertefaie A, Asgharian M, Stephens D. Propensity score estimation in the presence of length-biased sampling: a non-parametric adjustment approach. Stat 2014;3:83–94.

  • 20.

    Wolfson C, Wolfson DB, Asgharian M, M’Lan CE, Østbye T, Rockwood K, et al. A reevaluation of the duration of survival after the onset of dementia. N Engl J Med 2001;344:1111–16.

  • 21.

    Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat 1978;6:34–58.

  • 22.

    Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat Theory Methods 1994;23:2379–412.

  • 23.

    Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent variable modeling and applications to causality. New York: Springer, 1997:69–117.

  • 24.

    Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.

  • 25.

    Hirano K, Imbens G, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 2003;71:1161–89.

  • 26.

    Cheng Y, Wang M. Estimating propensity scores and causal survival functions using prevalent survival data. Biometrics 2012;68:707–16.

  • 27.

    Robins JM, Rotnitzky A, van der Laan M. On profile likelihood: comment. J Am Stat Assoc 2000;95:477–82.

  • 28.

    van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer Science & Business Media, 2003.

  • 29.

    Cox DR, Oakes D. Analysis of survival data. Chapman & Hall/CRC, 1984.

  • 30.

    Robins JM, Mark SD, Newey WK. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics 1992;48:479–95. URL http://www.jstor.org/stable/2532304

  • 31.

    Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. 2nd ed. New York: Wiley, 2002.

  • 32.

    Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 1952;47:663–85.

  • 33.

    Hájek J, Dupač V. Sampling from a finite population. New York: Marcel Dekker, 1981.

  • 34.

    Hernán M, Brumback B, Robins J. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000;11:561–70.

  • 35.

    Robins JM. Association, causation, and marginal structural models. Synthese 1999;121:151–79.

  • 36.

    Tsiatis AA. Semiparametric theory and missing data. New York: Springer Verlag, 2006.

  • 37.

    Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc 1999;94:1096–120.

  • 38.

    Lipsitz SR, Ibrahim JG, Zhao LP. A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. J Am Stat Assoc 1999;94:1147–60.

  • 39.

    Neugebauer R, van der Laan M. Why prefer double robust estimators in causal inference? J Stat Plan Inference 2005;129:405–26.

  • 40.

    Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proc Am Stat Assoc Sec Bayesian Stat Sci 1999;6–10.

  • 41.

    Addona V, Wolfson DB. A formal test for the stationarity of the incidence rate using data from a prevalent cohort study with follow-up. Lifetime Data Anal 2006;12:267–84.

  • 42.

    Asgharian M, Wolfson DB, Zhang X. Checking stationarity of the incidence rate using prevalent cohort survival data. Stat Med 2006;25:1751–67.

  • 43.

    Carrière Y, Pelletier L. Factors underlying the institutionalization of elderly persons in Canada. J Gerontol Ser B Psychol Sci Soc Sci 1995;50:S164.

  • 44.

    Little R, Rubin DB. Statistical analysis with missing data. Vol. 539. New York: Wiley, 1987.

  • 45.

    Asgharian M, M’Lan CE, Wolfson DB. Length-biased sampling with right censoring. J Am Stat Assoc 2002;97:201–9.

  • 46.

    Asgharian M, Wolfson DB. Asymptotic behavior of the unconditional NPMLE of the length-biased survivor function from right censored prevalent cohort data. Ann Stat 2005;33:2109–31. URL http://www.jstor.org/stable/3448636

  • 47.

    Vardi Y. Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika 1989;76:751–61.

  • 48.

    Wang MC, Jewell NP, Tsai WY. Asymptotic properties of the product limit estimate under random truncation. Ann Stat 1986;14:1597–605.

  • 49.

    Luo X, Tsai W. Nonparametric estimation for right-censored length-biased data: a pseudo-partial likelihood approach. Biometrika 2009;96:873–86.

  • 50.

    Huang CY, Qin J. Nonparametric estimation for length-biased and right-censored data. Biometrika 2011;98:177.

  • 51.

    Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed 2004;75:45–9.

  • 52.

    Nieto FJ, Coresh J. Adjusting survival curves for confounders: a review and a new method. Am J Epidemiol 1996;143:1059.

  • 53.

    Xie J, Liu C. Adjusted Kaplan–Meier estimator and log-rank test with inverse probability of treatment weighting for survival data. Stat Med 2005;24:3089–110.

  • 54.

    Wang MC. Nonparametric estimation from cross-sectional survival data. J Am Stat Assoc 1991;86:130–43.

  • 55.

    Pepe MS, Fleming TR. Weighted Kaplan-Meier statistics: large sample and optimality considerations. J R Stat Soc Ser B (Methodol) 1991;53:341–52.


    About the article

    Published Online: 2015-03-21

    Published in Print: 2015-05-01

    Funding: The data reported in this article were collected as part of the Canadian Study of Health and Aging. The core study was funded by the Seniors’ Independence Research Program, through the National Health Research and Development Program (NHRDP) of Health Canada Project 6606-3954-MC(S). Additional funding was provided by Pfizer Canada Incorporated through the Medical Research Council/Pharmaceutical Manufacturers Association of Canada Health Activity Program, NHRDP Project 6603-1417-302(R), Bayer Incorporated, and the British Columbia Health Research Foundation Projects 38 (93-2) and 34 (96-1). The study was coordinated through the University of Ottawa and the Division of Aging and Seniors, Health Canada. The authors would like to thank Professor Christina Wolfson for providing the data. This work was supported in part by NIDA grant P50 DA010075 and NSF grant SES-1260782. The second and third authors acknowledge the support of Discovery Grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada. Part of the work on this article was completed while the second author was on sabbatical leave at Universite de Bordeaux. He would like to thank the Equipe de Biostatistique, and in particular Daniel Commenges and Helene Jacqmin-Gadda, for their hospitality.

    Citation Information: The International Journal of Biostatistics, Volume 11, Issue 1, Pages 69–89, ISSN (Online) 1557-4679, ISSN (Print) 2194-573X, DOI: https://doi.org/10.1515/ijb-2014-0037.

    © 2015 Walter de Gruyter GmbH, Berlin/Boston.