# The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.

Online ISSN: 1557-4679
Volume 11, Issue 1

# Double Bias: Estimation of Causal Effects from Length-Biased Samples in the Presence of Confounding

Ashkan Ertefaie
/ Masoud Asgharian
/ David A. Stephens
Published Online: 2015-03-21 | DOI: https://doi.org/10.1515/ijb-2014-0037

## Abstract

Length bias in survival data occurs in observational studies when, for example, subjects with shorter lifetimes are less likely to be present in the recorded data. In this paper, we consider estimating the causal exposure (treatment) effect on survival time from observational data when, in addition to the lack of randomization and consequent potential for confounding, the data constitute a length-biased sample; we hence term this a double-bias problem. We develop estimating equations that can be used to estimate the causal effect indexing the structural Cox proportional hazard and accelerated failure time models for point exposures in double-bias settings. The approaches rely on propensity score-based adjustments, and we demonstrate that estimation of the propensity score must be adjusted to acknowledge the length-biased sampling. Large sample properties of the estimators are established and their small sample behavior is studied using simulations. We apply the proposed methods to a set of, partly synthesized, length-biased survival data collected as part of the Canadian Study of Health and Aging (CSHA) to compare survival of subjects with dementia among institutionalized patients versus those recruited from the community and depict their adjusted survival curves.

## 1 Introduction

In many observational studies, logistical or other constraints may render recruitment of disease-free patients for follow-up studies infeasible. In such cases, subjects who already experienced the initiation of the disease prior to recruitment (i.e. prevalent cases) are sampled. It is well known that subjects so recruited do not form a representative sample from the target population because subjects with longer survival times have a greater chance of being recruited into the study. When the disease has stationary incidence, the induced bias in sampling is called length bias [1, 2]. This bias in sampling can lead to bias in the estimation of an exposure effect of interest.

Length-biased sampling can affect the sampling distribution of the covariates, such that covariates associated with longer survival have a higher chance of being selected. Recently, Bergeron et al. [3], Shen et al. [4], Qin and Shen [5] and Ning et al. [6] studied the analysis of covariates under biased sampling. Studies on length-biased sampling can be traced as far back as Wicksell [7], Fisher [8], Neyman [9], Cox and Lewis [1], Zelen and Feinleib [2] and Patil and Rao [10]. An updated review of the subject can be found in Asgharian et al. [11].

A second source of potential bias in estimation of treatment or exposure effects encountered in observational studies is confounding. In the simple case of binary exposure, when exposure is influenced by other predictors, individuals in each exposure group may have different characteristics, yielding imbalanced covariate distributions across the different groups. If the predictors also influence the outcome (say, survival time), this may also lead to bias in the estimated exposure effect. Under an assumption of no unmeasured confounding, a consistent exposure effect estimator can be obtained by two well-known methods: inverse probability of treatment weighting (IPTW) and propensity score regression (PSR). Weighted proportional hazard (PH) models for right-censored data were introduced by Binder [12] and Lin [13] in the survey sampling literature. Pugh, Robins, Lipsitz and Harrington [14] also presented a weighted PH estimating equation to adjust for missing covariates [15–18].

In a recent article, Ertefaie et al. [19] developed a method for estimating the propensity score in the presence of length-biased sampling. In this paper, we address estimation of total causal effects in the presence of both length-biased sampling and confounding, which we term the double-bias problem, in the analysis of survival data. Specifically, we develop augmented estimating equations based on PH and accelerated failure time (AFT) models that can be used to estimate the exposure effect. In both cases, the augmentation spaces are formed using the censoring mechanism to improve the efficiency.

The rest of this paper is organized as follows. In Section 2, we introduce concepts and notation used in the manuscript. Section 3 presents our proposed estimating equation for estimating the propensity score when data are subject to length-biased sampling. In Sections 4 and 5, we present our estimating equations to deal with length-biased sampling and confounding under PH and AFT modeling assumptions, respectively, together with the large sample properties of the resulting estimators. In Section 6, we examine the performance of the proposed approach via simulation, and, in Section 7, we apply our method to a set of length-biased right-censored survival data collected as part of the Canadian Study of Health and Aging (CSHA) to investigate the effect of institutionalization on survival; see Wolfson et al. [20].

## 2.1 Notation

Our notation is similar to that of Ertefaie et al. [19]. Our data comprise n i.i.d. samples of $\left(\mathbf{X},Y,D,A,C,R\right)$, where D and $\mathbf{X}$ are the binary treatment variable and the vector of covariates, respectively. A is the time from the onset of the disease to the recruitment time and R is the time from the recruitment time to the event (residual lifetime). Accordingly, the observed lifetime is defined as $T=A+R$. In the presence of right censoring, C is the censoring time measured from recruitment to loss to follow-up. The observed survival time is $Y=A+\min\left(R,C\right)$. Variables with the superscript pop represent the population variables; variables without pop denote the observed truncated variables. Figure 1 illustrates the different random quantities introduced in this section. The symbols $\circ$ and $×$ denote a censored lifetime and an observed failure, respectively.

Figure 1:

The data structure.

Let F and f be the distribution and density of $T^{\mathrm{pop}}$, respectively. If the onset times are generated by a stationary Poisson process (the so-called stationarity assumption), then

$$F_{LB}(t)=\frac{\int_{0}^{t}s\,dF(s)}{\int_{0}^{\infty}s\,dF(s)}=\frac{1}{\mu}\int_{0}^{t}s\,dF(s)\quad\text{and}\quad f_{LB}(t)=\frac{t\,f(t)}{\mu},\tag{1}$$

if $F_{LB}$ has a corresponding absolutely continuous density $f_{LB}$, where $\mu$ is the mean survival time under F. Equation (1) is derived under a uniform truncation assumption.
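As a quick illustration of eq. (1), consider an example that is not taken from the paper: suppose the population lifetime is exponential with a hypothetical rate $\lambda$, so that $f(t)=\lambda e^{-\lambda t}$ and $\mu=1/\lambda$. Then

$$f_{LB}(t)=\frac{t\,f(t)}{\mu}=\lambda^{2}\,t\,e^{-\lambda t},\qquad \mathbb{E}_{LB}[T]=\int_{0}^{\infty}t\,f_{LB}(t)\,dt=\frac{\mathbb{E}[T^{2}]}{\mu}=\frac{2/\lambda^{2}}{1/\lambda}=\frac{2}{\lambda}.$$

Under this assumption the length-biased lifetime follows a Gamma distribution with shape 2 and rate $\lambda$, and its mean is twice the population mean, which shows how strongly prevalent sampling can over-represent long survivors.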

For $t>0$, we define the process $\left\{N\left(t\right)\right\}$ by $N\left(t\right)=\mathbb{1}\left(Y<t,\delta=1\right)$, where $\delta$ is the censoring indicator ($\delta=1$ indicating failure). We use lowercase letters to refer to the possible values of the corresponding uppercase random variables. Throughout the manuscript, we make the following standard assumptions:

• A1. The variable $\left({T}^{\mathrm{p}\mathrm{o}\mathrm{p}},{D}^{\mathrm{p}\mathrm{o}\mathrm{p}},{\mathbf{X}}^{\mathrm{p}\mathrm{o}\mathrm{p}}\right)$ is independent of the calendar time of the onset of the disease.

• A2. The disease has stationary incidence, i.e. the disease incidence occurs at a constant rate.

• A3. The censoring time C is independent of $\left(A,R,D,\mathbf{X}\right)$.

## 2.2 Counterfactual outcomes

We define the causal effect of interest using the counterfactual framework introduced by Rubin [21]. The counterfactual values $\left(A\left(d\right),R\left(d\right),Y\left(d\right)\right)$ represent the backward recurrence time, the forward recurrence time, and the observed survival time, respectively, if $D=d$. Similarly, $T^{\mathrm{pop}}\left(d\right)$ represents the counterfactual response. The observed response, $T^{\mathrm{pop}}$, is defined as $D\,T^{\mathrm{pop}}\left(1\right)+\left(1-D\right)T^{\mathrm{pop}}\left(0\right)$.

We make the following standard causal assumptions to link the counterfactual outcome and the observed data [22, 23]:

• 1. Consistency: $Y\left(D\right)=Y$.

• 2. Ignorability: $Y\left(d\right)\,\perp\,D\mid\mathbf{X}$.

• 3. Positivity: $p_{D|\mathbf{X}}\left(d|\mathbf{x}\right)>0$, where $p_{D|\mathbf{X}}\left(d|\mathbf{x}\right)$ is the conditional probability of receiving treatment d given $\mathbf{X}=\mathbf{x}$.

Assumption 1 means that the counterfactual outcome under a treatment corresponds to the actual outcome if the subject is assigned to that treatment. Assumption 2 means that within levels of $\mathbf{X}$, treatment D is randomized. Assumption 3 ensures that there is enough overlap between treated and untreated groups for each possible value $\mathbf{x}$. In what follows, we assume that these identifiability assumptions hold.

## 3 Propensity score estimation under length-biased sampling

Rosenbaum and Rubin [24] adjust for differences between exposed and unexposed groups using a scalar function of the measured covariates, the propensity score, which removes the bias induced by differences between these two groups of units. The propensity score, $\mathrm{\pi }\left(\mathbf{x}\right)$, for binary exposure D is defined by $\mathrm{\pi }\left(\mathbf{x}\right)=P\left(D=1|\mathbf{x}\right)$, where $\mathbf{x}$ is a p-dimensional vector of covariates.

In general, the propensity score $\mathrm{\pi }\left(\mathbf{x}\right)$ is unknown and needs to be estimated; it has also been shown that even if the propensity score is known, one may gain efficiency in estimating the average treatment effect (ATE) by estimating $\mathrm{\pi }\left(.\right)$ using the data available [25]. However, estimating the propensity score using a length-biased sample does not lead to a balancing score or create the desired pseudo-population in which the exposure is independent of covariates; indeed, it may induce even more bias than leaving the confounders unadjusted [19].

Assuming a logit model for the propensity score in the target population, we have

$$\pi(\mathbf{x},\alpha)=p\left(D^{\mathrm{pop}}=1\mid\mathbf{X}^{\mathrm{pop}}=\mathbf{x}\right)=\frac{\exp(\mathbf{x}\alpha)}{1+\exp(\mathbf{x}\alpha)},\tag{2}$$

where $\alpha$ is a $p\times 1$ vector of parameters. Cheng and Wang [26] develop a method that consistently estimates the parameters of the propensity score from prevalent survival data. Their method requires correct specification of the conditional hazard model given the treatment and covariates. Ertefaie et al. [19] show that under assumptions A1–A3 this requirement can be removed, and propose the following estimating equation

$$U(\alpha)=\sum_{i=1}^{n}\delta_{i}\,\mathbf{x}_{i}^{T}\,\frac{\left(d_{i}-\pi(\mathbf{x}_{i},\alpha)\right)}{\hat{w}(y_{i})}=0,\tag{3}$$

where $\hat{w}(y)=\int_{0}^{y}\hat{S}_{C}(s)\,ds$ and $\hat{S}_{C}$ is the Kaplan–Meier estimator of the survivor function of the residual censoring variable C. Note that the censored individuals contribute to this estimating equation through $\hat{w}(y)$. Ertefaie et al. [19] show that

$$\frac{1}{n}U(\alpha)=\frac{1}{n}\sum_{i=1}^{n}\left[\delta_{i}\,\mathbf{x}_{i}^{T}\,\frac{\left(d_{i}-\pi(\mathbf{x}_{i},\alpha)\right)}{w(y_{i})}+\mathcal{L}(y_{i},c_{i},a_{i},d_{i},\mathbf{x}_{i})\right]+o_{p}\left(n^{-1/2}\right),$$

where $\mathcal{L}(y_{i},c_{i},a_{i},d_{i},\mathbf{x}_{i})$ is the augmentation element [18]. In this manuscript, we use eq. (3) to estimate the parameters of the propensity score. The term $\mathcal{L}(y_{i},c_{i},a_{i},d_{i},\mathbf{x}_{i})$ augments the failure time of the censored subjects using the observed failure times. We present the form of this augmentation term in Appendix B.
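The estimation steps behind eq. (3) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `km_censoring`, `w_hat` and `fit_propensity` are hypothetical, the Kaplan–Meier step function is computed from the residual times $y-a$ with censoring ($\delta=0$) treated as the event, and Newton's method is one assumed way of finding the root of $U(\alpha)$.

```python
import numpy as np

def km_censoring(v, delta):
    """Kaplan-Meier estimate of S_C from residual times v = y - a.

    A censoring 'event' is an observation with delta == 0.
    Returns the jump times and the survival value right after each jump.
    """
    times = np.unique(v[delta == 0])
    at_risk = np.array([(v >= t).sum() for t in times], dtype=float)
    events = np.array([((v == t) & (delta == 0)).sum() for t in times], dtype=float)
    return times, np.cumprod(1.0 - events / at_risk)

def w_hat(y, times, surv):
    """w(y) = integral over [0, y] of the step function S_C."""
    grid = np.concatenate(([0.0], times, [np.inf]))
    values = np.concatenate(([1.0], surv))        # S_C on each interval of the grid
    edges = np.minimum(grid[None, :], np.asarray(y, dtype=float)[:, None])
    return np.diff(edges, axis=1) @ values        # sum of (interval width) * (S_C value)

def fit_propensity(X, d, delta, w, n_iter=50, tol=1e-10):
    """Solve U(alpha) = sum_i delta_i x_i (d_i - pi_i) / w_i = 0 by Newton's method."""
    alpha = np.zeros(X.shape[1])
    weight = delta / w                            # censored subjects get weight zero
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ alpha))
        score = X.T @ (weight * (d - pi))
        info = (X * (weight * pi * (1.0 - pi))[:, None]).T @ X   # minus the Jacobian of U
        step = np.linalg.solve(info, score)
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha
```

With an intercept-only design the root has a closed form, a weighted proportion of exposed subjects with weights $\delta_i/\hat{w}(y_i)$, which gives a convenient sanity check on the solver.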

## 4 Cox PH models

The hazard ratio (HR) is defined as the ratio of hazards in the exposed and unexposed groups. Qin and Shen [5] introduce a set of estimating equations to assess the effect of covariates on the survival time in the presence of length-biased sampling. Our proposed estimating equation adapts the one introduced by Qin and Shen [5] (under the PH model) so that it adjusts for confounding as well as length-biased sampling. We derive an estimating equation that estimates the marginal treatment effect without needing to estimate the effects of other covariates on the survival time.

Under A1–A3 and the identifiability assumptions, the density of a counterfactual failure time observed in the study under exposure d can be expressed as

$$f\left(y(d),\delta=1\right)=\int f\left(y,\delta=1\mid D=d,\mathbf{x}\right)f_{LB}(\mathbf{x})\,d\mathbf{x}=\int\frac{f_{d}(y\mid\mathbf{x})\,w(y)\,f(\mathbf{x})}{\mu_{d}}\,d\mathbf{x}=\frac{f_{d}(y)\,w(y)}{\mu_{d}},$$

where $\mu_{d}=\int t\,f_{d}(t)\,dt$ and $f_{d}(y)$ are the counterfactual mean and density of the survival time if all the individuals had received the exposure d. The second equality follows as

$$p\left(Y\in(t,t+dt),\delta=1\mid d,\mathbf{x}\right)=\frac{f(t\mid d,\mathbf{x})\,w(t)\,dt}{\mu_{d}(\mathbf{x})},\qquad f_{LB}(\mathbf{x})=\frac{\mu_{d}(\mathbf{x})\,f(\mathbf{x})}{\mu_{d}},$$

where $w(t)=\int_{0}^{t}S_{C}(s)\,ds$ and $\mu_{d}(\mathbf{x})=\mathbb{E}\left[Y(d)\mid\mathbf{X}=\mathbf{x}\right]=\mathbb{E}\left[Y\mid D=d,\mathbf{X}=\mathbf{x}\right]$ [35].

Assuming the PH model for the counterfactual survival time, we have $\lambda_{d}(t)=\lambda_{0}(t)e^{\beta_{0}d}$, and the parameter $e^{\beta_{0}}$ can be interpreted as a causal HR for the total effect of the treatment D. We propose the following estimating equation for $\beta_{0}$:

$$U(\beta)=\sum_{i=1}^{n}\int_{0}^{s}\frac{\mathbb{1}(D_{i}=d_{i})}{p(D_{i}=d_{i}\mid\mathbf{X}_{i})}\left[D_{i}-\frac{\sum_{j=1}^{n}\frac{D_{j}}{\pi(\mathbf{X}_{j})}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}{\sum_{j=1}^{n}\frac{\mathbb{1}(D_{j}=d_{j})}{p(D_{j}=d_{j}\mid\mathbf{X}_{j})}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}\right]dN_{i}(u).\tag{4}$$

In Appendix D, we show that eq. (4) corresponds to a score function of a pseudo-partial likelihood which can be presented as

$$\mathcal{I}_{\beta}=\int\frac{\mathbb{1}(D=d)}{p(D=d\mid\mathbf{X})}\left\{D-\frac{\sum_{j}^{n}\left[d_{j}e^{\beta d_{j}}S_{d_{j}}(u)/\mu_{d_{j}}\right]}{\sum_{j}^{n}\left[e^{\beta d_{j}}S_{d_{j}}(u)/\mu_{d_{j}}\right]}\right\}dN(u),$$

where $S_{d}(y)=\int S(y\mid d,\mathbf{x})\,dF(\mathbf{x})$ is the counterfactual survival function of the survival time if all the individuals had received the exposure d. The dependence of the estimating eq. (4) on the parametrization of the propensity score is shown by defining

$$U(\beta,\alpha)=\sum_{i=1}^{n}\int_{0}^{s}\frac{\mathbb{1}(D_{i}=d_{i})}{p(D_{i}=d_{i}\mid\mathbf{X}_{i},\alpha)}\left[D_{i}-\frac{\sum_{j=1}^{n}\frac{D_{j}}{\pi(\mathbf{X}_{j},\alpha)}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}{\sum_{j=1}^{n}\frac{\mathbb{1}(D_{j}=d_{j})}{p(D_{j}=d_{j}\mid\mathbf{X}_{j},\alpha)}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}\right]dN_{i}(u).\tag{5}$$

In the proof of Theorem 1 in Appendix B, we show that $U(\beta,\alpha)$ can also be written as

$$U^{M}(\beta,\alpha)=\sum_{i=1}^{n}\int_{0}^{s}\left[D_{i}-\frac{\sum_{j=1}^{n}\frac{D_{j}}{\pi(\mathbf{X}_{j},\alpha)}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}{\sum_{j=1}^{n}\frac{\mathbb{1}(D_{j}=d_{j})}{p(D_{j}=d_{j}\mid\mathbf{X}_{j},\alpha)}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}\right]dM_{i}(u),\tag{6}$$

where

$$dM_{i}(u)=\frac{\mathbb{1}(D_{i}=d_{i})}{p(D=d_{i}\mid\mathbf{X}_{i},\alpha)}\left[dN_{i}(u)-e^{\beta d_{i}}\,\frac{w(u)}{w(Y_{i})}\,\delta_{i}\,\mathbb{1}(Y_{i}>u)\,d\Lambda_{0}(u)\right].$$

The stochastic process $M(u)$ can be estimated by replacing $w(.)$ and $\Lambda_{0}(.)$ by their estimates, $\hat{w}(.)$ and $\hat{\Lambda}_{0}(.)$, respectively. In the proof of Theorem 1 given in Appendix B, we show that this stochastic process has mean zero.

The following theorem addresses the asymptotic properties of the estimator of $\beta$ obtained from the estimating eq. (6) when $w(y)$ and $p(D=d\mid\mathbf{X})$ are replaced by their estimated values $\hat{w}(y)$ and $p(D=d\mid\mathbf{X},\alpha)$, respectively. The parameters of the propensity score can be estimated using the estimating equation given in eq. (3). Define

$$M_{C}(s)=\mathbb{1}(Y-A<s,\delta=0)-\int_{0}^{s}\mathbb{1}(Y-A>u)\,d\Lambda_{C}(u),$$

where $\Lambda_{C}(.)$ is the cumulative hazard function of the censoring variable. The stochastic process $M_{C}(s)$ can be estimated by replacing $\Lambda_{C}(.)$ by its estimate, $\hat{\Lambda}_{C}(.)$. The stochastic process $M_{C}(s)$ has mean zero,

$$\begin{array}{rl}\mathbb{E}\left[M_{C}(s)\right]&=\mathbb{E}\left[\mathbb{1}(C<s,\delta=0)\right]-\int_{0}^{s}\mathbb{E}\left[\mathbb{1}(R>u)\,\mathbb{1}(C>u)\right]d\Lambda_{C}(u)\\&=\int_{0}^{s}S_{C}(u)\lambda_{C}(u)S_{R}(u)\,du-\int_{0}^{s}S_{C}(u)S_{R}(u)\,d\Lambda_{C}(u)=0,\end{array}$$

where $S_{R}(u)$ is the survival function of the residual lifetime.

Theorem 1 Let $\hat{\beta}^{\mathrm{Cox}}$ be the exposure effect estimator obtained as the root of

$$\hat{U}(\beta,\alpha)=\sum_{i=1}^{n}\int_{0}^{s}\frac{\mathbb{1}(D_{i}=d_{i})}{p(D_{i}=d_{i}\mid\mathbf{X}_{i},\hat{\alpha})}\left[D_{i}-\frac{\sum_{j=1}^{n}\frac{D_{j}}{\pi(\mathbf{X}_{j},\hat{\alpha})}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/\hat{w}(Y_{j})}{\sum_{j=1}^{n}\frac{\mathbb{1}(D_{j}=d_{j})}{p(D_{j}=d_{j}\mid\mathbf{X}_{j},\hat{\alpha})}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/\hat{w}(Y_{j})}\right]dN_{i}(u).\tag{7}$$

Then under regularity conditions C.1–C.4, C.5.a and C.5.c listed in Appendix A,

$$\sqrt{n}\left(\hat{\beta}^{\mathrm{Cox}}-\beta\right)\stackrel{d}{\to}\mathcal{N}\left(0,\zeta(\beta,\alpha)\right),$$

where $\zeta(\beta,\alpha)$ is defined in Appendix B. Also, the estimating function $\hat{U}(\beta,\alpha)$ converges in probability to

$$U^{\ast}=\int_{0}^{s}\left[D-\frac{\mathbb{E}\left\{S^{1}(\beta,\alpha,u)\right\}}{\mathbb{E}\left\{S^{0}(\beta,\alpha,u)\right\}}\right]dM(u)+\int_{0}^{s}\frac{H_{1}(\beta,u)}{S_{C}(u)S_{R}(u)}\,dM_{C}(u),\tag{8}$$

where $H_{1}(\beta,u)$ and $S^{d}(\beta,\alpha,u)$ for $d=0,1$ are defined in Appendix B.

Proof See Appendix B.
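To make the structure of eq. (7) concrete, the following Python sketch evaluates the estimating function at a given $\beta$ and finds its root. It is an illustrative sketch under stated assumptions, not the authors' code: the names `cox_double_bias_score` and `solve_beta` are hypothetical, the inputs `w` and `pi` are assumed to be precomputed values of $\hat{w}(Y_i)$ and $\pi(\mathbf{X}_i,\hat{\alpha})$, and bisection on an assumed bracket $[-10, 10]$ stands in for whatever root-finder one prefers.

```python
import numpy as np

def cox_double_bias_score(beta, y, d, delta, w, pi):
    """Evaluate the estimating function of eq. (7) at beta.

    y, d, delta : observed times, binary exposure, failure indicators
    w           : values of w-hat at each y (assumed precomputed)
    pi          : estimated propensity scores P(D = 1 | X)
    """
    iptw = np.where(d == 1, 1.0 / pi, 1.0 / (1.0 - pi))   # 1(D=d) / p(D=d | X)
    contrib = iptw * np.exp(beta * d) * delta / w          # weight of subject j in the risk sums
    at_risk = (y[None, :] >= y[:, None]).astype(float)     # at_risk[i, j] = 1(Y_j >= Y_i)
    s1 = at_risk @ (contrib * d)                           # numerator sum at u = Y_i
    s0 = at_risk @ contrib                                 # denominator sum at u = Y_i
    fail = delta == 1                                      # dN_i jumps only at observed failures
    return np.sum(iptw[fail] * (d[fail] - s1[fail] / s0[fail]))

def solve_beta(y, d, delta, w, pi, lo=-10.0, hi=10.0, n_iter=200):
    """Bisection, assuming the score changes sign on [lo, hi]."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if cox_double_bias_score(mid, y, d, delta, w, pi) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used here because, for binary exposure, the ratio of the two risk-set sums is non-decreasing in $\beta$, so the score is non-increasing and has at most one sign change.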

In the absence of length-biased sampling, augmented partial likelihood estimators have been proposed in Robins et al. [27] and van der Laan and Robins [28]. The function $U^{\ast}$ in eq. (8) generalizes this idea to length-biased sampling settings. Note that the second term of $U^{\ast}$ is the augmentation element.

Remark: Parameter $\mathrm{\beta }$ measures the marginal association between the exposure and the hazard, which is not necessarily equal to the conditional association due to non-collapsibility.

## 5 Accelerated failure time models

Inspired by the AFT models introduced by Cox and Oakes [29], we consider a general form of AFT models, where we do not assume a known error distribution. Assuming the AFT model for the counterfactual survival time, we have

$$\log\left(T^{\mathrm{pop}}(d)\right)=\beta_{0}d+\epsilon,$$

and the parameter $\beta_{0}$ can be interpreted as a total treatment effect. Under causal identifiability assumptions and by the balancing property of the propensity score, the above model can be written in terms of the observed data as follows

$$\log\left(T^{\mathrm{pop}}\right)=\beta_{0}D+\gamma_{0}\pi(\mathbf{X})+\epsilon.\tag{9}$$

We refer to this model as the AFT propensity score regression (AFTPSR) model [30]. Higher order and interaction terms can also be included in the model if needed. While AFT models may suffer from lack of robustness with respect to the log transformation, they are often more interpretable [31].

## 5.1 AFT-weighted estimating equations

Another approach for correcting the bias induced by non-random assignment was suggested by Horvitz and Thompson [32] and Hájek and Dupač [33] who introduced estimators which weight the observed outcomes. The IPTW estimator adjusts for confounding by assigning a weight to each individual proportional to their chance of receiving the exposure they actually received [34, 35].

We generalize the IPTW estimator to account for length-biased sampling. In our setting, the weights are the reciprocal of the probability of being in the exposure group to which each individual is observed to belong. The estimating equation corresponding to IPTW is given by

$$U_{\mathrm{IPTW}}^{\mathrm{AFT}}=\mathbb{P}_{n}\frac{\delta}{w(Y)}\left[\frac{D\log(Y)}{\pi}-\frac{(1-D)\log(Y)}{(1-\pi)}\right]=0,\tag{10}$$

where $\mathbb{P}_{n}$ is the empirical average. This is a version of the complete case influence function introduced by Tsiatis [36] modified to take into account the censoring weight $w(Y)$.

Augmented IPTW (AIPTW), which is a more efficient version of IPTW, was introduced by Scharfstein et al. [37] and Lipsitz et al. [38]. Let $\mu_{d}(\mathbf{x},\theta)=\mathbb{E}\left[\log(T)\mid D=d,\mathbf{X}=\mathbf{x}\right]$ for $d=0,1$. The corresponding estimating equation is given by

$$U_{\mathrm{AIPTW}}^{\mathrm{AFT}}=\mathbb{P}_{n}\frac{\delta}{w(Y)}\left[m(\mathbf{X},D,Y)-\beta_{0}\right]+\int_{0}^{s}\frac{\kappa(t)\,dM_{C}(t)}{S_{C}(t)S_{R}(t)}=0,\tag{11}$$

where $\kappa(t)$ is estimated by

$$\hat{\kappa}(t)=\mathbb{P}_{n}\left[\frac{\delta\,\mathbb{1}(Y>t)\,m(\mathbf{X},D,Y)\int_{t}^{Y}S_{C}(v)\,dv}{w^{2}(Y)}\right]$$

and

$$m(\mathbf{X},D,Y)=\frac{D\left[\log(Y)-\mu_{1}(\mathbf{X},\theta)\right]}{\pi(\mathbf{X},\alpha)}-\frac{(1-D)\left[\log(Y)-\mu_{0}(\mathbf{X},\theta)\right]}{1-\pi(\mathbf{X},\alpha)}+\mu_{1}(\mathbf{X},\theta)-\mu_{0}(\mathbf{X},\theta).$$

The causal effect estimator corresponding to an influence function in $U_{\mathrm{AIPTW}}^{\mathrm{AFT}}$ is called a double robust (DR) estimator in the sense that the estimator is consistent if either the propensity score model or the conditional response mean model is correctly specified [28, 36, 39, 40]. The influence function (11) is a member of the class of AIPTW influence functions and it has been shown that it is more efficient than eq. (10) [18]. In the proof of Theorem 2, we show how $U_{\mathrm{AIPTW}}^{\mathrm{AFT}}$ is derived.
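The leading term of eq. (11) can be illustrated with a short Python sketch. The simplifications here are assumptions made for illustration only: the augmentation integral involving $M_C$ is omitted, so that solving $\mathbb{P}_n\,\delta/w(Y)\,[m-\beta_0]=0$ gives a closed-form weighted average, and setting the outcome-model predictions to zero yields a normalized IPTW-type estimate in the spirit of eq. (10). The function names `dr_pseudo_outcome` and `aft_effect` are hypothetical.

```python
import numpy as np

def dr_pseudo_outcome(log_y, d, pi, mu1, mu0):
    """m(X, D, Y): inverse-probability-weighted residuals plus the outcome-model contrast."""
    return (d * (log_y - mu1) / pi
            - (1 - d) * (log_y - mu0) / (1 - pi)
            + mu1 - mu0)

def aft_effect(log_y, d, delta, w, pi, mu1=0.0, mu0=0.0):
    """Root of P_n (delta / w) [m - beta] = 0, with the augmentation integral omitted.

    mu1, mu0 : outcome-model predictions of E[log T | D = 1, X] and E[log T | D = 0, X];
               leaving both at zero gives a normalized IPTW-type estimate.
    """
    m = dr_pseudo_outcome(log_y, d, pi, mu1, mu0)
    weight = delta / w            # censored subjects contribute zero weight
    return np.sum(weight * m) / np.sum(weight)
```

When the outcome model fits the observed failures exactly, the weighted residual terms vanish and the estimate reduces to the weighted average of $\mu_1-\mu_0$, which is one way to see the double robustness property at work.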

## 5.2 Asymptotic properties of the WEE estimator

Theorem 2 presents the asymptotic properties of the DR treatment effect estimator obtained by eq. (11) in the presence of length-biased sampling using the AFT models when both the treatment assignment and $w\left(.\right)$ are replaced by their estimated values.

Theorem 2 Let $\hat{\beta}_{DR}^{AFT}$ be a DR estimator corresponding to $U_{AIPTW}^{AFT}$. Then under regularity conditions C.1–C.4 and C.5.b, $\sqrt{n}\left(\hat{\beta}_{DR}^{AFT}-\beta\right)\stackrel{d}{\to}\mathcal{N}\left(0,\eta(\theta,\alpha)\right),$ where $\eta(\theta,\alpha)$ is defined in Appendix B.

Proof See Appendix B.

## 6 Simulation studies

We examine the performance of the proposed estimating equations for the Cox and the AFT models. In both cases, we simulate 1,000 datasets consisting of 200, 400 and 800 observations to study the performance of the proposed estimating equations for estimating the unmediated causal effect. Here, the censoring variable C is generated from a uniform distribution on the interval $\left(0,\tau\right)$, where the parameter $\tau$ is set so as to produce a desired censoring proportion. To create length-biased samples, we generate a variable A from a uniform distribution on $\left(0,\rho\right)$ and discard those subjects whose generated unbiased failure time is less than A.

## 6.1 Cox model

We generated the population failure times from the hazard model $h(t\mid d,\mathbf{x})=0.2\exp\left\{d-x_{1}-x_{2}+dx_{1}\right\}$, where $D\sim\mathrm{Bernoulli}\left(\mathrm{expit}\left\{-1+3x_{1}-x_{2}\right\}\right)$ with $X_{1}$ uniformly distributed on (0,1), $X_{2}\sim\mathrm{Bernoulli}(0.5)$, and $\mathrm{expit}(t)=e^{t}/(1+e^{t})$. The true marginal treatment effect computed by Monte Carlo is $\theta_{0}=1.385$. We consider three different unadjusted scenarios: Unadjusted${}^{lc}$ is an estimator for which neither the length-biased sampling nor the confounding is adjusted for, Unadjusted${}^{c}$ is obtained by adjusting for the length-biased sampling but leaving the confounding unadjusted, and Unadjusted${}^{l}$ is obtained by adjusting for the confounding while the length-biased sampling is left unadjusted. The estimating equations for these unadjusted cases are listed in Appendix E.
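The data-generating mechanism described above can be sketched in Python as follows. This is a hedged illustration, not the authors' simulation code: the function name `simulate_length_biased` is hypothetical, and the values of $\rho$ and $\tau$ are not reported in the text, so the ones used in the example call are placeholders. Because the hazard does not depend on $t$, the population lifetimes are drawn from an exponential distribution with the stated rate.

```python
import numpy as np

def simulate_length_biased(n_pop, rho, tau, seed=0):
    """Draw a population, then keep only subjects still alive at recruitment (T > A)."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 1.0, n_pop)
    x2 = rng.binomial(1, 0.5, n_pop)
    p_d = 1.0 / (1.0 + np.exp(-(-1.0 + 3.0 * x1 - x2)))   # expit{-1 + 3 x1 - x2}
    d = rng.binomial(1, p_d)
    rate = 0.2 * np.exp(d - x1 - x2 + d * x1)              # constant hazard => exponential lifetime
    t = rng.exponential(1.0 / rate)                        # population failure time
    a = rng.uniform(0.0, rho, n_pop)                       # onset-to-recruitment time
    c = rng.uniform(0.0, tau, n_pop)                       # residual censoring time
    r = t - a                                              # residual lifetime (positive if kept)
    keep = t > a                                           # length-biased selection
    y = a + np.minimum(r, c)                               # observed survival time
    delta = (r <= c).astype(int)                           # 1 = failure observed
    columns = dict(x1=x1, x2=x2, d=d, a=a, y=y, delta=delta, t=t)
    sample = {name: values[keep] for name, values in columns.items()}
    return sample, t

# Placeholder values for rho and tau; tune tau to reach a target censoring proportion.
sample, t_pop = simulate_length_biased(20000, rho=100.0, tau=30.0, seed=1)
```

Comparing the mean of `sample["t"]` with the mean of `t_pop` shows the over-representation of long survivors; a finite $\rho$ only approximates exact length bias when some lifetimes exceed it.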

Table 1 summarizes the marginal estimated treatment effects and their standard errors. Our simulation results confirm that the proposed estimating eq. (7) under the Cox model assumption adjusts for both confounding and length-biased sampling and results in a smaller MSE across the different sample sizes.

Table 1:

Cox proportional hazard model simulation study.

## 6.2 AFT model estimation

We consider a nonlinear failure time model and include an exposure effect modifier by adding an interaction term between the treatment and a confounder ($x_{2}$) as follows,

$$\log\left(T^{\mathrm{pop}}\right)=2.5d+\frac{x_{2}}{1+2x_{2}+x_{1}}+\exp\left\{-x_{1}/2\right\}-3dx_{2}+\epsilon,$$

where $\epsilon$ is uniformly distributed on (–1,1), $X_{1}$ is uniformly distributed on (0,1), $X_{2}\sim\mathrm{Bernoulli}(0.5)$, and $D\sim\mathrm{Bernoulli}\left(\mathrm{expit}\left\{2-x_{1}-3x_{2}\right\}\right)$. The estimated treatment effects and their standard errors are listed in Table 2. As in the previous section, we consider three different unadjusted scenarios. We have used a correct conditional mean model in the DR estimating equation. The DR estimator dominates the two other estimators in terms of the standard deviation and the MSE. Increasing the censoring proportion increases the bias in the PSR, IPTW and DR estimators while maintaining the unbiasedness. All the unadjusted estimators are biased, and in our parameter setting it seems that the failure to account for the length-biased sampling leads to a more biased estimator compared to Unadjusted${}^{c}$. The estimating equations for the unadjusted cases are listed in Appendix E.

Table 2:

Accelerated failure time simulation study.

## 7 Real data analysis: the Canadian study of health and aging

The CSHA, initiated in 1989, is a nationwide study on aging in Canada. One of the objectives of the CSHA was to study dementia. The CSHA included three phases, in 1991, 1996 and 2001. In the first phase, 10,263 individuals aged 65 or over were sampled at random across Canada, from both rural and urban areas, and from communities and institutions for the elderly. Among the participants, 1,132 people were diagnosed with dementia. The ages of dementia onset were assessed from each individual’s medical history. We analyze the data collected during the first phase of the study, which began in 1991 by sampling prevalent cases and examining the types of dementia: probable Alzheimer’s disease, possible Alzheimer’s disease and vascular dementia. The age at death or censoring was recorded for each subject from the time of screening, while the age at onset was ascertained retrospectively from caregivers using CAMDEX (Wolfson et al. [20]). Gender, level of education and the type of dementia are available as baseline covariates. The timescale for survival is set in years.

Table 3:

CSHA data analysis (semiparametric AFT): estimation of the institutionalization effect on the survival time.

## 7.1 Exposure of interest: institutionalization

One of the collected covariates is the dichotomous institutionalization (exposure) indicator, which takes the value one if the subject is institutionalized at the time of sampling, and zero otherwise. We are interested in comparing survival of institutionalized subjects with dementia and subjects recruited from the community.

Since some covariates confound the effect of the exposure on the survival time, the crude difference estimator will be biased. We estimate the effect of institutionalization in the presence of two sources of estimation bias, confounding and length-biased sampling, using Cox PH and semiparametric AFT models. Our data include 818 subjects (after excluding patients with missing information), of whom 180 were right censored [20]. The stationarity assumption has been shown to be reasonable by Addona and Wolfson [41] and Asgharian et al. [42].

In order to estimate the causal effect of institutionalization, we need to exclude individuals whose date of institutionalization is after their onset of the disease. However, this information was not recorded in the dataset. We address this limitation using a multiple imputation approach to generate synthetic data on which the estimating equations can be used. Using an informed model, we generate a binary variable, Z, conditional on the age at onset, $X_{1}$, and the gender, $X_{2}$, that attempts to reveal whether institutionalization occurred prior to onset. Specifically, we used the model $\mathrm{logit}\,p(Z=1|x_{1},x_{2})=-2+2\mathbb{1}(x_{1}>85)+0.65\mathbb{1}(74<x_{1}<84)+0.15x_{2}$ to generate Z, and then excluded patients with $Z=0$, i.e. those patients whose date of institutionalization is after their onset time. We parametrized the above model such that older patients and females, $x_{2}=1$, have a higher chance of being institutionalized before the onset of dementia. The values of the parameters are extracted from Carrière and Pelletier [43], who estimate the relationship between sociodemographic characteristics and institutionalization of citizens of Canada. One limitation of our logistic model for Z is that we do not have all the covariates used in Carrière and Pelletier [43], such as income and marital status. We use the above model to fill in the missing variable repeatedly and create a collection of 20 imputed data sets [44].
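The imputation step for Z can be sketched as below. This is a minimal illustration with hypothetical function names: it evaluates the stated logistic model, using the age indicators exactly as written above, and draws 20 independent copies of Z. How the estimates from the imputed data sets are combined is not shown here.

```python
import numpy as np

def prob_z(age_onset, female):
    """P(Z = 1 | x1, x2) under the stated logistic model.
    age_onset: age at dementia onset (x1); female: 1 for women (x2)."""
    age = np.atleast_1d(np.asarray(age_onset, dtype=float))
    fem = np.atleast_1d(np.asarray(female, dtype=float))
    lin = (-2.0
           + 2.0 * (age > 85)
           + 0.65 * ((age > 74) & (age < 84))
           + 0.15 * fem)
    return 1.0 / (1.0 + np.exp(-lin))

def impute_z(age_onset, female, n_imputations, rng):
    """Return an (n_imputations, n_subjects) array of imputed Z values."""
    p = prob_z(age_onset, female)
    return rng.binomial(1, p, size=(n_imputations, p.shape[0]))

rng = np.random.default_rng(43)
ages = np.array([70.0, 80.0, 90.0])   # illustrative values, not CSHA data
sex = np.array([0, 0, 1])
z_draws = impute_z(ages, sex, n_imputations=20, rng=rng)
# In each imputed data set, subjects with Z = 0 would be excluded.
```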

Table 4:

CSHA data analysis (Cox model): estimation of the institutionalization effect on the survival time.

## 7.2 Semiparametric AFT models

We have estimated the institutionalization effect on survival time using the semiparametric estimating equations proposed in Section 5 under the AFT model; the results are presented in Table 3. PSR is the estimator based on eq. (9), AWE is the weighted estimator based on eq. (10) and DR is the estimator based on eq. (11) described in Theorem 3. We consider three different unadjusted scenarios: Unadjusted${}^{lc}$, Unadjusted${}^{c}$ and Unadjusted${}^{l}$. The results reveal that institutionalization has a significant positive effect on the survival time when estimated using AWE and PSR, while the effect is positive and significant at the $10\%$ level using the DR estimator. The unadjusted estimator shows a small negative effect. In other words, without adjusting for the length-biased sampling and for confounding in the treatment assignment, we might incorrectly conclude that institutionalized subjects tend to have a shorter survival time.

## 7.3 Cox PH model

Although the residual analysis shows that AFT is a suitable model for this data set (Bergeron et al. [3]), we have also estimated the marginal institutionalization effect using the weighted estimating equation proposed for Cox models (Table 4). The proposed estimating equation for Cox models can be fitted using standard software and is equivalent to the following command in R for the observed subset of data, $\delta_{i}=1$ for $i=1,...,m$: `coxph(Surv(y, del) ~ d + offset(-log(hatwy)), weights = wpi, subset = (del == 1))`, where `y` is the observed survival time, `wpi` is $\hat{\varpi}_{c}=d/\hat{\pi}+(1-d)/(1-\hat{\pi})$ and `hatwy` is $\hat{w}(y)$, obtained from the Kaplan–Meier estimate of the distribution of the censoring variable. In this parameterization, the estimated coefficients indicate an increase/decrease in the hazard, while in the AFT model the coefficients indicate a decrease/increase in the survival time; hence coefficients of opposite sign in the AFT and PH models have the same interpretation. To determine whether the fitted Cox model adequately describes the data, we examined the scaled Schoenfeld residuals plot for the Cox model (Figure 2). There appears to be a trend in the scaled Schoenfeld residuals for the institution indicator variable, which indicates a violation of the PH assumption.
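The two weights entering the command above can be sketched as follows. This is a hedged illustration, not the authors' code. The inverse probability weight follows the formula given in the text. For $\hat{w}(y)$ we assume the form $w(y)=\int_0^y S_C(v)\,dv$ used by Shen et al. [4], with $S_C$ estimated by Kaplan–Meier from residual (post-recruitment) follow-up times in which censoring is treated as the event. That time scale and all function names are assumptions.

```python
import numpy as np

def iptw_weights(d, pi_hat):
    """wpi = d / pi_hat + (1 - d) / (1 - pi_hat)."""
    d = np.asarray(d, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    return d / pi_hat + (1.0 - d) / (1.0 - pi_hat)

def km_censoring_survival(resid_time, delta):
    """Kaplan-Meier estimate of the censoring survival function S_C.
    A censoring 'event' is an observation with delta == 0."""
    t = np.asarray(resid_time, dtype=float)
    cens = 1 - np.asarray(delta, dtype=int)
    jump_times = np.unique(t[cens == 1])
    surv, s = [], 1.0
    for u in jump_times:
        at_risk = np.sum(t >= u)
        n_cens = np.sum((t == u) & (cens == 1))
        s *= 1.0 - n_cens / at_risk
        surv.append(s)
    return jump_times, np.array(surv)

def w_hat(y, jump_times, surv):
    """Integral from 0 to y of the step function S_C (equal to 1 before
    the first jump), under the assumed form w(y) = int_0^y S_C(v) dv."""
    knots = np.concatenate(([0.0], jump_times))
    values = np.concatenate(([1.0], surv))
    total = 0.0
    for k in range(len(knots)):
        left = knots[k]
        right = knots[k + 1] if k + 1 < len(knots) else np.inf
        if y <= left:
            break
        total += values[k] * (min(y, right) - left)
    return total

# Toy follow-up times with one censored observation at time 2.
jt, sv = km_censoring_survival([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
w3 = w_hat(3.0, jt, sv)
```

With no censoring before $y$, the integral reduces to $y$ itself, so the offset in the command then depends only on the observed time.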

Figure 2:

CSHA data: the scaled Schoenfeld residuals obtained by Cox model against transformed time for the estimated institutionalization effect.

Figure 3:

CSHA data. Survival curves for the adjusted and three different unadjusted scenarios. Unadjusted${}^{lc}$: neither the length-biased sampling nor the confounding is adjusted for. Unadjusted${}^{c}$: the length-biased sampling is adjusted for, whereas the confounding is left unadjusted. Unadjusted${}^{l}$: the confounding is adjusted for, whereas the length-biased sampling is left unadjusted.

Figure 4:

CSHA data. Adjusted (thick lines) and unadjusted (thin lines) survival curves for treated and untreated individuals.

## 7.4 Survival curves

We compute adjusted and unadjusted survival curves to compare survival with dementia over time between the exposure groups (Figure 3). Several methods have been proposed to adjust for length-biased sampling, such as the nonparametric maximum likelihood estimator [45–47], the truncation product-limit estimator [48] and the maximum pseudo-partial likelihood estimator [49]. Here, we use the method introduced by Huang and Qin [50], which incorporates the information from the marginal distribution of the truncation time from disease onset to recruitment time. The bias induced by confounding can be adjusted for by creating a pseudo-population using the inverse probability of being in the group to which each individual actually belongs [51–53]. The adjusted survival curves show that institutionalized patients tend to live longer, while the survival curves cross when unadjusted (Figure 3). Moreover, leaving the length-biased sampling unadjusted may lead to overestimation of the survival times, as shown in Figure 4. This figure clearly shows that the survival curve of the institutionalized individuals is always higher than that of those recruited from the community.
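The pseudo-population idea can be illustrated with a weighted Kaplan–Meier curve, in which each subject contributes to the risk set and the event count in proportion to an inverse probability weight. This sketch handles only the confounding adjustment; it does not implement the Huang and Qin [50] correction for length-biased sampling, and the function name is hypothetical.

```python
import numpy as np

def weighted_km(time, event, weight):
    """Weighted Kaplan-Meier survival estimate.
    Returns the distinct event times and the survival value just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    weight = np.asarray(weight, dtype=float)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = weight[time >= u].sum()                  # weighted risk set
        died = weight[(time == u) & (event == 1)].sum()    # weighted events
        s *= 1.0 - died / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# With unit weights this reduces to the ordinary Kaplan-Meier estimate.
et, s_unit = weighted_km([1.0, 2.0, 3.0], [1, 0, 1], [1.0, 1.0, 1.0])
# Up-weighting the first subject changes the first step of the curve.
_, s_wtd = weighted_km([1.0, 2.0, 3.0], [1, 0, 1], [2.0, 1.0, 1.0])
```

In practice the curve would be computed separately within each exposure group, using weights of the form $1/\hat{\pi}$ for the exposed and $1/(1-\hat{\pi})$ for the unexposed.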

## 8 Concluding remarks

We have presented two different approaches to estimate the exposure effect from right-censored length-biased samples. The estimating equations adjust for two different types of bias at the same time. Our simulation and real data analysis results highlight the importance of adjusting for the two sources of bias; failure to adjust for either the length-biased sampling or the confounding may lead to misleading results.

We have focused on the stationary case. It would, however, be of interest to extend the method to the general left truncation where the left truncation distribution is unknown. This latter approach is robust against departure from stationarity, though it is less efficient when the stationarity assumption holds [45, 46, 54].

## Appendix

Here, we present the assumptions and proofs of the main and other auxiliary results.

## Appendix A

The regularity conditions required for the Cox and the weighted AFT models:

• C.1 ${\mathrm{\mu }}_{d}\left(.\right)$ for $d=0,1$ is a twice continuously differentiable function.

• C.2 $\mathrm{\pi }\left(.\right)$ is bounded away from zero and one ($\mathrm{\gamma }<\mathrm{\pi }\left(.\right)<1-\mathrm{\gamma }$ where $\mathrm{\gamma }>0$).

• C.3 $\mathrm{s}\mathrm{u}\mathrm{p}\left[t:p\left(R>t\right)>0\right]\le \mathrm{s}\mathrm{u}\mathrm{p}\left[t:p\left(C>t\right)>0\right]=s\text{\hspace{0.17em}}\mathrm{a}\mathrm{n}\mathrm{d}\text{\hspace{0.17em}}p\left(\mathrm{\delta }=1\right)>0$.

• C.4 $\int_{0}^{s}\left[\left(\int_{t}^{s}S_{C}(v)\,dv\right)^{2}/\left(S_{C}^{2}(t)S_{v}(t)\right)\right]dS_{C}(t)<\infty$.

• C.5

• (a)

${\int }_{0}^{s}{\mathrm{\tau }}_{1}^{2}\left(t\right)/\left({S}_{C}^{2}\left(t\right){S}_{R}\left(t\right)\right)d{S}_{C}\left(t\right)<\mathrm{\infty }$ and ${\int }_{0}^{s}{\mathrm{\tau }}^{2}\left(t\right)/\left({S}_{C}^{2}\left(t\right){S}_{R}\left(t\right)\right)d{S}_{C}\left(t\right)<\mathrm{\infty }$

• (b)

${\int }_{0}^{s}{\mathrm{\kappa }}_{d}^{2}\left(t\right)/\left({S}_{C}^{2}\left(t\right){S}_{R}\left(t\right)\right)d{S}_{C}\left(t\right)<\mathrm{\infty }$ and ${\int }_{0}^{s}{\mathrm{\kappa }}_{2}^{2}\left(t\right)/\left({S}_{C}^{2}\left(t\right){S}_{R}\left(t\right)\right)d{S}_{C}\left(t\right)<\mathrm{\infty }$

• (c)

${\int }_{0}^{s}{v}^{2}\left(t\right)/\left({S}_{C}^{2}\left(t\right){S}_{R}\left(t\right)\right)d{S}_{C}\left(t\right)<\mathrm{\infty }$

where

$\tau_{1}(t)=\mathbb{E}\left[\frac{D\delta e^{\beta D}\mathbb{1}(Y>t)\int_{t}^{Y}S_{C}(v)\,dv}{p(D=1|\mathbf{X})w^{2}(Y)}\right],$

$\tau(t)=\mathbb{E}\left[\frac{\delta e^{\beta D}\mathbb{1}(Y>t)\int_{t}^{Y}S_{C}(v)\,dv}{p(D=d|\mathbf{X})w^{2}(Y)}\right],$

$\kappa_{d}(t)=\mathbb{E}\left[\frac{\mathbb{1}(D=d)\delta\mathbb{1}(Y>t)\left[\log(Y)-\mu_{d}(\pi,\theta)\right]\int_{t}^{Y}S_{C}(v)\,dv}{p(D=d|\mathbf{X})w^{2}(Y)}\right],$

$\kappa_{2}(t)=\mathbb{E}\left[\frac{\delta\mathbb{1}(Y>t)\left[\mu_{1}(\mathbf{X},\theta)-\mu_{0}(\mathbf{X},\theta)-\beta\right]\int_{t}^{Y}S_{C}(v)\,dv}{p(D=d|\mathbf{X})w^{2}(Y)}\right],$

$v(u)=\mathbb{E}\left[\frac{\delta\mathbf{X}\left(D-\pi(\mathbf{X},\alpha)\right)\mathbb{1}(Y>u)\int_{u}^{Y}S_{C}(v)\,dv}{w^{2}(Y)}\right].$

Condition C.1 is a smoothness assumption on the mean function. C.2 and the second part of C.3 are positivity assumptions, meaning that there is a positive chance that a subject falls in either the treatment or the control group and a positive chance of not being censored, respectively. The first part of C.3 is an identifiability condition [54], and C.4–C.5 are required to obtain an estimator with a finite variance.

## Proof of Theorem 1

First, we show that the stochastic process $M(s)$ has a mean of zero: $M_{i}(s)=\frac{\mathbb{1}(D_{i}=d)}{p(D_{i}=d|\mathbf{X}_{i},\alpha)}\left[N_{i}(s)-\int_{0}^{s}e^{\beta D_{i}}\frac{w(y)}{w(Y_{i})}\delta_{i}\mathbb{1}(Y_{i}>y)\,d\Lambda_{0}(y)\right].$ Using eqs (8), (10) and the no unmeasured confounder assumption, $\mathbb{E}\left[N_{i}(s)\right]=\mathbb{E}\left[\frac{\mathbb{1}(D_{i}=d)}{p(D_{i}=d|\mathbf{X}_{i},\alpha)}\mathbb{1}(Y_{i}\le s)\delta_{i}\right]=\int_{0}^{s}e^{\beta d}w(y)\frac{S_{d}(y)}{\mu_{d}}\,d\Lambda_{0}(y)$ and $\begin{array}{rl}&\mathbb{E}\left[\int_{0}^{s}\frac{\mathbb{1}(D_{i}=d)}{p(D_{i}=d|\mathbf{X}_{i},\alpha)}e^{\beta D_{i}}\frac{w(y)}{w(Y_{i})}\delta_{i}\mathbb{1}(Y_{i}>y)\,d\Lambda_{0}(y)\right]\\&\quad=\mathbb{E}\left[\int_{0}^{s}e^{\beta d}\frac{w(y)}{w(Y_{i})}\delta_{i}\mathbb{1}(Y_{i}(d)>y)\,d\Lambda_{0}(y)\right]=\int_{0}^{s}e^{\beta d}w(y)\frac{S_{d}(y)}{\mu_{d}}\,d\Lambda_{0}(y).\end{array}$ Thus, $\mathbb{E}\left[M_{i}(s)\right]=0$.

In Section 5, we have shown that the estimating equation ${U}^{M}\left(\mathrm{\beta },\mathrm{\alpha }\right)$ given by eq. (13) is unbiased. We need to show that the two representations (12) and (13) are equal. By the definition of the stochastic process ${M}_{i}\left(s\right)$, we have $\begin{array}{rl}{U}^{M}\left(\mathrm{\beta },\mathrm{\alpha }\right)& =\sum _{i=1}^{n}{\int }_{0}^{s}\left[{D}_{i}-\frac{{S}^{1}\left(\mathrm{\beta },\mathrm{\alpha },t\right)}{{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },t\right)}\right]d{M}_{i}\left(u\right)\\ & =U\left(\mathrm{\beta },\mathrm{\alpha }\right)-{\int }_{0}^{s}\sum _{i=1}^{n}\left[\frac{{D}_{i}}{\mathrm{\pi }\left({\mathbf{X}}_{i},\mathrm{\alpha }\right)}w\left(u\right){e}^{\mathrm{\beta }{D}_{i}}{\mathrm{\delta }}_{i}\mathbb{1}\left({Y}_{i}\ge u\right)/w\left({Y}_{i}\right)\right]-n\frac{{S}^{1}\left(\mathrm{\beta },\mathrm{\alpha },t\right)}{{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },t\right)}{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },t\right)d{\mathrm{\Lambda }}_{0}\left(u\right)\\ & =U\left(\mathrm{\beta },\mathrm{\alpha }\right)\end{array}$where ${S}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)={n}^{-1}\sum _{j=1}^{n}\frac{{D}_{j}}{\mathrm{\pi }\left({\mathbf{X}}_{j},\mathrm{\alpha }\right)}w\left(u\right){e}^{\mathrm{\beta }{D}_{j}}{\mathrm{\delta }}_{j}\mathbb{1}\left({Y}_{j}\ge u\right)/w\left({Y}_{j}\right),$ ${S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)={n}^{-1}\sum _{j=1}^{n}\frac{1\left({D}_{j}={d}_{j}\right)}{p\left({D}_{j}={d}_{j}|{\mathbf{X}}_{j},\mathrm{\alpha }\right)}w\left(u\right){e}^{\mathrm{\beta }{D}_{j}}{\mathrm{\delta }}_{j}\mathbb{1}\left({Y}_{j}\ge u\right)/w\left({Y}_{j}\right).$We have the following estimating equation when $w\left(y\right)$ is replaced by its estimate, $\stackrel{ˆ}{w}\left(y\right)$, ${\stackrel{ˆ}{U}}^{M}\left(\mathrm{\beta },\mathrm{\alpha }\right)=\sum _{i=1}^{n}{\int }_{0}^{s}\left[{D}_{i}-\frac{{\stackrel{ˆ}{S}}^{1}\left(\mathrm{\beta 
},\mathrm{\alpha },u\right)}{{\stackrel{ˆ}{S}}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}\right]d{N}_{i}\left(u\right).$where ${\stackrel{ˆ}{S}}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)={n}^{-1}\sum _{j=1}^{n}\frac{{D}_{j}}{\mathrm{\pi }\left({\mathbf{X}}_{j},\mathrm{\alpha }\right)}\stackrel{ˆ}{w}\left(u\right){e}^{\mathrm{\beta }{D}_{j}}{\mathrm{\delta }}_{j}\mathbb{1}\left({Y}_{j}\ge u\right)/\stackrel{ˆ}{w}\left({Y}_{j}\right),$ ${\stackrel{ˆ}{S}}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)={n}^{-1}\sum _{j=1}^{n}\frac{\mathbb{1}\left({D}_{j}={d}_{j}\right)}{p\left({D}_{j}={d}_{j}|{\mathbf{X}}_{j},\mathrm{\alpha }\right)}\stackrel{ˆ}{w}\left(u\right){e}^{\mathrm{\beta }{D}_{j}}{\mathrm{\delta }}_{j}\mathbb{1}\left({Y}_{j}\ge u\right)/\stackrel{ˆ}{w}\left({Y}_{j}\right).$The estimating equation ${\stackrel{ˆ}{U}}^{M}\left(\mathrm{\beta },\mathrm{\alpha }\right)$ can be written as $\begin{array}{rl}& \sum _{i=1}^{n}{\int }_{0}^{s}\left[{D}_{i}-\frac{\mathbb{E}\left\{{S}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)\right\}}{\mathbb{E}\left\{{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)\right\}}\right]d{M}_{i}\left(u\right)+\sum _{i=1}^{n}{\int }_{0}^{s}\left[\frac{{S}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}{{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}-\frac{{\stackrel{ˆ}{S}}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}{{\stackrel{ˆ}{S}}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}\right]d{N}_{i}\left(u\right)+{o}_{p}\left(1\right).\end{array}$Using the strong consistency of $\stackrel{ˆ}{w}\left(y\right)$ to $w\left(y\right)$ [55], we have $\frac{1}{\stackrel{ˆ}{w}\left({Y}_{j}\right)}=\frac{1}{w\left({Y}_{j}\right)}\left[1+\frac{w\left({Y}_{j}\right)-\stackrel{ˆ}{w}\left({Y}_{j}\right)}{w\left({Y}_{j}\right)}\right]+{o}_{p}\left(1\right),$(12)and following the martingale integral representation $\sqrt{n}\left(\stackrel{ˆ}{w}\left(Y\right)-w\left(Y\right)\right)$ introduced by Shen et 
al. [4] and Qin and Shen [5], we can show that ${\int }_{0}^{s}\left[\frac{{S}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}{{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}-\frac{{\stackrel{ˆ}{S}}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}{{\stackrel{ˆ}{S}}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)}\right]d{N}_{i}\left(u\right)={\int }_{0}^{s}\frac{{H}_{1}\left(\mathrm{\beta },u\right)}{{S}_{C}\left(u\right){S}_{R}\left(u\right)}d{M}_{Ci}\left(u\right)+{o}_{p}\left(1\right),$where ${H}_{1}\left(\mathrm{\beta },u\right)={\mathbb{E}}_{{Y}^{\ast }}\left[\frac{\mathbb{E}\left[\mathrm{\tau }\left(Y,u\right)\mathbb{1}\left(Y>{y}^{\ast }\right)\right]}{\mathbb{E}\left[{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },{y}^{\ast }\right)\right]}|{Y}^{\ast }={y}^{\ast }\right]$ with $\mathrm{\tau }\left(Y,u\right)=\frac{D\mathrm{\delta }w\left({y}^{\ast }\right){e}^{\mathrm{\beta }D}\mathbb{1}\left(Y>u\right){\int }_{u}^{Y}{S}_{C}\left(v\right)dv}{\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right){w}^{2}\left(Y\right)},$where ${Y}^{\ast }$ is independent of and identically distributed to Y. Now we can derive the asymptotic variance of our proposed estimator when $\mathrm{\alpha }$ is replaced by $\stackrel{}{\stackrel{ˆ}{\mathrm{\alpha }}}$ in the propensity score model. 
Note $\begin{array}{rl}\stackrel{˜}{\mathrm{\psi }}\left(\mathrm{\alpha }\right)& =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\stackrel{˜}{\mathrm{\psi }}}_{i}\left(\mathrm{\alpha }\right)=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\mathrm{\delta }}_{i}{\mathbf{X}}_{i}\frac{{D}_{i}-\mathrm{\pi }\left({\mathbf{X}}_{i},\mathrm{\alpha }\right)}{\stackrel{ˆ}{w}\left({Y}_{i}\right)}\\ & =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\mathrm{\delta }}_{i}{\mathbf{X}}_{i}\frac{{D}_{i}-\mathrm{\pi }\left({\mathbf{X}}_{i},\mathrm{\alpha }\right)}{w\left({Y}_{i}\right)}\left[1+\frac{w\left({Y}_{i}\right)-\stackrel{ˆ}{w}\left({Y}_{i}\right)}{w\left({Y}_{i}\right)}\right]+{o}_{p}\left(1\right)\\ & =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\left[{\mathrm{\delta }}_{i}{\mathbf{X}}_{i}\frac{{D}_{i}-\mathrm{\pi }\left({\mathbf{X}}_{i},\mathrm{\alpha }\right)}{w\left({Y}_{i}\right)}+{\int }_{0}^{s}\frac{\stackrel{ˆ}{v}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right],\end{array}$(13)where $\stackrel{ˆ}{v}\left(t\right)=\frac{1}{n}\sum _{i=1}^{n}\left[\frac{{\mathrm{\delta }}_{i}I\left({Y}_{i}>t\right){\mathbf{X}}_{i}\left[{D}_{i}-\mathrm{\pi }\left({\mathbf{X}}_{i},\mathrm{\alpha }\right)\right)\right]{\int }_{t}^{{Y}_{i}}{S}_{C}\left(v\right)dv}{{w}^{2}\left({Y}_{i}\right)}\right].$Hence, using the Taylor expansion and Theorem 1 in Pugh et al. 
[14], $\mathrm{\zeta }\left(\mathrm{\beta },\mathrm{\alpha }\right)={\left[\mathbb{E}\left\{\frac{\mathrm{\partial }{U}^{\ast }}{\mathrm{\partial }\mathrm{\beta }}\right\}\right]}^{-2}\left[\mathbb{E}\left\{{U}^{\ast 2}\right\}-\mathbb{E}\left\{{U}^{\ast }{\stackrel{˜}{\mathrm{\psi }}}_{i}\left(\mathrm{\alpha }{\right)}^{\prime }\right\}\mathbb{E}\left\{{\stackrel{˜}{\mathrm{\psi }}}_{i}\left(\mathrm{\alpha }\right){\stackrel{˜}{\mathrm{\psi }}}_{i}\left(\mathrm{\alpha }{\right)}^{\prime }{\right\}}^{-1}\mathbb{E}\left\{{\stackrel{˜}{\mathrm{\psi }}}_{i}\left(\mathrm{\alpha }\right){{U}^{\prime }}^{\ast }\right\}\right],$where ${U}^{\ast }={\int }_{0}^{s}\left[D-\frac{\mathbb{E}\left\{{S}^{1}\left(\mathrm{\beta },\mathrm{\alpha },u\right)\right\}}{\mathbb{E}\left\{{S}^{0}\left(\mathrm{\beta },\mathrm{\alpha },u\right)\right\}}\right]dM\left(u\right)+{\int }_{0}^{s}\frac{{H}_{1}\left(\mathrm{\beta },u\right)}{{S}_{C}\left(u\right){S}_{R}\left(u\right)}d{M}_{C}\left(u\right).$

Note: The asymptotic variance $\mathrm{\zeta }\left(\mathrm{\beta },\mathrm{\alpha }\right)$ may be estimated consistently by replacing the expectations in the expressions for ${U}^{\ast }$, ${\mathrm{\tau }}_{1}$ and $\mathrm{\tau }$ with expectations with respect to the empirical measure.

## Proof of Theorem 2

Following eq. (12) and using the martingale integral representation $\sqrt{n}\left(\stackrel{ˆ}{w}\left(Y\right)-w\left(Y\right)\right)$, we have $\frac{\mathrm{\delta }}{\stackrel{ˆ}{w}\left(Y\right)}\left[m\left(\mathbf{X},D,Y\right)-{\mathrm{\beta }}_{0}\right]=\frac{\mathrm{\delta }}{w\left(Y\right)}\left[m\left(\mathbf{X},D,Y\right)-{\mathrm{\beta }}_{0}\right]+{\int }_{0}^{s}\frac{\mathrm{\kappa }\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}+{o}_{p}\left(1\right)$where $\mathrm{\kappa }\left(t\right)=\mathbb{E}\left[\frac{\mathrm{\delta }\phantom{\rule{thinmathspace}{0ex}}\mathbb{1}\left(Y>t\right)\left[m\left(\mathbf{X},D,Y\right)\right]{\int }_{t}^{Y}{S}_{C}\left(v\right)\phantom{\rule{thinmathspace}{0ex}}dv}{{w}^{2}\left(Y\right)}\right]$and $m\left(\mathbf{X},D,Y\right)=\frac{D\left[log\left(Y\right)-{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)}-\frac{\left(1-D\right)\left[log\left(Y\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\left(1-\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)\right)}+{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right).$Hence, the generic elements of the class of influence functions ${\mathcal{G}}^{\left(\mathrm{A}\mathrm{F}\mathrm{T}\right)}$ ${\mathcal{G}}^{\left(\mathrm{A}\mathrm{F}\mathrm{T}\right)}=\left\{\mathrm{\phi }\left(Y,D,\mathbf{X}\right):\frac{\mathrm{\delta }}{w\left(Y\right)}\left[m\left(\mathbf{X},D,Y\right)-{\mathrm{\beta }}_{0}\right]+{\int }_{0}^{s}\frac{\mathrm{\kappa }\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\},$can be written as $\left\{{\stackrel{ˆ}{V}}_{0}\left(\mathrm{\theta }\right)-{\stackrel{ˆ}{V}}_{1}\left(\mathrm{\theta }\right)+{\stackrel{ˆ}{V}}_{2}\left(\mathrm{\theta }\right)\right\}$ where ${V}_{0}\left(\mathrm{\theta },\mathrm{\alpha 
}\right)=\frac{\left(1-D\right)\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\left(1-\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)\right)w\left(Y\right)}+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{0}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}$ ${V}_{1}\left(\mathrm{\theta },\mathrm{\alpha }\right)=\frac{D\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)w\left(Y\right)}+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{1}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}$ ${V}_{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)=\frac{\mathrm{\delta }}{w\left(Y\right)}\left[{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)-\mathrm{\beta }\right]+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{2}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)},$with ${\mathrm{\kappa }}_{d}\left(t\right)=\mathbb{E}\left[\frac{I\left(D=d\right)\mathrm{\delta }I\left(Y>t\right)\left[log\left(Y\right)-{\mathrm{\mu }}_{d}\left(\mathbf{X},\mathrm{\theta }\right)\right]{\int }_{t}^{Y}{S}_{C}\left(v\right)dv}{p\left(D=d|\mathbf{X},\mathrm{\alpha }\right){w}^{2}\left(Y\right)}\right],\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\mathrm{f}\mathrm{o}\mathrm{r}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}d\phantom{\rule{thickmathspace}{0ex}}=\phantom{\rule{thickmathspace}{0ex}}0,1$ ${\mathrm{\kappa }}_{2}\left(t\right)=\mathbb{E}\left[\frac{\mathrm{\delta }I\left(Y>t\right)\left[{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)-\mathrm{\beta }\right]{\int 
}_{t}^{Y}{S}_{C}\left(v\right)dv}{{w}^{2}\left(Y\right)}\right].$In order to show that ${\mathcal{G}}^{\left(\mathrm{A}\mathrm{F}\mathrm{T}\right)}$ results in an unbiased estimator, we need to show that $\mathbb{E}\left[{V}_{0}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=\mathbb{E}\left[{V}_{1}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=\mathbb{E}\left[{V}_{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=0$. For the first expectation, we have $\begin{array}{rl}\mathbb{E}\left[{V}_{0}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]& =\mathbb{E}\left[\frac{\left(1-D\right)\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\left(1-\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)\right)w\left(Y\right)}+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{0}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right]\\ & =\mathbb{E}\left[\frac{\left(1-D\right)\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\left(1-\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)\right)w\left(Y\right)}\right]+\mathbb{E}\left[{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{0}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right]\end{array}$(14)The first expectation on the RHS of eq. 
(14) is $\begin{array}{l}=\mathbb{E}\left[{\int }_{0}^{\infty }{\int }_{0}^{\infty }f\left(Y=y,A=a,\delta =1,D=1|X=x\right)×\frac{D\left[\mathrm{log}\left(y\right)-{\mu }_{1}\left(x,\theta \right)\right]}{\pi \left(x,\alpha \right)w\left(Y\right)}dady\right]\\ =\mathbb{E}\left[{\int }_{0}^{\infty }\frac{1}{\pi \left(x,\alpha \right)}\frac{f\left(Y=y|D=1,X=x\right)}{{\mu }_{1}\left(x,\theta \right)}w\left(y\right)\frac{{\mu }_{1}\left(x,\theta \right)}{\mu \left(x,\theta \right)}\pi \left(x,\alpha \right)\right]×\frac{\left[\mathrm{log}\left(y\right)-{\mu }_{1}\left(x,\theta \right)\right]}{w\left(y\right)}dy\right]\\ =\mathbb{E}\left[\frac{1}{\mu \left(x,\theta \right)}{\int }_{0}^{\infty }f\left(y|x,D=1\right)\left[\mathrm{log}\left(Y\right)-{\mu }_{1}\left(x,\theta \right)\right]dy\right]=0,\end{array}$where ${\mathrm{\mu }}_{d}\left(\mathbf{x},\mathrm{\theta }\right)=\int yf\left(y|D=d,\mathbf{x}\right)\phantom{\rule{thinmathspace}{0ex}}dy$ and $\mathrm{\mu }\left(\mathbf{x},\mathrm{\theta }\right)={\int }_{0}^{\mathrm{\infty }}p\left({T}^{\mathrm{p}\mathrm{o}\mathrm{p}}\ge a|\mathbf{x},\mathrm{\theta }\right)$. The second equality follows from $p\left(Y\in \left(t,t+dt\right),\mathrm{\delta }=1|d,\mathbf{x}\right)=\frac{f\left(t|d,\mathbf{x}\right)w\left(t\right)dt}{{\mathrm{\mu }}_{d}\left(\mathbf{x},\mathrm{\theta }\right)},$and ${p}_{LB}\left(D=1|\mathbf{X}=\mathbf{x}\right)=\frac{{\mathrm{\mu }}_{1}\left(\mathbf{x},\mathrm{\theta }\right)p\left({D}^{\mathrm{p}\mathrm{o}\mathrm{p}}=1|{\mathbf{X}}^{\mathrm{p}\mathrm{o}\mathrm{p}}=\mathbf{x}\right)}{\mathrm{\mu }\left(\mathbf{x},\mathrm{\theta }\right)},$where ${p}_{LB}\left(D=1|\mathbf{X}=\mathbf{x}\right)$ is the propensity score estimated from the length-biased sample and $\mathrm{\pi }\left(\mathbf{x},\mathrm{\alpha }\right)=p\left({D}^{\mathrm{p}\mathrm{o}\mathrm{p}}=1|{\mathbf{X}}^{\mathrm{p}\mathrm{o}\mathrm{p}}=\mathbf{x}\right)$ is the true propensity score. The second expectation of the RHS of eq. 
(14) is also equal to zero since $\mathbb{E}\left[{M}_{C}\left(s\right)\right]=0$. Similarly, we can show that $E\left[{V}_{1}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=E\left[{V}_{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=0$.

It can be shown that ${V}_{0}\left(\mathrm{\theta },\mathrm{\alpha }\right)$ and ${V}_{1}\left(\mathrm{\theta },\mathrm{\alpha }\right)$ are uncorrelated. Hence the asymptotic variance of the estimator is given by $\mathrm{\eta }\left(\mathrm{\theta },\mathrm{\alpha }\right)=\mathbb{E}\left[{V}_{1}^{\ast 2}\left(\mathrm{\theta },\mathrm{\alpha }\right)+{V}_{0}^{\ast 2}\left(\mathrm{\theta },\mathrm{\alpha }\right)+{V}_{2}^{\ast 2}\left(\mathrm{\theta },\mathrm{\alpha }\right)-{V}_{0}^{\ast 2}\left(\mathrm{\theta },\mathrm{\alpha }\right){V}_{2}^{\ast 2}\left(\mathrm{\theta }\right)+{V}_{1}^{\ast 2}\left(\mathrm{\theta },\mathrm{\alpha }\right){V}_{2}^{\ast 2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right].$where $\mathbb{E}\left[{V}_{1}^{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=\mathbb{E}\left[{\left\{\frac{D\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)w\left(Y\right)}+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{1}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\}}^{2}\right]$ $\mathbb{E}\left[{V}_{0}^{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=\mathbb{E}\left[{\left\{\frac{\left(1-D\right)\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\left(1-\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)\right)w\left(Y\right)}+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{0}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\}}^{2}\right]$ $\mathbb{E}\left[{V}_{2}^{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]=\mathbb{E}\left[{\left\{\frac{\mathrm{\delta }}{w\left(Y\right)}\left[{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)-\mathrm{\beta }\right]+{\int }_{0}^{s}\frac{{\mathrm{\kappa 
}}_{2}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\}}^{2}\right],$and $\begin{array}{rl}\mathbb{E}\left[{V}_{1}\left(\mathrm{\theta },\mathrm{\alpha }\right){V}_{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]& =\mathbb{E}\left[\left\{\frac{D\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\left(\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)\right)w\left(Y\right)}+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{1}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\}\right)\\ & \phantom{\rule{1em}{0ex}}×\left(\left\{\frac{\mathrm{\delta }}{w\left(Y\right)}\left[{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)-\mathrm{\beta }\right]+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{2}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\}]\\ \mathbb{E}\left[{V}_{0}\left(\mathrm{\theta },\mathrm{\alpha }\right){V}_{2}\left(\mathrm{\theta },\mathrm{\alpha }\right)\right]& =\mathbb{E}\left[\left\{\frac{\left(1-D\right)\mathrm{\delta }\left[log\left(Y\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)\right]}{\left(1-\mathrm{\pi }\left(\mathbf{X},\mathrm{\alpha }\right)\right)w\left(Y\right)}+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{0}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\}\right)\\ & \phantom{\rule{1em}{0ex}}×\left(\left\{\frac{\mathrm{\delta }}{w\left(Y\right)}\left[{\mathrm{\mu }}_{1}\left(\mathbf{X},\mathrm{\theta }\right)-{\mathrm{\mu }}_{0}\left(\mathbf{X},\mathrm{\theta }\right)-\mathrm{\beta }\right]+{\int }_{0}^{s}\frac{{\mathrm{\kappa }}_{2}\left(t\right)d{M}_{C}\left(t\right)}{{S}_{C}\left(t\right){S}_{R}\left(t\right)}\right\}].\end{array}$

Table 5: Accelerated failure time simulation study when the propensity score is misspecified. PSR is the estimator based on eq. (9).

## Appendix C: Misspecified propensity score or mean model

In this appendix, we study the performance of the DR AFT estimator (11) when either the propensity score or the mean model is misspecified. We use the same simulation model as in Section 6.2, changing only the treatment assignment model to $D\sim \mathrm{Bernoulli}\left(\mathrm{expit}\left\{2-3x_{1}-3x_{2}\right\}\right)$. Table 5 shows results based on 500 data sets of sizes 200 and 800 with 0, 20 and 30 percent censoring. The superscripts $m1$ and $m2$ denote misspecification of the propensity score and of the mean model, respectively: the misspecified propensity score ignores the confounder $x_{1}$, and the misspecified mean model ignores the interaction term $dx_{2}$ (see Section 6.2). The results confirm that our estimator is doubly robust.
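The double-robustness property examined in this appendix can be illustrated with a deliberately simplified sketch. It is not the simulation design of Section 6.2: the data-generating model (one binary confounder, a true effect of 1 on the log-time scale), the function names, and the sample size are all assumptions made for illustration, and censoring and the length-bias weight $\delta/w(Y)$ are omitted. The estimator keeps only the augmented inverse-probability-weighted structure, with a weighted residual term for each arm plus the difference of fitted means.

```python
import random


def dr_estimate(data, pi, m1, m0):
    """Augmented IPW estimate of the treatment effect on the log-time scale."""
    total = 0.0
    for x, d, y in data:
        total += (d * (y - m1(x)) / pi(x)
                  - (1 - d) * (y - m0(x)) / (1.0 - pi(x))
                  + m1(x) - m0(x))
    return total / len(data)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n, beta=1.0, seed=1):
    """Toy data (assumed design): binary confounder x, treatment d, outcome y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        d = int(rng.random() < 0.2 + 0.6 * x)   # treatment depends on x
        y = 1.0 + beta * d + 2.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, d, y))
    return data


def compare(data):
    """Estimate the effect under the four combinations of working models."""
    # Correctly specified working models: saturated in the confounder x.
    pi_by_x = {x: mean(d for xx, d, _ in data if xx == x) for x in (0, 1)}
    m_by_dx = {(d, x): mean(y for xx, dd, y in data if dd == d and xx == x)
               for d in (0, 1) for x in (0, 1)}
    # Misspecified working models: both ignore the confounder x.
    pi_marginal = mean(d for _, d, _ in data)
    m_by_d = {d: mean(y for _, dd, y in data if dd == d) for d in (0, 1)}

    pi_ok = lambda x: pi_by_x[x]
    pi_bad = lambda x: pi_marginal
    m1_ok = lambda x: m_by_dx[(1, x)]
    m0_ok = lambda x: m_by_dx[(0, x)]
    m1_bad = lambda x: m_by_d[1]
    m0_bad = lambda x: m_by_d[0]
    return {
        "both correct": dr_estimate(data, pi_ok, m1_ok, m0_ok),
        "propensity misspecified": dr_estimate(data, pi_bad, m1_ok, m0_ok),
        "mean model misspecified": dr_estimate(data, pi_ok, m1_bad, m0_bad),
        "both misspecified": dr_estimate(data, pi_bad, m1_bad, m0_bad),
    }


results = compare(simulate(20000))
for label, estimate in results.items():
    print(f"{label}: {estimate:.3f}")
```

In this toy design the estimate should stay close to the true value of 1 whenever at least one working model is correct. When both ignore the confounder, the estimator reduces to the unadjusted difference in means, whose large-sample limit under the assumed design is 2.2.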

## Appendix D: Derivation of the score function ${\mathcal{I}}_{\mathrm{\beta }}$

The score function ${\mathcal{I}}_{\beta}$ is derived from the following pseudo-partial likelihood, obtained after adjusting the risk sets for confounding and length-biased sampling:

$\mathcal{L}_{P}(\beta)=\prod_{i}\left\{\frac{\exp(\beta d_{i})}{\sum_{j=1}^{n}\exp(\beta d_{j})\,\mathbb{1}(y_{j}\ge y_{i}\ge a_{j})\,\mathbb{1}(D_{j}=d_{j})/p(D_{j}=d_{j}|\mathbf{x}_{j})}\right\}^{\delta_{i}/p(D_{i}=d_{i}|\mathbf{x}_{i})},$

where $\mathbb{1}(y_{j}\ge y_{i}\ge a_{j})/p(D_{j}=d_{j}|\mathbf{x}_{j})$ represents the risk set adjusted for both length-biased sampling and confounding. Following Shen et al. [4] and Qin and Shen [5], we estimate the denominator by

$\sum_{j=1}^{n}\exp(\beta d_{j})\frac{\delta_{j}\,\mathbb{1}(D_{j}=d_{j})\,\mathbb{1}(y_{j}\ge y_{i})}{p(D_{j}=d_{j}|\mathbf{x}_{j})\,w(y_{j})},$

where the focus is on the uncensored subjects and the risk set is inversely weighted by $w(y_{j})$. Note that, under assumptions A1–A3, we have

$\mathbb{E}\left[\frac{\mathbb{1}(D=d)}{p(D=d|\mathbf{X})}\cdot\frac{\delta\,\mathbb{1}(Y\ge y)}{w(Y)}\right]=\frac{S_{d}(y)}{\mu_{d}},$

which justifies the form of the score function ${\mathcal{I}}_{\beta}$.
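As a rough illustration of how this construction can be evaluated numerically, the sketch below differentiates the logarithm of the pseudo-partial likelihood with the estimated denominator plugged in. This gives $\sum_{i}\frac{\delta_{i}}{p_{i}}\left[d_{i}-S^{(1)}(\beta,y_{i})/S^{(0)}(\beta,y_{i})\right]$, where $S^{(0)}$ is the estimated denominator and $S^{(1)}$ is the same sum with an extra factor $d_{j}$. The function names, the bisection bounds, and the reading of the risk-set weight as $p_{j}=p(D_{j}=d_{j}|\mathbf{x}_{j})$ are assumptions of this sketch. The weight function `w` is supplied by the caller because its definition lies outside this appendix.

```python
import math


def score(beta, d, y, delta, p, w):
    """Derivative in beta of the log pseudo-partial likelihood (sketch).

    d, y, delta, p are equal-length sequences: treatment indicator, observed
    time, event indicator, and probability of the treatment actually received.
    w is a callable giving the length-bias weight at a time point.
    """
    total = 0.0
    for di, yi, deltai, pi in zip(d, y, delta, p):
        if deltai == 0:
            continue  # censored subjects contribute no event term
        s0 = s1 = 0.0
        for dj, yj, deltaj, pj in zip(d, y, delta, p):
            # Weighted risk set: uncensored subjects still at risk at yi.
            if deltaj == 1 and yj >= yi:
                term = math.exp(beta * dj) / (pj * w(yj))
                s0 += term
                s1 += dj * term
        total += (di - s1 / s0) / pi
    return total


def solve_beta(d, y, delta, p, w, lo=-10.0, hi=10.0, tol=1e-10):
    """Find the root of the score by bisection (the score is non-increasing)."""
    if score(lo, d, y, delta, p, w) < 0 or score(hi, d, y, delta, p, w) > 0:
        raise ValueError("no sign change of the score on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, d, y, delta, p, w) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical toy inputs; w(t) = t is a placeholder, not the paper's weight.
d = [1, 0, 1, 0]
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 1, 1, 1]
p = [0.6, 0.4, 0.7, 0.5]
beta_hat = solve_beta(d, y, delta, p, w=lambda t: t)
print(f"beta_hat = {beta_hat:.4f}")
```

Bisection is used here only because the score is monotone in $\beta$, which keeps the sketch dependency-free. A Newton-type solver would be a natural alternative.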

## Appendix E: Cox and AFT estimating equations when either confounding or length-biased sampling is ignored

• 1.

Estimating equation for the Cox model when length bias is left unadjusted: $\sum_{i=1}^{n}\int_{0}^{s}\frac{\mathbb{1}(D_{i}=d_{i})}{p(D_{i}=d_{i}|\mathbf{X}_{i},\alpha)}\left[D_{i}-\frac{\sum_{j=1}^{n}\frac{D_{j}}{\pi(\mathbf{X}_{j},\alpha)}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)}{\sum_{j=1}^{n}\frac{\mathbb{1}(D_{j}=d_{j})}{p(D_{j}=d_{j}|\mathbf{X}_{j},\alpha)}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)}\right]dN_{i}(u)=0.$

• 2.

Estimating equation for the Cox model when confounding is left unadjusted: $\sum_{i=1}^{n}\int_{0}^{s}\left[D_{i}-\frac{\sum_{j=1}^{n}D_{j}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}{\sum_{j=1}^{n}e^{\beta D_{j}}\delta_{j}\mathbb{1}(Y_{j}\ge u)/w(Y_{j})}\right]dN_{i}(u)=0.$

• 3.

Estimating equation for the AFT model when length bias is left unadjusted: $\sum_{i=1}^{n}\delta_{i}\left[\frac{D_{i}\log(Y_{i})}{\pi_{i}}-\frac{(1-D_{i})\log(Y_{i})}{1-\pi_{i}}-\beta\right]=0.$

• 4.

Estimating equation for the AFT model when confounding is left unadjusted: $\sum_{i=1}^{n}\frac{\delta_{i}D_{i}}{w(Y_{i})}\left[\log(Y_{i})-\beta D_{i}\right]=0.$
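The two AFT equations in items 3 and 4 are linear in $\beta$, so each can be solved in closed form. The sketch below solves them exactly as printed, using $D_{i}^{2}=D_{i}$ for the binary exposure in item 4. The function names and toy inputs are hypothetical, and the weight function `w` is passed in by the caller rather than defined here.

```python
import math


def aft_ignoring_length_bias(d, y, delta, pi):
    """Closed-form root of item 3: IPW contrast of log(Y) over uncensored subjects."""
    total = 0.0
    n_events = 0
    for di, yi, deltai, pii in zip(d, y, delta, pi):
        if deltai:
            log_y = math.log(yi)
            total += di * log_y / pii - (1 - di) * log_y / (1.0 - pii)
            n_events += 1
    return total / n_events


def aft_ignoring_confounding(d, y, delta, w):
    """Closed-form root of item 4: 1/w-weighted mean of log(Y) over uncensored treated."""
    num = den = 0.0
    for di, yi, deltai in zip(d, y, delta):
        if deltai and di:
            weight = 1.0 / w(yi)
            num += weight * math.log(yi)
            den += weight
    return num / den


# Hypothetical toy inputs; w(t) = t is a placeholder, not the paper's weight.
d = [1, 0, 1, 0]
y = [math.exp(1.0), math.exp(2.0), math.exp(3.0), 1.0]
delta = [1, 1, 1, 0]
pi = [0.5, 0.5, 0.5, 0.5]
print(aft_ignoring_length_bias(d, y, delta, pi))
print(aft_ignoring_confounding(d, y, delta, w=lambda t: t))
```

Both functions divide by a sum over uncensored subjects, so they assume at least one uncensored observation (and, for item 4, at least one uncensored treated subject).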

## References

• 1.

Cox DR, Lewis P. The statistical analysis of series of events. Monographs on applied probability and statistics. London: Chapman and Hall, 1966.

• 2.

Zelen M, Feinleib M. On the theory of screening for chronic diseases. Biometrika 1969;56:601–14.

• 3.

Bergeron PJ, Asgharian M, Wolfson DB. Covariate bias induced by length-biased sampling of failure times. J Am Stat Assoc 2008;103:737–42.

• 4.

Shen Y, Ning J, Qin J. Analyzing length-biased data with semiparametric transformation and accelerated failure time models. J Am Stat Assoc 2009;104:1192–202.

• 5.

Qin J, Shen Y. Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics 2010;66:382–92.

• 6.

Ning J, Qin J, Shen Y. Non-parametric tests for right-censored data with biased sampling. J R Stat Soc Ser B (Stat Methodol) 2010;72:609–30.

• 7.

Wicksell SD. The corpuscle problem: a mathematical study of a biometric problem. Biometrika 1925;17:84–99.

• 8.

Fisher RA. The effect of methods of ascertainment upon the estimation of frequencies. Ann Hum Genet 1934;6:13–25.

• 9.

Neyman J. Statistics–servant of all science. Science 1955;122:401–6.

• 10.

Patil GP, Rao CR. Weighted distributions and size-biased sampling with applications to wildlife populations and human families. Biometrics 1978;34:179–89.

• 11.

Asgharian M, Wolfson C, Wolfson DB. Analysis of biased survival data: the Canadian study of health and aging and beyond. Stat Action Can Outlook 2014;193–208.

• 12.

Binder D. Fitting Cox’s proportional hazards models from survey data. Biometrika 1992;79:139–47.

• 13.

Lin D. On fitting Cox’s proportional hazards models to survey data. Biometrika 2000;87:37–47.

• 14.

Pugh M, Robins J, Lipsitz S, Harrington D. Inference in the Cox proportional hazards model with missing covariate data. Technical report, Harvard School of Public Health, Dept. of Biostatistics, 1993.

• 15.

Chen H, Little R. Proportional hazards regression with missing covariates. J Am Stat Assoc 1999;94:896–908.

• 16.

Luo X, Tsai W, Xu Q. Pseudo-partial likelihood estimators for the Cox regression model with missing covariates. Biometrika 2009;96:617–33.

• 17.

Qi L, Wang C, Prentice R. Weighted estimators for proportional hazards regression with missing covariates. J Am Stat Assoc 2005;100:1250–63.

• 18.

Rotnitzky A, Robins JM. Inverse probability weighting in survival analysis. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics, 2nd ed. New York: Wiley, 2005.

• 19.

Ertefaie A, Asgharian M, Stephens D. Propensity score estimation in the presence of length-biased sampling: a non-parametric adjustment approach. Stat 2014;3:83–94.

• 20.

Wolfson C, Wolfson DB, Asgharian M, M’Lan CE, Østbye T, Rockwood K, et al. A reevaluation of the duration of survival after the onset of dementia. N Engl J Med 2001;344:1111–16.

• 21.

Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat 1978;6:34–58.

• 22.

Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat Theory Methods 1994;23:2379–412.

• 23.

Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent variable modeling and applications to causality. New York: Springer, 1997:69–117.

• 24.

Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.

• 25.

Hirano K, Imbens G, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 2003;71:1161–89.

• 26.

Cheng Y, Wang M. Estimating propensity scores and causal survival functions using prevalent survival data. Biometrics 2012;68:707–16.

• 27.

Robins JM, Rotnitzky A, van der Laan M. On profile likelihood: comment. J Am Stat Assoc 2000;95:477–82.

• 28.

van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer Science & Business Media, 2003.

• 29.

Cox DR, Oakes D. Analysis of survival data. Chapman & Hall/CRC, 1984.

• 30.

Robins JM, Mark SD, Newey WK. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics 1992;48:479–95. URL http://www.jstor.org/stable/2532304

• 31.

Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. 2nd ed. New York: Wiley, 2002.

• 32.

Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 1952;47:663–85.

• 33.

Hájek J, Dupač V. Sampling from a finite population. New York: Marcel Dekker, 1981.

• 34.

Hernán M, Brumback B, Robins J. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000;11:561–70.

• 35.

Robins JM. Association, causation, and marginal structural models. Synthese 1999;121:151–79.

• 36.

Tsiatis AA. Semiparametric theory and missing data. New York: Springer Verlag, 2006.

• 37.

Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc 1999;94:1096–120.

• 38.

Lipsitz SR, Ibrahim JG, Zhao LP. A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. J Am Stat Assoc 1999;94:1147–60.

• 39.

Neugebauer R, van der Laan M. Why prefer double robust estimators in causal inference? J Stat Plan Inference 2005;129:405–26.

• 40.

Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proc Am Stat Assoc Sec Bayesian Stat Sci 1999;6–10.

• 41.

Addona V, Wolfson DB. A formal test for the stationarity of the incidence rate using data from a prevalent cohort study with follow-up. Lifetime Data Anal 2006;12:267–84.

• 42.

Asgharian M, Wolfson DB, Zhang X. Checking stationarity of the incidence rate using prevalent cohort survival data. Stat Med 2006;25:1751–67.

• 43.

Carrière Y, Pelletier L. Factors underlying the institutionalization of elderly persons in Canada. J Gerontol Ser B Psychol Sci Soc Sci 1995;50:S164.

• 44.

Little R, Rubin DB. Statistical analysis with missing data. Vol. 539. New York: Wiley, 1987.

• 45.

Asgharian M, M’Lan CE, Wolfson DB. Length-biased sampling with right censoring. J Am Stat Assoc 2002;97:201–9.

• 46.

Asgharian M, Wolfson DB. Asymptotic behavior of the unconditional NPMLE of the length-biased survivor function from right censored prevalent cohort data. Ann Stat 2005;33:2109–31. URL http://www.jstor.org/stable/3448636

• 47.

Vardi Y. Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika 1989;76:751–61.

• 48.

Wang MC, Jewell NP, Tsai WY. Asymptotic properties of the product limit estimate under random truncation. Ann Stat 1986;14:1597–605.

• 49.

Luo X, Tsai W. Nonparametric estimation for right-censored length-biased data: a pseudo-partial likelihood approach. Biometrika 2009;96:873–86.

• 50.

Huang CY, Qin J. Nonparametric estimation for length-biased and right-censored data. Biometrika 2011;98:177.

• 51.

Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed 2004;75:45–9.

• 52.

Nieto FJ, Coresh J. Adjusting survival curves for confounders: a review and a new method. Am J Epidemiol 1996;143:1059.

• 53.

Xie J, Liu C. Adjusted Kaplan–Meier estimator and log-rank test with inverse probability of treatment weighting for survival data. Stat Med 2005;24:3089–110.

• 54.

Wang MC. Nonparametric estimation from cross-sectional survival data. J Am Stat Assoc 1991;86:130–43.

• 55.

Pepe MS, Fleming TR. Weighted Kaplan–Meier statistics: large sample and optimality considerations. J R Stat Soc Ser B (Methodol) 1991;53:341–52.

## Footnotes

Published Online: 2015-03-21

Published in Print: 2015-05-01