Doubly robust estimators for generalizing treatment effects on survival outcomes from randomized controlled trials to a target population

In the presence of heterogeneity between the randomized controlled trial (RCT) participants and the target population, evaluating the treatment effect solely based on the RCT often leads to biased quantification of the real-world treatment effect. To address the problem of lack of generalizability for the treatment effect estimated by the RCT sample, we leverage observational studies with large samples that are representative of the target population. This article concerns evaluating treatment effects on survival outcomes for a target population and considers a broad class of estimands that are functionals of treatment-specific survival functions, including differences in survival probability and restricted mean survival times. Motivated by two intuitive but distinct approaches, i.e., imputation based on survival outcome regression and weighting based on inverse probability of sampling, censoring, and treatment assignment, we propose a semiparametric estimator through the guidance of the efficient influence function. The proposed estimator is doubly robust in the sense that it is consistent for the target population estimands if either the survival model or the weighting model is correctly specified and is locally efficient when both are correct. In addition, as an alternative to parametric estimation, we employ the nonparametric method of sieves for flexible and robust estimation of the nuisance functions and show that the resulting estimator retains the root-n consistency and efficiency, the so-called rate-double robustness. Simulation studies confirm the theoretical properties of the proposed estimator and show that it outperforms competitors. We apply the proposed method to estimate the effect of adjuvant chemotherapy on survival in patients with early-stage resected non-small cell lung cancer.


Introduction
In clinical trials or biomedical studies, time-to-event or survival endpoints, such as the time from treatment initiation to death, have been commonly used to evaluate the treatment effect.When estimating the treatment effect, randomized controlled trials (RCTs) are regarded as the gold standard since randomization reduces the effect of confounding variables.However, RCTs often suffer from a lack of generalizability or external validity.Specifically, due to restrictive inclusion and exclusion criteria for enrollment or additional concerns from patients and physicians, RCTs often do not recruit enough participants who represent the real-world patient population, resulting in the covariate distribution of the RCT sample being different from that of the target real-world population.In the presence of such heterogeneity, evaluating the treatment effect based solely on the RCT sample leads to biased quantification of the real-world treatment effect.As a complement of the RCT sample, observational studies have been widely used in comparative effectiveness research, as large samples that are representative of the target population can be studied at a relatively low cost.
Several recent works have proposed integrative methods to generalize findings from the RCT to the target population by leveraging observational studies [1][2][3][4][5].Most existing methods focus on directly modeling the probability of participating in the trial, i.e., the sampling score that is analogous to the treatment propensity score.A widely used approach involves inverse probability of sampling weighting [IPSW;1,6], which can be used to estimate weight-adjusted survival curves [7,8].However, these IPSW-based estimators are unstable under extreme sampling scores.Alternatively, Lee et al. [5] proposed calibration weighting that enforces covariate balance between the RCT and observational study (OS) without explicitly modeling the sampling score.Recently, Colnet et al. [9] provided a comprehensive review of various novel methods combining complementary features of RCTs and observational studies.However, most of these methods focus on continuous and binary outcomes, and generalization of the findings from RCTs for survival outcomes to the target population is less actively studied.
In this article, we consider a broad class of estimands defined as a functional of treatmentspecific survival functions, including differences in survival probability at landmark times and restricted mean survival time (RMST).Various estimators can be constructed to adjust for the unrepresentativeness or selection bias of the RCT sample.One approach relies on fitting conditional survival outcome models and then averaging over the covariate distribution of the observational sample, similar to Chen and Tsiatis [10].Another common approach is to use weighting [7,11] to adjust for the imbalance between the RCT sample and the observational sample.Instead of direct modeling of sampling score as in IPSW, one can consider the more stable approach that calibrates covariate distributions between the RCT and the observational sample [5].Motivated by these two intuitive but distinct approaches, we propose improved estimators for survival outcomes under the guidance of the efficient influence function (EIF), which involve survival outcome regression (OR) and weighting based on inverse probability of treatment, censoring, and sampling, simultaneously.The proposed estimator is doubly robust in the sense that it is consistent for the target population estimand if either the survival model or the weighting model is correctly specified, and is locally efficient when both are correct.In addition, to cope with possible misspecification of nuisance functions, we consider the method of sieves [12], which adds great flexibility and robustness to the proposed estimators, meanwhile retaining the root-n consistency.
The remainder of this article is organized as follows.In Section 2, we formalize the basic causal inference framework for survival outcomes.In Section 3, we introduce two direct estimators based on identification formulas, and in Section 4, we propose improved estimators and describe the corresponding asymptotic properties.The finite sample performance of the proposed estimators is assessed via simulation studies in Section 5. Applying the proposed estimators, we analyze the effect of adjuvant chemotherapy on the survival of lung cancer patients with data from an RCT and an OS in Section 6. Section 7 presents the discussion and concluding remarks.All proofs are provided in the Appendix.

Estimands, observed data, and assumptions
Suppose we are interested in comparing the effectiveness of two treatments.Let A be the binary treatment assignment, A ∈ {0, 1}.Following the potential outcomes framework [13,14], let T a be the potential survival time if a subject received the treatment A = a, and S a (t) and λ a (t) be the corresponding survival and hazard functions, i.e., S a (t) = P(T a ≥ t) and λ a (t) = lim ℎ 0 ℎ −1 P (t ≤ T a ≤ t + ℎ) ∕ P (T a ≥ t).Under the proportional hazards assumption, a widely used measure to characterize the treatment effect is hazard ratio (HR), i.e., λ 1 (t)/ λ 0 (t) being a constant.However, the interpretation of HRs is challenging especially when the proportionality assumption is violated [15,16].
Alternatively, we define the average treatment effect (ATE) measure θ τ as a function of treatment-specific survival curves, θ τ = Ψ τ (S 1 (t), S 0 (t)), where τ is a prespecified constant.This formulation of the ATE includes a broad class of estimands that are favored in survival analysis [17].For example, θ τ = S 1 (τ) − S 0 (τ) is a simple survival difference at a fixed time τ, and θ τ = ∫ 0 τ {S 1 (t) − S 0 (t)}dt is the RMST difference up to τ.The ratio of restricted mean time loss and the difference of the median survival can also be represented with the appropriate choice of Ψ τ (•).
Under the stable unit treatment value assumption, the survival time T is the realization of the potential outcomes, i.e., T = T 1 A + T 0 (1 − A).Let C be the censoring time.In the presence of right censoring, the survival time T is not observed for all subjects; instead, we observe U = T ∧ C and Δ = I(T ≤ C), where ∧ represents the minimum of two values, and I(•) is an indicator function.Let X be a p-dimensional vector of pre-treatment covariates.Also, let δ denotes the binary indicator of RCT participation, and let δ denotes the binary indicator of OS participation.We consider a super-population framework assuming that an RCT sample of size n and an observational sample of size m are sampled from the target population.
From the RCT sample, we observe {U i , Δ i , A i , X i , δ i = 1, δ i = 0} from i = 1, …, n independent and identically distributed subjects.For the observational sample, it is common that only the covariates information is available, i.e., {X i , δ i = 0, δ i = 1} from i = n + 1,…, n + m independent and identically distributed subjects.The sampling mechanism and data structure are illustrated in Figure 1.We assume independence between the RCT and the observational sample, which holds if two separate studies are conducted by independent researchers, the target patient population is sufficiently large, or the patients are enrolled in two separate time periods.
Assumptions 1-3 are not testable in general, and their plausibility should be justified based on subject matter knowledge in practice.Assumption 1 holds in the RCT by default.Assumption 2 (i) is plausible if all information related to the trial participation and the outcome is captured in the data at hand.This assumption is weaker than the ignorablility of the trial participation assumption, i.e., {T 0 , T 1 } ⫫ δ|X.The relationship between Assumption 2 (i) and its stronger version is analogous to that described in the study by Dahabreh et al. [4] in the context of continuous and binary outcomes.Assumption 2 (ii) implies that the absence of patient characteristics prevent from participating in the trial [5].Assumption 3 is commonly made in survival analysis [10,18,19].This assumption is weaker than the conditional independence assumption of the censoring and survival time given only on the treatment [20].
The covariate distribution of the RCT sample f(X|δ = 1) may not be representative of that of the target population f(X) due to restrictive trial enrollment criteria, but the covariate distribution of the observational sample f(X | δ = 1) is often representative of f(X) due to the real-world data collection mechanism.In particular, if the observational sample is a simple random sample of the target population, then f(X | δ = 1) = f(X).More generally, the observational sample can be selected under complex sampling designs.To accommodate such scenarios, we can define d as the known design weight for the observational sample.
Assumption 4. (The known design weight for the observational sample).The observational sample design weight d = 1 ∕ P (δ = 1 | X) is known.
Assumption 4 is commonly assumed in the survey sampling literature.On the basis of Assumption 4, the design-weighted observational sample is representative of the target population.In an OS with simple random sampling, d = N / m, where N is the target population size.
Under the aforementioned assumptions, the ATE θ τ , or S a (t), a ∈ {0, 1} sufficiently, is identified based on the observed data.We consider two identification formulas.Let Y(t) = I(U ≥ t) and define the conditional censoring model S a C (t, X) = P (C > t | X, A = a, δ = 1).One identification formula is based on the conditional survival curves, i.e., which can be called the OS-design-weighted G-computation formula.Note that if the population covariate distribution is available, E{S a (t, X)} is the G-computation formula for S a (t).The other identification formula is based on the inverse probability weighting (IPW) approach for the marginal survival curves, i.e., for a ∈ {0, 1}.The two identification formulas in (1) and (2) motivate the estimators in the following section, depending on different components of the observed data likelihood.
3 Two direct estimators based on identification formulas

Outcome regression
On the basis of the identification formula (1), the treatment-specific conditional survival curve can be modeled and fitted based on the observed data, e.g., using the widely used Cox regression model for the survival outcome.A treatment-specific conditional hazard function at time t given covariate X i is defined as follows: where λ a0 (t) is a treatment-specific baseline hazard function, for a ∈ {0, 1}, i = 1, …, n.Following standard survival analysis techniques, β a can be estimated as a solution to the partial likelihood score equation, and the baseline cumulative hazard Λ a0 (t) ≡ ∫ 0 t λ a0 (u) can be estimated by the Breslow [21] estimator: where A ai = I(A i = a), N i (u) = I(U i ≤ u, Δ i = 1), and Y i (u) = I(U i ≥ u).The survival model in (3) does not imply that the marginal HR of the potential survival outcomes under a = 1 and a = 0, i.e., λ 10 (t)/λ 00 (t), is a constant and thus is not restrictive.Other survival models can be considered, including the additive hazards model [22,23].Chen and Tsiatis [10] proposed a method that accounts for imbalances between treatment groups by first estimating the conditional treatment effect given X and then averaging the effect over X across both treatment groups.A similar approach can be applied to balance the covariate distribution between the RCT sample and the observational sample by first estimating the treatment-specific survival curve conditional on f(X|δ = 1) under model (3), i.e., S a (t, X i ) = exp{ − Λ ai (t)} = exp{ − Λ a0 (t) exp(β a T X i )}, and then applying the design-weighted averaging over f(X | δ = 1).The resulting OR estimator of the marginal treatment-effect survival curve is for a ∈ {0, 1}, and the corresponding ATE estimator is defined as θ τ OR = Ψ τ (S 1 OR (t), S 0 OR (t)).
The OR estimator is consistent when the survival model ( 3) is correctly specified.

Inverse probability and calibration weighting
We can construct the estimator of the marginal treatment-specific survival curve based on the identification formula in (2) that involves sampling score, treatment propensity score, and censoring probability.This approach can be viewed as a combination of IPSW, inverse probability of treatment weighting (IPTW), and inverse probability of censoring weighting (IPCW).The weighting estimator requires positing models for the three probabilities and estimating them.
First, we consider the estimation of the sampling score.As the covariate distribution of the RCT sample is different from that of the target population in general, the estimated ATE based only on the RCT sample can be biased.A widely used approach to account for this selection bias is IPSW, i.e., weighting the RCT sample by the inverse probability of trial participation over that of OS participation, to adjust for differences in covariate distribution between the trial sample and the population.Specifically, π δ (X) can be modeled as π δ (X) = {ω IPSW (X)} −1 , where ω IPSW (X) = P (δ = 1 | δ + δ = 1, X) ∕ P (δ = 1 | δ + δ = 1, X).One can plug in {ω IPSW (X)} −1 for π δ (X) in (2) using the common logistic regression model.However, the IPSW method requires the sampling score model to be correctly specified; it also could be highly unstable if P (δ = 1 | δ + δ = 1, X) is close to zero for some X.
Instead of direct estimating the sampling scores, Lee et al. [5] proposed the calibration weighting approach to reduce the selection bias in the trial-based estimator, which is analogous to the entropy balancing method by Hainmueller [24] and more stable than the IPSW method.The basic idea is that subjects in the RCT sample are calibrated to the observational sample, so that after calibration, the covariate distribution of the RCT sample empirically matches that of the target population.The calibration weighting approach is based on the idea that for any g(X), the following identity holds, where g(X) is a function of X to be calibrated, e.g., the moments or any nonlinear transformations.
The calibration weights ω i are obtained by solving the optimization problem: where W = {w i : δ i = 1}.The last constraint is the empirical representation of ( 5), where N δ i d i is a consistent estimator of E{g(X)} from the observational sample.Minimizing the negative entropy function of the calibration weights in ( 6) enforces the weights to be as close to one another as possible, which reduces the variability due to heterogeneous weights.By using the Lagrange multiplier λ, the objective function of this Under the loglinear sampling score model, the calibration weights from the objective function ( 6) have the same functional form as inverse probability of sampling score weights asymptotically, resulting in the direct correspondence between the calibration weight and the sampling score in that ω i − {Nπ δ (X i )} −1 p 0, as n → ∞ [5].Following that, we posit the loglinear sampling score model, π δ (X) = exp{η 0 T g(X)}, for some η 0 .
Accordingly, in the rest of the article, we represent the calibration weights using If one considers a logistic sampling score model instead, then other objective functions can be used, such as 1), that corresponds to the weights with the same functional form as the inverse to a logistic probability of sampling [25,26].However, the loglinear regression model in ( 8) is close to the logistic regression model when the proportion of the RCT sample in the target population is small.
With respect to treatment assignment, π A (X) is generally known for RCTs.However, several authors suggested estimating the treatment propensity score for the RCTs to increase the efficiency and account for the chance of imbalance of prognostic variables [e.g., 27,28].We choose a logistic regression model for the treatment propensity score, and define Estimating the propensity scores also broadens the scope of the current article to allow the generalization of the OS.Even though this article focuses on generalizing the trial findings where we only require a two-way balancing between the RCT and OS, a three-way balancing approach among the treated, the controlled, and the observational sample (e.g., [29]) can be useful to generalize the findings from the OS to its larger population with the estimation of π A .
Moreover, to account for right censoring C, we posit Cox proportional hazards model with where the standard techniques as for the survival model in ( 3) can be used to estimate γ a and Combining ω i , π ai , and Λ ai T X i ) estimated under the working models in ( 8), (9), and ( 10), respectively, we define the CW estimator of the marginal treatment-specific survival curve as follows: where A ai = I(A i = a).The corresponding ATE estimator is θ τ CW = Ψ τ (S 1 CW (t), S 0 CW (t)).

Efficient influence function
Let O = (X, A, U, Δ, δ, δ ) be one copy of the vector of observed variables.The OR estimator specified in (4) and the CW estimator specified in (11) use different components of the likelihood function f(O).Specifically, the OR estimator is based on modeling S a (t, X) for a ∈ {0, 1}, and the CW estimator is based on modeling π δ (X), π a (X), and S a C (t, X) for a ∈ {0, 1}.These estimators are singly robust in that they are consistent only under the correct survival OR model or the correct weighting models.A vast number of estimators can be constructed by combining these two estimators.In general, the question becomes how to obtain the most efficient estimator.Our approach is to derive the EIF [30] of θ τ to construct semiparametrically efficient estimators.Such estimators also gain robustness to model misspecification as we show later.
We consider a class of influence functions of regular asymptotically linear (RAL) estimators of the treatment-specific survival curve S a (t).Define A a = I(A = a) and π a (X) = aπ A (X) + (1 − a){1 − π A (X)}.Following Tsiatis [30], the class of observed data influence functions includes for arbitrary functions g a1 (•), g a2 (•), and g a3 (•), where Martingale with N a C = A a I(U ≤ u, Δ = 0), and Λ a C (s) is a cumulative hazard for censoring for a ∈ {0, 1}.The last three terms in ( 12) are mean-zero functions.According to the semiparametric theory [30], the EIF is the influence function in ( 12) with the smallest variance.The EIF can be derived by projecting the first term of ( 12) onto the orthogonal complement of the tangent space spanned by the scores of the nuisance functions, i.e., the last three terms.The RAL estimator with the EIF is a semiparametrically efficient estimator.
The following theorem gives the EIF in the class of influence functions ( 12), with the proof given in the Appendix B.1.
Theorem 1.Under Assumptions 1-4, the EIF for the treatment-specific survival curve is Many common treatment effect estimands are functionals of the treatment-specific survival curves, including the survival difference at a fixed time τ, the difference of RMSTs, the ratio of RMTLs, and the difference of τth quantile of survivals.Their EIFs can be expressed in the form of a combination of weighted integrals of the EIF for treatment-specific survival curves [17].To be specific, the EIF for θ τ is for some functions ϕ a (•) that E{ϕ a ( ⋅ ) 2 } < ∞ (see Appendix A for details).We limit our interest to the estimands with such form of the EIF, which covers the broad class of estimators that are favored in survival analysis.

Augmented calibration weighting estimator
Motivated by Theorem 1, under the survival model in ( 3) and the weighting models specified in ( 8)-( 10), we propose the following augmented CW (ACW) estimator of the treatmentspecific survival curve, In addition, following the technique by Zhang and Schaubel [20] that represents the marginal cumulative hazard function Λ a (t) = ∫ 0 t − {S a (u)} −1 dS a (u) by estimating the denominator and the numerator separately, we propose another ACW estimator, where estimates −dS a (u for bounded variation functions ϕ a (•) in ( 14).Under the asymptotic linear characterization, Toward this end, the following theorem shows the local efficiency and asymptotic properties of the proposed ACW estimators of the ATE, i.e., θ τ ACW1 = Ψ τ (S 1 ACW1 (t), S 0 ACW1 (t)) Theorem 2. Under Assumptions 1-4, if either the survival model in ( 3) is correctly specified or the weighting models, i.e., the sampling score model ( 8), the treatment propensity score model (9), and the censoring model ( 10), are correctly specified, under regularity conditions, θ τ ACW1 and θ τ ACW2 are consistent for θ τ , and are asymptotically normal with mean zero and variance E(ς 1 2 ) and E(ς 2 2 ).Moreover, if all working models in ( 3) and ( 8)-( 10) are correctly specified, θ τ ACW1 and θ τ ACW2 are locally efficient, i.e., The proof of Theorem 2 and details of ς 1 2 and ς 2 2 are provided in Appendix B.3.For a straightforward procedure of variance estimation, a nonparametric bootstrap method can be used.Specifically, we draw B bootstrap samples from both the RCT and the observational sample, respectively, and then for each resampled pair, we obtain a replicate of the ACW estimator; the sample variance of the B bootstrap replicates is the variance of the ACW estimator.
The proposed ACW estimators depend on the parametric estimation of nuisance functions.Alternatively, we can consider a flexible nonparametric approach without the parametric assumption, which is often unrealistic in complex problems in practice.The asymptotic behavior of the ACW estimators with the nonparametric estimation of nuisance functions can be characterized by the empirical process perspective.Suppose that the posited nuisance models are consistent, i.e., ‖π δ (X; η) − π δ (X)‖ = o p (1), ‖π A (X; ρ) − π A (X)‖ = o p (1), and ‖S a (t, X; β a ) − S a (t, X)‖ = o p (1), ‖S a C (t, X; γ a ) − S a C (t, X)‖ = o p (1), for a ∈ {0, 1}.Also, suppose that the weighting functions π δ (X; η), π A (X; ρ), and S a C (u, X; γ a ) are bounded away from zero.
Then, we have the effect of the estimated nuisance functions in θ τ ACW − θ τ bounded above by − S a (t, X) + ℙ∫ 0 t ‖dM a C (u, X; γ a )‖ ⋅ ‖S a (u, X) −1 S a (t, X) − S a (u, X; β a ) −1 S a (t, X; β a )‖ up to a multiplicative constant, where ℙ is a true measure such that ℙf(O) = ∫ f(O)dℙ and ∥•∥ is L 2 norm.If each term of the bound is of rate o p (n −1/2 ), then the effect of nuisance function estimations are asymptotically negligible.This statement is formalized in the following theorem.
The proof of Theorem 3 is provided in Appendix B.4.To ensure the consistency of the nuisance function estimation with the convergence rate of the product as in (C3), we use the method of sieves in Section 4.3.

Robust estimation using the method of sieves with penalization
For a robust estimation of the ATE under possibly misspecified working models, we adopt the method of sieves [31,32], which enables flexible data-adaptive estimation of the survival curves and probability weights with root-n consistency [12].We construct the sieves using the linear spans of power series [33], but other basis functions such as Fourier series or splines are applicable.Specifically, for a p-dimensional vector of non-negative integers κ k = (κ k1 , …, κ kp ), we consider a K-vector sieve basis functions g(X) = {g 1 (X), …, g K (X)} T = {X κ 1 …, X κ K } T , where Under standard regularity conditions, the sieves approximation results in a consistent estimation of the survival curves and weighting probabilities with large K (see Lee et al. [5], supporting information).To facilitate the stable estimation and to control the variability of the estimators with large K, we consider the penalized estimation of the nuisance functions.
For penalized sieves estimation of the survival outcome model S a (t, X), the standard penalization technique for the Cox PH model was used based on the partial likelihood.Specifically, we estimate β a by solving where D is the set of indices of the events, R r is the set of indices of the patients at risk at time t r , with t 1 < … < t d denoting the distinct event time, and p ε (•) is the SCAD penalty, for a ∈ {0, 1}.The penalized sieves estimation of the censoring model S a C (t, X) can be adopted by analogy.Finally, for the penalized sieves estimation of the treatment propensity score model π A (X), we use the standard penalization approach for the binary data using the logit link with the SCAD penalty.
Under regularity conditions specified in Fan and Li [37] and Lee et al. [5], the penalized sieve estimators of the nuisance functions possess oracle properties and satisfy two conditions in Theorem 3. Thus, the resulting ACW estimators using the method of sieves achieve the root-n consistency and semiparametric efficiency.

Simulation studies
In this section, we conduct simulation studies to evaluate the finite-sample performance of the proposed estimators.We consider the target population of size N = 200,000 and covariates X = (X 1 , X 2 , X 3 ) T , where each X i , i = 1, 2, 3 is generated from N(0, 1) and truncated at −4 and 4 to satisfy regularity conditions.An RCT sample of size n ~ 1,300 is selected from a hypothetical RCT eligible population with size N 1 = 50,000.From the remaining N 2 = 150,000 OS population, we randomly select a sample of size m = 5,000.
We consider the difference in RMST as the ATE, i.e., , where we choose τ = 20.We compare the proposed CW and the ACW estimators with four other methods, the Naive estimator based only on the RCT sample, the ZS estimator, i.e., the estimator proposed by Zhang and Schaubel [18] based only on the RCT sample, the IPSW estimator using the normalized IPSW weight in (11), and the OR estimator specified in (4).For the ACW estimators, in addition to the original covariate vector g 1 (X) = (X 1 , X 2 , X 3 ) T , we consider the penalized sieve estimation with calibration variables g 1 (X) to its second-order power series including all two-way interactions and quadratic terms.We use (S) to indicate the method of sieves.To assess the performance of these estimators under model misspecification, we consider four scenarios where (i) all four models in ( 3) and ( 8)-( 10) are correct, (ii) only the survival outcome model ( 3) is correct, (iii) the outcome model is incorrect, but the other three weighting models ( 8)-( 10) are correct, and (iv) all four models are incorrect.Details of estimators and specification of the four working models when they are correctly/incorrectly specified are listed in Table 1.
Table 2 and Figure 2 summarize the results based on 1,000 Monte Carlo replications.The bootstrap variance estimation was used for all estimators with B = 100.When all models are correctly specified, the Naive and ZS estimators that are based only on the RCT sample show biased estimations of the ATE due to the selection bias in the RCT sample.The OR, IPSW, CW, and ACW estimators correct that bias by leveraging the observational sample covariates.The ACW estimators were found to be more efficient than other unbiased IPW estimators.Even though the variance of the OR estimator is smaller, it is biased when the outcome model is incorrectly specified.The CW and IPSW estimators are biased when the sampling score model is not correctly specified.The proposed ACW estimators are double robust when either the outcome model or the other three models are correctly specified.In Scenario 4 where both the outcome model and the weighting models are misspecified, the ACW estimator using the penalized sieve estimation, i.e., ACW1(S) and ACW2(S), is still unbiased.In Scenario 1 and 2, the efficiencies of the ACW1(S) and ACW2(S) estimators are comparable to that of the ACW1 and ACW2 estimators.In Scenario 3 and 4 where the outcome model is incorrect, ACW1(S) and ACW2(S) are more efficient than their counterparts, confirming that using the method of sieves can improve efficiency.In addition, the ACW2 and ACW2(S) estimators were found to be more consistent than the ACW1 and ACW1(S) estimators in a finite sample.

Real data application
We apply the proposed method to estimate the effect of adjuvant chemotherapy on survival in patients with early-stage resected non-small cell lung cancer (NSCLC).Cancer and Leukemia Group B (CALGB) 9633 is the only randomized phase III trial designed to evaluate the effectiveness of adjuvant chemotherapy over observation for stage IB NSCLC [40].An additional observational sample for stage IB NSCLC patients was extracted from National Cancer Database (NCDB), including more than 15,000 patients with the same eligibility criteria as CALGB 9633.More details of the two data sources are given in the study by Lee et al. [5], where they conducted an integrative analysis of the CALGB trial sample and the NCDB sample to improve generalizability for the CALGB 9633 trial-based estimator of average risk of cancer recurrence.
Table 3 summarizes the distribution of the four baseline covariates by the data sources, which have been considered important prognostic factors.The baseline covariates of the patients in the CALGB trial are different from those of the patients in NCDB.Specifically, the CALGB trial patients consist of more males, are younger, and have smaller tumor sizes.
Consequently, an important clinical question is whether adjuvant chemotherapy benefits the general stage IB NSCLC patients population, represented by the NCDB, which is a population-based registry capturing approximately 79% of newly diagnosed lung cancers in the United States, and contains more females, are order, and have larger tumor sizes than the CALGB patients.Given that the RCT sample is relatively healthier than the NCDB sample, the estimators based only on CALGB 9633 sample would result in biased estimation of the true effect of adjuvant chemotherapy on the real-world population of early-stage NSCLC patients.
We estimate a 12-year difference of the restricted mean lifetime between adjuvant chemotherapy and observation (i.e., no chemotherapy).The nonparametric bootstrap method is used to estimate the standard errors.The results are given in Table 4, using the proposed estimators and other existing methods in the simulation studies.The Naive and ZS estimators indicate that in the RCT sample, there is no 12-year RMST difference between the adjuvant chemotherapy and observation, i.e., 0.02 and 0.04 years, respectively.All other estimators that utilize the covariate information of the OS show a much larger difference in the RMST.The IPSW and CW estimators show about 0.4-0.5 year increase in RMST for adjuvant chemotherapy over observation, and the OR estimator shows about 0.84 year increase; however, these estimates are not significant.The proposed ACW estimators give an estimate of about a 1-year RMST increase for patients who received adjuvant chemotherapy, which is significant at 0.05 level.
Figure 3 represents the estimated restricted mean lifetime for adjuvant chemotherapy and observation and their difference as a function of restricted times τ.All selection-biasadjusted estimators, i.e., the IPSW, CW, OR, ACW1, ACW2, ACW1(S), and ACW2(S) estimators, show a trend of increasing RMST difference over τ for all τ from 1 to 13 years.Especially, all of the estimators show significant non-zero differences when τ is large.
Compared to the ACW1 and ACW2 estimators, the ACW1(S) and ACW2(S) estimators gain efficiency by using the method of sieves.On the other hand, the Naive and ZS estimators that are based only on the RCT sample show nearly flat trends near zero difference over τ.All these results indicate that the proposed estimators are able to detect the effect of adjuvant chemotherapy on survival in the real world population than that in the RCT sample with higher efficiency and more protection against model misspecification, by leveraging the information from the OS.
This substantial heterogeneity in the treatment effect is mainly due to age and tumor size.While many baseline covariates are prognostic factors for survival risk, age, and tumor size are the few variables associated with the outcome and significantly interact with the treatments.It is the reason that these two variables are of interest for evaluating treatment effect heterogeneity.Subgroup analysis based only on the RCT data supports the same extent of treatment effect heterogeneity varied by age and tumor size.The patients with older age and larger tumor size have significantly less risk for death after adjuvant chemotherapy than those who were younger and had smaller tumors [40].The trend is consistent with the treatment effect heterogeneity found in the integrated analysis for the target population.The remaining difference in treatment heterogeneity could be caused by the imbalance of the covariates distribution between the RCT sample and the target population, and again the difference is expected and reasonable, and we believe that it indeed reflects the value of such integrated analysis.
This article considers a framework to estimate the treatment effect defined as a function of the treatment-specific survival probability function.The proposed ACW estimators are motivated by two identification formulas based on the survival outcome model and the weighting models and achieve local efficiency and double robustness based on semiparametric theory.
The proposed ACW estimators are similar to the estimator proposed by Zhang and Schaubel [18] that combines the treatment-specific Cox model by Chen and Tsiatis [10] and the IPTW Nelson-Aalen method proposed by Wei [41].However, unlike our estimator that is derived from the EIF, their approach is based on augmenting the IPW estimating equation to achieve double robustness, and the efficiency of their estimator has not been studied yet.Moreover, their estimator is based only on the RCT sample, whereas we leverage the observational sample to account for selection bias, resulting in an additional augmentation in terms of the sampling score model.In addition, we focus on the class of estimands defined as functionals of the survival functions, including the framework proposed by Zhang and Schaubel [18] as a special case.The proposed approach is also similar to Zhang et al. [19], focusing on the broad class of Mann-Whitney-type causal effect based on the semiparametric theory.
To derive the EIF, they construct a RAL estimator that excludes censored data.In contrast, by restricting our interest to the functional of the treatment-specific survival curves, we consider a different RAL estimator utilizing both observed and censored data.The proposed class of estimators could be more robust under highly censored data and covers a broad class of estimands that are favored in time-to-event data.
Instead of a penalized nonparametric sieves method, other machine learning methods can be used, such as survival trees [42] or random forests [43] as alternative to the Cox proportional hazard model.A cross-fitting technique can be employed to remove the Donsker's condition, which is questionable to hold for these methods [44].
The proposed methods assume that the trial participation is ignorable, i.e., all covariates related to the trial participation and survival time are captured.However, some important covariates might not be available in the observational samples as there were not originally collected for the research purpose.Future work could involve a sensitivity analysis assessing the robustness of the proposed framework in the presence of unmeasured covariates in observational studies [e.g., 45-48].
In addition to the estimation of the ATE, a key challenge is to identify subgroups of patients for whom the treatment is more effective.Estimating individualized treatment effects is a key toward precision medicine so that doctors can tailor the treatment for individual patients given their characteristics.Interesting future research would be generalizing individualized treatment effects for survival outcomes combining RCT and observational studies, following Yang et al. [49,50] and Wu and Yang [51].

B.1 Proof of Theorem 1
We first derive the EIF for treatment-specific survival curves without censoring, i.e., under full data set V = (X, A, T , δ, δ ), and then derive it in the presence of censoring, i.e., under the observed data set O = (X, A, U, Δ, δ, δ ).The proof based on V is similar to the one in given in the study by Lee et al. [5] using the method of parametric submodel [53].We use ξ in the subscript to denote the submodel.For example, f ξ (V) is a one-dimensional parametric submodel with the true f(V) at ξ = 0, i.e., Since the likelihood of a single V is and δlt a = 0, the score function can be decomposed as follows: We use S .
to represent the score function.
The nuisance tangent space from the semiparametric theory is where Here, we only derive the EIF for S 1 (u).The EIF for S 0 (u) can be easily derived using the similar technique.We denote the EIF for S 1 (u) under V as φ 1 Toward this end, we express For the first term in (B1), we have and for the second term in (B1), we have By combining (B2) and ( B3), Therefore, Let Now, in the presence of censoring, we observe data set O = (X, A, U, Δ, δ, δ ) instead of V.
The EIF for S 0 (t) can be obtained by analogy.

B.2 The influence function of the ACW estimators
For both ACW1 and ACW2 estimators, by the definition of the EIF, 1), for a ∈ {0, 1} .
Thus, under asymptotic linear characterization of the ATE estimator Thus, θ τ ACW has the influence function:

B.3 Proof of Theorem 2
Standard regularity conditions for the consistency of survival OR parameters and treatment propensity score model [e.g., 18,55] and suitable regularity conditions for the M-estimation theory [e.g., 56] are assumed throughout this proof.The former includes almost sure boundedness of X and finite cumulative baseline hazard function for survival and censoring, and the latter includes smoothness and identifiability of working models and applicability of dominated convergence theorem.
For the ACW2 estimator, we can show that the numerator −dS 1 ACW1 (u) converges in probability to −dS 1 (u), thus S 1 ACW2 (t) As ϑ is differentiable at ζ*, we have the derivative S a C (t, X; γ a ) S a (t, X) − S a (t, X; β a ) S a (u, X; β a ) , for a ∈ {0, 1}.By Cauchy-Schwarz inequality and the positivity of π δ {X) and π A {X), we have where ≲ indicates that the inequality holds up to a multiplicative constant.By iterated expectation, (B20) = 0. Lastly, by Cauchy-Schwarz inequality and iterated expectation, we have where the last inequality holds by the condition (C2) in Theorem 3. Therefore, ℙ{S a (t; ζ ) − S a (t; ζ * )} is bounded above by . It is negligible under the conditions (C1) and (C3) in Theorem 3; thus, ℙ{ϑ τ (ζ ) − ϑ τ (ζ * )} is negligible as well and θ τ ACW1 achieves semiparametric efficiency.The proof for θ τ ACW2 is analogous by the small order difference between θ τ ACW1 and θ τ ACW2 .Estimated RMST plots of adjuvant chemotherapy and observation and their difference as a function of restricted times.

ACW1(S)
The penalized ACWl estimator using the method of sieves with g(X) = g 2 (X)

ACW2(S)
The penalized ACW2 estimator using the method of sieves with g(X) = g 2 (X) J Causal Inference.Author manuscript; available in PMC 2023 August 25.

Figure 1 :
Figure 1: Illustration of the data structure of the RCT sample and the OS sample within the target super-population framework.

Table 2 :
Simulation results under four scenarios; T: true (correct) model, W: wrong (incorrect) model.Bias is the empirical bias of point estimates; ESE is the empirical standard error of estimates; RSE is the relative bias (%) of bootstrap standard error estimates; CP is the empirical coverage probability of the 95% confidence intervals θ

Table 3 :
Baseline characteristics of the CALGB 9633 trial sample and the NCDB sample; mean (standard deviation) for continuous and number (proportion) for binary covariate J Causal Inference.Author manuscript; available in PMC 2023 August 25.

Table 4 :
Estimates and 95% confidence intervals of 12-year difference of the restricted mean lifetime between adjuvant chemotherapy and observation J Causal Inference.Author manuscript; available in PMC 2023 August 25.