Abstract
Mann–Whitney-type causal effects are clinically relevant, easy to interpret, and readily applicable to a wide range of study settings. This article considers estimation of such effects when the outcome variable is a survival time subject to right censoring. We derive and discuss several methods: an outcome regression method based on a regression model for the survival outcome, an inverse probability weighting method based on models for treatment assignment and censoring, and two doubly robust methods that involve both types of models and that remain valid under correct specification of the outcome model or the other two models. The methods are compared in a simulation study and applied to an observational study of hospitalized pneumonia.
1 Introduction
Comparative effectiveness research frequently involves comparing one treatment (
where
In this article, we consider estimation of θ when the outcome of interest is a survival time subject to right censoring, which has not been considered in previous studies of Mann–Whitney-type causal effects. If T is a survival time, θ remains well defined and is just as meaningful as it is for another continuous outcome. In fact, the class of Mann–Whitney-type causal effects includes some well known and commonly used effect measures for survival outcomes. For example, if
for which
Estimation of θ for a fully observed outcome has been considered by Chen et al. [11], Vermeulen et al. [12] and Zhang et al. [13], with focus on adjusting for confounders in observational studies and using auxiliary information to improve efficiency in randomized clinical trials. These methods cannot be used to estimate θ for a survival outcome subject to right censoring, which represents an additional challenge. We are not aware of any existing method for estimating θ for a right-censored survival outcome, except in the aforementioned special cases where θ is a difference in distribution function [14], [15], [16] or a difference in mean restricted survival time [17], [18], [19]. In both of these special cases,
In the next section, we describe these methods and discuss their asymptotic behavior. In Section 3, we report a simulation study and present a real application. Concluding remarks are given in Section 4. Technical details are provided in appendices.
2 Methodology
2.1 Notation and assumptions
Let A denote the actual treatment received by an individual subject in a study, which may be a randomized clinical trial or an observational study. Assuming consistency or stable unit treatment value, the actual survival time is then
The ignorability assumption is trivially true in a randomized clinical trial. In an observational study, the assumption requires that
which is trivially true in a clinical trial but not trivial in an observational study.
The actual survival time T may be right-censored by a censoring time C. We assume independent censoring:
where ⊥ denotes independence. The observed data for an individual subject can be represented as
Our goal is to use the observed data to estimate θ for a general (specified) function h. For identifiability, we assume that
2.2 Outcome regression (OR) estimator
The OR method is based on the fact that
where
To deal with the curse of dimensionality, we will assume a parametric or semiparametric model for
where
where Λ and
Whether
At least for parametric models and the proportional hazards model,
If the model
2.3 Inverse probability weighted (IPW) estimator
It follows from assumptions (3)–(5) and a conditioning argument that
where
In reality, although
where
In an observational study, we also need to estimate
Once
The denominator here serves to normalize the inverse probability weights. If the models
2.4 First doubly robust (DR1) estimator
The DR1 method is adapted from the DR method of Zhang and Schaubel [19] for estimating the difference in mean restricted survival time. As an important by-product, Zhang and Schaubel [19] propose a DR estimator of
Then the Zhang-Schaubel estimator of
and the corresponding estimator of
Now θ can be estimated by
2.5 Second doubly robust (DR2) estimator
In Appendix A, we show that the efficient influence function for estimating θ is
where
where
The use of
3 Numerical results
3.1 Simulation
Here we report a simulation study of the finite-sample performance of the methods described in Section 2. Data are generated according to the following mechanism:
where
Each simulated sample is analyzed using the four methods (OR, IPW, DR1 and DR2) described in Section 2 as well as a naive method that ignores censoring and possible confounding and estimates θ with the U-statistic
where
where
Simulation results: empirical bias and standard deviation (SD) for the naive, OR, IPW, DR1 and DR2 estimators with correct and incorrect working models (see Section 3.1 for details).
Method | Models | ||||||
Bias | SD | Bias | SD | ||||
naive | 0.255 | 0.077 | 0.173 | 0.078 | |||
OR | correct | 0.001 | 0.080 | −0.011 | 0.087 | ||
OR | incorrect | 0.157 | 0.097 | 0.141 | 0.103 | ||
IPW | correct | correct | 0.048 | 0.174 | 0.014 | 0.177 | |
IPW | incorrect | incorrect | 0.180 | 0.190 | 0.142 | 0.197 | |
DR1 | correct | correct | correct | 0.008 | 0.138 | 0.001 | 0.146 |
DR1 | correct | incorrect | incorrect | −0.004 | 0.100 | −0.013 | 0.106 |
DR1 | incorrect | correct | correct | 0.017 | 0.143 | 0.010 | 0.136 |
DR1 | incorrect | incorrect | incorrect | 0.155 | 0.120 | 0.146 | 0.112 |
DR2 | correct | correct | correct | 0.008 | 0.118 | −0.045 | 0.116 |
DR2 | correct | incorrect | incorrect | −0.018 | 0.105 | −0.056 | 0.117 |
DR2 | incorrect | correct | correct | 0.014 | 0.120 | −0.038 | 0.108 |
DR2 | incorrect | incorrect | incorrect | 0.138 | 0.122 | 0.096 | 0.121 |
naive | 0.256 | 0.050 | 0.172 | 0.048 | |||
OR | correct | 0.003 | 0.051 | −0.009 | 0.055 | ||
OR | incorrect | 0.162 | 0.061 | 0.147 | 0.064 | ||
IPW | correct | correct | 0.044 | 0.141 | 0.011 | 0.147 | |
IPW | incorrect | incorrect | 0.163 | 0.159 | 0.137 | 0.173 | |
DR1 | correct | correct | correct | 0.014 | 0.082 | 0.002 | 0.104 |
DR1 | correct | incorrect | incorrect | −0.008 | 0.068 | −0.019 | 0.077 |
DR1 | incorrect | correct | correct | 0.018 | 0.077 | 0.007 | 0.088 |
DR1 | incorrect | incorrect | incorrect | 0.157 | 0.080 | 0.146 | 0.078 |
DR2 | correct | correct | correct | 0.017 | 0.077 | −0.032 | 0.107 |
DR2 | correct | incorrect | incorrect | −0.020 | 0.099 | −0.046 | 0.140 |
DR2 | incorrect | correct | correct | 0.020 | 0.078 | −0.028 | 0.091 |
DR2 | incorrect | incorrect | incorrect | 0.146 | 0.110 | 0.117 | 0.141 |
Table 1 shows a summary of the simulation results: empirical bias and standard deviation for estimating θ. As expected, the naive method is severely biased. The OR estimator becomes biased when the model
3.2 Application
We now apply the methods compared in Section 3.1 to a study of hospitalized pneumonia in young children, described in Section 1.13 of Klein and Moeschberger [26]. The study is based on 3,470 newborn children from the National Longitudinal Survey of Youth [27]. The research question is whether the mother’s feeding choice (breast feeding or not) protects the infant against hospitalized pneumonia in the first year of life. The outcome variable of interest is the time to hospitalized pneumonia, which is restricted to one year and possibly censored before one year. The available baseline covariates are infant race (black, white or other), indicators of normal birthweight (at least 5.5 pounds) and having at least one sibling, and some maternal and family characteristics: age, years of schooling, alcohol use, cigarette use, region of the country (Northeast, Northcentral, South and West), poverty, and urban environment.
We are interested in the causal effect of breast feeding on hospitalized pneumonia as measured by θ with h defined by (2). This effect measure can be interpreted as a win-lose probability difference in a hypothetical comparison of the restricted times to hospitalized pneumonia of two randomly chosen infants, who are randomly assigned to breast feeding and no breast feeding as in a clinical trial. The propensity score model in our analysis is a logistic regression model based on all infant characteristics as well as maternal age, cigarette use, and years of schooling. The OR model is specified as a proportional hazards model based on infant characteristics, maternal cigarette use, and urban environment, with separate parameters for each treatment group. The model for censoring is also a separate proportional hazards model for each treatment group, with infant race and maternal age as covariates.
Analysis of pneumonia data: point estimates (PE) of θ and bootstrap standard errors (SE) obtained using different methods (see Section 3.2 for details).
Method | PE | SE |
naive | −0.0098 | 0.0169 |
OR | −0.0020 | 0.0015 |
IPW | 0.1373 | 0.1902 |
DR1 | −0.0028 | 0.0018 |
DR2 | 0.0076 | 0.0023 |
Table 2 shows the point estimates of θ obtained using the five methods compared in Section 3.1 together with nonparametric bootstrap standard errors based on 1,000 bootstrap samples. The IPW method clearly stands out with a large positive point estimate and a large standard error, both of which are likely due to the large variability of the IPW estimator. Considering the large standard error, the IPW estimate of θ may well be due to random variation and does not establish a positive effect of breast feeding. The naive method, on the other hand, has the smallest (i. e., most negative) point estimate and also a relatively large standard error. The results are somewhat similar for the other three methods (OR, DR1 and DR2). DR2 is the only method that reaches statistical significance (with or without adjusting for multiplicity); however, the estimated effect is rather small in magnitude under the DR2 method. Taken together, the point estimates and standard errors in Table 2 provide no clear evidence for a substantial effect of breast feeding on hospitalized pneumonia.
4 Discussion
Mann–Whitney-type causal effects are clinically relevant, easy to interpret, and readily applicable to a wide range of study settings. This article considers estimation of such effects when the outcome variable is a survival time subject to right censoring. We have derived four methods for doing this and compared them theoretically and empirically. Among these methods, the OR and DR methods have unique advantages, and their suitability to a given application will depend on the available information and the relative importance of bias versus efficiency. If one is primarily concerned about bias, then the DR methods may be preferable. The OR method is more efficient when the OR model is correctly specified. No clear advantage has been observed for the IPW method.
The relationship between the two DR estimators remains an open question. The DR1 estimator adapted from Zhang and Schaubel [19] is known to be doubly robust. The proposed DR2 estimator, based directly on the efficient influence function for estimating θ, is doubly robust and locally efficient. In our simulation study, the two estimators appear to perform similarly in large samples when all working models are correctly specified, which suggests that the DR1 estimator might be locally efficient as well. Further research is warranted to clarify how the two DR estimators might relate to each other.
All of these methods assume that treatment assignment is ignorable in the sense of (3). The ignorability assumption cannot be validated with observed data and must be based on background knowledge. If there is insufficient background knowledge to justify the ignorability assumption, it is important to assess the robustness of study results in the presence of unmeasured confounders. Methods for conducting such a sensitivity analysis have been developed for some causal effects [e. g., 28], [e. g., 29] but not for the Mann–Whitney-type effects considered here. Further research is needed to close this methodological gap.
Funding source: Hong Kong Polytechnic University
Award Identifier / Grant number: G-YBCU
Funding statement: Liu’s research was supported by Hong Kong Polytechnic University (grant # G-YBCU) and General Research Funding from Research Grants Council of Hong Kong (grant # 15327216).
Acknowledgment
We thank the Associate Editor and the Reviewer for insightful and constructive comments on an earlier version of this paper. A part of this research was conducted while the first author was visiting the Hong Kong Polytechnic University.
Appendix A Semiparametric theory
Here we derive the efficient influence function for estimating θ using the monotone coarsening argument of Tsiatis [21]. Recall that the observed data for an individual subject is
First, we characterize the influence function of any regular, asymptotically linear (RAL) estimator of θ based on the full data. For argument’s sake, we assume for a moment that
Using the theory of U-statistics [e. g., 30, Chapter 12], it is easy to see that the above estimator is RAL with influence function
With
and the influence function of any full-data RAL estimator of θ takes the form
for some function b such that
Next, we characterize the influence function of any RAL estimator of θ based on the observed data. According to Tsiatis [21, Chapter 9], such an influence function can be represented as
and
As shown in Tsiatis [21, page 217], the augmentation space
where q is an arbitrary function,
for some functions b and q.
To find the efficient influence function, the influence function with the smallest possible variance, we need to minimize the variance of
Section 10.4 of Tsiatis [21] indicates that the projection of
The efficient influence function is therefore
which is equivalent to equation (6) in the main paper.
Appendix B Asymptotic theory
Standard regularity conditions in the M-estimation theory [e. g., 30, Chapter 5] are assumed. These include identifiability and smoothness (in parameters) of working models, existence of integrable envelopes that permit use of the dominated convergence theorem, and certain Donsker properties that help deal with random functions. Techniques for verifying the Donsker property can be found in van der Vaart and Wellner [31].
Let
B.1 Asymptotics for θ ˆ or
Here we assume that the model
For any
Then
It follows from the theory of U-statistics [e. g., 30, Theorem 12.3] that, for any fixed
where
We assume that the map
Substituting (B.3) and (B.4) into (B.1) yields
B.2 Asymptotics for θ ˆ ipw
Here we assume that the models
Let us define
and, for
Then
where
For
which implies that
Applying the delta method once again to
B.3 Asymptotics for θ ˆ dr1
Here we outline a general approach to the analysis of
Therefore, the mapping of
For specific working models, Zhang and Schaubel [19] show that
for some influence function
B.4 Asymptotics for θ ˆ dr2
Here we assume that (i)
We first demonstrate the consistency of
with
Note that
It follows from Tsiatis [21, Section 9.3 and Lemma 10.4] that
Substituting (B.11) and (B.12) into (B.10) leads to
Next, we show that
Then
We assume that ϑ is differentiable at
It follows that
Finally, we show that
References
1. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688–701.10.1037/h0037350Search in Google Scholar
2. Agresti A. Categorical data analysis. 3rd ed. Hoboken, NJ: John Wiley and Sons; 2013.Search in Google Scholar
3. Wilcoxon F. Individual comparisons by ranking methods. Biometrics. 1945;1:80–3.10.1007/978-1-4612-4380-9_16Search in Google Scholar
4. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 1947;18:50–60.10.1214/aoms/1177730491Search in Google Scholar
5. Acion L, Peterson JJ, Temple S, Arndt S. Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Stat Med. 2006;25:591–602.10.1002/sim.2256Search in Google Scholar PubMed
6. Brumback LC, Pepe MS, Alonzo TA. Using the ROC curve for gauging treatment effect in clinical trials. Stat Med. 2006;25:575–90.10.1002/sim.2345Search in Google Scholar PubMed
7. Newcombe RG. Confidence intervals for an effect size measure based on the Mann–Whitney statistic. Part 1: General issues and tail-area-based methods. Stat Med. 2006;25:543–57.10.1002/sim.2323Search in Google Scholar PubMed
8. Wang C, Scharfstein DO, Colantuoni E, Girard TD, Yan Y. Inference in randomized trials with death and missingness. Biometrics. 2017;73:431–40.10.1111/biom.12594Search in Google Scholar PubMed PubMed Central
9. Liu W, Zhang Z, Nie L, Soon G. A case study in personalized medicine: rilpivirine versus efavirenz for treatment-naive HIV patients. J Am Stat Assoc. 2017;112:1381–92.10.1080/01621459.2017.1280404Search in Google Scholar
10. Lumley T. Good, better, worst: what do rank tests really test? Presented at Canterbury Statistics Day, 2012. Available online at http://www.math.canterbury.ac.nz/canterbury-tails/download/68/Lumley—Plant-and-Food.pdf. 2012.Search in Google Scholar
11. Chen SX, Qin J, Tang CY. Mann-Whitney test with adjustments to pretreatment variables for missing values and observational study. J R Stat Soc, Ser B, Stat Methodol. 2013;75:81–102.10.1111/j.1467-9868.2012.01036.xSearch in Google Scholar
12. Vermeulen K, Thas O, Vansteelandt S. Increasing the power of the Mann–Whitney test in randomized experiments through flexible covariate adjustment. Stat Med. 2015;34:1012–30.10.1002/sim.6386Search in Google Scholar PubMed
13. Zhang Z, Ma S, Shen C, Liu C. Estimating Mann–Whitney-type causal effects. 2018. Under review.10.1515/jci-2018-0010Search in Google Scholar
14. Hubbard A, van der Laan MJ, Robins JM. Nonparametric locally efficient estimation of the treatment specific survival distribution with right censored data and covariates in observational studies. In: Halloran E, Berry D, editors. Statistical models in epidemiology: the environment and clinical trials. New York: Springer; 1999. p. 135–78.10.1007/978-1-4612-1284-3_3Search in Google Scholar
15. Zhang M, Schaubel DE. Contrasting treatment-specific survival using double-robust estimators. Stat Med. 2012;31:4255–68.10.1002/sim.5511Search in Google Scholar PubMed PubMed Central
16. Zhang M. Robust methods to improve efficiency and reduce bias in estimating survival curves in randomized clinical trials. Lifetime Data Anal. 2015;21:119–37.10.1007/s10985-014-9291-ySearch in Google Scholar PubMed
17. Chen P, Tsiatis AA. Causal inference on the difference of the restricted mean life between two groups. Biometrics. 2001;57:1030–8.10.1111/j.0006-341X.2001.01030.xSearch in Google Scholar
18. Zhang M, Schaubel DE. Estimating differences in restricted mean lifetime using observational data subject to dependent censoring. Biometrics. 2011;67:740–9.10.1111/j.1541-0420.2010.01503.xSearch in Google Scholar PubMed PubMed Central
19. Zhang M, Schaubel DE. Double-robust semiparametric estimator for differences in restricted mean lifetimes using observational data. Biometrics. 2012;68:999–1009.10.1111/j.1541-0420.2012.01759.xSearch in Google Scholar PubMed PubMed Central
20. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Baltimore, MD: Johns Hopkins University Press; 1993.Search in Google Scholar
21. Tsiatis AA. Semiparametric theory and missing data. New York: Springer; 2006.Search in Google Scholar
22. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.10.21236/ADA114514Search in Google Scholar
23. Cox DR. Regression models and life tables (with discussion). J R Stat Soc, Ser B, Stat Methodol. 1972;34:187–200.Search in Google Scholar
24. Cox DR. Partial likelihood. Biometrika. 1975;62:269–75.10.1093/biomet/62.2.269Search in Google Scholar
25. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc. 1994;89:846–66.10.1080/01621459.1994.10476818Search in Google Scholar
26. Klein JP, Moeschberger ML. Survival analysis: techniques for censored and truncated data. 2nd ed. New York: Springer; 2003.10.1007/b97377Search in Google Scholar
27. National Longitudinal Survey of Youth. NLS Handbook. Columbus, Ohio: Center for Human Resource Research, Ohio State University; 1995.Search in Google Scholar
28. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc, Ser B, Stat Methodol. 1983;45:212–8.10.1017/CBO9780511810725.017Search in Google Scholar
29. VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167:268–74.10.7326/M16-2607Search in Google Scholar PubMed
30. van der Vaart AW. Asymptotic statistics. Cambridge, UK: Cambridge University Press; 1998.10.1017/CBO9780511802256Search in Google Scholar
31. van der Vaart AW, Wellner JA. Weak convergence and empirical processes with applications to statistics. New York: Springer; 1996.10.1007/978-1-4757-2545-2Search in Google Scholar
32. Nolan D, Pollard D. Functional limit theorems for U-processes. Ann Probab. 1988;16:1291–8.10.1214/aop/1176991691Search in Google Scholar
© 2019 Walter de Gruyter GmbH, Berlin/Boston
This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.