Mann–Whitney-type causal effects are clinically relevant, easy to interpret, and readily applicable to a wide range of study settings. This article considers estimation of such effects when the outcome variable is a survival time subject to right censoring. We derive and discuss several methods: an outcome regression method based on a regression model for the survival outcome, an inverse probability weighting method based on models for treatment assignment and censoring, and two doubly robust methods that involve both types of models and that remain valid under correct specification of the outcome model or the other two models. The methods are compared in a simulation study and applied to an observational study of hospitalized pneumonia.
Comparative effectiveness research frequently involves comparing one treatment () with another () with respect to a clinical outcome of interest. For a generic subject in the target population, let , , denote the potential outcome under treatment a . The causal effect of versus is commonly assessed by comparing the means of and , if they are defined. Recently, there has been growing interest in a class of Mann–Whitney-type effect measures defined as , where is a specified function and the subscripts i and j denote two independent subjects. A simple example of h is given by
where is the indicator function; see, for example, Agresti [2, page 58]. Related definitions include , which forms the basis for the rank sum test  and the Mann–Whitney statistic . For an ordered categorical outcome, which does not have a mean, the Mann–Whitney-type effect based on (1) provides a natural overall effect measure. Such effect measures have been recommended for clinical trials with arbitrary (ordinal or higher level) outcomes because of their clinical relevance and interpretability [e. g., 5], [e. g., 6], [e. g., 7]. They are particularly useful in analyzing an ordinal composite outcome that combines a quantitative outcome with death and possibly treatment discontinuation due to adverse events or lack of efficacy , . As a possible drawback of such effect measures, Lumley  points out a lack of transitivity when θ based on (1) is used to order multiple treatments. On the other hand, for comparing two treatments, θ does provide unique insights that are not available from a comparison of means (if feasible at all).
In this article, we consider estimation of θ when the outcome of interest is a survival time subject to right censoring, which has not been considered in previous studies of Mann–Whitney-type causal effects. If T is a survival time, θ remains well defined and is just as meaningful as it is for another continuous outcome. In fact, the class of Mann–Whitney-type causal effects includes some well known and commonly used effect measures for survival outcomes. For example, if , then , where , . As another example, if , where ∧ denotes minimum and τ is a positive constant chosen to ensure identifiability, then θ is the difference in mean restricted survival time. A new effect measure for survival outcomes is obtained by taking h to be a truncated version of (1):
for which can be interpreted as a win-lose probability difference in comparing the restricted survival times of two randomly chosen subjects who are randomly assigned to treatments 1 and 0.
Estimation of θ for a fully observed outcome has been considered by Chen et al. , Vermeulen et al.  and Zhang et al. , with focus on adjusting for confounders in observational studies and using auxiliary information to improve efficiency in randomized clinical trials. These methods cannot be used to estimate θ for a survival outcome subject to right censoring, which represents an additional challenge. We are not aware of any existing method for estimating θ for a right-censored survival outcome, except in the aforementioned special cases where θ is a difference in distribution function , ,  or a difference in mean restricted survival time , , . In both of these special cases, is additive in the sense that it can be written as a function of minus a function of . Here, we consider estimation of θ for a general function h, such as (2), that is not assumed to be additive. We derive and compare several methods: an outcome regression (OR) method based on a regression model for the survival outcome, an inverse probability weighting (IPW) method based on models for treatment assignment and censoring, and two doubly robust (DR) methods that involve both types of models and that remain valid under correct specification of the outcome model or the other two models. One of the DR methods is a straightforward extension of Zhang and Schaubel , and the other one is a new method based directly on the efficient influence function ,  for estimating θ.
In the next section, we describe these methods and discuss their asymptotic behavior. In Section 3, we report a simulation study and present a real application. Concluding remarks are given in Section 4. Technical details are provided in appendices.
2.1 Notation and assumptions
Let A denote the actual treatment received by an individual subject in a study, which may be a randomized clinical trial or an observational study. Assuming consistency or stable unit treatment value, the actual survival time is then . Let be a vector of relevant covariates measured at baseline (before treatment). We assume that treatment assignment is ignorable  given in the sense that
The ignorability assumption is trivially true in a randomized clinical trial. In an observational study, the assumption requires that contain enough information to explain any association between A and the potential outcomes. We also assume positivity:
which is trivially true in a clinical trial but not trivial in an observational study.
The actual survival time T may be right-censored by a censoring time C. We assume independent censoring:
where ⊥ denotes independence. The observed data for an individual subject can be represented as , where and . The observed data for the whole study will be conceptualized as independent copies of and denoted by , .
Our goal is to use the observed data to estimate θ for a general (specified) function h. For identifiability, we assume that is determined by and for some such that almost surely. Accordingly, we will restrict attention to the interval in discussions of distribution, survival and hazard functions. For theoretical reasons, we also assume that , which is satisfied by all examples mentioned earlier. The efficient influence function in this estimation problem is given in Appendix A. In the rest of this section, we propose several estimators of θ, whose asymptotic properties are studied in Appendix B. Asymptotic variance formulas can be derived but may be cumbersome to use in variance estimation, depending on the specific forms of the working models involved. For ease of implementation, we use bootstrap standard errors and confidence intervals for inference.
2.2 Outcome regression (OR) estimator
The OR method is based on the fact that
To deal with the curse of dimensionality, we will assume a parametric or semiparametric model for , say , where may be finite- or infinite-dimensional. This will be referred to as the OR model. A prominent example of a semiparametric model for is the proportional hazards model , :
where is the conditional hazard function of T given , , is the baseline hazard function, and is a vector-valued function of . Under this model,
where Λ and are cumulative versions of λ and , respectively.
Whether is parametric or semiparametric, it is usually convenient to work with the corresponding hazard function . The likelihood for based on the observed data may be written as
At least for parametric models and the proportional hazards model, can be estimated by maximizing the above likelihood. Let be an estimate of , which may be obtained by maximum likelihood or other means. Then θ can be estimated by
If the model is correct and is consistent for (in a suitable sense), then is consistent for θ under mild regularity conditions. In Appendix B, we show that converges to a zero-mean normal distribution under general conditions. A key condition we assume is that is -consistent and asymptotically linear in a suitable sense.
2.3 Inverse probability weighted (IPW) estimator
where . If and are known, then the above identity suggests that θ can be estimated by
In reality, although is known by design in a clinical trial, is generally unknown and must be estimated. We assume a model for , say , which may be parametric or semiparametric so may be finite- or infinite-dimensional. The model may be specified using the same considerations for specifying except that we are now modeling the censoring time instead of the survival time. Let be an estimate of , which may be obtained by maximizing the likelihood:
where is the conditional hazard function of C given under the model , and is the cumulative version of .
In an observational study, we also need to estimate . In fact, even if is known, using an estimate of instead of the known value usually leads to better efficiency in the IPW approach . Let be modeled as , where is a finite- or infinite-dimensional parameter. Because A is binary, a typical choice for would be a logistic regression model. Let be an estimate of , which may be obtained by maximizing the likelihood:
Once and are obtained, the IPW estimator of θ is readily available as a weighted average:
The denominator here serves to normalize the inverse probability weights. If the models and are correct and estimate consistently, then is consistent for θ and asymptotically linear under appropriate conditions (see Appendix B).
2.4 First doubly robust (DR1) estimator
The DR1 method is adapted from the DR method of Zhang and Schaubel  for estimating the difference in mean restricted survival time. As an important by-product, Zhang and Schaubel  propose a DR estimator of , which can be described as follows. Let us define
Then the Zhang-Schaubel estimator of , the cumulative hazard function of , is given by
and the corresponding estimator of is . The superscript “dr” in these estimators indicates that the estimators are doubly robust in the sense of being consistent and asymptotically normal if (i) is correctly specified, (ii) and are both correctly specified, or (iii) both (i) and (ii) hold.
Now θ can be estimated by , and it is easy to see that is doubly robust in the same sense described above. Because the () are not guaranteed to be probability measures, is not guaranteed to lie in the range of h. When falls outside the range of h, it can be truncated into the range of h without changing its consistency or asymptotic normality. It is not immediately clear whether is locally efficient, that is, whether it attains the nonparametric information bound when all three working models are correct. This is the main motivation for our developing a second DR estimator of θ based directly on the efficient influence function for estimating θ.
2.5 Second doubly robust (DR2) estimator
In Appendix A, we show that the efficient influence function for estimating θ is
where , , and . Motivated by this result, we propose to estimate θ by setting the sample average of an empirical version of to 0. The resulting estimator is
The use of in to replace in (6) is an attempt to provide maximal protection against model misspecification. In Appendix B, we show that is indeed doubly robust and, furthermore, locally efficient. Like , may fall outside the range of h but can be truncated back into the range of h.
3 Numerical results
Here we report a simulation study of the finite-sample performance of the methods described in Section 2. Data are generated according to the following mechanism:
where , I is the 2-by-2 identity matrix, , and is either 0 (no treatment effect) or −0.5 (protective treatment effect). The function h in the definition of θ is take to be (2) with . The true value of θ is 0 when and approximately 0.15 when . We consider two sample sizes: . In each case, 1,000 replicate samples are generated.
Each simulated sample is analyzed using the four methods (OR, IPW, DR1 and DR2) described in Section 2 as well as a naive method that ignores censoring and possible confounding and estimates θ with the U-statistic
where and . The OR, IPW, DR1 and DR2 methods are implemented with correct and incorrect working models. The correct models are:
where , , and and are unspecified baseline hazard functions. The incorrect models result from replacing with in the correct models. All models are estimated using the maximum likelihood approach.
Table 1 shows a summary of the simulation results: empirical bias and standard deviation for estimating θ. As expected, the naive method is severely biased. The OR estimator becomes biased when the model is misspecified, as does the IPW estimator when the models and are misspecified. In contrast, the two DR estimators are nearly unbiased unless all models are misspecified, demonstrating double robustness. Regardless of model (in)correctness, the estimators that adjust for confounding and censoring generally follow the pattern in terms of variability. The efficiency comparison of the two DR estimators does not seem to follow a clear pattern. At , DR2 appears more efficient than DR1 when all models are correct, but this difference appears to diminish with increasing sample size.
We now apply the methods compared in Section 3.1 to a study of hospitalized pneumonia in young children, described in Section 1.13 of Klein and Moeschberger . The study is based on 3,470 newborn children from the National Longitudinal Survey of Youth . The research question is whether the mother’s feeding choice (breast feeding or not) protects the infant against hospitalized pneumonia in the first year of life. The outcome variable of interest is the time to hospitalized pneumonia, which is restricted to one year and possibly censored before one year. The available baseline covariates are infant race (black, white or other), indicators of normal birthweight (at least 5.5 pounds) and having at least one sibling, and some maternal and family characteristics: age, years of schooling, alcohol use, cigarette use, region of the country (Northeast, Northcentral, South and West), poverty, and urban environment.
We are interested in the causal effect of breast feeding on hospitalized pneumonia as measured by θ with h defined by (2). This effect measure can be interpreted as a win-lose probability difference in a hypothetical comparison of the restricted times to hospitalized pneumonia of two randomly chosen infants, who are randomly assigned to breast feeding and no breast feeding as in a clinical trial. The propensity score model in our analysis is a logistic regression model based on all infant characteristics as well as maternal age, cigarette use, and years of schooling. The OR model is specified as a proportional hazards model based on infant characteristics, maternal cigarette use, and urban environment, with separate parameters for each treatment group. The model for censoring is also a separate proportional hazards model for each treatment group, with infant race and maternal age as covariates.
Table 2 shows the point estimates of θ obtained using the five methods compared in Section 3.1 together with nonparametric bootstrap standard errors based on 1,000 bootstrap samples. The IPW method clearly stands out with a large positive point estimate and a large standard error, both of which are likely due to the large variability of the IPW estimator. Considering the large standard error, the IPW estimate of θ may well be due to random variation and does not establish a positive effect of breast feeding. The naive method, on the other hand, has the smallest (i. e., most negative) point estimate and also a relatively large standard error. The results are somewhat similar for the other three methods (OR, DR1 and DR2). DR2 is the only method that reaches statistical significance (with or without adjusting for multiplicity); however, the estimated effect is rather small in magnitude under the DR2 method. Taken together, the point estimates and standard errors in Table 2 provide no clear evidence for a substantial effect of breast feeding on hospitalized pneumonia.
Mann–Whitney-type causal effects are clinically relevant, easy to interpret, and readily applicable to a wide range of study settings. This article considers estimation of such effects when the outcome variable is a survival time subject to right censoring. We have derived four methods for doing this and compared them theoretically and empirically. Among these methods, the OR and DR methods have unique advantages, and their suitability to a given application will depend on the available information and the relative importance of bias versus efficiency. If one is primarily concerned about bias, then the DR methods may be preferable. The OR method is more efficient when the OR model is correctly specified. No clear advantage has been observed for the IPW method.
The relationship between the two DR estimators remains an open question. The DR1 estimator adapted from Zhang and Schaubel  is known to be doubly robust. The proposed DR2 estimator, based directly on the efficient influence function for estimating θ, is doubly robust and locally efficient. In our simulation study, the two estimators appear to perform similarly in large samples when all working models are correctly specified, which suggests that the DR1 estimator might be locally efficient as well. Further research is warranted to clarify how the two DR estimators might relate to each other.
All of these methods assume that treatment assignment is ignorable in the sense of (3). The ignorability assumption cannot be validated with observed data and must be based on background knowledge. If there is insufficient background knowledge to justify the ignorability assumption, it is important to assess the robustness of study results in the presence of unmeasured confounders. Methods for conducting such a sensitivity analysis have been developed for some causal effects [e. g., 28], [e. g., 29] but not for the Mann–Whitney-type effects considered here. Further research is needed to close this methodological gap.
Funding source: Hong Kong Polytechnic University
Award Identifier / Grant number: G-YBCU
Funding statement: Liu’s research was supported by Hong Kong Polytechnic University (grant # G-YBCU) and General Research Funding from Research Grants Council of Hong Kong (grant # 15327216).
We thank the Associate Editor and the Reviewer for insightful and constructive comments on an earlier version of this paper. A part of this research was conducted while the first author was visiting the Hong Kong Polytechnic University.
Appendix A Semiparametric theory
Here we derive the efficient influence function for estimating θ using the monotone coarsening argument of Tsiatis . Recall that the observed data for an individual subject is , which is considered a coarsened version of the “full data” . Unless otherwise stated, the distributions of and are restricted by assumptions (3)–(5) only. The full-data Hilbert space is the space of functions of with zero mean and finite variance, equipped with the covariance inner product. The observed-data Hilbert space is the space of functions of with zero mean and finite variance, equipped with the covariance inner product.
First, we characterize the influence function of any regular, asymptotically linear (RAL) estimator of θ based on the full data. For argument’s sake, we assume for a moment that is known, as in a randomized clinical trial. In this special case, a nonparametric RAL estimator of θ based on the full data is given by
Using the theory of U-statistics [e. g., 30, Chapter 12], it is easy to see that the above estimator is RAL with influence function
With known, the orthogonal complement of the full-data tangent space is
and the influence function of any full-data RAL estimator of θ takes the form
for some function b such that . Now we remove the assumption that is known. An RAL estimator of θ in this larger model is certainly an RAL estimator of θ in the smaller model with known. Therefore, the influence function of an RAL estimator of θ in the larger model must also belong to .
Next, we characterize the influence function of any RAL estimator of θ based on the observed data. According to Tsiatis [21, Chapter 9], such an influence function can be represented as , where satisfies
and is an element of , the augmentation space for censoring. It is easy to verify that equation (A.1) is satisfied by , where
As shown in Tsiatis [21, page 217], the augmentation space consists of martingale integrals of the form
where q is an arbitrary function, , and and are, respectively, the conditional survival and cumulative hazard functions of C given . Thus, the influence function of any RAL estimator of θ based on the observed data must be an element of , which can be written as
for some functions b and q.
To find the efficient influence function, the influence function with the smallest possible variance, we need to minimize the variance of with respect to b and q, which amounts to projecting into the orthogonal complement of . To this end, we note that and are orthogonal to each other, so it suffices to project into and separately. The projection of into is easily seen to be
Section 10.4 of Tsiatis  indicates that the projection of into is
The efficient influence function is therefore
which is equivalent to equation (6) in the main paper.
Appendix B Asymptotic theory
Standard regularity conditions in the M-estimation theory [e. g., 30, Chapter 5] are assumed. These include identifiability and smoothness (in parameters) of working models, existence of integrable envelopes that permit use of the dominated convergence theorem, and certain Donsker properties that help deal with random functions. Techniques for verifying the Donsker property can be found in van der Vaart and Wellner .
Let denote the true distribution of , and let denote the empirical distribution of , . Write for the empirical processes. For any measures , we write
B.1 Asymptotics for
Here we assume that the model is correct and that and take values in a suitable Banach space with
For any in the space for , we define
Then , , and
It follows from the theory of U-statistics [e. g., 30, Theorem 12.3] that, for any fixed ,
where . It can be argued as in Nolan and Pollard [32, proof of Theorem 5] that the term in (B.2) is uniformly negligible for in a neighborhood of . This, together with Theorem 19.24 of van der Vaart , implies that
We assume that the map is differentiable at with derivative . By the delta method,
B.2 Asymptotics for
Here we assume that the models and are correct and that and take values in suitable Banach spaces with
Let us define
and, for ,
Then , , and . For , it can be argued as before that
where . It is straightforward to see that
For , we assume that the map is differentiable at with derivative , and use the delta method to deduce that
which implies that
Applying the delta method once again to , we see that
B.3 Asymptotics for
Here we outline a general approach to the analysis of using the functional delta method. We identify , and their estimates as elements of , the space of real-valued functions of bounded variation equipped with the total variation norm. Assuming that is bounded, we have
Therefore, the mapping of to is Frechet-differentiable at with derivative
For specific working models, Zhang and Schaubel  show that () converges in probability to uniformly in if (i) is correctly specified, (ii) and are both correctly specified, or (iii) both (i) and (ii) hold. For general working models that satisfy (i) or (ii), we assume that converges weakly to a random element in with
for some influence function . This is generally true for parametric working models. With and specified as semiparametric proportional hazards models, an explicit expression for is given in Zhang and Schaubel [19, Web-Based Supplementary Materials]. Now it follows from Theorem 20.8 of van der Vaart  and simple algebra that
B.4 Asymptotics for
Here we assume that (i) is correctly specified, (ii) and are both correctly specified, or (iii) both (i) and (ii) hold. With possibly misspecified models, we assume that converges in probability to some with
We first demonstrate the consistency of under condition (i) or (ii). Write . Under mild regularity conditions, converges in probability to , where
with . Zhang and Schaubel  have shown that under condition (i) or (ii). Under condition (ii), , , , , and converges to
It follows from Tsiatis [21, Section 9.3 and Lemma 10.4] that
Substituting (B.11) and (B.12) into (B.10) leads to
Next, we show that is asymptotically normal. Define
Then , , and it can be argued as before that
We assume that ϑ is differentiable at with derivative , which implies that
It follows that
Finally, we show that is locally efficient, that is, the influence function of is equal to when all three models are correct. We do so by showing that under conditions (i) and (ii). Under condition (ii), for any , and its partial derivative, , must be zero. Similarly, we can show that and are both zero under condition (i). Therefore, when all three models are correct, and
2. Agresti A. Categorical data analysis. 3rd ed. Hoboken, NJ: John Wiley and Sons; 2013.Search in Google Scholar
5. Acion L, Peterson JJ, Temple S, Arndt S. Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Stat Med. 2006;25:591–602.10.1002/sim.2256Search in Google Scholar
7. Newcombe RG. Confidence intervals for an effect size measure based on the Mann–Whitney statistic. Part 1: General issues and tail-area-based methods. Stat Med. 2006;25:543–57.10.1002/sim.2323Search in Google Scholar
9. Liu W, Zhang Z, Nie L, Soon G. A case study in personalized medicine: rilpivirine versus efavirenz for treatment-naive HIV patients. J Am Stat Assoc. 2017;112:1381–92.10.1080/01621459.2017.1280404Search in Google Scholar
10. Lumley T. Good, better, worst: what do rank tests really test? Presented at Canterbury Statistics Day, 2012. Available online at http://www.math.canterbury.ac.nz/canterbury-tails/download/68/Lumley—Plant-and-Food.pdf. 2012.Search in Google Scholar
11. Chen SX, Qin J, Tang CY. Mann-Whitney test with adjustments to pretreatment variables for missing values and observational study. J R Stat Soc, Ser B, Stat Methodol. 2013;75:81–102.10.1111/j.1467-9868.2012.01036.xSearch in Google Scholar
12. Vermeulen K, Thas O, Vansteelandt S. Increasing the power of the Mann–Whitney test in randomized experiments through flexible covariate adjustment. Stat Med. 2015;34:1012–30.10.1002/sim.6386Search in Google Scholar
14. Hubbard A, van der Laan MJ, Robins JM. Nonparametric locally efficient estimation of the treatment specific survival distribution with right censored data and covariates in observational studies. In: Halloran E, Berry D, editors. Statistical models in epidemiology: the environment and clinical trials. New York: Springer; 1999. p. 135–78.10.1007/978-1-4612-1284-3_3Search in Google Scholar
16. Zhang M. Robust methods to improve efficiency and reduce bias in estimating survival curves in randomized clinical trials. Lifetime Data Anal. 2015;21:119–37.10.1007/s10985-014-9291-ySearch in Google Scholar
18. Zhang M, Schaubel DE. Estimating differences in restricted mean lifetime using observational data subject to dependent censoring. Biometrics. 2011;67:740–9.10.1111/j.1541-0420.2010.01503.xSearch in Google Scholar
19. Zhang M, Schaubel DE. Double-robust semiparametric estimator for differences in restricted mean lifetimes using observational data. Biometrics. 2012;68:999–1009.10.1111/j.1541-0420.2012.01759.xSearch in Google Scholar
20. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Baltimore, MD: Johns Hopkins University Press; 1993.Search in Google Scholar
21. Tsiatis AA. Semiparametric theory and missing data. New York: Springer; 2006.Search in Google Scholar
23. Cox DR. Regression models and life tables (with discussion). J R Stat Soc, Ser B, Stat Methodol. 1972;34:187–200.Search in Google Scholar
25. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc. 1994;89:846–66.10.1080/01621459.1994.10476818Search in Google Scholar
27. National Longitudinal Survey of Youth. NLS Handbook. Columbus, Ohio: Center for Human Resource Research, Ohio State University; 1995.Search in Google Scholar
28. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc, Ser B, Stat Methodol. 1983;45:212–8.10.1017/CBO9780511810725.017Search in Google Scholar
© 2019 Walter de Gruyter GmbH, Berlin/Boston
This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.