# Abstract

Mann–Whitney-type causal effects are clinically relevant, easy to interpret, and readily applicable to a wide range of study settings. This article considers estimation of such effects when the outcome variable is a survival time subject to right censoring. We derive and discuss several methods: an outcome regression method based on a regression model for the survival outcome, an inverse probability weighting method based on models for treatment assignment and censoring, and two doubly robust methods that involve both types of models and that remain valid under correct specification of the outcome model or the other two models. The methods are compared in a simulation study and applied to an observational study of hospitalized pneumonia.

## 1 Introduction

Comparative effectiveness research frequently involves comparing one treatment (*a* [1]. The causal effect of *i* and *j* denote two independent subjects. A simple example of *h* is given by

where *θ* based on (1) is used to order multiple treatments. On the other hand, for comparing two treatments, *θ* does provide unique insights that are not available from a comparison of means (if feasible at all).

In this article, we consider estimation of *θ* when the outcome of interest is a survival time subject to right censoring, which has not been considered in previous studies of Mann–Whitney-type causal effects. If *T* is a survival time, *θ* remains well defined and is just as meaningful as it is for another continuous outcome. In fact, the class of Mann–Whitney-type causal effects includes some well known and commonly used effect measures for survival outcomes. For example, if *τ* is a positive constant chosen to ensure identifiability, then *θ* is the difference in mean restricted survival time. A new effect measure for survival outcomes is obtained by taking *h* to be a truncated version of (1):

for which

Estimation of *θ* for a fully observed outcome has been considered by Chen et al. [11], Vermeulen et al. [12] and Zhang et al. [13], with focus on adjusting for confounders in observational studies and using auxiliary information to improve efficiency in randomized clinical trials. These methods cannot be used to estimate *θ* for a survival outcome subject to right censoring, which represents an additional challenge. We are not aware of any existing method for estimating *θ* for a right-censored survival outcome, except in the aforementioned special cases where *θ* is a difference in distribution function [14], [15], [16] or a difference in mean restricted survival time [17], [18], [19]. In both of these special cases, *θ* for a general function *h*, such as (2), that is not assumed to be additive. We derive and compare several methods: an outcome regression (OR) method based on a regression model for the survival outcome, an inverse probability weighting (IPW) method based on models for treatment assignment and censoring, and two doubly robust (DR) methods that involve both types of models and that remain valid under correct specification of the outcome model or the other two models. One of the DR methods is a straightforward extension of Zhang and Schaubel [19], and the other one is a new method based directly on the efficient influence function [20], [21] for estimating *θ*.

In the next section, we describe these methods and discuss their asymptotic behavior. In Section 3, we report a simulation study and present a real application. Concluding remarks are given in Section 4. Technical details are provided in appendices.

## 2 Methodology

### 2.1 Notation and assumptions

Let *A* denote the actual treatment received by an individual subject in a study, which may be a randomized clinical trial or an observational study. Assuming consistency or stable unit treatment value, the actual survival time is then

The ignorability assumption is trivially true in a randomized clinical trial. In an observational study, the assumption requires that *A* and the potential outcomes. We also assume positivity:

which is trivially true in a clinical trial but not trivial in an observational study.

The actual survival time *T* may be right-censored by a censoring time *C*. We assume independent censoring:

where ⊥ denotes independence. The observed data for an individual subject can be represented as

Our goal is to use the observed data to estimate *θ* for a general (specified) function *h*. For identifiability, we assume that *θ*, whose asymptotic properties are studied in Appendix B. Asymptotic variance formulas can be derived but may be cumbersome to use in variance estimation, depending on the specific forms of the working models involved. For ease of implementation, we use bootstrap standard errors and confidence intervals for inference.

### 2.2 Outcome regression (OR) estimator

The OR method is based on the fact that

where

To deal with the curse of dimensionality, we will assume a parametric or semiparametric model for

where *T* given

where Λ and *λ* and

Whether

At least for parametric models and the proportional hazards model, *θ* can be estimated by

If the model *θ* under mild regularity conditions. In Appendix B, we show that

### 2.3 Inverse probability weighted (IPW) estimator

It follows from assumptions (3)–(5) and a conditioning argument that

where *θ* can be estimated by

In reality, although

where *C* given

In an observational study, we also need to estimate *A* is binary, a typical choice for

Once *θ* is readily available as a weighted average:

The denominator here serves to normalize the inverse probability weights. If the models *θ* and asymptotically linear under appropriate conditions (see Appendix B).

### 2.4 First doubly robust (DR1) estimator

The DR1 method is adapted from the DR method of Zhang and Schaubel [19] for estimating the difference in mean restricted survival time. As an important by-product, Zhang and Schaubel [19] propose a DR estimator of

Then the Zhang-Schaubel estimator of

and the corresponding estimator of

Now *θ* can be estimated by *h*. When *h*, it can be truncated into the range of *h* without changing its consistency or asymptotic normality. It is not immediately clear whether *θ* based directly on the efficient influence function for estimating *θ*.

### 2.5 Second doubly robust (DR2) estimator

In Appendix A, we show that the efficient influence function for estimating *θ* is

where *θ* by setting the sample average of an empirical version of

where

The use of *h* but can be truncated back into the range of *h*.

## 3 Numerical results

### 3.1 Simulation

Here we report a simulation study of the finite-sample performance of the methods described in Section 2. Data are generated according to the following mechanism:

where **I** is the 2-by-2 identity matrix, *h* in the definition of *θ* is take to be (2) with *θ* is 0 when

Each simulated sample is analyzed using the four methods (OR, IPW, DR1 and DR2) described in Section 2 as well as a naive method that ignores censoring and possible confounding and estimates *θ* with the *U*-statistic

where

where

### Table 1

Method | Models | ||||||

Bias | SD | Bias | SD | ||||

naive | 0.255 | 0.077 | 0.173 | 0.078 | |||

OR | correct | 0.001 | 0.080 | −0.011 | 0.087 | ||

OR | incorrect | 0.157 | 0.097 | 0.141 | 0.103 | ||

IPW | correct | correct | 0.048 | 0.174 | 0.014 | 0.177 | |

IPW | incorrect | incorrect | 0.180 | 0.190 | 0.142 | 0.197 | |

DR1 | correct | correct | correct | 0.008 | 0.138 | 0.001 | 0.146 |

DR1 | correct | incorrect | incorrect | −0.004 | 0.100 | −0.013 | 0.106 |

DR1 | incorrect | correct | correct | 0.017 | 0.143 | 0.010 | 0.136 |

DR1 | incorrect | incorrect | incorrect | 0.155 | 0.120 | 0.146 | 0.112 |

DR2 | correct | correct | correct | 0.008 | 0.118 | −0.045 | 0.116 |

DR2 | correct | incorrect | incorrect | −0.018 | 0.105 | −0.056 | 0.117 |

DR2 | incorrect | correct | correct | 0.014 | 0.120 | −0.038 | 0.108 |

DR2 | incorrect | incorrect | incorrect | 0.138 | 0.122 | 0.096 | 0.121 |

naive | 0.256 | 0.050 | 0.172 | 0.048 | |||

OR | correct | 0.003 | 0.051 | −0.009 | 0.055 | ||

OR | incorrect | 0.162 | 0.061 | 0.147 | 0.064 | ||

IPW | correct | correct | 0.044 | 0.141 | 0.011 | 0.147 | |

IPW | incorrect | incorrect | 0.163 | 0.159 | 0.137 | 0.173 | |

DR1 | correct | correct | correct | 0.014 | 0.082 | 0.002 | 0.104 |

DR1 | correct | incorrect | incorrect | −0.008 | 0.068 | −0.019 | 0.077 |

DR1 | incorrect | correct | correct | 0.018 | 0.077 | 0.007 | 0.088 |

DR1 | incorrect | incorrect | incorrect | 0.157 | 0.080 | 0.146 | 0.078 |

DR2 | correct | correct | correct | 0.017 | 0.077 | −0.032 | 0.107 |

DR2 | correct | incorrect | incorrect | −0.020 | 0.099 | −0.046 | 0.140 |

DR2 | incorrect | correct | correct | 0.020 | 0.078 | −0.028 | 0.091 |

DR2 | incorrect | incorrect | incorrect | 0.146 | 0.110 | 0.117 | 0.141 |

Table 1 shows a summary of the simulation results: empirical bias and standard deviation for estimating *θ*. As expected, the naive method is severely biased. The OR estimator becomes biased when the model

### 3.2 Application

We now apply the methods compared in Section 3.1 to a study of hospitalized pneumonia in young children, described in Section 1.13 of Klein and Moeschberger [26]. The study is based on 3,470 newborn children from the National Longitudinal Survey of Youth [27]. The research question is whether the mother’s feeding choice (breast feeding or not) protects the infant against hospitalized pneumonia in the first year of life. The outcome variable of interest is the time to hospitalized pneumonia, which is restricted to one year and possibly censored before one year. The available baseline covariates are infant race (black, white or other), indicators of normal birthweight (at least 5.5 pounds) and having at least one sibling, and some maternal and family characteristics: age, years of schooling, alcohol use, cigarette use, region of the country (Northeast, Northcentral, South and West), poverty, and urban environment.

We are interested in the causal effect of breast feeding on hospitalized pneumonia as measured by *θ* with *h* defined by (2). This effect measure can be interpreted as a win-lose probability difference in a hypothetical comparison of the restricted times to hospitalized pneumonia of two randomly chosen infants, who are randomly assigned to breast feeding and no breast feeding as in a clinical trial. The propensity score model in our analysis is a logistic regression model based on all infant characteristics as well as maternal age, cigarette use, and years of schooling. The OR model is specified as a proportional hazards model based on infant characteristics, maternal cigarette use, and urban environment, with separate parameters for each treatment group. The model for censoring is also a separate proportional hazards model for each treatment group, with infant race and maternal age as covariates.

### Table 2

Method | PE | SE |

naive | −0.0098 | 0.0169 |

OR | −0.0020 | 0.0015 |

IPW | 0.1373 | 0.1902 |

DR1 | −0.0028 | 0.0018 |

DR2 | 0.0076 | 0.0023 |

Table 2 shows the point estimates of *θ* obtained using the five methods compared in Section 3.1 together with nonparametric bootstrap standard errors based on 1,000 bootstrap samples. The IPW method clearly stands out with a large positive point estimate and a large standard error, both of which are likely due to the large variability of the IPW estimator. Considering the large standard error, the IPW estimate of *θ* may well be due to random variation and does not establish a positive effect of breast feeding. The naive method, on the other hand, has the smallest (i. e., most negative) point estimate and also a relatively large standard error. The results are somewhat similar for the other three methods (OR, DR1 and DR2). DR2 is the only method that reaches statistical significance (with or without adjusting for multiplicity); however, the estimated effect is rather small in magnitude under the DR2 method. Taken together, the point estimates and standard errors in Table 2 provide no clear evidence for a substantial effect of breast feeding on hospitalized pneumonia.

## 4 Discussion

Mann–Whitney-type causal effects are clinically relevant, easy to interpret, and readily applicable to a wide range of study settings. This article considers estimation of such effects when the outcome variable is a survival time subject to right censoring. We have derived four methods for doing this and compared them theoretically and empirically. Among these methods, the OR and DR methods have unique advantages, and their suitability to a given application will depend on the available information and the relative importance of bias versus efficiency. If one is primarily concerned about bias, then the DR methods may be preferable. The OR method is more efficient when the OR model is correctly specified. No clear advantage has been observed for the IPW method.

The relationship between the two DR estimators remains an open question. The DR1 estimator adapted from Zhang and Schaubel [19] is known to be doubly robust. The proposed DR2 estimator, based directly on the efficient influence function for estimating *θ*, is doubly robust and locally efficient. In our simulation study, the two estimators appear to perform similarly in large samples when all working models are correctly specified, which suggests that the DR1 estimator might be locally efficient as well. Further research is warranted to clarify how the two DR estimators might relate to each other.

All of these methods assume that treatment assignment is ignorable in the sense of (3). The ignorability assumption cannot be validated with observed data and must be based on background knowledge. If there is insufficient background knowledge to justify the ignorability assumption, it is important to assess the robustness of study results in the presence of unmeasured confounders. Methods for conducting such a sensitivity analysis have been developed for some causal effects [e. g., 28], [e. g., 29] but not for the Mann–Whitney-type effects considered here. Further research is needed to close this methodological gap.

**Funding source: **Hong Kong Polytechnic University

**Award Identifier / Grant number: **G-YBCU

**Funding statement: **Liu’s research was supported by Hong Kong Polytechnic University (grant # G-YBCU) and General Research Funding from Research Grants Council of Hong Kong (grant # 15327216).

# Acknowledgment

We thank the Associate Editor and the Reviewer for insightful and constructive comments on an earlier version of this paper. A part of this research was conducted while the first author was visiting the Hong Kong Polytechnic University.

## Appendix A Semiparametric theory

Here we derive the efficient influence function for estimating *θ* using the monotone coarsening argument of Tsiatis [21]. Recall that the observed data for an individual subject is

First, we characterize the influence function of any regular, asymptotically linear (RAL) estimator of *θ* based on the full data. For argument’s sake, we assume for a moment that *θ* based on the full data is given by

Using the theory of *U*-statistics [e. g., 30, Chapter 12], it is easy to see that the above estimator is RAL with influence function

With

and the influence function of any full-data RAL estimator of *θ* takes the form

for some function *b* such that *θ* in this larger model is certainly an RAL estimator of *θ* in the smaller model with *θ* in the larger model must also belong to

Next, we characterize the influence function of any RAL estimator of *θ* based on the observed data. According to Tsiatis [21, Chapter 9], such an influence function can be represented as

and

As shown in Tsiatis [21, page 217], the augmentation space

where *q* is an arbitrary function, *C* given *θ* based on the observed data must be an element of

for some functions *b* and *q*.

To find the efficient influence function, the influence function with the smallest possible variance, we need to minimize the variance of *b* and *q*, which amounts to projecting

Section 10.4 of Tsiatis [21] indicates that the projection of

The efficient influence function is therefore

which is equivalent to equation (6) in the main paper.

## Appendix B Asymptotic theory

Standard regularity conditions in the *M*-estimation theory [e. g., 30, Chapter 5] are assumed. These include identifiability and smoothness (in parameters) of working models, existence of integrable envelopes that permit use of the dominated convergence theorem, and certain Donsker properties that help deal with random functions. Techniques for verifying the Donsker property can be found in van der Vaart and Wellner [31].

Let

### B.1 Asymptotics for θ ˆ or

Here we assume that the model

For any

Then

It follows from the theory of *U*-statistics [e. g., 30, Theorem 12.3] that, for any fixed

where

We assume that the map

Substituting (B.3) and (B.4) into (B.1) yields

### B.2 Asymptotics for θ ˆ ipw

Here we assume that the models

Let us define

and, for

Then

where

For

which implies that

Applying the delta method once again to

### B.3 Asymptotics for θ ˆ dr1

Here we outline a general approach to the analysis of

Therefore, the mapping of

For specific working models, Zhang and Schaubel [19] show that

for some influence function

### B.4 Asymptotics for θ ˆ dr2

Here we assume that (i)

We first demonstrate the consistency of

with

*θ*. Terms (B.6) and (B.7) are 0 because

Note that

It follows from Tsiatis [21, Section 9.3 and Lemma 10.4] that

Substituting (B.11) and (B.12) into (B.10) leads to

Next, we show that

Then

We assume that *ϑ* is differentiable at

It follows that

Finally, we show that

### References

1. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688–701.10.1037/h0037350Search in Google Scholar

2. Agresti A. Categorical data analysis. 3rd ed. Hoboken, NJ: John Wiley and Sons; 2013.Search in Google Scholar

3. Wilcoxon F. Individual comparisons by ranking methods. Biometrics. 1945;1:80–3.10.1007/978-1-4612-4380-9_16Search in Google Scholar

4. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 1947;18:50–60.10.1214/aoms/1177730491Search in Google Scholar

5. Acion L, Peterson JJ, Temple S, Arndt S. Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Stat Med. 2006;25:591–602.10.1002/sim.2256Search in Google Scholar

6. Brumback LC, Pepe MS, Alonzo TA. Using the ROC curve for gauging treatment effect in clinical trials. Stat Med. 2006;25:575–90.10.1002/sim.2345Search in Google Scholar

7. Newcombe RG. Confidence intervals for an effect size measure based on the Mann–Whitney statistic. Part 1: General issues and tail-area-based methods. Stat Med. 2006;25:543–57.10.1002/sim.2323Search in Google Scholar

8. Wang C, Scharfstein DO, Colantuoni E, Girard TD, Yan Y. Inference in randomized trials with death and missingness. Biometrics. 2017;73:431–40.10.1111/biom.12594Search in Google Scholar

9. Liu W, Zhang Z, Nie L, Soon G. A case study in personalized medicine: rilpivirine versus efavirenz for treatment-naive HIV patients. J Am Stat Assoc. 2017;112:1381–92.10.1080/01621459.2017.1280404Search in Google Scholar

10. Lumley T. Good, better, worst: what do rank tests really test? Presented at Canterbury Statistics Day, 2012. Available online at http://www.math.canterbury.ac.nz/canterbury-tails/download/68/Lumley—Plant-and-Food.pdf. 2012.Search in Google Scholar

11. Chen SX, Qin J, Tang CY. Mann-Whitney test with adjustments to pretreatment variables for missing values and observational study. J R Stat Soc, Ser B, Stat Methodol. 2013;75:81–102.10.1111/j.1467-9868.2012.01036.xSearch in Google Scholar

12. Vermeulen K, Thas O, Vansteelandt S. Increasing the power of the Mann–Whitney test in randomized experiments through flexible covariate adjustment. Stat Med. 2015;34:1012–30.10.1002/sim.6386Search in Google Scholar

13. Zhang Z, Ma S, Shen C, Liu C. Estimating Mann–Whitney-type causal effects. 2018. Under review.10.1515/jci-2018-0010Search in Google Scholar

14. Hubbard A, van der Laan MJ, Robins JM. Nonparametric locally efficient estimation of the treatment specific survival distribution with right censored data and covariates in observational studies. In: Halloran E, Berry D, editors. Statistical models in epidemiology: the environment and clinical trials. New York: Springer; 1999. p. 135–78.10.1007/978-1-4612-1284-3_3Search in Google Scholar

15. Zhang M, Schaubel DE. Contrasting treatment-specific survival using double-robust estimators. Stat Med. 2012;31:4255–68.10.1002/sim.5511Search in Google Scholar

16. Zhang M. Robust methods to improve efficiency and reduce bias in estimating survival curves in randomized clinical trials. Lifetime Data Anal. 2015;21:119–37.10.1007/s10985-014-9291-ySearch in Google Scholar

17. Chen P, Tsiatis AA. Causal inference on the difference of the restricted mean life between two groups. Biometrics. 2001;57:1030–8.10.1111/j.0006-341X.2001.01030.xSearch in Google Scholar

18. Zhang M, Schaubel DE. Estimating differences in restricted mean lifetime using observational data subject to dependent censoring. Biometrics. 2011;67:740–9.10.1111/j.1541-0420.2010.01503.xSearch in Google Scholar

19. Zhang M, Schaubel DE. Double-robust semiparametric estimator for differences in restricted mean lifetimes using observational data. Biometrics. 2012;68:999–1009.10.1111/j.1541-0420.2012.01759.xSearch in Google Scholar

20. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Baltimore, MD: Johns Hopkins University Press; 1993.Search in Google Scholar

21. Tsiatis AA. Semiparametric theory and missing data. New York: Springer; 2006.Search in Google Scholar

22. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.10.21236/ADA114514Search in Google Scholar

23. Cox DR. Regression models and life tables (with discussion). J R Stat Soc, Ser B, Stat Methodol. 1972;34:187–200.Search in Google Scholar

24. Cox DR. Partial likelihood. Biometrika. 1975;62:269–75.10.1093/biomet/62.2.269Search in Google Scholar

25. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc. 1994;89:846–66.10.1080/01621459.1994.10476818Search in Google Scholar

26. Klein JP, Moeschberger ML. Survival analysis: techniques for censored and truncated data. 2nd ed. New York: Springer; 2003.10.1007/b97377Search in Google Scholar

27. National Longitudinal Survey of Youth. NLS Handbook. Columbus, Ohio: Center for Human Resource Research, Ohio State University; 1995.Search in Google Scholar

28. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc, Ser B, Stat Methodol. 1983;45:212–8.10.1017/CBO9780511810725.017Search in Google Scholar

29. VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167:268–74.10.7326/M16-2607Search in Google Scholar

30. van der Vaart AW. Asymptotic statistics. Cambridge, UK: Cambridge University Press; 1998.10.1017/CBO9780511802256Search in Google Scholar

31. van der Vaart AW, Wellner JA. Weak convergence and empirical processes with applications to statistics. New York: Springer; 1996.10.1007/978-1-4757-2545-2Search in Google Scholar

32. Nolan D, Pollard D. Functional limit theorems for U-processes. Ann Probab. 1988;16:1291–8.10.1214/aop/1176991691Search in Google Scholar

**Received:**2018-03-11

**Revised:**2018-09-23

**Accepted:**2018-12-01

**Published Online:**2018-12-13

**Published in Print:**2019-04-26

© 2019 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.