We address the characterization of problems in which a consistent estimator exists in a union of two models, also termed a doubly robust estimator. Such estimators are important in missing-data problems, including causal inference. Existing characterizations, based on the semiparametric theory of projections, have seen considerable progress, but can still leave one’s understanding less than satisfied as to when and, especially, why such estimation works. We explore here a different, explanatory characterization: an exegesis based on logical operators. We show that double robustness exists if and only if we can produce consistent estimators for each contributing model based on an “AND” estimator, i.e., an estimator whose consistency generally needs both models to be correct. We show how this characterization explains double robustness through falsifiability.
Consider the following problem that often motivates doubly robust estimation. For patients with a particular disease, one of two treatments, $z = 1$ or $z = 0$, can be given, in which case a potential outcome $Y_i(z)$ can be observed. For each patient $i$, physicians first measure a set of covariates $X_i$ and assign treatment $Z_i$ such that the mechanism of assignment given $X_i$ is ignorable, i.e., such that $Z_i$ is independent of $\{Y_i(1), Y_i(0)\}$ given $X_i$. Finally, physicians measure the outcome $Y_i = Y_i(Z_i)$. The target is to estimate the average outcomes if all patients had been assigned $z = 1$ vs $z = 0$. For simplicity, we focus on consistently estimating one of the two averages, say $\tau = E\{Y(1)\}$.
The average $\tau$ is identified from the observed data distribution as $\tau = E_X\{E(Y \mid Z = 1, X)\}$. If the covariate $X$ is discrete, substituting the empirical distribution provides a consistent, nonparametric MLE. If, however, the covariate space is continuous, an accurate nonparametric estimator is infeasible, and researchers often consider additional assumptions.
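For a discrete covariate, the substitution (plug-in) estimator can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical arrays `x`, `z`, `y` holding $X_i$, $Z_i$, $Y_i$, and assuming every observed level of $X$ contains at least one treated patient; the function name `plugin_estimate` is ours.

```python
import numpy as np

def plugin_estimate(x, z, y):
    """Plug-in estimate of tau = E_X{E(Y | Z = 1, X)} for a discrete covariate X.

    Both the distribution of X and the regression E(Y | Z = 1, X = x) are replaced
    by their empirical counterparts. Assumes each observed level of X has at least
    one treated unit (Z = 1); otherwise a ValueError is raised.
    """
    x, z, y = (np.asarray(a) for a in (x, z, y))
    tau_hat = 0.0
    for level in np.unique(x):
        in_level = x == level
        treated = in_level & (z == 1)
        if not treated.any():
            raise ValueError(f"no treated units with X = {level}")
        p_hat = in_level.mean()        # empirical pr(X = level)
        y_hat = y[treated].mean()      # empirical E(Y | Z = 1, X = level)
        tau_hat += p_hat * y_hat
    return tau_hat

print(plugin_estimate([0, 0, 1, 1], [1, 0, 1, 1], [2.0, 9.0, 4.0, 6.0]))  # 3.5
```

On this toy data set the plug-in estimate is $0.5 \cdot 2 + 0.5 \cdot 5 = 3.5$, whereas the naive mean among treated patients is $4$, because treated patients are over-represented at $X = 1$.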
One usual assumption is that the regression $y^{\rm true}(X) = E(Y \mid Z = 1, X)$ lies in a small enough model that renders it consistently estimable by at least one computable estimator, say $\hat y(X)$. Under general conditions, this implies that the limit of the substitution estimator $n^{-1}\sum_i \hat y(X_i)$ is the same as the limit of $n^{-1}\sum_i y^{\rm true}(X_i)$, which is the desired $\tau$. An alternative assumption has been that physicians provide enough information on the propensity score $e^{\rm true}(X) = {\rm pr}(Z = 1 \mid X)$ to restrict it to a model that renders it consistently estimable by at least one computable estimator, say $\hat e(X)$. Generally, this implies that the estimator $n^{-1}\sum_i Z_i Y_i / \hat e(X_i)$ has the same limit as $n^{-1}\sum_i Z_i Y_i / e^{\rm true}(X_i)$, which, again, is the true target $\tau$. Below we abbreviate the first assumption by $y = y^{\rm true}$, where $y$ is the general form of the probability limit of $\hat y$ even if the assumption is false. Similarly, we abbreviate the second assumption by $e = e^{\rm true}$, where $e$ is the general form of the probability limit of $\hat e$ even if that assumption is false.
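An interesting case is if one assumes that, for the true distribution of the data, $y = y^{\rm true}$ or $e = e^{\rm true}$ (or both). As a union, this assumption is more plausible than either assumption considered by itself. Under this assumption, it is well known that the estimator

$$\hat\tau = \frac{1}{n}\sum_{i=1}^{n} \hat y(X_i) + \frac{1}{n}\sum_{i=1}^{n} \frac{Z_i Y_i}{\hat e(X_i)} - \frac{1}{n}\sum_{i=1}^{n} \frac{Z_i\, \hat y(X_i)}{\hat e(X_i)} \tag{1}$$

is consistent for $\tau$.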
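A small simulation can make the double robustness of (1) concrete. The sketch below rests on assumptions of our own, not taken from the text: $X \sim N(0, 1)$, a logistic propensity score, and $E(Y \mid Z = 1, X) = 1 + X + X^2$, so that $\tau = 2$. Working models are fitted by least squares among treated units (outcome) and by a hand-written Newton-Raphson logistic regression (propensity score); the helper names `fit_outcome`, `fit_logistic` and `dr_estimate` are hypothetical.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(design, z, n_iter=25):
    """Newton-Raphson logistic regression of Z on the design; returns fitted e(X)."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        grad = design.T @ (z - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return expit(design @ beta)

def fit_outcome(design, z, y):
    """Least squares of Y on the design among treated units; predicts y(X) for everyone."""
    treated = z == 1
    coef, *_ = np.linalg.lstsq(design[treated], y[treated], rcond=None)
    return design @ coef

def dr_estimate(z, y, y_hat, e_hat):
    """Estimator (1): regression term + IPW term - AND term."""
    return np.mean(y_hat) + np.mean(z * y / e_hat) - np.mean(z * y_hat / e_hat)

# Hypothetical data-generating process; true tau = E{1 + X + X^2} = 2.
rng = np.random.default_rng(2019)
n = 400_000
x = rng.normal(size=n)
z = rng.binomial(1, expit(-0.3 + 0.8 * x))
y = np.where(z == 1, 1 + x + x**2, x) + rng.normal(size=n)  # observed Y = Y(Z)

ones = np.ones(n)
y_right = fit_outcome(np.column_stack([ones, x, x**2]), z, y)  # correct outcome model
y_wrong = fit_outcome(np.column_stack([ones, x]), z, y)        # omits X^2
e_right = fit_logistic(np.column_stack([ones, x]), z)          # correct propensity model
e_wrong = fit_logistic(ones[:, None], z)                       # intercept only

reg_wrong = np.mean(y_wrong)              # regression estimator under the wrong y-model
ipw_wrong = np.mean(z * y / e_wrong)      # IPW estimator under the wrong e-model
dr_e_wrong = dr_estimate(z, y, y_right, e_wrong)
dr_y_wrong = dr_estimate(z, y, y_wrong, e_right)
print(f"reg (wrong y): {reg_wrong:.3f}, ipw (wrong e): {ipw_wrong:.3f}")
print(f"DR (right y, wrong e): {dr_e_wrong:.3f}, DR (wrong y, right e): {dr_y_wrong:.3f}")
```

With one working model deliberately misspecified, the corresponding single-model estimator should be visibly biased, while (1) should stay close to 2 as long as the other working model is correct.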
Although the verification of this result is easy, standard derivations of the above estimator in the first place are relatively lengthy and involve semiparametric projection theory. One such derivation finds the form of all regular and asymptotically linear estimators of $\tau$ under assumption $y = y^{\rm true}$ only, and then does the same under assumption $e = e^{\rm true}$ only. The set of estimators each of which can be expressed in both forms then characterizes the doubly robust estimators of the problem.
If the estimator solves the estimating equation given by the nonparametric influence function evaluated at some working models, then an alternative way to find its robustness conditions is to find the zeros of the second-order remainder in the von Mises expansion. Specifically, if the remainder has factors that are “posited minus true” differences for certain components, then correct specification of any one such component is a sufficient condition for robustness (Mark van der Laan, personal communication).
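For the ignorable-assignment example, this remainder can be written out directly. A short derivation, assuming only ignorability, with $y$ and $e$ the probability limits of $\hat y$ and $\hat e$, and using $E(Z \mid X) = e^{\rm true}(X)$, $E(ZY \mid X) = e^{\rm true}(X)\, y^{\rm true}(X)$ and $\tau = E\{y^{\rm true}(X)\}$: the limit of estimator (1) differs from $\tau$ by the expectation of a product of two “posited minus true” factors.

$$
\begin{aligned}
&E\{y(X)\} + E\left\{\frac{ZY}{e(X)}\right\} - E\left\{\frac{Z\, y(X)}{e(X)}\right\} - \tau \\
&\quad = E\{y(X) - y^{\rm true}(X)\} - E\left\{\frac{e^{\rm true}(X)\, \{y(X) - y^{\rm true}(X)\}}{e(X)}\right\} \\
&\quad = E\left\{\frac{\{e(X) - e^{\rm true}(X)\}\, \{y(X) - y^{\rm true}(X)\}}{e(X)}\right\}.
\end{aligned}
$$

The remainder therefore vanishes if either $y = y^{\rm true}$ or $e = e^{\rm true}$.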
More generally, such characterizations based on semiparametric theory have seen considerable progress, but still leave one’s understanding less than satisfied as to when and, especially, why such estimation works. Our purpose here is to add to this understanding by presenting an exegesis: an alternative characterization based on logical operators that focuses on falsifiability.
2 Characterization emphasizing falsifiability
We argue that double robustness is critically related to how to produce an OR operator using an AND operator. By an operator here we mean the equivalence class that contains a possibly random function with a probability limit, together with all other functions with the same limit. We denote “share the same equivalence class” by $\equiv$.
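For the ignorable-assignment example of the previous section, consider an operator $\tau^{\rm AND}(y, e)$ that takes as input the components $(y, e)$ and has the “AND” property that

$$\tau^{\rm AND}(y, e) \equiv \tau \quad \text{if } y = y^{\rm true} \text{ and } e = e^{\rm true}. \tag{2}$$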
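Then, the operator defined as

$$\tau^{\rm OR}(y, e) := \tau^{\rm AND}(y, e^{\rm true}) + \tau^{\rm AND}(y^{\rm true}, e) - \tau^{\rm AND}(y, e) \tag{3}$$

has the “OR” property: $\tau^{\rm OR}(y, e) \equiv \tau$ if $y = y^{\rm true}$ or $e = e^{\rm true}$. Indeed, if $y = y^{\rm true}$, the second and third terms cancel and the first term is $\tau^{\rm AND}(y^{\rm true}, e^{\rm true}) \equiv \tau$ by (2); the case $e = e^{\rm true}$ is symmetric.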
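The construction (3) is purely algebraic, so it can be written as a higher-order function that takes any AND operator and returns the corresponding OR operator. The toy AND operator below is hypothetical (scalar components, exact rational arithmetic via Python’s `fractions` module) and is chosen only to show that (3) turns an operator that needs both components correct into one that needs only one.

```python
from fractions import Fraction

def or_from_and(and_op, y_true, e_true):
    """Build the OR operator (3): and(y, e_true) + and(y_true, e) - and(y, e)."""
    def or_op(y, e):
        return and_op(y, e_true) + and_op(y_true, e) - and_op(y, e)
    return or_op

# Hypothetical toy setting: scalar components with true values Y_TRUE, E_TRUE
# and target TAU. toy_and returns TAU when both components are correct, as in (2),
# but is generally off when either one is wrong.
TAU, Y_TRUE, E_TRUE = Fraction(2), Fraction(1), Fraction(1, 2)

def toy_and(y, e):
    dy, de = y - Y_TRUE, e - E_TRUE
    return TAU + dy + de + dy * de

toy_or = or_from_and(toy_and, Y_TRUE, E_TRUE)

print(toy_and(Fraction(3), E_TRUE))  # 4: wrong y breaks the AND operator
print(toy_or(Fraction(3), E_TRUE))   # 2: the OR operator still recovers TAU
```

Here `toy_or(y, e)` simplifies to `TAU - (y - Y_TRUE) * (e - E_TRUE)`, a product of “posited minus true” errors, which echoes the von Mises remainder structure mentioned above.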
This operator provides explanatory and predictive power on a number of important points regarding the structure of doubly robust estimators.
(a) The logical operator easily reproduces well-known results. To see how the logical argument reproduces the doubly robust estimator (1) in the ignorable-assignment example, we can choose $\tau^{\rm AND}$ to depend on both components as the operator

$$\tau^{\rm AND}(y, e) = E\left\{\frac{Z\, y(X)}{e(X)}\right\}, \tag{4}$$
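which satisfies (2), because $E\{Z\, y^{\rm true}(X)/e^{\rm true}(X)\} = E\{y^{\rm true}(X)\} = \tau$. Then, as an operator equivalent to (with the same probability limit as) $\tau^{\rm AND}(y, e^{\rm true})$ we can take $E\{y(X)\}$, since $E\{Z\, y(X)/e^{\rm true}(X)\} = E\{y(X)\}$; and as an operator equivalent to $\tau^{\rm AND}(y^{\rm true}, e)$ we can take $E\{ZY/e(X)\}$, since $E\{Z\, y^{\rm true}(X)/e(X)\} = E\{ZY/e(X)\}$. The OR operator in (3) is then constructed as

$$\tau^{\rm OR}(y, e) = E\{y(X)\} + E\left\{\frac{ZY}{e(X)}\right\} - E\left\{\frac{Z\, y(X)}{e(X)}\right\}. \tag{5}$$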
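Because these identities hold exactly at the population level, they can be checked with exact arithmetic on a discrete covariate. The sketch assumes a hypothetical three-point distribution for $X$ with our own choices of $e^{\rm true}$ and $y^{\rm true}$; lists indexed by the value of $X$ stand in for the functions $y(\cdot)$ and $e(\cdot)$.

```python
from fractions import Fraction as F

# Hypothetical discrete population: X takes three values, indexed 0, 1, 2.
P_X = [F(1, 4), F(1, 2), F(1, 4)]      # pr(X = x)
E_TRUE = [F(1, 4), F(1, 2), F(3, 4)]   # true propensity score pr(Z = 1 | X = x)
Y_TRUE = [F(1), F(2), F(4)]            # true regression E(Y | Z = 1, X = x)

def expect(values):
    """Expectation over X of a function given as a list of its values."""
    return sum(p * v for p, v in zip(P_X, values))

TAU = expect(Y_TRUE)  # identified target E{y_true(X)} = 9/4

def tau_and(y, e):
    """Operator (4): E{Z y(X)/e(X)}, using E(Z | X) = e_true(X)."""
    return expect([et * yx / ex for et, yx, ex in zip(E_TRUE, y, e)])

def reg_term(y):
    """E{y(X)}: equivalent to tau_and(y, E_TRUE)."""
    return expect(y)

def ipw_term(e):
    """E{ZY/e(X)}: equivalent to tau_and(Y_TRUE, e), since E(ZY | X) = e_true(X) y_true(X)."""
    return expect([et * yt / ex for et, yt, ex in zip(E_TRUE, Y_TRUE, e)])

def tau_or(y, e):
    """Operator (5)."""
    return reg_term(y) + ipw_term(e) - tau_and(y, e)

Y_WRONG = [F(0), F(3), F(5)]   # hypothetical misspecified regression
E_WRONG = [F(1, 2)] * 3        # hypothetical misspecified propensity score

print(tau_or(Y_WRONG, E_TRUE), tau_or(Y_TRUE, E_WRONG), tau_or(Y_WRONG, E_WRONG))  # 9/4 9/4 2
```

With both components misspecified, (5) returns $2$ rather than $\tau = 9/4$, as expected when neither hypothesis holds.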
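So, estimating the components by the estimators $(\hat y, \hat e)$ produces the estimator version of (5),

$$\hat\tau^{\rm OR} = \frac{1}{n}\sum_{i=1}^{n} \hat y(X_i) + \frac{1}{n}\sum_{i=1}^{n} \frac{Z_i Y_i}{\hat e(X_i)} - \frac{1}{n}\sum_{i=1}^{n} \frac{Z_i\, \hat y(X_i)}{\hat e(X_i)}, \tag{6}$$

which is the doubly robust estimator (1).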
For the above example, there exist a number of other doubly robust estimators. However, all such estimators are in the same equivalence class as above, and they share the same influence function. Also, although the above example is relatively simple, many enriched variations share its basic structure, including parametric vs single-value versions of the target; multiple time points; other censored or missing-data versions; and more complex outcomes.
(b) The three-term structure arises from the general logical relation between an OR and an AND operator. We can generalize the above arguments as follows.
If an AND operator $\tau^{\rm AND}$ is consistently estimable and its two terms, $\tau^{\rm AND}(y, e^{\rm true})$ and $\tau^{\rm AND}(y^{\rm true}, e)$, are also consistently estimable, then the resulting estimator of the operator in (3) is a doubly robust estimator. Also, if there exists an OR operator, then it has the structure (3).
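The first claim is easy to check from the above. For the second claim, note that if there exists an OR operator, say $\tau^{\rm OR}$, then it is also an AND operator, and, when used as such in the right-hand side of (3), it produces the OR operator $\tau^{\rm OR}(y, e^{\rm true}) + \tau^{\rm OR}(y^{\rm true}, e) - \tau^{\rm OR}(y, e) \equiv 2\tau - \tau^{\rm OR}(y, e)$. If we now use this new OR operator again as an AND operator in the right-hand side of (3), we obtain $2\tau - \{2\tau - \tau^{\rm OR}(y, e)\}$, i.e., back the original OR operator $\tau^{\rm OR}$.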
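The argument for the second claim can be checked mechanically: applying the right-hand side of (3) to an OR operator gives a new OR operator, and applying it once more returns the original. The sketch uses a hypothetical scalar OR operator with error $-(y - y^{\rm true})(e - e^{\rm true})$, evaluated exactly on a grid of rational values; `construct` is our name for the map defined by (3).

```python
from fractions import Fraction
from itertools import product

def construct(op, y_true, e_true):
    """Right-hand side of (3) applied to an operator op(y, e)."""
    def new_op(y, e):
        return op(y, e_true) + op(y_true, e) - op(y, e)
    return new_op

# Hypothetical scalar setting and a toy OR operator whose error is a product of
# "posited minus true" differences, so it equals TAU when either component is correct.
TAU, Y_TRUE, E_TRUE = Fraction(2), Fraction(1), Fraction(1, 2)

def toy_or(y, e):
    return TAU - (y - Y_TRUE) * (e - E_TRUE)

once = construct(toy_or, Y_TRUE, E_TRUE)   # equals 2*TAU - toy_or(y, e)
twice = construct(once, Y_TRUE, E_TRUE)    # expected to recover toy_or

grid = [Fraction(k, 4) for k in range(-4, 9)]
recovered = all(twice(y, e) == toy_or(y, e) for y, e in product(grid, grid))
print(recovered)  # True
```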
This result predicts that the three-term structure of (1) is not specific to the above example, but rather follows the logical structure (3) that any OR operator has when constructed from an AND operator.
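(c) The role of the AND operator in falsifiability. More practically, the AND operator can be seen as essential for falsifiability. To see this, focus on the case where indeed either $y = y^{\rm true}$ or $e = e^{\rm true}$, as in the last two rows of Figure 1 for the example discussed above. Suppose the researcher, who does not know which of the rows is correct, observes that the estimated versions of $\tau^{\rm AND}(y, e^{\rm true})$ and $\tau^{\rm AND}(y^{\rm true}, e)$, namely $n^{-1}\sum_i \hat y(X_i)$ and $n^{-1}\sum_i Z_i Y_i/\hat e(X_i)$, differ beyond statistical variability. The first conclusion is that one of the two limit values is correct and the other is incorrect. It is by construction, and not happenstance, that in either row the operator $\tau^{\rm AND}(y, e)$ points to the incorrect value of $\tau$: if $y = y^{\rm true}$, then $\tau^{\rm AND}(y, e) = \tau^{\rm AND}(y^{\rm true}, e)$, and if $e = e^{\rm true}$, then $\tau^{\rm AND}(y, e) = \tau^{\rm AND}(y, e^{\rm true})$. It is in this way that we can conclude that its alternative is the true value.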
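At the population level, where “differ beyond statistical variability” reduces to “differ”, the reasoning of (c) can be sketched as a small diagnostic. The discrete population and the function names are hypothetical; with real data the exact equality checks would have to be replaced by comparisons that allow for sampling error. The last branch reflects the fact that, if the union assumption holds, the AND term must coincide with one of the two marginal terms.

```python
from fractions import Fraction as F

# Hypothetical discrete population (values of X indexed 0, 1, 2).
P_X = [F(1, 4), F(1, 2), F(1, 4)]
E_TRUE = [F(1, 4), F(1, 2), F(3, 4)]
Y_TRUE = [F(1), F(2), F(4)]

def expect(values):
    return sum(p * v for p, v in zip(P_X, values))

def reg_term(y):
    return expect(y)  # E{y(X)}

def ipw_term(e):
    return expect([et * yt / ex for et, yt, ex in zip(E_TRUE, Y_TRUE, e)])  # E{ZY/e(X)}

def and_term(y, e):
    return expect([et * yx / ex for et, yx, ex in zip(E_TRUE, y, e)])  # E{Z y(X)/e(X)}

def diagnose(y, e):
    """Return (term that the AND operator flags as incorrect, value of the OR operator)."""
    reg, ipw, conj = reg_term(y), ipw_term(e), and_term(y, e)
    if reg == ipw:
        flagged = "none"
    elif conj == reg:
        flagged = "regression"
    elif conj == ipw:
        flagged = "ipw"
    else:
        flagged = "neither: union assumption contradicted"
    return flagged, reg + ipw - conj

Y_WRONG = [F(0), F(3), F(5)]
E_WRONG = [F(1, 2)] * 3

print(diagnose(Y_WRONG, E_TRUE))  # ('regression', Fraction(9, 4))
print(diagnose(Y_TRUE, E_WRONG))  # ('ipw', Fraction(9, 4))
```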
(d) Connection with variation-dependent components. Double robustness is usually examined only when the components are variation independent, meaning that the hypothesis $y = y^{\rm true}$, say, does not a priori alter the possibility that the hypothesis $e = e^{\rm true}$, say, is correct. The researcher may, however, want to consider hypotheses on components that are not variation independent. Nevertheless, the characterization based on the AND operator in the above result appears to be unaffected by, and applicable to, problems with variation-dependent components.
Consider an extreme example where the estimand is now the true regression $y^{\rm true}$ and, as with simple hypothesis testing, assume that it is either equal to a specified function $y_1$ or equal to a different specified function $y_2$. As with double robustness, this assumption is a union of two models, but now the components involved in the two models are the same and equal to the estimand, i.e., there is complete variation dependence. As is well known, though, in a large enough sample one can easily discern with high precision that the true regression is the one with the smaller estimated mean squared error. This argument can also be obtained from (3) with the AND operator $\tau^{\rm AND}(y_1, y_2)$ that chooses the regression function with the larger (worse) of the two estimated mean squared errors, $\widehat{\rm mse}(y_1)$ and $\widehat{\rm mse}(y_2)$ (and chooses either of the two if they have equal mse). Then, note that (i) $\tau^{\rm AND}$ is indeed an AND operator in the sense of (2): it equals $y^{\rm true}$ if (but not necessarily only if) $y_1 = y^{\rm true}$ and $y_2 = y^{\rm true}$; (ii) because $\tau^{\rm AND}(y_1, y^{\rm true}) \equiv y_1$ and $\tau^{\rm AND}(y^{\rm true}, y_2) \equiv y_2$, the resulting OR operator from (3),

$$\tau^{\rm OR}(y_1, y_2) = y_1 + y_2 - \tau^{\rm AND}(y_1, y_2), \tag{7}$$

still produces the true regression if either $y_1 = y^{\rm true}$ or $y_2 = y^{\rm true}$; and (iii) when $y_1 \neq y_2$, the AND operator actually never produces the true value $y^{\rm true}$, but this does not matter, because its role is to show the false value and cancel it in (7).
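The regression example can be simulated directly. Everything here is a hypothetical setup of ours: two candidate functions $y_1(x) = 1 + 2x$ and $y_2(x) = 1 + 2x^2$, data generated from $y_1$ with Gaussian noise, and candidate functions represented as Python callables so that the OR operator (7) returns a function.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
x = rng.uniform(-1.0, 1.0, size=n)

def y_1(t):
    return 1 + 2 * t          # hypothetical candidate 1 (the true regression here)

def y_2(t):
    return 1 + 2 * t**2       # hypothetical candidate 2

y = y_1(x) + rng.normal(scale=0.5, size=n)

def mse_hat(f):
    """Estimated mean squared error of candidate regression f."""
    return np.mean((y - f(x)) ** 2)

def and_op(f, g):
    """AND operator: the candidate with the larger (worse) estimated MSE; ties go to f."""
    return f if mse_hat(f) >= mse_hat(g) else g

def or_op(f, g):
    """OR operator (7): f + g - and_op(f, g), returned as a function."""
    worse = and_op(f, g)
    return lambda t: f(t) + g(t) - worse(t)

x_grid = np.linspace(-1.0, 1.0, 11)
print(and_op(y_1, y_2) is y_2)                            # True: AND exposes the false candidate
print(np.allclose(or_op(y_1, y_2)(x_grid), y_1(x_grid)))  # True: OR cancels it
```

The order of the arguments does not matter: `or_op(y_2, y_1)` also returns $y_1$, since the AND operator flags $y_2$ in either position.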
(e) Further comments. The interpretation of why and how one of the two component models is misspecified is important in the discussion between the physician and the statistician. As Robins and Rotnitzky [3] (p. 923) also note, once a doubly robust estimator has been constructed, one can point to the component estimator with which it disagrees the most, and interpret this as evidence of which component is misspecified. Although useful, this observation differs from that discussed in (c) above, because it is formed after construction and in terms of the doubly robust estimator. Therefore, such an observation by itself does not quite explain why and how this evidence arises, as does the more elemental use of the components in the three terms of the OR operator.
The present characterization is not in opposition to the use of semiparametric methods, which can be very useful for deriving, when possible, AND operators with regularly estimable marginal components. On the other hand, there exist union models with consistent estimators that are irregular, but which still obey the three-term structure discussed above.
1. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc. 1999;94:1096–120. doi:10.1080/01621459.1999.10473862
3. Robins JM, Rotnitzky A. Comments on “Inference for semiparametric models: some questions and an answer” by Bickel PJ and Kwon J. Stat Sin. 2001;11:920–36.
6. Tsiatis A, Davidian M. Comments on “Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data” by Kang JDY and Schafer JL. Stat Sci. 2007;22:569–73. doi:10.1214/07-STS227B
7. Kang JD, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci. 2007;22:523–39.
8. Frangakis C, Rubin D. Rejoinder to discussions on addressing an idiosyncrasy in estimating survival curves using double sampling in the presence of self-selected right censoring. Biometrics. 2001;57:351–3. doi:10.1111/j.0006-341X.2001.00351.x
© 2019 Walter de Gruyter GmbH, Berlin/Boston
This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.