BY-NC-ND 3.0 license Open Access Published by De Gruyter March 14, 2019

A Falsifiability Characterization of Double Robustness Through Logical Operators

Constantine Frangakis

Abstract

We address the characterization of problems in which a consistent estimator exists in a union of two models, also termed a doubly robust estimator. Such estimators are important in missing-information problems, including causal inference. Existing characterizations, based on the semiparametric theory of projections, have seen sufficient progress, but can still leave one’s understanding less than satisfied as to when and especially why such estimation works. We explore here a different, explanatory characterization: an exegesis based on logical operators. We show that double robustness exists if and only if we can produce consistent estimators for each contributing model based on an “AND” estimator, i.e., an estimator whose consistency generally needs both models to be correct. We show how this characterization explains double robustness through falsifiability.

1 Motivation

Consider the following problem that often motivates doubly robust estimation [1]. For patients with a particular disease, one of two treatments, $z=1$ or $0$, can be given, in which case a potential outcome $Y(z)$ can be observed [2]. For each patient $i$, physicians first measure a set of covariates $X_i$, and assign treatment $Z_i$ such that the mechanism of assignment given $X_i$ is ignorable [2], i.e., such that $Z_i$ is independent of $Y_i(z)$ given $X_i$. Finally, physicians measure outcome $Y_i := Y_i(Z_i)$. The target is to estimate the average outcomes $E_0\{Y_i(z)\}$ if all patients had been assigned $z=1$ vs $0$. For simplicity, focus on estimating one of the two averages, say $\tau_0 := E_0\{Y_i(z=1)\}$, using consistent estimators.

The average $\tau_0$ is identified from the observed data distribution as $E_0\{E_0(Y_i \mid Z_i=1, X_i)\}$. If the covariate is discrete, substituting the empirical distribution provides a consistent, nonparametric MLE. If, however, the covariate space is continuous, an accurate nonparametric estimator is infeasible [3], and researchers often consider additional assumptions.

One usual assumption is that the regression $y_0(X_i) := E_0(Y_i \mid Z_i=1, X_i)$ is in a small enough model that renders $y_0$ consistently estimable by at least one computable estimator, say $\hat{y}(X_i)$. Under general conditions, this implies that the limit of the substitution estimator $\frac{1}{n}\sum_i \hat{y}(X_i)$ is the same as the limit of $\frac{1}{n}\sum_i y_0(X_i)$, which is the desired $\tau_0$. An alternative assumption has been that physicians provide enough information on the propensity score $e_0(X_i) := \mathrm{pr}_0(Z_i=1 \mid X_i)$ [4] to restrict it to a model that renders it consistently estimable by at least one computable estimator, say $\hat{e}(X_i)$. Generally, this implies that the estimator $\frac{1}{n}\sum_i Y_i Z_i/\hat{e}(X_i)$ has the same limit as that of $\frac{1}{n}\sum_i Y_i Z_i/e_0(X_i)$, which, again, is the true target $\tau_0$. Below we abbreviate the first assumption by $y_0=y$, where $y$ is the general form of the probability limit of $\hat{y}$ even if the assumption is false. Similarly, we abbreviate the second assumption by $e_0=e$, where $e$ is the general form of the probability limit of $\hat{e}$ even if the assumption is false.

An interesting case is if one assumes that, for the true distribution of the data, y0=y or e0=e (or both). As a union set, this is more plausible than either assumption considered only by itself. For this assumption, it is well known that the estimator

$$\frac{1}{n}\sum_i \Big\{ \hat{y}(X_i) + Y_i Z_i/\hat{e}(X_i) - \hat{y}(X_i) Z_i/\hat{e}(X_i) \Big\} \tag{1}$$

is consistent for τ0, and is therefore called a “doubly robust estimator” [1], [5].
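The double robustness of the three-term estimator (1) can be checked numerically. The following is a minimal sketch, not the author's code: the data-generating model (a standard normal covariate, logistic propensity score, linear outcome regression, true value 1.0) and the names `simulate` and `dr_estimate` are assumptions made only for illustration, and the working components are plugged in as known functions rather than fitted estimators.

```python
import numpy as np

def simulate(n, rng):
    """Draw (X, Z, Y) under an ignorable assignment (assumed toy model).

    The true target is tau_0 = E{Y(1)} = 1.0 because y_0(X) = 1 + 2X and E(X) = 0.
    """
    x = rng.normal(size=n)
    e0 = 1.0 / (1.0 + np.exp(-x))            # true propensity score e_0(X)
    z = rng.binomial(1, e0)                  # treatment depends on X only
    y1 = 1.0 + 2.0 * x + rng.normal(size=n)  # potential outcome Y(1)
    y = np.where(z == 1, y1, 0.0)            # Y enters the estimator only when Z = 1
    return x, z, y, e0

def dr_estimate(y, z, y_hat, e_hat):
    """Estimator (1): mean of y_hat + Y Z / e_hat - y_hat Z / e_hat."""
    return np.mean(y_hat + y * z / e_hat - y_hat * z / e_hat)

rng = np.random.default_rng(0)
x, z, y, e0 = simulate(200_000, rng)

y_true = 1.0 + 2.0 * x            # correctly specified regression (y = y_0)
y_wrong = np.zeros_like(x)        # deliberately misspecified regression
e_true = e0                       # correctly specified propensity (e = e_0)
e_wrong = np.full_like(x, 0.5)    # deliberately misspecified propensity

tau_y_only = dr_estimate(y, z, y_true, e_wrong)    # only y correct
tau_e_only = dr_estimate(y, z, y_wrong, e_true)    # only e correct
tau_neither = dr_estimate(y, z, y_wrong, e_wrong)  # both components wrong
```

Under these assumptions, `tau_y_only` and `tau_e_only` should both be close to the true value 1.0, whereas `tau_neither` need not be, which is the union-model behavior described above.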

Although the verification of this result is easy, standard derivations of the above estimator in the first place are relatively lengthy and involve semiparametric projection theory. One such derivation includes finding the form of all regular and asymptotically linear estimators of τ0 under assumption y0=y only; and then doing the same under assumption e0=e only. Then the set of estimators each of which can be expressed in both forms [6] characterizes the doubly robust estimators of the problem.

If the estimator solves the nonparametric influence function evaluated at some working models, then an alternative way to find the robustness conditions of such an estimator is to find the zeros of the second-order remainder in the von Mises expansion. Specifically, if the remainder has factors that are differences between the “posited minus true” values of a certain component, then correct specification of that component is a sufficient condition for robustness (Mark van der Laan, personal communication).

More generally, such characterizations, based on semiparametric theory, have seen sufficient progress [7], [3], but still leave one’s understanding less than satisfied as to when and especially why such estimation works. Our purpose here is to add to this understanding by presenting an exegesis: an alternative characterization based on logical operators that focuses on falsifiability.

2 Characterization emphasizing falsifiability

We argue that double robustness is critically related to how to produce an OR operator using an AND operator. By an operator here we mean the equivalence class that contains a possibly random function with a probability limit, and all other functions with the same limit. We denote “share the same equivalence class” by =.

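For the example with the ignorable assignment of the previous section, consider an operator $\tau_{\mathrm{AND}}(y,e)$ that takes as input the components $(y,e)$ and has the “AND” property that:

$$\tau_{\mathrm{AND}}(y,e) = \tau_0 \ \text{(the true value) if } y=y_0 \text{ AND } e=e_0. \tag{2}$$

Then, the operator defined as

$$\tau_{\mathrm{OR}}(y,e) := \tau_{\mathrm{AND}}(y,e_0) + \tau_{\mathrm{AND}}(y_0,e) - \tau_{\mathrm{AND}}(y,e) \tag{3}$$

produces the true value $\tau_0$ if $y=y_0$ OR $e=e_0$.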
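The OR property of (3) follows by direct substitution. As a short worked check, using only the AND property (2) and reading “=” as equality of probability limits, each case cancels one pair of terms:

```latex
\begin{align*}
y = y_0:\quad
  \tau_{\mathrm{OR}}(y_0,e)
  &= \tau_{\mathrm{AND}}(y_0,e_0) + \tau_{\mathrm{AND}}(y_0,e) - \tau_{\mathrm{AND}}(y_0,e)
   = \tau_{\mathrm{AND}}(y_0,e_0) = \tau_0,\\
e = e_0:\quad
  \tau_{\mathrm{OR}}(y,e_0)
  &= \tau_{\mathrm{AND}}(y,e_0) + \tau_{\mathrm{AND}}(y_0,e_0) - \tau_{\mathrm{AND}}(y,e_0)
   = \tau_{\mathrm{AND}}(y_0,e_0) = \tau_0.
\end{align*}
```

In each case the AND term with both working components cancels the marginal term that carries the possibly misspecified component, leaving only $\tau_{\mathrm{AND}}(y_0,e_0)$.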

This operator provides explanatory and predictive power on a number of important points regarding the structure of doubly robust estimators.

(a) The logical operator easily reproduces well-known results. To see how the logical-operation argument reproduces the doubly robust estimator (1) in the ignorable assignment example, we can choose $\tau_{\mathrm{AND}}(y,e)$ to depend on both components $(y,e)$ as the operator

$$\tau_{\mathrm{AND}}(y,e) := \frac{1}{n}\sum_i y(X_i) Z_i/e(X_i) \quad \text{with pr. limit } E_0\{y(X_i)\, e_0(X_i)/e(X_i)\}, \tag{4}$$

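satisfying (2). Then, we can obtain $\tau_{\mathrm{AND}}(y,e_0)$ as equivalent to (with the same probability limit as) the operator $\frac{1}{n}\sum_i y(X_i)$ (the latter has the same limit as $\frac{1}{n}\sum_i y(X_i) Z_i/e_0(X_i)$); and we can obtain $\tau_{\mathrm{AND}}(y_0,e)$ as equivalent to the operator $\frac{1}{n}\sum_i Y_i Z_i/e(X_i)$ (the latter has the same limit as $\frac{1}{n}\sum_i y_0(X_i) Z_i/e(X_i)$). Then, the OR operator in (3) is constructed as

$$\frac{1}{n}\sum_i \Big\{ \underbrace{y(X_i)}_{\tau_{\mathrm{AND}}(y,e_0)} + \underbrace{Y_i Z_i/e(X_i)}_{\tau_{\mathrm{AND}}(y_0,e)} - \underbrace{y(X_i) Z_i/e(X_i)}_{\tau_{\mathrm{AND}}(y,e)} \Big\}. \tag{5}$$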

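So, estimating the components $(y,e)$ by estimators $(\hat{y},\hat{e})$ produces the estimator version of (5)

$$\tau_{\mathrm{OR}}(\hat{y},\hat{e}) := \frac{1}{n}\sum_i \Big\{ \underbrace{\hat{y}(X_i)}_{\tau_{\mathrm{AND}}(\hat{y},e_0)} + \underbrace{Y_i Z_i/\hat{e}(X_i)}_{\tau_{\mathrm{AND}}(y_0,\hat{e})} - \underbrace{\hat{y}(X_i) Z_i/\hat{e}(X_i)}_{\tau_{\mathrm{AND}}(\hat{y},\hat{e})} \Big\}, \tag{6}$$

which, by (3), has the property that $\tau_{\mathrm{OR}}(\hat{y},\hat{e}) = \tau_0$ if $\hat{y}=y_0$ OR $\hat{e}=e_0$. In other words, we have constructed in this alternative way the estimator $\tau_{\mathrm{OR}}(\hat{y},\hat{e})$, which equals (1) (see also Fig. 1), as a doubly robust estimator.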

For the above example, there exist a number of other estimators that are doubly robust [7]. However, all such estimators are in the same equivalence class as above, and they share the same influence function. Also, although the above example is relatively simple, many enriched variations, including parametric vs single value for (y,e); multiple time points; other censored or missing data versions; or more complex outcomes, share the basic structure as above.

(b) The three-term structure arises from the general logical relation between an OR and an AND operator. We can generalize the above arguments as follows.

Result.

If a consistently estimable operator τAND(y,e) has its two terms, τAND(y,e0) and τAND(y0,e), consistently estimable, then the resulting estimator of the operator τOR(y,e) in (3) produces a doubly robust estimator. Also, if there exists an OR operator, then it has the structure (3).

The first claim is easy to check from above. For the second claim, note that if there exists an OR operator, say $\tau_{\mathrm{OR}}(y,e)$, then it is also an AND operator, and, when used as such in the RHS of (3), it produces the OR operator $2\tau_0 - \tau_{\mathrm{OR}}(y,e)$. If we now use this new OR operator again as an AND operator in the RHS of (3), we obtain back the original OR operator.
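The two substitutions in this argument can be written out explicitly. In the worked step below, $\tau'$ is a shorthand introduced here (it does not appear in the original) for the operator obtained by inserting $\tau_{\mathrm{OR}}$ into the right-hand side of (3); the only fact used is that $\tau_{\mathrm{OR}}(y,e_0) = \tau_{\mathrm{OR}}(y_0,e) = \tau_0$.

```latex
\begin{align*}
\tau'(y,e)
  &:= \tau_{\mathrm{OR}}(y,e_0) + \tau_{\mathrm{OR}}(y_0,e) - \tau_{\mathrm{OR}}(y,e)
   = \tau_0 + \tau_0 - \tau_{\mathrm{OR}}(y,e)
   = 2\tau_0 - \tau_{\mathrm{OR}}(y,e),\\
\tau'(y,e_0) + \tau'(y_0,e) - \tau'(y,e)
  &= (2\tau_0 - \tau_0) + (2\tau_0 - \tau_0) - \{2\tau_0 - \tau_{\mathrm{OR}}(y,e)\}
   = \tau_{\mathrm{OR}}(y,e).
\end{align*}
```

The first line shows that $\tau'$ is itself an OR operator, since it equals $\tau_0$ whenever $\tau_{\mathrm{OR}}(y,e)$ does; the second shows that applying the construction in (3) to $\tau'$ recovers $\tau_{\mathrm{OR}}$, so $\tau_{\mathrm{OR}}$ has the three-term form.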

Figure 1: Generation of OR from AND operator, and falsifiability in the example of Section 1.

This result predicts that the three-term structure of (1) is not specific to the above example, but rather obeys the logical structure that any OR operator has as constructed from an AND operator (3).

(c) The role of the AND operator in falsifiability. The AND operator can be seen more practically as essential for falsifiability. To see this, focus on the case where indeed either $e=e_0$ or $y=y_0$, as in the last two rows of Figure 1 for the example discussed above. Suppose the researcher, who does not know which of the rows is correct, observes that the estimator forms of $\tau_{\mathrm{AND}}(y,e_0)$ and $\tau_{\mathrm{AND}}(y_0,e)$ differ beyond statistical variability. The first conclusion is that one of the two limit values is correct and the other is incorrect. It is by construction, and not happenstance, that in either row, the operator $\tau_{\mathrm{AND}}(y,e)$ points to the incorrect value of $\tau$, and it is in this way that we can conclude that its alternative is the true value.
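The falsifiability argument in (c) can be illustrated numerically. The sketch below is an illustration under assumptions, not the author's procedure: the toy data-generating model and the names `three_terms` and `falsify` are hypothetical. It presumes that exactly one working component is misspecified, and it compares the terms by raw distance rather than by a formal test of “beyond statistical variability”.

```python
import numpy as np

def three_terms(x, z, y, y_fn, e_fn):
    """Sample versions of the three terms in (5) for working components y_fn, e_fn."""
    y_x, e_x = y_fn(x), e_fn(x)
    t_y = np.mean(y_x)              # tau_AND(y, e0): regression term
    t_e = np.mean(y * z / e_x)      # tau_AND(y0, e): weighting term
    t_and = np.mean(y_x * z / e_x)  # tau_AND(y, e): needs both components
    return t_y, t_e, t_and

def falsify(t_y, t_e, t_and):
    """Return the marginal term NOT matched by the AND term, and the flagged component."""
    if abs(t_and - t_y) < abs(t_and - t_e):
        return t_e, "y"   # AND agrees with the regression term, so y is flagged
    return t_y, "e"       # AND agrees with the weighting term, so e is flagged

# Assumed toy model: the true value is tau_0 = E{y0(X)} = 1.0.
rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
e0 = lambda t: 1.0 / (1.0 + np.exp(-t))   # true propensity score
y0 = lambda t: 1.0 + 2.0 * t              # true outcome regression
z = rng.binomial(1, e0(x))
y = np.where(z == 1, y0(x) + rng.normal(size=n), 0.0)

y_bad = lambda t: 0.5 + t                 # misspecified regression
e_bad = lambda t: np.full_like(t, 0.5)    # misspecified propensity

tau_a, flagged_a = falsify(*three_terms(x, z, y, y_bad, e0))  # y wrong, e right
tau_b, flagged_b = falsify(*three_terms(x, z, y, y0, e_bad))  # y right, e wrong
```

In both scenarios the AND term should land next to the incorrect marginal term, so the value returned by `falsify` is the other one, which should be close to 1.0 here.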

(d) Connection with variation dependent components. Double robustness is usually examined only when the components $(y,e)$ are variation independent, meaning that the hypothesis, say, $H_1: y_0=y$ does not a priori alter the possibility that the hypothesis, say, $H_2: e_0=e$, is correct. It is possible that the researcher may want to consider hypotheses on components that are not variation independent. Nevertheless, the same characterization based on the AND operator in the above result appears to be unaffected by and applicable to problems with variation dependent components.

Consider an extreme example where the estimand now is the true regression $y_0(x)$, and, as with simple hypothesis testing, assume that it is either equal to a specified function $y^{(1)}(x)$, or equal to a different specified function $y^{(2)}(x)$. As with double robustness, this assumption for $y_0(x)$ is a union of two models, but where, now, the components involved in each of the two models are the same and equal to the estimand, i.e., there is complete variation dependence. As is well known, though, one can easily discern with high precision that the true regression will be the one with the smaller estimated mean squared error in a large enough sample. This argument is also the result of the operator $y_{\mathrm{AND}}(y^{(1)},y^{(2)})$ that chooses the regression function with the larger (worse) of the two estimated mean squared errors, $\mathrm{mse}(y^{(1)})$ and $\mathrm{mse}(y^{(2)})$ (and chooses either of the two if they have equal mse). Then, note that (i) $y_{\mathrm{AND}}(y^{(1)},y^{(2)})$ is indeed an AND operator in the sense of (2): it equals $y_0$ if (but not necessarily only if) $y_0=y^{(1)}$ and $y_0=y^{(2)}$; (ii) the resulting OR operator

$$y_{\mathrm{OR}}(y^{(1)},y^{(2)}) := y_{\mathrm{AND}}(y^{(1)},y_0) + y_{\mathrm{AND}}(y_0,y^{(2)}) - y_{\mathrm{AND}}(y^{(1)},y^{(2)}) \tag{7}$$

still produces the true regression $y_0(x)$ if either $y_0=y^{(1)}$ or $y_0=y^{(2)}$; and (iii) the AND operator actually never produces the true value $y_0$, but this does not matter because its role is to show the false value and cancel it in (7).
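The mean-squared-error example can be sketched in code. This is one reading of (7), stated as an assumption: since the true regression has the smallest mean squared error, the term with arguments $(y^{(1)}, y_0)$ is taken as equivalent to $y^{(1)}$ and the term with arguments $(y_0, y^{(2)})$ as equivalent to $y^{(2)}$, so the OR operator reduces to "sum of the two candidates minus the worse one". The function names and the toy model are hypothetical.

```python
import numpy as np

def mse(y_fn, x, y):
    """Estimated mean squared error of a candidate regression function."""
    return np.mean((y - y_fn(x)) ** 2)

def y_and(f1, f2, x, y):
    """AND operator: choose the candidate with the larger (worse) estimated mse."""
    return f1 if mse(f1, x, y) >= mse(f2, x, y) else f2

def y_or(f1, f2, x, y):
    """OR operator (7) under the reading above: f1 + f2 - y_AND(f1, f2)."""
    worse = y_and(f1, f2, x, y)
    return lambda t: f1(t) + f2(t) - worse(t)

# Assumed toy model: the true regression is the first candidate.
rng = np.random.default_rng(2)
x = rng.normal(size=5_000)
f1 = lambda t: 1.0 + 2.0 * t    # candidate y^(1), equal to y_0 in this simulation
f2 = lambda t: 1.0 - t          # candidate y^(2), false in this simulation
y = f1(x) + rng.normal(size=x.size)

chosen_and = y_and(f1, f2, x, y)   # the AND operator points to the false candidate
y_or_fn = y_or(f1, f2, x, y)       # the OR operator cancels it
grid = np.linspace(-2.0, 2.0, 5)
```

Evaluating `y_or_fn(grid)` should reproduce `f1(grid)` up to floating-point rounding, while `chosen_and` is the false candidate `f2`, matching point (iii): the AND operator's role is to expose the false value so that it cancels.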

(e) Further comments. The interpretation of why and how one of the two component models is misspecified is important in the discussion between the physician and the statistician. As [3] (p. 923) also note, the construction of a doubly robust estimator allows us to point to which of the two component estimators it disagrees with the most, and interpret this as evidence of which component is misspecified. Although useful, this observation is different from that discussed in (c) above, and is formed after construction and in terms of the doubly robust estimator. Therefore, such observation by itself does not quite answer why and how this evidence arises, as with the more elemental use of the components in the three terms of the OR operator.

The present characterization does not antagonize use of semiparametric methods, which can be very useful for deriving, when possible, AND operators with regularly estimable marginal components. On the other hand, there exist union models with consistent estimators that are irregular [8], but which still obey the three-term structure discussed above.

References

1. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc. 1999;94:1096–120. doi:10.1080/01621459.1999.10473862

2. Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92. doi:10.1093/biomet/63.3.581

3. Robins JM, Rotnitzky A. Comments on “Inference for semiparametric models: some questions and an answer” by Bickel PJ and Jaimyoung K. Stat Sin. 2011;11:920–36.

4. Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55. doi:10.21236/ADA114514

5. Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998;66:315–31. doi:10.2307/2998560

6. Tsiatis A, Davidian M. Comments on “Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data” by Kang JDY and Schafer JL. Stat Sci. 2007;22:569–73. doi:10.1214/07-STS227B

7. Kang JD, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci. 2007;22:523–39.

8. Frangakis C, Rubin D. Rejoinder to discussions on addressing an idiosyncrasy in estimating survival curves using double sampling in the presence of self-selected right censoring. Biometrics. 2001;57:351–3. doi:10.1111/j.0006-341X.2001.00351.x

Received: 2018-06-04
Revised: 2018-10-25
Accepted: 2018-11-09
Published Online: 2019-03-14
Published in Print: 2019-04-26

© 2019 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.