## 1 Introduction

Estimators based on the propensity score (PS), the probability of receiving a treatment given baseline covariates, are popular for estimation of causal effects such as the average treatment effect (ATE), average treatment effect among the treated (ATT), or the average outcome under treatment. Such methods can be thought of as adjusting for the propensity score in place of baseline covariates, and generally require consistent estimation of the propensity score if it is not known. Common propensity score methods include stratification or subclassification [1–3], inverse probability of treatment weighting (IPTW) [4, 5], and propensity score matching [6–8].

A “balancing score” as defined by Rosenbaum and Rubin [8] is a function of baseline covariates such that treatment and baseline covariates are independent conditional on that function. The propensity score is perhaps the most well-known example of a balancing score, but balancing scores are more general. Typically, propensity score-based methods are said to be consistent when the true propensity score is consistently estimated. Methods that adjust for the propensity score nonparametrically, such as matching or stratification by the propensity score, actually only need that the estimated propensity score converge to some balancing score in order for the parameter of interest to be estimated consistently. However, we are not aware of specific claims in the literature that particular propensity score-based methods are consistent under this weaker condition. We say that an estimator using the propensity score or other balancing score has the balancing score property if it is consistent when the estimated propensity score converges to a balancing score.
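Stated formally (the if-and-only-if characterization is Theorem 2 of Rosenbaum and Rubin [8], used again in Appendix A.2):

```latex
% Definition: b(W) is a balancing score when treatment and
% covariates are independent given b(W):
A \;\perp\!\!\!\perp\; W \mid b(W).
% Characterization (Rosenbaum and Rubin, Theorem 2): b(W) is a
% balancing score if and only if there exists a function f with
g(1 \mid W) = f\bigl(b(W)\bigr) \quad \text{almost surely}.
```

In this sense the propensity score is the coarsest balancing score, while the identity $b(W) = W$ is the finest.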

Though not guaranteed in general, it is possible for an estimated propensity score based on a misspecified model to converge to a balancing score that is not equal to the true propensity score. Propensity score-based estimators that have the balancing score property are robust to this sort of misspecification of the propensity score model, while other propensity score-based estimators are not. The balancing score property is desirable because, even though most such estimators were initially developed with the propensity score specifically in mind, they inherit this robustness for free. Estimators with the balancing score property are, however, in general not efficient.

An efficient estimator is one that achieves the minimum asymptotic variance among all regular estimators. In many cases, for example when estimating the ATE, ATT, or average outcome under treatment, doubly robust estimators can be constructed. A doubly robust estimator relies on estimates of both the propensity score and the outcome regression, the conditional mean of the outcome given baseline covariates and treatment, and is consistent if either the estimated propensity score or the estimated outcome regression is consistent. Examples include targeted minimum loss-based estimation (TMLE) [9, 10] and augmented inverse probability of treatment weighted estimation (A-IPTW) [11, 12]. In addition to being doubly robust, both TMLE and A-IPTW are efficient when both the propensity score and outcome regression are consistently estimated.
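As a concrete illustration of the A-IPTW form for the average outcome under treatment, E[E(Y | A = 1, W)], here is a minimal Python sketch (not from the paper; the simulated distribution, the function name `aiptw_mean_y1`, and the use of the true regressions are all illustrative assumptions):

```python
import numpy as np

def aiptw_mean_y1(Y, A, Qbar1, g1):
    """A-IPTW estimate of E[E(Y|A=1,W)]: the plug-in term Qbar1 plus
    an inverse-probability-weighted residual correction."""
    return np.mean(Qbar1 + A / g1 * (Y - Qbar1))

# simulated example with a single confounder W
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-0.5 * W))       # true propensity score P(A=1|W)
A = rng.binomial(1, g1)
pY = 1.0 / (1.0 + np.exp(-(0.3 + 0.4 * W + 0.5 * A)))
Y = rng.binomial(1, pY)

# evaluate with the true outcome regression at A = 1 and the true g
Qbar1 = 1.0 / (1.0 + np.exp(-(0.3 + 0.4 * W + 0.5)))
psi = aiptw_mean_y1(Y, A, Qbar1, g1)
```

When either `Qbar1` or `g1` is replaced by a consistent estimate (and the other may be misspecified), the same formula retains its consistency, which is the double robustness described above.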

In this article, we discuss a general class of estimators that have the balancing score property. We also construct a TMLE [9, 10] with the balancing score property. This new TMLE not only has the benefit of the robustness provided by the balancing score property, it also is a locally efficient, doubly robust plug-in estimator. This means that our new estimator retains all of the attractive properties of a traditional TMLE while gaining robustness that other estimators with the balancing score property enjoy when the propensity score only converges to a balancing score.

In Section 2, we introduce notation and define the statistical parameter we wish to estimate. In Section 3 we describe a TMLE for the statistical parameter. In Section 4 we discuss the balancing score property and describe the proposed new TMLE. In Section 5 we compare the performance of the new estimator to a traditional TMLE as well as other common estimators, and we conclude with a discussion in Section 6. A list of notation used throughout the article is provided in Appendix A.1. Some results and proofs not included in the main text are in Appendix A.2, two modifications to the TMLE algorithm are presented in Appendix A.3, and an example implementation of the proposed new TMLE in R [13] is provided in Appendix A.4.

## 2 Preliminaries

Consider the random variable O = (W, A, Y), where *W* is a real-valued vector, *A* is binary with values in {0, 1}, and *Y* is a univariate real number. Call the probability distribution of *O* P, and assume P lies in a statistical model M. Assume 0 < P(A = 1 | W) < 1 almost surely with respect to the marginal distribution of *W*. This is sometimes called a positivity assumption. Define the parameter mapping Ψ from M to the real line as Ψ(P) = E_P[E_P(Y | A = 1, W)], the average outcome under treatment.

Suppose *W* represents a vector of baseline covariates measured before treatment, and *Y* represents some outcome measured after treatment. Then under additional causal assumptions, Ψ(P) can be interpreted as the mean counterfactual outcome had each observation received treatment 1. The key assumption is that *A* is independent of the counterfactual outcome had each observation received treatment 1, given covariates *W*. This is known as the randomization assumption or the “no unmeasured confounders” assumption, and its validity depends on the particular application. Under the randomization and positivity assumptions, the mean counterfactual outcome under treatment equals E_P[E_P(Y | A = 1, W)] = Ψ(P).

For a probability distribution P, define Q̄(a, w) = E_P(Y | A = a, W = w), let Q_W denote the marginal distribution of *W* under P, and write Q = (Q̄, Q_W). The parameter mapping Ψ depends on P only through Q, so we also write Ψ(Q).

For a distribution P, define the treatment mechanism g(a | w) = P(A = a | W = w), and write ḡ(w) = g(1 | w), the propensity score, a function of *w*. We may put some restrictions on the set of possible functions *g*, for example if treatment assignment is known to depend on only some components of *W*. The model M is otherwise nonparametric.

Let O₁, …, O_n be *n* independent and identically distributed random variables drawn from P₀ ∈ M. We use a subscript *n* to denote an estimate based on a dataset of size *n*, so, for example, Q̄_n denotes an estimate of Q̄₀.

## 3 Targeted minimum loss-based estimation

A plug-in estimator takes an estimate of the distribution P, or of the relevant portion Q, and plugs it into the parameter mapping. Given an estimate Q̄_n of Q̄₀ and using the empirical distribution of *W* to estimate Q_W,0, we can calculate the plug-in estimate as

Ψ(Q_n) = (1/n) Σᵢ₌₁ⁿ Q̄_n(1, Wᵢ),

the average of the predicted outcomes under treatment over the observed values of *W*. Plug-in estimators are desirable because they fully utilize known global constraints of the statistical model, for example respecting known bounds on *Y*.

TMLE is a general framework for constructing a plug-in estimator for a statistical parameter. An initial estimate of the relevant portion of the distribution is updated in a targeted way, and the empirical distribution of *W* along with the updated estimate Q̄*_n is then plugged into the parameter mapping.

An estimator ψ_n of ψ₀ that is asymptotically linear can be written as

ψ_n = ψ₀ + (1/n) Σᵢ₌₁ⁿ IC(Oᵢ) + o_P(n^{−1/2}),

where *IC* is called the influence curve of the estimator, and an estimator is efficient when its influence curve is the efficient influence curve D*. For the parameter Ψ, the efficient influence curve at a distribution with components Q̄ and *g* is

D*(Q̄, g)(O) = (A / g(1 | W)) (Y − Q̄(1, W)) + Q̄(1, W) − Ψ(Q).

Suppose for now *Y* is binary or bounded by 0 and 1. A modification to the algorithm and a different TMLE are described in Appendix A.3 if this is not the case. The initial estimate Q̄⁰_n of Q̄₀ may be obtained with any appropriate regression method.

The updating step is defined by a choice of loss function *L* for Q̄ and a working submodel through the initial estimate. Here *L* is the negative log likelihood,

L_Y(Q̄)(O) = −[Y log Q̄(A, W) + (1 − Y) log(1 − Q̄(A, W))].

When *Y* is binary, this is the usual negative log likelihood; when *Y* is at least bounded by 0 and 1 if not binary, it remains a valid quasi-log-likelihood loss [20]. Its value depends on the data only through *Y*, *A*, and *W*, and its true mean is minimized by Q̄₀.

For a working submodel for Q̄, fluctuate the initial estimate on the logit scale with the so-called clever covariate:

Q̄⁰_n(ε)(A, W) = expit( logit Q̄⁰_n(A, W) + ε A / g_n(1 | W) ),

so that the score of the loss at ε = 0 spans the relevant component of the efficient influence curve. The estimate ε_n minimizes the empirical mean of the loss; it can be computed by fitting an intercept-free logistic regression of *Y* on the clever covariate with logit Q̄⁰_n(A, W) as an offset. The updated estimate is Q̄*_n = Q̄⁰_n(ε_n), and the TMLE uses the empirical distribution of *W* as an initial estimate for Q_W,0, which requires no updating, giving Ψ(Q*_n) = (1/n) Σᵢ Q̄*_n(1, Wᵢ).
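The paper's own implementation (Appendix A.4) is in R; the following Python sketch is an illustrative reimplementation of the fluctuation step under simulated data of our own choosing, solving the score equation for ε by Newton's method instead of calling a regression routine:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def tmle_mean_y1(Y, A, Qbar1, QbarA, g1, n_iter=25):
    """TMLE of E[E(Y|A=1,W)] for Y in [0,1]: fluctuate the initial fit
    on the logit scale along the clever covariate A/g(1|W), solving the
    logistic score equation for epsilon by Newton's method."""
    H = A / g1                                # clever covariate
    eps = 0.0
    for _ in range(n_iter):
        p = expit(logit(QbarA) + eps * H)
        score = np.sum(H * (Y - p))           # derivative of log-lik in eps
        info = np.sum(H ** 2 * p * (1 - p))   # negative second derivative
        eps += score / info
    # updated fit evaluated at A = 1 uses the clever covariate 1/g(1|W)
    return np.mean(expit(logit(Qbar1) + eps / g1))

# simulated example with correctly specified Qbar and g
rng = np.random.default_rng(3)
n = 5000
W = rng.normal(size=n)
g1 = expit(0.4 * W)
A = rng.binomial(1, g1)
Qbar1 = expit(0.2 + 0.6 * W + 0.7)
Qbar0 = expit(0.2 + 0.6 * W)
QbarA = np.where(A == 1, Qbar1, Qbar0)
Y = rng.binomial(1, QbarA)
psi = tmle_mean_y1(Y, A, Qbar1, QbarA, g1)
```

With correctly specified inputs the fitted ε is near zero and the TMLE essentially returns the plug-in estimate; the update matters when the initial fit of Q̄ is misspecified.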

## 4 Balancing score property and proposed estimator

A function *b* of *W* is called a balancing score if *A* and *W* are independent given b(W); that is, within levels of b(W), the distribution of *W* between the treated and untreated observations is equal, or balanced. Rosenbaum and Rubin [8] show that adjusting for a balancing score suffices in place of adjusting for *W*, which we state in Lemma 1 and for which we offer a different proof in Appendix A.2.

*If b is a balancing score under distribution P, then* E_P[E_P(Y | A = 1, b(W))] = E_P[E_P(Y | A = 1, W)] = Ψ(P).

This result gives rise to methods for estimating ψ₀ that adjust for a balancing score, typically an estimated propensity score, in place of the full covariate vector *W*.

In practice, an estimator of the propensity score is used in place of a known balancing score, and consistency of the resulting estimator of ψ₀ requires only that the estimated propensity score converge to some balancing score.

Estimators based only on the propensity score are not doubly robust. We now construct a locally efficient doubly robust estimator with the balancing score property. We start with an initial estimator g_n of the propensity score and an initial estimator Q̄⁰_n of the regression of *Y* on *A* and *W*.

To update the initial estimator, we regress *Y* on *A* and an estimated function b_n of *W*, using the initial fit as an offset. Let *g* and *b* denote the limits of g_n and b_n; the limit *b* is not necessarily a balancing score. We use a logistic working model when *Y* is binary or bounded by 0 and 1, and a linear working model otherwise.

Define θ₀ to be the true regression function of *Y* on *A* and b(W), and let Q̄^{b,θ} denote the working submodel through Q̄ for a particular *b* and θ, a function of *A* and b(W), with associated loss function L′. The estimate θ_n is obtained by regressing *Y* on *A* and b_n(W).

Suppose for now that we have an estimate b_n of some function *b* of *W*, and suppose its limit *b* is a balancing score.

When the limit *b* is a balancing score, the updating step makes the resulting plug-in estimator consistent even if the initial estimator of Q̄₀ is inconsistent; when *b* is a balancing score and the initial estimator is consistent for the regression of *Y* on *A* and *W*, the update merely refines the initial estimator.

We now return to the problem of estimating a function *b* whose limit is a balancing score. One approach is to estimate the propensity score and then stratify the estimated score into *k* categories based on quantiles, adjusting for the resulting step function of the estimated propensity score in place of *W*. When *k* is fixed and does not grow with sample size, stratification is not consistent, though one hopes that the residual bias is small [2]. If *k* is too large, there is a possibility of all observations in a particular stratum having the same value for *A*, in which case the stratum-specific regression is not defined. While stratification alone is not consistent when *k* is fixed, the BSA-TMLE removes this remaining bias, and *k* can be chosen based on cross-validation in such a way that it can grow with sample size.
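To make the stratification idea concrete, here is a small Python sketch (not from the paper; the helper name `stratified_mean_y1` and the simulated distribution are illustrative) of a quantile-stratified estimate of E[E(Y | A = 1, W)]:

```python
import numpy as np

def stratified_mean_y1(Y, A, ps, k=10):
    """Estimate E[E(Y|A=1,W)] by stratifying on quantiles of an
    (estimated) propensity score: within each stratum take the mean of
    Y among the treated, then average strata weighted by stratum size."""
    edges = np.quantile(ps, np.linspace(0.0, 1.0, k + 1))
    strata = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, k - 1)
    est = 0.0
    for s in range(k):
        in_s = strata == s
        treated = in_s & (A == 1)
        if treated.sum() == 0:
            continue  # an all-control stratum has no treated mean
        est += in_s.mean() * Y[treated].mean()
    return est

# simulated example: one confounder, true E[E(Y|A=1,W)] = 1
rng = np.random.default_rng(1)
n = 4000
W = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-W))
A = rng.binomial(1, g1)
Y = 0.5 * W + A + rng.normal(size=n)
psi = stratified_mean_y1(Y, A, g1, k=10)
```

With fixed k the estimate carries a small residual within-stratum confounding bias, which is exactly the bias the text says the BSA-TMLE update removes.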

Alternatively, when a smooth adjustment is preferred, the regression on the estimated propensity score can be fit with a smooth method such as a generalized additive model [21], as in the implementation in Appendix A.4.

## 5 Simulations

We demonstrate properties of the proposed BSA-TMLE in various scenarios, and compare it to other estimators. The estimators compared in simulations include a simple plug-in estimator based on just the initial estimator of Q̄₀, BSA and DR-BSA plug-in estimators that adjust for the estimated balancing score, IPTW, a traditional TMLE, and the proposed BSA-TMLE; Table 1 summarizes their properties.

The simple plug-in estimator not adjusted for a balancing score is calculated as the empirical mean of Q̄⁰_n(1, W) over the observed *W*. The non-DR BSA plug-in estimator can be thought of as adjusting only for the estimated balancing score rather than for all of *W*. The IPTW estimator is calculated as (1/n) Σᵢ Aᵢ Yᵢ / g_n(1 | Wᵢ).
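A minimal Python sketch of these two simplest comparators, the simple plug-in and IPTW, under an illustrative simulated distribution of our own choosing (not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
W = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-W))      # here the true propensity score
A = rng.binomial(1, g1)
Y = W + A + rng.normal(size=n)     # so E[E(Y | A=1, W)] = E[W] + 1 = 1

# IPTW estimate of E[E(Y | A=1, W)]: weight treated outcomes by 1/g
psi_iptw = np.mean(A * Y / g1)

# simple plug-in with a correctly specified outcome regression at A = 1
psi_plugin = np.mean(W + 1.0)
```

Both are consistent here, but the IPTW estimate is noticeably more variable because of the inverse weights, mirroring the variance patterns in Tables 2 through 5.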

Summary of properties of compared estimators

| Estimator | Plug-in | Consistent if | Efficient if |
| --- | --- | --- | --- |
| Simple plug-in | | | |
| BSA | | | |
| DR-BSA | | | |
| IPTW | | | |
| TMLE | | | |
| BSA-TMLE | | | |

In the simulation studies, we use two methods for adjusting the initial estimator with the propensity score, labeled NN and GAM in the tables. All simulations were conducted in R [13].

The initial estimates of Q̄₀ were fit using linear regression when *Y* is continuous, and logistic regression when *Y* is binary. To investigate robustness to various kinds of model misspecification, models are either correctly specified, or some relevant covariates are excluded.

The data generating distribution in the simulations was as follows. Baseline covariates *W* are generated first, and *A* is Bernoulli with mean depending on *W*. *Y* is either Bernoulli or normal with variance 1 and mean m(A, W), where the link for *m* is the logistic function if *Y* is Bernoulli, or the identity if *Y* is normal. All estimators were evaluated on 1,000 datasets of size n = 100 and of size n = 1,000.

In the first scenario, which we call distribution one,

The first set of results in Table 2 demonstrates the balancing score property. The initial estimate of Q̄₀ is misspecified, and the propensity score model is also misspecified, but the estimated propensity score converges to a balancing score.

Simulation results for distribution one with

| Estimator | Bias (n = 100) | Variance (n = 100) | MSE (n = 100) | Bias (n = 1,000) | Variance (n = 1,000) | MSE (n = 1,000) |
| --- | --- | --- | --- | --- | --- | --- |
| BSA, NN | 0.0276 | 0.0180 | 0.0188 | 0.0026 | 0.0018 | 0.0018 |
| BSA, GAM | 0.0075 | 0.0163 | 0.0163 | 0.0041 | 0.0015 | 0.0015 |
| IPTW | –0.0249 | 0.0087 | 0.0093 | –0.0246 | 0.0010 | 0.0016 |
| TMLE | 0.1063 | 0.0111 | 0.0224 | 0.1082 | 0.0010 | 0.0127 |
| BSA-TMLE, NN | 0.0276 | 0.0180 | 0.0188 | 0.0026 | 0.0018 | 0.0018 |
| BSA-TMLE, GAM | 0.0070 | 0.0164 | 0.0165 | 0.0037 | 0.0015 | 0.0015 |

Table 3 shows similar performance in a more realistic scenario. In this setting, the initial estimator for Q̄₀ is misspecified, and the bias of the balancing score-adjusted estimators decreases with *n* because the estimated propensity score converges to a balancing score.

Simulation results for distribution one with

| Estimator | Bias (n = 100) | Variance (n = 100) | MSE (n = 100) | Bias (n = 1,000) | Variance (n = 1,000) | MSE (n = 1,000) |
| --- | --- | --- | --- | --- | --- | --- |
| BSA, NN | 0.0311 | 0.0166 | 0.0176 | 0.0027 | 0.0016 | 0.0016 |
| BSA, GAM | 0.0147 | 0.0159 | 0.0161 | 0.0033 | 0.0014 | 0.0014 |
| IPTW | 0.0390 | 0.0410 | 0.0425 | 0.0357 | 0.0025 | 0.0037 |
| TMLE | 0.0096 | 0.0172 | 0.0173 | 0.0098 | 0.0016 | 0.0017 |
| BSA-TMLE, NN | 0.0311 | 0.0166 | 0.0176 | 0.0027 | 0.0016 | 0.0016 |
| BSA-TMLE, GAM | 0.0101 | 0.0189 | 0.0190 | –0.0042 | 0.0015 | 0.0016 |

Table 4 examines the performance of estimators when the model for the propensity score is misspecified in such a way that the estimated propensity score does not converge to a balancing score, while the initial estimator of Q̄₀ is consistent.

Simulation results for distribution one with

| Estimator | Bias (n = 100) | Variance (n = 100) | MSE (n = 100) | Bias (n = 1,000) | Variance (n = 1,000) | MSE (n = 1,000) |
| --- | --- | --- | --- | --- | --- | --- |
| Simple plug-in | 0.0071 | 0.0120 | 0.0120 | 0.0011 | 0.0013 | 0.0013 |
| BSA, NN | 0.1190 | 0.0126 | 0.0268 | 0.1064 | 0.0014 | 0.0128 |
| DR-BSA, NN | 0.0064 | 0.0139 | 0.0140 | 0.0003 | 0.0015 | 0.0015 |
| BSA, GAM | 0.1139 | 0.0116 | 0.0246 | 0.1096 | 0.0012 | 0.0133 |
| DR-BSA, GAM | 0.0152 | 0.0129 | 0.0132 | 0.0015 | 0.0013 | 0.0013 |
| IPTW | 0.1061 | 0.0115 | 0.0228 | 0.1035 | 0.0012 | 0.0119 |
| TMLE | 0.0076 | 0.0129 | 0.0130 | 0.0009 | 0.0013 | 0.0013 |
| BSA-TMLE, NN | 0.0064 | 0.0139 | 0.0140 | 0.0003 | 0.0015 | 0.0015 |
| BSA-TMLE, GAM | 0.0154 | 0.0133 | 0.0136 | 0.0014 | 0.0013 | 0.0013 |

In a second scenario, called distribution two, *Y* is conditionally normal. Some covariates affect *Y* while *A* does not depend on them, so they are not confounders. Additionally, *A* depends on covariates on which *Y* does not, so these covariates add variability to the propensity score without confounding the effect of interest.

Table 5 shows results from distribution two, where the initial estimate for Q̄₀ is correctly specified and the propensity score is estimated using all covariates, including those predictive only of *A*.

Simulation results from distribution two with

| Estimator | Bias (n = 100) | Variance (n = 100) | MSE (n = 100) | Bias (n = 1,000) | Variance (n = 1,000) | MSE (n = 1,000) |
| --- | --- | --- | --- | --- | --- | --- |
| Simple plug-in | –0.0112 | 0.0505 | 0.0506 | 0.0007 | 0.0048 | 0.0048 |
| BSA, NN | 0.0080 | 0.1815 | 0.1815 | 0.0020 | 0.0185 | 0.0185 |
| DR-BSA, NN | –0.0108 | 0.0578 | 0.0579 | 0.0024 | 0.0059 | 0.0060 |
| BSA, GAM | –0.0061 | 0.3207 | 0.3208 | –0.0008 | 0.0097 | 0.0097 |
| DR-BSA, GAM | –0.0112 | 0.0565 | 0.0566 | 0.0010 | 0.0051 | 0.0051 |
| IPTW | –0.0072 | 0.7559 | 0.7560 | –0.0021 | 0.0231 | 0.0231 |
| TMLE | –0.0182 | 0.0575 | 0.0578 | 0.0009 | 0.0052 | 0.0052 |
| BSA-TMLE, NN | –0.0108 | 0.0578 | 0.0579 | 0.0024 | 0.0059 | 0.0060 |
| BSA-TMLE, GAM | –0.0181 | 0.0587 | 0.0590 | 0.0009 | 0.0053 | 0.0053 |

## 6 Discussion

In this paper, we discuss the balancing score property of estimators that nonparametrically adjust for the propensity score. We see in simulations that, even when the propensity score estimator is not consistent, estimators with the balancing score property remain approximately unbiased so long as the estimated propensity score converges to a balancing score.

In order for an estimator to have the balancing score property, we need to estimate some balancing score. We acknowledge that in practice, one does not expect an estimate of the propensity score to converge exactly to a balancing score that is not the true propensity score; the property should instead be understood as robustness to misspecifications of the propensity score model that still approximately balance the covariates.

We now discuss some possible generalizations to the work in this paper and areas for further research. The estimators presented in this paper are for the statistical parameter Ψ(P) = E[E(Y | A = 1, W)], which can also be used for estimating the mean of *Y* when *Y* is subject to missingness [24]. The results and similar estimators are immediately applicable to other interesting statistical parameters such as the ATE and ATT.

Propensity score-based methods are most often applied in settings where the treatment variable is binary. For non-binary treatments, Imai and Van Dyk [27] generalize the notion of the propensity score to the propensity function, the conditional probability of observed treatment given covariates, and show that the propensity function is a balancing score. When the propensity function can be characterized by a finite dimensional parameter, one can estimate parameters of the distribution of counterfactuals by adjusting for the finite dimensional characterization of the propensity function in place of all covariates. Using this approach, the methods in this paper may be extended to develop estimators that are doubly robust and efficient with the balancing score property in more general situations where treatment is categorical or potentially even continuous.

Traditionally, propensity score-based estimators estimate the propensity score based on how well it predicts treatment, rather than on how useful the resulting adjustment is for estimating the parameter of interest; collaborative targeted learning [28] is one approach to the latter, and combining it with the balancing score property is an area for further research.

Mark van der Laan was supported by NIH grant 5R01AI074345-05. We thank the associate editor and anonymous referees for their insightful comments, which we believe helped greatly improve the quality of the paper.

## A.1 Notation

- $O = (W, A, Y)$: observed data structure
- $W$: vector of covariates
- $A$: treatment indicator, 0 or 1
- $Y$: univariate outcome
- $P$: a distribution of $O$
- $M$: statistical model, set of possible probability distributions $P$
- $E_P(\cdot)$: expectation under distribution $P$
- $Q = (\bar{Q}, Q_W)$
- $\bar{Q}(a, w) = E_P(Y \mid A = a, W = w)$
- $Q_W(w) = P(W = w)$
- $g(a \mid w) = P(A = a \mid W = w)$
- $\bar{g}(w) = g(1 \mid w)$, also called the propensity score
- $\Psi$: statistical parameter mapping from $M$ to $\mathbb{R}$. In particular, $\Psi(P) = E_P[E_P(Y \mid A = 1, W)]$; also written as $\Psi(Q)$
- $\psi = \Psi(P)$
- Subscript 0: indicates the truth, e.g. $\psi_0 = \Psi(P_0)$ is the true parameter value
- Subscript $n$: indicates an estimate based on $n$ observations, e.g. $\bar{Q}_n$ is an estimate of $\bar{Q}_0$
- $\bar{Q}_n^0$: an initial estimate of $\bar{Q}_0$
- $L$: loss function
- $L_Y$: loss function for $\bar{Q}$
- $L_W$: loss function for $Q_W$
- $Q(\epsilon)$: a working submodel through $Q$
- $IC$: an influence curve
- $D^*$: the efficient influence curve
- $\bar{Q}_n^*$: a TMLE updated estimate of some initial $\bar{Q}_n^0$
- $b(w)$: some function of $w$ that is a potential balancing score
- $\theta$: some function of $a$ and $b(w)$
- $\bar{Q}^{b,\theta}$: a working submodel through $\bar{Q}$ for a particular $b$ and $\theta$
- $L'$: a loss function for $\bar{Q}^{b,\theta}$, used in Section 4

## A.2 Some results and proofs

*Proof of Lemma 1*. In this proof, *E* means expectation with respect to *P*. First note that *b* is a function of only *W*. Next,

$E[E(Y \mid A = 1, b(W))] = E\{E[E(Y \mid A = 1, W) \mid A = 1, b(W)]\}$

by the tower property. The inner expectation $E(Y \mid A = 1, W)$ is a function of *W* only, and *W* is independent of *A* given $b(W)$ because *b* is a balancing score. Thus,

$E\{E[E(Y \mid A = 1, W) \mid A = 1, b(W)]\} = E\{E[E(Y \mid A = 1, W) \mid b(W)]\} = E[E(Y \mid A = 1, W)].$

*Assume*

In addition, assume that either

*Proof*. By definition of

*h*of

*A*and

*b*is a balancing score if and only if there exists a function

*f*so that

*P*, and notation

*O*and distribution

*P*. Since

*b*is a balancing score.

Consider now the case that

*Assume*

In addition, assume that *b* is a balancing score, or

*Proof*. Firstly, assume *b* is a balancing score, so by Theorem 2 of Rosenbaum and Rubin [8] there exists a mapping *f* so that

This now proves that the limit *b* is a balancing score.

Consider now the case that

*If**takes only discrete values with support G, then**is a TMLE if**is estimated as**using MLE in a saturated parametric model*

*I*is the indicator function.

*Proof of Lemma 2*. The MLE (a quasi-MLE if *Y* is not binary) solves the score equations for each parameter

*h*of

*A*and

*Define**and**. Assume**falls in a**-Donsker class with probability tending to 1*; *in probability as*

Then

*Proof*. Since

## A.3 TMLE when *Y* is not bounded by 0 and 1

If *Y* is not bounded by 0 and 1, but we can assume *Y* is bounded by *l* and *u* with l < u, then *Y* can be transformed to Y* = (Y − l)/(u − l), which is bounded by 0 and 1. If *l* and *u* are not known, they can be set to the minimum and maximum of the observed *Y* as described in [20].
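A minimal Python sketch of this linear bounding transformation and its inverse (function names are illustrative; the fallback to the observed minimum and maximum follows [20]):

```python
import numpy as np

def to_unit(y, l=None, u=None):
    """Map y linearly into [0, 1]; if l, u are unknown, fall back to
    the observed minimum and maximum as suggested in [20]."""
    l = float(np.min(y)) if l is None else l
    u = float(np.max(y)) if u is None else u
    return (y - l) / (u - l), l, u

def from_unit(y01, l, u):
    """Invert the linear map to return estimates to the original scale."""
    return y01 * (u - l) + l

y = np.array([2.0, 5.0, 11.0])
y01, l, u = to_unit(y)   # y01 lies in [0, 1]
```

The TMLE is run on the transformed outcome, and the resulting estimate is mapped back with `from_unit`.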

For completeness we can define an alternative TMLE using a linear working model, where the clever covariate enters additively and the loss function is the squared error loss.

Asymptotically, a TMLE using a linear working model (or linear fluctuation) is equivalent to a TMLE with a logistic working model, but in practice it can perform poorly because the linear fluctuation can produce predicted values outside the known bounds of *Y*. If *Y* is known to be bounded by some *l* and *u*, the logistic working model is recommended because it respects those bounds [20].

## A.4 Example implementation of a BSA-TMLE estimator in R

```r
bsatmle <- function(QnA1, QnA0, gn1, A, Y, family = "binomial") {
  # computes estimates of E(E(Y|A=1, W)) (called ey1 in the
  # output), E(E(Y|A=0, W)) (called ey0), and
  # E(E(Y|A=1, W)) - E(E(Y|A=0, W)) (called ate)
  #
  # Inputs:
  #   QnA1, QnA0: vectors, initial estimates of \bar{Q}_n(1, W)
  #     and \bar{Q}_n(0, W)
  #   gn1: vector, estimates of g_n(1|W)
  #   A: vector, indicator of treatment
  #   Y: vector, outcome
  #   family: "binomial" for logistic fluctuation, "gaussian"
  #     for linear fluctuation.
  #     if "binomial", Y should be binary or bounded by 0 and 1
  if (!require(mgcv)) stop("mgcv package is required")
  if (family == "binomial") {
    # use quasibinomial to suppress error messages about
    # non-integer Y
    family <- "quasibinomial"
    link <- qlogis
  } else {
    link <- identity
  }
  QnAA <- ifelse(A == 1, QnA1, QnA0)
  # Use a generalized additive model to estimate theta_0
  # using the initial estimate of \bar{Q} as an offset
  gamfit <- gam(Y ~ factor(A) + s(gn1, by = factor(A)) + offset(off),
                family, data = data.frame(A = A, gn1 = gn1, off = link(QnAA)))
  # Get predictions from gam fit
  QnA1.gam <- predict(gamfit, type = "response",
                      newdata = data.frame(A = 1, gn1 = gn1, off = link(QnA1)))
  QnA0.gam <- predict(gamfit, type = "response",
                      newdata = data.frame(A = 0, gn1 = gn1, off = link(QnA0)))
  QnAA.gam <- ifelse(A == 1, QnA1.gam, QnA0.gam)
  # compute the clever covariate a/g_n(1|W) - (1-a)/(1-g_n(1|W))
  hA1 <- 1 / gn1
  hA0 <- -1 / (1 - gn1)
  hAA <- ifelse(A == 1, hA1, hA0)
  # using glm, fluctuate the gam-updated initial fit of \bar{Q}
  glmfit <- glm(Y ~ -1 + h + offset(off), family,
                data = data.frame(h = hAA, off = link(QnAA.gam)))
  QnA1.star <- predict(glmfit, type = "response",
                       newdata = data.frame(h = hA1, off = link(QnA1.gam)))
  QnA0.star <- predict(glmfit, type = "response",
                       newdata = data.frame(h = hA0, off = link(QnA0.gam)))
  # compute the final estimates
  ey1 <- mean(QnA1.star)
  ey0 <- mean(QnA0.star)
  ate <- ey1 - ey0
  list(ey1 = ey1, ey0 = ey0, ate = ate)
}
```

## References

- 1. Austin PC. The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies. *Stat Med* 2010;29:2137–48.
- 2. Lunceford J, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. *Stat Med* 2004;23:2937–60.
- 3. Rosenbaum P, Rubin D. Reducing bias in observational studies using subclassification on the propensity score. *J Am Stat Assoc* 1984;79:516–24.
- 4. Robins J, Hernán M, Brumback B. Marginal structural models and causal inference in epidemiology. *Epidemiology* 2000;11:550–60.
- 6. Caliendo M, Kopeinig S. Some practical guidance for the implementation of propensity score matching. *J Econ Surv* 2008;22:31–72.
- 7. Dehejia R, Wahba S. Propensity score-matching methods for nonexperimental causal studies. *Rev Econ Stat* 2002;84:151–61.
- 8. Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. *Biometrika* 1983;70:41–55.
- 9. van der Laan MJ, Rose S. *Targeted learning: causal inference for observational and experimental data*. New York: Springer, 2011.
- 10. van der Laan MJ, Rubin D. Targeted maximum likelihood learning. *Int J Biostat* 2006;2. Available at: http://www.degruyter.com/view/j/ijb.2006.2.1/ijb.2006.2.1.1043/ijb.2006.2.1.1043.xml
- 11. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. *J Am Stat Assoc* 1994;89:846–66.
- 12. van der Laan MJ, Robins JM. *Unified methods for censored longitudinal data and causality*. New York: Springer, 2003.
- 13. R Core Team. *R: a language and environment for statistical computing*. Vienna, Austria: R Foundation for Statistical Computing, 2012. Available at: http://www.R-project.org/, ISBN 3-900051-07-0.
- 14. Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. *Econometrica* 1998;66:315–31.
- 15. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. *Efficient and adaptive estimation for semiparametric models*. Baltimore, MD: The Johns Hopkins University Press, 1993.
- 16. McCullagh P, Nelder J. *Generalized linear models*, Vol. 37. Boca Raton, FL: Chapman & Hall/CRC, 1989.
- 17. van der Laan MJ, Polley EC, Hubbard AE. Super learner. *Stat Appl Genet Mol Biol* 2007;6. Available at: http://www.degruyter.com/view/j/sagmb.2007.6.1/sagmb.2007.6.1.1309/sagmb.2007.6.1.1309.xml
- 18. van der Laan MJ. Targeted maximum likelihood based causal inference: part I. *Int J Biostat* 2010;6. Available at: http://www.degruyter.com/view/j/ijb.2010.6.2/ijb.2010.6.2.1211/ijb.2010.6.2.1211.xml
- 19. van der Laan MJ. Targeted maximum likelihood based causal inference: part II. *Int J Biostat* 2010;6. Available at: http://www.degruyter.com/dg/viewarticle/j$002fijb.2010.6.2$002fijb.2010.6.2.1241$002fijb.2010.6.2.1241.xml
- 20. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. *Int J Biostat* 2010;6. Available at: http://www.degruyter.com/view/j/ijb.2010.6.1/ijb.2010.6.1.1260/ijb.2010.6.1.1260.xml
- 21. Wood SN. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. *J R Stat Soc Ser B (Stat Methodol)* 2011;73:3–36.
- 22. Abadie A, Imbens G. Bias-corrected matching estimators for average treatment effects. *J Bus Econ Stat* 2011;29:1–11.
- 23. Sekhon JS. Multivariate and propensity score matching software with automated balance optimization: the matching package for R. *J Stat Softw* 2011;42:1–52. Available at: http://www.jstatsoft.org/v42/i07/
- 24. Kang J, Schafer J. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. *Stat Sci* 2007;22:523–39.
- 25. Robins JM. Marginal structural models. In *Proceedings of the American Statistical Association, Section on Bayesian Statistical Science*, 1–10, 1997.
- 26. Rosenblum M, van der Laan MJ. Targeted maximum likelihood estimation of the parameter of a marginal structural model. *Int J Biostat* 2010;6:Article 19.
- 27. Imai K, Van Dyk DA. Causal inference with general treatment regimes. *J Am Stat Assoc* 2004;99:854–66.
- 28. van der Laan MJ, Gruber S. Collaborative double robust targeted maximum likelihood estimation. *Int J Biostat* 2010;6. Available at: http://www.degruyter.com/view/j/ijb.2010.6.1/ijb.2010.6.1.1181/ijb.2010.6.1.1181.xml
- 29. Zheng W, van der Laan MJ. Asymptotic theory for cross-validated targeted maximum likelihood estimation. Working Paper 273, U.C. Berkeley Division of Biostatistics Working Paper Series, 2010. Available at: http://www.bepress.com/ucbbiostat/paper273/
- 30. Zheng W, van der Laan MJ. Targeted maximum likelihood estimation of natural direct effects. *Int J Biostat* 2012;8. Available at: http://www.degruyter.com/view/j/ijb.2012.8.issue-1/1557-4679.1361/1557-4679.1361.xml
- 31. van der Vaart AW. *Asymptotic statistics*. Cambridge and New York: Cambridge University Press, 1998.
- 32. van der Vaart AW, Wellner JA. *Weak convergence and empirical processes*. New York: Springer, 1996.