BY 4.0 license Open Access Published by De Gruyter July 30, 2021

Nonparametric inference for interventional effects with multiple mediators

David Benkeser and Jialu Ran

Abstract

Understanding the pathways whereby an intervention has an effect on an outcome is a common scientific goal. A rich body of literature provides various decompositions of the total intervention effect into pathway-specific effects. Interventional direct and indirect effects provide one such decomposition. Existing estimators of these effects are based on parametric models with confidence interval estimation facilitated via the nonparametric bootstrap. We provide theory that allows for more flexible, possibly machine learning-based, estimation techniques to be considered. In particular, we establish weak convergence results that facilitate the construction of closed-form confidence intervals and hypothesis tests and prove multiple robustness properties of the proposed estimators. Simulations show that inference based on large-sample theory has adequate small-sample performance. Our work thus provides a means of leveraging modern statistical learning techniques in estimation of interventional mediation effects.

MSC 2010: 62G05; 62G08; 62G20

1 Introduction

Recent advances in causal inference have provided rich frameworks for posing interesting scientific questions pertaining to the mediation of effects through specific biologic pathways (Yuan and MacKinnon [1], Imai et al. [2], Valeri and VanderWeele [3], Pearl [4], Naimi et al. [5], Zheng and van der Laan [6], VanderWeele and Tchetgen Tchetgen [7], among others). Foremost among these advances is the provision of model-free definitions of mediation parameters, which enables researchers to develop robust estimators of these quantities. A debate in this literature has emerged pertaining to the reliance of methodology on cross-world independence assumptions that are fundamentally untestable even in randomized controlled experiments [8,9,10]. One approach to this problem is to utilize methods that attempt to estimate bounds on effects (Robins and Richardson [10], Tchetgen Tchetgen and Phiri [11], among others). A second approach considers seeking alternative definitions of mediation parameters that do not require such cross-world assumptions (VanderWeele et al. [12], Rudolph et al. [13], among others). Rather than considering deterministic interventions on mediators (i.e., a hypothetical intervention that fixes every individual mediator to a particular value), these approaches consider stochastic interventions on mediators (i.e., hypothetical interventions where the mediator is drawn from a particular conditional distribution). In this class of approaches, that of Vansteelandt and Daniel [14] is particularly appealing. Building on the prior work of VanderWeele et al. [12], the authors provide a simple decomposition of the total effect into direct effects and pathway-specific effects via multiple mediators. Interestingly, their decompositions hold even when the structural dependence between mediators is unknown.

Vansteelandt and Daniel [14] described two approaches to estimation of the effects using parametric working models for relevant nuisance parameters. In both cases, the nonparametric bootstrap was recommended for inference. A potential limitation of the proposal is that correctly specifying a parametric working model may be difficult in many settings. In these instances, we may rely on flexible estimators of nuisance parameters, for example, based on machine learning. When such techniques are employed, the nonparametric bootstrap does not generally guarantee valid inference [15]. This fact motivates the present work, where we develop nonparametric efficiency theory for the interventional mediation effect parameters. This theory allows us to utilize frameworks for nonparametric efficient inference to develop estimators of the quantities of interest. We propose a one-step and a targeted minimum loss-based estimator and demonstrate that under suitable regularity conditions, both estimators are nonparametric efficient among the class of regular asymptotically linear estimators. The estimators also enjoy a multiple robustness property, which ensures consistency of effect estimates if at least some combinations of nuisance parameters are consistently estimated. Another benefit enjoyed by our estimators is the availability of closed-form confidence intervals and hypothesis tests.

2 Interventional effects

Adopting the notation of Vansteelandt and Daniel [14], suppose the observed data are represented as $n$ independent copies of the random variable $O = (C, A, M_1, M_2, Y) \sim P$, where $C \in \mathcal{C}$ is a vector of confounders, $A \in \{a, a^\star\}$ is a binary intervention, $M_1 \in \mathcal{M}_1$ and $M_2 \in \mathcal{M}_2$ are mediators, and $Y \in \mathcal{Y}$ is a relevant outcome. Our developments pertain to both discrete and real-valued mediators, while without loss of generality, we assume $\mathcal{Y} = (0, 1)$. We assume $\mathrm{pr}_P\{0 < \mathrm{pr}_P(A = a \mid C) < 1\} = 1$; that is, any subgroup defined by covariates $C$ that is observed with positive probability should have some chance of receiving both interventions. We also assume that for $a_0 = a, a^\star$, the probability distribution of $(M_1, M_2)$ given $A = a_0, C$ has density $q_{a_0, M_1, M_2}(m_1, m_2 \mid c)$ with respect to some dominating measure and that this density satisfies $\mathrm{pr}_P\{\inf_{m_1, m_2} q_{a_0, M_1, M_2}(m_1, m_2 \mid C) > 0\} = 1$, where the infimum is taken over $\mathcal{M}_1 \times \mathcal{M}_2$. Similarly, we assume that $\sup_{c, m_1, m_2} q_{a_0, M_1, M_2}(m_1, m_2 \mid c) < \infty$. Beyond these conditions, the statistical model $\mathcal{P}$ encodes no assumptions about $P$; however, the efficiency theory that we develop still holds under a model that makes assumptions about $\mathrm{pr}_P(A \mid C)$, including the possibility that this quantity is known exactly, as in a stratified randomized trial.

To define interventional mediation effects, notation for counterfactual random variables is required. For $a_0 \in \{a, a^\star\}$ and $j = 1, 2$, let $M_j(a_0)$ denote the counterfactual value for the $j$th mediator when $A$ is set to $a_0$. Similarly, let $Y(a_0, m_1, m_2)$ denote the counterfactual outcome under an intervention that sets $A = a_0$, $M_1 = m_1$, and $M_2 = m_2$. As a point of notation, when introducing quantities whose definition depends on particular components of the random variable $O$, we will use lower case letters to denote the particular value and assume that the definition at hand applies for all values in the support of that random variable.

The total effect of intervening to set $A = a$ versus $A = a^\star$ is $\psi = \mathbb{E}\{Y(a, M_1(a), M_2(a))\} - \mathbb{E}\{Y(a^\star, M_1(a^\star), M_2(a^\star))\}$, where we use $\mathbb{E}$ to emphasize that we are taking an expectation with respect to a distribution of a counterfactual random variable. The total effect describes the difference in counterfactual outcome considering an intervention where we set $A = a$ and allow the mediators to naturally assume the value that they would under intervention $A = a$ versus an intervention where we set $A = a^\star$ and allow the mediators to vary accordingly. To contrast with forthcoming effects, it is useful to write the total effect in integral form. Specifically, we use $\bar{Q}_{a_0}(m_1, m_2, c)$ to denote the covariate-conditional mean of the counterfactual outcome $Y(a_0, m_1, m_2)$, $Q_{M_1(a_0), M_2(a_0)}(\cdot \mid c)$ to denote the covariate-conditional bivariate cumulative distribution function of $(M_1(a_0), M_2(a_0))$, and $Q_C$ to denote the marginal distribution of $C$. The total effect can be written as

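$$
\psi = \int_{\mathcal{C}} \left\{ \int_{\mathcal{M}_1 \times \mathcal{M}_2} \bar{Q}_a(m_1, m_2, c) \, dQ_{M_1(a), M_2(a)}(m_1, m_2 \mid c) - \int_{\mathcal{M}_1 \times \mathcal{M}_2} \bar{Q}_{a^\star}(m_1, m_2, c) \, dQ_{M_1(a^\star), M_2(a^\star)}(m_1, m_2 \mid c) \right\} dQ_C(c).
$$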

The total effect can be decomposed into interventional direct and indirect effects. The interventional direct effect is the difference in average counterfactual outcome under two population-level interventions. The first intervention sets $A = a$, and subsequently for individuals with $C = c$ draws mediators from $Q_{M_1(a^\star), M_2(a^\star)}(\cdot \mid c)$. Thus, on a population level the covariate conditional distribution of mediators in this counterfactual world is the same as it would be in a population where everyone received intervention $A = a^\star$. This is an example of a stochastic intervention [16]. The second intervention sets $A = a^\star$ and subsequently allows the mediators to naturally assume the value that they would under intervention $A = a^\star$, so that the population level mediator distribution is again $Q_{M_1(a^\star), M_2(a^\star)}(\cdot \mid c)$. The interventional direct effect compares the average outcome under these two interventions,

$$
\psi_A = \int_{\mathcal{C}} \int_{\mathcal{M}_1 \times \mathcal{M}_2} \{ \bar{Q}_a(m_1, m_2, c) - \bar{Q}_{a^\star}(m_1, m_2, c) \} \, dQ_{M_1(a^\star), M_2(a^\star)}(m_1, m_2 \mid c) \, dQ_C(c).
$$

For interventional indirect effects, we require definitions for the covariate-conditional distribution of each mediator, which we denote for $j = 1, 2$ by $Q_{M_j(a_0)}(\cdot \mid c)$. The interventional indirect effect through $M_1$ is

$$
\psi_{M_1} = \int_{\mathcal{C}} \int_{\mathcal{M}_2} \int_{\mathcal{M}_1} \bar{Q}_a(m_1, m_2, c) \, \{ dQ_{M_1(a)}(m_1 \mid c) - dQ_{M_1(a^\star)}(m_1 \mid c) \} \, dQ_{M_2(a^\star)}(m_2 \mid c) \, dQ_C(c).
$$

As with the direct effect, this effect considers two interventions. Both interventions set $A = a$. The first intervention draws mediator values independently from the marginal mediator distributions $Q_{M_1(a)}(\cdot \mid c)$ and $Q_{M_2(a^\star)}(\cdot \mid c)$, while the second intervention draws mediator values independently from the marginal mediator distributions $Q_{M_1(a^\star)}(\cdot \mid c)$ and $Q_{M_2(a^\star)}(\cdot \mid c)$. The effect thus describes the average impact of shifting the population level distribution of $M_1$, while holding the population level distribution of $M_2$ fixed. The interventional indirect effect on the outcome through $M_2$ is similarly defined as

$$
\psi_{M_2} = \int_{\mathcal{C}} \int_{\mathcal{M}_1} \int_{\mathcal{M}_2} \bar{Q}_a(m_1, m_2, c) \, dQ_{M_1(a)}(m_1 \mid c) \, \{ dQ_{M_2(a)}(m_2 \mid c) - dQ_{M_2(a^\star)}(m_2 \mid c) \} \, dQ_C(c).
$$

Note that when defining interventional indirect effects, mediators are drawn independently from marginal mediator distributions. The final effect in the decomposition essentially describes the impact of drawing the mediators from marginal rather than joint distributions. Thus, we term this effect the covariant mediator effect, defined as

$$
\psi_{M_1, M_2} = \int_{\mathcal{C}} \int_{\mathcal{M}_1 \times \mathcal{M}_2} \bar{Q}_a(m_1, m_2, c) \, \big[ dQ_{M_1(a), M_2(a)}(m_1, m_2 \mid c) - dQ_{M_1(a) \times M_2(a)}(m_1, m_2 \mid c) - \{ dQ_{M_1(a^\star), M_2(a^\star)}(m_1, m_2 \mid c) - dQ_{M_1(a^\star) \times M_2(a^\star)}(m_1, m_2 \mid c) \} \big] \, dQ_C(c),
$$

where $dQ_{M_1(a_0) \times M_2(a_0)}(m_1, m_2 \mid c) = dQ_{M_1(a_0)}(m_1 \mid c) \, dQ_{M_2(a_0)}(m_2 \mid c)$. Vansteelandt and Daniel [14] discussed situations where these effects are of primary interest.

From the aforementioned definitions, we have the following effect decomposition: $\psi = \psi_A + \psi_{M_1} + \psi_{M_2} + \psi_{M_1, M_2}$. These component effects can be identified using the observed data under the following assumptions:

  (i) the effect of $A$ on $Y$ is unconfounded given $C$, $Y(a, M_1(a), M_2(a)) \perp A \mid C$;

  (ii) the effect of $M_1$ and $M_2$ on $Y$ is unconfounded given $A$ and $C$, $Y(a_0, M_1(a_0), M_2(a_0)) \perp M_1, M_2 \mid A = a_0, C$;

  (iii) the effect of $A$ on $M_1, M_2$ is unconfounded given $C$, $M_1(a_0), M_2(a_0) \perp A \mid C$.

Under these assumptions, the counterfactual mean $\bar{Q}_{a_0}(m_1, m_2, c)$ is identified by $\bar{Q}_{a_0}(m_1, m_2, c) = E_P(Y \mid A = a_0, M_1 = m_1, M_2 = m_2, C = c)$, commonly referred to as the outcome regression as it may generally be estimated using mean regression of the outcome $Y$ onto treatment, mediators, and confounders. The cumulative distribution of $(M_1(a_0), M_2(a_0))$ given $C = c$ is identified by $Q_{a_0, M_1, M_2}(m_1, m_2 \mid c) = \mathrm{pr}_P(M_1 \le m_1, M_2 \le m_2 \mid A = a_0, C = c)$. We assume the existence of a density $q_{a_0, M_1, M_2}$ for the mediators with respect to a dominating measure and define marginal mediator densities $q_{a_0, M_i}(m_i \mid c) = \int_{\mathcal{M}_j} dQ_{a_0, M_1, M_2}(m_1, m_2 \mid c)$ for $i, j = 1, 2$ and $i \neq j$. We subsequently refer to these objects as marginal mediator distributions, though they are in fact conditional on $A = a_0$ and $C$.

The identifying formula for each effect can now be written as a statistical functional of the observed data distribution by substituting the outcome regression for $\bar{Q}_{a_0}(m_1, m_2, c)$ and the observed-data mediator distributions for the respective counterfactual distributions in the aforementioned integral expressions.
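For discrete mediators, the identifying formulas reduce to finite sums, which makes the decomposition of the total effect easy to check numerically. The following Python sketch is illustrative only and is not part of the original analysis: the function name `interventional_effects`, the array layout, and the randomly generated inputs are all assumptions made for this example.

```python
import numpy as np


def interventional_effects(qbar_a, qbar_s, q_a, q_s, p_c):
    """Return (total, direct, via M1, via M2, covariant) for discrete data.

    qbar_a, qbar_s: arrays of shape (n_c, n_m1, n_m2) holding the outcome
        regression under A = a and A = a* (hypothetical layout).
    q_a, q_s: arrays of the same shape holding the joint mediator pmf given
        C under A = a and A = a*; each [c] slice sums to one.
    p_c: array of shape (n_c,) holding the marginal pmf of C.
    """
    # Marginal mediator pmfs given C, obtained by summing out the other mediator.
    q1_a, q2_a = q_a.sum(axis=2), q_a.sum(axis=1)
    q1_s, q2_s = q_s.sum(axis=2), q_s.sum(axis=1)

    def prod(q1, q2):
        # Product of marginals, broadcast to shape (n_c, n_m1, n_m2).
        return q1[:, :, None] * q2[:, None, :]

    def avg(f):
        # Sum over mediator values, then standardize over C.
        return float(np.sum(p_c * f.sum(axis=(1, 2))))

    total = avg(qbar_a * q_a - qbar_s * q_s)
    direct = avg((qbar_a - qbar_s) * q_s)
    via_m1 = avg(qbar_a * prod(q1_a - q1_s, q2_s))
    via_m2 = avg(qbar_a * prod(q1_a, q2_a - q2_s))
    covariant = avg(qbar_a * ((q_a - prod(q1_a, q2_a)) - (q_s - prod(q1_s, q2_s))))
    return total, direct, via_m1, via_m2, covariant


rng = np.random.default_rng(0)
shape = (3, 4, 2)  # (levels of C, levels of M1, levels of M2), arbitrary


def random_joint_pmf():
    w = rng.random(shape) + 0.1
    return w / w.sum(axis=(1, 2), keepdims=True)


total, direct, via_m1, via_m2, covariant = interventional_effects(
    rng.random(shape), rng.random(shape),
    random_joint_pmf(), random_joint_pmf(),
    np.array([0.2, 0.5, 0.3]),
)
print(total, direct + via_m1 + via_m2 + covariant)
```

The two printed numbers should agree up to floating-point error, since the decomposition is an algebraic identity that holds for any outcome regression and any mediator distributions.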

We note that the aforementioned assumptions preclude the existence of treatment-induced confounding of the mediator-outcome association. In the Supplementary material, we provide relevant extensions to this setting.

3 Methods

3.1 Efficiency theory

In this section, we develop efficiency theory for nonparametric estimation of interventional effects. This theory centers around the efficient influence function of each parameter. The efficient influence function is important for several reasons. First, it allows us to utilize two existing estimation frameworks, one-step estimation [17,18] and targeted minimum loss-based estimation [19,20], to generate estimators that are nonparametric efficient. That is, under suitable regularity conditions, they achieve the smallest asymptotic variance among all regular estimators that, when scaled by $n^{1/2}$, have an asymptotic normal distribution. We discuss how these estimators can be implemented in Section 3.2. The second important feature of the efficient influence function is that its variance equals the variance of the limit distribution of the scaled estimators. Thus, an estimate of the variance of the efficient influence function is a natural standard error estimate, which affords closed-form Wald-style confidence intervals and hypothesis tests (Section 3.3). Finally, the efficient influence function also characterizes robustness properties of our proposed estimators (Section 3.4).

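To introduce the efficient influence function, several additional definitions are required. For a given distribution $P \in \mathcal{P}$, we define $g_{a_0}(c) = \mathrm{pr}_P(A = a_0 \mid C = c)$, commonly referred to as a propensity score. For $i, j = 1, 2$ and $i \neq j$, we introduce the following partially marginalized outcome regressions, $\tilde{Q}_{a, M_i}(m_j, c) = \int_{\mathcal{M}_i} \bar{Q}_a(m_1, m_2, c) \, dQ_{a, M_i}(m_i \mid c)$. A star on a mediator subscript, as in $\tilde{Q}_{a, M_i^\star}$, indicates that the integral is instead taken with respect to the corresponding mediator distribution under $A = a^\star$; the subscripts $M_1, M_2$ and $M_1 \times M_2$ indicate marginalization over both mediators with respect to their joint distribution and the product of their marginal distributions, respectively. We also introduce notation for the indicator function $\mathbb{1}_a : \{a, a^\star\} \to \{0, 1\}$ defined by $\mathbb{1}_a(\tilde{a}) = 1$ if $\tilde{a} = a$ and zero otherwise. $\mathbb{1}_{a^\star}$ is similarly defined.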

Theorem 1

Under sampling from $P \in \mathcal{P}$, the efficient influence function evaluated on a given observation $\tilde{o}$ for the total effect is

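$$
D(P)(\tilde{o}) = \frac{\mathbb{1}_a(\tilde{a})}{g_a(\tilde{c})} \{ \tilde{y} - \tilde{Q}_{a, M_1, M_2}(\tilde{c}) \} - \frac{\mathbb{1}_{a^\star}(\tilde{a})}{g_{a^\star}(\tilde{c})} \{ \tilde{y} - \tilde{Q}_{a^\star, M_1^\star, M_2^\star}(\tilde{c}) \} + \tilde{Q}_{a, M_1, M_2}(\tilde{c}) - \tilde{Q}_{a^\star, M_1^\star, M_2^\star}(\tilde{c}) - \psi.
$$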

The efficient influence function for the interventional direct effect is

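$$
\begin{aligned}
D_A(P)(\tilde{o}) ={}& \frac{\mathbb{1}_a(\tilde{a})}{g_a(\tilde{c})} \frac{q_{a^\star, M_1, M_2}(\tilde{m}_1, \tilde{m}_2 \mid \tilde{c})}{q_{a, M_1, M_2}(\tilde{m}_1, \tilde{m}_2 \mid \tilde{c})} \{ \tilde{y} - \bar{Q}_a(\tilde{m}_1, \tilde{m}_2, \tilde{c}) \} - \frac{\mathbb{1}_{a^\star}(\tilde{a})}{g_{a^\star}(\tilde{c})} \{ \tilde{y} - \bar{Q}_{a^\star}(\tilde{m}_1, \tilde{m}_2, \tilde{c}) \} \\
&+ \frac{\mathbb{1}_{a^\star}(\tilde{a})}{g_{a^\star}(\tilde{c})} \big[ \bar{Q}_a(\tilde{m}_1, \tilde{m}_2, \tilde{c}) - \bar{Q}_{a^\star}(\tilde{m}_1, \tilde{m}_2, \tilde{c}) - \{ \tilde{Q}_{a, M_1^\star, M_2^\star}(\tilde{c}) - \tilde{Q}_{a^\star, M_1^\star, M_2^\star}(\tilde{c}) \} \big] \\
&+ \tilde{Q}_{a, M_1^\star, M_2^\star}(\tilde{c}) - \tilde{Q}_{a^\star, M_1^\star, M_2^\star}(\tilde{c}) - \psi_A.
\end{aligned}
$$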

The efficient influence function for the interventional indirect effect through M 1 is

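$$
\begin{aligned}
D_{M_1}(P)(\tilde{o}) ={}& \frac{\mathbb{1}_a(\tilde{a})}{g_a(\tilde{c})} \frac{\{ q_{a, M_1}(\tilde{m}_1 \mid \tilde{c}) - q_{a^\star, M_1}(\tilde{m}_1 \mid \tilde{c}) \} \, q_{a^\star, M_2}(\tilde{m}_2 \mid \tilde{c})}{q_{a, M_1, M_2}(\tilde{m}_1, \tilde{m}_2 \mid \tilde{c})} \{ \tilde{y} - \bar{Q}_a(\tilde{m}_1, \tilde{m}_2, \tilde{c}) \} \\
&+ \frac{\mathbb{1}_a(\tilde{a})}{g_a(\tilde{c})} \{ \tilde{Q}_{a, M_2^\star}(\tilde{m}_1, \tilde{c}) - \tilde{Q}_{a, M_1 \times M_2^\star}(\tilde{c}) \} - \frac{\mathbb{1}_{a^\star}(\tilde{a})}{g_{a^\star}(\tilde{c})} \{ \tilde{Q}_{a, M_2^\star}(\tilde{m}_1, \tilde{c}) - \tilde{Q}_{a, M_1^\star \times M_2^\star}(\tilde{c}) \} \\
&+ \frac{\mathbb{1}_{a^\star}(\tilde{a})}{g_{a^\star}(\tilde{c})} \big[ \tilde{Q}_{a, M_1}(\tilde{m}_2, \tilde{c}) - \tilde{Q}_{a, M_1^\star}(\tilde{m}_2, \tilde{c}) - \{ \tilde{Q}_{a, M_1 \times M_2^\star}(\tilde{c}) - \tilde{Q}_{a, M_1^\star \times M_2^\star}(\tilde{c}) \} \big] \\
&+ \tilde{Q}_{a, M_1 \times M_2^\star}(\tilde{c}) - \tilde{Q}_{a, M_1^\star \times M_2^\star}(\tilde{c}) - \psi_{M_1}.
\end{aligned}
$$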

The efficient influence function for the interventional indirect effect through M 2 is

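$$
\begin{aligned}
D_{M_2}(P)(\tilde{o}) ={}& \frac{\mathbb{1}_a(\tilde{a})}{g_a(\tilde{c})} \frac{\{ q_{a, M_2}(\tilde{m}_2 \mid \tilde{c}) - q_{a^\star, M_2}(\tilde{m}_2 \mid \tilde{c}) \} \, q_{a, M_1}(\tilde{m}_1 \mid \tilde{c})}{q_{a, M_1, M_2}(\tilde{m}_1, \tilde{m}_2 \mid \tilde{c})} \{ \tilde{y} - \bar{Q}_a(\tilde{m}_1, \tilde{m}_2, \tilde{c}) \} \\
&+ \frac{\mathbb{1}_a(\tilde{a})}{g_a(\tilde{c})} \{ \tilde{Q}_{a, M_1}(\tilde{m}_2, \tilde{c}) - \tilde{Q}_{a, M_1 \times M_2}(\tilde{c}) \} - \frac{\mathbb{1}_{a^\star}(\tilde{a})}{g_{a^\star}(\tilde{c})} \{ \tilde{Q}_{a, M_1}(\tilde{m}_2, \tilde{c}) - \tilde{Q}_{a, M_1 \times M_2^\star}(\tilde{c}) \} \\
&+ \frac{\mathbb{1}_a(\tilde{a})}{g_a(\tilde{c})} \big[ \tilde{Q}_{a, M_2}(\tilde{m}_1, \tilde{c}) - \tilde{Q}_{a, M_2^\star}(\tilde{m}_1, \tilde{c}) - \{ \tilde{Q}_{a, M_1 \times M_2}(\tilde{c}) - \tilde{Q}_{a, M_1 \times M_2^\star}(\tilde{c}) \} \big] \\
&+ \tilde{Q}_{a, M_1 \times M_2}(\tilde{c}) - \tilde{Q}_{a, M_1 \times M_2^\star}(\tilde{c}) - \psi_{M_2}.
\end{aligned}
$$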

The efficient influence function for the covariant interventional effect is $D_{M_1, M_2} = D - D_A - D_{M_1} - D_{M_2}$.

A proof of Theorem 1 is provided in the Supplementary material.

3.2 Estimators

We propose estimators of each interventional effect using one-step and targeted minimum loss-based estimation. Both techniques develop along a similar path. We first obtain estimates of the propensity score, outcome regression, and joint mediator distribution; we collectively refer to these quantities as nuisance parameters. With estimated nuisance parameters in hand, we subsequently apply a correction based on the efficient influence function to the nuisance estimates.

To estimate the propensity score, we can use any suitable technique for mean regression of the binary outcome $A$ onto confounders $C$. Working logistic regression models are commonly used for this purpose, though semi- and nonparametric alternatives would be more in line with our choice of model. We denote by $g_{n, a_0}(c)$ the chosen estimate of $g_{a_0}(c)$. Similarly, the outcome regression can be estimated using mean regression of the outcome $Y$ onto $A$, $M_1$, $M_2$, and $C$. For example, if the study outcome is binary, logistic regression could again be used, though more flexible regression estimators may be preferred. As above, we denote by $\bar{Q}_{n, a_0}$ the estimated outcome regression evaluated under $A = a_0$, with $\bar{Q}_{n, a_0}(m_1, m_2, c)$ providing an estimate of $E_P(Y \mid A = a_0, M_1 = m_1, M_2 = m_2, C = c)$. To estimate the marginal cumulative distribution of $C$, we will use the empirical cumulative distribution function, which we denote by $Q_{n, C}$.

Estimation of the conditional joint distribution of the mediators is a more challenging proposition, as fewer tools are available for flexible estimation of conditional multivariate distribution functions. We hence focus on approaches for discrete-valued mediators. The approach we adopt could be extended to continuous-valued mediators by considering a fine partitioning of the mediator values. We examine this approach via simulation in Section 4. To develop our density estimators, we use the approach of Díaz Muñoz and van der Laan [21], which considers estimation of a conditional density via estimation of discrete conditional hazards. Briefly, consider estimation of the distribution of $M_2$ given $A$ and $C$, and, for simplicity, suppose that the support of $M_2$ is $\{1, 2, 3\}$. We create a long-form data set, where the number of rows contributed by each individual is equal to their observed value of $M_2$. An example is illustrated in Table 1. We see that the long-form data set includes an integer-valued column named “bin” that indicates to which value of $M_2$ each row corresponds, as well as a binary column $\mathbb{1}_{\text{bin}}(M_2)$ indicating whether the observed value of $M_2$ corresponds to each bin. These long-form data can be used to fit a regression of the binary outcome $\mathbb{1}_{\text{bin}}(M_2)$ onto $C$, $A$, and bin. This naturally estimates $\lambda_b(a_0, c) = \mathrm{pr}_P(M_2 = b \mid M_2 > b - 1, A = a_0, C = c)$, the conditional discrete hazard of $M_2$ given $A$ and $C$. Let $\lambda_{n, b}$ denote the estimated hazard obtained from fitting this regression. An estimate of the density at $m_2 \in \mathcal{M}_2$ is

$$
q_{n, a_0, M_2}(m_2 \mid c) = \frac{\lambda_{n, m_2}(a_0, c) \prod_{b=1}^{m_2 - 1} \{ 1 - \lambda_{n, b}(a_0, c) \}}{\sum_{m \in \mathcal{M}_2} \big[ \lambda_{n, m}(a_0, c) \prod_{b=1}^{m - 1} \{ 1 - \lambda_{n, b}(a_0, c) \} \big]}.
$$

Similarly, an estimate $q_{n, a_0, M_1}(\cdot \mid m_2, c)$ of the conditional distribution of $M_1$ given $A = a_0$, $M_2 = m_2$, $C = c$ can be obtained. An estimate of the joint conditional density is implied by these estimates, $q_{n, a_0, M_1, M_2}(m_1, m_2 \mid c) = q_{n, a_0, M_1}(m_1 \mid m_2, c) \, q_{n, a_0, M_2}(m_2 \mid c)$, while an estimate of the marginal distribution of $M_1$ is $q_{n, a_0, M_1}(m_1 \mid c) = \sum_{m_2 \in \mathcal{M}_2} q_{n, a_0, M_1, M_2}(m_1, m_2 \mid c)$.

Table 1

An illustration of how to make a long form data set suitable for estimating mediator distributions

An ID is uniquely assigned to each independent data unit and a single confounder C is included in the mock data set.
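The long-form construction and the hazard-to-density map can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `to_long_form` and `hazards_to_density`, the column names, and the toy inputs are hypothetical, and the hazard regression itself (any binary regression of the indicator column onto $C$, $A$, and bin) is not shown.

```python
import numpy as np


def to_long_form(ids, m2, c, a):
    """Expand each individual into one row per bin 1, ..., observed M2."""
    rows = []
    for i, m, ci, ai in zip(ids, m2, c, a):
        for b in range(1, m + 1):
            # in_bin plays the role of the indicator that M2 falls in this bin.
            rows.append({"id": i, "C": ci, "A": ai, "bin": b, "in_bin": int(m == b)})
    return rows


def hazards_to_density(hazard):
    """Map estimated hazards (lambda_1, ..., lambda_K) to a pmf on {1, ..., K}."""
    hazard = np.asarray(hazard, dtype=float)
    # Probability of passing bins 1, ..., b - 1; the empty product is 1 for b = 1.
    survival = np.concatenate(([1.0], np.cumprod(1.0 - hazard[:-1])))
    unnormalized = hazard * survival
    # Normalize over the support so the estimated pmf sums to one.
    return unnormalized / unnormalized.sum()


# Two mock individuals with observed M2 equal to 3 and 1, respectively.
rows = to_long_form(ids=[1, 2], m2=[3, 1], c=[0.5, 0.2], a=[1, 0])
for row in rows:
    print(row)

# Hypothetical fitted hazards for one (a0, c) combination.
density = hazards_to_density([0.5, 0.5, 1.0])
print(density)
```

The first individual contributes three rows and the second contributes one, matching the rule that the number of rows equals the observed mediator value.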

In principle, one could reverse the roles of $M_1$ and $M_2$ in the above procedure. That is, we could instead estimate the distribution of $M_1$ given $A = a_0, C$ and of $M_2$ given $A = a_0, C, M_1$. Cross-validation could be used to pick between the two potential estimators of the joint distribution. Other approaches to conditional density estimation are permitted by our procedure as well. For example, approaches based on working copula models may be particularly appealing in this context, as they allow separate specification of marginal vs joint distributions of the mediators.

Given estimates of nuisance parameters, we now illustrate one-step estimation for the interventional direct effect. One-step estimators of other effects can be generated similarly. A plug-in estimate of the conditional interventional direct effect given C = c is the difference between

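$$
\tilde{Q}_{n, a, M_1^\star, M_2^\star}(c) = \int_{\mathcal{M}_1 \times \mathcal{M}_2} \bar{Q}_{n, a}(m_1, m_2, c) \, dQ_{n, a^\star, M_1, M_2}(m_1, m_2 \mid c) \quad \text{and} \quad \tilde{Q}_{n, a^\star, M_1^\star, M_2^\star}(c) = \int_{\mathcal{M}_1 \times \mathcal{M}_2} \bar{Q}_{n, a^\star}(m_1, m_2, c) \, dQ_{n, a^\star, M_1, M_2}(m_1, m_2 \mid c). \tag{1}
$$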

To obtain a plug-in estimate $\psi_{n, A}$ of $\psi_A$, we standardize the conditional effect estimate with respect to $Q_{n, C}$, the empirical distribution of $C$. Thus, the plug-in estimator of $\psi_A$ is $\psi_{n, A} = \int_{\mathcal{C}} \{ \tilde{Q}_{n, a, M_1^\star, M_2^\star}(c) - \tilde{Q}_{n, a^\star, M_1^\star, M_2^\star}(c) \} \, dQ_{n, C}(c)$.

The one-step estimator is constructed by adding an efficient influence function-based correction to an initial plug-in estimate. Suppose we are given estimates of all relevant nuisance quantities and let $P_n$ denote any probability distribution in $\mathcal{P}$ that is compatible with these estimates. The efficient influence function for $\psi_A$ under sampling from $P_n$ is $D_A(P_n)$, and the one-step estimator is $\psi_{n, A, +} = \psi_{n, A} + n^{-1} \sum_{i=1}^{n} D_A(P_n)(O_i)$. All other effect estimates are generated in this vein: estimated nuisance parameters are plugged in to the efficient influence function, the resultant function is evaluated on each observation, and the empirical average of this quantity is added to the plug-in estimator.
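For discrete mediators, the one-step recipe for the interventional direct effect can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' software: the function `one_step_direct`, its argument layout, and the small data-generating process in the demo are hypothetical, and the demo plugs in the true nuisance parameters purely to keep the example short. Mediator levels are coded from zero here so that they can index arrays directly.

```python
import numpy as np


def one_step_direct(y, is_a, m1, m2, g_a, qbar_a, qbar_s, q_a, q_s):
    """One-step estimate of the interventional direct effect.

    y, is_a, m1, m2, g_a: length-n arrays; is_a is 1 if A = a and 0 if A = a*,
        m1 and m2 are zero-based mediator levels, g_a estimates pr(A = a | C_i).
    qbar_a, qbar_s, q_a, q_s: arrays of shape (n, K1, K2) holding, for each C_i,
        the outcome regression and joint mediator pmf under a and a*.
    """
    idx = np.arange(len(y))
    g_s = 1.0 - g_a
    # Conditional plug-in terms: both regressions averaged over the a* joint pmf.
    qt_a = (qbar_a * q_s).sum(axis=(1, 2))
    qt_s = (qbar_s * q_s).sum(axis=(1, 2))
    plug_in = np.mean(qt_a - qt_s)
    # Nuisance estimates evaluated at each observed mediator pair.
    qb_a, qb_s = qbar_a[idx, m1, m2], qbar_s[idx, m1, m2]
    ratio = q_s[idx, m1, m2] / q_a[idx, m1, m2]
    eif = (is_a / g_a * ratio * (y - qb_a)
           - (1 - is_a) / g_s * (y - qb_s)
           + (1 - is_a) / g_s * (qb_a - qb_s - (qt_a - qt_s))
           + (qt_a - qt_s) - plug_in)
    return plug_in + eif.mean(), eif


# Hypothetical data-generating process with one binary confounder.
rng = np.random.default_rng(1)
n, k = 50_000, 3


def pmf_table():
    w = rng.random((2, k, k)) + 1.0
    return w / w.sum(axis=(1, 2), keepdims=True)


pm_a, pm_s = pmf_table(), pmf_table()          # joint pmfs indexed [c, m1, m2]
qb_tab_a = 0.3 + 0.5 * rng.random((2, k, k))   # outcome regression under a
qb_tab_s = 0.2 + 0.5 * rng.random((2, k, k))   # outcome regression under a*

c = rng.integers(0, 2, size=n)
g_a = np.where(c == 1, 0.6, 0.4)
is_a = rng.binomial(1, g_a)
pm_obs = np.where(is_a[:, None, None] == 1, pm_a[c], pm_s[c])
cum = pm_obs.reshape(n, -1).cumsum(axis=1)     # inverse-cdf sampling of (m1, m2)
flat = np.minimum((rng.random(n)[:, None] > cum).sum(axis=1), k * k - 1)
m1, m2 = flat // k, flat % k
y = rng.binomial(1, np.where(is_a == 1, qb_tab_a[c, m1, m2], qb_tab_s[c, m1, m2]))

estimate, eif = one_step_direct(y, is_a, m1, m2, g_a,
                                qb_tab_a[c], qb_tab_s[c], pm_a[c], pm_s[c])
truth = 0.5 * ((qb_tab_a - qb_tab_s) * pm_s).sum()
print(estimate, truth)
```

In practice the arrays passed to `one_step_direct` would hold estimated rather than true nuisance parameters; the returned vector of influence function values can be reused for standard error estimation.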

While one-step estimators are appealing in their simplicity, the estimators may not obey bounds on the parameter space in finite samples. For example, if the study outcome is binary, then the interventional effects each represent a difference in two probabilities and thus are bounded between $-1$ and $1$. However, one-step estimators may fall outside of this range. This motivates estimation of these quantities using targeted minimum loss-based estimation, a framework for generating plug-in estimators. The implementation of such estimators is generally more involved than that of one-step estimators. In this approach, a second-stage model fitting is used to ensure that nuisance parameter estimates satisfy efficient influence function estimating equations. The approach for this second-stage fitting is dependent on the specific effect parameter considered and the procedure differs subtly for the various effect measures presented here. The Supplementary material includes a detailed exposition of how such estimators can be implemented.

3.3 Large sample inference

We now present a theorem establishing the joint weak convergence of the proposed estimators to a random variable with a multivariate normal distribution. Because the asymptotic behavior of the one-step and targeted minimum loss estimators (TMLEs) is equivalent, we present a single theorem. A discussion of the differences in regularity conditions required to prove the theorem for one-step versus targeted minimum loss estimation is provided in the Supplementary material. Let $\psi_{n, \cdot}$ denote the vector of (one-step or targeted minimum loss) estimates of $\psi = (\psi_A, \psi_{M_1}, \psi_{M_2}, \psi_{M_1, M_2})$ and let $D(P)$ denote the vector of efficient influence functions defined by

$$
\tilde{o} \mapsto \big( D_A(P)(\tilde{o}), D_{M_1}(P)(\tilde{o}), D_{M_2}(P)(\tilde{o}), D_{M_1, M_2}(P)(\tilde{o}) \big).
$$

In the theorem, we use $\| \cdot \|$ to denote the $L^2(P)$-norm, defined for any $P$-measurable $f$ as $\| f \|^2 = \int f(o)^2 \, dP(o)$.

Theorem 2

Under sampling from $P \in \mathcal{P}$, if for $a_0 = a, a^\star$,

  (i) $\sup_{c} | g_{n, a_0}(c) - g_{a_0}(c) | \to 0$ in probability as $n \to \infty$,

  (ii) $\sup_{m_1, m_2, c} | q_{n, a_0, M_1, M_2}(m_1, m_2 \mid c) - q_{a_0, M_1, M_2}(m_1, m_2 \mid c) | \to 0$ in probability as $n \to \infty$,

  (iii) $\| g_{n, a_0} - g_{a_0} \| = o_p(n^{-1/4})$,

  (iv) $\| \bar{Q}_{n, a_0} - \bar{Q}_{a_0} \| = o_p(n^{-1/4})$,

  (v) $\| q_{n, a_0, M_1, M_2} - q_{a_0, M_1, M_2} \| = o_p(n^{-1/4})$,

  (vi) $\| q_{n, a_0, M_1} q_{n, a_0, M_2} - q_{a_0, M_1} q_{a_0, M_2} \| = o_p(n^{-1/4})$,

  (vii) $\| q_{n, a_0, M_1} - q_{a_0, M_1} \| = o_p(n^{-1/4})$, $\| q_{n, a_0, M_2} - q_{a_0, M_2} \| = o_p(n^{-1/4})$, and

  (viii) $\int \{ D(P_n)(o) - D(P)(o) \}^2 \, dP(o) \to 0$ in probability as $n \to \infty$ and $D(P_n)$ falls in a $P$-Donsker class with probability tending to 1,

then $n^{1/2} (\psi_{n, \cdot} - \psi) \to_d \mathrm{Normal}(0, \Sigma)$, where $\Sigma = \int D(P)(o) D(P)(o)^\top \, dP(o)$.

The regularity conditions required for Theorem 2 are typical of many problems in semiparametric efficiency theory. We provide conditions in terms of $L^2(P)$-norm convergence, as this is typical of this literature; however, alternative and potentially weaker conditions are possible to derive. For further discussion, see the Supplementary material. As with any nonparametric procedure, there is a concern related to the dimensionality of $C$, particularly in situations with real-valued mediators. Minimum loss estimators (MLEs) in certain function classes can attain the requisite convergence rates. For example, an MLE in the class of functions that are right-continuous with left limits (i.e., càdlàg) with variation norm bounded by a constant achieves an $L^2(P)$ convergence rate faster than $n^{-1/4}$ irrespective of the dimension of the conditioning set [22]. However, this may not allay all concerns pertaining to the curse of dimensionality, because in moderately high dimensions these function classes can be restrictive and thus the true function may fall outside this class. Nevertheless, we suggest (and our simulations show) that in spite of concerns pertaining to the curse of dimensionality our procedure will enjoy reasonable finite-sample performance in many settings.

The covariance matrix $\Sigma$ may be estimated by the empirical covariance matrix of the vector $D(P_n)$ applied to the observed data, where $P_n$ is any distribution in the model that is compatible with the estimated nuisance parameters. With the estimated covariance matrix, it is straightforward to construct Wald confidence intervals and hypothesis tests about the individual interventional effects or comparisons between them. For example, a straightforward application of the delta method would allow for a test of the null hypothesis that $\psi_{M_1} = \psi_{M_2}$.
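The Wald-style inference described above can be sketched as follows. The Python code is a minimal illustration under stated assumptions: the function names `wald_inference` and `wald_test_contrast` are hypothetical, the matrix of influence function values is replaced by random noise as a stand-in, and the point estimates are illustrative values only. Because the contrast $\psi_{M_1} - \psi_{M_2}$ is linear, the delta method reduces here to a linear combination of the estimated covariance matrix.

```python
import math

import numpy as np


def wald_inference(estimates, eif_matrix, z=1.959964):
    """Wald intervals (95% by default) from an (n, k) matrix of influence values."""
    n = eif_matrix.shape[0]
    sigma = np.cov(eif_matrix, rowvar=False)  # k x k empirical covariance matrix
    se = np.sqrt(np.diag(sigma) / n)
    return estimates - z * se, estimates + z * se, sigma


def wald_test_contrast(estimates, sigma, n, contrast):
    """Two-sided Wald test of the null hypothesis contrast' psi = 0."""
    contrast = np.asarray(contrast, dtype=float)
    diff = float(contrast @ estimates)
    se = math.sqrt(float(contrast @ sigma @ contrast) / n)
    z_stat = diff / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value


rng = np.random.default_rng(2)
n_obs = 500
# Order: (direct, via M1, via M2, covariant); illustrative numbers only.
estimates = np.array([0.05, -0.01, 0.02, 0.0])
# Stand-in for the estimated influence functions evaluated at each observation.
eif_matrix = rng.normal(scale=0.5, size=(n_obs, 4))

lower, upper, sigma = wald_inference(estimates, eif_matrix)
z_stat, p_value = wald_test_contrast(estimates, sigma, n_obs, [0, 1, -1, 0])
print(np.column_stack([lower, upper]))
print(z_stat, p_value)
```

Using the full covariance matrix rather than only its diagonal matters for the contrast test, since the estimators of the two indirect effects are generally correlated.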

3.4 Robustness properties

As with many problems in causal inference, consistent estimation of interventional effects requires consistent estimation only of certain combinations of nuisance parameters. To determine these combinations, we may study the stochastic properties of the efficient influence function. In particular, consider a parameter whose value under $P'$ is $\tilde{\psi}'$ and whose efficient influence function under sampling from $P'$ can be written $\tilde{D}(P', \tilde{\psi}')$. Writing $\tilde{\psi}$ for the value of the parameter of interest under $P$, we may study the circumstances under which $\int \tilde{D}(P', \tilde{\psi})(o) \, dP(o) = 0$. This generally entails understanding which parameters of $P'$ must align with those parameters of $P$ to ensure that the influence function $\tilde{D}(P', \tilde{\psi})$ has mean zero under sampling from $P$. We present the results of this analysis in a theorem below and refer readers to the Supplementary material for the proof.

Theorem 3

Locally efficient estimators of the total effect and the interventional direct, indirect, and covariant effects are consistent for their respective target parameters if the following combinations of nuisance parameters are consistently estimated:

  • Total effect: $(\bar{Q}_a, \bar{Q}_{a^\star}, Q_{a, M_1, M_2}, Q_{a^\star, M_1, M_2})$ or $(g_a, g_{a^\star})$;

  • Interventional direct effect: $(\bar{Q}_a, \bar{Q}_{a^\star}, g_{a^\star})$ or $(\bar{Q}_a, \bar{Q}_{a^\star}, Q_{a, M_1, M_2}, Q_{a^\star, M_1, M_2})$ or $(Q_{a, M_1, M_2}, Q_{a^\star, M_1, M_2}, g_a, g_{a^\star})$;

  • Interventional indirect effect through M 1 : ( Q ¯ a , Q a , M 1 , Q a , M 1 , Q a , M 2 ) or ( g a , Q a , M 1 , M 2 , Q a , M 1 , Q a , M 2 ) or ( Q ¯ a , g a , g a , Q a , M 2 ) or ( Q ¯ a , g a , g a , Q a , M 1 ) ;

  • Interventional indirect effect through M 2 : ( Q ¯ a , Q a , M 2 , Q a , M 2 , Q a , M 1 ) or ( g a , Q a , M 1 , M 2 , Q a , M 2 ) or ( Q ¯ a , g a , g a , Q a , M 1 ) or ( Q ¯ a , g a , g a , Q a , M 2 ) ;

  • Interventional covariant effect: ( Q ¯ a , Q ¯ a , Q a , M 1 , M 2 , Q a , M 1 , M 2 ) or ( g a , g a , Q ¯ a , Q ¯ a , Q a , M 1 , Q a , M 2 ) or ( g a , g a , Q a , M 1 , M 2 , Q a , M 1 , Q a , M 2 ) .

The most interesting robustness result is perhaps that pertaining to the indirect effects. The first condition for consistent estimation is expected, as the propensity score plays no role in the definition of the indirect effect. The second condition shows that the joint mediator distribution and propensity score together can compensate for inconsistent estimation of the outcome regression, while the relevant marginal mediator distributions are required to properly marginalize the resultant quantity. The third and fourth conditions show that inconsistent estimation of the marginal distribution of one, but not both, of the mediators can be corrected for via the propensity score.

We note that Theorem 3 provides sufficient, but not necessary, conditions for consistent estimation of each effect. For example, a consistent estimate of the total effect is implied by a consistent estimate of $\tilde{Q}_{a, M_1, M_2}$ and $\tilde{Q}_{a^\star, M_1^\star, M_2^\star}$, a condition that is generally weaker than requiring consistent estimation of the outcome regression and joint mediator distribution. Because our estimation strategy relies on estimation of the joint mediator distribution, we have described robustness properties in terms of the large sample behavior of estimators of those quantities.

3.5 Extensions

In the Supplementary material, we provide relevant extensions to the setting where the mediator–outcome relationship is confounded by measured covariates whose distributions are affected by the treatment. In this case, both the effects of interest and their efficient influence functions involve the conditional distribution of the confounding covariates. We discuss the relevant modifications to the estimation procedures to accommodate this setting in the supplement.

Generalization to other effect scales requires only minor modifications. First, we determine the portions of the efficient influence function that pertain to each component of the additive effect. For example, considering $\psi_{M_1}$, we identify the portions of the efficient influence function that pertain to the mean counterfactual under draws of $M_1$ from $q_{a, M_1}(\cdot \mid C)$ and of $M_2$ from $q_{a^\star, M_2}(\cdot \mid C)$ versus those portions that pertain to the mean counterfactual under draws of $M_1$ from $q_{a^\star, M_1}(\cdot \mid C)$ and of $M_2$ from $q_{a^\star, M_2}(\cdot \mid C)$. We then develop a one-step estimator or TMLE for each of these components separately. Finally, we use the delta method to derive the resulting influence function. In the Supplementary material, we illustrate an extension to a multiplicative scale.

Our results can also be extended to estimation of interventional effects for more than two mediators. As discussed in Vansteelandt and Daniel [14], when there are more than two mediators, say $M_1, \dots, M_t$, there are many possible path-specific effects. However, our scientific interest is usually restricted to learning effects that are mediated through each of the mediators, rather than all possible path-specific effects. Moreover, strong untestable assumptions are required to infer all path-specific effects, including assumptions about the direction of the causal effects between mediators. Therefore, it may be of greatest interest to evaluate direct effects such as

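$$
\psi_{t,A} = \int_{\mathcal{C}} \int_{\mathcal{M}_1 \times \cdots \times \mathcal{M}_t} \{\bar{Q}_a(m_1, \dots, m_t, c) - \bar{Q}_{a^\star}(m_1, \dots, m_t, c)\} \, dQ_{M_1(a^\star), \dots, M_t(a^\star)}(m_1, \dots, m_t \mid c) \, dQ_C(c),
$$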

which describes the effect of setting $A = a$ versus $A = a^\star$, while drawing all mediators from the joint conditional distribution given $A = a^\star, C$, and for $s = 1, \dots, t$, indirect effects such as

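$$
\psi_{t,M_s} = \int_{\mathcal{C}} \int_{\mathcal{M}_1 \times \cdots \times \mathcal{M}_t} \bar{Q}_a(m_1, \dots, m_t, c) \times \{dQ_{M_s(a)}(m_s \mid c) - dQ_{M_s(a^\star)}(m_s \mid c)\} \times \prod_{u=1}^{s-1} dQ_{M_u(a)}(m_u \mid c) \prod_{v=s+1}^{t} dQ_{M_v(a^\star)}(m_v \mid c) \, dQ_C(c),
$$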

which describes the effect of setting $M_s$ to the value it would assume under $A = a$ versus $A = a^\star$ while drawing $M_1, \dots, M_{s-1}$ from their respective marginal distributions given $A = a, C$ and drawing $M_{s+1}, \dots, M_t$ from their marginal distribution given $A = a^\star, C$. We provide relevant efficiency theory for these parameters in the Supplementary material.
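To make the two definitions concrete, the following is a minimal sketch that evaluates them by direct summation when all mediators and covariates are discrete and the outcome regression and mediator distributions are treated as known. It is a plug-in evaluation of the parameters, not the one-step estimator or TMLE. All names (`direct_effect`, `indirect_effect`, `toy_qbar`, `toy_joint`) and the toy numbers are hypothetical, and mediators are indexed from 0 in the code rather than from 1.

```python
from itertools import product

def marginal(joint_pmf, s):
    """Marginal pmf of mediator s (0-based) from a joint pmf {tuple: prob}."""
    out = {}
    for m, p in joint_pmf.items():
        out[m[s]] = out.get(m[s], 0.0) + p
    return out

def direct_effect(qbar, joint, cov_pmf, a=1, a_star=0):
    """Contrast qbar under a vs a_star; mediators drawn jointly given a_star."""
    total = 0.0
    for c, pc in cov_pmf.items():
        for m, pm in joint(a_star, c).items():
            total += (qbar(a, m, c) - qbar(a_star, m, c)) * pm * pc
    return total

def indirect_effect(qbar, joint, cov_pmf, s, a=1, a_star=0):
    """Shift mediator s from a_star to a; earlier mediators use their
    marginals given a, later mediators use their marginals given a_star."""
    total = 0.0
    for c, pc in cov_pmf.items():
        joint_a, joint_s = joint(a, c), joint(a_star, c)
        t = len(next(iter(joint_a)))
        marg_a = [marginal(joint_a, u) for u in range(t)]
        marg_s = [marginal(joint_s, u) for u in range(t)]
        supports = [sorted(set(marg_a[u]) | set(marg_s[u])) for u in range(t)]
        for m in product(*supports):
            w = marg_a[s].get(m[s], 0.0) - marg_s[s].get(m[s], 0.0)
            for u in range(t):
                if u < s:
                    w *= marg_a[u].get(m[u], 0.0)
                elif u > s:
                    w *= marg_s[u].get(m[u], 0.0)
            total += qbar(a, m, c) * w * pc
    return total

# Hypothetical toy setting: two independent binary mediators, one covariate level.
def toy_joint(a, c):
    p1, p2 = 0.5 + 0.2 * a, 0.4 + 0.1 * a
    return {(i, j): (p1 if i else 1 - p1) * (p2 if j else 1 - p2)
            for i in (0, 1) for j in (0, 1)}

def toy_qbar(a, m, c):
    return 0.1 + 0.2 * a + 0.1 * m[0] + 0.3 * m[1]

cov_pmf = {0: 1.0}
psi_A = direct_effect(toy_qbar, toy_joint, cov_pmf)
psi_M1 = indirect_effect(toy_qbar, toy_joint, cov_pmf, s=0)
psi_M2 = indirect_effect(toy_qbar, toy_joint, cov_pmf, s=1)
```

In this toy setting the outcome regression is linear and the mediators are independent, so the direct effect is approximately 0.2 and the indirect effects through the first and second mediators are approximately 0.02 and 0.03.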

4 Simulations

4.1 Discrete mediators

We evaluated the small sample performance of our estimators via Monte Carlo simulation. Data were generated as follows. We simulated $C = (C_1, \dots, C_5)$ by drawing $C_1, C_2, C_3$ independently from Uniform(0,1) distributions, and $C_4, C_5$ independently from Bernoulli distributions with success probability of 0.25 and 0.5, respectively. The treatment variable $A$, given $C = c$, was drawn from a Bernoulli distribution with $g_a(c) = \operatorname{logit}^{-1}(-1 + 0.125 c_1 + 0.25 c_2)$ and $g_{a^\star}(c) = 1 - g_a(c)$. Here, we consider $a = 1$ and $a^\star = 0$. Given $C = c, A = a_0$, the first mediator $M_1$ was generated by taking draws from a geometric distribution with success probability $\operatorname{logit}^{-1}(-1.1 + 0.45 c_1 + 0.125 a_0)$. Any draw of six or greater was set equal to six. The second mediator was generated from a similarly truncated geometric distribution with success probability $\operatorname{logit}^{-1}(-1.1 + 0.15 c_1 + 0.2 c_2 - 0.2 a_0)$. Given $C = c, A = a_0, M_1 = m_1, M_2 = m_2$, the outcome $Y$ was drawn from a Bernoulli distribution with success probability $\operatorname{logit}^{-1}(-1 + c_1 - c_2 + 0.25 m_1 + 0.25 m_2 + 0.25 a_0)$. The mediator distribution is visualized for combinations of $c$ and $a_0$ in Figure 1. The true total effect is approximately 0.06, which decomposes into a direct effect of 0.05, an indirect effect through $M_1$ of $-0.01$, an indirect effect through $M_2$ of 0.02, and a covariant effect of 0.
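The data-generating process above can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the signs of the intercepts and of several coefficients are assumptions, as is the convention that a geometric draw counts failures before the first success (support starting at 0). The function names are hypothetical.

```python
import math
import random

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def truncated_geometric(p, rng, cap=6):
    """Failures before the first success, with draws of `cap` or more set to `cap`.
    The failures-before-success convention is an assumption."""
    k = 0
    while k < cap and rng.random() >= p:
        k += 1
    return k

def simulate_one(rng):
    """One observation (C, A, M1, M2, Y); coefficient signs are assumptions."""
    c = [rng.random() for _ in range(3)]        # C1, C2, C3 ~ Uniform(0, 1)
    c.append(int(rng.random() < 0.25))          # C4 ~ Bernoulli(0.25)
    c.append(int(rng.random() < 0.5))           # C5 ~ Bernoulli(0.5)
    a = int(rng.random() < expit(-1 + 0.125 * c[0] + 0.25 * c[1]))
    m1 = truncated_geometric(expit(-1.1 + 0.45 * c[0] + 0.125 * a), rng)
    m2 = truncated_geometric(expit(-1.1 + 0.15 * c[0] + 0.2 * c[1] - 0.2 * a), rng)
    y = int(rng.random() < expit(-1 + c[0] - c[1] + 0.25 * m1 + 0.25 * m2 + 0.25 * a))
    return {"C": c, "A": a, "M1": m1, "M2": m2, "Y": y}

rng = random.Random(2021)
sample = [simulate_one(rng) for _ in range(250)]
```

Each mediator takes values in 0 through 6, which keeps the joint mediator distribution on a small finite support.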

Figure 1: Joint distribution of mediators used in the simulation.

The nuisance parameters were estimated using regression stacking [23,24], also known as super learning [25], using the SuperLearner package for the R language [26]. We used this package to generate an ensemble of a main-terms logistic regression (as implemented in the SL.glm function in SuperLearner), polynomial multivariate adaptive regression splines (SL.earth), and a random forest (SL.ranger). The ensemble was built by selecting the convex combination of these three estimators that minimized tenfold cross-validated deviance.
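The final ensembling step can be illustrated with a small sketch. Given cross-validated predicted probabilities from three candidate learners, it searches a grid on the simplex for the convex weights with the smallest binomial deviance. The grid search is a stand-in chosen for readability and is not claimed to be the optimizer used by the SuperLearner package; the function names and toy predictions are hypothetical.

```python
import math

def deviance(y, p, eps=1e-12):
    """Binomial deviance: -2 times the Bernoulli log-likelihood."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)   # guard against log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -2.0 * total

def best_convex_weights(y, cv_preds, step=0.05):
    """Grid search over convex weights for three learners.
    cv_preds is a list of three lists of cross-validated predictions."""
    n_steps = round(1 / step)
    best_dev, best_w = None, None
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            w = (i / n_steps, j / n_steps, (n_steps - i - j) / n_steps)
            blend = [sum(wk * pk for wk, pk in zip(w, row))
                     for row in zip(*cv_preds)]
            d = deviance(y, blend)
            if best_dev is None or d < best_dev:
                best_dev, best_w = d, w
    return best_w, best_dev

# Hypothetical cross-validated predictions from three learners.
y = [0, 1, 1, 0, 1]
cv_preds = [
    [0.1, 0.9, 0.9, 0.1, 0.9],   # well-calibrated learner
    [0.5, 0.5, 0.5, 0.5, 0.5],   # uninformative learner
    [0.9, 0.1, 0.1, 0.9, 0.1],   # badly wrong learner
]
weights, cv_deviance = best_convex_weights(y, cv_preds)
```

In this toy case all of the weight goes to the first learner, since mixing in either of the others can only lower the likelihood.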

We evaluated our proposed estimators under this data generating process at sample sizes of 250, 500, 1,000, and 2,000. At each sample size, we simulated 1,000 data sets. Point estimates were compared in terms of their Monte Carlo bias, standard deviation, and mean squared error. We evaluated weak convergence by visualizing the sampling distribution of the estimators after centering at the true parameter value and scaling by an oracle standard error, computed as the Monte Carlo standard deviation of the estimates, as well as scaling by an estimated standard error based on the estimated variance of the efficient influence function. Similarly, we evaluated the coverage probability of a nominal 95% Wald-style confidence interval based on the oracle and estimated standard errors.
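The performance metrics described above can be sketched for a single estimator and sample size as follows. The function name `mc_summary` and the toy inputs are hypothetical; the oracle standard error is taken to be the Monte Carlo standard deviation of the point estimates, as described in the text.

```python
import statistics

def mc_summary(estimates, std_errors, truth, level=0.95):
    """Monte Carlo bias, SD, MSE, and Wald coverage using oracle and estimated SEs."""
    z = statistics.NormalDist().inv_cdf(1 - (1 - level) / 2)
    bias = statistics.fmean(estimates) - truth
    sd = statistics.stdev(estimates)              # oracle standard error
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])

    def coverage(ses):
        # Wald interval covers the truth when |estimate - truth| <= z * se.
        hits = [abs(e - truth) <= z * se for e, se in zip(estimates, ses)]
        return sum(hits) / len(hits)

    return {
        "bias": bias,
        "sd": sd,
        "mse": mse,
        "cover_oracle": coverage([sd] * len(estimates)),
        "cover_estimated": coverage(std_errors),
    }

# Hypothetical toy inputs: four point estimates with deliberately small estimated SEs.
summary = mc_summary([0.04, 0.05, 0.06, 0.05], [0.001] * 4, truth=0.05)
```

With these toy inputs the estimated standard errors are too small, so the interval based on them covers the truth only when the point estimate equals it, which mirrors the under-coverage pattern the coverage metric is designed to detect.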

In terms of estimation, the one-step estimator and TMLE behaved as expected in large samples (Figure 2). The estimators were approximately unbiased in large samples, with mean squared error decreasing appropriately with sample size. Comparing the two estimation strategies, the one-step estimator and TMLE had comparable performance for the interventional direct effect, while the TMLE had better performance for the indirect effects. However, the one-step estimator was uniformly better for estimating the covariant effect owing to large variability of the TMLE of this quantity. Further examination of the results revealed that the second-stage model fitting required by the targeted minimum loss approach could be unstable in small samples, leading to extreme results in several data sets.

Figure 2: Comparison of one-step and TMLEs in terms of their Monte Carlo-estimated bias, standard deviation, and mean squared error for the interventional direct ($\psi_A$), indirect ($\psi_{M_1}, \psi_{M_2}$), and covariant ($\psi_{M_1,M_2}$) effects.

The sampling distributions of the centered and scaled estimators were approximately standard normal (Figures 3 and 4), except for the TMLE scaled by an estimated standard error. Confidence intervals based on an oracle standard error came close to nominal coverage at all sample sizes, while those based on an estimated standard error tended to under-cover in small samples.

Figure 3: Illustration of weak convergence and Wald-style confidence intervals based on the one-step estimator. The left two columns show the kernel density estimate of the sampling distribution of the centered estimates of interventional effects scaled by the oracle standard error (left) and by their estimated standard error (middle). In each case, the asymptotic distribution is shown in black. The right panel shows coverage probability of a nominal 95% Wald-style confidence interval based on an oracle standard error (solid triangle) and an estimated standard error (open triangle).

Figure 4: Illustration of weak convergence and Wald-style confidence intervals based on the TMLE. The left two columns show the kernel density estimate of the sampling distribution of the centered estimates of interventional effects scaled by the oracle standard error (left) and by their estimated standard error (middle). In each case, the asymptotic distribution is shown in black. The right panel shows coverage probability of a nominal 95% Wald-style confidence interval based on an oracle standard error (solid circle) and an estimated standard error (open circle).

4.2 Continuous mediators

We examined the impact of discretization of the mediator distributions when in fact the mediators are continuous valued. To that end, we simulated data as follows. Covariates were simulated as above. The treatment variable $A$ given $C = c$ was drawn from a Bernoulli distribution with $g_a(c) = \operatorname{logit}^{-1}(-1 + 0.25 c_1 - 0.5 c_1 c_2)$. Given $C = c, A = a_0$, $M_1$ and $M_2$ were, respectively, drawn from normal distributions with unit variance and mean values $c_1 - 0.5 a_0 c_1$ and $c_2 - c_2 a_0$. As above, Super Learner was used to estimate all nuisance parameters. To accommodate appropriate modeling of the interactions, we replaced the main terms GLM (SL.glm) with a forward stepwise GLM algorithm that included all two-way interactions (SL.step.interaction). The true effect sizes were approximately the same as in the first simulation. We evaluated discretization of each continuous mediator distribution into 5 and 10 evenly spaced bins. For the sake of space, we focus results on the one-step estimator; results for TMLE are included in the supplement.
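The discretization step can be sketched as below. The text specifies evenly spaced bins but not the range they span, so this sketch assumes equal-width bins over the observed range of each mediator. The function name `discretize` and the simulated values are hypothetical.

```python
import random

def discretize(values, n_bins):
    """Map each value to an equal-width bin index in 0..n_bins-1.
    Bins span the observed range of `values` (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum falls on the upper boundary, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

rng = random.Random(1)
m1 = [rng.gauss(0.5, 1.0) for _ in range(200)]   # hypothetical continuous mediator
m1_bins5 = discretize(m1, 5)
m1_bins10 = discretize(m1, 10)
```

The binned mediator can then be handled by the same machinery as a discrete mediator, at the cost of the discretization bias examined in this section.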

Overall, discretization of the continuous mediator distribution had a greater impact on the performance of indirect effect estimators than on that of direct effect estimators (Figures 5 and 6). For the direct effects, oracle confidence intervals for both levels of discretization achieved nominal coverage for all sample sizes considered. For the indirect effects, we found that there was non-negligible bias in the estimates due to the discretization. The impact on confidence interval coverage was minimal in small sample sizes, but led to under-coverage in larger sample sizes. Including more bins generally led to better performance, but these estimates still exhibited bias in the largest sample sizes that impacted coverage. Nevertheless, the performance of the indirect effect estimators was reasonable, with oracle coverage > 90% for all sample sizes.

Figure 5: Illustration of weak convergence and Wald-style confidence intervals based on the one-step estimator that discretized mediator distributions into five bins. The left two columns show the kernel density estimate of the sampling distribution of the centered estimates of interventional effects scaled by the oracle standard error (left) and by their estimated standard error (middle). In each case, the asymptotic distribution is shown in black. The right panel shows coverage probability of a nominal 95% Wald-style confidence interval based on an oracle standard error (solid triangle) and an estimated standard error (open triangle).

Figure 6: Illustration of weak convergence and Wald-style confidence intervals based on the one-step estimator that discretized mediator distributions into ten bins. The left two columns show the kernel density estimate of the sampling distribution of the centered estimates of interventional effects scaled by the oracle standard error (left) and by their estimated standard error (middle). In each case, the asymptotic distribution is shown in black. The right panel shows coverage probability of a nominal 95% Wald-style confidence interval based on an oracle standard error (solid triangle) and an estimated standard error (open triangle).

4.3 Additional simulations

In the Supplementary material, we include several additional simulations studying the impact of the number of levels of the discrete mediator, as well as the impact of inconsistent estimation of the various nuisance parameters. For the former, we found that the results of the simulation were robust to the number of mediator levels in the setting considered. For the latter, we confirmed the multiple robustness properties of the indirect effect estimators by studying the bias and standard deviation of the estimators in large sample sizes under the various patterns of misspecification given in our theorem.

5 Discussion

Our simulations demonstrate adequate performance of the proposed nonparametric estimators of interventional mediation effects in settings with relatively low-dimensional covariates (five, in our simulation). In certain settings, it may only be necessary to adjust for a limited number of covariates to adequately control confounding. For example, in the study of the mediating mechanisms of preventive vaccines using data from randomized trials, we need only adjust for confounders of the mediator/outcome relationship, since other forms of confounding are addressed by the randomized design. Generally, there are few known factors that are likely to impact vaccine-induced immune responses, and so nonparametric analyses may be quite feasible in this case. For example, Cowling et al. [27] studied mediating effects of influenza vaccines, adjusting only for age. Thus, we suggest that interventional mediation estimands and nonparametric estimators thereof may be of interest for studying mediating pathways of vaccines. However, in other scenarios, it may be necessary to adjust for a high-dimensional set of confounders. For example, in observational studies of treatments (e.g., through an electronic health records system), we may require control for a high-dimensional set of putative confounders of treatment and outcome. This may raise concerns related to the curse of dimensionality when utilizing nonparametric estimators. Studying tradeoffs between the selection of various estimation strategies in this context will be an important area for future research.

We have developed an R package intermed with implementations of the proposed methods, which is included in the Supplementary material. The package focuses on implementations for discrete mediators. However, our simulations demonstrate a clear need to extend the software to accommodate adaptive selection of the number of bins in the mediator density estimation procedure for continuous mediators. In small sample sizes, we found that coarse binning leads to adequate results, but as sample size increased, unsurprisingly, there was a need for finer partitioning to reduce bias. In future versions of the software, we will include such adaptive binning strategies, as well as other methods for estimating continuous mediator densities.

The behavior of the TMLE of the covariant effect in the simulation is surprising, as generally we see comparable or better performance of such estimators relative to one-step estimators. This can likely be attributed to the fact that the targeted minimum loss procedure does not yield a compatible plug-in estimator of the vector $\psi$, in the sense that there is likely no distribution $P_n$ that is compatible with all of the various nuisance estimators after the second-stage model fitting. A more parsimonious approach could consider either an iterative targeting procedure or a uniformly least favorable submodel that simultaneously targets the joint mediator density and outcome regression. The former is implemented in a concurrent proposal [28], where one-step and TMLEs of interventional effects are developed for a single mediator when the mediator–outcome relationship is subject to treatment-induced confounding. In their setup, if one can treat the treatment-induced confounder as a second mediator, then their proposal results in an estimate of one component of our indirect effect. In their simulations, they find superior finite-sample performance of the TMLE relative to the one-step estimator, suggesting that targeting the mediator densities may be a more robust approach. However, their simulation involved only binary-valued mediators, so further comparison of these approaches is warranted in settings similar to our simulation, where mediators can take many values. We leave these developments to future work.

The Donsker class assumptions of our theorem could be removed by considering cross-validated nuisance parameter estimates (also known as cross-fitting) [29,30]. This technique is implemented in our R package, but we leave to future research the examination of its impact on estimation and inference. We hypothesize that this approach will generally improve the anti-conservative confidence intervals in small samples, but will have little impact on performance of point estimates in terms of bias and variance.
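The idea of cross-fitting can be sketched generically: each observation's nuisance prediction comes from a model fit on folds that exclude that observation. This is a minimal illustration and not the implementation in the intermed package. The helper names and the toy nuisance estimator (arm-specific outcome means) are hypothetical.

```python
import random

def cross_fit(data, fit, predict, n_folds=5, seed=0):
    """Out-of-fold nuisance predictions: each record is predicted by a
    model trained only on the other folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    out = [None] * len(data)
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in idx if i not in held]
        model = fit(train)
        for i in held_out:
            out[i] = predict(model, data[i])
    return out

# Hypothetical toy nuisance estimator: mean outcome within treatment arm.
def fit_arm_means(train):
    sums = {}
    for a, y in train:
        s, n = sums.get(a, (0.0, 0))
        sums[a] = (s + y, n + 1)
    return {a: s / n for a, (s, n) in sums.items()}

def predict_arm_mean(model, record):
    return model[record[0]]

data = [(i % 2, float(i)) for i in range(20)]    # (treatment, outcome) pairs
preds = cross_fit(data, fit_arm_means, predict_arm_mean)
```

Because a record never contributes to its own prediction, changing that record's outcome leaves its out-of-fold prediction unchanged, which is the property that lets cross-fitting sidestep Donsker-type conditions.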

  1. Funding information: D. Benkeser was funded by National Institutes of Health award R01AHL137808 and National Science Foundation award 2015540. Code to reproduce simulation results is available at https://github.com/benkeser/intermed/tree/master/simulations.

  2. Conflict of interest: Prof. David Benkeser is a member of the Editorial Board in the Journal of Causal Inference but was not involved in the review process of this article.

References

[1] Yuan Y, MacKinnon DP. Bayesian mediation analysis. Psychol Methods. 2009;14(4):301.

[2] Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychol Methods. 2010;15(4):309.

[3] Valeri L, VanderWeele TJ. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods. 2013;18(2):137.

[4] Pearl J. Interpretation and identification of causal mediation. Psychol Methods. 2014;19(4):459.

[5] Naimi AI, Schnitzer ME, Moodie EE, Bodnar LM. Mediation analysis for health disparities research. Am J Epidemiol. 2016;184(4):315–24.

[6] Zheng W, van der Laan MJ. Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. J Causal Infer. 2017;5(2):20160006.

[7] VanderWeele TJ, Tchetgen Tchetgen EJ. Mediation analysis with time varying exposures and mediators. J R Stat Soc B (Statistical Methodology). 2017;79(3):917–38.

[8] Dawid AP. Causal inference without counterfactuals. J Am Stat Assoc. 2000;95(450):407–24.

[9] Pearl J. Direct and indirect effects. In: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann; 2001.

[10] Robins JM, Richardson TS. Alternative graphical causal models and the identification of direct effects. Causality and psychopathology: Finding the determinants of disorders and their cures. Oxford, New York: Oxford University Press; 2011. p. 103–58.

[11] Tchetgen Tchetgen EJ, Phiri K. Bounds for pure direct effect. Epidemiology (Cambridge, Mass.). 2014;25(5):775.

[12] VanderWeele TJ, Vansteelandt S, Robins JM. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology. 2014;25(2):300.

[13] Rudolph KE, Sofrygin O, Zheng W, van der Laan MJ. Robust and flexible estimation of stochastic mediation effects: a proposed method and example in a randomized trial setting. Epidemiol Methods. 2018;7(1):2017007.

[14] Vansteelandt S, Daniel RM. Interventional effects for mediation analysis with multiple mediators. Epidemiology. 2017;28(2):258.

[15] Coyle J, van der Laan MJ. Targeted bootstrap. In Targeted learning for data science. Cham: Springer International Publishing; 2018. p. 523–39. Ch. 28.

[16] Muñoz ID, van der Laan MJ. Population intervention causal effects based on stochastic interventions. Biometrics. 2012;68(2):541–9.

[17] Ibragimov I, Khasminskii R. Statistical estimation: asymptotic theory. New York: Springer-Verlag; 1981.

[18] Bickel P, Klaassen C, Ritov Y, Wellner J. Efficient and adaptive estimation for semiparametric models. Berlin Heidelberg New York: Springer; 1997.

[19] van der Laan M, Rubin DB. Targeted maximum likelihood learning. Int J Biostat. 2006;2(1):11.

[20] van der Laan M, Rose S. Targeted learning: causal inference for observational and experimental data. Berlin Heidelberg New York: Springer; 2011.

[21] Díaz Muñoz I, van der Laan MJ. Super learner based conditional density estimation with application to marginal structural models. Int J Biostat. 2011;7(1):1–20.

[22] Benkeser D, van der Laan MJ. The highly adaptive lasso estimator. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE; 2016. p. 689–96.

[23] Wolpert DH. Stacked generalization. Neural Netw. 1992;5:241–59.

[24] Breiman L. Stacked regressions. Mach Learn. 1996;24:49–64.

[25] van der Laan M, Polley E, Hubbard A. Super learner. Stat Appl Genet Mol. 2007;6(1):25.

[26] Polley E, LeDell E, Kennedy C, van der Laan MJ. SuperLearner: Super Learner Prediction. R package version 2.0-28; 2013. https://CRAN.R-project.org/package=SuperLearner

[27] Cowling BJ, Lim WW, Perera RA, Fang VJ, Leung GM, Peiris JM, et al. Influenza hemagglutination-inhibition antibody titer as a mediator of vaccine-induced protection for influenza B. Clin Infect Dis. 2019;68(10):1713–17.

[28] Díaz I, Hejazi NS, Rudolph KE, van der Laan MJ. Nonparametric efficient causal mediation with intermediate confounders. Biometrika. 2020. doi:10.1093/biomet/asaa085.

[29] Zheng W, van der Laan MJ. Asymptotic theory for cross-validated targeted maximum likelihood estimation. Technical Report 273. Berkeley: Division of Biostatistics, University of California, Berkeley; 2010.

[30] Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, et al. Double/debiased machine learning for treatment and structural parameters. Econom J. 2018;21(1):C1–C68.

Received: 2020-08-19
Revised: 2021-06-28
Accepted: 2021-06-28
Published Online: 2021-07-30

© 2021 David Benkeser and Jialu Ran, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.