Clarifying causal mediation analysis: Effect identification via three assumptions and five potential outcomes

Abstract
Causal mediation analysis is complicated by multiple effect definitions that require different sets of assumptions for identification. This article provides a systematic explanation of such assumptions. We define five potential outcome types whose means are involved in various effect definitions. We address identification of their means (or distributions), starting with the type that requires the weakest assumptions and gradually building up to the one that requires the strongest. This presentation shows clearly why an assumption is required for one estimand and not another, and provides a succinct table from which an applied researcher can pick out the assumptions required to identify the causal effects they target. Using a running example, the article illustrates how to assemble and consider identifying assumptions for a range of causal contrasts. For several contrasts commonly encountered in the literature, this exercise clarifies that identification requires weaker assumptions than those often stated. This attention to detail also highlights differences in the positivity assumption across estimands, which have practical implications. Clarity on the identifying assumptions of these various estimands will help researchers conduct appropriate mediation analyses and interpret the results with caution commensurate with the plausibility of the assumptions.


Introduction
Causal inference analyses, explicitly or implicitly, generally involve three steps: define the target causal effect (also known as the estimand, i.e., what we wish to estimate); assess its identifiability (what assumptions are required to learn this causal effect from observed data, and whether they are likely to hold); and then estimate it (i.e., learn it from data) (Pearl, 2018). If there is concern that an identifying assumption may not hold and the analysis is to proceed, this concern should be addressed, e.g., by adding a sensitivity analysis as a fourth step after estimation, or by building the uncertainty about the assumption into the estimation procedure. While assumptions are part of most statistical analyses, they are especially important when inferring causal effects from observational data, because some assumptions are untestable, and if they do not hold, effects may not even be interpretable. It is important that the researcher conducting an analysis understand the assumptions and judge their plausibility. This paper clarifies for applied researchers the identifying assumptions often invoked in a causal mediation analysis, using a simple setting with a binary exposure A, a single mediator M and a single outcome Y, with appropriate temporal ordering. Even here things are more complicated than in the non-mediation situation. Our goal is to unpack this complexity in a way that is digestible and thus helpful for practice.
Two comments before we proceed. First, this paper focuses on the single-mediator case, where the mediator may be univariate or multivariate (but considered en bloc). This leaves quite a few cases outside the scope of the paper, such as settings with multiple mediators where the effects through each mediator are of interest (VanderWeele et al., 2014; Daniel et al., 2015; Vansteelandt and Daniel, 2017) and settings with repeated exposure and mediator over a longitudinal process (Zheng and van der Laan, 2017; VanderWeele and Tchetgen Tchetgen, 2017). Second, as mediation analysis concerns causal effects, exposure-mediator-outcome temporal ordering is required. Unfortunately, reviews (Vo et al., 2020; Stuart et al., 2021) continue to find many mediation analyses that do not satisfy this minimal requirement, making it all the more important to reiterate. Without appropriate temporal ordering, our effect identification exercise here would be nonsense.

Estimands
In this paper we talk about estimands using the language of potential outcomes (Rubin, 1974) and potential mediator values. There are a variety of causal estimands to choose from, each being a contrast of potential outcomes under two conditions. The estimands addressed in this paper are defined by conditions where the exposure and, in most cases, the mediator are (hypothetically) manipulated, an approach championed by Pearl (2001).1 These include well-known direct and indirect effects of several types, and a range of effects that do not fit a direct or indirect effect label. We give a brief introduction to these estimands here, and return to each one when discussing identification. As the current focus is on identifying (not defining) effects, we refer the reader to the companion article Nguyen et al. (2021) for a detailed discussion of the meaning and relevance of these effect types.
Direct effects reflect the notion of the exposure's influence on the outcome that does not go through the mediator. Each is defined based on some manner of "blocking" the influence that goes through the mediator. With controlled direct effects (Robins and Greenland, 1992; Pearl, 2001), this blocking is done by fixing the mediator to one value (not letting it change in response to the exposure), so a controlled direct effect is the effect of the exposure on the outcome when the mediator is fixed, and it depends on the mediator control value. With natural direct effects (Robins and Greenland, 1992; Pearl, 2001), the blocking is instead done by holding the mediator at the individual's own potential mediator value under one exposure condition. While there are as many controlled direct effects as there are possible mediator values, there are only two natural direct effects, depending on which of the two potential mediator values (under exposure and nonexposure) is used. For the interventional direct effects (Didelez et al., 2006; VanderWeele et al., 2014), it is the mediator distribution (rather than the individual's mediator value) that is fixed, and it is fixed to be the same as a potential mediator distribution conditional on covariates. For a given set of covariates, there is one potential mediator distribution for exposure and another for nonexposure, so there are two interventional direct effects. The generalized direct effects (Didelez et al., 2006; Geneletti, 2007; Nguyen et al., 2021) generalize the different types of direct effects by letting the mediator distribution be held at any relevant distribution, not just a value or a potential mediator distribution.

1 Separate from the common approach of defining causal effects based on conditions where the mediator is manipulated, Robins et al. (2020) take a different approach anchored on the idea of splitting the exposure into components that affect the outcome through different pathways. They define estimands as effects of the split exposures, and develop a theory with graphical rules for identification. That theory is relevant to the research questions addressed in our illustrative example in several ways: the notion of split exposure effects speaks directly to the trimmed intervention question; some insights from the theory are indirectly helpful for the consideration of other interventional effects; and a result from the theory sheds additional light on the identification of natural (in)direct effects. The same research questions from the current illustrative example could alternatively be analyzed using this split exposure theory plus a transportability lens.
Indirect effects reflect the notion of the exposure's influence on the outcome that goes through the mediator. An indirect effect is defined as the effect on the outcome of a switch in the mediator value or distribution from the potential value/distribution under nonexposure to that under exposure (as if in response to a switch in the exposure), while keeping the exposure unchanged. When the switch involves the individual-specific potential mediator values, we have natural indirect effects; when the switch involves the potential mediator distributions (conditional on covariates), we have interventional indirect effects. Each natural indirect effect pairs with a natural direct effect to sum to the total causal effect; in fact, the original motivation for natural (in)direct effects was to split the total effect into path-specific components. Interventional (in)direct effects, in contrast, are not constructed to decompose the total effect. There are no controlled indirect effects.
All the effects above, except the natural (in)direct effects, are part of a class of effects we call interventional effects.An effect in this class is a contrast between an interventional condition where the exposure and/or mediator are manipulated and a comparison condition with a different manipulation (or no manipulation); the sort of manipulation referenced here is one that sets the variable or its distribution to a value/distribution that is known or can be determined.This is a broad class that contains many effects that do not fit the notion of direct or indirect effects.An example is where the exposure is an existing intervention program, but researchers are also interested in the effect of a hypothetically modified intervention program that no longer targets a mediator, relative to the no intervention condition.For more examples, see Nguyen et al. (2021).
For simplicity, causal effects are represented in this paper on the additive scale and in average form -as differences in potential outcome mean between contrasted conditions.Alternatively, other effect scales (e.g., ratio of means) could be used, and other features of the potential outcome distribution (e.g., median) could be contrasted; the same identification assumptions apply.

Nonparametric point identification
The type of identification discussed here, the type commonly encountered in the causal mediation literature, is nonparametric point identification. Let us clarify what this means.
Since each of our estimands (an average causal effect) is a contrast of potential outcomes under two conditions, things would be simple if we could observe both of those potential outcomes for each individual. Unfortunately, at most one potential outcome can be observed for each individual; this is called the fundamental problem of causal inference (Holland, 1986). For each individual we observe the one actual outcome (Y) plus the exposure (A), the mediator (M) and perhaps some other variables, including pre-exposure covariates (C) and other covariates that are affected by exposure (L). The key idea is that, if certain assumptions hold, the estimand can be connected to the observed data distribution, i.e., the distribution of {C, A, L, M, Y}. Specifically, the estimand is equated to a function of features of the observed data distribution (e.g., marginal or conditional means and densities). We then call the estimand identified, or more precisely point identified.2 As a well-known example, in a perfect two-arm randomized controlled trial (RCT), the relevant identifying assumptions (discussed later) hold by design. The average total effect, i.e., the difference between the means of the two potential outcomes (under treatment and control), is identified: it equals the difference in mean observed outcome between the two RCT arms. But the RCT does not guarantee identification of the various other effects mentioned above, because the mediator is not randomized. For those effects, identification requires untestable assumptions that should be carefully considered by the researcher.
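To make the RCT claim concrete, the minimal Python sketch below simulates a hypothetical population in which every individual has both potential outcomes, randomizes exposure by a fair coin, and compares the true average total effect with the difference in observed arm means. The data-generating process (a binary covariate, normal noise, a constant individual effect of 3) and the function name are illustrative assumptions, not part of the paper.

```python
import random
import statistics


def simulate_rct(n=100_000, seed=1):
    """Simulate a perfect two-arm RCT under a hypothetical data-generating process.

    Every simulated individual has both potential outcomes (y0, y1), but only
    the one matching the randomized arm is recorded as the observed outcome.
    """
    rng = random.Random(seed)
    y0_all, y1_all = [], []
    observed = {0: [], 1: []}
    for _ in range(n):
        c = int(rng.random() < 0.5)               # hypothetical baseline covariate
        y0 = 50 + 10 * c + rng.gauss(0, 5)        # potential outcome under control
        y1 = y0 + 3                               # assumed constant individual effect of 3
        a = int(rng.random() < 0.5)               # coin-flip randomization, independent of (y0, y1)
        y0_all.append(y0)
        y1_all.append(y1)
        observed[a].append(y1 if a == 1 else y0)  # consistency: the observed Y is Y_A
    true_te = statistics.fmean(y1_all) - statistics.fmean(y0_all)
    arm_difference = statistics.fmean(observed[1]) - statistics.fmean(observed[0])
    return true_te, arm_difference


true_te, arm_difference = simulate_rct()
print(f"true TE = {true_te:.3f}, difference in arm means = {arm_difference:.3f}")
```

The two numbers agree up to sampling error. Nothing comparable holds for effects that involve the mediator, which this simulated trial does not randomize.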
The identifying assumptions we make do not place any restrictions on the observed data distribution. This means no parametric assumptions, such as the type of distribution (normal or other) of variables or the functional form (linear or other) of the associations among variables. The identifying assumptions we make equate certain conditional means or densities of potential outcomes (or potential mediators) with conditional means or densities of observed variables, the latter being free to be whatever they are. This type of identification is thus called nonparametric identification.3

Note that this paper addresses identification, not estimation. Questions such as what models should be fit, or how much can be learned from a sample of a certain size, belong in the estimation step. To put them aside, it may be helpful to imagine having infinite data.

Effect identification via five potential outcome types and three assumptions
Much has been written about assumptions for identification of specific effects in causal mediation analysis (see e.g., Pearl, 2001;Petersen et al., 2006;Imai et al., 2010;VanderWeele et al., 2014;Robins et al., 2020, etc.).The current paper focuses on a systematic explanation of the assumptions, so that the logic of why an assumption is required for one estimand and not another is clear, and the reader can pick out which assumptions are required for the causal effect they target.
Identifying a causal effect amounts to identifying the two potential outcome means in the contrast. We organize the potential outcomes involved in the various causal effects mentioned above into five types and explain the identifying assumptions required for each, starting with the type that requires the weakest assumptions and gradually building toward the one that requires the strongest, clarifying connections from one type to the next.
We show that identification of the mean (or distribution) of each potential outcome requires three different types of assumptions.We refer to them as consistency, conditional independence and positivity, but note that they have been discussed under various names in the literature.Consistency assumptions (VanderWeele and Vansteelandt, 2009;Cole and Frangakis, 2009) are closely related to Rubin's (1974) stable unit treatment value assumption (SUTVA).Positivity is also known as overlap or common support in the context of identifying causal contrasts.Assumptions in our conditional independence category have been called unconfoundedness, ignorability, exchangeability, conditional randomization, etc. (see e.g., Hernán and Robins, 2019;Imbens, 2000;Imbens and Rubin, 2008;Rosenbaum and Rubin, 1983;VanderWeele and Vansteelandt, 2009).As assumptions in this category are formally stated using conditional independence statements -of certain variables with potential outcomes (potential mediators) -we adopt the shorthand label conditional independence, which allows concise reference to specific assumption components.
A note for readers not familiar with the concept of conditional independence: that variables A and B are independent conditional on (or given) variable C, formally $A \perp\!\!\!\perp B \mid C$, means that within levels of C (i.e., within each subpopulation that shares the same value of C), knowing A tells us nothing about B and vice versa. In our current problem, the place of B is occupied by a potential outcome or potential mediator; the place of A is in most assumptions occupied by the observed exposure or mediator, except in one assumption, where it is occupied by a potential mediator; and the place of C is occupied by covariates.
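A small numerical sketch can make the definition tangible. The hypothetical joint distribution below (all probabilities are made up for illustration) draws A and B independently within each level of a binary C, so A and B are independent given C by construction; yet, because C affects both, A and B are associated once C is ignored. The function names are illustrative only.

```python
# Hypothetical joint distribution: C is a fair coin; given C, A and B are drawn independently.
P_C = {0: 0.5, 1: 0.5}
P_A1_GIVEN_C = {0: 0.2, 1: 0.8}
P_B1_GIVEN_C = {0: 0.3, 1: 0.7}


def joint(a, b, c):
    """P(A = a, B = b, C = c) under the assumed distribution."""
    pa = P_A1_GIVEN_C[c] if a else 1 - P_A1_GIVEN_C[c]
    pb = P_B1_GIVEN_C[c] if b else 1 - P_B1_GIVEN_C[c]
    return P_C[c] * pa * pb


def p_b1_given(a, c=None):
    """P(B = 1 | A = a), or P(B = 1 | A = a, C = c) when c is given."""
    cs = [c] if c is not None else [0, 1]
    num = sum(joint(a, 1, k) for k in cs)
    den = sum(joint(a, b, k) for k in cs for b in (0, 1))
    return num / den


# Ignoring C, knowing A changes the chance that B = 1 ...
print(p_b1_given(1), p_b1_given(0))
# ... but within each level of C it does not.
print(p_b1_given(1, 0), p_b1_given(0, 0), p_b1_given(1, 1), p_b1_given(0, 1))
```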

Illustrative example
After establishing the identifying assumptions specific to each potential outcome type, we apply them to examine the full set of assumptions needed to identify one or more relevant causal effects. We illustrate this using a running fictional example. In this example, of interest is the health of people who have both a psychiatric disorder and chronic medical problems, and the issue is that psychiatric symptoms may pose challenges to the patient's access to and effective use of medical care (Zeber et al., 2009). We restrict our attention to people who are members of a (more or less) organized system for the provision of health care; in the US this could be a health maintenance organization or an accountable care organization. The members all have health insurance coverage, and the system has some ability to enact system-wide changes in the practice of care.
Suppose a local (e.g., state) branch of the system offers an intervention program that aims to improve outcomes for members with a bipolar diagnosis and a chronic medical problem, at no additional charge.(This intervention is loosely based on Kilbourne et al.'s (2008) bipolar disorder medical care model.)It consists of (i) a self-management education component that teaches patients (in group sessions over a three-month period) how to manage chronic psychiatric and medical conditions, improve dietary behavior and physical activity, and communicate better with medical providers; and (ii) a care-management component (starting at about four months) that involves help from a case manager who facilitates the patient's communication with their medical providers and oversees the patient's health care.
For this intervention, we take the outcome to be the patient's health-related quality of life at 18 months after program enrollment (quality of life). Two variables are theorized to be on the causal path: a measure of proficient self-management of psychiatric symptoms at three months and a measure of effective use of medical care services at 12 months. We label these variables symptom management and service use for conciseness, noting that the second variable is about effectiveness (not quantity) of service use. We illustrate causal contrasts of several types, some focusing on the intervention itself, others considering context changes or system-wide practice adjustments.
2 Five potential outcome types
2.1 First type: $Y_a$
$Y_a$ is the potential outcome in a world where exposure is set to a, where a may be 1 (exposed) or 0 (unexposed). We denote its mean by $E[Y_a]$, with the general notation $E[\cdot]$ indicating expectation (or mean). This potential outcome type is relevant to the average total effect, defined as $\text{TE} = E[Y_1] - E[Y_0]$, the difference between the mean potential outcome under exposure and the mean potential outcome under nonexposure. $E[Y_a]$ is also involved in the natural (in)direct effects mentioned earlier, which decompose the total effect. (These will be formally defined shortly.) In addition, $E[Y_a]$ is relevant to effects of all (hypothetical) intervention conditions that are assessed relative to the existing nonexposure condition.

2.2 Second type: $Y_{am}$
This is the potential outcome in the hypothetical world where exposure is set to a and the mediator is set to a specific value m. $Y_{am}$ is relevant to controlled direct effects: formally, the controlled direct effect for mediator control level m is $\text{CDE}(m) = E[Y_{1m}] - E[Y_{0m}]$. This potential outcome type is also the building block for identification of the next potential outcome means, where we consider not just one mediator value but a range of values under a distribution.

2.3 Third type: $Y_{a\mathcal{M}}$ where $\mathcal{M}$ is a known distribution
This is the potential outcome in a hypothetical world where the exposure is set to a and the mediator is intervened upon and set to a distribution $\mathcal{M}$ (which we call the interventional mediator distribution), where $\mathcal{M}$ is either known or defined based on observed data. From the individual perspective, each individual is assigned a mediator value randomly drawn from the distribution $\mathcal{M}$. This is thus a stochastic intervention (Díaz Muñoz and van der Laan, 2012) on the mediator, whereas the intervention corresponding to the second potential outcome type is deterministic.
$Y_{a\mathcal{M}}$ is relevant to generalized direct effects where the mediator distribution is fixed to a distribution $\mathcal{M}$, i.e., $\text{GDE}(\mathcal{M}) = E[Y_{1\mathcal{M}}] - E[Y_{0\mathcal{M}}]$. In addition, it is relevant to a wide range of interventional effects where, in the active intervention condition (or in both contrasted conditions), the potential outcome is of the $Y_{a\mathcal{M}}$ type. Applications 3 and 4 in our illustrative example (see sections 6.5 and 6.6) concern such effects.
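For the simplest case, an unconditional known distribution $\mathcal{M}$ over a discrete mediator, a short derivation shows why $Y_{am}$ serves as the building block. Write $M^*$ for the value drawn from $\mathcal{M}$ and $\mathcal{M}(m)$ for the probability that $\mathcal{M}$ assigns to mediator value m (notation introduced here for illustration). Because the draw is made by the intervention itself, independently of the individual's potential outcomes,

$$\begin{aligned}
E[Y_{a\mathcal{M}}] &= \sum_m P(M^* = m)\, E[Y_{am} \mid M^* = m] = \sum_m \mathcal{M}(m)\, E[Y_{am}], \\
\text{GDE}(\mathcal{M}) &= E[Y_{1\mathcal{M}}] - E[Y_{0\mathcal{M}}] = \sum_m \mathcal{M}(m)\, \text{CDE}(m),
\end{aligned}$$

where $\text{CDE}(m) = E[Y_{1m}] - E[Y_{0m}]$. When $\mathcal{M}$ is defined conditional on covariates, the same argument applies within levels of those covariates, followed by averaging over them.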
Depending on the specific condition of interest, the interventional mediator distribution $\mathcal{M}$ may be defined unconditionally (i.e., the same distribution applies to everyone), conditional on pre-exposure covariates (e.g., different distributions for men and women), or conditional on post-exposure covariates. In the latter case, we require a same-world rule: the distribution $\mathcal{M}$ may be defined conditional on $L_a$ (which arises after exposure has been set to a), but not on $L_{a'}$ (where $a'$ is the other exposure condition) or on the observed L (a mixture of $L_a$ and $L_{a'}$). We can think of this as a plausible interventional world: after exposure is set to a, only $L_a$ arises, so an intervention on the mediator may condition on C and $L_a$.
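The minimal Python sketch below simulates a hypothetical world to show what $E[Y_{a\mathcal{M}}]$ means when $\mathcal{M}$ conditions on $(C, L_a)$: after exposure is set to a, only $L_a$ is generated, and each individual's mediator value is drawn from a known rule that looks only at C and $L_a$, respecting the same-world rule. Every probability, the linear outcome model and the function names are illustrative assumptions; the sketch computes the mean both by exact enumeration and by Monte Carlo simulation.

```python
import random

# Hypothetical data-generating process; all numbers are illustrative assumptions.


def p_l1(a, c):
    """P(L_a = 1 | C = c): the post-exposure covariate under exposure a."""
    return 0.2 + 0.3 * a + 0.2 * c


def q_m1(c, l_a):
    """Known interventional rule P(M* = 1 | C = c, L_a = l_a).

    It may use C and L_a (same world), never L_{a'} or the observed L.
    """
    return 0.2 + 0.3 * c + 0.4 * l_a


def mean_y_am(a, m, c, l_a):
    """E[Y_{am} | C = c, L_a = l_a] under the hypothetical outcome model."""
    return 1 + a + 2 * m + c + l_a


def exact_mean(a):
    """E[Y_{aM}] by enumerating C (a fair coin), L_a and the stochastic mediator draw."""
    total = 0.0
    for c in (0, 1):
        for l in (0, 1):
            p_l = p_l1(a, c) if l else 1 - p_l1(a, c)
            for m in (0, 1):
                p_m = q_m1(c, l) if m else 1 - q_m1(c, l)
                total += 0.5 * p_l * p_m * mean_y_am(a, m, c, l)
    return total


def monte_carlo_mean(a, n=100_000, seed=7):
    """E[Y_{aM}] by simulating individuals in the world where exposure is set to a."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        c = int(rng.random() < 0.5)
        l_a = int(rng.random() < p_l1(a, c))        # only L_a arises in this world
        m_star = int(rng.random() < q_m1(c, l_a))   # random draw from the distribution M
        total += mean_y_am(a, m_star, c, l_a) + rng.gauss(0, 1)
    return total / n


print(exact_mean(1), monte_carlo_mean(1))
print(exact_mean(0), monte_carlo_mean(0))
```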
2.4 Fourth type: $Y_{a\mathcal{M}}$ with $\mathcal{M}$ defined based on potential mediator distribution(s)
This is similar to the third type, except that $\mathcal{M}$ is defined based on potential mediator distribution(s). This type is involved in interventional (in)direct effects. Recall that an interventional direct effect is the effect of exposure on outcome had the mediator distribution been fixed to be the same as a potential mediator distribution, and an interventional indirect effect is the effect on the outcome of a switch in the mediator distribution from the potential distribution under nonexposure to that under exposure while exposure itself is fixed. These are formally
$$\text{IDE}_a = E[Y_{1\mathcal{M}_{a|C}}] - E[Y_{0\mathcal{M}_{a|C}}], \qquad \text{IIE}_a = E[Y_{a\mathcal{M}_{1|C}}] - E[Y_{a\mathcal{M}_{0|C}}],$$
where $\mathcal{M}_{a^*|C}$ ($a^*$ being either 0 or 1) is convenient notation indicating that the interventional mediator distribution $\mathcal{M}$ is defined to be the same as the distribution of the potential mediator $M_{a^*}$ given C. Like the previous potential outcome type, this fourth type is relevant to a wide range of interventional effects, not limited to interventional (in)direct effects. Applications 5 and 6 in the illustrative example (sections 7.7 and 7.8) concern such effects.
With this potential outcome type, we make an important, subtle distinction between the interventional mediator distribution $\mathcal{M}$ and the potential mediator distribution(s): the former is defined based on the latter, and the latter informs the former, but the two are not the same. This allows us to consider a simple potential mediator type, $M_{a^*}$, the potential mediator if exposure were set to $a^*$ ($a^*$ being a separate index that is not tied to a), while retaining ample flexibility in defining the distribution $\mathcal{M}$. For example, $\mathcal{M}$ could be defined based on the distribution of $M_1$ only, or of $M_0$ only (as in the interventional (in)direct effects above), or on a mixture of both. $\mathcal{M}$ could be defined unconditionally, e.g., to be the same as the marginal distribution of $M_{a^*}$, or conditional on C, to be the same as the distribution of $M_{a^*}$ given C. $\mathcal{M}$ could also be defined conditional on $(C, L_a)$ to be the same as the distribution of $M_{a^*}$ given $(C, L_{a^*})$. Note that in all these cases, the interventional mediator distribution $\mathcal{M}$ respects the same-world rule.
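As a hedged illustration of the fourth type, the sketch below takes a hypothetical world in which the conditional distribution of each potential mediator $M_{a^*}$ given a binary C is known, sets $\mathcal{M}$ to match it, and computes the interventional direct effect $E[Y_{1\mathcal{M}_{0|C}}] - E[Y_{0\mathcal{M}_{0|C}}]$ and indirect effect $E[Y_{1\mathcal{M}_{1|C}}] - E[Y_{1\mathcal{M}_{0|C}}]$ by exact enumeration. The probabilities, the outcome model (including its exposure-mediator interaction) and the function names are assumptions chosen for illustration; in practice these distributions would themselves have to be identified from data.

```python
# Hypothetical model; every number below is an illustrative assumption.
P_C1 = 0.4  # P(C = 1) for a binary pre-exposure covariate C


def p_m1(a_star, c):
    """P(M_{a*} = 1 | C = c): the potential mediator distribution given C."""
    return 0.3 + 0.4 * a_star + 0.1 * c


def mean_y_am(a, m, c):
    """E[Y_{am} | C = c], with an exposure-mediator interaction term."""
    return 10 + 2 * a + 3 * m + 1 * a * m + 2 * c


def mean_y_under(a, a_star):
    """E[Y_{aM}] where M is set to the distribution of M_{a*} given C.

    Within each level of C the mediator is drawn from P(M_{a*} | C),
    then the result is averaged over the distribution of C.
    """
    total = 0.0
    for c, p_c in ((0, 1 - P_C1), (1, P_C1)):
        q = p_m1(a_star, c)
        total += p_c * (q * mean_y_am(a, 1, c) + (1 - q) * mean_y_am(a, 0, c))
    return total


# IDE_0 = E[Y_{1 M_{0|C}}] - E[Y_{0 M_{0|C}}];  IIE_1 = E[Y_{1 M_{1|C}}] - E[Y_{1 M_{0|C}}]
ide_0 = mean_y_under(1, 0) - mean_y_under(0, 0)
iie_1 = mean_y_under(1, 1) - mean_y_under(1, 0)
print(f"IDE_0 = {ide_0:.2f}, IIE_1 = {iie_1:.2f}")  # IDE_0 = 2.34, IIE_1 = 1.60
```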
2.5 Fifth type: the cross-world potential outcome $Y_{aM_{a'}}$, with $a \neq a'$
This is the potential outcome in a completely imaginary world where the exposure is set to condition a and then the mediator is set, for each individual, to its potential value under the opposite condition $a'$. There are two potential outcomes of this type, $Y_{1M_0}$ and $Y_{0M_1}$. They decompose the total effect into two pairs of natural (in)direct effects:
$$\text{TE} = \underbrace{E[Y_{1M_0}] - E[Y_0]}_{\text{NDE}_0} + \underbrace{E[Y_1] - E[Y_{1M_0}]}_{\text{NIE}_1} = \underbrace{E[Y_1] - E[Y_{0M_1}]}_{\text{NDE}_1} + \underbrace{E[Y_{0M_1}] - E[Y_0]}_{\text{NIE}_0}.$$

Additional note: The five potential outcome types above are not exhaustive. These types treat the exposure differently from the mediator: the exposure is set to one value (with no randomness), while the mediator is set either to one value or to a distribution (that is, with randomness). There may be cases in which we are interested in a condition where exposure is set to a certain distribution $\mathcal{A}$ instead of a single value 1 or 0 (e.g., where an outreach campaign substantially increases the rate of enrollment in the intervention program but does not make it 100%). We do not consider this potential outcome type separately, because identification for the five listed types renders this type identified; e.g., if the distribution $\mathcal{A}$ is that of 1/3 exposed and 2/3 unexposed, then $E[Y_{\mathcal{A}}] = \tfrac{1}{3}E[Y_1] + \tfrac{2}{3}E[Y_0]$.
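Because cross-world quantities such as $Y_{1M_0}$ can never be observed alongside $Y_1$, a simulation that generates every individual's potential mediators and potential outcomes is a convenient way to check these decompositions numerically. The minimal Python sketch below uses a hypothetical structural model (uniform mediator noise, normal outcome noise, an exposure-mediator interaction) and function names chosen only for illustration, with the natural (in)direct effects written as in the code comments. It confirms that both pairs sum to TE (an algebraic identity, so it holds up to rounding), and that randomly exposing one third of individuals yields a mean close to $\tfrac{1}{3}E[Y_1] + \tfrac{2}{3}E[Y_0]$.

```python
import random


def outcome(a, m, u_y):
    """Hypothetical structural model for the potential outcome Y_{am}."""
    return 1 + 2 * a + 3 * m + 1.5 * a * m + u_y


def simulate(n=100_000, seed=3):
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        u_m, u_y = rng.random(), rng.gauss(0, 1)
        m0, m1 = int(u_m < 0.3), int(u_m < 0.7)             # potential mediators M_0, M_1
        y1, y0 = outcome(1, m1, u_y), outcome(0, m0, u_y)   # Y_1 = Y_{1M_1}, Y_0 = Y_{0M_0}
        exposed = rng.random() < 1 / 3                      # exposure drawn from a 1/3 vs 2/3 distribution
        pop.append({
            "Y1": y1,
            "Y0": y0,
            "Y1M0": outcome(1, m0, u_y),   # cross-world: exposed, mediator as if unexposed
            "Y0M1": outcome(0, m1, u_y),   # cross-world: unexposed, mediator as if exposed
            "YA": y1 if exposed else y0,
        })
    return pop


def mean(pop, key):
    return sum(p[key] for p in pop) / len(pop)


pop = simulate()
te = mean(pop, "Y1") - mean(pop, "Y0")
nde0 = mean(pop, "Y1M0") - mean(pop, "Y0")   # NDE_0 = E[Y_{1M_0}] - E[Y_0]
nie1 = mean(pop, "Y1") - mean(pop, "Y1M0")   # NIE_1 = E[Y_1] - E[Y_{1M_0}]
nde1 = mean(pop, "Y1") - mean(pop, "Y0M1")   # NDE_1 = E[Y_1] - E[Y_{0M_1}]
nie0 = mean(pop, "Y0M1") - mean(pop, "Y0")   # NIE_0 = E[Y_{0M_1}] - E[Y_0]
mixture = mean(pop, "YA")
print(f"TE={te:.3f} NDE0+NIE1={nde0 + nie1:.3f} NDE1+NIE0={nde1 + nie0:.3f}")
print(f"E[Y_A]={mixture:.3f} vs {(mean(pop, 'Y1') + 2 * mean(pop, 'Y0')) / 3:.3f}")
```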

Consistency assumption
Consistency is the type of assumption that connects potential variables to observed variables. Here the assumption is that $Y = Y_a$ if $A = a$. That is, for individuals with actual exposure $A = a$, the observed outcome Y reveals the potential outcome $Y_a$. This may seem like an obvious fact, but it is an assumption; the idea is that the potential outcome $Y_a$ is well defined (Cole and Frangakis, 2009), no matter how exposure value a is assigned to the individual and no matter what exposures are assigned to other individuals (Rubin, 1974). This assumption would be violated if one person's exposure affects others' outcomes (a typical example being vaccination), or if, say, a person's potential outcome under an exposure varies depending on whether they self-select into or are assigned the exposure.

Leveraging covariates
Consistency takes us one step toward our goal; it says that we observe $Y_a$ in some individuals. But there is a missing data problem, as we want the mean of $Y_a$ over the whole population. To handle this problem, the strategy is to leverage a set of observed covariates C such that, conditional on C (i.e., within each subpopulation that shares the same C values), the mean of $Y_a$ is identified from the partially observed data. Consequently, the population mean is identified, as it is a weighted average of the subpopulation means, where the weights reflect the distribution of the covariates. This is written formally as
$$E[Y_a] = E\{E[Y_a \mid C]\}.$$
To identify this conditional mean, in addition to consistency, we need conditional independence and positivity assumptions.
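As a purely hypothetical numerical illustration of this weighted average, suppose C is binary with $P(C = 1) = 0.4$, $E[Y_a \mid C = 0] = 50$ and $E[Y_a \mid C = 1] = 60$ (made-up values). Then

$$E[Y_a] = P(C = 0)\, E[Y_a \mid C = 0] + P(C = 1)\, E[Y_a \mid C = 1] = 0.6 \times 50 + 0.4 \times 60 = 54.$$

The subpopulation means themselves still have to be identified; that is the job of the remaining assumptions.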
To understand these assumptions (formalized shortly), we need to take a close look at what is meant by covariates. For simplicity, we take covariates to be confounders, and use the common cause definition: a confounder of two variables (an exposure and an outcome) is a cause they share. As a common cause, it induces an association between the two variables, which confounds, or confuses, their true causal relationship. For example, education is likely a confounder of the occupation-happiness relationship, as it influences what occupation people have, and may influence happiness in ways other than through occupation. Confounders are only a subset of the variables that can be used to remove confounding, which are called deconfounders (Pearl, 2014). The use of deconfounders that are not simply confounders is an advanced topic we leave out of this paper.
We draw a causal directed acyclic graph (DAG) (Pearl, 2018) in Fig. 1 representing the relationships among the relevant variables. Because the potential outcome $Y_a$ is agnostic to mediators, we can leave mediators out of the DAG and include only the exposure A, the outcome Y, common causes C of these two variables, and arrows representing the causal influences among those variables.5 (In the special case where exposure is randomized, there will be no C, as A and Y do not have common causes.) Also shown are $U_A$ and $U_Y$, causes of A and Y that are not shared; such unique causes are often omitted from DAGs. An important note: the arrow from A to Y in this DAG captures all the influence of A on Y (inclusive of influence through and not through the mediator M); the arrow from C to Y captures all the influence of C on Y except the part that goes through A.

Conditional independence assumption
In most applications, we are not privy to the truth, or the full truth, about confounders. What we try to do is to guess, based on prior knowledge and theory, what the important confounders are, and collect data on them. Then we resort to making the untestable assumption that the covariates C we observe capture all the confounders (of the relationship between A and $Y_a$). This is the gist of the conditional independence (also known as unconfoundedness, exchangeability or ignorability) assumption.
Let us be a bit more formal here to build clarity that will help with the later potential outcome types. For ease of reference, we label this assumption $(I_a)$, with I for "independence" and the subscript a indexing the potential outcome $Y_a$. This assumption is that $Y_a$ is independent of exposure status conditional on covariates, formally
$$Y_a \perp\!\!\!\perp A \mid C, \qquad (I_a)$$
where $\perp\!\!\!\perp$ is the symbol for independence. Intuitively, within each subpopulation that shares the same values on covariates C, individuals are similar enough that whether an individual happens to be exposed or not does not carry any additional information about their $Y_a$. This means we can ignore exposure status when considering $Y_a$. Put another way, within each such subpopulation, exposed and unexposed individuals are exchangeable in the sense that they share the same distribution of $Y_a$. This allows equating the subpopulation mean of $Y_a$ with the mean of $Y_a$ in the $A = a$ group in the subpopulation, formally,
$$E[Y_a \mid C] = E[Y_a \mid C, A = a].$$
What conditional independence allows us to do, here and later in the paper, is what we informally call going from whole to part, or vice versa. Here the whole is the (C-value-specific) subpopulation, and the part is the $A = a$ group in the subpopulation. The beauty of this move is that we do not need to worry about the unobserved $Y_a$ values of the individuals whose actual exposure is not a.
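The sketch below, a simulation under assumed numbers with illustrative function names, shows the "whole to part" move for $a = 1$. Because it is a simulation, $Y_1$ is visible for everyone, which real data never allow. When exposure depends only on the measured C, the mean of $Y_1$ within a C stratum matches its mean among the exposed in that stratum, even though the unstratified comparison is off. Adding a hypothetical variable U that affects both exposure and $Y_1$ but is not in C breaks $(I_1)$, and the two stratum means then diverge.

```python
import random


def simulate(n=100_000, seed=11, unmeasured=False):
    """Return (c, a, y1) rows; y1 is the potential outcome Y_1 (hypothetical model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)        # measured covariate
        u = int(rng.random() < 0.5)        # variable outside C; matters only if unmeasured=True
        shift = 10 * u if unmeasured else 0
        y1 = 40 + 20 * c + shift + rng.gauss(0, 5)
        p_a = 0.2 + 0.6 * c + (0.15 * (2 * u - 1) if unmeasured else 0.0)
        a = int(rng.random() < p_a)
        rows.append((c, a, y1))
    return rows


def mean_y1(rows, c=None, a=None):
    """Mean of Y_1 within the rows matching the given C and/or A values."""
    vals = [y for cc, aa, y in rows
            if (c is None or cc == c) and (a is None or aa == a)]
    return sum(vals) / len(vals)


ok = simulate()                    # (I_1) holds given C
bad = simulate(unmeasured=True)    # (I_1) fails: U is a confounder missing from C
for c in (0, 1):
    print(c, round(mean_y1(ok, c), 2), round(mean_y1(ok, c, a=1), 2),
          round(mean_y1(bad, c), 2), round(mean_y1(bad, c, a=1), 2))
print("ignoring C:", round(mean_y1(ok), 2), round(mean_y1(ok, a=1), 2))
```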
What we now want to do is to replace the potential outcome $Y_a$ on the right-hand side above with the observed Y. Consistency seems to license this directly, since on the right-hand side we are considering $Y_a$ only among those with $A = a$. It turns out, though, that in addition to consistency, we also need the third assumption, positivity.

Positivity assumption
Replacing $E[Y_a \mid C, A = a]$ with $E[Y \mid C, A = a]$ is only legitimate if the latter is well defined for all values of C. This requires the assumption that there is a positive chance of $A = a$ for all C values, formally $P(A = a \mid C) > 0$, where $P(\cdot)$ is the notation for probability or probability density. Combined with consistency, this means that for all C values there is a positive chance of observing $Y_a$. If this is not the case for certain C values, then the mean of $Y_a$ given those values is unidentified, and thus $E[Y_a]$ is unidentified.

Table 1: Identifying assumptions for the five potential outcome means (consistency, conditional independence, positivity of exposure conditions, positivity of mediator values)
Unlike the other two assumptions, positivity is testable, in the sense that given data that have been collected, one could check whether there are parts of the observed covariate distribution where there are no individuals with exposure condition a.
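As a minimal sketch of such an empirical check, assuming discrete covariates, the hypothetical function below lists observed covariate values at which some exposure condition never occurs; the covariate labels in the usage example are also hypothetical. An empty cell in a finite sample may reflect chance rather than a structural violation, and covariate values absent from the data altogether cannot be flagged this way; with continuous or many covariates, one would typically inspect estimated probabilities $P(A = a \mid C)$ instead.

```python
from collections import defaultdict


def positivity_gaps(records, exposure_levels=(0, 1)):
    """Map each observed covariate value to the exposure levels never seen with it.

    records: iterable of (c, a) pairs with a hashable covariate value c.
    Covariate values with every exposure level present are omitted.
    """
    seen = defaultdict(set)
    for c, a in records:
        seen[c].add(a)
    gaps = {}
    for c, levels in seen.items():
        missing = sorted(set(exposure_levels) - levels)
        if missing:
            gaps[c] = missing
    return gaps


# Hypothetical records: (mobility status, 1 = intervention, 0 = usual care)
records = [("mobile", 1), ("mobile", 0), ("mobile", 1),
           ("limited mobility", 0), ("limited mobility", 0)]
print(positivity_gaps(records))   # {'limited mobility': [1]}
```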
For ease of reference, these three assumptions for all of the five potential outcome types are collected in Table 1.

Identification result
The three assumptions combined help identify the conditional mean of $Y_a$,
$$E[Y_a \mid C] = E[Y_a \mid C, A = a] = E[Y \mid C, A = a],$$
and then, using double expectation, we identify the population mean:
$$E[Y_a] = E\{E[Y \mid C, A = a]\}. \qquad (R_a)$$
Here the inner expectation is the mean observed outcome given C among those in the $A = a$ condition, and the outer expectation averages this over the distribution of C. (In identification result labels, R stands for "result".)
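With a discrete C, this identification result translates directly into a plug-in estimator. The minimal Python sketch below, using made-up data and an illustrative function name, averages stratum-specific means of the observed outcome among those with $A = a$, weighting by the empirical distribution of C, and raises an error when a stratum has no one with $A = a$ (an empirical positivity failure). It is one simple estimator for illustration only; estimation is otherwise outside the scope of this paper.

```python
from collections import defaultdict


def g_formula_mean(rows, a):
    """Plug-in estimate of E{E[Y | C, A=a]} from (c, a, y) rows with discrete C."""
    n = len(rows)
    c_count = defaultdict(int)   # stratum sizes, for the outer expectation over C
    y_sum = defaultdict(float)   # sum of Y among A == a, by stratum
    y_n = defaultdict(int)       # number with A == a, by stratum
    for c, exposure, y in rows:
        c_count[c] += 1
        if exposure == a:
            y_sum[c] += y
            y_n[c] += 1
    empty = [c for c in c_count if y_n[c] == 0]
    if empty:
        raise ValueError(f"no observations with A={a} for C in {empty}")
    return sum(c_count[c] / n * (y_sum[c] / y_n[c]) for c in c_count)


# Made-up (c, a, y) rows: C is a binary stratum indicator.
rows = [(0, 1, 10.0), (0, 0, 8.0), (0, 0, 6.0),
        (1, 1, 20.0), (1, 1, 22.0), (1, 0, 15.0)]
print(g_formula_mean(rows, 1), g_formula_mean(rows, 0))  # 15.5 11.0
```

For comparison, the unadjusted mean among the exposed in these rows is about 17.33, because the exposed are concentrated in the higher-outcome stratum.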

Application 1: the total effect
Identifying the average total effect, $\text{TE} = E[Y_1] - E[Y_0]$, involves identifying both potential outcome means. The assumptions required, collected in Table 2, include (i) consistency of both potential outcomes, (ii) conditional independence of exposure with both potential outcomes, and (iii) positivity of both exposure conditions. (i) and (iii) are often written concisely as $Y = AY_1 + (1 - A)Y_0$ and $0 < P(A = 1 \mid C) < 1$, respectively. Under these assumptions, the identification result is
$$\text{TE} = E\{E[Y \mid C, A = 1]\} - E\{E[Y \mid C, A = 0]\}.$$

Table 2: Identifying assumptions for TE

In the example, we consider all effects as average effects over the population of people with a bipolar diagnosis and a chronic medical problem who are in the local branch of the health care system. The total effect is the difference between the potential outcome means under intervention and under usual care, where the means are taken over this population.
The consistency assumption simply says that the observed quality of life in a patient in the intervention group is the same as their potential quality of life under intervention ($Y_1$), and the observed quality of life in a patient in the usual care group is the same as their potential quality of life under usual care ($Y_0$). We assemble the set C of likely confounders (i.e., common causes of intervention participation and quality of life): age, sex, education, occupation, income, psychiatric and medical diagnoses, baseline psychiatric and medical symptoms, baseline quality of life, and baseline measures of self management of symptoms and effective medical care use. The conditional independence assumption says that individuals that share the same values on these C variables also share the same $Y_1$ distribution and the same $Y_0$ distribution, regardless of whether they actually receive the intervention or usual care. If an important confounder, say baseline quality of life, is not included in C, this assumption is violated.
The positivity assumption means that for individuals with any given realization of C, there is a positive chance of receiving the intervention and a positive chance of receiving usual care. If the sample includes individuals with some specific covariate value none of whom participated in the intervention (e.g., patients who could not attend group sessions due to physical mobility challenges), positivity is violated.
A couple of comments: First, in most practical settings, the distinction between the double conditional independence assumption for TE (which involves both $Y_1$ and $Y_0$) and the single version specific to one potential outcome $Y_a$ may not matter, for we would arrive at roughly the same set of covariates. This is fortunate, as substantive considerations tend to be imprecise: we ask what the common causes of A and Y are, rather than of A and $Y_a$ for a specific value a. Note, though, that when we need to identify the mean of $Y_0$ or of $Y_1$ but not both (e.g., the effect of a modified intervention relative to usual care involves $Y_0$ but not $Y_1$), only the single version relevant to the specific potential outcome is required. Second, the double positivity assumption here is called covariate overlap or common support because positivity of both exposure conditions implies that the support of the covariate distribution (i.e., the range of covariate values) is shared between the exposed and unexposed.

Several types of confounders in the mediation setting
Unlike $Y_a$, the other four potential outcome types correspond to conditions where both the exposure and the mediator are manipulated. Since the mediator is manipulated, the relevant DAG is expanded from the one shown in Fig. 1. It includes exposure A, outcome Y, mediator of interest M, and covariates that are common causes of any two of these variables. Intuitively, we can think of the covariates as consisting of confounders of the exposure-mediator, exposure-outcome and mediator-outcome relationships. (And in the special case where exposure is randomized, there are only mediator-outcome confounders.) For any application, the actual DAG (the one that informs the analysis) may get rather complicated, with multiple confounders and complex causal relationships. For the sake of explaining the identifying assumptions, the causal mediation literature commonly uses a shorthand DAG of the form in Fig. 2 (e.g., VanderWeele et al., 2014).
Here covariates are represented by C and L, which differ in whether they are influenced by exposure: L is, but C is not. Relating this to the relationships subject to confounding, exposure-mediator and exposure-outcome confounders all belong in C, because they influence exposure rather than the other way around. Mediator-outcome confounders are split between C and L: those not influenced by exposure (regardless of whether they influence exposure or not) are in C, while those influenced by exposure are in L. We call this DAG shorthand because C is a collection of different (although overlapping) types of variables. Precisely, of the four arrows depicted as emitting from the node C
Note that the current set C is larger than the set C in Fig. 1. The set in Fig. 1 consists of exposure-outcome confounders (which include exposure-mediator confounders), but does not include mediator-outcome confounders that do not influence exposure. While reusing the label C here is an abuse of notation, it is not problematic because the current set C also works for the identifying assumptions of $E[Y_a]$. In fact, there are a couple of other places where either the current set C or a subset of it could be used in an assumption. To keep presentation simple, we use C when stating the assumptions, but also note subsets that could replace C where appropriate.
These C variables are commonly referred to as pre-exposure confounders or pre-exposure covariates. This terminology is somewhat imprecise, as mediator-outcome confounders in C do not necessarily precede exposure either in time or in the causal structure; the key point is that they are not influenced by exposure. L variables are often called post-exposure confounders or intermediate confounders. The latter label signals that they are intermediate variables, i.e., also mediators of the effect of A on Y, albeit not the mediator of interest (M). Another way to think of L variables is that they are variables on the causal pathway from A to M that happen to also influence Y in ways that are not through M.
In the illustrative example with the intervention for bipolar patients, depending on the specific research question, one may take symptom management, or service use, or the combination of both, as the mediator of interest M. Whichever the choice, the covariates need to be expanded to cover the different types of confounders. In principle, exposure-mediator confounders should already be part of the exposure-outcome confounders selected when considering the total effect, but it is helpful to double-check. Also, mediator-outcome confounders may need to be added (as C and L variables); and this should be thought through for the specific mediator being considered.
If symptom management is taken as M, baseline general health-related self-efficacy may be an important common cause of symptom management and quality of life. We thus add this variable to the pre-exposure covariate set C. Since symptom management is measured early on (at completion of the self-management intervention component), we are confident that there are no important post-exposure confounders.
If service use is taken as M, then symptom management is likely an intermediate confounder (an L variable). In addition, we add to the set C a variable indicating whether the patient had an annual checkup in the previous year, which is believed to reflect a baseline tendency to use medical services for self care, a likely mediator-outcome confounder. A variable indicating the patient's specific health insurance plan is also added to the set C.

Consistency assumption
For $E[Y_{am}]$, this assumption is simply that $Y = Y_{am}$ if A = a and M = m (that is, in individuals with actual exposure a and actual mediator value m, their observed Y reveals their potential outcome $Y_{am}$), for a and m being the specific exposure and mediator values that define the potential outcome.

Conditional independence assumption
Recall that $E[Y_a]$ identification requires exposure-outcome conditional independence. Identification of $E[Y_{am}]$, which corresponds to a condition where not only the exposure but also the mediator is manipulated, requires an assumption of both exposure-outcome and mediator-outcome conditional independence, where outcome refers to $Y_{am}$. Specifically,
$$A \perp\!\!\!\perp Y_{am} \mid C, \qquad (I_{am}\text{-ay})$$
$$M \perp\!\!\!\perp Y_{am} \mid C, L, A = a. \qquad (I_{am}\text{-my})$$
Here the exposure-outcome component ($I_{am}$-ay) says that within levels of C, individuals are similar enough that their actual exposure condition provides no information about $Y_{am}$. The mediator-outcome component ($I_{am}$-my) says that among those with exposure A = a, within levels of the combination of covariates {C, L}, individuals are similar enough that the actual mediator value M does not provide any additional information about $Y_{am}$. (In other words, with appropriate conditioning, exposure is as good as randomized and so is mediator value.) The exposure-outcome component is similar to assumption ($I_a$), and like ($I_a$), it is satisfied by design if exposure is randomized. Since the mediator is not randomized, the mediator-outcome component is always an untestable assumption.
Relating to other terminology, this assumption (essentially unconfoundedness of $Y_{am}$) includes no unobserved exposure-outcome and mediator-outcome confounding, where C captures all A-$Y_{am}$ confounders, and C and L capture all M-$Y_{am}$ confounders for those with A = a. The two components of this assumption could also be referred to as ignorability of exposure assignment, and ignorability of mediator value assignment, for $Y_{am}$. In exchangeability terms, the distribution of $Y_{am}$ is exchangeable between exposure conditions conditional on C, and is exchangeable across mediator values within the A = a group conditional on C, L.

Positivity assumption
Because $Y_{am}$ corresponds to a condition in which both exposure and mediator are manipulated, the positivity assumption includes both positivity of exposure condition a and positivity of mediator value m. The latter is formally $P(M = m \mid C, L, A = a) > 0$. That is, among those with A = a, the chance of M = m is positive for any combination of values that covariates {C, L} may take.
A side note about L: In both the mediator conditional independence and mediator positivity assumptions above, L is part of the conditioning set. We will see that for this and the next two potential outcome types (and all the interventional effects that involve them), identification is possible with intermediate confounders L, but for the fifth potential outcome type (and the natural (in)direct effects that involve it) the presence of L generally results in nonidentifiability.
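The mediator positivity condition $P(M = m \mid C, L, A = a) > 0$ stated above can be screened in the same spirit as exposure positivity. The sketch below assumes discrete C and L; the helper name and toy values are hypothetical. It returns the (C, L) strata among those with A = a in which no one has M = m. Only strata that are observed with A = a can be checked this way.

```python
from collections import defaultdict


def mediator_positivity_gaps(records, a, m):
    """Among records with A == a, list observed (C, L) strata where no one has M == m.

    Empirical screen for P(M = m | C, L, A = a) > 0 with discrete C and L.
    Hypothetical helper: records are dicts with keys "C", "L", "A", "M".
    """
    has_m = defaultdict(bool)
    for r in records:
        if r["A"] != a:
            continue  # the condition is specific to exposure condition a
        key = (r["C"], r["L"])
        has_m[key] = has_m[key] or r["M"] == m
    return sorted(k for k, ok in has_m.items() if not ok)


# Hypothetical toy data on a 0-to-5 service use scale, with L = symptom management.
data = [
    {"C": "profile_1", "L": "good", "A": 1, "M": 5},
    {"C": "profile_1", "L": "poor", "A": 1, "M": 3},
    {"C": "profile_1", "L": "poor", "A": 0, "M": 5},
]
print(mediator_positivity_gaps(data, a=1, m=5))  # [('profile_1', 'poor')]
print(mediator_positivity_gaps(data, a=0, m=5))  # []
```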

Identification
Under consistency, $Y_{am}$ values are observed in individuals with A = a and M = m. To see how the other assumptions then bridge to the population mean of $Y_{am}$, it is easier to first consider the special case with no intermediate confounders. In this special case, ($I_{am}$-my) reduces to $M \perp\!\!\!\perp Y_{am} \mid C, A = a$, and the argument involves going from whole to part twice. Again, consider a subpopulation of individuals that share the same values of C. In this subpopulation, the overall mean of $Y_{am}$ is equal to the mean of $Y_{am}$ in those with A = a (because exposure status is ignorable for $Y_{am}$ under ($I_{am}$-ay)), which in turn is equal to the mean of $Y_{am}$ among those with A = a and M = m (because mediator value is ignorable for $Y_{am}$ among those with A = a under ($I_{am}$-my)). Formally,
$$E[Y_{am} \mid C] = E[Y_{am} \mid C, A = a] = E[Y_{am} \mid C, A = a, M = m] = E[Y \mid C, A = a, M = m],$$
so that $E[Y_{am}] = E_C\{E[Y \mid C, A = a, M = m]\}$.
In the general case with intermediate confounders (L), the bridging involves additional steps similar in nature to those above. We leave the details to the Appendix and just consider the result here. As the outcome depends on L in addition to exposure, mediator, and C, the inner expectation now conditions on both C and L, becoming $E[Y \mid C, L, A = a, M = m]$, the mean observed outcome for those with A = a, M = m within levels of the combination of {C, L}. And instead of the double expectation, we have a triple expectation that averages over L before averaging over C, where the distribution of L that is averaged over is that of those with A = a, within levels of C. Our general identification result is
$$E[Y_{am}] = E_C\big\{E_{L \mid C, A = a}\{E[Y \mid C, L, A = a, M = m]\}\big\}. \qquad (R_{am})$$
There are two observations that are technical (and non-essential) but may provide additional insight. First, since $E[Y_{am}]$ concerns only one mediator value, m, ($I_{am}$-my) may be simplified by replacing M with a dichotomized version of this variable indicating whether it is equal to m or not. We can thus think about $Y_{am}$ as the potential outcome in a condition that intervenes on two binary variables. In most applications this likely makes no difference to which variables are included in C and L.
However, this simplification may make the assumption more meaningful, as it provides a parallel: ($I_{am}$-ay) says that within levels of C, individuals are similar enough that whether they receive A = a or not does not carry any information about $Y_{am}$; and the simplified ($I_{am}$-my) says that in the A = a condition, within levels of the combination of {C, L}, individuals are similar enough that whether they receive M = m or not does not tell us anything about $Y_{am}$.
Second, C here may be replaced by a subset of C consisting of variables that directly influence L and/or Y (labeled $C_{LY}$), leaving out variables with no arrows to L and Y (labeled $C_{\cancel{LY}}$). The reason is simple: when targeting $Y_{am}$ we need to deal with confounders of the relationship between the combination {A, M} and $Y_{am}$, and variables in $C_{\cancel{LY}}$ influence the former but do not influence the latter (other than through the former), so they do not confound this relationship. Intuitively, variables that are included in C only because they are exposure-mediator confounders may be ignored for the purpose of identifying $E[Y_{am}]$.
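A plug-in version of the triple expectation $E[Y_{am}] = E_C\{E_{L \mid C, A = a}\{E[Y \mid C, L, A = a, M = m]\}\}$ may help show which part of the data enters each layer. This is a minimal sketch assuming discrete C and L and dict records; the function name and toy data are hypothetical. The distribution of L is taken from everyone with A = a in a given C stratum, while the innermost mean uses only those with both A = a and M = m.

```python
from collections import Counter, defaultdict


def plug_in_mean_Yam(records, a, m):
    """Plug-in estimate of E[Y_am] by the triple expectation:
    sum_c P(c) * sum_l P(l | c, A=a) * E[Y | c, l, A=a, M=m].

    Hypothetical helper for discrete C and L; records are dicts with keys
    "C", "L", "A", "M", "Y". With no intermediate confounders, give L a constant value.
    """
    n = len(records)
    p_c = Counter(r["C"] for r in records)   # P(C = c) is p_c[c] / n
    l_counts = defaultdict(Counter)          # counts of L within (C, A = a)
    y_sum, y_cnt = defaultdict(float), defaultdict(int)
    for r in records:
        if r["A"] != a:
            continue
        l_counts[r["C"]][r["L"]] += 1
        if r["M"] == m:
            y_sum[(r["C"], r["L"])] += r["Y"]
            y_cnt[(r["C"], r["L"])] += 1
    total = 0.0
    for c, k in p_c.items():
        n_ca = sum(l_counts[c].values())
        if n_ca == 0:
            raise ValueError(f"exposure positivity fails in C={c!r}")
        inner = 0.0
        for l, k_l in l_counts[c].items():
            if y_cnt[(c, l)] == 0:
                raise ValueError(f"mediator positivity fails in C={c!r}, L={l!r}")
            # E[Y | c, l, A=a, M=m], averaged over L given (C = c, A = a)
            inner += (k_l / n_ca) * (y_sum[(c, l)] / y_cnt[(c, l)])
        total += (k / n) * inner
    return total


data = [
    {"C": 0, "L": 0, "A": 1, "M": 5, "Y": 10},
    {"C": 0, "L": 1, "A": 1, "M": 5, "Y": 20},
    {"C": 0, "L": 1, "A": 1, "M": 3, "Y": 12},
    {"C": 0, "L": 0, "A": 0, "M": 2, "Y": 5},
]
print(round(plug_in_mean_Yam(data, a=1, m=5), 3))  # 16.667
```

Note that the record with M = 3 still contributes to the distribution of L given (C, A = 1), even though it does not enter the innermost mean.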

Application 2: a controlled direct effect
Now suppose that our local branch of the health care system receives communication from headquarters that leadership is considering a system-wide revamping of standard operating procedures which would incorporate substantial support for the care of medical problems in people with psychiatric disorders, through the use of a range of case management, provider education, integrated patient records and enhanced linkage network solutions. The communication also says that the expected result of this practice change is to obtain the "maximal effectiveness in use of medical services for care of chronic medical conditions".
Such a system change would have many implications which branch management has to consider. One of the first questions asked is: if left as is, what would be the effect of the intervention for bipolar patients in the new context, where (based on the expected result of the system change) the service use variable takes the highest value (5 on a 0-to-5 scale) for all bipolar patients? Treating this variable as the mediator (M), the question points to the controlled direct effect $\text{CDE}(5) = E[Y_{1,5}] - E[Y_{0,5}]$ for mediator control level 5. (There is, however, doubt about whether this high level is realistic.) CDE(5) identification requires identifying the means of $Y_{1,5}$ and $Y_{0,5}$, with assumptions collected in Table 3. Note that within the conditional independence assumption, the mediator-outcome component is exposure congruent. It is among patients in the intervention program (A = 1) that we assume service use (M) is ignorable for the potential outcome under intervention-and-highest-service-use ($Y_{1,5}$) given baseline covariates (C) and symptom management (L). Similarly, conditional independence of M with $Y_{0,5}$ is required among patients in usual care (A = 0) only.
The positivity of mediator value assumption, written concisely $P(M = 5 \mid C, A, L) > 0$, means that in either exposure condition (intervention or usual care) patients with any realized values of {C, L} had a positive chance of scoring 5 on service use. This assumption is violated if there is any subpopulation defined by C, L, A values whose range of this variable does not include level 5.
These assumptions in Table 3 are weaker than assumptions often stated for controlled direct effects in the literature (e.g., VanderWeele and Vansteelandt, 2009) in two ways. First, our assumptions involve only the specific mediator control level (m = 5), while assumptions in the literature cover all possible mediator values. Such blanket assumptions are less likely to hold, and are only needed to identify the collection of controlled direct effects corresponding to every mediator level. Second, the mediator-outcome conditional independence assumption in the literature is not exposure specific, e.g., $M \perp\!\!\!\perp Y_{1,5} \mid C, L, A$, which must hold in both exposure conditions, as opposed to $M \perp\!\!\!\perp Y_{1,5} \mid C, L, A = 1$. We only require the latter.
Table 3: Identifying assumptions for CDE(5)
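Under the assumptions in Table 3, CDE(5) is identified by
$$\text{CDE}(5) = E_C\big\{E_{L \mid C, A = 1}\{E[Y \mid C, L, A = 1, M = 5]\}\big\} - E_C\big\{E_{L \mid C, A = 0}\{E[Y \mid C, L, A = 0, M = 5]\}\big\}.$$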

Identification of $E[Y_{a\mathcal{M}}]$ where $\mathcal{M}$ is a known distribution
The assumptions that identify $E[Y_{a\mathcal{M}}]$ for a known distribution $\mathcal{M}$ build on those that identify $E[Y_{am}]$. The key extension is that the assumptions are now required to hold for all values m in the support of the distribution $\mathcal{M}$. Note that if the interventional mediator distribution $\mathcal{M}$ is defined conditional on covariates, the range of m values depends on the covariates. For example, if $\mathcal{M}$ represents an intervention that differentiates by sex, then the range of m for which the assumptions must hold may differ between men and women.

Consistency assumption
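Consistency of the potential outcome now means $Y = Y_{am}$ if A = a and M = m for all relevant values m in the support of the distribution $\mathcal{M}$. In addition, if the distribution $\mathcal{M}$ is defined conditional on post-exposure covariates $L_a$, then consistency of $L_a$ is also required, that is, $L = L_a$ if A = a.

Conditional independence assumption
For all relevant values m in the support of the distribution $\mathcal{M}$,
$$A \perp\!\!\!\perp Y_{am} \mid C, \qquad (I_{a\mathcal{M}}\text{-ay})$$
$$M \perp\!\!\!\perp Y_{am} \mid C, L, A = a. \qquad (I_{a\mathcal{M}}\text{-my})$$
This assumption calls for conditional independence of exposure and mediator with not just one potential outcome corresponding to a single mediator value, but a collection of potential outcomes corresponding to the range of mediator values from the distribution $\mathcal{M}$. Like the previous case, C here may be replaced by the subset $C_{LY}$.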

Positivity assumption
This assumption includes both positivity of exposure condition a and positivity of relevant mediator values. The latter is $P(M = m \mid C, L, A = a) > 0$ for all values m in the support of $\mathcal{M}$. This means that among those with exposure a, within each subpopulation that shares the same {C, L} values, the actual range of mediator values has to cover the range of values prescribed by the interventional distribution $\mathcal{M}$. If not, this assumption is violated.
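The coverage requirement can be checked stratum by stratum. In this sketch (hypothetical helper and data, discrete C and L, observed values standing in for population supports), `target_support` maps each covariate pattern to the set of mediator values that the interventional distribution prescribes for it, so the required range can differ across C, as in the covariate-dependent case described above.

```python
from collections import defaultdict


def coverage_gaps(records, a, target_support):
    """Among A == a, check that each observed (C, L) stratum contains every
    mediator value the interventional distribution prescribes for its C.

    `target_support[c]` is the set of m values with positive probability given
    C = c (assumed to have an entry for every observed c).
    Returns {(c, l): sorted missing values} for the strata that fail.
    """
    observed = defaultdict(set)
    for r in records:
        if r["A"] == a:
            observed[(r["C"], r["L"])].add(r["M"])
    gaps = {}
    for (c, l), seen in observed.items():
        missing = set(target_support[c]) - seen
        if missing:
            gaps[(c, l)] = sorted(missing)
    return gaps


# Hypothetical prescribed support of service use, by health insurance plan.
target = {"plan_A": {3, 4, 5}, "plan_B": {2, 3}}
data = [
    {"C": "plan_A", "L": "poor", "A": 1, "M": 3},
    {"C": "plan_A", "L": "poor", "A": 1, "M": 4},
    {"C": "plan_A", "L": "good", "A": 1, "M": 3},
    {"C": "plan_A", "L": "good", "A": 1, "M": 4},
    {"C": "plan_A", "L": "good", "A": 1, "M": 5},
    {"C": "plan_B", "L": "poor", "A": 1, "M": 2},
    {"C": "plan_B", "L": "poor", "A": 1, "M": 3},
]
print(coverage_gaps(data, a=1, target_support=target))  # {('plan_A', 'poor'): [5]}
```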

Identification result
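The identification result is an extension of the triple expectation ($R_{am}$) to a quadruple expectation that involves averaging over the distribution $\mathcal{M}$ in addition to averaging over L and C:
$$E[Y_{a\mathcal{M}}] = E_C\Big\{E_{L \mid C, A = a}\Big\{E_{m \sim \mathcal{M}}\big\{E[Y \mid C, L, A = a, M = m]\big\}\Big\}\Big\}, \qquad (R_{a\mathcal{M}})$$
where m is drawn from $\mathcal{M}$ given whichever covariates $\mathcal{M}$ is defined on.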
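A minimal plug-in sketch of this quadruple expectation follows. It assumes discrete C, L and M, and a known interventional distribution that depends on C only, supplied as `q[c]` (a dict of probabilities); all names and data are hypothetical. The toy call uses a = 0, so only usual-care records enter the estimate.

```python
from collections import Counter, defaultdict


def plug_in_mean_YaM(records, a, q):
    """Plug-in estimate of E[Y_aM] for a known interventional distribution:
    sum_c P(c) sum_l P(l | c, A=a) sum_m q[c][m] E[Y | c, l, A=a, M=m].

    Hypothetical helper. Assumes discrete C, L, M, dict records with keys
    "C", "L", "A", "M", "Y", and that q[c] (a dict {m: probability}) depends on C only.
    """
    n = len(records)
    p_c = Counter(r["C"] for r in records)
    l_counts = defaultdict(Counter)                 # L within (C, A = a)
    y_sum, y_cnt = defaultdict(float), defaultdict(int)
    for r in records:
        if r["A"] == a:
            l_counts[r["C"]][r["L"]] += 1
            key = (r["C"], r["L"], r["M"])
            y_sum[key] += r["Y"]
            y_cnt[key] += 1
    total = 0.0
    for c, k in p_c.items():
        n_ca = sum(l_counts[c].values())
        if n_ca == 0:
            raise ValueError(f"no one with A={a} in C={c!r}")
        for l, k_l in l_counts[c].items():
            for m, q_m in q[c].items():
                if q_m == 0:
                    continue  # outside the support of the prescribed distribution
                if y_cnt[(c, l, m)] == 0:
                    raise ValueError(f"positivity fails at C={c!r}, L={l!r}, M={m!r}")
                mu = y_sum[(c, l, m)] / y_cnt[(c, l, m)]  # E[Y | c, l, A=a, M=m]
                total += (k / n) * (k_l / n_ca) * q_m * mu
    return total


q = {"c0": {2: 0.25, 3: 0.75}}  # hypothetical prescribed distribution given C
data = [
    {"C": "c0", "L": 0, "A": 0, "M": 2, "Y": 4},
    {"C": "c0", "L": 0, "A": 0, "M": 3, "Y": 6},
    {"C": "c0", "L": 1, "A": 0, "M": 2, "Y": 5},
    {"C": "c0", "L": 1, "A": 0, "M": 3, "Y": 9},
]
print(plug_in_mean_YaM(data, a=0, q=q))  # 6.75
```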

Application 3: a generalized direct effect
Additional communication from headquarters later clarified that the earlier statement about "maximal effectiveness in use of medical services" meant matching the effective service use level of otherwise similar patients who are without psychiatric disorders, not the highest possible level 5. This means that we are drawing information from the observed effective service use distribution in non-psychiatric patients, conditional on key covariates (age, sex, education, occupation, income, health insurance plan, medical diagnoses, previous year annual checkup). Instead of the controlled direct effect CDE(5), we are now considering the generalized direct effect $\text{GDE}(\mathcal{M}) = E[Y_{1\mathcal{M}}] - E[Y_{0\mathcal{M}}]$, where $\mathcal{M}$ is defined to be the same as that observed distribution. The identifying assumptions for GDE($\mathcal{M}$) are the same as those for CDE(5) in Table 3, except that the service use score 5 is replaced with all values m from the support of the distribution $\mathcal{M}$ (the range of the service use variable observed in non-psychiatric patients). This range is covariate-dependent, e.g., it may be different for people on different health insurance plans, or for people who did versus did not have a checkup in the previous year.
While the assumptions here are more complex than those for CDE(m) where m is a single value, practical considerations of the conditional independence assumption tend to be similar: seeking, in broad terms, exposure-mediator, exposure-outcome and mediator-outcome confounders. The positivity of relevant mediator values assumption, however, deserves attention. It requires that for any (C, L) pattern, regardless of exposure condition, the observed range of mediator values covers the range given that (C, L) pattern in the distribution $\mathcal{M}$. In the current example, since $\mathcal{M}$ is defined based on the distribution in the non-psychiatric population, this means that within (C, L) levels, (i) the effective service use score range in bipolar patients in intervention and (ii) the corresponding range in bipolar patients in usual care both cover (iii) the range of this variable in non-psychiatric patients.

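Under these assumptions, GDE($\mathcal{M}$) is identified by
$$\text{GDE}(\mathcal{M}) = E_C\Big\{E_{L \mid C, A = 1}\big\{E_{m \sim \mathcal{M} \mid C}\{E[Y \mid C, L, A = 1, M = m]\}\big\}\Big\} - E_C\Big\{E_{L \mid C, A = 0}\big\{E_{m \sim \mathcal{M} \mid C}\{E[Y \mid C, L, A = 0, M = m]\}\big\}\Big\}.$$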
A side note: While this GDE($\mathcal{M}$) is a useful contrast to consider, branch management notes that, as an approximation of an anticipated situation, it has an important limitation. If the branch's intervention is effective in helping bipolar patients with symptom management, one would expect that the effective service use distribution would not be exactly the same with or without the intervention. (A patient with better managed bipolar symptoms might benefit more from support, e.g., because they are more likely to answer the calls of a case manager.) This is a general limitation of controlled/generalized direct effects. It is hard to match them to plausible situations where in both the exposed and unexposed conditions the mediator (a variable that naturally is affected by exposure) could be fixed to one value or set to the same distribution.
Generalized direct effects are just one of many types of contrasts that involve this potential outcome type.Let us examine another simple example.

Application 4: effect of a not yet implemented program
After a round of consultation between headquarters and branches, it is decided that more research needs to be done before deciding whether to adopt the sweeping system change. One question is what would be its effect on health and well-being, assuming the above-mentioned result of eliminating the difference between psychiatric and non-psychiatric patients in terms of effective use of services for chronic medical problems. Our local branch decides to look into this potential effect for our population of bipolar patients. For a rough answer, we use data from the usual care condition, and consider the contrast $\tau_1 = E[Y_{0\mathcal{M}}] - E[Y_0]$, where $\mathcal{M}$ is defined to be the same as in Application 3.
Identification of $\tau_1$ requires identifying the means of $Y_{0\mathcal{M}}$ and $Y_0$. Table 4 collects all the required assumptions (from the relevant sections above, or relevant rows of Table 1). These assumptions are arguably weaker than those for GDE($\mathcal{M}$), because for $\tau_1$ we do not need to identify $E[Y_{1\mathcal{M}}]$. Under these assumptions,
$$\tau_1 = E_C\Big\{E_{L \mid C, A = 0}\big\{E_{m \sim \mathcal{M} \mid C}\{E[Y \mid C, L, A = 0, M = m]\}\big\}\Big\} - E_C\{E[Y \mid C, A = 0]\},$$
where the first term is the same as the second term in the result for GDE($\mathcal{M}$). To help the reader easily spot this and several other connections among the identification results of the various effects considered in the illustrative example, we gather all those results in Table 5. The table also shows simplified results in the special case with no intermediate confounders.
After examining data from and consulting with branches, and after serious consideration of logistics, costs and benefits, leadership drops the plan for the system change. This concludes a chapter of our story. In the next sections we will pay more attention to the intervention program for bipolar patients at our local branch.
Table 5: Identification results for the effects in the applications (top panel; see assumptions in relevant sections) and their simplification in the special case with no L (bottom panel)

Identification of $E[Y_{a\mathcal{M}}]$ where $\mathcal{M}$ is defined based on potential mediator distribution(s)
This case inherits the same assumptions and the same result ($R_{a\mathcal{M}}$) from the previous case. The only complication is that ($R_{a\mathcal{M}}$) involves averaging over the distribution $\mathcal{M}$, which in the current case is not known a priori. This means the distribution $\mathcal{M}$ itself needs to be identified, which requires identification of the potential mediator distribution(s) used in the definition of $\mathcal{M}$. This adds components to the three assumptions.
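As a notation reminder, we use $M_{a^*}$ with a generic index $a^*$ to denote a potential mediator, i.e., the mediator value under exposure condition $a^*$. Three cases are distinguished: (i) $\mathcal{M}$ is the distribution of $M_{a^*}$; (ii) $\mathcal{M}$ is defined conditional on C to be the same as the distribution of $M_{a^*}$ given C; and (iii) $\mathcal{M}$ is defined conditional on $(C, L_a)$ to be the same as the distribution of $M_{a^*}$ given $(C, L_{a^*})$.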

Conditional independence assumption
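Like before, exposure-outcome and mediator-outcome conditional independence is assumed: for all relevant values m in the support of the distribution $\mathcal{M}$,
$$A \perp\!\!\!\perp Y_{am} \mid C, \qquad (I_{a\mathcal{M}}\text{-ay})$$
$$M \perp\!\!\!\perp Y_{am} \mid C, L, A = a. \qquad (I_{a\mathcal{M}}\text{-my})$$
In addition, exposure-mediator conditional independence is assumed:
$$A \perp\!\!\!\perp M_{a^*} \mid C. \qquad (I_{a\mathcal{M}}\text{-am})$$
Unlike when the distribution $\mathcal{M}$ was known, here no subset of C can replace C in all three components of this conditional independence assumption.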

Positivity assumption
Like before, positivity of exposure condition a and positivity of relevant mediator values are required, i.e., $P(A = a \mid C) > 0$ and $P(M = m \mid C, L, A = a) > 0$ for all m values in the support of the distribution $\mathcal{M}$. In addition, positivity of exposure condition $a^*$ (for all levels of C) is required, i.e., $P(A = a^* \mid C) > 0$. In case (iii), if $a^* \neq a$, this is replaced by the stronger assumption of positivity of exposure condition $a^*$ for all levels of (C, L), i.e., $P(A = a^* \mid C, L) > 0$.

Identification result
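The identification result is of the same form as that for when $\mathcal{M}$ is a known distribution, ($R_{a\mathcal{M}}$), except that the distribution $\mathcal{M}$ here is not known but is identified as a function of the observed data distribution. In cases (ii) and (iii), this result can be written out in a simple format:
$$\text{(ii)} \quad E[Y_{a\mathcal{M}}] = E_C\Big\{E_{L \mid C, A = a}\big\{E_{M \mid C, A = a^*}\{E[Y \mid C, L, A = a, M]\}\big\}\Big\},$$
$$\text{(iii)} \quad E[Y_{a\mathcal{M}}] = E_C\Big\{E_{L \mid C, A = a}\big\{E_{M \mid C, L, A = a^*}\{E[Y \mid C, L, A = a, M]\}\big\}\Big\}.$$
For case (i), the expression is complicated.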
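Cases (ii) and (iii) differ only in whether the mediator distribution taken from the $A = a^*$ group is also conditioned on L. The sketch below implements both with a flag; it assumes discrete C, L and M, and the function name and data are hypothetical. With a = 1 and $a^* = 0$, case (ii) gives a plug-in estimate of a mean such as $E[Y_{1\mathcal{M}_{0|C}}]$, and case (iii) corresponds to the kind of L-dependent distribution used for $\tau_3$ in Application 6.

```python
from collections import Counter, defaultdict


def mean_Y_a_Mastar(records, a, a_star, condition_on_L=False):
    """Plug-in estimate of E[Y_aM] when M is built from the potential mediator M_{a*}.

    condition_on_L=False -> case (ii): m weighted by P(M = m | C, A = a*)
    condition_on_L=True  -> case (iii): m weighted by P(M = m | C, L, A = a*)
    In both cases L is averaged over P(L | C, A = a). Hypothetical helper for
    discrete C, L, M; records are dicts with keys "C", "L", "A", "M", "Y".
    """
    n = len(records)
    p_c = Counter(r["C"] for r in records)
    l_counts = defaultdict(Counter)   # L within (C, A = a)
    m_counts = defaultdict(Counter)   # M within (C, A = a*) or (C, L, A = a*)
    y_sum, y_cnt = defaultdict(float), defaultdict(int)
    for r in records:
        if r["A"] == a:
            l_counts[r["C"]][r["L"]] += 1
            key = (r["C"], r["L"], r["M"])
            y_sum[key] += r["Y"]
            y_cnt[key] += 1
        if r["A"] == a_star:
            stratum = (r["C"], r["L"]) if condition_on_L else r["C"]
            m_counts[stratum][r["M"]] += 1
    total = 0.0
    for c, k in p_c.items():
        n_ca = sum(l_counts[c].values())
        if n_ca == 0:
            raise ValueError(f"no one with A={a} in C={c!r}")
        for l, k_l in l_counts[c].items():
            stratum = (c, l) if condition_on_L else c
            n_s = sum(m_counts[stratum].values())
            if n_s == 0:
                raise ValueError(f"no one with A={a_star} in stratum {stratum!r}")
            for m, k_m in m_counts[stratum].items():
                if y_cnt[(c, l, m)] == 0:
                    raise ValueError(f"no one with A={a}, M={m!r} in C={c!r}, L={l!r}")
                mu = y_sum[(c, l, m)] / y_cnt[(c, l, m)]  # E[Y | c, l, A=a, M=m]
                total += (k / n) * (k_l / n_ca) * (k_m / n_s) * mu
    return total


data = [
    # A = 1 supplies L given (C, A = 1) and the outcome means E[Y | C, L, A = 1, M]
    {"C": "c", "L": 0, "A": 1, "M": 2, "Y": 6},
    {"C": "c", "L": 0, "A": 1, "M": 3, "Y": 8},
    {"C": "c", "L": 1, "A": 1, "M": 2, "Y": 7},
    {"C": "c", "L": 1, "A": 1, "M": 3, "Y": 11},
    # A = 0 supplies the mediator distribution
    {"C": "c", "L": 0, "A": 0, "M": 2, "Y": 4},
    {"C": "c", "L": 0, "A": 0, "M": 2, "Y": 5},
    {"C": "c", "L": 1, "A": 0, "M": 3, "Y": 6},
    {"C": "c", "L": 1, "A": 0, "M": 2, "Y": 5},
]
print(mean_Y_a_Mastar(data, a=1, a_star=0))                       # 7.25
print(mean_Y_a_Mastar(data, a=1, a_star=0, condition_on_L=True))  # 7.5
```

In the toy data the usual-care mediator distribution depends on L, so the two cases give different answers.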

The support of the distribution M
To be complete, we now clarify the support of the distribution $\mathcal{M}$ mentioned in the assumptions. In case (i), the support of the distribution $\mathcal{M}$ is the same as the support of variable M given $A = a^*$. In case (ii), it is the same as the support of variable M given C, $A = a^*$. In case (iii), it is the same as the support of variable M given C, $A = a^*$, L = l where l is the actual value of $L_a$. These details will become more salient in the applications.
Note that there is asymmetry in the ranges of mediator values for which the assumptions need to hold, which reflects the asymmetry of $Y_{1\mathcal{M}_{0|C}}$. We represent this asymmetry via separate expressions of m and n value ranges. The m value range that defines the collection of potential outcomes $Y_{1m}$ is larger than the n value range that defines the collection of potential outcomes $Y_{0n}$. The former is the support of the observed M given C (or the combined support of $M_1$ and $M_0$ given C); the latter is the support of M given C and A = 0 (or the support of $M_0$ given C).
Consider the n-specific component $P(M = n \mid C, L, A = 0) > 0$ for all n values in the support of M given C, A = 0. To simplify reasoning, we condition on C and A = 0, and consider a subpopulation of patients in usual care that share the same values of C. Within such a subpopulation, this assumption means that the range of M values does not depend on L; otherwise there are L values for which the range of M does not fully cover the support of M in the subpopulation. In our example, if there is such a subpopulation (patients with a certain profile defined by baseline covariates C who receive usual care) for whom the range of effective service use score (M) depends on the level of symptom management (L) (e.g., patients with poor symptom management have this score in the range of 0 to 3, while the full range for this subpopulation is 0 to 5), then this assumption is violated. The m-related component, $P(M = m \mid C, L, A = 1) > 0$ for all m values in the support of M given C, is even more restrictive in that it requires not only that the range of M given C in the exposed not depend on L, but also that it cover the corresponding range in the unexposed.
Putting the two components together, the positivity of relevant mediator values assumption for identification of this effect pair means: within levels of C, (i) the range of M if unexposed does not depend on L, (ii) the range of M if exposed does not depend on L, and (iii) the latter covers the former. For any application, this stringent positivity assumption should be checked against data.
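An empirical version of this check might look as follows. It is a sketch under simplifying assumptions: discrete C, L and M, with observed value sets standing in for population supports; the helper name and toy data are hypothetical, the data echoing the example in which usual-care patients with poor symptom management only reach scores 0 to 3. The n-part corresponds to condition (i), and the m-part to conditions (ii) and (iii) taken together.

```python
from collections import defaultdict


def check_mediator_positivity_pair(records):
    """Empirical screen of the three-part mediator positivity condition.

    n-part (A = 0): every (C, L) stratum must contain each M value seen in {C, A = 0}.
    m-part (A = 1): every (C, L) stratum must contain each M value seen in C,
                    pooled over both exposure conditions.
    Hypothetical helper for discrete C, L, M; returns (part, c, l, missing values).
    """
    supp_c = defaultdict(set)    # M values given C, both arms
    supp_c0 = defaultdict(set)   # M values given C, A = 0
    strata = defaultdict(set)    # M values given (A, C, L)
    for r in records:
        supp_c[r["C"]].add(r["M"])
        if r["A"] == 0:
            supp_c0[r["C"]].add(r["M"])
        strata[(r["A"], r["C"], r["L"])].add(r["M"])
    failures = []
    for (a, c, l), seen in sorted(strata.items()):
        required = supp_c0[c] if a == 0 else supp_c[c]
        missing = required - seen
        if missing:
            failures.append(("n" if a == 0 else "m", c, l, sorted(missing)))
    return failures


# Hypothetical data: in usual care, poor symptom management only reaches scores 0 to 3.
data = (
    [{"C": "c", "L": "poor", "A": 0, "M": m} for m in range(4)]
    + [{"C": "c", "L": "good", "A": 0, "M": m} for m in range(6)]
    + [{"C": "c", "L": l, "A": 1, "M": m} for l in ("poor", "good") for m in range(6)]
)
print(check_mediator_positivity_pair(data))  # [('n', 'c', 'poor', [4, 5])]
```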
In the special case with no intermediate confounders, L is an empty set, which substantially simplifies the positivity requirement. For example, if instead of service use we take symptom management to be the mediator of interest M, and define $\text{IDE}_0$ and $\text{IIE}_1$ the same way as above, except changing the mediator variable, then positivity of relevant mediator values simply means that (iii) within levels of baseline covariates (C), the range of symptom management (M) under intervention covers the range under usual care.
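Under the assumptions in Table 6, each potential outcome mean of the form $E[Y_{a\mathcal{M}_{a^*|C}}]$ in this effect pair is identified by
$$E[Y_{a\mathcal{M}_{a^*|C}}] = E_C\Big\{E_{L \mid C, A = a}\big\{E_{M \mid C, A = a^*}\{E[Y \mid C, L, A = a, M]\}\big\}\Big\},$$
where the interventional mediator distributions $\mathcal{M}_{0|C}$ and $\mathcal{M}_{1|C}$ are identified as $P(M = m \mid C, A = 0)$ and $P(M = m \mid C, A = 1)$, respectively (both independent of L given C). In the special case with no L (see Table 5, bottom panel), this simplifies and coincides with the result for natural (in)direct effects (in Application 7, section 8.5).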

Application 6: effect of a modified intervention
We continue to treat the service use variable as the mediator M. Now we have a different problem.
Anticipating funding cuts, we need to trim the intervention down to a lighter version, so a question is whether to remove the care-management component from the intervention and keep only the self-management component. With such a modified intervention, we expect that effective service use scores may be lower compared to levels under the original intervention, but would not be lower than levels under usual care.
What would be the effect on quality of life of a modified intervention removing care management? Our first answer is that this effect could be conservatively approximated by
$$\tau_2 = E[Y_{1\mathcal{M}_{0|C}}] - E[Y_0],$$
where $Y_{1\mathcal{M}_{0|C}}$ is the potential outcome in a hypothetical world where the exposure is set to 1 (intervention) and the mediator is assigned a value drawn from the distribution of $M_0$ (effective service use distribution under usual care) given C. This assumes that the effective service use distribution under the modified intervention is the same as that under usual care. The identifying assumptions are collected in Table 7.
As $\tau_2$ and $\text{IDE}_0$ share the same active intervention condition, but differ in the comparison condition, let us compare their identifying assumptions. There are differences in the consistency and conditional independence assumptions between the two contrasts, but these are not likely to matter in most applications. For example, with a rich enough collection of mediator-outcome confounders that we are willing to assume $M \perp\!\!\!\perp Y_{1m} \mid C, L, A = 1$ (which is required for both effects), it is likely that we are also willing to assume $M \perp\!\!\!\perp Y_{0m} \mid C, L, A = 0$ (which is additionally required by $\text{IDE}_0$), simply because there is a limit to how deeply we can realistically think about these assumptions. But again, the difference in the positivity of relevant mediator values assumption has practical implications. This assumption, for $\tau_2$, is simply that within levels of baseline covariates (C), the effective service use (M) range under intervention, regardless of symptom management (L) value, has to cover the full range under usual care. For $\text{IDE}_0$, however, the assumption also requires that within levels of C, the M range if unexposed does not depend on L values. This means positivity is more likely to hold for $\tau_2$ than for $\text{IDE}_0$.
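Under the assumptions in Table 7, $\tau_2$ is identified by
$$\tau_2 = E_C\Big\{E_{L \mid C, A = 1}\big\{E_{M \mid C, A = 0}\{E[Y \mid C, L, A = 1, M]\}\big\}\Big\} - E_C\{E[Y \mid C, A = 0]\},$$
where the first term is the same as the first term in the result for $\text{IDE}_0$, but the second term is simpler (see Table 5). In the special case with no L, the two results simplify and coincide with each other, and coincide with the result for $\text{NDE}_0$.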
As $\tau_2$ is a conservative approximation of the effect of the modified intervention, we also consider a closer approximation that does not fix the mediator distribution at $\mathcal{M}_{0|C}$. The rationale is that improvement in symptom management (which results from the self-management component of the intervention) may itself lead to more effective service use. We thus consider
$$\tau_3 = E[Y_{1\mathcal{M}_{|C,0,L_1}}] - E[Y_0],$$
where $Y_{1\mathcal{M}_{|C,0,L_1}}$ is the potential outcome where everything occurs as if in the original intervention condition, except that the effective service use variable is shifted to a distribution $\mathcal{M}_{|C,0,L_1}$ that conditions on covariate values $(C, L_1)$ but is defined to be the same as the distribution of $M_0$ given $(C, L_0)$. That is, given $C = c$ and $L_1 = l$, the distribution $\mathcal{M}_{|C,0,L_1}$ assigns each mediator value m the probability $P(M_0 = m \mid C = c, L_0 = l)$. Intuitively, this distribution allows the change in symptom management to influence effective service use.
The identifying assumptions for $\tau_3$ are shown in Table 8. Compared with $\tau_2$, there are two differences in the positivity assumption. The positivity of the unexposed condition assumption is more restrictive for $\tau_3$ than for $\tau_2$: for $\tau_3$ it requires that the probability of receiving usual care is positive for every realized combination of $(C, L)$ values; for $\tau_2$ it requires this only for every $C$ value. On the other hand, the positivity of relevant mediator values assumption is more restrictive for $\tau_2$: for $\tau_2$ it requires that within levels of $C$, the $M$ range under the intervention for any $L$ value covers the full $M$ range under usual care; for $\tau_3$ it requires only that within levels of $(C, L)$, the $M$ range under the intervention covers the corresponding range under usual care. Footnote 14 clarifies this using numeric values.
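The two-way trade-off between $\tau_2$ and $\tau_3$ can be seen on toy data. The sketch below assumes discrete $C$, $L$, $M$ stored in dicts with hypothetical keys, treats observed support as a proxy for true support, and returns, for each estimand, the pair (unexposed positivity, mediator positivity) as stated in the paragraph above. The data are hypothetical and are not the numeric example of Footnote 14.

```python
from collections import defaultdict


def m_support(rows, a, keys):
    """Observed M values among rows with A == a, grouped by the covariates in `keys`."""
    out = defaultdict(set)
    for r in rows:
        if r["A"] == a:
            out[tuple(r[k] for k in keys)].add(r["M"])
    return out


def tau2_positivity(rows):
    """(usual care present in every C stratum,
        within C, M range under A=1 for every L covers M range under A=0)."""
    usual_c = m_support(rows, 0, ["C"])
    treated_cl = m_support(rows, 1, ["C", "L"])
    unexposed_ok = all((r["C"],) in usual_c for r in rows)
    mediator_ok = all(usual_c.get((c,), set()) <= ms for (c, _l), ms in treated_cl.items())
    return unexposed_ok, mediator_ok


def tau3_positivity(rows):
    """(usual care present in every realized (C, L) cell,
        within each (C, L) cell, M range under A=1 covers M range under A=0)."""
    usual_cl = m_support(rows, 0, ["C", "L"])
    treated_cl = m_support(rows, 1, ["C", "L"])
    unexposed_ok = all((r["C"], r["L"]) in usual_cl for r in rows)
    mediator_ok = all(usual_cl.get(cl, set()) <= ms for cl, ms in treated_cl.items())
    return unexposed_ok, mediator_ok


# Toy data 1: M tracks L in both arms, so tau_2's mediator condition fails but tau_3's holds.
rows_1 = [
    {"C": 0, "L": 0, "A": 0, "M": 0},
    {"C": 0, "L": 1, "A": 0, "M": 1},
    {"C": 0, "L": 0, "A": 1, "M": 0},
    {"C": 0, "L": 1, "A": 1, "M": 1},
]
# Toy data 2: L = 1 occurs only under the intervention, so tau_3's unexposed condition fails.
rows_2 = [
    {"C": 0, "L": 0, "A": 0, "M": 0},
    {"C": 0, "L": 1, "A": 1, "M": 0},
]
print(tau2_positivity(rows_1), tau3_positivity(rows_1))  # (True, False) (True, True)
print(tau2_positivity(rows_2), tau3_positivity(rows_2))  # (True, True) (False, True)
```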
Under the assumptions in Table 8, follows naturally.

The outcome $Y_{aM_{a'}}$ arises from the combination $(C, a, L_a, M_{a'})$, where $L_a$ and $M_{a'}$ come from different worlds.
We need to connect the mean of this unobservable $Y_{aM_{a'}}$ to observable data. First, we use the whole-to-part move to narrow down to considering this potential outcome only among those in the $A = a$ condition. By conditioning on $C$ and invoking an exposure-outcome conditional independence assumption similar to ($I_{aM}$-ay), within levels of $C$ we replace the mean of $Y_{aM_{a'}}$ with its mean among those who experience $A = a$. This move does not make $Y_{aM_{a'}}$ any more observable (it remains unobservable), but it matches the exposure condition; the mediator remains mismatched.
Next, consider only those in the $A = a$ condition. For each of these individuals, the observed mediator is $M_a$ (under consistency). Ideally we would swap this $M_a$ value for the individual's $M_{a'}$ value in order to learn about the target potential outcome $Y_{aM_{a'}}$, but unfortunately $M_{a'}$ is unobserved. Our strategy is to use, as a proxy for $M_{a'}$, a distribution (to which $M_{a'}$ belongs) that captures all the information about $M_{a'}$ that is relevant for identifying $E[Y_{aM_{a'}}]$, and then to try to identify that distribution. Using the proxy distribution, we can then apply the swapping-of-mediator-distributions move to identify the target potential outcome mean. But what should the proxy distribution be? It turns out that to capture all the relevant information about $M_{a'}$, it has to be a distribution of $M_{a'}$ that conditions on a set of observed variables that removes the confounding of the relationship between $M_{a'}$ and $Y_{aM_{a'}}$. In the special case with no intermediate confounders, $C$ is the only common cause of $M_{a'}$ and $Y_{aM_{a'}}$: $C$ causes $M_{a'}$ in the world with exposure $a'$, and causes $Y_{aM_{a'}}$ directly in the hypothetical world we are considering. This simple confounding structure is shown in Fig. 3b. The appropriate proxy for $M_{a'}$ in those with $A = a$ is thus the distribution of $M_{a'}$ given $C, A = a$. Although $M_{a'}$ is not observed for those with $A = a$, under the same exposure-mediator conditional independence assumption as ($I_{aM}$-am) (except with $M_{a^*}$ replaced by $M_{a'}$), this distribution is identified as the distribution of the observed $M$ given $C, A = a'$. This means that, assuming all the other assumptions hold, the result for $E[Y_{aM_{a'}}]$ is a simple adaptation of the result for $E[Y_{aM}]$, replacing $M$ with this proxy distribution and removing $L$.
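In the no-intermediate-confounder case, the recipe above amounts to averaging, within each level of $C$, the arm-$a$ outcome means $E[Y \mid C, M = m, A = a]$ over the proxy distribution $P(M = m \mid C, A = a')$, and then averaging over $C$. The sketch below is a minimal plug-in version for discrete $C$ and $M$ using exact rational arithmetic; the dict keys, function name and data are hypothetical, and the number it returns is meaningful as $E[Y_{aM_{a'}}]$ only if the identifying assumptions hold.

```python
from collections import defaultdict
from fractions import Fraction


def cross_world_mean(rows, a, a_prime):
    """Plug-in E_C{ sum_m P(M=m | C, A=a') * E[Y | C, M=m, A=a] } for discrete C and M."""
    by_c = defaultdict(list)
    for r in rows:
        by_c[r["C"]].append(r)
    total = Fraction(0)
    for c, stratum in by_c.items():
        # Proxy for M_{a'}: the observed mediator among A = a' within this C stratum.
        proxy = [r["M"] for r in stratum if r["A"] == a_prime]
        if not proxy:
            raise ValueError(f"no units with A={a_prime} in C={c}")
        inner = Fraction(0)
        for m in set(proxy):
            ys = [r["Y"] for r in stratum if r["A"] == a and r["M"] == m]
            if not ys:
                raise ValueError(f"no units with A={a}, M={m} in C={c} (positivity)")
            inner += Fraction(proxy.count(m), len(proxy)) * Fraction(sum(ys), len(ys))
        total += Fraction(len(stratum), len(rows)) * inner  # weight by P(C = c)
    return total


# Hypothetical data with a single C stratum.
rows = [
    {"C": 0, "A": 0, "M": 0, "Y": 0},
    {"C": 0, "A": 0, "M": 1, "Y": 0},
    {"C": 0, "A": 1, "M": 0, "Y": 2},
    {"C": 0, "A": 1, "M": 1, "Y": 4},
    {"C": 0, "A": 1, "M": 1, "Y": 6},
]
print(cross_world_mean(rows, 1, 0))  # 7/2
```

With $a = a' = 1$ the same function returns the arm-1 outcome mean (here 4), as it should, since averaging $E[Y \mid C, M, A = 1]$ over $M \mid C, A = 1$ gives back $E[Y \mid C, A = 1]$.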
In the general case, there are likely intermediate confounders. Now the relationship between $M_{a'}$ and $Y_{aM_{a'}}$ is confounded not only by $C$ but also by the unique cause $U_L$ of $L$: $U_L$ causes $M_{a'}$ through $L_{a'}$, and also causes $Y_{aM_{a'}}$ through $L_a$ (see Fig. 3a). This means that the set of variables that a proxy distribution (if one exists) conditions on must include not only $C$ but also at least one of the three variables $U_L$, $L_a$, $L_{a'}$ (to remove confounding by $U_L$). Among those with $A = a$ (whose outcomes we are counting on to learn about the target potential outcome mean), only $L_a$ is observed. This suggests using the distribution of $M_{a'}$ given $C, L_a, A = a$ as the proxy distribution. Unfortunately, any distribution of $M_{a'}$ that conditions on the other-world $L_a$ is unidentified. Hence $E[Y_{aM_{a'}}]$ is unidentified in the presence of intermediate confounders, and its identification requires that there are none.
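The non-identification argument can be checked on a toy example. The sketch below builds two hypothetical models (no $C$, exposure randomized 50/50) that agree on the marginal law of each $L_a$ but differ in how $L_0$ and $L_1$ are coupled across worlds, which is the role played by $U_L$. Purely for illustration, we assume $M_a = L_a$ and $Y_{alm} = l \cdot m$, so that $Y_{1M_0} = L_1 M_0$. Both models imply exactly the same distribution of the observed $(A, L, M, Y)$, yet different values of $E[Y_{1M_0}]$, so no function of the observed data can recover it.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal laws: the exposure raises P(L = 1) from 1/4 to 3/4.
P_L0 = {0: Fraction(3, 4), 1: Fraction(1, 4)}
P_L1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}

# Two cross-world couplings of (L_0, L_1) with exactly those marginals.
MODELS = {
    # L_0 and L_1 both driven by one shared U_L (comonotone: L_0 = 1 implies L_1 = 1).
    "comonotone": {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 2), (1, 1): Fraction(1, 4)},
    # No shared cause: L_0 and L_1 independent.
    "independent": {(l0, l1): P_L0[l0] * P_L1[l1] for l0, l1 in product((0, 1), repeat=2)},
}


def mediator(l):
    return l  # hypothetical: M_a = L_a


def outcome(l, m):
    return l * m  # hypothetical: Y_{alm} = l * m for either a


def observed_distribution(joint):
    """P(A, L, M, Y) under 50/50 randomization: a unit with A = a reveals only L_a."""
    dist = {}
    for (l0, l1), p in joint.items():
        for a, l in ((0, l0), (1, l1)):
            m = mediator(l)
            key = (a, l, m, outcome(l, m))
            dist[key] = dist.get(key, Fraction(0)) + Fraction(1, 2) * p
    return dist


def target(joint):
    """E[Y_{1 M_0}] = E[outcome(L_1, M_0)], which depends on L_1 and L_0 jointly."""
    return sum(p * outcome(l1, mediator(l0)) for (l0, l1), p in joint.items())


same = observed_distribution(MODELS["comonotone"]) == observed_distribution(MODELS["independent"])
print(same, target(MODELS["comonotone"]), target(MODELS["independent"]))  # True 1/4 3/16
```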

Consistency assumption
Similar to the previous case, this assumption includes the regular (i) consistency of potential outcomes, $Y = Y_{am}$ if $A = a, M = m$, for $m$ in the support of the observed $M$ given $C, A = a'$; and (ii) consistency of the potential mediator, $M = M_{a'}$ if $A = a'$. In addition, it includes (iii) consistency of the cross-world potential outcome, $Y_{aM_{a'}} = Y_{am}$ if $M_{a'} = m$. The latter belongs to a different category: it connects different types of potential variables rather than connecting potential variables to observed ones.

Conditional independence assumption
To show how this assumption compares with the one for the previous potential outcome type, we present it as four elements.
The first three elements are similar to the assumption for the previous potential outcome type, except with $a^*$ replaced by $a'$ and $L$ removed (since we require an empty $L$).¹⁵ The key difference from all the previous sets of assumptions is the fourth element, which says that conditioning on $C$ is sufficient to remove confounding between $M_{a'}$ and $Y_{aM_{a'}}$. This fourth element is well known as the cross-world independence assumption.

Positivity assumption
This assumption includes positivity of both exposure conditions, $0 < P(A = 1 \mid C) < 1$, and positivity of relevant mediator values, $P(M = m \mid C, A = a) > 0$ for all $m$ values in the support of $M$ given $C, A = a'$.

Identification result
The identification result is a rather simple triple expectation:

$$E[Y_{aM_{a'}}] = E_C\{E_{M \mid C, A=a'}\{E[Y \mid C, M, A = a]\}\}. \qquad (R_{aM_{a'}})$$

the intermediate confounder symptom management ($L$); this violates the key cross-world independence assumption required to identify $E[Y_{1M_0}]$. While natural (in)direct effects are not identified, recall from Application 5 that if the assumptions in Table 6 hold, interventional (in)direct effects are identified. This is perhaps a common situation, as the absence of intermediate confounders may be a special case; it has been noted that unless the mediator occurs soon after the exposure, there are likely intermediate confounders (Vansteelandt and VanderWeele, 2012). In such a situation, it might be tempting to switch the estimand from natural to interventional (in)direct effects (as in our story in Application 5), but that effectively changes the question being asked. An alternative is to estimate bounds on natural (in)direct effects (Miles et al., 2017) or to conduct a sensitivity analysis on the unidentified cross-world associations (Daniel et al., 2015).
Let us set aside our bipolar intervention example with its intermediate confounder problem and examine the assumptions in Table 9 as generic assumptions. We note that these assumptions are asymmetric, which means they are weaker than the symmetric assumptions often found in the literature. While the symmetric assumptions allow identification of both pairs of natural (in)direct effects, researchers are often interested in only one of the two pairs, in which case only the asymmetric assumptions relevant to that pair are required. Admittedly, it may be hard to think in practical terms about when the conditional independence assumptions would hold for one pair of natural (in)direct effects but not the other. This is clearer with the positivity of mediator values assumption. Its symmetric statement in the literature implies that within levels of $C$, the mediator range is the same in the two exposure conditions, which is unnecessarily restrictive if we are interested in only one pair of effects. For the current pair of effects, it is only required that within levels of $C$, the mediator range in the exposed condition covers the mediator range in the unexposed condition. For the other pair of effects, the opposite is required. The practical implication is that in some cases, one pair of natural (in)direct effects may be identified while the other is not, due to a positivity violation.
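The asymmetry in the positivity of mediator values assumption can be illustrated with hypothetical discrete data. The sketch below checks, within each $C$ stratum, that both exposure arms are present and that the observed mediator range under $A = a$ covers the observed range under $A = a'$, which is what $E[Y_{aM_{a'}}]$ needs. It uses the observed range as a stand-in for the true support; the dict keys, function name and data are our own.

```python
from collections import defaultdict


def natural_effect_positivity(rows, a, a_prime):
    """Positivity for E[Y_{a M_{a'}}]: within each C, both arms occur and the
    observed M range under A = a covers the observed M range under A = a'."""
    support = defaultdict(set)  # (C, A) -> observed mediator values
    for r in rows:
        support[(r["C"], r["A"])].add(r["M"])
    strata = {r["C"] for r in rows}
    exposure_ok = all((c, 0) in support and (c, 1) in support for c in strata)
    mediator_ok = all(
        support.get((c, a_prime), set()) <= support.get((c, a), set()) for c in strata
    )
    return exposure_ok and mediator_ok


# Hypothetical data: the exposed show a wider mediator range than the unexposed.
rows = [{"C": 0, "A": 0, "M": m} for m in (1, 2)] + [
    {"C": 0, "A": 1, "M": m} for m in (0, 1, 2, 3)
]
print(natural_effect_positivity(rows, 1, 0))  # True: E[Y_{1M_0}], used by NDE_0 and NIE_1
print(natural_effect_positivity(rows, 0, 1))  # False: E[Y_{0M_1}], used by the other pair
```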
Under the assumptions in Table 9, the current pair of natural (in)direct effects are identified as

$$\mathrm{NDE}_0 = E_C\{E_{M \mid C,A=0}\{E[Y \mid C, M, A=1]\}\} - E_C\{E[Y \mid C, A=0]\},$$
$$\mathrm{NIE}_1 = E_C\{E[Y \mid C, A=1]\} - E_C\{E_{M \mid C,A=0}\{E[Y \mid C, M, A=1]\}\}.$$

When there are intermediate confounders $L$, the natural (in)direct effects are unidentified. In this case, an alternative is to treat $L$ and $M$ both as mediators and target path-specific effects (see e.g., VanderWeele et al., 2014). This would take us to the general topic of multiple-mediator analysis, which is outside the scope of the current paper. Here we just note a few key points without explication. Path-specific effects are extensions of natural (in)direct effects to the case of multiple causally ordered mediators. Defining these effects requires a different kind of nested potential outcome, $Y_{aL_{a'}M_{a''L_{a'}}}$ (for the two-mediator case). For example, one decomposition of the total effect has three components: a direct effect ($E[Y_{1L_0M_{0L_0}}] - E[Y_0]$), an effect through the first mediator, and an effect through the second but not the first mediator. Roughly speaking, identification of these three effects requires that there are no unobserved exposure-mediator, exposure-outcome, mediator-mediator ($L$-$M$) and mediator-outcome confounders, and that there are no exposure-induced (observed or unobserved) confounders of the relationship between $(L, M)$ and $Y$, plus the relevant consistency and positivity assumptions.

Concluding remarks
We have shown that identification of a wide range of causal effects in the single-mediator case boils down to identification of the mean (distribution) of five types of potential outcomes, and we have shown how the required assumptions are connected, becoming more complex only as the condition that defines the potential outcome becomes more complex. We provide Table 1 as a menu from which the substantive researcher can assemble the identifying assumptions for their target causal estimand. Through an illustrative example, we demonstrate how to consider the plausibility of such assumptions for several estimands of common interest. We recommend using this paper alongside the companion "estimands" paper (Nguyen et al., 2021). Together, the two papers aim to help the applied researcher first flexibly define causal mediation effects to match their research question and then assess the effects' identifiability.
This paper did not cover more complex cases, such as multiple causally ordered or unordered mediators, and exposures and/or mediators repeated over a longitudinal process. Each of these settings comes with a range of effect definitions and identification strategies. The kind of exercise we conducted in the simple case (connecting effect definitions to real-world research questions and systematically examining their identifying assumptions) is highly recommended for these more complex cases, to make advanced methods more accessible and meaningful to applied researchers, facilitate their appropriate use and promote quality research.
where the right-hand side is a double expectation, with the inner expectation $E[Y_a \mid C]$ representing the mean of $Y_a$ conditional on covariates $C$, and the outer expectation averaging the conditional mean over the covariate distribution.⁴ Essentially, we identify $E[Y_a]$ by identifying the conditional mean $E[Y_a \mid C]$.
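Under the consistency, conditional independence and positivity assumptions, the inner expectation $E[Y_a \mid C]$ is replaced by the observed $E[Y \mid C, A = a]$, giving the familiar standardization formula $E_C\{E[Y \mid C, A = a]\}$. A minimal plug-in sketch for a discrete $C$ follows; the dict keys, function name and data are hypothetical, and exact fractions are used so the arithmetic can be checked by hand.

```python
from collections import defaultdict
from fractions import Fraction


def standardized_mean(rows, a):
    """Plug-in E_C{ E[Y | C, A = a] } for a discrete covariate C."""
    by_c = defaultdict(list)
    for r in rows:
        by_c[r["C"]].append(r)
    total = Fraction(0)
    for c, stratum in by_c.items():
        ys = [r["Y"] for r in stratum if r["A"] == a]
        if not ys:  # P(A = a | C = c) = 0 in the sample: positivity fails
            raise ValueError(f"no units with A={a} in stratum C={c}")
        # Weight the stratum-specific arm-a mean by P(C = c).
        total += Fraction(len(stratum), len(rows)) * Fraction(sum(ys), len(ys))
    return total


# Hypothetical data in which A is more common when C = 1, and C = 1 raises Y.
rows = (
    [{"C": 0, "A": 1, "Y": 0}]
    + [{"C": 0, "A": 0, "Y": y} for y in (0, 0, 1)]
    + [{"C": 1, "A": 1, "Y": y} for y in (1, 1, 0)]
    + [{"C": 1, "A": 0, "Y": 1}]
)
print(standardized_mean(rows, 1), standardized_mean(rows, 0))  # 1/3 2/3
```

The crude arm means are both 1/2 in these data, while the standardized means are 1/3 and 2/3, which illustrates why the inner expectation is taken within levels of $C$.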

Figure 1: A simple DAG with variables relevant to $E[Y_a]$

Figure 2: A shorthand DAG: $C$ is a collection of variables, each of which has at least two of the four depicted arrows emitted from $C$.

Similar to the previous reasoning, under consistency and positivity, the right-hand side is replaced with $E[Y \mid C, A = a, M = m]$, thus identifying the conditional mean $E[Y_{am} \mid C]$. Then, using the double expectation trick, we obtain $E[Y_{am}] = E_C\{E[Y \mid C, A = a, M = m]\}$, where the inner expectation is the mean observed outcome among those with $A = a, M = m$ within levels of $C$, and the outer expectation averages over the distribution of $C$. This is very much in the same spirit as ($R_a$), the identification result for $E[Y_a]$.
This assumption¹⁰ is that for all the relevant values $m$ in the support of the distribution $M$,
$A \perp\!\!\!\perp Y_{am} \mid C$, ($I_{aM}$-ay)
$M \perp\!\!\!\perp Y_{am} \mid C, L, A = a$. ($I_{aM}$-my)

¹⁰ If the distribution $M$ is defined conditional on $L_a$, strictly speaking, the first component of the assumption is $A \perp\!\!\!\perp (L_a, Y_{am}) \mid C$. Conditional independence between $A$ and $L_a$ is needed because, since $L_a$ is observed only in those with $A = a$, identification of $E[Y_{aM}]$ (where $M$ conditions on $L_a$) requires borrowing information about $L_a$ across exposure conditions. Fortunately, this distinction does not make any practical difference, because the $C$ variables we are considering are confounders. A set of confounders $C$ that satisfies $A \perp\!\!\!\perp Y_{am} \mid C$ also satisfies $A \perp\!\!\!\perp L_a \mid C$. This follows from the fact that $L_a$ is a cause of $Y_{am}$, which means the confounders of the $A$-$Y_{am}$ relationship (a subset of $C$) contain the confounders of the $A$-$L_a$ relationship.

Figure 3: Examining the common causes of $M_{a'}$ and $Y_{aM_{a'}}$

$A \perp\!\!\!\perp M_{a'} \mid C$, ($I_{aM_{a'}}$-am)
and for all values $m$ in the support of $M$ given $C, A = a'$,
$A \perp\!\!\!\perp Y_{am} \mid C$, ($I_{aM_{a'}}$-ay)
$M \perp\!\!\!\perp Y_{am} \mid C, A = a$, ($I_{aM_{a'}}$-my1)
$M_{a'} \perp\!\!\!\perp Y_{am} \mid C$. ($I_{aM_{a'}}$-my2)

$$\mathrm{NDE}_0 = E_C\{E_{M \mid C,A=0}\{E[Y \mid C, M, A=1]\}\} - E_C\{E[Y \mid C, A=0]\},$$
$$\mathrm{NIE}_1 = E_C\{E[Y \mid C, A=1]\} - E_C\{E_{M \mid C,A=0}\{E[Y \mid C, M, A=1]\}\}.$$

8.5.1 A footnote on extension of natural effects in the case with L

The presence of intermediate confounders $L$ means that the natural (in)direct effects are unidentified.

Table 1: Three identifying assumptions for five potential outcome means