A Two-Stage Joint Modeling Method for Causal Mediation Analysis in the Presence of Treatment Noncompliance

Estimating the effect of a randomized treatment and the effect that is transmitted through a mediator is often complicated by treatment noncompliance. In the literature, an instrumental variable (IV)-based method has been developed to study causal mediation effects in the presence of treatment noncompliance. Existing studies based on the IV-based method focus on identifying the mediated portion of the intention-to-treat effect, which relies on several identification assumptions. However, little attention has been given to assessing the sensitivity of results to these identification assumptions or to mitigating the impact of violating them. This study proposes a two-stage joint modeling method for conducting causal mediation analysis in the presence of treatment noncompliance, in which modeling assumptions can be employed to decrease the sensitivity to violation of some identification assumptions. The use of a joint modeling method is also conducive to conducting sensitivity analyses to the violation of identification assumptions. We demonstrate our approach using the JOBS II data, in which the effect of job training on job seekers' mental health is examined.


Introduction
In randomized experiments, the interest is often not only in the effect of a randomized treatment but also in the effect transmitted through a mediator. This is because investigating mediating mechanisms provides a more complete explanation of the effect of the treatment. Recently, there have been many studies on causal mediation analysis, which focuses on how to identify and estimate the average effect of a treatment transmitted through a mediator (see, e.g., [1][2][3][4]). One complication that arises when conducting this type of analysis is non-ignorability of the mediator, because mediators are seldom randomized, even in randomized experiments.
Another complication in randomized experiments is that some participants do not adhere to the assigned treatment. In this article, we refer to this non-adherence to the assigned treatment as treatment noncompliance. In the presence of treatment noncompliance, the treatment receipt status is no longer random even when the treatment is assigned randomly, because participants self-select whether or not to adhere to the treatment. One analytical option to address this issue is to focus on the effect of the assigned treatment, namely, the intention-to-treat (ITT) effect. Under a randomized treatment and the stable unit treatment value assumption (SUTVA), the ITT effect is identified as the difference in the average outcome value between those who are assigned to the treatment and those who are not. ITT analysis avoids the problem of treatment noncompliance because inference relies only on the randomization of the treatment [5].
A further challenge arises when identifying the mediated portion of this ITT effect. Simply employing the mediation formula [6] with the assigned treatment (as if the assigned treatment were the actual receipt of the treatment) does not provide a valid result [7] because this approach violates an important assumption of causal mediation analysis: no treatment-induced mediator and outcome confounding [1,3,6]. In the presence of treatment noncompliance, the actual treatment receipt status affects both the mediator and the outcome and is influenced by the assigned treatment (i.e., it is a treatment-induced mediator and outcome confounder). One way of circumventing this issue is to identify the mediated portion of the ITT effect on the basis of the average causal mediation effect (ACME) among compliers. Among compliers, the assigned treatment always coincides with the treatment received, and thus the ACME can be estimated without the issue of treatment-induced mediator and outcome confounding. Yamamoto [7] proposed this way of identifying the mediated portion of the ITT effect using the instrumental variables (IV) approach.
While the IV approach successfully addresses the issue of identifying the mediated ITT effect, a concern remains. Estimating the mediated ITT effect on the basis of the ACME among compliers requires multiple identification assumptions, and because of these assumptions, validating results from the IV approach is often challenging. Yamamoto [7] left the assessment of the sensitivity of results to violations of identification assumptions to future study. Park and Kürüm [8] assessed the validity of results by assuming a worst-case scenario but did not systematically assess the sensitivity of the results across all possible scenarios. Therefore, it is necessary to develop an approach that can mitigate the impact of violations of identification assumptions and/or be more conducive to conducting sensitivity analyses.
In this article, we propose a two-stage joint modeling method to estimate the mediated ITT effect. One potential benefit of this method is that it can employ modeling assumptions, such as distributional assumptions [9] and additional covariates [10,11], that mitigate the impact of violating some identification assumptions. Another benefit is that it provides a relatively convenient setting for conducting sensitivity analyses to the violation of identification assumptions, compared to the IV-based method. These benefits are demonstrated using the JOBS II data, in which the effect of job training on job seekers' mental health is examined.
The rest of the article is organized as follows. We introduce our motivating example in Section 2. In Section 3, we present the identification result of the mediated and unmediated ITT effects. In Section 4, we propose a two-stage joint modeling estimation method, which is followed by a simulation study that examines the role of modeling assumptions when identification assumptions are violated (Section 5). In Section 6, we propose sensitivity analyses based on the proposed joint modeling method. In Section 7, we show how sensitivity analyses to the violation of identification assumptions can be conducted in the context of our example. We conclude with a discussion.

JOBS II Intervention Project
This study is motivated by the Job Search Intervention Study (JOBS II) [12]. Job loss can lead to harmful effects on a worker's mental, physical, and social health [13][14][15]. The JOBS II study was designed as a randomized trial to examine the effects of a job training intervention on unemployed individuals' mental health. The goal of this intervention was to prevent the negative effects of job loss by equipping job seekers with efficient job search strategies. The randomized treatment group was assigned to five half-day job-search seminars. Both treatment and control groups received a booklet describing job-search skills. In the JOBS II study, the job training seminars were only available to subjects in the treatment group; subjects in the control group had no way of participating in the seminars. In line with many previous studies [16][17][18], we define the treatment receipt status as attending at least one of the five job-search seminars. Forty-eight percent of those who were assigned to the treatment did not attend any job-search seminars.
Project recruitment consisted of a short screening questionnaire (T0) to determine eligibility, resulting in 1,801 participants. The pre-treatment survey was mailed (T1), and follow-up surveys were mailed two months (T2), six months (T3), and two years (T4) after the week of job training seminars. Data collected in this study included demographic variables such as age, gender, race, and marital status, as well as measures of depression, self-esteem, job-search efficacy, internal control orientation, and reemployment status. Descriptive statistics for the variables used in our analysis are presented in Table 1.
Previous analyses of the JOBS II data showed that the job-training intervention produced beneficial effects, including increased reemployment rates and improved mental health [8,19,20]. More specifically, Price et al. [21] showed that the intervention had beneficial effects on those who were identified as being at high risk for experiencing mental health setbacks such as episodes of depression. They also identified sense of mastery as a mediator of the relationship between the intervention and depression. Our analysis differs from these analyses in that we investigate the association between job-training seminars and depression in the presence of the mediator, sense of mastery, while addressing treatment noncompliance using a two-stage joint modeling method, which provides a convenient setting for systematic analyses of sensitivity to the violation of identification assumptions. The outcome variable, depression, was measured using responses to an 11-item list based on the Hopkins Symptom Checklist [22]. The mediator variable, sense of mastery, was computed as the mean score of job-search efficacy, self-esteem, and internal control orientation.
Notes to Table 1: 3. depression levels measured after the training (T3); 4. the proportion of males in our data; 5. the proportion of nonwhite subjects in our data; 6. depression levels measured before the training.

Identification
In order to precisely define the effects of interest, consider an experimental setting that mimics the JOBS II project, where some subjects did not comply with the assigned treatment. Let $Z_i$ represent the assigned treatment, where $Z_i = 0$ if individual $i$ is assigned to the control condition and $Z_i = 1$ otherwise; let $T_i$ represent the actual treatment received, where $T_i = 0$ if individual $i$ did not receive the treatment and $T_i = 1$ if individual $i$ attended at least one job training seminar; let $M_i$ and $Y_i$ represent the mediator and outcome, respectively; and let $X_i$ be a vector of observed pre-treatment covariates. The supports of the distributions of $X_i$, $M_i$, and $Y_i$ are denoted $\mathcal{X}$, $\mathcal{M}$, and $\mathcal{Y}$, respectively. Under the SUTVA, $T_i(z)$ represents the treatment receipt status if individual $i$ was assigned to $Z_i = z$; $M_i(z)$ represents the potential mediator under $Z_i = z$; and $Y_i(z, m)$ represents the potential outcome under $Z_i = z$ and $M_i = m$ for individual $i$, for $z \in \{0, 1\}$ and $m \in \mathcal{M}$. $P_i$ is an indicator for compliance type that includes compliers ($P_i = c$) and never takers ($P_i = n$).
Throughout the paper, we assume the randomization of the treatment assignment (Assumption 1).

Effects of Interest.
Our primary effects of interest are the mediated and unmediated portions of the ITT effect. These are the average effects of offering the treatment on the outcome that are transmitted through a mediator (mediated ITT) or not through a mediator (unmediated ITT). Since the decomposition is based on the average effect of offering the treatment, we include both those who did and those who did not comply with the assigned treatment in the analysis. In other words, ITT analysis tests the effectiveness of a randomized intervention regardless of whether the subjects actually received the treatment. Therefore, the mediated and unmediated ITT effects are of interest for those who want to evaluate the overall effect of an intervention and investigate the underlying mechanisms of the effect in a usual setting, in which not every subject complies with the treatment. Throughout this paper, we focus on the mediated and unmediated portions of the ITT effect that include both compliers and noncompliers. Following Yamamoto [7], the mediated and unmediated portions of the ITT effect will be identified and estimated on the basis of the ACME and the average natural direct effect among compliers, respectively. Therefore, we first define the complier average causal mediation effect (CACME), $\delta_c(z)$, and the complier average natural direct effect (CANDE), $\zeta_c(z)$, for $z \in \{0, 1\}$. In our example, $\delta_c(1)$ indicates, among compliers, the degree to which the level of depressive symptoms changes in response to the change in the sense of mastery (from the value that would have resulted under the training to the value that would have resulted under the control) under the job training condition. Likewise, $\zeta_c(1)$ indicates, among compliers, the average change in the level of depressive symptoms in response to the change in treatment status (that is, being assigned to job training versus no training), while holding the mediator at the value under the job training condition.
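As a reading aid, the verbal descriptions of the two effects are consistent with the standard potential-outcome definitions sketched below. This is a reconstruction under that assumption, written in the notation of the Identification section; the original display may differ in form.

```latex
\begin{align*}
\delta_c(z) &= E\left[\, Y_i(z, M_i(1)) - Y_i(z, M_i(0)) \mid P_i = c \,\right], \\
\zeta_c(z)  &= E\left[\, Y_i(1, M_i(z)) - Y_i(0, M_i(z)) \mid P_i = c \,\right],
\qquad z \in \{0, 1\}.
\end{align*}
```

Under these definitions, $\delta_c(1)$ varies only the mediator while holding the assignment at the training condition, and $\zeta_c(1)$ varies only the assignment while holding the mediator at $M_i(1)$, which matches the interpretation given in the text.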
In order to obtain the CACME and CANDE, the distributions of the mediator and outcome need to be modeled. We use the likelihood to model the distribution of $Y$, $M$, and $T$, given $X$ and $Z$. For $t \in \{0, 1\}$ and $z \in \{0, 1\}$, let $S^{TZ}_{tz}$ denote the set of observations with $T = t$ and $Z = z$. Under Assumption 1, the likelihood is written in terms of $f(\cdot \mid \cdot)$, a conditional probability density function of the random variables $M$ and $Y$; $\alpha_{tz}$ and $\beta_{tz}$, the vectors of coefficients in the mediator and outcome models, respectively, when $T = t$ and $Z = z$; and $\lambda$, the vector of coefficients for treatment receipt status. From this likelihood, however, it is not possible to model the distributions within the subpopulation of compliers because compliance type is unknown. According to Angrist et al. [23], an individual's compliance type can be expressed as the difference in the actual treatment receipt status that would have been observed under the treatment and control conditions. For example, compliers are those who adhere to their assigned treatment (that is, $T_i(1) - T_i(0) = 1$). Always takers are those who receive the treatment regardless of assignment, and never takers are those who do not receive the treatment regardless of assignment (for both types, $T_i(1) - T_i(0) = 0$). Defiers are those who do not comply with the treatment protocol and do the opposite of what they are assigned to (that is, $T_i(1) - T_i(0) = -1$). The compliance type for each individual is unknown because subjects are assigned to either the treatment or the control condition but not to both (that is, only one of $T_i(1)$ and $T_i(0)$ is observed). Therefore, we need to invoke two more assumptions to identify the distributions of the mediator and outcome by compliance type: strong monotonicity and the exclusion restriction for never takers. Strong monotonicity (Assumption 2) holds in a study where the program protocol prohibits subjects in the control group from having access to the intervention, so that $T_i(0) = 0$ for all $i$. This implies that we can rule out the possibility of defiers and always takers.
After excluding defiers, those who are assigned to the training but did not attend ($T_i(1) = 0$) are uniquely identified as never takers. After excluding always takers, those who are assigned to the training and attended ($T_i(1) = 1$) are uniquely identified as compliers. However, the compliance type for those who are assigned to the control group is still not identified. Therefore, we make the exclusion restriction assumption for never takers.
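The classification logic implied by strong monotonicity can be sketched in a few lines. This is a minimal illustration rather than part of the proposed estimator; the function name `compliance_type` and the string labels are hypothetical.

```python
def compliance_type(z, t):
    """Classify one subject from assignment z and receipt t (both 0 or 1).

    Under strong monotonicity, nobody in the control arm can receive the
    treatment, so (z = 0, t = 1) is treated as a data error here.
    """
    if z == 0 and t == 1:
        raise ValueError("control subject received treatment: violates strong monotonicity")
    if z == 1:
        # In the treatment arm, the observed receipt reveals the type.
        return "complier" if t == 1 else "never_taker"
    # In the control arm, T_i(1) is unobserved, so the type stays unknown.
    return "unknown"
```

For example, `compliance_type(1, 0)` returns `"never_taker"`, while every control-arm subject is returned as `"unknown"`, which is the reason the exclusion restriction is needed.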

Assumption 3: Exclusion restriction (ER) for never takers.
This assumption was discussed by Little and Yau [16] in the absence of a mediator, and we extend it to a mediation setting. The assumption states that the never-taker distribution of the mediator (or the outcome) is the same under either assignment, given covariates. Formally, the assumption is stated for $z \in \{0, 1\}$, $z' = 1 - z$, $m \in \mathcal{M}$, and $x \in \mathcal{X}$, where $\alpha_{pz}$ and $\beta_{pz}$ are the vectors of coefficients in the mediator and outcome models, respectively, when $P = p$ and $Z = z$.
This assumption implies that the direct and indirect effects are allowed only for compliers (but not for never takers), given baseline covariates. It enables us to identify the complier distributions of the mediator and the outcome by fixing the parameters of the never-taker distributions at the same value under either assignment, given covariates. The plausibility of this assumption is often questionable due to psychological effects unless a double-blind design is used to prevent these effects. For example, this assumption would be violated if those who are assigned to but did not receive the job training (i.e., never takers) regretted their failure to take advantage of the intervention and improved their job-search skills by reading a book. Therefore, we develop a sensitivity analysis to assess the effect of violating this assumption for studies in which it might be violated or not plausible, and we demonstrate this sensitivity analysis approach in the JOBS II example.
Under Assumptions 1-3, the likelihood can be rewritten as in equation (4), where $\pi_p$ is the probability of $P_i = p$, given covariates. We offer four remarks regarding this likelihood. First, the compliance type for those who are assigned to the treatment is uniquely identified under strong monotonicity. Second, even with strong monotonicity, the compliance type for those who are assigned to the control condition is not uniquely identified. Therefore, the likelihood is expressed as a mixture of the complier and never-taker distributions, as shown in the last two lines of equation (4). Third, under the exclusion restriction for never takers, the parameters of the never-taker distributions are fixed to $\alpha_{n1}$ and $\beta_{n1}$ under either assignment. Fourth, the parameters of the mediator and outcome models by compliance type can be consistently estimated from this likelihood, although the estimates may not necessarily have a causal interpretation. Since Assumption 4 (LSI) is not assumed, the estimates are obtained given the correlation between the errors in the mediator and outcome models generated from the data.
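The four remarks describe a likelihood with the following structure. The display below is a sketch reconstructed from those remarks, not a verbatim copy of equation (4); in particular, the way the conditioning sets are written is an assumption.

```latex
\begin{align*}
L(\alpha, \beta, \lambda)
 ={}& \prod_{i:\, Z_i = 1,\, T_i = 1} \pi_c(X_i)\,
      f(M_i \mid X_i, c; \alpha_{c1})\, f(Y_i \mid M_i, X_i, c; \beta_{c1}) \\
 &\times \prod_{i:\, Z_i = 1,\, T_i = 0} \pi_n(X_i)\,
      f(M_i \mid X_i, n; \alpha_{n1})\, f(Y_i \mid M_i, X_i, n; \beta_{n1}) \\
 &\times \prod_{i:\, Z_i = 0} \Big[\, \pi_c(X_i)\,
      f(M_i \mid X_i, c; \alpha_{c0})\, f(Y_i \mid M_i, X_i, c; \beta_{c0}) \\
 &\qquad\qquad + \pi_n(X_i)\,
      f(M_i \mid X_i, n; \alpha_{n1})\, f(Y_i \mid M_i, X_i, n; \beta_{n1}) \,\Big].
\end{align*}
```

The first two products correspond to the treatment arm, where compliance type is observed; the last two lines are the control-arm mixture, in which the never-taker component reuses $\alpha_{n1}$ and $\beta_{n1}$ because of the exclusion restriction.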
Based on the parameters among compliers obtained from the likelihood, we can write linear structural equation models (LSEM) with varying coefficients, in which the terms are the mean parameters of the corresponding varying coefficients.
Under Assumption 1, we can causally identify the complier average effect of treatment on the mediator (i.e., $\alpha_{cz}$) and on the outcome (i.e., $\beta_{cz}$). However, the complier average effect of the mediator on the outcome (i.e., $\beta_{cm}$ and $\beta_{czm}$) is not causally identified due to possible confounding in the mediator-outcome relationship among compliers. Therefore, we need to additionally invoke the local sequential ignorability assumption [7]. This assumption asserts ignorability of the mediator with respect to the potential outcome among compliers, given treatment and pre-treatment covariates. It implies that 1) among compliers, there is no pre-treatment confounding between $M$ and $Y$, given baseline covariates, and 2) among compliers, there is no treatment-induced confounding in the $M$ and $Y$ relationship, given baseline covariates. In formal expression,

Assumption 4: Local sequential ignorability (LSI)
Instead of requiring no unmeasured confounding in the $M$-$Y$ relationship for every participant, as in the standard causal mediation literature, the local sequential ignorability assumption requires unconfoundedness between the mediator and outcome only for compliers. Although LSI is required for a smaller subset of participants, this assumption is still challenging to meet in practice. Therefore, it is essential to examine the sensitivity of results to violations of this assumption.

Estimation
In this section, we propose a two-stage estimation method based on a joint modeling approach, in which distributional assumptions or additional covariates can be used to reduce the impact of violating some identification assumptions. In the first stage, using joint modeling, we estimate the densities $f(y \mid m, x, p; \beta_{pz})$ and $f(m \mid x, p; \alpha_{pz})$, which depend on parameters $\beta_{pz}$ and $\alpha_{pz}$, respectively, and the probability of compliers $\pi_c(x, \lambda)$, which depends on parameters $\lambda$. In the second stage, the CACME and CANDE are estimated based on the identification results presented in the previous section. Subsequently, the mediated and unmediated ITT effects are estimated on the basis of the CACME and CANDE estimates, respectively.
First Stage. In the first stage, we use joint modeling, which has been used for estimating the complier-average causal effect (CACE) [16,23]. We generalize that work by formulating and fitting a model to investigate the CACME and CANDE. The estimation procedure of this joint modeling approach is based on the expectation-maximization (EM) algorithm, in which the unobserved compliance type for each subject in the control group is treated as missing data. The E-step computes the expected values of the sufficient statistics, given the data and current estimates, and the M-step maximizes the likelihood shown in equation (4), given the updated sufficient statistics obtained from the E-step. These steps iterate until the parameter estimates stabilize (see [11,16,24,25] for further details on this procedure).
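The E- and M-steps can be illustrated with a deliberately simplified sketch. It is not the estimator used in this article: it assumes a single normally distributed variable (no separate mediator and outcome models), no covariates, a constant complier probability, and a common residual standard deviation. The function names `em_complier_mixture` and `simulate_example` and all numeric values in the simulation are hypothetical.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def em_complier_mixture(y, z, t, n_iter=200):
    """EM for a complier / never-taker mixture with types unknown in the control arm."""
    n = len(y)
    treated_c = [y[i] for i in range(n) if z[i] == 1 and t[i] == 1]
    treated_n = [y[i] for i in range(n) if z[i] == 1 and t[i] == 0]
    control = [i for i in range(n) if z[i] == 0]

    # Starting values come from the treatment arm, where types are observed.
    pi_c = len(treated_c) / (len(treated_c) + len(treated_n))
    mu_c1 = sum(treated_c) / len(treated_c)  # complier mean under Z = 1 (its MLE)
    mu_n = sum(treated_n) / len(treated_n)   # never-taker mean
    mu_c0 = mu_c1                            # complier mean under Z = 0
    sigma = 1.0

    # w[i] is the probability that unit i is a complier.
    w = [float(t[i]) if z[i] == 1 else pi_c for i in range(n)]

    for _ in range(n_iter):
        # E-step: posterior complier probability for control-arm units only.
        for i in control:
            num = pi_c * normal_pdf(y[i], mu_c0, sigma)
            den = num + (1.0 - pi_c) * normal_pdf(y[i], mu_n, sigma)
            w[i] = num / den

        # M-step: weighted maximum likelihood updates.
        pi_c = sum(w) / n
        # Exclusion restriction: one never-taker mean shared by both arms.
        mu_n = sum((1.0 - w[i]) * y[i] for i in range(n)) / sum(1.0 - w[i] for i in range(n))
        mu_c0 = sum(w[i] * y[i] for i in control) / sum(w[i] for i in control)
        ss = 0.0
        for i in range(n):
            mu_c = mu_c1 if z[i] == 1 else mu_c0
            ss += w[i] * (y[i] - mu_c) ** 2 + (1.0 - w[i]) * (y[i] - mu_n) ** 2
        sigma = math.sqrt(ss / n)

    return {"pi_c": pi_c, "mu_c0": mu_c0, "mu_c1": mu_c1, "mu_n": mu_n, "sigma": sigma}


def simulate_example(n=4000, seed=0):
    """Toy data: never-taker mean 0, complier means 3 (Z = 0) and 4 (Z = 1)."""
    rng = random.Random(seed)
    y, z, t = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)
        complier = rng.random() < 0.5
        mean = (3.0 + zi) if complier else 0.0
        y.append(rng.gauss(mean, 1.0))
        z.append(zi)
        t.append(zi if complier else 0)
    return y, z, t
```

Calling `em_complier_mixture(*simulate_example())` should return estimates close to the values used to simulate the toy data. The fixed number of iterations stands in for the convergence check described in the text.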
Using the EM algorithm, we can obtain the probability of compliers. We assume that $P_i$ given covariates follows a Bernoulli distribution with probability of compliance $\pi_c(x, \lambda)$ and probability of never taking $\pi_n(x, \lambda) = 1 - \pi_c(x, \lambda)$, specified by the logistic regression in equation (6), where $\lambda$ is a vector of logistic regression coefficients. Compared to the previous IV-based method [7,8], the proposed method provides additional information about the probability of compliers. This information will be used to create a pseudo-population of compliers in order to conduct a sensitivity analysis to violation of LSI. The conditional probability density functions of the random variables $M$ and $Y$ are obtained using the following parametric models. Given that we have two compliance types (compliers and never takers), the mediator and outcome models can be expressed as a mixture distribution of these two compliance types as in equation (7), where $C_i$ and $N_i$ are indicators for compliers and never takers, respectively; $\alpha_p$ and $\alpha_{pz}$ are the mean parameters of the mediator model coefficients; and $\beta_p$, $\beta_{pz}$, $\beta_{pm}$, and $\beta_{pzm}$ are the mean parameters of the outcome model coefficients for $p \in \{c, n\}$. The error terms for the mediator and outcome models are $e_{p2,i}$ and $e_{p3,i}$ for $p \in \{c, n\}$, respectively. These error terms follow a bivariate normal distribution with mean zero and covariance matrix $\begin{pmatrix} \sigma_{p2}^2 & \rho_p \sigma_{p2} \sigma_{p3} \\ \rho_p \sigma_{p2} \sigma_{p3} & \sigma_{p3}^2 \end{pmatrix}$, where $\rho_p$ is the correlation between $e_{p2,i}$ and $e_{p3,i}$, and $\sigma_{p2}$ and $\sigma_{p3}$ are the standard deviations of the two error terms.
To impose ER, we fixed the effect of treatment on the mediator and the outcome among never takers to zero (that is, $\alpha_{nz} = \beta_{nz} = \beta_{nzm} = 0$), thus not allowing a treatment effect among never takers. To impose LSI, we fixed the covariance among compliers between the errors of the mediator and outcome models to zero (equivalently, $\rho_c = 0$).
Second Stage. Based on the parameter estimates obtained from the first stage, the CACME and CANDE can be estimated as $\hat{\delta}_c(z) = \hat{\alpha}_{cz} \times (\hat{\beta}_{cm} + \hat{\beta}_{czm} z)$ and $\hat{\zeta}_c(z) = \hat{\beta}_{cz} + \hat{\beta}_{czm}(\hat{\alpha}_c + \hat{\alpha}_{cz} z)$, respectively. The mediated and unmediated ITT effects are estimated by multiplying the CACME and CANDE estimates by the proportion of compliers, as $\hat{\delta}(z) = \hat{\delta}_c(z) \times \hat{\pi}_c$ and $\hat{\zeta}(z) = \hat{\zeta}_c(z) \times \hat{\pi}_c$, respectively. Two-stage estimation is known to be inefficient in terms of standard errors [24], so we employed a bootstrap procedure to obtain correct standard errors for the mediated and unmediated ITT effects.
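The second-stage arithmetic can be checked with a short sketch that plugs first-stage estimates into the formulas above. The function name `second_stage` and the returned keys are hypothetical, and the bootstrap step is not shown.

```python
def second_stage(alpha_c, alpha_cz, beta_cz, beta_cm, beta_czm, pi_c, z):
    """Plug first-stage complier estimates into the second-stage formulas.

    Returns the CACME, CANDE, and the mediated and unmediated ITT effects
    evaluated at assignment level z (0 or 1).
    """
    cacme = alpha_cz * (beta_cm + beta_czm * z)
    cande = beta_cz + beta_czm * (alpha_c + alpha_cz * z)
    return {
        "cacme": cacme,
        "cande": cande,
        "mediated_itt": cacme * pi_c,
        "unmediated_itt": cande * pi_c,
    }
```

As a usage example, take the simulation values reported later in the article ($\alpha_{cz} = \beta_{cm} = \beta_{czm} = 1$, a complier effect on the outcome of 1, and a complier proportion of 0.5) together with an assumed complier intercept $\alpha_c = 1$. Then `second_stage(1, 1, 1, 1, 1, 0.5, z=1)["mediated_itt"]` and `second_stage(1, 1, 1, 1, 1, 0.5, z=0)["unmediated_itt"]` both equal 1, matching the stated true values $\delta(1) = \zeta(0) = 1$.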

Simulation Study
The purpose of this simulation study is to 1) assess the performance of the proposed joint modeling method and 2) examine the statistical power of the method. In addition, we examine the sensitivity of the estimates to violations of identification assumptions, and we explore changes in this sensitivity when the normality assumption is met or when a strong predictor of compliance exists. In the context of CACE, the impact of violating ER can be mitigated by using additional covariates [11]. However, the role of modeling assumptions when the ER assumption is violated is not well understood in a mediation setting. This will be addressed in our simulation study. For simplicity, we focus on the decomposition τ = δ(1) + ζ(0) in this simulation study.

Data Generation.
Our simulation results are based on 1000 replications with sample sizes of 200, 400, and 600. The assigned treatment $Z$ is a binary variable that takes the value of 1 or 0. The two values of $Z$ are randomly assigned to each observation with a probability of 0.5. In line with the JOBS II data, we assume that there are two compliance types: compliers and never takers. The compliance type for each observation is determined by a pre-treatment covariate following the logistic regression shown in equation (6), in which the pre-treatment covariate ($X$) is generated from a standard normal distribution. The true ratio of compliers to never takers is 50:50. The mediator ($M$) and outcome ($Y$) are generated for each compliance type following the regression shown in equation (7). For simplicity, the average complier treatment effect on the mediator is set to $\alpha_{cz} = 1$, and the average complier mediator effect and its interaction with the treatment on the outcome are set to $\beta_{cm} = \beta_{czm} = 1$. Thus, the true values of the mediated and unmediated ITT effects are $\delta(1) = \zeta(0) = 1$. The true residual variance is 1 for compliers and never takers (i.e., $\sigma_{p2}^2 = \sigma_{p3}^2 = 1$, where $p \in \{c, n\}$). One of the important conditions that we vary is the strength of the predictor ($X$) of compliance. In order to reflect a strong, medium, and small impact of the predictor, we vary the true value of $\lambda_n$ over $\{2.3, 1.2, 0.7\}$, which are equivalent to odds ratios of 0.1, 0.3, and 0.5. This setting is in line with Jo and Stuart [25] and Stuart and Jo [26], who investigated the impact of predictors of compliance on estimating treatment effects conditional on compliance types.
In addition, we generated three types of data in which 1) both mediator and outcome follow a normal distribution, 2) the outcome follows a normal distribution but the mediator does not, and 3) the mediator follows a normal distribution but the outcome does not. For the case in which both mediator and outcome follow a normal distribution, we generated errors for the mediator and outcome from the standard normal distribution. When either the mediator or the outcome violated the normality assumption, we generated two normal distributions that follow N(−1, 1) and N(3, 1) separately and combined them, which generates a bimodal distribution.
In order to create a situation in which the ER is violated, the effect of the treatment on the mediator and outcome among never takers is varied over $\alpha_{nz} = \beta_{nz} = \beta_{nzm} \in \{-0.5, -0.25, 0, 0.25, 0.5\}$. Since the residual variance is 1, these deviations from the ER can be considered to be in standard deviation (SD) units. We chose this range of values because the treatment effect on the mediator and the outcome for compliers is set to 1. We set the maximum values of $\alpha_{nz}$ and $\beta_{nz}$ to half the size of the corresponding complier effects (i.e., $\alpha_{cz}$ and $\beta_{cz}$) because never takers did not actually receive the treatment. In the analytical model, in which we estimate the mediated and unmediated ITT effects using the generated data, we assumed the ER and LSI. The rest of the parameters are specified as follows: $\alpha_x = \beta_x = \beta_{nm} = 1$.
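One replication of this data-generating process might look like the sketch below. Several details are assumptions made for illustration and are not stated in the text: the compliance model is taken to be a logistic regression for never-taker status with zero intercept (which yields a 50:50 split when $X$ is standard normal, and an odds ratio of compliance of $e^{-\lambda_n}$ per unit of $X$), the intercepts are set to $\alpha_c = 1$ and $\beta_c = \alpha_n = \beta_n = 0$, and the bimodal errors mix N(−1, 1) and N(3, 1) with equal weights. The function names are hypothetical.

```python
import math
import random

# Values stated in the text.
ALPHA_CZ, BETA_CZ, BETA_CM, BETA_CZM = 1.0, 1.0, 1.0, 1.0
ALPHA_X, BETA_X, BETA_NM = 1.0, 1.0, 1.0
# Intercepts: assumed for illustration only.
ALPHA_C, BETA_C, ALPHA_N, BETA_N = 1.0, 0.0, 0.0, 0.0


def draw_error(rng, bimodal=False):
    """Standard normal error, or an (assumed) equal-weight bimodal mixture."""
    if bimodal:
        return rng.gauss(-1.0, 1.0) if rng.random() < 0.5 else rng.gauss(3.0, 1.0)
    return rng.gauss(0.0, 1.0)


def generate_data(n, lambda_n, er_violation=0.0,
                  bimodal_m=False, bimodal_y=False, seed=0):
    """Generate one simulated data set as a list of dicts."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(0.0, 1.0)
        # Assumed form: logit P(never taker | x) = lambda_n * x.
        p_never = 1.0 / (1.0 + math.exp(-lambda_n * x))
        complier = rng.random() >= p_never
        t = z if complier else 0  # strong monotonicity: no receipt when z = 0
        e_m = draw_error(rng, bimodal_m)
        e_y = draw_error(rng, bimodal_y)
        if complier:
            m = ALPHA_C + ALPHA_CZ * z + ALPHA_X * x + e_m
            y = (BETA_C + BETA_CZ * z + BETA_CM * m
                 + BETA_CZM * z * m + BETA_X * x + e_y)
        else:
            # er_violation plays the role of alpha_nz = beta_nz = beta_nzm.
            m = ALPHA_N + er_violation * z + ALPHA_X * x + e_m
            y = (BETA_N + er_violation * z + BETA_NM * m
                 + er_violation * z * m + BETA_X * x + e_y)
        rows.append({"z": z, "t": t, "x": x, "m": m, "y": y, "complier": complier})
    return rows
```

For instance, `generate_data(400, 1.2, er_violation=0.25)` produces a sample of 400 with a medium predictor of compliance and a 0.25 SD deviation from ER; setting `er_violation=0.0` gives data in which ER holds.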
To assess the performance of the proposed method in various settings, we first examine the bias of the probability of compliers. This is crucial because this information will be used for sensitivity analysis in the later section. Then, we examine the percent bias (%bias), the percent normalized root mean square errors (%nRMSE), and coverage rate for the mediated and unmediated ITT effects to summarize our simulation results. The %bias measures the difference between the average of estimates and the true value relative to the true value. The %nRMSE measures the square root of the average of squared difference between the estimate and the true value relative to the true value. The coverage rate is defined as the proportion of replications where the true value is covered by the 95% confidence interval out of 1000 replications. To examine the statistical power in the method, we calculate the power under different sample sizes and distributions of the mediator and the outcome. The power is defined as the proportion of replications where the effect estimate is significantly different from zero (α = 0.05) out of 1000 replications.
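The summary measures defined above translate directly into code. The sketch below assumes that each replication supplies a point estimate and a 95% confidence interval, and that "significantly different from zero" is judged by whether the interval excludes zero; the function names are hypothetical.

```python
import math


def percent_bias(estimates, truth):
    """Difference between the mean estimate and the truth, relative to the truth (%)."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - truth) / truth


def percent_nrmse(estimates, truth):
    """Root mean squared error relative to the truth (%)."""
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / truth


def coverage_rate(intervals, truth):
    """Share of replications whose (lower, upper) interval contains the truth."""
    hits = sum(1 for lower, upper in intervals if lower <= truth <= upper)
    return hits / len(intervals)


def power(intervals):
    """Share of replications whose (lower, upper) interval excludes zero."""
    rejections = sum(1 for lower, upper in intervals if lower > 0.0 or upper < 0.0)
    return rejections / len(intervals)
```

With a true value of 1, the estimates 0.9, 1.1, 1.0, and 1.2 give a %bias of 5 and a %nRMSE of about 12.2.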
Simulation Results. The simulation results are summarized in Figures 1a-1c. The top plots present the bias of P(c) as well as the %bias, %nRMSE, and 95% confidence interval coverage rates of δ(1) with a normally distributed mediator and outcome. The middle and bottom plots present the same quantities with a non-normally distributed mediator and a non-normally distributed outcome, respectively.
The estimates of P(c) under zero deviation from ER are unbiased regardless of whether or not normality holds. The estimates of P(c) tend to be biased when the data deviate from ER, although the bias is relatively small. Even when the ER is violated by 0.5 SD, the bias is less than 0.07 with a small impact of covariates (OR of 0.5). Not surprisingly, the estimates of δ(1) under zero deviation from ER are unbiased, and the 95% coverage rate reaches the nominal level even when normality is not met. Although the bias is almost zero regardless of whether or not normality holds, the nRMSE tends to be large if normality does not hold for either the mediator or the outcome distribution. When normality holds, the nRMSE is less than 19% with a strong predictor of compliance. In the same setting, the nRMSEs are 32% and 22% when normality is violated for the mediator and the outcome, respectively. This indicates that standard errors tend to be large if normality is violated for the mediator or outcome distribution, even when all identification assumptions are met.
As expected, the effect estimates of δ(1) become biased when the data deviate from ER regardless of whether or not normality holds. If normality does not hold, the nRMSE becomes larger. When normality is met for both the mediator and outcome and the ER is violated by 0.25 SD, the bias is less than 10% and the nRMSE is 21% with a medium impact of the covariate (OR of 0.3) (Figure 1a). In the same setting but with normality violated for the mediator, the bias is the same (less than 10%) but the nRMSE is 35% (Figure 1b). The nRMSE is also larger (24%) when normality is violated for the outcome (Figure 1c).
Also, the bias is smaller in cases with a stronger predictor of compliance. In cases with a covariate with a strong effect size (OR of 0.1), the biases are about half of what they are with a covariate with a medium effect size. In the same setting (with a covariate with a medium effect size), the bias is less than 5% when normality is met (Figure 1a), and the bias is the same when normality is not met for the mediator or the outcome (Figure 1b and Figure 1c).
In summary, when normality is met and a strong predictor of compliance exists, the bias due to the relatively smaller deviation from ER (one fourth of the complier average effect) may be negligible given that the bias is less than 5% of the true value. However, when normality is violated for either the mediator or outcome, the nRMSE becomes larger, which will result in large standard errors.
The statistical power for the mediated ITT effect (δ(1)) under different sample sizes and distributions of the mediator and the outcome is shown in Figure 2. The figure illustrates that statistical power to detect the mediated ITT effect is greatly influenced by whether or not normality holds (Figure 2a). For example, if normality holds, statistical power is greater than 0.8 regardless of whether the impact of covariates is strong or small. If normality in the mediator does not hold, statistical power ranges from 0.4 (sample size of 200) to 0.9 (sample size of 600) (Figure 2b). Statistical power does not appear to be different if normality in the outcome does not hold. In summary, statistical power to detect the mediated ITT effect reaches a desirable level if normality holds, even with a small sample size (N = 200).
The results for ζ(0) are similar to those for δ(1). Given the similarity, we present the results for ζ(0) in the e-Appendix.

Joint Modeling-based Sensitivity Analysis
In this section, we propose sensitivity analyses that assess the robustness of the results to a possible violation of ER for never takers and of LSI. We focus on these two assumptions because the identification of the mediated and unmediated ITT effects relies crucially on them. The proposed sensitivity analyses can be employed when investigating a mediating mechanism in any randomized experiment that suffers from treatment noncompliance and in which access to the treatment is prohibited for those who are assigned to the control condition.
Sensitivity analysis for ER for never takers. The ER assumption for never takers requires that there is no effect of the assigned treatment on the mediator (or on the outcome) and, hence, that the treatment effect is zero for never takers. As shown in our simulation study, the impact of a violation of ER is smaller if there is a strong predictor of compliance and the normality assumption is met. However, the validity of the results may still be questioned if these modeling assumptions do not hold and/or the violation of ER is severe.
Although many sensitivity analyses have been developed for ER, very few sensitivity analyses are available for a mediation setting. For example, an alternative sensitivity analysis technique has been developed by Park and Kürüm [8] on the basis of the IV-based method. This technique involves specifying a ratio of the predicted outcome (mediator) value given Z=1 to the predicted outcome (mediator) value given Z=0 among never takers relative to a corresponding ratio among compliers. This approach is similar to our proposed sensitivity analysis technique. However, an IV-based sensitivity analysis technique does not have any means to decrease the impact of violating ER and thus provides a relatively large range of estimates for the change in the sensitivity parameters. In contrast, our proposed sensitivity analysis technique provides a smaller range of results for the change in the sensitivity parameters when normality is met or additional covariates exist.
If ER is violated, we can no longer assume that the distributions of the mediator and outcome among never takers are the same under either assignment. Therefore, our sensitivity parameters are based on the expected differences in the mediator and outcome distributions among never takers between those who are assigned to the treatment and control conditions. Specifically, let ϵm be the expected difference in the mediator value among never takers between those who are assigned to the treatment and control conditions, given covariates. Let ϵy1 + ϵy2 m be the expected difference in the outcome value among never takers between those who are assigned to the treatment and control conditions, given covariates, for every m ∈ M. Suppose that ER is violated but the other assumptions are met. Then, given particular values of ϵm, ϵy1, and ϵy2, the mediated and unmediated ITT effects are identified as functions of πc, πn, αn, αcz, βcz, βcm, βnm, and βczm, which are obtained from the maximized complete-data likelihood given those values of ϵm, ϵy1, and ϵy2. The proof of this result is provided in Appendix B.
Sensitivity analysis for LSI. The first part of assumption 2 states that, among compliers, there is no unmeasured confounding in the mediator-outcome relationship given baseline covariates. In many cases, the more covariates we observe, the more plausible the assumption is. However, we may not be able to measure all the covariates needed to remove confounding between the mediator and outcome among compliers. Many studies have addressed this issue of unmeasured mediator-outcome confounding when perfect compliance was assumed (e.g., [1,2,29,30]). However, very few studies have addressed this issue when perfect compliance was not assumed. The previous study based on the IV-based method [8] examined the sensitivity of the results to the violation of LSI by assuming the worst-case scenario.
In this study, we provide a systematic sensitivity analysis technique that can be used for all possible scenarios of unobserved confounding between the mediator and the outcome.
Imai et al. [1] identified the ACME given a value of the correlation between two error terms obtained from the mediator and outcome models when perfect compliance to the treatment was assumed. However, we cannot apply this approach in the presence of treatment noncompliance because the previously developed IV-based method does not provide any information on individual compliance status. Unlike the IV-based method, the joint modeling method provides the probability of an individual being a complier and this information can be used to assess the sensitivity to a possible violation of LSI.
Development of the sensitivity analysis for LSI relies on using an individual's probability of being a complier as a weight to create a pseudo-population of compliers. The term "pseudo-population" is often used in survey sampling for a population that mimics the original population by replicating sample units based on their probability of being sampled. Here, we define the pseudo-population as the original population of compliers, which is only partially observed. For the treatment group, those who attended the job training are assigned a weight of 1, and those who did not attend are assigned a weight of 0, because the probability of being a complier is known without error under strong monotonicity. For the control group, we cannot uniquely identify the compliance type of each individual because it is not observed; yet, we can create a weighted sample based on the probability of being a complier. Each individual is assigned a weight of πc(x)/πc, where πc(x) is the probability of being a complier given pretreatment covariates from equation (6) and πc is the proportion of compliers. By giving a weight proportional to πc(x), those who have a high chance of being a complier are given more weight and those who have a low chance are given less weight. By dividing the weight by the proportion of compliers (πc), we can recover the total sample size of the control group. For example, an individual in the control group with a complier probability of πc(x) = 0.8 will be replicated 0.8/0.5 = 1.6 times (when πc = 0.5), delivering 1.6 clones for the pseudo-population. The same logic was used in Ding and Lu [27].
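The weighting rule described above can be written as a short Python sketch. The function and argument names are hypothetical; in practice, πc(x) and πc would come from the fitted compliance model rather than being supplied by hand.

```python
def complier_weight(assigned, attended, pi_c_x, pi_c):
    """Pseudo-population weight for one individual.

    assigned: 1 if assigned to treatment, 0 if assigned to control.
    attended: 1 if the training was received (used only when assigned == 1).
    pi_c_x:   estimated probability of being a complier given covariates.
    pi_c:     overall proportion of compliers.
    """
    if assigned == 1:
        # Under strong monotonicity, compliance type is observed in the
        # treatment arm: attendees are compliers, non-attendees never takers.
        return 1.0 if attended == 1 else 0.0
    # Control arm: compliance type is unobserved, so weight by pi_c(x) / pi_c.
    return pi_c_x / pi_c

# The worked example from the text: pi_c(x) = 0.8 and pi_c = 0.5.
w = complier_weight(assigned=0, attended=0, pi_c_x=0.8, pi_c=0.5)
```

With these inputs the control-group individual receives a weight of 1.6, matching the 1.6 clones mentioned in the text.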
Based on this pseudo-population of compliers, the sensitivity of the results will be examined across the varying values of the correlation between the errors obtained from the mediator and the outcome models as in Imai et al. [1].
Suppose that LSI is violated, but the other assumptions are met. Let the correlation between the error terms from the mediator and outcome models fitted among the pseudo-population of compliers be denoted as ρc. Then, given a value of ρc, the mediated and unmediated ITT effects are identified. In the identifying expressions, z′ = 1 − z for z ∈ {0, 1}; ρcz is the correlation between the error terms ϵc1,i and ϵc2,i (from equations (5)) when Zi = z; and σc1 and σc2 are the standard deviations of the respective error terms, which are fixed to be constant across the values of Zi. The proof of this result is given in Appendix C.

Application to Jobs II Study
Our question of interest is whether the effect of the JOBS II intervention on reducing job-seekers' depression is transmitted through an increased sense of mastery. To answer this question, we estimate the mediated and unmediated portions of the ITT effect via sense of mastery using the proposed joint modeling method. We then show how the sensitivity of the estimated mediated and unmediated ITT effects to the violation of ER and LSI can be investigated using the results from the previous section.
Results. Table 2 shows the estimates of the mediated and unmediated ITT effects given assumptions 1-4. The difference in the outcome value between treatment and control subjects, -0.07, estimates the ITT estimand. The mediated portions of the ITT effect under the treatment and control conditions are negative and significant (-0.03 and -0.04), accounting for 43.1% and 61.1% of the ITT effect, respectively. In contrast, the unmediated ITT effects under the treatment and control conditions are not significant. This implies that, under assumptions 1-4, the mechanism through which the job training affects job-seekers' depression includes an enhanced sense of mastery. However, for a valid causal interpretation of the estimates, it is crucial to examine their sensitivity to violations of the identification assumptions. We require randomization, strong monotonicity, ER for never takers, and LSI. Randomization is satisfied because job training is assigned randomly. Strong monotonicity is also guaranteed because the program protocol prohibits subjects in the control group from accessing the job search seminar. However, ER for never takers is questionable: it might be violated due to psychological effects. For example, some participants who were assigned to the job training but failed to attend (never takers) may feel more depressed, which violates ER. LSI is also questionable because there could be unobserved confounding between sense of mastery and depression given the treatment level and pretreatment covariates. Therefore, we conduct sensitivity analyses for ER and LSI.
Sensitivity analysis for ER. In our study, we assume that this psychological effect is unlikely to be large because never takers did not actually attend the training. Hence, we limit the violation of ER to at most half the size of the complier average effect. The sensitivity parameters ϵm, ϵy1, and ϵy2 were set to one fourth (0.25) or one half (0.5) of the size of the corresponding complier average effect. Table 3 shows the adjusted estimates of the mediated ITT effect for varying values of ϵm, ϵy1, and ϵy2. The mediated ITT effect for those who are assigned to the treatment (δ(1)) appears robust to the violation of ER with respect to both the mediator and the outcome. For example, it is still negative and significant when the treatment effect among never takers is half the size of the corresponding complier average effect in either the mediator or the outcome model (ϵm = 0.5, or ϵy1 = ϵy2 = 0.5). In contrast, the mediated ITT effect for those who are assigned to the control (δ(0)) is relatively vulnerable to the violation of ER with respect to both the mediator and the outcome: it is still negative but loses its significance when the treatment effect among never takers is one fourth of the size of the complier average effect in either the mediator or the outcome model (ϵm = 0.25 or ϵy1 = ϵy2 = 0.25).
Sensitivity analysis for LSI. We next examine whether our conclusion about the mediated ITT effect changes if there is unmeasured pre-treatment confounding between the mediator and the outcome among compliers, while assuming that the other assumptions are satisfied. Figures 3a and 3b show the sensitivity of the mediated ITT effect estimates under treatment and control conditions, respectively, to the violation of LSI. These figures show how a change in ρc affects the mediated ITT effect estimates. The sensitivity parameter ρc represents the correlation among compliers between the errors from the mediator and outcome models, and a non-zero value of ρc indicates the existence of unmeasured confounding among compliers in the mediator-outcome relationship. The bold line in the middle represents the mediated ITT effect estimate as a function of ρc, and the solid lines represent the lower and upper limits of the 95% confidence intervals.
As shown in Figure 3a, the mediated ITT effect estimate for those who are assigned to the control will be close to zero if ρc is -0.4. However, the 95% confidence interval of the effect estimate will cover zero at a value of ρc that is smaller in magnitude, -0.3. This value of ρc is equivalent to an amount of confounding that explains, for example, 25% and 36% of the variances of the mediator and outcome, respectively³. This amount of confounding can be considered very large given that the strongest covariate (i.e., pre-measured depression) in the existing model explains 5.8% and 16.8% of the variances of the mediator and outcome, respectively.
As shown in Figure 3b, the mediated ITT effect estimate for those who are assigned to the treatment will be zero if ρc is -0.3. However, the 95% confidence interval of the effect estimate will cover zero if ρc is -0.2, which is equivalent to an amount of confounding that explains, for example, 16% and 25% of the variances of the mediator and outcome, respectively⁴. This amount of confounding can still be considered very large.
In summary, the significant mediation effect for those assigned to the treatment is robust to a potential violation of ER and to a potential violation of LSI when the other assumptions are satisfied. The mediation effect for those assigned to the control, however, may lose its significance if the effect among never takers is as large as one fourth of the corresponding complier average effect, although it is robust to a potential violation of LSI when the other assumptions are met. For these sensitivity analyses, we used Mplus [28]. Annotated Mplus code can be found in the online appendix.
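The correspondence between values of ρc and the shares of variance quoted above can be checked with a few lines of Python. The sketch assumes the product relation used by Imai et al. [1], in which the magnitude of the error correlation equals the square root of the product of the proportions of previously unexplained variance in the mediator and in the outcome that an unobserved confounder explains. The function name and the default negative sign are illustrative choices.

```python
import math

def rho_from_r2(r2_mediator, r2_outcome, sign=-1):
    """Error correlation implied by a confounder that explains the given
    shares of previously unexplained variance in mediator and outcome.

    Assumption: |rho| = sqrt(r2_mediator * r2_outcome); the sign is chosen
    by the analyst (negative here, as in the application).
    """
    return sign * math.sqrt(r2_mediator * r2_outcome)

rho_control = rho_from_r2(0.25, 0.36)  # shares quoted for rho_c = -0.3
rho_treated = rho_from_r2(0.16, 0.25)  # shares quoted for rho_c = -0.2
```

Because only the product of the two shares matters under this relation, other pairs (for example, 9% and 100%) would imply the same ρc, which is why the text presents each pair as one example.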

Summary and Conclusions
In this article, we proposed a two-stage joint modeling method that combines a mediation analysis with a mixture analysis to conduct causal mediation analysis in the presence of treatment noncompliance. On the basis of the mediation analysis, the mediator and outcome models can be specified and estimated. On the basis of the mixture analysis, the compliance-specific parameters can be specified and estimated, considering the mixed distributions of compliers and never takers.
One useful feature of the joint modeling method is that it is conducive to conducting sensitivity analyses to the violation of identification assumptions. In this study, we offer a systematic sensitivity analysis that addresses the two identification assumptions (the ER and LSI), which was not available in the previous instrumental variables approach. Sensitivity analysis is an important component in any causal inference framework because many identification assumptions are not verifiable with empirical data. The proposed sensitivity analysis can be easily used by applied researchers to test their results against violation of these identification assumptions.
Another useful feature of the joint modeling method is that we can invoke modeling assumptions, such as normality or the existence of strong predictors of compliance, that decrease the sensitivity to violations of some identification assumptions such as ER. In the context of CACE, including a strong predictor of compliance can decrease the bias due to a violation of ER and increase the precision of the estimates. We demonstrate in our simulation study that these benefits also apply when estimating the mediated ITT effect. Normality also plays a role in estimating compliance type more precisely, and the simulation study suggests that estimating compliance type is more affected by the outcome distribution than the mediator distribution.
However, these benefits come with a cost. From the simulation study, we observe a large variance in the estimates when normality is violated, even if all identification assumptions are met. If normality is violated, the advantages of the proposed joint modeling method disappear. In this case, one should consider using a propensity score method, as suggested by Jo and Stuart [25] and Ding and Lu [27], which relies only on pre-treatment covariates to identify unobserved compliance types and thus reduces reliance on particular parametric assumptions such as normality.
In this article, we introduced a two-stage joint modeling method to estimate the mediated and unmediated portion of the ITT effect and demonstrated the benefits of employing this method through simulation and case studies. A next logical step for future research is to compare relative performance between the proposed joint modeling method and the previous approach using the IV method [7,8]. Unlike the joint modeling method, the IV method does not require modeling assumptions and hence, the identification of the mediated ITT effect relies only on identification assumptions. Comparing relative performance when modeling assumptions are met or not met would be an interesting subject for future study.

Appendix A: Identification of δ(z) and ζ (z)
The mediated and unmediated portions of the ITT effects are identified on the basis of the CACME and CANDE, respectively. Therefore, we first identify the CACME and CANDE. From equations (5), note that the parameters in the second line of equations (5) are identified under randomization because ec1(z) ⊥ Z | X = x holds. The parameters in the third line of equations (5) are identified under randomization and LSI because ec3(z, m) ⊥ Z | X = x and ec3(z, m) ⊥ M | Z = z′, X = x, P = c. Given these parameters, the CACME, δc(z), is identified through a chain of equalities. The first equality is from the definition of the CACME. The second equality holds after incorporating the outcome model (i.e., the third line of equations (5)). The fourth equality holds after incorporating the mediator model (i.e., the second line of equations (5)). The fifth equality holds due to LSI (assumption 2). Specifically, among compliers, Yi(z, m) − Yi(z, m′) = (βcm,i + βczm,i z)(m − m′) is independent of Mi(z) for any z ∈ {0, 1}, as in LSI.
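As a sketch of how the CACME chain can be written out, suppose that, among compliers and with covariates suppressed, the mediator and outcome models take the random-coefficient form implied by (A-2). This form is an assumption made here for illustration; the exact specification is the one given in equations (5), and the number of steps below need not match the original chain.

```latex
% Assumed complier models (covariates suppressed):
%   M_i(z)    = \alpha_{c,i} + \alpha_{cz,i} z
%   Y_i(z, m) = \beta_{c,i} + \beta_{cz,i} z + \beta_{cm,i} m + \beta_{czm,i} z m
\begin{align*}
\delta_c(z)
  &= E\left[ Y_i(z, M_i(1)) - Y_i(z, M_i(0)) \mid P_i = c \right] \\
  &= E\left[ (\beta_{cm,i} + \beta_{czm,i} z)\,(M_i(1) - M_i(0)) \mid P_i = c \right] \\
  &= E\left[ (\beta_{cm,i} + \beta_{czm,i} z)\,\alpha_{cz,i} \mid P_i = c \right] \\
  &= (\beta_{cm} + \beta_{czm} z)\,\alpha_{cz}.
\end{align*}
```

The last step uses LSI in the same way as the final step of (A-2): the individual-level outcome coefficients are treated as independent of the individual-level mediator coefficients, so the expectation of the product factors into the product of the averages.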
Likewise, the CANDE, ζc(z), is identified as

$$\zeta_c(z) = \cdots = E[\beta_{cz,i} + \beta_{czm,i}(\alpha_{c,i} + \alpha_{cz,i} z)] = \beta_{cz} + \beta_{czm}(\alpha_c + \alpha_{cz} z). \tag{A-2}$$

The first equality is from the definition of the CANDE. The second equality holds after incorporating the outcome model (i.e., the third line of equations (5)). The fourth equality holds after incorporating the mediator model (i.e., the second line of equations (5)). The fifth equality holds due to LSI (assumption 2). Specifically, among compliers, Yi(1, m) − Yi(0, m) = βcz,i + βczm,i m is independent of Mi(z) for any z ∈ {0, 1}, as in LSI. Next, we identify the mediated and unmediated ITT effects on the basis of the CACME and CANDE as

$$\delta(z) = \delta_c(z)\pi_c + \delta_n(z)\pi_n = \delta_c(z)\pi_c, \quad \text{and} \quad \zeta(z) = \zeta_c(z)\pi_c + \zeta_n(z)\pi_n = \zeta_c(z)\pi_c,$$

where δn and ζn are the ACME and the average natural direct effect among never takers, respectively. The first equality holds because of strong monotonicity. The second equality holds because of ER for never takers. This completes the proof.
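The final identification step can be illustrated numerically with a small Python sketch. The expression for ζc(z) is taken from (A-2); the expression used for δc(z), namely (βcm + βczm z)αcz, is an assumed closed form consistent with a linear outcome model with a treatment-by-mediator interaction and is not quoted from the appendix. All parameter values and function names are hypothetical.

```python
def mediated_itt(z, pi_c, alpha_cz, beta_cm, beta_czm):
    """delta(z) = delta_c(z) * pi_c, using the assumed closed form
    delta_c(z) = (beta_cm + beta_czm * z) * alpha_cz."""
    return (beta_cm + beta_czm * z) * alpha_cz * pi_c

def unmediated_itt(z, pi_c, alpha_c, alpha_cz, beta_cz, beta_czm):
    """zeta(z) = zeta_c(z) * pi_c, with zeta_c(z) as in (A-2)."""
    return (beta_cz + beta_czm * (alpha_c + alpha_cz * z)) * pi_c

# Hypothetical complier parameters and complier proportion.
params = dict(pi_c=0.5, alpha_cz=0.2, beta_czm=0.01)
d1 = mediated_itt(1, beta_cm=-0.3, **params)
d0 = mediated_itt(0, beta_cm=-0.3, **params)
z1 = unmediated_itt(1, alpha_c=3.0, beta_cz=-0.1, **params)
z0 = unmediated_itt(0, alpha_c=3.0, beta_cz=-0.1, **params)

# Both decompositions should recover the same total ITT effect.
itt_a = d1 + z0
itt_b = d0 + z1
```

In this sketch the two decompositions, δ(1) + ζ(0) and δ(0) + ζ(1), give the same total, which is a quick consistency check on the assumed closed forms.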