For the SNMM model, the assumptions necessary for unbiased inference are: (1) randomization (i.e., ignorability) of baseline intervention assignment; (2) the Stable Unit Treatment Value Assumption (SUTVA), which is not testable [3]; (3) independence of observations for standard error estimation (not testable); and (4) model assumptions, including no-interaction assumptions among baseline covariates, the baseline randomized intervention, and the post-randomization modifier (some testable, some not).

The randomization assumption implies stochastic independence between the randomization indicator for the baseline intervention, $R,$ and the potential outcomes within strata defined by baseline covariates $\mathbf{X}:$
$\mathrm{Pr}({Y}_{0},{Y}_{1}|R,\mathbf{X})=\mathrm{Pr}({Y}_{0},{Y}_{1}|\mathbf{X})$ (4)

This assumption means that the distribution of observed and unobserved baseline covariates should be balanced between the randomized baseline intervention groups. Although this assumption is not testable, it is implied by physical randomization.

The Stable Unit Treatment Value Assumption (SUTVA; [3, 26]) consists of two sub-assumptions. First, for each participant, there is a single value of the potential outcome random variable corresponding to the random assignment level $r$, ${Y}_{r}$, regardless of the randomization assignment of any other participant. This assumption implies that ${Y}_{r}$ is defined with scalar indices for a given participant, rather than vectors of indices representing the baseline intervention assignments and post-randomization modifier levels of all participants. Second, SUTVA implies that for each treatment level *r*, a single value of the potential outcome ${Y}_{r}$ and potential post-randomization variable ${M}_{r}$ exists regardless of how the treatment level *r* was administered.

We outline two approaches to estimating parameters in the SNMM. The first is a two-stage least squares (2SLS) approach. To apply this approach, we regress the outcome on randomization and baseline covariates:
$$\begin{aligned}
E(Y|R,X) &= \sum_{m} E(Y|R,M=m,X)\,\mathrm{Pr}(M=m|R,X)\\
&= \sum_{m}\left\{E({Y}_{0}|R,M=m,X)+R{\theta}_{R}+Rm{\theta}_{RM}\right\}\mathrm{Pr}(M=m|R,X)\\
&= \sum_{m}\left\{E({Y}_{0}|R,M=m,X)\,\mathrm{Pr}(M=m|R,X)+R{\theta}_{R}\,\mathrm{Pr}(M=m|R,X)+Rm{\theta}_{RM}\,\mathrm{Pr}(M=m|R,X)\right\}\\
&= E({Y}_{0}|R,X)+R{\theta}_{R}+R\,E(M|R,X)\,{\theta}_{RM}\\
&= E({Y}_{0}|X)+R{\theta}_{R}+R\,E(M|R,X)\,{\theta}_{RM}
\end{aligned}$$

Here, the second equality follows by substituting the SNMM into the equation, and the final equality, $E({Y}_{0}|R,X)=E({Y}_{0}|X)$, follows from randomization. We can use a two-stage procedure to estimate the parameters in the model: we first estimate $E(M|R,X)$ by regressing *M* on *R* and *X*, then estimate $\theta \equiv \left\{{\theta}_{R},{\theta}_{RM}\right\}$ by regressing *Y* on *X*, *R*, and *R* times the estimate $\hat{E}(M|R,X)$. If both stages are performed using least squares, this is a two-stage least squares procedure. Standard software for 2SLS accounts, in the variance estimation, for the fact that $\hat{E}(M|R,X)$ is estimated.
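As a minimal sketch of this two-stage procedure, consider a hypothetical simulated data set with one baseline covariate *X*, randomized *R*, post-randomization modifier *M*, and outcome *Y*; all data-generating values below are illustrative assumptions, not taken from the paper. Both stages are performed with plain least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical data-generating process: X is a baseline covariate,
# R the randomized intervention, U an unmeasured common cause of
# M and Y (the confounding that biases standard regression),
# M the post-randomization modifier, Y the outcome.
theta_R, theta_RM = 1.0, 0.5            # true causal parameters
X = rng.normal(size=n)
R = rng.integers(0, 2, size=n)
U = rng.normal(size=n)
M = 0.5 + X + 0.8 * R + U + rng.normal(size=n)
Y0 = 2.0 + X + U + rng.normal(size=n)   # potential outcome under R = 0
Y = Y0 + R * theta_R + R * M * theta_RM

def ols(y, Z):
    """Least-squares coefficients of y on the columns of Z."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# Stage 1: estimate E(M | R, X) by regressing M on R and X.
Z1 = np.column_stack([np.ones(n), R, X, R * X])
M_hat = Z1 @ ols(M, Z1)

# Stage 2: regress Y on X, R, and R times the stage-1 fitted values.
Z2 = np.column_stack([np.ones(n), X, R, R * M_hat])
coef = ols(Y, Z2)
print(coef[2], coef[3])  # estimates of theta_R and theta_RM
```

Regressing *Y* on the observed product *RM* instead would be biased here because *U* confounds *M* and *Y*; the stage-1 fitted values carry only the *X*- and *R*-driven variation in *M*. Note also that naive second-stage standard errors are too small; as stated above, 2SLS routines correct the variance for the estimated first stage.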

2SLS is an asymptotically efficient approach under the SNMM assumptions, homoscedasticity, and a linear outcome model [27]. The SNMM estimate is less efficient than a standard regression model because it uses only part of the variation in *M* to estimate the effect modification of *R* by *M*. The SNMM extracts the part of the variation in *M* that is due to *X* and uses this confounder-free variation (under the SNMM assumptions) to estimate the effect modification by *M*. The standard regression approach, by contrast, uses the whole variation in *M*, but this variation may be confounded, leading to bias.

We concentrate on the 2SLS approach and use it in simulation and practice because standard software can be used. G-estimation [28, 29] is a more general approach that applies in a larger variety of settings; it is less reliant on parametric assumptions and extends to time-varying treatments. For G-estimation, we note that ${Y}_{0}$ is independent of *R* given *X*. We can then create a random variable $U(\theta)\equiv Y-R{\theta}_{R}-RM{\theta}_{RM}$ and use the estimating equations:

$\sum_{i}({R}_{i}-p)\left\{{U}_{i}(\theta)-{\mu}_{{X}_{i}}\right\}{W}_{i}=0,$

where *p* is the randomization probability $\mathrm{Pr}(R=1)$, *W* is a vector of non-collinear weights of dimension 2, and ${\mu}_{X}$ is some function of *X*. Typically, we choose ${\mu}_{X}=E\{U(\theta)|X\}$ or ${\mu}_{X}=0$, and ${W}_{X}={\{1,E(M|R=1,X)\}}^{T}$.
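Because $U(\theta)$ is linear in $\theta$, these estimating equations (taking ${\mu}_{X}=0$) reduce to a 2×2 linear system that can be solved directly. The sketch below uses the same hypothetical data-generating process as the 2SLS illustration and is not a general G-estimation implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
p = 0.5                                  # known randomization probability Pr(R = 1)

# Same hypothetical data-generating process as in the 2SLS sketch.
theta_R, theta_RM = 1.0, 0.5             # true causal parameters
X = rng.normal(size=n)
R = rng.binomial(1, p, size=n)
U = rng.normal(size=n)                   # unmeasured common cause of M and Y
M = 0.5 + X + 0.8 * R + U + rng.normal(size=n)
Y = 2.0 + X + U + rng.normal(size=n) + R * theta_R + R * M * theta_RM

# Weights W_X = {1, E(M | R = 1, X)}^T, with E(M | R = 1, X) estimated
# by regressing M on X within the R = 1 arm.
Z = np.column_stack([np.ones(n), X])
b1 = np.linalg.lstsq(Z[R == 1], M[R == 1], rcond=None)[0]
W = np.column_stack([np.ones(n), Z @ b1])    # n x 2 weight matrix

# With U(theta) = Y - R*theta_R - R*M*theta_RM and mu_X = 0, the
# estimating equations sum_i (R_i - p) U_i(theta) W_i = 0 are linear
# in theta: A @ theta = c.
w = R - p
A = W.T @ (w[:, None] * np.column_stack([R, R * M]))
c = W.T @ (w * Y)
theta_hat = np.linalg.solve(A, c)
print(theta_hat)  # approximately (theta_R, theta_RM)
```

The solution is consistent because, at the true $\theta$, $U(\theta)$ equals ${Y}_{0}$ plus noise independent of *R*, so $({R}_{i}-p)$ is uncorrelated with $U_{i}(\theta){W}_{i}$ under randomization.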

Estimation using these equations or the 2SLS approach requires that $E(M|R=1,X)$ vary with *X*; otherwise the regressors are collinear. These approaches essentially assume that variation with *X* in the causal effect of *R* on *Y* can be explained by variation, across levels of the baseline covariates *X*, in the expected value of the post-treatment variable *M* in the treatment group. The reasonableness of this assumption should be assessed in any application. The assumption may be relaxed by allowing the effect of treatment to vary across levels of some subset of covariates; nonetheless, the general approach requires that the variation in the effect of *R* on *Y* across that subset of covariates be explained by the variation in the effect of *R* on *M* across those covariates. Under our design and other assumptions, this assumption is not fully testable, and its reasonableness must be assessed in each setting.

In the context of our example, estimation of effect modification depends on having pre-treatment predictors of the expected value of early outcomes (e.g., depression severity at one month) that in turn predict the subsequent effects of the intervention on outcomes at three to six months. The efficiency of the estimation procedure depends on the strength of these relationships.

## A note on standard regression

The standard regression estimates of the parameters $\beta$ or $\gamma$ will in general be biased as estimators of the causal effects $\theta$ if *M* is affected by *R*, because they condition on, or adjust for, a variable *M* affected by the main treatment *R* [30, 31]. We note that *M* is likely to be affected by *R* in our data, given the theory of the common treatment factor, which predicts an early effect of the CT intervention on one-month depression severity that in turn modifies subsequent effects of the intervention on three- and six-month outcomes.
