In this section we compare multiplicative and additive dif-in-dif models. First, we highlight that the key difference lies in the specification of the time trend. Second, we show that the two models are related, but crucially a causal interpretation cannot be given to both.

We start with the simplest, though quite popular, dif-in-dif setting, involving two groups (*g* ∈ {*control*, *treated*}) and two time periods (*t* ∈ {*pre*, *post*}), with only one group actually receiving the treatment in the second period. We analyse the case of a continuous outcome *y*, such as earnings or consumption.^{2} First, we specify a model for the expected value of *y* when not treated (*y*_{0igt}), conditional on *g* and *t*. Second, we make an assumption about how the expected value of the potential outcome when treated (*y*_{1igt}) is related to the expected *y*_{0igt}.

The standard dif-in-dif in levels (Angrist and Pischke 2009) combines an additive common trends assumption with an additive treatment effect:

$$E\left[{y}_{1igt}|g,t\right]=E\left[{y}_{0igt}|g,t\right]+{\delta}^{\ast}={\mu}_{g}^{\ast}+{\lambda}_{t}^{\ast}+{\delta}^{\ast}.$$(1)

where ${\mu}_{g}^{\ast}$ are group-specific effects, ${\lambda}_{t}^{\ast}$ are time-specific effects, and *δ*^{*} is the treatment effect. The superscript ^{*} is used to differentiate the model in levels from the multiplicative one.^{3} The additive model for the counterfactual outcomes leads to a linear model for the observed outcome *y*_{it}:

$${y}_{it}={\beta}_{0}^{\ast}+{\beta}_{1}^{\ast}treate{d}_{it}+{\beta}_{2}^{\ast}pos{t}_{it}+{\beta}_{3}^{\ast}treate{d}_{it}\times pos{t}_{it}+{\epsilon}_{it}$$(2)

$$E\left[{\epsilon}_{it}|treate{d}_{it},pos{t}_{it}\right]=0.$$(3)

where *treated*_{it} is a dummy for the treatment group and *post*_{it} for the second period. If the correct model for the counterfactuals is additive, then ${\beta}_{0}^{\ast}={\mu}_{control}^{\ast}+{\lambda}_{pre}^{\ast}$, ${\beta}_{1}^{\ast}={\mu}_{treated}^{\ast}-{\mu}_{control}^{\ast}$ and ${\beta}_{2}^{\ast}={\lambda}_{post}^{\ast}-{\lambda}_{pre}^{\ast}$. The coefficient on the interaction term captures the quantity of interest, because it is equal to the treatment effect: ${\beta}_{3}^{\ast}={\delta}^{\ast}$.
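As a minimal sketch of how the saturated linear model (2) maps the four cell means into its coefficients, the Python snippet below generates expected outcomes from the additive model (1) and recovers the coefficients by differencing. The numeric values of the parameters are hypothetical and chosen only for illustration; they do not come from the paper.

```python
# Hypothetical additive parameters (illustrative assumptions only).
mu_star = {"control": 10.0, "treated": 6.0}
lam_star = {"pre": 0.0, "post": 2.0}
delta_star = 1.5


def expected_y(group, period):
    """Expected observed outcome under the additive model in eq. (1)."""
    treated_now = group == "treated" and period == "post"
    return mu_star[group] + lam_star[period] + (delta_star if treated_now else 0.0)


means = {(g, t): expected_y(g, t) for g in mu_star for t in lam_star}

# Coefficients of the saturated linear model in eq. (2), from the cell means.
b0 = means[("control", "pre")]
b1 = means[("treated", "pre")] - b0
b2 = means[("control", "post")] - b0
b3 = (means[("treated", "post")] - means[("treated", "pre")]) - (
    means[("control", "post")] - means[("control", "pre")]
)

print(b0, b1, b2, b3)
```

Under these assumptions the interaction coefficient `b3` coincides with `delta_star`, which is the sense in which the difference of differences identifies the additive treatment effect.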

Alternatively, one might specify an exponential model

$$E\left[{y}_{0igt}|g,t\right]=exp\left({\mu}_{g}+{\lambda}_{t}\right)$$(4)

where the assumption of common trends is in multiplicative form (see Mullahy 1997, for a discussion of IV estimation of an exponential model). Over time, the outcome in the absence of treatment would increase by the same percentage in both groups. We can assume a proportional treatment effect:

$$\frac{E\left[{y}_{1igt}|g,t\right]-E\left[{y}_{0igt}|g,t\right]}{E\left[{y}_{0igt}|g,t\right]}=exp\left(\delta \right)-1,$$(5)

which implies that the effect is expressed as a proportional change with respect to the counterfactual scenario in the absence of the treatment. This comes naturally in applications involving continuous variables such as consumption and wages, where changes are commonly expressed in proportional terms, and it is consistent with the (proportional) specification of the time trend and the group difference. The multiplicative model is therefore:

$$E\left[{y}_{1igt}|g,t\right]=exp({\mu}_{g}+{\lambda}_{t}+\delta )$$(6)

Intuitively, the total percentage change in the expected outcome of the treated group is composed of a percentage change due to time (call it %*time*) and the percentage effect of the treatment (call it %*effect*), so that (1 + %*change*) = (1 + %*time*) × (1 + %*effect*). For the control group, by contrast, (1 + %*change*) = (1 + %*time*). In this case the counterfactual model leads to an exponential model for the observed outcomes:
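The decomposition of the total percentage change can be checked numerically. The sketch below assumes illustrative values for the time effects and for *δ* (they are assumptions made for the example) and verifies that the percentage changes compound multiplicatively rather than add up.

```python
import math

# Illustrative values (assumptions for this example).
lam_pre, lam_post, delta = -0.2, 0.0, 0.2

pct_time = math.exp(lam_post - lam_pre) - 1    # change due to time alone
pct_effect = math.exp(delta) - 1               # proportional treatment effect
pct_change_treated = math.exp(lam_post - lam_pre + delta) - 1
pct_change_control = pct_time                  # control group: time only

# (1 + %change) = (1 + %time) * (1 + %effect) for the treated group.
compounded = (1 + pct_time) * (1 + pct_effect) - 1

print(f"%time = {pct_time:.4f}, %effect = {pct_effect:.4f}")
print(f"%change treated = {pct_change_treated:.4f}, compounded = {compounded:.4f}")
```

With these values each component is roughly 22 per cent, while the total change for the treated group is roughly 49 per cent, more than the simple sum of the two.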

$${y}_{it}=exp\left({\beta}_{0}+{\beta}_{1}treate{d}_{it}+{\beta}_{2}pos{t}_{it}+{\beta}_{3}treate{d}_{it}\times pos{t}_{it}\right){\eta}_{it}$$(7)

where *η*_{it} is a multiplicative error term that satisfies a mean independence assumption:

$$E\left[{\eta}_{it}|treate{d}_{it},pos{t}_{it}\right]=1.$$(8)

If the correct model for the counterfactuals is multiplicative, then *β*_{0} = *μ*_{control} + *λ*_{pre}, *β*_{1} = *μ*_{treated} − *μ*_{control} and *β*_{2} = *λ*_{post} − *λ*_{pre}. More importantly, the exponentiated coefficient on the interaction term, which is a ratio of ratios (ROR, see Mullahy 1999; Buis 2010), is the quantity of interest to the researcher because it is directly related to the proportional treatment effect:

$$exp\left({\beta}_{3}\right)=\frac{E\left[{y}_{it}|treate{d}_{it}=1,pos{t}_{it}=1\right]}{E\left[{y}_{it}|treate{d}_{it}=1,pos{t}_{it}=0\right]}/\frac{E\left[{y}_{it}|treate{d}_{it}=0,pos{t}_{it}=1\right]}{E\left[{y}_{it}|treate{d}_{it}=0,pos{t}_{it}=0\right]}=exp\left(\delta \right).$$(9)
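A short numerical sketch of eq. (9) follows. The helper `ratio_of_ratios` and the parameter values are hypothetical, introduced only to illustrate that the ratio of ratios of the four cell means returns exp(*δ*) when the means are generated by the exponential model.

```python
import math


def ratio_of_ratios(treated_post, treated_pre, control_post, control_pre):
    """Ratio of ratios of the four observed cell means, as in eq. (9)."""
    return (treated_post / treated_pre) / (control_post / control_pre)


# Hypothetical parameters of the exponential model (illustrative only).
mu = {"control": 1.0, "treated": 0.4}
lam = {"pre": 0.0, "post": 0.3}
delta = 0.1


def cell_mean(group, period):
    """Expected observed outcome under eqs. (4) and (6)."""
    effect = delta if (group == "treated" and period == "post") else 0.0
    return math.exp(mu[group] + lam[period] + effect)


ror = ratio_of_ratios(
    cell_mean("treated", "post"),
    cell_mean("treated", "pre"),
    cell_mean("control", "post"),
    cell_mean("control", "pre"),
)
beta3 = math.log(ror)  # interaction coefficient of the exponential model

print(ror, math.exp(delta), beta3)
```

The group and time effects cancel in the ratio of ratios, so only the treatment term survives.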

The researcher can also calculate the impact of the treatment (on the treated) during the *post* period in levels:

$$\begin{array}{rl}& exp({\beta}_{0}+{\beta}_{1}+{\beta}_{2}+{\beta}_{3})-exp({\beta}_{0}+{\beta}_{1}+{\beta}_{2})\\ & \phantom{\rule{1em}{0ex}}=exp({\mu}_{treated}+{\lambda}_{post}+\delta )-exp({\mu}_{treated}+{\lambda}_{post})=E[{y}_{1i,g=treated,t=post}]-E[{y}_{0i,g=treated,t=post}].\end{array}$$(10)

The well-known suggestion by Ai and Norton (2003) for non-linear models would instead lead us to calculate the cross difference (Mullahy 1999), which is not equal to the treatment effect:

$$\begin{array}{rl}& \left[exp\left({\beta}_{0}+{\beta}_{1}+{\beta}_{2}+{\beta}_{3}\right)-exp\left({\beta}_{0}+{\beta}_{1}\right)\right]-\left[exp\left({\beta}_{0}+{\beta}_{2}\right)-exp\left({\beta}_{0}\right)\right]=\\ & \phantom{\rule{1em}{0ex}}[exp({\mu}_{treated}+{\lambda}_{post}+\delta )-exp({\mu}_{treated}+{\lambda}_{pre})]-[exp({\mu}_{control}+{\lambda}_{post})-exp({\mu}_{control}+{\lambda}_{pre})],\end{array}$$(11)

Indeed, as argued by Puhani (2012) (reprised in Karaca-Mandic, Norton, and Dowd 2012), in any non-linear dif-in-dif model with an index structure and a strictly monotonic transformation function, the treatment effect is not equal to the cross-difference of the observed outcome, but rather to the difference between two cross-differences. Unlike in the general non-linear case of Puhani (2012), in the exponential model the exponentiated interaction coefficient is directly interpretable in terms of the proportional effect: exp(*β*_{3}) − 1 equals the right-hand side of eq. (5).

The treatment effect can therefore be easily expressed in levels in both models (using eq. 10). This highlights that the key difference between them is how the common trends assumption is specified. The reason is that the counterfactual *y*_{0igt} is identified by looking at the change in the control group over time. Once this counterfactual is correctly modelled, we can use it to determine what share of the average change in the treated group should be attributed to the treatment.

To illustrate, Figure 1 is generated with an exponential model as in eq. (6). In this case the treated group starts from a lower level. Given the multiplicative trend, in the absence of the treatment the increase in this group over time would be smaller in absolute terms than the increase in the control group. Therefore a standard dif-in-dif in levels would underestimate the share of the change that should be attributed to the treatment. The bias is larger the larger the pre-treatment difference between the groups and the larger the proportional time effect. By contrast, once we correctly account for the multiplicative time trend, it does not matter whether we express the treatment effect as a percentage difference or as a level difference. Indeed, the former is the fraction on the left-hand side of eq. (5), while the latter is simply its numerator. Nevertheless, once the time trend is in multiplicative form, a multiplicative treatment effect leads to an exponential model, which is clearer and easier to estimate. Furthermore, the effect in levels as measured in eq. (10) is specific to the *post* period. If we want to predict how the policy will affect future outcomes, and we believe that the true treatment effect is multiplicative, then it is more appropriate to focus on the percentage change.

Figure 1: An Example of a Dif-in-Dif Setting with Multiplicative Time Trend.

The figure shows the expected value of the actual and counterfactual outcome for the different groups according to the exponential model in eq. (4) and (5) with *μ*_{control} = −0.3; *μ*_{treated} = −0.7; *λ*_{pre} = −0.2; *λ*_{post} = 0; *δ* = 0.2.
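The quantities plotted in Figure 1 can be reproduced from the parameter values in the caption. The sketch below evaluates the exponential model at those values and compares the effect in levels from eq. (10) with the cross-difference from eq. (11); the function and variable names are hypothetical and chosen for this illustration.

```python
import math

# Parameter values reported in the caption of Figure 1.
mu = {"control": -0.3, "treated": -0.7}
lam = {"pre": -0.2, "post": 0.0}
delta = 0.2


def y0(group, period):
    """Expected outcome without treatment, eq. (4)."""
    return math.exp(mu[group] + lam[period])


def y1(group, period):
    """Expected outcome with treatment, eq. (6)."""
    return math.exp(mu[group] + lam[period] + delta)


# Effect in levels on the treated in the post period, eq. (10).
level_effect = y1("treated", "post") - y0("treated", "post")

# Cross-difference of observed cell means, eq. (11).
cross_difference = (y1("treated", "post") - y0("treated", "pre")) - (
    y0("control", "post") - y0("control", "pre")
)

bias = cross_difference - level_effect
print(f"level effect     = {level_effect:.4f}")
print(f"cross-difference = {cross_difference:.4f}")
print(f"bias             = {bias:.4f}")
```

With these parameters the cross-difference is about 0.066 against a level effect of about 0.110, so a dif-in-dif in levels would understate the effect, in line with the discussion of Figure 1.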

In a standard dif-in-dif setting with two periods and two groups, the practitioner may still be tempted to treat the two models as fully equivalent. This is actually true with respect to the model for *observed* outcomes, because both the exponential model (7) and the linear one (2) are saturated: the four parameters perfectly fit the four cell means given by the combinations of *treated*_{it} and *post*_{it}. Indeed, the exponential model is just a reparametrisation of the linear one, with

$$exp\left({\beta}_{3}\right)-1=\frac{\left({\beta}_{0}^{\ast}+{\beta}_{1}^{\ast}+{\beta}_{2}^{\ast}+{\beta}_{3}^{\ast}\right)/\left({\beta}_{0}^{\ast}+{\beta}_{1}^{\ast}\right)}{\left({\beta}_{0}^{\ast}+{\beta}_{2}^{\ast}\right)/{\beta}_{0}^{\ast}}-1.$$(12)

This was noted by Gregg, Waldfogel, and Washbrook (2006), who showed that we can estimate eq. (2) and then recover both the level and the percentage (multiplicative) effect.^{4}
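As a rough sketch of this recovery step, the function below applies eq. (12) to the coefficients of the linear model (2). The function name and the coefficient values are hypothetical, and the calculation assumes that all four implied cell means are strictly positive so that the ratios are well defined.

```python
def percentage_effect_from_linear(b0, b1, b2, b3):
    """exp(beta_3) - 1 from the coefficients of the linear model, eq. (12)."""
    treated_ratio = (b0 + b1 + b2 + b3) / (b0 + b1)  # treated: post over pre
    control_ratio = (b0 + b2) / b0                   # control: post over pre
    return treated_ratio / control_ratio - 1


# Hypothetical linear coefficients (illustrative only).
b0, b1, b2, b3 = 10.0, -4.0, 2.0, 1.5

pct_effect = percentage_effect_from_linear(b0, b1, b2, b3)
print(f"level interaction = {b3}, implied percentage effect = {pct_effect:.4f}")
```

In the two-by-two case, the same estimated cell means therefore deliver both the interaction coefficient in levels and the ratio of ratios. Which of the two deserves a causal reading depends on the assumed model for the counterfactual.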

In spite of (12), if the true model for counterfactuals is multiplicative, ${\beta}_{3}^{\ast}$ cannot be interpreted as a causal parameter, because it includes not only the level change due to the treatment, but also the difference between the time changes in levels of the treatment and control groups [in fact, ${\beta}_{3}^{\ast}$ is equal to the cross-difference from eq. (11)]. More generally, the equivalence (12) does not hold once we condition on other covariates, such as demographic controls. The reason is that the equation for the observed outcome is then no longer saturated. Therefore at most one of the two models, the linear or the exponential, can be correctly specified. The same is true if we have more than two periods and a time trend is included.
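To make the two components of ${\beta}_{3}^{\ast}$ explicit, one can add and subtract the counterfactual term $exp({\mu}_{treated}+{\lambda}_{post})$ in the cross-difference (11). The following decomposition is a sketch that follows directly from eqs. (4), (6) and (11):

$$\begin{array}{rl}{\beta}_{3}^{\ast}& =\left[exp\left({\mu}_{treated}+{\lambda}_{post}+\delta \right)-exp\left({\mu}_{treated}+{\lambda}_{post}\right)\right]\\ & \phantom{\rule{1em}{0ex}}+\left[exp\left({\mu}_{treated}\right)-exp\left({\mu}_{control}\right)\right]\left[exp\left({\lambda}_{post}\right)-exp\left({\lambda}_{pre}\right)\right].\end{array}$$

The first bracket is the effect in levels from eq. (10). The second term is the difference between the counterfactual time changes in levels of the two groups. It vanishes only if the groups share the same pre-treatment level or if there is no time effect, and its magnitude grows with both, which is why the bias increases with the pre-treatment gap and with the proportional time effect.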

The discussion of how the different specifications of the counterfactual are crucial for causal interpretation is related to the comment by Angrist and Pischke (2009, p. 230) that the assumption of common trends can hold either in logs or in levels, but not in both. We find it more natural to look at the choice between multiplicative and additive effects, rather than focusing on whether or not to take logs. This perspective has the advantage of stressing the distinction between specification and estimation. More importantly, in the next section we show that the multiplicative model and the log-linearised one are equivalent only under a strong restriction.
