Dif-in-Dif Estimators of Multiplicative Treatment Effects

We consider a difference-in-differences setting with a continuous outcome, such as wages or expenditure. The standard practice is to take the logarithm of the outcome and then interpret the results as an approximation of the multiplicative treatment effect on the original outcome. We argue that a researcher should instead focus on the original outcome when discussing causal inference. Furthermore, it is preferable to use a non-linear estimator, because running OLS on the log-linearised model might confound distributional and mean changes. We illustrate the argument with an original empirical analysis of the impact of the UK Educational Maintenance Allowance on households' expenditure, and with a simulation exercise.


Introduction
In applied empirical research, it is common to replace continuous outcomes, such as earnings or expenditure, with their logarithm. Often, the choice is motivated by distributional features, such as skewness, that a researcher may want to account for. In the difference-in-differences (dif-in-dif) setting, the desire to give a causal interpretation to the estimates complicates the choice. The model the researcher has in mind is usually one with multiplicative effects, which is linearised by taking logs. If this is the case, the assumptions needed for causal inference refer to the non-transformed model. In general, this is not explicitly discussed.
To explore the attention received by this issue in the dif-in-dif literature, we reviewed papers published in one top journal with an empirical focus, the Quarterly Journal of Economics, between 2001 and 2011. A table with complete references is available in Appendix A. In total, 25 papers using a dif-in-dif estimator with continuous outcomes were found. In 9 cases, the outcome is not transformed and an additive model is estimated. We found 16 papers in which at least one outcome is expressed in logarithmic form. The variables most commonly log-transformed are earnings and productivity, followed by a group of other monetary quantities including expenditure, land value, exports and loans. In only 5 out of 16 cases is an explicit reason for the log-transformation given (Manning (1998) lays out common reasons for taking the logarithm of a dependent variable). For example,  refer to concerns about skewness in the dependent variable, whereas  state that they wish to account for percentage changes in the control variables. In general, no discussion of the impact of the log transformation on the causal interpretation is given. Only  states that the OLS estimates for the log of the dependent variable relate to E(ln(y)|x), and not ln(E(y|x)). To provide estimates of ln(E(y|x)),  estimates a generalised linear model (GLM) with a log link.
Previous theoretical literature on non-linear dif-in-dif mostly focused on the interpretation of the interaction effect. Mullahy (1999) discussed the case of a log-linearised exponential model. Ai and Norton (2003) showed that in non-linear models the marginal effect of the interaction term is not directly related to its coefficient in the linear index. However, Puhani (2012) recently argued that their way of calculating the marginal effect is not the correct one for the dif-in-dif case. A separate stream of research, not directly related to dif-in-dif, focused on the estimation of exponential models (Mullahy, 1997; Manning, 1998; Manning and Mullahy, 2001; Ai and Norton, 2008). Santos Silva and Tenreyro (2006) showed that the OLS estimator of the log-linearised model may not be consistent for the parameter of interest. Blackburn (2007) discussed how to estimate wage differentials without using logarithms.
In this paper we attempt to reconcile the two streams of research for the dif-in-dif case. Our main aim and contribution is to collect in a unified setting a number of results that are scattered in the literature, in order to provide the practitioner with a clear guide on the choice of modelling and estimation. Using a potential outcome framework, we reinterpret previous findings to argue that the choice between a multiplicative and an additive model is fundamental to the causal interpretation of the estimands. This choice should be made before deciding whether or not to take logs, which should be understood as an estimation strategy rather than a matter of model specification. 2 Specifically, we point out in Section 2 that the choice between an exponential or a level model is essentially related to the common trends assumption. In contrast, whether the treatment effect is multiplicative or additive does not make a large difference, at least from an ex-post evaluation perspective. Although this follows from different results available in the literature, we could not find a reference that made this important point explicit. In terms of estimation, building on Santos Silva and Tenreyro (2006), but focusing on the dif-in-dif case, we show that using OLS on the log-linearised model may give biased estimates of the true multiplicative effect. This problem can arise if the treatment causes not only a shift in the mean, but also other distributional changes, for instance an increase in the variance for the treated group.
2  discussed how serial correlation may severely bias inference in dif-in-dif, because conventional standard errors are likely to underestimate the true standard deviation. We do not discuss how to account for this problem in exponential models. However, the main example throughout their paper has log(wage) as the dependent variable, so that the problems discussed here also apply in their context. For instance, one of the possible solutions that they suggested is to collapse the data over the pre-treatment and post-treatment period. Given that this implies averaging logs, further research might try to understand whether it introduces a different source of bias.
Fortunately, different authors (Mullahy, 1997; Santos Silva and Tenreyro, 2006; Blackburn, 2007) have pointed out that to estimate a multiplicative effect there is no need to log-linearise, because a simple and robust non-linear estimator (Poisson Pseudo-Maximum Likelihood) is available. Although Gregg et al. (2006) noted that it is possible to recover a percentage treatment effect from linear OLS in levels, we point out that one cannot give a causal interpretation to both the additive and multiplicative model. We also correct their calculation in order to properly account for a multiplicative time trend.
We finally show that, in the case of heterogeneous effects, the exponential dif-in-dif model with a conditional mean assumption does not identify the average multiplicative effect, but rather the multiplicative effect on the average. Moreover, the necessary conditions for the latter to be consistently estimated using log-linearisation are less likely to hold with heterogeneous effects. To the best of our knowledge, these two results were not discussed in the dif-in-dif literature, although they are partially related to a comment by Angrist (2001).
In Section 3 we present an original applied example. We study the impact on households' expenditure of the introduction of the Educational Maintenance Allowance (EMA) in the UK. In Section 4 we present a simulation to illustrate our main arguments. Section 5 concludes and summarises the discussion in terms of a guideline for practitioners.

Model specification and inference
A practitioner willing to estimate a dif-in-dif model with a continuous outcome, such as wages or expenditure, usually faces three main decisions:
1. Shall I model the time trend in additive or in multiplicative form? And shall I report the treatment effect as a difference in levels, or as a percentage change?
2. How can I estimate the multiplicative model? Shall I take logs?
3. What kind of average effect is identified by my model?
We argue that these points should be addressed independently, in order to correctly separate model specification from estimation. The next three subsections are dedicated to these issues.

Multiplicative or additive effects?
In this section we compare an additive dif-in-dif model with one with multiplicative effects. Firstly, we highlight that the key difference is the way in which the time trend is specified. Secondly, we show that, in practice, the two models are related, but that one cannot give a causal interpretation to both.
We start with the simplest, though quite popular dif-in-dif setting, involving two groups (g ∈ {control, treated}) and two time periods (t ∈ {pre,post}), with only one group actually receiving the treatment in the second period. In this paper, we analyse the case of a continuous outcome y, such as earnings or consumption.
Several assumptions are required in order to identify the causal effect of the treatment. We draw attention to those related to the functional form. These depend on which feature of the distribution of $y$ we are interested in. Here we focus on the expected value, which is usually the target in program evaluation using dif-in-dif. 3 First, we specify a model for the expected value of $y$ when not treated ($y_{0igt}$), conditional on $g$ and $t$. The second step is to assume how the expected value of the potential outcome when treated ($y_{1igt}$) is related to the expected $y_{0igt}$. The standard dif-in-dif in levels starts from (Angrist and Pischke, 2009):
$$E[y_{0igt} \mid g, t] = \mu^*_g + \lambda^*_t, \qquad E[y_{1igt} \mid g, t] = E[y_{0igt} \mid g, t] + \tau \qquad (1)$$
where we combine an additive common trends assumption with an additive treatment effect. The superscript * is used to differentiate the model in levels from the multiplicative one. Note also that receiving the treatment, with a potential outcome $y_{1igt}$, does not coincide with being in the treated group, because in the first period all individuals go untreated.
3 One important point to highlight is that in this paper we focus only on the average outcomes. Athey and Imbens (2006) proposed instead a generalised dif-in-dif model that gives a structural interpretation to all differential changes in the distribution of the outcome $y$ over time. Their assumptions on the model of $y$ would therefore be valid for any $f(y)$, where $f(\cdot)$ is a strictly monotone transformation (such as log). In contrast, in this paper we give a structural interpretation only to changes in the expected value. We ignore higher moments of the distribution of $y$, which are allowed to change either as a consequence of treatment or time. As noted by Athey and Imbens (2006, pg. 435-436), this approach focused on the conditional mean is not nested in their model, unless one assumes that all individual shocks are statistically independent from group and time.
Alternatively, one might specify an exponential model where the assumption of common trends is in multiplicative form: 4
$$E[y_{0igt} \mid g, t] = \exp(\mu_g + \lambda_t) \qquad (2)$$
Over time, the outcome in the absence of treatment would increase by the same percentage in both groups. Now we can assume a proportional treatment effect:
$$\frac{E[y_{1igt} \mid g, t] - E[y_{0igt} \mid g, t]}{E[y_{0igt} \mid g, t]} = \exp(\delta) - 1 \qquad (3)$$
where $\delta$ is a parameter on the linear index of the exponential model. The multiplicative model for $E[y_{1igt} \mid g, t]$ is therefore $\exp(\mu_g + \lambda_t + \delta)$. Intuitively, the total percentage change in the expected outcome of the treated group is given by the composition of a percentage change due to time (call it %time) and the percentage effect of the treatment (call it %effect), so that (1+%change) = (1+%time) × (1+%effect). For the control group, instead, (1+%change) = (1+%time).
4 See Mullahy (1997) for a discussion of IV estimation of an exponential model.
To be precise, the key difference between the exponential model and the linear one is in the common trends assumption. The choice of a multiplicative or additive treatment effect plays a less important role. If we are only interested in the ex-post evaluation problem, in the spirit of DiNardo and Lee (2011), we may just want to understand which share of the treated-control difference should be attributed to the treatment. However, with multiplicative time trends we still need the counterfactual to be specified as in eq. (2), otherwise we would confound time and treatment effects.
To clarify, Figure 1 is generated with an exponential model as in eq. (2) and (3). In this case the treated group starts from a lower position. Given the multiplicative trend, in the absence of the treatment the increase in this group over time would be smaller in absolute value. Therefore a standard dif-in-dif in levels would underestimate the share of the change that has to be attributed to the treatment.
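The argument can be checked with a small numerical sketch. The parameter values below are hypothetical (they are not those used for Figure 1); they only assume an exponential model of the kind described by eq. (2) and (3), with the treated group starting from a lower level.

```python
import math

# Hypothetical parameters of the exponential model (illustrative, not from the paper).
mu_control = math.log(200.0)   # control group level in the pre period
mu_treated = math.log(100.0)   # treated group starts from a lower position
lam_post = math.log(1.5)       # common multiplicative trend: +50% over time
delta = math.log(1.2)          # true multiplicative treatment effect: +20%

def mean_outcome(mu, post, receives_treatment):
    """Expected outcome exp(mu_g + lambda_t + delta * treatment)."""
    return math.exp(mu + lam_post * post + delta * receives_treatment)

c_pre = mean_outcome(mu_control, 0, 0)    # 200
c_post = mean_outcome(mu_control, 1, 0)   # 300
t_pre = mean_outcome(mu_treated, 0, 0)    # 100
t_post = mean_outcome(mu_treated, 1, 1)   # 180

# Correct counterfactual: the treated group grows by the common percentage.
counterfactual = mean_outcome(mu_treated, 1, 0)          # 150
true_level_effect = t_post - counterfactual              # 30

# Dif-in-dif in levels subtracts the control group's change in levels instead.
level_did = (t_post - t_pre) - (c_post - c_pre)          # 80 - 100 = -20

# Ratio of ratios recovers exp(delta).
ratio_of_ratios = (t_post / t_pre) / (c_post / c_pre)    # 1.2

print(true_level_effect, level_did, ratio_of_ratios)
```

With these made-up numbers the level dif-in-dif is not only too small but has the wrong sign, because the linear counterfactual (100 + 100 = 200) lies above the multiplicative one (150).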
In contrast, once we account correctly for the multiplicative time trend, it does not matter whether we express the treatment effect as a percentage difference or as a level difference. Indeed, the former is the fraction on the left hand side of eq. (3), while the latter is simply its numerator. Nevertheless, once the time trend is in multiplicative form, having a multiplicative treatment effect leads to an exponential model, which is clearer and easier to estimate. 5
5 The situation is different if we are willing to predict how the policy will affect future outcomes. If we believe that the treatment is likely to have the same proportional effect in other time periods, then it should be presented in percentage form.
Suppose now that we choose a multiplicative model for $y_{1igt}$ and $y_{0igt}$ and we want to understand which would be the correct specification for the observed outcome $y_{it}$. Define the dummies $treated_{it}$ for the treatment group and $post_{it}$ for the second period. The particular data structure leads to an exponential model for the observed outcome:
$$E[y_{it} \mid treated_{it}, post_{it}] = \exp(\beta_0 + \beta_1 treated_{it} + \beta_2 post_{it} + \delta \, treated_{it} \times post_{it}) \qquad (4)$$
In this case the coefficient on $treated_{it} \times post_{it}$ has a meaningful interpretation, because it is directly related to the treatment effect. Indeed, $\exp(\delta)$ is a ratio of ratios (ROR), as highlighted by Mullahy (1999) and Buis (2010):
$$\exp(\delta) = \frac{E[y_{it} \mid treated_{it}=1, post_{it}=1] \,/\, E[y_{it} \mid treated_{it}=1, post_{it}=0]}{E[y_{it} \mid treated_{it}=0, post_{it}=1] \,/\, E[y_{it} \mid treated_{it}=0, post_{it}=0]} \qquad (6)$$
Alternatively, one could follow the well-known suggestion from Ai and Norton (2003) and calculate the interaction effect as the cross difference (Mullahy, 1999, pg. 7):
$$\{E[y_{it} \mid treated_{it}=1, post_{it}=1] - E[y_{it} \mid treated_{it}=1, post_{it}=0]\} - \{E[y_{it} \mid treated_{it}=0, post_{it}=1] - E[y_{it} \mid treated_{it}=0, post_{it}=0]\} \qquad (7)$$
which is equal to the change in levels of the average for the treated group minus the change in levels of the average for the control group. However, following the previous discussion (see Figure 1), this quantity does not properly account for the exponential time trend. Therefore, in a multiplicative model the causal parameter of interest is recovered by the ROR and not by the cross-difference.
In this simplified setting, the exponential model from eq. (4) is actually related to the parameters of a standard dif-in-dif linear regression. Even if the true potential outcomes model is in multiplicative form, the conditional expectation of the observed outcome $y_{it}$ can also be correctly specified as linear in $treated_{it}$, $post_{it}$ and their interaction, with coefficient $\tau$ on the interaction term (eq. 8). The reason is that the model is saturated: the four parameters fit perfectly the four averages given by the combination of $treated_{it}$ and $post_{it}$. Indeed, the exponential model from (4) is just a reparametrisation of the linear one (eq. 9). This was noted by Gregg et al. (2006), who showed that we can estimate eq. (8) and then recover both the level and the percentage (multiplicative) effect. However, Gregg et al. (2006) defined the dif-in-dif "percentage method" as the percentage change in the treatment group minus the percentage change for the controls. This differs from $\exp(\delta) - 1$. The reason is that the percentage change in the treatment group is equal to %effect + %time + %effect × %time. If we subtract the percentage change in the control group, we are left with %effect × (1 + %time). The difference is likely to be negligible if %time is small.
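As a quick arithmetic check of the difference between the "percentage method" and the ratio of ratios, the sketch below uses hypothetical values for %time and %effect (both are assumptions chosen for illustration).

```python
# Hypothetical percentage changes (illustrative values only).
pct_time = 0.50     # common multiplicative time trend
pct_effect = 0.20   # true multiplicative treatment effect, exp(delta) - 1

# Total percentage change in each group under the multiplicative model.
pct_change_treated = (1 + pct_time) * (1 + pct_effect) - 1   # 0.80
pct_change_control = pct_time                                # 0.50

# "Percentage method": difference of the two percentage changes.
percentage_method = pct_change_treated - pct_change_control  # 0.30

# Ratio of ratios minus one recovers the true multiplicative effect.
ror_effect = (1 + pct_change_treated) / (1 + pct_change_control) - 1  # 0.20

print(percentage_method, ror_effect)
```

The percentage method returns %effect × (1 + %time) = 0.30 rather than 0.20; the gap shrinks as %time approaches zero.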
In spite of the equivalence in (9), we cannot interpret both τ and δ as causal effects. If we believe that the common trends assumption holds in multiplicative form, τ includes not only the level change due to the treatment, but also the difference between the time change in levels for the treatment and control groups. Indeed, τ is equal to the cross-difference from eq. (7), because the interaction term in the linear model identifies the difference in the change in levels of the two groups. Referring again to the example from Figure 1, this quantity is obtained by subtracting the incorrect linear counterfactual. The difference with respect to the true effect is in this case negative, because the correct counterfactual, calculated using a multiplicative trend, lies below the linear one.
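The statement about $\tau$ can be made explicit with a short derivation. This is a sketch under assumed notation: the observed-outcome model is taken to be exponential in $\beta_0$, $\beta_1$ (group), $\beta_2$ (period) and $\delta$, as implied by the text, while $\gamma_0$, $\gamma_1$, $\gamma_2$ are hypothetical names for the remaining coefficients of the linear model. The two forms coincide on the four cells defined by the two dummies.

```latex
% Assumed notation: \gamma_0, \gamma_1, \gamma_2 are hypothetical names for the
% intercept, group and period coefficients of the linear model; \tau is the
% interaction coefficient, as in the text.
\begin{align*}
E[y_{it} \mid treated_{it}, post_{it}]
  &= \exp(\beta_0 + \beta_1 treated_{it} + \beta_2 post_{it}
          + \delta\, treated_{it} \times post_{it}) \\
  &= \gamma_0 + \gamma_1 treated_{it} + \gamma_2 post_{it}
          + \tau\, treated_{it} \times post_{it}, \\
\gamma_0 &= e^{\beta_0}, \qquad
\gamma_1 = e^{\beta_0}\,(e^{\beta_1} - 1), \qquad
\gamma_2 = e^{\beta_0}\,(e^{\beta_2} - 1), \\
\tau &= e^{\beta_0+\beta_1+\beta_2+\delta} - e^{\beta_0+\beta_1}
        - e^{\beta_0+\beta_2} + e^{\beta_0} \\
     &= \underbrace{e^{\beta_0+\beta_1+\beta_2}\,(e^{\delta} - 1)}_{\text{level change due to the treatment}}
      + \underbrace{e^{\beta_0}\,(e^{\beta_1} - 1)\,(e^{\beta_2} - 1)}_{\text{difference in time changes in levels}} .
\end{align*}
```

The second term vanishes only if the groups start from the same level ($\beta_1 = 0$) or there is no time trend ($\beta_2 = 0$). When the treated group starts lower and the trend is positive, as in the Figure 1 example, it is negative.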
More generally, the equivalence (9) does not work if we are willing to condition on other covariates, such as demographic controls. The reason is that the equation for the observed outcome is no longer saturated. Therefore it must be that either the linear model is correctly specified, or the exponential one, but not both. This is also true if we have more than two periods and a time trend is included.
The discussion of how the different specifications of time effects are crucial for causal interpretation is related to Angrist and Pischke's (2009, pg. 230) comment that the assumption of common trends can hold either in logs or in levels, but not in both. We find it more natural to look at the choice between multiplicative or additive effects, rather than focusing on whether or not to take logs. This perspective has the advantage of stressing the distinction between specification and estimation.
More importantly, in the next section we show that the multiplicative model and the log-linearised one are equivalent only under a strong restriction.

Estimation of a multiplicative dif-in-dif model
A popular solution to estimate a multiplicative model is to log-linearise it. In this section we show that the standard mean-independence assumption imposed on the error term of the original exponential model is not enough to guarantee the unbiasedness of the OLS estimator of the log-linearised version. The reason is that the mean-independence of the multiplicative error term in the exponential model does not imply the mean-independence of the error in the log-linearised model. An alternative is to directly estimate the non-linear model.
To discuss this issue in detail, we follow Santos Silva and Tenreyro (2006), but we reinterpret the problem in the context of the dif-in-dif setting. In line with the previous section, define a mean-independent error term $\eta_{it}$ that enters multiplicatively in the model for the observed outcome (eq. 4), so that
$$y_{it} = \exp(\beta_0 + \beta_1 treated_{it} + \beta_2 post_{it} + \delta \, treated_{it} \times post_{it}) \, \eta_{it}$$
To estimate it, we can log-linearise:
$$\ln y_{it} = \beta_0 + \beta_1 treated_{it} + \beta_2 post_{it} + \delta \, treated_{it} \times post_{it} + \ln \eta_{it}$$
In order for the OLS estimator on this log-linearised model to consistently estimate $\delta$, we need $E[\ln \eta_{it} \mid 1, treated_{it}, post_{it}] = 0$. However, as argued by Santos Silva and Tenreyro (2006) and Blackburn (2007), the mean independence assumption imposed on the original exponential model does not ensure that $\ln \eta_{it}$ is mean-independent as well. The reason is Jensen's inequality, which implies that the mean of the log is not equal to the log of the mean. In general, $E[\ln \eta_{it} \mid 1, treated_{it}, post_{it}] = 0$ would hold if we could impose the stronger condition that $\eta_{it}$ was statistically independent from $x_{it} \equiv (1, treated_{it}, post_{it})$. This condition would imply:
$$\mathrm{Var}[y_{it} \mid x_{it}] = \mathrm{Var}[\eta_{it}] \, \exp\{2(\beta_0 + \beta_1 treated_{it} + \beta_2 post_{it} + \delta \, treated_{it} \times post_{it})\} \qquad (11)$$
In order for the error term of the log-linearised model to be mean-independent, the ratio of variances between different groups or time periods should be directly related to the differences in the conditional mean. Furthermore, the treatment effect must not only shift the conditional mean, but also increase (or decrease) the conditional variance by a factor equal to the square of $\exp(\delta)$. This pattern of variance does not necessarily hold under the weaker condition of mean independence ($E[\eta_{it} \mid x_{it}] = 1$), from which we started.
For instance, take a multiplicative error $\eta_{it}$ such that $E[\eta_{it} \mid x_{it}] = 1$, but suppose that the stronger statistical independence condition $\eta_{it} \perp\!\!\!\perp x_{it}$ holds only in the absence of the treatment, that is when $treated_{it} \times post_{it} = 0$. Assume instead that, under treatment, there is a distributional effect which differs from the simple increase in variance that would follow from condition (11), so that the error $\eta_{it}$ is not statistically independent of $x_{it}$. Following the previous discussion, this causes the log-linearised error $\ln \eta_{it}$ not to be mean-independent, because its conditional mean now depends on $treated_{it} \times post_{it}$. The conditional expectation of $\ln y_{it}$ becomes:
$$E[\ln y_{it} \mid x_{it}] = \beta_0 + \beta_1 treated_{it} + \beta_2 post_{it} + \delta \, treated_{it} \times post_{it} + E[\ln \eta_{it} \mid treated_{it} \times post_{it}]$$
The OLS estimator for the log-linearised model would therefore be consistent for $\delta^*$, which differs from the true $\delta$ because it confounds distributional with mean effects. The bias, $\delta^* - \delta$, is equal to the difference between the mean of $\ln \eta_{it}$ in the treated group in the post period and the one for the control group and pre-policy period.
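A small Monte Carlo sketch can illustrate this bias. Everything below is an assumption made for illustration: the parameter values, the sample size, and the choice of a log-normal error whose log-variance is larger in the treated-post cell (so that $E[\eta_{it} \mid x_{it}] = 1$ everywhere but $\eta_{it}$ is not independent of $x_{it}$). The sketch uses the fact that, in the saturated two-group, two-period model, OLS on logs reduces to differences of cell means of $\ln y$, while the Poisson first-order conditions are solved by the cell means of $y$ in levels.

```python
import math
import random

def simulate_cell(rng, n, log_mean, sigma):
    """Draw n outcomes y = exp(log_mean) * eta, eta log-normal with E[eta] = 1."""
    return [math.exp(log_mean + sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2)
            for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(12345)
n = 50_000                                    # observations per cell (assumed)
beta0, beta1, beta2 = math.log(100.0), -0.5, 0.3
true_delta = math.log(1.2)                    # true multiplicative effect: +20%
sigma_untreated, sigma_treated = 0.5, 1.0     # treatment also raises dispersion

cells = {}
for treated in (0, 1):
    for post in (0, 1):
        d = treated * post
        log_mean = beta0 + beta1 * treated + beta2 * post + true_delta * d
        sigma = sigma_treated if d else sigma_untreated
        cells[(treated, post)] = simulate_cell(rng, n, log_mean, sigma)

# OLS on the log-linearised saturated model: dif-in-dif of cell means of ln(y).
log_means = {k: mean([math.log(y) for y in v]) for k, v in cells.items()}
log_ols_delta = ((log_means[1, 1] - log_means[1, 0])
                 - (log_means[0, 1] - log_means[0, 0]))

# PPML on the saturated model: log of the ratio of ratios of cell means of y.
level_means = {k: mean(v) for k, v in cells.items()}
ppml_delta = math.log((level_means[1, 1] / level_means[1, 0])
                      / (level_means[0, 1] / level_means[0, 0]))

# Difference in E[ln eta] between the treated-post cell and the other cells.
expected_bias = -0.5 * sigma_treated ** 2 + 0.5 * sigma_untreated ** 2

print(true_delta, ppml_delta, log_ols_delta, true_delta + expected_bias)
```

In this design the log-OLS estimate settles near $\delta - 0.375$, which would wrongly suggest a negative effect, whereas the PPML estimate stays close to the true $\delta$.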
It should be added that this bias would be present even if the true treatment effect on the mean was zero, and there was no difference across groups or time ($\beta_1 = \beta_2 = 0$). A similar bias would arise if the treatment had no effect at all on the outcome distribution, but in the second period there was some change in the variance of $y$ within the treatment group that violates the assumption $\eta_{it} \perp\!\!\!\perp x_{it}$. Such a situation would be compatible with the multiplicative common trends assumption stated in terms of conditional mean (eq. 2), because it does not impose any restriction on higher moments. 7
7 As discussed by Abadie (2005), if the two groups are unbalanced in terms of observable covariates, this may generate differential time trends. Interestingly, these non-parallel dynamics may involve not only the average, but also higher moments. In this case, the method suggested by Abadie (2005) applied to the log-linearised model may also reduce the bias introduced by the differential pattern in the variance. However, it should be noted that our discussion holds even if covariates are perfectly balanced, because the heteroskedasticity may be induced by the treatment itself or by other distributional changes across time that differ across groups.
In contrast, the OLS estimator might not be affected by a situation as in Blackburn (2007), where the conditional variance across groups does not follow the pattern in eq. (11), but the condition is respected over time within the same group. Suppose that $E[\eta_{it} \mid x_{it}] = 1$. However, assume that the variance and higher moments in the distribution of $\eta_{it}$ depend on the group, though neither on the time period, nor on the treatment. In general, we would have that $E[\ln \eta_{it} \mid x_{it}]$ depends on $treated_{it}$, but not on $post_{it}$ or on $treated_{it} \times post_{it}$. Therefore, both the intercept and the coefficient on the group dummy will be different from $\beta_0$ and $\beta_1$, but the coefficient on the interaction would recover the true treatment effect.
Nevertheless, we know from the literature that there is an alternative estimation strategy which is consistent in both cases, because it only requires $\eta_{it}$ to be mean-independent from $x_{it}$, and not necessarily statistically independent. Santos Silva and Tenreyro (2006) and Blackburn (2007) proposed to directly estimate the non-linear model. In practice, one can use either Non Linear Least Squares (NLS) or Poisson Pseudo Maximum Likelihood (PPML). In both cases, the estimator simply exploits the restriction that the conditional expectation is correctly specified as exponential. Santos Silva and Tenreyro (2006) argued in favour of the latter, because NLS is likely to be less efficient.
The PPML simply estimates the model as if it were a Poisson, by maximising the corresponding likelihood. 8 The fact that the actual variable is continuous rather than a count does not hinder the consistency of the estimator. This follows from the well-known result that the Poisson Maximum Likelihood estimator is consistent as long as the mean is correctly specified, which is given by the standard mean independence assumption $E[\eta_{it} \mid 1, treated_{it}, post_{it}] = 1$, or equivalently by eq. (4). However, given that the actual variable does not respect the other properties of the Poisson distribution, standard maximum likelihood inference would be invalid, and therefore a robust covariance matrix should be used.
Practically, PPML can be implemented in the most popular statistical packages and results can be easily interpreted. In Stata, one can simply run the poisson command, with all variables in levels and with the vce(robust) option to obtain the robust covariance matrix.
8 An alternative and equivalent way to fit the model is by GLM with a log link function.
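To make the mechanics of the estimator concrete, the following is a minimal library-free sketch of PPML by Newton-Raphson on the Poisson score $\sum_i x_i (y_i - \exp(x_i'b)) = 0$. It is not the implementation used in the paper: the function names, the toy data and the requirement that the first regressor is a constant are all assumptions made for illustration, and the robust covariance matrix needed for inference is omitted.

```python
import math

def solve_linear(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ppml(x_rows, y, iterations=50, tol=1e-10):
    """Poisson pseudo-ML for E[y|x] = exp(x'b); first column must be a constant."""
    k = len(x_rows[0])
    b = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)   # start at the overall mean
    for _ in range(iterations):
        score = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(x_rows, y):
            mu = math.exp(sum(bj * xj for bj, xj in zip(b, xi)))
            for j in range(k):
                score[j] += xi[j] * (yi - mu)           # gradient of the log-likelihood
                for l in range(k):
                    hess[j][l] += mu * xi[j] * xi[l]    # minus the Hessian
        step = solve_linear(hess, score)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b

# Hypothetical toy data: (treated, post, outcome in levels); a zero is allowed.
data = [(0, 0, 10.0), (0, 0, 14.0), (0, 1, 15.0), (0, 1, 21.0),
        (1, 0, 4.0), (1, 0, 8.0), (1, 1, 0.0), (1, 1, 21.6)]
x_rows = [[1.0, float(tr), float(po), float(tr * po)] for tr, po, _ in data]
y = [out for _, _, out in data]

b0, b1, b2, delta_hat = ppml(x_rows, y)
print(delta_hat, math.exp(delta_hat) - 1)   # coefficient and implied percentage effect
```

Because the model is saturated, the fitted means equal the four cell means (12, 18, 6 and 10.8 in the toy data), so the interaction coefficient equals the log of the ratio of ratios. The observation with zero outcome is kept, which would not be possible after taking logs.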

Heterogeneous treatment effects in the multiplicative case
One important question is what the level and multiplicative models are effectively identifying when treatment effects are heterogeneous, in the sense that they vary at the individual level. 9 In the level model, given the additive nature of the effects, the well-known result is that the dif-in-dif estimand identifies the average treatment effect on the treated.
9 We do not focus on the heterogeneity with respect to different observable characteristics. This discussion would hold even if the treated and control groups were completely homogeneous in terms of observables, but the distribution of the treatment effect was heterogeneous across individuals.
To discuss the multiplicative model, we follow the IV-exponential model in Angrist (2001) and we include an individual fixed effect $\omega_i$ and a heterogeneous treatment effect $\delta_i$. Here the superscript * is used to differentiate the error $\eta^*_{it}$ in the unobservable model from the error $\eta_{it}$ in the observable model:
Similarly to the additive dif-in-dif model, the individual fixed effects $\omega_i$ are introduced to allow a potential source of heterogeneity between the two groups. In contrast, the purpose of $\delta_i$ is to allow for the effect of the treatment to differ across individuals and groups. 10 Regarding the individual fixed effects, there is nothing new with respect to the discussion from section 2.1. In the cross-sectional case, in order for the dif-in-dif estimator to remove the heterogeneity in the $\omega_i$ we need the composition of both groups not to change over time, so that the expected values of the individual fixed effects are stable (Blundell and Macurdy, 1998). However, it is interesting to understand whether the estimator identifies the average of the treatment effects $\delta_i$ or something different.
10 Blackburn (2007, pg. 91-92) discussed how to include individual fixed effects in an exponential conditional mean, in order to correctly estimate wage differentials. Differently from his paper, we also allow individual heterogeneity in the treatment effects and we mainly focus on the estimation of their average.
The (unobservable) model generating the observable exponential model from eq. (4) becomes
If we use PPML to estimate it, we know that $\exp(\delta)$, the exponentiated coefficient on the interaction term, identifies the ROR (eq. 6). Given the model in (18), it follows that
so that only the multiplicative treatment effect on the average is identified, and not the average of the multiplicative effect. This was already noted by Angrist (2001) for IV estimation of an exponential model.
If we assume the stronger condition that $\eta^*_{it} \perp\!\!\!\perp (t, \omega_i, \delta_i)$, we can log-linearise the model in (18):
where statistical independence ensures that $\ln \eta^*_{it}$ is mean-independent from $t$, $\omega_i$ and $\delta_i$, so that (20) represents a conditional expectation. In this case, the standard dif-in-dif level regression applied to $\ln y_{it}$ identifies the cross difference (in levels) of the outcome (eq. 7), which from model (20) is equal to
This quantity, although related to the original parameters, is not of direct interest. The reason is that the multiplicative treatment effect for each individual is equal
In a nutshell, in the presence of heterogeneous effects the standard mean independence assumptions behind the exponential dif-in-dif model only allow us to identify the multiplicative effect on the average for the treated group, and not an average multiplicative effect. The presence of heterogeneous treatment effects is likely to induce a dependence between the error term $\eta_{it}$ in the observed model and the covariates. Therefore the statistical independence assumption is not likely to hold, and the OLS estimates of the log-linearised model would not recover the quantity of interest.
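The distinction between the multiplicative effect on the average and the average multiplicative effect can be seen in a two-person arithmetic sketch. The numbers are hypothetical, and the last quantity assumes that the log-linearised regression averages the individual coefficients $\delta_i$ over the treated, which is one reading of the argument above rather than a formula taken from the text.

```python
import math

# Hypothetical treated individuals: untreated outcomes and individual effects.
y0 = [100.0, 10.0]
effect = [0.10, 0.50]                               # exp(delta_i) - 1
y1 = [base * (1 + e) for base, e in zip(y0, effect)]  # treated outcomes: 110 and 15

# Multiplicative effect on the average: ratio of average outcomes minus one.
effect_on_average = sum(y1) / sum(y0) - 1           # 125 / 110 - 1

# Average of the individual multiplicative effects.
average_effect = sum(effect) / len(effect)          # 0.30

# Exponentiated average of delta_i (assumed target of the log regression).
mean_delta = sum(math.log(1 + e) for e in effect) / len(effect)
exp_mean_delta = math.exp(mean_delta) - 1           # sqrt(1.1 * 1.5) - 1

print(effect_on_average, average_effect, exp_mean_delta)
```

The three quantities are roughly 13.6%, 30% and 28.5%. The effect on the average weights individuals by their outcome level, so it is pulled towards the effect of the high-outcome individual, while exponentiating an average of $\delta_i$ differs from the average of $\exp(\delta_i)$ by Jensen's inequality.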

An applied example
To provide an example, we apply the PPML estimator in a dif-in-dif setting to assess Here we study how families targeted by the scheme spent the available resources.
In line with the theory of the previous section, we specify a multiplicative model for expenditure, using both OLS on the log-linearised values and PPML.

Data and identification strategy
We Treatment and control groups are not defined according to information on education status, which may be endogenous to the reform. Rather, information is used on exogenous age at interview of the household members. The treated group of households is defined to be those where at least one 17 year old is residing, because conditional on having low income they will be eligible to receive EMA. The control group is formed of households where at least one 14 or 15 year old resides, excluding households defined to be in the treated group as above. We exclude 16 and 18 year olds to avoid misclassification, because full information on date of birth would be required to determine EMA eligibility status and the EFS only contains information on age at interview. We report the robust Huber-White standard errors. 11 EMA was already in operation in 41 of the 150 pilot Local Education Authorities before the start of our sample period. Pilot areas cannot be removed from the treated group as the EFS does not record information on LEA status. This implies that treatment is less than 100% for the treated group and that the presented estimates of the effect of the policy on household expenditure patterns, therefore, represent a lower bound of the effect on those actually receiving EMA. On the one hand, it may be enough for a policy maker to know this intent to treat; on the other, the interest may be in the effect for those that actually received EMA. In Appendix B we comment on the possibility of rescaling the estimated treatment effects to reflect this fact. We show that in the case of a multiplicative model, rescaling may be problematic.
11 Procedures to correct for the fact that regular standard errors may overstate the precision of estimates of a treatment effect in dif-in-dif regressions are the subject of an ongoing debate (see Donald and Lang, 2007; Wooldridge, 2003, 2006). To address problems of serial correlation, we restrict the sample to only 5 years of data. Furthermore, under the definition of the treatment and control groups, we see no reason to believe that there are shocks that occur at the group level.
We present estimates for 7 major areas of spending: food and non-alcoholic drinks; alcoholic beverages and tobacco; clothing and footwear; furnishings, household equipment and carpets; transport; communication; and recreation. Following on from the earlier discussion, it is natural to specify the common trends assumption in multiplicative form. That is, expenditures, following the growth of the economy, increase by a constant percentage in the absence of treatment.
In general, we know that total household expenditure tends to be log-normally distributed (see, for instance, Battistin et al., 2009). One might claim that, consequently, log-linearisation is harmless. This is not necessarily the case. First of all, the required assumption refers to the conditional distribution of $y_{it}$, that is within group and time period. Secondly, even if the error $\eta_{it}$ was log-normal, we would need it to be statistically independent from the covariates. Indeed, in the simulation in section 4 we show that a log-normally distributed disturbance is not enough for log-OLS consistency. Finally, nothing ensures log-normality of each category of spending. This is particularly true if there is a non-negligible proportion of zeros, which may be due to measurement error in the recording of small amounts, as pointed out by Santos Silva and Tenreyro (2006) for trade data.

Results
Table 1 presents dif-in-dif estimates of the effect of the EMA scheme on each of the 7 spending categories for the treated group of households. The results in column 1 correspond to estimates of the multiplicative effect using OLS on the log-linearised model, while in column 2 the reform effect is estimated directly using the PPML estimator. For completeness, we also report OLS estimates for a level model of expenditure in column 3. It is important to stress that, as usual, observations with zero expenditure are dropped from the OLS log estimates, while they are retained for the PPML estimates. Nevertheless, results are similar when excluding zero cases across all estimators or when setting the logarithm equal to zero in the case of a zero expenditure (results available on request).
Following the national roll-out of EMA in September 2004, we expect the treated group of households to increase expenditures in some categories. For the OLS log-linearised estimates in column 1 we see positive dif-in-dif estimates for food and non-alcoholic drinks; alcoholic beverages and tobacco; transport; and recreation, and negative estimates for clothing and footwear; furnishings, household equipment and carpets; and communication. None of the estimated effects is, however, statistically significant.
Turning to the reform effects in column 2, EMA might also have distributional effects that make the multiplicative error term statistically dependent on the time and group dummies. In this case, we expect the previous OLS log results to suffer from bias, while PPML results should be consistent. For most of the categories, coefficients are in line with the OLS log results, but for transport spending the estimated coefficient has increased in magnitude. Moreover, it is now statistically different from zero at the 5% level. The result implies an increase of around 23% in transport spending due to the reform, calculated as exp(δ) − 1. This finding is in line with evidence from the EMA piloting, in which EMA recipients were more likely to be contributing to transport expenditures than non-recipients and EMA eligibles residing in control areas (see Ashworth et al., 2001, p. 59). Compared with the standard log expenditure estimates of column 1, the PPML coefficient implies an EMA effect that is 10.8 percentage points larger and more precisely estimated. For the remaining spending categories, we observe statistically insignificant coefficients, which are also generally smaller than the effect on transport. 12
Turning to the OLS level results of column 3, the estimated signs and significance of the interaction terms match the PPML results. For transport spending, the treated × post interaction is a statistically significant £16.46, which corresponds to 18.3% of the pre-reform mean. However, the dif-in-dif coefficient presented only corresponds to the causal effect of EMA on the level of expenditure if we are willing to impose common trends in expenditure levels. If the common multiplicative trend is the correct one, then no meaningful interpretation can be given to the coefficient of the level model.
12 One criticism might be that the result for transport is accidental, because in a full set of regressions it is not unlikely to find at least one statistically significant estimate. However, here we focus on the difference between OLS (logs) and PPML results. Moreover, given the relatively small sample sizes, rather than making adjustments to standard errors, which can be conservative and computationally intensive (Duflo et al., 2008), we draw on the evidence from the EMA trials to support our conclusions.
Columns (4)-(6) better target the groups affected by the reform by repeating the analysis for a sample of low-income households. The results strengthen the main finding, with PPML estimates in column 5 suggesting that households devoted the additional resources from EMA primarily to transport spending. The PPML estimate increases in magnitude with little reduction in precision (compared with column 2). This is once again in contrast to the OLS log result, which remains smaller and statistically insignificant.
We compare how the models perform on Ramsey's RESET test (Ramsey, 1969) for misspecification of the conditional mean. In the simulation we consider a setting similar to that reported in figure 1, so that the reader can follow the difference between the estimates in levels and the ones in multiplicative form (see Appendix C for a graph using the simulation parameters).
The outcome of interest is generated according to equation (4), i=1,..., 2631. The sample size and size of the groups are selected to be as in the applied example of section 3, in order to be able to detect similar differences between estimators.
Each replication is generated according to $\beta_0 = 3.5$, $\beta_1 = -0.4$, $\beta_2 = 0.03$ and with the hypothetical reform having a constant multiplicative treatment effect (ratio of ratios) equal to δ = 0.2. Simulations with a negative time trend and a positive difference between groups lead to the same conclusions and are available on request.
A mean-independent random error term is introduced, so that each individual observation is generated according to $y_{it} = \exp(x_{it}\beta)\,\eta_{it}$, where $\eta_{it}$ is a log-normal random variable with $E(\eta_{it} \mid x) = 1$ and $\mathrm{var}(\eta_{it} \mid x) = \sigma^2_{it}$. The variance $\sigma^2_{it}$ is specified as in eq. (22), where $1[\cdot]$ is an indicator function and α is a parameter that determines the degree of heteroskedasticity in $\eta_{it}$.
To assess the performance of the three estimation strategies outlined above, simulations are reported for five key values of α. Table 2 reports results from 1000 replications of the simulation procedure.
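The simulation design can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code. The variance rule `sigma2 = exp(alpha * treated * post)` is an assumption, chosen because it is consistent with the magnitudes quoted in the text. The equal cell sizes and the function name `simulate_once` are likewise hypothetical. Because the model is saturated (no covariates), OLS on ln(y) equals the dif-in-dif of cell means of ln(y), and PPML equals the dif-in-dif of the logs of the cell means of y, so no regression routine is needed.

```python
import numpy as np

# Parameter values quoted in the text.
BETA0, BETA1, BETA2, DELTA = 3.5, -0.4, 0.03, 0.2


def simulate_once(alpha, n_per_cell=500, seed=0):
    """Draw one 2x2 dif-in-dif sample; return (log-OLS, PPML) estimates of delta."""
    rng = np.random.default_rng(seed)
    mean_log_y = {}  # cell means of ln(y): what saturated log-OLS fits
    log_mean_y = {}  # logs of cell means of y: what saturated PPML fits
    for treated in (0, 1):
        for post in (0, 1):
            xb = BETA0 + BETA1 * treated + BETA2 * post + DELTA * treated * post
            # ASSUMPTION: var(eta) = exp(alpha * 1[treated and post]).
            sigma2 = np.exp(alpha * treated * post)
            # Log-normal eta with E[eta] = 1 and var(eta) = sigma2.
            s2 = np.log1p(sigma2)
            eta = np.exp(rng.normal(-0.5 * s2, np.sqrt(s2), size=n_per_cell))
            y = np.exp(xb) * eta
            mean_log_y[treated, post] = np.log(y).mean()
            log_mean_y[treated, post] = np.log(y.mean())

    def did(cell):
        # Difference-in-differences of the four cell statistics.
        return (cell[1, 1] - cell[1, 0]) - (cell[0, 1] - cell[0, 0])

    return did(mean_log_y), did(log_mean_y)
```

Calling `simulate_once(alpha=0.1, n_per_cell=500_000)` with a large cell size makes the pattern visible in a single draw: the PPML estimate sits close to 0.2, while the log-OLS estimate falls below it.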
The first special case of interest is the scenario where α = 0, implying that $\eta_{it}$ is statistically independent of the treatment and other regressors, so that OLS estimates from the log-linear model will provide consistent estimates of the multiplicative treatment effect. As expected both the OLS log in column 1 of [...]
With α = 0.1 we introduce heteroskedasticity. For instance, this could be the case if the treatment has a distributional effect above that due to the simple increase in the conditional mean. This may also be the consequence of other changes in the higher moments of the distribution over time. Indeed, it is important to remember that the standard dif-in-dif identifying assumption of common trends (in this case multiplicative), expressed in terms of the conditional mean of $y_{it}$, places no restriction on these moments. From eq. (14) we know that the bias in using OLS on the log-linearised outcome arises because the differential increase in variance causes the log-linearised error not to be mean independent. In this case, the analytical bias depends on the formula for the variance of $\eta_{it}$ (eq. 22) and is given by eq. (23). 13 Therefore, we expect that an increase in variance in the post-treatment period for the treated group (due to α > 0) should induce a negative bias. Accordingly, in Table 2 we observe that the OLS log procedure now performs less well. Here, the OLS log estimates confound the distributional effect of treatment with the mean effect and are biased for the true multiplicative effect. The distance from the true effect is around 2.9 percentage points, similar to the 2.6 point difference that can be calculated using the formula from (23). 14 As expected, the value of this bias increases with the value of α, even though the variance of the estimated effects remains small.
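The sign and rough size of this bias can be traced with a short derivation. This is a sketch: the log-normal moment formula is standard, but the variance rule $\sigma^2_{it} = \exp(\alpha \cdot 1[\text{treated} \times \text{post}])$ is an assumption, adopted here because it reproduces the magnitudes quoted in the text. The cell subscripts $gt$ (group, period) are illustrative notation.

```latex
% Log-normal \eta with E[\eta] = 1 and var(\eta) = \sigma^2:
\ln \eta \sim N\!\left(-\tfrac{1}{2}s^2,\; s^2\right), \qquad s^2 = \ln(1 + \sigma^2)
\quad\Longrightarrow\quad E[\ln \eta] = -\tfrac{1}{2}\ln(1 + \sigma^2).

% The log-OLS bias in \delta is the dif-in-dif of E[\ln \eta] over the four cells:
\mathrm{bias} = -\tfrac{1}{2}\Big[\ln(1+\sigma^2_{11}) - \ln(1+\sigma^2_{10})
                - \ln(1+\sigma^2_{01}) + \ln(1+\sigma^2_{00})\Big].

% Under the ASSUMED variance rule (\sigma^2 = e^{\alpha} in the treated-post cell, 1 elsewhere):
\mathrm{bias} = -\tfrac{1}{2}\ln\!\frac{1 + e^{\alpha}}{2}
\;\approx\; -0.026 \quad \text{at } \alpha = 0.1 .
```

Under the same assumption the conditional standard deviation of y in the treated-post cell is scaled by $e^{\alpha/2}$, that is by about 5% at α = 0.1 and about 22% at α = 0.4.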
For example, the mean of the estimated treatment effects being only [...]
It is worth pointing out that the parameter values considered above imply an independent effect of treatment on the conditional variance of y that deviates only slightly from statistical independence. For example, under the strongest pattern of heteroskedasticity considered (α = 0.4), the independent effect of treatment is to increase the conditional standard deviation of y by only 22%, whereas when α = 0.1 the increase in standard deviation is just 5%. Even when these very small distributional effects of treatment are introduced, the estimates in the table from the log-linearised model are strongly biased.
Another important question is what would happen if, ignoring the multiplicative structure, we estimated a standard additive dif-in-dif regression. The set of parameters implies that, for the treated group, the difference in levels between $y_{1i}$ and $y_{0i}$ (the counterfactual) in the post-treatment period would be £5.06, as can easily be calculated from the exponential model. This is the part of the change that can be attributed to the reform after properly accounting for the multiplicative trend. By contrast, the estimand of a standard dif-in-dif in levels would be £4.73. The difference is due to the fact that the multiplicative time trend is not properly accounted for. However, given that the change over time is relatively small, the distortion is not large.
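Both figures can be reproduced directly from the parameter values quoted for the simulation. The sketch below assumes that $\beta_1$ multiplies the treated-group dummy and $\beta_2$ the post-period dummy. The helper name `cell_mean` is hypothetical.

```python
import math

# Parameter values quoted in the text.
BETA0, BETA1, BETA2, DELTA = 3.5, -0.4, 0.03, 0.2


def cell_mean(treated, post, with_reform=True):
    """Conditional mean of y in one group-period cell of the exponential model."""
    reform = DELTA * treated * post if with_reform else 0.0
    return math.exp(BETA0 + BETA1 * treated + BETA2 * post + reform)


# Effect in levels implied by the multiplicative model:
# treated-post mean minus its no-reform counterfactual.
effect_in_levels = cell_mean(1, 1) - cell_mean(1, 1, with_reform=False)

# Estimand of the additive dif-in-dif: difference of changes in cell means.
did_in_levels = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(round(effect_in_levels, 2), round(did_in_levels, 2))  # 5.06 4.73
```

The gap between the two numbers arises because the control group's change in levels (about £1.01) is subtracted as the trend, whereas the treated group's own counterfactual change under a common 3% multiplicative trend is smaller in levels.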
The simulation result illustrates this point. Column 3 of table 2 presents estimates from standard OLS estimation in levels. From the table, we observe that the estimated treatment effects consistently underestimate the true reform impact. The effect is £4.69 in the baseline case, in contrast to the change in levels that is implied by the multiplicative model (£5.06). The bias is independent of the value of α.
So although the regression for $y_i$ is saturated and therefore correctly specified, the estimated effect in levels confounds the treatment and trend effects.
Given an exponential model for the conditional mean of y, a researcher may wish to test whether estimation by log-linearisation will be consistent. Table 2 also presents evidence on the performance of two tests. The first is a Park test (Manning and Mullahy, 2001; Santos Silva and Tenreyro, 2006) for whether the conditional variance of y is proportional to the conditional mean squared. This involves testing whether $\gamma_1$ is statistically different from 2 in equation (24), where $\hat{y}_i$ is a consistent estimate of $E[y_i \mid x]$, obtained using PPML. Inference from equation (24) [...]
In appendix C we also analyse the case in which the treatment has no distributional effect, but the pattern of variance across the treated and control groups does not respect the proportional structure from eq. (11). As discussed in the theoretical section, in this case the log-OLS estimator for the treatment effect is consistent, while the treated-control difference is biased. We also analysed the case with a constant variance of y. Here again, log-OLS for the treatment effect performs poorly.
Results are available from the authors.

Conclusion
We critically assessed the standard practice of log-linearising in a dif-in-dif setting.
We argued that a researcher should first decide whether a multiplicative or additive effect model is appropriate for the non-transformed outcome, because we cannot give a causal interpretation to both. If the multiplicative model is chosen and the researcher makes only a standard mean independence assumption, using Poisson Pseudo Maximum Likelihood on the non-transformed variable can be preferable to using OLS on the log-linearisation. The reason is that the latter might give biased estimates of the multiplicative effect if there are changes in the higher moments of the outcome distribution that make the log-linearised error not mean independent.
In particular, this bias may cause the OLS estimator to confound other distributional effects with the treatment effect on the mean.
In summary, we think that the best practice for an applied researcher wishing to estimate a dif-in-dif model with a continuous outcome should be as follows (a summary table can be found at the end of Appendix C):
1. Decide whether the time trend is more likely to hold in multiplicative or in level form.
2. If in levels, the best solution would be to use the standard level model and estimate it through OLS. The coefficient on the interaction term could be interpreted as an average treatment effect for the treated.
3. If in multiplicative form, the most coherent solution is to use and estimate an exponential model, with a multiplicative treatment effect.
(a) Without covariates, the multiplicative treatment effect can be recovered from OLS estimates of the standard dif-in-dif regression in levels (eq. 9).
(b) Estimating the exponential model with PPML allows for covariates and for the presence of zeros in the dependent variable, and does not require statistical independence of the error term.
(c) The researcher can use a BP test to check for heteroskedasticity with respect to the treated × post variable. If the test fails to reject the null of homoskedasticity, and the researcher is willing to assume statistical independence, OLS on the log-linearised model would be unbiased and efficient. This method also requires eliminating or censoring the zeros, which may introduce another source of bias.
(d) In the case of heterogeneous effects, the exponentiated coefficient on the interaction term, exp(δ) − 1, can be interpreted as a multiplicative effect on the average for the treated group.
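Step 3(a) can be illustrated with a minimal sketch. It assumes that the recovery works through the ratio of ratios of the four fitted cell means, which is presumably what eq. 9 expresses. The function name `multiplicative_effect_from_levels` is hypothetical.

```python
import numpy as np


def multiplicative_effect_from_levels(y, treated, post):
    """Return exp(delta) - 1 from the saturated dif-in-dif regression in levels."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=float)
    post = np.asarray(post, dtype=float)
    # Saturated levels model: y = a + b*treated + c*post + d*treated*post + error.
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    a, b, c, d = np.linalg.lstsq(X, y, rcond=None)[0]
    # The fitted values are the four cell means.
    treated_ratio = (a + b + c + d) / (a + b)  # post / pre, treated group
    control_ratio = (a + c) / a                # post / pre, control group
    return treated_ratio / control_ratio - 1
```

With the simulation's parameter values and no noise, the four cell means give a ratio of ratios of exp(0.2), so the function returns about 0.221. Once covariates enter the model this shortcut no longer applies, which is where step 3(b) takes over.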
Paper | Outcome(s)
Autor (2001) | hourly wage
Donohue and Levitt (2001) | crime rate, arrest rate, arrest level
Donohue et al. (2002) | black/white ratio: teacher salaries, pupil/teacher ratio, term length
[...] | weekly earnings
Finkelstein (2004) | new vaccine clinical trials
Morgan et al. (2004) | absolute deviation from conditional mean growth in: gross state product, employment, personal income, state GNP
Khwaja and Mian (2005) | default rate of firm, loan amount, export value, export value/total loans
Bailey (2006) | hours/weeks worked
Stevenson and Wolfers (2006) | homicide rate
Bandiera et al. (2007) | workers' productivity
Bleakley (2007) | schooling and earnings
[...] | vote share and total vote cast
Field (2007) | weekly hours in labour force
[...] | admissions, patient days, beds, payroll expenditures, and total expenditures
Donohue and Levitt (2008) | arrests
Foote and Goetz (2008) | arrests
Gruber and Hungerman (2008) | charitable giving
Verhoogen (

B.2 Data details
The EFS is managed by the Office for National Statistics.

B.3 Possible violations of dif-in-dif assumptions
One of the possible problems in giving a causal interpretation to dif-in-dif estimates is that the treatment group assignment could reflect short-term idiosyncratic shocks, causing an Ashenfelter's dip (see Ashenfelter (1978) and Blundell and Dias (2009)).
In such a scenario units assigned to the treatment group may recover more quickly in terms of the outcome of interest, than those in the control group. This is not a likely problem for the identification strategy outlined above, where treatment group assignment is allocated according to information on age.
Another possible source of bias is the presence of "anticipation effects", where households changed their spending behaviour prior to the reform. Pre-reform control households with younger children could potentially adjust their spending behaviour in anticipation of becoming EMA eligible in the post-reform period. This would lead to a downward bias in the estimated reform effects, assuming that control households anticipating eligibility increased current spending. For the pre-reform treatment group (i.e. those with a 17 year old member), there is no problem of anticipation. Given that the roll-out of the policy applied only to new entrants in post-16 education, these individuals were ineligible. 5
A final important issue that could affect the causal interpretation of the dif-in-dif estimates is the presence of general equilibrium effects that influenced the spending behaviour of the control group. EMA is a large programme and any increase in post-16 participation rates implies increased competition for post-16 schooling places. This may further affect the spending behaviour of the control households if it changed their expected post-16 schooling plans or future expected wage rates, which in turn led them to adjust their current spending behaviour. 6

B.4 A note on rescaling
As discussed in the methodology section, the initial piloting of the scheme means that in 41 of the 150 English LEAs, EMA was in operation before the start of the sample period. The above estimates therefore reflect a lower bound for the effect of EMA on the treated. Whilst the EFS data do not record information on LEA status, Government Office Region (GOR) information is available, with each GOR being made up of multiple LEAs. Given information on EMA receipt by LEA, one may wonder whether it is possible to rescale the estimated EMA effects to reflect the fact that treatment within the treated group is less than 100%, but by a known amount. Here, we point out that rescaling a multiplicative treatment effect may not always make sense. 7
We know that when effects are heterogeneous, PPML returns the multiplicative effect on the average. For the case of EMA, we know that the effect of the treatment is zero for a known share p of recipients. Following the main text, exp(δ) − 1 identifies
$$\frac{E[y_{1igt} \mid g = \text{treated}] - E[y_{0igt} \mid g = \text{treated}]}{E[y_{0igt} \mid g = \text{treated}]}. \qquad (1)$$
However, both the numerator and the denominator are weighted averages over the two groups' counterfactuals, so that exp(δ) − 1 is equal to the expression in eq. (2). Under the assumption in eq. (3), it is fairly trivial to rescale the estimated multiplicative effects of the results section.
The scale factor (1 − p) could be calculated from publicly available data on EMA receipt by GOR and by then appropriately weighting for regional shares from the main estimation sample. However, eq. 3 is unlikely to hold in this example, where the pilot regions were on average much poorer than the national rollout areas. For this reason, we argue that rescaling may not make sense and caution against making such adjustments to the intent to treat estimates.
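A small numerical sketch may help to show why the rescaling is fragile. It rests on assumptions, since eqs. (2) and (3) are not reproduced above. It treats the treated group as a mixture of a share p of households in former pilot areas, for whom the national roll-out changes nothing, and a share 1 − p of newly treated households. It also takes the rescaling assumption to be equality of the two subgroups' counterfactual means. All numbers and function names are hypothetical.

```python
def pooled_multiplicative_effect(p, y0_pilot, y0_new, tau_new):
    """Multiplicative effect on the average for the whole treated group.

    p        : share of the treated group with zero effect (former pilot areas)
    y0_pilot : counterfactual mean outcome of that share
    y0_new   : counterfactual mean outcome of the newly treated share
    tau_new  : multiplicative effect among the newly treated
    """
    numerator = (1 - p) * tau_new * y0_new      # pilot households contribute zero
    denominator = p * y0_pilot + (1 - p) * y0_new
    return numerator / denominator


def naive_rescale(pooled_effect, p):
    """Divide by (1 - p); valid only if both counterfactual means coincide."""
    return pooled_effect / (1 - p)


# Equal counterfactual means: the rescaling returns the effect among the newly treated.
print(naive_rescale(pooled_multiplicative_effect(0.25, 100.0, 100.0, 0.2), 0.25))

# Lower counterfactual mean in the pilot share: the rescaling no longer returns 0.2.
print(naive_rescale(pooled_multiplicative_effect(0.25, 80.0, 100.0, 0.2), 0.25))
```

In the second call the rescaled figure is about 0.21 rather than 0.2, because the denominator of the pooled effect is pulled towards the pilot share's lower counterfactual mean.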
Appendix C: Simulation of a reform with only mean effects, in the presence of a different variance across groups
We consider a separate scenario corresponding to the case where α = 0, so that the multiplicative error term is statistically independent of the treatment, but where its properties depend on the group. Specifically, $y_{it}$ is heteroskedastic with respect to the group status but not to the treatment itself. To illustrate this scenario, consider the simulation procedure above, but with $\sigma_{it}$ now generated as a function of group status, where γ is the parameter determining the degree of heteroskedasticity with respect to group status. The table confirms that, under the particular form of heteroskedasticity considered, log-linearisation works well when estimating the multiplicative treatment effect, but it performs poorly in terms of estimating the treated group effect. In contrast, PPML performs well for both the treated group and treated × post coefficients and for all values of γ. This illustrates the point made in section 2 that, in the dif-in-dif setting, heteroskedasticity is only a problem for consistently estimating the treatment effect when the treatment itself (and not the group) has an independent distributional effect, although estimates of the treated group effect can be misleading.
Finally, for the wrongly specified OLS level model in column 3, estimates of the multiplicative treatment effect again confound treatment and trend effects. Additionally, here we see that the form of heteroskedasticity means that the estimates of the treated group effect become more dispersed for higher values of γ.