BY-NC-ND 3.0 license Open Access Published by De Gruyter April 6, 2018

Covariate Balancing Inverse Probability Weights for Time-Varying Continuous Interventions

Curtis Huffman and Edwin van Gameren


In this paper we present a continuous extension, for longitudinal analysis settings, of the recently proposed Covariate Balancing Propensity Score (CBPS) methodology. While extensions of the CBPS methodology to both marginal structural models and general treatment regimes have been proposed, these extensions have been kept separate. We propose to bring them together, using the generalized method of moments to estimate inverse probability weights such that, after weighting, the association between the time-varying covariates and the treatment is minimized. A simulation analysis confirms the correlation-breaking performance of the proposed technique. As an empirical application, we look at the impact the gradual roll-out of Seguro Popular, a universal health insurance program, has had on the resources available for the provision of healthcare services in Mexico.

1 Introduction

Propensity Score (PS) methods can be applied to a wide range of different research settings, and have become a popular tool for making causal inference. As shown by Robins [1], PS methods can even be used to evaluate programs that evolve over long periods of time, reacting to changes in the environment that they themselves help to bring about. That class of statistical models came to be known as Marginal Structural Models (MSMs) [2], [3], [4]. Despite their theoretical appeal, MSMs as a tool for making causal inference from longitudinal data have yet to be extended to general treatment regimes, that is, from binary to (potentially) continuous treatment intensities. Surprisingly, while several extensions of propensity score methods to general treatment regimes have been proposed [5], [2], [6], [7], [8] and keep gaining popularity among applied researchers [9], these methodologies have not been coupled with the MSM framework.

Perhaps behind this gap between methodological and applied research lies the practical difficulty of estimating the inverse probability weights that MSMs require to generate a pseudo-sample from which to estimate the causal quantities of interest. The correct model is generally unknown, and its misspecification can actually increase bias, even if the selection-on-observables assumption holds. In contrast with cross-sectional PS methods, in MSMs the effect of model misspecification is exacerbated, for it propagates across time: the algorithm provided by Robins et al. [2] to estimate the weights in these models involves the product of propensity scores estimated for each time period.

Since, in theory at least, inverse probability weighting renders the intervention independent of the time-varying covariates, usual practice involves iteratively checking whether weighting on the estimated probability meets this expected property – revising the model until said correlation is minimized. The downside of this algorithm is that it can lead to ad hoc specifications of the probability models which may not agree with the preferred theoretical specification.
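The quantity checked at each iteration of this manual specification search can be made concrete with a small helper that computes the exposure–covariate correlation in the weighted pseudo-sample. A minimal sketch; `weighted_corr` is an illustrative name of ours, not part of any published package:

```python
import numpy as np

# Illustrative helper (our name, not from any package): the quantity one checks
# at each iteration of the manual specification search described above.
def weighted_corr(w, a, x):
    """Correlation between exposure a and covariate x in the pseudo-sample
    implied by inverse probability weights w."""
    w = w / w.sum()
    ma, mx = w @ a, w @ x
    cov = w @ ((a - ma) * (x - mx))
    return cov / np.sqrt((w @ (a - ma) ** 2) * (w @ (x - mx) ** 2))

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
x = 0.5 * a + rng.normal(size=1000)
r = weighted_corr(np.ones(1000), a, x)   # with unit weights, plain Pearson r
```

With unit weights the helper reduces to the ordinary Pearson correlation; with estimated inverse probability weights it should shrink toward zero as the specification improves.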

As a way to get around this practical difficulty surrounding MSMs, Imai and Ratkovic [10] came up with the idea of formally dovetailing this correlation-minimizing property of the weights into the estimation method itself, eliminating the need to manually and iteratively check the treatment assignment model. They accomplished this by making use of the Generalized Method of Moments (GMM), adding the correlation-minimizing property as an extra moment condition to the model's score vector – that is, the gradient of the log-likelihood. They have called this new approach the Covariate Balancing Propensity Score (CBPS) [11], and it can easily be implemented through publicly available open-source software [12]. Simulation and empirical studies suggest that the CBPS improves the empirical performance of MSMs by making the treatment assignment model depend less heavily on assumptions that are often not met in practice. However, so far the CBPS extension to MSMs has been confined to binary time-dependent treatments.

Here we propose a further extension to the CBPS methodology that can be used by researchers facing continuous time-dependent treatments. Our proposal builds on the recent generalization of the CBPS methodology to general treatment regimes [13] taking it to the longitudinal setting, thus bridging the gap between two extensions of the original CBPS estimator.

In Section 2 we briefly review MSMs and their assumptions. In Section 3 we describe the proposed extension to the CBPS methodology. In Section 4 we present our simulation study. In addition, we present an empirical application in Section 5 to show the performance of the proposed methodology with real data. Finally, we offer concluding comments in Section 6.

2 Marginal structural models

Assume a program that evolves over long periods of time, intervening with different intensity across recipients at every stage of its implementation and reacting to changes in the environment which the program itself contributed to bring about; that is to say, reacting to its own previous results in a feedback loop. In this context, variables that can both affect future exposure to the program and be affected by past exposure may confound the impact of the program in a way that no cross-sectional impact evaluation method can sort out [4]. In this section we briefly review the MSM framework [2], [3], [4] that has been developed to analyze the impact of such programs in the case of binary treatments.

Denote recipient $i$'s observed $k$-dimensional vector of time-varying covariates at stage $t$ of the program as $X_{it}$, with $i=1,\dots,N$ and $t=1,\dots,T$. At every stage $t$ of the program, recipients are observed receiving the exposure level $a_{it}$ to the benefits of the program; that is, one possible realization of the exposure variable $A_{it}$, with $a_{i0}=0$. Collecting all the observed exposures to the program for a given recipient, from its launch up to stage $t$, gives us the history of exposure $\bar{a}_{it}=(a_{i1},\dots,a_{it})$. Independent, identically distributed realizations of $(Y_{it},\bar{A}_{it},\bar{X}_{it})$ are observed for each recipient at every stage of the program, where $Y_{it}$ is some outcome variable of interest. Let $\bar{X}_{it}$ and $\bar{x}_{it}$ be similarly defined for a covariate history, where $X_{it}$ is the most recent set of variables that could affect $A_{it}$, and are also affected by the past exposure history, $\bar{A}_{i,t-1}$.

Following Robins' adaptation of the potential outcome framework [14], [15] to longitudinal settings [16], we use $Y_{it}(\bar{a}_t)$ to represent the potential outcome for recipient $i$ measured at time $t$ associated with the exposure history $\bar{a}_t$ up to $t$, which we assume does not depend on the exposure histories of other recipients. This is often referred to in the literature as the stable unit treatment value assumption. Naturally, any recipient of the program exhibits only one of these potential outcomes at any given stage $t$; that is, the one associated with its own particular exposure history up to that point in time, $Y_{it}=Y_{it}(\bar{a}_{it})$, the consistency assumption. All other, unobserved, potential outcomes are said to be counterfactual.

As with other PS methods, MSMs are based on the assumption of no unmeasured confounding, in this case at each time period. This assumption, also referred to as sequential ignorability [4] or conditional exchangeability [3], states that the exposure level at stage $t$ is statistically independent (ignorable) of the potential outcomes, conditional on the covariate and exposure histories up to that point. Using the previous notation, sequential ignorability says that for any exposure history up to $t$, $\bar{a}_t$,
$$ Y_{it}(\bar{a}_t) \;\perp\; A_{it} \,\mid\, \bar{A}_{i,t-1}=\bar{a}_{t-1},\ \bar{X}_{it}=\bar{x}_t. $$
Additionally, as is also typical in PS methods, MSMs assume that at any stage $t$ there is no covariate history $\bar{x}_t$ and past exposure $\bar{a}_{t-1}$ such that all recipients with such histories are certain to receive the identical exposure $a_t$. That is, each exposure history must have some positive probability of occurring. This last assumption can be formally written as: if $f(\bar{A}_{i,t-1},\bar{X}_{it}) \neq 0$ then $f(A_{it} \mid \bar{A}_{i,t-1},\bar{X}_{it}) > 0$, where $f$ stands for the probability density function (pdf).

Under the above assumptions (sequential ignorability, consistency and positivity), Robins [1] showed that the conditional probability density of the exposure history can be used to consistently estimate the impact of such dynamic interventions. In this generalization of Inverse Probability of Treatment Weights (IPTW), the probability of observing a particular exposure history is expressed as the product of the respective probability density at every stage, $\prod_{t=1}^{T} f(A_t \mid \bar{A}_{t-1}, \bar{X}_t)$. However, using this probability directly as weights when fitting the outcome model leads to highly variable estimates, and researchers generally follow Cole and Hernán [17] and use stabilized weights of the form


$$ SW_{it} \;=\; \prod_{s=1}^{t} \frac{f\!\left(A_{is} \mid \bar{A}_{i,s-1}\right)}{f\!\left(A_{is} \mid \bar{A}_{i,s-1},\, \bar{X}_{is}\right)}, \qquad (1) $$

where, for convenience, we define $SW_{0}=1$.

In nonrandomized evaluations, irrespective of the nature of the intervention whose impact one might be trying to estimate – whether it is best coded as binary, multivalued or continuous – these weights are unknown and have to be estimated. Usually, a parametric model is used to estimate the probability density of exposure given the past at each time period. When the intervention comes as a single action that either occurs or not at every stage of the program (i.e., coded as a binary variable), a common approach is to estimate this probability density at every stage with a standard logistic regression (logit) model. In the case of continuous interventions, Gaussian models (with constant variance) are often assumed, i.e., $A_t \mid \bar{A}_{t-1}, \bar{X}_t \sim N\!\left(h(\bar{A}_{t-1}, \bar{X}_t; \gamma),\, \sigma^2\right)$, which can be estimated by the GMM.[1] If there is reason to believe that $h$ is a linear, additive function of the exposure and covariate histories, with parameters $\theta_t=(\alpha,\beta,\sigma^2)$, researchers may regress the current level of exposure on all past exposures and covariates,[2] e.g.,

$$ A_{it} \;=\; \bar{A}_{i,t-1}\alpha + \bar{X}_{it}\beta + \varepsilon_{it}, \qquad \varepsilon_{it} \sim N(0,\sigma^2). \qquad (2) $$


The numerator of the stabilized weights is typically estimated using the empirical distribution of the exposure. If the assumptions behind equation 2 hold, and further assuming the marginal distribution of $A_t$ to be Gaussian with mean zero (possibly after the pertinent transformation), the stabilizing weight is given by[3]

$$ SW_{it} \;=\; \prod_{s=1}^{t} \frac{\phi\!\left(A_{is}\,;\,0,\,\sigma^{2}\right)}{\phi\!\left(A_{is}\,;\,\bar{A}_{i,s-1}\alpha + \bar{X}_{is}\beta,\,\sigma^{2}\right)}, \qquad (3) $$

where $\phi(\,\cdot\,;\mu,\sigma^{2})$ denotes the Gaussian density with mean $\mu$ and variance $\sigma^{2}$.


Once these stabilized weights are estimated, all researchers have to do is use them when fitting whatever model of the outcome, as a function of treatment history alone, they would have used in the case of a randomized evaluation.[4]
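As a concrete illustration, the procedure above – fit a per-period Gaussian exposure model, multiply the stabilized density ratios across periods, then run a weighted outcome regression on treatment history alone – can be sketched in Python on simulated data. This is a minimal sketch under the linear Gaussian model of equation 2; the data-generating process and all variable names are ours, not the paper's:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, T = 500, 3

# Toy panel: one time-varying confounder X and a continuous exposure A.
X = rng.normal(size=(n, T))
A = np.zeros((n, T))
for t in range(T):
    past = A[:, t - 1] if t > 0 else np.zeros(n)
    A[:, t] = 0.5 * past + 0.4 * X[:, t] + rng.normal(size=n)

# Stabilized weights: per-period ratio of marginal to conditional Gaussian density.
sw = np.ones(n)
for t in range(T):
    past = A[:, t - 1] if t > 0 else np.zeros(n)
    Z = np.column_stack([np.ones(n), past, X[:, t]])
    beta, *_ = np.linalg.lstsq(Z, A[:, t], rcond=None)   # ML fit of the Gaussian model
    resid = A[:, t] - Z @ beta
    num = norm.pdf(A[:, t], loc=A[:, t].mean(), scale=A[:, t].std())  # empirical marginal
    den = norm.pdf(A[:, t], loc=Z @ beta, scale=resid.std())          # conditional density
    sw *= num / den

# MSM step: weighted regression of the outcome on cumulated exposure only.
Y = 1.5 * A.sum(axis=1) + X.sum(axis=1) + rng.normal(size=n)
D = np.column_stack([np.ones(n), A.sum(axis=1)])
coef = np.linalg.solve(D.T @ (sw[:, None] * D), D.T @ (sw * Y))
```

Here the numerator density uses the empirical marginal of the exposure, as the text suggests; in practice one would also truncate extreme weights before the outcome regression.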

3 Correlation breaking IPTW

As with other PS methods, Robins' stabilized weights break the link between the exposure history and the time-varying covariates, guaranteeing that the exposure history is unconfounded. That is, at any given stage of the program, we expect the histories of both exposure and the time-varying covariates to be uncorrelated in the weighted data [4], [18], [19].

Formally, under our time-varying setting, we can see this correlation-breaking property for all $t$ as

$$ E\!\left[\,SW_{it}\, A_{it}\, \bar{X}_{it}\,\right] \;=\; E\!\left[A_{it}\right] E\!\left[\bar{X}_{it}\right]. \qquad (4) $$


Note that if we center both the exposure variable and the covariates, equation 4 equals zero. We propose to use this weighted cross moment as an extra moment condition in a GMM framework, which for $t=1$ is the same condition Fong et al. [13] used in the cross-sectional setting.
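The weighted cross moment can be computed directly from data. A minimal sketch; `balance_moment` is an illustrative helper of ours, not part of any package:

```python
import numpy as np

def balance_moment(sw, a, X):
    """Sample analogue of the weighted cross moment above, after centering.

    sw : (n,) stabilized weights; a : (n,) exposure at stage t;
    X  : (n, k) covariate histories at stage t.
    Entries near zero indicate the weights break the exposure-covariate link."""
    a_c = a - a.mean()
    X_c = X - X.mean(axis=0)
    return (sw[:, None] * a_c[:, None] * X_c).mean(axis=0)

# With exposure independent of the covariates, even unit weights balance:
rng = np.random.default_rng(0)
a = rng.normal(size=10_000)
X = rng.normal(size=(10_000, 2))
m = balance_moment(np.ones(10_000), a, X)
```

In an applied MSM analysis one would pass the estimated stabilized weights instead of unit weights and inspect each covariate's entry, stage by stage.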

Including equation 4 among the score conditions – the gradient of the log-likelihood with respect to $\theta$ – as moments allows us to account, in a single estimator, for both properties we would expect the stabilized weights to satisfy: predicting the exposure among recipients and breaking the link between the exposure and the time-varying covariates.

3.1 Estimation

There are many ways to specify the moment conditions for the GMM estimator, depending on the assumptions one is willing to make in estimating the probability models and on how the correlation-breaking condition is included, each with a different computational burden. Assuming $A_t \mid \bar{A}_{t-1}, \bar{X}_t$ independent and identically distributed Gaussian (normal) across time periods, and making use of the fact that it also follows from the correlation-breaking property that current exposure is uncorrelated with the past, we have the following moment conditions,

$$ g(\theta) \;=\; \begin{pmatrix} \dfrac{\varepsilon_{it}}{\sigma^{2}}\left(\bar{A}_{i,t-1},\ \bar{X}_{it}\right)^{\top} \\[2ex] \dfrac{\varepsilon_{it}^{2}-\sigma^{2}}{2\sigma^{4}} \\[2ex] SW_{it}\, A_{it}\, \bar{X}_{it} \end{pmatrix}, \qquad \varepsilon_{it} \;=\; A_{it}-\bar{A}_{i,t-1}\alpha-\bar{X}_{it}\beta, \qquad (5) $$


where the first two entries correspond to the first-order conditions of the log-likelihood of the exposure assignment model (eq. 2) with respect to the model parameters $(\alpha,\beta)$ and $\sigma^2$, respectively, and the third entry captures the correlation-breaking condition (eq. 4), after centering both the exposure and the covariates.
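These stacked moments – the two Gaussian score blocks plus the balance condition – can be sketched as a single per-observation array. All names and the toy data below are ours; this is an illustration of the structure, not the paper's implementation:

```python
import numpy as np

def g_theta(theta, a, Z, X, sw):
    """Per-observation stacked moments: Gaussian scores plus the balance condition.

    theta = (coefficients..., sigma2); Z is the design matrix built from exposure
    and covariate histories; X holds the covariates entering the balance
    condition; sw are the stabilized weights."""
    k = Z.shape[1]
    coef, sigma2 = theta[:k], theta[k]
    eps = a - Z @ coef
    g1 = (eps[:, None] * Z) / sigma2                         # score w.r.t. (alpha, beta)
    g2 = ((eps ** 2 - sigma2) / (2 * sigma2 ** 2))[:, None]  # score w.r.t. sigma^2
    g3 = sw[:, None] * (a - a.mean())[:, None] * (X - X.mean(axis=0))  # balance
    return np.hstack([g1, g2, g3])

# At the ML solution the score moments average to (numerically) zero:
rng = np.random.default_rng(0)
n = 2_000
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
a = Z @ np.array([0.3, 0.7]) + rng.normal(size=n)
X = rng.normal(size=(n, 2))
coef, *_ = np.linalg.lstsq(Z, a, rcond=None)
theta = np.r_[coef, (a - Z @ coef).var()]
gbar = g_theta(theta, a, Z, X, np.ones(n)).mean(axis=0)
```

The score blocks vanish exactly at the OLS/ML solution, while the balance block generally does not; the GMM estimator trades off the two sets of conditions.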

The GMM estimator of $\theta$ is obtained minimizing

$$ Q(\theta) \;=\; \bar{g}(\theta)^{\top}\, \Sigma_{\theta}\!\left(\bar{A}_t,\bar{X}_t\right)^{-1} \bar{g}(\theta), \qquad (6) $$


where $\bar{g}(\theta)$ is the mean over $i$ of $g(\theta)$, and $\Sigma_{\theta}(\bar{A}_t,\bar{X}_t)$ is the mean over $i$ of the 3×3 covariance matrix.[5]

Even though assuming an independent correlation structure is important in keeping the score conditions and the covariance matrix as simple as possible, its role can be attenuated depending on how the exposure and covariate histories enter the exposure assignment model.[6] Also, in practice, researchers faced with time-series data that are serially correlated of unknown form can preserve the original time-series structure by resorting to a block bootstrap, resampling whole sequences at the unit level $i$. We follow this strategy in Section 5.[7]
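The unit-level block bootstrap amounts to resampling whole recipient sequences rather than individual observations. A minimal sketch, with illustrative names of ours:

```python
import numpy as np

def unit_block_bootstrap(panel, rng):
    """Resample entire unit-level sequences with replacement.

    panel has shape (n_units, T, ...): one complete time series per unit i.
    Keeping each sequence intact preserves serial correlation of unknown form."""
    n_units = panel.shape[0]
    idx = rng.integers(0, n_units, size=n_units)
    return panel[idx]

rng = np.random.default_rng(0)
panel = rng.normal(size=(100, 13, 4))   # e.g. 100 units, 13 periods, 4 variables
boot = unit_block_bootstrap(panel, rng)
```

Each bootstrap replication re-estimates the weights and the outcome model on the resampled panel, and the standard errors come from the distribution of the resulting coefficients.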

In programming the optimization procedure we have closely followed the original CBPS algorithm of Fong et al. [12]. In particular, we have modified their subroutine for the Continuous CBPS to accommodate the longitudinal case, estimating all the weights at once.

Given the complexity of the objective function in equation 6, for practical purposes we recommend comparing the performance of different optimization algorithms and starting points (using a search net), including the OLS solution, and applying non-derivative-based methods, because numerical derivatives can become a serious hurdle for the optimization algorithm – at least for Stata's optimize function. Since the estimation of the covariance matrix of the sample moments requires the coefficient estimates, the optimization procedure can start with the maximum likelihood values of $\beta$ and successively update the GMM estimators throughout the optimization. This, of course, increases the computation time, to the point of making it impractical for simulations. The method can be made faster by fixing the estimate of the covariance matrix at the maximum likelihood value. In Section 4, all the simulations are carried out using the Nelder–Mead method, using the OLS solution as starting point and no updating of the covariance (weight) matrix.
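The recommended strategy – start at the OLS solution, fix the weight matrix there, and minimize the quadratic form with a non-derivative method – can be sketched for a toy one-period model (score moments only, for brevity; the data and names are ours):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
a = Z @ np.array([0.2, 0.8]) + rng.normal(size=n)

def moments(theta):
    """Gaussian score moments for a one-period exposure model (toy example)."""
    coef, sigma2 = theta[:2], theta[2]
    eps = a - Z @ coef
    g1 = (eps[:, None] * Z) / sigma2
    g2 = ((eps ** 2 - sigma2) / (2 * sigma2 ** 2))[:, None]
    return np.hstack([g1, g2])

# Start from the OLS/ML solution and fix the weight matrix there.
ols, *_ = np.linalg.lstsq(Z, a, rcond=None)
theta0 = np.r_[ols, (a - Z @ ols).var()]
G0 = moments(theta0)
W = np.linalg.inv(G0.T @ G0 / n)

def Q(theta):
    if theta[2] <= 0:                  # keep the variance parameter admissible
        return np.inf
    gbar = moments(theta).mean(axis=0)
    return n * gbar @ W @ gbar

res = minimize(Q, theta0, method="Nelder-Mead")
```

In the full estimator the balance moments of equation 4 would be stacked alongside the scores, pulling the solution away from the pure ML values whenever balance and fit conflict.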

4 Simulation studies

In order to assess the empirical performance of these correlation-breaking stabilized weights, we carry out some simulation studies. In particular, we consider the case of three time periods, i.e., $t=1,2,3$, and use five different sample sizes, $n=300$, 600, 900, 1200 and 1500, corresponding to 100, 200, 300, 400 and 500 exposure recipients across our time frame. In our setup, summarized in Figure 1, the treatment-generating process is a function of a vector of exogenous time-varying covariates, $\bar{X}_{it}$, and the previous history of exposure, $\bar{A}_{i,t-1}$. The outcome variable, $Y$, is a function of both entire histories of covariates and exposure.

Figure 1: Data Generating Process in Simulation Study.

We use this data generating process (DGP) to examine the performance of the proposed methodology, vis-à-vis the usual Ordinary Least Squares (OLS) approach, when faced with both misspecification of the treatment-assignment model – incorrectly specifying the lag structure, distributional family and cross-time homoscedasticity – and measurement error.

We use a simulation setup similar to that of Imai and Ratkovic [10], in two scenarios. In the first scenario, the simulation consists of four covariates. For time $t$, we use the covariates


where each $Z_{itk}$ is an i.i.d. draw from the standard normal distribution, and $U_{it}$ is constructed as


for $t=2,3$ and $U_{it}=1$ for $t=1$. The continuous treatment $A_{it}$ is generated by


where $\alpha=(0.8,\ 0.2,\ 0.4,\ 0.6)$ and $v_i$ is a standard normal disturbance – a cross-time homoscedastic normal model. The parameters were selected more or less arbitrarily, looking only to guarantee a minimum dispersion. The continuous outcome $Y$ is generated by a linear combination of the whole exposure and confounder histories such that


where $\delta=(23.5,\ 12.5,\ 7.3,\ 17.4)$ and $u_i$ is another normal disturbance with mean zero and standard deviation five. Note that we have generated the outcome variable for every $t$, making it a function of both entire histories of covariates and exposure.[8]

There are many ways in which researchers can misspecify the treatment assignment model. In our simulations we explore two scenarios. In the first misspecified scenario, we use the following non-linear transformation of the covariates


as measurement error, and we also incorrectly specify the lag structure of the treatment-assignment model by using only the data from the immediately preceding period, such that


but maintain the correct outcome model. We report results for 1,000 simulated datasets.

In this first scenario, with every dataset, we fit a pooled OLS model (GLM) as the treatment assignment model, using the correct and incorrect model specifications discussed above. Then we fit the same models using the proposed correlation-breaking IPTW methodology (CBIPTW) in two ways: first using both the score and correlation-breaking conditions (over-identified Correlation Breaking Inverse Probability of Treatment Weights, CBIPTW), as described in Section 3, and then with the correlation-breaking condition alone (just-identified CBIPTW), ignoring the score conditions. This gives us three different estimates according to the importance placed on the two expected properties of the weights. The Generalized Linear Model (GLM) estimates obviate the correlation-breaking property, ignoring the subvector $g_3$, while the just-identified CBIPTW pays no attention to predicting exposure assignment, ignoring the subvectors $(g_1, g_2)$. The over-identified CBIPTW stands in the middle, attending to both criteria.[9] Finally, we regress in a pooled model the outcome variable on the cumulated exposure, $\sum_{s=1}^{t} A_{is}$, using the stabilized MSM weights. The coefficient that results from this last regression is compared with the numerical estimate obtained from a thousand simulations using large datasets ($n=30{,}000$) and the true exposure assignment probabilities.

Table 1 presents the results for our five sample sizes. The columns show the bias and Root Mean Squared Error (RMSE) from the pooled weighted linear regression of the outcome on the cumulated exposure using the stabilized weights. The results are in line with those of the previous Covariate Balancing Propensity Score for MSMs [10]. When the exposure assignment model is correctly specified, all methods have small bias and small RMSE. In contrast, when the model is misspecified, the bias and RMSE are large and even grow with the sample size. However, the RMSE of the CBIPTW estimates grows at a much smaller rate, thereby outperforming OLS.

Table 1

Performance of Probability Weights Estimation Methods (Normal Homoscedastic Treatment-Assignment DGP).

                                    n=300        n=600        n=900        n=1200       n=1500
                                  Bias  RMSE   Bias  RMSE   Bias  RMSE   Bias  RMSE   Bias  RMSE
Correct Specification
  CBIPTW Over-identified          1.44  3.94   1.26  3.38   1.11  3.01   0.97  2.79   0.99  2.90
  CBIPTW Just-identified          1.47  3.93   1.07  3.33   1.
Misspecification (measurement and lag structure)
  CBIPTW Over-identified          3.89  6.42   4.49  8.87   3.65  7.27   3.73  7.35   3.77  8.66
  CBIPTW Just-identified          4.16  6.04   4.48  6.63   3.84  6.72   3.97  6.39   3.99  7.63

In the second misspecified scenario, we use almost the same DGP, only this time $v_i$ is a Gamma disturbance with a different variance at every time point: 0.5, 1 and 2 for $t=1$, $t=2$ and $t=3$, respectively. In this case, the misspecification consists of fitting a normal cross-time homoscedastic model for the treatment $A_{it}$, with no measurement error and the correct lag structure.

Table 2

Performance of Probability Weights Estimation Methods (Gamma Heteroscedastic Treatment-Assignment DGP).

                                    n=300        n=600        n=900        n=1200       n=1500
                                  Bias  RMSE   Bias  RMSE   Bias  RMSE   Bias  RMSE   Bias  RMSE
Correct Specification
Misspecification (distribution family and cross-time homoscedasticity)
  CBIPTW Over-identified          2.56  4.84   2.53  4.85   2.47  4.76   2.50  4.38   2.53  4.62
  CBIPTW Just-identified          2.59  4.84   2.51  4.75   2.43  4.68   2.50  4.40   2.57  4.63

Table 2 shows the results of this second misspecified scenario, fitting a Gaussian cross-time homoscedastic treatment-assignment model, just as in the first scenario, but this time to a Gamma, cross-time heteroscedastic DGP. Here too we see the RMSEs of the CBIPTW grow at a much smaller rate, again outperforming the traditional OLS approach, this time also in bias.

What the statistical evidence shows is that the proposed method effectively improves the fitting process by narrowing the estimator's dispersion, albeit at the expense of some bias, resulting in more precise estimates than GLM. In a way, the extra balancing conditions act as a muffler that prevents the OLS estimates (which only consider the score conditions) from drifting away whenever the model is misspecified.

In short, Tables 1 and 2 tell a simple story for time-varying continuous interventions: if researchers are lucky enough to know the true DGP, and it happens to be Normal, they are probably better off with OLS. If, on the other hand, researchers have reasonable doubts with respect to the origin of their data, they are probably better off with CBIPTW, given that it is less affected by (small) departures from the assumptions underlying the traditional OLS approach.

Together, these two sets of simulations show that the balancing conditions counterweight quite general misspecifications of the treatment-assignment probability model at roughly the same bias – or less – while quickly stabilizing an otherwise exploding RMSE.

5 Empirical application

As an empirical application of the proposed method, we estimate the impact of the Mexican universal health insurance program, Seguro Popular (SP), on key variables associated with the provision of healthcare services by the Secretaría de Salud – Ministry of Health – (SSA). We analyze data from the federal records on infrastructure and human resources employed by the SSA,[10] and the coverage records of the program at a yearly frequency from 2001, the year prior to the introduction of the program, to 2013. The information is aggregated to the Sanitary Jurisdiction (SJ) level, the most basic administrative units of the SSA in charge of the operation of healthcare services and their programs. Time-varying covariates include doctors' offices, staffed and non-staffed hospital beds, and physicians and nurses, both with and without day-to-day contact with patients, all per 1,000 people outside the social security network.[11] In order to make the common support assumption more credible and the estimates less model dependent, 39 out of 233 SJs are dropped from the analysis.[12]

Figure 2: Covariate-Balance for Time-varying SP coverage on Sanitary Jurisdictions.


The estimations of the treatment exposure model fit a single probability model, regressing the yearly increment of SP coverage (per 1,000 people outside the social security network) on the most recent set of covariates and past (cumulative) SP coverage.[13] Looking at the degree of covariate balance achieved by the CBIPTW (Fig. 2, Appendix Table A.1), we see that balance improves in all cases, as medians are closer to zero, and that it is best achieved using the just-identified approach, both in location and dispersion. It is worth noting that, in this case, fitting only the probability model (GLM) does not break the correlation between the exposure to treatment and three covariates: doctors' offices, and physicians and nurses in day-to-day contact with patients.

Using weighted linear regressions to estimate the impact of (cumulated) SP coverage on the change in the (relative) numbers of doctors' offices, physicians and nurses (per 1,000 people outside the social security network) providing clinical care, we can see (Fig. 3, Appendix Table A.2) that in all probability the effect of SP is positive on all three outcome variables, and grows bigger the less correlated the covariates are with the treatment in the pseudo-sample.

The analysis by Bosch and Campos-Vázquez [28] of the impact of SP on the expansion of healthcare services ignored both the duration of the roll-out process and the different levels of treatment intensity observed during that process, applying a differences-in-differences approach in which treatment was dichotomized (municipalities were considered 'treated' once more than 10 individuals had been affiliated to SP). Huffman and van Gameren [19] improve on that by considering a continuous treatment indicator (as in this paper) as well as a better-suited unit of analysis. While our results are somewhat similar to earlier findings, to the extent that positive impacts of SP on healthcare services are reported, our results – slightly bigger than those of Huffman and van Gameren [19] – should be considered more precise, as we use an estimator that is both less model dependent and outperforms the balance achieved through GLM.

It is important to note that, in empirical applications, the less comparable the units are – that is to say, the less reasonable the common support assumption is – the more relevant the CBIPTW become. This is because the CBIPTW automatically compensate for the instability in the probability model generated by the lack of common support, something King and Zeng [24] identified as the risk of extreme counterfactuals. In this sense the CBIPTW may also help extend the validity of the results, allowing the analysis of larger datasets that would otherwise be unreasonable.

Figure 3: Average Treatment Effects of SP coverage on Sanitary Jurisdictions.

6 Conclusions

We have shown that the idea behind the Covariate Balancing Propensity Score [11] is flexible enough to also cover longitudinal settings with general treatment regimes. Here we provide an estimation method for Marginal Structural Models with continuous treatments whose main advantage is that it directly includes the correlation-breaking property of Robins' [1] stabilized weights in the estimation procedure. This new tool avoids the manual process of checking the correlation between the current exposure to treatment and the covariate histories in the weighted data and then respecifying the model. This new extension to the CBPS may help clear the way for those interested in causal analysis in these complex settings.[14]

Since the balance conditions do not require any modeling assumptions in themselves, but only that the weights balance the covariates – something we know to be true regardless of the true DGP – it can be argued that minimizing the correlation between the treatment and the covariates can be done more efficiently than with already available methods, such as the non-parametric extension of the CBPS to general treatment regimes [12] or Genetic Matching [29]. This point grows in importance as treatment assignment models gain in complexity.[15] However, unlike these other methods, the methodology we propose here also provides the researcher with a probability model that outperforms the traditional GLM approach, from which it is possible to start building bridges between MSMs and the decomposition-methods literature, a task that remains for future research.

Funding source: Consejo Nacional de Ciencia y Tecnología

Award Identifier / Grant number: CVU 173055

Funding statement: A preliminary version of the paper formed part of the first author's PhD thesis, financially supported by a Consejo Nacional de Ciencia y Tecnología scholarship (CVU 173055), under the supervision of the second author. In the final phase the first author received support from the UNAM Programa de Becas Posdoctorales en la UNAM, as a fellow of the Programa Universitario de Estudios del Desarrollo.


Both authors are grateful for the comments received during various presentations of early versions, while the observations of two anonymous referees have importantly contributed to the final version. Special thanks are due to Dr Christian Fong for very helpful, clarifying comments on the CBPS Package.

Appendix I Tables

Table A.1

Covariate-Balance for Time-varying SP coverage on Sanitary Jurisdictions.

Variable                                             Sample   Coef.   Bootstrap    z      p>|z|   Normal-based
                                                                      Std. Err.                   [95 % Conf. Interval]
Doctor's offices                                     UW        0.10     0.03       3.75   0.00     0.05   0.15
Staffed hospital beds                                UW        0.
Non-staffed hospital beds                            UW        0.24     0.07       3.61   0.00     0.11   0.37
Physicians with day-to-day contact with patients     UW        0.21     0.06       3.54   0.00     0.09   0.33
Physicians without day-to-day contact with patients  UW       -0.01     0.01      -1.37   0.17    -0.02   0.00
Nurses with day-to-day contact with patients         UW        0.25     0.09       2.88   0.00     0.08   0.41
Nurses without day-to-day contact with patients      UW       -0.02     0.01      -1.93   0.05    -0.04   0.00

  1. Source: Own elaboration based on data from the Sistema Nacional de Información en Salud (National Health Information System) and Seguro Popular administrative records

  2. Estimates based on 1000 replications. Bootstrapping takes into account the matching algorithm.

  3. a SP coverage refers to affiliates as proportion of the population without traditional (work-related) health insurance (IMSS, ISSSTE, Pemex, the Ministry of Defense or the Navy).

  4. b Per thousand population without traditional health insurance.

  5. c UW corresponds to the unweighted 194 sanitary jurisdictions with the smallest mean distance to the rest of the observations in the data; GLM to the sample weighted using OLS and an empty model as stabilizer; CBIPTW-OVER to the sample weighted using over-identified correlation-breaking inverse-probability weights, which give equal importance to correctly predicting treatment assignment and to balancing covariates; and CBIPTW-EXACT to the sample weighted using just-identified CBIPTW, which privileges covariate balance over the probabilistic model.

  6. d Coefficients result from pooled regressions (2,328 obs.) of each covariate on the yearly increment of SP coverage.

Table A.2

Average Treatment Effect of SP coverage on Sanitary Jurisdictions.

Outcome Variable                                     Sample   Coef.   Bootstrap    z      p>|z|   Normal-based
                                                                      Std. Err.                   [95 % Conf. Interval]
Doctor's offices                                     UW        0.20     0.01      16.90   0.00     0.18   0.23
Physicians with day-to-day contact with patients     UW        0.53     0.02      22.42   0.00     0.49   0.58
Nurses with day-to-day contact with patients         UW        0.73     0.04      19.00   0.00     0.66   0.81

  1. Source: Own elaboration based on data from the Sistema Nacional de Información en Salud (National Health Information System) and Seguro Popular administrative records

  2. Estimates based on 1000 replications. Bootstrapping takes into account the matching algorithm.

  3. a SP coverage refers to affiliates as proportion of the population without traditional (work-related) health insurance (IMSS, ISSSTE, Pemex, the Ministry of Defense or the Navy).

  4. b Increment on the relative number (per thousand population without traditional health insurance) with respect to year prior the implementation of the program.

  5. c UW corresponds to the unweighted 194 sanitary jurisdictions with the smallest mean distance to the rest of the observations in the data; GLM to the sample weighted using OLS and an empty model as stabilizer; CBIPTW-OVER to the sample weighted using over-identified correlation-breaking inverse-probability weights, which give equal importance to correctly predicting treatment assignment and to balancing covariates; and CBIPTW-EXACT to the sample weighted using just-identified CBIPTW, which privileges covariate balance over the probabilistic model.

  6. d Coefficients result from pooled regressions (2,328 obs.) of each outcome variable on (cumulated) SP coverage.

Appendix II Derivation of Formulas

Equation 3 follows from plugging in the Gaussian distribution,

Equation 4 follows from plugging in the definition of Robins' stabilized weights,


where the last equality follows from assuming that current variables can only be affected by the past, together with the definition of a conditional distribution, i.e.,


The first two lines of the score vector g_θ, equation 5, correspond to the first-order conditions of the log-likelihood


In the third line we have conveniently rearranged the histories of exposure and covariates in the correlation-breaking condition for symmetry’s sake; the proof follows the same general outline as that of equation 3.

The entries of the 3×3 covariance matrix Σ_θ(Ā_t, X̄_t) are


The upper-left 2×2 submatrix of Σ_θ(Ā_t, X̄_t) is well known, since it corresponds to the covariance of the score conditions of the normal distribution. The remaining elements are obtained by working out the respective covariances involving the third component of the score vector g_θ, as follows.
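The upper-left block referred to here is standard. For a single Gaussian observation $A \sim N(\mu, \sigma^2)$, the two score conditions and their covariance are (a sketch in our own notation, for the scalar case only):

```latex
\frac{\partial \ell}{\partial \mu} = \frac{A-\mu}{\sigma^{2}},
\qquad
\frac{\partial \ell}{\partial \sigma^{2}}
  = -\frac{1}{2\sigma^{2}} + \frac{(A-\mu)^{2}}{2\sigma^{4}},
\qquad
\operatorname{Cov}
  = \begin{pmatrix} \dfrac{1}{\sigma^{2}} & 0 \\[6pt] 0 & \dfrac{1}{2\sigma^{4}} \end{pmatrix},
```

with the conditional mean playing the role of $\mu$ in the present regression setting.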

where the last equality follows from assuming the marginal distribution of A_t to be Gaussian with mean zero, and thus E[A_it] = 0 and E[A_it²] = σ².

where the last two equalities follow from assuming the marginal distribution of A_t to be Gaussian with mean zero, making its third moment equal to zero, E[A_it³] = 0, and its second moment equal to its variance, E[A_it²] = σ².

where the last equality follows from recognizing the Gaussian distribution of A_it with mean Ā_{it−1}α + X̄_{it}β and variance σ², whose second noncentral moment appears enclosed in the first set of brackets.


1. Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran M, Berry D, editors. Statistical models in epidemiology, the environment, and clinical trials. The IMA Volumes in Mathematics and its Applications, vol. 116. Berlin: Springer; 2000. p. 95–133, chap. 2. doi:10.1007/978-1-4612-1284-3_2

2. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550–60. doi:10.1097/00001648-200009000-00011

3. Robins JM, Hernán MA. Estimation of the causal effects of time-varying exposures. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Longitudinal data analysis. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. Boca Raton: CRC Press; 2008. p. 553–99, chap. 23. doi:10.1201/9781420011579.ch23

4. Blackwell M. A framework for dynamic causal inference in political science. Am J Polit Sci. 2013;57(2):504–20. doi:10.1111/j.1540-5907.2012.00626.x

5. Imbens GW. The role of the propensity score in estimating dose-response functions. Biometrika. 2000;87(3):706–10. doi:10.3386/t0237

6. Imai K, van Dyk DA. Causal inference with general treatment regimes. J Am Stat Assoc. 2004;99(467):854–66. doi:10.1198/016214504000001187

7. Hirano K, Imbens GW. The propensity score with continuous treatments. In: Gelman A, Meng XL, editors. Applied Bayesian modeling and causal inference from incomplete-data perspectives: An essential journey with Donald Rubin’s statistical family. Wiley Series in Probability and Statistics. New York: John Wiley & Sons, Ltd; 2004. p. 73–84, chap. 7. doi:10.1002/0470090456.ch7

8. Flores CA, Flores-Lagunes A, Gonzalez A, Neumann TC. Estimating the effects of length of exposure to instruction in a training program: The case of Job Corps. Rev Econ Stat. 2012;94(1):153–71.

9. Zhao S, van Dyk DA, Imai K. Propensity-score based methods for causal inference in observational studies with fixed non-binary treatments. 2013. Mimeo.

10. Imai K, Ratkovic M. Robust estimation of inverse probability weights for marginal structural models. J Am Stat Assoc. 2015;110(511):1013–23. doi:10.1080/01621459.2014.956872

11. Imai K, Ratkovic M. Covariate balancing propensity score. J R Stat Soc, Ser B, Stat Methodol. 2014;76(1):243–63. doi:10.1111/rssb.12027

12. Fong C, Ratkovic M, Hazlett C, Imai K. CBPS: R package for covariate balancing propensity score. Comprehensive R Archive Network (CRAN). 2015.

13. Fong C, Hazlett C, Imai K. Parametric and nonparametric covariate balancing propensity score for general treatment regimes. 2015. Mimeo.

14. Splawa-Neyman J, Dabrowska D, Speed T, et al. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Stat Sci. 1990;5(4):465–72. doi:10.1214/ss/1177012031

15. Rubin DB. Matching to remove bias in observational studies. Biometrics. 1973;29(1):159–83. doi:10.1017/CBO9780511810725.007

16. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Math Model. 1986;7(9):1393–512. doi:10.1016/0270-0255(86)90088-6

17. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008;168(6):656–64. doi:10.1093/aje/kwn164

18. Huffman C. Causal inference with time-varying continuous interventions: Evaluating the Mexican universal health insurance program Seguro Popular. PhD thesis. Centro de Estudios Económicos, El Colegio de México, A.C.; 2016.

19. Huffman C, van Gameren E. Time-varying continuous interventions: The roll-out of a universal health insurance program and its impact on the supply of health care services. 2017. Mimeo. doi:10.1186/s12939-018-0874-1

20. Rosenbaum PR. Observational studies. Berlin: Springer; 2002. p. 1–17. doi:10.1007/978-1-4757-3692-2

21. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163(12):1149–56. doi:10.1093/aje/kwj149

22. Ding P, Miratrix LW. To adjust or not to adjust? Sensitivity analysis of M-bias and butterfly-bias. J Causal Inference. 2015;3(1):41–57. doi:10.1515/jci-2013-0021

23. Ratkovic M, Tingley D. Sparse estimation and uncertainty with application to subgroup analysis. Polit Anal. 2017;25(1):1–40. doi:10.1017/pan.2016.14

24. King G, Zeng L. The dangers of extreme counterfactuals. Polit Anal. 2006;14(2):131–59. doi:10.1093/pan/mpj004

25. King G, Zeng L. When can history be our guide? The pitfalls of counterfactual inference. Int Stud Q. 2007;51(1):183–210. doi:10.1111/j.1468-2478.2007.00445.x

26. Gower JC. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 1966;53(3–4):325–38. doi:10.1093/biomet/53.3-4.325

27. Gower JC. A general coefficient of similarity and some of its properties. Biometrics. 1971;27(4):857–71. doi:10.2307/2528823

28. Bosch M, Campos-Vázquez RM. The trade-offs of social assistance programs in the labor market: The case of the “Seguro Popular” program in Mexico. Documento de Trabajo DT-2010-12. El Colegio de México, A.C.; 2010.

29. Diamond A, Sekhon JS. Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Rev Econ Stat. 2013;95(3):932–45. doi:10.1162/REST_a_00318

Received: 2017-01-13
Revised: 2018-01-24
Accepted: 2018-03-19
Published Online: 2018-04-06
Published in Print: 2018-09-25

© 2018 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.