## Abstract

Due to the need to evaluate the effectiveness of community-based programs in practice, there is substantial interest in methods to estimate the causal effects of community-level treatments or exposures on individual level outcomes. The challenge one is confronted with is that different communities have different environmental factors affecting the individual outcomes, and all individuals in a community share the same environment and intervention. In practice, data are often available from only a small number of communities, making it difficult if not impossible to adjust for these environmental confounders. In this paper we consider an extreme version of this dilemma, in which two communities each receives a different level of the intervention, and covariates and outcomes are measured on a random sample of independent individuals from each of the two populations; the results presented can be straightforwardly generalized to settings in which more than two communities are sampled. We address the question of what conditions are needed to estimate the causal effect of the intervention, defined in terms of an ideal experiment in which the exposed level of the intervention is assigned to both communities and individual outcomes are measured in the combined population, and then the clock is turned back and a control level of the intervention is assigned to both communities and individual outcomes are measured in the combined population. We refer to the difference in the expectation of these outcomes as the marginal (overall) treatment effect. We also discuss conditions needed for estimation of the treatment effect on the treated community. We apply a nonparametric structural equation model to define these causal effects and to establish conditions under which they are identified. These identifiability conditions provide guidance for the design of studies to investigate community level causal effects and for assessing the validity of causal interpretations when data are only available from a few communities. When the identifiability conditions fail to hold, the proposed statistical parameters still provide nonparametric treatment effect measures (albeit non-causal) whose statistical interpretations do not depend on model specifications. In addition, we study the use of a matched cohort sampling design in which the units of different communities are matched on individual factors. Finally, we provide semiparametric efficient and doubly robust targeted MLE estimators of the community level causal effect based on i.i.d. sampling and matched cohort sampling.

## Comments (0)