This section introduces basic definitions to enable unambiguous talk about generalization. The section may appear somewhat dry but it is critical that these terms be understood. Ultimately there can be no scientific progress if we do not know what we are talking about when we are talking about generalization.^{1}

**Definition 1** (Graph) A graph is a collection 𝒢=〈**V**, **E**〉 of nodes **V**={*V*_{1}, …, *V*_{N}} and edges **E**={*E*_{1}, …, *E*_{M}} where the nodes correspond to variables and the edges denote the relation between pairs of variables.

**Definition 2** (Directed Acyclic Graph) A directed acyclic graph (DAG) is a graph that (i) admits *directed* edges with one arrowhead (e.g., →); (ii) admits *bi-directed* edges with two arrowheads (e.g., ↔); and (iii) contains no directed cycles (e.g., *X*→*Y*→*X*), thereby ruling out mutual or self causation.

A *path* in a DAG is any unbroken route traced along the edges of the graph – irrespective of how the arrows are pointing (e.g., *X*←*M*→*Y*). A *directed* path, however, is a path composed of directed edges where all edges point in the direction of the path (e.g., *X*→*M*→*Y* is a directed path between the ordered pair of variables (*X*, *Y*)). Any two nodes are *connected* if there exists a path between them; otherwise they are *disconnected*.
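The distinction between paths, directed paths, and directed cycles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge-list representation, the toy fork *X*←*M*→*Y*, and the helper names `has_directed_path`, `is_connected`, and `is_acyclic` are hypothetical and do not come from the text.

```python
from collections import deque

# Hypothetical toy graph: directed edges stored as (tail, head) pairs.
EDGES = [("M", "X"), ("M", "Y")]  # the fork X <- M -> Y


def children(edges):
    """Map every node to the set of nodes it points into."""
    out = {}
    for tail, head in edges:
        out.setdefault(tail, set()).add(head)
        out.setdefault(head, set())
    return out


def has_directed_path(edges, source, target):
    """True if some route from source to target follows the arrows throughout."""
    adj = children(edges)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def is_connected(edges, a, b):
    """True if a path exists between a and b, ignoring arrow direction."""
    undirected = list(edges) + [(head, tail) for tail, head in edges]
    return a == b or has_directed_path(undirected, a, b)


def is_acyclic(edges):
    """True if no node can reach itself along a directed path."""
    return not any(has_directed_path(edges, v, v) for v in children(edges))
```

In the fork, *X* and *Y* are connected (through *M*) even though no directed path links them, which is exactly the difference between a path and a directed path.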

**Definition 3** (Causal Structure, adapted from Pearl (2009: p. 44, 203)) A *causal structure* or diagram of a set of variables **W** is a DAG 𝒢=〈{**U**, **V**}, **E**〉 with the following properties:

1. Each node in {**U**, **V**} corresponds to one and only one distinct element in **W**, and vice versa;

2. Each edge *E*∈**E** represents a direct functional relationship among the corresponding pair of variables;

3. The set of nodes {**U**, **V**} is partitioned into two sets:

   (a) **U**={*U*_{1}, *U*_{2}, …, *U*_{N}} is a set of *background* variables determined only by factors outside the causal structure;

   (b) **V**={*V*_{1}, *V*_{2}, …, *V*_{N}} is a set of *endogenous* variables determined by variables in the causal structure – that is, variables **U**∪**V**; and

4. None of the variables in **U** have causes in common.

A causal structure or diagram provides a transparent graphical language for communicating our private knowledge about which variables we believe are relevant for a specific causal analysis, and how these variables stand in causal relation to one another. Figure 1, adapted from Morgan and Winship (2012, fig. 6), is an example of a causal diagram. In this causal diagram the variables *U*_{i} are the unobserved background variables, and all other variables are the endogenous variables in **V** (i.e., they all have at least one arrow pointing into them).^{2} By convention solid nodes represent known and measured variables, whereas empty nodes depict unmeasured ones.

Figure 1: Panel (a) depicts a non-parametric structural equation model explaining how test scores (*Y*) depend on student ability (*A*), feelings of self-worth (*S*), neighborhood (*N*), parental education (*P*), and unobserved background causes (*U*_{Y}). Exposure to charter schools (C) is caused by ability, parental education, and unobserved background causes (*U*_{C}); and it affects test scores via feelings of self-worth, a mediator. Panel (b) is the equivalent graphical representation of the non-parametric structural model in Panel (a). The causal diagram contains all the information needed for non-parametric causal identification.

Causal diagram 𝒢 represents a possible theory of causation, one where exposure to charter versus public school (*C*) affects test scores (*Y*) via feelings of self-worth (*S*).^{3} At the same time parental education (*P*) and student ability (*A*) (unobserved; notice the hollow circle) are both common causes of exposure to charter schools and of test scores. These two causes act as potential confounders. They both imply an association between charter schools and test scores even if charter schools are without effect (e.g., even if we delete the arrow *C*→*S* or *S*→*Y* from 𝒢). Parental education (*P*) affects test scores directly, by helping with homework say, and indirectly, via the choice of residential neighborhood (*N*) and school type (*C*).

Causal diagrams invite the use of an intuitive terminology to refer to causal relations. In a causal diagram *C*→*S* reads “*C* causes *S*.” We also say that *C* is a *parent* of *S*, and *S* is a *child* of *C*, if *C* directly causes *S*, as in *C*→*S*. For example, the *parents* of *Y* are denoted PA(*Y*)={*P*, *N*, *S*, *A*}.^{4} Similarly, we say that *C* is an *ancestor* of *Y*, and *Y* a *descendant* of *C*, if *C* is a direct or indirect cause of *Y*. Thus, *P* is both a direct cause of *Y*, as in *P*→*Y*, and an indirect cause, as in *P*→*N*→*Y*. We refer to non-terminal nodes in directed paths as *mediators*. *S* is a mediator in the path *C*→*S*→*Y*.
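The kinship terminology maps directly onto simple graph queries. The sketch below is illustrative only: the edge list is an assumption read off the Figure 1 caption (background variables omitted), and the helper names `parents`, `ancestors`, and `descendants` are hypothetical.

```python
# Edges of the charter-school diagram as described in the Figure 1 caption.
# This list is an assumption; background variables U are omitted for brevity.
EDGES = [
    ("A", "C"), ("P", "C"),  # ability and parental education -> charter school
    ("C", "S"),              # charter school -> feelings of self-worth
    ("P", "N"),              # parental education -> neighborhood
    ("A", "Y"), ("S", "Y"), ("N", "Y"), ("P", "Y"),  # direct causes of test scores
]


def parents(edges, node):
    """Direct causes of `node`: tails of the arrows pointing into it."""
    return {tail for tail, head in edges if head == node}


def ancestors(edges, node):
    """Direct and indirect causes of `node`."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for parent in parents(edges, current):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def descendants(edges, node):
    """All variables that have `node` among their ancestors."""
    nodes = {v for edge in edges for v in edge}
    return {v for v in nodes if node in ancestors(edges, v)}
```

Under these assumed edges, `parents(EDGES, "Y")` returns the set PA(*Y*) given in the text, and `ancestors(EDGES, "Y")` additionally contains *C*, which reaches *Y* only through the mediator *S*.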

Beyond laying out causal theories graphically and with intuitive terminology, causal diagrams have two further properties. First, by Definition 3 a DAG of a set of variables **W** only qualifies as a causal diagram if it includes all common causes of the variables in **W** (see point 4 in the definition).^{5} This ensures the diagram has some nice properties, including the ability to read conditional independencies directly.^{6} For example, the diagram tells us that under the null of no effect (e.g., deleting *C*→*S* in causal diagram 1), and conditional on *P* and *A*, charter schools and test scores are distributed independently of each other. If *C* and *Y* remain associated despite controlling for these variables, then we read that as evidence that they are causally related under the assumptions laid out in causal diagram 1.^{7}
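One standard way to read such conditional independencies off a DAG is the d-separation criterion, which can be checked by moralizing the ancestral graph of the variables involved. The sketch below is a minimal illustration, not the procedure used in this paper: the edge list is an assumption taken from the Figure 1 caption, and `d_separated` and its helpers are hypothetical names.

```python
from itertools import combinations

# Assumed edges of causal diagram 1 (background variables omitted).
EDGES = [
    ("A", "C"), ("P", "C"), ("C", "S"), ("P", "N"),
    ("A", "Y"), ("S", "Y"), ("N", "Y"), ("P", "Y"),
]
# The null of no effect: the same diagram with the arrow C -> S deleted.
NULL_EDGES = [edge for edge in EDGES if edge != ("C", "S")]


def parents(edges, node):
    return {tail for tail, head in edges if head == node}


def ancestral_set(edges, nodes):
    """The given nodes together with all of their ancestors."""
    found, frontier = set(nodes), list(nodes)
    while frontier:
        for parent in parents(edges, frontier.pop()):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by the conditioning set `given`."""
    given = set(given)
    keep = ancestral_set(edges, {x, y} | given)
    sub = [(t, h) for t, h in edges if t in keep and h in keep]
    # Moralize: drop directions and link parents that share a child.
    links = {frozenset(edge) for edge in sub}
    for node in keep:
        for p, q in combinations(sorted(parents(sub, node)), 2):
            links.add(frozenset((p, q)))
    # Remove the conditioning variables, then test whether x still reaches y.
    links = {link for link in links if not (link & given)}
    seen, frontier = {x}, [x]
    while frontier:
        node = frontier.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return y not in seen
```

With these assumed edges, `d_separated(NULL_EDGES, "C", "Y", {"P", "A"})` holds, whereas conditioning on *P* alone leaves the path through *A* open, and restoring *C*→*S* reconnects *C* and *Y* through the mediator.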

Second, the definition of causal diagrams relies on directed edges (e.g., arrows) in place of explicit functional relations to depict causal relations between variables in the graph. This is a feature, not a bug. Detailed knowledge about specific functional forms is often completely unnecessary for causal identification. Moreover, this diagrammatic representation of functional relations is in accordance with how most people store their causal knowledge. For example, most of us know that smoking causes lung cancer but few, if any, of us know the precise functional relation linking them together.

Figure 1 also shows that every causal model has a corresponding causal diagram (Figure 1a and 1b respectively). A causal model is defined as follows:

**Definition 4** [Causal Model, adapted from Pearl (2009: p. 203)] A *causal model* **M** replaces the set of edges **E** in a causal structure *𝒢* by a set of functions **F**={*f*_{1}, *f*_{2}, …, *f*_{N}}, one for each element of **V**, such that **M**=〈**U**, **V**, **F**〉. In turn, each function *f*_{i} is a mapping from (the respective domains of) *U*_{i}∪PA_{i} to *V*_{i}, where *U*_{i}⊆**U** and PA_{i}⊆**V**∖{*V*_{i}}, and the entire set **F** forms a mapping from **U** to **V**. In other words, each *f*_{i} in

$$v_{i} = f_{i}(pa_{i},\, u_{i}), \quad i = 1,\, 2,\, \dots,\, N,$$

assigns a value to *V*_{i} that depends on (the values of) a select set of variables in **V**∪**U**, and the entire set **F** has a unique solution **V**(**u**).

Like the causal diagram, a causal model is completely non-parametric. For example, causal model 1a specifies that being exposed to a charter school is a function *c*=*f*_{c}(*a*, *p*, *u*_{c}). This function is compatible with any well-defined mathematical expression in its arguments, such as *c*=*α*+*β*_{1}*a*+*β*_{2}*p*+*u*_{c}, or *c*=*α*+*β*_{1}*a*+*β*_{2}*p*+*β*_{3}*a*×*p*+*u*_{c}.
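To make the idea of a set of functions **F** with a unique solution **V**(**u**) concrete, the sketch below encodes one possible parametrization of causal model 1a. Everything numeric here is an assumption: the linear forms, the coefficients, the evaluation order, and the names `F`, `F_INTERACTION`, and `solve` are hypothetical, since the model itself leaves each *f*_{i} unspecified.

```python
# One hypothetical parametrization of causal model 1a. Each function receives
# the endogenous values solved so far (v) and the background values (u).
F = {
    "A": lambda v, u: u["A"],
    "P": lambda v, u: u["P"],
    "N": lambda v, u: 0.5 * v["P"] + u["N"],
    "C": lambda v, u: 1.0 + 0.3 * v["A"] + 0.2 * v["P"] + u["C"],
    "S": lambda v, u: 0.4 * v["C"] + u["S"],
    "Y": lambda v, u: (v["A"] + 0.6 * v["S"] + 0.1 * v["N"]
                       + 0.2 * v["P"] + u["Y"]),
}

# Same arguments for f_c, different functional form (adds an a x p interaction).
F_INTERACTION = dict(
    F,
    C=lambda v, u: (1.0 + 0.3 * v["A"] + 0.2 * v["P"]
                    + 0.1 * v["A"] * v["P"] + u["C"]),
)

ORDER = ["A", "P", "N", "C", "S", "Y"]  # parents are solved before children


def solve(functions, u, order=ORDER):
    """Return the solution V(u): the endogenous values implied by u."""
    v = {}
    for name in order:
        v[name] = functions[name](v, u)
    return v
```

Both `F` and `F_INTERACTION` are compatible with the same diagram, because each function uses exactly the parents the diagram allows; only the functional form of *f*_{c} differs. For a fixed **u**, `solve` always returns the same values, which is the sense in which the model is deterministic.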

Causal models, like causal diagrams, are completely deterministic: Probability comes into the picture through our ignorance of background conditions **U**, which we summarize using a probability distribution P(**u**). In turn, P(**u**) induces a probability distribution P(**v**) over all endogenous variables in **V**.^{8}

**Definition 5** [Probabilistic Causal Model, Pearl (2009: p. 205)] A probabilistic causal model Γ is a pair 〈**M**, P(**u**)〉, where **M** is a causal model and P(**u**) is a probability function defined over the domain of **U**.
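A small simulation can illustrate how P(**u**) induces a distribution over endogenous variables. This is a sketch under loud assumptions: the independent standard normal background variables, the threshold form of *f*_{c}, and the names `draw_u`, `solve`, and `induced_share_charter` are all hypothetical and serve only to show the pair 〈**M**, P(**u**)〉 at work on a fragment of the charter-school example.

```python
import random


def draw_u(rng):
    """One draw from an assumed P(u): independent standard normals."""
    return {name: rng.gauss(0.0, 1.0) for name in ("A", "P", "C")}


def solve(u):
    """Deterministic model M: maps background values u to endogenous values."""
    a = u["A"]
    p = u["P"]
    # Hypothetical f_c: attend a charter school when a latent index is positive.
    c = 1 if 0.5 * a + 0.5 * p + u["C"] > 0 else 0
    return {"A": a, "P": p, "C": c}


def induced_share_charter(n_draws, seed=0):
    """Monte Carlo estimate of P(C = 1) induced by P(u) through M."""
    rng = random.Random(seed)
    draws = [solve(draw_u(rng)) for _ in range(n_draws)]
    return sum(v["C"] for v in draws) / n_draws
```

All randomness enters through `draw_u`; `solve` itself is deterministic. Because the assumed latent index is symmetric around zero, the estimated share of charter-school students should be close to one half for a large number of draws.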

Finally, social scientists often talk about generalizability in terms of causal mechanisms. But what are causal mechanisms, and what is the difference between a model and a causal mechanism, if any? The present framework allows us to define such mechanisms precisely:

**Definition 6** (Causal mechanism) A *causal mechanism* is any **F**′⊆**F**, where **F** is the set of functions in causal model **M**=〈**U**, **V**, **F**〉.

For example, in Figure 1a function *f*_{y} is a causal mechanism generating *y*, and so too is the set **F** of all mechanisms in model **M**. The difference between these two mechanisms is that *f*_{y} takes some endogenous variables as inputs, whereas mechanism **F** takes only background variables as inputs. For instance, in causal model 1a we say that ability (*A*) causes test scores (*Y*) via mechanism **F**_{A,Y}={*f*_{c}, *f*_{s}, *f*_{y}}.

After this brief introduction to causal diagrams we turn to the formal definitions needed to understand interventions, heterogeneity, and generalization.
