Early efforts to resolve Lord’s paradox were made by Bock [16], Judd and Kenny [17], Cox and McCullagh [18], and Holland and Rubin [19]. Since no data were given on the old diet, these authors had to assume a model of weight gain under old-diet conditions, and they concluded, almost uniformly, that both statisticians were in fact correct, depending on the model assumed and on the precise questions that the statisticians attempted to answer. Bock, for example, sees no contradiction between the two statisticians. The first statistician asks: “Is there a difference in the average gain in weight of the population?” and correctly answers: “No!” The second statistician asks: “Is a man expected to show a greater weight gain than a woman, given that they are initially of the same weight?” and answers it correctly: “Yes!” [[16], p. 491]. Bock does not explain why the two conclusions are noncontradictory, given that the quantity asked about in the first question is merely a weighted average of those asked about in the second.

Cox and McCullagh [18] computed the causal effect of the new diet by assuming that, under the old diet, the final weight of every individual would remain the same as the initial weight. Accordingly, they found that statistician-1 is correct: the average causal effect (ACE) of the new diet on weight gain is zero for both men and women. Based on the same model, they found that statistician-2 is also correct, though he simply asks a different question, one concerning the behavior of individual units within each population. Here statistician-2 finds that individual units are affected differently: initially overweight individuals tend to lose weight, and initially underweight individuals tend to gain weight. Naturally, then, comparing boys and girls at the same initial weight would show boys gaining more weight than girls, since boys are heavier on average, so a boy of a given weight is underweight relative to other boys while a girl of the same weight is overweight relative to other girls. Again, what Cox and McCullagh left unanswered is why the two findings, differential gain in every stratum and equal gain on average, should not contradict the Sure Thing Principle.
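A small simulation can make the Cox and McCullagh reading concrete. The sketch below is a hypothetical model, not Lord’s data: the sex-specific mean weights, the coefficient `R`, the noise levels, and the helper names `simulate`, `ace_by_sex`, and `gain_near` are all illustrative assumptions. Under the new diet each student’s final weight regresses toward the sex-specific mean, and, following Cox and McCullagh, the old diet leaves final weight equal to initial weight.

```python
import random

# Illustrative assumptions only (not Lord's data): boys are heavier on
# average, and the new diet pulls each student toward the sex-specific mean.
MEAN = {"boy": 70.0, "girl": 55.0}
SD_INITIAL = 6.0
R = 0.5            # hypothetical regression-toward-the-mean coefficient
N_PER_SEX = 50_000


def simulate(seed=0):
    """Return (sex, w_initial, w_final_new, w_final_old) tuples."""
    rng = random.Random(seed)
    rows = []
    for sex, mu in MEAN.items():
        for _ in range(N_PER_SEX):
            w_i = rng.gauss(mu, SD_INITIAL)
            w_f_new = mu + R * (w_i - mu) + rng.gauss(0.0, 2.0)
            w_f_old = w_i  # Cox-McCullagh assumption for the old diet
            rows.append((sex, w_i, w_f_new, w_f_old))
    return rows


def ace_by_sex(rows):
    """Average causal effect of the new diet (new minus old final weight)."""
    out = {}
    for sex in MEAN:
        effects = [new - old for s, _, new, old in rows if s == sex]
        out[sex] = sum(effects) / len(effects)
    return out


def gain_near(rows, sex, weight, half_width=1.0):
    """Average new-diet gain for one sex among students near a given initial weight."""
    gains = [new - w_i for s, w_i, new, _ in rows
             if s == sex and abs(w_i - weight) < half_width]
    return sum(gains) / len(gains)


rows = simulate()
print(ace_by_sex(rows))               # both values close to 0
print(gain_near(rows, "boy", 62.5))   # positive: boys at this weight gain
print(gain_near(rows, "girl", 62.5))  # negative: girls at this weight lose
```

Both sex-specific average effects come out near zero, yet at a common initial weight of 62.5 (midway between the two assumed means) boys gain and girls lose, which is the pattern statistician-2 detects.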

Holland and Rubin [19] assumed several different models for the old diet and showed that, in contrast to Cox and McCullagh’s model, the gender-specific causal effects of the diet may be nonzero for both men and women, and their difference can be either positive or negative, depending on the parameters of the assumed model. Thus, conclude Holland and Rubin, neither statistician is correct or incorrect; it all depends on which model one assumes for weight gain under the old diet. What Holland and Rubin did not explain is what in the new-diet data alone gave Lord the unmistakable impression that statisticians 1 and 2 reach conflicting conclusions, namely, why their findings should not be constrained by the Sure Thing Principle.

Another question left unanswered by early interpreters concerns Lord’s appeal for a general strategy of “allowing” for initial group differences: “The researcher wants to know how the groups would have compared if there had been no preexisting uncontrolled differences.” In other words, is there a general criterion for deciding whether controlling for pre-treatment differences is a valid thing to do when we wish to compare group behavior that is free from the influence of those differences?

Such a general criterion is provided by the graphical analysis presented in the previous section. The criterion coincides with the answer to the question of whether adjustment for covariates (in our case, ${W}_{I}$) is appropriate for estimating total and direct effects. It is based on the graph structure alone, free of the parametric assumptions that render the analysis of Holland and Rubin indecisive.

Holland and Rubin did not attempt to interpret the problem in terms of the effect of gender, as we did in the previous section, because gender, being unmanipulable, cannot have a causal effect according to Holland and Rubin’s doctrine of “no causation without manipulation” [22]. To demonstrate the generality of the graphical method, let us apply it to a model proposed by Wainer and Brown [23], where the target quantity is the effect of diet, not of gender. Wainer and Brown simplified Lord’s original problem and interpreted the two ellipses of Figure 1 to represent two different diets, or two dining halls, each serving a different diet. They further removed gender from consideration and obtained the two data sets seen in Figure 3 [their Figure 9]. Since the choice of dining room is manipulable, causal effects are well defined, and Wainer and Brown presented Lord’s dilemma as a choice between two methods of estimating the causal effect of dining room on weight gain. In their words:

Figure 3: A scatter plot of a simplified Lord’s Paradox showing the bivariate distribution of weights in two dining rooms at the beginning and end of each year [from [23]].

“The first statistician calculated the difference between each student’s weight in June and in September, and found that the average weight gain in each dining room was zero. This result is depicted graphically in Figure 3 [their Figure 9], with the bivariate dispersion within each dining hall shown as an oval. Note how the distribution of differences is symmetric around the 45° line (the principal axis for both groups) that is shown graphically by the distribution curve reflecting the statistician’s findings of no differential effect of dining room.

The second statistician covaried out each student’s weight in September from his/her weight in June and discovered that the average weight gain was greater in Dining Room *B* than in Dining Room *A*. This result is depicted graphically in Figure 4 [their Figure 10]. In this figure the two drawn-in lines represent the regression lines associated with each dining hall. They are not the same as the principal axes because the relationship between September and June is not perfect. Note how the distribution of adjusted weights in June is symmetric around each of the two different regression lines.
From this result the second statistician concluded that there was a differential effect of dining room, and that the average size of the effect was the distance between the two regression lines.

Figure 4: A graphical depiction of Lord’s Paradox showing the bivariate distribution of weights in two dining rooms at the beginning and end of each year augmented by the regression lines for each group [from [23]].

So, the first statistician concluded that there was no effect of dining room on weight gain and the second concluded there was. Who was right? Should we use change scores or an analysis of covariance? To decide which of Lord’s two statisticians had the correct answer requires that we make clear exactly what was the question being asked. The most plausible question is causal, ‘What was the causal effect of eating in Dining Room *B*?’” [23]

Wainer and Brown’s model is depicted in Figure 5. Here, the initial weight is no longer treatment-dependent, for it was measured prior to treatment. It is in fact a confounder since, as shown in the data of Figure 3 [their Figure 9], overweight students seem more inclined than underweight students to choose Dining Room *B*. So, ${W}_{I}$ affects both diet *D* and final weight ${W}_{F}$.

Figure 5: Graphical representation of Wainer and Brown’s scenario, in which the initial weight (${W}_{I}$) is a determinant of diet (*D*), and estimating the effect of diet on gain requires an adjustment for ${W}_{I}$.
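For the model of Figure 5, the adjustment that the graph licenses can be written out explicitly. The display below is the standard back-door adjustment formula, stated under the assumption, read off Figure 5, that ${W}_{I}$ is the only common cause of *D* and ${W}_{F}$, and written for a discrete initial weight (an integral replaces the sum in the continuous case). Because ${W}_{I}$ is measured before treatment and $Y = {W}_{F} - {W}_{I}$, the same weights carry over to the gain:

$$
P(W_F = w_F \mid do(D = d)) = \sum_{w_I} P(W_F = w_F \mid D = d, W_I = w_I)\, P(W_I = w_I)
$$

$$
E[Y \mid do(D = d)] = E[W_F \mid do(D = d)] - E[W_I] = \sum_{w_I} E[Y \mid D = d, W_I = w_I]\, P(W_I = w_I)
$$

By contrast, the change-score comparison targets $E[Y \mid D = d] = \sum_{w_I} E[Y \mid D = d, W_I = w_I]\, P(W_I = w_I \mid D = d)$, which weights the strata by the diet-specific distribution of initial weight. The two are guaranteed to coincide when $P(W_I = w_I \mid D = d)$ does not depend on $d$, which is not the case in the data of Figure 3.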

It is clear from the graph of Figure 5 that, regardless of whether one aims at estimating the effect of diet on the final weight ${W}_{F}$ or on the weight gain ($Y$), adjustment for the initial weight ${W}_{I}$ is necessary. Thus, statistician-2, who adjusted for ${W}_{I}$ (ANCOVA), was correct, while statistician-1, who was charmed by the equality of average weight gain under the two diets, was flatly wrong. This equality reflects no change in the expected weight gain predicated upon *finding* a subject in Dining Room *A* as compared to *B*; it does not represent equality of gains *due* to a change from Dining Room *A* to Dining Room *B*. Confounders need to be “controlled for” when causal effects are estimated, and failure to do so leads to biased results. The right answer, therefore, lies with statistician-2, who concluded that diet *B* led to significantly more weight gain than diet *A* when proper allowance is made for differences in initial weight between the two groups. This also explains why the Sure Thing Principle need not constrain the predictions of the two statisticians: the principle applies to causal effects, not to statistical predictions [8].
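The contrast between the two estimators can be checked on simulated data. The sketch below assumes a hypothetical linear structural model consistent with Figure 5 (${W}_{I}$ → *D*, ${W}_{I}$ → ${W}_{F}$, *D* → ${W}_{F}$); the coefficients, the selection rule, and the names `TAU`, `change_score_estimate`, and `ancova_estimate` are illustrative assumptions, not Wainer and Brown’s numbers. By construction, the true effect of Dining Room *B* on final weight, and hence on gain, is `TAU`.

```python
import random

# Hypothetical structural model consistent with Figure 5 (illustrative numbers):
# W_I -> D (heavier students prefer room B), W_I -> W_F, D -> W_F.
TAU = 5.0      # true effect of room B (d = 1) on final weight, by construction
SLOPE = 0.5    # final weight regresses toward the mean (slope < 1)
N = 100_000


def simulate(seed=1):
    rng = random.Random(seed)
    rows = []
    for _ in range(N):
        w_i = rng.gauss(60.0, 10.0)
        d = 1 if w_i + rng.gauss(0.0, 10.0) > 60.0 else 0
        w_f = 30.0 + SLOPE * w_i + TAU * d + rng.gauss(0.0, 3.0)
        rows.append((w_i, d, w_f))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def change_score_estimate(rows):
    """Statistician-1: difference in average gain W_F - W_I between rooms."""
    return (mean([f - w for w, d, f in rows if d == 1])
            - mean([f - w for w, d, f in rows if d == 0]))


def ancova_estimate(rows):
    """Statistician-2: OLS coefficient on D in W_F ~ 1 + D + W_I."""
    m_w = mean([w for w, _, _ in rows])
    m_d = mean([d for _, d, _ in rows])
    m_f = mean([f for _, _, f in rows])
    s_dd = sum((d - m_d) ** 2 for _, d, _ in rows)
    s_ww = sum((w - m_w) ** 2 for w, _, _ in rows)
    s_dw = sum((d - m_d) * (w - m_w) for w, d, _ in rows)
    s_df = sum((d - m_d) * (f - m_f) for _, d, f in rows)
    s_wf = sum((w - m_w) * (f - m_f) for w, _, f in rows)
    # Cramer's rule on the centered 2x2 normal equations.
    return (s_ww * s_df - s_dw * s_wf) / (s_dd * s_ww - s_dw ** 2)


rows = simulate()
print(change_score_estimate(rows))  # far below TAU
print(ancova_estimate(rows))        # close to TAU
```

With these settings the ANCOVA coefficient lands close to `TAU`, while the change-score difference falls far below it (slightly negative here), because heavier students, who gain less under this model regardless of room, are over-represented in Dining Room *B*. Regressing the gain instead of ${W}_{F}$ on the same regressors yields the same coefficient on *D*, since the two outcomes differ only by a regressor.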

Interestingly, Wainer and Brown did not reach this conclusion. Instead, they concluded that both statisticians were right but made different assumptions. In their words:

“To draw his conclusion the first statistician makes the implicit assumption that a student’s control diet (whatever that might be) would have left the student with the same weight in June as he had in September. This is entirely untestable. The second statistician’s conclusions are dependent on an allied, but different, untestable assumption. This assumption is that the student’s weight in June, under the unadministered control condition, is a linear function of his weight in September. Further, that the same linear function must apply to all students in the same dining room.”

I differ from Wainer and Brown in this conclusion. There is no need for the assumption of linearity to justify statistician-2’s insistence on using ANCOVA. At the same time, no assumption whatsoever would justify statistician-1’s conclusion. Failure to control for confounding cannot be remedied by linearity, and proper control for confounders works in both linear and nonlinear models.
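That proper control for confounders does not hinge on linearity can be illustrated with stratification instead of regression. The sketch assumes a hypothetical model with a handful of discrete initial weights, a probability of choosing Dining Room *B* that rises with ${W}_{I}$, and a room effect that varies nonlinearly with ${W}_{I}$; every number and name in it (`LEVELS`, `P_ROOM_B`, `final_weight`, `stratified_estimate`) is illustrative. The stratified estimator averages within-stratum differences in gain using the marginal distribution of ${W}_{I}$, with no functional form imposed on the outcome.

```python
import random
from collections import defaultdict

# Hypothetical nonlinear model (illustrative assumptions only).
LEVELS = [50, 60, 70, 80]                            # discrete initial weights
P_ROOM_B = {50: 0.1, 60: 0.3, 70: 0.6, 80: 0.9}      # heavier -> room B more often
N = 200_000


def final_weight(w_i, d, noise=0.0):
    # Nonlinear in W_I, and the room-B effect itself depends on W_I.
    effect = (2.0 + 0.002 * (w_i - 50) ** 2) if d == 1 else 0.0
    return 20.0 + 0.7 * w_i - 0.004 * (w_i - 65) ** 2 + effect + noise


def true_ace():
    # W_I is uniform over LEVELS, so the ACE is the plain average effect.
    diffs = [final_weight(w, 1) - final_weight(w, 0) for w in LEVELS]
    return sum(diffs) / len(diffs)


def simulate(seed=2):
    rng = random.Random(seed)
    rows = []
    for _ in range(N):
        w_i = rng.choice(LEVELS)
        d = 1 if rng.random() < P_ROOM_B[w_i] else 0
        rows.append((w_i, d, final_weight(w_i, d, rng.gauss(0.0, 2.0))))
    return rows


def change_score_estimate(rows):
    gains = {0: [], 1: []}
    for w_i, d, w_f in rows:
        gains[d].append(w_f - w_i)
    return sum(gains[1]) / len(gains[1]) - sum(gains[0]) / len(gains[0])


def stratified_estimate(rows):
    """Sum over strata w of (mean gain in B - mean gain in A) * P(W_I = w)."""
    cells = defaultdict(list)
    for w_i, d, w_f in rows:
        cells[(w_i, d)].append(w_f - w_i)
    total = 0.0
    for w in LEVELS:
        p_w = (len(cells[(w, 0)]) + len(cells[(w, 1)])) / len(rows)
        diff = (sum(cells[(w, 1)]) / len(cells[(w, 1)])
                - sum(cells[(w, 0)]) / len(cells[(w, 0)]))
        total += diff * p_w
    return total


rows = simulate()
print(true_ace())                   # about 2.7 under these assumptions
print(stratified_estimate(rows))    # close to true_ace()
print(change_score_estimate(rows))  # negative
```

Here `stratified_estimate` recovers `true_ace()` up to sampling noise, while the change-score comparison comes out negative. Stratification is used only because ${W}_{I}$ is discrete in this toy model; with a continuous covariate, a flexible outcome model would take its place.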

It is worth re-emphasizing at this point that our analysis relies, of course, upon the assumption of no unobserved confounders. When latent confounders are present, the machinery of *do*-calculus [24, 25] must be invoked to decide whether the target effects are estimable. If they are not, then both statisticians are wrong, neither of the two methods yields an unbiased estimate, and Lord’s despair is perhaps justified: “The usual research study of this type is attempting to answer a question that simply cannot be answered in any rigorous way on the bases of available data.”
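To illustrate this failure case, the sketch below adds a hypothetical unobserved variable `u` that affects both the choice of dining room and the final weight; `u` is generated but withheld from the observed rows. The coefficients and names are illustrative assumptions. Adjusting for ${W}_{I}$ cannot close the back-door path *D* ← U → ${W}_{F}$, so neither the change-score nor the ANCOVA estimate recovers `TAU`.

```python
import random

# Hypothetical model with an UNOBSERVED confounder u (illustrative numbers):
# u raises both the chance of choosing room B and the final weight.
TAU = 5.0
N = 100_000


def simulate(seed=3):
    rng = random.Random(seed)
    rows = []
    for _ in range(N):
        w_i = rng.gauss(60.0, 10.0)
        u = rng.gauss(0.0, 1.0)  # latent: unavailable to either statistician
        d = 1 if w_i + 5.0 * u + rng.gauss(0.0, 5.0) > 60.0 else 0
        w_f = 30.0 + 0.5 * w_i + TAU * d + 6.0 * u + rng.gauss(0.0, 3.0)
        rows.append((w_i, d, w_f))  # u is dropped from the observed data
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def change_score_estimate(rows):
    return (mean([f - w for w, d, f in rows if d == 1])
            - mean([f - w for w, d, f in rows if d == 0]))


def ancova_estimate(rows):
    """OLS coefficient on D in W_F ~ 1 + D + W_I (centered normal equations)."""
    m_w = mean([w for w, _, _ in rows])
    m_d = mean([d for _, d, _ in rows])
    m_f = mean([f for _, _, f in rows])
    s_dd = sum((d - m_d) ** 2 for _, d, _ in rows)
    s_ww = sum((w - m_w) ** 2 for w, _, _ in rows)
    s_dw = sum((d - m_d) * (w - m_w) for w, d, _ in rows)
    s_df = sum((d - m_d) * (f - m_f) for _, d, f in rows)
    s_wf = sum((w - m_w) * (f - m_f) for w, _, f in rows)
    return (s_ww * s_df - s_dw * s_wf) / (s_dd * s_ww - s_dw ** 2)


rows = simulate()
print(change_score_estimate(rows))  # below TAU
print(ancova_estimate(rows))        # above TAU
```

With these settings ANCOVA overshoots `TAU` and the change score undershoots it; the direction and size of each bias depend entirely on the assumed coefficients.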

However, the need to invoke causal assumptions beyond the available data (e.g., no unmeasured confounding) applies to ALL tasks of causal inference in observational studies, so there is nothing special about Lord’s paradox in this respect. The unique challenge that Lord’s paradox presented to the research community was to decide, from rudimentary qualitative features of the model, whether allowance for preexisting differences should be made and, if so, how. We have seen that in the case of Lord’s original story (Figure 1), as well as in the dining-room variant of the story (Figure 3), such a determination could be made using plausible qualitative models, without making any assumptions about the functional form of the relationship between a treatment and its outcomes.

In the first story, both statisticians were right, each aiming at a different effect. In the second story, one was right (ANCOVA) and one was wrong. But in no case did we face a predicament like the one that triggered Lord’s curiosity: two seemingly legitimate methods giving two different answers to the same research question. Lord gave in to the clash and declared surrender. But he shouldn’t have: whether we can estimate a given effect (for a given scenario) is a mathematical question with a yes/no answer, and it should not be shaken by a clash of intuitions.
