A Proof of Theorem 1
For , the estimates are defined by the sample estimating equations:
For , eq. (13) leads to the following influence function representation of :
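In generic notation (the influence function $\psi$ and observations $Z_i$ are written here only for illustration), such a representation takes the form
\[
\sqrt{n}\left(\hat{\theta}-\theta_{0}\right)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\psi(Z_{i})+o_{p}(1), \qquad E\left[\psi(Z_{i})\right]=0,
\]
so that $\sqrt{n}(\hat{\theta}-\theta_{0})\stackrel{d}{\longrightarrow} N\left(0, E[\psi(Z_{i})\psi(Z_{i})^{\prime}]\right)$ by the central limit theorem.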
It can be calculated that the second row of takes the form of
Therefore for ,
Consequently, . Note that
It follows from that
By independence of , so that
We have verified that , where
By the moment conditions ,
By independence between
Therefore since also , it follows that
So . Note that
There is no guarantee of either equality: the moment conditions for do not impose that
and the moment conditions for do not impose
B Proof of Corollary 1
Under the causal model, the parameter . Then
where by definition that
Next consider . It follows from the 4th moment condition
that . Therefore,
and . Therefore
Using , it can then be verified that
In the special case when , then
C Proof of Corollary 2
Replacing , we show that
Then we can write, for ,
It follows from , and the instrumental variable moment equations that
By independence of . Finally, we check that
and so that
We have verified that asymptotically. The same result can also be verified using the counterfactual model as in Corollary 1.
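As a reminder of the mechanics (stated in generic notation with a binary instrument $Z$, treatment $D$, and outcome $Y$, not necessarily the variables of this section), moment equations of the form $E[Y-\alpha-\beta D]=0$ and $E[Z(Y-\alpha-\beta D)]=0$ identify $\beta$ as the Wald ratio
\[
\beta=\frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, D)}=\frac{E[Y \mid Z=1]-E[Y \mid Z=0]}{E[D \mid Z=1]-E[D \mid Z=0]},
\]
which equals the local average treatment effect for compliers under the usual independence and monotonicity assumptions.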
D Proof of Corollary 3
Note , where
Using the sparse structure of , these are then given by
For , this is given in Theorem 1. Theorem 1 also shows the asymptotic variance of as
By the moment condition , this is at least as large as
A similar calculation shows that . Of course, one can also bootstrap.
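A minimal sketch of the bootstrap alternative (generic notation, and assuming that simple i.i.d. resampling is appropriate for the sampling design at hand): draw $B$ resamples of the data, recompute the estimator $\hat{\theta}^{*(b)}$ on each, and use
\[
\hat{V}_{\text{boot}}=\frac{1}{B}\sum_{b=1}^{B}\left(\hat{\theta}^{*(b)}-\bar{\theta}^{*}\right)^{2}, \qquad \bar{\theta}^{*}=\frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^{*(b)},
\]
as the variance estimate.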
E Proof of Proposition 1
Consider first the case of , where
such that for ,
An application of the partitioned matrix inversion formula shows that the solution to eq. (5) is given by, for ,
Substitute the first two equations into the third and simplify to
Since . And then
Up to among the control and treatment groups.
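For reference, the partitioned matrix inversion formula invoked above is the standard block-inverse identity: for conformable blocks with $A$ and the Schur complement $S=D-CA^{-1}B$ invertible,
\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}=\begin{pmatrix} A^{-1}+A^{-1}BS^{-1}CA^{-1} & -A^{-1}BS^{-1} \\ -S^{-1}CA^{-1} & S^{-1} \end{pmatrix}.
\]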
These calculations can be extended to the LATE GMM model in eq. (5), where we now define consistent estimates. Then eqs. (19) and (21) both continue to hold, leading to . The first two equations in eq. (20) now become
Note that, given , this is implied by the estimating equations in eq. (13).
F Proof of Proposition 2
Let . Then the normal equations corresponding to eq. (9) are
Taking a linear combination using ,
We rearrange this into
By the definition in eq. (10),
The normal equations therefore take the form of
Next, consider the normal equations determining the interactive ,
This can be rewritten as
Then eq. (23) can be satisfied through eq. (22) by setting
G Proof of Proposition 3
When , Hahn (1998) shows that , where
We can then use in the proof of Corollary 1 to show that
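In its standard form (stated as a reminder in generic notation, with $p(X)=P(D=1\mid X)$, $\sigma_{d}^{2}(X)=\operatorname{Var}(Y(d)\mid X)$, and $\tau(X)=E[Y(1)-Y(0)\mid X]$), the Hahn (1998) efficiency bound for the ATE $\tau$ under unconfoundedness is
\[
V_{\mathrm{eff}}=E\left[\frac{\sigma_{1}^{2}(X)}{p(X)}+\frac{\sigma_{0}^{2}(X)}{1-p(X)}+\left(\tau(X)-\tau\right)^{2}\right].
\]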
More generally, when , the LATE efficiency bound was calculated in Frölich (2006) and in Hong and Nekipelov (2010, Lemma 1 and Theorem 4), with , and
where . We can rewrite this as
Again comparing this to in the proof of Corollary 1 shows that
The comparison between in eq. (17) can also be understood in the context of doubly robust estimators, which use influence functions of a form similar to ,
For the estimators with influence function , the P-score model is obviously correctly specified. More generally, doubly robust estimators use influence functions of the form
where either or the set of
are correctly specified. Among different misspecified linear approximations to , the least squares projection is more efficient.
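For concreteness, the standard doubly robust (AIPW) influence function for the ATE, written in generic notation with outcome regressions $\mu_{d}(X)=E[Y\mid D=d,X]$ and P-score $p(X)$, is
\[
\psi(Z)=\frac{D\left(Y-\mu_{1}(X)\right)}{p(X)}-\frac{(1-D)\left(Y-\mu_{0}(X)\right)}{1-p(X)}+\mu_{1}(X)-\mu_{0}(X)-\tau.
\]
A direct calculation shows that $E[\psi(Z)]=0$ as long as either $p(\cdot)$ or the pair $(\mu_{0}(\cdot),\mu_{1}(\cdot))$ is correctly specified, which is the double robustness property referred to above.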
Similar to eqs. (3) and (4), can be consistently estimated under suitable regularity conditions (such as those in Newey (1997)) by
and . If we write
then we expect that uniformly in ,
which coincides with the semiparametric asymptotic influence function, and includes the CI model as a special case when .
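A minimal sketch of the series construction behind such estimates, in the generic notation of Newey (1997) (the basis $p^{K}$ and coefficient $\hat{\beta}$ are illustrative): with approximating functions $p^{K}(x)=(p_{1K}(x),\ldots,p_{KK}(x))^{\prime}$ and $P=(p^{K}(X_{1}),\ldots,p^{K}(X_{n}))^{\prime}$, a conditional mean $g(x)=E[Y\mid X=x]$ is estimated by
\[
\hat{g}(x)=p^{K}(x)^{\prime}\hat{\beta}, \qquad \hat{\beta}=(P^{\prime}P)^{-}P^{\prime}Y,
\]
with $K\rightarrow\infty$ slowly with $n$ so that the approximation bias and the estimation variance both vanish at suitable rates.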
H Proof of Proposition 4
Recall that . It can be shown that
where the last line follows from Assumption 8.2. Furthermore,
Next we consider
Each term can be shown to be under Assumptions 8.1 and 8.2.
Then write the first part of the influence function as
First note that the two sums are orthogonal:
We now use arguments similar to those in Lemma B.2 of BCS 2017a to derive the limiting distribution of eq. (25). To characterize the distribution of , define
By construction, . Next define
Using properties of Brownian motion,
Since the two sums are independent, ,
Furthermore, since , by the continuous mapping theorem,
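The standard facts being used here are, in generic notation: a Brownian motion $B(\cdot)$ has independent increments and covariance $E[B(s)B(t)]=\min(s,t)$; and the continuous mapping theorem states that if $X_{n}\Rightarrow X$ and $h$ is continuous almost surely under the law of $X$, then $h(X_{n})\Rightarrow h(X)$.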
For the second term, it suffices to use Assumption 8.2 to show that
Lastly, note that .
As in Section 2, it is straightforward to show that the 2SLS robust variance is consistent for times
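In generic just-identified IV notation (instruments $z_{i}$, regressors $x_{i}$, and 2SLS residuals $\hat{u}_{i}$, all introduced here only for illustration), such a robust variance estimator has the usual sandwich form
\[
\widehat{\operatorname{Avar}}=\left(\frac{1}{n}\sum_{i} z_{i}x_{i}^{\prime}\right)^{-1}\left(\frac{1}{n}\sum_{i} z_{i}z_{i}^{\prime}\hat{u}_{i}^{2}\right)\left(\frac{1}{n}\sum_{i} x_{i}z_{i}^{\prime}\right)^{-1}.
\]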
Independently for each . Using arguments similar to those in Lemma B.3 of BCS 2017a,
The key steps are to use the Almost Sure Representation theorem to construct an almost surely convergent version of .
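For reference, the Almost Sure Representation theorem states that if $X_{n}\Rightarrow X$ on a separable metric space, then there exist $\tilde{X}_{n}\stackrel{d}{=}X_{n}$ and $\tilde{X}\stackrel{d}{=}X$, defined on a common probability space, such that $\tilde{X}_{n}\rightarrow\tilde{X}$ almost surely.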
Also, note that by the weak law of large numbers, for any sequence ,
Therefore, the first and third terms coincide with . The second term converges to
This is larger than .
I Proof of Proposition 5
The sample normal equations for this regression are given by
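In generic notation (the regressors $w_{i}$ and coefficient $\hat{\gamma}$ are placeholders for the variables of this regression), sample normal equations take the form
\[
\frac{1}{n}\sum_{i=1}^{n} w_{i}\left(y_{i}-w_{i}^{\prime}\hat{\gamma}\right)=0, \qquad \text{i.e.}\qquad \hat{\gamma}=\left(\sum_{i} w_{i}w_{i}^{\prime}\right)^{-1}\sum_{i} w_{i}y_{i}.
\]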
We can write ,
Using Assumptions 8.1 and 8.2, we can show that , where
In the following we will show that , which by non-singularity of implies . Then the second row of the relation
implies that, using the above ,
where we recall that . Using similar arguments to those in Proposition 4,
Note that the first two sums in the influence function are orthogonal:
The third sum is orthogonal to the first two sums by the same arguments as in Proposition 4. Therefore, . It is also easy to show, using similar arguments to those in Proposition 4, that the 2SLS nominal variance consistently estimates times
which is larger than .
J Proof of Proposition 6
We choose to work with the representation in eq. (11), using which we write
For the denominator, under Assumption 8, Lemma B.3 of BCS 2017a implies that
Together with ,
In the above
Using Assumption 8, Lemmas B.2 and B.3 of BCS 2017a, and arguments similar to those in Propositions 4 and 5, we can show that eq. (27) can be written as
The first part of this influence function corresponds exactly to the first term in eq. (25). Therefore the same conclusion holds regardless of within the cluster.
In the special case of unconfoundedness, eq. (28) becomes
Using Assumption 8, where
for the regression of on the cluster dummies, and converges to
Therefore is in turn consistent for the variance of
which is asymptotically smaller than the variance of eq. (28) but larger than the variance of its first component. Next we will need to add a consistent estimate of
This is obtained as in Corollary 3, yielding a consistent estimate of the variance of eq. (28).
We can also directly estimate the variance of by estimating the first representation of the influence function in eq. (27). Let ,
Lemma B.3 of BCS 2017a and the continuous mapping theorem imply that consistently estimates the variance of eq. (27).
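A minimal sketch of such a variance estimate, in generic cluster notation (the clusters $g=1,\ldots,G$, scores $s_{g}$, and Hessian estimate $\hat{Q}$ are illustrative): with $s_{g}=\sum_{i\in g} x_{i}\hat{u}_{i}$ and $\hat{Q}=\frac{1}{n}\sum_{i} x_{i}x_{i}^{\prime}$, the cluster-robust sandwich estimator is
\[
\hat{V}=\hat{Q}^{-1}\left(\frac{1}{n}\sum_{g=1}^{G} s_{g}s_{g}^{\prime}\right)\hat{Q}^{-1},
\]
which sums the score contributions within each cluster before forming the outer product.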
K Proof of Proposition 7
This estimator can be implemented using OLS and 2SLS by fully interacting and the regression functions in eq. (12) as . Consider first the OLS case under Assumption 5.
where , for
In the normal equations . Note that
where we can write (1) as
This is obviously more efficient than eq. (29), since , and results in a smaller variance.
Next we generalize the above to LATE. Consider
so that for ,
Since the denominator is , we focus on the numerator, and write
are defined by
and are analogously defined by
Define , where
Then we proceed similarly to the ATE case to write the numerator as
where by noting that
we can write (1) as
Again this ought to be more efficient than eq. (27) since , so that the above equation becomes the efficient influence function in eq. (24) conditional on both the cluster indicators and the extra regressors.