A Proof of Theorem 1
For $k=1,2,3$, let $W_{ik}$ denote the instruments, let $V_{ik}$ and $U_{ik}$ denote the regressors, and let $\theta_k$ denote the parameters of the instrumental variable moment condition $E\left[W_{ik}\left(Y_i-V_{ik}'\theta_{0k}\right)\right]=0$. The estimators $\hat\theta_k$ are defined by the sample estimating equations:
\[
\frac{1}{n}\sum_{i=1}^n W_{ik}\left(Y_i-V_{ik}'\hat\theta_k\right)+\frac{1}{n}\sum_{i=1}^n W_{ik}U_{ik}\,\hat\phi'\left(\bar X-\mu_x\right)=0.\tag{13}
\]
For $\hat\beta_1$, let $U_{i1}=0$, $p_z=P(Z_i=1)$, $p_d=P(D_i=1)$, $W_{i1}=\left(1,\ Z_i-p_z\right)'$, $V_{i1}=\left(1,\ D_i-p_d\right)'$, and $\theta_1=(\alpha,\beta)'$. For $\hat\beta_2$, let $U_{i2}=0$, $W_{i2}=\left(1,\ Z_i-p_z,\ X_i-\mu_x\right)'$, $V_{i2}=\left(1,\ D_i-p_d,\ X_i-\mu_x\right)'$, and $\theta_2=(\alpha,\beta,\eta)'$. For $\hat\beta_3$, let $U_{i3}=Z_i-p_z$, $\theta_3=(\alpha,\beta,\eta,\phi)'$, $W_{i3}=\left(1,\ Z_i-p_z,\ X_i-\mu_x,\ (Z_i-p_z)(X_i-\mu_x)\right)'$, and $V_{i3}=\left(1,\ D_i-p_d,\ X_i-\mu_x,\ (Z_i-p_z)(X_i-\mu_x)\right)'$. Equation (13) leads to the following influence function representation:
\[
\sqrt n\left(\hat\theta_k-\theta_{0k}\right)=\left(E\left[W_{ik}V_{ik}'\right]\right)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\left(W_{ik}\left(Y_i-V_{ik}'\theta_{0k}\right)+E\left[W_{ik}U_{ik}\right]\phi_0'\left(X_i-\mu_x\right)\right)+o_P(1).
\]
It can be calculated that the second row of $\left(E\left[W_{ik}V_{ik}'\right]\right)^{-1}$, $k=1,2,3$, takes the forms
\[
\left(0,\ \mathrm{Cov}(D,Z)^{-1}\right),\qquad \left(0,\ \mathrm{Cov}(D,Z)^{-1},\ 0\right),\qquad \left(0,\ \mathrm{Cov}(D,Z)^{-1},\ 0,\ 0\right).
\]
Therefore, for $k=1,2,3$,
\[
\sqrt n\left(\hat\beta_k-\beta_0\right)=\mathrm{Cov}(Z,D)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\psi_k\left(Y_i,Z_i,X_i,W_{ik},V_{ik},U_{ik}\right)+o_P(1),
\]
where
\[
\psi_k\left(Y_i,Z_i,X_i,W_{ik},U_{ik},V_{ik}\right)=\underbrace{(Z_i-p_z)\left(Y_i-V_{ik}'\theta_{0k}\right)}_{\psi_{ik1}}+\underbrace{E\left[(Z_i-p_z)U_{ik}\right]\phi_0'\left(X_i-\mu_x\right)}_{\psi_{ik2}}\equiv\psi_{ik}.
\]
Consequently, $\sqrt n(\hat\beta_k-\beta_0)\stackrel{d}{\longrightarrow}N\left(0,\ \mathrm{Cov}(D,Z)^{-2}\mathrm{Var}(\psi_{ik})\right)$. It remains to show that for $j=1,2$, $\mathrm{Var}(\psi_{i3})\le\mathrm{Var}(\psi_{ij})$. This can be done by showing that $\mathrm{Cov}\left(\psi_{ij}-\psi_{i3},\psi_{i3}\right)=0$. For this purpose, consider first $j=1$. Note that
\[
\psi_{i1}-\psi_{i3}=(Z_i-p_z)\left(\eta_0'\left(X_i-\mu_x\right)+\phi_0'\left(X_i-\mu_x\right)(Z_i-p_z)\right)-\left(p_z-p_z^2\right)\left(X_i-\mu_x\right)'\phi_0
\]
\[
=\underbrace{(Z_i-p_z)\eta_0'\left(X_i-\mu_x\right)}_{\Delta\psi_{i13,1}}+\underbrace{(Z_i-p_z)^2\phi_0'\left(X_i-\mu_x\right)}_{\Delta\psi_{i13,2}}-\underbrace{\left(p_z-p_z^2\right)\left(X_i-\mu_x\right)'\phi_0}_{\Delta\psi_{i13,3}}.
\]
It follows from $Z_i^2=Z_i$ and $E\left[W_{i3}\left(Y_i-V_{i3}'\theta_0\right)\right]=0$ that
\[
\mathrm{Cov}\left(\psi_{i31},\Delta\psi_{i13,k}\right)=0,\qquad k=1,2,3.
\]
By independence of $Z_i$ from $X_i$, $\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i13,1}\right)=0$. Finally, we check that $\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i13,2}\right)=\left(p_z-p_z^2\right)^2\phi_0'\mathrm{Var}(X)\phi_0$ and $\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i13,3}\right)=\left(p_z-p_z^2\right)^2\phi_0'\mathrm{Var}(X)\phi_0$, so that
\[
\mathrm{Cov}\left(\psi_{i32},\ \Delta\psi_{i13,2}-\Delta\psi_{i13,3}\right)=0.
\]
We have verified that $\mathrm{Cov}\left(\Delta\psi_{i13},\psi_{i3}\right)=0$, so $\hat\beta_3$ is asymptotically more efficient than $\hat\beta_1$. Next turn to $\hat\beta_2$ and $\psi_{i2}=(Z_i-p_z)\left(Y_i-\alpha_0-\beta_0(D_i-p_d)-\eta_0'(X_i-\mu_x)\right)$. We want to show $\mathrm{Var}(\psi_{i3})\le\mathrm{Var}(\psi_{i2})$ by verifying that $\mathrm{Cov}\left(\Delta\psi_{i23},\psi_{i3}\right)=0$, where
\[
\Delta\psi_{i23}=(Z_i-p_z)^2\phi_0'\left(X_i-\mu_x\right)-\left(p_z-p_z^2\right)\phi_0'\left(X_i-\mu_x\right)
\]
\[
=\underbrace{\left((1-2p_z)Z_i+p_z^2\right)\phi_0'\left(X_i-\mu_x\right)}_{\Delta\psi_{i23,1}}-\underbrace{\left(p_z-p_z^2\right)\phi_0'\left(X_i-\mu_x\right)}_{\Delta\psi_{i23,2}}.
\]
By the moment conditions $E\left[W_{i3}\left(Y_i-V_{i3}'\theta_0\right)\right]=0$,
\[
\mathrm{Cov}\left(\psi_{i31},\Delta\psi_{i23,k}\right)=0,\qquad k=1,2.
\]
By independence between $Z$ and $X$,
\[
\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i23,1}\right)=\left(p_z-p_z^2\right)^2\phi_0'\mathrm{Var}(X)\phi_0.
\]
Therefore, since also $\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i23,2}\right)=\left(p_z-p_z^2\right)^2\phi_0'\mathrm{Var}(X)\phi_0$, it follows that
\[
\mathrm{Cov}\left(\psi_{i32},\ \Delta\psi_{i23,1}-\Delta\psi_{i23,2}\right)=0.
\]
So $\mathrm{Cov}\left(\Delta\psi_{i23},\psi_{i3}\right)=0$ and $\mathrm{Var}(\psi_{i3})\le\mathrm{Var}(\psi_{i2})$: $\hat\beta_3$ is also more efficient than $\hat\beta_2$. However, there is no efficiency ranking between $\hat\beta_1$ and $\hat\beta_2$. Note that
\[
\Delta\psi_{i12}\equiv\psi_{i1}-\psi_{i2}=(Z_i-p_z)\eta_0'\left(X_i-\mu_x\right).
\]
There is no guarantee of either $\mathrm{Cov}\left(\Delta\psi_{i12},\psi_{i2}\right)=0$ or $\mathrm{Cov}\left(\Delta\psi_{i12},\psi_{i1}\right)=0$. This is because the moment conditions for $\hat\beta_2$ do not impose that
\[
E\left[ZX\left(Y-\alpha_0-\beta_0D-\eta_0'X\right)\right]=0,
\]
and the moment conditions for $\hat\beta_1$ do not impose
\[
E\left[ZX\left(Y-\alpha_0-\beta_0D\right)\right]=0\qquad\text{or}\qquad E\left[X\left(Y-\alpha_0-\beta_0D\right)\right]=0.
\]
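The variance ranking of Theorem 1 can be illustrated numerically. The following Monte Carlo sketch is not part of the proof; the design, coefficients, and sample sizes are illustrative assumptions. It uses the $D=Z$ special case, where $\hat\beta_1$ is the difference in means, $\hat\beta_2$ adds the centered covariate, and $\hat\beta_3$ also adds its interaction with $Z$:

```python
# Illustrative check of Theorem 1 (assumed DGP): with heterogeneous slopes
# across arms (phi_0 != 0) and pz != 1/2, beta_hat_3 should have the smallest
# variance among the three estimators.
import numpy as np

rng = np.random.default_rng(0)
n, reps, pz = 400, 2000, 0.3
b1, b2, b3 = [], [], []
for _ in range(reps):
    Z = rng.binomial(1, pz, n)
    X = rng.normal(size=n)
    # slopes differ across arms, so the interaction carries information
    Y = 1.0 + Z + (1.0 + 2.0 * Z) * X + rng.normal(size=n)
    Xc = X - X.mean()
    # beta_hat_1: difference in means
    b1.append(Y[Z == 1].mean() - Y[Z == 0].mean())
    # beta_hat_2: OLS of Y on (1, Z, X - Xbar)
    M2 = np.column_stack([np.ones(n), Z, Xc])
    b2.append(np.linalg.lstsq(M2, Y, rcond=None)[0][1])
    # beta_hat_3: OLS of Y on (1, Z, X - Xbar, Z * (X - Xbar))
    M3 = np.column_stack([np.ones(n), Z, Xc, Z * Xc])
    b3.append(np.linalg.lstsq(M3, Y, rcond=None)[0][1])

v1, v2, v3 = np.var(b1), np.var(b2), np.var(b3)
print(v1, v2, v3)
```

All three estimators are consistent for the average effect (here equal to one); only their variances differ.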
B Proof of Corollary 1
Under the causal model, the parameter $\beta_0$ and the influence functions for $\hat\beta$ can be written using the counterfactuals. Recall that $\beta_0=E\left[Y_1-Y_0\mid D_1>D_0\right]=E\left[Y_1^*-Y_0^*\right]/E\left[D_1-D_0\right]$. Define $t_1=Y_1^*-\beta_0D_1$ and $t_0=Y_0^*-\beta_0D_0$. Then
\[
\alpha_0-\beta_0p_d=E[Y]-\beta_0E[D]=p_zE[t_1]+(1-p_z)E[t_0],
\]
\[
\psi_1=(Z-p_z)\left(Y-\alpha_0-\beta_0(D-p_d)\right)=(Z-p_z)\left[(1-p_z)\left(t_1-Et_1\right)+p_z\left(t_0-Et_0\right)\right]+p_z(1-p_z)\left(t_1-t_0\right),
\]
where by definition $E[t_1]-E[t_0]=0$. Next consider $\hat\beta_2$. It follows from the third moment equation $E\left[(X-\mu_x)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)\right)\right]=0$ that
\[
\eta_0=\mathrm{Var}(X)^{-1}\mathrm{Cov}\left(X,Y-\beta_0D\right)=\mathrm{Var}(X)^{-1}\left(p_z\mathrm{Cov}(X,t_1)+(1-p_z)\mathrm{Cov}(X,t_0)\right),
\]
and that
\[
\psi_2=(Z-p_z)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)\right)=(Z-p_z)\left[(1-p_z)\left(t_1-Et_1\right)+p_z\left(t_0-Et_0\right)-\eta_0'(X-\mu_x)\right]+p_z(1-p_z)\left(t_1-t_0\right).
\]
Next consider $\hat\beta_3$. It follows from the fourth moment condition
\[
E\left[(Z-p_z)(X-\mu_x)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)-\phi_0'(Z-p_z)(X-\mu_x)\right)\right]=0
\]
that $\phi_0=\mathrm{Var}(X)^{-1}\mathrm{Cov}\left(X,t_1-t_0\right)$. Therefore,
\[
\begin{aligned}
\psi_{31}&=(Z-p_z)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)-\phi_0'(Z-p_z)(X-\mu_x)\right)\\
&=(Z-p_z)\Big[(1-p_z)\left(t_1-Et_1-\mathrm{Cov}(t_1,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)+p_z\left(t_0-Et_0-\mathrm{Cov}(t_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\Big]\\
&\quad+p_z(1-p_z)\left(t_1-t_0-\mathrm{Cov}(t_1-t_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)
\end{aligned}
\]
and $\psi_{32}=p_z(1-p_z)\mathrm{Cov}\left(t_1-t_0,X\right)\mathrm{Var}(X)^{-1}(X-\mu_x)$. Therefore
\[
\psi_3=\psi_{31}+\psi_{32}=(Z-p_z)\Big[(1-p_z)\left(t_1-Et_1-\mathrm{Cov}(t_1,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)+p_z\left(t_0-Et_0-\mathrm{Cov}(t_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\Big]+p_z(1-p_z)\left(t_1-t_0\right).
\]
Using $Z\perp\left(t_1,t_0,X\right)$, it can then be verified that
\[
\mathrm{Cov}\left(\psi_1-\psi_3,\psi_3\right)=0\qquad\text{and}\qquad \mathrm{Cov}\left(\psi_2-\psi_3,\psi_3\right)=0.
\]
In the special case when $D=Z$, so that $t_1=Y_1-\beta_0$, $t_0=Y_0$, and $\beta_0=E\left[Y_1-Y_0\right]$,
\[
\begin{aligned}
\psi_3&=(Z-p_z)\Big[(1-p_z)\left(Y_1-EY_1-\mathrm{Cov}(Y_1,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)+p_z\left(Y_0-EY_0-\mathrm{Cov}(Y_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\Big]\\
&\quad+p_z(1-p_z)\left(Y_1-Y_0-\beta_0\right),\\
\psi_2&=(Z-p_z)\Big[(1-p_z)\left(Y_1-EY_1\right)+p_z\left(Y_0-EY_0\right)-\eta_0'(X-\mu_x)\Big]+p_z(1-p_z)\left(Y_1-Y_0-\beta_0\right),
\end{aligned}\tag{17}
\]
for $\eta_0=\mathrm{Var}(X)^{-1}\left(p_z\mathrm{Cov}(X,Y_1)+(1-p_z)\mathrm{Cov}(X,Y_0)\right)$.
C Proof of Corollary 2
Replace $\mu_x$ by $\mu_{xs}=E\left[X_s\right]$. Then it can be shown that $\hat\beta_3$ is more efficient than $\hat\beta_4$. Calculations similar to those for $\hat\beta_3$ show that
\[
\sqrt n\left(\hat\beta_4-\beta_0\right)=\mathrm{Cov}(D,Z)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\psi_{i4}+o_P(1),
\]
where
\[
\psi_{i4}=(Z_i-p_z)\left(Y_i-\rho_0-\beta_0(D_i-p_d)-\eta_{0s}'\left(X_{si}-\mu_{xs}\right)-\phi_{0s}'\left(X_{si}-\mu_{xs}\right)(Z_i-p_z)\right)+\left(p_z-p_z^2\right)\phi_{0s}'\left(X_{si}-\mu_{xs}\right).
\]
Then we can write, for $\bar\eta_0,\bar\phi_0$ possibly different from both $\eta_0,\phi_0$ and $\eta_{0s},\phi_{0s}$,
\[
\begin{aligned}
\Delta\psi_{i43}=\psi_{i4}-\psi_{i3}&=(Z_i-p_z)\left[\eta_0'\left(X_i-\mu_x\right)-\eta_{0s}'\left(X_{si}-\mu_{xs}\right)+\left(\phi_0'\left(X_i-\mu_x\right)-\phi_{0s}'\left(X_{si}-\mu_{xs}\right)\right)(Z_i-p_z)\right]\\
&\quad+\left(p_z-p_z^2\right)\left(X_{si}-\mu_{xs}\right)'\phi_{0s}-\left(p_z-p_z^2\right)\left(X_i-\mu_x\right)'\phi_0\\
&=(Z_i-p_z)\bar\eta_0'\left(X_i-\mu_x\right)+\bar\phi_0'\left(X_i-\mu_x\right)(Z_i-p_z)^2-\left(p_z-p_z^2\right)\left(X_i-\mu_x\right)'\bar\phi_0\\
&=\underbrace{(Z_i-p_z)\bar\eta_0'\left(X_i-\mu_x\right)}_{\Delta\psi_{i43,1}}+\underbrace{(Z_i-p_z)^2\bar\phi_0'\left(X_i-\mu_x\right)}_{\Delta\psi_{i43,2}}-\underbrace{\left(p_z-p_z^2\right)\left(X_i-\mu_x\right)'\bar\phi_0}_{\Delta\psi_{i43,3}}.
\end{aligned}
\]
It follows from $Z_i^2=Z_i$ and the instrumental variable moment equations that
\[
\mathrm{Cov}\left(\psi_{i31},\Delta\psi_{i43,k}\right)=0,\qquad k=1,2,3.
\]
By independence of $Z_i$ and $X_i$, $\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i43,1}\right)=0$. Finally, we check that
\[
\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i43,2}\right)=\left(p_z-p_z^2\right)^2\phi_0'\mathrm{Var}(X)\bar\phi_0
\]
and $\mathrm{Cov}\left(\psi_{i32},\Delta\psi_{i43,3}\right)=\left(p_z-p_z^2\right)^2\phi_0'\mathrm{Var}(X)\bar\phi_0$, so that
\[
\mathrm{Cov}\left(\psi_{i32},\ \Delta\psi_{i43,2}-\Delta\psi_{i43,3}\right)=0.
\]
We have verified that $\mathrm{Cov}\left(\Delta\psi_{i43},\psi_{i3}\right)=0$, so $\hat\beta_3$ is asymptotically more efficient than $\hat\beta_4$. The same result can also be verified using the counterfactual model as in Corollary 1.
D Proof of Corollary 3
Note $\hat A_k^{-1}\hat B_k\hat A_k^{-1}\stackrel{p}{\longrightarrow}\mathrm{Var}\left(\Lambda_kE\left[W_{ik}V_{ik}'\right]^{-1}W_{ik}\left(Y_i-V_{ik}'\theta_{0k}\right)\right)$, where
\[
\Lambda_1=\begin{pmatrix}1&-p_d\\0&1\end{pmatrix},\qquad
\Lambda_2=\begin{pmatrix}1&-p_d&-\mu_x\\0&1&0\\0&0&1\end{pmatrix},\qquad
\Lambda_3=\begin{pmatrix}1&-p_d&-\mu_x&p_z\mu_x\\0&1&0&0\\0&0&1&-p_z\\0&0&0&1\end{pmatrix}.
\]
Using the sparse structure of $E\left[W_{ik}V_{ik}'\right]$, the $(2,2)$ elements of $A_k^{-1}B_kA_k^{-1}$ are then given by
\[
\mathrm{Var}\left(\mathrm{Cov}(Z_i,D_i)^{-1}(Z_i-p_z)\left(Y_i-V_{ik}'\theta_{0k}\right)\right).
\]
For $k=1,2$, this coincides with the asymptotic variance $\sigma_k^2$ in Theorem 1. Theorem 1 also gives the asymptotic variance of $\hat\beta_3$ as
\[
\sigma_3^2=\mathrm{Var}\left(\mathrm{Cov}(Z_i,D_i)^{-1}\left[(Z_i-p_z)\left(Y_i-V_{ik}'\theta_{0k}\right)+p_z(1-p_z)\phi_0'\left(X_i-\mu_x\right)\right]\right).
\]
By the moment condition $E\left[(Z_i-p_z)(X_i-\mu_x)\left(Y_i-V_{ik}'\theta_{0k}\right)\right]=0$, $\sigma_3^2$ is at least as large as
\[
\mathrm{plim}\,\hat\sigma_3^2=\mathrm{Var}\left(\mathrm{Cov}(Z_i,D_i)^{-1}(Z_i-p_z)\left(Y_i-V_{ik}'\theta_{0k}\right)\right).\tag{18}
\]
A similar calculation shows that $\bar\sigma_3^2\stackrel{p}{\longrightarrow}\sigma_3^2$. Of course one can also bootstrap.
E Proof of Proposition 1
Consider first the case of $D=Z$. For $\alpha+\beta=E[Y\mid Z=1]$, $\alpha=E[Y\mid Z=0]$, and $\mu_x=E[X]$, the moment conditions are $E\left[\phi_i\left(\alpha_0,\beta_0,\mu_{0x}\right)\right]=0$, where
\[
\phi_i\left(\alpha,\beta,\mu_x\right)'=\left(Z_i\left(Y_i-\alpha-\beta\right),\ Z_i\left(X_i-\mu_x\right),\ (1-Z_i)\left(Y_i-\alpha\right),\ (1-Z_i)\left(X_i-\mu_x\right)\right),
\]
such that for $A_{11}=\mathrm{Var}\left(Y_i\mid Z_i=1\right)$, $A_{12}=\mathrm{Cov}\left(Y_i,X_i\mid Z_i=1\right)=A_{21}'$, $B_{11}=\mathrm{Var}\left(Y_i\mid Z_i=0\right)$, $B_{12}=\mathrm{Cov}\left(Y_i,X_i\mid Z_i=0\right)=B_{21}'$, and $A_{22}=B_{22}=\mathrm{Var}\left(X_i\right)$,
\[
\mathrm{Var}\left(\phi_i(\cdot)\right)=\begin{pmatrix}p_zA&0\\0&(1-p_z)B\end{pmatrix},\qquad A=\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix},\qquad B=\begin{pmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{pmatrix}.\tag{19}
\]
Then $\widehat{\mathrm{Var}}\left(\phi_i(\cdot)\right)$ is similar to $\mathrm{Var}\left(\phi_i(\cdot)\right)$ with $p_z,A,B$ replaced by $\hat p_z,\hat A,\hat B$.
An application of the partitioned matrix inversion formula shows that the solution to eq. (5) is given by, for $F_2=\left(A_{22}-A_{21}A_{11}^{-1}A_{12}\right)^{-1}$ and $G_2=\left(B_{22}-B_{21}B_{11}^{-1}B_{12}\right)^{-1}$,
\[
\begin{aligned}
&\frac1n\sum_{i=1}^nZ_i\left(Y_i-\alpha-\beta\right)-\hat A_{12}\hat A_{22}^{-1}\frac1n\sum_{i=1}^nZ_i\left(X_i-\mu_x\right)=0,\\
&\frac1n\sum_{i=1}^n(1-Z_i)\left(Y_i-\alpha\right)-\hat B_{12}\hat B_{22}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)\left(X_i-\mu_x\right)=0,\\
&-\hat F_2\hat A_{21}\hat A_{11}^{-1}\frac1n\sum_{i=1}^nZ_i\left(Y_i-\alpha-\beta\right)+\hat F_2\frac1n\sum_{i=1}^nZ_i\left(X_i-\mu_x\right)\\
&\qquad-\hat G_2\hat B_{21}\hat B_{11}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)\left(Y_i-\alpha\right)+\hat G_2\frac1n\sum_{i=1}^n(1-Z_i)\left(X_i-\mu_x\right)=0.
\end{aligned}\tag{20}
\]
Substituting the first two equations into the third and simplifying gives
\[
\hat A_{22}^{-1}\frac1n\sum_{i=1}^nZ_i\left(X_i-\mu_x\right)+\hat B_{22}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)\left(X_i-\mu_x\right)=0.\tag{21}
\]
Since $\hat A_{22}=\mathrm{Var}\left(X_i\right)+O_P\left(\frac1{\sqrt n}\right)=\hat B_{22}+O_P\left(\frac1{\sqrt n}\right)$, this can be used to show that $\hat\mu_x=\bar X+o_P\left(\frac1{\sqrt n}\right)$. And then
\[
\begin{aligned}
\hat\alpha+\hat\beta&=\left[\sum_{i=1}^nZ_iY_i-\hat A_{12}\hat A_{22}^{-1}\sum_{i=1}^nZ_i\left(X_i-\hat\mu_x\right)\right]\Big/\sum_{i=1}^nZ_i+o_P\left(\tfrac1{\sqrt n}\right),\\
\hat\alpha&=\left[\sum_{i=1}^n(1-Z_i)Y_i-\hat B_{12}\hat B_{22}^{-1}\sum_{i=1}^n(1-Z_i)\left(X_i-\hat\mu_x\right)\right]\Big/\sum_{i=1}^n(1-Z_i)+o_P\left(\tfrac1{\sqrt n}\right).
\end{aligned}
\]
Up to $o_P\left(\frac1{\sqrt n}\right)$ terms, these are the intercept terms in separate regressions of $Y_i$ on $X_i-\bar X$ within the treatment and control groups, respectively.
These calculations can be extended to the LATE GMM model in eq. (5), where we now define $A_{11}=\mathrm{Var}\left(Y-\alpha_0-\beta_0D\mid Z=1\right)$, $A_{12}=\mathrm{Cov}\left(Y-\alpha_0-\beta_0D,X\mid Z=1\right)=A_{21}'$, $B_{11}=\mathrm{Var}\left(Y-\alpha_0-\beta_0D\mid Z=0\right)$, $B_{12}=\mathrm{Cov}\left(Y-\alpha_0-\beta_0D,X\mid Z=0\right)=B_{21}'$, and $A_{22}=B_{22}=\mathrm{Var}(X)$, and let $\hat A_{jk},\hat B_{jk}$ denote their $\sqrt n$-consistent estimates. Then eqs. (19) and (21) both continue to hold, leading to $\hat\mu_x=\bar X+o_P\left(\frac1{\sqrt n}\right)$. The first two equations in eq. (20) now become
\[
\frac1n\sum_{i=1}^nZ_i\left(Y_i-\alpha-\beta D_i\right)-\hat A_{12}\hat A_{22}^{-1}\frac1n\sum_{i=1}^nZ_i\left(X_i-\mu_x\right)=0,\qquad
\frac1n\sum_{i=1}^n(1-Z_i)\left(Y_i-\alpha-\beta D_i\right)-\hat B_{12}\hat B_{22}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)\left(X_i-\mu_x\right)=0.
\]
Note that given $\alpha$ and $\beta$, $\hat A_{12}\hat A_{22}^{-1}$ and $\hat B_{12}\hat B_{22}^{-1}$ are precisely the profiled $\hat\phi$ and $\hat\eta$ implied by the estimating eq. (13) for $\beta_3$. In other words, the above two equations are the concentrated estimating equations for $\alpha$ and $\beta$ implied by eq. (13).
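The equivalence established above can be illustrated numerically. The sketch below (an assumed $D=Z$ design, not part of the proof) compares the eq. (20)-style adjusted treated-group mean, built from the pooled $\hat A_{22}$ and the treated-group $\hat A_{12}$, with the intercept of a separate treated-group regression of $Y_i$ on $X_i-\bar X$; the two agree up to higher-order terms:

```python
# Illustrative check of Proposition 1's characterization (assumed DGP):
# the GMM-implied adjusted mean and the separate-regression intercept
# differ only by O_P(1/n).
import numpy as np

rng = np.random.default_rng(1)
n = 5000
Z = rng.binomial(1, 0.5, n)
X = rng.normal(size=n)
Y = 2.0 + Z + X + rng.normal(size=n)

Xbar = X.mean()
X1, Y1 = X[Z == 1], Y[Z == 1]
A12 = np.cov(Y1, X1)[0, 1]   # sample Cov(Y, X | Z = 1)
A22 = X.var(ddof=1)          # pooled sample Var(X), as in A22 = B22
gmm_mean = (Y1.sum() - A12 / A22 * (X1 - Xbar).sum()) / len(Y1)

# intercept of the treated-group OLS of Y on (1, X - Xbar)
M = np.column_stack([np.ones(len(Y1)), X1 - Xbar])
ols_intercept = np.linalg.lstsq(M, Y1, rcond=None)[0][0]
print(gmm_mean, ols_intercept)
```

Both quantities estimate $\alpha_0+\beta_0=3$ in this design; their discrepancy comes only from using the pooled rather than the treated-group variance of $X$.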
F Proof of Proposition 2
Let $W_{1i}=\left(Z_i,\ Z_iV_i^T\right)^T$ and $W_{0i}=\left(1-Z_i,\ (1-Z_i)V_i^T\right)^T$. Then the normal equations corresponding to eq. (9) are
\[
\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\gamma_1-\hat\vartheta_1'V_i\right)=0,\qquad \frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\gamma_0-\hat\vartheta_0'V_i\right)=0,
\]
\[
\frac1n\sum_{i=1}^nW_{1i}\left(D_i-\hat\tau_1-\hat\zeta_1'V_i\right)=0,\qquad \frac1n\sum_{i=1}^nW_{0i}\left(D_i-\hat\tau_0-\hat\zeta_0'V_i\right)=0.
\]
Taking a linear combination using $\hat\beta_{AL}=\widehat{\mathrm{AvgLATE}}$,
\[
\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\gamma_1-\hat\vartheta_1'V_i-\hat\beta_{AL}\left(D_i-\hat\tau_1-\hat\zeta_1'V_i\right)\right)=0,\qquad
\frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\gamma_0-\hat\vartheta_0'V_i-\hat\beta_{AL}\left(D_i-\hat\tau_0-\hat\zeta_0'V_i\right)\right)=0.
\]
We rearrange this into
\[
\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\left(\hat\gamma_1-\hat\beta_{AL}\hat\tau_1-\left(\hat\beta_{AL}\hat\zeta_1-\hat\vartheta_1\right)'\bar V\right)-\hat\beta_{AL}D_i+\left(\hat\beta_{AL}\hat\zeta_1-\hat\vartheta_1\right)'\left(V_i-\bar V\right)\right)=0,
\]
\[
\frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\left(\hat\gamma_0-\hat\beta_{AL}\hat\tau_0-\left(\hat\beta_{AL}\hat\zeta_0-\hat\vartheta_0\right)'\bar V\right)-\hat\beta_{AL}D_i+\left(\hat\beta_{AL}\hat\zeta_0-\hat\vartheta_0\right)'\left(V_i-\bar V\right)\right)=0.
\]
By the definition in eq. (10),
\[
\hat\nu=\hat\gamma_1-\hat\beta_{AL}\hat\tau_1-\left(\hat\beta_{AL}\hat\zeta_1-\hat\vartheta_1\right)'\bar V=\hat\gamma_0-\hat\beta_{AL}\hat\tau_0-\left(\hat\beta_{AL}\hat\zeta_0-\hat\vartheta_0\right)'\bar V.
\]
The normal equations therefore take the form of
\[
\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\nu-\hat\beta_{AL}D_i-\left(\hat\vartheta_1-\hat\beta_{AL}\hat\zeta_1\right)'\left(V_i-\bar V\right)\right)=0,\qquad
\frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\nu-\hat\beta_{AL}D_i-\left(\hat\vartheta_0-\hat\beta_{AL}\hat\zeta_0\right)'\left(V_i-\bar V\right)\right)=0.\tag{22}
\]
Next, consider the normal equations determining the interactive $\hat\beta_\infty$. For $W_i=\left(W_{1i}^T,W_{0i}^T\right)^T$,
\[
\frac1n\sum_{i=1}^nW_i\left(Y_i-\hat\alpha-\hat\beta_\infty D_i-\hat\eta'\left(V_i-\bar V\right)-\hat\phi'Z_i\left(V_i-\bar V\right)\right)=0.
\]
This can be rewritten as
\[
\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\alpha-\hat\beta_\infty D_i-\left(\hat\eta+\hat\phi\right)'\left(V_i-\bar V\right)\right)=0,\qquad
\frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\alpha-\hat\beta_\infty D_i-\hat\eta'\left(V_i-\bar V\right)\right)=0.\tag{23}
\]
Then eq. (23) can be satisfied through eq. (22) by setting
\[
\hat\alpha=\hat\nu,\qquad \hat\beta_\infty=\hat\beta_{AL},\qquad \hat\eta=\hat\vartheta_0-\hat\beta_{AL}\hat\zeta_0,\qquad \hat\phi=\hat\vartheta_1-\hat\beta_{AL}\hat\zeta_1-\hat\eta.
\]
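The matching argument can be seen concretely in the $D=Z$ special case with a single binary $V$ (the data-generating design below is an illustrative assumption): the coefficient on $Z$ in the fully interacted regression with centered $V$ equals, exactly, the sample-share weighted average of the within-stratum differences in means:

```python
# Illustrative check of Proposition 2's conclusion in the saturated D = Z case:
# the interacted OLS coefficient on Z coincides (exactly, not just
# asymptotically) with the share-weighted average of stratum differences.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
V = rng.binomial(1, 0.4, n)
Z = rng.binomial(1, 0.5, n)
Y = 1.0 + (1.0 + V) * Z + 2.0 * V + rng.normal(size=n)

Vc = V - V.mean()
M = np.column_stack([np.ones(n), Z, Vc, Z * Vc])
beta_inter = np.linalg.lstsq(M, Y, rcond=None)[0][1]

# share-weighted average of within-stratum differences in means
diffs = [Y[(V == v) & (Z == 1)].mean() - Y[(V == v) & (Z == 0)].mean()
         for v in (0, 1)]
shares = [(V == v).mean() for v in (0, 1)]
avg_effect = shares[0] * diffs[0] + shares[1] * diffs[1]
print(beta_inter, avg_effect)
```

Because the regression is saturated in the $(Z,V)$ cells, the fitted values are the cell means and the identity holds to floating-point precision.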
G Proof of Proposition 3
When $D=Z$, Hahn (1998) shows that $\sigma_\infty^2=\mathrm{Var}\left(\psi_\infty\right)$, where
\[
\begin{aligned}
\psi_\infty&=\frac{D}{p}\left(Y_1-E[Y_1|X]\right)-\frac{1-D}{1-p}\left(Y_0-E[Y_0|X]\right)+E[Y_1-Y_0|X]-E[Y_1-Y_0]\\
&=(D-p)\left(\frac{Y_1-E[Y_1|X]}{p}+\frac{Y_0-E[Y_0|X]}{1-p}\right)+Y_1-Y_0-E[Y_1-Y_0].
\end{aligned}
\]
We can then use $\psi_3$ in the proof of Corollary 1 to show that
\[
\mathrm{Cov}\left(\frac{\psi_3}{p_z(1-p_z)}-\psi_\infty,\ \psi_\infty\right)=0.
\]
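This orthogonality can be checked directly by simulation. The sketch below (an assumed $D=Z$ design with quadratic conditional means, so that the linear projections entering $\psi_3$ are genuinely misspecified) computes $\psi_3$ and $\psi_\infty$ from their population formulas on a large sample:

```python
# Illustrative check (assumed DGP): Cov(psi_3/(pz(1-pz)) - psi_inf, psi_inf) = 0
# even when the linear projections in psi_3 differ from the true conditional
# means, which implies Var(psi_3/(pz(1-pz))) >= Var(psi_inf).
import numpy as np

rng = np.random.default_rng(3)
n, pz = 500_000, 0.4
Z = rng.binomial(1, pz, n)
X = rng.normal(size=n)
e1, e0 = rng.normal(size=n), rng.normal(size=n)
g1, g0 = 1.0 + X + (X**2 - 1.0), 2.0 * X   # true conditional means (nonlinear)
Y1, Y0 = g1 + e1, g0 + e0
beta0 = 1.0                                 # E[Y1 - Y0] in this design

# linear projections on X: mu_x = 0, Var(X) = 1, Cov(Y1, X) = 1, Cov(Y0, X) = 2
lin1, lin0 = 1.0 + X, 2.0 * X
psi3 = ((Z - pz) * ((1 - pz) * (Y1 - lin1) + pz * (Y0 - lin0))
        + pz * (1 - pz) * (Y1 - Y0 - beta0))
psi_inf = ((Z - pz) * ((Y1 - g1) / pz + (Y0 - g0) / (1 - pz))
           + Y1 - Y0 - beta0)

d = psi3 / (pz * (1 - pz)) - psi_inf
print(np.cov(d, psi_inf)[0, 1], np.var(psi_inf))
```

The sample covariance is close to zero up to simulation noise, while the variance gap reflects the efficiency loss from linear projection relative to the true conditional means.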
More generally, when $Z\neq D$, the LATE efficiency bound was calculated in Frolich (2006) and Hong and Nekipelov (2010) (Lemma 1 and Theorem 4), with $\sigma_\infty^2=\mathrm{Var}\left(\psi_\infty\right)$ and
\[
\begin{aligned}
\psi_\infty=\frac{1}{P\left(D_1>D_0\right)}\Big\{&\frac{Z}{p_z}\left(Y-E[Y|Z=1,X]\right)+E[Y|Z=1,X]-\frac{1-Z}{1-p_z}\left(Y-E[Y|Z=0,X]\right)-E[Y|Z=0,X]\\
&-\Big(\frac{Z}{p_z}\left(D-E[D|Z=1,X]\right)+E[D|Z=1,X]-\frac{1-Z}{1-p_z}\left(D-E[D|Z=0,X]\right)-E[D|Z=0,X]\Big)\beta\Big\},
\end{aligned}
\]
where $P\left(D_1>D_0\right)=P(D=1|Z=1)-P(D=1|Z=0)$. We can rewrite this as
\[
P\left(D_1>D_0\right)\psi_\infty=\frac{Z}{p_z}\left(t_1-E[t_1|X]\right)-\frac{1-Z}{1-p_z}\left(t_0-E[t_0|X]\right)+E[t_1-t_0|X]
=(Z-p_z)\left\{\frac{t_1-E[t_1|X]}{p_z}+\frac{t_0-E[t_0|X]}{1-p_z}\right\}+t_1-t_0.\tag{24}
\]
Again, comparing this to $\psi_3$ in the proof of Corollary 1 shows that
\[
\mathrm{Cov}\left(\frac{\psi_3}{P\left(D_1>D_0\right)p_z(1-p_z)}-\psi_\infty,\ \psi_\infty\right)=0.
\]
The comparison between $\psi_2$ and $\psi_3$ in eq. (17) can also be understood in the context of doubly robust estimators, which use influence functions of a form similar to $\psi_\infty$ but without requiring $p_z$ to be constant. Define $Q(X)\equiv P(Z=1|X)$. In the case of $D=Z$,
\[
\phi_\infty=\frac{D}{Q(X)}\left(Y-E[Y_1|X]\right)-\frac{1-D}{1-Q(X)}\left(Y-E[Y_0|X]\right)+E[\Delta Y|X]-E[\Delta Y]
=\left(D-Q(X)\right)\left(\frac{Y_1-E[Y_1|X]}{Q(X)}+\frac{Y_0-E[Y_0|X]}{1-Q(X)}\right)+Y_1-Y_0-\beta.
\]
The estimators with influence function $\phi_\infty$ are consistent as long as either $Q(X)$ or the pair $E[Y_1|X],E[Y_0|X]$ is correctly specified. Under complete randomization and with $Q(X)$ specified as $p_z$, the propensity score model is obviously correctly specified. Therefore $E[Y_1|X]$ and $E[Y_0|X]$, being linear projections on $\left(1,V(X)\right)$, have no effect on consistency. However, between two misspecified conditional mean models, the first pair in $\psi_3$ is a more efficient projection that induces a smaller variance than the linear projection in $\psi_2$. Similarly, in the general LATE case when $D\neq Z$, doubly robust estimators use influence functions of the form
\[
\phi_\infty=\left(Z-\tilde Q(X)\right)\left(\frac{t_1-E[t_1|X]}{Q(X)}+\frac{t_0-E[t_0|X]}{1-Q(X)}\right)+t_1-t_0,
\]
where $E[t_1|X]=E[Y_1^*|X]-\beta_0E[D_1|X]$ and $E[t_0|X]=E[Y_0^*|X]-\beta_0E[D_0|X]$. These estimators are consistent as long as either $Q(X)$ or the set of
\[
E[Y_1^*|X],\quad E[Y_0^*|X],\quad E[D_1|X],\quad E[D_0|X]
\]
is correctly specified. Among different misspecified linear approximations to $E[t_1|X]$ and $E[t_0|X]$, the least squares projection is more efficient.
Similar to eqs. (3) and (4), $\sigma_\infty^2$ can be consistently estimated under suitable regularity conditions (such as those in Newey (1997)) by
\[
\bar\sigma_\infty^2=\widehat{\mathrm{Cov}}(Z,D)^{-2}\frac1n\sum_{i=1}^n\bar\epsilon_{i\infty}^2,\qquad\text{where}\quad \bar\epsilon_{i\infty}=\left(Z_i-\hat p_z\right)\hat\epsilon_{i\infty}+\hat p_z\left(1-\hat p_z\right)\hat\phi'\left(V_i-\bar V\right)
\]
and $\hat\epsilon_{i\infty}=Y_i-\hat\alpha-\hat\beta_\infty D_i-\hat\eta'\left(V_i-\bar V\right)-\hat\phi'Z_i\left(V_i-\bar V\right)$. If we write
\[
Y_i-\hat\beta_\infty D_i=(1-Z_i)\left(\hat\alpha+\hat\eta'\left(V_i-\bar V\right)\right)+Z_i\left(\hat\alpha+\left(\hat\eta+\hat\phi\right)'\left(V_i-\bar V\right)\right)+\hat\epsilon_{i\infty},
\]
then we expect that, uniformly in $X_i$,
\[
\hat\alpha+\hat\eta'\left(V_i-\bar V\right)=E\left[Y-\beta D\mid Z=0,X_i\right]+o_P(1)=E\left[t_{0i}\mid X_i\right]+o_P(1),
\]
\[
\hat\alpha+\left(\hat\eta+\hat\phi\right)'\left(V_i-\bar V\right)=E\left[Y-\beta D\mid Z=1,X_i\right]+o_P(1)=E\left[t_{1i}\mid X_i\right]+o_P(1).
\]
Therefore $\hat\phi'\left(V_i-\bar V\right)=E\left[t_{1i}-t_{0i}\mid X_i\right]+o_P(1)$, $\hat\epsilon_{i\infty}=Z_i\left(t_{1i}-E\left[t_{1i}\mid X_i\right]\right)+(1-Z_i)\left(t_{0i}-E\left[t_{0i}\mid X_i\right]\right)+o_P(1)$, and
\[
\bar\epsilon_{i\infty}=\left(Z_i-p_z\right)\left[(1-p_z)\left(t_{1i}-E\left[t_{1i}\mid X_i\right]\right)+p_z\left(t_{0i}-E\left[t_{0i}\mid X_i\right]\right)\right]+p_z(1-p_z)\left(t_{1i}-t_{0i}\right)+o_P(1),
\]
which coincides with the semiparametric asymptotic influence function, and includes the CI model as a special case when D=Z.
H Proof of Proposition 4
Recall that $\sqrt n\left(\hat\beta_1-\beta_0\right)=\mathrm{Cov}_n(Z,D)^{-1}\sqrt n\,\mathrm{Cov}_n\left(Z,Y-D\beta_0\right)$. It can be shown that
\[
\mathrm{Cov}_n(Z,D)=\frac1n\sum_{i=1}^nZ_iD_i-\left(\frac1n\sum_{i=1}^nZ_i\right)\left(\frac1n\sum_{i=1}^nD_i\right)=\hat p_z\left(1-\hat p_z\right)\left[\frac1n\sum_{i=1}^n\frac{Z_iD_{1i}}{\hat p_z}-\frac1n\sum_{i=1}^n\frac{(1-Z_i)D_{0i}}{1-\hat p_z}\right]=p_z(1-p_z)P\left(D_1>D_0\right)+o_P(1),
\]
where the last equality follows from Assumption 8.2. Furthermore,
\[
\mathrm{Cov}_n(Y,Z)=\frac1n\sum_{i=1}^nZ_iY_i-\left(\frac1n\sum_{i=1}^nZ_i\right)\left(\frac1n\sum_{i=1}^nY_i\right)=\hat p_z\left(1-\hat p_z\right)\left[\frac1n\sum_{i=1}^n\frac{Z_iY_{1i}^*}{\hat p_z}-\frac1n\sum_{i=1}^n\frac{(1-Z_i)Y_{0i}^*}{1-\hat p_z}\right].
\]
Next we consider
\[
\sqrt n\left(\mathrm{Cov}_n(Y,Z)-\mathrm{Cov}_n(D,Z)\beta_0\right)=\hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(\frac{Z_it_{1i}}{\hat p_z}-\frac{(1-Z_i)t_{0i}}{1-\hat p_z}\right)
\]
\[
\begin{aligned}
&=\hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(\frac{Z_it_{1i}}{\hat p_z}+\frac{Z_it_{0i}}{1-\hat p_z}-\left(t_{1i}-t_{0i}\right)-\frac{t_{0i}}{1-\hat p_z}\right)+\hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\\
&=\hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-\hat p_z\right)\left(\frac{t_{1i}}{\hat p_z}+\frac{t_{0i}}{1-\hat p_z}\right)+\hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\\
&=p_z\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}\right)+p_z\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+R_n,
\end{aligned}
\]
where
\[
\begin{aligned}
R_n&=\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-\hat p_z\right)\left[\left(p_z-\hat p_z\right)\left(t_{1i}-t_{0i}\right)+(1-p_z)Et_{1i}+p_zEt_{0i}\right]\\
&\quad+\frac1{\sqrt n}\sum_{i=1}^n\left(p_z-\hat p_z\right)\left[(1-p_z)t_{1i}+p_zt_{0i}-\left((1-p_z)Et_{1i}+p_zEt_{0i}\right)\right]+\left(\hat p_z\left(1-\hat p_z\right)-p_z\left(1-p_z\right)\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\\
&=\left(p_z-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(t_{1i}-t_{0i}\right)+\left(p_z-\hat p_z\right)^2\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\\
&\quad+\left(p_z-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left[(1-p_z)\left(t_{1i}-Et_{1i}\right)+p_z\left(t_{0i}-Et_{0i}\right)\right]+\left(\hat p_z\left(1-\hat p_z\right)-p_z\left(1-p_z\right)\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right).
\end{aligned}
\]
Using Assumption 8, each term can be shown to be $o_P(1)$, so that $R_n=o_P(1)$. From this point on, the variance becomes different depending on whether $S=1$ or $S>1$. Recall that $\omega_i=\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}$ and $\omega(s)=E\left[\omega_i\mid X_i\in s\right]$. First note that $\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)$ is asymptotically orthogonal to $\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\omega_i$ under Assumptions 8.1 and 8.2:
\[
\begin{aligned}
&\mathrm{Cov}\left(\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega_i,\ \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{1i}-t_{0i}\right)\right)\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega_i\left(t_{1i}-t_{0i}\right)\right]\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(E\left[Z_i\mid X_i\in s,Y_{1i},Y_{0i},D_{1i},D_{0i}\right]-p_z\right)\omega_i\left(t_{1i}-t_{0i}\right)\right]\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(E\left[Z_i\mid X_i\in s\right]-p_z\right)\omega_i\left(t_{1i}-t_{0i}\right)\right]=O_{a.s.}\left(\frac1n\right).
\end{aligned}
\]
Then write the first part of the influence function as
\[
\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\omega_i=\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\omega(s)
=\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s).\tag{25}
\]
First note that the two sums are orthogonal:
\[
\begin{aligned}
&\mathrm{Cov}\left(\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right),\ \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s)\right)\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^n\mathrm{Cov}\left(1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right),\ 1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s)\right)\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)\omega(s)\right]\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(E\left[\omega_i\mid X_i\in s,Z_i\right]-\omega(s)\right)\omega(s)\right]\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(E\left[\omega_i\mid X_i\in s\right]-\omega(s)\right)\omega(s)\right]=0.
\end{aligned}
\]
We now use arguments similar to those in Lemma B.2 of BCS 2017a to derive the limiting distribution of eq. (25). The distribution of $U=\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)$ is the same as the distribution of the same quantity where the observations are first ordered by strata and then by $Z_i=1$ and $Z_i=0$ within strata. Let $n_z(s)$ be the number of observations in stratum $s$ which have $Z_i=z\in\{0,1\}$, and let $p(s)=P\left(X_i\in s\right)$, $N(s)=\sum_{i=1}^n1\left(S_i<s\right)$, and $F(s)=P\left(S_i<s\right)$. Independently for each $s$ and independently of $\left(Z^{(n)},S^{(n)}\right)$, let $\left\{\omega_i^s:1\le i\le n\right\}$ be i.i.d. with marginal distribution equal to the distribution of $\omega_i\mid X_i\in s$. Define
\[
\tilde U=\frac1{\sqrt n}\sum_{s\in S}\left[\sum_{i=n\frac{N(s)}{n}+1}^{n\frac{N(s)}{n}+n\frac{n_1(s)}{n}}\left(\omega_i^s-\omega(s)\right)\left(1-p_z\right)+\sum_{i=n\frac{N(s)}{n}+n\frac{n_1(s)}{n}+1}^{n\frac{N(s)}{n}+n\frac{n(s)}{n}}\left(\omega_i^s-\omega(s)\right)\left(-p_z\right)\right].
\]
By construction, $U\mid S^{(n)},Z^{(n)}\stackrel{d}{=}\tilde U\mid S^{(n)},Z^{(n)}$, which implies $U\stackrel{d}{=}\tilde U$. Next define
\[
U^*=\frac1{\sqrt n}\sum_{s\in S}\left[\sum_{i=nF(s)+1}^{n\left(F(s)+p(s)p_z\right)}\left(\omega_i^s-\omega(s)\right)\left(1-p_z\right)+\sum_{i=n\left(F(s)+p(s)p_z\right)+1}^{n\left(F(s)+p(s)\right)}\left(\omega_i^s-\omega(s)\right)\left(-p_z\right)\right].
\]
Using properties of Brownian motion,
\[
\frac1{\sqrt n}\sum_{i=nF(s)+1}^{n\left(F(s)+p(s)p_z\right)}\left(\omega_i^s-\omega(s)\right)\left(1-p_z\right)\stackrel{d}{\longrightarrow}N\left(0,\ p(s)p_z\left(1-p_z\right)^2E\left(\omega_i^s-\omega(s)\right)^2\right),
\]
\[
\frac1{\sqrt n}\sum_{i=n\left(F(s)+p(s)p_z\right)+1}^{n\left(F(s)+p(s)\right)}\left(\omega_i^s-\omega(s)\right)\left(-p_z\right)\stackrel{d}{\longrightarrow}N\left(0,\ p(s)\left(1-p_z\right)p_z^2E\left(\omega_i^s-\omega(s)\right)^2\right).
\]
Since the two sums are independent, $\omega_i^s-\omega(s)$ are independent across $i$ and $s$, and $E\left(\omega_i^s-\omega(s)\right)^2=E\left[\left(\omega_i-\omega(s)\right)^2\mid X_i\in s\right]$,
\[
U^*\stackrel{d}{\longrightarrow}N\left(0,\ p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2\mid X_i\in s\right]\right).
\]
Furthermore, since $\left(\frac{N(s)}{n},\frac{n_1(s)}{n}\right)\stackrel{p}{\longrightarrow}\left(F(s),p_zp(s)\right)$, by the continuous mapping theorem,
\[
\tilde U-U^*\stackrel{p}{\longrightarrow}0.
\]
Therefore,
\[
\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)\stackrel{d}{\longrightarrow}N\Big(0,\ \underbrace{p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2\mid X_i\in s\right]}_{\Omega_1}\Big).
\]
For the second term, it suffices to use Assumption 8.2 to show that
\[
\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\omega(s)=\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s)\stackrel{d}{\longrightarrow}N\Big(0,\ \underbrace{\sum_{s\in S}\tau(s)p(s)\omega(s)^2}_{\Omega_2}\Big).
\]
Lastly, note that $\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\stackrel{d}{\longrightarrow}N\Big(0,\underbrace{\mathrm{Var}\left(t_{1i}-t_{0i}\right)}_{\Omega_3}\Big)$. Then $\sqrt n\left(\hat\beta_1-\beta_0\right)\stackrel{d}{\longrightarrow}N\left(0,\ P\left(D_1>D_0\right)^{-2}\left(\Omega_1+\Omega_2+\Omega_3\right)\right)$.
As in Section 2, it is straightforward to show that the 2SLS robust variance is consistent for $P\left(D_1>D_0\right)^{-2}$ times
\[
\mathrm{plim}\,\frac1n\sum_{i=1}^n\left[\left(Z_i-p_z\right)\omega_i+t_{1i}-t_{0i}\right]^2
=\mathrm{plim}\,\frac1n\sum_{i=1}^n\left[\left(Z_i-p_z\right)\omega_i\right]^2+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)^2
=\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)^2+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\omega(s)^2+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)^2.
\]
Independently for each $s$ and independently of $\left(Z^{(n)},S^{(n)}\right)$, let $\left\{\omega_i^s:1\le i\le n\right\}$ be i.i.d. with marginal distribution equal to the distribution of $\omega_i\mid X_i\in s$. Using arguments similar to those in Lemma B.3 of BCS 2017a,
\[
\begin{aligned}
\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)^2&=\sum_{s\in S}\left[\frac1n\sum_{i=1}^{n_1(s)}\left(1-p_z\right)^2\left(\omega_i^s-\omega(s)\right)^2+\frac1n\sum_{i=1}^{n_0(s)}\left(-p_z\right)^2\left(\omega_i^s-\omega(s)\right)^2\right]\\
&=\sum_{s\in S}\left[\frac{n_1(s)}{n}\frac1{n_1(s)}\sum_{i=1}^{n_1(s)}\left(1-p_z\right)^2\left(\omega_i^s-\omega(s)\right)^2+\frac{n_0(s)}{n}\frac1{n_0(s)}\sum_{i=1}^{n_0(s)}\left(-p_z\right)^2\left(\omega_i^s-\omega(s)\right)^2\right]\\
&\stackrel{p}{\longrightarrow}\sum_{s\in S}\left[p_zp(s)\left(1-p_z\right)^2E\left(\omega_i^s-\omega(s)\right)^2+\left(1-p_z\right)p(s)p_z^2E\left(\omega_i^s-\omega(s)\right)^2\right]\\
&=p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2\mid X_i\in s\right].
\end{aligned}
\]
The key steps are to use the almost sure representation theorem to construct $\frac{\tilde n_1(s)}{n}\stackrel{d}{=}\frac{n_1(s)}{n}$ such that $\frac{\tilde n_1(s)}{n}\stackrel{a.s.}{\longrightarrow}p_zp(s)$, and then to note that, by independence of $\left(Z^{(n)},S^{(n)}\right)$ and $\left\{\omega_i^s:1\le i\le n\right\}$, for any $\epsilon>0$,
\[
P\left(\left|\frac1{n_1(s)}\sum_{i=1}^{n_1(s)}\left(\omega_i^s-\omega(s)\right)^2-E\left(\omega_i^s-\omega(s)\right)^2\right|>\epsilon\right)
=E\left[P\left(\left|\frac1{n\frac{\tilde n_1(s)}{n}}\sum_{i=1}^{n\frac{\tilde n_1(s)}{n}}\left(\omega_i^s-\omega(s)\right)^2-E\left(\omega_i^s-\omega(s)\right)^2\right|>\epsilon\ \Bigg|\ \frac{\tilde n_1(s)}{n}\right)\right].
\]
Also note that, by the weak law of large numbers, for any sequence $n_k\to\infty$ as $k\to\infty$,
\[
\frac1{n_k}\sum_{i=1}^{n_k}\left(\omega_i^s-\omega(s)\right)^2\stackrel{p}{\longrightarrow}E\left(\omega_i^s-\omega(s)\right)^2.
\]
Since $n\frac{\tilde n_1(s)}{n}\to\infty$ almost surely, by independence of $\frac{\tilde n_1(s)}{n}$ and $\left\{\omega_i^s:1\le i\le n\right\}$,
\[
P\left(\left|\frac1{n\frac{\tilde n_1(s)}{n}}\sum_{i=1}^{n\frac{\tilde n_1(s)}{n}}\left(\omega_i^s-\omega(s)\right)^2-E\left(\omega_i^s-\omega(s)\right)^2\right|>\epsilon\ \Bigg|\ \frac{\tilde n_1(s)}{n}\right)\stackrel{a.s.}{\longrightarrow}0.
\]
Therefore, the first and third terms coincide with $\Omega_1$ and $\Omega_3$. The second term converges to
\[
\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\omega(s)^2=\sum_{s=1}^S\omega(s)^2p(s)p_z\left(1-p_z\right).
\]
This is larger than $\Omega_2$ as long as $\tau(s)\le p_z\left(1-p_z\right)$ for all $s\in S$, and strictly so for some $s\in S$.
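The conservativeness of the iid-style variance under stratification can be illustrated by simulation (the blocked design and stratum means below are illustrative assumptions, using the $D=Z$ case so that $\hat\beta_1$ is the difference in means):

```python
# Illustrative check of Proposition 4's message (assumed DGP): under stratified
# assignment with tau(s) = 0 and omega(s) varying across strata, the usual
# iid (Neyman-style) variance estimate over-states the true sampling variance.
import numpy as np

rng = np.random.default_rng(4)
reps, m = 2000, 100          # m units per stratum, two strata
est, vhat = [], []
for _ in range(reps):
    Zs, Ys = [], []
    for mu in (-2.0, 2.0):   # stratum means shift, so omega(s) varies
        Z = np.zeros(m)
        Z[rng.permutation(m)[:m // 2]] = 1.0   # exactly half treated per stratum
        Y = mu + 1.0 * Z + rng.normal(size=m)
        Zs.append(Z)
        Ys.append(Y)
    Z = np.concatenate(Zs)
    Y = np.concatenate(Ys)
    y1, y0 = Y[Z == 1], Y[Z == 0]
    est.append(y1.mean() - y0.mean())
    vhat.append(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))

print(np.var(est), np.mean(vhat))   # true variance vs. average iid estimate
```

Because assignment is perfectly balanced within strata, the between-stratum variation cancels from the estimator but still inflates the iid variance estimate.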
I Proof of Proposition 5
The sample normal equations for this regression are given by
\[
\tau_n\left(\hat\beta_2,\hat\eta\right)=\frac1n\sum_{i=1}^n\begin{pmatrix}\left(1\left(X_i\in s\right)\right)_{s\in S}\\Z_i-p_z\end{pmatrix}\left(Y_i-\hat\beta_2D_i-\sum_{s=1}^S\hat\eta_s1\left(X_i\in s\right)\right)=0.
\]
We can write $\left(\hat\beta_2-\beta_0,\hat\eta-\eta_0\right)=\hat A^{-1}\tau_n\left(\beta_0,\eta_0\right)$ if we let $\eta_0=\left(\eta_{0s},s\in S\right)$, $t_{1s}=E\left[t_{1i}\mid X_i\in s\right]$, $t_{0s}=E\left[t_{0i}\mid X_i\in s\right]$,
\[
\eta_{0s}=E[Y\mid s]-E[D\mid s]\beta_0=p_zt_{1s}+\left(1-p_z\right)t_{0s}=\left(1-p_z\right)t_{1s}+p_zt_{0s}-\left(1-2p_z\right)\left(t_{1s}-t_{0s}\right),
\]
and
\[
\hat A=\frac1n\sum_{i=1}^n\begin{pmatrix}\left(1\left(X_i\in s\right)\right)_{s\in S}D_i&\mathrm{diag}\left(1\left(X_i\in s\right)\right)_{s\in S}\\ \left(Z_i-p_z\right)D_i&\left(Z_i-p_z\right)\left(1\left(X_i\in s\right)\right)_{s\in S}'\end{pmatrix}.
\]
Using Assumption 8.1 and 8.2 we can show that Aˆ=A+oP(1), where
A=[psED|sdiagps,s∈Spz1−pzPD1>D00]
In the following we will show that τnβ0,η0=Op1n, which by non-singularity of Aimpliesthatβˆ2−β0,ηˆ−η0=OP1n. Then the second row of the relation
A+oP(1)βˆ2−β0,ηˆ−η0′=τnβ0,η0
implies that, using the above η0s,
PD1>D0nβˆ2−β0=1n∑i=1nZi−pzpz1−pzYi−β0Di−∑s=1Sη0s1Xi∈s+oP(1)
=1n∑i=1nZi−pzt1i−Et1ipz+t0i−Et0i1−pz−∑s∈SEt1i−Et1i|Xi∈spz+Et0i−Et0i|Xi∈s1−pz−1−2pzpz1−pzt1s−t0s1Xi∈s+t1i−t0i+oP(1)=∑s∈S1n∑i=1n1Xi∈sZi−pzωi−ωs+∑s∈S1n∑i=1n1Xi∈sZi−pz1−2pzpz1−pzt1s−t0s+1n∑i=1nt1i−t0i+oP(1).
where we recall that $\omega_i=\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}$ and $\omega(s)=E\left[\omega_i\mid X_i\in s\right]$. Using arguments similar to those in Proposition 4,
\[
\begin{aligned}
\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)&\stackrel{d}{\longrightarrow}N\Big(0,\ \underbrace{p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2\mid X_i\in s\right]}_{\Omega_1}\Big),\\
\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_{1s}-t_{0s}\right)&\stackrel{d}{\longrightarrow}N\Bigg(0,\ \underbrace{\sum_{s\in S}p(s)\tau(s)\left(\frac{1-2p_z}{p_z\left(1-p_z\right)}\right)^2\left(t_{1s}-t_{0s}\right)^2}_{\bar\Omega_2}\Bigg),\\
\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)&\stackrel{d}{\longrightarrow}N\Big(0,\ \underbrace{\mathrm{Var}\left(t_{1i}-t_{0i}\right)}_{\Omega_3}\Big).
\end{aligned}
\]
Note that the first two sums in the influence function are orthogonal:
\[
\begin{aligned}
&\mathrm{Cov}\left(\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right),\ \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_{1s}-t_{0s}\right)\right)\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_{1s}-t_{0s}\right)\right]\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\frac{1-2p_z}{p_z\left(1-p_z\right)}E\left[\left(\omega_i-\omega(s)\right)\left(t_{1s}-t_{0s}\right)\mid X_i\in s,Z_i\right]\right]\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(E\left[\omega_i\mid X_i\in s\right]-\omega(s)\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_{1s}-t_{0s}\right)\right]=0.
\end{aligned}
\]
And the third sum is orthogonal to the first two sums by the same arguments as in Proposition 4. Therefore, $P\left(D_1>D_0\right)\sqrt n\left(\hat\beta_2-\beta_0\right)\stackrel{d}{\longrightarrow}N\left(0,\ \Omega_1+\bar\Omega_2+\Omega_3\right)$. It is also easy to show, using arguments similar to those in Proposition 4, that the 2SLS nominal variance consistently estimates $P\left(D_1>D_0\right)^{-2}$ times
\[
\begin{aligned}
&\mathrm{plim}\,\frac1n\sum_{i=1}^n\left[\frac{Z_i-p_z}{p_z\left(1-p_z\right)}\left(Y_i-\beta_0D_i-\sum_{s=1}^S\eta_{0s}1\left(X_i\in s\right)\right)\right]^2\\
&=\mathrm{plim}\,\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)\right]^2+\mathrm{plim}\,\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\frac{1-2p_z}{p_z\left(1-p_z\right)}\right)^2\left(t_{1s}-t_{0s}\right)^2+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)^2\\
&=\Omega_1+\tilde\Omega_2+\Omega_3,
\end{aligned}
\]
where
\[
\tilde\Omega_2=\sum_{s\in S}p(s)p_z\left(1-p_z\right)\left(\frac{1-2p_z}{p_z\left(1-p_z\right)}\right)^2\left(t_{1s}-t_{0s}\right)^2,
\]
which is larger than $\bar\Omega_2$ if $p_z\left(1-p_z\right)>\tau(s)$ for some $s$, unless $S=1$ or $p_z=\frac12$.
J Proof of Proposition 6
We choose to work with the representation in eq. (11), using which we write
\[
\sqrt n\left(\hat\beta_3-\beta_0\right)=\frac{\sqrt n\sum_{s=1}^S\left(\hat\xi_{1s}-\hat\xi_{0s}-\beta_0\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n}}{\sum_{s=1}^S\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n}}.\tag{26}
\]
For the denominator, under Assumption 8, Lemma B.3 of BCS 2017a implies that
\[
\hat\zeta_{1s}=\frac{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iD_i}{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_i}\stackrel{p}{\longrightarrow}P\left(D_1=1\mid s\right),\qquad
\hat\zeta_{0s}=\frac{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)D_i}{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)}\stackrel{p}{\longrightarrow}P\left(D_0=1\mid s\right).
\]
Together with $\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\stackrel{p}{\longrightarrow}p(s)\equiv P\left(X_i\in s\right)$,
\[
\sum_{s=1}^S\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n}\stackrel{p}{\longrightarrow}P\left(D_1=1\right)-P\left(D_0=1\right)=P\left(D_1>D_0\right).
\]
Using $\hat p(s)=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)$, $\hat p(s)\hat p_z(s)=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_i$, $t_{1s}=E\left[t_{1i}\mid X_i\in s\right]$, and $t_{0s}=E\left[t_{0i}\mid X_i\in s\right]$,
\[
\begin{aligned}
&\sqrt n\sum_{s=1}^S\left(\hat\xi_{1s}-\hat\xi_{0s}-\beta_0\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n}
=\sqrt n\sum_{s=1}^S\hat p(s)\left[\frac{\frac1n\sum_{i=1}^nt_{1i}1\left(X_i\in s\right)Z_i}{\hat p(s)\hat p_z(s)}-\frac{\frac1n\sum_{i=1}^nt_{0i}1\left(X_i\in s\right)\left(1-Z_i\right)}{\hat p(s)\left(1-\hat p_z(s)\right)}\right]\\
&=\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left[\frac{\left(t_{1i}-t_{1s}\right)Z_i}{\hat p_z(s)}-\frac{\left(t_{0i}-t_{0s}\right)\left(1-Z_i\right)}{1-\hat p_z(s)}\right]+\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{1s}-t_{0s}\right)\\
&=\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left[\frac{\left(t_{1i}-t_{1s}\right)Z_i}{p_z}-\frac{\left(t_{0i}-t_{0s}\right)\left(1-Z_i\right)}{1-p_z}\right]+\sum_{s\in S}\left(R_{1ns}+R_{2ns}\right)+\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{1s}-t_{0s}\right)\\
&=\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\frac{t_{1i}-t_{1s}}{p_z}+\frac{t_{0i}-t_{0s}}{1-p_z}\right)+\sum_{s\in S}\left(R_{1ns}+R_{2ns}\right)+\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right).\tag{27}
\end{aligned}
\]
In the above,
\[
\begin{aligned}
R_{1ns}&=\frac{p_z-\hat p_z(s)}{\hat p_z(s)\left(1-\hat p_z(s)\right)}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(t_{1i}-t_{1s}\right)Z_i+\left(t_{0i}-t_{0s}\right)\left(1-Z_i\right)\right],\\
R_{2ns}&=\left(\frac1{\hat p_z(s)\left(1-\hat p_z(s)\right)}-\frac1{p_z\left(1-p_z\right)}\right)\times\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(1-p_z\right)\left(t_{1i}-t_{1s}\right)Z_i-p_z\left(t_{0i}-t_{0s}\right)\left(1-Z_i\right)\right].
\end{aligned}
\]
Rewriting,
\[
R_{1ns}=\frac{p_z-\hat p_z(s)}{\hat p_z(s)\left(1-\hat p_z(s)\right)}\left[\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\left(t_{1i}-t_{1s}\right)-\left(t_{0i}-t_{0s}\right)\right)+p_z\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{1i}-t_{1s}\right)+\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{0i}-t_{0s}\right)\right],
\]
\[
\begin{aligned}
R_{2ns}&=\left(\frac1{\hat p_z(s)\left(1-\hat p_z(s)\right)}-\frac1{p_z\left(1-p_z\right)}\right)\left[\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\left(t_{1i}-t_{1s}\right)+\left(t_{0i}-t_{0s}\right)\right)\right.\\
&\qquad\left.+p_z\frac1{\sqrt n}\sum_{i=1}^n\left(1-Z_i\right)1\left(X_i\in s\right)\left(t_{1i}-t_{1s}\right)-\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^nZ_i1\left(X_i\in s\right)\left(t_{0i}-t_{0s}\right)\right]\\
&=\left(\frac1{\hat p_z(s)\left(1-\hat p_z(s)\right)}-\frac1{p_z\left(1-p_z\right)}\right)\left[\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\left(t_{1i}-t_{1s}\right)+\left(t_{0i}-t_{0s}\right)\right)\right.\\
&\qquad+p_z\frac1{\sqrt n}\sum_{i=1}^n\left(\left(1-Z_i\right)-\left(1-p_z\right)\right)1\left(X_i\in s\right)\left(t_{1i}-t_{1s}\right)-\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)1\left(X_i\in s\right)\left(t_{0i}-t_{0s}\right)\\
&\qquad\left.+p_z\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{1i}-t_{1s}\right)-\left(1-p_z\right)p_z\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{0i}-t_{0s}\right)\right].
\end{aligned}
\]
Using Assumption 8, Lemmas B.2 and B.3 of BCS 2017a, and arguments similar to those in Propositions 4 and 5, we can show that $\sum_{s\in S}R_{1ns}=o_P(1)$ and $\sum_{s\in S}R_{2ns}=o_P(1)$: in each case the leading factor is $o_P(1)$ and the bracketed term is $O_P(1)$. Since
\[
\frac{t_{1i}-t_{1s}}{p_z}+\frac{t_{0i}-t_{0s}}{1-p_z}=\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}-\left(\frac{t_{1s}-Et_{1i}}{p_z}+\frac{t_{0s}-Et_{0i}}{1-p_z}\right)=\omega_i-\omega(s),
\]
eq. (27) can be written as
\[
\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+o_P(1).\tag{28}
\]
The first part of this influence function corresponds exactly to the first term in eq. (25). Therefore, regardless of $p_z$, there is no need to worry about the variation induced by the sampling scheme for $Z_i$ within the cluster.
In the special case of unconfoundedness, eq. (28) becomes
\[
\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left[\frac{Y_{1i}}{p_z}+\frac{Y_{0i}}{1-p_z}-E\left(\frac{Y_{1i}}{p_z}+\frac{Y_{0i}}{1-p_z}\,\Big|\,X_i\in s\right)\right]+\frac1{\sqrt n}\sum_{i=1}^n\left(Y_{1i}-Y_{0i}-\left(\mu_1-\mu_0\right)\right)+o_P(1).\tag{29}
\]
Using Assumption 8, $\hat\sigma_3^2$ is consistent for the plim of $P\left(D_1>D_0\right)^{-2}$ times $\frac1n\sum_{i=1}^n\psi_i^2$, where
\[
\begin{aligned}
\psi_i&=\left(Z_i-p_z\right)\left[\frac{t_{1i}-\bar t_1}{p_z}+\frac{t_{0i}-\bar t_0}{1-p_z}-\Sigma_{n,X,\frac{t_1}{p_z}+\frac{t_0}{1-p_z}}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right]+p_z\left(1-p_z\right)\left[t_{1i}-t_{0i}-\bar t_1+\bar t_0-\Sigma_{n,X,t_1-t_0}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right]\\
&=\left(Z_i-p_z\right)\left[\omega_i-\bar\omega-\Sigma_{n,X,\omega}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right]+p_z\left(1-p_z\right)\left[t_{1i}-t_{0i}-\bar t_1+\bar t_0-\Sigma_{n,X,t_1-t_0}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right],
\end{aligned}
\]
for $\Sigma_{n,X}=\frac1n\sum_{i=1}^n\left(X_i-\bar X\right)\left(X_i-\bar X\right)'$ and $\Sigma_{n,X,t}=\frac1n\sum_{i=1}^n\left(X_i-\bar X\right)\left(t_i-\bar t\right)$. With $X_i$ being the cluster dummies, $\omega_i-\bar\omega-\Sigma_{n,X,\omega}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)$ is the residual from a saturated regression of $\omega_i$ on the cluster dummies, and converges to $\sum_{s\in S}1\left(X_i\in s\right)\left(\omega_i-\omega(s)\right)$. For the same reason, $t_{1i}-t_{0i}-\bar t_1+\bar t_0-\Sigma_{n,X,t_1-t_0}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)$ is the residual from a saturated regression of $t_{1i}-t_{0i}$ on the cluster dummies, and converges to
\[
\sum_{s\in S}1\left(X_i\in s\right)\left(t_{1i}-t_{0i}-E\left[t_{1i}-t_{0i}\mid s\right]\right).
\]
Therefore $\frac1n\sum_{i=1}^n\psi_i^2$ is in turn consistent for the variance of
\[
\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\frac1{\sqrt n}\sum_{i=1}^n\sum_{s\in S}1\left(X_i\in s\right)\left(t_{1i}-t_{0i}-E\left[t_{1i}-t_{0i}\mid s\right]\right),
\]
which is asymptotically smaller than the variance of eq. (28) but larger than the variance of its first component. Next we will need to add a consistent estimate of
\[
\frac1n\sum_{i=1}^n\sum_{s\in S}1\left(X_i\in s\right)E\left[t_{1i}-t_{0i}\mid s\right]^2.
\]
This is obtained by $\hat\phi'\frac1n\sum_{i=1}^n\left(V_i-\bar V\right)\left(V_i-\bar V\right)'\hat\phi$, which is the variance of the fitted value of the saturated cluster dummy regression. We can then use $\bar\sigma_3^2$ in Corollary 3 to obtain a consistent estimate of the variance of eq. (28).
We can also directly estimate the variance of $\hat\beta_3$ by estimating the first representation of the influence function in eq. (27). Let $\hat t_{1i}Z_i=\left(Y_i-D_i\hat\beta_3\right)Z_i$ and $\hat t_{0i}\left(1-Z_i\right)=\left(Y_i-D_i\hat\beta_3\right)\left(1-Z_i\right)$,
\[
\hat t_{1s}=\frac{\sum_{i=1}^n\hat t_{1i}Z_i1\left(X_i\in s\right)}{\sum_{i=1}^nZ_i1\left(X_i\in s\right)},\qquad
\hat t_{0s}=\frac{\sum_{i=1}^n\hat t_{0i}\left(1-Z_i\right)1\left(X_i\in s\right)}{\sum_{i=1}^n\left(1-Z_i\right)1\left(X_i\in s\right)},
\]
and construct
\[
\hat\Omega=\frac1n\sum_{i=1}^n\left[\sum_{s\in S}1\left(X_i\in s\right)\left(\frac{\left(\hat t_{1i}-\hat t_{1s}\right)Z_i}{\hat p_z}-\frac{\left(\hat t_{0i}-\hat t_{0s}\right)\left(1-Z_i\right)}{1-\hat p_z}\right)\right]^2+\frac1n\sum_{i=1}^n\sum_{s=1}^S1\left(X_i\in s\right)\left(\hat t_{1s}-\hat t_{0s}\right)^2.
\]
Lemma B.3 of BCS 2017a and the continuous mapping theorem imply that $\hat t_{1s}\stackrel{p}{\longrightarrow}t_{1s}$ and $\hat t_{0s}\stackrel{p}{\longrightarrow}t_{0s}$. Slutsky's theorem then implies that $\hat\Omega$ consistently estimates the variance of eq. (27).
K Proof of Proposition 7
This estimator can be implemented using OLS and 2SLS by fully interacting $Z_i$, the cluster dummies, and the additional regressors $X_i$. To simplify notation we denote $W_i=\left(1,\ X_i'\right)'$ and the regression functions in eq. (12) as $\hat\gamma_{0s}'W_i$, $\hat\gamma_{1s}'W_i$, $\hat\tau_{0s}'W_i$ and $\hat\tau_{1s}'W_i$. Consider first the OLS case under Assumption 5:
\[
\hat\beta_S=\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\gamma_{1s}-\hat\gamma_{0s}\right),
\]
where $\bar W_s=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i/\hat p(s)$, $\hat\gamma_{0s}\stackrel{p}{\longrightarrow}\gamma_{0s}=E\left[WW'\mid s\right]^{-1}E\left[WY_0\mid s\right]$, and $\hat\gamma_{1s}\stackrel{p}{\longrightarrow}\gamma_{1s}=E\left[WW'\mid s\right]^{-1}E\left[WY_1\mid s\right]$, for
\[
\hat\gamma_{1s}=H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iY_i\qquad\text{and}\qquad H_{1n}=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iW_i',\tag{30}
\]
\[
\hat\gamma_{0s}=H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iY_i\qquad\text{and}\qquad H_{0n}=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iW_i'.\tag{31}
\]
In the normal equations, $E\left[W\left(Y_j-W'\gamma_{js}\right)\mid s\right]=0$ for $j=0,1$, and $W$ includes the constant term. Therefore $E\left[Y_j-W'\gamma_{js}\mid s\right]=0$ for $j=0,1$, so that $\hat\beta_S\stackrel{p}{\longrightarrow}\beta_0=\Delta=E\left[Y_1-Y_0\right]$. In the following, we will not require $p_{zs}\equiv p_z$. Note that
\[
\hat\beta_S-\beta_0=\underbrace{\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\gamma_{1s}-\gamma_{1s}-\hat\gamma_{0s}+\gamma_{0s}\right)}_{(1)}+\underbrace{\sum_{s\in S}\hat p(s)\left(\bar W_s'\left(\gamma_{1s}-\gamma_{0s}\right)-\Delta\right)}_{(2)},
\]
where we can write (1) as
\[
\sum_{s\in S}\hat p(s)\bar W_s'\left[H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_iZ_i\left(Y_{1i}-W_i'\gamma_{1s}\right)-H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left(1-Z_i\right)\left(Y_{0i}-W_i'\gamma_{0s}\right)\right].
\]
Using $\hat p(s)\stackrel{p}{\longrightarrow}p(s)$, $\bar W_s\stackrel{p}{\longrightarrow}E\left[W\mid s\right]$, $E\left[W'\mid s\right]E\left[WW'\mid s\right]^{-1}=\left(1,0,\ldots\right)$,
\[
H_{1n}\stackrel{p}{\longrightarrow}p(s)p_{zs}E\left[WW'\mid s\right],\qquad H_{0n}\stackrel{p}{\longrightarrow}p(s)\left(1-p_{zs}\right)E\left[WW'\mid s\right],
\]
\[
\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_iZ_i\left(Y_{1i}-W_i'\gamma_{1s}\right)=O_P\left(\tfrac1{\sqrt n}\right),\qquad
\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left(1-Z_i\right)\left(Y_{0i}-W_i'\gamma_{0s}\right)=O_P\left(\tfrac1{\sqrt n}\right),
\]
we can write (1) as
\[
\begin{aligned}
&\sum_{s\in S}E\left[W\mid s\right]'E\left[WW'\mid s\right]^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left[\frac{Z_i\left(Y_{1i}-W_i'\gamma_{1s}\right)}{p_{zs}}-\frac{\left(1-Z_i\right)\left(Y_{0i}-W_i'\gamma_{0s}\right)}{1-p_{zs}}\right]+o_P\left(\tfrac1{\sqrt n}\right)\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(Z_i-p_{zs}\right)\left(\frac{Y_{1i}-W_i'\gamma_{1s}}{p_{zs}}+\frac{Y_{0i}-W_i'\gamma_{0s}}{1-p_{zs}}\right)+Y_{1i}-Y_{0i}-W_i'\left(\gamma_{1s}-\gamma_{0s}\right)\right]+o_P\left(\tfrac1{\sqrt n}\right).
\end{aligned}
\]
Therefore,
\[
(1)+(2)=\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_{zs}\right)\left(\frac{Y_{1i}-W_i'\gamma_{1s}}{p_{zs}}+\frac{Y_{0i}-W_i'\gamma_{0s}}{1-p_{zs}}\right)+\frac1n\sum_{i=1}^n\left(Y_{1i}-Y_{0i}-\Delta\right)+o_P\left(\tfrac1{\sqrt n}\right).
\]
This is obviously more efficient than eq. (29), since $W_i'\gamma_{js}$, $j=0,1$, is the within-cluster linear projection of $Y_{ji}$, which results in a smaller variance.
Next we generalize the above to the LATE. Consider
\[
\hat\beta_S=\frac{\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\gamma_{1s}-\hat\gamma_{0s}\right)}{\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\tau_{1s}-\hat\tau_{0s}\right)},
\]
so that for $\beta_0=E\left[Y_1-Y_0\mid D_1>D_0\right]$,
\[
\hat\beta_S-\beta_0=\frac{\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\gamma_{1s}-\hat\gamma_{0s}-\left(\hat\tau_{1s}-\hat\tau_{0s}\right)\beta_0\right)}{\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\tau_{1s}-\hat\tau_{0s}\right)}.
\]
Since the denominator is $E\left[D_1-D_0\right]+o_P(1)=P\left(D_1>D_0\right)+o_P(1)$, we focus on the numerator and write
\[
\left(P\left(D_1>D_0\right)+o_P(1)\right)\left(\hat\beta_S-\beta_0\right)=\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\gamma_{1s}-\hat\gamma_{0s}-\left(\hat\tau_{1s}-\hat\tau_{0s}\right)\beta_0\right).
\]
Here $\gamma_{1s}$ and $\gamma_{0s}$ are defined by
\[
\hat\gamma_{1s}=H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iY_i\stackrel{p}{\longrightarrow}\gamma_{1s}=E\left[WW'\mid s\right]^{-1}E\left[WY_1^*\mid s\right],
\]
\[
\hat\gamma_{0s}=H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iY_i\stackrel{p}{\longrightarrow}\gamma_{0s}=E\left[WW'\mid s\right]^{-1}E\left[WY_0^*\mid s\right],
\]
and $\tau_{1s}$ and $\tau_{0s}$ are analogously defined by
\[
\hat\tau_{1s}=H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iD_i\stackrel{p}{\longrightarrow}\tau_{1s}=E\left[WW'\mid s\right]^{-1}E\left[WD_1\mid s\right],
\]
\[
\hat\tau_{0s}=H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iD_i\stackrel{p}{\longrightarrow}\tau_{0s}=E\left[WW'\mid s\right]^{-1}E\left[WD_0\mid s\right].
\]
Define $\hat\eta_{js}=\hat\gamma_{js}-\hat\tau_{js}\beta_0$ for $j=0,1$, so that $\hat\eta_{js}\stackrel{p}{\longrightarrow}\eta_{js}=E\left[WW'\mid s\right]^{-1}E\left[Wt_j\mid s\right]$, where
\[
\hat\eta_{1s}=\left(\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iW_i'\right)^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_i\underbrace{\left(Y_{1i}^*-D_{1i}\beta_0\right)}_{t_{1i}},\qquad
\hat\eta_{0s}=\left(\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iW_i'\right)^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_i\underbrace{\left(Y_{0i}^*-D_{0i}\beta_0\right)}_{t_{0i}}.
\]
Then we proceed similarly to the ATE case to write the numerator as
\[
\underbrace{\sum_{s\in S}\hat p(s)\bar W_s'\left(\hat\eta_{1s}-\eta_{1s}-\hat\eta_{0s}+\eta_{0s}\right)}_{(1)}+\underbrace{\sum_{s\in S}\hat p(s)\bar W_s'\left(\eta_{1s}-\eta_{0s}\right)}_{(2)},
\]
where, by noting that
\[
\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_iZ_i\left(t_{1i}-W_i'\eta_{1s}\right)=O_P\left(\tfrac1{\sqrt n}\right),\qquad
\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left(1-Z_i\right)\left(t_{0i}-W_i'\eta_{0s}\right)=O_P\left(\tfrac1{\sqrt n}\right),
\]
we can write (1) as
\[
\begin{aligned}
&\sum_{s\in S}E\left[W\mid s\right]'E\left[WW'\mid s\right]^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left[\frac{Z_i\left(t_{1i}-W_i'\eta_{1s}\right)}{p_{zs}}-\frac{\left(1-Z_i\right)\left(t_{0i}-W_i'\eta_{0s}\right)}{1-p_{zs}}\right]+o_P\left(\tfrac1{\sqrt n}\right)\\
&=\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(Z_i-p_{zs}\right)\left(\frac{t_{1i}-W_i'\eta_{1s}}{p_{zs}}+\frac{t_{0i}-W_i'\eta_{0s}}{1-p_{zs}}\right)+t_{1i}-t_{0i}-W_i'\left(\eta_{1s}-\eta_{0s}\right)\right]+o_P\left(\tfrac1{\sqrt n}\right).
\end{aligned}
\]
Therefore,
\[
(1)+(2)=\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_{zs}\right)\left(\frac{t_{1i}-W_i'\eta_{1s}}{p_{zs}}+\frac{t_{0i}-W_i'\eta_{0s}}{1-p_{zs}}\right)+\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+o_P\left(\tfrac1{\sqrt n}\right).\tag{32}
\]
Again this ought to be more efficient than eq. (27), since $W_i'\eta_{js}$ is the within-cluster linear projection of $t_{ji}$. The more variables the projection is on, the smaller the variance. As $\dim(W)\to\infty$ at an appropriate rate, $W_i'\eta_{js}\to E\left[t_{ji}\mid W_i\right]$ for $j=0,1$, so that the above equation becomes the efficient influence function in eq. (24), conditional on both the cluster indicators and the extra regressors.
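The efficiency gain from adding within-stratum regressors can be illustrated by simulation (the design below, with $D=Z$, two equal strata, and a single covariate, is an illustrative assumption, not part of the proof):

```python
# Illustrative check of Proposition 7's message (assumed DGP): the
# covariate-augmented stratified estimator sum_s p(s) Wbar_s'(g_1s - g_0s)
# has smaller variance than the stratum-dummy-only estimator.
import numpy as np

rng = np.random.default_rng(5)
reps, m, S = 2000, 100, 2
b_strat, b_aug = [], []
for _ in range(reps):
    bs = ba = 0.0
    for s in range(S):
        Z = np.zeros(m)
        Z[rng.permutation(m)[:m // 2]] = 1.0   # half treated per stratum
        X = rng.normal(size=m)
        Y = s + Z + 2.0 * X + rng.normal(size=m)
        share = 1.0 / S                        # equal stratum shares by design
        # stratum-dummy-only estimator: within-stratum difference in means
        bs += share * (Y[Z == 1].mean() - Y[Z == 0].mean())
        # augmented estimator: within-stratum regression-adjusted contrast
        Wbar = np.array([1.0, X.mean()])
        g = []
        for z in (1.0, 0.0):
            M = np.column_stack([np.ones(m), X])[Z == z]
            g.append(np.linalg.lstsq(M, Y[Z == z], rcond=None)[0])
        ba += share * float(Wbar @ (g[0] - g[1]))
    b_strat.append(bs)
    b_aug.append(ba)

print(np.var(b_strat), np.var(b_aug))
```

Both estimators are consistent for the common effect (one here); projecting on the extra covariate within strata removes its residual variation, as in eq. (32).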