Published by De Gruyter Oldenbourg July 3, 2018

# OLS and 2SLS in Randomized and Conditionally Randomized Experiments

Jason Ansel, Han Hong and Jessie Li

# Abstract

We investigate estimation and inference of the (local) average treatment effect parameter when a binary instrumental variable is generated by a randomized or conditionally randomized experiment. Under i.i.d. sampling, we show that adding covariates and their interactions with the instrument will weakly improve estimation precision of the (local) average treatment effect, but the robust OLS (2SLS) standard errors will no longer be valid. We provide an analytic correction that is easy to implement and demonstrate, through Monte Carlo simulations and an empirical application, the interacted estimator's efficiency gains over the unadjusted estimator and the uninteracted covariate-adjusted estimator. We also generalize our results to covariate-adaptive randomization, where the treatment assignment is not i.i.d., thus extending the recent contributions of Bugni, Canay, and Shaikh (2017a, 2017b) to allow for the case of non-compliance.

JEL Classification: C1; C8; C9

# Acknowledgements:

We thank Joe Romano for very helpful discussions, and in particular the editors and referee for insightful comments and constructive suggestions. This research was supported by a Faculty Research Grant awarded by the Committee on Research from the University of California, Santa Cruz, the National Science Foundation (SES 1658950), and SIEPR. Correspondence can be sent to jeqli@ucsc.edu.

### References

Bugni, F., I.A. Canay, A.M. Shaikh (2017a), Inference Under Covariate-Adaptive Randomization. Tech. rep. DOI: 10.1920/wp.cem.2016.2116.

Bugni, F., I.A. Canay, A.M. Shaikh (2017b), Inference Under Covariate-Adaptive Randomization with Multiple Treatments. Tech. rep. DOI: 10.1920/wp.cem.2016.2116.

Chen, X., H. Hong, A. Tarozzi (2008), Semiparametric Efficiency in GMM Models with Auxiliary Data. Annals of Statistics 36: 808–843. DOI: 10.1214/009053607000000947.

Freedman, D.A. (1981), Bootstrapping Regression Models. The Annals of Statistics 9: 1218–1228. DOI: 10.1214/aos/1176345638.

Freedman, D.A. (2008), On Regression Adjustments to Experimental Data. Advances in Applied Mathematics 40: 180–193. DOI: 10.1016/j.aam.2006.12.003.

Frolich, M. (2006), Nonparametric IV Estimation of Local Average Treatment Effects with Covariates. Journal of Econometrics 139: 35–75. DOI: 10.1016/j.jeconom.2006.06.004.

Hahn, J. (1998), On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects. Econometrica 66: 315–332. DOI: 10.2307/2998560.

Hong, H., D. Nekipelov (2010), Semiparametric Efficiency in Nonlinear LATE Models. Quantitative Economics 1: 279–304. DOI: 10.3982/QE43.

Imbens, G., J. Angrist (1994), Identification and Estimation of Local Average Treatment Effects. Econometrica 62: 467–475. DOI: 10.2307/2951620.

Imbens, G.W., D.B. Rubin (2015), Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press. DOI: 10.1017/CBO9781139025751.

Imbens, G.W., J.M. Wooldridge (2009), Recent Developments in the Econometrics of Program Evaluation. Journal of Economic Literature 47: 5–86. DOI: 10.1257/jel.47.1.5.

Lin, W. (2013), Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman's Critique. The Annals of Applied Statistics 7: 295–318. DOI: 10.1214/12-AOAS583.

Newey, W. (1994), The Asymptotic Variance of Semiparametric Estimators. Econometrica 62: 1349–1382. DOI: 10.2307/2951752.

Newey, W. (1997), Convergence Rates and Asymptotic Normality for Series Estimators. Journal of Econometrics 79: 147–168. DOI: 10.1016/S0304-4076(97)00011-0.

Rosenbaum, P., D. Rubin (1983), The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55. DOI: 10.1093/biomet/70.1.41.

Shao, J., X. Yu, B. Zhong (2010), A Theory for Testing Hypotheses Under Covariate-Adaptive Randomization. Biometrika 97: 347–360. DOI: 10.1093/biomet/asq014.

## Article note:

This article is part of the special issue “Big Data” published in the Journal of Economics and Statistics. Access to further articles of this special issue can be obtained at www.degruyter.com/journals/jbnst.

This paper was inspired by a discussion with Patrick Kline, who also provided key references.

## Code and Datasets

http://dx.doi.org/10.15456/jbnst.2018193.020733

## A Proof of Theorem 1

For $k=1,2,3$, let $W_{ik}$ denote the instruments, let $V_{ik}$ and $U_{ik}$ denote the regressors, and let $\theta_k$ denote the parameters of the instrumental variable moment condition $E\left[W_{ik}\left(Y_i - V_{ik}'\theta_{0k}\right)\right] = 0$. The estimators $\hat\theta_k$ are defined by the sample estimating equations:

$$(13)\qquad \frac{1}{n}\sum_{i=1}^n W_{ik}\left(Y_i - V_{ik}'\hat\theta_k\right) + \frac{1}{n}\sum_{i=1}^n W_{ik}U_{ik}\hat\phi'\left(\bar X - \mu_x\right) = 0.$$

For $\hat\beta_1$, let $U_{i1}=0$, $p_z = P(Z_i=1)$, $p_d = P(D_i=1)$, $W_{i1} = (1, Z_i-p_z)'$, $V_{i1} = (1, D_i-p_d)'$, and $\theta_1 = (\alpha,\beta)'$. For $\hat\beta_2$, let $U_{i2}=0$, $W_{i2} = \left(1, Z_i-p_z, (X_i-\mu_x)'\right)'$, $V_{i2} = \left(1, D_i-p_d, (X_i-\mu_x)'\right)'$, and $\theta_2 = (\alpha,\beta,\eta')'$. For $\hat\beta_3$, let $U_{i3} = Z_i-p_z$, $\theta_3 = (\alpha,\beta,\eta',\phi')'$, $W_{i3} = \left(1, Z_i-p_z, (X_i-\mu_x)', (Z_i-p_z)(X_i-\mu_x)'\right)'$, and $V_{i3} = \left(1, D_i-p_d, (X_i-\mu_x)', (Z_i-p_z)(X_i-\mu_x)'\right)'$. Eq. (13) leads to the following influence function representation of $\hat\theta_k$:

$$\sqrt n\left(\hat\theta_k - \theta_{0k}\right) = E\left[W_{ik}V_{ik}'\right]^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\left(W_{ik}\left(Y_i - V_{ik}'\theta_{0k}\right) + E\left[W_{ik}U_{ik}\right]\phi_0'\left(X_i - \mu_x\right)\right) + o_P(1).$$

It can be calculated that the second row of $E\left[W_{ik}V_{ik}'\right]^{-1}$, $k=1,2,3$, takes the forms of

$$\left(0,\ \mathrm{Cov}(D,Z)^{-1}\right),\qquad \left(0,\ \mathrm{Cov}(D,Z)^{-1},\ 0\right),\qquad \left(0,\ \mathrm{Cov}(D,Z)^{-1},\ 0,\ 0\right).$$

Therefore for $k=1,2,3$,

$$\sqrt n\left(\hat\beta_k - \beta_0\right) = \mathrm{Cov}(Z,D)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\psi_k\left(Y_i,Z_i,X_i,W_{ik},V_{ik},U_{ik}\right) + o_P(1),$$
$$\psi_k\left(Y_i,Z_i,X_i,W_{ik},U_{ik},V_{ik}\right) = \underbrace{(Z_i-p_z)\left(Y_i - V_{ik}'\theta_0\right)}_{\psi_{ik}^1} + \underbrace{E\left[(Z_i-p_z)U_{ik}\right]\phi_0'(X_i-\mu_x)}_{\psi_{ik}^2} \equiv \psi_{ik}.$$

Consequently, $\sqrt n(\hat\beta_k - \beta_0)\xrightarrow{d} N\left(0, \mathrm{Cov}(D,Z)^{-2}\mathrm{Var}(\psi_{ik})\right)$. It remains to show that for $j=1,2$, $\mathrm{Var}(\psi_{i3}) \le \mathrm{Var}(\psi_{ij})$. This can be done by showing that $\mathrm{Cov}\left(\psi_{ij} - \psi_{i3}, \psi_{i3}\right) = 0$. For this purpose, consider first $j=1$. Note that

$$\begin{aligned}\psi_{i1} - \psi_{i3} &= (Z_i-p_z)\left(\eta_0'(X_i-\mu_x) + \phi_0'(X_i-\mu_x)(Z_i-p_z)\right) - (p_z-p_z^2)(X_i-\mu_x)'\phi_0\\ &= \underbrace{(Z_i-p_z)\eta_0'(X_i-\mu_x)}_{\Delta\psi_{i13}^1} + \underbrace{(Z_i-p_z)^2\phi_0'(X_i-\mu_x)}_{\Delta\psi_{i13}^2} - \underbrace{(p_z-p_z^2)(X_i-\mu_x)'\phi_0}_{\Delta\psi_{i13}^3}.\end{aligned}$$

It follows from $Z_i^2 = Z_i$ and $E\left[W_{i3}\left(Y_i - V_{i3}'\theta_0\right)\right]=0$ that

$$\mathrm{Cov}\left(\psi_{i3}^1, \Delta\psi_{i13}^k\right) = 0,\qquad k=1,2,3.$$

By independence of $Z_i$ from $X_i$, $\mathrm{Cov}\left(\psi_{i3}^2, \Delta\psi_{i13}^1\right) = 0$. Finally, we check that $\mathrm{Cov}\left(\psi_{i3}^2, \Delta\psi_{i13}^2\right) = (p_z-p_z^2)^2\phi_0'\mathrm{Var}(X)\phi_0$ and $\mathrm{Cov}\left(\psi_{i3}^2, \Delta\psi_{i13}^3\right) = (p_z-p_z^2)^2\phi_0'\mathrm{Var}(X)\phi_0$, so that

$$\mathrm{Cov}\left(\psi_{i3}^2, \Delta\psi_{i13}^2 - \Delta\psi_{i13}^3\right) = 0.$$

We have verified that $\mathrm{Cov}(\Delta\psi_{i13},\psi_{i3})=0$, and $\hat\beta_3$ is more efficient than $\hat\beta_1$ asymptotically. Next turn to $\hat\beta_2$ and $\psi_{i2} = (Z_i-p_z)\left(Y_i - \alpha_0 - \beta_0(D_i-p_d) - \eta_0'(X_i-\mu_x)\right)$. We want to show $\mathrm{Var}(\psi_{i3})\le\mathrm{Var}(\psi_{i2})$ by verifying that $\mathrm{Cov}(\Delta\psi_{i23},\psi_{i3})=0$, where

$$\begin{aligned}\Delta\psi_{i23} &= (Z_i-p_z)^2\phi_0'(X_i-\mu_x) - (p_z-p_z^2)\phi_0'(X_i-\mu_x)\\ &= \underbrace{\left((1-2p_z)Z_i + p_z^2\right)\phi_0'(X_i-\mu_x)}_{\Delta\psi_{i23}^1} - \underbrace{(p_z-p_z^2)\phi_0'(X_i-\mu_x)}_{\Delta\psi_{i23}^2}.\end{aligned}$$

By the moment conditions $E\left[W_{i3}\left(Y_i-V_{i3}'\theta_0\right)\right]=0$,

$$\mathrm{Cov}\left(\psi_{i3}^1,\Delta\psi_{i23}^k\right)=0,\qquad k=1,2.$$

By independence between $Z$ and $X$,

$$\mathrm{Cov}\left(\psi_{i3}^2,\Delta\psi_{i23}^1\right) = (p_z-p_z^2)^2\phi_0'\mathrm{Var}(X)\phi_0.$$

Therefore, since also $\mathrm{Cov}\left(\psi_{i3}^2,\Delta\psi_{i23}^2\right) = (p_z-p_z^2)^2\phi_0'\mathrm{Var}(X)\phi_0$, it follows that

$$\mathrm{Cov}\left(\psi_{i3}^2,\Delta\psi_{i23}^1-\Delta\psi_{i23}^2\right)=0.$$

So $\mathrm{Cov}(\Delta\psi_{i23},\psi_{i3})=0$ and $\mathrm{Var}(\psi_{i3})\le\mathrm{Var}(\psi_{i2})$: $\hat\beta_3$ is also more efficient than $\hat\beta_2$. However, there is no efficiency ranking between $\hat\beta_1$ and $\hat\beta_2$. Note that

$$\Delta\psi_{i12} \equiv \psi_{i1}-\psi_{i2} = (Z_i-p_z)\eta_0'(X_i-\mu_x).$$

There is no guarantee of either $\mathrm{Cov}(\Delta\psi_{i12},\psi_{i2})=0$ or $\mathrm{Cov}(\Delta\psi_{i12},\psi_{i1})=0$. This is because the moment conditions for $\hat\beta_2$ do not impose that

$$E\left[ZX\left(Y-\alpha_0-\beta_0D-\eta_0'X\right)\right]=0,$$

and the moment conditions for $\hat\beta_1$ do not impose

$$E\left[ZX\left(Y-\alpha_0-\beta_0D\right)\right]=0\quad\text{or}\quad E\left[X\left(Y-\alpha_0-\beta_0D\right)\right]=0.$$
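The efficiency ranking just established can be illustrated numerically. The sketch below (Python with NumPy; the DGP, sample size, and $p_z=0.2$ are illustrative assumptions, not taken from the paper) compares Monte Carlo variances of the unadjusted, covariate-adjusted, and interacted OLS estimators in the $D=Z$ case:

```python
import numpy as np

# Monte Carlo check of Theorem 1's efficiency ranking under complete
# randomization with D = Z: the interacted estimator (beta3) is weakly
# more efficient than the unadjusted (beta1) and uninteracted (beta2)
# estimators. All DGP choices below are illustrative assumptions.
rng = np.random.default_rng(0)
n, reps, pz = 400, 3000, 0.2
b1, b2, b3 = [], [], []

for _ in range(reps):
    Z = (rng.random(n) < pz).astype(float)
    X = rng.standard_normal(n)
    Y = 1.0 + Z * (1.0 + X) + 2.0 * X + rng.standard_normal(n)  # ATE = 1
    Xc = X - X.mean()
    ones = np.ones(n)
    for cols, store in (
        ([ones, Z], b1),                   # unadjusted
        ([ones, Z, Xc], b2),               # covariate adjusted
        ([ones, Z, Xc, Z * Xc], b3),       # fully interacted
    ):
        R = np.column_stack(cols)
        store.append(np.linalg.lstsq(R, Y, rcond=None)[0][1])

v1, v2, v3 = np.var(b1), np.var(b2), np.var(b3)
print(v1, v2, v3)  # expect v3 < v2 < v1
```

With $p_z\ne 1/2$ and heterogeneous slopes, the simulated variances satisfy $\mathrm{Var}(\hat\beta_3)<\mathrm{Var}(\hat\beta_2)<\mathrm{Var}(\hat\beta_1)$; at $p_z=1/2$ the gap between $\hat\beta_2$ and $\hat\beta_3$ vanishes, consistent with $\Delta\psi_{i23}=(1-2p_z)(Z_i-p_z)\phi_0'(X_i-\mu_x)$.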

## B Proof of Corollary 1

Under the causal model, the parameter $\beta_0$ and the influence functions for $\hat\beta$ can be written using the counterfactuals. Recall that $\beta_0 = E\left[Y_1-Y_0|D_1>D_0\right] = E\left[Y_1-Y_0\right]/E\left[D_1-D_0\right]$. Define $t_1 = Y_1-\beta_0D_1$ and $t_0 = Y_0-\beta_0D_0$. Then

$$\begin{aligned}\alpha_0 - \beta_0p_d &= E[Y]-\beta_0E[D] = p_zEt_1 + (1-p_z)Et_0,\\ \psi_1 &= (Z-p_z)\left(Y-\alpha_0-\beta_0(D-p_d)\right)\\ &= (Z-p_z)\left[(1-p_z)(t_1-Et_1) + p_z(t_0-Et_0)\right] + p_z(1-p_z)(t_1-t_0),\end{aligned}$$

where by definition $Et_1 - Et_0 = 0$. Next consider $\hat\beta_2$. It follows from the third moment equation $E\left[(X-\mu_x)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)\right)\right]=0$ that

$$\eta_0 = \mathrm{Var}(X)^{-1}\mathrm{Cov}\left(X, Y-\beta_0D\right) = \mathrm{Var}(X)^{-1}\left(p_z\mathrm{Cov}(X,t_1) + (1-p_z)\mathrm{Cov}(X,t_0)\right),$$

and that

$$\begin{aligned}\psi_2 &= (Z-p_z)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)\right)\\ &= (Z-p_z)\left[(1-p_z)(t_1-Et_1)+p_z(t_0-Et_0)-\eta_0'(X-\mu_x)\right] + p_z(1-p_z)(t_1-t_0).\end{aligned}$$

Next consider $\hat\beta_3$. It follows from the fourth moment condition

$$E\left[(Z-p_z)(X-\mu_x)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)-\phi_0'(Z-p_z)(X-\mu_x)\right)\right]=0$$

that $\phi_0 = \mathrm{Var}(X)^{-1}\mathrm{Cov}(X, t_1-t_0)$. Therefore,

$$\begin{aligned}\psi_3^1 &= (Z-p_z)\left(Y-\alpha_0-\beta_0(D-p_d)-\eta_0'(X-\mu_x)-\phi_0'(Z-p_z)(X-\mu_x)\right)\\ &= (Z-p_z)\Big[(1-p_z)\left(t_1-Et_1-\mathrm{Cov}(t_1,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\\ &\qquad + p_z\left(t_0-Et_0-\mathrm{Cov}(t_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\Big]\\ &\qquad + p_z(1-p_z)\left(t_1-t_0-\mathrm{Cov}(t_1-t_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\end{aligned}$$

and $\psi_3^2 = p_z(1-p_z)\mathrm{Cov}(t_1-t_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)$. Therefore

$$\begin{aligned}\psi_3 = \psi_3^1 + \psi_3^2 &= (Z-p_z)\Big[(1-p_z)\left(t_1-Et_1-\mathrm{Cov}(t_1,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\\ &\qquad + p_z\left(t_0-Et_0-\mathrm{Cov}(t_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\Big] + p_z(1-p_z)(t_1-t_0).\end{aligned}$$

Using $Z\perp(t_1,t_0,X)$, it can then be verified that

$$\mathrm{Cov}\left(\psi_1-\psi_3,\psi_3\right)=0\quad\text{and}\quad \mathrm{Cov}\left(\psi_2-\psi_3,\psi_3\right)=0.$$

In the special case when $D=Z$, $t_1 = Y_1-\beta_0$, $t_0 = Y_0$, and $\beta_0 = E[Y_1-Y_0]$, then

$$(17)\qquad\begin{aligned}\psi_3 &= (Z-p_z)\Big[(1-p_z)\left(Y_1-EY_1-\mathrm{Cov}(Y_1,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\\ &\qquad + p_z\left(Y_0-EY_0-\mathrm{Cov}(Y_0,X)\mathrm{Var}(X)^{-1}(X-\mu_x)\right)\Big] + p_z(1-p_z)\left(Y_1-Y_0-\beta_0\right),\\ \psi_2 &= (Z-p_z)\left[(1-p_z)(Y_1-EY_1)+p_z(Y_0-EY_0)-\eta_0'(X-\mu_x)\right] + p_z(1-p_z)\left(Y_1-Y_0-\beta_0\right),\end{aligned}$$

for $\eta_0 = \mathrm{Var}(X)^{-1}\left(p_z\mathrm{Cov}(X,Y_1)+(1-p_z)\mathrm{Cov}(X,Y_0)\right)$.

## C Proof of Corollary 2

Replace $\mu_x$ by $\mu_x^s = E[X^s]$. Then it can be shown that $\hat\beta_3$ is more efficient than $\hat\beta_4$. Similar calculations as those for $\hat\beta_3$ show that

$$\sqrt n\left(\hat\beta_4-\beta_0\right) = \mathrm{Cov}(D,Z)^{-1}\frac1{\sqrt n}\sum_{i=1}^n\psi_{i4} + o_P(1),\quad\text{where}$$
$$\psi_{i4} = (Z_i-p_z)\left(Y_i-\rho_0-\beta_0(D_i-p_d)-\eta_0^{s\prime}(X_i^s-\mu_x^s)-\phi_0^{s\prime}(X_i^s-\mu_x^s)(Z_i-p_z)\right) + (p_z-p_z^2)\phi_0^{s\prime}(X_i^s-\mu_x^s).$$

Then we can write, for $\bar\eta_0,\bar\phi_0$ possibly different from both $\eta_0,\phi_0$ and $\eta_0^s,\phi_0^s$,

$$\begin{aligned}\Delta\psi_{i43} = \psi_{i4}-\psi_{i3} &= (Z_i-p_z)\left(\eta_0'(X_i-\mu_x)-\eta_0^{s\prime}(X_i^s-\mu_x^s) + \left(\phi_0'(X_i-\mu_x)-\phi_0^{s\prime}(X_i^s-\mu_x^s)\right)(Z_i-p_z)\right)\\ &\qquad + (p_z-p_z^2)(X_i^s-\mu_x^s)'\phi_0^s - (p_z-p_z^2)(X_i-\mu_x)'\phi_0\\ &= (Z_i-p_z)\left(\bar\eta_0'(X_i-\mu_x)+\bar\phi_0'(X_i-\mu_x)(Z_i-p_z)\right) - (p_z-p_z^2)(X_i-\mu_x)'\bar\phi_0\\ &= \underbrace{(Z_i-p_z)\bar\eta_0'(X_i-\mu_x)}_{\Delta\psi_{i43}^1} + \underbrace{(Z_i-p_z)^2\bar\phi_0'(X_i-\mu_x)}_{\Delta\psi_{i43}^2} - \underbrace{(p_z-p_z^2)(X_i-\mu_x)'\bar\phi_0}_{\Delta\psi_{i43}^3}.\end{aligned}$$

It follows from $Z_i^2=Z_i$ and the instrumental variable moment equations that

$$\mathrm{Cov}\left(\psi_{i3}^1,\Delta\psi_{i43}^k\right)=0,\qquad k=1,2,3.$$

By independence of $Z_i$ and $X_i$, $\mathrm{Cov}\left(\psi_{i3}^2,\Delta\psi_{i43}^1\right)=0$. Finally, we check that

$$\mathrm{Cov}\left(\psi_{i3}^2,\Delta\psi_{i43}^2\right) = (p_z-p_z^2)^2\phi_0'\mathrm{Var}(X)\bar\phi_0$$

and $\mathrm{Cov}\left(\psi_{i3}^2,\Delta\psi_{i43}^3\right) = (p_z-p_z^2)^2\phi_0'\mathrm{Var}(X)\bar\phi_0$, so that

$$\mathrm{Cov}\left(\psi_{i3}^2,\Delta\psi_{i43}^2-\Delta\psi_{i43}^3\right)=0.$$

We have verified that $\mathrm{Cov}(\Delta\psi_{i43},\psi_{i3})=0$, and $\hat\beta_3$ is more efficient than $\hat\beta_4$ asymptotically. The same result can also be verified using the counterfactual model as in Corollary 1.

## D Proof of Corollary 3

Note that $\hat A_k^{-1}\hat B_k\hat A_k^{-1}\xrightarrow{p}\mathrm{Var}\left(\Lambda_kE\left[W_{ik}V_{ik}'\right]^{-1}W_{ik}\left(Y_i-V_{ik}'\theta_{0k}\right)\right)$, where

$$\Lambda_1 = \begin{pmatrix}1 & p_d\\ 0 & 1\end{pmatrix},\qquad \Lambda_2 = \begin{pmatrix}1 & p_d & \mu_x'\\ 0 & 1 & 0\\ 0 & 0 & I\end{pmatrix},\qquad \Lambda_3 = \begin{pmatrix}1 & p_d & \mu_x' & p_z\mu_x'\\ 0 & 1 & 0 & 0\\ 0 & 0 & I & p_zI\\ 0 & 0 & 0 & I\end{pmatrix}.$$

Using the sparse structure of $E\left[W_{ik}V_{ik}'\right]$, the $(2,2)$ elements of $A_k^{-1}B_kA_k^{-1}$ are then given by

$$\mathrm{Var}\left(\mathrm{Cov}(Z_i,D_i)^{-1}(Z_i-p_z)\left(Y_i-V_{ik}'\theta_{0k}\right)\right).$$

For $k=1,2$, this coincides with the asymptotic variance $\sigma_k^2$ in Theorem 1. Theorem 1 also shows the asymptotic variance of $\hat\beta_3$ as

$$\sigma_3^2 = \mathrm{Var}\left(\mathrm{Cov}(Z_i,D_i)^{-1}\left[(Z_i-p_z)\left(Y_i-V_{ik}'\theta_{0k}\right) + p_z(1-p_z)\phi_0'(X_i-\mu_x)\right]\right).$$

By the moment condition $E\left[(Z_i-p_z)(X_i-\mu_x)\left(Y_i-V_{ik}'\theta_{0k}\right)\right]=0$, $\sigma_3^2$ is at least as large as

$$(18)\qquad \mathrm{plim}\,\hat\sigma_3^2 = \mathrm{Var}\left(\mathrm{Cov}(Z_i,D_i)^{-1}(Z_i-p_z)\left(Y_i-V_{ik}'\theta_{0k}\right)\right).$$

A similar calculation shows that $\bar\sigma_3^2\xrightarrow{p}\sigma_3^2$. Of course, one can also bootstrap.
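As a concrete illustration of the last remark, the sketch below bootstraps pairs $(Y_i,Z_i,V_i)$ in the spirit of Freedman (1981) for the interacted estimator in a $D=Z$ design; the bootstrap recovers the full sampling variance that the HC0 robust variance understates. All DGP choices are illustrative assumptions:

```python
import numpy as np

# Pairs bootstrap for the interacted estimator (D = Z). The HC0 robust
# variance misses the p_z(1-p_z) phi_0'(X - mu_x) component; resampling
# (Y_i, Z_i, V_i) jointly does not. Illustrative DGP only.
rng = np.random.default_rng(7)
n, B, pz = 2000, 300, 0.2
Z = (rng.random(n) < pz).astype(float)
V = rng.standard_normal(n)
Y = 1.0 + Z * (1.0 + 3.0 * V) + 2.0 * V + rng.standard_normal(n)

def beta3(Y, Z, V):
    Vc = V - V.mean()
    R = np.column_stack([np.ones(Y.size), Z, Vc, Z * Vc])
    return np.linalg.lstsq(R, Y, rcond=None)[0][1], R

bhat, R = beta3(Y, Z, V)
e = Y - R @ np.linalg.lstsq(R, Y, rcond=None)[0]
RtR_inv = np.linalg.inv(R.T @ R)
naive = (RtR_inv @ (R.T @ (R * e[:, None] ** 2)) @ RtR_inv)[1, 1]  # HC0

draws = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)  # resample pairs with replacement
    draws.append(beta3(Y[idx], Z[idx], V[idx])[0])
boot = np.var(draws)
print(naive, boot)  # bootstrap variance exceeds the naive robust variance
```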

## E Proof of Proposition 1

Consider first the case of $D=Z$. For $\alpha+\beta = E[Y|Z=1]$, $\alpha = E[Y|Z=0]$, and $\mu_x = E[X]$, the moment conditions are $E\left[\phi_i(\alpha_0,\beta_0,\mu_{0x})\right]=0$, where

$$\phi_i(\alpha,\beta,\mu_x) = \left(Z_i(Y_i-\alpha-\beta),\ Z_i(X_i-\mu_x)',\ (1-Z_i)(Y_i-\alpha),\ (1-Z_i)(X_i-\mu_x)'\right)',$$

such that for $A_{11}=\mathrm{Var}(Y_i|Z_i=1)$, $A_{12}=\mathrm{Cov}(Y_i,X_i|Z_i=1)=A_{21}'$, $B_{11}=\mathrm{Var}(Y_i|Z_i=0)$, $B_{12}=\mathrm{Cov}(Y_i,X_i|Z_i=0)=B_{21}'$, and $A_{22}=B_{22}=\mathrm{Var}(X_i)$,

$$(19)\qquad \mathrm{Var}(\phi_i) = \begin{pmatrix}p_zA & 0\\ 0 & (1-p_z)B\end{pmatrix},\qquad A = \begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix},\qquad B = \begin{pmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{pmatrix}.$$

Then $\widehat{\mathrm{Var}}(\phi_i)$ is similar to $\mathrm{Var}(\phi_i)$ with $p_z, A, B$ replaced by $\hat p_z, \hat A, \hat B$.

An application of the partitioned matrix inversion formula shows that the solution to eq. (5) is given by, for $F_2 = \left(A_{22}-A_{21}A_{11}^{-1}A_{12}\right)^{-1}$ and $G_2 = \left(B_{22}-B_{21}B_{11}^{-1}B_{12}\right)^{-1}$,

$$(20)\qquad\begin{aligned}&\frac1n\sum_{i=1}^nZ_i(Y_i-\alpha-\beta) - \hat A_{12}\hat A_{22}^{-1}\frac1n\sum_{i=1}^nZ_i(X_i-\mu_x)=0,\\ &\frac1n\sum_{i=1}^n(1-Z_i)(Y_i-\alpha) - \hat B_{12}\hat B_{22}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)(X_i-\mu_x)=0,\\ &-\hat F_2\hat A_{21}\hat A_{11}^{-1}\frac1n\sum_{i=1}^nZ_i(Y_i-\alpha-\beta) + \hat F_2\frac1n\sum_{i=1}^nZ_i(X_i-\mu_x)\\ &\qquad - \hat G_2\hat B_{21}\hat B_{11}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)(Y_i-\alpha) + \hat G_2\frac1n\sum_{i=1}^n(1-Z_i)(X_i-\mu_x)=0.\end{aligned}$$

Substitute the first two equations into the third and simplify to

$$(21)\qquad \hat A_{22}^{-1}\frac1n\sum_{i=1}^nZ_i(X_i-\mu_x) + \hat B_{22}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)(X_i-\mu_x)=0.$$

Since $\hat A_{22}=\mathrm{Var}(X_i)+O_P\left(\frac1{\sqrt n}\right)=\hat B_{22}+O_P\left(\frac1{\sqrt n}\right)$, this can be used to show that $\hat\mu_x = \bar X + o_P\left(\frac1{\sqrt n}\right)$. And then

$$\begin{aligned}\hat\alpha+\hat\beta &= \left(\sum_{i=1}^nZ_iY_i - \hat A_{12}\hat A_{22}^{-1}\sum_{i=1}^nZ_i(X_i-\bar X)\right)\Big/\sum_{i=1}^nZ_i + o_P\left(\tfrac1{\sqrt n}\right),\\ \hat\alpha &= \left(\sum_{i=1}^n(1-Z_i)Y_i - \hat B_{12}\hat B_{22}^{-1}\sum_{i=1}^n(1-Z_i)(X_i-\bar X)\right)\Big/\sum_{i=1}^n(1-Z_i) + o_P\left(\tfrac1{\sqrt n}\right).\end{aligned}$$

Up to $o_P\left(\frac1{\sqrt n}\right)$ terms, these are the intercept terms in separate regressions of $Y_i$ on $X_i-\bar X$ among the treatment and control groups.
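The separate-regressions characterization can be checked numerically. The sketch below (illustrative DGP) verifies that the interacted regression's $Z$ coefficient is exactly the difference of the intercepts from separate regressions of $Y_i$ on $X_i-\bar X$ in the two arms:

```python
import numpy as np

# Exact algebraic identity in the D = Z case: the interacted regression
# of Y on (1, Z, X - Xbar, Z(X - Xbar)) fits each arm separately, so its
# Z coefficient equals the difference of the two arm-wise intercepts.
# Toy data; the DGP is an illustrative assumption.
rng = np.random.default_rng(1)
n = 500
Z = (rng.random(n) < 0.4).astype(float)
X = rng.standard_normal(n)
Y = 1.0 + 2.0 * Z + X + 0.5 * Z * X + rng.standard_normal(n)
Xc = X - X.mean()

# interacted regression
R = np.column_stack([np.ones(n), Z, Xc, Z * Xc])
beta_int = np.linalg.lstsq(R, Y, rcond=None)[0][1]

# separate regressions of Y on X - Xbar within each arm
def intercept(mask):
    Rm = np.column_stack([np.ones(mask.sum()), Xc[mask]])
    return np.linalg.lstsq(Rm, Y[mask], rcond=None)[0][0]

beta_sep = intercept(Z == 1) - intercept(Z == 0)
print(beta_int, beta_sep)  # identical up to floating point error
```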

These calculations can be extended to the LATE GMM model in eq. (5), where we now define $A_{11}=\mathrm{Var}\left(Y-\alpha_0-\beta_0D|Z=1\right)$, $A_{12}=\mathrm{Cov}\left(Y-\alpha_0-\beta_0D,X|Z=1\right)=A_{21}'$, $B_{11}=\mathrm{Var}\left(Y-\alpha_0-\beta_0D|Z=0\right)$, $B_{12}=\mathrm{Cov}\left(Y-\alpha_0-\beta_0D,X|Z=0\right)=B_{21}'$, $A_{22}=B_{22}=\mathrm{Var}(X)$, and let $\hat A_{jk},\hat B_{jk}$ denote their $\sqrt n$-consistent estimates. Then eqs. (19) and (21) both continue to hold, leading to $\hat\mu_x=\bar X+o_P\left(\frac1{\sqrt n}\right)$. The first two equations in eq. (20) now become

$$\frac1n\sum_{i=1}^nZ_i\left(Y_i-\alpha-\beta D_i\right)-\hat A_{12}\hat A_{22}^{-1}\frac1n\sum_{i=1}^nZ_i(X_i-\mu_x)=0,\qquad \frac1n\sum_{i=1}^n(1-Z_i)\left(Y_i-\alpha-\beta D_i\right)-\hat B_{12}\hat B_{22}^{-1}\frac1n\sum_{i=1}^n(1-Z_i)(X_i-\mu_x)=0.$$

Note that given $\alpha$ and $\beta$, $\hat A_{12}\hat A_{22}^{-1}$ and $\hat B_{12}\hat B_{22}^{-1}$ are precisely the profiled $\hat\phi$ and $\hat\eta$ implied by the estimating eq. (13) for $\beta_3$. In other words, the above two equations are the concentrated estimating equations for $\alpha$ and $\beta$ implied by eq. (13).

## F Proof of Proposition 2

Let $W_{1i}=\left(Z_i, Z_iV_i'\right)'$ and $W_{0i}=\left(1-Z_i, (1-Z_i)V_i'\right)'$. Then the normal equations corresponding to eq. (9) are

$$\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\gamma_1-\hat\vartheta_1'V_i\right)=0,\qquad \frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\gamma_0-\hat\vartheta_0'V_i\right)=0,$$
$$\frac1n\sum_{i=1}^nW_{1i}\left(D_i-\hat\tau_1-\hat\zeta_1'V_i\right)=0,\qquad \frac1n\sum_{i=1}^nW_{0i}\left(D_i-\hat\tau_0-\hat\zeta_0'V_i\right)=0.$$

Taking a linear combination using $\hat\beta_{AL}=\widehat{\mathrm{AvgLATE}}$,

$$\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\gamma_1-\hat\vartheta_1'V_i-\hat\beta_{AL}\left(D_i-\hat\tau_1-\hat\zeta_1'V_i\right)\right)=0,\qquad \frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\gamma_0-\hat\vartheta_0'V_i-\hat\beta_{AL}\left(D_i-\hat\tau_0-\hat\zeta_0'V_i\right)\right)=0.$$

We rearrange this into

$$\frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\left(\hat\gamma_1-\hat\beta_{AL}\hat\tau_1+\left(\hat\vartheta_1-\hat\beta_{AL}\hat\zeta_1\right)'\bar V\right)-\hat\beta_{AL}D_i-\left(\hat\vartheta_1-\hat\beta_{AL}\hat\zeta_1\right)'\left(V_i-\bar V\right)\right)=0,$$
$$\frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\left(\hat\gamma_0-\hat\beta_{AL}\hat\tau_0+\left(\hat\vartheta_0-\hat\beta_{AL}\hat\zeta_0\right)'\bar V\right)-\hat\beta_{AL}D_i-\left(\hat\vartheta_0-\hat\beta_{AL}\hat\zeta_0\right)'\left(V_i-\bar V\right)\right)=0.$$

By the definition in eq. (10),

$$\hat\nu = \hat\gamma_1-\hat\beta_{AL}\hat\tau_1+\left(\hat\vartheta_1-\hat\beta_{AL}\hat\zeta_1\right)'\bar V = \hat\gamma_0-\hat\beta_{AL}\hat\tau_0+\left(\hat\vartheta_0-\hat\beta_{AL}\hat\zeta_0\right)'\bar V.$$

The normal equations therefore take the form of

$$(22)\qquad \frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\nu-\hat\beta_{AL}D_i-\left(\hat\vartheta_1-\hat\beta_{AL}\hat\zeta_1\right)'\left(V_i-\bar V\right)\right)=0,\qquad \frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\nu-\hat\beta_{AL}D_i-\left(\hat\vartheta_0-\hat\beta_{AL}\hat\zeta_0\right)'\left(V_i-\bar V\right)\right)=0.$$

Next, consider the normal equations determining the interacted $\hat\beta$. For $W_i=\left(W_{1i}',W_{0i}'\right)'$,

$$\frac1n\sum_{i=1}^nW_i\left(Y_i-\hat\alpha-\hat\beta D_i-\hat\eta'\left(V_i-\bar V\right)-\hat\phi'Z_i\left(V_i-\bar V\right)\right)=0.$$

This can be rewritten as

$$(23)\qquad \frac1n\sum_{i=1}^nW_{1i}\left(Y_i-\hat\alpha-\hat\beta D_i-\left(\hat\eta+\hat\phi\right)'\left(V_i-\bar V\right)\right)=0,\qquad \frac1n\sum_{i=1}^nW_{0i}\left(Y_i-\hat\alpha-\hat\beta D_i-\hat\eta'\left(V_i-\bar V\right)\right)=0.$$

Then eq. (23) can be satisfied through eq. (22) by setting

$$\hat\alpha=\hat\nu,\qquad \hat\beta=\hat\beta_{AL},\qquad \hat\eta=\hat\vartheta_0-\hat\beta_{AL}\hat\zeta_0,\qquad \hat\phi=\hat\vartheta_1-\hat\beta_{AL}\hat\zeta_1-\hat\eta.$$
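Proposition 2's algebraic equivalence can be verified directly. In the sketch below (illustrative noncompliance DGP), the AvgLATE-type estimator is computed as the ratio of arm-wise regression predictions of $Y$ and $D$ evaluated at $\bar V$, which is how the $\hat\nu$ equality above pins it down; it coincides with the interacted 2SLS coefficient up to floating point error:

```python
import numpy as np

# Interacted 2SLS (regressors (1, D, V - Vbar, Z(V - Vbar)), instruments
# (1, Z, V - Vbar, Z(V - Vbar))) versus the ratio of arm-wise predictions
# at Vbar. Illustrative noncompliance DGP.
rng = np.random.default_rng(2)
n = 1000
Z = (rng.random(n) < 0.5).astype(float)
V = rng.standard_normal(n)
U = rng.random(n)
D = np.where(Z == 1, U < 0.8, U < 0.3).astype(float)  # one-sided types
Y = 1.0 + 2.0 * D + V + rng.standard_normal(n)
Vc = V - V.mean()

# interacted 2SLS: solve (W'R) theta = W'Y
R = np.column_stack([np.ones(n), D, Vc, Z * Vc])
W = np.column_stack([np.ones(n), Z, Vc, Z * Vc])
beta_2sls = np.linalg.solve(W.T @ R, W.T @ Y)[1]

# arm-wise OLS fits of Y and D on (1, V), evaluated at V = Vbar
def fit_at_mean(dep, mask):
    Rm = np.column_stack([np.ones(mask.sum()), V[mask]])
    g = np.linalg.lstsq(Rm, dep[mask], rcond=None)[0]
    return g[0] + g[1] * V.mean()

beta_AL = ((fit_at_mean(Y, Z == 1) - fit_at_mean(Y, Z == 0))
           / (fit_at_mean(D, Z == 1) - fit_at_mean(D, Z == 0)))
print(beta_2sls, beta_AL)
```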

## G Proof of Proposition 3

When $D=Z$, Hahn (1998) shows that $\sigma^2=\mathrm{Var}(\psi)$, where

$$\begin{aligned}\psi &= \frac{D}{p}\left(Y_1-E[Y_1|X]\right)-\frac{1-D}{1-p}\left(Y_0-E[Y_0|X]\right)+E[Y_1-Y_0|X]-E[Y_1-Y_0]\\ &= (D-p)\left\{\frac{Y_1-E[Y_1|X]}{p}+\frac{Y_0-E[Y_0|X]}{1-p}\right\}+Y_1-Y_0-E[Y_1-Y_0].\end{aligned}$$

We can then use $\psi_3$ in the proof of Corollary 1 to show that

$$\mathrm{Cov}\left(\psi_3/\left(p_z(1-p_z)\right)-\psi,\ \psi\right)=0.$$

More generally, when $Z\ne D$, the LATE efficiency bound was calculated in Frolich (2006) and Hong and Nekipelov (2010) (Lemma 1 and Theorem 4), with $\sigma^2=\mathrm{Var}(\psi)$ and

$$\begin{aligned}\psi = \frac{1}{P(D_1>D_0)}\Big\{&\frac{Z}{p_z}\left(Y-E[Y|Z=1,X]\right)+E[Y|Z=1,X]-\frac{1-Z}{1-p_z}\left(Y-E[Y|Z=0,X]\right)-E[Y|Z=0,X]\\ &-\Big(\frac{Z}{p_z}\left(D-E[D|Z=1,X]\right)+E[D|Z=1,X]-\frac{1-Z}{1-p_z}\left(D-E[D|Z=0,X]\right)-E[D|Z=0,X]\Big)\beta\Big\},\end{aligned}$$

where $P(D_1>D_0)=P(D=1|Z=1)-P(D=1|Z=0)$. We can rewrite this as

$$(24)\qquad P(D_1>D_0)\psi = \frac{Z}{p_z}\left(t_1-E[t_1|X]\right)-\frac{1-Z}{1-p_z}\left(t_0-E[t_0|X]\right)+E[t_1-t_0|X] = (Z-p_z)\left\{\frac{t_1-E[t_1|X]}{p_z}+\frac{t_0-E[t_0|X]}{1-p_z}\right\}+t_1-t_0.$$

Again, comparing this to $\psi_3$ in the proof of Corollary 1 shows that

$$\mathrm{Cov}\left(\psi_3/\left(P(D_1>D_0)p_z(1-p_z)\right)-\psi,\ \psi\right)=0.$$

The comparison between $\psi_2$ and $\psi_3$ in eq. (17) can also be understood in the context of doubly robust estimators, which use influence functions of a form similar to $\psi$ but without requiring $p_z$ to be constant. Define $Q(X)\equiv P(Z=1|X)$. In the case of $D=Z$,

$$\begin{aligned}\phi &= \frac{D}{Q(X)}\left(Y-E[Y_1|X]\right)-\frac{1-D}{1-Q(X)}\left(Y-E[Y_0|X]\right)+E[\Delta Y|X]-E[\Delta Y]\\ &= (D-Q(X))\left\{\frac{Y_1-E[Y_1|X]}{Q(X)}+\frac{Y_0-E[Y_0|X]}{1-Q(X)}\right\}+Y_1-Y_0-\beta.\end{aligned}$$

The estimators with influence function $\phi$ are consistent as long as either $Q(X)$ or the pair $E[Y_1|X], E[Y_0|X]$ is correctly specified. Under complete randomization and with $Q(X)$ specified as $p_z$, the propensity score model is obviously correctly specified. Therefore $E[Y_1|X]$ and $E[Y_0|X]$, being linear projections on $(1, V(X))$, have no effect on consistency. However, between two misspecified conditional mean models, the first pair in $\psi_3$ is a more efficient projection that induces a smaller variance than the linear projection in $\psi_2$. Similarly, in the general LATE case when $D\ne Z$, doubly robust estimators use influence functions of the form

$$\phi = (Z-Q(X))\left\{\frac{t_1-E[t_1|X]}{Q(X)}+\frac{t_0-E[t_0|X]}{1-Q(X)}\right\}+t_1-t_0,$$

where $E[t_1|X]=E[Y_1|X]-\beta_0E[D_1|X]$ and $E[t_0|X]=E[Y_0|X]-\beta_0E[D_0|X]$. These estimators are consistent as long as either $Q(X)$ or the set of

$$E[Y_1|X],\quad E[Y_0|X],\quad E[D_1|X],\quad E[D_0|X]$$

is correctly specified. Among different misspecified linear approximations to $E[t_1|X]$ and $E[t_0|X]$, the least squares projection is more efficient.

Similar to eqs. (3) and (4), $\sigma^2$ can be consistently estimated under suitable regularity conditions (such as those in Newey (1997)) by

$$\bar\sigma^2 = \widehat{\mathrm{Cov}}(Z,D)^{-2}\frac1n\sum_{i=1}^n\bar\epsilon_i^2,\qquad\text{where}\qquad \bar\epsilon_i = \left(Z_i-\hat p_z\right)\hat\epsilon_i+\hat p_z\left(1-\hat p_z\right)\hat\phi'\left(V_i-\bar V\right)$$

and $\hat\epsilon_i = Y_i-\hat\alpha-\hat\beta D_i-\hat\eta'\left(V_i-\bar V\right)-\hat\phi'Z_i\left(V_i-\bar V\right)$. If we write

$$Y_i-\hat\beta D_i = \left(1-Z_i\right)\left(\hat\alpha+\hat\eta'\left(V_i-\bar V\right)\right)+Z_i\left(\hat\alpha+\left(\hat\eta+\hat\phi\right)'\left(V_i-\bar V\right)\right)+\hat\epsilon_i,$$

then we expect that, uniformly in $X_i$,

$$\begin{aligned}\hat\alpha+\hat\eta'\left(V_i-\bar V\right) &= E\left[Y-\beta D|Z=0,X_i\right]+o_P(1) = E\left[t_{0i}|X_i\right]+o_P(1),\\ \hat\alpha+\left(\hat\eta+\hat\phi\right)'\left(V_i-\bar V\right) &= E\left[Y-\beta D|Z=1,X_i\right]+o_P(1) = E\left[t_{1i}|X_i\right]+o_P(1).\end{aligned}$$

Therefore $\hat\phi'\left(V_i-\bar V\right)=E\left[t_{1i}-t_{0i}|X_i\right]+o_P(1)$, $\hat\epsilon_i = Z_i\left(t_{1i}-E[t_{1i}|X_i]\right)+\left(1-Z_i\right)\left(t_{0i}-E[t_{0i}|X_i]\right)+o_P(1)$, and

$$\bar\epsilon_i = \left(Z_i-p_z\right)\left[\left(1-p_z\right)\left(t_{1i}-E[t_{1i}|X_i]\right)+p_z\left(t_{0i}-E[t_{0i}|X_i]\right)\right]+p_z\left(1-p_z\right)\left(t_{1i}-t_{0i}\right)+o_P(1),$$

which coincides with the semiparametric asymptotic influence function, and includes the CI (conditional independence) model as a special case when $D=Z$.
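The correction $\bar\epsilon_i$ above is easy to implement. The sketch below ($D=Z$; the DGP and sample sizes are illustrative assumptions) contrasts the HC0 robust variance, the corrected variance $\bar\sigma^2/n$, and the Monte Carlo variance of the interacted estimator:

```python
import numpy as np

# The HC0 robust variance understates Var(beta3_hat); the corrected
# residual eps_bar_i = (Z_i - pz_hat) eps_hat_i
#                      + pz_hat (1 - pz_hat) phi_hat' (V_i - Vbar)
# yields a consistent variance estimate. Illustrative DGP only.
rng = np.random.default_rng(3)
n, reps, pz = 500, 2000, 0.2
betas, naive_vars, corr_vars = [], [], []

for _ in range(reps):
    Z = (rng.random(n) < pz).astype(float)
    V = rng.standard_normal(n)
    Y = 1.0 + Z * (1.0 + 3.0 * V) + 2.0 * V + rng.standard_normal(n)
    Vc = V - V.mean()
    R = np.column_stack([np.ones(n), Z, Vc, Z * Vc])
    RtR_inv = np.linalg.inv(R.T @ R)
    theta = RtR_inv @ (R.T @ Y)
    e = Y - R @ theta
    betas.append(theta[1])

    # naive HC0 robust variance of the Z coefficient
    meat = R.T @ (R * e[:, None] ** 2)
    naive_vars.append((RtR_inv @ meat @ RtR_inv)[1, 1])

    # analytic correction: sigma_bar^2 / n
    pzh = Z.mean()
    phi = theta[3]
    ebar = (Z - pzh) * e + pzh * (1 - pzh) * phi * Vc
    covZD = pzh * (1 - pzh)   # Cov_n(Z, D) when D = Z
    corr_vars.append(np.mean(ebar ** 2) / covZD ** 2 / n)

emp = np.var(betas)
print(emp, np.mean(naive_vars), np.mean(corr_vars))
```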

## H Proof of Proposition 4

Recall that $\sqrt n\left(\hat\beta_1-\beta_0\right)=\mathrm{Cov}_n(Z,D)^{-1}\sqrt n\,\mathrm{Cov}_n\left(Z,Y-D\beta_0\right)$. It can be shown that

$$\begin{aligned}\mathrm{Cov}_n(Z,D) &= \frac1n\sum_{i=1}^nZ_iD_i-\left(\frac1n\sum_{i=1}^nZ_i\right)\left(\frac1n\sum_{i=1}^nD_i\right)\\ &= \hat p_z\left(1-\hat p_z\right)\left[\frac{\frac1n\sum_{i=1}^nZ_iD_{1i}}{\hat p_z}-\frac{\frac1n\sum_{i=1}^n\left(1-Z_i\right)D_{0i}}{1-\hat p_z}\right] = p_z\left(1-p_z\right)P\left(D_1>D_0\right)+o_P(1),\end{aligned}$$

where the last line follows from Assumption 8.2. Furthermore,

$$\mathrm{Cov}_n(Y,Z) = \frac1n\sum_{i=1}^nZ_iY_i-\left(\frac1n\sum_{i=1}^nZ_i\right)\left(\frac1n\sum_{i=1}^nY_i\right) = \hat p_z\left(1-\hat p_z\right)\left[\frac{\frac1n\sum_{i=1}^nZ_iY_{1i}}{\hat p_z}-\frac{\frac1n\sum_{i=1}^n\left(1-Z_i\right)Y_{0i}}{1-\hat p_z}\right].$$

Next we consider

$$\begin{aligned}\sqrt n\left(\mathrm{Cov}_n(Y,Z)-\mathrm{Cov}_n(D,Z)\beta_0\right) &= \hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left[\frac{Z_it_{1i}}{\hat p_z}-\frac{\left(1-Z_i\right)t_{0i}}{1-\hat p_z}\right]\\ &= \hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-\hat p_z\right)\left[\frac{t_{1i}}{\hat p_z}+\frac{t_{0i}}{1-\hat p_z}\right]+\hat p_z\left(1-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\\ &= p_z\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left[\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}\right]+p_z\left(1-p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+R_n,\end{aligned}$$

where

$$\begin{aligned}R_n &= \left(p_z-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(t_{1i}-t_{0i}\right)+\left(p_z-\hat p_z\right)^2\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\\ &\quad+\left(p_z-\hat p_z\right)\frac1{\sqrt n}\sum_{i=1}^n\left[\left(1-p_z\right)\left(t_{1i}-Et_{1i}\right)+p_z\left(t_{0i}-Et_{0i}\right)\right]+\left(\hat p_z\left(1-\hat p_z\right)-p_z\left(1-p_z\right)\right)\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right).\end{aligned}$$

Using Assumption 8, each term can be shown to be $o_P(1)$, so that $R_n=o_P(1)$. From this point on the variance becomes different depending on whether $S=1$ or $S>1$. Recall that $\omega_i=\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}$ and $\omega(s)=E\left[\omega_i|X_i\in s\right]$. First note that $\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)$ is asymptotically orthogonal to $\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\omega_i$ under Assumptions 8.1 and 8.2.

$$\begin{aligned}&\mathrm{Cov}\left(\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega_i,\ \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{1i}-t_{0i}\right)\right)\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega_i\left(t_{1i}-t_{0i}\right)\right]\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(E\left[Z_i|X_i\in s,Y_{1i},Y_{0i},D_{1i},D_{0i}\right]-p_z\right)\omega_i\left(t_{1i}-t_{0i}\right)\right]\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(E\left[Z_i|X_i\in s\right]-p_z\right)\omega_i\left(t_{1i}-t_{0i}\right)\right] = O_{a.s.}\left(\frac1n\right).\end{aligned}$$

Then write the first part of the influence function as

$$(25)\qquad \frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\omega_i = \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s).$$

First note that the two sums are orthogonal:

$$\begin{aligned}&\mathrm{Cov}\left(\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right),\ \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s)\right)\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^n\mathrm{Cov}\left(1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right),\ 1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s)\right)\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\omega_i-E\left[\omega_i|X_i\in s,Z_i\right]\right)E\left[\omega_i|X_i\in s,Z_i\right]\right]\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\omega_i-E\left[\omega_i|X_i\in s\right]\right)\right]E\left[\omega_i|X_i\in s\right] = 0.\end{aligned}$$

We now use arguments similar to those in Lemma B.2 of BCS 2017a to derive the limiting distribution of eq. (25). The distribution of $U=\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)$ is the same as the distribution of the same quantity where the observations are first ordered by strata and then by $Z_i=1$ and $Z_i=0$ within strata. Let $n_z(s)$ be the number of observations in stratum $s$ which have $Z_i=z\in\{0,1\}$, and let $p(s)=P\left(X_i\in s\right)$, $N(s)=\sum_{i=1}^nI\left(S_i<s\right)$, and $F(s)=P\left(S_i<s\right)$. Independently for each $s$ and independently of $\left(Z^{(n)},S^{(n)}\right)$, let $\left\{\omega_i^s: 1\le i\le n\right\}$ be i.i.d. with marginal distribution equal to the distribution of $\omega_i|X_i\in s$. Define

$$\tilde U = \frac1{\sqrt n}\sum_{s\in S}\left[\sum_{i=N(s)+1}^{N(s)+n_1(s)}\left(\omega_i^s-\omega(s)\right)\left(1-p_z\right)-\sum_{i=N(s)+n_1(s)+1}^{N(s)+n(s)}\left(\omega_i^s-\omega(s)\right)p_z\right],$$

where $n(s)=n_0(s)+n_1(s)$. By construction, $U|S^{(n)},Z^{(n)}\stackrel{d}{=}\tilde U|S^{(n)},Z^{(n)}$, which implies $U\stackrel{d}{=}\tilde U$. Next define

$$U^* = \frac1{\sqrt n}\sum_{s\in S}\left[\sum_{i=nF(s)+1}^{n\left(F(s)+p(s)p_z\right)}\left(\omega_i^s-\omega(s)\right)\left(1-p_z\right)-\sum_{i=n\left(F(s)+p(s)p_z\right)+1}^{n\left(F(s)+p(s)\right)}\left(\omega_i^s-\omega(s)\right)p_z\right].$$

Using properties of Brownian motion,

$$\frac1{\sqrt n}\sum_{i=nF(s)+1}^{n\left(F(s)+p(s)p_z\right)}\left(\omega_i^s-\omega(s)\right)\left(1-p_z\right)\xrightarrow{d}N\left(0,\ p(s)p_z\left(1-p_z\right)^2E\left[\left(\omega_i^s-\omega(s)\right)^2\right]\right),$$
$$\frac1{\sqrt n}\sum_{i=n\left(F(s)+p(s)p_z\right)+1}^{n\left(F(s)+p(s)\right)}\left(\omega_i^s-\omega(s)\right)p_z\xrightarrow{d}N\left(0,\ p(s)\left(1-p_z\right)p_z^2E\left[\left(\omega_i^s-\omega(s)\right)^2\right]\right).$$

Since the two sums are independent, $\omega_i^s-\omega(s)$ are independent across $i$ and $s$, and $E\left[\left(\omega_i^s-\omega(s)\right)^2\right]=E\left[\left(\omega_i-\omega(s)\right)^2|X_i\in s\right]$,

$$U^*\xrightarrow{d}N\left(0,\ p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2|X_i\in s\right]\right).$$

Furthermore, since $\frac{N(s)}{n}\xrightarrow{p}F(s)$ and $\frac{n_1(s)}{n}\xrightarrow{p}p_zp(s)$, by the continuous mapping theorem,

$$\tilde U-U^*\xrightarrow{p}0.$$

Therefore,

$$\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)\xrightarrow{d}N\left(0,\ \underbrace{p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2|X_i\in s\right]}_{\Omega_1}\right).$$

For the second term, it suffices to use Assumption 8.2 to show that

$$\frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\omega(s) = \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\omega(s)\xrightarrow{d}N\left(0,\ \underbrace{\sum_{s\in S}\tau(s)p(s)\omega(s)^2}_{\Omega_2}\right).$$

Lastly, note that $\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)\xrightarrow{d}N\left(0,\ \underbrace{\mathrm{Var}\left(t_{1i}-t_{0i}\right)}_{\Omega_3}\right)$. Then $\sqrt n\left(\hat\beta_1-\beta_0\right)\xrightarrow{d}N\left(0,\ P\left(D_1>D_0\right)^{-2}\left(\Omega_1+\Omega_2+\Omega_3\right)\right)$.

As in Section 2, it is straightforward to show that the 2SLS robust variance is consistent for $P\left(D_1>D_0\right)^{-2}$ times

$$\begin{aligned}\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(\left(Z_i-p_z\right)\omega_i+t_{1i}-t_{0i}\right)^2 &= \mathrm{plim}\,\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\omega_i^2+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)^2\\ &= \mathrm{plim}\,\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)^2+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\omega(s)^2+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)^2.\end{aligned}$$

Independently for each $s$ and independently of $\left(Z^{(n)},S^{(n)}\right)$, let $\left\{\omega_i^s:1\le i\le n\right\}$ be i.i.d. with marginal distribution equal to the distribution of $\omega_i|X_i\in s$. Using similar arguments as those in Lemma B.3 of BCS 2017a,

$$\begin{aligned}\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)^2 &= \sum_{s\in S}\left[\frac1n\sum_{i=1}^{n_1(s)}\left(1-p_z\right)^2\left(\omega_i^s-\omega(s)\right)^2+\frac1n\sum_{i=1}^{n_0(s)}p_z^2\left(\omega_i^s-\omega(s)\right)^2\right]\\ &= \sum_{s\in S}\left[\frac{n_1(s)}{n}\frac1{n_1(s)}\sum_{i=1}^{n_1(s)}\left(1-p_z\right)^2\left(\omega_i^s-\omega(s)\right)^2+\frac{n_0(s)}{n}\frac1{n_0(s)}\sum_{i=1}^{n_0(s)}p_z^2\left(\omega_i^s-\omega(s)\right)^2\right]\\ &\xrightarrow{p}\sum_{s\in S}\left[p_zp(s)\left(1-p_z\right)^2E\left[\left(\omega_i^s-\omega(s)\right)^2\right]+\left(1-p_z\right)p(s)p_z^2E\left[\left(\omega_i^s-\omega(s)\right)^2\right]\right]\\ &= p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2|X_i\in s\right].\end{aligned}$$

The key steps are to use the almost sure representation theorem to construct $\frac{\tilde n_1(s)}{n}\stackrel{d}{=}\frac{n_1(s)}{n}$ such that $\frac{\tilde n_1(s)}{n}\xrightarrow{a.s.}p_zp(s)$, and then to note that by independence of $\left(Z^{(n)},S^{(n)}\right)$ and $\left\{\omega_i^s:1\le i\le n\right\}$, for any $\epsilon>0$,

$$P\left(\left|\frac1{n_1(s)}\sum_{i=1}^{n_1(s)}\left(\omega_i^s-\omega(s)\right)^2-E\left[\left(\omega_i^s-\omega(s)\right)^2\right]\right|>\epsilon\right) = E\left[P\left(\left|\frac1{n\frac{\tilde n_1(s)}{n}}\sum_{i=1}^{n\frac{\tilde n_1(s)}{n}}\left(\omega_i^s-\omega(s)\right)^2-E\left[\left(\omega_i^s-\omega(s)\right)^2\right]\right|>\epsilon\ \Bigg|\ \frac{\tilde n_1(s)}{n}\right)\right].$$

Also, note that by the weak law of large numbers, for any sequence $n_k\to\infty$ as $k\to\infty$,

$$\frac1{n_k}\sum_{i=1}^{n_k}\left(\omega_i^s-\omega(s)\right)^2\xrightarrow{p}E\left[\left(\omega_i^s-\omega(s)\right)^2\right].$$

Since $n\frac{\tilde n_1(s)}{n}\to\infty$ almost surely, by independence of $\frac{\tilde n_1(s)}{n}$ and $\left\{\omega_i^s:1\le i\le n\right\}$,

$$P\left(\left|\frac1{n\frac{\tilde n_1(s)}{n}}\sum_{i=1}^{n\frac{\tilde n_1(s)}{n}}\left(\omega_i^s-\omega(s)\right)^2-E\left[\left(\omega_i^s-\omega(s)\right)^2\right]\right|>\epsilon\ \Bigg|\ \frac{\tilde n_1(s)}{n}\right)\xrightarrow{a.s.}0.$$

Therefore, the first and third terms coincide with $\Omega_1$ and $\Omega_3$. The second term converges to

$$\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(Z_i-p_z\right)^2\omega(s)^2 = \sum_{s=1}^S\omega(s)^2p(s)p_z\left(1-p_z\right).$$

This is larger than $\Omega_2$ as long as $\tau(s)\le p_z\left(1-p_z\right)$ for all $s\in S$, and strictly so for some $s\in S$.
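The direction of the distortion can be seen in a small simulation: with exact within-stratum balance, $\tau(s)=0<p_z(1-p_z)$, so the i.i.d.-based robust variance is conservative for the unadjusted estimator. The sketch below takes $D=Z$; all DGP choices are illustrative assumptions:

```python
import numpy as np

# Stratified (covariate-adaptive) assignment with exactly half treated
# per stratum: the usual HC0 robust variance of the unadjusted OLS
# estimator overstates its true sampling variance. Illustrative DGP.
rng = np.random.default_rng(4)
n_per, reps = 200, 2000
betas, rvars = [], []

for _ in range(reps):
    Zs, Ys = [], []
    for s in (0, 1):
        z = np.zeros(n_per)
        z[rng.permutation(n_per)[: n_per // 2]] = 1.0  # exactly half treated
        y = 1.0 + z + 4.0 * s + rng.standard_normal(n_per)
        Zs.append(z)
        Ys.append(y)
    Z = np.concatenate(Zs)
    Y = np.concatenate(Ys)
    n = Z.size
    R = np.column_stack([np.ones(n), Z])
    RtR_inv = np.linalg.inv(R.T @ R)
    theta = RtR_inv @ (R.T @ Y)
    e = Y - R @ theta
    betas.append(theta[1])
    meat = R.T @ (R * e[:, None] ** 2)
    rvars.append((RtR_inv @ meat @ RtR_inv)[1, 1])  # HC0

emp = np.var(betas)
print(emp, np.mean(rvars))  # robust variance is conservative here
```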

## I Proof of Proposition 5

The sample normal equations for this regression are given by

$$\tau_n\left(\hat\beta_2,\hat\eta\right) = \frac1n\sum_{i=1}^n\begin{pmatrix}\left(1\left(X_i\in s\right)\right)_{s\in S}\\ Z_i-p_z\end{pmatrix}\left(Y_i-\hat\beta_2D_i-\sum_{s=1}^S\hat\eta_s1\left(X_i\in s\right)\right)=0.$$

We can write $\left(\hat\beta_2-\beta_0,\hat\eta-\eta_0\right)=\hat A^{-1}\tau_n\left(\beta_0,\eta_0\right)$ if we let $\eta_0=\left(\eta_{0s},s\in S\right)$, $t_1(s)=E\left[t_{1i}|X_i\in s\right]$, $t_0(s)=E\left[t_{0i}|X_i\in s\right]$, and

$$\eta_{0s} = E\left[Y|s\right]-E\left[D|s\right]\beta_0 = p_zt_1(s)+\left(1-p_z\right)t_0(s) = \left(1-p_z\right)t_1(s)+p_zt_0(s)-\left(1-2p_z\right)\left(t_1(s)-t_0(s)\right),$$

and

$$\hat A = \frac1n\sum_{i=1}^n\begin{pmatrix}\left(1\left(X_i\in s\right)\right)_{s\in S}D_i & \mathrm{diag}\left(\left(1\left(X_i\in s\right)\right)_{s\in S}\right)\\ \left(Z_i-p_z\right)D_i & \left(Z_i-p_z\right)\left(1\left(X_i\in s\right)\right)_{s\in S}'\end{pmatrix}.$$

Using Assumptions 8.1 and 8.2 we can show that $\hat A=A+o_P(1)$, where

$$A = \begin{pmatrix}\left(p(s)E\left[D|s\right]\right)_{s\in S} & \mathrm{diag}\left(p(s),s\in S\right)\\ p_z\left(1-p_z\right)P\left(D_1>D_0\right) & 0\end{pmatrix}.$$

In the following we will show that $\tau_n\left(\beta_0,\eta_0\right)=O_P\left(\frac1{\sqrt n}\right)$, which by non-singularity of $A$ implies that $\left(\hat\beta_2-\beta_0,\hat\eta-\eta_0\right)=O_P\left(\frac1{\sqrt n}\right)$. Then the second row of the relation

$$\left(A+o_P(1)\right)\left(\hat\beta_2-\beta_0,\hat\eta-\eta_0\right)=\tau_n\left(\beta_0,\eta_0\right)$$

implies that, using the above $\eta_{0s}$,

$$\begin{aligned}P\left(D_1>D_0\right)\sqrt n\left(\hat\beta_2-\beta_0\right) &= \frac1{\sqrt n}\sum_{i=1}^n\frac{Z_i-p_z}{p_z\left(1-p_z\right)}\left(Y_i-\beta_0D_i-\sum_{s=1}^S\eta_{0s}1\left(X_i\in s\right)\right)+o_P(1)\\ &= \frac1{\sqrt n}\sum_{i=1}^n\left(Z_i-p_z\right)\left[\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}-\sum_{s\in S}\left(\frac{E\left[t_{1i}|X_i\in s\right]-Et_{1i}}{p_z}+\frac{E\left[t_{0i}|X_i\in s\right]-Et_{0i}}{1-p_z}-\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_1(s)-t_0(s)\right)\right)1\left(X_i\in s\right)\right]\\ &\qquad+\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+o_P(1)\\ &= \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_1(s)-t_0(s)\right)\\ &\qquad+\frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+o_P(1),\end{aligned}$$

where we recall that $\omega_i=\frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}$ and $\omega(s)=E\left[\omega_i|X_i\in s\right]$. Using similar arguments to those in Proposition 4,

$$\begin{aligned}\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)&\xrightarrow{d}N\left(0,\ \underbrace{p_z\left(1-p_z\right)\sum_{s\in S}p(s)E\left[\left(\omega_i-\omega(s)\right)^2|X_i\in s\right]}_{\Omega_1}\right),\\ \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_1(s)-t_0(s)\right)&\xrightarrow{d}N\left(0,\ \underbrace{\sum_{s\in S}p(s)\tau(s)\left(\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_1(s)-t_0(s)\right)\right)^2}_{\bar\Omega_2}\right),\\ \frac1{\sqrt n}\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)&\xrightarrow{d}N\left(0,\ \underbrace{\mathrm{Var}\left(t_{1i}-t_{0i}\right)}_{\Omega_3}\right).\end{aligned}$$

Note that the first two sums in the influence function are orthogonal:

$$\begin{aligned}&\mathrm{Cov}\left(\sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right),\ \sum_{s\in S}\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_1(s)-t_0(s)\right)\right)\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_1(s)-t_0(s)\right)\right]\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^nE\left[1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\omega_i-E\left[\omega_i|X_i\in s\right]\right)\right]\frac{1-2p_z}{p_z\left(1-p_z\right)}\left(t_1(s)-t_0(s)\right)=0.\end{aligned}$$

And the third sum is orthogonal to the first two sums by the same arguments as in Proposition 4. Therefore, $P\left(D_1>D_0\right)\sqrt n\left(\hat\beta_2-\beta_0\right)\xrightarrow{d}N\left(0,\Omega_1+\bar\Omega_2+\Omega_3\right)$. It is also easy to show, using similar arguments to those in Proposition 4, that the 2SLS nominal variance consistently estimates $P\left(D_1>D_0\right)^{-2}$ times

$$\begin{aligned}&\mathrm{plim}\,\frac1n\sum_{i=1}^n\left[\frac{Z_i-p_z}{p_z\left(1-p_z\right)}\left(Y_i-\beta_0D_i-\sum_{s=1}^S\eta_{0s}1\left(X_i\in s\right)\right)\right]^2\\ &= \mathrm{plim}\,\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\omega_i-\omega(s)\right)^2+\mathrm{plim}\,\sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)^2\left(\frac{1-2p_z}{p_z\left(1-p_z\right)}\right)^2\left(t_1(s)-t_0(s)\right)^2\\ &\qquad+\mathrm{plim}\,\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)^2 = \Omega_1+\tilde\Omega_2+\Omega_3,\end{aligned}$$

where

$$\tilde\Omega_2 = \sum_{s\in S}p(s)p_z\left(1-p_z\right)\left(\frac{1-2p_z}{p_z\left(1-p_z\right)}\right)^2\left(t_1(s)-t_0(s)\right)^2,$$

which is larger than $\bar\Omega_2$ if $p_z\left(1-p_z\right)>\tau(s)$ for some $s$, unless $S=1$ or $p_z=\frac12$.

## J Proof of Proposition 6

We choose to work with the representation in eq. (11), using which we write

$$(26)\qquad \sqrt n\left(\hat\beta_3-\beta_0\right) = \frac{\sqrt n\sum_{s=1}^S\left(\hat\xi_{1s}-\hat\xi_{0s}-\beta_0\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n}}{\sum_{s=1}^S\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n}}.$$

For the denominator, under Assumption 8, Lemma B.3 of BCS 2017a implies that

$$\hat\zeta_{1s} = \frac{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iD_i}{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_i}\xrightarrow{p}P\left(D_1=1|s\right),\qquad \hat\zeta_{0s} = \frac{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)D_i}{\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)}\xrightarrow{p}P\left(D_0=1|s\right).$$

Together with $\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\xrightarrow{p}p(s)$ for each $s\in S$,

$$\sum_{s=1}^S\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n}\xrightarrow{p}P\left(D_1=1\right)-P\left(D_0=1\right)=P\left(D_1>D_0\right).$$

Using $\hat p(s)=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)$, $\hat p(s)\hat p_z(s)=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_i$, $t_1(s)=E\left[t_{1i}|X_i\in s\right]$, and $t_0(s)=E\left[t_{0i}|X_i\in s\right]$,

$$\begin{aligned}&\sum_{s=1}^S\left(\hat\xi_{1s}-\hat\xi_{0s}-\beta_0\left(\hat\zeta_{1s}-\hat\zeta_{0s}\right)\right)\frac{\sum_{i=1}^n1\left(X_i\in s\right)}{n} = \sum_{s=1}^S\hat p(s)\left[\frac{\frac1n\sum_{i=1}^nt_{1i}1\left(X_i\in s\right)Z_i}{\hat p(s)\hat p_z}-\frac{\frac1n\sum_{i=1}^nt_{0i}1\left(X_i\in s\right)\left(1-Z_i\right)}{\hat p(s)\left(1-\hat p_z\right)}\right]\\ &= \sum_{s=1}^S\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(t_{1i}-t_1(s)\right)\frac{Z_i}{\hat p_z}-\left(t_{0i}-t_0(s)\right)\frac{1-Z_i}{1-\hat p_z}\right]+\sum_{s=1}^S\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(t_1(s)-t_0(s)\right)\\ &= \sum_{s=1}^S\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(t_{1i}-t_1(s)\right)\frac{Z_i}{p_z}-\left(t_{0i}-t_0(s)\right)\frac{1-Z_i}{1-p_z}\right]+\sum_{s\in S}\left(R_{1ns}+R_{2ns}\right)+\sum_{s=1}^S\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(t_1(s)-t_0(s)\right)\\ (27)\quad&= \sum_{s=1}^S\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left[\frac{t_{1i}-t_1(s)}{p_z}+\frac{t_{0i}-t_0(s)}{1-p_z}\right]+\sum_{s\in S}\left(R_{1ns}+R_{2ns}\right)+\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right).\end{aligned}$$

In the above,

$$\begin{aligned}R_{1ns} &= \frac{p_z-\hat p_z}{\hat p_z\left(1-\hat p_z\right)}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(t_{1i}-t_1(s)\right)Z_i+\left(t_{0i}-t_0(s)\right)\left(1-Z_i\right)\right],\\ R_{2ns} &= \left(\frac1{\hat p_z\left(1-\hat p_z\right)}-\frac1{p_z\left(1-p_z\right)}\right)\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(1-p_z\right)\left(t_{1i}-t_1(s)\right)Z_i-p_z\left(t_{0i}-t_0(s)\right)\left(1-Z_i\right)\right].\end{aligned}$$

Rewriting,

$$\begin{aligned}R_{1ns} &= \frac{p_z-\hat p_z}{\hat p_z\left(1-\hat p_z\right)}\left[\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\left(t_{1i}-t_1(s)\right)-\left(t_{0i}-t_0(s)\right)\right)+p_z\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{1i}-t_1(s)\right)+\left(1-p_z\right)\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(t_{0i}-t_0(s)\right)\right],\\ R_{2ns} &= \left(\frac1{\hat p_z\left(1-\hat p_z\right)}-\frac1{p_z\left(1-p_z\right)}\right)\left[\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\left(1-p_z\right)\left(t_{1i}-t_1(s)\right)+p_z\left(t_{0i}-t_0(s)\right)\right)+p_z\left(1-p_z\right)\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(\left(t_{1i}-t_1(s)\right)-\left(t_{0i}-t_0(s)\right)\right)\right].\end{aligned}$$

Using Assumption 8, Lemmas B.2 and B.3 of BCS 2017a, and arguments similar to those in Propositions 4 and 5, we can show that $\sum_{s\in S}R_{1ns}=o_P(1)O_P\left(\frac1{\sqrt n}\right)=o_P\left(\frac1{\sqrt n}\right)$ and $\sum_{s\in S}R_{2ns}=o_P(1)O_P\left(\frac1{\sqrt n}\right)=o_P\left(\frac1{\sqrt n}\right)$. Since

$$\frac{t_{1i}-t_1(s)}{p_z}+\frac{t_{0i}-t_0(s)}{1-p_z} = \frac{t_{1i}-Et_{1i}}{p_z}+\frac{t_{0i}-Et_{0i}}{1-p_z}-\left(\frac{t_1(s)-Et_{1i}}{p_z}+\frac{t_0(s)-Et_{0i}}{1-p_z}\right) = \omega_i-\omega(s),$$

eq. (27) can be written as

$$(28)\qquad \sum_{s=1}^S\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+o_P\left(\frac1{\sqrt n}\right).$$

The first part of this influence function corresponds exactly to the first term in eq. (25). Therefore, regardless of $p_z$, there is no need to worry about the variation induced by the sampling scheme for $Z_i$ within the cluster.

In the special case of unconfoundedness, eq. (28) becomes

$$(29)\qquad \sum_{s=1}^S\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left[\frac{Y_{1i}}{p_z}+\frac{Y_{0i}}{1-p_z}-E\left(\frac{Y_{1i}}{p_z}+\frac{Y_{0i}}{1-p_z}\ \Big|\ X_i\in s\right)\right]+\frac1n\sum_{i=1}^n\left(Y_{1i}-Y_{0i}-\left(\mu_1-\mu_0\right)\right)+o_P\left(\frac1{\sqrt n}\right).$$

Using Assumption 8, $\hat\sigma_3^2$ is consistent for the plim of $P\left(D_1>D_0\right)^{-2}$ times $\frac1n\sum_{i=1}^n\psi_i^2$, where

$$\begin{aligned}\psi_i &= \left(Z_i-p_z\right)\left[\frac{t_{1i}-\bar t_1}{p_z}+\frac{t_{0i}-\bar t_0}{1-p_z}-\Sigma_{n,X,\frac{t_1}{p_z}+\frac{t_0}{1-p_z}}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right]+p_z\left(1-p_z\right)\left[t_{1i}-t_{0i}-\left(\bar t_1-\bar t_0\right)-\Sigma_{n,X,t_1-t_0}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right]\\ &= \left(Z_i-p_z\right)\left[\omega_i-\bar\omega-\Sigma_{n,X,\omega}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right]+p_z\left(1-p_z\right)\left[t_{1i}-t_{0i}-\left(\bar t_1-\bar t_0\right)-\Sigma_{n,X,t_1-t_0}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)\right],\end{aligned}$$

for $\Sigma_{n,X}=\frac1n\sum_{i=1}^n\left(X_i-\bar X\right)\left(X_i-\bar X\right)'$ and $\Sigma_{n,X,t}=\frac1n\sum_{i=1}^n\left(X_i-\bar X\right)\left(t_i-\bar t\right)$. With $X_i$ being the cluster dummies, $\omega_i-\bar\omega-\Sigma_{n,X,\omega}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)$ is the residual from a saturated regression of $\omega_i$ on the cluster dummies, and converges to $\sum_{s\in S}1\left(X_i\in s\right)\left(\omega_i-\omega(s)\right)$. For the same reason, $t_{1i}-t_{0i}-\left(\bar t_1-\bar t_0\right)-\Sigma_{n,X,t_1-t_0}'\Sigma_{n,X}^{-1}\left(X_i-\bar X\right)$ is the residual from a saturated regression of $t_{1i}-t_{0i}$ on the cluster dummies, and converges to

$$\sum_{s\in S}1\left(X_i\in s\right)\left(t_{1i}-t_{0i}-E\left[t_{1i}-t_{0i}|s\right]\right).$$

Therefore $\frac1n\sum_{i=1}^n\psi_i^2$ is in turn consistent for the variance of

$$\sum_{s=1}^S\frac1{\sqrt n}\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z\right)\left(\omega_i-\omega(s)\right)+\frac1{\sqrt n}\sum_{i=1}^n\sum_{s\in S}1\left(X_i\in s\right)\left(t_{1i}-t_{0i}-E\left[t_{1i}-t_{0i}|s\right]\right),$$

which is asymptotically smaller than the variance of eq. (28) but larger than the variance of its first component. Next we will need to add a consistent estimate of

$$\frac1n\sum_{i=1}^n\sum_{s\in S}1\left(X_i\in s\right)E\left[t_{1i}-t_{0i}|s\right]^2.$$

This is obtained by $\hat\phi'\frac1n\sum_{i=1}^n\left(V_i-\bar V\right)\left(V_i-\bar V\right)'\hat\phi$, which is the variance of the fitted value of the saturated cluster dummy regression. We can then use $\bar\sigma_3^2$ in Corollary 3 to obtain a consistent estimate of the variance of eq. (28).

We can also directly estimate the variance of $\hat\beta_3$ by estimating the first representation of the influence function in eq. (27). Let $\hat t_{1i}Z_i=\left(Y_i-D_i\hat\beta_3\right)Z_i$, $\hat t_{0i}\left(1-Z_i\right)=\left(Y_i-D_i\hat\beta_3\right)\left(1-Z_i\right)$,

$$\hat t_1(s) = \frac{\sum_{i=1}^n\hat t_{1i}Z_i1\left(X_i\in s\right)}{\sum_{i=1}^nZ_i1\left(X_i\in s\right)},\qquad \hat t_0(s) = \frac{\sum_{i=1}^n\hat t_{0i}\left(1-Z_i\right)1\left(X_i\in s\right)}{\sum_{i=1}^n\left(1-Z_i\right)1\left(X_i\in s\right)},$$

and construct

$$\hat\Omega = \frac1n\sum_{i=1}^n\sum_{s\in S}1\left(X_i\in s\right)\left[\frac{\left(\hat t_{1i}-\hat t_1(s)\right)Z_i}{\hat p_z}-\frac{\left(\hat t_{0i}-\hat t_0(s)\right)\left(1-Z_i\right)}{1-\hat p_z}\right]^2+\frac1n\sum_{i=1}^n\sum_{s=1}^S1\left(X_i\in s\right)\left(\hat t_1(s)-\hat t_0(s)\right)^2.$$

Lemma B.3 of BCS 2017a and the continuous mapping theorem imply that $\hat t_1(s)\xrightarrow{p}t_1(s)$ and $\hat t_0(s)\xrightarrow{p}t_0(s)$. Slutsky's theorem then implies that $\hat\Omega$ consistently estimates the variance of eq. (27).
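Eq. (26) expresses $\hat\beta_3$ as a ratio of stratum-size weighted arm contrasts, which is straightforward to compute directly. The sketch below uses an illustrative stratified noncompliance DGP with a constant complier effect of 2, so the LATE is 2; all other numbers are assumptions for the example:

```python
import numpy as np

# beta3_hat per eq. (26): stratum-weighted differences in arm means of
# Y (numerator) and of D (denominator), i.e. aggregated within-stratum
# Wald contrasts. Illustrative DGP; constant effect 2 implies LATE = 2.
rng = np.random.default_rng(5)
n = 20000
S = rng.integers(0, 2, size=n)               # stratum indicator
Z = (rng.random(n) < 0.5).astype(float)      # randomized within strata
U = rng.random(n)
take = np.where(S == 0, 0.9, 0.6)            # compliance varies by stratum
D = np.where(Z == 1, U < take, U < 0.2).astype(float)
Y = 1.0 + 2.0 * D + S + rng.standard_normal(n)

num = den = 0.0
for s in (0, 1):
    m = S == s
    ps = m.mean()                            # stratum share p_hat(s)
    xi1 = Y[m & (Z == 1)].mean()
    xi0 = Y[m & (Z == 0)].mean()
    z1 = D[m & (Z == 1)].mean()
    z0 = D[m & (Z == 0)].mean()
    num += ps * (xi1 - xi0)
    den += ps * (z1 - z0)

beta3 = num / den
print(beta3)  # close to the LATE of 2
```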

## K Proof of Proposition 7

This estimator can be implemented using OLS and 2SLS by fully interacting $Z_i$, the cluster dummies, and the additional regressors $X_i$. To simplify notation we denote $W_i=\left(1,X_i'\right)'$ and the regression functions in eq. (12) as $\hat\gamma_{0s}'W_i$, $\hat\gamma_{1s}'W_i$, $\hat\tau_{0s}'W_i$ and $\hat\tau_{1s}'W_i$. Consider first the OLS case under Assumption 5:

$$\hat\beta^S = \sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\gamma_{1s}-\hat\gamma_{0s}\right),$$

where $\bar W(s)=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i/\hat p(s)$, $\hat\gamma_{0s}\xrightarrow{p}\gamma_{0s}=E\left[WW'|s\right]^{-1}E\left[WY_0|s\right]$, and $\hat\gamma_{1s}\xrightarrow{p}\gamma_{1s}=E\left[WW'|s\right]^{-1}E\left[WY_1|s\right]$, for

$$(30)\qquad \hat\gamma_{1s}=H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iY_i\quad\text{and}\quad H_{1n}=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iW_i',$$
$$(31)\qquad \hat\gamma_{0s}=H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iY_i\quad\text{and}\quad H_{0n}=\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iW_i'.$$

In the normal equations $E\left[W\left(Y_j-W'\gamma_{js}\right)|s\right]=0$ for $j=0,1$, and $W$ includes the constant term. Therefore $E\left[Y_j-W'\gamma_{js}|s\right]=0$ for $j=0,1$, so that $\hat\beta^S\xrightarrow{p}\beta_0=\Delta=E\left[Y_1-Y_0\right]$. In the following, we will not require $p_z(s)\equiv p_z$. Note that

$$\hat\beta^S-\beta_0 = \underbrace{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\gamma_{1s}-\gamma_{1s}-\hat\gamma_{0s}+\gamma_{0s}\right)}_{(1)}+\underbrace{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\gamma_{1s}-\gamma_{0s}\right)-\Delta}_{(2)},$$

where we can write (1) as

$$\sum_{s\in S}\hat p(s)\bar W(s)'\left[H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_iZ_i\left(Y_{1i}-W_i'\gamma_{1s}\right)-H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left(1-Z_i\right)\left(Y_{0i}-W_i'\gamma_{0s}\right)\right].$$

Using $\hat p(s)\xrightarrow{p}p(s)$, $\bar W(s)\xrightarrow{p}E\left[W|s\right]$, $E\left[W|s\right]'E\left[WW'|s\right]^{-1}=\left(1,0,\ldots\right)$,

$$H_{1n}\xrightarrow{p}p(s)p_z(s)E\left[WW'|s\right],\qquad H_{0n}\xrightarrow{p}p(s)\left(1-p_z(s)\right)E\left[WW'|s\right],$$
$$\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_iZ_i\left(Y_{1i}-W_i'\gamma_{1s}\right)=O_P\left(\tfrac1{\sqrt n}\right),\qquad \frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left(1-Z_i\right)\left(Y_{0i}-W_i'\gamma_{0s}\right)=O_P\left(\tfrac1{\sqrt n}\right),$$

we can write (1) as

$$\begin{aligned}&\sum_{s\in S}E\left[W|s\right]'E\left[WW'|s\right]^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left[\frac{Z_i\left(Y_{1i}-W_i'\gamma_{1s}\right)}{p_z(s)}-\frac{\left(1-Z_i\right)\left(Y_{0i}-W_i'\gamma_{0s}\right)}{1-p_z(s)}\right]+o_P\left(\tfrac1{\sqrt n}\right)\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(Z_i-p_z(s)\right)\left(\frac{Y_{1i}-W_i'\gamma_{1s}}{p_z(s)}+\frac{Y_{0i}-W_i'\gamma_{0s}}{1-p_z(s)}\right)+Y_{1i}-Y_{0i}-W_i'\left(\gamma_{1s}-\gamma_{0s}\right)\right]+o_P\left(\tfrac1{\sqrt n}\right).\end{aligned}$$

Therefore,

$$(1)+(2) = \sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z(s)\right)\left[\frac{Y_{1i}-W_i'\gamma_{1s}}{p_z(s)}+\frac{Y_{0i}-W_i'\gamma_{0s}}{1-p_z(s)}\right]+\frac1n\sum_{i=1}^n\left(Y_{1i}-Y_{0i}-\Delta\right)+o_P\left(\tfrac1{\sqrt n}\right).$$

This obviously is more efficient than eq. (29), since $W_i'\gamma_{js}$, $j=0,1$, is the linear projection of $Y_{ij}-E\left[Y_{ij}|s\right]$ within clusters, and results in a smaller variance.

Next we generalize the above to LATE. Consider

$$\hat\beta^S = \frac{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\gamma_{1s}-\hat\gamma_{0s}\right)}{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\tau_{1s}-\hat\tau_{0s}\right)},$$

so that for $\beta_0=E\left[Y_1-Y_0|D_1>D_0\right]$,

$$\hat\beta^S-\beta_0 = \frac{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\gamma_{1s}-\hat\gamma_{0s}-\left(\hat\tau_{1s}-\hat\tau_{0s}\right)\beta_0\right)}{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\tau_{1s}-\hat\tau_{0s}\right)}.$$

Since the denominator is $E\left[D_1-D_0\right]+o_P(1)=P\left(D_1>D_0\right)+o_P(1)$, we focus on the numerator, and write

$$\left(P\left(D_1>D_0\right)+o_P(1)\right)\left(\hat\beta^S-\beta_0\right) = \sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\gamma_{1s}-\hat\gamma_{0s}-\left(\hat\tau_{1s}-\hat\tau_{0s}\right)\beta_0\right).$$

$\gamma_{1s}$ and $\gamma_{0s}$ are defined by

$$\hat\gamma_{1s}=H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iY_i\xrightarrow{p}\gamma_{1s}=E\left[WW'|s\right]^{-1}E\left[WY_1|s\right],$$
$$\hat\gamma_{0s}=H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iY_i\xrightarrow{p}\gamma_{0s}=E\left[WW'|s\right]^{-1}E\left[WY_0|s\right],$$

and $\tau_{1s}$ and $\tau_{0s}$ are analogously defined by

$$\hat\tau_{1s}=H_{1n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iD_i\xrightarrow{p}\tau_{1s}=E\left[WW'|s\right]^{-1}E\left[WD_1|s\right],$$
$$\hat\tau_{0s}=H_{0n}^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iD_i\xrightarrow{p}\tau_{0s}=E\left[WW'|s\right]^{-1}E\left[WD_0|s\right].$$

Define $\hat\eta_{js}=\hat\gamma_{js}-\hat\tau_{js}\beta_0$ for $j=0,1$, so that $\hat\eta_{js}\xrightarrow{p}\eta_{js}=E\left[WW'|s\right]^{-1}E\left[Wt_j|s\right]$, where

$$\hat\eta_{1s} = \left(\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_iW_i'\right)^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)Z_iW_i\underbrace{\left(Y_{1i}-D_{1i}\beta_0\right)}_{t_{1i}},$$
$$\hat\eta_{0s} = \left(\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_iW_i'\right)^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(1-Z_i\right)W_i\underbrace{\left(Y_{0i}-D_{0i}\beta_0\right)}_{t_{0i}}.$$

Then we proceed similarly as in the ATE case to write the numerator as

$$\underbrace{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\hat\eta_{1s}-\eta_{1s}-\hat\eta_{0s}+\eta_{0s}\right)}_{(1)}+\underbrace{\sum_{s\in S}\hat p(s)\bar W(s)'\left(\eta_{1s}-\eta_{0s}\right)}_{(2)},$$

where, by noting that

$$\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_iZ_i\left(t_{1i}-W_i'\eta_{1s}\right)=O_P\left(\tfrac1{\sqrt n}\right),\qquad \frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left(1-Z_i\right)\left(t_{0i}-W_i'\eta_{0s}\right)=O_P\left(\tfrac1{\sqrt n}\right),$$

we can write (1) as

$$\begin{aligned}&\sum_{s\in S}E\left[W|s\right]'E\left[WW'|s\right]^{-1}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)W_i\left[\frac{Z_i\left(t_{1i}-W_i'\eta_{1s}\right)}{p_z(s)}-\frac{\left(1-Z_i\right)\left(t_{0i}-W_i'\eta_{0s}\right)}{1-p_z(s)}\right]+o_P\left(\tfrac1{\sqrt n}\right)\\ &= \sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left[\left(Z_i-p_z(s)\right)\left(\frac{t_{1i}-W_i'\eta_{1s}}{p_z(s)}+\frac{t_{0i}-W_i'\eta_{0s}}{1-p_z(s)}\right)+t_{1i}-t_{0i}-W_i'\left(\eta_{1s}-\eta_{0s}\right)\right]+o_P\left(\tfrac1{\sqrt n}\right).\end{aligned}$$

Therefore,

$$(32)\qquad (1)+(2) = \sum_{s\in S}\frac1n\sum_{i=1}^n1\left(X_i\in s\right)\left(Z_i-p_z(s)\right)\left[\frac{t_{1i}-W_i'\eta_{1s}}{p_z(s)}+\frac{t_{0i}-W_i'\eta_{0s}}{1-p_z(s)}\right]+\frac1n\sum_{i=1}^n\left(t_{1i}-t_{0i}\right)+o_P\left(\tfrac1{\sqrt n}\right).$$

Again this ought to be more efficient than eq. (27), since $W_i'\eta_{js}$ is the within-cluster linear projection of $t_{ji}-t_j(s)$. The more variables the projection is on, the smaller the variance. As $\dim(W)\to\infty$ at an appropriate rate, $W_i'\eta_{js}\to E\left[t_{ji}|W_i\right]$ for $j=0,1$, so that the above equation becomes the efficient influence function in eq. (24), conditional on both the cluster indicators and the extra regressors.
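The efficiency claim in eq. (32) versus eq. (27) can be illustrated in the ATE case ($D=Z$): projecting on an extra within-stratum regressor shrinks the variance relative to the post-stratified difference in means. All DGP choices below are illustrative assumptions:

```python
import numpy as np

# Fully interacted estimator with W = (1, X) within strata versus the
# stratum-dummies-only (post-stratified) difference in means, D = Z.
# Within-stratum adjustment on X removes outcome variation explained by
# X and lowers the Monte Carlo variance. Illustrative DGP only.
rng = np.random.default_rng(6)
n_per, reps = 200, 1500
b_plain, b_adj = [], []

for _ in range(reps):
    S = np.repeat([0, 1], n_per)
    Z = np.concatenate([rng.permutation(np.repeat([0.0, 1.0], n_per // 2))
                        for _ in (0, 1)])   # balanced within each stratum
    X = rng.standard_normal(2 * n_per)
    Y = 1.0 + Z + 3.0 * X + 2.0 * S + rng.standard_normal(2 * n_per)

    num_plain = num_adj = 0.0
    for s in (0, 1):
        m = S == s
        ps = m.mean()
        # post-stratified difference in means
        num_plain += ps * (Y[m & (Z == 1)].mean() - Y[m & (Z == 0)].mean())
        # arm-wise regression on (1, X), evaluated at the stratum mean of X
        preds = []
        for z in (1, 0):
            mz = m & (Z == z)
            g = np.linalg.lstsq(
                np.column_stack([np.ones(mz.sum()), X[mz]]), Y[mz],
                rcond=None)[0]
            preds.append(g[0] + g[1] * X[m].mean())
        num_adj += ps * (preds[0] - preds[1])
    b_plain.append(num_plain)
    b_adj.append(num_adj)

print(np.var(b_plain), np.var(b_adj))  # adjustment shrinks the variance
```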

Received: 2017-02-22
Revised: 2018-01-08
Accepted: 2018-02-01
Published Online: 2018-07-03
Published in Print: 2018-07-26

© 2018 Oldenbourg Wissenschaftsverlag GmbH, Published by De Gruyter Oldenbourg, Berlin/Boston