Published by De Gruyter, July 18, 2016

Local Semi-Parametric Efficiency of the Poisson Fixed Effects Estimator

Valentin Verdier


Hausman, Hall, and Griliches [Hausman, J., H. B. Hall, and Z. Griliches. 1984. “Econometric Models for Count Data with an Application to the Patents-R & D Relationship.” Econometrica 52 (4): 909–938.] defined the Poisson fixed effects (PFE) estimator to estimate models of panel data with count dependent variables under distributional assumptions conditional on covariates and unobserved heterogeneity, but without any restriction on the distribution of unobserved heterogeneity conditional on covariates. Wooldridge [Wooldridge, J. M. 1999. “Distribution-Free Estimation of some Nonlinear Panel Data Models.” Journal of Econometrics 90 (1): 77–97.] showed that the PFE estimator is consistent even if the distributional assumptions of the PFE model are violated, as long as the restrictions imposed on the conditional mean of the dependent variable are satisfied. In this note, I study the efficiency of the PFE estimator in the absence of distributional assumptions. I show that the PFE estimator corresponds to the optimal estimator for random coefficients models of Chamberlain [Chamberlain, G. 1992. “Efficiency Bounds for Semiparametric Regression.” Econometrica 60 (3): 567–596.] in the particular case where the assumptions of equal conditional mean and variance and zero conditional serial correlation are satisfied, regardless of whether the distributional assumptions of the PFE model hold. For instance, the dependent variable does not need to be a count variable. This local efficiency result, combined with the simplicity and robustness of the PFE estimator, provides a useful additional justification for its use in estimating conditional mean models of panel data.

JEL Classification: C01; C13; C23; C25


A Efficient Estimation under Conditional Mean Restrictions

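Recall the notation from the body of the text, so that we have

$$\rho_{it}(\beta) = y_{it} - \frac{\mu_{it}(\beta)}{\sum_{s=1}^{T}\mu_{is}(\beta)}\sum_{s=1}^{T} y_{is}, \qquad \rho_i(\beta) = [\rho_{i1}(\beta), \dots, \rho_{iT}(\beta)]',$$

$$D_i = E\left(\frac{\partial \rho_i}{\partial \beta'}(\beta_0) \,\Big|\, x_i\right), \qquad \Sigma_i = \mathrm{Var}(\rho_i(\beta_0) \mid x_i),$$

$$p_{it}(\beta) = \frac{\mu_{it}(\beta)}{\sum_{s=1}^{T}\mu_{is}(\beta)}, \qquad p_i(\beta)' = [p_{i1}(\beta), \dots, p_{iT}(\beta)], \qquad W_i(\beta) = \mathrm{diag}(p_i(\beta)), \qquad \mu_{it} = \mu_{it}(\beta_0).$$

Introducing some new notation, let

$$\mu_i = [\mu_{i1}, \dots, \mu_{iT}]', \qquad y_i = [y_{i1}, \dots, y_{iT}]', \qquad \Sigma_{y,i} = \mathrm{Var}(y_i \mid x_i), \qquad h_i = E(c_i \mid x_i),$$

$$\nabla\mu_{it} = \frac{\partial \mu_{it}}{\partial \beta'}(\beta_0), \qquad \nabla\mu_i = [(\nabla\mu_{i1})', \dots, (\nabla\mu_{iT})']'.$$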

We will also use M[k] to denote the kth column vector (element) of the matrix (row vector) M.

Also let:

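$$P_i = \frac{1}{\sum_{t=1}^{T}\mu_{it}} \begin{bmatrix} \mu_{i1} & \cdots & \mu_{i1} \\ \vdots & & \vdots \\ \mu_{iT} & \cdots & \mu_{iT} \end{bmatrix} \tag{A.1}$$

so that $\rho_i = \rho_i(\beta_0)$ can be rewritten as:

$$\rho_i = (I - P_i)\, y_i \tag{A.2}$$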
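As a quick numerical illustration of (A.1) and (A.2), the following minimal Python sketch builds $P_i$ for a single unit and checks that $(I - P_i)y_i$ reproduces the elementwise definition of $\rho_{it}$. The values of `mu` and `y` are hypothetical, and exact rational arithmetic is used only so that the comparison is free of rounding error.

```python
from fractions import Fraction as F

def projection(mu):
    """P_i from (A.1): row t holds mu_it / sum_s mu_is in every column."""
    total = sum(mu)
    return [[m / total for _ in mu] for m in mu]

def rho_direct(y, mu):
    """Elementwise definition: rho_it = y_it - mu_it / sum_s(mu_is) * sum_s(y_is)."""
    total_mu, total_y = sum(mu), sum(y)
    return [yt - mt / total_mu * total_y for yt, mt in zip(y, mu)]

def rho_matrix(y, mu):
    """Matrix form (A.2): rho_i = (I - P_i) y_i."""
    P = projection(mu)
    return [yt - sum(p * ys for p, ys in zip(row, y)) for yt, row in zip(y, P)]

mu = [F(1), F(2), F(3)]  # hypothetical values of mu_it for T = 3
y = [F(4), F(0), F(5)]   # hypothetical outcomes y_it

rho_a = rho_direct(y, mu)
rho_b = rho_matrix(y, mu)
print(rho_a == rho_b)  # compares the elementwise and matrix forms
print(sum(rho_a))      # sum of the residuals within the unit
```

Because every column of $P_i$ sums to one, the elements of $\rho_i$ add up to zero within a unit, which is the source of the singularity studied next.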

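Lemma 1: $\Sigma_i = \mathrm{Var}(\rho_i \mid x_i)$ is singular.

Proof. Since $\rho_i = (I - P_i)y_i$ and $P_i$ is a function of $x_i$, we simply have to show that $(I - P_i)$ is singular.

First note that $P_i$ is idempotent, so that $I - P_i$ is idempotent as well. The rank of an idempotent matrix equals its trace, so:

$$\mathrm{rank}(I - P_i) = \mathrm{trace}(I - P_i) = T - \mathrm{trace}(P_i) = T - 1$$

since $\mathrm{trace}(P_i) = \frac{\sum_{t=1}^{T}\mu_{it}}{\sum_{t=1}^{T}\mu_{it}} = 1$.

Hence $I - P_i$ is singular. □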
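The steps of this proof can be checked numerically. The sketch below, which uses hypothetical values for `mu`, verifies that $P_i$ and $I - P_i$ are idempotent, that $\mathrm{trace}(P_i) = 1$, and that $\mu_i$ itself lies in the null space of $I - P_i$.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

mu = [F(1), F(2), F(3)]  # hypothetical values of mu_it
T = len(mu)
total = sum(mu)

P = [[m / total for _ in range(T)] for m in mu]                # P_i from (A.1)
I = [[F(int(r == c)) for c in range(T)] for r in range(T)]     # identity matrix
M = [[I[r][c] - P[r][c] for c in range(T)] for r in range(T)]  # I - P_i

P_is_idempotent = matmul(P, P) == P
M_is_idempotent = matmul(M, M) == M
trace_P = sum(P[t][t] for t in range(T))
trace_M = sum(M[t][t] for t in range(T))  # equals rank(I - P_i) for an idempotent matrix

# (I - P_i) mu_i = 0, so mu_i is a non-zero vector in the null space of I - P_i.
M_times_mu = [sum(M[r][c] * mu[c] for c in range(T)) for r in range(T)]

print(P_is_idempotent, M_is_idempotent, trace_P, trace_M, M_times_mu)
```

The explicit null vector $\mu_i$ gives a second, direct way to see that $I - P_i$, and therefore $\Sigma_i$, cannot have full rank.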

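Lemma 2: $\Sigma_i^- = \Sigma_{y,i}^{-1} - \Sigma_{y,i}^{-1}\mu_i(\mu_i'\Sigma_{y,i}^{-1}\mu_i)^{-1}\mu_i'\Sigma_{y,i}^{-1}$ is a symmetric generalized inverse of $\Sigma_i$.

Proof. We can rewrite $\Sigma_i$ as:

$$\Sigma_i = (I - P_i)\,\Sigma_{y,i}\,(I - P_i)' \tag{A.3}$$
$$= \Sigma_{y,i} - P_i\Sigma_{y,i} - \Sigma_{y,i}P_i' + P_i\Sigma_{y,i}P_i' \tag{A.4}$$

Note that:

$$P_i\mu_i = \mu_i \tag{A.5}$$

and that $\Sigma_i^-\mu_i = 0$ by construction, so that $\Sigma_i^- P_i = 0$ because every column of $P_i$ is proportional to $\mu_i$. The terms of (A.4) that are premultiplied by $P_i$ therefore vanish when premultiplied by $\Sigma_i^-$, and:

$$\Sigma_i^-\Sigma_i = \left(\Sigma_{y,i}^{-1} - \Sigma_{y,i}^{-1}\mu_i(\mu_i'\Sigma_{y,i}^{-1}\mu_i)^{-1}\mu_i'\Sigma_{y,i}^{-1}\right)\Sigma_{y,i} - 0 \tag{A.6}$$
$$\qquad - \left(\Sigma_{y,i}^{-1} - \Sigma_{y,i}^{-1}\mu_i(\mu_i'\Sigma_{y,i}^{-1}\mu_i)^{-1}\mu_i'\Sigma_{y,i}^{-1}\right)\Sigma_{y,i}P_i' + 0 \tag{A.7}$$
$$= I - P_i' \tag{A.8}$$

where the last equality uses $\mu_i'P_i' = \mu_i'$ from (A.5). Since $P_i'\Sigma_i^- = (\Sigma_i^- P_i)' = 0$, we have:

$$\Sigma_i^-\Sigma_i\Sigma_i^- = \Sigma_i^- - P_i'\Sigma_i^- \tag{A.9}$$
$$= \Sigma_i^- \tag{A.10}$$

Note that:

$$P_i'P_i' = P_i' \tag{A.11}$$

so that:

$$\Sigma_i\Sigma_i^-\Sigma_i = \Sigma_i - \Sigma_i P_i' \tag{A.12}$$
$$= \Sigma_i - \Sigma_{y,i}P_i' + P_i\Sigma_{y,i}P_i' + \Sigma_{y,i}P_i' - P_i\Sigma_{y,i}P_i' \tag{A.13}$$
$$= \Sigma_i \tag{A.14}$$

So $\Sigma_i^- = \Sigma_{y,i}^{-1} - \Sigma_{y,i}^{-1}\mu_i(\mu_i'\Sigma_{y,i}^{-1}\mu_i)^{-1}\mu_i'\Sigma_{y,i}^{-1}$ is indeed a generalized inverse of $\Sigma_i$. It is clearly symmetric as well. □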
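Lemma 2 can be illustrated with a small exact computation. The sketch below assumes a hypothetical positive definite `Sigma_y` and hypothetical `mu`, forms $\Sigma_i = (I - P_i)\Sigma_{y,i}(I - P_i)'$, and checks the generalized-inverse properties of the matrix given in the lemma. The `inverse` helper is a plain Gauss-Jordan routine written for this illustration, not part of the original argument.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse(A):
    """Gauss-Jordan inverse in exact arithmetic; assumes A is nonsingular."""
    n = len(A)
    aug = [list(row) + [F(int(r == c)) for c in range(n)] for r, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

mu = [F(1), F(2), F(3)]  # hypothetical mu_it
Sigma_y = [[F(2), F(1), F(0)],  # hypothetical Var(y_i | x_i), positive definite
           [F(1), F(3), F(1)],
           [F(0), F(1), F(4)]]
T = len(mu)
total = sum(mu)
I = [[F(int(r == c)) for c in range(T)] for r in range(T)]
P = [[m / total for _ in range(T)] for m in mu]
M = subtract(I, P)                                 # I - P_i
Sigma = matmul(matmul(M, Sigma_y), transpose(M))   # (A.3)

Sy_inv = inverse(Sigma_y)
a = [sum(Sy_inv[r][c] * mu[c] for c in range(T)) for r in range(T)]  # Sigma_y^{-1} mu_i
denom = sum(m * v for m, v in zip(mu, a))                            # mu_i' Sigma_y^{-1} mu_i
G = [[Sy_inv[r][c] - a[r] * a[c] / denom for c in range(T)] for r in range(T)]

is_symmetric = G == transpose(G)
left_product_ok = matmul(G, Sigma) == subtract(I, transpose(P))  # (A.8)
inner_ok = matmul(matmul(G, Sigma), G) == G                      # (A.9)-(A.10)
outer_ok = matmul(matmul(Sigma, G), Sigma) == Sigma              # (A.12)-(A.14)
print(is_symmetric, left_product_ok, inner_ok, outer_ok)
```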

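Lemma 3: The asymptotic variance of $\hat{\beta}_{opt}$ is the same regardless of which symmetric generalized inverse of $\Sigma_i$, $\Sigma_i^+$, is used.

Proof. Let $\underline{\Sigma}_i^+$ and $\overline{\Sigma}_i^+$ be two symmetric generalized inverses of $\Sigma_i$.

Since $\rho_i = (I - P_i)y_i$ and $E(y_i \mid x_i) = h_i\mu_i$, we have, for any $k = 1, \dots, \dim(\beta_0)$:

$$D_i[k] = E\left(-\left(\frac{1}{\sum_{t=1}^{T}\mu_{it}}\begin{bmatrix}\nabla\mu_{i1}[k] & \cdots & \nabla\mu_{i1}[k] \\ \vdots & & \vdots \\ \nabla\mu_{iT}[k] & \cdots & \nabla\mu_{iT}[k]\end{bmatrix} - \frac{\sum_{t=1}^{T}\nabla\mu_{it}[k]}{\left(\sum_{t=1}^{T}\mu_{it}\right)^2}\begin{bmatrix}\mu_{i1} & \cdots & \mu_{i1} \\ \vdots & & \vdots \\ \mu_{iT} & \cdots & \mu_{iT}\end{bmatrix}\right) y_i \,\Big|\, x_i\right) \tag{A.15}$$
$$= -h_i\left(\frac{1}{\sum_{t=1}^{T}\mu_{it}}\begin{bmatrix}\nabla\mu_{i1}[k] & \cdots & \nabla\mu_{i1}[k] \\ \vdots & & \vdots \\ \nabla\mu_{iT}[k] & \cdots & \nabla\mu_{iT}[k]\end{bmatrix} - \frac{\sum_{t=1}^{T}\nabla\mu_{it}[k]}{\left(\sum_{t=1}^{T}\mu_{it}\right)^2}\begin{bmatrix}\mu_{i1} & \cdots & \mu_{i1} \\ \vdots & & \vdots \\ \mu_{iT} & \cdots & \mu_{iT}\end{bmatrix}\right)\mu_i \tag{A.16}$$
$$= -h_i\left(\nabla\mu_i[k] - \mu_i\,\frac{\sum_{t=1}^{T}\nabla\mu_{it}[k]}{\sum_{t=1}^{T}\mu_{it}}\right) \tag{A.17}$$

so that:

$$D_i = -h_i\left(\nabla\mu_i - \mu_i\,\frac{\sum_{t=1}^{T}\nabla\mu_{it}}{\sum_{t=1}^{T}\mu_{it}}\right) \tag{A.18}$$

From this expression for $D_i$ we can show that $\Sigma_i\underline{\Sigma}_i^+D_i = \Sigma_i\overline{\Sigma}_i^+D_i = D_i$ by showing that, for any particular choice of $i$, the linear system of equations in $w$:

$$\Sigma_i w = D_i \tag{A.19}$$

is consistent. Indeed, if $\Sigma_i w = D_i$ for some $w$, then $\Sigma_i\Sigma_i^+D_i = \Sigma_i\Sigma_i^+\Sigma_i w = \Sigma_i w = D_i$ for any generalized inverse $\Sigma_i^+$.

Consistency of (A.19) follows from:

$$(I - P_i)D_i = D_i$$

which holds because (A.18) can be written as $D_i = -h_i(I - P_i)\nabla\mu_i$ and $I - P_i$ is idempotent, and recalling $\Sigma_i\Sigma_i^- = I - P_i$ from the previous lemma, so that:

$$\Sigma_i\left(\Sigma_i^- D_i\right) = (I - P_i)D_i = D_i$$

Therefore $E(D_i'\underline{\Sigma}_i^+D_i) = E(D_i'\underline{\Sigma}_i^+\Sigma_i\overline{\Sigma}_i^+D_i) = E(D_i'\overline{\Sigma}_i^+D_i)$. Hence the result of this lemma is proved. □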
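The invariance stated in Lemma 3 can be seen in a small example. The sketch below assumes a hypothetical diagonal $\Sigma_{y,i}$ (so that the Lemma 2 formula needs no matrix inversion) and hypothetical values for `mu`, `h` and the Jacobian `dmu`. It builds two different symmetric generalized inverses of $\Sigma_i$, the second obtained by adding a multiple of the all-ones matrix, and compares the resulting quadratic forms $D_i'\Sigma_i^+D_i$.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

mu = [F(1), F(2), F(3)]                           # hypothetical mu_it
s = [F(2), F(3), F(5)]                            # hypothetical diagonal of Sigma_{y,i}
h = F(2)                                          # hypothetical h_i = E(c_i | x_i)
dmu = [[F(1), F(0)], [F(2), F(1)], [F(0), F(3)]]  # hypothetical T x K Jacobian of mu_i
T = len(mu)
total = sum(mu)

I = [[F(int(r == c)) for c in range(T)] for r in range(T)]
P = [[m / total for _ in range(T)] for m in mu]
M = [[I[r][c] - P[r][c] for c in range(T)] for r in range(T)]  # I - P_i
Sigma_y = [[s[r] if r == c else F(0) for c in range(T)] for r in range(T)]
Sigma = matmul(matmul(M, Sigma_y), transpose(M))

# First generalized inverse: the Lemma 2 formula, simplified for a diagonal Sigma_y.
a = [m / v for m, v in zip(mu, s)]             # Sigma_y^{-1} mu_i
denom = sum(m * m / v for m, v in zip(mu, s))  # mu_i' Sigma_y^{-1} mu_i
G1 = [[(1 / s[r] if r == c else F(0)) - a[r] * a[c] / denom for c in range(T)]
      for r in range(T)]

# Second generalized inverse: adding a multiple of the all-ones matrix keeps
# Sigma G Sigma = Sigma, because Sigma annihilates the vector of ones.
shift = F(7)
G2 = [[g + shift for g in row] for row in G1]

# (A.18) in matrix form: D_i = -h_i (I - P_i) grad(mu_i).
D = [[-h * v for v in row] for row in matmul(M, dmu)]

def info(G):
    """Quadratic form D_i' G D_i entering the inverse asymptotic variance."""
    return matmul(matmul(transpose(D), G), D)

both_are_g_inverses = all(matmul(matmul(Sigma, G), Sigma) == Sigma for G in (G1, G2))
print(both_are_g_inverses, G1 != G2, info(G1) == info(G2))
```

The shift by a multiple of the all-ones matrix is only one convenient way to generate a second generalized inverse; the lemma covers any symmetric choice.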

Proof of Proposition 1.

Proof. Chamberlain (1992, p. 581) showed that the asymptotic information bound for estimating $\beta_0$ from (1) is:


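Note that, since $D_i = -h_i(I - P_i)\nabla\mu_i$ by (A.18) and $\Sigma_i^- P_i = 0$:

$$D_i'\Sigma_i^- D_i = h_i\,\nabla\mu_i'\,\Sigma_i^-\,\nabla\mu_i\,h_i \tag{A.20}$$

Therefore we have:

$$V_{0,\beta}^{-1} = E(D_i'\Sigma_i^- D_i) \tag{A.21}$$

Because $\Sigma_i^-$ is a symmetric generalized inverse of $\Sigma_i$, as shown in Lemma 2, the estimator defined by (4) with $\Sigma_i^+ = \Sigma_i^-$ also has inverse asymptotic variance:

$$V_{opt}^{-1} = E(D_i'\Sigma_i^- D_i) \tag{A.22}$$

Hence from Lemma 3, for any choice of a symmetric generalized inverse of $\Sigma_i$, $\Sigma_i^+$, we have the result:

$$V_{0,\beta}^{-1} = V_{opt}^{-1} \tag{A.23}$$

Therefore, for any choice of $\Sigma_i^+$, $\hat{\beta}_{opt}$ is asymptotically efficient for estimating $\beta_0$ consistently from (1). Since (1) implies (3), $\hat{\beta}_{opt}$ is also asymptotically efficient for estimating $\beta_0$ from (3). □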

B Efficient Estimation under the Poisson Fixed Effects Assumptions

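Lemma 4 provides a useful alternative characterization of $\Sigma_i^-$.

Lemma 4: $\Sigma_i^-$ is the unique matrix $S$ that satisfies:

$$S\Sigma_i S = S \tag{B.1}$$
$$\Sigma_i S\Sigma_i = \Sigma_i \tag{B.2}$$
$$S\Sigma_i = I - P_i' \tag{B.3}$$
$$\Sigma_i S = I - P_i \tag{B.4}$$

Proof. In the previous section we have shown that $\Sigma_i^-$ satisfies (B.1)–(B.3). Since $\Sigma_i$ and $\Sigma_i^-$ are symmetric, transposing (B.3) gives (B.4):

$$\Sigma_i\Sigma_i^- = I - P_i \tag{B.5}$$

This solution is unique since for any $S$, $\tilde{S}$ satisfying these requirements:

$$S = S\Sigma_i S = S(I - P_i) = S\Sigma_i\tilde{S} = (I - P_i')\tilde{S} = \tilde{S}\Sigma_i\tilde{S} = \tilde{S} \tag{B.6}$$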

Proof of Proposition 2

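Proof. Under (8) and (9) we have $\Sigma_{y,i} = h_i\,\mathrm{diag}(\mu_i) + v_i\,\mu_i\mu_i'$ where $v_i = \mathrm{Var}(c_i \mid x_i)$. Therefore:

$$\Sigma_i = \mathrm{Var}(\rho_i \mid x_i) \tag{B.7}$$
$$= (I - P_i)\left(h_i\,\mathrm{diag}(\mu_i) + v_i\,\mu_i\mu_i'\right)(I - P_i)' \tag{B.8}$$
$$= (I - P_i)\,h_i\,\mathrm{diag}(\mu_i)\,(I - P_i)' \tag{B.9}$$

where the last equality follows from $\mu_{it}\mu_{is} - \mu_{is}\,p_{it}\sum_{r=1}^{T}\mu_{ir} = 0$, that is, $(I - P_i)\mu_i = 0$.

Define:

$$X_i = h_i^{-1}\left(\mathrm{diag}\left(\frac{1}{\mu_i}\right) - \frac{1}{\sum_{t=1}^{T}\mu_{it}}J\right) \tag{B.10}$$

where $J = \begin{bmatrix}1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1\end{bmatrix}$ and, by an abuse of notation, $\left(\frac{1}{\mu_i}\right) = \left[\frac{1}{\mu_{i1}}, \dots, \frac{1}{\mu_{iT}}\right]'$. We will show that $\Sigma_i^- = X_i$.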

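Note that:

$$JP_i = \frac{\sum_{t=1}^{T}\mu_{it}}{\sum_{t=1}^{T}\mu_{it}}J \tag{B.11}$$
$$= J \tag{B.12}$$

and:

$$\mathrm{diag}\left(\frac{1}{\mu_i}\right)P_i = \frac{1}{\sum_{t=1}^{T}\mu_{it}}J \tag{B.13}$$

Therefore, using the transposes of (B.12) and (B.13):

$$\Sigma_i X_i = (I - P_i)\,\mathrm{diag}(\mu_i)\,(I - P_i)'\left(\mathrm{diag}\left(\frac{1}{\mu_i}\right) - \frac{1}{\sum_{t=1}^{T}\mu_{it}}J\right) \tag{B.14}$$
$$= (I - P_i)\,\mathrm{diag}(\mu_i)\left(\mathrm{diag}\left(\frac{1}{\mu_i}\right) - \frac{1}{\sum_{t=1}^{T}\mu_{it}}J - \frac{1}{\sum_{t=1}^{T}\mu_{it}}J + \frac{1}{\sum_{t=1}^{T}\mu_{it}}J\right) \tag{B.15}$$
$$= (I - P_i)(I - P_i) \tag{B.16}$$
$$= I - P_i \tag{B.17}$$

Since both $\Sigma_i$ and $X_i$ are symmetric:

$$X_i\Sigma_i = I - P_i' \tag{B.18}$$

So in order to show that $\Sigma_i^- = X_i$, it only remains to show that $X_i$ satisfies (B.1) and (B.2).

For (B.1):

$$X_i\Sigma_i X_i = X_i(I - P_i) \tag{B.19}$$
$$= X_i - X_iP_i \tag{B.20}$$
$$= X_i - h_i^{-1}\left(\frac{1}{\sum_{t=1}^{T}\mu_{it}}J - \frac{1}{\sum_{t=1}^{T}\mu_{it}}J\right) \tag{B.21}$$
$$= X_i \tag{B.22}$$

For (B.2):

$$\Sigma_i X_i\Sigma_i = (I - P_i)\Sigma_i \tag{B.23}$$
$$= \Sigma_i \tag{B.24}$$

where the second equality follows from the previous section of the appendix.

Therefore we have shown that in this case:

$$\Sigma_i^- = h_i^{-1}\left(\mathrm{diag}\left(\frac{1}{\mu_i}\right) - \frac{1}{\sum_{t=1}^{T}\mu_{it}}J\right) \tag{B.25}$$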
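The closed form (B.25) can be checked against the four conditions of Lemma 4. The sketch below uses hypothetical values for `mu`, `h` and `v`, builds $\Sigma_{y,i} = h_i\,\mathrm{diag}(\mu_i) + v_i\,\mu_i\mu_i'$, and verifies both the simplification in (B.9) and conditions (B.1)–(B.4) in exact arithmetic.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

mu = [F(1), F(2), F(3)]  # hypothetical mu_it
h, v = F(2), F(5)        # hypothetical h_i = E(c_i | x_i) and v_i = Var(c_i | x_i)
T = len(mu)
total = sum(mu)

I = [[F(int(r == c)) for c in range(T)] for r in range(T)]
P = [[m / total for _ in range(T)] for m in mu]
M = subtract(I, P)  # I - P_i

# Sigma_{y,i} under equal conditional mean and variance and no serial correlation.
h_diag_mu = [[h * mu[r] if r == c else F(0) for c in range(T)] for r in range(T)]
Sigma_y = [[h_diag_mu[r][c] + v * mu[r] * mu[c] for c in range(T)] for r in range(T)]

Sigma = matmul(matmul(M, Sigma_y), transpose(M))            # (B.8)
Sigma_reduced = matmul(matmul(M, h_diag_mu), transpose(M))  # (B.9)

# Candidate generalized inverse from (B.10) and (B.25).
X = [[((1 / mu[r] if r == c else F(0)) - 1 / total) / h for c in range(T)]
     for r in range(T)]

v_term_drops_out = Sigma == Sigma_reduced
b1 = matmul(matmul(X, Sigma), X) == X              # (B.1)
b2 = matmul(matmul(Sigma, X), Sigma) == Sigma      # (B.2)
b3 = matmul(X, Sigma) == subtract(I, transpose(P)) # (B.3)
b4 = matmul(Sigma, X) == M                         # (B.4)
print(v_term_drops_out, b1, b2, b3, b4)
```

Note that `v` does not affect `Sigma` at all, which is why the closed form for $\Sigma_i^-$ depends on the distribution of $c_i$ only through $h_i$.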


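It follows that:

$$D_i'\Sigma_i^- = -\frac{h_i}{h_i}\,\nabla\mu_i'\left(\mathrm{diag}\left(\frac{1}{\mu_i}\right) - \frac{1}{\sum_{t=1}^{T}\mu_{it}}J\right) \tag{B.26}$$
$$= -\left(\left(\frac{\nabla\mu_i}{\mu_i}\right)' - \frac{\sum_{t=1}^{T}(\nabla\mu_{it})'}{\sum_{t=1}^{T}\mu_{it}}\,j'\right) \tag{B.27}$$

where $j' = [1, \dots, 1]$ and, by an abuse of notation, $\left(\frac{\nabla\mu_i}{\mu_i}\right) = \left[\frac{(\nabla\mu_{i1})'}{\mu_{i1}}, \dots, \frac{(\nabla\mu_{iT})'}{\mu_{iT}}\right]'$.

Note that, at $\beta = \beta_0$:

$$\frac{\partial p_{it}}{\partial \beta'} = \frac{\nabla\mu_{it}}{\sum_{s=1}^{T}\mu_{is}} - \frac{\sum_{s=1}^{T}\nabla\mu_{is}}{\left(\sum_{s=1}^{T}\mu_{is}\right)^2}\,\mu_{it} \tag{B.28}$$

so that:

$$\frac{\partial p_{it}}{\partial \beta'}\,\frac{1}{p_{it}} = \frac{\nabla\mu_{it}}{\mu_{it}} - \frac{\sum_{s=1}^{T}\nabla\mu_{is}}{\sum_{s=1}^{T}\mu_{is}} \tag{B.29}$$

and therefore:

$$\left(\frac{\partial p_i(\beta_0)'}{\partial \beta}\right)W_i(\beta_0)^{-1} = \left(\frac{\nabla\mu_i}{\mu_i}\right)' - \frac{\sum_{t=1}^{T}(\nabla\mu_{it})'}{\sum_{t=1}^{T}\mu_{it}}\,j' \tag{B.30}$$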
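As a check on (B.26)–(B.30), the sketch below compares the two instrument matrices for hypothetical values of `mu`, `h` and the Jacobian `dmu`. It takes $D_i = -h_i(I - P_i)\nabla\mu_i$, as implied by the definition $D_i = E(\partial\rho_i/\partial\beta'(\beta_0) \mid x_i)$, and the closed form for $\Sigma_i^-$ from (B.25). Under that sign convention the two matrices should coincide up to a factor of $-1$.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

mu = [F(1), F(2), F(3)]                           # hypothetical mu_it
h = F(2)                                          # hypothetical h_i
dmu = [[F(1), F(0)], [F(2), F(1)], [F(0), F(3)]]  # hypothetical T x K Jacobian of mu_i
T, K = len(mu), len(dmu[0])
total = sum(mu)
col_sums = [sum(dmu[t][k] for t in range(T)) for k in range(K)]  # sum_t grad mu_it

I = [[F(int(r == c)) for c in range(T)] for r in range(T)]
P = [[m / total for _ in range(T)] for m in mu]
M = [[I[r][c] - P[r][c] for c in range(T)] for r in range(T)]    # I - P_i

# Sigma_i^- from (B.25) and D_i = -h_i (I - P_i) grad(mu_i).
X = [[((1 / mu[r] if r == c else F(0)) - 1 / total) / h for c in range(T)]
     for r in range(T)]
D = [[-h * val for val in row] for row in matmul(M, dmu)]
lhs = matmul(transpose(D), X)  # D_i' Sigma_i^-, a K x T matrix

# PFE instruments: derivative of p_it from (B.28), scaled by 1 / p_it.
p = [m / total for m in mu]
dp = [[dmu[t][k] / total - mu[t] * col_sums[k] / total ** 2 for k in range(K)]
      for t in range(T)]
rhs = [[dp[t][k] / p[t] for t in range(T)] for k in range(K)]  # left side of (B.30)
closed_form = [[dmu[t][k] / mu[t] - col_sums[k] / total for t in range(T)]
               for k in range(K)]                               # right side of (B.30)

negated_rhs = [[-val for val in row] for row in rhs]
print(rhs == closed_form, lhs == negated_rhs)
```

The overall sign is immaterial for the estimator, since multiplying a set of moment conditions by $-1$ does not change their solution.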

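Hence we have shown that under (1), (8) and (9):

$$D_i'\Sigma_i^- = -\left(\frac{\partial p_i(\beta_0)'}{\partial \beta}\right)W_i(\beta_0)^{-1}$$

Hence the asymptotic variance of $\sqrt{n}(\hat{\beta}_{PFE} - \beta_0)$ is equal to $V_{opt}$, and from Proposition 1 this is equal to $V_{0,\beta}$, the efficiency bound for estimating $\beta_0$ consistently from (1). □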
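For concreteness, the PFE moment conditions can be solved numerically in a toy example. The sketch below is only an illustration under assumptions that are not part of this appendix: an exponential mean $\mu_{it}(\beta) = \exp(x_{it}\beta)$ with scalar $\beta$, and a small hypothetical panel `x`, `y`. Under that specification $(\partial p_{it}/\partial\beta)/p_{it} = x_{it} - \sum_s p_{is}x_{is}$, and because $\sum_t \rho_{it}(\beta) = 0$ the first-order condition reduces to $\sum_i\sum_t x_{it}\,\rho_{it}(\beta) = 0$, which is solved here by Newton iterations.

```python
import math

def pfe_score_and_hessian(beta, x, y):
    """Score and Hessian of the PFE objective for scalar beta,
    assuming mu_it = exp(x_it * beta) (an assumption of this sketch)."""
    score, hess = 0.0, 0.0
    for xi, yi in zip(x, y):
        w = [math.exp(xt * beta) for xt in xi]
        total = sum(w)
        p = [wt / total for wt in w]                  # p_it(beta)
        n = sum(yi)                                   # sum_s y_is
        xbar = sum(pt * xt for pt, xt in zip(p, xi))  # sum_s p_is * x_is
        # sum_t x_it * rho_it(beta), with rho_it = y_it - p_it * n
        score += sum(xt * (yt - pt * n) for xt, yt, pt in zip(xi, yi, p))
        # derivative of the score: -n times the p-weighted variance of x_it
        hess -= n * (sum(pt * xt * xt for pt, xt in zip(p, xi)) - xbar ** 2)
    return score, hess

def pfe_estimate(x, y, beta=0.0, steps=25):
    """Newton iterations on the PFE first-order condition."""
    for _ in range(steps):
        score, hess = pfe_score_and_hessian(beta, x, y)
        beta -= score / hess
    return beta

# Hypothetical panel with n = 3 units and T = 3 periods.
x = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [0.0, 2.0, 1.0]]
y = [[1.0, 2.0, 4.0], [3.0, 1.0, 5.0], [2.0, 6.0, 3.0]]

beta_hat = pfe_estimate(x, y)
final_score, _ = pfe_score_and_hessian(beta_hat, x, y)
print(beta_hat, final_score)
```

The outcomes in `y` are entered as floats rather than integers to echo the point made in the abstract that the dependent variable does not need to be a count variable for the estimator to be computed.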


References

Acemoglu, D., and J. Linn. 2004. “Market Size in Innovation: Theory and Evidence from the Pharmaceutical Industry.” The Quarterly Journal of Economics 119 (3): 1049–1090. doi:10.1162/0033553041502144

Azoulay, P., G. S. J. Zivin, and J. Wang. 2010. “Superstar Extinction.” Quarterly Journal of Economics 125 (2): 549–589. doi:10.1162/qjec.2010.125.2.549

Burgess, R., M. Hansen, A. B. Olken, P. Potapov, and S. Sieber. 2012. “The Political Economy of Deforestation in the Tropics.” The Quarterly Journal of Economics 127 (4): 1707–1754. doi:10.1093/qje/qjs034

Chamberlain, G. 1987. “Asymptotic Efficiency in Estimation with Conditional Moment Restrictions.” Journal of Econometrics 34 (3): 305–334. doi:10.1016/0304-4076(87)90015-7

Chamberlain, G. 1992. “Efficiency Bounds for Semiparametric Regression.” Econometrica 60 (3): 567–596. doi:10.2307/2951584

Hahn, J. 1997. “A Note on the Efficient Semiparametric Estimation of Some Exponential Panel Models.” Econometric Theory 13 (04): 583–588. doi:10.1017/S0266466600006010

Hausman, J., H. B. Hall, and Z. Griliches. 1984. “Econometric Models for Count Data with an Application to the Patents-R & D Relationship.” Econometrica 52 (4): 909–938. doi:10.2307/1911191

Newey, W. K. 1993. “16 Efficient Estimation of Models with Conditional Moment Restrictions.” In Handbook of Statistics, Volume 11, pp. 419–454. Amsterdam: Elsevier. doi:10.1016/S0169-7161(05)80051-3

Newey, W. K. 2001. “Conditional Moment Restrictions in Censored and Truncated Regression Models.” Econometric Theory 17 (5): 863–888. doi:10.1017/S0266466601175018

Newey, W. K., and D. McFadden. 1994. “Chapter 36 Large Sample Estimation and Hypothesis Testing.” In Handbook of Econometrics, edited by R. F. Engle and D. L. McFadden, Volume 4, pp. 2111–2245. Amsterdam: Elsevier. doi:10.1016/S1573-4412(05)80005-4

Penrose, R. 1955. “A Generalized Inverse for Matrices.” Mathematical Proceedings of the Cambridge Philosophical Society 51 (03): 406–413. doi:10.1017/S0305004100030401

Rose, N. L. 1990. “Profitability and Product Quality: Economic Determinants of Airline Safety Performance.” Journal of Political Economy 98 (5): 944–964. doi:10.1086/261714

Wooldridge, J. M. 1999. “Distribution-Free Estimation of some Nonlinear Panel Data Models.” Journal of Econometrics 90 (1): 77–97. doi:10.1016/S0304-4076(98)00033-5

Published Online: 2016-7-18

©2018 Walter de Gruyter GmbH, Berlin/Boston
