On the Use of the Helmert Transformation, and its Applications in Panel Data Econometrics

We revisit the Helmert transformation and provide a useful and simple derivation of the joint distribution of the sample mean and the sample variance in samples of independently and identically distributed normal random variables. Our derivation is distinguished by its concreteness and minimal abstraction, and should appeal to beginning students of statistics and to both beginning and advanced students of econometrics. We also highlight one fruitful application of the Helmert transformation in panel data econometrics: the transformation can be used to eliminate the fixed effects in the estimation of fixed effects models, and we briefly review this application in the panel data context.


Introduction
The Helmert transformation is named after the German geodesist Friedrich Robert Helmert (1876) and has a long history of use in statistics (Sawkins 1940; Cramér 1946, p. 116; Kruskal 1946; Weatherburn 1961, p. 164; Brownlee 1965, p. 271; Rao 1973, p. 182; among others). One application of the Helmert transformation in statistics, which for brevity we call "the application" from now on, is to find the joint distribution of the sample mean and the sample variance calculated from a sample from a normal population. Although the cited references are authoritative, their presentations of the application are in our view often rather complicated, which hinders readers' understanding of the topic. We present our own version of how the Helmert transformation works in the application. It should be particularly suitable and accessible to beginning undergraduate students in statistics, and to undergraduate and postgraduate students in econometrics or in the quantitative social and political sciences that use econometrics.
For example, Cramér (1946, p. 116) and Weatherburn (1961, p. 164) both follow Sawkins (1940) and introduce, in abstract terms, an orthogonal linear transformation having certain properties. We think instead that explicitly stating the transformation and then directly establishing its key properties has pedagogical advantages. Rao (1973, p. 182) similarly uses an orthogonal matrix, i.e. a Helmert matrix, which he defines in terms of certain abstract properties. Here too we see a pedagogical advantage in explicitly displaying one such Helmert matrix, so that the reader can visualize what is going on, and only then establishing its properties and exactly what it does in the application.
After presenting our treatment of the application, which of course uses the very same ideas of the Helmert transformation and the associated Helmert matrix as the sources mentioned above, we also briefly discuss the use of the Helmert transformation in panel data econometrics. In particular, the Helmert transformation can be employed to eliminate the fixed effects in the estimation of fixed effects models. Our treatment thus has two advantages: it is more concrete and explicit than the previous literature, and it advocates a more extensive use of the Helmert transformation. Unlike previous authors, we first explicitly state the Helmert transformation and display the Helmert matrix, and only then prove the implications of applying the Helmert transformation in the application.
Our proof is closest to that of Brownlee (1965, p. 271), who starts with a formula for the sample variance and uses it to derive new variables that possess certain desired properties. This derivation of the new variables is fairly involved. We instead start by defining the new variables and then, using a simple induction argument, prove that the sum of their squares equals the sum of squared deviations of the original variables from their mean, so that they reproduce the sample variance.
We use mathematical induction in our proof, and Stigler (1984) also uses mathematical induction in his, but his argument is different. He first establishes the joint distribution of the sample mean and variance for a sample of just two observations and then, via the induction argument, proves that this joint distribution extends to samples of larger size. Zehna (1991) asserts that Stigler's proof is not completely rigorous and relies, at three points, on a faulty argument.
This article proceeds as follows. In Section 2 we introduce the Helmert transformation and derive the joint distribution of the sample mean and the sample variance. In Section 3 we introduce the Helmert matrix associated with the transformation and provide some remarks on its properties. In Section 4 we suggest that the Helmert transformation has been somewhat overlooked in the within estimation of one-way error-component fixed effects models in panel data econometrics, and we briefly survey the extant applications of the Helmert transformation in the panel data context. The last section concludes.

The Helmert Transformation and the Joint Distribution of the Sample Mean and the Sample Variance in Samples from a Normal Population
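We have a set of $T$ random variables $x_t$, $t = 1, 2, \dots, T$, independently and identically distributed (i.i.d. from now on) as Normal$(\mu, \sigma^2)$, and we want to find the joint distribution of the sample mean $\bar{x}_T = \sum_{t=1}^{T} x_t / T$ and the sample variance $s^2 = \sum_{t=1}^{T} (x_t - \bar{x}_T)^2 / (T-1)$. Consider the Helmert transformation, which takes the set $x_t$, $t = 1, 2, \dots, T$, and produces a new set of variables $z_t$, $t = 1, 2, \dots, T$, as follows:

$$z_1 = \bar{x}_T, \qquad z_t = (x_t - \bar{x}_{t-1})\sqrt{\frac{t-1}{t}}, \quad t = 2, 3, \dots, T, \qquad (1)$$

where, consistently with the notation $\bar{x}_T$, $\bar{x}_{t-1} = \sum_{s=1}^{t-1} x_s / (t-1)$ is the mean of the first $t-1$ observations. The new variables $z_t$ have convenient properties. (In what follows, we liberally use the properties of the expectation $E(\cdot)$, variance $\mathrm{Var}(\cdot)$, and covariance $\mathrm{Cov}(\cdot, \cdot)$ operators, and we assume that the reader is familiar with them.)

Properties of the Helmert transformation:
1. $\mathrm{Cov}(z_t, z_\tau) = 0$ for $t \neq \tau$, where without loss of generality we take $\tau > t \geq 1$. Since the positive scaling factors in Eq. (1) do not affect whether a covariance is zero, it suffices to note that $\mathrm{Cov}(\bar{x}_T, x_\tau - \bar{x}_{\tau-1}) = \sigma^2/T - \sigma^2/T = 0$ for $t = 1$, and that $\mathrm{Cov}(x_t - \bar{x}_{t-1}, x_\tau - \bar{x}_{\tau-1}) = -\sigma^2/(\tau-1) + \sigma^2/(\tau-1) = 0$ for $2 \leq t < \tau$. (Property 1 does not require normality; an i.i.d. distribution of the variables would be sufficient.)
2. $z_t$, $t = 1, 2, \dots, T$, are linear combinations of $x_t$, $t = 1, 2, \dots, T$ (which, as we have assumed, are jointly normally distributed), and hence $z_t$, $t = 1, 2, \dots, T$, are jointly normally distributed as well.
3. $E z_1 = \mu$ and $E z_t = 0$ for $t = 2, \dots, T$.
4. $\mathrm{Var}(z_1) = \sigma^2/T$ and $\mathrm{Var}(z_t) = \sigma^2$ for $t = 2, \dots, T$, because $\mathrm{Var}(x_t - \bar{x}_{t-1}) = \sigma^2 + \sigma^2/(t-1) = \sigma^2 t/(t-1)$.
5. $(T-1)s^2 = \sum_{t=1}^{T} (x_t - \bar{x}_T)^2 = \sum_{t=2}^{T} z_t^2 = \sum_{t=2}^{T} (x_t - \bar{x}_{t-1})^2 (t-1)/t$, which we prove in the theorem below.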
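To make the transformation concrete, here is a minimal Python sketch (standard library only) that computes the variables $z_t$ from a list of observations, using the running-mean form $z_1 = \bar{x}_T$, $z_t = (x_t - \bar{x}_{t-1})\sqrt{(t-1)/t}$, and checks Property 5 on one small sample. The function names `helmert` and `sum_sq_dev` are our own hypothetical helpers, not part of any library.

```python
import math


def helmert(x):
    """Helmert transformation: z_1 is the mean of all T values; for t = 2..T,
    z_t = (x_t - mean of x_1..x_{t-1}) * sqrt((t - 1) / t)."""
    T = len(x)
    z = [sum(x) / T]
    running_sum = 0.0
    for t in range(2, T + 1):           # t is the 1-based time index
        running_sum += x[t - 2]         # now holds x_1 + ... + x_{t-1}
        prev_mean = running_sum / (t - 1)
        z.append((x[t - 1] - prev_mean) * math.sqrt((t - 1) / t))
    return z


def sum_sq_dev(x):
    """(T - 1) s^2, i.e. the sum of squared deviations from the sample mean."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x)


x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = helmert(x)
print(z[0])                                          # 5.0, the sample mean
print(sum_sq_dev(x), sum(zt ** 2 for zt in z[1:]))   # both 32, up to rounding
```

The running sum keeps the transformation a single pass over the data; recomputing each $\bar{x}_{t-1}$ from scratch would give the same result at quadratic cost.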
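An Auxiliary Theorem. Let $x_t$ be a scalar quantity observed over $t = 1, 2, \dots, T$ periods, $T \geq 2$. Then

$$\sum_{t=1}^{T} (x_t - \bar{x}_T)^2 = \sum_{t=2}^{T} (x_t - \bar{x}_{t-1})^2 \frac{t-1}{t},$$

where $\bar{x}_T = \sum_{t=1}^{T} x_t / T$ and $\bar{x}_{t-1} = \sum_{s=1}^{t-1} x_s / (t-1)$.

Proof of the Auxiliary Theorem by induction. For $T = 2$ we have $\bar{x}_2 = (x_1 + x_2)/2$, so the left-hand side equals $(x_1 - \bar{x}_2)^2 + (x_2 - \bar{x}_2)^2 = (x_2 - x_1)^2/2$, while the right-hand side equals $(x_2 - \bar{x}_1)^2 \cdot 1/2 = (x_2 - x_1)^2/2$, so for $T = 2$ the relationship holds indeed.

Now we assume that the relationship holds for $T$ and demonstrate that it then holds for $T + 1$ as well, i.e. that

$$\sum_{t=1}^{T+1} (x_t - \bar{x}_{T+1})^2 = \sum_{t=2}^{T+1} (x_t - \bar{x}_{t-1})^2 \frac{t-1}{t}.$$

By the induction hypothesis,

$$\sum_{t=2}^{T+1} (x_t - \bar{x}_{t-1})^2 \frac{t-1}{t} = \sum_{t=1}^{T} (x_t - \bar{x}_T)^2 + (x_{T+1} - \bar{x}_T)^2 \frac{T}{T+1}.$$

Therefore, what remains to be shown is that

$$\sum_{t=1}^{T+1} (x_t - \bar{x}_{T+1})^2 - \sum_{t=1}^{T} (x_t - \bar{x}_T)^2 = (x_{T+1} - \bar{x}_T)^2 \frac{T}{T+1}.$$

Using $\sum_{t=1}^{n} (x_t - \bar{x}_n)^2 = \sum_{t=1}^{n} x_t^2 - n\bar{x}_n^2$ and $(T+1)\bar{x}_{T+1} = T\bar{x}_T + x_{T+1}$, we obtain

$$\sum_{t=1}^{T+1} (x_t - \bar{x}_{T+1})^2 - \sum_{t=1}^{T} (x_t - \bar{x}_T)^2 = x_{T+1}^2 + T\bar{x}_T^2 - \frac{(T\bar{x}_T + x_{T+1})^2}{T+1} = \frac{T\left(x_{T+1}^2 - 2\bar{x}_T x_{T+1} + \bar{x}_T^2\right)}{T+1}.$$

Hence the difference is indeed equal to $(x_{T+1} - \bar{x}_T)^2 T/(T+1)$.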

□
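The Auxiliary Theorem can also be checked in exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`, so no rounding is involved, and compares both sides of the identity on random rational samples. It also codes the induction step itself as a one-pass update, $S_{n+1} = S_n + (x_{n+1} - \bar{x}_n)^2\, n/(n+1)$, which is algebraically the same recurrence as in Welford's well-known online variance algorithm. The helper names are hypothetical, and the functions expect `Fraction` inputs.

```python
from fractions import Fraction
import random


def lhs(x):
    """Left-hand side: sum_{t=1}^T (x_t - xbar_T)^2 (x: list of Fractions)."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x)


def rhs(x):
    """Right-hand side: sum_{t=2}^T (x_t - xbar_{t-1})^2 (t-1)/t."""
    total = Fraction(0)
    for t in range(2, len(x) + 1):
        prev_mean = sum(x[: t - 1]) / (t - 1)
        total += (x[t - 1] - prev_mean) ** 2 * Fraction(t - 1, t)
    return total


def online_sum_sq(x):
    """Induction step as a one-pass update:
    S_{n+1} = S_n + (x_{n+1} - xbar_n)^2 * n / (n + 1)."""
    mean, S = x[0], Fraction(0)
    for n, new in enumerate(x[1:], start=1):   # n = observations seen so far
        S += (new - mean) ** 2 * Fraction(n, n + 1)
        mean += (new - mean) / (n + 1)         # update xbar_n to xbar_{n+1}
    return S


rng = random.Random(0)
for T in range(2, 12):
    x = [Fraction(rng.randint(-50, 50), rng.randint(1, 9)) for _ in range(T)]
    assert lhs(x) == rhs(x) == online_sum_sq(x)
print("identity holds exactly for T = 2, ..., 11")
```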
We will now use the properties of the Helmert transformation to deduce the joint distribution of the sample mean and the sample variance in samples from a normal population. We use the following three facts. First, linear combinations of jointly normal random variables are themselves jointly normally distributed (Cramér 1946, p. 213; Weatherburn 1961, p. 57). Second, jointly normal random variables that are uncorrelated are also independent (e.g. Lancaster 1959; David 2009). There is a subtle point here: the variables need to be jointly normal, because one can construct counterexamples in which the marginal distributions are normal but the joint distribution is not, and the variables are uncorrelated and yet not independent. Lancaster (1959) presents such a counterexample and states the conditions precisely in his Theorem 1. Third, the sum of $k$ independent squared standard normal variables is distributed as $\chi^2(k)$, a distribution discovered and described by Helmert (1876). (Helmert (1876) discovered the $\chi^2$ distribution; however, he did not observe that the sample mean and the sample variance are independent, see David (2009).)

Main Theorem. The joint distribution of the sample mean and the sample variance in a sample of $T$ i.i.d. random variables $x_t$, $t = 1, 2, \dots, T$, each distributed as Normal$(\mu, \sigma^2)$, has the following properties:
- The sample mean $\bar{x}_T$ and the sample variance $s^2$ are independently distributed.
- The sample mean $\bar{x}_T$ is distributed as Normal$(\mu, \sigma^2/T)$.
- The scaled sample variance $(T-1)s^2/\sigma^2$ is distributed as $\chi^2(T-1)$.
Proof of the Main Theorem.In the proof we will refer to the listed Properties of the Helmert transformation as Property plus the number under which the property was listed.
- The sample mean $\bar{x}_T$ is a function of $z_1$ only (indeed $\bar{x}_T = z_1$), and by Property 5 the sample variance $s^2$ is a function of $z_t$, $t = 2, \dots, T$, only. By Property 1, $z_t$, $t = 1, 2, \dots, T$, are uncorrelated, and by Property 2 they are jointly normally distributed. Because $z_t$, $t = 1, 2, \dots, T$, are uncorrelated and jointly normal random variables, they are independent as well (Lancaster 1959, Theorem 1). Therefore the independence of $\bar{x}_T$ (a function of $z_1$ only) and $s^2$ (a function of $z_2, z_3, \dots, z_T$ only) follows because $z_1$ is independent of $z_2, z_3, \dots, z_T$.
- The sample mean $\bar{x}_T = z_1$ is normally distributed by Property 2. Its mean is given in Property 3, and its variance is given in Property 4. Overall, $\bar{x}_T$ is distributed as Normal$(\mu, \sigma^2/T)$.
- Finally,

$$\frac{(T-1)s^2}{\sigma^2} = \frac{\sum_{t=1}^{T} (x_t - \bar{x}_T)^2}{\sigma^2} = \sum_{t=2}^{T} \left(\frac{z_t}{\sigma}\right)^2,$$

where the last equality follows by dividing both sides of Property 5 by $\sigma^2$. By Properties 2 to 4, each $z_t/\sigma$, $t = 2, \dots, T$, is a standard normal variable, and by the first part of this proof these variables are independent. Hence $\sum_{t=2}^{T} z_t^2/\sigma^2$ is the sum of $T-1$ independent squared standard normal variables and is distributed as $\chi^2(T-1)$.
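A Monte Carlo check does not prove the Main Theorem, but it can make its three claims tangible. The sketch below (standard-library Python; the values $T = 5$, $\mu = 1$, $\sigma = 2$, 20,000 replications, and the seed are arbitrary choices of ours) simulates normal samples and compares simulated moments with those the theorem implies: $\bar{x}_T$ with mean $\mu$ and variance $\sigma^2/T = 0.8$, and $(T-1)s^2/\sigma^2$ with the $\chi^2(4)$ mean 4 and variance 8. Because zero correlation alone is weaker than independence, it also reports the correlation between $(\bar{x}_T - \mu)^2$ and $s^2$, which should likewise be near zero under independence.

```python
import math
import random


def mean_and_var(v):
    """Sample mean and (n - 1)-denominator sample variance of a list."""
    n = len(v)
    m = sum(v) / n
    return m, sum((a - m) ** 2 for a in v) / (n - 1)


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, va = mean_and_var(a)
    mb, vb = mean_and_var(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / math.sqrt(va * vb)


def simulate(T=5, mu=1.0, sigma=2.0, reps=20000, seed=0):
    """Draw `reps` normal samples of size T; return the sample means
    and the scaled variances (T - 1) s^2 / sigma^2."""
    rng = random.Random(seed)
    xbars, q = [], []
    for _ in range(reps):
        m, s2 = mean_and_var([rng.gauss(mu, sigma) for _ in range(T)])
        xbars.append(m)
        q.append((T - 1) * s2 / sigma ** 2)
    return xbars, q


xbars, q = simulate()
print(mean_and_var(xbars))   # should be close to (1.0, 0.8)
print(mean_and_var(q))       # should be close to (4, 8), the chi^2(4) moments
print(corr(xbars, q), corr([(m - 1.0) ** 2 for m in xbars], q))  # both near 0
```

With a skewed population in place of `rng.gauss` (for example exponential draws), the correlation between $\bar{x}_T$ and $s^2$ would move away from zero, in line with the theorem's reliance on normality.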
The Main Theorem was first proved by Fisher (1915, 1925), but Fisher's "geometric arguments" are difficult to follow. What we have presented, albeit somewhat lengthy, is an elementary and concrete proof that requires very little abstract thinking. Our proof should be accessible to students with any moderately quantitative background, and should be preferable for students who are not used to elaborate abstract mathematical reasoning, such as beginning undergraduates in statistics or undergraduate and graduate students in econometrics. Overall, the transformed variables $z_t$ make it very easy to deduce the joint distribution of the sample mean and the sample variance.
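We finish this section by stating another property of the Helmert transformation, which we will briefly use in Section 4. Suppose there is another set of variables $y_t$, $t = 1, \dots, T$, with $\bar{y}_T$ and $\bar{y}_{t-1}$ defined analogously to $\bar{x}_T$ and $\bar{x}_{t-1}$. Then the following holds, which we call Property 6:

$$\sum_{t=1}^{T} (x_t - \bar{x}_T)(y_t - \bar{y}_T) = \sum_{t=2}^{T} (x_t - \bar{x}_{t-1})(y_t - \bar{y}_{t-1})\frac{t-1}{t}.$$

The proof of this property follows the same steps as the proof of the Auxiliary Theorem and is therefore omitted. Note that if $y_t = x_t$ for all $t$, this property reduces to Property 5.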
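Property 6 can be checked the same way in exact arithmetic. One quick way to see why it holds, given Property 5, is polarization: the Helmert deviations are linear in the data, so applying Property 5 to $x_t + y_t$ and to $x_t - y_t$ and taking one quarter of the difference of the two identities yields Property 6. The sketch below simply compares both sides on one arbitrary pair of integer sequences; the helper names are hypothetical.

```python
from fractions import Fraction


def cross_dev(x, y):
    """Left-hand side: sum_t (x_t - xbar_T)(y_t - ybar_T)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def helmert_cross(x, y):
    """Right-hand side: sum_{t=2}^T (x_t - xbar_{t-1})(y_t - ybar_{t-1}) (t-1)/t."""
    total = Fraction(0)
    for t in range(2, len(x) + 1):
        dx = x[t - 1] - sum(x[: t - 1]) / (t - 1)
        dy = y[t - 1] - sum(y[: t - 1]) / (t - 1)
        total += dx * dy * Fraction(t - 1, t)
    return total


x = [Fraction(v) for v in (3, -1, 4, 1, 5, 9, 2)]
y = [Fraction(v) for v in (2, 7, 1, 8, 2, 8, 1)]
print(cross_dev(x, y) == helmert_cross(x, y))   # True
print(cross_dev(x, x) == helmert_cross(x, x))   # True (Property 5 as a special case)
```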

The Helmert Matrix
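The previous section is self-contained, and using the Helmert transformation to deduce the joint distribution of the sample mean and the sample variance does not require any matrix algebra. However, if there is a need or a desire to do so, one can also relate the Helmert transformation of the previous section to what Lancaster (1965) calls a Helmert matrix in the strict sense. Consider the following $T \times T$ matrix:

$$H_o = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ -1 & 1 & 0 & \cdots & 0 \\ -1 & -1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & -1 & \cdots & T-1 \end{pmatrix},$$

whose first row consists of ones and whose row $t \geq 2$ has $-1$ in its first $t-1$ positions, $t-1$ in position $t$, and zeros afterwards. We can verify by direct multiplication that the rows of this matrix are orthogonal, that is, $H_o H_o'$ is a diagonal matrix, with diagonal elements $T, 2, 6, \dots, T(T-1)$. We can also consider the rescaled version of $H_o$,

$$H_m = \mathrm{diag}(v) \cdot H_o, \qquad v = \left(\frac{1}{T}, \frac{1}{\sqrt{1 \cdot 2}}, \frac{1}{\sqrt{2 \cdot 3}}, \dots, \frac{1}{\sqrt{(T-1)T}}\right)',$$

where $\mathrm{diag}(v)$ is an operator that transforms a vector $v$ into a diagonal matrix, so that the operation $\mathrm{diag}(v) \cdot H_o$ multiplies the $i$th row of $H_o$ by the $i$th element of $v$. With the first element of $v$ chosen as $1/T$, we can see by direct multiplication that $H_m' H_m$ is symmetric, with one element, $1 - (T-1)/T^2$, repeated on the main diagonal and another, $-(T-1)/T^2$, repeated everywhere off the main diagonal. $H_m H_m'$ is diagonal and almost the identity matrix: only the upper-left element, $1/T$, differs from 1, and the rest of $H_m H_m'$ coincides with the identity matrix.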
If we instead choose the first element of $v$ to be $1/\sqrt{T}$ and call the resulting matrix $H_n$, direct multiplication shows that $H_n' H_n$ and $H_n H_n'$ are both the identity matrix, and we call such a matrix $H_n$ an orthonormal matrix.
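The matrix statements above can be verified by direct multiplication in a few lines. The sketch below builds $H_o$ with the sign convention that reproduces the transformation $z_t = (x_t - \bar{x}_{t-1})\sqrt{(t-1)/t}$ (row $t \geq 2$ has $-1$ in its first $t-1$ positions and $t-1$ in position $t$; this sign choice is our assumption, and flipping the signs of whole rows would not change any of the products checked here), then forms $H_n$ and $H_m$ by rescaling rows. It uses plain Python lists rather than a matrix library to stay self-contained; the function names are hypothetical.

```python
import math


def helmert_Ho(T):
    """Unscaled Helmert matrix: row 1 is all ones; row t >= 2 has -1 in
    columns 1..t-1, t - 1 in column t, and zeros afterwards."""
    rows = [[1] * T]
    for t in range(2, T + 1):
        rows.append([-1] * (t - 1) + [t - 1] + [0] * (T - t))
    return rows


def scale_rows(v, A):
    """diag(v) . A: multiply the i-th row of A by v[i]."""
    return [[vi * a for a in row] for vi, row in zip(v, A)]


def gram_rows(A):
    """A A' (inner products between rows of A)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]


def gram_cols(A):
    """A' A (inner products between columns of A)."""
    return gram_rows([list(col) for col in zip(*A)])


T = 5
Ho = helmert_Ho(T)
print(gram_rows(Ho))   # diag(5, 2, 6, 12, 20): the rows are orthogonal

tail = [1 / math.sqrt(t * (t - 1)) for t in range(2, T + 1)]
Hn = scale_rows([1 / math.sqrt(T)] + tail, Ho)   # orthonormal version
Hm = scale_rows([1 / T] + tail, Ho)              # first element of v is 1/T

# Up to rounding: Hn Hn' = Hn' Hn = I;  Hm Hm' = diag(1/T, 1, ..., 1);
# Hm' Hm has 1 - (T-1)/T^2 on the diagonal and -(T-1)/T^2 off it.
for name, M in [("Hn Hn'", gram_rows(Hn)), ("Hm Hm'", gram_rows(Hm)),
                ("Hm' Hm", gram_cols(Hm))]:
    print(name, [[round(a, 6) for a in row] for row in M])
```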
We see that if we arrange the set $x_t$, $t = 1, 2, \dots, T$, into a column vector $x \equiv (x_1, x_2, \dots, x_T)'$ and choose the first element of $v$ to be $1/T$, we obtain the Helmert transformation of the previous section, $z = H_m \cdot x$, where $z \equiv (z_1, z_2, \dots, z_T)'$. Because of the appealing aesthetics of $H_n$, with the first element of $v$ chosen as $1/\sqrt{T}$, i.e. a choice resulting in an orthonormal matrix with $H_n' H_n = H_n H_n' = I$, all authors we are aware of use this version of the Helmert matrix. However, for our application all we need is that $H_m H_m'$ be a diagonal matrix whose main-diagonal elements below the first all equal 1. In this situation, because the $x_t$ are i.i.d. with variance $\sigma^2$, $\mathrm{Var}(z) = \sigma^2 H_m H_m'$, so the elements of $z = H_m \cdot x$ are mutually uncorrelated and each element $z_t$, $t = 2, \dots, T$, is homoskedastic with variance $\sigma^2$. Therefore, for our application, choosing the first element of $v$ as $1/T$ works perfectly well.
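The claim that the matrix form reproduces the transformation can be checked numerically. The sketch below builds $H_m$ directly (first row $1/T$; row $t \geq 2$ equal to $(-1, \dots, -1, t-1, 0, \dots, 0)/\sqrt{t(t-1)}$, which is our assumed sign convention), multiplies it by a small data vector, and compares the result with $z_1 = \bar{x}_T$, $z_t = (x_t - \bar{x}_{t-1})\sqrt{(t-1)/t}$ computed from running means. It then forms $H_m H_m'$, which, multiplied by $\sigma^2$, is the covariance matrix of $z$ when the $x_t$ are i.i.d. Function names are hypothetical.

```python
import math


def helmert_Hm(T):
    """Rescaled Helmert matrix Hm = diag(v) . Ho, with v_1 = 1/T and
    v_t = 1/sqrt(t(t-1)) for t >= 2 (assumed sign convention)."""
    Hm = [[1.0 / T] * T]
    for t in range(2, T + 1):
        c = 1.0 / math.sqrt(t * (t - 1))
        Hm.append([-c] * (t - 1) + [(t - 1) * c] + [0.0] * (T - t))
    return Hm


def helmert_recursive(x):
    """Direct form: z_1 = mean, z_t = (x_t - xbar_{t-1}) sqrt((t-1)/t)."""
    z = [sum(x) / len(x)]
    for t in range(2, len(x) + 1):
        prev_mean = sum(x[: t - 1]) / (t - 1)
        z.append((x[t - 1] - prev_mean) * math.sqrt((t - 1) / t))
    return z


x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
T = len(x)
Hm = helmert_Hm(T)
z_matrix = [sum(h * xi for h, xi in zip(row, x)) for row in Hm]   # z = Hm . x
z_direct = helmert_recursive(x)
print(max(abs(a - b) for a, b in zip(z_matrix, z_direct)))  # tiny: rounding only

# If the x_t are i.i.d. with variance sigma^2, Var(z) = sigma^2 * Hm Hm',
# and Hm Hm' = diag(1/T, 1, ..., 1) up to rounding.
HmHmT = [[sum(a * b for a, b in zip(r1, r2)) for r2 in Hm] for r1 in Hm]
print([round(HmHmT[i][i], 12) for i in range(T)])
```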

The Helmert Transformation in Panel Data Models
We have derived the joint distribution of the sample mean and the sample variance with particular focus on the simplicity and concreteness of the derivation and on the role of the Helmert transformation.
The Helmert transformation has found an application in the "fixed effects" panel data model too. Consider the standard one-way "fixed effects" panel data model (e.g. Wooldridge 2010, Ch. 10; Hsiao 2014, Ch. 3)

$$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad i = 1, 2, \dots, I, \quad t = 1, 2, \dots, T, \qquad (3)$$

where the regressand $y_{it}$ is a scalar, the regressor vector $x_{it}$ is $K \times 1$, and $(y_{it}, x_{it}')$ are i.i.d. for $i = 1, 2, \dots, I$, i.e. the variables constitute an i.i.d. random sample in the cross section. The individual "fixed effects" $\alpha_i$ (so called by convention) are time constant, random, and potentially correlated with the regressor vector. The idiosyncratic error term $\varepsilon_{it}$ is uncorrelated with the regressor $x_{j\tau}$ for all $t, \tau, i, j$, i.e. the regressor $x_{it}$ is strictly exogenous with respect to $\varepsilon_{it}$ conditional on the fixed effects, and $\varepsilon_{it}$ is i.i.d. both in the cross-sectional and in the time-series dimension.
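Consistent estimation of the parameter vector $\beta$ in Eq. (3), under the assumptions that the fixed effects can be arbitrarily correlated with the regressors and that the regressors are strictly exogenous with respect to $\varepsilon_{it}$ conditional on the fixed effects, proceeds by eliminating the fixed effects. The conventional way of eliminating the fixed effects is the within transformation. We first average Eq. (3) across time, to obtain

$$\bar{y}_{iT} = \bar{x}_{iT}'\beta + \alpha_i + \bar{\varepsilon}_{iT},$$

where $\bar{y}_{iT} = \sum_{t=1}^{T} y_{it}/T$, and $\bar{x}_{iT}$ and $\bar{\varepsilon}_{iT}$ are defined analogously. Then we subtract this averaged equation from Eq. (3) to eliminate the fixed effects and obtain the estimating equation

$$y_{it} - \bar{y}_{iT} = (x_{it} - \bar{x}_{iT})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{iT}). \qquad (4)$$

Finally, we estimate Eq. (4) by ordinary least squares (OLS) over the $I \cdot T$ pooled observations to obtain the within estimator

$$\hat{\beta}_W = \left(\sum_{i=1}^{I}\sum_{t=1}^{T} (x_{it} - \bar{x}_{iT})(x_{it} - \bar{x}_{iT})'\right)^{-1} \sum_{i=1}^{I}\sum_{t=1}^{T} (x_{it} - \bar{x}_{iT})(y_{it} - \bar{y}_{iT}). \qquad (5)$$

The within estimator is well studied, well understood, and a basic building block of the panel data econometrics literature. However, the within transformation introduces correlation between the transformed errors in the estimating Eq. (4), because all the transformed errors $\varepsilon_{it} - \bar{\varepsilon}_{iT}$ for a cross-sectional unit $i$ share the same $\bar{\varepsilon}_{iT}$: if the $\varepsilon_{it}$ have variance $\sigma_\varepsilon^2$, each transformed error has variance $\sigma_\varepsilon^2 (T-1)/T$ and any two of them have covariance $-\sigma_\varepsilon^2/T$. This makes residual diagnostic checks and residual analysis awkward and difficult, for example if we wanted to test that the errors $\varepsilon_{it}$ are indeed i.i.d.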
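The covariance structure of the within-transformed errors, and its contrast with Helmert-transformed errors, can be computed exactly. Writing each transformed error of unit $i$ as a linear combination of $\varepsilon_{i1}, \dots, \varepsilon_{iT}$ with weight matrix $W$, its covariance matrix is $\sigma_\varepsilon^2 W W'$ when the errors are i.i.d. with variance $\sigma_\varepsilon^2$. The sketch below sets $\sigma_\varepsilon^2 = 1$ and $T = 4$ (arbitrary choices), uses exact fractions for the within weights, and floating point for the Helmert weights because of the square roots. Function names are hypothetical.

```python
from fractions import Fraction
import math


def gram(W):
    """W W': covariance matrix of W e when e has i.i.d. unit-variance entries."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in W] for r1 in W]


def within_weights(T):
    """Row t expresses e_t - ebar_T as a linear combination of e_1..e_T."""
    return [[(1 if s == t else 0) - Fraction(1, T) for s in range(T)]
            for t in range(T)]


def helmert_weights(T):
    """Row for t = 2..T expresses (e_t - ebar_{t-1}) sqrt((t-1)/t)."""
    W = []
    for t in range(2, T + 1):
        c = 1 / math.sqrt(t * (t - 1))
        W.append([-c] * (t - 1) + [(t - 1) * c] + [0.0] * (T - t))
    return W


T = 4
Vw = gram(within_weights(T))
print(Vw[0][0], Vw[0][1])       # 3/4 -1/4: variance (T-1)/T, covariance -1/T
Vh = gram(helmert_weights(T))   # (T-1) x (T-1) identity, up to rounding
print([[round(a, 12) for a in row] for row in Vh])
```

The within errors have pairwise correlation $-1/(T-1)$, whereas the $T-1$ Helmert-transformed errors are uncorrelated with unit variance, which is what makes standard residual diagnostics applicable to them.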
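On the other hand, if we apply the Helmert transformation in Eq. (1) to each variable in the fixed effects model Eq. (3), for each cross-sectional unit $i$ separately, constructing the transformed variables corresponding to $z_2, z_3, \dots, z_T$, which have mean 0, then we again eliminate the fixed effects $\alpha_i$:

$$(y_{it} - \bar{y}_{i,t-1})\sqrt{\frac{t-1}{t}} = (x_{it} - \bar{x}_{i,t-1})'\beta\sqrt{\frac{t-1}{t}} + (\varepsilon_{it} - \bar{\varepsilon}_{i,t-1})\sqrt{\frac{t-1}{t}}, \qquad t = 2, \dots, T, \qquad (6)$$

where $\bar{y}_{i,t-1} = \sum_{s=1}^{t-1} y_{is}/(t-1)$, and $\bar{x}_{i,t-1}$ and $\bar{\varepsilon}_{i,t-1}$ are defined analogously. We can proceed with OLS estimation of the Helmert-transformed estimating Eq. (6),

$$\hat{\beta}_H = \left(\sum_{i=1}^{I}\sum_{t=2}^{T} \frac{t-1}{t}(x_{it} - \bar{x}_{i,t-1})(x_{it} - \bar{x}_{i,t-1})'\right)^{-1} \sum_{i=1}^{I}\sum_{t=2}^{T} \frac{t-1}{t}(x_{it} - \bar{x}_{i,t-1})(y_{it} - \bar{y}_{i,t-1}), \qquad (7)$$

which is the best linear unbiased estimator, because the errors in the Helmert-transformed estimating equation, $(\varepsilon_{it} - \bar{\varepsilon}_{i,t-1})\sqrt{(t-1)/t}$, are uncorrelated and homoskedastic (and normal if the original $\varepsilon_{it}$ were normal to start with). By invoking Property 6 of the Helmert transformation, for each unit $i$ and each pair of elements of the regressor and regressand, one can verify that the estimator in Eq. (7) coincides with the within estimator in Eq. (5). As an added benefit, however, we can apply any residual diagnostics and checks we might have in mind, because the Helmert-transformed errors in the estimating equation share the second-moment properties of the original errors in the structural model: they are uncorrelated and homoskedastic with the same variance (and, under normality, i.i.d.). To the best of our knowledge, the existing literature on panel data models has not exploited this simplification of the analysis afforded by the Helmert transformation.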
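To see the equivalence of the within estimator and OLS on the Helmert-transformed data concretely, the sketch below computes both for a scalar regressor ($K = 1$) on a small simulated panel in exact rational arithmetic, so that equality can be tested with `==` rather than up to rounding. The data-generating choices ($I = 30$, $T = 6$, $\beta = 3/2$, integer-valued fixed effects that also enter the regressor so that they are correlated with it) are arbitrary illustrations, and the function names `within_estimator` and `helmert_ols` are hypothetical.

```python
from fractions import Fraction
import random


def within_estimator(y, x):
    """Scalar-regressor within estimator; y[i][t], x[i][t] for unit i, period t."""
    num = den = Fraction(0)
    for yi, xi in zip(y, x):
        my, mx = sum(yi) / len(yi), sum(xi) / len(xi)
        num += sum((a - mx) * (b - my) for a, b in zip(xi, yi))
        den += sum((a - mx) ** 2 for a in xi)
    return num / den


def helmert_ols(y, x):
    """OLS on backward-Helmert-transformed data, t = 2..T. The transformed
    variables are (v_t - vbar_{t-1}) sqrt((t-1)/t), so each product of two
    of them carries the rational weight (t-1)/t."""
    num = den = Fraction(0)
    for yi, xi in zip(y, x):
        for t in range(2, len(yi) + 1):
            dx = xi[t - 1] - sum(xi[: t - 1]) / (t - 1)
            dy = yi[t - 1] - sum(yi[: t - 1]) / (t - 1)
            w = Fraction(t - 1, t)
            num += w * dx * dy
            den += w * dx * dx
    return num / den


rng = random.Random(42)
I, T, beta = 30, 6, Fraction(3, 2)
alpha = [Fraction(rng.randint(-10, 10)) for _ in range(I)]      # fixed effects
x = [[Fraction(rng.randint(-5, 5)) + alpha[i] for _ in range(T)] for i in range(I)]
y = [[beta * x[i][t] + alpha[i] + Fraction(rng.randint(-3, 3), 2) for t in range(T)]
     for i in range(I)]
print(within_estimator(y, x) == helmert_ols(y, x))   # True
print(float(helmert_ols(y, x)))                      # should be near beta = 1.5
```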
The use of the Helmert transformation in the context of panel models, and in particular dynamic panel models, has been popularized by Arellano and Bover (1995) and Arellano (2003). (See also Alvarez and Arellano 2003.) It is also described in Hansen (2022, Section 17.43). A dynamic panel model is like the one given in Eq. (3), but it additionally contains lagged values of the dependent variable as regressors. For example, if $y_{it}$ (directly) depends only on its own value in the previous period, then the model is

$$y_{it} = \rho\, y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}. \qquad (8)$$

Arellano and Bover (1995) suggest transforming each variable by subtracting from its value in period $t$ the mean of its values in all later periods, $t+1, \dots, T$, and rescaling. Observe that this is the same Helmert transformation as in Eq. (1), but with the variables ordered in reverse order of the time index. Arellano and Bover (1995) refer to this transformation as "the forward orthogonal deviation", as opposed to "the backward orthogonal deviation", which is the one displayed in Eq. (6). Both forward and backward orthogonal deviations produce uncorrelated and homoskedastic errors in the transformed model, provided that the original errors $\varepsilon_{it}$ are i.i.d. However, the forward orthogonal deviation has the following advantage in the dynamic panel model. Any transformation that removes the fixed effects introduces correlation between the transformed lagged dependent variable and the transformed error term, so one needs to use an instrumental variables estimator to obtain consistent estimates of $\rho$ and $\beta$. With the forward orthogonal deviation, the past values of the dependent variable $y_{i1}, \dots, y_{i,t-1}$ are valid instruments for $(y_{i,t-1} - \bar{y}_{it})\sqrt{(T-t-1)/(T-t)}$, which is the new lagged dependent variable after the transformation. Furthermore, Hayakawa (2009a, 2009b) shows that the instrumental variables estimator is more efficient if the instruments themselves are constructed using backward orthogonal deviations.
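A short sketch of the relationship between the two deviations. We assume the standard form of the forward orthogonal deviation, $(v_t - \text{mean of } v_{t+1}, \dots, v_T)\sqrt{(T-t)/(T-t+1)}$ for $t = 1, \dots, T-1$, and check numerically that it equals the backward transformation of Eq. (1) applied to the time-reversed series and read back in the original time order. The sketch also checks that the sum of squared forward deviations equals $\sum_t (v_t - \bar{v}_T)^2$, the analogue of Property 5. The series `v` and the function names are hypothetical.

```python
import math


def backward_helmert(v):
    """Backward orthogonal deviations, t = 2..T:
    (v_t - mean of v_1..v_{t-1}) * sqrt((t-1)/t)."""
    out = []
    for t in range(2, len(v) + 1):
        prev_mean = sum(v[: t - 1]) / (t - 1)
        out.append((v[t - 1] - prev_mean) * math.sqrt((t - 1) / t))
    return out


def forward_orthogonal_deviations(v):
    """Forward orthogonal deviations, t = 1..T-1:
    (v_t - mean of v_{t+1}..v_T) * sqrt((T-t)/(T-t+1))."""
    T = len(v)
    out = []
    for t in range(1, T):
        fwd_mean = sum(v[t:]) / (T - t)        # mean of v_{t+1}, ..., v_T
        out.append((v[t - 1] - fwd_mean) * math.sqrt((T - t) / (T - t + 1)))
    return out


v = [1.0, 3.0, 2.0, 6.0, 4.0]
fod = forward_orthogonal_deviations(v)
bwd_rev = backward_helmert(v[::-1])[::-1]      # reverse, transform, reverse back
print(max(abs(a - b) for a, b in zip(fod, bwd_rev)))   # tiny: rounding only
```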

Conclusion
We revisit the Helmert transformation and provide a simple and useful induction-based derivation of the joint distribution of the sample mean and the sample variance in samples of independently and identically distributed normal random variables. Our derivation is concrete and should appeal to students of statistics and econometrics. We also suggest one fruitful application of the Helmert transformation in panel data econometrics: residual-based tests in fixed effects models. We briefly review the applications of the Helmert transformation in the panel data context, where the transformation is more commonly known as "the forward/backward orthogonal deviations operator".