Bootstrap methods for inference in the Parks model

This paper shows how to bootstrap hypothesis tests in the context of Parks's (1967) Feasible Generalized Least Squares estimator. It then demonstrates that the bootstrap outperforms FGLS(Parks)'s top competitor. The FGLS(Parks) estimator has been a workhorse for the analysis of panel data and seemingly unrelated regression equation systems because it allows the incorporation of cross-sectional correlation together with heteroskedasticity and serial correlation. Unfortunately, the associated asymptotic standard error estimates are biased downward, often severely. To address this problem, Beck and Katz (1995) developed an approach that uses the Prais-Winsten estimator together with "panel corrected standard errors" (PCSE). While PCSE produces standard error estimates that are less biased than FGLS(Parks), it forces the user to sacrifice efficiency for accuracy in hypothesis testing. The PCSE approach has been, and continues to be, widely used. This paper develops an alternative: a nonparametric bootstrapping procedure to be used in conjunction with the FGLS(Parks) estimator. We demonstrate its effectiveness using an experimental approach that creates artificial panel datasets modelled after actual panel datasets. Our approach provides a superior alternative to existing estimation options by allowing researchers to retain the efficiency of the FGLS(Parks) estimator while producing more accurate hypothesis test results than the PCSE. (Published in Special Issue: Recent developments in international economics) JEL C13 C15 C23 C33


Introduction
The Feasible Generalized Least Squares (FGLS) estimator of Parks (1967) was designed as an efficient estimator for systems of equations with both serially and contemporaneously correlated disturbances. 1 Such models include the SUR model and associated restricted forms, such as time-series, cross-section/panel data models. Its superior efficiency is well established (Kmenta and Gilbert, 1970; Guilkey and Schmidt, 1973; Messemer, 2003; Chen et al., 2010; Moundigbaye et al., 2018). It has been widely used and is available in many econometric software packages, including RATS, SHAZAM, SAS, and Stata. 2 Kmenta and Gilbert (1970) were the first to note that the estimated standard errors from FGLS(Parks), while consistent, can be substantially biased in finite samples. More recently, Beck and Katz (1995) documented that the estimated standard errors for the FGLS(Parks) estimator have severe downward bias when the time dimension is small relative to the number of cross-sections. 3 To address this deficiency, they recommend using a Prais-Winsten estimator with corresponding "panel-corrected standard errors" (PCSE).
This approach has been widely adopted, as evidenced by more than 6,600 Google Scholar citations. It remains a popular estimation choice. While the PCSE approach generally involves less size distortion than FGLS(Parks) with asymptotic standard errors, it does not eliminate it (cf. Reed and Webb, 2010). This paper demonstrates that the combined use of the FGLS(Parks) estimator with bootstrapping constitutes an approach that is superior to the PCSE approach in both estimator efficiency and inference accuracy. 4 The remainder of the paper is organized as follows. Section 2 briefly introduces the SUR model with autoregressive errors and associated FGLS(Parks) estimator. Section 3 presents a nonparametric bootstrap procedure for the FGLS(Parks) estimator. Section 4 demonstrates the superior performance of the bootstrap procedure. It employs an innovative experimental approach where testing is performed on synthetic datasets designed to resemble actual datasets. Section 5 concludes.

The SUR model and the FGLS(Parks) estimator
The FGLS(Parks) estimator was constructed as an efficient estimator for the SUR model with autocorrelated disturbances,

$$y_i = X_i \beta_i + \varepsilon_i, \quad i = 1, 2, \ldots, N, \qquad (1)$$

_________________________
1 For a sampling of the extent of heteroskedasticity, cross-sectional correlation, and serial correlation in some widely used panel datasets, see Reed and Ye (2011). 2 The FGLS(Parks) estimator provides the framework for Stata's xtgls procedure.
3 To see how the downward bias in FGLS (Parks) standard errors translates into coverage rate performance, refer to Moundigbaye et al. (2018). 4 For a comparison of efficiency of the FGLS (Parks), PCSE and the OLS estimators, see Moundigbaye et al. (2018).
Economics: The Open-Access, Open-Assessment E-Journal 14 (2020-4) www.economics-ejournal.org

where $y_i$ and $\varepsilon_i$ are $T \times 1$ vectors, $X_i$ is $T \times K_i$, and $\beta_i$ is $K_i \times 1$. The N equations can be stacked and represented in compact form as

$$y = X\beta + \varepsilon. \qquad (2)$$

It is assumed that the disturbance vector $\varepsilon_{(t)} = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$, 5 is generated by a stationary, first-order autoregressive process,

$$\varepsilon_{(t)} = P \varepsilon_{(t-1)} + u_{(t)}, \qquad (3)$$

where $P$ is an $N \times N$ diagonal matrix consisting of scalars $\rho_i$ having absolute value less than 1. Consistent with stationarity, the disturbances for the first observation, $\varepsilon_{(1)}$, are assumed to be generated by $\varepsilon_{(1)} = A^{-1} u_{(1)}$. 6 The innovations $u_{(t)}$, $t = 1, 2, \ldots, T$, are independent and identically distributed random variables with $E[u_{(t)}] = 0$ and covariance matrix

$$E[u_{(t)} u_{(t)}'] = \Sigma. \qquad (4)$$

In summary, the covariance model above assumes a diagonal matrix $P$, with N parameters specifying equation-specific, first-order serial correlation, together with a symmetric matrix $\Sigma$, with $N(N+1)/2$ parameters specifying the contemporaneous covariances.
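To make the error process in Equations (1) through (4) concrete, the following sketch simulates Parks-type disturbances. All parameter values here (N, T, the $\rho_i$, and $\Sigma$) are invented for illustration; the stationary first-observation covariance $\Sigma_0$ uses the closed form available when $P$ is diagonal, $\Sigma_{0,ij} = \Sigma_{ij}/(1 - \rho_i \rho_j)$.

```python
import numpy as np

rng = np.random.default_rng(0)

N, T = 3, 50                       # cross-sections, time periods (illustrative)
rho = np.array([0.5, -0.3, 0.7])   # equation-specific AR(1) coefficients, |rho_i| < 1
P = np.diag(rho)

# Contemporaneous covariance of the innovations u_(t): symmetric, positive definite
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.5, 0.3],
                  [0.2, 0.3, 0.8]])

# Stationary covariance Sigma0 of eps_(t) solves Sigma0 = P Sigma0 P' + Sigma.
# For diagonal P, the (i, j) element is Sigma[i, j] / (1 - rho_i * rho_j).
Sigma0 = Sigma / (1.0 - np.outer(rho, rho))

# Draw eps_(1) from the stationary distribution, then iterate the AR(1) in (3).
L_Sigma = np.linalg.cholesky(Sigma)
eps = np.empty((N, T))
eps[:, 0] = np.linalg.cholesky(Sigma0) @ rng.standard_normal(N)
for t in range(1, T):
    eps[:, t] = rho * eps[:, t - 1] + L_Sigma @ rng.standard_normal(N)
```

The resulting `eps` matrix exhibits equation-specific serial correlation and cross-sectional correlation simultaneously, the two features the Parks model is designed to handle.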
The FGLS(Parks) estimator is given by

$$\hat{\beta}_{FGLS} = \left[ X' \hat{\Psi}' (\hat{\Sigma}^{-1} \otimes I_T) \hat{\Psi} X \right]^{-1} X' \hat{\Psi}' (\hat{\Sigma}^{-1} \otimes I_T) \hat{\Psi} y, \qquad (5)$$

where $\hat{\Psi}$ is the Prais-Winsten transformation matrix and $\hat{\Sigma}$ is the estimate of $\Sigma$ in Equation (4). Estimation of the individual parameters is described in Judge et al. (1985, pages 485-490). Further details are given in the Appendix. Note that the GLS estimator for the Parks model, $\hat{\beta}_{GLS}$, is identical to Equation (5) except that $\hat{\Psi}$ and $\hat{\Sigma}$ are replaced with their population analogs.
_________________________
5 To clarify notation, notice that the vector $\varepsilon_i$ contains the T disturbances for the ith equation whereas the vector $\varepsilon_{(t)}$ contains the N disturbances for the different equations or cross-sectional elements at time t. 6 Appendix 1 of Guilkey and Schmidt (1973) and Judge et al. (1985, pp. 485-487) show how to obtain the elements of the matrix A. If we let $\Sigma_0 = E[\varepsilon_{(t)} \varepsilon_{(t)}']$ for $t = 1, 2, \ldots, T$, then from (3) and stationarity, $\Sigma_0 = P \Sigma_0 P' + \Sigma$. Guilkey and Schmidt show that this equation can be solved for the elements of $\Sigma_0$ in terms of the elements of $P$ and $\Sigma$. From $\varepsilon_{(1)} = A^{-1} u_{(1)}$ we have $\Sigma_0 = A^{-1} \Sigma (A^{-1})'$, or $\Sigma = A \Sigma_0 A'$. If H and B are the Cholesky factors of $\Sigma$ and $\Sigma_0$, respectively, then $A B B' A' = H H'$ and $A = H B^{-1}$.
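The equation-by-equation core of the Prais-Winsten transformation underlying $\hat{\Psi}$ can be sketched as follows. This is a minimal illustration, not the full FGLS(Parks) routine: it applies only the equation-specific quasi-differencing and first-row rescaling, leaving the $\hat{\Sigma}^{-1} \otimes I_T$ weighting aside. The function name is ours.

```python
import numpy as np

def prais_winsten_transform(Z, rho):
    """Apply the equation-specific part of the Prais-Winsten transformation
    to a T x k array Z (y_i or X_i for one equation): quasi-difference rows
    2..T and rescale row 1 so the transformed disturbances are homoskedastic
    under stationary AR(1) errors with coefficient rho."""
    Zt = np.empty_like(Z, dtype=float)
    Zt[0] = np.sqrt(1.0 - rho**2) * Z[0]   # retains the first observation
    Zt[1:] = Z[1:] - rho * Z[:-1]          # quasi-differencing for t = 2..T
    return Zt

# Illustrative call on a small 4 x 2 data block:
Z = np.arange(8.0).reshape(4, 2)
Zt = prais_winsten_transform(Z, 0.6)
```

Retaining (rather than dropping) the rescaled first observation is what distinguishes Prais-Winsten from Cochrane-Orcutt, and it matters most in the short panels considered below.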

Hypothesis testing: the bootstrap versus asymptotic-based tests
Asymptotic-based tests. A common approach for testing linear hypotheses of the form $H_0: R\beta = r$ involves the Wald statistic,

$$g = (R\hat{\beta} - r)' \left[ R \hat{V}(\hat{\beta}) R' \right]^{-1} (R\hat{\beta} - r), \qquad (9)$$

where the restriction matrix R has q rows (the number of restrictions) and K columns (where $K = \sum_{i=1}^{N} K_i$). 8 The test statistic, g, is asymptotically distributed as $\chi^2_q$. As noted above, there is ample evidence that asymptotic-based tests for the FGLS(Parks) model do not provide accurate inference. Rejection probabilities tend to be substantially in excess of their nominal levels and confidence intervals are too small.
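A sketch of the Wald statistic in (9); the function name and the toy numbers are ours, chosen only so the arithmetic can be checked by hand.

```python
import numpy as np

def wald_statistic(beta_hat, V_hat, R, r):
    """Wald statistic g = (R b - r)' [R V R']^{-1} (R b - r) for H0: R beta = r."""
    d = R @ beta_hat - r
    return float(d @ np.linalg.solve(R @ V_hat @ R.T, d))

# Toy example: test that the second coefficient equals zero in a 3-parameter model.
beta_hat = np.array([1.0, 0.2, -0.5])
V_hat = np.diag([0.04, 0.01, 0.09])   # illustrative covariance matrix estimate
R = np.array([[0.0, 1.0, 0.0]])       # q = 1 restriction, K = 3
r = np.array([0.0])
g = wald_statistic(beta_hat, V_hat, R, r)   # (0.2)^2 / 0.01 = 4.0
```

Under $H_0$, `g` would be compared either with the $\chi^2_1$ critical value or, as developed below, with a bootstrap critical value.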
Bootstrap methods can improve upon asymptotic-based tests. Horowitz (1997) and others have provided extensive surveys of the bootstrap literature. Horowitz (1997, p. 201) gives a succinct statement of the key bootstrap results: "The bootstrap provides a higher-order asymptotic approximation to critical values for tests based on "smooth" asymptotically pivotal statistics. When a bootstrap-based critical value is used for such a test, the difference between the test's true and nominal levels decreases more rapidly with increasing sample size than it does when the critical value is obtained from first-order asymptotic theory. Given a sufficiently large sample, the nominal level of the test will be closer to the true level when a bootstrap critical value is used than when a critical value based on first-order asymptotic theory is used." _________________________ 7 While FGLS(Parks) and PCSE are widely used, there are alternatives available. For a comparative evaluation of the performance of the FGLS(Parks) and PCSE estimators with some alternatives, see Moundigbaye et al. (2018). 8 The bootstrapping approach that we propose is not limited to linear hypotheses. Suppose that we want to test the set of non-linear hypotheses $h(\beta) = 0$, where the non-linear function h has dimension q. The Wald statistic is then formed from $h(\hat{\beta})$ and its estimated covariance matrix. This Wald statistic has the same asymptotic distribution as its linear analog, (9), and bootstrap testing would follow the same steps as those outlined below.
Since the Wald statistic, g, is asymptotically pivotal, 9 bootstrap approaches will provide improved accuracy compared with tests that rely on the asymptotic distribution.
Bootstrap Procedure. Our approach builds on Rilstone and Veall (1996). They demonstrated improved performance for confidence intervals based on a parametric bootstrap in the context of a simple SUR model without serial correlation. Their paper helped to shift the focus of bootstrap work toward test statistics and away from standard errors, based on then-recent theoretical work on the bootstrap. An important contribution of this paper is the development of a non-parametric bootstrap for the more complicated case of a SUR model with serially correlated errors.
Below we give the steps for implementing a bootstrap test of the null hypothesis, $H_0$, in the context of a SUR model with AR(1) disturbances (i.e., the Parks model). Although the results that we show are based on a nonparametric bootstrap, it is useful for explanatory purposes to show how the nonparametric method differs from the simpler parametric method. 10

STEP 1: Estimate $\beta$ from the unrestricted model and compute the test statistic, g, from (9). Call this test statistic $\hat{g}$.
STEP 2: Re-estimate the model under the restrictions imposed by the null hypothesis, $R\beta = r$, to obtain the restricted estimates $\tilde{\beta}$, $\tilde{P}$, $\tilde{\Sigma}$, and $\tilde{A}$. For the nonparametric bootstrap, we also need $\tilde{E}$, the $N \times T$ matrix of residuals based on the constrained estimates.
STEP 3: Use the restricted estimates $\tilde{\beta}$, $\tilde{P}$, and $\tilde{\Sigma}$ as the parameters of the data generating process described by Equations (1) through (4) above, together with the restriction $R\beta = r$, to generate a bootstrap sample satisfying the hypothesis to be tested.
STEP 3a: For a parametric bootstrap, one might start by drawing the innovations $u_{(t)}$ from a multivariate normal distribution with mean zero and covariance matrix $\tilde{\Sigma}$. But for the nonparametric bootstrap we rely on information contained in the collection of constrained residuals, $\tilde{E}$. Using the restricted parameter estimates from STEP 2, make the following set of transformations to obtain a set of residuals, $\{\hat{v}_{(t)}\}$, that can be treated as the starting point of the data generating process (DGP). Let $\hat{u}_{(t)} = \tilde{\varepsilon}_{(t)} - \tilde{P} \tilde{\varepsilon}_{(t-1)}$ for $t = 2, 3, \ldots, T$, and $\hat{u}_{(1)} = \tilde{A} \tilde{\varepsilon}_{(1)}$. Then let $\hat{v}_{(t)} = \tilde{H}^{-1} \hat{u}_{(t)}$, where $\tilde{H}$ is the Cholesky factor of $\tilde{\Sigma}$.
For the bootstrap samples to satisfy the null hypothesis, the $\{\hat{v}_{(t)}\}$ should have row means of zero, and they should be uncorrelated. If those conditions are not met, one can whiten the residuals by first subtracting row averages from each of the corresponding elements of $\{\hat{v}_{(t)}\}$ to get the centered residual matrix $V^c$. Correlation among the rows of $V^c$ can be eliminated by premultiplying $V^c$ by the transposed inverse of the Cholesky factor of $V^c V^{c\prime}$. Hereafter, references to the set of residuals $\{\hat{v}_{(t)}\}$ assume that they have been whitened.
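The centering and whitening step can be sketched as below. The function name is ours; the code uses the lower-triangular Cholesky factor and its inverse, which is algebraically equivalent to premultiplying by the transposed inverse of the upper-triangular factor.

```python
import numpy as np

def whiten_residuals(V):
    """Center the rows of an N x T residual matrix V, then remove
    cross-row correlation by premultiplying with the inverse of the
    (lower-triangular) Cholesky factor of Vc Vc'."""
    Vc = V - V.mean(axis=1, keepdims=True)   # row means become zero
    L = np.linalg.cholesky(Vc @ Vc.T)        # L L' = Vc Vc'
    return np.linalg.solve(L, Vc)            # rows of result are orthonormal

rng = np.random.default_rng(1)
V = rng.standard_normal((4, 30))   # stand-in residual matrix, N=4, T=30
W = whiten_residuals(V)
```

After this step the rows of `W` have exactly zero means and identity cross-product, so resampling its columns cannot reintroduce contemporaneous correlation that the DGP will add back deliberately.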

_________________________
9 With regard to the Wald statistic being asymptotically pivotal, Horowitz (1997) writes, "The arguments in Section 2a show that the bootstrap provides higher-order asymptotic approximations to the distributions and critical values of 'smooth' asymptotically pivotal statistics. These include test statistics whose asymptotic distributions are standard normal or chi-square." Note that Wald statistics converge in distribution to chi-square, whose only parameter is the degrees of freedom. Hence the Wald statistic is asymptotically pivotal because it does not depend on parameters of the model's data generating process. 10 While not reported here, we also evaluated inferential performance using the parametric bootstrap. The parametric bootstrap performed marginally better than the non-parametric bootstrap. Results are available from the authors.

STEP 3a*: For a nonparametric bootstrap, we draw a sample of T vectors $\hat{v}_{(t)}$ with replacement from this empirical distribution to form the columns of U. It is the nonparametric counterpart to the U in STEP 3a above, where the sampling was from a known distribution.
From this point onward, the remaining steps for both parametric and nonparametric bootstraps are the same.
STEP 4: Estimate parameters for the unconstrained model from the first bootstrap sample (b = 1), compute the test statistic, $g_1$, and store it.
STEP 5: Repeat the process of generating a bootstrap sample, estimating the model, and computing the test statistic until one has B bootstrap samples and test statistics $g_1, g_2, \ldots, g_B$. Davidson and MacKinnon (2004) recommend choosing B such that, when $\alpha$ is the level of significance of the test, the product $(B+1)\alpha$ is an integer, e.g. $B = 999$. Estimate the $\alpha$-level critical value for the test, $c_\alpha$, as the $(1-\alpha)$th quantile from the empirical distribution of the $g_b$'s.

STEP 6: Reject $H_0$ at nominal level $\alpha$ if the test statistic computed from the original sample, $\hat{g}$, satisfies $\hat{g} > c_\alpha$. Alternatively, compute a p-value from STEP 5 as the fraction of the bootstrap samples with $g_b > \hat{g}$. The above procedure can be suitably modified at STEPS 1 and 3 to deal with test statistics that depend on estimates of the restricted model (Lagrange multiplier tests) or on estimates of both restricted and unrestricted models (likelihood ratio tests).
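STEPS 4 through 6 can be sketched generically. Here `draw_statistic` is a placeholder we introduce for STEPS 3-4 (generate one bootstrap sample under the null and return its Wald statistic); the chi-square draws in the illustrative call merely stand in for real bootstrap statistics.

```python
import numpy as np

def bootstrap_test(g_hat, draw_statistic, B=999, alpha=0.05, seed=0):
    """STEPS 4-6: draw B bootstrap test statistics under the null, then
    return the alpha-level critical value, the bootstrap p-value, and the
    rejection decision for the original-sample statistic g_hat.
    draw_statistic(rng) is a user-supplied stand-in for STEPS 3-4."""
    rng = np.random.default_rng(seed)
    g = np.sort([draw_statistic(rng) for _ in range(B)])
    # (1 - alpha) quantile of the empirical distribution of the g_b's;
    # with (B + 1) * alpha an integer this is a standard order statistic.
    c_alpha = g[int(round((B + 1) * (1 - alpha))) - 1]
    p_value = float(np.mean(g > g_hat))
    return c_alpha, p_value, g_hat > c_alpha

# Illustrative run: chi-square(1) draws stand in for the bootstrap g_b's,
# so c should land near the asymptotic critical value 3.841.
c, p, reject = bootstrap_test(3.84, lambda rng: rng.chisquare(1))
```

In a real application `draw_statistic` would regenerate the errors from the whitened residuals, rebuild the dependent variable under $R\beta = r$, re-estimate the unconstrained FGLS(Parks) model, and return the Wald statistic (9).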

Results from Monte Carlo experiments
Description of experiments. In this section we perform a series of Monte Carlo experiments to assess the performance of the bootstrap procedure described above. To do that, we construct synthetic panel datasets that are made to "look like" real datasets. We do this because the Parks model has $N(N+3)/2$ unique parameters in the error variance-covariance matrix, and there is little guidance on how best to assign values to these parameters for the purpose of designing meaningful experiments. Beck and Katz's simulation experiments are based on a substantial simplification of the error variance-covariance matrix. For example, when N=15, they reduce the number of unique parameters from 135 to 3 by assigning half of the panel units one variance value and the other half another, and setting all cross-sectional correlations the same. Further, their experiments do not allow for the interaction between cross-sectional and serial correlation. 11 The problem with this approach is that it greatly reduces the realism of the error variance-covariance matrix, which raises concerns about the external validity of the associated simulation results. In contrast, we adopt an innovative approach that assigns a unique value to every element in the error variance-covariance matrix, along with the values of the independent variable(s) in the experimental DGPs. We derive these values from actual panel datasets. The details of how we do this are given below.

The first dataset we work with is Grunfeld's (1958) investment data, one of the most widely used panel datasets in applied econometrics (Kleiber and Zeileis, 2010). 12 The dataset consists of annual observations of three variables for 10 U.S. firms over a 20-year period (1935-1954).
13 The dependent variable is firm gross investment in plant and equipment (I). The two explanatory variables are the market value of the firm at the end of the previous year (F) and a capital stock measure (C).
Our procedure for creating the synthetic datasets is best illustrated by example. Suppose we want to generate a synthetic dataset that "looks like" the Grunfeld data, except that it has dimensions N=2 and T=20. We begin by extracting the data for the first two firms in the Grunfeld dataset. In this case, the data matrix X consists of a constant term and the variables F and C. The twenty time-series values of the independent variables (F and C) are set equal to their actual values in the Grunfeld data. To generate artificial values for the dependent variable I, we multiply X by a coefficient vector $\beta$ whose elements are obtained by regressing I on X and then conforming the result to the type of restriction imposed. We then add simulated error terms.
The DGP for the simulated error terms is constructed so that the errors have the same nonspherical properties as residuals from a regression of the Grunfeld data. Specifically, we estimate Equation (2) using SUR and collect the residuals. These residuals are used to estimate the elements of $P$ and $\Sigma$. The estimated values are then set as the population values for the error variance-covariance matrix in the DGP that produces the simulated error terms. The simulated error terms are added to $X\beta$ to produce simulated values of the dependent variable I. By generating a new set of error terms, multiple synthetic datasets having dimensions N=2 and T=20 can be produced, each of which is constructed to have characteristics similar to the Grunfeld data. We then use these synthetic panel datasets to run experiments testing three linear restrictions, each having the form $R_i \beta = 0$:

(10.a) $R_1 = [0\ 1\ 0\ 0\ 0\ 0]$,

_________________________
11 Beck and Katz (1995, pages 640f.): "Varying degrees of heteroscedasticity were simulated by setting the variance of the first half of the units to 1 while the variance of the second half of the units was experimentally manipulated. The covariance matrix of this multivariate distribution was constructed so that all pairs of units were equally correlated, with the degree of correlation also experimentally manipulated. Errors were then generated so that the variances and covariances of the errors were proportional to the corresponding variances and covariances of the independent variable. The errors could therefore show panel heteroscedasticity and contemporaneous correlation, either alone or in combination." 12 Greene (2012, page 342) writes, "The Grunfeld investment data…are a classic data set that have been used for decades to develop and demonstrate estimators for seemingly unrelated regressions." 13 We use the version of the dataset that appears in Hill, Griffiths, and Lin (2008).
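A minimal sketch of how the AR(1) coefficients and the contemporaneous covariance might be estimated from an $N \times T$ matrix of SUR residuals. The function name and the OLS-on-lagged-residual estimator of $\rho_i$ are our illustrative choices; the paper's exact estimation formulas follow Judge et al. (1985).

```python
import numpy as np

def estimate_parks_covariance(E):
    """From an N x T residual matrix E, estimate the equation-specific
    AR(1) coefficients rho_i and the contemporaneous covariance Sigma
    of the AR(1) innovations."""
    N, T = E.shape
    # rho_i from an OLS regression of e_it on e_i,t-1 (no intercept)
    rho = np.array([E[i, 1:] @ E[i, :-1] / (E[i, :-1] @ E[i, :-1])
                    for i in range(N)])
    # innovations implied by the estimated rho, then their sample covariance
    U = E[:, 1:] - rho[:, None] * E[:, :-1]
    Sigma = U @ U.T / (T - 1)
    return rho, Sigma

# Illustrative use on simulated AR(1) residuals with rho = 0.6 and unit
# innovation variance (one equation, long T, so the estimates are close):
rng = np.random.default_rng(2)
T = 5000
e = np.zeros((1, T))
for t in range(1, T):
    e[0, t] = 0.6 * e[0, t - 1] + rng.standard_normal()
rho_hat, Sigma_hat = estimate_parks_covariance(e)
```

In the experiments described above, `rho_hat` and `Sigma_hat` (computed from the actual SUR residuals) would then be treated as the population values of $P$ and $\Sigma$ in the simulation DGP.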
Our experiments are designed so that the respective null hypotheses are always true. We chose these three restrictions because they each represent a common type of hypothesis test found in empirical research.
(10.a)-(10.c) are easily modified to allow for different numbers of firms, N, in the synthetic, Grunfeld-type panel datasets. For these datasets, restriction matrices will have 3N columns. The analogs to (10.a)-(10.c) for an alternative N value are identical in the first six columns, with zeros in the remaining $3N - 6$ columns.
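The zero-padding construction just described can be sketched as follows (the function name is ours; k = 3 reflects the constant, F, and C in each Grunfeld equation):

```python
import numpy as np

def expand_restriction(R2, N, k=3):
    """Pad a restriction matrix written for N=2 equations (2k columns,
    k coefficients per equation) with zeros so it applies to the stacked
    coefficient vector for N equations (N*k columns)."""
    q = R2.shape[0]   # number of restrictions (rows)
    return np.hstack([R2, np.zeros((q, N * k - R2.shape[1]))])

R1 = np.array([[0.0, 1.0, 0.0, 0.0, 0.0, 0.0]])  # Restriction (10.a) for N=2
R1_N10 = expand_restriction(R1, N=10)            # 3N = 30 columns for N=10
```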
Results from synthetic panel datasets modelled after the Grunfeld data. The first three rows of the top panel of Table 1 (T=20) report results of Monte Carlo experiments based on the Grunfeld data with N=5, testing each of the restrictions in (10.a)-(10.c). Each experiment consists of 500 replications. The first column reports 5% critical values for the $\chi^2$ distribution with 1 degree of freedom (Restrictions 1 and 2) and 2 degrees of freedom (Restriction 3). The next column reports the average critical values determined by the nonparametric bootstrap procedure described in Section 3 above. Note that these are average critical values because a critical value is produced for each replication, and the table summarizes the results from the 500 replications.
For example, in testing the significance of the coefficient for F in the equation for the first firm (Restriction 1), the $\chi^2$ critical value with one degree of freedom is 3.841. This compares to an average critical value of 8.619 for the bootstrap procedure. While this is only one experiment, these results are qualitatively what we would expect: estimated standard errors from the FGLS(Parks) estimator are well known for being biased downwards in finite samples, implying that the $\chi^2$ critical values will be too small. A similar pattern emerges when testing Restrictions 2 and 3.
The next set of three rows repeats the experiments, except now the full set of 10 firms is used in creating the synthetic Grunfeld panel datasets. Note that the addition of data for the extra firms does more than just increase the sample size. It introduces a new set of variances and covariances, increasing the number of unique elements in the error variance-covariance matrix from 20 to 65. This exacerbates the bias in the FGLS(Parks) standard errors. While the $\chi^2$ critical values are unchanged, the bootstrapped critical values increase to reflect the greater imprecision caused by having to estimate additional parameters.
The lower panel of Table 1 (T=11) repeats the previous experiments, but this time only uses the first 11 years of the Grunfeld data. The reason for doing this is that the finite sample bias in the FGLS(Parks) standard errors is known to increase as N grows relative to T (Moundigbaye et al., 2018). We investigate this by decreasing T to where it is just larger than the number of firms (T=11, N=10), noting that the FGLS(Parks) estimator cannot be estimated when T < N. 14 The six rows of the lower panel report the results for N=5 and N=10. The results again correspond to expectations. Compared to their values in the top panel, the smaller values of T relative to N are associated with larger bootstrapped critical values. Note that the $\chi^2$ critical values are unchanged.

Table 2 calculates the Type I error rates associated with the critical values in Table 1. These should ideally equal 0.05, though some deviation is expected due to sampling error. Column (1) reports Type I error rates associated with using the GLS estimator and critical values from the $\chi^2$ distribution. This provides a benchmark for the subsequent estimators. Column (2) reports error rates when the FGLS(Parks) estimator is used; i.e., when the population parameters of the error variance-covariance matrix are replaced with their estimates and hypothesis testing relies on critical values from the $\chi^2$ distribution. Column (3) continues to use the FGLS(Parks) estimator, but applies critical values from the bootstrapping procedure described in STEP 6 above. The last column reports rejection rates associated with the PCSE estimator. In all cases, hypotheses are rejected whenever the sample statistic is greater than the critical value for a given replication. The values in the table report rejection rates for the 500 replications for each experiment.

_________________________
14 In principle, the FGLS(Parks) estimator can be calculated when T=N. However, we found that we sometimes encountered problems in our simulations in this case, so we set the lower bound of T=N+1.
There are a number of noteworthy results here. First, rejection rates for the FGLS(Parks) estimator using $\chi^2$ critical values range from 0.166 to 0.624. In other words, if we were to use a 5% significance level, we would reject the true null hypothesis anywhere from 17% to 62% of the time. Further, as foreshadowed above, performance deteriorates markedly as N increases holding T constant (within each panel in the table), and as T decreases holding N constant (from top panel to bottom panel). In contrast, the bootstrap procedure does much better. Rejection rates for the bootstrap range from 0.018 to 0.060, close to the 0.05 benchmark.
The last column provides a comparison with Beck and Katz's PCSE estimator. As noted in the introduction, the PCSE procedure has been promoted as producing standard errors, and associated test results, that are superior to the FGLS(Parks) estimator, though at some cost in efficiency. Indeed, the improved performance of the PCSE over the FGLS(Parks) estimator with asymptotic standard errors is evident by comparing rejection rates in Column (4) and Column (2), respectively. It is also evident, however, that it performs substantially worse than the bootstrap procedure. A comparison of Column (4) with Column (3) shows that rejection rates for the PCSE are further from the 0.05 benchmark than those for the bootstrap in all 12 experiments.
The last row in each panel reports mean and standard deviation column values across the different experiments for T=20 and T=11, respectively. It provides a crude measure of overall performance, with values closer to 0.05 indicating better overall performance. The bootstrap procedure demonstrates superior inferential performance over both the PCSE and the FGLS(Parks) estimators. In the T=20 experiments, the mean rejection rates are 0.044 versus 0.154 and 0.276, respectively. In the T=11 experiments, they are 0.039 versus 0.199 and 0.415.

Results from synthetic panel datasets modelled after additional datasets. In this next section, we perform further performance tests. The goal is to investigate whether our bootstrap procedure continues to perform well when tested on synthetic datasets very different from those based on the Grunfeld data. Whereas the Grunfeld data related a firm's investment to its market value and capital stock, the next four datasets we work with relate (i) foreign aid and real per capita GDP growth for a set of least developed countries (LDCs) from 1960-2000; (ii) tourism and crime in Italian provinces from 1985-2003; (iii) taxes and growth in annual real per capita Gross Domestic Product (GDP) for a large cross-section of countries from 1961-2000; and (iv) taxes and growth in annual real per capita Personal Income (PCPI) for US states from 1960-1999. We chose these studies because, in addition to offering a sharp contrast to Grunfeld, the associated data are strongly balanced, allow a relatively large number of N and T combinations, and were readily available.
The first of these studies was published by Bruckner in 2013 in the Journal of Applied Econometrics. It estimates the effect of real per capita GDP growth on the growth in development aid for 44 countries over 25 years. Accordingly, our simulated datasets are comprised of these two variables. 15,16 We use the maximum number of time periods (T=25) while allowing N to take on the values 5, 10, 15, 20, and 24 across the different experiments.
The second study was published by Biagi, Brandano, and Detotto in Economics E-Journal in 2012. It studies the effect of tourism on crime in 95 Italian provinces over a period of 18 years. In addition to the dependent variable measuring crime and the key explanatory variable measuring number of tourists, it includes control variables for economic growth, the level of income, the unemployment rate, population density, a measure of educational attainment, and a measure of criminal "deterrence." Accordingly, our corresponding, simulated datasets consist of eight variables. 17,18 We again used the maximum number of time periods (T=18) while allowing N to take values equal to 5, 10, 15, and 17 across the different experiments.

_________________________
15 The regressions underlying our hypothesis tests are modelled after the regression reported in Table I of Bruckner (2013). 16 For larger N, the R i are identical in the first four columns, with zeros in the remaining 2N-4 columns. 17 The regressions underlying our hypothesis tests are modelled after the regression reported in Table I, Column 1 on page 13 of Biagi et al. (2012).
18 The corresponding restriction matrices for N=2 are given by: For larger N, the R i are identical in the first sixteen columns, with zeros in the remaining 8N-16 columns.
The final two studies are datasets that were used in the Monte Carlo simulations of Reed and Ye, published in Applied Economics in 2011. The datasets consist of only two variables, a tax variable and an economic growth variable. 19 In both cases, we use the maximum number of time periods (T=40) while allowing N to take values equal to 5, 10, 15, 20, and 25. In constructing synthetic panel datasets to resemble these additional datasets, we followed the same procedure that we described for the Grunfeld data.

Table 3 repeats the analysis of Table 2, focusing on the Type I error rates associated with testing restrictions $R_1$, $R_2$, and $R_3$. Panel A reports on the experiments using the synthetic datasets derived from the Bruckner data. As was the case with the Grunfeld datasets, hypothesis tests using the FGLS(Parks) estimator and $\chi^2$ critical values (Column 2) generally perform poorly, with rejection rates ranging from 0.114 to 0.764. The latter value is not exceptional for the FGLS(Parks) estimator (see, for example, Table 2 in Beck and Katz, 1995; and Figures 5 and 6 in Moundigbaye et al., 2018).
The bootstrap (Column 3) again does much better, producing type I error rates that have a mean rate of 0.030 with a standard deviation of 0.010. The PCSE approach (Column 4) does slightly better than the bootstrap with this data set. Rejection rates have a mean value across all experiments of 0.044 with a standard deviation of 0.011. The fact that the PCSE can, in some circumstances, do very well, is not surprising (see, for example, Table 5, Columns 3 and 4 in Moundigbaye et al., 2018).
Panel B of Table 3 repeats the comparison, this time using synthetic panel datasets derived from the Biagi et al. (2012) data on Italian crime rates. The results for the FGLS(Parks) estimator with $\chi^2$ critical values are similar to previous results, with a mean Type I error rate of 0.516. The corresponding Type I error rates for the PCSE approach are unacceptably large, as they were in Table 2, with a mean rejection rate of 0.234. The bootstrap procedure performs substantially better, with a mean rejection rate of 0.039.
The last two panels report results for the experiments using synthetic datasets on taxes and economic growth modelled on cross-country and US state data, respectively. Once again, the bootstrapped FGLS estimates dominate. In Panel C, the mean rejection rate for the bootstrapped critical values is 0.039 with a standard deviation of 0.012. This compares to a mean of 0.065 and a standard deviation of 0.027 for the PCSE estimator. In Panel D, the difference is more pronounced. The mean rejection rates for FGLS(Bootstrapped) and PCSE are 0.050 and 0.139, respectively, with the bootstrapped rejection rates more tightly clustered around their mean value.

Note: N and T correspond to the number of cross-sectional units and time periods, respectively. Restrictions 1 through 3 are described in the text related to discussion of Equations (10.a) through (10.c). Type I error rates report the percent of 500 Monte Carlo experiments where the null hypothesis was rejected.

Conclusion
Although the FGLS(Parks) estimator has desirable efficiency properties both asymptotically and in finite samples, its poorly estimated standard errors limit its usefulness for inference. As a result, Beck and Katz's (1995) PCSE approach has been widely adopted as an alternative. Unfortunately, the PCSE approach, while reducing size distortion, does not entirely eliminate it; and it is based on a Prais-Winsten estimator that is less efficient than the FGLS(Parks) estimator. This paper develops a non-parametric bootstrap procedure for testing hypotheses using the more efficient FGLS(Parks) estimator. We illustrate the bootstrap's use in a number of experiments where we simulate panel datasets to "look like" real datasets. We show that the bootstrap procedure performs well with these data. While the PCSE approach sometimes also performs well, the bootstrap usually performs better, often substantially better.
Up to this point, researchers working with panel datasets where the number of time periods is larger than the number of cross-sections have had to give up the efficiency of the FGLS (Parks) estimator to obtain greater accuracy in hypothesis testing using PCSE. The bootstrapping procedure presented here allows researchers to retain the efficient FGLS (Parks) estimator and to have test results that are generally more accurate than those offered by the PCSE approach.