A replication of “A quasi-maximum likelihood approach for large, approximate dynamic factor models” (Review of Economics and Statistics, 2012)

Abstract The authors replicate and extend the Monte Carlo experiment presented in Doz, Giannone and Reichlin (A Quasi-Maximum Likelihood Approach For Large, Approximate Dynamic Factor Models, Review of Economics and Statistics, 2012) on alternative (time-domain based) methods for extracting dynamic factors from large datasets; they employ open source software and consider a larger number of replications and a wider set of scenarios. Their narrow sense replication exercise fully confirms the results in the original article. As for their extended replication experiment, the authors examine the relative performance of competing estimators under a wider array of cases, including richer dynamics, and find that maximum likelihood (ML) is often the dominant method; moreover, the persistence characteristics of the observable series play a crucial role and correct specification of the underlying dynamics is of paramount importance.


Introduction
A central empirical finding from the use of dynamic factor models (DFMs) in applied macroeconomics is that a few factors can explain a large fraction of the total variance of many macroeconomic series. The factor structure is typically used for signal extraction and forecasting, although in some cases maximum likelihood-based methods can be used to test hypotheses of economic interest. Extensive surveys of the DFM literature can be found in Bai and Ng (2008), Stock and Watson (2011) and Stock and Watson (2016).
Maximum likelihood estimation for DFMs with a high-dimensional panel of time series becomes rapidly infeasible given the large number of parameters. However, in Doz et al. (2011) a solution was put forward to overcome this problem via a two-step hybrid approach that links the simplicity and speed of principal components to the efficiency of the Kalman smoother; later, Doz et al. (2012) refined this technique by considering quasi-maximum likelihood estimation where factor estimates are computed iteratively using the Kalman smoother on the state-space representation via the EM algorithm.
Both approaches are robust to cross-sectional misspecification, time-series correlation of the idiosyncratic components, and non-Gaussianity. Since then, state-space based maximum likelihood methods have been increasingly adopted in the applied literature, e.g. Luciani (2015) and Scotti (2016), among others. The asymptotic properties of EM estimation of these models were recently investigated in Luciani (2019a, 2019b) who also provide simulation-based evidence.
In this paper, we replicate and extend the Monte Carlo experiment presented in Doz et al. (2012) on the suitability of ML for extracting dynamic factors from large datasets, using different software, a larger number of replications and a wider set of scenarios. We find that most of the statements in the original article are supported, but the relative advantage of full ML estimation over the alternative procedures proposed in the same paper can be substantial in some cases, whereas in other settings it may not be decisive enough to outweigh the extra computational cost.
For our replication exercise, we used the DFM package for gretl (see Lucchetti and Venetis (2019)), which implements three estimators for DFMs, namely the benchmark principal components estimator (PC), the two-step Kalman smoother based estimator (TS) put forward by Doz et al. (2011) and the quasi-maximum likelihood EM-based estimator (ML) by Doz et al. (2012).
The paper is organised as follows: section 2 describes the data generating process, section 3 offers a narrow sense replication of the Monte Carlo study in Doz et al. (2012) and section 4 extends (wide sense replication) the experiment, in a direction which should address common concerns of practitioners. Section 5 briefly concludes the paper by summarising our findings.

Setup of the experiment
The models that we consider can be written in state-space representation as

$$x_t = \Lambda_0 f_t + \Lambda_1 f_{t-1} + \dots + \Lambda_s f_{t-s} + e_t = \chi_t + e_t \qquad (1)$$

$$f_t = A_1 f_{t-1} + \dots + A_p f_{t-p} + u_t \qquad (2)$$

where x_t is a vector of N standardised observable variables and f_t is the q-element vector of (unobserved) common dynamic factors; the term χ_t is the so-called "common component" and the shocks to the observation Equation (1), e_t, are known as the idiosyncratic component, and are assumed to be uncorrelated with f_t at all leads and lags. It is assumed that the elements of e_t are weakly correlated both cross-sectionally and serially, so that the factors f_t summarise the most relevant cross-covariance properties of the variables. Both processes f_t and e_t are assumed to be second-order stationary. As for s, the lag order in the observation Equation (1), the original study we are replicating only considers a static version (that is, s = 0). Part of the added value of the present study is to remove this limitation in favour of the fully generalised version with s ≥ 0.
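To fix ideas, Equations (1)-(2) can be simulated in a few lines. The following is a minimal Python/numpy sketch, with purely illustrative sizes, parameter values and an i.i.d. idiosyncratic term (none of these choices are those of the actual experiments, which are described below):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, q, s = 20, 200, 2, 1                 # purely illustrative dimensions

# transition equation (2): a stationary VAR(1) for the dynamic factors
A1 = 0.5 * np.eye(q)
f = np.zeros((T + s, q))
for t in range(1, T + s):
    f[t] = A1 @ f[t - 1] + rng.standard_normal(q)

# observation equation (1): x_t = Lambda_0 f_t + ... + Lambda_s f_{t-s} + e_t
Lam = rng.standard_normal((s + 1, N, q))   # loadings drawn from N(0, 1)
chi = sum(f[s - j:T + s - j] @ Lam[j].T for j in range(s + 1))  # common component
e = rng.standard_normal((T, N))            # i.i.d. idiosyncratic term (exact factor model)
x = chi + e                                # T x N panel of observables
```

In the actual Monte Carlo design the idiosyncratic term is serially and cross-sectionally correlated, as detailed next.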
The lag order s in Equation (1) is assumed to be finite here. A sizeable body of literature (mainly by Forni, Hallin, Lippi and several co-authors) has analysed the so-called generalised DFM, in which this assumption is relaxed and estimation is typically carried out via spectral methods; see Forni et al. (2000, 2015, 2017). Although most of the literature on DFMs (including the article we are replicating) pays little attention to the issue of identification of the model above, it should be mentioned that the reduced form of Equations (1)-(2) is a VARMA process, for which notoriously many observationally-equivalent representations exist. A comprehensive analysis of the identification problem in the context of DFMs, with different approaches to its solution, is given in Bai and Wang (2015). However, while the existence of several observationally-equivalent parametrisations for the same DGP is a serious problem for parameter estimation, forecasting is not affected (see for example Lütkepohl (2006), page 290). Since our focus is on the recovery of the space spanned by the dynamic factors f_t, the under-identification problem is not an issue.
In accordance with Doz et al. (2012), we consider the following data generating process in our Monte Carlo study: the idiosyncratic shocks were generated as

$$e_t = D e_{t-1} + v_t, \qquad v_t \sim N(0, T), \qquad (3)$$

where D is a diagonal matrix; we call d_i its i-th diagonal entry. As for the matrix T,

$$T_{ij} = \tau^{|i-j|}\sqrt{(1-d_i^2)(1-d_j^2)}\sqrt{\frac{\beta_i}{1-\beta_i}\frac{\beta_j}{1-\beta_j}\mathrm{Var}(\chi_{i,t})\mathrm{Var}(\chi_{j,t})}, \qquad (4)$$

where χ_{i,t} = λ_{0,i}' f_t + ... + λ_{s,i}' f_{t-s} denotes the i-th common component. The Toeplitz matrix T is the covariance matrix of the vector v_t, so the parameter τ controls the amount of cross-correlation in the idiosyncratic errors, Corr(e_{i,t}, e_{j,t}) = τ^{|i-j|}. The parameter β_i = Var(e_{i,t})/Var(x_{i,t}) denotes the noise variance proportion for the i-th variable and is drawn from the uniform distribution

$$\beta_i \sim U(u, 1-u). \qquad (5)$$

When u = 0.5, the "observables" x_{i,t} have cross-sectionally homoskedastic idiosyncratic components. The exact factor model corresponds to the case τ = 0 and D = 0. The elements of the loadings matrices Λ_i were drawn from a standard Normal distribution; again, this choice was made for conformity with the original article. As noted by Poncela and Ruiz (2020), drawing from other distributions, such as the Uniform, so as to ensure that the factors are truly pervasive, could perhaps be more appropriate. An online appendix offers a table with more elaborate factor loadings for the case s > 0, but we find no significant differences from the normal distribution case; the same is true for results pertaining to the dynamically simpler case of s = 0. The above data generating process is considerably broader than the one adopted by Doz et al. (2012), which can be derived as a special case of Equations (1)-(5). Similar experiments have also been performed by Breitung and Tenhofen (2011) and Bai and Li (2016); in Forni et al. (2018) a comparison between alternative modelling techniques is also performed, but with a focus on forecasting.

The aim of the experiment is to carry out a comparison between estimators on their ability to reconstruct the latent factors. As is well known, estimated factors (and factor loadings) are rotations of the underlying latent factors (and loadings, respectively), so the comparison will be based on a statistic that measures the closeness of the space spanned by the "true" simulated factors f_t and their estimates f̂_t. The three estimators considered are principal components (PC), the two-step estimator (TS) and the maximum likelihood estimator (ML). In the latter two cases, by the term "estimator" we refer to the smoothed estimates of the state vector f_t in Equation (2).

If we denote by F and F̂ the matrices holding f_t and f̂_t, respectively, the statistic we use is the trace R², which can be written as

$$TR = \frac{\mathrm{tr}\left(F'\hat F(\hat F'\hat F)^{-1}\hat F'F\right)}{\mathrm{tr}(F'F)}$$

and satisfies 0 ≤ TR ≤ 1. The notation F and F̂ is not to be mistaken for the static factors matrix: for s = 0, the matrix F holds the simulated f_t, while F̂ holds either the r = q static factors f̂_{t,PC} estimated by PC or the q dynamic factors f̂_{t,TS} or f̂_{t,ML}, depending on the estimation method. For s > 0, the matrix F holds f_t, f_{t-1}, ..., f_{t-s}, while F̂ holds the r = q(s+1) static factors estimated by PC, or f̂_t, f̂_{t-1}, ..., f̂_{t-s} when TS or ML estimation has been employed. The TR statistic tends to 1 as the canonical correlation between estimated and true factors increases; on the other hand, values of TR close to 0 indicate large discrepancies between the spaces spanned by the actual and the estimated factors. Thus, it is an appropriate quantitative measure, since common factors are identified only up to a rotation. We are using this measure of the ability to reconstruct the factor space in accordance with the original article, but we would like to stress that it is a purely descriptive statistic and we are not in the position to judge whether differences between methods are statistically significant.

In addition, as an anonymous reviewer pointed out, an equally important measure to report is the MSE between the estimated common component and the simulated one, since it is not affected by identification issues. At each replication b = 1, ..., 5000, we compute the standardised mean squared errors

$$MSE = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\hat\chi_{i,t} - \chi_{i,t}\right)^2,$$

and then calculate the average relative MSE of ML over each competing method (PC, TS, or Methods 4 and 5 as reported in the extended replication section).
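As a concrete illustration, the trace R² statistic can be computed as follows. This is a minimal numpy sketch (function name and sizes are our own, for illustration only); the in-block assertion checks the rotation-invariance property that motivates the statistic:

```python
import numpy as np

def trace_r2(F, F_hat):
    """Trace R^2 between true factors F (T x r) and estimates F_hat (T x r):
    tr(F' F_hat (F_hat' F_hat)^{-1} F_hat' F) / tr(F' F)."""
    # project the columns of F onto the column space of F_hat
    P = F_hat @ np.linalg.solve(F_hat.T @ F_hat, F_hat.T @ F)
    return np.trace(F.T @ P) / np.trace(F.T @ F)

rng = np.random.default_rng(0)
F = rng.standard_normal((200, 2))
R = rng.standard_normal((2, 2))              # arbitrary invertible rotation
assert abs(trace_r2(F, F @ R) - 1.0) < 1e-8  # rotations leave TR at 1
```

Since only the column space of F̂ enters the projection, any invertible rotation of the estimated factors leaves the statistic unchanged, which is exactly the property needed here.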

Replication of the original results
The first exercise we perform is a narrow sense replication, in which the relative performance of the three different methods for factor extraction is investigated. The original experiment assumed s = 0 and p = 1, as well as A_1 = α·I_q and D = d·I_N, with α and d scalars. The parameter u is set to 0.1 in all experiments, so that the idiosyncratic components e_t can have substantially different variances and the observable series x_t exhibit different signal-to-noise ratios.
Tables 2, 4 and 6 present a reproduction of all Monte Carlo results from the original study of Doz et al. (2012). The original study's results are shown in Tables 1, 3 and 5, and are placed side by side with our replicating study's corresponding estimates. The data generation process (DGP) is exactly the same as that of the original paper; the only difference is that we ran the Monte Carlo experiment with 5000 replications instead of 500.
The tables contain several subtables, each of which has different values of T (the sample size) for each row and of N (the number of observable variables) for each column. The first subtable contains the average of the TR statistic for the ML estimator and the second one the average number of EM iterations needed to achieve convergence, as a rough proxy for computational speed. The remaining three subtables display the relative performance of the PC and TS estimators compared to the ML estimator, measured by the relative trace ratios TR_ML/TR_PC and TR_ML/TR_TS. (We dropped results in which all three methods' individual TR was below 0.05, to avoid spuriously large trace ratios; such failures were very few compared to the 5000 simulations in each case considered, see Table B1 of the online readme file. Further, we ensure that initial conditions satisfy stationarity restrictions.) As can be seen, results are qualitatively identical and quantitatively very close to the original article: the maximum likelihood estimator appears to have a slight edge on the other two methods, especially so when the idiosyncratic disturbances e_t are closer to the "ideal" factor model (that is, when both their cross-sectional correlation and their autocorrelation are 0). It is noteworthy that the relative advantage of ML tends to vanish as either the time dimension T or the cross-sectional dimension N increases. Moreover, the extra computational cost for the extraction of factors via ML may be substantial, especially for small samples.

Extended replication
In this section, we extend the original Monte Carlo experiment so as to shed more light on the relative properties of the three factor extraction methods considered in the previous section. The additional set of simulations we run can be motivated as follows: in most cases, the interest of the practitioner is in using the estimated factors as a by-product for a model of interest, that can be used for forecasting/nowcasting purposes, structural analysis or more.
In a time-series context (especially at higher frequencies), it is very likely that the factors affect the observables in a lag-distributed pattern, so that the proper value of s in Equation (1) is larger than 0. As is well known (see for example Bai and Ng (2007)), Equation (1) can be re-written in the "static" form

$$x_t = \Lambda F_t + e_t,$$

where the vector F_t contains f_t and its lags, suitably stacked, and Λ is the horizontal concatenation of the Λ_i matrices. The dimension of the F_t vector is q·(s+1). The interest of the practitioner is arguably to use a method that reconstructs as closely as possible the space spanned by F_t. The possibilities we investigate are:

1. compute the q-vector f̂_t via ML and then stack its lags to form F̂_t;
2. compute the q-vector f̂_t via TS and then stack its lags to form F̂_t;
3. compute F̂_t by extracting the first q·(s+1) principal components;
4. approximate F̂_t via the first principal component;
5. approximate F̂_t via the first principal component and its lags.
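The stacking of lags used in Methods 1, 2 and 5 above can be sketched as follows (a hypothetical numpy helper; the function name and the toy sizes are our own):

```python
import numpy as np

def stack_lags(f, s):
    """Given a T x q matrix of factor estimates f_t, return the
    (T - s) x q(s+1) matrix whose row t holds (f_t', f_{t-1}', ..., f_{t-s}')."""
    T = f.shape[0]
    return np.hstack([f[s - j:T - j] for j in range(s + 1)])

f_hat = np.arange(10, dtype=float).reshape(5, 2)  # toy q = 2 estimates, T = 5
F_hat = stack_lags(f_hat, s=1)                    # shape (4, 4): contemporaneous + 1 lag
```

The first s observations are lost in the stacking, which is immaterial for the sample sizes considered in the experiment.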
In principle, Methods 4 and 5 may seem irrelevant: since in the present study the number of static factors is assumed to be known, evaluating methods that are known a priori to be misspecified may seem unfair. However, we are interested in evaluating how serious this shortcoming is, in the light of the fact that practitioners often use the first principal component as the "dominant" factor, regardless of any statistically oriented criterion (see for example Bai and Ng (2007)).
In a way, the inclusion of Methods 4 and 5 gives an idea of the performance of the different methods for correctly specified models, relative to the problems stemming from incorrect specification. Having said this, a proper analysis of the effects of misspecification of the lag orders would deserve a far more comprehensive treatment, which we leave for further research.
In order to initialise the algorithm, we use the method outlined in Lütkepohl (2014) to retrieve the dynamic factor space from principal components, which entails running a VAR(1) on the estimated principal components and then performing PC extraction on the residuals. In practice:

1. given the PC estimates of the static factors F̂_t, we estimate a VAR(1) via OLS, $\hat F_t = \hat A \hat F_{t-1} + \hat\varepsilon_t$;
2. we then extract the first q principal components of the residuals $\hat\varepsilon_t$.

Table 7 notes: Comparing alternative dynamic factor extraction methods when s = 1. Data generation process input set at: s = 1, p = 1, α = 0.9, d = 0.5, τ = 0.5, u = 0.1, q = 2.

Table 8 notes: Comparing alternative dynamic factor extraction methods using relative MSE calculations when s = 1. Data generation process input set at: s = 1, p = 1, α = 0.9, d = 0.5, τ = 0.5, u = 0.1, q = 2.
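The initialisation just described (a VAR(1) on the principal components, followed by PC extraction on the residuals) can be sketched in numpy as follows; the function name `init_dynamic_factors`, the input `F_pc` and the sizes are hypothetical, for illustration only:

```python
import numpy as np

def init_dynamic_factors(F_pc, q):
    """VAR(1) on the PC estimates of the static factors, then PC extraction
    on the residuals to initialise the q dynamic factor innovations."""
    Y, X = F_pc[1:], F_pc[:-1]
    A = np.linalg.lstsq(X, Y, rcond=None)[0]   # OLS: F_t on F_{t-1}
    resid = Y - X @ A                          # VAR(1) residuals
    # first q principal components of the residuals via the SVD
    _, _, Vt = np.linalg.svd(resid - resid.mean(0), full_matrices=False)
    return resid @ Vt[:q].T

rng = np.random.default_rng(0)
F_pc = rng.standard_normal((100, 4))           # toy r = 4 static factor estimates
eps_hat = init_dynamic_factors(F_pc, q=2)      # shape (99, 2)
```

This is only a sketch of the initialisation step; the DFM package for gretl handles the details (scaling conventions, degenerate cases) internally.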
As shown in Table 7, ML clearly dominates TS and all the methods based on a limited set of principal components. The only way to achieve a qualitatively similar approximation to the space spanned by the true factors is by using the full set of principal components, compared to which the advantage of using ML becomes smaller, especially for larger values of T and N.
The EM-based ML method exhibits a noticeably larger overhead in terms of computational complexity than in the original Monte Carlo experiment by Doz et al. (2012). This was to be expected, since the structure of the model we are considering is significantly more complex in terms of lag specification (s = 1 in Equation (1)).
The relative superiority of PC over TS may perhaps be considered surprising at first sight. However, the two methods use a different number of factors: PC uses q·(s+1) factors and consistently estimates the factor space, whereas TS only uses q factors (plus their lags). Further, the first step of the Doz et al. (2011) TS method consists of the factors estimated by PC, but their model is a static one (no lags in the measurement equation), which is not the case here. So this result should be taken as an indication that TS may perform rather poorly in finite samples when s > 0, and that suitable initialisation conditions should be devised to alleviate factor extraction issues for small cross-sections, at least when N ≤ 25 and T ≤ 100.
It is interesting to compare the three PC-based methods from the viewpoint of misspecification: clearly, Method 3 (PC) is a correct way to estimate the full set of static factors, whereas Methods 4 and 5 (PC,1 and PC,1,lags) are not, in that they are based on a specification of Equation (1) in which s or q are too small. It is evident from the last two sub-panels of Table 7 that the effects of misspecification are quite dramatic, and larger values of T and N do not mitigate the shortcomings. Table 8 reports the relative MSE (Rel-MSE) of the ML common component estimator over the TS, PC, PC,1 and PC,1,lags estimators. The ML estimator behaves very similarly to the PC estimator; however, there is a clear improvement, in MSE terms, over the remaining methods of common component estimation.
It would seem that, as long as the objective is the estimation of the common component χ_t, estimation of the static factors via PC is just as good as ML (which, however, dominates TS). If, instead, one is interested in attaching some kind of meaning to the estimated factors f_t via identification restrictions, then ML appears to be the method of choice. Our set of extra results can be summarised as follows: the most important element to consider is persistence, either as a characteristic of the latent factors or as the lag length in the observation Equation (1). When persistence is high, factors are estimated more accurately with maximum likelihood than with any of the other methods we considered; the only exception is principal components for the full set of static factors, which on the other hand has the disadvantage of requiring a set of q·(s+1) series instead of just q. In any case, the near-equivalence of ML and PC requires T and/or N to be rather large; in mid-sized samples ML retains a noticeable edge.

Conclusions and possible extensions
We use an independent software implementation to replicate the simulation results of Doz et al. (2012) regarding factor estimation in dynamic settings using the benchmark principal component method, a two-step Kalman smoother based method and EM-based maximum likelihood estimation.
Our narrow sense replication exercise fully confirms the results in the original article. As for our extended replication experiment, we find that ML is the dominant method in a wide array of situations, notably when persistence is substantial. Those results go one step further than the closing remark by Doz et al. (2012) "...Efficiency improvements are relevant when the factor extraction is difficult, that is, when there are more common factors to estimate... ", by including a richer set of dynamic specifications and the possibility of misspecification.
The present work could be extended in several directions: for instance, the lag orders s and p and the number of dynamic factors q are assumed to be known and fixed ex ante; clearly, this choice was made for conformity with the experiment we are replicating, and we acknowledge it is unrealistic and of limited usefulness for practitioners. It may be conjectured that the different methods could display different performance if the lag orders have to be selected, or if q has to be estimated through criteria such as the one put forward in Hallin and Liška (2007). (On the role of persistence, see also Table A7 in the online appendix, where the factors are constructed to be "more pervasive", with a distributed-lag effect on the observables, a more elaborate persistence profile for the factors and different persistence characteristics for the idiosyncratic components.)
The urgency of such an extension is forcefully argued in Poncela and Ruiz (2020), in the light of another important theoretical question: simultaneous identification of p (the VAR order in the state transition Equation (2)) and s (the lag order of the observation Equation (1)) is an open issue that deserves further research. The purpose of our replication exercise is to extend the original results in Doz et al. (2012) by considering richer dynamics under the assumption of correct specification, but we acknowledge that near-underidentification could occur in finite samples for various values of p and s. How problematic this would be for the main purpose tackled here (the estimation of the space spanned by the dynamic factors) remains to be seen and will be the object of further research. However, it seems safe to conjecture that, as long as the object of interest is the space spanned by the dynamic factors f t (as opposed to parameter point estimation), overestimating p and/or s should not lead to serious problems.
Another possible extension, for comparison purposes, could be using a DGP taken from some empirical exercise. These extensions, however, would broaden the present article's scope considerably and are left for future work.