A Flexible Mixed-Frequency Vector Autoregression with a Steady-State Prior

We propose a Bayesian vector autoregressive (VAR) model for mixed-frequency data. Our model is based on the mean-adjusted parametrization of the VAR and allows for an explicit prior on the 'steady states' (unconditional means) of the included variables. Based on recent developments in the literature, we discuss extensions of the model that improve the flexibility of the modeling approach. These extensions include a hierarchical shrinkage prior for the steady-state parameters, and the use of stochastic volatility to model heteroskedasticity. We put the proposed model to use in a forecast evaluation using US data consisting of 10 monthly and 3 quarterly variables. The results show that the predictive ability typically benefits from using mixed-frequency data, and that improvements can be obtained for both monthly and quarterly variables. We also find that the steady-state prior generally enhances the accuracy of the forecasts, and that accounting for heteroskedasticity by means of stochastic volatility usually provides additional improvements, although not for all variables.


Introduction
The vector autoregressive (VAR) model is a commonly used tool in applied macroeconometrics, in part because of its simplicity. Over the years, VAR models have developed in many different directions under both frequentist and Bayesian paradigms. The Bayesian approach offers the attractive ability to easily incorporate soft restrictions and shrinkage, which ameliorate the issue of overparametrization. Within the Bayesian framework itself, a large number of papers have developed prior distributions for the parameters in VAR models. Many of these are, in one way or another, variations of the Minnesota prior proposed by Litterman (1986); see, for example, the book chapters by Del Negro and Schorfheide (2011) and Karlsson (2013). Gains in computational power have led to further alternatives in the choice of prior distribution, as intractable posteriors can be sampled efficiently using Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler (Gelfand and Smith, 1990; Kadiyala and Karlsson, 1997).
A particular development in the Bayesian VAR literature is the steady-state prior proposed by Villani (2009). The prior is based on a mean-adjusted form of the VAR where the unconditional mean is explicitly parameterized. This seemingly innocuous reparametrization is justified by the fact that practitioners and analysts often have prior information regarding the steady-state (or unconditional mean) readily available. In the standard parametrization, a prior on the unconditional mean is only implicit as a function of the other parameters' priors. Because the forecast in a stationary VAR converges to the unconditional mean as the horizon increases, a prior for the steady-state parameters can help retain the long-run forecast in the direction implied by theory, even if the model is estimated during a period of divergence.
Another modeling feature that modern VARs often include is stochastic volatility. In many macroeconomic applications, a typical characteristic of the data is that the volatility has varied over time. When a VAR with constant volatility is fitted to such data, the estimated error covariance matrix attempts to strike a compromise between periods of low and high volatility. Consequently, the predictive distribution does not account for the current level of volatility. Seminal contributions on stochastic volatility were made by Primiceri (2005) and Cogley and Sargent (2005), and numerous follow-up studies have since documented the usefulness of stochastic volatility for forecasting; see e.g. Clark (2011), D'Agostino, Gambetti, and Giannone (2013), Clark and Ravazzolo (2015) and Carriero, Clark, and Marcellino (2016). Because of its established utility, we also allow for more flexibility in our model by modeling time variation in the error covariance matrix.
VARs are often estimated on quarterly data; see e.g. Stock and Watson (2001) and Adolfson, Lindé, and Villani (2007). The reason is simply that many variables of interest are unavailable at higher frequencies, although the majority are often sampled monthly or even more frequently. When the data are available at different frequencies, common practice is to aggregate high-frequency variables to the lowest frequency present. Such an aggregation incurs a loss of information for variables measured throughout the quarter: the aggregated quarterly values are typically weighted sums of the constituent months, so any information carried by a within-quarter trend or pattern is disregarded by the aggregation. From a forecasting perspective, an analyst is effectively forced to disregard part of the information set when constructing a forecast from within a quarter, as the most recent realizations are only available for the high-frequency variables. Another reason for utilizing higher frequencies of the data is that the number of observations is increased. A VAR estimated on data collected over, say, ten years makes use of 120 observations of the monthly variables instead of being limited to the 40 aggregated quarterly observations.
Multiple approaches to dealing with the problem of mixed frequencies are available in the literature. Mixed data sampling (MIDAS) regressions and the MIDAS VAR, proposed by Ghysels, Sinko, and Valkanov (2007) and Ghysels (2016), respectively, use fractional lag polynomials to regress a low-frequency variable on lags of itself as well as high-frequency lags of other variables. This approach is predominantly frequentist, although Bayesian versions are available (Rodriguez and Puggioni, 2010; Ghysels, 2016). A second approach, which is the focus of this work, is to exploit the general ability of state-space modeling to handle missing observations (Harvey and Pierse, 1984). Eraker, Chiu, Foerster, Kim, and Seoane (2015), concerned with Bayesian estimation, used this idea to treat intra-quarterly values of quarterly variables as missing data and proposed measurement and state-transition equations for the monthly VAR. Schorfheide and Song (2015) considered forecasting using a construction along the lines of Carter and Kohn (1994) and provided empirical evidence that the mixed-frequency VAR improved forecasts of eleven US macroeconomic variables as compared to a quarterly VAR. In terms of flexible time-varying models with mixed-frequency data, Cimadomo and D'Agostino (2016) employed the mixed-frequency VAR together with time-varying parameters and stochastic volatility to cope with a change in frequency of the data. Following up on the work by Schorfheide and Song (2015), Götz and Hauzenberger (2018) recently showed that more flexible models that include stochastic volatility tend to improve forecasts also within this framework.
The main contribution of this paper is that we extend the mixed-frequency toolbox by incorporating prior information on the steady states, and by adding stochastic volatility to the model. Thus, we effectively combine the steady-state parametrization of Villani (2009) with the state-space modeling approach for mixed-frequency data of Schorfheide and Song (2015) and the common stochastic volatility model proposed by Carriero et al. (2016). The proposed model accommodates explicit modeling of the unconditional mean with data measured at different frequencies. In order to employ the model in a realistic forecasting situation, we use a real-time dataset consisting of 13 macroeconomic variables for the US, where ten of the variables are sampled monthly, and the remaining three are available quarterly. We implement the steady-state prior using the standard Villani (2009) approach, and using the hierarchical structure presented by Louzis (2019). In our empirical application, we find that, for most variables, mixed-frequency data, stochastic volatility, and steady-state information improve forecasting accuracy as compared to models without any of the aforementioned features.
The structure of the paper is as follows. Section 2 describes the main methodology, Section 3 provides information about the data and details about the implementation, and Section 4 evaluates the forecasting performance. Section 5 concludes.

Combining Mixed Frequencies with Steady-State Beliefs
The mixed-frequency method adopted in this work is a state-space model which follows the work of Mariano and Murasawa (2010), Schorfheide and Song (2015) and Eraker et al. (2015). There are several modeling approaches available for handling mixed-frequency data, including MIDAS (Ghysels et al., 2007), bridge equations (Baffigi, Golinelli, and Parigi, 2004) and factor models (Mariano and Murasawa, 2003; Giannone, Reichlin, and Small, 2008). We do not review these further here, but instead refer the reader to the survey by Foroni and Marcellino (2013) and an early comparison conducted by Kuzin, Marcellino, and Schumacher (2011).

State-Space Representation of the Mixed-Frequency Model
To cope with the mixed observed frequencies of the data, we assume that the system evolves at the highest available frequency. This assumption frames the frequency mismatch as a missing data problem. The approach thereby naturally lends itself to a state-space representation of the system, in which the underlying monthly series of the quarterly variables become the latent states. Because we have a mix of monthly and quarterly frequencies in our empirical application, we present the model from this perspective in what follows. It should, however, be noted that other compositions of frequencies are viable within the same framework.
The VAR model at the core of the analysis is specified for the high-frequency and partially missing variables. More specifically, a VAR(p) for the n × 1 vector z_t is employed such that

Π(L)z_t = Φd_t + u_t, (1)

where Π(L) = (I_n − Π_1 L − Π_2 L² − · · · − Π_p L^p) is a p-th order invertible lag polynomial, d_t is an m × 1 vector of deterministic components and Φ is an n × m matrix of parameters. The time index t is here monthly. We let the error term u_t be heteroskedastic and return to the specifics thereof in Section 2.2. The model in (1) is a conventional VAR specification, but, in the spirit of Villani (2009), we instead employ the mean-adjusted form

Π(L)(z_t − Ψd_t) = u_t, (2)

where Ψ = [Π(L)]^{-1}Φ. It can readily be confirmed that E(z_t|Π, Ψ, Σ) = Ψd_t := µ_t, and thus µ_t is the unconditional mean, or steady state, of the process. The steady-state representation (2) requires an explicit prior on the steady-state parameters. Common practice, however, is to use (1) with a loose prior on Φ, which implicitly defines an intricate (but loose) prior on Ψ and, subsequently, µ_t. We argue that in many applications, the parametrization in (2) is more convenient as it allows for a more natural elicitation of prior beliefs. In what follows, we extend the work of Villani (2009) such that (2) remains a viable option in the presence of mixed frequencies.
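To make the reparametrization concrete, the following sketch computes the steady state implied by a standard-form VAR with a constant deterministic term and checks it against the noiseless recursion. All coefficient values are invented for illustration.

```python
import numpy as np

# Toy VAR(2) with n = 2 variables and a constant deterministic term
# (d_t = 1), illustrating the mapping between the intercept Phi of the
# standard form and the steady state Psi = [Pi(1)]^{-1} Phi.
Pi1 = np.array([[0.5, 0.1],
                [0.0, 0.4]])
Pi2 = np.array([[0.1, 0.0],
                [0.1, 0.2]])
Phi = np.array([0.8, 0.6])

n = Phi.shape[0]
Pi_at_1 = np.eye(n) - Pi1 - Pi2      # lag polynomial Pi(L) evaluated at L = 1
Psi = np.linalg.solve(Pi_at_1, Phi)  # steady state mu_t = Psi * d_t = Psi

# Sanity check: iterating the noiseless VAR recursion converges to Psi.
z_prev, z_curr = np.zeros(n), np.zeros(n)
for _ in range(500):
    z_prev, z_curr = z_curr, Phi + Pi1 @ z_curr + Pi2 @ z_prev
print(np.allclose(z_curr, Psi))
```

This also illustrates why a steady-state prior helps at long horizons: the forecast path of a stable VAR converges to exactly this quantity.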
Next, we partition the high-frequency underlying process z_t as z_t = (z'_{m,t}, z'_{q,t})', where z_{m,t} collects the n_m monthly and z_{q,t} the n_q quarterly variables. Recall that the time index t here runs at the highest frequency, i.e. monthly. The empirical problem, ubiquitous in macroeconomic data, is that what is observed varies between months, so that z_t is not always fully observed.
To distinguish between the underlying process and actual observations, we denote the latter by y_t. Because not all variables are observed at every time point t, the dimension n_t of y_t is not always equal to n = n_m + n_q. The observed data in y_t are generally assumed to be a linear aggregate of Z_t = (z'_t, . . . , z'_{t−p+1})' such that

y_t = M_t Λ Z_t, (3)

where M_{q,t} and Λ_q are deterministic selection and aggregation matrices, respectively. We let M_{q,t} be the n_q identity matrix I_{n_q} if all quarterly variables are observed at time t, so that y_{q,t} = Λ_q Z_t. In the remaining periods, M_{q,t} is an empty matrix, so that y_t = y_{m,t}. More complicated observational structures can easily be accommodated using the very same approach; instead of being empty or a full I_n matrix, M_t can omit the rows that correspond to unobserved variables. This idea allows the approach to seamlessly handle missing data for a subset of the monthly variables at the end of the sample.
The aggregation matrix Λ_q represents the assumed aggregation scheme mapping the unobserved high-frequency latent observations z_{q,t} into the occasionally observed low-frequency observations y_{q,t}. To simplify the presentation, we can write the bottom block of ΛZ_t as Λ_{qq}(z'_{q,t}, . . . , z'_{q,t−p+1})', where Λ_{qq} collects the columns of Λ_q that correspond to quarterly variables in Z_t. Schorfheide and Song (2015), working with log-levels of the data, used the intra-quarterly average y*_{q,t} = (1/3)(z*_{q,t} + z*_{q,t−1} + z*_{q,t−2}), where y*_{q,t} denotes the observed quarterly log-levels and z*_{q,t} the latent monthly log-levels. Because we use log-differenced data, we instead follow Mariano and Murasawa (2003, 2010). By taking the quarterly difference of y*_{q,t} to construct our observed growth rates, we obtain

y_{q,t} = y*_{q,t} − y*_{q,t−3} = (1/3)∆z*_{q,t} + (2/3)∆z*_{q,t−1} + ∆z*_{q,t−2} + (2/3)∆z*_{q,t−3} + (1/3)∆z*_{q,t−4}. (4)

Because the set of weights in (4) sums to three, we define our latent variable of interest to be z_{q,t} = 3∆z*_{q,t}, i.e. the latent month-on-month growth rate scaled to be commensurate with the quarterly level. The expression can then be written as

y_{q,t} = (1/9)(z_{q,t} + 2z_{q,t−1} + 3z_{q,t−2} + 2z_{q,t−3} + z_{q,t−4}).
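The triangular weights can be verified numerically. The sketch below simulates arbitrary latent monthly log-levels and checks that the quarterly difference of intra-quarter averages equals the weighted sum of month-on-month differences stated in (4); the random series and seed are, of course, placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated latent monthly log-levels z*_t and their month-on-month
# differences (an arbitrary random walk, for illustration only).
z_star = np.cumsum(rng.normal(size=24))
dz = np.diff(z_star)                 # dz[t-1] = z*_t - z*_{t-1}

# Observed quarterly growth: quarterly difference of intra-quarter averages.
def q_level(t):
    # y*_{q,t} = (z*_t + z*_{t-1} + z*_{t-2}) / 3
    return (z_star[t] + z_star[t - 1] + z_star[t - 2]) / 3.0

t = 20
lhs = q_level(t) - q_level(t - 3)

# Triangular weights on the monthly differences; note they sum to three.
w = np.array([1, 2, 3, 2, 1]) / 3.0
rhs = w @ dz[t - 5:t][::-1]          # dz_t, dz_{t-1}, ..., dz_{t-4}

print(np.isclose(lhs, rhs))
```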
Equations (2) and (3) form a state-space model that can be used for estimation of the model. Schorfheide and Song (2015) suggested an efficient compact formulation of the employed state-space model that is statistically equivalent but computationally more convenient. The compact treatment is based on the observation that the set of monthly variables included in the model are observed for all time points except for a handful at the end of the sample, known as a ragged edge (Bańbura, Giannone, and Reichlin, 2011). The treatment proposed by Schorfheide and Song (2015) is to let the monthly variables enter the model as exogenous for t = 1, . . . , T b , where T b denotes the final time period where the monthly variables are all observed. By this approach, the monthly variables are excluded from the state equation. The state dimension is thereby reduced from np to n q (p + 1), which improves the computational efficiency substantially.
In order to introduce this formulation of the model more formally, we first let ỹ_t denote the mean-adjusted data. The state-space model is thereafter formulated in terms of ỹ_t and z̃_t = z_t − Ψd_t, leading to a model in which Π_{i,j}, i, j ∈ {m, q}, refer to the submatrices of regression parameters relating the frequency-j variables to the conditional mean of the frequency-i variables. The errors are the corresponding partitions of u_t = (u'_{m,t}, u'_{q,t})' and are consequently correlated. Finally, Ỹ_{m,t−1} stacks the mean-adjusted monthly variables as Ỹ_{m,t−1} = (ỹ'_{m,t−1}, . . . , ỹ'_{m,t−p})' and Z̃_{q,t−1} = (z̃'_{q,t−1}, . . . , z̃'_{q,t−p})'.
The above state-space model remains valid as long as t ≤ T_b, that is, as long as all of the monthly series are observed. To deal with ragged edges and unbalanced monthly data for t > T_b, we follow Ankargren and Jonéus (2019) and adaptively add the monthly series with missing data as appropriate. Contrary to Schorfheide and Song (2015), we thereby avoid use of the full companion form altogether.

Extending the Basic Steady-State Model
The standard BVAR with the steady-state prior typically produces good forecasts and is for this reason used by e.g. Sveriges Riksbank as one of its main forecasting models (see Iversen, Laséen, Lundvall, and Söderström, 2016). However, recent work in the VAR literature demonstrates that allowing for more flexibility may be beneficial. In particular, letting the error covariance matrix vary over time by incorporating stochastic volatility often improves the predictive ability, as demonstrated by e.g. Clark (2011), Clark and Ravazzolo (2015) and Carriero et al. (2016). Moreover, studies such as Bańbura, Giannone, and Reichlin (2010), Koop (2013) and Giannone, Lenza, and Primiceri (2015) have shown that medium-sized models including 10-20 variables often outperform smaller models when forecasting. The caveat when extending the size of a model that uses the steady-state prior is that the researcher must specify a prior mean and variance for the unconditional mean of each variable in the model. For key variables such as inflation, GDP growth and unemployment this task is relatively effortless, but it can be more challenging when the previous literature does not offer any guidance on reasonable prior specifications. To simplify the process, Louzis (2019) developed a hierarchical steady-state prior that effectively relieves the researcher from eliciting the prior variances of the steady-state parameters; only prior means are required. Providing a sensible prior for the unconditional mean is generally much simpler than quantifying the uncertainty of one's specification. We next briefly describe the stochastic volatility and hierarchical steady-state prior specifications with which we extend our basic model.

Stochastic volatility
The stochastic volatility model we employ is the common stochastic volatility (CSV) model of Carriero et al. (2016), which is a parsimonious and simple approach for letting the error covariance matrix vary over time. Under the CSV variance specification, the state equation describing the high-frequency VAR is given by

Π(L)(z_t − Ψd_t) = u_t, u_t = f_t^{1/2} A^{-1} e_t, e_t ∼ N(0, I_n), (5)

where A^{-1} is a lower triangular matrix and f_t is the latent univariate volatility series evolving according to

log(f_t) = φ log(f_{t−1}) + ν_t, ν_t ∼ N(0, σ²). (6)

The log-volatility log(f_t) thus evolves as an AR(1) process without intercept with parameters (φ, σ²). The time-varying error covariance matrix implied by the preceding model is Σ_t = f_t Σ, where Σ = A^{-1}(A^{-1})'. Consequently, the CSV prior assumes a fixed covariance structure in which the volatility factor provides a time-varying scaling of the constant error covariance Σ.
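A short simulation makes the scaling interpretation explicit: a single volatility factor rescales an otherwise constant covariance. Parameter values below are illustrative, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate from the CSV specification: u_t = sqrt(f_t) * A^{-1} e_t with
# log(f_t) = phi * log(f_{t-1}) + nu_t. Parameter values are illustrative.
n, T = 3, 500
phi, sigma2 = 0.95, 0.05
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
A_inv = np.linalg.cholesky(Sigma)        # lower triangular, Sigma = A_inv A_inv'

log_f = np.zeros(T)
u = np.zeros((T, n))
for t in range(1, T):
    log_f[t] = phi * log_f[t - 1] + np.sqrt(sigma2) * rng.normal()
    u[t] = np.sqrt(np.exp(log_f[t])) * (A_inv @ rng.normal(size=n))

# Implied error covariance at time t: Sigma_t = f_t * Sigma, a pure
# rescaling of the constant covariance structure.
Sigma_T = np.exp(log_f[-1]) * Sigma
print(Sigma_T.shape)
```

Note that correlations between the errors are constant over time by construction; only the overall scale moves, which is what makes the CSV model so parsimonious.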

Hierarchical steady-state priors
The appealing feature of the steady-state prior is that it allows the researcher to use readily available information about the long-run steady-state levels of the included variables. For the reasons discussed earlier, Louzis (2019) proposed a hierarchical steady-state prior based on the normal-gamma construction used by e.g. Griffin and Brown (2010) and Huber and Feldkircher (2019). The motivation is that the benefits of the steady-state prior are larger when the priors for the steady states are accurate and relatively informative. The normal-gamma prior employs a hierarchical specification that provides sufficiently heavy tails to allow for a large degree of shrinkage towards the prior mean when appropriate, and more flexibility otherwise. In effect, the researcher only has to provide a prior mean for each steady-state parameter, as the associated variances are obtained from the hyperparameters higher up in the hierarchy.
To be more precise, the hierarchical steady-state prior is based on the normal-gamma prior proposed by Griffin and Brown (2010), which employs a hierarchical specification of the form

ψ_j | ω_{ψ,j} ∼ N(µ_{ψ,j}, ω_{ψ,j}), ω_{ψ,j} ∼ G(φ_ψ, φ_ψ λ_ψ² / 2),

where φ_ψ and λ_ψ are additional fixed hyperparameters and G(a, b) denotes the gamma distribution with shape a and rate b. The prior is therefore constructed using idiosyncratic, or local, hyperparameters ω_{ψ,j}, which in turn depend on the two auxiliary hyperparameters φ_ψ and λ_ψ. Griffin and Brown (2010) showed that the variance of the unconditional prior for ψ_j is negatively associated with λ_ψ, meaning that higher values of λ_ψ induce a larger degree of shrinkage towards the prior mean. The hyperparameter λ_ψ can therefore be interpreted as a global shrinkage parameter. At the same time, the excess kurtosis of the unconditional prior is negatively related to φ_ψ. Taken together, the implication is that even if a tight prior (i.e. high λ_ψ) is employed, the local shrinkage given by ω_{ψ,j} can still deviate notably from zero if φ_ψ is small, owing to the heavy tails of the unconditional prior distribution. This feature allows for a shrinkage profile that is tight in general, but loose when necessary.
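The shrinkage behavior can be illustrated by simulating from the hierarchy. The sketch below assumes the rate parameterization G(φ_ψ, φ_ψ λ_ψ²/2), a common choice in the normal-gamma shrinkage literature; the paper's exact parameterization may differ, so treat this purely as a sketch of the mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)

# Draws from an assumed normal-gamma hierarchy
#   psi_j | omega_j ~ N(mu, omega_j),  omega_j ~ G(phi_psi, phi_psi * lam**2 / 2),
# with G(shape, rate); the exact rate convention is an assumption here.
def ng_draws(mu, phi_psi, lam, size, rng):
    # numpy's gamma sampler uses a scale parameter, i.e. scale = 1 / rate
    omega = rng.gamma(shape=phi_psi, scale=2.0 / (phi_psi * lam**2), size=size)
    return mu + np.sqrt(omega) * rng.normal(size=size)

loose = ng_draws(0.0, 0.5, 0.5, 200_000, rng)  # small lam_psi: diffuse prior
tight = ng_draws(0.0, 0.5, 4.0, 200_000, rng)  # large lam_psi: heavy shrinkage

# Under this parameterization E(omega_j) = 2 / lam**2, so a larger lam_psi
# shrinks the unconditional prior variance, while a small phi_psi keeps
# the tails heavy enough to allow occasional large deviations.
print(loose.var() > tight.var())
```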

Prior Distributions
We use a standard normal inverse Wishart prior for the VAR coefficients and error covariance (Π, Σ). Thus, we have a priori Σ ∼ IW(S, ν) and vec(Π′)|Σ ∼ N(vec(Π̲′), Σ ⊗ Ω_Π), where Π = (Π_1, . . . , Π_p) and Π̲ is the prior mean. The main diagonal of the prior covariance matrix for the regression parameters, Ω_Π, is set in the Minnesota-style fashion, with the element corresponding to lag l of variable r given by λ_1²/(l^{λ_2} s_r)², where λ_1 is the overall tightness and λ_2 determines the lag decay rate; the inclusion of s_r adjusts for differences in the measurement scales of the variables. For a more thorough exposition of the normal inverse Wishart prior, the reader is referred to Karlsson (2013). While Σ describes the fixed covariance structure, the time-varying volatility in the model is governed by the latent volatility f_t. For the two parameters associated with its evolution, (φ, σ²), we use a normal distribution truncated to the stationary region for φ, and an inverse gamma prior for σ². As discussed in Section 2.2, the priors for the steady-state parameters are normal conditional on the local shrinkage parameters. Instead of fixing the top-level hyperparameters φ_ψ and λ_ψ, Huber and Feldkircher (2019) proceeded with an additional hierarchy by specifying priors for φ_ψ and λ_ψ. We follow their suggestion and obtain a fully hierarchical prior specification for the steady-state parameters.
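As a concrete illustration of the Minnesota-style rule, the sketch below builds the diagonal of Ω_Π for a small system. The scale adjustment via s_r follows one common convention (a per-variable residual standard deviation); the paper's exact convention may differ slightly.

```python
import numpy as np

# Sketch of a Minnesota-style rule for the main diagonal of Omega_Pi:
# the element for lag l of variable r is (lam1 / (l**lam2 * s_r))**2,
# where s_r is a scale estimate for variable r (e.g. a residual standard
# deviation from a univariate autoregression). The scale convention here
# is an assumption for illustration.
def minnesota_diagonal(s, p, lam1=0.2, lam2=1.0):
    s = np.asarray(s, dtype=float)
    return np.concatenate(
        [(lam1 / (l**lam2 * s))**2 for l in range(1, p + 1)]
    )

omega_diag = minnesota_diagonal(s=[1.0, 0.5, 2.0], p=4)
print(omega_diag.shape)   # n * p elements, ordered lag by lag
```

With λ_2 = 1, the prior variance for lag l shrinks at rate 1/l², reflecting the belief that more distant lags are more likely to be unimportant.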

Posterior Sampling
To estimate the model and produce forecasts, we employ Markov Chain Monte Carlo (MCMC). The MCMC algorithm consists of multiple Gibbs sampling steps, which we describe next. We relegate some of the details to Appendix A.
Sampling the latent monthly variables To sample from the posterior distribution of the latent monthly variables, p(Z|Π, Σ, ψ, f, Y, d), we use a simulation smoother along the lines of Durbin and Koopman (2012). To increase the computational efficiency, we implement it using the compact formulation for the balanced part of the sample, as suggested by Schorfheide and Song (2015). For the unbalanced ragged edge, we instead leverage the adaptive procedure developed by Ankargren and Jonéus (2019). The simulation smoothing step is conducted on the mean-adjusted data ỹ_t to produce a draw of z̃_t. We thereafter construct the unadjusted high-frequency series by adding back the deterministic component: z_t = z̃_t + Ψd_t.
Sampling the regression and covariance parameters Given Z, ψ and f, the VAR can be transformed into a homoskedastic VAR without intercept based on z̃_t. By standard results (Kadiyala and Karlsson, 1993, 1997), the conditional posterior distribution is also normal inverse Wishart. It is thereby possible to sample from the marginal posterior of Σ followed by the full conditional posterior of Π. The posterior moments are standard given the transformation of the model and are presented in Appendix A. A draw can efficiently be made from the posterior of Π by reverting to its matrix-normal form, in which Ξ is an n × np matrix of independent standard normal draws, chol denotes the lower triangular Cholesky decomposition, and the operation A \ B means solving the linear system AX = B for X. Because the Cholesky factor is triangular, the linear systems can be solved efficiently using forward and back substitution.
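The matrix-normal draw can be sketched as follows; the moments below are illustrative placeholders, not the posterior quantities from Appendix A.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of a matrix-normal draw: if Pi ~ MN(M, Sigma, Omega) with row
# covariance Sigma (n x n) and column covariance Omega (np x np), then
# M + chol(Sigma) @ Xi @ chol(Omega).T has the right distribution when
# Xi is a matrix of i.i.d. N(0, 1) draws.
def rmatnorm(M, Sigma, Omega, rng):
    L_row = np.linalg.cholesky(Sigma)
    L_col = np.linalg.cholesky(Omega)
    Xi = rng.normal(size=M.shape)
    return M + L_row @ Xi @ L_col.T

n, k = 3, 6                       # e.g. k = n * p regressors
M = np.zeros((n, k))
Pi_draw = rmatnorm(M, np.eye(n), 0.1 * np.eye(k), rng)
print(Pi_draw.shape)
```

In practice one often holds the Cholesky factor of the posterior *precision* rather than the covariance, in which case the multiplication by chol(Ω) is replaced by a triangular solve (forward/back substitution), which is the efficiency point made above.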
Sampling the steady-state parameters Prior to sampling the steady-state parameters, the associated hyperparameters are drawn from their respective conditional posterior distributions. The conditional posterior of the global shrinkage parameter λ_ψ is gamma distributed. The conditional posterior of φ_ψ permits no representation in terms of a standard distribution. As suggested by Huber and Feldkircher (2019) and Louzis (2019), we employ a random walk Metropolis-Hastings step to sample from its posterior distribution. The random walk operates on the log scale, with the proposal φ*_ψ = φ_ψ exp(sε), ε ∼ N(0, 1), where s is a scaling factor. The proposed value φ*_ψ is accepted with probability min{1, [p(φ*_ψ|·)/p(φ_ψ|·)] × (φ*_ψ/φ_ψ)}, where the second ratio accounts for the asymmetric proposal distribution.
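The log-scale random walk step can be sketched generically. The target below is a stand-in log-density (a log-normal), not the actual conditional posterior of φ_ψ; the point is the structure of the proposal and the Jacobian correction.

```python
import numpy as np

rng = np.random.default_rng(4)

# Random-walk Metropolis-Hastings on the log scale for a positive scalar,
# including the (prop / x) Jacobian term for the asymmetric proposal.
def log_target(x):
    # Stand-in target: log-normal log-density up to an additive constant.
    return -0.5 * (np.log(x) - 1.0)**2 - np.log(x)

def rw_mh_log(x0, s, n_iter, rng):
    x, draws = x0, np.empty(n_iter)
    for i in range(n_iter):
        prop = x * np.exp(s * rng.normal())
        log_alpha = log_target(prop) - log_target(x) + np.log(prop / x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        draws[i] = x
    return draws

draws = rw_mh_log(x0=1.0, s=0.5, n_iter=20_000, rng=rng)
print(draws.min() > 0)
```

Omitting the log(prop/x) term is a common bug: without it, the chain targets the wrong distribution because the multiplicative proposal is not symmetric on the original scale.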
Given the hyperparameters, the local shrinkage parameters ω_{ψ,j} can be sampled. The conditional posterior distribution is the generalized inverse Gaussian (GIG) distribution, where y ∼ GIG(a, b, c) has density p(y; a, b, c) ∝ y^{a−1} exp{−0.5(by + c/y)}. The prior covariance matrix for ψ, i.e. Ω_ψ, can thereafter be constructed as the diagonal matrix with main diagonal (ω_{ψ,1}, . . . , ω_{ψ,nm}). Next, by dividing both sides of the model (5) by √f_t we obtain a homoskedastic model, so the posterior moments provided by Villani (2009) apply directly to the transformed model; the posterior distribution of ψ is normal.

Sampling the latent volatility Conditional on the other parameters in the model, we can obtain z̆_t = Au_t = √f_t e_t. Squaring and taking the logarithm of the elements of z̆_t yields log(z̆²_{i,t}) = log(f_t) + log(e²_{i,t}), i = 1, . . . , n, where z̆_{i,t} is the ith element of z̆_t, with a similar logic for e_{i,t}. Coupling the preceding equation with the transition equation (6) defines a linear but non-normal state-space model. Kim, Shephard, and Chib (1998) proposed a sampling strategy that introduces auxiliary mixture indicators r_{t,i} so that the model conditional on these indicators is normal. We use the refined ten-state mixture of Omori, Chib, Shephard, and Nakajima (2007) together with the algorithm discussed by McCausland, Miller, and Pelletier (2011), as implemented by Kastner and Frühwirth-Schnatter (2014), to sample from the posterior distribution of the latent volatility series.
The posteriors of the parameters of the volatility process are standard given f. The posterior distribution of φ is a truncated normal distribution, whereas the posterior distribution of σ² is inverse gamma. We proceed by sampling (φ, σ²) first, the mixture indicators r_{t,i} next and, finally, the latent volatility series, in order to target the correct posterior distribution, as discussed by Del Negro and Primiceri (2015).

Data and Implementation Details
In this section, we provide information about the data used and some details regarding the implementation.

Notes to Table 1: (1) Real-time data for Hours are available in ALFRED from 2011 onwards; data from FRED are used prior to 2011. Hours are seasonally adjusted with X-13ARIMA-SEATS via the seasonal package in R (Sax and Eddelbuettel, 2018). (2) A list of the IDs of the variables is available in Appendix B.

Data
Our dataset consists of 13 key macroeconomic variables for the United States and largely parallels those of Carriero et al. (2016) and Louzis (2019), with the exception that we use CPI inflation as the sole measure of inflation. The data consist of ten monthly and three quarterly variables and range over the period 1980M01-2018M12. Most of the included variables are available with real-time vintages in the ALFRED database. For variables not available in ALFRED, we turn to FRED and FRED-MD (McCracken and Ng, 2016). A summary of the data is provided in Table 1.
We follow Louzis (2019) and transform the raw series to growth rates. For our monthly variables, we use month-on-month growth rates, whereas the three quarterly variables are computed as quarter-on-quarter rates. All growth rates are annualized. The final two columns of Table 1, µ_{ψ,j} and √ω_{ψ,j}, display the prior means and prior standard deviations of the unconditional means of the variables. The values are drawn from Louzis (2019), but are also in line with e.g. Clark (2011) and Österholm (2012). We use real-time data where available throughout the forecasting exercise. To obtain a realistic pattern of available observations, we consider the information set available on the tenth day of every month. Figure 1 displays the publication pattern during 2005-2018 and shows the number of months that have passed since the last available publication.

[Figure 1 about here; panels include GDP, Resid. inv., Inflation, Capacity util. and Indust. prod.]

Figure 1 shows a pattern that is characteristic of real-time forecasting of macroeconomic data. Data for financial and select real and nominal variables are already available for the previous month, whereas the previous month's outcomes for some of the monthly variables are unknown. The pattern of availability shows that consumption and inflation are available with a one-month delay in every month except for a handful of occasions. Similarly, non-farm employment, hours, unemployment and the federal funds rate are typically available with a zero-month delay, with the exception of a few months. In the final dataset that we use in our forecasting exercise, we adjust the publication delays to obtain a more uniform dataset. The adjustments change the publication structure in the vintages so that the aforementioned variables have the same delay in all vintages, i.e. consumption and inflation are always observed with a delay of one month, whereas non-farm employment, hours, unemployment and the federal funds rate are always observed without any delay. Consequently, in every month that we make our forecasts, observations are available for the preceding month for six of the monthly variables, whereas four still lack data.

Implementation Details
The mixed-frequency models that we estimate use p = 12 lags following e.g. Bańbura et al. (2010). The overall tightness in the prior distribution for the regression parameters is set to λ 1 = 0.2 and the lag decay used is λ 2 = 1. We use 15,000 draws in the MCMC procedure and discard the first 5,000. For the hierarchical steady-state prior, we let c 0 = c 1 = 0.01 in line with Huber and Feldkircher (2019), Louzis (2019). To set the scale of the proposal distribution for φ ψ , we employ the adaptive scaling procedure discussed by Roberts and Rosenthal (2009). We use a batch size of 100 and check every 100 iterations if the fraction of acceptances within the most recent batch exceeds 0.44. If it does, we increase s by δ(k) = min(0.01, k −1/2 ), where k denotes the batch number. If the fraction of acceptances was less than 0.44, s is instead decreased by δ(k).
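The adaptive scaling rule can be sketched in a few lines. Here the adjustment is applied to log(s), a common way of implementing the Roberts and Rosenthal (2009) procedure; the batch acceptance fractions are invented inputs.

```python
# Sketch of the adaptive scaling rule described above: after each batch
# of 100 iterations, nudge the (log) proposal scale up or down by
# delta(k) = min(0.01, k**-0.5), depending on whether the batch's
# acceptance fraction exceeded the 0.44 target.
def adapt_scale(log_s, accept_frac, batch_number, target=0.44):
    delta = min(0.01, batch_number**-0.5)
    return log_s + delta if accept_frac > target else log_s - delta

log_s = 0.0
for k, frac in enumerate([0.60, 0.50, 0.30, 0.45], start=1):
    log_s = adapt_scale(log_s, frac, k)
print(round(log_s, 4))
```

The diminishing step size δ(k) ensures the adaptation vanishes asymptotically, which is what preserves the ergodicity of the chain.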
For the parameters of the log-volatility process, we let the prior mean and standard deviation for φ be µ φ = 0.9 and Ω φ = 0.1, respectively. The prior mean and degrees of freedom of σ 2 are σ 2 = 0.01 and d = 4.

Empirical Application: Real-Time Forecasting of Key US Variables
In this section, we assess the forecasting ability of the model that we propose. The assessment is carried out by studying the out-of-sample predictive accuracy of the model based on the real-time dataset for the US that was discussed in Section 3.

Forecasting Setup
The quarterly steady-state Bayesian VAR model has been used in several previous studies; see for example Adolfson et al. (2007), Österholm (2008), Villani (2009), Clark (2011) and Ankargren, Bjellerup, and Shahnazarian (2017). The model is employed both for policy purposes and for forecasting, and is implemented in the Matlab toolbox BEAR developed at the European Central Bank (Dieppe, Legrand, and van Roye, 2016). Our empirical application targets this audience, and our main interest lies in whether the components we add to the model (mixed frequencies, stochastic volatility and hierarchical steady states) improve upon the benchmark model of Villani (2009) estimated on single-frequency data. The forecasting results are also compared to models using Minnesota-style normal inverse Wishart priors, i.e. without the steady-state component. A summary of the models included in the forecast evaluation is presented in Table 2. The benchmark model is the steady-state model estimated on single-frequency data. Depending on whether it serves as the benchmark for quarterly or monthly variables, it includes either the full set of variables (aggregated to the quarterly frequency) or the ten monthly variables. The quarterly VAR uses p = 4, whereas the monthly VAR uses p = 12.
We use a recursive forecasting scheme to evaluate the forecasting performance of the considered models. Beginning in January 2005, we estimate the models and make forecasts and then recursively add months to the set of data used for estimation. The benchmark models use the balanced data, whereas the mixed-frequency models automatically handle the ragged edges.
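The recursive scheme amounts to an expanding estimation window: re-estimate, forecast, extend the sample by one month, and repeat. A minimal sketch on simulated data is given below; all names are hypothetical, and estimate_and_forecast is a placeholder (here a naive last-value forecast) standing in for full model estimation and forecasting.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical monthly dataset with three variables
dates = pd.date_range("1980-01", "2011-12", freq="MS")
y = pd.DataFrame(rng.standard_normal((len(dates), 3)), index=dates)

def estimate_and_forecast(data, horizon):
    # Placeholder for model estimation plus forecasting:
    # a naive forecast that repeats the last observation.
    return np.tile(data.iloc[-1].to_numpy(), (horizon, 1))

forecasts = {}
for origin in dates[dates >= "2005-01-01"]:
    train = y.loc[:origin]                 # expanding (recursive) window
    forecasts[origin] = estimate_and_forecast(train, horizon=12)
```

In the actual application, the mixed-frequency models would additionally receive the unbalanced (ragged-edge) data available at each origin rather than a balanced panel.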
The forecasting ability of the models is evaluated with respect to both point and density forecasts. For point forecasts, we consider the root mean squared errors (RMSE). For density forecasts, we compute univariate and multivariate log predictive density scores (LPDS). We do so by fitting a normal density to the draws from the predictive distribution following e.g. Adolfson et al. (2007):

LPDS(m, s, h) = (1/2)[n_s log(2π) + log|V| + (y − ȳ)'V^(-1)(y − ȳ)],

where m denotes the model, s denotes the set of variables the LPDS is computed for, n_s is the dimension of s, h is the forecast horizon, and ȳ and V are the mean and covariance of the draws from the relevant predictive distribution; lower values thus indicate more accurate density forecasts. For fixed (m, s, h), we compute the summary LPDS by averaging over the evaluation period. We calculate the LPDS jointly but separately for the monthly and quarterly variables, and univariately for all variables.

An important question when using real-time data is with respect to which vintage the forecasts should be evaluated. There is no consensus, but two alternatives are more common in the literature. The first, used by e.g. Romer and Romer (2000) and Clark (2011), is to evaluate against the second available vintage. This choice can be justified by acknowledging that revisions occurring after longer periods of time may be unforeseeable and more structural in nature, relating to e.g. definitions and methods of measurement. The second available estimate therefore provides a less noisy estimate than the initial release, yet is produced in the same environment in which the forecaster is active. The second common approach, followed by e.g. Schorfheide and Song (2015), is to use the most recent vintage: whatever the reason for the revisions, the currently available data provide the best estimates of e.g. inflation and output in previous years. We follow the latter approach and use the most recent vintage for evaluating the forecasts, but for transparency we provide the main results of the evaluation using the second available vintage in Appendix D.
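Under the normal approximation, the LPDS for a single forecast origin can be computed directly from predictive draws, as in the following sketch. Function and variable names are illustrative; lower values indicate a more accurate density forecast, consistent with the tables.

```python
import numpy as np

def lpds(draws, y_obs):
    """Log predictive density score under a normal approximation
    fitted to draws from the predictive distribution.

    draws : (n_draws, n_vars) array of predictive draws
    y_obs : (n_vars,) realized outcome
    Lower values indicate a more accurate density forecast."""
    ybar = draws.mean(axis=0)
    V = np.atleast_2d(np.cov(draws, rowvar=False))
    dev = y_obs - ybar
    _, logdet = np.linalg.slogdet(V)
    quad = dev @ np.linalg.solve(V, dev)
    return 0.5 * (ybar.size * np.log(2 * np.pi) + logdet + quad)
```

For a fixed model, variable set and horizon, the summary score is then the average of these values over the evaluation period.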

In-Sample Estimation
As a preliminary analysis, we begin by estimating the mixed-frequency VAR model using the SS and SSNG priors to see whether the obtained steady-state posteriors differ. Because the long-term forecasts are largely determined by the steady-state posterior, seeing whether differences are present is of direct importance for forecasts beyond the immediate short term. Figure 2 displays kernel density estimates of the posterior distributions from the mixed-frequency model with common stochastic volatility. As a point of reference, the figure includes the prior distribution detailed in Table 1.
As expected, the posteriors in Figure 2 are for the most part similar. The modes of the posteriors are close to perfectly aligned for variables such as the bond spread, inflation, residential investment and GDP. For others (e.g. hours, the federal funds rate and industrial production), the SSNG posteriors deviate more from both the priors and the SS posteriors.
While the steady states are of central importance for the levels of the forecasts, their precision is highly influenced by the common stochastic volatility factor. Figure 3 displays the mean of √f_t together with 90% bands for the SS-CSV and SSNG-CSV models. Figure 3 shows that there is little difference between the estimated volatility factors in the two steady-state models. Peaks of volatility are aligned and reach the same levels, while the level of the factor in the SSNG model is slightly higher in normal times. Both display the entrance into the Great Moderation at the beginning of the 1980s, with heightened volatility again around the recent financial crisis. The interpretation of the level of the factor is that the time-invariant elements in the error covariance matrix Σ are scaled by f_t, which roughly amounts to an amplification by a factor of 4-6 during the recent financial crisis and a compression of around 0.5-0.75 in recent years. This feature has a direct effect on the width of the predictive distribution.
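The scaling interpretation can be illustrated with a small simulation: a common log-volatility factor following an AR(1) process rescales a constant covariance matrix in every period. The parameter values below simply echo the prior settings mentioned earlier and are purely illustrative; Σ is an arbitrary example matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameter values echoing the prior settings in the text
phi, sigma2, T = 0.9, 0.01, 500

# AR(1) process for the common log-volatility h_t = log f_t
h = np.zeros(T)
for t in range(1, T):
    h[t] = phi * h[t - 1] + np.sqrt(sigma2) * rng.standard_normal()
f = np.exp(h)

# A constant error covariance matrix, rescaled by f_t in every period
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
Sigma_t = f[:, None, None] * Sigma    # shape (T, 2, 2)
```

A value of f_t around 4-6, as during the financial crisis, thus widens every element of the error covariance, and hence the predictive bands, by that same factor.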

Forecast Evaluation
In this section, we present the main results of the forecast evaluation. For space considerations, the presentation includes results from the joint evaluations as well as univariate results for the three quarterly variables and the three monthly variables that are typically of primary interest: the inflation, federal funds and unemployment rates. For completeness, univariate evaluation results for the remaining variables can be found in Appendix E. Table 4 presents the results from the LPDS computed jointly. We compute the LPDS separately for the set of quarterly and monthly variables, respectively. The forecast horizons h in the table correspond to the frequency of the respective set of variables.

Joint forecasting results
Across all horizons and sets of variables, SS-CSV and SSNG-CSV dominate with only one exception in which Minn-CSV does slightly better than SS-CSV. For the quarterly sets of variables, SS-CSV outperforms the other models for h > 0 with the SSNG-CSV model ranking first for the nowcast. Minn-CSV ranks higher than the constant volatility models for the initial horizons, but for the long-term forecasts the added value of the steady-state prior outweighs the improvements obtained from stochastic volatilities. However, given a model, stochastic volatility appears to be useful as it improves the joint forecasting performance of quarterly variables across the board when comparing the constant volatility models to their heteroskedastic counterparts. Within the two groups of models with constant and stochastic volatility, we see that the steady-state models forecast better than Minn-IW and Minn-CSV, respectively, throughout all horizons. Therefore, the table shows that steady-state information and flexible modeling of the volatility structure help to improve the quarterly forecasts.
For the performance of the monthly forecasts, the picture is largely the same. The three models with stochastic volatility outperform the constant models for all horizons and SSNG-CSV produces the most accurate density forecasts for h = 2, 3, 4. For the remaining horizons, SS-CSV picks up the lead. Among the constant volatility models, the ranking is no longer uniform across horizons.
With respect to the joint log predictive scores, we can therefore conclude the following. First, there are gains in utilizing prior information on the steady states. Second, further improvements can be obtained by allowing for stochastic volatility. Third, with a handful of exceptions for the quarterly forecasts made by Minn-IW and SS-IW, the relative LPDS is negative throughout, indicating that the mixed-frequency models produce better density forecasts than the single-frequency benchmarks. These three points are in line with the previous literature and can be seen as a synthesis of the conclusions made by Villani (2009).

Quarterly univariate forecasting results
Tables 5-7 present the univariate LPDS and RMSE for the three quarterly variables GDP, residential investment and non-residential investment.
Starting with GDP, a somewhat different pattern than what was seen for the joint LPDS emerges. For both evaluation metrics, SS-IW is generally the better forecaster beyond the short term and is only outperformed by the CSV models at the first three horizons, and then only in terms of density forecasts. Table 5 shows that the mixed-frequency models do better than the quarterly benchmark for the immediate short term when either nowcasting the current quarter or forecasting the next quarter. Beyond the first quarter forecast, the quarterly model generally produces more accurate forecasts. A similar result is found by Schorfheide and Song (2015). Use of the steady-state prior results in more accurate forecasts at every horizon, but whether a hierarchical prior formulation and stochastic volatility provide improvements varies. The homoskedastic steady-state models outperform the Minn-IW model at all horizons, and the stochastic volatility steady-state models consistently forecast GDP growth more accurately than Minn-CSV.

For residential investment, Table 6 presents forecasting results that more closely resemble the joint results. SS-CSV and SSNG-CSV dominate for all horizons, although the difference with respect to Minn-CSV is occasionally small, particularly for the point forecasts. Nevertheless, both steady-state models with stochastic volatility perform well with better scores than all other models for every horizon and with respect to both point and density forecasts.
Finally, Table 7 shows the forecast evaluation for Non-residential investment. The pattern displayed in Table 7 is a mix of the patterns in Tables 5-6. For the nowcast, Minn-CSV provides better forecasts than the others, whereas SS-CSV generally does well and ranks first for horizons 1-5 with respect to the density forecasts. The utility of the steady-state prior is clear from Table 7: while Minn-CSV and Minn-IW start out well, the performance deteriorates more rapidly with h than what is manifested by the other models employing information about the steady states. We can again see that both SS-CSV and SSNG-CSV dominate Minn-CSV for all h > 0.
Monthly univariate forecasting results
Moving to the monthly variables, Table 8 presents the forecast evaluation for inflation. The results indicate that there is little to gain from using the mixed-frequency VAR for forecasting monthly inflation as compared to a monthly VAR. The relative RMSE is close to unity and few of the Diebold-Mariano tests of equal predictive ability indicate any difference between the benchmark and the mixed-frequency models.
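For reference, the Diebold-Mariano test with the small-sample modification of Harvey, Leybourne, and Newbold (1997), as used in the evaluation tables, can be sketched as follows. This is a minimal implementation under squared-error loss, not the authors' code; names are illustrative.

```python
import numpy as np
from scipy import stats

def dm_test(e1, e2, h=1):
    """Diebold-Mariano test of equal predictive ability under
    squared-error loss, with the small-sample modification of
    Harvey, Leybourne and Newbold (1997).

    e1, e2 : forecast-error series of the two competing models
    h      : forecast horizon (h - 1 autocovariances enter the
             long-run variance of the loss differential)"""
    d = e1**2 - e2**2
    T = d.size
    dbar = d.mean()
    # Autocovariances of the loss differential up to lag h - 1
    gamma = [np.sum((d[k:] - dbar) * (d[:T - k] - dbar)) / T
             for k in range(h)]
    lrv = (gamma[0] + 2.0 * sum(gamma[1:])) / T   # variance of dbar
    dm = dbar / np.sqrt(lrv)
    # Harvey et al. (1997) correction factor; compare with t(T-1)
    correction = np.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    stat = correction * dm
    pval = 2 * stats.t.sf(abs(stat), df=T - 1)
    return stat, pval
```

A positive statistic indicates that the first model has the larger squared errors, i.e. the second model forecasts more accurately.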
Next, the evaluation of the forecasts of the federal funds rate is displayed in Table 9. In contrast to the results for inflation, we here find large benefits from using the mixed-frequency models for forecasting the monthly federal funds rate. All three models with stochastic volatility do well with respect to both density and point forecasts, but the steady-state models have a small edge across most horizons.
The final series we evaluate univariate forecasts for is the unemployment rate. The results are presented in Table 10. The table reveals that mixed-frequency models are useful also for forecasting unemployment. SS-IW appears to be the better forecaster in terms of point forecasts, whereas SS-CSV provides more accurate density forecasts for all horizons. Thus, adding stochastic volatility does not improve point forecasts of the unemployment rate, but the density forecasts exhibit a substantial enhancement.
Forecasts evaluated against the second vintage
To ensure that our results are not primarily driven by our choice of data to evaluate the forecasts against, Appendix D presents the same tables as shown in the main text but with the evaluations carried out against the second available vintages. Qualitatively, the results remain the same. For the forecasts of GDP, the gains obtained by using mixed-frequency data are larger when the forecasts are evaluated against the second vintage. Occasional changes in rankings among the models occur across variables, but for the most part the rankings remain unaltered and the conclusions made so far are intact irrespective of the choice of evaluation vintage.

Conclusion
We present a vector autoregressive model that is a synthesis of recent important contributions. Our model incorporates three main features. First, the model allows for mixed-frequency data by use of a state-space formulation. We deal with the particular mixed-frequency case involving monthly and quarterly data and solve the frequency mismatch problem by postulating a monthly VAR with missing values, similar to the work by Schorfheide and Song (2015). Second, we include prior beliefs about the steady states, or unconditional means, of the variables in the model by means of the steady-state prior developed by Villani (2009). We also employ the hierarchical formulation of the prior proposed by Louzis (2019), whose advantage is that it is only necessary to specify prior means for the steady-state parameters while the prior variances are, in turn, equipped with hyperpriors. Third, to allow for an error covariance matrix that varies over time, we include as the final component the common stochastic volatility model presented by Carriero et al. (2016).

We estimate our model and competing alternatives using US data including ten monthly and three quarterly variables. The results show that the forecasts are generally improved by adding the three components to the benchmark VAR model. Using mixed instead of single frequencies of the data generally does not produce worse forecasts, and instead usually performs better. Including prior information about the steady states typically outperforms the corresponding alternatives that lack this information. The hierarchical steady-state prior is appealing as it allows for shrinkage toward the prior means of the steady states, and is generally on par with or better than the standard steady-state prior. Finally, we find that common stochastic volatility mostly improves the accuracy of the forecasts, as the models including heteroskedasticity generally outperform the models with constant volatility.

A Posterior Moments
Regression and Covariance Parameters
The moments of the posterior distributions for the regression and covariance parameters are: The conditional posterior distribution of σ² is

B Data Sources
The IDs of the series used and their sources are shown in Table 3.

Note:
The forecast horizons h refer to quarters and months, respectively, for the two sets of variables. The scores in the table display the score of the model in the first column minus the score of the benchmark model, whereby negative entries indicate that the mixed-frequency model is superior. Bold entries show the minimum in each column. The benchmark model for the quarterly set of variables is a VAR(4) including all 13 variables aggregated to the quarterly frequency. For the monthly LPDS, the benchmark model is a VAR(12) including the ten monthly variables. In both cases, the steady-state prior with a constant error covariance matrix is used. Two stars (**) indicate that the Diebold-Mariano test of equal predictive ability is significant at the 1 percent level, whereas a single star indicates significance at the 10 percent level. The test employs the modifications proposed by Harvey, Leybourne, and Newbold (1997).

D Forecast Evaluation Tables (Second Vintage)
The tables in the main text present the results of the forecast evaluation when evaluated with respect to the most recent vintage. The tables in this appendix (Tables 11-15) present the same evaluations but conducted with respect to the second available vintage. Because the federal funds rate is not revised, the second vintage is the same as the most recent vintage. The results when evaluating its forecasts against the second vintage are therefore identical to those in Table 9 and are not reproduced here.