Prediction Regions for Poisson and Over-Dispersed Poisson Regression Models with Applications to Forecasting Number of Deaths during the COVID-19 Pandemic

Motivated by the current Coronavirus Disease (COVID-19) pandemic, which is due to the SARS-CoV-2 virus, and the important problem of forecasting daily deaths and cumulative deaths, this paper examines the construction of prediction regions or intervals under the Poisson regression model and for an over-dispersed Poisson regression model. For the Poisson regression model, several prediction regions are developed and their performance are compared through simulation studies. The methods are applied to the problem of forecasting daily and cumulative deaths in the United States (US) due to COVID-19. To examine their performance relative to what actually happened, daily deaths data until May 15th were used to forecast cumulative deaths by June 1st. It was observed that there is over-dispersion in the observed data relative to the Poisson regression model. An over-dispersed Poisson regression model is therefore proposed. This new model builds on frailty ideas in Survival Analysis and over-dispersion is quantified through an additional parameter. The Poisson regression model is a hidden model in this over-dispersed Poisson regression model and obtains as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by July 16th, given the data until July 2nd, is presented. Finally, the paper discusses limitations of proposed procedures and mentions open research problems, as well as the dangers and pitfalls when forecasting on a long horizon, with focus on this pandemic where events, both foreseen and unforeseen, could have huge impacts on point predictions and prediction regions.


Introduction
The current Coronavirus Disease (COVID-19) pandemic [12], caused by the SARS-CoV-2 virus, is providing statisticians, data scientists, machine learners, and other modelers a real-time laboratory to test and demonstrate their forecasting skills and abilities, with the quality of their forecasts assessable in a matter of days, weeks, or months. See, for instance, https://covid19-projections.com from the Masachussetts Institute of Technology (MIT) and the Institute of Health Metrics (IHME)'s https://covid19.healthdata.org/united-states-of-america based at the University of Washington in Seattle, as well as [15] discussing the complexities of modeling pandemics. Of particular interests are the forecasting of the numbers of daily cases 1 , deaths, and hospitalizations, or the cumulative cases, deaths, and hospitalizations attributable to COVID-19 at a future date in a specified country or a locality (e.g., a county, state, or province) on the basis of currently observed cases, deaths, and hospitalizations data. Such forecasts are of critical importance since they are major components in the decision-making process by government officials, business leaders, and educational and university administrators regarding the termination of lockdowns, lessening of social distancing and other mitigation regulations, opening of businesses, or continuing with online class formats in K-12 schools, colleges, and universities.
The left panel of Figure 1 provides the daily number of reported deaths due to COVID-19 for the United States (US) with respect to the number of days since December 31, 2019 until May 15, 2020, which is Day 137 in the figures, as reported by the European Center for Disease Control (ECDC) [11] [see Section A.1]. For a given date/day, including weekends, in the data set, the numbers reported are from the preceding day, which is due to a processing lag in reporting. The right panel of Figure 1 is a scatterplot of the cumulative number of deaths in the US due to COVID-19. Given these daily and cumulative deaths data set, it is of interest to forecast the number of cumulative deaths in the US by, say, May 25, 2020 (corresponding to Day 147), which is Memorial Day, and to ask whether by that day the cumulative number of deaths in the US due to COVID-19 will have surpassed the ominously depressing and grim milestone of 100,000 cumulative deaths. Later, for our illustration, we will consider the problem of forecasting the cumulative number of deaths in the US due to COVID-19 at the end of May 2020, and compare our forecast with what eventually occurred. Finally, we attempt to forecast the cumulative number of deaths by July 16th based on the data on July 2nd.
Such forecasting problems are clearly non-trivial since there is the distinct possibility that whatever model we had fitted in the observed time-frame may not apply to the time period under forecast, the ever-present danger and risk of extrapolation. Aside from the fitted model most likely not being the true data generating model -recalling the aphorism attributed to George E. P. Box [7] that all models are wrong, but some are useful -there are other factors, some beyond our control, that could impact the number of reported deaths at a future time, such as premature easing of social distancing and re-opening of business establishments, virus mutations, better diagnostic tools, changing hotspots, overburdened health care facilities, introduction of effective treatments, beneficial or detrimental actions by local, state, and/or federal entities, changing definition deaths Cumulative Deaths due to COVID-19, under-or over-reporting of deaths, timely development of a vaccine, protests and riots arising from social unrests, and others. But high-level decision-makers such as government officials, business leaders, educational administrators, and society itself, demand some beacon, however dim such beacons may be, to guide them in their decision-making. Statisticians, data scientists, machine learners, and other modelers are always ready and willing to provide such beacons. This paper is in this spirit. We will examine existing methods and develop new methods for constructing prediction regions for random variables that pertain to the number of occurrences of an event of interest. A prediction region contains more information compared to just a point prediction since it provides information about the uncertainty inherent in the prediction. Note that with a prediction region we are interested in the would-be realized value of a random variable, not the value of a parameter, hence instead of referring to it as a confidence region, it is instead called a prediction region. The events of particular interest are those that are 'rare' in the sense that, informally, the probability of an event occurring in an infinitesimal interval is also infinitesimal. Consequently, our starting point will be the Poisson distribution which is a model for the number of occurrences of a rare event, and transition to the more general Poisson regression model, and eventually to an over-dispersed Poisson regression model which turns out to be a better model in the COVID-19 application. The real-life and practical application for which our methods will be applied is the construction of prediction regions for the daily and cumulative number of deaths due to COVID-19 in the US for a future date given only the daily deaths data until a current date. Note that such predictions or forecasts could probably be improved by utilizing other information (such as the capacities of health care facilities; movements of people in a region; information about sensitivity and specificity of diagnostic tests; transmission rates (R 0 ) of the virus; and others), or via stratification by states, cities, or counties and then combining the results from these strata to obtain a point prediction and a prediction region for the whole US. However, we approach the construction of the prediction regions for the daily and cumulative deaths at a future date by just utilizing the observed reported daily deaths data for the whole US, which in a sense is the most reliable available data regarding this COVID-19 pandemic. It might be possible to utilize information about the number of cases or infected individuals, which is also reported daily, but we feel that this is not a reliable information since it is highly dependent on the number of tests that are performed and on the sensitivity and specificity of the diagnostic tests used. In addition, if such information is to be used in the prediction model, then we may not have their realized values at the future date on which the prediction region is desired. We point out that even though we are employing probabilistic models in the form of the Poisson or an over-dispersed Poisson model, which are derivable from intuitive conditions when dealing with rare events (cf., [23,16]), our prediction method is still purely data-driven being only reliant on the observed data.

Poisson Model
The occurrence of a death due to COVID-19 could still be considered as a rare event when viewed in the context of the whole population, though even if it is rare, deaths are still significant and dire events. This is because to die of COVID-19, generally one first needs to get infected, which at this point is still a rare event, and then having been infected, to die from it. The rate of dying when infected with COVID-19, if not age-adjusted, is still rather low, less than 2% (see, for instance, Coronavirus (COVID-19) Mortality Rate). Because of its rarity, a plausible probability model for the number of deaths due to COVID-19 is therefore the Poisson model whose probability mass function (PMF) is given by with Z 0,+ = {0, 1, 2, . . .}, and λ > 0 is the rate parameter, which is also the mean and variance of the distribution, and I{·} is the indicator function. For a variable Y with this Poisson distribution, we write Y ∼ P OI(λ). The cumulative distribution function of a P OI(λ) is We start our investigations with this no-covariate Poisson model, equivalently, a model with intercept only, since results for the Poisson regression model build on this no-covariate model.

Prediction Regions when Rate λ is Known
Suppose now that Y 0 ∼ P OI(λ), where for the moment we assume that we know the rate parameter Note that this region will not be an interval being a subset of Z 0,+ , though if this region is formed as the intersection between Z 0,+ and an interval in , then we may call it imprecisely as an interval. Subject to this condition, a desirable property of such a region is that its cardinality is as small as possible. If we allow for Γ(λ, α) to depend on a randomizer U , a standard uniform random variable independent of Y 0 , the smallest cardinality 100(1 − α)% prediction region is, using a Neyman-Pearson Lemma type argument, given by where, for d ∈ , we define the subsets of Z 0,+ given by and c(α) and γ(α) determined via with 0/0 = 0. Observe that by allowing randomized prediction regions, we have If we do not admit randomized prediction regions, which is achieved by always taking U = 0 in Γ 0 (U ; λ, α), then unless 1 − α is a 'natural' prediction coefficient, we will not achieve equality in the preceding probability statement. The use of the adjective 'natural' is analogous to its use in constructing nonparametric confidence intervals, cf., [21]. See [19] on the application of Neyman-Pearson-type arguments to construct optimal confidence regions, which could be adapted to the construction of prediction regions. There are two other ways of constructing prediction intervals for Y 0 when λ is large using normal approximations. To obtain the prediction regions, these intervals are then intersected with Z 0,+ . Letting N (µ, σ 2 ) denote a normal distribution with mean µ and variance σ 2 , we recall that when λ is large owing to the Central Limit Theorem and the Delta Method (cf., [8]), then we have the normal approximations Let φ(·) and Φ(·) be the probability density and cumulative distribution functions of a standard normal random variable so that Let z α = Φ −1 (1 − α) be its (1 − α)th quantile. Two approximate prediction regions for Y 0 when λ is large, which are based on the above normal approximations, are given by When λ is large, as noted in the construction of Γ 1 , we may approximate the Poisson probabilities by normal probabilities, via As such, we obtain the approximation Consequently, when λ is large, the regions Γ 0 (U ; λ, α) and Γ 1 (λ, α) should be close to each other. For these prediction regions Γ 0R (randomized), Γ 0N (nonrandomized), Γ 1 , and Γ 2 , the exact coverage probabilities and their exact lengths (mean length for Γ 0R ) could be computed under the P OI(λ) distribution, since λ is known. Note that the lengths, which are the differences between the upper and lower integer limits of the prediction regions, are equivalent surrogates of the cardinalities of the regions. Figure 2 depicts the exact coverage probabilities (CP), expressed in percentages, and their lengths (expected length for Γ 0R ), for different values of λ. Except when λ takes small values  where the coverage probabilities of Γ 1 and Γ 2 are degraded, especially for the latter, the performance of these prediction regions are quite similar. The coverage probability of Γ 0R is exactly equal to 1 − α, whereas that for Γ 0N is always at least equal to 1 − α. Both Γ 1 and Γ 2 could have coverage probabilities that could be below the nominal coverage level, though as λ increases, these differences become negligible. By construction, Γ 0R has a shorter interval than Γ 0N ; for some values of λ, the length of Γ 0R exceeds that of Γ 1 and Γ 2 , but this is because the coverage probabilities of Γ 1 and Γ 2 are lower than the nominal coverage level.

Prediction Regions when Rate λ is Unknown
But, in the preceding developments, we have assumed that the rate parameter λ is known, an unrealistic assumption. How do we deal with the situation when λ is unknown? Suppose that we had observed a realization y = (y 1 , y 2 , . . . , y n ) of a random sample Y = (Y 1 , Y 2 , . . . , Y n ) from P OI(λ), so the components of Y are independent and identically distributed (IID) from P OI(λ).
Our goal is to utilize y to construct a 100(1 − α)% prediction region for an unobserved Y 0 , which is independent of Y and whose distribution is also P OI(λ). How will we achieve our goal? Note that through the Sufficiency Principle, we may reduce the problem by simply assuming that we had observed t = n i=1 y i , the realization of the sufficient statistic for λ given by T = n i=1 Y i , which has a P OI(nλ) distribution. The reduced problem therefore is that we have (T, Y 0 ) which are independent random variables with T ∼ P OI(nλ) and Y 0 ∼ P OI(λ) and our goal is to construct a 100(1 − α)% prediction regionΓ(T, U ; α) for Y 0 , which utilizes T , and possibly a randomizer U which is independent of (T, Y 0 ).
Given T = t, the maximum likelihood estimate (MLE) of λ isλ(t) = t/n. By virtue of the consistency ofλ(T ) for λ as n → ∞, a seemingly straight-forward approach to constructing a prediction region for Y 0 is to replace λ in Γ 0 (U ; λ, α), Γ 1 (λ, α), and Γ 2 (λ, α) in (4), (5), and (6), respectively, byλ(t) to obtainΓ 0 (T, U ; α) = Γ 0 (U ;λ(T ), α); How do these prediction regions compare with each other in terms of performance, both in the context of their coverage probabilities and also their cardinalities, whose surrogate are lengths? In particular, by substitutingλ(T ) = T /n for λ, how does this impact the coverage probabilities of these prediction regions and are they still valid, even in an asymptotic sense? It is not clear how the substitution of λ byλ(T ) = T /n will impact the exact performance of the first prediction regionΓ 0 . However, for the second and third prediction regionsΓ 1 andΓ 2 , we could alter them to take into account the substitutions, provided that λ is large. As noted earlier, when λ is large,Γ 0 ≈Γ 1 , so the alteration ofΓ 1 should also apply, approximately, toΓ 0 . The change in distributions of the pivotal quantities arising from these substitutions are reflected below, a consequence of the Delta-Method.
From these normal approximations, we could improve the prediction intervalsΓ 1 andΓ 2 into the following prediction intervals, which take into account the impact of these substitutions where, for notational economy, we writeλ forλ(T ) and '∨' for max; for the ceiling function; and for the floor function: Note that by intersecting the intervals with Z 0,+ , the floor and ceiling functions are actually not needed, but we retain them in the formula since when we consider the 'length', this pertains to the length of the interval. Observe that if the lower limits of these intervals are not zeros, which will usually be the case for large λ, then it is a simple exercise to show that these two prediction intervals have the same lengths, but they are not identical regions. In trying to adapt the prediction region Γ 0 (U ; λ, α) in (4) to the situation where λ is unknown, the main idea is to replace λ by an estimate obtained from the observed data. Doing so leads to estimates of p(k|λ), k ∈ Z 0,+ which are then used in determining c(α) and γ(α) in (4). Thus,Γ 0 in (7) is obtained by using the ML estimates of {p(k|λ), k = 0, 1, . . .} given by {p(k|λ(n, t)), k = 0, 1, . . .} withλ(n, t) = t/n. This begs the question on whether other possible estimates of {p(k|λ), k = 0, 1, . . .} could be utilized which may have better performances than the use of the ML estimates. An approach based on a second-order Taylor expansion adjusts p(k|λ(n, T )) and leads to the approximation p(k|λ) ≈p 3 (k; (n, t)) ≡ p(k|λ(n, t)) By usingp 3 (k; (n, t)) in place of p(k|λ) in (4) results in the prediction region denoted byΓ 3 (n, t; α). Another intriguing possibility is to utilize the uniformly minimum variance unbiased estimator (UMVUE) (see [8]) of p(k|λ), given the data (n, T ) with T ∼ P OI(nλ), or equivalently, Y 1 , Y 2 , . . . , Y n which are IID P OI(λ). The UMVUE of p(k|λ), usually obtained through the Rao-Blackwell Theorem and Lehmann-Scheffe Theorem [8], is given bŷ the binomial probability at k with parameters (t, 1/n). Observe, however, that this approximation will lead to zero probabilities for k outside of the set {0, 1, . . . , t}. Usingp 4 (k; (n, t)) in lieu of p(k|λ) in (4) leads to the prediction region denoted byΓ 4 (n, t; α).
As yet another idea is to develop a procedure by borrowing from the Bayesian playbook [8]. We suppose that our prior knowledge of the value of the Poisson rate λ is represented by a distribution function G. Having observed Y = y = (y 1 , y 2 , . . . , y n ), the posterior distribution of λ is given by with t = t(y) = n i=1 y i . The conditional probability mass function of Y 0 , given Y = y, also called the posterior predictive PMF, is If we are pure Bayesians, then we will completely know, or trust, our G, so we could use the predictive PMF p(·|y; G) in lieu of the Poisson PMF in (4) to form a Bayesian prediction region for Y 0 . Usually, however, we may try to estimate G by aĜ(·; y) based on y. This brings us to the realm of the Empirical Bayes (EB) approach, pioneered by Herbert Robbins; see [24,25,26]. An extreme case is to 'estimate' G by a degenerate distribution at the ML estimateλ = t/n, which leads to just substitutingλ in the Poisson PMF, hence results in the prediction regionΓ 0 in (7). Another possibility is to try to estimate G non-parametrically. However, here we implement this Bayesian and EB approaches using a family of conjugate priors, so we assume G is a gamma distribution with mean κ/β and variance κ/β 2 , denoted by G κ,β , whose density function is where κ > 0 and β > 0. Note that κ is the shape parameter and β is the scale parameter. Under G = G κ,β , simplifying (15) we obtain, for y 0 ∈ Z 0,+ , When κ is a positive integer, the PMF in (16) corresponds to a negative binomial distribution with parameters κ + t and (β + n)/(β + n + 1). The PMF in (16) could be used in place of the Poisson PMF in (4) to form a Bayesian prediction region for Y 0 , given (κ, β), denoted byΓ 5 (U, Y; (κ, β)). An approach to specifying (κ, β) is to specify a prior mean and prior standard deviation for λ, say M and S, respectively, which yield κ = M 2 /S 2 and β = M/S 2 . The EB approach estimates κ and β from the data y = (y 1 , y 2 , . . . , y n ). Such estimation could be done via maximum likelihood using the likelihood function obtained from the joint marginal distribution of (Y 1 , Y 2 , . . . , Y n ) based on the model Y i |λ i ∼ P OI(λ i ) and λ i ∼ G κ,β . This likelihood function is given by A method-of-moments approach to estimating (κ, β) based on y fails, however, because negative estimates of κ and β are obtained when the sample variance of y is smaller than its sample mean. At this point we mention previous works dealing with prediction intervals under the Poisson model. Prediction interval methods for the Poisson model have been incorporated in the R package envStats [17]. An object function in this package is predIntPois dealing with construction of prediction intervals under the Poisson model. It provides four options for the type of prediction interval to construct. The methods are based on procedures presented in [9,13,18]. The option normal.approx in predIntPois coincides with the prediction regionΓ 1 based on the normal approximation. In these earlier procedures, randomization was not utilized, hence generally conservative prediction intervals are obtained. Other approaches for prediction interval construction under the Poisson model, including Poisson regression models, are based on bootstrapping and simulation techniques, hence are computationally-intensive [3].
To compare performance of the prediction regionsΓ 0 (randomized version),Γ 1 ,Γ 2 ,Γ 3 ,Γ 4 , anď Γ 5 with M = 50, S = 100, and for n ∈ {5, 10, 15, 20, 30, 50, 70, 100} and λ ∈ {1, 5, 15, 30, 50, 100, 200}, we performed simulation studies, with program codes in the R [20] environment, to determine the coverage probabilities and the lengths of the regions (recall that length is an equivalent surrogate for the cardinality of the regions since we took the ceiling and the floor of the lower and upper limits, respectively, for the intervals that leads toΓ 1 andΓ 2 ). For each combination of n and λ, 10000 simulation replications of the basic simulation experiment were performed. Table 4 in the Appendix presents the results on the coverage percentages, mean lengths of the prediction intervals, and standard deviations of the lengths for different values of λ. The basic simulation experiment is, for a fixed n and λ, to generate T ∼ P OI(nλ) and Y 0 ∼ P OI(λ). The T variable could be viewed as . . , Y n are IID from a P OI(λ), The prediction regions are then constructed based on the observed (n, T ), with prediction coefficient of 95%. Note that sincě Γ 5 is the Bayes prediction region instead of the EB, we only needed the value of T = n i=1 Y i , but if we also used the EB approach, then we would have needed the values of (y 1 , y 2 , . . . , y n ) to estimate (κ, β). After constructing the prediction regions, it is then determined if Y 0 is contained in these regions. Coverage percentage is the percentage out of the 10000 prediction regions that contain the Y 0 ; mean (standard deviation) length is the average (standard deviation) of the lengths of the 10000 prediction intervals. Figure 3 presents plots with respect to n of the coverage probabilities (CP) and mean lengths (ML) for λ ∈ {1, 5, 30, 100}.
Examining Table 4 and Figure 3 we observe that when λ = 1, the CP ofΓ 2 is very poor and even deteriorates as n increases. The reason for this is that the realized Y 0 tends to equal 0, but the square-root transformation has a tendency to shift to the right the prediction interval, hence the interval tends to miss Y 0 . This result forΓ 2 is consistent with the result when the rate λ is known. When n = 5,Γ 3 andΓ 4 have unacceptably lower CPs compared to the nominal level, whileΓ 0 also has CPs which are below the nominal level, as well asΓ 1 andΓ 5 , though the last two regions have CPs closer to the desired level. The length ofΓ 0 tends to be shorter thanΓ 1 andΓ 5 . As n increases, the CPs ofΓ 3 andΓ 4 get closer to the desired level, and their lengths tend to be a bit shorter thanΓ 0 andΓ 1 . When λ = 5, the CPs ofΓ 0 ,Γ 2 ,Γ 3 , andΓ 4 are all below the nominal level, whereas forΓ 1 andΓ 5 , their CPs exceed or are quite close to the nominal level, except when n = 5. As a consequence, they ended up having longer mean lengths. These behaviors continue to hold as λ was increased, but with the CPs getting closer to the nominal level, especially as n increases. When n is small, the CPs ofΓ 3 andΓ 4 are still appreciably lower than the nominal level. When λ is large,Γ 1 ,Γ 2 , andΓ 5 almost have the same performance. Summing up our observations from these simulation studies for this no-covariate or intercept only Poisson model, in terms of adapting to the estimation of the unknown rate λ,Γ 1 andΓ 5 possess the best performance among these six prediction regions in terms of achieving the nominal level, but they also tend to be longer than the others.
negative continuously-differentiable function ρ(·), called the inverse link function, such that This is the so-called Poisson regression model and belongs to the class of generalized linear models or the class of non-linear models [22,8]. If Y 0 , given x = x 0 , has a Poisson distribution with rate λ(x 0 ; θ), and θ is known, then we could construct prediction regions for Y 0 according to the methods described in the first part of Section 2 when λ was assumed known.

With Unknown Parameter Vector
When θ is not known, then there is a need to estimate it. Let us therefore assume that we are able to observe the sample ) and with the Y i s independent and the x i s fixed. We seek to construct a prediction region for Y 0 associated with the covariate vector x 0 . First, we introduce the following functions: The log-likelihood function for θ, given {(y i , x i ), i = 1, 2, . . . , n}, is given by The associated score vector function is whereas, the observed Fisher information matrix is, with x ⊗2 = x T x, Thus, the expected Fisher information matrix is The MLE of θ based on {(Y i , x i ), i = 1, 2, . . . , n}, denoted byθ, solves the equation This will usually be obtained through iterative procedures, such as the iterative Newton-Raphson method, with the iteration given by By the large-sample theory of ML estimation (cf., [8,29]), as n → ∞ and under regularity conditions on the sequence of covariate vectors x i , i = 1, 2, . . . , n, we have that A consistent estimator of I(θ) is I(θ). By the Delta-Method, it then follows that the ML estimator of λ(x 0 ; θ) satisfies, as n → ∞, where 'tr' means trace of a matrix. Using this result and when λ 0 = ρ(x 0 θ) is large, we obtain the approximate distributions of relevant pivotal quantities for constructing prediction regions. We writeλ 0 for λ(x 0 ;θ) andψ 0 for ψ(x 0θ ). These pivotal quantities are: . . , n}, from these pivotal quantities, we are then able to obtain approximate prediction regions for Y 0 given by: We could also have the prediction region based on Γ 0 from Section 2 given bỹ where we note that the dependence on x 0 and (Y, X) is throughλ 0 . Note, however, that we are simply plugging in the estimate of λ(x 0 ; θ), but without taking into consideration the variability inherent in the estimator λ(x 0 ;θ). A specific inverse link function ρ(·), which we will consider in the application to forecasting deaths in the US due to COVID-19, is the exponential function ρ(w) = exp(w), so that For this special inverse link function, we obtain the simplifications for the score vector and information matrices functions given by We also mention the extension of the Bayesian/EB approaches to constructing prediction regions in the regression setting. We suppose that the parameter θ in λ(x; θ) = ρ(xθ) takes values in a parameter space Θ. The approach then proceeds by starting with a prior distribution Π(·) on Θ which quantifies our prior knowledge about θ. The posterior predictive distribution of Y 0 , the response at x 0 , given the data (Y, X) = {(y i , x i ), i = 1, 2, . . . , n}, is given by where where the product and the sum are taken over the index set associated with (Y, X), so will be over {1, 2, . . . , n} for (Y, X), and {0, 1, 2, . . . , n} for (Y 0 , X 0 ). Generally, there will be no family of conjugate prior distributions on Θ with respect to the Poisson regression model, so the function H will not be in a closed analytical form, so that it has to be computed numerically, for instance, using Markov Chain Monte Carlo (MCMC) algorithms. Nevertheless, upon obtaining the posterior predictive distribution of Y 0 given in (27), a prediction region is then obtained by using this PMF p(y 0 |x 0 , (Y, X)) in lieu of the Poisson PMF in (4), analogously to the development of the prediction regionΓ 5 in the intercept only model. The prior distribution Π will involve hyper-parameters, for example, if Θ = p+1 , Π could be specified to be a multivariate normal distribution with mean vector µ and covariance matrix Σ, so (µ, Σ) will be the hyper-parameters. For the Bayesian, these hyper-parameters will be assigned values, unless an improper prior distribution (e.g., Lebesgue measure), which need not involve unknown hyper-parameters, is adopted; whereas, for the empirical Bayesian, these hyper-parameters will be estimated using the data (Y, X). Because of the need to approximate the posterior predictive PMF through numerical methods, these Bayesian and EB approaches to constructing a prediction region for Y 0 are clearly computationally-intensive, especially if used in a simulation study to investigate their properties, such as their coverage probabilities and their lengths. Because of the need to specify a non-conjugate prior and its hyper-parameters and the need for intensive computations, these Bayesian and EB procedures are not included in the illustrations, simulations, and applications. It is clear, though, that they are highly viable alternative procedures and should be further explored.

Illustrations of Prediction Regions
We demonstrate these prediction regions, depicted as intervals in the plots, via the following experiment. We specify a sample size n and an order p. We then generate IID realizations w i , i = 1, 2, . . . , n, n + 1, from either a N (µ, σ 2 ) distribution or a standard uniform distribution, and form the covariate vectors x i = (1, w i , w 2 i , . . . , w p i ), i = 1, 2, . . . , n, n + 1. For a specified θ = (θ 0 , θ 1 , . . . , θ p ) T , the Poisson rates are computed. The ith response y i is a realization of a random draw from a P OI(λ i ). The response vector is y = (y 1 , y 2 , . . . , y n , y n+1 ) T . The goal is to construct a prediction region for (25), (23), and (24), respectively. The procedures were coded into R functions and these will be made available publicly in due time. We present the results pictorially via a scatterplot of {(w i , y i ), i = 1, 2, . . . , n, n + 1}. The realized value y 0 ≡ y n+1 of Y 0 is highlighted and the three prediction intervals for Y 0 are also plotted. Included in the plot is the theoretical curve for the λ(x; θ) as a function of w and we also super-impose the fitted curve. Prediction regions for Y i at w i , i = 1, 2, . . . , n, are also depicted in the plot. , which all contained y 101 . Two things to observe from this plot are (1) the prediction regionΓ 2 was shifted to the right relative toΓ 1 , and (2) the prediction regionsΓ 0 with respect to the w-values are scissorlike or jagged. The latter is a consequence of the randomization approach in the construction of the prediction regions and this non-smooth behavior becomes more apparent since the realized values of the Y i s are small. The third realization, depicted in Figure 6, is from a model with n = 200, p = 3, with W i ∼ N (µ = 1, σ = 2), and with θ = (3, .2, −.1, −.05). The rate curve λ(w) as a function of w goes to zero as w increases, but goes to ∞ as w decreases, with a local minimum and maximum close to w = −2 and w = 1, respectively. The target of the prediction regions was

Simulation Studies
In each of these illustrative realizations, the three prediction regions did not vary much from each other in terms of their sizes, except in the first case whereΓ 0 was shorter but barely covered

Prediction Regions in Poisson Regression
Gamma2 TARGET the value being predicted. A question that now arises is how their coverage probabilities and their mean lengths compare with each other? To gain some insights into these comparisons, we performed simulation studies under the four different models described above, with each simulation run having 10000 replications. The sample sizes considered were n ∈ {30, 50, 100, 200}. In the Appendix, Tables 11, 12, 13 and 14 summarize the results of these simulations where we report the coverage probabilities (CP), mean lengths (ML), and standard deviation of lengths (SL). Examining these tables, it appears thatΓ 0 has coverage probabilities that are below the nominal level (between 3% and 4% below in Table 14 when n = 30), with the discrepancy being more pronounced when the sample size is small. As the sample size is increased, these observed coverage probabilities get closer to the nominal level. This deficiency is due to the estimation of the θ parameter and, as previously noted, theΓ 0 does not take into consideration the variability in the resulting estimator of λ(x 0 ; θ). On the other hand,Γ 1 andΓ 2 both achieve coverage probabilities that are quite close to the nominal level, especiallyΓ 1 , when n is large.Γ 0 , on the other hand, tends to have a lower mean length compared to the mean lengths ofΓ 1 andΓ 2 , with the differences in mean lengths becoming alarmingly large for the model in Table 13. Recall that for this model, the rate curve increases to ∞ as w decreases to −∞, and since the w i 's are generated from a normal distribution, on some occasions, w n+1 falls outside the range of {w i , i = 1, 2, . . . , n}. Depending on how different w n+1 is from the mean of w 1 , w 2 , . . . , w n , this could lead to a large estimate of the standard error ofλ n+1 ,

Prediction Regions in Poisson Regression
Gamma2 TARGET thus leading to very wide prediction regions forΓ 1 andΓ 2 . SinceΓ 0 simply utilized the estimate ofλ n+1 , but was totally oblivious to its variability, it was not much affected in such a situation. However, because of its rigidness with respect to this added variability, it could dramatically suffer. We demonstrate this situation by plotting an extreme realization in Figure  This particular demonstration warns us of the danger and pitfalls of making a prediction for a response variable that is associated with a covariate vector outside the convex hull of the covariate vectors used in the construction of the prediction regions and when the Poisson rate hyper-surface generated by the map x → ρ(xθ) is complex. As a word of caution, when performing extrapolation to do predictions, be forewarned of sinkholes littering the forecasting road -and, if it could be avoided, make no forecasts on long, especially very long, horizons. But, alas, this is the type of forecasting problem that is actually realistic and of most interest, such as that of predicting the number of cases or deaths due to COVID-19 in a future date, given the observed data up to a

Prediction Regions in Poisson Regression
Gamma2 TARGET certain date. Based on the results of these simulation studies, the prediction regionΓ 1 appears to be the most preferable among the three prediction regions. In our illustration using the COVID-19 data set in Section 4, we will therefore just present the prediction region provided byΓ 1 .

Forecasting Application to COVID-19 Pandemic
We now present in this section an illustration of the potential application of the procedures discussed in the preceding sections. One of the interesting questions during this COVID-19 pandemic is the forecasting of the number of cumulative deaths in the US at a given date, for example, at the end of May 31, 2020, given information up to a certain date, say May 15, 2020. Such forecasts are of critical importance since they could partly be the basis of highly consequential and possibly controversial decisions by federal, state, and local governments officials, school administrators, executives of big corporations and small businesses, religious leaders, and many others. Such decisions could pertain to when to institute stay-in-place directives, when to issue social distancing or social easing guidelines, when to open business establishments, when to open public places such as shopping malls and ocean beaches, when to allow religious gatherings, etc. Data for daily deaths and cumulative   [11]. Clearly, the sequence of cumulative deaths does not satisfy the independence assumption, so a non-homogeneous Poisson process model [23] is not an appropriate model for cumulative deaths when viewed as a continuous-time stochastic process. However, a non-homogeneous Poisson process could plausibly model the occurrences of deaths in continuous-time, from which it follows that the sequence of daily deaths will be independently Poisson distributed with possibly different rates depending on the number of days from the time origin and the specific day of the week, as well as other features such as, for example, the quality of the health care facilities, which is hard to quantify and not available in the European CDC data set. Our novel idea therefore is to utilize Poisson regression to predict the number of daily deaths according to the methods developed earlier, and then to aggregate these daily forecasts to obtain forecasts of the cumulative deaths.
We will use the data set for the US provided by the European CDC ( [11]) plotted in Figure  1 which are the observed numbers of daily deaths attributed to COVID-19 starting on March 1, 2020, the day after the first reported death due to COVID-19, until May 15, 2020. Note that, technically, this will be the deaths data at the end of May 14, 2020. Using this data set

Prediction Regions in Poisson Regression
Gamma2 TARGET on May 15th, and given the cumulative number of deaths until then, the goal is to forecast the cumulative number of deaths in the US due to COVID-19 by the end of May 31, 2020, that is, June 1, 2020. We limit our illustration to simply utilizing the variable DayNum, which is the number of days starting from December 31, 2019, Day which the day of the week, and Deaths, the variable representing the daily number of deaths. We surmise that the deaths data set is the most reliable among the data sets that were compiled, compared, for instance, to the data set pertaining to the number of cases or infected people. However, the deaths data set need not also be totally reliable and could be subject to misclassification error and competing causes of deaths., For example, a patient who contracted COVID-19 who dies primarily because of pneumonia may be classified as having died of COVID-19, but could also be classified as having died, not of COVID-19, but of pneumonia. See also, for instance, the WSJ article [14], [2], and the BBC news article https://www.bbc.com/news/world-53073046 [10], the last two discussing the notion of "excess deaths," which are deaths that may have been due to the pandemic, but which are not included in the reported COVID-19 deaths data set. Certainly, we could have used other information such as the number of reported cases; by performing separate forecasts in each of the 50 states and the District of Columbia, then aggregating; or even by utilizing counties or metropolitan cities as strata, and then combining forecasts from these strata to obtain an overall forecast for the whole US. However, for illustrative purposes, we decided to keep things simple.  = 137), consisting of 76 days, we have the daily number of deaths, hence also the cumulative number of deaths. The other variable used in the modeling is Day (e.g., Sunday, Monday, etc.) associated with each value of DayNum, which is a categorical or factor variable. We fitted a Poisson regression model using the glm function in R with a log-link, with response being Y = Deaths and covariate vector X = (1, W, W 2 , W 3 , W 4 , W 5 , Day), where W = DayNum, and Day is considered as a factor variable hence is converted into six, instead of seven since we already have an intercept term, dummy variables in the design matrix. We chose this 5th-order model with respect to DayNum since the Akaike Information Criterion (AIC) values, computed under the Poisson regression model, appear to stabilize starting at this model and adhering to the Law of Parsimony (Occam's Razor). The AIC values associated with the 8thand 9th-order models were actually smaller than for the 5th-order model; however, these models possess highly unstable predicted values. Table 1 summarizes the AIC values for the different models, both without and with Day in the model. It also contains the estimates of ξ, the overdispersion parameter in a model that will be introduced shortly, and as we will then see, larger values of ξ are indicative of the Poisson regression model becoming a more adequate model. The histogram and time plot of the residuals with respect to DayNum are provided in Figure 9. Recall that the ith residual in Poisson regression is defined as whereλ i is the fitted value associated with x i . As such, if the Poisson regression model is adequate, one should see a histogram similar to that associated with a centered (at zero) Poisson distribution with unit rate, but this will just be an approximation since the rates are estimated, hence that affects the distribution of the residual. Similarly, the time plot of the residuals should be randomly distributed on the zero horizontal line. The histogram and time plot in Figure 9, with the time Thus, the reported number on April 16th of 4928, the highest number of daily deaths reported, is an outlier explained by the adjustment made. However, we still included these perceived outliers in the fitting of the fifth-order, with respect to DayNum, Poisson regression model. Later, when we consider forecasting for July 15th and August 1st, and since another significant adjustment was made on June 26th, we will re-allocate each of the adjustments proportionately to the observed deaths on the days on or before the adjustment day.

Residuals
Based on the fitted model's residuals, we further assessed the independence assumption of the daily deaths. We do this by creating a contingency table for DayNum with six intervals and with Residuals being either negative or positive and then performing a test for independence. The observed contingency table is presented in Table 2. A chi-square test for independence based on this table yielded χ 2 c = 4.4685 on 5 degrees-of-freedom, with associated p-value of 0.4841, hence the null hypothesis of independence cannot be rejected. Observe, however, that the fit of the model in the early days is not satisfactory, and between DayNum 99 to 112, there was a preponderance of negative residuals, possibly owing to the influence of the adjusted reported daily deaths on DayNum 108.
It may appear surprising and counter intuitive to include a Day effect in the model, since one Recall that our main objective is to obtain a prediction region for the cumulative number of deaths at a specified date, in our case June 1, 2020 (DayNum = 154). This means we were predicting the cumulative deaths at the end of May 31, 2020. From DayNum = 138 to DayNum = 154, there were a total of 17 days. If we denote by Y j , j = 62, 63, . . . , 154, the random variable denoting the daily number of deaths for DayNum = j, the random variable denoting the cumulative number of deaths until DayNum = k is S k = k j=62 Y k . Thus, we are seeking a prediction region for S 154 , given that S 137 = 85906. Under the fitted Poisson regression model, Y j , given with w j = DayNum j and d kj , k = 1, 2, . . . , 6, the dummy variables representing whether Day is a Tuesday, a Wednesday, a Thursday, a Friday, a Saturday, or a Sunday, respectively, has Poisson distribution with rate λ(x j ; θ) = exp {x j θ} . We mention that in our R code for fitting this model, for computational stability, we first centered and standardized the non-constant columns of the x j 's. Let α * ∈ (0, 1), and the Y j s being independent. From the preceding section we know how to construct a 100(1 − α * )% prediction interval [a j , b j ] for Y j , where, as mentioned earlier, we will simply utilize the prediction regionΓ 1 . By the independence, we will have Thus, if we wanted S 137 +[a • , b • ] to be a 100(1−α)% prediction interval for S 154 , given S 137 , we could choose α * = 1 − (1 − α) 1/17 . This procedure will guarantee a conservative 100(1 − α)% prediction interval for S 154 , given S 137 . This is the approach we followed in constructing a (conservative) 95% prediction interval for S 154 , the cumulative number of deaths due to COVID-19 in the US by the end of May 31, 2020.

Over-Dispersed Poisson Regression Model
We implemented the above procedure and also constructed the prediction intervals at each of the observed DayNum which are depicted in Figure 10. Examining this figure note that there are more  observed Deaths outside the prediction curves than what is expected nominally. This indicates that either there is more variability inherent in the stochastic mechanism generating the observed number of daily deaths relative to a purely Poisson regression model, or the fifth-order Poisson rate model is still inadequate, or both. We propose an approach that introduces over-dispersion with the Poisson regression model serving as a hidden model. We mention that our Occam's Razor-type solution is motivated by frailty modeling in Survival Analysis (see, for instance, [5]). Our model assumes the existence of an unobserved positive latent variable Z j of mean 1 at w j = DayNum j , and the reported number of deaths Y j is the integer part of Z j Y * j , with {Y * j } arising from a Poisson regression model and with Z j and Y * j independent. Recall that frailty models in Survival Analysis are used specifically to model correlations among observations; whereas, in our model it serves as an unobserved random contamination component in the observed number of daily deaths. In our implementation, we shall take Z j to have a gamma distribution with mean one and variance 1/ξ (see [8]). This ξ is then an additional parameter in the regression model aside from the parameter vector θ. Such a model leads to an over-dispersed Poisson regression model, with the purely Poisson regression model embedded in this model and obtainable as a limiting case when ξ → ∞. Inference for such a model requires further study, with possible use of an EM-type algorithm, though this could be difficult to implement since the distributions of the Y i s are not in closed forms. However, we may implement a Z-estimation approach (see, for instance, [29]). We first note that The approximate higher-order moments of the Y i s are also obtainable. Based on these moments, we could form the set of estimating equations, where we recall that ρ(w) = exp(w): By Z-Estimation Theory ( [29]) and under regularity conditions, it will follow that, for some (p + 1) × (p + 1) matrix In fact, let us introduce the (p + 1) × 1 vector functions: Denote by H the (p + 1) × (p + 1) matrix function consisting of the derivatives of U with respect to (θ, ξ). The components of this matrix function are: x ⊗2 i ρ(x i θ) and H 12 ((y, x); θ, ξ) = 0; We could then obtain the estimates via the Newton-Raphson (NR) method with iteration step It turns out that a simpler way to obtain the estimates of θ and ξ is to first obtain the estimatẽ θ of θ from the first estimating equation. This could be done by using the glm object function in R [20] with the Poisson family and logarithm link. The estimateξ of ξ is then obtained from the second estimating equation using a one-variable NR iteration with θ replaced byθ. Define the (p + 1) × (p + 1) matrices Σ and Ω according to with plim denoting "in-probability limit" as n → ∞. Then, the asymptotic covariance matrix of (θ T ,ξ) T is We note that another way of obtaining estimates of Σ and Ω is to obtain their theoretical expressions using higher-order moments of the Y j s. These expressions depend on θ and ξ, so estimates could then be obtained by replacing θ and ξ byθ andξ. We also point out that Ξ 11 is generally not equal to Σ −1 11 when ξ is finite. In fact, it is imperative that Ξ 11 should be used instead of Σ −1 11 since it takes into account the impact of the estimation of ξ.
Applying the Delta-Method [29], we then have the pivotal quantity result, as n → ∞, given by where quantities with 'ˆ' are estimates obtained by plugging inθ andξ for θ and ξ in their respective expressions. From this pivotal quantity, it follows that an approximate 100(1 − α)% prediction interval for Y 0 is given by Because of the term 1/ξ, when ξ is small, this prediction interval will be wider than the prediction intervals under the purely Poisson regression model. As ξ → ∞, which makes the model approach the Poisson regression model, then this prediction interval will approachΓ 1 .
Implementing this procedure based on this over-dispersed Poisson regression model, we first examine the prediction intervals at each of the observed DayNum values, which are shown in Figure  11. We now see that these approximate 95% prediction intervals cover most of the observed daily deaths. This indicates that the over-dispersed Poisson regression model provides a better fit to the observed daily deaths data than the purely Poisson regression model whose approximate 95% prediction intervals are shown in Figure 10. The estimate of ξ turned out to beξ = 16.89016. The Daily Deaths left-panel of Figure 12 is the scatterplot of daily deaths, but now including the actual observed values after DayNum 137 and until 154 (these are the red dots) and the prediction intervals for the daily deaths past 137. Observe that the prediction intervals for DayNum between 62 and 137 are wider than those in Figure 11 and the reason for this is the prediction coefficient used is now adjusted for the goal of constructing a prediction interval for S 154 . In this case, α * = 0.003012. The right-panel of Figure 12 displays the prediction interval for S 154 , given S 137 ; in fact, this also displays the prediction intervals for S k , k = 138, 139, . . . , 153, given S 137 . The red dots are the actual observed values past DayNum equal to 137. The predicted cumulative deaths on DayNum = 154 wasŜ 154 = 96876 and the conservative prediction interval for S 154 was [86157,118323]. In both of these plots, notice that the 5th-order prediction model did not perform well past DayNum = 150, though the prediction interval for S 154 did cover what was actually observed, which was 104383. From Figure 12, an elevated number of daily deaths occurred starting at DayNum = 150 (May 28). The grim milestone of 100000 cumulative deaths due to COVID-19 in the US was also surpassed on this day.

Sensitivity Analysis
To partly assess the sensitivity of the procedure, if we had used the data until DayNum = 145 (May 23rd), the predicted value for S 154 isŜ 154 = 101010 and the conservative 95% prediction interval, given S 145 , is [96567, 106057]. The associated plots in this case are provided in Figure  13. On the other hand, if we had used the data until June 1, 2020 (DayNum = 154), the fitted value is 104383, which coincided with the observed cumulative number of deaths. That the fitted cumulative number of deaths and the observed cumulative number of deaths on the last day were equal is actually a consequence of the estimating equation, hence in hindsight is not a surprising result. The associated plots in this case are provided in Figure 14. Observe in the right-panel of Figure 14 that the model for the cumulative deaths based on this 5th-order model is quite excellent, lending support to our novel approach of modeling the daily deaths data, instead of the cumulative deaths data, for the purpose of making predictions for the cumulative deaths. We summarize the results of this sensitivity analysis in Table 3 and Figure 15, where we report the predicted values and the prediction intervals for S 154 under scenarios where data used in the model fitting is up to the different days from DayNum equal to 137 up to 153. Based on this analysis, the fifth-order model appears to possess stability since the predictions and the prediction intervals for S 154 remain somewhat consistent as the amount of data being used in the model fitting varies. As to be expected, note that as the forecasting horizon shortens, then the prediction interval also narrows. This is the case, even though we still considered the outlier on April 16th, which was the adjusted Deaths data point, as a legitimate observation. However, any model, especially higher-order models, will have a breakdown point in the sense of yielding seemingly unreasonable predictions, perhaps due to a long forecasting horizon, insufficient amount of data, or wildly changing data points drastically altering estimates which highly impact forecasts. For this 5th-order model, it appears to break down when the data used is on or before DayNum 132. One possible cause appears to be sharp increases and decreases in the observed daily deaths. For instance, on DayNum = 120 the reported daily deaths was 1369, but on the next two days they were 2110 and 2611, and these led to a huge jump in the predicted value for S 154 . Also, from DayNum ranging from 125 to 133, the reported daily deaths were 1317, 1297, 1252, 2144, 2353, 2239, 1510, 1624, and 734, and the predictions were highly unstable and only started to stabilize after DayNum = 132. This is a clear warning on the danger of fitting higher-order models or extrapolating with a long forecasting horizon. As the adage goes, attributed to Niels Bohr [28], with similar versions attributed to Mark Twain, Yogi Berra, and others: It is difficult to make predictions, especially about the future.

Forecasting Fifteen Days Forward
With dire thoughts of the inherent dangers of forecasting over a long horizon, and fully cognizant of the many eventualities (e.g., re-opening of economy; nationwide protests and riots due to police brutality; changing hotspots; adjustments on counts; etc.), which we are not taking into account in order to be purely data-driven, but which could drastically alter trajectories of daily and cumulative deaths, on the basis of the observed data up to July 2, 2020, in which the cumulative deaths 128062, we seek to forecast the cumulative deaths fifteen days forward, which will be July 16, 2020. We should mention that on June 26th, there was an adjustment of 1854 which occurred "following a state review of death certificates and prior outbreaks" in the State of New Jersey [6]. Together with the 3778 adjustment made by the State of New York on April 16th [1], this is the second documented non-trivial adjustment made on the daily deaths counts. We provide two point predictions and prediction regions for the target date: -the first one based on considering the observed Deaths values on April 16th and June 26th as legitimate values, and the second one based on re-allocating the adjustment values on those days proportional to the observed Deaths on the days on or before the day of adjustment. Without additional information, such a proportional re-allocation of the adjustment values appears to be most sensible, though this approach is not immune to criticism. Figure 16 presents the point prediction and the prediction interval for July 16, 2020 based on these two analyses on with a 5th-order model. When the adjustment values are not re-allocated, the predictions and prediction intervals are depicted in the two top plots, whereas when they are re-allocated, they are in the two bottom plots. With no re-allocation, the point prediction is 143272 together with an associated prediction interval of [128062,176957] for the cumulative deaths. With re-allocation, the point prediction is 146055 with an associated prediction interval of [128121,185369]. Observe that without re-allocation, the observed Deaths of 2437 on June 26th fell outside of the prediction interval on that date, while the observed Deaths of 4928 on April 16th barely fell inside the prediction interval on that date. Notice the wider prediction intervals for Deaths when no re-allocations were performed compared to those with re-allocations for the observed DayNum values. Observe also the widening prediction intervals as we go farther away from July 2nd, indicating high uncertainty on what may happen moving forward. Interestingly, the prediction interval for the cumulative deaths on July16th when re-allocations were performed is wider than that without re-allocations, and the point prediction is also tad higher. Ominously, notice that the prediction curve for Deaths appears to be acquiring an increasing trend past July 2nd, in contrast to the decreasing trend from DayNum 120 (April 28th) to 183 (June 30th). It remains to be seen if this is the effect of the lessening of social distancing guidelines, re-opening of business establishments and beaches, people gathering because of the current social unrest, or changing hotspots in the country. Of course, in forecasting settings with new data points accruing frequently -daily in this COVID-19 pandemic -forecasts should be updated as each new data point accrues. We intend to provide a publicly-accessible software applet to enable interested users to update forecasts with the latest updated data.

Concluding Remarks
Motivated by the COVID-19 pandemic, we examined the problem of constructing prediction regions for a Poisson distributed random variable, both under the no-covariate (that is, intercept only) Figure 16: Based on the available data until July 2, 2020, point predictions and prediction intervals of the deaths and cumulative deaths by July 16, 2020 (DayNum = 199) based on an analyses where adjustment values were not re-allocated (top plots) and re-allocated (bottom plots). and with-covariate settings. We compared the performances of the different prediction regions through simulation studies. In the regression setting, we also introduced an over-dispersed Poisson regression model upon observing over-dispersion in the COVID-19 reported Deaths data relative to a purely Poisson regression model. With the ultimate goal of predicting cumulative deaths due to COVID-19 at a future date, we first studied how to construct prediction intervals for the daily deaths data, and then utilized these prediction intervals to construct the prediction interval for the cumulative deaths. The final fitted models involved a 5th-order model in the variable DayNum, and also included the factor variable Day. Based on data until July 2, 2020, prediction and prediction interval for the July 16, 2020 cumulative deaths were obtained. The methodologies developed have the potential to be used in the monitoring of daily and cumulative deaths during epidemics or pandemics through the construction of prediction regions, which could then be used by decision-makers regarding implementation of social distancing/easing guidelines and deciding on the closure/opening of business, educational, government, and other establishments. However, further studies are needed to compare our methodologies to other methods that have been proposed during this pandemic.
The prediction and prediction region procedures we developed also possess limitations. First, in contrast to the Susceptible-Exposed-Infected-Recovered (SEIR) compartment model cf., [4], based on a continuous-time Markov Chain, our model does not posit an upper bound to the number of people that could die, which clearly is not the case. Second, especially since it involves higherorder terms in DayNum, they are highly sensitive to outliers, such as when huge adjustments are made as in the cases for the States of New York and New Jersey [14,6]. It would be desirable to develop procedures that are robust to such non-trivial adjustments, or to procedures that impose constraints on the rate of increase or decrease of the prediction curve via regularization to curb the influential impact of such adjustments, though this will entail developing new theory for the construction of prediction regions. Due to these first two limitations, the proposed methods are not suitable for use in long-horizon forecasting, hence our decision to simply forecast 15 days forward. Third, the procedures are not adaptive in its choice of the prediction model. Possible improvements may occur by choosing the prediction model in a data-dependent manner, but then model choice uncertainty needs to be accounted for in constructing prediction regions. Fourth, there could be an advantage in utilizing other bases functions to transform the variable DayNum, such as by using Laguerre polynomials, Legendre polynomials, trigonometric functions, or even splines or wavelets. These limitations of the proposed methods generate several potential research avenues for further studies. Table 4: Simulated coverage probabilities, mean of the lengths, and standard deviation of the lengths of the prediction intervalsΓ 0 andΓ j , j = 1, 2, 3, 4, 5, for different λ's and n's. For each combination of (n, λ), 10000 replications were performed. For λ = 1.  Table 5: Table 4 continued. For λ = 5.