A statistical study of the COVID-19 pandemic in Egypt

The spread of COVID-19 began in Wuhan, China, where the first outbreak was reported on December 31, 2019. According to the latest data, more than 165 million cases of COVID-19 infection have been detected worldwide (last update May 19, 2021). In this paper, we present a statistical study of the COVID-19 pandemic in Egypt. This study helps us to understand the evolution of the pandemic. Documenting accurate data and the policies adopted in Egypt can help other countries deal with this epidemic. It will also be useful if similar viruses emerge in the future. To predict the number of COVID-19 cases in the coming period, we apply a widely used model, the autoregressive integrated moving average (ARIMA) model. This model describes the present behaviour of a variable through a linear relationship with its past values. The expected results will enable us to provide appropriate advice to decision-makers in Egypt on how to deal with this epidemic.


Introduction
The spread of COVID-19 began in Wuhan, China, where the first outbreak was reported on December 31, 2019. According to the latest data, more than 165 million cases of COVID-19 infection have been detected worldwide (last update May 19, 2021). As the epidemic grows, only patients with a positive diagnosis are counted as cases. The COVID-19 pandemic is new to everyone. It has caused huge losses to the economy and to human lives, and it has brought about many social and political changes around the world.
It is known that viruses do not replicate outside living organisms. However, various environmental changes may play a role in their spread or decline, and this role is still unclear. COVID-19 is transmitted mainly from person to person. This happens through direct contact with an infected patient, through droplets the patient spreads by sneezing or coughing, or through touching contaminated surfaces [1][2][3].
To date, little is known about the role of viruses in the environment. They are nevertheless believed to influence global biogeochemical cycles and to drive microbial evolution through natural selection and the exchange of genetic information between hosts [4]. On the other hand, the role of environmental conditions in outbreaks of viral diseases is not yet well understood. However, some environmental parameters such as sunlight, temperature, humidity, and pollution are believed to have a direct or indirect effect on outbreaks of some indoor or airborne viruses [5].
The measures taken worldwide are unprecedented in more than a century. This is because the emerging virus is highly infectious and dangerous, especially for the elderly. It is no secret that many of the world's decision-makers are themselves older and therefore cannot feel safe from this virus.
Scientists in many disciplines have made every effort to limit the spread of this pandemic. First and foremost, we need medical, pharmaceutical, and biological studies. We also need statistical models that can predict the number of infected cases in the coming period, and researchers who can tell us about epidemics throughout history and how our ancestors dealt with them. Sociologists, psychologists, geologists, and meteorologists also have important contributions to make.
Finding anti-epidemic drugs for COVID-19 requires many in-depth studies on immunological modulators and supportive therapies. These are a promising path toward new strategies, first to mitigate the spread of the virus and then to end its devastating consequences for health throughout the world. Many clinical trials are under way to produce results validated by scientific authorities. Many mathematical and statistical studies in this field also provide a clear approach to extracting and summarizing accurate information from text, in order to support current clinical trials [6][7][8][9][10].
In this paper, we provide a statistical study of the COVID-19 pandemic in Egypt, aiming to understand the evolution of the virus. The main contribution of this paper is a statistical model, based on the autoregressive integrated moving average (ARIMA) model, that enables us to forecast the number of infected cases in Egypt in the coming period. The forecasts obtained from the proposed model will enable us to provide appropriate advice to decision-makers in Egypt on how to deal with this epidemic in the coming period. In addition, documenting accurate data can help other countries deal with the epidemic. It will also be useful if similar viruses emerge in the future.

Model description
To predict the number of COVID-19 cases in the coming period, we apply a widely used model, the ARIMA model [11][12][13][14], also called the Box-Jenkins model. This model describes the present behaviour of a variable through a linear relationship with its past values. The ARIMA model consists of two parts [15]. The first part is the integrated component (I). It represents the order of differencing (d) that must be applied to the series to make it stationary. The second part is an ARMA model, which in turn consists of two parts: the autoregressive (AR) model and the moving average (MA) model.
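As a minimal sketch of the integrated (I) component, the snippet below applies the difference operator d times and then inverts a first difference by cumulative summation. The helper names `difference` and `undifference_once` and the short input series are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply the d-th order difference to a series; each pass shortens it by one."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undifference_once(diffs: Sequence[float], first_value: float) -> List[float]:
    """Invert a first difference by cumulative summation from a known starting value."""
    out = [first_value]
    for delta in diffs:
        out.append(out[-1] + delta)
    return out


# Hypothetical series with a growing trend (not data from this study).
series = [10.0, 13.0, 19.0, 28.0, 40.0]
first_diff = difference(series, d=1)    # [3.0, 6.0, 9.0, 12.0]: still trending
second_diff = difference(series, d=2)   # [3.0, 3.0, 3.0]: constant level
restored = undifference_once(first_diff, series[0])
print(first_diff, second_diff, restored)
```

An ARIMA model with d = 1 produces its forecasts on the differenced scale. The last observed value is therefore needed to map them back to the original scale, which is the role of the inverse step.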
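The AR model describes the correlation within the time series between the current value and some of its past values. The output $z(\tau)$ of an AR model of order $p$ at time $\tau$ is given by formulae (1) and (2):
$$z(\tau) = a_1 z(\tau-1) + a_2 z(\tau-2) + \cdots + a_p z(\tau-p) + \varepsilon(\tau), \tag{1}$$
$$z(\tau) = \sum_{i=1}^{p} a_i\, z(\tau-i) + \varepsilon(\tau), \tag{2}$$
where $a_1, \dots, a_p$ are the autoregressive coefficients and $\varepsilon(\tau)$ is a white-noise error term.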
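The MA model represents how long the influence of random shocks lasts. The output $z(\tau)$ of an MA model of order $q$ at time $\tau$, with weights $B_1, \dots, B_q$, is given by formulae (3) and (4):
$$z(\tau) = \varepsilon(\tau) + B_1\varepsilon(\tau-1) + \cdots + B_q\varepsilon(\tau-q), \tag{3}$$
$$z(\tau) = \varepsilon(\tau) + \sum_{j=1}^{q} B_j\, \varepsilon(\tau-j). \tag{4}$$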
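Combining the AR and MA models gives the output of the ARMA($p$, $q$) model, formula (5):
$$z(\tau) = \sum_{i=1}^{p} a_i\, z(\tau-i) + \varepsilon(\tau) + \sum_{j=1}^{q} B_j\, \varepsilon(\tau-j). \tag{5}$$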
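The AR and MA behaviour described above can be made concrete with a short simulation. The sketch below generates $z(\tau) = c + \sum_i a_i z(\tau-i) + \varepsilon(\tau) + \sum_j B_j \varepsilon(\tau-j)$ from a given sequence of shocks. It sets pre-sample values to zero, which is an assumption of this sketch. The function name `arma_filter` and all coefficients are hypothetical. Feeding in a single unit shock shows the contrast between the two parts. The AR part lets the shock decay geometrically through past values of the series, while an MA(q) part carries it for exactly q further steps.

```python
import random
from typing import List, Sequence


def arma_filter(noise: Sequence[float], a: Sequence[float], b: Sequence[float], c: float = 0.0) -> List[float]:
    """Return z(t) = c + sum_i a[i-1]*z(t-i) + eps(t) + sum_j b[j-1]*eps(t-j).

    Pre-sample values of z and eps are taken as zero (an assumption of this sketch).
    """
    z: List[float] = []
    for t, eps in enumerate(noise):
        value = c + eps
        for i, a_i in enumerate(a, start=1):      # AR part: past values of z
            if t - i >= 0:
                value += a_i * z[t - i]
        for j, b_j in enumerate(b, start=1):      # MA part: past shocks
            if t - j >= 0:
                value += b_j * noise[t - j]
        z.append(value)
    return z


# A single unit shock at t = 0 shows how long each part "remembers" it.
ar_response = arma_filter([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[])  # [1.0, 0.5, 0.25, 0.125]
ma_response = arma_filter([1.0, 0.0, 0.0], a=[], b=[0.4])       # [1.0, 0.4, 0.0]

# Simulated ARMA(1,1) path driven by Gaussian shocks (hypothetical coefficients).
rng = random.Random(42)
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]
path = arma_filter(shocks, a=[0.6], b=[0.3])
print(ar_response, ma_response, len(path))
```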
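The most general model is the ARIMA model, in which the series is differenced at least once before the ARMA model is applied. ARIMA models have been used successfully for prediction in many different fields [16]. The output of the ARIMA model is given by formula (6):
$$w(\tau) = \nabla^{d} z(\tau) = \sum_{i=1}^{p} a_i\, w(\tau-i) + \varepsilon(\tau) + \sum_{j=1}^{q} B_j\, \varepsilon(\tau-j), \tag{6}$$
where $\nabla = 1 - L$ is the difference operator, $L$ is the lag operator ($L\,z(\tau) = z(\tau-1)$), and $d$ is the order of differencing.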
An ARIMA model is therefore specified by three orders, written ARIMA(p, d, q): p for the AR part, d for the order of differencing, and q for the MA part.

Model performance measures
Some common measures of the performance of an ARIMA model are given below.

Bayesian information criterion (BIC)
The Schwarz criterion, or BIC, is a criterion for selecting a model from a finite set of models [17]. It is based on the likelihood function and is closely related to the Akaike information criterion (AIC) [18].
BIC is used to choose the best model for estimation; the best model has the lowest BIC value. In this paper, the orders of the ARIMA model were chosen based on the value of BIC, which is given by (7) [19]:
$$\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L}), \tag{7}$$
where $x$ is the observed data, $n$ is the number of observations, and $k$ is the number of free parameters to be estimated. $\Pr\{x \mid \theta, M\}$ is the likelihood of the parameters $\theta$ of model $M$. $\hat{L} = \Pr\{x \mid \hat{\theta}, M\}$ is the maximized value of the likelihood function, attained at the parameter estimates $\hat{\theta}$.
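A minimal sketch of formula (7) and of model selection by lowest BIC follows. The helper `gaussian_log_likelihood` shows one common way to obtain $\ln\hat{L}$ when the errors are assumed Gaussian, with $\sigma^2$ estimated as the residual sum of squares divided by n. Whether $\sigma^2$ and a constant are counted in k is a convention that differs between packages. The candidate log-likelihoods and parameter counts are invented for illustration and are not results from this study.

```python
import math
from typing import Dict, Sequence, Tuple


def gaussian_log_likelihood(residuals: Sequence[float]) -> float:
    """Maximized Gaussian log-likelihood of the residuals, with sigma^2 estimated as RSS / n."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Schwarz criterion, formula (7): k ln(n) - 2 ln(L_hat)."""
    return k * math.log(n) - 2.0 * log_likelihood


# Hypothetical (log-likelihood, k) pairs for candidates fitted to n = 100 points.
candidates: Dict[str, Tuple[float, int]] = {
    "ARIMA(0,1,1)": (-250.0, 2),
    "ARIMA(1,1,1)": (-244.0, 3),
    "ARIMA(2,1,2)": (-242.5, 5),   # highest likelihood, but pays the largest k ln(n) penalty
}
n_obs = 100
scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # the lowest BIC wins
print(best, round(scores[best], 2))
```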

Autocorrelation function (ACF)
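The ACF describes the correlation between $z(\tau)$ and $z(\tau-k)$ at lag $k$. It is defined as $\rho_k = \operatorname{Cov}\bigl(z(\tau), z(\tau-k)\bigr) / \operatorname{Var}\bigl(z(\tau)\bigr)$ and is estimated from $n$ observations by
$$r_k = \frac{\sum_{\tau=k+1}^{n}\bigl(z(\tau)-\bar{z}\bigr)\bigl(z(\tau-k)-\bar{z}\bigr)}{\sum_{\tau=1}^{n}\bigl(z(\tau)-\bar{z}\bigr)^2},$$
where $\bar{z}$ is the sample mean of the series.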
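The sample ACF can be computed directly from its definition, as in the sketch below; the function name `sample_acf` and the toy series are hypothetical. The band $\pm 1.96/\sqrt{n}$ is the usual large-sample approximation to 95% limits for a white-noise series. It is the kind of limit that residual ACF plots, such as the one in Figure 6, typically draw.

```python
import math
from typing import List, Sequence


def sample_acf(z: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0..r_max_lag, all normalized by the lag-0 sum of squares."""
    n = len(z)
    mean = sum(z) / n
    dev = [x - mean for x in z]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


z = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical short series
acf = sample_acf(z, max_lag=2)        # [1.0, 0.4, -0.1]
band = 1.96 / math.sqrt(len(z))       # approximate 95% white-noise limits
print(acf, round(band, 3))
```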

Partial autocorrelation function (PACF)
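The PACF describes the correlation between $z(\tau)$ and $z(\tau-k)$ after removing the part explained by the intervening lags $z(\tau-1), \dots, z(\tau-k+1)$. It is defined as follows:
$$\phi_{kk} = \operatorname{Corr}\bigl(z(\tau), z(\tau-k) \mid z(\tau-1), \dots, z(\tau-k+1)\bigr).$$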
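In practice, the partial autocorrelations are not usually computed from the conditional definition directly. One standard route is the Durbin-Levinson recursion, which obtains $\phi_{kk}$ from the autocorrelations $r_0, \dots, r_m$. A minimal sketch follows; the function name is hypothetical. For an AR(1) process, whose theoretical ACF decays as $a_1^k$, the PACF cuts off after lag 1. This cut-off is the pattern used to read off the AR order p.

```python
from typing import List, Sequence


def pacf_from_acf(r: Sequence[float]) -> List[float]:
    """Durbin-Levinson recursion: partial autocorrelations phi_kk from autocorrelations r_0..r_m."""
    pacf = [1.0]                 # lag 0 by convention
    phi_prev: List[float] = []   # phi_{k-1,1..k-1} from the previous order
    for k in range(1, len(r)):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with a_1 = 0.5 is r_k = 0.5**k.
r_ar1 = [0.5 ** k for k in range(4)]
print(pacf_from_acf(r_ar1))  # cuts off after lag 1: [1.0, 0.5, 0.0, 0.0]
```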

Root mean square error (RMSE)
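$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{l=1}^{n}\left(y_l - \tilde{y}_l\right)^2},$$
where $y_l$ is the actual value, $\tilde{y}_l$ is the predicted value, and $n$ is the number of observations.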
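The accuracy measures listed in this section (RMSE, MAE, and the coefficient of determination) can be computed with the short sketch below. For $R^2$ it assumes the standard definition $1 - SS_{res}/SS_{tot}$. The function names and the four data points are hypothetical and are not values from Table 3.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return the number of observations n."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    n = _check(actual, predicted)
    return math.sqrt(sum((y - yp) ** 2 for y, yp in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    n = _check(actual, predicted)
    return sum(abs(y - yp) for y, yp in zip(actual, predicted)) / n


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    n = _check(actual, predicted)
    mean = sum(actual) / n
    ss_res = sum((y - yp) ** 2 for y, yp in zip(actual, predicted))   # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in actual)                     # total variation
    return 1.0 - ss_res / ss_tot


actual = [100.0, 120.0, 130.0, 150.0]      # hypothetical observed cases
predicted = [110.0, 115.0, 135.0, 140.0]   # hypothetical model estimates
print(round(rmse(actual, predicted), 3), mae(actual, predicted), round(r_squared(actual, predicted), 3))
```

RMSE squares each error before averaging, so it penalizes a few large misses more heavily than MAE does. Reporting both shows whether the errors are evenly spread.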

Mean absolute error (MAE)
$$\mathrm{MAE} = \frac{1}{n}\sum_{l=1}^{n}\left|y_l - \tilde{y}_l\right|.$$

Coefficient of determination
$$R^2 = 1 - \frac{\sum_{l=1}^{n}\left(y_l - \tilde{y}_l\right)^2}{\sum_{l=1}^{n}\left(y_l - \bar{y}\right)^2},$$
where $\bar{y}$ is the mean of the actual values.
The original series of confirmed COVID-19 cases in Egypt is plotted in Figure 1. Its behaviour is non-stationary, reflecting the growth of the epidemic. Likewise, the ACF of the original series in Figure 2 indicates non-stationarity, since the autocorrelations do not die out quickly at high lags. We also performed the augmented Dickey-Fuller (ADF) test to check the stationarity of this series. The null hypothesis of the ADF test is that the time series is non-stationary, and the alternative hypothesis is that it is stationary.
The ADF test did not reject the null hypothesis of non-stationarity for the original series, since the p-value exceeded 5% (p = 0.8143). The series therefore needs to be transformed to achieve stationarity. After taking the first difference of the original series, the plot of the differenced series in Figure 3 shows stationary behaviour. Likewise, the ACF and PACF of the differenced series in Figures 4 and 5 show correlations of alternating sign that die out as the lag increases. This indicates that the differenced series is stationary and shows no sign of seasonality. The ADF test also shows that the differenced series is stationary, with a p-value below 5% (p < 0.0001).
Since the series becomes stationary at the first difference (d = 1), the appropriate ARIMA model can now be chosen using the BIC.
Analysing the data with SPSS 26, we found that ARIMA (1,1,1) is the most suitable model, as it has the lowest BIC value. We therefore propose the ARIMA (1,1,1) model for forecasting COVID-19 cases in the coming period.
Figure 6 displays the ACF and PACF of the residuals of the proposed model. All coefficients fall within the confidence limits, which means that the residuals are purely random fluctuations. The proposed model is therefore adequate for forecasting COVID-19 cases in the coming period. The model fit statistics of ARIMA (1,1,1) are given in Table 1. The model parameters of ARIMA (1,1,1) are given in Table 2. The estimated AR coefficient, MA coefficient, and constant are 0.887, 0.806, and 5.514, respectively. These three estimates specify the proposed model.
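With the orders fixed at (1,1,1), forecasts are obtained by applying the model recursively to the differenced series and adding each forecast difference back to the last observed level. The sketch below assumes the form $w(\tau) = c + a_1 w(\tau-1) + \varepsilon(\tau) + b_1 \varepsilon(\tau-1)$, where $w$ is the first difference of $z$ and the MA term has a plus sign. It also recovers in-sample shocks by setting the first shock to zero. Statistical packages differ on the MA sign and on whether the reported constant is an intercept or the mean of the differenced series. The Table 2 estimates should therefore not be substituted here without checking those conventions. The function name `arima111_forecast`, the series, and the coefficients are hypothetical.

```python
from typing import List, Sequence


def arima111_forecast(z: Sequence[float], a1: float, b1: float, c: float, horizon: int) -> List[float]:
    """h-step forecasts for w(t) = c + a1*w(t-1) + eps(t) + b1*eps(t-1), where w is the first difference of z."""
    if len(z) < 2:
        raise ValueError("need at least two observations to difference")
    w = [z[t] - z[t - 1] for t in range(1, len(z))]
    # Recover in-sample shocks recursively, conditioning on a zero first shock.
    eps = [0.0]
    for t in range(1, len(w)):
        eps.append(w[t] - c - a1 * w[t - 1] - b1 * eps[t - 1])
    level, w_prev, e_prev = z[-1], w[-1], eps[-1]
    forecasts: List[float] = []
    for _ in range(horizon):
        w_next = c + a1 * w_prev + b1 * e_prev   # forecast of the next difference
        level += w_next                          # integrate back to the original scale
        forecasts.append(level)
        w_prev, e_prev = w_next, 0.0             # future shocks have zero expectation
    return forecasts


# Hypothetical series and coefficients, not the fitted Egyptian model.
history = [100.0, 104.0, 107.0, 111.0]
print(arima111_forecast(history, a1=0.5, b1=0.2, c=1.0, horizon=3))  # approximately [114.3, 116.95, 119.275]
```

Because future shocks are replaced by their expected value of zero, the MA term affects only the first forecast step. After that, the forecast differences approach $c/(1-a_1)$ geometrically.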

Results and discussion
On January 26, 2020, Egypt banned all flights between Egypt and China. The first COVID-19 infection in Egypt was recorded in a Chinese national on February 14, 2020, who was sent to quarantine at a health centre. On March 7, 2020, the Egyptian government imposed various forms of closure on schools, universities, [...]
Figure 1. All data were obtained from the Ministry of Health and Population, Egypt [20].
Owing to a lack of public awareness, the number of infections increased until it reached its highest value in June 2020 (1,774 cases on June 19, 2020). The Egyptian government therefore took further measures, such as imposing fines for non-compliance with the precautionary measures. In addition, the high temperatures of July, August, and September 2020 are believed to have reduced the spread of the virus, and the number of infections decreased in this period.
In October 2020, the Egyptian government allowed students to return to schools and universities. Together with lower temperatures, which are believed to favour the spread of the virus, this led to a renewed increase in the number of infections, which peaked on January 1, 2021 (1,418 cases).
The Egyptian government therefore suspended in-person attendance at schools and universities and replaced it with distance education. As a result, the number of infections decreased to 509 cases on February 6, 2021.
In this paper, we aim to assist decision-makers in Egypt by proposing the ARIMA (1,1,1) model to predict the number of COVID-19 cases in the coming period. Table 3 compares the actual values with those estimated by the proposed model. The comparison confirms the good agreement of the model with the data and its ability to predict COVID-19 cases in the coming period. Table 4 and Figure 7 show the expected confirmed cases obtained from the proposed model from May 20, 2021, to June 31, 2021. As shown in Table 4, the forecasts indicate an increasing trend in COVID-19 cases during the coming period. The number of daily cases in Egypt is expected to increase from 1,160 cases on May 19, 2021, to 1,368 cases on June 31, 2021.

Conclusion
In this paper, we presented a statistical study of the COVID-19 pandemic in Egypt. An ARIMA (1,1,1) model has been proposed for forecasting COVID-19 cases in the coming period. The forecasts obtained from the proposed model show an increasing trend in COVID-19 cases during the coming period. The number of daily cases in Egypt is expected to increase from 1,160 cases on May 19, 2021, to 1,368 cases on June 31, 2021, with an average of 1,230 cases. Likewise, the total number of COVID-19 cases is expected to increase from 249,238 cases on May 19, 2021, to 302,137 cases on June 31, 2021, with an average of 275,570 cases. We therefore strongly advise decision-makers in Egypt to take the necessary measures to reduce the spread of the COVID-19 pandemic. These include obliging citizens to follow precautionary measures, imposing more restrictions on places with high-density gatherings, isolating infected cases, and other measures recommended by the World Health Organization.
Future work will extend this statistical study to several Middle Eastern countries and compare the results between them to identify the factors that most influence the spread of this epidemic. In addition, other models and techniques will be used to obtain more accurate predictions of COVID-19 infections.