Analysis of infectious disease transmission and prediction through SEIQR epidemic model

Abstract: In the literature, various mathematical models have been developed to gain better insight into the transmission dynamics of infectious diseases and to control their spread. To explore further aspects of infectious diseases, in this work we propose a conceptual SEIQR (Susceptible-Exposed-Infected-Quarantined-Recovered) mathematical model and its control measures. We establish the positivity and boundedness of the solutions. We also compute the basic reproduction number and investigate the stability of the equilibria for their epidemiological relevance. To validate the model and estimate the parameters needed to predict the disease spread, we consider COVID-19 as a special case and study the reported infected cases from [2] for Russia and India. For better insight, in addition to the mathematical model, a history-based LSTM model is trained to learn temporal patterns in the COVID-19 time series and predict future trends. Finally, the future predictions from the mathematical model and the LSTM-based model are compared to generate reliable results.


Introduction
Throughout history, pandemics and epidemics have emerged several times and devastated humanity, sometimes bringing civilizations to an end and changing the course of history. Mathematical modelling plays an important role in understanding the complexities of such infectious diseases and their control. The initial models of infectious diseases focused primarily on theoretical disease research. More recently, there have been efforts to broaden the field still further by incorporating many aspects of the complexity of natural systems. Given this rich history in the development of fundamental theory, the applied nature of the questions being asked, and the extensive data collected on many disease systems, the field of infectious disease modelling represents one of the richest areas of research at the interface of pure theory and data. Mathematical modelling can be used to study the mechanisms underlying observed epidemiological patterns, to assess the effectiveness of control strategies, and to predict epidemiological trends. The dynamics of infectious disease transmission can be investigated within various frameworks, ranging from relatively simple curve-fitting techniques, through standard compartmental models such as SIR and SEIR, to complex stochastic models using simulations. Two major components are needed to model outbreaks of infectious diseases: (i) the basic reproduction number (R_0, the average number of secondary infections produced by a primary infection in contact with an entirely susceptible population) and (ii) the generation time (the average time from symptom onset in a primary infectious individual to symptom onset in a secondary case). These two components jointly determine the likelihood and speed of epidemic outbreaks.
Furthermore, all modelling techniques depend largely on the availability of data for the infectious disease under investigation, in order to provide robust estimates of the parameters and their distributions. When a new infectious disease emerges, the rapid growth of the literature on mathematical modelling also plays a key role. Mathematical modelling also helps in assessing the potential long-term impacts of such diseases and provides a means to evaluate and predict the effects of possible interventions, even when only limited data are available. Some typical examples are the early models for HIV in the 1980s, pandemic influenza, Creutzfeldt-Jakob disease, the outbreak of severe acute respiratory syndrome (SARS) in 2003, Nipah virus, etc.
Very recently, the novel coronavirus, also known as COVID-19, 2019-nCoV or SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), has presented the world with an unparalleled challenge and is seriously threatening mankind [1, 2]. The first case of COVID-19 was identified in the city of Wuhan, Hubei Province, China, on December 31, 2019, and since then it has spread widely to 215 countries and territories around the world [1, 2], with infections increasing exponentially. COVID-19 is a zoonotic disease whose primary hosts were animals, from which it was transmitted to humans [3]. Infected people experience flu-like symptoms, which can lead to critical illness and respiratory problems, primarily in those who have other conditions such as diabetes, asthma, high blood pressure, heart disease, etc. [2].
On 30 January 2020, the epidemic of coronavirus disease 2019 (COVID-19) was declared a Public Health Emergency of International Concern, the highest level in the World Health Organization's (WHO) emergency response for infectious diseases. The number of cases was accelerating in China and subsequently all over the world; however, publications on the potential transmission of COVID-19 and the effectiveness of government interventions were limited. It is urgent to provide more scientific information for a better understanding of the novel coronavirus and further containment of the outbreak. To date, more than 9.9 million people have been infected with COVID-19 according to WHO, and 0.49 million people have died due to the infection. As a preventive measure, various countries have implemented strict lockdowns, and their administrations are encouraging and advising people to stay at home and maintain social distancing and proper hygiene. This pandemic has had an immense negative impact on world politics, education, the economy, social life, and various other global aspects [4,5].
Knowledge of the early spread dynamics of infection and figuring out possible control measures are critical for assessing the extent of the outbreak. Besides medical and biological research, it becomes urgent to formulate a mathematical model which effectively describes the development and transmission of the disease. This will help in making significant decisions and taking preventive measures based on the effective assumptions provided by the model. This paper is organized as follows. In Section 2, we propose a SEIQR mathematical model to predict the dynamics of infectious disease. In Section 3, the positivity and boundedness of the solutions are described. We also compute the basic reproduction number and investigate the stability of the equilibria. In Section 4, we perform the data analysis as a special case for COVID-19 in comparison with the model solution and also predict the disease dynamics. We train the proposed mathematical model using the data available in [2] to estimate the parameters, with Russia and India as case studies. In Section 5, we present the deep learning model used for learning past trends and predicting future trends. The experimental results of this model are presented in Section 6. Section 7 presents a comparative evaluation of the results of the mathematical model and the temporal-analysis-based machine learning model. Finally, we conclude the paper in Section 8.

Model formulation
In this work we investigate a SEIQR model based on the fact that infected/symptomatic individuals are put into quarantine for treatment (hospital or home remedies), from which they move into the recovered class. The SEIQR model, an extensively used epidemic model, can reflect the flows of people between five states: susceptible (S), exposed (E), infectious (I), quarantined (Q) and recovered (R). The total host population is partitioned into susceptible, exposed (in the latent period), infectious, quarantined and recovered individuals, with densities denoted by S(t), E(t), I(t), Q(t) and R(t), respectively. The total population size at time t is denoted by N(t), with N = S + E + I + Q + R.
The model is described by the following system of differential equations, referred to as system (1), with non-negative initial conditions. The parameters are as follows: µ is the rate of natural death; α is a non-negative constant denoting the rate of disease-caused death; γ denotes the transfer rate from the exposed to the infectious class; the constant k_1 is the rate at which infectious individuals are quarantined; the constant k_2 is the rate at which quarantined individuals recover; β_1 and β_2 are the rates of efficient contact in the latent and infected classes, respectively; and A is the constant recruitment of susceptibles. The initial conditions are S(0) > 0, E(0) ≥ 0, I(0) ≥ 0, Q(0) ≥ 0 and R(0) ≥ 0.

Analysis of the system

Non-negativity and boundedness of solution
Proof. From (1), we have dS/dt ≥ −µS. This implies S(t) ≥ S(0) e^{−µt} > 0, and similar bounds hold for the remaining compartments. Integrating both sides of the resulting differential inequality for N and using the theory of differential inequalities, we obtain an upper bound on N(t); letting t → ∞, we have 0 < N ≤ A/µ. Thus all solutions of (1) that initiate in R^5_+ \ {0} are confined in the region Ω = {(S, E, Q, I, R) ∈ R^5_+ : N ≤ A/µ + ε} for any ε > 0 and for t → ∞. This proves the theorem.
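The boundedness claim can be checked numerically. The sketch below is illustrative only: the display equations of system (1) did not survive in the text, so the right-hand side used here is an assumed mass-action SEIQR form chosen to match the parameter descriptions (µ, α, γ, k_1, k_2, β_1, β_2, A), and all numerical values are made up. Under this assumption dN/dt = A − µN − α(I + Q) ≤ A − µN, which is what yields N ≤ A/µ.

```python
import numpy as np

def seiqr_rhs(y, A, mu, alpha, gamma, k1, k2, beta1, beta2):
    """ASSUMED SEIQR right-hand side (not recovered from the source)."""
    S, E, I, Q, R = y
    infection = beta1 * S * E + beta2 * S * I  # assumed mass-action incidence
    return np.array([
        A - mu * S - infection,               # susceptible
        infection - (gamma + mu) * E,         # exposed
        gamma * E - (k1 + alpha + mu) * I,    # infectious
        k1 * I - (k2 + alpha + mu) * Q,       # quarantined
        k2 * Q - mu * R,                      # recovered
    ])

def rk4(rhs, y0, t_end, dt, **p):
    """Classical fourth-order Runge-Kutta integration with fixed step dt."""
    y = np.array(y0, dtype=float)
    out = [y.copy()]
    for _ in range(int(round(t_end / dt))):
        a = rhs(y, **p)
        b = rhs(y + 0.5 * dt * a, **p)
        c = rhs(y + 0.5 * dt * b, **p)
        d = rhs(y + dt * c, **p)
        y = y + dt * (a + 2 * b + 2 * c + d) / 6.0
        out.append(y.copy())
    return np.array(out)

# Illustrative parameter values only; they are not estimates from the paper.
params = dict(A=10.0, mu=0.1, alpha=0.05, gamma=0.2,
              k1=0.3, k2=0.2, beta1=0.002, beta2=0.004)
traj = rk4(seiqr_rhs, [50.0, 5.0, 3.0, 1.0, 1.0], t_end=200.0, dt=0.05, **params)

total = traj.sum(axis=1)  # N(t) = S + E + I + Q + R
print("min compartment value:", traj.min())
print("max N(t):", total.max(), " bound A/mu:", params["A"] / params["mu"])
```

With these values every compartment stays non-negative and N(t) stays below A/µ along the whole trajectory, which is the behaviour the theorem describes.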

Computation of the reproduction number
Following an approach similar to that in [6], it can be verified that system (1) always has the disease-free equilibrium (DFE) P_0 = (S_0, 0, 0, 0, 0) = (A/µ, 0, 0, 0, 0) in the absence of infection. Furthermore, let R_0 = ρ(FV^{−1}) denote the spectral radius of the matrix FV^{−1}; then R_0 is given by expression (2). The parameter R_0 is referred to as the basic reproduction number. It indicates the transmission potential of the infectious disease: for R_0 > 1 the outbreak is expected to continue, and for R_0 < 1 it is expected to end. It can be observed from (2) that R_0 depends significantly on the average rate of contact between susceptible and infected individuals, and on the transfer rate between susceptible and latent individuals. Thus, lockdown and quarantining of individuals play a significant role in controlling the spread of infectious disease.
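The next-generation computation R_0 = ρ(FV^{−1}) can be sketched numerically. Everything specific here is an assumption: expression (2) and system (1) are not recoverable from the text, so the matrices F and V below are derived from an assumed mass-action SEIQR form in which new infections enter E at rate β_1 S E + β_2 S I, E leaves at rate γ + µ, and I leaves at rate k_1 + α + µ. Only the transmitting compartments (E, I) are used.

```python
import numpy as np

def basic_reproduction_number(A, mu, alpha, gamma, k1, beta1, beta2):
    """Spectral radius of F V^{-1} for an ASSUMED SEIQR structure."""
    S0 = A / mu  # susceptible population at the disease-free equilibrium
    # New-infection terms, linearized at the DFE (rows and columns: E, I).
    F = np.array([[beta1 * S0, beta2 * S0],
                  [0.0,        0.0]])
    # Transition terms out of and between the infected compartments.
    V = np.array([[gamma + mu, 0.0],
                  [-gamma,     k1 + alpha + mu]])
    K = F @ np.linalg.inv(V)  # next-generation matrix
    return float(np.max(np.abs(np.linalg.eigvals(K))))

# Illustrative values only; they are not the paper's estimates.
p = dict(A=10.0, mu=0.1, alpha=0.05, gamma=0.2, k1=0.3, beta1=0.002, beta2=0.004)
R0 = basic_reproduction_number(**p)

# Closed form implied by the same assumed structure, used as a cross-check.
S0 = p["A"] / p["mu"]
R0_closed = S0 / (p["gamma"] + p["mu"]) * (
    p["beta1"] + p["beta2"] * p["gamma"] / (p["k1"] + p["alpha"] + p["mu"]))
print(R0, R0_closed)
```

Under this assumed form, raising the quarantine rate k_1 lowers R_0, which is consistent with the remark that quarantining helps control the spread.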

Stability analysis
Stability analysis of the DFE P_0: Theorem 3.2. If β_1 S_0 < γ + µ, then the disease-free equilibrium point is locally asymptotically stable.
Proof. In order to check the stability of the equilibrium points, we calculate the eigenvalues of the Jacobian matrix. The characteristic roots of the Jacobian matrix J_DFE, evaluated at the DFE, are −µ, −µ, −(k_2 + α + µ) and [β_1 S_0 − γ − (k_1 + α + µ) ± √((k_1 + α + µ + β_1 S_0 − γ − µ)^2 + 4γβ_2 S_0)] / 2. Clearly, all five roots are negative. Thus the disease-free equilibrium point P_0 is locally asymptotically stable. Further, to prove the global asymptotic stability of the DFE, we use the theorem of Castillo-Chavez et al. [7].
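A numerical check of local stability at the DFE can be sketched as follows. The Jacobian below is hypothetical: it is written for an assumed mass-action SEIQR system (the matrix itself is missing from the text), with state order (S, E, I, Q, R) and made-up parameter values chosen so that β_1 S_0 < γ + µ.

```python
import numpy as np

def jacobian_at_dfe(A, mu, alpha, gamma, k1, k2, beta1, beta2):
    """Jacobian of an ASSUMED SEIQR system at (A/mu, 0, 0, 0, 0)."""
    S0 = A / mu
    return np.array([
        [-mu, -beta1 * S0,               -beta2 * S0,         0.0,                 0.0],
        [0.0,  beta1 * S0 - gamma - mu,   beta2 * S0,         0.0,                 0.0],
        [0.0,  gamma,                    -(k1 + alpha + mu),  0.0,                 0.0],
        [0.0,  0.0,                       k1,                -(k2 + alpha + mu),   0.0],
        [0.0,  0.0,                       0.0,                k2,                 -mu],
    ])

# Illustrative values only, with beta1 * S0 = 0.1 < gamma + mu = 0.3.
p = dict(A=10.0, mu=0.1, alpha=0.05, gamma=0.2, k1=0.3, k2=0.2,
         beta1=0.001, beta2=0.001)
eig = np.linalg.eigvals(jacobian_at_dfe(**p))

print(np.sort(eig.real))
print("locally asymptotically stable:", bool(np.all(eig.real < 0)))
```

In this assumed structure the roots −µ (twice) and −(k_2 + α + µ) decouple, and the remaining pair comes from the 2×2 block coupling E and I, matching the shape of the root list given in the proof.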

In the setting of that theorem, the state is split into components X and Y, which denote the classes of uninfected and infected individuals, respectively.
The DFE is represented by P_0 = (X_0, 0) = (A/µ, 0). For global asymptotic stability of P_0, conditions (H1) and (H2) must be satisfied.
If the given system (1) satisfies (3), then P_0 = (X_0, 0) is a globally asymptotically stable equilibrium of the given mathematical model. Proof. Applying Theorem 3.3 to our model system (1), we obtain the required decomposition, from which we can observe that Ĝ(X, Y) ≥ 0 for all (X, Y). Furthermore, the matrix A is an M-matrix. Thus the DFE P_0 is globally asymptotically stable.
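For reference, the global stability result of Castillo-Chavez et al. [7] is usually stated in the following form. This is the standard textbook statement rather than the text lost from this section; the symbols F, G, Ω and X^* are the customary ones, with X^* playing the role of X_0 here.

```latex
\begin{aligned}
&\frac{dX}{dt} = F(X, Y), \qquad \frac{dY}{dt} = G(X, Y), \qquad G(X, 0) = 0, \\[4pt]
&\text{(H1)}\quad \text{for } \frac{dX}{dt} = F(X, 0),\ X^{*} \text{ is globally asymptotically stable}, \\[4pt]
&\text{(H2)}\quad G(X, Y) = A\,Y - \hat{G}(X, Y), \qquad \hat{G}(X, Y) \ge 0 \ \text{ for } (X, Y) \in \Omega,
\end{aligned}
```

Here A = D_Y G(X^*, 0) is an M-matrix (its off-diagonal entries are non-negative) and Ω is the region in which the model is biologically meaningful. If both conditions hold and R_0 < 1, the disease-free equilibrium (X^*, 0) is globally asymptotically stable.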

Existence of the endemic equilibrium and its stability:
In addition to the DFE P_0, model (1) admits a unique endemic equilibrium point P* = (S*, E*, Q*, I*, R*), whose components are given by explicit expressions in the model parameters. From these expressions, we can observe that P* is positive only when R_0 > 1. Thus the endemic equilibrium point P* exists whenever R_0 > 1. Proof. Evaluating the Jacobian matrix J_EE at the endemic equilibrium (EE) and solving det(J_EE − λI) = 0, we obtain the characteristic equation (4). The roots of the characteristic equation (4) corresponding to the Jacobian matrix evaluated at P* are −µ < 0 and −(k_2 + α + µ) < 0, and the remaining roots satisfy the cubic equation λ^3 + Aλ^2 + Bλ + C = 0. Applying the Routh-Hurwitz criterion, the endemic equilibrium P* is locally asymptotically stable provided A > 0, C > 0 and AB − C > 0. This completes the proof.
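The Routh-Hurwitz test for a monic cubic λ^3 + Aλ^2 + Bλ + C = 0 is easy to express in code. The sketch below implements exactly the three inequalities quoted in the proof and cross-checks them against numerically computed roots; the coefficient values are arbitrary examples, since the paper's expressions for A, B and C are not available in the text.

```python
import numpy as np

def routh_hurwitz_cubic(A, B, C):
    """True when all roots of x^3 + A x^2 + B x + C have negative real part."""
    return A > 0 and C > 0 and A * B - C > 0

def all_roots_stable(A, B, C):
    """Direct check using the numerically computed roots of the cubic."""
    return bool(np.all(np.roots([1.0, A, B, C]).real < 0))

# (x + 1)(x + 2)(x + 3) = x^3 + 6x^2 + 11x + 6: every root is negative.
stable = (6.0, 11.0, 6.0)
# x^3 + x^2 + x + 6 = (x + 2)(x^2 - x + 3): two roots have positive real part.
unstable = (1.0, 1.0, 6.0)

for coeffs in (stable, unstable):
    print(coeffs, routh_hurwitz_cubic(*coeffs), all_roots_stable(*coeffs))
```

The second example shows why the condition AB − C > 0 matters: all three coefficients are positive, yet the equilibrium would be unstable.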

Parameter estimation, model validation and prediction
In this section, we perform numerical simulations to support our analytical results and to observe the dynamics of the proposed nonlinear system. In order to do so, we first estimate the parameters by analyzing the presently available case data with a bounded minimization routine. Here, 'fun' is the objective function. In this study, the objective function is N = S + E + I + Q + R; b refers to the initial state of the parameters, which is chosen randomly. The terms lb and ub represent the lower and upper bounds of the parameters, which are also chosen randomly. In the options set, the maximum number of iterations allowed for attaining the global optimum is specified.
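A bounded fit of this kind can be sketched in Python. Several things here are assumptions: the source does not name its optimization routine, so SciPy's `least_squares` is swapped in, with `b`, `lb` and `ub` playing the roles described above; the ODE right-hand side is an assumed mass-action SEIQR form; the fitted observable is taken to be I + Q + R; and the "observed" series is synthetic, generated from the same model, because no data are reproduced in the text.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Illustrative fixed parameters and initial state (not the paper's values).
FIXED = dict(A=10.0, mu=0.01, alpha=0.01, gamma=0.2, k1=0.3, k2=0.1)
Y0 = [1000.0, 10.0, 5.0, 0.0, 0.0]
T = np.arange(0.0, 30.0)
SCALE = 1e-4  # fitted parameters are the contact rates in units of 1e-4

def rhs(t, y, th1, th2):
    """ASSUMED SEIQR right-hand side; th1, th2 are scaled beta1, beta2."""
    S, E, I, Q, R = y
    f = FIXED
    infection = SCALE * (th1 * S * E + th2 * S * I)
    return [f["A"] - f["mu"] * S - infection,
            infection - (f["gamma"] + f["mu"]) * E,
            f["gamma"] * E - (f["k1"] + f["alpha"] + f["mu"]) * I,
            f["k1"] * I - (f["k2"] + f["alpha"] + f["mu"]) * Q,
            f["k2"] * Q - f["mu"] * R]

def total_cases(theta):
    """Model output compared with data: I + Q + R at the sample times."""
    sol = solve_ivp(rhs, (T[0], T[-1]), Y0, t_eval=T, args=tuple(theta),
                    rtol=1e-8, atol=1e-8)
    return sol.y[2] + sol.y[3] + sol.y[4]

observed = total_cases([1.0, 3.0])  # synthetic stand-in for reported cases

def residuals(theta):
    """The objective ('fun'): model minus data at every sample time."""
    return total_cases(theta) - observed

b = np.array([0.5, 5.0])     # initial guess
lb = np.array([0.0, 0.0])    # lower bounds
ub = np.array([10.0, 10.0])  # upper bounds
initial_cost = 0.5 * np.sum(residuals(b) ** 2)
fit = least_squares(residuals, b, bounds=(lb, ub), max_nfev=200)
print("estimated scaled parameters:", fit.x, " final cost:", fit.cost)
```

Rescaling the contact rates to order one is a design choice that keeps the optimizer's finite-difference steps and trust region well conditioned.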

Case study: Russia
We first check the validity of the model and its predictions for COVID-19 in Russia. The model parameters have been estimated from the total infectious cases from May 1 to September 30, as provided by [2], using minimization techniques. We have considered the initial population sizes on 1 May as S(0) = 140000000; E(0) = 140000; I(0) = 106498; Q(0) = 100000; and R(0) = 13220. The parameter values are given in the following table. In Figure 1, we provide the time-series analysis of total cases (infected, quarantined and recovered). The figure suggests that the total number of cases may grow above 1300000 by the end of November 2020 in Russia, if all conditions remain the same, with a controllable number of infections.

Case Study: India
We further perform rigorous numerical simulations to analyze the scenario of the COVID-19 epidemic in India. We first estimate the model parameters from the total infectious cases from June 27 to September 30, as provided by [2], using minimization techniques. We have considered the initial population sizes as S(0) = 130000000; E(0) = 100000; I(0) = 508953; Q(0) = 100000; and R(0) = 295881. The estimated parameter values are given in the following table. In Figure 2, we present the best-fitted curve for the total number of COVID-19 cases, together with the real cases [2] and the model predictions. It can be observed from the model predictions that the disease may continue even beyond the end of November if all conditions remain the same, and the total infections may reach 11000000.

Predictions using Long short term memory (LSTM) model
Estimation of future COVID-19 cases can be viewed as a time series analysis problem in which past trends in the increase of COVID-19 cases are used to predict the future trend. The time series to be used for predicting future COVID-19 cases can be formulated as given in Eq. 5. In Eq. 5, let x̂_{t+1} denote the predicted number of COVID-19 cases at time instant t + 1. The number of cases is predicted as a function f(.) of the cases at previous time steps. In this equation, p represents the number of past time lags under consideration in the study, and the terms x_t represent the actual cases at any time instant t.
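To make the lag formulation concrete, the sketch below builds supervised training pairs from a series: each input row holds p consecutive past values and the target is the value that follows. The helper name `make_windows` and the toy series are hypothetical; the exact indexing used in Eq. 5 is not recoverable from the text, so this is one common convention.

```python
import numpy as np

def make_windows(series, p):
    """Return (X, y): X[i] holds p consecutive values, y[i] the next one."""
    x = np.asarray(series, dtype=float)
    if len(x) <= p:
        raise ValueError("series must be longer than the number of lags p")
    X = np.stack([x[i:i + p] for i in range(len(x) - p)])
    y = x[p:]
    return X, y

# Toy cumulative-case series (made-up numbers).
X, y = make_windows([1, 2, 3, 4, 5], p=2)
print(X)  # rows: [1, 2], [2, 3], [3, 4]
print(y)  # targets: 3, 4, 5
```

A model f is then fitted so that f(X[i]) approximates y[i], which is the one-step-ahead prediction described in the text.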
Multi-step-ahead prediction involves predicting two or more values in the future. In this study, these are generated using an iterative method [9] in which the predicted value is repeatedly passed back to the model to generate the next-step forecast. For example, x̂_{t+2} can be predicted from x̂_{t+1} together with the preceding observed values, as given in Eq. 6. Using the iterative process given in Eq. 6, the multi-step-ahead prediction x̂_{t+M} can be obtained. An M-step-ahead forecast requires M one-step-ahead forecasts. The term M is called the prediction horizon. The nature of the function f(.) in Eq. 5 can be linear or non-linear. There are different models in the literature for learning this function from patterns in data. The autoregressive integrated moving average (ARIMA) method uses a linear function for prediction [10]. However, it has been seen in different predictive analytics applications, such as future resource workload prediction [11], that non-linear models perform better than linear models. This is because most real-life patterns are non-linear in nature. Neural networks overcome this limitation of ARIMA models and can learn non-linear relationships between past time steps and predicted values [12]. A neural network uses non-linear activation functions in its layers, which enables it to learn non-linear relations. Different works in the literature have also used hybrids of ARIMA and neural networks for better modelling of patterns in a time series [13,14]. Recently, deep neural networks have been favoured over conventional neural networks as they have shown promising results. Among deep neural networks, the recurrent neural network and its variant, the long short-term memory (LSTM) model, are very popular.
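The iterative scheme can be sketched independently of the underlying model. In the example below, `one_step` stands for any fitted one-step predictor f; the linear extrapolation used to demonstrate it is a placeholder chosen so the output is easy to verify by hand, not the model used in the study.

```python
def iterative_forecast(one_step, history, p, M):
    """Produce M forecasts by feeding each prediction back into the window."""
    window = list(history[-p:])  # the p most recent observed values
    predictions = []
    for _ in range(M):
        nxt = one_step(window)        # one-step-ahead forecast
        predictions.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest value, append the forecast
    return predictions

def linear_extrapolation(window):
    """Placeholder predictor: continue the last observed increment."""
    return 2 * window[-1] - window[-2]

forecast = iterative_forecast(linear_extrapolation, [1.0, 2.0, 3.0], p=2, M=3)
print(forecast)  # [4.0, 5.0, 6.0]
```

Because each forecast is built on earlier forecasts, errors can accumulate as the prediction horizon M grows, which is worth keeping in mind for long horizons.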
LSTM has shown its efficacy in dealing with short-term and long-term dependency problems. It has been successfully applied in different domains for modelling sequential and temporal dependencies. For example, LSTM has been used for predicting future resource usage in cloud data centres [15,16], resource contention failure identification [17], traffic flow prediction [18] and many other use cases. These applications indicate that LSTM is well suited to modelling temporal and sequential dependencies, which motivates its use for predicting COVID-19 patterns. The architecture of the LSTM model is explained in the next subsection.

Architecture of LSTM model
In this section, we provide a basic introduction to the use of LSTM for time series prediction. LSTM is a variant of the recurrent neural network (RNN). RNNs are used for modelling sequential dependencies in data. A basic RNN consists of neurons arranged in the form of an array in the different layers of the network. Each neuron in a given layer is connected to every neuron in the next layer. In these networks, the output from the previous time step is passed as input to the next time step, which allows the network to retain memory and hence enables the model to learn sequential and temporal dependencies in data [17]. The prediction is computed as x̂_{t+1} = W_hy h_t + b_y. The terms W_hy and b_y represent the weights and bias, respectively, between the output layer and the last RNN layer, whose output is h_t. The network is trained using a mean squared error loss function. The LSTM model improves on the basic RNN because it is designed to mitigate the vanishing gradient problem. The time series prediction architecture of LSTM is similar to that of the RNN, except that a neuron of the basic RNN model is replaced by a cell in the LSTM. Figure 3 shows the architecture of a basic LSTM cell. Each cell has three gates, namely the input gate, the output gate and the forget gate. The forget gate identifies the information that is to be forgotten by the cell memory. The term c̃_t denotes the vector of new candidate values. The role of the input gate is to decide which values are to be updated, and the output gate controls the flow of information out of the cell. Together, these three gates control the flow of information in and out of the cell memory, enabling it to learn long- and short-term dependencies [17]. The rectangular boxes in Figure 3 represent layers of neurons with a desired activation function.
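The gate description can be illustrated with a single forward step of an LSTM cell written in NumPy. This is the standard textbook formulation, not code from the study: the weight layout (one matrix per gate acting on the concatenation of h_{t−1} and x_t), the random untrained weights and the toy inputs are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W[g] has shape (hidden, hidden + n_in), b[g] shape (hidden,)."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate: what to discard from memory
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate: which values to update
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # vector of new candidate values
    c_t = f_t * c_prev + i_t * c_tilde       # updated cell memory
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate: what leaves the cell
    h_t = o_t * np.tanh(c_t)                 # hidden state passed to the next step
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, n_in = 4, 1
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}

h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.1, 0.2, 0.3]:  # a short, already scaled input sequence
    h, c = lstm_cell_step(np.array([x]), h, c, W, b)

# Linear read-out of the last hidden state: x_hat = W_hy h_t + b_y.
W_hy, b_y = rng.normal(scale=0.1, size=(1, hidden)), np.zeros(1)
x_hat = W_hy @ h + b_y
print(h, x_hat)
```

The additive update of c_t is what lets gradients flow across many time steps more easily than in a basic RNN.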
COVID-19 cases can also be predicted by analysing their temporal dependencies on other measurable metrics. We explain this prediction model in the next subsection.

Univariate and multivariate modelling of COVID-19 cases
Univariate prediction models analyse temporal patterns in the past history of the number of COVID-19 cases itself to predict future trends. However, in real-life scenarios each decision involves analysing multiple variables together, because studying multiple related variables together can aid in understanding variations in the trend of the metric to be predicted [19]. Multivariate analysis involves analysing multiple variables together to study the relationships between them and their combined effect on the prediction of the desired metric. It has been shown in the context of predicting resource usage and performance in cloud data centres [15,20] and predicting mortality rates [21] that better forecasting capability is achieved by analysing metrics that are related to the desired metric. This indicates that multivariate analysis provides better statistical power to the model and aids in achieving more realistic predictions for the desired metric. Therefore, we analyse both univariate and multivariate prediction models. Here the desired metric is the number of COVID-19 cases. To build the multivariate prediction model, temporal dependencies in deaths and recovered cases are analysed in addition to the number of COVID-19 cases.
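The difference between the two setups shows up mainly in how the inputs are arranged. The sketch below stacks cases, deaths and recovered counts into windows of shape (samples, p, 3), with the next-step case count as the target. The helper name, the feature order and the toy numbers are hypothetical; the source does not specify its input layout.

```python
import numpy as np

def make_multivariate_windows(cases, deaths, recovered, p):
    """Return X of shape (T - p, p, 3) and y holding the next-step case count."""
    feats = np.column_stack([cases, deaths, recovered]).astype(float)  # shape (T, 3)
    X = np.stack([feats[i:i + p] for i in range(len(feats) - p)])
    y = feats[p:, 0]  # target: cases at the step after each window
    return X, y

# Made-up toy series of equal length.
X, y = make_multivariate_windows(cases=[1, 2, 3, 4],
                                 deaths=[0, 0, 1, 1],
                                 recovered=[0, 1, 1, 2], p=2)
print(X.shape)  # (2, 2, 3)
print(y)        # targets: 3, 4
```

The univariate case corresponds to keeping only the first feature column, so the same model code can serve both variants.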
In the next section, we analyse the results of LSTM based model for estimating COVID-19 cases for India and Russia.

Experimental analysis using LSTM model
In this section, we present the results of the experimental studies of COVID-19 cases. We first analyse the nature of the COVID-19 case series in the next subsection, followed by its modelling and prediction using LSTM models in the subsequent subsections. The data on COVID-19 cases, deaths and recovered cases in India and Russia are obtained from [2].

Pre-processing
We analysed the time-varying trends of COVID-19 cases in India and Russia. The time series of COVID-19 cases is non-stationary in nature: the series is monotonically increasing, and hence its statistical properties are not constant over time. We also verified this experimentally using the augmented Dickey-Fuller test [22]. Since the series is non-stationary, we use the Box-Cox transformation to pre-process it. Many machine learning and deep learning models are based on the assumption of normality in a time series. Therefore, the Box-Cox transformation is used here so that the transformed series is closer to a normal distribution, in which the mean, mode and median lie at the centre. This pre-processing helps the LSTM models learn the trends in the non-stationary time series better.
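The Box-Cox transformation and its inverse are simple to write down. The sketch below uses the standard one-parameter definition; the value of λ and the sample counts are illustrative assumptions, since the source does not report which λ was used. The inverse is needed to map model outputs back to case counts.

```python
import numpy as np

def boxcox(x, lam):
    """One-parameter Box-Cox transform; requires strictly positive data."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inv_boxcox(y, lam):
    """Inverse transform, mapping predictions back to the original scale."""
    y = np.asarray(y, dtype=float)
    return np.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

cases = np.array([100.0, 150.0, 230.0, 340.0, 500.0])  # made-up cumulative counts
lam = 0.5                                              # illustrative choice of lambda
transformed = boxcox(cases, lam)
restored = inv_boxcox(transformed, lam)
print(transformed)
print(restored)
```

In practice λ is usually chosen by maximizing a likelihood criterion over the training data, and the same λ must then be reused when inverting the forecasts.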

Model specifications and results
We have used univariate as well as multivariate data for model training and prediction. To train the LSTM-based neural network, we have used the tanh(·) activation function. Empirically, for India a two-layer LSTM with 4 and 8 cells in the individual layers performs best. For Russia, a two-layer LSTM network with 32 cells in each layer performs best. The model is trained on data from May 1 to September 30 for Russia and from June 27 to September 30 for India, and predictions are then generated from the trained model. We have predicted the total COVID-19 cases 60 days ahead for both India and Russia. The prediction results for India are given in Fig. 4(a)-4(b) and those for Russia in Fig. 5(a)-5(b).

Comparative discussion of results
From the predictions and results obtained in Sections 4 and 6, we observe that the infection will increase rapidly with time, resulting in a vast infected population, and that the situation may worsen if the necessary steps are not taken or a vaccine is not developed. It is therefore high time to take the necessary precautions and preventive measures. People themselves need to be aware and follow strict precautionary measures. Public mobility should be kept to a minimum, along with strict hygiene and social distancing norms. Accounting for government restrictions in the mathematical model, the total cases may peak at up to 1300000 for Russia and 11000000 for India by the end of November. We also observe that the LSTM gives more precise predictions. Since the government changes its restriction measures and policies from time to time, the system parameters also change over time. Therefore, at present we are not concerned with the size of the infected population over a long period; rather, these estimates indicate that strict policies need to be followed to reduce the infection.

Conclusion
In this work, we have formulated and proposed a SEIQR epidemic model to study the dynamics of the spread of infectious diseases. The proposed model is an attempt to provide deeper insight into the spread of disease and to design possible control strategies. We have derived various analytical results, such as the positivity and boundedness of the solutions, the local stability of the equilibria, and the basic reproduction number. There exist two equilibria, namely the disease-free equilibrium (DFE) and the endemic equilibrium (EE). We have estimated the parameters using real-world data, with COVID-19 as a special case. Our main aim in this study is to predict mathematically the number of individuals infected with the COVID-19 virus in Russia and India. To fulfil this aim, we have performed a detailed numerical simulation of the proposed model through minimization techniques and long short-term memory networks. We have tried to establish an acceptable agreement between the data analysis and the numerical solutions. Analysing the results of the proposed model, we can say that India may be in serious trouble due to the COVID-19 virus in the near future. The Indian Government should take stricter measures and policies beyond lockdown, quarantine, etc. Since government policies are changing continuously to protect India, the parameters are also changing with time. Thus, we can assume that if the Indian Government and all state governments take proper steps from time to time, and people follow strict social distancing measures along with proper hygiene, yoga, etc., then the number of infected people may differ from our predictions over time and India will recover from this pandemic in the near future. Finally, we conclude that mathematical modelling is an efficient method for estimating the situation and dynamics of the global COVID-19 pandemic, provided the parameters are estimated properly. Due to limited data and the short time span, we have made certain assumptions while framing the model. However, with the release of more epidemic data, the key parameters may undergo significant changes on account of health, education, religion and per capita income, influencing the spread of the pandemic among the masses.