Intelligent temporal analysis of coronavirus statistical data

Abstract: The coronavirus disease COVID-19 is affecting countries around the world, with strong differences between countries and regions. Extensive datasets are available for visual inspection and downloading. The material has limitations for phenomenological modeling, but data-based methodologies can be used. This research focuses on the intelligent temporal analysis of datasets in developing compact solutions for early detection of levels, trends, episodes, and severity of situations. The methodology has been tested in the analysis of daily new confirmed COVID-19 cases and deaths in six countries. The datasets are studied per million people to get comparable indicators. Nonlinear scaling brings the data of different countries to the same scale, and the temporal analysis is based on the scaled values. The same approach can be used for any country or a group of people, e.g., hospital patients, patients in intensive care, or people in different age categories. During the pandemic, the scaling functions expanded for the confirmed cases but remained practically unchanged for the confirmed deaths, which is consistent with increasing testing.


Introduction
The coronavirus disease COVID-19 is affecting countries around the world, with strong differences between countries and regions. People of all ages can be infected, but older people and people with pre-existing medical conditions are more vulnerable to becoming severely ill. The risk is presented with three parameters. An online interactive dashboard is hosted by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University for visualizing and tracking reported cases of coronavirus disease 2019 (COVID-19) in real time [1,2]. Transmission dynamics are difficult to explain since the characteristics of a novel disease include many uncertainties. The open evidence review [3] makes information about active research on modes of transmission available.
The effective reproduction number (R) of an infectious disease is used for modeling. The tracking of the parameter is done by assuming a model structure. An example of this approach is presented in ref. [4], where the Kalman filter and a SIR model have been used for tracking R for COVID-19.
Distributions of the variables provide useful information about fluctuations, trends, and models. This has been used in temporal analysis for all types of measurements, features, and indices. Recursive updates of the parameters are needed in prognostics [5].
Generalized norms are used in data analysis to extract features from waveform signals collected from the statistical databases [6]. The computation of the norms can be divided into the computation of equal sized sub-blocks, i.e., the norm for several samples can be obtained as the norm for the norms of individual samples. This means that norms can be recursively updated [7]. The same methodologies can be used for analyzing the data distributions in less frequent data, e.g., daily COVID-19 data.
The temporal analysis focused on important variables provides useful information, including trends, fluctuations, and anomalies. The fundamental elements are presented geometrically as triangles to describe local temporal patterns originating from qualitative reasoning and simulation [8][9][10]. Replacing the reasoning with calculations based on the scaled values was the main contribution in ref. [11,12].
This research aims to develop unified intelligent temporal analysis methodologies for detecting the fluctuations, trends, and severity of the corona situations. Parametric systems are used to adapt the solution for varying operating conditions caused by local areas and groups of people. Recursive updates are used in the parametric models.

COVID-19 data
This research uses the complete COVID-19 dataset maintained by Our World in Data [13]. The collection of the COVID-19 data is updated daily and includes data on confirmed cases, deaths, hospitalizations, and testing. Raw data on confirmed cases and deaths for all countries are sourced from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. Data visualizations rely on work from many different people and organizations [14].
Our World in Data provides a description of all its data sources in the GitHub repository, where all of the data can be downloaded. These datasets were used as the data source in this research. The data are presented in tabular form: every column of a table represents a particular variable, and each row is a record for a specific country on a certain day. Each record consists of one or more fields, separated by commas. The data can be visualized in the COVID-19 DataExplorer for individual countries. Several countries can be compared by selecting them for the view. The maps available in the DataExplorer help in focusing the analysis.
The analysis uses confirmed COVID-19 cases, whose number is lower than the number of actual cases. The main reason for this is limited testing, which also varies between countries and over time. Therefore, the analysis is done country-wise. The pandemic introduces an increasing number of new COVID-19 cases, but countries also make progress in slowing the spread toward zero new cases (Figure 1). However, the increase can start again, as can be seen in the data of different countries. The pandemic can restart if it is active somewhere. The difficult periods vary between countries.
A part of the pandemic cases leads to hospitalizations and deaths. Both increases and reductions can be seen in the daily new confirmed COVID-19 deaths (Figure 2). During the outbreak of the pandemic, the calculated case fatality rate (CFR) was a poor measure of the mortality risk since it depends on the number of tests, and at that time, there were few tests. The true number of cases was much higher. Later, the number of tests increased strongly, but not in all countries. An increasing number of variants and mutations affects the number of cases. Vaccinations had only just started during the studied period. All these have a strong effect on the dynamics of the pandemic. The problems become more case specific but can, at the same time, become active in many locations.
The research focused on the temporal analysis is aimed at finding situations for more detailed modeling and action planning.

Methodologies
The temporal analysis needs to be adapted in the appropriate situations. The unified temporal analysis requires that all the features are on the same scale. In this research, this is done by combining the nonlinear scaling and the intelligent temporal analysis. This methodology allows recursive updates of the scaling functions.

Nonlinear scaling
The nonlinear scaling brings various measurements and features to the same scale by using monotonically increasing scaling functions f [15]. The parameters of the functions are extracted from measurements by using generalized norms and moments. The support area is defined by the minimum and maximum values of the variable, i.e., a specific area for each variable $j$, $j = 1, \ldots, m$. The central tendency value, $c_j$, divides the support area into two parts, and the core area is defined by the central tendency values of the lower and the upper part, $(c_l)_j$ and $(c_h)_j$, respectively. This means that the core area of the variable $j$, defined by $[(c_l)_j, (c_h)_j]$, is within the support area. The corner points are defined by iterating the orders, $p$, of the corresponding generalized norms:
$$\|{}^{\tau}M_j^p\|_p = \left[\frac{1}{N}\sum_{i=1}^{N}(x_j)_i^p\right]^{1/p},$$
where the order $p \neq 0$, the norm is calculated from $N$ values of a sample, and $\tau$ is the sample time. This provides possibilities to recursively update the scaling functions since the generalized norms can be recursively updated. The iteration is based on the generalized skewness [16]. The scaled values should preserve the directions of the temporal changes with time. To achieve this, the scaling functions should be monotonically increasing. This is achieved by limiting the ratios in the range $[\frac{1}{3}, 3]$. The corner points are adjusted if these limitations are not fulfilled. There are several alternatives for selecting the points to tune [17].
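As a minimal sketch of how a generalized norm can be computed and combined from sub-blocks, the following Python fragment assumes the power-mean form of the norm (the mean of the p-th powers raised to 1/p) and equal-sized sub-blocks. The function names are illustrative and do not come from the original work.

```python
from typing import Sequence


def generalized_norm(values: Sequence[float], p: float) -> float:
    """Generalized norm of order p != 0, taken here as (mean(|x_i|**p))**(1/p)."""
    if p == 0:
        raise ValueError("the order p must be non-zero")
    if len(values) == 0:
        raise ValueError("at least one value is needed")
    # Negative orders require non-zero values; a zero raises ZeroDivisionError.
    moment = sum(abs(x) ** p for x in values) / len(values)
    return moment ** (1.0 / p)


def combine_block_norms(block_norms: Sequence[float], p: float) -> float:
    """Norm of the whole sample, obtained as the norm of equal-sized block norms."""
    return generalized_norm(block_norms, p)
```

With equal-sized blocks, combining the block norms gives the same value as the norm of the full sample, so only the block norms need to be stored when new data arrive.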
The second-order polynomials are monotonically increasing if the coefficients are defined as follows:
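A minimal sketch of such a scaling function is given below. It assumes that the five corner points (minimum, lower core point, central tendency value, upper core point, maximum) correspond to the scaled values −2, −1, 0, 1, and 2, and that each half is a second-order polynomial passing through its three corner points. The coefficient expressions follow from these interpolation conditions; the class and attribute names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScalingFunction:
    x_min: float   # lower limit of the support area
    c_low: float   # lower corner of the core area
    c: float       # central tendency value
    c_high: float  # upper corner of the core area
    x_max: float   # upper limit of the support area

    def _coefficients(self, upper: bool) -> Tuple[float, float]:
        """Coefficients (a, b) of a*X**2 + b*X + c for one half of the function."""
        if upper:
            delta = self.c_high - self.c
            outer = self.x_max - self.c_high
        else:
            delta = self.c - self.c_low
            outer = self.c_low - self.x_min
        if delta <= 0:
            raise ValueError("corner points must be strictly increasing")
        alpha = outer / delta
        # The polynomial is monotonically increasing only for this range.
        if not (1.0 / 3.0 <= alpha <= 3.0):
            raise ValueError("ratio outside [1/3, 3]: adjust the corner points")
        a = 0.5 * (alpha - 1.0) * delta if upper else 0.5 * (1.0 - alpha) * delta
        b = 0.5 * (3.0 - alpha) * delta
        return a, b

    def to_real(self, scaled: float) -> float:
        """Map a scaled value in [-2, 2] to the real value."""
        scaled = max(-2.0, min(2.0, scaled))
        a, b = self._coefficients(upper=scaled >= 0)
        return a * scaled * scaled + b * scaled + self.c

    def to_scaled(self, x: float) -> float:
        """Map a real value to the scaled range [-2, 2] by inverting the polynomial."""
        if x <= self.x_min:
            return -2.0
        if x >= self.x_max:
            return 2.0
        a, b = self._coefficients(upper=x >= self.c)
        if abs(a) < 1e-12:  # the half is linear
            return (x - self.c) / b
        disc = max(b * b - 4.0 * a * (self.c - x), 0.0)
        return (-b + math.sqrt(disc)) / (2.0 * a)
```

For example, corner points 0, 10, 30, 60, and 120 give the ratios 0.5 and 2, both within the allowed range, so the function is monotonically increasing over the whole support area.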

Temporal analysis
Trend analysis produces useful indirect measurements for the early detection of changes. For any variable $j$, a trend index $I_j^T(k)$ is calculated from the scaled values $X_j$ with a linguistic equation, which is based on the means obtained for a short and a long time period, defined by delays $n_S$ and $n_L$, respectively. The index value is in the linguistic range $[-2, 2]$, representing the strength of both decrease and increase of the variable $x_j$ [11,12].
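The calculation can be sketched as follows, assuming that the trend index is the difference between the mean of the latest n_S + 1 scaled values and the mean of the latest n_L + 1 scaled values. This form and the function names are assumptions made for illustration.

```python
from typing import Sequence


def window_mean(values: Sequence[float], k: int, n: int) -> float:
    """Mean of the n + 1 values ending at index k."""
    if n < 0 or k - n < 0 or k >= len(values):
        raise ValueError("window does not fit into the series")
    window = values[k - n : k + 1]
    return sum(window) / len(window)


def trend_index(scaled: Sequence[float], k: int, n_short: int, n_long: int) -> float:
    """Short-period mean minus long-period mean of the scaled values at step k."""
    if n_short >= n_long:
        raise ValueError("the short period must be shorter than the long period")
    return window_mean(scaled, k, n_short) - window_mean(scaled, k, n_long)
```

A positive value indicates that the recent level is above the longer-term level, i.e., an increasing trend; a constant series gives zero.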
An increase is detected if the trend index exceeds a threshold, $I_j^T(k) > \varepsilon^{+}$. In the analysis of the COVID data, a high number of cases is harmful. Area D close to $[2, 2]$ is a dangerous situation, which introduces warnings and alarms. The increase becomes slower in Area A; the unfavorable trend stops and turns to a decrease in Area B close to $[-2, -2]$. The decrease gets slower in Area C and gradually stops.
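A rough sketch of the episode detection is shown below. It assumes that the areas are separated by the signs of the trend index and its change between consecutive steps, with a symmetric threshold around zero; the exact area boundaries are not given here, so this partition is only an assumption.

```python
def classify_episode(trend: float, derivative: float, epsilon: float = 0.0) -> str:
    """Label a (trend index, derivative) pair with an area A, B, C, D, or 'steady'."""
    if trend > epsilon:
        # Increasing: accelerating in D, slowing down in A.
        return "D" if derivative >= 0 else "A"
    if trend < -epsilon:
        # Decreasing: accelerating in B, slowing down in C.
        return "B" if derivative <= 0 else "C"
    return "steady"
```

For example, a pair close to (2, 2) is labeled D and a pair close to (−2, −2) is labeled B.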
The episodes are not sufficient for analyzing the severity of the situation. The level $X_j(k)$, which is naturally highly important, is included in a deviation index. This index has its highest absolute values when the level $X_j(k)$ is very high and still rising with a fast increasing speed [11]. This can be understood as a third dimension in Figure 3.
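One possible form of the deviation index is sketched below. It assumes a weighted mean of the scaled level, the trend index, and the change of the trend index, with equal weights by default and the result limited to [−2, 2]. The weighting and the function name are assumptions, not the definition used in the referenced work.

```python
from typing import Sequence


def deviation_index(
    level: float,
    trend: float,
    trend_derivative: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of level, trend index, and its derivative, limited to [-2, 2]."""
    if len(weights) != 3 or sum(weights) <= 0:
        raise ValueError("three weights with a positive sum are needed")
    terms = (level, trend, trend_derivative)
    value = sum(w * t for w, t in zip(weights, terms)) / sum(weights)
    return max(-2.0, min(2.0, value))
```

The index reaches its upper limit only when the level is very high and the increase is both strong and accelerating.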
The trend analysis is tuned to applications by selecting the time periods $n_L$ and $n_S$. The fluctuation indicators calculate the difference of the high and the low values of the measurement as a difference of two moving generalized norms, where the orders $p_h \in \mathbb{R}$ and $p_l \in \mathbb{R}$ are large positive and negative, respectively. The moments are calculated
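The fluctuation indicator can be sketched as follows, assuming power-mean norms over a moving window of strictly positive values. The default orders ±50 and the window handling are illustrative choices; the values are divided by the window maximum or minimum before raising them to the power to avoid numerical overflow.

```python
from typing import Sequence


def power_mean(values: Sequence[float], p: float) -> float:
    """Generalized norm of order p != 0 for strictly positive values."""
    # Normalizing by the max (p > 0) or the min (p < 0) keeps every term in (0, 1].
    ref = max(values) if p > 0 else min(values)
    mean_power = sum((v / ref) ** p for v in values) / len(values)
    return ref * mean_power ** (1.0 / p)


def fluctuation(
    values: Sequence[float],
    k: int,
    window: int,
    p_high: float = 50.0,
    p_low: float = -50.0,
) -> float:
    """Difference of a high-order and a low-order norm over the window ending at k."""
    if window < 1 or k - window + 1 < 0 or k >= len(values):
        raise ValueError("window does not fit into the series")
    chunk = values[k - window + 1 : k + 1]
    if any(v <= 0 for v in chunk):
        raise ValueError("strictly positive values are needed for negative orders")
    return power_mean(chunk, p_high) - power_mean(chunk, p_low)
```

With a large positive order the norm approaches the maximum of the window, and with a large negative order it approaches the minimum, so the difference approximates the range of the values while remaining less sensitive to single outliers.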

Data analysis
The analysis was done for daily new confirmed COVID-19 cases and deaths in six countries: Finland, India, Italy, Sweden, the United Kingdom, and the United States. The full dataset with all the countries was downloaded for a selected time period as a CSV file by using the DataExplorer. The country-specific lines were then extracted from this file into arrays. The rolling 7-day average was used for the feasibility study since it operated smoothly for the confirmed deaths as well.
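The extraction step can be sketched with the Python standard library as follows. The column names `location`, `date`, and `new_cases_per_million` are assumptions about the layout of the downloaded CSV file and should be checked against the actual export. Rows with a missing value are skipped, which is a simplification.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO


def read_country_series(
    handle: TextIO,
    countries: Iterable[str],
    value_column: str = "new_cases_per_million",
) -> Dict[str, List[float]]:
    """Extract one date-ordered value series per selected country from a CSV file."""
    wanted = set(countries)
    rows = defaultdict(list)
    for row in csv.DictReader(handle):
        raw = row.get(value_column) or ""
        if row["location"] in wanted and raw != "":
            rows[row["location"]].append((row["date"], float(raw)))
    # ISO dates (YYYY-MM-DD) sort correctly as strings.
    return {name: [v for _, v in sorted(items)] for name, items in rows.items()}


def rolling_mean(values: List[float], window: int = 7) -> List[float]:
    """Trailing rolling mean; the first values use the shorter available window."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        result.append(sum(chunk) / len(chunk))
    return result
```

A typical use would open the downloaded file with `open(path, newline="")`, pass the handle to `read_country_series` together with the six country names, and then apply `rolling_mean` to each series.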
The cases were analyzed per million people to improve the sensitivity of the analysis for small countries. The situations vary strongly between countries and periods of time. The normalization keeps the directions of the effects but would leave the analysis of nonlinear effects to the modeling. The nonlinear scaling approach aims to simplify the modeling work.
Within each country, the risk levels are represented by using nonlinear scaling. The scaling functions are defined by five corner points by generalized norms whose orders are obtained from the data. The real values are scaled to the same range [−2, 2] for all variables in each country.
Trend indices are calculated from the scaled values by using informative short and long time periods. The trend index and its derivative visualize trend episodes. The severity of the situation is evaluated by a deviation index, which combines the trend index, the derivative of it, and the level.
The calculations are done with numerical values, and the results are represented in natural language.
The orders $p$ are obtained by the iteration based on the generalized skewness. At the same time, these are the minimum points of the kurtosis $\gamma_4$ (Figure 4). All these orders are much higher than the arithmetic mean, actually fairly close to the standard deviation. The corresponding real values are shown in Figure 5. The resulting scaling function is nonlinear and consists of two second-order polynomials with linear derivatives. The upper part is slightly steeper than the lower part (Figure 6). A continuous derivative in the center was not required in the calculations. The scaling functions were analyzed separately for the first seven months of the epidemic since the testing was on a much lower level. The feasible area expands considerably after the first periods. Obviously, the true number of cases was not detected earlier.

Temporal analysis
The scaled values $X_j$ for different countries are calculated by using appropriate scaling functions (3).
For the real-time operation, the analysis should adapt to changing situations: the scaling functions should be gradually expanded in the beginning periods to react to increased testing, vaccinations, and new variants. The orders $p$ should be re-evaluated at least two times during the autumn. The need for recursive updates may provide indications of the activation of new variants.

Scaling functions
Daily new confirmed COVID-19 deaths vary strongly (Figure 2). The analysis was performed for this data in the same way as for the confirmed cases. The corner points are shown in Figure 10. The differences between the beginning part and the whole data are very small. As an example, the analysis and the functions are shown for Finland in Figures 11 and 12. Nonlinear effects are even steeper than for the confirmed cases. For the upper part, the order is much higher: $p = 3.7188$.

Temporal analysis
The temporal analysis for the confirmed deaths was done in the same way as for the confirmed cases. The indices are in the range [−2, 2] as well.
Everything is done with calculations; the earlier used reasoning is not needed. The case-specific adaptation is done by choosing time windows, thresholds, and some weight factors. These settings are the same for all cases in this research.
The parameters of the scaling functions are based on the time periods included in the analysis. Updates are needed if the changes become very small although there should be considerable effects, e.g., the beginning period in Figure 7. Using the parameters obtained for the early days after day 240 would result in upper limit values for the indicators. The need for updating is clearly seen. Also, recursive updates are used in the parametric models. In everyday use, recursive tuning is needed, and this approach allows it: the norms with the specific orders can be recursively updated at any time. The norm orders related to the scaling functions are updated less frequently since this requires more calculations. Fluctuations are detected from the difference between two norms of very high and very low order, respectively. The time periods, thresholds, and weight factors are selected in the tuning.
In the case studies, the intelligent temporal analysis operates well for the coronavirus statistical data. Example periods of six countries are presented for explaining the operation. The temporal analysis can be done with this approach for any variables that have time series data in the overall dataset.
The confirmed cases and deaths were analyzed per million people to facilitate comparisons between countries. The relative values are still much higher in the United Kingdom, the United States and Italy than in Finland. The US numbers are high already in the beginning period. Sweden has a very high number of cases when compared with the population. India has very low numbers in this respect, but the number of cases started to increase only at the end of the studied period.
The analysis can be done similarly for different subsets. Specific scaling functions can be used in local analysis and for groups of people to increase the sensitivity of the temporal analysis. The data material already includes hospital patients and patients in intensive care. The progress in vaccinations provides material for comparing results. Excess mortality and variations in local areas and groups of people, e.g., defined by age, have effects. For these, more aggregated material is used for analyzing countries and continents. Future development focuses on comparing different subsets and integrating the calculation levels.

Conclusions and future development
Unified intelligent temporal analysis methodologies were used for detecting the fluctuations, trends, and severity of the corona situations from time series. The analysis was adapted to the problem, and the calculations were done in the same way in all case studies. The methodology operates well for the coronavirus statistical data. The need for updates is clearly seen, and the solution allows even recursive updates. The temporal analysis detects changes. Modeling is challenging since the driving forces depend on many, even contradictory, factors. This part is left here to future research in more specific cases.

Conflict of interest:
Author states no conflict of interest.