Predictive analysis of the impact of the time of day on road accidents in Poland

The steady increase in the number of road users and their growing mobility mean that road safety remains a topical issue. Analyses of factors influencing the number of road traffic accidents contribute to the improvement of road safety. Because changes in traffic volume follow a daily rhythm, the hour of the day is an important factor affecting the number of crashes. The present article identifies selected mathematical models which can be used to describe the number of road traffic accidents as a function of the time of their occurrence during the day. The seasonality of the number of accidents in particular hours was assessed. The distributions of the number of accidents in each hour were compared using the Kruskal-Wallis and Kolmogorov-Smirnov tests. Multidimensional scaling was used to present the similarities and differences found. Similar hours were grouped into clusters, which were used in further analysis to construct the ARMAX model and the Holt-Winters model. Finally, the predictive capabilities of each model were assessed.


Introduction
According to the World Health Organization (WHO), around 1.35 million people die in road traffic accidents each year, costing most countries about 3% of their gross domestic product. Road injuries are the leading cause of death in children and young adults aged 5-29 years, with more than half of all fatalities being among pedestrians, cyclists, and motorcyclists [1].
The high mortality rate, and the high cost of road crashes, make road traffic safety an important problem, the various aspects of which are widely discussed in the literature. Researchers try to identify factors that affect the level of road safety, which vary depending on the area studied, a country's road traffic history [2], and road infrastructure and superstructure, etc. Because risk factors differ from one region to the next, analyses should be conducted at a local level, as a basis for more general joint discussions.
The main factors that contribute to the large number of road accidents include traffic volume [3] and inappropriate behaviours of road users such as speeding, rash driving, non-compliance with traffic rules, carelessness while crossing roads, playing on the road, alcohol intake, fatigue, and sleepiness [4,5]. The authors of [6] analysed the problem of crashes from the perspective of the driver's age, noting that risk factors such as lack of experience and skills and risky driving behaviours were associated with young drivers, while problems such as visual, cognitive and mobility impairments were mainly found in older drivers. In turn, the authors of [7] believe that the main causes of road traffic accidents are associated with the lack of control and inadequate enforcement of road traffic laws (especially regarding speeding, drink-driving, and failure to respect the rights of other road users, mainly pedestrians and cyclists) and with unsafe road infrastructure. The study in [8] emphasizes the tourist attractiveness of a region as a factor that significantly increases the number of road accidents, while [9] mentions key risks such as vehicle overloading, speeding, and drink-driving. In [10], the following factors affecting accidents were taken into account: the cause of the accident, the genders of the victims, the number and type of vehicles involved in the accident, the time of the accident, the severity of the accident, the type of accident and the age group of the driver(s). In [11], Khan et al. enumerate the following as the main causes of accidents: distractions, different weather conditions, sleep deprivation, unsafe lane changes, and nighttime driving. A different approach was proposed in [12], where the severity of accidents, and not their number, was adopted as the dependent variable. The results showed that severity was affected by the season, the age of the driver, the time of day, as well as road type and quality.
In some publications, road accidents are analysed taking into account the clock time of the crash. For example, the authors of [13,14] investigated the impact of the time of day at which the accident occurred on the driver's drowsiness and the likelihood of the driver falling asleep at the wheel. In [15], the authors calculated the absolute risk and the relative risk of dying in a road accident at specific times of the day. The daily pattern of road accidents, along with other risk factors leading to crashes, was also investigated in [16-18]. In [18] the authors performed a spatial-temporal analysis, assessing not only the time of the event, but also the day of the week and the season of the year in which the crashes took place.
Clock-time analyses make it possible to determine the time of day at which the risk of road traffic accidents is greatest, so that preventive programs can be focused on this critical time. In view of this, the goal of the present study was to assess the clock-time-related risk of road traffic accidents in Poland.
The research hypothesis assumed that the clock hour had a significant impact on the number of accidents. The hypothesis was formulated in this way because of the great interest of various organizations, including non-governmental and social organizations, not only in road safety in general but primarily in the impact of the time of day on the accident rate on Polish roads. This interest results from the traditional lifestyle of the inhabitants of Poland. The objective of the article was therefore to analyse this one variable, even though the number of road traffic accidents is also affected by other factors. Another argument in favour of this approach is that the time of day is a constant factor for the entire country (unlike road traffic, it does not change depending on the area). It reflects well the way society functions and its lifestyle. Moreover, the time of an accident is generally available information, which allows easy interpretation of the obtained results as well as inference. The proposed method can easily be applied to other countries or to smaller administrative areas (e.g. provinces or cities).
Multivariate analyses yield good predictive results; however, they require complex models [4,6,12], which may hinder their correct interpretation and/or require specialized computational programs. Information on some variables influencing the level of road safety is not widely disseminated, measurement results are difficult to obtain, or the parameters are not monitored at all. Therefore, a further advantage of the study presented in this article is the ease with which the method can be adapted to other, similar analyses.
It should also be emphasized that most of the considerations presented in the literature concern the statistical analysis of the variables affecting road accidents. The statistical significance and strength of individual factors are examined, but no mathematical models enabling prediction are constructed on the basis of the obtained results. Such models are constructed in the present article.
In the first stage of the study, the seasonality of the number of accidents at particular hours was assessed. The Kruskal-Wallis and Kolmogorov-Smirnov statistical tests were used to compare the distribution of the number of accidents in each hour. The similarities and differences found were presented using multidimensional scaling. On this basis, similar hours were grouped in clusters, which were used in further analysis to construct the ARMAX model and the Holt-Winters model. Finally, each of the proposed models was evaluated, their predictive abilities were compared, and final conclusions were formulated.
The presented study follows the literature trend that postulates the need to constantly review and update the results regarding the impact of individual factors on road hazard [6] and the WHO's 2030 Agenda for Sustainable Development, which aims to halve the global number of road deaths and injuries by 2030 [1].

The problem of road traffic accidents in Poland
Located on the East-West transport route, Poland is a country with a lot of transit transport, which strongly affects the intensity of its road traffic. According to the data of the Polish Border Guard Headquarters, in 2018 (the year of the present study), 12 435 345 vehicles entered Poland through the EU's external borders [19]. This undoubtedly affected the number of road traffic accidents, which was 31 674 in Poland in 2018. As a result of those crashes, 2 862 people died and 37 359 were injured (including 10 941 seriously injured). Compared to the previous year, 2017, the number of road traffic accidents was smaller by 1 086 (−3.3%), which is a positive result that follows the downward trend observed in recent years (Figure 1). Unfortunately, the reduction in the number of road traffic accidents did not lead to a reduction in the number of fatalities, which has been increasing over the last few years (Figure 2). The statistics cited above show that the problem of road safety in Poland is topical and requires continuous

Seasonality study
Let $\{x_t\}_{1 \le t \le 8760}$ be a numerical sequence representing the number of accidents in the successive hours of each day of 2018. We divide this sequence into 24 subsequences $c_k = \{x_{24(j-1)+k+1}\}_{1 \le j \le 365}$, each of which corresponds to the number of accidents for a specific hour of the day $k$ (with $k = 0, 1, \ldots, 23$) [20]. For example, the subsequence $c_0$ contains data on the number of accidents that took place at zero hour (i.e. between 00:00 and 00:59) on the successive days of 2018. Figure 3 shows box-and-whisker plots for each group corresponding to a clock hour, while Figure 4 shows a chart of the seasonality of accidents as a function of time.
The above figures clearly show that the sequence exhibits seasonal changes, which are analysed statistically below.
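As a minimal sketch of the decomposition described above, the following Python snippet splits an hourly series into 24 hour-of-day subsequences. The function name `split_by_hour` and the toy data are illustrative assumptions and not part of the original study; the real input would be the 8760 hourly counts for 2018.

```python
def split_by_hour(x, hours_per_day=24):
    """Return c, where c[k] lists the counts observed at clock hour k on successive days."""
    if len(x) % hours_per_day != 0:
        raise ValueError("series length must be a multiple of hours_per_day")
    # 0-based x[24*(j-1) + k] is the 1-based x_{24(j-1)+k+1} used in the text
    return [x[k::hours_per_day] for k in range(hours_per_day)]


# Hypothetical toy series: 3 days, each value encodes (day, hour) as 100*day + hour
x = [100 * day + hour for day in range(1, 4) for hour in range(24)]
c = split_by_hour(x)
print(c[0])   # [100, 200, 300]
print(c[23])  # [123, 223, 323]
```

Each `c[k]` can then be summarised separately, for instance as one box in a box-and-whisker plot per clock hour.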

Kruskal-Wallis test
The Kruskal-Wallis test was performed to test seasonality (the test was used to compare between groups) [21]. Let $\{c_k\}_{0 \le k \le 23}$ be a set of sequences corresponding to the number of accidents for a given hour of the day $k$. To examine seasonality at the level of significance $\alpha \in (0, 1)$, we formulated the following working hypothesis: $H_0$: the distributions of the number of accidents are the same for each hour of the day (i.e. the time of day does not affect the number of accidents), and an alternative hypothesis: $H_1$: there are hours during the day for which the distributions of the number of accidents differ significantly. Let $n = n_0 + n_1 + \ldots + n_{23}$ denote the sample size (i.e. the number of elements in the set $\{c_k\}_{0 \le k \le 23}$), which is divided into 24 disjoint groups of size $n_0 = n_1 = \cdots = n_{23}$ (the groups correspond to the respective hours of the day). Each group is randomly selected from a different population. The entire sample (all groups together) is ranked. Let $R_{ij}$ denote the rank in the sample of the $j$-th element from the $i$-th group.
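The test statistic is given by the formula:
$$T = \frac{12}{n(n+1)} \sum_{i=0}^{23} n_i \left( \bar{R}_i - \frac{n+1}{2} \right)^2, \qquad (1)$$
where $\bar{R}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} R_{ij}$ is the mean rank of the $i$-th group. The test statistic $T$ (1) is a measure of the deviation of the rank means of the samples (groups) from the mean value of all ranks, which is equal to $(n+1)/2$. Under $H_0$, the statistic $T$ asymptotically has a $\chi^2$ distribution with 23 degrees of freedom.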
The value of the test statistic is 4630.58, and the p-value is $< 2.2 \cdot 10^{-16}$. This means that the distributions of the number of accidents differ significantly for different hours of the day.
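To make the rank-based computation concrete, here is a small pure-Python sketch of the Kruskal-Wallis statistic in its usual form, $T = \frac{12}{n(n+1)} \sum_i n_i (\bar{R}_i - \frac{n+1}{2})^2$. The helper names and the toy groups are assumptions made for illustration. The sketch omits the tie correction that statistical packages normally apply, so for heavily tied count data its value would differ somewhat from the one reported above.

```python
def average_ranks(values):
    """Ranks 1..n of `values`; tied observations share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (T, degrees of freedom) without tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        mean_rank = sum(ranks[offset:offset + len(g)]) / len(g)
        total += len(g) * (mean_rank - (n + 1) / 2) ** 2
        offset += len(g)
    return 12.0 / (n * (n + 1)) * total, len(groups) - 1


# Hypothetical toy groups standing in for three hours of the day
t_stat, dof = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(t_stat, 6), dof)  # 7.2 2
```

The resulting statistic would be compared with the $\chi^2$ distribution with `dof` degrees of freedom (23 for the 24 hourly groups).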

Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test was also used to compare the distributions of the number of accidents for each hour of the day [21,22]. At significance level $\alpha \in (0, 1)$, the following working hypothesis was formulated for hours $i$ and $j$ ($0 \le i, j \le 23$, $i \ne j$): $H_0$: the cumulative distribution functions of the number of accidents for hours $i$ and $j$ are identical. The alternative hypothesis was: $H_1$: the cumulative distribution functions of the number of accidents for hours $i$ and $j$ are significantly different.
The test statistic is given by the formula:
$$D_{ij} = \sup_{t} \left| F_i(t) - F_j(t) \right|, \qquad (2)$$
where $F_i(t)$ and $F_j(t)$ denote the empirical cumulative distribution functions for hours $i$ and $j$, respectively. After normalization, the statistic $D_{ij}$ asymptotically follows the Kolmogorov distribution. The critical value $K_\alpha$ is determined from the Kolmogorov distribution tables for the significance level $\alpha$. If $\sqrt{\frac{n_i n_j}{n_i + n_j}}\, D_{ij} > K_\alpha$, where $n_i$ and $n_j$ denote the sample sizes of the groups representing hours $i$ and $j$, the null hypothesis $H_0$ is rejected in favour of the alternative hypothesis $H_1$.
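The pairwise comparison can be sketched in a few lines of Python. The function names and sample data below are hypothetical, and the critical value is taken from the common large-sample approximation $K_\alpha \approx \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ rather than from tables, which is an assumption of this sketch. The normalisation used is the standard one, $\sqrt{n_i n_j/(n_i+n_j)}\,D_{ij}$.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_t |F_a(t) - F_b(t)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        while i < len(a) and a[i] == t:  # step F_a past every observation equal to t
            i += 1
        while j < len(b) and b[j] == t:  # same for F_b
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def rejects_h0(a, b, alpha=0.01):
    """Decision rule: sqrt(n_a*n_b/(n_a+n_b)) * D > K_alpha (approximate K_alpha)."""
    scaled = math.sqrt(len(a) * len(b) / (len(a) + len(b))) * ks_statistic(a, b)
    k_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return scaled > k_alpha


# Hypothetical daily counts for two clock hours
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))          # 0.5
print(rejects_h0(list(range(20)), list(range(100, 120))))  # True
```

Evaluating `ks_statistic` for every pair of hours yields a symmetric 24 x 24 matrix of the kind summarised in Figure 5.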

Figure 5: Values of the Kolmogorov-Smirnov test statistic for pairs of hours
The values of the Kolmogorov-Smirnov test statistic on their own are not sufficient to group similar hours into clusters. To determine the clusters, we used multidimensional scaling followed by k-means cluster analysis.

Multidimensional Scaling
To present the similarities and differences between the accident distributions for individual hours (objects), we employ multidimensional scaling (MDS) [23,24,26]. It is a statistical technique which allows us to visualize the similarities between individual groups. All differences between the analysed groups are contained in a distance matrix. The Kolmogorov-Smirnov statistic (2) was used as the distance between the distributions of accidents. Multidimensional scaling places the objects as points in space in such a way that similar elements are located close together [23,24].
Multidimensional scaling seeks points $z_i \in \mathbb{R}^2$, $1 \le i \le n$, that correspond to the objects. The points corresponding to the groups are estimated by solving the problem $\min_{z_1, \ldots, z_n} S(z_1, z_2, \ldots, z_n)$. The objective function $S$ is called the stress function; it compares the distances between the objects with the distances $\|z_i - z_j\|$ between the corresponding points, where $\|\cdot\|$ is the Euclidean norm. Figure 6 presents the locations of the groups estimated by multidimensional scaling based on the Kolmogorov-Smirnov distance.
Figure 6 shows the points (corresponding to hours) together with their clusters. Each point is marked in an XY coordinate system, where X and Y denote latent variables, which can be correlated with physical, weather-related and other factors.
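The idea of minimising a stress function can be illustrated with a short gradient-descent sketch. It assumes the raw stress $S = \sum_{i<j} (d_{ij} - \|z_i - z_j\|)^2$, which is only one of several stress definitions in use and may differ from the one applied in the study. The distance matrix, starting points, step size and function names are likewise hypothetical.

```python
import math


def stress(dist, pts):
    """Raw stress: sum over pairs of (d_ij - ||z_i - z_j||)^2."""
    total = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            total += (dist[i][j] - math.dist(pts[i], pts[j])) ** 2
    return total


def mds_gradient_descent(dist, pts, step=0.05, iterations=500):
    """Move 2-D points downhill on the raw stress, starting from `pts`."""
    pts = [list(p) for p in pts]
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in pts]
        for i in range(len(pts)):
            for j in range(len(pts)):
                if i == j:
                    continue
                gap = math.dist(pts[i], pts[j])
                if gap == 0.0:
                    continue  # gradient is undefined for coincident points
                coeff = -2.0 * (dist[i][j] - gap) / gap
                grads[i][0] += coeff * (pts[i][0] - pts[j][0])
                grads[i][1] += coeff * (pts[i][1] - pts[j][1])
        for p, g in zip(pts, grads):  # update all points simultaneously
            p[0] -= step * g[0]
            p[1] -= step * g[1]
    return pts


# Hypothetical Kolmogorov-Smirnov distances between three hours
ks_dist = [[0.0, 0.2, 0.9],
           [0.2, 0.0, 0.8],
           [0.9, 0.8, 0.0]]
start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
final = mds_gradient_descent(ks_dist, start)
print(stress(ks_dist, start), stress(ks_dist, final))
```

In this toy configuration the first two objects end up close together, reflecting their small mutual distance; a clustering step such as k-means could then be run on the resulting coordinates.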

ARMAX model
The number of road traffic accidents depends on the hour of the day, and hours are directly related to classes (clusters). To determine the dependence between the number of accidents, the hour of the day and other external factors, which cannot be taken into consideration explicitly due to the lack of historical data, we consider the ARMAX(p, q) model (AutoRegressive Moving Average with eXternal regressors). As factors that directly influence the number of incidents, we take predictors describing membership in specific classes. The predictors are binary variables that describe membership in classes 2 and 3: the first equals 1 if hour $t$ belongs to class 2 and 0 otherwise, and the second equals 1 if hour $t$ belongs to class 3 and 0 otherwise.
In the analysed model (7), $y_t = \log(x_t + 1)$ for $1 \le t \le 8760$, $B$ is the backshift operator (for any $k \in \mathbb{N}$, $B^k y_t = y_{t-k}$), and $\{\epsilon_t\}_{1 \le t \le 8760}$ is a sequence of independent random variables with the normal distribution $N(0, \sigma^2)$. From the set of models (7) we select the one with the lowest AIC (Akaike Information Criterion) value, which is ARMAX(2, 2). Table 1 presents the values of the estimators, the standard deviations of the estimators, the T statistics and the probabilities for the working hypothesis that the value of the estimator is equal to zero. Table 1 shows that, at significance level 0.01, the working hypothesis should be rejected in favour of the alternative hypothesis; thus, all estimator values are significantly different from zero. The value of the AIC is 1325.406 and the value of the estimator of $\sigma^2$ is 0.266. Figure 7 shows empirical values of the transformed number of accidents.
Additionally, the Kolmogorov-Smirnov test of the normality of the residuals was performed. The value of the test statistic D is 0.035 and the p-value is $7.46 \cdot 10^{-10}$. This means that, at the significance level of 0.01, the working hypothesis of normality should be rejected in favour of the alternative hypothesis. We also performed the Wald-Wolfowitz runs test [21]. The value of the Z statistic was −2.527 and the p-value was 0.012, so the working hypothesis regarding the randomness of the residuals is rejected at significance level 0.05; at the 0.01 level used above, the p-value slightly exceeds the threshold.
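For readers unfamiliar with the runs test applied to the residuals, the following sketch computes the Z statistic of the Wald-Wolfowitz test with runs defined relative to the median. Dichotomising at the median and dropping values equal to it are assumptions of this sketch; the study does not state which variant was used, and the function name and toy data are hypothetical.

```python
import math
from statistics import median


def runs_test_z(residuals):
    """Wald-Wolfowitz Z statistic for runs above/below the median."""
    m = median(residuals)
    signs = [r > m for r in residuals if r != m]  # drop values equal to the median
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / math.sqrt(var)


# Hypothetical residual sequences
z_alternating = runs_test_z([1, -1] * 5)        # many short runs
z_blocked = runs_test_z([-1] * 5 + [1] * 5)     # two long runs
print(round(z_alternating, 3), round(z_blocked, 3))  # 2.683 -2.683
```

A negative Z, as obtained for the model residuals, indicates fewer runs than expected under randomness, which is typical of positively autocorrelated residuals.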
The analysis of the distribution of residuals shows that it is not consistent with the normal distribution. Inclusion of additional predictors in the ARMAX model could improve the goodness of fit.
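For orientation, one common way of writing an ARMAX(p, q) model with two indicator regressors is given below. The symbols $\mu$, $\varphi_i$, $\theta_j$, $\beta_2$, $\beta_3$, $d_{2,t}$ and $d_{3,t}$ are assumptions introduced here for illustration and need not match the notation or the exact parameterisation used in the study; $d_{2,t}$ and $d_{3,t}$ stand for the class-membership indicators.

```latex
\left(1 - \sum_{i=1}^{p} \varphi_i B^{i}\right) y_t
  = \mu + \beta_2 d_{2,t} + \beta_3 d_{3,t}
  + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \epsilon_t
```

In such a formulation, each additional predictor would enter as a further $\beta$ term on the right-hand side.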

Holt-Winters method
The Holt-Winters method is a generalization of Brown's exponential smoothing method [25,27-30]. It consists of estimating the trend and seasonality in a time series $\{x_t\}_{0 \le t \le N}$. The method was proposed by Holt (estimation of the trend [29]) and extended by Winters to include the seasonality component [30].
Consider the time series $\{y_t\}_{1 \le t \le 8760}$ with additive seasonality of period $p \in \mathbb{N}$, in which the seasonal phase of moment $t$ is the remainder of the division of $t$ by $p$, and $y_t = \log(x_t + 1)$ for $1 \le t \le 8760$. The estimators of level, slope and seasonality are determined recursively, with $0 \le \alpha, \beta, \gamma \le 1$ standing for the smoothing parameters. The forecast (expected value) was determined for $0 < k \le p$ moments ahead.
Figure 11 shows the goodness of fit of the distribution of the residuals $\epsilon_t$. Additionally, the Kolmogorov-Smirnov test of the normality of the residuals was performed. The value of the D statistic was 0.0213 and the p-value was $7.034 \cdot 10^{-4}$. This means that, at the significance level of 0.01, the working hypothesis of normality should be rejected in favour of the alternative hypothesis. We also performed the Wald-Wolfowitz runs test. The value of the Z statistic was −8.385 and the p-value was $< 2.2 \cdot 10^{-16}$, so the working hypothesis regarding the randomness of the residuals should also be rejected at significance level 0.01. The Holt-Winters model, which takes into account only the hour of the event, also shows that there are other factors besides seasonality that directly affect the number of road traffic accidents, the inclusion of which could improve the quality of the model.
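The recursive estimation of level, slope and seasonality can be sketched as follows, using the textbook additive Holt-Winters updates. The initialisation from the first two seasons, the function names and the toy series are assumptions of this sketch and are not taken from the study, whose exact formulas may be parameterised differently.

```python
def holt_winters_additive(y, p, alpha, beta, gamma):
    """One-step-ahead fits and final (level, slope, seasonals) for additive Holt-Winters."""
    if len(y) < 2 * p:
        raise ValueError("need at least two full seasons to initialise")
    # Simple initialisation (an assumption): first-season mean and season-to-season slope
    first, second = y[:p], y[p:2 * p]
    level = sum(first) / p
    slope = (sum(second) - sum(first)) / (p * p)
    season = [v - level for v in first]  # season[t % p] is the latest estimate for phase t mod p
    fitted = []
    for t in range(p, len(y)):
        fitted.append(level + slope + season[t % p])
        prev_level = level
        level = alpha * (y[t] - season[t % p]) + (1 - alpha) * (level + slope)
        slope = beta * (level - prev_level) + (1 - beta) * slope
        season[t % p] = gamma * (y[t] - level) + (1 - gamma) * season[t % p]
    return fitted, level, slope, season


def forecast(level, slope, season, n_obs, k, p):
    """Forecast k steps ahead (0 < k <= p) after observing n_obs points."""
    return level + k * slope + season[(n_obs - 1 + k) % p]


# Hypothetical series with period 4 and no trend
y = [1.0, 3.0, 2.0, 0.0] * 6
fitted, level, slope, season = holt_winters_additive(y, p=4, alpha=0.3, beta=0.1, gamma=0.2)
print(forecast(level, slope, season, n_obs=len(y), k=2, p=4))
```

For the hourly accident data, `p` would be 24 and `y` the log-transformed counts; the smoothing parameters would normally be chosen by minimising the one-step-ahead error over the fitted values.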
The constructed models are of similar quality (Table 1). The Holt-Winters (H-W) model includes 24 phases (shown in Figure 10), whereas the ARMAX model was created with the use of 3 clusters, which were determined using the MDS method. Additionally, the ARMAX model takes into account external factors not related to the time of the event that may affect the number of accidents (the moving average part of the model). In the H-W model, these factors are included as a residual component.

Conclusion
The number of road traffic accidents is influenced by numerous factors, but it is impossible to understand, monitor and analyse all of them. Often, crashes are caused by many overlapping factors. Nonetheless, the problem of road traffic accidents and their dire consequences is critical and, as such, it is of interest to both scientists and practitioners, who share the same goal of reducing road hazards.
This trend is also reflected in the present article, which analyses the relationship between the number of road traffic crashes and the time of the day. The problem was investigated using the ARMAX and the Holt-Winters models. These models confirmed that the time of the day had a significant impact on the occurrence of road traffic accidents, but the study also showed that there exist other important risk factors that were not included in the models and that the non-inclusion of those factors had a negative effect on the quality of those models. Despite this, the study showed a strong seasonality of road hazards, which is an important finding, especially from the point of view of accident prevention.
The Holt-Winters model showed the hourly seasonality of the number of accidents. The use of the MDS method made it possible to divide the whole day into 3 groups of similar hours. On this basis, the ARMAX model was constructed taking into account the impact of each class (the respective times of the day) on the number of accidents.
In accordance with the Polish procedures in force, a medical rescue team, a fire brigade and the police are dispatched to each accident (a road incident involving injured persons). The results of the presented analyses can be applied in planning the readiness of emergency medical teams. For example, during groups of hours with an increased number of accidents, a greater number of ambulances could be on duty, while in the remaining hours they could perform other transport tasks that do not require joining the traffic as an emergency vehicle, such as transporting patients for examination or from hospital. In addition, the results can be used in scheduling the duty of rescue and fire-fighting crews as well as technical rescue vehicle crews (e.g. units in the National Fire and Rescue System) and police car crews (allowing preventive dispatches to areas with increased accident rates at specific hours).
The present study may be viewed as a contribution to developing a comprehensive approach to improving road safety. Only consistent systemic solutions can lead to permanent changes and a reduction in the number of road accidents. Such initiatives must, however, be preceded by numerous detailed studies allowing for the accurate identification and assessment of the impact of all risk factors. One such study is reported in this paper. The method used in this study will be developed in further research by taking into account additional predictors and specifying the areas analysed. This will make it possible to create databases with information on the causes of road traffic accidents as a component in the development of a comprehensive road safety policy.