Intellectualization of the urban and rural bus: The arrival time prediction method

To improve the intelligence of urban and rural buses, bus arrival time must be predicted accurately. This paper first introduces urban and rural buses. The arrival time is then divided into two parts, road travel time and stop time, which are predicted by the support vector regression (SVR) method and the k-nearest neighbor (KNN) method, respectively. A section of a bus route in Pingdingshan City, Henan Province, was taken as an example for analysis. The results showed that the designed method was more accurate than the historical data-based prediction method and closer to the actual values, with a maximum error of 84 s, a minimum error of 10 s, an average error of 42.5 s, and an average relative error of 5.74%, which could meet the needs of passengers. The results verify the reliability of the designed method in predicting the arrival time of urban and rural buses, and the method can be popularized and applied in practice.


Introduction
With economic development, China's urbanization has accelerated [1]. The links among cities, towns, and townships have become increasingly close, and the floating population and its range of movement are growing, which stimulates transportation demand. Although the current urban and rural bus can meet people's travel needs to some extent, there is still a gap with the urban bus, such as imperfect infrastructure, a lack of unified planning, difficult passenger transfer, and long waiting times. Optimizing urban and rural buses is not only a means to narrow the gap between urban and rural areas but also an important means to accelerate urbanization and promote rural development. The integration of urban and rural buses extends urban bus services to rural areas and integrates the two to meet people's travel needs. The urban and rural bus is an important link between the urban and rural economies and can help resolve the chaos of the urban and rural passenger transport market. It also helps relieve urban traffic congestion, as it offers advantages as a means of transportation in and out of the city. Urban and rural buses have longer routes than urban buses and therefore an urgent demand for vehicle information feedback. However, investment in intelligent facilities for urban and rural buses is currently far lower than for urban buses; as a result, passengers cannot get timely feedback on vehicle arrival times at the bus station, leading to low convenience and low passenger satisfaction. Intelligent means are therefore urgently needed to realize the automation and informatization of buses [2] and to further improve the convenience of buses [3] and passenger satisfaction. How to realize intelligent buses has been extensively discussed by researchers.
The prediction of arrival time is an important part of intelligent buses. Currently, studies on the prediction of bus arrival time mainly concentrate on urban buses, and the methods used can be divided into three types. The first type of method is based on the spatiotemporal variation rules, i.e., predicting the future driving time according to historical data based on the spatiotemporal variation rules of bus driving time. Yang et al. [4] proposed a residual neural network-based method, which used two features about bus travel time and headway target section in both forward and reverse directions, and established the temporal-spatial information to predict the bus arrival time. They verified the effectiveness of the method through an experiment on Shenzhen No. 10 route. The second type of method is based on a variety of factors, such as weather, traffic flow, and road length. Based on the weather, working days/holidays, bus number, driver number, and other factors, Zhang [5] established a bus arrival time prediction model, took No. 1 bus in a city as the research subject to carry out the experimental verification, and obtained good prediction results. Sun et al. [6] designed a new system based on the real-time position data of the positioning system, average speeds of different line segments, the historical speed, and the temporal and spatial changes in traffic conditions. The case study on the actual bus found that the system had satisfactory accuracy. By combining the time vector-based method with the space vector-based method, Liu et al. [7] designed a comprehensive model based on long short-term memory and artificial neural network and verified the accuracy of the method through the entity dataset. The third type of method is a data fusion prediction method, which combines a variety of prediction methods to get better results. 
Currently, studies on the prediction of bus arrival time mainly take the urban bus as the subject and pay little attention to the urban and rural bus, and the prediction accuracy is also low. Based on the characteristics of the urban and rural bus, this study predicted its arrival time. To address the low prediction accuracy, this study divided the arrival time into two time periods, predicted them with two models, and carried out experiments on actual bus operation data to verify the effectiveness of the method. This study contributes to better service of urban and rural buses. Accurate prediction of arrival time can reduce passengers' waiting time and make it easier for passengers to ride or transfer. It has a positive impact on improving passenger satisfaction and attracting passengers to travel by public transport. It is also the basis for scientific and reasonable dispatching of buses by bus enterprises. Generally, the forecast arrival time is displayed on the electronic station board in the station, but electronic station boards are not yet widely deployed, and the accuracy of the forecast is not high. For urban and rural buses, the factors that affect the arrival time include (1) traffic congestion caused by emergencies; (2) slowdowns caused by bad weather (fog, snow, etc.); (3) traffic congestion caused by heavy traffic during holidays; and (4) slowdowns caused by rush hour.
At present, in the arrival time prediction, the most widely used method is the historical data-based prediction. It is assumed that the traffic mode is cyclic, and the parameters required for the prediction are presented in Table 1. Then, the prediction formulas are written as: The historical data-based prediction method is to average all the data of a period of historical operation to predict the arrival time of a bus. This method is simple in principle and easy to operate, but it needs a lot of historical data to support, showing poor real-time performance. The route of urban and rural buses is long, including not only rural roads with little congestion but also congested urban roads, which is very complex; therefore, this method is not suitable for the prediction of urban and rural buses.
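The averaging idea behind the historical data-based method can be illustrated with a minimal Python sketch. It assumes the prediction is a plain arithmetic mean of past observations for one segment; the function name `historical_average_prediction` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def historical_average_prediction(historical_times):
    """Predict a segment time (s) as the plain mean of past observations."""
    if not historical_times:
        raise ValueError("at least one historical observation is required")
    return mean(historical_times)


# Hypothetical travel times (s) observed for one segment on previous days.
past_times = [310, 295, 340, 305]
predicted = historical_average_prediction(past_times)
print(predicted)  # 312.5
```

Because every past observation carries the same weight, such an estimate cannot react to the traffic conditions of the current trip, which is the poor real-time performance noted above.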
The arrival time of a bus can be divided into two parts: the travel time of the bus on the road and the stop time of the bus at the station. However, most current research does not clearly distinguish them, leading to low prediction accuracy. This study divided the arrival time prediction into these two parts and calculated them separately.
The travel time of a bus on the road refers to the time between the bus leaving station i and arriving at station j. The time needed by a bus driving from station i to station i + n is denoted as T. The stop time at station i + n − 1 is denoted as $t_{i+n-1}$. The travel time on the road is then $T - t_{i+1} - t_{i+2} - \cdots - t_{i+n-1}$. The travel time on the road is predicted by the support vector regression (SVR) method, and the bus stop time is predicted by the k-nearest neighbor (KNN) algorithm.
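As a worked illustration of this decomposition, the following Python sketch subtracts the stop times at the intermediate stations from the total time T. The function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def road_travel_time(total_time, intermediate_stop_times):
    """Return T minus the stop times at stations i+1 .. i+n-1 (all in seconds)."""
    return total_time - sum(intermediate_stop_times)


# Hypothetical values: 600 s from station i to station i+3,
# with stops of 25 s and 35 s at the two intermediate stations.
T = 600
stops = [25, 35]
print(road_travel_time(T, stops))  # 540
```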

Travel time prediction model
Support vector machine (SVM) is a method of data classification [8]. SVM can mine rules from limited data and predict the unknown phenomena. Its learning ability and generalization ability are superior to neural networks. SVM can be used for predicting the driving time of road segments. In SVM, the classification hyperplane can be expressed as:

$$w \cdot x + b = 0,$$

where x refers to the sample vector. The class interval can be expressed as:

$$\frac{2}{\|w\|}.$$

The process of solving the optimal hyperplane is equal to the process of solving

$$\min_{w,b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1.$$

Lagrange multiplier a is introduced to establish the Lagrange function. Then, the problem is transformed to:

$$\max_{a} \ \sum_{i} a_i - \frac{1}{2}\sum_{i}\sum_{j} a_i a_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i} a_i y_i = 0, \ a_i \ge 0.$$

Table 1: Parameters of the historical data-based prediction
Parameter   Meaning
...         The time needed by bus k driving from the first station to station j
A Stk       The time needed by bus k driving from the first station to station j
S           The number of the current station
N           The number of the terminal station

As SVM is mainly used for data classification, to predict the travel time, it is necessary to use the SVR algorithm [9]. Based on SVM, it is assumed that the input data is $x_i$ and the output data is $y_i$. In the M-dimensional space, the classification hyperplane is written as:

$$f(x) = w \cdot \phi(x) + b.$$

After the ϕ transformation of the input data, the insensitive loss function is set as:

$$L_\varepsilon\big(y_i, f(x_i)\big) = \max\big(0, \ |y_i - f(x_i)| - \varepsilon\big).$$

The SVR algorithm needs to find proper f to make E(ε) minimum:

$$E(\varepsilon) = \frac{1}{2}\|w\|^2 + C \sum_{i} L_\varepsilon\big(y_i, f(x_i)\big),$$

where w stands for the linear weight vector and C stands for the penalty factor. The above equation is solved by two relaxation vectors, $\delta_i$ and $\delta_i'$, then,

$$\max_{a, a'} \ -\frac{1}{2}\sum_{i}\sum_{j} (a_i - a_i')(a_j - a_j') K(x_i, x_j) - \varepsilon \sum_{i} (a_i + a_i') + \sum_{i} y_i (a_i - a_i') \quad \text{s.t.} \quad \sum_{i} (a_i - a_i') = 0, \ 0 \le a_i, a_i' \le C,$$

where $a_i$ and $a_i'$ are the corresponding Lagrange multipliers of $\delta_i$ and $\delta_i'$ and $K(x_i, x_j)$ is the kernel function. Finally, the output of the SVR algorithm can be written as:

$$f(x) = \sum_{i} (a_i - a_i') K(x_i, x) + b,$$

i.e., the prediction result of the travel time of the bus on the road.
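To make the SVR output concrete, the sketch below evaluates the ε-insensitive loss and the kernel expansion of a trained model in plain Python. It is only an illustration under stated assumptions: the study does not name its kernel, so a radial basis function (RBF) kernel is assumed here, and the support vectors, coefficients (standing for the differences of the Lagrange multipliers), bias, and `gamma` are hypothetical values rather than fitted results. Training, i.e., solving the dual problem, is not shown.

```python
import math


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube around y_true, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def rbf_kernel(u, v, gamma):
    """Assumed kernel: K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, coef, b, gamma):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + b.

    coef_i stands for the difference of the two Lagrange multipliers
    of support vector x_i.
    """
    return sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coef)
    ) + b


# Hypothetical trained model: two support vectors with made-up features
# (for example, segment length in km and average speed in km/h).
support_vectors = [(1.2, 30.0), (0.8, 45.0)]
coef = [12.0, -7.5]
travel_time = svr_predict((1.0, 35.0), support_vectors, coef, b=150.0, gamma=0.01)
print(round(travel_time, 1))
```

In practice the multipliers and the bias would come from a solver or a library implementation; the sketch only shows how a prediction is assembled once they are known.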

Stop time prediction model
The bus stop time at the station is affected by the number of people getting on and off the bus and the number of people on the bus when it is fully loaded. If a model is constructed based on the above factors, the support of a large amount of data is needed, and the generalization of the model is not strong. To predict the stop time, several data most similar to the current situation are found from the historical data, and the weighted average value is taken as the prediction result. The KNN algorithm [10] can realize it. The steps of predicting the stop time with the KNN algorithm are as follows.
(1) In the training data, the k nearest-neighbor data are found. The calculation formula of the distance is: where i refers to the bus number, m refers to the number of the bus to be predicted, j refers to the station number, and $T_j^i$ refers to the stop time of bus i at station j.
(2) The weighted average value of k data is calculated:
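The two steps can be sketched in Python as follows. Since the original distance and weighting formulas are not available here, the sketch assumes a Euclidean distance over the stop times at the stations already visited and inverse-distance weights; both are assumptions, and the function name `knn_stop_time` and the sample data are hypothetical.

```python
import math


def knn_stop_time(history, current, k):
    """Predict the stop time (s) at the target station.

    history: list of (stop_times_at_previous_stations, stop_time_at_target)
             for historical buses.
    current: stop times of the bus to be predicted at the same previous stations.
    """
    # Step 1: rank historical buses by (assumed) Euclidean distance.
    scored = sorted((math.dist(features, current), target)
                    for features, target in history)
    neighbors = scored[:k]
    # Step 2: (assumed) inverse-distance weights; the small constant
    # avoids division by zero for an exact match.
    weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
    weighted_sum = sum(w * t for w, (_, t) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Hypothetical historical records: stop times (s) at two earlier stations,
# followed by the stop time (s) at the target station.
history = [((23, 34), 40), ((17, 26), 50), ((100, 100), 200)]
print(round(knn_stop_time(history, (20, 30), k=2), 1))  # 45.0
```

With inverse-distance weights, historical buses whose earlier stops look most like the current trip dominate the estimate, which matches the idea of using the data most similar to the current situation.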

Case analysis
Xincheng District, Pingdingshan City, Henan, is located in the west of the area and is connected to the old city by urban roads and light rail. The transportation in the area is convenient, and it is the leading area of urban-rural integration. Therefore, the construction of intelligent urban and rural buses plays a positive role in developing the region. In this study, bus No. 16 was taken as an example. Bus No. 16 started from Xiangshan Temple and ended at the Zunhuadian government, with 48 stations. The studied road segment was from Xiangshan Temple to the municipal building, with ten stations in total, as presented in Table 2. Global Positioning System (GPS) operation data of bus No. 16 were obtained from the Pingdingshan bus company, and the data from July 1 to July 31, 2020, were collected. A model was established based on the data of the first 30 days, and the data of the last day were used for verification. GPS data included bus line name, direction, date, coordinate, speed, etc. The missing data were supplemented by the adjacent data. The duplicate data were deleted. Then, the data were normalized. Moreover, the bus was surveyed by follow-up, and data such as the door opening time, door closing time, the number of passengers getting on and off, arrival time, and departure time were recorded (Figure 1). Taking July 1 as an example, the operation data of the bus are presented in Table 3.

Table 2: Stations on the studied road segment
No.   Station
...
5     Henan University of Urban Construction
6     Municipal Building (Xiangyun Park)
7     Xiangshun Road Crossing
8     Municipal Approval Service Center
9     Municipal Building (Municipal Intermediate Court)
10    Municipal Transportation Bureau (China Guangfa Bank)
The prediction performance of the model was evaluated by the relative error, $y = |x - x'| / x$, where x is the actual value and x′ is the predicted value. The time of the bus arriving at the next station was predicted using the SVR + KNN method designed in this study. Moreover, the same data were also predicted using the historical data-based prediction method. The results are shown in Figure 2. It was seen from Figure 2 that there was a big gap between the results obtained by the historical data-based prediction method and the actual situation, and the results of the SVR + KNN method were closer to the actual values. The prediction error was calculated, and the results are presented in Table 4.

Table 4: Prediction errors of the two methods (s)
No.   Historical data-based   SVR + KNN
1     263                     70
2     117                     75
3     197                     68
4     188                     84
5     119                     33
6     160                     22
7     211                     19
8     171                     11
9     236                     10
10    149                     33

After calculation, the maximum error of the results obtained by the historical data-based prediction method was 263 s, the minimum error was 117 s, and the average error was 181.1 s, whereas the maximum error of the results obtained by the SVR + KNN method was 84 s, the minimum error was 10 s, and the average error was 42.5 s, all significantly smaller than the former. The relative errors of the two methods were calculated, and the results are shown in Figure 3.
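The summary statistics quoted above can be reproduced from the ten per-row errors of Table 4 with a short Python check. The error values are those reported in the table; the helper names `relative_error` and `summarize` and the values in the relative-error call are hypothetical, because the actual arrival times are not listed here.

```python
from statistics import mean


def relative_error(actual, predicted):
    """Relative error |x - x'| / x for actual value x and prediction x'."""
    return abs(actual - predicted) / actual


def summarize(errors):
    """Return the (maximum, minimum, average) of a list of errors in seconds."""
    return max(errors), min(errors), mean(errors)


# Errors (s) as reported in Table 4.
historical_errors = [263, 117, 197, 188, 119, 160, 211, 171, 236, 149]
svr_knn_errors = [70, 75, 68, 84, 33, 22, 19, 11, 10, 33]

print(summarize(historical_errors))  # (263, 117, 181.1)
print(summarize(svr_knn_errors))     # (84, 10, 42.5)

# Hypothetical example: a 400 s actual time predicted as 380 s.
print(relative_error(400, 380))  # 0.05
```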
It was seen from Figure 3 that the relative error of the SVR + KNN method was significantly smaller. The maximum relative error of the historical data-based prediction method was 48.53%, the minimum relative error was 11.85%, and the average relative error was 25.97%. In contrast, the maximum relative error of the SVR + KNN method was 11.8%, the minimum relative error was 1.62%, and the average relative error was 5.74%. Based on the above results, it was found that the SVR + KNN method could predict the arrival time of urban and rural buses with favorable accuracy.
To further verify the performance of the method, the arrival time of five successive buses was predicted. The road segment was the same as above. The average values of three errors of the prediction based on the historical data and the SVR + KNN method are presented in Table 5.
It was found from Table 5 that the error of the prediction of the arrival time based on the historical data had some fluctuations, the maximum error was 224 s, and the average value was 198 s; the error of the prediction of the arrival time based on the SVR + KNN method kept stable, about 50 s, and the average value was 48 s, which was 150 s less than the former. The results demonstrated that the SVR + KNN method had a good performance in predicting the arrival time of buses.

Discussion
The current research methods about intelligent buses can be roughly divided into three directions: first is for bus enterprises, mainly studying the scheduling, operation, and assessment of buses [12]; second is for government departments, mainly studying the planning of bus routes and the analysis of passenger flow distribution [13]; and the third is for passengers, such as arrival query [14]. The technologies used in arrival query include geographic information system [15], data mining [16], intelligent terminal [17], machine learning [18], etc. Based on the GPS data of buses, Wei and Sun [19] designed a bus scheduling planning model. The lower model coordinated the travel time by transfers between buses and between buses and subways, and the upper model allocated the bus to reduce the operation cost. Zhang et al. [20] studied the prediction of bus passenger flow, adopted the GM (1, 1) model, and verified the correctness of the method through experiments. To help passengers get information about the shortest route, Brata et al. [21] designed an efficient Android application that could guide users to the nearest bus station through the camera based on augmented reality. Suman and Bolia [22] proposed that passengers could be motivated to travel by bus through direct bus services and developed two mathematical formulas to optimize the bus network. Through the experiment on MATLAB, the method was applied to Delhi, India, and they found that the network replanned by the algorithm saved 8% of the travel time. This study mainly analyzed the prediction of the arrival time of urban and rural buses. To further improve the accuracy of prediction, the bus arrival time was divided. In the current research, there are many different methods to divide the bus travel time, for example, divide it into the stop time and travel time, or divide it into deceleration arrival, vehicle parking, and acceleration departure. 
This study selected the former method and chose different prediction methods based on the characteristics of the two periods. The experimental results demonstrated that the SVR + KNN method performed better than the historical data-based prediction method in predicting the arrival time of buses on the same road segment. The prediction error of the SVR + KNN method was significantly smaller than that of the historical data-based prediction method: the average errors of the two methods were 42.5 and 181.1 s, respectively, showing a significant difference, and the comparison of the relative error in Figure 3 showed the same result. Finally, in terms of the stability of the algorithms, the prediction error of the SVR + KNN method was always controlled at about 50 s, which was superior to that of the historical data-based prediction method.

Conclusion
According to the characteristics of long routes and great prediction difficulty, the arrival time of the urban and rural bus was divided into two sections and predicted by the SVR + KNN method. The experiment was carried out on a section of the route of bus No. 16. The results showed that the prediction errors of the SVR + KNN method were smaller, within 100 s, compared to the historical data-based prediction method; the maximum error, average error, and average relative error of the SVR + KNN method were 84 s, 42.5 s, and 5.74%, respectively. Therefore, the SVR + KNN method can satisfy the demands of passengers and can be promoted and applied in practice.