Developing a model to determine the number of vehicles lane changing on freeways by Brownian motion method


Lane-change maneuvers are an essential part of car trips. Drivers change lanes to follow their desired route to a destination or to improve their driving conditions or level of service. To change lanes, a driver must consider several factors that affect safety. Because appropriate data are lacking, and with them appropriate models for determining the number of lane changes on the road (an influential factor in accidents), this study collects such data in a new way. The Qazvin-Karaj freeway was selected as the case study. After cameras were installed and the images were processed, the SPSS and Expert Design statistical software packages were used for model development. A Brownian motion model was also used to construct the driver lane-change model. Among the SPSS models, logarithmic model number 2 gave the highest coefficient of determination ($R^2$ = 0.472), followed by models 3 and 9 with $R^2$ of 0.451 and 0.442, respectively. The Expert Design model gave a better fit, with $R^2$ = 0.786. The response variable $(N_{ch} + 0.52)^{0.74}$ was plotted in three dimensions against the distance from the front vehicle ($D_f$) and the distance from the rear vehicle ($D_b$). The distances from the front and rear vehicles affect the number of lane changes more strongly than the left and right distances. The observed and Brownian data had a small mean difference (0.018), and the standard deviation was also small. The correlation for this data pair is 0.912, which indicates close agreement between the outputs of the Brownian model and the observations.


Introduction
Car following and lane changing are two unavoidable actions while driving. Many car-following models have been proposed, while classical lane-changing models mainly focus on driver behavior and gap acceptance in lane selection [1,2]. Lane-changing and car-following models can be regarded as the basis of traffic flow theory. With the advent of automated and semi-automated vehicles, understanding how accurately lane-changing and car-following models represent drivers' decisions is critical to ensuring the safe operation of these vehicles and the surrounding traffic [3,4]. Researchers have studied car following for over fifty years, while fewer studies have addressed lane-changing behavior. This may be because 1) lane changing involves movement in two dimensions, and 2) relatively more vehicles (about five) are involved in a lane-changing event, whereas car following usually involves two vehicles, one following the other in the same lane [5]. Oliver and Lam (1965) introduced the first nonlinear model to determine the number of lane changes [6]. This model assumed that the number of lane-change maneuvers from lane i to lane i + 1 is proportional to the square of the density in lane i multiplied by the difference between the density and the critical density in lane i + 1. The Gipps model followed; it was designed to describe the behavior of cars and trucks entering, passing through, and leaving a section of road. It is intended to be used in conjunction with a car-following model [7], which imposes restrictions on a driver's braking, to calculate the safe speed of the following vehicle.
To examine drivers' lane-changing intentions, many researchers have used machine learning classifiers such as the Hidden Markov Model (HMM) [8][9][10], Support Vector Machine (SVM) [11,12], Bayesian network [13][14][15], artificial neural network (ANN) [16,17], and deep neural network [18,19]. Zheng and Hansen developed an HMM-based lane-change detection model using vehicle dynamic signals. They reported classification rates of 80.36 for left lane changes and 83.22 for right lane changes on a real dataset [10]. Kim and co-workers proposed an ANN + SVM model to predict drivers' lane-changing decisions [20]. Vehicle state and road surface condition information is augmented using ANN models, and the added information is passed to the SVM to detect drivers' decisions accurately.
Hu and co-workers used Bayesian and decision-tree methods to model lane changes. Their model predicts a driver's decision to merge or not to merge. The best results were obtained when the Bayesian and decision-tree classifiers were combined into a single classifier [21].
Hidas introduced a lane-changing model in the Intelligent Transport System Simulator (SITRAS). This model was developed for forced and cooperative lane changing under heavy traffic conditions, and each of its components is a complex process [22]. Time gaps are a better indicator of driver behavior than spatial gaps; a time gap is itself a function of the spatial gap and the speed of the rear vehicle. In this regard, Bham (2009) developed a forced lane-changing model based on time-gap acceptance. Accepted and rejected gaps between the target vehicle and the hypothetical front vehicle, and between the target vehicle and the hypothetical rear vehicle in the target lane, were analyzed when the front-rear vehicle pairs followed each other [23]. In 2016, Balal and co-workers developed a fuzzy inference system (ANFIS) that models a driver's binary decision on whether or not to change lanes on a freeway. This model can be used in lane-changing recommendation systems in smart vehicles [5].
Hetrick used observers to collect data on vehicles. Lane-changing durations ranged from 3.4 to 13.6 seconds; younger drivers took less time, while older drivers took longer. The average lane-changing duration was 6 seconds [24]. Lee and co-workers pointed out that the presence of an observer in such research can affect driver behavior and lead to atypical driving [4]. A different approach, used by Salvucci and Liu in 2002, evaluates lane-changing behavior in a driving simulator. Eleven participants were asked to drive on a simulated multi-lane highway and to report when they intended to change lanes and when they had completed the change. Based on these observations, the average lane-changing duration was estimated to be 5.14 seconds with a standard deviation of 0.86 seconds [25].
Brownian motion is the random motion of particles suspended in a fluid, caused by their collisions with the atoms or molecules of the fluid.
The Brownian motion of a macroscopic (visible) particle can be regarded as the net effect of many microscopic collisions. Brownian motion is named after Robert Brown, a Scottish botanist who observed pollen moving randomly in water. Brown described the motion in 1827 but could not explain it. The phenomenon remained unexplained until 1905, when Albert Einstein published a paper explaining that the pollen is moved by the water molecules. The mathematical description of Brownian motion is relatively simple and is important not only in physics and chemistry but also for other statistical phenomena [26,27]. The first person to propose a mathematical model for Brownian motion was Thorvald N. Thiele, in a paper published in 1880; the model in current use is the Wiener process. Today, mathematical models that describe Brownian motion are used in mathematics, economics, engineering, physics, biology, chemistry, and many other disciplines [28,29].
It is not easy to distinguish Brownian motion from motion due to other effects. In biology, for example, an observer must determine whether an organism is moving because it is motile (it can move on its own) or because it is undergoing Brownian motion [30]. Typically, the two can be distinguished because Brownian motion appears irregular and random, whereas self-propelled motion often follows one direction or rotates in a particular direction [31].
Brownian motion can also be illustrated with the following example. Imagine a person standing on a straight line who wants to start walking. To choose the direction, the person tosses a coin and takes the next step according to the result, then stops, tosses the coin again, and moves wherever the coin indicates. Many natural phenomena follow such a model, and it is essentially this random behavior that determines the direction and dynamics of the system [32].
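The coin-toss walk described above can be sketched in a few lines of Python. This is only an illustration under assumed conventions (a fair coin, unit steps, heads meaning one step forward and tails one step back); the function name `coin_toss_walk` is hypothetical and does not come from the study.

```python
import random

def coin_toss_walk(n_steps, seed=None):
    """Positions of a walker who steps +1 on heads and -1 on tails."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_steps):
        heads = rng.random() < 0.5      # assumed fair coin toss
        position += 1 if heads else -1  # one unit step forward or backward
        path.append(position)
    return path

walk = coin_toss_walk(1000, seed=1)
print("final position after 1000 tosses:", walk[-1])
```

With many small steps, the rescaled path of such a walk approaches Brownian motion, which is why the coin-toss picture is a common way to introduce it.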
Louis Bachelier (1900) first showed that financial markets follow random walk processes, so standard probability calculus can be used to model them. Random walk processes are essentially Brownian motion, in which future changes in the variable are independent of past changes [33]. Brownian motion has well-behaved mathematical properties, so probabilities can be estimated with high accuracy [34]. Analysts therefore often resort to independent processes, such as Brownian motion, when analyzing a multidimensional process of unknown origin (such as the stock market). Brownian motion theory and random walk models have been widely used in financial market modeling.
Speculation can thus be modeled with Bachelier's probabilistic approach, extensions of which remain in use today [35]. Studies by Osborne have shown that the natural logarithms of stock prices and monetary values can be regarded as a set of decisions in statistical equilibrium, and that this ensemble of price logarithms (generated over time) behaves very much like the motion of a large number of molecules of a substance. The steady-state probability distribution of the logarithm of a randomly selected stock price at a random time is precisely the probability distribution of a particle (molecule) in Brownian motion [36].
Although the methods used in previous research differ, all of them indicate that lane changes are not instantaneous events. Developing a lane-change model for a multi-lane traffic simulator is a challenging task. No clear rule is followed by the majority of drivers when deciding to change lanes, which calls for a random model. The behavior of drivers in the rear and front vehicles usually affects the lane-change process; for example, fast drivers change lanes more often than slow drivers. As stated above, the basis for simulating stock price fluctuations with Brownian motion was empirical: the rise or fall of stock prices is treated as a random move, despite economic stimuli and deterrents. In the present study, given the random nature of drivers' lane changes [37] (and, of course, the presence of obstacles and other vehicles), we simulate vehicle lane changes with Brownian motion. Past studies have generally examined the effect of traffic flow parameters on lane changing [38][39][40][41]. Their shortcoming is that they do not build models on parameters that drivers can perceive directly. Therefore, in addition to adapting the new "Brownian motion" method to lane changing, this study develops the model with the simplest parameters (i.e., the distance of each vehicle from the other vehicles and the surrounding obstacles) to fill this gap.

Methodology
As shown in Figure 1, after the literature was reviewed and its shortcomings were identified, statistical models and the Brownian motion model were selected to determine the number of lane changes. Once the study area was selected, data were collected through imaging and image processing; finally, the models were implemented, and their outputs were analyzed and compared.

Statistical analysis and regression model
A model is a representation of reality; where economic, technical, or other constraints make it impossible to study a problem in practice, a model makes it possible to understand how the system behaves. After previous research on models and variables affecting accidents was reviewed, the study area was determined, and the required information was collected. Different models were obtained by fitting linear and nonlinear functions to the collected data with regression techniques and calibrating them. Finally, after the accuracy and validity of the fitted models were checked through statistical tests, the appropriate model is introduced.
Many attempts were made, using SPSS and Expert Design software, to find the best transformation of the dependent variable and the combination of independent variables that gives the strongest relationship. Both linear and nonlinear functions were investigated. After several models were fitted to the data and preliminary statistical tests were carried out, the most appropriate models were selected from dozens of candidates, and the results were discussed. The factors affecting lane changing were examined in the literature review; this research examines the spatial and distance parameters. In this section, the proposed models are checked using the F statistic and its statistical significance. $R^2$, adjusted $R^2$, the significance tests of the regression coefficients, and the interpretation of the coefficient values were also examined, and finally, the appropriate model was selected. The least-squares method was used to estimate the model parameters because of its favorable statistical properties.
(SSE: sum of squared errors; SSY: total sum of squares; MSE: mean squared error)
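The goodness-of-fit quantities named above can be illustrated with a small least-squares example. The sketch assumes the standard textbook definitions, $R^2 = 1 - SSE/SSY$, $MSE = SSE/(n - p - 1)$, adjusted $R^2 = 1 - MSE/(SSY/(n - 1))$, and $F = ((SSY - SSE)/p)/MSE$ for a model with p predictors, and it fits a single-predictor line for brevity. The data values and the function names `fit_line` and `fit_statistics` are hypothetical; they are not the study's data or its SPSS procedure.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_statistics(x, y, intercept, slope):
    """SSE, SSY, MSE, R^2, adjusted R^2 and F for a one-predictor fit."""
    n, p = len(y), 1
    mean_y = sum(y) / n
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssy = sum((yi - mean_y) ** 2 for yi in y)
    mse = sse / (n - p - 1)
    return {
        "SSE": sse,
        "SSY": ssy,
        "MSE": mse,
        "R2": 1 - sse / ssy,
        "adj_R2": 1 - mse / (ssy / (n - 1)),
        "F": ((ssy - sse) / p) / mse,
    }

# Hypothetical illustration data: distance to the front vehicle (m)
# and number of lane changes.
distance = [5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
lane_changes = [9.0, 7.0, 6.0, 4.0, 4.0, 2.0, 1.0]
b0, b1 = fit_line(distance, lane_changes)
stats = fit_statistics(distance, lane_changes, b0, b1)
print(round(stats["R2"], 3), round(stats["adj_R2"], 3), round(stats["F"], 1))
```

A large F with a small significance value indicates that the fitted relationship is unlikely to be due to chance, while adjusted $R^2$ penalizes models that add predictors without reducing SSE.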

Brownian motion model
The first goal in this section is to simulate a one-dimensional Brownian motion $(W_t)_{t \ge 0}$ with a fixed initial value $W_0 = \omega_0 \in \mathbb{R}$. Given times $t_k > \dots > t_1 > t_0 = 0$, a random walk is started at $\omega_0$: the step from $\omega_0$ to $\omega_1$ gives the value of $W_{t_1}$, and the walk continues until $\omega_k$ gives the value of $W_{t_k}$. Although simulation is in general a complex process, an exact simulation at the chosen times is easily obtained from the increment properties of Brownian motion. The simulation procedure is as follows. Consider a sequence $(z_1, z_2, \dots)$ of independent, identically distributed draws from the $N(0,1)$ distribution. The values $(\omega_0, \dots, \omega_k)$ are then defined recursively from Eq. (2), where $\omega_0$ is the fixed initial value.
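The random walk construction described above can be sketched as follows. Since the recursion itself is not reproduced in this text, the sketch assumes the standard form $\omega_i = \omega_{i-1} + \sqrt{t_i - t_{i-1}}\, z_i$ with independent $z_i \sim N(0,1)$; the function name `simulate_brownian_path` and the time grid are illustrative choices.

```python
import math
import random

def simulate_brownian_path(times, w0=0.0, seed=None):
    """Values of a standard Brownian motion at the given increasing times.

    Assumed recursion: w_i = w_{i-1} + sqrt(t_i - t_{i-1}) * z_i,
    with z_i independent N(0, 1) draws and w_0 the fixed initial value.
    """
    rng = random.Random(seed)
    path = [w0]
    for t_prev, t_next in zip(times, times[1:]):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] + math.sqrt(t_next - t_prev) * z)
    return path

times = [0.1 * i for i in range(101)]  # t_0 = 0, ..., t_100 = 10
path = simulate_brownian_path(times, w0=0.0, seed=42)
print(len(path), path[-1])
```

Because each increment is drawn from its exact distribution, the simulated values have the correct joint distribution at the grid times regardless of how coarse the grid is.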
A structure similar to Eq. (3) for a Brownian motion $(X_t)_{t \ge 0}$ with drift $\alpha$ and variance $\sigma^2$ is written as Eq. (4), which gives the recursive relation of the Brownian motion.
For a time-dependent drift $\alpha(t)$ and a time-dependent variance $\sigma^2(t)$, the relation is written as Eq. (4).
Because $\alpha$ and $\sigma$ depend on time, the integrals in Eq. (4) may not be efficiently computable. They can be replaced by quadrature formulas; in the simplest case, $\alpha$ and $\sigma$ are approximated on the interval $[t_{i-1}, t_i]$ by $\alpha(t_{i-1})$ and $\sigma(t_{i-1})$. In this case, Eq. (4) is replaced as follows.
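The left-endpoint approximation described above can be sketched as follows. The sketch assumes the recursion $x_i = x_{i-1} + \alpha(t_{i-1})(t_i - t_{i-1}) + \sigma(t_{i-1})\sqrt{t_i - t_{i-1}}\, z_i$ with independent $z_i \sim N(0,1)$, which reduces to the constant-coefficient case when $\alpha$ and $\sigma$ do not depend on time. The function name and the example coefficient functions are hypothetical.

```python
import math
import random

def simulate_bm_with_drift(times, alpha, sigma, x0=0.0, seed=None):
    """Brownian motion with drift alpha(t) and volatility sigma(t).

    alpha and sigma are callables of time and are frozen at the left
    endpoint of each interval (an approximation when they vary in time).
    """
    rng = random.Random(seed)
    path = [x0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        z = rng.gauss(0.0, 1.0)
        step = alpha(t_prev) * dt + sigma(t_prev) * math.sqrt(dt) * z
        path.append(path[-1] + step)
    return path

grid = [0.5 * i for i in range(21)]  # t_0 = 0, ..., t_20 = 10
# Constant coefficients: the recursion is exact at the grid times.
constant = simulate_bm_with_drift(grid, lambda t: 0.5, lambda t: 2.0, seed=1)
# Time-dependent coefficients (hypothetical): left-endpoint approximation.
varying = simulate_bm_with_drift(grid, lambda t: 0.1 * t, lambda t: 1.0 + 0.05 * t, seed=1)
print(constant[-1], varying[-1])
```

A finer time grid reduces the error of the left-endpoint approximation when the coefficients change quickly.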

Multi-dimensional Brownian motion simulation
If the d-dimensional Brownian motion has covariance matrix $\Sigma$, a decomposition $\Sigma = AA^{\top}$ is required. In the simplest case, $\alpha = 0$ (no drift) and $\Sigma = I_d$, the random walk construction of Eqs. (2) and (3) carries over directly to the d-dimensional setting, with $z_i$ and $\omega_i$ now vectors in $\mathbb{R}^d$ (each $z_i$ is built from d independent draws from the $N(0, 1)$ distribution). This is equivalent to simulating each component of the d-dimensional Brownian motion independently using the one-dimensional Eq. (3). The generalization of Eq. (3) to the d-dimensional model with initial vector $x_0$ follows.
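A minimal sketch of the d-dimensional construction with a constant covariance matrix is given below. It assumes the recursion $x_i = x_{i-1} + \alpha (t_i - t_{i-1}) + \sqrt{t_i - t_{i-1}}\, A z_i$, with $A$ taken as the Cholesky factor of the covariance matrix (one possible choice of $A$ with $AA^{\top}$ equal to the covariance). The helper names and the two-dimensional example values are hypothetical.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular A with A A^T = cov (cov symmetric positive definite)."""
    d = len(cov)
    a = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(cov[i][i] - s)
            else:
                a[i][j] = (cov[i][j] - s) / a[j][j]
    return a

def simulate_bm_nd(times, alpha, cov, x0, seed=None):
    """d-dimensional Brownian motion with constant drift and covariance."""
    rng = random.Random(seed)
    a = cholesky(cov)  # constant covariance: factor once, reuse at every step
    d = len(x0)
    path = [list(x0)]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        az = [sum(a[r][c] * z[c] for c in range(d)) for r in range(d)]
        prev = path[-1]
        path.append([prev[r] + alpha[r] * dt + math.sqrt(dt) * az[r]
                     for r in range(d)])
    return path

cov2 = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical 2 x 2 covariance matrix
path2 = simulate_bm_nd([0.1 * i for i in range(51)], [0.0, 0.0], cov2,
                       [0.0, 0.0], seed=11)
print(path2[-1])
```

With the identity matrix as covariance, `cholesky` returns the identity and the components evolve independently, which matches the simplest case described above.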
With $z_i, x_i \in \mathbb{R}^d$, each step requires a product with the coefficient matrix, which is computationally demanding. If $\alpha$ or, especially, $\Sigma$ depends on time t, the computational cost increases further. Here Eq. (4) is generalized: given the initial vector $x_0$, we have Eq. (8).
In other words, the coefficient matrix A in Eq. (8) must be recomputed at each step as a factorization matrix according to Eq. (9). For a standard d-dimensional Brownian motion $(W_t)_{t \ge 0}$ with no drift and $\Sigma = I_d$, the Brownian bridge construction can be applied to each component independently (in the same way as described for the random walk construction above). For a d-dimensional Brownian motion $(X_t)_{t \ge 0}$ with drift $\alpha$ and covariance matrix $\Sigma$, the Brownian bridge construction is still applied to a standard Brownian motion $(W_t)_{t \ge 0}$, and $X_t$ is then obtained as $X_t = \alpha t + AW_t$, with $\Sigma = AA^{\top}$.
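The text above does not spell out the Brownian bridge construction, so the following is only a sketch of its basic building block for one component. It relies on the standard conditional distribution of a Brownian motion between two known values: given $W_{t_l} = w_l$ and $W_{t_r} = w_r$, the value at an intermediate time $s$ is normal with mean $((t_r - s) w_l + (s - t_l) w_r)/(t_r - t_l)$ and variance $(s - t_l)(t_r - s)/(t_r - t_l)$. The function name `bridge_sample` and the numerical values are hypothetical.

```python
import math
import random

def bridge_sample(t_left, w_left, t_right, w_right, s, rng):
    """Sample W(s) given W(t_left) = w_left and W(t_right) = w_right."""
    span = t_right - t_left
    mean = ((t_right - s) * w_left + (s - t_left) * w_right) / span
    var = (s - t_left) * (t_right - s) / span
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
# Draw the end point first, then fill in the midpoint conditionally.
w_end = math.sqrt(10.0) * rng.gauss(0.0, 1.0)  # W(10) ~ N(0, 10)
w_mid = bridge_sample(0.0, 0.0, 10.0, w_end, 5.0, rng)
print(w_end, w_mid)
```

Repeating this step on successive sub-intervals fills in a whole path from coarse to fine. For a d-dimensional motion, the same step would be applied to each component of a standard Brownian motion before forming $\alpha t + AW_t$.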

Data collection
Four cameras were mounted on the vehicle, facing in different directions, to collect data. The cameras are connected to a processing center, and image processing is used to recognize other vehicles and the guard-rails along the freeway (Figure 2). At each detection instant, the distance to the obstacle (another vehicle or a guard-rail) is recorded. In this way, the distances from the vehicle under study to the left, right, front, and rear are recorded at every moment of its movement. This was done for 30 cars, each recorded for 40 minutes, giving a total of 72,000 rows of raw data.
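The reported totals imply a recording rate that can be checked with simple arithmetic. The sketch assumes that every car was recorded for the full 40 minutes and that each row corresponds to one recording instant; the variable names are illustrative.

```python
n_vehicles = 30
minutes_per_vehicle = 40
total_rows = 72_000

recorded_seconds = n_vehicles * minutes_per_vehicle * 60
rows_per_second = total_rows / recorded_seconds
print(rows_per_second)  # 1.0, i.e. one row of distances per second
```

Under these assumptions, the dataset holds one set of distance readings per vehicle per second.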

Case study
The study area is a 103 km freeway with three lanes in each direction, between the Alborz and Qazvin provinces of Iran; it is part of Freeway 2 (Tehran-Tabriz). This freeway is the most accident-prone in the country and one of its important transportation routes.

Regression models (by SPSS)
Various models were implemented in SPSS, and the 11 models whose significance was confirmed are reported in Table 1. None of the resulting models achieved a satisfactory R-square. Model number 2, which is logarithmic, gave the highest coefficient of determination, 0.472, followed by models 3 and 9 with coefficients of determination of 0.451 and 0.442, respectively. Among these, model number 9 may be preferable because it is simple to use and its coefficient of determination is only slightly lower.

Regression models (by Expert Design)
Because of the low R-square values of the models in the previous section, Expert Design software was used to examine many more models, taking advantage of its capabilities. After the data were entered into the software, a large number of different models were extracted and compared, and the most appropriate model was obtained as Eq. (10):

$$(N_{ch} + 0.52)^{0.74} = -0.426 D_f - 0.753 D_b + 11.03 D_l \qquad (10)$$

The model in Eq. (10) has a higher coefficient of determination than the models in Section 4.1; as seen in Table 2, the significance of the model is confirmed (Sig. < 0.005). With a coefficient of determination of 0.786, the quality of the model increased significantly. Also, according to Figure 3, the normality and fit of the model are good, although the coefficient of determination could still be increased with further measures. The value of the lambda parameter (λ) was determined to be 0.74 in its optimal state after many trials (Figure 4). The constant (k) was set at 0.52 to obtain the best model among the available options. In Figure 5, the response variable $(N_{ch} + 0.52)^{0.74}$ is plotted in three dimensions against the combined effect of changes in the distance from the front ($D_f$) and the distance from the back ($D_b$). These two distances affect the number of lane changes more strongly than the distances from the left and right. The same relationship is shown as a contour plot in Figure 6, which expresses the effects more clearly. As $D_f$ and $D_b$ increase, the response variable decreases to its minimum value. In the other three corners of the graph, where the response variable is large, at least one of $D_f$ or $D_b$ is low, i.e., the distance from the front or back is much reduced. We then sought a model that is simpler to implement and apply and that gives a higher coefficient of determination.
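To use Eq. (10) for prediction, the power transformation has to be inverted. The sketch below takes the coefficients, λ = 0.74 and k = 0.52 from the text, and assumes that $D_l$ denotes the distance from the left and that all distances are in meters (neither is stated explicitly next to the equation). The function names and input values are hypothetical.

```python
LAMBDA = 0.74  # power of the transformation, as reported in the text
K = 0.52       # constant added to N_ch before transforming

def transformed_response(d_f, d_b, d_l):
    """Right-hand side of Eq. (10): the value of (N_ch + k) ** lambda."""
    return -0.426 * d_f - 0.753 * d_b + 11.03 * d_l

def predict_lane_changes(d_f, d_b, d_l):
    """Invert the transformation to recover N_ch from the distances."""
    y = transformed_response(d_f, d_b, d_l)
    if y < 0:
        # A negative value cannot be raised to the power 1 / lambda;
        # such inputs lie outside the range where the model is usable.
        raise ValueError("linear predictor is negative; inputs out of range")
    return y ** (1.0 / LAMBDA) - K

# Hypothetical distances (assumed to be in meters).
print(predict_lane_changes(d_f=10.0, d_b=5.0, d_l=3.0))
```

The guard against negative values matters because the linear part of Eq. (10) has negative coefficients on $D_f$ and $D_b$, so large front and rear distances can push it below zero.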

Brownian motion model
Because of the low R-square values of the regression models, the way vehicles change lanes was studied further, and the graph of the recorded data was examined more carefully. Figure 7 shows the distribution of lane-changing data along part of the route. This distribution closely resembles Brownian motion. For this reason, the Brownian motion model was simulated as described in Section 3.3, and the following outputs were obtained.
The data collected through imaging and distance recording were entered into the Brownian model, and its output was compared with the observed values using a paired statistical test. According to Table 3, the observed and Brownian data pairs had a small mean difference (0.018), and the standard deviation was also small. According to Table 4, the correlation is 0.912, which is a good value and indicates close agreement between the outputs of the Brownian model and the observations (this correlation is also shown in Figure 8). The correlation is also statistically significant, as shown by the very low value of Sig.
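The paired comparison described above can be illustrated with a short sketch that computes the mean and standard deviation of the paired differences, the paired t statistic, and the Pearson correlation. The numbers below are made-up illustration values, not the study's data, and the function name `paired_summary` is hypothetical.

```python
import math

def paired_summary(observed, simulated):
    """Mean difference, SD of differences, paired t statistic, Pearson r."""
    n = len(observed)
    diffs = [o - s for o, s in zip(observed, simulated)]
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    r = cov / math.sqrt(var_o * var_s)
    return mean_diff, sd_diff, t_stat, r

# Hypothetical lane-change counts: observed vs. model output.
observed = [3.0, 5.0, 2.0, 8.0, 6.0, 4.0]
simulated = [3.2, 4.7, 2.1, 7.6, 6.3, 4.2]
mean_diff, sd_diff, t_stat, r = paired_summary(observed, simulated)
print(round(mean_diff, 3), round(sd_diff, 3), round(t_stat, 3), round(r, 3))
```

A mean difference near zero together with a correlation near one is the pattern that indicates close agreement between model output and observations.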
The results of the model developed by the Brownian motion method are examined in Figures 9A-9D. Figure 9A shows (as a single factor) the number of lane changes as a function of the distance of the vehicle under study from the vehicle in front. When this distance is short, the number of lane changes is at its peak, and as the distance increases, the number of lane changes decreases. Interestingly, when the distance exceeds a specific value (65 meters), the number of lane changes increases slightly again. Figure 9B shows the effect of the distance from the rear car, which produces more oscillations. It is similar to Figure 9A, except that the turning point is near 50 meters and the increase in lane changes after this point is steeper. Figures 9C and 9D evaluate the distances from the left and right. Their diagrams are almost the same: the number of lane changes increases as the distance increases to 5 or 6 meters and then decreases. The effect of these two variables is smaller than that of the distances from the front and rear. Finally, an attempt was made to find the relationship between traffic density in a given lane and the number of lane changes made by vehicles in that lane per hour. Thus, in Figure 10A … right. In Figure 10C, the number of lane changes increases with density as in lane 2 but with a greater slope. The density in this lane is, however, lower than in the previous two lanes, which may explain the increase in the number of lane changes at the end of the chart.

Conclusion
Because appropriate models are needed for determining the number of lane changes on the road (an influential factor in accidents), this study collected pertinent data in a new way and sought the best models. Also, for the first time, the Brownian motion model was used and adapted to lane-changing data. Various models were implemented in SPSS, and the significance of 11 of them was confirmed. None of the resulting models achieved a satisfactory R-square. Model number 2, which is logarithmic, gave the highest coefficient of determination, 0.472, followed by models 3 and 9 with coefficients of determination of 0.451 and 0.442, respectively. Expert Design software was then used to improve the results, and after a large number of models were reviewed, model (10) gave a better fit, with an R-square of 0.786. The response variable $(N_{ch} + 0.52)^{0.74}$ was plotted in three dimensions against changes in the distance from the front ($D_f$) and the distance from the back ($D_b$). The distances from the front and back affect the number of lane changes more strongly than the distances from the left and right. Because of the low R-square values of the regression models, the way vehicles change lanes was studied further, and the graph of the recorded data was examined more carefully. Figure 7 shows the distribution of lane-change data along part of the route; the observed distribution closely resembles Brownian motion. The data collected through imaging and distance recording were entered into the Brownian model, and its output was compared with the observed values using a paired statistical test. The observed and Brownian data pairs had a small mean difference (0.018), and the standard deviation was also very small. The correlation for this data pair is 0.912, which is a good value and indicates close agreement between the outputs of the Brownian model and the observations. The correlation is also statistically significant, as shown by the very low value of Sig. Given the similarity between the nature of Brownian motion and the lane-change data, and the practical confirmation of this similarity by the results, the Brownian model can be studied in further detail to gain a better understanding of drivers' lane changing. Distance-based models can give users a clearer view of lane changing. Naturally, a driver changes lanes for a reason (only in rare cases does a person change lanes for no reason). Whatever the reason for a lane change, it ultimately shows up as a change in the distance from adjacent vehicles or guard-rails. Therefore, recording and analyzing distances will play an important role in the study of lane changing.

Funding information:
The authors state no funding involved.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Conflict of interest:
The authors state no conflict of interest.