Trip generation modeling for a selected sector in Baghdad city using the artificial neural network

This study aims to construct models that can be used to forecast trip production in the Al-Karada region of Baghdad city from socioeconomic characteristics, using different statistical approaches to trip generation modeling, namely the artificial neural network (ANN) and multiple linear regression (MLR). To accomplish this aim, the research region was divided into 11 zones. Questionnaire forms were distributed based on the required sample size of 1,170. Only 1,050 completed forms were received, giving a response rate of 89.74% for the research region. The collected data were processed using the ANN technique in MATLAB v20. The same database was used to develop the MLR model with the stepwise regression technique in the SPSS v25 software. The results indicate that trip generation is related to family size and composition, gender, the number of students in the family, the number of workers in the family, and car ownership. The ANN prediction model is more accurate than the MLR prediction model: the average accuracy (AA) was 83.72% for the ANN model but only 72.46% for the MLR model.


Introduction
Urban transport is a multidimensional concept and a part of metropolitan mobility that combines the movement of goods, people, and information between urban regions [1]. Transportation is defined as the movement of people, goods, and services from one place to another under desirable conditions, whereas transport planning is concerned with developing a plan that takes into account the economic, social, and environmental effects on the population in order to achieve positive objectives. The fundamental aim of transportation planning is to accommodate the need for mobility in order to provide effective access to the different activities that satisfy human needs. For these reasons, it is important to prepare comprehensive analyses of the regions in order to determine the origins and destinations of trips and to create statistical models that forecast travel movements. Recent studies have therefore concentrated on the appropriateness and robustness of the transportation system, as well as on future demand [2]. Trip demand refers to the number of individuals or vehicles that could be expected to move on a given section of a transportation network during a given time span, based on land use, social, economic, and environmental variables. Trip demand forecasting estimates the number, type, and origin of "trips" (source and target) on the transportation network [3]. The problem of urban transportation is caused by a number of interrelated factors, including high urban population growth, the actual expansion of urban areas as a result of population redistribution, improved living conditions due to higher incomes, and the consequent increased dependence on private automobiles [4]. An important criterion of a well-established transport system is the anticipation and availability of transport services [5,6].
In addition, the implementation of comprehensive transportation planning may help minimize the difficulties that could arise in the future and pave the way for a sustainable urban transportation system [7].
The modeling of travel demand plays a basic role in planning effective transport networks, as it offers useful data on traveler preferences and predicts current and future travel demand within a rational framework. Travel demand is defined as the number of individuals or vehicles that may be expected to use a segment of a transport system during a specific period of time under a variety of environmental, economic, social, and land use circumstances. Travel demand forecasting estimates the number, type, and origin of trips on a transport network [3]. Numerous studies have examined the relationship between urban transport and passenger travel demand. Research has documented the influence of land use and the built environment on trip production by activity and purpose, trip frequency, location choice, choice of geographic scale, and mode choice [8]. The demand for a metropolitan center in an area depends on the relative accessibility, connectivity, and appropriateness of transportation [6]. In particular, the four-step travel demand forecasting method, known as the urban transport modeling system, typically translates metropolitan activities into trips and estimates the relationship between activities and travel by modeling trip generation, trip distribution, mode choice, and traffic assignment. Trip generation prediction is the first stage of the traffic demand forecasting process. Its aim is to identify the factors that explain travel behavior and to create statistical models that synthesize this behavior from travel survey data, land use, and household characteristics. Conventionally, trip generation predictions are established independently of any direct consideration of the transportation network.
This means that trips produced in or attracted to a zone are a function only of the characteristics of the zone itself and not specifically of the transportation network on which the trips are made [4]. The rising population and geographic development of cities improve urban public transportation accessibility [9]. As public transportation is likely to make metropolitan districts more accessible and sustainable, improving public transportation accessibility may be considered a primary goal of transportation development. A more efficient transportation system lowers transaction costs, allows economies of scale and specialization, extends options, improves commerce, consolidates marketing, and increases society's income and welfare [10].
In practice, developing a model from an empirical dataset is not always achievable. There are various reasons for this, including nonlinearity and/or collinearity between parameters, and the fact that building a model is sometimes simply too hard. As a result, the artificial neural network (ANN) methodology is a critical component of model development, data analysis and assessment, and prediction, independent of statistical approaches [11]. With ANNs, the data can be simulated to develop an optimal model that explains the phenomenon and can be applied in most cases and circumstances.
The use of neural networks in statistics offers the following contributions:
• The approach is seen as a modern data analysis paradigm in which patterns are not specified explicitly but are defined implicitly by the network.
• It provides advanced pattern recognition capabilities.
• It allows for analysis in circumstances in which conventional methods would be extremely tedious or nearly impossible to interpret.
Multiple linear regression (MLR) is commonly used in trip generation modeling, but it relies on specific assumptions. Alternative modeling techniques with less stringent assumptions are required when the conditions for using MLR are not met. The ANN method is model free: it is a data-driven method that does not make strict model assumptions such as normality, linearity, or mutual independence of the independent variables. This is why ANNs may be more advantageous for trip generation modeling. To improve the efficiency of the predicted model, ANN models must be developed systematically. In this study, the same database used to develop the multiple regression model was used to develop a multilayer feed-forward back-propagation ANN model with the neural network tool in the MATLAB v20 software, because MATLAB allows nonexperts to develop neural networks.

Literature review
Al-Hasani developed relationships between daily trips and socioeconomic characteristics for the Al-Karkh region of Baghdad city [12]. The research area was divided into 10 zones, and the trip generation equations were obtained using both MLR and cross-classification techniques. According to the findings, the overall number of trips per person per household is linked to family size and structure variables, such as the number of people above the age of 6, the number of males, and the number of male workers. It is also related to car availability and the type of housing unit. For the whole research region, a model with a coefficient of determination (R²) of 0.678 was developed [12].
Naser and Mahdi used ANNs to model trip generation in their research on the initial stage of transport planning [13]. The research evaluated the effectiveness of ANNs using data obtained for the central business district of Nasiriyah city by calculating all trips (y) generated by this sector. The ANN model outperformed statistical techniques in forecasting trips: the coefficient of determination (R²) for the total number of trips (y) was 0.948 for the ANN and 0.871 for the statistical approaches, which is regarded as an acceptable relationship. Moreover, the ANN prediction model was more accurate for the prediction process [13].
Peng et al. sought to improve prediction accuracy by combining the excellent fitting ability of neural networks with the excellent global search ability of genetic algorithms in a trip generation prediction model, and they designed two types of trip generation prediction models based on genetic algorithms and neural networks [14]. Much work remains to be done in this area. In short, integrating genetic algorithms and neural networks with transportation prediction is one route toward precise and rational traffic planning and design [14].
In recent years, transport planners have increasingly used neural networks to develop transportation models, particularly trip demand models. Over the last decade, a growing amount of literature has examined the efficiency of neural network techniques for estimating and analyzing travel demand [15].
Previous research studies have shown, based on R² values, that the back-propagation ANN model outperforms the multiple regression model in trip generation prediction. In this study, the data collected in the field were processed using multilayer feed-forward neural networks in MATLAB v20, which proved more flexible than SPSS; the latter might not provide the required results because of the type of training algorithm it uses.

Research objectives
The transport network is considered major infrastructure, and it reflects the degree of development of an area. According to the Traffic Police Directorate, the number of vehicles in Baghdad has more than doubled since 2003. Land use has changed in several sectors, and the expansion of residential areas is noticeable in others. As a result, performing daily activities has become a burden that grows with each passing day. This research is intended to create a statistical prediction model that explains travel behavior and the relationships under the considered hypothesis. The main aim of the research is therefore to determine the pattern of household travel characteristics in the research region and to develop statistical models for trip generation. The specific objectives are as follows:
• Obtain accurate and detailed data on household travel characteristics, wherever possible, in the Al-Karada region of the Baghdad urban area.
• Develop trip generation models using the ANN and MLR techniques.

Research methodology
Data collection is a critical and one of the most challenging stages of the urban transportation planning process. It requires comprehensive investigation, extensive planning, qualified interviewers, money, and time, as well as some governmental assistance. Data collection should follow a sequence of steps in order to avoid confusion [16]. The following steps are appropriate:

The study area zoning
To simplify data collection in the transportation planning process, the research region is divided into a set of zones that help to link the origin and destination of each trip geographically. Zones within the outermost cordon of the study area are known as internal zones, whereas the others are known as external zones [17].

Zone coding
Baghdad city was divided into 10 sectors, and each sector was split into a series of zones for data collection and monitoring, giving a total of forty-five traffic analysis zones (TAZs). The zoning is based on the administrative boundaries of the municipal councils.
In this study, the study region was divided into 11 zones.

Sample size
In urban transportation planning studies, data are collected by one of two methods: home interviews or the distribution of questionnaire forms. For both methods, the necessary sample size must be determined, and it depends on the total population living in the research region. For accuracy, census figures for inhabitants and households were obtained from the research area's municipal council. The research region has 91,340 residents and a total of 23,296 families. Because it is not practical to collect data from the entire population of the research area, the sample size was calculated according to the population of the research region, using the minimum and optimal sample sizes given in Table 2 [19]. According to Table 2, the appropriate sample size for the population of the research region is 1 in 20. As a result, the required sample size is as follows:
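As a quick arithmetic check, the sketch below applies the 1-in-20 rate to the 23,296 households and recomputes the response rate quoted in the abstract. Rounding up to a whole household is an assumption of this sketch; 23,296 / 20 gives about 1,165, and the text does not show how the reported required sample of 1,170 was obtained from it.

```python
import math

# Figures quoted in the text for the Al-Karada study region.
HOUSEHOLDS = 23_296      # total number of families
ONE_IN = 20              # "1 in 20" sampling rate read from Table 2
FORMS_REQUIRED = 1_170   # required sample size reported in the abstract
FORMS_RECEIVED = 1_050   # completed forms received


def required_sample(households: int, one_in: int) -> int:
    """Households to survey at a '1 in N' rate, rounded up (assumed rounding rule)."""
    return math.ceil(households / one_in)


def response_rate(received: int, issued: int) -> float:
    """Percentage of issued forms that came back completed."""
    return 100.0 * received / issued


print(required_sample(HOUSEHOLDS, ONE_IN))                      # 1165
print(round(response_rate(FORMS_RECEIVED, FORMS_REQUIRED), 2))  # 89.74
```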

Household trip generation modeling
To develop a trip generation model that assumes a relationship between the number of trips produced (the dependent variable) and the socioeconomic characteristics (the independent variables) in the region under

Development of MLR model
The MLR models of trip production were developed using the SPSS v25 software. The stepwise method is widely used to derive prediction regression models from a set of candidate independent variables [19]. The independent variable with the greatest F-test value is chosen as the first variable to enter. The procedure continues as long as at least one remaining variable exceeds the entry criterion: the method considers adding a further independent variable to improve the model, evaluating the F value of each candidate against the selected entry criterion. Either the F value or the probability of F can be used as the entry criterion. In this analysis, a probability of F equal to 0.05 was used, which corresponds to an F value of 3.48. Table 3 shows the descriptive statistics for the independent variables used in developing the trip production models. Table 4 summarizes the stepwise regression models at a 95% confidence level. In the model of total trips, the number of students (X5), the number of workers (X4), the number of car owners (X15), family size (X1), family income (X11), and the number of males (X2) are the effective independent variables, whereas the number of persons aged 19-24 years (X8) has a negative effect on these trips.
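The entry rule described above can be sketched with NumPy as forward selection on the partial F statistic. This is an illustration under stated assumptions, not the SPSS implementation: it only adds variables (the SPSS stepwise method can also remove a variable that no longer meets a removal criterion), it uses a fixed F-to-enter threshold (here the 3.48 quoted in the text) instead of a probability, and the data in the usage lines are synthetic.

```python
import numpy as np


def rss(design: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit of y on the columns of `design`."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_stepwise(X: np.ndarray, y: np.ndarray, f_enter: float = 3.48) -> list[int]:
    """Return column indices of X in the order they enter the model.

    At each step every remaining column is tried; the one with the largest
    partial F statistic enters if that F exceeds f_enter. Variables are never
    removed once entered (forward selection only).
    """
    n, k = X.shape
    selected: list[int] = []
    while True:
        # Current model: intercept plus the variables selected so far.
        base = np.column_stack([np.ones(n)] + [X[:, j] for j in selected])
        rss_base = rss(base, y)
        best_j, best_f = None, -np.inf
        for j in range(k):
            if j in selected:
                continue
            full = np.column_stack([base, X[:, j]])
            rss_full = rss(full, y)
            df_resid = n - full.shape[1]
            # Partial F for one added variable: drop in RSS over residual variance.
            f = np.inf if rss_full == 0 else (rss_base - rss_full) / (rss_full / df_resid)
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None or best_f <= f_enter:
            return selected
        selected.append(best_j)


# Synthetic example: y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)
print(forward_stepwise(X, y))
```

Because the partial F of a pure-noise column follows an F distribution, such a column can still enter occasionally at a 0.05-level threshold; that is a property of the criterion, not of this sketch.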

Modeling trip generation using ANNs
The database used to develop the MLR model was also used to develop a multilayer feed-forward back-propagation ANN model with the neural network tool in MATLAB v20. To forecast trips from the 1,050 observations, a multilayer feed-forward back-propagation neural network with a sigmoid transfer function was developed, and the available data were divided into training, validation, and testing sets. A randomly selected 70% of the data was used to train the network, and the remaining 30% was divided evenly between validation and testing (15% as test data and the other 15% for network validation). To evaluate the validity of the obtained models for all trips (Y), the developed ANN models were used to forecast these values for the training and validation data sets, as shown in the network diagram in Figure 2. The predicted values of all trips (Y) are close to the actual values, indicating a strong correlation between the input and output data, as can also be seen in Figures 3-8 for the respective data sets.
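The 70/15/15 random division of the 1,050 observations can be sketched as a shuffled index split. The function name and seed are hypothetical, and assigning the fractional remainder to the test set is an assumption of this sketch; MATLAB's own data-division functions may round differently.

```python
import numpy as np


def split_70_15_15(n: int, seed: int = 0) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Randomly partition record indices 0..n-1 into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = n * 70 // 100          # integer arithmetic avoids float rounding surprises
    n_val = n * 15 // 100
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]     # any remainder goes to testing
    return train, val, test


train_idx, val_idx, test_idx = split_70_15_15(1_050)
print(len(train_idx), len(val_idx), len(test_idx))  # 735 157 158
```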
An artificial neuron collects the input signals $x_1, x_2, \dots, x_n$, each of which is assigned a weight ($W_{ik}$) indicating the strength of the connection; these weights vary throughout the training process. Each input signal is multiplied by its connection weight. A bias ($b_k$) acts as a connection weight applied to a constant nonzero input and is added to the sum of the weighted inputs to give the activity level ($I_k$). This combined input is passed to a preselected transfer (activation) function ($T$), which generates the output ($Y_k$) at the outgoing end of artificial neuron $k$, as defined in equations (1) and (2):

$$I_k = \sum_{i=0}^{n} W_{ik} X_i + b_k \qquad (1)$$

$$Y_k = T(I_k) \qquad (2)$$

where $I_k$ is the level of activity of node $k$; $W_{ik}$ is the weight of the connection between nodes $i$ and $k$; $X_i$ is the input of node $i$ ($i = 0, 1, 2, \dots, n$); $b_k$ is the bias of node $k$; $Y_k$ is the output of node $k$; and $T(I)$ is the transfer function.
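Equations (1) and (2) translate directly into code for a single neuron. The input values, weights, and bias in the usage line are made up for illustration, and the transfer function is passed in as a parameter so that any choice of T(I) can be plugged in.

```python
from collections.abc import Callable

import numpy as np


def neuron_output(x: np.ndarray, w: np.ndarray, b: float,
                  transfer: Callable[[float], float]) -> float:
    """Output of a single artificial neuron k.

    Eq. (1): I_k = sum_i W_ik * x_i + b_k   (weighted inputs plus bias)
    Eq. (2): Y_k = T(I_k)                   (transfer/activation function)
    """
    activity = float(np.dot(w, x) + b)
    return transfer(activity)


def identity(v: float) -> float:
    """Linear transfer function, used here only to expose I_k directly."""
    return v


# Hypothetical inputs and weights: 0.5*2 - 1.0*1 + 0.25 = 0.25
print(neuron_output(np.array([2.0, 1.0]), np.array([0.5, -1.0]), 0.25, identity))  # 0.25
```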
There are several types of transfer functions in the neural network toolbox. In this study, we used the logistic sigmoid function, one of the most common functions in neural networks, because it is differentiable. Equation (3) describes the sigmoid function:

$$T(I) = \frac{1}{1 + e^{-\phi I}} \qquad (3)$$

where $\phi$ is a positive scaling constant that controls the steepness of the curve, whose output lies asymptotically between 0 and 1. This function produces well-behaved neural networks: the inhibitory and excitatory effects of the weights are clearly visible, and it allows for more efficient network training [20,21]. The error percentages of trip generation for the sample data are shown in Figure 9. Table 5 presents the mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R²) values used in this study to assess the relevance of the ANN model. The weights were determined and adjusted through network training, with the data set split into training and test data. The final weights and their performance are summarized in Table 6; these are the weights that gave the best and most accurate results.
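The following NumPy sketch combines Eqs. (1)-(3) into a small feed-forward network with one logistic-sigmoid hidden layer and a linear output, trained by batch back-propagation on the mean squared error. It is not the MATLAB toolbox procedure: the number of hidden neurons, learning rate, epoch count, initialization, and use of plain gradient descent are all assumptions, the data are synthetic, and in practice inputs and targets would normally be scaled before training.

```python
import numpy as np


def sigmoid(v: np.ndarray, phi: float = 1.0) -> np.ndarray:
    """Logistic sigmoid of Eq. (3); phi > 0 controls the steepness."""
    return 1.0 / (1.0 + np.exp(-phi * v))


def train_mlp(X, y, hidden=5, phi=1.0, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer feed-forward network trained by batch back-propagation.

    Hidden layer: logistic sigmoid; output layer: linear; loss: mean squared error.
    Returns a function that maps an input matrix to predictions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        # Forward pass: Eqs. (1)-(3) applied layer by layer.
        H = sigmoid(X @ W1 + b1, phi)        # hidden activations, shape (n, hidden)
        pred = H @ W2 + b2                   # linear output, shape (n,)
        err = pred - y
        # Backward pass: gradients of mean(err**2) with respect to each parameter.
        g_pred = 2.0 * err / n
        gW2 = H.T @ g_pred
        gb2 = g_pred.sum()
        gZ = np.outer(g_pred, W2) * phi * H * (1.0 - H)   # chain rule through the sigmoid
        gW1 = X.T @ gZ
        gb1 = gZ.sum(axis=0)
        # Gradient-descent update.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def predict(X_new):
        return sigmoid(X_new @ W1 + b1, phi) @ W2 + b2

    return predict


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))           # synthetic stand-in for household attributes
y = 1.0 + X[:, 0] - 0.5 * X[:, 1]       # synthetic stand-in for trips per household
model = train_mlp(X, y)
print(np.mean((model(X) - y) ** 2))     # training MSE
```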
Trip generation for the Al-Karada region was forecast using both the MLR and ANN methods. A statistical comparison of the performance of the two approaches is presented in Table 7. Accordingly, the back-propagation ANN model outperforms the MLR model in trip generation prediction, as shown by its higher R² value: R² for (Y) is 0.851 for MLR compared with 0.92441 for the ANN technique. In addition, the MSE and RMSE values for the ANN model were lower than those for the MLR model. From a statistical point of view, the ANN approach to forecasting trip production is therefore better and more accurate than the MLR approach, with an average accuracy (AA) of 83.72% compared with 72.46% for MLR.
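The comparison statistics can be computed for any set of observed and predicted trip counts with a small helper. The MSE, RMSE, and R² formulas are the standard ones (R² taken here as 1 - SS_res/SS_tot); defining the average accuracy as AA = 100 - MAPE is an assumption, since the text reports AA and MAPE-based figures without giving the formula. The sample values in the usage line are made up.

```python
import numpy as np


def evaluation_metrics(actual, predicted) -> dict[str, float]:
    """MSE, RMSE, R^2, MAPE (%) and AA = 100 - MAPE (assumed definition of AA)."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((a - a.mean()) ** 2))
    # MAPE is undefined for zero observed values (e.g. households making no trips).
    mape = float(100.0 * np.mean(np.abs(err / a)))
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "R2": 1.0 - ss_res / ss_tot,
        "MAPE": mape,
        "AA": 100.0 - mape,
    }


# Hypothetical observed vs. predicted trips: MSE = 0.5, R2 = 0.9, MAPE = 9.375, AA = 90.625
print(evaluation_metrics([4, 6, 8, 10], [5, 6, 7, 10]))
```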

Discussion
The essential independent factors that affect overall trip generation in the study region are the number of students, the number of workers, the number of car owners, family size, family income, and the number of males. In the MLR model, a small increase in monthly household income leads to a slight rise in average household trips. There is also an important relationship between the mode of transportation used and monthly household income: families with high monthly incomes tend to use private cars, whereas families with low monthly incomes tend to use public transportation.

Conclusion
• The back-propagation ANN model outperforms the MLR model in trip generation prediction, as shown by its higher R² value: R² for (Y) is 0.851 for MLR compared with 0.920 for the ANN technique. Thus, it can be regarded as a very good prediction model.
• The MSE and RMSE values for the ANN model were lower than those for the MLR model.
• The ANN-predicted model is more accurate than the MLR-predicted model: the average accuracy (AA) is 83.72% for the ANN model and 72.46% for the MLR model.
• The ANN model's MAPE-based classification is superior to that of the MLR model, at 83.72% versus 72.46%.
• The ANN method may be used for trip production models because it yields lower prediction error.
• It is proposed that a zone system for Baghdad be developed and that a database of all road networks be created using geographic information system (GIS) software to produce an optimal zoning system.
• ANNs can be used to develop models for other transport planning stages, such as trip distribution, modal split, and traffic assignment, for the Al-Karada sector in Baghdad city and for other cities in Iraq.
• Another soft-computing technique, namely fuzzy logic, can also be used in trip production modeling.
Acknowledgments: The authors are grateful to the Civil Engineering Department, College of Engineering, University of Baghdad, for support and help in accomplishing the work contained in this research.

Conflict of interest:
Authors state no conflict of interest.