ANN-based model to predict groundwater salinity: A case study of West Najaf–Kerbala region

Estimating groundwater salinity is important for using groundwater resources for irrigation and provides a useful guide for groundwater management. In this study, artificial neural networks (ANNs) were adopted to estimate groundwater salinity, expressed by total dissolved solids (TDS), sodium adsorption ratio (SAR), and sodium (Na⁺) percent. The inputs were electrical conductivity (EC), magnesium (Mg²⁺), calcium (Ca²⁺), sodium (Na⁺), potassium (K⁺), and potential of hydrogen (pH). Groundwater samples were collected from 51 wells on the plateau of the Najaf–Kerbala provinces. The network structure was designed as 6-4-3 and trained with the default scaled conjugate gradient algorithm in SPSS V24. The proposed model, with four hidden neurons, estimated the irrigation salinity indices accurately. It showed good agreement between the experimental and ANN values of the irrigation salinity indices for both the training and testing datasets, based on the relative mean error and the coefficient of determination (R²) between the ANN outputs and the experimental data. For testing, the predicted TDS, SAR, and Na percent tracked the measured data with R² of 0.96, 0.97, and 0.96 and relative errors of 0.038, 0.014, and 0.021, respectively. For training, R² was 0.95, 0.96, and 0.96, with relative errors of 0.053, 0.065, and 0.133, respectively. These results indicate that the designed network was satisfactory. The model could be applied to new data to predict groundwater salinity for irrigation purposes on the Najaf–Kerbala plateau in Iraq.


Introduction
In recent years, drought has become a serious global problem because of reduced precipitation, which has limited the supply of freshwater. To compensate for the shortfall, there has been increasing reliance on drilling wells, which makes it necessary to monitor the quality of the groundwater extracted from them. Monitoring is carried out by collecting water samples and testing them in a laboratory, which is time-consuming. More recently, the artificial neural network (ANN) method has been used to estimate groundwater quality with less time and effort. An ANN is a mathematical technique that is trained on data to predict the quality of water extracted from wells [1].
Many researchers have studied groundwater quality. Ghalib et al. [2] evaluated the water quality of the Bahar Al-Najaf basin using traditional methods. Their evaluation was based on major and minor elements such as calcium (Ca), sodium (Na), potassium (K), magnesium (Mg), chloride (Cl), sulfate (SO₄), nitrate (NO₃), bicarbonate (HCO₃), and bromine (Br). It also used field measurements of temperature (T), potential of hydrogen (pH), electrical conductivity (EC), and total dissolved solids (TDS). They found that the water was not suitable for use. Banerjee et al. [3] built an ANN model to predict salinity at various pumping rates, adopting the quick propagation algorithm in a feed-forward network. They found that the model was simpler and more accurate than numerical methods. Khudair et al. [4] developed a model for groundwater quality assessment based on the water quality index. They used four elements, pH, Cl, SO₄, and TDS, to train ANNs in the statistical program SPSS. The resulting network predicted the index with high efficiency, and pH and Cl were the most influential elements. Heidarzadeh [1] used Na as the response variable of an ANN trained with three inputs: pH, EC, and total hardness (TH). He found that a two-layer network with Logsig-Tansig transfer functions, with four and three neurons in the first and second layers, respectively, performed best among the models compared. Kheradpisheh et al. [5] developed an ANN in MATLAB using chemical elements, evaporation, water elevation, and discharge as inputs, with Cl, EC, NO₃, and SO₄ as outputs. After training, the network predicted Cl, EC, and NO₃ very well. It was not efficient enough to predict SO₄, which they attributed to the variety of water sources or the influence of other factors. Kılıçaslan et al. [6] designed an ANN model to predict groundwater quality remotely. 
Their network, which had two hidden layers, was trained on pH, EC, TDS, Cl, and TH to predict the sodium adsorption ratio (SAR) and SO₄, and it proved efficient and accurate. Sunayana et al. [7] used ANNs to predict water quality at a landfill site. They trained the model with three algorithms and selected the best one based on its predictive performance. The input layer included pH, TDS, chloride (Cl⁻), nitrate (NO₃⁻), and sulfate (SO₄²⁻), together with time and location, and the output layer had a single element, total hardness. They used sensitivity analysis to identify the elements that most affect total hardness. They then produced a map in ArcGIS showing the predicted water quality across the study area. Aryafar et al. [8] used genetic programming (GP) together with other artificial intelligence models, ANN and the adaptive neuro-fuzzy inference system (ANFIS), to determine the water quality of 12 wells in the Khezri plain, eastern Iran. They used the chemical element concentrations as inputs and TH, TDS, and EC as outputs, and evaluated the models statistically. GP gave better results than ANN and ANFIS, although ANN and ANFIS also gave satisfactory results. Maroufpoor et al. [9] used hybrid artificial intelligence, combining several algorithms into the models ANN, ANN-PSO, NF-GP, and NF-FCM. They compared these hybrid models with the geostatistical methods of inverse distance weighting, radial basis function, and kriging. Through sensitivity analysis, they concluded that NF-GP was the best model for determining groundwater quality in wells in terms of EC, SAR, and Cl. Bedi et al. [10] used ANN, support vector machines, and extreme gradient boosting to model nonlinear relationships and evaluate three levels of groundwater pollution in 203 wells across the Midwest of the USA. Their inputs included land use and hydrological data, and water quality was characterized by nitrate and pesticide concentrations.
In this study, a computational approach is used instead of traditional methods, and its prediction errors are quantified. The model also indicates how sensitive the outputs are to each of the input elements. The study aims to use an ANN to develop a model that estimates SAR, Na percent, and TDS, which reflect the extent of groundwater salinity in the selected region. Mg, Ca, K, Na, EC, and pH are used as inputs to the ANN.

Description of the study area
The study region is situated in the north-western part of the Euphrates region, in the western part of Mesopotamia, between the Najaf and Kerbala governorates. It lies between latitudes 31°55′ and 32°45′ N and longitudes 43°30′ and 44°30′ E. It is an almost flat conical plateau with an area of about 2,700 km², as shown in Figure 1. It is bordered to the south by Tar Al-Najaf, to the northeast by Tar El-Sayyed, to the north by Lake Razzazah, and to the east by Quaternary sediments. Its elevation ranges from 13 to 207 m, with a mean of 83 m, and decreases from west to east. Its surface is covered by shallow, flat-floored valleys. Its soil is gravel, gypsum, concrete, or rock. The region is stable, and the sedimentary cover above the basement rocks is 7 to 8 km thick. Hydrologically, the aquifer of interest is the Dabdaba.
The Dabdaba is one of the five layers that make up the plateau and is the principal unconfined aquifer in the study region, covering 1,100 km² of the study area. This aquifer is recharged directly by seasonal rainfall on the plateau. Groundwater in the Dabdaba reservoir flows toward the Euphrates River, from southwest to northeast. The hydraulic gradient ranges from 0.0005 to 0.0011, as shown in Figure 2. The wells were drilled by the General Commission for Groundwater, Kerbala Branch. The well water samples were collected by Al-kubaisi et al. [11] in April-May, which is considered a wet period. Cl, NO₃, SO₄, HCO₃, anions, cations, TH, SAR, Na percent, K, Mg, and Ca were measured in the laboratories of the General Groundwater Authority. Temperature, EC, TDS, dissolved oxygen (DO), and pH were measured in the field.

ANNs modeling
ANNs are mathematical techniques that loosely simulate the way the human brain works. They consist of interconnected processing units called neurons, or nodes, which learn the relationship between inputs and outputs. This relationship may be linear or nonlinear and may involve complex interdependencies. An ANN contains three types of layers: input, hidden, and output. Each layer consists of a group of nodes (neurons), and the neurons of successive layers are connected through a network of links with different weights [1,6]. The weights indicate the strength and influence of each input on the outputs [12].
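The layered, weighted structure described above can be sketched as a forward pass through a fully connected network with one hidden layer. This is a minimal NumPy sketch. It assumes the 6-4-3 layout reported for this study (six inputs, four hidden neurons, three outputs), a tanh hidden activation, and a linear output. The activations, the random weights, and the function name `forward` are illustrative assumptions, not the trained SPSS model.

```python
import numpy as np


def forward(x, W_hidden, b_hidden, W_out, b_out):
    """One-hidden-layer forward pass.

    x        : (n_inputs,) input vector, e.g. normalized EC, Ca, Mg, Na, K, pH
    W_hidden : (n_hidden, n_inputs) weights W_ji (input i -> hidden j)
    b_hidden : (n_hidden,) hidden biases
    W_out    : (n_outputs, n_hidden) weights W_kj (hidden j -> output k)
    b_out    : (n_outputs,) output biases
    """
    h = np.tanh(W_hidden @ x + b_hidden)  # hidden activation (assumed tanh)
    return W_out @ h + b_out              # output activation (assumed linear)


rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 6, 4, 3  # 6-4-3 structure
W_hidden = rng.normal(size=(n_hidden, n_in))
b_hidden = np.zeros(n_hidden)
W_out = rng.normal(size=(n_out, n_hidden))
b_out = np.zeros(n_out)

x = rng.uniform(size=n_in)  # hypothetical normalized inputs
y = forward(x, W_hidden, b_hidden, W_out, b_out)
print(y.shape)  # (3,): normalized estimates for TDS, SAR and Na percent
```

Training consists of adjusting `W_hidden`, `b_hidden`, `W_out` and `b_out` so that these outputs match the measured indices.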
ANNs have been applied successfully in hydrogeology and water resources, usually to represent nonlinear relationships [11]. Many algorithms are available to train a network. One such algorithm converges quickly by using an individual learning rate for each connection weight, adjusted dynamically during training [3]. The connection weights between the input and output layers, together with the biases and activation functions of the hidden (middle) layer, determine the relationship between the input and output parameters. The hidden layer usually carries out the network's learning and may consist of more than one layer. The complexity of the network is determined by the number of hidden layers and hidden neurons. Data entered in the input layer are processed by the hidden layer, whose outputs are passed on to the output layer. The process is represented as follows [13]:

$$y_k = f_0\left[\sum_{j=1}^{N_N} W_{kj}\, f_h\left(\sum_{i=1}^{M_N} W_{ji} X_i + \omega_{j0}\right) + \omega_{k0}\right] \qquad (1)$$

where $W_{ji}$ is the hidden-layer weight linking the $i$th neuron in the input layer to the $j$th neuron in the hidden layer; $\omega_{j0}$ is the bias of the $j$th hidden neuron; $f_h$ is the activation function of the hidden neurons; $W_{kj}$ is the output-layer weight linking the $j$th neuron in the hidden layer to the $k$th neuron in the output layer; $\omega_{k0}$ is the bias of the $k$th output neuron; $f_0$ is the activation function of the output neurons; $X_i$ is the $i$th input parameter; $y_k$ is the $k$th calculated output variable; and $N_N$ and $M_N$ are the numbers of neurons in the hidden and input layers, respectively. Before training and testing, the input and output data were normalized to speed up data processing, reduce forecasting error, and improve convergence during training. The normalization followed equation (2):

$$X_n = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (2)$$
After training, the normalized outputs were converted back to their actual values using equation (3) [12,13]:

$$X = X_n\,(X_{\max} - X_{\min}) + X_{\min} \qquad (3)$$

where $X$ is the original value; $X_{\max}$ and $X_{\min}$ are the maximum and minimum values in the series; and $X_n$ is the normalized value.
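A minimal sketch of the normalization and its inverse is given below. It assumes that equation (2) maps each variable to the range [0, 1]; the exact scaling range used in the study is an assumption. The EC values are illustrative, not measurements from the study.

```python
import numpy as np


def normalize(x, x_min, x_max):
    """Min-max scaling to [0, 1] (assumed form of equation (2))."""
    return (x - x_min) / (x_max - x_min)


def denormalize(x_n, x_min, x_max):
    """Map scaled values back to original units (inverse of normalize, equation (3))."""
    return x_n * (x_max - x_min) + x_min


ec = np.array([1200.0, 3500.0, 5800.0])  # illustrative EC values, not study data
ec_min, ec_max = ec.min(), ec.max()
ec_n = normalize(ec, ec_min, ec_max)
print(ec_n)                               # values 0.0, 0.5 and 1.0
print(denormalize(ec_n, ec_min, ec_max))  # recovers the original values
```

In practice, `x_min` and `x_max` would be taken from the training data and reused for the test data, so that both subsets share one scale.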
The performance of the ANN is evaluated by calculating the coefficient of determination (R²) and the relative mean error (RME). The model with the lowest RME and the highest R² between observed and calculated values is accepted. The simulated outputs were evaluated for both training and testing using these goodness-of-fit measures. R² is calculated as:

$$R^2 = \frac{\left[\sum_{i=1}^{N}(O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{N}(O_i - \bar{O})^2 \sum_{i=1}^{N}(P_i - \bar{P})^2}$$

where $N$ is the number of observations; $P_i$ are the predicted values; $O_i$ are the observed data; and $\bar{P}$ and $\bar{O}$ are the mean values of $P_i$ and $O_i$, respectively.
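The two goodness-of-fit measures can be sketched in a few lines of NumPy. `r_squared` uses the squared-correlation form implied by the symbols above. The RME formula is not shown in this section, so `relative_error` is an assumption: the sum of squared errors divided by the sum of squares about the observed mean. To our understanding, this matches how SPSS's multilayer perceptron output describes its "relative error" for scale outputs.

```python
import numpy as np


def r_squared(obs, pred):
    """Squared Pearson correlation between observed O_i and predicted P_i."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    d_obs = obs - obs.mean()
    d_pred = pred - pred.mean()
    return (d_obs @ d_pred) ** 2 / ((d_obs @ d_obs) * (d_pred @ d_pred))


def relative_error(obs, pred):
    """Assumed RME: SSE divided by the SSE of a model that always predicts the observed mean."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]  # every prediction is 1 unit too high
print(r_squared(obs, biased))       # 1.0: perfectly correlated
print(relative_error(obs, biased))  # 0.8: the constant bias is still penalized
```

The example shows why both measures are reported. R² is insensitive to a constant offset between predictions and observations, but the error term is not.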
In this study, data from 51 wells on the Najaf-Kerbala plateau were used to build the network model. About 70% of the well data were used to train the network and 30% to test its efficiency, corresponding to 214 and 92 recorded readings, respectively. The optimal model was chosen in two steps. In the first step, the type of transfer function and the number of layers were determined while the input variables were varied. The data (EC, Ca²⁺, Mg²⁺, Na⁺, K⁺, and pH) for each well were entered as input variables to the network, and the network with the highest R² and the lowest RME was selected. In the second step, the selected network was tested with its chosen layers, and its performance was compared across different transfer functions using the 30% testing subset.
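The 70/30 partition can be sketched as a seeded random split of reading indices. This sketch assumes the split is made over the 306 readings (214 + 92) rather than grouped by well. The function name `split_indices` and the seed are hypothetical, and the study's actual partition was made in SPSS and may differ.

```python
import random


def split_indices(n, train_frac=0.7, seed=0):
    """Randomly split indices 0..n-1 into training and testing subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    n_train = round(train_frac * n)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


train_idx, test_idx = split_indices(306)  # 214 + 92 readings
print(len(train_idx), len(test_idx))      # 214 92
```

If readings from the same well should never appear in both subsets, the shuffle would be applied to well identifiers instead, and each well's readings would be assigned as a group.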

Groundwater data
Groundwater samples were collected from 51 observation wells on the plateau of the Najaf-Kerbala provinces, Iraq. The samples were collected by Al-kubaisi et al. [11] during April-May 2018, which is considered a wet period. The applicability of the ANN for estimating the salinity indices TDS, SAR, and Na percent was examined in these 51 wells. Table 1 presents the input elements used to predict the salinity indices, and their variation is shown in Figure 3.
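Two of the output indices, SAR and Na percent, are conventionally computed from major-ion concentrations expressed in milliequivalents per litre (meq/L). The sketch below uses the standard definitions SAR = Na / sqrt((Ca + Mg) / 2) and Na% = 100 (Na + K) / (Ca + Mg + Na + K). It is an assumption that the laboratory used exactly these forms; some authors omit K from the Na% numerator. The helper names and sample values are hypothetical.

```python
import math

# Equivalent weight (mg per meq) = molar mass / ionic charge
EQ_WEIGHT = {"Na": 22.99, "K": 39.10, "Ca": 40.08 / 2, "Mg": 24.31 / 2}


def to_meq(mg_per_l):
    """Convert concentrations from mg/L to meq/L."""
    return {ion: mg_per_l[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}


def sar(meq):
    """Sodium adsorption ratio from concentrations in meq/L."""
    return meq["Na"] / math.sqrt((meq["Ca"] + meq["Mg"]) / 2)


def sodium_percent(meq):
    """Sodium percent (Na + K share of the major cations) from meq/L."""
    total = meq["Na"] + meq["K"] + meq["Ca"] + meq["Mg"]
    return 100 * (meq["Na"] + meq["K"]) / total


sample = {"Na": 10.0, "K": 2.0, "Ca": 4.0, "Mg": 4.0}  # illustrative meq/L values
print(sar(sample))             # 5.0
print(sodium_percent(sample))  # 60.0
```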

ANN modeling
ANNs are systems of linked nodes that can simulate parameters from input data. An ANN is composed of several neurons joined by links and arranged in layers, and each layer is fully connected to the preceding one through weights. The initial weights are gradually adjusted during training by comparing the measured data with the simulated outputs. This calculation of biases and weights is known as training, and it aims to find the optimal weights by minimizing the error of the output parameters. Learning functions are used to update the biases and weights of each layer. A randomly selected dataset is then used to assess how well the network generalizes, which is known as the test stage [14].
In the present study, a back-propagation training algorithm was adopted to relate the input elements EC, Na⁺, Ca²⁺, Mg²⁺, K⁺, and pH to the output salinity indices TDS, SAR, and Na percent. The ANN configuration was determined through numerous trials until good results were obtained. The network architecture is shown in Figure 4.
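Back-propagation can be illustrated with a minimal NumPy sketch that trains a 6-4-3 network by plain batch gradient descent on the squared error. This is a swapped-in, simplified technique: the study itself used the scaled conjugate gradient algorithm in SPSS V24. The tanh/linear activations, learning rate, epoch count, and synthetic data below are assumptions for illustration only.

```python
import numpy as np


def train_mlp(X, Y, n_hidden=4, lr=0.05, epochs=2000, seed=0):
    """Back-propagation with batch gradient descent for a one-hidden-layer network.

    X: (n_samples, n_inputs) normalized inputs
    Y: (n_samples, n_outputs) normalized targets
    """
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    n_out = Y.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output layer (assumed)
        H = np.tanh(X @ W1 + b1)
        Y_hat = H @ W2 + b2
        # Backward pass: gradients of L = sum((Y_hat - Y)**2) / (2 n)
        dY = (Y_hat - Y) / n
        dW2 = H.T @ dY
        db2 = dY.sum(axis=0)
        dH = (dY @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        # Gradient-descent update of weights and biases
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(X, W1, b1, W2, b2):
    return np.tanh(X @ W1 + b1) @ W2 + b2


def mse(X, Y, params):
    return float(np.mean((predict(X, *params) - Y) ** 2))


# Synthetic data standing in for 6 normalized inputs and 3 normalized outputs
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 6))
Y = np.column_stack([X.mean(axis=1), X[:, 0] * X[:, 3], 0.5 * (X[:, 1] - X[:, 2])])

untrained = train_mlp(X, Y, epochs=0)
trained = train_mlp(X, Y)
print(mse(X, Y, untrained), mse(X, Y, trained))  # training error should drop
```

Conjugate-gradient methods such as the one used in SPSS replace the fixed-step update with a search direction and step size chosen at each iteration, but the forward and backward passes are the same.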
The suitability of the proposed ANN model is assessed using the output prediction errors and the coefficient of determination (R²). The acceptance criterion is based on the quantitative agreement between predicted and measured data, namely maximum R² and minimum RME [15]. The network information is listed in Table 2.

ANNs model results and discussion
The applicability of ANNs to estimate groundwater salinity was examined. In ANNs, the choice of input elements is an essential step, as it affects the performance of the prediction. EC, Ca²⁺, Mg²⁺, Na⁺, K⁺, and pH were adopted as input elements to estimate three salinity indices that characterize the groundwater: TDS, SAR, and Na percent. The statistical outputs of the ANN are presented in Table 3. In the training step, the relative error and R² of the estimated TDS were 0.038 and 0.95, those of SAR were 0.014 and 0.96, and those of Na percent were 0.012 and 0.96, respectively. In the testing step, the corresponding values were 0.053 and 0.96 for TDS, 0.065 and 0.97 for SAR, and 0.0113 and 0.96 for Na percent.
The results in Table 3 indicate that the selected ANN model is generally accurate, as shown by its high R² and low relative error values. The hidden layer with four neurons also gives small errors in estimating the groundwater salinity indices. Figure 5 compares the measured and estimated salinity indices. In the testing stage, the points generally lie closer to the line of perfect agreement.
The variations of the predicted and measured data are also shown in Figure 6, where the modeled data closely track the measured data. Therefore, using easily measured parameters as inputs, ANNs can accurately predict TDS, SAR, and Na percent in groundwater.
The comparison between the estimated and measured TDS, SAR, and Na percent, based on linearity and R², shows that the ANN model is reliable. The importance of each input parameter (independent variable) is measured by the sensitivity of the model's estimated outputs to that input. Figure 7 shows that Na has the greatest effect on the ANN model's salinity predictions, followed by Mg, EC, Ca, K, and pH.
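Input importance of the kind summarized in Figure 7 can be approximated with permutation importance. This method shuffles one input column at a time and measures how much the prediction error rises. It is a swapped-in technique for illustration: SPSS computes its own sensitivity-based importance, and its procedure may differ. The function name and the toy model below are hypothetical.

```python
import numpy as np


def permutation_importance(predict_fn, X, Y, n_repeats=10, seed=0):
    """Normalized increase in mean squared error when each input column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict_fn(X) - Y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between input j and Y
            scores[j] += np.mean((predict_fn(Xp) - Y) ** 2) - base
    scores = np.clip(scores / n_repeats, 0.0, None)  # ignore chance improvements
    total = scores.sum()
    return scores / total if total > 0 else scores


names = ["EC", "Ca", "Mg", "Na", "K", "pH"]
rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 6))


def toy_model(X):
    # Hypothetical model whose output depends only on the "Na" column
    return 3.0 * X[:, [3]]


Y = toy_model(X)
imp = permutation_importance(toy_model, X, Y)
print(dict(zip(names, imp.round(2))))  # all importance assigned to Na
```

Applied to a trained network and its testing data, the same loop would rank EC, Ca, Mg, Na, K and pH by how much each one degrades the predictions when scrambled.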

Conclusion
This study predicted groundwater salinity on the plateau of the Najaf-Kerbala provinces using an ANN model built in SPSS V24. Groundwater salinity, in terms of TDS, SAR, and Na percent, was predicted using K⁺, EC, Ca²⁺, Mg²⁺, Na⁺, and pH as input elements. The proposed ANN model, with a 6-4-3 structure, captured the relationship between the input and output elements well. In the testing stage, the predicted TDS, SAR, and Na percent followed the measured data with R² of 0.96, 0.97, and 0.96 and relative errors of 0.038, 0.014, and 0.021, respectively. In the training stage, R² for TDS, SAR, and Na percent was 0.95, 0.96, and 0.96, respectively, with relative errors of 0.053, 0.065, and 0.133, respectively. The high R² and low error values for the training and testing samples indicate that the designed ANN was acceptable. Na, Mg, and EC were the most significant inputs affecting the model estimates. The proposed model could be applied to new input data to predict groundwater salinity for irrigation purposes on the Najaf-Kerbala plateau in Iraq.