Abstract
The natural streamflow of the River is encouraged to forecast through multiple methods. The impartiality of this study is the comparison of the forecast accuracy rates of the timeseries (TS) hybrid model with the conventional model. The behavior of the natural monthly statistical chaotic streamflow to use in the forecasting models has been compiled by projecting two distinguished rivers, the Indus and Chenab of Pakistan. Therefore, this article is based on the monthly streamflow forecast analysis that has been reported using the group method of data handling with wavelet decomposition (WGMDH) as a new forecasting attribute. Discrete wavelets decompose the perceived data into subseries and forecast hydrological variables; these fittingly have been endorsed as inputs in the hybrid model. The forecast efficiency and estimations of the hybrid model are measured by the appropriate statistical techniques such as mean absolute error (RME), root mean square error (RMSE), and correlation coefficients (R) and compared to the group method of data handling (GMDH), leastsquare support vector machine and artificial neural network conventional models. The comparative analysis shows that the hybrid WGMDH model is more stable and more potent for forecasting river flow than other predictive models and significantly proved that the hybrid model is a robust alternate forecasting tool for TS data sets.
1 Introduction
Nonlinear partial differential equations and nonlinear ordinary differential equations are the most considerable topics to analyze the wave phenomena which play great importance in engineering and science such as mathematical biology optical fibers, nonlinear optics, neural physics, solid state physics, chaos, hydrodynamics, quantum mechanics, fluid mechanics, and other numerous areas. The diffusion, dispersion, convection, reaction, and dissipation phenomena are extremely important in wave equations. Due to the complexity of these equations, there are some methods to find the solutions of wave equations [1,2,3,4,5,6,7,8,9,10].
Hydrological forecasting has always been a big deal for researchers and water management societies. The impact of hydrological changes is multiple rivers and waterrelated stakeholders, requiring informative observations to understand the hydrological forecast effects. Further, reliable and essential information is beneficial for the projections of streamflow, and these predictions improve the management efficiency regarding river disasters or behaviour such as floods, availability of future water resources, and so on, and is helpful for timely decisions [11]. For that multifaceted several hydrological datadriven statistical forecasting models of shortterm (daily or hourly) and longterm (monthly, seasonally, or yearly) have been developed [12,13]. Stakeholders and water management societies are currently using these forecasting datadriven statistical models and continually striving to improve their confidence levels in hydrological models using mathematical and modern technological attributes [14,15].
The group method of data handling (GMDH) algorithm offers automated structural and technicality to the time series forecasting models. The GMDH application to innovate mathematically forecasting models of multiparametric data sets is tremendous, and the model has performed more optimally than the conventional forecasting models such as AutoRegressive Integrated Moving Average, Least Square Support Vector Machine (LSSVM), Artificial Neural Network (ANN), and Multivariate Adaptive Regression Splines [16]. The GMDH forecasting algorithm is used in different engineering fields [17]. For at least two decades, various combinations of the GMDH algorithms based on the secondorder polynomial and different selfselection thresholds with optimizer algorithms are still being used instead of numerous traditional forecasting time series models in various fields [18,19,20,21]. An algorithm GMDH network describes higherorder polynomials that could estimate the complexity and linear dependency present in regression polynomials as there have two examples of network algorithms used in nonlinear polynomials that are hybrid and multilayer GMDH algorithms. The initial structure of the GMDH algorithm has two factors of the quadratic functions comprising the various layers, and every layer contains a series of inputs, which necessitate a specific contribution from the last layer. The GMDH algorithm is based on quadratic expressions and recommended a transfer function (TF) such as polynomial function, radial basis function, hyperbolic tangent function, and sigmoid function to communicate the network between input and output data sets model as used in the Volterra Functional Series [22]. The forecast accuracy estimation issues exercise the robust prediction concerning time series (TS) by artificial intelligence (AI) algorithms [11,23]. Generally, wavelet decomposition (WD) is widely applied for signal processing applications as a mathematical tool; it can transform the signal (hidden patterns in the data mass) into multiple lower resolution levels by administering the single mother wavelet functions scaling and shifting components [24,25]. Traditional neural networks have a standard criterion for accurately estimating nonlinear polynomials that can restrict. In contrast, WD has an excellent capability to exhibit the functions simultaneously, and their local characteristics reveal in the TS inputs by the hydrological process [26]. The usage of factors is the facilitator for neural networks training accuracy of nonlinear modelling signals [27].
A new hybrid model has been developed in this research work by combining the GMDH algorithm and WD technique for the hydrological forecast, and the model is used for forecasting the Rivers streamflow. The time series streamflow inputs are divided into two data sets to obtain both training and testing outcomes for the forecast [28]. Crossvalidation has been used for the extracted estimates, and the findings declared that the accuracy of the hybrid group method and data handling and wavelet decomposition (WGMDH) model is more effective than GMDH, LSSVM, and ANN models [29]. Furthermore, the dimension reduction carried out by preprocessing and WD is used and observed to increase accuracy estimation and significant differences in the conventional models [30].
The findings of the research work may help to take adequate measures based on predictive analysis using the proposed WGMDH model. Furthermore, the findings may show an alternative design of the forecast model that could be more versatile and robust than the traditional GMDH model and may establish greater precision and unique analytical potential than a few traditional methods.
2 Methods and materials
This section followed the research methods and materials addressed in each section stated in this research study. The tractable proposed combined model is implemented as an efficient forecasting tool on river streamflow data sets that obtain comprehensive informational signals. The implementation of WD in numerous conventional forecasting models increases the performance of the models and achieves exceptional outcomes. The developed coupled model with the wavelet and AI algorithms fulfils the described twostep protocol for forecasting projects.
The WD mastery has been used as a preprocessor of input data sets. It analyses TS signals at different intervals, providing inputs and significant detail regarding data sets.
Subsequently, searching the appropriate input signal by WD and used for further manipulations as AI input is used in numerous regular forecasting models.
Initially forecasted TS inputs are decomposed into a subTS {W _{1}, W _{2},…, W _{ p },C _{ p }} using the WD algorithm, where, W _{1}, W _{2}, …, W _{ p } and C _{ p } are described as detailed and contextual TS inputs, respectively. Usually, the forecast time series and each subTS have different roles and behaviour. Therefore, the attribute and influence on inputs of the forecasting TS are different from each other. The developed GMDH, LSSVM, and ANN forecast models t are the input subTS of the models, and the forecasting TS inputs at t + T (T is the forecast measurement time) time have the output of models. Finally, algorithms constructed the wavelet network models (WNM) and discovered several algorithms on different weights [31]. The fundamental object of forecasting TS data sets to the WNM is forming the combined models with the WD algorithm [32].
2.1 The WD
The WD algorithm combines the traditional AI forecasting algorithms developed in past periods. A novel methodology to innovate the TS forecast model and the WD algorithm has emerged as an ever more useful and vital tool for the models [33]. The WD established a predicting method for TS, which presented to develop several decomposition goals that utilize signals for repetitive trous wavelet transformation and benefit from being a shiftinvariant. Further, the WD algorithm can denoise nonstationary signals into subsignals at different levels with a relevant resource for improved streamflow elucidation [13,27]. The mathematical structure of the WD is described in Eq. (1) as a continuous TS x(t) (∵t ∈ [−∞, ∞]).
where t is the time, τ is the time step that iterates the window function, and s ∈ [0, ∞] for wavelet scale defined, respectively. As ψ(t) is known as the mother wavelet and expressed in Eq. (2),
whereas the continuous wavelet transform (CWT) illustrated in Eq. (3) [28]
where
where m, n ∈ Z switching as the scale and length of the TS, respectively, s _{0} > 1 is defined for fixed dilation step, and τ _{0} > 1 is the position parameter. The parameters s _{0} = 2 and τ _{0} = 1 are the most common preference. For a discrete TS x(t) that occurs at t discrete time, then WD formation is defined as follows:
In Eq. (5), x(t) is defined as discrete TS(∵1 ≤ t ≤ N − 1) on the power of 2 (N = 2^{ m }), and the wavelet coefficient of the WD (W _{ m,n }) is described on s = 2^{ m } and τ = 2^{ m } n scales, where n is the timetranslation component that varies within the ranges 0 < n < 2M − m and 1 < m < M, and M is the estimation of decomposition levels, which is interpreted the TS into numerous wavelet input segments on the particular decomposition level.
Conclusively, the desired decomposition level of the time series in WD plays a significant factor in retaining figures and decreasing the time series curvature. Though, there has no such hypothesis regarding the number of decomposition levels for data sets. Formation (6) is used to determine the number of decomposition levels [34].
According to Mallat's theory, the discrete TS x(t) decomposed into a new series of linearity independent estimate and detail signals through the inverse WD, which is described in Eq. (7) [13,35];
or defined as
Eq. (8) A _{ M }(t) is the residual term at M decomposition level and D _{ m }(t) has detailed subseries (∵m = 1,2, …, M) that are taken as small features in interpreted value data. The following various forms of TFs have emerged and are vastly used in the model [36]; the efficiency of the conventional GMDH models can be improved.
Three different TF types (9), the sigmoid, radial basis, and hyperbolic tangent functions have been proposed and improved the conventional GMDH model's performance and proved the numerous substitutes into a single network. In conventional GMDH models, most researchers have measured the partial quadratic polynomials (PQP) as the TF and demonstrated that utilizing various TF exclusively in a model gives preferred outcomes rather than using the units for a single TF [37,38,39]. Nevertheless, multiple TF (9) are accessible for the modified GMDH model, and it has provided more reliable and accurate outcomes than the single TF models.
2.2 GMDH
The GMDH model has been established to display the complex nonlinear framework by utilizing a TF to communicate the connection between input and output data sets. The Volterra linear function (10) describes the input and output mapping with a neural network like the GMDH model.
The mathematical structure (11) comprising the two order variables (neurons) is expressed in the following form:
In mathematical structure (11) have six coefficients {b _{ o }, b _{1}, b _{2}, b _{3}, b _{4}, b _{5}} in the g function, and these are unknowns [37]. Let us consider the coefficients are determined to expend the least square method. The system's observed input and forecast (output) variables are set for y and z, respectively.
The following are the iterative steps of the MGMDH model structure
Step I: Initially considered M _{1} = m as the neurons of the input vector in the first layer Y = (y _{ i }, y _{ i }, …, y _{ n }), whereas m is the number of inputs and taken j = 1 as a threshold.
Step II: Set
Step III: The aggregate number of partial descriptions (PDs) at the current layer has equivalent to the following combination:
Eq. (12) n is the number of nodes that have been chosen input variables from the nodes of the previous layer. The sort of PDs in each layer of the network is similar.
Step IV: Estimated the coefficients of the TF by the least squared (MSE) in the form of
where B = {b _{0}, b _{1}, …, b _{5}} has the input vector of the coefficients of the variables and Y = {y _{1}, y _{2}, …, y _{ M }}^{ T } is the output value from the following observation (14)
Step V: Distinguish the absolute best neuron out of T neurons. If the tiniest estimation by RMSE is less than the threshold, then the input will be terminated, or new input variables are introduced {y _{1}, y _{2}, …, y _{ n }, y′} (since y′ specified for the RMSE estimate) and T = T + 1 will be set; otherwise, the GMDH model Steps II and III will be repeated.
Step VI: The termination standard for the iteration of the model is as follows:
The lowest estimation at the individual layer acquired this iteration is associated with the lowest values found at the previous level; the lowest value is compared to the smallest assessments using the GMDHconstructed layer model. Unless the improvement is accomplished, till that repeats and rehashes steps II–IV. The last layer estimated the one node classified by the best execution, selected as the output node, and rejected the rest of the layer nodes. Finally, the GMDH model forecasting estimation is acquired, and then, the iteration is terminated and acknowledged by the network.
2.3 The WGMDH model
The methodology for a hybrid forecast model using the GMDH model combined with WD is anticipating future data sets by utilizing wavelet capability as a timefrequency restriction. Consequently, the best possible data sets of the GMDH algorithm have been developed through decomposition with a few good contrast strategies in this forecasting algorithm. The trous WD lets a significant translation of the filter outputs. The Atrous wavelet has been shown and begins with performing successive convolutions with the discrete lowpass channel.
whereas D _{0}(t) = Y(I) has the original time series. The development of separations between instances has priority and 2^{ i } I illustrated that a trous name is synonymous with this algorithm, and the lowpass filter is a spline. This expression has a point symmetric; it has also helped to minimize; however, lastly, it does not consider how time is essentially asymmetric. The smoothed structure and contrast of progressive variants enable the w _{ i } wavelet coefficients to be accomplished by the detail signal.
Eq. (16) is contained detailed signal or wavelet coefficients with a small detention feature of interpretational in the data value. The wavelet coefficients are iterated into the GMDH and analyzed as inputs. The WGMDH model ability was carried out by choosing the right inputs to overcome by the hybrid algorithm for the appropriate estimations.
2.4 Statistical performance
The use of statistical estimations is crucial for the reliability and credibility of any forecasting algorithm. Therefore, the estimations of the forecasting algorithms have been investigated through three statistical methods, such as the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), and the correlation coefficients (R). The statistical estimation outcomes fitted to the best fit line and the forecast performances estimated by the following indices [38]
where
The normalization and performance of the data set are used in the training and testing process and observed the estimations to select the best model with the least values of MAE and RMSE and the highest value of R (effect of flow).
3 Study areas and data utilization
The monthly TS streamflow data have been collected from two prominent rivers of the Islamic Republic of Pakistan, the Indus River, and the Chenab River, to endorse the developed forecast WMGMDH model. Both rivers are playing significant contributions to the region. These two rivers have made the forecast approximation of streamflow monthly records (Figure 1).
3.1 Indus River
The Indus River, locally called Sindhu discovered by the Scylax of Caryanda, the Greek advisor of the King Darius of Persia in 515 BC, and the early classical age information of the river had spread out in the West. The Indus River is one of the largest rivers in Pakistan and the Asian subcontinent and plays a significant role in the agricultural production industry. This river flows by Western Tibet, China, and India along with the full length of Pakistan to the south direction (via GilgitBaltistan) and falls into the Arabian Sea near the port city of Karachi in Sindh.
The Indus River streamflow 484 months rate has been collected from the Department of Irrigation and Drainage, Sindh, from September 1978 to December 2018 (Figures 2 and 3).
3.2 Chenab River
As five main rivers are flowing in the Punjab province of Pakistan, one of them is the Chenab River which originates in the upper Himalayas of the Indian state of Himachal Pradesh. It is flowing from Jammu and Kashmir territory into the country's fields before flowing into the Indus River near the city of Uch Sharif. Water has been granted for this significant River under the Indus Water Treaty.
The Chenab River streamflow 550 months rate has been collected from the Department of Irrigation and Drainage, Punjab Barrage from March 1973 to December 2018.
The hydrological data sets are used to forecast the river streamflow of both wellknown rivers and validate the modified GMDH algorithm for mean monthly river streamflow; then, the same data sets are used to develop hybridized wavelet decomposition GMDH model (Figures 4 and 5).
3.3 Model structure
Appropriate picking up the streamflow TS input has one of the most significant steps in constructing a new forecast prototype. In case of the average discrepancy ratio either less (underestimated) or more than (overestimated) the unit, the following linear transformation (LT) (20) estimated the inputs for the model with decomposed components which are scaled at (0, 1)
In LT (19) has been represented y _{ t } is the initial streamflow data set decomposed by Empirical Mode Decomposition (EMD) with h _{ t } = X _{ t } − min(X) the normalization of the data is conducted before all the Intrinsic Mode Functions (IMFs) are observed by the EncoderDecoder Long Short Term Memory (EnDeLSTM) model at the time t and X _{ t } the decomposed components and h _{max} = max(X) − min(X) the maximum values between the actual values.
The corresponding input variables allow the network to get the aspired output and prevent essential information trial effectively. There are no fixeddesigned rules for the TS input variables to develop the forecast prototype, while a general structure can be followed based on the successful implementation of water resources issues [39,40,41]. In this research work, six input model combinations have been developed to explore the inputoutput variables model, equivalent to lagging variables from the previous monthly streamflow time series cycles y _{ t−p }, where 1–6 months are set for p, respectively. Eqs. (21) and (22) mathematically model structures followed for original y _{ t−p } and DW_{ t−p } monthly streamflow statistics.
and
where y _{ t−1} corresponds to the last month's streamflow value, there DW _{ t−1} effectively estimated the components D _{ s }(D _{2}, D _{4}, and D _{8}) by evaluating the y _{ t−1} and the input model structure values described in Table 1 for monthly streamflow forecasting of the Indus River and Chenab River (Figures 6a and 6b).
Model  Input combination  Model  WD input combination 

M _{1}  y _{ t } = f(z _{ t−1}, z _{ t−2})  WM_{1}  WD_{ t−1} 
M _{2}  y _{ t } = f(z _{ t−1}, z _{ t−2}, z _{ t−3}, z _{ t−4})  WM_{2}  WD_{ t−1}, WD_{ t−2} 
M _{3}  y _{ t } = f(z _{ t−1}, z _{ t−2}, z _{ t−3}, z _{ t−4}, z _{ t−5}, z _{ t−6})  WM_{3}  WD_{ t−1}, WD_{ t−2}, WD_{ t−3} 
M _{4} 

WM_{4}  WD_{ t−1}, WD_{ t−2}, WD_{ t−3}, WD_{ t−4} 
M _{5} 

WM_{5}  WD_{ t−1}, WD_{ t−2}, WD_{ t−3}, WD_{ t−4}, WD_{ t−5} 
M _{6} 

WM_{6}  WD_{ t−1}, WD_{ t−2}, WD_{ t−3}, WD_{ t−4}, WD_{ t−5}, WD_{ t−6} 
4 Results and discussions
4.1 Fitting GMDH model to the data sets
It is significant to pick the proper input series for the GMDH model and then use it; therefore, the six(M1−M6) input structures are suitable for the model used for streamflow data sets in twotier training and testing phases. Significant and various studies have shown the positive results of the radial basis function (RBF) in the use of the AI models for the configuration and estimation of TS data sets [16,29,30]. Therefore, this study uses the RBF kernel function to control the streamflow modeling and TS forecast system. There are two parameters (γ, σ ^{2}) used for the estimation of the GMDH model. These parameters are used to test the model more efficiently and systematically to address possible shortcomings in the estimation and trails method by the standard grid search procedure. The model performance has been analyzed to determine the parameter values by grid exploring γ and σ ^{2} in the range of 10–1,000 and 0.01–1.0, respectively, and the crossvalidation procedure is applied for calibrating parameters to evade overfit. The kfold crossvalidation of the training phase for approximation forecasting error is given for each pair of hyperparameter (γ, σ ^{2}) in the search space. The forecasting bestfit algorithm is incorporated based on execution and performance standards.
The R, MAE, and RMSE statistical estimates for the training and testing phases from a combination of six input specimens (M1−M6) of the GMDH model are presented in Table 2 for the Indus and Chenab rivers. It is observed that the training and testing phases M6 and M5 input models R had the greatest estimate whereas the least estimations have for MAE and MSE; the testing estimations of M6 and M5 input combinations of R (0.9635 and 0.9547), MAE (0.0247 and 0.0290) and RMSE (0.0011 and 0.0015) shows better estimations for the model.
Indus River GMDH  Chenab River GMDH  

Input models  Training data set  Testing data set  Training data set  Testing data set  
R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  
M1  0.6775  0.1071  0.0198  0.5355  0.0969  0.0151  0.9450  0.0357  0.0027  0.9193  0.0419  0.0029 
M2  0.9688  0.0251  0.0013  0.9518  0.0304  0.0016  0.9578  0.0290  0.0019  0.9385  0.0360  0.0025 
M3  0.9761  0.0185  0.0008  0.9630  0.0251  0.0011  0.9662  0.0263  0.0016  0.9455  0.0335  0.0020 
M4  0.9725  0.0204  0.0011  0.9578  0.0279  0.0014  0.9796  0.0223  0.0012  0.9498  0.0317  0.0019 
M5  0.9704  0.0227  0.0012  0.9553  0.0276  0.0014  0.9891  0.0182  0.0007  0.9547  0.0290  0.0015 
M6  0.9780  0.0172  0.0007  0.9635  0.0247  0.0011  0.9743  0.0253  0.0015  0.9545  0.0324  0.0017 
The bold values are illustrated the Significant outcomes of the model.
4.2 Fitting hybrid model wavelet GMDH to the data sets (WGMDH)
A new forecast streamflow algorithm has been developed with a crossbred thought that uses the wavelet and GMDH algorithms. The primary TS data sets have been decomposed into WD periodic components using Mallat's DWT formation before applying the GMDH algorithm. As chosen, the decomposition levels based on the perceived time series have been decomposed into wavelet components. The best decomposition of TS data sets in wavelet analysis plays a significant role in conserving and minimizing information distortion.
The number of decomposition levels is estimated at M = 2.6848 and M = 2.7403 for the Indus and Chenab rivers using n = 484 and n = 550 monthly streamflow in the formula (6). Still, the data sets decomposed on roundoff value M = 3 (i.e. 2–4–8 months) decomposition levels used for both rivers estimations. The efficacy of WDs is measured by the correlation coefficients and analyzed between streamflow TS data sets and the various decomposition levels of wavelet coefficients. The measurement of the correlation coefficients of each TS WD and primary monthly streamflow data are described in Table 3. It is observed that the D _{11} (2months mode) has a low correlation coefficients value compared with D _{21} (4month mode) and D _{31} (8month mode) components, whereas these have well shown higher correlations. Subsequently, the correlated WDs have been well shown, such as D _{2p }, D _{3p }, and A _{3p } (approximate mode) components added to each other for developing the new input series and that developed series used with the WGMDH model as GMDH inputs. The proposed model structure of WGMDH is presented in Figure 7, and Figure 8 shows the combinations of useful details and approximation components mode such as D _{1p }, D _{2p }, D _{3p }, A _{3p }, and (D _{2p } + D _{3p } + A _{3p }). The six variant specimens of the newly input series data described in Table 1 are used to forecast as in the previous GMDH model.
Discrete wavelet components  D _{ t−1}/Q _{ t }  D _{ t−2}/Q _{ t }  D _{ t−3}/Q _{ t }  D _{ t−4}/Q _{ t }  D _{ t−5}/Q _{ t }  D _{ t−6}/Q _{ t }  Mean absolute correlation 

Correlations coefficients for Indus River data set  
D _{2p}  0.10354  0.09362  0.10051  0.07608  0.04751  0.03952  0.07679 
D _{3p}  0.40653  0.50106  0.40350  0.39352  0.39902  0.38305  0.41445 
A _{3p}  0.23205  0.20702  0.25505  0.27201  0.29056  0.31253  0.26154 
Correlations coefficients for Chenab River data set  
D _{1p}  0.05425  0.02674  0.00721  0.0075  0.02271  0.01922  0.02294 
D _{2p}  0.16821  0.17573  0.14175  0.08525  0.03623  0.08573  0.11548 
D _{3p}  0.41673  0.42220  0.43672  0.45021  0.45070  0.43271  0.43488 
A _{3p}  0.12475  0.12975  0.13971  0.15123  0.16821  0.19622  0.151645 
The computational codes for GMDH, WGMDH, LSSVM, and ANN models are written in the MATLAB software with the wavelet toolbox, which communicates the streamflow forecast and models estimations are presented in Tables 4–7 in the training and testing phases. The common critical approach concerning the machine learning model's achievement is the training and testing phases. The training phase is executed to the enhanced characteristic of the developed model algorithm while the testing phase estimations are used for the model validation and once the model is validated, then the training phase is eliminated from the testing phase [42,43,44]. The best statistical estimates are selected from the testing phase to appropriate forecast analysis; therefore, the testing phase estimations have the best forecast outcomes for developing the model. Further, the WGMDH model reveals the strong effects of the Indus and Chenab Rivers streamflow forecast, clearly described in Table 4. In view of the table analysis, the input model WM5 has estimated the most considerable and best for the Indus River; the largest and lowest values are R = 0.988 and MAE = 0.0117, RMSE = 0.0004 in the training phase, while the same input model has been observed that the highest and least optimal estimations as under R = 0.9859 and MAE = 0.0122, RMSE = 0.0002 in the testing phase. Similarly, in the table, the input combination WM4 has given the best estimate for Chenab River, with the largest and lowest values R = 0.9901 and MAE = 0.0119, RMSE = 0.0005 in the training phase, while the same input model has been opted for the highest and least estimations as under in the testing phase R = 0.9798 and MAE = 0.0127, RMSE = 0.0008.
Indus River WGMDH  Chenab River WGMDH  

Input models  Training data set  Testing data set  Training data set  Testing data set  
R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  
WM1  0.7673  0.1059  0.0191  0.7391  0.0296  0.0082  0.9491  0.0279  0.002  0.9392  0.0309  0.0019 
WM2  0.8698  0.0217  0.0107  0.9415  0.022  0.0021  0.956  0.0237  0.0017  0.9431  0.0244  0.0015 
WM3  0.9844  0.0158  0.0014  0.9601  0.0194  0.0022  0.9568  0.0206  0.0015  0.9593  0.0217  0.0013 
WM4  0.984  0.0204  0.0007  0.9723  0.0208  0.001  0.9901  0.0119  0.0005  0.9798  0.0127  0.0008 
WM5  0.988  0.0117  0.0004  0.9859  0.0122  0.0002  0.9815  0.0164  0.0016  0.9579  0.0171  0.0016 
WM6  0.9757  0.0205  0.0009  0.9811  0.0211  0.0011  0.9676  0.0188  0.0011  0.9593  0.0211  0.0012 
The bold values are illustrated the Significant outcomes of the model.
Input models  Indus River LSSVM  Chenab River LSSVM  

Training data set  Testing data set  Training data set  Testing data set  
R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  
M1  0.9026  0.0086  0.00025  0.868  0.0318  0.0109  0.9172  0.0421  0.0027  0.9041  0.0437  0.0042 
M2  0.9032  0.0086  0.00026  0.7357  0.0279  0.0097  0.9266  0.0401  0.004  0.9126  0.0445  0.004 
M3  0.9049  0.0086  0.00025  0.9161  0.0313  0.0109  0.9323  0.0393  0.0039  0.9196  0.043  0.0036 
M4  0.907  0.0084  0.00027  0.893  0.0323  0.0115  0.9342  0.036  0.0036  0.9296  0.0414  0.0034 
M5  0.9057  0.0085  0.00024  0.9217  0.0336  0.0102  0.9432  0.0345  0.0031  0.9408  0.0382  0.0027 
M6  0.9145  0.0078  0.00019  0.9335  0.0247  0.0019  0.9418  0.036  0.0032  0.9467  0.0327  0.0021 
The bold values are illustrated the Significant outcomes of the model.
Input models  Indus River ANN  Chenab River ANN  

Training data set  Testing data set  Training data set  Testing data set  
R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  R  MAE  RMSE  
M1  0.8678  0.0114  0.022  0.8193  0.0301  0.1035  0.9147  0.0173  0.0367  0.8144  0.0783  0.1008 
M2  0.8769  0.0124  0.0213  0.9195  0.0288  0.0922  0.9278  0.0219  0.0301  0.7956  0.0889  0.2934 
M3  0.8536  0.0125  0.0231  0.7094  0.0487  0.165  0.9026  0.0261  0.0346  0.7842  0.0298  0.372 
M4  0.7428  0.019  0.0297  0.6683  0.04  0.1059  0.8713  0.0271  0.0374  0.6945  0.2685  0.5161 
M5  0.7216  0.0207  0.0307  0.8643  0.0318  0.0824  0.8102  0.0199  0.0307  0.6141  0.1515  0.1973 
M6  0.8454  0.014  0.0237  0.8572  0.0306  0.0892  0.7009  0.0263  0.0332  0.5314  0.1259  0.2392 
The bold values are illustrated the Significant outcomes of the model.
Models  Monthly streamflow  Forecasting analysis  Monthly streamflow  Forecasting analysis  

R  MAE  RMSE  R  MAE  RMSE  
WGMDH  Indus River  0.9859  0.0122  0.0002  Chenab River  0.9798  0.0127  0.0008 
GMDH  0.9635  0.0246  0.0011  0.9547  0.029  0.0015  
LSSVM  0.9335  0.0247  0.0019  0.9467  0.0327  0.0021  
ANN  0.9125  0.0288  0.0824  0.8144  0.0294  0.1008 
4.3 Comparisons of forecasting models
The developed WGMDH model has been compared with GMDH, LSSVM, and ANN models based on performance and accuracy inspection by statistical estimations in the testing phase. The testing phase of Table 7 is reported that the least statistical estimations of MAE and RMSE have in the WGMDH model compared to other models; this is evidence of the valuable performance. Furthermore, the same model also provided authoritative evidence for estimating the most considerable value R; therefore, the WGMDH model is significant and better than the other models.
The correlation coefficient (R) estimated on Indus and Chenab Rivers for the GMDH model is 0.9635 and 0.9547, for LSSVM model is 0.9335 and 0.9467, for ANN model is 0.9125 and 0.9144 and for WGMDH model is 0.9859 and 0.9798 highest correlation value estimates and shows averagely 2.24% and 2.5% increased, respectively. It is a good sign for the developed model with the combination of WD and GMDH algorithms as compared to the conventional models in terms of model accuracy. In addition, the MAE and RMSE on Indus and Chenab Rivers data sets are statistically the smallest values in the forecasting models and Table 7 illustrated that the developed model had achieved the smallest value compared to the other models; this is again a valuable element for the model efficiency.
Figures 9 and 10 are demonstrated the predicted linear trend line scatter plotting for the WGMDH, GMDH, LSSVM, and ANN models from the testing phase and clearly is described the outperformance as relatively close to observed data on both streamflow stations by the WGMDH model. Therefore, the application of WGMDH in river streamflow forecast is better than that of conventional GMDH and other models. The scatter graphs are described by the regression fit line equation y = a + bx using MATLAB application, where a and b are the coefficients and defined for the GMDH, WGMDH, LSSVM, and ANN models, respectively. The WGMDH model has fewer scattered estimates, and the Rvalue of the WGMDH model has close 100 per cent (i.e. R = 0.9895 and R = 0.9798) for both Indus and Chenab data sets compared to the GMDH, WLSSVM, WANN and WMARS models. Therefore, this study concludes that the WGMDH model is superior to other models as it provides better forecast estimations for both streamflow stations.
5 Conclusions
The significant novelty of this research is that the new forecasting model on the two rivers of Pakistan has been developed by hybridizing two different meaningful algorithms and which has given tremendous performances, yielded presented in Table 4. The appropriate and best components of this model are the modified GMDH and WD.
The WD has filtered the monthly streamflow inputs into three (3) decomposition levels (2–4–8 months) have shown higher correlations and played a significant characteristic role in the primary data sets, as the most valuable information has been carried out after each level interpreted. After by WD algorithm have denoised and decoded the primary data sets and obtained the new input series “the sum of useful details (Ds) and the approximation (A) components (D _{2p } + D _{3p } + A _{3p })” of the WGMDH model and forecast estimations have been made. The WGMDH model has been used to forecast trained and tested phases through distinct new input combinations on monthly streamflow data sets. The research conducted comparative findings showed that the WGMDH model has an excellent estimated algorithm that increased accuracy forecasting than the conventional and modified models, clearly shown in Table 7 and Figures 9 and 10.
The key conclusion is that the TS forecasting of the WGMDH model has been delivered to be robust and efficient with great accuracy when the WD algorithm has been combined with the GMDH for the monthly streamflow preprocessing. The study of the relationships among the application of the dynamic hybrid algorithms and their operational strength in streamflow forecast is an impactable and an attractive research topic in the domain of TS streamflow estimations. Therefore, it is recommended to stakeholders that the water resources planning and management departments can use the proposed hybrid model based on the model’s performance to forecast various water operation policies.
Acknowledgments
This research work has been reinforced by the Department of Basic Sciences & Related Studies, Mehran University of Engineering & Technology, Jamshoro, Sindh, Pakistan. The authors have gratefully acknowledged the institute for its support and cooperation in the research activity and for providing a healthy research environment and facilities. The authors are grateful to the reviewers for their valuable suggestions.

Funding information: N. Jarasthitikulchai would like to acknowledge the financial support by Navamindradhiraj University through the Navamindradhiraj University Research Fund (NURF).

Author contributions: S. M. Pandhiani suggested the problem and guided throughout the manuscript, after authors discussions, W. A. Shaikh developed the algorithm, created a computationally coded algorithm, then obtained estimations, and wrote this manuscript. After that, S. F. Shah and M. A. Solangi supervised and critically examined the results with literature references, acknowledged the findings of this work encouraged the author W. A. Shaikh for conducting the research. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Conflict of interest: The authors state no conflict of interest.
References
[1] Amin R, Ahmad H, Shah K, Hafeez MB, Sumelka W. Theoretical and computational analysis of nonlinear fractional integrodifferential equations via collocation method. Chaos Solitons Fractals. 2021;151:111252.10.1016/j.chaos.2021.111252Search in Google Scholar
[2] Khan MN, Ahmad I, Akgül A, Ahmad H, Thounthong P. Numerical solution of timefractional coupled Korteweg–de Vries and Klein–Gordon equations by local meshless method. Pramana. 2021;95(1):1–13.10.1007/s12043020020255Search in Google Scholar
[3] Rezazadeh H, Jhangeer A, TalaTebue E, Hashemi MS, Sharif S, Ahmad H, et al. New wave surfaces and bifurcation of nonlinear periodic waves for GilsonPickering equation. Results Phys. 2021;24:104192.10.1016/j.rinp.2021.104192Search in Google Scholar
[4] Cimpoiasu R, Rezazadeh H, Florian DA, Ahmad H, Nonlaopon K, Altanji M. Symmetry reductions and invariantgroup solutions for a twodimensional Kundu–Mukherjee–Naskar model. Results Phys. 2021;28:104583.10.1016/j.rinp.2021.104583Search in Google Scholar
[5] Tariq H, Sadaf M, Akram G, Rezazadeh H, Baili J, Lv YP, et al. Computational study for the conformable nonlinear Schrödinger equation with cubic–quintic–septic nonlinearities. Results Phys. 2021;30:104839.10.1016/j.rinp.2021.104839Search in Google Scholar
[6] Baskonus HM, Gao W, Rezazadeh H, MirhosseiniAlizamini SM, Baili J, Ahmad H, et al. New classifications of nonlinear Schrödinger model with group velocity dispersion via new extended method. Results Phys. 2021;31:104910.10.1016/j.rinp.2021.104910Search in Google Scholar
[7] Ahmad H, Alam MN, Omri M. New computational results for a prototype of an excitable system. Results Phys. 2021;28:104666.10.1016/j.rinp.2021.104666Search in Google Scholar
[8] Yokus A, Durur H, Kaya D, Ahmad H, Nofal TA. Numerical comparison of Caputo and Conformable derivatives of time fractional BurgersFisher equation. Results Phys. 2021;25:104247.10.1016/j.rinp.2021.104247Search in Google Scholar
[9] Javeed S, Anjum S, Alimgeer KS, Atif M, Khan MS, Farooq WA, et al. A novel mathematical model for COVID19 with remedial strategies. Results Phys. 2021;27:104248.10.1016/j.rinp.2021.104248Search in Google Scholar PubMed PubMed Central
[10] Abouelregal AE, Ahmad H, Nofal TA, AbuZinadah H. Moore–Gibson–Thompson thermoelasticity model with temperaturedependent properties for thermoviscoelastic orthotropic solid cylinder of infinite length under a temperature pulse. Phys Scr. 2021;96:105201.10.1088/14024896/abfd63Search in Google Scholar
[11] Adnan RM, Liang Z, Parmar KS, Soni K, Kisi O. Modeling monthly streamflow in mountainous basin by MARS, GMDHNN and DENFIS using hydroclimatic data. Neural Comput Appl. 2020;7:1–19.10.1007/s00521020051643Search in Google Scholar
[12] Maheswaran R, Khosa R. Waveletsbased nonlinear model for realtime daily flow forecasting in Krishna River. J Hydroinform. 2013;15(3):1022–41.10.2166/hydro.2013.135Search in Google Scholar
[13] Kisi O. Wavelet regression model for shortterm streamflow forecasting. J Hydrol. 2010;389(3–4):344–53.10.1016/j.jhydrol.2010.06.013Search in Google Scholar
[14] Matalas NC. Mathematical assessment of synthetic hydrology. Water Resour Res. 1967;3(4):937–45.10.1029/WR003i004p00937Search in Google Scholar
[15] Box GE, Jenkins GM, Reinsel G. Time series analysis: forecasting and control Holdenday San Francisco. Boxtime series analysis: forecasting and control holden day. John Wiley & Sons; 1970.Search in Google Scholar
[16] Ismail S, Shabri A, Samsudin R. A hybrid model of self organizing maps and least square support vector machine for river flow forecasting. Hydrol Earth Syst Sci. 2012;16(11):4417–33.10.5194/hess1644172012Search in Google Scholar
[17] Stefenon SF, Ribeiro MH, Nied A, Mariani VC, dos Santos Coelho L, da Rocha DF, et al. Wavelet group method of data handling for fault prediction in electrical power insulators. Int J Electr Power Energy Syst. 2020;123:106269.10.1016/j.ijepes.2020.106269Search in Google Scholar
[18] SaberiMovahed F, Najafzadeh M, Mehrpooya A. Receiving more accurate predictions for longitudinal dispersion coefficients in water pipelines: training group method of data handling using extreme learning machine conceptions. Water Resour Manag. 2020;34(2):529–61.10.1007/s1126901902463wSearch in Google Scholar
[19] Clemen RT. Combining forecasts: A review and annotated bibliography. Int J Forecast. 1989;5(4):559–83.10.1016/01692070(89)900125Search in Google Scholar
[20] Diebold FX, Lopez JA. 8 Forecast evaluation and combination. Handb Stat. 1996;14:241–68.10.3386/t0192Search in Google Scholar
[21] Timmermann A. Forecast combinations. Handb Econ Forecast. 2006;1:135–96.10.1016/S15740706(05)010049Search in Google Scholar
[22] Shaikh W, Shah S, Solangi M, Pandhiani S. Forecasting analysis of GMDH model with LSSVM and MARS models for hydrological data sets (Case study). Indian J Sci Technol. 2019;12:1–6. 10.17485/ijst/2019/v12i39/147941 Search in Google Scholar
[23] Muhammad M. Time series modeling using markov and arima models. PhD Thesis, MSc. Thesis. Faculty of Civil Engineering, University of Technology Malaysia; 2012.Search in Google Scholar
[24] Pandhiani SM, Shabri AB. Time series forecasting using waveletleast squares support vector machines and wavelet regression models for monthly stream flow data. Open J Stat. 2013;3(03):183.10.4236/ojs.2013.33021Search in Google Scholar
[25] Krishna B, Rao YS, Nayak P. Time series modeling of river flow using wavelet neural networks. J Water Resour Prot. 2011;3(01):50.10.4236/jwarp.2011.31006Search in Google Scholar
[26] Smith LC, Turcotte DL, Isacks BL. Stream flow characterization and feature detection using a discrete wavelet transform. Hydrol Process. 1998;12(2):233–49.10.1002/(SICI)10991085(199802)12:2<233::AIDHYP573>3.0.CO;23Search in Google Scholar
[27] Esfetanaj NN, Nojavan S. The use of hybrid neural networks, wavelet transform and heuristic algorithm of WIPSO in smart grids to improve shortterm prediction of load, solar power, and wind energy. Operation of distributed energy resources in smart distribution networks. Elsevier; 2018. p. 75–100.10.1016/B9780128148914.000047Search in Google Scholar
[28] Pandhiani SM, Shabri AB. Time series forecasting by using hybrid models for monthly streamflow data. Appl Math Sci. 2015;9(57):2809–29.10.12988/ams.2015.52164Search in Google Scholar
[29] Al Wadi S, Ismail MT, Karim SAA. Forecasting financial time series data base on Wavelet transforms and Neural network model.Search in Google Scholar
[30] Daubechies I. Orthonormal bases of compactly supported wavelets. Commun Pure Appl Math. 1988;41(7):909–96.10.1137/1.9781611970104.ch6Search in Google Scholar
[31] Wang W, Ding J. Wavelet network model and its application to the prediction of hydrology. Nat Sci. 2003;1(1):67–71.Search in Google Scholar
[32] Shao XG, Leung AKM, Chau FT. Wavelet: a new trend in chemistry. Acc Chem Res. 2003;36(4):276–83.10.1021/ar990163wSearch in Google Scholar
[33] Mirbagheri SA, Nourani V, Rajaee T, Alikhani A. Neurofuzzy models employing wavelet analysis for suspended sediment concentration prediction in rivers. Hydrol Sci J–Jo des Sci Hydrol. 2010;55(7):1175–89.10.1080/02626667.2010.508871Search in Google Scholar
[34] Kişi Ö. Wavelet regression model as an alternative to neural networks for monthly streamflow forecasting. Hydrol Proces An Int J. 2009;23(25):3583–97.10.1002/hyp.7461Search in Google Scholar
[35] Mallat SG. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell. 1989;11(7):674–93.10.1515/9781400827268.494Search in Google Scholar
[36] Yonaba H, Anctil F, Fortin V. Comparing sigmoid transfer functions for neural network multistep ahead streamflow forecasting. J Hydrol Eng. 2010;15(4):275–83.10.1061/(ASCE)HE.19435584.0000188Search in Google Scholar
[37] Walton R, Binns A, Bonakdari H, Ebtehaj I, Gharabaghi B. Estimating 2year flood flows using the generalized structure of the group method of data handling. J Hydrol. 2019;575:671–89.10.1016/j.jhydrol.2019.05.068Search in Google Scholar
[38] Safari MJS, Ebtehaj I, Bonakdari H, Eshaghi MS. Sediment transport modeling in rigid boundary open channels using generalize structure of group method of data handling. J Hydrol. 2019;577:123951.10.1016/j.jhydrol.2019.123951Search in Google Scholar
[39] Wang WC, Chau KW, Cheng CT, Qiu L. A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J Hydrol. 2009;374(3–4):294–306.10.1016/j.jhydrol.2009.06.019Search in Google Scholar
[40] Firat M. Comparison of artificial intelligence techniques for river flow forecasting. Hydrol Earth Syst Sci. 2008;12(1):123–39.10.5194/hess121232008Search in Google Scholar
[41] Ismail S, Shabri A, Samsudin R. A hybrid model of selforganizing maps (SOM) and least square support vector machine (LSSVM) for timeseries forecasting. Expert Syst Appl. 2011;38(8):10574–78.10.1016/j.eswa.2011.02.107Search in Google Scholar
[42] Yu PS, Chen ST, Chang IF. Support vector regression for realtime flood stage forecasting. J Hydrol. 2006;328(3–4):704–16.10.1007/9783540798811_26Search in Google Scholar
[43] Uçar MK, Nour M, Sindi H, Polat K. The effect of training and testing process on machine learning in biomedical datasets. Math Probl Eng. 2020;2020. https://doi.org/10.1155/2020/2836236.Search in Google Scholar
[44] Shaikh WA, Shah SF, Pandhiani SM, Solangi MA. Wavelet decomposition impacts on traditional forecasting time series models. Computer Modeling in Engineering & Sciences. Tech Science Press. 2022;130(3):1517–32. 10.32604/cmes.2022.017822 Search in Google Scholar
© 2022 Wajid Ali Shaikh et al., published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.