Skip to content
BY 4.0 license Open Access Published by De Gruyter Open Access November 2, 2022

A hybrid forecasting model based on the group method of data handling and wavelet decomposition for monthly rivers streamflow data sets

  • Wajid Ali Shaikh , Syed Feroz Shah , Siraj Muhammad Pandhiani , Muhammad Anwar Solangi , Muhammad Farooq , Hijaz Ahmad , Artion Kashuri , Nantapat Jarasthitikulchai EMAIL logo and Weerawat Sudsutad
From the journal Open Physics


The natural streamflow of the River is encouraged to forecast through multiple methods. The impartiality of this study is the comparison of the forecast accuracy rates of the time-series (TS) hybrid model with the conventional model. The behavior of the natural monthly statistical chaotic streamflow to use in the forecasting models has been compiled by projecting two distinguished rivers, the Indus and Chenab of Pakistan. Therefore, this article is based on the monthly streamflow forecast analysis that has been reported using the group method of data handling with wavelet decomposition (WGMDH) as a new forecasting attribute. Discrete wavelets decompose the perceived data into sub-series and forecast hydrological variables; these fittingly have been endorsed as inputs in the hybrid model. The forecast efficiency and estimations of the hybrid model are measured by the appropriate statistical techniques such as mean absolute error (RME), root mean square error (RMSE), and correlation coefficients (R) and compared to the group method of data handling (GMDH), least-square support vector machine and artificial neural network conventional models. The comparative analysis shows that the hybrid WGMDH model is more stable and more potent for forecasting river flow than other predictive models and significantly proved that the hybrid model is a robust alternate forecasting tool for TS data sets.

1 Introduction

Nonlinear partial differential equations and nonlinear ordinary differential equations are the most considerable topics to analyze the wave phenomena which play great importance in engineering and science such as mathematical biology optical fibers, nonlinear optics, neural physics, solid state physics, chaos, hydrodynamics, quantum mechanics, fluid mechanics, and other numerous areas. The diffusion, dispersion, convection, reaction, and dissipation phenomena are extremely important in wave equations. Due to the complexity of these equations, there are some methods to find the solutions of wave equations [1,2,3,4,5,6,7,8,9,10].

Hydrological forecasting has always been a big deal for researchers and water management societies. The impact of hydrological changes is multiple rivers and water-related stakeholders, requiring informative observations to understand the hydrological forecast effects. Further, reliable and essential information is beneficial for the projections of streamflow, and these predictions improve the management efficiency regarding river disasters or behaviour such as floods, availability of future water resources, and so on, and is helpful for timely decisions [11]. For that multifaceted several hydrological data-driven statistical forecasting models of short-term (daily or hourly) and long-term (monthly, seasonally, or yearly) have been developed [12,13]. Stakeholders and water management societies are currently using these forecasting data-driven statistical models and continually striving to improve their confidence levels in hydrological models using mathematical and modern technological attributes [14,15].

The group method of data handling (GMDH) algorithm offers automated structural and technicality to the time series forecasting models. The GMDH application to innovate mathematically forecasting models of multi-parametric data sets is tremendous, and the model has performed more optimally than the conventional forecasting models such as Auto-Regressive Integrated Moving Average, Least Square Support Vector Machine (LSSVM), Artificial Neural Network (ANN), and Multivariate Adaptive Regression Splines [16]. The GMDH forecasting algorithm is used in different engineering fields [17]. For at least two decades, various combinations of the GMDH algorithms based on the second-order polynomial and different self-selection thresholds with optimizer algorithms are still being used instead of numerous traditional forecasting time series models in various fields [18,19,20,21]. An algorithm GMDH network describes higher-order polynomials that could estimate the complexity and linear dependency present in regression polynomials as there have two examples of network algorithms used in nonlinear polynomials that are hybrid and multilayer GMDH algorithms. The initial structure of the GMDH algorithm has two factors of the quadratic functions comprising the various layers, and every layer contains a series of inputs, which necessitate a specific contribution from the last layer. The GMDH algorithm is based on quadratic expressions and recommended a transfer function (TF) such as polynomial function, radial basis function, hyperbolic tangent function, and sigmoid function to communicate the network between input and output data sets model as used in the Volterra Functional Series [22]. The forecast accuracy estimation issues exercise the robust prediction concerning time series (TS) by artificial intelligence (AI) algorithms [11,23]. Generally, wavelet decomposition (WD) is widely applied for signal processing applications as a mathematical tool; it can transform the signal (hidden patterns in the data mass) into multiple lower resolution levels by administering the single mother wavelet functions scaling and shifting components [24,25]. Traditional neural networks have a standard criterion for accurately estimating nonlinear polynomials that can restrict. In contrast, WD has an excellent capability to exhibit the functions simultaneously, and their local characteristics reveal in the TS inputs by the hydrological process [26]. The usage of factors is the facilitator for neural networks training accuracy of nonlinear modelling signals [27].

A new hybrid model has been developed in this research work by combining the GMDH algorithm and WD technique for the hydrological forecast, and the model is used for forecasting the Rivers streamflow. The time series streamflow inputs are divided into two data sets to obtain both training and testing outcomes for the forecast [28]. Cross-validation has been used for the extracted estimates, and the findings declared that the accuracy of the hybrid group method and data handling and wavelet decomposition (WGMDH) model is more effective than GMDH, LSSVM, and ANN models [29]. Furthermore, the dimension reduction carried out by pre-processing and WD is used and observed to increase accuracy estimation and significant differences in the conventional models [30].

The findings of the research work may help to take adequate measures based on predictive analysis using the proposed WGMDH model. Furthermore, the findings may show an alternative design of the forecast model that could be more versatile and robust than the traditional GMDH model and may establish greater precision and unique analytical potential than a few traditional methods.

2 Methods and materials

This section followed the research methods and materials addressed in each section stated in this research study. The tractable proposed combined model is implemented as an efficient forecasting tool on river streamflow data sets that obtain comprehensive informational signals. The implementation of WD in numerous conventional forecasting models increases the performance of the models and achieves exceptional outcomes. The developed coupled model with the wavelet and AI algorithms fulfils the described two-step protocol for forecasting projects.

  1. The WD mastery has been used as a pre-processor of input data sets. It analyses TS signals at different intervals, providing inputs and significant detail regarding data sets.

  2. Subsequently, searching the appropriate input signal by WD and used for further manipulations as AI input is used in numerous regular forecasting models.

Initially forecasted TS inputs are decomposed into a sub-TS {W 1, W 2,…, W p ,C p } using the WD algorithm, where, W 1, W 2, …, W p and C p are described as detailed and contextual TS inputs, respectively. Usually, the forecast time series and each sub-TS have different roles and behaviour. Therefore, the attribute and influence on inputs of the forecasting TS are different from each other. The developed GMDH, LSSVM, and ANN forecast models t are the input sub-TS of the models, and the forecasting TS inputs at t + T (T is the forecast measurement time) time have the output of models. Finally, algorithms constructed the wavelet network models (WNM) and discovered several algorithms on different weights [31]. The fundamental object of forecasting TS data sets to the WNM is forming the combined models with the WD algorithm [32].

2.1 The WD

The WD algorithm combines the traditional AI forecasting algorithms developed in past periods. A novel methodology to innovate the TS forecast model and the WD algorithm has emerged as an ever more useful and vital tool for the models [33]. The WD established a predicting method for TS, which presented to develop several decomposition goals that utilize signals for repetitive trous wavelet transformation and benefit from being a shift-invariant. Further, the WD algorithm can de-noise nonstationary signals into sub-signals at different levels with a relevant resource for improved streamflow elucidation [13,27]. The mathematical structure of the WD is described in Eq. (1) as a continuous TS x(t) (∵t ∈ [−∞, ∞]).

(1) ψ ( t , s ) = 1 s ψ t τ 2 ,

where t is the time, τ  is the time step that iterates the window function,  and s ∈ [0, ∞] for wavelet scale defined, respectively. As ψ(t) is known as the mother wavelet and expressed in Eq. (2),

(2) ψ ( t ) d t = 0 ,

whereas the continuous wavelet transform (CWT) illustrated in Eq. (3) [28]

(3) W ( τ , s ) = 1 s x ( t ) ψ ¯ t τ s d t ,

where ψ ¯ ( t ) and W(τ, s) described to complex conjugation and consumed the total time of the inputs multiplied by the scale and shifted the form of the wavelet function ψ(t). Forecast by the CWT is non-feasible since wavelet-coefficient estimation at each credible scale is time-consuming and provides some outcomes. Thus, the WD is preferred in most prognostic models due to the time-efficient and simple algorithm. The WD consists of selecting the scales and position concerning the powers of 2, alleged dyadic scales, translations, and the interpretation that may be most precise and effectual. The significant impact of implying the WD is its robust algorithm because it has no hypothetically specious or parametric testing technique [34]. The WD is described as a mathematical structure in Eq. (4)

(4) ψ m , n t τ s = 1 s 0 m / 2 ψ t n τ 0 s 0 m s 0 m ,

where m, nZ switching as the scale and length of the TS, respectively, s 0 > 1 is defined for fixed dilation step, and τ 0 > 1 is the position parameter. The parameters s 0 = 2  and  τ 0 = 1 are the most common preference. For a discrete TS x(t) that occurs at t discrete time, then WD formation is defined as follows:

(5) W m , n = 2 m / 2 t = 0 N 1 ψ ( 2 m t n ) x ( t ) ,

In Eq. (5), x(t) is defined as discrete TS(∵1 ≤ tN − 1) on the power of 2 (N = 2 m ), and the wavelet coefficient of the WD (W m,n ) is described on s = 2 m   and  τ = 2 m n scales, where n is the time-translation component that varies within the ranges 0 < n < 2Mm  and  1 < m < M, and M is the estimation of decomposition levels, which is interpreted the TS into numerous wavelet input segments on the particular decomposition level.

Conclusively, the desired decomposition level of the time series in WD plays a significant factor in retaining figures and decreasing the time series curvature. Though, there has no such hypothesis regarding the number of decomposition levels for data sets. Formation (6) is used to determine the number of decomposition levels [34].

(6) M = log ( n ) ,

According to Mallat's theory, the discrete TS x(t) decomposed into a new series of linearity independent estimate and detail signals through the inverse WD, which is described in Eq. (7) [13,35];

(7) x ( t ) = T + m = 1 M t = 0 2 M m 1 W m , n 2 m / 2 ψ ( 2 m t n ) ,

or defined as

(8) x ( t ) = A M ( t ) + m = 1 M D m ( t ) ,

Eq. (8) A M (t) is the residual term at M decomposition level and D m (t) has detailed subseries (∵m = 1,2, …, M) that are taken as small features in interpreted value data. The following various forms of TFs have emerged and are vastly used in the model [36]; the efficiency of the conventional GMDH models can be improved.

(9) Polynomial function y = z z = y Sigmoid function y = 1 1 + e z z = ln y 1 y , ( y 1 ) Radial basis function y = e z 2 z = ln ( y ) Hyperbolic tangent function y = tanh ( z ) = 2 1 + e 2 z 1 z = tanh 1 ( y ) ,

Three different TF types (9), the sigmoid, radial basis, and hyperbolic tangent functions have been proposed and improved the conventional GMDH model's performance and proved the numerous substitutes into a single network. In conventional GMDH models, most researchers have measured the partial quadratic polynomials (PQP) as the TF and demonstrated that utilizing various TF exclusively in a model gives preferred outcomes rather than using the units for a single TF [37,38,39]. Nevertheless, multiple TF (9) are accessible for the modified GMDH model, and it has provided more reliable and accurate outcomes than the single TF models.

2.2 GMDH

The GMDH model has been established to display the complex nonlinear framework by utilizing a TF to communicate the connection between input and output data sets. The Volterra linear function (10) describes the input and output mapping with a neural network like the GMDH model.

(10) z = b 0 + i = 1 n b i y i + i = 1 n j = 1 n b i j y i y j + i = 1 n j = 1 n k = 1 n b i j k y i y j y k + , ( n N ) .

The mathematical structure (11) comprising the two order variables (neurons) is expressed in the following form:

(11) z = g ( y i , y j ) = b 0 + b 1 y i + b 2 y j + b 3 y i 2 + b 4 y j 2 + b 5 y i x j , ( j = 1 , 2 , , n ) .

In mathematical structure (11) have six coefficients {b o , b 1, b 2, b 3, b 4, b 5} in the g function, and these are unknowns [37]. Let us consider the coefficients are determined to expend the least square method. The system's observed input and forecast (output) variables are set for y and z, respectively.

The following are the iterative steps of the MGMDH model structure

Step I: Initially considered M 1 = m as the neurons of the input vector in the first layer Y = (y i , y i , …, y n ), whereas m is the number of inputs and taken j = 1 as a threshold.

Step II: Set N = M j ( M j 1 ) 2 for all independent variables Y as new variables x 1, x 2, …, x N in the data set and by the hypothesis of the PQPs constructed Eq. (10).

Step III: The aggregate number of partial descriptions (PDs) at the current layer has equivalent to the following combination:

(12) T = C 2 n = n ! 2 ! ( n 2 ) ! ,

Eq. (12) n is the number of nodes that have been chosen input variables from the nodes of the previous layer. The sort of PDs in each layer of the network is similar.

Step IV: Estimated the coefficients of the TF by the least squared (MSE) in the form of

(13) B j = ( Y i T Y i ) 1 Y j Z ,

where B = {b 0, b 1, …,  b 5} has the input vector of the coefficients of the variables and Y = {y 1, y 2, …,  y M } T is the output value from the following observation (14)

(14) Y = 1 y 1 i y 1 j y 1 i y 1 j y 1 i 2 y 1 j 2 1 y 2 i y 2 j y 2 i y 2 j y 2 i 2 y 2 j 2 1 y m i y m j y m i y m j y m i 2 y m j 2 ,

Step V: Distinguish the absolute best neuron out of T neurons. If the tiniest estimation by RMSE is less than the threshold, then the input will be terminated, or new input variables are introduced {y 1, y 2, …, y n , y′} (since y′ specified for the RMSE estimate) and T = T + 1 will be set; otherwise, the GMDH model Steps II and III will be repeated.

Step VI: The termination standard for the iteration of the model is as follows:

The lowest estimation at the individual layer acquired this iteration is associated with the lowest values found at the previous level; the lowest value is compared to the smallest assessments using the GMDH-constructed layer model. Unless the improvement is accomplished, till that repeats and rehashes steps II–IV. The last layer estimated the one node classified by the best execution, selected as the output node, and rejected the rest of the layer nodes. Finally, the GMDH model forecasting estimation is acquired, and then, the iteration is terminated and acknowledged by the network.

2.3 The WGMDH model

The methodology for a hybrid forecast model using the GMDH model combined with WD is anticipating future data sets by utilizing wavelet capability as a time-frequency restriction. Consequently, the best possible data sets of the GMDH algorithm have been developed through decomposition with a few good contrast strategies in this forecasting algorithm. The trous WD lets a significant translation of the filter outputs. The A-trous wavelet has been shown and begins with performing successive convolutions with the discrete low-pass channel.

(15) D i + 1 ( j ) = j = + D i ( j + 2 i I ) ,

whereas D 0(t) = Y(I) has the original time series. The development of separations between instances has priority and 2 i I illustrated that a trous name is synonymous with this algorithm, and the low-pass filter is a spline. This expression has a point symmetric; it has also helped to minimize; however, lastly, it does not consider how time is essentially asymmetric. The smoothed structure and contrast of progressive variants enable the w i wavelet coefficients to be accomplished by the detail signal.

(16) w i ( j ) = D i + 1 ( j ) D i ( j ) .

Eq. (16) is contained detailed signal or wavelet coefficients with a small detention feature of interpretational in the data value. The wavelet coefficients are iterated into the GMDH and analyzed as inputs. The WGMDH model ability was carried out by choosing the right inputs to overcome by the hybrid algorithm for the appropriate estimations.

2.4 Statistical performance

The use of statistical estimations is crucial for the reliability and credibility of any forecasting algorithm. Therefore, the estimations of the forecasting algorithms have been investigated through three statistical methods, such as the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), and the correlation coefficients (R). The statistical estimation outcomes fitted to the best fit line and the forecast performances estimated by the following indices [38]

(17) MAE = 1 m t = 1 m y t y ˆ t , ( 0 < MAE < ) ,

(18) RMSE = 1 m t = 1 m ( y t y ˆ t ) 2 , ( 0 < RMSE < ) ,

(19) R = 1 m t = 1 m ( y t y ¯ t ) ( y ˆ t y ¯ ˆ t ) 1 m t = 1 m ( y t y ¯ t ) 2 1 m t = 1 m ( y ˆ t y ¯ ˆ t ) 2 , ( 1 < R < 1 ) ,

where y t , y ˆ t , y ¯ t and y ¯ ˆ t are the actual (streamflow observed), forecasted, average streamflow observed, and average forecast values at a time t and, respectively, whereas m is the total number of observations.

The normalization and performance of the data set are used in the training and testing process and observed the estimations to select the best model with the least values of MAE and RMSE and the highest value of R (effect of flow).

3 Study areas and data utilization

The monthly TS streamflow data have been collected from two prominent rivers of the Islamic Republic of Pakistan, the Indus River, and the Chenab River, to endorse the developed forecast WMGMDH model. Both rivers are playing significant contributions to the region. These two rivers have made the forecast approximation of streamflow monthly records (Figure 1).

Figure 1 
               Country map of the study areas.
Figure 1

Country map of the study areas.

3.1 Indus River

The Indus River, locally called Sindhu discovered by the Scylax of Caryanda, the Greek advisor of the King Darius of Persia in 515 BC, and the early classical age information of the river had spread out in the West. The Indus River is one of the largest rivers in Pakistan and the Asian subcontinent and plays a significant role in the agricultural production industry. This river flows by Western Tibet, China, and India along with the full length of Pakistan to the south direction (via Gilgit-Baltistan) and falls into the Arabian Sea near the port city of Karachi in Sindh.

Figure 2 
                  Indus river location of the study area.
Figure 2

Indus river location of the study area.

The Indus River streamflow 484 months rate has been collected from the Department of Irrigation and Drainage, Sindh, from September 1978 to December 2018 (Figures 2 and 3).

Figure 3 
                  Streamflow of Indus River (484 months rate from 09-1978 to 12-2018).
Figure 3

Streamflow of Indus River (484 months rate from 09-1978 to 12-2018).

3.2 Chenab River

As five main rivers are flowing in the Punjab province of Pakistan, one of them is the Chenab River which originates in the upper Himalayas of the Indian state of Himachal Pradesh. It is flowing from Jammu and Kashmir territory into the country's fields before flowing into the Indus River near the city of Uch Sharif. Water has been granted for this significant River under the Indus Water Treaty.

Figure 4 
                  Chenab river location of the study area.
Figure 4

Chenab river location of the study area.

The Chenab River streamflow 550 months rate has been collected from the Department of Irrigation and Drainage, Punjab Barrage from March 1973 to December 2018.

Figure 5 
                  Streamflow of Chenab River (550 months rate from 03-1973 to 12-2018).
Figure 5

Streamflow of Chenab River (550 months rate from 03-1973 to 12-2018).

The hydrological data sets are used to forecast the river streamflow of both well-known rivers and validate the modified GMDH algorithm for mean monthly river streamflow; then, the same data sets are used to develop hybridized wavelet decomposition GMDH model (Figures 4 and 5).

3.3 Model structure

Appropriate picking up the streamflow TS input has one of the most significant steps in constructing a new forecast prototype. In case of the average discrepancy ratio either less (underestimated) or more than (overestimated) the unit, the following linear transformation (LT) (20) estimated the inputs for the model with decomposed components which are scaled at (0, 1)

(20) y t = h t h max .

In LT (19) has been represented y t is the initial streamflow data set decomposed by Empirical Mode Decomposition (EMD) with h t = X t − min(X) the normalization of the data is conducted before all the Intrinsic Mode Functions (IMFs) are observed by the Encoder-Decoder Long Short Term Memory (En-De-LSTM) model at the time t and X t the decomposed components and h max = max(X) − min(X) the maximum values between the actual values.

The corresponding input variables allow the network to get the aspired output and prevent essential information trial effectively. There are no fixed-designed rules for the TS input variables to develop the forecast prototype, while a general structure can be followed based on the successful implementation of water resources issues [39,40,41]. In this research work, six input model combinations have been developed to explore the input-output variables model, equivalent to lagging variables from the previous monthly streamflow time series cycles y tp , where 1–6 months are set for p, respectively. Eqs. (21) and (22) mathematically model structures followed for original y tp and DW tp monthly streamflow statistics.

(21) y t = f ( y t 1 , y t 2 , , y t p ) ,


(22) y t = f ( D W t 1 , D W t 2 , , D W t p ) ,

where y t−1 corresponds to the last month's streamflow value, there DW t−1 effectively estimated the components D s (D 2, D 4, and D 8) by evaluating the y t−1 and the input model structure values described in Table 1 for monthly streamflow forecasting of the Indus River and Chenab River (Figures 6a and 6b).

Table 1

Input model combinations on streamflow data sets

Model Input combination Model WD input combination
M 1 y t = f(z t−1, z t−2) WM1 WD t−1
M 2 y t = f(z t−1, z t−2, z t−3, z t−4) WM2 WD t−1, WD t−2
M 3 y t = f(z t−1, z t−2, z t−3, z t−4, z t−5, z t−6) WM3 WD t−1, WD t−2, WD t−3
M 4 y t = f z t 1 , z t 2 , z t 3 , z t 4 , z t 5 , z t 6 , z t 7 , z t 8 WM4 WD t−1, WD t−2, WD t−3, WD t−4
M 5 y t = f z t 1 , z t 2 , z t 3 , z t 4 , z t 5 , z t 6 , z t 7 , z t 8 , z t 9 , z t 10 WM5 WD t−1, WD t−2, WD t−3, WD t−4, WD t−5
M 6 y t = f z t 1 , z t 2 , z t 3 , z t 4 , z t 5 , z t 6 , z t 7 , z t 8 , z t 9 , z t 10 , z t 11 , z t 12 WM6 WD t−1, WD t−2, WD t−3, WD t−4, WD t−5, WD t−6
Figure 6 
                  (a) Flow structure of the study; (b) flowchart of proposed WMGMDH model.
Figure 6 
                  (a) Flow structure of the study; (b) flowchart of proposed WMGMDH model.
Figure 6

(a) Flow structure of the study; (b) flowchart of proposed WMGMDH model.

4 Results and discussions

4.1 Fitting GMDH model to the data sets

It is significant to pick the proper input series for the GMDH model and then use it; therefore, the six(M1−M6) input structures are suitable for the model used for streamflow data sets in two-tier training and testing phases. Significant and various studies have shown the positive results of the radial basis function (RBF) in the use of the AI models for the configuration and estimation of TS data sets [16,29,30]. Therefore, this study uses the RBF kernel function to control the streamflow modeling and TS forecast system. There are two parameters (γ, σ 2) used for the estimation of the GMDH model. These parameters are used to test the model more efficiently and systematically to address possible shortcomings in the estimation and trails method by the standard grid search procedure. The model performance has been analyzed to determine the parameter values by grid exploring γ  and σ 2 in the range of 10–1,000 and 0.01–1.0, respectively, and the cross-validation procedure is applied for calibrating parameters to evade over-fit. The k-fold cross-validation of the training phase for approximation forecasting error is given for each pair of hyper-parameter (γ, σ 2) in the search space. The forecasting best-fit algorithm is incorporated based on execution and performance standards.

The R, MAE, and RMSE statistical estimates for the training and testing phases from a combination of six input specimens (M1−M6) of the GMDH model are presented in Table 2 for the Indus and Chenab rivers. It is observed that the training and testing phases M6 and M5 input models R had the greatest estimate whereas the least estimations have for MAE and MSE; the testing estimations of M6 and M5 input combinations of R (0.9635 and 0.9547), MAE (0.0247 and 0.0290) and RMSE (0.0011 and 0.0015) shows better estimations for the model.

Table 2

Forecasting estimates of the GMDH

Indus River GMDH Chenab River GMDH
Input models Training data set Testing data set Training data set Testing data set
M1 0.6775 0.1071 0.0198 0.5355 0.0969 0.0151 0.9450 0.0357 0.0027 0.9193 0.0419 0.0029
M2 0.9688 0.0251 0.0013 0.9518 0.0304 0.0016 0.9578 0.0290 0.0019 0.9385 0.0360 0.0025
M3 0.9761 0.0185 0.0008 0.9630 0.0251 0.0011 0.9662 0.0263 0.0016 0.9455 0.0335 0.0020
M4 0.9725 0.0204 0.0011 0.9578 0.0279 0.0014 0.9796 0.0223 0.0012 0.9498 0.0317 0.0019
M5 0.9704 0.0227 0.0012 0.9553 0.0276 0.0014 0.9891 0.0182 0.0007 0.9547 0.0290 0.0015
M6 0.9780 0.0172 0.0007 0.9635 0.0247 0.0011 0.9743 0.0253 0.0015 0.9545 0.0324 0.0017

The bold values are illustrated the Significant outcomes of the model.

4.2 Fitting hybrid model wavelet GMDH to the data sets (WGMDH)

A new forecast streamflow algorithm has been developed with a crossbred thought that uses the wavelet and GMDH algorithms. The primary TS data sets have been decomposed into WD periodic components using Mallat's DWT formation before applying the GMDH algorithm. As chosen, the decomposition levels based on the perceived time series have been decomposed into wavelet components. The best decomposition of TS data sets in wavelet analysis plays a significant role in conserving and minimizing information distortion.

The number of decomposition levels is estimated at M = 2.6848  and  M = 2.7403 for the Indus and Chenab rivers using n = 484  and  n = 550 monthly streamflow in the formula (6). Still, the data sets decomposed on round-off value M = 3 (i.e. 2–4–8 months) decomposition levels used for both rivers estimations. The efficacy of WDs is measured by the correlation coefficients and analyzed between streamflow TS data sets and the various decomposition levels of wavelet coefficients. The measurement of the correlation coefficients of each TS WD and primary monthly streamflow data are described in Table 3. It is observed that the D 11 (2-months mode) has a low correlation coefficients value compared with  D 21 (4-month mode) and  D 31 (8-month mode) components, whereas these have well shown higher correlations. Subsequently, the correlated WDs have been well shown, such as D 2p ,  D 3p ,  and  A 3p (approximate mode) components added to each other for developing the new input series and that developed series used with the WGMDH model as GMDH inputs. The proposed model structure of WGMDH is presented in Figure 7, and Figure 8 shows the combinations of useful details and approximation components mode such as D 1p ,  D 2p ,  D 3p , A 3p , and (D 2p +  D 3p + A 3p ). The six variant specimens of the newly input series data described in Table 1 are used to forecast as in the previous GMDH model.

Table 3

The correlation coefficients between each sub-time series and the monthly streamflow data sets of Indus River and Chenab River

Discrete wavelet components D t−1/Q t D t−2/Q t D t−3/Q t D t−4/Q t D t−5/Q t D t−6/Q t Mean absolute correlation
Correlations coefficients for Indus River data set
D 2p 0.10354 0.09362 0.10051 0.07608 0.04751 0.03952 0.07679
D 3p 0.40653 0.50106 0.40350 0.39352 0.39902 0.38305 0.41445
A 3p 0.23205 0.20702 0.25505 0.27201 0.29056 0.31253 0.26154
Correlations coefficients for Chenab River data set
D 1p 0.05425 0.02674 0.00721 0.0075 0.02271 0.01922 0.02294
D 2p 0.16821 0.17573 0.14175 0.08525 0.03623 0.08573 0.11548
D 3p 0.41673 0.42220 0.43672 0.45021 0.45070 0.43271 0.43488
A 3p 0.12475 0.12975 0.13971 0.15123 0.16821 0.19622 0.151645
Figure 7 
                  Framework of the study.
Figure 7

Framework of the study.

Figure 8 
                  Wavelet decomposed combinations of effective details and approximation components mode of TS data sets of Indus and Chenab Rivers.
Figure 8

Wavelet decomposed combinations of effective details and approximation components mode of TS data sets of Indus and Chenab Rivers.

The computational codes for GMDH, WGMDH, LSSVM, and ANN models are written in the MATLAB software with the wavelet toolbox, which communicates the streamflow forecast and models estimations are presented in Tables 47 in the training and testing phases. The common critical approach concerning the machine learning model's achievement is the training and testing phases. The training phase is executed to the enhanced characteristic of the developed model algorithm while the testing phase estimations are used for the model validation and once the model is validated, then the training phase is eliminated from the testing phase [42,43,44]. The best statistical estimates are selected from the testing phase to appropriate forecast analysis; therefore, the testing phase estimations have the best forecast outcomes for developing the model. Further, the WGMDH model reveals the strong effects of the Indus and Chenab Rivers streamflow forecast, clearly described in Table 4. In view of the table analysis, the input model WM5 has estimated the most considerable and best for the Indus River; the largest and lowest values are R = 0.988 and MAE = 0.0117, RMSE = 0.0004 in the training phase, while the same input model has been observed that the highest and least optimal estimations as under R = 0.9859 and MAE = 0.0122, RMSE = 0.0002 in the testing phase. Similarly, in the table, the input combination WM4 has given the best estimate for Chenab River, with the largest and lowest values R = 0.9901 and MAE = 0.0119, RMSE = 0.0005 in the training phase, while the same input model has been opted for the highest and least estimations as under in the testing phase R = 0.9798 and MAE = 0.0127, RMSE = 0.0008.

Table 4

Forecasting results by WGMDH of Indus River and Chenab River

Indus River WGMDH Chenab River WGMDH
Input models Training data set Testing data set Training data set Testing data set
WM1 0.7673 0.1059 0.0191 0.7391 0.0296 0.0082 0.9491 0.0279 0.002 0.9392 0.0309 0.0019
WM2 0.8698 0.0217 0.0107 0.9415 0.022 0.0021 0.956 0.0237 0.0017 0.9431 0.0244 0.0015
WM3 0.9844 0.0158 0.0014 0.9601 0.0194 0.0022 0.9568 0.0206 0.0015 0.9593 0.0217 0.0013
WM4 0.984 0.0204 0.0007 0.9723 0.0208 0.001 0.9901 0.0119 0.0005 0.9798 0.0127 0.0008
WM5 0.988 0.0117 0.0004 0.9859 0.0122 0.0002 0.9815 0.0164 0.0016 0.9579 0.0171 0.0016
WM6 0.9757 0.0205 0.0009 0.9811 0.0211 0.0011 0.9676 0.0188 0.0011 0.9593 0.0211 0.0012

The bold values are illustrated the Significant outcomes of the model.

Table 5

Forecasting results by LSSVM of Indus River and Chenab River

Input models Indus River LSSVM Chenab River LSSVM
Training data set Testing data set Training data set Testing data set
M1 0.9026 0.0086 0.00025 0.868 0.0318 0.0109 0.9172 0.0421 0.0027 0.9041 0.0437 0.0042
M2 0.9032 0.0086 0.00026 0.7357 0.0279 0.0097 0.9266 0.0401 0.004 0.9126 0.0445 0.004
M3 0.9049 0.0086 0.00025 0.9161 0.0313 0.0109 0.9323 0.0393 0.0039 0.9196 0.043 0.0036
M4 0.907 0.0084 0.00027 0.893 0.0323 0.0115 0.9342 0.036 0.0036 0.9296 0.0414 0.0034
M5 0.9057 0.0085 0.00024 0.9217 0.0336 0.0102 0.9432 0.0345 0.0031 0.9408 0.0382 0.0027
M6 0.9145 0.0078 0.00019 0.9335 0.0247 0.0019 0.9418 0.036 0.0032 0.9467 0.0327 0.0021

The bold values are illustrated the Significant outcomes of the model.

Table 6

Forecasting results by ANN of Indus River and Chenab River

Input models Indus River ANN Chenab River ANN
Training data set Testing data set Training data set Testing data set
M1 0.8678 0.0114 0.022 0.8193 0.0301 0.1035 0.9147 0.0173 0.0367 0.8144 0.0783 0.1008
M2 0.8769 0.0124 0.0213 0.9195 0.0288 0.0922 0.9278 0.0219 0.0301 0.7956 0.0889 0.2934
M3 0.8536 0.0125 0.0231 0.7094 0.0487 0.165 0.9026 0.0261 0.0346 0.7842 0.0298 0.372
M4 0.7428 0.019 0.0297 0.6683 0.04 0.1059 0.8713 0.0271 0.0374 0.6945 0.2685 0.5161
M5 0.7216 0.0207 0.0307 0.8643 0.0318 0.0824 0.8102 0.0199 0.0307 0.6141 0.1515 0.1973
M6 0.8454 0.014 0.0237 0.8572 0.0306 0.0892 0.7009 0.0263 0.0332 0.5314 0.1259 0.2392

The bold values are illustrated the Significant outcomes of the model.

Table 7

Comparison of forecasting models

Models Monthly streamflow Forecasting analysis Monthly streamflow Forecasting analysis
WGMDH Indus River 0.9859 0.0122 0.0002 Chenab River 0.9798 0.0127 0.0008
GMDH 0.9635 0.0246 0.0011 0.9547 0.029 0.0015
LSSVM 0.9335 0.0247 0.0019 0.9467 0.0327 0.0021
ANN 0.9125 0.0288 0.0824 0.8144 0.0294 0.1008

4.3 Comparisons of forecasting models

The developed WGMDH model has been compared with GMDH, LSSVM, and ANN models based on performance and accuracy inspection by statistical estimations in the testing phase. The testing phase of Table 7 is reported that the least statistical estimations of MAE and RMSE have in the WGMDH model compared to other models; this is evidence of the valuable performance. Furthermore, the same model also provided authoritative evidence for estimating the most considerable value R; therefore, the WGMDH model is significant and better than the other models.

The correlation coefficient (R) estimated on Indus and Chenab Rivers for the GMDH model is 0.9635 and 0.9547, for LSSVM model is 0.9335 and 0.9467, for ANN model is 0.9125 and 0.9144 and for WGMDH model is 0.9859 and 0.9798 highest correlation value estimates and shows averagely 2.24% and 2.5% increased, respectively. It is a good sign for the developed model with the combination of WD and GMDH algorithms as compared to the conventional models in terms of model accuracy. In addition, the MAE and RMSE on Indus and Chenab Rivers data sets are statistically the smallest values in the forecasting models and Table 7 illustrated that the developed model had achieved the smallest value compared to the other models; this is again a valuable element for the model efficiency.

Figures 9 and 10 are demonstrated the predicted linear trend line scatter plotting for the WGMDH, GMDH, LSSVM, and ANN models from the testing phase and clearly is described the outperformance as relatively close to observed data on both streamflow stations by the WGMDH model. Therefore, the application of WGMDH in river streamflow forecast is better than that of conventional GMDH and other models. The scatter graphs are described by the regression fit line equation y = a + bx using MATLAB application, where a and b are the coefficients and defined for the GMDH, WGMDH, LSSVM, and ANN models, respectively. The WGMDH model has fewer scattered estimates, and the R-value of the WGMDH model has close 100 per cent (i.e. R = 0.9895 and R = 0.9798) for both Indus and Chenab data sets compared to the GMDH, WLSSVM, WANN and WMARS models. Therefore, this study concludes that the WGMDH model is superior to other models as it provides better forecast estimations for both streamflow stations.

Figure 9 
                  Predicted linear trend line scatter graphs for the Indus streamflow by WGMDH, GMDH, LSSVM and ANN models during the testing phase.
Figure 9

Predicted linear trend line scatter graphs for the Indus streamflow by WGMDH, GMDH, LSSVM and ANN models during the testing phase.

Figure 10 
                  Predicted linear trend line scatter graphs for the Chenab streamflow by WGMDH, GMDH, LSSVM, and ANN models during the testing phase.
Figure 10

Predicted linear trend line scatter graphs for the Chenab streamflow by WGMDH, GMDH, LSSVM, and ANN models during the testing phase.

5 Conclusions

The significant novelty of this research is that the new forecasting model on the two rivers of Pakistan has been developed by hybridizing two different meaningful algorithms and which has given tremendous performances, yielded presented in Table 4. The appropriate and best components of this model are the modified GMDH and WD.

The WD has filtered the monthly streamflow inputs into three (3) decomposition levels (2–4–8 months) have shown higher correlations and played a significant characteristic role in the primary data sets, as the most valuable information has been carried out after each level interpreted. After by WD algorithm have denoised and decoded the primary data sets and obtained the new input series “the sum of useful details (Ds) and the approximation (A) components (D 2p +  D 3p + A 3p )” of the WGMDH model and forecast estimations have been made. The WGMDH model has been used to forecast trained and tested phases through distinct new input combinations on monthly streamflow data sets. The research conducted comparative findings showed that the WGMDH model has an excellent estimated algorithm that increased accuracy forecasting than the conventional and modified models, clearly shown in Table 7 and Figures 9 and 10.

The key conclusion is that the TS forecasting of the WGMDH model has been delivered to be robust and efficient with great accuracy when the WD algorithm has been combined with the GMDH for the monthly streamflow pre-processing. The study of the relationships among the application of the dynamic hybrid algorithms and their operational strength in streamflow forecast is an impactable and an attractive research topic in the domain of TS streamflow estimations. Therefore, it is recommended to stakeholders that the water resources planning and management departments can use the proposed hybrid model based on the model’s performance to forecast various water operation policies.


This research work has been reinforced by the Department of Basic Sciences & Related Studies, Mehran University of Engineering & Technology, Jamshoro, Sindh, Pakistan. The authors have gratefully acknowledged the institute for its support and cooperation in the research activity and for providing a healthy research environment and facilities. The authors are grateful to the reviewers for their valuable suggestions.

  1. Funding information: N. Jarasthitikulchai would like to acknowledge the financial support by Navamindradhiraj University through the Navamindradhiraj University Research Fund (NURF).

  2. Author contributions: S. M. Pandhiani suggested the problem and guided throughout the manuscript, after authors discussions, W. A. Shaikh developed the algorithm, created a computationally coded algorithm, then obtained estimations, and wrote this manuscript. After that, S. F. Shah and M. A. Solangi supervised and critically examined the results with literature references, acknowledged the findings of this work encouraged the author W. A. Shaikh for conducting the research. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Conflict of interest: The authors state no conflict of interest.


[1] Amin R, Ahmad H, Shah K, Hafeez MB, Sumelka W. Theoretical and computational analysis of nonlinear fractional integro-differential equations via collocation method. Chaos Solitons Fractals. 2021;151:111252.10.1016/j.chaos.2021.111252Search in Google Scholar

[2] Khan MN, Ahmad I, Akgül A, Ahmad H, Thounthong P. Numerical solution of time-fractional coupled Korteweg–de Vries and Klein–Gordon equations by local meshless method. Pramana. 2021;95(1):1–13.10.1007/s12043-020-02025-5Search in Google Scholar

[3] Rezazadeh H, Jhangeer A, Tala-Tebue E, Hashemi MS, Sharif S, Ahmad H, et al. New wave surfaces and bifurcation of nonlinear periodic waves for Gilson-Pickering equation. Results Phys. 2021;24:104192.10.1016/j.rinp.2021.104192Search in Google Scholar

[4] Cimpoiasu R, Rezazadeh H, Florian DA, Ahmad H, Nonlaopon K, Altanji M. Symmetry reductions and invariant-group solutions for a two-dimensional Kundu–Mukherjee–Naskar model. Results Phys. 2021;28:104583.10.1016/j.rinp.2021.104583Search in Google Scholar

[5] Tariq H, Sadaf M, Akram G, Rezazadeh H, Baili J, Lv YP, et al. Computational study for the conformable nonlinear Schrödinger equation with cubic–quintic–septic nonlinearities. Results Phys. 2021;30:104839.10.1016/j.rinp.2021.104839Search in Google Scholar

[6] Baskonus HM, Gao W, Rezazadeh H, Mirhosseini-Alizamini SM, Baili J, Ahmad H, et al. New classifications of nonlinear Schrödinger model with group velocity dispersion via new extended method. Results Phys. 2021;31:104910.10.1016/j.rinp.2021.104910Search in Google Scholar

[7] Ahmad H, Alam MN, Omri M. New computational results for a prototype of an excitable system. Results Phys. 2021;28:104666.10.1016/j.rinp.2021.104666Search in Google Scholar

[8] Yokus A, Durur H, Kaya D, Ahmad H, Nofal TA. Numerical comparison of Caputo and Conformable derivatives of time fractional Burgers-Fisher equation. Results Phys. 2021;25:104247.10.1016/j.rinp.2021.104247Search in Google Scholar

[9] Javeed S, Anjum S, Alimgeer KS, Atif M, Khan MS, Farooq WA, et al. A novel mathematical model for COVID-19 with remedial strategies. Results Phys. 2021;27:104248.10.1016/j.rinp.2021.104248Search in Google Scholar PubMed PubMed Central

[10] Abouelregal AE, Ahmad H, Nofal TA, Abu-Zinadah H. Moore–Gibson–Thompson thermoelasticity model with temperature-dependent properties for thermo-viscoelastic orthotropic solid cylinder of infinite length under a temperature pulse. Phys Scr. 2021;96:105201.10.1088/1402-4896/abfd63Search in Google Scholar

[11] Adnan RM, Liang Z, Parmar KS, Soni K, Kisi O. Modeling monthly streamflow in mountainous basin by MARS, GMDH-NN and DENFIS using hydroclimatic data. Neural Comput Appl. 2020;7:1–19.10.1007/s00521-020-05164-3Search in Google Scholar

[12] Maheswaran R, Khosa R. Wavelets-based nonlinear model for real-time daily flow forecasting in Krishna River. J Hydroinform. 2013;15(3):1022–41.10.2166/hydro.2013.135Search in Google Scholar

[13] Kisi O. Wavelet regression model for short-term streamflow forecasting. J Hydrol. 2010;389(3–4):344–53.10.1016/j.jhydrol.2010.06.013Search in Google Scholar

[14] Matalas NC. Mathematical assessment of synthetic hydrology. Water Resour Res. 1967;3(4):937–45.10.1029/WR003i004p00937Search in Google Scholar

[15] Box GE, Jenkins GM, Reinsel G. Time series analysis: forecasting and control Holden-day San Francisco. Boxtime series analysis: forecasting and control holden day. John Wiley & Sons; 1970.Search in Google Scholar

[16] Ismail S, Shabri A, Samsudin R. A hybrid model of self organizing maps and least square support vector machine for river flow forecasting. Hydrol Earth Syst Sci. 2012;16(11):4417–33.10.5194/hess-16-4417-2012Search in Google Scholar

[17] Stefenon SF, Ribeiro MH, Nied A, Mariani VC, dos Santos Coelho L, da Rocha DF, et al. Wavelet group method of data handling for fault prediction in electrical power insulators. Int J Electr Power Energy Syst. 2020;123:106269.10.1016/j.ijepes.2020.106269Search in Google Scholar

[18] Saberi-Movahed F, Najafzadeh M, Mehrpooya A. Receiving more accurate predictions for longitudinal dispersion coefficients in water pipelines: training group method of data handling using extreme learning machine conceptions. Water Resour Manag. 2020;34(2):529–61.10.1007/s11269-019-02463-wSearch in Google Scholar

[19] Clemen RT. Combining forecasts: A review and annotated bibliography. Int J Forecast. 1989;5(4):559–83.10.1016/0169-2070(89)90012-5Search in Google Scholar

[20] Diebold FX, Lopez JA. 8 Forecast evaluation and combination. Handb Stat. 1996;14:241–68.10.3386/t0192Search in Google Scholar

[21] Timmermann A. Forecast combinations. Handb Econ Forecast. 2006;1:135–96.10.1016/S1574-0706(05)01004-9Search in Google Scholar

[22] Shaikh W, Shah S, Solangi M, Pandhiani S. Forecasting analysis of GMDH model with LSSVM and MARS models for hydrological data sets (Case study). Indian J Sci Technol. 2019;12:1–6. 10.17485/ijst/2019/v12i39/147941 Search in Google Scholar

[23] Muhammad M. Time series modeling using markov and arima models. PhD Thesis, MSc. Thesis. Faculty of Civil Engineering, University of Technology Malaysia; 2012.Search in Google Scholar

[24] Pandhiani SM, Shabri AB. Time series forecasting using wavelet-least squares support vector machines and wavelet regression models for monthly stream flow data. Open J Stat. 2013;3(03):183.10.4236/ojs.2013.33021Search in Google Scholar

[25] Krishna B, Rao YS, Nayak P. Time series modeling of river flow using wavelet neural networks. J Water Resour Prot. 2011;3(01):50.10.4236/jwarp.2011.31006Search in Google Scholar

[26] Smith LC, Turcotte DL, Isacks BL. Stream flow characterization and feature detection using a discrete wavelet transform. Hydrol Process. 1998;12(2):233–49.10.1002/(SICI)1099-1085(199802)12:2<233::AID-HYP573>3.0.CO;2-3Search in Google Scholar

[27] Esfetanaj NN, Nojavan S. The use of hybrid neural networks, wavelet transform and heuristic algorithm of WIPSO in smart grids to improve short-term prediction of load, solar power, and wind energy. Operation of distributed energy resources in smart distribution networks. Elsevier; 2018. p. 75–100.10.1016/B978-0-12-814891-4.00004-7Search in Google Scholar

[28] Pandhiani SM, Shabri AB. Time series forecasting by using hybrid models for monthly streamflow data. Appl Math Sci. 2015;9(57):2809–29.10.12988/ams.2015.52164Search in Google Scholar

[29] Al Wadi S, Ismail MT, Karim SAA. Forecasting financial time series data base on Wavelet transforms and Neural network model.Search in Google Scholar

[30] Daubechies I. Orthonormal bases of compactly supported wavelets. Commun Pure Appl Math. 1988;41(7):909–96.10.1137/1.9781611970104.ch6Search in Google Scholar

[31] Wang W, Ding J. Wavelet network model and its application to the prediction of hydrology. Nat Sci. 2003;1(1):67–71.Search in Google Scholar

[32] Shao X-G, Leung AK-M, Chau F-T. Wavelet: a new trend in chemistry. Acc Chem Res. 2003;36(4):276–83.10.1021/ar990163wSearch in Google Scholar

[33] Mirbagheri SA, Nourani V, Rajaee T, Alikhani A. Neuro-fuzzy models employing wavelet analysis for suspended sediment concentration prediction in rivers. Hydrol Sci J–Jo des Sci Hydrol. 2010;55(7):1175–89.10.1080/02626667.2010.508871Search in Google Scholar

[34] Kişi Ö. Wavelet regression model as an alternative to neural networks for monthly streamflow forecasting. Hydrol Proces An Int J. 2009;23(25):3583–97.10.1002/hyp.7461Search in Google Scholar

[35] Mallat SG. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell. 1989;11(7):674–93.10.1515/9781400827268.494Search in Google Scholar

[36] Yonaba H, Anctil F, Fortin V. Comparing sigmoid transfer functions for neural network multistep ahead streamflow forecasting. J Hydrol Eng. 2010;15(4):275–83.10.1061/(ASCE)HE.1943-5584.0000188Search in Google Scholar

[37] Walton R, Binns A, Bonakdari H, Ebtehaj I, Gharabaghi B. Estimating 2-year flood flows using the generalized structure of the group method of data handling. J Hydrol. 2019;575:671–89.10.1016/j.jhydrol.2019.05.068Search in Google Scholar

[38] Safari MJS, Ebtehaj I, Bonakdari H, Es-haghi MS. Sediment transport modeling in rigid boundary open channels using generalize structure of group method of data handling. J Hydrol. 2019;577:123951.10.1016/j.jhydrol.2019.123951Search in Google Scholar

[39] Wang W-C, Chau K-W, Cheng C-T, Qiu L. A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J Hydrol. 2009;374(3–4):294–306.10.1016/j.jhydrol.2009.06.019Search in Google Scholar

[40] Firat M. Comparison of artificial intelligence techniques for river flow forecasting. Hydrol Earth Syst Sci. 2008;12(1):123–39.10.5194/hess-12-123-2008Search in Google Scholar

[41] Ismail S, Shabri A, Samsudin R. A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting. Expert Syst Appl. 2011;38(8):10574–78.10.1016/j.eswa.2011.02.107Search in Google Scholar

[42] Yu P-S, Chen S-T, Chang I-F. Support vector regression for real-time flood stage forecasting. J Hydrol. 2006;328(3–4):704–16.10.1007/978-3-540-79881-1_26Search in Google Scholar

[43] Uçar MK, Nour M, Sindi H, Polat K. The effect of training and testing process on machine learning in biomedical datasets. Math Probl Eng. 2020;2020. in Google Scholar

[44] Shaikh WA, Shah SF, Pandhiani SM, Solangi MA. Wavelet decomposition impacts on traditional forecasting time series models. Computer Modeling in Engineering & Sciences. Tech Science Press. 2022;130(3):1517–32. 10.32604/cmes.2022.017822 Search in Google Scholar

Received: 2021-11-14
Revised: 2022-01-28
Accepted: 2022-03-28
Published Online: 2022-11-02

© 2022 Wajid Ali Shaikh et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 30.11.2023 from
Scroll to top button