Hybridizing the Cuckoo Search Algorithm with Different Mutation Operators for Numerical Optimization Problems

Abstract The Cuckoo search (CS) algorithm is an efficient evolutionary algorithm inspired by the nesting and parasitic reproduction behaviors of some cuckoo species. Mutation is an operator used in evolutionary algorithms to maintain the diversity of the population from one generation to the next. The original CS algorithm uses the Lévy flight method, which is a special mutation operator, for efficient exploration of the search space. The major goal of the current paper is to experimentally evaluate the performance of the CS algorithm after replacing the Lévy flight method in the original CS algorithm with seven different mutation methods. The proposed variations of CS were evaluated using 14 standard benchmark functions in terms of the accuracy and reliability of the obtained results over multiple simulations. The experimental results suggest that the CS with polynomial mutation provides more accurate results and is more reliable than the other CS variations.

The CS algorithm [42] is an NIO algorithm that simulates the nesting and parasitic reproduction behaviors of some cuckoo species. It starts its evolution process with n candidate solutions (eggs). At each generation of the CS algorithm, a cuckoo's egg (i.e. a random candidate solution) is selected using a uniformly random function in an attempt to improve it using a mutation-like function based on the Lévy flight operator. The algorithm then replaces an egg of a host bird (i.e. another candidate solution selected randomly) from the population of solutions with the cuckoo's egg, only if the objective value of the cuckoo's egg is better than the objective value of the egg of the host bird. Finally, the CS algorithm replaces a portion of the worst eggs with randomly generated eggs.
Many research studies in the literature have shown the efficiency of Lévy flight as a mutation and exploration method for real-world optimization problems [2,5,13,34,[42][43][44]. The goal of the current research study is to evaluate the performance of CS after replacing the Lévy flight based update function with different mutation methods adopted from different evolutionary algorithms.
Mutation operators are mainly used to avoid getting trapped in local optima when solving a given optimization problem [26]. Although the mutation concept was first introduced in the GA algorithm, several evolutionary algorithms, such as HS and CS, apply special mutation operators to produce new solutions or alter some features of a candidate solution.
The main purpose of this paper is to investigate the suitability and efficiency of seven mutation methods as replacements for the Lévy flight based update function in CS, namely, random, boundary, non-uniform, MPT (Makinen, Periaux and Toivanen), power, polynomial, and pitch adjustment mutation methods [26,31]. These mutation methods were modified to be applicable to CS, as described in Section 4. The proposed seven variations of CS were evaluated using a set of 14 popular benchmark functions in terms of the accuracy and reliability of the obtained results over multiple simulations. The obtained results indicate that CS with polynomial mutation provides more accurate and stable results than the other algorithms for a considerable number of the benchmark functions.
The rest of the paper is organized as follows: the CS algorithm is explained in Section 2, the related work to CS is overviewed in Section 3, Section 4 presents the different mutation methods, Section 5 presents the experimental results, and finally, Section 6 presents the conclusion of this paper.

CS Algorithm
Inspired by the parasitic breeding behavior of some cuckoo species and the Lévy flight behavior of birds, Yang and Deb [42] formulated and introduced the CS optimization algorithm. The inhabitants of some cuckoo species commonly rely on other birds (host birds) to raise their baby birds. This phenomenon is known as a parasitic reproduction strategy in which a cuckoo bird lays its eggs on a nest of a host bird, taking the risk that the host bird may discover the cuckoo's eggs. If the host bird discovers the cuckoo's eggs, it rejects the eggs (discards the eggs or abandons its nest); otherwise, the host bird tends to the cuckoo's egg as if it were its own. The CS algorithm attempts to solve a wide range of optimization problems based on two simulation models: a model that simulates the parasitic reproduction strategy of some cuckoo species and another model that simulates the Lévy flight behavior of some bird species [16,35,42].
According to the terminologies of CS, a candidate solution is called an egg and a new solution is called a cuckoo's egg. The CS algorithm attempts to generate new solutions (cuckoos' eggs) using the Lévy flight operator for potential replacement with lower quality solutions in the population (eggs in the nests). The CS algorithm depends mainly on two parameters, the population size n (i.e. the number of candidate solutions) and the fragment of the candidate solutions that are going to be discarded at each iteration of CS (p a ∈ [0, 1]). The following notations and assumptions formulate the mathematical model of the original CS algorithm: -A candidate solution X = ⟨x 1 , ..., x m ⟩ is a vector of m decision variables, where each variable x i is in the range [LB i , UB i ]. LB i is the lower bound and UB i is the upper bound of the i th solution. -The value of a decision variable x i is randomly generated using the function where U(0, 1) is a uniform random number in the range (0, 1). -The fitness value of the candidate solution X is f (X).
-Th population size is denoted as n, which is a fixed number that represents the number of candidate solutions. An augmented matrix of size n × m is used to represent the population of n candidate solutions: -The CS algorithm replaces a fraction p a ∈ [0, 1] of the population (solutions with the lowest fitness values) with new randomly generated solutions at the last step of each generation of CS. -At the end of each generation of CS, the best candidate solutions are moved to the next generation of CS.
The CS algorithm starts by generating a population of n nominated solutions X i (i = 1, 2, . . ., n) from the range of potential solutions of the objective function f (X i ). In addition, the termination condition or the maximum number of iterations (MaxItr) of the algorithm is specified based on the optimal value/s of f (X i ).
The CS algorithm attempts to improve a candidate solution (X t i ) at generation t + 1 using a Lévy flight mutation method to produce a new solution (X t+1 i ) as follows: where β > 0 is the distance parameter that determines the distance of mutation and the symbol ⊕ represents the entry-wise product operator. The Lévy flight is a special random walk in which the step length is a random number generated from a heavy tailed Lévy distribution with a power law: where D is the step size and α ∈ (1, 3] is a parameter related to fractal dimension (i.e. a ratio that provides a statistical index of complexity and that compares details in a fractal pattern). An interesting property of the Lévy distribution function is that it has infinite variance [42]. This property indicates that some of the generated random solutions will be close to the local optima, while a large percentage of the generated solutions will be random solutions far from the local optima. This means that the Lévy flight is more efficient in exploring the search space than the random walk method because the distance value in the Lévy flight method increases over the course of generations.

Related Work
Most research studies that have attempted to improve the performance of CS have focused on hybridizing the CS algorithms with other search techniques or evolutionary operators. Kanagaraj et al. [28] incorporated the evolutionary operators (crossover and mutation) of the GA into CS. The new algorithm, which is called the hybrid CS and GA (HCSGA) algorithm, is used to provide solutions for constrained engineering design optimization problems. The crossover and mutation operators are used at each iteration of CS to balance the ratio of exploration and exploitation. According to the simulation results in ref. [28], HCSGA provides good performance compared to CS, PSO, and Global convergence PSO (GCPSO). Kanagaraj et al. [27] designed another hybrid CS and GA algorithm (CS-GA) to solve the reliability-redundancy allocation problem (RAP). RAP is an optimization problem that involves finding an optimal allocation plan for redundant subsystems subject to a set of predefined resource constraints. The simulation results in ref. [27] suggest that CS-GA provides a competitive performance compared to popular optimization algorithms [e.g. CS, artificial bee colony (ABC) [38], HS]. Saraswathi et al. [36] introduced a hybrid CS and BA algorithm. The hybrid CS and BA algorithm was designed to find an optimal path in the mobile robot path planning problem. Wang et al. [41] introduced a new hybrid CS algorithm called CS/HS, which incorporates the pitch adjustment operator of the HS algorithm into the CS algorithm to speed up the convergence speed of CS. The simulation results in ref. [41] suggested that CS/HS outperforms many standard optimization algorithms. Wang et al. [40] introduced a hybrid optimization algorithm called CSKH, which uses the update rule and abandon method of CS inside the improvement loop of KHA. In another related work, Wang et al. [39] proposed the Lévy flight Krill herd (LKH) algorithm, which incorporates the Lévy flight operator into the KHA algorithm to improve its global search ability. LKH was experimentally proven to be an efficient algorithm, according to ref. [39]. However, CSKH and LKH have complex structures and require more computations compared with KHA and CS. It is important to note that, in general, the hybrid CS algorithms have complex structures and require more computational time than standard optimization algorithms do (e.g. CS, GA, ABC, HS, BA, KHA). The established efficiency and usability of the mutation methods in the GA algorithm have encouraged many researchers to adopt the use of several mutation methods with evolutionary algorithms such as the HS and CS algorithms. These algorithms apply special mutation operators to produce new solutions or alter some features of a candidate solution. The random, boundary, non-uniform, MPT, power, polynomial, and pitch adjustment mutation methods were used to enhance the performance of the ABC algorithm [22]. The same mutation methods were also used to improve the efficiency of the HS algorithm [26].

Mutation Operators
The original CS algorithm applies the Lévy flight method to explore the search space. This method alters the values of the decision variables of a candidate solution based on equation (2). This paper introduces seven variations of the CS algorithm using different mutation methods. Each one of the proposed variations replaces the Lévy flight method with a unique mutation method.
Several mutation operators can be used with the evolutionary algorithms based on the representation of the optimization problems. In this section, the mutation operators that are suitable for continuous optimization problems will be presented. These mutation operators have been used with the GA [31], HS [26], and ABC [22] algorithms. The experimental results in refs. [22,26] indicate that most of the mutated optimization algorithms perform better than the original algorithms do. The efficiency of a mutation operator is normally affected by the type of the optimization problem and the strength of the population (i.e. how far is the mutated solution from the initial population) [26].

Random Mutation
The random mutation method is a basic mutation method that is usually used in GAs [26,31]. In this method, the value of a gene (decision variable) of a chromosome (candidate solution) is replaced with a randomly generated value from the range of possible solutions of the gene as follows: where U(0, 1) is a uniform random number between 0 and 1. The authors in refs. [5,42] suggested that the Lévy flight method is more efficient in exploring the search space than the random mutation method. However, the random mutation method was not used in refs. [5,42] as a mutation method for the CS. It was actually used as a mutation method for the GA algorithm. Therefore, it would be interesting to evaluate the performance of CS after replacing the Lévy flight method with the random mutation method.

Boundary Mutation
The boundary mutation method is a mutation method that is traditionally used with evolutionary algorithms for integer or floating point decision variables [26,31]. A new solution can be generated using this method by randomly replacing the value of a decision variable of a candidate solution with its lower or upper bound (x i = LB i or x i = UB i ). The purpose of the boundary mutation method is to extend the region of the global search, as suggested in ref. [30].

Non-Uniform Random Mutation
The non-uniform random mutation is a well-known mutation method that is commonly used in GAs to overcome the disadvantages of random mutation [26,31,32]. The probability that the amount of mutation will approach zero by the end of the evolution process increases with the non-uniform random mutation method. It helps the population of solutions evolve in the early stages of the evolution process, which helps to avoid local optima. It also helps in increasing the accuracy of the solutions by finely tuning the solutions at the final stages of the evolution process.
Given a candidate solution X t = ⟨x 1 , ..., x m ⟩ at generation t, if the decision variable x k was selected from X t for non-uniform random mutation, the result is the vector where LB i is the lower bound and UB i is the upper bound of x k . The value returned from ∆(t, y) is in the range [0, y] such that the probability of ∆(t, y) approaches zero with each increase in the number of generations t. This property means that the search process will be uniform when t is small (early generations) but will get closer to local values over the course of iterations. The following function has been used in the experimental section of the paper to calculate ∆(t, y): where U(0, 1) is a random number between 0 and 1, maxt is the maximum number of generations, and b is a parameter that determines the degree of dependency on the generation number (b was set to 1 and 5 in the experimental section, as suggested in ref. [32]). The above function aims to increase the probability of producing new values near the previously produced values instead of generating random values [26,31,32].

MPT Mutation
MPT mutation was proposed by Makinen et al. [37]. It is a mutation method that has been used to solve several types of optimization problems such as the multidisciplinary shape optimization problems and optimization problems with a constrained nature [22]. A decision variable x i can be mutated using MPT to generate a new variable x′ i as follows: -uniformly generate a random number and where LB i is the lower bound and UB i is the upper bound of x i . It is worth pointing out that the efficiency of the MPT mutation method is not affected by the generation number, unlike the non-uniform random mutation method. In the experimental section, the value of b for the MPT method was set to 1 and 5, as suggested in ref. [37].

Power Mutation (PM)
Deep and Thankur [20] introduced the PM method, which is based on power distributions. The function of the power distribution is represented as follows: and the density function is as follows: where p represents the index of the distribution. Given a decision variable x, the PM method generates a new solution x′ according to the following steps: -Generate a new random number r between 0 and 1.
-Create a random number s based on the final distribution.
-Calculate t as follows: where LB is the lower bound and UB is the upper bound of x. -Create a new solution x′ as follows: In the experimental section, the value of b for the PM method was set to 0.25 and 0.5, as suggested by Deep and Thankur in ref. [20].

Highly Disruptive Polynomial (HDP) Mutation
The polynomial mutation method suggested by Deb and Agrawal in ref. [17] may get trapped in a local optima when the value of a decision variable that is to be mutated is near one of its boundaries. HDP mutation [18] is an enhanced variation of the polynomial mutation. The value of a decision variable x i can be mutated using the HDP method, as described in the algorithm in Figure 1. In Figure 1, r is a random number between 0 and 1, P m is the probability of mutation, LB i is the lower boundary of x i , UB i is the upper boundary of x i , δ 1 is the difference between x i and LB i divided by UB i − LB i , δ 2 is the difference between UB i and x i divided by UB i − LB i , and η m is a non-negative number that represents the distribution index. An advantage of the HDP method over the polynomial method is that it can sample the whole search space of the decision variable even if the variable's value is near to one of its boundaries.

Pitch Adjustment
The pitch adjustment method is a special mutation method used in the HS algorithm [25,29]. Given a vector of m decision variables (a candidate solution) X t = ⟨x 1 , ..., x m ⟩ at generation t, the pitch adjustment method mutates the value of each decision variable with probability PAR ∈ (0, 1): If PAR is greater than or equal to a randomly generated number (r ∈ [0, 1]), the value of x′ i will be mutated to one of its neighboring values as follows: is a parameter that determines the distance of mutation and U(−1, 1) is a random number in the range (−1, 1).

Worst Case Analysis of Computational Complexity of Mutation Operators
The purpose of this section is to discuss the computational complexity of each mutation operator discussed in Section 4 for a single iteration of CS. In the following analysis, it is assumed that a mutation operator is applied to all decision variables. However, in practice, it is applied to one or only a few decision variables.
Assume a candidate solution with m decision variables Applying the Lévy flight operator to a single variable costs four operations (1 -fetching the values of the Lévy distribution, 2 -performing the entry wise product with the Lévy distribution, 3 -adding the results of 2 to X t i , 4 -adding the results of 3 to X t+1 i ). Thus applying it to a vector of m solutions [equation (2)] costs 4m operations. This means that the computational complexity of the Lévy flight operator is O(m). Similarly, the computational complexity of random mutation, boundary mutation, nonuniform random mutation, MPT mutation, power mutation, HDP mutation, and pitch adjustment mutation is also linear in the number of decision variables. For example, no significant differences in the runtime have been observed when the mutation operators with CS have been used to solve two test functions from Table 1: De Jong's first function and shifted sphere function (iterations = 1000, runs = 50, dimension = 50). The average run time for each mutation operator with CS for the two test function was approximately 2000 ms.

CS with Mutation
This section shows how the Lévy flight method in the original CS algorithm can be replaced with one of the seven different mutation methods discussed in Section 4.  Simple modifications should be conducted to incorporate one of the seven mutation operators into CS, as shown in Figure 2 (lines 4 and 6). In line 4, one of the mutation operators discussed in Section 4 should be selected before the beginning of the improvement loop of CS (lines 5-16). In line 6, the algorithm attempts to improve a candidate solution (X t i ) at generation t + 1 using the selected mutation operator to produce a new solution (X t+1 i ). Note that the original CS algorithm uses the Lévy flight mutation method at line 6 to produce a new solution at each iteration of the improvement loop.

Benchmark Functions
In this section, 11 variations of CS (see Table 2) were compared and evaluated using 14 benchmark functions (see Table 1). This set of benchmark functions has been used in the literature to evaluate evolutionary algorithms with mutation methods [22,26]. The threshold values in CS4 and CS5, CS6 and CS7, CS8 and CS9 were used in refs. [20,22,26] to evaluate mutated variations of the HS and ABC algorithms.

Setup
The experiments were conducted using an Intel Core i5 6th Gen CPU (3.4 GHz) with 8-GB RAM running macOS 10.13, High Sierra (Cupertino, CA, USA). All of the algorithms were implemented using the Java programming language.
All the CS variations used the same parameter settings: n = 10, p a = 0.25, and the mutation rate r = 0.05. These values are based on those used in refs. [2,5,13,42,43]. Tables 3 and 4 show the final objective value for each of the 14 benchmark functions used in these experiments. The obtained values are in the following format: mean of the best obtained objective values (first row), standard deviation (second row), and error value (third row) for 50 independent runs. This format of results was used by the researchers in refs. [2,5,13] to show and analyze the simulation results of optimization algorithms. In this section, an algorithm is said to have outperformed the other algorithms when it achieves the lowest mean and error values among all of the algorithms over 50 independent runs. Table 3 shows the simulation results of the 30D (30 decision variables) problems. The results in Table 3 show that CS10 (CS with polynomial mutation) achieved the best objective values for 7 of the 14 functions. CS1 (CS with Lévy flight) is the second best performing algorithm with 4 functions out of 14, followed by CS11 (CS with pitch adjustment), which achieved the best objective values for 2 functions out of 14. It is worth noting that CS3 (CS with boundary mutation) performed better than the other CS variations for the six-hump camelback function. However, CS3 did not perform well for many of the other functions, which may be because CS3 randomly replaces the value of a decision variable with the lower or upper bound of the decision variable rather than slightly modifying the best known values.

Comparison Results of Variations of CS
The cuckoo variation CS10 performed even better (Table 4) when the problem size was increased to 50 decision variables (best results for 8 out of 14 functions). This observation indicates that CS with polynomial mutation has robust performance even when the problem complexity increases. This is expected because the polynomial mutation can sample the whole search space of the decision variable even if the variable's value is near to one of its boundaries, whereas most of the other algorithms get trapped closer to the middle.
The overall results in Tables 3 and 4 indicate that the performance of C10 improves with the increase of the problem complexity. In addition, CS10 has the lowest standard deviation for most of the functions, which means that its performance is more stable over multiple runs compared to the performance of the other CS variations.
The superior performance of CS10 is possibly due to the fact that it uses the HDP mutation, which can sample the whole search space of the decision variable even if the variable's value is near to one of its boundaries. The simulation results suggest that CS2 (CS with random mutation) can easily get stuck in local optima in an early stage of the evolution process. However, CS2 performs much better than CS3 (CS with boundary mutation), which is possibly because CS3 uses the boundary mutation method that moves the search process of CS3 to the boundaries of the search range without consideration of the current value. The results also indicate that the variations of CS that use the non-uniform random mutation (CS4 and CS5) lose their strength over the course of iterations. The rest of the variations of CS [CS with MPT mutation (CS6 and CS7), CS with power mutation (CS8 and CS9), and CS with pitch adjustment (CS11)] all provide better results than CS with Lévy flight. Figures 3-7 show the convergence behavior of five functions (f 1 , f 2 , f 8 , f 12 , f 14 ) over 1000 iterations (with 50D size). In all of the figures, CS10 converges to a solution more quickly compared to the other CS variations. In contrast, CS3 has the worst convergence rate compared to the other CS variations.

Statistical Test Results
As described in ref. [21], nonparametric statistical tests are the recommended methodology for comparing evolutionary algorithms. The Friedman test [23] has been used to test if the means of the 11 variations of CS

4.26E+02
The best results in the table are marked in bold.  are equal to determine if there exist at least two means that are significantly different from each other. The tested hypotheses are as follows:  Tables 5 and 6 show results for the Conover post hoc test [15] for the different dimensions (Table 5: D = 30 and Table 6: D = 50). As can be seen from Table 5, for 30 dimensions, the results from CS3, CS4, CS9, and CS10 are significantly (p < 0.05) better than those for CS1. Similarly, from Table 6, the results from all variations except for CS7, CS8, and CS9 are significantly (p < 0.05) better than those for CS1. Combined with the analysis from the previous section, this suggests that CS10 is significantly better than CS1 while also converging more quickly. This suggests that the polynomial mutation could be a better mutation method than the Lévy flight method.

Comparison Results of CS with Other Algorithms
In this section, the CS algorithm was compared with the HS algorithm [26] and the ABC algorithm [22]. The algorithms were compared using eight different functions, F 1 , F 3 , F 6 , F 10 , F 11 , F 12 , F 13 , F 14 for the following mutation methods: random mutation (M 1 (random mutation), M 2 (non-uniform random mutation with b = 1), M 3 (non-uniform random mutation with b = 5), M 4 (MPT mutation with b = 1), M 5 (power mutation with b = 0.25), and M 6 (polynomial mutation with b = 0.5)). The results for the ABC algorithm and the HS algorithm were taken from refs. [22,26], respectively. Table 7 shows the mean values of the objective values of CS, ABC, and HS over 50 independent runs for 100D problems. The results are in the following format: CS results in the first row, HS results in the second row, and ABC results in the third row for each function. The table illustrates that the variations of CS show competitive performance compared to the variations of ABC and HS. However, the variations of the ABC algorithm outperform the other algorithms for five functions out of eight. Nevertheless, the obtained results for the three algorithms confirm that the performance of the optimization algorithms is affected by the type of the chosen mutation method.    The mean of 50 runs is shown. CS results are in the first row for each function.
The best results are marked with bold.

Conclusion
The current paper presented new variations of CS using different mutation methods. Extensive simulations were conducted using a set of 14 well-known benchmark functions to evaluate the performance of the proposed variations of CS. CS with polynomial mutation was found to be more accurate and stable than the other algorithms for a notable number of the benchmark functions. Future work will be directed towards implementing different selection schemes to CS with polynomial mutation instead of the currently used random selection method. Future work also includes hybridizing CS with polynomial mutation and the simulated annealing algorithm [13]. Furthermore, it would be interesting to see how CS with polynomial mutation performs in practice when it is applied to cooperative Q-learning [1,[9][10][11], as described in refs. [2,3].