## 1 Introduction

A hyperspectral remote sensor scans ground objects with a spectrum covering the visible to infrared region and generally records the spectral signature with hundreds of narrow bands, thus bringing great opportunities for the quantitative analysis of remote sensing. However, hyperspectral imagery (HSI) poses a major challenge for data storage, management, processing, and analysis because of its massive data volume and the serious redundancy among highly correlated bands [1]. For example, the “Hughes phenomenon” arises when classifying land use/cover types with HSI, where only a limited number of training samples are usually collected [2]. It is therefore of paramount importance to reduce the data redundancy and dimensionality, so as to facilitate the analysis of hyperspectral remote sensing. Generally, dimensionality reduction methods can be divided into two categories: feature extraction and feature selection. Feature extraction techniques include unsupervised approaches, such as principal component analysis (PCA) and independent component analysis (ICA) [3,4,5], and supervised approaches, such as Fisher’s linear discriminant analysis (LDA) [6]. However, these feature extraction methods are not well suited to dimensionality reduction of HSI [7]. For example, PCA, which linearly transforms the original high-dimensional data into a low-dimensional feature space through an idealized projection, is incapable of drawing distinctions between patterns [8]; ICA assumes that the observed signal vector is a linear mixture of statistically independent components, but such a decorrelation assumption cannot be satisfied by HSI [9]; LDA, which is similar to PCA, fails when the class-conditional distributions are not Gaussian [10].
In recent years, because of its good preservation of the primitive physical interpretability [11], feature selection has received increasing attention from researchers in the field of remote sensing. This type of method tries to pick the most representative subset from a large number of HSI features while maintaining acceptable classification accuracy [12]. The general process of feature selection consists of the following basic steps: the generation procedure, the evaluation function, the stopping criterion, and the validation procedure [13]. The generation procedure searches for a feature subset for evaluation, and it is the basis of the feature selection model. To date, many subset generation procedures have been proposed, which can be generally classified into three types: full search, heuristic search, and random search. Breadth-first search (BFS), a typical full-search method, is impractical because it exhaustively enumerates all possible combinations of features, resulting in high time complexity. Heuristic search methods reduce the search space with heuristic information, and the order of their search space is quadratic in the number of features. Greedy algorithms (e.g., sequential forward selection (SFS) and sequential backward selection (SBS)), plus-L minus-R selection (LRS), sequential floating forward selection (SFFS), and sequential floating backward selection (SFBS) are widely used heuristic search methods [14,15]. Generally, SFFS and SFBS perform better than similar methods [14]. The SFFS-based feature selection algorithm selects the optimal feature subset in a two-stage selection process: first, a new feature is appended to the previously selected feature subset, and second, a feature is discarded to achieve the maximum value of the objective function [16,17].

The ant colony algorithm (ACA) is an emerging heuristic search algorithm, and its effectiveness has been proved in the field of HSI band selection [18]. Random search methods start the procedure with some feature subsets initialized randomly. Typical random search methods, including simulated annealing (SA), the genetic algorithm (GA), particle swarm optimization (PSO), and the clone selection algorithm (CSA), have been widely used in feature selection [19,20]. Among the above-mentioned three types of generation procedures, ACA, GA, PSO, and CSA are typical swarm intelligence algorithms, which often have an excellent capacity for self-organization, self-learning, and self-memory.

Optimal feature subset selection for HSI is a typical NP-hard problem [12], which should be solved with a proper search procedure [21]. Thus, swarm intelligence algorithms, such as ACA, GA, PSO, and CSA, have been widely employed in the optimized feature selection of the original hyperspectral data [22]. Many studies have shown that the swarm intelligence algorithms have better performance than other search algorithms in HSI feature selection. The performance of ACA was compared with that of SFFS and was also compared with that of GA [1,8,13,23]. Zhong et al. compared CSA and its improved version with SFS for HSI band selection [24]. The performance of PSO for feature selection has also been compared with those of GA, SFS, and other search algorithms [15,25,26]. However, the performance of swarm intelligence algorithms for HSI feature selection has not been investigated systematically.

In this study, typical swarm intelligence algorithms were used for selecting the optimal band subset from HSI, and their performance in terms of overall classification accuracy and average runtime was compared and analyzed. To make a comprehensive comparison, SFFS, one of the most effective greedy search methods, was also included as a benchmark comparator to further verify the effectiveness of the swarm intelligence algorithms. Thus, a total of five algorithms (ACA, GA, PSO, CSA, and SFFS) were compared on two public hyperspectral datasets (the Indian Pines and Pavia University datasets). The motivation of this paper is to provide a reference for the selection of dimensionality reduction methods and for algorithm improvement, to guide future research.

## 2 Materials and methods

### 2.1 Data source

In our experiment, two public hyperspectral datasets were used to compare the performances of the above-mentioned algorithms. The first site, covered by mixed vegetation, is the Indian Pines test site in northwestern Indiana (Indian Pines dataset). The second site is an urban scene over Pavia University, northern Italy (Pavia University dataset).

The Indian Pines dataset (Indian dataset) was acquired by the airborne visible/infrared imaging spectrometer (AVIRIS) sensor on June 12, 1992 (Figure 1a). The imagery is composed of 224 bands, with wavelengths ranging from 0.4 to 2.5 µm. The 10 nm spectral resolution provides refined discrimination among similar land covers. After removal of the bands that were seriously influenced by water absorption, 200 bands remained in the Indian dataset. Thus, the size of the dataset is 145 × 145 × 200. The ground truth for this area consists of 16 land cover types (Figure 1b). As is common in the previous literature [11,27], nine land cover types that have enough ground-truth samples were chosen from the 16 categories in this experiment. The training and testing samples of these classes, derived from the ground truth map, are shown in Table 1.

Land cover classes and the number of training and testing samples in the Indian dataset

| Land cover type | No. of training samples | No. of test samples | Total |
|---|---|---|---|
| C1. Corn-min | 415 | 415 | 830 |
| C2. Corn-notill | 714 | 714 | 1,428 |
| C3. Grass/pasture | 241 | 242 | 483 |
| C4. Grass/tree | 365 | 365 | 730 |
| C5. Hay-windrowed | 239 | 239 | 478 |
| C6. Soybeans-clean | 296 | 297 | 593 |
| C7. Soybeans-min | 1,227 | 1,228 | 2,455 |
| C8. Soybeans-notill | 486 | 486 | 972 |
| C9. Woods | 632 | 633 | 1,265 |
| Total | 4,615 | 4,619 | 9,234 |

The original Pavia University dataset (Pavia dataset), with 340 × 610 pixels, was gathered by the reflective optics system imaging spectrometer (ROSIS) sensor in 2002 (Figure 1c). The dataset contains 115 bands over the 0.43–0.86 µm spectral range. The high spatial resolution of 1.3 m per pixel helps to avoid a high fraction of mixed pixels. The preprocessed dataset has only 103 bands after removal of the water absorption and low signal-to-noise ratio bands (Figure 1c). In total, nine land cover types were identified (Figure 1d). The training and testing samples of these classes, derived from the ground truth map, are shown in Table 2.

Land cover classes and the number of training and testing samples in the Pavia dataset

| Land cover type | No. of training samples | No. of test samples | Total |
|---|---|---|---|
| C1. Asphalt | 3,315 | 3,316 | 6,631 |
| C2. Meadows | 9,324 | 9,325 | 18,649 |
| C3. Gravel | 1,049 | 1,050 | 2,099 |
| C4. Trees | 1,532 | 1,532 | 3,064 |
| C5. Painted metal sheets | 672 | 673 | 1,345 |
| C6. Bare soil | 2,514 | 2,515 | 5,029 |
| C7. Bitumen | 665 | 665 | 1,330 |
| C8. Shadows | 473 | 474 | 947 |
| C9. Self-blocking bricks | 1,841 | 1,841 | 3,682 |
| Total | 21,386 | 21,391 | 42,776 |

### 2.2 Optimal band selection

In this study, we employed the swarm intelligence algorithms as the search processes to select bands from the original HSI to constitute the band subsets. The general algorithm flow diagram is shown in Figure 2. For convenience, solutions of these search processes are encoded in a binary form. The intelligent agents of the swarms are initialized as a population of binary strings, the length of each string being equal to the number of bands of the original HSI (Figure 3). In a binary string, a value of 1 at position *i* means that the *i*th band is included in the iteration process, whereas a value of 0 indicates its absence. A binary string with *M* “1” characters denotes that *M* bands are selected by the intelligent agent.
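The binary-string encoding above can be sketched as follows (a minimal illustration with hypothetical function names, not the authors' code):

```python
import numpy as np

def init_population(pop_size, n_bands, m_selected, rng=None):
    """Initialize agents as binary strings of length n_bands,
    each with exactly m_selected ones (the selected bands)."""
    rng = np.random.default_rng(rng)
    pop = np.zeros((pop_size, n_bands), dtype=np.uint8)
    for row in pop:
        row[rng.choice(n_bands, size=m_selected, replace=False)] = 1
    return pop

def selected_bands(agent):
    """Decode a binary string into the indices of the selected bands."""
    return np.flatnonzero(agent)

# e.g., 20 agents over the 200-band Indian dataset, 30 bands each
pop = init_population(pop_size=20, n_bands=200, m_selected=30, rng=0)
```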

To evaluate the search results of the swarm intelligence algorithms, a reasonable objective function is also necessary. The Jeffreys–Matusita (JM) distance is one of the best class separability measures for multiclass problems [28]. The JM distance between two classes *c*_{h} and *c*_{k} (*J*_{h,k}) can be calculated by the following equation:

$$J_{h,k} = \int_{\mathbf{r}}\left[\sqrt{p(\mathbf{r}\mid c_{h})} - \sqrt{p(\mathbf{r}\mid c_{k})}\right]^{2}\,\mathrm{d}\mathbf{r} \tag{1}$$

where **r** is a *d*-dimensional feature vector (the selected band subset has *d* bands), and *p*(**r**|*c*_{h}) and *p*(**r**|*c*_{k}) are the conditional probability density functions of **r** under classes *c*_{h} and *c*_{k}, respectively. When the class-conditional distributions are Gaussian, *J*_{h,k} reduces to:

$$J_{h,k} = 2\left(1 - e^{-B_{h,k}}\right) \tag{2}$$

where *B*_{h,k} is the Bhattacharyya distance between class *c*_{h} and *c*_{k}, and it can be calculated as per the following equation:

$$B_{h,k} = \frac{1}{8}\left(\boldsymbol{\mu}_{h} - \boldsymbol{\mu}_{k}\right)^{\mathrm{T}}\left[\frac{\boldsymbol{\Sigma}_{h} + \boldsymbol{\Sigma}_{k}}{2}\right]^{-1}\left(\boldsymbol{\mu}_{h} - \boldsymbol{\mu}_{k}\right) + \frac{1}{2}\ln\!\left(\frac{\left|\tfrac{1}{2}\left(\boldsymbol{\Sigma}_{h} + \boldsymbol{\Sigma}_{k}\right)\right|}{\sqrt{\left|\boldsymbol{\Sigma}_{h}\right|\left|\boldsymbol{\Sigma}_{k}\right|}}\right) \tag{3}$$

where **μ**_{h}, **μ**_{k} and **Σ**_{h}, **Σ**_{k} are the mean vectors and covariance matrices of classes *c*_{h} and *c*_{k}, respectively.

In this study, we employed the average JM among all of the classes as the criterion function to evaluate the results of the band subset selections [23]. The larger the average JM between different classes is, the better the solution will be. The average JM can be calculated as:

$$J_{\mathrm{avg}} = \frac{2}{c(c-1)}\sum_{h=1}^{c-1}\sum_{k=h+1}^{c} J_{h,k} \tag{4}$$

where *c* is the number of classes.
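Under the Gaussian assumption, the JM criterion can be sketched as follows (a minimal illustration, not the authors' code; the class means and covariances are assumed to be estimated from training samples over the selected bands):

```python
import numpy as np

def bhattacharyya(mu_h, cov_h, mu_k, cov_k):
    """Bhattacharyya distance between two Gaussian class models."""
    cov = 0.5 * (cov_h + cov_k)
    diff = mu_h - mu_k
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, ld = np.linalg.slogdet(cov)            # log-determinants for stability
    _, ld_h = np.linalg.slogdet(cov_h)
    _, ld_k = np.linalg.slogdet(cov_k)
    term2 = 0.5 * (ld - 0.5 * (ld_h + ld_k))
    return term1 + term2

def jm(mu_h, cov_h, mu_k, cov_k):
    """Jeffreys-Matusita distance, bounded in [0, 2]."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya(mu_h, cov_h, mu_k, cov_k)))

def average_jm(means, covs):
    """Average pairwise JM over all c classes (the criterion function)."""
    c = len(means)
    pairs = [(h, k) for h in range(c) for k in range(h + 1, c)]
    return sum(jm(means[h], covs[h], means[k], covs[k]) for h, k in pairs) / len(pairs)
```

Identical classes give a JM of 0, while well-separated classes approach the upper bound of 2, which is why a larger average JM indicates a better band subset.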

### 2.2.1 ACA band selection algorithm (BS-ACA)

ACA, an artificial model inspired by the foraging behavior of an ant colony in nature, was first proposed in the 1990s [29,30]. ACA solves the optimization problem by way of stigmergy, which is an indirect information transfer among ants [31]. For convenience, we used a fully connected undirected weighted graph G = 〈*B*, *E*〉 to represent the search space of the ACA-based band selection algorithm. In this graph, the elements of *B* (the *n* candidate bands) denote the graph nodes, and *E* denotes the edges connecting them; each ant selects *m* bands from the *n* candidates (*m* < *n*). In this study, an ant-quantity system proposed by Dorigo was adopted to simulate the ants’ secretion behavior, in which the amount *Q* of pheromone secreted by each ant is a constant [32]. The pheromone concentration *τ*_{ij} on the edge between band *i* and band *j* is updated at iteration *t*:

$$\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \sum_{k=1}^{N}\Delta\tau_{ij}^{k} \tag{5}$$

where *ρ* is a volatility coefficient that controls the volatilization rate of the pheromone, and Δ*τ*_{ij}^{k} is the pheromone deposited on the edge by ant *k*, which is proportional to the JM (*J*_{k}) of the band subset found by ant *k*:

$$\Delta\tau_{ij}^{k} = \begin{cases} Q\,J_{k}, & \text{if ant } k \text{ traverses edge } (i,j)\\ 0, & \text{otherwise} \end{cases} \tag{6}$$

where *N* is the total number of ants.

When the ant reaches band *i*, the probability of moving from band *i* to band *j* is:

$$p_{ij}^{k} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{s\in \mathrm{allowed}_{k}}\tau_{is}^{\alpha}\,\eta_{is}^{\beta}} \tag{7}$$

where *α* denotes the rate of information accumulation in the movement of the ants, *β* is the heuristic coefficient, *η*_{ij} is the heuristic information between bands *i* and *j*, and allowed_{k} is the set of bands not yet selected by ant *k*.

The BS-ACA is realized with the following steps [23]:

Step 1: Initialize the ant colony and the parameters, including *N*, *Q*, *ρ*, *α*, *β*, *λ* and the number of iterations *T*.

Step 2: Calculate the JM for ant *k* (initially *k* = 1), and obtain the probabilities of ant *k* moving to each band in the candidate band set.

Step 3: Select the next band by roulette wheel selection, and repeat Step 2 until *M* bands are selected.

Step 4: Set *k* = *k* + 1, and repeat Steps 2 and 3 until *k* = *N*.

Step 5: Update the pheromone concentration on the route according to equation (5).

Step 6: Repeat Steps 2–5 until the iteration count reaches the user-specified number, and obtain the best band combination.
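Steps 2 and 3 (the probabilistic, band-by-band tour construction with roulette-wheel selection) can be sketched as follows; this is a hedged illustration with hypothetical names, assuming the pheromone matrix tau and heuristic matrix eta are given, with α = 1 and β = 5 as in Table 3:

```python
import numpy as np

def transition_probs(tau, eta, current, allowed, alpha=1.0, beta=5.0):
    """Probability of moving from the current band to each allowed band,
    weighted as tau^alpha * eta^beta."""
    w = (tau[current, allowed] ** alpha) * (eta[current, allowed] ** beta)
    return w / w.sum()

def roulette(probs, rng):
    """Roulette-wheel selection: pick index i with probability probs[i]."""
    return int(rng.choice(len(probs), p=probs))

def ant_tour(tau, eta, m, rng):
    """One ant builds a band subset of size m, band by band."""
    n = tau.shape[0]
    current = int(rng.integers(n))          # random starting band
    chosen = [current]
    while len(chosen) < m:
        allowed = np.array([b for b in range(n) if b not in chosen])
        p = transition_probs(tau, eta, current, allowed)
        current = int(allowed[roulette(p, rng)])
        chosen.append(current)
    return sorted(chosen)
```

After all ants finish their tours, the pheromone matrix would be volatilized and reinforced along each ant's edges before the next iteration.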

### 2.2.2 BS-CSA

The artificial immune system is a metaphor of an animal’s immune system. Clone selection is one of the well-known theories that effectively explain the immunity phenomenon [33]. CSA-based band selection algorithm was designed according to the clone selection theory. The objective function (JM) represents the antigen, the solution set of the specific problem denotes the population of antibody (**Ab**), and the value of the objective function is employed to evaluate the affinity of the solution (i.e., **Ab**). After the proliferation, mutation, and selection operations, the maximum affinity will approach stabilization (i.e., affinity maturation). The probability of mutation of the *i*th **Ab** (*P*_{i}) is inversely proportional to the affinity and can be calculated by equation (8).

$$P_{i} = \alpha\cdot\frac{f_{\max} - f_{i}}{f_{\max} - f_{\min}} \tag{8}$$

where *α* is a variable coefficient, *f*_{i} denotes the affinity of the *i*th **Ab**, and *f*_{max} and *f*_{min} are the maximum and minimum affinities in the antibody population (*AB*), respectively. Therefore, the antibody with higher affinity is mutated with lower frequency. Such a mutation strategy can improve the mutation effect.

The BS-CSA is designed with the following steps [34]:

**Step 1:** Initialize BS-CSA. The *AB* is generated randomly, and the parameters, including the size of the population (*N*) and *α*, are initialized.

**Step 2:** Calculate the affinity of *AB* according to equation (1).

**Step 3:** Select the *k* highest-affinity antibodies from *AB* to compose a new antibody population. *k* can be calculated as follows according to the retention rate *r*:

$$k = \mathrm{round}(N\cdot r) \tag{9}$$

**Step 4:** Clone the members of the new antibody population in proportion to their affinities to generate the clone population.

**Step 5**: Each **Ab** in the clone population is mutated with the probability *P*_{i} given by equation (8).

**Step 6:** Calculate the affinity of each mutated **Ab**, select the *k* highest-affinity antibodies among them, and use them to replace the *k* lowest-affinity antibodies in *AB*; the memory cell *mc* is replaced by an **Ab** if its affinity is greater than the *mc*’s.

**Step 7:** In order to increase the diversity of the antibody population, *d* antibodies are produced randomly to replace the *d* lowest-affinity antibodies in *AB*. *d* can be calculated according to the elimination rate *ɛ* by equation (11):

$$d = \mathrm{round}(N\cdot \varepsilon) \tag{11}$$

**Step 8:** When the BS-CSA iteration count reaches the user-specified number, stop the execution of the algorithm and obtain the optimal band subset. Otherwise, return to Step 2.
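The clone-mutate-select loop above can be sketched as follows (a toy illustration, not the authors' implementation: the affinity function, the mutation move, and the newcomer generation are simplified stand-ins, with the retention rate 0.8 and elimination rate 0.3 taken from Table 3):

```python
import numpy as np

def mutation_probs(affinities, alpha=1.0):
    """Mutation probability inversely related to affinity
    (hypothetical linear form; higher affinity -> lower probability)."""
    f = np.asarray(affinities, dtype=float)
    return alpha * (f.max() - f) / (f.max() - f.min() + 1e-12)

def csa_generation(pop, affinity_fn, retention=0.8, elimination=0.3, rng=None):
    """One clone-mutate-select generation on a binary antibody population."""
    rng = np.random.default_rng(rng)
    n, n_bands = pop.shape
    f = np.array([affinity_fn(ab) for ab in pop])
    k = max(1, int(round(n * retention)))        # retained antibodies
    best = pop[np.argsort(f)[::-1][:k]].copy()   # k highest-affinity Abs
    clones = best.copy()
    p_mut = mutation_probs([affinity_fn(ab) for ab in clones])
    for ab, p in zip(clones, p_mut):             # swap two genes w.p. p
        if rng.random() < p:
            i, j = rng.choice(n_bands, size=2, replace=False)
            ab[i], ab[j] = ab[j], ab[i]
    d = int(round(n * elimination))              # random newcomers
    newcomers = (rng.random((d, n_bands)) < pop.mean()).astype(pop.dtype)
    merged = np.vstack([best, clones, newcomers])
    f2 = np.array([affinity_fn(ab) for ab in merged])
    return merged[np.argsort(f2)[::-1][:n]]      # keep the n best
```

Because the retained antibodies are carried over unchanged, the best affinity in the population never decreases from one generation to the next.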

### 2.2.3 BS-PSO

Inspired by the foraging behavior of a bird flock, the PSO was proposed [35]. A bird in the flock is regarded as a particle that represents a potential solution of the optimization problem. The particles move through the *n*-dimensional problem search space and search for the optimal or good enough solution, and each particle broadcasts its current position to the neighboring particles. Here **x**_{i}, defined by equation (12), is used to represent the position of the *i*th particle:

$$\mathbf{x}_{i} = \left(x_{i1}, x_{i2}, \ldots, x_{in}\right),\quad i \in \{1, 2, \ldots, N\} \tag{12}$$

where *N* is the size of the particle population. The position change of each particle (as illustrated by Figure 4), defined as equation (13), is driven by its changing rate (i.e., velocity, denoted by **ν**):

$$\mathbf{x}_{i}(t+1) = \mathbf{x}_{i}(t) + \boldsymbol{\nu}_{i}(t+1) \tag{13}$$

The velocity of the particle is adjusted according to the differences between its current position and the best position found by the particle itself (**p**_{i}) and the best position found by its neighbors (**p**_{g}):

$$\boldsymbol{\nu}_{i}(t+1) = \boldsymbol{\nu}_{i}(t) + \eta_{1}\mathbf{R}_{1}\cdot\left(\mathbf{p}_{i} - \mathbf{x}_{i}(t)\right) + \eta_{2}\mathbf{R}_{2}\cdot\left(\mathbf{p}_{g} - \mathbf{x}_{i}(t)\right) \tag{14}$$

where *η*_{1} and *η*_{2} are acceleration coefficients, and **R**_{1} and **R**_{2} are two random functions, each of which returns a vector comprising random values uniformly generated in the range [0, 1]. The symbol · denotes a point multiplication operation.

The selection of the optimal band subset is a discrete optimization problem, and the position component *x*_{id} can be updated according to equation (15):

$$x_{id} = \begin{cases} 1, & \mathrm{rand}() < S(\nu_{id})\\ 0, & \text{otherwise} \end{cases},\qquad S(\nu) = \frac{1}{1 + e^{-\nu}} \tag{15}$$

where rand() is a uniform random number in [0, 1] and *S* is the sigmoid function.

**Step 1:** Initialize the particle population, and set the parameters, including *N*, *η*_{1}, *η*_{2}, and the number of iterations *T*.

**Step 2:** Calculate the JM of each particle, and obtain the best optimal position of each particle and the best optimal position of the population.

**Step 3:** Update the velocity and position of each particle according to equations (14) and (15), respectively.

**Step 4:** Recalculate the JM of each particle, and update the best positions found by the particle and its neighbors when the JM has been improved.

**Step 5:** When the iteration count reaches the user-specified number, terminate the algorithm and output the optimal band subset. Otherwise, return to Step 2.
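One binary-PSO update step (equations (14) and (15)) can be sketched as follows, a minimal illustration with hypothetical helper names, assuming the sigmoid position rule commonly used for discrete PSO and η₁ = η₂ = 2 as in Table 3:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def bpso_step(x, v, pbest, gbest, eta1=2.0, eta2=2.0, vmax=4.0, rng=None):
    """One binary-PSO update: velocity per the classic rule, position via
    the sigmoid probability rule for 0/1 band indicators."""
    rng = np.random.default_rng(rng)
    # cast to float so uint8 subtraction cannot wrap around
    x_f, pb, gb = (np.asarray(a, dtype=float) for a in (x, pbest, gbest))
    r1, r2 = rng.random(x_f.shape), rng.random(x_f.shape)
    v = v + eta1 * r1 * (pb - x_f) + eta2 * r2 * (gb - x_f)
    v = np.clip(v, -vmax, vmax)               # keep sigmoid out of saturation
    x_new = (rng.random(x_f.shape) < sigmoid(v)).astype(np.uint8)
    return x_new, v
```

After each step, the JM of the new position would be evaluated, and `pbest`/`gbest` updated whenever an improvement is found.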

### 2.2.4 BS-GA

GA is a widely used algorithm for searching for the optimal solution by simulating the evolutionary process. The solutions are represented by a genetic population, and individuals are encoded according to various rules. GA complies with Darwin’s theory of evolution; it generates a more optimal solution with each generation by selecting the superior and eliminating the inferior. The evolutionary process is realized by a sequence of the following genetic operators: selection, crossover, and mutation. Then, the individual with the highest JM is decoded and regarded as the optimal solution.

In BS-GA, the method for encoding the individual is as shown in Figure 3. The fitness of an individual is evaluated by the average JM distance. Initially, BS-GA generates the genetic population with a size of *N*. Then, a chromosome is selected according to a selection probability pb that is directly proportional to the fitness *f*. pb can be calculated as follows:

$$pb_{i} = \frac{f_{i}}{\sum_{j=1}^{N} f_{j}} \tag{16}$$

**Step 1:** Initialize the parameters and the population. Generate the population (*G*) with a size of *N* randomly, and set the crossover probability *P*_{c}, the mutation probability *P*_{m}, and the number of iterations *T*.

**Step 2:** Calculate the JM of the chromosomes, and select the chromosomes with a higher selection probability pb to generate a new generation of the gene population.

**Step 3:** Select two parents from the new population, and perform crossover with the probability *P*_{c} to generate two offspring.

**Step 4:** Select a locus on a chromosome randomly, and generate a random number; if the number is smaller than the mutation probability *P*_{m}, flip the gene at the locus.

**Step 5:** Evaluate the JM of the new chromosomes, and repeat Steps 2–4 until the iteration count reaches *T*; the chromosome with the highest JM is then decoded as the optimal band subset.
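The selection, crossover, and mutation operators can be sketched as follows (a simplified illustration with hypothetical helper names; single-point crossover and single-locus bit-flip mutation are assumptions, not necessarily the authors' exact operators):

```python
import numpy as np

def roulette_select(pop, fitness, rng):
    """Select a chromosome with probability proportional to its fitness."""
    p = fitness / fitness.sum()
    return pop[rng.choice(len(pop), p=p)]

def crossover(a, b, pc, rng):
    """Single-point crossover applied with probability pc."""
    if rng.random() < pc:
        cut = int(rng.integers(1, len(a)))
        return (np.concatenate([a[:cut], b[cut:]]),
                np.concatenate([b[:cut], a[cut:]]))
    return a.copy(), b.copy()

def mutate(chrom, pm, rng):
    """Flip one randomly chosen locus with probability pm."""
    chrom = chrom.copy()
    if rng.random() < pm:
        locus = int(rng.integers(len(chrom)))
        chrom[locus] ^= 1
    return chrom
```

With P_c = 0.6 and P_m = 0.4 (Table 3), crossover dominates the search while mutation injects diversity to slow premature convergence.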

## 3 Experiment and results analysis

In the experiment, the four band selection algorithms based on the respective swarm intelligence algorithms, as well as the band selection algorithm based on SFFS (BS-SFFS), were tested on both the Indian dataset and the Pavia dataset. The maximum likelihood (ML) classifier was employed to evaluate the performances of these band selection algorithms in terms of the JM of the band subset, the overall classification accuracy (OA), and the computational efficiency. For convenience of description, BS-ACA-ML denotes the ML classifier with the band subset provided by BS-ACA, and the other notations can be inferred by analogy.

For the four swarm intelligence algorithms, the average Jeffreys–Matusita distances (a-JM) varying with the number of iterations were used to measure the convergence property; at the same time, the classification results in the case of the best JM were presented to demonstrate the discrimination ability of ground features. The average OA as well as the average JM distance of the best band subsets were evaluated on different numbers of bands (5, 10, 15, 20, 25, and 30). The average running time (ART) of algorithms was used to evaluate the computational efficiency. Moreover, the impacts of different population sizes on the best JM, average OA, and computational efficiency were also provided.

These algorithms were coded in the Interactive Data Language (IDL) and run under the IDL 8.5 compiler on a PC with an Intel(R) Core(TM) i5-4460 processor (3.20 GHz) and 8 GB RAM. To avoid accidental results, all experiments were repeated five times in the same software and hardware environment.

### 3.1 Parameter setting

Generally, the parameters of swarm intelligence algorithms are chosen empirically. In this paper, the parameters of BS-ACA, BS-PSO, and BS-GA were estimated by means of grid searching (see Table 3). The population size of ants, particles, and chromosomes was set to 20 for these algorithms. For BS-CSA, the population size of antibodies was set to 10, and the other parameters are listed in Table 3 [34]. The user-specified number of iterations was set to 500 for all of the algorithms.

Parameter values for each feature selection algorithm

| BS-ACA | | BS-CSA | | BS-PSO | | BS-GA | |
|---|---|---|---|---|---|---|---|
| Parameter | Value | Parameter | Value | Parameter | Value | Parameter | Value |
| Q | 10 | α | 1 | η_{1} | 2 | P_{c} | 0.6 |
| ρ | 0.1 | ε | 0.3 | η_{2} | 2 | P_{m} | 0.4 |
| α | 1 | r | 0.8 | | | | |
| β | 5 | | | | | | |

### 3.2 Experiment 1: Indian dataset

### 3.2.1 Convergence of algorithms

In order to investigate the convergence of these algorithms, band subsets with a size of 30 were selected from the Indian dataset. Each experiment was repeated five times. The variation in the average JM distance with iterations is shown in Figure 6(a). From the figure, it can be seen that BS-ACA and BS-GA converged prematurely, plunging into local optima within 20 and 60 iterations, respectively. BS-CSA could not evolve continuously and took a long time to jump out of local optima. BS-PSO kept optimizing before converging at the 361st generation. The mean values of the best JM over the five repeated experiments for BS-ACA, BS-CSA, BS-PSO, and BS-GA were 1.48, 1.57, 1.61, and 1.58, respectively. The best JM derived from BS-SFFS was 1.62, close to that of BS-PSO and higher than those of the other swarm-intelligence-based algorithms.

### 3.2.2 Discrimination ability of selected band subsets

The average OAs of BS-ACA-ML, BS-CSA-ML, BS-PSO-ML, BS-GA-ML, and BS-SFFS-ML, with selected band subsets of size 30, were 82.05%, 87.66%, 90.15%, 88.44%, and 90.45%, respectively. The best classification results of the algorithms in the five repeated experiments are shown in Figure 7(a–e), with the corresponding confusion matrices illustrated in Figure 8(a–e). Class-level classification results are also presented in Figures 7 and 8. For the most easily confused class, Soybeans-min, which was often misidentified as Corn-min and Corn-notill, BS-PSO and BS-SFFS were still superior to BS-ACA, BS-CSA, and BS-GA.

### 3.2.3 Sizes of band subsets

The performances of the algorithms with selected band subsets of different sizes were compared. As shown in Figure 9, the mean JM and the corresponding average OA increased with the size of the band subset for each algorithm. When the size of the band subset varied from 5 to 30, the mean values of the best JM of BS-ACA, BS-CSA, BS-PSO, BS-GA, and BS-SFFS increased from 1.29, 1.28, 1.36, 1.31, and 1.32 to 1.47, 1.57, 1.61, 1.58, and 1.62, respectively. The JM achieved by BS-PSO, close to that of BS-SFFS, was higher than those of BS-ACA, BS-CSA, and BS-GA. The corresponding average OAs increased from 65.08%, 64.81%, 72.55%, 69.12%, and 65.19% to 82.05%, 87.66%, 90.15%, 88.44%, and 90.45%. The average OA achieved by BS-PSO-ML was also higher than those of the other swarm-intelligence-based algorithms. The average OA of BS-SFFS-ML approximated that of BS-PSO-ML, except for band subset sizes below 10, where it was lower than that of BS-PSO-ML.

The test results for the ART of the algorithms were obtained and are shown in Figure 10, where 5, 10, 15, 20, 25, and 30 bands were selected from the HSI within 500 iterations. Among the swarm-intelligence based algorithms, the ARTs of BS-CSA and BS-ACA were the smallest, whereas those of BS-PSO and BS-GA were the largest, regardless of the number of selected bands. In terms of the BS-SFFS, the running time was much lower than those of the swarm-intelligence based algorithms, because it did not involve the iterative process.

### 3.2.4 Impacts of different population sizes

The impacts of different population sizes on the four swarm-intelligence-based algorithms were evaluated. The test results for the ART, average JM distance, and average OA are shown in Figure 11. Increasing the population size had no conspicuous impact on the ability to discriminate between different land cover types, whereas the ART increased markedly. Therefore, enlarging the population is not a sensible way to enhance the searching ability of these algorithms.

### 3.3 Experiment 2: Pavia dataset

The band selection algorithms based on swarm intelligence were tested on the Pavia dataset, with the same parameter settings as in experiment 1.

### 3.3.1 Convergence of algorithms

The convergence of the algorithms with the number of iterations was tested on the Pavia dataset, and the variation in the average JM distance with iterations is shown in Figure 6(b). Generally, the algorithms converged in similar patterns to those in experiment 1. BS-PSO provided the best performance, where the a-JM increased gradually at the beginning and stabilized in the end as the number of iterations increased, whereas BS-ACA had the worst performance, prematurely falling into local optima. The highest a-JM values of BS-ACA, BS-CSA, BS-PSO, BS-GA, and BS-SFFS were 1.455, 1.474, 1.477, 1.473, and 1.477, respectively.

### 3.3.2 Discrimination ability of selected band subsets

The classification images and the corresponding confusion matrices are shown in Figures 12(a–e) and 13(a–e). The band subsets derived from BS-PSO and BS-SFFS had stronger discrimination ability for ground features than those of the other algorithms. To be specific, BS-PSO-ML and BS-SFFS-ML were inferior to BS-ACA-ML in the classification of Asphalt and Gravel, but they performed better in the classification of Meadows and Bare soil.

### 3.3.3 Sizes of band subset

The responses of the a-JM and average OA to the change in the size of band subset are shown in Figures 14a and b, respectively. The a-JM and the average OA increased with the size of the band subset for each algorithm. In general, the a-JM derived from BS-PSO and BS-SFFS were higher than those of the other three algorithms. The corresponding average OAs achieved by BS-PSO-ML and BS-SFFS-ML were higher than those of BS-ACA-ML, BS-CSA-ML, and BS-GA-ML, whereas the performance of BS-ACA was the worst in terms of a-JM and average OA.

The corresponding ART of each algorithm for the selection of band subsets of different sizes is shown in Figure 15. From the figure, it can be seen that the ART of each algorithm increased with the size of the band subset. The ARTs of BS-PSO and BS-GA were obviously higher than those of BS-ACA and BS-CSA. Moreover, the ART of BS-SFFS was the lowest.

### 3.3.4 Impacts of different population sizes

The test results for the ART, average JM distance, and average OA are shown in Figure 16. The ARTs of these algorithms increased greatly, but the average OAs did not improve markedly. Therefore, enlarging the population is not a sensible way to enhance the searching ability of the band selection algorithms.

## 4 Discussion

Swarm intelligence algorithms have been widely used in feature selection for hyperspectral image classification. Yet, whether these algorithms can consistently choose the optimal features (bands) within an acceptable running time, i.e., their real performance, has not been systematically investigated, which hinders their extensive application. Four typical swarm intelligence algorithms, namely the ACA, the CSA, the PSO, and the GA, together with sequential floating forward selection (SFFS), one of the best non-intelligence algorithms, as a benchmark, were tested and compared on two well-known public hyperspectral image datasets (the Indian Pines and the Pavia University datasets).

Generally, compared with the other three swarm-intelligence-based algorithms, BS-PSO has a stronger optimization capability, as demonstrated by its better a-JM, but it is at a disadvantage in terms of ART.

BS-PSO has the characteristics of evolutionary computation. Its “speed-displacement” search strategy, which changes a particle’s position over the iterations by adjusting its velocity according to the best positions found by the particle itself and by its neighbors, enables BS-PSO to keep approaching better solutions while maintaining the diversity of the population.

BS-ACA differs from the other three algorithms in that its search starts from a single ant rather than from a solution set; it builds a solution by obtaining the candidate bands one by one with a certain probability. However, when the number of candidate bands is large, the selection probability for the potentially best candidate band is small; thereby, BS-ACA cannot easily find the better candidate band and falls into a local optimum. Due to the volatilization of the pheromones that have been laid on the best route, BS-ACA cannot easily jump out of the local optimum. Therefore, BS-ACA performs poorly in terms of the average JM distance.

The a-JM values of BS-CSA and BS-GA are similar and lower than that of BS-PSO, which shows that BS-CSA and BS-GA also fall into local optima. Both BS-CSA and BS-GA use evolutionary operators. In BS-CSA, an **Ab** with a higher a-JM has more opportunities to generate offspring and a lower probability of mutating; as a result, the diversity of the population decreases over the generations. BS-GA is similar: in order to maintain a proper convergence speed, the probability of a chromosome being selected is proportional to its a-JM, and the crossover and mutation operators of BS-GA are activated only when the crossover and mutation probabilities are met.

The ARTs of these algorithms are different. The running time of an algorithm is determined by many factors, such as the computer being used and how, and in which programming language, the code is implemented [40]. Computational complexity is therefore employed to evaluate the performance of an algorithm in terms of running time. For these band selection algorithms, the computation is dominated by the evaluations of the objective function. The complexity *C* of the objective function can be calculated as follows [28]:

$$C = O\!\left(c^{2}\left(s\,d^{2} + d^{3}\right)\right)$$

where *c* is the number of classes, *d* is the number of selected bands, and *s* is the number of samples: estimating the covariance of each class pair costs on the order of *s·d*², and the matrix inversion and determinants cost on the order of *d*³. Thus, the calculation of the objective function is time consuming.

ACA and PSO are meta-heuristic algorithms [41]. For BS-ACA, the average JM distance of any two bands is calculated in advance during the initialization phase; therefore, much running time is saved in the iteration process of BS-ACA. Different from BS-ACA, BS-PSO searches for the optimal band subset by iteratively updating the positions and velocities of the particles; the a-JM of the current position and of the best position found by each particle must be recalculated after each update. Thus, the ART of BS-PSO is greater than that of BS-ACA. BS-CSA and BS-GA are evolutionary algorithms. In most cases, the number of cloned **Ab**s (*N*_{c}) approximates *N*; thus, the calculation of the objective function determines the running times of BS-CSA and BS-GA. Although BS-CSA involves more operators than BS-GA, which only has the crossover operator and the mutation operator, the number of chromosomes involved in crossover and mutation in BS-GA is greater than the number of mutated and newly generated **Ab**s in BS-CSA. The a-JM of these chromosomes must be recalculated in each generation, whereas BS-CSA only needs to calculate the a-JM for the mutated and newly generated **Ab**s. This characteristic saves considerable computational time for BS-CSA; therefore, the time consumed by BS-GA is obviously larger than that of BS-CSA. The computational complexities of BS-PSO and BS-GA are the same, and their numbers of objective-function evaluations are approximately equal; therefore, the ARTs of BS-PSO and BS-GA are close.

## 5 Conclusions

In this study, we used four common swarm intelligence algorithms and a greedy algorithm to select the optimal band subset from hyperspectral remote sensing imagery. Their performances were compared from three aspects: a-JM, average OA, and ART.

The experimental results show that the band selection algorithm based on PSO has better performance in terms of average OA, but it needs improvements, such as parallelization [42], to reduce its running time. PSO is a promising subset generation procedure for optimal band selection in hyperspectral remote sensing imagery. The band selection algorithm based on the ACA gets into local optima easily; one effective way to avoid this is to combine it with other algorithms, such as the GA or the CSA. The performances of the band selection algorithms based on the CSA and the GA were unremarkable, with lower a-JM: their series of evolutionary operators reduced the diversity of the population and led the algorithms to converge to local optima. In brief, PSO has a stronger optimizing capability than the other algorithms, and the optimizing capabilities of the ACA, the CSA, and the GA are weaker than that of SFFS, which is a typical greedy algorithm.

Hence, for band selection in hyperspectral remote sensing imagery, the PSO and the GA have the greatest room for improvement in achieving an acceptable runtime, and PSO is the best subset generation procedure. These swarm intelligence algorithms can also complement each other's strengths within a hybrid subset generation procedure with stronger optimizing capability.

The research was jointly supported by the National Natural Science Foundation of China (Grant Nos.: 41301465, 4197060691), Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering, Guangdong Laboratory (Guangzhou) (Grant No. GML2019ZD0301), the GDAS’ Project of Science and Technology Development (Grant No. 2019GDASYL-0301001), Guangzhou Science and Technology Planning project (Grant No. 201902010033), and the National Key Research and Development Program of China (Grant No. 2016YFB0502300).

## References

- [1]
Huo C, Zhang R, Yin D. Compression technique for compressed sensing hyperspectral images. Int J Remote Sens. 2012;33(5):1586–604.

- [2]
Hughes G. On the mean accuracy of statistical pattern recognizers. IEEE Trans Inform Theory. 1968;14(1):55–63.

- [4]
Dalla Mura M, Villa A, Benediktsson JA, Chanussot J, Bruzzone L. Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis. IEEE Geosci Remote Sens Lett. 2011;8(3):542–6.

- [5]
Ma Y, Li R, Yang G, Sun L, Wang J. A research on the combination strategies of multiple features for hyperspectral remote sensing image classification. J Sens. 2018:7341973.

- [6]
Du Q. Modified Fisher's linear discriminant analysis for hyperspectral imagery. IEEE Geosci Remote Sens Lett. 2007;4(4):503–7.

- [7]
Ding X, Li H, Yang J, Dale P, Chen X, Jiang C, et al. An improved ant colony algorithm for optimized band selection of hyperspectral remotely sensed imagery. IEEE Access. 2020;8:25789–99.

- [8]
Talukder A, Casasent D. General methodology for simultaneous representation and discrimination of multiple object classes. Opt Eng. 1998;37(3):904–13.

- [9]
Nascimento JMP, Dias JMB. Does independent component analysis play a role in unmixing hyperspectral data? IEEE Trans Geosci Remote Sens. 2005;43(1):175–87.

- [11]
Feng J, Jiao L, Liu F, Sun T, Zhang X. Unsupervised feature selection based on maximum information and minimum redundancy for hyperspectral images. Pattern Recognit. 2016;51:295–309.

- [12]
Feng L, Tan AH, Lim MH, Jiang SW. Band selection for hyperspectral images using probabilistic memetic algorithm. Soft Comput. 2016;20(12):4685–93.

- [14]
Pudil P, Novovičová J, Kittler J. Floating search methods in feature selection. Pattern Recognit Lett. 1994;15(11):1119–25.

- [15]
Yang H, Du Q, Chen G. Particle swarm optimization-based hyperspectral dimensionality reduction for urban land cover classification. IEEE J Select Top Appl Earth Observ Remote Sens. 2012;5(2):544–54.

- [16]
Gomez-Chova L, Calpe J, Camps-Valls G, Martin JD, Soria E, Vila J, et al. Feature selection of hyperspectral data through local correlation and SFFS for crop classification. In: IGARSS 2003, 2003 IEEE International Geoscience and Remote Sensing Symposium Proceedings. Toulouse, France: IEEE; July 2003. vol. 1, p. 555–7.

- [17]
Chang CY, Chang CW, Kathiravan S, Lin C, Chen ST. DAG-SVM based infant cry classification system using sequential forward floating feature selection. Multidimension Syst Signal Process. 2017;28(3):961–76.

- [18]
Samadzadegan F, Partovi T. Feature selection based on ant colony algorithm for hyperspectral remote sensing images. In: 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS). Reykjavik, Iceland: IEEE; 2010. p. 1–4.

- [20]
Zhuo L, Zheng J, Li X, Wang F, Ai B, Qian J. A genetic algorithm based wrapper feature selection method for classification of hyperspectral images using support vector machine. In: Proc. Geoinformat. Joint Conf. GIS Built Environ. Classif. Remote Sens. Images. Bellingham, USA: SPIE; 2008. p. 71471J.

- [21]
Al-Ani A. Feature subset selection using ant colony optimization. Int J Comput Intell. 2005;2(1):53–8.

- [22]
Zhu X, Li N, Pan Y. Optimization performance comparison of three different group intelligence algorithms on a SVM for hyperspectral imagery classification. Remote Sens. 2019;11(6):734.

- [23]
Zhou S, Zhang J, Su B. Feature selection and classification based on ant colony algorithm for hyperspectral remote sensing images. In: 2nd International Congress on Image and Signal Processing (CISP'09). Tianjin, China: IEEE; 2009. p. 1–4.

- [24]
Zhong Y, Zhang L. A fast clonal selection algorithm for feature selection in hyperspectral imagery. Geo-spatial Inform Sci. 2009;12(3):172–81.

- [25]
Tu CJ, Chuang LY, Chang JY, Yang CH. Feature selection using PSO-SVM. Int J Comput Sci. 2007;33(1):111–6.

- [26]
Wang X, Yang J, Teng X, Xia W, Jensen R. Feature selection based on rough sets and particle swarm optimization. Pattern Recognit Lett. 2007;28(4):459–71.

- [27]
Samadzadegan F, Mahmoudi FT. Optimum band selection in hyperspectral imagery using swarm intelligence optimization algorithms. In: 2011 International Conference on Image Information Processing (ICIIP). Shimla, Himachal Pradesh, India: IEEE; 2011. p. 1–6.

- [28]
Su H, Du Q, Chen G, Du P. Optimized hyperspectral band selection using particle swarm optimization. IEEE J Select Top Appl Earth Observ Remote Sens. 2014;7(6):2659–70.

- [29]
Dorigo M, Di Caro G. Ant colony optimization: a new meta-heuristic. In: Proceedings of the 1999 Congress on Evolutionary Computation (CEC 99). Washington, DC, USA: IEEE; 1999. vol. 2, p. 1470–7.

- [30]
Dorigo M, Birattari M, Stutzle T. Ant colony optimization. IEEE Comput Intell Mag. 2006;1(4):28–39.

- [31]
Dréo J, Siarry P. A new ant colony algorithm using the heterarchical concept aimed at optimization of multiminima continuous functions. In: International Workshop on Ant Algorithms. Berlin, Heidelberg: Springer; 2002. p. 216–21.

- [32]
Dorigo M. Optimization, learning and natural algorithms (in Italian) [PhD thesis]. Italy: Dipartimento di Elettronica, Politecnico di Milano; 1992.

- [33]
Bean WB. The clonal selection theory of acquired immunity. AMA Arch Intern Med. 1960;105(6):973–4.

- [34]
Zhang L, Zhong Y, Huang B, Gong J, Li P. Dimensionality reduction based on clonal selection for hyperspectral imagery. IEEE Trans Geosci Remote Sens. 2007;45(12):4172–86.

- [35]
Kennedy J, Eberhart RC. Particle swarm optimization. In: Proceedings of the 1995 IEEE International Conference on Neural Networks. Piscataway, NJ: IEEE Press; 1995. vol. 4, p. 1942–8.

- [36]
Potts JC, Giddens TD, Yadav SB. The development and evaluation of an improved genetic algorithm based on migration and artificial selection. IEEE Trans Syst Man Cybernet. 1994;24(1):73–86.

- [37]
Alba E, Troya JM. A survey of parallel distributed genetic algorithms. Complexity. 1999;4(4):31–52.

- [38]
Lokman G, Baba AF, Topuz V. A trajectory tracking FLC tuned with PSO for TRIGA Mark-II nuclear research reactor. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems. Berlin, Heidelberg: Springer; 2011. p. 90–9.

- [39]
Li HP, Zhang SQ, Zhang C, Li P, Cropp R. A novel unsupervised Levy flight particle swarm optimization (ULPSO) method for multispectral remote-sensing image classification. Int J Remote Sens. 2017;38(23):6970–92.

- [40]
De Castro LN, Von Zuben FJ. Learning and optimization using the clonal selection principle. IEEE Trans Evolution Comput. 2002;6(3):239–51.

- [41]
Oliveto PS, He J, Yao X. Time complexity of evolutionary algorithms for combinatorial optimization: a decade of results. Int J Automat Comput. 2007;4(3):281–93.

- [42]
Chang YL, Fang JP, Benediktsson JA, Chang L, Ren H, Chen KS. Band selection for hyperspectral images based on parallel particle swarm optimization schemes. In: 2009 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2009). Cape Town, South Africa: IEEE; 2009. vol. 5, p. V-84–7.