Mineralized indicator minerals are important indicators in geological and mineral exploration, so rapidly extracting them from hyperspectral remote sensing images with ensemble learning models is of considerable value for mineral resource exploration. In this study, two mineralized indicator minerals exposed at the surface of the Gouli area, Qinghai Province, limonite and chlorite, were taken as research objects. The sparrow search algorithm (SSA) was combined with the random forest (RF) and gradient boosting decision tree (GBDT) ensemble learning models, respectively, to construct hyperspectral models for extracting mineralized indicator mineral information in the study area. The Youden index (YD) and ore deposit coincidence (ODC) were used to evaluate the performance of the different models. The results indicate that SSA parameter optimization is clearly effective: the accuracy of both ensemble learning models improved substantially after the parameter search. The SSA-GBDT model performed best, with a YD of 0.661 and an ODC of 0.727. Compared with traditional machine learning models, the ensemble learning models showed higher reliability and stronger generalization in hyperspectral mineral information extraction, with YD values greater than 0.6. In addition, the distribution of mineralized indicator minerals extracted by the parameter-optimized ensemble learning models is broadly consistent with the distribution of fault structures and known deposits (ore occurrences) in the area, in line with the geological characteristics of mineralization in the study area. Therefore, combining hyperspectral remote sensing with the SSA optimization algorithm and ensemble learning models provides an efficient approach to mineral exploration.
Mineral resources are natural materials formed over long periods of geological evolution under specific geological conditions, and they are an important material basis for the survival and development of human society [1,2,3]. With the rapid development of the global economy, demand for mineral resources has surged, and many large mines have gradually been exhausted by long-term overexploitation [4,5,6]. In addition, traditional exploration and mining methods have caused serious ecological and environmental problems in recent years. Under the dual objectives of mineral resource exploration and ecological protection, the most urgent problem in mineral exploration is how to use advanced technology to carry out accurate, green exploration that reduces surface disturbance from exploration works and protects the surrounding environment [7,8]. At present, mineralized indicator mineral information has become an independent prospecting indicator with the same status as geochemical and geophysical anomalies, and it is an important basis for delineating prospecting potential areas and exploration targets [9,10,11]. For example, in medium- and low-temperature hydrothermal deposits, chloritization, limonitization, sericitization, pyritization, feldspathization, silicification, and carbonatization are typical types of wall-rock alteration. Among them, limonitization is closely related to medium- and low-temperature hydrothermal deposits: the stronger the limonitization, the larger its extent, which indirectly reflects stronger alteration of metal sulfides and a higher ore grade. Chloritization is mainly formed by the alteration of mica and hornblende during regional metamorphism; because it is superimposed on the mineralization stage, it is related to mineralization to some extent.
Sericitization is mainly a facies-type alteration formed by regional metamorphism and is commonly found in fracture zones, schistose zones, and the wall rocks on both sides, which indirectly indicates that sericitization (potassic alteration) is superimposed on gold mineralization. Numerous studies have shown that, because indicator minerals are closely linked to mineralization, using hyperspectral remote sensing to extract their spatial locations can provide effective clues for large-scale mineral exploration in areas that are difficult to access [13,14].
For a long time, the study of the spectral characteristics of rocks and ores has been an important part of applying remote sensing to geological exploration [15,16,17]. Spaceborne and airborne hyperspectral remote sensing is an Earth observation technology that emerged in the early 1980s. It not only obtains the spatial location of surface materials but also enables effective identification of minerals and lithology by measuring their continuous reflectance spectra. Because of its large coverage, fast acquisition, and non-destructive measurement, spaceborne hyperspectral remote sensing has great potential for identifying mineralized indicator minerals and delineating prospecting targets, and it has become an important means of green mineral exploration [18,19,20]. In recent years, with the continuing development of statistical learning theory, many researchers have combined machine learning models with hyperspectral remote sensing to extract mineralization-related alteration information, with notable results. For example, Chen et al. used ASTER and GF-5 satellite data with SPCA and MTMF techniques to map hematite, kaolinite, calcite, dolomite, and other mineralized indicator minerals in the Longtoushan lead-zinc deposit, and verified the mapping accuracy through field surveys and the kappa coefficient. Roy et al. used Airborne Visible-Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) data and an integrated extreme learning machine (IELM) method implemented with the MapReduce model to map the Jahazpur mineralized zone at large scale, and quantitatively evaluated the identification accuracy.
Shayeganpour and Tangestani used the FODPSO algorithm to segment a HyMap mosaic image scene and build a sample dataset, and then applied a Gaussian regression algorithm to extract the spatial distribution of mineralized indicator minerals such as alunite, jarosite, kaolinite, muscovite, and montmorillonite, as well as the main rock types, in Birjand, eastern Iran.
Because hyperspectral images have large data volumes, high dimensionality, and strong data redundancy, extracting mineralized indicator mineral information with traditional machine learning methods often suffers from a highly complex hypothesis space and model overfitting, which makes the extraction unsatisfactory and its accuracy less reliable. Finding and constructing new methods that extract this information effectively is therefore key to current mineral exploration work [24,25]. Ensemble learning trains multiple learners and combines their results through a combination strategy to achieve better performance than any single learner [26,27,28]. It can effectively address problems of traditional machine learning such as high sensitivity to training samples, high computational complexity, and overfitting, and it has become one of the research hotspots in machine learning. Ensemble learning algorithms have been widely used in biology, engineering, medicine, computer vision, image processing, and other fields [29,30,31,32,33]. According to the strategy used to combine learners, common ensemble learning methods fall into parallel ensembles (bagging) and serial ensembles (boosting). Random forest (RF) and gradient boosting decision tree (GBDT) are typical representatives of the bagging and boosting frameworks, respectively; both use CART decision trees as base learners and perform well on regression and classification problems [34,35,36].
When machine learning models are used to extract mineral information, optimally selecting their key parameters can effectively improve model performance, e.g., the network weights of artificial neural network (ANN) models [37,38], the penalty factor and kernel parameters of support vector machine (SVM) models [39,40], and the number of trees in RF models [41,42]. In recent years, many intelligent optimization algorithms have been proposed and used to optimize the internal parameters of machine learning models, such as the genetic algorithm [43,44] and particle swarm optimization [45,46]. However, these classical optimization algorithms are sensitive to initial parameter settings and easily fall into local optima during the search, which slows convergence in the later stages. The sparrow search algorithm (SSA) is a recent swarm intelligence optimization algorithm inspired mainly by the foraging and anti-predation behavior of sparrows [47,48,49]. SSA has strong global search capability, does not require gradient information, and is little affected by initial parameters. Owing to its good parallelism, fast convergence, and strong performance, it has been widely used to optimize the parameters of complex models [50,51,52].
In this study, based on a ZY1-02D hyperspectral image of the Gouli area, Qinghai, two typical mineralized indicator minerals that are well exposed in the area, limonite and chlorite, were taken as research objects, and their spectral absorption features were analyzed from field spectra acquired with an ASD ground spectrometer. Characteristic variables were extracted with kernel principal component analysis (KPCA) to serve as training samples for ensemble learning. SSA was combined with the RF and GBDT ensemble learning methods to build mineral information extraction models, and spatial distribution maps of the typical mineralized indicator minerals in the study area were produced. The Youden index (YD) was used as the fitness value of the optimization objective function. The number of trees and the number of splitting nodes of the RF model, and the number of trees and the learning rate of the GBDT model, were optimized, respectively. Finally, the extraction results were evaluated by calculating the YD and the ore deposit coincidence (ODC), in combination with the geological and metallogenic conditions of the area and field investigation. The main contributions of this work are to propose combining SSA with ensemble learning models for extracting hyperspectral mineralized indicator minerals and to introduce the YD for evaluating mineral extraction results. Combining SSA with ensemble learning improves the performance of the ensemble models in predicting alteration minerals, and the YD provides an accuracy criterion for mineralized indicator minerals from the perspective of minimizing cost and maximizing benefit.
2 Study area and data processing
2.1 Geological setting and alteration minerals
The study area is located in the eastern part of the East Kunlun orogenic belt in northwestern Qinghai Province and administratively belongs to Guoli Township, Haixi Tibetan and Mongolian Autonomous Prefecture, Qinghai Province. Tectonically, the region lies on the southern margin of the Qin-Qi-Kun orogenic system and spans two tectonic units, the Kunbei and Kunnan tectonic belts. To the west it adjoins the West Kunlun orogenic belt, from which it is separated by the sinistral Altun strike-slip fault; to the east it is separated from the West Qinling orogenic belt by the Gonghe Basin; and to the north it borders the Qaidam Basin. Most stratigraphic units exposed in the region are in fault contact and have been reworked by multi-stage deformation and metamorphism. The Neoarchean Milan rock group, the Paleoproterozoic Baishahe rock formation, Kuhai rock group, and Jinshuikou rock group, the Neoproterozoic Wanbaogou group, Wenquangou group, Qingbanshisuban formation, and Phyllite member, and the Changcheng System Xiaomiao rock formation form the crystalline and transitional basement of the study area. The Ordovician-Silurian Nachitai group, Devonian Maoniushan Formation, Carboniferous Huitoutala Formation volcanic member, Permian-Carboniferous Gahai group, and Haotelowa Formation are all sedimentary strata of the upper Permian basin. The well-preserved original sedimentary sequence, comprising the sandstone section of the Triassic Hongshuichuan Formation, Middle Pleistocene glacial deposits, Late Pleistocene alluvium, and Holocene alluvial deposits, forms the stratigraphic system of the early Mesozoic orogenic sedimentary basin and the Cenozoic sedimentary basin (Figure 1).
Geological formations, intrusions, and polymetallic mineralization in the study area are controlled by a series of NW-SE-trending deep faults, namely the Kunbei, Kunzhong, and Kunnan faults. Magmatic activity occurred mainly in different stages of the Caledonian-Indosinian orogenic cycle, and the intrusions are mainly intermediate-acid rocks of Ordovician, Silurian, Permian, and Late Triassic age. Magmatism is characterized by large outcrops of granitic intrusive rocks occurring as batholiths and stocks. The intermediate-acid rocks are mainly distributed on both sides of the North Dongkun fault and the Middle Dongkun fault. Regional mineralization mostly occurs near the contact zones between strata and intrusive rocks of different ages. The study area has favorable metallogenic conditions, and the deposits are of the medium- to low-temperature hydrothermal gold polymetallic type. Owing to multi-stage tectonic and hydrothermal activity, wall-rock alteration in the area is strong, spatially overlapping, and clearly zoned, and it occurs mainly in fracture alteration zones. The main wall-rock alteration types are silicification, sericitization, pyritization, arsenopyritization, and stibnitization. Oxidation zones containing limonite, jarosite, and chlorite are extensively developed, and their strike, dip, and other attitudes are consistent with those of the fracture alteration zones. Based on the geological and mineral data of the study area and the results of field investigation, limonite and chlorite were selected as the mineralized indicator minerals for the hyperspectral extraction test, mainly because they are the main minerals of the fracture alteration oxidation zones exposed at the surface.
In addition, limonite and chlorite are exposed over large areas, which makes ground spectral measurement easy. Both minerals have typical spectral features, which makes them suitable for experimental studies of hyperspectral mineral extraction.
2.2 Hyperspectral image data
ZY1-02D, which carries a visible near-infrared (VNIR) camera and a hyperspectral camera, is a hyperspectral satellite launched by the Ministry of Natural Resources of China in September 2019. The hyperspectral camera acquires data in 166 spectral bands with a swath width of 60 km and a spatial resolution of 30 m, comprising 76 VNIR bands with a spectral resolution of 10 nm and 90 short-wave infrared (SWIR) bands with a spectral resolution of 20 nm. In this study, a ZY1-02D hyperspectral image covering the test area, acquired on October 16, 2020, was preprocessed. Because the SWIR bands of ZY1-02D show obvious striping, the "global striping" method was used to repair the stripes, and bands severely affected by water vapor and overlapping bands were removed. A total of 149 VNIR and SWIR bands were then combined and stored. Radiometric calibration was carried out in the ENVI software, and atmospheric correction was performed with the FLAASH module to obtain surface reflectance. The processed image is shown in Figure 2.
2.3 Ground hyperspectral data
Based on the geological and mineral data of the study area, rock and alteration mineral samples were collected and measured spectrally around the known ore deposits in the area (Figure 3). To reduce environmental differences between ground-measured and image spectra, field measurements were timed to coincide with the ZY1-02D satellite overpass. An ASD FieldSpec 4 spectrometer with a standard mineral probe was used to measure the spectra of limonite and chlorite alteration mineral samples and of three rock samples (marble, syenogranite, and monzogranite). The white reference panel was calibrated before each measurement to improve data accuracy, and the mean of ten measurements was used as the reflectance spectrum of each sample. The measured spectra show that noise is concentrated in the 350–399 nm and 2,451–2,500 nm ranges, where the signal-to-noise ratio (SNR) is low; these ranges were therefore excluded from the original spectral data.
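The averaging and trimming described above can be sketched in a few lines. This is a minimal illustration, assuming each sample's repeated scans share one wavelength grid in nanometres; the function name `average_and_trim` is hypothetical and this is not the processing software used in the study.

```python
from statistics import fmean


def average_and_trim(wavelengths, scans, low_cut=(350, 399), high_cut=(2451, 2500)):
    """Average repeated scans band by band, then drop the noisy wavelength ranges.

    wavelengths: band centres in nm, one per reflectance value (assumed shared grid).
    scans: repeated reflectance spectra of one sample (e.g. ten measurements).
    """
    # Mean reflectance per band across all repeated scans.
    mean_spec = [fmean(values) for values in zip(*scans)]
    # Keep only bands outside the two low-SNR ranges (inclusive bounds).
    kept = [
        (wl, r) for wl, r in zip(wavelengths, mean_spec)
        if not (low_cut[0] <= wl <= low_cut[1] or high_cut[0] <= wl <= high_cut[1])
    ]
    return [wl for wl, _ in kept], [r for _, r in kept]
```

With a 1 nm grid from 350 to 2,500 nm, the retained spectrum runs from 400 to 2,450 nm.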
2.4 Image preprocessing results verification
To verify the spectral correction accuracy of the ZY1-02D hyperspectral image, the image was overlaid on the geological and mineral maps of the study area. Based on the known lithologic distribution, 50 end-member spectra each of monzogranite, syenogranite, and marble were randomly selected and correlated with the field-measured spectra of the three rocks, using the Pearson correlation coefficient to evaluate the image processing (Figure 4). The image reflectance curves of the three rocks are similar to the field-measured curves: the characteristic absorption positions are essentially the same and the spectral shapes are highly consistent. The Pearson correlation coefficients of most samples are above 0.8. The correlation curve for marble is generally lower than those for monzogranite and syenogranite, but its mean is still above 0.7, indicating a good match. These results indicate that the spectral correction of the ZY1-02D image is relatively accurate and meets the requirements of rock and mineral information extraction.
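A minimal sketch of the Pearson correlation used to compare an image end-member spectrum with a field spectrum, assuming both have already been resampled to the same bands. The helper `pearson_r` is hypothetical; on Python 3.10+ `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("spectra must have the same length (at least 2 bands)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Because the coefficient ignores overall scale and offset, it rewards matching spectral shape and absorption positions rather than absolute reflectance, which suits a comparison between field and image spectra.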
3.1 Ensemble learning models
Ensemble learning builds and combines multiple learners of the same or different types. Combining these learners appropriately usually yields better learning performance and generalization than a single learner [53,54]. According to the rule used to combine learners, common ensemble learning methods are divided into parallel ensembles (bagging) and serial ensembles (boosting). Bagging draws a bootstrap sample from the training set for each base classifier, i.e., a sample of the same size drawn with replacement, so that each base classifier is trained on a different training set [56,57,58]. Boosting first assigns every training sample the same initial weight and trains a first base classifier; it then increases the weights of the samples that this classifier misclassified, trains the next base classifier on the reweighted data, and repeats until a learner with sufficiently high accuracy is obtained [59,60]. RF and GBDT are typical representatives of the bagging and boosting frameworks, respectively, and both can solve classification problems (Figure 5).
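To make the two combination ideas concrete, the sketch below draws one bootstrap training set (bagging) and performs one sample-reweighting step (boosting). The reweighting uses the AdaBoost rule, swapped in here only as a familiar example of boosting; GBDT itself fits negative gradients rather than reweighting samples. Both function names are hypothetical.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Bagging: draw n training-sample indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def adaboost_reweight(weights, misclassified):
    """Boosting (AdaBoost rule): raise weights of misclassified samples, then renormalize."""
    err = sum(w for w, m in zip(weights, misclassified) if m)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) / division by zero
    alpha = 0.5 * math.log((1 - err) / err)  # weight of this base classifier
    new = [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new], alpha
```

After one AdaBoost update, the misclassified samples together carry half of the total weight, so the next base classifier is forced to concentrate on them.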
3.1.1 RF algorithm
RF is a classification algorithm based on the bagging parallel ensemble idea, proposed by Leo Breiman and Adele Cutler in 2001. When RF is applied to classification, the number of classification trees is set first, and bootstrap resampling is then applied to the sample set repeatedly to obtain as many bootstrap sample sets as there are trees. When each tree is grown, a subset of attributes is drawn at random, with equal probability, from the feature variables at every node, and the best of these attributes is used to split the node; this produces many decision trees that vote. Finally, the category predicted by the most trees is taken as the classification result [52,62,63].
RF uses different node-splitting criteria for classification and regression problems. For classification trees, RF uses the Gini index, defined as:

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2 \tag{1}$$
where K is the number of categories at the current node, Gini(p) is the Gini index of the current node, and p_k is the proportion of samples of the kth category in the node dataset. Formula (1) shows that the Gini index increases as the category distribution becomes more even. It equals the probability that a sample drawn at random from the set is misclassified when labeled at random according to the node's class distribution, so the lower the Gini index, the purer the set.
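Formula (1) translates directly into code. This is a minimal sketch with a hypothetical `gini` helper that takes the class labels of the samples at one node.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum_k p_k^2 (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```

A pure node such as `["lim", "lim", "lim"]` gives 0, while an evenly mixed two-class node gives the two-class maximum of 0.5.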
When searching for the optimal splitting feature and threshold, the evaluation criterion is:

$$(\alpha^{*}, t) = \arg\max_{\alpha \in A,\; t} \left[ \mathrm{Gini}(P) - \omega_{\mathrm{left}}\,\mathrm{Gini}(P_{\mathrm{left}}) - \omega_{\mathrm{right}}\,\mathrm{Gini}(P_{\mathrm{right}}) \right] \tag{2}$$

where α* is the optimal splitting feature of the current node, t is the splitting threshold, Gini(P) is the Gini index of the parent node, and Gini(P_left), Gini(P_right) and ω_left, ω_right are the Gini indices and sample weights (proportions of the parent node's samples) of the left and right child nodes after splitting. Formula (2) states that, within the set A of candidate attributes, the attribute α* and threshold t are sought that reduce the Gini index of the child nodes as much as possible relative to the parent node. The final classification decision is:

$$H(x) = \arg\max_{Y} \sum_{i=1}^{M} I\big(h_i(x) = Y\big) \tag{3}$$

where H(x) is the combined classification model, M is the user-specified number of decision trees, h_i is the ith decision tree classifier, I(·) is the indicator function (equal to 1 when its argument is true and 0 otherwise), and Y is the output class.
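The split search of Formula (2) and the majority vote that combines the trees can be sketched as follows. This is an illustrative sketch, not the implementation used in the study: thresholds are assumed to be midpoints between consecutive distinct feature values, and the helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, candidate_features):
    """Return (feature, threshold, gain) maximizing the weighted Gini decrease."""
    n = len(y)
    parent = gini(y)
    best = (None, None, 0.0)
    for j in candidate_features:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if gain > best[2]:
                best = (j, t, gain)
    return best


def majority_vote(predictions):
    """Class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]
```

In a random forest, `candidate_features` would be a random subset of the feature indices drawn afresh at each node (for example with `random.sample`), which is what decorrelates the trees.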
GBDT, proposed by Friedman of Stanford University in 2001, combines decision trees with the boosting serial ensemble idea. It is an additive model that combines multiple weighted weak learners into a strong learner and can be expressed as:

$$F(x) = \sum_{t=1}^{T} \alpha_t h_t(x) \tag{4}$$

where F(x) is the model objective function, T is the number of trees to be constructed in the gradient boosting decision tree, h_t(x) is a weak learner (a classification and regression tree, CART), and α_t is the weight of the tth tree.
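GBDT uses the forward stagewise algorithm. First, an initial model F_0(x), usually a constant, is determined. The model at the mth step is:

$$F_m(x) = F_{m-1}(x) + \alpha_m h_m(x) \tag{5}$$

where F_{m−1}(x) is the current model. The newly added classification and regression tree h_m(x) is obtained by minimizing the loss function L:

$$h_m = \arg\min_{h} \sum_{i=1}^{N} L\big(y_i,\, F_{m-1}(x_i) + h(x_i)\big) \tag{6}$$

where N is the number of samples and y_i is the label of the ith sample. GBDT solves for the optimal model by gradient descent, taking the negative gradient of the loss function evaluated at the current model F_{m−1}(x) as the descent direction:

$$r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \tag{7}$$

h_m(x) is fitted to these pseudo-residuals r_{im}, and α_m is then obtained by a line search:

$$\alpha_m = \arg\min_{\alpha} \sum_{i=1}^{N} L\big(y_i,\, F_{m-1}(x_i) + \alpha\, h_m(x_i)\big) \tag{8}$$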
GBDT can be regularized by setting a learning rate (shrinkage):

$$F_m(x) = F_{m-1}(x) + v\,\alpha_m h_m(x), \qquad 0 < v \le 1 \tag{9}$$

where v is the learning rate. A lower learning rate requires more CART trees and usually yields a smaller final error, but it also increases training time.
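The boosting loop above can be sketched from scratch on one-dimensional inputs. This sketch makes two simplifying assumptions that differ from a classification GBDT: it uses squared loss L = (y − F)²/2 instead of a classification loss, and depth-1 regression stumps instead of full CART trees. All names are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares regression stump on 1-D inputs: (threshold, left_value, right_value)."""
    xs = sorted(set(x))
    if len(xs) < 2:  # no split possible: predict the mean everywhere
        m = sum(r) / len(r)
        return float("inf"), m, m
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1], best[2], best[3]


def gbdt_fit(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss and stump base learners."""
    f0 = sum(y) / len(y)  # F_0: the constant that minimizes squared loss
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of (y - F)^2 / 2 is the ordinary residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        # With squared loss, leaf means already solve the line search (alpha_m = 1).
        stumps.append((t, lv, rv))
        # Shrunken update: F_m = F_{m-1} + v * h_m.
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]
    return f0, stumps, learning_rate


def gbdt_predict(model, xi):
    f0, stumps, v = model
    return f0 + v * sum(lv if xi <= t else rv for t, lv, rv in stumps)
```

With v = 0.1 each tree removes only 10% of the remaining residual, which illustrates why a smaller learning rate needs more trees.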
3.2 Parameter optimization
SSA is a swarm intelligence optimization algorithm proposed by Xue in 2020 and based on the foraging and anti-predation behavior of sparrows. Foraging corresponds to the finders and followers in SSA: in each iteration, several sparrows with better positions are selected as finders, which search globally for food and provide foraging areas and directions for all followers, while the remaining sparrows act as followers that track the finders and compete for food. Anti-predation corresponds to the reconnaissance and early-warning mechanism of SSA: when danger is detected, some sparrows in the population act as scouts, give an alarm, abandon the food search, and fly to a new location [68,69]. In the D-dimensional solution space, the position of each sparrow represents a candidate solution, and its energy reserve represents the fitness value. When the sparrow algorithm is used for parameter optimization, a global search is performed first to update the finder positions. Following Dong et al., the finder position update is:

$$X_{i,d}^{t+1} = \begin{cases} X_{i,d}^{t} \cdot \exp\!\left(\dfrac{-i}{\alpha \cdot G_{\max}}\right), & R_2 < ST \\[2ex] X_{i,d}^{t} + Q \cdot L, & R_2 \ge ST \end{cases} \tag{10}$$

where X_{i,d}^t is the position of the ith sparrow in dimension d at generation t, d = 1, 2, …, D; G_max is the maximum number of iterations; α is a uniform random number in (0,1); Q is a random number following the standard normal distribution; L is a 1 × D matrix whose elements are all 1; ST is the warning threshold, with values in [0.5, 1]; and R_2 is the warning value, ranging from 0 to 1.
After this global search, the followers perform a local search around the current best position. Following Liu et al., the follower position update during the local search is:

$$X_{i,d}^{t+1} = \begin{cases} Q \cdot \exp\!\left(\dfrac{X_{\mathrm{worst}}^{t} - X_{i,d}^{t}}{i^{2}}\right), & i > T/2 \\[2ex] X_{P}^{t+1} + \left|X_{i,d}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases} \tag{11}$$

where X_worst^t is the worst position in the current population; X_P^{t+1} is the optimal position occupied by the finders; A^+ = A^T(AA^T)^{−1}, where A is a 1 × D matrix (row vector) whose elements are randomly assigned 1 or −1; and T is the sparrow population size.
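Finally, the sparrows acting as scouts update their positions as:

$$X_{i,d}^{t+1} = \begin{cases} X_{\mathrm{best}}^{t} + \beta \cdot \left|X_{i,d}^{t} - X_{\mathrm{best}}^{t}\right|, & f_i > f_g \\[2ex] X_{i,d}^{t} + k \cdot \left(\dfrac{\left|X_{i,d}^{t} - X_{\mathrm{worst}}^{t}\right|}{(f_i - f_n) + \varepsilon}\right), & f_i = f_g \end{cases} \tag{12}$$

where X_best^t is the current global optimal position; β is a random number following the standard normal distribution; k is a uniform random number in [−1, 1]; f_i, f_g, and f_n are the fitness of the current sparrow and the global best and global worst fitness values of the current population, respectively; and ε is a small constant that avoids division by zero.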
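The finder, follower, and scout updates described above can be combined into a compact optimizer. This is a minimal sketch that MINIMIZES a function (the study instead maximizes the YD, which is equivalent to minimizing −YD); the finder and scout ratios, ST = 0.8, the bounds, the exponent cap, and the name `ssa_minimize` are illustrative assumptions, not the paper's settings.

```python
import math
import random


def ssa_minimize(f, dim, lb, ub, pop_size=20, max_iter=50,
                 finder_ratio=0.2, scout_ratio=0.1, st=0.8, seed=0):
    """Sketch of the sparrow search algorithm (minimization, scalar bounds)."""
    rng = random.Random(seed)

    def clip(v):
        return [min(max(x, lb), ub) for x in v]

    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    i0 = min(range(pop_size), key=fit.__getitem__)
    best_x, best_f = list(pop[i0]), fit[i0]
    n_find = max(1, int(finder_ratio * pop_size))
    n_scout = max(1, int(scout_ratio * pop_size))

    for _ in range(max_iter):
        # Rank sparrows from best (index 0) to worst.
        order = sorted(range(pop_size), key=fit.__getitem__)
        pop, fit = [pop[i] for i in order], [fit[i] for i in order]
        worst = pop[-1]

        # Finders: contract position if no alarm (R2 < ST), otherwise random jump X + Q*L.
        r2 = rng.random()
        for i in range(n_find):
            if r2 < st:
                alpha = rng.random() or 1e-12  # avoid alpha == 0
                pop[i] = clip([x * math.exp(-(i + 1) / (alpha * max_iter)) for x in pop[i]])
            else:
                q = rng.gauss(0.0, 1.0)
                pop[i] = clip([x + q for x in pop[i]])
            fit[i] = f(pop[i])
        xp = pop[min(range(n_find), key=fit.__getitem__)]  # best finder position

        # Followers: poorly ranked ones fly elsewhere, the rest move near the best finder.
        for i in range(n_find, pop_size):
            rank = i + 1
            if rank > pop_size / 2:
                q = rng.gauss(0.0, 1.0)
                # Exponent capped to avoid overflow on wide bounds (sketch safeguard).
                pop[i] = [q * math.exp(min((w - x) / rank ** 2, 50.0)) for x, w in zip(pop[i], worst)]
            else:
                a = [rng.choice((-1, 1)) for _ in range(dim)]
                # |X - X_P| . A^+ . L with A^+ = A^T / D: the same step in every dimension.
                step = sum(abs(x - p) * ai for x, p, ai in zip(pop[i], xp, a)) / dim
                pop[i] = [p + step for p in xp]
            pop[i] = clip(pop[i])
            fit[i] = f(pop[i])

        # Scouts: a random subset reacts to danger.
        f_worst = max(fit)
        for i in rng.sample(range(pop_size), n_scout):
            if fit[i] > best_f:
                beta = rng.gauss(0.0, 1.0)
                new = [b + beta * abs(x - b) for x, b in zip(pop[i], best_x)]
            else:
                k = rng.uniform(-1.0, 1.0)
                new = [x + k * abs(x - w) / ((fit[i] - f_worst) + 1e-50)
                       for x, w in zip(pop[i], worst)]
            pop[i] = clip(new)
            fit[i] = f(pop[i])

        # Keep the best solution found so far.
        i_best = min(range(pop_size), key=fit.__getitem__)
        if fit[i_best] < best_f:
            best_x, best_f = list(pop[i_best]), fit[i_best]

    return best_x, best_f
```

For hyperparameter tuning, each sparrow position would be decoded into model parameters (for example, rounding one coordinate to an integer number of trees) and `f` would return the negative validation YD; that decoding is not shown here.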
In this work, the YD, commonly used in medical statistics, was applied to evaluate the extraction of mineralized indicator mineral information. In evaluating medical diagnostic ability, the true positive rate (TPR) and the false positive rate (FPR) are two basic indexes of the diagnostic accuracy of a test. TPR is the proportion of patients who are correctly identified as patients by the screening method, while FPR is the proportion of non-patients who are wrongly identified as patients. Both indicators matter in diagnostic research [73,74], and the YD, defined as the difference between TPR and FPR, reflects diagnostic ability comprehensively. For evaluating mineral information extraction, TPR and FPR are redefined as follows. TP is the number of mineral units correctly classified as mineral units, and TN is the number of mineral-free units correctly classified as mineral-free units [75,76]. FN is the number of actual mineral units classified as mineral-free units, and FP is the number of mineral-free units classified as mineral units. TPR is the ratio of correctly identified mineral units to all actual mineral units, and FPR is the ratio of wrongly identified background (mineral-free) units to all actual background units:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{YD} = \mathrm{TPR} - \mathrm{FPR} \tag{13}$$
The redefined YD is the difference between the proportion of mineral units correctly predicted and the proportion of mineral-free units wrongly predicted as mineral units, and it ranges from −1 to +1. YD = 1 means that all mineral units are extracted and no mineral-free unit is misclassified (TPR = 1 and FPR = 0). YD > 0 means that the classifier performs better than a random classification strategy, and YD < 0 means that it performs worse.
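A minimal sketch of computing the YD from the confusion counts TP, FN, FP, and TN defined above, assuming binary labels where `True` marks a mineral unit; both helper names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for binary labels (True = mineral unit)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fn, fp, tn


def youden_index(tp, fn, fp, tn):
    """YD = TPR - FPR, with TPR = TP/(TP+FN) and FPR = FP/(FP+TN)."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr - fpr
```

For example, 80 of 100 mineral units detected with 10 of 100 background units falsely flagged gives TPR = 0.8, FPR = 0.1, and YD = 0.7.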
4.1 Sample data selection
The selection of training samples is a key step in mineral identification, since the rationality and reliability of the samples directly affect the extraction of mineral information. Alteration mineral spectra can be obtained from standard mineral spectral libraries or from field measurements. Compared with library spectra, field-measured mineral spectra are closer to the image end-member spectra in terms of atmospheric conditions, illumination, and topography. In this study, the field-measured spectra of limonite and chlorite were taken as the standard spectra and resampled to the wavelength range of the ZY1-02D image. Single-pixel spectra were then extracted from the hyperspectral image with the spectral angle mapper (SAM) as training samples for mineral extraction. SAM measures spectral similarity by the angle between a pixel spectrum and a reference spectrum treated as vectors: the smaller the angle, the more similar the two spectral curves. The number of alteration samples can therefore be adjusted through the spectral angle threshold, since a larger threshold admits more pixels. To obtain sufficient and representative mineral pixel spectra, a hierarchical regression statistical method was used to determine the optimal spectral angle threshold, as follows. First, the spectral angles between the standard spectra of limonite and chlorite and the image spectra were calculated, and their cosines were taken to obtain cosine gray-scale images.
Following the principle that a small angle corresponds to a large cosine, the cosine values of the image were divided into five levels, and the number of pixels in each level was counted and plotted. Finally, piecewise least-squares regression was used to determine the lower limit of the gray-scale anomaly, and the arccosine of this lower limit gave the optimal spectral angle threshold. The lower anomaly limits of the cosine gray scale are 0.986 for limonite and 0.982 for chlorite, and the corresponding spectral angle thresholds are 0.160 and 0.190 rad, respectively. With these thresholds, SAM extracted 274 limonite and 296 chlorite pixel spectra as training samples (Figure 6).
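The spectral angle computation and threshold-based sample selection can be sketched as follows, assuming pixel and reference spectra are lists on the same band grid. The helper names are hypothetical; the threshold is in radians, matching the values reported above.

```python
import math


def spectral_angle(pixel, reference):
    """Angle (rad) between a pixel spectrum and a reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp rounding noise before acos
    return math.acos(cos_theta)


def select_training_pixels(pixels, reference, threshold):
    """Indices of pixels whose spectral angle to the reference is within the threshold."""
    return [i for i, px in enumerate(pixels) if spectral_angle(px, reference) <= threshold]
```

Because the angle depends only on the direction of the spectral vector, a uniformly brighter or darker pixel of the same mineral still matches, which makes SAM fairly insensitive to illumination differences. The cosine-to-angle conversion used for the threshold is simply `math.acos`; for example, `math.acos(0.982)` is about 0.19 rad.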
4.2 Dimensionality reduction of spectral data
Hyperspectral images record the spectral information of ground objects in many contiguous, narrow bands, producing large volumes of information-rich data. Because there are so many bands, adjacent bands are highly collinear and carry redundant information, which degrades mineral information extraction. In this study, kernel principal component analysis (KPCA) was used to extract characteristic band information from the hyperspectral images. KPCA extends principal component analysis (PCA) by combining it with a kernel function: the data are implicitly mapped into a high-dimensional feature space, where PCA extracts uncorrelated feature variables. This removes the multicollinearity between variables and improves the representativeness and integrity of the feature data. The steps used to extract feature bands from the ZY1-02D images are as follows:
The 149 spectral bands of the limonite and chlorite samples were taken as input data, and the input band data were normalized;
A kernel function was selected to calculate the kernel matrix K; the radial basis function (RBF) kernel was used in this test;
The eigenvalues and eigenvectors of the kernel matrix K were calculated;
The cumulative contribution rate C_i of the eigenvalues was calculated, and once it exceeded a preset threshold, set to 90% in this study, the first i principal components were retained.
The dimensionality reduction results are shown in Figure 7. After KPCA, the number of spectral bands of the limonite and chlorite samples was reduced to 13 and 17, accounting for 8.49 and 11.26% of the total number of spectral bands, respectively. This greatly reduced both the spectral redundancy and the computational complexity of the models.
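The four steps above can be sketched with NumPy as follows. It is an illustrative implementation, not the authors' code: the function names, the min-max normalization, and the default RBF width gamma = 1/(number of bands) are assumptions, and it includes the kernel-centering step that standard KPCA applies before the eigendecomposition, which the list above does not mention.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def kpca_select(X, gamma=None, threshold=0.90):
    """KPCA keeping the fewest components whose cumulative contribution >= threshold.

    X: (n_samples, n_bands) spectra. Returns (scores, k, cumulative_contribution).
    """
    X = np.asarray(X, float)
    # Step 1: min-max normalise each band (assumed normalisation scheme).
    span = X.max(axis=0) - X.min(axis=0)
    X = (X - X.min(axis=0)) / (span + 1e-12)
    n, d = X.shape
    gamma = 1.0 / d if gamma is None else gamma

    # Step 2: RBF kernel matrix, centred in feature space (standard KPCA step).
    K = rbf_kernel(X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Step 3: eigendecomposition, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    # Step 4: cumulative contribution rate C_i and the threshold test.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, threshold) + 1)

    # Project the samples onto the first k normalised eigenvectors.
    alphas = eigvecs[:, :k] / np.sqrt(eigvals[:k])
    return Kc @ alphas, k, cum
```

The retained components are nonlinear projections of all input bands rather than a subset of the original bands, so the kernel and its width influence how many components reach the 90% threshold.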
4.3 Hyperparameter optimization with the sparrow search algorithm
The RF and GBDT models were applied to extract mineralized indicative mineral information in the study area. Although the two models build their ensemble learning frameworks on different principles, both use the CART decision tree as the base learner. When an ensemble learning model is used to extract classification information, an accurate choice of its key internal parameters can effectively improve prediction accuracy and generalization performance. The main modeling parameters of the ensemble learning algorithms are shown in Table 1.
|Model|Principal parameter|Parameter meaning|
|---|---|---|
|RF|Number of trees (T)|Main parameter affecting the degree of fitting; generally 100 by default|
||Tree depth (H)|Mainly controls model complexity and affects the generalization error|
||Split nodes of trees (M)|Affects computational speed by controlling the diversity of features in the trees|
||Maximum features (F)|Mainly reduces the variance of the model estimates|
||Child nodes (C)|Mainly used to capture noise in the training data|
|GBDT|Number of trees (T)|Main parameter affecting the degree of fitting; generally 100 by default|
||Learning rate (v)|Controls the step length of each gradient-descent step on the loss function|
||Tree depth (H)|Mainly controls model complexity and affects the generalization error|
||Loss function (L)|Affects the robustness and effectiveness of the model in classification|
||Subsampling (S)|Improves prediction accuracy by controlling the fraction of samples drawn for each tree|
For the RF model, which is based on the parallel bagging framework, the number of CART trees (T) and the number of split nodes (M) are the two most important modeling parameters. A larger M brings more split nodes into the iterative computation and thus increases computational complexity. For the GBDT model, which is based on the serial boosting framework, the key internal parameters are the number of CART trees (T) and the learning rate (v), where v controls how closely the model fits the data. A smaller v requires more trees (T), and too many trees can lead to over-fitting. To obtain the best extraction performance and improve the accuracy of alteration mineral information extraction, the SSA algorithm was introduced to optimize these two key parameters of the RF and GBDT models, respectively. The optimization process is shown in Figure 8.
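SSA searches a continuous space, whereas T must be an integer and v must stay within a sensible range, so each candidate position has to be decoded into valid hyperparameters before a model can be trained and scored. The sketch below does this by clipping and rounding. The search ranges are hypothetical (the paper does not report them), and mapping M to scikit-learn's `max_leaf_nodes` is an assumption, because the text does not say which library parameter "split nodes" corresponds to.

```python
import numpy as np

# Hypothetical search ranges: the paper does not report the bounds it used.
RF_BOUNDS = {"T": (10, 500), "M": (2, 500)}
GBDT_BOUNDS = {"T": (10, 500), "v": (0.01, 0.3)}


def decode_rf(position):
    """Decode a continuous SSA position (T, M) into RF keyword arguments.

    T -> n_estimators. M -> max_leaf_nodes is an assumption: the text calls
    M the number of split nodes without naming a library parameter.
    """
    t = int(round(float(np.clip(position[0], *RF_BOUNDS["T"]))))
    m = int(round(float(np.clip(position[1], *RF_BOUNDS["M"]))))
    return {"n_estimators": t, "max_leaf_nodes": m}


def decode_gbdt(position):
    """Decode a continuous SSA position (T, v) into GBDT keyword arguments."""
    t = int(round(float(np.clip(position[0], *GBDT_BOUNDS["T"]))))
    v = float(np.clip(position[1], *GBDT_BOUNDS["v"]))
    return {"n_estimators": t, "learning_rate": v}
```

The returned dictionaries can be passed as keyword arguments to scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier`, for example `GradientBoostingClassifier(**decode_gbdt(position))`; clipping keeps every candidate inside the search box even when the optimizer steps outside it.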
The initial parameters of the SSA algorithm are the maximum number of iterations G, the population size P, the proportion of discoverers in the population PD, the number of scouts SD, and the alarm value R2. In this study, the SSA algorithm was initialized with G = 40, P = 50, PD = 20% of the population, SD = 5, and R2 = 0.8. YD was used as the fitness function of the SSA algorithm. The study area was divided into 1,705 grid cells of 1 km × 1 km. The extraction results for the two minerals and the data layer containing 11 known ore deposits were rasterized to the same spatial precision as the grid layer. The extraction layer of each mineral was then overlaid with the layer of known ore deposits (points), and the YD value was calculated at each iteration of the SSA algorithm. Figure 9 shows that the YD values increase gradually with the number of iterations. For chlorite, the YD of the alteration information identified by the SSA-RF model reaches its maximum of 0.472 at G = 31, with T = 13 decision trees and M = 100 split nodes. For limonite, the YD of the SSA-RF model reaches its maximum of 0.480 at G = 27, with T = 17 and M = 300. The SSA-GBDT model converges faster and extracts both limonite and chlorite information better than the SSA-RF model. For limonite, the YD curve of the SSA-GBDT model stabilizes at 0.496 at G = 22, with a learning rate v = 0.08 and T = 80 decision trees. For chlorite, the YD reaches its peak of 0.508 at G = 24, with v = 0.1 and T = 140.
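For readers who want to experiment with the search itself, a compact sketch of the canonical sparrow search algorithm (the producer, scrounger, and scout scheme of Xue and Shen, 2020) is given below as a minimizer, so YD would be maximized by minimizing -YD. It is not the authors' code; function and argument names are our own. The defaults mirror the settings above (G = 40, P = 50, PD = 20%, SD = 5), and we assume the paper's "alarm value R2 = 0.8" plays the role of the safety threshold ST that a random alarm value R2 is compared with in the original formulation.

```python
import numpy as np


def ssa_minimize(f, lb, ub, pop_size=50, max_iter=40, pd_ratio=0.2,
                 n_scouts=5, st=0.8, seed=0):
    """Canonical sparrow search algorithm (minimization sketch).

    pop_size ~ P, max_iter ~ G, pd_ratio ~ PD, n_scouts ~ SD; st is the
    safety threshold compared with a random alarm value R2 each iteration.
    """
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    X = lb + rng.random((pop_size, dim)) * (ub - lb)
    fit = np.array([f(x) for x in X], dtype=float)
    n_prod = max(1, int(round(pd_ratio * pop_size)))
    best_i = int(np.argmin(fit))
    best_x, best_f = X[best_i].copy(), float(fit[best_i])

    for _ in range(max_iter):
        order = np.argsort(fit)  # rank sparrows, best first
        X, fit = X[order], fit[order]
        worst_x = X[-1].copy()

        # 1) Producers: wide search when safe, random walk when alarmed.
        r2 = rng.random()
        for i in range(n_prod):
            if r2 < st:
                alpha = rng.random() + 1e-12
                X[i] = X[i] * np.exp(-(i + 1) / (alpha * max_iter))
            else:
                X[i] = X[i] + rng.standard_normal() * np.ones(dim)
        X[:n_prod] = np.clip(X[:n_prod], lb, ub)
        fit[:n_prod] = [f(x) for x in X[:n_prod]]
        xp = X[int(np.argmin(fit[:n_prod]))].copy()  # best producer

        # 2) Scroungers: the worse half fly elsewhere, the rest follow xp.
        for i in range(n_prod, pop_size):
            if i + 1 > pop_size / 2:
                X[i] = rng.standard_normal() * np.exp((worst_x - X[i]) / (i + 1) ** 2)
            else:
                a = rng.choice([-1.0, 1.0], size=dim)
                # |X_i - X_P| A^+ L with A^+ = A^T (A A^T)^-1 = A^T / dim
                X[i] = xp + (np.abs(X[i] - xp) @ a / dim) * np.ones(dim)
        X[n_prod:] = np.clip(X[n_prod:], lb, ub)
        fit[n_prod:] = [f(x) for x in X[n_prod:]]

        # 3) Scouts: randomly chosen sparrows that become aware of danger.
        g_best, g_worst = int(np.argmin(fit)), int(np.argmax(fit))
        gb_x, gb_f = X[g_best].copy(), fit[g_best]
        gw_x, gw_f = X[g_worst].copy(), fit[g_worst]
        for i in rng.choice(pop_size, size=n_scouts, replace=False):
            if fit[i] > gb_f:
                X[i] = gb_x + rng.standard_normal() * np.abs(X[i] - gb_x)
            else:
                k = rng.uniform(-1.0, 1.0)
                X[i] = X[i] + k * np.abs(X[i] - gw_x) / (fit[i] - gw_f + 1e-50)
            X[i] = np.clip(X[i], lb, ub)
            fit[i] = f(X[i])

        # Keep the best position found so far.
        i = int(np.argmin(fit))
        if fit[i] < best_f:
            best_x, best_f = X[i].copy(), float(fit[i])
    return best_x, best_f
```

In the setting of this study, the objective passed in would decode the position into model parameters, train the model, and return the negative YD of its extraction map; any toy function can stand in for testing.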
4.4 Mineralized indicator mineral information extraction and assessment
Using the parameters obtained from the SSA optimization, the SSA algorithm was combined with the two ensemble learning methods to construct the mineralized indicative mineral information extraction models, with the KPCA feature vectors as model inputs. Both the RF and GBDT models were used to extract limonite and chlorite information in the study area. To evaluate the effect of SSA parameter optimization and the advantages of ensemble learning, mineral information was also extracted with the ensemble learning models without SSA optimization and with the traditional machine learning model SVM. SVM, proposed by Vapnik, is based on the structural risk minimization principle of statistical learning theory and is solved by quadratic optimization [37,77]. It is the most widely used traditional machine learning method for information extraction from multi-source big data; it has a streamlined sample learning process and high stability, and it remains important for both classification and regression problems. Maps of the spatial distribution of the extracted mineralized indicative mineral information were drawn on a GIS software platform (Figure 10). The mineral information extracted by the five models is highly similar in extent and spatial trend: it is concentrated mainly in the center, northwest, and southeast of the study area, where it forms belts along linear trends, with scattered occurrences in the southwest and northeast.
Because the information extracted for a single mineral is limited and poorly suited to quantitative evaluation, the limonite and chlorite information extracted by each model was combined to strengthen the statistical basis for comparing how well the models recognize mineralized indicative minerals. The combined results were overlaid with the layer of known ore deposits to obtain the YD and ODC of each extraction model (Table 2). The calculation of YD is described in Section 3.3, and ODC is defined as follows:
$$n_{\text{correct}} = \frac{m_{\text{in}}}{M_{\text{total}}}$$
where n_correct is the ODC value, m_in is the number of known deposits falling within the extracted mineral regions, and M_total is the total number of known deposits.
|Model|YD|ODC|
|---|---|---|
|SVM|0.518|6/11 = 0.546|
|RF|0.602|6/11 = 0.546|
|SSA-RF|0.647|7/11 = 0.636|
|GBDT|0.610|7/11 = 0.636|
|SSA-GBDT|0.661|8/11 = 0.727|
A comparison of the evaluation indices shows that the SSA-GBDT model performs best: both its YD and its ODC are higher than those of the other models. Although the ODC values of the models differ little, the YD values of all ensemble learning models exceed 0.6, significantly better than that of the traditional SVM model. Comparing the RF and GBDT models before and after SSA optimization shows that optimization improved the evaluation indices of both ensemble learning algorithms. The GBDT model changed considerably: its YD increased from 0.610 to 0.661, and its ODC increased by 0.091 (from 0.636 to 0.727). In addition, after SSA optimization, the GBDT model with serial boosting outperforms the RF model with parallel bagging. A likely reason is that the robust parallel bagging strategy is better suited to large datasets, whereas for the small training set in this study, the serial, weighted iterative boosting strategy reduces model error and strengthens the representation of mineral information, which is a weak signal, so that mineralized indicative minerals are extracted effectively.
5.1 Feasibility demonstration of evaluation index selection
According to Table 2, YD and ODC generally rank the models in the same way. Compared with ODC, however, YD expresses the extent to which benefit exceeds cost and thus describes benefit and cost together. It therefore provides a more reasonable standard for evaluating the accuracy of mineralized indicative mineral extraction from the perspective of minimizing prospecting cost and maximizing benefit. To further verify the ability of YD to evaluate extraction performance, the numbers of limonite and chlorite pixels identified by the different models were counted from the spatial distribution maps of the mineralized indicative minerals (Figure 11). Figure 11 shows that the minerals extracted by the traditional SVM model cover the grid cells containing six known ore deposits, which differs little from the other models. However, the SVM model identifies far more pixels than the other models, indicating that, for a similar prospecting benefit, it incurs a higher prospecting cost. From the perspective of prospecting benefit, ODC alone is therefore not a reliable measure of extraction performance, whereas YD, which is negatively correlated with prospecting cost and positively correlated with prospecting benefit, resolves this problem. For example, although the ODC of the SVM model reaches 0.546, its YD is only 0.518. Similarly, the SSA-RF and GBDT models have the same ODC of 0.636, but their YD values differ (0.647 and 0.610, respectively), a difference that ODC cannot reveal. The SSA-GBDT model has the highest YD, reflecting its better classification and extraction performance.
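The cost argument above can be made concrete with a small sketch. It assumes the standard Youden index J = sensitivity + specificity - 1 computed over grid cells (Section 3.3 may define YD differently, for example on deposit counts) and that each known deposit occupies one grid cell; the function names are hypothetical.

```python
import numpy as np


def youden_index(predicted, deposit):
    """Youden index J = sensitivity + specificity - 1 over grid cells.

    predicted: bool mask of cells flagged as mineralized-indicator area.
    deposit:   bool mask of cells that contain a known deposit.
    """
    predicted = np.asarray(predicted, bool)
    deposit = np.asarray(deposit, bool)
    tp = np.sum(predicted & deposit)    # deposits hit (benefit)
    fn = np.sum(~predicted & deposit)   # deposits missed
    fp = np.sum(predicted & ~deposit)   # barren cells flagged (cost)
    tn = np.sum(~predicted & ~deposit)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return float(sensitivity + specificity - 1.0)


def ore_deposit_coincidence(predicted, deposit_cells):
    """ODC = m_in / M_total, where deposit_cells holds each deposit's cell index."""
    predicted = np.asarray(predicted, bool)
    m_in = int(np.sum(predicted[np.asarray(deposit_cells)]))
    return m_in / len(deposit_cells)
```

For instance, on a hypothetical 100-cell grid with 4 deposits, a map that flags 12 cells and one that flags 40 cells may both capture the same 2 deposits and share an ODC of 0.5, yet their YD values are about 0.40 and 0.10, because the larger map pays for its hits with many more barren cells.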
5.2 Evaluation of model accuracy using fault structure information
Fault structures and wall-rock alteration are two important indicators in metal mineral exploration. For the hydrothermal alteration in the study area in particular, wall-rock alteration is closely related to the geological characteristics and spatial distribution of fault structures. Figure 12 shows the spatial overlay of the mineralized indicative mineral information extracted by the SSA-RF and SSA-GBDT models with the fault structures in the study area. The mineral information extracted by the two models has essentially the same spatial distribution: it lies mainly between adjacent structures or at their intersections with secondary structures, with a small amount at the intersections of the main structural traces with other structures. This distribution closely follows the spatial trend of the fault structures, which is consistent with the geological and metallogenic conditions of the study area. A detailed comparison shows that the SSA-RF model performs poorly along the Langzharigang-Longli-Gazhima fault, where tectonic activity is evident: it identifies only a small amount of chlorite in a secondary structural zone relatively far from the main fault zone. In the densely covered northeastern part of the study area, some of the mineral information extracted by the SSA-RF model is “pseudo-anomaly” information caused by flowing-water erosion. By contrast, the SSA-GBDT model produced no obvious false information. In the northern part of the study area, it extracted limonite alteration information near the Langzharigang-Longli-Gazhima fault, where secondary structural traces are arranged in a ring-like pattern, consistent with the spatial distribution of the fault structures. The SSA-GBDT model is therefore better than the SSA-RF model at identifying mineralized indicative mineral information.
Based on the spatial distribution and extent of the mineralized indicative mineral information extracted by the SSA-RF and SSA-GBDT models, a field survey was carried out in the Gouli area, Qinghai Province, in August 2021. Owing to field conditions, four areas where the extracted mineral information of the two models is concentrated were selected as key reconnaissance areas. The locations of the verification points and the field photographs are shown in Figure 13. Position 1 lies in an area where the mineralized indicative mineral information extracted by both models is concentrated. Two fractured alteration zones, striking 107° and 2–5 m wide, were found there, containing two or three quartz veins 10–30 cm wide with obvious ferruginization. This indicates that both models effectively extracted the mineralized indicative mineral information in this area. Position 2 lies at the highest altitude in the study area, where regional tectonics are evident. A mineralized alteration zone striking 124° was found, with developed ferruginization, jarosite, and quartz veinlets. Both models identified the mineralized indicative mineral information here, and the result of the SSA-GBDT model agrees more closely with the field observations.
Position 3 lies within the Gazhima Au–Ag deposit, where field exploration confirmed obvious limonite alteration at the surface. The field survey found a fractured alteration zone in which limonite, jarosite, and chlorite are developed. The SSA-GBDT model identified a small amount of mineral information at this point, whereas the SSA-RF model failed to extract any. At Position 4, both models extracted abundant mineral information. Although no ore occurrences are known here, the field survey found a quartz vein striking 213° and 10–60 cm wide, with obvious ferruginization. The metallogenic geological environment of this area is favorable, and the area is clearly fault-controlled. The Paleoproterozoic Jinshuikou Group is the main source bed, and the Late Triassic monzonitic granite provided the thermodynamic conditions for the formation of tectonic altered-rock-type gold deposits. The probability of hydrothermal gold deposits forming in this section is therefore relatively high, and it can serve as a key area for future mineral exploration.
This study has certain limitations that need to be addressed in future work. Mineral information extraction from hyperspectral remote sensing is often affected by the spectral purity of image endmembers and by regional geological features. During image acquisition, differences in weather conditions, topography, and the surrounding environment may alter the spectral reflectance of image endmembers, producing the phenomena of “same spectrum, different objects” and “same object, different spectra” and reducing the accuracy of mineral information recognition. This study therefore selected the features of the model input data from two perspectives, spatial distribution features and image spectral matching, to reduce noise interference and improve the accuracy of mineral information extraction.
In addition, previous research on mineral information identification shows that regional geological features, such as fault tectonics and deposit genesis type, strongly influence the spatial distribution pattern of minerals. Fault tectonics can affect the formation and spatial distribution of metallic minerals by changing the temperature and pressure conditions at depth, while ore deposits are complex geological bodies formed by multiple interacting geological processes, so their mineral assemblages vary markedly between genetic types with different formation mechanisms. Regional geological features are therefore another factor that constrains hyperspectral mineral information extraction. In this study, based on a detailed analysis of the geological features and mineral genesis types in the Qinghai Gouli area, a hyperspectral mineral information extraction model was constructed by combining the sparrow search algorithm with ensemble learning algorithms. The model is applicable to areas with geological features similar to those of the study area, but it is not universal; other areas require further experimental analysis in light of their regional geological conditions.
The SSA algorithm was combined with the RF and GBDT ensemble models, respectively, to construct hyperspectral mineral information extraction models. With its strong search ability and high stability, SSA can efficiently search for optimal parameters across the global potential region. Compared with the traditional trial-and-error method, it converges faster, runs more efficiently, and can switch freely between global and local optimization. In this study, the optimal parameters of the two ensemble learning models were searched automatically by SSA with YD as the fitness function. Judged by the alteration minerals extracted, the accuracy of both ensemble learning models improved greatly after SSA parameter optimization. This improvement demonstrates the validity and reliability of SSA-based parameter optimization when applying machine learning methods, and it effectively improves the performance of ensemble learning models in extracting mineralized indicative mineral information.
Using YD and ODC as evaluation indices, the accuracy of mineralized indicative mineral extraction was compared between the traditional SVM model and the ensemble learning models (RF and GBDT). The results show that the ensemble learning models have higher performance stability and stronger generalization, and extract alteration minerals better, than the SVM model. For the small training dataset in this experiment, the GBDT model under the serial boosting framework achieves higher classification accuracy than the RF model under the parallel bagging framework. The GBDT model also recognizes mineral information in a more concentrated way and converges quickly, within 25 iterations for both limonite and chlorite. The comparison of YD and ODC shows that ODC is an overly coarse measure with poor reliability, whereas YD provides a more reasonable and reliable standard for evaluating mineral extraction accuracy from the perspective of minimizing prospecting cost and maximizing prospecting benefit.
Comparing the extraction results shows that the distribution ranges and spatial trends of the mineral regions extracted by the SSA-RF and SSA-GBDT models are similar. The extraction results of both models are consistent with the distribution of fault structures and known deposits in the study area, in line with its geological and metallogenic characteristics. In the southeast of the study area, the different models all identified abundant mineralized indicative minerals, and mineral exposures were found during field verification. The metallogenic conditions in this area are favorable, making it a major area for future detailed geological and mineral exploration. Owing to limited data sources, this study only tested the extraction of two mineralization indicator minerals, limonite and chlorite, and did not analyze indicator minerals of other mineralization types. Future studies should therefore fully consider the relationship between mineralization and minerals and analyze in detail the formation mechanisms and spectral responses of indicator minerals associated with different mineralization types.
The authors would like to thank the China Resources Satellite Application Center for providing ZiYuan-1-02D data. The anonymous reviewers’ suggestions to improve the quality of this manuscript are greatly appreciated.
Funding information: Scientific and Technological Transformative Special Project of Qinghai Province (2020-SF-150).
Author contributions: conceptualization: N.L. and H.L.; methodology: G.L.; validation: X.Y. and R.J.; investigation: M.W. and D.L.; writing – original draft preparation: N.L.; writing – review and editing: N.L. and H.L. All authors have read and agreed to the published version of the manuscript.
Conflict of interest: The authors declare no conflict of interest.
 Zhai FR, Yang ZX, Pan YQ, editors. Liaoning province molybdenum, rare earth and mineral resources exploration situation and demand forecasting. International Conference on Energy, Environment and Sustainable Development (ICEESD 2011); 2011 Oct 21–23. Shanghai Univ Elect Power, Shanghai, PEOPLES R CHINA. DURNTEN-ZURICH: Trans Tech Publications Ltd; 2012.10.4028/www.scientific.net/AMR.347-353.804Search in Google Scholar
 Peattie R, Chamberlain V, Flitton T. Mineral Resource and Mineral Reserve governance and reporting for AngloGold Ashanti. J S Afr Inst Min Metall. 2017;117(12):1101–4.10.17159/2411-9717/2017/v117n12a2Search in Google Scholar
 Li XM, Zhang XL, Du HR, editors. Economic Effect Analysis on the Development of Mineral Resources in Tarim River Basin. International Conference on Computational Materials Science (CMS 2011); 2011 Apr 17–18. Guangzhou, PEOPLES R CHINA. DURNTEN-ZURICH: Trans Tech Publications Ltd; 2011.10.4028/www.scientific.net/AMR.268-270.383Search in Google Scholar
 Tian LH, Bai GF, editors. Study on Methods and Application of Energy and Mineral Resources Assessment Based on GIS. 2nd International Conference on Energy, Environment and Sustainable Development (EESD 2012); 2012 Oct 12–14. Jilin, Peoples R China. Durnten-zurich: Trans Tech Publications Ltd; 2013.10.4028/www.scientific.net/AMR.616-618.84Search in Google Scholar
 Ren HB, Fan YL, editors. Study on correlative income of mining resources in China. 1st International Conference on Energy and Environmental Protection (ICEEP 2012); 2012 Jun 23–24. Hohhot, Peoples R China. Stafa-zurich: Trans Tech Publications Ltd; 2012.10.4028/www.scientific.net/AMR.524-527.355Search in Google Scholar
 Li CF, Wang AJ, Chen XJ, Chen QS, Zhang YF, Li Y. Regional distribution and sustainable development strategy of mineral resources in China. Chin Geogr Sci. 2013;23(4):470–81.10.1007/s11769-013-0611-zSearch in Google Scholar
 Bide T, Brown TJ, Gunn AG, Deady E. Development of decision-making tools to create a harmonised UK national mineral resource inventory using the United Nations Framework Classification. Resour Policy. 2022;76:11.10.1016/j.resourpol.2022.102558Search in Google Scholar
 Ding X, Wang YP, He ZC. The application research of satellite remote-sensing to exploration of hydrocarbon alteration information. Chin Sci Bull. 1993;38(17):1475–9.Search in Google Scholar
 Zhou KF, Zhang NN. Extraction of alteration mineral information from moderate remote sensing images using MPS method. J Indian Soc Remote Sens. 2018;46(1):89–96.10.1007/s12524-017-0668-8Search in Google Scholar
 Pan ZW, Liu JJ, Ma LQ, Chen FR, Zhu GC, Qin F, et al. Research on hyperspectral identification of altered minerals in yemaquan west gold field, Xinjiang. Sustainability. 2019;11(2):20.10.3390/su11020428Search in Google Scholar
 Etoh J, Izawa E, Watanabe K, Taguchi S, Sekine R. Bladed quartz and its relationship to gold mineralization in the Hishikari low-sulfidation epithermal gold deposit Japan. Econ Geol Bull Soc Econ Geol. 2002;97(8):1841–51.10.2113/gsecongeo.97.8.1841Search in Google Scholar
 Holley EA, Monecke T, Bissig T, Reynolds TJ. Evolution of high-level magmatic-hydrothermal systems: New insights from Ore paragenesis of the veladero high-sulfidation epithermal Au–Ag deposit, El Indio-Pascua Belt, Argentina. Econ Geol. 2017;112(7):1747–71.10.5382/econgeo.2017.4528Search in Google Scholar
 Duan JL, Tang JX, Mason R, Zheng WB, Ying LJ. Zircon U-Pb age and deformation characteristics of the Jiama porphyry copper deposit, tibet: Implications for relationships between mineralization, structure and alteration. Resour Geol. 2014;64(4):316.10.1111/rge.12043Search in Google Scholar
 Acosta ICC, Khodadadzadeh M, Tusa L, Ghamisi P, Gloaguen R. A machine learning framework for drill-core mineral mapping using hyperspectral and high-resolution mineralogical data fusion. Ieee J Sel Top Appl Earth Obs Remote Sens. 2019;12(12):4829–42.10.1109/JSTARS.2019.2924292Search in Google Scholar
 Lorenz S, Ghamisi P, Kirsch M, Jackisch R, Rasti B, Gloaguen R. Feature extraction for hyperspectral mineral domain mapping: A test of conventional and innovative methods. Remote Sens Environ. 2021;252:112129.10.1016/j.rse.2020.112129Search in Google Scholar
 Tripathi MK, Govil H. Regolith mapping and geochemistry of hydrothermally altered, weathered and clay minerals, Western Jahajpur belt, Bhilwara, India. Geocarto Int. 2022;37(3):879–95.10.1080/10106049.2020.1745302Search in Google Scholar
 Cui JC, Liu YJ, Pan MZ, Tang YG. The Integrative Design for Imaging Spectrometer. Spectrosc Spectr Anal. 2012;32(3):839–43.Search in Google Scholar
 Hu L, Gan S, Yuan XP, Li Y, Lu J, Yang ML. Airborne hyperspectral features of three types of typical surface vegetation in central Yunnan. Spectrosc Spectr Anal. 2021;41(10):3208–13.Search in Google Scholar
 Xia Z, Gu YF. Parameter feature extraction for hyperspectral detection of the shallow underwater target. Sci China-Technol Sci. 2021;64(5):1092–100.10.1007/s11431-020-1723-6Search in Google Scholar
 Chen Q, Zhao Z, Zhou J, Zhu R, Xia J, Sun T, et al. ASTER and GF-5 satellite data for mapping hydrothermal alteration minerals in the longtoushan Pb-Zn deposit, SW China. Remote Sens. 2022;14:5.10.3390/rs14051253Search in Google Scholar
 Roy S, Bhattacharya S, Omkar SN. Automated large-scale mapping of the jahazpur mineralised belt by a MapReduce model with an integrated elm method. PFG-J Photogramm Remote Sens Geoinf Sci. 2022;90(2):191–209.10.1007/s41064-021-00188-3Search in Google Scholar
Shayeganpour S, Tangestani MH. Extraction of rock and alteration geons by FODPSO segmentation and GP regression on the HyMap imagery: A case study of SW Birjand, Eastern Iran. Ore Geol Rev. 2022;143:11. doi:10.1016/j.oregeorev.2022.104767
Kumar C, Chatterjee S, Oommen T, Guha A. Automated lithological mapping by integrating spectral enhancement techniques and machine learning algorithms using AVIRIS-NG hyperspectral data in Gold-bearing granite-greenstone rocks in Hutti, India. Int J Appl Earth Obs Geoinf. 2020;86:102006. doi:10.1016/j.jag.2019.102006
Deng K, Zhao H, Li N, Wei W. Identification of minerals in hyperspectral imagery based on the attenuation spectral absorption index vector using a multilayer perceptron. Remote Sens Lett. 2021;12(5):449–58. doi:10.1080/2150704X.2021.1903612
Monge DA, Holec M, Zelezny F, Garino CG. Ensemble learning of runtime prediction models for gene-expression analysis workflows. Clust Comput. 2015;18(4):1317–29. doi:10.1007/s10586-015-0481-5
Sheng XC, Ma JX, Xiong WL. Smart soft sensor design with hierarchical sampling strategy of ensemble Gaussian process regression for fermentation processes. Sensors. 2020;20(7):21. doi:10.3390/s20071957
Mostafaei S, Ahmadi A, Shahrabi J. Dealing with data intrinsic difficulties by learning an interPretable Ensemble Rule Learning (PERL) model. Inf Sci. 2022;595:294–312. doi:10.1016/j.ins.2022.02.048
Holliday A, Barekatain M, Laurmaa J, Kandaswamy C, Prendinger H. Speedup of deep learning ensembles for semantic segmentation using a model compression technique. Comput Vis Image Underst. 2017;164:16–26. doi:10.1016/j.cviu.2017.05.004
Zhao ZY, Zhang Y, Liao HJ. Design of ensemble neural network using the Akaike information criterion. Eng Appl Artif Intell. 2008;21(8):1182–8. doi:10.1016/j.engappai.2008.02.007
Mohammed I, Al Shehri D, Mahmoud M, Kamal MS, Alade OS. Feature ranking and modeling of mineral effects on reservoir rock surface chemistry using smart algorithms. ACS Omega. 2022;7(5):4194–201. doi:10.1021/acsomega.1c05820
Pipan T, Christman MC, Culver DC. Abiotic community constraints in extreme environments: Epikarst copepods as a model system. Diversity-Basel. 2020;12(7):16. doi:10.3390/d12070269
Bojorquez J, Sonia Ruiz E, Bojorquez E, Reyes-Salazar A. Probabilistic seismic response transformation factors between SDOF and MDOF systems using artificial neural networks. J Vibroeng. 2016;18(4):2248–62. doi:10.21595/jve.2016.16506
Rodriguez-Galiano V, Sanchez-Castillo M, Chica-Olmo M, Chica-Rivas M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev. 2015;71:804–18. doi:10.1016/j.oregeorev.2015.01.001
Chen W, Liu Y, Wang L, Liu X, editors. A study of the multi-objective evolutionary algorithm based on elitist strategy. Asia-Pacific Conference on Information Processing (APCIP 2009); 2009 Jul 14–19; Shenzhen, Peoples R China.
Jiang J-J, Wei W-X, Shao W-L, Liang Y-F, Qu Y-Y. Research on large-scale bi-level particle swarm optimization algorithm. IEEE Access. 2021;9:56364–75. doi:10.1109/ACCESS.2021.3072199
Liu GY, Shu C, Liang ZW, Peng BH, Cheng LF. A modified sparrow search algorithm with application in 3D route planning for UAV. Sensors. 2021;21(4):21. doi:10.3390/s21041224
Gao B, Shen W, Guan H, Zheng L, Zhang W. Research on multistrategy improved evolutionary sparrow search algorithm and its application. IEEE Access. 2022;10:62520–34. doi:10.1109/ACCESS.2022.3182241
Wu MH, Yang CB, Zhang YH, Lin N, editors. Study on driving forces of wetland change in the Western Liaohe River basin based on random forest model. International Symposium on Resource Exploration and Environmental Science (REES); 2017 Apr 14–16; Ordos, Peoples R China. Bristol: IOP Publishing Ltd; 2017. doi:10.1088/1755-1315/64/1/012009
Raczko E, Zagajewski B. Comparison of support vector machine, random forest and neural network classifiers for tree species classification on airborne hyperspectral APEX images. Eur J Remote Sens. 2017;50(1):144–54. doi:10.1080/22797254.2017.1299557
Sun DL, Wen HJ, Wang DZ, Xu JH. A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm. Geomorphology. 2020;362:14. doi:10.1016/j.geomorph.2020.107201
Zhou J, Jiang ZB, Chung FL, Wang ST. Formulating ensemble learning of SVMs into a single SVM formulation by negative agreement learning. IEEE Trans Syst Man Cybern-Syst. 2021;51(10):6015–28. doi:10.1109/TSMC.2019.2958647
Zhu YS, Zhu XR, Wang J. Ensemble learning-based intelligent fault diagnosis method using feature partitioning. J Vibroeng. 2013;15(3):1378–92.
Wang G, Ma J, Yang SL. IGF-bagging: Information gain based feature selection for bagging. Int J Innov Comp Inf Control. 2011;7(11):6247–59.
Agarwal S, Chowdary CR. A-Stacking and A-Bagging: Adaptive versions of ensemble learning algorithms for spoof fingerprint detection. Expert Syst Appl. 2020;146:10. doi:10.1016/j.eswa.2019.113160
Speiser JL, Miller ME, Tooze J, Ip E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst Appl. 2019;134:93–101. doi:10.1016/j.eswa.2019.05.028
Kim JC, Lee S, Jung HS, Lee S. Landslide susceptibility mapping using random forest and boosted tree models in Pyeong-Chang, Korea. Geocarto Int. 2018;33(9):1000–15. doi:10.1080/10106049.2017.1323964
Chen YY, Zheng WZ, Li WB, Huang YM. Large group activity security risk assessment and risk early warning based on random forest algorithm. Pattern Recognit Lett. 2021;144:1–5. doi:10.1016/j.patrec.2021.01.008
Zhang CS, Zhang Y, Shi XJ, Almpanidis G, Fan GJ, Shen XJ. On incremental learning for gradient boosting decision trees. Neural Process Lett. 2019;50(1):957–87. doi:10.1007/s11063-019-09999-3
Li LL, Xiong JL, Tseng ML, Yan Z, Lim MK. Using multi-objective sparrow search algorithm to establish active distribution network dynamic reconfiguration integrated optimization. Expert Syst Appl. 2022;193:18. doi:10.1016/j.eswa.2021.116445
Xiong Q, Zhang XM, He SB, Shen J. A fractional-order chaotic sparrow search algorithm for enhancement of long distance iris image. Mathematics. 2021;9(21):17. doi:10.3390/math9212790
Sun WZ, Zhang HJ, Tseng ML, Zhang WP, Li XY. Hierarchical energy optimization management of active distribution network with multi-microgrid system. J Ind Prod Eng. 2022;39(3):210–29.
Dong ZS, Li X, Luan F, Zhang DH. Prediction and analysis of key parameters of head deformation of hot-rolled plates based on artificial neural networks. J Manuf Process. 2022;77:282–300. doi:10.1016/j.jmapro.2022.03.022
Liu D, Fu Q, Xu D, Liu DP, Huang Y, Li M, et al. New method for diagnosing resilience of agricultural soil-water resource composite system: Projection pursuit model modified by sparrow search algorithm. J Hydrol. 2022;610:12.
Hajian-Tilaki K. The choice of methods in determining the optimal cut-off value for quantitative diagnostic test evaluation. Stat Methods Med Res. 2018;27(8):2374–83. doi:10.1177/0962280216680383
Schisterman EF, Perkins N. Confidence intervals for the Youden index and corresponding optimal cut-point. Commun Stat-Simul Comput. 2007;36(3):549–63. doi:10.1080/03610910701212181
Chung SW, Han SS, Lee JW, Oh KS, Kim NR, Yoon JP, et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 2018;89(4):468–73. doi:10.1080/17453674.2018.1453714
Ji DZ, Zhang Z, Xu HG. Evaluation of serum CEA for the gastrointestinal cancer diagnosis using different cut-off values. Int J Clin Exp Pathol. 2016;9(8):7807–12.
Huan S, Dai J, Song SL, Zhu GN, Ji YH, Yin GP. Stroke volume variation for predicting responsiveness to fluid therapy in patients undergoing cardiac and thoracic surgery: A systematic review and meta-analysis. BMJ Open. 2022;12(5):12. doi:10.1136/bmjopen-2021-051112
© 2022 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.