Optimal diagnosis of the skin cancer using a hybrid deep neural network and grasshopper optimization algorithm

Abstract When skin cells divide abnormally, they can form a tumor or cause abnormalities in the lymph fluid or blood. The resulting masses are either benign or malignant: benign masses remain confined to one area and do not spread, whereas malignant masses can spread throughout the body via the lymphatic system. Skin cancer is easier to diagnose than other cancers because its symptoms can be seen with the naked eye. This motivates an artificial intelligence-based methodology for diagnosing this cancer with higher accuracy. This article proposes a new non-destructive testing method based on AlexNet and an Extreme Learning Machine (ELM) network to improve the diagnosis. The method is then optimized using a new improved version of the Grasshopper optimization algorithm (GOA). The proposed method is compared in simulation with several state-of-the-art methods, and the results show that, with 98% accuracy and 93% sensitivity, it achieves the highest performance among them.


Introduction
The skin is the protective layer that covers the whole body and protects us from sunlight, heat, cold, superficial damage such as wounds and scratches, infection, and penetration of bacteria and viruses. Among the various layers of the skin, two main layers, the epidermis and the dermis, act as the protector. The dermis contains blood vessels, hair follicles, and glands. The epidermis contains three main types of cells: squamous cells, basal cells, and melanocytes.
When skin cells divide abnormally, they can form a tumor or cause abnormalities in the lymph fluid or blood. The resulting masses are either benign or malignant: benign masses remain confined to one area and do not spread, whereas malignant masses can spread throughout the body via the lymphatic system [1]. Skin cancer is easier to diagnose than other cancers because its symptoms can be seen with the naked eye.
The most common causes of skin cancer are exposure to ultraviolet (UV) rays from direct sunlight and exposure to chemicals produced by certain types of light bulbs [2]. These two factors alter the DNA of the cells described above, changing their growth and development and turning them into cancerous masses [3].
The initial formation of cancer in an organ is called primary cancer. A malignant mass that has not yet spread to other parts of the body is called local. Such masses can develop their own blood vessels by invading the surrounding tissues. Secondary cancer, or metastasis, occurs when cancer cells grow elsewhere and form a new mass. Early diagnosis of skin cancer is therefore highly valuable. Skin cancer is usually diagnosed with a biopsy, but in most cases this procedure is painful and time-consuming for the patient. Recently, methods such as dermoscopy have been used to help diagnose suspected lesions, but ultimately a skin biopsy is still needed to determine the nature of any suspected lesion. In recent years, research has been conducted on methods for rapid and accurate diagnosis of skin cancer from dermoscopic images, with varying diagnostic accuracy.
For example, Zhang et al. [4] studied the diagnosis of skin diseases using an optimum Convolutional Neural Network (CNN). Early diagnosis of skin melanoma helps prevent progression of the disease, and image processing is one of the most widely used approaches in the diagnosis of skin diseases. In that study, a new CNN-based method optimized by the Whale optimization algorithm was used to diagnose melanoma.
Xu et al. [5] diagnosed melanoma using the kernel fuzzy C-means (K-Fuzzy C-means) technique combined with a developed version of the Red fox optimization (RFO) algorithm. In that study, an optimum pipeline was used to accurately detect melanoma spots in dermoscopic images. First, after pre-processing of the image, the skin areas were segmented by the kernel fuzzy C-means technique. Then, the main features of the segmented skin were optimally selected by the optimized algorithm. The results showed that the optimum K-Fuzzy C-means technique based on the developed RFO algorithm provided more accurate and reliable results in the classification of the skin and the detection of melanoma spots.
Tan et al. [6] used an intelligent technique that combines Particle swarm optimization (PSO) with deep learning to detect melanoma spots on the skin. In that research, a deep CNN was optimized by the PSO model to improve the identification of melanoma areas. The results of the developed deep CNN were compared with classical methods using statistical tests, and they showed that the developed deep CNN detected damaged areas of the skin and melanoma spots better than the classical methods.
Parsian et al. [7] detected melanoma spots in dermoscopic images using the Wildebeest herd optimization (WHO) algorithm. A common method for diagnosing skin cancer is non-invasive dermoscopy, which relies on visual inspection. It is therefore difficult for specialists to diagnose melanoma spots on the skin, and artificial intelligence techniques can increase the diagnostic accuracy. In that study, deep learning optimized by the WHO algorithm was used to detect melanoma spots on the skin. The suggested model was implemented on the ISIC-2008 skin cancer dataset. The data analysis showed that the method performs well in diagnosing the disease.
Khamparia et al. [8] used a deep learning method to detect cancerous spots on the skin. Diagnosis and classification of skin cancer in the early stages of development can increase the chance of recovery. For this purpose, a CNN was used to distinguish benign from malignant spots. The results showed good performance of the CNN in the diagnosis and classification of malignant skin lesions.
The literature shows that many different research works address the diagnosis of skin cancer from dermoscopy images, and that the use of metaheuristic algorithms for this purpose is increasing rapidly. This study uses a hybrid technique based on deep learning and metaheuristics for the diagnosis of skin cancer. The new metaheuristic is an improved version of the Grasshopper optimization algorithm (GOA), which provides results with higher accuracy and precision.

Dataset
The designed module is an optimized deep learning-based system that diagnoses cancer directly from the input images. The diagnosis system in this study was programmed in the MATLAB R2019b environment, and its results were verified on a database called PH² [9]. The PH² database contains dermoscopic images gathered at the Dermatology Service of Hospital Pedro Hispano (Matosinhos, Portugal) under identical conditions. The images are 8-bit RGB color images with a resolution of 768 × 560 pixels. The dataset contains 200 dermoscopic images in total: 80 atypical nevi, 80 common nevi, and 40 melanomas. The database is available at: https://www.fc.up.pt/addi/ph2%20database.html.
The data are split into 80% for training and 20% for testing. Figure 1 shows some samples from the PH² dataset used in this study.
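The 80/20 split described above could be realized as in the following minimal Python sketch. Splitting within each class (stratification), the class names, and the random seed are assumptions made for illustration; the text states only the ratio and the class counts.

```python
import random

# Class counts reported for the dataset in the text.
CLASS_COUNTS = {"common_nevus": 80, "atypical_nevus": 80, "melanoma": 40}


def stratified_split(class_counts, train_fraction=0.8, seed=0):
    """Return (train, test) lists of (class_name, image_index) pairs.

    Each class is shuffled and split separately so that both subsets keep
    the original class proportions (an assumption, not stated in the paper).
    """
    rng = random.Random(seed)
    train, test = [], []
    for name, count in class_counts.items():
        items = [(name, i) for i in range(count)]
        rng.shuffle(items)
        cut = round(train_fraction * count)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


train_set, test_set = stratified_split(CLASS_COUNTS)
print(len(train_set), len(test_set))
```

With the reported class counts, this yields 160 training and 40 test images, with melanoma making up one fifth of each subset.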

AlexNet
AlexNet is a deep convolutional neural network (CNN) designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton [18]. AlexNet performs well on classification tasks, for example, classifying the ImageNet dataset with high precision [10]. In this study, we used the batch normalization (BN) technique to make AlexNet more reliable as a diagnostic system for skin cancer detection. Because the database images vary widely in brightness, the distribution of the inputs differs from layer to layer in AlexNet. This slows training and makes it sensitive to the initialization of the parameters. To resolve this problem, BN has been utilized [11]. When the CNN is trained with minibatches, a normalization transform is applied to the activations of each layer to keep their means and variances constant. For a minibatch of $m$ activation values, $\{x_1, \dots, x_m\}$, the mean and variance are formulated as follows:

$$\mu = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2.$$

Consequently, the normalized values ($\hat{x}_i$) are modeled as follows:

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}},$$

where $\varepsilon$ is a small constant added for numerical stability (it prevents division by zero). Since, in some cases, the normalized activations are not what the learning goal requires, the following transformation is applied:

$$y_i = a \hat{x}_i + b,$$

where $a$ and $b$ are two adjustable (learnable) parameters. BN accelerates the training of CNNs and makes it less dependent on the initial values of the parameters. Furthermore, BN regularizes the network and improves its generalization ability.
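As an illustration of the batch normalization transform described above, the following minimal Python sketch normalizes one minibatch of scalar activations. The function name, the sample values, and the default `eps` are assumptions chosen for the example; a real network would apply this per channel and learn `a` and `b` during training.

```python
import math


def batch_norm(x, a=1.0, b=0.0, eps=1e-5):
    """Normalize a minibatch of activations, then scale by a and shift by b."""
    m = len(x)
    mean = sum(x) / m                               # minibatch mean
    var = sum((v - mean) ** 2 for v in x) / m       # minibatch variance
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    return [a * v + b for v in x_hat]


activations = [0.2, 1.4, 3.1, 0.9]                  # illustrative values only
normalized = batch_norm(activations)
shifted = batch_norm(activations, a=2.0, b=3.0)
```

With `a = 1` and `b = 0` the output has approximately zero mean and unit variance; other values of `a` and `b` let the network recover a different scale and offset if that suits the learning goal better.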

Extreme learning machine (ELM)
Because the classification in AlexNet relies on its fully connected layers, improving this stage can yield better results. Therefore, the network is combined with a popular and efficient network called the ELM. This section presents the model and its relationship with SVM-based models. These models are binary classifiers by design, although techniques such as one-against-all and one-against-one extend them to several categories. The ELM is a simplified unification of the PSVM, LS-SVM, and regularization algorithms. The hidden layer of the ELM model does not need to be tuned, because its parameters are fixed in advance. Therefore, this network has been used to improve the accuracy of the model. A general form of an ELM model is illustrated in Figure 2.
In Figure 2, b denotes the bias of the hidden layer, w and β denote the input and output weights, respectively, and x and O represent the input and output.
An important reason for using the ELM network together with AlexNet for skin cancer diagnosis is that it does not require iterative training, which improves its efficiency in terms of convergence.
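Consider a training set $M$:

$$M = \{(x_i, t_i) \mid i = 1, \dots, n\},$$

where $x_i$ and $t_i$ represent the input vector and label of the $i$th sample, respectively. The output matrix of the hidden layer, $H$, is obtained by the following equation:

$$H = \begin{bmatrix} f(w_1 \cdot x_1 + b_1) & \cdots & f(w_N \cdot x_1 + b_N) \\ \vdots & \ddots & \vdots \\ f(w_1 \cdot x_n + b_1) & \cdots & f(w_N \cdot x_n + b_N) \end{bmatrix},$$

where $N$ is the number of hidden nodes and $f(\cdot)$ is the hidden-layer activation function. Finally, the goal is for the ELM output to match the actual sample labels, that is:

$$H\theta = T,$$

where $\theta = [\theta_1, \dots, \theta_N]^{T}$ is the vector of output weights and $T = [t_1, \dots, t_n]^{T}$ collects the labels. So, $\theta$ is obtained by the following equation:

$$\theta = H^{\dagger} T,$$

where $\dagger$ denotes the pseudo-inverse operator.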
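The ELM training procedure above (a fixed hidden layer followed by a closed-form pseudo-inverse solution for the output weights) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the sigmoid activation, the uniform weight range, the number of hidden nodes, and the XOR-style toy data are all chosen only for the example, and the hidden weights are drawn randomly here as in a conventional ELM.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, T, n_hidden, rng):
    """Fit an ELM: fixed random hidden layer, closed-form output weights."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights w
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases b
    H = sigmoid(X @ W + b)                                   # hidden-layer output matrix
    theta = np.linalg.pinv(H) @ T                            # theta = pinv(H) T
    return W, b, theta


def elm_predict(X, W, b, theta):
    return sigmoid(X @ W + b) @ theta


rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])   # toy labels, not dermoscopy data
W, b, theta = elm_train(X, T, n_hidden=20, rng=rng)
pred = elm_predict(X, W, b, theta)
```

Because the only trained quantity is `theta`, and it is obtained in a single linear-algebra step, no iterative weight updates are needed; this is the property the text cites as the reason for pairing the ELM with AlexNet.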
As mentioned before, the ELM model has been used to replace the preceding layers to decrease the complexity of the system for the diagnosis purpose.
In conventional ELM design, the weights and biases are selected randomly. Here, to provide a better model for this study, the weights and biases are selected optimally using a new improved version of the GOA [12]. The GOA is a mathematical model inspired by the swarming behavior of grasshoppers in nature and is used to solve optimization problems. Grasshoppers are small insects, but because of the damage they do to agricultural products, they are a serious pest for crops. Although grasshoppers are usually seen alone in the wild, they form some of the largest swarms of all creatures. A swarm of grasshoppers can reach a continental scale and can be a nightmare for farmers. A unique aspect of grasshoppers is that they swarm both as nymphs and as adults. Millions of nymphs jump and move like rolling cylinders, eating and destroying almost all the vegetation in their path.
When they grow up, they form swarms in the air and migrate over long distances. The main feature of the swarm in the nymph stage is slow movement in small steps. In contrast, long-range and abrupt movement is the main feature of the adult swarm. Searching for food is another important feature of the swarm.
The original GOA article argues that grasshopper life inherently involves both exploitation and exploration. Nymphs move smoothly and continuously, whereas adults move abruptly and over long ranges; these behaviors play the roles of exploitation and exploration, respectively. As a result, modeling this behavior leads to a powerful and appropriate algorithm [13].
Therefore, if this behavior is mathematically modeled, a new nature-inspired algorithm can be designed. The mathematical model used to simulate the swarming behavior of grasshoppers is as follows:

$$X_i = S_i + G_i + A_i,$$

where $X_i$ is the position of the $i$th grasshopper, $S_i$ is the social interaction, $G_i$ is the gravitational force on the $i$th grasshopper, and $A_i$ is the horizontal motion of the wind (wind advection). Note that to create a random behavior, the equation can be written as

$$X_i = r_1 S_i + r_2 G_i + r_3 A_i,$$

where $r_1$, $r_2$, and $r_3$ are random numbers in the range $[0, 1]$. The $S_i$ function, which defines social interaction, is calculated as follows:

$$S_i = \sum_{j \neq i} s(d_{ij}) \, \hat{d}_{ij},$$

where $d_{ij}$ is the distance between grasshopper $i$ and grasshopper $j$, calculated as $d_{ij} = |X_j - X_i|$, $\hat{d}_{ij} = (X_j - X_i)/d_{ij}$ is the unit vector from grasshopper $i$ to grasshopper $j$, and $s$ is the social force function:

$$s(r) = f e^{-r/l} - e^{-r},$$

where $f$ represents the attraction intensity and $l$ represents the attraction length scale. The $G_i$ factor is calculated as follows:

$$G_i = -g \, \hat{e}_g,$$

where $g$ is the gravitational constant and $\hat{e}_g$ is a unit vector toward the center of the earth. The factor $A_i$ is calculated as follows:

$$A_i = u \, \hat{e}_w,$$

where $u$ is a constant drift and $\hat{e}_w$ is a unit vector in the wind direction. Nymph grasshoppers have no wings, so their motion is highly dependent on the wind direction. By substituting $S$, $G$, and $A$, the model can be written as follows:

$$X_i = \sum_{j \neq i} s\left(|X_j - X_i|\right) \frac{X_j - X_i}{d_{ij}} - g \, \hat{e}_g + u \, \hat{e}_w.$$

The position update is as follows:

$$X_i^{d} = \beta \left( \sum_{j \neq i} \beta \, \frac{ub_d - lb_d}{2} \, s\left(|X_j^{d} - X_i^{d}|\right) \frac{X_j - X_i}{d_{ij}} \right) + \hat{T}_d,$$

where $ub_d$ is the upper bound in the $d$th dimension, $lb_d$ is the lower bound in the $d$th dimension, $\hat{T}_d$ is the value of the $d$th dimension in the target (the best solution obtained so far), and $\beta$ is the decreasing coefficient that shrinks the neutral zone and the attraction and repulsion zones.

The equation shows that the next position of a grasshopper is defined by its current position, the target position, and the positions of all the other grasshoppers. Note that the first term in this equation accounts for the current position of the grasshopper relative to the other grasshoppers. In other words, the positions of all grasshoppers are taken into account to define the location of the search agents around the target.
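To make the position update more concrete, the following minimal Python sketch applies one update step of the standard GOA rule to a small swarm: each grasshopper sums the social forces exerted by all the others, scales the sum by the coefficient β, and adds the target position. The values `f = 0.5` and `l = 1.5`, the clipping to the bounds, and the two-agent example are assumptions made for illustration; the improved GOA proposed in this paper adds further mechanisms that are not shown here.

```python
import math


def s_func(r, f=0.5, l=1.5):
    """Social force s(r) = f*exp(-r/l) - exp(-r); f and l are assumed values."""
    return f * math.exp(-r / l) - math.exp(-r)


def goa_step(positions, target, beta, lb, ub):
    """Apply one GOA position update to every grasshopper; return the new swarm."""
    dim = len(target)
    new_positions = []
    for i, x_i in enumerate(positions):
        social = [0.0] * dim
        for j, x_j in enumerate(positions):
            if j == i:
                continue
            dist = math.dist(x_i, x_j)          # Euclidean distance d_ij
            if dist == 0.0:
                continue                        # coincident agents have no direction
            for d in range(dim):
                gap = abs(x_j[d] - x_i[d])
                unit = (x_j[d] - x_i[d]) / dist
                social[d] += beta * (ub[d] - lb[d]) / 2.0 * s_func(gap) * unit
        new_x = [beta * social[d] + target[d] for d in range(dim)]
        # Clip to the search bounds (a common safeguard, assumed here).
        new_positions.append([min(max(v, lb[d]), ub[d]) for d, v in enumerate(new_x)])
    return new_positions


swarm = [[0.0, 0.0], [1.0, 0.0]]
best = [0.0, 0.0]
moved = goa_step(swarm, best, beta=0.1, lb=[-5.0, -5.0], ub=[5.0, 5.0])
```

In this example the two agents are one unit apart, where `s` is negative, so the social term pushes them in opposite directions around the target; with `beta = 0` every agent would land exactly on the target.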
To balance exploration and exploitation, the parameter β must decrease as the iterations proceed. This strengthens exploitation as the iteration count increases. Parameter β is used twice in the above equation for the following reasons:
- The inner coefficient β (inside the sum) shrinks the attraction and repulsion zones and the neutral zone between the grasshoppers.
- The outer coefficient β (outside the sum) strikes a balance between exploration and exploitation.
To balance these two characteristics, the update coefficient β is defined as a geometric sequence bounded by a maximum value β max and a minimum value β min, with geometric coefficient w, current iteration it, and maximum number of iterations MaxIt. The coefficient w is in turn computed from its maximum W max and its minimum W min, the current iteration it, and MaxIt.

Improved GOA
The original GOA has some shortcomings, such as premature convergence and low consistency. These issues motivate an improved version of the algorithm. Two modifications are used here to address them.

The quasi-oppositional learning (quasi-OBL)
Quasi-oppositional learning is used here to improve the convergence speed of the algorithm. In the opposition-based learning (OBL) mechanism, a randomly generated candidate is compared with its symmetric (opposite) value, and the better of the two is kept [14]. For the $i$th individual $X_i$ in a $D$-dimensional search space with $Lb$ and $Ub$ as lower and upper limits, the symmetric value is obtained by the following equation:

$$\bar{X}_i = Lb + Ub - X_i.$$

Besides, the quasi-opposite value $\tilde{X}_i$ of the $i$th individual $X_i$ is obtained by the following equation:

$$\tilde{X}_i = \mathrm{rand}\left(\frac{Lb + Ub}{2}, \; \bar{X}_i\right),$$

that is, a value drawn uniformly between the center of the search interval and the opposite value.
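The quasi-oppositional step can be sketched in Python as follows, assuming the commonly used definition in which the quasi-opposite point is drawn uniformly between the center of the search interval and the opposite point. The function names, the sphere-style fitness, and the sample point are illustrative assumptions.

```python
import random


def opposite_point(x, lb, ub):
    """Opposite (symmetric) point: Lb + Ub - X, coordinate by coordinate."""
    return [lo + hi - v for v, lo, hi in zip(x, lb, ub)]


def quasi_opposite_point(x, lb, ub, rng):
    """Draw each coordinate uniformly between the interval centre and the opposite value."""
    q = []
    for v, lo, hi in zip(x, lb, ub):
        centre = (lo + hi) / 2.0
        opp = lo + hi - v
        q.append(rng.uniform(min(centre, opp), max(centre, opp)))
    return q


def keep_better(x, candidate, fitness):
    """Keep whichever of the two points has the lower fitness (minimization)."""
    return candidate if fitness(candidate) < fitness(x) else x


rng = random.Random(0)
x = [4.0, -3.0]
lb, ub = [-5.0, -5.0], [5.0, 5.0]
x_opp = opposite_point(x, lb, ub)
x_q = quasi_opposite_point(x, lb, ub, rng)
chosen = keep_better(x, x_q, lambda p: sum(v * v for v in p))
```

Here the quasi-opposite point lies closer to the center of the search space than the original point, so it is the one retained for a fitness that is minimal at the origin.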

Merit function (MF)
The MF is another modification, used to improve the consistency of the algorithm. This mechanism provides a proper balance between exploration and exploitation: the optimization process begins with large steps (exploration) and then gradually decreases its step size (exploitation). The update is formulated in terms of a merit function $mF(X_i)$, which depends on a random value $X_0$ and on $\nabla g_j(X_i)^{T}$, the gradient vector of $g_j(X_i)$ at point $X_i$.

Algorithm validation
To validate the effectiveness of the proposed improved GOA, it has been applied to four standard benchmark functions, two unimodal and two multimodal: the Schwefel 2.22 function, the Sphere function, the Quartic function, and the Rosenbrock function. The studied functions are described in the following.

Sphere: a function with 30 dimensions, bounded in the range [−100, 100]. The mathematical formula for this function is as follows:

$$f(x) = \sum_{i=1}^{30} x_i^2.$$

The minimum value of all the abovementioned functions is 0.
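For reference, the four benchmark functions named above can be written in Python as below. The formulas are the definitions commonly used in the metaheuristics literature and are an assumption here, since only the Sphere description appears in this section; the optional noise term in the Quartic function follows the usual "quartic with noise" variant.

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def schwefel_2_22(x):
    return sum(abs(v) for v in x) + math.prod(abs(v) for v in x)


def quartic(x, rng=None):
    """Quartic function; pass a random.Random instance to add the usual noise term."""
    noise = rng.random() if rng is not None else 0.0
    return sum((i + 1) * v ** 4 for i, v in enumerate(x)) + noise


def rosenbrock(x):
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
        for i in range(len(x) - 1)
    )


DIM = 30
zeros = [0.0] * DIM
ones = [1.0] * DIM
print(sphere(zeros), schwefel_2_22(zeros), quartic(zeros), rosenbrock(ones))
```

Each function reaches its minimum value of 0 at the point evaluated in the last line: the origin for Sphere, Schwefel 2.22, and the noise-free Quartic, and the all-ones vector for Rosenbrock.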
To verify the efficiency of the proposed improved GOA, it is compared with some popular and recent algorithms, including Black hole (BH) [15], Multi-verse optimizer (MVO) [16], Emperor penguin optimizer (EPO) [17], and the original GOA [18]. Because the improved GOA is stochastic, it is run 30 times independently.
To get a fair analysis, the maximum iteration number is set at 200 and the population size is set at 35. The programming has been implemented on a 64-bit MATLAB R2019b environment. The configuration of the system is given in Table 1. Table 2 indicates the simulation results of the presented improved GOA and its comparison with some state-of-the-art metaheuristics based on the mean value (MEAN) and the standard deviation (SD) value.
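The evaluation protocol above (independent runs summarized by MEAN and SD) can be sketched in Python as follows. The optimizer used here is plain random search, a stand-in chosen only to keep the example short; it is not the improved GOA, and the seeding scheme is an assumption. The default arguments mirror the settings stated in the text (30 runs, 30 dimensions, population 35, 200 iterations).

```python
import random
import statistics


def sphere(x):
    return sum(v * v for v in x)


def random_search(fn, dim, lb, ub, pop_size, max_it, rng):
    """Stand-in optimizer (NOT the IGOA): best value among random samples."""
    best = float("inf")
    for _ in range(max_it):
        for _ in range(pop_size):
            x = [rng.uniform(lb, ub) for _ in range(dim)]
            best = min(best, fn(x))
    return best


def repeated_runs(fn=sphere, n_runs=30, dim=30, lb=-100.0, ub=100.0,
                  pop_size=35, max_it=200, seed=0):
    """Run the optimizer n_runs times independently; return (MEAN, SD, results)."""
    results = []
    for run in range(n_runs):
        rng = random.Random(seed + run)   # one independent stream per run
        results.append(random_search(fn, dim, lb, ub, pop_size, max_it, rng))
    return statistics.mean(results), statistics.stdev(results), results
```

Calling `repeated_runs()` with the defaults reproduces the run count and budget described in the text for the Sphere function; replacing `random_search` with any other optimizer of the same signature yields the MEAN and SD entries for that optimizer.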
As can be inferred from Table 2, the presented IGOA attains the lowest mean value for all four benchmark functions, which indicates that it is more accurate than the compared algorithms. Likewise, the IGOA attains the lowest standard deviation on these functions, which shows that it is more consistent than the other state-of-the-art algorithms.

The proposed network
This part of the article explains how the proposed combined AlexNet and ELM network is optimized, taking into account the BN technique and the designed improved GOA. The method starts with a pretrained AlexNet that extracts features from the dermoscopy images. To resolve the internal covariate shift problem, BN is applied to the layers. Because this study has only two classes, cancerous and normal, whereas the default output of the pretrained network has 1,000 nodes, the last three layers of the pretrained network must be modified.
We also added six BN layers after the convolution and pooling layers. Finally, the ELM network was appended as the classifier part of the AlexNet. The best number of layers was determined by trial and error. For an optimal design of the ELM model, its weights and biases are selected using the proposed improved GOA (IGOA). To use the IGOA for the network optimization, an objective function is defined over the N training samples in terms of d i and y i, the output of the ELM network and the image label of the ith sample, respectively.
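One way such an objective could be evaluated for a candidate solution is sketched below with NumPy. The mean squared error between the network outputs and the labels is an assumption, since the exact form of the objective function is not shown in this section; the encoding of a candidate as a flat vector of hidden weights followed by hidden biases, the sigmoid activation, and the random toy data are likewise illustrative choices.

```python
import numpy as np


def elm_fitness(candidate, X, y, n_hidden):
    """Assumed objective: MSE of an ELM whose hidden layer is given by `candidate`."""
    n_features = X.shape[1]
    split = n_features * n_hidden
    W = candidate[:split].reshape(n_features, n_hidden)   # hidden weights
    b = candidate[split:]                                 # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                # hidden-layer outputs
    theta = np.linalg.pinv(H) @ y                         # closed-form output weights
    d = H @ theta                                         # network outputs d_i
    return float(np.mean((d - y) ** 2))


rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))                               # toy features, not image data
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
n_hidden = 3
candidate = rng.uniform(-1.0, 1.0, size=3 * n_hidden + n_hidden)
fitness = elm_fitness(candidate, X, y, n_hidden)
```

An optimizer such as the IGOA would then search over `candidate` vectors for the one with the lowest fitness, with each grasshopper position encoding one set of hidden weights and biases.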

Experimental results
The performance has been evaluated based on six measures: accuracy, specificity, precision, F1 score, sensitivity, and Matthews correlation coefficient (MCC). These measures are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN},$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}},$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$

where TP, TN, FP, and FN represent the numbers of true positives, true negatives, false positives, and false negatives, respectively. Table 3 presents the performance of the proposed method in comparison with some other state-of-the-art methods, including AlexNet [20], CNN [21], and RCNN [22] (Figure 3). The results in Table 3 indicate that the proposed AlexNet-ELM-IGOA technique outperforms the other analyzed methods. Specifically, the proposed methodology, with 98% accuracy, 96% precision, 96% specificity, 94% F1-score, 93% sensitivity, and 91% MCC, has the highest values for all the measures.
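The six measures can be computed from the entries of a confusion matrix as in the following Python sketch, which uses the standard definitions of these measures. The counts in the example are hypothetical and are not taken from the experiments reported here.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return the six evaluation measures computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical counts for a 40-image test set (illustration only).
metrics = classification_metrics(tp=7, tn=30, fp=2, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the MCC uses all four counts symmetrically, which makes it informative when the classes are imbalanced, as is the case for a dataset in which melanomas are the minority class.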
To analyze the proposed AlexNet-ELM-IGOA technique further, its results are compared with some other methods from the literature, including Brinker et al. [23], Mustafa and Kimura [24], Babino et al. [25], Hagerty et al. [26], and Bi et al. [27]. For this analysis, the sensitivity, accuracy, specificity, negative predictive value (NPV), and positive predictive value (PPV) measures are used, where:

$$\text{NPV} = \frac{TN}{TN + FN}, \qquad \text{PPV} = \frac{TP}{TP + FP}.$$

The comparison results of the simulation are given in Table 4.
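The two predictive values follow directly from the confusion-matrix counts, as the short Python sketch below shows using their standard definitions. The counts are hypothetical and serve only to illustrate the calculation.

```python
def predictive_values(tp, tn, fp, fn):
    """Return (PPV, NPV) computed from confusion-matrix counts."""
    ppv = tp / (tp + fp)   # share of positive predictions that are correct
    npv = tn / (tn + fn)   # share of negative predictions that are correct
    return ppv, npv


# Hypothetical counts (illustration only).
ppv, npv = predictive_values(tp=7, tn=30, fp=2, fn=1)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Note that PPV is the same quantity as precision; NPV is its counterpart for the negative class.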
To provide a graphical clarification, the results are also shown in Figure 4.
As can be seen from Table 4 and Figure 4, the proposed method again has the highest accuracy, which shows its superiority over the second series of comparative methods; the methods of Brinker et al. and Babino et al., with 84 and 82%, are placed second and third, respectively.

Conclusion
Melanoma is the most dangerous skin cancer, with a high mortality rate, and, worryingly, its incidence rises as tanning becomes more fashionable around the world. The main benefit of recognizing the first symptoms of melanoma is that the patient can see a doctor and start treatment quickly, when treatment is more effective. One non-destructive test for this purpose is the use of dermoscopy images. To reduce human error, image processing and artificial intelligence techniques have recently been utilized. Therefore, in this study, a new deep learning configuration based on AlexNet and an ELM network was used to improve the diagnosis. To get better results, the weights and biases of the network were optimally selected using an improved version of the GOA. The final results showed that the proposed method, with 98% accuracy and 93% sensitivity, provides the highest accuracy among the compared methods.

Funding information:
The authors received no financial support for the research, authorship, and/or publication of this article.

Figure 4: The classification analysis of the proposed method toward some other state-of-the-art methods (measures shown: accuracy, NPV, PPV, specificity, sensitivity).