Diagnosis of diabetes in pregnant woman using a Chaotic-Jaya hybridized extreme learning machine model

Abstract According to a World Health Organization (WHO) report, 246 million individuals worldwide suffer from diabetes, and this figure is anticipated to exceed 380 million by 2025. Accurate and quick diagnosis of this disease has therefore become a significant challenge for machine learning researchers. This paper aims to design a robust model for the diagnosis of diabetes using a hybrid of the Chaotic-Jaya (CJaya) algorithm and the Extreme Learning Machine (ELM), named CJaya-ELM. In this model, the Jaya algorithm with a chaotic learning approach is used to optimize the random parameters of the ELM classifier. The Pima Indian diabetes dataset is used to assess the efficacy of the designed model. The CJaya-ELM model is compared with basic ELM, Teaching Learning Based Optimization (TLBO) optimized ELM (TLBO-ELM), Multi-Layer Perceptron (MLP), Jaya optimized MLP (Jaya-MLP), TLBO optimized MLP (TLBO-MLP) and CJaya optimized MLP (CJaya-MLP) models. The CJaya-ELM model achieved the highest testing accuracy of 0.9687, a sensitivity of 1 and a specificity of 0.9688, with an area under curve (AUC) value of 0.9782. The results reveal that the CJaya-ELM model effectively classifies both the positive and negative samples of the Pima dataset and outperforms its competitors.


Introduction
Diabetes threatens human beings irrespective of their age and gender. It is not caused by any pathogen, but by a deficiency of insulin. However, its impact on vital organs is so harmful that it is regarded as the mother of all diseases. The impact of diabetes is worse on women than on men due to lower longevity rate and substandard condition of life. According to World Health Organization (WHO) reports, the majority of women affected by diabetes are unaware of it. Especially in the case of pregnant women, this disease can be transmitted to their offspring. Diabetic women are susceptible to miscarriage, kidney failure, heart strokes, blindness and other chronic and fatal ailments [1]. For this reason, fast diagnosis of diabetes in pregnant women is essential.
Normally, a person is said to be diabetic if his/her blood sugar level is above the range of 4.4-6.1 mmol/L [1]. In diabetic patients, insulin production is generally insufficient or absent. Three types of diabetes are commonly seen: Gestational, Type 1 and Type 2 [2]. Type 1 diabetes occurs at an early age, typically in teens, due to damage to the insulin-secreting cells of the pancreas, and is termed an autoimmune disease. Type 2 diabetes occurs when different parts of the body become resistant to insulin and the pancreas is not able to produce the required amount of insulin. Pregnant women generally suffer from Gestational diabetes when the pancreas cannot make the needed quantity of insulin. The complications due to diabetes can be avoided if it is diagnosed at an early stage.
For over a decade, diagnosing diabetes has been one of the top challenges for machine learning researchers. Iyer et al. [3] applied classification mining methods for the identification of diabetes. T. Santhanam et al. [4] used K-Means with genetic algorithms for dimensionality reduction of diabetes data and applied a Support Vector Machine (SVM) for classifying the diabetes data. Kavakiotis et al. [5] established the link between machine learning approaches and diabetes research. R. Gargeya and T. Leng [6] introduced automatic detection of diabetes by applying a deep learning approach. Md. Maniruzzaman et al. [7] proposed a comparative method of diabetes data classification by employing a machine learning model. H. Kaur et al. [8] introduced a predictive model for diabetes by implementing a machine learning technique. R. F. Mansour et al. [9] applied deep learning for automatic diagnosis of diabetes. S. Perveen [10] developed a predictive model for identifying diabetes using machine learning techniques. Siva Shankar G. et al. [11] introduced an optimized fuzzy rule based grey wolf optimization algorithm for identifying diabetes.
Even after a variety of studies on designing better classifiers, handling the complexity of diabetes remains challenging. As a result, there is still scope for improved solutions. So far, many statistical classifiers as well as many soft computing techniques have been used successfully for classification. Some of them are: Multilayer perceptron (MLP) [12], Bayesian decision theory [13], Euclidean minimum distance (EMD) [14], k-nearest neighbour (KNN) classifiers [15], fuzzy rule-based systems [16], SVM [12][13][14][17] and Back-Propagation (BP) [18] classifiers. These traditional learning approaches suffer from a number of shortcomings, such as trapping at local optima, an unvarying learning rate and the adjustment of random parameters [19]. To deal with these drawbacks, Huang et al. introduced the extreme learning machine (ELM) [20] algorithm, which is otherwise known as a generalized single-hidden layer feed forward network (SLFN). In spite of its sound generalization ability and quick learning speed, ELM has some limitations [21,22]. The choice of activation function and of random parameters in this classifier may produce unstable solutions. While solving classification and regression problems, these random parameters may create uncertainty. To minimize the training error rate, the output weights of ELM are estimated from randomly chosen input weights and hidden biases [23]. For the optimum selection of random parameters, the ELM model has been optimized with various meta-heuristic learning algorithms, viz. evolutionary algorithms, swarm intelligence and other nature inspired algorithms. Among evolutionary learning algorithms, genetic algorithm (GA) hybridised ELM [24] as well as differential evolution (DE) hybridised ELM [25] have already been used successfully for the classification of medical data. Among swarm intelligence algorithms, ELM optimized with the Particle Swarm Optimization (PSO) algorithm [26] has been effectively applied. Among other nature inspired algorithms, Cuckoo Search (CS) [27] and Cat Swarm Optimization (CSO) [28] hybridised ELM models have been successfully applied by researchers.
In this study, the recently developed Jaya optimization algorithm [29] is applied to optimize the random parameters of ELM. This algorithm, introduced by R. Venkata Rao, can handle both unconstrained and constrained optimization problems. It is based on the principle of moving towards the best solution and away from the worst one. Recently, many researchers [30][31][32][33] have applied this algorithm to different problems. As the Jaya algorithm works with two random variables, it may produce suboptimal results. To deal with this problem, chaos theory is integrated with the Jaya algorithm. The random numbers of the Jaya algorithm are generated by a chaotic random number generator, which is intended to improve the quality of the result, speed up convergence and provide better exploration of the search space without trapping in local optima. In this work, the chaos-theory-enhanced Jaya algorithm [34] is hybridised with ELM to design a robust classifier, called the Chaotic Jaya-ELM (CJaya-ELM) model.
The remainder of the study is structured as follows: section 2 describes the model, section 3 covers the methodologies used in this study, section 4 presents the experiments, section 5 analyses the results and section 6 concludes the study.

Model description
The overall architecture of the diabetes data classification model is depicted in Figure 1. Here, the Pima Indian diabetes dataset is used to test all the models. The source of this dataset is the UCI repository [35]. The attributes of this dataset describe a pregnant woman: the number of times she has been pregnant, glucose concentration, skin fold thickness, blood pressure, insulin level, body mass index (BMI), diabetes pedigree function and age.
A detailed description of this dataset is given in Table 1. In the pre-processing phase, min-max normalization is applied using Eq. (1) to scale the dataset to the range (−1, 1).
In Eq. (1), $M_n$ represents the normalized form of the original value $M$, the values of $a$ and $b$ are taken as −1 and 1 respectively, $D_{min}$ is the minimum and $D_{max}$ is the maximum value of the dataset.
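Eq. (1) itself is not reproduced in the text above, so the following is only a minimal sketch that assumes the usual min-max form, $M_n = a + (M - D_{min})(b - a)/(D_{max} - D_{min})$, applied to one attribute column at a time. The function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(column, a=-1.0, b=1.0):
    """Linearly rescale a list of numbers into the range [a, b].

    Assumed form of Eq. (1): M_n = a + (M - D_min) * (b - a) / (D_max - D_min),
    with D_min and D_max taken per attribute column (an assumption).
    """
    d_min, d_max = min(column), max(column)
    if d_max == d_min:
        # A constant column carries no information; map it to the midpoint.
        return [(a + b) / 2.0 for _ in column]
    return [a + (m - d_min) * (b - a) / (d_max - d_min) for m in column]


# Hypothetical glucose readings, not taken from the Pima dataset.
glucose = [85.0, 120.0, 155.0]
print(min_max_normalize(glucose))  # [-1.0, 0.0, 1.0]
```

Normalizing each attribute separately keeps attributes with large numeric ranges from dominating the hidden-layer activations.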
Here, the randperm() function is used to shuffle the dataset. After shuffling, the diabetes dataset is separated into a training file and a testing file in a 7:3 ratio. In the training phase, the model is trained using the CJaya-ELM, Jaya-ELM, TLBO-ELM, basic ELM, MLP, Jaya-MLP, TLBO-MLP, and CJaya-MLP algorithms. The performance of the model is estimated by classification accuracy percentage, confusion matrix, specificity, sensitivity, F-score, Gmean and the ROC graph with area under curve (AUC) values. In the proposed model, the training samples are fed to the ELM classifier, which is trained by the CJaya algorithm, and the trained ELM then performs the classification task. Here, the CJaya algorithm optimizes the weights and biases of ELM, which would otherwise be chosen randomly and could produce non-optimal results. The trained CJaya-ELM model is then evaluated on the testing samples. Comparing the calculated outputs with the target values gives the misclassification rate. The overall schema of the presented CJaya-ELM model is given in Figure 2.
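The shuffle-and-split step can be sketched as follows. This is an illustrative Python equivalent of shuffling with a random permutation (the study itself uses MATLAB's randperm()); the function name `shuffle_split`, the `seed` argument and the rounding of the cut point are assumptions.

```python
import random


def shuffle_split(samples, train_fraction=0.7, seed=None):
    """Shuffle the samples with a random permutation, then split them 7:3."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)  # plays the role of a random permutation of indices
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in order[:cut]]
    test = [samples[i] for i in order[cut:]]
    return train, test


# Ten dummy records stand in for the rows of the dataset.
train, test = shuffle_split(list(range(10)), seed=0)
print(len(train), len(test))  # 7 3
```

Fixing the seed makes the partition reproducible, which helps when several classifiers are to be compared on the same training and testing files.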

Supported methodologies
This section discusses all the supporting methodologies used in this study.

ELM model
ELM has high significance compared to other neural networks because its training is free from iterations, which makes its learning speed faster. The basic architecture of ELM is depicted in Figure 3. The steps of the basic ELM algorithm are summarized below:
(1) Randomly pick the input weights ($w_i$) and hidden layer biases ($b_i$).
(2) Determine the output matrix ($H$) of the hidden layer.
(3) Calculate the output weights from $H$ and $T$ using Eq. (2). Here, $T$ is the target value.
In ELM, the input weights and hidden layer biases are chosen randomly and the output weights are calculated accordingly. However, this causes two problems. First, ELM may need more hidden neurons than traditional tuning-based machine learning algorithms [20], which makes ELM respond slowly to unknown testing data. Second, when a large number of hidden layer neurons is used, an ill-conditioned hidden output matrix ($H$) [20] may be formed, which may deteriorate the generalization performance. Unlike a conventional SLFN, ELM does not need to be tuned: its primary attraction is that it calculates the output weights rather than tuning the hidden layer. In this work, the performance of the ELM models is compared with MLP classification models, which are widely accepted for solving real-life problems.
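The hidden layer output matrix and Eq. (2) are not reproduced in the text above. As a hedged reconstruction, the standard ELM formulation of Huang et al. is sketched below. The symbols $g$ (activation function), $x_j$ (the $j$th of $N$ training samples), $L$ (number of hidden neurons) and $\beta$ (output weights) are assumed notation; $w_i$, $b_i$, $H$ and $T$ are as defined above.

```latex
\[
H =
\begin{bmatrix}
g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\
\vdots & \ddots & \vdots \\
g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L)
\end{bmatrix},
\qquad
\beta = H^{\dagger} T
\]
```

Here $H^{\dagger}$ denotes the Moore-Penrose generalized inverse of $H$, so $\beta$ is the minimum-norm least-squares solution of $H\beta = T$. This is why no iterative tuning is needed, and also why an ill-conditioned $H$ can harm generalization.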

MLP model
The MLP [27,36] classifier is designed with multiple layers of fully connected neurons. The neurons interact with each other through weighted connections. In an MLP, any number of hidden layers may exist between the input layer and the output layer. Here, a three hidden layer MLP model is considered, where each hidden layer contains five nodes. In the input layer, the input vector $\{Y_1, Y_2, \ldots, Y_n\}$ is given to the model. The expected output is provided by the supervisor. If an MLP model consists of two hidden layers, then the total input $Y^{L_2}_j$ received by the $j$th neuron in the hidden layer $L_2$ is given by Eq. (3), where $n^{L_1}_i$ denotes the output of the $i$th neuron of the previous hidden layer $L_1$ and $w^{L_1}_{ji}$ represents the weight of the link from the $i$th neuron in the hidden layer $L_1$ to the $j$th neuron in the hidden layer $L_2$.
The output of a neuron is a nonlinear sigmoid activation function of its total input and is defined by Eq. (4). In the input layer, the outputs of all nodes are defined by Eq. (5), where $Y^0_j$ is the $j$th component of the input vector. In MLP, the BP learning algorithm is used to find all internal weights of the hidden units. The error related to the weight vector $w$ and the output vectors is calculated by the Least Mean Square (LMS) error method using Eq. (6). Here, $n^L_{j,s}$ is the output of node $j$ in the $L$th layer for the $s$th input/output case and $d_{j,s}$ is the desired output. To minimize $E(w)$, the gradient-descent method is used and a sequence of weight updates is carried out by applying Eq. (7). Here, $\varepsilon$ is a positive constant and $0 \le \alpha \le 1$ is the momentum coefficient. Moreover, to deal with the local minima problem of the BP algorithm, MLP is integrated with the Jaya, CJaya and Teaching Learning Based Optimization (TLBO) algorithms, and the outcomes are compared with ELM based models. The basic architecture of the MLP model is shown in Figure 4.
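Eqs. (3)-(7) are not reproduced in the text above. The block below is a hedged reconstruction from the surrounding definitions, using the textbook forms of the weighted sum, sigmoid activation, squared error and gradient descent with momentum. The factor $\tfrac{1}{2}$ in Eq. (6), the absence of a bias term in Eq. (3) and the exact form of the momentum term are assumptions.

```latex
\begin{align}
Y^{L_2}_j &= \sum_i n^{L_1}_i \, w^{L_1}_{ji} \tag{3} \\
n^{L_2}_j &= \frac{1}{1 + \exp\left(-Y^{L_2}_j\right)} \tag{4} \\
n^{0}_j &= Y^{0}_j \tag{5} \\
E(w) &= \frac{1}{2} \sum_{j,s} \left( n^{L}_{j,s} - d_{j,s} \right)^2 \tag{6} \\
\Delta w(t) &= -\varepsilon \, \frac{\partial E}{\partial w}(t) + \alpha \, \Delta w(t-1) \tag{7}
\end{align}
```

In Eq. (7), the first term is the gradient-descent step scaled by $\varepsilon$, and the second term reuses the previous update with weight $\alpha$, which damps oscillations during training.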

TLBO algorithm
TLBO is used to deal with non-linear optimization problems. This algorithm is inspired by the teaching-learning process. The TLBO algorithm [37] is based on two phases: (a) Teacher Phase and (b) Learner Phase. The steps of the TLBO algorithm are shown below:
(1) Input the population size and stopping condition.
(2) Get the mean of the design variables.
(3) Initialize the teacher as the best solution.
(4) Modify the solution using Eq. (8).
(5) If the existing solution is better than the new one, reject the new one. Otherwise, accept the new solution and pick any two solutions ($Z_i$, $Z_j$) arbitrarily.
(6) If $Z_i$ is better than $Z_j$, then apply Eq. (9)
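Eqs. (8) and (9) are not reproduced in the text above. For orientation, the commonly used TLBO update rules are sketched below; they are an assumption about what the missing equations contain. The symbols $r$ (a uniform random number in $[0, 1]$), $T_F$ (the teaching factor, usually 1 or 2), $Z_{teacher}$ and $Z_{mean}$ are assumed notation.

```latex
\begin{align}
Z_{new} &= Z_{old} + r \left( Z_{teacher} - T_F \, Z_{mean} \right) \tag{8} \\
Z_{new} &=
\begin{cases}
Z_i + r \left( Z_i - Z_j \right), & \text{if } Z_i \text{ is better than } Z_j \\
Z_i + r \left( Z_j - Z_i \right), & \text{otherwise}
\end{cases} \tag{9}
\end{align}
```

Eq. (8) corresponds to the Teacher Phase, in which solutions move towards the teacher and away from the population mean. Eq. (9) corresponds to the Learner Phase, in which a solution moves away from a worse peer or towards a better one.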

Jaya optimization algorithm
Like TLBO, Jaya is another optimization approach [29] that does not need any algorithm-specific parameters. It requires less computational time and less implementation complexity, and has a faster convergence rate than the TLBO algorithm. The steps of the Jaya algorithm are elaborated below:
(1) Set the size of the population, the number of design variables and the stopping condition.
(2) Get the best and worst solutions from the population.
(3) Modify each solution according to the best and worst solutions by applying Eq. (10).
Here, $Z_{j,k,i}$ is the value of the $j$th variable for the $k$th candidate during the $i$th iteration, where $j$ indexes the design variables, $k$ indexes the candidates in the population and $i$ indexes the iterations.
(4) Compare the existing solution with the modified one; if the modified solution is better, it replaces the previous one, else the previous solution is kept.
(5) Repeat steps 2 to 4 until the stopping condition is reached.
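The five steps above can be sketched as a short minimization routine. Eq. (10) is not reproduced in the text, so the update below assumes the standard Jaya rule, $Z'_{j,k,i} = Z_{j,k,i} + r_1 (Z_{j,best,i} - |Z_{j,k,i}|) - r_2 (Z_{j,worst,i} - |Z_{j,k,i}|)$. The function names, the search bounds and the sphere test function are hypothetical and chosen only for illustration.

```python
import random


def sphere(z):
    """Toy objective (sum of squares); stands in for a real fitness function."""
    return sum(v * v for v in z)


def jaya_minimize(fitness, n_vars, pop_size=20, iterations=100,
                  lower=-5.0, upper=5.0, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population inside the (assumed) bounds.
    pop = [[rng.uniform(lower, upper) for _ in range(n_vars)]
           for _ in range(pop_size)]
    scores = [fitness(z) for z in pop]
    history = [min(scores)]
    for _ in range(iterations):
        # Step 2: best and worst candidates of the current population.
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for k in range(pop_size):
            candidate = []
            for j in range(n_vars):
                r1, r2 = rng.random(), rng.random()
                z = pop[k][j]
                # Step 3: assumed form of Eq. (10).
                new = z + r1 * (best[j] - abs(z)) - r2 * (worst[j] - abs(z))
                candidate.append(min(max(new, lower), upper))
            # Step 4: greedy selection keeps the better of old and new.
            cand_score = fitness(candidate)
            if cand_score < scores[k]:
                pop[k], scores[k] = candidate, cand_score
        history.append(min(scores))  # Step 5: loop until iterations run out.
    i_best = scores.index(min(scores))
    return pop[i_best], scores[i_best], history


solution, score, history = jaya_minimize(sphere, n_vars=3)
print(score <= history[0])  # True: greedy selection never worsens the best
```

Because of the greedy selection in step 4, the best objective value recorded in `history` can never increase from one iteration to the next.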

Chaotic learning method
In this work, one of the variants of the Jaya algorithm, Chaotic-Jaya (CJaya), is used. This algorithm is based on chaos theory. It improves the convergence speed and provides better exploration of the search space without trapping in local optima [38,39]. In mathematical terms, chaos is defined as the randomness of a deterministic dynamical system. To apply chaos theory in different optimization algorithms, various chaotic maps with various mathematical equations are used. In this work, the logistic map is chosen among the available functions for generating the chaotic random numbers due to its simplicity, and it is defined by Eq. (11),
where $x_t$ is the value of the chaotic map at the $t$th iteration. The working principle of the CJaya algorithm is the same as that of the Jaya algorithm. The main difference is that the random numbers in the CJaya algorithm are generated by a chaotic random number generator. Here, the two random variables ($r_1$ and $r_2$) of the Jaya algorithm are replaced by logistic chaotic variables. The population is updated as in Eq. (12).
Here, $t$ represents the iteration number, $x_t$ is the value at the $t$th chaotic iteration, and the initial value $x_0$ is generated randomly in [0, 1].
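Eqs. (11) and (12) are not reproduced in the text above. The sketch below assumes the commonly used logistic map $x_{t+1} = 4 x_t (1 - x_t)$ for Eq. (11), and a Jaya-style update in which $r_1$ and $r_2$ are replaced by successive chaotic values for Eq. (12). The control parameter 4, the function names and the way chaotic values are drawn per variable are assumptions.

```python
def logistic_map(x0, mu=4.0):
    """Yield successive values of the logistic map x_{t+1} = mu * x_t * (1 - x_t)."""
    x = x0
    while True:
        x = mu * x * (1.0 - x)
        yield x


def chaotic_jaya_update(z, best, worst, chaos):
    """Jaya-style update with chaotic values in place of r1 and r2 (assumed Eq. (12))."""
    new = []
    for j in range(len(z)):
        c1, c2 = next(chaos), next(chaos)
        new.append(z[j] + c1 * (best[j] - abs(z[j])) - c2 * (worst[j] - abs(z[j])))
    return new


chaos = logistic_map(0.3)
print(chaotic_jaya_update([1.0], best=[0.0], worst=[2.0], chaos=chaos))
```

With $\mu = 4$, the starting values 0, 0.25, 0.5, 0.75 and 1 lead to fixed or collapsing sequences rather than chaotic ones, so a randomly drawn $x_0$ should avoid them.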

Proposed CJaya-ELM algorithm
Algorithm: Pima diabetes data classification using the CJaya-ELM model.  [24]. Detailed descriptions of these methods are given in Table 2.

Experimental analysis
The experimental analysis is separated into two phases. The first phase presents the misclassification rate versus number of iterations graph for the different models, which gives a clear idea of the convergence speed of the different algorithms. The second phase reports training accuracy, testing accuracy, training time, TP, TN, FP, FN values, specificity, sensitivity, F-score, Gmean, and ROC with AUC values of all the ELM based models. The proposed model is evaluated with 10 to 300 hidden nodes, in increments of 10 neurons per run. In this study, the population size and the number of iterations are both set to 100. The sigmoid function is used as the activation function for all the models in this study.
All the experiments in this work are run on a machine with the Windows 10 operating system, an Intel(R) Core(TM) i5-7200U CPU at 2.5 GHz and 8 GB RAM. The language used in this work is MATLAB (version R2015b, 64-bit).

Phase I: description of misclassification rate convergence graph
The main purpose of the proposed CJaya-ELM algorithm is to improve the convergence speed and to provide better exploration of the search space without trapping at local optima. In this work, the ELM is used as the objective function. Here, the global best is the solution with the minimum objective value (misclassification rate), whereas the global worst is the solution with the maximum objective value (misclassification rate). At the first iteration, the weights and biases are chosen arbitrarily; after that, the weights are modified using the optimization technique, and this process repeats until the final iteration. The error or misclassification rate convergence graph of ELM, TLBO-ELM, Jaya-ELM and CJaya-ELM is shown in Figure 5, where it can be seen that the proposed CJaya-ELM model converges faster than the other models.

Phase II: evaluating all the models by various performance measuring variables
In this phase, all the ELM based models (ELM, TLBO-ELM, Jaya-ELM and CJaya-ELM) are evaluated by training and testing accuracy, specificity, sensitivity, confusion matrix, Gmean, F-score, and ROC with AUC values. The highest testing accuracies of all MLP based and ELM based models are also compared in this study. The training and testing accuracies for ELM, TLBO-ELM, Jaya-ELM and CJaya-ELM appear in Table 3. In all the tables, HN stands for the number of hidden neurons. Table 4 shows the sensitivity, TP, TN, FP and FN values for ELM, TLBO-ELM, Jaya-ELM and CJaya-ELM. The specificity, F-score and Gmean values of all the ELM based models are given in Table 5. Figures 18-21 present the sensitivity versus specificity versus Gmean graphs for the ELM, TLBO-ELM, Jaya-ELM and CJaya-ELM models, whereas Figure 22 presents the sensitivity versus specificity versus F-score graph of the CJaya-ELM model. Figure 23 displays the ROC curve of CJaya-ELM with its AUC value. Table 6 shows the maximum training and testing accuracies of both the MLP and ELM based models. Table 7 displays the AUC values of the MLP, TLBO-MLP, Jaya-MLP, ELM, TLBO-ELM, Jaya-ELM and CJaya-ELM models.
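The text does not state the formulas behind these measures, so the sketch below assumes the usual definitions in terms of the confusion matrix counts TP, TN, FP and FN. The function name and the example counts are hypothetical and are not results from this study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute common binary classification measures from confusion counts."""
    def ratio(num, den):
        # Guard against empty classes to avoid division by zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only.
for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
    print(name, round(value, 4))
```

Gmean is reported alongside accuracy because it drops sharply when either class is poorly recognized, which matters for an imbalanced dataset.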

Result discussion
In this paper, the Jaya optimization algorithm is used because, unlike the PSO and DE optimization algorithms, it does not require any algorithm-specific parameter to be adjusted. Unlike TLBO, Jaya does not require two phases (a teaching phase and a learning phase). Still, the Jaya algorithm works with two random variables, which may produce suboptimal results. These random numbers are therefore generated by a chaotic random number generator, which is intended to improve the quality of the result, speed up convergence and provide better exploration of the search space without trapping in local optima. Here, chaos theory is integrated with the Jaya algorithm mainly to refine the quality of the best solution.
In this work, the Jaya algorithm is modified by the chaos learning method, giving the CJaya optimization algorithm. The CJaya algorithm optimizes the random parameters of ELM, and the resulting model is called CJaya-ELM. Here, the CJaya-ELM model is proposed to classify the Pima Indian diabetes dataset. The obtained results are compared with MLP, TLBO-MLP, Jaya-MLP, CJaya-MLP, ELM, TLBO-ELM and Jaya-ELM. The performance measures considered for evaluating the proposed model are testing and training accuracy, confusion matrix, sensitivity, specificity, Gmean, F-score and ROC with AUC values. Table 3 compares the training and testing accuracy of all the ELM-based models for each number of hidden neurons. From Table 3 and Figures 6-9, it can be seen that the proposed CJaya-ELM model achieves a higher testing accuracy (0.9783) with a smaller number of hidden neurons (230) in comparison to the other ELM-based models. The CJaya-ELM model also attains the highest sensitivity (1) and specificity (0.9688) with respect to the other ELM-based models. A higher sensitivity means that positive samples are well classified, and a higher specificity means that negative samples are well classified. This indicates that the CJaya-ELM classifier classifies both the positive and negative samples more accurately than the other models. Moreover, the superiority of the CJaya-ELM model can be seen from the 3-D graphs of sensitivity versus specificity versus Gmean in Figures 18-21, as its sensitivity and specificity are higher than those of the other models.
The ROC curve is a graphical representation of the trade-off between sensitivity and specificity, so the area under the ROC curve, i.e. the AUC value, gives an aggregate evaluation of performance across all possible classification thresholds. The ROC graph of Figure 23 and Table 7 show the AUC values of the different models. The highest AUC value (0.9782) shows the superiority of the CJaya-ELM model over the other models.
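To make the threshold-independent reading of AUC concrete, the sketch below computes it by pairwise ranking, using the known equivalence between AUC and the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function name and the toy labels and scores are hypothetical; this is not necessarily how the AUC values in Table 7 were computed.

```python
def auc_by_ranking(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive-negative pairs are ordered correctly.
print(auc_by_ranking([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

An AUC of 1 means every positive sample is scored above every negative one, while 0.5 corresponds to random ordering.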
The box plots (Figures 14-17) show that the TP and TN values exceed the FP and FN values by a larger margin in the CJaya-ELM model than in the other ELM-based models, which indicates that positive and negative samples are correctly classified.
The above observations reveal that the ELM based models (ELM, TLBO-ELM, Jaya-ELM and CJaya-ELM) are superior classifiers in comparison to the MLP based models such as Jaya-MLP, TLBO-MLP and CJaya-MLP. Moreover, when the other ELM based models are compared with the presented CJaya-ELM model, the results indicate that CJaya-ELM performs better.

Conclusions
In this work, CJaya-ELM is presented for classification of the Pima diabetes dataset. Two other ELM based models (basic ELM and TLBO-ELM) and three MLP based models (MLP, Jaya-MLP and TLBO-MLP) are also discussed and compared. The proposed model is evaluated through a series of empirical studies. In order to perform an unbiased comparison among all the models, several performance measures are considered: testing accuracy, training accuracy, confusion matrix, sensitivity, specificity, Gmean, F-score and ROC with AUC values. Here, ELM is integrated with the CJaya algorithm to make the classifier more robust. The outcomes indicate that CJaya-ELM can handle the ill-conditioning problem and gives better performance in comparison with ELM, TLBO-ELM, MLP, Jaya-MLP and TLBO-MLP. This study concludes that the proposed CJaya-ELM model efficiently classifies the diabetes data and helps in identifying diabetes in pregnant women.