A novel deep learning-based brain tumor detection using the Bagging ensemble with K-nearest neighbor

Abstract: In magnetic resonance imaging (MRI), image processing is crucial. In the medical field, MRI images are commonly used to analyze and diagnose tumor growth in the body. A number of successful brain tumor identification and classification procedures have been developed by various experts. Existing approaches face a number of obstacles, including detection time, accuracy, and tumor size. Early detection of brain tumors improves the options for treatment and the patient survival rate. Manually segmenting brain tumors from a significant number of MRI scans for diagnosis is a tough and time-consuming task, so automatic image segmentation of brain tumors is required. The objective of this study is to evaluate the degree of accuracy and simplify the medical image segmentation procedure used to identify the type of brain tumor from MRI results. Additionally, this work proposes a novel method for identifying brain malignancies using the Bagging Ensemble with K-Nearest Neighbor (BKNN) in order to raise the KNN's accuracy and quality rate. For image segmentation, a U-Net architecture is utilized first, followed by a bagging-based KNN prediction algorithm for classification. The goal of employing U-Net is to improve the accuracy and uniformity of parameter distribution in the layers. During classification, each base learner is fitted on a slightly different training dataset, and the bagged ensemble is effective because each member differs slightly and generates slightly different skilled predictions. The overall classification accuracy was up to 97.7 percent, confirming the efficiency of the suggested strategy for distinguishing normal from pathological tissues in brain MR images; this is higher than that of existing methods.


Introduction
Neoplasms are referred to as tumors, which is a generic term. Tumor simply means "mass." "Tumor" is a broad term that refers to either benign (usually noncancerous) or malignant (cancerous) growths [1]. A malignant tumor refers to uncontrollable cell growth that poses a threat to the body's health. Cancer is the largest cause of death worldwide, with approximately 10 million deaths in 2020, or about one in every six deaths [2]. These figures have increased since 2016, when 8.9 million individuals died from cancer. Many tumors are curable if caught early and treated properly. A primary brain tumor starts in the brain or the spinal cord. This year, primary malignant tumors of the brain and spinal cord will be found in approximately 25,050 people in the United States (14,170 men and 10,880 women). The likelihood of developing this type of tumor in one's lifetime is less than 1%. Brain tumors make up the bulk of primary central nervous system malignancies, accounting for 85 to 90% of all cases. A total of 308,102 patients were diagnosed with a primary brain or spinal cord tumor around the world.
As per the 2018 Cancer Registry data, there were 18,078,957 cancers reported in both genders, with 296,851 of these being brain malignancies. Asia had the highest rate of brain cancer (156,217 cases, or 52.6%), while Oceania had the lowest rate (2,438 cases, 0.82%) [3]. Continent-wise brain tumor cases in 2018 are depicted in Figure 1. In 2018, there were 9,555,027 cancer-related fatalities worldwide, with brain cancer accounting for 241,037 (2.71%) of those. Asia had the highest fatality rate (57.3%), with 129,483 cases, and Oceania had the lowest fatality rate among the continents (2,017 cases, 0.84%). The distribution of brain cancer mortality by continent is shown in Figure 2. According to the findings of this study, countries with a higher human development index (HDI) had a higher fatality rate for brain tumors. Environmental toxins, ionized radiation exposure at work, and commercial radioactive substances are all factors that lead to a higher prevalence rate in countries with a higher HDI. On the other hand, it is possible to claim that the higher cancer rates in these countries are attributable to more modern diagnostic facilities. As a result, more epidemiological research into the factors that influence the disease's incidence and fatality rates may be advantageous in lowering those rates.
The initial cause of cancer cell appearance may differ from person to person; some cancers are caused by genetic factors, while others are caused by external environmental factors. Tobacco use, alcohol, poor eating habits, and a lack of consistent physical activity are the most common cancer risk factors in the world.
Several techniques [4] have been documented in the past several decades. Medical imaging analyses such as magnetic resonance imaging (MRI) and computed tomography (CT), which play a critical part in biomedical study, have proven able to distinguish between high- and low-grade gliomas, as well as between brain lesions. However, conclusions based on individual studies, and across studies of differing quality, are difficult to draw, and the choice of methodology may affect a study's outcome. To overcome these limitations, we performed a comprehensive review to assess the actual effectiveness of brain tumor detection techniques and, in particular, to present the criteria for diagnosing a brain tumor in a timely way. To discriminate brain cancers, imaging approaches such as individual feature extraction, multifeature extraction, and tumor border identification with MRI and CT images are used. Nevertheless, since machine learning has grown in popularity, numerous studies have established approaches that work in practice and provide an optimal method in this area. Although the system still needs improvement to get to that point, it is in place to save people's lives. According to the assessment of procedures, convolutional neural networks were employed in 14% of the surveyed techniques, indicating that malignant brain tumors are prevalent among all tumor types.
In ref. [5], the authors applied classification methods to a data collection built using tissue-based features of MRI. In comparison to the other classification techniques presented in that study, the support vector machine (SVM) achieved significantly higher precision and accuracy. The accuracy attained is acceptable given the variety of features and the intricacy of malignancies. A large data set is examined, and density-based attributes are derived from texture-based characteristics to enhance accuracy. The study's reach can be increased by generalizing it to lesions and malignancies beyond brain MR images. Additionally, increasing the size of the data collection and the variety of texture elements will extend the scope of the investigation. If these flaws are addressed, it will aid in resolving tumor segmentation and classification issues, as well as enhance the classification efficiency of the algorithms. These programs, which physicians can use for radiological examination, will allow for faster diagnosis and therapy, as well as a healthier and more successful outcome.
In ref. [6], Kalaivani et al. computed segmentation results and margins of error using raw patient data and a threefold machine learning classification scheme comprising KNN, K-means, and fuzzy C-means (FCM), each of which has a different precision and margin of error; they also determine whether the tumor is cancerous. According to the segmentation performance assessment, the suggested technique is productive and useful, requiring less computation time than the existing K-means, FCM, and K-nearest neighbor (KNN) methods. Real data sets from a total of 150 patients were collected from various hospitals and health institutions, and those datasets were used to categorize afflicted and unaffected MRI regions. Using a contrast enhancement and de-noising method, the MR image is first enhanced and noise is eliminated. Second, morphological operations with double thresholding are conducted, and then the triple machine learning classifier is utilized for segmentation, with 120 brain MR images used for training and the remaining 30 for testing. The quality of the trained images is assessed using the testing set in the outcome phase. The triple-approach segmentation provides improved accuracy and lower error rates for large data sets. Training on a large dataset will increase system efficiency, and this triple classifier has a high accuracy rate, with FCM achieving a maximum of 98.97% accuracy, KNN achieving 89.96%, and K-means a minimum of 79.9%. The accuracy rate varies because of the different categorization strategies and training processes used by each classifier. These three techniques are useful for detecting different forms of irregularities due to their simplicity, reduced calculation time, and good performance.
In ref. [7], Jayade et al. presented a study on the classification and extraction of brain tumors. To improve the effectiveness of tumor identification, a variety of computer vision techniques were used, image enhancement among them. The results of the empirical study demonstrate that the approach is effective for categorizing MRI images into normal versus abnormal classes. SVM and KNN classification algorithms were implemented, with accuracy rates of 91.21 and 79.23%, respectively. In terms of performance, the hybrid SVM-KNN classifier is effective; its accuracy is 94.13%, higher than that of the other approaches.
Currently, a variety of machine learning (ML) strategies for automatically classifying brain tumors have been presented. The location, size, and effects on the surrounding areas of a brain tumor must be determined through radiological evaluation after it is clinically suspected. The optimal treatment, whether it is surgery, radiation therapy, or chemotherapy, is chosen on the basis of this data. It is obvious that if a tumor is appropriately identified at an initial point, the patient's chances of recovery can be greatly boosted. The downside of existing approaches is that they require a bigger dataset for training, are time-consuming, and are less precise in application. In this study, we concentrate on a deep learning approach for brain tumor segmentation from MRI data. Increasing prediction accuracy is the main objective of this strategy. We offer a novel ensemble bagging KNN to improve the performance of these machine learning models and overcome this constraint. Ensemble learning is one of the approaches for reducing errors. It uses a KNN model to produce an optimal solution with lower variance (bagging) and better prediction (stacking). The remainder of this article is organized as follows: Section 2 presents a brief literature analysis of various brain tumor segmentation schemes and the KNN classifier, Section 3 presents the proposed solution, Section 4 presents a comparative analysis of segmentation performance, and Section 5 presents the conclusion and future scope of the proposed approach.

Literature survey
Tumors are becoming more common, especially among younger age groups. Tumors have the ability to kill all healthy brain cells. Human mistakes can occur when classification is done manually (physically). An automatic categorization approach is necessary since it relieves the human observer's task and ensures that precision is not compromised even for an enormous quantity of images. Chavan et al. [8] describe an attempt to detect and classify tumors at the benign stage. Feature extraction and classification are the two stages of the proposed technique. In the first stage, the gray-level co-occurrence matrix (GLCM) is used to extract texture characteristics; in the second, a KNN classifier sorts the resulting images into categories. Computer technology is now widely employed in medical decision support in a variety of sectors, including tumor research, heart disease, and brain tumors, to name a few. Magnetic resonance images may be used to classify objects, which is useful for research and clinical studies. KNN can be utilized to differentiate between normal and abnormal images. In this proposed method, the input images are initially segmented using image processing techniques. Texture properties are then taken as features; the term "feature extraction" refers to the process of extracting image information in the form of numerical data. For feature extraction, the GLCM is used.
The level of precision in diagnosing the tumor type using MRI data is critical for developing appropriate medical treatment. Ramdlon et al. [9] developed a tumor classification system that detects tumor and edema in T1 and T2 image sequences and classifies tumor types. The data for this method are derived solely from the axial part of MRI scans, which are categorized as astrocytoma, glioblastoma, and oligodendroglioma. Basic image processing methods such as image enhancement, image binarization, morphological operations, and watershed are used to detect the tumor area. Tumor classification is applied when the shape extraction process is completed. The tumor categorization accuracy achieved was 89.5%, indicating that information about tumor detection can be provided clearly and precisely. However, due to the distribution of the training and testing (or validation) models and the features used, the accuracy of the classification process can be reduced. The outcomes of the segmentation method are particularly important because feature extraction relies heavily on them. Variables that can affect segmentation outcomes must be addressed at the preprocessing stage, ensuring that the findings of the segmentation process are accurate. The paucity of characteristics employed results in low accuracy. To acquire good, accurate, and automatic segmentation outcomes, an ML technique is required. A study of multiple tumor identification algorithms for MRI images was presented by Suhartono et al. [10], in which a comparison of several methodologies was also carried out. Synthetic aperture radar (SAR) photos are high-resolution photographs that are difficult to obtain manually. These SAR photos were found on the Internet at random, with various region inclusions. The two core qualities of gray levels, discontinuity and resemblance, are used in most segmentation algorithms. The discontinuity in an image is caused by differences in the intensity values of pixels.
Contours in a picture are detected using the discontinuities between gray-level sections. The similarity of pixel intensity levels is also useful for object recognition. Clustering is done using the efficient k-means method. A genetic algorithm is in charge of classification and feature extraction. The location and size of the tumor are extracted from the grayscale image after feature extraction, and finally, the tumor is removed from the image. Only two forms of brain tumors are treated with this technique.
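As a hedged illustration of the k-means clustering step mentioned above, the sketch below clusters the pixel intensities of a synthetic grayscale image into three regions. The image, the cluster count, and the use of scikit-learn are assumptions made for demonstration, not details from the surveyed work.

```python
# Sketch: k-means clustering of pixel intensities for coarse segmentation.
# The synthetic 64x64 "scan" and the choice of 3 clusters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
img = np.concatenate([
    rng.normal(20, 5, 1500),   # dark background pixels
    rng.normal(120, 5, 1500),  # mid-gray tissue pixels
    rng.normal(230, 5, 1096),  # bright lesion pixels
]).reshape(64, 64)

# Cluster each pixel by its intensity alone; the labels form a segmentation map
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(img.reshape(-1, 1)).reshape(img.shape)
```

Clustering on intensity alone ignores spatial context, which is why such methods are typically combined with morphological post-processing.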
For the detection of brain tumors, Raja and Nirmala [11] proposed using the KNN classification technique and the self-organizing map algorithm to ignore the dataset error. The discrete wavelet transform is applied to the input image, in which the RGB color of the input data image is changed to gray scale. Edge mapping and energy are two types of gray-level features. The contour of an item in an image is given by an edge mapping procedure, which is a core operation in image processing. The edge mapping result can be used to trace the object's border as well as the curve surface. Edge mapping is utilized for picture segmentation. After that, the image was classified using KNN, and then the error-avoiding method was applied. The KNN classification model uses the distance function to map samples to classes. This aids in differentiating tumor cells from healthy cells. Parametric analysis through simulation is used to determine the existence of a tumor in a brain picture. Priyanka and Saniya [12] devised a method for detecting tumors and calculating the area (percentage) occupied by the tumor in the total number of brain cells. To begin, an Otsu algorithm is used to separate tumor areas from an MR image. KNN and Lloyd's algorithm are utilized to detect and differentiate tumor-affected tissues from nonaffected tissues. By applying the wavelet transform to the transformed gray-scale image, 12 features such as correlation, contrast, energy, and homogeneity are recovered. The DB5 wavelet transform can be used to extract the features. Noise is effectively removed by using low-pass and high-pass filters, as well as morphological operations such as dilation and erosion.
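The Otsu thresholding step referenced above can be sketched as follows. This is a minimal NumPy reimplementation for illustration only (a real pipeline would typically use skimage or OpenCV), and the toy image is hypothetical.

```python
# Minimal NumPy sketch of Otsu's method: choose the gray level that
# maximizes the between-class variance of the two resulting classes.
import numpy as np

def otsu_threshold(img):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total = hist.sum()
    mu_total = np.dot(np.arange(256), hist) / total
    best_t, best_var = 0, -1.0
    cum_w, cum_mu = 0, 0.0
    for t in range(256):
        cum_w += hist[t]           # pixels at or below level t
        cum_mu += t * hist[t]
        if cum_w == 0 or cum_w == total:
            continue
        w0 = cum_w / total         # weight of the "background" class
        mu0 = cum_mu / cum_w       # mean gray level of that class
        mu1 = (mu_total * total - cum_mu) / (total - cum_w)
        var_between = w0 * (1 - w0) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy "image": dark background (value 30) and a bright region (value 200)
img = np.concatenate([np.full(800, 30), np.full(200, 200)])
t = otsu_threshold(img)
```

Any threshold between the two modes separates them, which is exactly the behavior Otsu's criterion selects for.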

Proposed model
Many methods for classifying brain cancers in MR images have been developed; however, the level of accuracy in detecting the tumor type from MRI results is critical for appropriate medical therapy. As a result, a Bagging ensemble with K-nearest neighbor (BKNN) algorithm for brain tumor diagnosis was developed in this study. To identify brain tumor disease, this algorithm uses the following stages: data acquisition, image preprocessing, segmentation, classification, and accuracy estimation. U-Net will be used to segment the preprocessed image, and BKNN will be used to classify it. This will aid in differentiating tumor cells from healthy cells. With simulation results, the parametric analysis of the brain imaging shows whether a tumor is present. The proposed system's architecture is shown in Figure 3.

A) Dataset

This brain tumor dataset includes 3,064 T1-weighted pictures of three different categories: (1) glioma, (2) pituitary tumor, and (3) meningioma. There are a total of 1,047 coronal images available. Images taken from the back of the head are known as coronal images. There are 990 axial images, or images taken from above the skull. There are an additional 1,027 sagittal images in the dataset, which were taken from the side of the head. Each image in this dataset carries a label that identifies the tumor kind. There are 233 patients represented by these 3,064 photos. The tumors in the dataset are divided into three categories: 708 meningiomas, 1,426 gliomas, and 930 pituitary tumors (Figure 4).

B) Data preprocessing
Preprocessing is primarily used to improve the integrity of MR images and prepare them for further assessment by a human or computer vision system. The data from the dataset are saved in the MAT format. We use hdf5storage to load the MATLAB file and convert it into the PNG format. After that, the image is adjusted and masked to a custom size. Images, labels, and masks are converted to numpy arrays. The image-trimming process is carried out at this stage by finding the brain regions that will be used in the next step and deleting background objects that will not be used. This procedure reduces the probability of factors influencing the outcomes of the collected features, such as the detection error of tumor objects during the segmentation phase. A few of the cropped images are shown in Figure 5. Image resizing is conducted because most deep-learning model designs demand that all input images be of the same dimensions. Another popular image augmentation technique is flipping the image. The only thing to keep in mind is that the flipping should be appropriate for the application; depending on the object in the image, the image can be flipped horizontally or vertically.

C) U-Net

In many visual tasks, notably in biomedical image analysis, the intended outcome must include localization, i.e., each pixel should have a class label. Furthermore, thousands of training images are typically out of reach in scientific tasks. So, by using a small patch around each pixel as input, a sliding-window approach can train a network to determine the most likely label of each pixel. This network has the potential to localize, and in terms of patches, the training data outnumber the training images by a large margin. However, this strategy has two drawbacks. First, it is slow because each patch requires a new network run, and there is a lot of duplication due to intersecting patches. Second, there is a trade-off between localization precision and context utilization.
Bigger patches require additional max-pooling layers, which reduce localization accuracy, while tiny patches permit the network to sense only a limited amount of context.
In this research, we enhance and adapt a more elegant architecture such that it can function with fewer training images and provide more precise segmentation results. The basic idea is to supplement a traditional contracting network with successive layers in which pooling operators are replaced by upsampling operators; as a result, these layers increase the output resolution. To localize, the upsampled output is blended with higher-resolution features from the contracting path. A succeeding convolution layer can then produce a more precise output based on this information. Separating touching objects of the same class is another issue in many cell segmentation tasks. To achieve this, we propose using a weighted loss function, in which the separating background labels between touching cells are given a high weight in the loss function. The generated network aids in the segmentation of biomedical data. The architecture of the 2D U-Net is shown in Figure 6.

D) Bagging-based KNN (BKNN) algorithm

The KNN classifier is a variant of the nearest neighbor (NN) classifier method, which makes a simple nonparametric judgment. The distance between the attributes of each test image B_q and the attributes of the other images in the training data is analyzed. The image that is closest to the test image in the feature space is called the nearest neighbor. To calculate the distance between two features, we can use one of the distance functions described next. The city block distance between two coordinates is the sum of the absolute differences between the coordinates. The Euclidean distance is the shortest distance between two points, regardless of the dimensionality. The cosine distance is preferable when two similar data items are separated by a large Euclidean distance because of their magnitudes, since they may still have a small angle between them.
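The three distance functions described above can be written out explicitly. The feature vectors below are toy values for illustration, not features from the paper's dataset.

```python
# City block (L1), Euclidean (L2), and cosine distances for two feature vectors.
import numpy as np

def city_block(a, b):
    # Sum of absolute coordinate differences
    return np.sum(np.abs(a - b))

def euclidean(a, b):
    # Straight-line distance, independent of dimensionality
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_distance(a, b):
    # 1 - cosine similarity: depends on the angle, not the magnitudes
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])   # parallel to a, twice the magnitude
```

Here the city block and Euclidean distances are nonzero (7 and 5), while the cosine distance is zero because the vectors point the same way, which is exactly the magnitude-insensitivity argued for above.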
The KNN algorithm uses the K samples that are closest to the test image. Each of these samples belongs to some class P_i. The test image B_q is assigned to the class C_m that appears most often among the K samples. The value of K, the sample sizes, and their topological distribution over the feature space all affect the effectiveness of this classifier.
• Step-1: Select the number of neighbors, K.
• Step-2: Calculate the Euclidean distance between the new data point and each training sample.
• Step-3: Take the K nearest neighbors according to the Euclidean distances obtained.
• Step-4: Count the number of data points in each category among these neighbors.
• Step-5: Assign the new data point to the category with the greatest number of neighbors.
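The steps above can be sketched as a minimal KNN classifier. The toy 2-D feature vectors and labels are illustrative stand-ins, not the MRI features used in the paper.

```python
# Minimal KNN classifier following the listed steps.
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # Step-2: Euclidean distance from the query to every training sample
    dists = np.sqrt(np.sum((train_X - query) ** 2, axis=1))
    # Step-3: indices of the K nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Step-4: count the neighbors falling in each category
    votes = Counter(train_y[i] for i in nearest)
    # Step-5: assign the category with the most neighbors
    return votes.most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])
train_y = np.array(["normal", "normal", "tumor", "tumor"])
label = knn_predict(train_X, train_y, np.array([7.5, 8.2]), k=3)
```

With k=3, two of the three nearest training samples are labeled "tumor", so the query is assigned to that class.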
A method of enhancing the accuracy of a collection of classifiers by merging their findings through one of the voting methods is known as an ensemble classifier wrapper. A common learning system has two components: a feature detection unit and a decision-making module (classifier). A classification algorithm fits a decision function to all of the training examples. A training algorithm uses training data to set the variables of a classifier, resulting in a certain accuracy rate. The system is then used to forecast the outcomes of a testing data set. In the majority of circumstances, a group of classifiers can provide better accuracy than a single classifier.
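A minimal sketch of such an ensemble wrapper follows. The three member classifiers are trivial hypothetical stand-ins, used only to show the majority-vote mechanics.

```python
# Majority-vote ensemble wrapper; the members disagree only near their thresholds.
from collections import Counter

def ensemble_predict(classifiers, x):
    votes = [clf(x) for clf in classifiers]
    # The class receiving the most votes wins
    return Counter(votes).most_common(1)[0][0]

clf_a = lambda x: "tumor" if x > 0.4 else "normal"
clf_b = lambda x: "tumor" if x > 0.5 else "normal"
clf_c = lambda x: "tumor" if x > 0.6 else "normal"

decision = ensemble_predict([clf_a, clf_b, clf_c], 0.55)
```

For the borderline input 0.55, two of the three members vote "tumor", so the ensemble returns "tumor" even though one member disagrees; this outvoting of individual errors is the accuracy gain the paragraph describes.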

Results and discussion
This brain tumor collection contains 3,064 T1-weighted pictures of three types of tumors. U-Net is used to segment the preprocessed image, and KNN is used to train the model. We divide the data into two categories, training and testing, with training accounting for 80% of the data and testing for 20%. With KNN alone, we obtained a 96.9% accuracy rate. BKNN provides a new methodology for detecting brain tumor disease that uses predictive logic to identify the disease with a high accuracy rate of 97.7%. BKNN is implemented by using KNN as the base estimator in the Bagging classifier. The proposed solution is implemented in Python, an open-source programming language, with Jupyter Notebook serving as the medium for developing the code. The produced scenarios are appropriate and intelligent for detecting disease utilizing real-time datasets and training norms. One of the brain tumor images (meningioma) is displayed in Figure 7.
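A hedged sketch of this BKNN setup, assuming scikit-learn's BaggingClassifier with KNeighborsClassifier as the base estimator and an 80/20 split; the synthetic blob data stands in for the extracted MRI features, and the hyperparameters are illustrative, not the paper's.

```python
# Bagging ensemble with KNN base estimators on synthetic feature vectors.
from sklearn.datasets import make_blobs
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
# 80/20 train-test split, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Each of the 10 KNN members is fitted on a bootstrap resample of the
# training data; predictions are combined by voting.
bknn = BaggingClassifier(KNeighborsClassifier(n_neighbors=5),
                         n_estimators=10, random_state=42)
bknn.fit(X_tr, y_tr)
accuracy = bknn.score(X_te, y_te)
```

On well-separated synthetic data the ensemble reaches near-perfect accuracy; real MRI features are far less separable, which is where the variance reduction from bagging matters.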
At this point, the image data are processed to enhance image quality and to normalize the variables in each training image, so that the extracted image features do not differ substantially. The input files in MAT format are first transformed into PNG format. After preprocessing the input image, it is cropped to the bounding box, forcing a squared image as output, as shown in Figure 8. Cropping strategies are required to ensure that the processed photos stay on the object; cropping eliminates background regions that are not brain tissue. Figure 9 shows the segmented input image. The U-Net architecture is deployed for image segmentation, and the tumor plot is also shown. The ground truth image is depicted in Figure 10. These segmented images are divided by a train-test split function, with 80% of the images used for training and the rest for testing. The model is then trained using BKNN, and its performance is evaluated. Figure 11 illustrates the confusion matrix of the BKNN model, with which we measure the effectiveness of our model.
The classification report for our proposed model is shown in Figure 12. The classification accuracy is the total number of correct predictions divided by the total number of predictions made for the dataset. Precision is the fraction of positive class predictions that truly belong to the positive class. Recall is the fraction of all positive cases in the dataset that are predicted as positive. The F-measure calculates a single score that takes both precision and recall into account. Our model's sensitivity, specificity, fallout, and false negative rate are shown in Figure 13. The true positive rate of a test refers to the proportion of data that are actually positive and receive a positive result from the test in question. The true negative rate, also known as the specificity of a test, is the percentage of data that are actually negative and receive a negative result. The suggested BKNN accuracy ratio is depicted in Figure 14, cross-validated against the traditional KNN classification algorithm. The results graphically display the efficiency of the recommended technique (Table 1).
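The metrics discussed above can be made concrete from a binary confusion matrix; the counts below are hypothetical and chosen only to exercise each formula.

```python
# Computing the reported metrics from a binary confusion matrix.
# The counts (tp, fn, tn, fp) are illustrative, not the paper's results.
tp, fn = 90, 10   # true positives, false negatives
tn, fp = 85, 15   # true negatives, false positives

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # sensitivity / true positive rate
f_measure   = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)          # true negative rate
fallout     = fp / (fp + tn)          # false positive rate
fn_rate     = fn / (fn + tp)          # false negative rate
```

Note that fallout is the complement of specificity and the false negative rate is the complement of recall, so Figure 13's four quantities come in two complementary pairs.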

Conclusion and future work
Although MRI is among the most advanced methods for detecting brain cancers, professionals cannot keep up with diagnosis by looking at MRI images alone. As a result, effective brain tumor classification necessitates computer-assisted diagnosis. Brain tissue segmentation in MRI images has a variety of uses in the diagnosis, surgical planning, and therapy of brain disorders. We use the U-Net architecture for brain tumor segmentation. Medical specialists, on the other hand, must complete the task in a timely manner. Furthermore, the task is difficult due to intensity overlap across distinct tissues induced by intensity inhomogeneity and errors intrinsic to MRI. As a result, we use BKNN to build a unique method for detecting brain tumors; in the Bagging classifier, BKNN is implemented by using KNN as the base estimator. The overall recognition rate or classification accuracy was up to 97.7%, which is higher than that of existing methods. Although the accuracy is improved, large datasets make it expensive to calculate the distance between each new point and every stored point, which reduces the algorithm's efficiency. The classification accuracy can be increased in the future by using supervised approaches such as convolutional neural networks.