Systematic review for lung cancer detection and lung nodule classification: Taxonomy, challenges, and recommendation future works

Lung cancer is one of the most dangerous diseases and requires early diagnosis. Artificial intelligence has played an essential role in the medical field in general, and in analyzing medical images and diagnosing diseases in particular, because it can reduce the human errors that occur when medical experts analyze medical images. In this study, we conducted a systematic survey of the research published during the last 5 years on lung cancer diagnosis and lung nodule classification in four reliable databases (ScienceDirect, Scopus, Web of Science, and IEEE), and we selected 50 research papers using a systematic literature review. The goal of this review is to provide a concise overview of recent advances in lung cancer diagnosis using machine learning and deep learning algorithms. This article summarizes the present state of knowledge on the subject; addressing the findings offered in recent research publications gives researchers a better grasp of the topic. Challenges and recommendations for future work were analyzed in detail, and the published datasets and their sources are presented so that researchers can access them and use them to improve on previously achieved results.


Introduction
In 2020, an estimated 19.3 million new cancer cases and 10 million cancer deaths occurred worldwide. Lung cancer ranked second in the number of new cases, accounting for about 11.4% of all cancer cases, or an estimated 2.2 million cases. Lung cancer is the leading cause of cancer death, accounting for 18% of all cancer deaths. Smoking is a major cause of lung cancer; in some countries, smoking has peaked or continues to increase. This suggests that lung cancer rates will continue to increase for at least several decades [1]. Figure 1 shows the incidence and mortality of the ten most common types of cancer in 2020.
The survival rate for lung cancer patients 5 years after diagnosis ranges from 10 to 20%. Low-dose computed tomography (CT) screening can help detect lung cancer early, which makes the condition more responsive to treatment [1]. In general, it has been reported that if cancer is found early, diagnosed, and treated effectively, the patient's chances of living a long life increase [2]. Analyzing medical data and diagnosing diseases require a medical expert, and experts' opinions often differ when analyzing medical images because of their complexity. Artificial intelligence has played an essential role in the medical field.
Machine learning (ML) and deep learning (DL) algorithms have been used in analyzing and processing medical images and diagnosing diseases in recent years, as they provide promising solutions for applications in medicine [3]. Providing a prediction system that makes accurate diagnoses is still challenging, and research in the field is ongoing. This study surveys what researchers have achieved in improving the efficiency and accuracy of lung cancer diagnostic systems over the last 5 years. It gives a detailed view of each research paper: the data acquisition, the pre-processing approach, the methodologies used to train and evaluate the model, the results reached, the challenges mentioned, the published datasets with their source, size, and number of samples, and the researchers' suggestions for future work. This study is organized as follows: the second section explains the method used to collect the relevant literature. The third section presents the results of that method and divides the articles into groups to make them easier to understand. The fourth section contains the review and the survey, where the methods used in each article and their results are extracted. The fifth section discusses what was presented in the papers. The sixth section explains the limitations of the study. The seventh section concludes the study.

Method
This review followed the systematic review style (PRISMA), one of the strategies for evaluating intellectual output that focuses on a research topic and attempts to locate, assess, choose, and prepare all high-quality scientific evidence and research articles on specific issues [4]. This article used four digital databases, namely, ScienceDirect, Scopus, Web of Science (WoS), and IEEE Xplore. ScienceDirect provides access to research papers published in highly authoritative journals in the fields of technology and science; Scopus is a reliable source for all disciplines such as science, medicine, engineering, and technology. WoS is a highly specialized resource in the sciences, social sciences, humanities, engineering, arts, and interdisciplinary studies. IEEE Xplore delivers full-text access to the world's highest quality technical literature in engineering and technology. The four databases cover the academic aspects of lung cancer detection and nodule classification; this review can contribute to saving lives by providing suggestions for the development of diagnostic systems using DL algorithms.

Search strategy
The search was conducted in the four previously mentioned databases for research published from 2017 to 2021, in the English language exclusively. These limits were chosen because they can cover all developments in the early detection of lung cancer using DL algorithms. We searched using a query consisting of keywords related to improvement or development of detection systems ("improving" OR "improvement" OR "enhancement"), disease-specific keywords as defined in the literature (e.g., "lung cancer" OR "lung carcinoma"), keywords related to detection, diagnosis, or prediction ("diagnosis" OR "detection" OR "predictive" OR "prediction"), and the type of algorithms used to make a prediction ("deep learning" OR "artificial intelligence" OR "machine learning"). The search and selection were made according to Figure 2.
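As an illustration, the keyword groups above can be assembled into a single boolean query string. This is a minimal sketch, not the exact query used in the review: it assumes the groups are combined with AND (the text does not state the operator between groups), the names `KEYWORD_GROUPS` and `build_query` are hypothetical, and the publication-year and language limits as well as any database-specific syntax are not modeled.

```python
# Keyword groups taken from the search strategy described in the text.
# Assumption: terms are OR-ed within a group and groups are AND-ed together.
KEYWORD_GROUPS = [
    ["improving", "improvement", "enhancement"],
    ["lung cancer", "lung carcinoma"],
    ["diagnosis", "detection", "predictive", "prediction"],
    ["deep learning", "artificial intelligence", "machine learning"],
]


def build_query(groups):
    """Quote each term, join terms with OR inside a group, and join groups with AND."""
    clauses = []
    for terms in groups:
        quoted_terms = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted_terms})")
    return " AND ".join(clauses)


query = build_query(KEYWORD_GROUPS)
print(query)
```

Keeping the groups in a list makes it straightforward to add or remove synonyms and rerun the same search in each database.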

Inclusion criteria
1. Articles published in scientific journals in English only.
2. Articles focused on developing different systems, applications, technologies, and algorithms to diagnose lung cancer using ML and DL.
3. Articles that studied lung cancer detection or lung nodule classification.

Exclusion criteria
1. Articles that address lung cancer detection or nodule classification together with another disease (e.g., detecting lung cancer and COVID-19).
2. Articles that detect lung nodules without classification.
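The inclusion and exclusion criteria can be read as a single screening rule applied to each candidate article. The sketch below is only an illustration of that rule: the `Article` record and all of its field names are hypothetical, and in the review itself the screening was done by reading titles, abstracts, and full texts rather than by code.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # All field names are hypothetical; they mirror the criteria in the text.
    language: str
    is_journal_article: bool
    uses_ml_or_dl: bool
    task: str  # "cancer_detection", "nodule_classification", or "nodule_detection_only"
    covers_other_disease: bool


# Inclusion criterion 3: lung cancer detection or lung nodule classification.
INCLUDED_TASKS = {"cancer_detection", "nodule_classification"}


def is_eligible(article: Article) -> bool:
    """Apply inclusion criteria 1-3, then exclusion criteria 1-2."""
    included = (
        article.language == "English"
        and article.is_journal_article
        and article.uses_ml_or_dl
        and article.task in INCLUDED_TASKS
    )
    # The second exclusion check repeats what INCLUDED_TASKS already implies;
    # it is kept explicit so that the code mirrors the written criteria.
    excluded = article.covers_other_disease or article.task == "nodule_detection_only"
    return included and not excluded
```

For example, `is_eligible(Article("English", True, True, "cancer_detection", True))` returns `False`, because the article also targets another disease.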

Result
The query search returned 591 articles from the four databases. Duplicate articles (n = 49) were then removed, leaving n = 542 articles. In the second stage, titles and abstracts were scanned, which reduced the number of articles to n = 132. In the final filter, we read the full text of the articles; we were unable to download 10 papers due to access issues, which left 122 articles, and, based on the inclusion criteria, we finally selected 55 related articles. All the selected articles were then read completely, and a classification was proposed. We found that researchers worked in two areas: the first is lung cancer detection in general, and the second is the classification of lung nodules as benign or malignant. In both areas, some researchers used traditional ML algorithms, while the majority used DL algorithms; the convolutional neural network was found to be the most widely used DL algorithm. The relevant studies were subsequently classified based on the similarity of the methods and algorithms used, as in Figure 3.
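The selection counts reported above can be checked with simple arithmetic. The following sketch uses only the numbers stated in the text; the variable names are illustrative, and the two exclusion counts are derived by subtraction rather than reported in the text.

```python
# Counts reported in the text for each selection stage.
identified = 591
duplicates_removed = 49
after_title_abstract = 132
not_accessible = 10
included = 55

# Values derived by subtraction from the reported counts.
after_dedup = identified - duplicates_removed              # 542, as stated in the text
full_text_assessed = after_title_abstract - not_accessible  # 122, as stated in the text
excluded_on_title_abstract = after_dedup - after_title_abstract
excluded_on_full_text = full_text_assessed - included

flow = [
    ("Records identified", identified),
    ("After duplicate removal", after_dedup),
    ("After title/abstract screening", after_title_abstract),
    ("Full texts assessed", full_text_assessed),
    ("Articles included", included),
]
for label, count in flow:
    print(f"{label:<35}{count:>5}")
```

Under these numbers, 410 records were excluded at the title and abstract stage and 67 at the full-text stage.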

Cancer diagnosis
This category contains articles that dealt with the detection of lung cancer in general, without going into the details of diagnosing lung nodules. It was divided into two sub-categories. The first sub-category, numbering 2, covers articles that used traditional ML algorithms; it was further divided into articles that used one method [4,5] and articles that used more than one method [7]. The second sub-category covers articles that used DL algorithms; it was divided into papers that used one traditional method, numbering 10 [7-16], and articles that suggested new methods, numbering 2 [17,18].

Cancer diagnosis based on nodules
This category contains articles that based the diagnosis of cancer on the classification of lung nodules as benign or malignant. It was divided into two sub-categories. The first sub-category covers articles that used ML algorithms, numbering 5; it was further divided into articles that used one method, numbering 3 [19,20], and articles that used more than one method, numbering 2 [22,23]. The second sub-category covers articles that used DL algorithms, numbering 30; it was divided into articles that used one traditional method, numbering 22 [24,25], and articles that suggested new methods, numbering 8 [46,47]. All studies that depended on diagnosing lung nodules used CT scan data because it is the only modality that can show lung nodules; X-ray images, by contrast, do not show pulmonary nodules. Most of the studies used two class labels (malignant cases and benign cases) or three class labels (normal cases, malignant cases, and benign cases), as shown in Figure 4.

Reviews and surveys
The main objective of this article, and of review articles on a particular topic, is to understand the direction and ideas of the studies and to give suggestions for future work based on the previous articles. In the past, it was challenging to implement artificial intelligence applications in medicine due to the lack of medical data such as chest X-ray images, because the accuracy of classification depends on the amount of training data; the amount of available data is now large, which makes it easier to apply ML and DL algorithms [3]. For lung cancer, many types of data are available that have been used to train DL algorithms and help them in the diagnosis process, such as CT scan, chest X-ray, magnetic resonance imaging (MRI), Contrast-Enhanced Computed Tomography (CE-CT), Positron Emission Tomography (PET) scan, etc. [55]. When we reviewed the 50 selected articles, we found that articles in recent years used low-dose CT scan image datasets and moved away from chest X-ray image datasets, because chest X-rays did not contribute much to the early detection of lung tumors and a false negative also causes a significant delay in diagnosis. Low-dose CT scan (LDCT) images have been widely used because they can show areas of abnormal growth, which can be classified as malignant (cancerous) [55]. The National Lung Screening Trial (NLST) established in 2011 that those who were screened using LDCT had a risk of death about 15-20% lower than their peers who were screened with chest X-rays [56]. We found traditional ML algorithms in 13 of the selected articles, and the largest share was for DL algorithms, as the number of articles that used the convolutional neural network algorithm reached 22. It is the most widely used algorithm because it has proven to be the best at classifying images [55]. The number of uses of the remaining DL algorithms is displayed in Figure 5.
In this study, various data and information related to the topic were collected from the related articles, such as the objectives of the studies, the challenges, and the recommendations for future work, which we explain in detail. The distribution of the selected papers by year of publication is illustrated in Figure 6. The year 2020 had the most published articles related to our field of interest, followed by 2021.
Table 1 gives the details of each article, such as the name of the dataset used in the article, the preprocessing methods, and the methods used in the diagnosis process.We reviewed the best result that the

Discussion
This study aims to explore and review the literature on the diagnosis of lung cancer. It also examines the directions of the researchers' work in relation to the methods used and the challenges that the researchers worked to solve. We present and clarify the methodologies for lung imaging, the datasets used, the nodule segmentation approaches, and the challenges of each article, and then review what the authors have suggested for future work.

Methodologies for lung imaging
Lung image data used to detect diseases are collected in different ways according to the imaging method. Table 2 explains the lung imaging methods and the advantages and disadvantages of each, based on diagnosis and known problems.

Imaging method: X-ray
Application: X-rays are a kind of ionizing radiation used to capture a picture. X-rays are often used to take pictures of bones in order to identify bone cancer, but they do not give medical data for tissues or organs.
Pros: Illness is classified based on information obtained from digital X-ray pictures. It creates a picture that may be reviewed right away.
Cons: Because of picture noise or blurring, an X-ray may not be able to diagnose the condition.

Imaging method: CT
Application: To obtain cross-sectional pictures of the body, computerized tomography (CT scan) employs spinning X-ray equipment and computers. CT scans include more information than standard X-ray images. They display soft tissues, blood vessels, and bones in various parts of the body.
Pros: It can be used to locate nodules and cancers. Much faster than MRI scanning. Provides more information than ultrasonography.
Cons: Extremely high radiation doses are involved. Radiation is dangerous and can lead to cancer.

Imaging method: Ultrasound
Application: Medical ultrasonography uses high-frequency sound waves to detect activities inside the human body. The sound waves are produced and directed towards the region of interest using a transducer, and the results are presented on the integrated monitor in real time. It is suitable for both thick and soft tissues.
Pros: Easily accessible and widely available, noninvasive, rapid, and extremely sensitive.
Cons: The skill of the individual performing the scan has a significant impact on the image quality and interpretation.

Imaging method: MRI
Application: MRI creates pictures of bodily structures using magnetic fields, radiofrequency waves, and a computer.
Pros: Produces comprehensive pictures of organs and other internal body structures. Accurate technique of illness detection throughout the body.

Datasets
Several lung cancer datasets have been published by reliable sources; they are designed for applying artificial intelligence algorithms to tasks such as classification, segmentation, and detection of nodules. These datasets are shown in Table 3.

Lung image segmentation
Lung nodule segmentation is a crucial step in classifying and detecting lung nodules. To check a nodule for malignancy, the lung parenchyma should be segmented first, followed by the nodule itself.
The pulmonary nodule is segmented during both the training and testing phases. There are four kinds of pulmonary nodules: solid, semi-solid, non-solid, and calcified. The surface of a large solid nodule (>10 mm) has a different intensity range than smaller lesions, which makes large solid nodules difficult for identification techniques to capture. Separating the juxta-pleural and juxta-vascular nodules is the most challenging stage, since the contrast of a large solid nodule with the pleura is poor [64]. Segmentation is generally done after the pre-processing stage and has a significant impact on the system's efficiency. It is an essential process for locating the pulmonary lesion. The five best lung segmentation models according to their performance, tested on different datasets, are shown in Table 4.
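The two-step idea described above (segment the lung parenchyma first, then the nodule) can be illustrated on a toy CT slice. This is a deliberately simple sketch and not one of the reviewed models: it assumes pixel values in Hounsfield units, uses an assumed cut-off of -400 HU to separate air-filled lung from denser tissue, and treats any dense region completely enclosed by lung as a nodule candidate. All names and the toy data are hypothetical.

```python
from collections import deque

# Assumed illustrative cut-off: air-filled lung tissue is far below soft tissue in HU.
LUNG_HU_THRESHOLD = -400


def lung_mask(slice_hu, threshold=LUNG_HU_THRESHOLD):
    """Step 1: mark pixels darker than the threshold as lung parenchyma."""
    return [[value < threshold for value in row] for row in slice_hu]


def enclosed_candidates(mask):
    """Step 2: return non-lung pixels that are fully enclosed by lung pixels."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every non-lung pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not mask[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    # Spread through 4-connected non-lung pixels (chest wall, mediastinum, ...).
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside_image = 0 <= nr < rows and 0 <= nc < cols
            if inside_image and not mask[nr][nc] and not outside[nr][nc]:
                outside[nr][nc] = True
                queue.append((nr, nc))
    # Dense pixels never reached from the border are surrounded by lung.
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if not mask[r][c] and not outside[r][c]
    ]


# Toy slice: body tissue on the border, lung inside, one dense pixel in the middle.
toy_slice = [
    [40, 40, 40, 40, 40],
    [40, -800, -800, -800, 40],
    [40, -800, 30, -800, 40],
    [40, -800, -800, -800, 40],
    [40, 40, 40, 40, 40],
]
candidates = enclosed_candidates(lung_mask(toy_slice))
print(candidates)
```

In this sketch, a dense region that touches the chest wall is connected to the outside tissue and is therefore not reported, which is one way to see why juxta-pleural nodules are singled out as the hardest case.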

Challenges
Lung cancer has been a major threat to human health and a leading cause of death worldwide in the last few decades. Diagnosing lung cancer at an early stage is the main way to increase patients' average survival [5]. Many challenges are mentioned in the literature, and researchers are working to address them within the scope of lung cancer diagnosis. Predicting lung cancer at an early stage from CT images is very difficult for the radiologist: it is a time-consuming, costly, tedious, and error-prone task, and the screening process requires very high concentration and skill owing to factors such as low contrast variation, heterogeneity, and visual similarity between benign and malignant nodules [4,6,7,12,16,17,19,20,23-25,29,30,32,34,35,39,40,42,47-51]. Precise detection of lung nodules is a challenging task in the field of medical imaging; in particular, the nodules are extremely unbalanced with high intra-class variance, and the complex lung structure makes this problem even more difficult to address [11,22,35]. In a previous study, the authors demonstrated that machine learning algorithms are effective in diagnosing cancer with FDG-PET imaging in the setting of ultra-low-dose PET scans [9]; another study analyzed the ability to extract automatically generated features using deep structured algorithms in lung nodule CT image diagnosis and compared its performance with traditional computer-aided diagnosis (CADx) systems using hand-crafted features [27]. Acquisition of labeled samples is time-consuming and laborious in the medical field [28]. Reading EBUS images requires well-trained and experienced radiologists; most chest physicians do not have sufficient training or experience to read EBUS images, and there were no studies using the CNN to diagnose EBUS images [14]. Another challenge is distinguishing between benign nodules and nodules that can become malignant [29]. In another study, the author provided evidence that DL networks may be used for mortality risk stratification based on standard-of-care CT images from NSCLC patients [15]. A further challenge is the inability to detect the cancerous nodule around the lobe or lung at an early stage [54]. Studies cannot tell how the CNN works in predicting the malignancy of a given nodule; e.g., it is hard to conclude from the output of the CNN whether the region within the nodule or the contextual information matters [32]. Automatic lung disease detection is a critical and challenging task for researchers because of the noise introduced into the signal during image capture, which may corrupt the cancer image quality and thus degrade performance [19]. Other challenges are the many types of pulmonary nodules and the visual similarity between them and the surrounding tissues [34], and the small number of positive samples in the datasets [42].
Deep convolutional neural networks (DCNNs) always require a large amount of labeled training data, which is not available for most medical image analysis applications due to the work needed in image acquisition and particularly image annotation [53]. Clinical imaging acquisition may be irregularly sampled, and such sampling patterns may be commingled with clinical usages [47]. Lung cancer diagnosis in patients produces many false positives.
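Two of the challenges above, few positive samples and many false positives, are easier to see with numbers. The sketch below computes accuracy, sensitivity, specificity, and precision from confusion-matrix counts; the function name and the cohort counts are hypothetical and are not taken from any reviewed study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of malignant nodules detected
        "specificity": ratio(tn, tn + fp),  # share of benign nodules cleared
        "precision": ratio(tp, tp + fp),    # share of positive calls that are malignant
    }


# Hypothetical imbalanced cohort: 1,000 nodules, of which only 50 are malignant.
metrics = diagnostic_metrics(tp=40, fp=95, tn=855, fn=10)
for name, value in metrics.items():
    print(f"{name:<12}{value:.3f}")
```

With these illustrative counts, accuracy is 0.895 and specificity is 0.900, yet precision is only about 0.296, because the 95 false positives outnumber the 40 true positives. This is why accuracy alone can be misleading on imbalanced nodule datasets.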

Recommendations for future work
This section presents the future work proposed in the articles, in which the authors suggested ways to develop efficient diagnostic systems. As potential future work, the authors of [11] would like to incorporate patient referral (or a reject option) as part of the training strategy and to learn models that automatically reject the most uncertain decisions. They would also like to visually analyze learned feature representations to assess whether they could be used as informative biomarkers and help radiologists better understand and interpret CADe/CADx results [11]. Future research can focus on the limitation that many objects are generated in the initial segmentation process, which increases the computation time [23]. The FSVT-KIRF-based lung cancer nodule detection method can be extended for the early detection and diagnosis of lung cancer. A semantic segmentation method can be developed using a CNN model and an optimization algorithm that would maintain high efficiency while obtaining even better results in detecting lung cancer nodules [20]. In one study, the author proposed a Hybridized Heuristic Mathematical Model for predicting lung cancer at an earlier stage as future work [18]. Another direction is exploring classification network architectures that accept inputs at multiple simultaneous resolutions or of variable sizes, an approach common to the fully convolutional networks used in image segmentation [15]. The author of ref. [8] suggested enhancing the work by using MRI and ultrasound images. Proposed future work will focus on discovering more effective deep learning methods for characterizing lung nodules [48]. In another study, the author used ant colony optimization with an ANN for better results [10]. Pulmonary nodule detection should be added so that medical imaging interpretation can be performed without requiring human-cropped nodule regions [35]. Automatic detection of high-level nodule attributes and their use for malignancy determination, as well as the use of imaging modalities such as PET, could be considered for diagnostic imaging of lung cancer and treatment planning within the TumorNet framework [36]. In medical image classification, lung CT image classification scenarios could use more effective network structures to extract common features from images of different types and perform cross-type matching to improve performance [50]. Another direction is finding an approach to detect the accurate location of cavitary and juxta-vascular nodules [26]. Standardizing image quality and reducing contrast in medical laboratories can increase classification accuracy [5]. Reinforcement learning could be applied to automatically optimize the architecture of the reconstruction network and the classification network and the settings of hyper-parameters, such as the weighting factor λ, batch size, and learning rate, aiming to make the proposed MK-SSAC model more accurate and more efficient [53]. The authors of [42] plan, as future work, to regularly update the LCRM as more data are collected. They intend to package the model as software and combine its predictions with medical expertise using Bayes reasoning. Experts' experience can help improve the interpretability of the LCRM and encourage better performance when examples of positive cases are limited. In addition, the LCPM model can take clinical test variables, imaging data, and genetic data into consideration, so as to improve the prediction performance and provide a prognostic evaluation function [42]. The authors of [58] will focus on lung nodule malignancy prediction with pathology data and data with uncertain labels; moreover, they plan to analyze neural networks to further increase model interpretability [58]. The extraction of more advanced features that could be useful, along with malignancy, to improve cancer prediction is also planned [52]. The author of [47] would recommend using the convex function as the backbone since the DLSTM1 achieves the most robust performances across different settings and metrics [47]. Nodules adjacent to blood vessels and the dura cavity can be detected accurately by diagnostic methods [23]. Another author recommends that, in the future, more focus be placed on promoting performance in clinical practice and on constructing new, lighter, and effective subnetworks [33]. The most common future work proposal was to increase the volume of data of various types and to collect data from different sources to increase the accuracy of lung cancer diagnosis [13,16,20,26,36,38,50]. A future study was suggested to increase the capacity to classify numerous types of pulmonary nodules by gathering data such as pulmonary nodule-connected vessels and lung walls for network model training [28]. Finally, authors recommend an end-to-end explainable CAD system for lung cancer diagnosis that integrates nodule detection, segmentation, and malignancy prediction, which would be of extensive clinical application value.

Limitation
We were unable to access about 15 research papers that could have helped in developing this review.

Conclusion
This research aimed to analyze the literature related to the field of lung cancer diagnosis. The authors of the reviewed studies either suggested new solution methods or changed the pre-processing and segmentation methods to improve the accuracy of the diagnosis. Some of them demonstrated their ability to diagnose lung cancer on new types of datasets. The studies were reviewed based on the authors' choices (dataset used, pre-processing approaches, and method used) together with the results obtained for diagnosis (area under the ROC curve, accuracy, precision, specificity, and sensitivity). This study reviewed the published datasets that are available to researchers, their sources, and their sizes. The idea of lung nodule segmentation was illustrated, and five of the best models for this purpose were listed. The challenges identified in the literature were recorded, and suggestions were given for future work that could help researchers in the field.
Our suggestions to researchers for future work cover several directions that this research scope lacks. They are as follows:
1. Focus on three-class and four-class classification based on the type of nodule or the type of malignant tumor, such as squamous cell carcinoma, adenocarcinoma, and large cell carcinoma.
2. Collect more lung image data from various types of imaging and publish private datasets so that researchers can compare their results with each other, with a focus on making MRI and ultrasound datasets available. Ultrasound imaging has proven its high ability to identify many diseases, for example identifying a complete gynecological abnormality [70].
3. Work on the segmentation of large solid nodules, because of its difficulty and the lack of research on it.
4. Develop a lung cancer detection model that can distinguish between a tiny malignant lesion and a benign nodule at an early stage.
5. To increase the efficiency of automated tumor diagnosis, examine additional patient information such as medical history and genetic reports and combine it with deep features derived from lung scan images.
6. Use different pre-processing methodologies and filters to improve image quality, such as harmony search to improve gray-scale image quality [71] and edge-preserving image processing [72].
7. Use cloud computing technologies to build a system for the remote diagnosis of lung cancer using ELM, as this approach has been applied to the remote detection of breast cancer [73].
8. To extract features from lung medical images, employ a cat swarm-optimized deep belief network [74].

Figure 1: Case and mortality chart for the ten most common types of cancer for both sexes [1].

Figure 2: Study selection flowchart with exact query and inclusion criteria.

Figure 3: Taxonomy of studies of lung cancer diagnosis.

Figure 4: Sample of three types of CT scan of patients in classification operation.


Table 3: Explanation of datasets used in studies

Table 4: Top 5 lung nodule segmentation models