Machine learning approaches for diagnosing depression using EEG: A review

Abstract Depression has become one of the most pressing public health issues, threatening the quality of life of over 300 million people worldwide. Nevertheless, clinical diagnosis of depression still relies on behavioral assessment. Because objective laboratory diagnostic criteria are lacking, accurate identification and diagnosis of depression remain elusive. With the rise of computational psychiatry, a growing number of studies in recent years have combined resting-state electroencephalography with machine learning (ML) to aid the diagnosis of depression. Despite the exciting results, these studies also raise concerns. ML prediction models should therefore be continuously improved to better screen for and diagnose depression. In the future, this technique may also be applied to the diagnosis of other psychiatric disorders.


Introduction
Depression is a common mood disorder that has a substantial negative impact on the physical and mental health of patients [1,2]. Typical symptoms of depression include low energy, fatigue, depressed mood, and, in severe cases, self-injurious or suicidal behavior [3]. A recent WHO survey has shown that the number of people with depression worldwide exceeds 300 million [4]. However, the clinical diagnosis of depression still relies on the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and the subjective judgment of clinicians. Accurate identification and diagnosis of depression remain elusive because objective laboratory diagnostic criteria are lacking. Fortunately, the development of modern neurophysiological techniques offers a potential strategy for early disease detection. The application of these techniques to clinical diagnosis has made considerable progress in recent years.
Electroencephalography (EEG) is widely used in neuroscience as a non-invasive neurophysiological technique. Compared with functional magnetic resonance imaging, EEG recordings offer shorter test times and lower cost, making them more suitable for identifying various psychiatric disorders [5]. Resting-state EEG (rsEEG) can reflect the activity of human brain networks. Several studies have indicated that the frequency-domain characteristics and functional connectivity (FC) of rsEEG are important in identifying depression [6,7]. The analysis of rsEEG features might help unravel the complex neural mechanisms underlying depression. With the development of computational psychiatry [8], the use of rsEEG-based machine learning (ML) techniques to identify disease phenotypes has attracted increasing attention, providing a theoretical basis for the clinical diagnosis of depression. Since Ahmadlou et al. first applied ML techniques to the early identification and diagnosis of depression [9], an increasing number of original studies have been published with exciting results [10][11][12]. The rational application of rsEEG-based ML for diagnosing depression could therefore help clinicians make rapid decisions about diagnosis and treatment.
To systematically analyze ML approaches for diagnosing depression using rsEEG, this study reviewed the literature pertaining to rsEEG-based ML for depression diagnosis. (1) A total of 36 related articles were included after systematically searching domestic and international databases and filtering by specific criteria. (2) The ML approaches used in these studies and their accuracy were highlighted. Finally, this study discusses the current status of rsEEG-based ML studies in depression diagnosis and offers suggestions for future research.

Literature search strategy
We searched domestic and international databases for records published from 1 January 2010 to 1 June 2022. The Chinese databases were Zhiwang (CNKI), Wanfang, and Weipu (VIP), and the English databases were PubMed, Web of Science, and Medline. We combined subject terms and keywords in the search: ("depression" OR "depressive disorder") AND ("electroencephalography" OR "EEG") AND ML. The database search identified 435 articles. Backward reference searching and hand searching expanded the pool, giving a total of 449 retrieved articles.

Inclusion and exclusion criteria
We screened the retrieved literature against the following inclusion criteria: (1) the main purpose was depression diagnosis; (2) the sample included patients with unipolar depression and healthy controls; (3) rsEEG data were the input data; (4) depression was detected using ML; and (5) accuracy was the primary outcome. Duplicates, conference papers, and articles without accessible full text were excluded. In total, 36 relevant articles met the inclusion and exclusion criteria and were included.

Results
Our study systematically reviewed 36 articles on depression diagnosis published between 2010 and 2022 to illustrate the current value of rsEEG-based ML approaches in depression diagnosis. Because the methods differed between studies, we focus on the sample size, EEG data acquisition and preprocessing methods, feature extraction and selection, types of ML techniques, and the accuracy in depression diagnosis reported in this literature, as shown in Figures 1 and 2.

Sample size
Ahmadlou et al. published the first rsEEG-based ML study for the diagnosis of depression, analyzing a sample of 24 cases [9]. Subsequently, Puthankattil and Joseph and Faust et al. increased the sample size (60 cases each) and conducted similar studies to obtain more credible results [10,12]. Hosseinifard et al. then published a study with a larger sample (90 cases) [11]. Bairy et al. and Mohammadi et al. collected 60 and 96 cases, respectively, for rsEEG-based ML analysis of depression diagnosis [13,14]. In the same year, Acharya et al. included 30 cases for analysis and reused the data in a subsequent study [15,16]. Later, Mumtaz et al. published four studies with sample sizes ranging from 60 to 64 cases [17][18][19][20]. Liao et al. included a sample of 24 cases in their study [21]. One year later, Cai et al. and Wan et al. included 265 and 65 cases, respectively, in their analyses [22,23]. In 2020, nine studies analyzed samples ranging from 32 to 92 cases [24][25][26][27][28][29][30][31][32]. An increasing number of researchers have conducted studies using previously collected or public datasets [29,31,33-36]. More recent studies reported sample sizes of 20 to 400 cases [37][38][39][40][41][42][43][44]. To our knowledge, 400 cases is the largest sample size to date. Overall, the studies included in this review covered a total of 2,545 cases. The distributions of the training and testing sets are shown in Table 1.

EEG data acquisition methods
The number of electrodes, sampling frequency, and sampling duration varied slightly between studies, which might lead to different analytical results. Ahmadlou et al. first recorded eyes-closed rsEEG signals for 3 min in depressed patients and healthy controls using 19 electrodes (10/20 standard) at a sampling frequency of 256 Hz [9]. Many subsequent studies employed the same number of electrodes and sampling frequency, with only minor changes in sampling duration. With the development of EEG acquisition technology, many researchers have raised the sampling frequency to 500 Hz or higher. In recent years, some studies have used 64-electrode EEG devices to record depressed patients and healthy controls and thereby improve the reliability of rsEEG signals [24,25,27]. Furthermore, researchers have become increasingly interested in region-specific EEG signals. [...] on the performance of rsEEG-based ML in the diagnosis of depression [23]. The sampling duration of these studies tended to be longer, up to 30 min (Table 1).

EEG data preprocessing
Because EEG signals are weak, unstable, and prone to interference, preprocessing is critical. Among the various noise-removal methods, most studies used manual inspection or filters.
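As a rough illustration of what automated noise removal can look like, the sketch below splits a single-channel recording into epochs, discards epochs whose peak-to-peak amplitude exceeds a limit, and subtracts the mean of each remaining epoch. This is a simple amplitude-threshold rejection, not the procedure of any reviewed study; the function names, the epoch length, the 100 µV limit, and the synthetic data are all assumptions made for the example.

```python
def split_epochs(signal, epoch_len):
    """Cut a 1-D signal into consecutive non-overlapping epochs; drop the remainder."""
    n_epochs = len(signal) // epoch_len
    return [signal[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]


def reject_noisy_epochs(epochs, max_peak_to_peak):
    """Keep only epochs whose peak-to-peak amplitude stays within the limit."""
    return [e for e in epochs if max(e) - min(e) <= max_peak_to_peak]


def remove_mean(epoch):
    """Subtract the epoch mean (a crude baseline correction)."""
    mean = sum(epoch) / len(epoch)
    return [x - mean for x in epoch]


# Synthetic single-channel recording in microvolts (illustrative values only).
recording = [5.0, -4.0, 6.0, -5.0,    # clean epoch
             4.0, 250.0, -3.0, 2.0,   # epoch containing a large artifact
             -6.0, 5.0, -4.0, 6.0]    # clean epoch

epochs = split_epochs(recording, epoch_len=4)
# The 100 µV threshold is an assumed value, not one taken from the reviewed studies.
kept = reject_noisy_epochs(epochs, max_peak_to_peak=100.0)
clean = [remove_mean(e) for e in kept]
print(len(epochs), "epochs recorded,", len(clean), "kept")
```

In practice, a threshold of this kind is usually combined with band-pass or notch filtering and visual inspection, and the appropriate limit depends on the recording hardware and reference scheme.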

Data feature extraction and selection
Feature extraction and selection are among the most important steps in ML. Feature extraction refers to deriving linear or nonlinear features from EEG data. Feature selection further reduces the dimensionality of the features to remove redundant and irrelevant information. [...] [38,41]. Over time, a variety of feature extraction and selection methods have been widely applied in this field (Table 1). [...] [36]. Subsequently, they also presented dictionary learning approaches for automated major depressive disorder (MDD) diagnosis [42]. Support vector machines (SVM) have been widely employed in related studies, both past and present [39-41,43,44]. Meanwhile, an increasing number of studies have used convolutional neural networks (CNN), or CNN combined with long short-term memory (LSTM) networks, to distinguish the EEG of healthy individuals from that of depressed patients (Table 1).
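To make the idea of a linear, frequency-domain feature concrete, the following sketch computes the relative power of a frequency band from a single-channel signal. It is a minimal example under stated assumptions: the band limits (alpha 8-13 Hz, beta 13-30 Hz), the 1-40 Hz reference range, the 128 Hz sampling frequency, and the synthetic signal are illustrative choices, and the function names are hypothetical. A direct discrete Fourier transform is used only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Return (frequencies, power) for the one-sided DFT of a real signal.

    A direct O(N^2) DFT keeps the example dependency-free; an FFT library
    would be the practical choice for real recordings.
    """
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def relative_band_power(signal, fs, band, total=(1.0, 40.0)):
    """Share of spectral power in `band` relative to the `total` range (Hz)."""
    freqs, power = power_spectrum(signal, fs)
    in_band = sum(p for f, p in zip(freqs, power) if band[0] <= f < band[1])
    in_total = sum(p for f, p in zip(freqs, power) if total[0] <= f < total[1])
    return in_band / in_total


fs = 128  # sampling frequency in Hz (assumed for this example)
# Two seconds of synthetic data: a strong 10 Hz component plus a weak 20 Hz one.
signal = [math.sin(2 * math.pi * 10 * t / fs)
          + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
          for t in range(2 * fs)]

alpha = relative_band_power(signal, fs, band=(8.0, 13.0))
beta = relative_band_power(signal, fs, band=(13.0, 30.0))
print("relative alpha power:", round(alpha, 3))
print("relative beta power:", round(beta, 3))
```

Computing one such value per band and per electrode yields a feature vector, which a feature selection step can then prune before classification.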

Validation strategies
Most studies reported how they assessed the stability of model performance. K-fold cross-validation (K = 10 or 5) was one of the most commonly used methods for assessing classification performance. Some studies assessed the reliability of classification accuracy with leave-one-out cross-validation (LOOCV) and its variants. Only two of the included studies used hold-out validation (Table 1).
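The splitting logic behind these validation schemes can be sketched in a few lines. The example below generates train/test index sets for shuffled K-fold cross-validation and treats LOOCV as the special case in which K equals the number of samples. The function names, the fixed random seed, and the sample count of 24 are assumptions for illustration only.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices out so that fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def leave_one_out_indices(n_samples):
    """LOOCV is k-fold cross-validation with k equal to the number of samples."""
    return k_fold_indices(n_samples, k=n_samples)


splits = list(k_fold_indices(n_samples=24, k=10))
loo_splits = list(leave_one_out_indices(24))
print(len(splits), "folds; test fold sizes:", sorted(len(test) for _, test in splits))
print(len(loo_splits), "leave-one-out splits")
```

When several EEG epochs come from the same participant, the indices should refer to participants rather than epochs, so that data from one person never appear in both the training and the testing set.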

Accuracy of various ML strategies in depression diagnosis
Ahmadlou et al. found that Higuchi's fractal dimension (HFD) of the beta rhythm was a more effective feature for distinguishing healthy from depressed individuals, and the classification accuracy of the enhanced probabilistic neural network (EPNN) was 91.30% [9]. Puthankattil and Joseph used an artificial neural network (ANN) to classify normal and depressed signals and obtained an accuracy of 98.11% [10].
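For readers unfamiliar with HFD, the sketch below shows one common way to estimate it: curve lengths are computed at several time scales k, and the fractal dimension is taken as the slope of log(length) against log(1/k). This is a generic illustration of Higuchi's method, not the implementation used in [9]; the function name, the default `k_max=8`, and the synthetic test signals are assumptions.

```python
import math
import random


def higuchi_fd(signal, k_max=8):
    """Estimate the Higuchi fractal dimension of a 1-D sequence."""
    n = len(signal)
    if n < 2 * k_max + 1:
        raise ValueError("signal is too short for the chosen k_max")
    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # starting offsets 0 .. k-1
            n_steps = (n - 1 - m) // k
            dist = sum(abs(signal[m + i * k] - signal[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalise for the number of steps and for the step size k.
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        mean_length = sum(lengths) / len(lengths)
        if mean_length == 0:
            raise ValueError("signal is constant; HFD is undefined")
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(mean_length))
    # Least-squares slope of log(length) against log(1/k).
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_length) / len(log_length)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_k, log_length))
    den = sum((x - mean_x) ** 2 for x in log_inv_k)
    return num / den


# A straight line is maximally regular; white noise is maximally irregular.
line_fd = higuchi_fd([float(i) for i in range(200)])
rng = random.Random(1)
noise_fd = higuchi_fd([rng.gauss(0.0, 1.0) for _ in range(2000)])
print("line:", round(line_fd, 3), "noise:", round(noise_fd, 3))
```

The two synthetic signals bracket the range of the measure: a smooth line gives a value close to 1, whereas white noise gives a value close to 2. EEG rhythms typically fall between these extremes.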

Discussion
In recent years, an increasing number of studies have combined EEG with ML for the diagnosis of depression, with promising results. Among the included studies, the highest classification accuracy reached 99.5% [12], which offers a potential strategy for screening and early prevention of clinical depression. Although MRI-based ML studies have attracted considerable attention over time [45][46][47], EEG-based ML has performed better in depression diagnosis in terms of both cost and classification accuracy [5]. It is worth pointing out that, despite the high accuracy of such studies in distinguishing healthy individuals from patients with depression, additional studies are needed to confirm their reliability and variability. Across the published studies, the overall classification accuracy varied widely, from 76 to 99.5%. This variation might be related to the sample size, the data collection and preprocessing methods, the feature combinations, and the ML models used in different studies.
Small sample size is a common problem in most current EEG-based ML studies. Of the 36 studies published from 2010 to date, only three had a sample size of more than 100 cases [22,38,40]. The limited sample size further constrains the diagnostic utility of EEG signals at the individual level of depression, which might be one of the important reasons for the unstable classification accuracy. Although a larger sample size does not necessarily improve diagnostic accuracy [22], larger samples are essential for making prediction models applicable to the wider population. With the publication of more studies in related fields and the improvement of public databases, a growing number of studies have used larger volumes of data for further analysis. This has addressed the above-mentioned issues to some extent [29-31,33,35,36].
However, the generalizability of results from public databases is limited by variability in data collection and data processing. It is therefore essential to keep improving the public databases. In the future, researchers need to focus on the standardization and reproducibility of EEG data acquisition and processing. In addition, the distributions of the training and testing sets should be reported, because they have a direct impact on the accuracy and clinical applicability of the results.
Feature extraction and selection are indispensable steps in ML. The use of appropriate features improves the overall performance of the prediction model. Although some researchers have demonstrated that deep learning (DL) prediction models trained on raw EEG data can perform excellently (98.32%) [20], selecting EEG features of depression remains a crucial strategy for improving a model's diagnostic accuracy. A large number of studies have reported EEG differences between depressed and healthy individuals [49,50]. In patients with depression, the asymmetry of different rhythms between the left and right hemispheres is one of the most valuable neurophysiological indicators [17]. Similarly, graph theory analysis based on FC has been widely used to study abnormalities in depression [31]. Furthermore, various nonlinear features have been widely used in ML models and have outperformed linear features in the diagnosis of depression [12,13]. It is worth noting that some studies used a genetic algorithm (GA) to reduce the dimensionality of the features and thereby improve classifier performance [14,24,25]. It is therefore important to select and combine EEG features rationally to further improve the accuracy of ML models, especially the left-right hemispheric asymmetry of the alpha rhythm, FC, and various nonlinear features.
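As a small worked example of an asymmetry feature, hemispheric asymmetry of a rhythm is often expressed as the difference between the log-transformed band powers of a homologous electrode pair. The sketch below uses that convention, ln(right) - ln(left); the function name, the choice of F3/F4 as the example pair, and the power values are hypothetical and are not taken from the reviewed studies.

```python
import math


def asymmetry_index(power_left, power_right):
    """Return ln(right) - ln(left); positive values mean more power on the right."""
    if power_left <= 0 or power_right <= 0:
        raise ValueError("band power must be positive")
    return math.log(power_right) - math.log(power_left)


# Hypothetical alpha-band powers (in µV²) for a homologous pair such as F3/F4.
alpha_f3, alpha_f4 = 4.2, 5.1
index = asymmetry_index(alpha_f3, alpha_f4)
print("alpha asymmetry index:", round(index, 3))
```

The log transform makes the index depend only on the ratio of the two powers, so it is unaffected by overall amplitude differences between participants.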
Most studies used SVM and its variants as the main classifier, which may be related to its sound theoretical foundation and its flexibility in handling high-dimensional data [12,13,36,44,45]. SVM was used to classify nonlinear EEG features of depression with good accuracy in almost half of the past studies. However, these studies are also limited by issues such as small sample sizes and excessive numbers of nonlinear features, which may lead to overfitting. It has been reported that leave-one-out or k-fold cross-validation can help guard against overfitting of the prediction model and thus improve its generalization ability [49]. Unfortunately, some studies did not report explicit internal or external validation information and therefore could not ensure the reliability of the reported accuracy. In the future, it will be key to select appropriate ML methods, data properties, and validation methods. At present, DL, especially CNN, is increasingly applied to depression diagnosis. The self-learning capability of CNN can effectively extract and integrate valid information from complex data to achieve better predictive ability (95.96%) [16]. The application of DL to assist depression diagnosis is therefore a focus for future research.
The diagnostic heterogeneity of depression might be one of the reasons for the differing results [48]. Different studies of EEG-based ML for depression diagnosis may have used different diagnostic tools and criteria, which could influence classifier performance [43,44]. Furthermore, depression is often comorbid with other mental disorders such as anxiety, substance use disorders, and borderline personality disorder [51,52]. It is also difficult to distinguish between bipolar and unipolar depression [53]. EEG characteristics might differ across psychiatric symptoms. Fortunately, some studies have begun to employ EEG-based classification models to identify various depressive symptoms [54,55]. In particular, researchers found that resting-state connectivity biomarkers could be used to define neurophysiological subtypes of depression [56]. This would be an important reference for precisely identifying clinical subtypes of depression.
In conclusion, ML prediction models need to be continuously optimized. To move the diagnostic window of depression forward and effectively prevent its onset and progression, strategies such as increasing the sample size, combining multiple EEG features, and using DL models should be adopted. In the future, this would benefit many patients with psychiatric disorders and high-risk groups, especially those with affective spectrum disorders.
At the same time, the limitations of our article should be considered carefully. Our study only discusses ML models driven by EEG signals alone in the diagnosis of depression, which lack the clinical utility and accuracy of current ML models that combine multimodal data. Given these limitations, we will further integrate socio-epidemiological survey data, neurobiological and molecular biology techniques, and other multimodal data to build more accurate prediction models, which may eventually provide new strategies for the early diagnosis of depression as well as other psychiatric disorders.
Funding information: We acknowledge funding from the National Innovation and Entrepreneurship Training Program (Grant: 201910403062) and the Science and Technology Plan Projects of Jiangxi Provincial Health Commission (Grant: 20195621). The funding sources were not involved in the design of the study; in the collection, analysis, and interpretation of the data; in the writing of the report; or in the decision to submit the article for publication.

Conflict of interest: The authors state no conflict of interest.
Data availability statement: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.