The electroencephalogram (EEG) measures and records the electrical activity generated by brain structures. When brain cells (neurons) are activated, they produce local current flows; electrodes placed on the scalp can therefore collect and record the electrical activity of the brain. Electroencephalographic recording is thus a completely non-invasive procedure that can be applied repeatedly to patients, healthy adults, and children with virtually no risk or limitation. The EEG signal is widely regarded as a chaotic dynamical system [5, 6]: its amplitude varies irregularly over time, see Figure 1 [4, 12, 13].
Research on EEG signals offers an excellent opportunity to improve the understanding of brain function. The analysis of the EEG signal has thus been used to detect abnormal brain function in disorders such as autism, dementia, schizophrenia and depression [8, 19–21]. Besides medical applications, the information extracted in this type of analysis has been used for non-medical purposes such as mental fatigue estimation, emotion recognition or brain-computer interfaces (BCI).
A BCI offers an alternative to natural communication and control. It is an artificial system that bypasses the body’s normal efferent pathways, which are the neuromuscular output channels. This means that there is direct communication between the brain and the computer. A general motor imagery (MI) BCI system can be described in 5 main steps, as presented in Figure 2. All BCI systems start with a brain signal, so it is crucial to acquire the brain activity with electrodes placed on the scalp in the most suitable areas for each experiment. In the first step, the user performs a predefined mental task (motor imagery). The EEG signal changes according to the task, and these signals are acquired by the electrodes and transmitted to the next step. In step 2, the EEG signal is digitized, amplified and filtered in order to remove undesired signals called artifacts. Artifacts can be bio-signals, such as heartbeat or breathing, or noise, such as power-line interference. The clean signal then goes to step 3, where the system applies several techniques to compute features that describe the EEG signals for the pattern recognition stage. The features are classified in step 4 (pattern classification), and the predictions are used or presented in step 5. Finally, the user observing the application gets feedback from the system and can modify his or her EEG signals to improve the performance of the BCI system.
Although there are several studies on BCI systems, each one usually selects one or two feature extraction techniques for step 3 and one or more classifiers for step 4, because there is no single way to implement a BCI system. In addition, researchers commonly use signals from repositories available on the Internet, such as BCI competitions I-IV, or from expert users. It is well known that a BCI system requires the user to adapt to the system, so the user must be trained in order to get good results [15–17]. In an MI BCI experiment, an accuracy between 80% and 90% is expected after 6-9 training sessions of 20 minutes. Nevertheless, according to the state of the art, certain subjects may have difficulties using MI-based BCI systems and, in these cases, classification performance is quite poor even after multiple training sessions. Therefore, subjects with good classification performance are usually preselected for the experiments. In contrast, this work analyzes the response of a system to EEG signals from 5 novice users without previous selection. We test power spectral density (PSD) as a feature extraction method combined with three classifiers, extreme learning machine (ELM), linear discriminant analysis (LDA) and support vector machine (SVM), in order to find the combination that reaches the highest classification accuracy.
Extreme learning machine (ELM) provides fast and efficient training of a multilayer perceptron (MLP). Although Huang formalized the idea of the ELM algorithm [25, 26], it had previously been analyzed in other works [23, 24]. Huang demonstrated that the ELM is a universal approximator for a wide range of random computational nodes. ELM has been used in BCI systems in its classic form, in a voting optimized strategy, in a weighted probabilistic model, as an adaptive extreme learning machine, and in other variants [37–40]. However, it has not been tested with novice users.
The main contribution of this work is to test the suitability of ELM for classifying EEG signals from novice subjects in BCI systems, a type of user with which it had not been used until now.
The paper is organized as follows: Section 2 describes the acquisition of the EEG signal, the feature extraction and the ELM; Section 3 describes the experimental work; Section 4 presents and discusses the experimental results; and Section 5 states the main concluding remarks. Finally, the article closes with the acknowledgements to the funding organizations and to our late mentor.
2 Material and methods
2.1 EEG signal generation and acquisition
For this work we have used a dataset generated at the University Centre of Defence at the Spanish Air Force Academy, Spain. The dataset is composed of EEG signals from 5 right-handed male volunteers (21, 30, 30, 33 and 33 years old) with normal vision performing a motor imagery task. They imagined right and left hand movements according to a predefined timing. It is important to remark that none of these subjects had used a BCI system before the current experiment. This means that there has been no previous selection of users; therefore, lower classification results are expected than with expert users.
The EEG signals were captured with a g.USBamp (g.tec Medical Engineering GmbH, Austria) using two bipolar channels (C3 and C4); accordingly, passive Ag/AgCl electrodes were placed at positions FC3, CP3, FC4 and CP4. The amplifier was configured to acquire the data at 256 Hz with 8 bits. A band-pass filter between 0.5 and 30 Hz was also applied to remove artifacts. The users were comfortably seated in front of a screen during the experiments. They were asked to stay relaxed and, while looking at the screen, to imagine the movement of the right or left hand when the BCI system required it. Each experiment was composed of 40 trials (20 left hand and 20 right hand).
At the beginning of each trial (see Figure 3) the screen is black; then a cross appears and, at second 2, an acoustic beep sounds to attract the user’s attention. At the third second of the trial the system presents an arrow pointing to the left or right for 1.25 s, and the subject must imagine the movement of the corresponding hand. The user must imagine the movement for 4 s, so each trial lasts 8 s. The system does not provide feedback to the users. Finally, it is important to remark that there is a random pause of 0.5 to 2.5 s between trials to avoid adaptation.
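The trial timing described above can be written down as a small schedule generator. This is only a minimal sketch and not the stimulation software used in the experiment: the function name `build_session`, the event labels and the random shuffling of the cue order are assumptions made for illustration, while the timing constants are taken from the text.

```python
import random

TRIAL_LENGTH_S = 8.0      # each trial lasts 8 s (from the text)
BEEP_AT_S = 2.0           # acoustic beep at second 2
CUE_AT_S = 3.0            # arrow appears at second 3
CUE_DURATION_S = 1.25     # arrow stays on screen for 1.25 s
ITI_RANGE_S = (0.5, 2.5)  # random pause between trials


def build_session(n_per_class=20, seed=None):
    """Return a list of trials with absolute event times in seconds."""
    rng = random.Random(seed)
    labels = ["left"] * n_per_class + ["right"] * n_per_class
    rng.shuffle(labels)  # assumption: the cue order is randomized
    session, t = [], 0.0
    for label in labels:
        session.append({
            "label": label,
            "start": t,
            "beep": t + BEEP_AT_S,
            "cue_on": t + CUE_AT_S,
            "cue_off": t + CUE_AT_S + CUE_DURATION_S,
            "end": t + TRIAL_LENGTH_S,
        })
        # The next trial starts after a random inter-trial pause.
        t += TRIAL_LENGTH_S + rng.uniform(*ITI_RANGE_S)
    return session


session = build_session(seed=0)
print(len(session), "trials, session length about", round(session[-1]["end"]), "s")
```

Keeping the event times in one structure makes it straightforward to cut the recorded signal into trials later on.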
2.2 Feature extraction methods
In BCI, researchers have applied different techniques in step 3 [4, 10, 11]. In this work, power spectral density (PSD) has been computed as the feature extraction approach in order to test whether ELM is a suitable method for classifying EEG signals from the first session of novice users in BCI systems. It is important to remark that PSD is a standard method for these systems. A brief explanation of the algorithm used in this work follows.
2.2.1 Power spectral density
It is usual to decompose EEG signals into four bands called α, β, θ and δ [2, 3]. However, researchers usually use only two bands in motor imagery BCI systems, α (8-13 Hz) and β (13-30 Hz) [7, 10]. α waves are rhythmical waves found in the EEGs of most adults when they are awake. When the awake person’s attention is directed to some specific type of mental activity, the α waves are replaced by higher-frequency β waves. θ waves (4-8 Hz) normally occur in parietal and temporal regions in children, but they also appear during emotional stress in some adults.
It is important to note that only the α and β bands are computed, because they carry the most discriminative information for BCI experiments. To compute these energies, the first step is to obtain the fast Fourier transform (FFT) of the EEG signals. Then, the coefficients corresponding to the α and β bands are added. Finally, the power of this sum is obtained by:

$$P_\alpha = \frac{1}{p}\,\mathrm{FFT}(EEG_\alpha)\cdot\mathrm{FFT}^{*}(EEG_\alpha) \tag{1}$$

$$P_\beta = \frac{1}{p}\,\mathrm{FFT}(EEG_\beta)\cdot\mathrm{FFT}^{*}(EEG_\beta) \tag{2}$$

where p is the number of points in the temporal window of the signal, FFT(EEGα) denotes the FFT of the EEG signal in the α band and FFT*(EEGα) denotes its complex conjugate. The same notation is used for the β band of the EEG signal, (EEGβ).
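The band-power computation can be illustrated with a short NumPy sketch. It is one possible reading of the procedure, not the authors' implementation: it assumes a one-sided FFT (`numpy.fft.rfft`), half-open band edges, and that the band power is the sum of each in-band coefficient times its complex conjugate, divided by the window length p. The names `band_power`, `FS` and `BANDS` are hypothetical.

```python
import numpy as np

FS = 256  # sampling rate in Hz (from Section 2.1)
BANDS = {"alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_power(window, band, fs=FS):
    """Power of one EEG window inside a frequency band.

    Computes (1/p) * sum(X_k * conj(X_k)) over the FFT coefficients X_k
    whose frequency lies inside the band; p is the window length.
    """
    window = np.asarray(window, dtype=float)
    p = window.size
    spectrum = np.fft.rfft(window)              # one-sided spectrum (assumption)
    freqs = np.fft.rfftfreq(p, d=1.0 / fs)      # frequency of each coefficient
    low, high = band
    in_band = (freqs >= low) & (freqs < high)   # assumption: half-open band edges
    coeffs = spectrum[in_band]
    return float(np.real(np.sum(coeffs * np.conj(coeffs))) / p)


# Synthetic check: a 10 Hz sine should put its power in the alpha band.
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 10 * t)
p_alpha = band_power(x, BANDS["alpha"])
p_beta = band_power(x, BANDS["beta"])
print(p_alpha > p_beta)
```

A synthetic sine is used here only because its spectrum is known in advance; real EEG windows would be passed in the same way.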
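2.3 Extreme learning machine
ELM is an algorithm used to train an MLP. It is based on the idea that, if the MLP input weights are fixed to random values, the network can be treated as a linear system, so the output weights can be easily obtained using the pseudo-inverse of the hidden-layer output matrix H for a given training set. Given a set of N input vectors, an MLP with M hidden nodes can approximate the N cases with zero error, $\sum_{i=1}^{N} \lVert y_i - t_i \rVert = 0$, where $y_i$ is the network output for the input vector $x_i$ with target vector $t_i$. Thus, for a hidden-node activation function f, there are $\beta_j$, $w_j$ and $b_j$ such that

$$y_i = \sum_{j=1}^{M} \beta_j\, f(w_j \cdot x_i + b_j) = t_i, \qquad i = 1, \ldots, N, \tag{3}$$

where $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$ is the weight vector connecting the jth hidden node with the output nodes, $w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ is the weight vector connecting the jth hidden node and the input nodes, and $b_j$ is the bias of the jth hidden node.

The previous N equations can be expressed compactly as

$$H B = T, \tag{4}$$

where

$$H = \begin{bmatrix} f(w_1 \cdot x_1 + b_1) & \cdots & f(w_M \cdot x_1 + b_M) \\ \vdots & \ddots & \vdots \\ f(w_1 \cdot x_N + b_1) & \cdots & f(w_M \cdot x_N + b_M) \end{bmatrix}, \tag{5}$$

$$B = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_M^T \end{bmatrix}, \qquad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}, \tag{6}$$

where $H \in \mathbb{R}^{N \times M}$ is the hidden layer output matrix of the MLP, $B \in \mathbb{R}^{M \times m}$ is the output weight matrix, and $T \in \mathbb{R}^{N \times m}$ is the target matrix of the N training cases. The MLP training is given by the solution of the least-squares problem of (4). The optimal output weight matrix is $\hat{B} = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose pseudo-inverse.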
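The training rule above reduces to a few lines of linear algebra. The following is a minimal sketch of a basic ELM, assuming a tanh activation and normally distributed random weights; both are illustrative choices and differ from the Gaussian kernel used in the experiments. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def elm_train(X, T, n_hidden, rng):
    """Train a single-hidden-layer ELM and return (W, b, B)."""
    n_features = X.shape[1]
    W = rng.normal(size=(n_features, n_hidden))  # random input weights w_j, kept fixed
    b = rng.normal(size=n_hidden)                # random biases b_j, kept fixed
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix, N x M
    B = np.linalg.pinv(H) @ T                    # output weights: pseudo-inverse of H times T
    return W, b, B


def elm_predict(X, W, b, B):
    """Network output for the rows of X."""
    return np.tanh(X @ W + b) @ B


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))             # hypothetical data: 20 trials, 4 features
T = np.where(X[:, :1] > 0, 1.0, -1.0)    # hypothetical targets, +1 / -1
W, b, B = elm_train(X, T, n_hidden=20, rng=rng)
train_error = np.max(np.abs(elm_predict(X, W, b, B) - T))
print(train_error)
```

With as many hidden nodes as training cases (M = N), H is square and generically invertible, so the training error is zero up to numerical precision, which matches the zero-error statement in the text. In practice M is chosen much smaller than N to avoid overfitting.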
ELM requires the number of hidden neurons to be fixed. To do this, several pruning methods have been proposed [29–34]. The most commonly used method to avoid an exhaustive search for the optimal value of M is the optimally pruned ELM (OP-ELM). OP-ELM sets a very high initial number of hidden neurons and ranks them according to their importance in solving the problem. The neurons are then pruned by choosing the combination of neurons that provides the lowest leave-one-out error.
3 Experimental work
The aim of the experiments was to test ELM as an algorithm to classify features extracted with a linear method (PSD) from novice subjects. In this study, the EEG signals described in Section 2.1 have been used. To obtain the features, the PSD has first been computed in 2 windows of 1 second located in the center of each trial (5-6 s and 6-7 s), see Figure 3. Subsequently, the values of the two windows have been averaged, so there are two features (α and β) per channel in each trial that are passed to the classifier. It is important to remark that we carried out a previous evaluation of some of the most widely used linear feature extraction techniques, such as PSD, Hjorth and AAR. However, the preliminary results show that Hjorth and AAR perform very poorly with all the classifiers tested in this work, while PSD gets acceptable results for these signals. Thus, we have discarded the other approaches and chosen PSD as the most suitable method to explore ELM in BCI systems.
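The windowing and averaging step can be sketched as follows. This is an illustrative reading rather than the code used in the study: the array layout (channels by samples), the helper names and the random test signal are assumptions, and the band power follows the same one-sided FFT interpretation as a simple band-limited periodogram sum.

```python
import numpy as np

FS = 256                               # sampling rate in Hz (Section 2.1)
BANDS = ((8.0, 13.0), (13.0, 30.0))    # alpha, beta
WINDOWS_S = ((5.0, 6.0), (6.0, 7.0))   # the two 1 s windows named in the text


def band_power(x, low, high, fs=FS):
    """Sum of |FFT|^2 inside [low, high), divided by the window length."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    sel = (freqs >= low) & (freqs < high)
    return float(np.sum(np.abs(spec[sel]) ** 2) / x.size)


def trial_features(trial, fs=FS):
    """Map a trial of shape (n_channels, n_samples) to a 1-D feature vector.

    For every channel and band, the band power is computed in each window
    and the two window values are averaged into a single feature.
    """
    feats = []
    for channel in trial:
        for low, high in BANDS:
            powers = [
                band_power(channel[int(a * fs):int(b * fs)], low, high, fs)
                for a, b in WINDOWS_S
            ]
            feats.append(np.mean(powers))
    return np.array(feats)


rng = np.random.default_rng(0)
trial = rng.normal(size=(2, 8 * FS))   # hypothetical 2-channel, 8 s trial
features = trial_features(trial)
print(features.shape)
```

Two channels and two bands give a four-element feature vector per trial, which is the input size the classifiers in this section would receive under these assumptions.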
In order to have a reference, the performance of ELM has been compared with the standard LDA and SVM techniques. These methods are widely used by researchers in BCI environments due to their successful results [9, 10]. For SVM, a linear kernel has been used, since a standard SVM with a linear kernel is the most widely used method for BCI [9, 22]. For the ELM, a Gaussian kernel has been used, since it provides a non-linear solution and better results. In order to make an accurate and fair performance evaluation of the different classification approaches, this study uses a leave-one-out cross-validation (LOO-CV) procedure. LOO-CV avoids the undesirable variability caused by a random selection of training and test sets. Of the N samples involved in the study, one is retained for testing and the remaining N-1 are used for training the classifier (for example, with the ELM approach). This process is repeated N times (i.e. one iteration for each input vector). Note that all cases are used for both training and testing during the N iterations of the LOO-CV procedure, and that the performance measures are computed at the end of this iterative procedure.
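The LOO-CV loop itself is independent of the classifier. A minimal sketch is shown below; the nearest-mean classifier is only a stand-in so that the example is self-contained (it is not one of the three classifiers compared in this work), and the data are synthetic.

```python
import numpy as np


def fit_nearest_mean(X, y):
    """Toy stand-in classifier: store one mean vector per class."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means


def predict_nearest_mean(model, x):
    """Assign x to the class whose mean vector is closest."""
    classes, means = model
    return classes[np.argmin(np.linalg.norm(means - x, axis=1))]


def loo_accuracy(X, y, fit, predict):
    """Leave-one-out accuracy: each sample is the test case exactly once."""
    n = len(y)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i          # boolean mask: all samples except i
        model = fit(X[train], y[train])    # train on the remaining N-1 samples
        hits += int(predict(model, X[i]) == y[i])
    return hits / n                        # accuracy computed after all N iterations


# Hypothetical data: 40 trials, 4 features, two well-separated classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 4)) + 3.0 * y[:, None]
acc = loo_accuracy(X, y, fit_nearest_mean, predict_nearest_mean)
print(f"LOO-CV accuracy: {100 * acc:.2f}%")
```

Passing `fit` and `predict` as arguments means the same loop could evaluate ELM, LDA or SVM without modification.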
4 Experimental results
Table 1 shows the accuracy results (in %) obtained under LOO-CV using PSD as the feature and three classifiers: ELM, LDA and SVM. As might be expected, the values were not especially high. This is considered normal given the inexperience of the users and the absence of feedback in the system.
For the three classifiers, LOO-CV has been used. LDA and SVM produce stable results; ELM, however, depends on a random initialization of the weights, so 30 initializations were run and its results are reported as mean and standard deviation.
According to the results, ELM improves on the LDA and SVM performance for User 2, with 71.36% ± 2.84% vs 66.67% for LDA; for User 3, with 69.79% ± 2.40% vs 56.41% for LDA and SVM; and for User 4, with 58.54% ± 4.70% vs 53.81% for SVM. However, for User 1 SVM showed the best result with 71.79%, and for User 5 LDA outperformed the other methods with 71.79%. It is interesting to note that ELM gets results between 65.67% ± 3.21% and 71.36% ± 4.02%, in contrast to LDA, which showed a low value for User 4 (41.03%), and SVM, which reaches only 51.28% for User 2. Therefore, in this study, ELM achieved acceptable results for all users. As shown in Figure 3, from the average data of the three methods, ELM significantly improves on the mean classification performance of the LDA and SVM approaches: 66.51% vs. 60.51% and 59.47% respectively.
To validate this assertion, a non-parametric statistical test has been performed, specifically the Wilcoxon signed-rank test, applied pairwise. Comparing ELM with LDA, the p-value obtained is 0.04, which indicates significant differences at the 96% confidence level, with ELM being the better method. Likewise, comparing ELM with SVM, the p-value of 0.01 indicates significant differences at the 99% confidence level, with ELM again being better. However, there are no significant differences between LDA and SVM, since the p-value is 0.477.
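For readers unfamiliar with the test, the following standard-library sketch shows how an exact two-sided Wilcoxon signed-rank test works on paired samples. It is an illustration under stated assumptions, not the procedure or software used to obtain the p-values above: the function name is hypothetical, the exact p-value is computed by enumerating sign patterns (feasible only for small samples), and the accuracies are made-up numbers that do not come from Table 1.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (statistic, p_value), where statistic = min(W+, W-).
    Zero differences are dropped; tied |differences| get average ranks.
    The p-value enumerates all 2**n sign patterns, so keep n small.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    stat = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= stat:
            count += 1
    return stat, count / 2 ** n


# Hypothetical paired accuracies (%), NOT the values of Table 1.
method_a = [70.0, 68.5, 66.0, 61.0, 72.5, 69.0]
method_b = [65.0, 66.0, 60.0, 62.0, 64.0, 59.5]
stat, p = wilcoxon_signed_rank(method_a, method_b)
print(stat, p)
```

For larger samples a library routine with a normal approximation would normally be preferred over exhaustive enumeration.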
It is widely known that each subject is different and that the performance of the same experiment and system can differ between users, and even for the same user across trials or sessions. Figure 4 shows the mean accuracy of each user, computed as the mean over the three classifiers. Users 1, 2 and 5 are above the mean value of the group, in contrast with Users 3 and 4. Probably Users 1 and 2 have good skills to control BCI systems under motor imagery paradigms. Therefore, they could be suitable candidates to be trained for future experiments.
5 Conclusions and future work
This study tested an ELM with EEG signals from 5 novice users of an MI-based BCI system. PSD features were computed and averaged over 2 central windows of 1 second each. In order to assess the classification accuracy obtained by ELM, two standard classifiers, LDA and SVM, were also implemented. The results were evaluated under LOO-CV and showed that, in contrast to LDA and SVM, ELM reached acceptable results for all users and, in addition, outperformed the standard methods for 3 of the 5 users. Therefore, the tested methodology showed suitable performance for application in MI BCI systems.
Future research on this topic should focus on testing ELM with larger data sets, as well as with signals from expert BCI users or from reference signals in BCI research such as the BCI competitions. In addition, future studies could compute and combine features in three or four temporal windows to take into account the time course of the signal. Other possibilities of the ELM will be evaluated, such as different pruning variants and architecture selection methods. Finally, it might be interesting to implement more feature extraction techniques and classifiers.
This work has been partially supported by Spanish MINECO grant number MTM2014-51891-P and by the Spanish MINECO under grant TIN2016-78799-P (AEI/FEDER, UE). Above all, everyone who worked on this paper was fortunate to learn from, to work with and to be a friend of Pedro José García Laencina, PhD. Rest in peace.
Sanei D., Chambers J., EEG Signal Processing, John Wiley & Sons, 2008.
Sabeti M., Boostani R., Katebi S., Price G., Selection of relevant features for EEG signal classification of schizophrenic patients, Biomedical Signal Processing and Control, 2007, 2(2), 122-134.
Rodríguez-Bermúdez G., García-Laencina P., Analysis of EEG signals using nonlinear dynamics and chaos: A review, Applied Mathematics & Information Sciences, 2015.
Tong S., Thakor N. V., Quantitative EEG Analysis Methods and Clinical Applications, Artech House.
Pfurtscheller G., Brunner C., Schlögl A., da Silva F.L., Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks, NeuroImage, 2006, 31(1), 153-159.
Bashashati A., Fatourechi M., Ward R.K., Birch G.E., A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals, Journal of Neural Engineering, 2007, 4(2), 32-57.
Lotte F., Congedo M., Lécuyer A., Lamarche F., Arnaldi B., A review of classification algorithms for EEG-based brain-computer interfaces, Journal of Neural Engineering, 2007, 4(2), 1-13.
García-Laencina P. J., Rodríguez-Bermúdez G., Roca-Dorda J., Exploring dimensionality reduction of EEG features in motor imagery task classification, Expert Systems with Applications, 2014, 23 (04), 5285-5295.
Pérez-García V.M., Fitzpatrick S., Pérez-Romasanta L.A., Pesic M., Schucht P., Arana E., Sánchez-Gómez P., Applied mathematics and nonlinear sciences in the war on cancer, Applied Mathematics and Nonlinear Sciences, 2016, 1 (2), 423-436.
Teplan M., Fundamentals of EEG measurement, Measurement Science Review, 2002, 2, 1-11.
Nijboer F., Furdea A., Gunst I., Mellinger J., McFarland D.J., Birbaumer N., Kübler A., An auditory brain–computer interface (BCI), Journal of Neuroscience Methods, 2008, 167, 43-50.
Vukelić M., Gharabaghi A., Oscillatory entrainment of the motor cortical network during motor imagery is modulated by the feedback modality, NeuroImage, 2015, 111, 1-11.
Guger C., Edlinger G., Harkam W., Niedermayer I., Pfurtscheller G.A., How Many People are Able to Operate an EEG-Based Brain-Computer Interface (BCI)?, IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2003, 11(2), 145-147.
Liu J., Zhang C., Zheng C., EEG-based estimation of mental fatigue by using KPCA-HMM and complexity parameters, Biomedical Signal Processing and Control, 2010, 5, 124-130.
Bajaj V., Pachori R., Classification of human emotions based on multiwavelet transform of EEG signals, 2013 AASRI Conference on Intelligent Systems and Control, Elsevier, Vancouver, 2013.
Krusienski D. J., Sellers E. W., Cabestaing F., Bayoudh S., McFarland D. J., Vaughan T. M., Wolpaw J. R., A comparison of classification techniques for the P300 speller, J. Neural Eng, 2006, 3(4), 299–305.
Pao Y.-H., Park G.-H., Sobajic D. J., Learning and generalization characteristics of the random vector functional-link net, Neurocomputing, Elsevier, 1994, 6, 163-180.
Igelnik B., Pao Y.-H., Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Transactions on Neural Networks, 1995, 6, 1320-1329.
Serre D., Matrices: Theory and Applications, Springer, New York, 2002.
Miche Y., Bas P., Jutten C., Simula O., Lendasse A., A methodology for building regression models using extreme learning machine: OP-ELM, In: Proceedings of the European Symposium on Artificial Neural Networks (ESANN), 2008, 247–252.
Miche Y., Sorjamaa A., Lendasse A., OP-ELM: Theory, experiments and a toolbox, In: Proceedings of the International Conference on Artificial Neural Networks (ICANN), LNCS, 2008, 5163, 145–154.
Mateo F., Lendasse A., A variable selection approach based on the delta test for extreme learning machine models, In: Proceedings of the European Symposium on Time Series Prediction (ESTP), 2008, 57–66.
Miche Y., Lendasse A., A faster model selection criterion for OP-ELM and OP-KNN: Hannan-Quinn criterion, In: Proceedings of the European Symposium on Artificial Neural Networks (ESANN), 2009, 177–182.
Miche Y., Sorjamaa A., Bas P., Simula O., Jutten C., Lendasse A., OP-ELM: Optimally Pruned Extreme Learning Machine, IEEE Transactions on Neural Networks, 2009, 21 (1), 158–162.
Similä T., Tikka J., Multiresponse sparse regression with application to multidimensional scaling, In: Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005, LNCS, 2005, 3697, 97–102.
Alpaydin E., Introduction to Machine Learning, MIT Press, Cambridge, MA, USA, 2010.
Duan L., Zhong H., Miao J., Yang Z., Ma W., Zhang X., A voting optimized strategy based on ELM for improving classification of motor imagery BCI data, Cognitive Computation, 2014, 477-483.
Bamdadian A., Guan C., Ang K.K., Xu J., Improving session-to-session transfer performance of motor imagery-based BCI using adaptive extreme learning machine, Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE, 2013, 2188-2191.
Tan P., Tan G.-Z., Cai Z.-X., Sa W.-P., Zou Y.-Q., Using ELM-based weighted probabilistic model in the classification of synchronous EEG BCI, Medical & Biological Engineering & Computing, 2017, 55 (1), 33-43.
Tan P., Sa W., Yu L., Applying Extreme Learning Machine to classification of EEG BCI, Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), 2016 IEEE International Conference on, 2016, 228-232.
About the article
Published Online: 2017-07-07
Citation Information: Open Physics, Volume 15, Issue 1, Pages 494–500, ISSN (Online) 2391-5471, DOI: https://doi.org/10.1515/phys-2017-0056.
© 2017 G. Rodríguez-Bermúdez et al. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).