We address the correspondence problem which arises when applying empirical mode decomposition (EMD) to multi-trial and multi-subject data. EMD decomposes a signal into a set of narrow-band components named intrinsic mode functions (IMFs). The number of IMFs and their signal properties can differ between trials, channels and subjects. To assign IMFs with similar characteristics to each other, we compare two assignment methods (unbalanced assignment and k-cardinality assignment) and two clustering algorithms (hierarchical clustering and density-based spatial clustering of applications with noise), using heart rate variability data of children with temporal lobe epilepsy.
The empirical mode decomposition (EMD) is a data-driven adaptive algorithm for multi-scale decomposition and time-frequency analysis of real-world signals, proposed by Huang et al. in 1998. EMD makes no a priori assumptions regarding the data and can be applied to non-linear and non-stationary signals. It decomposes the signal into a set of amplitude and frequency modulated zero-mean oscillatory signals, named intrinsic mode functions (IMFs). The sifting process by which the IMFs are obtained acts like a data-adapted filter, and the method was shown to have filter bank properties. Without predefined cut-off frequencies, each IMF has its own distinct frequency band and specific frequency characteristics, preserving the non-linear properties of the signal’s natural components. For this reason, EMD has already been used in many applications in fields such as neuroscience and biomedicine. Currently, several modified versions of EMD exist. Some address disadvantages like the mode mixing problem, i.e. similar frequencies can appear in different IMFs; examples are ensemble empirical mode decomposition (EEMD) and complete ensemble empirical mode decomposition with adaptive noise (CEEMD). Others are multivariate extensions of EMD for multichannel (multi-trial) data, such as bivariate EMD, trivariate EMD and multivariate empirical mode decomposition (MEMD). The advantage of the multivariate versions is that the number of IMFs is equal for all channels (trials) and the filter bank property is preserved. Applying EMD separately to each channel (trial, subject) most often produces a different number of IMFs for each channel, with the IMFs being misaligned (their spectral characteristics deviate from each other). This is known as the correspondence problem (CP). In multi-trial, multichannel and group analyses, the CP must be solved before any further analysis (e.g. ensemble averaging) of the IMFs can be performed.
Only a few studies have so far provided solutions to the CP. A semi-automated approach (SA), based on assignment decisions made by an expert, is time consuming and can therefore be considered only for small data sets. Cluster analysis approaches used to assign trial or group-related IMFs cannot guarantee that different IMFs from one source are assigned pair-wise to different groups. Using MEMD reduces the CP by enforcing an equal number of IMFs for all channels and a direct correspondence between them with respect to their frequency characteristics. However, some IMFs may contain no real signal information and are merely the result of the forced correspondence. In a recent study, we proposed a fully automated approach for assigning IMFs from different sources based on the Kuhn-Munkres algorithm (KMA). The KMA-based approach was tested on simulated as well as on real data and was compared to the SA and a hierarchical clustering approach. In this study, we continue to explore the CP by applying two new methods, k-cardinality assignment and density-based spatial clustering of applications with noise (DBSCAN). For a direct comparison with the previous results, we investigate one of the real data sets used in that study. The data consist of heart rate variability (HRV) derived from children with temporal lobe epilepsy (TLE).
2 Material and methods
HRV data were derived from ECG recordings of 18 children with TLE. The ECG data were obtained during pre-surgical evaluation of the subjects, performed at the Vienna pediatric epilepsy center following a standard procedure. The protocol was approved by the local committee of the University Hospital Vienna. A detailed description of how the HRV was computed can be found in , . The data consist of 300 s before seizure onset and have a sampling frequency of 8 Hz.
2.2.1 The correspondence problem
The CP can be solved by a pair-wise approach, i.e. pair-wise correspondence, or by a clustering approach. Different forms of the CP are possible, depending on the method used to compute the IMFs, the signals’ properties and the specific approach used to assign the IMFs among themselves. Some of the cases that can arise in a group analysis of IMFs, which are addressed by the four methods used in the present study, are presented in Figure 1. The first three cases (I–III) are related to a pair-wise approach. Case (I) represents a 1:1 assignment of all IMFs for two sets with the same number of IMFs. In case (II) the number of IMFs of the two sets can be different, but a 1:1 assignment is still required for a maximum number of IMFs. These cases can be solved by the KMA approach. Case (III) allows the selection of the number of IMFs that have to be assigned. This is helpful when IMFs of one set do not have a direct correspondent in the other one and their forced assignment should be avoided. We address this case by using the k-cardinality assignment method. Cases (IV) and (V) show the multiple correspondence approaches that can be solved with clustering. In case (IV) all IMFs must be assigned, and sets with different numbers of IMFs are allowed. In this case, a 1:n assignment is enforced. This case occurs when hierarchical clustering is used. The last case (V) shows the behavior of DBSCAN. In this case 1:1 or 1:n assignments can occur, but IMFs that do not have a correspondent may also be left unassigned.
2.2.2 Reference-based assignment approach
The group assignment of IMFs can be seen as a pair-wise linear assignment, with one set of IMFs selected as a reference for all remaining sets. The linear assignment problem is solved by an extension of the Kuhn-Munkres algorithm that minimizes the total cost of the assignment, i.e. the sum of the costs of all matched pairs. The reference set is selected from the sets with the maximum number of IMFs as the one with the minimal total assignment cost. The k-cardinality assignment allows the selection of k IMFs for each set that have to be assigned, with k < minimum number of IMFs. The k-cardinality assignment problem was shown to be resolvable by transforming it into an assignment problem. Thus, the KMA approach can also be used to solve the k-cardinality assignment after the required transformation. A maximum k For all methods, the cost of assigning two IMFs is computed as one minus the Pearson correlation coefficient between the amplitude spectra of the two IMFs.
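The k-cardinality idea can be illustrated with a minimal Python sketch. It is not the KMA-based procedure described above: for brevity it uses an exhaustive search, which is only feasible for a handful of IMFs. The function name `k_cardinality_assignment` and all cost values are hypothetical. Rows stand for the IMFs of a reference set, columns for the IMFs of a second set, and each entry for an assignment cost such as one minus the Pearson correlation of the amplitude spectra.

```python
from itertools import combinations, permutations


def k_cardinality_assignment(cost, k):
    """Return (total_cost, pairs) of the cheapest 1:1 matching with exactly k pairs.

    Exhaustive search for illustration only; a real implementation would
    transform the problem and solve it with an assignment algorithm.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if not 0 < k <= min(n_rows, n_cols):
        raise ValueError("k must lie between 1 and the size of the smaller set")
    best_cost, best_pairs = float("inf"), None
    for rows in combinations(range(n_rows), k):        # which reference IMFs are used
        for cols in permutations(range(n_cols), k):    # which partner each one gets
            total = sum(cost[r][c] for r, c in zip(rows, cols))
            if total < best_cost:
                best_cost, best_pairs = total, list(zip(rows, cols))
    return best_cost, best_pairs


# Hypothetical costs: 4 reference IMFs (rows) versus 3 IMFs of another set (columns).
cost = [
    [0.10, 0.80, 0.90],
    [0.70, 0.60, 0.95],
    [0.85, 0.20, 0.75],
    [0.90, 0.70, 0.05],
]

cost_k3, pairs_k3 = k_cardinality_assignment(cost, 3)  # every column is matched
cost_k2, pairs_k2 = k_cardinality_assignment(cost, 2)  # only the two cheapest pairs
print(pairs_k3, round(cost_k3, 2))
print(pairs_k2, round(cost_k2, 2))
```

In this toy example, k = 3 matches every IMF of the smaller set and leaves one reference IMF without a partner, while k = 2 additionally drops the pair with the highest remaining cost. This mirrors the general behaviour of the method: lowering k removes the most expensive matches first, whether or not those IMFs are of interest.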
2.2.3 Clustering approach
One advantage of the clustering methods is that no reference set is required for assignment. However, unlike the reference-based approach, clustering does not guarantee that different IMFs of one set are assigned to different clusters. For the agglomerative hierarchical clustering approach with weighted average distance as the linkage criterion, the number of clusters is selected as the maximum number of IMFs among all sets. DBSCAN is a clustering algorithm that does not require an a priori selection of the number of clusters and provides a special noise cluster for all unassigned IMFs. Thus, a forced matching of non-matching IMFs may be avoided. However, two parameters must be selected for DBSCAN: the minimum number of elements that can form a cluster and the maximum distance allowed between two points of the same cluster. The performance of DBSCAN is highly dependent on the selection of these parameters, especially on the distance threshold. The distance measure used corresponds to the cost defined for the KMA.
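The role of the two DBSCAN parameters can be made concrete with a small, self-contained Python sketch that works directly on a precomputed distance matrix. This is a textbook-style implementation written for illustration, not the code used in the study. The names `dbscan_precomputed`, `eps` (distance threshold) and `min_pts` (minimum number of elements) are assumptions, as are the distance values and the choice `min_pts = 2`.

```python
def dbscan_precomputed(dist, eps, min_pts):
    """Label each item with a cluster index, or -1 for the noise cluster."""
    n = len(dist)
    labels = [None] * n                      # None = not visited yet

    def neighbours(i):
        # All items within eps of item i (item i itself is included).
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                   # provisional noise, may become a border item
            continue
        cluster += 1                         # item i is a core item: open a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # former noise reached from a core item
                continue
            if labels[j] is not None:
                continue                     # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:    # j is a core item as well: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical symmetric distances between six IMFs.
D = [
    [0.00, 0.10, 0.15, 0.60, 0.70, 0.95],
    [0.10, 0.00, 0.05, 0.55, 0.65, 0.90],
    [0.15, 0.05, 0.00, 0.50, 0.60, 0.85],
    [0.60, 0.55, 0.50, 0.00, 0.08, 0.80],
    [0.70, 0.65, 0.60, 0.08, 0.00, 0.75],
    [0.95, 0.90, 0.85, 0.80, 0.75, 0.00],
]

labels_mid = dbscan_precomputed(D, eps=0.165, min_pts=2)
labels_large = dbscan_precomputed(D, eps=1.0, min_pts=2)
labels_small = dbscan_precomputed(D, eps=0.06, min_pts=2)
print(labels_mid, labels_large, labels_small)
```

On these toy distances, the intermediate threshold yields two clusters and one unassigned IMF, the large threshold merges everything into a single cluster, and the small threshold keeps only the closest pair together and sends the remaining IMFs to the noise cluster. This is the sensitivity to the distance threshold described above.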
The HRV data are independently decomposed using CEEMD with a noise standard deviation of 0.05, 20 noise process realizations each and a maximum of 100 sifting iterations. The sifting process by which the IMFs are computed ensures that the IMFs are ordered from high to low with respect to their frequency band. Thus, low-index IMFs have a higher peak frequency and usually a broader frequency band than IMFs with higher indices. After the spectrum of each IMF is computed with the fast Fourier transform (FFT), the distances between all pairs of IMFs are determined. These distances represent the cost of assigning two IMFs to one group/cluster. An example of distances computed between IMFs belonging to four subjects is presented in Figure 2. It is noticeable that high-index IMFs, which correspond to the low frequency bands, have a much smaller distance between them than other IMFs, not only for IMFs of different subjects but also for IMFs of the same subject. This is an important aspect that affects the behavior of the clustering approaches, as can be observed in Figure 3, where the final assignment of the IMFs for the entire group using all methods is shown.
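The distance computation can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' code. To stay dependency-free it replaces the FFT by a naive discrete Fourier transform, which gives the same amplitude spectrum but is far slower. The three synthetic sinusoids merely stand in for IMFs, and the function names are hypothetical; only the 8 Hz sampling frequency is taken from the text.

```python
import cmath
import math


def amplitude_spectrum(x):
    """One-sided amplitude spectrum of a real sequence (naive O(n^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)    # undefined for a constant spectrum


def imf_distance(imf_a, imf_b):
    """Cost of assigning two IMFs: one minus the correlation of their spectra."""
    return 1.0 - pearson(amplitude_spectrum(imf_a), amplitude_spectrum(imf_b))


def distance_matrix(imfs):
    """Symmetric matrix of pair-wise distances with a zero diagonal."""
    n = len(imfs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = imf_distance(imfs[i], imfs[j])
    return dist


fs, n = 8.0, 64                              # sampling frequency in Hz, number of samples
imf_a = [math.sin(2 * math.pi * 1.00 * t / fs) for t in range(n)]        # 1 Hz
imf_b = [math.sin(2 * math.pi * 1.00 * t / fs + 0.7) for t in range(n)]  # 1 Hz, phase-shifted
imf_c = [math.sin(2 * math.pi * 0.25 * t / fs) for t in range(n)]        # 0.25 Hz

D = distance_matrix([imf_a, imf_b, imf_c])
for row in D:
    print([round(value, 3) for value in row])
```

Because only the amplitude spectrum enters the measure, the two 1 Hz components are treated as nearly identical despite their phase difference, whereas the 0.25 Hz component is far from both. Note that this distance ranges from 0 (perfectly correlated spectra) to 2 (perfectly anti-correlated spectra).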
The reference-based approach (I) and the k-cardinality assignment with k = 10 (IV) offer the best solution to the CP. For a smaller k, k = 8 (V) and k = 5 (VI), the low-index IMFs (high frequency band) are excluded from assignment. This is to be expected: the distance between high-band IMFs is greater than between low-band ones (see Figure 2), so the total assignment cost is minimized when they are not assigned. The hierarchical clustering with 13 clusters (II) performs better than DBSCAN (III), but both fail to assign different high-index IMFs to different clusters. This can be seen in cluster M (II) and cluster G (III). Moreover, DBSCAN finds only 7 clusters for a selected threshold distance of 0.165, with three different clusters having the same IMF index. This is the effect of the small distances between IMFs presented in Figure 2. In the case of DBSCAN, a higher threshold distance results in the assignment of all IMFs to one cluster, whereas a smaller value improves the assignment of the high-index IMFs but relegates the other ones to the noise cluster. The IMFs that are physiologically most relevant for HRV data given their spectral components (IMFs 5–8) are best assigned by the reference-based approach and the k-cardinality assignment with k = 10.
When EMD is applied to multi-trial, multichannel and multiple-subject data, the properties of the resulting IMFs can deviate from each other between trials, channels and subjects. The correspondence problem must be solved before any further analysis is performed. The automatic correspondence assignment of IMFs proposed in our previous study and the k-cardinality assignment approach with k = 10, both reference-based and solvable with the KMA, offer the best results compared to the hierarchical clustering approach and DBSCAN. The advantage of the k-cardinality assignment approach is that it allows the selection of the number k of IMFs to be assigned, relaxing the constraint that a correspondent is required for every IMF. However, the selection of k is strongly data-dependent, because the IMFs of interest do not necessarily exhibit the lowest distances. In our example, as k decreases, the low-index IMFs that correspond to higher frequency bands are the ones excluded, because the distance between them is much higher than the distance between high-index ones (lower frequencies). This effect is not desired when the IMFs of interest are the ones with higher frequencies.
Research funding: This study was supported by the German Research Foundation (Wi 1166/12–2 and Le 2025/6–2). Conflict of interest: Authors state no conflict of interest. Material and Methods: Informed consent: Informed consent has been obtained from all individuals included in this study. Ethical approval: The research related to human use complies with all the relevant national regulations, institutional policies and was performed in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors’ institutional review board or equivalent committee.
Huang NE, Shen Z, Long SR, Wu MLC, Shih HH, Zheng QN, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A Mat. 1998;454:903–95. doi:10.1098/rspa.1998.0193
Chen B, Zhao SL, Li PY. Application of Hilbert-Huang transform in structural health monitoring: a state-of-the-art review. Math Probl Eng. 2014;2014:22. doi:10.1155/2014/317954
Piper D, Schiecke K, Pester B, Benninger F, Feucht M, Witte H. Time-variant coherence between heart rate variability and EEG activity in epileptic patients: an advanced coupling analysis between physiological networks. New J Phys. 2014;16:115012. doi:10.1088/1367-2630/16/11/115012
Torres ME, Colominas MA, Schlotthauer G, Flandrin P. A complete ensemble empirical mode decomposition with adaptive noise. IEEE Int Conf on Acoust, Speech and Signal Proc ICASSP-11, Prague (CZ) 2011; pp. 4144–7. doi:10.1109/ICASSP.2011.5947265
Schiecke K, Wacker M, Piper D, Benninger F, Feucht M, Witte H. Time-variant, frequency selective, linear and nonlinear analysis of heart rate variability in children with temporal lobe epilepsy. IEEE Tr BME. 2014;61:1798–808. doi:10.1109/TBME.2014.2307481
Wang ZS, Maier A, Logothetis NK, Liang HL. Single-trial classification of bistable perception by integrating empirical mode decomposition, clustering, and support vector machine. EURASIP J Adv Signal Process. 2008;2008:592742. doi:10.1155/2008/592742
Rutkowski TM, Mandic DP, Cichocki A, Przybyszewski AW. EMD approach to multichannel EEG data – the amplitude and phase components clustering analysis. J Circuit Syst Comp. 2010;19:215–29. doi:10.1142/S0218126610006037
Schiecke K, Schmidt C, Piper D, Putsche P, Feucht M, Leistritz L, et al. Assignment of empirical mode decomposition components and its application to biomedical signals. Methods Inf Med. 2015;54:461–73. doi:10.3414/ME14-02-0024
Sander J, Ester M, Kriegel HP, Xu XW. Density based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Disc. 1998;2:169–94. doi:10.1023/A:1009745219419
Bai GZ. A new algorithm for k-cardinality assignment problem. International Conference on Computational Intelligence and Software Engineering (CiSE 2009), Wuhan; 2009. p. 1–4. doi:10.1109/CISE.2009.5363717
©2016 Diana Piper et al., licensee De Gruyter.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.