An extension of kPCA using multiple kernel learning would enable the user to project the data points into a low-dimensional subspace that is based on several data sources and, thereby, to visualize different characteristics of the data points in combination. However, the direct implementation, i.e.

$$\underset{\alpha,\beta}{\arg\max}\;\left\Vert \left(\sum_{m=1}^{M}\beta_{m}K_{m}\right)\alpha\right\Vert^{2},$$

$$\lambda_{1}\,\alpha^{T}\alpha=1;\quad \beta_{m}\ge 0,\; m=1,\dots,M;\quad \sum_{m=1}^{M}\beta_{m}=1$$

does not allow for data integration. This becomes clear when looking at Thompson’s inequality concerning the eigenvalues of sums of matrices [12]. Let *A* and *B* be *n*×*n* Hermitian matrices and *C*=*A*+*B*, with their respective eigenvalues *λ*(*A*)_{i}, *λ*(*B*)_{i}, and *λ*(*C*)_{i} sorted in decreasing order. Then, for any *p*≥1,

$$\sum_{i=1}^{p}\lambda(C)_{i}\le \sum_{i=1}^{p}\lambda(A)_{i}+\sum_{i=1}^{p}\lambda(B)_{i}$$
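The inequality is easy to check numerically. The following sketch (not part of the original text) draws two random real symmetric matrices and compares the sum of the *p* largest eigenvalues of their sum against the sum of the two individual eigenvalue sums:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3

def top_p_eigensum(M, p):
    """Sum of the p largest eigenvalues of a Hermitian matrix M."""
    # eigvalsh returns eigenvalues in ascending order, so reverse first
    return np.sort(np.linalg.eigvalsh(M))[::-1][:p].sum()

# random real symmetric (hence Hermitian) matrices
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
A = (X + X.T) / 2
B = (Y + Y.T) / 2
C = A + B

lhs = top_p_eigensum(C, p)
rhs = top_p_eigensum(A, p) + top_p_eigensum(B, p)
# the inequality guarantees lhs <= rhs
```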

holds. If we extend this formula by including the kernel weight *β*_{1} with *C*=*β*_{1}*A*+(1–*β*_{1})*B* and 0≤*β*_{1}≤1, we obtain the following inequality

$$\sum_{i=1}^{p}\lambda(C)_{i}\le \beta_{1}\sum_{i=1}^{p}\lambda(A)_{i}+(1-\beta_{1})\sum_{i=1}^{p}\lambda(B)_{i}.$$

One can see that the right-hand side is maximized when the kernel matrix with the higher sum of the *p* largest eigenvalues receives a weight of 1. In that setting, the right-hand side equals the left-hand side, so this choice also maximizes the left-hand side. The extension to more than two kernel matrices follows recursively. Therefore, optimizing Problem [1] leads to weight vectors *β* with *β*_{i}=1 and *β*_{j}=0 for all *j*≠*i*, where *i* is the index of the matrix with the largest sum of the *p* largest eigenvalues.
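This degenerate behavior can be illustrated directly: the sum of the *p* largest eigenvalues is a convex function of the matrix, so along the segment *C*(*β*)=*βA*+(1−*β*)*B* it attains its maximum at an endpoint. The sketch below (an illustration under our own setup, not code from the paper) scans a grid of weights for two random PSD Gram matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 2

def gram(d):
    """Random positive semidefinite Gram ("kernel") matrix of size n x n."""
    Z = rng.standard_normal((n, d))
    return Z @ Z.T

def top_p_eigensum(M):
    return np.sort(np.linalg.eigvalsh(M))[::-1][:p].sum()

A, B = gram(5), gram(5)
betas = np.linspace(0.0, 1.0, 101)
scores = [top_p_eigensum(b * A + (1 - b) * B) for b in betas]
best = betas[int(np.argmax(scores))]
# by convexity of the top-p eigenvalue sum, the maximum lies at beta = 0 or 1,
# i.e. all weight is put on a single kernel matrix
```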

Although this behavior maximizes the variance, it might not be the best choice for biological data, where we assume that different data types can provide complementary information and should therefore be considered jointly. Hence, in the following, we introduce a scoring function that combines the idea of kPCA with the assumption that different data sources supplement each other.
