Open Access. Published by De Gruyter, June 10, 2022 (CC BY 4.0 license).

Regularity-based spectral clustering and mapping the Fiedler-carpet

  • Marianna Bolla, Vilas Winstein, Tao You, Frank Seidl and Fatma Abdelkhalek
From the journal Special Matrices

Abstract

We discuss spectral clustering from a variety of perspectives that include extending techniques to rectangular arrays, considering the problem of discrepancy minimization, and applying the methods to directed graphs. Near-optimal clusters can be obtained by singular value decomposition together with the weighted k-means algorithm. In the case of rectangular arrays, this means enhancing the method of correspondence analysis with clustering, while in the case of edge-weighted graphs, a normalized Laplacian-based clustering. In the latter case, it is proved that a spectral gap between the $(k-1)$st and $k$th smallest positive eigenvalues of the normalized Laplacian matrix gives rise to a sudden decrease of the inner cluster variances when the number of clusters of the vertex representatives is $2^{k-1}$, but only the first $k-1$ eigenvectors are used in the representation. The ensemble of these eigenvectors constitutes the so-called Fiedler-carpet.

MSC 2010: 05C50; 62H25; 65H10

1 Introduction

The spectral graph theory started developing about 50 years ago (see, e.g., Hoffman [1], Fiedler [2], Cvetković et al. [3], and Chung [4]), with the goal of characterizing certain structural properties of a graph by means of the eigenvalues of its adjacency or Laplacian matrix. In the last decades of the 20th century, the eigenvectors corresponding to the smallest few eigenvalues of the Laplacian matrix were used to obtain clusterings of the vertices into disjoint parts so that the inter-cluster relations are negligible compared to the intra-cluster ones. The motivation for finding such clusterings comes from machine learning, where the classification of data into a small number of highly similar clusters is a common task. In this setup, the famous Fiedler-vector, the eigenvector corresponding to the smallest positive Laplacian eigenvalue, was used to classify the vertices into two parts. To find a clustering with $k\ge 2$ parts, one can instead use the $k-1$ eigenvectors corresponding to the $k-1$ smallest positive eigenvalues. It was also noted in [5] that it is better to use more eigenvectors, but no exact estimates between the spectral gap and the quality of the clustering were available, except in the $k=1$ and $k=2$ cases; the former case is related to the isoperimetric number and expander graphs (see, e.g., [4,6,7]), while the latter one is related to the sum of the inner variances of 2-clusterings (see [8]).

Since then, the problem has been generalized in several ways: to edge-weighted graphs and rectangular arrays of nonnegative entries (e.g., microarrays in biological genetics and forensic science [9,10,11,12]), and to degree-corrected adjacency and Laplacian matrices [13,14]. Starting in the 21st century, physicists and social scientists introduced modularity matrices and investigated the so-called anticommunity structures (where intracluster relations are negligible compared to the intercluster ones) in contrast to the former community structures [15]. By uniting these two approaches, so-called regular cluster pairs with small discrepancy can be defined, where one looks for homogeneous clusters (e.g., in microarrays, one looks for groups of genes that similarly influence the same groups of conditions), see [10]. The existence of such a regular structure is theoretically guaranteed by the Abel-prize winner Szemerédi's regularity lemma [16], which, for any small positive $\varepsilon$, guarantees a number $k=k(\varepsilon)$ of clusters (independent of the number of vertices) such that by partitioning the vertices into $k$ parts (sometimes requiring a small exceptional part), the pairs of clusters have discrepancy less than $\varepsilon$. However, the $k$ guaranteed by the regularity lemma can be enormously large and is not applicable for practical purposes. Our goal is to give a moderate $k$, where the sum of the inner variances of $2^{k-1}$ clusters is estimated, as mentioned earlier, by the spectral gap between the $(k-1)$st and $k$th smallest positive normalized Laplacian eigenvalues, even in the worst-case scenario. This is the generalization of a theorem of [8] that was applicable only to the $k=2$ situation. In that situation, $2^{k-1}=k$, and so the Fiedler vector was used for clustering into $k=2$ parts. If $k>2$, then we use $k-1$ nontrivial eigenvectors, the ensemble of which forms what we call the Fiedler-carpet. In special cases, e.g., in the case of generalized multiclass random or quasirandom graphs, both the objective function of the $k$-means algorithm and the $k$-way discrepancy dramatically decrease compared to the values for $k-1$ [17]; there, the number of clusters is one more than the number of eigenvectors used in the representation. However, in the generic case, the number of eigenvectors used for the classification is much smaller than the number of clusters, which is promising from the point of view of computational complexity.

The organization of this article is as follows. In Section 2, we define the most important notions and facts concerning normalized Laplacian spectra and the multiway discrepancy of rectangular arrays with nonnegative entries [10,11,13,17,18]. In Section 3, the main theorem is stated and proved. The proof is based on the energy minimizing representation, and we analyze the structure of the vertex representatives and the underlying spectral subspaces that are mapped in a convenient way. In Appendix A, a few technical details, simulation results, and plots of the so-obtained Fiedler-carpet are presented. In Section 4, an application to the directed graph of a migration dataset is discussed. This discussion is supplemented with more figures in Appendix C. Appendix B contains pseudocode for the algorithms discussed. In Section 5, the main contributions are summarized and conclusions are drawn.

2 Minimal and regular cuts versus spectra

Let $W$ be the $n\times n$ edge-weight matrix of a graph $G$ on $n$ vertices. It is symmetric, has zero diagonal, and has nonnegative entries. In the case of a simple graph, $W=(w_{ij})$ is the usual adjacency matrix. The generalized degrees are $d_i=\sum_{j=1}^{n} w_{ij}$, for $i=1,\dots,n$. Assume that the $d_i$'s are all positive and that the diagonal degree-matrix $D$ contains them in its main diagonal. The Laplacian of $G$ is $L=D-W$, while its normalized Laplacian is

$$L_D = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}.$$

Because of the normalization, $L_D$ is not affected by the scaling of the edge-weights. Therefore, we can assume that $\sum_{i=1}^{n} d_i = 1$. $L_D$ is positive semidefinite, and if $W$ is irreducible ($G$ is connected), then its eigenvalues are

$$0=\lambda_0 < \lambda_1 \le \cdots \le \lambda_{n-1} \le 2$$

with unit-norm, pairwise orthogonal eigenvectors $u_0, u_1,\dots,u_{n-1}$. In particular, $u_0 = (\sqrt{d_1},\dots,\sqrt{d_n})^T = \sqrt{d}$.

For $1<d<n$, the row vectors of the $n\times d$ matrix $X^* = (D^{-1/2}u_1,\dots,D^{-1/2}u_d)$ are optimal $d$-dimensional representatives $r_1^*,\dots,r_n^*$ of the vertices. More precisely, $X^*$ minimizes the energy function

(1) $Q_d(X) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_{ij}\,\|r_i - r_j\|^2 = \operatorname{tr}(X^T L X),$

where the general vertex representatives $r_1,\dots,r_n \in \mathbb{R}^d$ are the row vectors of the $n\times d$ matrix $X$, and they are required to satisfy the constraints $X^T D X = I_d$ and $\sum_{i=1}^{n} d_i r_i = 0$. Intuitively, minimizing the energy $Q_d(X)$ forces representatives of vertices connected with large edge-weights to be close to each other. The minimum of $Q_d(X)$, attained at $X^*$, is the sum of the $d$ smallest positive eigenvalues of $L_D$.
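For concreteness, the following is a minimal NumPy sketch (not taken from the paper) of how these optimal representatives can be computed from a symmetric weight matrix $W$ with positive degrees; the function name and the rescaling step are only illustrative assumptions.

```python
import numpy as np

def vertex_representatives(W, d):
    """Rows of the returned (n x d) matrix are the representatives r_1*, ..., r_n*."""
    deg = W.sum(axis=1)                       # generalized degrees d_i
    deg = deg / deg.sum()                     # rescale so that sum(d_i) = 1
    W = W / W.sum()                           # rescale the edge-weights consistently
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_D = np.eye(len(deg)) - D_inv_sqrt @ W @ D_inv_sqrt
    evals, evecs = np.linalg.eigh(L_D)        # eigenvalues in increasing order
    U = evecs[:, 1:d + 1]                     # skip the trivial eigenvector u_0
    return D_inv_sqrt @ U                     # X* = (D^{-1/2}u_1, ..., D^{-1/2}u_d)
```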

The weighted k-variance of these representatives is defined as follows:

(2) $S_k^2(X) = \min_{(V_1,\dots,V_k)\in\mathcal{P}_k} \sum_{i=1}^{k}\sum_{j\in V_i} d_j\,\|r_j - c_i\|^2,$

where $\operatorname{Vol}(U)=\sum_{j\in U} d_j$, $c_i = \frac{1}{\operatorname{Vol}(V_i)}\sum_{j\in V_i} d_j r_j$ is the weighted center of the cluster $V_i$, and the minimization is over the set $\mathcal{P}_k$ of proper $k$-partitions $P_k=(V_1,\dots,V_k)$ of the vertex set.

We will use the weighted $k$-means algorithm to approach this minimum. In [19], it is stated that if the data satisfy the $k$-clusterable criterion ($S_k^2 \le \varepsilon^2 S_{k-1}^2$ with a small enough $\varepsilon$), then there is a PTAS (polynomial time approximation scheme) for the $k$-means problem. This is the situation we usually encounter.
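As a small illustration, the weighted $k$-variance of equation (2) can be evaluated for one fixed partition as follows; this is only a sketch (the function name and the label-vector input are assumptions), and the minimum over all partitions is then approached by the weighted $k$-means algorithm.

```python
import numpy as np

def weighted_k_variance(X, weights, labels):
    """X: (n x d) representatives, weights: generalized degrees d_j, labels: cluster index of each point."""
    total = 0.0
    for a in np.unique(labels):
        idx = labels == a
        w = weights[idx]
        c = (w[:, None] * X[idx]).sum(axis=0) / w.sum()   # weighted cluster center c_a
        total += (w * ((X[idx] - c) ** 2).sum(axis=1)).sum()
    return total
```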

It is well known that the sum of the $k$ bottom eigenvalues of the normalized Laplacian matrix estimates from below the $k$-way normalized cut of $G$, which is $f_k(G)=\min_{P_k\in\mathcal{P}_k} f(P_k,G)$, where

$$f(P_k,G) = \sum_{a=1}^{k-1}\sum_{b=a+1}^{k}\left(\frac{1}{\operatorname{Vol}(V_a)}+\frac{1}{\operatorname{Vol}(V_b)}\right) w(V_a,V_b) = \sum_{a=1}^{k}\frac{w(V_a,\bar V_a)}{\operatorname{Vol}(V_a)} = k-\sum_{a=1}^{k}\frac{w(V_a,V_a)}{\operatorname{Vol}(V_a)}.$$

Here, $w(V_a,V_b)=\sum_{i\in V_a}\sum_{j\in V_b} w_{ij}$ is the weighted cut between the cluster pairs. Since $\sum_{i=1}^{k-1}\lambda_i$ is the overall minimum of $Q_k$ (under the orthogonality constraints) and $f_k(G)$ is the minimum over partition vectors (having stepwise constant coordinates over the parts of $P_k$), the relation

(3) $\sum_{i=1}^{k-1}\lambda_i \le f_k(G)$

is easy to prove. This estimate is sharper if the eigensubspace spanned by the corresponding eigenvectors is closer to that of the partition vectors in the convenient $k$-partition of the vertices, the one produced by the weighted $k$-means algorithm. Therefore, $S_k^2(X^*_{k-1})$ indicates the quality of the $k$-clustering based on the $k-1$ bottom eigenvectors (excluding the trivial one). Later, it will be used that neither $Q_k$ nor $S_k^2(X^*_{k-1})$ is affected by the orientation of the orthonormal eigenvectors.
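A short sketch of evaluating the $k$-way normalized cut $f(P_k,G)$ of a given partition follows, using one of the equivalent forms of the displayed formula; the weight matrix and the label vector are assumed inputs.

```python
import numpy as np

def normalized_cut(W, labels):
    deg = W.sum(axis=1)
    value = 0.0
    for a in np.unique(labels):
        in_a = labels == a
        vol_a = deg[in_a].sum()                  # Vol(V_a)
        w_aa = W[np.ix_(in_a, in_a)].sum()       # within-cluster weight w(V_a, V_a)
        value += (vol_a - w_aa) / vol_a          # w(V_a, complement of V_a) / Vol(V_a)
    return value
```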

In [8], it is proved that $S_2^2(X^*_1) \le \lambda_1/\lambda_2$, so the larger the gap after the first positive eigenvalue of $L_D$, the sharper the estimate in (3) is. Here, this statement is generalized to the gap between $\lambda_{k-1}$ and $\lambda_k$, but instead of $k$ clusters, we must consider $2^{k-1}$ of them, which are based on $(k-1)$-dimensional vertex representatives. This contrasts with the typical situation in the literature (see, e.g., [20,21,22]), where the number of clusters is the same as the number of eigenvectors used for the classification. The message of Theorem 1 of Section 3 is that the number of clusters is much higher in the generic case than the dimension of the representatives, at least in the minimum multiway cut problems. With the discrepancy objective, the famous Szemerédi regularity lemma [16] also suggests this phenomenon.

Now we consider the discrepancy view. The theory can be better illustrated by considering rectangular arrays of nonnegative entries; adjacency matrices of simple, edge-weighted, and directed graphs are special cases of such matrices.

In many applications, for example, when microarrays are analyzed, the data are collected in the form of an $m\times n$ rectangular matrix $C=(c_{ij})$ of nonnegative real entries. (If the entries are integer frequency counts, then the array is called a contingency table in statistics.) We assume that $C$ is nondegenerate, i.e., $CC^T$ (when $m\le n$) or $C^TC$ (when $m>n$) is irreducible. Consequently, the row-sums $d_{\mathrm{row},i}=\sum_{j=1}^{n} c_{ij}$ and column-sums $d_{\mathrm{col},j}=\sum_{i=1}^{m} c_{ij}$ of $C$ are strictly positive, and the diagonal matrices $D_{\mathrm{row}}=\operatorname{diag}(d_{\mathrm{row},1},\dots,d_{\mathrm{row},m})$ and $D_{\mathrm{col}}=\operatorname{diag}(d_{\mathrm{col},1},\dots,d_{\mathrm{col},n})$ are regular. Without loss of generality, we also assume that $\sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}=1$, since neither our main object, the normalized contingency table

(4) $C_D = D_{\mathrm{row}}^{-1/2}\, C\, D_{\mathrm{col}}^{-1/2},$

nor the multiway discrepancies to be introduced are affected by any uniform scaling of the entries of $C$. It is known that the singular values of $C_D$ are in the $[0,1]$ interval. The positive singular values, enumerated in nonincreasing order, are the real numbers denoted by

$$1 = s_0 > s_1 \ge \cdots \ge s_{r-1} > 0,$$

where $r = \operatorname{rank}(C_D) = \operatorname{rank}(C)$. Provided $C$ is nondegenerate, 1 is a single singular value; it is called the trivial singular value and is denoted by $s_0$ since it corresponds to the trivial pair of singular vectors, which are disregarded in clustering problems. This is a well-known fact of correspondence analysis; for further explanation, see [13,23] and the subsequent paragraph.

For a given integer $1\le k\le\min\{m,n\}$, we are looking for $k$-dimensional representatives $r_1,\dots,r_m\in\mathbb{R}^k$ of the rows and $q_1,\dots,q_n\in\mathbb{R}^k$ of the columns of $C$ such that they minimize the energy function

(5) $Q_k = \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}\,\|r_i - q_j\|^2$

subject to

(6) $\sum_{i=1}^{m} d_{\mathrm{row},i}\, r_i r_i^T = I_k \quad\text{and}\quad \sum_{j=1}^{n} d_{\mathrm{col},j}\, q_j q_j^T = I_k.$

When minimized, the objective function $Q_k$ favors a $k$-dimensional placement of the rows and columns such that representatives of columns and rows with large association ($c_{ij}$) are close to each other. It is easy to prove that the minimum is obtained by the singular value decomposition (SVD)

(7) $C_D = \sum_{k=0}^{r-1} s_k v_k u_k^T,$

where $r\le\min\{m,n\}$ is the rank of $C_D$. The constrained minimum of $Q_k$ is $2k - 2\sum_{i=0}^{k-1} s_i$, and it is attained with row- and column-representatives that are the row vectors of the matrices $D_{\mathrm{row}}^{-1/2}(v_0,v_1,\dots,v_{k-1})$ and $D_{\mathrm{col}}^{-1/2}(u_0,u_1,\dots,u_{k-1})$, respectively.
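A minimal sketch of this correspondence-analysis step, assuming a dense nonnegative array C with positive margins (function and variable names are illustrative only). The trivial singular vector pair (column 0 of the SVD) is dropped, so the returned representatives are $(k-1)$-dimensional, ready for the weighted k-means step.

```python
import numpy as np

def ca_representatives(C, k):
    d_row = C.sum(axis=1)
    d_col = C.sum(axis=0)
    C_D = C / np.sqrt(np.outer(d_row, d_col))        # D_row^{-1/2} C D_col^{-1/2}
    V, s, Ut = np.linalg.svd(C_D, full_matrices=False)
    # column 0 carries the trivial singular value s_0 = 1; columns 1..k-1 are kept
    R = V[:, 1:k] / np.sqrt(d_row)[:, None]          # row representatives (one per row of C)
    Q = Ut.T[:, 1:k] / np.sqrt(d_col)[:, None]       # column representatives
    return R, Q, s
```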

Note that if the entries of C are frequency counts and their sum (N) is the sample size, then the χ2 statistic, which measures the deviation from independence, is

(8) $\chi^2 = N \sum_{i=1}^{r-1} s_i^2.$

If the χ2 test based on this statistic indicates significant deviance from independence (i.e., from the rank 1 approximation of C), then one may look for the optimal rank k approximation (1<k<r=rank(C)), which is constructed using the first k singular vector pairs.

When bi-clustering the rows and columns of $C$, one may also look for subtables close to independent ones. This is measured by the discrepancy. In [10], the multiway discrepancy of the rectangular matrix $C$ of nonnegative entries in the proper $k$-partition $R_1,\dots,R_k$ of its rows and $C_1,\dots,C_k$ of its columns is defined as follows:

(9) $\mathrm{md}(C; R_1,\dots,R_k, C_1,\dots,C_k) = \max_{1\le a,b\le k}\ \max_{X\subset R_a,\, Y\subset C_b} \dfrac{|c(X,Y) - \rho(R_a,C_b)\operatorname{Vol}(X)\operatorname{Vol}(Y)|}{\sqrt{\operatorname{Vol}(X)\operatorname{Vol}(Y)}} = \max_{1\le a,b\le k}\ \max_{X\subset R_a,\, Y\subset C_b} |\rho(X,Y) - \rho(R_a,C_b)|\,\sqrt{\operatorname{Vol}(X)\operatorname{Vol}(Y)},$

where $c(X,Y)=\sum_{i\in X}\sum_{j\in Y} c_{ij}$ is the cut between $X\subset R_a$ and $Y\subset C_b$, $\operatorname{Vol}(X)=\sum_{i\in X} d_{\mathrm{row},i}$ is the volume of the row-subset $X$, and $\operatorname{Vol}(Y)=\sum_{j\in Y} d_{\mathrm{col},j}$ is the volume of the column-subset $Y$, whereas $\rho(R_a,C_b)=\dfrac{c(R_a,C_b)}{\operatorname{Vol}(R_a)\operatorname{Vol}(C_b)}$ denotes the relative density between $R_a$ and $C_b$. The minimum $k$-way discrepancy of $C$ itself is

$$\mathrm{md}_k(C) = \min_{\substack{R_1,\dots,R_k\\ C_1,\dots,C_k}} \mathrm{md}(C; R_1,\dots,R_k, C_1,\dots,C_k).$$

In [10], the following is proved:

$$s_k \le 9\,\mathrm{md}_k(C)\,\big(k + 2 - 9k \ln \mathrm{md}_k(C)\big),$$

provided $0<\mathrm{md}_k(C)<1$. In the forward direction, the following is established in [13]. Given the $m\times n$ contingency table $C$, consider the spectral clusters $R_1,\dots,R_k$ of its rows and $C_1,\dots,C_k$ of its columns, obtained by applying the weighted $k$-means algorithm to the $(k-1)$-dimensional row- and column-representatives. Let $S^2_{k,\mathrm{row}}$ and $S^2_{k,\mathrm{col}}$ denote the minima of the weighted $k$-means algorithm, applied to the rows and columns, respectively. Then, under some balancing conditions for the margins and for the cluster sizes, $\mathrm{md}_k(C)\le B\big(2k\,(S_{k,\mathrm{row}}+S_{k,\mathrm{col}})+s_k\big)$, where $B$ is a constant that depends only on the balancing conditions and does not depend on $m$ and $n$. Roughly speaking, the two directions together imply that if $s_k$ is "small" and "much smaller" than $s_{k-1}$, then one may expect a simultaneous $k$-clustering of the rows and columns of $C$ with small $k$-way discrepancy. This is the case for generalized random and quasirandom graphs [17].

This notion can be extended to an edge-weighted graph $G$ and is denoted by $\mathrm{md}_k(G)$. In that setup, $C$ plays the role of the weighted adjacency matrix. Here, the singular values of the normalized adjacency matrix are the absolute values of its eigenvalues, enumerated in decreasing order.

At this point, we introduce some new matrices, originally defined by physicists (see [15]). The modularity matrix of an edge-weighted graph $G$ is defined as $M = W - dd^T$, where the entries of $W$ sum to 1. The normalized modularity matrix of $G$ (see [24]) is

$$M_D = D^{-1/2} M D^{-1/2} = D^{-1/2} W D^{-1/2} - \sqrt{d}\sqrt{d}^T = W_D - \sqrt{d}\sqrt{d}^T.$$

The normalized modularity matrix is the normalized edge-weight matrix deprived of the trivial dyad. Obviously, $L_D = I - W_D = I - M_D - \sqrt{d}\sqrt{d}^T$; see [13].

Therefore, the $k-1$ largest singular values of $M_D$ are the absolute values of the $k-1$ largest absolute value eigenvalues of $W_D$ (except the trivial 1). These, in turn, are 1 minus the $k-1$ positive eigenvalues of $L_D$ that are farthest from 1. If those are all less than 1, then these singular values are $1-\lambda_1,\dots,1-\lambda_{k-1}$. In this case, the regularity-based spectral clustering boils down to the minimum cut objective.
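The following short sketch (names are illustrative) forms the normalized modularity matrix from a symmetric weight matrix; its singular values can then be compared with $|1-\lambda_i|$ for the nontrivial normalized Laplacian eigenvalues.

```python
import numpy as np

def normalized_modularity(W):
    W = W / W.sum()                           # make the entries of W sum to 1
    d = W.sum(axis=1)                         # generalized degrees
    W_D = W / np.sqrt(np.outer(d, d))         # D^{-1/2} W D^{-1/2}
    return W_D - np.outer(np.sqrt(d), np.sqrt(d))   # M_D = W_D - sqrt(d) sqrt(d)^T
```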

Otherwise, for a fixed integer $1<k<n$, in the modularity-based spectral clustering, we look for the proper $k$-partition $V_1,\dots,V_k$ of the vertices such that the within- and between-cluster discrepancies are minimized. To motivate the introduction of the exact discrepancy measure, observe that the $(i,j)$ entry of $M$ is $w_{ij}-d_id_j$, which is the difference between the actual edge-weight between the vertices $i$ and $j$ and the edge-weight expected under independent attachment of them with probabilities $d_i$ and $d_j$, respectively. Consequently, the difference between the actual and the expected total edge-weight between the subsets $X,Y\subset V$ is

$$\sum_{i\in X}\sum_{j\in Y}(w_{ij}-d_id_j) = w(X,Y) - \operatorname{Vol}(X)\operatorname{Vol}(Y).$$

A directed edge-weighted graph $G=(V,W)$ is described by its square, but usually not symmetric, weighted adjacency matrix $W=(w_{ij})$ of zero diagonal, where $w_{ij}$ is the nonnegative weight of the $j\to i$ edge ($i\ne j$). The row-sums $d_{\mathrm{in},i}=\sum_{j=1}^{n} w_{ij}$ and column-sums $d_{\mathrm{out},j}=\sum_{i=1}^{n} w_{ij}$ of $W$ are the in- and out-degrees, while $D_{\mathrm{in}}=\operatorname{diag}(d_{\mathrm{in},1},\dots,d_{\mathrm{in},n})$ and $D_{\mathrm{out}}=\operatorname{diag}(d_{\mathrm{out},1},\dots,d_{\mathrm{out},n})$ are the diagonal in- and out-degree matrices. The multiway discrepancy of the directed, edge-weighted graph $G=(V,W)$ in the in-clustering $V_{\mathrm{in},1},\dots,V_{\mathrm{in},k}$ and out-clustering $V_{\mathrm{out},1},\dots,V_{\mathrm{out},k}$ of its vertices is

$$\mathrm{md}(G; V_{\mathrm{in},1},\dots,V_{\mathrm{in},k}, V_{\mathrm{out},1},\dots,V_{\mathrm{out},k}) = \max_{1\le a,b\le k}\ \max_{X\subset V_{\mathrm{in},a},\, Y\subset V_{\mathrm{out},b}} \frac{|w(X,Y) - \rho(V_{\mathrm{in},a},V_{\mathrm{out},b})\operatorname{Vol}_{\mathrm{in}}(X)\operatorname{Vol}_{\mathrm{out}}(Y)|}{\sqrt{\operatorname{Vol}_{\mathrm{in}}(X)\operatorname{Vol}_{\mathrm{out}}(Y)}},$$

where $w(X,Y)=\sum_{i\in X}\sum_{j\in Y} w_{ij}$ is the sum of the corresponding edge-weights, whereas $\operatorname{Vol}_{\mathrm{in}}(X)=\sum_{i\in X} d_{\mathrm{in},i}$ and $\operatorname{Vol}_{\mathrm{out}}(Y)=\sum_{j\in Y} d_{\mathrm{out},j}$ are the in- and out-volumes, respectively. The minimum $k$-way discrepancy of the directed edge-weighted graph $G=(V,W)$ is

$$\mathrm{md}_k(G) = \min_{\substack{V_{\mathrm{in},1},\dots,V_{\mathrm{in},k}\\ V_{\mathrm{out},1},\dots,V_{\mathrm{out},k}}} \mathrm{md}(G; V_{\mathrm{in},1},\dots,V_{\mathrm{in},k}, V_{\mathrm{out},1},\dots,V_{\mathrm{out},k}).$$

In [10], it is proved that

$$s_k \le 9\,\mathrm{md}_k(G)\,\big(k + 2 - 9k\ln \mathrm{md}_k(G)\big),$$

where $s_k$ is the $k$th largest nontrivial singular value of the normalized weighted adjacency matrix $W_D = D_{\mathrm{in}}^{-1/2} W D_{\mathrm{out}}^{-1/2}$. In Section 4, we apply the SVD of $W_D$ to find migration patterns, i.e., emigration and immigration trait clusters.
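The directed version of the representation step can be sketched as follows (again only an illustrative sketch with assumed names): the weight matrix is normalized by in- and out-degrees, and the left/right singular vectors give the in-clustering (row) and out-clustering (column) representatives used in Section 4.

```python
import numpy as np

def inout_representatives(W, k):
    d_in = W.sum(axis=1)                          # row-sums: in-degrees
    d_out = W.sum(axis=0)                         # column-sums: out-degrees
    W_D = W / np.sqrt(np.outer(d_in, d_out))      # D_in^{-1/2} W D_out^{-1/2}
    V, s, Ut = np.linalg.svd(W_D, full_matrices=False)
    R_in = V[:, 1:k] / np.sqrt(d_in)[:, None]       # (k-1)-dimensional row (in) representatives
    R_out = Ut.T[:, 1:k] / np.sqrt(d_out)[:, None]  # (k-1)-dimensional column (out) representatives
    return R_in, R_out, s
```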

3 Mapping the Fiedler-carpet: more clusters than eigenvectors

Theorem 1

Let $G=(V,W)$ be a connected edge-weighted graph with generalized degrees $d_1,\dots,d_n$, and assume that $\sum_{i=1}^{n} d_i=1$. Let $0=\lambda_0<\lambda_1\le\cdots\le\lambda_{n-1}\le 2$ denote the eigenvalues of the normalized Laplacian matrix $L_D$ of $G$. Then, for the weighted $2^{k-1}$-variance of the optimal $(k-1)$-dimensional vertex representatives, comprising the row vectors of the matrix $X^*_{k-1}$, the following upper estimate holds:

$$S^2_{2^{k-1}}(X^*_{k-1}) \le \frac{\sum_{j=1}^{k-1}\lambda_j}{\lambda_k},$$

provided $\lambda_{k-1}<\lambda_k$.

Proof

Recall that, with the notation of Section 2, $X^*_{k-1}=(D^{-1/2}u_1,\dots,D^{-1/2}u_{k-1})$, where the trivial $D^{-1/2}u_0=\mathbf{1}$ vector is disregarded, and $u_0,u_1,\dots,u_{k-1}$ are unit-norm, pairwise orthogonal eigenvectors corresponding to the eigenvalues $0=\lambda_0<\lambda_1\le\cdots\le\lambda_{k-1}$ of $L_D$, respectively. As the trivial eigenvector is disregarded, we only use the coordinates of the vectors $x_j := D^{-1/2}u_j=(x_{j1},\dots,x_{jn})^T$ for $j=1,\dots,k-1$.

Since $u_1,\dots,u_{k-1}$ form an orthonormal system and they are orthogonal to the $u_0=\sqrt{d}$ vector, for the coordinates of $x_j$, the following relations hold:

(10) $\sum_{i=1}^{n} d_i x_{ji}=0, \qquad \sum_{i=1}^{n} d_i x_{ji}^2=1 \quad (j=1,\dots,k-1), \qquad \sum_{i=1}^{n} d_i x_{ji}x_{li}=0 \quad (j\ne l).$

Now we will find a "witness," i.e., a vector $y=(y_1,\dots,y_n)^T$ such that the conditions

(11) $\sum_{i=1}^{n} d_i y_i = 0$

and

(12) $\sum_{i=1}^{n} d_i x_{ji} y_i = 0, \qquad j=1,\dots,k-1$

hold. Moreover, we will find $y$ with coordinates of the following form:

(13) $y_i := \sum_{j=1}^{k-1} |x_{ji}-a_j| - b, \qquad i=1,\dots,n,$

where $a_1,\dots,a_{k-1}$ and $b$ are appropriate real numbers. We will show that there exist such real numbers so that the $y_i$'s defined by them satisfy conditions (11) and (12).

Indeed, when we already have $a_1,\dots,a_{k-1}$, the aforementioned conditions together with $\sum_{i=1}^{n} d_i=1$ yield

(14) $b = \sum_{j=1}^{k-1} b_j, \qquad\text{where}\quad b_j = \sum_{i=1}^{n} d_i |x_{ji}-a_j|, \quad j=1,\dots,k-1.$

With this choice of $b$, the fulfillment of (12) means that for $j=1,\dots,k-1$:

$$\sum_{i=1}^{n} d_i x_{ji} y_i = \sum_{i=1}^{n} d_i x_{ji}\sum_{l=1}^{k-1}\big(|x_{li}-a_l| - b_l\big) = 0.$$

But (10) implies that

$$\sum_{i=1}^{n} d_i x_{ji} b_l = 0$$

for $l=1,\dots,k-1$. This provides the following system of equations for $a_1,\dots,a_{k-1}$:

(15) $f_j := \sum_{i=1}^{n} d_i x_{ji}\sum_{l=1}^{k-1}|x_{li}-a_l| = 0, \qquad j=1,\dots,k-1.$

We are looking for a root of the function $f=(f_1,\dots,f_{k-1}): \mathbb{R}^{k-1}\to\mathbb{R}^{k-1}$ with piecewise linear coordinate functions. To prove that $f$ has a root, we will use the multi-dimensional generalization of the Bolzano theorem: a continuous map between two normed metric spaces of the same dimension takes a connected set into a connected one. Because of symmetry considerations, the range contains the origin; see Appendix A for further details.

Now let us define the two cluster centers for the $j$th coordinates by

$$c_{j1} = a_j - b_j \qquad\text{and}\qquad c_{j2} = a_j + b_j.$$

Observe that

$$|x_{ji}-a_j| - b_j = \begin{cases} c_{j1} - x_{ji} & \text{if } x_{ji} < a_j,\\ x_{ji} - c_{j2} & \text{if } x_{ji} \ge a_j; \end{cases}$$

therefore,

(16) $\big|\,|x_{ji}-a_j| - b_j\,\big| = \min\{|x_{ji}-c_{j1}|,\ |x_{ji}-c_{j2}|\}$

holds for $i=1,\dots,n$; $j=1,\dots,k-1$. The possible combinations of $c_{j1}$ and $c_{j2}$ over $j=1,\dots,k-1$ form $2^{k-1}$ centers in $k-1$ dimensions.

Let

$$\sigma^2(y) = \sum_{i=1}^{n} d_i y_i^2$$

be the variance of the coordinates of $y$ with respect to the discrete measure $d_1,\dots,d_n$. Due to (16), $\sigma^2(y) \ge S^2_{2^{k-1}}(X^*_{k-1})$. Define the normalized version of $y$, i.e., the vector $z\in\mathbb{R}^n$ with coordinates

$$z_i = \frac{y_i}{\sigma(y)}, \qquad i=1,\dots,n;$$

obviously, $\sum_{i=1}^{n} d_i z_i^2 = 1$. Then,

$$\max_{i\ne m}\frac{|z_i - z_m|}{\sum_{j=1}^{k-1}|x_{ji}-x_{jm}|} \le \frac{1}{\sigma(y)},$$

since, by the definition of $y_i$, the relation

$$|y_i - y_m| \le \sum_{j=1}^{k-1}|x_{ji}-x_{jm}| \qquad (i\ne m)$$

holds.

Let $\tilde X_k=(X^*_{k-1},z)=(x_1,\dots,x_{k-1},z)$ be the $n\times k$ matrix containing valid $k$-dimensional representatives $\tilde r_1,\dots,\tilde r_n$ of the vertices in its rows; recall that the $n\times k$ matrix $X^*_k=(x_1,\dots,x_{k-1},D^{-1/2}u_k)$ contains the optimal $k$-dimensional representatives in its rows. Observe that $\tilde X_k$ and $X^*_k$ differ only in their last columns. Let $r'_i$ denote the vector comprised of the first $k-1$ coordinates of $\tilde r_i$, $i=1,\dots,n$. These are the optimal $(k-1)$-dimensional representatives of the vertices. By the optimality of the $k$-dimensional representation and using equation (1),

$$\frac{\lambda_1+\cdots+\lambda_k}{\lambda_1+\cdots+\lambda_{k-1}} = \frac{\operatorname{tr}(X_k^{*T} L X_k^*)}{\operatorname{tr}(X_{k-1}^{*T} L X_{k-1}^*)} \le \frac{\operatorname{tr}(\tilde X_k^{T} L \tilde X_k)}{\operatorname{tr}(X_{k-1}^{*T} L X_{k-1}^*)} = \frac{\sum_{i=1}^{n-1}\sum_{m=i+1}^{n} w_{im}\|\tilde r_i - \tilde r_m\|^2}{\sum_{i=1}^{n-1}\sum_{m=i+1}^{n} w_{im}\|r'_i - r'_m\|^2} = \frac{\sum_{i=1}^{n-1}\sum_{m=i+1}^{n} w_{im}\big[\|r'_i - r'_m\|^2 + (z_i - z_m)^2\big]}{\sum_{i=1}^{n-1}\sum_{m=i+1}^{n} w_{im}\|r'_i - r'_m\|^2} \le 1 + \max_{i\ne m}\frac{(z_i - z_m)^2}{\|r'_i - r'_m\|^2} \le 1 + \frac{1}{\sigma^2(y)} \le 1 + \frac{1}{S^2_{2^{k-1}}(X^*_{k-1})},$$

which, by subtracting 1 from both the left- and right-hand sides and taking the reciprocals, finishes the proof.□

Note that only if $\lambda_{k-1}<\lambda_k$ are $u_k$ and $x_k$ separated from the eigenspaces belonging to $\lambda_1,\dots,\lambda_{k-1}$. Theorem 1 indicates the following clustering property of the $(k-1)$st and $k$th smallest normalized Laplacian eigenvalues: the greater the gap between them, the better the optimal $(k-1)$-dimensional representatives of the vertices can be classified into $2^{k-1}$ clusters.

Figure 1 shows a graph with three well-separated clusters, as well as the image of the corresponding map $f:\mathbb{R}^2\to\mathbb{R}^2$ of the Fiedler-carpet as used in the proof of Theorem 1.

Figure 1: A graph with three well-separated clusters, and the image of its Fiedler-carpet.

The image of $f$ contains the origin, and we can find by inspection a pair $(a_1,a_2)$ for which $f(a_1,a_2)$ is approximately zero; namely, choosing $a_1=0.19099$ and $a_2=0.35688$ gives $f(a_1,a_2)\approx(0.000002,\ 0.00000001)$. The two-dimensional representative of vertex $i$ is the point $(x_{1i},x_{2i})$ for $i=1,\dots,n$; these are plotted in Figure 2, where the colors indicate the cluster memberships and the black point denotes the approximate root of $f$.
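Such an approximate root can also be located numerically. The following is a hedged sketch (not the authors' code): it evaluates the piecewise linear map $f$ of equation (15) and hands it to a generic root finder; X is assumed to hold the vectors $x_1,\dots,x_{k-1}$ as rows and d the generalized degrees.

```python
import numpy as np
from scipy.optimize import fsolve

def fiedler_map(a, X, d):
    """f_j(a) = sum_i d_i x_{ji} sum_l |x_{li} - a_l|, for j = 1, ..., k-1."""
    s = np.abs(X - a[:, None]).sum(axis=0)    # s_i = sum_l |x_{li} - a_l|
    return X @ (d * s)

# Example usage (X, d assumed available; a starting point such as the origin can be tried):
# root = fsolve(lambda a: fiedler_map(a, X, d), x0=np.zeros(X.shape[0]))
```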

Figure 2: Two-dimensional vertex representatives of the graph of Figure 1. The black point denotes the approximate root of f.

Some remarks are in order:

  1. Theorem 1 is the generalization of Theorem 2.2.3 of [8]. In the $k=2$ case, $2^{k-1}=k$, but in general, the number of clusters is much larger than that of the relevant eigenvectors.

  2. The statement of the theorem has relevance, since for any $k>0$, the relation $\sum_{j=1}^{k-1}\lambda_j < (k-1)\lambda_k$ holds; but by analysis of variance considerations, $S^2_{2^{k-1}} \le k-1$ also holds. In particular, when $k=2$, only one eigenvector is used for the representation. The total inertia of the coordinates is 1, and it can be divided into the sum of nonnegative within- and between-cluster inertias. The within-cluster inertia is the sum of the inner variances of the two clusters, which is $S_2^2$, and it is at most 1. (Here, the variances are calculated with respect to the discrete distribution $d_1,\dots,d_n$.) When $(k-1)$-dimensional representatives are used, the total inertia is $\operatorname{tr}(X^{*T}_{k-1} D X^*_{k-1}) = \operatorname{tr}(I_{k-1}) = k-1$, and the sum of the inner variances is again at most $k-1$, but it is further bounded by $\sum_{j=1}^{k-1}\lambda_j/\lambda_k$ by Theorem 1.

  3. The vector $u_1$ is the analogue of the Fiedler-vector of the (non-normalized) Laplacian. Here, we use the ensemble of the first $k-1$ transformed eigenvectors of the normalized Laplacian together, the so-called Fiedler-carpet.

4 Applications

We investigated the international migrant stock by country of origin and destination in the years 2015 and 2019. The focus is on 41 European countries plus the United States of America and Canada. The data[1] are based on the officially registered migrant numbers, where columns and rows correspond to the country of origin and the country of destination, respectively.

Here, the square, but not symmetric, edge-weight matrix contains the weights of bidirected edges (the diagonal is zero): the $(i,j)$ entry is the number of persons going from country $j$ to country $i$. Via the SVD of the normalized table, we can find emigration (column) and immigration (row) clusters between which the migration is as homogeneous as possible (in terms of discrepancy).

Both for the 2015 and 2019 data, there was a gap after four nontrivial singular values, and therefore, the corresponding four singular vector pairs were used to find five emigration and immigration trait clusters.

  • Singular values, 2015:

    1, 0.79098, 0.71857, 0.67213, 0.56862, 0.45293, 0.40896, 0.38178, 0.36325, 0.34785, 0.32648, 0.31769, 0.2996, 0.27927, 0.26566, 0.24718, 0.22638, 0.20632, 0.18349, 0.1651, 0.14384, 0.1359, 0.12721, 0.12092, 0.11816, 0.10374, 0.09545, 0.08278, 0.0738, 0.06371, 0.05673, 0.04553, 0.03488, 0.03107, 0.02967, 0.02693, 0.01557, 0.00788, 0.00584, 0.00519, 0.00191, 0.0017, 0.00099.

  • Singular values, 2019:

    1, 0.77844, 0.70989, 0.65059, 0.55122, 0.43612, 0.39512, 0.36194, 0.3558, 0.33882, 0.32174, 0.30719, 0.29601, 0.28181, 0.26865, 0.259, 0.22421, 0.1917, 0.17988, 0.1516, 0.13671, 0.13243, 0.12397, 0.11542, 0.10598, 0.09216, 0.08889, 0.07958, 0.06835, 0.06154, 0.05377, 0.04412, 0.03436, 0.03124, 0.02899, 0.02745, 0.01507, 0.00814, 0.00619, 0.0051, 0.00216, 0.00129, 0.00089.

In the discrepancy-based spectral clustering, the number of clusters is one more than the number of structural singular values (excluding the trivial 1). In our case, the number of clusters was five. The five emigration and immigration trait clusters for 2015 are shown in Tables 1 and 2, whereas those for 2019 are shown in Tables 3 and 4. Figure 3(a) and (b) illustrate the immigration–emigration cluster-pairs with the countries rearranged by their cluster memberships. The frequency counts are represented with light to dark squares.

Table 1: Country memberships of emigration trait clusters, 2015

Cluster 1: Austria, Belgium, Bulgaria, Canada, Czechia, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Monaco, Netherlands, North Macedonia, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Spain, Sweden, Switzerland, United Kingdom, United States of America
Cluster 2: Bosnia and Herzegovina, Croatia, Montenegro, Slovenia
Cluster 3: Albania
Cluster 4: Belarus, Estonia, Republic of Moldova, Ukraine
Cluster 5: Russian Federation
Table 2: Country memberships of immigration trait clusters, 2015

Cluster 1: Albania, Austria, Belgium, Bulgaria, Canada, Czechia, Denmark, Finland, France, Germany, Hungary, Iceland, Ireland, Liechtenstein, Luxembourg, Malta, Monaco, Netherlands, Norway, Portugal, Romania, Slovakia, Spain, Sweden, Switzerland, United Kingdom, United States of America
Cluster 2: Greece, Italy, North Macedonia
Cluster 3: Bosnia and Herzegovina, Croatia, Montenegro, Serbia, Slovenia
Cluster 4: Belarus, Estonia, Latvia, Lithuania, Ukraine
Cluster 5: Poland, Republic of Moldova, Russian Federation
Table 3: Country memberships of emigration trait clusters, 2019

Cluster 1: Austria, Belgium, Bulgaria, Canada, Czechia, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Monaco, Netherlands, North Macedonia, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Spain, Sweden, Switzerland, United Kingdom, United States of America
Cluster 2: Bosnia and Herzegovina, Croatia, Montenegro, Slovenia
Cluster 3: Albania
Cluster 4: Belarus, Estonia, Republic of Moldova, Ukraine
Cluster 5: Russian Federation
Table 4: Country memberships of immigration trait clusters, 2019

Cluster 1: Albania, Austria, Belgium, Bulgaria, Canada, Czechia, Denmark, Finland, France, Germany, Hungary, Iceland, Ireland, Liechtenstein, Luxembourg, Malta, Monaco, Netherlands, Norway, Portugal, Romania, Slovakia, Spain, Sweden, Switzerland, United Kingdom, United States of America
Cluster 2: Bosnia and Herzegovina, Croatia, Montenegro, Serbia, Slovenia
Cluster 3: Greece, Italy, North Macedonia
Cluster 4: Poland, Republic of Moldova, Russian Federation
Cluster 5: Belarus, Estonia, Latvia, Lithuania, Ukraine
Figure 3: Immigration–emigration cluster-pairs with the countries rearranged by their cluster memberships. The frequency counts are represented with light to dark squares. (a) 2015 and (b) 2019.

In 2015, by the χ² test, we found the smallest discrepancy between emigration trait cluster number 2 and immigration trait cluster number 4, i.e., the subtable formed by them was close to an independent table of rank 1. The clusters were similar in the two years; in 2019, the smallest discrepancy was found between emigration trait cluster number 2 and immigration trait cluster number 5, i.e., between the Balkan and the Baltic countries in both years. Correspondence analysis results with cluster memberships are found in Appendix C.

Here, we wanted to find cluster pairs that are as homogeneous as possible with respect to the emigration–immigration patterns. Our graph was a relatively small and sparse one, so Figure 3(a) and (b) show one large and some small clusters for both emigration and immigration. However, the emigration–immigration cluster pair with the smallest discrepancy is spotted by the χ² test. In larger and denser networks, e.g., in metabolic networks with bidirected edges between the vertices (enzymes), where the weights represent the intensities of chemical reactions, so-called autocatalytic subnetworks could be found with our discrepancy-based method.

5 Summary and author contributions

Spectral clustering is a collection of methods for clustering data points or vertices of a graph based on combinatorial criteria with spectral relaxation. Here, we generalize spectral clustering in several ways:

  1. Instead of simple or edge-weighted graphs, we consider directed graphs and rectangular arrays of nonnegative entries. For these, the method of correspondence analysis gives a low-dimensional representation of the row and column items by means of the SVD of the normalized table. This is applied to real-life data.

  2. Instead of the usual multiway minimum cut objective, we consider discrepancy minimization, for which a near optimal solution is given by the k-means algorithm applied to the row and column representatives, where the number of clusters (k) is concluded from the gap in the singular spectrum. There are theoretical results supporting this. Eventually, the near optimal clusters can be refined to decrease discrepancy.

  3. The number of clusters can be larger than the number of eigenvectors entered in the classification. The two are the same only in the case of generalized random or quasirandom graphs, see [17]. In Theorem 1, an exact estimate for the sum of the inner variances of $2^{k-1}$ clusters is given by means of the spectral gap between the $(k-1)$st and $k$th smallest normalized Laplacian eigenvalues, by using the Fiedler-carpet of the corresponding eigenvectors.

  4. When the number of clusters is the same as the number of eigenvectors entered in the classification, Davis-Kahan type subspace perturbation theorems are applicable, as in this case, the k-partition vectors span a k-dimensional subspace. However, if the number of parts is larger than that of the eigenvectors, then finer methods are applicable, like our construction in the proof, using a “witness.”

Acknowledgments

The research was done under the auspices of the Budapest Semesters in Mathematics program, in the framework of an undergraduate online research course in the fall semester of 2021, with the participation of US undergraduate students. Also, Fatma Abdelkhalek's work is related to a scholarship under the Stipendium Hungaricum program between Egypt and Hungary; further, Vilas Winstein's work is related to the DYNASNET (Dynamics and Structure in Networks) Project of the Rényi Institute, Budapest. The paper is dedicated to Gábor Tusnády for his 80th birthday, with whom the first author of this paper posed a conjecture in [8]. Here, the conjecture is proved in the $k>2$ case, albeit with many more clusters than the number of the eigenvalues preceding the spectral gap.

  1. Funding information: The authors acknowledge the contribution of Pro Mathematica and Arte and Fulbright Hungary organizations for the generous donation of funding for the open access fees.

  2. Conflict of interest: Authors state no conflict of interest.

  3. Data Availability Statement: The third-party dataset analyzed during the current study is based on the officially registered migrant numbers of the United Nations, Department of Economic and Social Affairs, Population Division, International Migrant Stock 2019. The data are available at https://www.un.org/en/development/desa/population/migration/data/estimates2/estimates19.asp.

Appendix A

In the $f:\mathbb{R}\to\mathbb{R}$ case: $f=f_1$, $a=a_1$, and with the notation $A=\min_i x_{1i}$, $B=\max_i x_{1i}$, for

$$f(a) = \sum_{i=1}^{n} d_i x_{1i} |x_{1i}-a|$$

we have that $f(A)=1$ and $f(B)=-1$. As $f$ is continuous, it must have a root in $(A,B)$ by the Bolzano theorem. Also note that this root of $f$ is around the median of the coordinates of $x_1$ with respect to the discrete measure $d_1,\dots,d_n$.
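In this one-dimensional case, the sign change $f(A)=1>0>-1=f(B)$ makes a simple bisection sufficient; a small sketch ($x_1$ and $d$ assumed given) follows.

```python
import numpy as np

def root_1d(x1, d, tol=1e-10):
    f = lambda a: np.sum(d * x1 * np.abs(x1 - a))
    lo, hi = x1.min(), x1.max()            # f(lo) = 1 > 0 and f(hi) = -1 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid                       # keep the sign change inside [lo, hi]
        else:
            hi = mid
    return 0.5 * (lo + hi)
```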

Then consider the $f=(f_1,f_2):\mathbb{R}^2\to\mathbb{R}^2$ case. The coordinate functions are piecewise linear, continuous functions. The system of equations is as follows:

(A1) $f_1(a_1,a_2) = \sum_{i=1}^{n} d_i x_{1i}|x_{1i}-a_1| + \sum_{i=1}^{n} d_i x_{1i}|x_{2i}-a_2| = 0,$
$\quad\ \ f_2(a_1,a_2) = \sum_{i=1}^{n} d_i x_{2i}|x_{1i}-a_1| + \sum_{i=1}^{n} d_i x_{2i}|x_{2i}-a_2| = 0.$

With the notation $A=\min_i x_{1i}$, $B=\max_i x_{1i}$, $C=\min_i x_{2i}$, $D=\max_i x_{2i}$, where $A<0$, $B>0$, $C<0$, $D>0$,

$$f(A,C)=(1,1), \quad f(B,D)=(-1,-1), \quad f(A,D)=(1,-1), \quad f(B,C)=(-1,1).$$

Furthermore, $f(x,y)=(1,1)$ if $x\le A$, $y\le C$; $f(x,y)=(-1,-1)$ if $x\ge B$, $y\ge D$; $f(x,y)=(1,-1)$ if $x\le A$, $y\ge D$; and $f(x,y)=(-1,1)$ if $x\ge B$, $y\le C$.

We want to show that $f$ has a root within the rectangle $[A,B]\times[C,D]$. By the multivariate version of the Bolzano theorem, the $f$-map of this rectangle is a connected region in $\mathbb{R}^2$ that contains the points $(1,1)$, $(-1,-1)$, $(1,-1)$, $(-1,1)$ as "corners." We will show that it contains $(0,0)$ too.

Note that together with $u_j$, the vector $-u_j$ is also a unit-norm eigenvector, so instead of $x_j$, we can just as well use $-x_j$ for $j=1,2$, which gives four possible domains of $f$: next to the rectangle $[A,B]\times[C,D]$, the rectangles $[-B,-A]\times[C,D]$, $[A,B]\times[-D,-C]$, and $[-B,-A]\times[-D,-C]$ are also closed, bounded regions, and their $f$-images show symmetry with respect to the coordinate axes. Therefore, it suffices to prove that the map of the union of them contains the origin. In Section 2, we saw that neither the objective function ($Q_k$) nor the clustering is affected by the orientation of the eigenvectors, so the orientation is not denoted in the sequel. Also notice that when counter-orienting $u_1$ and/or $u_2$: if $(a_1,a_2)$ is a root of $f$, then $-a_1$ instead of $a_1$ and/or $-a_2$ instead of $a_2$ will result in a root of $f$ too.

The images are closed, bounded regions (usually not rectangles), but we will show that their opposite sides are parallel curves and sandwich the $f_1$ and $f_2$ axes, respectively. As the $f$-values sweep the region between these boundaries, the total range should contain the origin. Now the above, below, right, and left boundaries are investigated.

  1. Above: Consider the boundary curve between $(1,1)$ and $(-1,1)$. Along that, $a_2=C$ and $A<a_1<B$. Let $H := \{i : x_{1i} > a_1\}$. Then, $H\ne\emptyset$ and $\bar H\ne\emptyset$; further,

    (A2) $f_2(a_1,C) = \sum_{i=1}^{n} d_i x_{2i}|x_{1i}-a_1| + \sum_{i=1}^{n} d_i x_{2i}(x_{2i}-C) = \sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) - \sum_{i\in\bar H} d_i x_{2i}(x_{1i}-a_1) + 1 = 2\sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) + 1 = 2\sum_{i\in\bar H} d_i x_{2i}(a_1-x_{1i}) + 1,$

    where we used conditions (10) repeatedly.

  2. Below: Consider the boundary curve between $(1,-1)$ and $(-1,-1)$. Along that, $a_2=D$ and $A<a_1<B$. Then,

    (A3) $f_2(a_1,D) = \sum_{i=1}^{n} d_i x_{2i}|x_{1i}-a_1| - \sum_{i=1}^{n} d_i x_{2i}(x_{2i}-D) = \sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) - \sum_{i\in\bar H} d_i x_{2i}(x_{1i}-a_1) - 1 = 2\sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) - 1 = 2\sum_{i\in\bar H} d_i x_{2i}(a_1-x_{1i}) - 1.$
  3. Between (horizontally): Consider the case when $a_2=u\in(C,D)$ is fixed and $A<a_1<B$. Then

    $$f_2(a_1,u) = \sum_{i=1}^{n} d_i x_{2i}|x_{1i}-a_1| + \sum_{i=1}^{n} d_i x_{2i}|x_{2i}-u| = \sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) - \sum_{i\in\bar H} d_i x_{2i}(x_{1i}-a_1) + \sum_{i: x_{2i}>u} d_i x_{2i}(x_{2i}-u) + \sum_{i: x_{2i}\le u} d_i x_{2i}(u-x_{2i}) = 2\sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) - 1 + 2\sum_{i: x_{2i}>u} d_i x_{2i}^2 - 2u\sum_{i: x_{2i}>u} d_i x_{2i}.$$

So the $f_2(a_1,u)$ arcs are all parallel to the boundary curves $f_2(a_1,C)$ and $f_2(a_1,D)$ and to each other. In particular,

$$f_2(a_1,0) = 2\sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) - 1 + 2\sum_{i: x_{2i}>0} d_i x_{2i}^2.$$

This arc is either closer to the above or to the below curve (which are at distance 2 from each other), depending on whether $\sum_{i: x_{2i}>0} d_i x_{2i}^2$ is less than or greater than $\frac12$, but it is strictly positive by condition (10). Hence, if $u$ and $u'$ are "close" to 0, then the $f_2(a_1,u)$ and $f_2(a_1,u')$ arcs are not identical; otherwise it can happen that for some $u\ne u'$:

(A4) $\sum_{i: x_{2i}>u} d_i x_{2i}^2 - u\sum_{i: x_{2i}>u} d_i x_{2i} = \sum_{i: x_{2i}>u'} d_i x_{2i}^2 - u'\sum_{i: x_{2i}>u'} d_i x_{2i}.$

The same is true vertically.

  1. Right: Consider the boundary curve between $(1,1)$ and $(1,-1)$. Along that, $a_1=A$ and $C<a_2<D$. Let $F := \{i : x_{2i}>a_2\}$. Then, $F\ne\emptyset$ and $\bar F\ne\emptyset$; further,

    (A5) $f_1(A,a_2) = \sum_{i=1}^{n} d_i x_{1i}(x_{1i}-A) + \sum_{i=1}^{n} d_i x_{1i}|x_{2i}-a_2| = 1 + \sum_{i\in F} d_i x_{1i}(x_{2i}-a_2) - \sum_{i\in\bar F} d_i x_{1i}(x_{2i}-a_2) = 1 + 2\sum_{i\in F} d_i x_{1i}(x_{2i}-a_2) = 1 + 2\sum_{i\in\bar F} d_i x_{1i}(a_2-x_{2i}).$
  2. Left: Consider the boundary curve between $(-1,1)$ and $(-1,-1)$. Along that, $a_1=B$ and $C<a_2<D$. Then,

    (A6) $f_1(B,a_2) = -1 + 2\sum_{i\in F} d_i x_{1i}(x_{2i}-a_2) = -1 + 2\sum_{i\in\bar F} d_i x_{1i}(a_2-x_{2i}).$
  3. Between (vertically): Consider the case when $a_1=v\in(A,B)$ is fixed and $C<a_2<D$. Then,

    $$f_1(v,a_2) = 2\sum_{i\in F} d_i x_{1i}(x_{2i}-a_2) - 1 + 2\sum_{i: x_{1i}>v} d_i x_{1i}^2 - 2v\sum_{i: x_{1i}>v} d_i x_{1i}.$$

So the $f_1(v,a_2)$ arcs are all parallel to the boundary curves $f_1(A,a_2)$ and $f_1(B,a_2)$ and to each other. In particular,

$$f_1(0,a_2) = 2\sum_{i\in F} d_i x_{1i}(x_{2i}-a_2) - 1 + 2\sum_{i: x_{1i}>0} d_i x_{1i}^2.$$

This arc is either closer to the right or to the left curve (which are at distance 2 from each other), depending on whether $\sum_{i: x_{1i}>0} d_i x_{1i}^2$ is less than or greater than $\frac12$, but it is strictly positive by condition (10). For this reason, if $v$ and $v'$ are "close" to 0, then the $f_1(v,a_2)$ and $f_1(v',a_2)$ arcs are not identical; otherwise it can happen that for some $v\ne v'$:

(A7) $\sum_{i: x_{1i}>v} d_i x_{1i}^2 - v\sum_{i: x_{1i}>v} d_i x_{1i} = \sum_{i: x_{1i}>v'} d_i x_{1i}^2 - v'\sum_{i: x_{1i}>v'} d_i x_{1i}.$

Therefore, any grid on the rectangle of the domain (its horizontal and vertical lines parallel to the $a_1$ and $a_2$ axes) is mapped by $f$ onto a lattice with horizontal and vertical, parallel arcs. This proves that $f$ is one-to-one whenever these arcs are not identical. The possible inconvenient phenomenon, when equations (A4) and (A7) hold for some $u\ne u'$ and $v\ne v'$ pairs, is experienced at the dark parts of Figure A1 near the boundaries. However, $f$ is injective in a neighborhood of the origin that does not contain any of the finitely many eigenvector coordinates (because there it has purely linear coordinate functions). This also depends on the underlying graph: if it shows symmetries, then its weighted Laplacian has multiple eigenvalues and/or repeated coordinates of the eigenvectors, which may cause complications.

Figure A1: Images of the four possible orientations of the three-dimensional Fiedler-carpet of the graph in Figure A2.

To prove that the above–below boundaries sandwich the $f_1$ axis and the right–left boundaries sandwich the $f_2$ axis, the following investigations are made. Because the investigations are of a similar vein, only the first of them will be discussed in detail. We distinguish between eight cases (denoted by acronyms), depending on which half of which boundary is considered. The estimates are supported by specific orientations of the unit-norm eigenvectors. If $u_1$ is oriented so that for the coordinates of $x_1=D^{-1/2}u_1$

$$\sum_{i: x_{1i}>0} d_i x_{1i}^2 < \frac12$$

holds, then we call it positively oriented, whereas the opposite is negative. Likewise, the orientation of $u_2$ is positive if for the coordinates of $x_2=D^{-1/2}u_2$

(A8) $\sum_{i: x_{2i}>0} d_i x_{2i}^2 < \frac12$

holds; otherwise, it is negative.

  1. AL (Above boundary, left half), when $a_1>0$: using the last but one expression in equation (A2),

    $f_2(a_1,C) = 2\sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) + 1 = 2\sum_{i\in H,\, x_{2i}>0} d_i x_{2i}(x_{1i}-a_1) + 2\sum_{i\in H,\, x_{2i}\le 0} d_i x_{2i}(x_{1i}-a_1) + 1 \ge 2\sum_{i\in H,\, x_{2i}\le 0} d_i x_{2i}(x_{1i}-a_1) + 1.$

    To prove that $f_2(a_1,C)\ge 0$, it suffices to prove that

    $\sum_{i\in H,\, x_{2i}\le 0} d_i x_{2i}(x_{1i}-a_1) \ge -\frac12,$

    which is equivalent to

    $\sum_{i\in H,\, x_{2i}\le 0} d_i (-x_{2i})(x_{1i}-a_1) \le \frac12.$

    We use the Cauchy–Schwarz inequality, keeping in mind that $x_{1i}-a_1>0$ ($i\in H$) and, because of $a_1>0$, $x_{1i}-a_1<x_{1i}$. Therefore,

    $\sum_{i\in H,\, x_{2i}\le 0} \big(\sqrt{d_i}(-x_{2i})\big)\big(\sqrt{d_i}(x_{1i}-a_1)\big) \le \sqrt{\sum_{i\in H,\, x_{2i}\le 0} d_i x_{2i}^2}\ \sqrt{\sum_{i\in H,\, x_{2i}\le 0} d_i x_{1i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true if $u_1$ is positively and $u_2$ is negatively oriented.

  2. AR (Above boundary, right half), when $a_1\le 0$: using the last expression in equation (A2), to prove that $f_2(a_1,C)\ge 0$, it suffices to prove that

    $\sum_{i\in\bar H,\, x_{2i}\le 0} d_i x_{2i}(a_1-x_{1i}) \ge -\frac12,$

    which is equivalent to

    $\sum_{i\in\bar H,\, x_{2i}\le 0} d_i(-x_{2i})(a_1-x_{1i}) \le \frac12.$

    We again use the Cauchy–Schwarz inequality, keeping in mind that $a_1-x_{1i}\ge 0$ ($i\in\bar H$) and $a_1-x_{1i} = -x_{1i}-(-a_1) \le -x_{1i}$, as now $a_1\le 0$ and $-x_{1i}\ge -a_1$. Therefore,

    $\sum_{i\in\bar H,\, x_{2i}\le 0} \big(\sqrt{d_i}(-x_{2i})\big)\big(\sqrt{d_i}(a_1-x_{1i})\big) \le \sqrt{\sum_{i\in\bar H,\, x_{2i}\le 0} d_i x_{2i}^2}\ \sqrt{\sum_{i\in\bar H,\, x_{2i}\le 0} d_i x_{1i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true with $u_1$ negatively and $u_2$ negatively oriented.

  3. BL (Below boundary, left half), when $a_1>0$: using the last but one expression in equation (A3),

    $f_2(a_1,D) = 2\sum_{i\in H} d_i x_{2i}(x_{1i}-a_1) - 1 = 2\sum_{i\in H,\, x_{2i}<0} d_i x_{2i}(x_{1i}-a_1) + 2\sum_{i\in H,\, x_{2i}\ge 0} d_i x_{2i}(x_{1i}-a_1) - 1 \le 2\sum_{i\in H,\, x_{2i}\ge 0} d_i x_{2i}(x_{1i}-a_1) - 1.$

    To prove that $f_2(a_1,D)\le 0$, it suffices to prove that

    $\sum_{i\in H,\, x_{2i}\ge 0} d_i x_{2i}(x_{1i}-a_1) \le \frac12.$

    We use the Cauchy–Schwarz inequality, keeping in mind that $x_{1i}-a_1>0$ ($i\in H$) and, because of $a_1>0$, $x_{1i}-a_1<x_{1i}$. Therefore,

    $\sum_{i\in H,\, x_{2i}\ge 0} \big(\sqrt{d_i}\,x_{2i}\big)\big(\sqrt{d_i}(x_{1i}-a_1)\big) \le \sqrt{\sum_{i\in H,\, x_{2i}\ge 0} d_i x_{2i}^2}\ \sqrt{\sum_{i\in H,\, x_{2i}\ge 0} d_i x_{1i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true if $u_1$ is positively and $u_2$ is positively oriented.

  4. BR (Below boundary, right half), when $a_1\le 0$: using the last expression in equation (A3), to prove that $f_2(a_1,D)\le 0$, it suffices to prove that

    $\sum_{i\in\bar H,\, x_{2i}\ge 0} d_i x_{2i}(a_1-x_{1i}) \le \frac12.$

    We again use the Cauchy–Schwarz inequality, keeping in mind that $a_1-x_{1i}\ge 0$ ($i\in\bar H$) and $a_1-x_{1i}=-x_{1i}-(-a_1)\le -x_{1i}$, as now $a_1\le 0$ and $-x_{1i}\ge -a_1$. Therefore,

    $\sum_{i\in\bar H,\, x_{2i}\ge 0} \big(\sqrt{d_i}\,x_{2i}\big)\big(\sqrt{d_i}(a_1-x_{1i})\big) \le \sqrt{\sum_{i\in\bar H,\, x_{2i}\ge 0} d_i x_{2i}^2}\ \sqrt{\sum_{i\in\bar H,\, x_{2i}\ge 0} d_i x_{1i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true with $u_1$ negatively and $u_2$ positively oriented.

  5. RB (Right boundary, below half), when $a_2>0$: using the last but one expression in equation (A5),

    $f_1(A,a_2) \ge 2\sum_{i\in F,\, x_{1i}\le 0} d_i x_{1i}(x_{2i}-a_2) + 1.$

    To prove that $f_1(A,a_2)\ge 0$, it suffices to prove that

    $\sum_{i\in F,\, x_{1i}\le 0} d_i x_{1i}(x_{2i}-a_2) \ge -\frac12,$

    which is equivalent to

    $\sum_{i\in F,\, x_{1i}\le 0} d_i(-x_{1i})(x_{2i}-a_2) \le \frac12.$

    By the Cauchy–Schwarz inequality,

    $\sum_{i\in F,\, x_{1i}\le 0} \big(\sqrt{d_i}(-x_{1i})\big)\big(\sqrt{d_i}(x_{2i}-a_2)\big) \le \sqrt{\sum_{i\in F,\, x_{1i}\le 0} d_i x_{1i}^2}\ \sqrt{\sum_{i\in F,\, x_{1i}\le 0} d_i x_{2i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true if $u_1$ is negatively and $u_2$ is positively oriented.

  6. RA (Right boundary, above half), when $a_2\le 0$: using the last expression in equation (A5), to prove that $f_1(A,a_2)\ge 0$, it suffices to prove that

    $\sum_{i\in\bar F,\, x_{1i}\le 0} d_i x_{1i}(a_2-x_{2i}) \ge -\frac12,$

    which is equivalent to

    $\sum_{i\in\bar F,\, x_{1i}\le 0} d_i(-x_{1i})(a_2-x_{2i}) \le \frac12.$

    By the Cauchy–Schwarz inequality,

    $\sum_{i\in\bar F,\, x_{1i}\le 0} \big(\sqrt{d_i}(-x_{1i})\big)\big(\sqrt{d_i}(a_2-x_{2i})\big) \le \sqrt{\sum_{i\in\bar F,\, x_{1i}\le 0} d_i x_{1i}^2}\ \sqrt{\sum_{i\in\bar F,\, x_{1i}\le 0} d_i x_{2i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true with $u_1$ negatively and $u_2$ negatively oriented.

  7. LB (Left boundary, below half), when $a_2>0$: using the last but one expression in equation (A6),

    $f_1(B,a_2) = 2\sum_{i\in F} d_i x_{1i}(x_{2i}-a_2) - 1 \le 2\sum_{i\in F,\, x_{1i}\ge 0} d_i x_{1i}(x_{2i}-a_2) - 1.$

    To prove that $f_1(B,a_2)\le 0$, it suffices to prove that

    $\sum_{i\in F,\, x_{1i}\ge 0} d_i x_{1i}(x_{2i}-a_2) \le \frac12.$

    By the Cauchy–Schwarz inequality,

    $\sum_{i\in F,\, x_{1i}\ge 0} \big(\sqrt{d_i}\,x_{1i}\big)\big(\sqrt{d_i}(x_{2i}-a_2)\big) \le \sqrt{\sum_{i\in F,\, x_{1i}\ge 0} d_i x_{1i}^2}\ \sqrt{\sum_{i\in F,\, x_{1i}\ge 0} d_i x_{2i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true if $u_1$ is positively and $u_2$ is positively oriented.

  8. LA (Left boundary, above half), when $a_2\le 0$: using the last expression in equation (A6), to prove that $f_1(B,a_2)\le 0$, it suffices to prove that

    $\sum_{i\in\bar F,\, x_{1i}\ge 0} d_i x_{1i}(a_2-x_{2i}) \le \frac12.$

    By the Cauchy–Schwarz inequality,

    $\sum_{i\in\bar F,\, x_{1i}\ge 0} \big(\sqrt{d_i}\,x_{1i}\big)\big(\sqrt{d_i}(a_2-x_{2i})\big) \le \sqrt{\sum_{i\in\bar F,\, x_{1i}\ge 0} d_i x_{1i}^2}\ \sqrt{\sum_{i\in\bar F,\, x_{1i}\ge 0} d_i x_{2i}^2} \le \sqrt{\tfrac12}\,\sqrt{\tfrac12} = \frac12$

    holds true with $u_1$ positively and $u_2$ negatively oriented.

Consequently, the convenient orientation of the AL scenario matches that of the LA one. Similarly, the AR–RA, BL–LB, and BR–RB scenarios can be realized with the same orientations of $u_1,u_2$. So the ranges under the four different orientations are connected regions (by the multivariate analogue of the Bolzano theorem) and all contain the "corners" $(1,1)$, $(-1,-1)$, $(1,-1)$, $(-1,1)$. Therefore, their union is also a connected region in $\mathbb{R}^2$. As it is bounded from above, from below, from the right, and from the left by curves that enclose the origin, it should contain the origin too. Moreover, in each orientation, there should be a quadrant that contains the origin.

Indeed, in the AL case, $f_2(a_1,C)$ is greater than or less than 1, depending on whether the absolute value of $\sum_{i\in H,\, x_{2i}>0} d_i x_{2i}(x_{1i}-a_1)$ or that of $\sum_{i\in H,\, x_{2i}\le 0} d_i x_{2i}(x_{1i}-a_1)$ is larger. By the Cauchy–Schwarz inequality, this happens with the positive orientation of $u_1$ and either the positive or the negative orientation of $u_2$. Simultaneously, in the BL case, $f_2(a_1,D)$ is greater than or less than $-1$, depending on whether the absolute value of $\sum_{i\in H,\, x_{2i}>0} d_i x_{2i}(x_{1i}-a_1)$ or that of $\sum_{i\in H,\, x_{2i}\le 0} d_i x_{2i}(x_{1i}-a_1)$ is larger. So with the positive orientation of $u_1$ and either the positive or the negative orientation of $u_2$, $f_2(a_1,C)$ and $f_2(a_1,D)$ are between $-2$ and 2, they are parallel, at distance 2 from each other, and sandwich the $f_1$ axis. Moreover, depending on the orientation of $u_2$, the $f_1$ axis is either between the arcs $f_2(a_1,C)$ and $f_2(a_1,0)$ or between the arcs $f_2(a_1,D)$ and $f_2(a_1,0)$.

Likewise, the AR and BR estimates imply that with the negative orientation of $u_1$ and either the positive or the negative orientation of $u_2$, $f_2(a_1,C)$ and $f_2(a_1,D)$ are between $-2$ and 2, they are parallel, at distance 2 from each other, and sandwich the $f_1$ axis. However, changing the orientation of $u_1$ just means that $-a_1$ instead of $a_1$ can be the first coordinate of the root of $f$. Consequently, with the negative orientation of $u_1$ and the positive orientation of $u_2$, the $f_1$ axis is sandwiched by the $f_2(a_1,D)$ and $f_2(a_1,0)$ arcs, whereas the $f_2$ axis is sandwiched by the $f_1(0,a_2)$ and $f_1(A,a_2)$ arcs. This complies with the vertical RB situation.

In summary, depending on the orientations of $u_1$ and $u_2$, the root is in a definite quadrant of the domain (with coordinate axes $a_1,a_2$), and its $f$-map is in a definite quadrant of the range (separated by the curves with $a_1=0$ or $a_2=0$). For example, if $u_1$ is negatively and $u_2$ is positively oriented, then a root with $a_1<0$ and $a_2>0$ is expected in the intersection of the BR and RB regions; if both $u_1$ and $u_2$ are negatively oriented, then a root with $a_1<0$ and $a_2<0$ is expected in the intersection of the AR and RA regions. Moreover, the signs of the coordinates of the root are compatible with the orientations of the two eigenvectors.

More simply, the orientation of $u_1$ can be chosen arbitrarily. If it is negative, then $a_1<0$ can be expected for the sign of the first coordinate of the root; otherwise, with $-a_1$, another root is expected with the positive orientation of $u_1$. The same holds vertically, depending on the orientation of $u_2$. From equation (A1), it is obvious that counter-orienting $u_1$ and/or $u_2$ will result in the same equations multiplied by $-1$, with $-a_1$ and/or $-a_2$ instead of $a_1$ and/or $a_2$ in the equations.

In Figure A1, we show the images of four different possible orientations of the Fiedler-carpet associated with the graph in Figure A2. For example, in the (+,+) orientation, we equidistantly subdivided the intervals [A,B] and [C,D] and then mapped the so-obtained grid over the [A,B]×[C,D] rectangle by the f function. The maps show the f-images of the grid points.

Figure A2: Graph with graph6 string HFRJIOY.

By equation (A1), and denoting the orientations in the superscripts, we have that

$$f_1^{-+}(a_1,a_2) = -f_1^{++}(-a_1,a_2), \qquad f_2^{-+}(a_1,a_2) = f_2^{++}(-a_1,a_2),$$
$$f_1^{+-}(a_1,a_2) = f_1^{++}(a_1,-a_2), \qquad f_2^{+-}(a_1,a_2) = -f_2^{++}(a_1,-a_2),$$
$$f_1^{--}(a_1,a_2) = -f_1^{++}(-a_1,-a_2), \qquad f_2^{--}(a_1,a_2) = -f_2^{++}(-a_1,-a_2).$$

Therefore, the $(-+)$ panel is the reflection of the $(++)$ one through the $f_2$ axis, the $(+-)$ panel is the reflection of the $(++)$ one through the $f_1$ axis, and the $(--)$ panel is the reflection of the $(++)$ one through the origin. Because of this symmetry, either all of them or none of them contain the origin, i.e., $(0,0)$ of the $(f_1,f_2)$ plane. But we saw that in each orientation, the image of a definite quadrant sandwiches both the $f_1$ and $f_2$ axes, and the signs of the coordinates of the root are exactly the same as the orientations of the corresponding eigenvectors.

We can also plot the map of the three-dimensional Fiedler-carpet of this graph. In Figure A3, different viewpoints are shown. The blue, orange, and green lines are the coordinate axes in $\mathbb{R}^3$. As can be seen from the images, they intersect at the origin inside of the three-dimensional body. Every plot we have created shows this intersection lying inside the map of the three-dimensional Fiedler-carpet.

Figure A3: Image of the three-dimensional Fiedler-carpet, from two different viewpoints, of the graph in Figure A2.

If $k>2$, then $f:\mathbb{R}^k\to\mathbb{R}^k$ maps the $k$-dimensional hyperrectangle with vertices of $j$th coordinate $\min_i x_{ji}$ or $\max_i x_{ji}$ into the $k$-dimensional region with vertices of $j$th coordinate $\pm 1$.

Along the one-dimensional faces of this $k$-dimensional hyperrectangle, all but one of the $a_i$'s are fixed at their minimum/maximum. Without loss of generality, assume that $A<a_1<B$. Akin to the $k=2$ case, we are able to show that for each $A<a_1<B$: $f_j(a_1,M_1)\ge 0$ and $f_j(a_1,\tilde M_1)\le 0$ ($j=2,\dots,k$), where $M_1=(\min_m x_{2m},\dots,\min_m x_{km})$ is the $(k-1)$-tuple of the values of $a_2,\dots,a_k$ all fixed at their minima and $\tilde M_1=(\max_m x_{2m},\dots,\max_m x_{km})$ is the $(k-1)$-tuple of the values of $a_2,\dots,a_k$ all fixed at their maxima, respectively. Indeed, for $j=2,\dots,k$:

$$f_j(a_1,M_1) = \sum_{i=1}^{n} d_i x_{ji}|x_{1i}-a_1| + \sum_{l=2}^{k}\sum_{i=1}^{n} d_i x_{ji}\big(x_{li}-\min_m x_{lm}\big) = \sum_{i=1}^{n} d_i x_{ji}|x_{1i}-a_1| + 1 + 0 = 2\sum_{i\in H} d_i x_{ji}(x_{1i}-a_1) + 1 = 2\sum_{i\in\bar H} d_i x_{ji}(a_1-x_{1i}) + 1,$$

since in the second double summation, only the term for $l=j$ equals 1; the others are zero. Likewise,

$$f_j(a_1,\tilde M_1) = \sum_{i=1}^{n} d_i x_{ji}|x_{1i}-a_1| + \sum_{l=2}^{k}\sum_{i=1}^{n} d_i x_{ji}\big(\max_m x_{lm}-x_{li}\big) = \sum_{i=1}^{n} d_i x_{ji}|x_{1i}-a_1| - 1 + 0 = 2\sum_{i\in H} d_i x_{ji}(x_{1i}-a_1) - 1 = 2\sum_{i\in\bar H} d_i x_{ji}(a_1-x_{1i}) - 1,$$

since in the second double summation, only the term for $l=j$ equals $-1$; the others are zero. Note that counter-orienting $u_1$ just results in $-a_1$ instead of $a_1$ in the root of $f$.

The corresponding "corners" are: $f(A,M_1)=(1,1,\dots,1)$, $f(B,M_1)=(-1,1,\dots,1)$, $f(A,\tilde M_1)=(1,-1,\dots,-1)$, and $f(B,\tilde M_1)=(-1,-1,\dots,-1)$. These points are in one hyperplane, in which the pieces of the parallel arcs $f_j(a_1,M_1)$ and $f_j(a_1,\tilde M_1)$ are at distance 2 from each other and sandwich the $f_1$ axis ($j=2,\dots,k$). Depending on the orientations of the eigenvectors, one is confined to an orthant, and this will favor the sandwiching of the $f_l$ axis when another $a_l$ moves along a one-dimensional face and the others are fixed at their minima/maxima, for $l=2,\dots,k$. Therefore, the connected regions between these parallel curves sandwich the corresponding coordinate axes. Consequently, their intersection, which is a subset of the whole connected region, contains the origin too. Note that if $(a_1,\dots,a_j,\dots,a_k)$ is a root of $f$ in some orientation of the eigenvectors, then by counter-orienting only $u_j$, $(a_1,\dots,-a_j,\dots,a_k)$ will be a root of the so-obtained $f$-function. Moreover, the signs of the coordinates of the root are compatible with the orientations of the eigenvectors ($2^k$ possibilities). Also, in any combination of the orientations, the root is in a definite orthant (one of the $2^k$ possibilities) of the $k$-dimensional domain (divided by surfaces along which one of the $a_j$'s is zero).

Appendix B

Some pseudocodes follow. First for rectangular arrays of nonnegative entries.

Algorithm: finding a regular biclustering of a contingency table
Input: $m\times n$ nondegenerate contingency table $C$ and the number of clusters $k$.
1. Compute the row- and column-sums $d_{\mathrm{row},1},\dots,d_{\mathrm{row},m}$ and $d_{\mathrm{col},1},\dots,d_{\mathrm{col},n}$;
form the diagonal matrices $D_{\mathrm{row}}=\operatorname{diag}(d_{\mathrm{row},1},\dots,d_{\mathrm{row},m})$ and $D_{\mathrm{col}}=\operatorname{diag}(d_{\mathrm{col},1},\dots,d_{\mathrm{col},n})$;
form the normalized contingency table $C_D=D_{\mathrm{row}}^{-1/2} C D_{\mathrm{col}}^{-1/2}$.
2. Compute the $k-1$ largest singular values (disregarding the trivial 1)
and the corresponding left and right singular vectors of $C_D$: $v_1,\dots,v_{k-1}$ and $u_1,\dots,u_{k-1}$.
3. Find representatives $r_1,\dots,r_m$ of the rows as row vectors of the matrix $(D_{\mathrm{row}}^{-1/2}v_1,\dots,D_{\mathrm{row}}^{-1/2}v_{k-1})$;
find representatives $q_1,\dots,q_n$ of the columns as row vectors of the matrix $(D_{\mathrm{col}}^{-1/2}u_1,\dots,D_{\mathrm{col}}^{-1/2}u_{k-1})$.
4. Cluster the points $r_1,\dots,r_m$ by the weighted $k$-means algorithm with weights $d_{\mathrm{row},1},\dots,d_{\mathrm{row},m}$ into $k$ clusters;
cluster the points $q_1,\dots,q_n$ by the weighted $k$-means algorithm with weights $d_{\mathrm{col},1},\dots,d_{\mathrm{col},n}$ into $k$ clusters.
Output: Clusters $R_1,\dots,R_k$ of the row-set $\{1,\dots,m\}$ and clusters $C_1,\dots,C_k$ of the column-set $\{1,\dots,n\}$.

The algorithm for edge-weighted graphs with more clusters than eigenvectors.

Algorithm: spectral clustering of an edge-weighted graph with the minimum cut objective
Input: $n\times n$ edge-weight matrix $W$ and the number $k$ of the smallest separated normalized Laplacian eigenvalues of the edge-weighted graph.
1. Calculate the degree-vector $d$, the degree-matrix $D$, and the normalized Laplacian matrix $L_D=I-D^{-1/2}WD^{-1/2}$.
2. Compute the $k-1$ smallest positive eigenvalues (disregarding the trivial 0) and the corresponding eigenvectors $u_1,\dots,u_{k-1}$ of $L_D$.
3. Find representatives $r_1,\dots,r_n$ of the vertices as row vectors of the matrix $(D^{-1/2}u_1,\dots,D^{-1/2}u_{k-1})$.
4. Cluster the points $r_1,\dots,r_n$ by the weighted $k$-means algorithm with the components of $d$ as weights into $2^{k-1}$ clusters.
Output: Clusters $V_1,\dots,V_{2^{k-1}}$ of the vertices.
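A possible Python realization of this pseudocode is sketched below (an assumption-laden sketch, not the authors' code); scikit-learn's KMeans with the sample_weight argument plays the role of the weighted k-means step described next.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clusters(W, k):
    deg = W.sum(axis=1)                               # degree vector d
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_D = np.eye(len(deg)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, evecs = np.linalg.eigh(L_D)                    # eigenvalues in increasing order
    X = D_inv_sqrt @ evecs[:, 1:k]                    # (k-1)-dimensional representatives
    km = KMeans(n_clusters=2 ** (k - 1), n_init=10).fit(X, sample_weight=deg)
    return km.labels_                                 # cluster index of each vertex
```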

The weighted k-means algorithm follows. If the weights are the same (all equal to 1), then this is the usual k-means algorithm.

Algorithm: weighted $k$-means clustering
Input: finite-dimensional points $r_1,\dots,r_n$ with weights $d_1,\dots,d_n$ and the number of clusters $k$.
1. Initialize: $V_1^{(0)},\dots,V_k^{(0)}$, the clusters of $\{1,\dots,n\}$.
2. Iterate: for $t=1,2,\dots$
a. calculate the cluster centers $c_a^{(t)} = \frac{1}{\sum_{j\in V_a^{(t-1)}} d_j}\sum_{j\in V_a^{(t-1)}} d_j r_j$ ($a=1,\dots,k$);
b. relocate the points: $r_j$ is assigned to the cluster $V_a^{(t)}$ for which $\|r_j - c_a^{(t)}\|$ is minimal,
until convergence.
Output: Clusters $V_1,\dots,V_k$ of $\{1,\dots,n\}$.
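A direct NumPy transcription of this pseudocode could look as follows (a sketch with a random initialization; ties and empty clusters are not handled).

```python
import numpy as np

def weighted_kmeans(R, d, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(R))          # step 1: random initial clusters
    for _ in range(n_iter):                        # step 2: iterate until convergence
        centers = np.array([
            (d[labels == a][:, None] * R[labels == a]).sum(axis=0) / d[labels == a].sum()
            for a in range(k)
        ])                                         # step 2a: weighted cluster centers
        new_labels = np.argmin(((R[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels                        # step 2b: relocate the points
    return labels, centers
```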

Appendix C

Pairwise plots of the correspondence analysis results based on the first three coordinate axes (coordinates of the left singular vectors for the immigration and of the right singular vectors for the emigration data) are shown in Figure A4(a) and (b) for the two years. The cluster memberships obtained in Section 4 are illustrated with different colors.

Figure A4: Pairwise plots of the correspondence analysis results based on the first three singular vector pairs, enhanced with the cluster memberships, illustrated by different colors. (a) 2015 and (b) 2019.

References

[1] A. J. Hoffman, Eigenvalues and partitionings of the edges of a graph, Linear Algebra Appl. 5 (1972), 137–146. DOI: 10.1142/9789812796936_0042.

[2] M. Fiedler, Algebraic connectivity of graphs, Czech. Math. J. 23 (1973), 298–305. DOI: 10.21136/CMJ.1973.101168.

[3] D. M. Cvetković, M. Doob, and H. Sachs, Spectra of Graphs, Academic Press, New York, 1979.

[4] F. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, vol. 92, American Mathematical Society, Providence, RI, 1997. DOI: 10.1090/cbms/092.

[5] C. J. Alpert and S.-Z. Yao, Spectral partitioning: the more eigenvectors, the better, in: B. T. Preas, P. G. Karger, B. S. Nobandegani, M. Pedram (Eds.), Proceedings of the 32nd ACM/IEEE International Conference on Design Automation, Association for Computing Machinery, New York, 1995, pp. 195–200. DOI: 10.1145/217474.217529.

[6] B. Mohar, Isoperimetric inequalities, growth and the spectrum of graphs, Linear Algebra Appl. 103 (1988), 119–131. DOI: 10.1016/0024-3795(88)90224-8.

[7] S. Hoory, N. Linial, and A. Wigderson, Expander graphs and their applications, Bull. Amer. Math. Soc. (N.S.) 43 (2006), 439–561. DOI: 10.1090/S0273-0979-06-01126-8.

[8] M. Bolla and G. Tusnády, Spectra and optimal partitions of weighted graphs, Discrete Math. 128 (1994), 1–20. DOI: 10.1016/0012-365X(94)90100-7.

[9] M. Bolla, K. Friedl, and A. Krámli, Singular value decomposition of large random matrices (for two-way classification of microarrays), J. Multivariate Anal. 101 (2010), 434–446. DOI: 10.1007/978-3-7908-2062-1_11.

[10] M. Bolla, Relating multiway discrepancy and singular values of nonnegative rectangular matrices, Discrete Appl. Math. 203 (2016), 26–34. DOI: 10.1016/j.dam.2015.09.013.

[11] B. Bollobás and V. Nikiforov, Hermitian matrices and graphs: singular values and discrepancy, Discrete Math. 285 (2004), 17–32. DOI: 10.1016/j.disc.2004.05.006.

[12] Y. Kluger, R. Basri, J. T. Chang, and M. Gerstein, Spectral biclustering of microarray data: coclustering genes and conditions, Genome Res. 13 (2003), 703–716. DOI: 10.1101/gr.648603.

[13] M. Bolla, Spectral Clustering and Biclustering, Wiley, Chichester, UK, 2013. DOI: 10.1002/9781118650684.

[14] F. Chung and R. Graham, Quasi-random graphs with given degree sequences, Random Struct. Algorithms 12 (2008), 1–19. DOI: 10.1002/rsa.20188.

[15] M. E. J. Newman, Networks: An Introduction, Oxford University Press, Oxford, United Kingdom, 2010. DOI: 10.1093/acprof:oso/9780199206650.001.0001.

[16] E. Szemerédi, Regular partitions of graphs, in: J.-C. Bermond, J.-C. Fournier, M. Las Vergnas, D. Sotteau (Eds.), Colloque Inter. CNRS No. 260, Problèmes Combinatoires et Théorie des Graphes, CNRS, Paris, 1976, pp. 399–401.

[17] M. Bolla, Generalized quasirandom properties of expanding graph sequences, Nonlinearity 33 (2020), no. 4, 1405–1424. DOI: 10.1088/1361-6544/ab60d2.

[18] N. Alon, A. Coja-Oghlan, H. Han, M. Kang, V. Rödl, and M. Schacht, Quasi-randomness and algorithmic regularity for graphs with general degree distributions, SIAM J. Comput. 39 (2010), no. 6, 2336–2362. DOI: 10.1007/978-3-540-73420-8_68.

[19] R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy, The effectiveness of Lloyd-type methods for the k-means problem, J. ACM 59 (2012), no. 6, Article 28. DOI: 10.1109/FOCS.2006.75.

[20] U. von Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (2006), 395–416. DOI: 10.1007/s11222-007-9033-z.

[21] A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: analysis and an algorithm, in: T. G. Dietterich, S. Becker, Z. Ghahramani (Eds.), Proceedings of the 14th Neural Information Processing Systems Conference (NIPS 2001), MIT Press, Cambridge, USA, 2010.

[22] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000), 888–905. DOI: 10.1109/34.868688.

[23] K. Faust, Using correspondence analysis for joint displays of affiliation networks, in: P. J. Carrington, J. Scott (Eds.), Models and Methods in Social Network Analysis, Vol. 7, Cambridge University Press, Cambridge, 2005, pp. 117–147. DOI: 10.1017/CBO9780511811395.007.

[24] M. Bolla, Penalized versions of the Newman-Girvan modularity and their relation to multi-way cuts and k-means clustering, Phys. Rev. E 84 (2011), 016108. DOI: 10.1103/PhysRevE.84.016108.

Received: 2021-11-29
Revised: 2022-02-21
Accepted: 2022-04-13
Published Online: 2022-06-10

© 2022 Marianna Bolla et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
