A simple spectral algorithm for recovering planted partitions

Sam Cole 1, Shmuel Friedland 1, and Lev Reyzin 1
• 1 Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, USA

Abstract

In this paper, we consider the planted partition model, in which n = ks vertices of a random graph are partitioned into k “clusters,” each of size s. Edges between vertices in the same cluster and in different clusters are included with constant probabilities p and q, respectively (where 0 ≤ q < p ≤ 1). We give an efficient algorithm that, with high probability, recovers the clusters as long as the cluster sizes are at least Ω(√n). Informally, our algorithm constructs the projection operator onto the dominant k-dimensional eigenspace of the graph’s adjacency matrix and uses it to recover one cluster at a time. To our knowledge, our algorithm is the first purely spectral algorithm that runs in polynomial time and works even when s = Θ(√n), though several non-spectral algorithms accomplish this. Our algorithm is also among the simplest of these spectral algorithms, and its proof of correctness illustrates the usefulness of the Cauchy integral formula in this domain.
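The informal description above can be illustrated with a small numerical sketch. It is not the paper's exact procedure: the abstract does not spell out how the projection is used, so the selection rule below (take the s largest entries of one column of the projection matrix) is a simplifying assumption, as are the function names `planted_partition`, `dominant_projection`, and `recover_clusters` and the parameter values in the usage note. The sketch also assumes k and s are known in advance.

```python
import numpy as np


def planted_partition(k, s, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix with k clusters of size s."""
    n = k * s
    labels = np.repeat(np.arange(k), s)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    # Draw each edge once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, 1)
    adj = (upper | upper.T).astype(float)
    return adj, labels


def dominant_projection(adj, k):
    """Orthogonal projection onto the span of the k top eigenvectors."""
    # eigh returns eigenvalues in ascending order, so the last k
    # columns belong to the k largest eigenvalues.
    _, vecs = np.linalg.eigh(adj)
    top = vecs[:, -k:]
    return top @ top.T


def recover_clusters(adj, k, s):
    """Peel off one cluster at a time (illustrative heuristic only)."""
    remaining = np.arange(adj.shape[0])
    clusters = []
    for left in range(k, 0, -1):
        sub = adj[np.ix_(remaining, remaining)]
        proj = dominant_projection(sub, left)
        # Assumption: entries of a column are near 1/s for vertices in the
        # same cluster and near 0 otherwise, so keep the s largest.
        members = np.argsort(proj[:, 0])[-s:]
        clusters.append(set(remaining[members].tolist()))
        remaining = np.delete(remaining, members)
    return clusters
```

For example, `recover_clusters(adj, 3, 100)` on a graph drawn with `planted_partition(3, 100, 0.9, 0.1, np.random.default_rng(0))` returns three sets of vertex indices. These parameters put the clusters far above the Θ(√n) regime, so the example says nothing about behavior near the threshold the abstract discusses.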

OPEN ACCESS

Special Matrices
