Accessible Unlicensed Requires Authentication Published by De Gruyter February 5, 2020

Fast Bilinear Algorithms for Symmetric Tensor Contractions

Edgar Solomonik ORCID logo and James Demmel

Abstract

In matrix-vector multiplication, matrix symmetry does not permit a straightforward reduction in computational cost. More generally, in contractions of symmetric tensors, the symmetries are not preserved in the usual algebraic form of contraction algorithms. We introduce an algorithm that reduces the bilinear complexity (number of computed elementwise products) for most types of symmetric tensor contractions. In particular, it lowers the bilinear complexity of symmetrized contractions of symmetric tensors of order s + v and v + t by a factor of ( s + t + v ) ! s ! t ! v ! to leading order. The algorithm computes a symmetric tensor of bilinear products, then subtracts unwanted parts of its partial sums. Special cases of this algorithm provide improvements to the bilinear complexity of the multiplication of a symmetric matrix and a vector, the symmetrized vector outer product, and the symmetrized product of symmetric matrices. While the algorithm requires more additions for each elementwise product, the total number of operations is in some cases less than classical algorithms, for tensors of any size. We provide a round-off error analysis of the algorithm and demonstrate that the error is not too large in practice. Finally, we provide an optimized implementation for one variant of the symmetry-preserving algorithm, which achieves speedups of up to 4.58 × for a particular tensor contraction, relative to a classical approach that casts the problem as a matrix-matrix multiplication.

Funding source: National Science Foundation

Award Identifier / Grant number: ACI-1548562

Funding statement: This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Via XSEDE, the authors made use of the TACC Stampede2 supercomputer.

Acknowledgements

We would like to thank Devin Matthews, Toru Shiozaki, Hung Woei Neoh, and anonymous reviewers for helpful comments that served to improve this paper.

References

[1] A. A. Albert, On Jordan algebras of linear transformations, Trans. Amer. Math. Soc. 59 (1946), 524–555. Search in Google Scholar

[2] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov and D. Sorensen, LAPACK Users’ Guide, SIAM, Philadelphia, 1992. Search in Google Scholar

[3] G. Ballard, J. Demmel, O. Holtz, B. Lipshitz and O. Schwartz, Communication-optimal parallel algorithm for Strassen’s matrix multiplication, Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures—SPAA ’12, ACM, New York (2012), 193–204. Search in Google Scholar

[4] D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, J. Symbolic Comput. 9 (1990), no. 3, 251–280. Search in Google Scholar

[5] E. Deumens, V. F. Lotrich, A. Perera, M. J. Ponton, B. A. Sanders and R. J. Bartlett, Software design of ACES III with the super instruction architecture, WIREs Comput. Molecular Sci. 1 (2011), no. 6, 895–901. Search in Google Scholar

[6] E. Epifanovsky, M. Wormit, T. Kuś, A. Landau, D. Zuev, K. Khistyaev, P. Manohar, I. Kaliman, A. Dreuw and A. I. Krylov, New implementation of high-level correlated methods using a general block-tensor library for high-performance electronic structure calculations, J. Comput. Chem. (2013), 10.1002/jcc.23377. 10.1002/jcc.23377Search in Google Scholar

[7] A. Grüneis, G. H. Booth, M. Marsman, J. Spencer, A. Alavi and G. Kresse, Natural orbitals for wave function based correlated calculations using a plane wave basis set, J. Chem. Theory Comput. 7 (2011), no. 9, 2780–2785. Search in Google Scholar

[8] W. Hackbusch, A sparse matrix arithmetic based on -matrices. I. Introduction to -matrices, Computing 62 (1999), no. 2, 89–108. Search in Google Scholar

[9] M. Hanrath and A. Engels-Putzka, An efficient matrix-matrix multiplication based antisymmetric tensor contraction engine for general order coupled cluster, J. Chem. Phys. 133 (2010), no. 6, Article ID 064108. Search in Google Scholar

[10] M. Head-Gordon, J. A. Pople and M. J. Frisch, MP2 energy evaluation by direct methods, Chem. Phys. Lett. 153 (1988), no. 6, 503–506. Search in Google Scholar

[11] S. Hirata, Tensor Contraction Engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories, J. Phys. Chem. A 107 (2003), no. 46, 9887–9897. Search in Google Scholar

[12] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, Stud. Appl. Math. 6 (1927), no. 1–4, 164–189. Search in Google Scholar

[13] J. Huang, D. A. Matthews and R. A. van de Geijn, Strassen’s algorithm for tensor contraction, SIAM J. Sci. Comput. 40 (2018), no. 3, C305–C326. Search in Google Scholar

[14] M. Kállay and P. R. Surján, Higher excitations in coupled-cluster theory, J. Chem. Phys. 115 (2001), no. 7, Article ID 2945. Search in Google Scholar

[15] V. Khoromskaia and B. N. Khoromskij, Tensor Numerical Methods in Quantum Chemistry, De Gruyter, Berlin, 2018. Search in Google Scholar

[16] B. N. Khoromskij, Tensor Numerical Methods in Scientific Computing, adon Ser. Comput. Appl. Math. 19, De Gruyter, Berlin, 2018. Search in Google Scholar

[17] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500. Search in Google Scholar

[18] C. L. Lawson, R. J. Hanson, D. R. Kincaid and F. T. Krogh, Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Software (TOMS), 5 (1979), no. 3, 308–323. Search in Google Scholar

[19] V. Lotrich, N. Flocke, M. Ponton, B. A. Sanders, E. Deumens, R. J. Bartlett and A. Perera, An infrastructure for scalable and portable parallel programs for computational chemistry, Proceedings of the 23rd International Conference on Supercomputing—ICS ’09, ACM, New York (2009), 523–524. Search in Google Scholar

[20] D. A. Matthews and J. F. Stanton, Aquarius: Scalability and extensibility by design, Abstracts Papers Amer. Chem. Soc. 248 (2014). Search in Google Scholar

[21] J. Noga and P. Valiron, Improved algorithm for triple-excitation contributions within the coupled cluster approach, Molecular Phys. 103 (2005), no. 15–16, 2123–2130. Search in Google Scholar

[22] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Ann. Physics 349 (2014), 117–158. Search in Google Scholar

[23] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33 (2011), no. 5, 2295–2317. Search in Google Scholar

[24] V. Pan, How can we speed up matrix multiplication?, SIAM Rev. 26 (1984), no. 3, 393–415. Search in Google Scholar

[25] S. Rajbhandari, A. Nikam, P.-W. Lai, K. Stock, S. Krishnamoorthy and P. Sadayappan, Framework for distributed contractions of tensors with symmetry, preprint (2013), Ohio State University. Search in Google Scholar

[26] M. D. Schatz, T. M. Low, R. A. van de Geijn and T. G. Kolda, Exploiting symmetry in tensors for high performance: multiplication with symmetric tensors, SIAM J. Sci. Comput. 36 (2014), no. 5, C453–C479. Search in Google Scholar

[27] Y. Shao, Advances in methods and algorithms in a modern quantum chemistry program package, Phys. Chem. Chem. Phys. 8 (2006), no. 27, 3172–3191. Search in Google Scholar

[28] E. Solomonik, Provably Efficient Algorithms for Numerical Tensor Algebra, PhD thesis, University of California, Berkeley, 2014. Search in Google Scholar

[29] E. Solomonik and J. Demmel, Contracting symmetric tensors using fewer multiplications, Technical report, ETH Zürich, 2015. Search in Google Scholar

[30] E. Solomonik, D. Matthews, J. R. Hammond, J. F. Stanton and J. Demmel, A massively parallel tensor contraction framework for coupled-cluster computations, J. Parallel Distributed Comput. 74 (2014), no. 12, 3176–3190. Search in Google Scholar

[31] V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13 (1969), 354–356. Search in Google Scholar

[32] V. Strassen, Rank and optimal computation of generic tensors, Linear Algebra Appl. 52/53 (1983), 645–685. Search in Google Scholar

[33] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966), 279–311. Search in Google Scholar

[34] M. Valiev, E. J. Bylaska, N. Govind, K. Kowalski, T. P. Straatsma, H. J. Van Dam, D. Wang, J. Nieplocha, E. Apra, T. Windus and W. A. de Jong, NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations, Comput. Phys. Commun. 181 (2010), no. 9, 1477–1489. Search in Google Scholar

[35] V. V. Williams, Multiplying matrices faster than Coppersmith–Winograd, Proceedings of the 2012 ACM Symposium on Theory of Computing—STOC’12, ACM, New York (2012), 887–898. Search in Google Scholar

[36] J. Xia, S. Chandrasekaran, M. Gu and X. S. Li, Fast algorithms for hierarchically semiseparable matrices, Numer. Linear Algebra Appl. 17 (2010), no. 6, 953–976. Search in Google Scholar

[37] K. Ye and L.-H. Lim, Algorithms for structured matrix-vector product of optimal bilinear complexity, 2016 IEEE Information Theory Workshop (ITW), IEEE Press, Piscataway (2016), 310–314. Search in Google Scholar

[38] K. Ye and L.-H. Lim, Fast structured matrix computations: tensor rank and Cohn–Umans method, Found. Comput. Math. 18 (2018), no. 1, 45–95. Search in Google Scholar

Received: 2019-04-28
Revised: 2019-10-02
Accepted: 2020-01-09
Published Online: 2020-02-05
Published in Print: 2021-01-01

© 2021 Walter de Gruyter GmbH, Berlin/Boston