Published by De Gruyter Oldenbourg, March 7, 2020

Large-scale graph generation: Recent results of the SPP 1736 – Part II

Ulrich Meyer and Manuel Penschuck

Abstract

The selection of input data is a crucial step in virtually every empirical study. Experimental campaigns in algorithm engineering, experimental algorithmics, network analysis, and many other fields often require suitable network data. In this context, synthetic graphs play an important role, as data sets of observed networks are typically scarce, biased, not sufficiently understood, and may pose logistical and legal challenges. Just as processing huge graphs becomes challenging in the big data setting, generating such massive instances efficiently requires new algorithmic approaches. Here, we update our previous survey [35] on results for large-scale graph generation obtained within the DFG priority programme SPP 1736 (Algorithms for Big Data); to this end, we broaden the scope and include recently published results.

Article note

Parts of this work were previously published as [35]. Here, we include additional details and new results obtained since the publication of [35].


Funding statement: This work was partially supported by the DFG under grant ME 2088/4.

Literature

1. Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9), 1988. doi:10.1145/48529.48535.

2. M. Alam, M. Khan, and M. Marathe. Distributed-memory parallel algorithms for generating massive scale-free networks using preferential attachment model. In Proc. of the Int. Conf. on High Performance Computing, Networking, Storage and Analysis, page 91. ACM, 2013. doi:10.1145/2503210.2503291.

3. Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002. doi:10.1103/RevModPhys.74.47.

4. Eugenio Angriman, Alexander van der Grinten, Moritz von Looz, Henning Meyerhenke, Martin Nöllenburg, Maria Predari, and Charilaos Tzovas. Guidelines for experimental algorithmics: A case study in network analysis. Algorithms, 12(7):127, 2019. doi:10.3390/a12070127.

5. Lars Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1–24, 2003. doi:10.1007/s00453-003-1021-x.

6. Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, Oct. 1999. doi:10.1126/science.286.5439.509.

7. Vladimir Batagelj and Ulrik Brandes. Efficient generation of large random networks. Phys. Rev. E, 71:036113, Mar. 2005. doi:10.1103/PhysRevE.71.036113.

8. Edward Bender and Rodney Canfield. The asymptotic number of labeled graphs with given degree sequences. J. Comb. Theory, Ser. A, 24(3):296–307, 1978. doi:10.1016/0097-3165(78)90059-6.

9. Thomas Bläsius, Tobias Friedrich, Maximilian Katzmann, Ulrich Meyer, Manuel Penschuck, and Christopher Weyand. Efficiently generating geometric inhomogeneous and hyperbolic random graphs. In ESA 2019, pages 21:1–21:14, 2019. doi:10.4230/LIPIcs.ESA.2019.21.

10. Béla Bollobás. Random graphs. Academic Press, 1985.

11. Béla Bollobás, Christian Borgs, Jennifer Chayes, and Oliver Riordan. Directed scale-free graphs. In ACM-SIAM Symposium on Discrete Algorithms, pages 132–139, 2003.

12. Karl Bringmann, Ralph Keusch, and Johannes Lengler. Geometric inhomogeneous random graphs. Theor. Comput. Sci., 760:35–54, 2019. doi:10.1016/j.tcs.2018.08.014.

13. Corrie J. Carstens. Topology of Complex Networks: Models and Analysis. PhD thesis, RMIT University, 2016.

14. Corrie J. Carstens, Annabell Berger, and Giovanni Strona. Curveball: a new generation of sampling algorithms for graphs with fixed degree sequence. CoRR, 2016. arXiv:1609.05137.

15. Corrie J. Carstens, Michael Hamann, Ulrich Meyer, Manuel Penschuck, Hung Tran, and Dorothea Wagner. Parallel and I/O-efficient randomisation of massive networks using global curveball trades. In ESA 2018, 2018. doi:10.4230/LIPIcs.ESA.2018.11.

16. Kyrylo Chykhradze, Anton Korshunov, Nazar Buzun, Roman Pastukhov, Nikolay Kuzyurin, Denis Turdakov, and Hangkyu Kim. Distributed Generation of Billion-node Social Graphs with Overlapping Community Structure. In CompleNet 2014, 2014. doi:10.1007/978-3-319-05401-8_19.

17. DFG, German Research Foundation. Priority Programmes. URL: http://www.dfg.de/en/research_funding/programmes/coordinated_programmes/priority_programmes/index.html.

18. Sergey Dorogovtsev, José F. F. Mendes, and A. N. Samukhin. Anomalous percolation properties of growing networks. Phys. Rev. E, 64:066110, Nov. 2001. doi:10.1103/PhysRevE.64.066110.

19. Daniel Funke, Sebastian Lamm, Ulrich Meyer, Manuel Penschuck, Peter Sanders, Christian Schulz, Darren Strash, and Moritz von Looz. Communication-free massively distributed graph generation. J. Parallel Distrib. Comput., 131:200–217, 2019. doi:10.1016/j.jpdc.2019.03.011.

20. Daniel Funke, Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, and Moritz von Looz. Communication-free massively distributed graph generation. In IPDPS 2018, 2018. doi:10.1109/IPDPS.2018.00043.

21. Minos N. Garofalakis, Johannes Gehrke, and Rajeev Rastogi, editors. Data Stream Management – Processing High-Speed Data Streams. Data-Centric Systems and Applications. Springer, 2016. doi:10.1007/978-3-540-28608-0.

22. Nicholas J. Gotelli and Gary R. Graves. Null models in ecology. Smithsonian Institution, 1996.

23. Luca Gugelmann, Konstantinos Panagiotou, and Ueli Peter. Random hyperbolic graphs: Degree sequence and clustering – (extended abstract). In ICALP 2012, pages 573–585, 2012. doi:10.1007/978-3-642-31585-5_51.

24. Seifollah L. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph. I. Journal of the Society for Industrial and Applied Mathematics, 10(3):496–506, 1962. doi:10.1137/0110037.

25. Michael Hamann, Ulrich Meyer, Manuel Penschuck, Hung Tran, and Dorothea Wagner. I/O-efficient generation of massive graphs following the LFR benchmark. ACM Journal of Experimental Algorithmics, 23, 2018. doi:10.1145/3230743.

26. Michael Hamann, Ulrich Meyer, Manuel Penschuck, and Dorothea Wagner. I/O-efficient generation of massive graphs following the LFR benchmark. In ALENEX 2017, pages 58–72, 2017. doi:10.1137/1.9781611974768.5.

27. Václav Havel. Poznámka o existenci konečných grafů. Časopis pro pěstování matematiky, 080(4):477–480, 1955. URL: http://eudml.org/doc/19050. doi:10.21136/CPM.1955.108220.

28. Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. The web as a graph: Measurements, models, and methods. In COCOON ’99, pages 1–17, 1999. doi:10.1007/3-540-48686-0_1.

29. Pavel Krapivsky, Geoff Rodgers, and Sidney Redner. Degree distributions of growing networks. Physical Review Letters, 86(23):5401, 2001. doi:10.1103/PhysRevLett.86.5401.

30. Dmitri V. Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. Hyperbolic geometry of complex networks. Phys. Rev. E, 82:036106, Sep. 2010. doi:10.1103/PhysRevE.82.036106.

31. Andrea Lancichinetti and Santo Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E, 80:016118, Jul. 2009. doi:10.1103/PhysRevE.80.016118.

32. Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. Benchmark graphs for testing community detection algorithms. Phys. Rev. E, 78:046110, Oct. 2008. doi:10.1103/PhysRevE.78.046110.

33. Anil Maheshwari and Norbert Zeh. A survey of techniques for designing I/O-efficient algorithms. In Algorithms for Memory Hierarchies, pages 36–61, 2003. doi:10.1007/3-540-36574-5_3.

34. Ulrich Meyer and Manuel Penschuck. Generating massive scale-free networks under resource constraints. In ALENEX 2016, pages 39–52, 2016. doi:10.1137/1.9781611974317.4.

35. Ulrich Meyer and Manuel Penschuck. Large-scale graph generation and big data: An overview on recent results. Bulletin of the EATCS, 122, 2017. URL: http://eatcs.org/beatcs/index.php/beatcs/article/view/494.

36. Ulrich Meyer, Peter Sanders, and Jop F. Sibeyn, editors. Algorithms for Memory Hierarchies, Advanced Lectures, volume 2625 of LNCS. Springer, 2003. doi:10.1007/3-540-36574-5.

37. Ron Milo, Shai S. Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824–827, 2002. doi:10.1126/science.298.5594.824.

38. Manuel Penschuck. Generating practical random hyperbolic graphs in near-linear time and with sub-linear memory. In 16th Int. Symposium on Experimental Algorithms, SEA 2017, 2017.

39. Derek De Solla Price. A general theory of bibliometric and other cumulative advantage processes. JASIS, 27(5):292–306, 1976. doi:10.1002/asi.4630270505.

40. Peter Sanders and Christian Schulz. Scalable generation of scale-free graphs. Inf. Process. Lett., 116(7):489–491, 2016. doi:10.1016/j.ipl.2016.02.004.

41. Wolfgang E. Schlauch, Emőke Ágnes Horvát, and Katharina A. Zweig. Different flavors of randomness: comparing random graph models with fixed degree sequences. Social Network Analysis and Mining, 5(1):1–14, 2015. doi:10.1007/s13278-015-0267-z.

42. Christian Staudt, Aleksejs Sazonovs, and Henning Meyerhenke. Networkit: A tool suite for large-scale complex network analysis. Network Science, 4(4):508–530, 2016. doi:10.1017/nws.2016.20.

43. Christian L. Staudt, Michael Hamann, Ilya Safro, Alexander Gutfraind, and Henning Meyerhenke. Generating Scaled Replicas of Real-World Complex Networks, pages 17–28. Springer International Publishing, Cham, 2017. doi:10.1007/978-3-319-50901-3_2.

44. Remco Van Der Hofstad. Random graphs and complex networks. Available on http://www.win.tue.nl/rhofstad/NotesRGCN.pdf, page 11, 2009.

45. Moritz von Looz and Henning Meyerhenke. Querying probabilistic neighborhoods in spatial data sets efficiently. In IWOCA 2016, pages 449–460, 2016. doi:10.1007/978-3-319-44543-4_35.

46. Moritz von Looz, Henning Meyerhenke, and Roman Prutkin. Generating random hyperbolic graphs in subquadratic time. In ISAAC 2015, pages 467–478, 2015. doi:10.1007/978-3-662-48971-0_40.

47. Moritz von Looz, Mustafa Safa Özdayi, Sören Laue, and Henning Meyerhenke. Generating massive complex networks with hyperbolic geometry faster in practice. In HPEC 2016, pages 1–6, 2016. doi:10.1109/HPEC.2016.7761644.

Received: 2019-10-29
Revised: 2020-02-13
Accepted: 2020-02-20
Published Online: 2020-03-07
Published in Print: 2020-05-27

© 2020 Walter de Gruyter GmbH, Berlin/Boston