Abstract
In this paper, we propose a hybrid parallel programming approach for the numerical solution of a two-dimensional acoustic wave equation on a single computer, using an implicit finite-difference scheme. First, we transform the differential equation into an implicit finite-difference equation; then, using the alternating direction implicit (ADI) method, we split it into two sub-equations. We compute an approximate solution with the cyclic reduction algorithm. Finally, we adapt this algorithm to run in parallel on three computing platforms: a graphics processing unit (GPU), GPU + Open Multi-Processing (OpenMP), and a hybrid of GPU + OpenMP + message passing interface (MPI). Special focus is placed on improving the performance of the parallel algorithms, with the speedup measured from execution times. We show that the hybrid code gives the expected results by comparing them with those obtained by running the same simulation on a classical processor core and with the Compute Unified Device Architecture (CUDA) and CUDA + OpenMP implementations.
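As an illustration of the solver step mentioned above, the following is a minimal serial sketch of cyclic reduction for one tridiagonal system of the kind each ADI sweep produces. It is written in Python for readability and is not the authors' CUDA/OpenMP/MPI code. The names `cyclic_reduction` and `residual`, the coefficient layout `a`, `b`, `c`, `d`, and the sample value `r = 0.5` are assumptions made for this example. The sketch assumes a diagonally dominant system, so no pivoting is done.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], i = 0..n-1.

    a[0] and c[-1] are ignored (treated as zero). Hypothetical helper,
    assuming a diagonally dominant system so that no pivoting is needed.
    """
    n = len(b)
    a = [0.0] + list(a[1:])
    c = list(c[:-1]) + [0.0]
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: eliminate the even-indexed unknowns from every
    # odd-indexed equation. Each iteration is independent of the others,
    # which is the property a parallel implementation can exploit.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        new_a = -alpha * a[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_c = 0.0
        new_d = d[i] - alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_c = -gamma * c[i + 1]
            new_d -= gamma * d[i + 1]
        ra.append(new_a)
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    # The odd-indexed unknowns satisfy a tridiagonal system of half the size.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover each even-indexed unknown from its neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


def residual(a, b, c, d, x):
    """Largest absolute error of the tridiagonal equations for a candidate x."""
    n = len(b)
    worst = 0.0
    for i in range(n):
        lhs = b[i] * x[i]
        if i > 0:
            lhs += a[i] * x[i - 1]
        if i + 1 < n:
            lhs += c[i] * x[i + 1]
        worst = max(worst, abs(lhs - d[i]))
    return worst


# Example: one implicit sweep along a grid line, with an assumed ratio r.
r = 0.5
n = 10
a = [-r] * n
b = [1.0 + 2.0 * r] * n
c = [-r] * n
d = [float(i + 1) for i in range(n)]
x = cyclic_reduction(a, b, c, d)
```

Each reduction level halves the number of unknowns, so a system of size n is solved in about log2(n) levels. Within a level, the loop iterations do not depend on one another, which is why the method maps naturally onto GPU threads.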
Funding source: FWO Odysseus project
Funding source: Ministry of Education and Science of the Republic of Kazakhstan
Funding source: EPSRC
Award Identifier / Grant number: EP/R003025/1
Funding source: Leverhulme Research
Award Identifier / Grant number: RPG-2017-151
Funding source: MESRK
Award Identifier / Grant number: AP08052028
Award Identifier / Grant number: AP08053051
Award Identifier / Grant number: AP05130994
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: The authors were supported by the FWO Odysseus project. MR was supported in part by the EPSRC Grant EP/R003025/1 and by the Leverhulme Research Grant RPG-2017-151. AA was supported by the MESRK Grants AP08052028 and AP08053051 of the Committee of Science, Ministry of Education and Science of the Republic of Kazakhstan.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
© 2020 Walter de Gruyter GmbH, Berlin/Boston