Published by De Gruyter, July 25, 2020

A parallel hybrid implementation of the 2D acoustic wave equation

Arshyn Altybay, Michael Ruzhansky and Niyaz Tokmagambetov


In this paper, we propose a hybrid parallel programming approach for the numerical solution of the two-dimensional acoustic wave equation on a single computer, using an implicit finite difference scheme. First, we transform the differential equation into an implicit finite difference equation and then, using the alternating direction implicit (ADI) method, split it into two sub-equations. We then compute an approximate solution with the cyclic reduction algorithm. Finally, we adapt this algorithm for parallel execution on graphics processing unit (GPU), GPU + Open Multi-Processing (OpenMP), and hybrid (GPU + OpenMP + Message Passing Interface (MPI)) computing platforms. Our particular focus is on improving the performance of the parallel algorithms, with the speedup (acceleration) measured from execution times. We show that the code running on the hybrid platform gives the expected results by comparing them with those obtained from the same simulation run on a classical processor core and with the Compute Unified Device Architecture (CUDA) and CUDA + OpenMP implementations.
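In a standard ADI scheme, each sub-step reduces to a set of independent tridiagonal systems, one per grid line in the implicit direction, and the abstract names cyclic reduction as the solver for them. Below is a minimal serial sketch in C of textbook odd-even cyclic reduction, not the authors' code. The function name `cyclic_reduction`, the coefficient layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) and the absence of pivoting (which assumes a diagonally dominant matrix) are assumptions made for illustration. The updates inside each level of both loops are mutually independent, which is what makes the method a natural fit for GPUs.

```c
/*
 * Hypothetical helper (illustrative, not from the paper).
 * Solves the tridiagonal system
 *     a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],   i = 0..n-1,
 * by odd-even cyclic reduction. a[0] and c[n-1] are never used for
 * unknowns outside the system. a, b, c and d are overwritten; the
 * solution is written to x. No pivoting is performed, so the matrix
 * is assumed to be diagonally dominant.
 */
void cyclic_reduction(int n, double *a, double *b, double *c,
                      double *d, double *x)
{
    if (n <= 0) return;

    /* Forward reduction: at stride s, every equation whose 1-based index
     * is a multiple of 2s eliminates its neighbours at distance s, so it
     * afterwards couples only to unknowns at distance 2s. The iterations
     * of the inner loop are independent of each other. */
    for (int s = 1; 2 * s <= n; s *= 2) {
        for (int i = 2 * s - 1; i < n; i += 2 * s) {
            int left = i - s;            /* always inside the system */
            int right = i + s;           /* may lie outside the system */
            double alpha = -a[i] / b[left];
            double gamma = (right < n) ? -c[i] / b[right] : 0.0;

            b[i] += alpha * c[left];
            d[i] += alpha * d[left];
            a[i] = alpha * a[left];      /* new coefficient of x[i - 2s] */
            if (right < n) {
                b[i] += gamma * a[right];
                d[i] += gamma * d[right];
                c[i] = gamma * c[right]; /* new coefficient of x[i + 2s] */
            } else {
                c[i] = 0.0;
            }
        }
    }

    /* Back substitution: the largest power of two <= n marks the single
     * fully decoupled equation; each smaller stride then fills in the
     * odd multiples of s from neighbours that are already known. */
    int top = 1;
    while (2 * top <= n) top *= 2;

    for (int s = top; s >= 1; s /= 2) {
        for (int i = s - 1; i < n; i += 2 * s) {
            double rhs = d[i];
            if (i - s >= 0) rhs -= a[i] * x[i - s];
            if (i + s < n)  rhs -= c[i] * x[i + s];
            x[i] = rhs / b[i];
        }
    }
}
```

For example, with `b[i] = 4`, `a[i] = c[i] = -1` and `d` computed as `A` times a known vector, the routine recovers that vector up to rounding error for any `n`, not only `n = 2^q - 1`. In a CUDA port one would plausibly assign one thread per index `i` at each level and synchronize between levels; that mapping is likewise an assumption here rather than a description of the paper's kernels.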

2010 Mathematics Subject Classification: 35L05; 76B15; 68Q85

Corresponding author: Niyaz Tokmagambetov, Department of Mathematics: Analysis, Logic and Discrete Mathematics, Ghent University, Gent, Belgium; Al-Farabi Kazakh National University, Almaty, Kazakhstan, E-mail:

Funding source: FWO Odysseus project

Funding source: Ministry of Education and Science of the Republic of Kazakhstan

Funding source: EPSRC

Award Identifier / Grant number: EP/R003025/1

Funding source: Leverhulme Research

Award Identifier / Grant number: RPG-2017-151

Funding source: MESRK

Award Identifier / Grant number: AP08052028

Award Identifier / Grant number: AP08053051

Award Identifier / Grant number: AP05130994

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: The authors were supported by the FWO Odysseus project. MR was supported in part by EPSRC Grant EP/R003025/1 and by Leverhulme Research Grant RPG-2017-151. AA was supported by MESRK Grants AP08052028 and AP08053051 of the Committee of Science, Ministry of Education and Science of the Republic of Kazakhstan.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.


Received: 2019-09-11
Accepted: 2020-05-03
Published Online: 2020-07-25
Published in Print: 2020-11-18

© 2020 Walter de Gruyter GmbH, Berlin/Boston