Published online by De Gruyter, August 5, 2022

A comparative study of the spectrogram, scalogram, melspectrogram and gammatonegram time-frequency representations for the classification of lung sounds using the ICBHI database based on CNNs

Zakaria Neili and Kenneth Sundaraj


In deep-learning-based lung sound classification, many studies have used the short-time Fourier transform (STFT), which is the most common 2D representation of the input data. STFT has consequently been widely adopted as an analytical tool, although other time-frequency (TF) representations have also been developed. This study evaluates and compares the performance of the spectrogram, scalogram, melspectrogram and gammatonegram representations, and provides comparative information on the suitability of these TF techniques for lung sound classification. The lung sound signals were obtained from the ICBHI 2017 respiratory sound database. The recordings were converted into spectrogram, scalogram, melspectrogram and gammatonegram images, and each of the four image types was fed separately into the VGG16, ResNet-50 and AlexNet deep-learning architectures. Network performance was analyzed and compared in terms of accuracy, precision, recall and F1-score. Across these three commonly used CNNs, the gammatonegram and scalogram TF images coupled with ResNet-50 achieved the highest classification accuracies.
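The conversion of a recording into TF images can be illustrated with a minimal sketch for two of the four representations, the spectrogram and the melspectrogram. This is not the authors' implementation: the frame length, hop size, number of mel bands, sample rate and the synthetic test tone are all assumptions chosen for illustration, and the helper names (`stft_power`, `mel_filterbank`, `to_image`) are hypothetical. The scalogram and gammatonegram are omitted for brevity.

```python
import numpy as np

def stft_power(signal, frame_len=256, hop=128):
    """Power spectrogram from a Hann-windowed STFT; shape (frame_len // 2 + 1, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, frame_len, sample_rate):
    """Triangular filters spaced evenly on the mel scale; shape (n_mels, frame_len // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def to_image(tf, eps=1e-10):
    """Log-compress a TF matrix and rescale it to an 8-bit grayscale image."""
    db = 10.0 * np.log10(tf + eps)
    span = db.max() - db.min()
    scaled = (db - db.min()) / (span if span > 0 else 1.0)
    return np.round(255.0 * scaled).astype(np.uint8)

# Synthetic stand-in for a lung sound recording: a 1 s, 400 Hz tone (assumed values).
sample_rate = 4000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 400 * t)

spec = stft_power(signal)                           # spectrogram
mel = mel_filterbank(32, 256, sample_rate) @ spec   # melspectrogram
spec_img, mel_img = to_image(spec), to_image(mel)

# Frequency of the strongest bin, averaged over time.
peak_hz = np.fft.rfftfreq(256, d=1.0 / sample_rate)[spec.mean(axis=1).argmax()]
```

In a pipeline like the one described, images of this kind would typically be resized and replicated across three channels to match the input layer of a pretrained CNN. That step depends on the chosen framework and is not shown here.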

Corresponding author: Zakaria Neili, Electronics Department, University of Badji Mokhtar Annaba, Annaba, Algeria

  1. Research funding: None declared.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: Authors state no conflict of interest.

  4. Informed consent: Informed consent was obtained from all individuals included in this study.

  5. Ethical approval: The local Institutional Review Board deemed the study exempt from review.



Received: 2021-11-01
Accepted: 2022-06-21
Published Online: 2022-08-05

© 2022 Walter de Gruyter GmbH, Berlin/Boston
