Published by De Gruyter, September 14, 2019

Automatic hypernasality grade assessment in cleft palate speech based on the spectral envelope method

Jing Zhang, Sen Yang, Xiyue Wang, Ming Tang, Heng Yin and Ling He


Velopharyngeal incompetence allows airflow to escape from the oral cavity into the nasal cavity, which results in hypernasality. Hypernasality greatly reduces speech intelligibility and affects the daily communication of patients with cleft palate. Accurate assessment of hypernasality grades can assist speech-language pathologists (SLPs) with diagnosis in clinical settings. Using a support vector machine (SVM), this paper classifies speech recordings into four grades (normal, mild, moderate and severe hypernasality) based on vocal tract characteristics. Linear prediction (LP) analysis is widely used to model the vocal tract, but the LP-based spectrum may also include glottal source information. The stabilized weighted linear prediction (SWLP) method, which imposes temporal weights on the closed-phase interval of the glottal cycle, is a more robust approach for modeling the vocal tract. The extended weighted linear prediction (XLP) method weights each lagged speech signal separately, which gives the spectral envelope a finer time scale than the SWLP method does. The test recordings were collected from 60 subjects with cleft palate and 20 control subjects and comprise a total of 4640 Mandarin syllables. The experimental results show that, in the high-frequency range, the spectral envelope of normal speech decreases faster than that of hypernasal speech. They also show that the correlation coefficients between normal and hypernasal speech are smaller for the SWLP- and XLP-based methods than for the LP method. Thus, the SWLP and XLP methods distinguish hypernasal from normal speech better than the LP method does. The classification accuracies for the four hypernasality grades using the SWLP and XLP methods range from 83.86% to 97.47%. The selection of the model order and the size of the weight function are also discussed.
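As a rough illustration of the baseline that the abstract compares against, the sketch below estimates a conventional LP spectral envelope (autocorrelation method with the Levinson-Durbin recursion) and computes a Pearson correlation coefficient between two envelopes. It is not the authors' implementation. The function names, the synthetic two-pole test frames, the model order and the 64-point frequency grid are all assumptions made for this example. The SWLP and XLP methods additionally apply temporal weights to the prediction error, which this sketch does not do.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no normalization)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a, err) with a[0] == 1.

    The predictor is x[n] ~ -sum_{k=1..order} a[k] * x[n-k].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # residual energy shrinks each step
    return a, err


def lp_envelope_db(a, err, n_points=64):
    """All-pole envelope err / |A(e^{jw})|^2 in dB for w in [0, pi)."""
    env = []
    for m in range(n_points):
        w = math.pi * m / n_points
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        env.append(10.0 * math.log10(err / abs(denom) ** 2))
    return env


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


def synthetic_frame(a1, a2, length=200):
    """Impulse response of a two-pole resonator (hypothetical stand-in for speech)."""
    x = []
    for n in range(length):
        prev1 = x[n - 1] if n >= 1 else 0.0
        prev2 = x[n - 2] if n >= 2 else 0.0
        x.append((1.0 if n == 0 else 0.0) - a1 * prev1 - a2 * prev2)
    return x


if __name__ == "__main__":
    order = 2  # assumed; real speech analysis would use a higher order
    frame_a = synthetic_frame(-1.3, 0.8)
    frame_b = synthetic_frame(-0.6, 0.7)
    env_a = lp_envelope_db(*levinson_durbin(autocorrelation(frame_a, order), order))
    env_b = lp_envelope_db(*levinson_durbin(autocorrelation(frame_b, order), order))
    print("correlation between the two envelopes:", pearson(env_a, env_b))
```

A lower correlation between the envelopes of two classes means the envelopes are easier to tell apart, which is the sense in which the abstract compares the LP, SWLP and XLP methods.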

Author Statement

Research funding: This research was funded by the National Natural Science Foundation of China, grant number 61503264. This research was partially supported by research grants from the National Natural Science Foundation of China, grant number 61571314.

Conflict of interest: Authors have no conflict of interest.

Informed consent: All test subjects took part on the basis of informed consent.

Ethical approval: We confirm that this study was conducted in compliance with the World Medical Association Declaration of Helsinki. Ethical approval was given by the Ethics Committee of the West China Hospital of Stomatology (No. WCHSIRB-CT-2013-011).



Received: 2018-09-16
Accepted: 2019-05-07
Published Online: 2019-09-14
Published in Print: 2020-01-28

©2020 Walter de Gruyter GmbH, Berlin/Boston