
Journal of Intelligent Systems

Editor-in-Chief: Fleyeh, Hasan

CiteScore 2018: 1.03

SCImago Journal Rank (SJR) 2018: 0.188
Source Normalized Impact per Paper (SNIP) 2018: 0.533

Volume 25, Issue 3


Gaussian Mixture Model Based Classification of Stuttering Dysfluencies

P. Mahesha / D.S. Vinod
Published Online: 2015-06-12 | DOI: https://doi.org/10.1515/jisys-2014-0140


The classification of dysfluencies is an important step in the objective measurement of stuttering disorder. This work investigates the applicability of the automatic speaker recognition (ASR) approach to stuttering dysfluency recognition. The system designed for this task relies on the Gaussian mixture model (GMM), the most widely used probabilistic modeling technique in ASR. The GMM parameters are estimated from Mel-frequency cepstral coefficients (MFCCs). This statistical speaker-modeling technique represents the fundamental characteristic sounds of the speech signal. Using this model, we build a dysfluency recognizer capable of recognizing dysfluencies irrespective of the speaker and of what is being said. The performance of the system is evaluated on different types of dysfluencies, such as syllable repetition, word repetition, prolongation, and interjection, using speech samples from the University College London Archive of Stuttered Speech (UCLASS).
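The per-class GMM approach described above can be sketched as follows: one GMM is fit per dysfluency class via the EM algorithm, and a test utterance is assigned to the class whose model yields the highest average log-likelihood over its MFCC frames. This is a minimal illustration using scikit-learn's EM-based `GaussianMixture`; the feature matrices are random stand-ins for real MFCC frames, and the class names, MFCC dimensionality, and component count are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
N_MFCC = 13          # typical MFCC dimensionality (assumption)
N_COMPONENTS = 8     # mixture components per class model (assumption)

CLASSES = ["repetition", "prolongation", "interjection"]

def make_fake_mfcc_frames(offset, n_frames=500):
    """Stand-in for MFCC frames extracted from training speech of one class."""
    return rng.normal(loc=offset, scale=1.0, size=(n_frames, N_MFCC))

# Train one GMM per dysfluency class; fit() runs the EM algorithm internally.
models = {}
for i, name in enumerate(CLASSES):
    train_frames = make_fake_mfcc_frames(offset=3.0 * i)
    gmm = GaussianMixture(n_components=N_COMPONENTS,
                          covariance_type="diag", random_state=0)
    gmm.fit(train_frames)
    models[name] = gmm

def classify(frames):
    """Assign the class whose GMM gives the highest average log-likelihood."""
    scores = {name: gmm.score(frames) for name, gmm in models.items()}
    return max(scores, key=scores.get)

# A test utterance drawn near the "prolongation" training distribution
# should be assigned to that class.
test_frames = make_fake_mfcc_frames(offset=3.0, n_frames=200)
print(classify(test_frames))  # expect: prolongation
```

In practice the stand-in features would be replaced by MFCCs extracted from framed, windowed speech, and the classifier would be evaluated per dysfluency type as in the paper.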

Keywords: Dysfluency; EM algorithm; GMM; MFCC; stuttering

MSC 2010: 68T01; 68T10; 68T27



About the article

Corresponding author: P. Mahesha, Department of Computer Science and Engineering, S.J. College of Engineering, Mysore, India, e-mail:

Received: 2014-07-11

Published Online: 2015-06-12

Published in Print: 2016-07-01

Citation Information: Journal of Intelligent Systems, Volume 25, Issue 3, Pages 387–399, ISSN (Online) 2191-026X, ISSN (Print) 0334-1860, DOI: https://doi.org/10.1515/jisys-2014-0140.

©2016 by De Gruyter.
