
Paladyn, Journal of Behavioral Robotics

Editor-in-Chief: Schöner, Gregor

1 Issue per year

Open Access
Online ISSN: 2081-4836

A Methodology for Recognition of Emotions Based on Speech Analysis, for Applications to Human-Robot Interaction. An Exploratory Study

Mohammad Rabiei / Alessandro Gasparetto
Published Online: 2014-04-09 | DOI: https://doi.org/10.2478/pjbr-2014-0001

Abstract

A system for emotion recognition based on speech analysis can have interesting applications in human-robot interaction. In this paper, we carry out an exploratory study on the possibility of using the proposed methodology to recognize the basic emotions (sadness, surprise, happiness, anger, fear and disgust) from the phonetic and acoustic properties of emotive speech, with minimal use of signal-processing algorithms. We set up an experimental test involving three groups of speakers: (i) five adult European speakers, (ii) five adult Asian (Middle East) speakers, and (iii) five adult American speakers. Each speaker repeated six sentences in English (with durations typically between 1 s and 3 s) so as to emphasize rising-falling intonation and pitch movement. Intensity, pitch peak and range, and speech rate were evaluated. The proposed methodology consists of generating and analyzing graphs of formant, pitch and intensity using the open-source PRAAT program. The experimental results show that the basic emotions could be recognized in most cases.
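The feature extraction sketched in the abstract can also be reproduced programmatically. Below is a minimal illustration using praat-parselmouth, an open-source Python wrapper around PRAAT; note that the paper works with the PRAAT program directly, so the choice of library, the file name emotive_sample.wav, and the specific feature summaries are assumptions made here for illustration only.

```python
# Minimal sketch of per-utterance feature extraction: pitch peak and
# range, mean intensity, and a formant value, as described in the
# abstract. Uses praat-parselmouth (a Python wrapper around PRAAT);
# the input file name is a hypothetical 1-3 s emotive recording.
import parselmouth

snd = parselmouth.Sound("emotive_sample.wav")  # placeholder file name

# Pitch contour (fundamental frequency in Hz); unvoiced frames are 0.
pitch = snd.to_pitch()
f0_all = pitch.selected_array["frequency"]
f0 = f0_all[f0_all > 0]  # keep voiced frames only

# Intensity contour in dB.
intensity = snd.to_intensity()
db = intensity.values.flatten()

# First formant (F1) at the midpoint of the utterance.
formant = snd.to_formant_burg()
f1_mid = formant.get_value_at_time(1, snd.duration / 2)

print(f"pitch peak:     {f0.max():.1f} Hz")
print(f"pitch range:    {f0.max() - f0.min():.1f} Hz")
print(f"mean intensity: {db.mean():.1f} dB")
print(f"F1 at midpoint: {f1_mid:.1f} Hz")

# Speech rate is also evaluated in the paper; estimating it would
# require syllable detection, which is omitted from this sketch.
```

Per-utterance summaries of this kind could then be compared across the six target emotions, in the spirit of the graph-based analysis the paper performs in PRAAT.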

Keywords: Emotion; Human-Robot Interaction; Speech Analysis


About the article

Published Online: 2014-04-09

Published in Print: 2014-01-01


Citation Information: Paladyn, Journal of Behavioral Robotics, ISSN (Online) 2081-4836, DOI: https://doi.org/10.2478/pjbr-2014-0001.


© 2014 Mohammad Rabiei and Alessandro Gasparetto. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).
