Abstract
Humans can express their own emotions and estimate the emotional states of others during communication. This paper proposes a unified model that can both estimate the emotional states of others and generate emotional self-expressions. The proposed model uses a multimodal restricted Boltzmann machine (RBM), a type of stochastic neural network. RBMs can abstract latent information from input signals and reconstruct the signals from that latent information. We use these two characteristics to address two issues affecting previously proposed emotion models: constructing an emotional representation for the estimation and generation of emotion instead of relying on heuristic features, and realizing mental simulation to infer the emotions of others from their ambiguous signals. Our experimental results showed that the proposed model can extract features representing the distribution of emotion categories via self-organized learning. Imitation experiments demonstrated that, using our model, a robot can generate expressions better than with a direct mapping mechanism when the expressions of others contain emotional inconsistencies. Moreover, our model can improve the estimated belief in the emotional states of others by generating imaginary sensory signals from defective multimodal signals (i.e., mental simulation). These results suggest that these abilities of the proposed model can facilitate emotional human–robot communication in more complex situations.
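The two RBM properties named in the abstract, abstracting latent information and reconstructing signals from it, can be illustrated with a minimal sketch. This is not the authors' model: it assumes a small binary (Bernoulli) RBM trained with one-step contrastive divergence on toy data, and the class `TinyRBM`, the method `fill_missing`, the two toy "modalities", and all hyperparameters are hypothetical names and values chosen for illustration. The `fill_missing` method mimics mental simulation in a simplified form: observed units are clamped, and the missing modality is repeatedly regenerated from the latent layer.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyRBM:
    """Illustrative binary RBM (hypothetical; not the paper's implementation)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def encode(self, v):
        # Abstraction: probability that each hidden unit is on, given v.
        return sigmoid(v @ self.W + self.c)

    def decode(self, h):
        # Reconstruction: probability that each visible unit is on, given h.
        return sigmoid(h @ self.W.T + self.b)

    def train(self, data, epochs=2000, lr=0.1):
        # One-step contrastive divergence (CD-1) on the full batch.
        n = len(data)
        for _ in range(epochs):
            h_data = self.encode(data)
            h_sample = (self.rng.random(h_data.shape) < h_data).astype(float)
            v_model = self.decode(h_sample)
            h_model = self.encode(v_model)
            self.W += lr * (data.T @ h_data - v_model.T @ h_model) / n
            self.b += lr * (data - v_model).mean(axis=0)
            self.c += lr * (h_data - h_model).mean(axis=0)

    def fill_missing(self, v, observed, steps=20):
        # Clamp observed units; regenerate the missing ones from the latent layer.
        v = np.where(observed, v, 0.5)
        for _ in range(steps):
            v_hat = self.decode(self.encode(v))
            v = np.where(observed, v, v_hat)
        return v


# Toy data (assumption): units 0-3 stand in for one modality, units 4-5 for another.
patterns = np.array([[1, 1, 0, 0, 1, 0],
                     [0, 0, 1, 1, 0, 1]], dtype=float)
data = np.tile(patterns, (10, 1))

rbm = TinyRBM(n_visible=6, n_hidden=3)
rbm.train(data)

# The second modality is missing; only the first four units are observed.
observed = np.array([True, True, True, True, False, False])
partial = np.array([1, 1, 0, 0, 0, 0], dtype=float)
filled = rbm.fill_missing(partial, observed)
print(np.round(filled, 2))
```

In this sketch the observed units are never overwritten, so the filled-in values for the last two units reflect only what the latent layer infers from the observed modality. A model for continuous sensory signals would more likely use Gaussian visible units, which this example does not cover.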
© 2016 Takato Horii et al.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.