
Paladyn, Journal of Behavioral Robotics

Editor-in-Chief: Schöner, Gregor


Open Access

Voice-awareness control for a humanoid robot consistent with its body posture and movements

Takuma Otsuka / Kazuhiro Nakadai
  • Honda Research Institute Japan, Co., Ltd., Wako, Saitama, 351-0114, Japan
  • Graduate School of Information Science and Engineering, Tokyo Institute of Technology
/ Toru Takahashi / Kazunori Komatani / Tetsuya Ogata / Hiroshi G. Okuno
Published Online: 2010-03-31 | DOI: https://doi.org/10.2478/s13230-010-0009-x


This paper presents voice-awareness control that is consistent with a robot’s head movements. For natural spoken communication between robots and humans, robots must behave and speak the way humans expect them to. The consistency between a robot’s voice quality and its body motion is one of the most striking factors in the naturalness of robot speech. Our control is based on a new model of spectral envelope modification for vertical head motion, and on left-right balance modulation for horizontal head motion. We assume that a pitch-axis rotation (a vertical head motion) and a yaw-axis rotation (a horizontal head motion) affect the voice quality independently. The spectral envelope modification model is constructed from an analysis of human vocalizations. The left-right balance model is established by measuring impulse responses with a pair of microphones. Experimental results show that the voice-awareness is perceivable in a robot-to-robot dialogue when the robots are up to 150 cm apart. The experiment also confirms the dynamic change in voice quality.

Keywords: voice awareness; robot speech signal control; 2D voice manipulation; source filter model; human robot interaction
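
The left-right balance idea for horizontal head motion can be illustrated with a minimal sketch. It does not reproduce the model in the paper, which is established from measured impulse responses; it assumes a generic constant-power pan law instead. The function names, the 90-degree yaw limit, and the sign convention (positive yaw means the head is turned to the right) are all illustrative assumptions.

```python
import math


def balance_gains(yaw_deg, max_yaw_deg=90.0):
    """Return hypothetical (left, right) gains for a given head yaw angle.

    Assumption: a constant-power pan law, not the impulse-response-based
    model described in the paper. Positive yaw favors the right channel.
    """
    # Clamp the yaw angle to the assumed mechanical range of the head.
    yaw = max(-max_yaw_deg, min(max_yaw_deg, yaw_deg))
    # Map [-max_yaw, +max_yaw] onto [0, pi/2] so that yaw = 0 gives equal gains.
    theta = (yaw / max_yaw_deg + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)


def apply_balance(samples, yaw_deg):
    """Turn a mono sample sequence into a list of (left, right) sample pairs."""
    left_gain, right_gain = balance_gains(yaw_deg)
    return [(s * left_gain, s * right_gain) for s in samples]


if __name__ == "__main__":
    for yaw in (-90.0, -45.0, 0.0, 45.0, 90.0):
        left, right = balance_gains(yaw)
        print(f"yaw={yaw:+6.1f} deg  left={left:.3f}  right={right:.3f}")
```

With this pan law the summed power of the two channels stays constant across yaw angles, so only the balance changes as the head turns, not the overall loudness. A model fitted to measured impulse responses would generally also be frequency dependent, which this sketch ignores.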



About the article

Received: 2010-02-21

Accepted: 2010-03-19

Published Online: 2010-03-31

Published in Print: 2010-03-01

Citation Information: Paladyn, Journal of Behavioral Robotics, Volume 1, Issue 1, Pages 80–88, ISSN (Online) 2081-4836, DOI: https://doi.org/10.2478/s13230-010-0009-x.


© Takuma Otsuka et al. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (BY-NC-ND 3.0).
