7 A Multimodal Control System for Autonomous Vehicles Using Speech, Gesture, and Gaze Recognition

Norihide Kitaoka, Takuma Nakagawa, Ryota Nishimura, Yoshio Ishiguro, Shin’ichi Kojima and Shin Ohsuga


The development of autonomous vehicles has recently become the focus of much research. One line of investigation concerns systems that let people direct autonomous vehicles easily, without special technical skills or training, and human-machine interface technologies are a likely means of achieving this. In this study, we propose an intuitive multimodal interface system that uses speech, gesture, and gaze recognition to convey user commands to the vehicle. We designed the multimodal understanding and dialog control components of the interface separately, each as a finite-state transducer (FST), a technique also used to control conventional automated dialog systems. The two components therefore form a cascade of two transducers. Because cascaded transducers can be composed into a single equivalent transducer, the system is in fact driven by one FST. In trials, the proposed interface allowed users to successfully direct and operate an autonomous vehicle using only speech, head movement, and gaze direction.
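The claim that a cascade of two transducers can be merged into one is the standard FST composition operation. The following is a minimal sketch of that idea, not the authors' implementation. It assumes deterministic, epsilon-free transducers that emit exactly one output symbol per input symbol; toolkits such as OpenFst also handle epsilons and weights. All event names, states, and commands (`speech:turn`, `gaze:left`, `EXEC_TURN_LEFT`, and so on) are hypothetical illustrations. Composed states are pairs of component states, and an arc `a:b` in the first transducer joins an arc `b:c` in the second to form one arc `a:c`.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

# (state, input symbol) -> (next state, output symbol); deterministic, no epsilons.
Transitions = Dict[Tuple[Hashable, str], Tuple[Hashable, str]]


class FST:
    """A minimal deterministic, epsilon-free finite-state transducer."""

    def __init__(self, start: Hashable, finals: Set[Hashable], transitions: Transitions):
        self.start = start
        self.finals = finals
        self.transitions = transitions

    def run(self, inputs: List[str]) -> Optional[List[str]]:
        """Return the output sequence, or None if the input is rejected."""
        state, outputs = self.start, []
        for sym in inputs:
            if (state, sym) not in self.transitions:
                return None
            state, out = self.transitions[(state, sym)]
            outputs.append(out)
        return outputs if state in self.finals else None


def compose(t1: FST, t2: FST) -> FST:
    """Build a single FST equivalent to feeding t1's output into t2.

    Composed states are pairs (q1, q2); an arc a:b in t1 and an arc b:c
    in t2 combine into one arc a:c. Only reachable pairs are built.
    """
    start = (t1.start, t2.start)
    transitions: Transitions = {}
    seen, stack = {start}, [start]
    while stack:
        q1, q2 = stack.pop()
        for (s1, a), (n1, b) in t1.transitions.items():
            if s1 != q1 or (q2, b) not in t2.transitions:
                continue
            n2, c = t2.transitions[(q2, b)]
            transitions[((q1, q2), a)] = ((n1, n2), c)
            if (n1, n2) not in seen:
                seen.add((n1, n2))
                stack.append((n1, n2))
    finals = {q for q in seen if q[0] in t1.finals and q[1] in t2.finals}
    return FST(start, finals, transitions)


def cascade(t1: FST, t2: FST, inputs: List[str]) -> Optional[List[str]]:
    """Run t1, then feed its output to t2 (the two-stage view)."""
    middle = t1.run(inputs)
    return None if middle is None else t2.run(middle)


# Hypothetical understanding FST: multimodal events -> semantic tokens.
# "-" is an ordinary placeholder symbol meaning "no command yet".
understanding = FST("idle", {"idle"}, {
    ("idle", "speech:turn"): ("heard_turn", "-"),
    ("heard_turn", "gaze:left"): ("idle", "TURN_LEFT"),
    ("heard_turn", "gaze:right"): ("idle", "TURN_RIGHT"),
    ("idle", "speech:stop"): ("idle", "STOP"),
    ("idle", "head:nod"): ("idle", "YES"),
})

# Hypothetical dialog-control FST: semantic tokens -> system actions.
dialog = FST("ready", {"ready"}, {
    ("ready", "-"): ("ready", "-"),
    ("ready", "TURN_LEFT"): ("confirm_left", "ASK_CONFIRM_LEFT"),
    ("ready", "TURN_RIGHT"): ("confirm_right", "ASK_CONFIRM_RIGHT"),
    ("confirm_left", "YES"): ("ready", "EXEC_TURN_LEFT"),
    ("confirm_right", "YES"): ("ready", "EXEC_TURN_RIGHT"),
    ("ready", "STOP"): ("ready", "EXEC_STOP"),
})

combined = compose(understanding, dialog)

if __name__ == "__main__":
    events = ["speech:turn", "gaze:left", "head:nod"]
    print(cascade(understanding, dialog, events))  # two-stage cascade
    print(combined.run(events))                    # single composed FST
```

Both calls print `['-', 'ASK_CONFIRM_LEFT', 'EXEC_TURN_LEFT']`. The composed machine produces the same output as the cascade for every input, so a runtime only needs to step through one transducer.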

