Speech and Gaze Control for Desktop Environments
[chapter in] Multimodal Human Computer Interaction and Pervasive Services, edited by P. Grifoni, Information Science Reference (USA), ISBN: 978-1-60566-386-9
ABSTRACT
This chapter illustrates a multimodal system that integrates speech- and gaze-based inputs for interaction with a real desktop environment. In this system, multimodal interaction aims to overcome the intrinsic limitations of each input channel taken alone. The chapter introduces the main eye-tracking and speech-recognition technologies, and describes a multimodal system that integrates the two input channels by generating a real-time vocal grammar from gaze-driven contextual information. The proposed approach shows how the combined use of auditory and visual cues makes mutual disambiguation possible when interacting with a real desktop environment. As a result, the system enables the use of low-cost audio-visual devices for everyday tasks, even when traditional input devices such as a keyboard or a mouse are unsuitable for use with a personal computer.
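The idea of a gaze-driven vocal grammar can be illustrated with a small sketch. It is not the chapter's implementation. Several assumptions are made here: on-screen elements are modelled as labelled points, "nearby" means within a fixed pixel radius of the gaze point, and commands take the form "click <label>". All names (`Widget`, `nearby`, `build_grammar`, `resolve`) are hypothetical. Plain string matching stands in for a real speech recognizer constrained by the generated grammar. Gaze narrows the vocabulary to the elements the user is looking at, and speech then picks one of them. This covers two failure cases: speech alone is ambiguous when two elements share a label, and gaze alone is ambiguous when several elements fall inside the imprecise fixation area.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Widget:
    """Hypothetical on-screen element: a spoken label and its centre in pixels."""
    label: str
    x: float
    y: float


def nearby(widgets, gaze, radius):
    """Return the widgets whose centre lies within `radius` pixels of the gaze point."""
    gx, gy = gaze
    return [w for w in widgets if math.hypot(w.x - gx, w.y - gy) <= radius]


def build_grammar(widgets, gaze, radius):
    """Build the current command vocabulary from the labels of widgets near the gaze."""
    return {f"click {w.label.lower()}" for w in nearby(widgets, gaze, radius)}


def resolve(utterance, widgets, gaze, radius):
    """Map a recognised utterance to a target widget, or None if it is out of grammar.

    Gaze restricts which phrases are accepted; speech selects among the nearby
    candidates. If several nearby widgets share the spoken label, the one
    closest to the gaze point is chosen.
    """
    phrase = utterance.strip().lower()
    if phrase not in build_grammar(widgets, gaze, radius):
        return None
    gx, gy = gaze
    candidates = [w for w in nearby(widgets, gaze, radius)
                  if f"click {w.label.lower()}" == phrase]
    # Non-empty: the phrase is in the grammar, so at least one nearby widget matches.
    return min(candidates, key=lambda w: math.hypot(w.x - gx, w.y - gy))


# Usage: two "Save" elements on screen (toolbar and dialog); gaze is near the dialog.
screen = [Widget("Save", 30, 10), Widget("Edit", 60, 10),
          Widget("Save", 400, 300), Widget("Cancel", 480, 300)]
gaze = (420, 310)  # noisy fixation covering both dialog buttons
print(sorted(build_grammar(screen, gaze, radius=100)))  # ['click cancel', 'click save']
print(resolve("Click Save", screen, gaze, radius=100))  # Widget(label='Save', x=400, y=300)
print(resolve("click edit", screen, gaze, radius=100))  # None
```

Regenerating the grammar on every fixation keeps the recognizer's vocabulary small. This is the property that makes low-cost microphones and eye trackers usable together in the abstract's scenario. The fixed circular radius is only a placeholder for whatever gaze-to-context mapping a real system uses.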
[CCPe09] E. Castellina, F. Corno, P. Pellegrino, "Speech and Gaze Control for Desktop Environments," [chapter in] Multimodal Human Computer Interaction and Pervasive Services, edited by P. Grifoni, Information Science Reference (USA), ISBN: 978-1-60566-386-9