Predicting Evidence of Understanding by Monitoring User's Task Manipulation in Multimodal Conversations

Yukiko I. Nakano, Yoshiko Arimoto
Tokyo University of Agriculture and Technology
2-24-16 Nakacho, Koganei-shi, Tokyo 184-8588, Japan

Kazuyoshi Murata, Yasuhiro Asa
Tokyo University of Technology
1404-1 Katakura, Hachioji, Tokyo 192-0981, Japan

Mika Enomoto, Hirohiko Sagawa
Central Research Laboratory, Hitachi, Ltd.
1-280 Higashi-koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan

Abstract

The aim of this paper is to develop animated agents that can control multimodal instruction dialogues by monitoring the user's behaviors. First, this paper reports on our Wizard-of-Oz experiments, and then, using the collected corpus, proposes a probabilistic model of fine-grained timing dependencies among multimodal communication behaviors: speech, gestures, and mouse manipulations. A preliminary evaluation revealed that our model can predict an instructor's grounding judgment and a listener's successful mouse manipulation quite accurately, suggesting that the model is useful in estimating the user's understanding, and can be applied to determining the agent's next action.
1 Introduction

In face-to-face conversation, speakers adjust their utterances in progress according to the listener's feedback, which is expressed in multimodal manners such as speech, facial expression, and eye-gaze. In task-manipulation situations, where the listener manipulates objects by following the speaker's instructions, correct task manipulation by the listener serves as more direct evidence of understanding (Brennan, 2000; Clark and Krych, 2004) and affects the speaker's dialogue control strategies.

Figure 1 shows an example of a software instruction dialogue in a video-mediated situation (originally in Japanese):

Learner: (pointing-gesture preparation)
Instructor: "That" (204 ms pause)
Instructor: "at the most" (395 ms pause)
Learner: (mouse move)
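The model itself is not specified in this excerpt; as a hedged illustration only, the sketch below shows how fine-grained timing features of the kind discussed here (e.g., the pause length before a learner's mouse move, or overlap between instructor speech and a learner gesture) could feed a simple probabilistic predictor of successful manipulation. The feature names and weights are invented for illustration and are not taken from the paper.

```python
import math

def predict_success(pause_ms: float, gesture_overlap: float) -> float:
    """Toy logistic predictor of successful mouse manipulation.

    Treats a shorter pause before acting and speech/gesture overlap
    as evidence of understanding. Weights are hypothetical; a real
    model would be trained on the annotated dialogue corpus.
    """
    # Hypothetical linear score: overlap raises it, long pauses lower it.
    z = 1.5 * gesture_overlap - 0.004 * pause_ms + 0.5
    # Logistic link maps the score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# A short pause with gesture overlap should score higher than
# a long pause with no overlap.
p_fast = predict_success(pause_ms=204, gesture_overlap=1.0)
p_slow = predict_success(pause_ms=900, gesture_overlap=0.0)
```

An agent could compare such a probability against a threshold to decide its next action, e.g., proceeding to the next instruction when the predicted probability of successful manipulation is high, and elaborating otherwise.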