Edit Machines for Robust Multimodal Language Processing

Srinivas Bangalore
AT&T Labs-Research
180 Park Ave
Florham Park, NJ 07932
srini@

Michael Johnston
AT&T Labs-Research
180 Park Ave
Florham Park, NJ 07932
johnston@

Abstract

Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, handcrafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. Spoken language (speech-only) understanding systems have addressed this lack of robustness in hand-crafted grammars by exploiting classification techniques to extract the fillers of a frame representation. In this paper, we illustrate the limitations of such classification approaches for multimodal integration and understanding, and present an approach based on edit machines that combines the expressiveness of multimodal grammars with the robustness of the stochastic language models used in speech recognition. We also present an approach in which the edit operations are trained from data using a noisy-channel-model paradigm. We evaluate and compare the performance of the hand-crafted and learned edit machines in the context of a multimodal conversational system (MATCH).

1 Introduction

Over the years there have been several multimodal systems that allow input and/or output to be conveyed over multiple channels, such as speech, graphics, and gesture; for example, "Put That There" (Bolt, 1980), CUBRICON (Neal and Shapiro, 1991), QuickSet (Cohen et al., 1998), SmartKom (Wahlster, 2002), and MATCH (Johnston et al., 2002).
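The core idea behind the edit machines described in the abstract — mapping an unexpected or disfluent input string onto the closest string accepted by the grammar via insertion, deletion, and substitution operations — can be sketched with a dynamic-programming edit distance. This is only an illustrative sketch under assumed uniform costs, not the finite-state implementation the paper develops; the function names and the toy grammar are hypothetical.

```python
def edit_cost(observed, target, ins=1.0, dele=1.0, sub=1.0):
    """Levenshtein-style edit distance between two token sequences."""
    m, n = len(observed), len(target)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = 0.0 if observed[i - 1] == target[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete an observed token
                          d[i][j - 1] + ins,       # insert a target token
                          d[i - 1][j - 1] + same)  # match or substitute
    return d[m][n]

def closest_grammar_string(observed, grammar_strings):
    """Pick the grammar-accepted string with the lowest total edit cost."""
    return min(grammar_strings, key=lambda t: edit_cost(observed, t))

# Hypothetical toy grammar: the strings the handcrafted grammar accepts.
grammar = [
    "show cheap italian restaurants".split(),
    "show cheap french restaurants".split(),
]
asr_output = "show me uh cheap italian restaurants".split()
print(closest_grammar_string(asr_output, grammar))
# → ['show', 'cheap', 'italian', 'restaurants']  (after deleting 'me', 'uh')
```

In the paper's setting the grammar language and the edit operations are both encoded as finite-state machines and composed, so the minimum-cost in-grammar string is found by a shortest-path search rather than by enumerating grammar strings as above.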
Multimodal integration and interpretation for such interfaces is elegantly expressed using multimodal grammars (Johnston and Bangalore, 2000). These grammars support composite multimodal inputs by aligning speech input (words) and gesture input (represented as sequences of gesture symbols), while expressing the relation between the speech and gesture input and their combined semantic representation. In Bangalore and Johnston (2000) and Johnston
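The alignment a multimodal grammar expresses can be illustrated with a minimal sketch: deictic words in the speech stream are bound to referents from the gesture stream to build a combined semantic frame. This is an assumed, highly simplified illustration — the actual formalism uses finite-state transducers over speech, gesture, and meaning tapes — and the function name, frame shape, and referent labels are all hypothetical.

```python
def integrate(speech_tokens, gesture_referents):
    """Toy multimodal integration: align deictic words in the speech
    stream with referents from the gesture stream, yielding a combined
    semantic frame (illustrative sketch only)."""
    refs = iter(gesture_referents)
    frame = {"cmd": speech_tokens[0], "args": []}
    for tok in speech_tokens[1:]:
        if tok in ("this", "that", "these", "those"):
            frame["args"].append(next(refs))  # gesture supplies the referent
    return frame

speech = "phone this restaurant".split()
gestures = ["restaurant:r12"]  # hypothetical referent from a pen gesture
print(integrate(speech, gestures))
# → {'cmd': 'phone', 'args': ['restaurant:r12']}
```

A multimodal grammar generalizes this idea declaratively: each rule pairs a speech fragment with a gesture-symbol fragment and specifies the semantic representation their combination yields.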