Learning Semantic Correspondences with Less Supervision

Percy Liang (UC Berkeley, pliang@) · Michael I. Jordan (UC Berkeley, jordan@) · Dan Klein (UC Berkeley, klein@)

Abstract

A central problem in grounded language acquisition is learning the correspondences between a rich world state and a stream of text which references that world state. To deal with the high degree of ambiguity present in this setting, we present a generative model that simultaneously segments the text into utterances and maps each utterance to a meaning representation grounded in the world state. We show that our model generalizes across three domains of increasing difficulty: Robocup sportscasting, weather forecasts (a new domain), and NFL recaps.

1 Introduction

Recent work in learning semantics has focused on mapping sentences to meaning representations (e.g., some logical form) given aligned sentence-meaning pairs as training data (Ge and Mooney, 2005; Zettlemoyer and Collins, 2005; Zettlemoyer and Collins, 2007; Lu et al., 2008). However, this degree of supervision is unrealistic for modeling human language acquisition and can be costly to obtain when building large-scale, broad-coverage language understanding systems. A more flexible direction is grounded language acquisition: learning the meaning of sentences in the context of an observed world state.
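As a concrete sketch of the setting described above (all record types, field names, and the example text are invented for illustration; this is not the paper's actual data format), the world state can be viewed as a set of typed records and the text as an unsegmented word sequence. Even before considering which record each utterance refers to, the number of ways to segment the text into utterances grows exponentially, which illustrates the ambiguity the model must resolve:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One record in the world state: a type plus field values."""
    rtype: str
    values: Dict[str, str] = field(default_factory=dict)


# Hypothetical world state for a Robocup-style example (names invented).
world_state = [
    Record("pass", {"from": "pink2", "to": "pink5"}),
    Record("kick", {"player": "pink5"}),
]

# The text is an unsegmented sequence of words; neither utterance
# boundaries nor alignments to records are observed.
text = "pink2 passes to pink5 who kicks the ball".split()


def segmentations(words: List[str]) -> List[List[Tuple[int, int]]]:
    """Enumerate every segmentation of the words into contiguous
    utterances, each represented as a (start, end) span. A generative
    model of the kind described above would score each segmentation
    jointly with an alignment of utterances to records."""
    n = len(words)
    if n == 0:
        return [[]]
    out = []
    for end in range(1, n + 1):
        # First utterance covers words[0:end]; recurse on the rest,
        # shifting the recursive spans by `end`.
        for rest in segmentations(words[end:]):
            out.append([(0, end)] + [(s + end, e + end) for s, e in rest])
    return out


# An n-word text has 2^(n-1) segmentations: 8 words -> 128.
print(len(segmentations(text)))
```

The exponential count (2^(n-1) for n words) is exactly why joint inference over segmentations and alignments, rather than exhaustive enumeration, is needed in practice.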
The grounded approach has gained interest in various disciplines (Siskind, 1996; Yu and Ballard, 2004; Feldman and Narayanan, 2004; Gorniak and Roy, 2007). Some recent work in the NLP community has also moved in this direction by relaxing the amount of supervision to the setting where each sentence is paired with a small set of candidate meanings (Kate and Mooney, 2007; Chen and Mooney, 2008). The goal of this paper is to reduce the amount of supervision even further. We assume that we are given a world state represented by a set of records, along with a text: an unsegmented sequence of words. For example, in the weather forecast domain Section