Grammar Approximation by Representative Sublanguage: A New Model for Language Learning

Smaranda Muresan, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA (smara@)
Owen Rambow, Center for Computational Learning Systems, Columbia University, New York, NY 10027, USA (rambow@)

Abstract

We propose a new language learning model that learns a syntactic-semantic grammar from a small number of natural language strings annotated with their semantics, along with basic assumptions about natural language syntax. We show that the search space for grammar induction is a complete grammar lattice, which guarantees the uniqueness of the learned grammar.

1 Introduction

There is considerable interest in learning computational grammars. While much attention has focused on learning syntactic grammars, either in a supervised or unsupervised manner, recently there is a growing interest toward learning grammars/parsers that capture semantics as well (Bos et al., 2004; Zettlemoyer and Collins, 2005; Ge and Mooney, 2005). Learning both syntax and semantics is arguably more difficult than learning syntax alone. In formal grammar learning theory it has been shown that learning from "good examples," or representative examples, is more powerful than learning from all the examples (Freivalds et al., 1993). Haghighi and Klein (2006) show that using a handful of prototypes significantly improves over a fully unsupervised PCFG induction model; their prototypes were formed by sequences of POS tags (for example, prototypical NPs were DT NN, JJ NN). In this paper we present a new .

Footnote: This research was supported by the National Science Foundation under Digital Library Initiative Phase II Grant Number IIS-98-17434 (Judith Klavans and Kathleen McKeown, PIs). We would like to thank Judith Klavans for her contributions over the course of this research, Kathy McKeown for her input, and several anonymous reviewers for very useful feedback on earlier drafts of this paper.
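The abstract's claim that a complete lattice guarantees a unique learned grammar can be illustrated with elementary lattice theory: in a complete lattice every family of elements has a unique greatest lower bound (meet), so a learner that returns the meet of all hypotheses consistent with the data is deterministic. The following is a minimal sketch under toy assumptions; the rule strings and the names `RULES`, `meet`, and `consistent` are illustrative and not taken from the paper, which uses a lattice of constraint-based grammars rather than plain rule sets.

```python
from functools import reduce

# Toy "grammars" as sets of rules; the subsets of RULES ordered by
# inclusion form a complete lattice, so every family of grammars has
# a unique meet (here, plain set intersection).
RULES = {"S -> NP VP", "NP -> DT NN", "NP -> JJ NN", "VP -> V NP"}

def meet(grammars):
    """Greatest lower bound under set inclusion: the intersection."""
    return reduce(set.intersection, grammars)

# Two hypothetical grammars consistent with some annotated examples.
consistent = [
    {"S -> NP VP", "NP -> DT NN", "VP -> V NP"},
    {"S -> NP VP", "NP -> DT NN", "NP -> JJ NN"},
]

# Uniqueness: the meet does not depend on the order of hypotheses.
assert meet(consistent) == meet(list(reversed(consistent)))
print(sorted(meet(consistent)))  # ['NP -> DT NN', 'S -> NP VP']
```

Because the meet exists and is unique for any subset of a complete lattice, "the learned grammar" is well defined regardless of how the search space is traversed.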
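The prototype idea attributed above to Haghighi and Klein (2006) can be sketched as matching POS-tag sequences against a small list of prototype sequences per phrase type. This is only an illustrative toy, not their actual model (which uses prototypes as soft distributional constraints inside PCFG induction); the `PROTOTYPES` table and `label` function are hypothetical, and the VP prototype is invented for the example.

```python
# Prototype POS-tag sequences per phrase type. The NP prototypes
# (DT NN, JJ NN) are the ones mentioned in the text; the VP entry
# is a made-up addition for illustration.
PROTOTYPES = {
    "NP": [("DT", "NN"), ("JJ", "NN")],
    "VP": [("VBD", "NP")],  # hypothetical
}

def label(pos_seq):
    """Return the phrase type whose prototype list contains pos_seq,
    or None if the sequence matches no prototype."""
    for phrase, protos in PROTOTYPES.items():
        if tuple(pos_seq) in protos:
            return phrase
    return None

print(label(["DT", "NN"]))  # NP
print(label(["JJ", "NN"]))  # NP
print(label(["RB", "VB"]))  # None
```

Even this crude exact-match version conveys why a handful of prototypes is such a cheap form of supervision: a few tag sequences pin down the identity of major phrase types without any treebank annotation.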