Scientific report: "Fast Unsupervised Incremental Parsing"

Yoav Seginer
Institute for Logic, Language and Computation
Universiteit van Amsterdam
Plantage Muidergracht 24, 1018TV Amsterdam, The Netherlands
yseginer@

Abstract

This paper describes an incremental parser and an unsupervised learning algorithm for inducing this parser from plain text. The parser uses a representation for syntactic structure similar to dependency links, which is well-suited for incremental parsing. In contrast to previous unsupervised parsers, the parser does not use part-of-speech tags, and both learning and parsing are local and fast, requiring no explicit clustering or global optimization. The parser is evaluated by converting its output into equivalent bracketing and improves on previously published results for unsupervised parsing from plain text.

1 Introduction

Grammar induction, the learning of the grammar of a language from unannotated example sentences, has long been of interest to linguists because of its relevance to language acquisition by children. In recent years, interest in unsupervised learning of grammar has also increased among computational linguists, as the difficulty and cost of constructing annotated corpora led researchers to look for ways to train parsers on unannotated text. This can either be semi-supervised parsing, using both annotated and unannotated data (McClosky et al., 2006), or unsupervised parsing, training entirely on unannotated text.
The past few years have seen considerable improvement in the performance of unsupervised parsers (Klein and Manning, 2002; Klein and Manning, 2004; Bod, 2006a; Bod, 2006b) and, for the first time, unsupervised parsers have been able to improve on the right-branching heuristic for parsing English. All these parsers learn and parse from sequences of part-of-speech tags and select, for each sentence, the binary parse tree which maximizes some objective function. Learning is based on global maximization of this objective function over the whole corpus.
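
For readers unfamiliar with the right-branching heuristic mentioned above, the following is a minimal sketch of that baseline, not the parser described in this paper. It assumes a sentence is already tokenized into a list of words; the function names `right_branching` and `bracket_spans` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple, Union

# A tree is either a single word or a pair of subtrees (binary branching).
Tree = Union[str, Tuple["Tree", "Tree"]]


def right_branching(words: List[str]) -> Tree:
    """Build the fully right-branching binary tree over a tokenized sentence.

    Each word attaches as the left child, and everything to its right
    forms the right child.
    """
    if not words:
        raise ValueError("cannot parse an empty sentence")
    if len(words) == 1:
        return words[0]
    return (words[0], right_branching(words[1:]))


def bracket_spans(n: int) -> List[Tuple[int, int]]:
    """List the constituent spans (start, end), end exclusive, that the
    right-branching tree assigns to a sentence of n words.

    Every constituent starts at some word and runs to the end of the
    sentence; single-word spans are omitted.
    """
    return [(i, n) for i in range(n - 1)]


tree = right_branching(["the", "dog", "chased", "cats"])
spans = bracket_spans(4)
```

Here `tree` is `("the", ("dog", ("chased", "cats")))` and `spans` is `[(0, 4), (1, 4), (2, 4)]`. The baseline needs no training, which is why it serves as a natural reference point for unsupervised parsers of English.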