tailieunhanh - Báo cáo khoa học: "PRINCIPLE-BASED PARSING WITHOUT OVERGENERATION"

Overgeneration is the main source of computational complexity in previous principle-based parsers. This paper presents a message passing algorithm for principle-based parsing that avoids the overgeneration problem. This algorithm has been implemented in C + + and successfully tested with example sentences from (van Riemsdijk and Williams, 1986). 1. I n t r o d u c t i o n Unlike rule-based grammars that use a large number of rules to describe patterns in a language, Government-Binding (GB) Theory (Chomsky, 1981; Haegeman, 1991; van Riemsdijk and Williams, 1986) ezplains these patterns in terms of more foundmental and universal. | PRINCIPLE-BASED PARSING WITHOUT OVERGENERATION1 Dekang Lin Department of Computing Science University of Manitoba Winnipeg Manitoba Canada R3T 2N2 E-mail lindek@ Abstract Overgeneration is the main source of computational complexity in previous principle-based parsers. This paper presents a message passing algorithm for principle-based parsing that avoids the overgeneration problem. This algorithm has been implemented in CTT and successfully tested with example sentences from van Riemsdijk and Williams 1986 . 1. Introduction Unlike rule-based grammars that use a large number of rules to describe patterns in a language Government-Binding GB Theory Chomsky 1981 Haegeman 1991 van Riemsdijk and Williams 1986 explains these patterns in terms of more foundmental and universal principles. A key issue in building a principle-based parser is how to procedurally interpret the principles. Since GB principles are constraints over syntactic structures one way to implement the principles is to 1. generate candidate structures of the sentence that satisfy X-bar theory and subcategorization frames of the words in the sentence. 2. filter out structures that violates any one of the principles. 3. the remaining structures are accepted as parse trees of the sentence. This implementation of GB theory is very inefficient since there are a large number of structures being generated and then filtered out. The problem of producing too many illicit structures is called overgeneraiion and has been recognized as the culprit of computational difficulties in principle-based parsing Berwick 1991 . Many methods have been proposed to alleviate the overgeneration problem by detecting illicit structures as early as possible such as optimal ordering of principles Fong 1991 coroutining Dorr 1991 Johnson 1991 . 1 The author wishes to thank the anonymous referees for their helpful comments and suggestions. This research was supported by Natural Sciences and Engineering Research Council of

TÀI LIỆU LIÊN QUAN