Scientific report: "A Decision-Based Approach to Rhetorical Parsing", Daniel Marcu
Information Sciences Institute and Department of Computer Science
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292-6601
marcu@

Abstract

We present a shift-reduce rhetorical parsing algorithm that learns to construct rhetorical structures of texts from a corpus of discourse-parse action sequences. The algorithm exploits robust lexical, syntactic, and semantic knowledge sources.

1 Introduction

The application of decision-based learning techniques over rich sets of linguistic features has significantly improved the coverage and performance of syntactic and, to various degrees, semantic parsers (Simmons and Yu, 1992; Magerman, 1995; Hermjakob and Mooney, 1997). In this paper, we apply a similar paradigm to developing a rhetorical parser that derives the discourse structure of unrestricted texts.

Crucial to our approach is the reliance on a corpus of 90 texts, which were manually annotated with discourse trees, and the adoption of a shift-reduce parsing model that is well-suited for learning. Both the corpus and the parsing model are used to generate learning cases of how texts should be partitioned into elementary discourse units and how discourse units and segments should be assembled into discourse trees.

2 The Corpus

We used a corpus of 90 rhetorical structure trees, which were built manually using rhetorical relations that were defined informally in the style of Mann and Thompson (1988): 30 trees were built for short personal news stories from the MUC7 coreference corpus (Hirschman and Chinchor, 1997); 30 trees for scientific texts from the Brown corpus; and 30 trees for editorials from the Wall Street Journal (WSJ).
The average number of words for each text was 405 in the MUC corpus, 2029 in the Brown corpus, and 878 in the WSJ corpus. Each MUC text was tagged by three annotators; each Brown and WSJ text was tagged by two annotators. The rhetorical structure assigned to each text is a