Scientific report: "Text-level Discourse Parsing with Rich Linguistic Features"

Vanessa Wei Feng, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada, weifeng@
Graeme Hirst, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada, gh@

Abstract

In this paper, we develop an RST-style text-level discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b). We significantly improve its tree-building step by incorporating our own rich linguistic features. We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourse-parsing performance under different discourse conditions.

1 Introduction

In a well-written text, no unit of the text is completely isolated; interpretation requires understanding the unit's relation with the context. Research in discourse parsing aims to unmask such relations in text, which is helpful for many downstream applications, such as summarization, information retrieval, and question answering.

However, most existing discourse parsers operate on individual sentences alone, whereas discourse parsing is more powerful for text-level analysis. Therefore, in this work, we aim to develop a text-level discourse parser. We follow the framework of Rhetorical Structure Theory (Mann and Thompson, 1988), and we take the HILDA discourse parser (Hernault et al., 2010b) as the basis of our work, because it is the first fully implemented text-level discourse parser with state-of-the-art performance.
We significantly improve the performance of HILDA's tree-building step (introduced below) by incorporating rich linguistic features. In our experiments (Section 6), we also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourse-parsing performance under different discourse conditions.

2 Discourse-annotated corpora

The RST Discourse Treebank Rhetorical .
