Scientific report: A LANGUAGE-INDEPENDENT ANAPHORA RESOLUTION SYSTEM FOR UNDERSTANDING MULTILINGUAL TEXTS

Chinatsu Aone and Douglas McKee
Systems Research and Applications (SRA)
2000 15th Street North
Arlington, VA 22201
aonec@ mckeed@sra.com

Abstract

This paper describes a new discourse module within our multilingual NLP system. Because of its unique data-driven architecture, the discourse module is language-independent. Moreover, the use of hierarchically organized multiple knowledge sources makes the module robust and trainable using discourse-tagged corpora. Separating discourse phenomena from knowledge sources makes the discourse module easily extensible to additional phenomena.

1 Introduction

This paper describes a new discourse module within our multilingual natural language processing system, which has been used for understanding texts in English, Spanish, and Japanese (cf. [1, 2]).¹ The following design principles underlie the discourse module:

Language-independence: No processing code depends on language-dependent facts.
Extensibility: It is easy to handle additional phenomena.
Robustness: The discourse module does its best even when its input is incomplete or wrong.
Trainability: The performance can be tuned for particular domains and applications.

In the following, we first describe the architecture of the discourse module. Then we discuss how its performance is evaluated and trained using discourse-tagged corpora. Finally, we compare our approach to other research.
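The principles of language-independence and extensibility, together with the separation of discourse phenomena from knowledge sources, can be illustrated with a minimal Python sketch. It rests on assumptions that this excerpt does not state: mentions are plain feature dictionaries, knowledge sources are simple agreement filters, and the names `agree_on`, `KNOWLEDGE_SOURCES`, `PHENOMENA`, `LEXICON`, and `resolve`, the feature names, and the romanized Japanese pronoun entry `kanojo` ("she") are all hypothetical. It is not the authors' implementation; it only shows how processing code can stay free of language-dependent facts when those facts live in data. Keeping a candidate whose features are missing is one hypothetical reading of the robustness principle.

```python
from typing import Callable, Dict, List

# Hypothetical representation: every mention is a plain feature dictionary.
Features = Dict[str, str]
KnowledgeSource = Callable[[Features, Features], bool]


def agree_on(feature: str) -> KnowledgeSource:
    """Build a filter that keeps candidates agreeing with the anaphor on one feature.

    A missing feature on either side is not treated as a conflict, so a
    candidate with incomplete features is kept rather than rejected.
    """
    def ks(anaphor: Features, candidate: Features) -> bool:
        a, c = anaphor.get(feature), candidate.get(feature)
        return a is None or c is None or a == c
    return ks


# Knowledge sources (hypothetical): generic tests that mention no particular language.
KNOWLEDGE_SOURCES: Dict[str, KnowledgeSource] = {
    "gender-agreement": agree_on("gender"),
    "number-agreement": agree_on("number"),
    "semantic-type-agreement": agree_on("type"),
}

# Phenomena are data: each entry names the knowledge sources that apply.
# Supporting a new phenomenon means adding an entry, not changing resolve().
PHENOMENA: Dict[str, List[str]] = {
    "pronoun": ["gender-agreement", "number-agreement"],
    "definite-np": ["number-agreement", "semantic-type-agreement"],
}

# Language-dependent facts live only in data tables such as this hypothetical lexicon.
LEXICON: Dict[str, Dict[str, Features]] = {
    "en": {"she": {"phenomenon": "pronoun", "gender": "female", "number": "sg"}},
    "ja": {"kanojo": {"phenomenon": "pronoun", "gender": "female", "number": "sg"}},
}


def resolve(lang: str, word: str, candidates: List[Features]) -> List[Features]:
    """Keep the candidates that pass every knowledge source for the word's phenomenon."""
    anaphor = LEXICON[lang][word]
    sources = [KNOWLEDGE_SOURCES[name] for name in PHENOMENA[anaphor["phenomenon"]]]
    return [c for c in candidates if all(ks(anaphor, c) for ks in sources)]


candidates = [
    {"name": "Maria", "gender": "female", "number": "sg"},
    {"name": "Juan", "gender": "male", "number": "sg"},
    {"name": "the company", "gender": "neuter", "number": "sg"},
]
print([c["name"] for c in resolve("en", "she", candidates)])     # prints ['Maria']
print([c["name"] for c in resolve("ja", "kanojo", candidates)])  # prints ['Maria']
```

The same `resolve` function handles the English and the Japanese pronoun because only the `LEXICON` table differs by language; supporting a new phenomenon means adding a `PHENOMENA` entry rather than editing the function.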
¹ Our system has been used in several data extraction tasks and a prototype machine translation system.

[Figure 1: Discourse Architecture (diagram of the Discourse Module)]

2 Discourse Architecture

Our discourse module consists of two discourse processing submodules, the Discourse Administrator and the Resolution Engine, and three discourse knowledge bases: the Discourse Knowledge Source KB, the Discourse Phenomenon KB, and the Discourse Domain KB. The Discourse Administrator is a development-time tool for defining the three discourse KBs. The

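The division of labor between the development-time Discourse Administrator and the run-time Resolution Engine can be sketched in Python. This excerpt does not describe what the three KBs contain or how the engine consults them, so the entry formats (a description per knowledge source, a tuple of knowledge-source names per phenomenon, a semantic type per domain term), the consistency check in `build`, and every class and method name below are hypothetical. The sketch only shows one way to keep KB definition apart from resolution: the administrator validates the definitions once, and the engine receives read-only KBs.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class DiscourseKBs:
    """The three discourse KBs, read-only once built (entry formats are hypothetical)."""
    knowledge_sources: Mapping[str, str]      # knowledge source name -> description
    phenomena: Mapping[str, Tuple[str, ...]]  # phenomenon -> knowledge sources it uses
    domain: Mapping[str, str]                 # domain term -> semantic type


class DiscourseAdministrator:
    """Development-time tool: collects KB definitions and checks them before use."""

    def __init__(self) -> None:
        self._sources: Dict[str, str] = {}
        self._phenomena: Dict[str, List[str]] = {}
        self._domain: Dict[str, str] = {}

    def define_knowledge_source(self, name: str, description: str) -> None:
        self._sources[name] = description

    def define_phenomenon(self, name: str, source_names: List[str]) -> None:
        self._phenomena[name] = list(source_names)

    def define_domain_fact(self, term: str, semantic_type: str) -> None:
        self._domain[term] = semantic_type

    def build(self) -> DiscourseKBs:
        """Validate cross-KB references, then return read-only copies of the KBs."""
        for phenomenon, names in self._phenomena.items():
            missing = [n for n in names if n not in self._sources]
            if missing:
                raise ValueError(f"phenomenon {phenomenon!r} uses undefined sources {missing}")
        return DiscourseKBs(
            knowledge_sources=MappingProxyType(dict(self._sources)),
            phenomena=MappingProxyType({k: tuple(v) for k, v in self._phenomena.items()}),
            domain=MappingProxyType(dict(self._domain)),
        )


class ResolutionEngine:
    """Run-time component: consults the KBs but never modifies them."""

    def __init__(self, kbs: DiscourseKBs) -> None:
        self.kbs = kbs

    def sources_for(self, phenomenon: str) -> Tuple[str, ...]:
        """Look up which knowledge sources apply to a discourse phenomenon."""
        return self.kbs.phenomena[phenomenon]


# Usage with hypothetical definitions.
admin = DiscourseAdministrator()
admin.define_knowledge_source("number-agreement", "anaphor and antecedent agree in number")
admin.define_knowledge_source("semantic-type-agreement", "anaphor and antecedent have compatible types")
admin.define_phenomenon("definite-np", ["number-agreement", "semantic-type-agreement"])
admin.define_domain_fact("joint venture", "organization")

engine = ResolutionEngine(admin.build())
print(engine.sources_for("definite-np"))  # prints ('number-agreement', 'semantic-type-agreement')
```

Wrapping the KBs in `MappingProxyType` is one way to enforce that the engine only reads them, and rejecting dangling references in `build` surfaces definition errors at development time rather than during resolution.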