Scientific report: "Improving Name Tagging by Reference Resolution and Relation Detection"

Heng Ji, Ralph Grishman
Department of Computer Science, New York University, New York, NY 10003, USA
hengji@ grishman@

Abstract

Information extraction systems incorporate multiple stages of linguistic analysis. Although errors are typically compounded from stage to stage, it is possible to reduce the errors in one stage by harnessing the results of the other stages. We demonstrate this by using the results of coreference analysis and relation extraction to reduce the errors produced by a Chinese name tagger. We use an N-best approach to generate multiple hypotheses and have them re-ranked by subsequent stages of processing. We thereby obtained a reduction of 24% in spurious and incorrect name tags, and a reduction of 14% in missed tags.

1 Introduction

Systems which extract relations or events from a document typically perform a number of types of linguistic analysis in preparation for information extraction. These include name identification and classification, parsing (or partial parsing), semantic classification of noun phrases, and coreference analysis. These tasks are reflected in the evaluation tasks introduced for MUC-6 (named entity, coreference, template element) and MUC-7 (template relation).
In most extraction systems, these stages of analysis are arranged sequentially, with each stage using the results of prior stages and generating a single analysis that gets enriched by each stage. This provides a simple modular organization for the extraction system. Unfortunately, each stage also introduces a certain level of error into the analysis. Furthermore, these errors are compounded: for example, errors in name recognition may lead to errors in parsing. The net result is that the final output (relations or events) may be quite inaccurate. This paper considers how interactions between the stages can be exploited to reduce the error rate. For example, the results of coreference analysis or ...
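
The N-best idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not describe their features or re-ranking model, so the `Hypothesis` class, the `rerank` function, the toy `coref_support` feature, and the linear weighting below are all hypothetical, chosen only to show how evidence from a later stage might reorder the candidates produced by an earlier one.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    """One candidate name tagging of a sentence (hypothetical structure)."""
    tags: List[str]        # one tag per token, e.g. ["PER", "O", "GPE"]
    tagger_score: float    # score assigned by the first-stage name tagger


# A feature scores a hypothesis using evidence from a later stage.
Feature = Callable[[Hypothesis], float]


def rerank(nbest: Sequence[Hypothesis],
           features: Sequence[Feature],
           weights: Sequence[float]) -> List[Hypothesis]:
    """Order hypotheses by tagger score plus weighted later-stage evidence.

    The linear combination is an assumption made for illustration.
    """
    def combined(h: Hypothesis) -> float:
        return h.tagger_score + sum(w * f(h) for w, f in zip(weights, features))

    return sorted(nbest, key=combined, reverse=True)


def coref_support(h: Hypothesis) -> float:
    """Toy stand-in for coreference evidence: reward tagging token 0 as a
    person, as if a later mention in the document referred back to it."""
    return 1.0 if h.tags[0] == "PER" else 0.0


# Two candidate taggings; the tagger alone prefers the first one.
nbest = [
    Hypothesis(tags=["O", "O", "GPE"], tagger_score=-1.0),
    Hypothesis(tags=["PER", "O", "GPE"], tagger_score=-1.5),
]

best = rerank(nbest, [coref_support], [1.0])[0]
print(best.tags)
```

With the toy weight of 1.0, the second hypothesis overtakes the tagger's first choice because the later-stage evidence outweighs the difference in tagger scores; with a weight of 0.0 the original tagger ranking is preserved.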