Scientific report: "Factorizing Complex Models: A Case Study in Mention Detection"

Radu Florian, Hongyan Jing, Nanda Kambhatla and Imed Zitouni
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598
raduf, hjing, nanda, izitouni @

Abstract

As natural language understanding research advances towards deeper knowledge modeling, the tasks become more and more complex: we are interested in more nuanced word characteristics, more linguistic properties, deeper semantic and syntactic features. One such example, explored in this article, is the mention detection and recognition task in the Automatic Content Extraction project, with the goal of identifying named, nominal or pronominal references to real-world entities (mentions) and labeling them with three types of information: entity type, entity subtype and mention type. In this article, we investigate three methods of assigning these related tags and compare them on several data sets. A system based on the methods presented in this article participated and ranked very competitively in the ACE'04 evaluation.

1 Introduction

Information extraction is a crucial step toward understanding and processing natural language data, its goal being to identify and categorize important information conveyed in a discourse.
Examples of information extraction tasks are the identification of the actors and the objects in written text, and the detection and classification of the relations among them and of the events they participate in. These tasks have applications in, among other fields, summarization, information retrieval, data mining, question answering and language understanding. One of the basic tasks of information extraction is the mention detection task. This task is very similar to named entity recognition (NER), as the objects of interest represent very similar concepts. The main difference is that NER identifies only named references, while mention detection seeks named, nominal and pronominal references. In this paper we will call the identified references
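
To make the difference between NER and mention detection concrete, the following minimal Python sketch represents a mention with the three kinds of labels named in the abstract (entity type, entity subtype, mention type) and shows that an NER-style view keeps only the named references. The class name `Mention`, the helper `ner_view`, the example sentence and all label values are hypothetical and chosen for illustration only; they are not taken from the paper or from the ACE annotation guidelines.

```python
from dataclasses import dataclass
from typing import List

# The three mention levels named in the text.
MENTION_TYPES = ("named", "nominal", "pronominal")


@dataclass(frozen=True)
class Mention:
    """One reference to a real-world entity, carrying three labels."""
    text: str
    entity_type: str      # illustrative value, e.g. "PERSON"
    entity_subtype: str   # illustrative value, e.g. "INDIVIDUAL"
    mention_type: str     # one of MENTION_TYPES

    def __post_init__(self) -> None:
        # Reject labels outside the three mention levels.
        if self.mention_type not in MENTION_TYPES:
            raise ValueError(f"unknown mention type: {self.mention_type}")


def ner_view(mentions: List[Mention]) -> List[Mention]:
    """Keep only named references, which is what NER targets."""
    return [m for m in mentions if m.mention_type == "named"]


# Hand-labeled toy sentence: "The president said he would visit Paris."
MENTIONS = [
    Mention("president", "PERSON", "INDIVIDUAL", "nominal"),
    Mention("he", "PERSON", "INDIVIDUAL", "pronominal"),
    Mention("Paris", "LOCATION", "CITY", "named"),
]

if __name__ == "__main__":
    for m in ner_view(MENTIONS):
        print(m.text, m.entity_type, m.entity_subtype, m.mention_type)
```

In this toy example, mention detection would be expected to return all three references, while the NER-style view retains only "Paris".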
