Scientific report: "Named Entity Recognition using an HMM-based Chunk Tagger"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 473-480.

Named Entity Recognition using an HMM-based Chunk Tagger

GuoDong Zhou, Jian Su
Laboratories for Information Technology, 21 Heng Mui Keng Terrace, Singapore 119613
zhougd@ sujian@

Abstract

This paper proposes a Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which a named entity (NE) recognition (NER) system is built to recognize and classify names, times and numerical quantities. Through the HMM, our system is able to apply and integrate four types of internal and external evidence: (1) simple deterministic internal features of the words, such as capitalization and digitalization; (2) internal semantic features of important triggers; (3) internal gazetteer features; (4) external macro context features. In this way, the NER problem can be resolved effectively. Evaluation of our system on the MUC-6 and MUC-7 English NE tasks achieves F-measures of [...] and [...] respectively. This shows that the performance is significantly better than that reported by any other machine-learning system. Moreover, the performance is even consistently better than that of systems based on handcrafted rules.

1 Introduction

Named Entity (NE) Recognition (NER) is the task of classifying every word in a document into some predefined categories or "none-of-the-above". In the taxonomy of computational linguistics tasks, it falls under the domain of "information extraction", which extracts specific kinds of information from documents, as opposed to the more general task of "document management", which seeks to extract all of the information found in a document.

Since entity names form the main content of a document, NER is a very important step toward more intelligent information extraction and management. The atomic elements of information extraction, indeed of language as a whole, could be considered to be the "who", "where" and "how much" in a sentence. NER performs what is known as surface parsing, delimiting sequences of tokens that answer these important questions. NER can also be used as the first step in a chain of processors: a next level of processing could relate two or more NEs, or perhaps even give semantics to that relationship using a verb. In this way, further processing could discover the "what" and "how" of a sentence or body of text.

While NER is relatively simple and it is fairly easy to build a system with reasonable performance, there are still a large number of ambiguous cases that make [...].
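To make two of the ideas above concrete, the following is a minimal Python sketch, not the paper's HMM. It assumes a hypothetical `word_shape` function as one possible "simple deterministic internal feature" (capitalization and digits), and a hypothetical `group_entities` helper that delimits token sequences once every word has a category, with "O" standing in for "none-of-the-above". The feature names, the label names and the example sentence are illustrative assumptions; the labels are written by hand here, whereas a real system would predict them.

```python
from typing import List, Tuple


def word_shape(token: str) -> str:
    """Return a coarse, deterministic shape feature for one token.

    The feature names are hypothetical and chosen for illustration only.
    """
    if token.isdigit():
        return "AllDigits"
    if any(ch.isdigit() for ch in token):
        return "ContainsDigit"
    if token.isupper():
        return "AllCaps"
    if token[:1].isupper():
        return "InitialCap"
    if token.islower():
        return "LowerCase"
    return "Other"


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge adjacent tokens that share the same non-"O" label into one span."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = "O"
    for token, label in zip(tokens, labels):
        if label != current_label:
            # The label changed: close the open span, if there is one.
            if current_label != "O":
                entities.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = label
        if label != "O":
            current_tokens.append(token)
    # Close a span that runs to the end of the sentence.
    if current_label != "O":
        entities.append((" ".join(current_tokens), current_label))
    return entities


# Hand-assigned labels for an invented sentence (assumption, not system output).
tokens = ["IBM", "hired", "John", "Smith", "in", "1997"]
labels = ["ORGANIZATION", "O", "PERSON", "PERSON", "O", "DATE"]

shapes = [word_shape(t) for t in tokens]
entities = group_entities(tokens, labels)
print(shapes)
print(entities)
```

Note that this simple grouping would merge two adjacent entities of the same category into one span, which is one reason a tagger may need to carry boundary information in its tags rather than categories alone.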
