Scientific report: "A Bootstrapping Approach to Named Entity Classification Using Successive Learners"

Cheng Niu, Wei Li, Jihong Ding, Rohini K. Srihari
Cymfony Inc., 600 Essjay Road, Williamsville, NY 14221, USA
cniu, wei, jding, rohini @

Abstract

This paper presents a new bootstrapping approach to named entity (NE) classification. This approach requires only a few common noun or pronoun seeds that correspond to the concept for the target NE type, e.g., he/she/man/woman for PERSON NE. The entire bootstrapping procedure is implemented as training two successive learners: (i) a decision list is used to learn the parsing-based, high-precision NE rules; (ii) a Hidden Markov Model is then trained to learn string sequence-based NE patterns. The second learner uses the training corpus automatically tagged by the first learner. The resulting NE system approaches supervised NE performance for some NE types. The system also demonstrates intuitive support for tagging user-defined NE types. The differences between this approach and co-training-based NE bootstrapping are also discussed.

1 Introduction

Named Entity (NE) tagging is a fundamental task for natural language processing and information extraction. An NE tagger recognizes and classifies text chunks that represent various proper names, time expressions, or numerical expressions. Seven types of named entities are defined in the Message Understanding Conference (MUC) standards, namely PERSON (PER), ORGANIZATION (ORG), LOCATION (LOC), TIME, DATE, MONEY, and PERCENT¹ (MUC-7 1998).
¹ This paper only focuses on classifying proper names. Time and numerical NEs are not yet explored using this method.

There is considerable research on NE tagging using different techniques. These include systems based on handcrafted rules (Krupka 1998) as well as systems using supervised machine learning, such as the Hidden Markov Model (HMM) (Bikel 1997) and the Maximum Entropy Model (Borthwick 1998). The state-of-the-art rule-based systems and supervised learning systems can reach near-human performance for NE .
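
The two-stage procedure summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The `(word, context)` pairs are a hypothetical stand-in for parser output, the ORG seeds and all function names are assumptions made for illustration, and the second stage uses a smoothed per-type token model as a simplified substitute for the Hidden Markov Model described in the paper.

```python
from collections import Counter, defaultdict
import math

# PERSON seeds come from the abstract; the ORG seeds are assumed for illustration.
SEEDS = {
    "PERSON": {"he", "she", "man", "woman"},
    "ORG": {"company", "firm"},
}

def learn_decision_list(pairs, seeds, min_precision=0.7):
    """Stage 1: rank context rules by how reliably they co-occur with seeds.

    `pairs` holds (head_word, context) tuples standing in for parser
    output, e.g. ("he", "subj_of:said").
    """
    counts = defaultdict(Counter)  # context -> Counter of NE types
    for word, context in pairs:
        for ne_type, words in seeds.items():
            if word.lower() in words:
                counts[context][ne_type] += 1
    rules = []
    for context, by_type in counts.items():
        total = sum(by_type.values())
        ne_type, hits = by_type.most_common(1)[0]
        # Smoothed precision of the rule "context -> ne_type".
        precision = (hits + 0.1) / (total + 0.1 * len(seeds))
        if precision >= min_precision:
            rules.append((precision, context, ne_type))
    return sorted(rules, reverse=True)  # most precise rules first

def tag_with_rules(mentions, rules):
    """Label proper-name mentions with the first matching rule; skip the rest."""
    tagged = []
    for name, context in mentions:
        for _, rule_context, ne_type in rules:
            if context == rule_context:
                tagged.append((name, ne_type))
                break
    return tagged

def train_token_model(tagged):
    """Stage 2 stand-in (not an HMM): per-type token counts from stage-1 output."""
    model = defaultdict(Counter)
    for name, ne_type in tagged:
        model[ne_type].update(name.split())
    return model

def classify(name, model):
    """Pick the type whose add-one-smoothed token model scores `name` highest."""
    vocab = {tok for counts in model.values() for tok in counts}

    def score(ne_type):
        counts = model[ne_type]
        total = sum(counts.values())
        return sum(math.log((counts[tok] + 1) / (total + len(vocab) + 1))
                   for tok in name.split())

    return max(model, key=score)

# Toy data: seed occurrences in context, then proper names in the same contexts.
seed_pairs = [
    ("he", "subj_of:said"), ("she", "subj_of:said"),
    ("company", "subj_of:acquired"), ("firm", "subj_of:acquired"),
    ("man", "obj_of:saw"), ("company", "obj_of:saw"),  # ambiguous context
]
name_mentions = [
    ("Mr. John Smith", "subj_of:said"), ("Mr. Paul Jones", "subj_of:said"),
    ("Acme Inc.", "subj_of:acquired"), ("Globex Inc.", "subj_of:acquired"),
    ("Paris", "obj_of:saw"),
]
rules = learn_decision_list(seed_pairs, SEEDS)
auto_tagged = tag_with_rules(name_mentions, rules)
model = train_token_model(auto_tagged)
```

In this toy run the ambiguous context `obj_of:saw` falls below the precision threshold, so "Paris" is left untagged by the first stage. The second stage then generalizes from the automatically tagged names alone, for instance by picking up "Mr." and "Inc." as string-level cues that need no parsing context.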