Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 481-488.

Hai Leong Chieu
DSO National Laboratories
20 Science Park Drive, Singapore 118230
chaileon@dso.org.sg

Hwee Tou Ng
Department of Computer Science, School of Computing
National University of Singapore
3 Science Drive 2, Singapore 117543
nght@comp.nus.edu.sg

Abstract

This paper describes how a machine-learning named entity recognizer (NER) on upper case text can be improved by using a mixed case NER and some unlabeled text. The mixed case NER can be used to tag some unlabeled mixed case text, which is then used as additional training material for the upper case NER. We show that this approach substantially reduces the performance gap between the mixed case NER and the upper case NER, by 39% for the MUC-6 and 22% for the MUC-7 named entity test data. Our method is thus useful for improving the accuracy of NERs on upper case text, such as transcribed text from automatic speech recognizers, where case information is missing.

1 Introduction

In this paper, we propose using a mixed case named entity recognizer (NER) that is trained on labeled text to further train an upper case NER.
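The phrase "reduces the performance gap ... by 39%" is most naturally read as the fraction of the F-measure gap between the mixed case NER and the baseline upper case NER that the improved upper case NER closes, i.e. (F_upper_new - F_upper_base) / (F_mixed - F_upper_base). The sketch below computes that quantity under this assumed definition. The function name is hypothetical, and the F-measures in the usage line are illustrative placeholders, not results from the paper.

```python
def gap_reduction(f_mixed: float, f_upper_base: float, f_upper_new: float) -> float:
    """Fraction of the mixed-vs-upper case F-measure gap closed by the improved
    upper case NER (assumed definition; hypothetical helper)."""
    gap = f_mixed - f_upper_base
    if gap <= 0:
        # The ratio is only meaningful if the mixed case NER is ahead of the baseline.
        raise ValueError("mixed case NER must outperform the baseline upper case NER")
    return (f_upper_new - f_upper_base) / gap


if __name__ == "__main__":
    # Hypothetical F-measures, for illustration only.
    print(f"{gap_reduction(90.0, 80.0, 84.0):.0%}")  # 40%
```

Under this reading, a 39% reduction means the improved upper case NER recovers 39% of the F-measure it previously lost relative to the mixed case NER.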
In the Sixth and Seventh Message Understanding Conferences (MUC-6, 1995; MUC-7, 1998), the named entity task consists of labeling named entities with the classes PERSON, ORGANIZATION, LOCATION, DATE, TIME, MONEY, and PERCENT. We conducted experiments on upper case named entity recognition and showed how unlabeled mixed case text can be used to improve the results of an upper case NER on the official MUC-6 and MUC-7 …

Mixed Case: Consuela Washington, a longtime House staffer and an expert in securities laws, is a leading candidate to be chairwoman of the Securities and Exchange Commission in the Clinton administration.
Upper Case: CONSUELA WASHINGTON, A LONGTIME HOUSE STAFFER AND AN EXPERT IN SECURITIES LAWS, IS A LEADING CANDIDATE TO BE …
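The mixed case and upper case versions of the example sentence suggest how auto-tagged text can serve the upper case NER. A mixed case tagger labels the text while capitalization cues are still visible. The tokens are then upper-cased, and the labels carry over unchanged as extra training examples. This is a minimal sketch that assumes token-level labels and assumes the auto-tagged text is upper-cased before it is added. `toy_mixed_case_ner`, its gazetteer, and the other function names are hypothetical, and the toy tagger only stands in for a trained mixed case NER. The sketch does not reproduce the paper's learner, features, or any selection of auto-tagged sentences.

```python
from typing import Callable, Dict, List, Tuple

# A tagged sentence: list of (token, label) pairs; labels are MUC classes or "O".
Tagged = List[Tuple[str, str]]

# Hypothetical phrase gazetteer standing in for a trained mixed case NER.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Consuela", "Washington"): "PERSON",
    ("Securities", "and", "Exchange", "Commission"): "ORGANIZATION",
}


def toy_mixed_case_ner(tokens: List[str]) -> Tagged:
    """Label tokens by exact phrase match against the toy gazetteer (case-sensitive)."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for phrase, cls in GAZETTEER.items():
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                labels[i:i + n] = [cls] * n
                i += n
                break
        else:
            i += 1  # no phrase starts here; move to the next token
    return list(zip(tokens, labels))


def to_upper_case_training(tagged: Tagged) -> Tagged:
    """Drop case information but keep the labels assigned on the mixed case text."""
    return [(tok.upper(), label) for tok, label in tagged]


def build_upper_case_training_set(
    labeled_upper: List[Tagged],
    unlabeled_mixed: List[List[str]],
    mixed_ner: Callable[[List[str]], Tagged],
) -> List[Tagged]:
    """Append auto-tagged, upper-cased sentences to the labeled upper case data."""
    extra = [to_upper_case_training(mixed_ner(sent)) for sent in unlabeled_mixed]
    return labeled_upper + extra


if __name__ == "__main__":
    unlabeled = [["Consuela", "Washington", "is", "a", "leading", "candidate"]]
    for sent in build_upper_case_training_set([], unlabeled, toy_mixed_case_ner):
        print(sent[:3])  # [('CONSUELA', 'PERSON'), ('WASHINGTON', 'PERSON'), ('IS', 'O')]
```

The labels are decided while "Consuela Washington" is still capitalized. The upper case NER therefore receives examples such as ("WASHINGTON", "PERSON") whose labels depended on case cues it never sees at test time.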