Arabic Named Entity Recognition: Using Features Extracted from Noisy Data

Yassine Benajiba¹, Imed Zitouni², Mona Diab¹, Paolo Rosso³
¹ Center for Computational Learning Systems, Columbia University
² IBM T. J. Watson Research Center, Yorktown Heights
³ Natural Language Engineering Lab. - ELiRF, Universidad Politécnica de Valencia
{ybenajiba, mdiab}@, izitouni@, prosso@

Abstract

Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we present research that explores the feature space using both gold and bootstrapped noisy features to build an improved, highly accurate Arabic NER system. We bootstrap noisy features by projection from an Arabic-English parallel corpus that is automatically tagged with a baseline NER system. The feature space covers lexical, morphological, and syntactic features. The proposed approach yields an improvement of up to … F-measure (absolute).

1 Introduction

Named Entity Recognition (NER) has earned an important place in Natural Language Processing (NLP) as an enabling process for other tasks. When NER is explicitly taken into account, research shows that it helps such applications achieve better performance levels (Babych and Hartley, 2003; Thompson and Dozier, 1997). NER is defined as the computational identification and classification of Named Entities (NEs) in running text. For instance, consider the following text: "Barack Obama is visiting the Middle East." An NER system should be able to identify "Barack Obama" and "Middle East" as NEs and classify them as Person (PER) and Geo-Political Entity (GPE), respectively. The class set used to tag NEs may vary according to user needs.
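To make "identification and classification" concrete, the example sentence can be encoded with per-token BIO tags (B- begins an entity, I- continues it, O marks tokens outside any entity). BIO is a common NER convention assumed here for illustration; this excerpt does not state which encoding the authors use. The helper `bio_to_spans` is hypothetical and simply recovers the (entity, class) pairs from such tags.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, class) pairs from a BIO-tagged token sequence.

    Hypothetical helper for illustration; assumes tags look like
    "B-PER", "I-PER", or "O".
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_class: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_class):
            # A new entity starts (an I- tag with a new class is treated leniently
            # as a start); flush any entity that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_class))
            current_tokens, current_class = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the open entity of the same class.
            current_tokens.append(token)
        else:
            # "O": outside any entity; close the open one, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_class))
            current_tokens, current_class = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_class))
    return spans


tokens = ["Barack", "Obama", "is", "visiting", "the", "Middle", "East", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-GPE", "I-GPE", "O"]
print(bio_to_spans(tokens, tags))
# → [('Barack Obama', 'PER'), ('Middle East', 'GPE')]
```

Identification corresponds to finding the span boundaries (the B-/I- prefixes) and classification to the suffix after the hyphen; swapping the class set, for example to the ACE inventory, changes only the suffixes.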
In this research, we adopt the Automatic Content Extraction (ACE) 2007 nomenclature¹. According to Nadeau and Sekine (2007), optimization of the feature set is the key component in enhancing the performance of a global NER system. In this paper, we investigate the possibility of building a high…
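The abstract describes bootstrapping noisy features by projecting NE tags from the automatically tagged English side of an Arabic-English parallel corpus onto the Arabic side. A minimal sketch of such a projection follows. It assumes that word alignments are already available as (English index, Arabic index) pairs, that the English side carries BIO tags, and that a majority vote settles cases where several English tokens align to one Arabic token. The function `project_tags`, the example sentence pair, its alignment, and the English tags standing in for baseline NER output are all hypothetical; this is not presented as the authors' exact procedure.

```python
from collections import Counter
from typing import List, Optional, Tuple


def project_tags(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO tags from source (English) to target (Arabic) tokens.

    Hypothetical sketch: alignment is an ordered list of (src_idx, tgt_idx)
    pairs, assumed to come from some external word aligner.
    """
    # Count the entity classes of all source tokens aligned to each target token.
    votes: List[Counter] = [Counter() for _ in range(tgt_len)]
    for s, t in alignment:
        tag = src_tags[s]
        if tag != "O":
            votes[t][tag[2:]] += 1  # drop the "B-"/"I-" prefix, keep the class
    # Majority class per target token; ties go to the class seen first.
    classes: List[Optional[str]] = [
        v.most_common(1)[0][0] if v else None for v in votes
    ]
    # Re-derive BIO prefixes on the target side, since word order may differ.
    tgt_tags: List[str] = []
    for i, cls in enumerate(classes):
        if cls is None:
            tgt_tags.append("O")
        elif i > 0 and classes[i - 1] == cls:
            tgt_tags.append("I-" + cls)
        else:
            tgt_tags.append("B-" + cls)
    return tgt_tags


# English: "Barack Obama visits Egypt" (tags stand in for baseline NER output).
src_tags = ["B-PER", "I-PER", "O", "B-GPE"]
# Arabic, verb-initial order: "visits Barack Obama Egypt".
tgt_tokens = ["يزور", "باراك", "أوباما", "مصر"]
alignment = [(0, 1), (1, 2), (2, 0), (3, 3)]
print(project_tags(src_tags, alignment, len(tgt_tokens)))
# → ['O', 'B-PER', 'I-PER', 'B-GPE']
```

The projected tag of each Arabic token could then be used as one extra, noisy feature alongside gold lexical or morphological features. Re-deriving the B-/I- prefixes on the Arabic side, rather than copying them, copes with word-order differences such as the verb-initial order above, at the cost of merging two adjacent entities of the same class into one.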
