Scientific report: "Arabic Named Entity Recognition: Using Features Extracted from Noisy Data"



Yassine Benajiba (1), Imed Zitouni (2), Mona Diab (1), Paolo Rosso (3)
(1) Center for Computational Learning Systems, Columbia University
(2) IBM T.J. Watson Research Center, Yorktown Heights
(3) Natural Language Engineering Lab. - ELiRF, Universidad Politecnica de Valencia
{ybenajiba,mdiab}@ccls.columbia.edu, izitouni@us.ibm.com, prosso@dsic.upv.es

Abstract

Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we present research that explores the feature space using both gold and bootstrapped noisy features to build an improved, highly accurate Arabic NER system. We bootstrap noisy features by projection from an Arabic-English parallel corpus that is automatically tagged with a baseline NER system. The feature space covers lexical, morphological, and syntactic features. The proposed approach yields an improvement of up to 1.64 F-measure (absolute).

1 Introduction

Named Entity Recognition (NER) has earned an important place in Natural Language Processing (NLP) as an enabling process for other tasks. When explicitly taken into account, research shows that it helps such applications achieve better performance levels (Babych and Hartley, 2003; Thompson and Dozier, 1997). NER is defined as the computational identification and classification of Named Entities (NEs) in running text. For instance, consider the following text: "Barack Obama is visiting the Middle East." A NER system should be able to identify Barack Obama and Middle East as NEs and classify them as Person (PER) and Geo-Political Entity (GPE), respectively. The class-set used to tag NEs may vary according to user needs. In this research, we adopt the Automatic Content Extraction (ACE) 2007 nomenclature. According to Nadeau and Sekine (2007), optimization of the feature set is the key component in enhancing the performance of a global NER system. In this paper, we investigate the possibility of building a high ...
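The abstract mentions bootstrapping noisy features by projection from an automatically tagged parallel corpus. The text here does not give the details of that procedure, so the following is only a minimal sketch of one common way such a projection could work, assuming that entity tags on one side of a sentence pair are copied onto the other side through word alignments. The function name `project_tags`, the alignment pairs, and the target sentence length are all hypothetical and chosen purely for illustration; only the example sentence and the PER and GPE classes come from the text.

```python
from typing import List, Tuple


def project_tags(
    source_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_len: int,
    default: str = "O",
) -> List[str]:
    """Copy entity labels from tagged source tokens onto aligned target tokens.

    `alignment` holds (source_index, target_index) pairs. Target tokens with
    no aligned entity token keep the `default` (non-entity) label.
    """
    projected = [default] * target_len
    for src_idx, tgt_idx in alignment:
        tag = source_tags[src_idx]
        # Only entity labels are projected; the first label to reach a
        # target token is kept if several source tokens align to it.
        if tag != default and projected[tgt_idx] == default:
            projected[tgt_idx] = tag
    return projected


# Example sentence from the text, tagged with the classes it names.
source_tokens = ["Barack", "Obama", "is", "visiting", "the", "Middle", "East", "."]
source_tags = ["PER", "PER", "O", "O", "O", "GPE", "GPE", "O"]

# Hypothetical target side: a 6-token sentence and a made-up word alignment.
target_len = 6
alignment = [(0, 1), (1, 2), (3, 0), (5, 3), (6, 4), (7, 5)]

projected = project_tags(source_tags, alignment, target_len)
print(projected)  # ['O', 'PER', 'PER', 'GPE', 'GPE', 'O']
```

Because both the source-side tags and the alignments would be produced automatically in such a setup, the projected labels are noisy, which is why they are better suited as features for a downstream classifier than as gold annotations.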

