tailieunhanh - Scientific report: "Recognizing Named Entities in Tweets"

Recognizing Named Entities in Tweets

Xiaohua Liu †, Shaodian Zhang, Furu Wei †, Ming Zhou †
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
† Microsoft Research Asia, Beijing, 100190, China
† xiaoliu, fuwei, mingzhou @

Abstract

The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.

1 Introduction

Named Entities Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as persons, organizations and locations (Nadeau and Sekine, 2007).
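To make the pre-labeling idea from the abstract concrete, here is a minimal sketch of a word-level KNN pre-labeler. It is not the authors' implementation: the bag-of-words context window, the cosine similarity, the value of `k`, the BIO-style labels, and the toy training tweets are all assumptions chosen for illustration. The helper names `context_features`, `cosine`, and `knn_prelabel` are hypothetical.

```python
from collections import Counter
import math


def context_features(tokens, i, window=2):
    """Bag-of-words counts for the token at position i and its neighbors (assumed feature set)."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return Counter(t.lower() for t in tokens[lo:hi])


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_prelabel(query, labeled, k=3):
    """Majority vote among the k most similar labeled examples.

    Returns (label, confidence), where confidence is the winning vote share.
    Falls back to ("O", 0.0) when no labeled examples exist (an assumption).
    """
    if not labeled:
        return "O", 0.0
    ranked = sorted(labeled, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in ranked)
    label, count = votes.most_common(1)[0]
    return label, count / len(ranked)


# Toy labeled tweets (invented for illustration, not data from the paper).
train = [
    (["met", "Obama", "in", "Chicago"], ["O", "B-PER", "O", "B-LOC"]),
    (["flying", "to", "Chicago", "tomorrow"], ["O", "O", "B-LOC", "O"]),
]
labeled = [
    (context_features(toks, i), lab)
    for toks, labs in train
    for i, lab in enumerate(labs)
]

# Pre-label every word of a new tweet.
tweet = ["back", "in", "Chicago"]
prelabels = [
    knn_prelabel(context_features(tweet, i), labeled) for i in range(len(tweet))
]
```

One plausible use of `prelabels`, again an assumption, is to pass each word's pre-label and confidence to the sequence labeler as an additional feature, so that evidence gathered across tweets can inform decisions made within a single tweet.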
Proposed solutions to NER fall into three categories: 1) the rule-based (Krupka and Hausman, 1998); 2) the machine learning based (Finkel and Manning, 2009; Singh et al., 2010); and 3) hybrid methods (Jansche and Abney, 2002). With the availability of annotated corpora, such as ACE05, Enron (Minkov et al., 2005) and CoNLL03 (Tjong Kim Sang and De Meulder, 2003), the data-driven methods have become the dominant methods. However, current NER mainly focuses on formal text such as news articles (Mccallum and Li, 2003; Etzioni et al., 2005). Exceptions

(Footnote: This work has been done while the author was visiting Microsoft Research Asia.)
