tailieunhanh - Scientific report: "Joint Inference of Named Entity Recognition and Normalization for Tweets"

Xiaohua Liu†, Ming Zhou†, Furu Wei†, Zhongyang Fu, Xiangyang Zhou

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
School of Computer Science and Technology, Shandong University, Jinan, 250100, China
†Microsoft Research Asia, Beijing, 100190, China
† xiaoliu fuwei mingzhou @ v-xzho@

Abstract

Tweets represent a critical source of fresh information, in which named entities occur frequently with rich variations. We study the problem of named entity normalization (NEN) for tweets. Two main challenges are the errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. We propose a novel graphical model to simultaneously conduct NER and NEN on multiple tweets to address these challenges. In particular, our model introduces a binary random variable for each pair of words with the same lemma across similar tweets, whose value indicates whether the two related words are mentions of the same entity. We evaluate our method on a manually annotated data set and show that it outperforms the baseline that handles these two tasks separately, boosting the F1 for NER and the accuracy for NEN.

1 Introduction

Tweets, short messages of less than 140 characters shared through the Twitter service, have become an important source of fresh information. As a result, the task of named entity recognition (NER) for tweets, which aims to identify mentions of rigid designators from tweets belonging to named-entity types such as persons, organizations and locations (2007), has attracted increasing research interest. For example, Ritter et al. (2011) develop a system that exploits a CRF model to segment named entities and then uses a ...
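
The abstract's central construction, one binary variable per pair of same-lemma words across similar tweets, can be illustrated with a small sketch. This is not the authors' implementation: the `lemma` helper (plain lowercasing instead of a real lemmatizer), the Jaccard similarity measure, and the `threshold` value are all assumptions made only for illustration, and the sketch only enumerates where the variables would sit, without performing any inference.

```python
from itertools import combinations


def lemma(word):
    # Hypothetical stand-in for a real lemmatizer: lowercase the token
    # and strip a leading "#" or "@".
    return word.lower().lstrip("#@")


def jaccard(tweet_a, tweet_b):
    # Assumed similarity measure between two tokenized tweets.
    set_a = {lemma(w) for w in tweet_a}
    set_b = {lemma(w) for w in tweet_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def normalization_variables(tweets, threshold=0.3):
    """Enumerate one candidate binary variable per same-lemma word pair
    across similar tweets, keyed by ((tweet_i, pos_i), (tweet_j, pos_j)).
    The value is None because no inference is performed in this sketch."""
    variables = {}
    for (i, tweet_i), (j, tweet_j) in combinations(enumerate(tweets), 2):
        if jaccard(tweet_i, tweet_j) < threshold:
            continue  # only similar tweets are linked
        for p, word_i in enumerate(tweet_i):
            for q, word_j in enumerate(tweet_j):
                if lemma(word_i) == lemma(word_j):
                    variables[((i, p), (j, q))] = None
    return variables


tweets = [
    ["Gaga", "rocks", "tonight"],
    ["Lady", "gaga", "rocks"],
    ["stock", "market", "falls"],
]
for key in sorted(normalization_variables(tweets)):
    print(key)
```

In a joint model, each such variable would be connected to the NER labels of the two words it links, so that evidence from one tweet can inform both recognition and normalization in another. A real system would presumably also filter stop words before pairing, which this sketch does not do.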
