tailieunhanh - Scientific report: "The Good, the Bad, and the Unknown: Morphosyllabic Sentiment Tagging of Unseen Words"

Karo Moilanen and Stephen Pulman
Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford, OX1 3QD, England

Abstract

The omnipresence of unknown words is a problem that any NLP component needs to address in some form. While there exist many established techniques for dealing with unknown words in the realm of POS-tagging, for example, guessing unknown words' semantic properties is a less-explored area with greater challenges. In this paper, we study the semantic field of sentiment and propose five methods for assigning prior sentiment polarities to unknown words based on known sentiment carriers. Tested on 2000 cases, the methods mirror human judgements closely in three- and two-way polarity classification tasks, and reach accuracies above 63% and 81%, respectively.

1 Introduction

One of the first challenges in sentiment analysis is the vast lexical diversity of subjective language. Gaps in lexical coverage will be a problem for any sentiment classification algorithm that does not have some way of intelligently guessing the polarity of unknown words. The problem is exacerbated further by misspellings of known words and POS-tagging errors, which are often difficult to distinguish from genuinely unknown words.
This study explores the extent to which it is possible to categorise words which present themselves as unknown, but which may contain known components, using morphological, syllabic, and shallow parsing devices.

2 Morphosyllabic Modelling

Our core sentiment lexicon contains 41109 entries tagged with positive (+), neutral (N), or negative (-) prior polarities (e.g. lovely (+), vast (N), murder (-)) across all word classes. Polarity reversal lexemes additionally carry a reversal tag (e.g. never, which is neutral (N) and polarity-reversing). We furthermore maintain an auxiliary lexicon of 314967 known neutral words such as names of people, organisations, and geographical locations. Each unknown word is run through a ...
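The lexicon layout described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `LexiconEntry`, `CORE_LEXICON`, `AUXILIARY_NEUTRAL`, and `lookup_prior` are hypothetical, the toy entries stand in for the 41109-entry core lexicon and the 314967-entry auxiliary lexicon, and the case-insensitive lookup is an assumption. The sketch only shows how a word might be checked against both resources before it is treated as unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexiconEntry:
    """Hypothetical record for one lexicon entry."""
    polarity: str           # "+", "N", or "-"
    reverses: bool = False  # True for polarity reversal lexemes


# Toy stand-in for the core sentiment lexicon (assumed structure).
CORE_LEXICON = {
    "lovely": LexiconEntry("+"),
    "vast": LexiconEntry("N"),
    "murder": LexiconEntry("-"),
    "never": LexiconEntry("N", reverses=True),
}

# Toy stand-in for the auxiliary lexicon of known neutral words
# (names of people, organisations, geographical locations).
AUXILIARY_NEUTRAL = {"oxford", "england"}


def lookup_prior(word: str) -> Optional[LexiconEntry]:
    """Return the prior polarity entry for a known word, or None if unknown."""
    key = word.lower()  # assumption: lookup ignores case
    if key in CORE_LEXICON:
        return CORE_LEXICON[key]
    if key in AUXILIARY_NEUTRAL:
        return LexiconEntry("N")
    return None  # genuinely unknown: would be passed on for further analysis


if __name__ == "__main__":
    for w in ["lovely", "never", "Oxford", "unlovely"]:
        print(w, lookup_prior(w))
```

A word such as "unlovely" returns `None` here even though it contains the known component "lovely"; handling such cases is the task the paper addresses, and this sketch does not attempt it.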
