Scientific report: "Subjectivity and Sentiment Analysis of Modern Standard Arabic"

Mona T. Diab, Center for Computational Learning Systems, Columbia University, NYC, USA, mdiab@
Muhammad Abdul-Mageed, Department of Linguistics & School of Library & Info. Science, Indiana University, Bloomington, USA, mabdulma@

Abstract

Although Subjectivity and Sentiment Analysis (SSA) has been witnessing a flurry of novel research, there are few attempts to build SSA systems for Morphologically-Rich Languages (MRL). In the current study, we report efforts to partially fill this gap. We present a newly developed manually annotated corpus of Modern Standard Arabic (MSA) together with a new polarity lexicon. The corpus is a collection of newswire documents annotated on the sentence level. We also describe an automatic SSA tagging system that exploits the annotated data. We investigate the impact of different levels of preprocessing settings on the SSA classification task. We show that by explicitly accounting for the rich morphology, the system is able to achieve significantly higher levels of performance.

1 Introduction

Subjectivity and Sentiment Analysis (SSA) is an area that has been witnessing a flurry of novel research. In natural language, subjectivity refers to expression of opinions, evaluations, feelings, and speculations (Banfield, 1982; Wiebe, 1994) and thus incorporates sentiment. The process of subjectivity classification refers to the task of classifying texts into either objective (e.g., Mubarak stepped down) or subjective (e.g., Mubarak the hateful dictator stepped down). Subjective text is further classified with sentiment or polarity.
For sentiment classification, the task refers to identifying whether the subjective text is positive (e.g., What an excellent camera), negative (e.g., I hate this camera), neutral (e.g., I believe there will be a meeting), or sometimes mixed (e.g., It is good but I hate it). Most of the SSA literature has focused on English and other Indo-European languages. Very few studies
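
The two-level labeling described in the introduction (objective versus subjective, then a polarity for subjective text) can be sketched as a small data type. This is only an illustrative sketch: the names `Subjectivity`, `Polarity`, `SSALabel`, and `EXAMPLES` are hypothetical and are not taken from the paper, whose actual annotation tagset may differ. The example sentences are the ones quoted in the text. The polarity of a subjective sentence is left unset where the text does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Subjectivity(Enum):
    OBJECTIVE = "objective"
    SUBJECTIVE = "subjective"


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


@dataclass(frozen=True)
class SSALabel:
    """Hypothetical two-stage label: subjectivity first, then polarity."""
    subjectivity: Subjectivity
    # None means "no polarity assigned": always the case for objective
    # text, and allowed for subjective text whose polarity is not stated.
    polarity: Optional[Polarity] = None

    def __post_init__(self) -> None:
        # Only subjective text is further classified with polarity.
        if self.subjectivity is Subjectivity.OBJECTIVE and self.polarity is not None:
            raise ValueError("objective text carries no polarity label")


# Example sentences quoted in the introduction, with the labels it gives them.
EXAMPLES = {
    "Mubarak stepped down": SSALabel(Subjectivity.OBJECTIVE),
    "Mubarak the hateful dictator stepped down": SSALabel(Subjectivity.SUBJECTIVE),
    "What an excellent camera": SSALabel(Subjectivity.SUBJECTIVE, Polarity.POSITIVE),
    "I hate this camera": SSALabel(Subjectivity.SUBJECTIVE, Polarity.NEGATIVE),
    "I believe there will be a meeting": SSALabel(Subjectivity.SUBJECTIVE, Polarity.NEUTRAL),
    "It is good but I hate it": SSALabel(Subjectivity.SUBJECTIVE, Polarity.MIXED),
}

if __name__ == "__main__":
    for sentence, label in EXAMPLES.items():
        polarity = label.polarity.value if label.polarity else "n/a"
        print(f"{label.subjectivity.value:<10} {polarity:<8} {sentence}")
```

Encoding the constraint in the type (an objective label with a polarity raises `ValueError`) mirrors the order of the two classification steps: polarity is only meaningful once a text has been judged subjective.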
