Scientific report: "A Framework for Figurative Language Detection Based on Sense Differentiation"

Daria Bogdanova
University of Saint Petersburg, Saint Petersburg

Abstract

Various text mining algorithms require the process of feature selection. High-level, semantically rich features, such as figurative language uses, speech errors, etc., are very promising for such problems as writing style detection, but automatic extraction of such features is a big challenge. In this paper, we propose a framework for figurative language use detection. This framework is based on the idea of sense differentiation. We describe two algorithms illustrating this idea. We then show how these algorithms work by applying them to Russian language data.

1 Introduction

Various text mining algorithms require the process of feature selection. For example, authorship attribution algorithms need to determine features to quantify the writing style. Previous work on authorship attribution among computer scientists is mostly based on low-level features such as word frequencies, sentence length counts, n-grams, etc. A significant advantage of such features is that they can be easily extracted from any corpus. But the study by Batov and Sorokin (1975) shows that such features do not always provide accurate measures for authorship attribution. The linguistic approach to the problem involves such high-level characteristics as the use of figurative language, irony, sound devices, and so on. Such characteristics are very promising for the tasks mentioned above, but the extraction of these features is extremely hard to automate. As a result, very few attempts have been made to exploit high-level features for stylometric purposes (Stamatatos, 2009). Therefore, our long-term objective is the extraction of high-level, semantically rich features. Since this topic is very broad, we focus our attention only on some particular problems and approaches. In this paper, we examine one such problem, the ...
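
To illustrate why the low-level features named in the introduction (word frequencies, sentence length counts, n-grams) are considered easy to extract, here is a minimal Python sketch. It is not taken from the paper: the function name `low_level_features`, the regex-based sentence splitting, and the choice of word bigrams are all assumptions made for illustration.

```python
import re
from collections import Counter


def low_level_features(text, n=2):
    """Return (word frequencies, mean sentence length in words, word n-gram counts).

    Hypothetical helper for illustration only; tokenization is deliberately naive.
    """
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens per sentence; \w+ also matches Cyrillic letters.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    words = [w for sent in tokenized for w in sent]

    word_freq = Counter(words)
    mean_len = len(words) / len(sentences) if sentences else 0.0
    # n-grams are counted within sentences, never across sentence boundaries.
    ngrams = Counter(
        tuple(sent[i:i + n])
        for sent in tokenized
        for i in range(len(sent) - n + 1)
    )
    return word_freq, mean_len, ngrams
```

Calling `low_level_features(text)` on any raw string yields all three feature types in a single pass. High-level features such as figurative language use have no comparably simple extraction procedure, which is the gap the paper sets out to address.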
