tailieunhanh - Scientific report: "Cross-Lingual Genre Classification"

Cross-Lingual Genre Classification

Philipp Petrenz
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh, EH8 9AB, UK

Abstract

Classifying text genres across languages can bring the benefits of genre classification to the target language without the costs of manual annotation. This article introduces the first approach to this task, which exploits text features that can be considered stable genre predictors across languages. My experiments show this method to perform as well as or better than full text translation combined with monolingual classification, while requiring fewer resources.

1 Introduction

Automated text classification has become standard practice, with applications in fields such as information retrieval and natural language processing. The most common basis for text classification is topic (Joachims, 1998; Sebastiani, 2002), but other classification criteria have evolved, including sentiment (Pang et al., 2002), authorship (de Vel et al., 2001; Stamatatos et al., 2000a), and author personality (Oberlander and Nowson, 2006), as well as categories relevant to filter algorithms (e.g. spam or inappropriate contents for minors).

Genre is another text characteristic, often described as orthogonal to topic. It has been shown by Biber (1988) and others after him that the genre of a text affects its formal properties. It is therefore possible to use cues (e.g. lexical, syntactic, structural) from a text as features to predict its genre, which can then feed into information retrieval applications (Karlgren and Cutting, 1994; Kessler et al., 1997; Finn and Kushmerick, 2006; Freund et al., 2006). This is because users may want documents that serve a particular communicative purpose, as well as being on a particular topic. For example, a web search on the topic "crocodiles" may return an encyclopedia entry, a biological fact sheet, a news report about attacks in Australia, a blog post about a safari experience, a fiction novel set in South Africa, or a poem about
