Language Independent Authorship Attribution using Character Level Language Models

Fuchun Peng, Dale Schuurmans, Vlado Keselj, Shaojun Wang
School of Computer Science, University of Waterloo, Canada
{f3peng, dale, sjwang}@
Faculty of Computing Science, Dalhousie University, Canada
vlado@

Abstract
We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information-theoretic principles and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present experimental results on Greek, English, and Chinese data. We show that our approach achieves state-of-the-art performance in each of these cases. In particular, we obtain an 18% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.

1 Introduction
Automated authorship attribution is the problem of identifying the author of an anonymous text, or of a text whose authorship is in doubt (Love, 2002). A famous example is the Federalist Papers, twelve of which have been claimed by both Alexander Hamilton and James Madison (Holmes and Forsyth, 1995). Recently, vast repositories of electronic text have become available on the Internet, making the problem of managing large text collections increasingly important. Automated text categorization (TC) is a useful way to organize a large document collection by imposing a desired categorization scheme.
For example, categorizing documents by their author is an important case that has become increasingly useful, but also increasingly difficult, in the age of web documents that can easily be copied, translated, and edited. Author attribution is becoming an important application in web information management and is beginning to play a role in areas such as information
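The abstract's central idea (one character-level n-gram language model per candidate author, with a disputed text assigned to the author whose model predicts it best, i.e. with the lowest per-character cross-entropy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names CharNgramModel, attribute, and cross_entropy_bits are hypothetical, the default order n = 3 is arbitrary, and add-one (Laplace) smoothing is swapped in for brevity because this excerpt does not say which smoothing the paper uses.

```python
from __future__ import annotations

import math
from collections import Counter


class CharNgramModel:
    """Character n-gram language model (hypothetical helper for illustration).

    Assumption: add-one (Laplace) smoothing keeps the sketch short; it is not
    necessarily the smoothing method used in the paper.
    """

    START = "\x02"  # padding symbol so the first characters have a full context

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.ngram_counts: Counter[str] = Counter()    # counts of n-character strings
        self.context_counts: Counter[str] = Counter()  # counts of (n-1)-character contexts
        self.vocab: set[str] = set()                   # characters seen in training

    def _pad(self, text: str) -> str:
        return self.START * (self.n - 1) + text

    def train(self, text: str) -> None:
        padded = self._pad(text)
        self.vocab.update(text)
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1 : i]
            self.context_counts[context] += 1
            self.ngram_counts[context + padded[i]] += 1

    def log_prob(self, text: str) -> float:
        """Natural-log probability of `text`, one factor per character."""
        padded = self._pad(text)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1 : i]
            count = self.ngram_counts[context + padded[i]]
            total += math.log((count + 1) / (self.context_counts[context] + v))
        return total

    def cross_entropy_bits(self, text: str) -> float:
        """Average bits per character; lower means the model predicts `text` better."""
        if not text:
            raise ValueError("text must be non-empty")
        return -self.log_prob(text) / (len(text) * math.log(2))


def attribute(disputed: str, training: dict[str, str], n: int = 3) -> str:
    """Hypothetical helper: pick the author whose model gives `disputed` the highest probability."""
    models = {}
    for author, text in training.items():
        model = CharNgramModel(n)
        model.train(text)
        models[author] = model
    # The disputed text is the same for every author, so the highest
    # log-probability is also the lowest cross-entropy.
    return max(models, key=lambda a: models[a].log_prob(disputed))


if __name__ == "__main__":
    corpus = {
        "A": "the cat sat on the mat. the cat ate the rat. " * 20,
        "B": "quick brown foxes jump over lazy dogs quickly. " * 20,
    }
    print(attribute("the rat sat on the cat", corpus))  # prints: A
```

Because the models work on raw characters, the sketch needs no tokenizer or language-specific pre-processing, which is the property the abstract emphasizes. The toy corpora above are far smaller than real authorship data; in practice the n-gram order and the smoothing method would need to be chosen with care.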
