Proceedings of EACL '99

Automatic Authorship Attribution

E. Stamatatos, N. Fakotakis and G. Kokkinakis
Dept. of Electrical and Computer Engineering
University of Patras
26500 - Patras, Greece
stamatatos@

Abstract

In this paper we present an approach to automatic authorship attribution dealing with real-world (or unrestricted) text. Our method is based on the computational analysis of the input text using a text-processing tool. Besides the style markers relevant to the output of this tool, we also use analysis-dependent style markers, that is, measures that represent the way in which the text has been processed. No word frequency counts or other lexically-based measures are taken into account. We show that the proposed set of style markers is able to distinguish texts of various authors of a weekly newspaper using multiple regression. All the experiments we present were performed using real-world text downloaded from the World Wide Web. Our approach is easily trainable and fully automated, requiring no manual text preprocessing or sampling.

1 Introduction

The vast majority of attempts at computer-assisted authorship attribution have focused on literary texts. In particular, a lot of attention has been paid to establishing the authorship of anonymous or doubtful texts.
A typical paradigm is the case of the Federalist papers, twelve of which are of disputed authorship (Mosteller and Wallace, 1984; Holmes and Forsyth, 1995). Moreover, the lack of a generic and formal definition of the idiosyncratic style of an author has led to the employment of statistical methods (e.g. discriminant analysis, principal components, etc.). Nowadays, the wealth of text available on the World Wide Web in electronic form for a wide variety of genres and languages, as well as the development of reliable text-processing tools, opens the way for the solution of the authorship attribution problem as regards real-world text. The most important approaches to authorship attribution ...
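The abstract states that vectors of style markers are used to distinguish authors by multiple regression, but this excerpt does not give the formulation. The following is only a minimal sketch of one plausible reading: one ordinary least-squares regression per author, fitted against a 0/1 indicator, with the highest-scoring author chosen. The two marker values per text, the author labels "A" and "B", and all function names are hypothetical and invented for illustration; they are not the markers or data used in the paper.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_regression(X, y):
    """Least squares via the normal equations (X^T X) w = X^T y.

    An intercept column is prepended, so w[0] is the intercept.
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def fit_authors(X, labels):
    """Fit one regression per author against a 0/1 'written by this author' target."""
    return {
        author: fit_regression(X, [1.0 if lab == author else 0.0 for lab in labels])
        for author in sorted(set(labels))
    }


def predict_author(models, x):
    """Return the author whose regression gives the highest response for x."""
    def score(w):
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return max(models, key=lambda author: score(models[author]))


# Toy data (hypothetical): each row holds two made-up style-marker values.
X = [[0.10, 0.80], [0.15, 0.75], [0.12, 0.82],
     [0.90, 0.20], [0.85, 0.25], [0.88, 0.18]]
labels = ["A", "A", "A", "B", "B", "B"]

models = fit_authors(X, labels)
print(predict_author(models, [0.11, 0.79]))
print(predict_author(models, [0.87, 0.22]))
```

With real texts, each row of `X` would hold the style-marker values measured for one training text. The normal-equations solver is used here only to keep the sketch dependency-free; a numerical library's least-squares routine would be the more robust choice for larger marker sets.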