Authorship Attribution with Author-aware Topic Models

Yanir Seroussi, Fabian Bohnert, Ingrid Zukerman
Faculty of Information Technology, Monash University
Clayton, Victoria 3800, Australia

Abstract

Authorship attribution deals with identifying the authors of anonymous texts. Building on our earlier finding that the Latent Dirichlet Allocation (LDA) topic model can be used to improve authorship attribution accuracy, we show that employing a previously suggested Author-Topic (AT) model outperforms LDA when applied to scenarios with many authors. In addition, we define a model that combines LDA and AT by representing authors and documents over two disjoint topic sets, and show that our model outperforms LDA, AT and support vector machines on datasets with many authors.

1 Introduction

Authorship attribution (AA) has attracted much attention due to its many applications in, e.g., computer forensics, criminal law, military intelligence, and humanities research (Stamatatos, 2009). The traditional problem, which is the focus of our work, is to attribute test texts of unknown authorship to one of a set of known authors, whose training texts are supplied in advance (i.e., a supervised classification problem). While most of the early work on AA focused on formal texts with only a few possible authors, researchers have recently turned their attention to informal texts and tens to thousands of authors (Koppel et al., 2011). In parallel, topic models have gained popularity as a means of analysing such large text corpora (Blei, 2012). In Seroussi et al. (2011), we showed that methods based on Latent Dirichlet Allocation (LDA), a popular topic model by Blei et al. (2003), yield good AA performance. However, LDA does not model authors explicitly, and we are not aware of any previous studies that apply author-aware topic models to traditional AA. This paper aims to address this gap. In addition to being the first (to the best of our knowledge) to apply Rosen-Zvi et al.'s (2004) Author-Topic …
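The supervised attribution setting described above can be made concrete with a small sketch. One observation is worth stating explicitly: in traditional AA every training text has exactly one author, so the AT model's per-word author choice is deterministic, and AT then reduces to LDA run over each author's concatenated training texts. The Python below relies on that reduction. It fits LDA by collapsed Gibbs sampling over per-author profiles, estimates a test text's topic mixture with the topic-word distributions held fixed, and attributes the text to the author whose topic mixture is nearest in Hellinger distance. The function names (`gibbs_lda`, `infer_theta`, `hellinger`, `attribute`), the hyperparameters (`alpha=0.1`, `beta=0.01`, iteration counts) and the nearest-author decision rule are hypothetical choices made for illustration. They are not the paper's exact method, and the sketch does not cover the combined LDA/AT model.

```python
import math
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids.

    alpha, beta and iters are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]        # document-topic counts
    nkw = [[0] * vocab_size for _ in topics]    # topic-word counts
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initialisation
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                     # take the token out of the counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]  # resample its topic
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in topics]
    return theta, phi


def infer_theta(doc, phi, alpha=0.1, iters=100, seed=0):
    """Estimate a new document's topic mixture with the topic-word matrix phi held fixed."""
    rng = random.Random(seed)
    n_topics = len(phi)
    topics = range(n_topics)
    z = [rng.randrange(n_topics) for _ in doc]
    nk = [0] * n_topics
    for k in z:
        nk[k] += 1
    for _ in range(iters):
        for i, w in enumerate(doc):
            nk[z[i]] -= 1
            weights = [(nk[t] + alpha) * phi[t][w] for t in topics]
            z[i] = rng.choices(topics, weights=weights)[0]
            nk[z[i]] += 1
    return [(nk[t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def attribute(train, test_doc, n_topics, vocab_size):
    """Return the training author whose topic mixture is closest to test_doc's.

    train maps author -> list of documents (lists of word ids).
    """
    authors = sorted(train)
    # With one author per document, AT reduces to LDA over per-author concatenations.
    profiles = [[w for doc in train[a] for w in doc] for a in authors]
    theta, phi = gibbs_lda(profiles, n_topics, vocab_size)
    test_theta = infer_theta(test_doc, phi)
    # Assumption: nearest-author rule under Hellinger distance (illustrative only).
    return min(zip(authors, theta), key=lambda pair: hellinger(test_theta, pair[1]))[0]


if __name__ == "__main__":
    # Toy corpus: author "a" uses word ids 0-4, author "b" uses word ids 5-9.
    train = {"a": [[0, 1, 2, 3, 4] * 4, [1, 2, 3] * 5],
             "b": [[5, 6, 7, 8, 9] * 4, [6, 7, 8] * 5]}
    print(attribute(train, [0, 2, 4, 1, 3], n_topics=2, vocab_size=10))
```

Because Gibbs sampling is stochastic, results on a corpus this small can depend on `seed`; realistic experiments with many authors would need far more text, more topics, and tuned priors.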
