A Computational Model of Text Reuse in Ancient Literary Texts

John Lee
Spoken Language Systems
MIT Computer Science and Artificial Intelligence Laboratory
Cambridge, MA 02139, USA
jsylee@csail.mit.edu

Abstract

We propose a computational model of text reuse tailored for ancient literary texts, which are often available to us only in small and noisy samples. The model takes into account source alternation patterns, so as to be able to align even sentences with low surface similarity. We demonstrate its ability to characterize text reuse in the Greek New Testament.

1 Introduction

Text reuse is the transformation of a source text into a target text in order to serve a different purpose. Past research has addressed a variety of text-reuse applications, including journalists turning a news agency text into a newspaper story (Clough et al., 2002), editors adapting an encyclopedia entry to an abridged version (Barzilay and Elhadad, 2003), and plagiarizers disguising their sources by removing surface similarities (Uzuner et al., 2005).

A common assumption in the recovery of text reuse is the conservation of some degree of lexical similarity from the source sentence to the derived sentence. A simple approach, then, is to define a lexical similarity measure and estimate a score threshold: given a sentence in the target text, if the highest-scoring sentence in the source text is above the threshold, then the former is considered to be derived from the latter.
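The threshold-based baseline just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not specify a similarity measure here, so Jaccard overlap of lowercased word sets is used as a stand-in, the threshold of 0.5 is arbitrary rather than estimated from data, and the function names (`tokenize`, `jaccard`, `best_source`) are hypothetical.

```python
def tokenize(sentence):
    """Lowercase and split on whitespace; return the set of word types."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Jaccard overlap of word sets (an assumed stand-in similarity measure)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_source(target_sentence, source_sentences, threshold=0.5):
    """Return (index, score) of the highest-scoring source sentence if its
    score is above the threshold, otherwise (None, best score seen).

    The default threshold is an arbitrary placeholder; in practice it
    would be estimated from aligned data.
    """
    best_index, best_score = None, 0.0
    for i, source in enumerate(source_sentences):
        score = jaccard(target_sentence, source)
        if score > best_score:
            best_index, best_score = i, score
    if best_index is not None and best_score > threshold:
        return best_index, best_score
    return None, best_score


# Toy example with made-up English sentences.
source = [
    "the sower went out to sow",
    "a city set on a hill cannot be hidden",
]
print(best_source("the sower went out to sow his seed", source))
print(best_source("blessed are the poor", source))
```

In the first call the target shares most of its words with the first source sentence, so that sentence is returned as the presumed source. In the second call no source sentence scores above the threshold, so the target is treated as not derived from the source text.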
Obviously, the effectiveness of this basic approach depends on the degree of lexical similarity: source sentences that are quoted verbatim are easier to identify than those that have been transformed by a skillful plagiarizer.

The crux of the question, therefore, is how to identify source sentences despite their lack of surface similarity to the derived sentences. Ancient literary texts, which are the focus of this paper, present some distinctive challenges in this respect.

1.1 Ancient Literary Texts

Borrowed material embedded in the flow of a writer's