Scientific report: "Online Plagiarism Detection Through Exploiting Lexical, Syntactic, and Semantic Information"

Wan-Yu Lin, Graduate Institute of Networking and Multimedia, National Taiwan University, r99944016@csie .
Nanyun Peng, Institute of Computational Linguistics, Peking University, pengnanyun@pku .
Chun-Chao Yen, Graduate Institute of Networking and Multimedia, National Taiwan University, r96944016@csie .
Shou-de Lin, Graduate Institute of Networking and Multimedia, National Taiwan University, sdlin@ .

Abstract

In this paper, we introduce a framework that identifies online plagiarism by exploiting lexical, syntactic, and semantic features, including duplication-gram, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences. We establish an ensemble framework to combine the predictions of each model. Results demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms and commercial software.

Keywords: Plagiarism Detection, Lexical, Syntactic, Semantic

1. Introduction

Online plagiarism, the action of trying to create a new piece of writing by copying, reorganizing, or rewriting others' work identified through search engines, is one of the most commonly seen misuses of highly matured web technologies.
As implied by the experiment conducted by Braumoeller and Gaines (2001), a powerful plagiarism detection system can effectively discourage people from plagiarizing others' work. A common strategy people adopt for online plagiarism detection is as follows. First, they identify several suspicious sentences from the write-up and feed them one by one as queries to a search engine to obtain a set of documents. Then human reviewers manually examine whether these documents are truly the sources of the suspicious sentences. While this strategy is straightforward and effective, its limitation is obvious. First, since the length of .
