Scientific report: "Detecting Erroneous Sentences using Automatically Mined Sequential Patterns"

Guihua Sun (Chongqing University), sunguihua5018@
Zhongyang Xiong (Chongqing University), zyxiong@
Xiaohua Liu, Gao Cong, Ming Zhou (Microsoft Research Asia), {xiaoliu, gaocong, mingzhou}@
John Lee† (MIT), jsylee@
Chin-Yew Lin (Microsoft Research Asia), cyl@

† Work done while the author was a visiting student at MSRA.

Abstract

This paper studies the problem of identifying erroneous/correct sentences. The problem has important applications, e.g., providing feedback for writers of English as a Second Language, controlling the quality of parallel bilingual sentences mined from the Web, and evaluating machine translation results. In this paper, we propose a new approach to detecting erroneous sentences by integrating pattern discovery with supervised learning models. Experimental results show that our techniques are promising.

1 Introduction

Detecting erroneous/correct sentences has the following applications. First, it can provide feedback for writers of English as a Second Language (ESL) as to whether a sentence contains errors. Second, it can be applied to control the quality of parallel bilingual sentences mined from the Web, which are critical sources for a wide range of applications, such as statistical machine translation (Brown et al., 1993) and cross-lingual information retrieval (Nie et al., 1999). Third, it can be used to evaluate machine translation results. As demonstrated in (Corston-Oliver et al., 2001; Gamon et al., 2005), the better human reference translations can be distinguished from machine translations by a classification model, the worse the machine translation system is.

The previous work on identifying erroneous sentences mainly aims to find errors in the writing of ESL learners. The common mistakes (Yukio et al., 2001; Gui and Yang, 2003) made by ESL learners include spelling, lexical collocation, sentence ...