Scientific report: "Grammatical Error Correction with Alternating Structure Optimization"
Daniel Dahlmeier¹ and Hwee Tou Ng¹,²
¹NUS Graduate School for Integrative Sciences and Engineering
²Department of Computer Science, National University of Singapore
danielhe nght @

Abstract

We present a novel approach to grammatical error correction based on Alternating Structure Optimization. As part of our work, we introduce the NUS Corpus of Learner English (NUCLE), a fully annotated one-million-word corpus of learner English available for research purposes. We conduct an extensive evaluation for article and preposition errors using various feature sets. Our experiments show that our approach outperforms two baselines trained on non-learner text and learner text, respectively. Our approach also outperforms two commercial grammar checking software packages.

1 Introduction

Grammatical error correction (GEC) has been recognized as an interesting as well as commercially attractive problem in natural language processing (NLP), in particular for learners of English as a foreign or second language (EFL/ESL). Despite the growing interest, research has been hindered by the lack of a large annotated corpus of learner text that is available for research purposes. As a result, the standard approach to GEC has been to train an off-the-shelf classifier to re-predict words in non-learner text.
Learning GEC models directly from annotated learner corpora is not well explored, nor are methods that combine learner and non-learner text. Furthermore, the evaluation of GEC has been problematic. Previous work has evaluated either on artificial test instances as a substitute for real learner errors or on proprietary data that is not available to other researchers. As a consequence, existing methods have not been compared on the same test set, leaving it unclear where the current state of the art really is. In this work, we aim to overcome both problems. First, we present a novel approach to GEC based on Alternating .