Scientific report: "Automated Whole Sentence Grammar Correction Using a Noisy Channel Model"

Y. Albert Park
Department of Computer Science and Engineering
9500 Gilman Drive, La Jolla, CA 92037-404, USA
yapark@

Roger Levy
Department of Linguistics
9500 Gilman Drive, La Jolla, CA 92037-108, USA
rlevy@

Abstract

Automated grammar correction techniques have seen improvement over the years, but there is still much room for increased performance. Current correction techniques mainly focus on identifying and correcting a specific type of error, such as verb form misuse or preposition misuse, which restricts the corrections to a limited scope. We introduce a novel technique, based on a noisy channel model, which can utilize the whole sentence context to determine proper corrections. We show how to use the EM algorithm to learn the parameters of the noise model using only a data set of erroneous sentences, given the proper language model. This frees us from the burden of acquiring a large corpus of corrected sentences. We also present a cheap and efficient way to provide automated evaluation results for grammar corrections by using BLEU and METEOR, in contrast to the commonly used manual evaluations.

1 Introduction

The process of editing written text is performed by humans on a daily basis. Humans work by first identifying the writer's intent and then transforming the text so that it is coherent and error free.
They can read text with several spelling and grammatical errors and still easily identify what the author originally meant to write. Unfortunately, current computer systems are still far from such capabilities when it comes to the task of recognizing incorrect text input. Various approaches have been taken, but spell checkers such as Aspell do not take context into consideration, which prevents them from finding misspellings that have the same form as valid words. Also, current grammar correction systems are mostly rule-based, searching the text for defined types of rule violations in the English

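The noisy channel idea named in the abstract can be illustrated with a small sketch: among candidate intended sentences, pick the one that maximizes the language model probability times the probability that the noise model turns it into the observed sentence. Everything below is an assumption made for illustration and is not taken from the paper: the bigram probabilities, the `UNSEEN` floor, the symmetric `CONFUSIONS` sets, the `P_KEEP` value, and the function names are all hypothetical, and the paper's own noise model and EM training are not reproduced here.

```python
import math
from itertools import product

# Hypothetical toy bigram language model P(w_i | w_{i-1}); values are made up.
BIGRAM_PROB = {
    ("<s>", "I"): 0.5, ("<s>", "he"): 0.3,
    ("I", "have"): 0.4, ("he", "has"): 0.4,
    ("have", "a"): 0.3, ("has", "a"): 0.3,
    ("a", "book"): 0.2,
}
UNSEEN = 1e-4  # assumed floor probability for bigrams not in the table

# Hypothetical symmetric confusion sets: words the noise model may swap.
CONFUSIONS = {"has": {"have"}, "have": {"has"}}
P_KEEP = 0.9  # assumed probability that a confusable word is left unchanged


def lm_logprob(words):
    """Log-probability of a sentence under the toy bigram model."""
    history = ["<s>"] + words[:-1]
    return sum(math.log(BIGRAM_PROB.get((h, w), UNSEEN))
               for h, w in zip(history, words))


def noise_logprob(observed, intended):
    """Log P(observed | intended) under word-by-word substitution noise."""
    total = 0.0
    for o, w in zip(observed, intended):
        alternatives = CONFUSIONS.get(w, set())
        if o == w:
            p = P_KEEP if alternatives else 1.0
        elif o in alternatives:
            p = (1.0 - P_KEEP) / len(alternatives)
        else:
            return float("-inf")  # this noise model cannot produce o from w
        total += math.log(p)
    return total


def correct(observed):
    """Return the candidate maximizing log P(intended) + log P(observed | intended)."""
    # Because CONFUSIONS is symmetric, the possible sources of an observed
    # word are the word itself plus its confusion set.
    options = [sorted({o} | CONFUSIONS.get(o, set())) for o in observed]
    candidates = [list(c) for c in product(*options)]
    return max(candidates,
               key=lambda c: lm_logprob(c) + noise_logprob(observed, c))


print(correct("I has a book".split()))   # ['I', 'have', 'a', 'book']
print(correct("he has a book".split()))  # ['he', 'has', 'a', 'book']
```

Because the score covers the whole sentence, the same observed word "has" is changed after "I" but kept after "he". Enumerating every candidate grows exponentially with sentence length, so this brute-force search is only suitable for a toy example.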