Scientific report: "An Improved Error Model for Noisy Channel Spelling Correction"

Eric Brill and Robert C. Moore
Microsoft Research
One Microsoft Way, Redmond, WA 98052
brill bobmoore @

Abstract

The noisy channel model has been applied to a wide range of problems, including spelling correction. These models consist of two components: a source model and a channel model. Very little research has gone into improving the channel model for spelling correction. This paper describes a new channel model for spelling correction, based on generic string to string edits. Using this model gives significant performance improvements compared to previously proposed models.

Introduction

The noisy channel model (Shannon, 1948) has been successfully applied to a wide range of problems, including spelling correction. These models consist of two components: a source model and a channel model. For many applications, people have devoted considerable energy to improving both components, with resulting improvements in overall system accuracy. However, relatively little research has gone into improving the channel model for spelling correction. This paper describes an improvement to noisy channel spelling correction via a more powerful model of spelling errors, be they typing mistakes or cognitive errors, than has previously been employed. Our model works by learning generic string to string edits, along with the probabilities of each of these edits.
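The two-component structure described above (a source model and a channel model) can be illustrated with a minimal sketch. It assumes the usual noisy channel decision rule of picking the dictionary word w that maximizes P(w) · P(s | w) for an observed string s. The probability tables `SOURCE` and `CHANNEL` and the function `correct` are hypothetical, with made-up numbers; they stand in for models that would be estimated from data and are not the channel model proposed in this paper.

```python
import math

# Hypothetical toy source model P(w): made-up relative word frequencies.
SOURCE = {"the": 0.05, "then": 0.005, "than": 0.004}

# Hypothetical toy channel model P(s | w): probability that s is typed
# when w was intended. Keys are (observed string, intended word).
CHANNEL = {
    ("teh", "the"): 0.01,
    ("teh", "then"): 0.0001,
    ("teh", "than"): 0.00005,
}


def correct(s, source=SOURCE, channel=CHANNEL):
    """Return the word w maximizing log P(w) + log P(s | w), or None."""
    best_word, best_score = None, -math.inf
    for w, p_w in source.items():
        p_s_given_w = channel.get((s, w), 0.0)
        if p_s_given_w == 0.0:
            continue  # the channel assigns no probability to this pair
        # Work in log space to avoid underflow with small probabilities.
        score = math.log(p_w) + math.log(p_s_given_w)
        if score > best_score:
            best_word, best_score = w, score
    return best_word
```

Under these toy numbers, `correct("teh")` selects "the", because both its source probability and its channel probability are the largest among the candidates. A string with no channel entry returns `None`.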
This more powerful model gives significant improvements in accuracy over previous approaches to noisy channel spelling correction.

1 Noisy Channel Spelling Correction

This paper will address the problem of automatically training a system to correct generic single word spelling errors. We do not address the problem of correcting specific word set confusions such as {to, too, two} (see Golding and Roth, 1999). We will define the spelling correction problem abstractly as follows: Given an alphabet Σ, a dictionary D consisting of strings in Σ*, and a string s, where s ∉ D and s