Exploring Distributional Similarity Based Models for Query Spelling Correction

Mu Li
Microsoft Research Asia, 5F Sigma Center, Zhichun Road, Haidian District, Beijing, China, 100080
muli@microsoft.com

Yang Zhang
School of Computer Science and Technology, Tianjin University, Tianjin, China, 300072
yangzhang@tju.edu.cn

Muhua Zhu
School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China, 110004
zhumh@ics.neu.edu.cn

Ming Zhou
Microsoft Research Asia, 5F Sigma Center, Zhichun Road, Haidian District, Beijing, China, 100080
mingzhou@microsoft.com

Abstract

A query speller is crucial to a search engine for improving web search relevance. This paper describes novel methods for using distributional similarity estimated from query logs to learn improved query spelling correction models. The key to our methods is a property of the distributional similarity between two terms: it is high between a frequently occurring misspelling and its correction, and low between two unrelated terms that merely have similar spellings. We present two models that are able to take advantage of this property. Experimental results demonstrate that the distributional similarity based models can significantly outperform their baseline systems in the web query spelling correction task.

1 Introduction

Investigations into query log data reveal that more than 10% of queries sent to search engines contain misspelled terms (Cucerzan and Brill, 2004).
Such statistics indicate that a good query speller is crucial to a search engine for improving web search relevance, because a search engine has little chance of retrieving much relevant content for queries that contain misspelled terms. The problem of designing a spelling correction program for web search queries, however, poses special technical challenges and cannot be solved well by general-purpose spelling correction methods. Cucerzan and Brill (2004) discussed in detail the particular characteristics and difficulties of a query spell checker, and illustrated why the existing ...
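
The property stated in the abstract (high distributional similarity between a frequent misspelling and its correction, low similarity between unrelated terms with similar spellings) can be illustrated with a small sketch. This is not the paper's model: the toy query log, the choice of cosine similarity over within-query co-occurrence counts, and the names `QUERY_LOG`, `context_vectors`, and `cosine` are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy query log (query -> frequency); invented for illustration,
# not data from the paper.
QUERY_LOG = {
    "britney spears lyrics": 50,
    "britny spears lyrics": 8,
    "britney spears photos": 30,
    "britny spears photos": 5,
    "brittany ferries timetable": 20,
    "brittany france map": 15,
}


def context_vectors(log):
    """Map each term to a Counter of the terms it co-occurs with in queries,
    weighted by query frequency."""
    vectors = defaultdict(Counter)
    for query, freq in log.items():
        terms = query.split()
        for i, term in enumerate(terms):
            for j, other in enumerate(terms):
                if i != j:
                    vectors[term][other] += freq
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = context_vectors(QUERY_LOG)

# A misspelling and its correction share the same query contexts.
sim_misspelling = cosine(vectors["britny"], vectors["britney"])

# A similarly spelled but unrelated term shares no contexts in this toy log.
sim_unrelated = cosine(vectors["brittany"], vectors["britney"])

print(f"britny   vs britney: {sim_misspelling:.3f}")
print(f"brittany vs britney: {sim_unrelated:.3f}")
```

In this toy log, "britny" appears in the same contexts as "britney", so their similarity is close to 1, while "brittany" shares no context terms with "britney" and scores 0. A real query log would be far noisier, and the similarity measure used in practice may differ from the cosine assumed here.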