Scientific report: "Selecting Query Term Alterations for Web Search by Exploiting Query Contexts"
Guihong Cao
Dept. of Computer Science and Operations Research, University of Montreal, Canada
caogui@

Stephen Robertson
Microsoft Research at Cambridge, Cambridge, UK
ser@

Jian-Yun Nie
Dept. of Computer Science and Operations Research, University of Montreal, Canada
nie@

Abstract

Query expansion by word alterations (alternative forms of a word) is often used in Web search to replace word stemming. This allows users to specify particular word forms in a query. However, if many alterations are added, query traffic will be greatly increased. In this paper, we propose methods to select only a few useful word alterations for query expansion. The selection is made according to the appropriateness of the alteration to the query context (using a bigram language model), or according to its expected impact on the retrieval effectiveness (using a regression model). Our experiments on two TREC collections show that both methods select only a few expansion terms, yet retrieval effectiveness can be improved significantly.

1 Introduction

Word stemming is a basic NLP technique used in most Information Retrieval (IR) systems. It transforms words into their root forms so as to increase the chance of matching similar words or terms that are morphological variants. For example, with stemming, "controlling" can match "controlled" because both have the same root, "control". Most stemmers, such as the Porter stemmer (Porter, 1980) and the Krovetz stemmer (Krovetz, 1993), perform stemming by stripping word suffixes according to a set of morphological rules. Rule-based approaches are intuitive and easy to implement. However, while most words are stemmed correctly, stemming errors that unify unrelated words are common. For instance, "jobs" is stemmed to "job" in both "find jobs in Apple" and "Steve Jobs at Apple". This is particularly problematic in Web search where users often use .
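The conflation described in the introduction can be illustrated with a minimal sketch. The `toy_stem` and `stem_query` functions below are hypothetical helpers written for this illustration; the three-suffix rule set is an assumption and is not the rule set of the Porter or Krovetz stemmers. It only shows how suffix stripping maps "controlling" and "controlled" to the same root, and how the same mechanism also merges the noun "jobs" with the surname "Jobs".

```python
from typing import List


def toy_stem(word: str) -> str:
    """Strip one common suffix.

    Toy rules for illustration only (an assumption), not the Porter
    or Krovetz rule sets; it will mis-stem many ordinary words.
    """
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three letters so very short words stay intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling, e.g. "controll" -> "control".
            if suffix != "s" and len(w) >= 2 and w[-1] == w[-2]:
                w = w[:-1]
            return w
    return w


def stem_query(query: str) -> List[str]:
    """Stem every whitespace-separated term of a query."""
    return [toy_stem(term) for term in query.split()]


# Morphological variants are unified, which is the intended effect.
same_root = toy_stem("controlling") == toy_stem("controlled")

# The unrelated senses of "jobs" and "Jobs" are unified as well,
# so both queries end up containing the same stemmed term "job".
q1 = stem_query("find jobs in Apple")
q2 = stem_query("Steve Jobs at Apple")
shared_terms = sorted(set(q1) & set(q2))
```

After stemming, the two queries share the terms in `shared_terms`, so a system that matches only on stems can no longer tell the job search apart from the search for a person.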