tailieunhanh - Scientific report: "Using Search-Logs to Improve Query Tagging"

Kuzman Ganchev, Keith Hall, Ryan McDonald, Slav Petrov
Google Inc.
{kuzman,kbhall,ryanmcd,slav}@

Abstract

Syntactic analysis of search queries is important for a variety of information-retrieval tasks; however, the lack of annotated data makes training query analysis models difficult. We propose a simple, efficient procedure in which part-of-speech tags are transferred from retrieval-result snippets to queries at training time. Unlike previous work, our final model does not require any additional resources at run-time. Compared to a state-of-the-art approach, we achieve more than 20% relative error reduction. Additionally, we annotate a corpus of search queries with part-of-speech tags, providing a resource for future work on syntactic query analysis.

1 Introduction

Syntactic analysis of search queries is important for a variety of tasks, including better query refinement, improved matching, and better ad targeting (Barr et al., 2008). However, search queries differ substantially from traditional forms of written language (e.g., no capitalization, few function words, fairly free word order, etc.) and are therefore difficult to process with natural language processing tools trained on standard corpora (Barr et al., 2008).

In this paper we focus on part-of-speech (POS) tagging of queries entered into commercial search engines and compare different strategies for learning from search logs. The search logs consist of user queries and relevant search results retrieved by a search engine. We use a supervised POS tagger to label the result snippets and then transfer the tags to the queries, producing a set of noisy labeled queries. These labeled queries are then added to the training data and the tagger is retrained. We evaluate different strategies for selecting which annotation to transfer and find that using the result that was clicked by the user gives comparable performance to using just the top result or to aggregating over the top-k.
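
The transfer step described above can be illustrated with a minimal sketch. It assumes one plausible transfer rule that this excerpt does not spell out: each query token takes the tag it most often received in the tagged snippet (matched case-insensitively), and tokens absent from the snippet keep the tag from a baseline tagger. The function name `transfer_tags`, the tag labels, and the example data are all hypothetical.

```python
from collections import Counter


def transfer_tags(query_tokens, tagged_snippet, fallback_tags):
    """Transfer POS tags from a tagged snippet to the tokens of a query.

    query_tokens:   list of query words, e.g. ["hotels", "paris"]
    tagged_snippet: list of (word, tag) pairs produced by a supervised tagger
    fallback_tags:  one tag per query token, used when the token does not
                    occur in the snippet (assumed to come from a baseline tagger)
    """
    # Count how often each lower-cased snippet word received each tag.
    tag_counts = {}
    for word, tag in tagged_snippet:
        tag_counts.setdefault(word.lower(), Counter())[tag] += 1

    transferred = []
    for token, fallback in zip(query_tokens, fallback_tags):
        seen = tag_counts.get(token.lower())
        # Use the most frequent snippet tag if the token was seen, else fall back.
        tag = seen.most_common(1)[0][0] if seen else fallback
        transferred.append((token, tag))
    return transferred


# Hypothetical example: the snippet supplies tags for "hotels" and "paris";
# "cheap" does not occur in the snippet, so it keeps its fallback tag.
snippet = [("Budget", "NOUN"), ("hotels", "NOUN"), ("in", "ADP"),
           ("Paris", "NOUN"), ("Paris", "NOUN"), ("hotels", "VERB"),
           ("hotels", "NOUN")]
query = ["hotels", "paris", "cheap"]
baseline = ["VERB", "X", "ADJ"]
noisy_labels = transfer_tags(query, snippet, baseline)
```

The resulting noisy labeled queries would then be appended to the supervised training set before retraining the tagger. Choosing which snippet to use (the clicked result, the top result, or an aggregate over the top-k) only changes what is passed in as `tagged_snippet`.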
