Scientific report: "Growing Related Words from Seed via User Behaviors: A Re-ranking Based Approach"

Yabin Zheng, Zhiyuan Liu, Lixing Xie
State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
lavender087 @

Abstract

Motivated by Google Sets, we study the problem of growing related words from a single seed word by leveraging the user behaviors hidden in the user records of a Chinese input method. Our method is motivated by the observation that the more frequently two words co-occur in user records, the more related they are. First, we use user behaviors to generate candidate words. Then, we use a search engine to enrich the candidate words with adequate semantic features. Finally, we reorder the candidate words according to their semantic relatedness to the seed word. Experimental results on a Chinese input method dataset show that our method gains better performance.

1 Introduction

What is the relationship between "自然语言处理" (Natural Language Processing) and "人工智能" (Artificial Intelligence)? We may regard NLP as a research branch of AI. Problems arise when we want to find more words related to the input query, that is, the seed word. For example, if the seed word "自然语言处理" (Natural Language Processing) is entered into Google Sets (Google, 2010), Google Sets returns an ordered list of related words such as "人工智能" (Artificial Intelligence) and "计算机" (Computer).
Generally speaking, it performs a large-scale clustering algorithm that can gather related words. In this paper, we investigate the advantage of user behaviors and a re-ranking framework in the related-word retrieval task, using Chinese input method user records. We construct a User-Word bipartite graph to represent the information hidden in user records. The bipartite graph keeps users on one side and words on the other. The underlying idea is that the more frequently two words co-occur in ...
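
The User-Word bipartite graph and the co-occurrence idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `(user, word)` record format, the function names `build_bipartite` and `related_candidates`, the toy data, and the choice to score a candidate by the number of users who typed both it and the seed are all assumptions made for illustration.

```python
from collections import defaultdict

def build_bipartite(records):
    """Build both sides of a User-Word bipartite graph.

    records: iterable of (user, word) pairs (an assumed record format).
    Returns (user_words, word_users), each mapping a node to its neighbours.
    """
    user_words = defaultdict(set)
    word_users = defaultdict(set)
    for user, word in records:
        user_words[user].add(word)
        word_users[word].add(user)
    return user_words, word_users

def related_candidates(seed, user_words, word_users, top_k=10):
    """Rank words by how many users typed both the word and the seed."""
    counts = defaultdict(int)
    # Walk seed -> users who typed it -> other words those users typed.
    for user in word_users.get(seed, ()):
        for word in user_words[user]:
            if word != seed:
                counts[word] += 1
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Toy user records, invented for this example.
records = [
    ("u1", "自然语言处理"), ("u1", "人工智能"),
    ("u2", "自然语言处理"), ("u2", "人工智能"), ("u2", "计算机"),
    ("u3", "天气"),
]
user_words, word_users = build_bipartite(records)
print(related_candidates("自然语言处理", user_words, word_users))
```

In this sketch the candidate list is only the first stage; the re-ranking by semantic relatedness described in the abstract would be applied to its output.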
