An Active Learning Approach to Finding Related Terms

David Vickrey, Stanford University, dvickrey@cs.stanford.edu
Oscar Kipersztok, Boeing Research & Technology, oscar.kipersztok@boeing.com
Daphne Koller, Stanford University, koller@cs.stanford.edu

Abstract

We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups ("synonyms of crash," "types of car," etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.

1 Introduction

Set expansion is a well-studied NLP problem where a machine-learning algorithm is given a fixed set of seed words and asked to find additional members of the implied set. For example, given the seed set {"elephant," "horse," "bat"}, the algorithm is expected to return other mammals. Past work (e.g., Roark & Charniak, 1998; Ghahramani & Heller, 2005; Wang & Cohen, 2007; Pantel et al., 2009) generally focuses on semi-automatic acquisition of the remaining members of the set by mining large amounts of unlabeled data.

State-of-the-art set expansion systems work well for well-defined sets of nouns (e.g., US Presidents), particularly when given a large seed set. Set expansion is more difficult with fewer seed words and for other kinds of sets. The seed words may have multiple senses, and the user may have in mind a variety of attributes that the answer must match. For example, suppose