Scientific report: "Domain Adaptation with Active Learning for Word Sense Disambiguation"
Yee Seng Chan and Hwee Tou Ng
Department of Computer Science, National University of Singapore
3 Science Drive 2, Singapore 117543
{chanys, nght}@comp.nus.edu.sg

Abstract

When a word sense disambiguation (WSD) system is trained on one domain but applied to a different domain, a drop in accuracy is frequently observed. This highlights the importance of domain adaptation for word sense disambiguation. In this paper, we first show that an active learning approach can be successfully used to perform domain adaptation of WSD systems. Then, by using the predominant sense predicted by expectation-maximization (EM) and adopting a count-merging technique, we improve the effectiveness of the original adaptation process achieved by the basic active learning approach.

1 Introduction

In natural language, a word often assumes different meanings, and the task of determining the correct meaning, or sense, of a word in different contexts is known as word sense disambiguation (WSD). To date, the best performing systems in WSD use a corpus-based supervised learning approach. With this approach, one would need to collect a text corpus in which each ambiguous word occurrence is first tagged with its correct sense to serve as training data. The reliance of supervised WSD systems on annotated corpus raises the important issue of domain dependence.
To investigate this, Escudero et al. (2000) and Martinez and Agirre (2000) conducted experiments using the DSO corpus, which contains sentences from two different corpora, namely Brown Corpus (BC) and Wall Street Journal (WSJ). They found that training a WSD system on one part (BC or WSJ) of the DSO corpus and applying it to the other can result in an accuracy drop of more than 10%, highlighting the need to perform domain adaptation of WSD systems to new domains. Escudero et al. (2000) pointed out that one of the reasons for the drop in accuracy is the difference in sense priors, i.e., the proportions of the ...