Data point selection for cross-language adaptation of dependency parsers

Anders Søgaard
Center for Language Technology, University of Copenhagen
Njalsgade 142, DK-2300 Copenhagen S
soegaard@hum.ku.dk

Abstract

We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences, from least similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to those of more complex projection-based cross-language adaptation algorithms.

1 Introduction

While unsupervised dependency parsing has seen rapid progress in recent years, results are still far from those achieved by supervised parsers, and not yet good enough to solve real-world problems. In this paper we will be interested in an alternative strategy, namely cross-language adaptation of dependency parsers. The idea is, briefly put, to learn how to parse, for example, Arabic from, say, a Danish treebank, by comparing unlabeled data from both languages. This is similar to, but more difficult than, most domain adaptation or transfer learning scenarios, where the differences between source and target distributions are smaller. Most previous work in cross-language adaptation has used parallel corpora to project dependency structures across translations using word alignments (Smith and Eisner, 2009; Spreyer and Kuhn, 2009; Ganchev et al., 2009), but in this paper we show that similar results can be achieved by much simpler means. Specifically, we build on the cross-language adaptation algorithm for closely related languages developed by Zeman and Resnik (2008).
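The paper describes the selection pipeline in prose only; the following is a minimal Python sketch of it, assuming an add-one-smoothed bigram model over POS tags (the paper does not commit to this particular language model) and a hypothetical `source_sents` structure in which each delexicalized source sentence carries its tags already mapped to the common tagset.

```python
import math
from collections import Counter

def train_bigram_lm(tag_sequences):
    """Count unigrams and bigrams over target-language POS tag sequences;
    these counts back an add-one (Laplace) smoothed bigram model."""
    unigrams, bigrams = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + tags + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    # Vocabulary size for Laplace smoothing (includes boundary markers).
    return unigrams, bigrams, len(unigrams)

def perplexity_per_word(tags, unigrams, bigrams, vocab):
    """Per-word perplexity of one tag sequence under the bigram model."""
    padded = ["<s>"] + tags + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen transitions from zeroing out.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(padded) - 1))

def select_most_similar(source_sents, target_tag_seqs, k):
    """Rank delexicalized source sentences by the perplexity of their tag
    sequence under the target-side tag LM and keep the k sentences with
    the lowest perplexity, i.e. the most target-like data points."""
    uni, bi, v = train_bigram_lm(target_tag_seqs)
    ranked = sorted(source_sents,
                    key=lambda s: perplexity_per_word(s["tags"], uni, bi, v))
    return ranked[:k]
```

Lower perplexity per word means a source sentence's tag sequence is better predicted by the target-language model, so the k lowest-perplexity sentences serve as the adapted training set for the target-language parser.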