Scientific report: "Selective Sharing for Multilingual Dependency Parsing"
Selective Sharing for Multilingual Dependency Parsing

Tahira Naseem, CSAIL, MIT, tahira@csail.mit.edu
Regina Barzilay, CSAIL, MIT, regina@csail.mit.edu
Amir Globerson, Hebrew University, gamir@cs.huji.ac.il

Abstract

We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely language-universal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non-Indo-European languages, yielding a gain of 14.4%.¹

1 Introduction

Current top-performing parsing algorithms rely on the availability of annotated data for learning the syntactic structure of a language.
Standard approaches for extending these techniques to resource-lean languages either use parallel corpora or rely on annotated trees from other source languages. These techniques have been shown to work well for language families with many annotated resources, such as Indo-European languages. Unfortunately, for many languages there are no available parallel corpora or annotated resources in related languages. For such languages the only ...

¹ The source code for the work presented in this paper is available at http://groups.csail.mit.edu/rbg/code/unidep/