tailieunhanh - Scientific report: "Partial Parsing from Bitext Projections"
Partial Parsing from Bitext Projections

Prashanth Mannem and Aswarth Dara
Language Technologies Research Center, International Institute of Information Technology, Hyderabad, AP, India - 500032
prashanth @

Abstract

Recent work has shown how a parallel corpus can be leveraged to build a syntactic parser for a target language by projecting automatic source parses onto the target sentences using word alignments. The projected target dependency parses are not always fully connected, which makes them unsuitable for training traditional dependency parsers. In this paper, we present a greedy non-directional parsing algorithm that does not need a fully connected parse and can learn from partial parses by utilizing the structural and syntactic information available in them. Our parser achieved statistically significant improvements over a baseline system that trains only on fully connected parses for Bulgarian, Spanish, and Hindi. It also gave a significant improvement over previously reported results for Bulgarian and set a benchmark for Hindi.

1 Introduction

Parallel corpora have been used to transfer information from source to target languages for part-of-speech (POS) tagging, word sense disambiguation (Yarowsky et al., 2001), syntactic parsing (Hwa et al., 2005; Ganchev et al., 2009; Jiang and Liu, 2010), and machine translation (Koehn, 2005; Tiedemann, 2002). Analysis of the source sentences was induced onto the target sentences via projections across word-aligned parallel corpora.
Equipped with a source language parser and a word alignment tool, parallel data can be used to build an automatic treebank for a target language. The parse trees given by the parser on the source sentences in the parallel data are projected onto the target sentences using the word alignments from the alignment tool. Due to the use of automatic source parses, automatic word alignments, and differences in the annotation schemes of the source and target languages, the projected parses are not always fully connected and .
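
The projection step described above can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes one-to-one word alignments and simply copies a source dependency edge to the target whenever both the child and its head are aligned. The names `project_dependencies`, `is_fully_connected`, and `ROOT`, and the toy sentence indices, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

ROOT = -1  # hypothetical marker for "head is the artificial root"


def project_dependencies(
    source_heads: List[int],
    alignment: Dict[int, int],
    target_length: int,
) -> List[Optional[int]]:
    """Project source dependency heads onto target tokens.

    source_heads[i] is the head index of source token i (ROOT for the root).
    alignment maps a source token index to a target token index and is
    assumed to be one-to-one (an assumption of this sketch).
    Returns a list in which entry j is the projected head of target token j,
    ROOT for the projected root, or None if no head could be projected.
    """
    target_heads: List[Optional[int]] = [None] * target_length
    for s_child, s_head in enumerate(source_heads):
        t_child = alignment.get(s_child)
        if t_child is None:
            continue  # source word has no target counterpart
        if s_head == ROOT:
            target_heads[t_child] = ROOT
            continue
        t_head = alignment.get(s_head)
        if t_head is None:
            continue  # head is unaligned, so the edge cannot be projected
        target_heads[t_child] = t_head
    return target_heads


def is_fully_connected(target_heads: List[Optional[int]]) -> bool:
    """Under the one-to-one assumption, a projected parse is fully connected
    exactly when every target token received a head."""
    return all(head is not None for head in target_heads)


# Toy example: a 3-word source sentence whose middle word is the root,
# aligned to a 4-word target sentence whose second word is unaligned.
partial = project_dependencies([1, ROOT, 1], {0: 0, 1: 2, 2: 3}, 4)
print(partial)                      # [2, None, -1, 2]
print(is_fully_connected(partial))  # False
```

In the toy example the unaligned target word receives no head, so the projected parse is only partial, which is the situation the paragraph above describes.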