Using bilingual dependencies to align words in English/French parallel corpora
Sylwia Ozdowska
ERSS - CNRS, Université de Toulouse le Mirail
5 allées Antonio Machado
31058 Toulouse Cedex, France
ozdowska@

Abstract

This paper describes a word and phrase alignment approach based on a dependency analysis of French/English parallel corpora, referred to as alignment by “syntax-based propagation.” Both corpora are analysed with a deep and robust dependency parser. Starting with an anchor pair consisting of two words that are translations of one another within aligned sentences, the alignment link is propagated to syntactically connected words.

1 Introduction

It is now an acknowledged fact that alignment of parallel corpora at the word and phrase level plays a major role in bilingual linguistic resource extraction and machine translation. There are basically two kinds of systems working at these segmentation levels: the most widespread rely on statistical models, in particular the IBM ones (Brown et al., 1993); others combine simpler association measures with different kinds of linguistic information (Ahrenberg et al., 2000; Barbu, 2004). Mainly dedicated to machine translation, purely statistical systems have gradually been enriched with syntactic knowledge (Wu, 2000; Yamada and Knight, 2001; Ding et al., 2003; Lin and Cherry, 2003). As pointed out in these studies, the introduction of linguistic knowledge leads to a significant improvement in alignment quality. In the method described hereafter, syntactic information is the kernel of the alignment process.
Indeed, syntactic dependencies identified with a parser on both sides of English/French bitexts are used to discover correspondences between words. This approach has been chosen in order to capture frequent alignments as well as sparse and/or corpus-specific ones. Moreover, as stressed in previous research, using syntactic dependencies seems to be particularly well suited to coping with the problem of linguistic variation across languages (Hwa
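The propagation step described in the abstract can be pictured with a minimal Python sketch. It is not the author's implementation: the `Dep` triples, the relation labels shared by both languages (`subj`, `obj`, `det`, `aux`), the matching rule (same label, with the anchored pair in the same head or dependent slot on both sides) and the one-to-one constraint are all assumptions made for illustration, and `propagate` is a hypothetical helper. Anchor-pair selection and the parser itself are left out.

```python
from typing import List, NamedTuple, Set, Tuple


class Dep(NamedTuple):
    """A dependency triple over token positions in one sentence (assumed format)."""
    head: int
    rel: str
    dep: int


def propagate(anchor: Tuple[int, int],
              src_deps: List[Dep],
              tgt_deps: List[Dep]) -> Set[Tuple[int, int]]:
    """Spread an anchor link (source index, target index) to syntactically
    connected words. Hypothetical rule: two dependencies match when they carry
    the same relation label and the linked pair occupies the same slot
    (head or dependent) in both of them."""
    links = {anchor}
    aligned_src, aligned_tgt = {anchor[0]}, {anchor[1]}
    frontier = [anchor]
    while frontier:
        s, t = frontier.pop()
        for sd in src_deps:
            for td in tgt_deps:
                if sd.rel != td.rel:
                    continue
                if sd.head == s and td.head == t:
                    new = (sd.dep, td.dep)    # move down to the dependents
                elif sd.dep == s and td.dep == t:
                    new = (sd.head, td.head)  # move up to the governors
                else:
                    continue
                # Keep links one-to-one: skip words that are already aligned.
                if new[0] in aligned_src or new[1] in aligned_tgt:
                    continue
                links.add(new)
                aligned_src.add(new[0])
                aligned_tgt.add(new[1])
                frontier.append(new)
    return links


# Toy parses of "The committee adopted the report" / "Le comité a adopté le rapport"
en = [Dep(2, "subj", 1), Dep(2, "obj", 4), Dep(1, "det", 0), Dep(4, "det", 3)]
fr = [Dep(3, "subj", 1), Dep(3, "obj", 5), Dep(1, "det", 0),
      Dep(5, "det", 4), Dep(3, "aux", 2)]
print(sorted(propagate((2, 3), en, fr)))
# [(0, 0), (1, 1), (2, 3), (3, 4), (4, 5)]
```

Starting from the anchor adopted/adopté, the link spreads to the subject, the object and their determiners, while the French auxiliary "a", which has no English counterpart in this toy parse, stays unaligned.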