Scientific report: "Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus"

Shailly Goyal, Niladri Chatterjee
Department of Mathematics, Indian Institute of Technology Delhi
Hauz Khas, New Delhi - 110 016, India
shailly_goyal, niladri_iitd @

Abstract

Example-based parsing has already been proposed in the literature. In particular, attempts are being made to develop techniques for language pairs where the source and target languages are different, e.g. the Direct Projection Algorithm (DPA) (Hwa et al., 2005). This enables one to develop a parsed corpus for target languages that have fewer linguistic tools, with the help of a resource-rich source language. The DPA works on the assumption of Direct Correspondence, which simply means that the relation between two words of the source language sentence can be projected directly between the corresponding words of the parallel target language sentence. However, we find that this assumption does not always hold, which leads to a wrong parse structure for the target language sentence. As a solution, we propose an algorithm called pseudo DPA (pDPA) that can work even if the Direct Correspondence assumption is not guaranteed. The proposed algorithm works recursively, considering the embedded phrase structures from the outermost level to the innermost.
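The Direct Correspondence assumption described above can be illustrated with a minimal sketch. This is not the authors' implementation: the relation layout, the function name `project_relations`, the toy sentence pair, and the word alignment are all assumptions made for illustration, and only one-to-one alignments are handled.

```python
from typing import Dict, List, Tuple

# A relation is (word index, label, word index). This layout is a
# hypothetical simplification, not the paper's data structure.
Relation = Tuple[int, str, int]


def project_relations(
    source_relations: List[Relation],
    alignment: Dict[int, int],
) -> List[Relation]:
    """Copy each source relation onto the aligned target words.

    Assumes Direct Correspondence with one-to-one word alignment:
    a relation between source words i and j becomes a relation
    between target words alignment[i] and alignment[j].
    """
    projected = []
    for left, label, right in source_relations:
        # Skip relations whose endpoints have no aligned target word.
        if left in alignment and right in alignment:
            projected.append((alignment[left], label, alignment[right]))
    return projected


# Toy example, invented for illustration (not taken from the paper).
english = ["Ram", "eats", "mangoes"]
hindi = ["Ram", "aam", "khata", "hai"]
english_links = [(0, "S", 1), (1, "O", 2)]  # subject and object links
alignment = {0: 0, 1: 2, 2: 1}              # English index -> Hindi index

for left, label, right in project_relations(english_links, alignment):
    print(hindi[left], label, hindi[right])
```

In this toy case the Hindi auxiliary "hai" has no aligned English word, so it receives no relation at all. Gaps of this kind are one way in which plain projection can leave the target parse incomplete.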
The present work discusses the pDPA algorithm and illustrates it with respect to the English-Hindi language pair. Link Grammar based parsing has been considered as the underlying parsing scheme for this work.

1 Introduction

Example-based approaches for developing parsers have already been proposed in the literature. These approaches either use examples from the same language, e.g. (Bod et al., 2003; Streiter, 2002), or they try to imitate the parse of a given sentence using the parse of the corresponding sentence in some other language (Hwa et al., 2005; Yarowsky and Ngai, 2001). In particular, Hwa et al. (2005) have proposed a scheme called the direct projection algorithm (DPA), which assumes