Word Alignment in English-Hindi Parallel Corpus Using Recency-Vector Approach: Some Studies
Niladri Chatterjee
Department of Mathematics
Indian Institute of Technology Delhi
Hauz Khas, New Delhi, INDIA - 110016
niladri_iitd@

Saumya Agrawal
Department of Mathematics
Indian Institute of Technology Kharagpur
West Bengal, INDIA - 721302
saumya_agrawal2000@

Abstract

Word alignment using recency-vector based approaches has recently become popular. One major advantage of these techniques is that, unlike other approaches, they perform well even when the parallel corpus is small. This makes them worth studying for languages where resources are scarce. In this work we studied the performance of two popular recency-vector based approaches, proposed in (Fung and McKeown, 1994) and (Somers, 1998) respectively, for word alignment in an English-Hindi parallel corpus. However, the performance of these algorithms was not found to be satisfactory. The subsequent addition of some new constraints significantly improved the performance of the recency-vector based alignment technique on this corpus. The present paper discusses the new version of the algorithm and its performance in detail.

1 Introduction

Several approaches, including statistical techniques (Gale and Church, 1991; Brown et al., 1993), lexical techniques (Huang and Choi, 2000; Tiedemann, 2003) and hybrid techniques (Ahrenberg et al., 2000), have been pursued to design schemes for word alignment, which aims at establishing links between the words of a source language and those of a target language in a parallel corpus. All these schemes rely heavily on rich linguistic resources, either in the form of large volumes of parallel text or various grammar-related tools such as parsers, taggers and morphological analysers. The recency-vector based approach has been proposed as an alternative strategy for word alignment. Approaches based on recency vectors typically consider the positions of a word in the corresponding texts.
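To make the notion of a recency vector concrete, here is a minimal sketch under several stated assumptions. Positions are counted in tokens (the original work may use other units); the recency vector of a word is taken to be the list of gaps between its consecutive occurrences; and candidate vectors are compared with a plain dynamic time warping (DTW) distance, chosen here for illustration. The exact scoring, normalisation and filtering used by Fung and McKeown (1994), by Somers (1998), or by the modified algorithm of this paper are not reproduced. The function names (`positions`, `recency_vector`, `dtw_distance`, `best_match`) and the toy English-Hindi sentences are hypothetical.

```python
from typing import Dict, List, Sequence


def positions(tokens: Sequence[str], word: str) -> List[int]:
    """Return the 0-based token positions at which `word` occurs (assumed unit: tokens)."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def recency_vector(pos: Sequence[int]) -> List[int]:
    """Gaps between consecutive occurrences; a word seen once yields []."""
    return [b - a for a, b in zip(pos, pos[1:])]


def dtw_distance(u: Sequence[int], v: Sequence[int]) -> float:
    """Plain O(len(u) * len(v)) DTW with |a - b| as the local cost.

    Two empty vectors give 0.0; exactly one empty vector gives infinity.
    """
    n, m = len(u), len(v)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(u[i - 1] - v[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def best_match(src_tokens: Sequence[str], tgt_tokens: Sequence[str], word: str) -> str:
    """Hypothetical helper: the target word whose recency vector is DTW-closest to `word`'s."""
    src_vec = recency_vector(positions(src_tokens, word))
    scores: Dict[str, float] = {}
    for cand in set(tgt_tokens):
        tgt_vec = recency_vector(positions(tgt_tokens, cand))
        scores[cand] = dtw_distance(src_vec, tgt_vec)
    return min(scores, key=lambda c: scores[c])


if __name__ == "__main__":
    # Toy, made-up parallel text (not from the paper's corpus).
    en = "cat sat . dog ran . cat ate . cat slept".split()
    hi = "बिल्ली बैठी । कुत्ता भागा । बिल्ली ने खाया । बिल्ली सोई".split()
    print(recency_vector(positions(en, "cat")))      # [6, 3]
    print(recency_vector(positions(hi, "बिल्ली")))   # [6, 4]
    print(best_match(en, hi, "cat"))                 # बिल्ली
```

In the toy data, "cat" and "बिल्ली" occur at similar intervals, so their recency vectors are close under DTW. A word that occurs only once has an empty recency vector and carries no positional signal in this simplified scheme, which is one reason the basic technique needs further constraints on real corpora.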