Scientific report: "Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora"
Eric Gaussier
Xerox Research Centre Europe, 6 Chemin de Maupertuis, 38240 Meylan F.

Abstract

This paper presents a new model for word alignments between parallel sentences, which allows one to accurately estimate different parameters in a computationally efficient way. An application of this model to bilingual terminology extraction, where terms are identified in one language and guessed, through the alignment process, in the other one, is also described. An experiment conducted on a small English-French parallel corpus gave results with high precision, demonstrating the validity of the model.

1 Introduction

Early work (Gale and Church, 1993; Brown et al., 1993), and to a certain extent (Kay and Roscheisen, 1993), presented methods to extract bilingual lexicons of words from a parallel corpus, relying on the distribution of the words in the set of parallel sentences (or other units). Brown et al. (1993) then extended their method and established a sound series of probabilistic models, relying on different parameters that describe how words within parallel sentences are aligned to each other. On the other hand, Dagan et al. (1993) proposed an algorithm, borrowed from the field of dynamic programming and based on the output of their previous work, to find the best alignment, subject to certain constraints, between words in parallel sentences. A similar algorithm was used by Vogel et al. (1996).
Investigating alignments at the sentence level makes it possible to clean and refine the lexicons otherwise extracted from a parallel corpus as a whole, pruning what Melamed (1996) calls "indirect associations". What differentiates the proposed models and algorithms are the sets of parameters and constraints they rely on, their ability to find an appropriate solution under the constraints defined, and their ability to integrate new parameters cleanly. We want to present here a model of the possible .