tailieunhanh - Scientific report: "Combining Clues for Word Alignment"
Jörg Tiedemann
Department of Linguistics, Uppsala University
Box 527, SE-751 20 Uppsala, Sweden
joerg@

Abstract

In this paper, a word alignment approach is presented which is based on a combination of clues. Word alignment clues indicate associations between words and phrases. They can be based on features such as frequency, part-of-speech, phrase type, and the actual wordform strings. Clues can be found by calculating similarity measures or learned from word aligned data. The clue alignment approach, which is proposed in this paper, makes it possible to combine association clues taking different kinds of linguistic information into account. It allows a dynamic tokenization into token units of varying size. The approach has been applied to an English/Swedish parallel text with promising results.

1 Introduction

Parallel corpora carry a huge amount of bilingual lexical information. Word alignment approaches focus on the automatic identification of translation relations in translated texts. Alignments are usually represented as a set of links between words and phrases of source and target language segments. An alignment can be complete, i.e. all items in both segments have been linked to corresponding items in the other language, or incomplete otherwise. Alignments may include null links, which can be modeled as links to an empty element.
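As a minimal sketch of the representation described above, an alignment can be stored as a set of position pairs, with `None` standing in for the empty element. The names `Link` and `is_complete`, the toy English/Swedish sentence pair, and the decision to count a null link as "linked" are assumptions made for illustration; the paper does not prescribe a data structure.

```python
from typing import Optional, Set, Tuple

# A link pairs a source position with a target position. None stands for
# the empty element, so (1, None) is a null link for source token 1.
Link = Tuple[Optional[int], Optional[int]]


def is_complete(links: Set[Link], n_source: int, n_target: int) -> bool:
    """Return True if every source and target position occurs in some link.

    Assumption: a null link counts as linking its token.
    """
    linked_source = {s for s, _ in links if s is not None}
    linked_target = {t for _, t in links if t is not None}
    return (linked_source == set(range(n_source))
            and linked_target == set(range(n_target)))


# Hypothetical example pair, not taken from the paper.
source = ["I", "do", "not", "know"]
target = ["jag", "vet", "inte"]

# "do" (position 1) has no counterpart, so this alignment is incomplete.
links: Set[Link] = {(0, 0), (3, 1), (2, 2)}
incomplete = is_complete(links, len(source), len(target))

# Adding a null link for "do" makes the alignment complete.
with_null = links | {(1, None)}
complete = is_complete(with_null, len(source), len(target))
```

A phrase-level link could be expressed in the same structure by letting several links share a position, for example linking both "do" and "not" to "inte".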
In word alignment we have to find an appropriate model M for the alignment of source and target language texts (modeling), estimate parameters of the model M, e.g. from empirical data (parameter estimation), and find the optimal alignment of words and phrases for a given translation according to the model M and its parameters (alignment recovery). Modeling the relations between lexical units of translated texts is not a trivial task due to the diversity of natural languages. There are generally two approaches: the estimation approach, which is used in, e.g., statistical machine translation, and the association approach ...
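The three steps named above (modeling, parameter estimation, alignment recovery) can be illustrated with a deliberately simple aligner. This is a sketch under stated assumptions, not the clue alignment model of the paper: the model scores a word pair by the Dice coefficient of sentence-level co-occurrence, the parameters are counted from a toy bitext, and recovery is a greedy one-to-one search. The function names and the example sentences are hypothetical.

```python
from collections import Counter
from itertools import product


def estimate_dice(bitext):
    """Parameter estimation: Dice scores from sentence-level co-occurrence.

    Modeling assumption: score(s, t) = 2 * f(s, t) / (f(s) + f(t)).
    """
    src_freq, trg_freq, pair_freq = Counter(), Counter(), Counter()
    for src, trg in bitext:
        src_types, trg_types = set(src), set(trg)
        src_freq.update(src_types)
        trg_freq.update(trg_types)
        pair_freq.update(product(src_types, trg_types))
    return {
        (s, t): 2.0 * count / (src_freq[s] + trg_freq[t])
        for (s, t), count in pair_freq.items()
    }


def recover_alignment(src, trg, scores):
    """Alignment recovery: greedily keep the best-scoring one-to-one links."""
    candidates = sorted(
        ((scores.get((s, t), 0.0), i, j)
         for (i, s), (j, t) in product(enumerate(src), enumerate(trg))),
        key=lambda c: (-c[0], c[1], c[2]),  # best score first, then position
    )
    used_src, used_trg, links = set(), set(), []
    for score, i, j in candidates:
        if score > 0.0 and i not in used_src and j not in used_trg:
            links.append((i, j))
            used_src.add(i)
            used_trg.add(j)
    return sorted(links)


# Toy English/Swedish bitext, invented for illustration only.
bitext = [
    (["she", "sleeps"], ["hon", "sover"]),
    (["she", "reads"], ["hon", "läser"]),
    (["he", "sleeps"], ["han", "sover"]),
]

scores = estimate_dice(bitext)
links = recover_alignment(["she", "sleeps"], ["hon", "sover"], scores)
```

On this toy data "she" and "hon" always co-occur, as do "sleeps" and "sover", so those pairs receive the highest scores and the greedy search links them position by position. Tokens whose best remaining score is zero stay unlinked, which yields an incomplete alignment.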