Scientific report: "Machine Translation between Turkic Languages"

Machine Translation between Turkic Languages

A. Cuneyd TANTUG, Istanbul Technical University, Istanbul, Turkey (tantug@)
Esref ADALI, Istanbul Technical University, Istanbul, Turkey (adali@)
Kemal OFLAZER, Sabanci University, Istanbul, Turkey (oflazer@)

Abstract

We present an approach to MT between Turkic languages and report results from an implementation of an MT system from Turkmen to Turkish. Our approach relies on ambiguous lexical and morphological transfer, augmented with target-side rule-based repairs and rescoring with statistical language models.

1 Introduction

Machine translation is certainly one of the toughest problems in natural language processing. It is generally accepted, however, that machine translation between close or related languages is simpler than full-fledged translation between languages that differ substantially in morphological and syntactic structure. In this paper we present a machine translation system from Turkmen to Turkish, both of which belong to the Turkic language family. Turkic languages exhibit essentially the same characteristics at the morphological and syntactic levels. However, except for a few pairs, the languages are not mutually intelligible, owing to substantial divergences in their lexicons, possibly due to different regional and historical influences. Such divergences at the lexical level, along with many but minor divergences at the morphological and syntactic levels, make the translation problem non-trivial.
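The combination of ambiguous transfer and language-model rescoring described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the transfer lexicon, the toy target corpus, the placeholder tokens (`s1`, `t1a`, ...), the helper names (`train_bigram`, `log_prob`, `translate`), the add-one-smoothed bigram model and the exhaustive search are all hypothetical choices made for illustration, and the target-side rule-based repairs are omitted.

```python
import itertools
import math
from collections import Counter

# HYPOTHETICAL transfer lexicon: each source token maps to one or more
# candidate target tokens (ambiguous lexical/morphological transfer).
# Tokens are placeholders, not real Turkmen or Turkish words.
TRANSFER = {
    "s1": ["t1a", "t1b"],
    "s2": ["t2a"],
    "s3": ["t3a", "t3b"],
}

# HYPOTHETICAL target-side corpus used to train the language model.
CORPUS = [
    ["t1a", "t2a", "t3b"],
    ["t1a", "t2a", "t3b"],
    ["t1b", "t2a", "t3a"],
]

BOS, EOS = "<s>", "</s>"


def train_bigram(corpus):
    """Count history unigrams and bigrams over sentences padded with BOS/EOS."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {EOS}
    for sent in corpus:
        padded = [BOS] + sent + [EOS]
        vocab.update(sent)
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a target sentence."""
    padded = [BOS] + sentence + [EOS]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def translate(source, lexicon, model):
    """Expand every transfer ambiguity, then keep the candidate the LM scores highest."""
    # Source tokens missing from the lexicon are passed through unchanged.
    options = [lexicon.get(tok, [tok]) for tok in source]
    candidates = [list(c) for c in itertools.product(*options)]
    return max(candidates, key=lambda c: log_prob(c, *model))


model = train_bigram(CORPUS)
print(translate(["s1", "s2", "s3"], TRANSFER, model))  # ['t1a', 't2a', 't3b']
```

Exhaustive enumeration grows exponentially with the number of ambiguous tokens; a practical system would more likely score a lattice of alternatives with a beam or Viterbi-style search.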
Our approach is based essentially on morphological processing and direct lexical and morphological transfer, augmented with substantial multi-word processing on the source-language side and statistical processing on the target side, where data for statistical language modelling is more readily available.

2 Related Work

Studies on machine translation between close languages are generally concentrated around certain Slavic languages (Czech-Slovak, Czech-Polish, Czech-Lithuanian; Hajic et al., 2003).
