Scientific report: "Translating from Morphologically Complex Languages: A Paraphrase-Based Approach"
Preslav Nakov
Department of Computer Science
National University of Singapore
13 Computing Drive
Singapore 117417
nakov@comp.nus.edu.sg

Hwee Tou Ng
Department of Computer Science
National University of Singapore
13 Computing Drive
Singapore 117417
nght@comp.nus.edu.sg

Abstract

We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over competing approaches according to five automatic evaluation measures (for 320,000 sentence pairs; 9.5 million English word tokens).

1 Introduction

Traditionally, statistical machine translation (SMT) models have assumed that the word should be the basic token-unit of translation, thus ignoring any word-internal morphological structure. This assumption can be traced back to the first word-based models of IBM (Brown et al., 1993), which were initially proposed for two languages with limited morphology: French and English.
While several significantly improved models have been developed since then, including phrase-based (Koehn et al., 2003), hierarchical (Chiang, 2005), treelet (Quirk et al., 2005), and syntactic (Galley et al., 2004) models, they all preserved the assumption that words should be atomic.

Ignoring morphology was fine as long as the main research interest remained focused on languages with limited (e.g., English, French, Spanish) or minimal (e.g., Chinese) morphology. Since