Scientific report: "Lexical transfer using a vector-space model"

Lexical transfer using a vector-space model

Eiichiro SUMITA
ATR Spoken Language Translation Research Laboratories
2-2 Hikaridai, Seika, Soraku, Kyoto 619-0288, Japan
sumita@

Abstract

Building a bilingual dictionary for transfer in a machine translation system is conventionally done by hand and is very time-consuming. To overcome this bottleneck, we propose a new mechanism for lexical transfer that is simple and suitable for learning from bilingual corpora. It exploits a vector-space model developed in information retrieval research. We present a preliminary result from our computational experiment.

Introduction

Many machine translation systems have been developed and commercialized. When these systems are faced with unknown domains, however, their performance degrades. Although there are several reasons behind this poor performance, in this paper we concentrate on one of the major problems, i.e., building a bilingual dictionary for transfer.

A bilingual dictionary consists of rules that map a part of the representation of a source sentence to a target representation, taking grammatical differences between the source and target languages, such as word order, into consideration. These rules are usually based on case-frames and carry syntactic and/or semantic constraints on the mapping from a source word to a target word.
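To make the notion of a constrained transfer rule concrete, the following is a minimal sketch of a hand-written rule of this kind and of how it selects a target word. It illustrates the conventional, hand-built approach only, not the mechanism proposed in this paper. The `TransferRule` class, the `select_target` function, the semantic class labels, and the English-to-Japanese entries for "wear" are all assumptions introduced for illustration; none of them comes from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TransferRule:
    """One hand-written lexical transfer rule (hypothetical format)."""
    source_word: str
    target_word: str
    # Semantic constraints: case role -> semantic classes allowed for its filler.
    # An empty dict means the rule applies unconditionally (a default rule).
    constraints: Dict[str, Set[str]] = field(default_factory=dict)


def select_target(source_word: str,
                  case_frame: Dict[str, str],
                  rules: List[TransferRule]) -> Optional[str]:
    """Return the target word of the first rule whose constraints all hold."""
    for rule in rules:
        if rule.source_word != source_word:
            continue
        # Every constrained role must be filled by an allowed semantic class.
        if all(case_frame.get(role) in allowed
               for role, allowed in rule.constraints.items()):
            return rule.target_word
    return None  # unknown source word: no rule covers it


# Illustrative entries: English "wear" maps to different Japanese verbs
# depending on the semantic class of its object.
RULES = [
    TransferRule("wear", "kaburu", {"object": {"headwear"}}),
    TransferRule("wear", "haku", {"object": {"footwear", "legwear"}}),
    TransferRule("wear", "kiru"),  # default rule, no constraints
]

print(select_target("wear", {"object": "headwear"}, RULES))  # kaburu
print(select_target("wear", {"object": "footwear"}, RULES))  # haku
print(select_target("wear", {"object": "shirt"}, RULES))     # kiru
print(select_target("rent", {"object": "car"}, RULES))       # None
```

In a scheme like this, each additional word sense needs another rule and another constraint written by hand, and a word with no rule at all simply has no translation.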
For many machine translation systems, experts experienced with the individual system compile the bilingual dictionary, because this is a complicated and difficult task. In other words, the task is knowledge-intensive and labor-intensive, and therefore time-consuming. Typically, the developer of a machine translation system has to spend several years building a general-purpose bilingual dictionary. Unfortunately, such a general-purpose dictionary is not a complete solution, in that (1) when faced with a new domain, unknown source words may emerge and/or some domain-specific usages of known words may appear, and (2) the accuracy of the target word selection