tailieunhanh - Scientific report: "Two Languages Are More Informative Than One *"

This paper presents a new approach for resolving lexical ambiguities in one language using statistical data on lexical relations in another language. This approach exploits the differences between mappings of words to senses in different languages. We concentrate on the problem of target word selection in machine translation, for which the approach is directly applicable, and employ a statistical model for the selection mechanism. The model was evaluated using two sets of Hebrew and German examples and was found to be very useful for disambiguation.

Two Languages Are More Informative Than One

Ido Dagan, Computer Science Department, Technion, Haifa, Israel, and IBM Scientific Center, Haifa, Israel (dagan@)
Alon Itai, Computer Science Department, Technion, Haifa, Israel (itai@)
Ulrike Schwall, IBM Scientific Center, Institute for Knowledge Based Systems, Heidelberg, Germany (schwall@dhdibml)

1 Introduction

The resolution of lexical ambiguities in non-restricted text is one of the most difficult tasks of natural language processing. A related task in machine translation is target word selection, the task of deciding which target language word is the most appropriate equivalent of a source language word in context. In addition to the alternatives introduced from the different word senses of the source language word, the target language may specify additional alternatives that differ mainly in their usages.
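The idea of choosing target words from target-language statistics can be illustrated with a minimal sketch. It is not the statistical model used in the paper; it assumes only that frequencies of lexical relations (here, verb-object pairs) have been counted in a target-language corpus, and it picks the combination of candidate translations with the highest count. The function name `select_target_words`, the candidate words, and all counts are hypothetical and invented for illustration.

```python
from collections import Counter
from itertools import product


def select_target_words(candidates, relation_counts):
    """Return the most frequent combination of candidate target words.

    candidates: one list of alternative translations per source word
                taking part in the lexical relation.
    relation_counts: Counter mapping tuples of target words to their
                     frequency in a target-language corpus (assumed given).
    """
    # Enumerate every combination of alternatives and keep the one seen
    # most often in the target language; unseen combinations count as 0.
    best = max(product(*candidates), key=lambda combo: relation_counts[combo])
    return best, relation_counts[best]


# Invented toy counts, not data from the paper.
toy_counts = Counter({
    ("sign", "contract"): 40,
    ("sign", "treaty"): 25,
    ("close", "contract"): 3,
    ("seal", "contract"): 2,
})

# Alternatives for an ambiguous source verb and an ambiguous source noun.
choice, frequency = select_target_words(
    [["sign", "seal", "close"], ["treaty", "contract"]],
    toy_counts,
)
print(choice, frequency)
```

A plain maximum over raw counts ignores how reliable the difference between alternatives is; a statistical model, such as the one the paper employs, would be the place to handle that.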
Traditionally, various linguistic levels were used to deal with this problem: syntactic, semantic, and pragmatic. Computationally, the syntactic methods are the easiest, but they are of no avail in the frequent situation when the different senses of the word show the same syntactic behavior, having the same part of speech and even the same subcategorization frame. Substantial application of semantic or pragmatic knowledge about the word and its .

* This research was partially supported by grant number 120-741 of the Israel Council for Research and Development.
