Measure Word Generation for English-Chinese SMT Systems
Dongdong Zhang¹, Mu Li¹, Nan Duan², Chi-Ho Li¹, Ming Zhou¹
¹Microsoft Research Asia, Beijing, China
²Tianjin University, Tianjin, China
{dozhang, muli, v-naduan, chl, mingzhou}@

Abstract

Measure words in Chinese are used to indicate the count of nouns. Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long-distance dependency between measure words and their corresponding head words. In this paper, we propose a statistical model to generate appropriate measure words of nouns for an English-to-Chinese SMT system. We model the probability of measure word generation by utilizing lexical and syntactic knowledge from both source and target sentences. Our model works as a post-processing procedure over the output of statistical machine translation systems and can work with any SMT system. Experimental results show that our method achieves high precision and recall in measure word generation.

1 Introduction

In linguistics, measure words (MW) are words or morphemes used in combination with numerals or demonstrative pronouns to indicate the count of nouns, which are often referred to as head words (HW). Chinese measure words are grammatical units and occur quite often in real text.
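To make the idea of measure word generation as post-processing concrete, here is a minimal sketch that scans a segmented Chinese output sentence and inserts a measure word between a numeral or demonstrative and the head word that follows it. It is not the authors' model: it assumes a hypothetical toy table of (HW, MW) counts and a naive rule that the head word is the very next token, and it ignores the source-side and syntactic knowledge and the long-distance head words the paper addresses. All names (`HW_MW_COUNTS`, `NUMERALS`, `best_measure_word`, `insert_measure_words`) are illustrative.

```python
# Hypothetical toy counts of (head word -> measure word) co-occurrences.
# A real system would estimate these from a corpus; these numbers are made up.
HW_MW_COUNTS = {
    "书": {"本": 8, "个": 2},   # book
    "水": {"瓶": 5, "杯": 4},   # water
    "人": {"个": 9, "位": 3},   # person
}

# Toy subset of numerals and demonstratives that can precede a measure word.
NUMERALS = {"一", "两", "三", "这", "那"}


def best_measure_word(head_word, counts=HW_MW_COUNTS, default="个"):
    """Pick argmax_mw P(mw | head_word) using relative-frequency estimates.

    Falls back to the general classifier 个 for unseen head words.
    """
    dist = counts.get(head_word)
    if not dist:
        return default
    # The argmax of raw counts equals the argmax of relative frequencies.
    return max(dist, key=dist.get)


def insert_measure_words(tokens, nouns):
    """Insert a measure word after each numeral/demonstrative that is
    immediately followed by a known noun (naive adjacent-head-word rule)."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in NUMERALS and nxt in nouns:
            out.append(best_measure_word(nxt))
    return out


if __name__ == "__main__":
    # "我 买 了 三 书" (I bought three book) -> "我 买 了 三 本 书"
    print(insert_measure_words(["我", "买", "了", "三", "书"], {"书", "水", "人"}))
```

Because the sketch only looks at the adjacent token, a sentence that already contains a measure word (e.g. 三 本 书) is left unchanged, but any modifier between the numeral and the noun would defeat it; handling that gap is exactly where the long-distance dependency mentioned in the abstract comes in.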
According to our survey of the measure word distribution in the Chinese Penn Treebank and in the test datasets distributed by the Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is … and … measure words per sentence, respectively (the uncommon cases of measure words for verbs are not considered). Unlike Chinese, English has no special set of measure words. Measure words are usually used for mass nouns, and any semantically appropriate noun can function as a measure word. For example, in the phrase "three bottles of water", the word "bottles" acts as a measure word. Countable nouns are almost never modified by …
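The "three bottles of water" example suggests a simple source-side cue: in English, a noun acting as a measure word often appears in a "<number> <noun> of <noun>" frame. The sketch below finds such frames with a regular expression. It is a hypothetical illustration, not part of the paper's method; the number-word list and the function name `find_measure_phrases` are assumptions, and a real system would rely on POS tags or a parse instead of a surface pattern.

```python
import re

# Toy list of English number words (plus digits); an assumption for illustration.
NUMBER_WORDS = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|\d+)"

# "<number> <noun> of <noun>", e.g. "three bottles of water".
PATTERN = re.compile(rf"\b({NUMBER_WORDS})\s+(\w+)\s+of\s+(\w+)\b", re.IGNORECASE)


def find_measure_phrases(sentence):
    """Return (number, measure_noun, head_noun) triples found in the sentence."""
    return [m.groups() for m in PATTERN.finditer(sentence)]


if __name__ == "__main__":
    print(find_measure_phrases("She bought three bottles of water."))
```

The pattern over-generates: "three members of parliament" also matches even though "members" is not a measure word, which is one reason surface patterns alone are not enough and lexical knowledge about which nouns act as measures is needed.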