Applying Morphology Generation Models to Machine Translation
Kristina Toutanova, Microsoft Research, Redmond, WA, USA, kristout@
Hisami Suzuki, Microsoft Research, Redmond, WA, USA, hisamis@
Achim Ruopp, Butler Hill Group, Redmond, WA, USA, v-acruop@

Abstract

We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages. Our inflection generation models are trained independently of the SMT system. We investigate different ways of combining the inflection prediction component with the SMT system by training the base MT system either on fully inflected forms or on word stems. We apply our inflection generation models to translation from English into two morphologically complex languages, Russian and Arabic, and show that our models improve the quality of SMT over both phrasal and syntax-based SMT systems according to BLEU and human judgements.

1 Introduction

One of the outstanding problems in further improving machine translation (MT) systems is the difficulty of dividing the MT problem into sub-problems and tackling each sub-problem in isolation to improve the overall quality of MT. Evidence for this difficulty is that there has been very little work investigating the use of such independent subcomponents, though some successful cases have started to appear in the literature, for example in word alignment (Fraser and Marcu, 2007), target-language capitalization (Wang et al., 2006), and case marker generation (Toutanova and Suzuki, 2007).
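The abstract describes an inflection component that is trained separately from the SMT system and then applied to its output, with the base system producing either stems or fully inflected forms. The following is a minimal Python sketch of that decoupling only, not the authors' model: the lexicon, the feature template, the hand-set weights, and all names (`LEXICON`, `WEIGHTS`, `extract_features`, `inflect`, `inflect_sentence`, `reinflect`) are hypothetical, and each word is scored independently with a toy linear model instead of a learned one.

```python
from typing import Dict, List

# Hypothetical toy lexicon: stem -> candidate inflected forms (Russian).
# A real system would obtain candidates from a morphological analyzer.
LEXICON: Dict[str, List[str]] = {
    "книга": ["книга", "книгу", "книги"],     # "book": nom., acc., gen. sg.
    "читать": ["читать", "читаю", "читает"],  # "read": inf., 1sg, 3sg
}

# Reverse map used when the base MT system outputs fully inflected forms.
FORM_TO_STEM: Dict[str, str] = {
    form: stem for stem, forms in LEXICON.items() for form in forms
}

# Hypothetical hand-set weights; a real model would learn them from
# aligned bilingual data, independently of the SMT system.
WEIGHTS: Dict[str, float] = {
    "suffix=у|src_role=object": 2.0,
    "suffix=а|src_role=subject": 2.0,
    "suffix=ю|src_subj=I": 2.0,
    "suffix=т|src_subj=he": 2.0,
}


def extract_features(form: str, context: Dict[str, str]) -> List[str]:
    """Conjoin the candidate's final character with each source-side cue."""
    suffix = form[-1]
    return [f"suffix={suffix}|{key}={value}" for key, value in sorted(context.items())]


def inflect(stem: str, context: Dict[str, str]) -> str:
    """Return the highest-scoring inflected form of `stem` (greedy, per word)."""
    candidates = LEXICON.get(stem, [stem])  # unknown stems pass through unchanged
    return max(
        candidates,
        key=lambda form: sum(WEIGHTS.get(f, 0.0) for f in extract_features(form, context)),
    )


def inflect_sentence(stems: List[str], contexts: List[Dict[str, str]]) -> List[str]:
    """Stem-output setting: inflect every stem produced by the base system."""
    return [inflect(stem, ctx) for stem, ctx in zip(stems, contexts)]


def reinflect(forms: List[str], contexts: List[Dict[str, str]]) -> List[str]:
    """Inflected-output setting: map each form back to its stem, then re-inflect."""
    return inflect_sentence([FORM_TO_STEM.get(f, f) for f in forms], contexts)


if __name__ == "__main__":
    # English source "I read the book"; the contexts stand in for source-side analysis.
    contexts = [{"src_subj": "I"}, {"src_role": "object"}]
    print(inflect_sentence(["читать", "книга"], contexts))  # ['читаю', 'книгу']
```

In this sketch, `inflect_sentence` corresponds to training the base system on stems, while `reinflect` corresponds to training it on fully inflected forms and letting the separate model correct the inflections afterwards.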
This paper describes a successful attempt to integrate a subcomponent for generating word inflections into a statistical machine translation (SMT) system. Our research builds on previous work on using morpho-syntactic information to improve SMT. Work in this area is motivated by two advantages offered by morphological analysis: (1) it provides linguistically …