tailieunhanh - Báo cáo khoa học: "Computational Complexity of Statistical Machine Translation"

In this paper we study a set of problems that are of considerable importance to Statistical Machine Translation (SMT) but which have not been addressed satisfactorily by the SMT research community. Over the last decade, a variety of SMT algorithms have been built and empirically tested whereas little is known about the computational complexity of some of the fundamental problems of SMT. Our work aims at providing useful insights into the the computational complexity of those problems. We prove that while IBM Models 1-2 are conceptually and computationally simple, computations involving the higher (and more useful) models are hard | Computational Complexity of Statistical Machine Translation Raghavendra Udupa U. IBM India Research Lab New Delhi India uraghave@ Hemanta K. Maji Dept. of Computer Science University of Illinois at Urbana-Champaigne Abstract In this paper we study a set of problems that are of considerable importance to Statistical Machine Translation SMT but which have not been addressed satisfactorily by the SMT research community. Over the last decade a variety of SMT algorithms have been built and empirically tested whereas little is known about the computational complexity of some of the fundamental problems of SMT. Our work aims at providing useful insights into the the computational complexity of those problems. We prove that while IBM Models 1-2 are conceptually and computationally simple computations involving the higher and more useful models are hard. Since it is unlikely that there exists a polynomial time solution for any of these hard problems unless P NP and P P P our results highlight and justify the need for developing polynomial time approximations for these computations. We also discuss some practical ways of dealing with complexity. 1 Introduction Statistical Machine Translation is a data driven machine translation technique which uses probabilistic models of natural language for automatic translation Brown et al. 1993 Al-Onaizan et al. 1999 . The parameters of the models are estimated by iterative maximum-likelihood training on a large parallel corpus of natural language texts using the EM algorithm Brown et al. 1993 . The models are then used to decode . translate texts from the source language to the target language 1 2 Tillman 2001 Wang 1997 Germann et al. 2003 Udupa et al. 2004 . The models are independent of the language pair and therefore can be used to build a translation system for any language pair as long as a parallel corpus of texts is available for training. Increasingly parallel corpora are becoming available .

TỪ KHÓA LIÊN QUAN
crossorigin="anonymous">
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.