Scientific report: "cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models"
cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

Chris Dyer, University of Maryland, redpony@umd.edu
Jonathan Weese, Johns Hopkins University, jweese@cs.jhu.edu
Hendra Setiawan, University of Maryland, hendra@umiacs.umd.edu
Adam Lopez, University of Edinburgh, alopez@inf.ed.ac.uk
Ferhan Ture, University of Maryland, fture@cs.umd.edu
Vladimir Eidelman, University of Maryland, vlad@umiacs.umd.edu
Juri Ganitkevitch, Johns Hopkins University, juri@cs.jhu.edu
Phil Blunsom, Oxford University, pblunsom@comlab.ox.ac.uk
Philip Resnik, University of Maryland, resnik@umiacs.umd.edu

Abstract

We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.

1 Introduction

The dominant models used in machine translation and sequence tagging are formally based on either weighted finite-state transducers (FSTs) or weighted synchronous context-free grammars (SCFGs) (Lopez, 2008). Phrase-based models (Koehn et al., 2003), lexical translation models (Brown et al., 1993), and finite-state conditional random fields (Sha and Pereira, 2003) exemplify the former, and hierarchical phrase-based models the latter (Chiang, 2007). We introduce a software package called cdec that manipulates both classes in a unified way.¹ Although open source ...