Scientific report: "cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models"
Chris Dyer, University of Maryland, redpony@
Jonathan Weese, Johns Hopkins University, jweese@
Hendra Setiawan, University of Maryland, hendra@
Adam Lopez, University of Edinburgh, alopez@
Ferhan Ture, University of Maryland, fture@
Vladimir Eidelman, University of Maryland, vlad@
Juri Ganitkevitch, Johns Hopkins University, juri@
Phil Blunsom, Oxford University, pblunsom@
Philip Resnik, University of Maryland, resnik@

Abstract

We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.

1 Introduction

The dominant models used in machine translation and sequence tagging are formally based on either weighted finite-state transducers (FSTs) or weighted synchronous context-free grammars (SCFGs) (Lopez, 2008). Phrase-based models (Koehn et al., 2003), lexical translation models (Brown et al., 1993), and finite-state conditional random fields (Sha and Pereira, 2003) exemplify the former, and hierarchical phrase-based models the latter (Chiang, 2007). We introduce a software package called cdec that manipulates both classes in a unified [...] Although open source [...]
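
To make the idea of a single translation-forest representation more concrete, the following is a minimal Python sketch of a forest stored as a hypergraph, together with a generic 1-best (Viterbi) extraction that knows nothing about which translation model produced the edges. It is an illustration only and not cdec's actual data structures or API: the names `Edge`, `Forest`, and `viterbi`, the toy weights, and the assumption that nodes are numbered in topological order are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    head: int          # node derived by this edge
    tails: tuple       # child node ids (empty for leaf edges)
    words: tuple       # target tokens; an int i means "insert the yield of tails[i]"
    log_weight: float  # model score for applying this edge (assumed log-domain)


@dataclass
class Forest:
    num_nodes: int
    edges: list = field(default_factory=list)

    def incoming(self, node):
        # All edges that derive the given node.
        return [e for e in self.edges if e.head == node]


def viterbi(forest, goal):
    """Return (score, tokens) of the best derivation of `goal`.

    Assumes every node has at least one incoming edge and that nodes are
    numbered so that all tails of an edge are smaller than its head.
    """
    best = {}
    for node in range(forest.num_nodes):
        candidates = []
        for e in forest.incoming(node):
            score = e.log_weight + sum(best[t][0] for t in e.tails)
            tokens = []
            for w in e.words:
                if isinstance(w, int):
                    tokens.extend(best[e.tails[w]][1])  # substitute child yield
                else:
                    tokens.append(w)
            candidates.append((score, tokens))
        best[node] = max(candidates, key=lambda c: c[0])
    return best[goal]


# Toy forest: node 0 = noun, node 1 = adjective, node 2 = goal.
forest = Forest(num_nodes=3)
forest.edges += [
    Edge(0, (), ("house",), -0.5),
    Edge(0, (), ("home",), -1.2),
    Edge(1, (), ("small",), -0.3),
    Edge(1, (), ("little",), -0.9),
    Edge(2, (0, 1), ("the", 1, 0), -0.1),  # adjective before noun
    Edge(2, (0, 1), ("the", 0, 1), -2.0),  # noun before adjective
]

score, tokens = viterbi(forest, goal=2)
print(" ".join(tokens), score)
```

In this sketch the edges could equally have come from a phrase-based or an SCFG-based model; `viterbi` only sees nodes, edges, and weights, which is the kind of separation between model-specific logic and general inference that the abstract describes. Extending the same traversal to k-best lists or to expectations for training is left out here.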