Scientific report: "Statistical Machine Translation by Parsing"

I. Dan Melamed
Computer Science Department
New York University
New York, NY 10003-6806
lastname @

Abstract

In an ordinary syntactic parser, the input is a string, and the grammar ranges over strings. This paper explores generalizations of ordinary parsing algorithms that allow the input to consist of string tuples and/or the grammar to range over string tuples. Such algorithms can infer the synchronous structures hidden in parallel texts. It turns out that these generalized parsers can do most of the work required to train and apply a syntax-aware statistical machine translation system.

1 Introduction

A parser is an algorithm for inferring the structure of its input, guided by a grammar that dictates what structures are possible or probable. In an ordinary parser, the input is a string, and the grammar ranges over strings. This paper explores generalizations of ordinary parsing algorithms that allow the input to consist of string tuples and/or the grammar to range over string tuples. Such inference algorithms can perform various kinds of analysis on parallel texts, also known as multitexts.

Figure 1 shows some of the ways in which ordinary parsing can be generalized. A synchronous parser is an algorithm that can infer the syntactic structure of each component text in a multitext and simultaneously infer the correspondence relation between these structures.[1]

When a parser's input can have fewer dimensions than the parser's grammar, we call it a translator. When a parser's grammar can have fewer dimensions than the parser's input, we call it a synchronizer. The corresponding processes are called translation and synchronization. To our knowledge, synchronization has never been explored as a class of algorithms. Neither has the relationship between parsing and word alignment. The relationship between translation and ordinary parsing was noted a long time

[1] A suitable set of ordinary parsers can also infer the syntac-
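
The naming convention introduced above depends only on how the dimensionality of the input compares with the dimensionality of the grammar. The following minimal sketch illustrates that comparison. The function name `classify_parser` and its parameters are hypothetical, and labeling the equal-dimension multitext case as a "synchronous parser" is an assumption based on the definition given in the Introduction; the paper phrases its definitions in terms of what a parser "can" accept, whereas this sketch classifies one fixed configuration.

```python
def classify_parser(input_dims: int, grammar_dims: int) -> str:
    """Name a generalized parser from its input and grammar dimensionality.

    Hypothetical helper for illustration only; not part of the paper.
    A dimension count of 1 means plain strings, 2 or more means string tuples.
    """
    if input_dims < 1 or grammar_dims < 1:
        raise ValueError("dimension counts must be at least 1")
    if input_dims < grammar_dims:
        # Input has fewer dimensions than the grammar.
        return "translator"
    if grammar_dims < input_dims:
        # Grammar has fewer dimensions than the input.
        return "synchronizer"
    # Equal dimensionality: ordinary parsing on strings,
    # or (assumed here) synchronous parsing on multitexts.
    return "ordinary parser" if input_dims == 1 else "synchronous parser"


if __name__ == "__main__":
    for dims in [(1, 1), (1, 2), (2, 1), (2, 2)]:
        print(dims, classify_parser(*dims))
```

For example, a monolingual sentence analyzed with a two-dimensional grammar falls under translation, while a sentence pair analyzed with a one-dimensional grammar falls under synchronization.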