Relating Probabilistic Grammars and Automata

Steven Abney    David McAllester    Fernando Pereira
AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07932
{abney, dmac, pereira}

Abstract

Both probabilistic context-free grammars (PCFGs) and shift-reduce probabilistic pushdown automata (PPDAs) have been used for language modeling and maximum likelihood parsing. We investigate the precise relationship between these two formalisms, showing that, while they define the same classes of probabilistic languages, they appear to impose different inductive biases.

1 Introduction

Current work in stochastic language models and maximum likelihood parsers falls into two main approaches. The first approach (Collins, 1998; Charniak, 1997) uses the definition of stochastic grammar directly, defining the probability of a parse tree as the probability that a certain top-down stochastic generative process produces that tree. The second approach (Briscoe and Carroll, 1993; Black et al., 1992; Magerman, 1994; Ratnaparkhi, 1997; Chelba and Jelinek, 1998) defines the probability of a parse tree as the probability that a certain shift-reduce stochastic parsing automaton outputs that tree. These two approaches correspond to the classical notions of context-free grammars and nondeterministic pushdown automata, respectively. It is well known that these two classical formalisms define the same language class. In this paper we show that probabilistic context-free grammars (PCFGs) and probabilistic pushdown automata (PPDAs) define the same class of distributions on strings, thus extending the classical result to the stochastic case. We also touch on the perhaps more interesting question of whether PCFGs and shift-reduce parsing models have the same inductive bias with respect to the automatic learning of model parameters from data. Though we cannot provide a definitive answer, the constructions we use to answer the equivalence question involve blowups in the number of parameters in both directions, suggesting that the two formalisms impose different inductive biases.
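For concreteness, the two notions of parse probability can be written out as follows; this is the standard formulation of each model, with notation of our own choosing rather than the paper's. For a PCFG $G$, the probability of a tree $t$ is the product of the probabilities of the rules used in its top-down derivation; for a shift-reduce automaton $M$, the probability of a parse is the product of its action probabilities:

$$
P_G(t) \;=\; \prod_{(A \to \beta)\,\in\, t} p(A \to \beta),
\qquad
P_M(t) \;=\; \prod_{i=1}^{n} p(a_i \mid q_i),
$$

where the first product ranges over the rule occurrences in the derivation of $t$, and in the second, $a_1 \cdots a_n$ is the sequence of shift and reduce actions by which $M$ builds $t$, and $q_i$ is the automaton configuration (state and stack contents) before action $a_i$.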
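As a minimal sketch of the grammar-based definition, the toy example below (the grammar, its probabilities, and the sentence are invented for illustration, not taken from the paper) computes a tree's probability by multiplying the probabilities of the rules it uses; an automaton-based model would instead multiply the probabilities of the shift and reduce actions that build the same tree.

```python
# Toy PCFG, purely illustrative: for each nonterminal, the
# probabilities of its expansion rules sum to 1.
RULES = {
    "S":  {("NP", "VP"): 1.0},
    "NP": {("she",): 0.6, ("fish",): 0.4},
    "VP": {("eats", "NP"): 1.0},
}

def tree_probability(tree):
    """P(t): product of the rule probabilities used in the top-down
    derivation of t.  A tree is (nonterminal, [children]), where each
    child is either a subtree or a terminal string."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES[label][rhs]
    for child in children:
        if not isinstance(child, str):   # recurse into nonterminal children
            p *= tree_probability(child)
    return p

# Parse of "she eats fish":
#   S -> NP VP (1.0), NP -> she (0.6), VP -> eats NP (1.0), NP -> fish (0.4)
t = ("S", [("NP", ["she"]), ("VP", ["eats", ("NP", ["fish"])])])
print(tree_probability(t))   # 1.0 * 0.6 * 1.0 * 0.4 = 0.24
```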