tailieunhanh - Báo cáo khoa học: "Dynamic programming for parsing and estimation of stochastic unification-based grammars∗"

Stochastic unification-based grammars (SUBGs) define exponential distributions over the parses generated by a unificationbased grammar (UBG). Existing algorithms for parsing and estimation require the enumeration of all of the parses of a string in order to determine the most likely one, or in order to calculate the statistics needed to estimate a grammar from a training corpus. This paper describes a graph-based dynamic programming algorithm for calculating these statistics from the packed UBG parse representations of Maxwell and Kaplan (1995) which does not require enumerating all parses. . | Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics ACL Philadelphia July 2002 pp. 279-286. Dynamic programming for parsing and estimation of stochastic unification-based grammars Stuart Geman Division of Applied Mathematics Brown University geman@ Mark Johnson Cognitive and Linguistic Sciences Brown University MarUJohnson@ Abstract Stochastic unification-based grammars SUBGs define exponential distributions over the parses generated by a unificationbased grammar UBG . Existing algorithms for parsing and estimation require the enumeration of all of the parses of a string in order to determine the most likely one or in order to calculate the statistics needed to estimate a grammar from a training corpus. This paper describes a graph-based dynamic programming algorithm for calculating these statistics from the packed UBG parse representations of Maxwell and Kaplan 1995 which does not require enumerating all parses. Like many graphical algorithms the dynamic programming algorithm s complexity is worst-case exponential but is often polynomial. The key observation is that by using Maxwell and Kaplan packed representations the required statistics can be rewritten as either the max or the sum of a product of functions. This is exactly the kind of problem which can be solved by dynamic programming over graphical models. We would like to thank Eugene Charniak Miyao Yusuke Mark Steedman as well as Stefan Riezler and the team at Parc naturally all errors remain our own. This research was supported by NsF awards DMS 0074276 and ITR IIS 0085940. 1 Introduction Stochastic Unification-Based Grammars SUBGs use log-linear models also known as exponential or MaxEnt models and Markov Random Fields to define probability distributions over the parses of a unification grammar. These grammars can incorporate virtually all kinds of linguistically important constraints including non-local and non-context-free constraints and are .