Scientific report: "Estimating Compact Yet Rich Tree Insertion Grammars"
Elif Yamangil and Stuart M. Shieber
Harvard University, Cambridge, Massachusetts, USA
elif shieber @

Abstract

We present a Bayesian nonparametric model for estimating tree insertion grammars (TIG), building upon recent work in Bayesian inference of tree substitution grammars (TSG) via Dirichlet processes. Under our general variant of TIG, grammars are estimated via the Metropolis-Hastings algorithm that uses a context-free grammar transformation as a proposal, which allows for cubic-time string parsing as well as tree-wide joint sampling of derivations in the spirit of Cohn and Blunsom (2010). We use the Penn treebank for our experiments and find that our proposed Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data.

1 Introduction

There is a deep tension in statistical modeling of grammatical structure between providing good expressivity (to allow accurate modeling of the data with sparse grammars) and low complexity (making induction of the grammars and parsing of novel sentences computationally practical). Recent work that incorporated Dirichlet process (DP) nonparametric models into TSGs has provided an efficient solution to the problem of segmenting training data trees into elementary parse tree fragments to form the grammar (Cohn et al., 2009; Cohn and Blunsom, 2010; Post and Gildea, 2009).
DP inference tackles this problem by exploring the space of all possible segmentations of the data, in search of fragments that are, on the one hand, large enough that they incorporate the useful dependencies and, on the other, small enough that they recur and have a chance to be useful in analyzing unseen data. The elementary trees combined in a TSG are, intuitively, primitives of the language, yet certain linguistic phenomena (notably various forms of modification) split them up, preventing their reuse and leading to less sparse grammars.
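To make the notion of segmenting a training tree into elementary fragments concrete, the following is a minimal illustrative sketch, not the sampler used in this work. It assumes trees are encoded as nested tuples and that a segmentation is given as a set of node addresses marked as substitution sites; the names `TREE` and `segment` are hypothetical. It only shows what one point in the space of segmentations looks like, and does not perform any DP or Metropolis-Hastings inference.

```python
# Illustrative sketch only: a tree is a tuple (label, child, child, ...),
# and a leaf word is a plain string. This encoding is an assumption.
TREE = ("S",
        ("NP", "dogs"),
        ("VP", ("V", "chase"), ("NP", "cats")))


def segment(tree, cut_paths, path=()):
    """Split `tree` into elementary fragments.

    `cut_paths` is a set of node addresses (tuples of child indices) that are
    treated as substitution sites. Returns a pair: the fragment rooted at
    `tree`, and the list of fragments cut off below it.
    """
    label, children = tree[0], tree[1:]
    new_children, fragments = [], []
    for i, child in enumerate(children):
        if isinstance(child, str):
            # Words always stay inside the fragment that dominates them.
            new_children.append(child)
            continue
        child_path = path + (i,)
        sub, sub_fragments = segment(child, cut_paths, child_path)
        fragments.extend(sub_fragments)
        if child_path in cut_paths:
            # Cut here: the parent keeps a bare frontier nonterminal,
            # and the child subtree becomes the root of its own fragment.
            new_children.append((child[0],))
            fragments.append(sub)
        else:
            new_children.append(sub)
    return (label, *new_children), fragments


# One segmentation: cut at both NP nodes.
root, rest = segment(TREE, {(0,), (1, 1)})
print(root)
print(rest)
```

With no cut sites the whole tree is a single large fragment that is unlikely to recur; with every internal node cut, the fragments reduce to one-level context-free rules that recur often but capture few dependencies. The segmentations explored during inference lie between these two extremes.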