Scientific report: "Bayesian Learning of a Tree Substitution Grammar"

Matt Post and Daniel Gildea
Department of Computer Science, University of Rochester, Rochester, NY 14627

Abstract

Tree substitution grammars (TSGs) offer many advantages over context-free grammars (CFGs), but are hard to learn. Past approaches have resorted to heuristics. In this paper, we learn a TSG using Gibbs sampling with a nonparametric prior to control subtree size. The learned grammars perform significantly better than heuristically extracted ones on parsing accuracy.

1 Introduction

Tree substitution grammars (TSGs) have potential advantages over regular context-free grammars (CFGs), but there is no obvious way to learn these grammars. In particular, learning procedures are not able to take direct advantage of manually annotated corpora like the Penn Treebank, which are not marked for derivations and thus assume a standard CFG. Since different TSG derivations can produce the same parse tree, learning procedures must guess the derivations, the number of which is exponential in the tree size. This compels heuristic methods of subtree extraction, or maximum likelihood estimators, which tend to extract large subtrees that overfit the training data. These problems are common in natural language processing tasks that search for a hidden segmentation. Recently, many groups have had success using Gibbs sampling to address the complexity issue and nonparametric priors to address the overfitting problem (DeNero et al., 2008; Goldwater et al., 2009).
In this paper, we apply these techniques to learn a tree substitution grammar, evaluate it on the Wall Street Journal parsing task, and compare it to previous work.

2 Model

Tree substitution grammars (TSGs) extend CFGs (and their probabilistic counterparts, which concern us here) by allowing nonterminals to be rewritten as subtrees of arbitrary size. Although nonterminal rewrites are still context-free, in practice TSGs can loosen the independence assumptions of CFGs because larger rules capture more .
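
The claim that the number of TSG derivations of a parse tree is exponential in the tree size can be illustrated with a small sketch. It is not the authors' code. It assumes that every nonterminal node other than the root may independently be a substitution site, including preterminals (the excerpt does not say how the paper treats them), and the toy tree and the names `fragments`, `derivations`, and `internal_nonroot` are hypothetical. Under that assumption a tree with k non-root nonterminal nodes has 2^k derivations, and the all-cuts derivation corresponds to the ordinary CFG analysis.

```python
from itertools import product

# A parse tree is (label, child, child, ...); a bare string is a word.
# Toy example tree, chosen for illustration only.
TREE = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "barked")))

def fragments(node):
    """Yield (fragment, cut_subtrees) for each elementary tree rooted at node.

    At every nonterminal child we either cut (leave a frontier nonterminal
    and defer the child's subtree) or keep expanding inside the fragment.
    """
    label, children = node[0], node[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([(child, [])])      # words always stay attached
        else:
            opts = [((child[0],), [child])]    # cut: substitution site
            opts.extend(fragments(child))      # no cut: grow the fragment
            options.append(opts)
    for combo in product(*options):
        frag = (label,) + tuple(f for f, _ in combo)
        cuts = [c for _, cs in combo for c in cs]
        yield frag, cuts

def derivations(node):
    """Yield every derivation of the tree as a list of elementary trees."""
    for frag, cuts in fragments(node):
        for rest in product(*(list(derivations(c)) for c in cuts)):
            yield [frag] + [f for d in rest for f in d]

def internal_nonroot(node, is_root=True):
    """Count nonterminal nodes other than the root."""
    if isinstance(node, str):
        return 0
    own = 0 if is_root else 1
    return own + sum(internal_nonroot(c, False) for c in node[1:])

all_derivs = list(derivations(TREE))
k = internal_nonroot(TREE)
print(len(all_derivs), 2 ** k)  # 32 32 for this five-node example
```

Even this three-word tree has 32 derivations, ranging from a single elementary tree covering the whole parse to six depth-one rules, which is why exhaustive enumeration is not practical for Treebank-sized trees.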
