Scientific report: "Simple, Accurate Parsing with an All-Fragments Grammar"

Mohit Bansal and Dan Klein
Computer Science Division, University of California, Berkeley
{mbansal, klein}@

Abstract

We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88 F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.

1 Introduction

Modern NLP systems have increasingly used data-intensive models that capture many or even all substructures from the training data. In the domain of syntactic parsing, the idea that all training fragments¹ might be relevant to parsing has a long history, including tree-substitution grammar (data-oriented parsing) approaches (Scha, 1990; Bod, 1993; Goodman, 1996a; Chiang, 2003) and tree kernel approaches (Collins and Duffy, 2002).

For machine translation, the key modern advancement has been the ability to represent and memorize large training substructures, be it in contiguous phrases (Koehn et al., 2003) or syntactic trees (Galley et al., 2004; Chiang, 2005; Deneefe and Knight, 2009). In all such systems, a central challenge is efficiency: there are generally a combinatorial number of .

¹ In this paper, a fragment means an elementary tree in a tree-substitution grammar, while a subtree means a fragment that bottoms out in terminals.
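
To make the fragment/subtree distinction and the efficiency concern concrete, the following minimal Python sketch counts both for one small tree. It is an illustration only, not the paper's implementation: the nested-tuple tree encoding, the example sentence, and all function names are assumptions introduced here. It relies on the standard recurrence that each child of a fragment node is either left as a frontier node or expanded further.

```python
# Illustrative sketch (hypothetical names, not the authors' code).
# A tree is a nested tuple (label, child, ...); a terminal is a plain string.

def is_terminal(node):
    return isinstance(node, str)

def fragments_rooted_at(node):
    """Number of fragments (elementary trees) whose root is `node`."""
    if is_terminal(node):
        return 0  # a terminal cannot be expanded, so it roots no fragment
    count = 1
    for child in node[1:]:
        # Each child is either a frontier node (1 way) or the root of
        # any fragment of its own; a terminal child contributes factor 1.
        count *= 1 + fragments_rooted_at(child)
    return count

def internal_nodes(node):
    """Yield every non-terminal node of the tree, root first."""
    if is_terminal(node):
        return
    yield node
    for child in node[1:]:
        yield from internal_nodes(child)

def count_fragments(tree):
    """All fragments of the tree, rooted at any internal node."""
    return sum(fragments_rooted_at(n) for n in internal_nodes(tree))

def count_subtrees(tree):
    """Fragments that bottom out in terminals: exactly one per internal node."""
    return sum(1 for _ in internal_nodes(tree))

# Hypothetical example tree for "the dog barked".
TREE = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "barked")))

if __name__ == "__main__":
    print(count_fragments(TREE), count_subtrees(TREE))
```

Even this three-word tree has 24 fragments but only 6 subtrees. Because the count multiplies across children at every level, it grows quickly with tree size, which is why listing every fragment of a treebank explicitly becomes impractical.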
