Scientific report: "Efficient CCG Parsing: A* versus Adaptive Supertagging"
Michael Auli
School of Informatics, University of Edinburgh

Adam Lopez
HLTCOE, Johns Hopkins University
alopez@

Abstract

We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser's search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which, to our knowledge, has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort, we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.

1 Introduction

Efficient parsing of Combinatory Categorial Grammar (CCG; Steedman, 2000) is a longstanding problem in computational linguistics. Even with practical CCGs that are strongly context-free (Fowler and Penn, 2010), parsing can be much harder than with Penn Treebank-style context-free grammars, since the number of nonterminal categories is generally much larger, leading to increased grammar constants.
Where a typical Penn Treebank grammar may have fewer than 100 nonterminals (Hockenmaier and Steedman, 2002), we found that a CCG grammar derived from CCGbank contained nearly 1600. The same grammar assigns an average of 26 lexical categories per word, resulting in a very large space of possible derivations. The most successful strategy to date for efficient parsing of CCG is to first prune the set
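The pruning idea behind adaptive supertagging, as described in the abstract, can be illustrated with a minimal sketch. This excerpt does not spell out the mechanism, so the sketch assumes the commonly described beam formulation: for each word, keep only the lexical categories whose probability is within a factor beta of the most probable one, and relax beta when the parser fails. The function names, the beta values, the toy probabilities, and the `toy_parser` stand-in are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

def prune_supertags(dist: Dict[str, float], beta: float) -> List[str]:
    """Keep categories whose probability is at least beta times the best one."""
    best = max(dist.values())
    return [cat for cat, p in dist.items() if p >= beta * best]

def adaptive_supertag(
    dists: Sequence[Dict[str, float]],
    betas: Sequence[float],
    try_parse: Callable[[List[List[str]]], bool],
) -> Optional[Tuple[float, List[List[str]]]]:
    """Try beams from tightest to loosest; stop at the first one that parses."""
    for beta in betas:
        pruned = [prune_supertags(d, beta) for d in dists]
        if try_parse(pruned):
            return beta, pruned
    return None  # no beam setting produced a parse

# Hypothetical per-word category distributions from a supertagger.
dists = [
    {"NP": 0.7, "N": 0.2, "S\\NP": 0.1},
    {"(S\\NP)/NP": 0.5, "S\\NP": 0.4, "N": 0.1},
]

def toy_parser(cats: List[List[str]]) -> bool:
    # Stand-in for a real parser (assumption): "succeeds" only if some word
    # still has the intransitive verb category S\NP available.
    return any("S\\NP" in word_cats for word_cats in cats)

result = adaptive_supertag(dists, betas=(0.9, 0.5, 0.1), try_parse=toy_parser)
```

With a tight beam each word keeps very few categories, which is what shrinks the parser's search space; the fallback to smaller beta values trades speed for coverage only on the sentences that need it.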