tailieunhanh - Báo cáo khoa học: "An Earley Parsing Algorithm for Range Concatenation Grammars"

We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart. | An Earley Parsing Algorithm for Range Concatenation Grammars Laura Kallmeyer SFB 441 Universitat Tubingen 72074 Tubingen Germany lk@ Wolfgang Maier SFB 441 Universitat Tubingen 72074 Tubingen Germany Yannick Parmentier CNRS - LORIA Nancy Universite 54506 Vandmuvre France parmenti@ Abstract We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar RCG using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that compared to previous approaches the constraint propagation helps to considerably decrease the number of items in the chart. 1 Introduction RCGs Boullier 2000 have recently received a growing interest in natural language processing S0gaard 2008 Sagot 2005 Kallmeyer et al. 2008 Maier and S0gaard 2008 . RCGs generate exactly the class of languages parsable in deterministic polynomial time Bertsch and Neder-hof 2001 . They are in particular more powerful than linear context-free rewriting systems LCFRS Vijay-Shanker et al. 1987 . LCFRS is unable to describe certain natural language phenomena that RCGs actually can deal with. One example are long-distance scrambling phenomena Becker et al. 1991 Becker et al. 1992 . Other examples are non-semilinear constructions such as case stacking in Old Georgian Michaelis and Kracht 1996 and Chinese number names Radzinski 1991 . Boullier 1999 shows that RCGs can describe the permutations occurring with scrambling and the construction of Chinese number names. Parsing algorithms for RCG have been introduced by Boullier 2000 who presents a directional top-down parsing algorithm using pseudocode and Barthelemy et al. 2001 who add an oracle to Boullier s algorithm. The more restricted class of LCFRS has received more attention concerning parsing Villemonte de la Clergerie 2002