tailieunhanh - Báo cáo khoa học: "Probabilistic Parsing Strategies"

We present new results on the relation between context-free parsing strategies and their probabilistic counter-parts. We provide a necessary condition and a sufficient condition for the probabilistic extension of parsing strategies. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. | Probabilistic Parsing Strategies Mark-Jan Nederhof Faculty of Arts University of Groningen . Box 716 NL-9700 AS Groningen The Netherlands markjan@ Abstract We present new results on the relation between context-free parsing strategies and their probabilistic counter-parts. We provide a necessary condition and a sufficient condition for the probabilistic extension of parsing strategies. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. 1 Introduction Context-free grammars CFGs are standardly used in computational linguistics as formal models of the syntax of natural language associating sentences with all their possible derivations. Other computational models with the same generative capacity as CFGs are also adopted as for instance push-down automata PDAs . One of the advantages of the use of PDAs is that these devices provide an operational specification that determines which steps must be performed when parsing an input string something that is not offered by CFGs. In other words PDAs can be associated to parsing strategies for context-free languages. More precisely parsing strategies are traditionally specified as constructions that map CFGs to language-equivalent PDAs. Popular examples of parsing strategies are the standard constructions of top-down PDAs Harrison 1978 leftcorner PDAs Rosenkrantz and Lewis II 1970 shift-reduce PDAs Aho and Ullman 1972 and LR PDAs Sippu and Soisalon-Soininen 1990 . CFGs and PDAs have probabilistic counterparts called probabilistic CFGs PCFGs and probabilistic PDAs PPDAs . These models are very popular in natural language processing applications where they are used to define a probability distribution function on the domain of all derivations for sentences in the language of interest. In PCFGs and PPDAs probabilities are assigned to rules or transitions respectively. However these probabilities cannot be chosen entirely arbitrarily. For .