Scientific report: "An alternative method of training probabilistic LR parsers"

Mark-Jan Nederhof
Faculty of Arts, University of Groningen
P.O. Box 716, NL-9700 AS Groningen, The Netherlands
markjan@

Abstract

We discuss existing approaches to training LR parsers, which have been used for statistical resolution of structural ambiguity. These approaches are non-optimal, in the sense that certain probability distributions cannot be obtained. In particular, some probability distributions expressible in terms of a context-free grammar cannot be expressed in terms of the LR parser constructed from that grammar, under the restrictions of the existing approaches to training LR parsers. We present an alternative way of training that is provably optimal, and that allows all probability distributions expressible in the context-free grammar to be carried over to the LR parser. We also demonstrate empirically that this kind of training can be effectively applied to a large treebank.

1 Introduction

The LR parsing strategy was originally devised for programming languages (Sippu and Soisalon-Soininen, 1990), but has been used in a wide range of other areas as well, such as natural language processing (Lavie and Tomita, 1993; Briscoe and Carroll, 1993; Ruland, 2000).
The main difference between the application to programming languages and the application to natural languages is that in the latter case the parsers should be nondeterministic, in order to deal with ambiguous context-free grammars (CFGs). Nondeterminism can be handled in a number of ways, but the most efficient is tabulation, which allows processing in polynomial time. Tabular LR parsing is known from the work by Tomita (1986), but can also be achieved by the generic tabulation technique due to Lang (1974; Billot and Lang, 1989), which assumes an input pushdown transducer (PDT). In this context, the LR parsing strategy can be seen as a particular mapping from context-free grammars to PDTs. The acronym LR stands for Left-to-right processing of the .
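
To make the point about tabulation concrete, the following is a minimal sketch of how a chart can count every parse of an ambiguous grammar in polynomial time, without enumerating the parses one by one. It uses a CYK-style chart over a grammar in Chomsky normal form, which is a simpler stand-in and not the tabular LR technique of Tomita or Lang discussed above. The toy grammar (S -> S S | a) and the names `BINARY_RULES`, `LEXICAL_RULES` and `count_parses` are assumptions introduced purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: S -> S S | 'a'.
# It is ambiguous: a string of n 'a's has Catalan(n - 1) parse trees.
BINARY_RULES = [("S", "S", "S")]   # (lhs, left child, right child)
LEXICAL_RULES = [("S", "a")]       # (lhs, terminal)


def count_parses(tokens, start="S"):
    """Count the derivations of `tokens` from `start` using a CYK-style chart."""
    n = len(tokens)
    # chart[i][j][A] = number of distinct derivations of tokens[i:j] from A
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of width 1 are filled from the lexical rules.
    for i, tok in enumerate(tokens):
        for lhs, terminal in LEXICAL_RULES:
            if terminal == tok:
                chart[i][i + 1][lhs] += 1

    # Wider spans combine two adjacent, already computed sub-spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for lhs, left, right in BINARY_RULES:
                    chart[i][j][lhs] += chart[i][k][left] * chart[k][j][right]

    return chart[0][n][start]


if __name__ == "__main__":
    for length in range(1, 6):
        print(length, count_parses(["a"] * length))
```

The three nested loops over `width`, `i` and `k` give cubic running time in the input length, even though the number of parses grows exponentially. Each chart cell is computed once and reused, which is the essence of tabulation.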
