Scientific report: "An Efficient Implementation of a New DOP Model"

An Efficient Implementation of a New DOP Model

Rens Bod
ILLC, University of Amsterdam
School of Computing, University of Leeds
Nieuwe Achtergracht 166, NL-1018 WV Amsterdam
rens@

Abstract

Two apparently opposing DOP models exist in the literature: one which computes the parse tree involving the most frequent subtrees from a treebank, and one which computes the parse tree involving the fewest subtrees from a treebank. This paper proposes an integration of the two models which outperforms each of them separately. Together with a PCFG-reduction of DOP, we obtain improved accuracy and efficiency on the Wall Street Journal treebank. Our results show an 11% relative reduction in error rate over previous models, and an average processing time of seconds per WSJ sentence.

1 Introduction A Little History DOP and its Doppelgangers¹

The distinctive feature of the DOP approach, when it was proposed in 1992, was to model sentence structures on the basis of previously observed frequencies of sentence structure fragments, without imposing any constraints on the size of these fragments. Fragments include, for instance, subtrees of depth 1 (corresponding to context-free rules) as well as entire trees. To appreciate these innovations, it should be noted that the model was radically different from all other statistical parsing models at the time.
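To make the notion of a fragment concrete, the following is a minimal sketch that enumerates every fragment of one small parse tree. It is not taken from the paper: the tuple-based tree encoding, the function names and the example sentence are all assumptions made for illustration. A fragment is built by choosing, at each nonterminal child, either to cut the tree there or to keep expanding it; words always stay attached to their parent.

```python
from itertools import product

# Hypothetical encoding: a tree is a (label, children) pair, and each child
# is either another tree or a word (str). A nonterminal that has been cut
# off (a frontier node of a fragment) is written as (label, []).

def fragments_rooted_at(tree):
    """Return all fragments whose root is the root of `tree`."""
    label, children = tree
    options = []
    for child in children:
        if isinstance(child, str):
            # Words are never cut away from their parent node.
            options.append([child])
        else:
            # Either cut at this child, or expand it in every possible way.
            options.append([(child[0], [])] + fragments_rooted_at(child))
    return [(label, list(combo)) for combo in product(*options)]

def all_fragments(tree):
    """Return the fragments rooted at every nonterminal node of `tree`."""
    frags = fragments_rooted_at(tree)
    for child in tree[1]:
        if not isinstance(child, str):
            frags.extend(all_fragments(child))
    return frags

def depth(fragment):
    """Length of the longest path from the root to a leaf of the fragment."""
    _, children = fragment
    expanded = [c for c in children if not isinstance(c, str) and c[1]]
    return 1 + max((depth(c) for c in expanded), default=0)

# Example tree (an assumption): "John likes Mary".
tree = ("S", [("NP", ["John"]),
              ("VP", [("V", ["likes"]), ("NP", ["Mary"])])])

frags = all_fragments(tree)
rules = [f for f in frags if depth(f) == 1]
print(len(frags), len(rules))  # prints: 17 5
```

In this toy example the five depth-1 fragments are exactly the context-free rules used in the tree, and the full tree itself is also among the 17 fragments. The number of fragments grows very quickly with tree size, which is one reason an efficient implementation matters.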
Other models started off with a predefined grammar and used a corpus only for estimating the rule probabilities (as, e.g., in Fujisaki et al. 1989; Black et al. 1992, 1993; Briscoe and Waegner 1992; Pereira and Schabes 1992). The DOP model, on the other hand, was the first model (to the best of our knowledge) that proposed not to train a predefined grammar on a corpus, but to directly use corpus fragments as a grammar. This approach has now gained wide usage, as exemplified by the work of Collins (1996, 1999), Charniak (1996, 1997), Johnson (1998), Chiang (2000), and many others. The other innovation of DOP was to take in .

¹ Thanks to Ivan Sag for this pun.
