Three Generative, Lexicalised Models for Statistical Parsing

Michael Collins
Dept. of Computer and Information Science, University of Pennsylvania
Philadelphia, PA 19104, U.S.A.
mcollins@gradient.cis.upenn.edu

(This research was supported by ARPA Grant N6600194-C6043.)

Abstract

In this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. We then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. Results on Wall Street Journal text show that the parser performs at 88.1%/87.5% constituent precision/recall, an average improvement of 2.3% over (Collins 96).

1 Introduction

Generative models of syntax have been central in linguistics since they were introduced in (Chomsky 57). Each sentence-tree pair (S, T) in a language has an associated top-down derivation consisting of a sequence of rule applications of a grammar. These models can be extended to be statistical by defining probability distributions at points of non-determinism in the derivations, thereby assigning a probability P(S, T) to each (S, T) pair. Probabilistic context-free grammar (Booth and Thompson 73) was an early example of a statistical grammar. A PCFG can be lexicalised by associating a headword with each non-terminal in a parse tree; thus far, (Magerman 95; Jelinek et al. 94) and (Collins 96), which both make heavy use of lexical information, have reported the best statistical parsing performance on Wall Street Journal text. Neither of these models is generative; instead, they both estimate P(T | S) directly.

This paper proposes three new parsing models. Model 1 is essentially a generative version of the model described in (Collins 96). In Model 2, we extend the parser to make the complement/adjunct distinction by adding probabilities over subcategorisation frames for head-words. In Model 3 we give a probabilistic treatment of wh-movement, which is derived from the analysis given in Generalized Phrase Structure Grammar (Gazdar et al. 95). The work makes two advances over previous models: First, Model 1 performs significantly better than (Collins 96), and Models 2 and 3 give further improvements.
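
As a rough sketch of the generative view just described (this is not the paper's model; the toy grammar, headword annotations, and probabilities below are invented purely for illustration), a lexicalised PCFG attaches a headword to every non-terminal and assigns P(T, S) to a tree by multiplying the probabilities of the rules used in its top-down derivation:

```python
# Minimal sketch of a lexicalised PCFG, with invented rules and probabilities.
# Each non-terminal carries a headword, e.g. S(bought) -> NP(IBM) VP(bought),
# and P(T, S) is the product of the rule probabilities in the derivation.

# Tree node: (label, children) for internal nodes, (label, word) for leaves.
toy_tree = ("S(bought)",
            [("NP(IBM)", [("NNP(IBM)", "IBM")]),
             ("VP(bought)",
              [("VBD(bought)", "bought"),
               ("NP(Lotus)", [("NNP(Lotus)", "Lotus")])])])

# Invented probabilities for the lexicalised rules appearing in the toy tree.
rule_prob = {
    ("S(bought)",   ("NP(IBM)", "VP(bought)")):    0.4,
    ("NP(IBM)",     ("NNP(IBM)",)):                0.9,
    ("VP(bought)",  ("VBD(bought)", "NP(Lotus)")): 0.3,
    ("NP(Lotus)",   ("NNP(Lotus)",)):              0.9,
    ("NNP(IBM)",    ("IBM",)):                     1.0,
    ("VBD(bought)", ("bought",)):                  1.0,
    ("NNP(Lotus)",  ("Lotus",)):                   1.0,
}

def tree_probability(node):
    """P(T, S): product of rule probabilities over the tree's derivation."""
    label, rest = node
    if isinstance(rest, str):                 # leaf: pre-terminal -> word
        return rule_prob[(label, (rest,))]
    child_labels = tuple(child[0] for child in rest)
    p = rule_prob[(label, child_labels)]
    for child in rest:
        p *= tree_probability(child)
    return p

print(tree_probability(toy_tree))             # 0.4 * 0.9 * 0.3 * 0.9 = 0.0972
```

A generative parser of this kind selects argmax_T P(T, S); since P(S) is fixed for a given sentence, this picks the same tree as argmax_T P(T | S), the quantity that the non-generative models cited above estimate directly.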