Statistical parsing with an automatically-extracted tree adjoining grammar

David Chiang
Department of Computer and Information Science
University of Pennsylvania
200 S. 33rd St., Philadelphia, PA 19104
dchiang@linc.cis.upenn.edu

Abstract

We discuss the advantages of lexicalized tree-adjoining grammar as an alternative to lexicalized PCFG for statistical parsing, describing the induction of a probabilistic LTAG model from the Penn Treebank and evaluating its parsing performance. We find that this induction method is an improvement over the EM-based method of (Hwa, 1998), and that the induced model yields results comparable to lexicalized PCFG.

1 Introduction

Why use tree-adjoining grammar for statistical parsing? Given that statistical natural language processing is concerned with the probable rather than the possible, it is not because TAG can describe constructions like arbitrarily large Dutch verb clusters. Rather, what makes TAG useful for statistical parsing are the structural descriptions it assigns to bread-and-butter sentences.

The approach of Chelba and Jelinek (1998) to language modeling is illustrative: even though the probability estimate of w appearing as the kth word can be conditioned on the entire history w_1, ..., w_{k-1}, the quantity of available training data limits the usable context to about two words. But which two? A trigram model chooses w_{k-1} and w_{k-2} and works quite well; a model which chose w_{k-7} and w_{k-11} would probably work less well.
But the model of Chelba and Jelinek (1998) chooses the lexical heads of the two previous constituents, as determined by a shift-reduce parser, and works better than a trigram model. Thus the virtual grammar serves to structure the history so that the two most useful words can be chosen, even though the structure of the problem itself is entirely linear.

Similarly, nothing about the parsing problem requires that we construct any structure other than phrase structure. But beginning with (Magerman, 1995), statistical parsers have used bilexical .