Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency
Dan Klein, Computer Science Department, Stanford University, Stanford, CA 94305-9040, klein@
Christopher D. Manning, Computer Science Department, Stanford University, Stanford, CA 94305-9040, manning@

Abstract

We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We also demonstrate that the combined model works and is robust cross-linguistically, being able to exploit either attachment or distributional regularities that are salient in the data.

1 Introduction

The task of statistically inducing hierarchical syntactic structure over unannotated sentences of natural language has received a great deal of attention (Carroll and Charniak, 1992; Pereira and Schabes, 1992; Brill, 1993; Stolcke and Omohundro, 1994).
Researchers have explored this problem for a variety of reasons: to argue empirically against the poverty of the stimulus (Clark, 2001), to use induction systems as a first stage in constructing large treebanks (van Zaanen, 2000), to build better language models (Baker, 1979; Chen, 1995), and to examine cognitive issues in language learning (Solan et al., 2003). An important distinction should be drawn between two lines of work. The first is primarily interested in the weak generative capacity of models, where modeling hierarchical structure is useful only insofar as it leads to improved models over observed structures (Baker, 1979; Chen, 1995). The second is interested in the strong generative capacity of models, where the unobserved structure itself is evaluated (van Zaanen, 2000; Clark, 2001; Klein and Manning, 2002). This paper falls into the latter category: we will be inducing models of …