Prototype-Driven Grammar Induction

Aria Haghighi
Computer Science Division
University of California, Berkeley
aria42@cs.berkeley.edu

Dan Klein
Computer Science Division
University of California, Berkeley
klein@cs.berkeley.edu

Abstract

We investigate prototype-driven learning for primarily unsupervised grammar induction. Prior knowledge is specified declaratively, by providing a few canonical examples of each target phrase type. This sparse prototype information is then propagated across a corpus using distributional similarity features, which augment an otherwise standard PCFG model. We show that distributional features are effective at distinguishing bracket labels, but not at determining bracket locations. To improve the quality of the induced trees, we combine our PCFG induction with the CCM model of Klein and Manning (2002), which has complementary strengths: it identifies brackets but does not label them. Using only a handful of prototypes, we show substantial improvements over naive PCFG induction for English and Chinese grammar induction.

1 Introduction

There has been a great deal of work on unsupervised grammar induction, with motivations ranging from scientific interest in language acquisition to engineering interest in parser construction (Carroll and Charniak, 1992; Clark, 2001).
Recent work has successfully induced unlabeled grammatical structure, but has not successfully learned labeled tree structure (Klein and Manning, 2002; Klein and Manning, 2004; Smith and Eisner, 2004). In this paper, our goal is to build a system capable of producing labeled parses in a target grammar with as little total effort as possible. We investigate a prototype-driven approach to grammar induction, in which one supplies canonical examples of each target concept. For example, we might specify that we are interested in trees that use the symbol NP, and then list several examples of prototypical NPs (determiner noun, pronouns, etc.; see Figure 1 for a sample prototype list). This prototype information is …
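The abstract states that a few prototypes per phrase type are propagated across a corpus by distributional similarity features. A minimal sketch of that propagation step follows. It rests on assumptions that are ours, not the paper's: phrases are represented by their part-of-speech yields, a yield's context is just the tag immediately to its left and right, similarity is plain cosine over those counts, and the 0.35 threshold and the `NONE` label are illustrative. The prototype list and the names `PROTOTYPES`, `context_vectors`, `cosine`, and `proto_feature` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical prototype list: each label maps to a few canonical POS-tag yields.
PROTOTYPES = {
    "NP": [("DT", "NN"), ("PRP",)],
    "VP": [("VBD", "DT", "NN")],
    "PP": [("IN", "DT", "NN")],
}


def context_vectors(sentences, max_len=4):
    """Count left/right neighbouring tags for every span yield of up to max_len tags."""
    vecs = defaultdict(Counter)
    for tags in sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]  # padded[k + 1] == tags[k]
        n = len(tags)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                span_yield = tuple(tags[i:j])
                vecs[span_yield]["L=" + padded[i]] += 1      # tag before the span
                vecs[span_yield]["R=" + padded[j + 1]] += 1  # tag after the span
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def proto_feature(span_yield, vecs, prototypes=PROTOTYPES, threshold=0.35):
    """Label of the most similar prototype, or 'NONE' if none clears the threshold."""
    for label, protos in prototypes.items():
        if span_yield in protos:  # a prototype is always tagged with its own label
            return label
    if span_yield not in vecs:
        return "NONE"
    best_label, best_sim = "NONE", threshold
    for label, protos in prototypes.items():
        for p in protos:
            if p in vecs:
                sim = cosine(vecs[span_yield], vecs[p])
                if sim > best_sim:
                    best_label, best_sim = label, sim
    return best_label


# Toy POS-tagged corpus, made up for illustration.
corpus = [
    ["DT", "NN", "VBD", "IN", "DT", "NN"],
    ["PRP", "VBD", "DT", "NN"],
    ["DT", "JJ", "NN", "VBD", "PRP"],
]
vecs = context_vectors(corpus)
print(proto_feature(("DT", "JJ", "NN"), vecs))  # NP (closest to the PRP prototype)
print(proto_feature(("VBD",), vecs))            # NONE (best similarity is 0.25)
```

In this sketch the returned label is only an extra feature attached to a candidate span; it says nothing about whether the span should be a bracket at all. That is consistent with the abstract's observation that distributional features help distinguish labels but not bracket locations, which is the gap the CCM combination is meant to fill.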