tailieunhanh - Scientific report: "Parsing Noun Phrase Structure with CCG"
Parsing Noun Phrase Structure with CCG

David Vadas and James R. Curran
School of Information Technologies
University of Sydney
NSW 2006, Australia
dvadasl james @

Abstract

Statistical parsing of noun phrase (NP) structure has been hampered by a lack of gold-standard data. This is a significant problem for CCGbank, where binary branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank. We correct these errors in CCGbank using a gold-standard corpus of NP structure, resulting in a much more accurate corpus. We also implement novel NER features that generalise the lexical information needed to parse NPs and provide important semantic information. Finally, evaluating against DepBank demonstrates the effectiveness of our modified corpus and novel features, with an increase in parser performance of .

1 Introduction

Internal noun phrase (NP) structure is not recovered by a number of widely-used parsers, e.g. Collins (2003). This is because their training data, the Penn Treebank (Marcus et al., 1993), does not fully annotate NP structure. The flat structure described by the Penn Treebank can be seen in this example:

(NP (NN lung) (NN cancer) (NNS deaths))

CCGbank (Hockenmaier and Steedman, 2007) is the primary English corpus for Combinatory Categorial Grammar (CCG) (Steedman, 2000) and was created by a semi-automatic conversion from the Penn Treebank.
However, CCG is a binary branching grammar and as such cannot leave NP structure underspecified. Instead, all NPs were made right-branching, as shown in this example:

(N (N/N lung) (N (N/N cancer) (N deaths)))

This structure is correct for most English NPs and is the best solution that doesn't require manual reannotation. However, the resulting derivations often contain errors. This can be seen in the previous example, where lung cancer should form a constituent but does not.

The first contribution of this paper is to correct these CCGbank errors. We apply an automatic conversion process using the .
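
The right-branching default described in the introduction can be illustrated with a minimal Python sketch. It is not the conversion code used to build CCGbank: the function name `right_branching` is hypothetical, and the sketch assumes a simplified category assignment in which the final token is the head noun (N) and every earlier token is a modifier (N/N).

```python
def right_branching(tokens):
    """Return a right-branching, CCG-style bracketing for a flat noun sequence.

    Assumption (for illustration only): the last token is the head noun with
    category N, and every earlier token is a modifier with category N/N.
    """
    if not tokens:
        raise ValueError("need at least one token")
    # Start from the head noun at the right edge of the phrase.
    tree = f"(N {tokens[-1]})"
    # Attach each preceding word as a modifier of everything to its right.
    for word in reversed(tokens[:-1]):
        tree = f"(N (N/N {word}) {tree})"
    return tree


print(right_branching(["lung", "cancer", "deaths"]))
# (N (N/N lung) (N (N/N cancer) (N deaths)))
```

Because each modifier attaches to the whole phrase on its right, "lung" combines with "cancer deaths" rather than with "cancer" alone, which is exactly the error the text points out: "lung cancer" never forms a constituent under this default.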