Scientific report: "Web-Scale Features for Full-Scale Parsing"
Mohit Bansal and Dan Klein
Computer Science Division, University of California, Berkeley
mbansal klein @

Abstract

Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities and paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions over the second-order dependency parser of McDonald and Pereira (2006), over the constituent parser of Petrov et al. (2006), and over a non-local constituent reranker.

1 Introduction

Current state-of-the-art syntactic parsers have achieved accuracies in the range of 90% F1 on the Penn Treebank, but a range of errors remain. From a dependency viewpoint, structural errors can be cast as incorrect attachments, even for constituent (phrase-structure) parsers. For example, in the Berkeley parser (Petrov et al., 2006), about 20% of the errors are prepositional phrase attachment errors, as in Figure 1, where a preposition-headed (IN) phrase was assigned an incorrect parent in the implied dependency tree.
Here, the Berkeley parser (solid blue edges) incorrectly attaches "from debt" to the noun phrase "30 billion", whereas the correct attachment (dashed gold edges) is to the verb "raising". However, there is a range of error types, as shown in Figure 2. Here, (a) is a non-canonical PP attachment ambiguity where "by yesterday afternoon" should attach to "had".

Figure 1: A PP attachment error in the parse output of the Berkeley parser on Penn Treebank. Guess edges are in solid blue, gold edges are in dashed gold, and edges common to the guess and gold parses are in black.
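
The dependency view of errors described above can be made concrete with a small sketch. The code below is an illustration only, not the authors' evaluation code: it represents a parse as a map from each dependent word to its head and lists every word whose guessed head differs from the gold head. The function name `attachment_errors`, the three-word fragment, and the use of word strings as keys (which assumes no repeated words) are all assumptions made for this example; the only edge taken from the text is the incorrect attachment of "from" to "billion" instead of "raising".

```python
from typing import Dict, List, Optional, Tuple


def attachment_errors(
    guess: Dict[str, str], gold: Dict[str, str]
) -> List[Tuple[str, Optional[str], str]]:
    """Return (dependent, guessed_head, gold_head) for each mismatched head.

    Hypothetical helper: a parse is a dict mapping a dependent word to its
    head word, which assumes every word in the fragment is unique.
    """
    return [
        (dep, guess.get(dep), head)
        for dep, head in gold.items()
        if guess.get(dep) != head
    ]


# Toy fragment modeled on the Figure 1 example (an assumed simplification):
# the gold parse attaches "from" to the verb "raising", while the guess
# attaches it to the noun "billion".
gold = {"billion": "raising", "from": "raising", "debt": "from"}
guess = {"billion": "raising", "from": "billion", "debt": "from"}

errors = attachment_errors(guess, gold)
correct = len(gold) - len(errors)

print(errors)
print(f"{correct} of {len(gold)} attachments correct")
```

On this fragment the single reported error is the PP attachment, which is the kind of decision the web count features are meant to inform.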