Comparing the Accuracy of CCG and Penn Treebank Parsers

Stephen Clark
University of Cambridge Computer Laboratory
15 JJ Thomson Avenue, Cambridge, UK

James R. Curran
School of Information Technologies
University of Sydney, NSW 2006, Australia
james@

Abstract

We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser. An accuracy comparison is performed by converting the CCG derivations into PTB trees. We show that the conversion is extremely difficult to perform, but we are able to compare the parsers fairly on a representative subset of the PTB test section, obtaining results for the CCG parser that are not statistically different from those for the Berkeley parser.

1 Introduction

Several approaches are emerging in statistical parsing. The first, which began in the mid-1990s and now has an extensive literature, is based on the Penn Treebank (PTB) parsing task: inferring skeletal phrase-structure trees for unseen sentences of the WSJ and evaluating accuracy according to the Parseval metrics. Collins (1999) is a seminal example. The second approach applies statistical methods to parsers based on linguistic formalisms such as HPSG, LFG, TAG and CCG, with the grammar either defined manually or extracted from a formalism-specific treebank. Evaluation is typically performed by comparing against predicate-argument structures extracted from the treebank, or against a test set of manually annotated grammatical relations (GRs). Examples of this approach include Riezler et al. (2002), Miyao and Tsujii (2005), Briscoe and Carroll (2006) and Clark and Curran (2007).¹ Despite the many examples from both approaches, there has been little comparison across the two groups, which we refer to as PTB parsing and formalism-based parsing, respectively. The PTB parser we use for comparison is the publicly …

¹ A third approach is dependency parsing, but we restrict the comparison in this paper to phrase-structure parsers.
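The Parseval metrics mentioned in the introduction score a parser by the labeled constituents (brackets) its output shares with the gold-standard tree. The sketch below is a minimal illustration, assuming each tree has already been reduced to a list of hypothetical (label, start, end) spans; it omits the conventions of real scoring tools (such as punctuation handling and which nodes are excluded), and the function names are illustrative, not taken from this paper. Counts are summed over the whole test set before computing precision, recall and F1, so the scores are corpus-level rather than averages of per-sentence scores.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical bracket representation: (constituent label, start token, end token).
Bracket = Tuple[str, int, int]


def bracket_counts(gold: Iterable[Bracket], test: Iterable[Bracket]) -> Tuple[int, int, int]:
    """Return (matched, number of gold brackets, number of test brackets) for one sentence.

    Brackets are compared as multisets, so a repeated constituent is credited
    at most as many times as it occurs in both trees.
    """
    gold_counts = Counter(gold)
    test_counts = Counter(test)
    # Counter '&' keeps the minimum count of each bracket: the multiset intersection.
    matched = sum((gold_counts & test_counts).values())
    return matched, sum(gold_counts.values()), sum(test_counts.values())


def parseval(pairs: List[Tuple[List[Bracket], List[Bracket]]]) -> Tuple[float, float, float]:
    """Corpus-level labeled precision, recall and F1 over (gold, test) sentence pairs."""
    matched = n_gold = n_test = 0
    for gold, test in pairs:
        m, g, t = bracket_counts(gold, test)
        matched += m
        n_gold += g
        n_test += t
    precision = matched / n_test if n_test else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if the gold tree has brackets S(0,4), NP(0,2), VP(2,4) and the parser proposes S(0,4), NP(0,1), VP(2,4), two of three brackets match, giving precision, recall and F1 of 2/3 each.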