Scientific report: "The Benefit of Stochastic PP Attachment to a Rule-Based Parser"

Kilian A. Foth and Wolfgang Menzel
Department of Informatics, Hamburg University, D-22527 Hamburg, Germany
foth menzel@

Abstract

To study PP attachment disambiguation as a benchmark for empirical methods in natural language processing, it has often been reduced to a binary decision problem (between verb or noun attachment) in a particular syntactic configuration. A parser, however, must solve the more general task of deciding between more than two alternatives in many different contexts. We combine the attachment predictions made by a simple model of lexical attraction with a full-fledged parser of German to determine the actual benefit of the subtask to parsing. We show that the combination of data-driven and rule-based components can reduce the number of all parsing errors by 14% and raise the attachment accuracy for dependency parsing of German to an unprecedented 92%.

1 Introduction

Most NLP applications are either data-driven (classification tasks are solved by comparing possible solutions to previous problems and their solutions) or rule-based (general rules are formulated which must be applicable to all cases that might be encountered).
Both methods face obvious problems: the data-driven approach is at the mercy of its training set and cannot easily avoid mistakes that result from biased or scarce data. On the other hand, the rule-based approach depends entirely on the ability of a computational linguist to anticipate every construction that might ever occur. These handicaps are part of the reason why, despite great advances, many tasks in computational linguistics still cannot be performed nearly as well by computers as by human informants. Applied to the subtask of syntax analysis, the dichotomy manifests itself in the existence of learnt and handwritten grammars of natural languages. A great many formalisms have been advanced that fall into either of the two variants, but even the