Scientific report: "What to do when lexicalization fails: parsing German with suffix analysis and smoothing"

Amit Dubey
University of Edinburgh

Abstract

In this paper, we present an unlexicalized parser for German which employs smoothing and suffix analysis to achieve a labelled bracket F-score higher than previously reported results on the NEGRA corpus. In addition to the high accuracy of the model, the use of smoothing in an unlexicalized parser allows us to better examine the interplay between smoothing and parsing results.

1 Introduction

Recent research on German statistical parsing has shown that lexicalization adds little to parsing performance in German (Dubey and Keller, 2003; Beil et al., 1999). A likely cause is the relative productivity of German morphology compared to that of English: German has a higher type/token ratio for words, making sparse data problems more severe. There are at least two solutions to this problem: first, to use better models of morphology, or second, to make unlexicalized parsing more accurate.

We investigate both approaches in this paper. In particular, we develop a parser for German which attains the highest performance known to us by making use of smoothing and a highly-tuned suffix analyzer for guessing part-of-speech (POS) tags from the input text. Rather than relying on smoothing and suffix analysis alone, we also utilize treebank transformations (Johnson, 1998; Klein and Manning, 2003) instead of a grammar induced directly from a treebank.

The organization of the paper is as follows: Section 2 summarizes some important aspects of our treebank corpus.
In Section 3 we outline several techniques for improving the performance of unlexicalized parsing without using smoothing, including treebank transformations and the use of suffix analysis. We show that suffix analysis is not helpful on the treebank grammar, but it does increase performance if used in combination with the treebank transformations we present. Section 4 describes how smoothing