Scientific report: "Probabilistic Parsing for German using Sister-Head Dependencies"
Amit Dubey
Department of Computational Linguistics, Saarland University
PO Box 15 11 50, 66041 Saarbrücken, Germany
adubey@

Frank Keller
School of Informatics, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, UK
keller@

Abstract

We present a probabilistic parsing model for German trained on the Negra treebank. We observe that existing lexicalized parsing models using head-head dependencies, while successful for English, fail to outperform an unlexicalized baseline model for German. Learning curves show that this effect is not due to lack of training data. We propose an alternative model that uses sister-head dependencies instead of head-head dependencies. This model outperforms the baseline, achieving a labeled precision and recall of up to 74%. This indicates that sister-head dependencies are more appropriate for treebanks with very flat structures such as Negra.

1 Introduction

Treebank-based probabilistic parsing has been the subject of intensive research over the past few years, resulting in parsing models that achieve both broad coverage and high parsing accuracy (Collins 1997; Charniak 2000). However, most of the existing models have been developed for English and trained on the Penn Treebank (Marcus et al. 1993), which raises the question whether these models generalize to other languages and to annotation schemes that differ from the Penn Treebank markup.

The present paper addresses this question by proposing a probabilistic parsing model trained on Negra (Skut et al. 1997), a syntactically annotated corpus for German. German has a number of syntactic properties that set it apart from English, and the Negra annotation scheme differs in important respects from the Penn Treebank markup. While Negra has been used to build probabilistic chunkers (Becker and Frank 2002; Skut and Brants 1998), the research reported in this paper is the first attempt to develop a probabilistic full