Scientific report: "Annotation Schemes and their Influence on Parsing Results"
Wolfgang Maier
Seminar für Sprachwissenschaft
Universität Tübingen
Wilhelmstr. 19, 72074 Tübingen, Germany
wmaier@

Abstract

Most of the work on treebank-based statistical parsing exclusively uses the Wall-Street-Journal part of the Penn Treebank for evaluation purposes. Because of this quasi-standard, the question of how far parsing results depend on the properties of the treebank has often been ignored. In this paper, we use two similar German treebanks, TüBa-D/Z and NeGra, and investigate the role that different annotation decisions play for parsing. For these purposes, we approximate the two treebanks by gradually taking out or inserting the corresponding annotation components and test the performance of a standard PCFG parser on all treebank versions. Our results give an indication of which structures are favorable for parsing and which ones are not.

1 Introduction

The Wall-Street-Journal part (WSJ) of the Penn Treebank (Marcus et al., 1994) plays a central role in research on statistical treebank-based parsing. It has become not only a standard for parser evaluation but also the foundation for the development of new parsing models. For the English WSJ, high-accuracy parsing models have been created, some of them using extensions to classical PCFG parsing such as lexicalization and markovization (Collins, 1999; Charniak, 2000; Klein and Manning, 2003).
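To make the idea of "taking out" an annotation component concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes trees are stored as nested tuples, splices out every node carrying a given label, and re-estimates PCFG rule probabilities by relative frequency. The function names (`splice_out`, `productions`, `estimate_pcfg`), the intermediate label `F`, and the two toy trees are all hypothetical and are not taken from TüBa-D/Z or NeGra.

```python
from collections import Counter

# Assumed representation: a tree is (label, child, child, ...);
# a leaf is a plain string (the word).

def splice_out(tree, label):
    """Return a copy of `tree` in which every non-root node named `label`
    is removed and its children are attached to its parent."""
    if isinstance(tree, str):
        return tree
    new_children = []
    for child in tree[1:]:
        child = splice_out(child, label)
        if not isinstance(child, str) and child[0] == label:
            new_children.extend(child[1:])  # lift the grandchildren
        else:
            new_children.append(child)
    return (tree[0], *new_children)

def productions(tree):
    """Yield one (lhs, rhs) pair per internal node, top-down."""
    if isinstance(tree, str):
        return
    rhs = tuple(c if isinstance(c, str) else c[0] for c in tree[1:])
    yield tree[0], rhs
    for child in tree[1:]:
        yield from productions(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(p for t in trees for p in productions(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# Hypothetical toy treebank with an extra annotation layer "F".
layered = [
    ("S", ("F", ("NP", "sie")), ("V", "liest"), ("F", ("NP", "Romane"))),
    ("S", ("F", ("NP", "er")), ("V", "schlaeft")),
]
# The same treebank with the "F" layer taken out.
flat = [splice_out(t, "F") for t in layered]

pcfg_layered = estimate_pcfg(layered)
pcfg_flat = estimate_pcfg(flat)
```

Comparing `pcfg_layered` and `pcfg_flat` shows how a single annotation decision changes the rule inventory that a PCFG parser is trained on: the rules expanding `F` disappear, and the rules for `S` now refer directly to `NP`.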
However, since most research has been limited to a single language (English) and to a single treebank (WSJ), the question of how portable the parsers and their extensions are across languages and across treebanks has often remained open. Only recently have there been attempts to evaluate parsing results with respect to the properties and the language of the treebank that is used. Gildea (2001) investigates the effects that certain treebank characteristics, such as the distribution of verb subcategorization frames, have on parsing results. He conducts .