Scientific report: "Rebanking CCGbank for improved NP interpretation"

Matthew Honnibal and James R. Curran
School of Information Technologies, University of Sydney, NSW 2006, Australia
mhonn james @

Johan Bos
University of Groningen, The Netherlands
bos@

Abstract

Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, base-NP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

1 Introduction

Progress in natural language processing relies on direct comparison on shared data, discouraging improvements to the evaluation data. This means that we often spend years competing to reproduce partially incorrect annotations.
It also encourages us to approach related problems as discrete tasks, when a new data set that adds deeper information establishes a new incompatible evaluation. Direct comparison has been central to progress in statistical parsing, but it has also caused problems. Treebanking is a difficult engineering task: coverage, cost, consistency and granularity are all competing concerns that must be balanced against each other when the annotation scheme is developed. The difficulty of the task means that we ought to view treebanking as an ongoing process akin to grammar development, such as the many years of work on the ERG (Flickinger, 2000). This paper demonstrates how a treebank can be rebanked to incorporate novel analyses and information from existing resources. We chose to work on CCGbank Hockenmaier and
