Scientific report: "Cross-Domain Dependency Parsing Using a Deep Linguistic Grammar"

Yi Zhang
LT-Lab, DFKI GmbH and Dept. of Computational Linguistics, Saarland University, D-66123 Saarbrücken, Germany
yzhang@coli.uni-sb.de

Rui Wang
Dept. of Computational Linguistics, Saarland University, 66123 Saarbrücken, Germany

The first author thanks the German Excellence Cluster of Multimodal Computing and Interaction for the support of the work. The second author is funded by the PIRE PhD scholarship program.

Abstract

Purely statistical parsing systems achieve high in-domain accuracy but perform poorly out-domain. In this paper, we propose two different approaches to producing syntactic dependency structures using a large-scale hand-crafted HPSG grammar. The dependency backbone of an HPSG analysis is used to provide general linguistic insights which, when combined with state-of-the-art statistical dependency parsing models, achieve performance improvements on out-domain tests.

1 Introduction

Syntactic dependency parsing has attracted increasing research attention in recent years, partly because of its theory-neutral representation, but also thanks to its wide deployment in various NLP tasks (machine translation, textual entailment recognition, question answering, information extraction, etc.). In combination with machine learning methods, several statistical dependency parsing models have reached comparably high parsing accuracy (McDonald et al., 2005b; Nivre et al., 2007b). In the meantime, the successful continuation of the CoNLL Shared Tasks since 2006 (Buchholz and Marsi, 2006; Nivre et al., 2007a; Surdeanu et al., 2008) has shown how easy it has become to train a statistical syntactic dependency parser, provided that an annotated treebank is available.

While the dissemination continues towards various languages, several issues arise with such purely data-driven approaches. One common observation is that the performance of a statistical parser drops significantly when it is tested on a dataset different from the training set. For instance, when using