Detecting Errors in Discontinuous Structural Annotation

Markus Dickinson
Department of Linguistics
The Ohio State University
dickinso@

W. Detmar Meurers
Department of Linguistics
The Ohio State University
dm@

Abstract
Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. Some research addresses the detection of inconsistencies in positional annotation (e.g., part-of-speech) and in continuous structural annotation (e.g., syntactic constituency). However, no approach has yet been developed for automatically detecting annotation errors in discontinuous structural annotation. This gap is significant because the annotation of potentially discontinuous stretches of material is increasingly relevant, from treebanks for free word order languages to semantic and discourse annotation. In this paper we discuss how the variation n-gram error detection approach (Dickinson and Meurers, 2003a) can be extended to discontinuous structural annotation. We exemplify the approach by showing how it successfully detects errors in the syntactic annotation of the German TIGER corpus (Brants et al., 2002).
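To make the cited starting point concrete, here is a minimal Python sketch of the basic variation n-gram idea for positional (part-of-speech) annotation. A variation n-gram is a word n-gram that occurs more than once in the corpus and contains at least one word, the variation nucleus, that is tagged differently in different occurrences of that same n-gram. The sketch covers only this positional case for a single fixed n. It does not implement the extension to discontinuous structural annotation discussed in this paper. The function name `variation_ngrams` and the toy corpus are hypothetical illustrations.

```python
from collections import defaultdict


def variation_ngrams(corpus, n):
    """Find variation n-grams in a POS-tagged corpus (illustrative sketch).

    corpus: list of sentences, each a list of (word, tag) pairs.
    Returns {word n-gram: {nucleus position: set of observed tags}} for every
    word n-gram whose occurrences disagree on the tag at some position.
    """
    # Collect, for each word n-gram, the tag sequences it occurs with.
    occurrences = defaultdict(list)
    for sentence in corpus:
        words = [word for word, _ in sentence]
        tags = [tag for _, tag in sentence]
        for i in range(len(sentence) - n + 1):
            occurrences[tuple(words[i:i + n])].append(tuple(tags[i:i + n]))

    result = {}
    for ngram, tag_seqs in occurrences.items():
        nuclei = {}
        for pos in range(n):
            tags_here = {seq[pos] for seq in tag_seqs}
            # Same word in the same context, but more than one tag: a nucleus.
            if len(tags_here) > 1:
                nuclei[pos] = tags_here
        if nuclei:
            result[ngram] = nuclei
    return result


# Hypothetical toy corpus; "end" is tagged inconsistently in one context.
corpus = [
    [("in", "IN"), ("the", "DT"), ("end", "NN")],
    [("in", "IN"), ("the", "DT"), ("end", "VB")],
    [("the", "DT"), ("end", "NN"), ("came", "VBD")],
]

for n in (1, 2, 3):
    print(n, variation_ngrams(corpus, n))
```

A flagged nucleus is only a candidate for inspection, since some variation reflects genuine ambiguity. The intuition behind looking at longer n-grams (calling the function with increasing n) is that the more surrounding context two occurrences share, the less likely differing tags are to be legitimate.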
1 Introduction
Annotated corpora have at least two kinds of uses: firstly, as training material and as gold standard testing material for the development of tools in computational linguistics, and secondly, as a source of data for theoretical linguists searching for analytically relevant language patterns.

Annotation errors and why they are a problem
The high-quality annotation present in gold standard corpora is generally the result of a manual or semi-automatic mark-up process. The annotation can thus contain errors introduced by automatic preprocessing, human post-editing, or human annotation. The presence of errors creates problems for both computational and theoretical linguistic uses, from unreliable training and evaluation of natural language processing technology (e.g., van Halteren, 2000; Kveton and ...
