Scientific report: "Joint Evaluation of Morphological Segmentation and Syntactic Parsing"


Reut Tsarfaty, Joakim Nivre, Evelina Andersson
Uppsala University, Box 635, 751 26 Uppsala, Sweden
tsarfaty@stp.lingfil.uu.se, {joakim.nivre, evelina.andersson}@lingfil.uu.se

Abstract

We present novel metrics for parse evaluation in joint segmentation and parsing scenarios where the gold sequence of terminals is not known in advance. The protocol uses distance-based metrics defined for the space of trees over lattices. Our metrics allow us to precisely quantify the performance gap between non-realistic parsing scenarios (assuming gold segmented and tagged input) and realistic ones (not assuming gold segmentation and tags). Our evaluation of segmentation and parsing for Modern Hebrew sheds new light on the performance of the best parsing systems to date in the different scenarios.

1 Introduction

A parser takes a sentence in natural language as input and returns a syntactic parse tree representing the sentence's human-perceived interpretation. Current state-of-the-art parsers assume that the space-delimited words in the input are the basic units of syntactic analysis. Standard evaluation procedures and metrics (Black et al., 1991; Buchholz and Marsi, 2006) accordingly assume that the yield of the parse tree is known in advance. This assumption breaks down when parsing morphologically rich languages (Tsarfaty et al., 2010), where every space-delimited word may effectively be composed of multiple morphemes, each of which has a distinct role in the syntactic parse tree.

In order to parse such input, the text needs to undergo morphological segmentation, that is, identifying the morphological segments of each word and assigning the corresponding part-of-speech (PoS) tags to them. Morphologically complex words may be highly ambiguous, and in order to segment them correctly their analysis has to be disambiguated. The multiple morphological analyses of input words may be represented via a lattice that encodes the different ...
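
To make the lattice representation mentioned above more concrete, the following is a minimal sketch in Python. It is not taken from the paper: the arc format, the function name lattice_paths, the transliterated token "bclm", and the tag names are all illustrative assumptions. Each arc covers one candidate segment, and every path from the start state to the end state is one candidate segmentation of the same surface token.

```python
from collections import defaultdict

# Each arc is (start_state, end_state, segment_form, pos_tag).
# Toy data: the segment inventory and the tag names are illustrative
# assumptions, not analyses reported in the paper.
ARCS = [
    (0, 1, "b", "PREP"),
    (1, 3, "clm", "NOUN"),
    (1, 2, "cl", "NOUN"),
    (0, 2, "bcl", "NOUN"),
    (2, 3, "m", "PRON"),
]


def lattice_paths(arcs, start, end):
    """Enumerate every path through an acyclic lattice as a list of (form, tag)."""
    outgoing = defaultdict(list)
    for src, dst, form, tag in arcs:
        outgoing[src].append((dst, form, tag))

    def walk(state):
        # Reaching the end state closes a path with no further segments.
        if state == end:
            yield []
            return
        for dst, form, tag in outgoing[state]:
            for rest in walk(dst):
                yield [(form, tag)] + rest

    return list(walk(start))


paths = lattice_paths(ARCS, 0, 3)
for path in paths:
    # Print the number of terminals, then the segmented and tagged analysis.
    print(len(path), " ".join(f"{form}/{tag}" for form, tag in path))
```

In this toy lattice the paths do not all contain the same number of terminals, which illustrates why an evaluation that aligns predicted and gold trees by terminal position cannot be applied directly when the segmentation itself is predicted.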