Scientific report: "Construct State Modification in the Arabic Treebank"

Ryan Gabbard
Department of Computer and Information Science
University of Pennsylvania
gabbard@

Seth Kulick
Linguistic Data Consortium
Institute for Research in Cognitive Science
University of Pennsylvania
skulick@

Abstract

Earlier work in parsing Arabic has speculated that attachment to construct state constructions decreases parsing performance. We make this speculation precise and define the problem of attachment to construct state constructions in the Arabic Treebank. We present the first statistics that quantify the problem. We provide a baseline and the results from a first attempt at a discriminative learning procedure for this task, achieving 80% accuracy.

1 Introduction

Earlier work on parsing the Arabic Treebank (Kulick et al., 2006) noted that prepositional phrase attachment was significantly worse on the Arabic Treebank (ATB) than on the English Penn Treebank (PTB), and speculated that this was due to the ubiquitous presence of construct state NPs in the ATB. Construct state NPs, also known as iDAfa¹ constructions, are those in which, roughly, two or more words (usually nouns) are grouped tightly together, often corresponding to what English would express with a noun-noun compound or a possessive construction (Ryding, 2005). In the ATB these constructions are annotated as an NP headed by a NOUN with an NP complement. Kulick et al. (2006) noted that this created very different contexts for PP attachment to base NPs, likely leading to the lower results for PP attachment.
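To make the structural contrast concrete, the following Python sketch encodes the annotation described above as a small tree type. It is an illustration under stated assumptions, not the authors' code: the `Node` class and the `is_idafa` and `count_nps` helpers are hypothetical, the tags are simplified (actual ATB tags are considerably richer), and the phrase mdyr Al$rkp ('the director of the company') is chosen only as an example. It shows that a two-noun iDAfa yields two nested NP nodes, whereas a flat PTB-style base NP for an English compound yields one.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                                  # phrase label ("NP") or POS tag ("NOUN")
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # only set on preterminal (POS) nodes

def is_idafa(node: Node) -> bool:
    """Hypothetical test: an NP whose first child is a NOUN and whose second child is an NP."""
    return (
        node.label == "NP"
        and len(node.children) >= 2
        and node.children[0].label == "NOUN"
        and node.children[1].label == "NP"
    )

def count_nps(node: Node) -> int:
    """Number of NP nodes in the subtree; each is a distinct node a modifier could attach to."""
    return (node.label == "NP") + sum(count_nps(c) for c in node.children)

# Hypothetical ATB-style iDAfa "mdyr Al$rkp" ('the director of the company'), simplified tags.
atb = Node("NP", [Node("NOUN", word="mdyr"),
                  Node("NP", [Node("NOUN", word="Al$rkp")])])

# Hypothetical flat PTB-style base NP for the English compound "company director".
ptb = Node("NP", [Node("NN", word="company"), Node("NN", word="director")])

print(is_idafa(atb), count_nps(atb))   # True 2
print(is_idafa(ptb), count_nps(ptb))   # False 1
```

Each additional NP node is one more place where a following modifier could in principle be attached, which is one way to read the observation that iDAfas create different attachment contexts from flat base NPs.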
¹ Throughout this paper we use the Buckwalter Arabic transliteration scheme (Buckwalter, 2004).

In this paper we make their speculation precise and define the problem of attachment to construct state constructions in the ATB by extracting such iDAfa constructions² and their modifiers. We provide the first statistics we are aware of that quantify the number and complexity of iDAfas in the ATB and the variety of .
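The extraction step can be sketched in Python as follows, assuming Penn-style bracketed trees and simplified tags. The helpers (`parse`, `is_idafa`, `maximal_idafas`, `chain_terms`) and the example tree are hypothetical; the paper's actual extraction criteria are not given in this section. The sketch collects each maximal iDAfa (one that is not itself the NP complement of an enclosing iDAfa), lists the terms of its chain, and tallies chain lengths as one possible measure of complexity. The example phrase is mdyr $rkp Albtrwl fy mSr ('the director of the oil company in Egypt').

```python
import re
from collections import Counter
from typing import List, Union

Tree = Union[str, list]  # a leaf word, or [label, child, child, ...]

def parse(s: str) -> Tree:
    """Parse one bracketed tree such as '(NP (NOUN w) ...)' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok                    # a label or a leaf word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                          # consume the closing ")"
        return node

    return read()

def is_idafa(t: Tree) -> bool:
    """Hypothetical test: NP whose first child is a NOUN and whose second child is an NP."""
    return (
        isinstance(t, list) and len(t) >= 3 and t[0] == "NP"
        and isinstance(t[1], list) and t[1][:1] == ["NOUN"]
        and isinstance(t[2], list) and t[2][:1] == ["NP"]
    )

def maximal_idafas(t: Tree, inside: bool = False) -> List[list]:
    """Collect iDAfas that are not the NP complement of an enclosing iDAfa."""
    if not isinstance(t, list):
        return []
    here = is_idafa(t)
    found = [t] if here and not inside else []
    for i, child in enumerate(t[1:], start=1):
        # Only the NP complement (position 2) continues the current chain.
        found.extend(maximal_idafas(child, inside=here and i == 2))
    return found

def leaves(t: Tree) -> List[str]:
    """Words at the leaves of a subtree, left to right."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in leaves(child)]

def chain_terms(t: list) -> List[str]:
    """Head noun of each level of the chain, then the words of the innermost NP."""
    terms = []
    while is_idafa(t):
        terms.append(t[1][1])             # the word under the NOUN preterminal
        t = t[2]                          # descend into the NP complement
    terms.append(" ".join(leaves(t)))
    return terms

tree = parse("(NP (NP (NOUN mdyr) (NP (NOUN $rkp) (NP (DET+NOUN Albtrwl)))) "
             "(PP (PREP fy) (NP (NOUN_PROP mSr))))")
for idafa in maximal_idafas(tree):
    print(len(chain_terms(idafa)), chain_terms(idafa))   # 3 ['mdyr', '$rkp', 'Albtrwl']

# Chain-length histogram over (here) a single tree.
print(Counter(len(chain_terms(i)) for i in maximal_idafas(tree)))  # Counter({3: 1})
```

In this toy tree the PP fy mSr sits outside the iDAfa, so any of the three chain nouns is a plausible attachment site; treating each chain term as a candidate site is an illustrative assumption here, not a claim about the paper's task definition.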