Scientific report: "A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality"

Sarah Alkuhlani and Nizar Habash
Center for Computational Learning Systems, Columbia University
salkuhlani habash @

Abstract

We present an enriched version of the Penn Arabic Treebank (Maamouri et al., 2004), where latent features necessary for modeling morpho-syntactic agreement in Arabic are manually annotated. We describe our process for efficient annotation, and present the first quantitative analysis of Arabic morpho-syntactic phenomena.

1 Introduction

Arabic morphology is complex, partly because of its richness and partly because of its complex morpho-syntactic agreement rules, which depend on features not necessarily expressed in word forms, such as lexical rationality and functional gender and number. In this paper, we present an enriched version of the Penn Arabic Treebank (PATB, part 3) (Maamouri et al., 2004) that we manually annotated for these features.¹ We describe a process for how to do the annotation efficiently, and furthermore present the first quantitative analysis of morpho-syntactic phenomena in Arabic.

This resource is important for building computational models of Arabic morphology and syntax that account for morpho-syntactic agreement patterns. It has already been used to demonstrate added value for Arabic dependency parsing (Marton et al., 2011).

This paper is structured as follows: Sections 2 and 3 present relevant linguistic facts and related work, respectively. Section 4 describes our annotation process, and Section 5 presents an analysis of the annotated corpus.

¹ The annotations are publicly available for research purposes. Please contact authors. The PATB must be acquired through the Linguistic Data Consortium (LDC): http .

2 Linguistic Facts

Arabic has a rich and complex morphology. In addition to being both templatic (root/pattern) and concatenative (stems/affixes/clitics), Arabic's optional diacritics add to the degree of word ambiguity.
