Scientific report: "Identifying Broken Plurals, Irregular Gender, and Rationality in Arabic Text"

Sarah Alkuhlani and Nizar Habash
Center for Computational Learning Systems, Columbia University
sma2149 nh2142 @

Abstract

Arabic morphology is complex, partly because of its richness, and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns) and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morpho-syntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic. We compare two techniques: simple maximum likelihood (MLE) with back-off, and a support vector machine based sequence tagger (Yamcha). We study a number of orthographic, morphological, and syntactic learning features. Our results show that the MLE technique is preferred for words seen in the training data, while the Yamcha technique is optimal for unseen words, which are our real target. Furthermore, we show that for unseen words, morphological features help beyond orthographic features, and that syntactic features help even more. A combination of the two techniques improves overall performance even further.

1 Introduction

Arabic morphology is complex, partly because of its richness, and partly because of its complex morpho-syntactic agreement rules, which depend on functional features not necessarily expressed in word forms. Particularly challenging are broken plurals (which resemble singular nouns), nouns with irregular gender (masculine nouns that look feminine and feminine nouns that look masculine), and the semantic feature of rationality, which has no morphological realization (Smrz, 2007b; Alkuhlani and Habash, 2011). These features heavily participate in Arabic morpho-syntactic agreement. Alkuhlani and Habash (2011) show ...
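
As a rough illustration of the "maximum likelihood with back-off" idea named in the abstract, the sketch below picks the most frequent label observed for a word in training and backs off to coarser statistics for unseen words. It is not the authors' implementation: the back-off chain (word form, then part of speech, then the overall most frequent label), the label encoding (gender-number-rationality strings such as "M-P-I"), the function names, and the toy transliterated data are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_mle(examples):
    """Count labels per word form, per POS, and overall.

    examples: iterable of (word, pos, label) triples.
    The (word -> POS -> overall) back-off chain is an assumption of this
    sketch, not necessarily the chain used in the paper.
    """
    by_word = defaultdict(Counter)
    by_pos = defaultdict(Counter)
    overall = Counter()
    for word, pos, label in examples:
        by_word[word][label] += 1
        by_pos[pos][label] += 1
        overall[label] += 1
    return by_word, by_pos, overall

def predict(model, word, pos):
    """Return the most frequent label for the most specific key seen in training."""
    by_word, by_pos, overall = model
    for table, key in ((by_word, word), (by_pos, pos)):
        if key in table:  # membership test does not create defaultdict entries
            return table[key].most_common(1)[0][0]
    return overall.most_common(1)[0][0]

# Hypothetical toy training data; labels are gender-number-rationality
# (M/F, S/P, R = rational / I = irrational), an encoding invented for this sketch.
toy_data = [
    ("kitab", "noun", "M-S-I"),     # "book"
    ("kitab", "noun", "M-S-I"),
    ("kutub", "noun", "M-P-I"),     # "books", a broken plural
    ("madrasa", "noun", "F-S-I"),   # "school"
    ("mudarris", "noun", "M-S-R"),  # "teacher"
]
model = train_mle(toy_data)

print(predict(model, "kutub", "noun"))  # seen word: uses its own counts
print(predict(model, "qalam", "noun"))  # unseen word: backs off to POS counts
```

A lookup of this kind can only memorize what it has seen, which is consistent with the abstract's observation that the MLE technique is preferred for seen words while a feature-based tagger does better on unseen ones.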
