Scientific report: "Memory-Based Learning of Morphology with Stochastic Transducers"
This paper discusses the supervised learning of morphology using stochastic transducers, trained using the Expectation-Maximization (EM) algorithm. Two approaches are presented: first, using the transducers directly to model the process, and secondly, using them to define a similarity measure, related to the Fisher kernel method (Jaakkola and Haussler, 1998), and then using a Memory-Based Learning (MBL) technique. These are evaluated and compared on data sets from English, German, Slovene and Arabic.

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 513-520.

Alexander Clark, ISSCO / TIM, University of Geneva, UNI-MAIL, Boulevard du Pont-d'Arve, CH-1211 Genève 4, Switzerland

1 Introduction

Finite-state methods are in large part adequate to model morphological processes in many languages. A standard methodology is that of two-level morphology (Koskenniemi, 1983), which is capable of handling the complexity of Finnish, though it needs substantial extensions to handle non-concatenative languages such as Arabic (Kiraz, 1994). These models are primarily concerned with the mapping from deep lexical strings to surface strings, and within this framework learning is in general difficult (Itai, 1994). In this paper I present algorithms for learning the finite-state transduction between pairs of uninflected and inflected words, that is, supervised learning of morphology.
The techniques presented here are, however, applicable to learning other types of string transductions. Memory-based techniques, based on principles of non-parametric density estimation, are a powerful form of machine learning well-suited to natural language tasks. A particular strength is their ability to model both general rules and specific exceptions in a single framework (van den Bosch and Daelemans, 1999). However, they have generally only been used in supervised learning techniques.
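
To make the point about rules and exceptions concrete, the following is a minimal toy sketch of memory-based inflection, not the method of the paper. It assumes a deliberately crude similarity (length of the shared word ending) in place of the transducer-derived similarity measure the paper proposes, and all names (`suffix_rule`, `shared_suffix_len`, `MemoryBasedInflector`, `PAIRS`) are hypothetical. Stored exceptions are returned verbatim, while unseen words borrow the transformation of their most similar stored neighbour.

```python
from os.path import commonprefix

# Illustrative training pairs (uninflected, inflected); not data from the paper.
PAIRS = [
    ("walk", "walked"),
    ("jump", "jumped"),
    ("love", "loved"),
    ("try", "tried"),
    ("go", "went"),  # an exception that no general rule covers
]


def suffix_rule(lemma, inflected):
    """Return (strip, add): drop `strip` from the end of lemma, then append `add`."""
    stem = commonprefix([lemma, inflected])
    return lemma[len(stem):], inflected[len(stem):]


def shared_suffix_len(a, b):
    """Crude similarity (an assumption of this sketch): length of the common ending."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


class MemoryBasedInflector:
    def __init__(self, pairs):
        # The "memory" is simply every training example, kept as-is.
        self.memory = dict(pairs)

    def predict(self, lemma):
        # Exact match: the stored form wins, which is how exceptions survive.
        if lemma in self.memory:
            return self.memory[lemma]
        best = None
        for stored, inflected in self.memory.items():
            strip, add = suffix_rule(stored, inflected)
            if not lemma.endswith(strip):
                continue  # this neighbour's transformation cannot apply here
            score = shared_suffix_len(lemma, stored)
            if best is None or score > best[0]:
                best = (score, strip, add)
        if best is None:
            return lemma  # nothing applicable in memory; leave the word unchanged
        _, strip, add = best
        return lemma[: len(lemma) - len(strip)] + add


model = MemoryBasedInflector(PAIRS)
print(model.predict("go"))    # went   (stored exception)
print(model.predict("talk"))  # talked (nearest neighbour: walk)
print(model.predict("cry"))   # cried  (nearest neighbour: try)
```

A single store of examples therefore yields both regular behaviour and exceptions, with no separate rule component. The quality of the predictions depends entirely on the similarity function, which is the part a learned measure would replace.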