Scientific report: "A Cascaded Finite-State Parser for Syntactic Analysis of Swedish"

Proceedings of EACL '99

A Cascaded Finite-State Parser for Syntactic Analysis of Swedish

Dimitrios Kokkinakis and Sofie Johansson Kokkinakis
Department of Swedish, Språkdata, Box 200, SE-405 30, Göteborg University, Göteborg, Sweden
svedk svesj @

Abstract

This report describes the development of a parsing system for written Swedish and is focused on a grammar, the main component of the system, semi-automatically extracted from corpora. A cascaded, finite-state algorithm is applied to the grammar in which the input contains coarse-grained semantic class information, and the output produced reflects not only the syntactic structure of the input, but grammatical functions as well. The grammar has been tested on a variety of random samples of different text genres, achieving precision and recall of and respectively, and average crossing rate of , when evaluated against manually disambiguated annotated texts.

1 Introduction

This report describes a parsing system for fast and accurate analysis of large bodies of written Swedish. The grammar has been implemented in a modular fashion as finite-state cascaded machines, henceforth called Cass-SWE, a name adopted from the parser used, Cascaded analysis of syntactic structure (Abney, 1996).
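To make the idea of a finite-state cascade concrete, the following is a minimal sketch in Python, not the Cass-SWE grammar or the parser itself. It assumes the input is a list of (word, tag) pairs; the tag names (DET, ADJ, NOUN, VERB, PREP), the three levels, and the function names `apply_level` and `parse` are all hypothetical and chosen only for illustration. Each level is a set of regular expressions over the labels produced by the level below, and matched spans are reduced to a single phrase node before the next level runs.

```python
import re

# Hypothetical levels and patterns, for illustration only.
# Every tag in a pattern carries a trailing space so that matches
# always start and end on a token boundary.
LEVELS = [
    # Level 1: noun phrases built directly from part-of-speech tags.
    [("NP", r"(?:DET )?(?:ADJ )*(?:NOUN )+")],
    # Level 2: prepositional phrases built on level-1 output.
    [("PP", r"PREP NP ")],
    # Level 3: a simple clause built on level-1 and level-2 output.
    [("CLAUSE", r"NP VERB (?:NP )?(?:PP )*")],
]


def apply_level(nodes, rules):
    """Scan left to right, reducing the longest match at each position."""
    out = []
    i = 0
    while i < len(nodes):
        # Labels of the remaining nodes, one trailing space per label.
        rest = "".join(label + " " for label, _ in nodes[i:])
        best = None
        for name, pattern in rules:
            m = re.match(pattern, rest)
            if m and m.end() > 0:
                n = m.group(0).count(" ")  # number of nodes consumed
                if best is None or n > best[1]:
                    best = (name, n)
        if best:
            name, n = best
            out.append((name, nodes[i:i + n]))  # new phrase node
            i += n
        else:
            out.append(nodes[i])  # no rule applies: pass the node through
            i += 1
    return out


def parse(tagged):
    """Run every level in order; the output of one level feeds the next."""
    nodes = [(tag, word) for word, tag in tagged]
    for rules in LEVELS:
        nodes = apply_level(nodes, rules)
    return nodes


tagged = [("the", "DET"), ("old", "ADJ"), ("parser", "NOUN"),
          ("reads", "VERB"), ("texts", "NOUN"),
          ("in", "PREP"), ("Swedish", "NOUN")]
tree = parse(tagged)
print(tree[0][0], [child[0] for child in tree[0][1]])
```

Because a level never revisits its own output, the cascade is deterministic and makes one left-to-right pass per level; nodes that no rule covers are simply passed up unchanged, which is what lets such a parser return partial analyses instead of failing.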
Cass-SWE operates on part-of-speech annotated texts and is coupled with a pre-processing mechanism which distinguishes thousands of phrasal verbs, idioms, and multi-word expressions. Cass-SWE is designed in such a way that semantic information inherited from named-entity (NE) identification software is taken into consideration, and grammatical functions are extracted heuristically using finite-state transducers. The grammar has been manually acquired from open-source texts by observing legitimately adjacent part-of-speech chains, and how and which function words signal boundaries between phrasal constituents and clauses.

2 Background Cascaded Finite-State Automata

Finite-state technology has had a great impact on a variety of Natural
