Scientific report: "Memory-Based Morphological Analysis"

Antal van den Bosch and Walter Daelemans
ILK Computational Linguistics
Tilburg University
antalb walter @

Abstract

We present a general architecture for efficient and deterministic morphological analysis based on memory-based learning, and apply it to morphological analysis of Dutch. The system makes direct mappings from letters in context to rich categories that encode morphological boundaries, syntactic class labels, and spelling changes. Both precision and recall of labeled morphemes are over 84% on held-out dictionary test words and estimated to be over 93% in free text.

1 Introduction

Morphological analysis is an essential component in language engineering applications ranging from spelling error correction to machine translation. Performing a full morphological analysis of a wordform is usually regarded as a segmentation of the word into morphemes, combined with an analysis of the interaction of these morphemes that determines the syntactic class of the wordform as a whole. The complexity of wordform morphology varies widely among the world's languages, but is regarded as quite high even in relatively simple cases such as English. Many wordforms in English and other western languages contain ambiguities in their morphological composition that can be quite intricate.
General classes of linguistic knowledge that are usually assumed to play a role in this disambiguation process are knowledge of (i) the morphemes of a language, (ii) the morphotactics, i.e., constraints on how morphemes are allowed to attach, and (iii) spelling changes that can occur due to morpheme attachment. State-of-the-art systems for morphological analysis of wordforms are usually based on two-level finite-state transducers (FSTs; Koskenniemi, 1983). Even with the availability of sophisticated development tools, the cost and complexity of hand-crafting two-level rules is high, and the representation of concatenative compound morphology with continuation lexicons is .
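
The abstract's idea of mapping "letters in context" directly to categories can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the text above: the window width of two letters on each side, the plain overlap similarity, the nearest-neighbour voting, the toy English words, and the hypothetical labels "V" (a verb stem starts here), "SUF" (a suffix starts here) and "0" (no boundary). The only points carried over from the text are that each letter is classified from its surrounding letters and that all training instances are kept in memory.

```python
from collections import Counter

PAD = "_"  # padding symbol for positions beyond the word edges (assumption)


def windows(word, width=2):
    """Return one letter window per position: `width` letters left and right."""
    padded = PAD * width + word + PAD * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(word))]


def train(examples, width=2):
    """Memory-based 'training': store every (window, label) instance."""
    memory = []
    for word, labels in examples:
        if len(word) != len(labels):
            raise ValueError("need exactly one label per letter")
        memory.extend(zip(windows(word, width), labels))
    return memory


def overlap(a, b):
    """Similarity: number of window positions holding the same letter."""
    return sum(x == y for x, y in zip(a, b))


def classify(memory, window):
    """Return the majority label among the most similar stored instances."""
    best = max(overlap(window, stored) for stored, _ in memory)
    votes = Counter(label for stored, label in memory
                    if overlap(window, stored) == best)
    return votes.most_common(1)[0][0]


def analyse(memory, word, width=2):
    """Label every letter of `word` independently, left to right."""
    return [classify(memory, w) for w in windows(word, width)]


# Hypothetical toy data: one label per letter.
TOY_EXAMPLES = [
    ("walked", ["V", "0", "0", "0", "SUF", "0"]),
    ("talked", ["V", "0", "0", "0", "SUF", "0"]),
    ("walks",  ["V", "0", "0", "0", "SUF"]),
]

memory = train(TOY_EXAMPLES)
print(list(zip("talks", analyse(memory, "talks"))))
```

Because each letter is classified once from a fixed-size window, analysing a word takes a single deterministic pass. The labels in this sketch only mark boundaries and a coarse class; richer categories, such as those encoding spelling changes, would simply be additional label values under the same scheme.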
