A Freely Available Morphological Analyzer, Disambiguator and Context Sensitive Lemmatizer for German
Wolfgang Lezius
University of Paderborn, Cognitive Psychology
D-33098 Paderborn
lezius@

Reinhard Rapp
University of Mainz, Faculty of Applied Linguistics
D-76711 Germersheim
rapp@

Manfred Wettler
University of Paderborn, Cognitive Psychology
D-33098 Paderborn
wettler@

Abstract

In this paper we present Morphy, an integrated tool for German morphology, part-of-speech tagging and context-sensitive lemmatization. Its large lexicon of more than 320,000 word forms, plus its ability to process German compound nouns, guarantees wide morphological coverage. Syntactic ambiguities can be resolved with a standard statistical part-of-speech tagger. By using the output of the tagger, the lemmatizer can determine the correct root even for ambiguous word forms. The complete package is freely available and can be downloaded from the World Wide Web.

Introduction

Morphological analysis is the basis for many NLP applications, including syntax parsing, machine translation and automatic indexing. However, most morphology systems are components of commercial products. Often, as for example in machine translation, these systems are presented as black boxes, with the morphological analysis only used internally. This makes them unsuitable for research purposes.
To our knowledge, the only wide-coverage morphological lexicon readily available is for the English language (Karp, Schabes, et al., 1992). There have been attempts to provide free morphological analyzers to the research community for other languages, for example in the MULTEXT project (Armstrong, Russell, et al., 1995), which developed linguistic tools for six European languages. However, the lexicons provided are rather small for most languages. In the case of German, we hope to significantly improve this situation with the development of a new version of our morphological analyzer Morphy. In