Scientific report: "Lexicon acquisition with a large-coverage unification-based grammar"

Frederik Fouvry
Computational Linguistics, Saarland University, PO Box 15 11 50, D-66041 Saarbrücken, Germany
fouvry@

Abstract

We describe how unknown lexical entries are processed in a unification-based framework with large-coverage grammars, and how lexical entries are extracted from their usage. To keep the time and space usage during parsing within bounds, information from external sources such as part-of-speech (PoS) taggers and morphological analysers is taken into account when information is constructed for unknown words.

1 Introduction

For Natural Language Processing (NLP) in general, and for processing with linguistically rich frameworks more specifically, unknown words are a problem. The following gives an idea of the extent of the problem. In an evaluation of a large-scale grammar for unrestricted text on a newspaper corpus, we found that failed parses due to unknown words accounted for around 89% of the total number of unsuccessful analyses. Even though this figure does not say anything about the grammar (these failures may be hiding many others), it shows the importance of the problem. For unification-based implementations, which often refer to linguistic theories and are therefore rich in information, one approach to dealing with unknown words has been proposed several times: to exploit the syntactic context of completed analyses to collect information about a new word.
A few implementations have been developed to demonstrate the feasibility of the technique, but to our knowledge it has not yet been applied to large-coverage grammars. In this note we discuss how we are applying it to such a grammar for unrestricted text. Starting from this standard technique, we extend it and integrate PoS and morphological information originating from external resources. We will first describe the method of learning information from the syntactic context. Then we discuss the current results of our .
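The general idea of collecting information about an unknown word from its syntactic context, narrowed by a PoS tag, can be illustrated with a toy sketch. This is not the implementation discussed in the paper: the feature names (`cat`, `agr`, `num`), the tag table in `entry_from_pos`, and the `context` constraint are all hypothetical, and feature structures are simplified to nested dictionaries without types or reentrancy.

```python
def unify(a, b):
    """Unify two toy feature structures (nested dicts with string atoms).

    Returns the merged structure, or None if the two structures clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # incompatible values under the same feature
                result[key] = merged
            else:
                result[key] = b_val  # feature only present in b
        return result
    # Atoms (or an atom against a dict) unify only if they are equal.
    return a if a == b else None


def entry_from_pos(tag):
    """Hypothetical mapping from a PoS tag to a partial lexical entry.

    An unrecognised tag yields the empty, fully underspecified entry.
    """
    table = {"NN": {"cat": "noun"}, "VB": {"cat": "verb"}}
    return table.get(tag, {})


# Assumed constraint that a completed analysis places on the unknown word,
# e.g. the slot after a plural determiner.
context = {"cat": "noun", "agr": {"num": "pl"}}

# The tagger's guess restricts the entry; the context adds what was learned.
acquired = unify(entry_from_pos("NN"), context)
print(acquired)

# A tag that contradicts the context rules the analysis out early.
print(unify(entry_from_pos("VB"), context))
```

In this sketch the PoS tag does no more than prune incompatible hypotheses before the context is consulted, which is the intuition behind using external information to keep the search space small.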
