Scientific report: "Compounding and derivational morphology in a finite-state setting"

Jonas Kuhn
Department of Linguistics, The University of Texas at Austin
1 University Station, B5100, Austin, TX 78712-11196, USA
jonask@

Abstract

This paper proposes applying finite-state approximation techniques to a unification-based grammar of word formation for a language like German. A refinement of an RTN-based approximation algorithm is proposed that extends the state space of the automaton by selectively adding distinctions based on the parsing history at the point of entering a context-free rule. The selection of history items exploits the specific linguistic nature of word formation. As experiments show, this algorithm avoids an explosion of the size of the automaton in the approximation construction.

1 The locus of word formation rules in grammars for NLP

In English orthography, compounds following productive word formation patterns are spelled with spaces or hyphens separating the components (e.g., classic car repair workshop). This is convenient from an NLP perspective, since most aspects of word formation can be ignored from the point of view of the conceptually simpler token-internal processes of inflectional morphology, for which standard finite-state techniques can be applied. Let us assume that, to a first approximation, spaces and punctuation are used to identify token boundaries.
This spelling convention also makes it very easy to access one or more of the components of a compound, like classic car in the example, which is required in many NLP techniques (e.g., in a vector space model). If an NLP task for English requires detailed information about the structure of compounds as complex multi-token units, it is natural to use the formalisms of computational syntax for English (e.g., context-free grammars, or possibly unification-based grammars). This makes it possible to deal with the bracketing structure of compounding, which would be impossible to cover in full generality in the finite-state ...
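To make the notion of bracketing structure concrete, the following minimal sketch splits the example compound on spaces (the first-approximation token boundary assumed above) and enumerates every binary bracketing of its components. It is an illustration only, not the approximation algorithm proposed in the paper; the function names `bracketings` and `show` are hypothetical, and restricting the analysis to binary branching is an assumption made for the example.

```python
from functools import lru_cache


def bracketings(tokens):
    """Return all binary bracketings of a token sequence as nested tuples."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def build(i, j):
        # A single component is its own (trivial) analysis.
        if j - i == 1:
            return (tokens[i],)
        trees = []
        # Try every split point between position i and position j.
        for k in range(i + 1, j):
            for left in build(i, k):
                for right in build(k, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(tokens)))


def show(tree):
    """Render a nested tuple as a bracketed string."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(show(t) for t in tree) + "]"


# Assumption: spaces mark the boundaries between compound components.
components = "classic car repair workshop".split()
trees = bracketings(components)
rendered = [show(t) for t in trees]
for line in rendered:
    print(line)
```

For the four components of the example there are five binary bracketings, among them `[[classic car] [repair workshop]]`. The number of bracketings grows with the Catalan numbers, and distinguishing them requires tracking nested structure, which is what a context-free formalism provides.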
