Scientific report: "A Practical Classification of Multiword Expressions"

Radoslaw Moszczynski
Institute of Computer Science, Polish Academy of Sciences
Ordona 21, 01-237 Warszawa, Poland
rm@

Abstract

The paper proposes a methodology for dealing with multiword expressions in natural language processing applications. It provides a practically justified taxonomy of such units and suggests ways in which the individual classes can be processed computationally. While the study is currently limited to Polish and English, we believe our findings can be successfully employed in the processing of other languages, with emphasis on inflectional ones.

1 Introduction

It is generally acknowledged that multiword expressions constitute a serious difficulty in all kinds of natural language processing applications (Sag et al., 2002). It has also been shown that proper handling of such expressions can significantly improve parsing results (Zhang et al., 2006). The difficulties in processing multiword expressions result from their lexical variability and from the fact that many of them can undergo syntactic transformations. Another problem is that the label "multiword expressions" covers many linguistic units that often have little in common. We believe that past approaches to formalizing the phenomenon, such as IDAREX (Segond and Breidt, 1995) and Phrase Manager (Pedrazzini, 1994), suffered from trying to cover all multiword expressions as a whole.
Such an approach, as is shown below, cannot efficiently cover all the phenomena related to multiword expressions. Therefore, in the present paper we propose a taxonomy of multiword expressions that is useful for the purposes of natural language processing. The taxonomy is based on the stages of the NLP workflow at which the individual classes of units can be processed successfully. We also suggest tools that can be used for processing the units in each class.

2 An NLP Taxonomy of Multiword Expressions

At this stage