CONCEPTUAL ASSOCIATION FOR COMPOUND NOUN ANALYSIS

Mark Lauer
Microsoft Institute
65 Epping Road
North Ryde NSW 2113
AUSTRALIA
t-markl@

Abstract

This paper describes research toward the automatic interpretation of compound nouns using corpus statistics. An initial study aimed at syntactic disambiguation is presented. The approach bases its associations on thesaurus categories. Association data is gathered from unambiguous cases extracted from a corpus and is then applied to the analysis of ambiguous compound nouns. While the work presented is still in progress, a first attempt to syntactically analyse a test set of 244 examples shows 75% correctness. Future work is aimed at improving this accuracy and extending the technique to assign semantic role information, thus producing a complete interpretation.

INTRODUCTION

Compound Nouns

Compound nouns (CNs) are a commonly occurring construction in language, consisting of a sequence of nouns acting as a noun (pottery coffee mug, for example). For a detailed linguistic theory of compound noun syntax and semantics, see Levi (1978). Compound nouns are analysed syntactically by means of the rule N → N N, applied recursively. Compounds of more than two nouns are therefore ambiguous in syntactic structure.
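To make this ambiguity and the corpus-based approach outlined in the abstract more concrete, here is a minimal Python sketch. It enumerates the bracketings that the recursive rule N → N N allows, then chooses between the two bracketings of a three-noun compound using association data gathered from unambiguous two-noun compounds and generalised through thesaurus categories. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy THESAURUS, the invented UNAMBIGUOUS pairs, the use of raw category-pair counts as the association measure, and the decision rule in the hypothetical bracket_three function, which compares the first two nouns with the last two.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# A bracketing is either a single noun or a pair (left, right) of sub-bracketings,
# mirroring the recursive rule N -> N N.
Tree = Union[str, Tuple["Tree", "Tree"]]


def bracketings(nouns: List[str]) -> List[Tree]:
    """Enumerate every binary bracketing of a noun sequence under N -> N N."""
    if len(nouns) == 1:
        return [nouns[0]]
    trees: List[Tree] = []
    for split in range(1, len(nouns)):
        for left in bracketings(nouns[:split]):
            for right in bracketings(nouns[split:]):
                trees.append((left, right))
    return trees


def show(tree: Tree) -> str:
    """Render a bracketing as nested square brackets, e.g. [[coffee mug] holder]."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{show(left)} {show(right)}]"


# Hypothetical toy thesaurus mapping each noun to a single category (an assumption:
# a real thesaurus has far more categories and nouns that belong to several).
THESAURUS: Dict[str, str] = {
    "pottery": "MATERIAL", "clay": "MATERIAL",
    "coffee": "BEVERAGE", "tea": "BEVERAGE",
    "mug": "VESSEL", "cup": "VESSEL",
    "holder": "SUPPORT", "rack": "SUPPORT",
}

# Invented stand-ins for unambiguous two-noun compounds extracted from a corpus.
UNAMBIGUOUS: List[Tuple[str, str]] = [
    ("coffee", "cup"), ("tea", "cup"), ("tea", "mug"),
    ("clay", "mug"), ("cup", "rack"),
]


def category_counts(pairs: List[Tuple[str, str]], thesaurus: Dict[str, str]) -> Counter:
    """Count (modifier category, head category) pairs; raw counts stand in for association."""
    return Counter(
        (thesaurus[modifier], thesaurus[head])
        for modifier, head in pairs
        if modifier in thesaurus and head in thesaurus
    )


COUNTS = category_counts(UNAMBIGUOUS, THESAURUS)


def bracket_three(n1: str, n2: str, n3: str,
                  counts: Counter, thesaurus: Dict[str, str]) -> Tree:
    """Pick [[n1 n2] n3] or [n1 [n2 n3]] by comparing category associations.

    Assumed rule: group n1 with n2 if their categories co-occur at least as often
    as those of n2 and n3; ties go to the left-branching analysis.
    """
    c1, c2, c3 = (thesaurus[n] for n in (n1, n2, n3))
    if counts[(c1, c2)] >= counts[(c2, c3)]:
        return ((n1, n2), n3)
    return (n1, (n2, n3))


if __name__ == "__main__":
    # Both bracketings licensed by N -> N N for a three-noun compound.
    print([show(t) for t in bracketings(["pottery", "coffee", "mug"])])
    # The choice made by the assumed decision rule on the toy data.
    print(show(bracket_three("coffee", "mug", "holder", COUNTS, THESAURUS)))
    print(show(bracket_three("pottery", "coffee", "mug", COUNTS, THESAURUS)))
```

With the toy data, bracket_three returns [[coffee mug] holder] and [pottery [coffee mug]], even though neither pottery, holder, nor the pair coffee mug occurs in UNAMBIGUOUS: the category-level counts let evidence from related words carry over to unseen compounds. The number of bracketings grows quickly with length (two for three nouns, five for four), which is why a scoring method is needed rather than a fixed preference.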
A necessary part of producing an interpretation of a CN is an analysis of the attachments within the compound. Syntactic parsers cannot choose an appropriate analysis because these attachments are not syntactically governed. The current work presents a system for automatically deriving a syntactic analysis of arbitrary CNs in English using corpus statistics.

Task description

The initial task can be formulated as choosing the most probable binary bracketing for a given noun sequence, known to form a compound noun, without knowledge of the context. For example:

[pottery [coffee mug]]
[[coffee mug] holder]

Corpus Statistics

The need for wide-ranging lexical-semantic knowledge to support NLP, commonly referred to as the ACQUISITION PROBLEM, has generated a