Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax

Christian Jacquemin
Institut de Recherche en Informatique de Nantes
BP 92208, 2 chemin de la Houssinière
44322 NANTES Cedex 3, FRANCE

Judith L. Klavans
Center for Research on Information Access
Columbia University
535 W. 114th Street, MC 1101
New York, NY 10027, USA

Evelyne Tzoukermann
Bell Laboratories, Lucent Technologies
700 Mountain Avenue, 2D-448, P.O. Box 636
Murray Hill, NJ 07974, USA

Abstract

This paper presents a system for the automatic production of controlled index terms using linguistically motivated techniques. The system includes a finite-state part-of-speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser that applies transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.

1 Motivation

Terms are known to be excellent descriptors of the informational content of textual documents (Srinivasan, 1996), but they are subject to numerous linguistic variations. Terms cannot be retrieved properly with coarse text simplification techniques (e.g., stemming); their identification requires precise and efficient NLP techniques. We have developed a domain-independent system for automatic term recognition from unrestricted text. The system presented in this paper takes as input a list of controlled terms and a corpus; it detects and marks occurrences of term

We would like to thank the NLP Group of Columbia University, Bell Laboratories - Lucent Technologies, and the Institut Universitaire de Technologie de Nantes for their support of the exchange visitor.
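The abstract combines a seed list of controlled terms with derivational morphology and syntactic variation patterns. The following is a minimal Python sketch of that general idea under stated assumptions, not the authors' system: it assumes tokens already carry part-of-speech tags from a toy tag set, replaces the morphological processor with a small hand-written table (the hypothetical `FAMILIES`), and replaces the unification-based parser with two toy rules for a two-word seed term (modifier M, head H). All names here (`find_variants`, `_skip_modifiers`, `max_gap`, the rule labels) are illustrative.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]      # (word, toy POS tag such as N, A, V, P, DT)
Span = Tuple[int, int, str]  # (start, end_exclusive, rule name)

# Hypothetical hand-written morphological families. Assumption: the real
# system generates such families with a derivational morphological processor.
FAMILIES: Dict[str, Set[str]] = {
    "temperature": {"temperature", "temperatures"},
    "measurement": {"measurement", "measurements", "measure",
                    "measures", "measured", "measuring"},
}

def family(word: str) -> Set[str]:
    """Morphological family of a seed word; the word itself if unknown."""
    return FAMILIES.get(word, {word})

def _skip_modifiers(tokens: List[Token], k: int, limit: int) -> List[int]:
    """Candidate positions reachable from k by skipping 0..limit A/N tokens."""
    positions = [k]
    while len(positions) <= limit and k < len(tokens) and tokens[k][1] in {"A", "N"}:
        k += 1
        positions.append(k)
    return positions

def find_variants(term: Tuple[str, str], tokens: List[Token],
                  max_gap: int = 2) -> List[Span]:
    """Detect a two-word seed term (modifier, head) and toy variants of it.

    Rule 'modifier-head': M' (A|N){0,max_gap} H'
        e.g. "temperature measurement", "temperature measurements"
    Rule 'head-modifier': H' (of)? (DT)? (A|N){0,max_gap} M'
        e.g. "measurement of the temperature", "measured the body temperature"
    M' and H' range over the morphological families of M and H.
    """
    mod_fam, head_fam = family(term[0]), family(term[1])
    words = [w.lower() for w, _ in tokens]
    spans: List[Span] = []
    for i, w in enumerate(words):
        if w in mod_fam:
            # Keep only the nearest head to the right of the modifier.
            for j in _skip_modifiers(tokens, i + 1, max_gap):
                if j < len(words) and words[j] in head_fam:
                    spans.append((i, j + 1, "modifier-head"))
                    break
        if w in head_fam:
            k = i + 1
            if k < len(words) and words[k] == "of":  # optional preposition
                k += 1
            if k < len(words) and tokens[k][1] == "DT":  # optional determiner
                k += 1
            for j in _skip_modifiers(tokens, k, max_gap):
                if j < len(words) and words[j] in mod_fam:
                    spans.append((i, j + 1, "head-modifier"))
                    break
    return spans

if __name__ == "__main__":
    sentence = [("We", "PRO"), ("measured", "V"), ("the", "DT"), ("body", "N"),
                ("temperature", "N"), ("before", "P"), ("each", "DT"),
                ("temperature", "N"), ("measurement", "N"), (".", "PUNCT")]
    print(find_variants(("temperature", "measurement"), sentence))
    # prints [(1, 5, 'head-modifier'), (7, 9, 'modifier-head')]
```

In this sketch, the verbal phrase "measured the body temperature" is linked to the seed term "temperature measurement" only because the morphology table relates "measured" to "measurement" and the syntactic rule tolerates the determiner and the inserted noun; neither component alone would produce that match. The rules favour simplicity over precision (for example, "the temperature measured in" would also match), which is the kind of trade-off a precision/recall evaluation is meant to expose.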
