Improving Automatic Indexing through Concept Combination and Term Enrichment
Christian Jacquemin
LIMSI-CNRS, BP 133, F-91403 ORSAY Cedex, FRANCE
jacquemin@limsi.fr

Abstract
Although indexes may overlap, the output of an automatic indexer is generally presented as a flat and unstructured list of terms. Our purpose is to exploit term overlap and embedding so as to yield a substantial qualitative and quantitative improvement in automatic indexing through concept combination. The increase in the volume of indexing is 10.5% for free indexing and 52.3% for controlled indexing. The resulting structure of the indexed corpus is a partial conceptual analysis.

1 Overview
The method proposed here for improving automatic indexing builds partial syntactic structures by combining overlapping indexes. It is complemented by a method for term acquisition, which is described in (Jacquemin, 1996). The text thus structured is reindexed: new indexes are produced and new candidates are discovered. Most NLP approaches to automatic indexing concern free indexing and rely on large-scale shallow parsers with a particular concern for dependency relations (Strzalkowski, 1996).
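To make the distinction between overlap and embedding concrete, here is a minimal Python sketch. It assumes that each index is recorded as a term plus a half-open span [start, end) of token positions; the names `Index`, `relation`, and `combine`, as well as the example sentence and its indexes, are hypothetical and are not taken from FASTR or CONPARS. Two indexes overlap when they share tokens without either containing the other; one is embedded in the other when its span lies inside the other's. The grouping step only merges indexes that share tokens into flat groups, a simplified stand-in for the partial syntactic structures described above rather than the paper's method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Index:
    """Hypothetical record: a term covering tokens [start, end) of a sentence."""
    term: str
    start: int
    end: int


def relation(a: Index, b: Index) -> str:
    """Classify two index occurrences as 'disjoint', 'embedded' or 'overlap'."""
    if a.end <= b.start or b.end <= a.start:
        return "disjoint"  # no shared token
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "embedded"  # one span lies inside the other
    return "overlap"  # shared tokens, neither span contains the other


def combine(indexes: List[Index]) -> List[Tuple[int, int, List[str]]]:
    """Merge occurrences linked by shared tokens into (start, end, terms) groups."""
    groups: List[Tuple[int, int, List[str]]] = []
    # Visit spans by start position (longest first on ties) and extend the current group.
    for idx in sorted(indexes, key=lambda i: (i.start, -i.end)):
        if groups and idx.start < groups[-1][1]:
            start, end, terms = groups[-1]
            groups[-1] = (start, max(end, idx.end), terms + [idx.term])
        else:
            groups.append((idx.start, idx.end, [idx.term]))
    return groups


# Toy data, invented for illustration.
tokens = "fungal disease resistance of winter wheat".split()
found = [
    Index("fungal disease", 0, 2),
    Index("disease resistance", 1, 3),
    Index("winter wheat", 4, 6),
    Index("wheat", 5, 6),
]
print(relation(found[0], found[1]))  # overlap
print(relation(found[2], found[3]))  # embedded
for start, end, terms in combine(found):
    print(" ".join(tokens[start:end]), terms)
# fungal disease resistance ['fungal disease', 'disease resistance']
# winter wheat ['winter wheat', 'wheat']
```

Sorting by start position is the usual interval-merging technique: because spans are visited in order, a new span shares tokens with the current group exactly when it starts before the group's end.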
For the purpose of controlled indexing, we exploit the output of an NLP-based indexer and the structural relations between terms and variants in order to (1) enhance the coverage of the indexes, (2) incrementally build an a posteriori conceptual analysis of the document, and (3) interweave controlled indexing, free indexing, and thesaurus acquisition. These three goals are achieved by CONPARS (CONceptual PARSer), presented in this paper and illustrated by Figure 1. CONPARS is based on the output of a part-of-speech tagger for French described in (Tzoukermann and Radev, 1997) and of FASTR, a controlled indexer (Jacquemin et al., 1997). All the experiments reported in this paper are performed on data in

We thank INIST-CNRS for providing us with thesauri and corpora in the agricultural domain, and AFIRST for supporting this research through the SKETCHI project.
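The interplay between controlled indexing, free indexing, and thesaurus acquisition can be pictured as a loop that indexes the text, proposes new candidate terms, adds them to the thesaurus, and reindexes until nothing new appears. The Python sketch below is a hypothetical illustration under strong simplifying assumptions: controlled indexing is reduced to exact matching of token sequences (FASTR, by contrast, also recognizes term variants), a candidate is simply the union of two overlapping indexes, and the sentence and seed thesaurus are toy data. The functions `controlled_index`, `overlap_candidates`, and `index_incrementally` are illustrative names, not CONPARS code.

```python
from typing import List, Set, Tuple

Term = Tuple[str, ...]  # a term as a tuple of tokens (hypothetical representation)
Span = Tuple[int, int]  # token positions [start, end)


def controlled_index(tokens: List[str], thesaurus: Set[Term]) -> List[Span]:
    """Simplified controlled indexing: spans where a thesaurus term occurs verbatim."""
    spans = []
    for term in thesaurus:
        n = len(term)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == term:
                spans.append((i, i + n))
    return sorted(spans)


def overlap_candidates(tokens: List[str], spans: List[Span]) -> Set[Term]:
    """Simplified free indexing: the union of two overlapping spans is a candidate term."""
    found: Set[Term] = set()
    for a_start, a_end in spans:
        for b_start, b_end in spans:
            if a_start < b_start < a_end < b_end:  # a starts first and they share tokens
                found.add(tuple(tokens[a_start:b_end]))
    return found


def index_incrementally(tokens: List[str], thesaurus: Set[Term], max_rounds: int = 10):
    """Alternate indexing and term acquisition until no new candidate appears."""
    thesaurus = set(thesaurus)  # copy, so the caller's thesaurus is left unchanged
    for _ in range(max_rounds):  # safety bound on the number of passes
        spans = controlled_index(tokens, thesaurus)
        new_terms = overlap_candidates(tokens, spans) - thesaurus
        if not new_terms:
            break
        thesaurus |= new_terms  # acquired terms are used by the next indexing pass
    return controlled_index(tokens, thesaurus), thesaurus


tokens = "fungal disease resistance of winter wheat".split()
seed = {("fungal", "disease"), ("disease", "resistance"), ("winter", "wheat")}
spans, enriched = index_incrementally(tokens, seed)
print(len(controlled_index(tokens, seed)), "->", len(spans))  # 3 -> 4
```

On this toy input the first pass acquires "fungal disease resistance", the second pass finds no further candidate, and the number of indexes grows from 3 to 4. The `max_rounds` bound is only a safeguard: the loop terminates anyway, because every candidate is a contiguous token sequence of a finite text and only unseen terms are added.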