Scientific report: "An Integrated Platform for Computer-Aided Terminology"

Proceedings of EACL '99

An Integrated Platform for Computer-Aided Terminology

Didier Bourigault
ERSS, UMR 5610 CNRS
Maison de la Recherche
5 allées Antonio Machado
31058 Toulouse cedex, FRANCE

Christian Jacquemin
LIMSI-CNRS
BP 133
91403 ORSAY, FRANCE
jacquemin@

Abstract

A novel technique for automatic thesaurus construction is proposed. It is based on the complementary use of two tools: (1) a Term Extraction tool that acquires term candidates from tagged corpora through a shallow grammar of noun phrases, and (2) a Term Clustering tool that groups syntactic variants (insertions). Experiments performed on corpora in three technical domains yield clusters of term candidates with precision rates between 93% and 98%.

1 Computational Terminology

In the domain of corpus-based terminology, two types of tools are currently being developed: tools for automatic term extraction (Bourigault, 1993; Justeson and Katz, 1995; Daille, 1996; Brun, 1998) and tools for automatic thesaurus construction (Grefenstette, 1994). These tools are expected to be complementary, in the sense that the links and clusters proposed in automatic thesaurus construction can be exploited for structuring the term candidates produced by the automatic term extractors.

In fact, complementarity is difficult to achieve because term extractors provide mainly multi-word terms, while tools for automatic thesaurus construction yield clusters of single-word terms. On the one hand, term extractors focus on multi-word terms for ontological motivations: single-word terms are too polysemous and too generic, and it is therefore necessary to provide the user with multi-word terms that represent finer concepts in a domain. The counterpart of this focus is that automatic term extractors yield large volumes of data that require structuring through a postprocessor. On the other hand, tools for automatic thesaurus construction focus on single-word terms for practical reasons. Since they cluster terms through statistical measures of context similarities, these tools exploit recurring situations. Since single-word terms denote broader concepts than multi-word terms, they appear more frequently in corpora and are therefore more appropriate for statistical clustering.

The contribution of this paper is to propose an integrated platform for computer-aided term extraction and structuring that results from the combination of LEXTER, a Term Extraction tool (Bourigault et al., 1996), and FASTR, a Term Normalization tool (Jacquemin et al., 1997).
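
To make the idea of grouping insertion variants mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the actual method of LEXTER or FASTR, which work on tagged corpora with linguistic rules. It only assumes that term candidates are plain strings of space-separated words, and it treats a candidate as an insertion variant of a base term when it keeps the base term's first and last words and adds one or more words in between. The function names and the example terms are hypothetical.

```python
from collections import defaultdict


def is_insertion_variant(base, candidate):
    """Return True if `candidate` is `base` plus one or more inserted words.

    Both arguments are tuples of words. The first and last words of `base`
    must stay at the edges of `candidate` (an assumption of this sketch).
    """
    if len(base) < 2 or len(candidate) <= len(base):
        return False
    if candidate[0] != base[0] or candidate[-1] != base[-1]:
        return False
    # `word in remaining` consumes the iterator up to the match, so this
    # checks that the words of `base` occur in order inside `candidate`.
    remaining = iter(candidate)
    return all(word in remaining for word in base)


def cluster_insertion_variants(candidates):
    """Map each base term to the list of its insertion variants."""
    terms = [tuple(c.split()) for c in candidates]
    clusters = defaultdict(list)
    for base in terms:
        for cand in terms:
            if is_insertion_variant(base, cand):
                clusters[" ".join(base)].append(" ".join(cand))
    return dict(clusters)


if __name__ == "__main__":
    # Invented term candidates, for illustration only.
    example = [
        "blood cell",
        "blood mononuclear cell",
        "peripheral blood cell",
        "cell count",
    ]
    print(cluster_insertion_variants(example))
```

The pairwise comparison is quadratic in the number of candidates, which is acceptable for a toy list; a real system would index candidates by their head and modifier words before comparing them.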
