Scientific report: "Knowledge Acquisition from Texts: Using an Automatic Clustering Method Based on Noun-Modifier Relationship"

Houssem Assadi
Electricité de France - DER IMA and Paris 6 University - LAFORIA
1 avenue du General de Gaulle, F-92141 Clamart, France

Abstract

We describe the early stage of our methodology of knowledge acquisition from technical texts. First, a partial morpho-syntactic analysis is performed to extract "candidate terms". Then, the knowledge engineer, assisted by an automatic clustering tool, builds the "conceptual fields" of the domain. We focus on this conceptual analysis stage, describe the data prepared from the results of the morpho-syntactic analysis, and show the results of the clustering module and their interpretation. We found that syntactic links represent good descriptors for candidate terms clustering, since the clusters are often easily interpreted as "conceptual fields".

1 Introduction

Knowledge Acquisition (KA) from technical texts is a growing research area among the Knowledge-Based Systems (KBS) research community, since documents containing a large amount of technical knowledge are available on electronic media. We focus on the methodological aspects of KA from texts. In order to build up the model of the subject field, we need to perform a corpus-based semantic analysis.
Prior to the semantic analysis, a morpho-syntactic analysis is performed by LEXTER, a terminology extraction software (Bourigault et al., 1996). LEXTER gives a network of noun phrases which are likely to be terminological units and which are connected by syntactical links. When dealing with medium-sized corpora (a few hundred thousand words), the terminological network is too voluminous for analysis by hand, and it becomes necessary to use data analysis tools to process it. The main idea to make KA from medium-sized corpora a feasible and efficient task is to perform a robust syntactic analysis (using LEXTER, see section 2) followed by a semi-automatic semantic analysis where ...
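
To make the idea of clustering candidate terms by their syntactic links more concrete, here is a minimal sketch in Python. It is not the clustering module described in the paper, whose algorithm is not given in this excerpt. It assumes each head noun is described by the set of modifiers it occurs with, and it uses Jaccard similarity with a simple single-link threshold as stand-ins. The term list, the modifier sets, and the names `MODIFIERS`, `jaccard`, and `cluster_terms` are all hypothetical and invented for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each head noun maps to the set of modifiers it was
# observed with. These noun-modifier pairs are invented for illustration and
# do not come from the paper's corpus.
MODIFIERS = {
    "pump": {"primary", "feedwater", "auxiliary"},
    "valve": {"primary", "auxiliary", "safety"},
    "circuit": {"primary", "auxiliary", "feedwater"},
    "report": {"annual", "technical"},
    "document": {"technical", "annual", "internal"},
}


def jaccard(a, b):
    """Jaccard similarity between two sets of modifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(descriptors, threshold=0.4):
    """Single-link grouping (an assumed stand-in method): two terms end up in
    the same cluster when a chain of pairwise similarities >= threshold
    connects them."""
    parent = {term: term for term in descriptors}

    def find(term):
        # Follow parent links to the cluster representative.
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for t1, t2 in combinations(descriptors, 2):
        if jaccard(descriptors[t1], descriptors[t2]) >= threshold:
            parent[find(t1)] = find(t2)

    clusters = {}
    for term in descriptors:
        clusters.setdefault(find(term), set()).add(term)
    # Sort members and clusters so the result is deterministic.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    for cluster in cluster_terms(MODIFIERS):
        print(cluster)
```

On this toy data the terms fall into an equipment-like group and a documentation-like group, which a knowledge engineer could then inspect and name as candidate "conceptual fields". The threshold is an arbitrary choice here and would need tuning on real LEXTER output.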
