Automatic Collection of Related Terms from the Web

Satoshi Sato and Yasuhiro Sasaki
Graduate School of Informatics, Kyoto University
Sakyo, Kyoto, 606-8501, Japan
sato@i.kyoto-u.ac.jp, sasaki@pine.kuee.kyoto-u.ac.jp

Abstract

This paper proposes a method of collecting a dozen terms that are closely related to a given seed term. The proposed method consists of three steps. The first step, corpus compilation, collects texts that contain the given seed term by using search engines. The second step, automatic term recognition, extracts important terms from the corpus by using Nakagawa's method. These extracted terms become the candidates for the final step. The final step, filtering, removes inappropriate terms from the candidates based on search engine hits. An evaluation shows that the precision of the method is 85%.

1 Introduction

This study aims to realize an automatic method of collecting technical terms that are related to a given seed term. For example, when "natural language processing" is given as a seed term, the method is expected to collect technical terms that are related to natural language processing, such as morphological analysis, parsing, information retrieval, and machine translation. The target application of the method is automatic or semi-automatic compilation of a glossary or technical-term dictionary for a certain domain.
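The three steps named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the in-memory `DOCS` list stands in for the Web, `hits` mocks search engine hit counts, candidate extraction uses a simple stopword-delimited phrase count rather than Nakagawa's method, and the co-occurrence ratio with its `threshold` is a hypothetical filtering rule, not the one used in the paper.

```python
from collections import Counter

# Toy stand-in for the Web (assumption); a real system would query search engines.
DOCS = [
    "natural language processing and morphological analysis",
    "natural language processing for machine translation",
    "morphological analysis in natural language processing",
    "machine translation with natural language processing",
    "natural language processing for the stock market",
    "the stock market and the economy",
    "news about the stock market",
    "the stock market in the news",
]

STOPWORDS = {"and", "for", "in", "with", "the", "about"}


def hits(*phrases):
    """Mock hit count: number of documents containing every phrase."""
    return sum(all(p in doc for p in phrases) for doc in DOCS)


def compile_corpus(seed):
    """Step 1: collect texts that contain the seed term."""
    return [doc for doc in DOCS if seed in doc]


def extract_candidates(corpus, seed, top_n=5):
    """Step 2: rank candidate terms by frequency.

    Candidates are maximal runs of non-stopwords. This is a simple
    stand-in heuristic, NOT Nakagawa's method.
    """
    counts = Counter()
    for doc in corpus:
        run = []
        for word in doc.split() + [None]:  # None flushes the last run
            if word is None or word in STOPWORDS:
                term = " ".join(run)
                if term and term != seed:
                    counts[term] += 1
                run = []
            else:
                run.append(word)
    return [term for term, _ in counts.most_common(top_n)]


def filter_candidates(seed, candidates, threshold=0.5):
    """Step 3: drop candidates that rarely co-occur with the seed.

    The ratio hits(seed AND term) / hits(term) is a hypothetical criterion.
    """
    kept = []
    for term in candidates:
        total = hits(term)
        if total and hits(seed, term) / total >= threshold:
            kept.append(term)
    return kept


def related_terms(seed):
    """Run the three steps in order for one seed term."""
    corpus = compile_corpus(seed)
    candidates = extract_candidates(corpus, seed)
    return filter_candidates(seed, candidates)
```

In this toy data, "stock market" is extracted as a candidate because it appears once alongside the seed, but it is filtered out because most of its occurrences are unrelated to the seed.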
Recursive application of the method makes it possible to collect a list of terms that are used in a certain domain; the list becomes a glossary of the domain. A technical-term dictionary can be compiled by adding an explanation for every term in the glossary, which is performed by a term explainer (Sato, 2001).

Figure 1: System configuration

Automatic acquisition of technical terms in a certain domain has been studied as automatic term recognition (Kageura and Umino, 1996; Kageura and Koyama, 2000), and these methods require a large corpus that is manually prepared for a target domain. In contrast, our system, which is proposed in this paper, requires only a seed.
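The recursive application mentioned above, in which each collected term becomes a new seed, can be sketched as a breadth-first expansion. This is only an illustration under stated assumptions: `TOY_RELATED` is invented data, `collect_related` is a hypothetical stand-in for the full collection method, and the `max_depth` limit is not something the paper specifies.

```python
from collections import deque

# Invented toy relations (assumption); a real system would run the
# related-term collection method for each term instead of a lookup.
TOY_RELATED = {
    "natural language processing": ["parsing", "machine translation"],
    "parsing": ["natural language processing", "morphological analysis"],
    "machine translation": ["parsing"],
}


def collect_related(term):
    """Hypothetical stand-in for collecting terms related to one seed."""
    return TOY_RELATED.get(term, [])


def build_glossary(seed, max_depth=2):
    """Expand from the seed breadth-first, using each new term as a seed."""
    glossary = [seed]
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand terms at the depth limit
        for related in collect_related(term):
            if related not in seen:
                seen.add(related)
                glossary.append(related)
                queue.append((related, depth + 1))
    return glossary
```

The `seen` set keeps terms that point back to each other from being collected twice, and the depth limit is one simple way to keep the expansion from drifting out of the target domain.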