tailieunhanh - Scientific report: "Hierarchy Extraction based on Inclusion of Appearance"
Hierarchy Extraction based on Inclusion of Appearance

Eiko Yamamoto, Kyoko Kanzaki, Hitoshi Isahara
Computational Linguistics Group, National Institute of Information and Communications Technology
3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
eiko@ kanzaki@ isahara@

Abstract

In this paper, we propose a method of automatically extracting word hierarchies from corpora based on the inclusion relation of appearance patterns. We apply a complementary similarity measure to find a hierarchical word structure. This similarity measure was developed for the recognition of degraded machine-printed text in the field and can be applied to estimate one-to-many relations. Our purpose is to extract word hierarchies from corpora automatically. As the initial task, we attempt to extract hierarchies of abstract nouns co-occurring with adjectives in Japanese and compare them with hierarchies in the EDR electronic dictionary.

1 Introduction

The hierarchical relations of words are useful as language resources. Hierarchical semantic lexical databases such as WordNet (Miller et al., 1990) and the EDR electronic dictionary (1995) are used for NLP research worldwide to fully understand a word's meaning. In current thesauri that take the form of hierarchical relations, words are categorized manually and classified in a top-down manner based on human intuition. This is a good way to make a lexical database for users having a specific purpose. However, word hierarchies based on human intuition tend to vary greatly depending on the lexicographer.
In addition, hierarchical relations based on various data may be needed depending on each user. Accordingly, we try to extract a hierarchical relation of words automatically and statistically. In previous research, ways of extracting such relations from definition sentences in dictionaries (Tsurumaru et al., 1986; Shoutsu et al., 2003) or from a corpus by using patterns such as "a part of", "is-a", or "and" (Berland and Charniak, 1999; Caraballo, 1999) have been proposed.
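
As a rough illustration of what an "inclusion relation of appearance patterns" could look like, the following is a minimal sketch and not the complementary similarity measure used in this paper. It assumes each noun is represented by the set of adjectives it co-occurs with, and treats a noun as a candidate upper-level word when its set strictly contains another noun's set. The function names, the threshold parameter, and the English toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def inclusion_ratio(broader: Set[str], narrower: Set[str]) -> float:
    """Fraction of `narrower`'s co-occurring words that also appear with `broader`."""
    if not narrower:
        return 0.0
    return len(broader & narrower) / len(narrower)


def candidate_hypernym_pairs(
    appearance: Dict[str, Set[str]], threshold: float = 1.0
) -> List[Tuple[str, str]]:
    """Return (upper, lower) pairs where upper's pattern includes lower's.

    `threshold` is a hypothetical knob: 1.0 demands full inclusion,
    lower values tolerate noise in the co-occurrence data.
    """
    pairs = []
    for upper, upper_ctx in appearance.items():
        for lower, lower_ctx in appearance.items():
            if upper == lower:
                continue
            # Only a strictly larger pattern can count as the upper-level word.
            if len(upper_ctx) <= len(lower_ctx):
                continue
            if inclusion_ratio(upper_ctx, lower_ctx) >= threshold:
                pairs.append((upper, lower))
    return sorted(pairs)


# Invented toy data: noun -> adjectives it co-occurs with (not from the paper).
appearance = {
    "feeling": {"happy", "sad", "lonely", "calm"},
    "emotion": {"happy", "sad"},
    "mood": {"calm", "sad"},
    "size": {"large", "small"},
}

print(candidate_hypernym_pairs(appearance))
```

On this toy data, "feeling" comes out above both "emotion" and "mood", while "size" stays unrelated. A statistical measure, rather than plain set inclusion, would be needed to handle frequency and noise in a real corpus.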