A Supervised Learning Approach to Automatic Synonym Identification based on Distributional Features

Masato Hagiwara
Graduate School of Information Science, Nagoya University
Furo-cho, Chikusa-ku, Nagoya 464-8603, JAPAN
hagiwara@

Abstract

Distributional similarity has been widely used to capture the semantic relatedness of words in many NLP tasks. However, various parameters, such as similarity measures, must be hand-tuned to make it work effectively. Instead, we propose a novel approach to synonym identification based on supervised learning and distributional features, which correspond to the commonality of individual context types shared by word pairs. Considering the integration with pattern-based features, we have built and compared five synonym classifiers. The evaluation experiment has shown a dramatic performance increase of over 120% in F-1 measure compared to the conventional similarity-based classification. On the other hand, the pattern-based features turned out to be almost redundant.

1 Introduction

Semantic similarity of words is one of the most important kinds of lexical knowledge for NLP tasks, including word sense disambiguation and automatic thesaurus construction. To measure the semantic relatedness of words, a concept called distributional similarity has been widely used.
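The abstract's notion of distributional features (one feature per context type, measuring how strongly a word pair shares that context) can be sketched as follows. This is a minimal sketch under assumptions not stated in this excerpt: the co-occurrence counts are invented toy data, the context labels such as `obj:drive` are hypothetical dependency-style contexts, words are weighted by positive PMI, and the "commonality" of a context type is taken to be the smaller of the two words' weights. The paper's actual feature definition may differ.

```python
import math
from collections import Counter

# Hypothetical toy word-context co-occurrence counts (not from the paper).
COOC = {
    "car": Counter({"obj:drive": 8, "mod:fast": 4, "obj:park": 3}),
    "automobile": Counter({"obj:drive": 5, "mod:fast": 2, "obj:repair": 1}),
    "banana": Counter({"obj:eat": 6, "mod:ripe": 3}),
}


def ppmi_weights(cooc):
    """Positive PMI weight for every observed (word, context) pair."""
    total = sum(sum(counts.values()) for counts in cooc.values())
    context_totals = Counter()
    for counts in cooc.values():
        context_totals.update(counts)
    weights = {}
    for word, counts in cooc.items():
        word_total = sum(counts.values())
        weights[word] = {}
        for context, n in counts.items():
            pmi = math.log(n * total / (word_total * context_totals[context]))
            if pmi > 0:  # keep only positively associated contexts
                weights[word][context] = pmi
    return weights


def context_types(cooc):
    """Fixed, sorted feature index: every context type seen in the data."""
    return sorted({c for counts in cooc.values() for c in counts})


def distributional_features(w1, w2, weights, contexts):
    """One feature per context type; ASSUMED commonality = min of the two weights."""
    a, b = weights[w1], weights[w2]
    return [min(a.get(c, 0.0), b.get(c, 0.0)) for c in contexts]


if __name__ == "__main__":
    weights = ppmi_weights(COOC)
    contexts = context_types(COOC)
    for pair in [("car", "automobile"), ("car", "banana")]:
        feats = distributional_features(*pair, weights, contexts)
        print(pair, [round(f, 3) for f in feats])
```

Each word pair becomes a fixed-length vector indexed by context type, rather than a single similarity score. Vectors for pairs labelled as synonyms or non-synonyms could then be fed to any supervised classifier, which learns how much each shared context type matters instead of relying on a hand-picked measure and threshold.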
Distributional similarity represents the relatedness of two words by the commonality of the contexts they share, based on the distributional hypothesis (Harris, 1985), which states that semantically similar words share similar contexts. A number of studies that utilize distributional similarity have been conducted, including Hindle (1990), Lin (1998), Geffet and Dagan (2004), and many others. Although these studies have been successful in acquiring related words, they involve various parameters, such as similarity measures and weighting schemes. As Weeds et al. (2004) pointed out, it is not at all obvious that one universally best measure exists for all applications; thus, these parameters must be tuned by hand in an …
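To make the point about hand-tuned measures concrete, here is a minimal sketch of the conventional similarity-based approach: score a word pair with a distributional similarity measure, then call it a synonym pair if the score clears a threshold. Everything here is an illustrative assumption: the context weights are invented, the Lin (1998) measure is shown in a common simplified form over flat context sets (Lin's original is defined over dependency triples), and the threshold of 0.3 is arbitrary.

```python
import math

# Hypothetical precomputed context weights (e.g. positive PMI); not from the paper.
WEIGHTS = {
    "car": {"obj:drive": 0.3, "mod:fast": 0.4, "obj:park": 0.8},
    "automobile": {"obj:drive": 0.4, "mod:fast": 0.3, "obj:repair": 1.4},
    "banana": {"obj:eat": 1.3, "mod:ripe": 1.3},
}


def lin_similarity(u, v):
    """Simplified Lin (1998): weight on shared contexts divided by all weight."""
    shared = u.keys() & v.keys()
    numerator = sum(u[c] + v[c] for c in shared)
    denominator = sum(u.values()) + sum(v.values())
    return numerator / denominator if denominator else 0.0


def cosine_similarity(u, v):
    """Cosine of the two weight vectors (missing contexts count as 0)."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def is_synonym(w1, w2, measure, threshold):
    """Conventional decision rule: synonym iff score >= a hand-set threshold."""
    return measure(WEIGHTS[w1], WEIGHTS[w2]) >= threshold


if __name__ == "__main__":
    for measure in (lin_similarity, cosine_similarity):
        score = measure(WEIGHTS["car"], WEIGHTS["automobile"])
        print(measure.__name__, round(score, 3),
              is_synonym("car", "automobile", measure, 0.3))
```

With these toy weights, the same pair scores about 0.39 under the simplified Lin measure but only about 0.17 under cosine, so a single threshold of 0.3 accepts it under one measure and rejects it under the other. The measure, the weighting, and the threshold all interact, which is why each combination has to be tuned separately.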
