Scientific report: "AN ASSESSMENT OF SEMANTIC INFORMATION AUTOMATICALLY EXTRACTED FROM MACHINE READABLE DICTIONARIES"
Jean Véronis (1, 2) and Nancy Ide (1)

(1) Department of Computer Science, VASSAR COLLEGE, Poughkeepsie, New York 12601
(2) Groupe Représentation et Traitement des Connaissances, CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE, 31 Ch. Joseph Aiguier, 13402 Marseille Cedex 09, France

ABSTRACT

In this paper we provide a quantitative evaluation of information automatically extracted from machine readable dictionaries. Our results show that for any one dictionary, 55-70% of the extracted information is garbled in some way. However, we show that these results can be dramatically reduced to about 6% by combining the information extracted from five dictionaries. It therefore appears that even if individual dictionaries are an unreliable source of semantic information, multiple dictionaries can play an important role in building large lexical-semantic databases.

I. INTRODUCTION

In recent years, it has become increasingly clear that the limited size of existing computational lexicons and the poverty of the semantic information they contain represent one of the primary bottlenecks in the development of realistic natural language processing (NLP) systems.
The need for extensive lexical and semantic databases is evident in the recent initiation of a number of projects to construct massive generic lexicons for NLP (the GENELEX project in Europe or EDR in Japan). The manual construction of large lexical-semantic databases demands enormous human resources, and there is a growing body of research into the possibility of automatically extracting at least a part of the required lexical and semantic information from everyday dictionaries. Everyday dictionaries are obviously not structured in a way that enables their immediate use in NLP systems, but several studies have shown that relatively simple procedures can be used to extract taxonomies and various other semantic relations (for example, Amsler, 1980; Calzolari, 1984).
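To make "relatively simple procedures" concrete, the sketch below illustrates one familiar kind of heuristic: treating the head noun of a definition as a candidate hypernym of the word being defined. It is not the procedure evaluated in this paper. The sample definitions, the word lists, and the function names `genus_term` and `build_taxonomy` are all assumptions invented for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical toy definitions, invented for illustration only.
DEFINITIONS = {
    "ladle": "a large spoon with a long handle used for serving soup",
    "spoon": "an implement consisting of a small shallow bowl on a handle",
    "fork": "an implement with two or more prongs used for eating",
}

# Assumed word lists: determiners to skip, and words that tend to
# close the head noun phrase of a noun definition.
DETERMINERS = {"a", "an", "the", "any", "one"}
STOP_WORDS = {"with", "of", "that", "which", "used", "consisting", "for", "on", "in"}


def genus_term(definition: str) -> Optional[str]:
    """Guess the head noun: the last word before the first stop word."""
    words = re.findall(r"[a-z]+", definition.lower())
    # Drop a single leading determiner, if present.
    if words and words[0] in DETERMINERS:
        words = words[1:]
    head = []
    for word in words:
        if word in STOP_WORDS:
            break
        head.append(word)
    return head[-1] if head else None


def build_taxonomy(definitions: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each headword to its guessed hypernym."""
    return {word: genus_term(text) for word, text in definitions.items()}


if __name__ == "__main__":
    for word, hypernym in build_taxonomy(DEFINITIONS).items():
        print(f"{word} -> {hypernym}")
```

A heuristic this crude fails easily on real definitions, for example when the head phrase is "a kind of" or "any of various", which is consistent with the observation that information extracted from a single dictionary is often garbled.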