tailieunhanh - Scientific report: "Unsupervised Discovery of Generic Relationships Using Pattern Clusters and its Evaluation by Automatically Generated SAT Analogy Questions"

Dmitry Davidov, ICNC, Hebrew University of Jerusalem, dmitry@
Ari Rappoport, Institute of Computer Science, Hebrew University of Jerusalem, arir@

Abstract

We present a novel framework for the discovery and representation of general semantic relationships that hold between lexical items. We propose that each such relationship can be identified with a cluster of patterns that captures this relationship. We give a fully unsupervised algorithm for pattern cluster discovery, which searches, clusters and merges patterns based on high-frequency words around randomly selected hook words. Pattern clusters can be used to extract instances of the corresponding relationships. To assess the quality of the discovered relationships, we use the pattern clusters to automatically generate SAT analogy questions. We also compare to a set of known relationships, achieving very good results in both methods. The evaluation, done in both English and Russian, substantiates the premise that our pattern clusters indeed reflect relationships perceived by humans.

1 Introduction

Semantic resources can be very useful in many NLP tasks. Manual construction of such resources is labor intensive and susceptible to arbitrary human decisions. In addition, manually constructed semantic databases are not easily portable across text domains or languages.
Hence, there is a need for developing semantic acquisition algorithms that are as unsupervised and language independent as possible. A fundamental type of semantic resource is that of concepts (represented by sets of lexical items) and their inter-relationships. While there is relatively good agreement as to what concepts are and which concepts should exist in a lexical resource, identifying types of important lexical relationships is a rather difficult task. Most established resources (e.g., WordNet) represent
