Constructing Semantic Space Models from Parsed Corpora

Sebastian Padó, Department of Computational Linguistics, Saarland University, P.O. Box 15 11 50, 66041 Saarbrücken, Germany, pado@coli.uni-sb.de
Mirella Lapata, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK, mlap@dcs.shef.ac.uk

Abstract

Traditional vector-based models use word co-occurrence counts from large corpora to represent lexical meaning. In this paper we present a novel approach for constructing semantic spaces that takes syntactic relations into account. We introduce a formalisation for this class of models and evaluate their adequacy on two modelling tasks: semantic priming and automatic discrimination of lexical relations.

1 Introduction

Vector-based models of word co-occurrence have proved a useful representational framework for a variety of natural language processing (NLP) tasks such as word sense discrimination (Schütze, 1998), text segmentation (Choi et al., 2001), contextual spelling correction (Jones and Martin, 1997), automatic thesaurus extraction (Grefenstette, 1994), and notably information retrieval (Salton et al., 1975). Vector-based representations of lexical meaning have also been popular in cognitive science and figure prominently in a variety of modelling studies ranging from similarity judgements (McDonald, 2000) to semantic priming (Lund and Burgess, 1996; Lowe and McDonald, 2000) and text comprehension (Landauer and Dumais, 1997).
In this approach, semantic information is extracted from large bodies of text under the assumption that the context surrounding a given word provides important information about its meaning. The semantic properties of a word are represented by a vector constructed from the observed distributional patterns of co-occurrence with its neighbouring words. Co-occurrence information is typically collected in a frequency matrix, where each row corresponds to a unique target word and each column represents one of its linguistic contexts.
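The frequency matrix described above can be sketched in a few lines. The following is a minimal illustration of the traditional, purely positional notion of context (a fixed word window), not the syntax-aware models the paper goes on to develop; the function name and the toy word lists are hypothetical choices for this example.

```python
from collections import Counter, defaultdict

def cooccurrence_matrix(tokens, targets, contexts, window=2):
    """Count how often each target word co-occurs with each context
    word within +/- `window` token positions. Rows of the returned
    matrix correspond to target words, columns to context words."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if word not in targets:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in contexts:
                counts[word][tokens[j]] += 1
    return [[counts[t][c] for c in contexts] for t in targets]

# Toy corpus: each row vector summarises a target's distributional context.
tokens = "the dog chased the cat and the dog barked".split()
targets = ["dog", "cat"]
contexts = ["the", "chased", "barked", "and"]
matrix = cooccurrence_matrix(tokens, targets, contexts)
# matrix[0] is the vector for "dog", matrix[1] the vector for "cat"
```

A syntax-aware variant, as proposed in the paper, would replace the positional window test with a check on the dependency relations linking the two words.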