Scientific report: "Predicting Strong Associations on the Basis of Corpus Data"

Yves Peirsman
Research Foundation - Flanders
QLVL, University of Leuven
Leuven, Belgium

Dirk Geeraerts
QLVL, University of Leuven
Leuven, Belgium

Abstract

Current approaches to the prediction of associations rely on just one type of information, generally taking the form of either word space models or collocation measures. At the moment, it is an open question how these approaches compare to one another. In this paper, we will investigate the performance of these two types of models and that of a new approach based on compounding. The best single predictor is the log-likelihood ratio, followed closely by the document-based word space model. We will show, however, that an ensemble method that combines these two best approaches with the compounding algorithm achieves an increase in performance of almost 30% over the current state of the art.

1 Introduction

Associations are words that immediately come to mind when people hear or read a given cue word. For instance, a word like pepper calls up salt, and wave calls up sea. Aitchinson (2003) and Schulte im Walde and Melinger (2005) show that such associations can be motivated by a number of factors, from semantic similarity to collocation.
Current computational models of association, however, tend to focus on one of these, by using either collocation measures (Michelbacher et al., 2007) or word space models (Sahlgren, 2006; Peirsman et al., 2008). To this day, two general problems remain. First, the literature lacks a comprehensive comparison between these general types of models. Second, we are still looking for an approach that combines several sources of information, so as to correctly predict a larger variety of associations.

Most computational models of semantic relations aim to model semantic similarity in particular (Landauer and Dumais, 1997; Lin, 1998; Padó and Lapata, 2007). In Natural Language Processing, these models have ...
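
As a rough illustration of the kind of collocation measure the abstract names, the log-likelihood ratio between a cue word and a candidate association can be computed from a 2x2 table of co-occurrence counts. The sketch below uses the standard G² formulation. The function name, the argument names, and the example counts are hypothetical, and the excerpt does not show the authors' actual corpus, context window, or implementation.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    Hypothetical argument layout (not taken from the paper):
      k11: contexts containing both the cue and the candidate
      k12: contexts containing the cue but not the candidate
      k21: contexts containing the candidate but not the cue
      k22: contexts containing neither word
    """
    total = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))

    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                score += o * math.log(o / expected)
    return 2.0 * score


# Rank made-up candidate associations for one cue by their score.
candidates = {
    "salt": (30, 10, 10, 950),
    "table": (12, 28, 28, 932),
}
ranked = sorted(candidates,
                key=lambda w: log_likelihood_ratio(*candidates[w]),
                reverse=True)
```

The statistic is zero when the two words co-occur exactly as often as chance predicts and grows as the observed counts depart from that expectation. It grows in either direction, so a practical ranking would also check that the pair co-occurs more often than expected.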
