Distributional Semantics in Technicolor

Elia Bruni, University of Trento
Gemma Boleda, University of Texas at Austin
Marco Baroni, University of Trento
Nam-Khanh Tran, University of Trento

Abstract

Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information tap into different aspects of meaning, and indeed combining them in multimodal models often improves performance.

1 Introduction

Traditional semantic space models represent meaning on the basis of word co-occurrence statistics in large text corpora (Turney and Pantel, 2010). These models, like virtually all work in computational lexical semantics, rely on verbal information only, whereas human semantic knowledge also relies on non-verbal experience and representation (Louwerse, 2011), crucially on the information gathered through perception. Recent developments in computer vision make it possible to computationally model one vital human perceptual channel: vision (Mooney, 2008). A few studies have begun to use visual information extracted from images as part of distributional semantic models (Bergsma and Van Durme, 2011; Bergsma and Goebel, 2011; Bruni et al., 2011; Feng and Lapata, 2010; Leong and Mihalcea, 2011). These preliminary studies all focus on how vision may help text-based models in general terms, by evaluating performance on, for instance, word similarity datasets such as WordSim353. This paper ...
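To make the multimodal combination mentioned in the abstract concrete, the following is a minimal sketch, not the paper's actual pipeline: the toy vectors, the `alpha` weighting parameter, and the normalize-then-concatenate scheme are assumptions made for exposition. In a real setting the textual vectors would come from corpus co-occurrence counts and the visual vectors from image-derived features.

```python
import numpy as np

# Hypothetical toy vectors for illustration only.
# Textual rows would normally be co-occurrence counts over context words;
# visual rows would be features extracted from images depicting the word.
textual = {
    "red":   np.array([8.0, 1.0, 3.0, 0.0]),
    "green": np.array([7.0, 2.0, 2.0, 1.0]),
}
visual = {
    "red":   np.array([9.0, 0.0, 1.0]),
    "green": np.array([1.0, 9.0, 2.0]),
}

def normalize(v):
    """Scale a vector to unit length so the two modalities are comparable."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def multimodal(word, alpha=0.5):
    """Concatenate L2-normalized textual and visual vectors,
    weighting the textual part by alpha and the visual part by 1 - alpha.
    (alpha is an assumed illustration parameter, not a value from the paper.)"""
    return np.concatenate([alpha * normalize(textual[word]),
                           (1 - alpha) * normalize(visual[word])])

def cosine(u, v):
    """Cosine similarity, the standard relatedness measure in semantic spaces."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(multimodal("red"), multimodal("green")))
```

Normalizing each modality before concatenation keeps one modality's raw scale from dominating the combined representation, which is one plausible way to let both information sources contribute to the similarity estimate.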