Learning Similarity Metrics for Event Identification in Social Media

Hila Becker
Columbia University
hila@

Mor Naaman
Rutgers University
mor@

Luis Gravano
Columbia University
gravano@

ABSTRACT
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web. These sites host substantial amounts of user-contributed materials (e.g., photographs, videos, and textual content) for a wide variety of real-world events of different type and scale. By automatically identifying these events and their associated user-contributed social media documents, which is the focus of this paper, we can enable event browsing and search in state-of-the-art search engines. To address this problem, we exploit the rich context associated with social media content, including user-provided annotations (e.g., title, tags) and automatically generated information (e.g., content creation time). Using this rich context, which includes both textual and non-textual features, we can define appropriate document similarity metrics to enable online clustering of media to events.
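The idea of combining textual and non-textual features into one document similarity and using it for online clustering can be illustrated with a minimal Python sketch. It assumes three hypothetical features (title, tags, and a creation timestamp in seconds), compares the text features with cosine similarity over raw term counts and the timestamps with an exponential decay, and combines them with fixed weights in a hypothetical `WEIGHTS` table. Since the paper's point is that such a combination should be learned, these weights, the one-day decay scale, and the 0.5 threshold are placeholders rather than the authors' values. Single-pass threshold clustering is used here as one common online clustering scheme; it is not presented as the paper's exact algorithm.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A hypothetical social media document: two textual features, one non-textual."""
    title: str
    tags: list
    timestamp: float  # content creation time, in seconds


@dataclass
class Cluster:
    """An event cluster summarized by summed term counts and a running timestamp total."""
    title_terms: Counter = field(default_factory=Counter)
    tag_terms: Counter = field(default_factory=Counter)
    time_total: float = 0.0
    members: list = field(default_factory=list)

    def add(self, doc: Doc) -> None:
        self.title_terms.update(doc.title.lower().split())
        self.tag_terms.update(t.lower() for t in doc.tags)
        self.time_total += doc.timestamp
        self.members.append(doc)

    @property
    def mean_time(self) -> float:
        return self.time_total / len(self.members)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_similarity(t1: float, t2: float, scale: float = 86400.0) -> float:
    """Exponential decay: 1.0 for identical times, approaching 0 as the gap grows."""
    return math.exp(-abs(t1 - t2) / scale)


# Hypothetical hand-set weights; a learned metric would fit these from labeled pairs.
WEIGHTS = {"title": 0.3, "tags": 0.4, "time": 0.3}


def similarity(doc: Doc, cluster: Cluster, weights=WEIGHTS) -> float:
    """Weighted sum of per-feature similarities between a document and a cluster.

    Cosine ignores vector length, so the summed term counts act as centroids directly.
    """
    per_feature = {
        "title": cosine(Counter(doc.title.lower().split()), cluster.title_terms),
        "tags": cosine(Counter(t.lower() for t in doc.tags), cluster.tag_terms),
        "time": time_similarity(doc.timestamp, cluster.mean_time),
    }
    return sum(w * per_feature[name] for name, w in weights.items())


def single_pass_cluster(docs, threshold: float = 0.5, weights=WEIGHTS):
    """Assign each document, in arrival order, to its most similar cluster or start a new one."""
    clusters = []
    for doc in docs:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = similarity(doc, cluster, weights)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.add(doc)
        else:
            new_cluster = Cluster()
            new_cluster.add(doc)
            clusters.append(new_cluster)
    return clusters


if __name__ == "__main__":
    docs = [
        Doc("Jazz festival stage", ["jazz", "festival", "music"], 0.0),
        Doc("Jazz festival crowd", ["jazz", "festival"], 3600.0),
        Doc("Marathon finish line", ["marathon", "running"], 5 * 86400.0),
    ]
    clusters = single_pass_cluster(docs)
    print([len(c.members) for c in clusters])  # [2, 1]
```

Because each incoming document is compared only against cluster summaries rather than every earlier document, the per-document cost grows with the number of clusters, which suits a stream of uploads. Replacing the hand-set weights with ones fitted on labeled same-event and different-event pairs is where a learned multi-feature metric would enter this sketch.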
As a key contribution of this paper, we explore a variety of techniques for learning multi-feature similarity metrics for social media documents in a principled manner. We evaluate our techniques on large-scale real-world datasets of event images from Flickr. Our evaluation results suggest that our approach identifies events and their associated social media documents more effectively than the state-of-the-art strategies on which we build.

Categories and Subject Descriptors
Information Storage and Retrieval: Information Search and Retrieval

General Terms
Experimentation, Measurement

Keywords
Event Identification, Social Media, Similarity Metric Learning