Scientific report: "Learning for Microblogs with Distant Supervision: Political Forecasting with Twitter"

Micol Marchetti-Bowick
Microsoft Corporation
475 Brannan Street
San Francisco, CA 94122
micolmb@

Nathanael Chambers
Department of Computer Science
United States Naval Academy
Annapolis, MD 21409
nchamber@

Abstract

Microblogging websites such as Twitter offer a wealth of insight into a population's current mood. Automated approaches to identify general sentiment toward a particular topic often perform two steps: Topic Identification and Sentiment Analysis. Topic Identification first identifies tweets that are relevant to a desired topic (e.g., a politician or event), and Sentiment Analysis extracts each tweet's attitude toward the topic. Many techniques for Topic Identification simply involve selecting tweets using a keyword search. Here, we present an approach that instead uses distant supervision to train a classifier on the tweets returned by the search. We show that distant supervision leads to improved performance in the Topic Identification task as well as in the downstream Sentiment Analysis stage. We then use a system that incorporates distant supervision into both stages to analyze the sentiment toward President Obama expressed in a dataset of tweets. Our results better correlate with Gallup's Presidential Job Approval polls than previous work. Finally, we discover a surprising baseline that outperforms previous work without a Topic Identification stage.

1 Introduction

Social networks and blogs contain a wealth of data about how the general public views products, campaigns, events, and people. Automated algorithms can use this data to provide instant feedback on what people are saying about a topic. Two challenges in building such algorithms are (1) identifying topic-relevant posts, and (2) identifying the attitude of each post toward the topic. This paper studies distant supervision (Mintz et al., 2009) as a solution to both challenges. We apply our approach to the problem of
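
To make the Topic Identification idea above more concrete, the following is a minimal sketch of distant supervision, in which a keyword search supplies noisy labels and a classifier is then trained on them. It is an illustration under stated assumptions, not the authors' system: the Naive Bayes model, the whitespace tokenizer, the choice to drop the keyword from the features, the example tweets, and every function name are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokens; real tweets need more careful handling.
    return text.lower().split()

def distant_labels(tweets, keywords):
    """Noisy labels from a keyword search: 1 if any keyword occurs, else 0."""
    return [int(any(k in tokenize(t) for k in keywords)) for t in tweets]

def train_naive_bayes(tweets, labels, keywords):
    """Collect per-class word counts for multinomial Naive Bayes.
    Assumption: keywords are excluded from the features so that the
    model has to rely on the surrounding context words."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter(labels)
    for text, y in zip(tweets, labels):
        counts[y].update(w for w in tokenize(text) if w not in keywords)
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab

def topic_score(model, text, keywords):
    """Log-odds that a tweet is on-topic (add-one smoothing); > 0 means relevant."""
    counts, docs, vocab = model
    totals = {y: sum(counts[y].values()) for y in (0, 1)}
    score = math.log(docs[1] / docs[0])  # class prior; needs both classes present
    for w in tokenize(text):
        if w in vocab and w not in keywords:
            p1 = (counts[1][w] + 1) / (totals[1] + len(vocab))
            p0 = (counts[0][w] + 1) / (totals[0] + len(vocab))
            score += math.log(p1 / p0)
    return score

# Hypothetical toy data, invented for this sketch.
keywords = {"obama"}
tweets = [
    "obama speaks on the economy tonight",
    "president obama signs the healthcare bill",
    "obama and congress debate the budget",
    "my cat slept all day",
    "great pizza for lunch today",
    "traffic was terrible this morning",
]
labels = distant_labels(tweets, keywords)
model = train_naive_bayes(tweets, labels, keywords)

# Neither test tweet contains the keyword, so a plain keyword search
# would reject both; the trained classifier scores them by context.
on_topic = topic_score(model, "the president signs the budget bill", keywords)
off_topic = topic_score(model, "my cat wants pizza", keywords)
```

In this toy run the first test tweet receives a positive score and the second a negative one, which shows how a classifier trained on keyword matches can generalize beyond the keyword itself. The same pattern could in principle be applied to the Sentiment Analysis stage with a different source of noisy labels, but that source is not specified in the text above.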
