Scientific report: "Extracting and modeling durations for habits and events from Twitter"
Jennifer Williams, Department of Linguistics, Georgetown University, Washington, D.C., USA, jaw97@
Graham Katz, Department of Linguistics, Georgetown University, Washington, D.C., USA, egk7@

Abstract

We seek to automatically estimate typical durations for events and habits described in Twitter tweets. A corpus of more than 14 million tweets containing temporal duration information was collected. These tweets were classified by habituality status using a bootstrapped decision tree. For each verb lemma, associated duration information was collected for episodic and habitual uses of the verb. Summary statistics for 483 verb lemmas and their typical habit and episode durations have been compiled and made available. This automatically generated duration information is broadly comparable to hand-annotation.

1 Introduction

Implicit information about temporal durations is crucial to any natural language processing task involving temporal understanding and reasoning. This information comes in many forms, among them knowledge about typical durations for events and knowledge about typical times at which an event occurs. We know that lunch lasts for half an hour to an hour and takes place around noon, and that a game of chess lasts from a few minutes to a few hours and can occur at any time. So when we interpret a text such as "After they ate lunch they played a game of chess and then went to the zoo", we can infer that the zoo visit probably took place in the early afternoon. In this paper we focus on duration.
Hand-annotation of event durations is expensive and slow (Pan et al., 2011), so it is desirable to determine typical durations automatically. This paper describes a method for automatically extracting information about typical durations for events from tweets posted to the Twitter microblogging site. Twitter is a rich resource for information about everyday events - people post their tweets to .
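To make the per-lemma summary described in the abstract concrete, here is a minimal Python sketch. It is not the authors' pipeline. The duration pattern ("for N units"), the unit inventory, and the names `extract_duration_seconds` and `summarize` are all assumptions made for illustration. The verb lemma and the habitual/episodic label are taken as given inputs, because the bootstrapped decision tree classifier is not reproduced here.

```python
import re
from collections import defaultdict
from statistics import median

# Seconds per unit. This unit inventory is an illustrative assumption.
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 604800,
}

# Hypothetical pattern for expressions such as "for 2 hours" or "for 30 minutes".
DURATION_RE = re.compile(
    r"\bfor\s+(\d+(?:\.\d+)?)\s+(second|minute|hour|day|week)s?\b",
    re.IGNORECASE,
)


def extract_duration_seconds(text):
    """Return the first matched duration in seconds, or None if none is found."""
    match = DURATION_RE.search(text)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit.lower()]


def summarize(records):
    """Group durations by (lemma, label) and report count and median.

    `records` is an iterable of (lemma, label, tweet_text) triples, where
    `label` is "habitual" or "episodic" and is assumed to come from an
    upstream classifier that is not shown here.
    """
    buckets = defaultdict(list)
    for lemma, label, text in records:
        seconds = extract_duration_seconds(text)
        if seconds is not None:  # skip tweets with no recognizable duration
            buckets[(lemma, label)].append(seconds)
    return {
        key: {"n": len(values), "median_seconds": median(values)}
        for key, values in buckets.items()
    }


if __name__ == "__main__":
    sample = [
        ("sleep", "episodic", "I slept for 8 hours last night"),
        ("sleep", "episodic", "slept for 6 hours"),
        ("run", "habitual", "I run for 30 minutes every day"),
        ("run", "habitual", "I love running"),
    ]
    for key, stats in summarize(sample).items():
        print(key, stats)
```

The median is used here only because it is robust to outliers such as exaggerated durations. The source does not say which summary statistics were compiled.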