Scientific report: "Time Period Identification of Events in Text"

Taichi Noro, Takashi Inui, Hiroya Takamura, Manabu Okumura

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa, Japan
Japan Society for the Promotion of Science
Precision and Intelligence Laboratory, Tokyo Institute of Technology
norot tinui @ takamura oku @

Abstract

This study aims at identifying when an event described in text occurs. In particular, we classify a sentence describing an event into one of four time slots: morning, daytime, evening, and night. To realize our goal, we focus on expressions associated with time slots (time-associated words). However, listing all the time-associated words is impractical, because there are numerous time-associated expressions. We therefore use a semi-supervised learning method, the Naive Bayes classifier backed up with the Expectation Maximization algorithm, in order to iteratively extract time-associated words while improving the classifier. We also propose to use Support Vector Machines to filter out noisy instances that indicate no specific time period. As a result of experiments, the proposed method achieved of accuracy and outperformed other methods.

1 Introduction

In recent years, the spread of the internet has accelerated. Documents on the internet have become increasingly important as targets of business marketing. Such circumstances have prompted many studies on information extraction from text, especially text on the internet, such as sentiment analysis and extraction of location information.
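The semi-supervised scheme named in the abstract, a Naive Bayes classifier refined with the Expectation Maximization algorithm, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenized toy sentences, the function names (train_nb, posterior, nb_em, classify), the add-one smoothing, and the fixed iteration count are all assumptions made for illustration, and the Support Vector Machine filtering step is not covered.

```python
import math
from collections import Counter

SLOTS = ["morning", "daytime", "evening", "night"]

def train_nb(docs, weights, vocab, alpha=1.0):
    """M-step: estimate a multinomial Naive Bayes model from soft labels.
    docs is a list of token lists; weights[i][s] is P(slot s | docs[i])."""
    class_mass = {s: alpha for s in SLOTS}
    word_mass = {s: Counter() for s in SLOTS}
    for tokens, w in zip(docs, weights):
        for s in SLOTS:
            class_mass[s] += w[s]
            for t in tokens:
                word_mass[s][t] += w[s]
    total = sum(class_mass.values())
    log_prior = {s: math.log(class_mass[s] / total) for s in SLOTS}
    log_like = {}
    for s in SLOTS:
        denom = sum(word_mass[s].values()) + alpha * len(vocab)
        log_like[s] = {t: math.log((word_mass[s][t] + alpha) / denom)
                       for t in vocab}
    return log_prior, log_like

def posterior(tokens, model):
    """E-step for one sentence: P(slot | tokens) under the current model.
    Tokens outside the training vocabulary are ignored."""
    log_prior, log_like = model
    scores = {s: log_prior[s] + sum(log_like[s][t] for t in tokens
                                    if t in log_like[s])
              for s in SLOTS}
    top = max(scores.values())
    unnorm = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

def nb_em(labeled, unlabeled, iterations=10):
    """Train on labeled sentences, then alternate E- and M-steps so that
    unlabeled sentences contribute fractional counts to every slot."""
    lab_docs = [tokens for tokens, _ in labeled]
    vocab = {t for doc in lab_docs + unlabeled for t in doc}
    hard = [{s: 1.0 if s == y else 0.0 for s in SLOTS} for _, y in labeled]
    model = train_nb(lab_docs, hard, vocab)
    for _ in range(iterations):
        soft = [posterior(doc, model) for doc in unlabeled]
        model = train_nb(lab_docs + unlabeled, hard + soft, vocab)
    return model

def classify(tokens, model):
    """Return the most probable time slot for a tokenized sentence."""
    post = posterior(tokens, model)
    return max(SLOTS, key=lambda s: post[s])

# Hypothetical toy data: one labeled sentence per slot plus unlabeled ones.
labeled = [(["breakfast", "coffee"], "morning"),
           (["lunch", "office"], "daytime"),
           (["sunset", "dinner"], "evening"),
           (["midnight", "sleep"], "night")]
unlabeled = [["breakfast", "newspaper"], ["newspaper", "coffee"],
             ["midnight", "stars"], ["stars", "sleep"],
             ["lunch", "meeting"], ["dinner", "wine"]]
model = nb_em(labeled, unlabeled)
print(classify(["newspaper"], model))
```

In this toy setting the word "newspaper" never appears in a labeled sentence; it picks up an association with a time slot only because it co-occurs with labeled words in the unlabeled sentences, which is the sense in which such a procedure can extract time-associated words iteratively.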
In this paper, we focus on the extraction of temporal information. Many authors of documents on the web write about events in their daily lives. Identifying when these events occur provides us with valuable information. For example, we can use temporal information as a new axis in information retrieval. From time-annotated text, companies can figure out when customers use their products. We can