Scientific report: "An Evaluation Method of Words Tendency using Decision Tree"

An Evaluation Method of Words Tendency using Decision Tree

El-Sayed Atlam, Masaki Oono, and Jun-ichi Aoe
Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan. E-mail: atlam@

ABSTRACT

In every text, some words appear frequently and are regarded as keywords because they are strongly related to the subject of the text. The frequencies of these words change over time (time-series variation) within a given period. However, traditional text processing methods and text search techniques do not take the change of frequency with time-series variation into account. Therefore, traditional methods cannot correctly determine an index of a word's popularity in a given period. In this paper, a new method is proposed to automatically estimate the stability classes (increasing, relatively constant, and decreasing) that indicate a word's popularity with time-series variation, based on the frequency change in past text data. First, learning data was produced by defining four attributes that measure the frequency change of a word quantitatively; these four attributes were extracted automatically from electronic texts.
According to a comparison between the decision tree results and manual (human) evaluation, the F-measures of the increasing, relatively constant, and decreasing classes were [...], [...], and [...], respectively, which confirms the effectiveness of this method.

Keywords: time-series variation, word popularity, decision tree, CNN newspaper.

1. INTRODUCTION

Recently, large electronic text collections have become abundant, and computers are widely used to process and analyze them. Determining important keywords is crucial to successful modern Information Retrieval (IR). The frequencies of some words in texts usually change over time (time-series variation), and these words are commonly associated with a particular period (e.g., influenza is more common in winter). According to Hisano (2000), some Chinese characters (Kanji) appear in newspaper reports.
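
The evaluation mentioned in the abstract compares the class assigned by the decision tree with the class assigned by a human judge and reports an F-measure for each stability class. As a minimal sketch of that step, assuming the usual definition of the F-measure as the harmonic mean of precision and recall, the per-class scores could be computed as follows. The function name `f_measures` and the toy label lists are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter

# The three stability classes named in the abstract.
CLASSES = ("increasing", "relatively constant", "decreasing")


def f_measures(manual, predicted, classes=CLASSES):
    """Return {class: (precision, recall, f_measure)} for two label lists.

    `manual` holds the human labels and `predicted` the classifier labels,
    one entry per word, in the same order.
    """
    if len(manual) != len(predicted):
        raise ValueError("label lists must have the same length")

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for gold, guess in zip(manual, predicted):
        if gold == guess:
            true_pos[gold] += 1
        else:
            false_pos[guess] += 1  # predicted class was wrong
            false_neg[gold] += 1   # manual class was missed

    scores = {}
    for c in classes:
        tp, fp, fn = true_pos[c], false_pos[c], false_neg[c]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f = 2 * precision * recall / total if total else 0.0
        scores[c] = (precision, recall, f)
    return scores


# Toy labels for four hypothetical words (not data from the paper).
manual = ["increasing", "increasing", "decreasing", "relatively constant"]
predicted = ["increasing", "decreasing", "decreasing", "relatively constant"]
scores = f_measures(manual, predicted)
```

Reporting one score per class, rather than a single overall accuracy, shows whether the classifier is weaker on one of the three tendencies.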