tailieunhanh - Báo cáo khoa học: "Unsupervised Topic Identification by Integrating Linguistic and Visual Information Based on Hidden Markov Models"
This paper presents an unsupervised topic identification method integrating linguistic and visual information based on Hidden Markov Models (HMMs). We employ HMMs for topic identification, wherein a state corresponds to a topic and various features including linguistic, visual and audio information are observed. Our experiments on two kinds of cooking TV programs show the effectiveness of our proposed method. | Unsupervised Topic Identification by Integrating Linguistic and Visual Information Based on Hidden Markov Models Tomohide Shibata Graduate School of Information Science and Technology University of Tokyo 7-3-1 Hongo Bunkyo-ku Tokyo 113-8656 Japan shibata@ Sadao Kurohashi Graduate School of Informatics Kyoto University Yoshida-honmachi Sakyo-ku Kyoto 606-8501 Japan kuro@ Abstract This paper presents an unsupervised topic identification method integrating linguistic and visual information based on Hidden Markov Models HMMs . We employ HMMs for topic identification wherein a state corresponds to a topic and various features including linguistic visual and audio information are observed. Our experiments on two kinds of cooking TV programs show the effectiveness of our proposed method. 1 Introduction Recent years have seen the rapid increase of multimedia contents with the continuing advance of information technology. To make the best use of multimedia contents it is necessary to segment them into meaningful segments and annotate them. Because manual annotation is extremely expensive and time consuming automatic annotation technique is required. In the field of video analysis there have been a number of studies on shot analysis for video retrieval or summarization highlight extraction using Hidden Markov Models HMMs . Chang et al. 2002 Nguyen et al. 2005 et al. 2005 . These studies first segmented videos into shots within which the camera motion is continuous and extracted features such as color histograms and motion vectors. Then they classified the shots based on HMMs into several classes for baseball sports video for example pitch view running overview or audience view . In these studies to achieve high accuracy they relied on handmade domain-specific knowledge or trained HMMs with manually labeled data. Therefore they cannot be easily extended to new domains on a large scale. In addition although linguistic information .
đang nạp các trang xem trước