Scientific report: "Keyword Extraction using Term-Domain Interdependence for Dictation of Radio News"
Yoshimi Suzuki, Fumiyo Fukumoto, Yoshihiro Sekiguchi
Dept. of Computer Science and Media Engineering, Yamanashi University, 4-3-11 Takeda, Kofu 400, Japan
sekigutiQsaiko

Abstract
In this paper, we propose a keyword extraction method for dictation of radio news, which consists of several domains. In our method, newspaper articles that are automatically classified into suitable domains are used to calculate feature vectors. The feature vectors show term-domain interdependence and are used to select a suitable domain for each part of the radio news. Keywords are extracted using the selected domain. The results of keyword extraction experiments showed that our method is robust and effective for dictation of radio news.

1 Introduction
Recently, many speech recognition systems have been designed for various tasks. However, most of them are restricted to certain tasks, for example tourist information or a hamburger shop. Speech recognition systems for tasks that span various domains seem to be required for some applications, e.g. a closed caption system for TV and a transcription system for public proceedings. In order to recognize spoken discourse that covers several domains, the speech recognition system must have a large vocabulary. Therefore, it is necessary to limit the word search space using linguistic restrictions, e.g. domain identification.
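The pipeline outlined in the abstract (feature vectors from domain-classified newspaper articles, domain selection for each part of the news, keyword extraction from the selected domain) can be illustrated with a minimal Python sketch. The weighting used here, a log ratio of in-domain to overall relative frequency, is an assumption chosen only for illustration and is not claimed to be the measure used in the paper; the function names and the toy data are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_feature_vectors(articles):
    """Build one term-weight vector per domain.

    articles: iterable of (domain, list_of_terms) pairs.
    The weight is an ASSUMED measure of term-domain interdependence:
    log of (relative frequency in the domain / relative frequency overall).
    Only terms with a positive weight are kept.
    """
    domain_counts = defaultdict(Counter)
    for domain, terms in articles:
        domain_counts[domain].update(terms)

    overall = Counter()
    for counts in domain_counts.values():
        overall.update(counts)
    grand_total = sum(overall.values())

    vectors = {}
    for domain, counts in domain_counts.items():
        domain_total = sum(counts.values())
        vector = {}
        for term, count in counts.items():
            ratio = (count / domain_total) / (overall[term] / grand_total)
            weight = math.log(ratio)
            if weight > 0:
                vector[term] = weight
        vectors[domain] = vector
    return vectors


def select_domain(segment_terms, vectors):
    """Pick the domain whose vector gives the segment the highest total weight."""
    scores = {
        domain: sum(vector.get(term, 0.0) for term in segment_terms)
        for domain, vector in vectors.items()
    }
    return max(scores, key=scores.get)


def extract_keywords(segment_terms, vectors, domain, top_n=3):
    """Return the segment's terms that carry weight in the selected domain."""
    vector = vectors[domain]
    candidates = {term for term in segment_terms if term in vector}
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(candidates, key=lambda t: (-vector[t], t))[:top_n]


# Toy data standing in for automatically classified newspaper articles.
articles = [
    ("sports", ["game", "team", "score", "win"]),
    ("economy", ["market", "stock", "price", "win"]),
]
vectors = build_feature_vectors(articles)
segment = ["team", "win", "game", "today"]
domain = select_domain(segment, vectors)
keywords = extract_keywords(segment, vectors, domain)
```

In this toy setup, a term such as "win" that occurs equally often in both domains receives no weight, so it neither influences domain selection nor is returned as a keyword.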
There have been many studies of domain identification that use term weighting (McDonough et al., 1994; Yokoi et al., 1997). McDonough proposed a topic identification method on the Switchboard corpus. He reported that the result was best when the keyword dictionary contained about 800 words. In his method, the discourses in the Switchboard corpus are rather long, so each discourse contains many keywords. A short discourse, however, contains few keywords. Yokoi also proposed a