Scientific report: "Automatic Segmentation of Multiparty Dialogue"

Pei-Yun Hsueh, School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, GB
Johanna D. Moore, School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, GB
Steve Renals, School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, GB

Abstract

In this paper, we investigate the problem of automatically predicting segment boundaries in spoken multiparty dialogue. We extend prior work in two ways. We first apply approaches that have been proposed for predicting top-level topic shifts to the problem of identifying subtopic boundaries. We then explore the impact on performance of using ASR output as opposed to human transcription. Examination of the effect of features shows that predicting top-level and predicting subtopic boundaries are two distinct tasks: (1) for predicting subtopic boundaries, the lexical cohesion-based approach alone can achieve competitive results; (2) for predicting top-level boundaries, the machine learning approach that combines lexical-cohesion and conversational features performs best; and (3) conversational cues, such as cue phrases and overlapping speech, are better indicators for the top-level prediction task. We also find that the transcription errors inevitable in ASR output have a negative impact on models that combine lexical-cohesion and conversational features, but do not change the general preference of approach for the two tasks.

1 Introduction

Text segmentation, i.e., determining the points at which the topic changes in a stream of text, plays an important role in applications such as topic detection and tracking, summarization, automatic genre detection, and information retrieval and extraction (Pevzner and Hearst, 2002). In recent work, researchers have applied these techniques to corpora such as newswire feeds, transcripts of radio broadcasts, and spoken dialogues, in order to facilitate browsing, information retrieval, and topic detection Allan et al.
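
To make the notion of a lexical cohesion-based segmenter concrete, the following is a minimal sketch, not the model evaluated in this paper. It assumes a simple window-comparison scheme: each gap between utterances is scored by the cosine similarity of the word counts on either side, and a boundary is hypothesised where that score is a local minimum below a threshold. The function names, the window size, the threshold, and the toy transcript are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(utterance):
    """Lowercase an utterance and split it into word tokens."""
    return re.findall(r"[a-z']+", utterance.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cohesion_scores(utterances, window=2):
    """Score every gap between consecutive utterances.

    The score for the gap before utterance `gap` compares the `window`
    utterances preceding it with the `window` utterances starting at it.
    """
    bags = [Counter(tokenize(u)) for u in utterances]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def predict_boundaries(utterances, window=2, threshold=0.1):
    """Return indices i where a boundary is hypothesised before utterance i.

    A gap qualifies when its cohesion score is below `threshold` (an
    illustrative value, not a tuned one) and is a local minimum.
    """
    scores = cohesion_scores(utterances, window)
    boundaries = []
    for k, s in enumerate(scores):
        prev_s = scores[k - 1] if k > 0 else 1.0
        next_s = scores[k + 1] if k + 1 < len(scores) else 1.0
        if s < threshold and s <= prev_s and s <= next_s:
            boundaries.append(k + 1)
    return boundaries


# Toy transcript (invented for this example): two topics, three utterances each.
meeting = [
    "the remote control needs a new battery",
    "the battery should fit inside the remote control",
    "a cheaper battery keeps the remote control affordable",
    "lunch is pizza today",
    "pizza for lunch sounds good",
    "order the pizza before lunch",
]
print(predict_boundaries(meeting))  # [3]
```

The sketch uses only word overlap. The conversational features mentioned in the abstract, such as cue phrases and overlapping speech, are not modelled here.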
