Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input

Igor Malioutov, Alex Park, Regina Barzilay, and James Glass
Massachusetts Institute of Technology
{igorm, malex, regina, glass}@

Abstract

We address the task of unsupervised topic segmentation of speech data operating over raw acoustic information. In contrast to existing algorithms for topic segmentation of speech, our approach does not require input transcripts. Our method predicts topic changes by analyzing the distribution of reoccurring acoustic patterns in the speech signal corresponding to a single speaker. The algorithm robustly handles noise inherent in acoustic matching by intelligently aggregating information about the similarity profile from multiple local comparisons. Our experiments show that audio-based segmentation compares favorably with transcript-based segmentation computed over noisy transcripts. These results demonstrate the desirability of our method for applications where a speech recognizer is not available or its output has a high word error rate.

1 Introduction

An important practical application of topic segmentation is the analysis of spoken data. Paragraph breaks, section markers, and other structural cues common in written documents are entirely missing in spoken data. Insertion of these structural markers can benefit multiple speech processing applications, including audio browsing, retrieval, and summarization.
Not surprisingly, a variety of methods for topic segmentation have been developed in the past (Beeferman et al., 1999; Galley et al., 2003; Dielmann and Renals, 2005). These methods typically assume that a segmentation algorithm has access not only to acoustic input, but also to its transcript. This assumption is natural for applications where the transcript has to be computed as part of the system output, or it is readily available from other system components. However, for some domains and languages, the transcripts may not be available or the recognition ...