Scientific report: "Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features"

Wai-Kit Lo, The Chinese University of Hong Kong, Hong Kong, China, wklo@
Wenying Xiong, The Chinese University of Hong Kong, Hong Kong, China, wyxiong@
Helen Meng, The Chinese University of Hong Kong, Hong Kong, China, hmmeng@

Abstract

This paper presents a Bayesian decision framework that performs automatic story segmentation based on statistical modeling of one or more lexical chain features. Automatic story segmentation aims to locate the instants in time where one story ends and another begins. A lexical chain is formed by linking coherent lexical items chronologically. A story boundary is often associated with a significant number of lexical chains ending before it and starting after it, as well as a low count of chains continuing through it. We devise a Bayesian framework to capture such behavior, using the lexical chain features of start, continuation and end. In the scoring criteria, lexical chain starts and ends are modeled statistically with the Weibull distribution at story boundaries and the uniform distribution at non-boundaries. The normal distribution is used for lexical chain continuations. Full combination of all lexical chain features gave the best F1 performance. We found that modeling chain continuations contributes significantly towards segmentation performance.

1 Introduction

Automatic story segmentation is an important precursor in processing audio or video streams in large information repositories. Very often, these continuous streams of data do not come with boundaries that segment them into semantically coherent units, or stories. The story unit is needed for a wide range of spoken language information retrieval tasks, such as topic tracking, clustering, indexing and retrieval. To perform automatic story segmentation, there are three categories of cues available: lexical cues from transcriptions, prosodic cues .
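The behavior summarized in the abstract (many chains ending before a boundary, many starting after it, few continuing through it) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: chains are built from exact word repetition with a hypothetical `max_gap` parameter, the three features are treated as independent when combined, and every distribution parameter and the prior are invented rather than estimated from data. Only the choice of distribution families (Weibull and uniform for starts and ends, normal for continuations) follows the abstract.

```python
import math
from collections import defaultdict


def build_chains(sentences, max_gap=2):
    """Link repeated words chronologically into (start, end) sentence spans.

    Assumption: a chain breaks when its word is absent for more than
    max_gap sentences; single occurrences do not form a chain.
    """
    positions = defaultdict(list)
    for idx, words in enumerate(sentences):
        for word in set(words):
            positions[word].append(idx)
    chains = []
    for occurrences in positions.values():
        start = prev = occurrences[0]
        for idx in occurrences[1:]:
            if idx - prev > max_gap:
                chains.append((start, prev))
                start = idx
            prev = idx
        chains.append((start, prev))
    return [(s, e) for s, e in chains if e > s]


def chain_features(chains, i):
    """Counts for the candidate boundary between sentence i and i + 1."""
    starts = sum(1 for s, _ in chains if s == i + 1)
    continues = sum(1 for s, e in chains if s <= i and e >= i + 1)
    ends = sum(1 for _, e in chains if e == i)
    return starts, continues, ends


def weibull_pdf(x, shape, scale):
    if x <= 0:
        return 0.0
    return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-((x / scale) ** shape))


def uniform_pdf(x, upper):
    return 1.0 / upper if 0 <= x <= upper else 0.0


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def is_boundary(starts, continues, ends, prior=0.1):
    """Compare log posteriors of 'boundary' and 'non-boundary'.

    All parameters below are hypothetical placeholders, and the features
    are combined under an independence assumption.
    """
    eps = 1e-12  # avoids log(0) when a density evaluates to zero
    log_boundary = (
        math.log(prior)
        + math.log(weibull_pdf(starts, 2.0, 3.0) + eps)
        + math.log(weibull_pdf(ends, 2.0, 3.0) + eps)
        + math.log(normal_pdf(continues, 1.0, 1.5) + eps)
    )
    log_non_boundary = (
        math.log(1.0 - prior)
        + math.log(uniform_pdf(starts, 10.0) + eps)
        + math.log(uniform_pdf(ends, 10.0) + eps)
        + math.log(normal_pdf(continues, 4.0, 2.0) + eps)
    )
    return log_boundary > log_non_boundary


# Toy transcript: sentences 0-2 are one story, sentences 3-5 another.
sentences = [
    ["storm", "coast", "wind"],
    ["storm", "wind", "damage"],
    ["storm", "coast", "damage"],
    ["election", "vote", "party"],
    ["election", "party", "poll"],
    ["vote", "poll", "election"],
]
chains = build_chains(sentences)
for i in range(len(sentences) - 1):
    feats = chain_features(chains, i)
    print(i, feats, is_boundary(*feats))
```

In this toy transcript, the position after sentence 2 is the only one where several chains end, several start, and none continue, so it is the only candidate the sketch labels as a boundary. In practice the parameters would be fitted on annotated data rather than set by hand.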
