Scientific report: "Unsupervised Topic Modelling for Multi-Party Spoken Discourse"

Matthew Purver
CSLI, Stanford University
Stanford, CA 94305, USA
mpurver@stanford.edu

Thomas L. Griffiths
Dept. of Cognitive & Linguistic Sciences, Brown University
Providence, RI 02912, USA
tomgriffiths@brown.edu

Konrad P. Kording
Dept. of Brain & Cognitive Sciences, Massachusetts Institute of Technology
Cambridge, MA 02139, USA
kording@mit.edu

Joshua B. Tenenbaum
Dept. of Brain & Cognitive Sciences, Massachusetts Institute of ...

Abstract

We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address the problems of topic segmentation and topic identification: automatically segmenting multi-party meetings into topically coherent segments with performance which compares well with previous unsupervised segmentation-only methods (Galley et al., 2003), while simultaneously extracting topics which rate highly when assessed for coherence by human judges. We also show that this method appears robust in the face of off-topic dialogue and speech recognition errors.

1 Introduction

Topic segmentation (division of a text or discourse into topically coherent segments) and topic identification (classification of those segments by subject matter) are joint problems. Both are necessary steps in automatic indexing, retrieval and summarization from large datasets, whether spoken or written. Both have received significant attention in the past (see Section 2), but most approaches have been targeted at either text or monologue, and most address only one of the two issues (usually for the very good reason that the dataset itself provides the other, for example by the explicit separation of individual documents or news stories in a collection). Spoken multi-party meetings pose a difficult problem: firstly, neither the ...
