Scientific report: "Combining Multiple Knowledge Sources for Discourse Segmentation"
Diane J. Litman
AT&T Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974
diane@

Rebecca J. Passonneau
Bellcore
445 South Street
Morristown, NJ 07960
beck@

Abstract

We predict discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data. We present two methods for developing segmentation algorithms from training data: hand tuning and machine learning. When multiple types of features are used, results approach human performance on an independent test set (both methods), and using cross-validation (machine learning).

1 Introduction

Many have argued that discourse has a global structure above the level of individual utterances, and that linguistic phenomena like prosody, cue phrases, and nominal reference are partly conditioned by and reflect this structure (cf. Grosz and Hirschberg, 1992; Grosz and Sidner, 1986; Hirschberg and Grosz, 1992; Hirschberg and Litman, 1993; Hirschberg and Pierrehumbert, 1986; Hobbs, 1979; Lascarides and Oberlander, 1992; Linde, 1979; Mann and Thompson, 1988; Polanyi, 1988; Reichman, 1985; Webber, 1991). However, an obstacle to exploiting the relation between global structure and linguistic devices in natural language systems is that there is too little data about how they constrain one another. We have been engaged in a study addressing this gap. In previous work (Passonneau and Litman, 1993), we reported on a method for empirically validating global discourse units, and on our evaluation of algorithms to identify these units.
We found significant agreement among naive subjects on a discourse segmentation task, which suggests that global discourse units have some objective reality. However, we also found poor correlation of three untuned algorithms (based on features of referential noun phrases, cue words, and pauses, respectively) with the subjects' segmentations. In this paper, we discuss two methods for developing segmentation algorithms