Scientific report: "A TAG-based noisy channel model of speech repairs"
Mark Johnson, Brown University, Providence, RI 02912, mj@
Eugene Charniak, Brown University, Providence, RI 02912, ec@

Abstract

This paper describes a noisy channel model of speech repairs, which can identify and correct repairs in speech transcripts. A syntactic parser is used as the source model, and a novel type of TAG-based transducer is the channel model. The use of TAG is motivated by the intuition that the reparandum is a "rough copy" of the repair. The model is trained and tested on the Switchboard disfluency-annotated corpus.

1 Introduction

Most spontaneous speech contains disfluencies such as partial words, filled pauses (e.g., uh, um, huh), explicit editing terms (e.g., I mean), parenthetical asides, and repairs. Of these, repairs pose particularly difficult problems for parsing and related NLP tasks. This paper presents an explicit generative model of speech repairs and shows how it can eliminate this kind of disfluency.

While speech repairs have been studied by psycholinguists for some time, as far as we know this is the first time a probabilistic model of speech repairs based on a model of syntactic structure has been described in the literature. Probabilistic models have the advantage over other kinds of models that they can, in principle, be integrated with other probabilistic models to produce a combined model that uses all available evidence to select the globally optimal analysis.
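The source/channel split described above follows the standard noisy channel pattern: for an observed transcript Y, choose the fluent source string X that maximizes P(X) P(Y | X). A minimal Python sketch of that decision rule follows. The example sentence, the candidate list, and every probability are invented for illustration; in the paper the source scores would come from a syntactic parser and the channel scores from the TAG-based transducer, neither of which is implemented here.

```python
import math

# Hypothetical toy example: all strings and probabilities below are invented.
# "source_logprob" stands in for log P(X), the source model score
# (a syntactic parser in the paper).
# "channel_logprob" stands in for log P(Y | X), the channel model score
# (a TAG-based transducer in the paper).

def decode(candidates):
    """Return the candidate maximizing log P(X) + log P(Y | X)."""
    return max(
        candidates,
        key=lambda c: c["source_logprob"] + c["channel_logprob"],
    )

# Observed (disfluent) transcript Y.
observed = "a flight to Boston uh I mean to Denver"

# Two candidate source strings X that could have produced Y.
candidates = [
    {
        # Hypothesis: no repair, the transcript is already the source string.
        "source": observed,
        "source_logprob": math.log(1e-9),
        "channel_logprob": math.log(0.9),
    },
    {
        # Hypothesis: "to Boston uh I mean" is reparandum plus editing terms.
        "source": "a flight to Denver",
        "source_logprob": math.log(1e-5),
        "channel_logprob": math.log(0.01),
    },
]

best = decode(candidates)
print(best["source"])
```

Working in log space keeps the product of small probabilities numerically stable. With these made-up numbers the repaired string wins because the source model prefers it by a wider margin than the channel model penalizes it.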
Shriberg and Stolcke (1998) studied the location and distribution of repairs in the Switchboard corpus but did not propose an actual model of repairs. Heeman and Allen (1999) describe a noisy channel model of speech repairs but leave extending the model to incorporate higher level syntactic ... processing to future work. The previous work most closely related to the current work is Charniak and Johnson (2001), who used a boosted decision stub classifier to classify words as edited or not on a word-by-word basis, but do not ...