Scientific report: "You talking to me? A Corpus and Algorithm for Conversation Disentanglement"

Micha Elsner and Eugene Charniak
Brown Laboratory for Linguistic Information Processing (BLLIP)
Brown University, Providence, RI 02912

Abstract

When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately. We refer to this task as disentanglement. We present a corpus of Internet Relay Chat (IRC) dialogue in which the various conversations have been manually disentangled, and evaluate annotator reliability. This is, to our knowledge, the first such corpus for internet chat. We propose a graph-theoretic model for disentanglement, using discourse-based features which have not been previously applied to this task. The model's predicted disentanglements are highly correlated with manual annotations.

1 Motivation

Simultaneous conversations seem to arise naturally in both informal social interactions and multi-party typed chat. Aoki et al. (2006)'s study of voice conversations among 8-10 people found an average of [...] conversations (floors) active at a time, and a maximum of four. In our chat corpus, the average is even higher, at [...]. The typical conversation, therefore, is one which is interrupted frequently.

Disentanglement is the clustering task of dividing a transcript into a set of distinct conversations.
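To make the clustering view concrete, the following minimal Python sketch represents a transcript as a list of utterances and a disentanglement as a partition of their indices. It uses the first six utterances of the exchange in figure 1. The conversation labels "A" and "B" are an illustrative reading of that exchange, not annotations from the corpus, and the function names are hypothetical. The rule that a conversation is active from its first to its last utterance is also an assumption made only for this example; it is not necessarily how the averages quoted above were computed.

```python
from collections import defaultdict

# Each utterance is (speaker, text, conversation label).
# The labels are an assumed, illustrative reading of the exchange.
transcript = [
    ("Chanel", "Felicia: google works", "A"),
    ("Gale", "Arlie: you guys have never worked in a factory before have you", "B"),
    ("Gale", "Arlie: there's some real unethical stuff that goes on", "B"),
    ("Regine", "hands Chanel a trophy", "A"),
    ("Arlie", "Gale of course . thats how they make money", "B"),
    ("Gale", "and people lose limbs or get killed", "B"),
]


def disentangle(utterances):
    """Group utterance indices by conversation label (a partition of the transcript)."""
    conversations = defaultdict(list)
    for index, (_speaker, _text, label) in enumerate(utterances):
        conversations[label].append(index)
    return dict(conversations)


def mean_active_conversations(utterances):
    """Average number of conversations active at each utterance.

    Assumption: a conversation counts as active from its first
    utterance through its last one, inclusive.
    """
    spans = {
        label: (indices[0], indices[-1])
        for label, indices in disentangle(utterances).items()
    }
    counts = [
        sum(1 for first, last in spans.values() if first <= i <= last)
        for i in range(len(utterances))
    ]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    print(disentangle(transcript))
    print(mean_active_conversations(transcript))
```

Under these assumptions the two conversations interleave: conversation "A" is interrupted by "B" before it resumes, which is the situation a disentanglement system has to resolve without being given the labels.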
It is an essential prerequisite for any kind of higher-level dialogue analysis: for instance, consider the multi-party exchange in figure 1. Contextually, it is clear that this corresponds to two conversations, and Felicia's [1] response excel-

[1] Real user nicknames are replaced with randomly selected

(Chanel) Felicia: google works
(Gale) Arlie: you guys have never worked in a factory before have you
(Gale) Arlie: there's some real unethical stuff that goes on
(Regine) hands Chanel a trophy
(Arlie) Gale, of course . thats how they make money
(Gale) and people lose limbs or get killed
(Felicia) .