Scientific report: "Person Identification from Text and Speech Genre Samples"

Jade Goldstein-Stewart, Department of Defense, jadeg@
Ransom Winder, The MITRE Corporation, Hanover, MD, USA, rwinder@
Roberta Evans Sabin, Loyola University, Baltimore, MD, USA, res@

Abstract

In this paper, we describe experiments conducted on identifying a person using a novel, unique, correlated corpus of text and audio samples of the person's communication in six genres. The text samples include essays, emails, blogs, and chat. Audio samples were collected from individual interviews and group discussions and then transcribed to text. For each genre, samples were collected for six topics. We show that we can identify the communicant with an accuracy of 71% for six-fold cross-validation using an average of 22,000 words per individual across the six genres. For person identification in a particular genre (train on five genres, test on one), an average accuracy of 82% is achieved. For identification from topics (train on five topics, test on one), an average accuracy of 94% is achieved. We also report results on identifying a person's communication in a genre using text genres only as well as audio genres only.

1 Introduction

Can one identify a person from samples of his/her communication? What common patterns of communication can be used to identify people? Are such patterns consistent across varying genres?

People tend to be interested in subjects and topics that they discuss with friends, family, colleagues, and acquaintances. They can communicate with these people textually via email, text messages, and chat rooms. They can also communicate via verbal conversations. Other forms of communication could include blogs or even formal writings such as essays or scientific articles. People communicating in these different genres may have different stylistic patterns, and we are interested in whether or not we could identify people from their communications in different genres. The attempt to identify authorship of ...
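
The evaluation setup mentioned in the abstract (train on five genres and test on one, or train on five topics and test on one) can be illustrated with a minimal sketch. This is not the authors' code: the `Sample` record, its field names, the `leave_one_out_splits` helper, and the toy data are all hypothetical, and no classifier is included. It only shows how the held-out splits might be formed.

```python
from collections import namedtuple

# Hypothetical record type; the field names are illustrative, not from the paper.
Sample = namedtuple("Sample", ["person", "genre", "topic", "text"])

# The six genres named in the abstract (four text, two transcribed audio).
GENRES = ["essay", "email", "blog", "chat", "interview", "discussion"]


def leave_one_out_splits(samples, field):
    """Yield (held_out_value, train, test), holding out one value of `field` at a time."""
    values = sorted({getattr(s, field) for s in samples})
    for held_out in values:
        train = [s for s in samples if getattr(s, field) != held_out]
        test = [s for s in samples if getattr(s, field) == held_out]
        yield held_out, train, test


# Toy data (assumption): 2 people x 6 genres x 6 topics, with placeholder text.
samples = [
    Sample(person, genre, topic, f"{person} on topic {topic} in {genre}")
    for person in ("A", "B")
    for genre in GENRES
    for topic in range(6)
]

# Train on five genres, test on the sixth; likewise for topics.
genre_splits = list(leave_one_out_splits(samples, "genre"))
topic_splits = list(leave_one_out_splits(samples, "topic"))
```

Each split keeps every sample of the held-out genre (or topic) out of the training set, so a classifier evaluated this way is always tested on a genre or topic it has not seen for any person.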
