Scientific report: "A Quantitative Analysis of Lexical Differences Between Genders in Telephone Conversations"

Constantinos Boulis
Department of Electrical Engineering, University of Washington, Seattle 98195
boulis@

Mari Ostendorf
Department of Electrical Engineering, University of Washington, Seattle 98195
mo@

Abstract

In this work, we provide an empirical analysis of differences in word use between genders in telephone conversations, which complements the considerable body of work in sociolinguistics concerned with gender linguistic differences. Experiments are performed on a large speech corpus of roughly 12000 conversations. We employ machine learning techniques to automatically categorize the gender of each speaker given only the transcript of his/her speech, achieving 92% accuracy. An analysis of the most characteristic words for each gender is also presented. Experiments reveal that the gender of one conversation side influences lexical use of the other side. A surprising result is that we were able to classify male-only vs. female-only conversations with almost perfect accuracy.

1 Introduction

Linguistic and prosodic differences between genders in American English have been studied for decades. The interest in analyzing gender linguistic differences is two-fold. From the scientific perspective, it will increase our understanding of language production. From the engineering perspective, it can help improve the performance of a number of natural language processing tasks, such as text classification, machine translation, or automatic speech recognition, by training better language models. Traditionally, these differences have been investigated in the fields of sociolinguistics and psycholinguistics; see, for example, (Coates, 1997), (Eckert and McConnell-Ginet, 2003), or http groups gal for a comprehensive bibliography on language and gender. Sociolinguists have approached the issue from a mostly non-computational perspective using ...
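
To make the classification setup described in the abstract concrete, the following is a minimal sketch of labeling a conversation side from its transcript alone. This excerpt does not name the learning algorithm, so a multinomial Naive Bayes model with add-one smoothing is assumed here purely as a stand-in. The function names (train_naive_bayes, classify), the labels "A" and "B", and the placeholder tokens w1 to w5 are hypothetical; they are not the authors' corpus, features, or results.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(sides):
    """Collect per-label word counts from (label, transcript) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of conversation sides
    vocab = set()
    for label, text in sides:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior for a transcript."""
    word_counts, label_counts, vocab = model
    total_sides = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior from the share of training sides with this label.
        score = math.log(label_counts[label] / total_sides)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: placeholder tokens, not real transcripts.
TOY_SIDES = [
    ("A", "w1 w2 w1 w3"),
    ("A", "w1 w2 w2"),
    ("B", "w4 w5 w4 w3"),
    ("B", "w5 w5 w4"),
]

model = train_naive_bayes(TOY_SIDES)
print(classify(model, "w1 w2"))
print(classify(model, "w4 w5 w3"))
```

In a real experiment the toy pairs would be replaced by labeled conversation sides, and accuracy would be measured on transcripts held out from training.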
