Scientific report: “That’s What She Said: Double Entendre Identification”
Chloe Kiddon and Yuriy Brun
Computer Science & Engineering, University of Washington, Seattle, WA 98195-2350
{chloe,brun}@cs.washington.edu

Abstract

Humor identification is a hard natural language understanding problem. We identify a subproblem, the “that’s what she said” problem, with two distinguishing characteristics: (1) use of nouns that are euphemisms for sexually explicit nouns and (2) structure common in the erotic domain. We address this problem in a classification approach that includes features that model those two characteristics. Experiments on web data demonstrate that our approach improves precision by 12% over baseline techniques that use only word-based features.

1 Introduction

“That’s what she said” is a well-known family of jokes, recently repopularized by the television show “The Office” (Daniels et al., 2005). The jokes consist of saying “that’s what she said” after someone else utters a statement in a non-sexual context that could also have been used in a sexual context. For example, if Aaron refers to his late-evening basketball practice, saying “I was trying all night, but I just could not get it in”, Betty could utter “that’s what she said”, completing the joke. While somewhat juvenile, this joke presents an interesting natural language understanding problem.

A “that’s what she said” (TWSS) joke is a type of double entendre.
A double entendre, or adianoeta, is an expression that can be understood in two different ways: an innocuous, straightforward way, given the context, and a risqué way that indirectly alludes to a different, indecent context. To our knowledge, related research has not studied the task of identifying double entendres in text or speech. The task is complex and would require both deep semantic and cultural understanding to recognize the vast array of double entendres. We focus on a subtask of double entendre identification: TWSS recognition. We say a sentence is a TWSS if it is funny to follow that sentence with that