Scientific report: "Syntactic Stylometry for Deception Detection"

Song Feng, Ritwik Banerjee, Yejin Choi
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794-4400
songfeng, rbanerjee, ychoi@

Abstract

Most previous studies in computerized deception detection have relied only on shallow lexico-syntactic patterns. This paper investigates syntactic stylometry for deception detection, adding a somewhat unconventional angle to prior literature. Over four different datasets, spanning from the product review domain to the essay domain, we demonstrate that features derived from Context Free Grammar (CFG) parse trees consistently improve detection performance over several baselines that are based only on shallow lexico-syntactic features. Our results improve on the best published result on the hotel review data (Ott et al., 2011), with a 14% error reduction.

1 Introduction

Previous studies in computerized deception detection have relied only on shallow lexico-syntactic cues. Most are based on dictionary-based word counting using LIWC (Pennebaker et al., 2007) (e.g., Hancock et al., 2007; Vrij et al., 2007), while some recent ones explored machine learning techniques using simple lexico-syntactic patterns, such as n-grams and part-of-speech (POS) tags (Mihalcea and Strapparava, 2009; Ott et al., 2011). These previous studies unveil interesting correlations between certain lexical items or categories and deception that may not be readily apparent to human judges. For instance, the work of Ott et al. (2011) in the hotel review domain yields the insightful observation that deceptive reviewers tend to use verbs and personal pronouns (e.g., "I", "my") more often, while truthful reviewers tend to use more nouns, adjectives, and prepositions.

In parallel to these shallow lexical patterns, might there be deep syntactic structures lurking in deceptive writing? This paper investigates syntactic stylometry for deception detection, adding a somewhat unconventional angle to .
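To make the contrast between shallow patterns and CFG-based features concrete, the following is a minimal sketch of how production rules could be counted from a parse tree and used as features. It is an illustration only, not the authors' feature extraction: the function names `parse_tree` and `production_rules`, the `lexicalized` flag, and the example sentence are all assumptions, and the input is assumed to be a well-formed Penn-Treebank-style bracketed parse produced by some external parser.

```python
from collections import Counter


def parse_tree(bracketed):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Assumes well-formed Penn-Treebank-style input; leaves are plain strings.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def production_rules(tree, lexicalized=False):
    """Count CFG production rules in a tree.

    With lexicalized=False, rules that rewrite a POS tag to a word are skipped.
    """
    label, children = tree
    rules = Counter()
    if all(isinstance(child, str) for child in children):
        # Preterminal node: POS tag -> word(s)
        if lexicalized:
            rules[f"{label} -> {' '.join(children)}"] += 1
        return rules
    rhs = " ".join(child[0] for child in children)
    rules[f"{label} -> {rhs}"] += 1
    for child in children:
        rules += production_rules(child, lexicalized)
    return rules


# Hypothetical example sentence with a hand-written parse.
example = "(S (NP (PRP I)) (VP (VBD loved) (NP (DT the) (NN hotel))))"
features = production_rules(parse_tree(example))
for rule, count in sorted(features.items()):
    print(rule, count)
```

Counts such as `S -> NP VP` or `NP -> DT NN` describe sentence structure rather than word choice, which is the kind of signal n-gram and POS-tag features cannot capture directly. In practice the rule counts from all sentences of a document would be summed and passed to a classifier alongside, or instead of, the shallow features.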